Friday, August 28, 2009

JsonP


Rdbhost.com now provides JSONP as a format option.

JSONP allows data to be requested from a site different from that of the referencing page. You can put a page on your domain somewhere, and retrieve data from www.rdbhost.com using a cross-site request and the JSONP format.

Let us look at a very simple example. Rdbhost maintains a table of performance statistics (on itself, about itself) in a table 'stats' in account db000000005 . We will query that table from our page for a count of records. The entire example page is 11 lines:

01 <html>
02 <body>
03 Count of records in 'stats': <span id="count"></span>
04 <script>
05 function showct(data){
06 document.getElementById("count").innerHTML = data['records']['rows'][0]["count"];
07 };
08 </script>
09 <script src="http://www.rdbhost.com/db/r0000000005?q=SELECT%20count(*)%20from%20stats&callback=showct"></script>
10 </body>
11 </html>
Lines 01 to 03 are just html to show the count when we get it. Lines 05 to 07 are a javascript function to process the count data after receiving it, and line 09 is the actual request.

Line 09 is a script tag: it retrieves the page at the url identified in 'src', and processes it as a script. The 'callback' parameter indicates the name to 'wrap' around the data, and implies the request needs JSONP output format. The content received is this:

showct({
"records": {
"header": {
"count": 20
},
"rows": [
{
"count": "5456"
}
]
},
"row_count": [
1,
"1 Rows Affected"
],
"status": [
"complete",
"OK"
]
})

When this is processed as a script, the 'showct(...)' is interpreted as a function call, and executed. The function had been defined in lines 05 to 07, and just extracted the 'count' element of the JSON and put it in the HTML element 'count', for display.

This page works cross-site, so you could host such a page on your site, under your domain, and retrieve records from rdbhost.com dynamically. If you prefer jQuery, you can retrieve JSONP content using the .getJSON function, including a 'callback' parameter in the url to retrieve. Using jQuery has the virtue that you do not need a named callback function, as jQuery will allow its typical anonymous callbacks even with JSONP.

Limitations:
  • Requests use 'GET' mode, not 'POST', so any authentication information would be there in the link for users to read.
  • Rdbhost does not permit authcodes to be passed in GET requests, for security reasons. JSONP usage is thus only useful for queries using the 'r' role, with no authentication. To query your own account, you would need to enable the 'r' role from the profile page, and probably GRANT it some SELECT privileges on tables in the account, using rdbadmin or the SQL_form.
Example pages can be found at the following links. Use view source to see the internals.


The request urls can easily get long enough to be unwieldy. These two pages demonstrate a technique for putting the url in a javascript variable, where it can be line-wrapped.

The Rdbhost website itself is at: http://www.rdbhost.com.

Monday, August 10, 2009

Stackoverflow Data

Stackoverflow.com, in case you haven't heard of it, is a very popular question and answer site for programmers.

It was developed by Jeff Atwood and Joel Spoelski, both well known bloggers on technology, and they produce an entertaining podcast on the development of stackoverflow.com and (more or less) related computer topics.

Anyway, to get to the point, the data at stackoverflow is all user generated, and licensed under Creative Commons. They release, about once a month, an xml dump of all the data, in a big archived file. The release includes all posts and all comments, and some user profile data.

With some pro-active assistance from Stéphane Bortzmeyer, I have imported the data from the August dump into an Rdbhost database and made it available to anonymous users, with SELECT privilege only (no changes to data permitted). The database engine behind the Rdbhost webservice is Postgresql.

The database is at: www.rdbhost.com/rdbadmin/main.html?r0000000767
Just click the login button; no authentication is required. The SQL admin software is a work-alike to Adminer (formerly phpMinAdmin), implemented in javascript.

Hopefully, the table and field names make sense and their meanings can be inferred. If you find the indexes to be insufficient for your purposes, email me or put a comment on this blog entry. There is a 3 second query duration limit, so any query needing a full-table scan on any of the larger tables will likely fail.

Have fun.

Edited: Added mention of Postgresql and made a few grammar/punctuation corrections.

Monday, August 3, 2009

Binary Content


Formats

The usual approach to using rdbhost is to request records, consisting of multiple fields, and that record data is incorporated into a structured JSON or XML page and delivered to requesting client. The client can then disassemble the structure to extract the data for use. Those JSON or XML formatted pages are text, and non-text data gets encoded/escaped to make it fit a textual structure.

Sometimes, though, you just want the raw data, without textual escaping. This would be useful for an image database, for example. You would like to just send the binary image data to the browser or other client, and have it processed without any text-binary decoding.


Binary Format

The 'binary' format does just this. If your request calls for a binary format, using the format parameter, all the fields of all the records of the output are just catenated together and the resulting blob is sent to the client. I would expect the typical use is querying for one field of one record, but if you have need to aggregate multiple fields together, it will do that. The data to be queried this way should generally be stored in bytea fields; bytea fields are delivered as-is, others are utf-8 decoded and non-decodable characters are escaped.

When you request binary format, the format parameter value can have content-type appended to it, delimited by a colon; for example "format=binary:image/jpeg" would result in the query data being sent as a binary blob, and with a 'content-type: image/jpeg' header line. If you are delivering the binary content directly to a browser, the content-type is very helpful to the browser in rendering the object correctly.

Inserting Binary Data into Database

Getting the binary data into the database is its own problem. Our sql_form supports http file uploads of binary data. Put the binary data in a file, and upload it through the form, and it will be processed by the server as binary. Unless you put it into a bytea field, it will be utf-8 encoded going into the table, and will be utf-8 decoded when delivered in response to a SELECT. The Python DB API module does not, today, support binary uploads, but that is planned for the next version. In the meantime, the PostgreSQL decode(...,'base64') function can be used to put binary data into a field, but you would need a client side base64 encoder to pre-encode the data.

Sample Page

A sample page can be viewed here. The full url is below, and it incorporates the account and the query all into the query string. The page itself is delivered in response to a database query, and the embedded image comes from another database query.

http://dev.rdbhost.com/db/r0000000763?q=select+data+from+d+where+tag%3d%27page%27&format=binary:text/html

That page was created using the sql_form form linked from the profile page, with SQL INSERTs and the file upload feature.


David

17 May 2010
Updated to correct url.