Sunday, November 28, 2010

Django?

I suspect that the Django ORM gets more use than either SQLObject or SQLAlchemy, just because it is part of the Django stack, included in Django installs by default. For this reason, I would like to include Rdbhost support in Django.

A day plus spent in pursuit of that goal has been fruitless. This post is a summary of my experience.

Before I announce support for an ORM or framework, I try to ensure that as much as possible of the framework's tests pass with this database. I found Django's testing tools difficult to understand and difficult to use.

I started testing in an app, with 'manage.py test', until someone cued me in to the runtests.py utility; the Pip/easy_install version of Django does not include most of the test suite, so when I found out, I switched to the tarball version.

Testing with runtests.py worked better, but still not great. Firstly, it seemed to ignore the '--fastfail' switch when the failure was an exception (an Error, in testing parlance), rather than a negative result of an assertion (a Fail). Most 'failures', for me, will be exceptions, and --fastfail should abort on them, as do unittest and py.test with the -x switch.

runtests.py doesn't abort on Error, and its error message is just 'ERROR', without a test name or line number or anything. Aborting with a ctl-C will abort the test run, but will not display the test report for prior failures. So the only way to get a test result (with test name and stack-trace) displayed for an exception is to wait for the batch of tests to complete, which could be 20-50 tests and take upwards of 10 minutes. This is an extremely tedious way to diagnose problems.

When you do have the name of a test, like 'modeltests.aggregration.tests.BaseAggregateTestCase', and try to run it specifically, with 'runtests.py modeltests.aggregration.tests.BaseAggregateTestCase', you get an error about 'test labels' and 'app...'

When I tried to runtests.py with a simple 'sqlite3' based settings file, I still got many errors, failing on 25-35% of the tests. This casts doubt on whether the test suite itself is maintained, and it is very frustrating to try to get a component to comply with test expectations when you cannot be sure the tests aren't themselves broken. The test runner seemed to find the json fixtures when running with the sqlite3 settings, but not with the rdbhost settings. Why would retrieving data from a file in the distro depend on what backend was configured?

The test suite looks fairly comprehensive, but without a test runner that will run specific tests in a predictable way, developing to the tests is not practical. The Django compatibility effort here is on the shelf for a while.

Tuesday, November 23, 2010

Page hosting at Rdbhost

We have long presented our service as a database hosting service, providing data querying for applications hosted somewhere else.

Now, you can host the whole thing right here.

Just stuff your html and JavaScript files into database blob fields, point your domain at our server
and register the domain name as belonging to your account here.  The registration is done from the 'Domain Alias' page, linked from the profile page.

Rdbhost will interpret a plain-jane URL as identifying specific data elements, and deliver your html and JavaScript blobs as pages to your users' browsers.

Setting the whole thing up involves a few steps, so it is probably not the easiest plan to develop incrementally with an Rdbhosted configuration;  instead, develop using a local server, as discussed in the DNS page, and move the JavaScript and html files to Postgresql field blobs for production deployment.

See a discussion of the features at:

http://www.rdbhost.com/user_domains.html

Monday, November 22, 2010

Query Timing

A small change has been made to add a data element to the data returned by queries.  The change applies to both the JSON variants and XML.

There is a new top-level element called 'times' which is an array of two decimal values.  The first decimal value is the duration, in seconds, of the processing the website front-end does prior to passing the query to the database back-end. The second is the duration of the database operation in the database backend.

A sample excerpt for JSON:
"status": [
        "complete", 
        "OK"
    ], 
    "times": [
        "0.036002", 
        "0.000000"
    ]

There is a third major element in the on-site processing of the query, and that is the time spent serializing the results into JSON or XML.  This time is not included in the time record, as the time array has to be populated prior to serializing.

You can measure the total round-trip time, yourself, from the client, but that time includes internet overhead.  These data items may be helpful in optimizing your queries.

As always, comments are welcome.

Monday, November 15, 2010

How to Train Your Database Server

Rdbhost allows SQL queries to be submitted from the browser and executed.   A cursory consideration of that model suggests security problems, in that whatever is in the browser can be manipulated, and user-generated queries can be submitted by any user using authentication information extracted from your source code.

We thought of that, and provide means to authenticate individual queries and prevent execution of unapproved queries.  The Rdbhost service supports four different roles; two of them allow free-form queries, and the other two only allow pre-authorized queries.

Roles for Free-form Queries
The first two, the s- (super) and r- (read) roles allow any arbitrary query to be executed.  The first is all-powerful, owning every resource in the database, and relies on an authcode (a long non-memorable password) to prevent use by arbitrary non-authorized users.  The second, the r- role, allows any query from any user, and relies on the Postgresql role privileges being set to prevent unwanted changes to the database.  Basically the r- role would have only SELECT privileges on relevant tables, allowing data to be read, but erroring on attempts to change the data.  The r- role provides the convenience of being able to change queries in your client source code without retraining the server.  Web applications, in my experience, tend to have many more read queries than write queries, both in access count and in number of distinct query strings.

Roles for Mutating Queries
The other two roles allow only pre-authorized queries, and are the roles subject to training.  These roles may
submit queries with either the SQL query verbatim, or with a keyword that retrieves the SQL from a lookup table.  The query SQL itself must already be in either the keyword lookup table or in a table of pre-authorized queries.  The keyword lookup table is called 'lookup.queries' and is intended for manual entry and maintenance.  The other table is called 'lookup.preauth_queries', and is intended for automated query entry.  These two roles are prefixed with 'p' and 'a'.  'p' is a mnemonic for public and has no authcode,  while 'a' is a mnemonic for 'authenticated', and does have an authcode.  Most apps will use 'r-' role and 'p-' roles exclusively.

Training
Queries are entered into the preauth_queries table via the training process.  The website has a page in the members area for enabling training.  Training is enabled for individual client IPs separately, so some clients can be using the table in the usual safe way, while your workstation client, registered as a trainer, is entering queries into it.   Once your workstation's IP is added to the trainers list, you can run any query from any role; when a query is received for a p- or a- role it is added to the preauth_queries table before executing.  So executing your apps test suite or simply exercising every feature of the app will put all necessary queries into the preauth_queries table.  You can then disable the training mode, test your app again, and release it.

Overview
The general approach to application development in Javascript is:
  1. Create an account, with its own database, on www.Rdbhost.com
  2. Create necessary roles from role_manager page
  3. Enable training for your workstation from training page
  4. Configure your hosts file to include necessary local and Rdhost server entries
  5. Run a local server; a tiny Python 8-liner will do
  6. Develop your application
  7. Develop your application (it's in here twice, because it's big)
  8. Develop a test suite (optional)
  9. Clear the preauth_queries table on the server; this removes development cruft
  10. Run your test suite, or just exercise all SQL-using features of the app
  11. Disable training for your workstation, from training page
  12. Test application, either by test suite or manually
  13. Release and distribute your app
A process of thirteen steps seems long, but only steps 6-8,10 and 12 are more than a few mouse-clicks worth of effort.

Related Links:

  1. Training Page
  2. PostgreSQL Roles on Rdbhost
  3. Javascript API module for Rdbhost

Your comments are welcome, to dkeeney@rdbhost.com, or in the blogspot comment facility below.