Tuesday, February 16, 2010

Rdbhost on twitter

We are not really active on Twitter; we basically just let twitterfeed create tweets from blog posts here. But if you keep up with things via Twitter rather than a blog reader, you can get announcements of new blog posts here from our Twitter feed:

www.twitter.com/rdbhost

RdbHost on ProgrammableWeb blog

We got a mention on ProgrammableWeb.com, in their API News page. Thanks, PW!!

See:

5 new APIs

Saturday, February 13, 2010

SQLAlchemy and the Stackoverflow Data Dump


We recently wrote an rdbhost module for SQLAlchemy (SQLA), allowing you to use Rdbhost databases through SQLAlchemy. It was mentioned in this blog about a month ago, here.

In the last couple of days I uploaded the February Stackoverflow data dump to our SO database here (account 767), and placed an announcement on meta.SO, including a mention of the DB API module. If you are an SQLAlchemy fan, and wish to explore the SO data via SQLAlchemy, this post will help get you started.

You do need SQLAlchemy installed; it is installable via easy_install, or manually from a downloaded copy. You also need the rdbhost module, placed in the databases folder within your SQLAlchemy install, where SQLA will find it.

Example:
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
IDLE 1.2.4  
>>> from sqlalchemy import *
>>> role = 'r0000000767'
>>> authcode = '-'
>>> dsn = 'rdbhost://'+role+':'+authcode+'@www.rdbhost.com'
>>> db = create_engine(dsn)
>>> meta = MetaData(db)
>>> meta.reflect(schema='so')
... elided warnings from sqlalchemy about partial indexes ...
>>> for t in meta.tables:
...     print t

so.posts
so.badges
so.tags
so.users
so.votes
so.comments
so.tagging
>>> posts = meta.tables['so.posts']
>>> print posts.columns
['posts.id', 'posts.type', 'posts.parentid', 'posts.title', 
'posts.creation', 'posts.owner', 'posts.accepted_answer', 
'posts.score', 'posts.viewcount', 'posts.body',  
'posts.lasteditor', 'posts.lasteditorname', 
'posts.lasteditdate', 'posts.lastactivitydate', 
'posts.communityowneddate', 'posts.closeddate', 
'posts.tags', 'posts.answercount',  
'posts.commentcount', 'posts.favoritecount']
>>> from sqlalchemy import select
>>> users = meta.tables['so.users']
>>> s = select([users], users.c.reputation > 100000, limit=50)
>>> recs = db.execute(s)
>>> for user in recs:
...     print user.name, user.reputation

Marc Gravell 109683
Jon Skeet 134930
>>>
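If you want to experiment with the same query pattern without a network connection, here is a minimal sketch against an in-memory SQLite database. The table layout is an assumed slice of so.users, and the names and reputation figures are taken from the session above; this only illustrates the pattern, it is not the Rdbhost setup itself.

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# In-memory SQLite stand-in; the column layout mirrors an assumed slice of so.users.
engine = create_engine('sqlite://')
meta = MetaData()
users = Table('users', meta,
              Column('id', Integer, primary_key=True),
              Column('name', String),
              Column('reputation', Integer))
meta.create_all(engine)

# Sample rows; the two high-reputation names come from the session above.
with engine.begin() as conn:
    conn.execute(users.insert(), [
        {'name': 'Marc Gravell', 'reputation': 109683},
        {'name': 'Jon Skeet', 'reputation': 134930},
        {'name': 'newcomer', 'reputation': 12},
    ])

# Same filter as above: users with reputation over 100000, capped at 50 rows.
stmt = users.select().where(users.c.reputation > 100000).limit(50)
with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row.name, row.reputation)
```

The `Table.select().where(...)` form used here behaves the same across SQLAlchemy releases, so the sketch works whether or not you are on the 2010-era version shown in the transcript.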

Friday, February 12, 2010

Bulk Data Transfer to/from Rdbhost

A recent server upgrade added the ability to upload and download data in bulk.

Downloading works very similarly to pg_dump, providing any of the three formats that pg_dump supports:

a) plaintext
b) tar file
c) compressed (custom) format

Uploading works like psql for plaintext, and like pg_restore for the other two formats. In fact, the server feeds the input to those utilities behind the scenes, so the results you get are exactly the same as if you used the utility directly. Before uploading, drop any existing tables that appear in the upload; the upload does not drop them for you.
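For reference, the three formats correspond to standard pg_dump format flags. A sketch of the local equivalents, with the matching restore tool for each; the host, role, and database names here are placeholders to adapt:

```shell
# Placeholders: substitute your own host, role, and database names.
pg_dump     -h localhost -U myrole mydb > dump.sql     # a) plaintext
pg_dump -Ft -h localhost -U myrole mydb > dump.tar     # b) tar file
pg_dump -Fc -h localhost -U myrole mydb > dump.dump    # c) compressed (custom)

# Plaintext restores via psql; the other two formats via pg_restore.
psql -h localhost -U myrole mydb < dump.sql
pg_restore -h localhost -U myrole -d mydb dump.tar
pg_restore -h localhost -U myrole -d mydb dump.dump
```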

The bulk transfer page is linked from the profile page, after logging in.

The upload streams the data to PostgreSQL, so processing starts as soon as the upload starts; if you cancel the upload partway through (by closing the browser window, for example), you will likely leave the database in an incomplete state. Remember to re-clean the database before retrying the upload.