Saturday, February 13, 2010

SQLAlchemy and the Stackoverflow Data Dump


We recently wrote an rdbhost module for SQLAlchemy (SQLA), allowing you to use Rdbhost databases through SQLAlchemy. It was mentioned in this blog about a month ago, here.

In the last couple of days I uploaded the February Stackoverflow data dump to our SO database here (account 767), and placed an announcement on meta.SO, including a mention of the DB API module. If you are an SQLAlchemy fan, and wish to explore the SO data via SQLAlchemy, this post will help get you started.

You do need SQLAlchemy installed, and it is installable from easy_install, or manually from a downloaded copy. You need the rdbhost-module, as well, placed in the databases folder within your SQLAlchemy install. SQLA will find it there.

Example:
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
IDLE 1.2.4  
>>> from sqlalchemy import *
>>> role = 'r0000000767'
>>> authcode = '-'
>>> dsn = 'rdbhost://'+role+':'+authcode+'@www.rdbhost.com'
>>> db = create_engine(dsn)
>>> meta = MetaData(db)
>>> meta.reflect(schema='so')
... elided warnings from sqlalchemy about partial indexes ...
>>> for t in meta.tables:
....  print t

so.posts
so.badges
so.tags
so.users
so.votes
so.comments
so.tagging
>>> posts = meta.tables['so.posts']
>>> print posts.columns
['posts.id', 'posts.type', 'posts.parentid', 'posts.title', 
'posts.creation', 'posts.owner', 'posts.accepted_answer', 
'posts.score', 'posts.viewcount', 'posts.body',  
'posts.lasteditor', 'posts.lasteditorname', 
'posts.lasteditdate', 'posts.lastactivitydate', 
'posts.communityowneddate', 'posts.closeddate', 
'posts.tags', 'posts.answercount',  
'posts.commentcount', 'posts.favoritecount']
>>> from sqlalchemy import select
>>> users = meta.tables['so.users']
>>> s = select([users],users.c.reputation > 100000,limit=50)
>>> recs = db.execute(s)
>>> for user in recs:
.... print user.name, user.reputation

Marc Gravell 109683
Jon Skeet 134930
>>>

No comments:

Post a Comment