Inserting Data Into Tables

After having created and set up tables for an experiment they’re ready to have data loaded into them. In this example, I will load ascii files that have already been formatted into a structure that matches the tables to which I’m loading. I’ve again embedded the procedure in a python script, so I can load multiple ascii files (pertaining, say, to multiple experimental conditions and runs) pretty easily.

#!/usr/bin/python

import commands
import os
import sys
import time
import MySQLdb

subjs = ['6']
conditions = ['classaction','classobject','veraction','verobject']
hemis = ['lh','rh']
runs = ['1','2','3','4','5','6']

try:
    connection = MySQLdb.connect(read_default_file="~/.my.cnf",db="yourdb" )
except MySQLdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit (1)

cursor = connection.cursor()

def insertdata_to_db():
    print "que hora es?\n"+time.ctime()
    try:
        file = "'/disks/gpfs/fmri/results/s0"+ss+"/SUMA/surface.data/s0"+ss+"."+h+"."+cc+"BT.run"+rr+".readyfordb.txt'"
        print "File loading: "+file
        insert_statement = "load data local infile "+file+" into table vertexdata fields terminated by ' ';"
        print "Insert statement: "+insert_statement
        cursor.execute(insert_statement)
        print "Number of rows inserted: %d" % cursor.rowcount
    except MySQLdb.Error, e:
        print "Error during s0"+ss+"_"+cc+"_"+h+" run "+rr+" %d -->> %s" % (e.args[0], e.args[1])
        sys.exit (1)

for ss in subjs:
    for cc in conditions:
        for rr in runs:
            for h in hemis:
                insertdata_to_db()

The script again uses the MySQLdb module. I’ve created a function called insertdata_to_db that uses this sql as an insert statement

load data local infile filename into table vertexdata fields terminated by ' ';

because I’m loading a file from the local fileshare, and the columns in this ascii file are separated by a space, denoted in the sql with fields terminated by ' '. The function also looks for exceptions and boots out if there is an error while loading. I then run this function within nested loops to iterate over subjs, conditions, runs, and hemis.

Creating Tables for your Experiment

[This assumes you've already set up yourself with a relational DBMS (MySQL and PostgreSQL are usually winners) and have access to create tables in your database. FYI: I'm using MySQL.]

Setting up a DBMS requires quite a bit of thought and foresight. The flexibility available is very broad, so the real constraints are more with the user rather than the infrastructure. Some questions to consider:

  • At what stage of processing do I want to put the data into tables (e.g. raw time series, motion-corrected time series, “cleaned” time series, just the HRFs for conditions)?
  • What information do I want to index (e.g. ROI identifiers, voxels, subject identifiers)?
  • What meta data do I want to hold?
  • Do I want primary key and foreign key usage?
  • What sort of relationships do I want between tables?
  • For first steps, it might be best to start simple and let the complexities emerge and be handled as more experience with the infrastructural tools is gained.

    For actually creating the tables in my database, I’ve found MySQL Administrator (part of the MySQL GUI Tools) to be extremely straightforward and helpful. Of course, procedures for creating tables can also be scripted and make use of simple SQL statements such as:

    create table afni_db_test (x INT, INDEX(x), y INT, INDEX(y), z INT, INDEX(z), consistent float, filler float, inconsistent float);
    

    In this line, the table created is called afni_db_test, columns are x, y, z, filler, and inconsistent, with x, y, and z being indices.
    And here is that statment embedded in a python script (using the MySQLdb module):

    #!/usr/bin/python
    
    import os
    import sys
    import commands
    import MySQLdb
    
    _user='you'
    _host='yourhost'
    _passwd='yourpassword'
    _db='test'
    
    try:
        connection = MySQLdb.connect(host=_host,user=_user,db=_db,passwd=_passwd)
    except MySQLdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
        sys.exit (1)
    
    cursor = connection.cursor()
    
    create_table_statement = 'create table afni_db_test (x INT, INDEX(x), y INT, INDEX(y), z INT, INDEX(z), consistent float, filler float, inconsistent float);'
    
    cursor.execute(create_table_statement)
    

    Choosing the type of storage engine you would like to use is also something to consider and dependent on what you’d like to do with your data. By default, MySQL uses MyISAM, but InnoDB is also popular. The InnoDB engine supports primary and foreign key constraints, while the MyISAM engine does not. Changing the type of storage engine is very easy in the MySQL Administrator GUI. Just go to the ‘Catalogs’ tab, select your Schemata and you get a list of the tables in your database. Then, double-click the table you want and its schema will open in a new window. There, just choose the ‘Table Options’ tab and in the drop-down menu you can select a new storage engine (click for bigger image):

    More about storage engines available with MySQL can be found here.

    Simplified Data Visualization #1

    This is a simple Python script that uses the MySQLdb module and AFNI’s DriveSuma program for querying a database and taking jpeg snapshots to get an immediate representation of surface data. This particular code iterates over a set of subjects – established here via an environment variable called “LV_SS_LIST” that is a list of subject number identifiers for a particular group of subjects that were given the same experimental stimuli. The query:
    select emblemfemlh.voxel, round(speechT,4) from emblemfemlh where speechT > 1.796 and subject = "+ss+";"
    selects the voxel and a T value for the speech condition “speechT” with the parameters that those T values are greater than a threshold (1.796) for each subject.

    #!/usr/bin/python
    # this is to get query output onto SUMA brains
    
    import commands
    import os
    import sys
    import MySQLdb
    
    subjs = os.getenv("LV_SS_LIST").split()
    
    os.system("suma -niml &" )
    os.system("DriveSuma -com show_surf -label mamba -i_fs $SUBJECTS_DIR/HORRY/SUMA/HORRY.lh.mesh140_std.smoothwm.asc -com viewer_cont -load_view $gpfsemb/sweetview.niml.vvs -key ctrl+right -key R" )
    
    #--------------------------- connection mechanism, query and creation of tmp file ---------------
    try:
            connection = MySQLdb.connect(db = "yourdb",read_default_file="~/.my.cnf" )
    except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
            sys.exit (1)
    
    for ss in subjs:
            id = commands.getoutput("echo $$" )
            outfile = "/tmp/outfile."+id+".threshd"+ss+".1D"
            query = "select emblemfemlh.voxel, round(speechT,4) from emblemfemlh where speechT > 1.796 and subject = "+ss+";"
            cursor = connection.cursor()
            cursor.execute(query)
            result = list(cursor.fetchall())
            bully = ''
            for rr in list(result):
                    bully += (str(rr[0])+" "+str(rr[1]))+"\n"
    
            file = open(outfile,"w" )
            file.write(bully)
            file.close()
            os.system("DriveSuma -com surf_cont -load_dset "+outfile+" -com surf_cont -1_only y -I_sb 1 -I_range 10 -switch_cmap afni_p11 -com recorder_cont -save_as "+outfile+".jpg" )
    
    #---------------------------- kill suma below ----------------
    os.system("DriveSuma -com kill_suma" )
    

    MySQL for Python

    MySQLdb is the Python DB API-2.0 interface. This module has proven very useful for coding db queries into procedures in Python (my preferred coding language). Below is a simple procedure using this module that performs a query and writes the result to a file, where it can be used for analysis in R:

    #!/usr/bin/python
    
    import commands
    import os
    import sys
    import MySQLdb
    
    try:
        connection = MySQLdb.connect(db="yourdb",read_default_file="~/.my.cnf" )
    except MySQLdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
        sys.exist (1)
    
    query = "select subject, vertex, speech_lag, emblem_lag, embspeech_lag from ccf_phase2_lh where seed_region = 'IDEAL' and vertex between 1 and 5;"
    
    outfile = os.getcwd()+"/test_aov.lh.txt"
    cursor = connection.cursor()
    cursor.execute(query)
    result = list(cursor.fetchall())
    bully = ''
    for rr in list(result):
        bully += str(rr[0])+" "+str(rr[1])+" "+str(rr[2])+" "+str(rr[3])+" "+str(rr[4])+"\n"
    
    file = open(outfile,'w' )
    file.write(bully)
    file.close()
    

    MySQLdb can be found here.

    Posted in DBMS, Python. Tags: , . 1 Comment »