MyISAM or InnoDB? (an ongoing question)

There seems to be posts like this that pop up weekly. Choosing the “right” storage engine is a basic first step in setting up the experimental workflow environment to operate in. Each has pros and cons, but it always comes back to what you want/expect to be able to do with your data.

What should I use MYISAM or INNODB?

Posted in DBMS. Tags: . Leave a Comment »

Drizzle

WTF? This may be applicable once it gets off the ground. A stripped down version of MySQL (that will also have the latest InnoDB architecture). Leaner and meaner options are always tempting.

Drizzle: Lightweight SQL Database for Cloud and Web

Posted in DBMS. Tags: . Leave a Comment »

Inserting Data Into Tables

After having created and set up tables for an experiment they’re ready to have data loaded into them. In this example, I will load ascii files that have already been formatted into a structure that matches the tables to which I’m loading. I’ve again embedded the procedure in a python script, so I can load multiple ascii files (pertaining, say, to multiple experimental conditions and runs) pretty easily.

#!/usr/bin/python

import commands
import os
import sys
import time
import MySQLdb

subjs = ['6']
conditions = ['classaction','classobject','veraction','verobject']
hemis = ['lh','rh']
runs = ['1','2','3','4','5','6']

try:
    connection = MySQLdb.connect(read_default_file="~/.my.cnf",db="yourdb" )
except MySQLdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit (1)

cursor = connection.cursor()

def insertdata_to_db():
    print "que hora es?\n"+time.ctime()
    try:
        file = "'/disks/gpfs/fmri/results/s0"+ss+"/SUMA/surface.data/s0"+ss+"."+h+"."+cc+"BT.run"+rr+".readyfordb.txt'"
        print "File loading: "+file
        insert_statement = "load data local infile "+file+" into table vertexdata fields terminated by ' ';"
        print "Insert statement: "+insert_statement
        cursor.execute(insert_statement)
        print "Number of rows inserted: %d" % cursor.rowcount
    except MySQLdb.Error, e:
        print "Error during s0"+ss+"_"+cc+"_"+h+" run "+rr+" %d -->> %s" % (e.args[0], e.args[1])
        sys.exit (1)

for ss in subjs:
    for cc in conditions:
        for rr in runs:
            for h in hemis:
                insertdata_to_db()

The script again uses the MySQLdb module. I’ve created a function called insertdata_to_db that uses this sql as an insert statement

load data local infile filename into table vertexdata fields terminated by ' ';

because I’m loading a file from the local fileshare, and the columns in this ascii file are separated by a space, denoted in the sql with fields terminated by ' '. The function also looks for exceptions and boots out if there is an error while loading. I then run this function within nested loops to iterate over subjs, conditions, runs, and hemis.

Creating Tables for your Experiment

[This assumes you've already set up yourself with a relational DBMS (MySQL and PostgreSQL are usually winners) and have access to create tables in your database. FYI: I'm using MySQL.]

Setting up a DBMS requires quite a bit of thought and foresight. The flexibility available is very broad, so the real constraints are more with the user rather than the infrastructure. Some questions to consider:

  • At what stage of processing do I want to put the data into tables (e.g. raw time series, motion-corrected time series, “cleaned” time series, just the HRFs for conditions)?
  • What information do I want to index (e.g. ROI identifiers, voxels, subject identifiers)?
  • What meta data do I want to hold?
  • Do I want primary key and foreign key usage?
  • What sort of relationships do I want between tables?
  • For first steps, it might be best to start simple and let the complexities emerge and be handled as more experience with the infrastructural tools is gained.

    For actually creating the tables in my database, I’ve found MySQL Administrator (part of the MySQL GUI Tools) to be extremely straightforward and helpful. Of course, procedures for creating tables can also be scripted and make use of simple SQL statements such as:

    create table afni_db_test (x INT, INDEX(x), y INT, INDEX(y), z INT, INDEX(z), consistent float, filler float, inconsistent float);
    

    In this line, the table created is called afni_db_test, columns are x, y, z, filler, and inconsistent, with x, y, and z being indices.
    And here is that statment embedded in a python script (using the MySQLdb module):

    #!/usr/bin/python
    
    import os
    import sys
    import commands
    import MySQLdb
    
    _user='you'
    _host='yourhost'
    _passwd='yourpassword'
    _db='test'
    
    try:
        connection = MySQLdb.connect(host=_host,user=_user,db=_db,passwd=_passwd)
    except MySQLdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
        sys.exit (1)
    
    cursor = connection.cursor()
    
    create_table_statement = 'create table afni_db_test (x INT, INDEX(x), y INT, INDEX(y), z INT, INDEX(z), consistent float, filler float, inconsistent float);'
    
    cursor.execute(create_table_statement)
    

    Choosing the type of storage engine you would like to use is also something to consider and dependent on what you’d like to do with your data. By default, MySQL uses MyISAM, but InnoDB is also popular. The InnoDB engine supports primary and foreign key constraints, while the MyISAM engine does not. Changing the type of storage engine is very easy in the MySQL Administrator GUI. Just go to the ‘Catalogs’ tab, select your Schemata and you get a list of the tables in your database. Then, double-click the table you want and its schema will open in a new window. There, just choose the ‘Table Options’ tab and in the drop-down menu you can select a new storage engine (click for bigger image):

    More about storage engines available with MySQL can be found here.

    The groundwork

    Posted in DBMS. Tags: . Leave a Comment »