Announcing: CorpusDB

Available at https://github.com/kitefishlabs/CorpusDB. This is the Python implementation, intended to mirror CBPSC as much as possible. This software is under heavy development. Full documentation and lots of examples coming soon!


Requirements

  • Python - I use version 2.7.2, bundled with Mac OS X 10.8.
  • NumPy, IPython, Matplotlib - use the Superpack.
  • sc-0.3.1 - SuperCollider lib; not strictly necessary, but handy to have.
  • jsonpickle-0.4.0 - JSON lib.
  • Bregman Toolkit (http://bregman.dartmouth.edu/bregman) - I am using its imagesc function for the time being.
  • SuperCollider - the scsynth synthesis engine must be installed.
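
To check that the Python-side dependencies listed above are importable before installing, something like the following sketch may help. The module names in MODULES are assumptions inferred from the package names (in particular "sc" and "bregman"), and check_modules is a hypothetical helper, not part of CorpusDB. It does not check for scsynth, which is a separate application.

```python
import importlib

# Assumed import names for the packages listed above; adjust if yours differ.
MODULES = ["numpy", "IPython", "matplotlib", "jsonpickle", "sc", "bregman"]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Print one line per module so missing packages are easy to spot.
    for name, ok in sorted(check_modules(MODULES).items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```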

Installation and use

  • cd into the appropriate directories and run (sudo) python setup.py install
  • in Python: from corpusdb import *
  • look at the examples folder for some example corpora and tasks

The gist...

import os
from corpusdb import *  # as in the installation steps above

anchorpath = os.path.expanduser("~/.../corpusdb/examples/1_simple_analysis")

corpus = corpusdb.CorpusDB(anchorpath)

# create the full path and add the sound file
f = os.path.join(anchorpath, 'snd', '24940__vexst__amen.wav')
node = corpus.add_sound_file(filename=os.path.basename(f), tratio=1)
# a Node object is returned, in this case a Sampler Node, which has a sound file ID
sfid = node.sfid
# pass the file name, the ID, and the transposition ratio to the analysis function
corpus.analyze_sound_file(os.path.basename(f), sfid, tratio=1)
powers, mfccs = corpus.get_raw_metadata(sfid)
cut_list = findCutPoints(mfccs, powers)

# each cut point is a frame index; multiplying by 0.04 converts it to an onset and a duration
for i, frame in enumerate(cut_list[:-1]):
	corpus.add_sound_file_unit(sfid, onset=(frame*0.04), dur=((cut_list[i+1]-frame)*0.04))

# segment

# finalize internal data structures

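Note that the findCutPoints function is not shown. It should take the raw MFCC frame data (and, as called above, the power values) and return a list of points where segments begin. In the code above those points are frame indices, which the loop multiplies by 0.04 to get each unit's onset and duration; the last entry only marks the end of the final segment.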
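
For illustration only, here is a minimal sketch of what such a function could look like. It is not the segmentation CorpusDB uses: it simply marks a boundary wherever consecutive MFCC frames differ by more than a threshold. It assumes mfccs has shape (n_frames, n_coeffs) and powers is a non-negative, linear-scale array of length n_frames; the parameters k, min_gap and power_floor are hypothetical.

```python
import numpy as np


def findCutPoints(mfccs, powers, k=1.0, min_gap=5, power_floor=0.01):
    """Illustrative stand-in: return frame indices where segments begin.

    Assumes mfccs is (n_frames, n_coeffs) and powers is (n_frames,) with
    linear, non-negative values. The final entry is n_frames, so that the
    last segment has an end point.
    """
    mfccs = np.asarray(mfccs, dtype=float)
    powers = np.asarray(powers, dtype=float)
    n_frames = mfccs.shape[0]
    if n_frames < 2:
        return [0, n_frames]

    # Novelty: Euclidean distance between each pair of consecutive frames.
    novelty = np.linalg.norm(np.diff(mfccs, axis=0), axis=1)
    # Only accept a boundary if the frame it starts on is not near-silent.
    active = powers[1:] >= power_floor * powers.max()
    threshold = novelty.mean() + k * novelty.std()

    cuts = [0]
    for i in np.flatnonzero((novelty > threshold) & active):
        frame = int(i) + 1  # novelty[i] compares frame i with frame i + 1
        if frame - cuts[-1] >= min_gap:  # enforce a minimum segment length
            cuts.append(frame)
    cuts.append(n_frames)
    return cuts
```

With a result shaped like this, the loop in the gist can consume cut_list unchanged: each entry except the last is a segment start, and the following entry gives its end.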