Announcing: CorpusDB

Corpus-based Processing for Python/SuperCollider

Available at:

This is the Python implementation, intended to mirror CBPSC as closely as possible. This software is under heavy development. Full documentation and lots of examples coming soon!

Requirements

  • Python - I use version 2.7.2, bundled with OS X 10.8.x.

  • NumPy, IPython, Matplotlib - use the Superpack.

  • sc-0.3.1 - SuperCollider library for Python; not strictly necessary, but handy to have.

  • jsonpickle-0.4.0 - JSON lib.

  • Bregman Toolkit - I'm using its imagesc function for the time being.

  • SuperCollider - the scsynth synthesis engine must be installed.
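A quick way to confirm that the requirements above are in place is a small import check. This is a sketch, not part of CorpusDB: the import names `sc` (for sc-0.3.1) and `bregman` (for the Bregman Toolkit) are assumptions, and scsynth is only looked up on PATH. The code avoids Python 3-only features so it also runs under 2.7.

```python
import importlib
import os

# Import names are assumptions based on the package list above.
REQUIRED_MODULES = ["numpy", "IPython", "matplotlib", "jsonpickle", "bregman"]
OPTIONAL_MODULES = ["sc"]  # sc-0.3.1 is "not strictly necessary"


def module_available(name):
    """Return True if the module can be imported."""
    try:
        importlib.import_module(name)
        return True
    except Exception:  # ImportError, or e.g. SyntaxError from an incompatible package
        return False


def on_path(program):
    """Return True if an executable called `program` is found on PATH."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False


def check_dependencies():
    """Map each dependency name to True (found) or False (missing)."""
    report = {}
    for name in REQUIRED_MODULES + OPTIONAL_MODULES:
        report[name] = module_available(name)
    report["scsynth"] = on_path("scsynth")
    return report


if __name__ == "__main__":
    for name, ok in sorted(check_dependencies().items()):
        print("%-12s %s" % (name, "ok" if ok else "MISSING"))
```

On a Mac, scsynth may live inside the SuperCollider application bundle rather than on PATH, so a MISSING result for it is not conclusive.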

Installation and use

  • cd into the appropriate directories and run (sudo) python setup.py install

  • from corpusdb import *

  • look at the examples folder for some example corpora and tasks

The gist...

```python
import os
import corpusdb
from corpusdb import *

anchorpath = os.path.expanduser("~/.../corpusdb/examples/1_simple_analysis")
corpus = corpusdb.CorpusDB(anchorpath)

# create the full path and add the sound file
f = os.path.join(anchorpath, 'snd', '24940__vexst__amen.wav')
node = corpus.add_sound_file(filename=os.path.basename(f), tratio=1)

# a Node object is returned, in this case a Sampler Node, which has a sound file ID
sfid = node.sfid

# pass the sound file's name, its ID, and the transposition ratio to the analysis function
corpus.analyze_sound_file(os.path.basename(f), sfid, tratio=1)
powers, mfccs = corpus.get_raw_metadata(sfid)

# segment: findCutPoints (not shown) returns the frame indices of segment boundaries
cut_list = findCutPoints(mfccs, powers)

for i, frame in enumerate(cut_list[:-1]):
    # multiply frame indices by 0.04 (seconds per analysis frame) to get times in seconds
    soundfileunit(sfid, onset=(frame * 0.04), dur=((cut_list[i + 1] - frame) * 0.04))

# finalize internal data structures
```
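The segmentation loop turns consecutive cut points into onset/duration pairs. The helper below isolates that arithmetic as a standalone function so it can be checked on its own; the name cut_list_to_segments is hypothetical, and the 0.04 s frame period is read off the loop rather than taken from any documented CorpusDB constant.

```python
FRAME_PERIOD = 0.04  # seconds per analysis frame, as used in the segmentation loop


def cut_list_to_segments(cut_list, frame_period=FRAME_PERIOD):
    """Convert segment-boundary frame indices into (onset, dur) pairs in seconds.

    The last entry of cut_list only marks the end of the final segment,
    so n cut points yield n - 1 segments.
    """
    segments = []
    for start, end in zip(cut_list[:-1], cut_list[1:]):
        segments.append((start * frame_period, (end - start) * frame_period))
    return segments
```

Each pair maps directly onto the loop's arguments: for onset, dur in cut_list_to_segments(cut_list), call soundfileunit(sfid, onset=onset, dur=dur).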

Note that the findCutPoints function is not shown. It should take the raw MFCC and power frame data and return a list of frame indices where segments begin, with a final index marking the end of the last segment (the loop converts these indices to seconds).
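Since findCutPoints is left to the reader, here is one possible sketch. It is not the algorithm CorpusDB uses: it marks a cut wherever the Euclidean distance between consecutive MFCC frames exceeds a threshold, ignores frames whose power falls below a floor, and enforces a minimum segment length. The parameters threshold, min_power, and min_gap (and their defaults) are assumptions. Following the loop's conventions, it returns frame indices and appends the total frame count as a closing boundary.

```python
import math


def findCutPoints(mfccs, powers, threshold=1.0, min_power=0.0, min_gap=2):
    """Hypothetical cut-point detector based on frame-to-frame MFCC change.

    mfccs:  sequence of MFCC frames, one row of coefficients per frame (assumed layout)
    powers: one power value per frame
    Returns the frame indices where segments begin, followed by len(mfccs)
    as the end boundary of the last segment.
    """
    n = len(mfccs)
    if n == 0:
        return []
    cuts = [0]  # the first segment always starts at frame 0
    for i in range(1, n):
        # spectral change between this frame and the previous one
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mfccs[i], mfccs[i - 1])))
        loud_enough = powers[i] >= min_power
        long_enough = (i - cuts[-1]) >= min_gap  # keep segments at least min_gap frames long
        if dist > threshold and loud_enough and long_enough:
            cuts.append(i)
    cuts.append(n)  # closing boundary, so the final segment gets a duration
    return cuts
```

If get_raw_metadata returns one row per frame, this can be dropped into the gist unchanged; the threshold would need tuning to the scale of the MFCC values.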