CorpusDB: Corpus-based Processing for Python/SuperCollider

This is the Python implementation, intended to mirror CBPSC as closely as possible.


  • Python - I use version 2.7.2, bundled with Mac OS 10.8.
  • NumPy, IPython, Matplotlib - use the Superpack.
  • sc-0.3.1 - SuperCollider lib, not strictly necessary, but handy to have.
  • jsonpickle-0.4.0 - JSON lib.
  • Bregman Toolkit - using its imagesc function for the time being.
  • SuperCollider - the scsynth synthesis engine must be installed.
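
A quick way to see which of the Python packages above are importable is sketched below. The import names (`numpy`, `IPython`, `matplotlib`, `jsonpickle`, `bregman`, `sc`) are assumptions about how each package exposes itself, and `missing_modules` is a hypothetical helper, not part of CorpusDB. It does not check for scsynth.

```python
import importlib

# Assumed import names for the packages listed above; adjust if yours differ.
REQUIRED = ["numpy", "IPython", "matplotlib", "jsonpickle", "bregman"]
OPTIONAL = ["sc"]  # described above as handy but not strictly necessary


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing required: %s" % missing_modules(REQUIRED))
    print("missing optional: %s" % missing_modules(OPTIONAL))
```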

Installation and use

  • cd into the appropriate directories and run (sudo) python setup.py install
  • from corpusdb import *
  • look at the examples folder for some example corpora and tasks

The gist...

```python
import os
import corpusdb

anchorpath = os.path.expanduser("~/.../corpusdb/examples/1_simple_analysis")
corpus = corpusdb.CorpusDB(anchorpath)

# build the full path to the sound file, then add the file by name
f = os.path.join(anchorpath, 'snd', '24940__vexst__amen.wav')
node = corpus.add_sound_file(filename=os.path.basename(f), tratio=1)

# a Node object is returned, in this case a Sampler Node, which has a sound file ID
sfid = node.sfid

# pass the file name, the ID, and the transposition ratio to the analysis function
corpus.analyze_sound_file(os.path.basename(f), sfid, tratio=1)
powers, mfccs = corpus.get_raw_metadata(sfid)

# segment: add one unit per pair of consecutive cut points
cut_list = findCutPoints(mfccs, powers)
for i, frame in enumerate(cut_list[:-1]):
    corpus.add_sound_file_unit(sfid, onset=(frame * 0.04), dur=((cut_list[i + 1] - frame) * 0.04))

# finalize internal data structures
```

Note that the findCutPoints function is not shown. It stands for any algorithm that takes the raw MFCC and power frame data and returns a list of points where segments begin (here, frame indices, which the example scales by 0.04 to get onsets and durations).
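
For illustration only, here is one way such a function could look. This is a minimal sketch and not the algorithm used by CorpusDB: the name `find_cut_points`, the parameters `k` and `min_gap`, and the input layout (one MFCC vector and one power value per frame) are all assumptions. It marks a cut wherever the spectral change between consecutive frames is unusually large and the power is rising.

```python
import math


def find_cut_points(mfccs, powers, k=1.5, min_gap=5):
    """Hypothetical stand-in for findCutPoints.

    mfccs  : sequence of per-frame MFCC vectors (assumed layout)
    powers : sequence of per-frame power values (assumed layout)
    Returns sorted frame indices, starting at 0 and ending at the frame count.
    """
    n = len(mfccs)
    if n == 0:
        return []
    # Spectral change: Euclidean distance between consecutive MFCC vectors.
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(mfccs[i], mfccs[i - 1])))
        for i in range(1, n)
    ]
    cuts = [0]
    if dists:
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        threshold = mean + k * std
        for i, d in enumerate(dists, start=1):
            rising = powers[i] > powers[i - 1]
            # Cut on a large spectral jump with rising power, at least
            # min_gap frames after the previous cut.
            if d > threshold and rising and i - cuts[-1] >= min_gap:
                cuts.append(i)
    # Append the frame count so the last segment has an end point.
    cuts.append(n)
    return cuts


# Toy input: 10 quiet frames followed by 10 louder, spectrally different frames.
toy_mfccs = [[0.0, 0.0]] * 10 + [[5.0, 5.0]] * 10
toy_powers = [0.1] * 10 + [1.0] * 10
print(find_cut_points(toy_mfccs, toy_powers))  # prints [0, 10, 20]
```

Because the returned list ends with the total frame count, iterating over all but the last entry, as the example above does, gives every segment both an onset and a duration.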

Copyright (C) 2019, Thomas Stoll, Kitefish Labs