A few notes on CBPSC

Structure, conventions, etc.

This tutorial covers a few important but somewhat obscure aspects of CBPSC: a suggested layout for project folders, the structure of the database in CorpusDB, corpora vs. sub-corpora, and some other conventions used throughout the project.

Creating folders to organize a project

This is an example of what my directory structure might look like after adding some sound files and XML files:

    /../../myCorpusName/
    /../../myCorpusName/cbpsc.gui.rtf
    ...
    /myCorpusName/snd/
    /myCorpusName/snd/soundfile_dir/soundfile1
    /myCorpusName/snd/soundfile_dir/soundfile2
    ...
    /myCorpusName/snd/another_soundfile_dir/soundfileA
    /myCorpusName/snd/another_soundfile_dir/soundfileB
    ...
    /myCorpusName/xml/
    /myCorpusName/xml/my_subcorpus.xml
    ...

You may come up with a better structure; there should be no restriction on where you put files. CorpusDB's analysis functions will write .md.aiff files to a directory named 'md' (creating the folder if necessary) alongside any sound file that you analyze.
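As a rough illustration of the layout above, here is a minimal Python sketch that creates the snd/ and xml/ folders and works out where the 'md' directory for a given sound file would be. It is not part of CBPSC: the helper names make_corpus_layout and md_dir_for are hypothetical, and the sketch assumes that "alongside" means "in the same directory as the sound file".

```python
from pathlib import Path
import tempfile


def make_corpus_layout(root, sound_dirs):
    """Create root/snd/<name>/ for each name in sound_dirs, plus root/xml/."""
    root = Path(root)
    for name in sound_dirs:
        (root / "snd" / name).mkdir(parents=True, exist_ok=True)
    (root / "xml").mkdir(parents=True, exist_ok=True)
    return root


def md_dir_for(sound_file):
    """Return the 'md' directory next to a sound file.

    Assumption: 'alongside' is read as 'in the sound file's own directory'.
    """
    return Path(sound_file).parent / "md"


if __name__ == "__main__":
    # Build the example layout in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = make_corpus_layout(
            Path(tmp) / "myCorpusName",
            ["soundfile_dir", "another_soundfile_dir"],
        )
        print(md_dir_for(root / "snd" / "soundfile_dir" / "soundfile1"))
```

Keeping each group of sounds in its own folder under snd/ means each group also gets its own 'md' folder, so analysis files for different groups do not mix.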

Corpora versus sub-corpora

There are many ways to do this. I have tried one and used it fairly extensively. In the list of descriptors you will notice one called sfgrpID, or "sound file group id". This descriptor can be used (and is used by default within the import function) to separate groups of sound files. For example, you might have metal sounds and wood sounds that you wish to distinguish as two groups in CBPSC.

If you create two separate XML files, say one for all the wood sounds and one for all the metal sounds, you can import both into an empty corpus and have both groups in one corpus, each tagged with a number corresponding to the "batch" in which it was imported. Simply put, every time you import, the sound file group ID is incremented, and this feature can be used to group sound files within a corpus.
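The batch-numbering idea can be sketched with a toy model in Python. This is not CorpusDB's code or API: the class ToyCorpus, its method names, the file names, and the starting ID of 0 are all assumptions made for illustration. It only shows one group ID being shared by everything in an import and incremented between imports.

```python
class ToyCorpus:
    """Toy stand-in for a corpus that tags each import batch with a group ID."""

    def __init__(self):
        self.next_group_id = 0  # assumption: the real starting value may differ
        self.entries = []       # list of (sound_file_name, sfgrp_id) pairs

    def import_batch(self, sound_files):
        """Tag every file in this batch with the same ID, then increment it."""
        group_id = self.next_group_id
        for name in sound_files:
            self.entries.append((name, group_id))
        self.next_group_id += 1
        return group_id

    def files_in_group(self, group_id):
        """Return the names of all files tagged with the given group ID."""
        return [name for name, gid in self.entries if gid == group_id]


if __name__ == "__main__":
    corpus = ToyCorpus()
    wood = corpus.import_batch(["wood1.aiff", "wood2.aiff"])  # hypothetical files
    metal = corpus.import_batch(["metal1.aiff"])              # hypothetical file
    print(wood, corpus.files_in_group(wood))
    print(metal, corpus.files_in_group(metal))
```

In this model the order of imports determines the IDs, so it is worth noting which batch was imported first if you plan to select groups by number later.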