User’s Guide, Chapter 53: Advanced Corpus and Metadata Searching

from music21 import *

Creating multiple corpus repositories via local corpora

In addition to the default local corpus, music21 allows users to create and save as many named local corpora as they like, which will persist from session to session.

Let’s create a new local corpus, give it a directory to find music files in, and then save it:

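aNewLocalCorpus = corpus.corpora.LocalCorpus('A new corpus')
aNewLocalCorpus.addPath('~/Desktop')  # example path; use any directory containing music files
aNewLocalCorpus.save()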

We can see that our new local corpus is saved by listing the names of all saved local corpora with corpus.manager.listLocalCorporaNames():

corpus.manager.listLocalCorporaNames()
 [None, 'trecento', 'A new corpus', 'bach', 'fake']


When running listLocalCorporaNames(), you will see None - indicating the default local corpus - along with the names of any non-default local corpora you have created. In the above example, a number of other corpora have already been created.

In Python2, take care to make all of these “unicode” entries.

Finally, we can delete the local corpus we previously created like this:

aNewLocalCorpus.delete()

Inspecting metadata bundle search results

Let’s take a closer look at some search results:

bachBundle = corpus.corpora.CoreCorpus().search('bach', 'composer')
bachBundle
 <music21.metadata.bundles.MetadataBundle {22 entries}>
bachBundle[0]
 <music21.metadata.bundles.MetadataEntry: bach_choraleAnalyses_riemenschneider001_rntxt>
bachBundle[0].metadataPayload
 <music21.metadata.RichMetadata at 0x10ee73978>
mdpl = bachBundle[0].metadataPayload
bachAnalysis0 = bachBundle[0].parse()

Manipulating multiple metadata bundles

Another useful feature of music21’s metadata bundles is that they can be operated on as though they were sets, allowing you to union, intersect and difference multiple metadata bundles, thereby creating more complex search results:

corelliBundle = corpus.search('corelli', field='composer')
corelliBundle
 <music21.metadata.bundles.MetadataBundle {1 entry}>
bachBundle.union(corelliBundle)
 <music21.metadata.bundles.MetadataBundle {23 entries}>

Consult the API documentation for the MetadataBundle class (music21.metadata.bundles.MetadataBundle) for a more in-depth look at how this works.
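As a rough sketch of how these set-style operations compose, the helper below combines two collections three ways. The name combineBundles is hypothetical (it is not part of music21), and it relies only on union(), intersection() and difference() methods, so it can be tried with ordinary frozensets of invented entry names. Passing MetadataBundle objects instead assumes they expose those same three methods.

```python
def combineBundles(first, second):
    """Combine two set-like collections three ways.

    Works with any objects that provide union(), intersection() and
    difference(); MetadataBundle is assumed to be one such object.
    """
    return {
        'union': first.union(second),
        'intersection': first.intersection(second),
        'difference': first.difference(second),
    }


# Stand-in data: frozensets of invented entry names, not real corpus keys.
bachLike = frozenset({'bach_example_1', 'bach_example_2', 'shared_example'})
corelliLike = frozenset({'corelli_example_1', 'shared_example'})

combined = combineBundles(bachLike, corelliLike)
print(len(combined['union']))            # entries found by either search
print(sorted(combined['intersection']))  # entries found by both searches
print(sorted(combined['difference']))    # entries found only by the first search
```

With music21 loaded, the equivalent call would be combineBundles(bachBundle, corelliBundle), using the two search results from this section.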

Getting a metadata bundle

In music21, metadata is information about a score, such as its composer, title, initial key signature or ambitus. A metadata bundle is a collection of metadata pulled from an arbitrarily large group of different scores. Users can search through metadata bundles to find scores with certain qualities, such as all scores in a given corpus with a time signature of 6/8, or all scores composed by Monteverdi.
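To make the idea concrete, here is a minimal conceptual model in plain Python. It is not how music21 implements metadata bundles: ScoreRecord and searchRecords are hypothetical names, the three records are invented, and the case-insensitive substring match is an assumption of this sketch. It only illustrates the idea of a collection of per-score metadata that can be filtered by field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical stand-in for the metadata of a single score."""
    title: str
    composer: str
    timeSignature: str


def searchRecords(records, query, field):
    """Return the records whose `field` contains `query`, ignoring case."""
    query = query.lower()
    return [r for r in records if query in getattr(r, field).lower()]


# Invented records, standing in for a (much larger) metadata bundle.
records = [
    ScoreRecord('Example madrigal', 'Monteverdi', '4/4'),
    ScoreRecord('Example gigue', 'Bach', '6/8'),
    ScoreRecord('Example chorale', 'Bach', '4/4'),
]

byComposer = searchRecords(records, 'monteverdi', 'composer')
byMeter = searchRecords(records, '6/8', 'timeSignature')
print([r.title for r in byComposer])
print([r.title for r in byMeter])
```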

There are a number of different ways to acquire a metadata bundle. The easiest way to get the metadataBundle for the core corpus is simply to download music21: we include a pre-made metadataBundle (in corpus/metadataCache/core.json), so you do not need to build one for the core corpus unless you’re contributing to the project. But you may want to create metadata bundles for your own local corpora. Access the metadataBundle attribute of any Corpus instance to get its corresponding metadata bundle:

coreCorpus = corpus.corpora.CoreCorpus()
coreCorpus.metadataBundle
 <music21.metadata.bundles.MetadataBundle 'core': {14493 entries}>

Music21 also provides a handful of convenience methods for getting metadata bundles associated with the virtual, local or core corpora:

coreBundle = corpus.corpora.CoreCorpus().metadataBundle
localBundle = corpus.corpora.LocalCorpus().metadataBundle
otherLocalBundle = corpus.corpora.LocalCorpus('blah').metadataBundle
virtualBundle = corpus.corpora.VirtualCorpus().metadataBundle

But really advanced users can also make metadata bundles manually, by passing in the name of the corpus the bundle should refer to or, equivalently, an actual Corpus instance:

coreBundle = metadata.bundles.MetadataBundle('core')
coreBundle = metadata.bundles.MetadataBundle(corpus.corpora.CoreCorpus())

However, you’ll need to read the bundle’s saved data from disk with read() before you can do anything useful with the bundle. Bundles don’t read their associated JSON files automatically when they’re manually instantiated.

coreBundle
 <music21.metadata.bundles.MetadataBundle 'core': {0 entries}>
coreBundle.read()
 <music21.metadata.bundles.MetadataBundle 'core': {14493 entries}>

Creating persistent metadata bundles

Metadata bundles can take a long time to create. So it’d be nice if they could be written to and read from disk. Unfortunately we never got around to...nah, just kidding. Of course you can. Just call .write() on one:

coreBundle = metadata.bundles.MetadataBundle('core')
coreBundle.read()
 <music21.metadata.bundles.MetadataBundle 'core': {14493 entries}>
coreBundle.write()

They can also be completely rebuilt, as you will want to do for local corpora. To add information to a bundle, use the addFromPaths() method:

newBundle = metadata.bundles.MetadataBundle()
paths = corpus.corpora.CoreCorpus().getBachChorales()
failedPaths = newBundle.addFromPaths(paths)

Then call .write() to save the bundle to disk:

newBundle.write()
newBundle
 <music21.metadata.bundles.MetadataBundle {402 entries}>
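For larger collections it can help to see progress and failures as you go. The sketch below is one possible approach, not a music21 feature: addInBatches is a hypothetical helper that calls addFromPaths() on slices of the path list and gathers the returned failures. It assumes, as the failedPaths example above suggests, that addFromPaths() returns a list of the paths that could not be processed. FakeBundle is an invented stand-in so the sketch runs without music21.

```python
def addInBatches(bundle, paths, batchSize=50):
    """Feed paths to bundle.addFromPaths() in batches and collect failures."""
    paths = list(paths)
    allFailed = []
    for start in range(0, len(paths), batchSize):
        batch = paths[start:start + batchSize]
        # Assumption: addFromPaths() returns the paths it failed to process.
        allFailed.extend(bundle.addFromPaths(batch))
        done = min(start + batchSize, len(paths))
        print(f'processed {done}/{len(paths)} paths, {len(allFailed)} failed so far')
    return allFailed


class FakeBundle:
    """Invented stand-in for a MetadataBundle, used only for this demo.

    It "fails" on any path ending in '.bad'; real failures depend on parsing.
    """

    def __init__(self):
        self.added = []

    def addFromPaths(self, paths):
        failed = [p for p in paths if str(p).endswith('.bad')]
        self.added.extend(p for p in paths if p not in failed)
        return failed


fake = FakeBundle()
failed = addInBatches(fake, ['a.xml', 'b.bad', 'c.xml', 'd.xml', 'e.bad'], batchSize=2)
print(failed)
```

With a real bundle you would pass newBundle and the paths list from above in place of the fake objects. Whether batching is worthwhile depends on the size of the corpus, since each call may carry its own start-up cost.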


Building metadata information can be an incredibly intensive process. For example, building the core metadata bundle can easily take as long as four hours, even though the building process uses multiple cores. Please use caution, and be patient, when building metadata bundles from large corpora. To monitor the corpus-building progress, make sure to set ‘debug’ to True in your user settings:

environment.UserSettings()['debug'] = True

You can delete, rebuild and save a metadata bundle in one go with the rebuild() method:

virtualBundle = corpus.corpora.VirtualCorpus().metadataBundle
virtualBundle.rebuild()

The process of rebuilding will store the file as it goes (for safety) so at the end there is no need to call .write().

To delete a metadata bundle’s cached-to-disk JSON file, use the delete() method:

virtualBundle.delete()

Deleting a metadata bundle’s JSON file won’t empty the in-memory contents of that bundle. For that, use clear():

virtualBundle.clear()