Tools for segmenting – that is, dividing up a score into small, possibly overlapping sections – for searching across pieces for similarity.

Speed notes:

This module is definitely a case where running PyPy rather than CPython will give you a 3-5x speedup.

If you really want to do lots of comparisons, the scoreSimilarity method will use pyLevenshtein if it is installed. You will need to compile it by running sudo python setup.py install on Mac or Unix (compilation is much more difficult on Windows; sorry). The ratios it produces are very slightly different from difflib's, but the speedup is between 10 and 100x!
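
As a rough availability check before a big batch run (a minimal sketch; Levenshtein is the module name that the python-Levenshtein package installs), you can test for the fast path up front:

    try:
        import Levenshtein  # installed by the python-Levenshtein package
        print('pyLevenshtein is available; scoreSimilarity will take the fast path')
    except ImportError:
        print('falling back to difflib; expect much slower comparisons')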

Functions

music21.search.segment.getDifflibOrPyLev(seq2=None, junk=None, forceDifflib=False)

Returns either a difflib.SequenceMatcher or pyLevenshtein StringMatcher.StringMatcher object depending on what is installed.

If forceDifflib is True then use difflib even if pyLevenshtein is installed.
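
For instance, the returned object can be used like a difflib.SequenceMatcher, as described above (a minimal sketch; the two segment strings are made-up placeholders):

    from music21.search import segment

    matcher = segment.getDifflibOrPyLev('AAGFEDC')  # seq2: the reference segment
    matcher.set_seq1('AAGFEEC')                     # seq1: the segment to compare
    print(matcher.ratio())                          # similarity between 0.0 and 1.0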

music21.search.segment.indexOnePath(thisFilePath, *args, **kwds)

Index a single path. Called by indexScoreFilePaths.

music21.search.segment.indexScoreFilePaths(scoreFilePaths, giveUpdates=False, *args, **kwds)

Returns a dictionary of the lists from indexScoreParts for each score in scoreFilePaths.

>>> searchResults = corpus.search('bwv19')
>>> fpsNamesOnly = sorted([searchResult.sourcePath
...     for searchResult in searchResults])
>>> len(fpsNamesOnly)
>>> scoreDict = search.segment.indexScoreFilePaths(fpsNamesOnly[2:5])
>>> len(scoreDict['bwv190.7.mxl'])
>>> scoreDict['bwv190.7.mxl'][0]['measureList']
[0, 5, 11, 17, 22, 27]
>>> scoreDict['bwv190.7.mxl'][0]['segmentList'][0]

music21.search.segment.indexScoreParts(scoreFile, *args, **kwds)

Creates segment and measure lists for each part of a score. Returns a list of dictionaries of segment and measure lists.

>>> bach = corpus.parse('bwv66.6')
>>> scoreList = search.segment.indexScoreParts(bach)
>>> scoreList[1]['segmentList'][0]
>>> scoreList[1]['measureList'][0:3]
[0, 4, 8]

music21.search.segment.loadScoreDict(filePath)

Load the scoreDictionary from filePath.

music21.search.segment.saveScoreDict(scoreDict, filePath=None)

Save the score dict from indexScoreFilePaths as a .json file for quick reloading.

Returns the filepath (assumes you’ll probably be using a temporary file).
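
Together with loadScoreDict, this allows a simple save-and-reload round trip (a sketch, reusing the corpus-lookup pattern from the scoreSimilarity example below):

    from music21 import corpus
    from music21.search import segment

    filePaths = [corpus.search('bwv190.7.mxl')[0].sourcePath]
    scoreDict = segment.indexScoreFilePaths(filePaths)
    jsonPath = segment.saveScoreDict(scoreDict)    # filePath=None writes a temporary file
    reloaded = segment.loadScoreDict(jsonPath)
    print(sorted(reloaded) == sorted(scoreDict))   # the same score names after reloading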

music21.search.segment.scoreSimilarity(scoreDict, minimumLength=20, giveUpdates=False, includeReverse=False, forceDifflib=False)

Find the level of similarity between each pair of segments in a scoreDict.

This takes twice as long as it should because it does not cache the pairwise similarity.
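
If that cost matters for a large corpus, one workaround (a sketch, not part of music21; _ratio and cachedRatio are hypothetical helpers) is to memoize the ratio for each unordered pair of segment strings:

    from functools import lru_cache

    from music21.search import segment

    @lru_cache(maxsize=None)
    def _ratio(segA, segB):
        matcher = segment.getDifflibOrPyLev(segB)
        matcher.set_seq1(segA)
        return matcher.ratio()

    def cachedRatio(segA, segB):
        # normalize the order so (a, b) and (b, a) share one cache entry
        if segB < segA:
            segA, segB = segB, segA
        return _ratio(segA, segB)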

>>> filePaths = []
>>> filePaths.append(corpus.search('bwv197.5.mxl')[0].sourcePath)
>>> filePaths.append(corpus.search('bwv190.7.mxl')[0].sourcePath)
>>> filePaths.append(corpus.search('bwv197.10.mxl')[0].sourcePath)
>>> scoreDict = search.segment.indexScoreFilePaths(filePaths)
>>> scoreSim = search.segment.scoreSimilarity(scoreDict)
>>> len(scoreSim)

Returns a list of tuples of first score name, first score voice number, first score measure number, second score name, second score voice number, second score measure number, and similarity score (0 to 1).

>>> for result in scoreSim[64:68]:
...     result
(...'bwv197.5.mxl', 0, 1, 4, ...'bwv197.10.mxl', 3, 1, 4, 0.0)
(...'bwv197.5.mxl', 0, 1, 4, ...'bwv197.10.mxl', 3, 2, 9, 0.0)
(...'bwv197.5.mxl', 0, 2, 9, ...'bwv190.7.mxl', 0, 0, 0, 0.07547...)
(...'bwv197.5.mxl', 0, 2, 9, ...'bwv190.7.mxl', 0, 1, 5, 0.07547...)

music21.search.segment.translateMonophonicPartToSegments(inputStream, segmentLengths=30, overlap=12, algorithm=None)

Translates a monophonic part with measures to a set of segments of length segmentLengths (measured in number of notes) with an overlap of overlap notes (so each new segment begins segmentLengths - overlap notes after the previous one), using a conversion algorithm given by algorithm (default: search.translateStreamToStringNoRhythm). Returns two lists: a list of segments, and a list of measure numbers that match the segments.

If algorithm is None then a default algorithm of search.translateStreamToStringNoRhythm is used.

>>> from music21 import *
>>> luca = corpus.parse('luca/gloria')
>>> lucaCantus = luca.parts[0]
>>> segments, measureLists = search.segment.translateMonophonicPartToSegments(lucaCantus)
>>> segments[0:2]

Segment zero begins at measure 1. Segment 1 begins at measure 7:

>>> measureLists[0:2]
[1, 7]
>>> segments, measureLists = search.segment.translateMonophonicPartToSegments(
...     lucaCantus, 
...     algorithm=search.translateDiatonicStreamToString)
>>> segments[0:2]
>>> measureLists[0:2]
[1, 7]
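
Putting it all together, a complete pass over a few corpus files might look like this (a sketch following the examples above; the 0.8 threshold and the strongMatches name are arbitrary choices):

    from music21 import corpus
    from music21.search import segment

    filePaths = [corpus.search(name)[0].sourcePath
                 for name in ('bwv197.5.mxl', 'bwv190.7.mxl')]
    scoreDict = segment.indexScoreFilePaths(filePaths)
    scoreSim = segment.scoreSimilarity(scoreDict)

    # each result tuple ends with the 0-1 similarity score; keep the strong matches
    strongMatches = [entry for entry in scoreSim if entry[-1] >= 0.8]
    for entry in strongMatches:
        print(entry)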