Utility routines for processing text in scores and other musical objects.
assembleAllLyrics(streamIn, maxLyrics=10, lyricSeparation='\n')
Concatenate all lyrics text from a stream. The Stream is automatically flattened.
Uses assembleLyrics to do the heavy work.
maxLyrics sets how many lyric lines (verses) the function tries to assemble, since it is not easy to determine the maximum number of lyrics in a score.
Here is a demo with one note and five lyrics.
>>> f = corpus.parse('demos/multiple-verses.xml')
>>> l = text.assembleAllLyrics(f)
>>> l
'\n1. First\n2. Second\n3. Third\n4. Fourth\n5. Fifth'
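The loop over lyric lines described above can be sketched without music21 at all. This is a simplified model under stated assumptions, not the library's implementation: each note is represented as a plain dict mapping a verse number to its text, and the names `assemble_lyrics` and `assemble_all_lyrics` are hypothetical. Syllable hyphenation and stream flattening are left out.

```python
# Simplified model, NOT music21's code: a "note" is a dict of
# verse number -> lyric text. All names here are hypothetical.

def assemble_lyrics(notes, line_number=1):
    """Join the text of one verse, skipping notes that lack that verse."""
    words = [n[line_number] for n in notes if line_number in n]
    return ' '.join(words)


def assemble_all_lyrics(notes, max_lyrics=10, lyric_separation='\n'):
    """Try verses 1..max_lyrics and concatenate every non-empty result."""
    result = ''
    for line_number in range(1, max_lyrics + 1):
        line = assemble_lyrics(notes, line_number)
        if line:
            # Each verse is preceded by the separator, so the result
            # starts with one, as in the demo output above.
            result += lyric_separation + line
    return result


notes = [{1: 'Hi', 2: 'Oh'}, {1: 'there', 2: 'dear'}]
print(repr(assemble_all_lyrics(notes)))
```

Because the number of verses is not known in advance, the sketch simply probes every verse number up to `max_lyrics` and keeps the ones that yield text.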
assembleLyrics concatenates lyric text from a stream. The Stream is automatically flattened.
The lineNumber parameter determines which line of lyrics (verse) is assembled.
>>> s = stream.Stream()
>>> n1 = note.Note()
>>> n1.lyric = "Hi"
>>> n2 = note.Note()
>>> n2.lyric = "there"
>>> s.append(n1)
>>> s.append(n2)
>>> text.assembleLyrics(s)
'Hi there'
postpendArticle: given a text string, if an article is found in a leading position, place it at the end with a comma. An optional second argument gives the language code whose articles are recognized.
>>> text.postpendArticle('The Ale is Dear')
'Ale is Dear, The'
>>> text.postpendArticle('The Ale is Dear', 'en')
'Ale is Dear, The'
>>> text.postpendArticle('The Ale is Dear', 'it')
'The Ale is Dear'
>>> text.postpendArticle('Il Combattimento di Tancredi e Clorinda', 'it')
'Combattimento di Tancredi e Clorinda, Il'
prependArticle: given a text string, if an article is found in a trailing position with a comma, place the article in front and remove the comma.
>>> text.prependArticle('Ale is Dear, The')
'The Ale is Dear'
>>> text.prependArticle('Ale is Dear, The', 'en')
'The Ale is Dear'
>>> text.prependArticle('Ale is Dear, The', 'it')
'Ale is Dear, The'
>>> text.prependArticle('Combattimento di Tancredi e Clorinda, Il', 'it')
'Il Combattimento di Tancredi e Clorinda'
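The two article transformations are inverses of each other, which a small standalone sketch can make concrete. The article table below is a hypothetical, deliberately short stand-in (the library's own list is not shown in this page), the function names are invented for illustration, and defaulting to English is an assumption of the sketch.

```python
# Hypothetical article table for illustration only; the real list used
# by music21 is not reproduced here and may differ.
ARTICLES = {
    'en': ['the', 'a', 'an'],
    'it': ['il', 'lo', 'la', 'i', 'gli', 'le'],
}


def postpend_article(title, language='en'):
    """Move a leading article to the end: 'The Ale' -> 'Ale, The'."""
    first, _, rest = title.partition(' ')
    if rest and first.lower() in ARTICLES.get(language, []):
        return rest + ', ' + first
    return title  # no leading article for this language


def prepend_article(title, language='en'):
    """Move a trailing ', Article' back to the front."""
    head, sep, last = title.rpartition(', ')
    if sep and last.lower() in ARTICLES.get(language, []):
        return last + ' ' + head
    return title  # no trailing article for this language


print(postpend_article('The Ale is Dear'))
print(prepend_article('Ale is Dear, The'))
```

The language argument only selects which article list is consulted, which is why an English title passes through unchanged when the Italian list is used.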
TextBox(content=None, x=500, y=500)
A TextBox is arbitrary text that might be positioned anywhere on a page, independent of notes or staffs. A page attribute specifies what page this text is found on; style.absoluteY and style.absoluteX position the text from the bottom left corner in units of tenths.
This object is similar to the TextExpression object, but it has fewer position parameters and enclosure attributes, and it cannot be converted to RepeatExpressions or TempoTexts.
>>> from music21 import text, stream
>>> y = 1000  # set a fixed vertical distance
>>> s = stream.Stream()

Specify character, x position, y position

>>> tb = text.TextBox('m', 250, y)
>>> tb.style.fontSize = 40
>>> tb.style.alignVertical = 'bottom'
>>> s.append(tb)

>>> tb = text.TextBox('u', 300, y)
>>> tb.style.fontSize = 60
>>> tb.style.alignVertical = 'bottom'
>>> s.append(tb)

>>> tb = text.TextBox('s', 550, y)
>>> tb.style.fontSize = 120
>>> tb.style.alignVertical = 'bottom'
>>> s.append(tb)

>>> tb = text.TextBox('ic', 700, y)
>>> tb.style.alignVertical = 'bottom'
>>> tb.style.fontSize = 20
>>> tb.style.fontStyle = 'italic'
>>> s.append(tb)

>>> tb = text.TextBox('21', 850, y)
>>> tb.style.alignVertical = 'bottom'
>>> tb.style.fontSize = 80
>>> tb.style.fontWeight = 'bold'
>>> tb.style.fontStyle = 'italic'
>>> s.append(tb)
TextBox read/write properties
content: Get or set the content.
>>> te = text.TextBox('Con fuoco')
>>> te.content
'Con fuoco'
>>> te.style.justify = 'center'
>>> te.style.justify
'center'
page: Get or set the page number. The first page (page 1) is the default.
>>> te = text.TextBox('Great Score')
>>> te.content
'Great Score'
>>> te.page
1
>>> te.page = 2
>>> te.page
2
LanguageDetector attempts to detect language on the basis of trigrams.
Uses code from http://code.activestate.com/recipes/326576-language-detection-using-character-trigrams/ (unknown author; no license given).
See Trigram docs below...
mostLikelyLanguage returns the code of the most likely language for a passage, or None when there is no text (for example, an empty string). Works on unicode or ascii. Current languages: en, fr, it, de, cn, la, nl.
>>> ld = text.LanguageDetector()
>>> ld.mostLikelyLanguage("Hello there, how are you doing today? " +
...                       "I haven't seen you in a while.")
'en'
>>> ld.mostLikelyLanguage("Ciao come stai? Sono molto lento oggi, ma non so perche.")
'it'
>>> ld.mostLikelyLanguage("Credo in unum deum. Patrem omnipotentem. Factorum celi")
'la'
>>> ld = text.LanguageDetector()
>>> ld.mostLikelyLanguage("") is None
True
mostLikelyLanguageNumeric returns a number representing the most likely language for a passage, or 0 if there is no text.
Useful for feature extraction.
Each code is the index of the language in LanguageDetector.languageCodes, plus 1.
>>> ld = text.LanguageDetector()
>>> for i in range(0, len(ld.languageCodes)):
...     print(str(i+1) + " " + ld.languageCodes[i])
1 en
2 fr
3 it
4 de
5 cn
6 la
7 nl
>>> numLang = ld.mostLikelyLanguageNumeric("Hello there, how are you doing today? " +
...                                        "I haven't seen you in a while.")
>>> numLang
1
>>> ld.languageCodes[numLang - 1]
'en'
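The index-plus-one convention reserves 0 for "no language", which keeps the value usable as a numeric feature. A minimal standalone sketch of the mapping follows; the helper names are hypothetical, the code list is copied from the output above, and treating an unrecognized code as 0 is an assumption of this sketch rather than documented behavior.

```python
# Language codes in the order printed above.
LANGUAGE_CODES = ['en', 'fr', 'it', 'de', 'cn', 'la', 'nl']


def language_to_number(code):
    """Return index + 1 for a known code, or 0 for None.

    Mapping unknown codes to 0 as well is an assumption of this sketch.
    """
    if code is None or code not in LANGUAGE_CODES:
        return 0
    return LANGUAGE_CODES.index(code) + 1


def number_to_language(number):
    """Invert the mapping: 0 means no language was detected."""
    if number == 0:
        return None
    return LANGUAGE_CODES[number - 1]


print(language_to_number('en'), language_to_number(None))
print(number_to_language(6))
```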
Trigram: see LanguageDetector above. From http://code.activestate.com/recipes/326576-language-detection-using-character-trigrams/
The frequency of three-character sequences is calculated. When treated as a vector, this information can be compared to other trigrams, and the difference between them seen as an angle. The cosine of this angle varies between 1 for complete similarity and 0 for utter difference. Since letter combinations are characteristic of a language, this can be used to determine the language of a body of text. For example:
>>> reference_en = Trigram('/path/to/reference/text/english')
>>> reference_de = Trigram('/path/to/reference/text/german')
>>> unknown = Trigram('url://pointing/to/unknown/text')
>>> unknown.similarity(reference_de)
0.4
>>> unknown.similarity(reference_en)
0.95
would indicate the unknown text is almost certainly English. As syntactic sugar, the minus sign is overloaded to return the difference between texts, so the above objects would give you:
>>> unknown - reference_de
0.6
>>> reference_en - unknown  # order doesn’t matter.
0.05
As it stands, the Trigram ignores character set information, which means you can only accurately compare within a single encoding (iso-8859-1 in the examples). A more complete implementation might convert to unicode first.
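The vector comparison described above can be illustrated with a few lines of standard-library Python. This is a minimal sketch of cosine similarity over trigram counts, not the recipe's or music21's code: the function names are hypothetical, the sample sentences are invented, and lowercasing plus whitespace normalization are choices made only for this example.

```python
import math
from collections import Counter


def trigram_counts(text):
    """Count every overlapping three-character sequence in the text."""
    text = ' '.join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def similarity(a, b):
    """Cosine of the angle between two trigram count vectors."""
    length_a = math.sqrt(sum(v * v for v in a.values()))
    length_b = math.sqrt(sum(v * v for v in b.values()))
    if length_a == 0 or length_b == 0:
        return 0.0  # an empty text has nothing in common with anything
    dot = sum(count * b[tri] for tri, count in a.items())
    return dot / (length_a * length_b)


def difference(a, b):
    """The value the overloaded minus sign stands for: 1 - similarity."""
    return 1 - similarity(a, b)


english = trigram_counts('the quick brown fox jumps over the lazy dog and then the cat')
german = trigram_counts('der schnelle braune fuchs springt auf den faulen hund')
unknown = trigram_counts('the dog and the fox then jump over the cat')

# The unknown sentence shares far more trigrams with the English sample.
print(similarity(unknown, english) > similarity(unknown, german))
```

Counts are never negative, so the dot product cannot be negative either, which is why the result stays between 0 and 1 rather than spanning the full cosine range.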
As an extra bonus, there is a method to make up nonsense words in the style of the Trigram’s text.
>>> reference_en.makeWords(30)
My withillonquiver and ald, by now wittlectionsurper, may sequia, tory, I ad my notter. Marriusbabilly She lady for rachalle spen hat knong al elf
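Nonsense text of this kind can be produced by repeatedly choosing a character that is likely to follow the previous two. The sketch below shows that general idea under stated assumptions: `build_table`, `likely_following`, and `make_text` are hypothetical names, and the weighted random choice is one plausible way to pick a follower, not necessarily what the library does.

```python
import random
from collections import Counter, defaultdict


def build_table(text):
    """Map each two-character pair to a Counter of the characters that follow it."""
    table = defaultdict(Counter)
    for i in range(len(text) - 2):
        table[text[i:i + 2]][text[i + 2]] += 1
    return table


def likely_following(table, pair, rng=random):
    """Pick a follower weighted by frequency, or a space if the pair is unknown."""
    followers = table.get(pair)
    if not followers:
        return ' '
    letters = list(followers)
    weights = list(followers.values())
    return rng.choices(letters, weights=weights)[0]


def make_text(table, length, seed_pair, rng=random):
    """Extend seed_pair by `length` characters, one likely follower at a time."""
    chars = list(seed_pair)
    for _ in range(length):
        chars.append(likely_following(table, ''.join(chars[-2:]), rng))
    return ''.join(chars)


table = build_table('the theory of the thing is that there is nothing there')
print(make_text(table, 40, 'th', random.Random(0)))
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when the generated text is used in a test or a doctest.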
Trigram read-only properties
Returns a character likely to follow the given two-character string, or a space if nothing is found.
makeWords returns a string of made-up words based on the known text.
Calculates the scalar length of the trigram vector and stores it in self.length.
similarity returns a number between 0 and 1 indicating the similarity between two trigrams. 1 means an identical ratio of trigrams; 0 means no trigrams in common.