Evaluation!

I tested against a number of songs. The bulk of tests were not done using mimicked target samples, for the simple reason that using the actual source audio let me verify that the behaviors I was observing were a result of the algorithm, not of whether I had made a good mimic. There were a number of things to investigate (a rough sweep sketch follows the list):

  1. Performance for different numbers of algorithm iterations.

  2. Performance for different numbers of target descriptor components M.

  3. Performance for different numbers of backing descriptor components N.

  4. The effects of introducing sparsity on different vectors to reduce the effect of weak components.

  5. Different risk boundaries for the Bayes binary masking.
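
Taken together, these amount to a parameter sweep over the separation algorithm. The Python sketch below is only an illustration of how such a sweep could be driven: the `separate()` function, the parameter names, and the grid values are placeholders rather than the real interface, and scoring uses the MSE metric described in the next paragraph.

```python
from itertools import product

import numpy as np


def spectral_mse(reference, estimate):
    # MSE between magnitude spectrograms (the cheap mid-stage metric below)
    return float(np.mean((np.abs(reference) - np.abs(estimate)) ** 2))


def sweep(mixture_stft, target_stft, separate):
    # Grid values are illustrative only; `separate` is a stand-in for the
    # extraction algorithm and is assumed to return an STFT-domain estimate
    # of the target.
    grid = {
        "n_iter":   [50, 100, 200],      # 1. algorithm iterations
        "M":        [8, 16, 32],         # 2. target descriptor components
        "N":        [8, 16, 32],         # 3. backing descriptor components
        "sparsity": [0.0, 0.1, 0.25],    # 4. sparsity applied to weak components
        "risk":     [0.4, 0.5, 0.6],     # 5. Bayes binary mask risk boundary
    }
    results = []
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        target_est = separate(mixture_stft, **params)
        results.append((params, spectral_mse(target_stft, target_est)))
    return sorted(results, key=lambda item: item[1])  # lowest MSE first
```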

Besides qualitative evaluation, quantitative evaluation was done using Mean Square Error (MSE) and the Blind Source Separation Evaluation (BSS_EVAL) toolbox from http://bass-db.gforge.inria.fr/bss_eval/, a standard source separation evaluation toolkit. While the absolute MSE results were often not very helpful, their trend follows the most important metric, the Source to Interference Ratio (SIR). MSE is calculated from the FFTs, making it less computationally expensive at intermediate stages, and it will likely become the driving metric in the automatic optimization to come later. The BSS toolkit compares the actual wav outputs, returning the Signal to Distortion Ratio (SDR), the Signal to Artifacts Ratio (SAR), and the SIR. SIR measures the leakage between the two extracted tracks and is the best measure of success; SDR and SAR relate more closely to general audio quality.
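
The MSE side is sketched above; for the SDR/SIR/SAR side, here is a minimal sketch of how those numbers could be reproduced. The actual runs used the MATLAB toolbox linked above; `mir_eval` reimplements the same BSS_EVAL metrics in Python and stands in for it here, and the function name and argument layout are my own.

```python
import numpy as np
import mir_eval


def bss_metrics(true_target, true_backing, est_target, est_backing):
    """Score the rendered wav outputs; each argument is a 1-D sample array."""
    reference = np.vstack([true_target, true_backing])  # one row per source
    estimate = np.vstack([est_target, est_backing])
    sdr, sir, sar, _perm = mir_eval.separation.bss_eval_sources(reference, estimate)
    # sdr/sir/sar are per-source arrays; index 0 is the extracted target
    return {"SDR": sdr, "SIR": sir, "SAR": sar}
```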

I have also done a comparison with one of the most basic means of extraction, the straight application of a Bayes Binary Mask.
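
Roughly, that baseline boils down to handing each time-frequency bin of the mixture wholesale to whichever model claims it more strongly. The sketch below works under that assumption: `target_mag` and `backing_mag` stand in for per-bin magnitude estimates computed with the same STFT parameters as the mixture, and the `risk` argument is an illustration of the decision-boundary shift from item 5 above, not the exact formulation used.

```python
import numpy as np
from scipy.signal import stft, istft


def binary_mask_extract(mixture, target_mag, backing_mag, fs, risk=1.0, nperseg=1024):
    """Assign each bin of the mixture either to the target or to the backing."""
    _, _, mix_stft = stft(mixture, fs=fs, nperseg=nperseg)
    mask = (target_mag > risk * backing_mag).astype(float)  # hard 0/1 decision
    _, target = istft(mask * mix_stft, fs=fs, nperseg=nperseg)
    _, backing = istft((1.0 - mask) * mix_stft, fs=fs, nperseg=nperseg)
    return target, backing
```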

I have attempted to keep the data targets and sources somewhat diverse. With one noted exception, all tests were run using source audio; mimics are noted where they were done in addition:



Next- Results