Package startup

Class Similarity


public class Similarity
extends Object
Measure similarity between multi-interval sets with string labels.

Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.

The similarity between two nonempty multi-interval sets is the ratio (sum of piecewise matching between the sets) / (span of the sets). Here the span is the length of the smallest interval that contains all the intervals from both sets, and the amount of piecewise matching for any unit interval [i, i+1) is:

  • 0 if neither set has a label on that interval
  • 0 if only one set has a label on that interval
  • otherwise, the similarity between the labels as defined for this Similarity instance
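The same definition can be written as a single formula. The notation is introduced here only for illustration and is not part of the specification: s and e are the smallest start and the largest end among all intervals of both sets A and B (so the span is e - s), ℓ_A(i) and ℓ_B(i) are the labels of A and B on [i, i+1), and σ is the label similarity defined for this Similarity instance. The formula assumes, as the bullets do, that each set has at most one label on any unit interval.

```latex
\operatorname{sim}(A, B) = \frac{1}{e - s} \sum_{i=s}^{e-1} m(i),
\qquad
m(i) =
\begin{cases}
  \sigma\bigl(\ell_A(i), \ell_B(i)\bigr) & \text{if both } A \text{ and } B \text{ have a label on } [i, i+1), \\
  0 & \text{otherwise.}
\end{cases}
```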

For example, suppose two multi-interval sets use the labels "happy", "sad", and "meh", and similarity between labels is defined as:

  • 1 if both are "happy", both "sad", or both "meh"
  • 0.5 if one is "meh" and the other is "happy" or "sad"
  • 0 otherwise

Then the similarity between these two sets:

  • { "happy" = [[0, 1), [2, 4)], "sad" = [[1, 2)] }
  • { "sad" = [[1, 2)], "meh" = [[2, 3)], "happy" = [[3, 4)] }

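would be (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625, where the four terms in the numerator are the matching on the unit intervals [0, 1), [1, 2), [2, 3), and [3, 4), and the span of both sets runs from 0 to 4.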
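The worked example can be checked with a short sketch. It does not use the real MultiIntervalSet type, whose operations are not listed on this page; instead each set is modeled as a hypothetical Map from label to a list of half-open intervals {start, end} with integer endpoints, and the label similarities from the example are hard-coded in a hypothetical labelSim method rather than parsed. Instead of walking every unit interval, it sums (overlap length × label similarity) over every pair of intervals from the two sets. That equals the piecewise sum only under the assumption that no two intervals within one set overlap, so each unit interval carries at most one label per set.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the similarity ratio; not the required Similarity API. */
class SimilaritySketch {

    /** Label similarity from the example: 1 for equal labels, 0.5 for "meh" vs "happy"/"sad", else 0. */
    static double labelSim(String x, String y) {
        if (x.equals(y)) return 1.0;
        if (x.equals("meh") && (y.equals("happy") || y.equals("sad"))) return 0.5;
        if (y.equals("meh") && (x.equals("happy") || x.equals("sad"))) return 0.5;
        return 0.0;
    }

    /** Each set maps label -> half-open intervals {start, end}; both sets assumed nonempty. */
    static double similarity(Map<String, List<long[]>> a, Map<String, List<long[]>> b) {
        // Span: smallest start and largest end over all intervals of both sets.
        long spanStart = Long.MAX_VALUE;
        long spanEnd = Long.MIN_VALUE;
        for (Map<String, List<long[]>> set : List.of(a, b)) {
            for (List<long[]> intervals : set.values()) {
                for (long[] iv : intervals) {
                    spanStart = Math.min(spanStart, iv[0]);
                    spanEnd = Math.max(spanEnd, iv[1]);
                }
            }
        }
        // Piecewise matching: every unit covered by label la in a and label lb in b
        // contributes labelSim(la, lb), i.e. overlap length times similarity per interval pair.
        double matching = 0.0;
        for (Map.Entry<String, List<long[]>> ea : a.entrySet()) {
            for (Map.Entry<String, List<long[]>> eb : b.entrySet()) {
                double sim = labelSim(ea.getKey(), eb.getKey());
                if (sim == 0.0) continue; // dissimilar labels add nothing
                for (long[] ia : ea.getValue()) {
                    for (long[] ib : eb.getValue()) {
                        long overlap = Math.min(ia[1], ib[1]) - Math.max(ia[0], ib[0]);
                        if (overlap > 0) matching += sim * overlap;
                    }
                }
            }
        }
        return matching / (spanEnd - spanStart);
    }

    public static void main(String[] args) {
        Map<String, List<long[]>> a = Map.of(
            "happy", List.of(new long[] {0, 1}, new long[] {2, 4}),
            "sad", List.of(new long[] {1, 2}));
        Map<String, List<long[]>> b = Map.of(
            "sad", List.of(new long[] {1, 2}),
            "meh", List.of(new long[] {2, 3}),
            "happy", List.of(new long[] {3, 4}));
        System.out.println(similarity(a, b)); // prints 0.625
    }
}
```

The pairwise-overlap approach runs in time proportional to the number of interval pairs rather than the length of the span, which matters when intervals are long; a full implementation would build labelSim from the similarities list instead of hard-coding it.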

Label similarities are provided as a list of definition strings. Each string must contain exactly three pieces, separated by one or more spaces: the first two pieces give a pair of labels, and the third gives the similarity between them as a decimal between 0 and 1 inclusive, in a format accepted by Double.valueOf(String). Definition strings may not contain newlines. Similarity between labels is symmetric, so the order of the labels in each pair is irrelevant, and a pair may not appear more than once. The similarity between all other pairs of labels is 0. Consequently, this format cannot define a nonzero similarity for labels that contain newlines or spaces, or for the empty-string label.

For example, the following 5 definitions give the similarity values used above:

 happy happy 1
 sad   sad   1
 meh   meh   1
 meh   happy 0.5
 meh   sad   0.5
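The definition format can be parsed into a symmetric lookup table. The following is a minimal sketch; the class LabelSimilarities and its method get are hypothetical names, not part of the required Similarity API. It reads the rules strictly: a definition with leading or trailing spaces is rejected because splitting it produces an empty piece, and every violation raises IllegalArgumentException (or NumberFormatException, a subclass, when the third piece is not a valid number).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: parses label-similarity definitions into a symmetric lookup table. */
class LabelSimilarities {

    // Unordered label pair -> similarity, stored once under the pair's sorted key.
    private final Map<List<String>, Double> table = new HashMap<>();

    // Sort the two labels so that (x, y) and (y, x) produce the same key.
    private static List<String> key(String x, String y) {
        return x.compareTo(y) <= 0 ? List.of(x, y) : List.of(y, x);
    }

    LabelSimilarities(List<String> definitions) {
        for (String def : definitions) {
            if (def.contains("\n")) {
                throw new IllegalArgumentException("definition contains a newline: " + def);
            }
            // Split on runs of spaces; limit -1 keeps empty edge pieces so they are rejected below.
            String[] pieces = def.split(" +", -1);
            if (pieces.length != 3) {
                throw new IllegalArgumentException("expected exactly three pieces: " + def);
            }
            for (String piece : pieces) {
                if (piece.isEmpty()) {
                    throw new IllegalArgumentException("empty piece: " + def);
                }
            }
            // Double.valueOf throws NumberFormatException if the piece is malformed.
            double value = Double.valueOf(pieces[2]);
            if (!(value >= 0.0 && value <= 1.0)) { // negated form also rejects NaN
                throw new IllegalArgumentException("similarity not in [0, 1]: " + def);
            }
            if (table.putIfAbsent(key(pieces[0], pieces[1]), value) != null) {
                throw new IllegalArgumentException("pair defined more than once: " + def);
            }
        }
    }

    /** Similarity of two labels in either order; 0 for any pair that was not defined. */
    double get(String x, String y) {
        return table.getOrDefault(key(x, y), 0.0);
    }

    public static void main(String[] args) {
        LabelSimilarities sims = new LabelSimilarities(List.of(
            "happy happy 1",
            "sad   sad   1",
            "meh   meh   1",
            "meh   happy 0.5",
            "meh   sad   0.5"));
        System.out.println(sims.get("happy", "meh")); // prints 0.5
        System.out.println(sims.get("sad", "happy")); // prints 0.0 (undefined pair)
    }
}
```

Storing each pair under its sorted key makes lookups symmetric and turns the "a pair may not appear more than once" rule into a single putIfAbsent check, so "happy sad 0.5" followed by "sad happy 0.5" is rejected as a duplicate.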

PS2 instructions: this is a required class, and you MUST NOT weaken the required specification. You MAY strengthen it, add additional methods, etc.

  • Method Details

    • similarity

      public static double similarity(List<String> similarities, MultiIntervalSet<String> a, MultiIntervalSet<String> b)
      Compute similarity between two multi-interval sets under the given definition. Returns a value between 0 and 1 inclusive.
      Parameters:
        similarities - label similarity definition as described above
        a - non-empty multi-interval set with string labels
        b - non-empty multi-interval set with string labels
      Returns:
        similarity between a and b as defined above (or as close as possible within the precision of a double)