Package startup

Class Similarity


  • public class Similarity
    extends Object
    Measure similarity between multi-interval sets of Strings.

    Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.

    Given two multi-interval sets, let min be the minimum start of any of their intervals, and let max be the maximum end. The similarity between the two sets is the ratio: (sum of piecewise-matching between the sets) / (max - min)

    The amount of piecewise-matching for any unit interval [i, i+1) is:

    • 0 if neither set has a label on that interval
    • 0 if only one set has a label on that interval
    • otherwise, the similarity between the labels as defined for this Similarity instance
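    The ratio and the piecewise matching can also be written as one formula. In this sketch, ℓ_a(i) and ℓ_b(i) stand for the labels that a and b assign to [i, i+1), and s stands for the label similarity defined for this instance. This notation is introduced here for illustration only. It assumes integer interval endpoints and at most one label per unit interval in each set, which is what the phrase "a label on that interval" suggests.

```latex
\mathrm{similarity}(a, b) = \frac{1}{\mathit{max} - \mathit{min}} \sum_{i=\mathit{min}}^{\mathit{max}-1} m(i),
\qquad
m(i) = \begin{cases}
  s\bigl(\ell_a(i), \ell_b(i)\bigr) & \text{if both } a \text{ and } b \text{ have a label on } [i, i+1) \\
  0 & \text{otherwise}
\end{cases}
```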

    For example, suppose you have multi-interval sets that use labels "happy", "sad", and "meh"; and similarity between labels is defined as:

    • 1 if both are "happy", both "sad", or both "meh"
    • 0.5 if one is "meh" and the other is "happy" or "sad"
    • 0 otherwise

    Then the similarity between these two sets:

    • { "happy" = [[0, 1), [2, 4)], "sad" = [[1, 2)] }
    • { "sad" = [[1, 2)], "meh" = [[2, 3)], "happy" = [[3, 4)] }

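    would be: (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625, because [0, 1) is labeled only in the first set, [1, 2) is "sad"/"sad", [2, 3) is "happy"/"meh", and [3, 4) is "happy"/"happy".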

    Label similarities are provided as a list of definition strings, where each one must contain exactly three pieces, separated by one or more spaces. The first two pieces give a pair of labels, and the third piece gives the decimal similarity between them, in a format allowed by Double.valueOf(String), between 0 and 1 inclusive. The definition strings may not contain newlines. Similarity between labels is symmetric, so the order of labels in each pair is irrelevant. A pair may not appear more than once. The similarity between all other pairs of labels is 0. This format cannot define non-zero similarity for labels that contain newlines or spaces, or for the empty string label.

    For example, the following 5 definitions give the similarity values used above:

     happy happy 1
     sad   sad   1
     meh   meh   1
     meh   happy 0.5
     meh   sad   0.5
     
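    The definition rules above can be checked mechanically while building a lookup table. The following is a minimal sketch, not the required implementation. The class SimilarityDefinitions and its methods pairKey, parse, and lookup are hypothetical names. Two choices here are interpretations of the rules rather than statements from them: "a b" and "b a" are treated as the same pair, so either order counts as a repeat, and leading or trailing spaces make a definition malformed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: parses similarity definition strings into a symmetric lookup table. */
class SimilarityDefinitions {

    /** Canonical key for an unordered pair; '\n' is a safe separator because definitions cannot contain newlines. */
    static String pairKey(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "\n" + y : y + "\n" + x;
    }

    /** Parses definitions such as "meh happy 0.5"; throws IllegalArgumentException if any is malformed. */
    static Map<String, Double> parse(List<String> definitions) {
        Map<String, Double> table = new HashMap<>();
        for (String def : definitions) {
            if (def.indexOf('\n') >= 0 || def.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("definition contains a newline: " + def);
            }
            // Split on runs of spaces only (not tabs). Limit -1 keeps trailing empty pieces,
            // so leading or trailing spaces make the definition malformed in this sketch.
            String[] pieces = def.split(" +", -1);
            if (pieces.length != 3 || pieces[0].isEmpty() || pieces[1].isEmpty() || pieces[2].isEmpty()) {
                throw new IllegalArgumentException("expected exactly three pieces: " + def);
            }
            double value;
            try {
                value = Double.valueOf(pieces[2]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("bad similarity value: " + def, e);
            }
            if (!(value >= 0 && value <= 1)) { // also rejects NaN
                throw new IllegalArgumentException("similarity outside [0, 1]: " + def);
            }
            String key = pairKey(pieces[0], pieces[1]);
            if (table.containsKey(key)) { // "a b" and "b a" name the same pair
                throw new IllegalArgumentException("pair defined more than once: " + def);
            }
            table.put(key, value);
        }
        return table;
    }

    /** Similarity of two labels under a parsed table; pairs that were not defined default to 0. */
    static double lookup(Map<String, Double> table, String x, String y) {
        return table.getOrDefault(pairKey(x, y), 0.0);
    }
}
```

    With the five definitions above, lookup(table, "happy", "meh") and lookup(table, "meh", "happy") both return 0.5, while an unlisted pair such as "happy"/"sad" returns 0. Labels containing spaces, newlines, or nothing at all can never appear in a defined key, so they always look up as 0, matching the last sentence of the format description.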

    PS2 instructions: this is a required class, and you MUST NOT weaken the required specification. You MAY strengthen it, add additional methods, etc.

    • Method Detail

      • similarity

        public static double similarity(List<String> similarities,
                                        MultiIntervalSet<String> a,
                                        MultiIntervalSet<String> b)
        Compute similarity between two multi-interval sets under the given definition. Returns a value between 0 and 1 inclusive.
        Parameters:
        similarities - label similarity definition as described above
        a - non-empty multi-interval set of strings
        b - non-empty multi-interval set of strings
        Returns:
        similarity between a and b as defined above
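        One possible shape for the method body is sketched below, for illustration only. The MultiIntervalSet operations are not listed in this section, so the sketch models each set as a hypothetical Map from label to a list of half-open {start, end} pairs. The class name SimilaritySketch is also hypothetical. The sketch assumes the definitions are well formed, that intervals within one set are pairwise disjoint, and that max > min. Instead of walking unit by unit, it sums (overlap length) × (label similarity) over pairs of intervals. With integer endpoints this gives the same total as the unit-interval definition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch only. A multi-interval set is modeled as a hypothetical Map from label to a list of
 * half-open intervals {start, end}. Assumes intervals within one set are pairwise disjoint.
 */
class SimilaritySketch {

    /** Canonical key for an unordered label pair, so lookups are symmetric. */
    static String pairKey(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "\n" + y : y + "\n" + x;
    }

    static double similarity(List<String> similarities,
                             Map<String, List<long[]>> a,
                             Map<String, List<long[]>> b) {
        // Parse definitions; input is assumed to satisfy the documented format (no validation here).
        Map<String, Double> table = new HashMap<>();
        for (String def : similarities) {
            String[] p = def.split(" +");
            table.put(pairKey(p[0], p[1]), Double.valueOf(p[2]));
        }

        // min = smallest start and max = largest end over the intervals of both sets.
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Map<String, List<long[]>> set : List.of(a, b)) {
            for (List<long[]> intervals : set.values()) {
                for (long[] iv : intervals) {
                    min = Math.min(min, iv[0]);
                    max = Math.max(max, iv[1]);
                }
            }
        }

        // Each overlapping pair of intervals contributes overlap length times label similarity;
        // with disjoint intervals per set, this equals the unit-by-unit sum.
        double matching = 0;
        for (Map.Entry<String, List<long[]>> ea : a.entrySet()) {
            for (Map.Entry<String, List<long[]>> eb : b.entrySet()) {
                double s = table.getOrDefault(pairKey(ea.getKey(), eb.getKey()), 0.0);
                if (s == 0) continue; // undefined or zero-similarity pairs add nothing
                for (long[] x : ea.getValue()) {
                    for (long[] y : eb.getValue()) {
                        long overlap = Math.min(x[1], y[1]) - Math.max(x[0], y[0]);
                        if (overlap > 0) matching += overlap * s;
                    }
                }
            }
        }
        // Assumes max > min; otherwise this would divide by zero.
        return matching / (max - min);
    }
}
```

        Applied to the two example sets and the five example definitions, this sketch returns 0.625 in either argument order. Summing overlaps avoids visiting every unit between min and max, which matters when interval endpoints are far apart.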