Class Similarity
 java.lang.Object

 startup.Similarity

public class Similarity extends Object
A [TODO mutable?] measure of similarity between multi-interval sets of Strings. An instance of Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.
Given two multi-interval sets, let min be the minimum start of any of their intervals, and let max be the maximum end. The similarity between the two sets is the ratio: (sum of piecewise-matching between the sets) / (max - min)
The amount of piecewise-matching for any unit interval [i, i+1) is:
 0 if neither set has a label on that interval
 0 if only one set has a label on that interval
 otherwise, the similarity between the labels as defined for this Similarity instance
For example, suppose you have multi-interval sets that use labels "happy", "sad", and "meh"; and similarity between labels is defined as:
 1 if both are "happy", both "sad", or both "meh"
 0.5 if one is "meh" and the other is "happy" or "sad"
 0 otherwise
Then the similarity between these two sets:
 { "happy" = [[0, 1), [2,4)], "sad" = [[1,2)] }
 { "sad" = [[1, 2)], "meh" = [[2,3)], "happy" = [[3,4)] }
would be: (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625
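The arithmetic in this example can be checked with a small standalone sketch. It does not use the real MultiIntervalSet type, whose API is not shown on this page; instead it assumes, purely for illustration, that each set is a Map from label to a list of {start, end} pairs with long endpoints, and that no two labels in one set cover the same unit interval. The class SimilaritySketch and its methods are hypothetical names, not part of the required ADT.

```java
import java.util.List;
import java.util.Map;

class SimilaritySketch {

    /** Label similarity from the example above; every unlisted pair scores 0. */
    static double labelSimilarity(String x, String y) {
        List<String> known = List.of("happy", "sad", "meh");
        if (!known.contains(x) || !known.contains(y)) return 0.0;
        if (x.equals(y)) return 1.0;
        if (x.equals("meh") || y.equals("meh")) return 0.5;
        return 0.0;
    }

    /** Returns the label covering the unit interval [i, i+1), or null if there is none. */
    static String labelAt(Map<String, List<long[]>> set, long i) {
        for (Map.Entry<String, List<long[]>> entry : set.entrySet()) {
            for (long[] interval : entry.getValue()) {
                if (interval[0] <= i && i < interval[1]) return entry.getKey();
            }
        }
        return null;
    }

    /** Similarity of two non-empty sets (assumed representation, see above). */
    static double similarity(Map<String, List<long[]>> a, Map<String, List<long[]>> b) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (Map<String, List<long[]>> set : List.of(a, b)) {
            for (List<long[]> intervals : set.values()) {
                for (long[] interval : intervals) {
                    min = Math.min(min, interval[0]);
                    max = Math.max(max, interval[1]);
                }
            }
        }
        // Sum the piecewise-matching over every unit interval in [min, max).
        double matching = 0.0;
        for (long i = min; i < max; i++) {
            String labelA = labelAt(a, i);
            String labelB = labelAt(b, i);
            if (labelA != null && labelB != null) {
                matching += labelSimilarity(labelA, labelB);
            }
        }
        return matching / (max - min);
    }

    /** Builds the two sets from the example above and returns their similarity. */
    static double example() {
        Map<String, List<long[]>> a = Map.of(
            "happy", List.of(new long[] {0, 1}, new long[] {2, 4}),
            "sad", List.of(new long[] {1, 2}));
        Map<String, List<long[]>> b = Map.of(
            "sad", List.of(new long[] {1, 2}),
            "meh", List.of(new long[] {2, 3}),
            "happy", List.of(new long[] {3, 4}));
        return similarity(a, b);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints 0.625
    }
}
```

Walking one unit interval at a time mirrors the definition directly, but takes time proportional to max - min. An implementation meant for wide ranges would more likely compare interval boundaries instead.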
PS2 instructions: this is a required ADT class, and you MUST NOT weaken the required specifications. However you MAY strengthen the specifications and you MAY add additional methods.


Constructor Summary
Constructor                      Description
Similarity(File similarities)    Create a new Similarity where similarity between labels is defined in the given file.

Method Summary
Modifier and Type    Method                                                                Description
double               similarity(MultiIntervalSet<String> a, MultiIntervalSet<String> b)    Compute similarity between two multi-interval sets.



Constructor Detail

Similarity
public Similarity(File similarities) throws IOException
Create a new Similarity where similarity between labels is defined in the given file. Each line of similarities must contain exactly three pieces, separated by one or more spaces. The first two pieces give a pair of strings, and the third piece gives the decimal similarity between them, in a format allowed by Double.valueOf(String), between 0 and 1 inclusive. Similarity between labels is symmetric, so the order of strings in the pair is irrelevant. A pair may not appear more than once. The similarity between all other pairs of strings is 0. This format cannot define non-zero similarity for strings that contain newlines or spaces, or for the empty string.
For example, the following file defines the similarity function used in the example at the top of this class:
 happy happy 1
 sad sad 1
 meh meh 1
 meh happy 0.5
 meh sad 0.5
Parameters:
similarities - label similarity definition as described above
Throws:
IOException - if the similarity file cannot be found or read
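One way to read this file format is sketched below. It is only an illustration under stated assumptions: the class SimilarityFileSketch and its methods are hypothetical names, the table is keyed by the two strings in sorted order joined with a space (unambiguous because the format cannot express strings containing spaces), and malformed lines are rejected with IllegalArgumentException, a behavior the specification above does not require.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimilarityFileSketch {

    /** Order-independent key for a pair of labels, so (x, y) and (y, x) coincide. */
    static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + " " + y : y + " " + x;
    }

    /** Parses similarity definitions, one "first second value" triple per line. */
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> table = new HashMap<>();
        for (String line : lines) {
            String[] pieces = line.split(" +");
            if (pieces.length != 3 || pieces[0].isEmpty()) {
                throw new IllegalArgumentException("expected three pieces: " + line);
            }
            // Double.valueOf throws NumberFormatException on an invalid number.
            double value = Double.valueOf(pieces[2]);
            if (!(value >= 0.0 && value <= 1.0)) {
                throw new IllegalArgumentException("similarity out of range: " + line);
            }
            if (table.put(key(pieces[0], pieces[1]), value) != null) {
                throw new IllegalArgumentException("pair appears more than once: " + line);
            }
        }
        return table;
    }

    /** Reads the file line by line; a missing or unreadable file raises IOException. */
    static Map<String, Double> parseFile(File similarities) throws IOException {
        return parse(Files.readAllLines(similarities.toPath()));
    }

    /** Similarity of two labels; pairs not listed in the table score 0. */
    static double lookup(Map<String, Double> table, String x, String y) {
        return table.getOrDefault(key(x, y), 0.0);
    }
}
```

With the example file above, lookup(table, "happy", "meh") and lookup(table, "meh", "happy") both give 0.5, while an unlisted pair such as "sad" and "happy" gives 0.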


Method Detail

similarity
public double similarity(MultiIntervalSet<String> a, MultiIntervalSet<String> b)
Compute similarity between two multi-interval sets. Returns a value between 0 and 1 inclusive.
Parameters:
a - non-empty multi-interval set of strings
b - non-empty multi-interval set of strings
Returns:
similarity between a and b as defined above

