Class Similarity
- java.lang.Object
  - startup.Similarity
public class Similarity extends Object
Measure similarity between multi-interval sets of Strings. Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.
Given two multi-interval sets, let min be the minimum start of any of their intervals, and let max be the maximum end. The similarity between the two sets is the ratio: (sum of the piecewise matching over every unit interval from min to max) / (max - min).
The amount of piecewise-matching for any unit interval [i, i+1) is:
- 0 if neither set has a label on that interval
- 0 if only one set has a label on that interval
- otherwise, the similarity between the labels as defined for this Similarity instance
For example, suppose you have multi-interval sets that use labels "happy", "sad", and "meh", and similarity between labels is defined as:
- 1 if both are "happy", both "sad", or both "meh"
- 0.5 if one is "meh" and the other is "happy" or "sad"
- 0 otherwise
Then the similarity between these two sets:
- { "happy" = [[0,1), [2,4)], "sad" = [[1,2)] }
- { "sad" = [[1,2)], "meh" = [[2,3)], "happy" = [[3,4)] }
would be: (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625
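The ratio and the worked example above can be checked with a short Java sketch. Because this page does not document the MultiIntervalSet API, the sketch assumes a hypothetical stand-in: a Map from each label to its list of half-open intervals, each stored as a long[] {start, end} with integer endpoints, with no two intervals overlapping. The names SimilaritySketch, labelAt and exampleSim are illustrative only, and throwing when max <= min is an assumption the specification does not address.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class SimilaritySketch {

    // Label similarities from the example above, keyed by ordered pair; exampleSim tries both orders.
    static final Map<List<String>, Double> EXAMPLE = Map.of(
            List.of("happy", "happy"), 1.0,
            List.of("sad", "sad"), 1.0,
            List.of("meh", "meh"), 1.0,
            List.of("meh", "happy"), 0.5,
            List.of("meh", "sad"), 0.5);

    static double exampleSim(String x, String y) {
        // Symmetric lookup; any pair not listed has similarity 0.
        return EXAMPLE.getOrDefault(List.of(x, y), EXAMPLE.getOrDefault(List.of(y, x), 0.0));
    }

    // Hypothetical model of a multi-interval set: label -> half-open intervals {start, end}.
    // Returns the label whose interval covers the unit interval [i, i+1), or null if none does.
    static String labelAt(Map<String, List<long[]>> set, long i) {
        for (Map.Entry<String, List<long[]>> e : set.entrySet()) {
            for (long[] iv : e.getValue()) {
                if (iv[0] <= i && i < iv[1]) {
                    return e.getKey();
                }
            }
        }
        return null;
    }

    static double similarity(BiFunction<String, String, Double> labelSim,
                             Map<String, List<long[]>> a,
                             Map<String, List<long[]>> b) {
        // min = smallest start, max = largest end, over the intervals of both sets.
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (Map<String, List<long[]>> set : List.of(a, b)) {
            for (List<long[]> intervals : set.values()) {
                for (long[] iv : intervals) {
                    min = Math.min(min, iv[0]);
                    max = Math.max(max, iv[1]);
                }
            }
        }
        if (max <= min) {
            throw new IllegalArgumentException("sets must span a non-empty range");
        }
        // Piecewise matching: 0 unless both sets label [i, i+1), else the label similarity.
        double sum = 0;
        for (long i = min; i < max; i++) {
            String la = labelAt(a, i);
            String lb = labelAt(b, i);
            if (la != null && lb != null) {
                sum += labelSim.apply(la, lb);
            }
        }
        return sum / (max - min);
    }

    public static void main(String[] args) {
        Map<String, List<long[]>> a = Map.of(
                "happy", List.of(new long[] {0, 1}, new long[] {2, 4}),
                "sad", List.of(new long[] {1, 2}));
        Map<String, List<long[]>> b = Map.of(
                "sad", List.of(new long[] {1, 2}),
                "meh", List.of(new long[] {2, 3}),
                "happy", List.of(new long[] {3, 4}));
        System.out.println(similarity(SimilaritySketch::exampleSim, a, b)); // prints 0.625
    }
}
```

Scanning every unit interval costs time proportional to (max - min) times the number of intervals, which is acceptable for a sketch; an implementation over a real MultiIntervalSet could instead walk the interval boundaries.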
Label similarities are provided as a list of definition strings, each of which must contain exactly three pieces separated by one or more spaces. The first two pieces give a pair of labels, and the third piece gives the decimal similarity between them, in a format allowed by Double.valueOf(String), between 0 and 1 inclusive. The definition strings may not contain newlines. Similarity between labels is symmetric, so the order of labels in each pair is irrelevant. A pair may not appear more than once. The similarity between all other pairs of labels is 0. This format cannot define non-zero similarity for labels that contain newlines or spaces, or for the empty string label.
For example, the following 5 definitions give the similarity values used above:
    happy happy 1
    sad sad 1
    meh meh 1
    meh happy 0.5
    meh sad 0.5
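One way to turn the definition strings into a symmetric lookup table is sketched below in Java. It stores each pair under an order-independent key, so "meh happy" and "happy meh" map to the same entry; that makes both the no-duplicate rule and the symmetric lookup fall out together. The names SimilarityDefinitions, parse, pairKey and lookup are hypothetical, and rejecting leading or trailing spaces (they produce an empty piece) is this sketch's reading of "separated by one or more spaces".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimilarityDefinitions {

    // Order-independent key for a pair of labels, since similarity is symmetric.
    static List<String> pairKey(String x, String y) {
        return x.compareTo(y) <= 0 ? List.of(x, y) : List.of(y, x);
    }

    static Map<List<String>, Double> parse(List<String> definitions) {
        Map<List<String>, Double> table = new HashMap<>();
        for (String def : definitions) {
            if (def.contains("\n")) {
                throw new IllegalArgumentException("definition contains a newline: " + def);
            }
            // Limit -1 keeps trailing empty strings, so a leading or trailing space
            // yields an empty piece that the checks below reject.
            String[] pieces = def.split(" +", -1);
            if (pieces.length != 3) {
                throw new IllegalArgumentException("expected exactly three pieces: " + def);
            }
            for (String p : pieces) {
                if (p.isEmpty()) {
                    throw new IllegalArgumentException("empty piece in: " + def);
                }
            }
            double value;
            try {
                value = Double.valueOf(pieces[2]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("bad similarity value: " + def, e);
            }
            // Written as a negated range test so that NaN is rejected as well.
            if (!(value >= 0 && value <= 1)) {
                throw new IllegalArgumentException("similarity must be in [0, 1]: " + def);
            }
            if (table.putIfAbsent(pairKey(pieces[0], pieces[1]), value) != null) {
                throw new IllegalArgumentException("pair defined more than once: " + def);
            }
        }
        return table;
    }

    // Any pair not defined has similarity 0.
    static double lookup(Map<List<String>, Double> table, String x, String y) {
        return table.getOrDefault(pairKey(x, y), 0.0);
    }

    public static void main(String[] args) {
        Map<List<String>, Double> table = parse(List.of(
                "happy happy 1", "sad sad 1", "meh meh 1", "meh happy 0.5", "meh sad 0.5"));
        System.out.println(lookup(table, "happy", "meh")); // prints 0.5
        System.out.println(lookup(table, "sad", "happy")); // prints 0.0
    }
}
```

Unlisted pairs fall back to 0 through getOrDefault, matching the rule that the similarity between all other pairs of labels is 0.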
PS2 instructions: this is a required class, and you MUST NOT weaken the required specification. You MAY strengthen it, add additional methods, etc.
Method Summary
Modifier and Type | Method | Description
static double | similarity(List<String> similarities, MultiIntervalSet<String> a, MultiIntervalSet<String> b) | Compute similarity between two multi-interval sets under the given definition.
Method Detail
similarity
public static double similarity(List<String> similarities, MultiIntervalSet<String> a, MultiIntervalSet<String> b)
Compute similarity between two multi-interval sets under the given definition. Returns a value between 0 and 1 inclusive.
Parameters:
- similarities - label similarity definition as described above
- a - non-empty multi-interval set of strings
- b - non-empty multi-interval set of strings
Returns:
- similarity between a and b as defined above