public class Similarity extends Object
An instance of Similarity uses a client-provided definition of label similarities, where 0 is least similar and 1 is most similar.
Given two multi-interval sets, let min be the minimum start of any of their intervals, and let max be the maximum end. The similarity between the two sets is the ratio: (sum of piecewise-matching between the sets) / (max - min)
The amount of piecewise-matching for any unit interval [i, i+1) is:
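For example, suppose you have multi-interval sets that use labels "happy", "sad", and "meh"; and similarity between labels is defined as:
- 1 between happy/happy, sad/sad, and meh/meh
- 0.5 between meh/happy and meh/sad
- 0 between happy/sad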
Then the similarity between these two sets:
would be: (0 + 1 + 0.5 + 1) / (4 - 0) = 0.625
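The arithmetic above can be reproduced with a minimal sketch. Everything in it is an assumption made for illustration: the class `PiecewiseSketch`, the array representation (one label or `null` per unit interval, standing in for `MultiIntervalSet<String>`), and the two sets in `main`, which are one hypothetical pair chosen to produce the terms 0, 1, 0.5, and 1. The sketch also assumes that a unit interval labeled in only one set, or in neither, contributes 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in only; not the PS2 Similarity or MultiIntervalSet API. */
class PiecewiseSketch {

    /** Label similarities from the example; pairs not listed default to 0. */
    private static final Map<String, Double> TABLE = new HashMap<>();
    static {
        TABLE.put(key("happy", "happy"), 1.0);
        TABLE.put(key("sad", "sad"), 1.0);
        TABLE.put(key("meh", "meh"), 1.0);
        TABLE.put(key("meh", "happy"), 0.5);
        TABLE.put(key("meh", "sad"), 0.5);
    }

    /** Order-independent key, so lookups are symmetric. */
    static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + " " + y : y + " " + x;
    }

    /**
     * a[i] and b[i] hold the label on unit interval [min + i, min + i + 1),
     * or null where that set has no label. Both arrays have length max - min.
     */
    static double similarity(String[] a, String[] b) {
        double matched = 0;
        for (int i = 0; i < a.length; i++) {
            // Assumption: an interval unlabeled in either set contributes 0.
            if (a[i] != null && b[i] != null) {
                matched += TABLE.getOrDefault(key(a[i], b[i]), 0.0);
            }
        }
        return matched / a.length; // divide by (max - min)
    }

    public static void main(String[] args) {
        // Hypothetical sets over [0, 4), chosen to match the terms above.
        String[] first = {"happy", "sad", "happy", "happy"};
        String[] second = {null, "sad", "meh", "happy"};
        System.out.println(similarity(first, second)); // prints 0.625
    }
}
```

A real implementation would walk the intervals of the two sets rather than materialize one array slot per unit interval, which becomes impractical when max - min is large.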
PS2 instructions: this is a required ADT class, and you MUST NOT weaken the required specifications. However you MAY strengthen the specifications and you MAY add additional methods.
Constructor | Description |
---|---|
Similarity(File similarities) | Create a new Similarity where similarity between labels is defined in the given file. |
Modifier and Type | Method | Description |
---|---|---|
double | similarity(MultiIntervalSet<String> a, MultiIntervalSet<String> b) | Compute similarity between two multi-interval sets. |
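public Similarity(File similarities) throws IOException
Create a new Similarity where similarity between labels is defined in the given file. Each line of the file gives a pair of strings, separated by spaces, followed by the similarity between them: a decimal value in a format allowed by Double.valueOf(String), between 0 and 1 inclusive.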
Similarity between labels is symmetric, so the order of strings in the pair
is irrelevant. A pair may not appear more than once. The similarity between
all other pairs of strings is 0. This format cannot define non-zero
similarity for strings that contain newlines or spaces, or for the empty
string.
For example, the following file defines the similarity function used in the example at the top of this class:
    happy happy 1
    sad sad 1
    meh meh 1
    meh happy 0.5
    meh sad 0.5
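One way to read such a file is sketched below. It is not the required implementation: the class `SimilarityFileSketch`, its `parse` and `lookup` helpers, and the choice to throw `IllegalArgumentException` on a malformed line, an out-of-range value, or a repeated pair are all assumptions made for illustration. It also assumes one pair per line, with pieces separated by one or more spaces.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative parser only; names and error handling are hypothetical. */
class SimilarityFileSketch {

    /** Order-independent key; safe because labels cannot contain spaces. */
    static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + " " + y : y + " " + x;
    }

    /** Reads "label label value" lines into a symmetric lookup table. */
    static Map<String, Double> parse(File similarities) throws IOException {
        Map<String, Double> table = new HashMap<>();
        // Files.readAllLines throws an IOException if the file is missing or unreadable.
        for (String line : Files.readAllLines(similarities.toPath())) {
            String[] pieces = line.trim().split(" +");
            if (pieces.length != 3) {
                throw new IllegalArgumentException("malformed line: " + line);
            }
            // Double.valueOf throws NumberFormatException on a bad number.
            double value = Double.valueOf(pieces[2]);
            if (!(value >= 0 && value <= 1)) { // also rejects NaN
                throw new IllegalArgumentException("out of range: " + line);
            }
            if (table.putIfAbsent(key(pieces[0], pieces[1]), value) != null) {
                throw new IllegalArgumentException("pair repeated: " + line);
            }
        }
        return table;
    }

    /** Similarity of two labels; pairs absent from the file are 0. */
    static double lookup(Map<String, Double> table, String x, String y) {
        return table.getOrDefault(key(x, y), 0.0);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("similarities", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), Arrays.asList(
                "happy happy 1", "sad sad 1", "meh meh 1",
                "meh happy 0.5", "meh sad 0.5"));
        Map<String, Double> table = parse(file);
        System.out.println(lookup(table, "happy", "meh")); // prints 0.5
        System.out.println(lookup(table, "happy", "sad")); // prints 0.0
    }
}
```

Storing each pair under a single order-independent key makes the lookup symmetric and lets the duplicate check catch `meh happy` and `happy meh` as the same pair.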
similarities - label similarity definition as described above
IOException - if the similarity file cannot be found or read

public double similarity(MultiIntervalSet<String> a, MultiIntervalSet<String> b)
a - non-empty multi-interval set of strings
b - non-empty multi-interval set of strings