Problem Set 2: Multi-Startup Set

Alpha due: Monday, March 12, 2018, 10:00 pm
Code reviews due: Friday, March 16, 2018, 11:00 am
Beta due: Monday, March 19, 2018, 10:00 pm

The purpose of this problem set is to practice designing, testing, and implementing abstract data types. This problem set focuses on implementing mutable types where the specifications are provided to you; the next problem set will focus on immutable types where you must choose specifications.

Design Freedom and Restrictions

On several parts of this problem set, the classes and methods will be yours to specify and create, but you must pay attention to the PS2 instructions sections in the provided documentation.

You must satisfy the specifications of the provided interfaces and classes. In some cases, you are permitted to strengthen the provided specifications or add new methods, but in other cases you are not permitted to change them at all.

On this problem set, Didit provides less feedback about the correctness of your code:

It is your responsibility to examine Didit feedback and make sure your code compiles and runs properly for grading.
However, correctness is your responsibility alone, and you must rely on your own careful specification, testing, and implementation to achieve it.

Please remember to push early: we cannot guarantee timely Didit builds, especially near the problem set deadline.

Get the code

To get started,

Ask Didit to create a remote psets/ps2 repository for you on Athena.
Clone the repo. Find the ssh:// URL at the top of your Didit assignment page and run:

git clone «URL» ps2
Import into Eclipse. See Problem Set 0 for if you need a refresher on how to create, clone, or import your repository.

Overview

Java Collections Framework provides many useful data structures for working with collections of objects: lists, maps, queues, sets, and so on. Let’s use some of these building blocks to build a data structure of our own, and then build one very small application of that ADT.

Interval sets · Multi-interval sets · Similarity between multi-interval sets

Interval sets

Problems 1-3: we will implement the interval set abstract type, a mutable set of unique labels where each label is associated with a non-overlapping interval on the number line.

For example, we can picture the interval set { A = [0,5), B = [10,30), C = [30,35) } as:

In this example, the labels are the unique String objects "A", "B", and "C"; and each is associated with an interval, which in our type will be an interval of long values. Every interval is half-open (also called half-closed), meaning that it includes its start point but not its end point, so [0,5) represents { x ∈ ℝ | 0 ≤ x < 5 }.

Read the Javadoc documentation for IntervalSet generated from src/interval/IntervalSet.java.

Labels do not have to have String type, but can have any immutable type. IntervalSet<L> is a generic type similar to List<E> or Map<K,V>. In the specification of List, E is a placeholder for the type of elements in the list. In Map, K and V are placeholders for the types of keys and values. Only later does the client of List or Map choose particular types for these placeholders by constructing, for example, a List<Clown> or a Map<Car,Integer>.

The specification of a generic type is written in terms of the placeholders. For example, the specification of Map<K,V> says that K must be an immutable type: if you want to make something the key in a Map, it shouldn’t be a mutable object because the Map may not work correctly.

In our specification for IntervalSet<L>, we make the same demand about the immutability of type L. Clients of IntervalSet who try to use mutable labels have violated the precondition. They cannot expect correct behavior.

For this problem set, we will implement IntervalSet twice, with two different reps, to practice choosing abstraction functions and rep invariants and preventing rep exposure. There are many good reasons for a library to provide multiple implementations of a type (for example, the ArrayList and LinkedList implementations of List satisfy clients with different performance requirements), and we’re following that model.

Multi-interval sets

Problem 4: using interval sets, we will implement the multi-interval set type, a mutable set of labels where each label is associated with one or more globally-non-overlapping intervals.

For example, { A = [[0,5),[20,25)], B = [[10,20),[25,30)], C = [[30,35)] }:

Once again, the labels are "A", "B", and "C". Label "C" is associated with one interval, labels "A" and "B" each with two.

Read the Javadoc documentation for MultiIntervalSet generated from src/interval/MultiIntervalSet.java.

Similarity between multi-interval sets

Problem 5: with our multi-interval set datatype in hand, the MVP for your new “” startup needs one crucial piece of proprietary technology — a way to measure the similarity between multi-interval sets.

For example, given { A = [[0,5),[20,25)], B = [[10,20),[25,30)] } and { A = [[20,35)], B = [[10,20)], C = [[0,5)] }:

We might consider an interval completely similar (value 1) when the labels are equal (e.g. on [10,20) and [20,25)) and dissimilar (value 0) otherwise.

Similarity will be defined as a ratio of the extent spanned by the two multi-interval sets, so the similarity between these two sets with the binary definition of label similarity is:

( 5×0 + 5×0 + 10×1 + 5×1 + 5×0 + 5×0 ) ÷ ( 35 - 0 ) = 15 / 35 ≈ 0.42857

(Hover or tap on individual terms to highlight that part of the figure above.)

However, our Similarity type will support a client-supplied definition of label similarity, so for example the user might specify that "A" and "C" have similarity 0 but "A" and "B" are closer, similarity ½. Using that definition:

( 5×0 + 5×0 + 10×1 + 5×1 + 5×0.5 + 5×0 ) ÷ ( 35 - 0 ) = 17.5 / 35 = 0.5

Clients of Similarity define label similarity by providing a file: for each pair of labels in the file, the client specifies a numeric similarity between 0 and 1. The similarity between all other pairs of labels is 0.

Read the Javadoc documentation for Similarity generated from src/startup/Similarity.java. The file format is specified in the Similarity(File) constructor Javadoc.

Problem 1: Test `IntervalSet<String>`

Devise, document, and implement tests for IntervalSet<String>.

For now, we’ll only test (and then implement) interval sets with String labels. Later, we’ll expand to other kinds of labels.

Correction

The instructions in IntervalSetTest.java say “you must not add … non-@Test methods.”

You may add private helper methods. Do not add public non-@Test methods.

In order to accommodate running our tests on multiple implementations of the IntervalSet interface, here is the setup:

The testing strategy and tests for the static IntervalSet.empty() method are in IntervalSetEmptyTest.java. Since the method is static, there will be only one implementation, and we only need to run these tests once. We’ve provided these tests. You are free to change or add to them, but you can leave them as-is for this problem.
Write your testing strategy and your tests for all the instance methods in IntervalSetTest.java. In these tests, you must use the emptyInstance() method to get fresh empty sets, not IntervalSet.empty()! See the provided testInitialLabelsEmpty() for an example.

Java note

IntervalSetTest is an abstract class. Abstract classes and subclassing have their uses, but in general should be avoided.

Unlike IntervalSetEmptyTest, which works like any other JUnit test class we’ve written, IntervalSetTest is different because you can’t run it directly. It has a blank waiting to be filled in: the emptyInstance() method that will provide empty IntervalSet objects. In the next problem, we’ll see two subclasses of IntervalSetTest that fill in the blank by returning empty sets of different types.

Your tests in IntervalSetTest must be legal clients of the IntervalSet spec. Any tests specific to your implementations will go in the subclasses in the next problem.

Commit to Git. Once you’re happy with your solution to this problem, commit and push!

Problem 2: Implement `IntervalSet<String>`

Now we’ll implement interval sets with String labels — twice.

For all the classes you write in this problem set:

Document the abstraction function and representation invariant.
Along with the rep invariant, document how the type prevents rep exposure.
Implement checkRep to check the rep invariant.
Implement toString with a useful human-readable representation of the abstract value.

All your classes must have clear documented specifications. This means every method will have a Javadoc comment, except when you use @Override:

When a method is specified in an interface and then implemented by a concrete class — for example, insert(..) in RepMapIntervalSet is specified in IntervalSet — the @Override annotation with no Javadoc comment indicates the method has the same spec. Unless the concrete class needs to strengthen the spec, don’t write the spec again (DRY).

You can choose either RepMapIntervalSet or RepListIntervalSet to write first. There should be no dependence or sharing of code between your two implementations.

2.1. Implement `RepMapIntervalSet`

For RepMapIntervalSet, you must use the rep provided:

private final Map<String, Long> startMap = new HashMap<>();
private final Map<String, Long> endMap = new HashMap<>();

You may not add fields to the rep or choose not to use one of the fields. You must use the names startMap and endMap, even if they are not the names you would choose.

Be sure to use the @Override annotation on toString to ensure that you are correctly overriding the Object version of the method, not creating a new, different method.

You should strengthen the spec of RepMapIntervalSet.toString() and write tests for it in RepMapIntervalSetTest.java. Don’t add those tests to IntervalSetTest, which must test the IntervalSet spec.

All Java classes inherit Object.toString() with its very underdetermined spec. IntervalSet doesn’t specify a stronger spec, but your concrete classes can. The choice not to strengthen the spec of toString in the IntervalSet interface is deliberate: since toString is intended for human consumption, keeping its spec weak helps prevent other code from depending on its particular format.

Run the tests by right-clicking on RepMapIntervalSetTest.java and selecting Run As → JUnit Test.

Commit to Git. Once you’re happy with your solution to this problem, commit and push!

2.2. Implement `RepListIntervalSet`

For RepListIntervalSet, you must use the rep provided:

private final List<String> labelList = new ArrayList<>();
private final List<Long> valueList = new ArrayList<>();

You may not add fields to the rep or choose not to use one of the fields. You must use the names labelList and valueList, even if they are not the names you would choose.

You should strengthen the spec of RepListIntervalSet.toString() and write tests for it in RepListIntervalSetTest.java. Don’t add those tests to IntervalSetTest, which must test the IntervalSet spec.

Run the tests by right-clicking on RepListIntervalSetTest.java and selecting Run As → JUnit Test.

Commit to Git. Once you’re happy with your solution to this problem, commit and push!

Problem 3: Implement generic `IntervalSet<L>`

Nothing in either of your implementations of IntervalSet should rely on the fact that labels are of type String in particular.

The spec says that labels “must be immutable” and are “compared using equals.” Let’s change both of our implementations to support labels of any type that meets these conditions.

3.1. Make the implementations generic

Change the declarations of your concrete classes to read:

public class RepMapIntervalSet<L> implements IntervalSet<L> { ... }

and:

public class RepListIntervalSet<L> implements IntervalSet<L> { ... }

Update both of your implementations to support any type of label, using placeholder L instead of String. Roughly speaking, you should be able to find-and-replace all the instances of String in your code with L!

This refactoring will make RepMapIntervalSet and RepListIntervalSet generic types.

Any place you previously referred to these types without a generic parameter, you will need to add it. For example, if you called a constructor like new RepMapIntervalSet(), that will need to become new RepMapIntervalSet<String>(). Depending on context, you can use diamond notation <> to avoid writing type parameters twice.

When you’re done with the conversion, all your instance method tests should pass.

Commit to Git. As always, once you’re happy with your solution to this problem, commit and push!

3.2. Test other types of labels

Because the implementation of IntervalSet does not know or care about the actual type of the labels, and only compares them according to the IntervalSet specification, your test suite with String labels is sufficient.

However, since this is our first generic type, to gain confidence that you do indeed support different types of labels, modify or add to your tests of toString() in RepMap- and RepListIntervalSetTest so that they create and use RepMap- and RepListIntervalSet instances (respectively) with a couple different types of (immutable) labels.

Note: you cannot write tests with other types of labels in IntervalSetTest, because those tests must use the emptyInstance() method to get new sets. That method returns IntervalSet<String>.

3.3. Implement `IntervalSet.empty()`

Pick one of your IntervalSet implementations for clients to use, and implement IntervalSet.empty(). Unlike ArrayList and LinkedList, where clients who only want a List are forced to break the abstraction barrier by calling a concrete constructor, clients of IntervalSet should not be aware of how we’ve implemented it.

Java note

You may use @SafeVarargs where necessary, for example if you write testing helper functions that take a variable number of arguments.

At this point, all of your interval set code (implementations and tests) must have no warnings from the compiler (warnings have a symbol, as opposed to the for errors) and no @SuppressWarnings annotations (which would disable the warning, nice try).

If you cannot figure out how to make your implementations generic while satisfying the Java compiler’s static type checking, please visit lab or office hours, or ask a question on Piazza.

At this point in the problem set, we’ve completed the test-first implementation of IntervalSet.

RepMap- and RepListIntervalSetTest should have a green bar from JUnit and excellent code coverage from the coverage tool.
This is a good opportunity for self code review: remove dead code and debugging printlns, DRY up duplication, and so on.
And of course, commit and push.

Problem 4: `MultiIntervalSet<L>`

Just as we implemented the IntervalSet ADT by relying on other ADTs as building blocks, we can use IntervalSet as part of the specification and as part of the representation of another ADT.

4.1 Test `MultiIntervalSet`

Devise, document, and implement tests for MultiIntervalSet in MultiIntervalSetTest.java.

Java note

If you define a helper type to use in the rep of MultiIntervalSet, we require that class be declared in MultiIntervalSet.java.

If that type is immutable, be careful: not implementing equality as described in class 15 will mean you cannot use equals to compare objects of your helper type.

4.2 Implement `MultiIntervalSet`

Implement MultiIntervalSet in MultiIntervalSet.java.

You must use IntervalSet in the rep of MultiIntervalSet. For example, IntervalSet instances might be elements in a list or values in a map. Your MultiIntervalSet should rely only on the specs of IntervalSet, not on the details of a particular IntervalSet implementation, and it should use the static empty() method to obtain interval set instances. The implementation of MultiIntervalSet is otherwise entirely up to you.

Run all the IntervalSet and MultiIntervalSet tests by right-clicking on the interval package in the test folder and selecting Run As → JUnit Test.

Problem 5: Fame and fortune with `IntervalSet`

This problem description was randomly generated

If you don’t like it, please reload the page to receive a different problem.

You are the founder and CTO of , the hottest new startup that just called “ for , but .” In your MVP, you’re leveraging the most advanced artificial intelligence algorithms and machine learning architecture to deliver unprecedented breakthrough results in the world of .

Of course, that’s all a bunch of . As CTO, you realize it all comes down to managing multi-interval sets that represent . works by comparing . You need to develop tools for measuring the similarity of multi-interval sets: completely different will score 0, with the score approaching 1 as they become more alike.

And best of all, when you pivot your startup to and rebrand as , the startup in , you’ll use those multi-interval sets to represent ! In the new startup, you’re going to compare — so your success in the startup world all comes down to this one problem.

5.1 Specify and test `Similarity`

Devise, document, and implement tests for Similarity in SimilarityTest.java.

You decide whether Similarity is mutable or immutable, and you should modify the class Javadoc to state that part of the spec.

In the future, every immutable type you write will override equals and hashCode to implement the Object contract as described in class 15 on equality, but that’s not required for this problem set.

You are free to add additional methods to Similarity, but you may not change the signatures of the required methods. You are free to strengthen the specifications of the required methods, but you may not weaken them. Note how this might mean your tests for Similarity are not usable against someone else’s implementation, because you’ve strengthened or added to the spec.

Your tests should make use of one or more sample definition files. Put those files in the test/startup directory — the same directory as SimilarityTest.java. Reference them in the code with, for example:

new File("test/startup/.txt")

While you still must test and implement Similarity(File), remember that you may add operations to Similarity. Adding a creator that takes a String or Map, for example, might simplify your tests for similarity(..).

5.2 Implement `Similarity`

Implement Similarity in Similarity.java.

Your Similarity type should rely only on the specs of MultiIntervalSet, not on the details of any particular implementation.

The implementation of Similarity is entirely up to you.

For reading in the label similarity definition files, there are at least three Java APIs to consider:

Java note

Another option, if you want to program in a functional style: use Files.lines(..), which returns a Stream<String>.

FileReader is the standard class for reading from a file. Wrapping the FileReader in a BufferedReader allows you to use the convenient readLine() method to read whole lines at a time.
The utility function Files.readAllLines(..) will read all the lines from a Path. Our constructor takes a File, but File provides a toPath() method to obtain a Path.
Class Scanner is designed for breaking up input text into pieces. It provides many features and you’ll need to read its specs carefully.

Run all the tests in the project by right-clicking on the test folder and selecting Run As → JUnit Test.

Before you’re done

Make sure you have documented specifications, in the form of properly-formatted Javadoc comments, for all your types and operations.
Make sure you have documented abstraction functions and representation invariants, in the form of a comment near the field declarations, for all your implementations.

With the rep invariant, also say how the type prevents rep exposure.

Make sure all types use checkRep to check the rep invariant and implement toString with a useful human-readable representation of the abstract value.
To gain static checking of the correct signature, use @Override when you override toString.

Also use @Override when a class implements an interface method, to remind readers where they can find the spec.
Make sure you have a thorough, principled test suite for every type.

Submitting

Make sure you commit AND push your work to your repository on Athena. We will use the state of your repository on Athena as of 10:00pm on the deadline date. When you git push, the continuous build system attempts to compile your code and run some basic tests. You can always review your build results at didit.csail.mit.edu.

Didit feedback is provided on a best-effort basis:

There is no guarantee that Didit tests will run within any particular timeframe, or at all. If you push code close to the deadline, the large number of submissions will slow the turnaround time before your code is examined.
If you commit and push right before the deadline, the Didit build does not have to complete in order for that commit to be graded.
Passing some or all of the public tests on Didit is no guarantee that you will pass the full battery of autograding tests — but failing them is almost sure to mean lost points on the problem set.

Grading

Your overall ps2 grade will be computed as approximately:
~35% alpha autograde + ~5% alpha manual grade + ~45% beta autograde + ~15% beta manual grade

The autograder tests will not change from alpha to beta.

Manual grading of the alpha will only examine the internal documentation (AF, RI, etc.) and implementation of your ADTs. Manual grading of the beta may examine any part, including re-examining ADT implementations, and how you addressed code review feedback.

Problem Set 2: Multi-Startup Set

Get the code

Overview

Interval sets

Multi-interval sets

Similarity between multi-interval sets

Problem 1: Test IntervalSet​<String>

Problem 2: Implement IntervalSet​<String>

2.1. Implement RepMapIntervalSet

2.2. Implement RepListIntervalSet

Problem 3: Implement generic IntervalSet<L>

3.1. Make the implementations generic

3.2. Test other types of labels

3.3. Implement IntervalSet.empty()

Problem 4: Multi­IntervalSet<L>

4.1 Test MultiIntervalSet

4.2 Implement MultiIntervalSet

Problem 5: Fame and fortune with IntervalSet

5.1 Specify and test Similarity

5.2 Implement Similarity

Before you’re done

Submitting

Grading

Problem 1: Test `IntervalSet<String>`

Problem 2: Implement `IntervalSet<String>`

2.1. Implement `RepMapIntervalSet`

2.2. Implement `RepListIntervalSet`

Problem 3: Implement generic `IntervalSet<L>`

3.3. Implement `IntervalSet.empty()`

Problem 4: `MultiIntervalSet<L>`

4.1 Test `MultiIntervalSet`

4.2 Implement `MultiIntervalSet`

Problem 5: Fame and fortune with `IntervalSet`

5.1 Specify and test `Similarity`

5.2 Implement `Similarity`