Problem Set 2: Multi-Startup Set
The purpose of this problem set is to practice designing, testing, and implementing abstract data types. This problem set focuses on implementing mutable types where the specifications are provided to you; the next problem set will focus on immutable types where you must choose specifications.
On several parts of this problem set, the classes and methods will be yours to specify and create, but you must pay attention to the PS2 instructions sections in the provided documentation.
You must satisfy the specifications of the provided interfaces and classes. In some cases, you are permitted to strengthen the provided specifications or add new methods, but in other cases you are not permitted to change them at all.
On this problem set, Didit provides less feedback about the correctness of your code:
- It is your responsibility to examine Didit feedback and make sure your code compiles and runs properly for grading.
- However, correctness is your responsibility alone, and you must rely on your own careful specification, testing, and implementation to achieve it.
Please remember to push early: we cannot guarantee timely Didit builds, especially near the problem set deadline.
Get the code
Ask Didit to create a remote
psets/ps2
repository for you on GitHub.mit.Clone the repo. Find the
ssh://
URL at the top of your Didit assignment page and run: git clone «URL»Import into Eclipse. See Problem Set 0 for if you need a refresher on how to create, clone, or import your repository.
Overview
Java Collections Framework provides many useful data structures for working with collections of objects: lists, maps, queues, sets, and so on. Let’s use some of these building blocks to build a data structure of our own, and then build one very small application of that ADT.
Interval sets · Multi-interval sets · Similarity between multi-interval sets
Interval sets
Problems 1-3: we will implement the interval set abstract type, a mutable set of unique labels where each label is associated with a non-overlapping interval on the number line.
For example, we can picture the interval set { A = [0,5), B = [10,30), C = [30,35) } as:
In this example, the labels are the unique String
objects "A"
, "B"
, and "C"
; and each is associated with an interval, which in our type will be an interval of long
values.
Every interval is half-open (also called half-closed), meaning that it includes its start point but not its end point, so [0,5) represents { x ∈ ℝ | 0 ≤ x < 5 }.
Read the Javadoc documentation for IntervalSet
generated from src/interval/IntervalSet.java
.
Labels do not have to have String
type, but can have any immutable type.
IntervalSet<L>
is a generic type similar to List<E>
or Map<K,V>
.
In the specification of List
, E
is a placeholder for the type of elements in the list.
In Map
, K
and V
are placeholders for the types of keys and values.
Only later does the client of List
or Map
choose particular types for these placeholders by constructing, for example, a List<Clown>
or a Map<Car,Integer>
.
The specification of a generic type is written in terms of the placeholders.
For example, the specification of Map<K,V>
says that K
must be an immutable type: if you want to make something the key in a Map
, it shouldn’t be a mutable object because the Map
may not work correctly.
In our specification for IntervalSet<L>
, we make the same demand about the immutability of type L
.
Clients of IntervalSet
who try to use mutable labels have violated the precondition.
They cannot expect correct behavior.
For this problem set, we will implement IntervalSet
twice, with two different reps, to practice choosing abstraction functions and rep invariants and preventing rep exposure.
There are many good reasons for a library to provide multiple implementations of a type (for example, the ArrayList
and LinkedList
implementations of List
satisfy clients with different performance requirements), and we’re following that model.
Multi-interval sets
Problem 4: using interval sets, we will implement the multi-interval set type, a mutable set of labels where each label is associated with one or more globally-non-overlapping intervals.
For example, { A = [[0,5),[20,25)], B = [[10,20),[25,30)], C = [[30,35)] }:
Once again, the labels are "A"
, "B"
, and "C"
.
Label "C"
is associated with one interval, labels "A"
and "B"
each with two.
Read the Javadoc documentation for MultiIntervalSet
generated from src/interval/MultiIntervalSet.java
.
Similarity between multi-interval sets
Problem 5: with our multi-interval set datatype in hand, the MVP for your new “” startup needs one crucial piece of proprietary technology — a way to measure the similarity between multi-interval sets.
For example, given { A = [[0,5),[20,25)], B = [[10,20),[25,30)] } and { A = [[20,35)], B = [[10,20)], C = [[0,5)] }:
We might consider an interval completely similar (value 1) when the labels are equal (e.g. on [10,20) and [20,25)) and dissimilar (value 0) otherwise.
Similarity will be defined as a ratio of the extent spanned by the two multi-interval sets, so the similarity between these two sets with the binary definition of label similarity is:
( 5×0 + 5×0 + 10×1 + 5×1 + 5×0 + 5×0 ) ÷ ( 35 - 0 ) = 15 / 35 ≈ 0.42857
(Hover or tap on individual terms to highlight that part of the figure above.)
However, our Similarity
type will support a client-supplied definition of label similarity, so for example the user might specify that "A"
and "C"
have similarity 0 but "A"
and "B"
are closer, similarity ½.
Using that definition:
( 5×0 + 5×0 + 10×1 + 5×1 + 5×0.5 + 5×0 ) ÷ ( 35 - 0 ) = 17.5 / 35 = 0.5
Clients of Similarity
define label similarity by providing a file: for each pair of labels in the file, the client specifies a numeric similarity between 0 and 1.
The similarity between all other pairs of labels is 0.
Read the Javadoc documentation for Similarity
generated from src/startup/Similarity.java
.
The file format is specified in the Similarity(File)
constructor Javadoc.
Problem 1: Test IntervalSet<String>
Devise, document, and implement tests for IntervalSet<String>
.
For now, we’ll only test (and then implement) interval sets with String
labels.
Later, we’ll expand to other kinds of labels.
In order to accommodate running our tests on multiple implementations of the IntervalSet
interface, here is the setup:
The testing strategy and tests for the static
IntervalSet.empty()
method are inIntervalSetEmptyTest.java
. Since the method is static, there will be only one implementation, and we only need to run these tests once. We’ve provided these tests. You are free to change or add to them, but you can leave them as-is for this problem.Write your testing strategy and your tests for all the instance methods in
IntervalSetTest.java
. In these tests, you must use theemptyInstance()
method to get fresh empty sets, notIntervalSet.empty()
! See the providedtestInitialLabelsEmpty()
for an example.
Java note
IntervalSetTest
is an abstract class.
Abstract classes and subclassing have their uses, but in general should be avoided.
Unlike IntervalSetEmptyTest
, which works like any other JUnit test class we’ve written, IntervalSetTest
is different because you can’t run it directly.
It has a blank waiting to be filled in: the emptyInstance()
method that will provide empty IntervalSet
objects.
In the next problem, we’ll see two subclasses of IntervalSetTest
that fill in the blank by returning empty sets of different types.
Your tests in IntervalSetTest
must be legal clients of the IntervalSet
spec.
Any tests specific to your implementations will go in the subclasses in the next problem.
Refer to Abstract Data Types: Testing an abstract data type.
When testing instance methods, remember to partition the state of input this
.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
Problem 2: Implement IntervalSet<String>
Now we’ll implement interval sets with String
labels — twice.
For all the classes you write in this problem set:
- Document the abstraction function and representation invariant.
- Along with the rep invariant, document how the type prevents rep exposure.
- Implement
checkRep
to check the rep invariant. - Implement
toString
with a useful human-readable representation of the abstract value.
All your classes must have clear documented specifications.
This means every method will have a Javadoc comment, except when you use @Override
:
- When a method is specified in an interface and then implemented by a concrete class — for example,
insert(..)
inRepMapIntervalSet
is specified inIntervalSet
— the@Override
annotation with no Javadoc comment indicates the method has the same spec. Unless the concrete class needs to strengthen the spec, don’t write the spec again (DRY).
You can choose either RepMapIntervalSet
or RepListIntervalSet
to write first.
There should be no dependence or sharing of code between your two implementations.
2.1. Implement RepMapIntervalSet
For RepMapIntervalSet
, you must use the rep provided:
private final Map<String, Long> startMap = new HashMap<>();
private final Map<String, Long> endMap = new HashMap<>();
You may not add fields to the rep or choose not to use one of the fields.
You must use the names startMap
and endMap
, even if they are not the names you would choose.
The no-argument constructor must create an empty interval set (you are free to add other constructors that take arguments).
Be sure to use the @Override
annotation on toString
to ensure that you are correctly overriding the Object
version of the method, not creating a new, different method.
You should strengthen the spec of RepMapIntervalSet.toString()
and write tests for it in RepMapIntervalSetTest.java
.
Don’t add those tests to IntervalSetTest
, which must test the IntervalSet
spec.
All Java classes inherit Object.toString()
with its very underdetermined spec.
IntervalSet
doesn’t specify a stronger spec, but your concrete classes can.
The choice not to strengthen the spec of toString
in the IntervalSet
interface is deliberate: since toString
is intended for human consumption, keeping its spec weak helps prevent other code from depending on its particular format.
Run the tests by right-clicking on RepMapIntervalSetTest.java
and selecting Run As → JUnit Test.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
2.2. Implement RepListIntervalSet
For RepListIntervalSet
, you must use the rep provided:
private final List<String> labelList = new ArrayList<>();
private final List<Long> valueList = new ArrayList<>();
You may not add fields to the rep or choose not to use one of the fields.
You must use the names labelList
and valueList
, even if they are not the names you would choose.
The no-argument constructor must create an empty interval set (you are free to add other constructors that take arguments).
You should strengthen the spec of RepListIntervalSet.toString()
and write tests for it in RepListIntervalSetTest.java
.
Don’t add those tests to IntervalSetTest
, which must test the IntervalSet
spec.
Run the tests by right-clicking on RepListIntervalSetTest.java
and selecting Run As → JUnit Test.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
Problem 3: Implement generic IntervalSet<L>
Nothing in either of your implementations of IntervalSet
should rely on the fact that labels are of type String
in particular.
The spec says that labels “must be immutable” and are “compared using equals
.”
Let’s change both of our implementations to support labels of any type that meets these conditions.
3.1. Make the implementations generic
Change the declarations of your concrete classes to read:
public class RepMapIntervalSet<L> implements IntervalSet<L> { ... }
and:
public class RepListIntervalSet<L> implements IntervalSet<L> { ... }
Update both of your implementations to support any type of label, using placeholder
L
instead ofString
. Roughly speaking, you should be able to find-and-replace all the instances ofString
in your code withL
!This refactoring will make
RepMapIntervalSet
andRepListIntervalSet
generic types.Any place you previously referred to these types without a generic parameter, you will need to add it. For example, if you called a constructor like
new RepMapIntervalSet()
, that will need to becomenew RepMapIntervalSet<String>()
. Depending on context, you can use diamond notation<>
to avoid writing type parameters twice.When you’re done with the conversion, all your instance method tests should pass.
Commit to Git. As always, once you’re happy with your solution to this problem, commit and push!
3.2. Test other types of labels
Because the implementation of IntervalSet
does not know or care about the actual type of the labels, and only compares them according to the IntervalSet
specification, your test suite with String
labels is sufficient.
However, since this is our first generic type, to gain confidence that you do indeed support different types of labels, modify or add to your tests of toString()
in RepMap-
and RepListIntervalSetTest
so that they create and use RepMap-
and RepListIntervalSet
instances (respectively) with a couple different types of (immutable) labels.
Note: you cannot write tests with other types of labels in IntervalSetTest
, because those tests must use the emptyInstance()
method to get new sets.
That method always returns IntervalSet<String>
.
3.3. Implement IntervalSet.empty()
Pick one of your IntervalSet
implementations for clients to use, and implement IntervalSet.empty()
.
Unlike ArrayList
and LinkedList
, where clients who only want a List
are forced to break the abstraction barrier by calling a concrete constructor, clients of IntervalSet
should not be aware of how we’ve implemented it.
Java note
You may use @SafeVarargs
where necessary, for example if you write testing helper functions that take a variable number of arguments.
At this point, all of your interval set code (implementations and tests) must have no warnings from the compiler (warnings have a symbol, as opposed to the for errors) and no @SuppressWarnings
annotations (which would disable the warning, nice try).
If you cannot figure out how to make your implementations generic while satisfying the Java compiler’s static type checking, please visit lab or office hours, or ask a question on Piazza.
At this point in the problem set, we’ve completed the test-first implementation of IntervalSet
.
RepMap-
andRepListIntervalSetTest
should have a green bar from JUnit and excellent code coverage from the coverage tool.This is a good opportunity for self code review: remove dead code and debugging
println
s, DRY up duplication, and so on.
Problem 4: MultiIntervalSet<L>
Just as we implemented the IntervalSet
ADT by relying on other ADTs as building blocks, we can use IntervalSet
as part of the specification and as part of the representation of another ADT.
4.1 Test MultiIntervalSet
Devise, document, and implement tests for MultiIntervalSet
in MultiIntervalSetTest.java
.
MultiIntervalSet
does not strengthen the spec of toString()
.
If you wish to test a stronger spec of toString()
that your implementation will satisfy, put those tests in a MyMultiIntervalSetTest
class in the interval
package in the test
folder alongside the other JUnit test classes.
Your tests in MultiIntervalSetTest
must be legal clients of the MultiIntervalSet
spec.
Java note
If you define a helper type to use in the rep of MultiIntervalSet
, we require that class be declared in MultiIntervalSet.java
.
And if your helper type is immutable, be careful: not implementing equality as described in class 15 will mean you cannot use equals
to compare objects of your helper type.
4.2 Implement MultiIntervalSet
Implement MultiIntervalSet
in MultiIntervalSet.java
.
You must use IntervalSet
in the rep of MultiIntervalSet
.
For example, IntervalSet
instances might be elements in a list or values in a map.
Your MultiIntervalSet
should rely only on the specs of IntervalSet
, not on the details of a particular IntervalSet
implementation, and it should use the static empty()
method to obtain interval set instances.
The implementation of MultiIntervalSet
is otherwise entirely up to you.
Run all the IntervalSet
and MultiIntervalSet
tests by right-clicking on the interval
package in the test
folder and selecting Run As → JUnit Test.
Problem 5: Fame and fortune with IntervalSet
You are the founder and CTO of , the hottest new startup that just called “ for , but .” In your MVP, you’re leveraging the most advanced artificial intelligence algorithms and machine learning architecture to deliver unprecedented breakthrough results in the world of .
Of course, that’s all a bunch of . As CTO, you realize it all comes down to managing multi-interval sets that represent . works by comparing . You need to develop tools for measuring the similarity of multi-interval sets: completely different will score 0, with the score approaching 1 as they become more alike.
And best of all, when you pivot your startup to and rebrand as , the startup in , you’ll use those multi-interval sets to represent ! In the new startup, you’re going to compare — so your success in the startup world all comes down to this one problem.
5.1 Specify and test Similarity
Devise, document, and implement tests for Similarity
in SimilarityTest.java
.
You decide whether Similarity
is mutable or immutable, and you should modify the class Javadoc to state that part of the spec.
In the future, every immutable type you write will override equals
and hashCode
to implement the Object
contract as described in class 15 on equality, but that’s not required for this problem set.
You are free to add additional methods to Similarity
, but you may not change the signatures of the required methods.
You are free to strengthen the specifications of the required methods, but you may not weaken them.
Note how this might mean your tests for Similarity
are not usable against someone else’s implementation, because you’ve strengthened or added to the spec.
Your tests should make use of one or more sample definition files.
Put those files in the test/startup
directory — the same directory as SimilarityTest.java
.
Reference them in the code with, for example:
new File("test/startup/.txt")
While you still must test and implement Similarity(File)
, remember that you may add operations to Similarity
.
Adding a creator that takes a String
or Map
, for example, might simplify your tests for similarity(..)
.
5.2 Implement Similarity
Implement Similarity
in Similarity.java
.
Your Similarity
type should rely only on the specs of MultiIntervalSet
, not on the details of any particular implementation.
The implementation of Similarity
is entirely up to you.
For reading in the label similarity definition files, there are at least three Java APIs to consider:
Java note
Another option, if you want to program in a functional style: use Files.lines(..)
, which returns a Stream<String>
.
FileReader
is the standard class for reading from a file. Wrapping theFileReader
in aBufferedReader
allows you to use the convenientreadLine()
method to read whole lines at a time.The utility function
Files.readAllLines(..)
will read all the lines from aPath
. Our constructor takes aFile
, butFile
provides atoPath()
method to obtain aPath
.Class
Scanner
is designed for breaking up input text into pieces. It provides many features and you’ll need to read its specs carefully.
Run all the tests in the project by right-clicking on the test
folder and selecting Run As → JUnit Test.
Before you’re done
Make sure you have documented specifications, in the form of properly-formatted Javadoc comments, for all your types and operations.
Make sure you have documented abstraction functions and representation invariants, in the form of a comment near the field declarations, for all your implementations.
With the rep invariant, also say how the type prevents rep exposure.
Make sure all types use
checkRep
to check the rep invariant and implementtoString
with a useful human-readable representation of the abstract value.To gain static checking of the correct signature, use
@Override
when you overridetoString
.Also use
@Override
when a class implements an interface method, to remind readers where they can find the spec.Make sure you have a thorough, principled test suite for every type.
Submitting
Make sure you commit AND push your work to your repository on Athena.
We will use the state of your repository on Athena as of 10:00pm on the deadline date.
When you git push
, the continuous build system attempts to compile your code and run some basic tests.
You can always review your build results at didit.csail.mit.edu/6.031/fa18.
Didit feedback is provided on a best-effort basis:
- There is no guarantee that Didit tests will run within any particular timeframe, or at all. If you push code close to the deadline, the large number of submissions will slow the turnaround time before your code is examined.
- If you commit and push right before the deadline, the Didit build does not have to complete in order for that commit to be graded.
- Passing some or all of the public tests on Didit is no guarantee that you will pass the full battery of autograding tests — but failing them is almost sure to mean lost points on the problem set.
Grading
Your overall ps2 grade will be computed as approximately:
~35% alpha autograde + ~5% alpha manual grade + ~45% beta autograde + ~15% beta manual grade
The autograder tests will not change from alpha to beta.
Manual grading of the alpha will only examine the internal documentation (AF, RI, etc.) and implementation of your ADTs. Manual grading of the beta may examine any part, including re-examining ADT implementations, and how you addressed code review feedback.