Problem Set 2: Multi-Startup Set
- Alpha due
- Wednesday, March 24, 2021, 10:00 pm
- Code reviews due
- Sunday, March 28, 2021, 2:00 pm
- Beta due
- Wednesday, March 31, 2021, 10:00 pm
The purpose of this problem set is to practice designing, testing, and implementing abstract data types. This problem set focuses on implementing mutable types where the specifications are provided to you; the next problem set will focus on immutable types where you must choose specifications.
On several parts of this problem set, the classes and methods will be yours to specify and create, but you must pay attention to the PS2 instructions sections in the provided documentation.
You must satisfy the specifications of the provided interfaces and classes.
On the last problem, you are permitted to strengthen some provided specifications or add new methods to the Similarity
class, but otherwise you are not permitted to change the specifications at all.
On this problem set, Didit provides less feedback about the correctness of your code:
- It is your responsibility to examine Didit feedback and make sure your code compiles and runs properly for grading.
- However, correctness is your responsibility alone, and you must rely on your own careful specification, testing, and implementation to achieve it.
Please remember to push early: we cannot guarantee timely Didit builds, especially near the problem set deadline.
Get the code
- Ask Didit to create a remote psets/ps2 repository for you on github.mit.edu.
- Clone the repo. Find the git clone command at the top of your Didit assignment page, copy it entirely, paste it into your terminal, and run it.
- Import into Eclipse.
See Problem Set 0 if you need a refresher on how to create, clone, or import your repository.
Overview
Java Collections Framework provides many useful data structures for working with collections of objects: lists, maps, queues, sets, and so on. Let’s use some of these building blocks to build a data structure of our own, and then build one small application of that ADT.
Interval sets · Multi-interval sets · Similarity between multi-interval sets
Interval sets
Problems 1-3: we will implement the interval set abstract type, a mutable set of unique labels where each label is associated with a non-overlapping interval on the number line.
For example, we can picture the interval set { A = [0,5), B = [10,30), C = [30,35) } as:
In this example, the labels are the unique String
objects "A"
, "B"
, and "C"
; and each is associated with an interval, which in our type will be an interval of long
values.
Every interval is half-open (also called half-closed), meaning that it includes its start point but not its end point, so [0,5) represents { x ∈ ℝ | 0 ≤ x < 5 }.
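As a minimal sketch of the half-open convention, a containment and overlap check on long endpoints might look like the following. The class and method names are hypothetical and not part of the provided code, and the sketch assumes every interval has start < end.

```java
// Hypothetical helper, not part of the provided problem set code.
final class HalfOpenDemo {

    /** Returns true if x lies in the half-open interval [start, end). */
    static boolean contains(long start, long end, long x) {
        return start <= x && x < end; // start is included, end is excluded
    }

    /** Returns true if [s1, e1) and [s2, e2) share at least one point (assumes s1 < e1, s2 < e2). */
    static boolean overlaps(long s1, long e1, long s2, long e2) {
        return s1 < e2 && s2 < e1;
    }

    public static void main(String[] args) {
        System.out.println(contains(0, 5, 0));        // true: start point is included
        System.out.println(contains(0, 5, 5));        // false: end point is excluded
        System.out.println(overlaps(10, 30, 30, 35)); // false: B and C only touch at 30
    }
}
```

Because the end point is excluded, B = [10,30) and C = [30,35) in the example above are adjacent but do not overlap.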
Read the Javadoc documentation for IntervalSet
generated from src/interval/IntervalSet.java
.
Labels do not have to have String
type, but can have any immutable type.
IntervalSet<L>
is a generic type similar to List<E>
or Map<K,V>
.
In the specification of List
, E
is a placeholder for the type of elements in the list.
In Map
, K
and V
are placeholders for the types of keys and values.
Only later does the client of List
or Map
choose particular types for these placeholders by constructing, for example, a List<Clown>
or a Map<Car,Integer>
.
The specification of a generic type is written in terms of the placeholders.
For example, the specification of Map<K,V>
says that K
must be an immutable type: if you want to make something the key in a Map
, it shouldn’t be a mutable object because the Map
may not work correctly.
In our specification for IntervalSet<L>
, we make the same demand about the immutability of type L
.
Clients of IntervalSet
who try to use mutable labels have violated the precondition.
They cannot expect correct behavior.
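To see what can go wrong, here is a small sketch using a standard HashMap with a mutable key (an ArrayList). The class name is made up for illustration. After the key is mutated, the map can no longer find its own entry:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: shows a HashMap misbehaving when a key is mutated.
final class MutableKeyDemo {

    /** Puts a mutable list into a map as a key, mutates it, and looks it up again. */
    static boolean foundAfterMutation() {
        Map<List<String>, Integer> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, 1);   // stored under the hash code of ["a"]

        key.add("b");      // mutating the key changes its hash code

        // The entry is still in the map, but lookup by the same object now fails.
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```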
For this problem set, we will implement IntervalSet
twice, with two different reps, to practice choosing abstraction functions and rep invariants and preventing rep exposure.
There are many good reasons for a library to provide multiple implementations of a type (for example, the ArrayList
and LinkedList
implementations of List
satisfy clients with different performance requirements), and we’re following that model.
Multi-interval sets
Problem 4: using interval sets, we will implement the multi-interval set type, a mutable set of labels where each label is associated with one or more globally-non-overlapping intervals.
For example, { A = [[0,5),[20,25)], B = [[10,20),[25,30)], C = [[30,35)] }:
Once again, the labels are "A"
, "B"
, and "C"
.
Label "C"
is associated with one interval, labels "A"
and "B"
each with two.
Read the Javadoc documentation for MultiIntervalSet
generated from src/interval/MultiIntervalSet.java
.
Similarity between multi-interval sets
Problem 5: with our multi-interval set datatype in hand, the MVP for your new startup needs one crucial piece of proprietary technology: a way to measure the similarity between multi-interval sets.
For example, given { A = [[0,5),[20,25)], B = [[10,20),[25,30)] } and { A = [[20,35)], B = [[10,20)], C = [[0,5)] }:
We might consider an interval on the number line completely similar (value 1) when the labels are equal (e.g. on [10,20) and [20,25)) and dissimilar (value 0) otherwise.
Similarity will be defined as a ratio of the extent spanned by the two multi-interval sets, so the similarity between these two sets with the binary definition of label similarity is:
( 5×0 + 5×0 + 10×1 + 5×1 + 5×0 + 5×0 ) ÷ ( 35 - 0 ) = 15 / 35 ≈ 0.42857
However, our similarity
function will support a client-supplied definition of label similarity, so for example the user might specify that "A"
and "C"
have similarity 0 but "A"
and "B"
are closer, similarity ½.
Using that definition:
( 5×0 + 5×0 + 10×1 + 5×1 + 5×0.5 + 5×0 ) ÷ ( 35 - 0 ) = 17.5 / 35 = 0.5
Clients of similarity
define label similarity by providing a list of pairs of labels with their numeric similarity between 0 and 1.
The similarity between all other pairs of labels is 0.
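The two worked examples above can be reproduced with a short sketch. Everything here is an assumption made for illustration: the Span class, the list-of-spans representation, and the method names are hypothetical, and this is neither the MultiIntervalSet type nor the Similarity API. The sketch also assumes both inputs are non-empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.ToDoubleBiFunction;

// Illustration only: hypothetical names, not the Similarity API from the problem set.
final class SimilaritySketch {

    /** One labeled half-open interval [start, end). */
    static final class Span {
        final long start, end;
        final String label;
        Span(long start, long end, String label) {
            this.start = start; this.end = end; this.label = label;
        }
    }

    /** Returns the label covering point x, or null if no span covers it. */
    static String labelAt(List<Span> spans, long x) {
        for (Span s : spans) {
            if (s.start <= x && x < s.end) { return s.label; }
        }
        return null;
    }

    /** Binary label similarity: 1 for equal labels, 0 otherwise. */
    static double binary(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    /** Like binary, but "A" and "B" (in either order) have similarity 0.5. */
    static double custom(String a, String b) {
        if (a.equals(b)) { return 1.0; }
        boolean ab = (a.equals("A") && b.equals("B")) || (a.equals("B") && b.equals("A"));
        return ab ? 0.5 : 0.0;
    }

    /** Weighted overlap divided by the total extent; assumes both lists are non-empty. */
    static double similarity(List<Span> s1, List<Span> s2,
                             ToDoubleBiFunction<String, String> labelSim) {
        // Collect every endpoint; between consecutive endpoints the labels do not change.
        TreeSet<Long> points = new TreeSet<>();
        for (Span s : s1) { points.add(s.start); points.add(s.end); }
        for (Span s : s2) { points.add(s.start); points.add(s.end); }

        double weighted = 0;
        Long prev = null;
        for (long p : points) {
            if (prev != null) {
                String a = labelAt(s1, prev);
                String b = labelAt(s2, prev);
                if (a != null && b != null) { // unlabeled stretches contribute 0
                    weighted += (p - prev) * labelSim.applyAsDouble(a, b);
                }
            }
            prev = p;
        }
        return weighted / (points.last() - points.first());
    }

    /** The first example set: A = [[0,5),[20,25)], B = [[10,20),[25,30)]. */
    static List<Span> first() {
        return Arrays.asList(new Span(0, 5, "A"), new Span(20, 25, "A"),
                             new Span(10, 20, "B"), new Span(25, 30, "B"));
    }

    /** The second example set: A = [[20,35)], B = [[10,20)], C = [[0,5)]. */
    static List<Span> second() {
        return Arrays.asList(new Span(20, 35, "A"), new Span(10, 20, "B"),
                             new Span(0, 5, "C"));
    }

    public static void main(String[] args) {
        System.out.println(similarity(first(), second(), SimilaritySketch::binary)); // 15/35
        System.out.println(similarity(first(), second(), SimilaritySketch::custom)); // 17.5/35 = 0.5
    }
}
```

The sketch walks the number line one segment at a time, which mirrors the term-by-term sums shown above. The real Similarity class takes its label similarities as described in its Javadoc.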
Read the Javadoc documentation for Similarity
generated from src/startup/Similarity.java
.
Problem 1: Test IntervalSet<String>
Devise, document, and implement tests for IntervalSet<String>
.
For now, we’ll only test (and then implement) interval sets with String
labels.
Later, we’ll implement a generic IntervalSet<L>
that can handle other kinds of labels too.
In order to accommodate running our tests on multiple implementations of the IntervalSet
interface, here is the setup:
- The testing strategy and tests for the static IntervalSet.empty() method are in EmptyTest.java. Since the method is static, there will be only one implementation, and we only need to run these tests once. We’ve provided these tests and you do not need to change them.
- Write your testing strategy and your tests for the instance methods in IntervalSetTest.java. In these tests, you must use the emptyInstance() method to get fresh empty sets, not IntervalSet.empty()! See the provided testInitialLabelsEmpty() for an example.
Java note
IntervalSetTest
is an abstract class.
Abstract classes and subclassing have their uses, but in general should be avoided.
Unlike EmptyTest
, which works like any other JUnit test class we’ve written, IntervalSetTest
is different because you can’t run it directly.
It has a blank waiting to be filled in: the emptyInstance()
method that will provide empty IntervalSet
objects.
In the next problem, we’ll see two subclasses of IntervalSetTest
that fill in the blank by returning empty sets of different types.
Your tests in IntervalSetTest
must be legal clients of the IntervalSet
spec.
Any tests specific to your implementations will go in the subclasses in the next problem.
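The "blank to be filled in" pattern can be sketched in plain Java, using List from the standard library in place of IntervalSet. The class names below are hypothetical, and the sketch deliberately avoids JUnit so that it is self-contained. The real IntervalSetTest uses JUnit and the provided emptyInstance() method.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustration only: a plain-Java analogue of an abstract test class (no JUnit).
abstract class ListContractTest {

    /** The "blank" that each subclass fills in with a fresh empty list. */
    abstract List<String> emptyInstance();

    /** A test written only against the List spec, so it works for any implementation. */
    void testAddThenContains() {
        List<String> list = emptyInstance();
        list.add("A");
        if (!list.contains("A") || list.size() != 1) {
            throw new AssertionError("add/contains violated the List spec");
        }
    }

    public static void main(String[] args) {
        // The same inherited test runs once per implementation.
        new ArrayListContractTest().testAddThenContains();
        new LinkedListContractTest().testAddThenContains();
        System.out.println("both implementations passed");
    }
}

final class ArrayListContractTest extends ListContractTest {
    @Override List<String> emptyInstance() { return new ArrayList<>(); }
}

final class LinkedListContractTest extends ListContractTest {
    @Override List<String> emptyInstance() { return new LinkedList<>(); }
}
```

The tests in the abstract class never name a concrete class, which is what keeps them legal clients of the interface spec.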
Refer to Abstract Data Types: Testing an abstract data type.
When testing instance methods, remember to partition the state of input this
.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
Problem 2: Implement IntervalSet<String>
Now we’ll implement interval sets with String
labels — twice.
For all the classes you write in this problem set:
- Document the abstraction function and representation invariant.
- And along with the rep invariant, document how the type prevents rep exposure. See Documenting the AF, RI, & SRE.
- Implement checkRep to check the rep invariant.
- Implement toString with a useful human-readable representation of the abstract value.
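As one possible shape for this documentation, here is a sketch on a small hypothetical type, Tally, which is unrelated to the classes in this problem set. The comment layout is only a suggestion; follow Documenting the AF, RI, & SRE for the expected form.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** A mutable multiset of labels. Hypothetical example type, not one of the provided classes. */
final class Tally {

    private final Map<String, Integer> counts = new HashMap<>();

    // Abstraction function:
    //   AF(counts) = the multiset in which each key k of counts occurs counts.get(k) times
    // Representation invariant:
    //   every value in counts is > 0
    // Safety from rep exposure:
    //   counts is private and final, and is never returned to clients;
    //   labels() returns a fresh copy of the key set; String keys are immutable

    /** Checks the rep invariant. */
    private void checkRep() {
        for (int n : counts.values()) {
            if (n <= 0) { throw new AssertionError("count must be positive: " + n); }
        }
    }

    /** Adds one occurrence of label to this multiset. */
    public void add(String label) {
        counts.merge(label, 1, Integer::sum);
        checkRep();
    }

    /** Returns the number of occurrences of label, 0 if absent. */
    public int count(String label) {
        return counts.getOrDefault(label, 0);
    }

    /** Returns a new set of the distinct labels; changing it does not affect this. */
    public Set<String> labels() {
        return new HashSet<>(counts.keySet());
    }

    @Override public String toString() {
        return "Tally" + counts;
    }

    public static void main(String[] args) {
        Tally t = new Tally();
        t.add("A"); t.add("A"); t.add("B");
        t.labels().clear();               // mutates only the copy
        System.out.println(t.count("A")); // 2
    }
}
```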
All your classes must have clear documented specifications.
This means every method will have a Javadoc comment, except when you use @Override
:
- When a method is specified in an interface and then implemented by a concrete class (for example, insert(..) in RepMapIntervalSet is specified in IntervalSet), the @Override annotation with no Javadoc comment indicates the method has the same spec. Unless the concrete class needs to strengthen the spec, don’t write the spec again (DRY).
You can choose either RepMapIntervalSet
or RepListIntervalSet
to write first.
For this exercise in ADT building, there must be no dependence or sharing of code between your two implementations. You may not, for example, create shared helper methods used by both classes.
2.1. Implement RepMapIntervalSet
For RepMapIntervalSet
, you must use the rep provided:
private final Map<String, Long> startMap = new HashMap<>();
private final Map<Long, Long> endMap = new HashMap<>();
You may not add fields to the rep or choose not to use one of the fields.
You must use the names startMap
and endMap
, even if they are not the names you would choose.
The no-argument constructor must create an empty interval set (you are free to add other constructors that take arguments).
Using the @Override
annotation on toString
ensures that you are correctly overriding the Object
version of that method, not creating a new, different method.
All Java classes inherit Object.toString()
with its very underdetermined spec.
By using @Override
and no Javadoc comment, RepMapIntervalSet
keeps that same spec.
The choice not to strengthen the spec is deliberate: since toString
is intended for human consumption, keeping its spec weak helps prevent other code from depending on its particular format.
Because of this very weak spec, you do not need to test toString
.
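A minimal sketch of what @Override buys you, on a hypothetical class that is not part of the provided code:

```java
// Illustration only: a hypothetical class showing the effect of @Override on toString.
final class OverrideDemo {

    private final long start;
    private final long end;

    OverrideDemo(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // With @Override, misspelling the name (say, tostring) or adding a parameter
    // is a compile-time error instead of silently declaring a new, unrelated method.
    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }

    public static void main(String[] args) {
        // String concatenation calls toString(), so the override is what gets used.
        System.out.println("interval: " + new OverrideDemo(0, 5)); // interval: [0,5)
    }
}
```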
Run your IntervalSetTest
tests on RepMapIntervalSet
by right-clicking on RepMapIntervalSetTest.java
and selecting Run As → JUnit Test.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
2.2. Implement RepListIntervalSet
For RepListIntervalSet
, you must use the rep provided:
private final List<String> labelList = new ArrayList<>();
private final List<Long> valueList = new ArrayList<>();
You may not add fields to the rep or choose not to use one of the fields.
You must use the names labelList
and valueList
, even if they are not the names you would choose.
The no-argument constructor must create an empty interval set (you are free to add other constructors that take arguments).
As mentioned above, RepMap-
and RepListIntervalSet
may not share any code.
Run the tests by right-clicking on RepListIntervalSetTest.java
and selecting Run As → JUnit Test.
Commit to Git. Once you’re happy with your solution to this problem, commit and push!
Problem 3: Implement generic IntervalSet<L>
Nothing in either of your implementations of IntervalSet
should rely on the fact that labels are of type String
in particular.
The spec says that labels “must be immutable” and are “compared using equals
.”
Let’s change both of our implementations to support labels of any type that meets these conditions.
3.1. Make the implementations generic
Change the declarations of your concrete classes to read:
public class RepMapIntervalSet<L> implements IntervalSet<L> { ... }
and:
public class RepListIntervalSet<L> implements IntervalSet<L> { ... }
Update both of your implementations to support any type of label, using placeholder L instead of String. Roughly speaking, you should be able to find-and-replace all the instances of String in your code with L!
This refactoring will make RepMapIntervalSet and RepListIntervalSet generic types.
Any place you previously referred to these types without a generic parameter, you will need to add it. For example, if you called a constructor like new RepMapIntervalSet(), that will need to become new RepMapIntervalSet<String>(). Depending on context, you can use diamond notation <> to avoid writing type parameters twice.
When you’re done with the conversion, all your instance method tests should pass.
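The same kind of conversion can be sketched on a small hypothetical class, LabelBag, which is not one of the provided types. The comments mark where String would have appeared before the change:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: LabelBag is a hypothetical class.
// Before the change it would have been declared "class LabelBag" with List<String> inside.
final class LabelBag<L> {

    private final List<L> labels = new ArrayList<>(); // was List<String>

    /** Adds label if an equal label is not already present; returns true if added. */
    boolean add(L label) {                            // was add(String label)
        if (labels.contains(label)) { return false; } // contains compares with equals
        labels.add(label);
        return true;
    }

    /** Returns the number of distinct labels added so far. */
    int size() { return labels.size(); }

    public static void main(String[] args) {
        LabelBag<String> explicit = new LabelBag<String>(); // type parameter written twice
        LabelBag<String> diamond = new LabelBag<>();        // diamond infers <String>
        LabelBag<Integer> numbers = new LabelBag<>();       // same code, different label type
        explicit.add("A");
        diamond.add("A");
        numbers.add(7);
        System.out.println(explicit.size() + diamond.size() + numbers.size()); // 3
    }
}
```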
Commit to Git. As always, once you’re happy with your solution to this problem, commit and push!
3.2. Other types of labels
Because the implementation of IntervalSet
does not know or care about the actual type of the labels, and only compares them according to the IntervalSet
specification, your test suite with String
labels is sufficient…
Except that our choice of String
for the test suite could prevent us from finding two Java-String
-specific bugs:
- If the implementation incorrectly compares labels using == instead of equals, identical String literals are very likely to return true when compared with ==, masking the bug. See the Optimization hides a bug exercise in Basic Java: == vs. equals().
- If the implementation incorrectly compares labels using their toString, it will break for label types where toString can return the same string for unequal instances (toString doesn’t guarantee any uniqueness), but we cannot find this bug by testing String labels.
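Both pitfalls can be seen in a short sketch. Tag is a hypothetical immutable label type invented for this example, chosen so that its toString leaves out part of its state:

```java
import java.util.Objects;

// Illustration only: Tag is a hypothetical immutable label type.
final class LabelComparisonDemo {

    static final class Tag {
        private final String name;
        private final int version;
        Tag(String name, int version) { this.name = name; this.version = version; }

        @Override public boolean equals(Object that) {
            return that instanceof Tag
                && name.equals(((Tag) that).name)
                && version == ((Tag) that).version;
        }
        @Override public int hashCode() { return Objects.hash(name, version); }

        // toString omits version, so unequal tags can print identically.
        @Override public String toString() { return name; }
    }

    public static void main(String[] args) {
        String a = "A";
        String b = "A";
        System.out.println(a == b);                    // true: identical literals are interned
        System.out.println(new String("A") == a);      // false: equal contents, different objects
        System.out.println(new String("A").equals(a)); // true

        Tag t1 = new Tag("A", 1);
        Tag t2 = new Tag("A", 2);
        System.out.println(t1.equals(t2));                       // false
        System.out.println(t1.toString().equals(t2.toString())); // true: strings hide the difference
    }
}
```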
Review your RepMap-
and RepListIntervalSet
implementations to see that you:
- Never compare label objects with ==.
- Never convert labels to strings with toString, except in the RepMap- or RepListIntervalSet class’ own toString.
Note: you cannot write tests with other types of labels in IntervalSetTest
, because those tests must use the emptyInstance()
method to get new sets.
That method always returns IntervalSet<String>
.
3.3. Implement IntervalSet.empty()
Pick one of your IntervalSet
implementations for clients to use, and implement IntervalSet.empty()
.
Unlike ArrayList
and LinkedList
, where clients who only want a List
are forced to break the abstraction barrier by calling a concrete constructor, clients of IntervalSet
should not be aware of how we’ve implemented it.
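The general idea of a static factory method on an interface can be sketched as follows. SimpleStack and ListStack are hypothetical names chosen for this example; they only illustrate how a client can obtain an instance without naming a concrete class.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: SimpleStack and ListStack are hypothetical types.
final class FactoryDemo {
    public static void main(String[] args) {
        SimpleStack<String> stack = SimpleStack.empty(); // no concrete class named here
        stack.push("A");
        System.out.println(stack.pop());     // A
        System.out.println(stack.isEmpty()); // true
    }
}

interface SimpleStack<T> {

    /** Returns a new empty stack; clients never name the implementing class. */
    static <T> SimpleStack<T> empty() {
        return new ListStack<>();
    }

    /** Pushes item onto the top of this stack. */
    void push(T item);

    /** Removes and returns the top item; requires this stack to be non-empty. */
    T pop();

    /** Returns true if this stack has no items. */
    boolean isEmpty();
}

/** One possible implementation; it could be swapped out without changing any client. */
final class ListStack<T> implements SimpleStack<T> {
    private final List<T> items = new ArrayList<>();
    @Override public void push(T item) { items.add(item); }
    @Override public T pop() { return items.remove(items.size() - 1); }
    @Override public boolean isEmpty() { return items.isEmpty(); }
}
```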
You may use @SafeVarargs
where necessary, for example if you write testing helper functions that take a variable number of arguments.
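For instance, a generic varargs testing helper might look like the sketch below. The helper name setOf and its class are hypothetical. Without the annotation, the compiler warns about possible heap pollution from the generic varargs parameter.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: setOf is a hypothetical testing helper.
final class VarargsDemo {

    /**
     * Returns a set containing the given items.
     * The method only reads from the varargs array and never stores or exposes it,
     * which is the promise that @SafeVarargs makes to the compiler.
     */
    @SafeVarargs
    static <L> Set<L> setOf(L... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<String> labels = setOf("A", "B", "A");
        System.out.println(labels); // [A, B]
    }
}
```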
At this point, all of your interval set code (implementations and tests) must have no warnings from the compiler (warnings are marked with a different symbol than errors) and no @SuppressWarnings
annotations (which would disable the warning, nice try).
If you cannot figure out how to make your implementations generic while satisfying the Java compiler’s static type checking, please visit lab or office hours, or ask a question on Piazza.
At this point in the problem set, we’ve completed the test-first implementation of IntervalSet
.
RepMap- and RepListIntervalSetTest should have a green bar from JUnit and excellent code coverage from the coverage tool.
This is a good opportunity for self code review: remove dead code and debugging printlns, DRY up duplication, and so on.
Problem 4: MultiIntervalSet<L>
Just as we implemented the IntervalSet
ADT by relying on other ADTs as building blocks, we can use IntervalSet
as part of the specification and as part of the representation of another ADT.
4.1 Test MultiIntervalSet
Devise, document, and implement tests for MultiIntervalSet
in MultiIntervalSetTest.java
.
Your tests in MultiIntervalSetTest
must be legal clients of the MultiIntervalSet
spec.
As before, since the spec of toString
is very weak, you do not need to test toString
.
Java note
If you define a helper type to use in the rep of MultiIntervalSet
, we require that class be declared in MultiIntervalSet.java
.
And if your helper type is immutable, be careful: not implementing equality as described in class 15 will mean you cannot use equals
to compare objects of your helper type.
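The sketch below shows the difference on two hypothetical helper types. It is only an illustration of the inherited Object.equals behavior, not a required design for your rep.

```java
import java.util.Objects;

// Illustration only: Endpoints and NoEquals are hypothetical helper types.
final class Endpoints {
    private final long start;
    private final long end;

    Endpoints(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Two Endpoints are equal when they represent the same pair of values.
    @Override public boolean equals(Object that) {
        return that instanceof Endpoints
            && start == ((Endpoints) that).start
            && end == ((Endpoints) that).end;
    }

    // equals and hashCode must agree: equal objects need equal hash codes.
    @Override public int hashCode() {
        return Objects.hash(start, end);
    }

    @Override public String toString() {
        return "[" + start + "," + end + ")";
    }

    public static void main(String[] args) {
        System.out.println(new Endpoints(0, 5).equals(new Endpoints(0, 5))); // true
        System.out.println(new NoEquals(0, 5).equals(new NoEquals(0, 5)));   // false
    }
}

/** Same fields, but inherits Object.equals, which compares references only. */
final class NoEquals {
    private final long start;
    private final long end;
    NoEquals(long start, long end) { this.start = start; this.end = end; }
    long length() { return end - start; }
}
```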
4.2 Implement MultiIntervalSet
Implement MultiIntervalSet
in MultiIntervalSet.java
.
You must use IntervalSet
in the rep of MultiIntervalSet
.
For example, IntervalSet
instances might be elements in a list or values in a map.
Your MultiIntervalSet
should rely only on the specs of IntervalSet
, not on the details of a particular IntervalSet
implementation, and it should use the static empty()
method to obtain interval set instances.
The implementation of MultiIntervalSet
is otherwise entirely up to you.
Run all the IntervalSet
and MultiIntervalSet
tests by right-clicking on the interval
package in the test
folder and selecting Run As → JUnit Test.
Problem 5: Fame and fortune with IntervalSet
You are the founder and CTO of the hottest new startup. In your MVP, you’re leveraging the most advanced artificial intelligence algorithms and machine learning architecture to deliver unprecedented breakthrough results.
Of course, as CTO, you realize it all comes down to managing multi-interval sets and comparing them. You need to develop tools for measuring the similarity of multi-interval sets: completely different sets will score 0, with the score approaching 1 as they become more alike.
And best of all, when you pivot and rebrand your startup, you’ll reuse those multi-interval sets and compare them again, so your success in the startup world all comes down to this one problem.
5.1 Specify and test similarity
Devise, document, and implement tests for similarity(..)
in SimilarityTest.java
.
You are free to add additional methods to the Similarity
class, but you may not change the signature of the required method.
You are free to strengthen the required specification, but you may not weaken it.
Note how this might mean your tests for Similarity
are not usable against someone else’s implementation, because you’ve strengthened or added to the spec.
You are even free to make Similarity
into an abstract data type, in which case you should decide if it is mutable or immutable, document its rep invariant and abstraction function, specify its operations, and so on.
Similarity
must still provide the static similarity(..)
method.
5.2 Implement similarity
Implement similarity(..)
in Similarity.java
.
Your Similarity
code should rely only on the specs of MultiIntervalSet
, not on the details of any particular implementation.
The implementation of Similarity
is entirely up to you.
Run all the tests in the project by right-clicking on the test
folder and selecting Run As → JUnit Test.
Before you’re done
Make sure you have documented specifications, in the form of properly-formatted Javadoc comments, for all your types and operations.
Make sure you have documented abstraction functions and representation invariants, in the form of a comment near the field declarations, for all your implementations.
With the rep invariant, also say how the type prevents rep exposure.
Make sure all types use checkRep to check the rep invariant and implement toString with a useful human-readable representation of the abstract value.
To gain static checking of the correct signature, use @Override when you override toString.
Also use @Override when a class implements an interface method, to remind readers where they can find the spec.
Make sure you have a thorough, principled test suite for every type.
Submitting
Make sure you commit AND push your work to your repository on github.mit.edu.
We will use the state of your repository on github.mit.edu as of 10:00pm on the deadline date.
When you git push
, the continuous build system attempts to compile your code and run the public tests (which are only a subset of the autograder tests).
You can always review your build results at didit.mit.edu/6.031/sp21.
Didit feedback is provided on a best-effort basis:
- There is no guarantee that Didit tests will run within any particular timeframe, or at all. If you push code close to the deadline, the large number of submissions will slow the turnaround time before your code is examined.
- If you commit and push right before the deadline, the Didit build does not have to complete in order for that commit to be graded.
- Passing some or all of the public tests on Didit is no guarantee that you will pass the full battery of autograding tests — but failing them is almost sure to mean lost points on the problem set.
Grading
Your overall ps2 grade will be computed as approximately:
~35% alpha autograde + ~5% alpha manual grade + ~45% beta autograde + ~15% beta manual grade
The autograder tests will not change from alpha to beta.
Manual grading of the alpha will only examine the internal documentation (AF, RI, etc.) and implementation of your ADTs. Manual grading of the beta may examine any part, including re-examining ADT implementations, and how you addressed code review feedback.