6.031 — Software Construction
Spring 2017

Norn Specification (Phase 1)

The purpose of this system is to represent and evaluate mailing list expressions. The result of evaluating a mailing list expression is a set of recipients who should receive an email message. Note that this system won’t send email itself. It will just produce a comma-separated list of recipient email addresses, suitable for copy and paste into an mail client.

A recipient is an email address with a username and domain name, like bitdiddle@mit.edu. Usernames and domain names are nonempty case-insensitive strings of letters, digits, underscores, dashes, and periods. This system does not enforce any other constraints on domain names. For example, we don’t care how many dot-separated parts a domain name has, or whether it ends with a valid top-level domain like .com or .edu.

List expressions

The simplest list expressions are:

  • the empty string, which means no recipients;
  • an email address, as defined above, which represents a single recipient;
  • a list name, which represents the set of recipients of a mailing list.

Like usernames and domain names, list names are nonempty case-insensitive strings of letters, digits, underscores, dashes, and periods.

Set operators

A list expression may also use these set operators:

  • e,f means set union: recipients in either e or f;
  • e!f means set difference: recipients in e but not in f;
  • e*f means set intersection: recipients in both e and f.

Note that ! in this language is a binary operator, not a unary operator like it is in Java.

Of these three operators, * has highest precedence, and , has lowest precedence, so a,b!c*d is equivalent to a,(b!(c*d)). Operators at the same level of precedence group from left to right, so a!b!c means (a!b)!c. This is the same order as the usual arithmetic operators, in which 9-3-5 means (9-3)-5 rather than 9-(3-5).

Parentheses may be used to group subexpressions:

  • (e) represents the same recipients as e.

List definitions

A list expression may define a list name:

  • listname=e defines listname as the expression e, and returns the set of recipients of e.

The name definition operator = has lower precedence than ,, so that x=a,b is equivalent to x=(a,b).

Name definitions are not intended to be nested inside other expressions. Examples of nesting include (x=a),(y=b) and x=y=z. If a name definition is a subexpression of an operator with the same or higher precedence, then the meaning of the expression is not defined by this spec. It’s up to you to decide what it means, or if it’s simply an invalid expression.

In this phase of the project, you may assume as a precondition that all list names are defined before being used and never redefined. So the first occurrence of listname will always be a definition listname=e. The definition cannot be recursive, so e does not mention listname either, and the system will never see another definition of listname. We’ll loosen these restrictions in the phase 2 specification.

Sequencing

Finally, a list expression may be a sequence of list expressions or list definitions separated by semicolons:

  • e ; f represents the recipients produced by f, after substituting the expressions of all named list definitions found in e

For example, consider x = a@mit.edu,b@mit.edu ; x * b@mit.edu. After substituting for x in the second part of the expression, this expression is equivalent to (a@mit.edu,b@mit.edu) * b@mit.edu, which represents the single recipient b@mit.edu.

Note that a semicolon is a separator between expressions, not a terminator. An expression ending with a semicolon, like a@mit.edu;, actually ends with the empty expression, so it evaluates to the empty set of recipients.

Semicolons have the lowest precedence of all the operators. Like name definitions, sequences are not intended to be nested as subexpressions of operators with higher precedence, so the behavior or validity of, for example, (a;b),(c;d) is up to you.

Whitespace

Whitespace characters around operators, email addresses, and list names are irrelevant and ignored, so a,b,c means the same as a , b , c. Whitespace characters are spaces, tabs (\t), carriage returns (\r), and linefeeds (\n).

Console user interface

The mailing list expression language should be implemented in an interactive system that allows the user to enter expressions at the console and see how they expand into lists of recipients. When a valid list expression is entered at the console, the output should be a valid list expression with the same meaning but consisting only of email addresses and commas.

For example, we could interact with it like this (user input in green):

These are example outputs, not fully determined outputs. Your system’s output may vary within the bounds of the spec. Examples of variation include whitespace, capitalization, and order.

> hobbits = bilbo@shire, frodo@shire, sam@shire, merry@shire, pippin@shire
bilbo@shire, frodo@shire, sam@shire, merry@shire, pippin@shire

> gandalf@cosmos, hobbits
gandalf@cosmos, bilbo@shire, frodo@shire, sam@shire, merry@shire, pippin@shire

> bagginses = bilbo@shire, frodo@shire
bilbo@shire, frodo@shire

> hobbits !bagginses
merry@shire, sam@shire, pippin@shire

Semantically, the interactive system’s output should be the result of evaluating the concatenation of all (non-erroneous) inputs that have been entered by the user, separated by ;. So the last result printed above should be the same as the result of the single concatenated expression:

hobbits = bilbo@shire, frodo@shire, sam@shire, merry@shire, pippin@shire ; 
gandalf@cosmos, hobbits ; 
bagginses = bilbo@shire, frodo@shire ; 
hobbits !bagginses

Even though this is the specified semantics, your system does not have to actually implement it this way. It makes more sense to create a mutable data type to store list definitions, so that you do not have to repeatedly evaluate old expressions.

If the user enters an invalid list expression, the system should print an informative human-readable error message. This message is not further specified.

Going Further

If you want to go beyond the spec, you might extend the list expression language with new features, e.g.:

  • user-written comments, like // this is a comment
  • regular expressions that expand into a set of list names, like /6.\d+-students/
  • macro definitions, like {1}-minus-fascists = {1} !all-faculty

You might extend the console user interface with additional commands, e.g.:

  • showing the definition of a list instead of evaluating it, e.g. !show hobbits
  • showing all currently-defined lists, e.g. !showall
  • debugging an expression by showing how each of its subexpressions evaluate, e.g. !debug hobbits

None of these features are required. Optional features make your system more powerful and interesting, but your grade will be based on whether your code is safe from bugs, easy to understand, and ready for change, not by how many extra features it has.