6.102 — Software Construction
Spring 2024

Reading 9: Functional Programming

Software in 6.102

  • Safe from bugs: correct today and correct in the unknown future.
  • Easy to understand: communicating clearly with future programmers, including future you.
  • Ready for change: designed to accommodate change without rewriting.

Objectives

In this reading we’ll talk about several design patterns for operating on sequences of elements, and you’ll see how treating functions themselves as first-class values that we can pass around and manipulate in our programs is an especially powerful idea.

  • Iterators and generators
  • Map/filter/reduce
  • Higher-order functions
  • Immutability

First-class functions

Let’s start by reviewing an important Big Idea that you should have already encountered in 6.101 [formerly 6.009]: functions as first-class data values, meaning that they can be stored in variables, passed as arguments to functions, and created dynamically like other values.

For example, Math.sqrt is a reference to an object representing the sqrt function. The type of that object is (x: number) => number.

But you can also assign that function to another variable if you like, and it still behaves like sqrt:

const mySquareRoot: (x: number) => number = Math.sqrt;
mySquareRoot(16.0); // returns 4.0

The type that we declared on mySquareRoot is a function type expression. Note that the parameter name x is required! If you write (number) => number, it means “a function with a parameter named number of type any that returns a number”, because omitted types are implicitly any.

You can also pass a reference to the function as a parameter to another function. You can use functions the same way you would use any other value in TypeScript, like numbers or string references or other object references. In other words, functions in TypeScript are first-class, which means they can be treated like any other value in the language: passed as parameters, returned as return values, and stored in variables and data structures.
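As a small illustration (applyTwice and increment are hypothetical names of our own, not from the reading), here is a function that takes another function as a parameter:

```typescript
// applyTwice takes another function as a parameter
// and applies it twice to a value.
function applyTwice(transform: (x: number) => number, value: number): number {
    return transform(transform(value));
}

const increment = (x: number) => x + 1;

applyTwice(Math.sqrt, 16.0); // sqrt(sqrt(16.0)) = 2
applyTwice(increment, 5);    // 7
```

Any value whose type is (x: number) => number, whether built-in like Math.sqrt or defined by us like increment, can be passed as the transform argument.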

Programming languages are full of things that are not first-class. For example, access control is not first-class – you can’t pass public or private as a parameter into a function, or store it in a data structure. The notion of public and private is an aspect of your program that TypeScript offers no way to refer to or manipulate at runtime. Similarly, a while loop or if statement is not first-class. You can’t refer to that piece of code all by itself, or manipulate it at runtime.

In old programming languages, only data was first-class: built-in types (like numbers) and user-defined types. But in modern programming languages, like Python and JavaScript, both data and functions are first-class. First-class functions are a very powerful programming idea. The first practical programming language that used them was Lisp, invented by John McCarthy at MIT. But the idea of programming with functions as first-class values actually predates computers, tracing back to Alonzo Church’s lambda calculus. The lambda calculus used the Greek letter λ to define new functions; this term stuck, and you’ll see it as a keyword not only in Lisp and its descendants, but also in Python.

Guido van Rossum, the creator of Python, wrote a blog post about the design principle that led not only to first-class functions in Python, but first-class methods as well: First-class Everything.

Function expressions in TypeScript

We have already been using function expressions, starting as far back as Testing, since the mocha framework uses them:

describe("Math.max", function() {
  it("covers a < b", function() {
    assert.strictEqual(Math.max(1, 2), 2);
  });
});

This code has two function expressions – the first one is passed as an argument to describe(), and the second (nested inside it) is passed as an argument to it().

TypeScript also has a more compact arrow syntax that avoids the need for a function keyword:

describe("Math.max", () => {
  it("covers a < b", () => {
    assert.strictEqual(Math.max(1, 2), 2);
  });
});

If the body of the function consists of only a single expression (like a Python lambda expression), then even the curly braces can be omitted:

it("covers a < b", () => assert.strictEqual(Math.max(1, 2), 2) );

But it turns out that there is a technical difference between arrow functions and function expressions that matters when you are writing methods: a function expression redefines this, but an arrow function uses the definition of this from its surrounding context. So inside an instance method, you should always use arrow functions rather than function expressions. Understanding This, Bind, Call, and Apply in JavaScript has a good explanation of the issues behind the meaning of this in JavaScript.
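As a small sketch (Counter is a hypothetical class of our own), an arrow function inside an instance method can safely refer to this:

```typescript
// Counter is a hypothetical class for illustration.
class Counter {
    private count = 0;

    public incrementFor(items: Array<string>): void {
        // The arrow function captures `this` from incrementFor,
        // so `this.count` refers to the Counter's field.
        items.forEach(item => { this.count += 1; });
        // A function expression, `function(item) { this.count += 1; }`,
        // would redefine `this` and fail here.
    }

    public getCount(): number {
        return this.count;
    }
}

const counter = new Counter();
counter.incrementFor(["a", "b", "c"]);
counter.getCount(); // 3
```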

reading exercises

Sorting dogs

Suppose we have:

interface Dog {
    name(): string;
    breed(): Breed;
    loudnessOfBark(): number;
}

We have an array:

dogs: Array<Dog>

The Array.sort method takes a function argument that tells it how to compare two objects in the array; in this case, two Dogs. The type of this argument is (dog1: Dog, dog2: Dog) => number, and it should return a negative number if dog1 < dog2 (meaning dog1 should sort before dog2), a positive number if dog1 > dog2, or zero if dog1 = dog2.

Which of these would sort our dogs from quietest bark to loudest?

dogs.sort(function(dog1: Dog, dog2: Dog) {
    return dog2.loudnessOfBark() - dog1.loudnessOfBark();
});


dogs.sort(function(dog1: Dog, dog2: Dog) {
    return dog1.loudnessOfBark() > dog2.loudnessOfBark() ? dog1 : dog2;
});


dogs.sort((dog1: Dog, dog2: Dog) => {
  return dog1.loudnessOfBark() - dog2.loudnessOfBark();
});
dogs.sort((dog1: Dog, dog2: Dog) => dog1.loudnessOfBark() - dog2.loudnessOfBark());


function quietestFirst(dog1: Dog, dog2: Dog) {
    return dog1.loudnessOfBark() - dog2.loudnessOfBark();
};

dogs.sort(quietestFirst());


Abstracting out control

In this reading we discuss map/filter/reduce, a design pattern that substantially simplifies the implementation of functions that operate over sequences of elements.

In this example, we’ll have lots of sequences — arrays of files; input streams that are sequences of lines; lines that are sequences of words. Map/filter/reduce will enable us to operate on those sequences with no explicit control statements in our code — not a single for loop or if statement.

Suppose we’re given the following problem: write a function that finds all the words in the TypeScript files in your project.

Following good practice, we break it down into several simpler steps and write a function for each one:

  • find all the files in the project, by scanning recursively from the project’s root folder
  • restrict them to files with a particular suffix, in this case .ts
  • open each file and read it in line-by-line
  • break each line into words

Writing the individual functions for these substeps, we’ll find ourselves writing a lot of low-level iteration code. For example, here’s what the recursive traversal of the project folder might look like:

import fs from 'fs';
import path from 'path';

/**
 * Find names of all files in the filesystem subtree rooted at folder.
 * @param folder root of subtree, requires fs.lstatSync(folder).isDirectory() === true
 * @returns array of names of all ordinary files (not folders) that have folder as
 *          their ancestor
 */
function allFilesIn(folder: string): Array<string> {
    let files: Array<string> = [];
    for (const child of fs.readdirSync(folder)) {
        const fullNameOfChild = path.join(folder, child);
        if (fs.lstatSync(fullNameOfChild).isDirectory()) {
            files = files.concat(allFilesIn(fullNameOfChild));
        } else if (fs.lstatSync(fullNameOfChild).isFile()) {
            files.push(fullNameOfChild);
        }
    }
    return files;
}

And here’s what the filtering function might look like, which restricts that file array down to just the TypeScript files. Imagine calling this like onlyFilesWithSuffix(files, ".ts"):

/**
 * Filter an array of files to those that end with suffix.
 * @param filenames array of filenames
 * @param suffix string to test
 * @returns a new array consisting of only those files whose names end with suffix
 */
function onlyFilesWithSuffix(filenames: Array<string>, suffix: string): Array<string> {
    const result: Array<string> = [];
    for (const f of filenames) {
        if (f.endsWith(suffix)) {
            result.push(f);
        }
    }
    return result;
}

→ full code for the example

Iterator

Before we dive into map/filter/reduce, we’ll look at a simpler design pattern for sequences that also abstracts away from the details of control. An iterator is an object that steps through a sequence of elements and returns the elements one by one. Iterators are used under the covers in TypeScript when you’re using a for...of loop to step through an array.

An Iterator<E> over a collection of elements of type E has one method:

  • next() returns an object {done: boolean, value?: E}
    • where value is the next element in the collection if done is false
    • and there are no more elements if done is true

So this code:

let filenames: Array<string> = ...;
for (const f of filenames) {
    console.log(f);
}

is implemented by TypeScript using code roughly equivalent (this translation ignores some details) to this:

let filenames: Array<string> = ...;
let iter: Iterator<string> = filenames[Symbol.iterator]();
for (let result = iter.next(); ! result.done; result = iter.next()) {
    const f: string = result.value;
    console.log(f);
}

Note that the next() method is a mutator operation for Iterator, not only returning an element but also advancing the iterator so that the subsequent call to next() will return a different element.
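For example, using an array's built-in iterator (obtained with its [Symbol.iterator] method), each call to next() returns a result and advances the iterator:

```typescript
// Each call to next() both returns a result and advances the iterator.
const iter: Iterator<string> = ["a.txt", "b.txt"][Symbol.iterator]();

const first = iter.next();   // { value: "a.txt", done: false }
const second = iter.next();  // { value: "b.txt", done: false }
const third = iter.next();   // done is true: no more elements
```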

Why do iterators exist? There are many kinds of collection data structures (linked lists, maps, hash tables) with different kinds of internal representations. The iterator concept allows a single uniform way to access them all, so that client code is simpler and the collection implementation can change without changing the client code that iterates over it. For example, although we could iterate over filenames using indices:

for (let i = 0; i < filenames.length; i++) {
    let f = filenames[i];
    // ...

…this code now depends on the length property of Array, as well as the [..] indexing operation, which might be different (or nonexistent) in another data structure. A Set, for example, uses size instead of length, and has no [..] indexing at all. The iterator design pattern abstracts away from these details, so we can write the same kind of for...of loop for many different data structures.
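To see the abstraction at work, here is a small sketch of our own comparing an Array and a Set: the representations differ, but the loops are identical.

```typescript
// Array and Set have different representations and operations
// (length vs. size, indexing vs. none), but the same for...of
// loop iterates over both.
const fromArray: Array<string> = [];
for (const f of ["a.txt", "b.txt"]) {
    fromArray.push(f);
}

const fromSet: Array<string> = [];
for (const f of new Set(["a.txt", "b.txt"])) {
    fromSet.push(f);
}
// fromArray and fromSet both contain ["a.txt", "b.txt"]
```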

Most modern languages (including Python, C#, and Ruby) use the notion of an iterator. It’s an effective design pattern — a well-tested solution to a common design problem. Map/filter/reduce is also a design pattern, and we’ll see others as we move through the course.

Iterable vs. Iterator

It’s important to understand the difference between an iterator and an iterable.

An iterable type represents a sequence of elements that can be iterated over. In TypeScript, the Iterable<E> interface represents an iterable, and requires the type to have a method (named [Symbol.iterator]) that returns a fresh Iterator object for the sequence. Python has a similar notion of iterable, where the required method is called __iter__.

An iterator, on the other hand, does not represent an entire sequence of elements. Its state represents a particular point in an iteration over a sequence (so it actually represents the remainder of the sequence that it is iterating over). But an iterator is a mutable object. Each time next() is called, the iterator advances to the next element, and previous elements are no longer accessible through the iterator. Once the iterator reaches the end, it cannot be reset back to the beginning of the sequence.

As a result, when you are working with sequences – storing them in variables, passing them to functions, returning them from functions – you should be using iterable types, not Iterator itself. Often these iterable types are specific types like Array or Set. But if you are writing a function that doesn’t need any of the particular operations of these types — that only needs to iterate using for...of — then consider using the supertype Iterable instead.
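For example, here is a sketch of a function that only needs to iterate, so it declares its parameter with the supertype Iterable (concatAll is a hypothetical name of our own):

```typescript
// This function only needs for...of, so it accepts any Iterable,
// not just Array or Set specifically.
function concatAll(items: Iterable<string>): string {
    let result = "";
    for (const item of items) {
        result += item;
    }
    return result;
}

concatAll(["a", "b", "c"]);           // "abc"
concatAll(new Set(["a", "b", "c"]));  // "abc"
```

Because the parameter type is Iterable<string>, the same function works for arrays, sets, and any other iterable the client might pass.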

reading exercises

Iterators are mutable

Suppose you have a function doSomething() defined like this:

// Uses the strings in `sequence` to compute something
function doSomething(sequence: Iterator<string>): string;

And you have a variable myStrings that you pass to doSomething():

doSomething(myStrings);

Which of the following are true?


Operating on sequences

Suppose you are writing a helper method for summing a sequence of numbers. What would be the most ready-for-change way to specify it?


MyIterator

To better understand how an iterator works, here’s a simple implementation of an iterator for Array<string>. Because our simple iterator can only handle strings, we’ll also simplify next() to return string|undefined – i.e., either the next value from the array, or undefined if we’ve reached the end. (Note that Iterator<E> can’t do this in general, because undefined might very well be one of the elements in the collection, if it’s part of the E type.)

/**
 * A MyIterator is a mutable object that iterates over
 * the elements of an Array<string>, from first to last.
 * This is just an example to show how an iterator works.
 * In practice, you should use the Array's own iterator
 * object, obtained with its [Symbol.iterator]() method.
 */
class MyIterator {

    private readonly array: Array<string>;
    private index: number;
    // AF(array, index) = the sequence of elements array[index]...array[array.length-1]
    // RI: index is a nonnegative integer

    /**
     * Make an iterator.
     * @param array array to iterate over
     */
    public constructor(array: Array<string>) {
        this.array = array;
        this.index = 0;
    }

    /**
     * Get the next element of the array.
     * Modifies: this iterator to advance it to the element 
     *           following the returned element.
     * @returns the next element in the array, 
     *          or undefined if there are no more elements
     */
    public next(): string|undefined {
        if (this.index >= this.array.length) {
            return undefined;
        } else {
            const value = this.array[this.index];
            ++this.index;
            return value;
        }
    }
}
Here’s a snapshot diagram showing a typical state for a MyIterator object in action:

[snapshot diagram: iter is a MyIterator whose array field points, with a double-line readonly arrow, to the Array<string> filenames containing "a.txt", "b.txt", "c.pdf" at indices 0, 1, 2, and whose index field is 0]

Note that we draw the arrow from array with a double line, to indicate that it’s readonly. That means the arrow can’t change once it’s drawn. But the array object it points to is mutable — elements can be changed within it — and declaring array as readonly has no effect on that.

Mutation undermines an iterator

Let’s try using our iterator for a simple job. Suppose we have an array of filenames, like ["a.txt", "b.txt", "c.pdf"]. We want a function removeFilesWithExtension that will delete filenames with a particular extension, like .txt, from the array, leaving the other filenames behind. For this example, the function should remove the elements "a.txt" and "b.txt", leaving behind a one-element array ["c.pdf"].

Following test-first programming, we write a spec, choose and implement test cases to cover some partitions, and finally implement the function, using our MyIterator class:

/**
 * Remove filenames with a given extension.
 * Modifies the filenames array by removing filenames that end with the extension.
 * 
 * @param filenames array of filenames
 * @param extension must start with ".", e.g. ".txt"
 */
function removeFilesWithExtension(filenames: Array<string>, extension: string): void {
    let iter: MyIterator = new MyIterator(filenames);
    for (let filename = iter.next(); filename !== undefined; filename = iter.next()) {
        if (filename.endsWith(extension)) {
            // remove the filename from the array
            filenames.splice(filenames.indexOf(filename), 1);
        }
    }
}

Now we run our test cases, and they work! … almost. This test case fails:

// removeFilesWithExtension(["a.txt", "b.txt", "c.pdf"], ".txt")
//   expected ["c.pdf"], actual ["b.txt","c.pdf"]

We got the wrong answer: removeFilesWithExtension left a matching filename behind in the array! Why? Trace through what happens. It will help to use a snapshot diagram showing the MyIterator object and the Array object and update it while you work through the code.

reading exercises

Draw a snapshot diagram

Let’s draw a snapshot diagram to illustrate the bug. You will need to refer back to the source code for the MyIterator class and the removeFilesWithExtension() method above.

We are running the test case removeFilesWithExtension(["a.txt", "b.txt", "c.pdf"], ".txt").

In the box below, use Snapdown to draw a snapshot diagram showing the state of the program right at the start of the body of removeFilesWithExtension(). Open the Snapdown help sidebar to get the details of Snapdown syntax – it’s a little language for drawing diagrams with text.

"a.txt""b.txt""c.pdf"TODOArray<fixme>01"change me!"

(missing explanation)

Adding an iterator

Now update your snapshot diagram to reflect the effect of the first line, let iter: MyIterator = new MyIterator(filenames);

"a.txt""b.txt""c.pdf"TODOArray<fixme>01"change me!"

(missing explanation)

Entering the loop

Now update your snapshot diagram to reflect the effect of the first call to iter.next().

"a.txt""b.txt""c.pdf"TODOArray<fixme>01"change me!"

(missing explanation)

Remove an item

Now update your snapshot diagram to reflect the effect of the call filenames.splice(filenames.indexOf(filename), 1). If an index no longer exists in the Array, you should remove it from the diagram.

"a.txt""b.txt""c.pdf"TODOArray<fixme>01"change me!"

(missing explanation)

Next iteration of the loop

Now update your snapshot diagram to reflect the effect of the next call to iter.next().

"a.txt""b.txt""c.pdf"TODOArray<fixme>01"change me!"

(missing explanation)


Note that this isn’t just a bug in our MyIterator. The built-in iterator in Array suffers from the same problem, and so does the for loop that’s syntactic sugar for it. So do lists and for loops in Python. Mutating a list that is currently being iterated over is simply not safe in general.

Mutation can make simple contracts very complex

This is a fundamental issue with mutable types. Multiple references to the same mutable object (aliasing) may mean that multiple places in your program — possibly widely separated — are relying on that object to remain consistent.

To put it in terms of specifications, contracts can’t be enforced in just one place anymore, e.g. between the client of a class and the implementer of a class. Contracts involving mutable objects now depend on the good behavior of everyone who has a reference to the mutable object.

As a symptom of this non-local contract phenomenon, consider iterators. Look at their documentation, and try to find where it states the crucial requirement on the client that we’ve just discovered: that you shouldn’t mutate an array while you’re iterating over it. Who takes responsibility for it? The iterator specification? Array? Can you find it?

The need to reason about global properties like this makes it much harder to understand, and be confident in the correctness of, programs with mutable data structures. We still often do it — for performance and convenience — but we pay a big price in bug safety for doing so.

For these reasons, functional programming largely avoids mutation. The map/filter/reduce pattern discussed below produces new sequences of elements, rather than mutating existing ones, and the first-class functions that are passed to map/filter/reduce are generally pure functions, which return an output rather than mutating any of their inputs.
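For example, here is a non-mutating sketch of the earlier task (withoutFilesWithExtension is a hypothetical name of our own): instead of removing matching filenames in place, it returns a new array, so there is no iteration over a sequence that is being mutated.

```typescript
/**
 * @param filenames array of filenames
 * @param extension must start with ".", e.g. ".txt"
 * @returns a new array of the filenames that do not end with extension;
 *          the input array is not modified
 */
function withoutFilesWithExtension(filenames: Array<string>, extension: string): Array<string> {
    return filenames.filter(f => !f.endsWith(extension));
}

withoutFilesWithExtension(["a.txt", "b.txt", "c.pdf"], ".txt"); // ["c.pdf"]
```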

Map/filter/reduce abstraction

The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit. In this paradigm, the control statements disappear: specifically, the for statements, the if statements, and the return statements in the code from our introductory example will be gone. We’ll also be able to get rid of most of the temporary names (i.e., the local variables filenames, f, and result). And we will be able to stop mutating the sequences we are using – no more calling push or splice on the arrays we are manipulating and creating.

We’ll have three operations for iterable types: map, filter, and reduce. Let’s look at each one in turn, and then look at how they work together. We’ll focus on using Array as the iterable type, because it provides map, filter, and reduce operations natively in TypeScript.

Map

Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order:

map : Array<E> × (E → F) → Array<F>

For example:

const array: Array<number> = [1, 4, 9, 16];
const result = array.map(Math.sqrt);

This code starts with an array containing the numbers 1, 4, 9, 16 and applies the square root function to each element to produce the values 1.0, 2.0, 3.0, 4.0.

In the call array.map(Math.sqrt):

  • map has type Array<number> × (number → number) → Array<number>
  • the input array has type Array<number>
  • the mapping function Math.sqrt has type number → number
  • the return value has type Array<number>

We can write the same expression more compactly as:

[1, 4, 9, 16].map(Math.sqrt);

which is the syntax that future examples will use.

Another example of a map:

["A", "b", "C"].map(s => s.toLocaleLowerCase())

which produces an array containing "a", "b", "c". (Notice that when we pass a function like Math.sqrt, we can refer to it directly. Passing a method like toLocaleLowerCase requires a lambda expression to wrap the method call syntax.)

Map is useful even if you don’t care about the return value of the function. When you have a sequence of mutable objects, for example, you may want to map a mutator operation over them. Because mutator operations typically return void, however, you can use forEach instead of map. forEach applies the function to each element of the sequence, but does not collect their return values into a new sequence:

itemsToRemove.forEach(item => mySet.delete(item));

(A lambda is needed here, too. The expression mySet.delete by itself would refer to the delete function without binding mySet to this, which will not work.)

The map and forEach operations capture a common pattern for operating over sequences: doing the same thing to each element of the sequence.

reading exercises

map 1

What sequence is the result of

["1", "2", "3"].map((s: string) => s.length);


map 2

What sequence is the result of

[1, 2, 3].map((s: string) => s.length);


Filter

Our next important sequence operation is filter, which tests each element of an Array<E> with a unary function from E to boolean.

Elements that satisfy the predicate are kept; those that don’t are removed. A new sequence is returned; filter doesn’t modify its input sequence.

filter : Array<E> × (E → boolean) → Array<E>

Examples:

[1, 2, 3, 4]
   .filter(x => x%2 === 1);
// returns [1, 3]

["x", "y", "2", "3", "a"]
   .filter(s => "abcdefghijklmnopqrstuvwxyz".includes(s));
// returns ["x", "y", "a"]

const isNonempty = (s: string) => s.length > 0;
["abc", "", "d"]
   .filter(isNonempty);
// returns ["abc", "d"]

reading exercises

filter 1

Given:

const s1 = "Abel";
const s2 = "Baker";
const s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("A"));


filter 2

Again given:

const s1 = "Abel";
const s2 = "Baker";
const s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("Z"));


filter 3

What is the result of

const isDigit = (s: string) => s.length === 1 && "0123456789".includes(s);
["a", "1", "b", "2"]
    .filter(isDigit);


filter 4

What is the result of

const isDigit = (s: string) => s.length === 1 && "0123456789".includes(s);
["a", "1", "b", "2"]
    .filter( ! isDigit);


Reduce

Our final operator, reduce, combines the elements of the sequence together, using a binary function. In addition to the function and the array, it also takes an initial value that initializes the reduction, and that ends up being the return value if the array is empty:

reduce : Array<E> × (E × E → E) × E → E

arr.reduce(f, init) combines the elements of the array together. Here is one way that it might compute the result:

result_0 = init
result_1 = f(result_0, arr[0])
result_2 = f(result_1, arr[1])
...
result_n = f(result_(n-1), arr[n-1])

result_n is the final result for an n-element sequence.

Adding numbers is probably the most straightforward example:

[1,2,3].reduce( (x,y) => x+y,  0)

…which computes as shown below:

reduce( [1, 2, 3], + , 0 )
= ((0 + 1) + 2) + 3
= 6

Initial value

There are three design choices in the reduce operation. First is whether to require an initial value. In TypeScript, the initial value can be omitted, in which case reduce uses the first element of the sequence as the initial value of the reduction. But if the sequence is empty, then reduce has no value to return, and reduce throws a TypeError. It’s important to keep this in mind when using reducers like max, which have no well-defined initial value:

[5, 8, 3, 1].reduce((x,y) => Math.max(x,y))
// computes max(max(max(5,8),3),1) and returns 8

[].reduce((x,y) => Math.max(x,y))
// throws TypeError!

Reduction to another type

The second design choice is the return type of the reduce operation. It doesn’t necessarily have to match the element type of the original sequence. For example, we can use reduce to concatenate an array of numbers (type E) into a string (type F). This changes the reduce operation in two ways:

  • the initial value now has type F
  • the binary function is now an accumulator, of type F × E → F, that takes the current result (of type F) and a new element from the sequence (of type E), and produces an accumulated result of type F.

So a more general form of the reduce operation now looks like this:

reduce : Array<E> × (F × E → F) × F → F

In the special case where F is the same type as E, this type signature is the same as above.

Here’s a simple example that concatenates a sequence of numbers into a string:

[1,2,3].reduce( (s: string, n: number) => s + n, "" );
// returns "123"

We’ve included parameter type declarations in the lambda expression above in order to clarify the difference between the two parameters of the accumulator function. If F is the same type as E, it’s ok to omit these type declarations in the lambda expression.

Order of operations

The third design choice is the order in which the elements are accumulated. TypeScript’s reduce operation combines the sequence starting from the left (the first element):

result_0 = init
result_1 = f(result_0, arr[0])
result_2 = f(result_1, arr[1])
...
result_n = f(result_(n-1), arr[n-1])

Another TypeScript operation, reduceRight, goes in the other direction:

result_0 = init
result_1 = f(result_0, arr[n-1])
result_2 = f(result_1, arr[n-2])
...
result_n = f(result_(n-1), arr[0])

to produce result_n as the final result.

Here’s a diagram that contrasts the two ways to reduce, from the left or from the right. If the operator is non-commutative (like string concatenation, shown here), then each direction might produce a different answer:

reduce( [1, 2, 3], + , "" )
= (("" + 1) + 2) + 3
= "123"
reduceRight( [1, 2, 3], + , "" )
= (("" + 3) + 2) + 1
= "321"
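The same two reductions as runnable TypeScript:

```typescript
// With a non-commutative operation like string concatenation,
// reduce and reduceRight give different results.
const concat = (s: string, n: number) => s + n;

const leftToRight = [1, 2, 3].reduce(concat, "");       // "123"
const rightToLeft = [1, 2, 3].reduceRight(concat, "");  // "321"
```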

reading exercises

reduce 1
[e1, e2, e3].reduce( (x: boolean, y: string) => x && (y === 'true'),
                     true );

In this reduction, what should be the type of the elements e1,e2,e3?


Assuming e1,e2,e3 have that type, which is the best description of the behavior of this reduction?


reduce 2

What is the result of:

[1, 2, 3].reduce((a, b) => a * b, 0)


What is the result of:

["oscar", "papa", "tango"]
    .reduce((a,b) => a.length > b.length ? a : b)


reduce 3

This exercise explores three different ways to reduce an array using Math.min(), finding the minimum value. Unlike the reducing functions in the previous exercises, which have identity elements that are natural to use as initial values of the reduction, Math.min() has no natural identity element.

Given an array of numbers:

const array: Array<number> = [...];

…here are three possible ways to compute the minimum value of the array:

const result = array.reduce((x, y) => Math.min(x, y));

let result;
try {
    result = array.reduce((x, y) => Math.min(x, y));
} catch (e) {
    result = 0;
}

const result = array.reduce((x, y) => Math.min(x, y), Number.POSITIVE_INFINITY);


Back to the motivating example

Going back to the example we started with, where we want to find all the words in the TypeScript files in our project, let’s try creating a useful abstraction for filtering files by suffix:

const endsWith = (suffix: string) => {
    return (filename: string) => filename.endsWith(suffix);
}

TypeScript’s string.endsWith is a function string × string → boolean.

Our new endsWith wrapper returns functions that are useful as filters. It takes a filename suffix like .ts and dynamically generates a function that we can use with filter to test for that suffix. Given a Array<string> filenames, we can now write, e.g., filenames.filter(endsWith(".ts")) to obtain a new filtered array.

endsWith is a different kind of beast than our usual functions. It’s a higher-order function, meaning that it’s a function that takes another function as an argument, or returns another function as its result, as endsWith does. Higher-order functions are operations on the data type of functions. We’ve already seen several higher-order functions in this reading, of course: map, filter, and reduce all take a function as an argument.

In this case, endsWith is a creator of functions. Its signature is: string → (string → boolean).
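Restating the endsWith definition so the snippet stands alone, here is how the generated functions behave:

```typescript
// A higher-order function: takes a suffix, returns a predicate on filenames.
const endsWith = (suffix: string) => {
    return (filename: string) => filename.endsWith(suffix);
};

const isTypeScriptFile = endsWith(".ts");  // has type (string) => boolean
isTypeScriptFile("Main.ts");    // true
isTypeScriptFile("notes.txt");  // false

// Usable directly as a filter:
const tsFiles = ["Main.ts", "notes.txt", "util.ts"].filter(endsWith(".ts"));
// ["Main.ts", "util.ts"]
```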

Now let’s use map and filter to recursively traverse the folder tree:

function allFilesIn(folder: string): Array<string> {
    const children: Array<string> = fs.readdirSync(folder)
                                    .map(f => path.join(folder, f));
    const descendants: Array<string> = children
                                       .filter(f => fs.lstatSync(f).isDirectory())
                                       .map(allFilesIn)
                                       .flat();
    return [
        ...descendants,
        ...children.filter(f => fs.lstatSync(f).isFile())
    ];
}

The first line gets all the children of the folder, which might look like this:

["src/client", "src/server", "src/Main.ts", ...]

The second line is the key bit: it filters the children for just the subfolders using the isDirectory method, and then recursively maps allFilesIn against this array of subfolders! The result might look like this:

[["src/client/MyClient.ts", ...], ["src/server/MyServer.ts", ...], ...]

So we have to flatten it to remove the nested structure, which is what flat() does:

["src/client/MyClient.ts", ..., "src/server/MyServer.ts", ...]

(This pattern of map-and-flatten is common enough that arrays also have a method to do both that we could have used instead: flatMap.)
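A small sketch of our own contrasting the two operations:

```typescript
// flat removes one level of nesting from an array of arrays:
const flattened = [[1, 2], [3], [4, 5]].flat();  // [1, 2, 3, 4, 5]

// flatMap maps and flattens in a single step:
const doubled = [1, 2, 3].flatMap(x => [x, x]);  // [1, 1, 2, 2, 3, 3]
```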

Finally we add the immediate children that are plain files (not folders), and that’s our result.

We can also do the other pieces of the problem with map/filter/reduce. Once we have the array of all files underneath the current folder, we can filter it to just the TS files:

const filenames: Array<string> = allFilesIn(".")
                                 .filter(endsWith(".ts"));

Now that we have the files we want to extract words from, we’re ready to load their contents:

const fileContents: Array<Array<string>> = filenames.map(f => {
    try {
        const data = fs.readFileSync(f, { encoding: "utf8", flag: "r" });
        return data.split("\n");
    } catch (e) {
        console.error(e);
        return []; // every branch must return an Array<string>
    }
});

Finally, we can flatten the array of arrays of lines into a simple array of lines:

const lines: Array<string> = fileContents.flat();

and then extract the nonempty words from each line:

const words: Array<string> = lines.map(line => line.split(/\W+/)
                                               .filter(s => s.length > 0))
                             .flat();

(Here we call split not with a simple string delimiter, but with a regular expression, which will be discussed more in a future reading. This regular expression identifies substrings of non-word characters.)
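For example, on a single line of code (the input string is invented for illustration), the split-and-filter step behaves like this:

```typescript
// split on runs of non-word characters; the empty string produced by
// the trailing ";" delimiter is removed by the length filter
const line = "const x = 5;";
const tokens: Array<string> = line.split(/\W+/)
                                  .filter(s => s.length > 0);
// => ["const", "x", "5"]
```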

And we’re done, we have our array of all words in the project’s TypeScript files! As promised, the control statements have disappeared.

As can be seen in the example above, one common pattern when using map/filter/reduce is method chaining. Since each function returns a new array, calls to map/filter/reduce can be placed one after another.

→ full code for the example

Benefits of abstracting out control

Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and other control statements.

This pattern is so useful that Google put it at the heart of an influential distributed system, MapReduce, that processes massive amounts of data on a cluster of computers.

reading exercises

map/filter/reduce

This function computes the product of just the odd integers in an array:

function productOfOdds(array: Array<number>): number {
    let result = 1;
    for (let x of array) {
        if (x % 2 === 1) {
            result *= x;
        }
    }
    return result;
}

Rewrite this code using map, filter, and reduce:

function productOfOdds(array: Array<number>): number {
    return ▶▶EXPR◀◀;
}
const x: number = 5;
const y: number = 8;

function isOdd(x: number): boolean {
  return x % 2 === 1;
}

const xIsOdd: boolean = x % 2 === 1;

function alwaysOdd(x: number): number {
  return isOdd(x) ? x : 1;
}

function identityFunction(x: number): number {
    return x;
}

function sum(x: number, y: number): number {
  return x + y;
}

const identity = (x: number) => x;

function product(x: number, y: number): number {
  return x * y;
}

function alwaysTrue(x: number): boolean {
  return true;
}

Assuming this code is part of the file shown on the right, which of the following expressions can replace ▶▶EXPR◀◀?

array
.map(identityFunction)
.filter(isOdd)
.reduce(product, 1)


array
.map(identity)
.filter(alwaysTrue)
.reduce(product, 1)


array
.map(x)
.filter(alwaysOdd)
.reduce(product, 1)


array
.map(alwaysOdd)
.filter(alwaysTrue)
.reduce(product, 1)


array
.map(identityFunction)
.filter(xIsOdd)
.reduce(x*y, 1)


Avoiding loops in TypeScript

Python makes filter and map easily accessible with list comprehensions:

doubleOdds = [ x*2 for x in arr if x % 2 == 1 ]

… which is like the TypeScript:

const doubleOdds = arr.filter(x => x % 2 === 1).map(x => x*2)

The result of a list comprehension is a list by virtue of the [] brackets. In Python, comprehensions work over any iterable source, and can also be used to build sets and dictionaries. For example:

arr = [ 4, 7, 5, 6, 7, 4 ]  # given a list
{ x for x in arr if x > 5 } # ... create a set of values greater than 5:
# => a set { 6, 7 }

input = { 'apple': 'red', 'banana': 'yellow' } # given a dictionary
{ value: key for key, value in input.items() } # ... create a dictionary with
                                               #     key-value pairs inverted
# => a dictionary { 'red': 'apple', 'yellow': 'banana' }

We can do the same things in TypeScript:

const arr = [ 4, 7, 5, 6, 7, 4 ];
new Set(arr.filter(x => x > 5));
// => a Set { 6, 7 }

const input = new Map([ ['apple','red'], ['banana','yellow'] ]);
new Map([ ...input.entries() ].map(([key, value]): [string, string] => [value, key]));
// => a Map { 'red' => 'apple', 'yellow' => 'banana' }

Breaking down the construction of that inverted map:

  • Notice that the Map constructor takes a series of two-element arrays: the first element of each is a key, the second is its value.
  • Map.entries returns an iterable that iterates over two-element key-value arrays.
  • Because JavaScript/TypeScript doesn’t provide map and filter on iterables, only on arrays, we use the spread operator ... to stuff those key-value arrays into an array.
  • Now we can call map on that array, and for each [key,value] entry, we reverse it to [value,key] and create the new map.

Just as idiomatic Python code will avoid for loops in favor of comprehensions for filtering and mapping data structures, idiomatic TypeScript will take advantage of map and filter in concert with tools like Map.entries and array methods like Array.entries, .some, and .every to avoid loops.

Any time you write a loop, reach for those tools first, instead of an index counter.
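For instance, .some and .every replace loops that search for a witness or check a property of all elements (the array here is invented for illustration):

```typescript
const nums: Array<number> = [2, 4, 6, 7];

const anyOdd: boolean = nums.some(x => x % 2 === 1);   // true: 7 is odd
const allEven: boolean = nums.every(x => x % 2 === 0); // false: 7 is odd

// compare with the loop the .some call replaces:
let anyOddLoop = false;
for (const x of nums) {
    if (x % 2 === 1) { anyOddLoop = true; break; }
}
```

Like filter and map, .some and .every take a predicate function, but they reduce the whole array to a single boolean.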

One more example

Let’s look at a typical database query example. Suppose we have a database about digital cameras, in which each object is of type Camera with various properties (brand, pixels, cost, etc.). The whole database is in an array called cameras. Then we can describe queries on this database using map/filter/reduce:

// What's the highest resolution Nikon sells? 
cameras.filter(camera => camera.brand === "Nikon")
       .map(camera => camera.pixels)
       .reduce((x,y) => Math.max(x,y));
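Here is the query running against a small hypothetical database (the Camera type and all brands and numbers below are invented for illustration):

```typescript
type Camera = { brand: string, pixels: number, cost: number };

const cameras: Array<Camera> = [
    { brand: "Nikon", pixels: 24000000, cost: 500 },
    { brand: "Canon", pixels: 30000000, cost: 700 },
    { brand: "Nikon", pixels: 45000000, cost: 1200 },
];

const maxNikonPixels: number = cameras
    .filter(camera => camera.brand === "Nikon")
    .map(camera => camera.pixels)
    .reduce((x, y) => Math.max(x, y));
// => 45000000
```

Note that reduce is called here without an initial value, so it would throw on an empty array; if the filter might match no cameras, pass an initial value such as 0.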

Relational databases use the map/filter/reduce paradigm (where it’s called project/select/aggregate). SQL (Structured Query Language) is the de facto standard language for querying relational databases. A typical SQL query looks like this:

select max(pixels) from cameras where brand = "Nikon"

cameras is a sequence (of table rows, where each row has the data for one camera)

where brand = "Nikon" is a filter

pixels is a map (extracting just the pixels field from the row)

max is a reduce

Summary

Zooming out, this reading is about functional programming: modeling problems and implementing systems with immutable data and operations that implement pure functions, as opposed to mutable data and operations with side effects.

Because functions in TypeScript are first-class, it is much easier to write higher-order functions that abstract away control statements.

Some languages — Haskell, Scala, OCaml — are strongly associated with functional programming. Many other languages — Swift, Ruby, and so on — use functional programming to a greater or lesser extent.

Functional programming helps make code:

  • Safe from bugs. More immutable data reduces the chance your code has unintended side effects.

  • Easy to understand. Chaining operations like map, filter, and reduce instead of using sequential control statements like for and if makes your code much easier to read and follow.

  • Ready for change. The functional style of programming yields code whose structure follows its meaning, which allows the programmer to focus on the heart of the computation and lets the code evolve in tandem with its problem model.