6.031
6.031 — Software Construction
Spring 2022

Reading 16: Map, Filter, Reduce

Software in 6.031

Safe from bugsEasy to understandReady for change
Correct today and correct in the unknown future. Communicating clearly with future programmers, including future you. Designed to accommodate change without rewriting.

Objectives

In this reading you’ll learn a design pattern for implementing functions that operate on sequences of elements, and you’ll see how treating functions themselves as first-class values that we can pass around and manipulate in our programs is an especially powerful idea.

  • Map/filter/reduce
  • Functional objects
  • Higher-order functions

First-class functions

Let’s start by reviewing an important Big Idea that you should have already encountered in 6.009: functions as first-class data values, meaning that they can be stored in variables, passed as arguments to functions, and created dynamically like other values.

For example, Math.sqrt is a reference to an object representing the sqrt function. The type of that object is number => number.

But you can also assign that function object to another variable if you like, and it still behaves like sqrt:

const mySquareRoot: (x: number) => number = Math.sqrt;
mySquareRoot(16.0); // returns 4.0

Note the type that we declared on mySquareRoot – this is a function type expression.

You can also pass a reference to the function object as a parameter to another function. You can use functions the same way you would use any other value in TypeScript, like numbers or string references or other object references. In other words, functions in TypeScript are first-class, which means they can be treated like any other value in the language: passed as parameters, returned as return values, and stored in variables and data structures.

Programming languages are full of things that are not first-class. For example, access control is not first-class – you can’t pass public or private as a parameter into a function, or store it in a data structure. The notion of public and private is an aspect of your program that TypeScript offers no way to refer to or manipulate at runtime. Similarly, a while loop or if statement is not first-class. You can’t refer to that piece of code all by itself, or manipulate it at runtime.

In old programming languages, only data was first-class: built-in types (like numbers) and user-defined types. But in modern programming languages, like Python and JavaScript, both data and functions are first-class. First-class functions are a very powerful programming idea. The first practical programming language that used them was Lisp, invented by John McCarthy at MIT. But the idea of programming with functions as first-class values actually predates computers, tracing back to Alonzo Church’s lambda calculus. The lambda calculus used the Greek letter λ to define new functions; this term stuck, and you’ll see it as a keyword not only in Lisp and its descendants, but also in Python.

Guido van Rossum, the creator of Python, wrote a blog post about the design principle that led not only to first-class functions in Python, but first-class methods as well: First-class Everything.

Function expressions in TypeScript

We have already been using function expressions, starting as far back as Testing, since the mocha framework uses them:

describe("Math.max", function() {
  it("covers a < b", function() {
    assert.strictEqual(Math.max(1, 2), 2);
  });
});

This code has two function expressions – the first one is passed as an argument to describe(), and the second (nested inside it) is passed as an argument to it().

TypeScript also has a more compact arrow syntax that avoids the need for a function keyword:

describe("Math.max", () => {
  it("covers a < b", () => {
    assert.strictEqual(Math.max(1, 2), 2);
  });
});

If the body of the function consists of only a single expression (i.e. like a Python lambda expression), then even the curly braces can be omitted:

it("covers a < b", () => assert.strictEqual(Math.max(1, 2), 2) );

But it turns out that there is a technical difference between arrow functions and function expressions that is important when using methods: a function expression may redefine this, but an arrow function uses the definition of this from its surrounding context. So arrow functions should always be used inside an instance method, rather than function expressions. Understanding This, Bind, Call, and Apply in JavaScript has a good explanation of the issues behind the meaning of this in JavaScript.

reading exercises

Sorting dogs

Suppose we have:

interface Dog {
    name(): string;
    breed(): Breed;
    loudnessOfBark(): number;
}

We have an array:

dogs: Array<Dog>

The Array.sort method takes a function argument that tells it how to compare two objects in the set; in this case, two Dogs. The type of this argument is (a: Dog, b: Dog) -> number, and it returns a negative number if a<b (i.e. a should sort before b), a positive number if a>b, or zero if a=b.

Which of these would sort our dogs from quietest bark to loudest?

dogs.sort(function(dog1: Dog, dog2: Dog) {
    return dog2.loudnessOfBark() - dog1.loudnessOfBark();
});

(missing explanation)

dogs.sort(function(dog1: Dog, dog2: Dog) {
    return dog1.loudnessOfBark() > dog2.loudnessOfBark() ? dog1 : dog2;
});

(missing explanation)

dogs.sort((dog1: Dog, dog2: Dog) => {
  return dog1.loudnessOfBark() - dog2.loudnessOfBark();
});
dogs.sort((dog1: Dog, dog2: Dog) => dog1.loudnessOfBark() - dog2.loudnessOfBark());

(missing explanation)

function quietestFirst(dog1: Dog, dog2: Dog) {
    return dog1.loudnessOfBark() - dog2.loudnessOfBark();
};

dogs.sort(quietestFirst());

(missing explanation)

Abstracting out control flow

In this reading we discuss map/filter/reduce, a design pattern that substantially simplifies the implementation of functions that operate over sequences of elements. In this example, we’ll have lots of sequences — arrays of files; input streams that are sequences of lines; lines that are sequences of words. Map/filter/reduce will enable us to operate on those sequences with no explicit control flow — not a single for loop or if statement.

Suppose we’re given the following problem: write a function that finds all the words in the TypeScript files in your project.

Following good practice, we break it down into several simpler steps and write a function for each one:

  • find all the files in the project, by scanning recursively from the project’s root folder
  • restrict them to files with a particular suffix, in this case .ts
  • open each file and read it in line-by-line
  • break each line into words

Writing the individual functions for these substeps, we’ll find ourselves writing a lot of low-level iteration code. For example, here’s what the recursive traversal of the project folder might look like:

import fs from 'fs';
import path from 'path';

/**
 * Find names of all files in the filesystem subtree rooted at folder.
 * @param folder root of subtree, requires fs.lstatSync(folder).isDirectory() === true
 * @returns array of names of all ordinary files (not folders) that have folder as
 *          their ancestor
 */
function allFilesIn(folder: string): Array<string> {
    let files: Array<string> = [];
    for (const child of fs.readdirSync(folder)) {
        const fullNameOfChild = path.join(folder, child);
        if (fs.lstatSync(fullNameOfChild).isDirectory()) {
            files = files.concat(allFilesIn(fullNameOfChild));
        } else if (fs.lstatSync(fullNameOfChild).isFile()) {
            files.push(fullNameOfChild);
        }
    }
    return files;
}

And here’s what the filtering function might look like, which restricts that file array down to just the TypeScript files. Imagine calling this like onlyFilesWithSuffix(files, ".ts"):

/**
 * Filter an array of files to those that end with suffix.
 * @param filenames array of filenames
 * @param suffix string to test
 * @returns a new array consisting of only those files whose names end with suffix
 */
function onlyFilesWithSuffix(filenames: Array<string>, suffix: string): Array<string> {
    const result: Array<string> = [];
    for (const f of filenames) {
        if (f.endsWith(suffix)) {
            result.push(f);
        }
    }
    return result;
}

→ full code for the example

Iterator abstraction

We’ve already seen one design pattern that abstracts away from the details of iterating over a data structure: Iterator.

Iterator gives you a sequence of elements from a data structure, without you having to worry about whether the data structure is a set or a hash table or an array — the Iterator looks the same no matter what the data structure is.

For example, given an Array<string> filenames, we can iterate using indices:

for (let i = 0; i < filenames.length; i++) {
    let f = filenames[i];
    // ...

But this code depends on the length property of Array, as well as [] for indexing, which might be different in another data structure. Using an iterator abstracts away the details:

const iter: Iterator<string> = filenames.iterator();
while (true) {
    let next = iter.next();
    if (next.done) break;
    let f = next.value;
    // ...

Now the loop will be identical for any iterable type that provides an Iterator. In TypeScript, these types all provide the iterator method. Any such iterable type can be used with TypeScript’s for..of statement — here, for (let f of filenames) — and under the hood, it uses an iterator.

Map/filter/reduce abstraction

The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit. In this paradigm, the control statements disappear: specifically, the for statements, the if statements, and the return statements in the code from our introductory example will be gone. We’ll also be able to get rid of most of the temporary names (i.e., the local variables filenames, f, and result).

We’ll have three operations for iterable types: map, filter, and reduce. Let’s look at each one in turn, and then look at how they work together. We’ll focus on using Array as the iterable type, because it provides map, filter, and reduce operations natively in TypeScript.

Map

Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order:

map : Array<‍E> × (E → F) → Array<‍F>

For example:

let array: Array<number> = [1, 4, 9, 16];
let result = array.map(Math.sqrt);

This code starts with an array containing the numbers 1, 4, 9, 16, and applies the square root function to each element to produce the values 1.0, 2.0, 3.0, 4.0. In the call array.map(Math.sqrt):

  • map has type Array<‍number> × (number → number) → Array<‍number>
  • the input array has type Array<‍number>
  • the mapping function Math.sqrt has type number → number
  • the return value has type Array<‍number>

We can write the same expression more compactly as:

[1, 4, 9, 16].map(Math.sqrt);

which is the syntax that future examples will use.

Another example of a map:

["A", "b", "C"].map(s => s.toLocaleLowerCase())

which produces an array containing "a", "b", "c". (Notice that when we call a function like Math.sqrt, we can refer to it directly. Calling a method like toLocaleLowerCase requires a lambda expression to wrap the method call syntax.)

Map is useful even if you don’t care about the return value of the function. When you have a sequence of mutable objects, for example, you may want to map a mutator operation over them. Because mutator operations typically return void, however, you can use forEach instead of map. forEach applies the function to each element of the sequence, but does not collect their return values into a new sequence:

itemsToRemove.forEach(item => mySet.delete(item));

(A lambda is needed here, too. The expression mySet.delete by itself would refer to the delete function without binding mySet to this, which will not work.)

The map and forEach operations capture a common pattern for operating over sequences: doing the same thing to each element of the sequence.

reading exercises

map 1

What sequence is the result of

["1", "2", "3"].map((s: string) => s.length);

(missing explanation)

map 2

What sequence is the result of

[1, 2, 3].map((s: string) => s.length);

(missing explanation)

Filter

Our next important sequence operation is filter, which tests each element of an Array<E> with a unary function from E to boolean.

Elements that satisfy the predicate are kept; those that don’t are removed. A new sequence is returned; filter doesn’t modify its input sequence.

filter : Array<‍E> × (E → boolean) → Array<‍E>

Examples:

[1, 2, 3, 4]
   .filter(x => x%2 === 1);
// returns [1, 3]

["x", "y", "2", "3", "a"]
   .filter(s => "abcdefghijklmnopqrstuvwxyz".includes(s)));
// returns ["x", "y", "a"]

const isNonempty = (s: string) => s.length > 0;
["abc", "", "d"]
   .filter(isNonempty);
// returns ["abc", "d"]

reading exercises

filter 1

Given:

let s1 = "Abel";
let s2 = "Baker";
let s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("A"));

(missing explanation)

filter 2

Again given:

let s1 = "Abel";
let s2 = "Baker";
let s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("Z"));

(missing explanation)

filter 3

What is the result of

let isDigit = (s: string) => s.length === 1 && "0123456789".includes(s);
["a", "1", "b", "2"]
    .filter(isDigit);

(missing explanation)

filter 4

What is the result of

let isDigit = (s: string) => s.length === 1 && "0123456789".includes(s);
["a", "1", "b", "2"]
    .filter( ! isDigit);

(missing explanation)

Reduce

Our final operator, reduce, combines the elements of the sequence together, using a binary function. In addition to the function and the array, it also takes an initial value that initializes the reduction, and that ends up being the return value if the array is empty:

reduce : Array<‍E> × (E × E → E) × E → E

arr.reduce(f, init) combines the elements of the array together. Here is one way that it might compute the result:

result0 = init
result1 = f(result0, arr[0])
result2 = f(result1, arr[1])
...
resultn = f(resultn-1, arr[n-1])

resultn is the final result for an n-element sequence.

Adding numbers is probably the most straightforward example:

[1,2,3]
    .reduce((x,y) => x+y, 0)
// computes (((0+1)+2)+3) to produce 6

Initial value

There are three design choices in the reduce operation. First is whether to require an initial value. In TypeScript, the initial value can be omitted, in which case reduce uses the first element of the sequence as the initial value of the reduction. But if the sequence is empty, then reduce has no value to return, and reduce throws a TypeError. It’s important to keep this in mind when using reducers like max, which have no well-defined initial value:

[5, 8, 3, 1].reduce((x,y) => Math.max(x,y))
// computes max(max(max(5,8),3),1) and returns 8

[].reduce((x,y) => Math.max(x,y))
// throws TypeError!

Reduction to another type

The second design choice is the return type of the reduce operation. It doesn’t necessarily have to match the element type of the original sequence. For example, we can use reduce to concatenate an array of numbers (type E) into a string (type F). This changes the reduce operation in two ways:

  • the initial value now has type F
  • the binary function is now an accumulator, of type F × E → F, that takes the current result (of type F) and a new element from the sequence (of type E), and produces an accumulated result of type F.

So a more general form of the reduce operation now looks like this:

reduce : Array<‍E> × (F × E → F) × F → F

In the special case where F is the same type as E, this type signature is the same as above.

Here’s a simple example that concatenates a sequence of numbers into a string:

[1,2,3].reduce( (s: string, n: number) => s + n, "" );
// returns "123"

We’ve included parameter type declarations in the lambda expression above in order to clarify the difference between the two parameters of the accumulator function.

Order of operations

The third design choice is the order in which the elements are accumulated. TypeScript’s reduce operation combines the sequence starting from the left (the first element):

result0 = init
result1 = f(result0, arr[0])
result2 = f(result1, arr[1])
...
resultn = f(resultn-1, arr[n-1])

Another TypeScript operation, reduceRight, goes in the other direction:

result0 = init
result1 = f(result0, arr[n-1])
result2 = f(result1, arr[n-2])
...
resultn = f(resultn-1, arr[0])

to produce resultn as the final result.

Here’s a diagram of the two ways to reduce, from the left or from the right. If the operator is non-commutative (like string concatenation, shown here), then each direction might produce a different answer:

reduce( [1, 2, 3], + , "" )
= (("" + 1) + 2) + 3
= "123"
reduceRight( [1, 2, 3], + , "" )
= (("" + 3) + 2) + 1
= "321"

reading exercises

reduce 1
[e1, e2, e3].reduce( (x: boolean, y: string) => x && y === 'true',
                     true );

In this reduction, what should be the type of the elements e1,e2,e3?

(missing explanation)

Assuming e1,e2,e3 have that type, which is the best description of the behavior of this reduction?

(missing explanation)

reduce 2

What is the result of:

[1, 2, 3].reduce((a, b) => a * b, 0)

(missing explanation)

What is the result of:

["oscar", "papa", "tango"]
    .reduce((a,b) => a.length > b.length ? a : b)

(missing explanation)

reduce 3

This exercise explores three different ways to reduce an array using Math.min(), finding the minimum value. Unlike the reducing functions in the previous exercises, which have identity elements that are natural to use as initial values of the reduction, Math.min() has no natural identity element.

Given an array of numbers:

let array: Array<number> = [...];

…here are three possible ways to compute the minimum value of the array:

let result = array.reduce((x, y) => Math.min(x, y));
let result;
try {
    result = array.reduce((x, y) => Math.min(x, y));
} catch (e) {
    result = 0;
}
let result = array.reduce((x, y) => Math.min(x, y), Number.POSITIVE_INFINITY);

(missing explanation)

(missing explanation)

(missing explanation)

Back to the motivating example

Going back to the example we started with, where we want to find all the words in the TypeScript files in our project, let’s try creating a useful abstraction for filtering files by suffix:

let endsWith = (suffix: string) => {
    return (filename: string) => filename.endsWith(suffix);
}

TypeScript’s string.endsWith is a function string × stringboolean.

Our new endsWith wrapper returns functions that are useful as filters. It takes a filename suffix like .ts and dynamically generates a function that we can use with filter to test for that suffix. Given a Array<string> filenames, we can now write, e.g., filenames.filter(endsWith(".ts")) to obtain a new filtered array.

endsWith is a different kind of beast than our usual functions. It’s a higher-order function, meaning that it’s a function that takes another function as an argument, or returns another function as its result, as endsWith does. Higher-order functions are operations on the data type of functions. In this case, endsWith is a creator of functions.

Its signature is: string → (stringboolean).

Now let’s use map and filter to recursively traverse the folder tree:

function allFilesIn(folder: string): Array<string> {
    const children: Array<string> = fs.readdirSync(folder)
                                    .map(f => path.join(folder, f));
    const descendants: Array<string> = children
                                       .filter(f => fs.lstatSync(f).isDirectory())
                                       .map(allFilesIn)
                                       .flat();
    return [
        ...descendants,
        ...children.filter(f => fs.lstatSync(f).isFile())
    ];
}

The first line gets all the children of the folder, which might look like this:

["src/client", "src/server", "src/Main.ts", ...]

The second line is the key bit: it filters the children for just the subfolders using the isDirectory method, and then recursively maps allFilesIn against this array of subfolders! The result might look like this:

[["src/client/MyClient.ts", ...], ["src/server/MyServer.ts", ...], ...]

So we have to flatten it to remove the nested structure, which is what flat) does:

["src/client/MyClient.ts", ..., "src/server/MyServer.ts", ...]

(This pattern of map-and-flatten is common enough that arrays also have a method to do both that we could have used instead: flatMap.)

Finally we add the immediate children that are plain files (not folders), and that’s our result.

We can also do the other pieces of the problem with map/filter/reduce. Once we have the array of all files underneath the current folder, we can filter it to just the TS files:

const filenames: Array<string> = allFilesIn(".")
                                 .filter(s => endsWith(".ts"));

Now that we have the files we want to extract words from, we’re ready to load their contents:

const fileContents: Array<Array<string>> = filenames.map(f => {
    try {
        const data = fs.readFileSync(f, { encoding: "utf8", flag: "r" });
        return data.split("\n");
    } catch (e) {
        console.error(e);
    }
});

Finally, we can flatten the array of arrays of lines into a simple array of lines:

const lines: Array<string> = fileContents.flat();

and then extract the nonempty words from each line:

const words: Array<string> = lines.map(line => line.split(/\W+/)
                                               .filter(s => s.length > 0))
                             .flat();

(Here we call split not with a simple string delimiter, but with a string-processing tool called a regular expression that we will discuss in an upcoming reading. This regular expression identifies substrings of non-word characters.)

And we’re done, we have our array of all words in the project’s TypeScript files! As promised, the control statements have disappeared.

→ full code for the example

Benefits of abstracting out control

Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and control flow.

reading exercises

map/filter/reduce

This function computes the product of just the odd integers in an array:

function productOfOdds(array: Array<number>): number {
    let result = 1;
    for (let x of array) {
        if (x % 2 === 1) {
            result *= x;
        }
    }
    return result;
}

Rewrite this code using map, filter, and reduce:

function productOfOdds(array: Array<number>): number {
    return ▶▶EXPR◀◀;
}
const x: number = 5;
const y: number = 8;

function isOdd(x:number): boolean {
  return x % 2 === 1;
}

const xIsOdd: boolean = x % 2 === 1;

function alwaysOdd(x:number): number {
  return isOdd(x) ? x : 1;
}

function identityFunction(x:number): number {
    return x;
}

function sum(x:number, y:number): number {
  return x + y;
}

const identity = (x:number) => x;

function product(x:number, y:number): number {
  return x * y;
}

function alwaysTrue(x:number): boolean {
  return true;
}

Assuming this code is part of the file shown on the right, which of the following expressions can replace ▶▶EXPR◀◀?

array
.map(identityFunction)
.filter(isOdd)
.reduce(product, 1)

(missing explanation)

array
.map(identity)
.filter(alwaysTrue)
.reduce(product, 1)

(missing explanation)

array
.map(x)
.filter(alwaysOdd)
.reduce(product, 1)

(missing explanation)

array
.map(alwaysOdd)
.filter(alwaysTrue)
.reduce(product, 1)

(missing explanation)

array
.map(identityFunction)
.filter(xIsOdd)
.reduce(x*y, 1)

(missing explanation)

Avoiding loops in TypeScript

Python makes filter and map easily accessible with list comprehensions:

doubleOdds = [ x*2 for x in arr if x % 2 == 1 ]

… which is like the TypeScript:

const doubleOdds = arr.filter(x => x % 2 === 1).map(x => x*2)

The result of a list comprehension is a list by virtue of the [] brackets. In Python, comprehensions work over any iterable source, and can also be used to build sets and dictionaries. For example:

arr = [ 4, 7, 5, 6, 7, 4 ]  # given a list
{ x for x in arr if x > 5 } # ... create a set of values greater than 5:
# => a set { 6, 7 }

input = { 'apple': 'red', 'banana': 'yellow' } # given a dictionary
{ value: key for key, value in input.items() } # ... create a dictionary with
                                               #     key-value pairs inverted
# => a dictionary { 'red': 'apple', 'yellow': 'banana' }

We can do the same things in TypeScript:

const arr = [ 4, 7, 5, 6, 7, 4 ];
new Set(arr.filter(x => x > 5));
// => a Set { 6, 7 }

const input = new Map([ ['apple','red'], ['banana','yellow'] ]);
new Map([ ...input.entries() ].map(entry => entry.reverse()));
// => a Map { 'red' => 'apple', 'yellow' => 'banana' }

Breaking down the construction of that inverted map:

  • Notice that the Map constructor takes a series of two-element arrays: the first element of each is a key, the second is its value.
  • Map.entries returns an iterable that iterates over two-element key-value arrays.
  • Because JavaScript/TypeScript doesn’t provide map and filter on iterables, only on arrays, we use the spread operator ... to stuff those key-value arrays into an array.
  • Now we can call map on that array, and for each [key,value] entry, we reverse it to [value,key] and create the new map.

Just as idiomatic Python code will avoid for loops in favor of comprehensions for filtering and mapping data structures, idiomatic TypeScript will take advantage of map and filter in concert with tools like Map.entries and array methods like Array.entries, .some, and .every to avoid loops.

Any time you write a loop, reach for those tools first, instead of an index counter.

One more example

Let’s look at a typical database query example. Suppose we have a database about digital cameras, in which each object is of type Camera with various properties (brand, pixels, cost, etc.). The whole database is in an array called cameras. Then we can describe queries on this database using map/filter/reduce:

// What's the highest resolution Nikon sells? 
cameras.filter(camera => camera.brand === "Nikon")
       .map(camera => camera.pixels)
       .reduce((x,y) => Math.max(x,y));

Relational databases use the map/filter/reduce paradigm (where it’s called project/select/aggregate). SQL (Structured Query Language) is the de facto standard language for querying relational databases. A typical SQL query looks like this:

select max(pixels) from cameras where brand = "Nikon"

cameras is a sequence (of table rows, where each row has the data for one camera)

where brand = "Nikon" is a filter

pixels is a map (extracting just the pixels field from the row)

max is a reduce

Summary

Zooming out, this reading is about modeling problems and implementing systems with immutable data and operations that implement pure functions, as opposed to mutable data and operations with side effects. Functional programming is the name for this style of programming.

Because functions in TypeScript are first-class, it is much easier to write higher-order functions that abstract away control flow code.

Some languages — Haskell, Scala, OCaml — are strongly associated with functional programming. Many other languages — Swift, Ruby, and so on — use functional programming to a greater or lesser extent.