Reading 26: Map, Filter, Reduce

Software in 6.031

Safe from bugs: Correct today and correct in the unknown future.
Easy to understand: Communicating clearly with future programmers, including future you.
Ready for change: Designed to accommodate change without rewriting.

Objectives

In this reading you’ll learn a design pattern for implementing functions that operate on sequences of elements, and you’ll see how treating functions themselves as first-class values that we can pass around and manipulate in our programs is an especially powerful idea.

Map/filter/reduce
Functional objects
Higher-order functions

Introduction: an example

Suppose we’re given the following problem: write a function that finds all the words in the TypeScript files in your project.

Following good practice, we break it down into several simpler steps and write a function for each one:

find all the files in the project, by scanning recursively from the project’s root folder
restrict them to files with a particular suffix, in this case .ts
open each file and read it in line-by-line
break each line into words

Writing the individual functions for these substeps, we’ll find ourselves writing a lot of low-level iteration code. For example, here’s what the recursive traversal of the project folder might look like:

import fs from 'fs';
import path from 'path';

/**
 * Find names of all files in the filesystem subtree rooted at folder.
 * @param folder root of subtree, requires fs.lstatSync(folder).isDirectory() === true
 * @return array of names of all ordinary files (not folders) that have folder as
 *         their ancestor
 */
function allFilesIn(folder:string):Array<string> {
    let files:Array<string> = [];
    for (let child of fs.readdirSync(folder)) {
        const fullNameOfChild = path.join(folder, child);
        if (fs.lstatSync(fullNameOfChild).isDirectory()) {
            files = files.concat(allFilesIn(fullNameOfChild));
        } else if (fs.lstatSync(fullNameOfChild).isFile()) {
            files.push(fullNameOfChild);
        }
    }
    return files;
}

And here’s what the filtering function might look like, which restricts that file array down to just the TypeScript files. Imagine calling this like onlyFilesWithSuffix(files, ".ts"):

/**
 * Filter an array of files to those that end with suffix.
 * @param filenames array of filenames
 * @param suffix string to test
 * @return a new array consisting of only those files whose names end with suffix
 */
function onlyFilesWithSuffix(filenames:Array<string>, suffix:string):Array<string> {
    let result:Array<string> = [];
    for (let f of filenames) {
        if (f.endsWith(suffix)) {
            result.push(f);
        }
    }
    return result;
}

→ full code for the example

In this reading we discuss map/filter/reduce, a design pattern that substantially simplifies the implementation of functions that operate over sequences of elements. In this example, we’ll have lots of sequences — arrays of files; input streams that are sequences of lines; lines that are sequences of words. Map/filter/reduce will enable us to operate on those sequences with no explicit control flow — not a single for loop or if statement.

Along the way, we’ll also see another example of an important Big Idea that we encountered in a previous reading: functions as first-class data values, meaning that they can be stored in variables, passed as arguments to functions, and created dynamically like other values.

Abstracting out control flow

We’ve already seen one design pattern that abstracts away from the details of iterating over a data structure: Iterator.

Iterator abstraction

Iterator gives you a sequence of elements from a data structure, without you having to worry about whether the data structure is a set or a hash table or an array — the Iterator looks the same no matter what the data structure is.

For example, given an Array<string> filenames, we can iterate using indices:

for (let i = 0; i < filenames.length; i++) {
    let f = filenames[i];
    // ...
}

But this code depends on the length property of Array, as well as [] for indexing, which might be different in another data structure. Using an iterator abstracts away the details:

let iter: Iterator<string> = filenames[Symbol.iterator]();
while (true) {
    let next = iter.next();
    if (next.done) break;
    let f = next.value;
    // ...
}

Now the loop will be identical for any iterable type that provides an Iterator. In TypeScript, these types all provide a [Symbol.iterator]() method that returns one. Any such iterable type can be used with TypeScript’s for..of statement (here, for (let f of filenames)), and under the hood, it uses an iterator.
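As a minimal sketch of that claim, the hypothetical helper collect below drives an iterator by hand and works unchanged on an array and on a Set. The helper name and the sample filenames are illustrative only; they are not part of the reading's example code.

```typescript
// Collect every element of any iterable by driving its iterator by hand.
// (collect is a hypothetical helper, written only for this illustration.)
function collect<T>(iterable: Iterable<T>): Array<T> {
    const result: Array<T> = [];
    const iter: Iterator<T> = iterable[Symbol.iterator]();
    while (true) {
        const next = iter.next();
        if (next.done) break;
        result.push(next.value);
    }
    return result;
}

// The same loop serves two different data structures.
const fromArray = collect(["a.ts", "b.ts"]);
const fromSet = collect(new Set(["a.ts", "b.ts"]));
```

Neither call site mentions length or [] indexing, which is the point of the abstraction.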

Map/filter/reduce abstraction

The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit. In this paradigm, the control statements disappear: specifically, the for statements, the if statements, and the return statements in the code from our introductory example will be gone. We’ll also be able to get rid of most of the temporary names (i.e., the local variables filenames, f, and result).

We’ll have three operations for iterable types: map, filter, and reduce. Let’s look at each one in turn, and then look at how they work together. We’ll focus on using Array as the iterable type, because it provides map, filter, and reduce operations natively in TypeScript.

Map

Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order:

map : Array<E> × (E → F) → Array<F>

For example:

let array:Array<number> = [1, 4, 9, 16];
let result = array.map(x => Math.sqrt(x));

This code starts with an array containing the numbers 1, 4, 9, 16, and applies the square root function to each element to produce the values 1.0, 2.0, 3.0, 4.0. In the call array.map(x => Math.sqrt(x)):

map has type Array<number> × (number → number) → Array<number>
the input array has type Array<number>
the mapping function x => Math.sqrt(x) has type number → number
the return value has type Array<number>

We can write the same expression more compactly as:

[1, 4, 9, 16].map(x => Math.sqrt(x))

which is the syntax that future examples will use.

Another example of a map:

["A", "b", "C"].map(s => s.toLocaleLowerCase())

which produces an array containing "a", "b", "c".

This operation captures a common pattern for operating over sequences: doing the same thing to each element of the sequence.
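To make the pattern concrete, here is one way map could be written with an explicit loop. This is only a sketch: mapArray is a hypothetical standalone function, not how the built-in Array.prototype.map is actually implemented.

```typescript
// Hypothetical standalone version of map, written with an explicit loop.
function mapArray<E, F>(arr: Array<E>, f: (e: E) => F): Array<F> {
    const result: Array<F> = [];
    for (const e of arr) {
        result.push(f(e));   // results are collected in input order
    }
    return result;
}

const roots = mapArray([1, 4, 9, 16], x => Math.sqrt(x));
const lengths = mapArray(["A", "bb", "CCC"], s => s.length);
```

The loop, the temporary array, and the push are exactly the low-level details that callers of map no longer have to write.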

Function references

A lambda expression like x => Math.sqrt(x) has an unnecessary level of indirection – it takes an argument, calls sqrt on that argument, and directly returns the result of sqrt. So calling the lambda expression is effectively the same as calling sqrt directly.

TypeScript lets us eliminate this level of indirection by referring to the sqrt function directly:

[1, 4, 9, 16].map(Math.sqrt)

But let’s pause here for a second, because we’re doing something unusual with functions. The map method takes a reference to a function as its argument, not the result of calling that function. When we wrote map(Math.sqrt), we didn’t call sqrt the way Math.sqrt(25) does. Instead we referred to the function itself by name. Math.sqrt is a reference to an object representing the sqrt function. The type of that object is (x: number) => number. More generally, functions in TypeScript have types notated using function type expressions like this one, which list the parameter types and the return type.

But you can also assign that function object to another variable if you like, and it still behaves like sqrt:

let mySquareRoot = Math.sqrt;
mySquareRoot(16.0); // returns 4.0

You can also define your own function and assign it to a variable:

let runHailstoneStep = (x:number) => x % 2 === 0 ? x/2 : x*3+1;
runHailstoneStep(7.0); // returns 22.0

You can also pass a reference to the function object as a parameter to another function, and that’s what we’re doing here with map. You can use function objects the same way you would use any other value in TypeScript, like numbers or string references or other object references. In other words, functions in TypeScript are first-class.
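One caveat is worth a small sketch. Array.prototype.map calls its function with three arguments (element, index, array), so a bare function reference behaves like the lambda version only if the function ignores the extra arguments. The variable names below are illustrative.

```typescript
const digits: Array<string> = ["1", "2", "3"];

// parseInt's second parameter is a radix, so map's index is taken as the radix:
// parseInt("1", 0), parseInt("2", 1), parseInt("3", 2).
const surprising = digits.map(parseInt);             // [1, NaN, NaN]

// Wrapping the call in a lambda passes only the element.
const expected = digits.map(s => parseInt(s, 10));   // [1, 2, 3]

// Math.sqrt ignores extra arguments, so the bare reference is safe here.
const roots = [1, 4, 9, 16].map(Math.sqrt);          // [1, 2, 3, 4]
```

When in doubt about how many parameters a function accepts, the explicit lambda is the safer choice.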

More ways to use map

Map is useful even if you don’t care about the return value of the function. When you have a sequence of mutable objects, for example, you may want to map a mutator operation over them. Because mutator operations typically return void, however, you can use forEach instead of map. forEach applies the function to each element of the sequence, but does not collect their return values into a new sequence:

workers.forEach(worker => worker.postMessage('hello!'));

sockets.forEach(socket => socket.close());
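The difference between forEach and map for mutators can be seen in a small self-contained sketch. The Counter class here is hypothetical, invented only so that the example runs without workers or sockets.

```typescript
// Hypothetical mutable type, used only for this illustration.
class Counter {
    private count = 0;
    public increment(): void { this.count++; }
    public get value(): number { return this.count; }
}

const counters: Array<Counter> = [new Counter(), new Counter()];

// forEach runs the mutator on every element and collects nothing.
counters.forEach(c => c.increment());

// map also runs the mutator, but builds an array of the (void) results,
// which is wasted work when nobody reads it.
const ignored = counters.map(c => c.increment());

// Each counter has now been incremented twice.
const values = counters.map(c => c.value);
```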

reading exercises

map 1

What sequence is the result of

["1", "2", "3"].map(s => s.length);

map 2

What sequence is the result of

[1, 2, 3].map(s => s.length);

Filter

Our next important sequence operation is filter, which tests each element of an Array<E> with a unary function from E to boolean.

Elements that satisfy the predicate are kept; those that don’t are removed. A new sequence is returned; filter doesn’t modify its input sequence.

filter : Array<E> × (E → boolean) → Array<E>

Examples:

["x", "y", "2", "3", "a"]
   .filter(s => s.match(/[a-z]/i));
// returns ["x", "y", "a"]

[1, 2, 3, 4]
   .filter(x => x%2 === 1);
// returns [1, 3]

let isNonempty = (s:string) => s.length > 0;
["abc", "", "d"]
   .filter(isNonempty);
// returns ["abc", "d"]
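As with map, a loop-based sketch shows what filter abstracts away. filterArray is a hypothetical standalone function written for illustration, not the built-in implementation.

```typescript
// Hypothetical standalone version of filter, written with an explicit loop.
function filterArray<E>(arr: Array<E>, predicate: (e: E) => boolean): Array<E> {
    const result: Array<E> = [];
    for (const e of arr) {
        if (predicate(e)) {
            result.push(e);   // keep only elements that satisfy the predicate
        }
    }
    return result;
}

const input = [1, 2, 3, 4];
const odds = filterArray(input, x => x % 2 === 1);
// input still has all four elements; only the new array is shorter
```

This is the same shape as onlyFilesWithSuffix from the introduction, with the suffix test replaced by an arbitrary predicate.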

reading exercises

filter 1

Given:

let s1 = "Abel";
let s2 = "Baker";
let s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("A"));

[s1, s2, s3]

["Abel", "Baker", "Charlie"]

["Abel", "", ""]

["Abel"]

[]

static error

dynamic error

filter 2

Again given:

let s1 = "Abel";
let s2 = "Baker";
let s3 = "Charlie";

What sequence is the result of

[s1, s2, s3]
    .filter(s => s.startsWith("Z"));

[s1, s2, s3]

["Abel", "Baker", "Charlie"]

["", "", ""]

[]

static error

dynamic error

filter 3

What is the result of

let isDigit = (s:string) => s.match(/[0-9]/i);
["a", "1", "b", "2"]
    .filter(isDigit);

filter 4

What is the result of

let isDigit = (s:string) => s.match(/[0-9]/i);
["a", "1", "b", "2"]
    .filter( ! isDigit);

Reduce

Our final operator, reduce, combines the elements of the sequence together, using a binary function. In addition to the function and the array, it also takes an initial value that initializes the reduction, and that ends up being the return value if the array is empty:

reduce : Array<E> × (E × E → E) × E → E

arr.reduce(f, init) combines the elements of the array together. Here is one way that it might compute the result:

result₀ = init
result₁ = f(result₀, arr[0])
result₂ = f(result₁, arr[1])
...
result_n = f(result_n-1, arr[n-1])

result_n is the final result for an n-element sequence.

Adding numbers is probably the most straightforward example:

[1,2,3]
    .reduce((x,y) => x+y, 0)
// computes (((0+1)+2)+3) to produce 6
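The step-by-step definition above translates almost directly into a loop. The following is a sketch under that definition; reduceArray is a hypothetical name, and the built-in reduce may be implemented differently.

```typescript
// Hypothetical standalone version of reduce for the E × E → E case.
function reduceArray<E>(arr: Array<E>, f: (x: E, y: E) => E, init: E): E {
    let result = init;            // result_0 = init
    for (const e of arr) {
        result = f(result, e);    // result_i = f(result_(i-1), arr[i-1])
    }
    return result;                // init itself if arr is empty
}

const sum = reduceArray([1, 2, 3], (x, y) => x + y, 0);
const emptySum = reduceArray<number>([], (x, y) => x + y, 0);
```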

Initial value

There are three design choices in the reduce operation. First is whether to require an initial value. In TypeScript, the initial value can be omitted, in which case reduce uses the first element of the sequence as the initial value of the reduction. But if the sequence is empty, then reduce has no value to return, and reduce throws a TypeError. It’s important to keep this in mind when using reducers like max, which have no well-defined initial value:

[5, 8, 3, 1].reduce((x,y) => Math.max(x,y))
// computes max(max(max(5,8),3),1) and returns 8

[].reduce((x,y) => Math.max(x,y))
// throws TypeError!
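One possible way to cope with the empty case is to check for it before reducing. The helper maxOf below is hypothetical, and returning undefined for an empty array is just one design choice among several.

```typescript
// Hypothetical helper: the maximum of an array, or undefined if there is none.
function maxOf(numbers: Array<number>): number | undefined {
    if (numbers.length === 0) {
        return undefined;   // no maximum exists for an empty array
    }
    return numbers.reduce((x, y) => Math.max(x, y));
}

// Without the guard, reducing an empty array with no initial value throws.
let threw = false;
try {
    const empty: Array<number> = [];
    empty.reduce((x, y) => Math.max(x, y));
} catch (e) {
    threw = e instanceof TypeError;
}
```

Making the "no answer" case visible in the return type forces callers to handle it, rather than discovering the TypeError at runtime.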

Reduction to another type

The second design choice is the return type of the reduce operation. It doesn’t necessarily have to match the element type of the original sequence. For example, we can use reduce to concatenate an array of numbers (type E) into a string (type F). This changes the reduce operation in two ways:

the initial value now has type F
the binary function is now an accumulator, of type F × E → F, that takes the current result (of type F) and a new element from the sequence (of type E), and produces an accumulated result of type F.

So a more general form of the reduce operation now looks like this:

reduce : Array<E> × (F × E → F) × F → F

In the special case where F is the same type as E, this type signature is the same as above.

Here’s a simple example that concatenates a sequence of numbers into a string:

[1,2,3].reduce( (s:string, n:number) => s + n, "" );
// returns "123"

We’ve included parameter type declarations in the lambda expression above in order to clarify the difference between the two parameters of the accumulator function.
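Two more small sketches of reducing to a different type, here with E = string and F first number and then boolean. The sample words are illustrative.

```typescript
const words: Array<string> = ["map", "filter", "reduce"];

// F = number: accumulate the total number of characters (3 + 6 + 6).
const totalLength = words.reduce(
    (total: number, w: string) => total + w.length, 0);

// F = boolean: is any word longer than 5 characters?
const anyLong = words.reduce(
    (found: boolean, w: string) => found || w.length > 5, false);
```

In both cases the initial value has the result type F, not the element type E.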

Order of operations

The third design choice is the order in which the elements are accumulated. TypeScript’s reduce operation combines the sequence starting from the left (the first element):

result₀ = init
result₁ = f(result₀, arr[0])
result₂ = f(result₁, arr[1])
...
result_n = f(result_n-1, arr[n-1])

Another TypeScript operation, reduceRight, goes in the other direction:

result₀ = init
result₁ = f(result₀, arr[n-1])
result₂ = f(result₁, arr[n-2])
...
result_n = f(result_n-1, arr[0])

to produce result_n as the final result.

Here’s a diagram of the two ways to reduce, from the left or from the right. If the operator is non-commutative (like string concatenation, shown here), then each direction might produce a different answer:

reduce( [1, 2, 3], + , "" ) = (("" + 1) + 2) + 3 = "123"
reduceRight( [1, 2, 3], + , "" ) = (("" + 3) + 2) + 1 = "321"
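The two directions can be checked with a short runnable sketch, using the same concatenation example plus addition for contrast.

```typescript
const digits = [1, 2, 3];

// String concatenation: direction changes the answer.
const leftToRight = digits.reduce((s: string, n: number) => s + n, "");
const rightToLeft = digits.reduceRight((s: string, n: number) => s + n, "");

// Addition: both directions agree.
const sumLeft = digits.reduce((x, y) => x + y, 0);
const sumRight = digits.reduceRight((x, y) => x + y, 0);
```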

reading exercises

reduce 1

[e1, e2, e3].reduce( (x:boolean, y:string) => x && y === 'true',
                     true );

In this reduction, what should be the type of the elements e1,e2,e3?

boolean

number

string

string × boolean → boolean

boolean × boolean → boolean

Assuming e1,e2,e3 have that type, which is the best description of the behavior of this reduction?

always returns false

returns true iff the array is empty

returns true iff e1,e2,e3 are all true

returns true iff e1,e2,e3 are strings

returns true iff e1,e2,e3 are all the string "true"

returns true iff at least one of e1,e2,e3 is the string "true"

reduce 2

What is the result of:

[1, 2, 3].reduce((a, b) => a * b, 0)

What is the result of:

["oscar", "papa", "tango"]
    .reduce((a,b) => a.length > b.length ? a : b)

reduce 3

This exercise explores three different ways to reduce an array using Math.min(), finding the minimum value. Unlike the reducing functions in the previous exercises, which have identity elements that are natural to use as initial values of the reduction, Math.min() has no natural identity element.

Given an array of numbers:

let array:Array<number> = [...];

…here are three possible ways to compute the minimum value of the array:

let result = array.reduce((x, y) => Math.min(x, y));

let result;
try {
    result = array.reduce((x, y) => Math.min(x, y));
} catch (e) {
    result = 0;
}

let result = array.reduce((x, y) => Math.min(x, y), Number.POSITIVE_INFINITY);

Which of the following are true about the first approach?

result is the right answer when the array is [1, 2, 3]

result is the right answer when the array is [-1, -2, -3]

result is 0 when the array is empty

result is a large number when the array is empty

an exception is thrown when the array is empty

Which of the following are true about the second approach?

result is the right answer when the array is [1, 2, 3]

result is the right answer when the array is [-1, -2, -3]

result is 0 when the array is empty

result is a large number when the array is empty

an exception is thrown when the array is empty

Which of the following are true about the third approach?

result is the right answer when the array is [1, 2, 3]

result is the right answer when the array is [-1, -2, -3]

result is 0 when the array is empty

result is infinite when the array is empty

an exception is thrown when the array is empty

Back to the intro example

Going back to the example we started with, where we want to find all the words in the TypeScript files in our project, let’s try creating a useful abstraction for filtering files by suffix:

let endsWith = (suffix: string) => {
    return (filename: string) => filename.endsWith(suffix);
}

TypeScript’s string.endsWith is a function string × string → boolean.

Our new endsWith wrapper returns functions that are useful as filters. It takes a filename suffix like .ts and dynamically generates a function that we can use with filter to test for that suffix. Given an Array<string> filenames, we can now write, e.g., filenames.filter(endsWith(".ts")) to obtain a new filtered array.

endsWith is a different kind of beast than our usual functions. It’s a higher-order function, meaning that it’s a function that takes another function as an argument, or returns another function as its result, as endsWith does. Higher-order functions are operations on the datatype of functions. In this case, endsWith is a creator of functions.

Its signature is: string → (string → boolean).
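Higher-order functions can also take functions as arguments and return new ones. As a sketch, the hypothetical helper not below has signature (E → boolean) → (E → boolean): it takes a predicate and returns its negation. The sample filenames are made up for the illustration.

```typescript
// Creator of filter functions, as defined in the text.
const endsWith = (suffix: string) => {
    return (filename: string) => filename.endsWith(suffix);
};

// Hypothetical helper: takes a predicate and returns the opposite predicate.
function not<E>(predicate: (e: E) => boolean): (e: E) => boolean {
    return (e: E) => !predicate(e);
}

const filenames = ["Main.ts", "README.md", "util.ts"];
const tsFiles = filenames.filter(endsWith(".ts"));
const otherFiles = filenames.filter(not(endsWith(".ts")));
```

Note that not(endsWith(".ts")) builds a new function before filter ever runs; nothing is tested until filter calls it on each element.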

Now let’s use map and filter to recursively traverse the folder tree:

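function allFilesIn(folder: string): Array<string> {
    // readdirSync returns bare names, so join each with folder to get a usable path
    // (assumes path has been imported from 'path')
    let children: Array<string> = fs.readdirSync(folder)
                                    .map(f => path.join(folder, f));
    let descendants: Array<string> = children
                                     .filter(f => fs.lstatSync(f).isDirectory())
                                     .flatMap(allFilesIn);
    return [
        ...descendants,
        ...children.filter(f => fs.lstatSync(f).isFile())
    ];
}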

The first line gets all the children of the folder, which might look like this:

["src/client", "src/server", "src/Main.ts", ...]

The second line is the key bit: it filters the children for just the subfolders using the isDirectory method, and then recursively maps allFilesIn against this array of subfolders! The result might look like this:

[["src/client/MyClient.ts", ...], ["src/server/MyServer.ts", ...], ...]

So we have to flatten it to remove the nested structure, which is what flatMap does:

["src/client/MyClient.ts", ..., "src/server/MyServer.ts", ...]

Finally we add the immediate children that are plain files (not folders), and that’s our result.

We can also do the other pieces of the problem with map/filter/reduce. Once we have the array of all files underneath the current folder, we can filter it to just the TS files:

let filenames: Array<string> = allFilesIn(".")
                      .filter(endsWith(".ts"));

Now that we have the files we want to extract words from, we’re ready to load their contents:

let fileContents: Array<Array<string>> = filenames.map(f => {
    try {
        const data = fs.readFileSync(f, { encoding: "utf8", flag: "r" });
        return data.split(/\r?\n/);
    } catch (e) {
        console.error(e);
        return [];   // an unreadable file contributes no lines
    }
});

Finally, we can flatten the array of arrays of lines into a simple array of lines:

let lines: Array<string> = fileContents.flatMap(x => x);

and then extract the nonempty words from each line:

let words: Array<string> = lines.flatMap(line => line.split(/\W+/)
                                             .filter(s => s.length > 0));

And we’re done: we have our array of all words in the project’s TypeScript files! As promised, the control statements have disappeared.
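The later stages of the pipeline can be tried without touching the filesystem. The sketch below substitutes a hypothetical in-memory project (a Map from filename to file contents, with made-up contents) for allFilesIn and readFileSync, and chains the remaining steps into one expression.

```typescript
// Hypothetical in-memory "project": filename -> file contents.
const project = new Map<string, string>([
    ["src/Main.ts", "let x = 1;\nlet y = x;"],
    ["README.md", "not TypeScript"],
    ["src/util.ts", "export const z = 2;"],
]);

const endsWith = (suffix: string) => (filename: string) => filename.endsWith(suffix);

const words: Array<string> = Array.from(project.keys())
    .filter(endsWith(".ts"))                          // keep only the TypeScript files
    .map(f => project.get(f) ?? "")                   // load each file's contents
    .flatMap(contents => contents.split(/\r?\n/))     // flatten into an array of lines
    .flatMap(line => line.split(/\W+/)
                         .filter(s => s.length > 0)); // nonempty words of each line
```

Each stage consumes the whole sequence produced by the previous one, with no loop variables or intermediate names.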

→ full code for the example

Benefits of abstracting out control

Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and control flow.

By arranging our program in terms of map, filter, and reduce, and in particular using immutable datatypes and pure functions (functions that do not mutate data) as much as possible, we’ve created more opportunities for safe concurrency. Maps and filters using pure functions over immutable datatypes are instantly parallelizable: invocations of the function on different elements of the sequence can be run in different threads, on different processors, even on different machines, and the result will still be the same. MapReduce is a pattern for parallelizing very large computations in this way, when they are big enough to require a cluster of machines.

reading exercises

map/filter/reduce

This function computes the product of just the odd integers in an array:

function productOfOdds(array: Array<number>): number {
    let result = 1;
    for (let x of array) {
        if (x % 2 === 1) {
            result *= x;
        }
    }
    return result;
}

Rewrite this code using map, filter, and reduce:

function productOfOdds(array: Array<number>): number {
    return ▶▶EXPR◀◀;
}

const x:number = 5;
const y:number = 8;

function isOdd(x:number): boolean {
  return x % 2 === 1;
}

const xIsOdd:boolean = x % 2 === 1;

function oddOrIdentity(x:number): number {
  return isOdd(x) ? x : 1;
}

function identityFunction(x:number): number {
    return x;
}

function sum(x:number, y:number): number {
  return x + y;
}

const identity = (x:number) => x;

function product(x:number, y:number): number {
  return x * y;
}

function alwaysTrue(x:number): boolean {
  return true;
}

Assuming this code is part of the file shown above, which of the following expressions can replace ▶▶EXPR◀◀?

array
.map(identityFunction)
.filter(isOdd)
.reduce(product, 1)

array
.map(identity)
.filter(alwaysTrue)
.reduce(product, 1)

array
.map(x)
.filter(oddOrIdentity)
.reduce(product, 1)

array
.map(oddOrIdentity)
.filter(alwaysTrue)
.reduce(product, 1)

array
.map(identityFunction)
.filter(xIsOdd)
.reduce(x*y, 1)

More examples

Let’s look at a typical database query example. Suppose we have a database about digital cameras, in which each object is of type Camera with various properties (brand, pixels, cost, etc.). The whole database is in an array called cameras. Then we can describe queries on this database using map/filter/reduce:

// What's the highest resolution Nikon sells? 
cameras.filter(camera => camera.brand === "Nikon")
       .map(camera => camera.pixels)
       .reduce((x,y) => Math.max(x,y));

Relational databases use the map/filter/reduce paradigm (where it’s called project/select/aggregate). SQL (Structured Query Language) is the de facto standard language for querying relational databases. A typical SQL query looks like this:

select max(pixels) from cameras where brand = "Nikon"

cameras is a sequence (of table rows, where each row has the data for one camera)

where brand = "Nikon" is a filter

pixels is a map (extracting just the pixels field from the row)

max is a reduce
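To run the camera query, we need a Camera type and some data. Both are hypothetical in the sketch below: the interface fields follow the properties named above, and the four sample rows are invented purely for illustration.

```typescript
// Hypothetical Camera type and sample data, for illustration only.
interface Camera {
    readonly brand: string;
    readonly pixels: number;
    readonly cost: number;
}

const cameras: Array<Camera> = [
    { brand: "Nikon", pixels: 24000000, cost: 900 },
    { brand: "Nikon", pixels: 45000000, cost: 3000 },
    { brand: "Canon", pixels: 30000000, cost: 1500 },
    { brand: "Canon", pixels: 20000000, cost: 700 },
];

// select max(pixels) from cameras where brand = "Nikon"
const highestNikon = cameras.filter(camera => camera.brand === "Nikon")
                            .map(camera => camera.pixels)
                            .reduce((x, y) => Math.max(x, y));

// select sum(cost) from cameras where brand = "Canon"
const totalCanonCost = cameras.filter(camera => camera.brand === "Canon")
                              .map(camera => camera.cost)
                              .reduce((x, y) => x + y, 0);
```

The max query omits the initial value, so it would throw a TypeError if no camera matched the filter; the sum query supplies 0 and would simply return 0.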

Summary

This reading is about modeling problems and implementing systems with immutable data and operations that implement pure functions, as opposed to mutable data and operations with side effects. Functional programming is the name for this style of programming.

Because functions in TypeScript are first-class, it is much easier to write higher-order functions that abstract away control flow code.

Some languages — Haskell, Scala, OCaml — are strongly associated with functional programming. Many other languages — Swift, Ruby, and so on — use functional programming to a greater or lesser extent.