Reading 13: Map, Filter, Reduce
Software in 6.102
Objectives
In this reading you’ll learn a design pattern for implementing functions that operate on sequences of elements, and you’ll see how treating functions themselves as first-class values that we can pass around and manipulate in our programs is an especially powerful idea.
First-class functions
Let’s start by reviewing an important Big Idea that you should have already encountered in 6.101 [formerly 6.009]: functions as first-class data values, meaning that they can be stored in variables, passed as arguments to functions, and created dynamically like other values.
For example, Math.sqrt
is a reference to an object representing the sqrt
function.
The type of that object is (x: number) => number
.
But you can also assign that function object to another variable if you like, and it still behaves like sqrt
:
const mySquareRoot: (x: number) => number = Math.sqrt;
mySquareRoot(16.0); // returns 4.0
The type that we declared on mySquareRoot
is a function type expression.
Note that the parameter name x
is required!
If you write (number) => number
, it means “a function with a parameter named number
of type any
that returns a number
“, because omitted types are implicitly any
.
You can also pass a reference to the function object as a parameter to another function. You can use functions the same way you would use any other value in TypeScript, like numbers or string references or other object references. In other words, functions in TypeScript are first-class, which means they can be treated like any other value in the language: passed as parameters, returned as return values, and stored in variables and data structures.
Programming languages are full of things that are not first-class.
For example, access control is not first-class – you can’t pass public
or private
as a parameter into a function, or store it in a data structure.
The notion of public
and private
is an aspect of your program that TypeScript offers no way to refer to or manipulate at runtime.
Similarly, a while
loop or if
statement is not first-class.
You can’t refer to that piece of code all by itself, or manipulate it at runtime.
In old programming languages, only data was first-class: built-in types (like numbers) and user-defined types. But in modern programming languages, like Python and JavaScript, both data and functions are first-class. First-class functions are a very powerful programming idea. The first practical programming language that used them was Lisp, invented by John McCarthy at MIT. But the idea of programming with functions as first-class values actually predates computers, tracing back to Alonzo Church’s lambda calculus. The lambda calculus used the Greek letter λ to define new functions; this term stuck, and you’ll see it as a keyword not only in Lisp and its descendants, but also in Python.
Guido van Rossum, the creator of Python, wrote a blog post about the design principle that led not only to first-class functions in Python, but first-class methods as well: First-class Everything.
Function expressions in TypeScript
We have already been using function expressions, starting as far back as Testing, since the mocha framework uses them:
describe("Math.max", function() {
it("covers a < b", function() {
assert.strictEqual(Math.max(1, 2), 2);
});
});
This code has two function expressions – the first one is passed as an argument to describe()
, and the second (nested inside it) is passed as an argument to it()
.
TypeScript also has a more compact arrow syntax that avoids the need for a function
keyword:
describe("Math.max", () => {
it("covers a < b", () => {
assert.strictEqual(Math.max(1, 2), 2);
});
});
If the body of the function consists of only a single expression (i.e. like a Python lambda
expression), then even the curly braces can be omitted:
it("covers a < b", () => assert.strictEqual(Math.max(1, 2), 2) );
But it turns out that there is a technical difference between arrow functions and function
expressions that is important when using methods: a function
expression may redefine this
, but an arrow function uses the definition of this
from its surrounding context.
So arrow functions should always be used inside an instance method, rather than function
expressions.
Understanding This, Bind, Call, and Apply in JavaScript has a good explanation of the issues behind the meaning of this
in JavaScript.
reading exercises
Suppose we have:
interface Dog {
name(): string;
breed(): Breed;
loudnessOfBark(): number;
}
We have an array:
dogs: Array<Dog>
The Array.sort
method takes a function argument that tells it how to compare two objects in the set; in this case, two Dog
s.
The type of this argument is (a: Dog, b: Dog) => number
, and it returns a negative number if a<b
(i.e. a
should sort before b
), a positive number if a>b
, or zero if a=b
.
Which of these would sort our dogs from quietest bark to loudest?
dogs.sort(function(dog1: Dog, dog2: Dog) {
return dog2.loudnessOfBark() - dog1.loudnessOfBark();
});
(missing explanation)
dogs.sort(function(dog1: Dog, dog2: Dog) {
return dog1.loudnessOfBark() > dog2.loudnessOfBark() ? dog1 : dog2;
});
(missing explanation)
dogs.sort((dog1: Dog, dog2: Dog) => {
return dog1.loudnessOfBark() - dog2.loudnessOfBark();
});
dogs.sort((dog1: Dog, dog2: Dog) => dog1.loudnessOfBark() - dog2.loudnessOfBark());
(missing explanation)
function quietestFirst(dog1: Dog, dog2: Dog) {
return dog1.loudnessOfBark() - dog2.loudnessOfBark();
};
dogs.sort(quietestFirst());
(missing explanation)
Abstracting out control flow
In this reading we discuss map/filter/reduce, a design pattern that substantially simplifies the implementation of functions that operate over sequences of elements.
In this example, we’ll have lots of sequences — arrays of files; input streams that are sequences of lines; lines that are sequences of words.
Map/filter/reduce will enable us to operate on those sequences with no explicit control flow — not a single for
loop or if
statement.
Suppose we’re given the following problem: write a function that finds all the words in the TypeScript files in your project.
Following good practice, we break it down into several simpler steps and write a function for each one:
- find all the files in the project, by scanning recursively from the project’s root folder
- restrict them to files with a particular suffix, in this case
.ts
- open each file and read it in line-by-line
- break each line into words
Writing the individual functions for these substeps, we’ll find ourselves writing a lot of low-level iteration code. For example, here’s what the recursive traversal of the project folder might look like:
import fs from 'fs';
import path from 'path';
/**
* Find names of all files in the filesystem subtree rooted at folder.
* @param folder root of subtree, requires fs.lstatSync(folder).isDirectory() === true
* @returns array of names of all ordinary files (not folders) that have folder as
* their ancestor
*/
function allFilesIn(folder: string): Array<string> {
let files: Array<string> = [];
for (const child of fs.readdirSync(folder)) {
const fullNameOfChild = path.join(folder, child);
if (fs.lstatSync(fullNameOfChild).isDirectory()) {
files = files.concat(allFilesIn(fullNameOfChild));
} else if (fs.lstatSync(fullNameOfChild).isFile()) {
files.push(fullNameOfChild);
}
}
return files;
}
And here’s what the filtering function might look like, which restricts that file array down to just the TypeScript files.
Imagine calling this like onlyFilesWithSuffix(files, ".ts")
:
/**
* Filter an array of files to those that end with suffix.
* @param filenames array of filenames
* @param suffix string to test
* @returns a new array consisting of only those files whose names end with suffix
*/
function onlyFilesWithSuffix(filenames: Array<string>, suffix: string): Array<string> {
const result: Array<string> = [];
for (const f of filenames) {
if (f.endsWith(suffix)) {
result.push(f);
}
}
return result;
}
Iterator abstraction
One design pattern that abstracts away from the details of iterating over a data structure is an iterator.
An iterator gives you a sequence of elements from a data structure, without you having to worry about whether the data structure is a set or a hash table or an array — the Iterator
looks the same no matter what the data structure is.
For example, given an Array<string>
filenames
, we can iterate using indices:
for (let i = 0; i < filenames.length; i++) {
let f = filenames[i];
// ...
But this code depends on the length
property of Array
, as well as []
for indexing, which might be different in another data structure.
(A Set
, for example, uses size
instead of length, and has no []
indexing at all.)
Using an iterator abstracts away the details:
const iter: Iterator<string> = filenames.iterator();
while (true) {
let next = iter.next();
if (next.done) break;
let f = next.value;
// ...
Now the loop will be identical for any iterable type that provides an Iterator
.
filenames
could be an Array
or Set
or any other iterable type, and the code would work the same.
In TypeScript, these types all provide the iterator
method.
Any such iterable type can be used with TypeScript’s for..of
statement — here, for (let f of filenames)
— and under the hood, it uses an iterator.
Map/filter/reduce abstraction
The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit.
In this paradigm, the control statements disappear: specifically, the for
statements, the if
statements, and the return
statements in the code from our introductory example will be gone.
We’ll also be able to get rid of most of the temporary names (i.e., the local variables filenames
, f
, and result
).
We’ll have three operations for iterable types: map, filter, and reduce.
Let’s look at each one in turn, and then look at how they work together.
We’ll focus on using Array
as the iterable type, because it provides map, filter, and reduce operations natively in TypeScript.
Map
Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order:
map : Array<E> × (E → F) → Array<F>
For example:
const array: Array<number> = [1, 4, 9, 16];
const result = array.map(Math.sqrt);
This code starts with an array containing the numbers 1, 4, 9, 16
and applies the square root function to each element to produce the values 1.0, 2.0, 3.0, 4.0
.
In the call array.map(Math.sqrt)
:
map
has typeArray<number> × (number → number) → Array<number>
- the input
array
has typeArray<number>
- the mapping function
Math.sqrt
has typenumber → number
- the return value has type
Array<number>
We can write the same expression more compactly as:
[1, 4, 9, 16].map(Math.sqrt);
which is the syntax that future examples will use.
["A", "b", "C"].map(s => s.toLocaleLowerCase())
which produces an array containing "a", "b", "c"
.
(Notice that when we call a function like Math.sqrt
, we can refer to it directly.
Calling a method like toLocaleLowerCase
requires a lambda expression to wrap the method call syntax.)
Map is useful even if you don’t care about the return value of the function.
When you have a sequence of mutable objects, for example, you may want to map a mutator operation over them.
Because mutator operations typically return void
, however, you can use forEach
instead of map
.
forEach
applies the function to each element of the sequence, but does not collect their return values into a new sequence:
itemsToRemove.forEach(item => mySet.delete(item));
(A lambda is needed here, too.
The expression mySet.delete
by itself would refer to the delete
function without binding mySet
to this
, which will not work.)
The map
and forEach
operations capture a common pattern for operating over sequences: doing the same thing to each element of the sequence.
Filter
Our next important sequence operation is filter, which tests each element of an Array<E>
with a unary function from E
to boolean
.
Elements that satisfy the predicate are kept; those that don’t are removed. A new sequence is returned; filter doesn’t modify its input sequence.
filter : Array<E> × (E → boolean) → Array<E>
[1, 2, 3, 4]
.filter(x => x%2 === 1);
// returns [1, 3]
["x", "y", "2", "3", "a"]
.filter(s => "abcdefghijklmnopqrstuvwxyz".includes(s)));
// returns ["x", "y", "a"]
const isNonempty = (s: string) => s.length > 0;
["abc", "", "d"]
.filter(isNonempty);
// returns ["abc", "d"]
reading exercises
Reduce
Our final operator, reduce, combines the elements of the sequence together, using a binary function. In addition to the function and the array, it also takes an initial value that initializes the reduction, and that ends up being the return value if the array is empty:
reduce : Array<E> × (E × E → E) × E → E
arr.reduce(f, init)
combines the elements of the array together.
Here is one way that it might compute the result:
result0 = init
result1 = f(result0, arr[0])
result2 = f(result1, arr[1])
...
resultn = f(resultn-1, arr[n-1])
resultn is the final result for an n-element sequence.
Adding numbers is probably the most straightforward example:
[1,2,3]
.reduce((x,y) => x+y, 0)
// computes (((0+1)+2)+3) to produce 6
Initial value
There are three design choices in the reduce operation.
First is whether to require an initial value.
In TypeScript, the initial value can be omitted, in which case reduce uses the first element of the sequence as the initial value of the reduction.
But if the sequence is empty, then reduce has no value to return, and reduce throws a TypeError
.
It’s important to keep this in mind when using reducers like max
, which have no well-defined initial value:
[5, 8, 3, 1].reduce((x,y) => Math.max(x,y))
// computes max(max(max(5,8),3),1) and returns 8
[].reduce((x,y) => Math.max(x,y))
// throws TypeError!
Reduction to another type
The second design choice is the return type of the reduce operation.
It doesn’t necessarily have to match the element type of the original sequence.
For example, we can use reduce to concatenate an array of numbers (type E
) into a string (type F
).
This changes the reduce operation in two ways:
- the initial value now has type
F
- the binary function is now an accumulator, of type
F × E → F
, that takes the current result (of typeF
) and a new element from the sequence (of typeE
), and produces an accumulated result of typeF
.
So a more general form of the reduce operation now looks like this:
reduce : Array<E> × (F × E → F) × F → F
In the special case where F
is the same type as E
, this type signature is the same as above.
Here’s a simple example that concatenates a sequence of numbers into a string:
[1,2,3].reduce( (s: string, n: number) => s + n, "" );
// returns "123"
We’ve included parameter type declarations in the lambda expression above in order to clarify the difference between the two parameters of the accumulator function.
If F
is the same type as E
, it’s ok to omit these type declarations in the lambda expression.
Order of operations
The third design choice is the order in which the elements are accumulated.
TypeScript’s reduce
operation combines the sequence starting from the left (the first element):
result0 = init
result1 = f(result0, arr[0])
result2 = f(result1, arr[1])
...
resultn = f(resultn-1, arr[n-1])
Another TypeScript operation, reduceRight
, goes in the other direction:
result0 = init
result1 = f(result0, arr[n-1])
result2 = f(result1, arr[n-2])
...
resultn = f(resultn-1, arr[0])
to produce resultn as the final result.
Here’s a diagram of the two ways to reduce, from the left or from the right. If the operator is non-commutative (like string concatenation, shown here), then each direction might produce a different answer:
reduce( [1, 2, 3], + , "" ) = (("" + 1) + 2) + 3 = "123" |
reduceRight( [1, 2, 3], + , "" ) = (("" + 3) + 2) + 1 = "321" |
|||
reading exercises
This exercise explores three different ways to reduce an array using Math.min()
, finding the minimum value.
Unlike the reducing functions in the previous exercises, which have identity elements that are natural to use as initial values of the reduction, Math.min()
has no natural identity element.
Given an array of numbers:
const array: Array<number> = [...];
…here are three possible ways to compute the minimum value of the array:
const result = array.reduce((x, y) => Math.min(x, y));
let result;
try {
result = array.reduce((x, y) => Math.min(x, y));
} catch (e) {
result = 0;
}
const result = array.reduce((x, y) => Math.min(x, y), Number.POSITIVE_INFINITY);
(missing explanation)
(missing explanation)
(missing explanation)
Back to the motivating example
Going back to the example we started with, where we want to find all the words in the TypeScript files in our project, let’s try creating a useful abstraction for filtering files by suffix:
const endsWith = (suffix: string) => {
return (filename: string) => filename.endsWith(suffix);
}
TypeScript’s string.endsWith
is a function string × string → boolean
.
Our new endsWith
wrapper returns functions that are useful as filters.
It takes a filename suffix like .ts
and dynamically generates a function that we can use with filter to test for that suffix.
Given a Array<string> filenames
, we can now write, e.g., filenames.filter(endsWith(".ts"))
to obtain a new filtered array.
endsWith
is a different kind of beast than our usual functions.
It’s a higher-order function, meaning that it’s a function that takes another function as an argument, or returns another function as its result, as endsWith
does.
Higher-order functions are operations on the data type of functions.
In this case, endsWith
is a creator of functions.
Its signature is: string → (string → boolean
).
Now let’s use map and filter to recursively traverse the folder tree:
function allFilesIn(folder: string): Array<string> {
const children: Array<string> = fs.readdirSync(folder)
.map(f => path.join(folder, f));
const descendants: Array<string> = children
.filter(f => fs.lstatSync(f).isDirectory())
.map(allFilesIn)
.flat();
return [
...descendants,
...children.filter(f => fs.lstatSync(f).isFile())
];
}
The first line gets all the children of the folder, which might look like this:
["src/client", "src/server", "src/Main.ts", ...]
The second line is the key bit: it filters the children for just the subfolders using the isDirectory
method, and then recursively maps allFilesIn
against this array of subfolders!
The result might look like this:
[["src/client/MyClient.ts", ...], ["src/server/MyServer.ts", ...], ...]
So we have to flatten it to remove the nested structure, which is what flat
) does:
["src/client/MyClient.ts", ..., "src/server/MyServer.ts", ...]
(This pattern of map-and-flatten is common enough that arrays also have a method to do both that we could have used instead: flatMap
.)
Finally we add the immediate children that are plain files (not folders), and that’s our result.
We can also do the other pieces of the problem with map/filter/reduce. Once we have the array of all files underneath the current folder, we can filter it to just the TS files:
const filenames: Array<string> = allFilesIn(".")
.filter(endsWith(".ts"));
Now that we have the files we want to extract words from, we’re ready to load their contents:
const fileContents: Array<Array<string>> = filenames.map(f => {
try {
const data = fs.readFileSync(f, { encoding: "utf8", flag: "r" });
return data.split("\n");
} catch (e) {
console.error(e);
}
});
Finally, we can flatten the array of arrays of lines into a simple array of lines:
const lines: Array<string> = fileContents.flat();
and then extract the nonempty words from each line:
const words: Array<string> = lines.map(line => line.split(/\W+/)
.filter(s => s.length > 0))
.flat();
(Here we call split
not with a simple string delimiter, but with a regular expression, which we discussed in a previous reading.
This regular expression identifies substrings of non-word characters.)
And we’re done, we have our array of all words in the project’s TypeScript files! As promised, the control statements have disappeared.
Benefits of abstracting out control
Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and control flow.
reading exercises
This function computes the product of just the odd integers in an array:
function productOfOdds(array: Array<number>): number {
let result = 1;
for (let x of array) {
if (x % 2 === 1) {
result *= x;
}
}
return result;
}
Rewrite this code using map, filter, and reduce:
function productOfOdds(array: Array<number>): number {
return ▶▶EXPR◀◀;
}
const x: number = 5;
const y: number = 8;
function isOdd(x:number): boolean {
return x % 2 === 1;
}
const xIsOdd: boolean = x % 2 === 1;
function alwaysOdd(x:number): number {
return isOdd(x) ? x : 1;
}
function identityFunction(x:number): number {
return x;
}
function sum(x:number, y:number): number {
return x + y;
}
const identity = (x:number) => x;
function product(x:number, y:number): number {
return x * y;
}
function alwaysTrue(x:number): boolean {
return true;
}
Assuming this code is part of the file shown on the right, which of the following expressions can replace ▶▶EXPR◀◀
?
array
.map(identityFunction)
.filter(isOdd)
.reduce(product, 1)
(missing explanation)
array
.map(identity)
.filter(alwaysTrue)
.reduce(product, 1)
(missing explanation)
array
.map(x)
.filter(alwaysOdd)
.reduce(product, 1)
(missing explanation)
array
.map(alwaysOdd)
.filter(alwaysTrue)
.reduce(product, 1)
(missing explanation)
array
.map(identityFunction)
.filter(xIsOdd)
.reduce(x*y, 1)
(missing explanation)
Avoiding loops in TypeScript
Python makes filter
and map
easily accessible with list comprehensions:
doubleOdds = [ x*2 for x in arr if x % 2 == 1 ]
… which is like the TypeScript:
const doubleOdds = arr.filter(x => x % 2 === 1).map(x => x*2)
The result of a list comprehension is a list by virtue of the [
…]
brackets.
In Python, comprehensions work over any iterable source, and can also be used to build sets and dictionaries.
For example:
arr = [ 4, 7, 5, 6, 7, 4 ] # given a list
{ x for x in arr if x > 5 } # ... create a set of values greater than 5:
# => a set { 6, 7 }
input = { 'apple': 'red', 'banana': 'yellow' } # given a dictionary
{ value: key for key, value in input.items() } # ... create a dictionary with
# key-value pairs inverted
# => a dictionary { 'red': 'apple', 'yellow': 'banana' }
We can do the same things in TypeScript:
const arr = [ 4, 7, 5, 6, 7, 4 ];
new Set(arr.filter(x => x > 5));
// => a Set { 6, 7 }
const input = new Map([ ['apple','red'], ['banana','yellow'] ]);
new Map([ ...input.entries() ].map(entry => entry.reverse()));
// => a Map { 'red' => 'apple', 'yellow' => 'banana' }
Breaking down the construction of that inverted map:
- Notice that the
Map
constructor takes a series of two-element arrays: the first element of each is a key, the second is its value. Map.entries
returns an iterable that iterates over two-element key-value arrays.- Because JavaScript/TypeScript doesn’t provide
map
andfilter
on iterables, only on arrays, we use the spread operator...
to stuff those key-value arrays into an array. - Now we can call
map
on that array, and for each[key,value]
entry, we reverse it to[value,key]
and create the new map.
Just as idiomatic Python code will avoid for
loops in favor of comprehensions for filtering and mapping data structures, idiomatic TypeScript will take advantage of map
and filter
in concert with tools like Map.entries
and array methods like Array.entries
, .some
, and .every
to avoid loops.
Any time you write a loop, reach for those tools first, instead of an index counter.
Making an abstract syntax tree
You may remember makeAbstractSyntaxTree()
from a previous reading, which walks over a parse tree to create an abstract syntax tree.
Here is a snippet of its original code, using a for
loop:
...
} else if (parseTree.name === IntegerGrammar.Sum) {
// sum ::= primary ('+' primary)*;
const children: Array<ParseTree<IntegerGrammar>> = parseTree.childrenByName(IntegerGrammar.Primary);
let expression: IntegerExpression = makeAbstractSyntaxTree(children[0] ?? assert.fail("missing child"));
for (let i = 1; i < children.length; ++i) {
expression = new Plus(expression, makeAbstractSyntaxTree(children[i] ?? assert.fail("missing child")));
}
return expression;
} ...
Let’s simplify this code using map/filter/reduce.
reading exercises
...
} else if (parseTree.name === IntegerGrammar.Sum) {
// sum ::= primary ('+' primary)*;
const children: Array<ParseTree<IntegerGrammar>> = parseTree.childrenByName(IntegerGrammar.Primary);
const subexprs: Array<IntegerExpression> = children.map(_______________);
const expression: IntegerExpression = subexprs.reduce(...);
return expression;
} ...
What can fill the blank of the map
call above? Use just one identifier, rather than a lambda expression.
(missing explanation)
...
} else if (parseTree.name === IntegerGrammar.Sum) {
// sum ::= primary ('+' primary)*;
const children: Array<ParseTree<IntegerGrammar>> = parseTree.childrenByName(IntegerGrammar.Primary);
const subexprs: Array<IntegerExpression> = children.map(...);
const expression: IntegerExpression = subexprs.reduce((result, subexpr) => ___________);
return expression;
} ...
What can fill the blank of the reduce
call above, to produce the final abstract syntax tree node representing this sum?
(missing explanation)
One more example
Let’s look at a typical database query example.
Suppose we have a database about digital cameras, in which each object is of type Camera
with various properties (brand
, pixels
, cost
, etc.).
The whole database is in an array called cameras
.
Then we can describe queries on this database using map/filter/reduce:
// What's the highest resolution Nikon sells?
cameras.filter(camera => camera.brand === "Nikon")
.map(camera => camera.pixels)
.reduce((x,y) => Math.max(x,y));
Relational databases use the map/filter/reduce paradigm (where it’s called project/select/aggregate). SQL (Structured Query Language) is the de facto standard language for querying relational databases. A typical SQL query looks like this:
select max(pixels) from cameras where brand = "Nikon"
cameras
is a sequence (of table rows, where each row has the data for one camera)
where brand = "Nikon"
is a filter
pixels
is a map (extracting just the pixels field from the row)
Summary
Zooming out, this reading is about functional programming: modeling problems and implementing systems with immutable data and operations that implement pure functions, as opposed to mutable data and operations with side effects.
Because functions in TypeScript are first-class, it is much easier to write higher-order functions that abstract away control flow code.
Some languages — Haskell, Scala, OCaml — are strongly associated with functional programming. Many other languages — Swift, Ruby, and so on — use functional programming to a greater or lesser extent.
Functional programming helps make code:
Safe from bugs. More immutable data reduces the chance your code has unintended side effects.
Easy to understand. Chaining operations like map, filter, and reduce in favor of sequential control flow statments like for and if makes your code much easier to read and follow.
Ready for change. The functional style of programming promotes code syntax that follows its semantics, which allows the programmer to focus on the heart of the computation and is designed to evolve in tandem with its problem model.