Scientific computing (for the rest of us)

Map, Filter, Accumulate, Reduce

In a lot of applications, we want to apply some operation to every element of a collection, and then aggregate the results into a single, grand unified answer. In this module, we will have a look at the map-filter-reduce strategy, as well as the accumulate operation.

Calculating an average. Easy, right? We load Statistics, and call mean. What if we made it fun? What if we made it way, way more complicated than it has any right to be?
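
For reference, the easy version would look something like this (using the Statistics standard library):

using Statistics
mean([1, 2, 3, 4, 5, 6, 7]) # 4.0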

Let’s take a vector $\mathbf{x}$. The average value of $\mathbf{x}$ is its sum divided by its length:

$$ \frac{x_1}{n} + \frac{x_2}{n} + \dots + \frac{x_n}{n} $$

In other words, for every element $x_i$, we want to divide it by $n$, and then we want to reduce all of these using the sum operation. This is a task for map and reduce.

The map function takes a function as its first argument, and applies it to every element of a collection. For example, this is a way to use map to double every entry in an array:

x = [1, 2, 3, 4]
map(e -> 2e, x)
4-element Vector{Int64}:
 2
 4
 6
 8

On the other hand, reduce accepts a binary operator (something requiring two arguments), and applies it sequentially along an array until a single value remains. For example, the sum of an array expressed with reduce is:

y = [1, 2, 4, 5]
reduce(+, y)
12

And so, we have enough to build a very crude way to get the average of an array of values:

x = [1, 2, 3, 4, 5, 6, 7]
n = length(x)
reduce(+, map(e -> e / n, x))
4.0

The reduce operation has no well-defined behavior when using an operator that is not associative, like subtraction. This is because reduce(-, [a, b, c]) can give either a - (b - c) or (a - b) - c; the documentation for reduce suggests a number of alternatives.
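
For instance, foldl and foldr are two of these alternatives: they guarantee a left-to-right and a right-to-left order of operations, respectively. A quick sketch with subtraction:

foldl(-, [10, 3, 2]) # (10 - 3) - 2, i.e. 5
foldr(-, [10, 3, 2]) # 10 - (3 - 2), i.e. 9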

Another function that is often used together with map and reduce is filter. The filter function evaluates a condition on every element of a collection, and keeps only the elements for which the condition is true:

filter(isodd, 1:10)
5-element Vector{Int64}:
 1
 3
 5
 7
 9

Using filter can be done before map (we want to apply an operation, but only on some elements), or after map (we want to apply the operation and see where we stand). This sequence of operations is commonly known as map-filter-reduce, and is a very expressive way of chaining together operations, as in the short example below.
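
As a small illustration (with arbitrary numbers), here is the sum of the squares of the odd integers up to 10:

reduce(+, map(e -> e^2, filter(isodd, 1:10))) # 165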

Another related function is accumulate, which works much like reduce but without collapsing the vector to a single element. For example, the sequence of factorials $n!$ is

accumulate(*, 1:5)
5-element Vector{Int64}:
   1
   2
   6
  24
 120

and the cumulative sum of an array is

accumulate(+, 1:5)
5-element Vector{Int64}:
  1
  3
  6
 10
 15

Chained together, these four functions can get really powerful. For example, we can use accumulate to write a logistic growth model in a single line. We can define some parameters:

r, K, n₀ = 2.3, 1.0, 0.01
(2.3, 1.0, 0.01)

We can now define a model that takes two arguments, as per the documentation of accumulate (which you should definitely read):

model = (n, _) -> n + n * r * (1 - n / K)
#5 (generic function with 1 method)

And run it for a number of steps defined by the length of the array; note that we use the init keyword to “seed” the process with a value of our choice, here the initial population size:

nt = accumulate(model, zeros(Float64, 10); init = n₀)
10-element Vector{Any}:
 0.03277
 0.10567109233
 0.3230319312543046
 0.8260012273364679
 1.156564586819236
 0.7400873564894737
 1.1825108973734353
 0.6861223097964039
 1.1814468271273222
 0.6883963372629648

This is a discrete-time model with no explicit loop!
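
For comparison, here is a sketch of what the same simulation could look like with an explicit loop (nt_loop is an arbitrary name for the output vector):

nt_loop = zeros(Float64, 10)
nt_loop[1] = model(n₀, nothing) # seed with the initial population size
for i in 2:length(nt_loop)
    nt_loop[i] = model(nt_loop[i - 1], nothing) # the second argument is ignored by the model
end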

The map function has a variant for arrays with more than one dimension, called mapslices, which applies a function to slices of a high-dimensional array. It’s useful to perform, e.g., row-wise or column-wise operations on matrices.
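
As a minimal sketch (with an arbitrary matrix), here are column-wise and row-wise sums with mapslices:

A = [1 2 3; 4 5 6]
mapslices(sum, A; dims = 1) # column sums, a 1×3 matrix: 5 7 9
mapslices(sum, A; dims = 2) # row sums, a 2×1 matrix: 6 and 15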