Scientific computing

(for the rest of us)

Mutating functions

In this module, we will see how Julia deals with collections when they are passed as arguments to a function, why this can be terrifying when coming from other languages that are less concerned with economy of memory, and how we can use this behavior to write more efficient code.

Let’s wrap the numbers from one to 25 in a matrix:

A = reshape(collect(1:25), (5, 5))
5×5 Matrix{Int64}:
 1   6  11  16  21
 2   7  12  17  22
 3   8  13  18  23
 4   9  14  19  24
 5  10  15  20  25

And now, let’s double them:

2A
5×5 Matrix{Int64}:
  2  12  22  32  42
  4  14  24  34  44
  6  16  26  36  46
  8  18  28  38  48
 10  20  30  40  50

Very well. Now let’s do this within a function:

function twice(M)
    for i in eachindex(M)
        M[i] = 2M[i]
    end
    return M
end
twice (generic function with 1 method)

We can call this function on A:

twice(A)
5×5 Matrix{Int64}:
  2  12  22  32  42
  4  14  24  34  44
  6  16  26  36  46
  8  18  28  38  48
 10  20  30  40  50

And let’s have a look at A again:

A
5×5 Matrix{Int64}:
  2  12  22  32  42
  4  14  24  34  44
  6  16  26  36  46
  8  18  28  38  48
 10  20  30  40  50
Yeah. You see how that can be a problem, right? A has been changed because we passed it through the twice function. This is super dangerous. But also really useful. With great power, and all that.

So what is going on?

When you pass a collection to a Julia function, you do not pass a copy of this object. You pass the location of this collection in memory, for the function to work with. It might make sense if you know C, and no sense if you know R, but this is what Julia does, and the net benefit is: regardless of the size of the array you work on, it “costs” the same memory footprint to send it to a function.

For this reason, it is very easy (in fact, it is the default) for Julia functions to perform mutations on their arguments. By convention, this behavior is indicated by a ! at the end of the function name. Think of it as the language going Hey, listen!, and asking you to pay attention.

So let’s start again, with a collection of values, and we will get a matrix with the remainder of their integer division by any other integer:

A = reshape(collect(1:25), (5, 5))
5×5 Matrix{Int64}:
 1   6  11  16  21
 2   7  12  17  22
 3   8  13  18  23
 4   9  14  19  24
 5  10  15  20  25

The mutating function will have a ! at the end of its name:

function remainder!(X::Matrix{T}, d::T) where {T <: Integer}
    for i in eachindex(X)
        X[i] = X[i] % d
    end
    return X
end
remainder! (generic function with 1 method)
Note that we use eachindex to move through the matrix linearly, as we discussed in the module on iteration. We would get the same behavior iterating over rows and columns explicitly, but this notation is more concise, and therefore more readable and “better”.

This function will overwrite its argument X. How do we avoid this? We cannot. It is a consequence of functions receiving the memory location of collections, as opposed to a copy of the collection. But wait! This is our solution! If we do not want the function to modify our original collection, we can point it towards a copy of this collection!

function remainder(X::Matrix{T}, d::T) where {T <: Integer}
    Y = copy(X)
    remainder!(Y, d)
    return Y
end
remainder (generic function with 1 method)

This function does not end with a !: this is because it is not modifying its first argument. Instead, we create a copy of this argument. This is when we re-use the function that changes its argument, to modify the copy, which we then return. This forms the basis of a very powerful (and common) design pattern in Julia: write a function foo!, then write a function foo who copies its first argument and calls foo! on the copy.

We can call our new function:

remainder(A, 3)
5×5 Matrix{Int64}:
 1  0  2  1  0
 2  1  0  2  1
 0  2  1  0  2
 1  0  2  1  0
 2  1  0  2  1

We can check that A has not been affected:

A
5×5 Matrix{Int64}:
 1   6  11  16  21
 2   7  12  17  22
 3   8  13  18  23
 4   9  14  19  24
 5  10  15  20  25

Success!

There is a little subtelty here related to dispatch. Our remainder and remainder! are different functions, even though they belong to the same “family”. This may be important if you look at the methods for remainder, as it will not list the methods for remainder!.

And now, here is how we turn this into a very powerful way to be efficient about memory management. We will write another method for remainder! that over-writes, not its original argument, but an arbitrary array:

Z = similar(A)
5×5 Matrix{Int64}:
 139717375034448  139718667886960  139718667887280  139718667887600  139718667886768
 139718667886704  139718667887024  139718667887344  139718667887664  139717375034432
 139718667886832  139718667887088  139718667887408  139718667887728  139718667887088
 139717375034432  139718667887152  139718667887472  139718667887792  139718667887280
 139718667886896  139718667887216  139718667887536  139718667887856  139718667887472
The similar function will create an object that is similar to its argument in the sense that it will have the same shape and type, but the value inside this object are not predictible.

This might be a placeholder array we will use to store results temporarily. For example, when we perform thousands of iterations, we might not need to re-allocate the memory every time: in this case, it is beneficial to over-write the same memory locations over and over again.

function remainder!(x::Matrix{T}, X::Matrix{T}, d::T) where {T <: Integer}
    for i in eachindex(X)
        x[i] = X[i] % d
    end
    return x
end
remainder! (generic function with 2 methods)

The way to call this function is:

remainder!(Z, A, 2)
5×5 Matrix{Int64}:
 1  0  1  0  1
 0  1  0  1  0
 1  0  1  0  1
 0  1  0  1  0
 1  0  1  0  1
remainder!(Z, A, 3)
5×5 Matrix{Int64}:
 1  0  2  1  0
 2  1  0  2  1
 0  2  1  0  2
 1  0  2  1  0
 2  1  0  2  1

But wait! The new version of remainder! we wrote is suspiciously similar to the earliest version of remainder!. In fact, with the exception of a single change, they are the same function. It is therefore time to revisit our initial remainder! function:

function remainder!(X::Matrix{T}, d::T) where {T <: Integer}
    return remainder!(X, X, d)
end
remainder! (generic function with 2 methods)

See, writing read from X and write in Z and read from X and write in X is essentially the same thing, because we are giving Julia memory locations. And because remainder! is a function with different methods, the dispatch system is now working for us.

In practice, we would take great care to ensure that the dimensions of the various objects are compatible, but this has been omitted for the sake of brevity. This can be done with @assert, or more explicitly with checks and the throwing of exceptions.

So let’s summarize, and write our little remainder “library”, from the most basal to the most abstract function:

function remainder!(x::Matrix{T}, X::Matrix{T}, d::T) where {T <: Integer}
    for i in eachindex(X)
        x[i] = X[i] % d
    end
    return x
end

function remainder!(X::Matrix{T}, d::T) where {T <: Integer}
    return remainder!(X, X, d)
end

function remainder(X::Matrix{T}, d::T) where {T <: Integer}
    Y = copy(X)
    remainder!(Y, d)
    return Y
end
remainder (generic function with 1 method)

What happens when we do the following?

remainder(A, 3)
5×5 Matrix{Int64}:
 1  0  2  1  0
 2  1  0  2  1
 0  2  1  0  2
 1  0  2  1  0
 2  1  0  2  1

We start by calling remainder, which makes a copy of the first argument, then calls remainder! (using a single argument); this then calls remainder! using the two arguments, and gives us the result without modifying A:

A
5×5 Matrix{Int64}:
 1   6  11  16  21
 2   7  12  17  22
 3   8  13  18  23
 4   9  14  19  24
 5  10  15  20  25

Let’s take a step back. Working with this design pattern is very useful when we sometimes have to overwrite some memory. In other cases, we only care about getting an object returned. Or we definitely want to overwrite the object we pass as an argument. To accomodate these use-cases, writing functions with and without !, and that operate on themselves or similar objects, give us the flexibility we need, and is a very Julian code design.