Scientific computing

(for the rest of us)

Welcome!

One specific challenge, when writing code as a scientist, is that we care a lot about getting the right answer; but of course, the right answer is not always obvious. So we should be very careful with the code we write. A piece of code that crashes is annoying; but a piece of code that runs and gives you the wrong answer can compromise your science and your career. This guide will help you adopt practices that make you less likely to introduce mistakes in your code, and more likely to catch them. Hopefully, this will let all of us write code we can trust more.

Good principles in scientific computing can help you write code that is easier to maintain, easier to reproduce, and easier to debug. But it can be difficult to find an introduction to get you started. The goal of this project is to provide reproducible documents you can use to get started on the most important points. You can use these lessons on your own, or as a group.

This material has been designed according to three core ideas:

First, each module is short, and introduces a single concept. In many places, you will notice advice redirecting you to the documentation. The Julia manual is extremely thorough, and the point of this material is to show “how things work” (as opposed to going into all of the different ways they can be made to work).

Second, this material is not about showcasing different packages. There are some situations where we will need to go into the details of e.g. Makie for plotting, but it is expected that you will, again, read the documentation for the packages that are loaded. Every time a package is mentioned, you can click on its name to be redirected to the Julia Packages page; packages with no documentation on the hub (e.g. Base) have a different styling. Most of the time, the material relies only on packages from Julia’s standard library.

Finally, no error messages. This is an important design concept. Although errors are part of the daily practice of programming, the point of this material is to anticipate and handle exceptions gracefully.

Getting Started

These modules offer a very high-level overview of how to set up a project, organize your work, and install Julia. They do not require any familiarity with programming, and are mostly concerned with making sure that you can plan a computing project ahead of time. Note that installation instructions are not covered until late in the module (by design!).

In this module, before we write any code, we will start thinking about what a project is, how we can set one up on our computer, and why this might help defeat coder’s block.

00/01 Flowcharts

One of the most powerful tools for planning a programming task is to draw a flowchart. In simple terms, a flowchart will let you map the different steps that the program will have to follow, and see what is required for each of them. To illustrate, we will use a flowchart not of a program, but of a pancake recipe.

00/02 Pseudo-code

To facilitate the transition between diagram and code, one important step is to write pseudo-code, i.e. text that looks reasonably like code, but is not. This pseudo-code will not help the computer think about our problem, but it might help us think about the problem in ways that will make the actual programming easier.

In this module, we will see how we can install Julia, set up a default version, and go through some of the usual tools involved in setting up a good Julia development environment. We will not deal with the installation of packages quite yet, as this will be done in its own module.

In the previous module, we did not load a single package: everything we wanted to do was provided by Julia “out of the box”. In most applications, we will need functionality from other packages, and this is where Julia’s package manager shines.

Fundamentals

The goal of these modules is to increase your familiarity with some of the most fundamental notions: how to think like a computer, in values that are either true or false, and what types are (and why they matter!).

In this module, we will get acquainted with one of the most important types of variables: Boolean values. They represent values that are either true or false, which are a key element in a number of problems.

In the previous module, we introduced important notions about Boolean values. In this module, we will expand upon this knowledge in ways that will let us be more expressive with the code we write.

In this module, we will look at the “ternary operator”, a very efficient shortcut to perform a logical test in a single line. This is a construct we will use quite a lot to express both possible outcomes of a conditional expression using a single line!
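Here is a minimal sketch of the ternary operator. The function `parity` is a made-up example. The expression `condition ? a : b` evaluates to `a` when the condition is true and to `b` otherwise:

```julia
# `condition ? value_if_true : value_if_false` returns one of its two branches
parity(n::Integer) = iseven(n) ? "even" : "odd"

parity(4)   # "even"
parity(7)   # "odd"
```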

01/03 Types

In this module, we will look at one of the most important concepts in Julia: types. Types are, to be really imprecise, the way a programming language thinks about a value. A lot of problems arise from the fact that programming languages are very opinionated about which operations make sense for which types.
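As a small illustration of why types matter, the same number can be stored as an integer or as a floating-point value. Julia treats these as numerically equal but not identical. The variable names are only for illustration:

```julia
x = 1      # an integer
y = 1.0    # a floating-point number

typeof(x)  # Int64 on 64-bit machines
typeof(y)  # Float64

x == y     # true: the two values are numerically equal
x === y    # false: they are not identical, because their types differ

# `isa` tests a value against a type, including abstract types in the hierarchy
x isa Integer        # true
y isa AbstractFloat  # true
x isa Real           # true: both Int64 and Float64 are subtypes of Real
```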

Data Structures

In these modules, we will take a little tour of some of the most important ways to represent data; in particular, we will see how they can be used, when they should be used, and how they interact.

A lot of scientific computing eventually boils down to accessing things in structures that look like vectors or matrices. In this module, we will examine the basic syntax to create, interact with, and transpose these structures. This is one of the most foundational modules in the class, as we will be using an absurd quantity of vectors and matrices moving forward.
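A short sketch of the basic syntax for vectors and matrices, and of what transposition does to their shape. The values are arbitrary:

```julia
v = [1, 2, 3]        # a column vector (Vector{Int})
M = [1 2 3; 4 5 6]   # a 2×3 matrix: spaces separate columns, semicolons separate rows

size(v)    # (3,)
size(M)    # (2, 3)

# For real-valued matrices, the adjoint (') is the transpose
size(M')   # (3, 2)

# Transposing a vector gives a 1×3 row
size(v')   # (1, 3)

M * v      # matrix-vector product: [14, 32]
```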

In the previous module, we introduced the notion of arrays, and experimented with the shape of vectors and matrices. In this module, we will continue our exploration of these objects, and see how we can modify and access the information they store.

In this module, we will explore two very useful data structures: dictionaries, which serve as “key-value” stores, and pairs, which serve as (essentially) the same thing but smaller.
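A minimal sketch of pairs and dictionaries, using atomic numbers as the example data. Note how `get` with a default value avoids an error for keys that are not in the dictionary:

```julia
# A Pair associates a single key with a single value
p = "Fe" => 26
first(p)   # "Fe"
last(p)    # 26

# A Dict is a collection of such pairs, indexed by key
atomic_number = Dict("H" => 1, "He" => 2, "Fe" => 26)
atomic_number["He"]            # 2
haskey(atomic_number, "Au")    # false
get(atomic_number, "Au", 0)    # 0: a default value instead of an error

atomic_number["Au"] = 79       # add (or update) an entry
length(atomic_number)          # 4
```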

In this module, we will explore data structures that look a lot like arrays, but have subtly different use cases: tuples, named tuples, and sets. Knowing when to use arrays and when to use other data structures can really make a difference in your programming!
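A quick sketch of the three structures side by side. The values are illustrative only:

```julia
# Tuples are immutable, have a fixed length, and can mix types
t = (1, "two", 3.0)
t[2]          # "two"

# Named tuples let us access fields by name
point = (x = 1.5, y = -2.0)
point.x       # 1.5

# Sets store unique values, and are efficient for membership tests
s = Set([1, 2, 2, 3, 3, 3])
length(s)     # 3: duplicates are dropped
2 in s        # true
union(s, Set([4]))   # a Set containing 1, 2, 3, 4 (printing order is not guaranteed)
```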

In this module, we will look at strings and characters, i.e. representations of text. These objects are really interesting in Julia because not only do they store information, they can store a little bit of computation as well. The point of this module is to go through the basics of what strings are; we will revisit advanced operations in later sections.
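One way a string can "store a little bit of computation" is interpolation. In the sketch below, `$` inserts a value or a small expression into the text. The species name and count are made up:

```julia
species = "Canis lupus"
n = 12

# `$name` interpolates a variable; `$( ... )` interpolates the result of an expression
msg = "We observed $n individuals of $species ($(n ÷ 2) pairs)"

# Strings are made of characters (Char), which are written with single quotes
first(species)       # 'C'
uppercase(species)   # "CANIS LUPUS"
split(species)       # ["Canis", "lupus"]
```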

In this module, we will briefly see how we can define our own types (aka struct), and give them a hierarchy. We will barely scratch the surface of what can be done with custom types, as the real fun will take place in the modules on dispatch and overloading (don’t read them yet!).
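A minimal sketch of a small type hierarchy. The type names `Organism`, `Plant`, and `Animal` and their fields are hypothetical. An abstract type serves as the parent, and concrete structs are declared as its subtypes with `<:`:

```julia
# An abstract type is a node in the hierarchy; it has no fields
abstract type Organism end

# Concrete types (structs) declare their fields and their supertype
struct Plant <: Organism
    height::Float64
end

struct Animal <: Organism
    mass::Float64
    legs::Int
end

fern = Plant(0.4)
fern.height                    # 0.4
Animal(2.5, 4) isa Organism    # true
Plant <: Organism              # true
supertype(Animal)              # Organism

# Structs are immutable by default; `mutable struct` allows changing fields
```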

The Flow

The point of these modules is to understand how we can direct the flow of execution. This covers iteration, loops, advanced loops, and the ways in which we can handle (and react to) errors arising during execution. These modules will give you the most important building blocks for writing complex programs.

Oh no. Oh no no no. This is not a fun module. This will not be pleasant. But this will, very much, be necessary and incredibly empowering. Sit down, buckle up, we’re about to see what loops do.

In this module, we will see how we actually iterate over objects in Julia. Although the content of the previous module is very important, as it forms the basis of all ways to iterate, there are a number of functions that greatly facilitate our task. We finish this module by simulating a simple host-parasitoid model.
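A sketch of three helper functions that make iteration easier: `eachindex`, `enumerate`, and `zip`. The species names, counts, and the `total_count` function are illustrative. The accumulating loop is wrapped in a function because assigning to a global variable inside a top-level loop has scoping caveats when the code is run as a script:

```julia
species = ["oak", "pine", "birch"]
counts = [12, 7, 3]

# eachindex returns the valid indices of a collection
for i in eachindex(species)
    println(species[i], ": ", counts[i])
end

# enumerate yields (index, value) tuples
for (i, s) in enumerate(species)
    println(i, " - ", s)
end

# zip walks through several collections in lockstep
function total_count(names, counts)
    total = 0
    for (name, c) in zip(names, counts)
        println(name, " contributes ", c)
        total += c
    end
    return total
end

total_count(species, counts)   # 22
```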

In this module, we will start integrating skills from the previous modules, both about iteration and about data structures, indexing, slicing, etc. We will simulate the temporal dynamics of two populations, one of hosts and one of parasitoids, using a simple time-discrete model.

In this module, we will see how we can use the while construct to make a series of instructions repeat until a condition is met, and how to deal with common caveats that can arise when using a while loop.
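One common pitfall with `while` is a condition that never becomes false. The sketch below (a hypothetical function, `halvings`) halves a value until it drops below a threshold. It also adds a maximum number of steps as a safeguard:

```julia
# Count how many halvings it takes for x to fall below `threshold`.
# `max_steps` guards against an infinite loop if the condition can never become false.
function halvings(x; threshold = 1.0, max_steps = 1_000)
    steps = 0
    while x >= threshold && steps < max_steps
        x /= 2
        steps += 1
    end
    return steps
end

halvings(10.0)                    # 4 (10, 5, 2.5, 1.25, then 0.625 < 1)
halvings(1.0; threshold = 0.0)    # 1000: x never drops below 0, so the guard stops the loop
```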

In the previous two modules, we have written loops that terminate when a condition is met (while), or when the collection has been iterated over entirely (for). In some cases, we may want to fine-tune the behavior of our iteration. In these cases, we can use some special keywords to jump out of the loop entirely, or skip some steps.
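A small sketch of both keywords in one loop. The function `sum_until_missing` is a made-up example. `continue` skips negative values, and `break` stops the loop entirely at the first `missing` value:

```julia
function sum_until_missing(values)
    total = 0.0
    for v in values
        ismissing(v) && break   # leave the loop entirely
        v < 0 && continue       # skip the rest of this iteration
        total += v
    end
    return total
end

sum_until_missing([1.0, -2.0, 3.0, missing, 10.0])   # 4.0
```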

In previous modules, we have used try/catch statements. In this module, we will go into some detail about what they mean, and how to use them to write code that handles errors gracefully.
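A minimal sketch of graceful error handling. The hypothetical function `safe_parse` returns `nothing` when text cannot be parsed as a number. Any other kind of error is re-thrown rather than silently swallowed. For this specific case, Base also provides `tryparse`, which behaves the same way without an explicit `try`:

```julia
function safe_parse(text)
    try
        return parse(Float64, text)
    catch err
        # Only handle the error we expect; let anything else propagate
        if err isa ArgumentError
            return nothing
        else
            rethrow()
        end
    end
end

safe_parse("3.5")   # 3.5
safe_parse("abc")   # nothing
```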

Basic Functions Usage

Julia is a language that loves dealing with functions. In this section, we will have a look at the various ways to declare them, but also take a deep dive into the dispatch system, and finally examine how we can create our own functions as we go.

Everything should be a function. Everything. Especially in Julia, for performance-related reasons that are far beyond the scope of this material. So one of the first, most significant pieces of knowledge to acquire is: how do I declare a function?
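Julia offers three common ways to declare a function, sketched below with made-up geometry functions: the long form, the short single-line form, and anonymous functions:

```julia
# Long form, with an explicit `return`
function area(radius)
    return π * radius^2
end

# Short, single-line form
circumference(radius) = 2π * radius

# Anonymous function, here passed directly to `map`
map(r -> round(area(r); digits = 2), [1.0, 2.0])   # [3.14, 12.57]
```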

The point of this module is to understand dispatch, which is to say, the way Julia selects the correct method to call based on the types of the arguments; we will also see how to use it to write the least possible amount of code!

In this module, we will expand on the previous content (understanding dispatch) to get familiar with a central design paradigm of Julia: multiple dispatch. We will do so by writing code to simulate the growth of a population in space.
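A toy sketch of multiple dispatch. This is not the spatial population model from the module. The types are hypothetical and the carrying capacities are made-up numbers. The point is that the method is chosen from the types of both arguments:

```julia
abstract type Habitat end
struct Forest <: Habitat end
struct Grassland <: Habitat end

abstract type Species end
struct Deer <: Species end
struct Wolf <: Species end

# Made-up values, for illustration only
carrying_capacity(::Deer, ::Forest) = 50
carrying_capacity(::Deer, ::Grassland) = 80
carrying_capacity(::Wolf, ::Habitat) = 5   # one method covers every habitat

carrying_capacity(Deer(), Grassland())   # 80
carrying_capacity(Wolf(), Forest())      # 5
```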

In this module, we will learn how we can write functions that return other functions. Because this seems a little weird at first, we will also discuss situations in which this is a useful design pattern, and see how this approach can be used together with Julia’s powerful dispatch system.
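A minimal sketch of a function that returns a function. The names are hypothetical. The returned anonymous function "remembers" the value of `factor` it was created with, i.e. it is a closure:

```julia
make_scaler(factor) = x -> factor * x

double = make_scaler(2)
to_percent = make_scaler(100)

double(21)               # 42
to_percent(0.25)         # 25.0
map(double, [1, 2, 3])   # [2, 4, 6]
```

Each returned function can be passed around like any other value, which is what makes this pattern convenient with `map`, `filter`, and similar functions.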

In some of the previous modules, we have used a notation that looked like function.(arguments), or x .+ y. In this module, we will talk about what the . notation does, and more broadly, what broadcasting is.
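A short sketch of what the dot does. It applies an operator or a function element by element, and it expands dimensions when the shapes are compatible. The `@.` macro adds a dot to every call in an expression:

```julia
x = [1, 2, 3]
y = [10, 20, 30]

x .+ y          # [11, 22, 33]: element-wise addition
sqrt.([4, 9])   # [2.0, 3.0]: apply a function to every element

# A column plus a row broadcasts to a matrix
[1, 2] .+ [10 20 30]   # [11 21 31; 12 22 32]

# @. adds a dot to every operator and call in the expression
z = @. 2x^2 + 1        # [3, 9, 19]
```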

In the previous modules, we have defined functions that used positional arguments, some with default values, some without. In this module, we will look at keyword arguments and splatting, to build functions that we can control a bit more.
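A minimal sketch of keyword arguments and splatting. The function `summarize` is hypothetical. Keyword arguments come after the semicolon and have default values. `...` unpacks a collection into separate arguments:

```julia
function summarize(values; digits = 2, label = "mean")
    m = sum(values) / length(values)
    return "$label = $(round(m; digits = digits))"
end

summarize([1, 2, 4])                               # "mean = 2.33"
summarize([1, 2, 4]; digits = 0, label = "avg")    # "avg = 2.0"

# Splatting a tuple into positional arguments
args = (3, 4)
hypot(args...)            # 5.0, same as hypot(3, 4)

# Splatting a named tuple into keyword arguments
opts = (digits = 1, label = "m")
summarize([1, 2, 4]; opts...)                      # "m = 2.3"
```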

In this module, we will look at what is probably the most important part of writing a function: writing its documentation. By the end of this module, you will be able to write a docstring for your function that is accessible through Julia’s help mode.
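A sketch of a docstring. The function `celsius_to_kelvin` is a made-up example. The string placed immediately before the definition becomes its documentation, so typing `?celsius_to_kelvin` in the REPL should display it. The indented signature on the first line follows the usual Julia convention:

```julia
"""
    celsius_to_kelvin(t)

Convert a temperature `t` from degrees Celsius to Kelvin.

# Examples

    julia> celsius_to_kelvin(25)
    298.15
"""
celsius_to_kelvin(t) = t + 273.15
```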

Advanced Functions Usage

Julia is a language that loves dealing with functions. In this section, we will have a look at the various ways to declare them, but also take a deep dive in the dispatch system, and finally examine how we can create our own functions as we go.

In this module, we will see how Julia deals with collections when they are passed as arguments to a function, why this can be terrifying when coming from other languages that are less concerned with economy of memory, and how we can use this behavior to write more efficient code.
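A minimal sketch of how an array passed to a function is the caller's array, not a copy. The function names are hypothetical, and the sketch assumes a floating-point vector: an integer vector could not store the fractions in place. By convention, a trailing `!` marks functions that modify their arguments:

```julia
# In-place version: overwrites the elements of the caller's array
function to_proportions!(v)
    v ./= sum(v)
    return v
end

# Non-mutating version: make a copy first, so the input is left untouched
to_proportions(v) = to_proportions!(copy(v))

a = [1.0, 3.0]
b = to_proportions(a)   # b == [0.25, 0.75]; a is still [1.0, 3.0]
to_proportions!(a)      # a is now [0.25, 0.75]
```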

In this lesson, we will see how we can dispatch on parametric types, in order to have fine-grained control over what method is used for different types of data collections. This is a core design pattern in Julia, and we will illustrate it by building some functions related to measuring the distances between points.
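A sketch of dispatching on the element type of a vector. The function names are hypothetical. Using Hamming distance for `Bool` vectors is an illustrative choice, not necessarily what the module builds:

```julia
# Generic method: vectors of any real numbers (Euclidean distance)
distance(p::AbstractVector{<:Real}, q::AbstractVector{<:Real}) = sqrt(sum(abs2, p .- q))

# More specific method for presence/absence data: count differing positions
distance(p::AbstractVector{Bool}, q::AbstractVector{Bool}) = count(p .!= q)

distance([0.0, 0.0], [3.0, 4.0])                    # 5.0
distance([true, false, true], [true, true, false])  # 2

# A type parameter T can require both arguments to share the same element type
same_eltype(p::AbstractVector{T}, q::AbstractVector{T}) where {T} = true
same_eltype(p, q) = false

same_eltype([1, 2], [3, 4])      # true
same_eltype([1, 2], [3.0, 4.0])  # false
```

Since `Bool` is itself a subtype of `Real`, both methods of `distance` could apply to `Bool` vectors. Julia picks the more specific one.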

In this module, we will explore the Test package, which allows us to programmatically test the behavior of a function. We will see how testing can bring us closer to being confident in our code.
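A minimal sketch using the Test standard library on a hypothetical conversion function. Each `@test` checks one expected behavior, and `@testset` groups related checks and reports a summary. `≈` is used where floating-point rounding makes exact comparison fragile:

```julia
using Test

celsius_to_fahrenheit(c) = c * 9 / 5 + 32

@testset "Temperature conversion" begin
    @test celsius_to_fahrenheit(0) == 32
    @test celsius_to_fahrenheit(100) == 212
    @test celsius_to_fahrenheit(-40) == -40   # the two scales cross at -40
    @test celsius_to_fahrenheit(37) ≈ 98.6
end
```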

05/03 Recursion

What is recursion, if not recursion persevering? In this module, we will see how to call functions recursively, and discuss when this is appropriate in real life.
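Recursion is a natural fit for nested data. In the sketch below, the hypothetical function `count_leaves` counts the values in arbitrarily nested vectors. A non-vector is the base case, and a vector recurses into its elements. The `init` keyword of `sum` assumes Julia 1.6 or later:

```julia
count_leaves(x) = 1                                                # base case: a single value
count_leaves(v::AbstractVector) = sum(count_leaves, v; init = 0)   # recursive case

count_leaves([1, [2, 3], [[4], 5]])   # 5
count_leaves([])                      # 0
```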

In this module, we will talk about type stability, and see how we can annotate the functions in Julia to be explicit about what type they return.
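A small sketch contrasting a return-type annotation with type stability. The function names are hypothetical. With `::Float64` after the signature, the returned value is converted to `Float64` (and an error is raised if the conversion is impossible). In the REPL, `@code_warntype unstable_abs(1)` from InteractiveUtils should flag the union of `Int64` and `String` that the unstable version can return:

```julia
# Annotated return type: the result is converted to Float64
function mean_of(xs)::Float64
    return sum(xs) / length(xs)
end

# Type-unstable: the return type depends on the *value* of x (Int or String)
unstable_abs(x::Int) = x >= 0 ? x : "negative"

# Type-stable: always returns an Int for an Int input
stable_abs(x::Int) = x >= 0 ? x : -x

mean_of([1, 2, 3])   # 2.0
stable_abs(-3)       # 3
```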

Advanced Topics

This section contains modules that cover more advanced topics, that will let you write code that is more and more expressive, more and more safe to run, and more and more efficient. Picking up these skills will be important to follow along with the applied examples in the next section.

In a lot of applications, we want to apply some operation to all elements in a collection, and then aggregate these elements together in a grand unified answer. In this module, we will have a look at the map-filter-reduce strategy, as well as the accumulate operation.
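A short sketch of map, filter, reduce, and accumulate on an arbitrary vector:

```julia
data = [3, -1, 4, -1, 5]

squares = map(x -> x^2, data)           # [9, 1, 16, 1, 25]
positives = filter(x -> x > 0, data)    # [3, 4, 5]
total = reduce(+, positives)            # 12
running = accumulate(+, data)           # [3, 2, 6, 5, 10]: running totals

# mapreduce combines the map and reduce steps without an intermediate array
mapreduce(x -> x^2, +, data)            # 52
```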

In this module, we will see how to locate interesting values in collections, and how to extract and test the existence of some of these values. This is important knowledge in order to build more advanced programs, and we will put it in action in the following section.
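A sketch of locating values in a collection, using made-up temperature readings. `findfirst` returns `nothing` when nothing matches, which we can test for instead of letting an error happen further down:

```julia
temperatures = [12.5, 15.0, 21.3, 18.7, 23.9]

findfirst(t -> t > 20, temperatures)   # 3
findall(t -> t > 20, temperatures)     # [3, 5]
findmax(temperatures)                  # (23.9, 5): largest value and its index
any(t -> t < 0, temperatures)          # false
18.7 in temperatures                   # true

idx = findfirst(t -> t > 30, temperatures)
isnothing(idx)                         # true: no reading above 30
```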

06/02 Overloading

Overloading is a very powerful mechanism, through which we can add methods to existing functions to make them work with our own types. In this module, we will discuss how to overload existing functions, and how to use this approach in practice.
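A minimal sketch of overloading, using a hypothetical `Interval` type. Adding methods to the existing functions `Base.:+`, `Base.in`, and `Base.show` makes `+`, `in`, and printing work with it:

```julia
struct Interval
    lower::Float64
    upper::Float64
end

# Adding two intervals adds their bounds
Base.:+(a::Interval, b::Interval) = Interval(a.lower + b.lower, a.upper + b.upper)

# `x in interval` checks whether x lies between the bounds
Base.in(x::Real, i::Interval) = i.lower <= x <= i.upper

# Control how an Interval is printed
Base.show(io::IO, i::Interval) = print(io, "[", i.lower, ", ", i.upper, "]")

Interval(1, 2) + Interval(0.5, 1)   # [1.5, 3.0]
2.5 in Interval(1, 3)               # true
```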

Files

The goal of these modules is to develop proficiency in working with files. We will see how to handle the path, create and delete directories, download files, and open common data formats like delimited files, CSV, and JSON.

07/00 The Path

One of the main obstacles to reproducible projects is describing where files are. In this module, we will talk about the path, and how to refer to locations in a way that will work on any computer.
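A sketch of building paths portably. The folder and file names are hypothetical. `joinpath` uses the correct separator for the operating system, and `@__DIR__` gives the directory of the file being run, which avoids depending on where Julia was started from:

```julia
data_dir = joinpath("data", "raw")
file = joinpath(data_dir, "observations.csv")

basename(file)    # "observations.csv"
dirname(file)     # same as data_dir
splitext(file)    # (path without extension, ".csv")

# Anchor paths to the location of the current script (the working directory in the REPL)
project_data = joinpath(@__DIR__, "data")
isdir(project_data)   # true only if such a folder exists
```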

In this module, we will see how Julia allows downloading files from the internet, and how we can decide where to store them. This is a common task when getting external data, and will be the basis of a number of advanced training modules in the final section of this material.

A lot of files we use in scientific computing are very simple, and organized as tables. There are a lot of packages in Julia to handle these files, including the full-featured DataFrames and DataFramesMeta. But in this module, we will focus on the standard library package DelimitedFiles, which reads and writes files where fields are separated by a specified character.

The JSON format is really interesting to store highly structured information. In this module, we will see how it maps naturally onto the Dict data structure, how to use it to load and save data, and how to print the contents of a JSON file. As an illustration, we will look at the time series of vaccination against COVID-19 in New Zealand.

Not all data come from static files. In a large number of scientific applications, we need to collect data from websites, often using a RESTful API. In this module, we will use a very simple example to show how we can collect data about postal codes, get them as a JSON file, and process them.

Applications

In this section, we will go through longer modules involving real data, in order to apply the various skills we picked up from the previous sections. These are the modules you should read last: even though they do not introduce new material, they require some command over the concepts from multiple previous modules.

In this module, we will use the gradient descent algorithm to perform a linear regression, to estimate the brain mass of an animal if we know its body mass. This will draw on concepts from a number of previous modules, while also presenting an example of how core programming skills can be applied for research.
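Below is a minimal sketch of gradient descent for a straight-line fit. It is not the module's implementation, and it uses synthetic data generated from y = 1 + 2x rather than the brain and body mass data. The learning rate and number of steps are arbitrary choices. At each step, the intercept and slope move against the gradient of the mean squared error:

```julia
function fit_line(x, y; learning_rate = 0.01, steps = 10_000)
    a, b = 0.0, 0.0          # intercept and slope, starting from zero
    n = length(x)
    for _ in 1:steps
        residuals = (a .+ b .* x) .- y
        # Partial derivatives of the mean squared error with respect to a and b
        grad_a = 2 * sum(residuals) / n
        grad_b = 2 * sum(residuals .* x) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    end
    return a, b
end

x = collect(0.0:0.5:5.0)
y = 1 .+ 2 .* x              # synthetic data, no noise
a, b = fit_line(x, y)        # close to (1.0, 2.0)
```

If the learning rate is too large, the updates overshoot and diverge instead of converging. This is one reason to check the fitted values against known synthetic data before using real data.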

Naive Bayes Classifiers are formidable because they can learn so much about a dataset based on relatively scarce information. In this module, we will build one from scratch, using (mostly) methods from Julia’s standard library.

In this module,

In this module, we will have a look at indexing in order to simulate the behavior of a forest when trees can catch on fire, be planted, and regrow. This is a common example in complex system studies, and produces very visually pleasing structures in space! As a treat, we will spend a little more time learning about how Makie works.

In this module, we will look at a way to start working on the travelling salesperson problem. This is mostly an excuse to play with simulated annealing, which is a really cool optimisation algorithm.