Programming is intimidating

If your working environment looks anything like mine, the first thing you may see on a new project will look like this:

tpoisot@x1:~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> 

This is highly intimidating.

Facing an empty text editor or a blinking REPL (it stands for Read/Execute/Print Loop, and this the place where you type things) prompt at the beginning of a programming task is like facing an empty page at the beginning of a writing project. To make it easier to work through, one really good (but also really uncommon) approach is to walk away from the computer. Instead of trying to write code, we will first try to articulate the structure of our project - and not jumping to our keyboard works really well for this.

In this lesson, we will see how we can optimize our preparation for any project, by having good project organization, thinking about the flowchart of our program, and writing pseudo-code as needed.

After this lesson, you will be able to …

Project management

Here is a productivity tip: work in such a way that you do not need to think about the non-important parts of your project. For us, this means having a standard layout for any project, and using it consistently. We will suggest one such layout below, but this is not the only one. There are an infinity of them, and the one that works is the one that works for you.

Still, there are some good guidelines to follow.

First, a project is a folder, everything in the project belongs to the folder, and everything outside the folder cannot belong to the project. This is important because it maximizes the chances that you can copy your project to another computer, and have it “just work”. Of course in real life, things seldom “just work”, and this is why we should pay extra care to the practices that maximise the chances of it being the case.

This is not necessarily true if your project involves remote computing - or very large amounts of data. But then again, if you work in such a way, we expect that you will have read the documentation, and sought proper training.

Second, file paths are not your friends. By far the biggest obstacle to reproducibility is the difference in ways to tell the computer where files are. We encourage to use relative paths, relative specifically to the folder in which your project lives. So, for example,

/home/tim/ohno/my/computer/is_a_mess/MyCoolProject/data/file.txt

is an absolute file path, from the very root of where files live. But I am only working on MyCoolProject at the moment, so the relative path of this file is

data/file.txt

Or is it? On my machine running Linux, sure. But on a Windows machine I would use in a computer lab? The file would be at

data\file.txt

This difference between / and \ is trivial to us, but not to the computer (they are, actually, trivial to Julia which tends to parse paths well all the time). So to avoid these issues, it is better to let your programming language construct the path correclty. Julia has the joinpath function, and so we would refer to this file as:

joinpath("data", "file.txt")
"data/file.txt"

Third, and final principle, similar things should be grouped together. In other words, it will be easier to navigate our project if there is a folder called data and it contains data, a folder called figures and it contains figures, and a folder called code which contains code. The name and variety of these folders is left up to you.

With this in mind, here is a possible template for a project.

.
├── .git                    # We use version control, always!
│   └── ...
├── artifacts               # Outputs that are not figures go here
│   └── summary.csv
├── code                    # Code that creates something goes here
│   ├── figure01.jl
│   └── simulations.jl
├── data                    # Raw data goes here
│   └── BCI.data
├── lib                     # Useful functions go here
│   ├── environmentaldata.jl
│   └── model.jl
├── LICENSE                 # We use a license, always!
├── README.md               # Also mandatory: a README
├── Project.toml
├── Manifest.toml
└── text                    # This folder can store notes and documents
    └── notes.md

There are a few files here on which it is worth spending a little more time. Their names are presented below as links, to the resource that we think is the most informative. It is, indeed, a very good idea to read the content of these links as well.

The README is a standard file for all projects, which gives information about the project, how to run it, the dependencies, who is in charge, etc. Writing a good README is a difficult task, but this is going to be the point of entry in your project. Show it some love.

LICENSE is the text of a license, which gives information about intellectual property and your own liability regarding the use of your project. Picking FOSS (Free and Open Source Software) licenses is recommended, and there are a lot of the to accomodate different use cases.

Finally, and these are language-specific, the Julia package manager will create files to track the dependencies of your project. Check out the documentation.

Flowcharts

One of the most powerful tool to plan a programming task is to draw a flochart. In simple terms, a flowchart will let you map the different steps that the program will have to follow, and see what is required for each of them. To illustrate, we will use a flowchart not of a program, but of a pancake recipe.

At the beginning, it can be useful to break the task down into coarse steps:

graph LR prepare --> cook cook --> eat

This is not much to look at, but this is a very high-level view of what we want to achieve. It is also easy to zoom in on every task, and try to further break it down into smaller tasks. Let’s do this for the task at left: prepare.

graph LR liquid[liquid ingredients] --> batter solid[solid ingredients] --> batter batter --> cook cook --> eat

“All” we have done so far is to split the prepare step into the preparation of solid and liquid ingredients. To come up with a full flowchart, we need to repeat this process until we are satisfied by the completeness of it:

graph LR liquid[liquid ingredients] --> mix{mix} solid[solid ingredients] --> mix mix --> batter batter --> cook{cook} cook --> eat{eat}

Notice that we have introduced a new shape: the rotated square will be used to note actions, and anything else will be either an output or an input of these actions.

We like the idea of using different shapes to indicate different steps: what goes in, what is the action, are there any artifacts produced or side-effects? This being said, even a very coarse flowchart can help you.

If we repeat the exercise for most of the boxes, we may end up with a flowchart like this. It might seem a little overwhelming at first, but this is a helpful document. It maps out the precise steps and processes in your project, and can help you both to plan what to write, but also to understand what you have written previously.

graph TD flour --> sift{sift} sugar --> mix1{mix} bak[baking powder] --> mix1 salt --> mix1 sift --> mix1 milk --> mix2{mix} eggs --> whisk1{whisk} whisk1 --> mix2 butter --> melt{melt} melt --> mix2 mix1 --> combine{combine} mix2 --> combine combine --> batter griddle[flat griddle] --> heat{heat} batter --> cook{cook} heat --> cook cook --> flip{flip} flip --> cook2{cook} cook2 --> pancake pancake --> eat{eat}

See how a simple task (“make crepes!") can be broken down into dozens of litle steps, each representing a single well-defined action? Programming is, essentially, that. We are going to need to specificy things as unambiguously as possible, and breaking the big tasks down is going to be immensely helpful.

Pseudo-code

To facilitate the transition between diagram and code, one important step is to write pseudo-code, i.e. text that looks reasonably like code, but is not.

Pseudo-code is mostly useful when it comes to working on a single function. The overall structure of the project can be done as a flowchart, and then each function can be written as pseudo-code. This is, in fact, a really good exercise to try with your own projects.

Let us try with an example - we want to sort numbers. We will write pseudo-code for a very, veru inefficient algorithm:

take a list of numbers X
create an empty list Y

as long as X still has elements in it
    find the minimum of X
    add it to Y
    remove it from X

return Y

This is, essentially, what pseudo-code is: a way to explain in your own words what the function should do. This can be translated line-by-line into code:

Pseudo-codeCode
take a list of numbers Xfunction badsort(X::Vector{T}) where {T <: Number}
create an empty list YY = eltype(X)[]
as long as X still has elements in itwhile length(X) > 0
find the minimum of Xm, i = findmin(X)
add it to Ypush!(Y, m)
remove it from Xdeleteat!(X, i)
end
return Yreturn Y
end

The right column is actual code! In practice, we rarely (never!) convert pseudo-code to code literally. But when you are writing your first functions, it may be a good idea.