One specific challenge when writing code as a scientist is that we care a lot about getting the right answer, but the right answer is not always obvious. So we should be very careful with the code we write. A piece of code that crashes is annoying, but a piece of code that runs and gives you the wrong answer can compromise your science and your career. This guide will help you adopt practices that make you less likely to introduce mistakes in your code, and more likely to catch them. Hopefully, this will let all of us write code we can trust more.

Good principles in scientific computing can help you write code that is easier to maintain, easier to reproduce, and easier to debug. But it can be difficult to find an introduction to get you started. The goal of this project is to provide reproducible documents you can use to get started on the most important points. You can use these lessons on your own, or as a group.

The content is divided into three categories: lessons, capstones, and primers. We recommend starting with the lessons, which provide the basic information. Capstones are examples of real-life applications of the content of the lessons. Finally, primers are very short bits of additional information, which can be browsed at any time.

Who is this material for?

This material is aimed at people who have already interacted with a computer using a programming language, but want to adopt best practices that make their code more robust. It can also be used to facilitate the onboarding of new people in your lab or your project.

Scientific computing can be very diverse, ranging from a few-step analysis of small data sets to simulations running for weeks on supercomputers. We focus on the most common situation, one that every scientist encounters at some stage of a research project: data analyses performed on a standard desktop computer. The general ideas and principles that we present carry over to other situations as well, but the concrete tools and methods may not be suitable for tasks requiring special hardware such as GPUs or supercomputers, or for projects requiring a significant software development effort.

We will use the Julia programming language, but you don’t need to know anything about it. We will keep the discussion very general. In fact, you will see that good practices for scientific computing have very little to do with tools and technical details; instead, they rely on thinking about programming in a slightly different way. You will be able to apply these principles to any language you prefer to use.

How to use this material

The best way to read this material is to keep a window open with either JuliaBox or a terminal running Julia, and type the code. It is tempting to copy and paste, but typing the code actually matters.

Snippets of code that are important are presented this way:

[rand(i) for i in 1:5]

Bits of code of lower importance (pseudocode or code you are not meant to type) are presented this way:

for each_element in vector

Finally, the output of code is presented this way:

2-element Array{Float64,1}:

Throughout the lessons, we have added some asides – they are ranked in order of importance. The first are “information” notes:

All that should matter in the choice of tools, language, environment, is that it lets you become productive, and solve the problem you want to solve.

“Opinions” are points we would like to raise for the reader’s consideration, and can be ignored. Example:

People who think it’s OK to criticize others based on their choice of language, OS, text editor, etc., should go home and think about what they did.

“Warnings” are points that can be important, but not necessarily as a novice. It is worth keeping a mental note of them, especially in the long term. Example:

Any time you are about to comment on people’s choice of tools, ask yourself whether this is really necessary; the answer is usually “no”. The Good Tool is the one that works for its user.

“Dangers” are really important points that can prove especially dangerous or risky to everyone. They are worth reading a few times over. Example:

This toxic behaviour is driving brilliant people away, and should never be tolerated. Disliking Windows has not made anyone edgy or cool since 1998.

Want to see this material as a workshop?

This material can be given in a workshop format, ideally over two days, covering several lessons and one or two capstone examples. Please contact Timothée Poisot for more information.

Want to contribute to this material?

There are a number of ways to contribute. Before you start, please have a look at our Code of Conduct. It boils down to being nice and respectful – no contribution, no matter how amazing it may be, justifies or excuses bad behaviour. For the actual details on how to contribute, head over to the guidelines.

Want to read more?

In a rush? Yes, you are. We suggest “Good enough practices in scientific computing” to get you started.

A little bit more time? We think “Best practices in scientific computing” might suit you.

Want a more complete thing to read? “The pragmatic programmer” is a masterpiece. I have also heard great things about “Clean code”. The online book “How to think like a computer scientist” is based on Julia, and very thorough. Finally, “Hands-on design patterns and best practices with Julia” is a wonderfully accessible book that will make you a better programmer, even if Julia is not your main language.

If you still have some time, you can read something about ways to improve user confidence in your software, or ways to elevate code as a research output.

Code is Science is a very nice project about making peer review of scientific code more common. They have a list of issues you can tackle to help!

Finally, there is a short Q&A at Nature Jobs about this project.


Special thanks to…

Comments, ideas, feedback: Hao Ye, Philipp Bayer, Tim Head, Ethan White, Andrew MacDonald

Other contributions: Konrad Hinsen