Docker For Scientific Projects
Docker is very useful for creating a virtual machine of sorts. This can be run on any computer and should provide the same environment making it a great tool for e.g. developing consistently as part of a team.
Requirements
Image Definition
The Docker image is defined in the Dockerfile
which is run in sequence and
must start with a FROM
directive. An example sequence of commands in the
Dockerfile
is given below.
# Select the base image.
FROM continuumio/anaconda3
# Set environment variables.
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
LABEL version="1.0"
The FROM
instruction provides a base image upon which to build. There are
many available options for this, accessed through the
Docker Hub. As the majority of the work we do is
related to scientific programming in python, a good base image is provided by
Continuum, with the Anaconda python 3 build. This perhaps isn’t the most
efficient way to generate an image, as there will be over 100 python packages
preinstalled. However, it is a relatively easy way to get a lot of
functionality quickly. Once the base image has been specified, we then set
environment variables such as language and labels.
# Create the root directory.
RUN mkdir project_name
COPY . /project_name/
ENV HOME=/project_name
ENV SHELL=/bin/bash
VOLUME /project_name
WORKDIR /project_name
Once the initial setup has been defined, it is necessary to establish the
directory structure, as above. Under the assumption that the Docker image is
generated for a specific project, the project directory is copied into a new
folder and set as the $HOME
directory. This makes it easy to get up and
running as quickly as possible when loading the image.
# Set the PYTHONPATH.
ENV PYTHONPATH=$PWD/:$PYTHONPATH
# Install additional python packages.
RUN pip install pytest-cov pyinstrument
For python projects, we add the required folders to the $PYTHONPATH
. Finally,
any additional python packages can be installed with pip. This will then give
an environment in which we can develop as we normally would, but with some
added flexibility and no need to edit our local setup.
Running Image
To start with, it is necessary to install Docker from
here. Once installed and running, we can load up
containers from the image defined in the Dockerfile
. In order to start a
Docker container, we run the following commands:
$ docker build -t project_name .
$ docker run -it project_name bash
This will load up the environment in the $HOME
directory set in the
Dockerfile
. From here it is possible to run any tests or example scripts as
one otherwise would.
To exit the container, just press ctrl + d
.
Docker images can get quite large and take up a lot of space. It is occasionally a good idea to check what images are available and perhaps clean things up.
$ docker images
$ docker system prune
There is some good advice on maintaining reasonably sized images here.
Summary
This is a very basic example of how one could set up a Docker image for development of a small project. It should be noted that typically CI can use images from Docker Hub, meaning it is possible to create the same environment for development, testing, and deployment.