Docker is quickly becoming a new paradigm in software development, so I became curious to find out why it is so special.
Why does it exist?
Originally, software came in small, self-contained parcels, with no need to set up complicated servers or processes to support it. With the advent of the internet and an interconnected world, that changed. A business had to buy hardware, and a system administrator spent days setting it up for development or production, with no easy path to scaling. Fast forward a bit and we had more powerful hardware, capable of running smaller virtual computers, which allowed crude and inefficient scaling. There was little supporting infrastructure or tooling, but we could copy virtual machines across hardware and across the internet. Hardware vendors caught up with the trend and created products that made virtualisation much more viable and efficient, but running a full operating system for every application was never going to be the best solution. We needed a better way. Linux namespaces give us much of the isolation of a fully-fledged virtual machine with far less resource usage: a process runs in a sandboxed environment with its own view of the file system, and it talks directly to the host's kernel instead of an emulated machine. This new approach needed a replacement for virtual machine disk images and their management (execution, storage and distribution), and Docker is one of the tools that fills that role.
So what is Docker?
This part of the story is about Linux namespaces, and it starts with something called a process identifier, or PID. When you open an application on your computer, the operating system assigns it a number so that it can keep track of it; this number is the PID. Linux organises processes in a tree, much like the folder hierarchy on a desktop PC: when Linux starts up it creates PID 1, and every other process descends from it. The problem is that, with standard permissions, any process can see and inspect the whole tree. Linux namespaces create small virtual trees inside the bigger tree, and the processes inside them cannot see or know about processes outside. This improves the security of Linux and provides the isolation Docker needs to function. Linux applies the same idea to other resources, such as network interfaces and mounted file systems. Docker starts a container from an image using these namespaces to isolate the processes inside it from the rest of the system. Docker is much more than that, though, because it also handles the creation and distribution of images: it uses Git-like functionality to build new images on top of base images, and it provides a way to push images to a shared registry and pull them onto your local machine. I can pull a Linux image from a Docker registry such as Docker Hub, run a few commands that enable it to run a Laravel project, save my changes as a new image and publish it back to the registry. Someone else can then download my image and be confident it runs exactly the way it does on my host, with the same Apache, the same PHP and the same version of Laravel.
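You can observe these namespaces from any Linux process. The minimal Python sketch below (Linux only; the function name current_namespaces is hypothetical) lists the symlinks under /proc/self/ns, one per namespace type. Running it on the host and then inside a container that has Python installed should show different identifiers for entries such as pid, net and mnt, and the container's main process will typically report PID 1.

```python
import os
from typing import Dict


def current_namespaces() -> Dict[str, str]:
    """Return the namespaces this process belongs to, e.g. {"pid": "pid:[4026531836]", ...}.

    Linux exposes each namespace as a symlink under /proc/self/ns; two processes
    share a namespace exactly when their links point at the same identifier.
    """
    ns_dir = "/proc/self/ns"
    if not os.path.isdir(ns_dir):  # not Linux, or /proc is not mounted
        return {}
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some kernels or sandboxes hide individual entries
    return namespaces


if __name__ == "__main__":
    print("PID as seen by this process:", os.getpid())
    for kind, ident in current_namespaces().items():
        print(f"{kind:20} {ident}")
```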
How do I use Docker?
Docker is a command line tool, which suits most Linux developers, who prefer the command line anyway. A typical Docker workflow starts by searching the Docker registry for an image that fits your needs, either to run as is or to extend. The command for searching is “docker search ‘image’”. To download an image you use “docker pull ‘image’”, similar to Git. To execute a command in a new container based on an image, use the “docker run ‘image’ ‘command’” syntax. I suggest adding the “-it” flags when the command needs to be interactive (“-i” keeps standard input open and “-t” allocates a terminal). The “-d” flag runs the container (a running instance of an image) in detached mode, in the background. For example, “docker run -it ubuntu /bin/bash” opens a shell inside a container, where you can run commands like “apt-get update” followed by “apt-get install mysql-server”. At this point you will want to save the changes you have made to the container. First we need the container ID: from a second terminal, “docker ps” lists all running containers (if you have already exited the shell, the container has stopped, so use “docker ps -a” to include stopped containers). To save the changes as a new image, use “docker commit ‘container ID’ ‘new image name’”. Now we have a brand new Docker image that we can reuse time and again.
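If you repeat the find-the-ID-then-commit step often, it can be scripted. This is a minimal Python sketch, assuming the docker CLI is on the PATH; the names run_docker and commit_container_from, and the image name ubuntu-mysql in the usage, are hypothetical. It runs “docker ps -q --filter ancestor=ubuntu” to find the running container and then “docker commit” to save it. The command runner is passed in as a parameter so the logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, List

# A "runner" executes a docker CLI command (given as an argv list) and returns stdout.
Runner = Callable[[List[str]], str]


def run_docker(argv: List[str]) -> str:
    """Run a command, raising CalledProcessError if it fails, and return its stdout."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout


def commit_container_from(base_image: str, new_image: str,
                          run: Runner = run_docker) -> str:
    """Save the single running container started from base_image as new_image.

    Returns the ID that `docker commit` prints for the new image.
    """
    # -q prints only container IDs; the ancestor filter keeps containers
    # created from base_image. Without -a, only running containers are listed.
    ids = run(["docker", "ps", "-q", "--filter", f"ancestor={base_image}"]).split()
    if len(ids) != 1:
        # Refuse to guess when zero or several containers match.
        raise RuntimeError(f"expected one running {base_image} container, found {len(ids)}")
    return run(["docker", "commit", ids[0], new_image]).strip()
```

Calling commit_container_from("ubuntu", "ubuntu-mysql") while the interactive shell from the example is still open would save that container as a new image.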
Now we come to the really useful part. I mentioned that you can run commands inside a Docker container through its shell and then save the changes using “docker commit”. A Dockerfile automates this process: the steps are written as instructions in a text file, so they can be revised, versioned and rebuilt with “docker build”. A Dockerfile starts with a FROM instruction, which specifies the base image. The MAINTAINER instruction normally comes next and records the author of the image. A RUN instruction executes commands such as “apt-get” or “yum” inside the image being built. It’s best to chain several commands into one RUN instruction using the && operator, because Docker creates a new intermediate image (a layer) for every instruction. WORKDIR is another useful instruction: it sets the working directory for the instructions that follow and for the running container, which some less well designed applications need in order to work. Some Docker images can be run without providing an explicit command (“docker run cassandra”) because the image defines a default command with an ENTRYPOINT (or CMD) instruction. And finally, the ADD instruction copies files from the host (the build context) into the image.
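To make the instructions concrete, here is a minimal Python sketch that assembles a Dockerfile from the pieces described above. The function render_dockerfile, the maintainer, the paths and the Apache package are all hypothetical; the point is the shape of the output, in particular that the apt-get steps end up chained in a single RUN line. Newer Docker releases deprecate MAINTAINER in favour of LABEL maintainer="Jane Doe", but the sketch keeps the instruction the text describes.

```python
import json
from typing import List, Optional, Tuple


def render_dockerfile(base: str, maintainer: str, run_steps: List[str], workdir: str,
                      add: Optional[Tuple[str, str]] = None,
                      entrypoint: Optional[List[str]] = None) -> str:
    """Build Dockerfile text from FROM, MAINTAINER, RUN, ADD, WORKDIR and ENTRYPOINT."""
    lines = [f"FROM {base}", f"MAINTAINER {maintainer}"]
    if run_steps:
        # One RUN joined with && gives one layer instead of one per command,
        # and the build fails as soon as any command in the chain fails.
        lines.append("RUN " + " && ".join(run_steps))
    if add is not None:
        source, destination = add
        lines.append(f"ADD {source} {destination}")
    lines.append(f"WORKDIR {workdir}")
    if entrypoint:
        # Exec form: a JSON array, so no shell is wrapped around the command.
        lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        base="ubuntu:22.04",
        maintainer="Jane Doe <jane@example.com>",
        run_steps=["apt-get update", "apt-get install -y apache2"],
        workdir="/var/www/html",
        add=("./src", "/var/www/html"),
        entrypoint=["apache2ctl", "-D", "FOREGROUND"],
    ), end="")
```

Saved as Dockerfile next to a src directory, the printed text could be built with “docker build -t my-apache .” (the image name my-apache is illustrative).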
Why Docker is so useful
Normally you would keep the Dockerfile in the project repository and have a CI server build images from it. These images are the deployable artefacts that Dev Ops and QA use. Here’s the scenario: a developer is asked to implement a new feature, makes some changes and pushes them to a new branch. The CI server picks up the changes and builds a new Docker image. The QA pulls the new image and runs it on their own desktop. They test the feature and find a bug. The bug is logged against that Docker build and communicated to the developer. The developer wants to see the bug for themselves, so they pull the same image and go through the same steps. They hit the bug because they can replicate exactly the same environment as the QA, and they fix it. They push a new commit and a new image is built. It’s tested and it passes QA. The changes are merged to the master branch in Git, which triggers a build of the production Docker image. The Dev Ops team pulls the image onto the production server and simply runs “docker run ‘image’”. This solves the problem that, even with the best of intentions, deployment environments have subtle but important differences. A good example is the difference in configuration between Apache 2.2 and Apache 2.4; another is that a developer might use Windows or OS X while production runs on a Linux server.
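The scenario relies on every CI build producing an image that QA, developers and Dev Ops can refer to unambiguously. One common convention, assumed here rather than taken from the text, is to tag each image with the branch name and a short commit hash. This minimal Python sketch (build_tag is hypothetical) turns a Git branch such as feature/login-form into a valid Docker tag, whose first character must be a letter, digit or underscore, followed by at most 127 letters, digits, underscores, periods or hyphens.

```python
import re

# Docker tags: first char [A-Za-z0-9_], then up to 127 more of [A-Za-z0-9_.-].
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def build_tag(branch: str, commit_sha: str) -> str:
    """Derive a unique, valid image tag such as 'feature-login-form-3f9c2a1'."""
    short_sha = commit_sha[:7]
    # Replace characters Docker does not allow in tags (e.g. '/') with '-'.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # Leave room for '-' plus the short SHA within the 128-character limit.
    tag = f"{safe_branch[:128 - len(short_sha) - 1]}-{short_sha}"
    # A tag may not start with '.' or '-'.
    tag = tag.lstrip(".-")
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"could not build a valid tag from {branch!r}, {commit_sha!r}")
    return tag
```

A CI job could then run “docker build -t myapp:‘tag’ .” and push the result, so a bug report can name the exact image the QA tested (myapp is a placeholder).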
An important part of knowing Docker is knowing how to manage the data inside a container. Changes written to a container's file system are not shared with other containers and are lost when the container is removed. Docker addresses this with data volumes, a concept similar to AWS’s EBS: volumes can be shared between containers and persist independently of any single container. Volumes also behave differently at build time. In a Dockerfile, changing a file that is added to the image invalidates the build cache from that ADD instruction onwards, so every later step is rerun to keep the image consistent. Files in a data volume are not part of the image, so changing them does not trigger a rebuild at all. Volumes can also be created from the command line with “docker run -v ‘volume’ ‘image’ ‘command’”. This form has one very useful feature: it can mount a directory from the host file system into the container, for example “docker run -v ‘host dir’:‘image dir’ ‘image’ ‘command’”. This lets you edit your project files on the host and see the changes in the container immediately. A common way to persist data volumes across container instances is to create a data volume container with “docker create -v ‘image dir’ --name ‘volume name’ ‘base image’”. Its volumes can then be mounted into new containers with “docker run -d --volumes-from ‘volume name’ ‘image name’”. Several containers can be started like this, and they will all share the same mounted directory. To back up a data volume, use “docker run --volumes-from ‘volume name’ -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /‘image dir’”.
The first and most useful networking tool to learn is port mapping. Server containers normally listen on a network port, but by default that port is not reachable from the host machine. Running a container with the “-P” flag publishes the ports the image exposes on random high-numbered ports of the host (“docker port ‘container’” shows which ones). With “-p ‘host port’:‘container port’” you can map a container port to a specific host port instead. Docker has a linking feature as well; a common and good use case to demonstrate how it works is a web application talking to a database server. Start a database server with “docker run -d --name db training/postgres”. Naming the container explicitly lets us refer to it in the next command. To run a new container linked to the database, run “docker run -d -P --name web --link db:db training/webapp python app.py”. Linking the database container to the web container lets the web container discover the database container's connection details. They are exposed to the web container as environment variables (for example DB_PORT_5432_TCP_ADDR), which the web application can use to configure a connection back to the database server.
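On the receiving side, the web application only needs to read those variables. This is a minimal Python sketch, assuming the legacy --link behaviour in which Docker sets ALIAS_PORT=tcp://ip:port (with the alias upper-cased) for the linked container's lowest exposed port; linked_service_address is a hypothetical helper and the addresses in the test data are made up. Docker now treats --link as a legacy feature, with user-defined networks as the suggested replacement.

```python
from typing import Mapping, Tuple
from urllib.parse import urlparse


def linked_service_address(env: Mapping[str, str], alias: str) -> Tuple[str, int]:
    """Return (host, port) of a container linked with --link name:alias.

    Legacy links set <ALIAS>_PORT=tcp://<ip>:<port> (alias upper-cased) for the
    linked container's lowest exposed port, alongside more specific variables
    such as <ALIAS>_PORT_5432_TCP_ADDR.
    """
    key = f"{alias.upper()}_PORT"
    if key not in env:
        raise KeyError(f"{key} is not set; was the container started with --link?")
    address = urlparse(env[key])
    if address.hostname is None or address.port is None:
        raise ValueError(f"unexpected value for {key}: {env[key]!r}")
    return address.hostname, address.port
```

Inside the web container, linked_service_address(os.environ, "db") would return the database container's IP address and port 5432, which the application can pass to its database driver.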
Docker is an amazing tool for standardising the runtime environment of an application, allowing a more structured workflow for server-side applications and more robust testing and deployment.