
How to run Postgres on Docker part 2

Craig Healey 26-09-2019 11:45 AM
Categories: Blog, Open Source, PostgreSQL, Review, Technology

In my previous blog post, I quickly ran through how to set up PostgreSQL inside a Docker container, up to the point where you could administer the database with psql or pgAdmin. What I didn’t do was explain how any of it worked. If you’re trying to do anything even slightly different, or if you encounter errors running it as I outlined, then you’re going to need to know what all of those commands actually do. In this blog post I’ll explain some Docker basics.

Understanding Docker basics

Docker is a set of products that allow software to be run in virtualized environments, called containers. In the case of Docker Toolbox running on Windows, those containers run inside a VirtualBox VM, normally called default, which Docker creates when it first runs. Last time we created a container called some-postgres. But where did we get all the software to make the container – the Operating System and PostgreSQL? All of that was contained in an image. We used an image called postgres that was stored on the Docker Hub. Let’s look at the command used to create the container:

docker run

Most docker commands start with the keyword docker. 

docker run --help 

will give you a list of possible flags.
The command run creates a container.
The flag --name gives it a name. Containers are unusual in Docker, in that if you don’t specify a name they are given a default name such as angry_davinci, jolly_wing or tender_banach. For most of the other Docker objects, if you don’t specify a name, you end up having to use the hash key that Docker generates. In this case I used some-postgres, as suggested on the postgres Hub page.

Ports

The next flag publishes the container’s port. All of the flags have a multi-character name preceded by 2 dashes (GNU convention), but some also have a single-character alias. So, I could have used -p 5432:5432 or --publish 5432:5432. The first number is the external port, and the second one is the internal port. The default port for PostgreSQL servers is 5432. So, if we create multiple PostgreSQL containers running on the default port (which we will shortly), we need to give them different external ports. If I wanted to create 3 such containers that could all be accessed from pgAdmin at the same time, I’d use the following port flags:
-p 5432:5432
-p 5433:5432
-p 5434:5432

On pgAdmin, I’d create 3 servers with ports 5432, 5433 and 5434.
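Putting those flags into full commands, three such containers might be started like this (the container names and password here are illustrative; the other flags are the same ones used in part 1):

```shell
# Three PostgreSQL containers from the same image, each published on a
# different external port; internally they all listen on 5432.
docker run --name pg-one   -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres
docker run --name pg-two   -p 5433:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres
docker run --name pg-three -p 5434:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres
```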

The next flag, -e or --env, lists environment variables specific to the image. In this case, we want to set the postgres user password so that we can connect via pgAdmin. If an image needs such variables, they should be listed somewhere; in the case of this PostgreSQL image they are explained in detail halfway down its Docker Hub page.

The last flag is -d or --detach, specifying that we want the container to run in the background. If you forget this, then running the command will put you straight into the container and when you exit, the container will stop.
Finally, the name of the image is specified, in this case postgres. You can also specify a command for the container to run, immediately after the image name, but in this case we don’t need to do that.
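For reference, assembling all of the flags discussed above gives the command used last time (your password will differ):

```shell
# The full command from part 1: named container, published port,
# superuser password set, running detached from the postgres image.
docker run --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres
```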

Dockerfile

Running containers directly in this way doesn’t give you a lot of control, especially as Docker is supposed to improve automation rather than typing skills. The normal way to run Docker is a three-step process: write a Dockerfile, build an image from it, and run a container from that image. A Dockerfile is a text file that starts from a base image. The base image is the first thing in the Dockerfile (although you can have comments, starting with #) and is preceded by the keyword FROM. There is a special base image, called scratch, that doesn’t contain anything at all. So, to build your own image from scratch, the first line of your Dockerfile will be:

FROM scratch

You could then add whatever you want into your image. But you don’t need to start from scratch. You can extend existing images. You could create an ubuntu image with specific tools and variables set. From there, you could build other images by installing different versions of PostgreSQL onto exactly the same underlying OS. That would allow you to test only the changes in PostgreSQL versions. Using the Dockerfile, you build your own image, just like the ones you can pull from the Docker hub. And using that image, you run a container.
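As a sketch of that idea (the base image tag and package name here are illustrative, and a real image would need rather more setup), such a Dockerfile might start like this:

```dockerfile
# Hypothetical sketch: extend the ubuntu base image and install a specific
# PostgreSQL version from Ubuntu's own package repositories.
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y postgresql-10
```

Swapping the package line for a different version would give you images that differ only in the PostgreSQL version, on top of an identical OS layer.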

Images

So, let’s build an image based on the postgres image we looked at last time. Create a text file called Dockerfile. By default, Docker looks for the Dockerfile in the current working directory (called the build context). You can, of course, run Docker commands from a Command Prompt or PowerShell window, not just the Docker Terminal program. And you can store your Dockerfiles in whatever directory suits you; just use the -f flag to specify the file location. I’m going to be lazy and create a test directory right in the Docker Toolbox directory. Don’t forget that when moving around in the Terminal window, you need to use Linux commands. So, ls instead of dir, and pwd instead of echo %cd%. You also have access to the vi editor if you want. My Dockerfile consists of just 2 lines:

FROM postgres
ENV POSTGRES_PASSWORD=mysecretpassword

To build the image, type

docker build -t craig/postgres:version1 .

Don’t forget the full stop (period) at the end. That tells Docker to use the current directory as the build context, which is where it looks for the Dockerfile.

The build is a two-step process. First, Docker finds the base image. If it’s already downloaded, as here, it moves straight on to the next step; otherwise it pulls it from the Docker Hub for you. Then it applies the next command line, in this case setting the POSTGRES_PASSWORD variable. That involves an intermediate container, which is automatically removed for you. Once the build has succeeded and the image has been given an Image ID, you get a security warning. That matters if you’re considering doing this for a real system, but for now you can safely ignore it. If I build version2 of my image without changing the Dockerfile, Docker is smart enough to know that nothing needs to change. It will create a new image, called craig/postgres:version2, but the Image ID will be the same as for craig/postgres:version1. You can see the possibility of multiple levels of dependencies being created here. Unfortunately, there isn’t an easy way of viewing dependencies short of third-party scripts and tools. The closest thing to a Docker command is:

docker inspect --format='{{.Id}} {{.Parent}} {{.RepoTags}}' $(docker images --quiet)

This will list the sha256 hash of an image (the first 12 digits of which are used as the Image ID) followed by the image’s immediate parent, if any. For more on the formatting command, see this blog. When it comes time to delete an image, you won’t be able to if it has a child image, so you may need to resort to this in order to get rid of images you no longer need. You can also use:

docker history postgres

which lists all commands used to create the image, any images created along the way, and the size of each individual step.

To list all images on our machine, use the docker image ls command:

docker image ls 

There are a number of things to look at here. First, the naming convention. Images are often stored in registries (Docker even has its own Registry image) and you use git-like commands to pull and push images to the registries. The first part of the name is the Registry Hostname. If you have an account on Docker Hub, that will be your Registry Hostname when you push the image. Some images don’t have a Registry Hostname; they’re mostly Official Images. The part after the slash is the name of your image. And finally, the tag allows you to mark different versions of an image. As I have no intention of pushing these images to a registry, I’m just following the naming convention to keep everything nice and tidy. Notice that the official postgres image is tagged with latest. If you don’t specify a tag when you build an image, latest is added. I didn’t specify a tag for the postgres image in my Dockerfile, so it used the latest. If I want a previous version of PostgreSQL, I can use e.g.

FROM postgres:9.6

and it will pull that version of postgres from the Docker Hub. Of course, I’m assuming that postgres:9.6 is an image built with PostgreSQL v9.6, but as it’s an Official Image, that’s a fairly safe bet. You can usually check the tags on the Docker Hub page of the image you are using. If I want to, I can also look at the metadata for the image that I have pulled down (or any Docker object on my machine) using the docker inspect command. First use:

docker pull postgres:9.6 

to pull the specific postgres image from the Docker hub. Then run:

docker inspect postgres:9.6

and you’ll see a large JSON document.

If I create another image that uses the same version of postgres, Docker doesn’t download it again; it uses the image it already has. If I use both the default postgres image and version 9.6, then Docker pulls the older image down as well, and I will have postgres:latest and postgres:9.6 stored on my machine. If you build a few images, keep an eye on how much space the default VirtualBox machine is using, as that is where they are stored. Some images are built with the minimum of tools to be as lightweight as possible. The latest postgres image is v11. If I pull a postgres v11 image built on Alpine Linux:

docker pull postgres:11-alpine 

and then compare sizes

docker image ls 

You’ll see the latest version is 313MB, the 9.6 version is 230MB, and the lightweight 11-alpine version is 71.9MB. This is something to consider if you’re building your own images.
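One way to keep an eye on that disk space from the Terminal is docker system df, which summarises how much space Docker objects are using:

```shell
# Show disk usage broken down by images, containers, local volumes
# and build cache.
docker system df
```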

Volumes

Docker containers are supposed to be ephemeral, created and destroyed by automation tools at any time. But if your container is a database, what happens to the data? You can create persistent data Volumes that reside on the VirtualBox machine and can be accessed by containers which have been created in the correct way. Volumes exist independently of Images and Containers, so you can use the same Volume for multiple Containers, even at the same time. Unfortunately, maintaining Volumes isn’t as easy as it should be in Docker. Unlike Containers, Volumes don’t get automatically named, and as it’s easy to implicitly create a Volume, you may end up with lots of mysteriously named Volumes and no idea where they are used. To avoid that issue, you can explicitly create a Volume and give it a name, before using it with a Container.

docker volume create postgres_vol_1

You can also just use the -v flag and specify a name for the volume when you run the Container, mapping it to a directory inside the container, like this:

docker run -v postgres_vol_2:/var/lib/postgresql/data --name volume-postgres -p 5433:5432 -d craig/postgres:version1

The /var/lib/postgresql/data directory is the default data directory for PostgreSQL using the Official Image, as discussed under PGDATA in the How to extend this image section of their Docker Hub page. 

If you’ve run both of the above commands, when you type

docker volume ls

you’ll see you have 3 volumes: a default one with a long string of hex digits for a name, and the 2 postgres_vol volumes. As only postgres_vol_2 is being used, go ahead and remove postgres_vol_1:

docker volume rm postgres_vol_1

Note that even

docker system prune 

won’t remove Volumes. You need to use 

docker volume prune 

or 

docker volume rm  

If the Volume has an ID instead of a name, you have to type the ID in full. Other commands will work with a partial ID as long as it’s unique, but not this one.
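Before moving on, it’s worth proving to yourself that the data really does outlive the container. Using the volume-postgres container created above, remove it and then attach a fresh container to the same named volume; any databases you created will still be there:

```shell
# Destroy the container (the volume is untouched), then start a brand new
# container against the same named volume.
docker rm -f volume-postgres
docker run -v postgres_vol_2:/var/lib/postgresql/data --name volume-postgres -p 5433:5432 -d craig/postgres:version1
```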

Networks

Docker creates 3 networks by default, which you can see using

docker network ls

  • Bridge – the network in which containers are run by default. The Docker bridge driver automatically installs rules in the host machine so that containers on different bridge networks cannot communicate directly with each other.
  • Host – use the host’s networking directly
  • None – disable all networking

To create your own network so that containers on the network are isolated from containers not on the network, use: 

docker network create --driver bridge my_bridge_network

Then add the --network my_bridge_network flag when you create the container. There’s lots more to be said about networking in Docker, but as this is a database blog, I’ll just refer you to the Networking tutorial over at Docker.
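As a quick sketch (the container names are illustrative), two containers started on the same user-defined bridge network can reach each other by container name, something the default bridge network doesn’t allow:

```shell
# On a user-defined bridge, Docker's embedded DNS resolves container names,
# so psql can connect to the server container by its name.
docker run --network my_bridge_network --name pg-server -e POSTGRES_PASSWORD=mysecretpassword -d postgres
docker run --network my_bridge_network -it --rm postgres psql -h pg-server -U postgres
```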

Docker-machine

Finally, let’s briefly look at one of the other tools that comes with Docker: Docker Machine. Docker Machine allows you to create and control virtual machines running Docker, amongst other things. For the full rundown check out the official pages. But for now, we’ll just look at creating and removing a VirtualBox machine. First, list the machines you already have, which should just be default:

docker-machine ls

Then create a new machine called virtual-docker:

docker-machine create virtual-docker

After a short while it should finish.

List the machines again, using 

docker-machine ls

Notice that the default machine is shown as active. Any docker commands you run will be run on that machine. Test this by listing the running containers using 

docker ps

Now change machines by setting the DOCKER_HOST environment variable to whatever the URL of the virtual-docker machine is, in my case 192.168.99.111:2376

export DOCKER_HOST=tcp://192.168.99.111:2376

Again, list both docker machines and running containers

docker-machine ls
docker ps

You can see that virtual-docker is now shown as active, and no containers are present. You can also open VirtualBox Manager and see that a new machine is running. Any docker commands you now type will be run in the virtual-docker machine. To set the active machine back to default and remove the virtual-docker machine:

export DOCKER_HOST=tcp://192.168.99.105:2376
docker-machine rm virtual-docker
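Rather than typing URLs by hand, Docker Machine can generate the environment settings for you: docker-machine env prints DOCKER_HOST together with the TLS-related variables, and eval applies them to your current shell:

```shell
# Point the docker client at virtual-docker, then back at default.
eval $(docker-machine env virtual-docker)
docker ps
eval $(docker-machine env default)
```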

As with all the commands, more information can be found using the --help flag.

Coming up in part three…

This has been quite a wordy blog. In the next part I’ll look at deploying a PostgreSQL cluster, both the easy way (based on someone else’s hard work) and the hard way (creating your own images).

How to run Postgres on Docker part 1

How to run Postgres on Docker part 3

How to Postgres on Kubernetes (part 1)
