Tag: postgresql

What are you building?

A project, a product, or a brand?

Actually, luck has played a big role in our industry, and sometimes you don’t know what you’re building until later. I could have called this post “What’s in a name?” or “Why I have to work around MySQL performance bugs when there’s at least one technically superior and totally free database we could have used instead.”

This is a story of well-meaning people who made what seemed like reasonable choices at the time, some of which have caused mild suffering. They say the road to hell is paved with good intentions.

So what’s in a name? In the 1980’s if you’re a geeky professor creating a project to replace the “Ingres” database project, “Postgres” sounds like a pretty clever name. If SQL becomes a popular standard and is now supported, changing “Postgres” to “PostgreSQL” also seems pretty sensible. There’s continuity. The handful of students who used it at the time were not confused at all. But even for them it might have been better to choose the name of a painter more recent than Ingres (DegaSQL, MatisSQL, PicasSQL, KandinSQL, MonetSQL, RenoirDB, SeuratSQL, DaliBase?).

Then there’s MySQL. Sigh. Maybe we’re all shallow, narcissistic people at heart, or maybe we all just follow the orders of people who are, but “MySQL” is a near-perfect brand name because it’s all about “me“, and it’s “mine”. No need to explain what’s up with the name. Sold. Add to that a focus on the needs of ISP’s and Linux users, and it becomes second only to Oracle in popularity. PostgreSQL is in 4th place now, but the usage scores drop off a cliff after 3rd place (which is Microsoft SQL Server).

It turns out, though, that the brilliant name choice of “MySQL” was just as random as the awful name chosen for its competitor. “Monty” Widenus named MySQL after his first daughter, My. By the way, more-actively-maintained-and-improved fork of MySQL, languishing in 20th place in usage rankings, is MariaDB, named after his second daughter. If he’d had only one daughter, named Ingrid instead of My, the fork might have been called “PostgridSQL”.

So, do I have a point? Just that you might want to think when you create something about who it’s for and to think, when naming or re-branding it, about how people who are not currently “in the loop” will perceive it. Also, if your technology is really solid and well-designed, please be lucky.

For info, recent database usage rankings from db-engines.com:

1. Oracle Relational DBMS 1449.25
2. MySQL  Relational DBMS 1370.13
3. Microsoft SQL Server Relational DBMS 1165.81
4. MongoDB  Document store 314.62
5. PostgreSQL Relational DBMS 306.60
6. DB2 Relational DBMS 188.57
7. Cassandra  Wide column store 131.12
8. Microsoft Access Relational DBMS 126.22
9. SQLite Relational DBMS 106.78
10. Redis  Key-value store 104.49
11. Elasticsearch  Search engine 87.41
20. MariaDB Relational DBMS 34.66


Gentle Introduction: Docker

The tool I present in this article is the container technology phenomenon known as docker.  If you know a lot about docker already, this article is not for you (unless you have suggestions to improve the article).

What is it?

Briefly, docker containers are like virtual machines, except that instead of emulating a whole system, each container runs as an isolated process on a shared kernel. This is much more efficient. Each container is specified by a file (DockerFile – more on this later, if you get all the way to the end of this post), which is basically a scripting and configuration file for setting up everything that differs from the default system.

What’s the big deal?

I do not enjoy system administration or configuration, so docker does not come naturally to me as an interest, but it’s probably going to change the way all developers work (it’s already changed things for some).

What’s so revolutionary about docker and containers? Surely it’s an infrastructure thing best left to the operations people, right? Surely it’s only popular because it lets cloud infrastructure providers save money on hardware because containers use fewer resources than full VM’s?

No. Basically, because of docker, execution environment can be treated as code.

infrastructure == code

In the not-so-far future, because of containers, your build tool will not only build, package and deploy the software you develop, but it will also build, package and deploy the execution environment on which your software will run, including installing database(s), application server, etc.  Some people are already doing this using custom gradle code. In the long run, containers might even make virtual machine environments like the JVM and the CLR obsolete, because those environments were originally created to shield programs from differences between execution environments.

Docker has a central repository, Docker hub, which allows docker users to pull execution environment configurations (known as “images”) from the internet using the command line in the way that maven’s central repository allows maven to pull jar files.

Hello, Postgres

So let’s go through some steps to do something rather basic with docker: installing and running a postgresql database server in a docker container. It’s easy, because postgresql is freely available on docker hub.

Running a database in a container could be useful for integration tests – each developer would have an identical local database installation.

First you need to install docker.  You’ll need to follow the instructions for your machine on the docker installation docs page.
I installed it on Windows. First I had to fiddle with my BIOS menu to enable virtualization.  Then I downloaded and launched the Docker Toolbox binary. Pretty straightforward stuff.  The ToolBox installed an icon on my desktop called Docker Quickstart Terminal. Double-clicking it launches docker in a bash shell command-line window (in a VM). It takes a while to warm up. The VM is assigned a fairly standard IP address which is shown in the terminal when finishes launching. For me it was


Once you get a text prompt, you can pull the latest stable version of the postgresql image using the command:

docker pull postgres

This may take some time to finish all the downloads.

Once done, you can create a container called my-postgres to run the database:

docker run -p 5432:5432 --name my-postgres -e POSTGRES_PASSWORD=password -d postgres

Note that the -p option maps a port of the container to a port on the host. So for my-postgres the default postgres odbc port will be accessible from outside the container.
The -e option sets environment variables (in this case it sets the password for the default user “postgres” to be “password” (OWASP would not approve, but this is a tutorial).
The -d option runs the container as a background (a.k.a. “detached” or “daemon”) process and this option is not followed by an argument (the “postgres” at the end of the command is the name of the image to use).
On linux, you may need to prefix docker commands with sudo.

You can verify that the container is running with the command docker ps -a.

If you need to reboot your machine or if for some reason you find that your container is not running, you can restart it with the command `docker restart my-postgres`.

So how do you access that database now?  You need to know the IP address to connect to.  If you installed on Windows using the Docker Toolbox, and if you can’t find the IP in your Docker Quickstart Terminal window, the IP should appear in the output of the command

docker-machine ls


To get the IP on linux (don’t bother doing this on Windows, the result is only useful within the linux environment running docker), it seems you need to call

docker inspect --format '{{.NetworkSettings.IPAddress}}' my-postgres

Once you have the IP you can access via ODBC on port 5432 – using a tool like PGAdmin, for example. And you can write unit tests or create data sources in your web app using the IP and the port 5432.

And then?

This is already pretty cool, but you can go a lot farther with docker, linking data containers to application containers so that they can communicate with each other, extending the postgres container image to call SQL scripts on startup, mapping file directories in the container to file directories on the physical machine, etc. It can get a bit complicated, but the docker user guide is a good place to look to find out how to do all that.

A closer look at the DockerFile

I said earlier that infrastructure is now code, with Docker.  For the PostgreSQL container we used in our example, let’s take a quick, hand-wavy look at the DockerFile code under the hood.  You don’t need to do this to use the PostgreSQL image, but this is a techie blog, so let’s have a look under the hood.  The DockerFile for postgres 9.5 is at this link on docker hub.

The first line is:
FROM debian:jessie

Inheritance! A DockerFile starts with a FROM to indicate the parent image upon which it builds. By the way, the parent DockerFile debian:jessie is here, and it builds upon what is effectively the root image.

In the postgres DockerFile, there is a variety of commands. We see RUN used to launch linux commands like mkdir (to create file directories) and apt-get (to download and install software packages). We see ENV commands that set environment variables, like “PATH” and “PGDATA”. The EXPOSE command in the file tells Docker that the port in question is used by the container but does not make it visible from the outside (we did that with our docker run command above using the -p option). There’s also the VOLUME command which creates a mount point for a disk, a COPY command which copies into the container a shell script file which is part of the image (we see here that the postgres image consists of a DockerFile and a shell script file). There’s also an ENTRYPOINT command which launches the script file when the container is launched (ENTRYPOINT is different from RUN in that ENTRYPOINT associates the life cycle and output of the application launched with the life cycle and output of the container, allowing the use of a container to run a one-shot command and output the result, for example).