Taming Git For Docker

Docker is neat. If you don’t know it yet, you should read this now.

How we build

We’re using a cluster of machines set up specifically to build Docker images. Each build clones a repo, then sends it to Docker to build. This leads us to a problem.

When a git repo is cloned, the mtime (modification timestamp) of every file on the filesystem is the time of the clone, not the friendly time that you see on GitHub (which is based on the last commit). Docker uses the file’s mtime to determine whether or not something has changed, so NOTHING of yours gets reused from Docker’s cached layers. There’s a solution, however…

Awww, man!!

After a few hours of cycling through ideas like “we’ll keep clones of every repo on a volume and simply update the head” and “if we download from GitHub using tar.gz or zip, will the times be preserved?” I was left thoroughly unamused and baffled.

Interwebs to the rescue

With a little Googling (every programmer’s best friend), I found a Gist by Jeffery Fernandez that loops through git ls-files and touches each file with the time of its last commit. Cool! We’ll just clone, process with this script, then docker build. But Docker still didn’t reuse the cached layers from build to build. WTF?
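For illustration, here is a minimal Python sketch of that idea. It is not the Gist itself: the function names are made up for this example, and it assumes git is on the PATH and that the committer timestamp (%ct) is the time you want on each file.

```python
import os
import subprocess


def last_commit_times(repo):
    """Map each tracked file (relative path) to the Unix time of its last commit."""
    listing = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        check=True, capture_output=True,
    ).stdout
    times = {}
    for raw in listing.split(b"\0"):
        if not raw:
            continue
        rel = os.fsdecode(raw)
        out = subprocess.run(
            ["git", "-C", repo, "log", "-1", "--format=%ct", "--", rel],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        if out:  # files that are staged but never committed have no timestamp
            times[rel] = int(out)
    return times


def apply_mtimes(repo, times):
    """Set atime and mtime of each listed file to its recorded timestamp."""
    for rel, ts in times.items():
        path = os.path.join(repo, rel)
        if os.path.islink(path):
            continue  # os.utime would follow the link, so skip symlinks
        os.utime(path, (ts, ts))


def restore_file_mtimes(repo):
    """Touch every tracked file in repo with its last commit time."""
    apply_mtimes(repo, last_commit_times(repo))
```

Calling restore_file_mtimes("path/to/clone") right after the clone gives every tracked file a timestamp that stays the same from one clone to the next. It runs one git log per file, which is slow on large repos but keeps the sketch short.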

Almost there

We’re using docker build via the API, which takes a tarball of the filesystem. Git only tracks files, so our git clone BUILDS the filesystem from filenames, creating directories as needed to store the files. Each of those directories was still timestamped at clone time, and they end up in the tarball too. A tiny hack will fix that right up.
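One way to do the directory fix, sketched in Python. This is an assumption about what such a hack could look like, not the script linked from this post: it gives every directory the newest mtime found among the tracked files beneath it, and it expects the file mtimes to have been restored already. The function name and the tracked_files argument (relative paths, as printed by git ls-files) are made up for this example.

```python
import os


def restore_dir_mtimes(repo, tracked_files):
    """Set each directory's mtime to the newest mtime of the tracked files under it."""
    newest = {}  # relative directory path ("" is the repo root) -> mtime in ns
    for rel in tracked_files:
        mtime_ns = os.lstat(os.path.join(repo, rel)).st_mtime_ns
        parent = os.path.dirname(rel)
        while True:
            if mtime_ns > newest.get(parent, 0):
                newest[parent] = mtime_ns
            if not parent:  # reached the repo root
                break
            parent = os.path.dirname(parent)
    for rel_dir, mtime_ns in newest.items():
        # Changing a directory's timestamp does not modify its parent,
        # so the order of these calls does not matter.
        os.utime(os.path.join(repo, rel_dir), ns=(mtime_ns, mtime_ns))
```

Because only tracked paths are visited, the .git directory is left alone. Run this after anything else that writes into the tree, since creating or deleting a file bumps the mtime of the directory that holds it.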

tl;dr Use this script if you’re doing a git clone before every build

Building Nirvana

  1. git clone
  2. copy script into the root of the git tree
  3. run script
  4. build the docker image
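
The four steps above could be wired together roughly like this. It is only a sketch under a few assumptions: it shells out to the docker build CLI rather than the API we use, the timestamp script is executable and takes no arguments, and the build_image name and its parameters are made up for this example.

```python
import os
import shutil
import subprocess


def build_image(repo_url, workdir, script_path, tag, run=subprocess.run):
    """Clone, fix timestamps with the script, then build the Docker image."""
    # 1. git clone
    run(["git", "clone", repo_url, workdir], check=True)
    # 2. copy the script into the root of the git tree
    #    (copy2 keeps the script's own mtime and permission bits)
    script_in_tree = os.path.abspath(shutil.copy2(script_path, workdir))
    # 3. run the script from the root of the tree
    run([script_in_tree], cwd=workdir, check=True)
    # 4. build the docker image
    run(["docker", "build", "-t", tag, "."], cwd=workdir, check=True)
```

The run parameter exists only so the commands can be swapped out or logged in a dry run. By default every step goes through subprocess.run and raises if a command fails.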
