Container Image Internals, Part 2: docker build and push | Dan Lorenc's Blog

This is part 2 in a series on container image internals. See part 1 here.

Have you ever built a Docker image from a Dockerfile using the ‘docker build’ command, and pushed it to a registry with ‘docker push’? After reading this post you should understand exactly how these commands work.

Prereqs

I tested the following commands on macOS Sierra. If you plan on following along, you’ll need:

  • A text editor you’re comfortable with.
  • The amazing ‘jq’ tool for formatting and manipulating JSON data, which can be installed with Homebrew.

I used the Google Container Registry for this example, so if you want to push or work with private images you’ll also need to install the Google Cloud SDK and create a project to help with authentication.

Docker Build

Let’s start with a simple example of creating a new image with some files in it from scratch, then work on how to append to an existing image.

A manifest contains two main sections: the layers and the config object. So first let’s create a working directory, and create our layer:

# Make a "rootfs" directory. All files inserted into our final image are relative to root.
mkdir myfirstdockerimage && cd myfirstdockerimage
mkdir -p rootfs/hello
echo "Hello, Docker" > rootfs/hello/docker

# Tar/gz this into a layer
tar -czf layer.tar.gz -C rootfs/ .

Next, let’s create our config object. This will look like:

  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": $size,
    "digest": "$digest"
  },

in our finished manifest.

The schema for manifests is well documented, but the schema for config objects is a little bit harder to track down. The OCI spec leaves the contents of this object up to the environment you want to run your container in (Docker, rkt, etc.).

From inspecting some configs from existing images, Docker seems to require a few fields:

{
  "architecture": "amd64",
  "config": {
  },
  "history": [
    {
      "created_by": "Bash!"
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:69e4bd05139a843cbde4d64f8339b782f4da005e1cae56159adfc92311504719"
    ]
  }
}

The config section can contain environment variables, the default CMD and ENTRYPOINT of your container, and a few other settings. The rootfs section contains a list of layers and diff_ids that look pretty similar to our manifest. Unfortunately, the diff_ids are slightly different from the digests in our manifest: they’re the sha256 of the ‘uncompressed’ layers.
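You can see the difference concretely by hashing the layer both ways (a quick sketch; the first few lines recreate the layer from above so it runs standalone):

```shell
# Recreate the layer from earlier, so this snippet runs on its own.
mkdir -p rootfs/hello
echo "Hello, Docker" > rootfs/hello/docker
tar -czf layer.tar.gz -C rootfs/ .

# The manifest digest covers the compressed bytes; the diff_id covers the
# uncompressed tar. The two hashes will not match.
digest="sha256:$(shasum -a 256 layer.tar.gz | cut -d' ' -f1)"
diff_id="sha256:$(gunzip -c layer.tar.gz | shasum -a 256 | cut -d' ' -f1)"
echo "digest:  $digest"
echo "diff_id: $diff_id"
```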

We can create one with this script:

cat <<EOF > config.json
{
  "architecture": "amd64",
  "config": {
  },
  "history": [
    {
      "created_by": "Bash!"
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:$(gunzip -c layer.tar.gz | shasum -a 256 | cut -d' ' -f1)"
    ]
  }
}
EOF

Now that we have our layer and config, we create our manifest:

# We need to know the size and digest of our layer and config:
size=$(wc -c layer.tar.gz | awk '{print $1}')
digest="sha256:$(shasum -a 256 layer.tar.gz | cut -d' ' -f1)"
config_size=$(wc -c config.json | awk '{print $1}')
config_digest="sha256:$(shasum -a 256 config.json | cut -d' ' -f1)"

cat << EOF > manifest.json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": $config_size,
    "digest": "$config_digest"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": $size,
      "digest": "$digest"
    }
  ]
}
EOF

To recap, we should now have our layer, config and manifest. Your directory should look like:

$ find . -type f
./config.json
./layer.tar.gz
./manifest.json
./rootfs/hello/docker
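This is also a good spot to put the jq tool from the prereqs to work: if you script the push steps below, jq can pull the digests straight out of manifest.json instead of recomputing them. A small sketch with a toy manifest and placeholder digests:

```shell
# A toy manifest with placeholder digests, just to show the jq queries.
cat > /tmp/manifest-example.json <<'EOF'
{
  "schemaVersion": 2,
  "config": { "digest": "sha256:cccc", "size": 100 },
  "layers": [ { "digest": "sha256:1111", "size": 200 } ]
}
EOF

# Extract the digests that the push steps need to upload.
jq -r '.config.digest' /tmp/manifest-example.json
jq -r '.layers[].digest' /tmp/manifest-example.json
```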

Next, let’s learn how to push it to a registry.

Docker Push

Pushing a docker image is basically the opposite of pull: we upload all of our blobs (including the config blob), then push the manifest.

There’s one caveat, though: pushing requires us to set up authentication. The Docker protocol supports a few different authentication mechanisms, and explaining them all is out of scope for this series. We’ll stick to the auth used by GCR.

Authentication to GCR is based on access tokens, which we can obtain from the Google Cloud SDK with this command:

gcloud auth print-access-token

Also, make sure to create a Google Cloud Project so you have a GCR repository to push to. Save that project ID in an environment variable for use later:

PROJECT=<project>

Let’s start pushing!

Uploading layers

To upload a single layer, we first check to see if we need to upload it. This can help prevent us from wasting time and bandwidth uploading layers that the registry already has. To figure that out, you can send a HEAD request like this:

curl -I \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://gcr.io/v2/$PROJECT/hello/blobs/$digest

We’re including the authorization information as a header, and using the -I flag to send a HEAD request. If the layer exists, we’ll get a 200; otherwise, a 404.

There are a few different ways to upload a layer. We’ll use the simplest method here: a two-part monolithic upload. Just like it sounds, the “two-part” method involves two HTTP requests. The first tells the registry we’re about to upload a blob, so it can tell us where to send it. The second does the actual upload to the location returned by the first.

Here’s what the first request should look like:

curl -i -d '' \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://gcr.io/v2/$PROJECT/hello/blobs/uploads/

We need to pass the -d '' flag so curl will attach a ‘Content-Length’ header for us. The ‘-i’ flag is so we can see the returned headers. The important one in this case is the “Location” header, which tells us where to send our actual blob.

The final upload request will look like this:

curl -X PUT \
  $location\?digest=$digest \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-binary @layer.tar.gz \
  -H "Content-Type: application/octet-stream"

Where $location should be the location we got from our first request. We need to pass the digest of the layer as a query parameter, even though this is a PUT request. The “@layer.tar.gz” tells curl to load the request contents from that file on disk.

Putting this all together into a small script that checks if we need to upload our layer, then does the upload:

# Check to see if the layer exists:
returncode=$(curl -w "%{http_code}" \
            -o /dev/null \
            -I -H "Authorization: Bearer $(gcloud auth print-access-token)" \
            https://gcr.io/v2/$PROJECT/hello/blobs/$digest)

if [[ $returncode -ne 200  ]]; then
    # Start the upload and get the location header.
    # The HTTP response seems to include carriage returns, which we need to strip
    location=$(curl -i -X POST \
               https://gcr.io/v2/$PROJECT/hello/blobs/uploads/ \
               -H "Authorization: Bearer $(gcloud auth print-access-token)" \
               -d "" | grep Location | cut -d" " -f2 | tr -d '\r')

    # Do the upload
    curl -X PUT $location\?digest=$digest \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        --data-binary @layer.tar.gz
fi
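One detail in that script worth calling out is the tr -d '\r': HTTP header lines end in CRLF, and if the stray carriage return isn’t stripped, it ends up embedded in the upload URL and the PUT fails. You can exercise the parsing pipeline on a simulated response (the Location URL here is made up):

```shell
# Simulate the interesting part of a registry's 202 response,
# CRLF line endings included.
response=$(printf 'HTTP/1.1 202 Accepted\r\nLocation: https://gcr.io/upload/abc123\r\n')

# Same pipeline as the script above: grab the Location header, drop the '\r'.
location=$(echo "$response" | grep Location | cut -d" " -f2 | tr -d '\r')
echo "$location"   # https://gcr.io/upload/abc123
```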

Config Upload

Configs are stored by the registry as normal blobs: they get referenced differently in the manifest, but they’re uploaded by their digest just like layers.

The same type of script we used for layers will work here:

returncode=$(curl -w "%{http_code}" -o /dev/null \
    -I -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://gcr.io/v2/$PROJECT/hello/blobs/$config_digest)

if [[ $returncode -ne 200  ]]; then
    # Start the upload and get the location header.
    # The HTTP response seems to include carriage returns, which we need to strip
    location=$(curl -i -X POST \
        https://gcr.io/v2/$PROJECT/hello/blobs/uploads/ \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -d "" | grep Location | cut -d" " -f2 | tr -d '\r')

    # Do the upload
    curl -X PUT $location\?digest=$config_digest \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        --data-binary @config.json
fi

Manifest Upload

Now finally, we can upload our manifest. This is the step where we decide how to “tag” our new image.

We include a reference in the URL, which can be either a digest or a tag. Let’s push our manifest to the tag “:world”, so we can reference it later as:

gcr.io/$PROJECT/hello:world

Here’s the manifest push command:

curl -X PUT \
    https://gcr.io/v2/$PROJECT/hello/manifests/world \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-binary @manifest.json

Now let’s pull the image and make sure everything worked. We can pull the image we just pushed with Docker:

docker pull gcr.io/$PROJECT/hello:world

Remember, this image should contain a single file at the path hello/docker. To get it out, we have to start a container so we can use “docker cp” on it.

c=$(docker run -d gcr.io/$PROJECT/hello:world exit 2> /dev/null)
docker cp $c:/hello/docker .
cat docker
Hello, Docker

Summary

In this article, we learned how to construct a simple docker image from scratch without using any docker commands. We learned how to name and push this image to a registry, then pulled it back down into a real docker daemon to check that it all worked.

In the next article, we’ll expand on this by putting it to use modifying existing images in the registry.

2017