Packer and Terraform are the products HashiCorp recommends for building and deploying applications in the modern datacenter, but these two tools have some differences and rough edges that mean they don't tessellate as well as they could. In this article I'll talk about a prototype I built that uses the core of Terraform as the basis for a Packer replacement: a more flexible tool with a much better story for integration with Terraform at deployment time.

The HashiCorp Stack

Modern web applications are rarely just a managed application server and a database; rather, they are composed of a wide array of different services: CDNs and load balancers, smart DNS servers, externally-managed storage and caching infrastructure, outsourced mail servers, and so on.

Traditional application deployment most commonly makes a sharp distinction between the deployment of code and the management of the infrastructure on which that code runs and which it uses. Tools like Puppet, Chef and CFEngine are primarily focused on automating the configuration of particular servers, with separate tooling like Foreman for bootstrapping those servers and, in many cases, home-grown scripts doing the actual deployment of applications.

Terraform is an open source project by HashiCorp that takes a new angle on cloud infrastructure and application deployment. It takes the declarative resource definition style of Puppet but applies it to the definition of resources that can be created, updated and deleted via network APIs. In a world of cloud-hosted infrastructure, that can include everything from DNS records to virtual machines to git repositories. While Terraform's current set of resources is weighted towards infrastructure-as-a-service platforms, its model is generic enough (just Create, Read, Update and Delete operations) that it can be readily applied to just about anything that could reasonably be modelled as a REST API resource.
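
For example, a virtual machine and the DNS record that points at it can both be described declaratively in the same configuration. This is a generic illustration rather than a real deployment; the zone id and AMI id are placeholders:

# An EC2 instance and a Route 53 record, expressed as declarative resources.
resource "aws_instance" "web" {
    ami           = "ami-xxxxxxxx"
    instance_type = "t2.micro"
}

resource "aws_route53_record" "web" {
    zone_id = "ZXXXXXXXXXXXXXX"
    name    = "www.example.com"
    type    = "A"
    ttl     = 300
    records = ["${aws_instance.web.public_ip}"]
}

Terraform works out from the references which resources depend on which, so the record is created only once the instance exists and its public IP address is known.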

Alongside its resource-oriented configuration language for API-managed objects, Terraform's other key feature is its concept of state, which allows it not only to create resources but also to keep track of previously-created resources and update or delete them as needed. The current state of one deployment can be used as an input to another, so as well as managing infrastructure Terraform can also be seen as a way of publishing and sharing infrastructure between teams within an organization.

Deploying in terms of API-managed resources works best when applications are built in terms of such services. One certainly could provision a stock Ubuntu EC2 instance with Terraform and then install software on it via a traditional "copy-files-flip-symlink-restart" workflow, but if we instead produce an "Amazon Machine Image" (AMI) with the application already installed then we can have Terraform deploy it directly, and benefit from the fast startup time that comes from doing most of the setup work at application build time rather than at deploy time.

HashiCorp's answer to this need is Packer, a tool which in fact predates Terraform and is designed to automate the creation of machine images for various cloud infrastructure platforms. Its model is pretty simple:

  • A builder spins up an environment based on an existing image and prepares that environment so Packer can interact with it.

  • One or more provisioners communicate with the created environment to customize it in arbitrary ways. For example, a chef-solo provisioner allows Chef to be used to apply changes to the machine.

  • The builder then captures some kind of image of the build environment, which becomes the artifact of the build.

When applied to AWS, the builder creates an EC2 instance, the provisioners interact with it over SSH, and then the final artifact is an AMI that can be used as the basis for new instances created at deployment time.

Putting the Pieces Together

Perhaps because Packer predates Terraform by some years, the interoperability between Packer and Terraform is unfortunately rather limited. If you wish to move beyond manually pasting the IDs of the generated artifacts into your Terraform configuration, you're left either parsing Packer's rather awkward (but certainly machine-readable) CSV output, or running Packer inside HashiCorp's Atlas platform. Either way it doesn't feel completely natural, and that's rather unfortunate for two tools that supposedly belong to the same family.

Along with the frustration of the poor tessellation of these tools, I also couldn't shake the idea that Packer's high-level capabilities are pretty close to being a subset of Terraform's capabilities.

Build vs. Deploy: Not so different?

When we put aside various implementation details and the set of resources that happen to be implemented in each codebase today, there are only a few differences between the build process afforded by Packer and the deployment process afforded by Terraform:

  • Terraform manages a set of long-lived resources across multiple incremental deployments. Packer produces a completely distinct set of resources for each run.

  • Terraform creates and updates resources to move them towards a configured end state. Packer provisions certain resources only to assist in the build process, destroying them once the process is complete.

  • Terraform keeps track of the resources it's created, so they can be updated and deleted by later runs. Packer cedes control of the resources it created as soon as it completes, leaving the user to do manual cleanup.

The first two of these are legitimate and fundamental differences in purpose between the build and deployment phases. The last is arguably a limitation of Packer: though there is little reason to alter the artifacts of a build, it would be rather useful to be able to automatically destroy resources created for older versions of an app once they are no longer needed, in order to reduce storage costs.

The first two differences also apply to the use of Puppet or Chef as a one-off image provisioning tool vs. their use for ongoing management of a long-lived machine. If these tools can be applied to both problems, and if Terraform's configuration is at a similar level of expressive power to Puppet's, perhaps we could apply the resource management guts of Terraform to both problems also.

Padstone: A Terraform Build Prototype

I created Padstone to explore the idea of applying Terraform's model to the problem of building application artifacts. Padstone is a small command line tool that puts a new, build-oriented façade on the underlying mechanisms of Terraform.

The most readily-obvious difference between Padstone and Terraform is that the set of subcommands it accepts is more oriented around an application build workflow:

$ padstone --help
Usage:
  padstone [OPTIONS] <build | destroy | publish>

Help Options:
  -h, --help  Show this help message

Available commands:
  build    Run a build and produce a state file
  destroy  Destroy the results of a build
  publish  Publish a state file to remote storage

Executing builds with Padstone

The padstone build command expects as arguments:

  • a path to a directory containing Terraform-like configuration files (named with a .pad extension, and written in a superset of standard Terraform config, as we will see in a moment)

  • a path at which a state file will be created to record the results of the build process

  • zero or more values to populate the user variables defined in the configuration

Much like terraform apply, padstone build uses Terraform providers and provisioners to create various resources and then records the state of these resources in a JSON state file. Unlike Terraform, Padstone always starts with an empty state and so creates a fresh set of resources for each run, addressing the first of our differences from the section above. One down, one to go!

Temporary Build Infrastructure

To address the second of the differences I noted, Padstone extends the Terraform configuration model with the concept of a temporary resource. A temporary resource is created the same way as any other resource, except that padstone build will destroy all of the temporary resources before it returns.

At the time of writing, Terraform lacks a resource type for creating an AMI, which is the main purpose of the AWS family of builders in Packer. However, AMIs can be created via the API just like any other resource, so it's a simple matter to extend Terraform to support these API functions, as I did in Terraform pull request #2784. I used a build of Terraform with that patch applied in order to illustrate how Padstone can achieve the same result as Packer.

The following Padstone config builds an AMI via a similar process to that used by Packer's amazon-ebs builder and Chef provisioner.
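
In sketch form it looks something like this. The details are illustrative rather than a definitive reference: the temporary block type is the syntax assumed here for Padstone's temporary resources, a remote-exec provisioner stands in for the Chef run, and the aws_ami_from_instance arguments reflect the resource added in the pull request above. The resource names, variables and outputs match the build output shown below.

variable "version" {}
variable "vpc_id" {}
variable "subnet_id" {}

# Temporary resources exist only for the duration of the build;
# "padstone build" destroys them before it returns.
temporary "aws_key_pair" "provision" {
    key_name   = "padstone-webserver-${var.version}"
    public_key = "${file("provision_key.pub")}"
}

temporary "aws_security_group" "ssh" {
    name   = "padstone-webserver-${var.version}"
    vpc_id = "${var.vpc_id}"

    ingress {
        from_port   = 22
        to_port     = 22
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

temporary "aws_instance" "base" {
    ami                    = "ami-xxxxxxxx" # stock image to build from
    instance_type          = "t2.micro"
    subnet_id              = "${var.subnet_id}"
    key_name               = "${aws_key_pair.provision.key_name}"
    vpc_security_group_ids = ["${aws_security_group.ssh.id}"]

    connection {
        user = "ubuntu"
        # relies on an SSH agent holding the key pair's private key
    }

    # Customize the instance in place, playing the role of a Packer provisioner.
    provisioner "remote-exec" {
        inline = [
            "sudo apt-get update",
            "sudo apt-get install -y nginx",
        ]
    }
}

# The AMI is the result of the build: a regular resource that survives
# after the temporary resources have been destroyed.
resource "aws_ami_from_instance" "image" {
    name               = "webserver-${var.version}"
    source_instance_id = "${aws_instance.base.id}"
}

output "version" {
    value = "${var.version}"
}

output "ami_id" {
    value = "${aws_ami_from_instance.image.id}"
}

The key pair, security group and build instance are all temporary, so only the AMI and the outputs will remain once the build completes.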

We can put this config in webserver/ami.pad and then create an AMI for a particular application with Padstone as follows:

$ padstone build webserver/ webserver-0.0.1.tfstate version=0.0.1 \
       vpc_id=vpc-xxxxxxxx subnet_id=subnet-xxxxxxxx

[aws_key_pair.provision] Creating...
[aws_security_group.ssh] Creating...
[aws_instance.base] Creating...
[aws_instance.base] Provisioning...
[aws_ami_from_instance.image] Creating...
--- Build succeeded! Now destroying temporary resources... ---
[aws_instance.base] Destroying...
[aws_key_pair.provision] Destroying...
[aws_security_group.ssh] Destroying...

Outputs:
- version = 0.0.1
- ami_id = ami-xxxxxxxx

With this extra step of destroying the temporary resources we resolve the second difference between deploy and build. Throughout the process Padstone maintains the current resource state in webserver-0.0.1.tfstate, so once the process is complete that file contains just the outputs and the non-temporary resources.

Cleaning Up

As noted above, Packer provides no automatic way to destroy the resources it created once they are no longer needed. By writing out a state file, Padstone can overcome this limitation of Packer:

$ padstone destroy webserver/ webserver-0.0.1.tfstate version=0.0.1 \
       vpc_id=vpc-xxxxxxxx subnet_id=subnet-xxxxxxxx

[aws_ami_from_instance.image] Destroying...
All resources destroyed

All that's required to benefit from this is to keep the state file for each version recorded somewhere. For example, if the build process is being orchestrated by Jenkins then its ability to capture files as artifacts could be used to attach the state file to the Jenkins build result.

Publishing Artifacts

The original motivation for Padstone was to create a build tool that integrates well with Terraform. This is achieved by publishing the state file that describes the created resources, so that it can be imported into a Terraform deployment using the terraform_remote_state resource:

$ padstone publish webserver-0.0.1.tfstate s3 \
       region=us-west-2 bucket=padstone-results \
       key=exampleapp/webserver-0.0.1.tfstate

As long as whoever is running the deployment has access to the same S3 bucket, this state can be used by replicating the same settings in the Terraform configuration.
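
A sketch of the consuming side might look like the following; the aws_instance is purely illustrative, and the reference through the output attribute assumes the way the terraform_remote_state resource exposes the published outputs:

resource "terraform_remote_state" "webserver_build" {
    backend = "s3"

    config {
        region = "us-west-2"
        bucket = "padstone-results"
        key    = "exampleapp/webserver-0.0.1.tfstate"
    }
}

resource "aws_instance" "webserver" {
    # The ami_id output recorded by the build becomes the image for
    # the deployed instances.
    ami           = "${terraform_remote_state.webserver_build.output.ami_id}"
    instance_type = "t2.micro"
}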

Of course, since all Terraform resources are available in Padstone it is also possible for one Padstone configuration to consume resources from another, allowing the build process to get the same collaboration benefits as Terraform brings to the deployment process.

How Padstone Works

Padstone re-uses a lot of code from Terraform. It has its own top-level configuration parser in order to support temporary resources, and of course its own command implementations, but really all it's doing is transforming its input into something that the Terraform core can consume and then running the same old "apply" and "destroy" steps.

The concept of temporary resources is implemented via some trickery: Terraform's dependency resolver wouldn't normally allow a resource to be destroyed without also destroying its dependents, but Padstone is able to break this rule by splitting the internal state data structure in two, maintaining the temporaries and the results as separate manifests. The temporaries can then be cleaned up by running a destroy against a state from which the results have been excluded, causing Terraform to temporarily "forget" that the result resources exist.

The full details are, of course, in the code.

Where to from here?

In its current state, Padstone is just a prototype and far from ready to use. Although it shows that there's potential for a solution superior to Packer, the set of resources supported in today's Terraform does not include all of the items that Packer can produce.

However, with the right set of Terraform resources Padstone could push beyond Packer's narrow focus on machine images to many other per-app-version resource types. For example:

  • A resource for creating objects in S3 buckets could be used to distribute arbitrary files, like library archives for use in other builds, or application archives to deploy with AWS OpsWorks; a sketch of this appears after this list.

  • A provider for Fastly could exploit the concept of configuration versions that's built in to their API in order to push new configurations at build time and simply activate them at deploy time.

  • Padstone could be used to create a separate set of EC2 instances for each application version, and then have Terraform manage only the load balancer in front of them to support quick rollback to the still-running older version in the event of issues. The instances for each version can then have their own independent lifecycle. Similar thinking could apply to any other kind of resource that's deployed behind some sort of switching layer that allows backend resource selections to change quickly.

  • In principle, Padstone could benefit from Terraform providers that manipulate local resources on the system where Padstone is running. This is not appropriate for standard Terraform since those resources would usually not be available to other users of the created state, but Padstone could use them as temporary resources to assist in the creation of a non-local result resource, such as running VirtualBox locally to create a machine image that is ultimately uploaded to S3, or registered in Atlas as a shared Vagrant box.
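
To illustrate the first of these ideas, a Padstone configuration for publishing a library archive might be as simple as the following sketch, which assumes a hypothetical aws_s3_bucket_object resource for uploading files (not part of Terraform at the time of writing):

variable "version" {}

# Hypothetical resource for uploading a local file to S3.
resource "aws_s3_bucket_object" "library" {
    bucket = "exampleapp-artifacts"
    key    = "libexample/libexample-${var.version}.tar.gz"
    source = "build/libexample-${var.version}.tar.gz"
}

output "library_s3_key" {
    value = "libexample/libexample-${var.version}.tar.gz"
}

Nothing here is temporary at all: the build simply creates the object and records it in the state file, and padstone destroy can later remove it once that version is retired.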

At the moment I have no strong wish to develop and maintain a competitor to Packer. Rather, I'm sharing this proof-of-concept in the hope of stimulating a discussion about ways in which these problems could be solved in a more integrated, flexible manner by the "HashiCorp stack". I feel that such a project would be far more successful if led and coordinated by a team whose full-time job is creating DevOps tools. Never say never, though!