Terraform Environment+Application Design Pattern: Creating Environments

This is part two of a series of articles on the Terraform E+A Pattern. If you've not read the first part, you may wish to start at the beginning for context on the goals of this pattern and the terminology we're using here.

A key part of applying this pattern is deciding what you consider to be shared infrastructure and what you consider to be application infrastructure. The right trade-off for your situation will depend on the architecture of your system and the work culture within your team, but I generally try to keep the environment infrastructure to a minimum and push as much as possible into applications, since that keeps each application somewhat self-contained and reduces the need for coordinated changes to both environment and application.

So for our purposes here I will define an environment as having the following components:

An AWS VPC in a single region.
Several AWS subnets in different availability zones within that VPC.
A cluster of Consul servers, with one server per subnet.

The specific network architecture used here is not part of the pattern, but is rather just an example. You might not use AWS at all, or you might decide to spread your app across multiple regions, or do something entirely different! The important part is that an environment establishes somewhere for the application resources to live such that they can communicate with one another, and establishes some sort of data store to use for configuration.

As a cost trade-off -- and, more importantly, to illustrate how to create differences between environments -- we will have five subnets and five Consul servers in the production environment but only three of each in the QA environment.

Each environment consists of a separate Terraform configuration, but the environment's configuration consists only of instantiations of shared Terraform modules to create the necessary components. For the sake of example we'll assume that all environment-level configuration lives in a single repository with the following directory structure:

QA/
  env-QA.tf
  config-QA.tf
PROD/
  env-PROD.tf
  config-PROD.tf
shared/
  region/
    region.tf
  az/
    az.tf
  consul-cluster/
    consul.tf

Creating Environment Infrastructure

As noted above, each environment's Terraform configuration consists only of references to the shared modules, configured appropriately for that environment. Here's an example of how that might look for QA, in the env-QA.tf file:
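A minimal sketch of such a file follows, written in Terraform 0.12+ syntax. The module sources come from the directory layout above and the module name consul matches the -target=module.consul argument used later in this article. Everything else is an assumption made for illustration: the region, the CIDR ranges, and every module input and output name (environment, cidr_block, vpc_id, availability_zone, subnet_id, subnet_ids).

```hcl
# QA/env-QA.tf (illustrative sketch; input/output names are hypothetical)

provider "aws" {
  region = "us-west-2" # assumed region
}

# The VPC and other region-wide resources.
module "region" {
  source = "../shared/region"

  environment = "QA"
  cidr_block  = "10.1.0.0/16"
}

# One instance of the az module per availability zone: three for QA.
module "az_a" {
  source = "../shared/az"

  vpc_id            = module.region.vpc_id
  availability_zone = "us-west-2a"
  cidr_block        = "10.1.0.0/20"
}

module "az_b" {
  source = "../shared/az"

  vpc_id            = module.region.vpc_id
  availability_zone = "us-west-2b"
  cidr_block        = "10.1.16.0/20"
}

module "az_c" {
  source = "../shared/az"

  vpc_id            = module.region.vpc_id
  availability_zone = "us-west-2c"
  cidr_block        = "10.1.32.0/20"
}

# One Consul server per subnet created above.
module "consul" {
  source = "../shared/consul-cluster"

  vpc_id = module.region.vpc_id
  subnet_ids = [
    module.az_a.subnet_id,
    module.az_b.subnet_id,
    module.az_c.subnet_id,
  ]
}
```

Each module block consumes outputs from the one before it, so Terraform can infer the creation order (VPC, then subnets, then Consul servers) without any explicit dependencies.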

These top-level environment configurations just serve to wire together all of the parts that make up an environment. Having a separate configuration for each environment allows us to easily create slight variations between them, while using the shared modules minimizes the code duplication resulting from this structure. The env-PROD.tf file would then follow the same structure but would instantiate the ../shared/az module five times, allowing us to create more redundancy in production while still making QA a realistic-enough copy of the general environment structure.

The details of the region and az modules are AWS-specific and not very important for this article, but we will see what might go in the region.tf file as an example of the general principles of shared modules:
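As a rough sketch only, a region.tf along these lines would fit the description, again in Terraform 0.12+ syntax. The variable names, output names, and the choice to create an internet gateway alongside the VPC are my assumptions here rather than requirements of the pattern.

```hcl
# shared/region/region.tf (illustrative sketch; names are hypothetical)

variable "environment" {
  type        = string
  description = "Environment name, used only for tagging."
}

variable "cidr_block" {
  type        = string
  description = "Address range for the whole VPC."
}

resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = var.environment
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Outputs form the module's interface: the environment configuration
# passes these on to the az and consul-cluster modules.
output "vpc_id" {
  value = aws_vpc.main.id
}

output "internet_gateway_id" {
  value = aws_internet_gateway.main.id
}
```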

The other shared modules proceed in a similar manner, accepting various variables as input, declaring necessary resources, and returning details about those resources to be used for the next step.

Publishing Environment Configuration

The final important part of provisioning an environment is to publish information about the infrastructure it provides so that applications can make use of this infrastructure when they are deployed into the environment. This is the purpose of the config-QA.tf and config-PROD.tf files, which in our case will write the relevant settings into Consul using an arbitrary but systematic set of Consul keys:
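A minimal sketch of config-QA.tf, using the consul_key_prefix resource from the Consul provider, might look like the following. The env/ prefix, the key names, the region value, and the module outputs referenced here (vpc_id, subnet_id, server_addresses) are all hypothetical; the only real requirement is that every environment uses the same key layout.

```hcl
# QA/config-QA.tf (illustrative sketch; keys and output names are hypothetical)

# The provider talks to the Consul cluster created by this same
# configuration, using an assumed "server_addresses" module output.
provider "consul" {
  address = "${module.consul.server_addresses[0]}:8500"
}

resource "consul_key_prefix" "environment" {
  path_prefix = "env/"

  subkeys = {
    "name"           = "QA"
    "aws/region"     = "us-west-2"
    "aws/vpc_id"     = module.region.vpc_id
    "aws/subnet_ids" = join(",", [
      module.az_a.subnet_id,
      module.az_b.subnet_id,
      module.az_c.subnet_id,
    ])
  }
}
```

Note that the provider block depends on a resource created in the same configuration. That ordering matters on the very first run, when no Consul server exists yet.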

The intent here is to make both environments produce an identical structure in Consul but with differing values. Later we will see that application configurations can then read from these predictable locations to automatically discover the environment resources, regardless of which environment they are deployed into.

Provisioning the Environments

With all of this in place, we can create or update each environment separately using the usual Terraform workflow:

cd QA
terraform get
terraform plan -out=tfplan
terraform apply tfplan

At the time of writing, Terraform has a limitation that makes this sort of multi-layer system awkward to create from scratch: it will try to configure the consul provider before it has had a chance to create the Consul server.

This should eventually get fixed by partial apply, but until then the workaround is to add an additional argument for the first run of the plan command:

terraform plan -out=tfplan -target=module.consul

This only needs to be done for the first run, to make Terraform skip trying to deal with the consul_key_prefix resource until the Consul server exists. Once Terraform has successfully run with -target, run it again as shown above to complete the environment.

Assuming that you're following along with these specific technology choices, once these configurations successfully apply you should find the relevant settings in each environment's Consul key/value store. These configuration settings can now be used both by other Terraform configurations and by other consumers that are able to access the configuration store, giving a single source of truth on the environment's infrastructure settings.
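To illustrate the consumer side, here is a sketch of how another Terraform configuration could read those settings back with the consul_keys data source. The Consul address and the key paths (env/aws/vpc_id, env/aws/subnet_ids) are hypothetical and must match whatever layout your environment configuration actually publishes.

```hcl
# Illustrative consumer configuration (address and key paths are hypothetical)

provider "consul" {
  address = "consul.qa.example.com:8500"
}

data "consul_keys" "environment" {
  key {
    name = "vpc_id"
    path = "env/aws/vpc_id"
  }

  key {
    name = "subnet_ids"
    path = "env/aws/subnet_ids"
  }
}

locals {
  vpc_id = data.consul_keys.environment.var.vpc_id

  # Assumes the subnet ids were published as a comma-separated string.
  subnet_ids = split(",", data.consul_keys.environment.var.subnet_ids)
}
```

Pointing the provider at a different environment's Consul cluster is all it takes to retarget this configuration, since the key paths are the same everywhere.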

In this particular example our environments are pretty minimal. Depending on the technology choices elsewhere in your stack, you may wish to add further shared infrastructure here, such as container orchestration with Kubernetes or Nomad, a secret store like Vault, etc. The important thing is that the environment creates the fabric onto which all of the applications will be deployed; it supports the applications within it and creates channels of communication that allow the applications within that environment to interact with one another.

Why not use `terraform_remote_state`?

Terraform has a data source, terraform_remote_state, that allows outputs from one Terraform configuration to be used by another. This provides a low-friction way to connect Terraform configurations. For those familiar with this feature, it may come as a surprise to see this article suggest a general data store such as Consul as a solution for sharing configuration information.
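For readers who haven't used it, this is roughly what the remote state approach looks like in Terraform 0.12+ syntax. The S3 backend, bucket name, state key, and the vpc_id output are assumptions for the sake of example; the environment configuration would need to store its state there and declare a root-level output with that name.

```hcl
# Illustrative only: backend, bucket, key and output name are hypothetical

data "terraform_remote_state" "environment" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "environments/QA/terraform.tfstate"
    region = "us-west-2"
  }
}

locals {
  # Only root-level outputs of the other configuration are exposed here.
  vpc_id = data.terraform_remote_state.environment.outputs.vpc_id
}
```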

Using the remote state mechanism for configuration storage is, in fact, a perfectly reasonable choice: the E+A pattern requires there be a place to share configuration settings, but leaves the selection of technology for this up to the implementor.

With that said, using a non-Terraform-specific configuration store such as Consul does have some advantages:

Terraform's state format is not (yet?) considered a stable format suitable for consumption by third-party applications, so publishing data via remote state makes it accessible only to Terraform. On the other hand, publishing to a generic store like Consul means that the same data can additionally be used by other systems. With Consul in particular, its companion utility consul-template can be used to create templated configuration files on a server that update automatically as the data evolves in Consul.
The Terraform state data contains lots of other information in addition to the outputs exposed by the terraform_remote_state data source, exposing all of the implementation details of the corresponding module. In some cases this can include secret information such as database passwords and private keys which may be inappropriate to share broadly. By intentionally publishing specific data into a generic data store, a stronger distinction is maintained between information published for general use vs. details that ought to be more tightly controlled.

The terraform_remote_state data source does still have its place as a means to share information between closely-related configurations that form parts of a single subsystem, but it has weaknesses when used to create interfaces between subsystems.

Populating Our Environments

With the environments created, we're ready to move on to the next step of deploying the applications themselves! We'll get into that in the next part.