19 October, 2021

Multi-env AWS with Terraform and Terragrunt in one hour

tl;dr This guide is for people who want to get started with AWS and Terraform, who want to do it properly and securely, and who want to get building FAST.

In 2020 I left my permanent job and started Merge Mamba (now shut down). I made a lot of mistakes on the way (blog post to follow) and one of the big ones was spending too long tweaking the tech stack, and not enough time actually figuring out if I had a viable product.

My investment in tech has left me with a solid base from which I can iterate new ideas really fast. Much of that is down to the time I spent getting Terraform to play nicely with a multi-account AWS setup. Over the past few years I've seen a lot of different Terraform setups in everything from startups to large financial institutions and if you:

  • Are an AWS user
  • Are a Terraform user (you should be)
  • Need to build FAST but securely, and in a way that'll make future hires/devops happy

...the setup I'm describing here will be useful. It's nothing groundbreaking and there may be better ways to do it. All I've done is read the docs (pretty much everything in here is lifted from other tutorials) and figure out the slightly less obvious stuff, but hopefully it'll get you going with a multi-env AWS infrastructure stack in less than an hour.

Prerequisites

  • Install Terraform
  • Install Terragrunt
  • Have an AWS account for each environment and one root account.
  • Have admin access to AWS (for bootstrapping)

The Example Repo

Clone or fork the example repo. The example repo is split into two sections. All .tf files are in the modules directory and the supporting Terragrunt code is within the env directory. Placeholders are of the form <YOUR_.*> and indicate where you need to fill values in. There are a few READMEs lying around in the repo that may also be useful.

Terraform structure

Terragrunt is a wrapper around Terraform that adds some functionality to make it easier to write modular, parsimonious Terraform. See the docs for more but at a high level:

All common Terraform configuration (e.g. providers, common vars) is defined in env/terragrunt.hcl and ensures commonality between your environments.

Each subdirectory of env/ represents a different AWS account and there's an env.yaml file with per-env configuration in each one that is made available to your terraform. Edit env/<env_name>/env.yaml to include the correct AWS account id and a name for your environment. Ensure these correspond to the AWS accounts you created above.
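For illustration, an env.yaml might look something like this (the exact keys depend on the example repo, so treat the field names here as assumptions):

```yaml
# env/dev/env.yaml -- per-environment configuration (field names assumed)
env: dev                         # environment name, used e.g. for profile selection
aws_account_id: "111111111111"   # placeholder: this environment's AWS account id
```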

Each subdirectory in /modules is a Terraform stack that can be applied to any environment that has a corresponding directory in /env/<env_name>/. You'll notice that /env/root/iam_users exists but /env/prod/iam_users does not, because we want to apply this module only in the root account. In this way you can restrict resources to certain environments at the module level, as well as at the resource level by switching on the environment name that's made available to your Terraform code via the inputs {} block in the top-level terragrunt.hcl file. You can introduce dependencies between these modules if you need to pass data between them (see the Terragrunt docs for more).
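As a sketch of how that wiring works (names and paths are assumptions; check the repo for the real code), the top-level terragrunt.hcl can load each environment's env.yaml and pass its values into every stack:

```hcl
# env/terragrunt.hcl -- sketch only; file layout and key names are assumptions
locals {
  # terragrunt runs in env/<env_name>/<stack>/, so env.yaml is in a parent folder
  env_vars = yamldecode(file(find_in_parent_folders("env.yaml")))
}

inputs = {
  env            = local.env_vars.env
  aws_account_id = local.env_vars.aws_account_id
}
```

Every module then receives `env` as a variable and can branch on it per resource.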

env \               <- each env has a directory here
    - root
    - prod
    - dev
    
modules \           <- each tf stack has a directory in here
    - users
    - roles
    - product_a
    - product_b

Environments

This guide assumes that you're using dedicated subdomains for each non-prod environment. For example:

  • (*.)henrycourse.com refers to my production environment
  • (*.)<env>.henrycourse.com refers to my environment <env>

You don't have to do it like this, but it is an assumption of this guide.

AWS Accounts And Users

In this guide I'm going to assume that you have two environments: prod and dev. With this setup, it's trivial to add new environments (there is zero repetition of terraform code, just add a new subdirectory in the env dir) and override per-environment configuration as required.
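For example, adding a hypothetical staging environment is just a directory and a config file (the account id here is a placeholder, and the env.yaml keys are assumptions):

```shell
# Sketch: add a new "staging" environment alongside prod and dev
mkdir -p env/staging
cat > env/staging/env.yaml <<EOF
env: staging
aws_account_id: "123456789012"  # placeholder: the new account's id
EOF
```

You'd then add a subdirectory under env/staging for each module stack that environment should run.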

Each environment has its own AWS account that contains all of its resources and roles required to access them. This strong separation is great for ops and security.

All AWS users live in the root account and access environments by assuming roles in those environments. This makes securing access to environments very straightforward and means that user management is reasonably centralised.

Create AWS accounts for each environment that you need.

AWS auth

I'm assuming for this guide that your default AWS profile points to a user or role in the root account with admin permissions in all the accounts you own. Once you've created users and roles in the steps below, you can stop using the root user and switch to a profile-per-env setup like this:

[default]
region = eu-west-2
output = json

[profile prod]
role_arn = arn:aws:iam::...
source_profile = default

[profile dev]
role_arn = arn:aws:iam::...
source_profile = default

If that's not the case you'll need to edit the provider block in env/terragrunt.hcl to change the profile selector:

provider "aws" {
  region  = "<YOUR_HOME_REGION>"
  profile = "${local.env_vars.env == "root" ? "default" : local.env_vars.env}"
}

provider "aws" {
  alias   = "useast"
  region  = "us-east-1"
  profile = "${local.env_vars.env == "root" ? "default" : local.env_vars.env}"
}

You don't need to have this sorted now if you're running Terraform as an AWS root/admin user, but once you've created roles in the next steps you should switch over to the profile-per-env setup above.

Creating a state store

Step one involves creating an s3 bucket to hold remote Terraform state. I've chosen to have a single bucket in the root account that holds the state for all envs under separate keys. With some minor tweaking you could use one bucket per environment.

To create the state bucket:

  1. In the env/root/tf_state_s3 directory, run terragrunt apply to create the state bucket using local state, check the diff and type yes to confirm.
  2. Uncomment the terraform {} block in modules/tf_state_s3/tf_state_s3.tf.
  3. Uncomment the remote_state {} block in env/terragrunt.hcl and add the name of your state bucket.
  4. In the env/root/tf_state_s3 directory, run terragrunt apply and type yes when prompted to copy the local state to the s3 bucket.

The s3 bucket is now ready to go as a remote state store.
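For reference, the remote_state block you uncomment looks roughly like this (bucket name and key layout are assumptions; check env/terragrunt.hcl in the repo for the real values):

```hcl
# env/terragrunt.hcl -- sketch of the remote state config (values assumed)
remote_state {
  backend = "s3"
  config = {
    bucket  = "<YOUR_STATE_BUCKET>"
    key     = "${path_relative_to_include()}/terraform.tfstate"  # one key per env/stack
    region  = "<YOUR_HOME_REGION>"
    encrypt = true
  }
}
```

Keying state by `path_relative_to_include()` is what lets a single bucket hold every environment's state under separate keys.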

Creating users

Run terragrunt apply in the env/root/iam_users directory to create a user called terraform in the root account. Add as many users as you need, or replace this step with your chosen auth provider.

Creating roles

We'll now create a role called terraform in the prod and dev accounts. Run terragrunt apply in the env/<env>/iam_roles directories to do this. N.B. This creates a role with admin privileges in those accounts that is assumable by the terraform user. Notice the code in /env/prod/iam_roles/terragrunt.hcl that adds a dependency on the user stack in the root account:

dependency "iam_users" {
  config_path = "../../root/iam_users"
}

inputs = {
  terraform_user_arn = dependency.iam_users.outputs.terraform_user_arn
}

This is an example of Terragrunt's module syntax and lets you keep separation between resource stacks in different accounts while explicitly structuring the dependencies between them.
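For a dependency like this to work, the upstream stack has to expose the value as a Terraform output. In the iam_users module that would be something like the following (the output name matches the snippet above; the resource name is an assumption):

```hcl
# modules/iam_users/outputs.tf -- sketch; the resource name is an assumption
output "terraform_user_arn" {
  value = aws_iam_user.terraform.arn
}
```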

...and that's it

At this point you have a multi-account Terraform setup with users defined in the root account, the ability to control access to environment accounts, and the ability to create environments as required. Your Terraform can be split into self-contained stacks, and you can easily control what gets deployed where and the dependencies between them (including across account boundaries).

Extra reading

DNS and certificates took me a little while to figure out but if you need them, you might find this section useful. Code is provided in the repo.

Route53

There are no zones defined in the root account; each environment defines its own zones. This is nice because, once again, it keeps DNS record management separated per environment and allows for easy access control. One thing that complicates this slightly is that you need to provide the subdomain nameservers in a new NS record in the top-level zone (which in this setup lives in the prod environment). See here for more info.

To do that with Terragrunt you can output the subdomain nameservers and provide them as input to the prod/route53 stack where you can create the required NS records.
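A sketch of that wiring (zone resource names, the subdomain, and variable names are all assumptions):

```hcl
# In the env's route53 stack -- expose the subdomain zone's nameservers
output "zone_name_servers" {
  value = aws_route53_zone.env.name_servers
}

# In the prod route53 stack -- delegate the subdomain from the top-level zone
resource "aws_route53_record" "dev_ns" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "dev.henrycourse.com"
  type    = "NS"
  ttl     = 300
  records = var.dev_zone_name_servers  # passed in via a Terragrunt dependency
}
```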

Letsencrypt certificates

If you're a Let's Encrypt user (you should be) you might find this certificate setup useful. If you're using Route53 DNS, Terraform can automatically update and store LE certificates in ACM. This makes it very easy to generate, roll and deploy certificates to anywhere in AWS.

The ACME Terraform provider needs access to Route53 to do this and if you want to use your certs in Cloudfront, you'll need to create a duplicate cert in the us-east-1 region. To see this all in action, check out the terraform in modules/certs/certs.tf. It'll create certs in your home region and us-east-1 and has the right lifecycle configuration to allow rolling of certificates when they're in use. This is still a bit of a pain though, check the README in the directory for more.
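At a high level the pattern looks something like this (a simplified sketch, not the repo's actual code; provider configuration, account registration, and the us-east-1 duplicate are omitted):

```hcl
# Sketch: issue a Let's Encrypt cert via DNS-01 on Route53, then import it into ACM
resource "acme_certificate" "cert" {
  account_key_pem           = acme_registration.reg.account_key_pem
  common_name               = "henrycourse.com"
  subject_alternative_names = ["*.henrycourse.com"]

  dns_challenge {
    provider = "route53"   # the ACME provider answers the challenge in Route53
  }
}

resource "aws_acm_certificate" "cert" {
  private_key       = acme_certificate.cert.private_key_pem
  certificate_body  = acme_certificate.cert.certificate_pem
  certificate_chain = acme_certificate.cert.issuer_pem
}
```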

© 2023 Henry Course