16 April, 2023
Multi-env AWS with Terraform and Terragrunt revisited
Since publishing the first version of this article, I've built a startup using the same techniques to manage the AWS infrastructure. The fundamental concept has worked really well, but along the way I've made a few improvements to multi-region support as well as reducing duplication in the Terragrunt configuration. It's nothing groundbreaking, but there's enough there to warrant an updated article. I've pushed some new example code to the same repository, and the original code is on a branch called v1.
Multi-region support
Operating across two continents, we needed our product to work across multiple regions from day one. I made a few tweaks to our Terraform stack to support this. Within each environment directory I added directories for each region, while keeping global or non-AWS modules at the top level.
- prod
  - cloudfront
  - datadog
  - eu-west-2
    - s3
  - us-east-1
    - ...
Within each region directory is a region.yaml file that contains the region name:
region: eu-west-2
In base.hcl we add some logic to resolve the current region as a module input:
# Load env vars...
locals {
  env_vars = yamldecode(join("\n", [
    file(find_in_parent_folders("env.yaml")),
    fileexists("${get_terragrunt_dir()}/../region.yaml") ? file("${get_terragrunt_dir()}/../region.yaml") : ""
  ]))
  aws_root_account_id = "<ROOT_ACCOUNT_ID>"
}

# ...and make them available as inputs
inputs = {
  region              = try(local.env_vars.region, "eu-west-2") # Default to eu-west-2
  env                 = local.env_vars.env
  aws_account_id      = local.env_vars.aws_account_id
  aws_root_account_id = local.aws_root_account_id
}
We can now access region as a variable in every Terraform module:
variable "env" {
type = string
}
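As a hypothetical illustration (the bucket naming is mine, not from the real stack), a module could combine the injected env and region inputs for per-region resource names:

# Hypothetical example: per-region naming from the injected inputs
variable "env" {
  type = string
}

resource "aws_s3_bucket" "assets" {
  bucket = "myapp-${var.env}-${var.region}-assets"
}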
And our generated provider block is configured to select the relevant region:
provider "aws" {
region = "${local.env_vars.region}"
profile = "${local.env_vars.env}"
}
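For context, that block is produced by a generate block in base.hcl. A minimal sketch, assuming a file name and overwrite strategy since they aren't shown above:

# In base.hcl: Terragrunt interpolates the locals before writing provider.tf
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region  = "${local.env_vars.region}"
  profile = "${local.env_vars.env}"
}
EOF
}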
Slightly DRYer Terragrunt
Although this is well documented elsewhere, I moved a lot of the common Terragrunt configuration to a separate directory. For example, /env/_common/certs.hcl contains:
terraform {
  source = "../../../../modules//certs"
}

dependency "lets_encrypt" {
  config_path = "../../lets_encrypt"
}

inputs = {
  example_com_acme_certificate = dependency.lets_encrypt.outputs.acme_certificate
}
which has the advantage of standardising dependency and input configuration for cert modules in all regions and environments. /env/<env_name>/<region>/certs/terragrunt.hcl can be reduced to:
include {
  path = find_in_parent_folders("base.hcl")
}

include "certs" {
  path = "${get_terragrunt_dir()}/../../../_common/certs.hcl"
}
while retaining the option to add additional dependencies or inputs per-env as required.
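For example, a single environment could layer an extra input on top of the shared certs configuration. A sketch, with a hypothetical input name:

include {
  path = find_in_parent_folders("base.hcl")
}

include "certs" {
  path = "${get_terragrunt_dir()}/../../../_common/certs.hcl"
}

# Hypothetical per-env override, merged over the inputs from _common/certs.hcl
inputs = {
  extra_subject_alternative_names = ["staging.example.com"]
}

By default Terragrunt shallow-merges the child's inputs over those from the included file, so only the extra key changes.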
Running the stack
I had some really useful feedback from @yb-jmogavero that made me realise I hadn't made clear how this stack is meant to be run. In summary, I've had a go at running this in CI/CD, but at our scale it's quicker (and a bit less scary) for me to make infrastructure changes manually.
As a compromise we run
terragrunt run-all init -upgrade --terragrunt-non-interactive && terragrunt run-all plan --terragrunt-non-interactive
in /env whenever infrastructure changes are present in a pull request.
We use the Terragrunt dependencies block to make explicit the implicit dependencies between Terragrunt modules, e.g. between aws_accounts and iam_roles (sketched below). This ensures that the run-all command won't attempt to plan a module before its dependencies have been executed. You can verify your module dependencies by installing the dot graphing tool and running
terragrunt graph-dependencies | dot -Tsvg > graph.svg
in /env.
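A minimal sketch of one of those explicit dependencies, assuming iam_roles sits alongside aws_accounts:

# env/prod/iam_roles/terragrunt.hcl (paths illustrative)
dependencies {
  paths = ["../aws_accounts"]
}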
It is worth mentioning that at the time of writing run-all will still execute a module even if execution of its dependents fails. This can create a very sticky situation if you're using Terragrunt to delete a stack, as subsequent attempts to delete the upstream module will fail.
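For reference, a full tear-down from /env would be something along these lines, so it's worth confirming the dependency graph is correct before reaching for it:
terragrunt run-all destroy --terragrunt-non-interactive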
This approach has worked reasonably well for us and gives the team automatic feedback if their infrastructure changes fail to plan. It means I only need to review pull requests that are known to plan correctly; I can then apply them and fix any errors that arise at apply time.
Thanks for reading, and I hope it was useful. Submit a PR/issue if you've got any suggestions; I'm always up for a chat and keen to hear what you're building.