
Create an S3-hosted Jekyll website with Terraform

2022-09-21
Mark Hughes

Why I'm here

A year ago I bought my personal domain and I never got around to using it, so when it came up for renewal I decided that I really ought to do something with it. By happy chance, I've been looking at things to do to brush up on certain cloud skills, so I thought "Hey! Why don't I build my personal website in S3?".

There are simpler ways to get a static site hosted. GitHub Pages, for example, works great with Jekyll, abstracts away a lot of the complexity like certificates and routing, and is free, whereas an S3-hosted site can come with charges (so far mine isn't large enough, nor does it generate enough traffic, to incur any, but we'll see). AWS does have some advantages, though: it provides faster responses than GitHub Pages, particularly if you use CloudFront. Unlike GitHub Pages, S3 allows HTTP redirects, and you can use any static site generator, whereas GitHub Pages only builds Jekyll natively. I realise that I've chosen Jekyll anyway, but if I decide I don't like it and want to give something else a try later on (I probably will), I can.

Getting Started

Prerequisites

To get started you'll need the following:

* An AWS account with permissions to create resources in S3, CloudFront, ACM, and Route 53.
* Terraform and the AWS CLI, with your credentials configured.
* A text editor or IDE. I've used VSCode, but far be it from me to tell you what to do here.
* Ruby and Jekyll installed on your computer.
* Ownership of a domain; any provider is fine.

Stage 1: Build your AWS resources

If you're new to Terraform, you might think it's easier to create your resources in the AWS Console, but I'm a bit of an evangelist and I swear that Terraform is the best thing since sliced bread. Since before sliced bread, even. Terraform will manage all your resources for you, allow you to modify them at will, and it makes it much easier to fix silly mistakes like typos in a bucket name.

The first thing to do when getting started with a new Terraform project is to decide how you want the state to be stored. For a one-person project, a local TF state might be sufficient, but most projects aren't one-person: they may have several people working on the code or making changes to the infrastructure, so it's a good habit to keep the state in a remote backend. There are many backends you can use, but I'm going with S3; since I'm already using AWS, it makes sense to have an S3 bucket for my state file.

remote state bucket

main.tf

provider "aws" {
  region = "eu-west-2"
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "mph-terraform-state"

  acl = "private"
  force_destroy = true

}

The provider block specifies that I'm using the AWS provider, and I'm creating my bucket in the eu-west-2 (London) region simply because I'm in the UK. The resource block defines the S3 bucket to create, and I've named the resource terraform_state. The acl is set to private because I don't want the state file to be publicly accessible over the internet. force_destroy is optional, but with it set to true, deleting the bucket deletes any objects in it too, which prevents errors; with it set to false, you're protected from accidentally deleting anything in it. The following command creates the bucket using this configuration:

terraform apply

This will plan out your resources and any changes it has to make, before asking you to confirm you want to apply the plan. When you type 'yes' it will go ahead and create that bucket. When that's done, you'll get a green 'Apply complete!' message, telling you how many resources were added, how many were changed, and how many were destroyed. You can find your bucket in the console, or use the aws s3 ls command to confirm it's there in your account.

Once we have the bucket that our state file is going to be added to, we can now get started on creating our project.

mkdir static-web
cd static-web
code . # I use this command to open VS Code to this project folder

When you work with Terraform, you could use a single file and create everything in it. It would work, but it would be messy and difficult to navigate, so it's best to organise your resources into separate files. When Terraform runs, it stitches all the .tf files in the folder together and treats them as one anyway. The first file we'll work in is called providers.tf
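For reference, this is roughly the layout the project ends up with by the end of this post. The post never fixes a file name for the Route 53 code, so route53.tf is just my assumed choice:

```
static-web/
├── providers.tf
├── variables.tf
├── s3.tf
├── acm.tf
├── cloudfront.tf
├── route53.tf             # assumed name; pick whatever you like
├── templates/
│   ├── s3-www-policy.json
│   └── s3-root-policy.json
└── vars/
    └── variables.tfvars
```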

provider

providers.tf

terraform {
    required_version = "~> 1.0"

    required_providers {
      aws = {
        source = "hashicorp/aws"
        version = "~> 4.0"
      }
    }

    backend "s3" {
        bucket = "mph-terraform-state"
        key = "aws_remote_state_static_web"
        region = "eu-west-2"
    }
}

provider "aws" {
    region = "eu-west-2"
}

provider "aws" {
    alias = "acm_provider"
    region = "us-east-1"
}

The first block, terraform, specifies three things:

* required_version: the earliest version of Terraform that should be used
* required_providers: the plugins needed for this, in our case only the AWS provider
* backend: this is where the state file will be stored

For the backend, I have given it the following attributes:

- bucket: the name of the bucket, which is the one we created in the previous step
- key: this will be the location of the state file within the S3 bucket
- region: this is the AWS region that the bucket's in

You will notice that I've got two provider blocks. This is because I'll be using ACM to manage SSL certificates on CloudFront, and when using ACM with CloudFront, you must request certificates in the us-east-1 region, while all my other resources are in eu-west-2. So I've declared the main provider with the eu-west-2 region, and another provider in us-east-1. When you have two providers of the same type, you must give the second one an alias, or Terraform won't know which one to use.

variables

Next we'll create the variable declarations and definitions. For this we use two files: one to define the variables, and another to set their values.

variables.tf

variable "domain_name" {
  type = string
  description = "The domain name for the website"
}

variable "bucket_name" {
    type = string
    description = "The name of the bucket without the www. prefix. Normally the same as the domain name"
}

variable "common_tags" {
    description = "Common tags you want applied to all components"
}

These are fairly self-explanatory.

vars/variables.tfvars

domain_name = "markhughes.tech"
bucket_name = "markhughes.tech"

common_tags = {
  Project = "markhughes.tech"
}

The first two, domain_name and bucket_name, must be the same value, because the bucket you create to host the site has to be named after your domain. For common_tags you can create any tags you like and give them whatever values you see fit; these are useful in real-world projects, where you can apply cost-centre or team tags.

S3

Now we're getting to the meaty part. We need to create our S3 bucket. In fact, we're going to create two buckets. This is because we'll have our website hosted in a bucket called www.markhughes.tech, but we also want requests to markhughes.tech to be redirected to www.markhughes.tech, so we have a second bucket for that, which is configured as a redirect bucket.

s3.tf

#S3 bucket for website
resource "aws_s3_bucket" "www_bucket" {
    bucket = "www.${var.bucket_name}"
    policy = templatefile("templates/s3-www-policy.json", { bucket = "www.${var.bucket_name}" })

    tags = var.common_tags
}

resource "aws_s3_bucket_website_configuration" "www_config" {
    bucket = aws_s3_bucket.www_bucket.bucket

    index_document {
      suffix = "index.html"
    }

    error_document {
      key = "404.html"
    }
}

resource "aws_s3_bucket_cors_configuration" "www_cors" {
    bucket = aws_s3_bucket.www_bucket.id

    cors_rule {
        allowed_headers = ["*"]
        allowed_methods = ["POST"]
        allowed_origins = ["https://www.${var.domain_name}"]
        max_age_seconds = 3000
    }

    cors_rule {
        allowed_methods = ["GET"]
        allowed_origins = ["*"]
    }
}

resource "aws_s3_bucket_acl" "www_acl" {
    bucket = aws_s3_bucket.www_bucket.id
    acl = "public-read"
}


#S3 bucket for redirecting non-www to www
resource "aws_s3_bucket" "root_bucket" {
    bucket = var.bucket_name
    policy = templatefile("templates/s3-root-policy.json", { bucket = var.bucket_name })

    tags = var.common_tags
}

resource "aws_s3_bucket_website_configuration" "root_config" {
    bucket = aws_s3_bucket.root_bucket.bucket

    redirect_all_requests_to {
      host_name = "www.${var.domain_name}"
      protocol = "https"
    }
}

resource "aws_s3_bucket_acl" "root_acl" {
    bucket = aws_s3_bucket.root_bucket.id
    acl = "public-read"
}

There are a few resource blocks here, I'll go through them one by one.

resource "aws_s3_bucket" "www_bucket" {
    bucket = "www.${var.bucket_name}"
    policy = templatefile("templates/s3-www-policy.json", { bucket = "www.${var.bucket_name}" })

    tags = var.common_tags
}

We've called this resource www_bucket. This is going to be the main bucket, where all our web code is hosted, and we'll reach the site in our browsers at www.markhughes.tech. It has three attributes:

- bucket: this is the name we give to the bucket. It has to match the domain, but note we've put the www in front of the domain name.
- policy: this is an access policy we give to the bucket, since the bucket has to be publicly accessible.
- tags: any tags we want to give to the resource.
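The contents of templates/s3-www-policy.json aren't shown in this post, so here's a sketch of what a typical public-read website bucket policy template could look like. This is my own illustration, not the project's actual file; the ${bucket} placeholder is what templatefile fills in with the value we pass it:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::${bucket}/*"
        }
    ]
}
```

The root bucket's template (templates/s3-root-policy.json) would follow the same shape, just with the non-www bucket name substituted in.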

resource "aws_s3_bucket_website_configuration" "www_config" {
    bucket = aws_s3_bucket.www_bucket.bucket

    index_document {
      suffix = "index.html"
    }

    error_document {
      key = "404.html"
    }
}

This resource is the bucket website configuration that is applied to our bucket.

- bucket: this is the name of the bucket we want to attach this configuration to.
- index_document: this is the home page that will be loaded when you go to the domain in your browser.
- error_document: this will be the document that is presented when you encounter an error, which in most cases will be a 404, so we've called ours 404.html and present a page-not-found message.

resource "aws_s3_bucket_cors_configuration" "www_cors" {
    bucket = aws_s3_bucket.www_bucket.id

    cors_rule {
        allowed_headers = ["*"]
        allowed_methods = ["POST"]
        allowed_origins = ["https://www.${var.domain_name}"]
        max_age_seconds = 3000
    }

    cors_rule {
        allowed_methods = ["GET"]
        allowed_origins = ["*"]
    }
}

This block is where we declare the CORS configuration for our website.

- bucket: as before, the name of the bucket to attach this configuration to.
- cors_rule: the first rule declares where POST requests are allowed from, in this case only from our domain.
- cors_rule: with the second rule we allow GET requests from anywhere on the internet.

Up to 100 CORS rules can be configured in a single configuration, but we only need these two.

resource "aws_s3_bucket_acl" "www_acl" {
    bucket = aws_s3_bucket.www_bucket.id
    acl = "public-read"
}

This is the ACL that gets applied to our bucket.

- bucket: you guessed it...
- acl: the ACL rules that we apply to the bucket.


NOTE: The eagle-eyed among you might notice that this is different to how I created the ACL for the state bucket. That's because I created the state bucket a while ago and have just reused it for this project, and haven't updated it to use this newer, preferred way of configuring the ACL.


resource "aws_s3_bucket" "root_bucket" {
    bucket = var.bucket_name
    policy = templatefile("templates/s3-root-policy.json", { bucket = var.bucket_name })

    tags = var.common_tags
}

This is the second bucket, for the root domain, which we've called root_bucket. The attributes are pretty much the same as the first bucket, except the name doesn't have the 'www' part.

resource "aws_s3_bucket_website_configuration" "root_config" {
    bucket = aws_s3_bucket.root_bucket.bucket

    redirect_all_requests_to {
      host_name = "www.${var.domain_name}"
      protocol = "https"
    }
}

This is the website configuration for the root bucket. Instead of the index and error documents, we just have a different attribute:

- redirect_all_requests_to: this tells it to redirect all requests to the www domain, and to use HTTPS.

resource "aws_s3_bucket_acl" "root_acl" {
    bucket = aws_s3_bucket.root_bucket.id
    acl = "public-read"
}

This ACL is the same as the www one, because we want both buckets to be publicly readable.

ACM

Once we've got the configuration for the S3 buckets, we can move on to the certificates, using ACM.

acm.tf

#SSL Certificate
resource "aws_acm_certificate" "ssl_certificate" {
    provider = aws.acm_provider
    domain_name = var.domain_name
    subject_alternative_names = [ "*.${var.domain_name}" ]
    validation_method = "DNS"

    tags = var.common_tags

    lifecycle {
        create_before_destroy = true
    }
}

resource "aws_route53_record" "cert_validation" {
     for_each = {
        for dvo in aws_acm_certificate.ssl_certificate.domain_validation_options : dvo.domain_name => {
            name   = dvo.resource_record_name
            record = dvo.resource_record_value
            type   = dvo.resource_record_type
        }
     }

    allow_overwrite = true
    name            = each.value.name
    records         = [each.value.record]
    ttl             = "60"
    type            = each.value.type
    zone_id         = aws_route53_zone.main.zone_id
}

resource "aws_acm_certificate_validation" "cert_validation" {
    provider = aws.acm_provider
    certificate_arn = aws_acm_certificate.ssl_certificate.arn
    validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

The first resource is the certificate itself.

resource "aws_acm_certificate" "ssl_certificate" {
    provider = aws.acm_provider
    domain_name = var.domain_name
    subject_alternative_names = [ "*.${var.domain_name}" ]
    validation_method = "DNS"

    tags = var.common_tags

    lifecycle {
        create_before_destroy = true
    }
}

This resource tells ACM to create a certificate, which we call ssl_certificate.

- provider: remember when I said we created a separate provider with an alias, for us-east-1? We're referencing that here, to create this certificate in Virginia instead of London.
- domain_name: this is going to be the certificate's CN (common name).
- subject_alternative_names: this field allows us to declare alternative domains that the certificate can work with. Using *.${var.domain_name} allows us to use this certificate with any and all subdomains.
- validation_method: this is important. AWS needs you to prove you own the domain before it can issue certificates to you, and this is how it does that. You can use either DNS or email validation. DNS is more automated, so it's better, but it's not always possible with some providers, so email is an option. We'll come back to this later.
- tags: you know what these are for.
- lifecycle: with the create_before_destroy argument here, if Terraform has to change this resource, it creates the new one before destroying the old one.

resource "aws_route53_record" "cert_validation" {
     for_each = {
        for dvo in aws_acm_certificate.ssl_certificate.domain_validation_options : dvo.domain_name => {
            name   = dvo.resource_record_name
            record = dvo.resource_record_value
            type   = dvo.resource_record_type
        }
     }

    allow_overwrite = true
    name            = each.value.name
    records         = [each.value.record]
    ttl             = "60"
    type            = each.value.type
    zone_id         = aws_route53_zone.main.zone_id
}

This creates a Route 53 record (or multiple, in some cases) that is used as part of the DNS validation process. The for_each loops over the domain_validation_options that ACM generates for the certificate and creates a CNAME record for each one; these records are the proof of ownership that ACM checks for.

resource "aws_acm_certificate_validation" "cert_validation" {
    provider = aws.acm_provider
    certificate_arn = aws_acm_certificate.ssl_certificate.arn
    validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

This is the resource that uses the created CNAME records to validate the certificate, and when Terraform is running, this bit will wait for you to update your DNS records before doing its thing.

- provider: this also uses the Virginia AWS provider.
- certificate_arn: this refers to the certificate block we created above, telling the validation which certificate it needs to validate.
- validation_record_fqdns: the fully qualified domain names of the Route 53 records created above. I have to admit I don't fully know how it works, but I think the validation polls those records until it gets the expected CNAME values back from your domain.

Cloudfront

Next we've got the CloudFront part. This is where we'll create the CloudFront distributions that will sit in front of our S3 buckets, which has two key benefits. First, S3 website endpoints on their own only serve HTTP, not HTTPS; CloudFront provides a way to use SSL with an S3-hosted site. Second, CloudFront caches your content in multiple regions across the globe, reducing latency for users.

cloudfront.tf

#Cloudfront distribution for main site
resource "aws_cloudfront_distribution" "www_s3_distribution" {
    origin {
        domain_name = "www.markhughes.tech.s3-website.eu-west-2.amazonaws.com"
        origin_id = "S3-www.${var.bucket_name}"

        custom_origin_config {
            http_port = 80
            https_port = 443
            origin_protocol_policy = "http-only"
            origin_ssl_protocols = ["TLSv1", "TLSv1.1", "TLSv1.2"]
        }
    }

    enabled = true
    is_ipv6_enabled = true
    default_root_object = "index.html"

    aliases = ["www.${var.domain_name}"]

    custom_error_response {
        error_caching_min_ttl = 0
        error_code = 404
        response_code = 200
        response_page_path = "/404.html"
    }

    default_cache_behavior {
        allowed_methods = ["GET", "HEAD"]
        cached_methods = ["GET", "HEAD"]
        target_origin_id = "S3-www.${var.bucket_name}"

        forwarded_values {
            query_string = false
            cookies {
                forward = "none"
            }
        }

      viewer_protocol_policy = "redirect-to-https"
      min_ttl = 31536000
      default_ttl = 31536000
      max_ttl = 31536000
      compress = true
    }

    restrictions {
        geo_restriction {
            restriction_type = "none"
        }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }

  tags = var.common_tags
}

# Cloudfront S3 for redirect to www.
resource "aws_cloudfront_distribution" "root_s3_distribution" {
  origin {
    domain_name = "markhughes.tech.s3-website.eu-west-2.amazonaws.com"
    origin_id = "S3-.${var.bucket_name}"
    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }

  enabled = true
  is_ipv6_enabled = true

  aliases = [var.domain_name]

  default_cache_behavior {
    allowed_methods = ["GET", "HEAD"]
    cached_methods = ["GET", "HEAD"]
    target_origin_id = "S3-.${var.bucket_name}"

    forwarded_values {
      query_string = true

      cookies {
        forward = "none"
      }

      headers = ["Origin"]
    }

    viewer_protocol_policy = "allow-all"
    min_ttl = 0
    default_ttl = 86400
    max_ttl = 31536000
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }

  tags = var.common_tags
}

This file has two resources, a cloudfront distribution for each of our S3 buckets.

resource "aws_cloudfront_distribution" "www_s3_distribution" {
    origin {
        domain_name = "www.markhughes.tech.s3-website.eu-west-2.amazonaws.com"
        origin_id = "S3-www.${var.bucket_name}"

        custom_origin_config {
            http_port = 80
            https_port = 443
            origin_protocol_policy = "http-only"
            origin_ssl_protocols = ["TLSv1", "TLSv1.1", "TLSv1.2"]
        }
    }

    enabled = true
    is_ipv6_enabled = true
    default_root_object = "index.html"

    aliases = ["www.${var.domain_name}"]

    custom_error_response {
        error_caching_min_ttl = 0
        error_code = 404
        response_code = 200
        response_page_path = "/404.html"
    }

    default_cache_behavior {
        allowed_methods = ["GET", "HEAD"]
        cached_methods = ["GET", "HEAD"]
        target_origin_id = "S3-www.${var.bucket_name}"

        forwarded_values {
            query_string = false
            cookies {
                forward = "none"
            }
        }

      viewer_protocol_policy = "redirect-to-https"
      min_ttl = 31536000
      default_ttl = 31536000
      max_ttl = 31536000
      compress = true
    }

    restrictions {
        geo_restriction {
            restriction_type = "none"
        }
  }

  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate_validation.cert_validation.certificate_arn
    ssl_support_method = "sni-only"
    minimum_protocol_version = "TLSv1.1_2016"
  }

  tags = var.common_tags
}

This is a big block, and a lot of the attributes are self-explanatory so I won't explain all of them. They're also mostly the same for both distributions, so I'll just go over the first. The key ones are as follows:

- origin: this specifies the domain name and ID of the origin server, which are the website endpoint of the www S3 bucket and a name for it, respectively.
- enabled: whether the distribution is enabled to accept end-user requests for content.
- default_root_object: the object you want CloudFront to return when a user requests the root URL.
- viewer_protocol_policy: this specifies the protocols that users can use to access the site. redirect-to-https effectively redirects HTTP to HTTPS.
- restrictions: if you wanted to prevent users in a certain region from accessing your site, you could add restrictions here. I've gone with none.
- viewer_certificate: this specifies the certificate you want to use with the distribution; we've specified the ARN of the certificate we created in acm.tf.

Route 53

The last bit of Terraform code we have to create is for Route 53. This simply creates a hosted zone for your domain, and within it creates records for your www and root domains, pointing them to their respective CloudFront distributions.

resource "aws_route53_zone" "main" {
    name = var.domain_name
    tags = var.common_tags
}

resource "aws_route53_record" "root-a" {
    zone_id = aws_route53_zone.main.zone_id
    name = var.domain_name
    type = "A"

    alias {
        name = aws_cloudfront_distribution.root_s3_distribution.domain_name
        zone_id = aws_cloudfront_distribution.root_s3_distribution.hosted_zone_id
        evaluate_target_health = false
    }
}

resource "aws_route53_record" "www-a" {
    zone_id = aws_route53_zone.main.zone_id
    name = "www.${var.domain_name}"
    type = "A"

    alias {
        name = aws_cloudfront_distribution.www_s3_distribution.domain_name
        zone_id = aws_cloudfront_distribution.www_s3_distribution.hosted_zone_id
        evaluate_target_health = false
    }
}

The main zone just needs the domain name. The individual records are A records with aliases pointing to the CloudFront distribution domain names.

To build all of this we first need to initialise terraform, then plan and apply the configuration.

terraform init

terraform plan --var-file vars/variables.tfvars

terraform apply --var-file vars/variables.tfvars

terraform init will install our providers and any plugins we're using

terraform plan --var-file vars/variables.tfvars will show you all the infrastructure that will be created or changed; in this case nothing exists yet, so it will all be created.

terraform apply --var-file vars/variables.tfvars will show you all the changes, ask you to confirm, then will begin creating it all. In the terminal you'll see it running.

While it's running, if you navigate to ACM in the console and look for your domains, you'll see two entries: your domain, plus the domain with the wildcard. They both have the same CNAME name and value. You'll need to copy it and add it to your DNS provider's records, because the validation step will be waiting for this. Once you've added it, leave Terraform running, because validation takes some time, around 30-35 minutes.
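For illustration, the validation record takes roughly this shape. The hashes here are made-up placeholders; copy the real name and value from the ACM console:

```
Name:  _3c5a1example0000.markhughes.tech.
Type:  CNAME
Value: _9f8e7example1111.acm-validations.aws.
```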

You'll also need to point your domain name servers to the AWS servers. These will be in the Route 53 console.
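If you'd rather not dig through the console for the name servers, Terraform can print them for you after apply. This output block is my own addition, not part of the configuration above, and outputs.tf is just my assumed file name:

```
# outputs.tf (assumed name) - print the hosted zone's name servers after apply
output "name_servers" {
  description = "Point your registrar's NS records at these"
  value       = aws_route53_zone.main.name_servers
}
```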

Once everything is up and running, you can go to your domain but will get an error, because we have the infrastructure up and running, but we haven't created the site yet.

Stage 2: Create the website

Run the following commands to create a new Jekyll project and upload the generated static HTML files to the S3 bucket, but make sure to replace my bucket name with yours.

jekyll new www
cd www
bundle exec jekyll build # generates the _site folder that we sync below
aws s3 sync _site s3://www.markhughes.tech --delete

This uploads a template Jekyll website to your S3 bucket, which you'll then be able to access from your domain. The --delete flag removes any files that exist in the bucket but not in your local _site folder.

It is possible to have Terraform upload your website files too, so you don't also have to run the aws s3 sync command, but I haven't done it that way before. Maybe sometime in the future I'll try it and update this post.
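If you wanted to try that approach, a sketch of it would use fileset to enumerate the built site and create an aws_s3_object per file. This is my own illustration, not something from this project, and it glosses over content types, which you'd need to map per file extension so browsers render pages instead of downloading them:

```
# Hypothetical: upload every file in Jekyll's _site folder to the www bucket.
resource "aws_s3_object" "site_files" {
  for_each = fileset("www/_site", "**")

  bucket = aws_s3_bucket.www_bucket.id
  key    = each.value
  source = "www/_site/${each.value}"

  # Re-upload whenever the file content changes
  etag = filemd5("www/_site/${each.value}")

  # NOTE: a real version also needs a content_type looked up from the
  # file extension (e.g. .html -> text/html, .css -> text/css)
}
```

The trade-off is that Terraform then tracks every page of your site as state, so a plain aws s3 sync after each build is arguably the simpler workflow.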