This blog: A serverless experiment

I’ve been playing with AWS for a few years, on and off. I’ve used many of the individual pieces that are available - but I haven’t yet tried to combine a larger set of features into one thing. Until now. This blog.

This blog is powered by Jekyll. Jekyll builds your blog from markdown files into a static tree that you can serve from any old webserver. With no server-side code and a pretty small footprint, it would have been a pretty good fit for the small web hotel that was included in my Internet subscription back in 1998 (10MB for free!). Jekyll is used by GitHub Pages and my day-job blogs (1, 2), among other things.

After much trial and error, I ended up with a CloudFormation template that assembles the stack of resources used to deliver this site. The template can be invoked again to create another stack. I have kept two resources outside the template: the SSL wildcard certificate for my domain, and the Route 53 hosted DNS zone.

The stack

  • S3 website bucket
  • CodeCommit repository
  • Lambda function
  • CloudFront distribution
  • Route 53 recordsets (IPv4 and IPv6)
  • IAM Group
  • Two IAM Roles
  • Instance profile
  • An extra S3 bucket, CloudFront distribution and Route 53 recordsets for redirecting the www. prefix to the proper URL

So a decent shopping list.

The bucket

A simple S3 bucket, configured as a website bucket, using index.html as the index page and 404.html as the error page. This bucket can be addressed directly over HTTP, and you could get by with just a bucket and DNS if HTTP is enough. Adding CloudFront on top does, however, provide the usual benefits of a CDN, plus SSL and HTTP/2 support - and, last but not least, lower bandwidth costs for edge traffic in Europe/America than serving it straight out of an AWS region.
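A rough sketch of how the bucket can be declared in the template is shown below. The logical name WebsiteBucket is a placeholder of mine; Hostname is the template parameter that holds the site’s domain name, which doubles as the bucket name. The objects also have to be publicly readable (via ACLs or a bucket policy) for the website endpoint to serve them.

  WebsiteBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName:
        Ref: Hostname
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: 404.html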

The git-repository

CodeCommit lets you create git repositories. You can push code to them, branch, create merge requests - the usual git stuff. A repository can have triggers, which point at either a Lambda function or an SNS topic. I want the site to be built when I push to the repository, so the trigger calls a Lambda function.
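Roughly how the repository and its trigger can be declared - the logical names (Repo, PushTriggerFunction) are placeholders of mine, and a matching AWS::Lambda::Permission resource is also needed so that CodeCommit is allowed to invoke the function:

  Repo:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryName:
        Ref: RepoName
      Triggers:
        - Name: push-trigger
          DestinationArn: !GetAtt PushTriggerFunction.Arn
          # fire on every push, to any branch
          Events:
            - all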

The pushTrigger-function

Lambda lets you run a piece of code without deploying it to an instance. In this case we want the function to spawn an instance that sets up an environment for building the site - clone the repository, run a script, then shut down (and terminate itself). The following Python code does the trick.

import os
import boto3

def pushHandler(event, context):
    # Spawn a short-lived builder instance in the same region as the stack
    client = boto3.client('ec2', region_name=os.environ.get('REGION'))
    res = client.run_instances(ImageId='ami-5e29aa31',
                               InstanceType='m3.medium',
                               MinCount=1, MaxCount=1,
                               # 'terminate' means the poweroff at the end of the
                               # user data script also terminates the instance
                               InstanceInitiatedShutdownBehavior='terminate',
                               IamInstanceProfile={"Arn": os.environ.get('INSTANCEPROFILE')},
                               UserData="""#!/bin/bash
    # Build environment: aws-cli, git, and a ruby toolchain via RVM
    yum install aws-cli git -y
    yum install -y curl gpg gcc gcc-c++ make
    gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
    curl -sSL https://get.rvm.io | bash -s stable
    usermod -a -G rvm ec2-user
    cd ~ec2-user
    # Authenticate git against CodeCommit using the instance role's credentials
    sudo -u ec2-user git config --global credential.helper '!aws codecommit credential-helper $@'
    sudo -u ec2-user git config --global credential.UseHttpPath true
    sudo -u ec2-user git clone https://git-codecommit."""+os.environ.get('REGION')+""".amazonaws.com/v1/repos/"""+os.environ.get('REPONAME')+"""
    cd ~ec2-user/"""+os.environ.get('REPONAME')+"""
    ls -la
    # sync.bash (in the repository) builds the site and pushes it to the bucket
    sudo -u ec2-user bash sync.bash
    # Keep the build log around for debugging, then shut down (= terminate)
    aws s3 cp /var/log/cloud-init-output.log s3://"""+os.environ.get('TARGETBUCKET')+"""/output-last.txt
    poweroff
    """)
    if len(res['Instances']) == 1:
        return {"message": "success"}
    else:
        raise Exception('I failed!')

It starts an instance and passes along a bash script that is run during the instance’s first boot. “But where are the credentials?” you might ask. Let’s talk about roles.

The pushTrigger-role

IAM policies and roles may be my favourite feature of AWS. Not because dealing with permissions is in any way fun - it isn’t - but because roles mean I don’t have to hand out long-lived access keys. The AWS SDK looks for credentials in several places: it will use access keys passed explicitly to the client, environment variables if they are set, and finally it will query the instance metadata API and use whatever that provides. When a resource has a role attached to it, the SDK finds the access keys on its own. If those keys are compromised, the damage is limited - the keys are only valid for a limited time, and only when used from a certain location.

  PushTriggerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName:
            Fn::Join:
            - "-"
            - - Ref: AWS::StackName
              - "PushTriggerPolicy"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Action:
                - iam:PassRole
                - ec2:RunInstances
              Resource:
                - !GetAtt UpdateBucketRole.Arn
                - "arn:aws:ec2:*:*:subnet/*"
                - "arn:aws:ec2:*::image/ami-5e29aa31"
                - "arn:aws:ec2:*:*:instance/*"
                - "arn:aws:ec2:*:*:volume/*"
                - "arn:aws:ec2:*:*:security-group/*"
                - "arn:aws:ec2:*:*:network-interface/*"

The Lambda function is allowed to launch an EC2 instance from one specific image, and to pass one specific IAM role along to that instance. The resource section can be stripped down further, to only allow the instance to live in a certain subnet and use a specific security group.
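For completeness, here is a sketch of how the Lambda function itself can be wired up with this role and the environment variables the Python code reads. The runtime, handler name and logical names are assumptions of mine:

  PushTriggerFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.6
      Handler: index.pushHandler
      Timeout: 30
      Role: !GetAtt PushTriggerRole.Arn
      Environment:
        Variables:
          REGION:
            Ref: AWS::Region
          REPONAME:
            Ref: RepoName
          TARGETBUCKET:
            Ref: Hostname
          INSTANCEPROFILE: !GetAtt UpdateBucketProfile.Arn
      Code:
        ZipFile: |
          # the pushHandler code shown earlier goes here
          # (or point Code at a zip file in S3 instead)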

The UpdateBucket-role

In the pushTrigger-role above, a role is passed on to the instance. This role provides the access the EC2 instance needs to deploy the site: it must be able to clone the CodeCommit repository, and it must be able to write data to the S3 bucket. The policy document is set up much the same way - a list of actions and resources. One difference between a role for a Lambda function and a role for an EC2 instance is that the instance also needs an instance profile resource.

  UpdateBucketProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Roles:
      - Ref: UpdateBucketRole
  UpdateBucketRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
              - ec2.amazonaws.com
          Action:
            - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName:
            Fn::Join:
            - "-"
            - - Ref: AWS::StackName
              - "UpdateBucketPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
            - Effect: Allow
              Action:
                - codecommit:GetRepositoryTriggers
                - s3:*
                - codecommit:GetTree
                - codecommit:GitPull
                - codecommit:BatchGetRepositories
                - codecommit:GetObjectIdentifier
                - codecommit:GetBlob
                - codecommit:GetReferences
                - codecommit:CancelUploadArchive
                - codecommit:GetCommit
                - codecommit:GetUploadArchiveStatus
                - codecommit:GetCommitHistory
                - codecommit:GetRepository
                - codecommit:GetBranch
              Resource:
                - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: Hostname
                - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: Hostname
                    - "/*"
                - Fn::Join:
                  - ":"
                  - - "arn:aws:codecommit"
                    - Ref: AWS::Region
                    - Ref: AWS::AccountId
                    - Ref: RepoName

The policy gives full access to the S3 bucket, and full read access to the CodeCommit repository. The list of CodeCommit actions could probably be trimmed further, and the ability to delete the S3 bucket could be removed, to restrict the role to exactly what it needs.

The IAM Group

When creating all these resources as part of a stack, it’s nice to ask “what kind of access is needed to work on this solution?” - and create a set of groups that provide exactly that access. That way, you can create separate users in your AWS account that only have access to what they need. In this case, the stack contains a group with access to the CodeCommit repository and to the S3 bucket. The latter is mostly for debugging and convenience.
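A sketch of such a group, granting the CodeCommit and S3 access described above - the logical name is mine, and the wide-open actions can be narrowed down the same way as in the roles:

  ContributorGroup:
    Type: AWS::IAM::Group
    Properties:
      Policies:
        - PolicyName:
            Fn::Join:
            - "-"
            - - Ref: AWS::StackName
              - "ContributorPolicy"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Action:
                - codecommit:*
                - s3:*
              Resource:
                - Fn::Join:
                  - ":"
                  - - "arn:aws:codecommit"
                    - Ref: AWS::Region
                    - Ref: AWS::AccountId
                    - Ref: RepoName
                - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: Hostname
                - Fn::Join:
                  - ""
                  - - "arn:aws:s3:::"
                    - Ref: Hostname
                    - "/*"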

We’ve covered the bucket that contains the site, the repository, the push trigger, and the role-based access that connects these things. But the site still isn’t reachable. Let’s talk about CloudFront (CDN) and Route 53 (DNS).

The CloudFront Distribution

We set up a distribution that uses the SSL certificate I had already created in AWS Certificate Manager, limit it to SNI, and turn on HTTP/2 and IPv6 (I can’t really see any reason not to). Since the origin is an S3 bucket, we strip query strings and cookies from all requests, letting the cache be as efficient as it can be. All HTTP requests are redirected to HTTPS.

A quirk here is that the origin is the website endpoint of the bucket. You can use the bucket itself as the origin - but in that configuration, index documents only work at the top level. Since Jekyll generates pages in subdirectories and relies on the index document being resolved there, we need to use the website endpoint.
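A condensed sketch of the distribution - the CertificateArn parameter is my placeholder for the existing wildcard certificate, and the exact website-endpoint hostname format varies slightly between regions:

  Distribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        HttpVersion: http2
        IPV6Enabled: true
        Aliases:
          - Ref: Hostname
        ViewerCertificate:
          AcmCertificateArn:
            Ref: CertificateArn
          SslSupportMethod: sni-only
        Origins:
          - Id: s3-website
            # the website endpoint of the bucket, so index.html resolves in subdirectories
            DomainName:
              Fn::Join:
              - ""
              - - Ref: Hostname
                - ".s3-website-"
                - Ref: AWS::Region
                - ".amazonaws.com"
            CustomOriginConfig:
              # the website endpoint only speaks plain HTTP
              OriginProtocolPolicy: http-only
        DefaultCacheBehavior:
          TargetOriginId: s3-website
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: false
            Cookies:
              Forward: none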

CloudFront supports invalidation requests for the cache, and if it weren’t for the fact that you currently cannot grant access to create invalidation requests for a single distribution - it’s either all distributions or none - I would let the EC2 instance invalidate the cache on deployment as a convenience.

The Route 53 Recordsets

We create two recordsets in our hosted zone: one A record and one AAAA record. Both are alias targets pointing at the DNS name of the CloudFront distribution. Alias targets work around the limitation that you can’t use a CNAME at the top (apex) of a zone. This means that trygvevea.com can be an alias for a CloudFront distribution - which will move around, depending on what Amazon wants to do.
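A sketch of the two recordsets. HostedZoneId here is a parameter pointing at my existing zone, and Z2FDTNDATAQYW2 is the fixed hosted zone ID that all CloudFront alias targets use:

  DnsRecords:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      HostedZoneId:
        Ref: HostedZoneId
      RecordSets:
        - Name:
            Ref: Hostname
          Type: A
          AliasTarget:
            DNSName: !GetAtt Distribution.DomainName
            HostedZoneId: Z2FDTNDATAQYW2
        - Name:
            Ref: Hostname
          Type: AAAA
          AliasTarget:
            DNSName: !GetAtt Distribution.DomainName
            HostedZoneId: Z2FDTNDATAQYW2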

The redirect-resources

We have another S3 bucket, another CloudFront distribution and two more Route 53 recordsets. These handle the redirect from www.trygvevea.com to trygvevea.com. The redirect itself is performed by the S3 bucket; the CloudFront distribution terminates SSL and simply relays the redirect configured in the bucket, and the recordsets alias CloudFront. None of this is necessary if you accept that the www. prefix would be a dead page.
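The redirect bucket itself boils down to a website configuration that redirects everything; the second distribution and recordsets mirror the ones above, just with the www. name:

  RedirectBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName:
        Fn::Join:
        - ""
        - - "www."
          - Ref: Hostname
      WebsiteConfiguration:
        RedirectAllRequestsTo:
          HostName:
            Ref: Hostname
          Protocol: https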

The end result

Drawing of the flow when updating or visiting the site

Is this over-engineered?

Yes. No. Maybe. Depends.

For a simple blog? Yes. As an exercise to get hands-on experience using the AWS toolset to create a world-class, scalable stack for delivering an application? Not at all.

Back in 2012, I created a small web-based game called Grabitty LITE (I am still working on Grabitty - which I hope to release someday). It was served from a small web server. The game was covered by a live streamer with about 3000 viewers, who all wanted to play it at once - which saturated the bandwidth of my web server. After that, I moved the game over to an S3 website bucket - and never had the problem again.

This skill set, and this way of thinking about and solving problems, serves me well - and will keep doing so.

Architecture Security

The exposed attack surface is very limited. There’s basically no software that can become compromised - and if I were to leave this blog untouched for a few years, it would still be there just as I left it. The only real attack vector is the credentials of a user with access to update the code, and that attack becomes significantly harder to pull off with multi-factor authentication enabled for that user.

There are some weaknesses around the instance: it sets up everything from scratch on every run, and pipes a URL straight into bash as root. That is probably the worst part of this design. It can be remedied by preparing an AMI in advance with the tools you need - then you don’t have to worry about attacks coming through upstream code updates.

Costs?

It’s hard to tell what this costs. My experience with AWS is that “the first shot is free” is a quite accurate description. However, the design I’ve described above limits the potential cost pretty much to bandwidth - most other things are either really cheap, or within the free tier.

  • CodeCommit: Very unlikely that I will exceed the never-expiring free tier. (5 users, 50GB storage, ~10000 “git requests” per month)
  • S3 storage: The blog is only a few MB, and it’s unlikely to ever exceed 1GB. That’s $0.0245 per month today. Most transfer costs are kept down by caching in the CDN.
  • EC2: Since the instance sets up everything every time, it takes ~15-20 minutes to perform a deploy. That amounts to a couple of cents for every push. This can be reduced by preparing an AMI in advance. At one weekly blog post, this isn’t something I’d worry about.
  • CloudFront: A page view of this blog is ~250KB or so, which means 4000 page views come to around 1GB - at $0.085 per GB.

So I would expect the total costs to be less than a dollar a month. The wildcard is bandwidth.