Why I like AWS CDK

A platform engineer point of view

· 18 minute read

I love to code, and I love to work in infrastructure. Back when I started my professional career, in the early 2000s, these two things were, more often than not, the concerns of different professions. Nowadays, with SRE teams in many companies, DevOps being considered a lifestyle and platform engineering becoming an established role, there is plenty of work for my particular passions.

One of the fun things that came out of the fusion of these skills - or rather out of the demands and opportunities of cloud infrastructure - is of course Infrastructure as Code (IaC). The landscape of IaC tooling is still limited, compared to the humongous diversity of choice you have when solving “pure” coding problems, and there was a time when this whole endeavour threatened to become the exclusive domain of declarative DSLs, which I am not a fan of. But there are at least a few contenders out there that scratch both my itches. One in particular is the topic of this post: AWS CDK, which is (as the name implies) a tool from and exclusively for the AWS ecosystem.

What makes AWS CDK great?

The AWS Cloud Development Kit (CDK) is a second-generation infrastructure as code solution. By second generation I am emphasizing the evolution from the previously mentioned purely declarative (JSON / YAML / ..) approaches, which I consider the first generation.

Admittedly, those early tools already solved most of the problems: having a reproducible, documented (by code), consistent description of your infrastructure that can be automatically executed from within a CI/CD pipeline, checked into version control, code reviewed, (integration / end-to-end) tested and all the things. That was the promise, and that is what they delivered. However, having worked with CloudFormation YAML and on and off with Terraform HCL for a few years now, I see room for improvement. That in itself is not a criticism, it is to be expected. First iterations are rarely perfect. Nor should they be.

In a sense, the first generation of tooling addressed (and largely solved) functionality. Machines: satisfy the APIs. The second generation now also addresses usability. Humans: make it maintainable; make it fun!

With the full power of a programming language - like TypeScript (or Python, or whichever flavor you prefer) - AWS CDK really hit a nerve, or at least mine. So will / do Pulumi and Terraform CDK (🤞) - but that is another story. What I am getting at: the C in IaC moves further towards coding, as in programming in a high-level programming language, which comes with a lot of inherent benefits.

This article is not about comparing IaC solutions, but about taking a look at AWS CDK from a platform engineer’s perspective. From here on out I will only make comparisons to CloudFormation YAML, as that is what AWS CDK replaces (well, it still builds and executes CloudFormation under the hood - if you want to be nitpicky - but since you do not write any CloudFormation YAML anymore, I think “replacing” is not too far-fetched).

The right level of abstraction

One thing I really love about AWS CDK is that it provides “the right level” of abstraction for my kind of work, meaning: I do not have to deal with all details of all resources - or even all involved resources - unless I need to for reasons outside of the tooling.

To give you an example, here is the full amount of code required to set up a VPC, including subnets, routing tables and NAT gateways with associated EIPs:

import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';

export class VpcDemoStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // setup VPC
    new ec2.Vpc(this, 'Vpc');
  }
}

Doesn’t look overly excessive, does it?

Now to the reasons outside of the tooling. For example: what if you do not want to use NAT gateways, but NAT instances - maybe because your dev cluster doesn’t need to cost an arm and a leg? Then you can adjust that part easily.

While we are at cost reduction: per default, all availability zones (AZs) in the region are used. That means subnets, routing tables and NAT resources are created in each of them. Let’s limit the number of AZs to two, so we still have redundancy, but at lower cost.

All together, here you go:

import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';

export class VpcDemoStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // setup VPC
    new ec2.Vpc(this, 'Vpc', {
        maxAzs: 2,
        natGatewayProvider: new ec2.NatInstanceProvider({
            instanceType: new ec2.InstanceType('t3.micro'),
        })
    });
  }
}

This design - which allows you to engage with details only when and if you need to - follows a principle close to what the UI/UX folks call progressive disclosure. Especially when dealing with IaC that manages complex cloud resources (which often come with a lot of boilerplate that, in many cases, does not need tinkering with), this helps a lot to keep your code clean and maintainable. I am a big fan, if that wasn’t clear.

Consider reviewing the changes between the two code examples above vs reviewing the changes in the equivalent CloudFormation YAML files.

Standardization & Compliance without the toil

Another thing I very much appreciate is how easy it is to align and standardize common settings across all your cloud resources. If you ever had the questionable pleasure of manually adding the same set of tags to all the resources in all your stacks written in pure CloudFormation YAML, you will be happy to hear that AWS CDK has an out-of-the-box solution for that.

But that is not all. If you are working in any kind of enterprise setup, think about compliance with company-wide best practices like “encrypt all S3 buckets” or generally “encrypt all that can be encrypted (S3, SNS, SQS, ..)” or whatever you think should apply to all or a specified set of resources of your choosing. Again, if you ever went through the pain of updating all existing stacks to comply with a new standard / best practice / policy / gut feeling / you-name-it, you will be pleased by how AWS CDK helps you solve that.

If I piqued your interest, then let me introduce you to what AWS calls Aspects. In short: Aspects allow you to apply an operation to a defined set of resources. The defined set could be as simple as all resources, or only S3 buckets, or only S3 buckets which already have tag X, or whatever you can think of. The operation can be tagging, adding the right encryption, adding a forwarded header to all behaviors of all CloudFront distributions ... The sky is the limit.

Simple case: Tagging

Ok, enough talk, let’s get back to the code. First the tagging part. AWS made the effort to wrap that so you do not even need to understand how to work with Aspects. Good on them! Here is a code example, extending the generic bin/<your-stack>.ts file that a cdk init .. would have created for you:

#!/usr/bin/env node
import 'source-map-support/register';
import * as cdk from '@aws-cdk/core';
import { Tags } from '@aws-cdk/core';
import { MyStack } from '../lib/my-stack';

const app = new cdk.App();
new MyStack(app, 'MyStack');

// add the tags service=my-service and team=my-team to ALL (taggable) resources
Tags.of(app).add('service', 'my-service');
Tags.of(app).add('team', 'my-team');

I don’t know about you, but I really like that. Keep in mind that this can be done “on every level”. Above, it is applied to all resources “in the App”. You could just as easily do it from within a stack (think, for example: NestedStacks). The type of the first parameter of the of() function - where I inserted the app instance - is cdk.IConstruct. Everything AWS CDK deals with - be it a cdk.App, a cdk.Stack or an s3.Bucket - implements the cdk.IConstruct interface and can be used!
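
To illustrate the “every level” point, here is a minimal sketch - stack name and tag values are made up - that applies the very same Tags API from within a stack, so the tags only cover that stack and everything in it:

import * as cdk from '@aws-cdk/core';
import { Tags } from '@aws-cdk/core';

export class TaggedScopeStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // tags added on `this` only apply to this stack and its children
    Tags.of(this).add('stage', 'dev');
  }
}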

Create generic Aspects

Ok, now that you have seen tags, which are a specific application of aspects, let’s take a look at the general case. At the core of how aspects work is the cdk.IAspect interface:

interface IAspect {
  visit(node: IConstruct): void;
}

This is quite straightforward: a classic visitor pattern. Any aspect needs to implement exactly one method: visit. Note that the node parameter is of type IConstruct, hence all aspects can be applied to any resource - and in turn, the decision on which resource type(s) it should act upon must happen in the method body.

Now assume you have a best practice which says: all S3 buckets must be encrypted server-side (which is good advice anyway). Let me first show you how the aspect would look, followed by how to apply it.

// assume this is in a file in your CDK project, e.g. `lib/s3-encryption-aspect.ts`
import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';

export class S3EncryptionAspect implements cdk.IAspect {
    visit(node: cdk.IConstruct): void {
        // bail out, if we're not dealing with an S3 bucket
        // this aspect _can_ get _all_ resources, including Stacks and Apps
        if (!(node instanceof s3.CfnBucket)) {
            return;
        }

        // bail out if encryption is already specified (e.g. using a custom key)
        if (node.bucketEncryption) {
            return;
        }

        // apply default server-side encryption
        node.bucketEncryption = {
            serverSideEncryptionConfiguration: [
                {
                    serverSideEncryptionByDefault: {
                        sseAlgorithm: 'AES256'
                    }
                }
            ]
        };
    }
}

Ok, a few things to note:

  1. It’s s3.CfnBucket, not s3.Bucket: When working “on this level of detail”, you do not use the high-level types, but basically work directly with CloudFormation resources. AWS calls these L1 (level one) constructs, which are available as Cfn<Type>; the higher-level types that hide complexity - like the s3.Bucket above - are L2 (level two) constructs. More on that below. For now note: when executing cdk synth later on, you can see that the TypeScript structure of the s3.CfnBucket exactly matches the Properties of the (generated) CloudFormation YAML:
--%<--
  Bucket12341234:
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
--%<--
  2. Bail out on your own: As advertised, you have to implement the decision on what kind of resources the aspect acts upon yourself. The first if condition takes care of that.

  3. Aspects are meant to be re-usable to support D.R.Y.: You won’t write an Aspect to modify one resource in one stack. Make sure that your design is sufficiently flexible and sensible. The best practice from above states only that buckets must be server-side encrypted. There is no reason to overwrite / change an already configured server-side encryption.
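
To show how little it takes to cover another resource type in the same manner, here is a minimal sketch of what an analogous aspect for SQS could look like - assuming the AWS managed alias/aws/sqs key is an acceptable default:

// assume this lives next to the S3 aspect, e.g. `lib/sqs-encryption-aspect.ts`
import * as cdk from '@aws-cdk/core';
import * as sqs from '@aws-cdk/aws-sqs';

export class SQSEncryptionAspect implements cdk.IAspect {
    visit(node: cdk.IConstruct): void {
        // only act on SQS queues
        if (!(node instanceof sqs.CfnQueue)) {
            return;
        }

        // bail out if a (custom) KMS key is already configured
        if (node.kmsMasterKeyId) {
            return;
        }

        // fall back to the AWS managed SQS key
        node.kmsMasterKeyId = 'alias/aws/sqs';
    }
}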

Use generic Aspects

The hard part was writing the Aspect. Once that is done, it can be applied as easily as the tagging example above. First, a quick example that defines a stack containing an S3 bucket without encryption:

import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';

export class S3DemoStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // this is how you would create an encrypted S3 bucket with the high level API
    /* new s3.Bucket(this, 'Bucket', {
      encryption: s3.BucketEncryption.S3_MANAGED,
    }); */

    // for the demo, let's just create a bucket without encryption
    new s3.Bucket(this, 'Bucket');
  }
}

Now the modified bin/ file, which uses the above stack and applies the Aspect:

#!/usr/bin/env node
import 'source-map-support/register';
import * as cdk from '@aws-cdk/core';
import { Aspects } from '@aws-cdk/core';
import { S3DemoStack } from '../lib/s3-demo-stack';
import { S3EncryptionAspect } from '../lib/s3-encryption-aspect';

const app = new cdk.App();
new S3DemoStack(app, 'S3DemoStack');

// apply the aspect to all resources (the aspect itself
// decides which resources it makes changes to)
Aspects.of(app).add(new S3EncryptionAspect());

One step further: Compliance in a box

Let’s say you have a multitude of best practices / policies / rules that you want to apply to all the stacks deployed in your organization or multi-stack project. Then you can create your own (NPM, ..) package that exports your own App (or Stack, or ..) which automatically applies all the things needed to make the managed resources compliant.

Here is an example of how that could look, using a custom App for the ACME company:

import * as cdk from '@aws-cdk/core';
import { Aspects, Tags } from '@aws-cdk/core';
import { S3EncryptionAspect } from './s3-encryption-aspect';

/**
 * All ACME owned resources must be tagged with service and team
 */
export interface AcmeAppProps extends cdk.AppProps {
    service: string;
    team: string;
}

/**
 * Wrap ACME compliance so that it can be easily re-used
 */
export class AcmeApp extends cdk.App {
    constructor(props: AcmeAppProps) {
        super(props);

        // make sure everything is tagged
        Tags.of(this).add('service', props.service);
        Tags.of(this).add('team', props.team);

        // apply all the aspects that implement your best practices
        Aspects.of(this).add(new S3EncryptionAspect());
        //Aspects.of(this).add(new SQSEncryptionAspect());
        //Aspects.of(this).add(new YourOtherBestPracticeAspect());
        //...
    }
}

And lastly, using the above custom App in your bin/ script:

#!/usr/bin/env node
import 'source-map-support/register';
import { AcmeApp } from '@acme-aws-cdk/core';
import { ArbitraryServiceStack } from '../lib/arbitrary-service-stack';

const app = new AcmeApp({
    service: 'my-awesome-service',
    team: 'my-awesome-team'
});
new ArbitraryServiceStack(app, 'ArbitraryServiceStack');

And that’s it. Compliance in a box. Nicely done.

Platforming - aka higher level abstraction

When you work as a platform engineer, your customers are developers. Among other things, we build infrastructure tooling and automation that results in a layer of abstraction which simplifies and hides the underlying complexity. This allows developers to concentrate on their actual work - which is not understanding and defining every last nut and bolt that makes up their service infrastructure, but building outstanding products that create value for the business.

However, in modern DevOps days, developers are at least somewhat responsible for their infrastructure: they own parts of it, they might do on-call for it, they create and remove resources as they need them. They just don’t have - or at least don’t need to have - the same highly detailed perspective on these resources. Deciding whether to use a NAT gateway or NAT instances should not waste their time. They are concerned with whether they are deploying a production or a development stage, that their service has no outages during deployment, is reliably available afterwards, always provides the performance needed and does all of that cost efficiently.

With this in mind, I need to bring up the L1 and L2 constructs AWS CDK provides again. A quick refresh: L1 (level one) constructs, named Cfn<Type>, map one-to-one to plain CloudFormation resources, while L2 (level two) constructs - like the s3.Bucket or ec2.Vpc used above - wrap them in a higher-level API with sensible defaults.

Getting to my point: there is a third type of construct, which AWS calls pattern constructs. I will refer to these pattern constructs as L3 (even though AWS does not), just to keep it simple (for me) and clear (for you). To give you an idea what those L3s can do, here is some of what AWS currently (v1.89.0) provides:

@aws-cdk/aws-route53-patterns: You can specify a single resource (HttpsRedirect) that creates a CloudFront distribution exposing a domain you want to fully redirect to another, creates an S3 bucket that implements the redirects, creates Route 53 alias records to route the domain to the CloudFront distribution, and uses ACM to issue all the needed certificates.

@aws-cdk/aws-ecs-patterns: Comes with two L3 constructs, which both run ECS tasks in a provided ECS cluster and set up an ALB in front that routes all traffic to a set of tasks (containers). Flavors are ApplicationLoadBalancedEc2Service (tasks on EC2 container instances) and ApplicationLoadBalancedFargateService (tasks on Fargate); a usage sketch follows after this list.

Beyond that, AWS publishes a lot of ready-made AWS CDK solution constructs (also available on GitHub).
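
As referenced above, here is a minimal sketch of the Fargate flavor - the image is just the public AWS sample container - to give you a feeling for how compact an L3 construct is in use:

import * as cdk from '@aws-cdk/core';
import * as ecs from '@aws-cdk/aws-ecs';
import * as ecsPatterns from '@aws-cdk/aws-ecs-patterns';

export class L3DemoStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // one L3 construct: creates the task definition, the Fargate service and
    // the ALB with listener and target group - plus a VPC and cluster, if none are provided
    new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'Service', {
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
      },
    });
  }
}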

I hope this gives you an idea why I am particularly excited about the possibility of encapsulating higher-level platform patterns in AWS CDK modules, which can then easily be re-used by others.

However, what AWS CDK comes with is only the start. Consider the patterns you see in your own organization or across your projects. Just thinking about how much time that can save makes me feel warm and fuzzy. Not to speak of the impact this can have on standardization.

For example, say in your organization most services run behind an ALB on EC2 instances, which are described in a template and scaled via auto scaling. Some also have CloudFront in front. Some use RDS databases, or mount EBS volumes, or EFS, or .. There you have an L3 construct in the planning.
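
As an appetizer, here is a hypothetical sketch of how such an organization-specific construct could start out - all names and defaults are made up, and CloudFront, databases and storage are left as an exercise:

import * as cdk from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as autoscaling from '@aws-cdk/aws-autoscaling';
import * as elbv2 from '@aws-cdk/aws-elasticloadbalancingv2';

export interface AcmeWebServiceProps {
  vpc: ec2.IVpc;
  instanceType?: ec2.InstanceType;
}

export class AcmeWebService extends cdk.Construct {
  constructor(scope: cdk.Construct, id: string, props: AcmeWebServiceProps) {
    super(scope, id);

    // the auto-scaled EC2 instances that run the service
    const asg = new autoscaling.AutoScalingGroup(this, 'Asg', {
      vpc: props.vpc,
      instanceType: props.instanceType || new ec2.InstanceType('t3.micro'),
      machineImage: new ec2.AmazonLinuxImage(),
    });

    // the internet-facing ALB that routes all traffic to the instances
    const alb = new elbv2.ApplicationLoadBalancer(this, 'Alb', {
      vpc: props.vpc,
      internetFacing: true,
    });
    alb.addListener('Http', { port: 80 }).addTargets('Fleet', {
      port: 80,
      targets: [asg],
    });
  }
}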

Or from another angle: any vendor can provide AWS CDK modules, so you can launch their product in different flavors (EC2 vs ECS vs EKS vs ..), completely encapsulated behind a nice interface, as another L3 construct.

Long story short: This is more than practical, this is absolutely fantastic.

Is it then all rainbows and butterflies?

Well, there are always some clouds on the horizon (cheap pun intended). As AWS CDK is “only” at version one, there are learnings and feedback I would like to share. Also, as a native Berliner, I need to do my part in upholding our reputation as complainers. The highest praise a Berliner offers for even the most delicious food: it wasn’t bad (which can be understood as a complaint in itself). With that in mind:

The rapid version change annoyance

AWS CDK uses semver: packages are versioned as MAJOR.MINOR.PATCH, e.g. 1.89.0 at the time of writing this article. If you have a look at the changelog, you will find that a new minor version is released about every 7-10 days. Those releases contain many bug fixes, but also features added for new products and newly released capabilities. The active development is of course great, but it comes at a price. Especially since AWS merrily adds non-stable features within their packages, which (of course) will see breaking changes.

This has consequences for day-to-day development. Nearly every time I get back to a project, AWS CDK will have moved a few minor versions ahead. So any newly added @aws-cdk/* package will come in a later version than the already installed @aws-cdk/* packages. While semver defines that minor version changes must not break interfaces, that does not apply to experimental code.

Mind, I am not referring to (wholly) experimental packages (which exist), but to experimental parts of otherwise stable packages. This does not only affect newly released products or features, but also long-established APIs.

Anyway, understand this as a heads-up: it can sneak up on you. Make sure to read the documentation attentively (watch for that @experimental note). Consider sticking with a (minor) version longer than AWS does. Consider setting aside maintenance time to stay up to date and address resulting API changes in your code.

In short: continuous development is appreciated. Taking the time to come up with well-thought-out interfaces that allow rapid development is appreciated. Experimental code, hidden among the stable, can give you stomach pains if you are not careful.

Referencing existing resources

The AWS CDK allows you to use high-level objects where CloudFormation YAML would expect e.g. ARNs or IDs. As shown with the VPC example, it tries to keep the boilerplate down. Another good illustration of that point is how you grant IAM permissions in AWS CDK. Consider the following excerpt, which creates a new S3 bucket and a new user, and then grants the user read and write permissions on that bucket:

const user = new iam.User(this, 'User');
const bucket = new s3.Bucket(this, 'Bucket');
bucket.grantReadWrite(user);

Aside from the AWS::IAM::User and the AWS::S3::Bucket, this will also create an AWS::IAM::Policy that implements the read / write permissions for the user to the bucket. Neat.

Now, where is the problem? Well, in the above example, all resources were created in the same context (e.g. the same Stack). If you have existing resources and want to use them in your stack, you need to access them differently.

Say you want to replicate the same as above, just that the user was created in a different stack, or maybe even manually. Then you need to “load” it instead of creating it. For that purpose, AWS CDK provides static from* methods on the resource types. Here is how you would load a User whose ARN you know:

const user = iam.User.fromUserArn(this, 'User', 'arn:..of:the:user');
const bucket = new s3.Bucket(this, 'Bucket');
bucket.grantReadWrite(user);

Still, no problem in sight. It seems straightforward: that nice fromUserArn function has a simple signature. Aside from the scope (this) and the id ('User'), only the ARN is required. That means you can load a User just by knowing their unique ARN. Makes sense, no?

Now we arrive at the actual problem, because that simple signature from above is not consistently available. For example, when working with private namespaces in ECS (if you haven’t: consider them something like private Route 53 Hosted Zones), you need to do this:

const namespace = servicediscovery.PrivateDnsNamespace.fromPrivateDnsNamespaceAttributes(this, 'Namespace', {
    namespaceArn: 'arn:..of:the:namespace',
    namespaceId: 'namespace123-id',
    namespaceName: 'namespace123',
});

To be clear: all three attributes are mandatory, I did not add them for fun. As you might also note: the namespace ARN is a globally unique identifier. Same goes for the ID, which is all the AWS CLI requires to fetch a namespace. So why do you need to provide two unique identifiers and another attribute? Well, if you look into the implementation, you’ll find that no “loading” is happening at all. It just fills the interface requirements and needs you to provide whatever it cannot derive from the environment and context.

As I said, I like to complain. I am not sure whether this just reflects the early state (v1 after all), or whether it is intentional API design. I stumbled across a comment from a core contributor that might imply they want to be very strict about determinism - but with SSM value loading, the ability to package and upload files to S3 during build, and even building and uploading Docker container images .. I figure that determinism train left the station a while ago. (Yes, I know all that happens during synthesis, so a convenient load<Something>FromOnly<UniqueIdentifier>() function could, too.)

Huge, show-stopping problem? No. Inconvenient and bothersome? Yes.

Summary

I don’t want to end on a negative note at all, as for me AWS CDK is a great tool! It’s a marvelous piece of software design, fun to work with, and it addresses usability issues that were not tackled by the first generation. It makes maintenance significantly less painful and less error-prone. I give it my strong recommendation when working exclusively - or at least mainly - with AWS.

I am excited to see what the upcoming v2 will bring!