Reliability

Distributed system design, recovery planning, and adapting to changing requirements

This pillar focuses on the business and technical aspects of your workload and data. What recovery time is acceptable to the business if there is a failure? How long do you keep backups? Do you need to go global, or is a single AZ the right choice for you and the business? We'll also cover change management and incident response and recovery.

Reliability in AWS Regions, Availability Zones and S3

Let's first talk about how AWS is built. There are regions, such as eu-west-1 and us-west-2, which are geographic locations for the clusters of data centres AWS operates. Regions are largely isolated from one another, which protects you from a single-region failure. It's not 100% fault tolerant, but outages that affect all regions at once are rare because of that isolation. Within a region you have Availability Zones, for example eu-west-1a, eu-west-1b and eu-west-1c. These zones are huge: each is not a single data centre but a cluster of data centres that are hyper-local to each other, while the zones themselves may be miles apart.

This gives you a massive amount of reliability for storage and compute if you run across all the zones in a region. In S3 Standard your data is replicated across a minimum of three zones within a region, so you are protected from a single-zone failure. In fact, unless you choose S3 One Zone-Infrequent Access (S3 One Zone-IA), you are protected against a zone failure by default. Compare this to your own data centres: you effectively have DR from the start without having to do anything!

Some companies may want to go further, though, and this is where a skilled Solutions Architect will talk to the business and see if they want to go beyond the high availability within a region and set up replication to another region. The trade-off here is cost: it will double your data storage costs and add network transfer fees.

1 - Versioning

Keep multiple versions of your objects

OpEx Sec Rel Perf Cost Sus

S3 allows you to keep multiple copies/variants of an object. This is really useful if you are updating a file by overwriting it, but may need to revert to an older version for DR or other events where you wish to restore a preserved object.

To enable versioning on S3 you do this at the bucket level, and in Terraform it's super simple:

resource "aws_s3_bucket" "example" {
  bucket = "my-versioned-bucket"

  # Keep every version of every object in this bucket
  versioning {
    enabled = true
  }
}
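Note that in newer releases of the Terraform AWS provider (v4 and later), versioning moved out of the bucket resource into its own resource. A minimal sketch, assuming a bucket resource named `example`:

```hcl
# AWS provider v4+ style: versioning is configured as a separate resource
resource "aws_s3_bucket_versioning" "example" {
  bucket = aws_s3_bucket.example.id

  versioning_configuration {
    status = "Enabled"
  }
}
```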

Versioning even saves you from accidental deletion. If you delete an object when versioning is enabled, S3 flags the object with a delete marker instead of removing it permanently. By default, S3 Versioning is disabled on buckets, and you must explicitly enable it.

Unless you are using the SDKs, you are best off using the AWS console. If you dive into the S3 console, open a bucket and look at an object that's recently been updated, you'll see the versions and how to restore a previous one.

Versioning in the S3 console

Technical considerations

Versioning is a great get-out-of-jail-free card should something (or someone) go wrong; it lets you roll back to a stable earlier version. There is, however, a consideration around how long you keep the versions for, and you'll need to discuss this with the business.

Business considerations

If you keep multiple versions you end up with higher storage costs, so you'll need to agree with the technical team whether it's needed or whether backups from another system would be better suited.
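Once a retention period is agreed, you can enforce it with a lifecycle rule so old versions don't accumulate cost forever. A minimal sketch, assuming the AWS provider v3 syntax used elsewhere in this chapter (the bucket name and 90-day window are illustrative assumptions):

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "my-versioned-bucket" # hypothetical name

  versioning {
    enabled = true
  }

  # Delete non-current versions after the retention period
  # agreed with the business
  lifecycle_rule {
    id      = "expire-old-versions"
    enabled = true

    noncurrent_version_expiration {
      days = 90 # example retention window
    }
  }
}
```

With this in place, superseded versions remain restorable for 90 days and are then removed automatically, capping the storage overhead of versioning.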

2 - Same Region Replication

How S3 stores and replicates your data

OpEx Sec Rel Perf Cost Sus

Amazon S3 now supports automatic and asynchronous replication of newly uploaded S3 objects to a destination bucket in the same AWS Region. Amazon S3 Same-Region Replication (SRR) adds a new replication option to Amazon S3, building on S3 Cross-Region Replication (CRR) which replicates data across different AWS Regions.

Data stored in any Amazon S3 storage class, except for S3 One Zone-IA, is always stored across a minimum of three Availability Zones, each separated by miles within an AWS Region. SRR makes another copy of S3 objects within the same AWS Region, with the same redundancy as the destination storage class. This allows you to automatically aggregate logs from different S3 buckets for in-region processing, or configure live replication between test and development environments. SRR helps you address data sovereignty and compliance requirements by keeping a copy of your objects in the same AWS Region as the original.

When an S3 object is replicated using SRR, the metadata, Access Control Lists (ACL), and object tags associated with the object are also part of the replication. Once SRR is configured on a source bucket, any changes to the object, metadata, ACLs, or object tags trigger a new replication to the destination bucket.

In this example we are going to create a source bucket and a destination bucket, which is where we will replicate to; the code can be found in the chapter3/001 folder. The example creates both buckets and the replication policy:

resource "aws_iam_policy" "replication" {
  name        = var.iam_role_name
  description = "Replication Policy"
  policy      = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectVersionForReplication",
                "s3:GetObjectVersionAcl"
            ],
            "Resource": [
                "arn:aws:s3:::${aws_s3_bucket.source.id}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetReplicationConfiguration"
            ],
            "Resource": [
                "arn:aws:s3:::${aws_s3_bucket.source.id}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ReplicateObject",
                "s3:ReplicateDelete",
                "s3:ReplicateTags",
                "s3:GetObjectVersionTagging"
            ],
            "Resource": "arn:aws:s3:::${aws_s3_bucket.destination.id}/*"
        }
    ]
}
EOF
}

This policy allows replication to occur, and for it to work we must attach it to a role:

resource "aws_iam_role" "replication" {
  name = var.iam_role_name
  tags = local.global_tags

  assume_role_policy = <<EOF
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "Service":"s3.amazonaws.com"
         },
         "Action":"sts:AssumeRole"
      }
   ]
}
EOF

}
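The attachment of the policy to the role isn't shown above; it might look like this (the resource name is an assumption):

```hcl
# Attach the replication policy to the replication role
resource "aws_iam_role_policy_attachment" "replication" {
  role       = aws_iam_role.replication.name
  policy_arn = aws_iam_policy.replication.arn
}
```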

Finally, we add a replication_configuration block to the source bucket, referencing the role's ARN. You'll see in the code that we have created a rule that copies everything under the copy_me/ prefix to the destination bucket:

replication_configuration {
    role = aws_iam_role.replication.arn
    rules {
      id     = "destination"
      prefix = "copy_me/"
      status = "Enabled"

      destination {
        bucket        = aws_s3_bucket.destination.arn
        storage_class = "STANDARD_IA"
      }
    }
  }

Once you've run Terraform on this, simply drop a file into the copy_me folder on the source bucket and then go have a look in the destination bucket. Note that S3 replication requires versioning to be enabled on both the source and destination buckets.

3 - Cross Region Replication

Replicate data to another region for backup

OpEx Sec Rel Perf Cost Sus

Amazon S3 cross-region replication can be used for a few reasons. You may wish to have the data backed up hundreds of miles away from your origin region for regulatory reasons, and you can also change the account and ownership to protect against accidental data loss. Another reason may be that you want to move data closer to the end user to reduce latency. You can set cross-region replication at a bucket level, on a predefined prefix, or even at an object level with the correct tags.

There’s a module for that

[https://github.com/asicsdigital/terraform-aws-s3-cross-account-replication](https://github.com/asicsdigital/terraform-aws-s3-cross-account-replication)

Instead of reinventing the wheel, it sometimes makes sense to use prebuilt modules that will get you up and running more quickly. This module helps you simply configure replication across regions and would be my recommended choice for the job in hand. It's super simple and requires the following Terraform variables:

Required

  • source_bucket_name - Name for the source bucket (which will be created by this module)
  • source_region - Region for source bucket
  • dest_bucket_name - Name for the destination bucket (optionally created by this module)
  • dest_region - Region for the destination bucket
  • replication_name - Short name for this replication (used in IAM roles and source bucket configuration)
  • aws.source - AWS provider alias for source account
  • aws.dest - AWS provider alias for destination account

Optional

  • create_dest_bucket - Boolean for whether this module should create the destination bucket
  • replicate_prefix - Prefix to replicate, default "" for all objects. Note if specifying, must end in a /

Usage

provider "aws" {
  alias   = "source"
  profile = "source-account-aws-profile"
  region  = "us-west-1"
}

provider "aws" {
  alias   = "dest"
  profile = "dest-account-aws-profile"
  region  = "us-east-1"
}

module "s3-cross-account-replication" {
  source             = "github.com/asicsdigital/terraform-aws-s3-cross-account-replication?ref=v1.0.0"
  source_bucket_name = "source-bucket"
  source_region      = "us-west-1"
  dest_bucket_name   = "dest-bucket"
  dest_region        = "us-east-1"
  replication_name   = "my-replication-name"

  providers = {
    aws.source = aws.source
    aws.dest   = aws.dest
  }
}

output "dest_account_id" {
  value = module.s3-cross-account-replication.dest_account_id
}

Amazon S3 Replication Time Control

Amazon S3 Replication Time Control helps you meet compliance or business requirements for data replication and provides visibility into Amazon S3 replication activity. Replication Time Control replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes. S3 Replication Time Control includes, by default, S3 replication metrics and S3 event notifications, with which you can monitor the total number of S3 API operations that are pending replication, the total size of objects pending replication, and the maximum replication time.
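Replication Time Control is enabled on the replication rule's destination. A sketch, assuming the AWS provider v3 aws_s3_bucket syntax used earlier in this chapter (rule id and bucket references are illustrative):

```hcl
replication_configuration {
  role = aws_iam_role.replication.arn

  rules {
    id     = "rtc-example"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.destination.arn

      # Replication Time Control: 15-minute replication SLA
      replication_time {
        status  = "Enabled"
        minutes = 15
      }

      # Replication metrics for monitoring pending operations
      metrics {
        status  = "Enabled"
        minutes = 15
      }
    }
  }
}
```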

Business considerations

Consider your users: if you have large data assets (video or audio, etc.) then you'll want them to be closer to the end user so they get a good experience. Even when using CloudFront, it can take time for data to come from eu-west-1 to ap-southeast-2, for example, so for a smooth user experience consider this option. You may also have a regulatory reason to make sure backups exist hundreds of miles from the origin.