Cost Optimization

The Cost Optimization pillar includes the ability to run systems to deliver business value at the lowest price point

The real art of running any system is getting to grips with the costs and maximizing the margins of your product. For Amazon S3, this comes down to the storage class you are using: you want to balance the best price against the performance and reliability pillars, and that can be a tough line to walk.

If you are storing data in S3 rather than on EBS (Elastic Block Store) volumes, EFS (Elastic File System), FSx for Lustre, or even proprietary storage systems running in the cloud, you are already doing well on cost optimization. By the way, those proprietary systems are built from the same fundamental building blocks you have access to, such as EBS, with a few features added on top; they will seldom save you money. So the real trick to saving money in S3 comes down to the data: you’ll need to identify the hot data that is always accessed and make sure the data that is cold gets moved to a cheaper tier. Also, and this sounds simple, if you don’t need the data and the business agrees, delete it! I’ve seen customers with 8 years’ worth of EBS snapshots and 10 years’ worth of daily tgz files stored in an S3 bucket. To make matters worse, there were several identical buckets, which led to approximately 100 TB of storage that was never accessed, or likely to be! If you are wondering what that cost, it was ~$24,500.00 a month! Yeah, I know!

1 - Tiering

Understanding different tier costs and performance

Let’s have a talk about the different storage tiers in S3. Amazon S3 offers a plethora of storage classes that you can choose from to best suit your needs around data access, resiliency, and cost requirements. Each storage class has its merits, and in this section we’ll look at which ones will be best for you. If you don’t know which class of storage is right for your workload, or you have a workload with unpredictable usage patterns, you can even use S3 Intelligent-Tiering, which automatically moves your data to the right class based on usage patterns and will lower the cost of storing your data. Then there is S3 Standard for frequently accessed data, the type of thing you may access daily. It features:

  • Low latency and high throughput performance
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones (within a region)
  • Resilient against events that impact an entire Availability Zone
  • Designed for 99.99% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes

The trade-off here is that it costs more to have all those features than the other tiers. That said, the cost of storage in S3 isn’t huge, which is one of its main attractions, at $0.023 per GB per month.

S3 Standard-Infrequent Access (S3 Standard-IA) at first glance looks almost identical to S3 Standard. However, it really is designed for files you don’t open often. The cost is approximately $0.0134 per GB per month, but if you switch to this tier you may increase your API / object request costs, removing any benefit.

  • Same low latency and high throughput performance of S3 Standard
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Resilient against events that impact an entire Availability Zone
  • Data is resilient in the event of one entire Availability Zone destruction
  • Designed for 99.9% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes

S3 One Zone-Infrequent Access (S3 One Zone-IA) is for less frequently accessed data. For me this is where it gets interesting. If you have data that’s not accessed on a regular basis and can be recreated easily, S3 One Zone-IA could be for you. As you may have guessed, instead of your data being replicated across all the AZs in a Region, it lives in only one. The payback is that it’s even cheaper, at $0.01048 per GB per month. The risk to you, though, is that if that AZ has an issue you could lose the data. If you already know which class suits an object, you can place it there at upload time; there’s a short Terraform sketch after the feature list below.

  • Same low latency and high throughput performance of S3 Standard
  • Designed for durability of 99.999999999% of objects in a single Availability Zone
  • Designed for 99.5% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability
  • Supports SSL for data in transit and encryption of data at rest
  • S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
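
If you already know which class an object belongs in, you don’t have to wait for a lifecycle rule to move it; you can set the class at upload time. Here is a minimal Terraform sketch using the aws_s3_object resource and its storage_class argument; the bucket reference and file names are hypothetical placeholders.

  resource "aws_s3_object" "report" {
    # Hypothetical bucket and object names, purely for illustration
    bucket = aws_s3_bucket.example.id
    key    = "reports/2023/summary.csv"
    source = "${path.module}/summary.csv"

    # Land the object straight in One Zone-IA; other accepted values include
    # STANDARD, STANDARD_IA, INTELLIGENT_TIERING, GLACIER_IR, GLACIER and DEEP_ARCHIVE
    storage_class = "ONEZONE_IA"
  }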

Now let’s take a look at the Glacier side of things. Glacier is cold storage designed for archiving data that you may never touch again, or only in very specific situations. Think of an audit happening at work where they need records from six years ago. It would be expensive to store six years’ worth of data in S3 Standard, so this is where Glacier comes in.

Released in 2021, S3 Glacier Instant Retrieval is for archive data that needs immediate access. With Glacier you normally have a time delay of anywhere between 1 and 12 hours to get back the data you have archived, but Instant Retrieval does what it says on the tin: you get your data back instantly. At a cost of $0.005 per GB per month, as long as this is the sort of data you only access around once a quarter, it can be a good place to store it.

  • Data retrieval in milliseconds with the same performance as S3 Standard
  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Data is resilient in the event of the destruction of one entire Availability Zone
  • Designed for 99.9% data availability in a given year
  • 128 KB minimum object size
  • Backed with the Amazon S3 Service Level Agreement for availability
  • S3 PUT API for direct uploads to S3 Glacier Instant Retrieval, and S3 Lifecycle management for automatic migration of objects

S3 Glacier Flexible Retrieval (formerly S3 Glacier), for rarely accessed long-term data that does not require immediate access, is yet another tier of storage, with a slightly lower cost of $0.00408 per GB per month. It doesn’t sound like a big reduction, but when you are storing petabytes of data it can really add up quickly. The downside is that you need to put in a request for the data to be retrieved, and it can take 1 to 12 hours to do so.

  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Data is resilient in the event of one entire Availability Zone destruction
  • Supports SSL for data in transit and encryption of data at rest
  • Ideal for backup and disaster recovery use cases when large sets of data occasionally need to be retrieved in minutes, without concern for costs
  • Configurable retrieval times, from minutes to hours, with free bulk retrievals
  • S3 PUT API for direct uploads to S3 Glacier Flexible Retrieval, and S3 Lifecycle management for automatic migration of objects

Finally, there is Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive) for long-term archive and digital preservation, with retrieval in hours, at the lowest cost of AWS’s object storage tiers at $0.0018 per GB per month. The downside is that it can take 12 hours to restore your objects back into S3.

  • Designed for durability of 99.999999999% of objects across multiple Availability Zones
  • Lowest cost storage class designed for long-term retention of data that will be retained for 7-10 years
  • Ideal alternative to magnetic tape libraries
  • Retrieval time within 12 hours
  • S3 PUT API for direct uploads to S3 Glacier Deep Archive, and S3 Lifecycle management for automatic migration of objects

There is technically another class of S3 storage that lives on AWS Outposts, which are appliances that you can run in your own data centre. This book isn’t going to cover that, but you can get more information here: https://aws.amazon.com/outposts/

2 - Intelligent Tiering

Using the best-priced storage with Intelligent-Tiering

When I tell you Intelligent-Tiering is going to automatically save you money you might be a little skeptical, but in the right hands it really can. The thing is, there is a catch, but it’s not a big one: AWS charges you a small monitoring fee. It really is small unless you have billions of objects in S3, as it’s $0.0025 per 1,000 objects monitored per month (about $2.50 per million).

S3 Intelligent-Tiering is the ideal storage class for data with unknown, changing, or unpredictable access patterns, independent of object size or retention period. You can use S3 Intelligent-Tiering as the default storage class for virtually any workload, especially data lakes, data analytics, new applications, and user-generated content. The only caveat is that it doesn’t include objects under 128 KB; however, they also don’t count towards the monitoring charges. These small objects simply remain in the Frequent Access tier.

How it works

Intelligent-Tiering is designed to optimise where your data is stored based on access patterns from daily use. It moves lesser-used objects onto cheaper tiers of S3 storage, and when those access patterns change it can move the data back into the higher tiers. It is configured with three tiers: one for frequently accessed data, one for infrequently accessed data, and a very low cost tier for data that is rarely accessed. After 30 days without access, Intelligent-Tiering automatically moves the data to the lower-cost Infrequent Access tier, which should save you about 40% on your storage costs. After 90 days of no access, Intelligent-Tiering moves the data to the Archive Instant Access tier, which brings your savings to 68% over Standard storage costs. You can additionally configure and opt in to archive tiers, all the way down to Deep Archive Access, which can boost your savings to 95%.
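
The automatic tiers need no configuration at all, but the deeper Archive Access and Deep Archive Access tiers are opt-in. As a rough Terraform sketch, assuming a hypothetical aws_s3_bucket.example resource, they can be enabled with the aws_s3_bucket_intelligent_tiering_configuration resource; new objects can be landed in the class by uploading them with storage_class = "INTELLIGENT_TIERING".

  resource "aws_s3_bucket_intelligent_tiering_configuration" "archive" {
    bucket = aws_s3_bucket.example.id
    name   = "EntireBucket"

    # Opt in to the Archive Access tier after 90 days without access
    tiering {
      access_tier = "ARCHIVE_ACCESS"
      days        = 90
    }

    # Opt in to the Deep Archive Access tier after 180 days without access
    tiering {
      access_tier = "DEEP_ARCHIVE_ACCESS"
      days        = 180
    }
  }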

3 - Life Cycle Policies

Save money on storage

To maximise your savings when using object storage, you have to use the right tier for the data. You will need to balance cost against speed, reliability, and availability. AWS Glacier Deep Archive is the cheapest storage, but is a 12-hour wait to get your data back going to be acceptable to the business? Once you have decided on the right balance, you can implement Lifecycle policies to move the data between tiers. An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:

Transition actions – These actions define when objects transition to another storage class. For example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after creating them, or archive objects to the S3 Glacier Flexible Retrieval storage class one year after creating them.

There are costs associated with Lifecycle transition requests and moving between tiers, but they are minimal, and the savings from using cheaper storage should be greater.

Expiration actions – These actions define when objects expire. Amazon S3 deletes expired objects on your behalf.

Managing object lifecycle

Define S3 Lifecycle configuration rules for objects that have a well-defined lifecycle. For example:

  • If you upload periodic logs to a bucket, your application might need them for a week or a month. After that, you might want to delete them.
  • Some documents are frequently accessed for a limited period of time. After that, they are infrequently accessed. At some point, you might not need real-time access to them, but your organization or regulations might require you to archive them for a specific period. After that, you can delete them.
  • You might upload some types of data to Amazon S3 primarily for archival purposes. For example, you might archive digital media, financial and healthcare records, raw genomics sequence data, long-term database backups, and data that must be retained for regulatory compliance.

With S3 Lifecycle configuration rules, you can tell Amazon S3 to transition objects to less-expensive storage classes, or archive or delete them.

Using Terraform to manage your lifecycle

If we take a look at the example code we have been working on, we can add the required section to it.

  lifecycle_rule = [
    {
      id      = "log"
      enabled = true
      prefix  = "log/"

      tags = {
        rule      = "log"
        autoclean = "true"
      }

      transition = [
        {
          days          = 30
          storage_class = "ONEZONE_IA"
        },
        {
          days          = 60
          storage_class = "GLACIER"
        }
      ]

      expiration = {
        days = 90
      }

      noncurrent_version_expiration = {
        days = 30
      }
    },
    {
      id                                     = "log1"
      enabled                                = true
      prefix                                 = "log1/"
      abort_incomplete_multipart_upload_days = 7

      noncurrent_version_transition = [
        {
          days          = 30
          storage_class = "STANDARD_IA"
        },
        {
          days          = 60
          storage_class = "ONEZONE_IA"
        },
        {
          days          = 90
          storage_class = "GLACIER"
        },
      ]

      noncurrent_version_expiration = {
        days = 300
      }
    },
  ]

The rule above has multiple parts. Anything under the log prefix (folder) moves to One Zone-IA after 30 days, moves on to Glacier after 60 days, and is deleted after 90 days. There’s also a second rule, for the prefix log1, and you will have noticed it is applied in a different way to the first: it works on noncurrent versions of files. It moves noncurrent versions to Standard-IA after 30 days, into One Zone-IA after 60 days, and to Glacier after 90 days; finally, after 300 days, they are deleted. As files change you can end up with a big long list of old versions, so that second rule is very important.
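
The block above is written as a module input (a list of rule maps). If you manage the bucket directly with the AWS provider instead, roughly the same first rule can be expressed with the aws_s3_bucket_lifecycle_configuration resource. This is a minimal sketch, assuming a hypothetical aws_s3_bucket.example resource in your configuration.

  resource "aws_s3_bucket_lifecycle_configuration" "logs" {
    bucket = aws_s3_bucket.example.id

    rule {
      id     = "log"
      status = "Enabled"

      # Only apply the rule to objects under the log/ prefix
      filter {
        prefix = "log/"
      }

      # Move current versions to cheaper classes as they age
      transition {
        days          = 30
        storage_class = "ONEZONE_IA"
      }

      transition {
        days          = 60
        storage_class = "GLACIER"
      }

      # Delete current versions after 90 days
      expiration {
        days = 90
      }

      # Clean up old (noncurrent) versions 30 days after they are superseded
      noncurrent_version_expiration {
        noncurrent_days = 30
      }
    }
  }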

Technical considerations

Remember to clean up the noncurrent versions; they’re not visible in a normal listing but can add a sizeable cost to the monthly bill if not managed. If you are under GDPR, you also need to work out a way of expunging a person’s data from these versions and backups.

Business considerations

The business should supply the information for retention policies. There are often rules and regulations around how long you must keep data.

4 - S3 Lens

Monitor and act on your usage

Amazon S3 Storage Lens provides a single pane of glass approach to your object usage and activity across your entire Amazon S3 storage, either in a single account or across an organization.

You can collect activity metrics together and display them in a dashboard. You can even download the data in CSV or Parquet format.

Configuration

Amazon S3 Storage Lens requires a configuration that contains the properties that are used to aggregate metrics on your behalf for a single dashboard or export. This includes all or partial sections of your organization account’s storage, including filtering by Region, bucket, and prefix-level (available only with advanced metrics) scope. It includes information about whether you chose free metrics or advanced metrics and recommendations. It also includes whether a metrics export is required, and information about where to place the metrics export if applicable.
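
If you would rather capture that configuration as code than click through the console, recent versions of the AWS provider expose it through the aws_s3control_storage_lens_configuration resource. The sketch below is a minimal example with an illustrative config_id; note that enabling activity metrics opts you in to the paid advanced metrics tier, so drop those blocks if you want to stay on free metrics.

  resource "aws_s3control_storage_lens_configuration" "account_wide" {
    config_id = "account-wide"

    storage_lens_configuration {
      enabled = true

      account_level {
        # Activity metrics are part of the paid advanced tier
        activity_metrics {
          enabled = true
        }

        # Roll the same metrics up per bucket
        bucket_level {
          activity_metrics {
            enabled = true
          }
        }
      }
    }
  }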

Default dashboard

The S3 Storage Lens default dashboard on the console is named default-account-dashboard. S3 preconfigures this dashboard to visualize the summarized insights and trends of your entire account’s aggregated storage usage and activity metrics, and updates them daily in the Amazon S3 console. You can’t modify the configuration scope of the default dashboard, but you can upgrade the metrics selection from free metrics to the paid advanced metrics and recommendations. You can also configure the optional metrics export, or even disable the dashboard. However, you can’t delete the default dashboard.

The screenshot below shows a sample of the default dashboard, and if you are a graph nerd like myself or my good friend Kieren, you are in for a treat:

Example S3 Lens Default Dashboards