What is S3?

What can you do with S3?

1: File Storage
2: Object Storage

Amazon S3, the Simple Storage Service, is an object store. An object store differs from a traditional file system in that it has a flat structure. This is the strangest bit to wrap your head around, because when you see files on an object store they are often displayed in a way that looks like folders. Don’t be fooled: it’s all flat (unlike the earth). We call the files we place in the store objects, and each one comprises the data and its metadata. The store itself we call a bucket.

You can access S3 in a multitude of ways. There’s an API with SDKs for many languages (Python, Go, Node.js, Java, .NET, plus many more), there’s an SFTP service you can run, there’s console access, and you can use third-party applications like Cyberduck. You can even mount it as a file system with a set of tools we’ll talk about later.

S3 used to be eventually consistent. When you made an API request to upload a file, it took time for all the other endpoints to acknowledge the file in the store, so a read issued straight after a write could sometimes fail. You’ve probably read this on the internet a few times. The good news is that it is no longer true: S3 now supports strong consistency, so you can read an object immediately after writing it.
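To make the "it looks like folders but it's all flat" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not S3 itself: the bucket is just a dictionary, and the `list_keys` helper and the example keys are made up for illustration. It shows how a folder-style view can be derived purely from the `/` characters inside otherwise flat key names.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Group flat keys the way a folder-style listing would.

    Returns (objects, common_prefixes): the keys sitting directly "inside"
    the prefix, and the pseudo-folders one level below it.
    """
    objects = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter merely looks like a folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket: a single flat level of key -> data, no real folders.
bucket = {
    "readme.txt": b"hello",
    "photos/2023/cat.jpg": b"...",
    "photos/2023/dog.jpg": b"...",
    "photos/2024/fish.jpg": b"...",
}

top_objects, top_folders = list_keys(bucket)
year_objects, year_folders = list_keys(bucket, prefix="photos/2023/")
```

Nothing called `photos/` is ever created here. The "folder" only appears because several keys happen to share the same leading characters.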

In this book we are going to learn how to:

Secure S3
Automate creation and lifecycle policies
Store files (large and small)
Host simple web sites
Trigger actions in Lambda, SNS, and EventBridge

To understand how an object store differs from file storage, let’s look at how each works.

1 - File Storage

What is File Storage?

File storage organizes data in a hierarchy of folders and files. Each file is stored as a single unit, one chunk of data, and is not broken down into smaller blocks. Folders can be nested, and to retrieve a file you need the entire directory path that leads to it. Lots of NAS systems you buy for home use this type of storage because it’s cheaper than block storage. It’s very good for lots of small files and can potentially scale to millions of files. The file system is generally local, as in one physical location. File storage also keeps only limited metadata with the file, typically the creation date, the file name, and the path. Think of file storage as a multi-story car park: you drive in and park in a spot. To get your car back, you need to remember the car park location, the floor you parked on, and the space in which you left the car. All of those things are needed to successfully get your car back.

File and Folder Hierarchical Structure
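The car park analogy can be sketched with Python's standard library. This is only an illustration on a temporary local directory; the folder and file names (`car-park`, `floor-3`, `space-42.txt`) and the `store_and_fetch` helper are invented for the example. It shows that the file is reached through its full nested path and that the file system records only a small, fixed set of metadata.

```python
import tempfile
from pathlib import Path


def store_and_fetch(root: Path) -> dict:
    # Nested folders: every level of the path must exist before the file can.
    folder = root / "car-park" / "floor-3"
    folder.mkdir(parents=True)
    file_path = folder / "space-42.txt"
    file_path.write_text("my car")

    # Retrieval needs the full path: car park, floor, and space.
    content = file_path.read_text()

    # The file system keeps a limited, predefined set of metadata.
    info = file_path.stat()
    return {
        "content": content,
        "relative_path": file_path.relative_to(root).as_posix(),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,
    }


with tempfile.TemporaryDirectory() as tmp:
    result = store_and_fetch(Path(tmp))
```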

2 - Object Storage

What is Object Storage?

Object storage, on the other hand, is a flat storage system for unstructured data. Files are stored whole or in parts, along with metadata and unique identifiers that are used to locate and identify the data you are accessing. The metadata carries much more detail, such as location, owner, creation timestamp, file type, and many more. The data and the metadata are bundled together and placed into a flat storage pool for later retrieval. It is a low-cost approach to storage that allows systems to scale massively and even be dispersed globally. It typically has slightly higher retrieval latency than a file storage system, and you generally access it via REST API calls rather than through a POSIX interface (but we’ll chat about this later). In our car park analogy, this time we drive to the car park and hand the keys to a valet, who in turn gives us a ticket. The car is parked for us in a massive ground-level car park that stretches for miles, and when we return the ticket the car is fetched from the right space. S3 is an object store, so it fits this description, and it’s big. I’ll dive into more details in chapter 2.

Flat structure of an Object store, data and metadata
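As a rough sketch of the valet model, the toy class below keeps data and metadata bundled together in one flat pool and hands objects back by key alone. It is an in-memory illustration only, not how S3 is implemented; `ToyObjectStore`, `StoredObject`, and the example key and metadata fields are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store: one flat pool, no folders."""

    def __init__(self):
        self._pool = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        # Data and metadata are bundled together under a single key.
        bundled = dict(metadata)
        bundled["created"] = datetime.now(timezone.utc).isoformat()
        bundled["size_bytes"] = len(data)
        self._pool[key] = StoredObject(data, bundled)

    def get(self, key: str) -> StoredObject:
        # The key plays the role of the valet ticket: it is all you need.
        return self._pool[key]


store = ToyObjectStore()
store.put(
    "reports/2024/q1.csv",
    b"a,b\n1,2\n",
    owner="finance",
    content_type="text/csv",
)
obj = store.get("reports/2024/q1.csv")
```

The caller never says which "floor" or "space" the object lives in. The key and whatever metadata was attached at upload time are the whole interface.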

The main differences between the two are summarized in the table below:

WA Pillar	File Storage	Object Storage
Operational Excellence	A limited number of metadata tags	Customizable metadata tags with no limits
Reliability	Typically on one physical site	Can be scaled globally across multiple regions
Reliability	Scales to millions of files	Scales virtually without limit, well beyond petabytes
Performance Efficiency	Best performance with small files	Great for large files and high concurrency and throughput
Cost Optimization	SAN solutions can be expensive	Cheaper; cloud providers charge only for what you use

Object storage is very cost-efficient for many use cases. It also has several tiers of storage available to the end-user. These tiers have pros and cons and you can find out more in chapter 5.