Lots of comments here talking about how great S3 is.
Anyone willing to give the CliffsNotes version of what's good about it?
I've been running various sites and apps for a decade, but have never touched S3 because the bandwidth costs are 1, sometimes 2 orders of magnitude more expensive than other static hosting solutions.
One underappreciated feature of S3, and one that lets it excel in workloads like the Tables feature described in the article, is that it can function as the world's highest-throughput network filesystem. And you don't have to do anything to configure it (as the article points out). By storing data on S3, you get access to the full cross-sectional bandwidth of EC2, which is colossal. For effectively all workloads, you will max out your own network connection before S3's. This enables workloads that can't scale anywhere else: things like data pipelines generating unplanned hundred-terabit-per-second traffic spikes, with hotspots that would crash any filesystem cluster I've ever seen. And you don't have to pay a lot for it. Once you're done using the bandwidth, you can archive the data elsewhere or delete it.
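To make the throughput point a bit more concrete, here is a minimal sketch of one common way to use that bandwidth from a single client: splitting an object into byte ranges and fetching them concurrently. It assumes `s3_client` is a boto3 S3 client (or anything exposing the same `head_object` and `get_object` calls); the function names, part size and worker count are illustrative choices, not anything from the article.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, part_size):
    """Split [0, size) into inclusive (start, end) pairs of at most part_size bytes."""
    return [
        (start, min(start + part_size, size) - 1)
        for start in range(0, size, part_size)
    ]


def parallel_download(s3_client, bucket, key, part_size=8 * 1024 * 1024, workers=16):
    """Fetch one object as concurrent ranged GETs and reassemble it in order.

    s3_client is assumed to behave like boto3.client("s3").
    """
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]

    def fetch(byte_range):
        start, end = byte_range
        response = s3_client.get_object(
            Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
        )
        return response["Body"].read()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the parts line up.
        return b"".join(pool.map(fetch, byte_ranges(size, part_size)))
```

This holds the whole object in memory, so it is only a sketch; for big objects you would write each part to disk instead.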
You've totally hit the nail on the head. This is the real moat of S3, the fact that they have so much front-end throughput available from the gigantic buildout that folks can take advantage of without any capacity pre-planning.
There are a few things about S3 that I find extremely powerful.
The biggest is this: if I need to store some data and I know what the data is (I know my filenames, so I don't need to worry about, say, traversing a file structure at a moment's notice), I can just store it. I don't need to figure out how much space I need ahead of time, and it is there when I need it. Maybe it automatically moves to another storage tier to save me some money, but I can reliably assume it will be there when I need it. That simplicity alone is worth a lot. I never have to think later about expanding space, possibly introducing downtime depending on the setup, maybe dealing with partitions, etc.
Related to that is static hosting. I have run a CDN and other static content out of S3 with CloudFront in front of it. The storage cost was almost nonexistent because of how little data we were actually talking about, and I only paid CloudFront costs when there were requests. If nothing was being used it was almost "free". Even when it was being used, it was very cheap for my use cases.
Creating daily inventory reports in S3 is awesome.
But the thing that really is almost "magic", once you understand its quirks, is Athena (and QuickSight built on top of it, and similar tools). You can store data in S3, like the inventory reports I already mentioned, access logs, CloudWatch logs, or any structured data that you don't query often enough to warrant a full long-running database, and query it in place. It may cost you a few dollars to run your Athena query and it is not going to be super quick, but if you know what you're looking for it is amazing.
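For anyone who hasn't used it, a rough sketch of what running an Athena query from code looks like: start the query, poll until it finishes, then read the results. This assumes `athena_client` behaves like a boto3 Athena client; the function name and the `poll_seconds` parameter are made up for the example, and the database, table and output location are whatever you have set up yourself.

```python
import time


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Run one query and return the first page of results as lists of strings.

    athena_client is assumed to behave like boto3.client("athena").
    output_location is an s3:// prefix where Athena writes its result files.
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena is asynchronous: poll until the query reaches a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    # Only the first page is read here; for SELECT queries the first row
    # holds the column names. NULL cells come back as None.
    results = athena_client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

The "not super quick" part shows up in the polling loop: even small queries usually take a moment before they report success.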
S3 has often fallen into a "catch all" solution for me whenever I need to store data large enough that I don't want to keep it in a database (RDBMS or Redis).
Need to save a file somewhere? Dump it in S3. It's generally affordable (obviously dependent on scale and use), fast, easy, and super configurable.
Being able to expose something to the outside, either publicly or through a presigned URL, is a huge advantage as well.
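A small sketch of the presigned URL part, assuming `s3_client` is a boto3 S3 client. The helper name and the 15-minute default expiry are just illustrative choices. Signing happens locally with your credentials, so generating the links does not itself call S3.

```python
def presigned_links(s3_client, bucket, key, expires_seconds=900):
    """Return time-limited download and upload URLs for a single object.

    s3_client is assumed to behave like boto3.client("s3").
    """
    params = {"Bucket": bucket, "Key": key}
    return {
        # Anyone holding this URL can GET the object until it expires.
        "download": s3_client.generate_presigned_url(
            "get_object", Params=params, ExpiresIn=expires_seconds
        ),
        # Anyone holding this URL can PUT to this exact key until it expires.
        "upload": s3_client.generate_presigned_url(
            "put_object", Params=params, ExpiresIn=expires_seconds
        ),
    }
```

The upload link is what makes the "customers drop files somewhere" use case work without handing out AWS credentials.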
Off the top of my head, I think of application storage generally in this tier ordering (based on the past few years of software development, no real deep thought here):
1. General application data that needs to be read, written, and related - RDBMS
2. Application data that needs to be read and written fast, no relations - Redis
3. Application data that is mostly stored and read - S3
Replace any of those with an equivalent storage layer.
Do you need to replace your SFTP server? S3. Do you need to back up terabytes of DB files? S3. Do you need a high-performance web cache? S3. Host your SPA? S3 backs CloudFront. Shared filesystem between desktop computers? Probably a bad idea, but you can do it with S3. Need a way for customers to securely drop files somewhere? Signed S3 URL. Need to store metrics? Logs? S3. Load balancer logs? S3. And it's cheaper than an EBS volume, and doesn't need resizing every couple of quarters. And there are various storage classes that make it cheaper (Glacier) or more expensive (high performance). S3 makes a great storage backend for a lot of use cases, especially when your data is coming in from multiple regions across the globe. There are some quibbles about eventual consistency, but in general it is an easy backend to build for.
Bandwidth is free within AWS (at least within the same region), and ingress is also free.
I've used it lately in the following setup:
- things on the internet push their data into my bucket => this is mostly free (you only pay for S3 operations)
- on object creation, an event is fired and a Lambda is spawned => this is free
- the Lambda code reads the object => this is mostly free (again, you only pay for S3 operations)
- the Lambda processes the data: trims it, repackages it, compresses it => you pay for the compute
- the Lambda stores the resulting data somewhere else and deletes the S3 object => you pay for the egress
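The steps above can be sketched roughly like this. It is not my actual code, just a minimal version under some assumptions: `s3_client` behaves like a boto3 S3 client, "trimming" means stripping whitespace and dropping blank lines, and "somewhere else" is a second bucket (`dest_bucket` is a hypothetical name; if the destination is outside AWS, that is where the egress charge comes in).

```python
import gzip
from urllib.parse import unquote_plus


def process_event(s3_client, event, dest_bucket):
    """Handle an S3 ObjectCreated event: read, trim, compress, store, delete.

    s3_client is assumed to behave like boto3.client("s3").
    Returns the list of keys written to dest_bucket.
    """
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])

        raw = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Placeholder "processing": strip each line and drop empty ones.
        trimmed = b"\n".join(
            line.strip() for line in raw.splitlines() if line.strip()
        )

        dest_key = key + ".gz"
        s3_client.put_object(
            Bucket=dest_bucket, Key=dest_key, Body=gzip.compress(trimmed)
        )
        # Delete the source only after the result has been stored.
        s3_client.delete_object(Bucket=bucket, Key=key)
        written.append(dest_key)
    return written
```

In an actual Lambda, the handler would create the client once outside the function body and call `process_event(client, event, "your-destination-bucket")` from `handler(event, context)`.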
S3 is great for being able to stick files somewhere and not have to think about any of the surrounding infrastructure on an ongoing basis [1]. You don't have to worry about keeping a RAID server, swapping out disks when one fails, etc.
For static hosting, it's fine, but as you say, it's not necessarily the cheapest, though you can bring the cost down by sticking a CDN (Cloudflare/CloudFront) in front of it. There are other use cases where it really shines though.
[1]: I say ongoing basis because you will need to figure out your security controls, etc. at the beginning so it's not totally no-thought.