Amazon Simple Storage Service (S3) was one of the first cloud storage services launched by Amazon back in 2006. Offering highly durable and highly available storage, S3 allows you to easily store objects of almost any size on Amazon’s cloud. In addition to object storage, AWS also offers file and block storage along with many other services to support small companies and enterprises alike in offloading their data to the cloud. Amazon’s massive S3 outage in the US-EAST-1 region in 2017, which affected more than 150,000 websites, demonstrates just how critical S3 is to the well-being of the internet.
In this two-part series, I cover the basics of cloud storage and compare the most popular cloud storage services from the three major public cloud providers (AWS, Azure, and GCP) to help you determine which is right for you. Here, I discuss Amazon Web Services (AWS). In Part 2, I will focus on Microsoft Azure and Google Cloud storage options.
What Is Cloud Storage?
Cloud storage is any storage purchased from a third-party provider that stores, manages, and operates your data and makes it accessible over the internet. One of the main advantages of cloud storage is that you aren’t constrained by a fixed capacity (a set number of GB or TB, for example) as you are with on-premises systems. Cloud storage is also more cost efficient, since you don’t need to invest time and money purchasing, designing, and administering your own storage infrastructure (servers, network, and applications).
Using a cloud storage service offers numerous benefits:
- Durable storage: The storage is automatically replicated to different data centers—something companies traditionally needed to handle themselves.
- Cost-effective: Managing your own storage infrastructure requires a significant budget up front, while the cloud is built on a pay-as-you-go model, so you only pay for what you consume.
It’s no surprise then that thousands of companies worldwide, and enterprises in particular, have chosen the cloud migration path over the last decade, often solely for storage needs.
So, let’s talk storage options.
Cloud Storage Types
There are three different types of cloud storage: object, block, and file.
Object storage manages your data as complete objects, making it suitable for modern web applications that retrieve data via REST API calls. This type of cloud storage is often used to back up or archive files and to hold the data sets used for big data analytics.
Block storage stores data as evenly sized blocks, where each block has an “address” (a combination of block sectors and tracks). Block storage can only be used if attached to a physical or virtual server, which formats the volume with the filesystem to be used. The downside here is that a formatted block storage volume is typically attached to only one physical or virtual server at a time. On AWS, for example, this means that a block storage volume you provision is normally used from a single EC2 instance.
File storage combines elements of both object and block storage: The data is organized in files, but the storage type itself requires it be mounted as a filesystem. The advantage over block storage is that file storage volumes can be mounted to multiple servers at the same time. In terms of performance and latency, however, block storage has the upper hand.
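To make the trade-offs between the three types concrete, here is a minimal Python sketch of a decision helper. The function name and its two yes/no parameters are my own simplification for illustration, not an AWS tool or an official rule; real decisions involve more factors than these two.

```python
def suggest_storage_type(accessed_via_api: bool, shared_by_many_servers: bool) -> str:
    """Map two coarse requirements onto object, file, or block storage."""
    if accessed_via_api:
        # Applications fetch whole objects over REST calls; nothing is mounted.
        return "object"
    if shared_by_many_servers:
        # A mounted filesystem that several servers use at the same time.
        return "file"
    # A volume attached to a single server, favoring performance and latency.
    return "block"


print(suggest_storage_type(accessed_via_api=True, shared_by_many_servers=False))
print(suggest_storage_type(accessed_via_api=False, shared_by_many_servers=True))
print(suggest_storage_type(accessed_via_api=False, shared_by_many_servers=False))
```

The order of the checks mirrors the descriptions above: API access points to object storage first, shared mounting points to file storage, and everything else falls back to block storage.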
Considerations for Choosing a Cloud Storage Service
Following are some important considerations when choosing a public cloud storage service:
- Which cloud storage type will you use?
  - Object storage
  - Block storage
  - File storage
- Which use cases do you need to support?
  - Back-up and archival
  - With accompanying cloud compute services (e.g., running your own servers in the cloud)
  - Database migration (SQL vs. NoSQL)
  - Big data analysis (e.g., Hadoop, Spark, Flink)
  - Machine learning/artificial intelligence (ML/AI) workloads
- What are your security and compliance requirements? Are you bound by specific security or regulatory standards when it comes to how and where your data should be stored, for example:
  - Bring-your-own encryption (BYOE), also known as bring your own key (BYOK)
  - Long-term archiving
- What type of data connectivity do you require?
  - Dedicated connection
  - VPN over the internet
  - Public access
These are all questions your organization’s infrastructure, security, and compliance teams should be able to answer.
Once you’ve clarified your specific needs, it’s time to choose a cloud storage service. In the next section, I cover the different cloud storage services offered by AWS.
AWS Cloud Storage Services
AWS offers a range of services to cover all your cloud storage needs. In this section, I provide an overview of the four most popular storage options.
Amazon S3
Amazon S3 is AWS’ most popular cloud storage service. By nature, it is an object storage service, offering 99.999999999% durability (the “eleven 9s of durability”) and no limit on the number of objects you can store. A single object is limited to 5 TB, but it’s highly unlikely your objects will exceed that.
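To get a feel for what eleven 9s means in practice, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope calculation that assumes the durability figure applies per object per year and that objects are lost independently; the function names are mine.

```python
from fractions import Fraction

# Eleven 9s of durability: an annual loss probability of 1 in 10**11 per object.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year, treating objects independently."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(float(expected_annual_losses(10_000_000)))  # 0.0001
print(years_per_expected_loss(10_000_000))        # 10000
```

In other words, under these assumptions, storing ten million objects would mean losing a single object about once every 10,000 years on average.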
Data is organized in S3 buckets, and within a bucket you can create a folder-like hierarchy using key prefixes. Once the data has been uploaded to a bucket (via the web console or API calls with the AWS CLI or SDKs), it is automatically stored redundantly across multiple availability zones inside the same region (at least three for the standard storage class). You can also enable cross-region S3 bucket replication.
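Prefixes behave like folders only by convention: object keys are flat strings, and the "folders" appear when you list keys with a prefix and a delimiter. The toy Python function below imitates that listing behavior on an in-memory list of keys. It is an illustration of the idea, not the S3 API, and the key names are made up.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split flat keys under `prefix` into direct objects and "folder" prefixes."""
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter behaves like a subfolder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


keys = ["index.html", "img/logo.png", "img/icons/home.svg", "logs/2024/app.log"]
print(list_level(keys))                 # (['index.html'], ['img/', 'logs/'])
print(list_level(keys, prefix="img/"))  # (['img/logo.png'], ['img/icons/'])
```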
Other useful features include:
- Versioning: Allows you to keep multiple versions of your files in the same bucket; each stored version is billed as its own object.
- Lifecycle policies: Let you move your data out of S3 buckets to Glacier (long-term storage), change the S3 storage class, and even delete your data based on the constraints defined in your policies.
- Static website hosting: Supports the hosting of static HTML/CSS/JS websites, which eliminates the need for customers to provision and manage their own web servers.
- Encryption and additional security: Offers several options for server-side and client-side encryption of objects using AWS-provided or customer-provided encryption keys, as well as the option to use MFA for access to or deletion of the objects.
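As an illustration of the lifecycle policies mentioned above, here is a small Python sketch that builds a rule moving objects to Glacier and later deleting them. The helper name, the rule ID, the `logs/` prefix, and the day counts are all hypothetical values chosen for the example; the dictionary layout follows the lifecycle configuration structure that the S3 API accepts, as far as I know it.

```python
import json


def build_lifecycle_rule(rule_id, prefix, glacier_after_days, delete_after_days):
    """Build one lifecycle rule: archive matching objects, then expire them."""
    if delete_after_days <= glacier_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": delete_after_days},
    }


# Hypothetical policy: archive logs after 90 days, delete them after a year.
lifecycle_configuration = {
    "Rules": [build_lifecycle_rule("archive-logs", "logs/", 90, 365)]
}
print(json.dumps(lifecycle_configuration, indent=2))
```

If you use boto3, a dictionary like this could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`; check the current documentation before relying on it.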
S3 pricing depends on the combination of the total size of objects you’re saving in S3, the S3 storage classes you’re using, the total number of API calls to your S3 buckets, and your data transfer costs.
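A rough way to reason about these cost drivers is to add them up yourself. The sketch below does that in Python. Note that the rates are placeholder numbers I made up for the example, not actual AWS prices, and that one set of rates stands for a single storage class; look up the current price list for your region and storage class before estimating real costs.

```python
from dataclasses import dataclass


@dataclass
class S3Rates:
    """Placeholder rates in USD for one storage class; NOT real AWS prices."""
    storage_per_gb_month: float = 0.02
    per_1000_requests: float = 0.005
    transfer_out_per_gb: float = 0.09


def estimate_monthly_cost(stored_gb, requests, transfer_out_gb, rates=S3Rates()):
    """Sum the cost drivers: storage, API requests, and data transfer."""
    storage = stored_gb * rates.storage_per_gb_month
    api_calls = (requests / 1000) * rates.per_1000_requests
    transfer = transfer_out_gb * rates.transfer_out_per_gb
    return round(storage + api_calls + transfer, 2)


# 500 GB stored, 2 million requests, 100 GB transferred out in a month.
print(estimate_monthly_cost(stored_gb=500, requests=2_000_000, transfer_out_gb=100))
```

Even with made-up rates, the example shows that request and transfer charges can rival the storage charge itself, so it pays to estimate all of them.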
Amazon S3 Glacier
Previously known as Amazon Glacier, this is essentially a companion service to S3: in practice, data usually reaches Glacier through S3 lifecycle policies that move it out of S3.
A long-term cold-storage archival service, Glacier is useful for moving data you need to keep for a long time but won’t be using frequently; this is often a requirement of compliance standards (e.g., GDPR).
Unlike S3, where data lives in buckets, Glacier saves data in archives, and a single archive can be up to 40 TB. Once you have your archives ready, you can group them in vaults, which can be locked so that data cannot be modified or tampered with in any way.
Glacier bills in two ways:
- Based on the total size of data you’re archiving (far cheaper than S3 per GB)
- Based on the data you retrieve, charged per GB
If you don’t retrieve your data, you will only be charged for the storage. If, however, you need to retrieve your data fast—within minutes—you’ll pay a hefty sum.
Amazon EBS
Amazon Elastic Block Store (EBS) is AWS’ block storage service. EBS volumes can only be used by EC2 instances, where the volume is formatted with a filesystem and mounted. EBS volumes can be provisioned on SSD or standard HDD disks and come in different types:
- gp3: For common workloads that require a balance of cost and performance, gp3 volumes are ideal. They feature many improvements compared with the previous gp2 generation.
- io2: For workloads that require high IOPS, io2 lets you provision up to 500 IOPS per GB and offers a durability of 99.999%, making these drives 20 times more durable than the commodity disks you can purchase for your data center server needs.
- Throughput Optimized HDD (st1) and Cold HDD (sc1): When it comes to standard HDD disks, these volume types emphasize size and throughput, but IOPS capabilities are limited here.
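The 500:1 IOPS-to-size ratio described for io2 translates into a simple calculation. Here is a minimal Python sketch; the function name is mine, and the optional `volume_cap` parameter is a hypothetical stand-in for whatever per-volume maximum applies to your volume type, which I am deliberately not hard-coding.

```python
from typing import Optional


def max_provisioned_iops(size_gb: int, iops_per_gb: int = 500,
                         volume_cap: Optional[int] = None) -> int:
    """Upper bound on provisioned IOPS for a volume of the given size."""
    iops = size_gb * iops_per_gb
    # Apply a per-volume ceiling only if the caller supplies one.
    return min(iops, volume_cap) if volume_cap is not None else iops


print(max_provisioned_iops(100))                     # 50000
print(max_provisioned_iops(100, volume_cap=40_000))  # 40000
```

The takeaway is that small volumes limit how many IOPS you can provision, so a high-IOPS workload may need a larger volume than its data alone would require.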
Amazon EFS and Amazon FSx
AWS offers two file cloud storage services: Amazon EFS and Amazon FSx. While the two are quite similar, the main difference is that EFS provisions NFSv4 shared storage used primarily with EC2 instances running Linux, while FSx provides shared storage for Windows Server workloads or for the high-performance Lustre file system.
Both are mounted by EC2 instances and are regionally based, meaning that instances from all availability zones inside a region can mount them. This makes these services an excellent choice for highly available shared storage options in AWS Cloud.
Additional AWS Storage Services
If you need hybrid connectivity with your on-premises services, AWS Storage Gateway features integrations with data center hypervisors and even tape library backup solutions. AWS Backup, capable of utilizing all types of AWS Cloud storage, is another useful service for keeping your data secure.
Now that you know the basics of cloud storage and the main AWS storage options, stay tuned for Part 2 to find out everything you need to know about Microsoft Azure and Google Cloud storage options. And if you’re in the market for a cloud database provider, I’ll be covering the topic in depth in an upcoming series.