Today, it is critical to learn and understand the structure and features available in a cloud computing environment, as it can provide various benefits and lower your costs. Nevertheless, choosing a provider that best suits your needs from the world’s top three cloud computing vendors—Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)—is not an easy task.
Your choice will depend on many factors, such as the type of services offered, their reliability and performance, SLAs, and migration support, not to mention how you intend to use the cloud.
In this multi-part series, we’ll discuss how to tackle this decision for six different purposes: databases, containers, serverless, storage, PaaS/compute, and enterprise expansion. Our goal is for the relevant decision makers—data scientists, data architects, software engineers, and IT leaders—to be able to choose the cloud provider that is the right option for their business needs.
Here, I’ll cover different cloud providers for databases, starting with some background on data management and what a cloud database is, followed by an overview of the top cloud providers’ (AWS, GCP, Azure) database services, their features, and benefits.
It’s All About the Data
Data management is at the heart of any application and plays a crucial role in how it functions. As a result, the performance of the client application is directly tied to the performance of its data layer.
There are a variety of ways that data can be stored in different database types. So, when you are moving to the cloud, the database solutions you have access to become a crucial factor due to the issues of global replication, latency, read/write performance, and, most importantly, costs. When comparing different options, you also need to consider the ease of migrating your current cloud or on-premises database, as well as the support offered.
A cloud database is a scalable, highly available database offered as a cloud computing service and accessed through the cloud platform UI or vendor APIs. It is typically offered by the cloud provider as a fully managed service, i.e., database-as-a-service (DBaaS). Alternatively, users can run and manage the database themselves on a cloud virtual machine, much like a traditional database that happens to be hosted on a cloud platform. The DBaaS model is usually the more valuable one when building software products on cloud infrastructure: on-demand self-service is a major characteristic of cloud computing, so DBaaS is the natural way to leverage the flexibility and agility of cloud services.
Advantages of Using Cloud Databases
If you have doubts about a cloud database being right for your organization, here are the main features and benefits of cloud databases to help you finalize the decision:
- Ease of use and provisioning
- Fully managed by the cloud provider
- Agility and faster time to market
These capabilities allow you to focus on your application and on getting it to market, rather than on managing on-premises database servers.
In just a few steps, you can deploy and vertically scale a cloud database through either the platform UI or a vendor-provided API. Additionally, organizations can take advantage of a cloud database without having to procure hardware, install database software, or deal with patching, as the cloud provider takes care of all of this: installations, patches, upgrades, and data backups.
Types of Cloud Databases
The following are the three main categories of cloud-managed databases:
Relational Databases (SQL)
These are the cloud versions of traditional relational databases, which let you organize and store your data in related tables; the relationships between tables and the field types are defined in the database schema. SQL (Structured Query Language) is used to write and query data in relational databases. Relational databases have been in use since the 1970s, with MySQL being the most popular open-source option today. The most common use case for SQL-based databases is to serve as the transactional database for an enterprise or web application.
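To make the idea of related tables and a schema concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a local stand-in for a managed relational engine, and the `customers` and `orders` tables, along with both function names, are hypothetical examples of mine rather than anything provider-specific.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
        conn.executemany(
            "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
            [(1, 1250), (1, 800)],
        )
    return conn


def total_spent_cents(conn: sqlite3.Connection, customer_name: str) -> int:
    """Join the two tables and sum one customer's order totals."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(o.total_cents), 0)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        WHERE c.name = ?
        """,
        (customer_name,),
    ).fetchone()
    return row[0]
```

Calling `total_spent_cents(build_demo_db(), "Ada")` adds up the two orders through the join. Against a managed service you would swap sqlite3 for the driver of the engine you selected, but the schema and query ideas carry over.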
Non-Relational Databases (NoSQL)
Non-relational databases (a.k.a. NoSQL databases) differ from traditional relational databases in that they provide a dynamic and flexible (schemaless) mechanism for storing large volumes of data in a variety of data models, such as documents. Furthermore, NoSQL databases typically distribute the data across multiple machines automatically, allowing you to scale out the database’s capacity. On the other hand, unlike relational databases, NoSQL databases often compromise on ACID properties (atomicity, consistency, isolation, durability). MongoDB, Apache Cassandra, and Couchbase are among the most popular NoSQL databases today.
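The two ideas in this description, schemaless documents and automatic distribution across machines, can be sketched in a few lines. This is a toy in-memory illustration under my own assumptions: the `ShardedDocumentStore` class is hypothetical, each "shard" is just a dictionary standing in for a separate machine, and real NoSQL engines use more sophisticated placement schemes than a plain hash modulo.

```python
import hashlib


class ShardedDocumentStore:
    """Toy document store: each dict in self.shards stands in for one machine."""

    def __init__(self, shard_count: int = 3) -> None:
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Hash the key so that documents spread across shards deterministically.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, document: dict) -> None:
        # No schema check: documents may have completely different fields.
        self._shard_for(key)[key] = document

    def get(self, key: str):
        return self._shard_for(key).get(key)


store = ShardedDocumentStore()
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}})
```

Note that the two documents have different shapes and neither required a schema change, which is the flexibility the paragraph above refers to.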
Data Warehouses
Typically, a data warehouse is used to store large amounts of historical data (Big Data) from different sources within a business, mainly to power business intelligence and analytic workloads. The main reasons to have a data warehouse are to separate analytics processing from transactional databases for the sake of performance and to aggregate the data from multiple databases and applications in a central store. The most popular cloud data warehouse solutions are Google BigQuery, Amazon Redshift, and Microsoft Azure Synapse (formerly SQL Data Warehouse).
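As a rough sketch of the "aggregate from multiple sources into a central store" idea, the example below loads rows from two hypothetical transactional systems into one table and runs an analytic query there. The source data, table layout, and function names are all made up for illustration, and sqlite3 merely stands in for a real warehouse engine.

```python
import sqlite3

# Hypothetical exports from two separate transactional systems:
# (month, region, amount in whole currency units).
WEB_ORDERS = [("2024-01", "EU", 120), ("2024-01", "US", 80)]
STORE_ORDERS = [("2024-01", "EU", 40), ("2024-02", "US", 60)]


def load_warehouse(*sources) -> sqlite3.Connection:
    """Copy rows from every source into one central 'sales' table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (month TEXT, region TEXT, amount INTEGER)")
    with warehouse:
        for rows in sources:
            warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return warehouse


def revenue_by_region(warehouse: sqlite3.Connection) -> dict:
    """Analytic query that runs on the central store, not on the source systems."""
    cursor = warehouse.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())
```

Because the aggregation runs against the central copy, the transactional systems that produced the rows are not slowed down by the analytic query.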
Top Cloud Database Services
Many cloud databases are available that support the storing and accessing of data in the cloud. Let’s look at the cloud databases offered by each top cloud provider: Amazon, Google, and Microsoft.
Amazon Relational Database Service (RDS)
This is the managed relational database cloud service on the AWS platform. It supports the most widely used relational database engines, including MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server. Its core component is the DB instance, configured with a selected DB engine. The figure below shows how easy it is to provision and configure an RDS instance in the AWS Management Console in just a few clicks:
Figure 1: New AWS RDS database configurations. (Source: Amazon)
Amazon RDS gives you the ability to set up a primary instance and synchronous secondary instances to achieve a highly available database service. You can also scale read traffic and improve query performance by using MySQL, MariaDB, or PostgreSQL read replicas.
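On the application side, using read replicas usually means sending writes to the primary and spreading reads over the replicas. The sketch below shows one simple way to do that. The `ReplicaAwareRouter` class and the endpoint names are hypothetical, and the check for `SELECT` is a deliberate simplification rather than a complete SQL classifier.

```python
import itertools
from typing import Sequence


class ReplicaAwareRouter:
    """Pick a database endpoint per statement: writes go to the primary,
    reads rotate across the read replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads simply fall back to the primary.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Simplification: treat only statements starting with SELECT as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


# Hypothetical endpoint names, for illustration only.
router = ReplicaAwareRouter(
    primary="primary.db.example.internal",
    replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
```

One design caveat: read replicas are typically updated asynchronously, so a read that must see a write made a moment ago is safer sent to the primary.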
Amazon DynamoDB
Here, you have a fully managed NoSQL database cloud service on the AWS platform. It gives applications a flexible (schemaless) data model and delivers reliable performance, even for large-scale applications. The table is the main building block of DynamoDB: it stores a collection of items, each composed of one or more attributes, and you typically use the AWS SDKs to read, write, modify, and query those items. Furthermore, DynamoDB has built-in backup, restore, and security features.
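To show what "items composed of attributes" looks like in practice, here is a small sketch that converts a plain Python dictionary into the typed attribute-value format used by DynamoDB's low-level API (`S` for strings, `N` for numbers sent as strings, `BOOL`, `NULL`, `L` for lists, `M` for maps). The helper names, the `Users` table, and its attributes are hypothetical. The sketch only builds the request parameters; the actual call would go through an AWS SDK and is not shown here.

```python
def to_attribute_value(value):
    """Convert a Python value to a DynamoDB-style typed attribute value.
    Covers only a subset of types; numbers are limited to ints for simplicity."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_put_item_request(table_name: str, item: dict) -> dict:
    """Build the parameters for a PutItem-style request (hypothetical helper)."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }


request = build_put_item_request(
    "Users",  # hypothetical table name
    {"user_id": "u-1", "age": 30, "active": True, "tags": ["admin"]},
)
```

In day-to-day code, the higher-level SDK interfaces usually perform this conversion for you; spelling it out just makes the item and attribute model easier to see.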
Amazon Redshift
A simple, fully managed, scalable, and cost-effective cloud data warehouse service, Redshift can power all of your analytic workloads. It provides an easy way to create a warehouse cluster into which you can load your data; you can then access and analyze that data using a SQL client or the console query editor.
Amazon DocumentDB
This is the managed AWS database service that is MongoDB-compatible, allowing you to easily migrate applications that rely on MongoDB with the same code. You can deploy a DocumentDB cluster with multiple instances in different Availability Zones in just a few steps, as shown below:
Figure 2: New Amazon DocumentDB cluster set-up. (Source: Amazon)
Google Cloud SQL
This is a fully managed RDBMS instance based on MySQL, PostgreSQL, or SQL Server on the Google Cloud Platform. It comes with scalability, reliability, security, and automated replication, as well as built-in backup features. Cloud SQL can be accessed from any external application, and GCP services such as App Engine or Google Kubernetes Engine (GKE) applications can conveniently connect to it as well.
The following figure shows the different options you have on the GCP console when creating a new SQL instance:
Figure 3: Types of SQL instances in GCP. (Source: Google)
Google Cloud Spanner
Cloud Spanner is considered to be the next generation of Google’s cloud relational databases: it provides SQL features but, most importantly, offers massive scalability and high performance with strong consistency and up to 99.999% availability. It is also fully managed by GCP and can be accessed from the GCP console.
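It can help to translate an availability percentage such as 99.999% into the downtime it still allows. The short calculation below does that. It is plain arithmetic rather than a statement of any provider's SLA terms, and the function name is my own.

```python
def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year that a given availability level still permits."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)


five_nines = max_downtime_minutes_per_year(99.999)  # roughly 5.3 minutes
four_nines = max_downtime_minutes_per_year(99.99)   # roughly 52.6 minutes
```

Each additional nine cuts the permitted downtime by a factor of ten, which is why the step from four to five nines is a meaningful difference for always-on applications.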
Google Cloud Datastore
Here, we have a managed NoSQL database service that provides capabilities like automatic scaling, sharding/replication, ACID transactions, a schemaless data model, and a powerful query engine. Google has been offering this for a while, and it is mainly used in App Engine applications, although Google now recommends Firestore, which it describes as the newest version of Datastore.
Google BigQuery
A Big Data warehouse and analytics platform that is highly available, scalable, and fully managed by Google in terms of the underlying infrastructure, BigQuery enables organizations to store and analyze terabytes of data in one place using familiar SQL. You can also organize your data into multiple datasets, each containing many tables. Here’s an example of a BigQuery dataset with one table:
Figure 4: Example of a Google BigQuery dataset. (Source: Google)
Azure SQL is a group of SQL-related services built on top of the SQL Server database engine in Azure cloud. It includes the following options as seen in the screenshot below:
Figure 5: SQL deployment options in the Azure portal. (Source: Microsoft)
Azure SQL Database
A fully managed Microsoft SQL Server database, this option comes with full database management capabilities and built-in cloud scalability, high availability, restore/backups, and automated patching and upgrading. The Azure platform manages the underlying infrastructure for you.
Azure SQL Managed Instance
A cloud-managed SQL Server (Enterprise Edition) instance, this is seen as the Azure cloud version of the on-premises SQL Server instance; thus, Azure natively supports the migration of an on-premises SQL Server to the cloud-managed instance.
SQL Server on Azure VMs
One can argue this particular offering is far less interesting than the fully managed ones, as Azure provides the SQL virtual machines (VMs) for you and leaves all administrative operations in your hands.
Microsoft’s relational cloud database offering is centered on its popular on-premises enterprise SQL Server. As a matter of fact, Microsoft releases its patches and feature updates to Azure SQL services ahead of its on-premises-based SQL Server product. This indicates how important it is for Microsoft to apply the latest capabilities to the Azure cloud SQL in order to stay on top of the cloud provider list.
Azure Cosmos DB
This is a database service that is globally distributed across Azure regions; data is replicated so that it sits close to the application’s end users. Furthermore, it is a multi-model database that exposes several APIs, including MongoDB, Cassandra, and SQL.
The two key features of Cosmos DB are high availability, with five nines (99.999%) for data reads and writes, and elastic scaling of throughput and storage. Microsoft also uses it internally for its own products, such as Skype.
Azure Synapse (formerly SQL Data Warehouse)
Azure Synapse is the Azure cloud service that provides both data warehousing and Big Data analytics in one place. It enables you to ingest large volumes of data from a variety of sources and store (warehouse) them in relational tables; you can then explore, analyze, and visualize that data by leveraging Synapse’s analytic capabilities. For instance, Azure Synapse Studio provides the ability to develop SQL scripts and Spark jobs.
The service lets you build large-scale, end-to-end data analytics solutions using components such as Synapse SQL, Spark, Synapse Pipelines, and Studio.
Move Your Database to the Cloud
Leading cloud database providers are working hard to make database migration as easy and as smooth as possible. Still, you should consider the following factors before migrating your on-premises database to a managed cloud database:
- Database compatibility: Carefully evaluate the compatibility of the cloud database you are considering with your current on-premises version to eliminate the risk of incompatible versions on the cloud.
- Data movement: The cloud provider should have a migration service to physically import the data from your on-premises database management system, as the process can be risky and time-consuming.
- Cost: Perform a cost analysis of the different cloud databases to make sure you choose the offering that is most cost-effective for your needs.
- Code: Evaluate the required application code changes and plan the functionality, performance, and security testing ahead of the migration process.
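For the cost factor in particular, even a rough model can show how the cheapest option changes with your workload. The sketch below compares two offerings using placeholder prices that I invented for illustration; they are not real vendor rates, and the `Offering` class and both functions are hypothetical. A real analysis would also account for items such as I/O, backups, and data transfer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Offering:
    name: str
    instance_hourly_usd: float   # placeholder rate, not a real price
    storage_gb_month_usd: float  # placeholder rate, not a real price


def monthly_cost(offering: Offering, storage_gb: float, hours: float = 730) -> float:
    """Estimate one month of compute plus storage (730 hours is about one month)."""
    return offering.instance_hourly_usd * hours + offering.storage_gb_month_usd * storage_gb


def cheapest(offerings: Sequence[Offering], storage_gb: float) -> Offering:
    """Return the offering with the lowest estimated monthly cost."""
    return min(offerings, key=lambda o: monthly_cost(o, storage_gb))


# Made-up numbers: A has cheaper compute, B has cheaper storage.
candidates = [
    Offering("Offering A", instance_hourly_usd=0.10, storage_gb_month_usd=0.20),
    Offering("Offering B", instance_hourly_usd=0.15, storage_gb_month_usd=0.05),
]
```

With these placeholder rates, `cheapest(candidates, 100)` picks Offering A while `cheapest(candidates, 500)` picks Offering B, which is a reminder to run the comparison with your own expected data volume rather than a generic benchmark.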
Note: It’s always helpful to read up on specific use cases out there when considering your own move to the cloud. For example, AWS provides a good overview of how to migrate your on-premises Oracle Database to the AWS Cloud. Check out “Strategies for Migrating Oracle Databases to AWS” for more details on the different methods, services, and tools to achieve this particular migration.
Choosing the right database option for their applications has always been a challenge for organizations, and this is no different in the cloud computing era. Companies usually don’t choose cloud providers just for the sake of a database, but this category is becoming more and more important due to the issues of global replication, latency, and read/write speeds, as well as costs.
This article summarized the top cloud databases and gave an overview of the different DBaaS options and considerations. Still, the best way to find the right fit for your company is to try out each offering for yourself.