By Trevor Pott, IOD Expert
What will the data center of 2020 look like? In all likelihood, it will look much the same as today’s data center does, but more… cloudy. The data centers of 2020 will blur the lines between public and private cloud, and hybrid cloud will be the new normal.
The cloud in this context doesn’t refer to the public cloud exclusively. Though public cloud adoption is steadily increasing, it does not look set to kill off the private data center any time soon.
That said, public clouds seem likely to be the driver of growth in IT for the foreseeable future. Various predictions show that, by 2020, the number of workloads and the amount of storage under management in the public cloud will dwarf all of the world’s private data centers combined. None of this means the end of the private data center, though.
The growth of the public cloud simply means that we’ll be doing things in the public cloud that aren’t practical on-premises. We’ll work on datasets that don’t fit in private clouds. We’ll combine data from thousands, perhaps even millions of private data centers, and use that data in ways we still can’t imagine.
Using the public cloud, we’ll be able to stand up thousands of workloads in the blink of an eye and destroy them all at the same speed. The public cloud is about adding new capabilities to what we already do, and the data resulting from our newfound capabilities will feed into our private data centers.
Why Public Clouds
Before diving into the private data centers of 2020, it’s important to examine what the public clouds of 2020 will look like. By many estimates, they’ll be running the bulk of the world’s workloads, so the question is why.
The public cloud isn’t seeing rapid adoption because it’s cheaper than on-premises IT. This is a dangerous, decade-old myth started by marketers. Once, long ago, the public cloud was less expensive than using top-of-the-line enterprise IT gear. Today, renewed competition has driven down prices in many of the most intractable IT verticals and reshaped discussions about public cloud economics.
A public cloud provider will always be able to build and operate a data center for less money than any non-hyperscale data center operator could. They will not, however, be selling time on their cloud at cost. Predictions of radical price wars amongst public cloud providers that would drop the cost of subscribing to public clouds below the cost of standing up private infrastructure never came to pass.
So if economics aren’t really driving public cloud adoption, what is? The answer is twofold: ease of use, and proprietary services.
Ease of use drove adoption of the public cloud during the first decade of its existence. The idea that anyone with a credit card could immediately stand up workloads and start acting on them was transformative. Internal IT teams were slow to provision and slow to respond to requests for change. Public clouds were not.
Today, public clouds are increasingly attracting developers because of proprietary services. These services include IBM’s Watson, Microsoft’s Cognitive Services, Amazon’s Artificial Intelligence (AI) and Machine Learning as a Service (MLaaS) offerings, as well as Google’s various proprietary services.
These services, along with more traditional Big Data, Business Intelligence (BI), and analytics tools are Bulk Data Computational Analysis (BDCA) services. BDCA tools collectively provide the ability to chew on large and/or complicated data sets and extract meaning. They are also services that developers cannot access on-premises.
These BDCA tools are driving the creation of applications that rely on the public cloud. Some of these applications will exist exclusively in the public cloud. Some applications will have services that will also live on-premises.
Understanding why public clouds are attractive is important because they will define what private data centers will look like.
Keeping it Local
Some things just need to be on-premises. The reasons for this vary, but it’s true for a large enough number of organizations that local data centers are unlikely to die off.
Regulatory compliance will play a big role in preserving the on-premises data center. While Americans don’t look likely to face many barriers to adoption of public cloud services, regulatory differences in other jurisdictions aren’t likely to be resolvable in the near future. This will put pressure on organizations to keep certain types of data – particularly Personally Identifiable Information (PII) – close to home.
On-site requirements for data usage will also lead organizations to keep a certain number of workloads on-premises. Manufacturing equipment, for example, will likely be driven by workloads running locally. These are unlikely to be run from a public cloud instance because, for most organizations, the cost of adequate internet connectivity will simply be too high.
The ever-increasing number of sensors and Internet of Things (IoT) devices will also likely send their data to a local gateway. These gateways may do various things, from carefully checking for PII and anonymizing data to stripping out irrelevant sensor data. Ultimately, the data collected from sensors and other IoT devices will end up in the public cloud. However, as the number and type of sensors grow, it is increasingly unlikely that this data will be streamed to the public cloud without some form of pre-processing.
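To make that concrete, here is a minimal sketch of the kind of pre-processing such a gateway might perform, written in Python. The field names and the hash-based anonymization are illustrative assumptions, not a prescription for any particular IoT stack.

```python
import hashlib
import json

# Fields treated as PII, and sensor channels considered noise at the edge.
# These names are hypothetical; a real gateway would map them to whatever
# its sensors actually emit.
PII_FIELDS = {"owner_name", "owner_email", "device_serial"}
IRRELEVANT_FIELDS = {"debug_trace", "firmware_banner"}

def anonymize(value):
    """Replace a PII value with a stable, non-reversible token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

def preprocess(reading):
    """Strip irrelevant channels and anonymize PII before forwarding upstream."""
    cleaned = {}
    for key, value in reading.items():
        if key in IRRELEVANT_FIELDS:
            continue                      # drop the noise locally
        if key in PII_FIELDS:
            cleaned[key] = anonymize(str(value))
        else:
            cleaned[key] = value
    return cleaned

if __name__ == "__main__":
    raw = {"owner_email": "jane@example.com", "temperature_c": 21.4,
           "debug_trace": "0xDEADBEEF"}
    print(json.dumps(preprocess(raw)))   # only the cleaned record leaves the site
```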
Other workloads, such as those operating in hospitals, running air traffic control, or even managing a city’s street lights are considered life critical. In many jurisdictions, there are laws regarding the resiliency and availability of these workloads that are difficult if not impossible to meet with those workloads running in the public cloud.
Public cloud reliability has been patchy over the past decade, and an SLA cannot restore someone’s life. Even if the public cloud providers could guarantee life-critical levels of reliability, the intermediate internet service providers won’t.
These intermediate internet service providers are a very real problem today, and that does not look likely to change by 2020. The connectivity they offer is prohibitively expensive, and it places a hard limit on which workloads can be moved to the public cloud and which cannot. Some workloads are highly latency sensitive, while others gobble network throughput.
For all these reasons, and more besides, organizations around the world will be reluctant to give up their on-premises data centers. That said, the number of workloads which simply must be run on-premises is shrinking.
The Back and Forth
Today, we are seeing the rise of private and hybrid cloud solutions. There are various contenders, and it will take a few years to see who wins, but what is clear is that by 2020 today’s virtualization stacks will be replaced by fully-fledged hybrid clouds.
These hybrid clouds already come with all the ease-of-use capabilities that made the public cloud so attractive. These include self-service portals, RESTful APIs, YAML context-aware workload instantiation and more. Many cloud software developers are even working on full integration of composable infrastructure solutions, meaning that YAML context could be augmented or replaced by deep integration with desired state configuration tools (Puppet, Chef, Ansible, Salt, and so forth).
This code-addressable composable infrastructure is what makes public clouds able to start up thousands of workloads in a few lines of code and tear them back down just as easily. The challenge private cloud solutions face is that, much more so than public clouds, private clouds will have to provide support for legacy workloads that may never be fully composable.
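As a rough illustration of what “a few lines of code” looks like in practice, the sketch below uses AWS and the boto3 SDK as one example of a code-addressable cloud; the image ID and instance type are placeholders, and any other provider’s API would follow the same pattern.

```python
import boto3

# A sketch of standing workloads up and tearing them down again in code.
# Assumes AWS credentials are already configured; the AMI ID and instance
# type are placeholders, not recommendations.
ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder image
    InstanceType="t3.micro",
    MinCount=10,                        # ten identical, disposable workers
    MaxCount=10,
)

for instance in instances:
    instance.wait_until_running()

# ... run the job ...

for instance in instances:
    instance.terminate()                # tear it all back down just as easily
```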
Those 30- and 40-year-old applications that organizations haven’t gotten around to recoding today won’t be recoded by 2020, and they’ll likely still be operating on on-premises infrastructure. The difference is that the on-premises infrastructure will no longer be simple virtualization; it will be a cloud. And the organization’s developers will be trying to integrate even those ancient legacy applications into the modern solutions they’ve built with a decade of exposure to the agility of the public cloud.
Composable Workloads
Those workloads which are fully composable are the easiest to move back and forth between a public cloud provider and on-premises infrastructure. The only part of a composable workload that is stateful and persistent is the data; everything else is regenerated regularly. This means that workloads, even those which consist of multiple services, can be stood up or torn down easily.
Consider a workload consisting of a load balancer, web servers, front-end databases and back-end databases. This workload could be broken up such that a load balancer, some web servers and some front-end databases run on each of the major public clouds, with the back-end database running on-premises.
A relatively simple configuration could ensure that PII that needs to stay on-premises is never copied to the front-end database instances in the public clouds. Instead, advanced privacy tools like data masking can be employed to ensure that the front-end databases operating in the various public clouds contain only anonymized data.
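A minimal sketch of that masking step might look like the following. The column names and the hash-based pseudonyms are invented for the example; a real deployment would lean on dedicated masking tools with per-column policies.

```python
import hashlib

# A sketch of masking rows before replication to public-cloud front ends.
# Column names are invented for this example.
def pseudonym(value):
    """Stable pseudonym, so masked rows can still be joined and counted."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def mask_row(row):
    masked = dict(row)
    masked["customer_name"] = "customer-" + pseudonym(row["customer_name"])
    masked["email"] = pseudonym(row["email"]) + "@masked.invalid"
    # Non-identifying columns such as order totals pass through untouched.
    return masked

on_prem_row = {"customer_name": "Jane Doe",
               "email": "jane@example.com",
               "order_total": 119.95}
print(mask_row(on_prem_row))   # this is the only version the public cloud sees
```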
The web servers running on each of the public clouds could execute applications that tie into the proprietary BDCA tools provided by each of the public cloud providers and feed the results back to the on-premises back-end database in the form of metadata.
The workloads in the public cloud could be created only when there was analysis to run and be destroyed when the analysis was finished. The dataset they acted upon could be generated on the fly as a subset of a privacy-sensitive on-premises database.
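Pulling those pieces together, a worker in one public cloud might look something like the sketch below, which uses Amazon Comprehend via boto3 purely as an example of a proprietary BDCA service. The review text and the metadata schema are invented for illustration.

```python
import boto3

# A sketch of a public-cloud worker tying into a proprietary BDCA service
# and producing metadata for the on-premises back-end database.
comprehend = boto3.client("comprehend", region_name="us-east-1")

def analyze(anonymized_reviews):
    """Run sentiment analysis and return metadata to feed back on-premises."""
    results = []
    for text in anonymized_reviews:
        response = comprehend.detect_sentiment(Text=text, LanguageCode="en")
        results.append({
            "sentiment": response["Sentiment"],            # e.g. "POSITIVE"
            "confidence": max(response["SentimentScore"].values()),
        })
    return results

# The metadata, not the source data, is what flows back to the private cloud.
metadata = analyze(["The new dashboard is far easier to use than the old one."])
print(metadata)
```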
These are the sorts of operations that developers have been doing in the public cloud for years using containers: stateless workloads that perform specific tasks and which are then discarded. The data centers of 2020 will be full of them. While traditional virtual machines will be used for classic, stateful workloads, these workloads will serve as the nucleus of a maelstrom of constantly spawning and terminating stateless containers.
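The container version of that pattern is short enough to show in full. The sketch below uses the Docker SDK for Python as one concrete way to spawn and discard a stateless worker; the image and command are placeholders.

```python
import docker

# A sketch of the "spawn, do one task, discard" pattern. The image name and
# command are placeholders standing in for a real analysis job.
client = docker.from_env()

output = client.containers.run(
    image="python:3.11-slim",
    command=["python", "-c", "print('analysis complete')"],
    remove=True,          # the container is deleted the moment it exits
)
print(output.decode())    # only the result survives; the worker does not
```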
Hybrid Cloud in a Can
The traditional data center approach of buying storage, networking and compute separately and then having systems administrators assemble, configure and manage each component is coming to a rapid end. Organizations are no longer interested in paying for administrators whose sole focus is to keep the lights on.
Solutions are emerging which provide private and hybrid clouds as turnkey appliances. Microsoft’s Azure Stack is one such example, while OpenStack and its derivatives are another category. VMware-based solutions such as VxRail are regularly evolving, and a host of non-OpenStack, KVM-based solutions are competing for relevance as well.
These are the first of a new generation, and they have a lot of growing up to do. Many of these offerings lack enterprise data services, and most lack any sort of solution that can provide the stateful, reliable storage, both on-premises and in the public cloud, that hybrid cloud operation requires.
The data centers of 2020 will require storage solutions that span multiple clouds, both public and private. To take full advantage of the IT solutions of the day, our data will have to be made available to workloads operating in all the major public clouds as well as our on-premises data centers. Our storage will need to cope with failed internet links and our workloads will have to respawn in different data centers to route around failures.
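The respawn-elsewhere part of that picture can be sketched in a few lines. The launch_worker() helper and the site names below are hypothetical stand-ins for whatever provisioning API each data center actually exposes.

```python
# A sketch of routing around a failed data center by respawning a workload
# elsewhere. launch_worker() and the site names are hypothetical.
SITES = ["on-prem-primary", "public-cloud-us-east", "public-cloud-eu-west"]

def launch_worker(site):
    """Hypothetical helper: provision the workload at `site` or raise."""
    if site == "on-prem-primary":
        raise ConnectionError(f"{site}: internet link down")  # simulated failure
    return f"workload running in {site}"

def respawn_anywhere():
    """Try each data center in turn until the workload comes up somewhere."""
    for site in SITES:
        try:
            return launch_worker(site)
        except ConnectionError:
            continue                      # route around the failed link
    raise RuntimeError("no data center available")

print(respawn_anywhere())   # -> workload running in public-cloud-us-east
```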
The technology to do all these things exists today. It is even commercially available. It simply hasn’t been packaged together, tested, and evolved to be as easy to use as today’s public cloud services. That is what is going to happen over the next few years. Then we’ll really see what clouds can do.
= = =
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He is cofounder of eGeek Consulting Ltd. and splits his time between systems administration, consulting and technology writing. As a consultant he helps Silicon Valley start-ups better understand systems administrators and how to sell to them.