The impact of AI on data centres

Before we start, let's get pedantic: at this point in time artificial intelligence is a purely theoretical concept. True AI, a sentient computer capable of initiative and human interaction, remains within the realm of science fiction. The AI research field is full of conflicting ideas, and it’s not clear whether we can actually build a machine that can replicate the inner workings of the human brain.

And yet, if you work in the IT industry, you’ll have seen proclamations that one product or another delivers AI functionality. Just a few years ago, the same functionality would have been called data analytics.

That said, machine learning and related techniques are already producing some impressive results, so this post will look at the potential near-future implications of AI research – if we’re being optimistic.

In the data centre

The impact of AI on data centres can be divided into two broad categories – the impact on hardware and architectures, as users start adopting AI-inspired technologies, and the impact on the management and operation of the facilities themselves.

We’ll start with the first category: it turns out that machine learning and services like speech and image recognition require a new breed of servers, equipped with novel components such as Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs). All of these require massive amounts of power, and produce massive amounts of heat.

Nvidia, the world’s largest supplier of graphics chips, has just announced DGX-2, a 10U box for algorithm training that includes 16 Volta V100 GPUs along with two Intel Xeon Platinum CPUs and 30TB of flash storage. DGX-2 delivers up to two petaflops of compute, and consumes a whopping 10kW of power – more than an entire 42U rack of traditional servers.
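To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (10kW, 10U, a 42U rack) and assumes, unrealistically, that a rack holds nothing but DGX-2 boxes, with no space or power set aside for switches or power distribution.

```python
RACK_HEIGHT_U = 42      # standard rack height used in the comparison above
DGX2_HEIGHT_U = 10      # DGX-2 chassis height
DGX2_POWER_KW = 10.0    # DGX-2 power draw quoted by Nvidia


def power_density_kw_per_u(power_kw: float, height_u: int) -> float:
    """Power drawn per rack unit of space occupied."""
    return power_kw / height_u


# How many whole DGX-2 boxes fit into one rack, and what they would draw together
boxes_per_rack = RACK_HEIGHT_U // DGX2_HEIGHT_U
rack_power_kw = boxes_per_rack * DGX2_POWER_KW

print(power_density_kw_per_u(DGX2_POWER_KW, DGX2_HEIGHT_U))  # 1.0 kW per U
print(boxes_per_rack, rack_power_kw)                         # 4 boxes, 40.0 kW
```

In other words, a rack filled with these machines would draw around four times as much as the single box that already outstrips a traditional rack.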

And Nvidia is not alone in pushing the envelope on power density: DGX-2 is actually a reference design, and server vendors have been given permission to iterate and create their own variants, some of which might be even more power-hungry. Meanwhile, Intel has just confirmed rumours that it’s working on its own data centre GPUs – expected to hit the market in 2020.

As power densities go up, so does the amount of heat that needs to be removed from the servers, and this will inevitably result in growing adoption of liquid cooling. Dumping $200,000 worth of equipment into a tub of mineral oil or bringing water pipes into the rack might not seem like a good idea today, but these approaches might offer the only way to cool servers of the future.

As a consequence, AI research will require data centres engineered for higher power densities, with additional cooling and very, very strong floors.

For the data centre

But machine learning is also useful in the management of the data centre itself, where it can help optimize energy consumption and server use.

For example, an algorithm could spot under-utilized servers, automatically move their workloads elsewhere, and then either switch off the idle machines to conserve energy, or rent them out as part of a cloud service, creating an additional revenue stream.
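The consolidation idea can be sketched in a few lines of Python. This is a deliberately simple greedy heuristic, not machine learning and not any vendor's actual method: the `Server` class, the `consolidate` function and the 20 and 80 percent thresholds are all assumptions made up for illustration. A real system would predict future load rather than look at a single snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: float                                   # e.g. CPU cores (assumed > 0)
    workloads: dict = field(default_factory=dict)     # workload name -> load

    @property
    def load(self) -> float:
        return sum(self.workloads.values())

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def consolidate(servers, threshold=0.2, ceiling=0.8):
    """Empty servers running below `threshold` utilization, if every one of
    their workloads fits elsewhere without pushing a target above `ceiling`.
    Returns the names of servers left with nothing to run."""
    for donor in sorted(servers, key=lambda s: s.utilization):
        if not donor.workloads or donor.utilization >= threshold:
            continue
        # Only servers that are already running something may receive work
        targets = [s for s in servers if s is not donor and s.workloads]
        extra = {t.name: 0.0 for t in targets}        # load planned per target
        plan = {}
        for workload, load in donor.workloads.items():
            # Prefer the busiest target that still has headroom
            for t in sorted(targets, key=lambda s: s.load + extra[s.name], reverse=True):
                if t.load + extra[t.name] + load <= ceiling * t.capacity:
                    plan[workload] = t
                    extra[t.name] += load
                    break
        # Commit only if every workload found a new home
        if len(plan) == len(donor.workloads):
            for workload, t in plan.items():
                t.workloads[workload] = donor.workloads.pop(workload)
    return [s.name for s in servers if not s.workloads]


fleet = [
    Server("A", 16, {"web": 8.0}),
    Server("B", 16, {"batch": 2.0}),
    Server("C", 16, {"cache": 1.0}),
]
print(consolidate(fleet))  # ['B', 'C']
```

Servers B and C end up empty, so they could be switched off or offered to other users, while A carries all three workloads at roughly 69 percent utilization.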

Google has famously claimed that it used AI to reduce its data centre Power Usage Effectiveness (PUE) overhead by 15 percent, saving millions on electricity. While the company is reluctant to share this technology, other businesses are bringing similar capabilities mainstream.
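For context, PUE is the total energy used by a facility divided by the energy delivered to its IT equipment, so 1.0 is the ideal and anything above that is overhead (cooling, power conversion and so on). The sketch below shows how the figure is calculated and what a 15 percent cut looks like, assuming the cut applies to the overhead portion above 1.0. The meter readings are invented for illustration and are not Google's numbers.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh


def reduce_overhead(pue_value: float, fraction: float) -> float:
    """Shrink only the overhead part of a PUE figure (the part above 1.0)."""
    overhead = pue_value - 1.0
    return 1.0 + overhead * (1.0 - fraction)


# Hypothetical readings: 1,200 kWh drawn by the facility, 1,000 kWh reaching IT kit
before = pue(1200.0, 1000.0)
after = reduce_overhead(before, 0.15)

print(f"{before:.2f} -> {after:.2f}")  # 1.20 -> 1.17
```

A shift from 1.20 to 1.17 looks small on paper, but at the scale of a large operator's electricity bill it adds up quickly.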

American software vendor Nlyte has just partnered with IBM to integrate Watson – perhaps the most famous ‘cognitive computing’ product to date – into its Data Centre Infrastructure Management (DCIM) products.

“Behold, a new member of the data centre team, one that never takes a vacation or your lunch from the breakroom,” quipped Amy Benett, North American marketing lead for Watson IoT.

Beyond management, AI could improve physical security by tracking individuals throughout the data centre using CCTV, and alerting its masters when something looks out of order.

I think it’s a safe bet that every DCIM vendor will eventually offer some kind of AI functionality. Or at least something they call AI functionality.

Source: virtusdatacentres