Seconds matter when mission-critical systems go down: recovery can take hours or even days to restore the entire infrastructure, damaging reputation and the ability to trade. Yet many data centres are failing to get to grips with thermal issues, putting their facilities at risk, warns Vertiv’s head of data centre optimisation, EMEA, Simon Brady.
“When carrying out surveys in 2017, I found thermal issues in virtually every case – in fact, it was the number one issue,” comments Brady. “It is rare for me to find a temperature underneath a raised floor anywhere near where it should be.”
Brady points out that there is a lack of knowledge in the sector and many data centre operators do not have an accurate picture of their facility’s thermal status.
“With 29% of outages due to thermal issues, data centres need to improve their thermal strategy,” comments Brady.
“If equipment gets too hot, it can lead to thermal overload. It is not an equipment failure, per se, or a software issue, it is simply that part of the room is too warm. This is 100% preventable… thermal management and general best practice in the white space is a huge issue.”
It is not just overheating that poses a problem; many data centres are also overcooling, leading to wasted energy and higher costs.
Brady points out that thermal utilisation in data centres is typically less than 38%. “This stems from the design of the data centre from day one… there is a huge over-estimation of capacity,” he says.
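Thermal utilisation here can be read as the IT heat load as a fraction of installed cooling capacity. A minimal sketch of the arithmetic, using invented figures chosen only to match the typical level Brady cites:

```python
# Hypothetical illustration of thermal utilisation: the IT heat load
# divided by installed cooling capacity. The figures are invented.

def thermal_utilisation(it_load_kw, cooling_capacity_kw):
    """Fraction of installed cooling capacity actually absorbed by the IT load."""
    return it_load_kw / cooling_capacity_kw

# A 380 kW IT load served by 1,000 kW of installed cooling
# corresponds to 38% utilisation.
print(thermal_utilisation(380, 1000))  # -> 0.38
```

The gap between the two numbers is the over-estimation of capacity Brady describes: cooling plant sized on day one for a load that never materialised.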
Brady advises that effective control and monitoring systems can help optimise temperatures and avoid knee-jerk reactions. A lack of data can lead to either overcooling or undercooling if operators have not fully understood the issues in their data centre.
One strategy is to use the services of a third party to provide remote diagnostics and preventive monitoring for thermal management equipment, with the aim of delivering increased uptime and operational efficiency. Data can be transferred from equipment to allow experts to gain real-time insight and quickly identify, diagnose and resolve any irregularities that arise in operation.
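As an illustration of the kind of rule such a monitoring service might apply (the thresholds, sensor names and function are hypothetical, not a description of any vendor's actual system), a minimal sketch:

```python
# Minimal sketch of threshold-based thermal monitoring.
# The 18-27 degC band follows the ASHRAE-recommended inlet range;
# the sensor readings and flagging logic are illustrative only.

def flag_irregularities(readings, low=18.0, high=27.0):
    """Return (sensor_id, temp) pairs that fall outside the allowed band."""
    return [(sid, t) for sid, t in readings if t < low or t > high]

readings = [("rack-01", 22.5), ("rack-02", 29.1), ("rack-03", 16.8)]
alerts = flag_irregularities(readings)
# rack-02 is running too warm; rack-03 is being over-cooled
print(alerts)
```

A real remote-diagnostics service would add trending and predictive analysis on top of simple band checks, but the principle, spot the irregularity before it becomes an outage, is the same.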
There is a strong business case for tackling the problem; Brady points out that over 35% of data centre energy use is attributed to cooling: “After the IT load, cooling is the number one consumer of energy. Data centres need to understand what the best practices are. A number of standards are out there to help.
“The EU Code of Conduct for Data Centres offers lots of hints and tips, for example, and it is free to download. As I walk around data centres, I see problems all the time…sometimes I think there is a bit of apathy: ‘It’s not broke – I don’t have to fix it.’
“Data centres need to make a plan and tackle the small things, such as blanking off; dealing with the small things will enable you to take on the bigger ones: raising overall temperatures in your cold aisles, raising chilled-water temperatures on your chiller and implementing free cooling. You cannot do these things until you have tackled the basics.”
Another major issue facing data centres is a fundamental skills gap in the sector, which urgently needs to be addressed.
“There is a tendency to pigeonhole people,” comments Brady. “We have some ‘rocket science’ clever thermal people, but they don’t know about power, and we have power people who don’t know about thermal management. There needs to be a wider view.
“We are trying to bring people up to at least a minimum level through our training programmes, but our industry is desperately short of electrical and mechanical engineers. There is an urgent need to replace those in the industry who are retiring, so we are looking to develop apprentice programmes – taking people straight out of college, university and even school. I think we should be encouraging the education sector to place a lot more value on the careers of electrical and mechanical engineers,” Brady continues.
While training is a key issue that needs to be addressed to drive improvement, Brady also highlights some interesting developments in terms of technology.
“Adiabatic, evaporative and free cooling are the way to go. The cost of R401 gas [refrigerant] has gone from £5 per kilo to £50 per kilo in the space of six months due to EU import restrictions. The cost is going up for every air-conditioning manufacturer.
“By the end of the year, it could cost more for the gas for an air-conditioning unit than the actual air-conditioning system itself. Having a system that doesn’t rely on mechanical cooling compressors is going to be critical moving forward, if this situation isn’t resolved.”
The return on investment when replacing a traditional chiller with an adiabatic free cooling system is typically about three years, according to Brady, although it can be as little as one year.
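The payback arithmetic behind such a figure is straightforward; a sketch with purely hypothetical capital and energy-saving numbers (not figures from Brady or Vertiv):

```python
# Hypothetical payback-period sketch: years to recoup the capital cost
# of an adiabatic free-cooling retrofit from annual energy savings.
# All figures are invented for illustration.

def payback_years(capital_cost, annual_saving):
    """Simple payback period, ignoring discounting and maintenance deltas."""
    return capital_cost / annual_saving

# e.g. a 90,000 retrofit saving 30,000 a year pays back in three years
print(payback_years(90_000, 30_000))  # -> 3.0
```

In practice the saving side is driven by how many hours a year the climate allows compressor-free operation, which is why payback varies so widely between sites.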
In the future, Brady also sees a role for virtual and augmented reality technologies within data centres to overcome one of the main causes of downtime: human error.
The real-time data collected by sensors can easily be integrated with virtual reality to help evaluate thermal management issues, and from a service perspective the technology could offer significant opportunities.
“One of the main causes of downtime is human error. The British Airways failure was due to the wrong person switching the wrong thing off at the wrong time. An augmented reality system could direct engineering personnel to ensure equipment is switched off in the right order, with checks and balances built in. We are looking at this for our own service engineers. When virtual reality glasses are donned, the technology could identify the equipment, what health and safety checks need to be performed, and what tool has to be used and in what order,” Brady concludes.
“This could happen now; I have seen demonstrations. Once the technology becomes more mature, service engineers will be able to operate more quickly and, most importantly, more safely.”