Data centers hold information that is crucial to most organizations’ operations, so it needs to be kept secure, protected, and accessible at all times. That is why data centers need to be inspected on a regular basis, and including thermal imaging in that inspection can help ensure the data housed within them is safeguarded.
10 steps to use a thermal camera in a data center
During thermal mapping, the walk through the data center to gather thermal scans, there are multiple places where you should pull out your thermal camera: everything from the electrical source to the server racks, including the critical heating, ventilation, and air-conditioning (HVAC) system.
To get the most out of your inspection session, scan the system while it is operating and pulling as large an electrical load as possible. Current running through the wires produces heat, and that's what an infrared camera picks up when you look at temperatures around the center.
In a data center, the components are like a series of dominoes: if one fails, it takes everything downstream with it. So it makes sense to begin at the beginning, or what the National Electrical Code (NEC) calls "the source." That’s typically a transformer or a substation.
1. Check the source
On a transformer, use your thermal camera to scan the secondary windings and coils, looking at the terminations and lugs, or bolted connections, inside the box. Specifically, look for:
- Thermal anomalies, meaning differences in temperature between similar components
- Physical damage or debris, which can interfere with normal operation
- A difference in temperature between circuit phases, which can indicate a load imbalance
You’ll want to look for these three things in every area you scan with a thermal camera.
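If you export spot temperatures from your thermal images, the phase comparison above can be sketched in a few lines of Python. This is only an illustration: the function names, the phase labels, and the 10 °C threshold are assumptions made for the example, not values from any standard or from your camera's software, so substitute the criteria your maintenance program uses.

```python
def phase_delta_ts(phase_temps_c):
    """Return each phase's temperature rise (deg C) above the coolest phase."""
    coolest = min(phase_temps_c.values())
    return {phase: round(temp - coolest, 1) for phase, temp in phase_temps_c.items()}


def flag_hot_phases(phase_temps_c, threshold_c=10.0):
    """List phases running at least threshold_c hotter than the coolest phase.

    threshold_c is a hypothetical placeholder, not a published limit.
    """
    deltas = phase_delta_ts(phase_temps_c)
    return sorted(phase for phase, delta in deltas.items() if delta >= threshold_c)


# Example spot readings (deg C) taken on three similar lugs under the same load.
readings = {"A": 41.2, "B": 42.0, "C": 58.5}
print(phase_delta_ts(readings))
print(flag_hot_phases(readings))
```

Comparing each phase against the coolest one keeps the check relative, which matches the idea of looking for differences between similar components rather than judging a single reading on its own.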
2. Check the alternate source
Many data centers have an alternate source of power for redundancy. This second source could be another utility transformer on a different grid or a standby generator. Alternate power sources must be scanned and inspected too, following the same process as the main source. Make sure to check them out while they are in use and under load.
3. Check standby generators
Just as in the previous steps, use your thermal camera to look at the standby generators while they are powered up with everything downstream running off them. Follow the same process you used for the main and alternate sources, and check:
- Termination points
- Damage and debris
4. Look at the cooling or exhaust systems
To detect problems with cooling or exhaust systems, you'll need to record actual temperatures rather than only observing temperature differences (ΔTs). Use your thermal camera to pinpoint temperatures, then compare them with the trend line you’ve been building throughout your regular walk-throughs.
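One way to compare a new reading with a trend line is to fit a straight line through the temperatures recorded at the same spot on earlier walk-throughs and see how far the latest reading falls from it. The sketch below assumes evenly spaced inspections and uses an ordinary least-squares fit; the function names are hypothetical, and a real program might prefer a different model.

```python
def fit_trend(temps_c):
    """Least-squares line through readings taken at inspections 0, 1, 2, ...

    Returns (slope, intercept), with slope in deg C per inspection.
    """
    n = len(temps_c)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(temps_c))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviation_from_trend(history_c, new_reading_c):
    """How far (deg C) the new reading sits above the extrapolated trend."""
    slope, intercept = fit_trend(history_c)
    expected = intercept + slope * len(history_c)
    return new_reading_c - expected


# Exhaust temperatures (deg C) from four earlier walk-throughs, then today's.
history = [20.0, 21.0, 22.0, 23.0]
print(deviation_from_trend(history, 27.5))
```

A large positive deviation suggests the spot is heating faster than its history predicts, which is the kind of change a single ΔT snapshot would not show.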
5. Check the switches
When a transfer switch is functioning correctly, it senses where the power is coming from (main or standby) and switches to that source. Take a thermal image of that switch during your inspection, because if it fails, it will cause problems no matter how good the maintenance procedures are downstream.
With current running through the transfer switch, look for heating that might signal loose connections. These could be a sign of insufficient torque or compression on a lug or termination.
6. Check the switchboard
The main switchboard cabinet houses various components, including busbars, bolted connections, and fuse clips. Look for thermal anomalies in connections (including bus connections), terminations, fuses, and fuse clips. Also look for the same things you’ve been looking for all along: damage, debris, and imbalance.
7. Check the UPS
An uninterruptible power supply (UPS) is usually the next stop after the switchboard. When inspecting a UPS, make sure you’re working under load. Specifically, scan these areas:
- The input connections
- The terminals
- The inverter sections, paying close attention to the small fuses and capacitors
- The battery section, including terminal posts, casings, and feeders. A bad cell heats up very quickly under load.
After you scan everything under load, immediately scan the batteries again with the load removed. Bad cells cool very quickly when the load is removed. Finally, check the on-board transformer, if there is one.
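The loaded and unloaded battery scans can be compared cell by cell. The following is a rough sketch that flags cells that are both hotter than their neighbors under load and cool off more than their neighbors once the load is removed. The cell names and the 5 °C margins are assumptions chosen for the example, not manufacturer limits.

```python
from statistics import median


def suspect_cells(loaded_c, unloaded_c, rise_margin_c=5.0, drop_margin_c=5.0):
    """Flag cells that run hot under load and cool quickly without it.

    loaded_c and unloaded_c map cell names to temperatures (deg C).
    Both margins are hypothetical placeholders.
    """
    typical_loaded = median(loaded_c.values())
    drops = {cell: loaded_c[cell] - unloaded_c[cell] for cell in loaded_c}
    typical_drop = median(drops.values())
    return sorted(
        cell
        for cell in loaded_c
        if loaded_c[cell] - typical_loaded >= rise_margin_c
        and drops[cell] - typical_drop >= drop_margin_c
    )


loaded = {"cell1": 30.0, "cell2": 31.0, "cell3": 45.0, "cell4": 30.5}
unloaded = {"cell1": 29.0, "cell2": 29.5, "cell3": 33.0, "cell4": 29.5}
print(suspect_cells(loaded, unloaded))
```

Using the median of the string as the reference means one bad cell does not drag the comparison point upward the way an average would.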
8. Check the PDU
Power distribution units (PDUs) are downstream of the UPS and are typically located close to the servers, where they distribute power. Normally, a PDU will have a circuit breaker panel and sometimes a transformer. While scanning PDUs, look at:
- Terminals, including circuit breaker terminals
Visually check for damage and debris, and if a PDU is not a straight-through-voltage model, scan the on-board transformer.
9. Scan the server racks
Fitting more server racks into a space is always a draw, but it also increases the demand on the center’s power and cooling capabilities. This can make scanning the area with a thermal camera a bit tricky, but it’s still a useful step in the mapping process.
Use a thermal camera to monitor:
- Power strips and power supplies built into the racks
- Wiring connections
- Plugs and plug strips
Look for overheating due to loose connections and loose or bent plugs.
A thermal scan can also detect broken cords and broken conductors in wires. To detect the latter condition, look for what is called "the barber pole effect," in which the thermal differences between the twisted strands show up on your thermal camera.
10. Check air cooling effectiveness
You should also monitor the areas where air enters and where heat is expelled from the server racks and the building. This includes checking the server rack cooling systems and the HVAC systems.
Server rack cooling systems
Both a thermal imager and temperature/airflow meter are useful for monitoring air cooling effectiveness. You can:
- Map cooling patterns into, out of, and around server racks
- Confirm whether cooling is adequate
Monitoring this helps you identify where to install perforated panels to improve circulation or blanking plates to keep hot air from entering empty slots on unfilled racks. These strategies help many data centers keep their servers cool enough to maintain their server warranties.
HVAC systems are essential in data centers because of the amount of heat generated by servers. Many servers are designed to shut down automatically when their temperature exceeds 75 or 76 °F.
If you are running an AC system, check the following for overheating that signals misalignment, imbalance in the fans, or degradation in the motors and bearings:
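Because many thermal cameras report in degrees Celsius while the shutdown figure above is in Fahrenheit, a small conversion helper can show how much headroom a reading leaves. This sketch takes the 75 °F figure quoted above as a default parameter; check the actual limit for your own servers before relying on it. The function names are hypothetical.

```python
def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def headroom_f(reading_c, shutdown_f=75.0):
    """Degrees Fahrenheit remaining before the assumed shutdown temperature.

    shutdown_f defaults to the figure quoted in the text; a negative
    result means the reading is already above that limit.
    """
    return shutdown_f - c_to_f(reading_c)


print(c_to_f(20.0))      # 68.0
print(headroom_f(20.0))  # 7.0
```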
- Crimped or bolted connections
- Mechanical components
An infrared image will also reveal a refrigerant leak if it is blowing against the cabinet.
If you’re using a split system or a chilled-water system with cooling towers, you’ll have to check both the outside and inside components. For example, a split system's evaporator coil is typically inside the building while the condensing unit is outside. Check the evaporator coil for icing, but be aware that there's no point in checking the AC system inside if you are not also going to check it outside. There are usually fuses and terminations (lugs) outside, and, if there's a cooling tower, there are motors. Use your thermal imager to check the flow in the towers and to look for leaks.
Why use a thermal camera?
A thermal camera can display and store images of an object's surface temperatures. Using a thermal camera, you can easily detect anomalies in the temperatures of electrical or mechanical components. Overheating often indicates a potential problem that requires maintenance before failure occurs. In data centers, where cooling is important to keep servers running, uncharacteristically cool surfaces might also indicate a problem, perhaps an imbalance in the HVAC system that requires correcting.
Thermal cameras can also record actual surface temperatures throughout your data center. This helps detect situations such as an overheating transformer or motor before it breaks down completely, giving you the time to repair or replace. When thermal images reveal potential problems, you can capture them, upload them to a computer, and use software for reporting and analysis.
By regularly monitoring equipment and keeping a thermal "track record" for long-term comparison, you can more easily catch abnormal temperature readings and changes from the trend. To ensure the consistency required for side-by-side comparison, follow a sampling route through your facility, and scan the same objects or areas each time. This will help you create a thermal map of your data center and the equipment within it that you can compare against on each inspection.
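A thermal track record can be as simple as a log keyed by the stops on your sampling route. The sketch below is one possible layout, not a feature of any particular reporting software: the class name, the route labels, and the use of ISO date strings are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


class ThermalLog:
    """Readings (deg C) for a fixed inspection route, keyed by route point."""

    def __init__(self, route):
        self.route = list(route)
        self.readings = defaultdict(list)  # point -> [(ISO date, temp_c), ...]

    def record(self, point, inspection_date, temp_c):
        if point not in self.route:
            raise KeyError(f"{point!r} is not on the sampling route")
        self.readings[point].append((inspection_date, temp_c))

    def change_from_baseline(self, point):
        """Latest reading minus the mean of all earlier readings, or None."""
        temps = [temp for _, temp in sorted(self.readings[point])]
        if len(temps) < 2:
            return None
        return temps[-1] - mean(temps[:-1])

    def missed_points(self, inspection_date):
        """Route points with no reading on the given inspection date."""
        return [
            point
            for point in self.route
            if all(date != inspection_date for date, _ in self.readings[point])
        ]


log = ThermalLog(["transformer", "ups", "pdu-1"])
log.record("ups", "2024-01-10", 30.0)
log.record("ups", "2024-02-10", 31.0)
log.record("ups", "2024-03-10", 36.5)
print(log.change_from_baseline("ups"))
print(log.missed_points("2024-03-10"))
```

Rejecting readings for points that are not on the route, and listing the points that were skipped, helps enforce the habit of scanning the same objects every time.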
Along with repair records, thermal trending information provides a documented data trail for insurance carriers, management, and any others who require confirmation of a reliable operation.