Perspectives on AdvancedTCA thermal compatibility
Experts from CP-TA member companies representing semiconductor, board-level, and system-level viewpoints gathered to share their perspectives on AdvancedTCA thermal compatibility. This roundtable discussion – moderated by Brian Moore, 2010 SCOPE Alliance technical chair – included Chris Engels, technical marketing manager at Emerson Network Power; Todd Keaffaber, platform system architect for Intel; Yves Desrochers, lead architect for Kontron Canada; and Eric Gregory, senior product line manager for Radisys.
Brian: Why did CP-TA develop thermal interoperability specifications for AdvancedTCA components?
Yves: The root need is to set a common vocabulary that means the same thing to everybody when describing the thermal behavior of a system component. A chassis vendor could claim that its product was capable of 200 W cooling, but that statement could mean very different things to different people. True cooling capacity depends a lot on the ambient temperature at which you are working and also the fan noise level you’re required to maintain. So the whole idea behind this was that we needed a more precise description of the cooling requirements that would avoid the current ambiguities and provide concrete measurable parameters.
Todd: It boiled down to an integration issue. One of the biggest pain points with integrating AdvancedTCA systems is mixing and matching components from different vendors. Some blades require more thermal removal than others, and the slots in a chassis do not all cool uniformly. A chassis may be rated by its highest performing slot, but other slots may not perform at that level. This created a thermal integration nightmare when trying to figure out which blades should be in which slots.
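The slot-matching problem Todd describes can be sketched in a few lines. The following is a minimal illustration only, not a CP-TA procedure: the function name, slot and blade names, and all cfm figures are hypothetical, and it assumes each slot's supplied airflow and each blade's required airflow have been measured under the same conditions.

```python
def assign_blades_to_slots(slot_cfm, blade_cfm):
    """Pair the most demanding blades with the best-cooled slots.

    slot_cfm:  dict of slot name -> airflow the slot supplies (cfm)
    blade_cfm: dict of blade name -> airflow the blade requires (cfm)
    Returns a dict of blade name -> slot name.
    Raises ValueError if some blade cannot be cooled by any remaining slot.
    """
    if len(blade_cfm) > len(slot_cfm):
        raise ValueError("more blades than slots")

    # Sort both lists from highest to lowest airflow.
    slots = sorted(slot_cfm.items(), key=lambda kv: kv[1], reverse=True)
    blades = sorted(blade_cfm.items(), key=lambda kv: kv[1], reverse=True)

    assignment = {}
    for (blade, need), (slot, supply) in zip(blades, slots):
        if need > supply:
            raise ValueError(
                f"{blade} needs {need} cfm but {slot} supplies only {supply} cfm"
            )
        assignment[blade] = slot
    return assignment


# Hypothetical chassis with non-uniform slot cooling.
slots = {"slot1": 42.0, "slot2": 35.0, "slot3": 30.0}
blades = {"cpu_blade": 38.0, "io_blade": 25.0}
placement = assign_blades_to_slots(slots, blades)
```

With these made-up numbers the hungriest blade lands in the best-cooled slot. A chassis rated only by its best slot would hide the fact that the CPU blade cannot go anywhere else.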
Chris: Yes, just getting a rating like 200 W doesn’t really tell you anything; that’s just electrical power that says nothing about thermal system integration. If you have a blade that uses 200 W, but the power is spread around the blade evenly, you could even cool that blade by convection. But if you have a hot spot that consumes, say, half of the board’s power, you need much more cooling capacity for that specific slot. So power alone doesn’t tell you anything. You need measurable criteria for both the chassis slots and the blades that have to be integrated there, and that’s what CP-TA guidelines give you.
Brian: What are the CP-TA thermal guidelines?
Eric: There are two parts to the specifications: the Interoperability Compliance Document (ICD) and the Test Procedure Manual (TPM). The ICD outlines airflow criteria for a variety of different ambient temperatures and operating conditions based on NEBS (Network Equipment Building Standards) requirements; if you design to them you can achieve interoperability. Test methodologies are in the TPM along with standards for reporting test results.
Todd: The current ICD defines four thermal classes: B.1, B.2, B.3, and B.4, with B.4 the highest performing. For a chassis, each class specifies the minimum airflow that must pass through the weakest performing slot at a specific ambient air temperature. For instance, B.4 says that at 55 °C your weakest slot must supply at least 40 cfm. Similarly, the blades face maximum airflow requirements. In the B.4 class, for instance, the front board shall not require more than 17.9 cfm at 25 °C for operation. What this does is impose thermal performance requirements on both the chassis and the blades so that when you match them together you know that the chassis will be able to supply enough airflow for the blade and that the blade does not require more than the chassis can provide.
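The two-sided rule Todd outlines (a floor for the chassis, a ceiling for the blade) might be expressed as the sketch below. Only the two B.4 figures quoted above come from the discussion; the class and function names, the data layout, and the measured values are assumptions for illustration, and limits for the other classes are deliberately left out because they are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AirflowLimit:
    ambient_c: float  # ambient air temperature the limit applies at (deg C)
    cfm: float        # airflow limit in cubic feet per minute


# The only figures quoted in the discussion, both for class B.4.
B4_CHASSIS_MIN = AirflowLimit(ambient_c=55.0, cfm=40.0)
B4_BLADE_MAX = AirflowLimit(ambient_c=25.0, cfm=17.9)


def chassis_meets_minimum(slot_cfm, limit):
    """True if the weakest slot supplies at least the class minimum.

    slot_cfm: per-slot airflow measured at limit.ambient_c.
    """
    return min(slot_cfm) >= limit.cfm


def blade_within_maximum(required_cfm, limit):
    """True if the blade needs no more airflow than the class maximum.

    required_cfm: airflow the blade requires at limit.ambient_c.
    """
    return required_cfm <= limit.cfm


# Hypothetical measurements.
chassis_ok = chassis_meets_minimum([44.0, 41.5, 40.2], B4_CHASSIS_MIN)
blade_ok = blade_within_maximum(16.8, B4_BLADE_MAX)
```

The chassis check uses the minimum across slots because the class is defined by the weakest slot, not the average or the best one.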
Chris: The requirements also specify behaviors under several failure scenarios for the cooling system. For example, what happens when you have the system open for maintenance when replacing a filter in a fan tray, or if one of the redundant fans in a high availability system fails? These conditions are also to be tested.
Brian: So AdvancedTCA vendors can rate their blades and other components by their compliance with these different thermal classes?
Eric: They can rate their products but can’t claim compliance yet. Vendors can say “designed for B.4” or that sort of thing, but not that they “meet” or are “certified” at B.4 because the CP-TA hasn’t yet approved a specific test tool for the certification process. Even without certification, though, standardizing the levels, the testing, and the reporting goes a long way toward easing thermal integration for developers. Standardization doesn’t completely eliminate the need for rigorous thermal testing of the final system design, but if you choose components so that their ratings are compatible, you can feel confident that they will work together and pass the tests.
Brian: How is the testing done?
Eric: There are different ways of measuring airflows – direct and indirect methods – and the TPM tells you the equipment needed and the recipe for making those measurements. For instance, you can use a wind tunnel approach and measure a blade’s airflow impedance. The TPM tells you how to measure the air pressure drop across the blade and what those values need to be to pass the test. For a chassis you can measure the pressure difference between inlet and outlet and calculate airflow from the pressure drop and the slot geometry.
Chris: Another approach for measuring chassis airflow is to use a thermal load. You place the load in the slot, where it heats the air. You measure the air temperature at the inlet and outlet, and the difference, along with the known thermal power, allows you to calculate the slot’s airflow. A third methodology is to measure air velocity with an anemometer sensor (this is the test methodology described in the TPM). Whichever methodology you use, the result is the airflow through the chassis slot.
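The thermal-load calculation Chris mentions follows from a simple energy balance: the heat carried away equals mass flow times specific heat times temperature rise. The sketch below is not taken from the TPM; it assumes all of the load's power goes into the airstream and uses typical room-condition values for air density and specific heat, which shift with altitude and temperature.

```python
AIR_DENSITY_KG_M3 = 1.2        # assumed: dry air near 20 deg C at sea level
AIR_CP_J_PER_KG_K = 1005.0     # assumed: specific heat of air at constant pressure
CFM_PER_M3_PER_S = 2118.88     # unit conversion: 1 m^3/s in cubic feet per minute


def airflow_from_thermal_load(power_w, inlet_c, outlet_c):
    """Estimate slot airflow (cfm) from a known heat load and air temperature rise.

    Energy balance: power = density * volume_flow * cp * (outlet - inlet)
    """
    delta_t = outlet_c - inlet_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    volume_flow_m3_s = power_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t)
    return volume_flow_m3_s * CFM_PER_M3_PER_S


# Hypothetical reading: a 200 W load warms the air from 25 deg C to 35 deg C.
slot_cfm = airflow_from_thermal_load(200.0, 25.0, 35.0)
```

For this made-up reading the estimate comes out at roughly 35 cfm. Because the result is inversely proportional to the temperature rise, small errors in the temperature measurements translate directly into airflow error, which is one reason margins matter in thermal testing.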
Brian: How should AdvancedTCA vendors utilize these thermal interoperability specifications?
Yves: The four classes give vendors an opportunity to design for different application environments and price points. Still, chassis vendors should strive to have the best possible chassis by achieving or exceeding the requirements for the class they are targeting, while blade vendors should try to stay below the maximums for the class. That way there is some room to grow. The trend in the industry is to push the envelope because customers always need more processing capability and more memory, and so they use more power. Having headroom in the cooling capacity means that a system can remain installed longer before needing complete replacement.
Chris: The headroom also helps increase MTBF. The cooler a system runs, the more reliably it operates. Also, the headroom gives a system greater resilience to problems like clogged filters. Over time filters get dirty and the system’s cooling capacity goes down. Having headroom ensures that components remain adequately cooled, improving reliability. Aiming to just meet the bare minimum requirements may mean a cost savings in the short term but can mean extra expenses in the long term.
Brian: Isn’t there already headroom built into the ICD guidelines?
Yves: The minimums and maximums were designed to match each other with just enough headroom to avoid overdesign and unnecessary costs. But we did put margin in to account for thermal testing realities. Thermal testing is not an exact science and needs a margin for error.
Todd: Yes, there was a balance there that we tried to strike.
Yves: And even though there may not be exact absolute values, the testing methodology is very useful and provides a good base for comparisons.
Brian: What about the NEPs that are designing their own AdvancedTCA boards or chassis? Do they still need to address these requirements?
Eric: Platforms that get deployed to the field are going to be used for years, with technology upgrades replacing the contents over time. Designing to these specifications, especially B.4, will give those systems the ability to more reliably utilize next-generation blades with their higher-power requirements. If their cooling capacity is limited they may not be able to do so.
Todd: Also, following the standardized testing for their chassis and blade designs gives them the ability to do an apples-to-apples comparison when making the upgrades. This will give them a high degree of confidence in knowing when the building blocks will work together and when there might be a problem.
Brian: The ICD is now at revision 1.1. What does the future hold for these guidelines?
Chris: There is a technology shift in the industry. CPUs are getting faster with each generation, and while there is a lot of work going on to improve power efficiency, in the end this performance increase means that the power a CPU consumes, and so the power an AdvancedTCA blade needs, keeps growing over the years. AdvancedTCA boards were limited to 200 W for a long time (the limit is now 400 W) – mostly because of cooling issues – but that is shifting because of market pressure. This means that over time PICMG may allow even higher power levels on AdvancedTCA boards, and when that happens CP-TA may need to define higher thermal class levels such as B.5, B.6, and so on.
Todd: There are also plans to get a standard test tool on the market for the current ICD requirements. This tool would automatically perform the thermal testing and generate a report in the proper format. Having this tool would allow CP-TA to start certifying vendor claims of compliance with the ICD thermal class requirements.
Chris: Beyond that, we expect to see updates to the thermal interoperability testing, but it is too early to state when this will happen. When CP-TA sees a need for new thermal specifications and tests, it will develop them. That’s what CP-TA is all about: advancing xTCA product interoperability.
Brian: Thank you all for sharing your insights on thermal compatibility in AdvancedTCA designs. Thanks, also, to the CP-TA for developing these thermal guidelines and interoperability specifications. The work that the Association is doing to resolve this and other interoperability issues is helping streamline AdvancedTCA-based system development and giving NEP designers the tools to confidently utilize the AdvancedTCA off-the-shelf design approach.