Improvement in Server Compute Performance Using Advanced Air Cooled Thermal Solutions

Author(s):  
Devdatta Kulkarni ◽  
Sandeep Ahuja ◽  
Sanjoy Saha

Continuously increasing demand for higher compute performance is driving the need for more advanced thermal solutions. In the high performance computing (HPC) area, most end users deploy some form of direct or indirect liquid cooling thermal solution. However, users with air cooled data centers and air cooled thermal solutions are challenged to cool next generation, higher Thermal Design Power (TDP) processors in the same platform form factor without changing environmental boundary conditions. This paper presents several advanced air cooled technologies developed to cool high TDP processors in the same form factor and within the same boundary conditions as the current generation processor. A comparison of thermal performance using different cooling technologies, such as Liquid Assist Air Cooling (LAAC) and Loop Heat Pipe (LHP), is presented. A case study of Intel’s Knights Landing (KNL) processor is presented to showcase the increase in compute performance due to different advanced air cooling technologies.

Author(s):  
Suchismita Sarangi ◽  
Will A. Kuhn ◽  
Scott Rider ◽  
Claude Wright ◽  
Shankar Krishnan

Efficient and compact cooling technologies play a pivotal role in determining the performance of high performance computing devices when used with highly parallel workloads in supercomputers. The present work evaluates different cooling technologies and elucidates their impact on the power, performance, and thermal management of Intel® Xeon Phi™ coprocessors. The scope of the study is to demonstrate enhanced cooling capabilities beyond today’s fan-driven air-cooling for use in high performance computing (HPC) technology, thereby improving the overall Performance per Watt in datacenters. The cooling technologies evaluated in the present study include air-cooling, liquid-cooling and two-phase immersion-cooling. Air-cooling is evaluated by providing controlled airflow to a cluster of eight 300 W Xeon Phi coprocessors (7120P). For liquid-cooling, two different cold plate technologies are evaluated, viz., formed tube cold plates and microchannel based cold plates. Liquid-cooling, with water as the working fluid, is evaluated on single Xeon Phi coprocessors, using inlet conditions in accordance with ASHRAE W2 and W3 class liquid cooled datacenter baselines. For immersion-cooling, a cluster of multiple Xeon Phi coprocessors is evaluated with three different types of Integrated Heat Spreaders (IHS), viz., bare IHS, IHS with a Boiling Enhancement Coating (BEC) and IHS with BEC coated pin-fins. The entire cluster is immersed in a pool of Novec 649 (3M fluid, boiling point 49 °C at 1 atm), with polycarbonate spacers used to reduce the volume of fluid required, to achieve a target fluid/power density of ∼ 3 L/kW. Flow visualization is performed to provide further insight into the boiling behavior during the immersion-cooling process.
Performance per Watt of the Xeon Phi coprocessors is characterized as a function of the cooling technology using several HPC workload benchmarks run at constant frequency, such as the Intel proprietary Power Thermal Utility (PTU) and the industry standard HPC benchmarks LINPACK, DGEMM, SGEMM and STREAM. The major parameters measured by sensors on the coprocessor include total power to the coprocessor, CPU temperature, and memory temperature, while the calculated outputs of interest also include the performance per Watt and equivalent thermal resistance. As expected, both liquid and immersion cooling show improved performance per Watt and lower CPU temperature compared to air-cooling. In addition to elucidating the performance per Watt improvement, this work reports on how the cooling technology affects the total power consumed by the Xeon Phi card as a function of coolant inlet temperature. Further, the paper discusses form-factor advantages of liquid and immersion cooling and compares the technologies on a common platform. Finally, the paper concludes by discussing datacenter optimization for cooling in the context of leakage power control for Xeon Phi coprocessors.
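The two calculated outputs named above can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the common definitions: equivalent thermal resistance as the CPU-to-coolant-inlet temperature rise divided by total power, and performance per Watt as benchmark throughput divided by total power. The function names and the sample readings are hypothetical, not measurements from the paper.

```python
def equivalent_thermal_resistance(t_cpu_c: float, t_inlet_c: float, power_w: float) -> float:
    """Return (T_cpu - T_inlet) / P in degC/W (assumed definition, not from the paper)."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (t_cpu_c - t_inlet_c) / power_w


def performance_per_watt(throughput_gflops: float, power_w: float) -> float:
    """Return benchmark throughput per Watt in GFLOPS/W."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return throughput_gflops / power_w


# Hypothetical sensor readings for one coprocessor (illustrative values only).
r_eq = equivalent_thermal_resistance(t_cpu_c=85.0, t_inlet_c=25.0, power_w=300.0)
ppw = performance_per_watt(throughput_gflops=900.0, power_w=300.0)
print(f"R_eq = {r_eq:.2f} degC/W, perf/W = {ppw:.2f} GFLOPS/W")
```

Under these definitions, a cooling technology that lowers CPU temperature at the same inlet temperature and power shows up directly as a lower equivalent thermal resistance.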


Author(s):  
Ridvan A. Sahan ◽  
Rahima K. Mohammed ◽  
Amy Xia ◽  
Ying-Feng Pang

Increasing thermal design power (TDP) trends, combined with shrinking form factor requirements, create the need for advanced cooling technology development. This investigation proposes multiple innovative water cooler technologies to achieve higher thermal performance liquid-cooling (LC) solutions that address the limitations of air-cooling (AC). High performance water cooler design options will also meet the miniaturization trends of the computing market by providing a scalable solution for smaller board real estate. This investigation offers several advantages: 1) it introduces four water cooler technologies employing different fin base plate designs (diamond fins, micro-fins, skived micro-fins, and twisted diamond fins), along with an optimized flow distribution path design accompanying each cooler; 2) it provides scalable thermal solutions; 3) it addresses particle clogging via fin base plate and flow distribution path optimization; 4) it addresses galvanic corrosion by eliminating the use of two dissimilar metals and introducing an acrylic housing; 5) it introduces an acrylic housing for weight management. Results show that the twisted diamond fin, micro-fin and skived micro-fin coolers provide up to 5°C performance improvement, with lower pressure drop across the water cooler, compared to the diamond fin cooler, and about 37°C improvement compared to an air-cooled active heatsink solution.


2021 ◽  
Vol 32 (8) ◽  
pp. 2035-2048
Author(s):  
Mochamad Asri ◽  
Dhairya Malhotra ◽  
Jiajun Wang ◽  
George Biros ◽  
Lizy K. John ◽  
...  

2020 ◽  
Vol 10 (7) ◽  
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. In a supercomputer, the job scheduler, the flagship tool of HPC, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. Our experiment found that the main root cause of delayed jobs is closely related to resource waiting. The execution time of the entire job is affected and significantly delayed by the increase in idle resources that must be ready when a large-scale job is submitted. The backfilling algorithm can reduce the inefficiency of these idle resources and help reduce the execution time of jobs. Therefore, we propose a backfilling algorithm that can be applied to the supercomputer. The experimental results show that the overall execution time is reduced.
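The idea of using idle resources while a large job waits can be sketched as follows. The abstract does not say which backfilling variant the authors use, so this is a minimal illustration of conservative-for-the-head ("EASY"-style) backfilling under stated assumptions: jobs carry user-estimated runtimes, the queue is first-come-first-served, and a later job may jump ahead only if it does not delay the reserved start of the job at the head of the queue. The `Job` class and `pick_jobs_to_start` function are hypothetical names, not taken from the paper or from any scheduler's API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # user-estimated runtime


def pick_jobs_to_start(queue, running, free_nodes, now=0):
    """Return names of queued jobs to start now.

    queue:   list of Job in submission (FCFS) order
    running: list of (end_time, nodes) for jobs already running
    """
    queue = list(queue)
    running = list(running)
    started = []

    # 1) Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2) Reserve a start time ("shadow time") for the blocked head job.
    head = queue[0]
    available, shadow, extra = free_nodes, None, 0
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            shadow = end_time
            extra = available - head.nodes  # nodes the head job will not need
            break
    if shadow is None:
        return started  # head job can never fit; do not backfill in this sketch

    # 3) Backfill later jobs that cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= shadow:
            free_nodes -= job.nodes  # finishes before the reservation
            started.append(job.name)
        elif job.nodes <= extra:
            free_nodes -= job.nodes  # runs on nodes the head job leaves spare
            extra -= job.nodes
            started.append(job.name)
    return started


# Hypothetical 10-node system: 6 nodes busy until t=100, 4 nodes idle.
waiting = [Job("A", 8, 50), Job("B", 4, 200), Job("C", 2, 80), Job("D", 2, 500)]
print(pick_jobs_to_start(waiting, running=[(100, 6)], free_nodes=4))  # ['C', 'D']
```

In this example job A must wait until t=100 for 8 nodes. Job B would occupy nodes that A needs past t=100, so it is skipped, while C (finishes by t=80) and D (fits in the 2 nodes A leaves spare) start immediately on otherwise idle nodes.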


Author(s):  
Devdatta P. Kulkarni ◽  
Priyanka Tunuguntla ◽  
Guixiang Tan ◽  
Casey Carte

In recent years, the thermal design power (TDP) envelope of computer and server processors has grown rapidly. This is mainly due to an increase in processor core count, an increase in package thermal resistance, challenges in multi-chip integration, and the need to maintain generational performance CAGR. At the same time, several other platform level components, such as PCIe cards, graphics cards, SSDs and high power DIMMs, are being added to the same chassis, which increases the server level power density. To mitigate the cooling challenges of high TDP processors, two cooling technologies are mainly deployed: liquid cooling and advanced air cooling. Deploying liquid cooling technology for servers in data centers requires a large initial capital investment. Hence, advanced air-cooling thermal solutions are being sought that can cool higher TDP processors as well as high power non-CPU components using the same server level airflow boundary conditions. Current air-cooling solutions, such as heat pipe heat sinks and vapor chamber heat sinks, are limited by heat transfer area and heat carrying capacity, and would need significantly more area to cool TDPs higher than they can currently handle. Passive two-phase thermosiphon (gravity dependent) heat sinks may provide an intermediate level of cooling between traditional air-cooled heat pipe heat sinks and liquid cooling, with higher reliability, lower weight and lower cost of maintenance. This paper presents the experimental results of a 2U thermosiphon heat sink used in an Intel reference 2U, 2 node system and compares its thermal performance with that of traditional heat sink solutions. The objective of this study was to demonstrate an increase of at least 20% in CPU cooling capability over traditional heat sinks while maintaining the cooling capability for high-power non-CPU components such as Intel’s DIMMs.
This paper will also describe the methodology that will be used for DIMM serviceability without removing the CPU thermal solution, which is a critical requirement from a data center use perspective.

