
Thermal and Airflow Analysis of Sequent's New Generation Pentium Pro® Based Enterprise Server

by

Henry C. Bosak Staff Mechanical Engineer

Sequent Computer Systems, Inc.

This document is copyright and may not be reproduced by any method, translated, transmitted or stored in a retrieval system without prior written permission of Flomerics Limited.

Flomerics Limited 81 Bridge Road Hampton Court Surrey KT8 9HH UK

Tel: +44 (0) 181 941 8810 Fax: +44 (0) 181 941 8730

Thermal and Airflow Analysis of Sequent's New Generation Pentium Pro® Based Enterprise Server

Henry C. Bosak, Staff Mechanical Engineer

Sequent Computer Systems, Inc., 15450 S.W. Koll Parkway, Beaverton, OR 97006, (503) 626-5700

ABSTRACT

Sequent is a large provider of high-end, mission-critical servers, where time to market is key to the success of the company. Sequent uses Flotherm as a key resource in the concept and development stages of new products in order to stay competitive in this global market. STING is the new Sequent server that is discussed in this paper.

The overall cooling approach is discussed, including front-to-rear cooling for the server module, transverse airflow across the system boards, and impingement cooling of the four-processor card. Preliminary correlation between Flotherm predictions and measured data for flow impedance, delivery, and flow velocities to critical areas was excellent. Initial comparisons between the Pentium Pro predictions and test data were disappointing. However, the overall cooling scheme was robust enough that, with minor changes to the blower specification, the system development schedule was not affected. Further work closed the gap between measured and calculated case temperatures for the processors and revealed several important factors: changes to the thermal model between initial analysis and test, unknown changes to the heatsink, and the discovery of an error in test methodology.

From this work, it was concluded that accurate and timely delivery of thermal models from chip suppliers to their customers is crucial in order for both parties to be successful in the future.

INTRODUCTION

Sequent Computer Systems, Inc. (NASDAQ: SQNT) is a leading architect and provider of open client/server solutions for large organizations. Sequent develops, manufactures and sells commercial symmetric multiprocessing (SMP) systems that support very large-scale on-line transaction processing (OLTP), decision support (DSS) and web-based applications. Its project-oriented offerings include consulting and professional services geared to help organizations re-architect their existing information technology infrastructures. In addition, the company partners closely with other open systems vendors to deliver complete solutions to its customers. The current architecture can be configured with 2 to 30 Pentium® processors in one cabinet. With clustering, four systems can be configured with up to 120 processors focused on a single application.

Sequent's next-generation system architecture, called NUMA-Q™, uses state-of-the-art processor interconnect technology that will enable the company to build systems containing more than 250 standard processors and storing 100 terabytes (TB) of data, with performance ten times that of conventional SMP systems. The initial product will be configured with up to 32 processors and 8-way clustering, providing a total of 256 processors and up to 32 TB of data storage.

At the core of Sequent's NUMA-Q architecture is an intelligent, high-speed interconnect called IQ-Link™, which breaks the backplane barrier that has bound conventional SMP for over a decade. NUMA-Q is Sequent's new cache-coherent nonuniform memory access (CC-NUMA) architecture, which leverages the Intel Pentium Pro (P6) multiprocessor technology. The NUMA-Q architecture will enable Sequent to build massively scaled systems that provide unparalleled availability, scalability and manageability at levels previously available only on the largest mainframes.

The basic building block of the NUMA-Q is based on Intel's Standard High Volume (SHV) four-processor architecture using the Intel Pentium Pro processor. Sequent enhances this architecture with extra redundancy and robustness for increased availability in enterprise-computing environments. Sequent connects multiple Pentium Pro "quads" with its new intelligent, high-speed IQ-Link, which moves data between the quads at a rate of 1 GB per second using a data pump chipset jointly engineered by Sequent and Vitesse Corp. The effective bus bandwidth of a NUMA-Q-based system scales linearly as quads are added and can be as high as 32 GB per second for a 252-processor system.

This paper focuses on the cooling requirements of the quad enclosure, the development of the overall cooling strategy, and the use of Flotherm® in the concept and development stages of the QUAD. Close attention is paid to the Pentium Pro microprocessor, to the system level analysis performed using Flotherm, and to the comparison between analytic and test data.

DISCUSSION

Product Overview

The "Quad" enclosure conforms to the EIA industry-standard 19-inch form factor for rack mounting into cabinets. Overall dimensions are 10.5 inches (266 mm) in height, 17.0 inches (431 mm) in width, and 32.5 inches (825 mm) in depth. Fully configured weight is 120 pounds (54.5 kg). The enclosure is constructed of 18 Ga (0.048 in thick) cold-rolled steel that is nickel plated. Maximum power dissipation is 0.960 kW, and airflow is front to rear. The enclosure is mounted into the cabinet on slides to allow easy serviceability in the field. The major internal components of the quad include:

1. Two 750 watt power supplies, which provide full redundancy and are on-line replaceable.

2. Fully redundant, on-line replaceable blower.

3. A baseboard, which provides memory and I/O subsystem interfaces to the Pentium Pro processors.

4. "Quad Memory Controller" (QMC), which supports up to 1 gigabyte of RAM (Level 3 cache).

5. "Quad Processor Card" (QPC), which has 4 Pentium Pro processors and supporting DC-DC converters.

6. (LYNX) card, which is the high-speed interconnect board "IQ-Link".

7. 7 PCI slots.

8. "Management and Diagnostic Controller" (MDC) board, which is the quad control, diagnostics and monitoring board.

Power input: 220 V AC, single phase.

See Figures 1 and 2.

System Cooling Requirements and Approach

The Quad is a high-end, mission-critical server that requires high reliability and availability in order to meet customer demands for performance and uptime. It was important that the airmover not be a detrimental factor in system availability and reliability. This directed the scheme toward a redundant and on-line replaceable airmover.

The concept that was developed employs front-to-rear cooling: air is drawn in through the power supplies at the front, passes through the airmover, and exhausts into the board area via a "throttle plate". The throttle plate distributes the air volume as needed across the printed circuit boards, and the air then exits out the rear of the quad. This front-to-rear scheme was chosen in order to follow a company-wide initiative of designing all new equipment with that goal in mind. Initial airflow rates were calculated from the following equation: $Q = 3160\,W / \Delta T$, where $Q$ is the volume of air in cubic feet per minute, $W$ is the available heat transfer rate in kW, for standard air density, and $\Delta T$ is the change in air temperature in °F. This equation has been found to be a fairly reliable way of estimating initial air volume requirements from a system level perspective. Detailed analysis is then performed on selected boards and targeted chips (like the Pentium Pro).
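As a minimal sketch of this sizing rule, the Python snippet below evaluates Q = 3160 W / ΔT and its inverse. The function names and the 25 °F air temperature rise are illustrative assumptions, not values taken from the design; only the 0.960 kW maximum dissipation comes from the product overview.

```python
def required_airflow_cfm(power_kw: float, delta_t_f: float) -> float:
    """Estimate airflow in CFM for standard-density air: Q = 3160 * W / dT."""
    if delta_t_f <= 0:
        raise ValueError("air temperature rise must be positive")
    return 3160.0 * power_kw / delta_t_f


def air_temperature_rise_f(power_kw: float, airflow_cfm: float) -> float:
    """Invert the same rule to get the air temperature rise in deg F."""
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    return 3160.0 * power_kw / airflow_cfm


if __name__ == "__main__":
    # 0.960 kW is the quad's maximum dissipation; 25 deg F is an assumed rise.
    print(round(required_airflow_cfm(0.960, 25.0), 1), "CFM")
    # Temperature rise the same rule implies for a given airflow.
    print(round(air_temperature_rise_f(0.960, 120.0), 1), "deg F")
```

Because the rule assumes standard air density, it is a first-pass, system level estimate only; it says nothing about how the air is distributed between boards.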

Bus timing requirements dictated that the Pentium Pro processors be in close proximity to each other on the quad processor card (see Figure 3). The functional specification for the quad required the cooling system to handle, at a minimum, 45 watt microprocessors, with the potential to cool future higher power devices should Sequent want to extend the life of the product. Many low- to mid-level servers either on the market or coming to market cool the Pentium Pro processors with "heatsink mounted fans" in addition to system level fans. While this is an adequate and inexpensive way to cool the high power Pentium Pro (up to 45 watts power dissipation), it is not very reliable. If the heatsink cooling fan fails, the system will most likely go down or crash. One heatsink on the market carries 2 fans, which provides redundancy, but the system must still be brought down to repair or replace the heatsink/airmover, and system reliability suffers tremendously because of the number of fans that are likely to fail over time. Table 1 shows the QUAD hardware reliability. Note the significant impact on server interruptions with 4 heatsink mounted fans.

System Hardware Reliability

MTBF (hours)    Type of airmover
59,500          Fully redundant system airmover
51,800          1 non-redundant system airmover
14,400          1 non-redundant system airmover and 4 heatsink mounted fans

Table 1
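The source does not state how the Table 1 figures were derived. One common simplification, assumed here and not confirmed by the paper, treats the components as a series system with constant failure rates, so that failure rates (1/MTBF) add. Under that assumption, the sketch below backs out the MTBF each heatsink mounted fan would need in order to pull the system from 51,800 hours down to 14,400 hours. The function names are hypothetical.

```python
from typing import Iterable


def series_mtbf(mtbf_hours: Iterable[float]) -> float:
    """MTBF of a series system, assuming constant, independent failure rates."""
    return 1.0 / sum(1.0 / m for m in mtbf_hours)


def implied_component_mtbf(base_mtbf: float, combined_mtbf: float, count: int) -> float:
    """MTBF of each of `count` identical added parts that lowers base_mtbf to combined_mtbf."""
    added_rate = 1.0 / combined_mtbf - 1.0 / base_mtbf  # total added failures per hour
    if added_rate <= 0:
        raise ValueError("combined MTBF must be lower than base MTBF")
    return count / added_rate


if __name__ == "__main__":
    # Table 1 values: 51,800 h without heatsink fans, 14,400 h with 4 of them.
    fan = implied_component_mtbf(51_800, 14_400, 4)
    print(f"implied per-fan MTBF: {fan:,.0f} hours")
    # Round trip: recombining the parts should reproduce the Table 1 figure.
    print(f"recombined system MTBF: {series_mtbf([51_800] + [fan] * 4):,.0f} hours")
```

Under this model the implied per-fan figure is on the order of 80,000 hours, which illustrates why four additional non-redundant fans dominate the system failure rate.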

Flotherm modeling was used in the concept stages to compare transverse and impingement cooling for the Pentium Pro processors located on the QPC. Transverse cooling would have required heatsinks on the order of 2.0 inches tall and 50 cubic feet per minute (CFM) of airflow, whereas impingement required only a 0.60 inch tall heatsink and 20% less air to maintain the same temperatures. Comparison of the two approaches revealed that in the transverse approach, the 2 downstream processors would experience a thermal wake, an airflow shadow, and the development of a significant thermal boundary layer (see Figures 4 and 5). (Detailed impingement modeling and data are covered in the next section.) It was also desirable to maintain junction temperatures at the same operating level to minimize thermally related timing parameter shifts between processors.

Figure 4


The difficulty of attaching such a large heatsink and its shock and vibration implications, shadowing effects, the need for headroom to cool higher power devices in the future, and the desire to minimize timing parameter shifts steered the processor cooling approach toward impingement early in the concept stage. Impingement cooling delivers airflow directly onto each chip, providing the same ambient air to each device, with no thermal wakes or airflow shadows and less boundary layer development. Overall, it is more efficient than transverse cooling at removing heat from multiple high power devices that are in close proximity to each other.

Figure 3

While impingement cooling was determined to be the best approach for the QPC, transverse cooling was decidedly the best approach for the balance of the system boards. This was based on the power densities of the other boards being comparable to existing products. Since it was clearly understood what flow rates were required for current products (approximately 200 linear feet per minute across the board), this was the general goal to be achieved when performing the system level analysis.

FLOTHERM MODELING AND COMPARISON TO TEST DATA

QPC and Pentium Pro

Early in the concept and development stages of the quad, Flotherm was relied on entirely to perform thermal and airflow analysis and studies. This was especially true for the Pentium Pro processors. No lab data could be gathered, since no thermal dummies or packages of the Pentium Pro existed or were available to customers. However, a thermal model of the Pentium Pro became available from Intel for use in Flotherm analysis. This allowed product development to continue with some confidence and to maintain schedule goals and deliverables.

Initial Flotherm analysis predicted that each Pentium Pro (45 watts) would require 18 CFM of impingement flow to meet Sequent's goal for reliability, which was a case temperature 10 °C below the maximum specified by Intel. That data was used when creating the initial blower specification and determining the distribution scheme within the board area and through the impingement plenum.

Initial test results using Pentium Pro thermal dummies simulating 45 watts power dissipation, mounted in a QUAD chassis with the initial prototype blower, yielded lower than predicted case temperatures. Time constraints did not allow investigation into the discrepancy between predicted and measured data. Instead, lab data was gathered to determine the optimal flow rate for the processors. That was determined to be 10 CFM per processor, and the blower specification and the distribution scheme were modified accordingly (see the next section for details on system level flows and distributions).

More recently, time was available to investigate the differences between predicted and measured temperatures on the Pentium Pro. The original thermal model, dissipating 45 watts with a flow rate of 9 CFM, predicted a case temperature above the CPU of 71 °C at 20 °C ambient. The initial test measurement was 42 °C for the same conditions. It was discovered that between the concept analysis and the final investigation (a 2 year time span), changes had been made to the P6 thermal model to reflect the final package design. Also, the mechanical design of the board had changed hands, and the heatsink under test was larger than the one initially modeled. The updated thermal model of the Pentium Pro, combined with complete boundary conditions (correct thermal model and heatsink, radiation, etc.), yielded a case temperature of 58 °C. See Figures 6 and 7.

The final measured case temperature was 50 °C. See the Flotherm comparison in Table 2 below:

Flotherm predicted           Boundary condition
case temperature (°C)        (9 CFM for all runs)
71                           Old P6 model and original sink
68                           Old P6 model and updated sink
62                           New P6 model and updated sink
60                           New P6 model and updated sink + radiation
58                           New P6 model and updated sink + radiation and fine grid

Table 2
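To make the contribution of each modeling change in Table 2 explicit, the short sketch below steps through the runs in order and reports how much each change lowered the predicted case temperature, along with the gap that remains against the 50 °C final measurement. The data are the Table 2 values; the variable and function names are illustrative only.

```python
# (boundary condition, Flotherm predicted case temperature in deg C), in Table 2 order
RUNS = [
    ("old P6 model, original sink", 71),
    ("old P6 model, updated sink", 68),
    ("new P6 model, updated sink", 62),
    ("new P6 model, updated sink, radiation", 60),
    ("new P6 model, updated sink, radiation, fine grid", 58),
]
MEASURED_CASE_TEMP_C = 50  # final measured case temperature


def stepwise_reductions(runs):
    """Return (condition, drop in deg C relative to the previous run) pairs."""
    return [(later[0], earlier[1] - later[1]) for earlier, later in zip(runs, runs[1:])]


def remaining_gap(runs, measured_c):
    """Difference between the last (most complete) prediction and the measurement."""
    return runs[-1][1] - measured_c


if __name__ == "__main__":
    for condition, drop in stepwise_reductions(RUNS):
        print(f"{condition}: -{drop} deg C")
    print("remaining gap:", remaining_gap(RUNS, MEASURED_CASE_TEMP_C), "deg C")
```

Read this way, the updated P6 model is the largest single step, followed by the heatsink change, with radiation and grid refinement each contributing a smaller amount.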

It was discovered during case temperature measurements that air impinging on the thermocouple was affecting the readings. The solution was to use thermal grease to fill the holes that had been drilled in the heatsinks to allow attachment of the thermocouples to the package case. Graph 1 shows comparative measured results along with the final Flotherm predicted case temperature above the CPU. All results represent 45 watts power dissipation.
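One way to compare predictions and measurements on a common basis is the case-to-ambient thermal resistance, (T_case - T_ambient) / P. The sketch below applies it to the 9 CFM figures quoted in this section. It assumes that the 20 °C ambient stated for the original prediction also applies to the final prediction and the final measurement, which the source does not state explicitly; the function name is hypothetical.

```python
def case_to_ambient_resistance(case_temp_c: float, ambient_c: float, power_w: float) -> float:
    """Case-to-ambient thermal resistance in deg C per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (case_temp_c - ambient_c) / power_w


if __name__ == "__main__":
    AMBIENT_C = 20.0  # assumed to hold for every case below
    POWER_W = 45.0    # stated dissipation for all results
    cases = {
        "original Flotherm prediction": 71.0,
        "final Flotherm prediction": 58.0,
        "final measurement": 50.0,
    }
    for label, temp_c in cases.items():
        theta = case_to_ambient_resistance(temp_c, AMBIENT_C, POWER_W)
        print(f"{label}: {theta:.2f} deg C/W")
```

Expressed this way, the comparison is independent of the particular ambient temperature and power level used in a given test, provided both are known.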

A lot of time was spent trying to determine why there was such a large difference between the predicted and measured results. The larger contributors were the change in heatsink size, updates to the P6 thermal model, and plugging the holes in the test setup. More subtle reductions were obtained by including radiation and using a finer grid in the heatsink pin-fin area.

Factors that were insignificant: gravity had little effect (heat removal was dominated by the impingement jet); the problem was insensitive to velocity scaling in the algebraic turbulence model; clearance between the top of the heatsink and the wall had no effect because of the inertia of the jet (this gap can cause quite a significant change in a transverse situation by allowing bypass); and the amount of heat dissipated into the board was small.

Some of the factors that may still contribute to the 8 °C difference between the final Flotherm model and measured case temperatures include:

1. It was unclear at the time of publication whether the Pentium Pro thermal dummy and Flotherm thermal model matched exactly (die sizes may be different as well as die attach).

2. The degree of gridding required to fully capture jet impingement effects may be quite large and beyond the practical limits of this exercise.

3. The turbulence model as written was not intended to approximate impingement cooling.

Figure 7

Graph 1: P6 Airflow vs. Temperature. Horizontal axis: Airflow (CFM). Plotted series: Flotherm predicted case temp above CPU; measured case temp; measured case temp (holes open).

System Level Impedance and Flow Control

A full system level Flotherm model was developed to verify desired flow rates in certain critical areas and to determine the overall system level impedance for a desired mass flow rate. Flow rates were measured in the Flotherm model by placing mean flow regions in areas of interest. Individual flow rates were verified for the impingement plenum to make sure that approximately the targeted 10 CFM per processor was flowing through each nozzle. Results were verified by comparing measured case temperatures of a QPC mounted in the QUAD enclosure with an individual Pentium Pro thermal test relating case temperature to specific flow rates for a given power (see Graph 1). Flotherm modeling predicted an average of 11 CFM flowing through each nozzle, and test data indicated an average flow rate of approximately 10 CFM through each nozzle, yielding excellent comparative results. See Figures 8 and 9.

Flow rates to the board area are controlled via a "throttle plate". Flotherm was used to determine the size of the orifices in the throttle plate for the PCI, IQ-Link and QMC areas so as to provide the desired air velocities. Flotherm accurately predicted the flow rate of 15 CFM through the IQ-Link board.

Finally, Flotherm predicted a system level impedance of 0.67 inches of water static pressure at 120 CFM. The measured value is 0.700 inches of water static pressure at 120 CFM.
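The agreement quoted in this section can be put in relative terms with a small helper. The sketch below uses only the predicted and measured values stated above for nozzle flow and system impedance; the function name and the dictionary layout are illustrative assumptions.

```python
def relative_deviation(predicted: float, measured: float) -> float:
    """Signed deviation of a prediction from the measurement, as a fraction."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return (predicted - measured) / measured


# (Flotherm prediction, measurement) pairs quoted in this section
COMPARISONS = {
    "average nozzle flow (CFM)": (11.0, 10.0),
    "system impedance at 120 CFM (in. of water)": (0.67, 0.70),
}

if __name__ == "__main__":
    for quantity, (predicted, measured) in COMPARISONS.items():
        deviation = relative_deviation(predicted, measured)
        print(f"{quantity}: {deviation:+.1%}")
```

A positive value means Flotherm over-predicted the quantity and a negative value means it under-predicted it.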

Figure 9

Quad Blower

The blower was developed in cooperation with McLean Engineering, a Zero Corporation company located in Camarillo, California. McLean has been involved in providing airmoving solutions to the electronics industry for over 50 years. A specification for an airmover with detailed performance, physical, and handling capabilities was created by Sequent and given to McLean. The requirements included a required airflow at a stated water static pressure and special appearance requirements in the front to meet system level cosmetics, and the unit had to be redundant, speed controlled and on-line replaceable. The specification detailed the overall enclosure requirements but did not prescribe how to meet the performance and electrical requirements.

McLean designed a blower utilizing 2 motorized backward inclined impellers whose airflows are in series. During normal operation, the wheels turn at a fixed speed to meet airflow requirements. In the event of a locked rotor or motor failure, the other wheel speeds up to maintain full airflow performance, and a signal is sent to the system console indicating that the blower is in failed mode. The airmovers were arranged in series to maximize air efficiency against the total system level impedance and to eliminate the backflow losses associated with a parallel approach.

What is unique about this blower is that it performs several functions at once. It provides for proper airflow through the power supplies, and delivers transverse and impingement cooling to the board area.

Conclusions

Flotherm was used exclusively in the concept stages of the QUAD server development to perform feasibility studies and later in the development stages to determine adequate cooling levels of various components and boards. Flotherm data compared quite closely to experimental measurements for air velocity, mass flow rates, system level impedance and ultimately thermal prediction of the Pentium Pro.

Initial test results showed a large discrepancy between predicted and measured case temperatures for the Pentium Pro processors. Key contributing factors to the initial discrepancies include: unknown modifications to the thermal model, late changes to the heatsink that were not communicated cross-functionally, and enhancements to the test measurement methodology. The overall impact on the QUAD cooling development was small because of the robustness of the scheme developed. A modification to the blower specification was made in order to accommodate these changes. However, in a different type of application, the results could have had a much larger impact, possibly delaying development of the program.

The experiences of this product development make clear the importance of having accurate thermal models of major components available to the design engineer as soon as possible. When these models are provided, it is critical there be active and planned interaction between thermal model provider and the design engineer to ensure an accurate and up to date design.

References

Sequent on the Internet: http://www.sequent.com

Acknowledgments

* Dr. Steve Addison and Satang Shidore, Flomerics Incorporated, California.
* Mark Peterson, Staff Engineer, Sequent Computer Systems, Inc.
* Wayne Downer, Staff Engineer, Sequent Computer Systems, Inc.