21 May 2003 Fermi Linux Server Vendor Qualification--Steven Timm [email protected] 1 Fermi Linux Server...

Fermi Linux Server Vendor Qualification. HEPiX, May 21, 2003. Steven C. Timm, for the Fermi Linux Vendor Qualification Taskforce.

Fermi Linux Server Vendor Qualification

HEPiX

May 21, 2003

Steven C. Timm

For the Fermi Linux Vendor Qualification Taskforce

OUTLINE

Fermilab Hardware Procurement Strategy

Goals of Qualification

Procedures of Qualification

Results of Qualification

SUMMARY

The 2003 Fermi Linux Server Vendor Qualification focused on 1U Intel servers.

First phase was a technical evaluation which identified 18 technically qualified vendors.

All of these vendors participate in a price-performance bid (currently ongoing); the top five make the vendor list.

We remember all technically qualified vendors and rotate them in as necessary.

We are not making a new qualified desktop vendor list at this time.

Public web page: http://www-oss.fnal.gov/scs/public/qualify2003

Members of Fermi Linux Server Vendor Qualification Taskforce:

The taskforce involved personnel from five different departments plus key members of management. All major purchasers of server hardware were represented. Also represented were the computer room logistics staff.

Members: Steven Timm (chair), Margaret Greaney, Troy Dawson, Lance Weems, Hans Wenzel, Bruce Karrels, Don Holmgren, Phil Lutz, Stan Naymola, Mark Kaletka, Gerry Bellendir.

Fermi Hardware Procurement Strategy

Buy a hardware solution as fully integrated as possible, including installation.

Identify vendors that know Fermilab requirements and are willing to work with Fermi Linux.

Replacement parts via 3 year warranty, service provided by Fermilab.

Fermi Linux Vendor List--History

Two previous Fermi Linux qualifications, 1999 and 2001.

– 1999: desktops as farm workers, 5 vendors

– 2001: separate vendor lists for desktops and 2U rackmount servers

Also two special evaluations for 2U rackmounts and AMD.

Vendor list used in all major Fermi acquisitions, ~1500 machines from 1999-2002.

Also used by outside groups: KEK, INFN, Northwestern, MIT, Geneva, Carnegie Mellon, Pittsburgh, Edinburgh, others

Evaluation: performance/price

Overriding goal has been to get the best performance possible at the lowest price.

We have succeeded well: from 1999 to 2002, Fermi cycles per dollar increased by a factor of 6, whereas Moore’s law alone should have given us only a factor of four.

Users are happy with quantity of computing that they got for their money.

But still, in this evaluation, we are looking for better long term reliability, not race to the bottom for price only.
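
The factor-of-four expectation quoted on this slide is consistent with performance per dollar doubling roughly every 18 months over the three years 1999-2002. A minimal sketch of that arithmetic follows; the 18-month doubling period is an assumption (the slide does not state it), and the function name is illustrative only.

```python
def moore_factor(years: float, doubling_months: float = 18.0) -> float:
    """Expected growth factor if performance per dollar doubles every `doubling_months`.

    The 18-month default is an assumed doubling period, not a figure from the talk.
    """
    return 2.0 ** (years * 12.0 / doubling_months)


expected = moore_factor(3)   # 1999 to 2002
observed = 6.0               # growth in Fermi cycles per dollar quoted in the talk
print(f"expected x{expected:.1f}, observed x{observed:.1f}, "
      f"ratio {observed / expected:.2f}")
```

Under that assumption, the observed gain is about 1.5 times what the doubling rule alone would predict.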

Evaluation: Performance/price

Problem: One node not the best test of long-term price/performance by a company.

Small businesses best able to take time to follow directions of evaluation process and give support.

Small businesses not always able to deliver large orders in timely manner with good initial quality.

Single node prices not a good predictor of bid level on a real bid—and we shouldn’t be asking anyway.

Address by: getting technical qualification done first, then doing a price/performance bid.

Evaluation: Vendor attrition

Some vendors on list have gone out of business

Others disqualified for bad performance

Others stopped bidding on their own, or bid ridiculously high

Address by:

– Select vendor list on performance/price basis from all those technically qualified.

– Keep track of all technically qualified vendors; add to list if necessary

– Supplement list if special hardware (AMD, blades, desktop) required.

Evaluation: Initial quality

Problem: Going too low on the price curve: Sometimes vendors bid too low and try to deliver poor quality systems

Addressed, from the beginning, with tough 30-day acceptance test and “lemon law”

In various cases Fermilab has required vendors to do swaps on all units of PS, case, motherboard, disk drives, and racks.

Cost of Fermi labor to resolve the problem less than difference between the winning bid and the next highest bid.

All issues have been resolved through this process and the systems have all had productive lives.

NOW—also address with references and hard numbers on initial quality.

Evaluation: Components

Problem: Rapidly changing components

In commodity market, components change rapidly.

From beginning of eval to issuance of purchase order: about six months

CPU speeds go up, cases change.

Impossible to track for laptop, difficult to track for desktop.

OK for server market but results in higher heat loads and current draws.

ADDRESS by thermal specs that are broad enough so that if there are problems, vendor still has to fix.

Goals

We want to identify vendors who are best able to deliver rackmounted solutions

– Competent in Linux

– Build quality 1U Servers

– Can integrate into rackmount environment with good thermals in a timely and professional manner

– Have high performance

– Have good support and troubleshooting

Vendor Selection

Existing vendors on Fermi Linux list

Sales to other Fermi Departments

Advertisements at trade shows

Survey of other DOE labs at HEPiX

Vendor’s direct contact to Fermilab asking to participate.

Chronology

We made contact with 45 vendors in all.

29 vendors attended Jan. 28 info meeting

24 vendors submitted acceptable configuration on Feb. 4

21 vendors submitted acceptable benchmarks and were cleared to ship unit on Mar. 4; all got it here by Mar. 11.

18 vendors identified as technically qualified

Specifications

1U Dual Intel Xeon, 2.4 GHz or faster

400 MHz front side bus or faster

1 GB RAM (RDRAM or DDR SDRAM)

Disks: 1 x 20 GB system, 2 x 40 GB data

100 Mbit Ethernet

Video

CDROM, Floppy

Why just 1U Xeon

AMD hardware shows high initial failure rate, high current, high heat.

1U is most challenging thermal case…if they can build 1U we believe they can build 2U.

Intel chips are supposed to be faster than AMD at the moment

Intel chips supposed to run cooler, draw less current.

Simplicity: a platform we already mostly understand, just one from each vendor

Space: we don’t have space to put so many 2U.

Linux Competence

Vendor identifies hardware that’s compatible with Linux. (Much easier than it used to be).

Vendor loads Fermi Linux onto evaluation node

Have to configure lm_sensors on the node

Runs our supplied test to check and see if they did it right.

They are only allowed to ship the unit to Fermilab if it is right.
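
The talk does not describe the supplied test itself. As a rough illustration of the kind of check involved, the sketch below parses text in the style printed by the lm_sensors `sensors` command and confirms that at least one temperature reading is present. The function names, the sample text, and the pass criterion are all assumptions, not Fermilab's actual test.

```python
import re
import subprocess

# Matches lines such as "temp1:   +45.0°C  (high = +80.0°C)".
# The exact layout of `sensors` output varies by chip and version,
# so this pattern is an assumption that may need adjusting.
TEMP_LINE = re.compile(
    r"^(?P<label>[^:\n]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*°?C\b",
    re.MULTILINE,
)


def parse_temperatures(text: str) -> dict[str, float]:
    """Return {label: degrees C} for every temperature line found in `text`."""
    return {m["label"].strip(): float(m["value"]) for m in TEMP_LINE.finditer(text)}


def sensors_configured(text: str, min_readings: int = 1) -> bool:
    """Hypothetical pass criterion: at least `min_readings` temperatures reported."""
    return len(parse_temperatures(text)) >= min_readings


def read_sensors() -> str:
    """Run the `sensors` command on a node where lm_sensors is installed."""
    return subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    ).stdout


# Made-up sample text for illustration; not output from any evaluated machine.
SAMPLE = """\
fan1:      4500 RPM
temp1:     +45.0°C  (high = +80.0°C)
temp2:     +38.5°C  (high = +80.0°C)
"""
print(parse_temperatures(SAMPLE), sensors_configured(SAMPLE))
```

On a real node, `sensors_configured(read_sensors())` would apply the same check to live output; a stricter test might also require fan and voltage readings.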

Electrical

Electric current measured with ammeter at startup, idle, and full CPU load.

Current draw ranges: 2.4 GHz, 1.6-2.0 A; 2.8 GHz, 2.0-2.3 A; 3.06 GHz, 2.1-2.35 A

Likely that with purchase of 2.8 or 3.06GHz machines we can only have seven machines per circuit, not eight as in the past.

Those with higher current draw also tend to have more fans and be better internally cooled.

Bright side—This current similar to 750MHz machines bought 3 years ago, 2.5x the performance for the same current.
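
The machines-per-circuit estimate is a division of usable circuit current by per-machine draw. The sketch below works through it using the current ranges quoted on this slide; the 16 A usable capacity is an assumed placeholder (the talk does not state the circuit rating), so the counts it prints are illustrative only.

```python
import math

# Current draw ranges in amperes, as quoted on the slide.
CURRENT_RANGE_A = {
    "2.4 GHz": (1.6, 2.0),
    "2.8 GHz": (2.0, 2.3),
    "3.06 GHz": (2.1, 2.35),
}

# ASSUMPTION: usable amperes per circuit. Not given in the talk.
USABLE_AMPS = 16.0


def machines_per_circuit(amps_per_machine: float, usable_amps: float = USABLE_AMPS) -> int:
    """Largest whole number of machines whose combined draw fits on one circuit."""
    return math.floor(usable_amps / amps_per_machine)


for cpu, (low, high) in CURRENT_RANGE_A.items():
    midpoint = (low + high) / 2
    print(f"{cpu}: typical {machines_per_circuit(midpoint)}, "
          f"worst case {machines_per_circuit(high)}")
```

With these assumed inputs, the range midpoints give eight machines per circuit at 2.4 GHz and seven at 2.8 and 3.06 GHz; a different circuit rating would change the counts.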

Thermal

Measured ΔT from front to back of unit for all.

Used internal temperature probes on each unique type of case.

All units in evaluation much cooler than the 1U units bought in FY2002.

Due to better thermal characteristics of Intel chip and many more added internal fans and blowers.

“Northbridge” chipset chips in some machines ran hotter than the CPUs. Important to watch size of heatsink on these chips.

Still analyzing the data we took but confident that all units are acceptable.

Thermals continued

Quality 1U Servers

Open each machine to verify quality of construction

Run burn-in on each machine for two weeks

Thermal measurements in real rack situation

Electrical current measurements

Verify all components meet specs.

Integration capabilities contd.

Vendors are asked to submit sample proposal for full rack of systems

Standard Fermi rack configuration is base of proposal but they can suggest extras.

Goal is to (1) learn if they can integrate and (2) get new ideas on how to improve our setup.

Also they must submit info on clusters they have installed before, with real temperature and reliability numbers.

Performance

Vendors are supplied CD-ROM of CDF and D0 Benchmark

Performance measured in Fermi Cycles where PIII 1 GHz=1000 Fermi Cycles.

We repeat test when machine gets here

QCD benchmark, seti@home, tiny also run.

Would be ideal to use SPEC CPU2000, but published results not repeatable with compilers used by Fermi.

Price doesn’t enter in technical evaluation.
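
The slides define the Fermi Cycles scale only by its anchor point (PIII 1 GHz = 1000 Fermi Cycles). One plausible reading, sketched below, is that a machine's score scales inversely with its benchmark run time relative to that reference machine. The formula, function name, and timings are assumptions for illustration; the talk does not give the actual conversion.

```python
REFERENCE_SCORE = 1000.0  # Fermi Cycles assigned to a 1 GHz Pentium III


def fermi_cycles(reference_seconds: float, machine_seconds: float) -> float:
    """ASSUMED conversion: score is inversely proportional to benchmark run time."""
    if reference_seconds <= 0 or machine_seconds <= 0:
        raise ValueError("run times must be positive")
    return REFERENCE_SCORE * reference_seconds / machine_seconds


# Hypothetical timings: a machine finishing in half the reference time scores 2000.
print(fermi_cycles(reference_seconds=600.0, machine_seconds=300.0))
```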

Performance

3 CPU speeds measured: 2.4, 2.8, 3.06 GHz. 1000 Fermi Cycles = PIII 1 GHz.

Average performance: 1779, 2041, 2223 Fermi Cycles respectively.

400 MHz vs 533 MHz front side bus is 2.5% effect for farms software, much bigger for QCD.

AMD MP2200+: 1771 Fermi Cycles

Performance is projected to faster clock speeds in anticipation that some vendors will bid faster chips.
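
The talk does not say how the projection to faster clocks was made. One simple possibility is a least-squares straight line through the three measured averages, sketched below; the linear model and the 3.2 GHz target are assumptions chosen for illustration.

```python
# Measured averages from the slide: clock speed (GHz) -> Fermi Cycles.
MEASURED = [(2.4, 1779.0), (2.8, 2041.0), (3.06, 2223.0)]


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(ghz: float, points=MEASURED) -> float:
    """Projected Fermi Cycles at `ghz` under the assumed linear model."""
    slope, intercept = fit_line(points)
    return slope * ghz + intercept


slope, intercept = fit_line(MEASURED)
print(f"about {slope:.0f} Fermi Cycles per GHz, offset {intercept:.0f}")
print(f"projected at 3.2 GHz (hypothetical part): {project(3.2):.0f}")
```

A straight line is only a first approximation: the front side bus and memory speed also affect the score, so any projection of this kind should be rechecked once a faster machine can actually be measured.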

Support and Troubleshooting

Each vendor gets software call—related to the configuration of Fermi Linux, solvable by E-mail or phone

Each vendor gets hardware call—designed to trigger an on-site service call.

We manufacture one if necessary.

Points for prompt response, correct response.

Conclusions

18 technically qualified vendors, in alphabetical order:

Ace, Angstrom, APPRO, ASA, Aspen, Atipa, Concentric, Dell, HP, IBM, Koi, Penguin, Promicro, PSSC, Rackable, Racksaver, Richardson, Western Scientific

Price/performance bid will weed them down to five.

21 vendors is too many to bring in, will be more discriminating next time.

Component issues:

Boards OK: Intel SE7501 series, Supermicro X5DPx series, Tyan 2721, Tyan 2723

Both Tyan S2721-533 (Thunder i7501 Pro) and Tyan S2723 (Tiger i7501) had issues with 10/100 ethernet…resolved by changing resistor value on the board

Some manufacturers offer cold-swap and hot-swap capabilities on drives, very nice.

Issues in Intel E7501 chipset—slower disk throughput than some earlier chipsets, but adequate for our needs.

Price/performance bid

All vendors who pass our technical requirements are participating in a price/performance bid on a small number of nodes (48)

Top five will be the Fermi Linux Qualified Vendors

We will keep track of all technically qualified vendors to replenish the list if

– A vendor goes out of business

– A vendor stops bidding, or bids consistently very high on Fermi RFPs

– A particular RFP requires special capacities: Myrinet, AMD, blade servers, desktop
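
The selection rule on this slide (rank technically qualified vendors by price/performance, keep the top five, hold the rest in reserve) can be sketched as follows. Vendor names and prices are made-up placeholders, and Fermi Cycles per dollar is assumed as the ranking metric; the actual bid evaluation may weigh other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    vendor: str            # placeholder name, not a real vendor
    price_per_node: float  # dollars, hypothetical
    fermi_cycles: float    # benchmark score per node, hypothetical

    @property
    def cycles_per_dollar(self) -> float:
        return self.fermi_cycles / self.price_per_node


def select_vendors(bids, list_size=5):
    """Split bids into (qualified list, reserve), best price/performance first."""
    ranked = sorted(bids, key=lambda b: b.cycles_per_dollar, reverse=True)
    return ranked[:list_size], ranked[list_size:]


def replenish(qualified, reserve, dropped_vendor):
    """Remove a vendor from the list and rotate in the best-ranked reserve vendor."""
    remaining = [b for b in qualified if b.vendor != dropped_vendor]
    if len(remaining) < len(qualified) and reserve:
        return remaining + [reserve[0]], reserve[1:]
    return remaining, reserve


bids = [Bid(f"vendor_{c}", price, score) for c, price, score in [
    ("a", 2000.0, 2041.0), ("b", 1900.0, 1779.0), ("c", 2400.0, 2223.0),
    ("d", 2100.0, 2041.0), ("e", 1800.0, 1779.0), ("f", 2600.0, 2223.0),
    ("g", 2200.0, 2041.0),
]]
qualified, reserve = select_vendors(bids)
qualified, reserve = replenish(qualified, reserve, dropped_vendor="vendor_a")
print([b.vendor for b in qualified], [b.vendor for b in reserve])
```

Keeping the reserve sorted means a replacement can be rotated in without rerunning the technical evaluation, which matches the intent of remembering all technically qualified vendors.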

Future Plans

Blade server evaluation coming up.

– Requires change in install philosophy…no floppy, CDROM, serial console available.

– Essential to address power and space concerns in Feynman and elsewhere.