Who am I? Architecture Up Close - University of Washington


CSE 403, 2/6/06

Architecture Up Close

Gail Alverson
Cray Inc.

Who am I?

• Gail Alverson
• Lurker in your Winter 403 class
• Computer Science Professor
• Senior Engineering Manager at Cray Inc.
• Your entertainer for the next 50 minutes!

Outline

Part 1 – Case Study: Story of an engineer
    What’s my software engineering history?

Part 2 – Architecture up close at Cray
    Some examples
    Some challenges
    Some lessons learned

Government-Classified

Government-Research

Weather/Environmental

Automotive

Life Sciences

Aerospace

Academic Research

Cray Solutions for:
• Security
• Design Engineering
• Environmental
• Life Sciences
• Data Management/SAN

Focus on systems to solve huge computationally intense problems

Cray Inc.

My SW Engineering roles

• Technical Manager for the VP of Engineering
• Senior Manager, OS and PE components
• Manager, Programming Environments (PE)
• Project lead/Software engineer, libraries and tools
• Software Engineer – libraries and debugger

[Arrow alongside the list: TIME]

Can you think of a different SW career track?

My last project: Red Storm system

• Massively parallel processing supercomputer system used for analysis and stewardship of nuclear weapons, for Sandia National Lab, $93M

System requirements included
• 10,000 AMD processors
• High speed custom interconnect
• LOW communication latency
• Fast custom microkernel on compute nodes
• Unix OS on service nodes
• High performance IO
• 2 ½ year development timeline
• No single point of failure

Can you think of some SW challenges? Risks?

How did we architect this beast?

Identify major components
While (1) {
    Understand your component (and overall) requirements
    Design major interfaces with dependent components
        == interact with other architects
    Architect your component (the what)
    Design review internally and with customer
    [Prototype/further design (the how)]
}

Here are some examples of what an architecture can capture and ways it can be presented

Hardware Architecture

[Figure: node hardware diagram. Labels: 2 GHz AMD Opteron™ processor core; RAM; 1 –> 4 GB DRAM; HyperTransport; Cray System Chip; SSI block; L0 controller; DMA engine; router; HSN communication links in the +X, -X, +Y, -Y, +Z, -Z directions. Legend: Cray Design, 3rd Party Integration.]

Network Architecture

[Figure: network diagram. Labels: Compute Node; Service & IO Node; HSN communication links; PCI-X; 10 GigE Network; RAID Controllers.]

• 3D Mesh: 27 x 16 x 24 (x, y, z)
• MPI Latency: 2 us nearest neighbor, 5 us furthest point
• Bi-section Bandwidth: 1.5 TB/s bi-directional
• Aggregate Sustained Bandwidth: 120 TB/s
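The mesh dimensions quoted above (27 x 16 x 24) can be checked against the "10,000 processors" requirement with a little arithmetic. The sketch below is illustrative only. It assumes a plain mesh with no wraparound links (the slide says only "3D Mesh") and counts hops as Manhattan distance. The function names are hypothetical, and hop counts are not a model of the quoted MPI latencies.

```python
# Mesh dimensions taken from the slide: 27 x 16 x 24 (x, y, z).
MESH_DIMS = (27, 16, 24)


def node_count(dims):
    """Total number of positions in the mesh."""
    total = 1
    for d in dims:
        total *= d
    return total


def hops(a, b):
    """Hop count between two coordinates.

    Assumption: plain mesh, no wraparound, so the distance is the
    Manhattan distance between the coordinates.
    """
    return sum(abs(p - q) for p, q in zip(a, b))


def neighbors(coord, dims=MESH_DIMS):
    """Coordinates one hop away along +/-X, +/-Y, +/-Z, clipped at the mesh edges."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coord)
            c[axis] += step
            if 0 <= c[axis] < dims[axis]:
                result.append(tuple(c))
    return result


far_corner = tuple(d - 1 for d in MESH_DIMS)
print(node_count(MESH_DIMS))        # 10368
print(hops((0, 0, 0), far_corner))  # 64
print(len(neighbors((0, 0, 0))))    # 3
print(len(neighbors((13, 8, 12))))  # 6
```

Under these assumptions the mesh holds 10,368 positions, and the two most distant corners are 64 hops apart.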

Can you remember some ways to view SW architecture?

• Code oriented views
    – Module view
    – Software layer view
• Execution oriented views
    – Data-flow view
    – Operational view
• Deployment views

Software Architecture (SA)

[Figure: system partitions. Compute nodes run the LWK; Service & IO nodes run Linux. Service & IO nodes connect to RAID Controllers (Fiber Channel, PCI-X) and to the Network (10 GigE, PCI-X).]

Compute Partition (10K nodes)
• Sandia LWK OS (Catamount)

Service Partition (256 nodes)
• Linux OS
• Specialized Linux nodes
    – Login Nodes ~ 10
    – Dedicated Functional Nodes
        IO Server Nodes ~ 190
        Network Server Nodes ~ 50
        FS Metadata Server Nodes ~ 4
        Database Server Nodes ~ 2

What type of view is this?
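The service partition breakdown above can also be written down as data, which makes it easy to check that the per-role counts add up to the stated 256 nodes. This is a minimal sketch: the `Partition` class and the role keys are hypothetical names chosen for illustration, and the counts are the approximate ("~") figures from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical record for one partition of the deployment view."""
    name: str
    os: str
    # Approximate node counts per role, as given on the slide ("~").
    roles: dict = field(default_factory=dict)

    def total(self):
        """Sum of the node counts over all roles."""
        return sum(self.roles.values())


service = Partition(
    name="Service Partition",
    os="Linux",
    roles={
        "login": 10,
        "io_server": 190,
        "network_server": 50,
        "fs_metadata_server": 4,
        "database_server": 2,
    },
)

print(service.total())  # 256
```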

SA: Service Nodes

[Figure: layered diagram of the service node software. Labels: User’s Environment; User Space; Login Node (Compiler, PerfTool, TotalView, Showmesh, Job Launch, Allocator, PBS mom); Server Node, Dedicated Server Functions (Accounting, DB, PBS srv); IO Node (IO & FS daemons); Linux Libraries; Common; Support Services & daemons; RCA daemon; Boot; Parallel File Systems; Operating System; Portals; Portals Driver; IP; Portals Driver Support; RCA; IO & Network Nodes only (FC Driver, 10GigE Driver; Dedicated Network & External RAID IO). Annotations: Specialized by Node Function; Runs on all Service and IO Nodes. Legend: 3rd Party, Cray, Sandia, Open Source, 3rd Party Cray Integration.]

What type of view is this?

SA: Message Passing

[Figure: Service Nodes, each running a Linux Kernel with RCA and Portals, and Compute Nodes, each running the LWK Quintessential Kernel (QK) with RCA and Portals. Both sides use Portals 3.2 over Seastar and SSI to exchange Data Packets over the High Speed Network.]

What type of view is this?

SA: more Message Passing

[Figure: message passing stacks for LWK on Compute Side and Linux on Service Side, split into User, Kernel, and HW layers. Labels: Application; MPI Library; MPI Library or SML Library; Portals 3.2; API; LWK Portals Module; Kernel Lib; SSNAL; Firmware Lib; Portals State; Firmware Driver State; SeaStar; Firmware. The two sides exchange Messages over High Speed Network.]

What could make this view more meaningful to the customer?

SA: Job Launch

[Figure: job launch flow across a Login Node (Linux), a Database Node (Allocator, CPU Inventory Database), Compute Nodes (catamount/QK, PCT, User Application), and IO Nodes (IO daemons). Flow labels, in order:]

• Red Storm User: Login & Start App
• Request CPUs
• CPU list
• Load Application
• Fan out application
• Application Runs
• IO Requests from Compute Nodes; IO Nodes Implement Request
• Application Exit & Cleanup
• Return CPUs

Is there enough detail to start coding from this view?

Diagrams rock, but you need to specify the APIs as well

Flowcharts help define the usage

APIs specify enough of the “what” for the next development phase, of designing the “how”
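To make the point about APIs concrete, here is a minimal sketch of what the "what" of an allocator interface might look like for the job launch flow above. Everything in it is hypothetical: the class and method names (`Allocator`, `request_cpus`, `return_cpus`, `launch_job`) are invented for illustration, and an in-memory set stands in for the CPU inventory database. It is not the Red Storm interface.

```python
class AllocationError(Exception):
    """Raised when the inventory cannot satisfy a request."""


class Allocator:
    """Hypothetical allocator in front of a CPU inventory (an in-memory set here)."""

    def __init__(self, cpu_ids):
        self._free = set(cpu_ids)

    def request_cpus(self, count):
        """Reserve `count` CPUs and return their ids, lowest first."""
        if count > len(self._free):
            raise AllocationError(f"requested {count}, only {len(self._free)} free")
        granted = sorted(self._free)[:count]
        self._free.difference_update(granted)
        return granted

    def return_cpus(self, cpu_ids):
        """Give CPUs back to the inventory."""
        self._free.update(cpu_ids)

    def free_count(self):
        """Number of CPUs currently available."""
        return len(self._free)


def launch_job(allocator, app, count):
    """Request CPUs, run `app` once per CPU, and always return the CPUs."""
    cpus = allocator.request_cpus(count)
    try:
        # Stand-in for "fan out application" and "application runs".
        return [app(cpu) for cpu in cpus]
    finally:
        # "Application exit & cleanup": CPUs go back even if the app fails.
        allocator.return_cpus(cpus)


allocator = Allocator(range(8))
results = launch_job(allocator, lambda cpu: f"ran on cpu {cpu}", 3)
print(results)                 # ['ran on cpu 0', 'ran on cpu 1', 'ran on cpu 2']
print(allocator.free_count())  # 8
```

The signatures and the error case are the "what". How the inventory is stored and how the application is fanned out are left to the next design phase.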

One last example of a SA: System boot

Requirement 5.11: The full system must be able to be booted in less than 15 minutes after a clean shutdown.

Service Nodes
1. RAS broadcasts kernel to Service nodes
2. Service nodes verify Unix image is valid
   If not, the first loads the image from RAS and the others load from a peer
3. Service nodes are up and running

Compute Nodes
1. RAS broadcasts kernel to cage controllers
2. Cage controllers broadcast to other nodes
3. Compute nodes are up and running

[Figure: RAS holding the golden Unix kernel, golden Unix image, and golden Cougar kernel, with nodes labeled C, R, I, and L.]

How is software different?
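Step 2 of the service node boot sequence above (verify the image, reload it if it is invalid) can be sketched as follows. The slides do not say how the image is verified, so the SHA-256 comparison against a known-good digest is an assumption, and `select_boot_image` and its parameters are hypothetical names.

```python
import hashlib


def sha256_hex(image: bytes) -> str:
    """Hex digest used here as the validity check (an assumption)."""
    return hashlib.sha256(image).hexdigest()


def select_boot_image(local_image, golden_digest, load_from_ras, load_from_peer, is_first):
    """Return (image, source) for one service node.

    A valid local image is used as is. Otherwise the first node reloads
    from RAS and every other node reloads from a peer.
    """
    if local_image is not None and sha256_hex(local_image) == golden_digest:
        return local_image, "local"
    source = "ras" if is_first else "peer"
    image = load_from_ras() if is_first else load_from_peer()
    if sha256_hex(image) != golden_digest:
        raise ValueError("reloaded image failed verification")
    return image, source


golden = b"golden unix image"
golden_digest = sha256_hex(golden)


def fetch():
    """Stand-in loader for both RAS and a peer in this example."""
    return golden


print(select_boot_image(golden, golden_digest, fetch, fetch, False)[1])      # local
print(select_boot_image(b"corrupt", golden_digest, fetch, fetch, True)[1])   # ras
print(select_boot_image(b"corrupt", golden_digest, fetch, fetch, False)[1])  # peer
```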

Pictures tend to be part of an overall architecture document

Software Architecture
• Introduction
    – Revision history!
    – Scope
    – Requirements
• System description
• Deployment view
    – Intro
    – Top level diagram
    – Interfaces
• Functional view
• Software layer view
• …

Can you think of other elements?
    – Issues to resolve (risks, mitigations)

Stepping back, what was important to capture in the architecture?

Can you see any common themes?

• Multiple views of the components
• Diagrams, diagrams, diagrams
• Focus on interfaces and their requirements: the “what” of the integration points

What was most important in the architecture process?

What do you think?

• Reviews
    – Both internal and with the customer
• Iteration

Epilogue

• Red Storm was made into a product, Cray XT3
• Full delivery was 3 ½+ years, but got something to the customer in 3
• Software effort was much more complex than expected
• Rearchitected at least two major SW components after getting experience with them
• Product is successful and in demand (yah!)