Supercomputing and Mass Market Desktops

18
Supercomputing and Mass Market Desktops John Manferdelli Microsoft Corporation 1 This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Transcript of Supercomputing and Mass Market Desktops

Page 1: Supercomputing and Mass Market Desktops

Supercomputing and Mass Market Desktops

John Manferdelli

Microsoft Corporation

1

This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this

summary.

Page 3: Supercomputing and Mass Market Desktops

Hardware Paradigm Shift - 2004

“… we see a very significant shift in what architectures will look like in the future

...

fundamentally the way we've begun to look at doing that is to move from

instruction level concurrency to … multiple cores per die. But we're going to

continue to go beyond there. And that just won't be in our server lines in the

future; this will permeate every architecture that we build. All will have massively

multicore implementations.”

Intel Developer Forum, Spring 2004

Pat Gelsinger

Chief Technology Officer, Senior Vice President

Intel Corporation

February, 19, 2004

10,000

1,000

100

10

1

„70 „80 „90 „00 „10

Po

wer

Den

sit

y (

W/c

m2)

4004

8008

8080

8085

8086

286386

486

Pentium® processors

Hot Plate

Nuclear Reactor

Rocket Nozzle

Sun‟s Surface

Intel Developer Forum, Spring 2004 - Pat Gelsinger

Today's Architecture: Memory access speed not

keeping up with CPU clock speeds

Modern Microprocessors - Jason Patterson

Sp

eed

(M

Hz)

10,000

1,000

100

10

1990 1992 1994 1996 1998 2000 2002 2004

CPU Clock Speed

DRAM Access

Speed

Today‟s Architecture: Heat becoming an

unmanageable problem!

Intel Cancels Top-Speed Pentium 4 ChipThu Oct 14, 6:50 PM ET Technology - Reuters

By Daniel Sorid

Intel …canceled plans to introduce its highest-speed desktop

computer chip, ending for now a 25-year run that has seen the

speeds of Intel's microprocessors increase by more than 750

times.

Memory Wall~90 cycles of the CPU clock

to access main memory!

Page 4: Supercomputing and Mass Market Desktops

Trends

1

2

4

8

16

32

64

128

256

512

1024

2048

4096

8192

16384

32768

65536

2004 90nm

2006 65nm

2008 45nm

2010 32nm

2012 22nm

2015 16nm

max transistors (millions, 500 mm2 die)

potential many-core peak parallel GOPS

multi-core single threaded GOPS

memory bandwidth, novel packaging (10s GB/s)

memory bandwidth, DIMMs (10s GB/s)

8X

80X

max transistors ↑40%/year

single thread perf ↑10%/year

32 LPIA cores @ 30 GFLOPS

25X

MCM

stacked dice DRAM

ParallelismOpportunity

Montecito

BandwidthOpportunity

4

Page 5: Supercomputing and Mass Market Desktops

LPIA

x86

1 MB

cache

1 MB

cache

DRAM

ctlr

LPIA

x86

1 MB

cache

1 MB

cache

PCIe

ctlr

Ultra-Mobile: 40 mm2, 5 W, $50

LPIA

x86

LPIA

x86

DRAM

ctlr

DRAM

ctlr

DRAM

ctlr

DRAM

ctlr

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

PCIe

ctlrNoC NoC NoC NoC NoC NoC

PCIe

ctlr

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

LPIA

x86

LPIA

x86 Custom acceleration

LPIA

x86

LPIA

x86

Server: 350 mm2, 120 W, $2000

LPIA

x86

LPIA

x86

DRAM

ctlr

DRAM

ctlr

OoO

x86

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

PCIe

ctlr

PCIe

ctlrNoC NoC

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

1 MB

cache

1 MB

cache

LPIA

x86

LPIA

x86

DRAM

ctlr

DRAM

ctlr

OoO

x86

Desktop: 200 mm2, 100 W, $400

Lots more computing powerLos Alamos Computing Center? No a single CPU chip.

(2008 45 nm process)

5

Page 6: Supercomputing and Mass Market Desktops

1. We believe user experiences will benefit from 100 fold improvements in computational power.

2. Because of physics, you won’t get this power from frequency scaling or programmer transparent hardware like ILP. The only way is parallelism.

3. Chips with many heterogeneous cores can be manufactured now. Only with chips like this can such performance scaling continue.

4. There are difficult but solvable issues related to memory bandwidth and I/O for this to work well over-all.

5. There are other benefits from such chips like good power and manufacturing characteristics .

6. Application of such hardware does not appear to be limited by any intrinsic lack of parallel algorithms.

Many Core in a Nutshell

6

Page 7: Supercomputing and Mass Market Desktops

Manycore Induced Hard Problems

7

Constructing parallel applications

• Encapsulating parallelism in reusable components

• Integrating concurrency & coordination into existing programming languages

• Raising the semantic level to eliminate sequencing

• Reducing the complexity of debugging, tuning and testing

Executing fine-grain concurrent applications

• Managing large amounts potential concurrency

• Supporting lightweight transactions

• Interoperating with legacy thread model and interfaces

• Evolving hardware to effectively support parallel programs

Coordination of system resources and services

• Assigning resources securely

• Hosting concurrent operating environments

• Managing resources cooperatively

• Providing concurrent system services

• Managing heterogeneous resources

Page 8: Supercomputing and Mass Market Desktops

8

Manycore Stack

Constructing Parallel

Applications

Executing Fine-Grain

Parallel Applications

Coordinating

System Resources

and Services

Applications

Libraries

Languages, Compilers and Tools

Concurrency Runtime

Partitioning Hypervisor

Hardware

OS Kernel

Page 9: Supercomputing and Mass Market Desktops

Developer Impact

9

New Languages Safety & Portability Streaming Data Parallel Functional Parallel

Programming Abstractions

Existing Languages Data Parallel Extensions Transactions API for messaging

Developer Tools

Infrastructure Transaction & Parallel

Safe Numerical Libraries Declarative Languages

Asynchronous Agents

Concurrent Collections

Transactions

Visual DesignersAgents, Dataflow

Diagnostics Debugging &

Performance Tuning

Testing & Verification

Page 10: Supercomputing and Mass Market Desktops

10

Simulating The Physical World

Page 11: Supercomputing and Mass Market Desktops

Gameplay Simulation

Page 12: Supercomputing and Mass Market Desktops

Life

Sciences

Multidisciplinary

Research

New Materials,

Technologies

and Processes

Computer and

Information

Sciences

Math and

Physical Science

Social

SciencesEarth

Sciences

Page 13: Supercomputing and Mass Market Desktops

ComputationalModeling

Computational

Modeling

Sensors

Interpretation

& Insight

Persist

Data Mining

& Algorithms

Page 14: Supercomputing and Mass Market Desktops

A transformative exampleModeling of the instrumented world

141414

First Responder Scenario (empowering users to make good decisions)First responders need to quickly and safely experience unfamiliar areas

under different disaster scenarios before they deploy into it.

For example, earthquakes in San Francisco.

Setup Environment

Explore Possibilities

Monitor and Take Action

Page 15: Supercomputing and Mass Market Desktops

Scenario Features Tech Domains

15

Modeling the instrumented worldScenario: First responders need to quickly and safely experience unfamiliar areas under different disaster scenarios before they deploy into it.

Technology Domains: To realize this scenario, we need to compose these technologies and accelerate their performance at least 10-fold on the manycore client.

Mac

hin

e Le

arn

ing

Mat

h

Ph

ysic

s

Stat

isti

cs

Imag

e P

roce

ssin

g

3d

Mo

del

ing

Ren

der

ing

Dat

abas

e

Qu

ery

engi

ne

Ind

ex e

ngi

ne

Dat

a co

mp

ress

ion

Mo

tio

n In

pu

t

Stitching scenes together (2D, 3D) P P P P P

Object recognition in scene P P P P

Interaction (navigation) in scene P P P P P P P

Linking scene objects to live data P P P P P

Filling in missing information P P P P P P P

Image and video correction P P P P P P

Long running scene statistics P P P P P

Personalization of content P P P

Simulating physical events P P P P P

Real-time scene update P P P P P P P P P P

Page 16: Supercomputing and Mass Market Desktops

Tech Domains Platform

Technology DomainsTo realize this scenario, we need to compose these technologies and accelerate their performance at least 10-fold on the manycore client..

Platform : To compose technology domains and accelerate their performance 10-fold, we need to build the following platformcomponents

Inte

grat

ing

con

curr

ency

in

to e

xist

ing

lan

guag

es

Enca

psu

lati

ng

par

alle

l-is

m in

reu

sab

le m

od

ule

s

Too

ls f

or

deb

ugg

ing

and

p

rofi

ling

par

alle

l co

de

Sup

po

rt f

or

fin

e-g

rain

d

ata

par

alle

lism

Sup

po

rt f

or

ligh

twei

ght

tran

sact

ion

s

Op

tim

izat

ion

of

avai

lab

le b

and

wid

th

Syst

em-w

ide

dyn

amic

re

sou

rce

man

agem

ent

Co

nn

ecti

vity

of

serv

ices

Man

agin

gm

anyc

ore

har

dw

are

pla

tfo

rm

Machine Learning P P P P P P P P

Math P P P P

Physics P P P P P P P

Statistics P P P P P

Image Processing P P P P P P P P P

3D Modeling P P P P P P P P

Rendering P P P P P P P P P

Database P P P P P P P P P

Query engine P P P P P P P P P

Data compression P P P P

Gesture/Motion Input P P P P P P P

Page 17: Supercomputing and Mass Market Desktops

Application/Libraries Architecture

Personal Assistant Information Management

Natural UI Games & Entertainment

Program Development Biology/Health care

Technical Computing Business Intelligence

System Design Robotics

Speech Engine, Gaming (Physics/AI), Unified

Communications, Database, Vision, Machine

Learning, Semantic Processing

Single domain physics, search algorithms,

audio/video characterization and extraction, biology

simulations, optimization and constraint resolution

Applications

Application Services

Domain Specific Libraries

BaseLibraries

Common data structures and

algorithms: trees, graphs, tables,

sorting, traversal.

Page 18: Supercomputing and Mass Market Desktops

System Architecture

Hypervisor

Device/

Virtualization

Service

Partition

Virtual

Device

Disk Net Keyboard, Mouse, Screen

Trusted I/OCPUs, Memory, TPM

Machine

Management

Partition

Paravirtualized

OS Partition

Device

proxy

ConcRT

SysService API

Many-Core

Compute Partition

ConcRT

Device

proxy

Application

Kernel

Application

SysService

Kernel

Resource Management between OS/Runtime and

VM/OS (under Machine Management)Isolation and cooperative

device support provided by

VM/OS

Scalable Services provided by OS (async,

cancellable, thread affinity free)