Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.


Page 1: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Towards a Digital Human

Kathy Yelick

EECS Department

U.C. Berkeley

Page 2: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

The 20+ Year Vision

• Imagine a “digital body double”
  – 3D image-based medical record
  – Includes diagnostic, pathologic, and other information

• Used for:
  – Diagnosis
  – Less invasive surgery-by-robot
  – Experimental treatments

• Digital Human Effort
  – Led by the Federation of American Scientists

Page 3: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Digital Human Today: Imaging

• The Visible Human Project
  – 18,000 digitized sections of the body
    • Male: 1 mm sections, released in 1994
    • Female: 0.33 mm sections, released in 1995
  – Goals
    • Study of human anatomy
    • Testing medical imaging algorithms
  – Current applications:
    • Educational, diagnostic, treatment planning, virtual reality, artistic, mathematical, and industrial
    • Used by > 1,400 licensees in 42 countries

Image Source: www.madsci.org

Page 4: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Digital Human Roadmap

[Figure: roadmap timeline with axis 1995, 2000, 2005, 2010]

• Milestones: 1 organ, 1 model; 1 organ, multiple models; multiple organs; organ system; digital human
• Enabling work: 3D model construction; scalable implementations; better algorithms; coupled models; faster computers; improved programmability

Page 5: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Building 3D Models from Images

Image source: John Sullivan et al., WPI

• Image data from
  – Visible Human
  – MRI
  – Laboratory experiments

• Automatic construction
  – Surface mesh
  – Volume mesh
  – John Sullivan et al., WPI

Page 6: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Simulation: From Genome to Function (James B. Bassingthwaighte, UW)

[Figure: hierarchy of levels: Genes, Molecule, Cell, Tissue, Organ, Organism, Health]

The Physiome Project: http://www.physiome.org

Structure to Function:
• Experiments, Databases
• Problem Formulation
• Engineering the Solutions
• Quantitative System Modeling
• Archiving & Dissemination

Page 7: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Organ Simulation

• Cardiac cells/muscles – SDSC, Auckland, UW, Utah
• Cardiac flow – NYU, …
• Lung transport – Vanderbilt
• Lung flow – ORNL
• Cochlea – Caltech, UM, UCB
• Kidney mesh generation – Dartmouth
• Electrocardiography – Johns Hopkins, …
• Skeletal mesh generation
• Brain – Ellisman

Just a few of the efforts at understanding and simulating parts of the human body

Page 8: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Immersed Boundaries within the Body

• Fluid flow within the body is one of the major challenges, e.g.,
  – Blood through the heart
  – Coagulation of platelets in clots
  – Movement of bacteria

• A key problem is modeling an elastic structure immersed in a fluid
  – Irregular moving boundaries
  – Wide range of scales
  – Problems vary by structure, connectivity, viscosity, external forces, internally generated forces, etc.

Page 9: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Heart Simulation

Developed by Peskin and McQueen at NYU
  – On a Cray C90: 1 heartbeat in 100 CPU hours
  – 128^3-point fluid mesh; a coarser mesh gives incorrect results
  – Heart represented by muscle fibers
  – Applications:
    • Understanding structural abnormalities
    • Evaluating artificial heart valves
    • Eventually, artificial hearts

Source: www.psc.org

Page 10: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Heart Simulation

Source: www.psc.org

Animation of lower portion of the heart

Page 11: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Platelet Simulation

• Simulation of blood clotting
  – Developed by Aaron Fogelson and Charlie Peskin
  – Cells are represented by polygons
    • Rules added to simulate adhesion
  – Artery walls represented as fibers
  – 2D simulation
    • Main implementation in F77 for vector machines
    • Our implementation in Split-C for distributed memory

Page 12: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Cochlea Simulation

• Simulates fluid-structure interactions due to incoming sound waves
• Potential applications: design of cochlear implants
• Simulation
  – OpenMP code
    • E. Givelberg
    • J. Bunn
    • M. Rajan
  – 256x256x128 fluid grid
  – 18 hours on HP Superdome

Page 13: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Cochlea Simulation

• Embedded structures are
  – Elastic membranes
  – Shell model
  – Variable elasticity
• Capture shape of cochlea
• Shown here is a plate
  – Implemented in a few weeks within Titanium code

Source: Titanium simulation on a T3E

Page 14: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Bacteria Simulation

• Simulation of bacteria in a fluid
  – Requires smaller scale
  – Small animal swimming
    • Lisa Fauci, Tulane, and
    • James Sullivan, Tulane
• E. coli simulation
  – Video © James A. Sullivan, Cells Alive!
• Brownian motion extension
  – Under development by Peter Kramer, RPI
  – Microscopic processes where thermal fluctuation is important

Page 15: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Insect Flight Simulation

• Wings are
  – Immersed 2D structure
• Under development
  – UW and NYU
• Applications to
  – Insect robot design

Source: Dickinson, UCB

Page 16: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Other Applications

• The immersed boundary method has also been used, or is being applied, to
  – Flags and parachutes
  – Flagella
  – Embryo growth
  – Valveless pumping
  – Paper making

Page 17: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Immersed Boundary Method Structure

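[Figure: the 4 steps in each timestep, linking the fiber points and the fluid lattice through the interaction phases]

• Fiber points: fiber activation & force calculation
• Interaction: spread force
• Fluid lattice: Navier-Stokes solver
• Interaction: interpolate velocity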
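
The four phases of a timestep can be sketched on a toy 1D periodic lattice. This is only an illustration under loud assumptions: the function names are hypothetical, the kernel is a simple hat function rather than the smoothed delta used in real immersed boundary codes, and the "fluid" update is a placeholder diffusion step, not a Navier-Stokes solver.

```python
import math

def hat(r):
    """Piecewise-linear kernel with support |r| < 1 (assumed stand-in for a smoothed delta)."""
    return max(0.0, 1.0 - abs(r))

def fiber_forces(points, rest_length, stiffness):
    """Step 1: spring forces between consecutive fiber points on an open chain."""
    forces = [0.0] * len(points)
    for i in range(len(points) - 1):
        stretch = (points[i + 1] - points[i]) - rest_length
        forces[i] += stiffness * stretch
        forces[i + 1] -= stiffness * stretch
    return forces

def spread_force(points, forces, n):
    """Step 2: spread each fiber force onto the two nearest lattice cells (spacing 1, periodic)."""
    grid = [0.0] * n
    for x, f in zip(points, forces):
        j = math.floor(x)
        for cell in (j, j + 1):
            grid[cell % n] += f * hat(x - cell)
    return grid

def fluid_step(velocity, force, dt, viscosity):
    """Step 3: placeholder fluid update (explicit diffusion plus forcing), NOT Navier-Stokes."""
    n = len(velocity)
    return [
        velocity[j]
        + dt * (viscosity * (velocity[j - 1] - 2 * velocity[j] + velocity[(j + 1) % n]) + force[j])
        for j in range(n)
    ]

def interpolate_velocity(points, velocity):
    """Step 4: interpolate lattice velocity back to each fiber point with the same kernel."""
    n = len(velocity)
    result = []
    for x in points:
        j = math.floor(x)
        result.append(sum(velocity[c % n] * hat(x - c) for c in (j, j + 1)))
    return result

def timestep(points, velocity, dt=0.1, rest_length=1.0, stiffness=1.0, viscosity=0.1):
    """One pass through the four phases, then move the fiber points with the local velocity."""
    forces = fiber_forces(points, rest_length, stiffness)
    grid_force = spread_force(points, forces, len(velocity))
    velocity = fluid_step(velocity, grid_force, dt, viscosity)
    point_velocity = interpolate_velocity(points, velocity)
    points = [x + dt * u for x, u in zip(points, point_velocity)]
    return points, velocity
```

Using the same kernel for spreading and interpolation keeps the two interaction phases consistent with each other; in this toy setup a stretched pair of points moves closer together after one step.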

Page 18: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Challenges to Parallelization

• Irregular fiber lists need to interact with the regular fluid lattice
  – Trade-off between load balancing of fibers and minimizing communication
  – Efficient “scatter-gather” across processors
• Need a scalable elliptic solver
  – Plan to use multigrid
  – Eventually add Adaptive Mesh Refinement
• New algorithms under development by Colella’s group at LBNL and Baden at UCSD

Page 19: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Software Architecture

[Figure: layered software architecture]

• Application Models: Heart (Titanium), Cochlea, Flagellate Swimming
• Generic Immersed Boundary Method (Titanium)
• Solvers: Spectral (Titanium), Multigrid (Titanium), Multigrid MLC (KeLP), AMR

Extensible Simulation
  – Based on Generic Immersed Boundary Method
  – Can plug in new models by extending fiber points
  – Uses Java inheritance for simplicity
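
The idea of plugging in a new model by extending fiber points can be sketched with ordinary class inheritance. The slides describe a Titanium (Java-based) package; the Python below is only an analogy, and every class, method, and parameter name in it is hypothetical rather than taken from that code.

```python
class FiberPoint:
    """Base fiber point that generic driver code works with (hypothetical interface)."""

    def __init__(self, position, stiffness=1.0):
        self.position = position
        self.stiffness = stiffness

    def force(self, neighbor, rest_length, time):
        """Passive spring force pulling this point toward its neighbor."""
        stretch = (neighbor.position - self.position) - rest_length
        return self.stiffness * stretch


class ActiveMuscleFiberPoint(FiberPoint):
    """Model-specific point: force doubles during the first half of each period (assumed rule)."""

    def __init__(self, position, stiffness=1.0, period=1.0):
        super().__init__(position, stiffness)
        self.period = period

    def force(self, neighbor, rest_length, time):
        active = 2.0 if (time % self.period) < self.period / 2 else 1.0
        return active * super().force(neighbor, rest_length, time)


def link_forces(points, rest_length, time):
    """Generic code: computes the force on each link for any FiberPoint subclass."""
    return [p.force(q, rest_length, time) for p, q in zip(points, points[1:])]
```

The driver function never names a specific model, so a heart-like or cochlea-like variant only has to override the force rule.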

Page 20: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Immersed Boundaries in Titanium

• Heartbeat simulation
  – Experiment 800
  – 1/5th of heartbeat done
  – Used to validate the code
• Recent improvements
  – FFTW in solver: 10x faster
  – Study of fluid/fiber communication
  – Better load balancing

Source: Titanium simulation on Millennium

• Immersed Boundary Method in Titanium is complete
  – Download from www.cs.berkeley.edu/~smyau

Page 21: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Reduced Solver Bottleneck

• Performance improvements in past year
  – About 5x overall; ~10 seconds per timestep on 64 T3E nodes
  – Bottleneck has changed to interaction phases

[Pie charts: share of time per phase]
• 2000 Performance: Solve 44%, Interpolate 27%, Spread 27%, Fiber 2%
• 2002 Performance: Spread 30%, Interpolate 27%, Solve 23%, Fiber 14%

Page 22: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Load Balance and Locality

• Tried various assignments of fiber points

[Figure: two fiber-point assignments, one optimized for locality and one optimized for load balance]

• Optimizing for load balance is generally better
  – Application (fiber structure) dependent
  – Somewhat constrained by spectral solver
  – Multigrid will give more options
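
The trade-off between the two assignments can be illustrated with a small sketch. It assumes a 1D fluid lattice split into equal slabs, one per processor, and compares placing each fiber point with the slab it sits in against dealing points out in equal blocks. The function names and the example data are hypothetical and are not the assignments tried in the actual code.

```python
def owner_of_cell(x, n_cells, n_procs):
    """Processor that owns the fluid slab containing coordinate x (equal-width slabs)."""
    slab = n_cells / n_procs
    return min(int(x // slab), n_procs - 1)

def assign_for_locality(points, n_cells, n_procs):
    """Each fiber point lives on the processor that owns its surrounding fluid."""
    return [owner_of_cell(x, n_cells, n_procs) for x in points]

def assign_for_load_balance(points, n_procs):
    """Deal points out in contiguous, near-equal blocks regardless of position."""
    n = len(points)
    return [i * n_procs // n for i in range(n)]

def max_load(assignment, n_procs):
    """Largest number of fiber points held by any one processor."""
    return max(assignment.count(p) for p in range(n_procs))

def remote_points(points, assignment, n_cells, n_procs):
    """Points whose fluid neighborhood is owned by a different processor (need communication)."""
    return sum(
        1 for x, p in zip(points, assignment) if owner_of_cell(x, n_cells, n_procs) != p
    )

# Fiber points clustered in one region of a 16-cell lattice, 4 processors.
points = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 9.0, 13.0]
local = assign_for_locality(points, 16, 4)
balanced = assign_for_load_balance(points, 4)
```

With this clustered example the locality assignment puts six of the eight points on one processor with no remote accesses, while the balanced assignment gives every processor two points at the cost of five remote points.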

Page 23: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Scallop: New Multigrid Poisson Solver

• A latency-tolerant elliptic solver library
  – Based on domain decomposition
  – Trades off extra computation for fewer messages
  – Will be used to build Navier-Stokes solver
• Work by Scott Baden and Greg Balls, UCSD
  – Based on Balls/Colella algorithm
  – Implemented in C++/KeLP, with a simple interface
• Solver not yet integrated
  – Algorithm is complete
  – Performance tuning is ongoing
  – Interface between Titanium and C++/KeLP developed

Page 24: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Tools for High Performance

Challenges to parallel simulation of a digital human are generic

• Parallel machines are too hard to program
  – Users “left behind” with each new major generation
• Efficiency is too low
  – Even after a large programming effort
  – Single-digit efficiency numbers are common
• Approach
  – Titanium: A modern (Java-based) language that provides performance transparency
  – Sparsity: Self-tuning scientific kernels

Page 25: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Titanium Overview

Object-oriented language based on Java with:
• Scalable parallelism
  – Single Program Multiple Data (SPMD) model of parallelism, 1 thread per processor
• Global address space
  – Processors can read/write memory on other processors
  – Pointer dereference can cause communication
• Intermediate point between message passing and shared memory

Page 26: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Performance Challenges

• Two separate performance problems
  – Serial Performance
    • Can Java be used for high performance?
    • What if we compile to machine code, rather than using the JVM?
  – Communication Performance
    • Global address space encourages the use of small messages
    • Overlapping used to tolerate latencies
    • How well does this work on current networks?

Page 27: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

SciMark Benchmark

• Numerical benchmark for Java, C/C++
• Five kernels:
  – FFT (complex, 1D)
  – Successive Over-Relaxation (SOR)
  – Monte Carlo integration (MC)
  – Sparse matrix multiply
  – Dense LU factorization
• Results are reported in Mflops
• Download and run on your machine from:
  – http://math.nist.gov/scimark2
  – C and Java sources also provided

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo
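
To make "results are reported in Mflops" concrete, here is a rough sketch of one kernel of this kind, successive over-relaxation on a 2D grid, with a flop count and a timing wrapper. It is not the SciMark source: the grid size, initial values, and the convention of counting 6 floating-point operations per interior update are assumptions made for illustration.

```python
import time

def sor_sweeps(grid, omega, iterations):
    """In-place successive over-relaxation on the interior of a 2D grid (list of lists)."""
    rows, cols = len(grid), len(grid[0])
    quarter = omega * 0.25
    keep = 1.0 - omega
    for _ in range(iterations):
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                neighbors = grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
                grid[i][j] = quarter * neighbors + keep * grid[i][j]
    return grid

def sor_flops(rows, cols, iterations):
    """Assumed count: 4 additions and 2 multiplications per interior update."""
    return 6 * (rows - 2) * (cols - 2) * iterations

def measure_mflops(n=50, iterations=10, omega=1.25):
    """Time the kernel and convert the flop count to millions of flops per second."""
    grid = [[float((i * j) % 7) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    sor_sweeps(grid, omega, iterations)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return sor_flops(n, n, iterations) / elapsed / 1e6
```

The number printed by `measure_mflops()` depends entirely on the machine and interpreter, so it is only comparable to other runs of the same sketch.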

Page 28: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

SciMark: Java vs. C (Sun UltraSPARC 60)

[Bar chart: MFlops (axis 0 to 90) for FFT, SOR, MC, Sparse, and LU; series: C and Java]

* Sun JDK 1.3 (HotSpot), javac -O; Sun cc -O; SunOS 5.7

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo

Page 29: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

SciMark: Java vs. C (Intel PIII 500MHz, Win98)

[Bar chart: axis 0 to 120 for FFT, SOR, MC, Sparse, and LU; series: C and Java]

* Sun JDK 1.2, javac -O; Microsoft VC++ 5.0, cl -O; Win98

Roldan Pozo, NIST, http://math.nist.gov/~Rpozo

Page 30: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Can we do better without the JVM?

• Pure Java with a JVM (and JIT)
  – Within 2x of C and sometimes better
    • OK for many users, even those using high-end machines
  – Depends on quality of both compilers
• We can try to do better using a traditional compilation model
  – E.g., Titanium compiler at Berkeley
    • Compiles Java extension to C
    • Bounds checking of arrays is crucial: compiler flag to turn this off

Page 31: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Java Compiled by Titanium Compiler

Performance on a Pentium IV (1.5GHz)

[Bar chart: MFlops (axis 0 to 450) for Overall, FFT, SOR, MC, Sparse, and LU; series: java, C (gcc -O6), Ti, Ti -nobc]

Page 32: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Java Compiled by Titanium Compiler

Performance on a Sun Ultra 4

[Bar chart: MFlops (axis 0 to 70) for Overall, FFT, SOR, MC, Sparse, and LU; series: Java, C, Ti, Ti -nobc]

Page 33: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Language Support for Performance

• Multidimensional arrays
  – Contiguous storage
  – Support for sub-array operations without copying
• Support for small objects
  – E.g., complex numbers
  – Called “immutables” in Titanium
  – Sometimes called “value” classes
• Semi-automatic memory management
  – Create named “regions” for new and delete
  – Avoids distributed garbage collection
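
As a loose illustration of contiguous storage with sub-array operations that avoid copying, the sketch below keeps a 2D array in one flat buffer and hands out rectangular views that share it. This is a Python analogy with hypothetical names, not Titanium's array syntax or semantics.

```python
class Grid2D:
    """Row-major 2D array over one contiguous buffer; sub-arrays are views, not copies."""

    def __init__(self, rows, cols, data=None, offset=0, stride=None):
        self.rows, self.cols = rows, cols
        self.data = data if data is not None else [0.0] * (rows * cols)
        self.offset = offset                                   # start of this view in the buffer
        self.stride = stride if stride is not None else cols   # row length of the parent

    def _index(self, i, j):
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError((i, j))
        return self.offset + i * self.stride + j

    def get(self, i, j):
        return self.data[self._index(i, j)]

    def set(self, i, j, value):
        self.data[self._index(i, j)] = value

    def sub(self, row0, col0, rows, cols):
        """View of a rectangular sub-array that shares the parent's storage."""
        if row0 < 0 or col0 < 0 or row0 + rows > self.rows or col0 + cols > self.cols:
            raise IndexError("sub-array out of range")
        start = self.offset + row0 * self.stride + col0
        return Grid2D(rows, cols, self.data, start, self.stride)


grid = Grid2D(4, 4)
interior = grid.sub(1, 1, 2, 2)   # no elements are copied
interior.set(0, 0, 5.0)           # writes through to the parent grid
```

Because a view only stores an offset and a stride, taking the interior of a grid costs constant time, which is the property that matters for stencil-style solvers.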

Page 34: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Performance Challenges

• Two separate performance problems
  – Serial Performance
    • Can Java be used for high performance?
    • What if we compile to machine code, rather than using the JVM?
  – Communication Performance
    • Global address space encourages the use of small messages
    • Overlapping used to tolerate latencies
    • How well does this work on current networks?

Page 35: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

The Network Parameters

• EEL – End-to-end latency, or time spent sending a short message between two processes
• Parameters of the LogP Model
  – L – “Latency”, or time spent on the network
    • During this time, the processor can be doing other work
  – o – “Overhead”, or processor busy time on the sending or receiving side
  – g – “Gap”, the minimum interval between consecutive messages pushed onto the network (the reciprocal of the message rate)
  – P – the number of processors
• Overhead is the most important parameter: it cannot be hidden

Page 36: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

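LogP Parameters: Overhead & Latency

• Non-overlapping overhead
  – [Timeline: P0 spends osend, the message spends L on the network, then P1 spends orecv]
  – EEL = osend + L + orecv
• Send and recv overhead can overlap
  – [Timeline: osend on P0 and orecv on P1 overlap in time]
  – EEL < osend + orecv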
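
For the non-overlapping case, the relation EEL = osend + L + orecv can be written out as a small worked example. The function names are hypothetical, and the numbers in the usage note are made up for illustration, not measurements from any of the machines in this talk.

```python
def end_to_end_latency(o_send, latency, o_recv):
    """EEL for one short message when the three phases happen strictly one after another."""
    return o_send + latency + o_recv

def hideable_fraction(o_send, latency, o_recv):
    """Share of EEL spent on the network (the L term), during which processors can do other work."""
    return latency / end_to_end_latency(o_send, latency, o_recv)
```

With illustrative values of 2 usec send overhead, 5 usec network latency, and 3 usec receive overhead, EEL is 10 usec and half of it can in principle be overlapped with computation; the remaining overhead keeps a processor busy and cannot be hidden.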

Page 37: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Performance of Remote Read/Write

[Bar chart, 8-byte messages: time in usec (axis 0 to 25) for T3E/MPI, T3E/Shmem, T3E/E-Reg, IBM/MPI, IBM/LAPI, Quadrics/MPI, Quadrics/Put, Quadrics/Get, M2K/MPI, M2K/GM, Dolphin/MPI, and Giganet/VIPL; series: Send Overhead (alone), Send & Rec Overhead, Rec Overhead (alone), Added Latency]

Page 38: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

LogP Parameters: gap

• The gap is the delay between sending messages
• Gap may be larger than send overhead, e.g.,
  – NIC may be busy processing the last message
  – Flow control on the network may prevent the NIC from accepting the next message
• If the gap is larger than the overhead
  – Need to overlap communication with computation, not other communication

[Timeline: P0 issues sends, each costing osend, with consecutive sends separated by the gap]
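
The effect of the gap on a burst of messages can be sketched with the usual LogP-style estimate: the sender can start a new message only every max(osend, gap), and the last message still needs osend + L + orecv to arrive. The function names are hypothetical and the formula is a simplified model, not a measurement method used in the talk.

```python
def time_for_burst(n, o_send, latency, o_recv, gap):
    """Estimated time until the last of n back-to-back short messages has been received."""
    if n <= 0:
        return 0.0
    interval = max(o_send, gap)  # the sender is limited by whichever is larger
    return (n - 1) * interval + o_send + latency + o_recv

def idle_sender_time_per_message(o_send, gap):
    """Sender time per message not consumed by overhead, available only for computation."""
    return max(0.0, gap - o_send)
```

When the gap exceeds the send overhead, issuing more messages does not fill the idle time, which is why that slack has to be covered with computation instead.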

Page 39: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Results: Gap and Overhead

[Bar chart: time in usec (axis 0.0 to 20.0); series: Gap, Send Overhead, Receive Overhead. Data labels shown: 6.7, 1.2, 0.2, 8.2, 9.5, 95.0, 1.6, 6.5, 10.3, 17.8, 7.8, 4.6]

Page 40: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Send Overhead Over Time

• Overhead has not improved significantly
  – Lack of integration; lack of attention in software

[Scatter plot: send overhead in usec (axis 0 to 14) against approximate year (1990 to 2002); labeled points: Myrinet2K, Dolphin, T3E, Cenju4, CM5, CM5, Meiko, Meiko, Paragon, T3D, Dolphin, Myrinet, SP3, SCI, Compaq, NCube/2, T3E]

Page 41: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Summary

Several categories
  – Application development
    • Heart and cochlea-component simulation
  – Application-level package
    • Generic immersed boundary method
    • Parallel for shared and distributed memory
    • Enables new larger-scale simulations; finer grid
  – Titanium language and compiler
    • High-level, high-performance programming
  – Communication runtime layer
    • Lightweight communication and benchmarks

[Side labels: application, data, software, systems]

Page 42: Towards a Digital Human Kathy Yelick EECS Department U.C. Berkeley.

Collaborators
• Susan Graham
• Paul Hilfinger
• Jim Demmel
• Alex Aiken
• Phillip Colella
• Peter McCorquodale
• Mike Welcome
• Sabrina Merchant
• Kaushik Datta
• Dan Bonachea
• Rich Vuduc
• Jason Duell
• Paul Hargrove
• Wei Chen
• Eun-Jin Im
• Szu-Huey Chang
• Carrie Fei
• Ben Liblit
• Robert Lin
• Geoff Pike
• Jimmy Su
• Ellen Tsai
• Siu Man Yau
• Shoaib Kamil
• Benjamin Lee
• Rajesh Nishtala
• Costin Iancu

http://titanium.cs.berkeley.edu/