Talk Final


  • 5/20/2018 Talk Final

    1/42

    Rethinking Productivity and Performance

    for the Exascale Era

    Allen D. Malony

    Keynote Talk

    7th Workshop on Productivity and Performance (PROPER)*

    * Supported by the Virtual Institute High Productivity Supercomputing (VI-HPS)

  • 5/20/2018 Talk Final

    2/42

    Rethinking Productivity and Performance in the Exascale Era, PROPER 2014

    Abstract

    The push to exascale systems is forcing the parallel computing community to
    rethink fundamental notions of productivity and performance. The rapidly
    growing degree of parallelism brought on by manycore processors is just one
    aspect of an evolving landscape of architectural, system, and software features
    that is increasing the complexity of the application development and
    optimization process. It is becoming more apparent that, in order to address the
    complexity concerns unfolding in the exascale space, we must think of
    productivity and performance in a more connected way and of the technology to
    support them as being more open, integrated, and intelligent. This talk will
    discuss directions for parallel performance research and tools that target the
    scalability, optimization, and programmability challenges of next-generation
    HPC platforms with high productivity as an essential outcome.

  • 5/20/2018 Talk Final

    3/42

    Outline

    • Productivity
    • Scientific productivity
    • HPC productivity factors and landscape
    • Exascale productivity crisis
    • Rethinking productivity and performance
    • Directions
      - Performance knowledge engineering
      - Integration and synthesis (for autotuning)
      - Dynamic introspection and adaptation
    • Conclusions

  • 5/20/2018 Talk Final

    4/42

    Productivity: A Computing Metric of Merit*

    • Rich measure of quality of the computing experience
      - Captures key factors that determine overall impact
      - Greater productivity, better computing experience
    • Productivity is strongly related to ease of use
      - Less effort for same result in same time
    • Expands our notion of computing effectiveness
      - Focuses attention on important effectiveness contributors
      - Exposes relationships between
        * program development and program execution
        * time to develop/maintain and time to solution
    • Productivity unifies usability and performance
      - Expresses tradeoff between
        * programmability and delivered performance

    * Courtesy of Thomas Sterling, Indiana University

  • 5/20/2018 Talk Final

    5/42

    HPC is about Scientific Productivity

    • Scientific productivity is a quality measure of the process of achieving science results, incorporating:
      - Software productivity: development effort, time, maintenance, support
      - Execution-time productivity: efficiency, time, cost to run scientific workloads
      - Workflow and analysis productivity: experiment design, results analysis, validation, hypothesis testing
      - End-to-end productivity: from science questions to scientific discovery (i.e., value of scientific insights)
    • Productivity costs
      - Human resources in development and re-engineering
      - Machine and energy resources in runtime (performance)
      - Utility and correctness of computational results

  • 5/20/2018 Talk Final

    6/42

    HPC Productivity Factors

    [Figure: HPC productivity factors. Courtesy of Thomas Sterling, Indiana U.]

  • 5/20/2018 Talk Final

    7/42

    HPC Productivity Factor Elaboration

    [Figure: elaboration of HPC productivity factors. Courtesy of Thomas Sterling, Indiana U.]

  • 5/20/2018 Talk Final

    8/42

    HPC Productivity in Terascale / Petascale Eras

    • Productivity in the terascale era focused on evolving vector codes to distributed memory parallelism
      - MPI was the major advancement
    • Trends in the petascale era rode Moore's law and focused on scalability via large-scale clusters and increasing cores
      - Mixed-mode programming with threading (MPI+OpenMP)
      - Accelerators for high throughput (GPUs, manycore)
    • Productivity through scalable, evolutionary improvement
      - Familiar programming paradigms and environments
      - More algorithm sophistication and early runtime coupling
      - Development of more robust application libraries
    • Performance tools followed along a similar path
      - Focused on scalability, robustness, automation, ...

  • 5/20/2018 Talk Final

    9/42

    Performance Trends for Exascale Architectures

    • Increasing flops because of core counts, not clock speed
    • Multi-level, massive concurrency
    • Declining memory per core
    • Tradeoffs in performance, power, and resilience
      - Multi-objective
      - Hard energy constraints
    • Hybrid system alternatives

    [Figure: trends in #Cores, Energy per FLOP, Clock Speed, and Rmax. Courtesy of P. Kogge, Notre Dame.]

  • 5/20/2018 Talk Final

    10/42

    Growing Crisis in Scientific Productivity at Exascale

    • HPC is increasingly important in science domains, but HPC application development has not advanced as in other fields
    • Exascale factors will further affect productivity problems
      - Disruptive changes in extreme-scale architectures
      - New frontiers in modeling, simulation, and analysis of complex multiscale and multiphysics phenomena
    • Effects
      - Significant lag between extreme-scale hardware and algorithmic innovations and their effective use in applications
      - Poor support for code coupling of independent components
      - Lack of agile yet rigorous software engineering practices for HPC that are both performant and maintainable
      - Failure to consider the entire lifecycle of large scientific software efforts, leading to fragile, complicated codes

  • 5/20/2018 Talk Final

    11/42

    Exascale Computing Productivity Attention

    • DARPA High Productivity Computing Systems (HPCS)
      http://en.wikipedia.org/wiki/High_Productivity_Computing_Systems
    • Extreme-Scale Scientific Application Software Productivity: Harnessing the Full Capacity of Extreme-Scale Computing, white paper, September 9, 2013.
      http://www.orau.gov/swproductivity2014/ExtremeScaleScientificApplicationSoftwareProductivity2013.pdf
    • Software Productivity for Extreme Scale Science, DOE ASCR Workshop, January 13-14, 2014.
      http://www.orau.gov/swproductivity2014/
    • Exascale Computing Systems Productivity, DOE ASCR Workshop, June 3-4, 2014.
      http://www.orau.gov/ecsproductivity2014/
    • ACS Productivity Workshop, DOE Office of Science, July 2014, Indiana University.

  • 5/20/2018 Talk Final

    12/42

    What is Exascale Computing Productivity?

    • Exascale computing productivity is the effective and efficient use of all exascale resources (hardware, application software, runtime, people, processes, energy) in the production of new scientific insights
    • Goal
      - Productivity awareness embedded in all exascale lifecycle activities, from R&D through deployment to operation and production of scientific insights
      - Increase efficiency of the overall exascale ecosystem during research and development by identifying, removing, and ameliorating productivity and performance bottlenecks
    • Sounds good, so what is the problem?

  • 5/20/2018 Talk Final

    13/42

    Exascale Breaks HPC Programming Paradigm

    • Programming model must reflect underlying machine model
    • Argues for a more data-centric approach in future (not easy)

  • 5/20/2018 Talk Final

    14/42

    Difficult Mapping Problems with Exascale

    • Multiple levels of mapping to exascale hardware
      - Mapping decisions affect performance portability

    [Figure: mapping levels, labeled Numerical method, Programming model, Compiler/runtime. Courtesy of J. Shalf, LBNL, keynote talk.]

  • 5/20/2018 Talk Final

    15/42

    Multiphysics and Multiscale Science Frontiers

    • Single physics simulations are maturing
    • Coupling of simulations across different types of physics (multiphysics) and different time and length scales (multiscale) is becoming increasingly important for improving model fidelity
    • Parameter studies and design optimization that employ ensembles of simulations extend science beyond point solutions and explore problem space
    • Uncertainty quantification and sensitivity analysis
    • Compounded complexities of code interactions and collaboration across algorithms and applications

  • 5/20/2018 Talk Final

    16/42

    Assumptions of Uniformity are No Longer Valid

    • Assumptions of uniformity are no longer valid
      - Heterogeneous compute engines
      - Fine-grained power management affects homogeneity
      - Non-uniformities in process technology cause variation
      - Fault resilience introduces inhomogeneity
    • Current bulk synchronous model is deficient
      - The current focus on removing sources of performance variation (jitter) will be increasingly impractical
      - Huge costs in power/complexity/performance to extend the life of a purely bulk synchronous model
    • Embrace performance heterogeneity
      - Assume asynchronous execution model as the norm
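    A rough way to see why jitter hurts a bulk synchronous model: with a barrier after every step, each step costs as much as its slowest rank. The sketch below is a toy simulation, not a measurement. The rank count, step count, and uniform jitter model are all assumptions chosen for illustration, and the "barrier-free" figure is only a lower bound that ignores data dependencies between ranks.

```python
import random


def bsp_time(step_times):
    """Bulk synchronous: a barrier ends every step, so each step costs the slowest rank."""
    return sum(max(step) for step in step_times)


def barrier_free_bound(step_times):
    """Lower bound with no per-step barriers: the total work of the busiest rank."""
    ranks = len(step_times[0])
    return max(sum(step[r] for step in step_times) for r in range(ranks))


def simulate(ranks=64, steps=200, base=1.0, jitter=0.2, seed=0):
    """Per-step, per-rank compute times with uniform random jitter (assumed model)."""
    rng = random.Random(seed)
    return [[base + rng.uniform(0.0, jitter) for _ in range(ranks)]
            for _ in range(steps)]


times = simulate()
bsp = bsp_time(times)
async_bound = barrier_free_bound(times)
print(f"bulk synchronous total: {bsp:.1f}")
print(f"barrier-free lower bound: {async_bound:.1f}")
```

    The gap between the two totals grows with the number of ranks, since the maximum of many jittered samples drifts toward the worst case at every barrier.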

  • 5/20/2018 Talk Final

    17/42

    Exascale Productivity Improvement End-to-End

    [Figure: end-to-end exascale productivity improvement. Courtesy of Thomas Ndousse-Fetter.]

  • 5/20/2018 Talk Final

    18/42

    Performance Technology Implications

    • Performance technology has evolved to serve the dominant architectures and programming models
      - Observability era (1991-1998)
        * instrumentation, measurement, analysis
      - Diagnosis era (1998-2007)
        * identifying performance inefficiencies
      - Complexity era (2008-2012)
        * scale, memory hierarchy, network, multicore, GPU
    • 20+ years of reasonably stable parallel execution models
      - 1st-person application focus
      - Productivity somewhat decoupled from performance
        * not directly necessary to know application semantics

  • 5/20/2018 Talk Final

    19/42

    Exascale Era and Performance Directions

    • Exascale era (2012-??) is fundamentally different
    • Productivity and performance are more intimately coupled in exascale environments
      - High productivity high performance applications
      - Applications must be mapped to new exascale systems
      - Performance awareness is necessary at all levels
      - 1st-person (application) + 3rd-person (system resources) performance views necessary
    • Directions
      - Performance knowledge engineering
      - Integration and synthesis (for autotuning)
      - Dynamic introspection and adaptation

  • 5/20/2018 Talk Final

    20/42

    Extreme Performance (Knowledge) Engineering

    Capture all available sources of performance data and metadata (collectively, knowledge) that can be used to reason about application performance expectations

    [Diagram labels: Optimization; Measurement / Analysis; Data Mining; Expert System; Simulation; Insight; Extreme-scale Application; Extreme-scale System; Execution Models; Programming Language Model; Computation Model; System Model; Performance Models (Relative, Empirical, Symbolic); Performance Expectations; Performance Knowledge Base]

  • 5/20/2018 Talk Final

    21/42

    Case Study: MPAS Ocean

    • Use multiscale methods for accurate, efficient, scale-aware earth system models
    • MPAS-O uses a variable resolution irregular mesh of hexagonal grid cells
    • Cells assigned to MPI processes
      - Grouped as blocks
      - Each cell has 1-40 vertical layers
    • MPAS-O has demonstrated scaling limits
    • Look at increasing concurrency
    • Focus on role of partitioning

    http://mpas-dev.github.io

  • 5/20/2018 Talk Final

    22/42

    MPAS-O Load Imbalance

    • Stencil codes introduce load imbalances due to halo/ghost cells
    • Performance measurement in isolation is insufficient to explain the source of imbalance and attribute it to performance factors
    • Need to combine:
      - Performance measurements
      - Application metadata
      - Correlations between measurements and metadata
      - Visualization of performance data in context of application domain

  • 5/20/2018 Talk Final

    23/42

    Stencil Properties and Metadata Collection

    • Capture stencil properties in metadata
      - nCellsSolve = cells in partition
      - nCells = nCellsSolve + nCellsHalo
    • Correlate computation balance with metadata fields
    • Original partitioning based on balancing nCellsSolve on MPI processes
    • Really need to balance with respect to nCells
      - This depends on the partitioner, so it is not known in advance!
    • Apply knowledge to Hindsight partitioner
      - Create initial Metis partition
      - Assign nCells weights on graph and iterate to optimize

    [Figure: Metadata Correlated with dv Timer; series: nCellsSolve, nCells]
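    To make the nCellsSolve versus nCells distinction concrete, here is a small sketch that counts owned and halo cells per partition. It is a simplification under stated assumptions: a square grid with 4-neighbor stencils and a single halo layer stands in for the MPAS-O hexagonal mesh, and the strip partition is made up for illustration. Only the names nCellsSolve, nCellsHalo, and nCells come from the slide.

```python
from collections import defaultdict


def neighbors(cell, width, height):
    """4-neighbor stencil on a square grid (assumed; MPAS-O cells are hexagonal)."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def partition_stats(owner, width, height):
    """Return {part: (nCellsSolve, nCellsHalo, nCells)} for a cell -> part mapping."""
    solve = defaultdict(set)
    halo = defaultdict(set)
    for cell, part in owner.items():
        solve[part].add(cell)
        for nb in neighbors(cell, width, height):
            if owner[nb] != part:
                halo[part].add(nb)  # one-layer halo: foreign cells next to owned cells
    return {p: (len(solve[p]), len(halo[p]), len(solve[p]) + len(halo[p]))
            for p in solve}


# 8x8 grid cut into 4 vertical strips, each owning exactly 16 cells.
W = H = 8
owner = {(x, y): x // 2 for x in range(W) for y in range(H)}
stats = partition_stats(owner, W, H)
for part in sorted(stats):
    print(part, stats[part])
```

    Every strip owns the same number of cells, yet the two interior strips carry twice the halo of the edge strips, so balancing nCellsSolve alone leaves nCells unbalanced.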

  • 5/20/2018 Talk Final

    24/42

    Hindsight Results (Original versus Refined)

    [Figure: original versus refined partitioning on 240 processors of NERSC Edison. Panels: total cells (explicit + halo) per block; time spent computing; time spent in MPI_Wait (also reduced time in halo routines); number of neighbors per block. Each panel is annotated with min and max values.]

  • 5/20/2018 Talk Final

    25/42

    Tools/Technology Integration and Synthesis

    [Diagram labels: Performance Modeling; Reliability; Autotuning; Performance Optimization; Resilience; Energy; Code analysis; End-to-end Integration; SciDAC applications; Prior research funding; Center of mass for performance engineering]

    Tools / Technologies: TAU, mpiP, PBound, Active Harmony, CHiLL, Roofline, PAPI, PEBIL, PSiNtracer, GPTL, RCRToolkit, ROSE, Orio

  • 5/20/2018 Talk Final

    26/42

    Autotuning for Performance Productivity

    • Increasing level of parallelism and heterogeneity in hardware exposes a complex performance landscape
    • Cannot expect humans or stand-alone tools to be able to achieve desired performance objectives in the future
      - Lack of knowledge and automation (humans)
      - Lack of scope and openness (isolated tools)
    • Autotuning is promising, but faces two challenges
      - Integration of performance measurement/analysis tools with code analysis/transformation tools
      - Synthesis of information from collective sources and preservation of knowledge for shared use

  • 5/20/2018 Talk Final

    27/42

    Tools Integration with SUPER Autotuning

    • SUPER is applying autotuning to optimize HPC applications
      - Active Harmony autotuning system (Hollingsworth, UMD)
        * software architecture for optimization and adaptation
      - CHiLL compiler framework (Hall, Utah)
        * CPU and GPU code transformations for optimization
      - Orio annotation-based autotuning (Norris, ANL)
        * code transformation with optimization (CUDA, OpenCL)
    • Integrate performance tools (TAU) with these frameworks
      - Use to gather performance data for autotuning/specialization
      - Store performance data with metadata for each experiment variant in a performance database (TAUdb)
      - Use machine learning and data mining to increase the level of automation of autotuning and specialization
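    The idea of keeping each experiment variant's performance data together with its metadata can be sketched with an in-memory SQLite table. This is not the TAUdb schema or API: the table layout, function names, parameter names, and timings below are all hypothetical and chosen only to illustrate the pattern of recording trials and querying them later.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory trial table (hypothetical layout, not the TAUdb schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trial (id INTEGER PRIMARY KEY, kernel TEXT, "
        "metadata TEXT, time_ms REAL)"
    )
    return db


def record_trial(db, kernel, metadata, time_ms):
    """Store one experiment variant: its tuning metadata as JSON plus its runtime."""
    db.execute(
        "INSERT INTO trial (kernel, metadata, time_ms) VALUES (?, ?, ?)",
        (kernel, json.dumps(metadata, sort_keys=True), time_ms),
    )


def best_variant(db, kernel):
    """Return (metadata, time_ms) of the fastest recorded variant, or None."""
    row = db.execute(
        "SELECT metadata, time_ms FROM trial WHERE kernel = ? "
        "ORDER BY time_ms ASC LIMIT 1",
        (kernel,),
    ).fetchone()
    return (json.loads(row[0]), row[1]) if row else None


db = open_store()
# Made-up timings for illustration only; these are not measurements from the talk.
record_trial(db, "FormJacobian", {"unroll": 1, "threads_per_block": 64}, 1410.0)
record_trial(db, "FormJacobian", {"unroll": 4, "threads_per_block": 128}, 1265.0)
record_trial(db, "FormJacobian", {"unroll": 2, "threads_per_block": 256}, 1330.0)
print(best_variant(db, "FormJacobian"))
```

    Once variants and metadata live in one queryable store, the same records can feed the machine learning and data mining step rather than being discarded after each search.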

  • 5/20/2018 Talk Final

    28/42

    Orio Empirical-Based Autotuning Process

    [Diagram labels: Code with DSL Annotations; DSL Parser; Sequence of (Nested) Annotated Regions; Code Transformations; Transformed Code; Code Generator; Empirical Performance Evaluation; Tuning Specification; Search Engine; best performing version; Optimized Code. Languages shown: C, Fortran, CUDA, OpenCL.]

  • 5/20/2018 Talk Final

    29/42

    Orio and TAU Integration

    [Diagram labels: Orio Code Generator; Experiment; Transformations; TAU Metadata Entries; Execution Time; CUPTI callback measurement library; TAU Profiles; TAUdb; OrCUDA. Edge labels: Writes, Uploaded, Links at Runtime.]

    Measurement: metric profiling, metadata, TAUdb storage
    Autotuning analysis: machine learning, optimization search, specialization

  • 5/20/2018 Talk Final

    30/42

    Autotuning Radiation Transport Code

    • Solid fuel ignition solver
    • Our goal is to replace two functions which represent a large proportion of overall execution time and offload them to the GPU
    • The implementations take too long to run to exhaustively enumerate the search space
      - Use a Nelder-Mead search
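    For readers unfamiliar with Nelder-Mead, the sketch below shows the simplex search in plain Python. It is a simplified textbook variant (reflection, expansion, inside contraction, shrink), not the search implementation used in the talk. The cost function `synthetic_runtime` and its two parameters are hypothetical stand-ins for running a generated code variant; real tuning parameters are usually discrete and would need rounding.

```python
def nelder_mead(cost, start, step=1.0, iters=200,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize cost(point) with a simplified Nelder-Mead simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        simplex.append(p)
    scored = [(cost(p), p) for p in simplex]

    for _ in range(iters):
        scored.sort(key=lambda t: t[0])
        best_f, second_worst_f, worst_f = scored[0][0], scored[-2][0], scored[-1][0]
        best, worst = scored[0][1], scored[-1][1]
        # Centroid of every point except the worst one.
        centroid = [sum(p[i] for _, p in scored[:-1]) / n for i in range(n)]
        reflect = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        f_r = cost(reflect)
        if best_f <= f_r < second_worst_f:
            scored[-1] = (f_r, reflect)
        elif f_r < best_f:
            # Reflection found a new best point: try going further the same way.
            expand = [c + gamma * (r - c) for c, r in zip(centroid, reflect)]
            f_e = cost(expand)
            scored[-1] = (f_e, expand) if f_e < f_r else (f_r, reflect)
        else:
            # Reflection did not help enough: contract toward the worst point.
            contract = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            f_c = cost(contract)
            if f_c < worst_f:
                scored[-1] = (f_c, contract)
            else:
                # Shrink every point toward the current best.
                shrunk = []
                for _, p in scored[1:]:
                    q = [b + sigma * (x - b) for b, x in zip(best, p)]
                    shrunk.append((cost(q), q))
                scored = [scored[0]] + shrunk

    scored.sort(key=lambda t: t[0])
    return scored[0][1], scored[0][0]


def synthetic_runtime(params):
    """Hypothetical runtime surface with its minimum at (3, 2); not measured data."""
    a, b = params
    return 100.0 + 40.0 * (a - 3.0) ** 2 + 25.0 * (b - 2.0) ** 2


best_point, best_cost = nelder_mead(synthetic_runtime, [0.0, 0.0])
print(best_point, best_cost)
```

    Each iteration costs only one or two evaluations of the cost function, which is the appeal when every evaluation means compiling and running a code variant.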


  • 5/20/2018 Talk Final

    31/42

    FormJacobian (different sizes, platforms)

    [Figure: tuning trajectories in four panels, FJ 64x64x64, FJ 75x75x75, FJ 100x100x100, and FJ 128x128x128. x-axis: experiment trial. y-axis: Generated Code Execution Time (milliseconds), ticks from 1,210 to 1,450. Series: Radeon 7970, Xeon Phi, GTX 480, Tesla C2075, Tesla K20c.]

  • 5/20/2018 Talk Final

    32/42

    Factor Analysis

    • Parameter values for the best performing ex14FF and ex14FJ kernels across the architectures
      - Workgroups, Workitemspergroup, Unrollinner, Compilerflags, Sizehint, Vechint

  • 5/20/2018 Talk Final

    33/42

    TAU and Autotuning in SUPER

    [Diagram labels, CHiLL workflow: Outlined Function; Selective Instrumentation File (specifying parameters to capture); tau_instrumentor; Instrumented Variant; execute; Parameterized Performance Profile; TAUdb (PerfDMF); profile data and metadata; WEKA decision tree induction algorithm; parameters from TAUdb; CHiLL Recipes; Search Driver (brute force or Active Harmony); code variant; CHiLL; ROSE-based Code Generation Tool; Code Variants; Wrapper Function]

    [Diagram labels, Orio workflow: Orio Code Generator; Experiment; Transformations; TAU Metadata Entries; Execution Time; CUPTI callback measurement library; TAU Profiles; TAUdb. Edge labels: Writes, Uploaded, Links at Runtime.]

    Also shown: CUDA-CHiLL + AH, Orio, ROSE, PerfExplorer, Geant4, MPAS-O, CESM, XGC1

  • 5/20/2018 Talk Final

    34/42

    Synthesis for Automated Performance Tuning

    • Integration of performance measurement and analysis with autotuning frameworks is important
    • However, there is an opportunity for synthesis of two types
      - Incorporate broader information from all programming, development, performance, optimization, and systems tools and technologies in the autotuning process
      - Preserve performance knowledge and enable higher-order understanding (learning) of the relationship between performance factors across tuning dimensions
    • Need a unified architecture for information synthesis and performance knowledge preservation

  • 5/20/2018 Talk Final

    35/42

    A New Performance Observability

    • Key exascale parallel performance abstraction
      - Inherent state of exascale execution is dynamic
      - Embodies non-stationarity of performance
      - Constantly shaped by the adaptation of resources to meet computational needs and optimize objectives
    • Requires a fundamentally different performance observability paradigm
      - Designed to support introspective adaptation
      - Reflects computation to execution model mapping
      - Aware of multiple (performance) objectives
      - In-situ analysis of performance state and objectives

  • 5/20/2018 Talk Final

    36/42

    Introspection, In Situ Analysis, and Feedback

    [Figure: introspection, in situ analysis, and feedback. Courtesy of Martin Schulz, LLNL, PIPER Project.]

  • 5/20/2018 Talk Final

    37/42

    XPRESS Project (DOE X-Stack)

    • Design and development of exascale software stack to support the ParalleX execution model
      - Highly concurrent
      - Asynchronous
      - Message driven
      - Global address space
    • OpenX
      - XPI programming API
      - HPX runtime system
      - RIOS interface to OS
      - APEX performance system
    • Team
      - Universities: IU, LSU, UH, UNC/RENCI, UO
      - Laboratories: SNL, LBNL, ORNL

  • 5/20/2018 Talk Final

    38/42

    Integrated Software Stack for ParalleX

    [Diagram labels: Legacy Applications; New Model Applications; MPI; OpenMP; Metaprogramming Framework; Domain Specific Active Library; Domain Specific Language; Compiler; XPI; AGAS (name space, processor); LCO (dataflow, futures, synchronization); Lightweight Threads (context manager); Parcels (message driven computation); Task recognition; Address space control; Memory bank control; OS thread; Instrumentation; Network drivers; Distributed Framework Operating System; Hardware Architecture; Operating System Instances; Runtime System Instances; PRIME MEDIUM Interface / Control; 10^6 nodes x 10^3 cores / node + integration network]

    • OpenX
      - XPI programming API
      - HPX runtime system
      - RIOS interface to OS
      - APEX performance system
    • APEX
      - OS (LXK) tracks system-level resources
      - Runtime (HPX) tracks threads, queues, parcels, remote ops, memory, concurrency
      - XPI allows language-level performance semantics to be measured

  • 5/20/2018 Talk Final

    39/42

    Argo DOE ExaOSR Project

    • Exascale OS and runtime research project
      - Labs: ANL, LLNL, PNL
      - Universities: BU, UC, UIUC, UO, UTK
    • Philosophy
      - Whole-system view
        * dynamic user environment (functionality, dynamism, flexibility)
        * first-class managed resources (performance, power, ...)
        * hierarchical response to faults
      - Massive concurrency support
    • Key ideas
      - Hierarchical (control, communication, goals, data resolution)
      - Embedded (performance, power, ...) feedback and response
      - Global system support

  • 5/20/2018 Talk Final

    40/42

    Argo Global Information Backplane

    • BEACON: event/action/control notification
    • Exposé: performance observability system

  • 5/20/2018 Talk Final

    41/42

    Argo Global Optimization View

    [Figure: hierarchical optimization loop, with legend:]
    u: generalized control signals
    v: settings for next level
    m: power/resilience models & mechanisms
    g: power, performance goals
    f: feedback from BEACON and EXPOSÉ
    e: error between goals and feedback
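    One way to read the symbols on this slide (u control signals, g goals, f feedback, e error between goals and feedback) is as a feedback control loop. The sketch below is a minimal single-level illustration under that reading, not the Argo design: the proportional update rule, the gain, and the linear `node_model` standing in for m are all assumptions.

```python
def control_step(goal, feedback, control, gain=0.5):
    """One step: compute e = g - f, then nudge the control signal u in proportion to e."""
    error = goal - feedback
    return control + gain * error, error


def node_model(control):
    """Hypothetical stand-in for m: observed performance as a function of u (assumed linear)."""
    return 0.8 * control


def run_loop(goal, steps=50):
    """Drive the feedback f toward the goal g and record the error e at each step."""
    control, errors = 0.0, []
    for _ in range(steps):
        feedback = node_model(control)  # f: what the monitoring layer reports
        control, error = control_step(goal, feedback, control)
        errors.append(error)
    return control, errors


final_control, errors = run_loop(goal=100.0)
print(f"final control signal: {final_control:.3f}")
print(f"final error: {errors[-1]:.2e}")
```

    In a hierarchical arrangement, the control output of one level could serve as the settings (v) or goals for the level below, with each level running its own loop at its own time scale.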

  • 5/20/2018 Talk Final

    42/42

    Conclusions

    • Exascale brings fundamentally new challenges (plus opportunities!) to performance and to productivity
      - ExaFLOPs will require new technology innovations
      - New science will require more sophisticated methods
    • Performance and productivity are intimately coupled
      - Achieving scientific productivity requires performance
      - Performance cannot be considered an afterthought
      - Performance depends on integration and synthesis
        * with application development environment
        * throughout the exascale software stack
    • FLOPs versus Brains