Maximizing Your SPARC T4 Oracle Solaris Application Performance

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.


Maximizing Your SPARC T4 Oracle Solaris Application Performance

Darryl Gove

Senior Principal Software Engineer


Program Agenda

§ Hardware
§ Correctness
§ Performance
§ Parallelism


More Information

§ Download, technical articles and more: http://oracle.com/goto/solarisstudio

OpenWorld Sessions

§ Mon, Oct 1, 10:45 - 11:45 AM: Maximizing Your SPARC T4 Oracle Solaris Application Performance, CON 6382 (Marriott Marquis - Golden Gate)

§ Mon, Oct 1, 3:15 - 4:15 PM: Technical Panel: Developing High Performance Applications on Oracle Solaris, CON 7196 (Marriott Marquis - Golden Gate)

Hands-on Lab

§ Wed, Oct 3, 1:15 - 2:15 PM: Develop C/C++ Applications for the Cloud with Oracle Tuxedo and Oracle Solaris Studio, HOL 10276 (Marriott Marquis - Salon 5/6)

JavaOne Sessions

§ Mon, Oct 1, 8:30 – 9:30 AM: Mixed-Language Development: Leveraging Native Code from Java, CON 6714 (Hilton San Francisco - Continental Ballroom 6)

§ Tues, Oct 2, 1:00 – 2:00 PM: Take Performance Tuning of Your Enterprise Java Applications to the Next Level, CON 10213 (Hilton San Francisco - Continental Ballroom 6)


Oracle Solaris Studio


Analysis Suite

§ Performance Analyzer provides unparalleled insight into your app, allowing you to identify bottlenecks and improve performance by orders of magnitude
§ Code Analyzer (new) ensures app reliability by detecting app vulnerabilities, including memory leaks and memory access violations
§ Thread Analyzer simplifies complex parallel programming errors by detecting hard to pinpoint race and deadlock conditions

Integrated Development Environment increases developer efficiency

Compiler Suite

§ C, C++ Compilers utilize advanced code generation technology to optimize apps for highest performance on SPARC & x86
§ Fortran Compiler optimizes compute intensive app performance
§ Debugger ensures app stability with event handling & multi-thread support
§ Performance Library maximizes compute-intensive app performance using advanced numeric solver libraries


Oracle Solaris Studio 12.3 Highlights

Accelerate Performance

§ 3x faster code on SPARC T4 than GCC; 40% faster than Sun Studio 12
§ 1.5x faster code on Intel x86 than GCC; 20% faster than Sun Studio 12

Gain Extreme Observability

§ New Code Analyzer for more reliable applications; reports common coding & memory access errors faster than competitive alternatives
§ Enhanced Performance Analyzer with system-wide performance analysis

Improve Productivity

§ Remote access to Solaris Studio tools from local desktop (Oracle Solaris, Linux, Microsoft Windows, Mac)
§ Streamlined Oracle DB application development
§ Simplify Oracle Tuxedo development with IDE plug-in
§ IPS distribution on Solaris 11 for simplified management
§ 20% faster compile time


SPARC T4 Hardware


SPARC T4 - Overview

§ Not like T1 – T3 (only shares the T-series name)
§ Single thread performance
§ Multithread throughput


SPARC T4 - Details

§ 1 to 4 chips per system
§ 8 cores per chip
  ● Dual issue
  ● Out-of-order
§ 8 threads per core
§ 3.0 GHz clock
  ● 48 B (3.0 GHz * 8 cores * 2 issue) instructions / sec / chip, where B = billion


SPARC T4 - Capacity

§ Chip capacity: 48 B instructions / sec
§ For fully active threads:
  ● Single thread alone on a core: 6 B instructions / sec
  ● Each of eight threads sharing a core: 0.75 B instructions / sec
§ Threads rarely fully active:
  ● I/O wait
  ● Processor stall (fetch from memory = 300-400 cycles)
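The capacity figures on this slide follow from simple arithmetic on the clock rate, core count, and issue width. The sketch below reproduces them in C. The struct and function names are illustrative assumptions, not part of any Oracle API, and the per-thread figure assumes that the active threads on a core share its issue slots evenly.

```c
#include <assert.h>

/* Hypothetical description of a chip; field names are illustrative only. */
struct chip {
    double clock_hz;    /* 3.0e9 for the T4 described on the slide */
    int cores;          /* 8 cores per chip */
    int issue_width;    /* dual issue = 2 instructions per cycle */
};

/* Peak instructions per second for one core. */
double core_peak(const struct chip *c)
{
    return c->clock_hz * c->issue_width;
}

/* Peak instructions per second for the whole chip. */
double chip_peak(const struct chip *c)
{
    return core_peak(c) * c->cores;
}

/* Upper bound per thread when `active` threads share one core evenly. */
double thread_peak(const struct chip *c, int active)
{
    assert(active >= 1);
    return core_peak(c) / active;
}
```

These are upper bounds only. As the slide notes, I/O waits and memory stalls mean real threads rarely reach them.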


Developing for T4

§ Make it correct
§ Remove obvious performance issues
§ Make it scale (correctly)


Application Correctness


Debug information

§ Always use -g
§ No optimisation flags:
  ● Full debug
  ● Lower performance
§ Optimised binaries:
  ● Best effort debug
  ● No/minimal performance impact
§ Debug what you ship!


Automatic Error Detection

§ Static/compile time error detection
  ● Code Analyzer
§ Dynamic/runtime memory access error detection
  ● Discover


Code Analyzer

§ Static analysis for common coding errors
  ● Uninitialised variables, etc.
§ Compile with:
  ● -xanalyze=code
§ View results with:
  ● code-analyzer <a.out>


Code Analyzer – example output


Memory Error Detection - discover

§ Common memory allocation and use errors:
  ● Uninitialised memory
  ● Access past bounds
  ● Memory leaks
§ Usage:
  ● discover <a.out> (instrument the binary)
  ● <a.out> (run the instrumented binary)
  ● Default = html output


Example of discover

$ ./a.out
ERROR 1 (ABR): reading memory beyond array bounds at address 0xffbff278 (8 bytes) on the stack at:
    average() + 0x228 <disc.c:8>
        6:    for (int i=1; i<=len; i++)
        7:    {
        8:=>    total+=array[i];
        9:    }
    _start() + 0xd8
...
double array[20];
...
printf(" Average = %f\n", average(array,20) );
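The loop in the report runs i from 1 to len inclusive, so with a 20-element array it skips array[0] and reads array[20], one element past the end. A possible corrected version is sketched below. The slide shows only the loop, so the signature and the return expression are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Average of the first len elements; valid indices are 0 .. len-1. */
double average(const double *array, size_t len)
{
    assert(len > 0);
    double total = 0.0;
    for (size_t i = 0; i < len; i++) {
        total += array[i];
    }
    return total / (double)len;
}
```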


Application Performance


Optimisation – the Basics

§ No optimisation flags == no optimisation
§ Good optimisation: -O
§ Advanced optimisations:
  ● Guided by profile of application
  ● Knowledge of deployment systems


Profiling

§ Profiling with the performance analyzer
  ● collect <a.out>
  ● collect -P <pid>
  ● analyzer test.1.er
§ Report generation with spot
  ● spot <a.out>
  ● spot -P <pid>


Performance Analyzer

§ Demo



Aggressive Optimisation

§ One stop flag: -fast
§ Enables multiple optimisations
  ● Build machine = deployment machine
  ● Floating point simplification and optimisation
  ● Pointers to different types do not alias
  ● Function inlining
§ Investigate performance gain


Profile Drives Flag Selection - Floating Point

§ Significant time in floating point computation:
  ● Floating point simplification
  ● -fsimple=2
§ Significant time in floating point library code:
  ● Optimised floating point libraries
  ● -xlibmopt, -xlibmil
§ Use FP optimisations if performance improves and FP optimisations are acceptable
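One way to judge whether floating point optimisations are acceptable is to ask whether the code tolerates a change in rounding or evaluation order. The sketch below shows that sensitivity in plain C by summing the same three values in two different orders. It is an illustration only, with hypothetical function names, and makes no claim about the specific transformations the compiler applies under -fsimple=2.

```c
/* Sum three values left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    double t = a + b;
    return t + c;
}

/* Sum the same values right to left: a + (b + c). */
double sum_right(double a, double b, double c)
{
    double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first order yields 1.0 and the second yields 0.0, because 1.0 is lost when added to -1e20. Code that depends on one particular order is a candidate for keeping the stricter floating point settings.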


Profile Drives Flag Selection - Flat profile

§ Many hot small functions
  ● At least -xO4 optimisation level
  ● -xipo for cross-file optimisations
§ Conditional code or inlining
  ● Profile feedback
  ● -xprofile=collect:
  ● Training run of application
  ● -xprofile=use:


Profile Drives Flag Selection - Pointers

§ Pointers inhibit compiler optimisations
§ Compiler needs more information
§ restrict qualified pointers in C
  ● Localised action
§ Flags:
  ● -xrestrict (treats pointers passed into functions as restrict qualified)
  ● -xalias_level=std [C]
  ● -xalias_level=compatible [C++]
  ● Actions at file level
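The localised form of this advice is to mark individual pointers with the C99 restrict qualifier. A minimal sketch follows. The function name and the operation are illustrative assumptions, not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * dst[i] = dst[i] + k * src[i].
 * The restrict qualifiers promise the compiler that dst and src do not
 * overlap, so it may keep values in registers and reorder loads and stores.
 * Calling this with overlapping arrays is undefined behaviour.
 */
void scaled_add(double *restrict dst, const double *restrict src,
                double k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

The qualifier affects only this function, whereas the flags listed on the slide change the aliasing assumptions for every function in the file, so every call site has to honour them.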


Processor Specific Optimisations

§ Default: -xtarget=generic often good enough
§ T4 has useful instructions
  ● Compare and branch
  ● Floating point multiply add
§ One stop flag: -xtarget=T4
§ Schedules for T4, uses entire T4 instruction set
§ Only runs on T4 (or later) processors
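A loop of the following shape is the kind of code where a floating point multiply add instruction could apply, since each iteration multiplies two values and adds the product to a running sum. The function name is illustrative. Whether a given compiler and flag combination actually emits such an instruction is not verified here and should be checked in the generated code or the profile.

```c
#include <assert.h>
#include <stddef.h>

/* Each iteration is one multiply followed by one add into the running sum. */
double dot(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```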


SPARC Instruction Sets


Multi-threaded Applications


Multi-thread or Multi-process

§ Multiprocess:
  ● Isolation
  ● Independence
  ● Large virtual memory footprint
  ● Potentially high synchronisation costs
§ Multithread:
  ● Low synchronisation costs
  ● Minimal memory footprint

Throughput

Latency


Multi-threaded Application Development

§ POSIX threads (C11, C++11)
  ● Low level: Great control, significant complexity
§ OpenMP
  ● High abstraction: Easy to use, flexible
§ Automatic parallelisation
  ● Trivial to use: -xautopar -xreduction
  ● Works best for loop-intensive code (typically FP)


OpenMP Parallel For

§ Distributes iterations across CPUs

#pragma omp parallel for
for (int i=0; i<length; i++)
{
    // Do work
}
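To make the fragment concrete, the sketch below fills in the loop body with independent per-element work. The function name and the work itself are assumptions chosen for illustration. Each iteration writes only its own element, so the iterations can be split across threads without sharing state.

```c
#include <assert.h>
#include <stddef.h>

/* Square every element in place; iterations are independent. */
void square_all(double *values, int length)
{
    assert(values != NULL || length == 0);
    #pragma omp parallel for
    for (int i = 0; i < length; i++) {
        values[i] = values[i] * values[i];
    }
}
```

Build with OpenMP enabled (-xopenmp with Oracle Solaris Studio, -fopenmp with GCC). Without the flag the pragma is ignored and the loop runs serially with the same result.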


OpenMP Tasks

§ Distributes work across CPUs

for (int i=0; i<length; i++)
{
    #pragma omp task
    {
        // Do work for task 'i'
    }
}
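The slide shows only the task-creating loop. For tasks to run on more than one thread, the loop normally sits inside a parallel region, with a single thread creating the tasks. The sketch below shows one such arrangement. The helper do_work and the function run_tasks are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unit of work for task i: store i squared. */
static void do_work(int *results, int i)
{
    results[i] = i * i;
}

void run_tasks(int *results, int length)
{
    assert(results != NULL || length == 0);
    #pragma omp parallel
    {
        /* One thread creates the tasks; the whole team executes them. */
        #pragma omp single
        {
            for (int i = 0; i < length; i++) {
                /* i is firstprivate by default in a task, so each task
                   sees the value of i at the time it was created. */
                #pragma omp task
                {
                    do_work(results, i);
                }
            }
        }
        /* Implicit barrier at the end of single: tasks finish before
           any thread continues past it. */
    }
}
```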


Parallel Program Correctness

§ Distributes work across CPUs

int total=0;
#pragma omp parallel for
for (int i=0; i<length; i++)
{
    total += i;
}

§ Data race: Multiple threads updating the same variable
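One common fix for this race is an OpenMP reduction clause, sketched below. The function wrapper and the use of long for the total are assumptions added to make the example self-contained.

```c
#include <assert.h>

/* Sum 0 + 1 + ... + (length - 1) without a data race. */
long sum_indices(int length)
{
    assert(length >= 0);
    long total = 0;
    /* reduction(+:total) gives each thread a private copy of total and
       combines the copies when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < length; i++) {
        total += i;
    }
    return total;
}
```

Protecting the update with a lock or an atomic operation would also remove the race, but it serialises every iteration on the shared variable. The reduction keeps the updates thread-local until the end of the loop.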


Thread Analyzer

§ Instrument application
  ● Compiler flag: -xinstrument=datarace
  ● Binary instrumentation: discover -i datarace <a.out>
§ Gather data:
  ● collect -r on <a.out>
§ View data:
  ● tha tha.1.er


Thread Analyzer - Example

§ Demo


Scaling to Many Threads

§ Minimise serial code
  ● Amdahl's Law
§ Minimise lock contention
§ Minimise writes of shared data
§ Evenly distribute work
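Amdahl's Law puts a number on the first point: if a fraction p of the runtime can be parallelised, the speedup on n threads is 1 / ((1 - p) + p / n). A small sketch, with an illustrative function name:

```c
#include <assert.h>

/*
 * Amdahl's Law: speedup on n threads when a fraction p of the runtime
 * can be parallelised and the remaining (1 - p) stays serial.
 */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, with p = 0.95 on the 64 hardware threads of one chip (8 cores with 8 threads each), the formula gives a speedup of about 15.4, which is why even a small serial fraction dominates at high thread counts.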


Scaling to Many Threads

§ Demo


Limits of Performance

§ Threads
  ● vmstat
§ Instruction Issue Width
  ● pgstat / cputrack / cpustat / ripc
§ Bandwidth
  ● busstat / bw


Conclusion: Optimising for T4

§ Step 1: Profile and remove inefficient code
§ Step 2: Explore benefits of increased optimisation
§ Step 3: Identify opportunities for parallelisation
§ Step 4: Profile and tune parallel code
§ Step 5: Watch for hitting hardware limits
