Intel Software College Tuning Threading Code with Intel® Thread Profiler for Explicit Threads.

Copyright © 2006, Intel Corporation. All rights reserved.

Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Objectives

After successful completion of this module you will be able to…

• Use Thread Profiler to recognize and fix common performance problems in applications using Windows* threads

Agenda

Look at Intel® Thread Profiler features

Define Critical Path Analysis

Examine Thread Profiler data views available

Review common performance issues of multithreaded applications

• Focus on Load imbalance

• Focus on Synchronization contention

Describe general optimizations to gain better performance

Motivation

Developing efficient multithreaded applications is hard

New performance problems are caused by the interaction between concurrent threads

• Load imbalance

• Contention on synchronization objects

• Threading overhead

Intel® Thread Profiler

Plugs in to the VTune™ performance environment

• Instrumentation-based data collector in VTune

Identifies performance issues in OpenMP* or threaded applications using the Win32* API, POSIX* threads, and Intel® Threading Building Blocks

Pinpoints performance bottlenecks that directly affect execution time

Intel® Thread Profiler Features

Supports several different compilers

• Intel® C++ and Fortran Compilers, v7 and higher

• Microsoft* Visual* C++ .NET* 2002, 2003 & 2005 Editions

• Integrated into Microsoft Visual Studio .NET* IDE

Binary instrumentation of applications

Different views and filters available to assist and organize analysis

Uses critical path analysis

What is the Critical Path?

Threaded applications contain multiple execution flows

• A new flow is created when a thread is created or resumes

• Flow ends when a thread terminates or blocks on a synchronization primitive

The critical path is the longest execution flow

[Figure: timeline of Threads 1, 2, and 3 from T0 to T15. Labeled events: Acquire lock L, Wait for L, Release L, Wait for Threads 2 & 3, Threads 2 & 3 Done, Thread 2 terminates, Thread 3 terminates, Thread 1 terminates.]
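
One way to picture "the longest execution flow" is as the longest dependency chain through a set of flow segments. The sketch below is only an illustrative model, not how Thread Profiler computes its critical path: the `Segment` type and `criticalPathLength` function are hypothetical names, and the input is assumed to list segments so that each one appears after the segments it waits on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one execution-flow segment: how long it runs and
// which earlier segments must finish before it can start.
struct Segment {
    double duration;
    std::vector<std::size_t> preds;  // indices of earlier segments
};

// Returns the length of the longest dependency chain. Segments must be
// ordered so that every predecessor index is smaller than the segment's own.
double criticalPathLength(const std::vector<Segment>& segs) {
    std::vector<double> finish(segs.size(), 0.0);
    double longest = 0.0;
    for (std::size_t i = 0; i < segs.size(); ++i) {
        double start = 0.0;
        for (std::size_t p : segs[i].preds) {
            assert(p < i);  // input must be topologically ordered
            start = std::max(start, finish[p]);
        }
        finish[i] = start + segs[i].duration;
        longest = std::max(longest, finish[i]);
    }
    return longest;
}
```

Shortening a segment that is not on the longest chain leaves the returned length unchanged, which is why tuning effort is best spent on the critical path.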

Critical Path Analysis

System Utilization

• Relative to the system executing the application

  • Idle: no threads

  • Serial: a single thread

  • Under Utilized: more than one thread, but fewer threads than cores

  • Fully Utilized: # threads == # cores

  • Over Utilized: # threads > # cores

Thread interaction categories

  • Cruise: threads running without interference

  • Overhead: thread operation overhead

  • Blocking: thread waiting on external event

  • Impact: thread preventing some other thread from executing

If the critical path is shortened, the application will run in less time
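
The system utilization categories can be written down as a small classification function. This is a minimal sketch in standard C++; the `Utilization` enum and `classify` function are hypothetical names, and treating one thread on a one-core system as Serial rather than Fully Utilized is an assumption the slide does not settle.

```cpp
#include <cassert>

enum class Utilization { Idle, Serial, UnderUtilized, FullyUtilized, OverUtilized };

// Maps a concurrency level (number of active threads) to a category,
// relative to the number of cores on the system running the application.
Utilization classify(int activeThreads, int cores) {
    assert(activeThreads >= 0 && cores > 0);
    if (activeThreads == 0) return Utilization::Idle;
    if (activeThreads == 1) return Utilization::Serial;  // assumed tie-break on 1 core
    if (activeThreads < cores) return Utilization::UnderUtilized;
    if (activeThreads == cores) return Utilization::FullyUtilized;
    return Utilization::OverUtilized;
}
```

For a system with 2 processors, `classify(1, 2)` is Serial, `classify(2, 2)` is Fully Utilized, and `classify(3, 2)` is Over Utilized.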

System Utilization

Examines processor utilization to determine concurrency level of the application

Concurrency is the number of active threads

Categorization shown for a system configuration with 2 processors

[Figure: the timeline of Threads 1, 2, and 3 (T0 to T15) with a Concurrency Level bar along the Time axis (0 to 15). Legend: Idle, Serial, Under Utilized, Fully Utilized, Over Utilized.]

Execution Time Categories

Analyze thread interaction and behavior along critical path

Record objects that cause CP transitions

Categorization shown for a system configuration with 2 processors

[Figure: the timeline of Threads 1, 2, and 3 (T0 to T15) with a Thread Interaction bar along the Time axis (0 to 15). Legend: Cruise time, Overhead, Blocking time, Impact time.]

Merging Concurrency and Behavior

[Figure: Concurrency Level, Critical Path, and Thread Behavior shown against the Time axis (0 to 15).]

Start with system utilization

Further categorize by behavior

Thread Profiler Views

Critical Path View

• Shows breakdown of the critical path

Profile View

• Shows the breakdown of selected critical paths

• User can select other views of the selected profile

• Concurrency level, threads, objects

Timeline View

• Shows thread activity and critical path transitions for the entire application

Source View

• Transition source view, creation source view

Activity 1a

Threaded version of potential code

• Is there a performance issue?

Goal

• Run application through Thread Profiler

• Examine thread activities by reviewing different views

Thread Profiler Profile View

Profile Pane

Timeline Pane

Profile Pane – Concurrency Level View

Concurrency Level View

Two threads ran in parallel ~33% of the time

Ran single threaded ~65% of the time

Let’s look at the Thread View

Profile Pane – Thread View

Time on the Critical Path

Active time of the thread

Lifetime of the thread

Let’s look at the Object View

Profile Pane – Object View

This object caused all of the impact

Let’s look at Timeline View

Timeline Pane

Source View

Activity 1b

Threaded version of potential code

• Is there a performance issue?

Goal

• Examine thread activities by reviewing different views

• Determine system utilization

• Identify any performance issues

Review Activity 1

Concurrency Level view can be used to determine system utilization by the application

Timeline view enables you to understand the thread activity in your application

Instrumentation time will be included in first run results; thus, for applications running in a short amount of time, a second run may produce more realistic timings.

Common Performance Issues

Load balance

• Improper distribution of parallel work

Synchronization

• Excessive use of global data, contention for the same synchronization object

Parallel Overhead

• Due to thread creation, scheduling, etc.

Granularity

• Not sufficient parallel work

Load Imbalance

Unequal work loads lead to idle threads and wasted time

[Figure: timeline of Threads 0 to 3 between "Start threads" and "Join threads", showing Busy and Idle time for each thread.]

Redistribute Work to Threads

Static assignment

• Are the same number of tasks assigned to each thread?

• Do tasks take different processing time?

• Do tasks change in a predictable pattern?

• Rearrange (static) order of assignment to threads

• Use dynamic assignment of tasks
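
To see why rearranging a static assignment can help, it is useful to compare per-thread load under two fixed schemes. The sketch below is illustrative only: `blockLoads` and `cyclicLoads` are hypothetical helper names, and the task costs are assumed to be known up front, which is rarely true outside a teaching example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-thread total cost when each thread gets one contiguous block of tasks.
std::vector<long> blockLoads(const std::vector<long>& cost, std::size_t threads) {
    assert(threads > 0);
    std::vector<long> load(threads, 0);
    std::size_t per = (cost.size() + threads - 1) / threads;  // block size, rounded up
    for (std::size_t i = 0; i < cost.size(); ++i)
        load[i / per] += cost[i];
    return load;
}

// Per-thread total cost when tasks are dealt out round-robin.
std::vector<long> cyclicLoads(const std::vector<long>& cost, std::size_t threads) {
    assert(threads > 0);
    std::vector<long> load(threads, 0);
    for (std::size_t i = 0; i < cost.size(); ++i)
        load[i % threads] += cost[i];
    return load;
}
```

With costs 1 through 8 that grow predictably and 2 threads, the block scheme gives loads of 10 and 26, while the round-robin scheme gives 16 and 20, a much smaller gap between the busiest and idlest thread.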

Redistribute Work to Threads

Dynamic assignment

• Is there one big task being assigned?

  • Break up large task to smaller parts

• Are small computations agglomerated into larger task?

  • Adjust number of computations in a task

  • More small computations into single task?

  • Fewer small computations into single task?

  • Bin packing heuristics
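
Dynamic assignment with an adjustable task size might look like the following sketch. It uses standard C++ threads and atomics rather than the Win32 API used elsewhere in this deck; `dynamicSum` and its `chunk` parameter (the number of computations grouped into one task) are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sums work(i) for i in [0, n). Each worker repeatedly claims the next
// `chunk` indices from a shared counter until no work is left.
long dynamicSum(std::size_t n, std::size_t threads, std::size_t chunk,
                long (*work)(std::size_t)) {
    assert(threads > 0 && chunk > 0);
    std::atomic<std::size_t> next{0};  // first index not yet claimed
    std::atomic<long> total{0};
    auto worker = [&] {
        long local = 0;
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);  // claim one task
            if (begin >= n) break;
            std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) local += work(i);
        }
        total += local;  // one shared update per worker
    };
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return total.load();
}
```

A small `chunk` balances load more evenly but touches the shared counter more often; a large `chunk` does the opposite. Trying a few values and comparing the active time of each thread is a reasonable way to choose.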

Unbalanced Workloads

Threads are unbalanced

Active Times not equal

Activity 2 – Load Imbalance

Threaded version of potential code with thread pools

• Has a load balance performance issue

Review Activity 2

Threads view can be used to determine activity levels of each thread within the application

Timeline view enables you to understand the thread activity in your application

Synchronization

By definition, synchronization serializes execution

Lock contention means more idle time for threads

[Figure: timeline of Threads 0 to 3 over Time, showing Busy, Idle, and In Critical periods for each thread.]

Synchronization Fixes

Eliminate synchronization

• Expensive but necessary “evil”

• Use storage local to threads

  • Use local variable for partial results, update global after local computations

  • Allocate space on thread stack (alloca)

  • Use thread-local storage API (TlsAlloc)

• Use atomic updates whenever possible

  • Some global data updates can use atomic operations (Interlocked API family)
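
The "local variable for partial results" advice can be sketched as follows. This is a minimal example in standard C++ (std::thread and std::mutex stand in for Win32 threads and a critical section); `parallelSum` is a hypothetical name, and the even split of the input is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Each thread sums its slice into a local variable and touches the shared
// total exactly once, instead of locking on every element.
long parallelSum(const std::vector<int>& data, std::size_t threads) {
    assert(threads > 0);
    long total = 0;         // shared result
    std::mutex totalMutex;  // protects `total`
    auto worker = [&](std::size_t begin, std::size_t end) {
        long partial = 0;   // local to this thread, no locking needed
        for (std::size_t i = begin; i < end; ++i) partial += data[i];
        std::lock_guard<std::mutex> lock(totalMutex);
        total += partial;   // single synchronized update per thread
    };
    std::vector<std::thread> pool;
    std::size_t per = (data.size() + threads - 1) / threads;  // slice size, rounded up
    for (std::size_t t = 0; t < threads; ++t) {
        std::size_t begin = std::min(t * per, data.size());
        std::size_t end = std::min(begin + per, data.size());
        pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();
    return total;
}
```

The number of lock acquisitions drops from one per element to one per thread, which is what removes the contention.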

Atomic Updates

Use Win32 Interlocked* intrinsics in place of synchronization object

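static long counter;
static CRITICAL_SECTION cs;  // assumed to be set up elsewhere with InitializeCriticalSection (&cs)

// Fast
InterlockedIncrement (&counter);

// Slower
EnterCriticalSection (&cs);
counter++;
LeaveCriticalSection (&cs);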
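
For readers without a Windows toolchain, the same contrast can be sketched in standard C++, with std::atomic standing in for the Interlocked API and std::mutex for the critical section. The function names `countWithAtomic` and `countWithLock` are hypothetical, and no timing claim is made here: both versions are correct, and measuring them on the target system is the only way to see the actual difference.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Every thread increments a shared counter with an atomic operation.
long countWithAtomic(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}

// Same work, but each increment is wrapped in a lock.
long countWithLock(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```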

Synchronization Fixes

Reduce size of critical regions protected by synchronization object

• Larger critical regions tie up sync objects longer; other threads sit idle longer waiting to acquire objects

• Only accesses to shared variables need to be protected
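
A small critical region might be sketched like this, again in standard C++ rather than Win32. The names `expensive` and `collect` are hypothetical, and `expensive` is only a stand-in for real per-item work that touches no shared data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a costly computation that uses no shared state.
double expensive(std::size_t i) { return std::sqrt(static_cast<double>(i)); }

// Computes expensive(i) for i in [0, n) on several threads and gathers
// the values into one shared vector (in no particular order).
std::vector<double> collect(std::size_t n, std::size_t threads) {
    assert(threads > 0);
    std::vector<double> results;  // shared
    std::mutex m;                 // protects `results`
    auto worker = [&](std::size_t first) {
        for (std::size_t i = first; i < n; i += threads) {
            double v = expensive(i);  // outside the lock: runs in parallel
            std::lock_guard<std::mutex> lock(m);
            results.push_back(v);     // inside the lock: shared access only
        }
    };
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    return results;
}
```

Had the call to `expensive` been placed inside the locked scope, the threads would have executed it one at a time and the other threads would sit idle waiting for the lock.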

Synchronization Fixes

Use best synchronization object for job

• Critical Section

  • Local object

  • Available to threads within the same process

  • Lower overhead (~8X faster than mutex)

• Mutex

  • Kernel object

  • Accessible to threads within different processes

  • Deadlock safety (can only be released by owner)

Other objects are available

Object Contention

This object caused all of the impact

What is all this?

These four threads…

…are impacting threads by this object

Activity 3

Threaded version of numerical integration

• Has serious performance issues

Goal

• Understand thread activity

• Use the Thread Profiler groupings

• Examine synchronization and its effect on performance

• Fix performance issue

Review Activity 3

Grouping by objects and threads shows which objects impact which threads

Apply the heuristics from labs for locating bottlenecks in the source code

For longer running applications, the difference in first and second run-times is negligible

General Optimizations

Serial Optimizations

• Serial optimizations along the critical path should affect execution time

Parallel Optimizations

• Reduce synchronization object contention

• Balance workload

• Functional parallelism

Analyze benefit of increasing number of processors

Analyze the effect of increasing the number of threads on scaling performance

Intel® Thread Profiler for Explicit Threads

What’s Been Covered

Identifying performance issues can be time consuming without tools

Tools are required to understand and to optimize parallel efficiency and hardware utilization

Thread Profiler helps you understand your application’s thread activity, system utilization, and scaling performance
