Performance Analysis - Intel

Post on 22-May-2020

Bhanu Shankar, Ph.D.

Architect, 3D XPoint™ Performance Analysis

Intel Corporation

May 17, 2016

Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Optimization Notice

Legal Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

No computer system can be absolutely secure.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Intel, Xeon, Xeon Phi, Core, VTune, Atom, Quark and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

In a single phrase: “VTune is the best oscilloscope for Intel® Platforms”

If there is something to measure on the platform, VTune can do it

Learn a single tool

Use it on multiple Operating Systems

– Windows / Linux / FreeBSD / Android / VxWorks

Use it on Multiple Platforms

– Quark, Atom Family, Core Family, Xeon family, Xeon Phi family

Updated often with new analysis modes for better insight

Intel® VTune™ Amplifier

Get Faster Code Faster

Familiarity with the basics of Intel® VTune™ Amplifier

Create Projects

Starting a profiling run

– Choose Target and Analysis Type

Types of Analyses available in VTune™ Amplifier

VTune Panes

– Role of the Grid

– Timeline Views

– Grouping Toolbar

Familiarity with the basics of:

Parallel programming using OpenMP

Intel x86 assembly language

Basics of compiler optimizations

Cache and Memory hierarchies

Audience Knowledge

I will start with an application and work through the process of analyzing its performance.

The goal is to let you, the user, find out whether your application is memory bound.

If so, is the memory-bound behavior caused by NUMA effects?

The application is a modified version of the stream benchmark

Freely available at: http://www.cs.virginia.edu/stream

A simple, synthetic benchmark designed to measure sustainable memory bandwidth

Synopsis of this webinar
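
For orientation, here is a minimal sketch of the kind of loop the benchmark runs. It follows the general shape of a STREAM-style triad kernel; the function name `triad` and its parameters are chosen for illustration, and the modified version used in this webinar may differ.

```c
#include <stddef.h>

/* Triad kernel in the style of STREAM: a[i] = b[i] + scalar * c[i].
 * The OpenMP pragma is ignored when OpenMP is not enabled at compile
 * time, in which case the loop simply runs serially. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

Each iteration does very little arithmetic per byte moved, which is why a loop like this tends to be limited by memory rather than by the core.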

First - Run Advanced Hotspots Analysis

Identify the hotspots

Characterize the application behavior

Secondly - Run General Exploration Analysis

Identify areas to explore after the basic algorithm / hotspot

Lastly – Run Specialized Analysis

For this example - Memory Analysis

– Memory Analysis without objects

– Memory Analysis with objects (Linux only)

General Methodology for using VTune

Let’s get started

Step 1: Run Advanced Hotspots

Application – Hotspot – Bottom Up Tab

Application Source/Object

Source code is simple

Object code is straightforward

Why the large CPI?

Not caused by algorithm

Must be machine specific
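
As a reminder of what the metric means, CPI is the number of unhalted clock cycles divided by the number of instructions retired. The sketch below is only an illustration of that ratio; the function name `cpi` is hypothetical and any counts passed to it are made-up inputs, not values from this run.

```c
/* Cycles per instruction: unhalted core clock cycles divided by
 * instructions retired. Returns 0.0 when no instructions were retired. */
double cpi(unsigned long long cycles, unsigned long long instructions)
{
    if (instructions == 0) {
        return 0.0;
    }
    return (double)cycles / (double)instructions;
}
```

A simple loop with a large CPI is spending many cycles per instruction waiting, which points away from the algorithm and toward the machine, for example the memory hierarchy.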

Step 2: Run General Exploration

Summary Page

General Exploration – Bottom Up Tab

Same loops as earlier

Let’s Explore – Source level

Yes, indeed: we have a bottleneck in the memory hierarchy

Memory Analysis without objects: do we have a bandwidth problem?

Step 3: Find the memory bandwidth
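
For a rough cross-check of a measured bandwidth figure, the achieved bandwidth of a STREAM-style loop can be estimated by hand from the bytes it moves and its run time. This is a sketch under stated assumptions: the helper names are hypothetical, the byte count follows the usual STREAM convention for a triad loop (two arrays read, one written, 8-byte doubles), and write-allocate traffic is ignored.

```c
#include <stddef.h>

/* Bytes moved by one triad pass over n doubles: two arrays read and
 * one written. Write-allocate traffic is deliberately ignored. */
double triad_bytes(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Sustained bandwidth in GB/s (1 GB = 1e9 bytes).
 * Returns 0.0 for a non-positive elapsed time. */
double bandwidth_gbs(double bytes, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return bytes / seconds / 1e9;
}
```

Comparing such an estimate with the platform's expected peak gives a first indication of whether the loops are bandwidth bound or held back by something else, such as latency to remote memory.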

How do I run Memory Access Analysis?

Make sure the box that enables memory object analysis is unchecked.

Memory Access - Summary

Looks like a problem accessing remote DRAM

Memory Access – Bottom-Up View

Imbalance in memory access across both sockets

Average latency is fairly large

Step 4: Identify the memory object(s)

Identify the Memory Objects - Configuration

Make sure this box is checked.

Minimal size of object to track.

Identify the Memory Objects

Location of the heap object

Average Latency is large

Dive into an object

Access to the object in a parallel region - Good

Access to the object in a serial region - Hmmm… investigate

Serial Access to memory object

This is where memory is first touched.

BINGO!!! Linux places each page in the local memory of the socket whose thread first touches it!!!
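
The slides do not show the fixed source, so the following is only a sketch of one common remedy: initialize the array inside a parallel loop so that each thread first touches the pages it will later work on. The function name `alloc_first_touch` is hypothetical, and the sketch assumes the compute loops use the same static schedule and that threads stay on their sockets.

```c
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel so that, under a
 * first-touch page placement policy, each page lands near the thread
 * that initializes it. Returns NULL on allocation failure. */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return NULL;
    }

    /* Use the same static schedule as the compute loops, so the thread
     * that touches a page first is also the one that uses it later. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        a[i] = value;
    }
    return a;
}
```

With serial initialization, by contrast, one thread touches every page first, so all of the data ends up on that thread's socket and threads on the other socket must go to remote DRAM.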

Did it work? Analyze the fixed application

Run Memory Access on fixed code

Fixed Code: Summary Page

Effects of NUMA completely disappeared

Remote DRAM accesses are minimal

The stream benchmark has 5 loops that are parallelized

Locate the loops tagged with “#pragma omp parallel for”

Remove the “#pragma omp parallel for” from one or more of the loops

Run Intel® VTune™ Amplifier

See the effects of memory placement and parallel execution

Try the Compare Results feature on your VTune runs

Lab exercise

Try out what you just learned
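
One way to run the exercise without editing the source for every experiment is to make the pragma switchable at compile time. This is a sketch only: the macro names `SERIAL_INIT` and `MAYBE_PARALLEL_FOR` are hypothetical and not part of the benchmark, and the loop is a scale-style kernel written for illustration.

```c
#include <stddef.h>

/* Define SERIAL_INIT at compile time to drop the pragma from this loop,
 * mimicking the lab step of removing "#pragma omp parallel for". */
#ifdef SERIAL_INIT
#define MAYBE_PARALLEL_FOR
#else
#define MAYBE_PARALLEL_FOR _Pragma("omp parallel for")
#endif

/* Scale-style kernel: b[j] = scalar * c[j]. */
void scale(double *b, const double *c, double scalar, size_t n)
{
    MAYBE_PARALLEL_FOR
    for (long j = 0; j < (long)n; j++) {
        b[j] = scalar * c[j];
    }
}
```

Building once with and once without the macro defined gives two binaries to profile and compare.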

Memory and Inter-socket Bandwidth

Memory Latency

Memory Hierarchy

False Sharing

True Sharing

Effectiveness of Lockless Algorithms

What other problems can I diagnose this way?
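
To make one of these items concrete: false sharing arises when threads update different variables that happen to sit on the same cache line. A minimal sketch of the usual padding remedy follows. The 64-byte line size is an assumption, the type and function names are hypothetical, and a real program would also need the array itself aligned to a cache line boundary.

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One counter per thread, padded so that each occupies a full
 * cache-line-sized slot instead of sharing a line with its neighbor. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Sum the per-thread counters after the parallel phase has finished. */
long total(const padded_counter *counters, size_t nthreads)
{
    long sum = 0;
    for (size_t t = 0; t < nthreads; t++) {
        sum += counters[t].value;
    }
    return sum;
}
```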

Intel® VTune™ Amplifier continues to add tools to the toolbox to diagnose system performance problems

Memory Access Analysis is one such powerful tool

Stay tuned for more such tools in the future

Summary

Download and Evaluate Intel® VTune™ Amplifier

https://software.intel.com/en-us/intel-vtune-amplifier-xe

Intel® VTune™ Amplifier Support

https://software.intel.com/en-us/intel-vtune-amplifier-xe-support

Get Help: Ask the Community

https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe

NUMA Architecture

https://software.intel.com/en-us/articles/a-brief-survey-of-numa-non-uniform-memory-architecture-literature

Stream Benchmark

http://www.cs.Virginia.edu/stream

or type “stream benchmark” into your favorite search engine

Call to Action

Questions?