Advanced Modular Software Performance Monitoring
![Page 1: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/1.jpg)
Advanced Modular Software Performance Monitoring
CPU profiling with Intel® VTune™ Amplifier XE
Alexander Mazurov
Ferrara University, CERN
![Page 2: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/2.jpg)
I. Event Processing Software
II. Profilers
III. Intel® VTune™ Amplifier XE
IV. Gaudi Framework
V. Gaudi Intel Profiler Auditor
VI. Profiling examples
![Page 3: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/3.jpg)
I. Event Processing Software
Physics events
The Higgs Boson
Simulation * Trigger * Analysis
![Page 4: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/4.jpg)
LHCb High Level Trigger (HLT) Software
Detector events: 10^6 events/sec
Moore
Events to storage: 4500 events/sec
![Page 5: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/5.jpg)
II. Profilers
Collect information about how an application or system performs.
![Page 6: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/6.jpg)
CPU Profiler
Measures the frequency and duration of function calls and/or code instructions.
![Page 7: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/7.jpg)
Profiling Techniques
- Hardware counters
- Instrumenting the code
![Page 8: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/8.jpg)
Hardware counters
Exploit hardware performance counters from the Performance Monitoring Unit (PMU)
Counters:
- Translation lookaside buffer (TLB) misses
- Cache misses
- Stall cycles
- Memory access latency
- ...
Tools: Perfmon2 * Intel VTune Amplifier
![Page 9: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/9.jpg)
Instrumenting the code
- Statically:
  * Change code manually / automatically
  * Compiler assisted (gcc -pg)
- Dynamically (at runtime):
  * Change code at runtime
    - Valgrind
    - Google Performance Tools
    - Intel VTune Amplifier
![Page 10: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/10.jpg)
III. VTune™ Amplifier XE
Performance Profiling Tool
- x86 (32 and 64-bit)
- GUI and CLI
![Page 11: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/11.jpg)
VTune™ Features
Runtime instrumenting profiler
- User-mode sampling
- Hardware-based sampling
- Concurrency and locks and waits analysis
- Threading timeline
- Attach to a running process
- Source view
![Page 12: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/12.jpg)
How does user-mode sampling work?
1) Interrupts the process
2) Collects samples of all active instruction addresses
3) Restores the call sequence for each sample
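The three steps above can be illustrated with a toy sampling profiler. This is only a minimal sketch of the general technique, not how VTune is implemented: it assumes a Unix system, uses Python's standard `signal` module to interrupt the process on a CPU-time timer, and the names `sample_stacks` and `busy` are hypothetical.

```python
import collections
import signal
import time


def sample_stacks(workload, interval=0.01):
    """Run workload() and return a Counter of sampled call stacks."""
    stacks = collections.Counter()

    def on_sample(signum, frame):
        # Step 2: `frame` is where execution was when the timer fired.
        # Step 3: walk the caller chain to rebuild the call sequence.
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        stacks[tuple(reversed(names))] += 1

    signal.signal(signal.SIGPROF, on_sample)
    # Step 1: interrupt the process every `interval` seconds of CPU time.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        workload()
    finally:
        # Stop the timer; the handler stays installed but no longer fires.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
    return stacks


def busy(seconds=0.3):
    """Hypothetical workload: spin until `seconds` of CPU time is used."""
    end = time.process_time() + seconds
    while time.process_time() < end:
        pass


if __name__ == "__main__":
    for stack, count in sample_stacks(busy).most_common(3):
        print(count, " > ".join(stack))
```

The number of samples attributed to a stack approximates the share of CPU time spent there, which is why the result is statistical rather than exact.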
![Page 13: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/13.jpg)
User-mode analysis types
- Hotspots
- Concurrency
- Locks and Waits
![Page 14: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/14.jpg)
User-mode sampling
Hotspots analysis:
![Page 15: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/15.jpg)
Group results
![Page 16: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/16.jpg)
Call Stack
![Page 17: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/17.jpg)
Filter by timeline
![Page 18: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/18.jpg)
CPU time by code line
Debug mode (-g)
![Page 19: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/19.jpg)
Sampling Accuracy
User-mode sampling is a statistical method and does not provide 100% accurate results.
Accuracy depends on:
- Duration of the collection
- Speed of the processor
- Amount of software activity
- Sampling interval
  * recommended value is 10 ms
  * profiling is only 5% slower
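To make the dependence on collection duration and sampling interval concrete, the sketch below estimates the statistical error of a sampled CPU-time share. It assumes a simple binomial model with independent samples, which is an idealization and not something stated in the slides; the function names are hypothetical.

```python
import math


def expected_samples(duration_s, interval_s=0.010):
    """Approximate number of samples taken during duration_s seconds."""
    return round(duration_s / interval_s)


def relative_error(fraction, n_samples):
    """Relative standard error of a measured CPU-time share.

    Assumes each sample independently hits the function with
    probability `fraction` (binomial model).
    """
    if n_samples <= 0 or not 0 < fraction <= 1:
        raise ValueError("need n_samples > 0 and 0 < fraction <= 1")
    return math.sqrt((1 - fraction) / (fraction * n_samples))


if __name__ == "__main__":
    # A function using 1% of CPU time, sampled every 10 ms.
    for duration in (10, 100, 1000):
        n = expected_samples(duration)
        print(duration, "s:", n, "samples, relative error",
              round(relative_error(0.01, n), 3))
```

Under this model the error shrinks with the square root of the number of samples, so small hotspots need long collections before their measured share becomes reliable.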
![Page 20: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/20.jpg)
Integrating VTune™ Amplifier into an Event Processing Framework
![Page 21: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/21.jpg)
IV. Gaudi
Event processing framework
- Moore: Trigger
- Gauss: Simulation
- Brunel: Reconstruction
- Online: Monitoring and commissioning
- DaVinci: Physics analysis
![Page 22: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/22.jpg)
Gaudi Architecture
Algorithms * Services * Tools
![Page 23: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/23.jpg)
Moore Event Loop
Algorithms Sequence:
- Hlt1DiMuonHighMassFilterSequence
- Hlt1DiMuonHighMassStreamer
- FastVeloHlt
- MuonRec
- Velo2CandidatesDiMuonHighMass
- GECLooseUnit
- createITLiteClusters
- createVeloLiteClusters
How to profile algorithms?
![Page 24: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/24.jpg)
V. Gaudi Intel Profiling Auditor
VTune™ User API + Gaudi Auditors API
![Page 25: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/25.jpg)
VTune™ User API
- Start/Pause profiling
- Mark profiling regions
![Page 26: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/26.jpg)
Gaudi Auditors API
Algorithm
Start event End event
Callback functions
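The idea of combining start/pause control with auditor callbacks can be sketched in a few lines. Everything here is hypothetical and only illustrates the pattern: `RecordingProfiler` stands in for a real profiler API, and `ProfilingAuditor`, `before_execute`, `after_execute` and `run_event` are invented names, not the Gaudi or VTune interfaces.

```python
class RecordingProfiler:
    """Stand-in for a profiler with resume/pause control (hypothetical)."""

    def __init__(self):
        self.log = []

    def resume(self, region):
        self.log.append(("resume", region))

    def pause(self, region):
        self.log.append(("pause", region))


class ProfilingAuditor:
    """Hypothetical auditor whose callbacks fire around each algorithm."""

    def __init__(self, profiler):
        self.profiler = profiler

    def before_execute(self, name):
        # Start event: begin collecting for this algorithm.
        self.profiler.resume(name)

    def after_execute(self, name):
        # End event: stop collecting until the next algorithm starts.
        self.profiler.pause(name)


def run_event(algorithms, auditors):
    """Toy event loop: notify every auditor around each algorithm."""
    for name, execute in algorithms:
        for auditor in auditors:
            auditor.before_execute(name)
        try:
            execute()
        finally:
            for auditor in auditors:
                auditor.after_execute(name)


if __name__ == "__main__":
    profiler = RecordingProfiler()
    sequence = [("FastVeloHlt", lambda: None), ("MuonRec", lambda: None)]
    run_event(sequence, [ProfilingAuditor(profiler)])
    print(profiler.log)
```

Because the callbacks carry the algorithm name, collected samples can be attributed to a region per algorithm without modifying the algorithms themselves.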
![Page 27: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/27.jpg)
Algorithms profiling (I)
CPU time per sequence branch
![Page 28: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/28.jpg)
Algorithms profiling (II)
![Page 29: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/29.jpg)
Gaudi configuration
from Configurables import AuditorSvc, IntelProfilerAuditor
profiler = IntelProfilerAuditor()
profiler.StartFromEventN = 5000
profiler.StopAtEventN = 15000
AuditorSvc().Auditors += [profiler]
![Page 30: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/30.jpg)
Run: $> intelprofiler -o /collected/data job.py
Analyze (GUI): $> amplxe-gui /collected/data/r001hs
Analyze (CLI): $> amplxe-cl -report hotspots -r /collected/data/r001hs
![Page 31: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/31.jpg)
VI. Profiling examples
1. Memory allocation functions
2. Measuring profiling accuracy
3. Custom reports
![Page 32: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/32.jpg)
1. Memory allocation functions
operator new from the libstdc++ library:
tc_new from the tcmalloc library:
tc_new takes half the time of operator new
![Page 33: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/33.jpg)
2. Measuring profiling accuracy
Intel Profiling Auditor vs. Timing Auditor
The Timing Auditor measures the absolute time of an algorithm's run
1000 events
![Page 34: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/34.jpg)
3. Custom reports
Build reports using CSV files exported from VTune Amplifier
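A custom report of this kind could be built with a few lines of Python. This is a sketch under assumptions: the column names `Function`, `Module` and `CPU Time` and the rows in `SAMPLE` are made up for illustration, so check them against the header of the file you actually export.

```python
import csv
import io
from collections import defaultdict


def cpu_time_by(csv_text, key="Module", value="CPU Time"):
    """Sum the `value` column per `key` and sort by descending total."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += float(row[value])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented rows standing in for an exported hotspots report.
SAMPLE = """Function,Module,CPU Time
func_a,lib_x.so,0.5
func_b,lib_y.so,1.25
func_c,lib_x.so,2.0
"""

if __name__ == "__main__":
    for module, seconds in cpu_time_by(SAMPLE):
        print(f"{module:12s} {seconds:6.2f} s")
```

Grouping by a different column only requires changing `key`, for example `key="Function"`.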
![Page 35: Advanced Modular Software Performance Monitoring](https://reader033.fdocuments.net/reader033/viewer/2022052619/555cea6ed8b42a08668b46d5/html5/thumbnails/35.jpg)
Conclusions
Intel® VTune™ Amplifier XE:
+ Various analysis types and reports
+ Rich User API
+ Reasonable overhead time