Parallel considerations of VELO PatPixelTracking
Daniel Hugo Cámpora Pérez
LHCb Online team
Outline
• PatPixel problem description
• Test setup, some results
• Integration with Gaudi framework
Daniel Hugo Cámpora Pérez - Parallel Considerations of VELO PatPixelTracking 21-11-2012
Fast Pixel problem description
• 48 sensors with 12 chips each
• Each chip has 256x256 pixels
• Clustered 2x2 by readout board
• Right and left sensors at different z with overlap
• The algorithm searches for hits starting from the last pixel lattice.
• Per hit, it searches for compatible hits (within a given radius) in the next pixel lattice.
• Finding at least three compatible hits forms a track.
However, the current approach is very sequential (albeit efficient!).
• Hits must not be already used.
• Continue and break instructions cut the loops short and make it fast.
Porting the same algorithm as is to other programming models makes for a proof of concept (the physics produced is the same).
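The sequential search described above can be sketched roughly as follows. This is a minimal illustration, not the PatPixelTracking code: the `Hit`, `Sensor` and `Track` types, the `findTracks` function, the flat x/y distance test and the "first compatible hit wins" rule are all assumptions made for the example. It also assumes that the "next" lattice is the adjacent one when walking from the last sensor towards the first, and that a chain ends at the first sensor with no compatible hit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hit type: a position on one sensor plus a "used" flag.
struct Hit {
    float x;
    float y;
    bool used;
};

using Sensor = std::vector<Hit>;        // all hits on one pixel lattice
using Track = std::vector<const Hit*>;  // hits collected along one track

// Illustrative sequential search. sensors.back() is the last lattice;
// `radius` is an assumed compatibility window between adjacent lattices.
std::vector<Track> findTracks(std::vector<Sensor>& sensors, float radius) {
    assert(radius >= 0.0f);
    std::vector<Track> tracks;
    for (std::size_t s = sensors.size(); s-- > 0;) {
        for (Hit& seed : sensors[s]) {
            if (seed.used) continue;  // hits must not be already used
            std::vector<Hit*> candidate{&seed};
            // Walk towards the first lattice, one sensor at a time.
            for (std::size_t n = s; n-- > 0;) {
                const Hit* last = candidate.back();
                Hit* match = nullptr;
                for (Hit& h : sensors[n]) {
                    if (h.used) continue;
                    if (std::hypot(h.x - last->x, h.y - last->y) <= radius) {
                        match = &h;
                        break;  // first compatible hit wins (assumption)
                    }
                }
                if (match == nullptr) break;  // chain ends at the first gap
                candidate.push_back(match);
            }
            // At least three compatible hits form a track.
            if (candidate.size() >= 3) {
                for (Hit* h : candidate) h->used = true;
                tracks.emplace_back(candidate.begin(), candidate.end());
            }
        }
    }
    return tracks;
}
```

The `used` flags are what make this hard to parallelise as is: whether a hit can seed or join a track depends on every track found before it, so the loop iterations are not independent.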
Current test setup
We are interested in the search bit!
• Input is 200 Monte-Carlo generated events.
• Implementations produce exactly the same output as Brunel, unless stated otherwise.
• Current setup runs TBB with a variable number of threads specified by task_scheduler_init init(i);
• 1000 experiments are run per configuration. Results shown are the mean of those; the standard deviation is checked as well.
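A measurement loop of the kind described above could look like the sketch below. It is an assumption-laden illustration rather than the actual harness: `Stats`, `summarize` and `benchmark` are hypothetical names, the workload is passed in as an opaque callable (which would internally set the thread count, for example through `task_scheduler_init` as on the slide, or `tbb::global_control` in newer oneTBB releases), and the population standard deviation is used.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Stats {
    double mean;
    double stddev;
};

// Mean and population standard deviation of a set of timings.
Stats summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / samples.size();
    double squares = 0.0;
    for (double s : samples) squares += (s - mean) * (s - mean);
    return {mean, std::sqrt(squares / samples.size())};
}

// Runs one configuration `nExperiments` times and reports timings in ms.
// `runConfiguration` is a hypothetical callable that processes all events
// with a fixed number of threads.
Stats benchmark(std::size_t nExperiments,
                const std::function<void()>& runConfiguration) {
    std::vector<double> ms;
    ms.reserve(nExperiments);
    for (std::size_t i = 0; i < nExperiments; ++i) {
        const auto start = std::chrono::steady_clock::now();
        runConfiguration();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return summarize(ms);
}
```

Calling `benchmark(1000, run)` once per thread count gives one mean and one standard deviation per configuration; a large standard deviation relative to the mean is a hint that the mean alone should not be trusted.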
Comparing apples to…
• Lab13
  ▫ Intel Xeon CPU E5-2650 (2 CPUs)
  ▫ 20M Cache, 2.00 GHz (2.80 GHz TB)
  ▫ 8 cores, 16 HW threads
• Intel MIC (Pre-Production Intel® Xeon Phi™ coprocessors)
  ▫ 1.1 GHz
  ▫ 61 cores, 244 HW threads
• GPU
  ▫ NVIDIA GeForce GTX 680
  ▫ 1 GHz
  ▫ 1536 CUDA cores (96 SIMD cores)
Precision
• Is there a real need for double-precision floating-point operations?
How about single precision instead…
Mean 1 (correct / produced tracks): 100%
Mean 2 (correct / total number of tracks): 99.9964%
• We miss one track in 28,000.
• No incorrect tracks are generated.
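The two ratios above can be computed with a comparison along the following lines. This is only a sketch under assumptions: a track is represented here as a list of hit IDs, a produced track counts as correct only if it matches a reference track exactly, and `Agreement` and `compareTracks` are hypothetical names. As a sanity check on the quoted figure, missing one track in 28,000 gives 27,999 / 28,000 ≈ 99.9964%.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical representation: a track is the list of its hit IDs.
using Track = std::vector<int>;

struct Agreement {
    double correctOverProduced;  // "Mean 1" on the slide
    double correctOverTotal;     // "Mean 2" on the slide
};

// Compares single-precision output (`produced`) against the
// double-precision reference. Assumes neither list contains duplicates.
Agreement compareTracks(const std::vector<Track>& reference,
                        const std::vector<Track>& produced) {
    assert(!reference.empty() && !produced.empty());
    const std::set<Track> known(reference.begin(), reference.end());
    std::size_t correct = 0;
    for (const Track& t : produced) {
        if (known.count(t) > 0) ++correct;
    }
    return {static_cast<double>(correct) / produced.size(),
            static_cast<double>(correct) / reference.size()};
}
```

A first ratio of 100% means every produced track exists in the reference; a second ratio below 100% means some reference tracks were not found.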
Ma(g)ny-cores like Single Precision!
Current implementation
• Setup is decoupled from the Gaudi framework.
• Produced physics are the same.
• Parallelism is set up as one thread per event.
• GPU acts as simple SIMD (“speedup” of 0.3x!)
  ▫ divergent branches and warps are not good friends
Event-wise parallelism?
Using a similar idea to the baseline algorithm, we can exploit the inherently parallel nature of the problem.
• Average #hits per sensor: 22.6
• Average multiplicity (hit x hit): 771.15
• Average multiplicity (hit x hit x hit): 1544.7
An early-stage parallel algorithm produces 85% of the correct tracks.
Different results don’t necessarily mean wrong! Physics demonstration!
What’s missing?
The current setup is cool and dandy for comparing results, but not for testing the real setup!
Daniel Hugo Cámpora Pérez 26-10-2012
Integration with Gaudi
The current HLT doesn’t consider having coprocessors to help in the execution of any step. The framework is sequential!
Per-event execution on a coprocessor is not realistic. Memory copies will kill us!
• Each event is approximately 50 kB.
• Processing one single event is trivial.
We have to hide the latency!
Pipelining!
E.g. #event chunk = 200
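To see why chunking and pipelining help, a back-of-the-envelope model is sketched below. It assumes an idealised three-stage pipeline (copy in, compute, copy out) with fixed per-chunk stage times and perfect overlap between stages; `StageTimes`, `serialTime`, `pipelinedTime` and `chunkSizeKB` are hypothetical names, and no stage timings here are measurements. With the figures quoted in the slides, a chunk of 200 events at roughly 50 kB each is about 10,000 kB, so each transfer moves around 10 MB instead of 50 kB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Per-chunk stage durations in any consistent unit (hypothetical values).
struct StageTimes {
    double copyIn;
    double compute;
    double copyOut;
};

// No overlap: every chunk pays for all three stages back to back.
double serialTime(std::size_t nChunks, const StageTimes& t) {
    assert(t.copyIn >= 0.0 && t.compute >= 0.0 && t.copyOut >= 0.0);
    return nChunks * (t.copyIn + t.compute + t.copyOut);
}

// Ideal pipeline: the first chunk pays the full latency, every further
// chunk only costs as much as the slowest stage.
double pipelinedTime(std::size_t nChunks, const StageTimes& t) {
    assert(t.copyIn >= 0.0 && t.compute >= 0.0 && t.copyOut >= 0.0);
    if (nChunks == 0) return 0.0;
    const double slowest = std::max({t.copyIn, t.compute, t.copyOut});
    return (t.copyIn + t.compute + t.copyOut) + (nChunks - 1) * slowest;
}

// Payload of one chunk, e.g. 200 events of about 50 kB each.
double chunkSizeKB(std::size_t eventsPerChunk, double eventSizeKB) {
    return eventsPerChunk * eventSizeKB;
}
```

In this model the copy latency is hidden whenever compute is the slowest stage: the transfers for the next and the previous chunk overlap with the computation on the current one.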
Integration with Gaudi
Gaudihive
In conclusion
• The analysis of the sequential algorithm is complete.
• A speedup of 10.70x has been obtained by properly configuring TBB.
• MIC underperforms because the VPUs are not used; more tweaking is necessary.
• Using floats rather than doubles is beneficial for many-core architectures, and the results are practically the same (one missed track in 28,000).
• A parallel version of PatPixel would show a more realistic architecture comparison, and should perform better.
• The current framework with a good pipeline could enable the use of a coprocessor in a production environment.