Customizing Wide-SIMD Architectures for H.264
Sangwon Seo¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹, Vijay Sundaram², Chaitali Chakrabarti²
¹ University of Michigan, ² Arizona State University
Outline
Motivation
H.264 Analysis
Proposed Architecture
H.264 Kernel Mappings
Results
Conclusion
Motivation – Smart Phone
Reference Images: http://www.apple.com/iphone/gallery/
Motivation – Inside Smart Phone
Reference Images: http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg
H.264 Design
H.264 encoder/decoder reference design
Reference Images: I. Richardson, “H.264 and MPEG-4 Video Compression,” Wiley, 2003
H.264 – Analysis
H.264 Kernel Algorithms
Heavy SIMD workload
Different natural SIMD widths
High and medium thread-level parallelism
Multiple SIMD widths must be supported to maximize SIMD utilization
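As a rough illustration of why one fixed width wastes lanes, the sketch below compares lane utilization on a fixed-width machine with a machine that can be partitioned into narrower groups. The 32-lane machine, the allowed partition sizes, the kernel widths, and the function names are all hypothetical and are not taken from the slides.

```python
def utilization(natural_width, group_width):
    """Fraction of lanes in one SIMD group doing useful work."""
    # A kernel narrower than the group leaves lanes idle; a kernel at least
    # as wide as the group is treated as filling it.
    return min(natural_width, group_width) / group_width


def best_partition(natural_width, machine_width, allowed_widths):
    """Pick the widest allowed group width that the kernel fills completely."""
    fitting = [w for w in allowed_widths
               if w <= natural_width and machine_width % w == 0]
    return max(fitting) if fitting else min(allowed_widths)


MACHINE_WIDTH = 32                # hypothetical lane count
ALLOWED_WIDTHS = (4, 8, 16, 32)   # hypothetical partition sizes
KERNEL_WIDTHS = {"kernel_a": 16, "kernel_b": 8, "kernel_c": 4}  # hypothetical

for name, width in KERNEL_WIDTHS.items():
    fixed = utilization(width, MACHINE_WIDTH)
    group = best_partition(width, MACHINE_WIDTH, ALLOWED_WIDTHS)
    partitioned = utilization(width, group)
    print(f"{name}: fixed={fixed:.2f} partitioned={partitioned:.2f} "
          f"(groups of {group})")
```

The partitioned figure assumes the remaining groups are kept busy by other threads, which is where the thread-level parallelism listed above would come in.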
H.264 – Analysis
Example – Deblocking Filter
Multimedia algorithms operate on two-dimensional data.
Row-order or column-order memory access works well for one set of edges, but not for the other.
A diagonal memory bank system makes it possible to access blocks along either a row or a column.
Figure panels: Horizontal Filtering, Vertical Filtering
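A minimal sketch of the idea, assuming a skewed mapping in which the bank index is (row + column) mod N. The slides do not give the mapping function, so this formula, the bank count, and the function names are assumptions chosen for illustration.

```python
N_BANKS = 4  # hypothetical: one bank per column of a 4x4 block


def row_major_bank(row, col, n=N_BANKS):
    """Conventional layout: the column index alone selects the bank."""
    return col % n


def diagonal_bank(row, col, n=N_BANKS):
    """Skewed layout: each row is rotated by one bank relative to the previous row."""
    return (row + col) % n


def conflict_free(bank_fn, coords):
    """True if every element in the access hits a different bank."""
    banks = [bank_fn(r, c) for r, c in coords]
    return len(set(banks)) == len(banks)


row_access = [(1, c) for c in range(N_BANKS)]     # all of row 1
column_access = [(r, 2) for r in range(N_BANKS)]  # all of column 2

for name, fn in (("row-major", row_major_bank), ("diagonal", diagonal_bank)):
    print(name,
          "row ok:", conflict_free(fn, row_access),
          "column ok:", conflict_free(fn, column_access))
```

With the conventional layout every element of a column lands in the same bank, so only the row access is conflict-free; with the skewed layout both accesses touch each bank exactly once.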
H.264 – Analysis
Subgraphs for the inner loops of two kernel algorithms
Large amount of data locality
Large register file (RF) power consumption (read/write)
Bypass and temporary buffer support
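The sketch below gives a feel for how much register-file traffic short-lived intermediates can generate. The four-operation subgraph is hypothetical, and the accounting model (every intermediate produced and consumed inside the loop body is kept out of the RF) is a simplifying assumption rather than the mechanism described in the slides.

```python
from collections import Counter

# Hypothetical inner-loop subgraph: (destination, source operands).
OPS = [
    ("t0", ("a", "b")),
    ("t1", ("t0", "c")),
    ("t2", ("t1", "d")),
    ("out", ("t2", "t0")),
]


def rf_accesses(ops, use_buffers):
    """Count register-file reads and writes for a straight-line op list."""
    uses = Counter(src for _, srcs in ops for src in srcs)
    produced = {dst for dst, _ in ops}
    # A value counts as short-lived when it is both produced and consumed
    # inside the loop body; with buffers enabled it never touches the RF.
    short_lived = {v for v in produced if uses[v] > 0} if use_buffers else set()
    reads = sum(1 for _, srcs in ops for s in srcs if s not in short_lived)
    writes = sum(1 for dst, _ in ops if dst not in short_lived)
    return reads, writes


print("RF only:        reads, writes =", rf_accesses(OPS, use_buffers=False))
print("bypass/buffers: reads, writes =", rf_accesses(OPS, use_buffers=True))
```

In this toy subgraph the register file sees 8 reads and 4 writes without buffering, and 4 reads and 1 write when the three intermediates are forwarded or held in temporary buffers.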
H.264 – Analysis
Instruction Pairs
Heavy usage of shuffle and arithmetic operations
Add-Shift: rounding operation
Sub-Abs: SAD (sum of absolute differences) operation
Frequently used instruction pairs should be fused
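The two pairs named above can be sketched as single functions, each standing in for one fused operation. The function names, the example pixel values, and the choice of a rounded average as the rounding example are illustrative assumptions.

```python
def add_shift(a, b, shift=1):
    """Fused add + right shift; rounds when b carries the rounding term."""
    return (a + b) >> shift


def sub_abs(a, b):
    """Fused subtract + absolute value: one term of a SAD."""
    return abs(a - b)


def rounded_average(x, y):
    # (x + y + 1) >> 1, written as one plain add followed by one fused add-shift.
    return add_shift(x + y, 1, shift=1)


def sad(block_a, block_b):
    """Sum of absolute differences over two equally sized pixel lists."""
    return sum(sub_abs(p, q) for p, q in zip(block_a, block_b))


current = [10, 20, 30, 40]     # hypothetical pixel values
reference = [12, 18, 33, 40]   # hypothetical pixel values

print(rounded_average(3, 4))
print(sad(current, reference))
```

Each call to `add_shift` or `sub_abs` replaces two dependent instructions, so the intermediate sum or difference never has to be written back and read again.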
H.264 – Analysis
Permutation Patterns for Intra Prediction
Fixed set of shuffle patterns
Need for a programmable shuffle network
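To make "a fixed set of shuffle patterns" concrete, the sketch below expresses two 4x4 intra prediction modes (vertical and horizontal) as permutation tables applied by one generic shuffle. The slides do not list the actual patterns; the lane layout, the table names, and the choice of these two modes are assumptions.

```python
# Each pattern lists, for every output lane, the input lane it reads from.
# Input layout (an assumption): lanes 0-3 hold the four pixels above a
# 4x4 block, lanes 4-7 hold the four pixels to its left.
VERTICAL = [col for _row in range(4) for col in range(4)]        # copy the top row down
HORIZONTAL = [4 + row for row in range(4) for _col in range(4)]  # copy the left column across

PATTERNS = {"vertical": VERTICAL, "horizontal": HORIZONTAL}


def shuffle(vector, pattern):
    """Apply a permutation pattern: out[i] = vector[pattern[i]]."""
    return [vector[src] for src in pattern]


def predict_4x4(top, left, mode):
    """Build a 4x4 prediction block as a row-major list of 16 values."""
    return shuffle(list(top) + list(left), PATTERNS[mode])


top = [1, 2, 3, 4]    # hypothetical neighbour pixels above the block
left = [5, 6, 7, 8]   # hypothetical neighbour pixels left of the block

print(predict_4x4(top, left, "vertical"))
print(predict_4x4(top, left, "horizontal"))
```

Because each mode is only a different table fed to the same `shuffle`, a network that can be loaded with a small set of such tables covers all of them.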
Modified SIMD architecture
Multiple SIMD widths
Thread-Level Parallelism
Diagonal Memory Organization
Memory Bank System + Shuffle Network
Short-lived values stored in temporary buffers
Short-lived values
Fused Operation
Shuffle networks are placed at several points in the datapath to align data
Mapping of H.264 Kernels
Intra Prediction
Results
System Breakdown
H.264 CIF video at 30fps
Speedup Breakdown
2.13x performance increase on average
Energy-Delay product comparison
29% energy-delay improvement on average
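For reference, the energy-delay product is energy multiplied by execution time. The sketch below shows one way such an improvement can be computed, assuming "improvement" means the relative reduction in EDP; the slides do not state the definition used, and the input numbers are hypothetical, not the measured results.

```python
def energy_delay_product(energy, delay):
    """EDP = energy consumed * time taken."""
    return energy * delay


def edp_improvement(baseline, modified):
    """Relative reduction in EDP; each argument is an (energy, delay) pair."""
    base = energy_delay_product(*baseline)
    new = energy_delay_product(*modified)
    return 1.0 - new / base


# Hypothetical, normalized numbers: the modified design uses 1.5x the energy
# but finishes in half the time.
baseline = (1.0, 1.0)
modified = (1.5, 0.5)

print(f"EDP improvement: {edp_improvement(baseline, modified):.0%}")
```

The example shows that a design can draw more energy and still come out ahead on EDP if the speedup is large enough.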
Comparison with latest H.264 encoders
[17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June 2007.
[18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf, Feb. 2008.
Conclusion
Key architectural enhancements
SIMD partitioning
Diagonal memory bank system
Bypass and temporary buffer support
Fused operation support
Programmable crossbar
Future work
Image processing algorithms on SIMD architecture
Backup Slides
H.264 – Analysis
Diagonal Memory Organization
Multimedia algorithms operate on two-dimensional data.
Blocks along a row or a column must be easy to access.
Mapping of H.264 Kernels
Deblocking Filter
Motion Compensation
Motion Estimation