© ANAFOCUS 2007
Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F. D. V. R. Oliveira¹, J. G. R. C. Gomes¹, J. Fernández-Berni², R. Carmona-Galán², R. del Río², Á. Rodríguez-Vázquez²
¹Universidade Federal do Rio de Janeiro, Brazil
²Instituto de Microelectrónica de Sevilla (IMSE-CNM), CSIC – Universidad de Sevilla, Spain
Workshop on Architecture of Smart Cameras, Córdoba, Spain
Embedded Vision Processing Architecture
CONVENTIONAL ARCHITECTURE (GPUs, DSPs, FPGAs, …)
• Hardware parallelization takes place only after the image data have been serialized
• Because the imager is physically realized as a 2-D array of elementary cells, each topographically assigned to its pixel value, this structure can be exploited for early parallelization and distributed memory
PROPOSED ARCHITECTURE
• Drastic reduction of memory accesses during low-level processing stages, where pixel-wise operations are common
• In-pixel circuitry to accelerate vision algorithms; this circuitry can be implemented in the analog domain for power and area efficiency
Long-Term Research
Major drawbacks
Reduced fill factor
Large pixel pitch
→ Limited sensitivity
→ Small image size
→ Spatial aliasing
Major achievements
Concept demonstration
Programmable embedded functionalities
Image-to-decision chain at >1,000 fps using 60 nW per pixel (industrial chip)
Spatial Gaussian filtering at 20 nJ/filter
Content-aware HDR acquisition with >145 dB intra-frame dynamic range
Major challenges
Implementation of in-pixel embedded functionalities at minimum area cost
Increasing hardware-software integration
Drawbacks and Major Challenges
CONVENTIONAL PIXEL
[Figure: conventional pixel of pitch $P$ and area $P^2$, divided between the photo-sensitive area and the amplification & readout circuitry]
MULTI-FUNCTIONAL PIXEL
[Figure: adding processing circuitry & memory to a pixel of pitch $P$ (photo-sensitive area plus amplification & readout circuitry) leaves two options]
• Keep the pixel pitch $P$ → reduce the photo-sensitive area
• Keep the photo-sensitive area → increase the pixel pitch to $P + \Delta$ → reduce the image resolution (for a given sensor area)
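The two options can be quantified with simple geometry. The sketch below is illustrative only: the function names are hypothetical, and square pixels, an extra-circuitry area $A_{extra}$ taken entirely from the photo-sensitive area in option 1, and a fixed array width $W$ are assumptions. Keeping the pitch lowers the fill factor from $A_{photo}/P^2$ to $(A_{photo} - A_{extra})/P^2$; keeping the photo-sensitive area grows the pitch to $P + \Delta = \sqrt{P^2 + A_{extra}}$, so a row holds $W/(P + \Delta)$ instead of $W/P$ pixels.

```python
from math import sqrt


def keep_pitch(pitch, a_photo, a_extra):
    """Option 1 (assumed model): the pitch P is fixed, so the extra circuitry
    area is taken out of the photo-sensitive area. Returns the new fill factor."""
    return (a_photo - a_extra) / pitch ** 2


def keep_photo_area(pitch, a_photo, a_extra, width):
    """Option 2 (assumed model): the photo-sensitive area is fixed, so the square
    pixel grows until (P + delta)^2 = P^2 + a_extra.
    Returns (P + delta, fill factor, whole pixels per row of the given width)."""
    new_pitch = sqrt(pitch ** 2 + a_extra)
    return new_pitch, a_photo / new_pitch ** 2, int(width // new_pitch)
```

For instance, with $P = 10$, $A_{photo} = 50$ and $A_{extra} = 44$ (arbitrary units), option 1 drops the fill factor from 0.5 to 0.06, whereas option 2 keeps the photo area but needs a pitch of 12, cutting a 1000-unit-wide row from 100 to 83 pixels.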
How can the advantages of focal-plane processing be exploited as fully as possible while impacting image quality as little as possible?
Fundamental Processing Primitive
[J. Campbell and V. Kazantev, “Using an Embedded Vision Processor to Build an Efficient Object Recognition System,” White Paper, Synopsys, 2015]
CMOS IMPLEMENTATION: GAUSSIAN FILTERING
• Gaussian filtering is a basic operation in many vision pipelines
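As a minimal sketch of what this filtering primitive computes, the Python below builds a normalized 1-D binomial kernel (a row of Pascal's triangle, a common integer approximation of a Gaussian) and applies it separably, first along rows and then along columns. The function names, the edge-replication border policy and the default kernel size $n_k = 5$ are assumptions for illustration, not choices taken from the paper.

```python
from math import comb


def binomial_kernel(n_k):
    """Normalized 1-D binomial kernel of length n_k (row n_k - 1 of Pascal's triangle)."""
    row = [comb(n_k - 1, i) for i in range(n_k)]
    total = 2 ** (n_k - 1)  # a row of Pascal's triangle sums to 2**(n_k - 1)
    return [c / total for c in row]


def filter_1d(signal, kernel):
    """Same-size 1-D filtering; out-of-range indices are clamped (edge replication)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for i, w in enumerate(kernel):
            xi = min(max(x + i - r, 0), n - 1)  # clamp to the nearest border pixel
            acc += w * signal[xi]
        out.append(acc)
    return out


def gaussian_filter(image, n_k=5):
    """Separable 2-D binomial filter on a list-of-lists image."""
    kernel = binomial_kernel(n_k)
    rows = [filter_1d(row, kernel) for row in image]             # horizontal pass
    cols = [filter_1d(list(col), kernel) for col in zip(*rows)]  # vertical pass
    return [list(row) for row in zip(*cols)]                     # transpose back


print(binomial_kernel(5))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

Because the binomial kernel is symmetric, the correlation computed here is identical to a convolution, and separability reduces the work per pixel from $n_k^2$ to $2 n_k$ multiply-accumulates.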
Examples: Sobel operators, binomial kernel
[Figure: original full-resolution image; original half-resolution image; pre-distorted half-resolution image; original kernel vs. reduced kernel; binomial kernel output]
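As a hedged sketch of how a Gaussian pyramid is built from this primitive, the Python below shows the textbook filter-and-subsample step (blur with a binomial kernel, then keep every second row and column), repeated $N_{Lev} - 1$ times. It does not implement the reduced-kernel or pre-distortion scheme pictured in the figure; the function names, the edge handling and the default $n_k = 5$ are assumptions.

```python
from math import comb


def binomial_kernel(n_k):
    """Normalized 1-D binomial kernel of length n_k."""
    return [comb(n_k - 1, i) / 2 ** (n_k - 1) for i in range(n_k)]


def blur(image, kernel):
    """Separable filtering with edge replication; output has the input's size."""
    r = len(kernel) // 2

    def f1d(v):
        n = len(v)
        return [sum(w * v[min(max(x + i - r, 0), n - 1)] for i, w in enumerate(kernel))
                for x in range(n)]

    rows = [f1d(row) for row in image]
    cols = [f1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reduce_level(image, kernel):
    """One pyramid step: low-pass filter, then keep every second row and column."""
    blurred = blur(image, kernel)
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, n_lev, n_k=5):
    """Return n_lev levels; level 0 is the input, each further level is half-size."""
    kernel = binomial_kernel(n_k)
    levels = [image]
    for _ in range(n_lev - 1):
        levels.append(reduce_level(levels[-1], kernel))
    return levels


img = [[float(x + y) for x in range(16)] for y in range(16)]
print([len(level) for level in gaussian_pyramid(img, n_lev=3)])  # [16, 8, 4]
```

Level 0 is the input image, so $N_{Lev}$ levels require $N_{Lev} - 1$ filter-and-subsample steps.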
CONVENTIONAL VS. FOCAL-PLANE REALIZATION: GAUSSIAN FILTERING
Time Analysis
[F. V. R. Oliveira et al., “Gaussian Pyramid: Comparative Analysis of Hardware Architectures,” IEEE Transactions on Circuits and Systems I, in press, 2017]
Focal-plane processing time
$n_k$: size of the Gaussian kernel
$N_{Lev}$: number of pyramid levels
$\tau_{CR}$: time required to perform one charge redistribution
$\tau_{ADC}$: time required to perform the analog-to-digital conversion of one pixel
$N_{ADC}$: number of ADCs
$M \times N$: image resolution
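The focal-plane time equation itself is not reproduced in this transcript. As an illustrative first-order estimate built from the parameters above (an assumption, not the equation from the paper), the sketch below adds a filtering phase, in which every pixel performs $n_k$ charge redistributions per filtered level simultaneously, to a readout phase in which the $M \times N$ conversions are shared evenly among $N_{ADC}$ converters. The function name is hypothetical.

```python
from math import ceil


def focal_plane_time(n_k, n_lev, tau_cr, tau_adc, n_adc, m, n):
    """Hypothetical first-order focal-plane processing time (units of tau_*).

    Assumptions (not taken from the paper):
      * all pixels perform their charge redistributions at the same time, and
        each of the n_lev - 1 filtered levels needs n_k redistribution steps;
      * the m * n pixel values are then converted by n_adc ADCs in parallel.
    """
    t_filter = (n_lev - 1) * n_k * tau_cr       # fully parallel in-pixel filtering
    t_readout = ceil(m * n / n_adc) * tau_adc   # conversions per ADC x conversion time
    return t_filter + t_readout
```

Under these assumptions, one ADC per column ($N_{ADC} = N$) shrinks the readout term from $M N \tau_{ADC}$ to $M \tau_{ADC}$, which illustrates how strongly the total depends on the number of converters.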
Digital implementation processing time
$\tau_{Mem}$: time required to access a single memory position
$N_{busMem}$: number of parallel accesses to memory
$\tau_{op}$: time required to perform a single MAC operation
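Likewise, the digital processing-time equation is not reproduced here. The sketch below is one plausible first-order model using the parameters above, under assumptions made only for illustration: a direct (non-separable) 2-D kernel, so each output pixel needs $n_k^2$ MAC operations and $n_k^2$ operand reads; reads are served $N_{busMem}$ at a time; memory access and computation do not overlap; and only the output pixels of each reduced level $\ell = 1, \dots, N_{Lev} - 1$ are computed. The function name is hypothetical.

```python
from math import ceil


def digital_time(n_k, n_lev, tau_mem, n_bus_mem, tau_op, m, n):
    """Hypothetical first-order processing time of a digital pyramid generator.

    Assumptions (illustrative only): non-separable n_k x n_k kernel, n_k**2
    operand reads fetched n_bus_mem at a time, no overlap between memory
    access and MACs, and level l has ceil(m / 2**l) x ceil(n / 2**l) pixels.
    """
    taps = n_k ** 2
    t_per_pixel = ceil(taps / n_bus_mem) * tau_mem + taps * tau_op
    total = 0
    for level in range(1, n_lev):  # level 0 is the captured image itself
        pixels = ceil(m / 2 ** level) * ceil(n / 2 ** level)
        total += pixels * t_per_pixel
    return total
```

A separable kernel or overlapped memory and MAC pipelines would lower this estimate; the model only shows how $\tau_{Mem}$, $N_{busMem}$ and $\tau_{op}$ enter a per-pixel cost.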
Parameters of time analysis equations
Energy Analysis
Trickier: highly dependent on architecture and technology parameters, with no single global parameter (as the clock period is in the time analysis)
Standard circuit blocks: MAC unit, SRAM memory cell
Equations associated with every aspect of the hardware
Focal-plane energy analysis
$E_{pixCapture} = C_{FD} \cdot V_{ddM}^2 + C_{Rst} \cdot V_{ddM}^2 + C_{TX} \cdot V_{ddM}^2$
$E_{chgRedistr} = (N_{Lev} - 1)\, n_k \cdot 2M \cdot 2N \cdot \left(2 C_n \cdot V_{ddM}^2\right)$
⋮
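To make the two focal-plane terms concrete, the Python below transcribes them directly. The function names are hypothetical, the terms elided on the slide are not modeled (so the sum of these functions is not a total energy), and realistic parameter values must come from the paper's parameter table, which is not reproduced here.

```python
def e_pix_capture(c_fd, c_rst, c_tx, v_ddm):
    """E_pixCapture = C_FD*V_ddM^2 + C_Rst*V_ddM^2 + C_TX*V_ddM^2 (J, in SI units)."""
    return (c_fd + c_rst + c_tx) * v_ddm ** 2


def e_chg_redistr(n_lev, n_k, m, n, c_n, v_ddm):
    """E_chgRedistr = (N_Lev - 1) * n_k * 2M * 2N * (2 * C_n * V_ddM^2)."""
    return (n_lev - 1) * n_k * (2 * m) * (2 * n) * (2 * c_n * v_ddm ** 2)
```

Factoring out $V_{ddM}^2$ in the first function is algebraically identical to the slide's expression; the $2M \cdot 2N$ factor is kept exactly as written.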
Digital implementation energy analysis
$E_{MACdynamic} = \alpha \cdot N_d \cdot C_n \cdot V_{dd}^2 \cdot (N_{op} \cdot 3\tau_{op}) / \tau_{Clk}$
$E_{MACstatic} = N_{Off} \cdot V_{dd} \cdot I_{leak} \cdot \tau_{Digital}$
⋮
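The two MAC-unit expressions can be transcribed the same way. This is a sketch: the function names are hypothetical and the remaining digital terms elided on the slide are not included. As a reading aid only, $\alpha$ appears to play the role of the usual switching-activity factor in $\alpha C V^2$ dynamic energy, and $(N_{op} \cdot 3\tau_{op})/\tau_{Clk}$ reads as a number of clock cycles; neither interpretation is stated in the slides.

```python
def e_mac_dynamic(alpha, n_d, c_n, v_dd, n_op, tau_op, tau_clk):
    """E_MACdynamic = alpha * N_d * C_n * V_dd^2 * (N_op * 3 * tau_op) / tau_Clk."""
    return alpha * n_d * c_n * v_dd ** 2 * (n_op * 3 * tau_op) / tau_clk


def e_mac_static(n_off, v_dd, i_leak, tau_digital):
    """E_MACstatic = N_Off * V_dd * I_leak * tau_Digital (leakage energy)."""
    return n_off * v_dd * i_leak * tau_digital
```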
A/D conversion energy analysis
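The A/D conversion equations are not reproduced in this transcript. One common way to estimate converter energy, shown here as an assumption rather than as the paper's model, uses the Walden figure of merit $FOM_W = P / (2^{ENOB} f_s)$, which gives an energy per conversion of $FOM_W \cdot 2^{ENOB}$; multiplying by the number of converted pixels yields a readout-energy estimate. The function name is hypothetical.

```python
def adc_energy(n_conversions, fom_walden, enob):
    """Hypothetical A/D energy estimate from the Walden figure of merit.

    FOM_W = P / (2**ENOB * f_s) is an energy per conversion step, so one
    conversion costs FOM_W * 2**ENOB (J when FOM_W is given in J/step).
    """
    return n_conversions * fom_walden * 2 ** enob
```

For an $M \times N$ image, `n_conversions = M * N`. Values of $FOM_W$ and ENOB differ across ADC architectures (SAR, cyclic, ΣΔ, …) and would have to be taken from the paper or from converter data, so none are assumed here.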
Parameters of energy analysis equations
Conclusions
• Hypothesis: early vision stages can be accelerated at the focal plane, at low energy cost, by adding extra per-pixel circuitry
• Comprehensive analysis of Gaussian pyramid generation with minimum impact on pixel area
• Major conclusion: the potential advantages of focal-plane processing are case-specific
• A/D conversion is the critical stage
• Regarding processing time, the focal-plane approach ideally requires one ADC per column to yield significant advantages
• Regarding energy savings, the focal-plane approach yields the best results with SAR, cyclic or ΣΔ ADCs