Automatic Kernel Code Generation for Focal-plane Sensor ...

41
1 Automatic Kernel Code Generation for Focal-plane Sensor-Processor Devices Thomas Debrunner - MSc Student Imperial College London Paul Kelly - Software Performance Optimisation Group Lead, Imperial College London Sajad Saeedi Research Fellow, Imperial College London

Transcript of Automatic Kernel Code Generation for Focal-plane Sensor ...

Page 1: Automatic Kernel Code Generation for Focal-plane Sensor ...

1

Automatic Kernel Code Generation for Focal-plane Sensor-Processor DevicesThomas Debrunner - MSc Student Imperial College LondonPaul Kelly - Software Performance Optimisation Group Lead, Imperial College LondonSajad Saeedi – Research Fellow, Imperial College London

Page 2: Automatic Kernel Code Generation for Focal-plane Sensor ...

2

With kind support from Piotr Dudek and his team at Manchester University

This work is part of the EPSRC “PAMELA” Project

Page 3: Automatic Kernel Code Generation for Focal-plane Sensor ...

Cameras produce images for humans, not machines

3

Page 4: Automatic Kernel Code Generation for Focal-plane Sensor ...

http://personalpages.manchester.ac.uk/staff/p.dudek/papers/carey-cnna2012.pdf

SCAMP 5 focal-plane sensor processor

• 256x256 SIMD processor array

• Light sensor on every processor

• Ca.170 transistors per processor

Piotr Dudek and colleagues at Manchester University

4

Page 5: Automatic Kernel Code Generation for Focal-plane Sensor ...

http://personalpages.manchester.ac.uk/staff/p.dudek/papers/carey-cnna2012.pdf

Piotr Dudek and colleagues at Manchester University

5

SCAMP 5 focal-plane sensor processor

• Seven registers holding analogue values

• Computation by moving charge

• Addition is easy

• No multiply

• North-east-west-south data movement

Page 6: Automatic Kernel Code Generation for Focal-plane Sensor ...

Basic instruction set (of interest)

• Shift image x• Shift image y• Add two images• Subtract two images• Scale image by 1/2• Take absolute value of image

6

Page 7: Automatic Kernel Code Generation for Focal-plane Sensor ...

This talk• How to do convolution filters on SCAMP 5?• For image filtering• As a component in image processing algorithms

• Notably CNNs

• Potential • low power• Extreme effective frame rate

• Example: Viola-Jones face detection• A compiler: general code generator producing highly-

optimised convolution implementations7

Page 8: Automatic Kernel Code Generation for Focal-plane Sensor ...

010203040506070

Gauss3 Box7 Sobel

Filter time [μs]

CPU GPU CPACPU: INTEL i7-6700, GPU: NVIDIA TITAN X, CPA: SCAMP-5c estimate

Page 9: Automatic Kernel Code Generation for Focal-plane Sensor ...

We can add repeatedly – so we can multiply by a constant

9

Convolution filters on SCAMP 5Easy filters

Page 10: Automatic Kernel Code Generation for Focal-plane Sensor ...

10

Convolution filters on SCAMP 5Harder filters

Page 11: Automatic Kernel Code Generation for Focal-plane Sensor ...

We can divide by two repeatedly

11

Convolution filters on SCAMP 5Harder filters – still easy

Page 12: Automatic Kernel Code Generation for Focal-plane Sensor ...

12

Convolution filters on SCAMP 5Hard filters

Page 13: Automatic Kernel Code Generation for Focal-plane Sensor ...

We can approximate

13

Convolution filters on SCAMP 5Hard filters – easy again

Page 14: Automatic Kernel Code Generation for Focal-plane Sensor ...

14

We can approximate

Page 15: Automatic Kernel Code Generation for Focal-plane Sensor ...

Filters often have repeated terms

We implement multiplication using summations – so there are lots of common subterms

We can shift intermediate values to save redundant computation

15

Page 16: Automatic Kernel Code Generation for Focal-plane Sensor ...

Simple motivating (extreme) example 5x5 Box:

16

+ +

++# (1)

+

++

(1) # (1)

++ +

+

(1)

" (2)

+

! (2)

Page 17: Automatic Kernel Code Generation for Focal-plane Sensor ...

“Final Set” (FS) of Partial Value Representatives (PVR)The set of summands we need for the result of the filter application

Finding a plan: End point

17

Page 18: Automatic Kernel Code Generation for Focal-plane Sensor ...

“Initial Set” (IS)

The set of summands of a fresh image

Finding a plan: Starting point

18

Page 19: Automatic Kernel Code Generation for Focal-plane Sensor ...

Find a sequence of operations to transform IS into FS

Objective

19

(Identity filter)

(desired filter)

Page 20: Automatic Kernel Code Generation for Focal-plane Sensor ...

Instructions as transformationsShifts:

20

(0 0) (1 -1)(2 4) (3 3)

→(1) ↓(1)

Page 21: Automatic Kernel Code Generation for Focal-plane Sensor ...

Instructions as transformationsScales (Div2):

21

(0 0)(0 0)

+(1) (0 0)

Page 22: Automatic Kernel Code Generation for Focal-plane Sensor ...

Instructions as transformationsAdditions / Subtractions:

22

(0 1)

+

(0 1)(1 2)

(1 2)

+

Page 23: Automatic Kernel Code Generation for Focal-plane Sensor ...

Reverse SplitFS

A B R

A, B transformableRecursive, continue with B, R

IS

Page 24: Automatic Kernel Code Generation for Focal-plane Sensor ...

We prune splits that would exceed the number of registers in the SCAMP 5 device (seven)

We prune subtrees when the resulting instruction sequence is longer than the best so far

We attempt heuristically-promising splits first

24

Reverse SplitPruning

Page 25: Automatic Kernel Code Generation for Focal-plane Sensor ...

25

node1 = east(node0)node2 = west(node1)node2 = west(node2)node4 = west(node1)node4 = div2(node4)node3 = add(node2,node1)node6 = add(node3, node4)

Example

Page 26: Automatic Kernel Code Generation for Focal-plane Sensor ...

Apply a systematic retiming to minimize shifts26

Graph Relaxation

Page 27: Automatic Kernel Code Generation for Focal-plane Sensor ...

0

1 2

3

4

5

B = west(A)C = div2(A)B = add(C, B)A = east(A)A = add(B, A)

Final resulting code:

27

Register Allocation

Page 28: Automatic Kernel Code Generation for Focal-plane Sensor ...

Full exhaustive search, compared to heuristic search on Sobel 3�3 filter (sampled over 256 runs) 28

Evaluation

Page 29: Automatic Kernel Code Generation for Focal-plane Sensor ...

• SCAMP 5: estimated based on 10MHz clock rate

• 8 common filter examples on 256�256 8-bit grayscale image

• CPU and GPU: default implementations shipped with OpenCV 3.3.0, with TPP and IPP enabled and with CUDA V8.0.61

• Power estimated based on TDP and time

29

Evaluation

Page 30: Automatic Kernel Code Generation for Focal-plane Sensor ...

7 StageViola-JonesFaceDetector

• Due to code size and other limitations, we were only able to run a 7-stage Viola-Jones face detector

• It works as well as a 7-stage CPU implementation• But for full accuracy, 25 stages are needed. SCAMP 5 would be

slower than CPUs, but uses much less energy 30

Page 31: Automatic Kernel Code Generation for Focal-plane Sensor ...

Convolution filters are a key capabilityWith a suitable code generator we can do a lot with very very simple hardwareBy trading approximation against efficiency we can do even more

Near-camera processing is the only way we can approach biological levels of energy efficiencyThere is a spectrum of design choices:

How much to do in analogueWhere to convert to digitalHow compute is distributed and connected to the sensorsHow to preprocess to reduce larger-scale data movement

31

Conclusions

Page 32: Automatic Kernel Code Generation for Focal-plane Sensor ...

32

Backup

Page 33: Automatic Kernel Code Generation for Focal-plane Sensor ...

Reverse SplitFS

A B R

A, B transformable

Page 34: Automatic Kernel Code Generation for Focal-plane Sensor ...

34

Example

Page 35: Automatic Kernel Code Generation for Focal-plane Sensor ...

FS(-1 0)(-1 0)( 0 0)( 1 0)( 1 0)

(1 0)(1 0)

(-1 0)(-1 0)

(0 0)

A

B

R

Page 36: Automatic Kernel Code Generation for Focal-plane Sensor ...

FS(-1 0)(-1 0)( 0 0)( 1 0)( 1 0)

(1 0)(1 0)

(-1 0)(-1 0)

(0 0)

A

B

R

→(2)+

+

+

Page 37: Automatic Kernel Code Generation for Focal-plane Sensor ...

(-1 0)(-1 0)

(0 0)

B

R

Page 38: Automatic Kernel Code Generation for Focal-plane Sensor ...

FS

A B R

A B R1 R2

FS

A B R

A B2RB1

FS

A B R

A BR1 R2

Page 39: Automatic Kernel Code Generation for Focal-plane Sensor ...

(-1 0)(-1 0)

(0 0)

B

R

(-1 0)(-1 0)

(0 0)

B

A

→(1) +(1)

Page 40: Automatic Kernel Code Generation for Focal-plane Sensor ...

(-1 0)(-1 0)

B

Page 41: Automatic Kernel Code Generation for Focal-plane Sensor ...

(-1 0)(-1 0)

BIG(0 0)(0 0)

←(1)