Hipi: Computer Vision at Large Scale

Chris Sweeny, Liu Liu

Transcript of Hipi: Computer Vision at Large Scale

Page 1: Hipi: Computer Vision at Large Scale

Chris Sweeny

Liu Liu

Page 2: Hipi: Computer Vision at Large Scale

Intro to MapReduce

SIMD at scale

Mapper / Reducer
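
(Not on the slide: a minimal sketch of the Mapper/Reducer model using the standard Hadoop Java API, the classic word count; the class names are illustrative.)

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: runs independently on each input split and emits (word, 1) pairs.
    class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
          if (token.isEmpty()) continue;
          word.set(token);
          context.write(word, ONE);
        }
      }
    }

    // Reducer: receives all counts for one word and sums them.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        context.write(word, new IntWritable(sum));
      }
    }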

Page 3: Hipi: Computer Vision at Large Scale

MapReduce, Main Takeaway

Data centric, data centric, data centric!

Page 4: Hipi: Computer Vision at Large Scale

Hadoop, a Java Implementation

An implementation of MapReduce that originated at Yahoo!

The cluster we used has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

Page 5: Hipi: Computer Vision at Large Scale

Computer Vision at Scale

The "computational vision"

The sheer size of datasets:

PCA of Natural Images (1992): 15 images, 4096 patches

High-perf Face Detection (2007): 75,000 samples

IM2GPS (2008): 6,472,304 images

Page 6: Hipi: Computer Vision at Large Scale

HIPI Workflow

Page 7: Hipi: Computer Vision at Large Scale

HIPI Image Bundle Setup

Moral of the story: many small files kill performance in a distributed file system.
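
(Not on the slide: HIPI's bundle format is not detailed here, but the general idea, packing many small image files into one large, splittable container, can be sketched with Hadoop's standard SequenceFile API, which is also the "Hadoop Sequence File" baseline in the later charts. The paths and local directory below are hypothetical.)

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Packs every image in a local directory into a single SequenceFile on HDFS,
    // keyed by file name, so one large file replaces thousands of tiny ones.
    class PackImages {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path bundle = new Path("/user/demo/images.seq");      // hypothetical output path

        File[] images = new File("local-images").listFiles(); // hypothetical input dir
        if (images == null) throw new IOException("input directory not found");

        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, bundle, Text.class, BytesWritable.class);
        try {
          for (File f : images) {
            byte[] encoded = Files.readAllBytes(f.toPath());  // still-compressed image bytes
            writer.append(new Text(f.getName()), new BytesWritable(encoded));
          }
        } finally {
          writer.close();
        }
      }
    }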

Page 8: Hipi: Computer Vision at Large Scale

Redo PCA of Natural Images at Scale

The first 15 principal components computed from 15 images (Hancock, 1992):

Page 9: Hipi: Computer Vision at Large Scale

Redo PCA of Natural Images at Scale

Comparison:

[Figure: the leading principal components from Hancock (1992) side by side with HIPI results on 100, 1,000, 10,000, and 100,000 images.]
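
(Not on the slide: a plausible way to distribute the covariance computation behind this PCA, assuming each mapper accumulates partial sums over the patches it extracts and a reducer merges the partial results before an eigen-solver extracts the principal components. The class below is an illustrative sketch, not HIPI code.)

    // Sufficient statistics for a covariance matrix over image patches:
    // the patch count, the running sum, and the running sum of outer products.
    // These are additive, so per-mapper accumulators can be merged in a reducer,
    // and C[i][j] = E[x_i x_j] - mean_i * mean_j is formed only at the end.
    class CovarianceAccumulator {
      private final int dim;            // patch length, e.g. 64 for 8x8 patches
      private long count = 0;
      private final double[] sum;
      private final double[][] outer;

      CovarianceAccumulator(int dim) {
        this.dim = dim;
        this.sum = new double[dim];
        this.outer = new double[dim][dim];
      }

      // Called once per patch on the map side.
      void add(double[] patch) {
        count++;
        for (int i = 0; i < dim; i++) {
          sum[i] += patch[i];
          for (int j = 0; j < dim; j++) outer[i][j] += patch[i] * patch[j];
        }
      }

      // Called on the reduce side to merge per-mapper partial results.
      void merge(CovarianceAccumulator other) {
        count += other.count;
        for (int i = 0; i < dim; i++) {
          sum[i] += other.sum[i];
          for (int j = 0; j < dim; j++) outer[i][j] += other.outer[i][j];
        }
      }

      // Finalize the covariance matrix once everything is merged.
      double[][] covariance() {
        double[][] cov = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
          for (int j = 0; j < dim; j++) {
            cov[i][j] = outer[i][j] / count - (sum[i] / count) * (sum[j] / count);
          }
        }
        return cov;
      }
    }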

Page 10: Hipi: Computer Vision at Large Scale

Optimize HIPI Performance

Culling: because decompression is costly, decompress only when needed.

A boolean cull(ImageHeader header) method enables conditional decompression.

Page 11: Hipi: Computer Vision at Large Scale

Culling, to Inspect Specific Camera Effects

Canon PowerShot S500, at 2592x1944
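
(Not on the slide: a hedged sketch of what the cull override for this experiment might look like. The small ImageHeader interface is only a stand-in so the predicate compiles on its own; the accessor names on it, and the assumption that returning true means "skip this image", are not confirmed HIPI API.)

    // Culls everything that is not a Canon PowerShot S500 shot at 2592x1944,
    // so only the images of interest are ever decompressed.
    class CanonS500Culler {

      // Stand-in for the fields we assume HIPI's ImageHeader can provide.
      interface ImageHeader {
        String getEXIFInformation(String tag);  // assumed accessor, e.g. tag "Model"
        int getWidth();
        int getHeight();
      }

      // Mirrors the boolean cull(ImageHeader header) hook from the previous slide.
      boolean cull(ImageHeader header) {
        String model = header.getEXIFInformation("Model");
        boolean wrongCamera = (model == null) || !model.contains("PowerShot S500");
        boolean wrongSize = header.getWidth() != 2592 || header.getHeight() != 1944;
        return wrongCamera || wrongSize;        // true => cull (do not decompress)
      }
    }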

Page 12: Hipi: Computer Vision at Large Scale

HIPI, a Glance at Performance Figures

An empty job (only decompressing and looping over the images), minimum of 5 runs, in seconds; lower is better:

[Chart: wall-clock time in seconds (0-450) vs. number of images (10, 100, 1,000, 10,000, 100,000) for Many Small Files, Hadoop Sequence File, and HIPI Image Bundle.]

Page 13: Hipi: Computer Vision at Large Scale

HIPI, a Glance at Performance Figures

Im2gray job (converting images to grayscale), minimum of 5 runs, in seconds; lower is better:

[Chart: wall-clock time in seconds (0-500) vs. number of images (10, 100, 1,000, 10,000, 100,000) for Many Small Files, Hadoop Sequence File, and HIPI Image Bundle.]
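
(Not on the slide: a hedged sketch of the per-image work an im2gray job performs, written against the standard Java imaging API rather than HIPI's own image type, using the usual Rec. 601 luminance weights. In the real job this would run inside the map call for each decoded image.)

    import java.awt.image.BufferedImage;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import javax.imageio.ImageIO;

    // Decodes one image (e.g. a JPEG pulled out of the bundle) and converts it
    // to a 2-D array of luminance values in [0, 1].
    class Im2Gray {
      static float[][] toGray(byte[] encoded) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(encoded));
        if (img == null) throw new IOException("unsupported or corrupt image");
        float[][] gray = new float[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
          for (int x = 0; x < img.getWidth(); x++) {
            int rgb = img.getRGB(x, y);
            int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
            gray[y][x] = (0.299f * r + 0.587f * g + 0.114f * b) / 255.0f;
          }
        }
        return gray;
      }
    }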

Page 14: Hipi: Computer Vision at Large Scale

HIPI, a Glance at Performance Figures

Covariance job (computing the covariance matrix of patches, 100 patches per image), minimum of 1-3 runs*, in seconds; lower is better:

[Chart: wall-clock time in seconds (0-8,000) vs. number of images (10, 100, 1,000, 10,000, 100,000) for Many Small Files, Hadoop Sequence File, and HIPI Image Bundle.]

Page 15: Hipi: Computer Vision at Large Scale

HIPI, a Glance at Performance Figures

Culling job (decompressing all images vs. decompressing only the images we care about), minimum of 1-3 runs, in seconds; lower is better:

[Chart: wall-clock time in seconds (0-700) vs. number of images (10, 100, 1,000, 10,000, 100,000), Without Culling vs. With Culling.]

Page 16: Hipi: Computer Vision at Large Scale

Conclusion

Working at large scale pays off: results improve as the dataset grows.

HIPI provides an image-centric interface that performs on par with or better than the leading alternative (Hadoop sequence files).

The cull method provides a significant performance improvement and added convenience.

HIPI offers noticeable improvements!

Page 17: Hipi: Computer Vision at Large Scale

Future Work

Release HIPI as an open-source project.

Work on deep integration with Hadoop.

Make the HIPI workload more configurable.

Make the workload more balanced.