AWS re:Invent 2016: Transforming Industrial Processes with Deep Learning (MAC301)

Post on 16-Apr-2017

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Marshall Tappen and Ernesto Gonzalez

Amazon Fulfillment Technologies

November 30, 2016

MAC301

Transforming Industrial Processes with Deep Learning

What to Expect from the Session

• Description of how Amazon Fulfillment Technologies has used computer vision to improve our processes.

• Walk through how we combined deep learning and traditional computer vision to automate an industrial process.

• What are the challenges and the opportunity created by deep learning classifiers?

Overview of fulfillment process

One thing you have to understand about fulfillment centers

Bins can hold anything

Misplaced inventory “disappears”

Amazon Confidential 5

Associate rearranged inventory when picking items.

We call this an inventory defect

Items fall out of pods

Our solution: use computer vision to locate inventory defects

First step: get a physical system to capture images

[Figure labels: Station; Outbound frame; Inbound frame; Totes and conveyance]

Capture set of images as pod arrives at the station

[Figure labels: Arrival Image Tower; Departure Image Tower; Station]

Associate interacts with pod

Photographed again as pod leaves

General strategy

• We want to take advantage of deep learning.

• The cameras capture images of an entire pod, but we need data at the bin level.

• We will have a two-step process:

1. Extracting bins from images

2. Analyzing bin images

Computer vision step 1: pod image to bin images

No problem, use 2-D barcodes!

Bands block the barcodes

Solution, if we can detect the trays

And we can detect the sides

We have a set of points to match with a recipe of the pod’s geometry

Map the coordinate system of the database to the face of the pod in the image

Detecting the side of a pod: downsample image and convert to grayscale

[Figure labels: 2046 X 2046 Image; 512 X 512 Image]
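The downsample-and-grayscale step might look roughly like the sketch below. It is a minimal illustration in NumPy rather than the OpenCV code the team used; the luma weights, the block-averaging method, the factor of 4, and the names `to_grayscale` and `downsample` are all assumptions, not details from the slides.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale (ITU-R BT.601 luma weights, assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

def downsample(gray, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = gray.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = gray[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Random stand-in for a pod photo; a real one would come from the camera tower.
rng = np.random.default_rng(0)
pod_rgb = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
small = downsample(to_grayscale(pod_rgb), factor=4)
print(small.shape)
```

Working on a smaller single-channel image keeps the template matching that follows cheap, at the cost of some localization precision.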

Correlate* with left rail template

Filter

* In practice, we use normalized cross-correlation
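As a rough illustration of normalized cross-correlation, the sketch below slides a small template over an image and scores every offset. It is a deliberately naive NumPy loop, not the production code; the toy stripe template and the function name `normalized_cross_correlation` are made up for the example.

```python
import numpy as np

def normalized_cross_correlation(image, template):
    """Return the NCC score of `template` at every valid offset in `image`.

    Scores lie in [-1, 1]; a score of 1 means the patch matches the template
    up to a change in brightness and contrast.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p ** 2).sum())
            if denom > 0:  # flat patches carry no signal; leave their score at 0
                scores[r, c] = (p * t).sum() / denom
    return scores

# Toy "rail" template: a bright vertical stripe on a dark background.
template = np.zeros((5, 3))
template[:, 1] = 1.0
image = np.zeros((12, 12))
image[2:7, 6] = 1.0          # the stripe, with the matching patch's top-left at (2, 5)
image = 0.5 * image + 0.2    # change contrast and brightness

scores = normalized_cross_correlation(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best)
```

The normalization is what makes the score insensitive to lighting changes between captures. OpenCV, which the slides name as the implementation library, offers a comparable operation through `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method.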

Threshold

Fit a line (similar process for the other side)
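The threshold and line-fit steps could be sketched as follows. This assumes a plain least-squares fit through all pixels above the threshold; the slides do not say which fitting method was used, and the threshold value and synthetic score map are invented for the example.

```python
import numpy as np

def fit_line_through_peaks(scores, threshold):
    """Fit x = slope * y + intercept to pixels whose score exceeds `threshold`.

    A pod rail is close to vertical, so x is modeled as a function of y;
    fitting y as a function of x would be badly conditioned.
    """
    ys, xs = np.nonzero(scores > threshold)
    if len(ys) < 2:
        raise ValueError("not enough points above threshold to fit a line")
    slope, intercept = np.polyfit(ys, xs, deg=1)
    return slope, intercept

# Synthetic correlation map with a slightly tilted ridge of high scores.
scores = np.zeros((100, 100))
for y in range(100):
    scores[y, 20 + y // 10] = 0.9

slope, intercept = fit_line_through_peaks(scores, threshold=0.5)
print(slope, intercept)
```

A robust fit (for example one that discards outliers) would be a natural refinement when stray high scores appear away from the rail.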

We can detect trays in the same way

Now we have locations to tie the virtual template to the image!
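One way to turn the detected rail and tray lines into point locations is to intersect them. The sketch below uses homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. The slides do not state how the points were obtained, so this is an assumed approach, and the pixel coordinates are made up.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through image points p and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(line_a, line_b):
    """Intersection point (x, y) of two homogeneous lines."""
    x, y, w = np.cross(line_a, line_b)
    if abs(w) < 1e-12:
        raise ValueError("lines are parallel")
    return x / w, y / w

left_rail = line_through((100, 0), (110, 500))  # near-vertical rail (hypothetical pixels)
tray = line_through((0, 200), (600, 212))       # near-horizontal tray (hypothetical pixels)
corner = intersect(left_rail, tray)
print(corner)
```

Each rail and tray crossing gives one image point that can be paired with the corresponding point in the pod's stored geometry.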

The transformation between image and pod physical coordinates is called a homography

We can verify that it works by calculating the boundary of each bin in the image and coloring it in.
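A homography can be estimated from four or more point pairs with the direct linear transform. The sketch below is a generic textbook version in NumPy, not the team's code; the pod dimensions, the detected pixel positions, and the bin rectangle are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find 3x3 H with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, points):
    """Map an N x 2 list of points through H, dividing out the projective scale."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

pod_corners = [(0, 0), (100, 0), (100, 200), (0, 200)]          # pod face, cm (hypothetical)
image_corners = [(120, 80), (400, 95), (390, 470), (110, 450)]  # detections, px (hypothetical)
H = estimate_homography(pod_corners, image_corners)

bin_in_pod = [(0, 0), (50, 0), (50, 40), (0, 40)]  # one bin from the pod recipe (hypothetical)
bin_in_image = apply_homography(H, bin_in_pod)
print(bin_in_image)
```

Mapping every bin rectangle from the pod recipe through `H` gives the polygon to crop or color in for each bin. OpenCV provides `cv2.findHomography` for the estimation step.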

How can we use computer vision?

• Automatic identification of every item?

How can we use computer vision?

• Automatic identification of every item? (TOO HARD)

• Automatic counting of every item?

What does computer vision need to tell us?

• Automatic identification of every item? (TOO HARD)

• Automatic counting of every item? (TOO HARD)

Instead, we can look for changes

[Figure panels: Inbound to the Station; Outbound from the Station]

Our first attempt was with hand-engineered computer vision

It’s hard!

Must be robust to items rolling or shuffling inside the bin, illumination changes, specularity, etc.
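To see why this is hard, consider the most naive change detector: count the pixels that differ between the inbound and outbound images. The sketch below is an illustrative baseline only, not the hand-engineered model the team built; the threshold and the synthetic images are assumptions.

```python
import numpy as np

def changed_fraction(before, after, pixel_threshold=30):
    """Fraction of pixels whose grayscale value changed by more than `pixel_threshold`."""
    diff = np.abs(before.astype(np.int16) - after.astype(np.int16))
    return float((diff > pixel_threshold).mean())

rng = np.random.default_rng(0)
before = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Same bin contents, but everything has slid 3 pixels sideways.
shifted = np.roll(before, 3, axis=1)
# Same bin contents under brighter lighting.
brighter = np.clip(before.astype(np.int16) + 40, 0, 255).astype(np.uint8)

print(changed_fraction(before, before))    # no change at all
print(changed_fraction(before, shifted))   # large, although nothing was added or removed
print(changed_fraction(before, brighter))  # large, although nothing was added or removed
```

Both nuisance cases score as heavily changed, which is exactly the failure mode a usable detector has to avoid.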

The big insight

• We realized our problem was just binary classification.

• Two images in, one label out.

• Why not try this deep-learning thing?

We did the simplest thing possible

• Take the first image, convert it to grayscale, and put it in the red channel of a new image

• Take the second image and put it in the blue channel

• Now, we have a single image to pass to the neural network
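The channel-packing trick can be sketched in a few lines of NumPy. Several details here are assumptions: the second image is also converted to grayscale, the unused green channel is left at zero, the channel order is RGB (OpenCV itself defaults to BGR), and `stack_before_after` is a made-up name.

```python
import numpy as np

def stack_before_after(before_rgb, after_rgb):
    """Pack two grayscale views into one 3-channel image: red = before, blue = after."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights (assumed)
    before = before_rgb.astype(np.float64) @ weights
    after = after_rgb.astype(np.float64) @ weights
    green = np.zeros_like(before)  # assumption: the unused channel stays at zero
    stacked = np.stack([before, green, after], axis=-1)
    return np.round(stacked).astype(np.uint8)

rng = np.random.default_rng(0)
before_rgb = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
after_rgb = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
net_input = stack_before_after(before_rgb, after_rgb)
print(net_input.shape)
```

The appeal of this encoding is that an off-the-shelf image classifier expecting a 3-channel input can consume the image pair without any architectural change.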

It worked great!

[Chart labels: Best Hand-Engineered Model; CIFAR CNN; Krizhevsky’s CNN]

Processing pipeline

Pod Image → Bin Extraction → Bin Images → Defect Detection

Implementation details

• Implemented in OpenCV in Python

• C++ extensions for some steps

• Neural net uses Caffe

• Trained on G2 instances

• Runs on CPU in FC server room

• Can tolerate latency in our current use-pattern

Software architecture

[Architecture diagram; the layout did not survive extraction. Labels recovered from the slide, grouped in extraction order:]

• Regions: Site Server Room; AWS

• Services on EC2: Inventory Event Correlator; VBI Service; Remote Count Website (Defect Detection); Inventory Bin Count Elimination

• Operations: Get Bin Defect Result; Get Bin Space Available; Get Work for Remote Counting

• Processes: Capture Event Data Router; Bin Extraction Process; Auto Count Process

• Storage: Local Storage Service

• Image operations: Put Pod Face Images; Put Bin Images; Get Pod Images; Get Bin Image

• Capture components: Camera Controller; File Pusher; Barcode Extraction; Edge Device(s)

• Consumers: Applications

• Messaging and data stores: SNS; SQS; HTTP POST; DynamoDB

How can we use computer vision?

• Automatic identification of every item? (TOO HARD)

• Automatic counting of every item?

Could we just count the number of items in the bin?

• At this point, we have lots of data.

• Some of it has errors from inventory defects, but networks have proven resilient to this kind of thing.

• Why not just train a network to directly count the items in a bin?

Using a convolutional neural network

• We used the Caffe implementation of GoogLeNet [1]

[1] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

Maps cleanly onto classification paradigm

• Treat it as a multi-class classification problem

[Diagram: Neural Network; output values 0.1, 0.2, 0.4, 0.4]
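Treating counting as multi-class classification means the network emits one score per possible count and the prediction is the highest-scoring class. The sketch below shows only that last step. The class layout (class k means k items, with a final "or more" bucket), the example logits, and the name `predict_count` are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_count(logits, max_count_class):
    """Return (label, confidence); the last class is read as 'max_count_class or more'."""
    probs = softmax(np.asarray(logits, dtype=float))
    k = int(np.argmax(probs))
    label = str(k) if k < max_count_class else f"{max_count_class}+"
    return label, float(probs[k])

label, confidence = predict_count([0.1, 0.5, 2.0, 0.3], max_count_class=3)
print(label, confidence)
```

The confidence value also gives a natural way to route uncertain bins to a human counter, although the slides do not say whether it was used that way.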

This saved the project

• Hit the targets we needed

• Eliminated a lot of hardware (no more before/after shots needed)

• Made the project cost effective

• Here is what we learned:

• Don’t focus on algorithms, focus on DATA

How else can we use this data?

• We want to find free space in the bin without having to label data.

• We can guess from dimensions of items.

• But where is the space?

[Figure values: 2.0; 1.0]
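Guessing free space from item dimensions could be as simple as a volume ratio, sketched below. The slides say only that the guess comes from item dimensions, so the formula, the units, and the name `emptiness_from_dimensions` are assumptions.

```python
def emptiness_from_dimensions(bin_dims, item_dims):
    """Estimate the unoccupied fraction of a bin from catalog dimensions alone.

    bin_dims is (length, width, height); item_dims is a list of such triples.
    Packing gaps and item orientation are ignored, so this is only a rough label.
    """
    bin_volume = bin_dims[0] * bin_dims[1] * bin_dims[2]
    used = sum(length * width * height for length, width, height in item_dims)
    return max(0.0, 1.0 - used / bin_volume)

# Hypothetical bin and items, all in centimeters.
score = emptiness_from_dimensions((40, 30, 20), [(10, 10, 10), (20, 15, 10)])
print(score)
```

A label like this costs nothing to produce for every bin image, which is what makes it usable as a training target, but it says nothing about where in the bin the free space lies.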

Train model to predict emptiness from an image

[Diagram labels: GoogLeNet; Conv (3*3); Avg Pool; Emptiness score]

This is a noisy, probably incorrect estimate!

But we can use layers in the network to find where the space actually is!

[Diagram labels: GoogLeNet; Conv (3*3); Avg Pool; emptiness score; 1024 channels; 3*3]

[Figure panels, shown for two examples: Original image; Activation map; Binary map]
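One plausible reading of the diagram is a class-activation-map style computation: if the score is a weighted sum of average-pooled channels, the same weights applied before pooling give a spatial map of what drove the score. The slides do not spell out the computation, so the sketch below is an assumption. It uses 8 channels on a 7 x 7 grid instead of the 1024 channels on the slide, with random stand-in features and weights.

```python
import numpy as np

def activation_map(features, weights):
    """Weighted sum over channels: (C, H, W) features and (C,) weights give an (H, W) map."""
    return np.tensordot(weights, features, axes=([0], [0]))

def emptiness_score(features, weights, bias=0.0):
    """Average-pool each channel, then combine the pooled values linearly."""
    pooled = features.mean(axis=(1, 2))
    return float(weights @ pooled + bias)

def binary_map(act_map, threshold):
    """Mark the locations whose activation exceeds `threshold`."""
    return act_map > threshold

rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))  # stand-in for the last convolutional feature maps
weights = rng.normal(size=8)      # stand-in for the learned output weights

act = activation_map(features, weights)
score = emptiness_score(features, weights)
mask = binary_map(act, threshold=act.mean())
print(score, mask.sum())
```

With zero bias the score equals the mean of the activation map, so the map is a spatial breakdown of the same quantity the network was trained to predict. The threshold that produces the binary map is a free choice here.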

And it works!

We are releasing a dataset

Takeaways

• We have great pattern recognition machinery now.

• Focus on the data:

• How can you get lots of it?

• What can you get for free?

• How much labeling do you really need?

• Is there a proxy problem?

Thank you!

Remember to complete your evaluations!