Counting Passenger Vehicles from Satellite Imagery

“Not everything that can be counted counts, and not everything that counts can be counted”

NVIDIA GPU Technology Conference, 02 Nov 2017

MAHMOUD LABABIDI, SENIOR DATA SCIENTIST

KEVIN GREEN, MACHINE LEARNING AND COMPUTER VISION SCIENTIST

10/31/2017 Copyright © DigitalGlobe | Radiant. All rights reserved.

“There Are Three Kinds of People: Those Who Can Count and Those Who Can't”


Agenda

Introduction to Satellite Imagery

Why Machine Learning for Satellite Imagery?

Car Counting from Space is Hard
▫ Problems
▫ Solutions

Algorithm Overview
▫ Classification
▫ Segmentation
▫ Object Detection

Production

Future Work

Car Count = 14 Cars


DigitalGlobe’s Satellite Constellation


Why Machine Learning for Satellite Imagery?

Extracting information from imagery automatically enables timely, low-manpower exploitation at both the macroscopic and the needle-in-a-haystack scale (e.g., Intelligence, Business Analytics, Disaster Relief)

Some challenges exist in viewing satellite imagery:

▫ Environmental (lighting) and atmospheric (clouds) distortion

▫ Pixel count for small objects (cars) in 30 cm resolution imagery is small (~4-5 pixel width)

Automation exists for basic tasks (e.g., vegetation and land use classification); however, those techniques fail on complex tasks such as object detection

The capability of transforming overhead imagery content into countable objects will satisfy the analytic needs of commercial and military customers


Car Counting from Space is Hard

Problems

▫ Counting thousands of cars from 30 cm (1 foot) resolution satellite imagery is visually difficult (i.e., cars are roughly 5 pixels wide), tedious, and laborious

▫ Cars vary in brightness; dark cars blend into other dark surfaces

▫ Classical computer vision techniques such as edge detection fail in this arena

Our Solutions

▫ Object Classification

▫ Segmentation + Morphological operations

▫ Object Bounding Box Detection


Algorithm Overview

Input Imagery

• WorldView 3

Classifier

• LeNet

Segmentation

• Fully Convolutional Network (FCN)

Object Detector

• Single Shot Detector (SSD)

Non-maximum Suppression (applied after the Classifier and after the Object Detector)

• Removes overlapping detection boxes

Morphology (applied after Segmentation)

• Binary Threshold

• Convex Hull

• Opening

Labeled Cars and Count

• Estimated Car Count


Algorithm #1: The LeNet Classifier

Yann LeCun pioneered the convolutional approach to classification, particularly for Optical Character Recognition. The 1998 paper by LeCun and colleagues, “Gradient-Based Learning Applied to Document Recognition”, has over 9,600 citations.


The LeNet Classifier: Sliding Windows and NMS

A trained classifier is applied to a window that slides over the image, placing a bounding box (BBOX) on each portion of the image that contains the object of interest
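The sliding-window idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `sliding_window_detect`, the window size, the stride, and the score cut-off are hypothetical names and values, and `mean_brightness` is a toy stand-in for a trained classifier such as LeNet, not the model used in the talk.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def sliding_window_detect(
    image: Sequence[Sequence[float]],
    classify: Callable[[List[List[float]]], float],
    window: int = 4,
    stride: int = 2,
    min_score: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Score every window position and keep the ones flagged as a car."""
    height, width = len(image), len(image[0])
    detections = []
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            # Cut out the patch under the current window position.
            patch = [list(row[x:x + window]) for row in image[y:y + window]]
            score = classify(patch)
            if score >= min_score:
                detections.append(((x, y, x + window, y + window), score))
    return detections


def mean_brightness(patch: List[List[float]]) -> float:
    """Hypothetical stand-in for a trained classifier: average pixel value."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Toy 8x8 image containing one bright 4x4 "car" at rows 2-5, columns 2-5.
toy_image = [
    [1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)]
    for r in range(8)
]
hits = sliding_window_detect(toy_image, mean_brightness)
for box, score in hits:
    print(box, score)
```

On the toy image the single car triggers several overlapping boxes, which is why the detections need to be pruned afterwards.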

Non-maximum suppression (NMS) is an algorithm that eliminates overlapping detection boxes that are produced by sliding a window over an image
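A minimal sketch of greedy NMS follows. It assumes overlap is measured as intersection-over-union (IoU) and uses a hypothetical threshold of 0.3; the slides do not state which overlap measure or threshold was used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    detections: List[Detection], iou_threshold: float = 0.3
) -> List[Detection]:
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    kept: List[Detection] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # A box survives only if it does not overlap any box already kept.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((2, 2, 6, 6), 0.9),
    ((3, 2, 7, 6), 0.8),      # IoU with the first box is 12 / 20 = 0.6
    ((20, 20, 24, 24), 0.7),  # far away, no overlap
]
kept = non_max_suppression(detections)
print(len(kept), "boxes kept")
```

The threshold is the calibration knob: set it too low and adjacent cars in a packed lot get merged into one detection; set it too high and a single car is counted more than once.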

Sliding Window

Non-maximum Suppression


The LeNet Classifier Results

Several issues arise when using an image classifier to detect cars

▫ Determining what percentage of overlap to allow before suppressing overlapping detection boxes

▫ More prone to false hits (see figure B)

▫ Challenges in determining car count if BBOX is too large

[Figure: panels A and B]


Algorithm #2: Segmentation with a Fully Convolutional Network (FCN)

Based on the VGG Neural Network


Segmentation Results

Observations from using a segmentation algorithm to detect cars

▫ Segmentation produces high-quality pixel masks covering cars but does not yield individual BBOXs

▫ Fortunately, segmentation has fewer false positives than Classification (and, in some cases, than Object Detection)

▫ Although segmentation is powerful, in order to extract individual cars, additional post-processing is needed (e.g., morphology)

[Figure: panels A and B]


Morphological Operations used to Extract Cars from Segments


Morphological Operations (Opening/Closing/Convex Hull)

Morphological operations modify the shape of an image in diverse ways:

Erosion – Shrinks the boundary of foreground regions

Dilation – Expands the boundary of foreground regions

Opening – Erosion followed by dilation (noise removal)

Closing – Dilation followed by erosion (hole filler)

Convex Hull – Shape formed by a rubber band stretched around the foreground region
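The first four operations can be illustrated on a small binary mask in plain Python. This sketch assumes a 3x3 square structuring element and treats pixels outside the image as background; both choices are assumptions, and the function names are hypothetical. The convex hull is not covered here.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # 1 = foreground (car), 0 = background


def _neighbourhood(mask: Mask, r: int, c: int) -> List[int]:
    """Values in the 3x3 window centred on (r, c); outside the image is 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
    ]


def erode(mask: Mask) -> Mask:
    """A pixel stays on only if its whole 3x3 neighbourhood is on."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """A pixel turns on if any pixel in its 3x3 neighbourhood is on."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes small specks of noise."""
    return dilate(erode(mask))


def closing(mask: Mask) -> Mask:
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(mask))


def make_mask(size: int, on: Set[Tuple[int, int]]) -> Mask:
    """Build a size x size mask with the given (row, col) pixels set to 1."""
    return [[1 if (r, c) in on else 0 for c in range(size)] for r in range(size)]


blob = {(r, c) for r in range(1, 4) for c in range(1, 4)}  # solid 3x3 "car"
solid = make_mask(7, blob)
noisy = make_mask(7, blob | {(5, 5)})  # car plus one isolated noise pixel
holed = make_mask(7, blob - {(2, 2)})  # car with a one-pixel hole

print(opening(noisy) == solid)  # the noise pixel is removed
print(closing(holed) == solid)  # the hole is filled
```

Because cars are only a few pixels wide at 30 cm resolution, the structuring element has to stay small; a larger one would erase the cars along with the noise.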

[Figure panels: Image Erosion, Image Dilation, Opening, Closing, Convex Hull]


Car (RGB) Car (Binary Threshold)

Does this still look like a Car?


Morphology to Extract Cars

Thresholding followed by morphological operations may not always yield one car blob:

Figure A illustrates two car components after binary thresholding

Figure B shows one component per car after applying an ‘opening’ operation

However, is the resultant car blob really a car?

One final step is to validate the car geometry
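Counting car components in a thresholded mask amounts to connected-component labelling. The sketch below is a plain-Python flood fill that assumes 8-connectivity; the function name `label_blobs` and the toy mask are illustrative, not taken from the talk.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Pixel = Tuple[int, int]  # (row, col)


def label_blobs(mask: Mask) -> List[List[Pixel]]:
    """Group foreground pixels into 8-connected blobs using a flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new blob and grow it through neighbouring pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy mask with two separate 2x2 blobs.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
blobs = label_blobs(mask)
print("blob count:", len(blobs))
```

The blob count is only a raw estimate of the car count: a car split in two by thresholding counts twice, and two touching cars count once.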

[Figure: panels A and B]


Spatial False Alarm Mitigators (FAMs) Reduce False Detections

Spatial FAMs using oriented bounding boxes were used to eliminate blobs that didn’t meet the average dimensions criteria for a car:

Area – Eliminate too big or too small

Length/Width Ratio – Eliminate long skinny boxes

Length and Width – Eliminate if either dimension is one pixel
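The three checks can be sketched as a simple filter over oriented-box dimensions. All thresholds here (`min_area`, `max_area`, `max_ratio`) are hypothetical placeholders chosen for a car about 5 pixels wide; the slides do not give the actual values, and the box dimensions are assumed to have been measured already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrientedBox:
    length: float  # one side of the oriented bounding box, in pixels
    width: float   # the other side, in pixels


def passes_spatial_fams(
    box: OrientedBox,
    min_area: float = 20.0,   # hypothetical lower bound, in square pixels
    max_area: float = 150.0,  # hypothetical upper bound, in square pixels
    max_ratio: float = 4.0,   # hypothetical limit on length/width
) -> bool:
    """Return True if the blob's box has plausible car geometry."""
    long_side = max(box.length, box.width)
    short_side = min(box.length, box.width)
    # Length and Width: reject boxes that are only one pixel across.
    if short_side <= 1:
        return False
    # Area: reject boxes that are too small or too big to be a car.
    if not min_area <= long_side * short_side <= max_area:
        return False
    # Length/Width Ratio: reject long, skinny boxes.
    if long_side / short_side > max_ratio:
        return False
    return True


candidates: List[OrientedBox] = [
    OrientedBox(14, 5),   # plausible car
    OrientedBox(40, 3),   # long and skinny, e.g. a lane marking
    OrientedBox(3, 1),    # one pixel wide
    OrientedBox(60, 30),  # far too large, e.g. a rooftop
]
cars = [box for box in candidates if passes_spatial_fams(box)]
print("car count:", len(cars))
```

In practice the limits would be tuned to the image resolution, since a car's pixel footprint changes with ground sample distance.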



Algorithm #2: Segmentation

Conclusion: Segmentation (localization) shows promising results when combined with morphological operations (refinement), enabling us to quickly calculate accurate car counts in satellite imagery

Car Count = 14 Cars


Algorithm #3: Single Shot Detector (SSD)


The SSD Object Detector Results

The Single Shot Detector takes a more direct approach: it detects cars and draws BBOXs around them in a single pass

▫ Locates individual cars more accurately in densely packed parking lots

▫ Also less prone to false hits (see figure B)

▫ NMS issues may arise if the allowed overlap area isn’t calibrated

[Figure: panels A and B]


Competitive Car-Counting Bake-off

Other Commercial Vendor = 2,205


Training (wheels) to Production

NVIDIA GPU Training:
▫ NVIDIA GTX 980
▫ NVIDIA GTX 1080
▫ NVIDIA TitanX
▫ NVIDIA M40
▫ NVIDIA P100
▫ NVIDIA M1000M - Mobile

Machine Learning Frameworks:
▫ TensorFlow
▫ Keras
▫ Caffe

GBDX is the platform that uses Amazon EC2 to deploy Docker images of the code and model.

Training Speed: 100 batches of four 300x300 images take 20 minutes to train

Inference Speed: 20 minutes on the strip shown above (approximately 13k x 13k pixels)


AnswerFactory – SSD Workflow

1) Define AOI & Select Detect Model (Cars)

2) Select Date Ranges & Auto Update
▫ Historical (15 years)
▫ Run on all new images in the future

3) Run Model & Get Results

4) Analyze Individual Parking Lots Over Time

[Figure labels: VIP & Visitor Parking, Resident Parking, Employee Parking]


Provide analysts “Tips” on changing activity levels for enhanced garrison monitoring

[Chart: Military Vehicle Counting. Object Count and Actual Count for 16APR2015, 29AUG2015, and 2SEP2015 (vertical axis 0 to 350). Significant Changes Detected.]


Future Work

Explore the use of different multi-spectral band combinations for improved car count

Explore whether different activations might better support detecting dark cars (e.g., Leaky ReLU)

Go beyond temporal volume anomalies to include spatially anomalous behaviour

Upcoming xView Challenge, an ImageNet-like competition for satellite imagery

Our DigitalGlobe colleague, Todd Bacastow, discussed this earlier today in his talk entitled “SpaceNet: Accelerating Automated Mapping with Deep Learning and Labeled Satellite Imagery”


Thank you! Questions?
