Scaling Face Recognition with Big Data

Transcript of Scaling Face Recognition with Big Data

Page 1: Scaling Face Recognition with Big Data

@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals

Scaling Face Recognition with Big Data

Bogdan BOCȘE

Solutions Architect & Co-founder VisageCloud

https://VisageCloud.com

https://www.linkedin.com/in/bogdanbocse/

https://twitter.com/bocse

Page 2: Scaling Face Recognition with Big Data

Agenda

• How to learn?

• What to learn?

• Defining learning objectives

• How to scale learning?

• Gotchas

• VisageCloud

  – Architecture

  – Use Cases

Page 3: Scaling Face Recognition with Big Data

Objectives

• What questions to ask before writing the code?

• How to look at the data before feeding it to the machine?

• What is the state of the art regarding ML?

• What frameworks to use?

• What are the common traps to avoid?

• How to design for scale?

Page 4: Scaling Face Recognition with Big Data

HOW TO LEARN?

Page 5: Scaling Face Recognition with Big Data

How to Learn?

Vision

• Convolutional Neural Networks

• Inception paper

NLP

• Word2Vec

• GloVe: Global Vectors for Word Representation

Generic

• Classification

• Prediction

Page 6: Scaling Face Recognition with Big Data

Convolutional Neural Networks: Big Picture

Page 7: Scaling Face Recognition with Big Data

Convolutional Neural Networks: Components

• Pooling / Max Pooling

• Convolution

• Fully connected layer and activation

  – Activation function, e.g. ReLU
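
A minimal pure-Python sketch of the three components named on this slide, chained in the usual order (convolution, then ReLU, then max pooling). The 5x5 image and the 2x2 edge kernel are made-up values for illustration; a real network learns its kernels and runs on a tensor library, not on nested lists.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1 sliding-window sum of products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Activation function: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 5x5 image: dark left half, bright right half.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

features = max_pool(relu(convolve2d(image, vertical_edge)))
```

The pooled map keeps a strong response only in the column where the edge sits, which is the sense in which convolution plus pooling summarizes "where a pattern occurs" at lower resolution.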

Page 8: Scaling Face Recognition with Big Data

Taking a Step Back: The Math

• Learning is an optimization problem

  – Find parameters of a system (neural network) that minimize a fixed error function

  – Not unlike planning orbital paths

• Defining the network architecture

• Defining the training algorithm

  – Stochastic Gradient Descent

    • With momentum

    • With noise
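
To make "find parameters that minimize a fixed error function" concrete, here is a small sketch of stochastic gradient descent with momentum and optional gradient noise. It fits a line rather than a neural network, and the learning rate, momentum value, and data are assumptions chosen for the illustration.

```python
import random


def sgd_momentum(data, lr=0.01, momentum=0.9, noise_std=0.0, epochs=200, seed=0):
    """Fit y ~ w * x + b by minimizing squared error, one sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0  # velocity terms carry momentum from step to step
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # "stochastic": visit the samples in random order
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of error**2, optionally perturbed by Gaussian noise.
            grad_w = 2 * error * x + rng.gauss(0.0, noise_std)
            grad_b = 2 * error + rng.gauss(0.0, noise_std)
            vw = momentum * vw - lr * grad_w
            vb = momentum * vb - lr * grad_b
            w += vw
            b += vb
    return w, b


# Hypothetical noiseless data generated from y = 3x + 1.
points = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_momentum(points)
```

Setting `noise_std` above zero gives the "with noise" variant; on this convex toy problem it only slows convergence, but on non-convex error surfaces it is one way to shake the search out of shallow basins.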

Page 9: Scaling Face Recognition with Big Data

Frameworks

• DeepLearning4j

  – Independent company

  – Java interface with C-bindings for performance

• TensorFlow

  – Python & C++ API

  – Developed by Google

  – Compatible with TPU

• Torch

  – Developed by Facebook

  – Written in LuaJIT, with Python bindings

Page 10: Scaling Face Recognition with Big Data

WHAT TO LEARN?

Page 11: Scaling Face Recognition with Big Data

Data Sets

• Public data sets

  – Labeled Faces in the Wild (LFW)

  – YouTube Faces

  – Kaggle

• Private data sets

• Build your own

  – Outsourcing: Mechanical Turk

  – Crowdsourcing: reCAPTCHA model

Page 12: Scaling Face Recognition with Big Data

Preparing Data

Clean data

Cropping

Structure

Homogeneity

Normalization

Histograms

Filtering

Page 13: Scaling Face Recognition with Big Data

Preparing Data

• Machine learning is not magic

• If you can’t understand the data, a machine probably won’t either

• Preprocessing often makes the difference between good and poor results

• Applying filters, normalization, and anomaly detection is computationally inexpensive
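
As an illustration of how cheap the normalization and histogram steps are, the sketch below stretches contrast and equalizes the histogram of a flat list of 8-bit grayscale values. The pixel values are invented, and production code would typically call an image library (such as OpenCV) rather than loop in Python.

```python
def normalize(pixels):
    """Rescale grayscale values linearly so they span the full 0..255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]  # a flat image has no contrast to stretch
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]


def histogram(pixels, bins=256):
    """Count how many pixels fall on each gray level."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    return counts


def equalize(pixels):
    """Histogram equalization: map each value through the cumulative distribution."""
    cdf, running = [], 0
    for count in histogram(pixels):
        running += count
        cdf.append(running)
    cdf_min = next(v for v in cdf if v > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # a single gray level: nothing to equalize
    return [round(255 * (cdf[p] - cdf_min) / (total - cdf_min)) for p in pixels]


# Hypothetical low-contrast image, flattened to a list of gray levels.
dark_image = [50, 52, 52, 55, 60, 60, 60, 70]
stretched = normalize(dark_image)
equalized = equalize(dark_image)
```

Both passes are linear in the number of pixels, which is negligible next to the cost of a forward pass through a deep network.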

Page 14: Scaling Face Recognition with Big Data

DEFINING LEARNING OBJECTIVES

Page 15: Scaling Face Recognition with Big Data

Defining learning objectives

• Supervised

  – Classification

  – Scoring and regression

  – Identification

• Unsupervised

  – Clustering

Page 16: Scaling Face Recognition with Big Data

Classification

• Projecting input onto a fixed set of classes

• “Don’t use a cannon to kill a fly”

  – Support Vector Machines

    • Linear

    • Radial basis function (RBF)
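
A rough sketch of the linear case only: a support vector machine trained by subgradient descent on the regularized hinge loss, in plain Python. The six 2D points, the learning rate, and the regularization strength are assumptions for the example; the RBF variant needs a kernel and is not shown. In practice one would reach for an existing SVM library instead of hand-rolling the training loop.

```python
def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=200):
    """Subgradient descent on the regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1 - lr * reg) for wi in w]  # regularization shrinks the weights
            if margin < 1:  # sample is misclassified or inside the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Project an input onto one of two classes: +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable training data.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

A model this small trains in milliseconds, which is the point of the cannon-and-fly remark: when the classes separate cleanly in the feature space, a linear classifier may be all that is needed.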

Page 17: Scaling Face Recognition with Big Data

Identification

• Embedding

  – Projecting input (image) into a vector space with a known property

• Triplet loss function
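
The triplet loss can be sketched in a few lines. The "known property" it pushes the embedding toward is that two images of the same person end up closer together than images of different people. The 2D vectors, the margin, and the decision threshold below are illustrative assumptions; real face embeddings usually have on the order of a hundred dimensions or more.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is (measured in squared Euclidean distance)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def same_person(a, b, threshold=0.6):
    """Identification by thresholding embedding distance (threshold is hypothetical)."""
    return math.sqrt(squared_distance(a, b)) < threshold


# Hypothetical embeddings: two images of person A, one image of person B.
anchor = [0.10, 0.90]
positive = [0.15, 0.85]
negative = [0.90, 0.10]

easy = triplet_loss(anchor, positive, negative)   # already well separated
hard = triplet_loss(anchor, negative, positive)   # roles swapped: large loss
```

During training, only triplets with a non-zero loss contribute a gradient, so the network's effort goes into the pairs it still confuses.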

Page 18: Scaling Face Recognition with Big Data

Clustering

• Splitting a set of items into non-overlapping subsets, based on item attributes

• Counting people in video streams

• Algorithms:

  – Fixed threshold

  – K-means

  – Rank-order clustering
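
A sketch of the simplest option, fixed-threshold clustering, applied to the people-counting use case: each face embedding joins the first cluster whose centroid is close enough, otherwise it starts a new cluster. The greedy single-pass strategy, the 2D vectors, and the threshold are assumptions made for the example.

```python
import math


def threshold_clusters(embeddings, threshold):
    """Greedy single pass: join the first cluster whose centroid lies within
    `threshold`, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of member vectors
    for vec in embeddings:
        for members in clusters:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            if math.dist(vec, centroid) < threshold:
                members.append(vec)
                break
        else:  # no existing cluster was close enough
            clusters.append([vec])
    return clusters


# Hypothetical face embeddings extracted from consecutive video frames.
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1), (10.0, 0.0)]

clusters = threshold_clusters(frames, threshold=1.0)
people_seen = len(clusters)
```

Unlike K-means, this approach does not need the number of clusters in advance, which suits counting problems where the number of people is the unknown.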

Page 19: Scaling Face Recognition with Big Data

HOW TO SCALE LEARNING?

Page 20: Scaling Face Recognition with Big Data

How to scale learning?

• Scaling training

  – Requires shared memory space

  – Vertical scaling

    • GPU

    • Soon-to-come: TPU (tensor processing unit)

• Scaling evaluation

  – Shared-nothing architecture

  – Neural network/classifier rarely changes

  – Load balancing pattern

  – Partitioning data if needed

Page 21: Scaling Face Recognition with Big Data

Ideas for horizontal scaling

• There is no “reduce” for neural networks

• Averaging weights/parameters

  – Usually not a good idea

• Genetic algorithms

  – Requires a lot of processing power

  – Running independent iterations on different machines

  – Crossover between weights/parameters of independently trained neural networks after each epoch
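
One possible reading of the genetic-algorithm idea, as a toy sketch: every candidate is trained independently for one epoch (the step that could run on separate machines), the better half survives, and children are produced by crossing over the survivors' weights. The quadratic error function, the `TARGET` vector, the population size, and the mutation scale are all stand-ins invented for the example; a real setup would use each network's validation error.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # stand-in for the unknown optimum of a real network


def error(weights):
    """Toy error function standing in for validation error."""
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def train_one_epoch(weights, lr=0.1):
    """Stand-in for one epoch of training on one machine: a single gradient step."""
    return [w - lr * 2 * (w - t) for w, t in zip(weights, TARGET)]


def crossover(a, b, rng, mutation_std=0.05):
    """Uniform crossover: each weight comes from one parent, plus a small mutation."""
    return [rng.choice(pair) + rng.gauss(0.0, mutation_std) for pair in zip(a, b)]


def evolve(population, generations, seed=0):
    rng = random.Random(seed)
    best_errors = []
    for _ in range(generations):
        # Independent training; this is the part that parallelizes across machines.
        population = sorted((train_one_epoch(w) for w in population), key=error)
        survivors = population[: len(population) // 2]
        children = []
        for _ in survivors:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(crossover(parent_a, parent_b, rng))
        population = survivors + children
        best_errors.append(min(error(w) for w in population))
    return min(population, key=error), best_errors


init_rng = random.Random(42)
initial = [[init_rng.uniform(-3, 3) for _ in TARGET] for _ in range(8)]
best, history = evolve(initial, generations=30)
```

Only weight vectors cross machine boundaries once per epoch, so no shared memory is required; the price is that every machine trains a full copy of the model, which is where the "requires a lot of processing power" caveat comes from.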

Page 22: Scaling Face Recognition with Big Data

GOTCHAS

Page 23: Scaling Face Recognition with Big Data

The Curse of Dimensionality

• Our 2D and 3D intuition often fails in high dimensions

• Distances tend to become relatively “the same” as the number of dimensions increases

• Dimensionality reduction

  – Embedding functions

  – Principal component analysis
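
The claim that distances become relatively "the same" can be checked with a short experiment. The sketch draws random points in the unit cube and measures how much farther the farthest point is from a query than the nearest one; the point count, the dimensions compared, and the uniform distribution are arbitrary choices for the demonstration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point
    to random points drawn uniformly from the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


contrast_2d = relative_contrast(2)
contrast_1000d = relative_contrast(1000)
```

In two dimensions the farthest neighbor is many times farther away than the nearest; in a thousand dimensions the two differ by only a small fraction, so a nearest-neighbor search on raw high-dimensional data has little to discriminate on.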

Page 24: Scaling Face Recognition with Big Data

Local Optima

• “The bottom of a valley is not necessarily the lowest point on Earth”

• Learning algorithms may get stuck in local optima

• Using momentum or some random noise reduces this risk

• Using genetic algorithms can be even more robust, but they are computationally expensive
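
A one-dimensional sketch of the "bottom of a valley" problem: plain gradient descent on a function with two valleys settles in whichever one is downhill from its starting point. The function `f`, the starting points, and the learning rate are chosen purely for illustration.

```python
def f(x):
    """Toy error surface with a shallow valley near x = 1.13 and a deeper one near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def f_prime(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(start, lr=0.01, steps=1000):
    """Plain gradient descent: no momentum, no noise."""
    x = start
    for _ in range(steps):
        x -= lr * f_prime(x)
    return x


stuck = gradient_descent(start=2.0)    # rolls into the shallow valley
lucky = gradient_descent(start=-2.0)   # rolls into the deeper valley
```

Both runs report a gradient of almost zero at the end, so the algorithm by itself cannot tell that `stuck` is the worse answer; comparing `f(stuck)` with `f(lucky)` makes the difference visible.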

Page 25: Scaling Face Recognition with Big Data

Visualizing Local Optima

monkey saddle

Page 26: Scaling Face Recognition with Big Data

Evaluation of Learning

“Based on state-of-the-art machine learning, our weather forecast system can predict tomorrow’s weather with 72% accuracy”

You get the same results by saying “it’s going to be the same as today”

Page 27: Scaling Face Recognition with Big Data

Evaluation of Learning

• Don’t test on the data you train on

  – Use a different data set

  – Split the data sets you have

• Beware of data biases

  – Confirmation bias

  – Survivorship bias

  – Selection bias

• Compare against a benchmark, even a dummy one

  – Coin flip

  – Linear algorithms

  – “Same-as-before”
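
The two habits on this slide, holding out test data and comparing against a dummy benchmark, fit in a few lines. The weather sequence below is invented so that the "same-as-before" baseline lands near the figure quoted on the previous slide; the split fraction and seed are arbitrary.

```python
import random


def split(records, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction of the records for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predictions, actual):
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


def same_as_before(history):
    """Persistence baseline: predict that each day repeats the previous day."""
    return history[:-1]


# Hypothetical daily observations.
weather = ["sun", "sun", "sun", "rain", "rain", "sun", "sun", "sun",
           "sun", "rain", "rain", "rain", "sun", "sun", "sun"]

baseline = accuracy(same_as_before(weather), weather[1:])  # 10 of 14 days correct

train, test = split(range(10))
```

Any model evaluated on this data has to beat roughly 71% before its accuracy says anything about what it learned. For time-ordered data such as weather, a chronological split is usually a better fit than the random shuffle used in `split`.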

Page 28: Scaling Face Recognition with Big Data

Architecture and Use Cases

Page 29: Scaling Face Recognition with Big Data

High Level Architecture

[Diagram: VisageCloud Production]

• API Consumer (customer infrastructure)

• HAProxy (reverse proxy)

• Service (API controller)

• Neural networks (OpenCV, Dlib, Torch, pixie magic)

• Cassandra

• Containers (Docker)

• Image storage (AWS S3)

• Connection labels: HTTPS, HTTP, CQL binary

Page 30: Scaling Face Recognition with Big Data

Processing Pipeline

Detect faces → Align faces → Pre-processing → Feature extraction → Feature comparison

Page 31: Scaling Face Recognition with Big Data

Partitioning Data: Application Level Logic

• The collection

  – Slice of data used together

  – 10K-100K records

• The Cache-Inside Pattern

  – Loading / preloading collection in one application server

  – Content-based routing/balancing to maximize cache hits

  – No logic in the database layer

  – Requires periodic polling for updates

• Weaker consistency
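
A sketch of how the cache-inside pattern and content-based routing fit together, under assumptions: the `ApplicationServer` class, the `route` function, the hash-modulo scheme, and the `fake_db_load` loader are hypothetical names invented for this example, not VisageCloud's implementation. Periodic polling for updates is left out.

```python
import hashlib


class ApplicationServer:
    """Keeps whole collections in memory so comparisons never hit the database."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # collection_id -> records held in memory
        self.db_loads = 0    # how many times this server had to go to the database

    def get_collection(self, collection_id, load_from_db):
        if collection_id not in self.cache:
            self.cache[collection_id] = load_from_db(collection_id)
            self.db_loads += 1
        return self.cache[collection_id]


def route(collection_id, servers):
    """Content-based routing: the same collection always maps to the same server."""
    digest = hashlib.sha256(collection_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def fake_db_load(collection_id):
    """Hypothetical stand-in for reading a collection from the database."""
    return [f"{collection_id}-face-{i}" for i in range(3)]


servers = [ApplicationServer(f"app-{i}") for i in range(3)]

for _ in range(5):  # five requests against the same collection
    server = route("customer-42", servers)
    records = server.get_collection("customer-42", fake_db_load)

total_db_loads = sum(s.db_loads for s in servers)
```

Because routing depends only on the collection identifier, repeated requests land on a warm cache. A plain modulo reshuffles most collections whenever the number of servers changes; consistent hashing is a common way to soften that, at the cost of extra complexity.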

Page 32: Scaling Face Recognition with Big Data

Partitioning Data: Application Level Logic

[Diagram: content-based balancing/routing; an application layer with three application instances; a Cassandra database layer with four Cassandra nodes. Flows between the layers: preload collection, poll for updates, write updates.]

Page 33: Scaling Face Recognition with Big Data

Partitioning Data: Database Level Logic

• Perform comparison logic in the database

  – User-defined aggregate functions

• Removes the need to move data between application and database

• Harder to deploy/test

• Stronger consistency

Page 34: Scaling Face Recognition with Big Data

Key Take-away

• It’s math, not magic

• If you don’t understand the data, neither will the machine

• Preprocessing makes the difference

• Test against a benchmark, any benchmark

• Evaluate first, scale later

Page 35: Scaling Face Recognition with Big Data

[email protected]

+(40) 724 714 234

https://www.linkedin.com/in/bogdanbocse/

https://twitter.com/bocse

Let’s keep in touch

Page 36: Scaling Face Recognition with Big Data

Many thanks to our sponsors & partners!

Page 37: Scaling Face Recognition with Big Data

Q & A