Lumos: A self-serve computer vision platform at the AI NEXT conference

Lumos: A Self-Serve Computer Vision Platform

Fei Yang, Research Scientist, Computer Vision, AML, Facebook


Why Computer Vision?

• Enhanced photo / video search (example query: "Search photos posted by my friends containing a black bear")

• Detecting malicious content

• Helping visually impaired people

• Smart camera

Challenges of a CV platform

• Large Scale

• Low Latency

• Reliability

• Flexibility

Lumos: Facebook’s Self-Serve Computer Vision Platform

Runs on billions of images

• Describes photos to the blind

• Resurfaces notable memories

• Provides better image and video search results

• Protects people from objectionable content

More than 200 visual models

• Currently trained and deployed

• Dozens of teams across the company self-serve to build their own models

100+ million examples in Lumos datasets and growing fast

Lumos

[Diagram sequence: a deep residual network trained for Task (T1), annotated "Training: weeks". A second task, Task (T2), is added on the same network, whose layers are labeled 1, 2, …, n-1, n along a scale from "less compute / less accuracy" to "more compute / more accuracy". One end of the scale is annotated "Compute time: 1-2 days, accuracy: less"; the other is annotated "Compute time: 1 month, accuracy: more". Further tasks, Task (T3), Task (T4), …, Task (Tm), are then added alongside Task (T1) and Task (T2).]

[Plot: accuracy versus compute, with Task (T1), Task (T2), Task (T3), Task (T4), …, Task (Tm).]

Lumos allows everyone at Facebook to build and deploy new computer vision models on the fly

• Collect training data for your new model

• Train your new model at the right accuracy/computational cost tradeoff

• Refine your model based on live performance

• Deploy your model to production
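One way to picture the accuracy/computational cost tradeoff mentioned above is as a selection among candidate training options under a compute budget. The sketch below is only an illustration: the `TrainingOption` type, the `pick_option` helper, and all numbers are hypothetical and are not part of Lumos or taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrainingOption:
    """One hypothetical way of training a new task on a shared network."""
    name: str
    compute_cost: float  # relative compute cost (arbitrary units)
    accuracy: float      # accuracy measured on a held-out set


def pick_option(options: List[TrainingOption], budget: float) -> Optional[TrainingOption]:
    """Return the most accurate option whose cost fits the budget, or None."""
    affordable = [o for o in options if o.compute_cost <= budget]
    if not affordable:
        return None
    # Prefer higher accuracy; break ties in favor of the cheaper option.
    return max(affordable, key=lambda o: (o.accuracy, -o.compute_cost))


# Illustrative numbers only; they are not measurements from Lumos.
options = [
    TrainingOption(name="cheap", compute_cost=1.0, accuracy=0.70),
    TrainingOption(name="medium", compute_cost=2.0, accuracy=0.78),
    TrainingOption(name="expensive", compute_cost=8.0, accuracy=0.85),
]

chosen = pick_option(options, budget=2.5)
print(chosen.name if chosen else "no option fits the budget")
```

With a larger budget the same call returns a more accurate and more expensive option, which matches the "less compute / less accuracy" versus "more compute / more accuracy" scale on the slides.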

Lumos

[Diagram labels: On this Day, Accessibility, 360 Media Team, Connectivity Lab, Protect and Care, Moments, News Feed, Photo Search]

Lumos

Continuous Stream of Photos

Automatic Alt Text

Connectivity

[Figure panels: Original GPW4 map; Facebook high-res map]

Detect Houses

• Indexing billions of photos

• Finding similar photos in microseconds

Binary encoding

1011001011…0101
1110101101…0010
0001111000…1010
1111111001…0001
1010101010…1001
0001111110…1010
0101101001…1111
1001111000…1010
0001001001…0010

Compact representations

Query image

Similar images
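As a rough illustration of why binary encodings make similarity search cheap, the sketch below compares codes by Hamming distance (the number of differing bits) and returns the closest ones. It assumes each image has already been reduced to a fixed-length binary code stored as an integer; the 8-bit codes and the `nearest` helper are made up for the example. It uses a plain linear scan, not the indexing scheme the talk refers to.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: List[int], k: int) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs for the k codes closest to the query."""
    scored = [(i, hamming(query, c)) for i, c in enumerate(codes)]
    # Sort by distance, then by index so that ties are resolved deterministically.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical 8-bit codes standing in for much longer image hashes.
codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110000

print(nearest(query, codes, k=2))  # → [(0, 1), (3, 1)]
```

Each comparison is one XOR and one bit count, which is why compact binary codes scale to very large collections once an index avoids scanning every code.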

• Clusters hundreds of millions of photos into millions of clusters

• Approach: A fast binary k-means algorithm

– Works directly on similarity-preserving binary hashes of images.

– Clusters image hashes into binary centers.

– Builds hash indexes of binary centers to speed up computation.
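The idea of k-means over binary hashes can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the talk: codes are integers of `n_bits` bits, points are assigned to the nearest center by Hamming distance, and each center is updated by a bitwise majority vote over its members (which minimizes the total Hamming distance to them). The hash index over centers mentioned above is not reproduced; the assignment step here scans every center. All function names are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two codes differ."""
    return bin(a ^ b).count("1")


def majority_center(members: List[int], n_bits: int) -> int:
    """Bitwise majority vote: a bit is 1 if more than half the members set it."""
    center = 0
    for bit in range(n_bits):
        ones = sum((m >> bit) & 1 for m in members)
        if 2 * ones > len(members):
            center |= 1 << bit
    return center


def binary_kmeans(codes: List[int], init_centers: List[int], n_bits: int,
                  max_iters: int = 20) -> Tuple[List[int], List[int]]:
    """Cluster binary codes; returns (centers, assignment of each code)."""
    centers = list(init_centers)
    assignment: List[int] = []
    for _ in range(max_iters):
        # Assignment step: nearest center by Hamming distance (lowest index wins ties).
        new_assignment = [
            min(range(len(centers)), key=lambda j: hamming(c, centers[j]))
            for c in codes
        ]
        if new_assignment == assignment:
            break  # converged: no code changed cluster
        assignment = new_assignment
        # Update step: recompute each center; an empty cluster keeps its old center.
        for j in range(len(centers)):
            members = [c for c, a in zip(codes, assignment) if a == j]
            if members:
                centers[j] = majority_center(members, n_bits)
    return centers, assignment


# Hypothetical 4-bit codes forming two obvious groups.
codes = [0b0000, 0b0001, 0b0010, 0b1111, 0b1110, 0b1101]
centers, assignment = binary_kmeans(codes, init_centers=[0b0000, 0b1111], n_bits=4)
print(centers, assignment)  # → [0, 15] [0, 0, 0, 1, 1, 1]
```

Because the centers stay binary, the assignment step uses the same XOR-and-count comparison as the hashes themselves, and the centers can in turn be indexed like any other binary code.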

Video Understanding

Objects: Dog, Cat..

Shot boundary detection

Caption: Dog chasing cat in garden while people are laughing

Action: Chasing

Scene: Garden

Summarization

Saliency Detection

Dynamic Compression

Future Prediction

Video Q&A

Beating humans on identifying sports

Continuous stream of videos

Mobile Vision

Accuracy

Speed

Size

Small, Fast, Accurate models

Mobile Vision

Pose estimation


3D Point Cloud

Thank you!