The Potential of GPU-driven High Performance Data Analytics in Spark

Spark Summit Brussels, October 26, 2016 THE POTENTIAL OF GPU-DRIVEN HIGH PERFORMANCE DATA ANALYTICS IN SPARK Andy Steinbach, Sr. Director, NVIDIA

Transcript of The Potential of GPU-driven High Performance Data Analytics in Spark

Slide 1


The Times Are A-Changing Again
AI: the next industrial revolution
What are AI, ML and DL? (blog)
Training vs. inference (blog)
Who needs it (industries/functions): industries, companies, applications
Why people need to care about AI
What you need to be able to do DL: data, hardware (GPUs), software (Deep Learning SDK)
The pain points that exist and how our solutions address them
Walk through personas: clinician / DIGITS
Why NVIDIA is the platform to succeed in AI
Support: SA, IBDs, DLI


HOW TO SCALE AI & DATA ANALYTICS?

Scale up: compute intensive
Scale out: data intensive

We are headed here
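The scale-out half of this picture rests on a simple pattern. You split the data into partitions, compute a partial result on each partition independently, and then merge the partials. The sketch below is a minimal plain-Python illustration that assumes a thread pool stands in for cluster workers. Spark itself distributes partitions across executors, and CPython threads give no true CPU parallelism for this code. The function names `partition`, `local_stats`, `merge` and `scale_out_mean_var` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split data into contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def local_stats(chunk):
    """Per-partition partial aggregate: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def merge(a, b):
    """Combine two partial aggregates; associative, so merge order is free."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale_out_mean_var(data, num_workers=4):
    """Mean and population variance computed partition by partition (hypothetical helper)."""
    chunks = partition(data, num_workers)
    # Each worker only ever sees its own chunk, as a cluster node would.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_stats, chunks))
    n, s, sq = reduce(merge, partials, (0, 0, 0))
    mean = s / n
    return mean, sq / n - mean * mean
```

Because `merge` is associative and commutative, the partials can be combined in any order. That property is what lets this kind of aggregation grow with the number of machines rather than the size of any one machine.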


HIGH PERFORMANCE DATA ANALYTICS

Dimensions: scale out, scale up
Spark + TensorFlow + GPU
Spark + AI framework + GPU
Workloads: machine learning & DB query; deep learning


DEEP LEARNING - A NEW COMPUTING MODEL

Training (ImageNet)
Inference
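A deliberately tiny model makes the training/inference split concrete. The sketch below uses a one-feature logistic regression as a stand-in for a deep network, a simplification made purely for brevity. Training passes over the labelled examples many times to fit the parameters. Inference is a single cheap evaluation of the frozen model on an unseen input. The names `train` and `infer` and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=1000, lr=0.1):
    """Training: fit (w, b) to labelled (x, y) pairs by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in examples:
            err = sigmoid(w * x + b) - y  # d(log loss)/d(logit) for one example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(examples)
        b -= lr * grad_b / len(examples)
    return w, b


def infer(model, x):
    """Inference: apply the already-trained parameters to an unseen input."""
    w, b = model
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


labelled = [(-3.0, 0), (-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]
model = train(labelled)
print(infer(model, 4.0), infer(model, -4.0))  # classifies inputs never seen in training
```

The cost asymmetry is the point. With the default settings, training touches every example 1,000 times, whereas each inference call is one multiply-add and a sigmoid.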


BEYOND JUST COMPUTER VISION


A REVOLUTION IN MEDICINE

Labelled training examples → Trained model → Inference applied to unseen inputs


A REVOLUTION IN ROBOTICS


GPU-POWERED SELF-DRIVING CARS


SUPERHUMAN PERFORMANCE


WHAT DOES DEEP LEARNING LEARN?

Input → Feature representation → Learning algorithm


PREDICTIVE ANALYTICS IS NEXT


"10,000s of features make up today's fraudulent behavior. AI can detect patterns faster and more accurate than humans." - Hui Wang, Senior Director of Global Risk Sciences, PayPal


PayPal

Problem - Online fraud

Solution - The deep learning algorithms are able to analyze potentially tens of thousands of latent features (time signals, actors and geographic location are some easy examples) that might make up a particular type of fraud, and are even able to detect sub modus operandi, or different variants of the same scheme, she said.

Impact - She's hopeful deep learning will give her team the ability to adapt to these new patterns faster than before. It's possible, for example, that PayPal might some day be able to deploy models that take live data from its system and become smarter, by retraining themselves, in real time.

https://gigaom.com/2015/03/06/how-paypal-uses-deep-learning-and-detective-work-to-fight-fraud/


THE NEED TO SCALE UP & OUT IS HUGE

INCREASING DATA VARIETY

[Figure: data volume scale from gigabytes through terabytes, petabytes and exabytes to zettabytes, with tiers labelled business process, web, digital and AI.]

Data sources shown: purchase detail, purchase record, payment record, support contacts, segmentation, offer details, web logs, offer history, A/B testing, dynamic pricing, search marketing, behavioral targeting, dynamic funnels, user generated content, mobile web, SMS/MMS, sentiment, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, sensors, infotainment systems, wearable devices, cybersecurity logs, connected vehicles, machine data, IoT data, streaming video, natural language processing.


Increase data size/variety

DGX-1 DEEP LEARNING SUPERCOMPUTER


PERFORMANCE GAP INCREASES

[Chart: GFLOPS and memory bandwidth (GB/s) for NVIDIA GPU generations M1060, M2090, K20, K80 and Pascal.]
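The chart contrasts peak compute (GFLOPS) with memory bandwidth (GB/s). The roofline model is one standard way to reason about what a widening gap means for data analytics. It says attainable throughput is the lesser of two values: peak compute, and bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below assumes two hypothetical GPUs and a hypothetical scan-like kernel. All figures are illustrative round numbers, not the specifications of any card in the chart.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical devices: compute grows 10x between generations, bandwidth only 5x.
old_gpu = {"peak_gflops": 1000.0, "bandwidth_gbs": 100.0}
new_gpu = {"peak_gflops": 10000.0, "bandwidth_gbs": 500.0}

# Assumed analytics kernel doing roughly 1 FLOP per 8 bytes read.
intensity = 0.125

for name, gpu in (("old", old_gpu), ("new", new_gpu)):
    print(name,
          "ridge:", ridge_point(**gpu), "FLOP/byte,",
          "attainable:", attainable_gflops(gpu["peak_gflops"], gpu["bandwidth_gbs"], intensity),
          "GFLOPS")
```

With these assumed numbers, the newer device is 10x faster on paper but only 5x faster on the low-intensity kernel. Its ridge point also doubles, so more workloads fall on the bandwidth-limited side of the roofline.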


HOW TO SCALE DATA ANALYTICS?

In a nutshell: a complex numerical function

In practice, compute: … with: …

[Chart: run time (sec)]
