GPU Technology Conference 2014 Keynote



NVIDIA CEO Jen-Hsun Huang introduces NVLink and shares the GPU roadmap. Other topics include the introduction of the GeForce GTX Titan Z, CUDA for machine learning, and the Iray VCA.

Transcript of GPU Technology Conference 2014 Keynote

Page 1: GPU Technology Conference 2014 Keynote
Page 2: GPU Technology Conference 2014 Keynote

[Chart: TeraFLOPS, GPU vs. CPU, 2003 to 2013]

Page 3: GPU Technology Conference 2014 Keynote

GTC — GROWING AND EXPANDING

2010: 397 • 2012: 429 • 2014: 729

FASTEST GROWING TOPICS

Big Data Analytics

Machine Learning

Computer Vision

FASTEST GROWING TOPICS

Energy Exploration

Life Science & Genomics

Molecular Dynamics

#1 TOPIC

HPC / Supercomputing

Page 4: GPU Technology Conference 2014 Keynote

FOSTERING THE GPU ECOSYSTEM

Big Data / Cloud / Computer Vision

[Timeline: 2012, 2013, 2014]

AudioStreamTV

Page 5: GPU Technology Conference 2014 Keynote

CUDA EVERYWHERE

Page 6: GPU Technology Conference 2014 Keynote

Takayuki Aoki, Global Scientific Information and Computing Center, Tokyo Institute of Technology

“Large-scale CFD Applications and a Full GPU Implementation of a Weather Prediction Code on the TSUBAME Supercomputer”

Page 7: GPU Technology Conference 2014 Keynote

BANDWIDTH BOTTLENECKS

[Diagram: CPU and GPU connected by PCI Express]

PCIe: 16GB/sec

CPU Memory: 60GB/sec

GPU Memory: 288GB/sec
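The gap between these links is easiest to see as transfer time. The sketch below computes the idealized time to move one buffer at each bandwidth on the slide. The 12 GB buffer size is a hypothetical value chosen for illustration, and latency and protocol overhead are ignored, so real transfers would be slower.

```python
# Idealized transfer times at the three bandwidths on the slide.
# Assumption: a hypothetical 12 GB buffer; latency and protocol overhead ignored.

BANDWIDTH_GB_PER_S = {
    "PCIe (CPU to GPU)": 16.0,
    "CPU memory": 60.0,
    "GPU memory": 288.0,
}

def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb gigabytes at a sustained bandwidth."""
    return size_gb / bandwidth_gb_per_s

buffer_gb = 12.0  # hypothetical buffer size
for link, bw in BANDWIDTH_GB_PER_S.items():
    print(f"{link:18s} {transfer_seconds(buffer_gb, bw):.3f} s")

# How many times faster the GPU reads its own memory than PCIe can feed it.
ratio = BANDWIDTH_GB_PER_S["GPU memory"] / BANDWIDTH_GB_PER_S["PCIe (CPU to GPU)"]
print(f"GPU memory vs. PCIe: {ratio:.0f}x")
```

At these figures the GPU can read its own memory 18 times faster than data can arrive over PCIe.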

Page 8: GPU Technology Conference 2014 Keynote

INTRODUCING NVLINK

[Diagram: CPU, GPU, PCIe]

Differential with embedded clock

PCIe programming model (w/ DMA+)

Unified Memory

Cache coherency in Gen 2.0

5 to 12X PCIe
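The slide gives NVLink's speed only as a multiple of PCIe. A minimal sketch of what that range means in absolute terms, assuming the baseline is the 16GB/sec PCIe figure from the bandwidth-bottleneck slide (this slide does not state the baseline configuration):

```python
# Convert NVLink's quoted "5 to 12X PCIe" into absolute bandwidth.
# Assumption: PCIe baseline = 16 GB/s, from the bandwidth-bottleneck slide.

PCIE_GB_PER_S = 16.0
GPU_MEMORY_GB_PER_S = 288.0  # GPU memory bandwidth from the same slide

def nvlink_range(pcie_gb_per_s: float, low_x: float = 5, high_x: float = 12):
    """Return the (low, high) bandwidth implied by a multiplier range over PCIe."""
    return pcie_gb_per_s * low_x, pcie_gb_per_s * high_x

low, high = nvlink_range(PCIE_GB_PER_S)
print(f"Implied NVLink bandwidth: {low:.0f} to {high:.0f} GB/s")

# How far each link still lags the GPU's own memory bandwidth.
for name, bw in (("PCIe", PCIE_GB_PER_S), ("NVLink (low)", low), ("NVLink (high)", high)):
    print(f"{name:14s} GPU memory is {GPU_MEMORY_GB_PER_S / bw:.1f}x faster")
```

Under this assumption the gap between the interconnect and GPU memory shrinks from 18x to between 3.6x and 1.5x.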

Page 9: GPU Technology Conference 2014 Keynote

5X More Bandwidth for Multi-GPU Scaling

[Diagram: CPU, PCIe SWITCH, and four GPUs]

Page 10: GPU Technology Conference 2014 Keynote

3D MEMORY

3D Chip-on-Wafer integration

Many X bandwidth

2.5X capacity

4X energy efficiency

[Chart: Memory Bandwidth, 2008 to 2016]

Page 11: GPU Technology Conference 2014 Keynote

Blaise Pascal 1623-1662

Mechanical Calculator

Probability Theory

Pascal’s Theorem

Pascal’s Law

Page 12: GPU Technology Conference 2014 Keynote

PASCAL

NVLink: 5 to 12X PCIe 3.0

3D Memory: 2 to 4X memory BW & size

Module: 1/3 size of PCIe card

Page 13: GPU Technology Conference 2014 Keynote

GPU ROADMAP

[Chart: SGEMM / W (normalized), 2008 to 2016]

Tesla: CUDA

Fermi: FP64

Kepler: Dynamic Parallelism

Maxwell: DX12

Pascal: Unified Memory, 3D Memory, NVLink

Page 14: GPU Technology Conference 2014 Keynote

MACHINE LEARNING

Branch of Artificial Intelligence

Computers that learn from data

[Image: example objects labeled person, car, helmet, motorcycle, bird, frog, dog, chair, hammer, flower pot, power drill]

Page 15: GPU Technology Conference 2014 Keynote

Machine Learning using Deep Neural Networks

[Diagram: Input, Result]
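As a generic illustration of the input-to-result path in a neural network (not the specific networks shown in the keynote), the sketch below runs a forward pass through one hidden layer with ReLU activations and a softmax output; a deep network stacks many such layers. The layer sizes and weights are arbitrary placeholders, and training, where the learning from data actually happens, is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * weights[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def softmax(x: Vector) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x: Vector, layers: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Apply each hidden layer with ReLU, then the last layer with softmax."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = relu(dense(x, w, b))
    return softmax(dense(x, w_out, b_out))

# Placeholder network: 3 inputs -> 4 hidden units -> 2 output classes.
layers = [
    ([[0.2, -0.1, 0.4, 0.0],
      [0.5, 0.3, -0.2, 0.1],
      [-0.3, 0.8, 0.1, 0.6]], [0.0, 0.1, 0.0, -0.1]),
    ([[1.0, -1.0],
      [0.5, 0.2],
      [-0.4, 0.9],
      [0.3, 0.3]], [0.0, 0.0]),
]
probs = forward([1.0, 0.5, -0.5], layers)
print(probs)  # one probability per class; they sum to 1
```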

Page 16: GPU Technology Conference 2014 Keynote

Building High-level Features Using Large Scale Unsupervised Learning

Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng

Stanford / Google

1 billion connections

10 million 200x200 pixel images

1,000 machines (16,000 cores)

3 days

Page 17: GPU Technology Conference 2014 Keynote

1,000 CPU Servers 2,000 CPUs • 16,000 cores

600 kWatts

$5,000,000

GOOGLE BRAIN

Today’s Largest Networks: 1B connections • 10M images • ~3 days • ~30 ExaFLOPS

Human Brain: ~100B neurons x 1,000 connections • 500M images • 5,000,000X “Google Brain” • ~150 YottaFLOPS • ~40,000 “Google Brain-Years”

SOURCE: Ian Goodfellow
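The Human Brain figures follow from the Google Brain figures by scaling. The sketch below reproduces them, assuming (as the numbers suggest) that the 5,000,000X factor is the connection ratio times the image ratio, and that total compute and training time scale linearly with that factor on the same hardware.

```python
# Reproduce the scaling arithmetic on the "Google Brain" vs. "Human Brain" slide.
# Assumption: compute and time scale linearly with connections and training images.

google_connections = 1e9   # 1B connections
google_images = 10e6       # 10M images
google_days = 3            # ~3 days
google_flops = 30e18       # ~30 ExaFLOPS total

human_connections = 100e9 * 1_000  # ~100B neurons x 1,000 connections
human_images = 500e6               # 500M images

scale = (human_connections / google_connections) * (human_images / google_images)
total_flops = google_flops * scale
brain_years = google_days * scale / 365

print(f"scale factor: {scale:,.0f}X")                          # 5,000,000X
print(f"total compute: {total_flops / 1e24:,.0f} YottaFLOPS")  # 150
print(f"Google Brain-years: {brain_years:,.0f}")               # 41,096
```

The 41,096 years computed here is what the slide rounds to ~40,000.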

Page 18: GPU Technology Conference 2014 Keynote

Deep Learning with COTS HPC Systems

A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro

Stanford / NVIDIA • ICML 2013

STANFORD AI LAB

3 GPU-Accelerated Servers 12 GPUs • 18,432 cores

4 kWatts

$33,000

“Now You Can Build Google’s $1M Artificial Brain on the Cheap”

-Wired

1,000 CPU Servers 2,000 CPUs • 16,000 cores

600 kWatts

$5,000,000

GOOGLE BRAIN

Page 19: GPU Technology Conference 2014 Keynote

DEMO: MACHINE LEARNING, SIMPLE TRAINING SET

Page 20: GPU Technology Conference 2014 Keynote

DEMO: MACHINE LEARNING, NYU OVERFEAT

1.2M image training set • 1000 classes • 2 weeks of training • 7 GPUs • 25 EXAFLOPS total to train

Page 21: GPU Technology Conference 2014 Keynote

CUDA for MACHINE LEARNING

Talks @ GTC

Image Detection

Face Recognition

Gesture Recognition

Video Search & Analytics

Speech Recognition & Translation

Recommendation Engines

Indexing & Search

Use Cases Early Adopters

Image Analytics for Creative Cloud

Image Classification

Speech/Image Recognition

Recommendation

Hadoop

Search Rankings

Page 22: GPU Technology Conference 2014 Keynote

Big Data & Infinite Compute Turbocharge Deep Learning

SOURCE: KPCB/Mary Meeker, company data. Unstructured data: IDC's Digital Universe Study.

800M photos uploaded per day • 100 hours of video uploaded per minute • Unstructured data exploding

[Chart: photos uploaded per day (millions), 2007 to 2014: Facebook, Instagram, Snapchat, Flickr]

[Chart: hours of video uploaded per minute (YouTube), 2007 to 2013]

[Chart: unstructured data (exabytes): 1,104 in 2010, 5,379 in 2015]
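The two unstructured-data values imply a growth rate. A minimal sketch computing the compound annual growth rate between 2010 and 2015, assuming smooth exponential growth between the two points (the 2015 value is a projection, since the keynote predates it):

```python
# Implied compound annual growth rate (CAGR) of unstructured data,
# from the chart's 2010 and 2015 values in exabytes.

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0

rate = cagr(1_104, 5_379, 2015 - 2010)
print(f"implied growth: {rate:.1%} per year")
```

The result is roughly 37% per year, meaning the volume of unstructured data nearly quintuples over the five years.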

Page 23: GPU Technology Conference 2014 Keynote
Page 24: GPU Technology Conference 2014 Keynote

DEMO: TITAN Z REVEAL

Page 25: GPU Technology Conference 2014 Keynote

5,760 CUDA cores

12GB memory

8 TeraFLOPS

$2999
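Peak single-precision throughput is commonly estimated as cores x 2 FLOPs per clock (one fused multiply-add) x clock rate. The sketch below works backward from the slide's 5,760 cores and 8 TeraFLOPS to the clock rate that figure implies, and does the same for the Tegra K1's 192 cores and 326 GFLOPS quoted later in the keynote. The 2-FLOPs-per-clock convention is an assumption; the slides state only the totals, and 8 TeraFLOPS is a rounded figure, so the implied clock is approximate.

```python
# Peak FLOPS = cores x FLOPs per core per clock x clock rate.
# Assumption: 2 single-precision FLOPs per core per clock (one fused multiply-add).

FLOPS_PER_CORE_PER_CLOCK = 2

def implied_clock_mhz(cores: int, peak_flops: float) -> float:
    """Clock rate in MHz at which `cores` would reach `peak_flops`."""
    return peak_flops / (cores * FLOPS_PER_CORE_PER_CLOCK) / 1e6

titan_z_mhz = implied_clock_mhz(5_760, 8e12)   # 8 TeraFLOPS
tegra_k1_mhz = implied_clock_mhz(192, 326e9)   # 326 GFLOPS
print(f"Titan Z:  {titan_z_mhz:.0f} MHz")
print(f"Tegra K1: {tegra_k1_mhz:.0f} MHz")
```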

Page 26: GPU Technology Conference 2014 Keynote

STANFORD AI LAB

1 Titan Z-Accelerated Server 3 Titan Zs • 17,280 cores

2 kWatts

$12,000

1,000 CPU Servers 2,000 CPUs • 16,000 cores

600 kWatts

$5,000,000

GOOGLE BRAIN

300X energy efficiency

400X lower cost

Fits next to a desk
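The 300X and 400X claims are ratios against the Google Brain cluster. The sketch below recomputes them from the slide figures, along with the same ratios for the earlier 3-server GPU system. Like the slides, it assumes each system does equivalent training work, so the power ratio stands in for energy efficiency.

```python
# Power and cost ratios against the Google Brain cluster, from the slide figures.
# Assumption (as on the slides): each system does equivalent training work.

SYSTEMS = {
    "Google Brain (1,000 CPU servers)": {"kw": 600, "usd": 5_000_000},
    "Stanford AI Lab (3 GPU servers)": {"kw": 4, "usd": 33_000},
    "Stanford AI Lab (1 Titan Z server)": {"kw": 2, "usd": 12_000},
}

BASELINE = SYSTEMS["Google Brain (1,000 CPU servers)"]

def ratios(spec: dict) -> tuple:
    """Return (power ratio, cost ratio) of the baseline relative to `spec`."""
    return BASELINE["kw"] / spec["kw"], BASELINE["usd"] / spec["usd"]

for name, spec in SYSTEMS.items():
    power_x, cost_x = ratios(spec)
    print(f"{name:35s} {power_x:5.0f}X less power, {cost_x:5.0f}X lower cost")
```

The cost ratio for the Titan Z server comes out near 417X, which the slide rounds to 400X.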

Page 27: GPU Technology Conference 2014 Keynote

RenderMan with programmable shading

1.5 hours to render each frame

CCI 6/32 minicomputer

First CGI Film Nominated for an Academy Award®

Page 28: GPU Technology Conference 2014 Keynote

State-of-the-art water simulator

48 hours to simulate the base water

250 hours to render each frame

2013 Academy Award® Winner

BEST VISUAL EFFECTS

Page 29: GPU Technology Conference 2014 Keynote

DEMO: WHALE

Page 30: GPU Technology Conference 2014 Keynote

DEMO: FLEX

Page 31: GPU Technology Conference 2014 Keynote

DEMO: FLAMEWORKS

Page 32: GPU Technology Conference 2014 Keynote

DEMO: UE4

Page 33: GPU Technology Conference 2014 Keynote

One is a photo, one is Iray…

Page 34: GPU Technology Conference 2014 Keynote

IRAY VCA SCALABLE GPU RENDERING APPLIANCE

Bunkspeed • Maya • Catia • 3ds Max

GPUs: 8 Kepler-class

GPU memory: 12GB per GPU

CUDA cores: 23,040

Network: 2 x 1GigE, 2 x 10GigE, 1 x InfiniBand
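Two derived figures follow from the spec list: CUDA cores per GPU and total GPU memory. Note that memory is listed per GPU. If each GPU holds its own copy of the scene, as is common in multi-GPU rendering, then the 12GB per GPU, not the total, bounds scene size; that is an assumption about Iray, not something the slide states.

```python
# Derived totals from the Iray VCA spec list.
GPUS = 8
CUDA_CORES_TOTAL = 23_040
MEMORY_PER_GPU_GB = 12

assert CUDA_CORES_TOTAL % GPUS == 0  # cores divide evenly across the GPUs
cores_per_gpu = CUDA_CORES_TOTAL // GPUS        # CUDA cores on each GPU
total_gpu_memory_gb = GPUS * MEMORY_PER_GPU_GB  # aggregate across all GPUs

print(f"{cores_per_gpu:,} CUDA cores per GPU")          # 2,880
print(f"{total_gpu_memory_gb} GB GPU memory in total")  # 96
```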

Page 35: GPU Technology Conference 2014 Keynote

DEMO: IRAY / HONDA

Page 36: GPU Technology Conference 2014 Keynote

[Chart: Relative Performance (0 to 80): CPU-only Workstation, Quadro K5000 Workstation, Iray VCA]

IRAY VCA SCALABLE GPU RENDERING APPLIANCE

Bunkspeed • Maya • Catia • 3ds Max

MSRP $50,000

Page 37: GPU Technology Conference 2014 Keynote
Page 38: GPU Technology Conference 2014 Keynote

GRID GPU in the Cloud

Page 39: GPU Technology Conference 2014 Keynote

Ben Fathi, Chief Technology Officer

Horizon DaaS Platform

Page 40: GPU Technology Conference 2014 Keynote

Mobile CUDA

Page 41: GPU Technology Conference 2014 Keynote

“10 of the Top 10” Greenest Supercomputers Powered by CUDA GPUs

Page 42: GPU Technology Conference 2014 Keynote

TEGRA K1 Mobile Super Chip

Unify GPU and Tegra Architecture

192 fully programmable CUDA cores

326 GFLOPS

4X energy efficiency over A15

[Diagram: GPU ARCHITECTURE (Tesla, Fermi, Kepler, Maxwell) and MOBILE ARCHITECTURE (Tegra 3, Tegra 4, Tegra K1)]

Page 43: GPU Technology Conference 2014 Keynote

Computer Vision on CUDA

Feature Detection / Tracking

~30 GFLOPS @ 30 Hz

Object Recognition / Tracking

~180 GFLOPS @ 30 Hz

3D Scene Interpretation

~280 GFLOPS @ 30 Hz
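These workloads can be compared with the 326 GFLOPS quoted for Tegra K1. The sketch below converts each rate into work per frame at 30 Hz and into a fraction of that peak. Peak figures are theoretical, so the sustained throughput real code achieves would be lower and the actual headroom smaller.

```python
# Per-frame compute implied by each computer-vision workload on the slide,
# and its share of Tegra K1's quoted 326 GFLOPS peak.

FRAME_RATE_HZ = 30
TEGRA_K1_PEAK_GFLOPS = 326

WORKLOADS_GFLOPS = {
    "Feature Detection / Tracking": 30,
    "Object Recognition / Tracking": 180,
    "3D Scene Interpretation": 280,
}

for name, gflops in WORKLOADS_GFLOPS.items():
    per_frame = gflops / FRAME_RATE_HZ     # GFLOP of work per frame
    share = gflops / TEGRA_K1_PEAK_GFLOPS  # fraction of theoretical peak
    print(f"{name:30s} {per_frame:5.2f} GFLOP/frame  {share:6.1%} of peak")
```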

Page 44: GPU Technology Conference 2014 Keynote

JETSON TK1: 1st MOBILE SUPERCOMPUTER FOR EMBEDDED SYSTEMS

192 CUDA cores

326 GFLOPS

VisionWorks SDK

$192

Page 45: GPU Technology Conference 2014 Keynote

VISIONWORKS COMPUTER VISION ON CUDA

Driver Assistance • Computational Photography • Augmented Reality • Robotics

[Diagram: Your Code; Sample Pipelines (Object Detection / Tracking, Structure from Motion …); VisionWorks Primitives (Classifier, Corner Detection …); CUDA; Jetson TK1]

Page 46: GPU Technology Conference 2014 Keynote

TEGRA ROADMAP

[Chart: Single Precision GFLOPS / W (normalized), 2011 to 2015]

Tegra 2

Tegra 3

Tegra 4

Tegra K1: Kepler GPU, CUDA, 64b & 32b CPU

Erista: Maxwell GPU

Page 47: GPU Technology Conference 2014 Keynote

Andreas Reich, Head of Audi Pre-Development

Page 48: GPU Technology Conference 2014 Keynote

VIDEO: AUDI ADAS

Page 49: GPU Technology Conference 2014 Keynote
Page 50: GPU Technology Conference 2014 Keynote
Page 51: GPU Technology Conference 2014 Keynote

CUDA EVERYWHERE

PASCAL • PC • CLOUD • MOBILE

Page 52: GPU Technology Conference 2014 Keynote

DEMO: PORTAL ON SHIELD

Page 53: GPU Technology Conference 2014 Keynote