Tuesday, 9 May 2017 Andrew Edelsten - NVIDIA Developer ... · 2 DEEP LEARNING FOR ART Active R&D...

Tuesday, 9 May 2017

Andrew Edelsten - NVIDIA Developer Technologies

ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING

2

DEEP LEARNING FOR ART

Active R&D but ready now

▪ Style transfer

▪ Generative networks creating images and voxels

▪ Adversarial networks (DCGAN) – still early but promising

▪ DL & ML based tools from NVIDIA and partners

▪ NVIDIA

▪ Artomatix

▪ Allegorithmic

▪ Autodesk

3

STYLE TRANSFER

Something Fun

[Images: content image, style image]

▪ Doodle a masterpiece!

▪ Uses a CNN to take the “style” from one image and apply it to another

▪ Sept 2015: A Neural Algorithm of Artistic Style by Gatys et al

▪ Dec 2015: neural-style (github)

▪ Mar 2016: neural-doodle (github)

▪ Mar 2016: texture-nets (github)

▪ Oct 2016: fast-neural-style (github)

▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)

▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram

4

HTTP://OSTAGRAM.RU/STATIC_PAGES/LENTA

5

STYLE TRANSFER

▪ Game remaster & texture enhancement

▪ Try Neural Style and use a real-world photo for the “style”

▪ For stylized or anime up-rez try https://github.com/nagadomi/waifu2x

▪ Experiment with art styles

▪ Dream or power-up sequences

▪ “Come Swim” by Kristen Stewart - https://arxiv.org/pdf/1701.04928v1.pdf

Something Useful


6

GAMEWORKS: MATERIALS & TEXTURES

Using DL for Game Development & Content Creation

▪ Set of tools targeting the game industry using machine learning and deep learning

▪ Launched at the Game Developers Conference in March 2017; the tools run as a web service

▪ Sign up for the Beta at: https://gwmt.nvidia.com

▪ Tools in this initial release:

▪ Photo to Material: 2shot

▪ Texture Multiplier

▪ Super-Resolution


7

PHOTO TO MATERIAL

▪ From two photos of a surface, generate a “material”

▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland)

▪ “Two-Shot SVBRDF Capture for Stationary Materials”

▪ https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/

▪ Input is pixel-aligned “flash” and “guide” photographs

▪ Use tripod and remote shutter or bracket

▪ Or align later

▪ Use for flat surfaces with repeating patterns

The 2Shot Tool


8

MATERIAL SYNTHESIS FROM TWO PHOTOS

Inputs: flash image, guide image

Outputs: diffuse albedo, specular, normals, glossiness, anisotropy

9

TEXTURE MULTIPLIER

▪ Put simply: texture in, new texture out

▪ Inspired by Gatys, Ecker & Bethge

▪ Texture Synthesis Using Convolutional Neural Networks

▪ https://arxiv.org/pdf/1505.07376.pdf

▪ Artomatix

▪ Similar product “Texture Mutation”

▪ https://artomatix.com/

Organic variations of textures


10

SUPER RESOLUTION

11

SUPER RESOLUTION

Zoom.. ENHANCE!

[Comic: “Zoom in on the license plate.” “Can you enhance that?” Replies: “OK!” “Sure!”]

12

SUPER RESOLUTION

The task at hand

Given a low-resolution image of size W × H, construct a high-resolution image of size (n * W) × (n * H)

[Diagram: low-resolution image → Upscale (magic?) → high-resolution image]

13

UPSCALE: CREATE MORE PIXELS

An ill-posed task?

[Diagram: the pixels of the given image are spread over the grid of pixels of the upscaled image; every pixel in between is unknown (“?”)]
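The grid of question marks can be made concrete with a small sketch. It assumes the known pixels land on every n-th row and column of the upscaled grid, which is one plausible reading of the diagram; the function name `scatter_known_pixels` is made up for this example.

```python
import numpy as np

def scatter_known_pixels(lr: np.ndarray, n: int) -> np.ndarray:
    """Place each low-resolution pixel on an n-times larger grid.

    Every pixel that has no known value is marked as NaN (the "?" cells).
    """
    h, w = lr.shape
    hr = np.full((n * h, n * w), np.nan)
    hr[::n, ::n] = lr  # only every n-th row and column is known
    return hr

lr = np.arange(6, dtype=float).reshape(2, 3)   # a tiny 2 x 3 "image"
hr = scatter_known_pixels(lr, n=3)
unknown_fraction = np.isnan(hr).mean()         # share of pixels to be invented
print(hr.shape, unknown_fraction)
```

For an n-times upscale, a fraction 1 - 1/n² of the output pixels has to be filled in, which is why the task looks ill-posed.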

14

TRADITIONAL APPROACH

▪ Interpolation (bicubic, Lanczos, etc.)

▪ Interpolation + sharpening (and other filtering)

[Images: filter-based sharpening; interpolation]

▪ Rough estimation of the data behavior: too general

▪ Too many possibilities (an 8x8 8-bit grayscale image has $256^{8 \times 8} = 2^{512} \approx 1.3 \times 10^{154}$ pixel combinations!)
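A minimal sketch of the traditional pipeline, for comparison with what follows. It uses bilinear interpolation and a 3x3 box-blur unsharp mask purely for brevity; the slide names bicubic and Lanczos, and the function names and the `amount` parameter are assumptions of this example.

```python
import numpy as np

def upscale_bilinear(img: np.ndarray, n: int) -> np.ndarray:
    """Upscale a 2-D image n times with separable linear interpolation."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, n * h)  # output rows in input coordinates
    xs = np.linspace(0, w - 1, n * w)  # output columns in input coordinates
    rows = np.stack([np.interp(xs, np.arange(w), r) for r in img])        # (h, n*w)
    return np.stack([np.interp(ys, np.arange(h), c) for c in rows.T], axis=1)

def box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 box blur with edge replication."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def unsharp_mask(img: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Filter-based sharpening: add back the detail removed by a blur."""
    return img + amount * (img - box_blur3(img))

lr_img = np.zeros((4, 4))
lr_img[:, 2:] = 1.0                    # a vertical step edge
upscaled = upscale_bilinear(lr_img, 2)
sharpened = unsharp_mask(upscaled)
print(upscaled.shape, sharpened.min(), sharpened.max())
```

Interpolation can only blend existing values, and sharpening can only exaggerate edges that are already there; neither adds information about the missing pixels.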

15

A NEW APPROACH

First: narrow the possible set

Focus on the domain of “natural images”

[Diagram: the set of all possible images, with natural images, photos, and textures as regions inside it]

16

A NEW APPROACH

Second: place the image in the domain, then reconstruct

Data from natural images is sparse; it is compressible in some domain

Then “reconstruct” images (rather than create new ones)

[Diagram: Compress → Reconstruct, using + prior information and + constraints]
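To illustrate what "compressible in some domain" means, here is a hedged sketch using the Fourier domain on a 1-D signal. The slides do not say which domain is meant; the Fourier transform and the synthetic signals are assumptions chosen only because they are easy to reproduce.

```python
import numpy as np

def keep_largest_coefficients(signal: np.ndarray, k: int) -> np.ndarray:
    """Compress by keeping the k largest Fourier coefficients (k >= 1),
    then reconstruct the signal from them."""
    coeffs = np.fft.rfft(signal)
    drop = np.argsort(np.abs(coeffs))[:-k]  # everything except the k largest
    compressed = coeffs.copy()
    compressed[drop] = 0
    return np.fft.irfft(compressed, n=len(signal))

def relative_error(signal: np.ndarray, k: int) -> float:
    approx = keep_largest_coefficients(signal, k)
    return float(np.linalg.norm(signal - approx) / np.linalg.norm(signal))

t = np.linspace(0, 1, 256, endpoint=False)
structured = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
noise = np.random.default_rng(0).standard_normal(256)

# Two coefficients describe the structured signal; noise is not compressible.
print(relative_error(structured, 2), relative_error(noise, 2))
```

Structured data survives the compress-then-reconstruct round trip; arbitrary data from the space of "all possible images" does not.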

17

PATCH-BASED MAPPING: TRAINING

[Diagram: training images → (LR, HR) pairs of patches → training → model params; the model is a mapping from a low-resolution patch to a high-resolution patch]
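One way to build the (LR, HR) patch pairs is sketched below. It assumes the LR image is obtained by block-averaging the HR training image and that patches are taken on a non-overlapping grid; the slides specify neither, and `downscale` and `patch_pairs` are names invented for this example.

```python
import numpy as np

def downscale(img: np.ndarray, n: int) -> np.ndarray:
    """Average n x n blocks (a stand-in for the unspecified downscaling)."""
    h, w = img.shape
    cropped = img[:h - h % n, :w - w % n]
    return cropped.reshape(h // n, n, w // n, n).mean(axis=(1, 3))

def patch_pairs(hr_img: np.ndarray, n: int, lr_size: int):
    """Return a list of (LR patch, HR patch) pairs covering the same area."""
    lr_img = downscale(hr_img, n)
    hr_size = n * lr_size
    pairs = []
    for i in range(0, lr_img.shape[0] - lr_size + 1, lr_size):
        for j in range(0, lr_img.shape[1] - lr_size + 1, lr_size):
            lr_patch = lr_img[i:i + lr_size, j:j + lr_size]
            hr_patch = hr_img[n * i:n * i + hr_size, n * j:n * j + hr_size]
            pairs.append((lr_patch, hr_patch))
    return pairs

training_image = np.random.default_rng(0).random((16, 16))
pairs = patch_pairs(training_image, n=2, lr_size=4)
print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)
```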

18

PATCH-BASED MAPPING

[Diagram: LR patch $x_L$ → Encode → high-level information about the patch → Decode → HR patch $x_H$]

19

PATCH-BASED MAPPING: SPARSE CODING

[Diagram: LR patch $x_L$ → Encode → sparse code (high-level information about the patch, the “features”) → Decode → HR patch $x_H$]

20

PATCH FEATURES & RECONSTRUCTION

An image patch can be reconstructed as a sparse linear combination of features

Features are learned from the dataset over time

$x = Dz = d_1 z_1 + \dots + d_K z_K$

Example: $x = 0.8\, d_{36} + 0.3\, d_{42} + 0.5\, d_{63}$

$D$ - dictionary

$x$ - patch

$z$ - sparse code
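The reconstruction formula can be checked numerically. This sketch uses a random dictionary rather than a learned one, and the patch size (8x8) and the number of atoms K = 100 are hypothetical; only the three coefficients come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
patch_dim, K = 64, 100                 # hypothetical: 8x8 patches, 100 atoms
D = rng.standard_normal((patch_dim, K))
D /= np.linalg.norm(D, axis=0)         # unit-norm columns d_1 .. d_K

# Sparse code: only three of the K entries are non-zero.
z = np.zeros(K)
z[[35, 41, 62]] = [0.8, 0.3, 0.5]      # d_36, d_42, d_63 with 0-based indexing

x = D @ z                              # x = Dz
x_explicit = 0.8 * D[:, 35] + 0.3 * D[:, 41] + 0.5 * D[:, 62]
print(np.allclose(x, x_explicit), np.count_nonzero(z))
```

In a real system the dictionary would be learned from patches and z would be found by a sparse solver; both steps are outside this sketch.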

21

GENERALIZED PATCH-BASED MAPPING

[Diagram: LR patch → Mapping → high-level representation of the LR patch (“features”) → mapping in feature space → high-level representation of the HR patch → Mapping → HR patch]

22

GENERALIZED PATCH-BASED MAPPING


[Diagram: LR patch → mappings with trainable parameters $W_1$, $W_2$, $W_3$ → HR patch]
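A sketch of the three-stage mapping on a single flattened patch. The ReLU nonlinearity, the layer sizes, and the absence of bias terms are assumptions; the slides only show three trainable parameter sets, and the weights here are random, not trained.

```python
import numpy as np

def relu(a: np.ndarray) -> np.ndarray:
    return np.maximum(a, 0.0)

def map_patch(x_lr, W1, W2, W3):
    """LR patch -> LR features -> HR features -> HR patch."""
    features_lr = relu(W1 @ x_lr)         # extract features from the LR patch
    features_hr = relu(W2 @ features_lr)  # mapping in feature space
    return W3 @ features_hr               # reconstruct the HR patch

rng = np.random.default_rng(0)
lr_dim, feat_dim, hr_dim = 16, 32, 64     # hypothetical: 4x4 LR, 8x8 HR patches
W1 = rng.standard_normal((feat_dim, lr_dim))
W2 = rng.standard_normal((feat_dim, feat_dim))
W3 = rng.standard_normal((hr_dim, feat_dim))

x_hr = map_patch(rng.random(lr_dim), W1, W2, W3)
print(x_hr.shape)
```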

23

MAPPING OF THE WHOLE IMAGE

Using Convolutions

[Diagram: LR image → mappings implemented as convolutional operators → HR image]
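Why convolutions let a patch mapping cover the whole image: a convolution applies the same linear patch map at every position. The sketch below shows this for a single 3x3 kernel, using cross-correlation without padding as CNN layers usually do; the kernel values and the function name `conv2d_valid` are arbitrary choices for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply the same linear patch map at every image position."""
    windows = sliding_window_view(img, kernel.shape)   # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
img = rng.random((10, 12))
kernel = rng.standard_normal((3, 3))
out = conv2d_valid(img, kernel)

# The value at (2, 3) is just the patch map applied to one 3x3 patch.
one_patch = np.sum(img[2:5, 3:6] * kernel)
print(out.shape, np.isclose(out[2, 3], one_patch))
```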

24

AUTO-ENCODERS

[Diagram: input → auto-encoder → output ≈ input]

25

AUTO-ENCODER

[Diagram: input → Encode → features → Decode → output ≈ input]

26

AUTO-ENCODER

[Diagram: input $x$ → auto-encoder with parameters $W$ → output $y$]

Training

$W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$

$x_i$ - training set

Inference

$y = F_W(x)$
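The training objective can be tried on a toy problem. This is a sketch under strong assumptions: a linear auto-encoder with a 2-dimensional bottleneck, squared error as Dist, synthetic data lying in a 2-D subspace, and plain gradient descent. None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((8, 2)))[0].T   # (2, 8), orthonormal rows
X = rng.standard_normal((200, 2)) @ basis                # training set x_i, shape (200, 8)

W_enc = 0.1 * rng.standard_normal((2, 8))                # encoder: 8 -> 2 features
W_dec = 0.1 * rng.standard_normal((8, 2))                # decoder: 2 -> 8

def autoencode(x):
    """F_W(x): encode to features, then decode back."""
    return x @ W_enc.T @ W_dec.T

def loss():
    """Mean over i of Dist(x_i, F_W(x_i)) with squared-error distance."""
    return float(np.mean(np.sum((autoencode(X) - X) ** 2, axis=1)))

initial_loss = loss()
step_size = 0.05
for _ in range(2000):
    Z = X @ W_enc.T                      # features
    R = Z @ W_dec.T - X                  # reconstruction residual
    grad_dec = 2 * R.T @ Z / len(X)
    grad_enc = 2 * (R @ W_dec).T @ X / len(X)
    W_dec -= step_size * grad_dec
    W_enc -= step_size * grad_enc
final_loss = loss()
print(initial_loss, final_loss)
```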

27

AUTO-ENCODER

[Diagram: input → Encode → features, with information loss]

▪ Our encoder is LOSSY by definition

28

SUPER-RESOLUTION AUTO-ENCODER

[Diagram: input $x$ → auto-encoder with parameters $W$ → output $y$]

Training

$W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$

Inference

$y = F_W(x)$

29

SUPER RESOLUTION AE: TRAINING

$W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(D(x_i)))$

[Diagram: ground-truth HR image $x$ → downscaling $D$ → LR image $\hat{x}$ → SR AE $F_W$ (parameters $W$) → reconstructed HR image $y$]
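The training loop (downscale the ground truth, reconstruct, compare with the ground truth) can be shown in closed form on 1-D signals. The sketch assumes a purely linear F_W, squared error as Dist, pair-averaging as the downscaling D, and synthetic random-walk signals as the training set; the real model is a convolutional network, so this is only an analogy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lr_len = 2, 8
hr_len = n * lr_len
X = np.cumsum(rng.standard_normal((500, hr_len)), axis=1)  # ground-truth HR signals x_i

def D(x):
    """Downscaling: average each group of n neighbouring samples."""
    return x.reshape(*x.shape[:-1], lr_len, n).mean(axis=-1)

X_lr = D(X)                                                # LR inputs D(x_i)

# argmin_W sum_i ||x_i - W D(x_i)||^2, solved by least squares.
W = np.linalg.lstsq(X_lr, X, rcond=None)[0].T              # shape (hr_len, lr_len)

def F(x_lr):
    """The trained (linear) upscaler F_W."""
    return x_lr @ W.T

train_error = float(np.mean((X - F(X_lr)) ** 2))
# Baseline: nearest-neighbour upscaling, which is also a linear map.
baseline_error = float(np.mean((X - np.repeat(X_lr, n, axis=-1)) ** 2))
print(train_error, baseline_error)
```

Because nearest-neighbour upscaling is itself one possible W, the fitted W cannot do worse than it on the training set.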

30

SUPER RESOLUTION AE: INFERENCE

[Diagram: given LR image $\hat{x}$ → SR AE $F_W$ (parameters $W$) → constructed HR image $y$]

$y = F_W(\hat{x})$

31

SUPER-RESOLUTION: ILL-POSED TASK?

32

THE LOSS FUNCTION

33

THE LOSS FUNCTION

Measuring the “distance” from a good result

The distance function is a key element in obtaining good results.

$\sum_i D(x_i, F_W(x_i))$

The choice of the loss function is an important decision

34

LOSS FUNCTION

MSE (Mean Squared Error)

$\mathrm{MSE} = \frac{1}{N} \lVert x - F(x) \rVert^2$

35

LOSS FUNCTION: PSNR

PSNR (Peak Signal-to-Noise Ratio)

$\mathrm{PSNR} = 10 \cdot \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}$
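Both formulas are short enough to write out. In this sketch `max_value` stands for MAX, the largest possible pixel value; it is set to 1.0 on the assumption that images are stored as floats in [0, 1].

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error: (1/N) * ||x - y||^2."""
    return float(np.mean((x - y) ** 2))

def psnr(x: np.ndarray, y: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; undefined (infinite) when MSE is 0."""
    return float(10 * np.log10(max_value ** 2 / mse(x, y)))

reference = np.zeros((4, 4))
result = reference + 0.1               # every pixel is off by 0.1
print(mse(reference, result), psnr(reference, result))
```

PSNR grows as the result improves, so a training loop that minimizes a loss would presumably use its negative; the slides do not say how it was combined.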

36

LOSS FUNCTION: HFEN

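PSNR (Peak Signal-to-Noise Ratio)

$\mathrm{PSNR} = 10 \cdot \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}$

HFEN (High Frequency Error Norm, see A)

$\mathrm{HFEN} = \lVert \mathrm{HP}(x - F(x)) \rVert_2$

$\mathrm{HP}$ - high-pass filter

Perceptual loss

Ref A: http://ieeexplore.ieee.org/document/5617283/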
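A sketch of why a high-frequency error norm adds something over MSE. The slide only says "high-pass filter"; the 3x3 Laplacian kernel used here is an assumption made for brevity, not necessarily the filter used in Ref A or in the talk.

```python
import numpy as np

LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])   # assumed high-pass kernel

def high_pass(img: np.ndarray) -> np.ndarray:
    """Filter with the 3x3 Laplacian, replicating edge pixels."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(LAPLACIAN[i, j] * p[i:i + h, j:j + w]
               for i in range(3) for j in range(3))

def hfen(x: np.ndarray, y: np.ndarray) -> float:
    """L2 norm of the high-pass filtered error."""
    return float(np.linalg.norm(high_pass(x - y)))

x = np.random.default_rng(0).random((8, 8))
i, j = np.indices(x.shape)
y_offset = x + 0.1                          # smooth error: uniform brightness shift
y_checker = x + 0.1 * (-1.0) ** (i + j)     # high-frequency error: checkerboard

# Both errors have the same MSE (0.01) but very different HFEN.
print(hfen(x, y_offset), hfen(x, y_checker))
```

MSE cannot tell these two errors apart, while the high-pass norm penalizes only the one that damages fine detail.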

37

REGULAR LOSS

[Images: two 4x upscaling results]

38

REGULAR LOSS + PERCEPTUAL LOSS

[Images: two 4x upscaling results]

39

WARNING… THIS IS EXPERIMENTAL!

40

SUPER-RESOLUTION: GAN-BASED LOSS

Total loss = Regular (MSE + PSNR + HFEN) loss + GAN loss

GAN loss $= -\ln D(F(x))$

[Diagram: the generator maps $x$ to $F(x)$; the discriminator maps $y$ to $D(y)$, judging it real or fake]
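The generator-side loss is a one-liner once the discriminator output is available. In this sketch the discriminator scores are supplied as plain numbers in (0, 1]; the `gan_weight` parameter and the clipping constant are assumptions added for illustration (the slide shows an unweighted sum).

```python
import numpy as np

def gan_loss(d_fake: np.ndarray) -> float:
    """-ln D(F(x)), averaged over a batch of discriminator scores."""
    return float(np.mean(-np.log(np.clip(d_fake, 1e-12, 1.0))))

def total_loss(regular_loss: float, d_fake: np.ndarray,
               gan_weight: float = 1.0) -> float:
    """Regular loss plus GAN loss; gan_weight=1.0 matches the plain sum."""
    return regular_loss + gan_weight * gan_loss(d_fake)

fooled = np.array([0.9, 0.95])      # discriminator believes the output is real
caught = np.array([0.1, 0.05])      # discriminator spots the fake
print(gan_loss(fooled), gan_loss(caught), total_loss(0.2, fooled))
```

The loss approaches zero as the discriminator is fooled, which pushes the generator toward outputs that look like real high-resolution images.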

Extended presentation from Game Developers Conference 2017

https://developer.nvidia.com/deep-learning-games

GameWorks: Materials & Textures

https://gwmt.nvidia.com

QUESTIONS?
