Vincent Vanhoucke, Brain Robotics Generative Adversarial ...

Confidential + Proprietary

Generative Adversarial Robotics
Vincent Vanhoucke, Brain Robotics



Generative Adversarial Networks (GANs)

GANs inspired two major new lines of research:

1. Conditional image generation
2. Thinking adversarially

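[Figure: GAN diagram. Random variables feed the Generator, which outputs a sample; the Discriminator receives samples from Real Data and from the Generator and classifies each as Real or Fake.]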


Generative Adversarial Networks

We can now generate realistic images.

In particular, we can generate realistic images conditionally.

What does it mean to have this new superpower?

BEGAN: Boundary Equilibrium Generative Adversarial Networks

David Berthelot, Tom Schumm, Luke Metz


Generative Adversarial Networks

Thinking adversarially opens up different ways to look at a problem.

Many problems don’t lend themselves to simple loss functions:
● Does this image look good?
● Was your interaction with this AI pleasant?
● Is this text topical?

‘You know it when you see it’ is not an easy loss to backprop through.


Generative Adversarial Networks

Adversarial training provides an elegant solution:

1. Give me some data that looks like what you want.
2. I’ll (co-)train a loss function which discriminates between the good stuff and what I produce.
3. The less I can tell the difference between the two, the more progress I make.

It’s the most honest loss function there is: if you try to cheat, the adversary will automatically adapt to catch your treachery.

Being honest has a price: adversarial losses are really hard to train :(
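A minimal sketch of the three-step recipe above, under assumptions that are not from the talk: a toy 1-D problem with a linear generator G(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c) playing the role of the learned loss, hand-derived gradients, and the common non-saturating generator loss. All function names here are hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def discriminator_loss(d_real, d_fake):
    """Cross-entropy with real samples labelled 1 and generated samples labelled 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: rewards samples the discriminator calls real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def train_step(params, real_batch, noise_batch, lr=0.05):
    """One alternating update of a toy 1-D GAN.

    params = (a, b, w, c) with G(z) = a*z + b and D(x) = sigmoid(w*x + c).
    """
    a, b, w, c = params
    fake = [a * z + b for z in noise_batch]

    # Step 2: (co-)train the loss function, i.e. gradient descent on
    # discriminator_loss with respect to the discriminator's (w, c).
    gw = gc = 0.0
    for x in real_batch:
        p = sigmoid(w * x + c)
        gw += (p - 1.0) * x / len(real_batch)
        gc += (p - 1.0) / len(real_batch)
    for x in fake:
        p = sigmoid(w * x + c)
        gw += p * x / len(fake)
        gc += p / len(fake)
    w, c = w - lr * gw, c - lr * gc

    # Step 3: update the generator against the freshly updated discriminator,
    # i.e. gradient descent on generator_loss with respect to (a, b).
    ga = gb = 0.0
    for z, x in zip(noise_batch, fake):
        p = sigmoid(w * x + c)
        dloss_dx = -(1.0 - p) * w  # d/dx of -log sigmoid(w*x + c)
        ga += dloss_dx * z / len(noise_batch)
        gb += dloss_dx / len(noise_batch)
    a, b = a - lr * ga, b - lr * gb
    return a, b, w, c
```

With real data at x = 3 and the generator starting at b = 0, a single call to `train_step((1.0, 0.0, 0.0, 0.0), [3.0] * 4, [0.0] * 4)` makes w positive (the discriminator learns that larger x looks real) and nudges b upward toward the data. Running many such steps is where the training difficulties mentioned above show up; this toy does nothing to address them.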


What does this have to do with Robotics?

1. Generating sensor inputs means you can manipulate reality with high fidelity.

2. Real-world robotics tasks are remarkably resistant to being expressed using a well-behaved loss function:

● Can’t even define the goal mathematically.
● OK, maybe we can define the goal, but it’s hell to instrument.
● You have an instrumented goal, but the loss is flat everywhere except at the goal.


Generative Adversarial Robotics

Part 1: Generative Robotics

Model-free predictive control

Closing the reality gap

Part 2: Adversarial Robotics

Adversarial grasping

Adversarial exploration

Unsupervised Imitation Learning


Visual Model Predictive Control

Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn, Ian Goodfellow, Sergey Levine

Insight: use conditional image generation to plan actions.

The robot visualizes what might happen when it takes an action, and picks the action whose predicted outcome perceptually matches the desired future.
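The planning idea can be sketched as "imagine, score, pick". This is a minimal sketch under stated assumptions: `predict_frame(frame, action)` is a hypothetical stand-in for the learned video-prediction model, candidate actions come from plain random shooting (one simple choice among many sampling-based optimizers), and "perceptually matches" is approximated by a sum of squared pixel differences to a goal image.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[float]  # a flattened toy image; real frames would be pixel arrays


def perceptual_distance(a: Frame, b: Frame) -> float:
    """Stand-in for a perceptual metric: sum of squared pixel differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plan_action(
    frame: Frame,
    goal: Frame,
    predict_frame: Callable[[Frame, Sequence[float]], Frame],
    candidates: Sequence[Sequence[float]],
) -> Tuple[Optional[Sequence[float]], float]:
    """Imagine the outcome of every candidate action and keep the one closest to the goal."""
    best_action: Optional[Sequence[float]] = None
    best_cost = float("inf")
    for action in candidates:
        imagined = predict_frame(frame, action)  # conditional image generation
        cost = perceptual_distance(imagined, goal)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action, best_cost


def random_candidates(n: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Random shooting: n action vectors drawn uniformly from [-1, 1]^dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

In a model-predictive loop, the robot would execute the chosen action, observe a new frame, and plan again from there.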


Predict What Happens to Objects when Arm Moves


Closing the Reality Gap

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan


From CAD Models to Real Objects in Clutter

Classification and 3D Pose Estimation (Cropped LineMod)


Generalization Performance

Synthetic Cropped LineMod to Cropped LineMod

● Trained the model on 6 of the 11 classes
● Evaluated on the 5 unseen classes and on the entire set

This opens up the possibility of training real perceptual robotic tasks in simulation.


Generative Adversarial Robotics

Part 1: Generative Robotics

Model-free predictive control

Closing the reality gap

Part 2: Adversarial Robotics

Adversarial grasping

Adversarial exploration

Unsupervised Imitation Learning


What’s a Stable Grasp?

By definition: a grasp that you (i.e. an adversary) can’t easily dislodge.
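The definition translates directly into a labelling rule: a grasp is stable only if it survives every attack the adversary tries. A minimal sketch, assuming a hypothetical `survives(grasp, attack)` check that could wrap a physics simulation or a real shake/snatch attempt; the function names are illustrative, not from the cited work.

```python
from typing import Callable, Iterable, TypeVar

Grasp = TypeVar("Grasp")
Attack = TypeVar("Attack")


def is_stable(grasp: Grasp, attacks: Iterable[Attack],
              survives: Callable[[Grasp, Attack], bool]) -> bool:
    """Stable by definition: no attack the adversary tries dislodges the object."""
    return all(survives(grasp, attack) for attack in attacks)


def grasp_success_label(lifted: bool, grasp: Grasp, attacks: Iterable[Attack],
                        survives: Callable[[Grasp, Attack], bool]) -> bool:
    """Training label: success only if the object was lifted AND withstood the adversary."""
    return lifted and is_stable(grasp, attacks, survives)
```

For a quick check, a toy friction rule such as `lambda grip, force: 0.5 * grip > force` can stand in for `survives`, with a list of shake forces as the attacks.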


Shaking Adversary

[Video panels: Unstable grasp | Stable grasp]


Snatching Adversary: Example

[Video panels: Unstable grasp | Stable grasp]


Supervision via Competition: Robot Adversaries for Learning Tasks
Lerrel Pinto (CMU), Abhinav Gupta (CMU), James Davidson


How Do Researchers Evaluate Robust Control?

You’ve all seen the Boston Dynamics robot abuse videos…

Can adversaries help train better control policies?


Making Nature an Adversary

[Figure: the Protagonist and the Antagonist both act on the Environment.]
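In Robust Adversarial Reinforcement Learning the game is zero-sum: the antagonist's reward is the negative of the protagonist's, so the protagonist is effectively maximizing its worst-case return. The sketch below illustrates only that objective, not RARL's alternating reinforcement-learning procedure. Assumptions, all invented for illustration: a 1-D system x[t+1] = x[t] + u[t] + d, a protagonist policy u[t] = -k·x[t], a constant push d chosen by the antagonist, and exhaustive minimax over tiny hypothetical grids of gains and pushes.

```python
from typing import Sequence


def rollout_cost(gain: float, push: float, steps: int = 10, x0: float = 1.0) -> float:
    """Sum of squared distances from the target x = 0 for the toy system
    x[t+1] = x[t] + u[t] + push, with protagonist action u[t] = -gain * x[t]."""
    x, cost = x0, 0.0
    for _ in range(steps):
        x = x - gain * x + push
        cost += x * x
    return cost


def nominal_gain(gains: Sequence[float]) -> float:
    """Best gain when nature never pushes back."""
    return min(gains, key=lambda k: rollout_cost(k, 0.0))


def robust_gain(gains: Sequence[float], pushes: Sequence[float]) -> float:
    """Best gain against the worst push the antagonist can choose (zero-sum minimax)."""
    return min(gains, key=lambda k: max(rollout_cost(k, d) for d in pushes))
```

With gains [0.5, 1.0, 1.5] and pushes [-0.5, 0.0, 0.5], the nominal choice is k = 1.0, which cancels the error in one step but leaves a steady offset of 0.5 under a push (worst-case cost 2.5); the minimax choice is k = 1.5, whose worst-case cost is just under 2.0.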


Robust Adversarial Reinforcement Learning
Lerrel Pinto (CMU), Abhinav Gupta (CMU), Rahul Sukthankar, James Davidson

● Improved stability
  ○ Increase in the mean score
● Improved performance
  ○ Best policies outperform or match the baseline method
● Improved efficiency
  ○ Trains in fewer iterations compared to the baseline


Adversarial Data Mining

Note the subtext so far: you don’t need fancy GANs to think adversarially.

Before there were GANs, there was the triplet loss with hard negative mining:

Key: negatives are sampled dynamically to be hard to discriminate from the anchor.

GAN: Generator ↔ Discriminator

Triplet: Sampler ↔ Discriminator

[Figure: anchor, positive and negative examples]
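A minimal sketch of a triplet loss with hard negative mining, assuming embeddings are plain Python lists, squared Euclidean distance, and a hypothetical margin of 0.2. The "sampler" side of the game simply picks, for each anchor, the candidate negative that the current embedding places closest to the anchor.

```python
from typing import Sequence

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vec, positive: Vec, negative: Vec, margin: float = 0.2) -> float:
    """Hinge: the positive should be closer to the anchor than the negative, by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def hardest_negative(anchor: Vec, candidates: Sequence[Vec]) -> Vec:
    """The 'sampler': choose the negative the current embedding confuses most with the anchor."""
    return min(candidates, key=lambda n: sq_dist(anchor, n))


def mined_triplet_loss(anchor: Vec, positive: Vec, candidates: Sequence[Vec],
                       margin: float = 0.2) -> float:
    return triplet_loss(anchor, positive, hardest_negative(anchor, candidates), margin)
```

An easy negative (say, one at distance 1 from the anchor) already satisfies the margin and contributes zero loss; the mined one does not, which is the sense in which the sampler plays the adversary to the discriminating embedding.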


Unsupervised Imitation Learning

Triplet loss is a great way to learn similarities with weak supervision:

● Positive: ‘like’
● Negative: ‘not like’

In this work:

● Positive: ‘happens at the same time’
● Negative: ‘happens at a different time’
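The time-based supervision can be sketched as a triplet sampler over two synchronized views. As we understand the multi-view setup, the anchor and positive are the same instant seen from two viewpoints, and the negative comes from the anchor's own view at a different time. This is a minimal sketch assuming equal-length, frame-aligned views and a hypothetical `min_gap` that keeps negatives far enough away in time; the triplets would then feed a standard triplet loss.

```python
import random
from typing import Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_tcn_triplet(view_a: Sequence[Frame], view_b: Sequence[Frame],
                       min_gap: int, rng: random.Random) -> Tuple[Frame, Frame, Frame]:
    """Return (anchor, positive, negative) frames.

    anchor:   view_a at time t
    positive: view_b at the same time t                    ('happens at the same time')
    negative: view_a at time t2 with |t2 - t| >= min_gap  ('happens at a different time')
    """
    if len(view_a) != len(view_b):
        raise ValueError("views must be synchronized frame-for-frame")
    n = len(view_a)
    t = rng.randrange(n)
    far = [s for s in range(n) if abs(s - t) >= min_gap]
    if not far:
        raise ValueError("sequence too short for this min_gap")
    t_neg = rng.choice(far)
    return view_a[t], view_b[t], view_a[t_neg]
```

Because the negative shares the anchor's viewpoint and background, the embedding cannot separate them by appearance alone and has to pick up on what changed over time.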


Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation
Pierre Sermanet, Corey Lynch, Jasmine Hsu, Sergey Levine


Temporal Coherency

Lots of related work, e.g.:

● Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. I Misra, CL Zitnick, M Hebert, 2016
● Unsupervised Learning of Visual Representations using Videos. X Wang, A Gupta, 2015
● Slow Feature Analysis: Unsupervised Learning of Invariances. L Wiskott, TJ Sejnowski, 2002
● Unsupervised Learning of Spatiotemporally Coherent Metrics. R Goroshin, J Bruna, J Tompson, D Eigen, Y LeCun, 2014
● Learning Features by Watching Objects Move. D Pathak, R Girshick, P Dollár, T Darrell, B Hariharan, 2016

Very powerful approaches to learn video features.

Can we use the same concepts to discover states and scene attributes “betrayed” by temporal changes?


Data Collection

2 smartphones + off-the-shelf synchronization app


Multi-View Data

● Exact temporal correspondence
● Correspondences: viewpoint, occlusion, scale, motion-blur, translation
● Training: ~15 minutes, Testing: ~5 minutes


Unsupervised Attribute Classification (Nearest Neighbor)
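Nearest-neighbor classification here means labelling a query frame with the attribute of the closest labelled frame in embedding space; the embedding itself is learned without labels. A minimal sketch, assuming a hypothetical `embed` function standing in for the trained TCN and a small set of labelled reference frames (the label strings in the usage below are invented).

```python
from typing import Callable, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def nearest_neighbor_label(query: Frame,
                           references: Sequence[Tuple[Frame, str]],
                           embed: Callable[[Frame], Sequence[float]]) -> str:
    """Return the label of the reference frame whose embedding is closest to the query's."""
    q = embed(query)

    def sq_dist(frame: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(q, embed(frame)))

    _, label = min(references, key=lambda ref: sq_dist(ref[0]))
    return label
```

For example, with `embed = lambda f: f` and references `[([0.0, 0.0], "empty"), ([1.0, 1.0], "pouring"), ([2.0, 0.0], "full")]`, the query `[0.9, 1.2]` is labelled `"pouring"`. A real system would embed the references once rather than on every query.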


TCN Pouring


TCN ‘Fake’ Pouring


TCN ‘Fake Robot’ Pouring


What We Have So Far

A pretty good deal!

● Only ~15 minutes of experience, no calibration, no labels…
● Trajectory embeddings agnostic to container, liquid and arm appearance.
● A very general approach to action representation.

Next: using this for imitation!
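One way such embeddings can drive imitation is as a reward: at each time step, the closer the robot's current frame is to the demonstration frame at the same time in embedding space, the higher the reward. Because the embedding is agnostic to arm appearance, a human demonstration and a robot rollout can be compared directly. A minimal sketch, assuming a hypothetical `embed` function standing in for the trained TCN, time-aligned demo and robot trajectories, and a plain negative squared distance (the reward shaping used in practice may differ).

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def imitation_rewards(demo: Sequence[Frame], robot: Sequence[Frame],
                      embed: Callable[[Frame], Sequence[float]]) -> List[float]:
    """Per-step reward: negative squared embedding distance between time-aligned frames."""
    if len(demo) != len(robot):
        raise ValueError("demo and robot trajectories must be time-aligned")
    rewards = []
    for d, r in zip(demo, robot):
        ed, er = embed(d), embed(r)
        rewards.append(-sum((x - y) ** 2 for x, y in zip(ed, er)))
    return rewards
```

These per-step rewards could then be handed to any reinforcement-learning algorithm; the reward is highest (zero) when the robot reproduces the demonstration's embedding trajectory exactly.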


Self-Supervised and End-to-end Pose Estimation


3 Training Signals


End-to-end Imitation

Let’s look at all three losses:


Combining Training Signals


t-SNE


TCN + Self-Supervision (No Labels!)


TCN + Self-Supervision (No Labels!)


Failure Case: Bad Sampling


All Signals


Conclusion

Lots more to learn!

In particular, can we use these approaches to learn elaborate manipulation tasks?

More generally, new ways to think about robotic problems:

● Think generatively: we are getting better at synthesizing conditional universes that we can evaluate against perceived reality.

● Think adversarially: this lets you express know-it-when-you-see-it rewards and opens up new avenues for self-supervision.


That’s All!

Questions?
