Transcript of “Generative Adversarial Robotics” by Vincent Vanhoucke, Brain Robotics
Confidential + Proprietary
Generative Adversarial Robotics
Vincent Vanhoucke, Brain Robotics
Generative Adversarial Networks (GANs)
Inspired two major new lines of research:
1. Conditional image generation
2. Thinking adversarially
[Diagram: random variables feed a Generator, which produces samples; a Discriminator compares generated samples against samples of real data and labels each one Real or Fake.]
Generative Adversarial Networks
We can now generate realistic images.
In particular, we can generate realistic images conditionally.
What does it mean to have this new superpower?
BEGAN: Boundary Equilibrium Generative Adversarial Networks
David Berthelot, Tom Schumm, Luke Metz
Generative Adversarial Networks
Thinking adversarially opens up different ways to look at a problem.
Many problems don’t lend themselves to simple loss functions:
● Does this image look good?
● Was your interaction with this AI pleasant?
● Is this text topical?
‘You know it when you see it’ is not an easy loss to backprop through.
Generative Adversarial Networks
Adversarial training provides an elegant solution:
1. Give me some data that looks like what you want.
2. I’ll (co-)train a loss function which discriminates between the good stuff and what I produce.
3. The less I can tell the difference between the two, the more progress I make.
It’s the most honest loss function there is: if you try to cheat, the adversary will automatically adapt to catch your treachery.
Being honest has a price: adversarial losses are really hard to train :(
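The three-step recipe above can be sketched as a toy GAN. This is an illustrative sketch, not code from the talk: a linear generator G(z) = a·z + c is co-trained against a logistic discriminator so that its samples come to match "good" data drawn from N(4, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, c = 1.0, 0.0   # generator G(z) = a*z + c
w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + b)
lr, steps, n = 0.05, 2000, 64

for _ in range(steps):
    z = rng.standard_normal(n)           # random variables in
    real = 4.0 + rng.standard_normal(n)  # "data that looks like what you want"
    fake = a * z + c                     # what the generator produces

    # Discriminator ascent on log D(real) + log(1 - D(fake)):
    # the co-trained loss that separates good stuff from generated stuff.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator ascent on log D(fake) (the non-saturating GAN loss).
    d_fake = sigmoid(w * fake + b)
    g = (1 - d_fake) * w  # d/dx log D(x), evaluated at each fake sample
    a += lr * np.mean(g * z)
    c += lr * np.mean(g)

print(round(c, 1))  # the generator's offset drifts toward the real mean (4.0)
```

Note how the loop is "honest" in the sense above: whenever the fakes drift, the very next discriminator update adapts to the new fakes.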
What does this have to do with Robotics?
1. Generating sensor inputs means you can manipulate reality with high fidelity.
2. Real-world robotics tasks are remarkably resistant to being expressed using a well-behaved loss function:
● Can’t even define the goal mathematically.
● Ok, maybe we can define the goal, but it’s hell to instrument.
● You have an instrumented goal, but the loss is flat everywhere except at the goal.
Generative Adversarial Robotics
Part 1: Generative Robotics
Model-free predictive control
Closing the reality gap
Part 2: Adversarial Robotics
Adversarial grasping
Adversarial exploration
Unsupervised Imitation Learning
Visual Model Predictive Control
Unsupervised Learning for Physical Interaction through Video Prediction
Chelsea Finn, Ian Goodfellow, Sergey Levine
Insight: use conditional image generation to plan actions.
The robot visualizes what might happen when it takes an action, and picks the action whose predicted future best matches the desired outcome perceptually.
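A minimal sketch of this kind of planning by random shooting, with a stand-in `predict_frames` in place of the learned video-prediction model (the real system predicts image sequences conditioned on actions; here the "image" is a small vector so the example runs):

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_frames(state, actions):
    # Stand-in "video model": each action nudges the flattened image state.
    # (The real system predicts actual future frames given candidate actions.)
    for a in actions:
        state = state + a
    return state

def plan(state, goal, horizon=5, n_candidates=256):
    # Random-shooting planner: sample candidate action sequences, imagine
    # their outcomes, and keep the sequence whose predicted future is
    # perceptually closest to the goal.
    candidates = rng.normal(scale=0.5, size=(n_candidates, horizon, state.size))
    costs = [np.linalg.norm(predict_frames(state, acts) - goal) for acts in candidates]
    return candidates[int(np.argmin(costs))]

state, goal = np.zeros(4), np.ones(4)
best = plan(state, goal)
final = predict_frames(state, best)
print(round(float(np.linalg.norm(final - goal)), 3))  # well below the starting distance of 2.0
```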
Predict What Happens to Objects when Arm Moves
Closing the Reality Gap
Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
From CAD Models to Real Objects in Clutter
Classification and 3D Pose Estimation (Cropped LineMod)
Generalization Performance
Synthetic Cropped Linemod to Cropped Linemod
● Trained the model on 6 of 11 classes
● Evaluated on the 5 unseen classes and on the entire set
Opens up the possibility of training real perceptual robotic tasks in simulation.
Generative Adversarial Robotics
Part 1: Generative Robotics
Model-free predictive control
Closing the reality gap
Part 2: Adversarial Robotics
Adversarial grasping
Adversarial exploration
Unsupervised Imitation Learning
What’s a Stable Grasp?
By definition: a grasp that you (i.e. an adversary) can’t easily dislodge.
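That definition turns directly into a labeling procedure. A toy sketch, with a scalar "grip strength" standing in for the physical shaking/snatching adversary (names and thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def grasp_survives(grip_strength, disturbance):
    # A grasp holds if the disturbance force stays below its grip strength.
    return disturbance < grip_strength

def adversarial_label(grip_strength, n_attacks=32, max_force=1.0):
    # The adversary searches for a disturbance that dislodges the object;
    # the grasp counts as stable only if every attack fails.
    attacks = rng.uniform(0.0, max_force, size=n_attacks)
    return all(grasp_survives(grip_strength, f) for f in attacks)

stable = adversarial_label(1.5)    # firm grip: no attack within budget works
unstable = adversarial_label(0.2)  # weak grip: some shake dislodges it
print(stable, unstable)  # True False
```

The supervision signal comes from the competition itself: no human labels which grasps are stable.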
Shaking Adversary
Unstable grasp Stable grasp
Snatching Adversary: Example
Unstable grasp Stable grasp
Supervision via Competition: Robot Adversaries for Learning Tasks
Lerrel Pinto (CMU), Abhinav Gupta (CMU), James Davidson
How Do Researchers Evaluate Robust Control?
You’ve all seen the Boston Dynamics robot abuse videos…
Can adversaries help train better control policies?
Making Nature an Adversary
[Diagram: a Protagonist and an Antagonist both act on a shared Environment.]
Robust Adversarial Reinforcement Learning
Lerrel Pinto (CMU), Abhinav Gupta (CMU), Rahul Sukthankar, James Davidson
● Improved stability
○ Increase in the mean score
● Improved performance
○ Best policies outperform or match the baseline method
● Improved efficiency
○ Trains in fewer iterations compared to the baseline
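The protagonist/antagonist idea can be illustrated with a deliberately simple 1-D task (invented here for illustration; the paper trains neural-network policies). A disturbance d in [-1, 1] perturbs the protagonist's action, and overshooting the target is far more costly than undershooting, so a policy trained without disturbances is fragile:

```python
import numpy as np

def reward(u, d, target=3.0):
    # The perturbed action u + d should land on the target; overshooting
    # (e.g. knocking over the object) is penalized 10x more heavily.
    miss = u + d - target
    return -10.0 * miss**2 if miss > 0 else -miss**2

us = np.linspace(0.0, 5.0, 1001)   # protagonist's candidate actions
ds = np.linspace(-1.0, 1.0, 201)   # antagonist's disturbance budget

# Baseline: train with no disturbance -> the nominally optimal action.
u_base = us[int(np.argmax([reward(u, 0.0) for u in us]))]

# RARL-style: the antagonist always plays the worst disturbance during
# training, so the protagonist maximizes its worst-case reward.
u_rarl = us[int(np.argmax([min(reward(u, d) for d in ds) for u in us]))]

worst_base = min(reward(u_base, d) for d in ds)
worst_rarl = min(reward(u_rarl, d) for d in ds)
print(round(float(u_base), 3), round(float(u_rarl), 2))  # nominal vs. more conservative action
print(worst_base < worst_rarl)  # adversarial training improves the worst case
```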
Adversarial Data Mining
Note the subtext so far: you don’t need fancy GANs to think adversarially.
Before there were GANs, there was the triplet loss with hard negative mining:
Key: negatives are sampled dynamically to be hard to discriminate from positives.
GAN: Generator ↔ Discriminator
Triplet: Sampler ↔ Discriminator
[Diagram: a triplet of Anchor, Positive, and Negative examples.]
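A minimal sketch of the triplet loss with hard-negative mining, assuming precomputed embedding vectors (the function name and margin value are illustrative):

```python
import numpy as np

def triplet_loss_hard(anchor, positive, negatives, margin=0.2):
    # Distances from the anchor to its positive and to each candidate negative.
    d_pos = np.linalg.norm(anchor - positive)
    d_negs = np.linalg.norm(negatives - anchor, axis=1)
    # Hard mining: train against the negative that currently looks most
    # like the anchor, i.e. the one hardest to discriminate.
    d_hard = d_negs.min()
    return max(0.0, d_pos - d_hard + margin)

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negatives = np.array([[0.0, 1.0], [0.95, 0.15], [-1.0, 0.0]])
loss = triplet_loss_hard(anchor, positive, negatives)
print(round(loss, 3))  # 0.183: the hardest negative violates the margin
```

The sampler/discriminator tension is visible here: easy negatives (like [0, 1]) contribute zero loss, so only hard ones drive learning.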
Unsupervised Imitation Learning
Triplet loss is a great way to learn similarities with weak supervision:
‘like’
‘not like’
In this work:
‘happens at the same time’
‘happens at a different time’
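Sampling those time-contrastive triplets from two synchronized views can be sketched as follows (array shapes and the `min_gap` window are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_tcn_triplet(view1, view2, min_gap=10):
    # Anchor: a random frame from the first view.
    t = rng.integers(len(view1))
    anchor = view1[t]
    # Positive: the frame captured at the same instant from the other view.
    positive = view2[t]
    # Negative: a temporally distant frame from the SAME view.
    far = [s for s in range(len(view1)) if abs(s - t) >= min_gap]
    negative = view1[rng.choice(far)]
    return anchor, positive, negative

# Fake per-frame "embeddings": 100 frames, 8-D, two views of one process.
view1 = rng.standard_normal((100, 8))
view2 = view1 + 0.05 * rng.standard_normal((100, 8))  # same scene, slightly shifted view
a, p, n = sample_tcn_triplet(view1, view2)
print(np.linalg.norm(a - p) < np.linalg.norm(a - n))  # the simultaneous frame is closer
```

Because the positive differs only in viewpoint while the negative differs only in time, the learned embedding must encode what changes over time, not how the scene is viewed.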
Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation
Pierre Sermanet, Corey Lynch, Jasmine Hsu, Sergey Levine
Temporal Coherency
Lots of related work, e.g.:
● Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. I Misra, CL Zitnick, M Hebert, 2016
● Unsupervised Learning of Visual Representations using Videos. X Wang, A Gupta, 2015
● Slow Feature Analysis: Unsupervised Learning of Invariances. L Wiskott, TJ Sejnowski, 2002
● Unsupervised Learning of Spatiotemporally Coherent Metrics. R Goroshin, J Bruna, J Tompson, D Eigen, Y LeCun, 2014
● Learning Features by Watching Objects Move. D Pathak, R Girshick, P Dollár, T Darrell, B Hariharan, 2016
Very powerful approaches to learn video features.
Can we use the same concepts to discover states and scene attributes “betrayed” by temporal changes?
Data Collection
2 smartphones + off-the-shelf synchronization app
Multi-View Data
● Exact temporal correspondence
● Correspondences: viewpoint, occlusion, scale, motion-blur, translation
● Training: ~15 minutes, Testing: ~5 minutes
Unsupervised Attribute Classification (Nearest Neighbor)
TCN Pouring
TCN ‘Fake’ Pouring
TCN ‘Fake Robot’ Pouring
What We Have So Far
A pretty good deal!
● Only ~15 minutes of experience, no calibration, no labels…
● Trajectory embeddings agnostic to container, liquid and arm appearance.
● A very general approach to action representation.
Next: using this for imitation!
Self-Supervised and End-to-end Pose Estimation
3 Training Signals
End-to-end Imitation
Let’s look at all three losses:
Combining Training Signals
t-SNE
TCN + Self-Supervision (No Labels!)
TCN + Self-Supervision (No Labels!)
Failure Case: Bad Sampling
All Signals
Conclusion
Lots more to learn!
In particular, can we use these approaches to learn elaborate manipulation tasks?
More generally, new ways to think about robotic problems:
● Think generatively: we are getting better at synthesizing conditional universes that we can evaluate against perceived reality.
● Think adversarially: lets you express know-it-when-you-see-it rewards and opens up new avenues for self-supervision.