Transcript of “Generative Adversarial Robotics” by Vincent Vanhoucke, Brain Robotics
Confidential + Proprietary
Generative Adversarial Robotics
Vincent Vanhoucke, Brain Robotics
Generative Adversarial Networks (GANs)
Inspired two major new lines of research:
1. Conditional image generation
2. Thinking adversarially
[Diagram: random variables feed a Generator, which produces samples; a Discriminator compares generated samples against samples of real data and labels each one Real or Fake.]
Generative Adversarial Networks
We can now generate realistic images.
In particular, we can generate realistic images conditionally.
What does it mean to have this new superpower?
BEGAN: Boundary Equilibrium Generative Adversarial Networks
David Berthelot, Tom Schumm, Luke Metz
Generative Adversarial Networks
Thinking adversarially opens up different ways to look at a problem.
Many problems don’t lend themselves to simple loss functions:
● Does this image look good?
● Was your interaction with this AI pleasant?
● Is this text topical?
‘You know it when you see it’ is not an easy loss to backprop through.
Generative Adversarial Networks
Adversarial training provides an elegant solution:
1. Give me some data that looks like what you want.
2. I’ll (co-)train a loss function which discriminates between the good stuff and what I produce.
3. The less I can tell the difference between the two, the more progress I make.
It’s the most honest loss function there is: if you try to cheat, the adversary will automatically adapt to catch your treachery.
Being honest has a price: adversarial losses are really hard to train :(
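The three-step recipe above can be sketched as a toy GAN. This is an illustrative sketch, not code from the talk: a linear generator G(z) = a·z + c is co-trained against a logistic discriminator so that its samples come to match "good" data drawn from N(4, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, c = 1.0, 0.0   # generator G(z) = a*z + c
w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + b)
lr, steps, n = 0.05, 2000, 64

for _ in range(steps):
    z = rng.standard_normal(n)           # random variables in
    real = 4.0 + rng.standard_normal(n)  # "data that looks like what you want"
    fake = a * z + c                     # what the generator produces

    # Discriminator ascent on log D(real) + log(1 - D(fake)):
    # the co-trained loss that separates good stuff from generated stuff.
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator ascent on log D(fake) (the non-saturating GAN loss).
    d_fake = sigmoid(w * fake + b)
    g = (1 - d_fake) * w  # d/dx log D(x), evaluated at each fake sample
    a += lr * np.mean(g * z)
    c += lr * np.mean(g)

print(round(c, 1))  # the generator's offset drifts toward the real mean (4.0)
```

Note how the loop is "honest" in the sense above: whenever the fakes drift, the very next discriminator update adapts to the new fakes.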
What does this have to do with Robotics?
1. Generating sensor inputs means you can manipulate reality with high fidelity.
2. Real-world robotics tasks are remarkably resistant to being expressed using a well-behaved loss function:
● Can’t even define the goal mathematically.
● Ok, maybe we can define the goal, but it’s hell to instrument.
● You have an instrumented goal, but the loss is flat everywhere except at the goal.
Generative Adversarial Robotics
Part 1: Generative Robotics
Model-free predictive control
Closing the reality gap
Part 2: Adversarial Robotics
Adversarial grasping
Adversarial exploration
Unsupervised Imitation Learning
Visual Model Predictive Control
Unsupervised Learning for Physical Interaction through Video Prediction
Chelsea Finn, Ian Goodfellow, Sergey Levine
Insight: use conditional image generation to plan actions.
The robot visualizes what might happen when it takes an action, and picks the action whose predicted future best matches the desired outcome perceptually.
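A minimal sketch of this kind of planning by random shooting, with a stand-in `predict_frames` in place of the learned video-prediction model (the real system predicts image sequences conditioned on actions; here the "image" is a small vector so the example runs):

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_frames(state, actions):
    # Stand-in "video model": each action nudges the flattened image state.
    # (The real system predicts actual future frames given candidate actions.)
    for a in actions:
        state = state + a
    return state

def plan(state, goal, horizon=5, n_candidates=256):
    # Random-shooting planner: sample candidate action sequences, imagine
    # their outcomes, and keep the sequence whose predicted future is
    # perceptually closest to the goal.
    candidates = rng.normal(scale=0.5, size=(n_candidates, horizon, state.size))
    costs = [np.linalg.norm(predict_frames(state, acts) - goal) for acts in candidates]
    return candidates[int(np.argmin(costs))]

state, goal = np.zeros(4), np.ones(4)
best = plan(state, goal)
final = predict_frames(state, best)
print(round(float(np.linalg.norm(final - goal)), 3))  # well below the starting distance of 2.0
```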
Predict What Happens to Objects when Arm Moves
Closing the Reality Gap
Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
From CAD Models to Real Objects in Clutter
Classification and 3D Pose Estimation (Cropped LineMod)
Generalization Performance
Synthetic Cropped Linemod to Cropped Linemod
● Trained the model on 6 of 11 classes
● Evaluated on the 5 unseen classes and on the entire set
Opens up the possibility of training real perceptual robotic tasks in simulation.
Generative Adversarial Robotics
Part 1: Generative Robotics
Model-free predictive control
Closing the reality gap
Part 2: Adversarial Robotics
Adversarial grasping
Adversarial exploration
Unsupervised Imitation Learning
What’s a Stable Grasp?
By definition: a grasp that you (i.e. an adversary) can’t easily dislodge.
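That definition turns directly into a labeling procedure. A toy sketch, with a scalar "grip strength" standing in for the physical shaking/snatching adversary (names and thresholds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def grasp_survives(grip_strength, disturbance):
    # A grasp holds if the disturbance force stays below its grip strength.
    return disturbance < grip_strength

def adversarial_label(grip_strength, n_attacks=32, max_force=1.0):
    # The adversary searches for a disturbance that dislodges the object;
    # the grasp counts as stable only if every attack fails.
    attacks = rng.uniform(0.0, max_force, size=n_attacks)
    return all(grasp_survives(grip_strength, f) for f in attacks)

stable = adversarial_label(1.5)    # firm grip: no attack within budget works
unstable = adversarial_label(0.2)  # weak grip: some shake dislodges it
print(stable, unstable)  # True False
```

The supervision signal comes from the competition itself: no human labels which grasps are stable.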
Shaking Adversary
Unstable grasp Stable grasp
Snatching Adversary: Example
Unstable grasp Stable grasp
Supervision via Competition: Robot Adversaries for Learning Tasks
Lerrel Pinto (CMU), Abhinav Gupta (CMU), James Davidson
How Do Researchers Evaluate Robust Control?
You’ve all seen the Boston Dynamics robot abuse videos…
Can adversaries help train better control policies?
Making Nature an Adversary
[Diagram: a Protagonist and an Antagonist both act on a shared Environment.]
Robust Adversarial Reinforcement Learning
Lerrel Pinto (CMU), Abhinav Gupta (CMU), Rahul Sukthankar, James Davidson
● Improved stability
○ Increase in the mean score
● Improved performance
○ Best policies outperform or match the baseline method
● Improved efficiency
○ Trains in fewer iterations compared to the baseline
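The protagonist/antagonist idea can be illustrated with a deliberately simple 1-D task (invented here for illustration; the paper trains neural-network policies). A disturbance d in [-1, 1] perturbs the protagonist's action, and overshooting the target is far more costly than undershooting, so a policy trained without disturbances is fragile:

```python
import numpy as np

def reward(u, d, target=3.0):
    # The perturbed action u + d should land on the target; overshooting
    # (e.g. knocking over the object) is penalized 10x more heavily.
    miss = u + d - target
    return -10.0 * miss**2 if miss > 0 else -miss**2

us = np.linspace(0.0, 5.0, 1001)   # protagonist's candidate actions
ds = np.linspace(-1.0, 1.0, 201)   # antagonist's disturbance budget

# Baseline: train with no disturbance -> the nominally optimal action.
u_base = us[int(np.argmax([reward(u, 0.0) for u in us]))]

# RARL-style: the antagonist always plays the worst disturbance during
# training, so the protagonist maximizes its worst-case reward.
u_rarl = us[int(np.argmax([min(reward(u, d) for d in ds) for u in us]))]

worst_base = min(reward(u_base, d) for d in ds)
worst_rarl = min(reward(u_rarl, d) for d in ds)
print(round(float(u_base), 3), round(float(u_rarl), 2))  # nominal vs. more conservative action
print(worst_base < worst_rarl)  # adversarial training improves the worst case
```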
Adversarial Data Mining
Note the subtext so far: you don’t need fancy GANs to think adversarially.
Before there were GANs, there was the triplet loss with hard negative mining:
Key: negatives are sampled dynamically to be hard to discriminate from positives.
GAN: Generator ↔ Discriminator
Triplet: Sampler ↔ Discriminator
[Diagram: a triplet of Anchor, Positive, and Negative examples.]
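A minimal sketch of the triplet loss with hard-negative mining, assuming precomputed embedding vectors (the function name and margin value are illustrative):

```python
import numpy as np

def triplet_loss_hard(anchor, positive, negatives, margin=0.2):
    # Distances from the anchor to its positive and to each candidate negative.
    d_pos = np.linalg.norm(anchor - positive)
    d_negs = np.linalg.norm(negatives - anchor, axis=1)
    # Hard mining: train against the negative that currently looks most
    # like the anchor, i.e. the one hardest to discriminate.
    d_hard = d_negs.min()
    return max(0.0, d_pos - d_hard + margin)

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negatives = np.array([[0.0, 1.0], [0.95, 0.15], [-1.0, 0.0]])
loss = triplet_loss_hard(anchor, positive, negatives)
print(round(loss, 3))  # 0.183: the hardest negative violates the margin
```

The sampler/discriminator tension is visible here: easy negatives (like [0, 1]) contribute zero loss, so only hard ones drive learning.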
Unsupervised Imitation Learning
Triplet loss is a great way to learn similarities with weak supervision:
‘like’
‘not like’
In this work:
‘happens at the same time’
‘happens at a different time’
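Sampling those time-contrastive triplets from two synchronized views can be sketched as follows (array shapes and the `min_gap` window are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_tcn_triplet(view1, view2, min_gap=10):
    # Anchor: a random frame from the first view.
    t = rng.integers(len(view1))
    anchor = view1[t]
    # Positive: the frame captured at the same instant from the other view.
    positive = view2[t]
    # Negative: a temporally distant frame from the SAME view.
    far = [s for s in range(len(view1)) if abs(s - t) >= min_gap]
    negative = view1[rng.choice(far)]
    return anchor, positive, negative

# Fake per-frame "embeddings": 100 frames, 8-D, two views of one process.
view1 = rng.standard_normal((100, 8))
view2 = view1 + 0.05 * rng.standard_normal((100, 8))  # same scene, slightly shifted view
a, p, n = sample_tcn_triplet(view1, view2)
print(np.linalg.norm(a - p) < np.linalg.norm(a - n))  # the simultaneous frame is closer
```

Because the positive differs only in viewpoint while the negative differs only in time, the learned embedding must encode what changes over time, not how the scene is viewed.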
Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation
Pierre Sermanet, Corey Lynch, Jasmine Hsu, Sergey Levine
Temporal Coherency
Lots of related work, e.g.:
● Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. I Misra, CL Zitnick, M Hebert, 2016
● Unsupervised Learning of Visual Representations using Videos. X Wang, A Gupta, 2015
● Slow Feature Analysis: Unsupervised Learning of Invariances. L Wiskott, TJ Sejnowski, 2002
● Unsupervised Learning of Spatiotemporally Coherent Metrics. R Goroshin, J Bruna, J Tompson, D Eigen, Y LeCun, 2014
● Learning Features by Watching Objects Move. D Pathak, R Girshick, P Dollár, T Darrell, B Hariharan, 2016
Very powerful approaches to learn video features.
Can we use the same concepts to discover states and scene attributes “betrayed” by temporal changes?
Data Collection
2 smartphones + off-the-shelf synchronization app
Multi-View Data
● Exact temporal correspondence
● Correspondences: viewpoint, occlusion, scale, motion-blur, translation
● Training: ~15 minutes, Testing: ~5 minutes
Unsupervised Attribute Classification (Nearest Neighbor)
TCN Pouring
TCN ‘Fake’ Pouring
TCN ‘Fake Robot’ Pouring
What We Have So Far
A pretty good deal!
● Only ~15 minutes of experience, no calibration, no labels…
● Trajectory embeddings agnostic to container, liquid and arm appearance.
● A very general approach to action representation.
Next: using this for imitation!
Self-Supervised and End-to-end Pose Estimation
3 Training Signals
End-to-end Imitation
Let’s look at all three losses:
Combining Training Signals
t-SNE
TCN + Self-Supervision (No Labels!)
TCN + Self-Supervision (No Labels!)
Failure Case: Bad Sampling
All Signals
Conclusion
Lots more to learn!
In particular, can we use these approaches to learn elaborate manipulation tasks?
More generally, new ways to think about robotic problems:
● Think generatively: we are getting better at synthesizing conditional universes that we can evaluate against perceived reality.
● Think adversarially: lets you express know-it-when-you-see-it rewards and opens up new avenues for self-supervision.