Page 1:

How Machines Learn to Talk

Amitabha Mukerjee IIT Kanpur

work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra

Prithvijit Guha, A. Ramakrishna Rao, Pradeep Vaghela

Natural Language: Prof. Achla Raina, V. Shreeniwas

Page 2:

Robotics

Collaborations: IGCAR Kalpakkam

Sanjay Gandhi PG Medical Hospital

Page 3:

Visual Robot Navigation

Time-to-Collision based Robot Navigation

Page 4:

Hyper-Redundant Manipulators

• Reconfigurable Workspaces / Emergency Access

• Optimal Design of Hyper-Redundant Systems – SCARA and 3D

The same manipulator can work in changing workspaces

Page 5:

Planar Hyper-Redundancy

4-link Planar Robot

Motion Planning

Page 6:

Micro-Robots

• Micro Soccer Robots (1999-)

• 8 cm Smart Surveillance Robot – 1 m/s

• Autonomous Flying Robot (2004)

• Omni-directional platform (2002)

Omni-Directional Robot Sponsor: [email protected]

Page 7:

Flying Robot

heli-flight.wmv

Test Flight of UAV. Inertial Measurement Unit (IMU) under commercial production.

Start-up at IIT Kanpur: Whirligig Robotics

Page 8:

Tracheal Intubation Device

Device for Intubation during general Anesthesia

Figure labels: aperture for fibre-optic video cable, endotracheal tube aperture, aperture for oxygenation tube, hole for suction tube, control-cable attachment points, ball & socket joint

Assists surgeon while inserting breathing tube during general anaesthesia

Sponsor: DST / [email protected]

Page 9:

Draupadi’s Swayamvar

Can the Arrow hit the rotating mark? Sponsor: Media Lab Asia

Page 10:

High DOF Motion Planning

• Accessing Hard to Reach spaces

• Design of Hyper-Redundant Systems

• Parallel Manipulators

Sponsor: BRNS / [email protected]

10-link 3D Robot – Optimal Design

Page 11:

Multimodal Language Acquisition

Consider a child observing a scene together with adults talking about it

Grounded Language: symbols are grounded in perceptual signals

Use of simple videos with boxes and simple shapes – of the kind standardly used in social psychology

Page 12:

Objective

To develop a computational framework for Multimodal Language Acquisition:

• acquiring the perceptual structure corresponding to verbs

• using Recurrent Neural Networks as a biologically plausible model for temporal abstraction

• adapting the learned model to interpret activities in real videos

Page 13:

Visually Grounded Corpus

Two psychological research films, one based on the classic Heider & Simmel (1944) and the other based on Hide & Seek

These animations portray the motion paths of geometric figures (Big Square, Small Square & Circle)

Chase Alt

Page 14:

Cognate Clustering

Similarity clustering: different expressions for the same action, e.g. “move away from center” vs. “go to a corner”

Frequency: remove infrequent lexical units

Synonymy: sets of lexical units used consistently in the same intervals, to mark the same action, for the same set of agents (a sketch of this grouping follows)
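As an illustration of the synonymy step only, the sketch below (Python, with hypothetical names and an assumed overlap threshold) merges lexical units whose usage intervals overlap strongly; it shows the general idea, not the clustering procedure actually used.

```python
def cognate_clusters(unit_intervals, min_count=3, overlap=0.7):
    """unit_intervals: dict mapping a lexical unit (e.g. 'chase') to the
    set of frames in which commentators used it for a given agent pair.
    Infrequent units are dropped; units whose intervals overlap strongly
    are merged into one cognate cluster."""
    units = {u: f for u, f in unit_intervals.items() if len(f) >= min_count}
    clusters = []
    for u, frames in units.items():
        for cluster in clusters:
            rep = cluster[0]                        # cluster representative
            inter = len(frames & units[rep])
            union = len(frames | units[rep])
            if union and inter / union >= overlap:  # Jaccard overlap of intervals
                cluster.append(u)
                break
        else:
            clusters.append([u])
    return clusters
```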

Page 15:

Perceptual Process

[Pipeline diagram: multimodal input (video + descriptions) → feature extraction and cognate clustering → trained Simple Recurrent Network → events (VICES)]

Page 16:

Design of Feature Set

The features selected here relate to spatial aspects of conceptual primitives in children, such as position, relative pose, and velocity.

The features used are kinematic in nature: the basic quantities, their temporal derivatives, or simple transforms of them (a sketch follows).
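A minimal sketch of such monadic (single-object) and dyadic (object-pair) kinematic features, computed from tracked centroids; the exact feature set used in the slides is not reproduced here.

```python
import numpy as np

def monadic_features(traj, dt=1.0):
    """traj: (T, 2) array of one object's (x, y) centroids per frame.
    Returns per-frame position, velocity, and speed."""
    vel = np.gradient(traj, dt, axis=0)               # temporal derivative of position
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    return np.hstack([traj, vel, speed])

def dyadic_features(traj_a, traj_b, dt=1.0):
    """Pairwise features between two tracked objects: relative displacement,
    inter-object distance, and its rate of change."""
    rel = traj_b - traj_a
    dist = np.linalg.norm(rel, axis=1, keepdims=True)
    approach_rate = np.gradient(dist[:, 0], dt)[:, None]  # < 0 while coming closer
    return np.hstack([rel, dist, approach_rate])
```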

Page 17:

Monadic Features

Page 18:

Dyadic Predicates

Page 19:

Video and Commentary for Event Structures [VICES]

[Pipeline diagram, as above: multimodal input (video + descriptions) → feature extraction and cognate clustering → trained Simple Recurrent Network → events]

Page 20:

The classification problem

The problem is one of time-series classification.

Possible methodologies include: logic-based methods, Hidden Markov Models, Recurrent Neural Networks

Page 21:

Elman Network

Commonly a two-layer network with feedback from the first-layer output to the first-layer input.

Elman networks detect and generate time-varying patterns.

They are also able to learn spatial patterns (a minimal sketch follows).
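A minimal NumPy sketch of an Elman-style simple recurrent network forward pass, with context units feeding back into the hidden layer; the layer sizes and the per-frame feature/verb interface are illustrative, not the configuration used in this work.

```python
import numpy as np

class ElmanSRN:
    """Minimal Elman-style simple recurrent network (illustrative sizes)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.1, (n_hidden, n_in))      # input  -> hidden
        self.W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_hy = rng.normal(0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.b_h = np.zeros(n_hidden)
        self.b_y = np.zeros(n_out)

    def forward(self, sequence):
        """sequence: (T, n_in) array of per-frame features.
        Returns (T, n_out) per-frame activations (e.g. one unit per verb)."""
        h = np.zeros(self.W_hh.shape[0])   # context units start at zero
        outputs = []
        for x in sequence:
            # hidden state depends on the current frame and the previous hidden state
            h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)
            y = 1.0 / (1.0 + np.exp(-(self.W_hy @ h + self.b_y)))  # sigmoid outputs
            outputs.append(y)
        return np.array(outputs)

# Example: 8 kinematic features per frame, 10 hidden units, 1 output per action
srn = ElmanSRN(n_in=8, n_hidden=10, n_out=1)
activations = srn.forward(np.random.rand(100, 8))  # a 100-frame clip
```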

Page 22:

Feature Extraction in Abstract Videos

Each image is read into a 2D matrix Connected Component Analysis is

performed Bounding box is computed for each

such connected component Dynamic tracking is used to keep

track of each object
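As a sketch, the per-frame blob extraction could look like the following (OpenCV); the fixed threshold and minimum-area filter are assumptions for simple shapes on a plain background, and the tracking step is left out.

```python
import cv2

def extract_blobs(frame_gray, min_area=50):
    """Threshold a grayscale frame, run connected-component analysis,
    and return a bounding box for each sufficiently large component."""
    # Abstract videos contain a few shapes on a plain background,
    # so a fixed binary threshold is assumed here for simplicity.
    _, binary = cv2.threshold(frame_gray, 127, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    boxes = []
    for i in range(1, n):          # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes, centroids[1:]
```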

Page 23:

Page 24:

Working with Real Videos

Challenges: noise in real-world videos, illumination changes, occlusions, extracting depth information

Our setup: the camera is fixed at head height; the angle of depression is approximately 0 degrees.

Video

Page 25:

Background Subtraction

Learn on still background images and find per-pixel intensity distributions.
Classify a pixel (x, y) as background if |P(x,y) − μ(x,y)| < k·σ(x,y).

Remove shadows: a shadow is a special case of reduced illumination, S = k·P with k < 1.0.

(A sketch follows.)
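A minimal sketch of the per-pixel Gaussian background model described above (NumPy); the threshold k and the shadow-ratio bounds are illustrative values, not the parameters used in the original system.

```python
import numpy as np

def learn_background(still_frames):
    """still_frames: (N, H, W) stack of grayscale background-only images.
    Returns per-pixel mean and standard deviation."""
    stack = np.asarray(still_frames, dtype=np.float32)
    return stack.mean(axis=0), stack.std(axis=0) + 1e-6

def foreground_mask(frame, mu, sigma, k=2.5, shadow_lo=0.5, shadow_hi=0.95):
    """A pixel is background if |P - mu| < k*sigma.  A darker pixel whose
    intensity is roughly a constant fraction of the background value
    (S = k*P with k < 1) is treated as shadow and also suppressed."""
    frame = frame.astype(np.float32)
    fg = np.abs(frame - mu) >= k * sigma          # deviates from background model
    ratio = frame / (mu + 1e-6)
    shadow = (ratio > shadow_lo) & (ratio < shadow_hi)
    return fg & ~shadow
```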

Page 26:

Contd..

Extract human blobs by Connected Component Analysis; a bounding box is computed for each person.

Track human blobs: each object is tracked using a mean-shift tracking algorithm (sketched below).
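A sketch of tracking a blob's bounding box with OpenCV's built-in mean-shift on a hue back-projection; the choice of hue histogram and the termination criteria are conventional defaults, not necessarily those of the original system.

```python
import cv2

def make_tracker(first_frame_bgr, box):
    """box = (x, y, w, h) from the connected-component step.
    Builds a hue histogram of the person blob for back-projection."""
    x, y, w, h = box
    hsv = cv2.cvtColor(first_frame_bgr, cv2.COLOR_BGR2HSV)
    roi = hsv[y:y + h, x:x + w]
    hist = cv2.calcHist([roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def track(frame_bgr, box, hist,
          term=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)):
    """Shift the window to the nearby mode of the back-projected histogram."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, new_box = cv2.meanShift(backproj, box, term)  # box is an (x, y, w, h) tuple of ints
    return new_box
```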

Page 27:

Contd..

Page 28:

Depth Estimation

Two approximations: using Gibson’s affordances, and camera geometry.

Affordances are visual cues: an action of a human is triggered by the environment itself; a floor offers walk-on-ability.

Every object affords certain actions to perceive, along with anticipated effects: a cup’s handle affords grasping-lifting-drinking.

Page 29:

Contd..

Gibson’s model: the horizon is fixed at the head height of the observer.

Monocular depth cues:
Interposition – an object that occludes another is closer.
Height in the visual field – the higher an object is, the further away it is.

Page 30:

Depth Estimation

Pinhole camera model, mapping (X, Y, Z) to (x, y):
x = X * f / Z,  y = Y * f / Z

For the point of contact with the ground: Z ∝ 1/y and X ∝ x/y (sketched below).

Page 31:

Depth plot for A chase B Top view (Z-X plane)

Page 32:

Results (contd..)

Page 33:

Results (contd..)

Page 34:

Results (contd..)

Page 35:

Results

Separate-SRN-for-each-action:
• trained & tested on different parts of the abstract video
• trained on the abstract video and tested on the real video

Single-SRN-for-all-actions:
• trained on the synthetic video and tested on the real video

Page 36:

Basis for Comparison

Let the total time of the visual sequence for each verb be t time units.

E : intervals when subjects describe an event as occurring; Ē = t − E
E' : intervals when VICES describes an event as occurring; Ē' = t − E'
FM : intervals classified as focus mismatches

True Positives = E ∩ E'
False Positives = E' − E
False Negatives = E − E'
Focus Mismatches = FM

Accuracy = ( |E ∩ E'| + |Ē ∩ Ē'| ) / t

(A computational sketch of these measures follows.)
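A sketch of these interval-based measures with E and E' represented as sets of frame indices, using the accuracy reconstruction above ((TP + TN) / t); the focus-mismatch count is omitted since it depends on agent labels.

```python
def interval_metrics(E, E_pred, t):
    """E: frames where subjects describe the event as occurring.
    E_pred: frames where VICES describes the event as occurring.
    t: total number of frames for this verb."""
    E, E_pred = set(E), set(E_pred)
    tp = E & E_pred                                 # true positives
    fp = E_pred - E                                 # false positives
    fn = E - E_pred                                 # false negatives
    all_frames = set(range(t))
    tn = (all_frames - E) & (all_frames - E_pred)   # both say "not occurring"
    accuracy = (len(tp) + len(tn)) / t
    precision = len(tp) / len(E_pred) if E_pred else 0.0
    recall = len(tp) / len(E) if E else 0.0
    return dict(tp=len(tp), fp=len(fp), fn=len(fn),
                accuracy=accuracy, precision=precision, recall=recall)
```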

Page 37:

Separate SRN for each action

Framework: Abstract video

Verb True Positives False Positives False Negatives Focus Mismatches Accuracy

hit 46.02% 3.06% 53.98% 2.4% 92.37%

chase 24.44% 0% 75.24% 0.72% 93.71%

come Closer 25.87% 14.61% 73.26% 16.77% 63.66%

move Away 46.34% 7.21% 52.33% 15.95% 73.37 %

spins 82.54% 0% 16.51% 24.7% 97.03%

moves 68.24% 0.12% 31.76% 1.97% 77.33%

Verb True Positives False Positives False Negatives Focus Mismatches

hit 3 3 1 1

chase 6 0 3 4

come Closer 6 20 7 24

move Away 8 3 0 14

spins 22 0 1 9

moves 5 1 2 7

Page 38:

Timeline comparison for Chase

Page 39:

Separate SRN for each action – Real video (action recognition only)

Verb Retrieved Relevant True Positives False Positives False Negatives Precision Recall

A Chase B 237 140 135 96 5 58.4% 96.4%

B Chase A 76 130 76 0 56 100% 58.4%

Page 40:

Single SRN for all actions

Framework: Real video

Verb Retrieved Relevant True Positives False Positives False Negatives Precision Recall

Chase 239 270 217 23 5 91.2% 80.7%

Going Away 21 44 13 8 31 61.9% 29.5%

Page 41:

Conclusions & Future Work

The sparse nature of the video provides for ease of visual analysis.

Event structures are learned directly from the perceptual stream.

Extensions:
• Learn the fine nuances between event structures of related action words.
• Learn the morphological variations.
• Extend the work to use Long Short-Term Memory (LSTM).
• Hierarchical acquisition of higher-level action verbs.