Artificial Intelligence Advanced Visual Interaction
Understanding Human Motions and Emotions using Visual Information
Atsushi Nakazawa
IST, Kyoto University
• 12/2 (today) – Understanding human emotions using visual information
• 12/9 (next week) – Understanding human motions (behaviors) using visual information
• Affective Computing – Treating human emotions in computer systems
• Rosalind Picard – TED talk
• Applications of Affective Computing
• How to quantify human emotions?
• Emotion model: Primary and secondary emotions
• How to measure human emotions?
• Quantifying facial actions: FACS (Facial Action Coding System)
• Relations between facial expressions and emotions
• Physiological signals and parasympathetic nervous system
• Measurement of human internal states from physiological signals
• Human gaze estimation
• Gaze and its measurement techniques
• Corneal imaging and its applications
• Estimate human internal states from eye images – pupillary response
• Summary
Human Machine Interface (HMI) that understands / expresses emotional signals
Recent trend
• Machines that simply follow commands have been realized by recent AI algorithms (such as deep learning),
e.g. web search, natural language processing (NLP), face detection and identification, character and image recognition.
• On the contrary, humans do not change so much!
• Computer allergy
• Humans cannot specify the task:
"I want clothes that look good on me!" "I want to drive to a nice place!"
• Humans cannot control their emotions:
"My car navigation system always shows the wrong routes!"
• Can we give emotions to machines?
• Proposed by Dr. Rosalind Picard (MIT Media Lab.), 1995
• "Affective Computing"
Discussed why computer systems need emotional sensing and processing.
• Applications of Affective Computing
• Elements of Affective Computing
• Concerns about Affective Computing
TED Talk video
• Generally, emotions are regarded as inferior to logic.
• Usually people think, "Emotions disturb one's decisions."
• Observations of a patient with frontal lobe damage (Antonio Damasio)
• He has a normal IQ and cognitive abilities, but does not have emotions (he appears to be like Star Trek's Mr. Spock).
• In reality, he always makes disastrous decisions.
• E.g., typical people stop investing after large losses; however, he continues investing until becoming bankrupt.
• He has similar problems in social interactions.
• He seems to be unable to learn the links between dangerous choices and bad feelings, so he repeats dangerous decisions.
• He will disappear into an endless rational search for anything, even a simple task such as scheduling an appointment:
"Well, this time might be good." "Maybe I will have to be on that side of town, so this time would be better..."
• Emotion is vital for human decision making.
• Damasio has hypothesized that Elliot's brain is missing "somatic markers" that associate positive or negative feelings with certain decisions. These feelings would help limit a mental search by nudging the person away from considering possibilities with bad associations.
• Similar logic applies to future computer (machine) systems: computer systems that can make emotional decisions, beyond logic.
Human-Machine relation
• Physiological sensing
• Pattern analysis
• Sensors
• Emotional expressions using machine systems
Human-Human relation
• Affective communication
• Affective wearable computers
Fundamentals
• Modeling and understanding emotions
[Framework diagram: understanding and modeling of emotions, physiological sensing, and pattern analysis feed affective computing applications: interaction with machine systems, emotional expression using machine systems, and affective communications.]
MAUI: a Multimodal Affective User Interface
• What is emotion for machines?
• 135 / 379 users perceive a gender in the robot.
• 87 / 379 users give names.
• Most commonly, the robots were given a human name such as Sarah, Alex, Joe, or Veronica. Other names were wordplays on the word "Roomba": Roomie, Roomby and Ruby.
• 44 / 379 users find a personality.
• 42 / 379 users talk to it.
• 43 / 379 users dress it up.
http://www.nbcnews.com/id/21102202/ns/technology_and_science-tech_and_gadgets/t/roombas-fill-emotional-vacuum-owners/#.Vdqjna2hMxA
Housewives or Technophiles?: Understanding Domestic Robot Owners
MAUI: a Multimodal Affective User Interface
• Many efforts have been made to model emotions.
• Facial Action Coding System (FACS) – Ekman
• Facial expressions consist of the movements of facial parts (in FACS, 68 parts in 10 groups) → Action Units (AUs)
• The six basic emotions (Anger, Disgust, Fear, Joy, Sadness, Surprise) consist of combinations of AUs.
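As a rough illustration of how basic emotions map to AU combinations, here is a minimal sketch. The AU sets below follow commonly cited EMFACS-style prototypes; they are illustrative, not the exact table from the lecture.

```python
# Sketch: basic emotions as combinations of FACS Action Units.
# AU combinations follow commonly cited EMFACS-style prototypes (illustrative).
EMOTION_AUS = {
    "joy":      {6, 12},                 # cheek raiser + lip corner puller
    "sadness":  {1, 4, 15},              # inner brow raiser + brow lowerer + lip corner depressor
    "surprise": {1, 2, 5, 26},           # brow raisers + upper lid raiser + jaw drop
    "fear":     {1, 2, 4, 5, 7, 20, 26},
    "anger":    {4, 5, 7, 23},
    "disgust":  {9, 15, 16},             # nose wrinkler + lip depressors
}

def classify_emotion(active_aus: set) -> str:
    """Return the emotion whose AU prototype overlaps most with the detected AUs."""
    score = lambda proto: len(active_aus & proto) / len(proto)
    return max(EMOTION_AUS, key=lambda e: score(EMOTION_AUS[e]))

print(classify_emotion({1, 4, 15}))  # -> "sadness"
```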
Sadness:
• close_t_l_eyelid (FAP 19), close_t_r_eyelid (FAP 20), close_b_l_eyelid (FAP 21), close_b_r_eyelid (FAP 22), raise_l_i_eyebrow (FAP 31), raise_r_i_eyebrow (FAP 32), raise_l_m_eyebrow (FAP 33), raise_r_m_eyebrow (FAP 34), raise_l_o_eyebrow (FAP 35), raise_r_o_eyebrow (FAP 36)
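The FAP numbers above follow the MPEG-4 facial animation parameter set; the list can be treated as a template to match against measured activations. A minimal sketch (the 0.8 match threshold is an assumption):

```python
# Sketch: the FAP set for "sadness" from the slide, encoded as a template.
SADNESS_FAPS = {19, 20, 21, 22, 31, 32, 33, 34, 35, 36}  # eyelid-closing + eyebrow-raising FAPs

def matches_sadness(active_faps: set, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the sadness FAPs are active (threshold assumed)."""
    return len(active_faps & SADNESS_FAPS) / len(SADNESS_FAPS) >= threshold

print(matches_sadness({19, 20, 21, 22, 31, 32, 33, 34}))  # 8/10 active -> True
```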
• Track facial parts → estimate emotions from the movements of facial parts (a sketch follows below)
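A minimal sketch of this tracking-based pipeline, assuming 68-point facial landmarks (iBUG/dlib-style indexing). The features and thresholds are illustrative, not the system shown on the slide.

```python
import numpy as np

# Sketch: track facial landmarks frame by frame, convert displacements from a
# neutral frame into simple features, then map features to an emotion.
def expression_features(neutral: np.ndarray, current: np.ndarray) -> dict:
    """neutral/current: (68, 2) arrays of tracked landmark coordinates."""
    d = current - neutral
    return {
        "mouth_corner_up": -float(d[[48, 54], 1].mean()),   # image y grows downward
        "inner_brow_up":   -float(d[[21, 22], 1].mean()),
        "mouth_open":       float(current[57, 1] - current[51, 1])
                          - float(neutral[57, 1] - neutral[51, 1]),
    }

def estimate_emotion(f: dict) -> str:
    # Pixel thresholds are hypothetical placeholders.
    if f["mouth_corner_up"] > 2.0:
        return "joy"
    if f["inner_brow_up"] > 2.0 and f["mouth_open"] > 3.0:
        return "surprise"
    if f["mouth_corner_up"] < -2.0:
        return "sadness"
    return "neutral"

neutral = np.zeros((68, 2)); current = neutral.copy()
current[[48, 54], 1] -= 3.0                       # raise both mouth corners
print(estimate_emotion(expression_features(neutral, current)))  # -> "joy"
```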
• Waseda University (Takanishi Lab.)
• RaFD (Radboud Faces Database) expression dataset
https://www.behance.net/gallery/10675283/Facial-Expression-Public-Databases
Coding Facial Expressions with Gabor Wavelets. Lyons M., Akamatsu S., Kamachi M., Gyoba J., FG98
• Do we show such obvious facial expressions in real life?
• Primary and secondary emotion theory (Damasio)
• Primary emotion: direct emotional output to an input signal, such as feeling pain.
• Secondary emotion: a higher level of emotion.
• Primary emotion is related to the response of the amygdala, whereas secondary emotion is the response of the ventromedial prefrontal cortex (VMF).
• They construct a layered model (multi-level Dynamic Bayesian Networks) of mental states and display levels.
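A minimal sketch of the layered idea, simplified to a single hidden chain (not the cited model's actual structure or parameters): a hidden mental state evolves over time and emits observable display-level symbols, and forward filtering recovers a belief over the mental state.

```python
import numpy as np

# All matrices below are made-up placeholders, not the parameters of the cited work.
states = ["relaxed", "stressed"]
displays = ["neutral_face", "frown", "smile"]
T = np.array([[0.9, 0.1],       # P(next mental state | current state)
              [0.2, 0.8]])
E = np.array([[0.6, 0.1, 0.3],  # P(display | mental state)
              [0.3, 0.6, 0.1]])
prior = np.array([0.7, 0.3])

def filter_mental_state(obs_ids):
    """Forward filtering: belief over the hidden mental state after each display."""
    b = prior.copy()
    for o in obs_ids:
        b = (T.T @ b) * E[:, o]   # predict with T, then weight by the observation
        b /= b.sum()
        yield dict(zip(states, b.round(3)))

for belief in filter_mental_state([1, 1, 0]):  # frown, frown, neutral_face
    print(belief)
```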
• See also http://www.affdex.com/
Affectiva-MIT Facial Expression Dataset (AM-FED): Naturalistic and Spontaneous Facial Expressions Collected In-the-Wild
[Framework diagram shown again as a roadmap: now moving to physiological sensing and pattern analysis.]
• Mobile EDA sensors → skin conductivity
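A minimal sketch of how such a skin-conductance trace can be processed, assuming a made-up 4 Hz signal: estimate the slow tonic level, subtract it, and detect phasic skin conductance responses (SCRs) as peaks.

```python
import numpy as np
from scipy.signal import find_peaks

# Sampling rate, window sizes, and thresholds are illustrative assumptions.
fs = 4                                   # Hz, typical for wrist-worn EDA sensors
t = np.arange(0, 60, 1 / fs)
eda = 2.0 + 0.05 * t / 60                # slow tonic drift (microsiemens)
eda[100:120] += 0.3 * np.hanning(20)     # one synthetic phasic response

# Phasic component: subtract a moving-average estimate of the tonic level.
kernel = np.ones(8 * fs) / (8 * fs)      # 8 s moving average
tonic = np.convolve(eda, kernel, mode="same")
phasic = eda - tonic

# SCRs: peaks in the phasic component above a minimum amplitude.
peaks, _ = find_peaks(phasic, height=0.05, distance=2 * fs)
print("SCR peaks at (s):", t[peaks])
```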
A wearable sensor that reads facial muscles and detects positive facial expressions [Suzuki2013].
• Classify the types of smiles using fEMG (facial EMG) and SCR (skin conductivity)
• fEMG measures three sites: the orbicularis oculi, depressor anguli oris, and zygomaticus major muscles.
Recall: 98.4%, Precision: 83.3%, F-measure: 90.2%
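The reported F-measure is consistent with the recall and precision above, since it is their harmonic mean:

```python
# Quick check: the F-measure is the harmonic mean of precision and recall.
recall, precision = 0.984, 0.833
f = 2 * precision * recall / (precision + recall)
print(f"{f:.1%}")  # -> 90.2%
```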
• Eye image processing
• Corneal imaging method (non-calibrated eye gaze tracker, peripheral vision estimation, scene-corneal reflection matching)
• Pupillary response
• Find "where is one looking?"
• Many applications
• Computer science / engineering
• Human behavior analysis
• Saliency estimation
• Driver's view analysis
• Input devices, human interface design
• Other fields
• Psychology / physiology
• Medical / life sciences
• Marketing
San Agustin et al., 2009
Babcock et al., 2004
Current EGT systems have several usability problems (calibration, headmount drift).
Georgia Institute of Technology
Where is the most 'appealing' point on a vending machine?
http://bizmakoto.jp/makoto/articles/1403/26/news045.html
[Diagram: conventional head-mounted EGT pipeline. Eye image → gaze (eye) direction estimation w.r.t. the eye camera → point of gaze (PoG, scene point) estimation using the relative pose between the eye, camera, and gaze target. Requires calibration; suffers from headmount drift and parallax error.]
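A minimal geometric sketch of the conventional PoG step, assuming a planar scene target: intersect the gaze ray with the scene plane. Because the ray starts at the eye rather than at the scene camera, a calibration valid at one depth produces parallax error at other depths.

```python
import numpy as np

# All coordinates are illustrative.
def point_of_gaze(eye_pos, gaze_dir, plane_point, plane_normal):
    """Intersect the ray eye_pos + s * gaze_dir with a scene plane."""
    g = gaze_dir / np.linalg.norm(gaze_dir)
    s = np.dot(plane_point - eye_pos, plane_normal) / np.dot(g, plane_normal)
    return eye_pos + s * g

eye = np.array([0.0, 0.0, 0.0])
gaze = np.array([0.1, 0.0, 1.0])
# Scene plane 1 m in front of the user, facing back toward the eye.
pog = point_of_gaze(eye, gaze, np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]))
print("point of gaze:", pog.round(3))
```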
Corneal reflections
[Images: outside view; corneal shape and limbus]
Corneal imaging [Nishino and Nayar 2006]
• Light arriving at the cornea
• partially refracts and enters the eye
• partially reflects back into the environment
→ We can observe the human view through corneal reflections.
[Diagram: cornea acting as a mirror; camera + mirror configuration]
• Similar optics to a catadioptric imaging system
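A minimal sketch of these catadioptric optics, assuming a spherical corneal mirror of radius about 7.8 mm (a typical anatomical value; the rest of the geometry is made up): intersect a camera ray with the corneal sphere and apply the mirror reflection law to recover the scene direction imaged at that pixel.

```python
import numpy as np

def reflect(d, n):
    """Reflect direction d about unit surface normal n (mirror law r = d - 2(d.n)n)."""
    d = d / np.linalg.norm(d)
    return d - 2.0 * np.dot(d, n) * n

def corneal_hit(cam_pos, ray_dir, center, r=0.0078):
    """First intersection of a camera ray with the corneal sphere, plus its normal."""
    d = ray_dir / np.linalg.norm(ray_dir)
    oc = cam_pos - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - r * r)
    if disc < 0:
        return None                           # ray misses the cornea
    p = cam_pos + (-b - np.sqrt(disc)) * d    # nearer intersection point
    return p, (p - center) / r                # point and outward unit normal

cam = np.array([0.0, 0.0, -0.3])              # camera 30 cm in front of the eye
hit = corneal_hit(cam, np.array([0.0, 0.001, 0.3]), np.array([0.0, 0.0, 0.0]))
if hit:
    p, n = hit
    print("scene direction imaged at this pixel:", reflect(p - cam, n).round(3))
```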
• Display-Eye Calibration using corneal reflections [ICCV2009]
• Solve fundamentals of corneal reflection analysis [J. Computer Vision and Applications 2011]
• Non-calibrated gaze estimation using corneal imaging [ECCV2012]
• Super-resolution scene reconstruction using corneal reflections [BMVC2012]
• Remote non-calibrated EGT system for infants and babies (with James Rehg’s group @ Georgia Tech.)
• Robust registration of scene and eye reflection using RANRESAC algorithm [MIRU2014]
• Corneal imaging camera [WeSAX2015]
• USB2 micro camera (1280 x 1024, 15 fps)
• Two visible LEDs (to identify the corneal center position)
• Tracking speed: 10 fps for 3D eye pose estimation and GRP estimation
Flow overview: eye image projection
• Uses weak-perspective projection.
• The eye 3D pose has five parameters (x, y, s, t, f).
• These five parameters are estimated using a particle filter (a generic sketch follows).
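A generic predict / weight / resample sketch for the five pose parameters. The Gaussian "likelihood" below is a stand-in for the real image-matching score; all numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500
particles = rng.normal([320, 240, 1.0, 0.0, 0.0],        # initial guess for (x, y, s, t, f)
                       [50, 50, 0.2, 0.3, 0.3], (N, 5))

def likelihood(p, true_pose=np.array([300.0, 250.0, 1.1, 0.1, -0.2])):
    # Stand-in score; the real tracker would render the limbus/iris contour
    # under each particle's pose and compare it with the observed eye image.
    z = (p - true_pose) / np.array([20, 20, 0.1, 0.1, 0.1])
    return np.exp(-0.5 * np.sum(z ** 2, axis=1))

for frame in range(10):
    particles += rng.normal(0, [2, 2, 0.01, 0.02, 0.02], (N, 5))  # motion model
    w = likelihood(particles)
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]             # resample by weight

print("estimated pose (x, y, s, t, f):", particles.mean(axis=0).round(2))
```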
Geometric eye model
[Diagram: eyeball, corneal surface, optical axis]
Problem: find the corneal surface point that reflects the light arriving from the human viewing direction → the Gaze Reflection Point (GRP).
[3D geometry model: GRP → point of gaze]
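A minimal numerical sketch of the GRP idea under assumed geometry: search the corneal sphere for the surface point whose mirror reflection of the camera ray is parallel to the viewing direction (exact for a distant gaze target). The positions, radius, and the brute-force grid search are illustrations, not the paper's solver.

```python
import numpy as np

R = 0.0078                                     # corneal radius of curvature (~7.8 mm)
C = np.array([0.0, 0.0, 0.0])                  # corneal sphere center
cam = np.array([0.0, 0.02, -0.3])              # eye camera, 30 cm in front of the eye
gaze = np.array([0.0, 0.1, -1.0])              # viewing direction (toward the scene)
gaze = gaze / np.linalg.norm(gaze)

def reflected_camera_ray(p):
    """Direction a camera-to-p ray takes after mirror reflection at surface point p."""
    n = (p - C) / R                            # outward unit normal on the sphere
    d = (p - cam) / np.linalg.norm(p - cam)    # incoming ray, camera -> surface
    return d - 2.0 * np.dot(d, n) * n

# The time-reversed reflected ray at the GRP must point toward the gazed scene
# point, i.e. be parallel to the gaze direction for a distant target.
best, best_err = None, np.inf
for theta in np.linspace(-0.6, 0.6, 121):
    for phi in np.linspace(-0.6, 0.6, 121):
        p = C + R * np.array([np.cos(theta) * np.sin(phi),
                              np.sin(theta),
                              -np.cos(theta) * np.cos(phi)])
        err = np.linalg.norm(reflected_camera_ray(p) - gaze)
        if err < best_err:
            best, best_err = p, err

print("GRP (m):", best.round(5), " residual:", round(best_err, 4))
```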
• Designing the 3rd generation → practical use in the marketing and health-care fields.
• Much higher resolution (full HD) and frequency (30 Hz).
• Smarter!
• We want to capture the eye gaze of infants in everyday scenes.
• Problems in existing methods:
– Require body-attached devices.
– Narrow field of movement.
– Restricted to indoor environments.
– Require calibration.
– System errors, including drift and parallax errors.
[Franchak2010] (NYU)
• Strong relation between ASD and gaze
• Develop a method to observe the gaze behavior of infants / children
• Requirements
• Easy measurement (w/o body-attached devices, w/o headmount)
• W/o calibration
Camera array with zoom and focus controller
[Video]
• Point of gaze estimation during a card magic trick.
• Eye tracking alone cannot tell whether one is looking attentively or not.
• We want to estimate human concentration → pupillary response.
[Images: normal vs. concentrating (pupil dilates)]
• Approach: observe pupil dilation. When one is concentrating, the pupil dilates. → Build a pupil dilation model with respect to task difficulty.
Task
• Ask 10 users to play a path-tracking game (iraira-bou, a "steady hand" game).
Experiment
• Observe pupil size using an eye observation camera (with IR).
[Game screen: Start, Goal, Cursor]
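A minimal sketch of pupil-size measurement from an IR eye image (OpenCV 4), assuming the pupil is the darkest region: threshold, take the largest contour, and fit an ellipse. The threshold value and the synthetic test image are assumptions.

```python
import cv2
import numpy as np

def pupil_size(gray: np.ndarray, thresh: int = 40) -> float:
    """Mean axis length (pixels) of an ellipse fitted to the darkest blob."""
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    pupil = max(contours, key=cv2.contourArea)
    (_, _), (w, h), _ = cv2.fitEllipse(pupil)
    return (w + h) / 2.0

# Synthetic test: a dark disk ("pupil") on a brighter iris/sclera background.
img = np.full((240, 320), 150, np.uint8)
cv2.circle(img, (160, 120), 25, 10, -1)
print(pupil_size(img))  # ~50 pixels (diameter of the synthetic pupil)
```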
[Plot: pupil size (pixels) vs. time (ms) while entering (IN) and exiting (OUT) pathways of width 5 and 10 pixels. The pupil dilates when entering the narrow pathway and gets smaller after going out to the wider pathway.]
$a_i$ (random-effects analysis): $a_i > 0$ ** (p = 0.0013)
→ Statistically confirmed that the difficulty of the game (pathway width) is related to pupil dilation.
The model between ID (task difficulty) and pupil change ratio:
$$ID = \frac{1}{PathWidth}, \qquad PupilChangeRatio_i = a_i \cdot ID + b_i = \frac{a_i}{PathWidth} + b_i$$
[Plot: pupil dilation (ratio) vs. pathway width (pixels)]
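A minimal sketch of fitting the per-subject model above by least squares, using made-up sample data (not the experiment's measurements):

```python
import numpy as np

# Fit PupilChangeRatio_i = a_i * ID + b_i for one subject by least squares.
path_width = np.array([5.0, 10.0, 20.0, 40.0])   # pixels (made-up trials)
ratio      = np.array([1.20, 1.10, 1.05, 1.02])  # pupil change ratio (made-up)
ID = 1.0 / path_width                            # task difficulty index

a_i, b_i = np.polyfit(ID, ratio, deg=1)          # slope first, then intercept
print(f"a_i = {a_i:.3f} (expected > 0), b_i = {b_i:.3f}")
```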
[Illustration: "real-scene page views": how many people per day look at each spot in a city (1,000/day, 3,000/day, 5,000/day, 10,000/day).]
• Collecting mass gaze data
• What is the most viewed object in a city?
• How many people looked at a particular advertisement?
• Where should we show critical information?
[Diagram: memorability of an object is influenced by personal difference, how the subject sees the object, and the status of the subject (assumed to be constant in our experiments). Features are retrieved from the first-person-view (FPV) image of the object in a scene.]
Features retrieved from the FPV image:
• Frequency: how many times the object appeared in the first-person-view video. [Timeline: first time, second time, ...; dwell time]
• Duration: how many seconds the object appeared in the first-person-view video.
• Position: where the object appeared in the first-person-view image (central 1/4 region vs. peripheral region).
Scene memorability is related to how the subject sees the object.
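A minimal sketch of computing these three features from per-frame FPV observations. The record format and the time gap separating two "appearances" are assumptions.

```python
from collections import defaultdict

# Each record: (frame_time_sec, object_id, in_central_quarter).
def gaze_features(records, fps=30, gap_sec=1.0):
    feats = defaultdict(lambda: {"frequency": 0, "duration": 0.0, "central": 0})
    last_seen = {}
    for t, obj, central in records:
        f = feats[obj]
        if obj not in last_seen or t - last_seen[obj] > gap_sec:
            f["frequency"] += 1              # a new, separate appearance
        f["duration"] += 1.0 / fps           # seconds visible (one frame's worth)
        f["central"] += int(central)         # frames seen in the central region
        last_seen[obj] = t
    return dict(feats)

records = [(0.00, "poster", True), (0.03, "poster", True), (5.00, "poster", False)]
print(gaze_features(records))
# poster: frequency 2 (two separate appearances), duration 0.1 s, central 2
```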
• Looking more frequently and for longer leads to higher memorability.
[Scatter plots for Subjects 1-4 and the total: duration (sec) vs. frequency (times). RED: correctly remembered; BLUE: wrongly remembered.]
• A novel eye gaze tracking method using the corneal imaging technique.
• Calibration-free, no headmount drift.
• Obtains the human peripheral view as well (not only the point of gaze).
• Applications for marketing and health-care purposes, as well as user interfaces.
• Pupillary response indicates human concentration.
• Showed a model between task difficulty and pupil dilation.
• Potentially usable as a new kind of biosensor.