Artificial Intelligence Advanced Visual Interaction
Understanding Human Motions and Emotions using Visual Information
Atsushi Nakazawa
IST, Kyoto University
• 12/2 (today) – Understanding human emotions using visual information
• 12/9 (next week) – Understanding human motions (behaviors) using visual information
• Affective Computing – Treating human emotions in computer systems
• Rosalind Picard – TED talk
• Applications of Affective Computing
• How to quantify human emotions?
• Emotion model: Primary and secondary emotions
• How to measure human emotions?
• Quantifying facial actions: FACS (Facial Action Coding System)
• Relations between facial expressions and emotions
• Physiological signals and parasympathetic nervous system
• Measurement of human internal states from physiological signals
• Human gaze estimation
• Gaze and its measurement techniques
• Corneal imaging and its applications
• Estimate human internal states from eye images – pupillary response
• Summary
Human Machine Interface (HMI) that understands / expresses emotional signals
Recent trend
• Machines that simply follow commands have been realized by recent AI algorithms (such as deep learning),
e.g. web search, natural language processing (NLP), face detection and identification, character and image recognition.
• On the contrary, humans do not change so much!
• Computer allergy
• Humans cannot specify the task:
"I want clothes that look good on me!" "I want to drive to a nice place!"
• Humans cannot control their emotions:
"My car navigation system always shows the wrong routes!"
• Can we give emotions to machines?
• Proposed by Dr. Rosalind Picard (MIT Media Lab.), 1995
• "Affective Computing"
Discussed why computer systems need emotional sensing and processing.
• Applications of Affective Computing
• Elements of Affective Computing
• Concerns about Affective Computing
TED Talk video
• Generally, emotions are regarded as inferior to logic.
• Usually people think, "Emotions disturb one's decisions."
• Observations of a patient with frontal lobe damage (Antonio Damasio)
• He has a normal IQ and cognitive abilities, but does not have emotions (he appears to be like Star Trek's Mr. Spock).
• In reality, he always makes disastrous decisions.
• E.g., typical people stop investing after large losses; however, he continues investing until becoming bankrupt.
• He has similar problems in social interactions.
• He seems to be unable to learn the links between dangerous choices and bad feelings, so he repeats dangerous decisions.
• He will disappear into an endless rational search for anything, even a simple task such as scheduling an appointment:
"Well, this time might be good." "Maybe I will have to be on that side of town, so this time would be better..."
• Emotion is vital for human decision making.
• Damasio has hypothesized that Elliot's brain is missing "somatic markers" that associate positive or negative feelings with certain decisions. These feelings would help limit a mental search by nudging the person away from considering possibilities with bad associations.
• Similar logic applies to future computer (machine) systems: computer systems that can make emotional decisions, beyond logic.
Human-Machine relation
• Physiological sensing
• Pattern analysis
• Sensors
• Emotional expressions using machine systems
Human-Human relation
• Affective communication
• Affective wearable computers
Fundamentals
• Modeling and understanding emotions
[Framework diagram: understanding and modeling of emotions, physiological sensing, and pattern analysis feed affective computing applications: interaction with machine systems, emotional expression using machine systems, and affective communications.]
MAUI: a Multimodal Affective User Interface
• What is emotion for machines?
• 135 / 379 users perceive a gender in the robot.
• 87 / 379 users give names.
• Most commonly, the robots were given a human name such as Sarah, Alex, Joe, or Veronica. Other names were wordplays on the word "Roomba": Roomie, Roomby and Ruby.
• 44 / 379 users find a personality.
• 42 / 379 users talk to it.
• 43 / 379 users dress it up.
http://www.nbcnews.com/id/21102202/ns/technology_and_science-tech_and_gadgets/t/roombas-fill-emotional-vacuum-owners/#.Vdqjna2hMxA
Housewives or Technophiles?: Understanding Domestic Robot Owners
MAUI: a Multimodal Affective User Interface
• Many efforts have been made to model emotions.
• Facial Action Coding System (FACS) – Ekman
• Facial expressions consist of the movements of facial parts (in FACS, 68 parts in 10 groups) → Action Units (AUs)
• The six basic emotions (Anger, Disgust, Fear, Joy, Sadness, Surprise) consist of combinations of AUs.
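As a rough illustration of how basic emotions map to AU combinations, here is a minimal sketch. The AU sets below follow commonly cited EMFACS-style prototypes; they are illustrative, not the exact table from the lecture.

```python
# Sketch: basic emotions as combinations of FACS Action Units.
# AU combinations follow commonly cited EMFACS-style prototypes (illustrative).
EMOTION_AUS = {
    "joy":      {6, 12},                 # cheek raiser + lip corner puller
    "sadness":  {1, 4, 15},              # inner brow raiser + brow lowerer + lip corner depressor
    "surprise": {1, 2, 5, 26},           # brow raisers + upper lid raiser + jaw drop
    "fear":     {1, 2, 4, 5, 7, 20, 26},
    "anger":    {4, 5, 7, 23},
    "disgust":  {9, 15, 16},             # nose wrinkler + lip depressors
}

def classify_emotion(active_aus: set) -> str:
    """Return the emotion whose AU prototype overlaps most with the detected AUs."""
    score = lambda proto: len(active_aus & proto) / len(proto)
    return max(EMOTION_AUS, key=lambda e: score(EMOTION_AUS[e]))

print(classify_emotion({1, 4, 15}))  # -> "sadness"
```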
Sadness:
• close_t_l_eyelid (FAP 19), close_t_r_eyelid (FAP 20), close_b_l_eyelid (FAP 21), close_b_r_eyelid (FAP 22), raise_l_i_eyebrow (FAP 31), raise_r_i_eyebrow (FAP 32), raise_l_m_eyebrow (FAP 33), raise_r_m_eyebrow (FAP 34), raise_l_o_eyebrow (FAP 35), raise_r_o_eyebrow (FAP 36)
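The FAP numbers above follow the MPEG-4 facial animation parameter set; the list can be treated as a template to match against measured activations. A minimal sketch (the 0.8 match threshold is an assumption):

```python
# Sketch: the FAP set for "sadness" from the slide, encoded as a template.
SADNESS_FAPS = {19, 20, 21, 22, 31, 32, 33, 34, 35, 36}  # eyelid-closing + eyebrow-raising FAPs

def matches_sadness(active_faps: set, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the sadness FAPs are active (threshold assumed)."""
    return len(active_faps & SADNESS_FAPS) / len(SADNESS_FAPS) >= threshold

print(matches_sadness({19, 20, 21, 22, 31, 32, 33, 34}))  # 8/10 active -> True
```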
• Track facial parts → estimate emotions from the movements of facial parts (a sketch follows below)
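A minimal sketch of this tracking-based pipeline, assuming 68-point facial landmarks (iBUG/dlib-style indexing). The features and thresholds are illustrative, not the system shown on the slide.

```python
import numpy as np

# Sketch: track facial landmarks frame by frame, convert displacements from a
# neutral frame into simple features, then map features to an emotion.
def expression_features(neutral: np.ndarray, current: np.ndarray) -> dict:
    """neutral/current: (68, 2) arrays of tracked landmark coordinates."""
    d = current - neutral
    return {
        "mouth_corner_up": -float(d[[48, 54], 1].mean()),   # image y grows downward
        "inner_brow_up":   -float(d[[21, 22], 1].mean()),
        "mouth_open":       float(current[57, 1] - current[51, 1])
                          - float(neutral[57, 1] - neutral[51, 1]),
    }

def estimate_emotion(f: dict) -> str:
    # Pixel thresholds are hypothetical placeholders.
    if f["mouth_corner_up"] > 2.0:
        return "joy"
    if f["inner_brow_up"] > 2.0 and f["mouth_open"] > 3.0:
        return "surprise"
    if f["mouth_corner_up"] < -2.0:
        return "sadness"
    return "neutral"

neutral = np.zeros((68, 2)); current = neutral.copy()
current[[48, 54], 1] -= 3.0                       # raise both mouth corners
print(estimate_emotion(expression_features(neutral, current)))  # -> "joy"
```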
• Waseda University (Takanishi Lab.)
• RaFD (Radboud Faces Database) expression dataset
https://www.behance.net/gallery/10675283/Facial-Expression-Public-Databases
Coding Facial Expressions with Gabor Wavelets. Lyons M., Akamatsu S., Kamachi M., Gyoba J., FG98
• Do we show such obvious facial expressions in real life?
• Primary and secondary emotion theory (Damasio)
• Primary emotion: direct emotional output to an input signal, such as feeling pain.
• Secondary emotion: a higher level of emotion.
• Primary emotion is related to the response of the amygdala, whereas secondary emotion is the response of the ventromedial prefrontal cortex (VMF).
• They construct a layered model (multi-level Dynamic Bayesian Networks) of mental states and display levels.
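A minimal sketch of the layered idea, simplified to a single hidden chain (not the cited model's actual structure or parameters): a hidden mental state evolves over time and emits observable display-level symbols, and forward filtering recovers a belief over the mental state.

```python
import numpy as np

# All matrices below are made-up placeholders, not the parameters of the cited work.
states = ["relaxed", "stressed"]
displays = ["neutral_face", "frown", "smile"]
T = np.array([[0.9, 0.1],       # P(next mental state | current state)
              [0.2, 0.8]])
E = np.array([[0.6, 0.1, 0.3],  # P(display | mental state)
              [0.3, 0.6, 0.1]])
prior = np.array([0.7, 0.3])

def filter_mental_state(obs_ids):
    """Forward filtering: belief over the hidden mental state after each display."""
    b = prior.copy()
    for o in obs_ids:
        b = (T.T @ b) * E[:, o]   # predict with T, then weight by the observation
        b /= b.sum()
        yield dict(zip(states, b.round(3)))

for belief in filter_mental_state([1, 1, 0]):  # frown, frown, neutral_face
    print(belief)
```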
• See also http://www.affdex.com/
Affectiva-MIT Facial Expression Dataset (AM-FED): Naturalistic and Spontaneous Facial Expressions Collected In-the-Wild
[Framework diagram shown again as a roadmap: now moving to physiological sensing and pattern analysis.]
• Mobile EDA sensors → skin conductivity
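A minimal sketch of how such a skin-conductance trace can be processed, assuming a made-up 4 Hz signal: estimate the slow tonic level, subtract it, and detect phasic skin conductance responses (SCRs) as peaks.

```python
import numpy as np
from scipy.signal import find_peaks

# Sampling rate, window sizes, and thresholds are illustrative assumptions.
fs = 4                                   # Hz, typical for wrist-worn EDA sensors
t = np.arange(0, 60, 1 / fs)
eda = 2.0 + 0.05 * t / 60                # slow tonic drift (microsiemens)
eda[100:120] += 0.3 * np.hanning(20)     # one synthetic phasic response

# Phasic component: subtract a moving-average estimate of the tonic level.
kernel = np.ones(8 * fs) / (8 * fs)      # 8 s moving average
tonic = np.convolve(eda, kernel, mode="same")
phasic = eda - tonic

# SCRs: peaks in the phasic component above a minimum amplitude.
peaks, _ = find_peaks(phasic, height=0.05, distance=2 * fs)
print("SCR peaks at (s):", t[peaks])
```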
A wearable sensor that reads facial muscles and detects positive facial expressions [Suzuki2013].
• Classify the types of smiles using fEMG (facial EMG) and SCR (skin conductivity)
• fEMG measures three sites: the orbicularis oculi, depressor anguli oris, and zygomaticus major muscles.
Recall: 98.4%, Precision: 83.3%, F-measure: 90.2%
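The reported F-measure is consistent with the recall and precision above, since it is their harmonic mean:

```python
# Quick check: the F-measure is the harmonic mean of precision and recall.
recall, precision = 0.984, 0.833
f = 2 * precision * recall / (precision + recall)
print(f"{f:.1%}")  # -> 90.2%
```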
• Eye image processing
• Corneal imaging method (non-calibrated eye gaze tracker, peripheral vision estimation, scene-corneal reflection matching)
• Pupillary response
• Find "where is one looking?"
• Many applications
• Computer science / engineering
• Human behavior analysis
• Saliency estimation
• Driver's view analysis
• Input devices, human interface design
• Other fields
• Psychology / physiology
• Medical / life sciences
• Marketing
San Agustin et al., 2009
Babcock et al., 2004
Current EGT systems have several usability problems (calibration, headmount drift).
Georgia Institute of Technology
Where is the most 'appealing' point on a vending machine?
http://bizmakoto.jp/makoto/articles/1403/26/news045.html
[Diagram: conventional head-mounted EGT pipeline. Eye image → gaze (eye) direction estimation w.r.t. the eye camera → point of gaze (PoG, scene point) estimation using the relative pose between the eye, camera, and gaze target. Requires calibration; suffers from headmount drift and parallax error.]
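A minimal geometric sketch of the conventional PoG step, assuming a planar scene target: intersect the gaze ray with the scene plane. Because the ray starts at the eye rather than at the scene camera, a calibration valid at one depth produces parallax error at other depths.

```python
import numpy as np

# All coordinates are illustrative.
def point_of_gaze(eye_pos, gaze_dir, plane_point, plane_normal):
    """Intersect the ray eye_pos + s * gaze_dir with a scene plane."""
    g = gaze_dir / np.linalg.norm(gaze_dir)
    s = np.dot(plane_point - eye_pos, plane_normal) / np.dot(g, plane_normal)
    return eye_pos + s * g

eye = np.array([0.0, 0.0, 0.0])
gaze = np.array([0.1, 0.0, 1.0])
# Scene plane 1 m in front of the user, facing back toward the eye.
pog = point_of_gaze(eye, gaze, np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]))
print("point of gaze:", pog.round(3))
```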
Corneal reflections
[Images: outside view; corneal shape and limbus]
Corneal imaging [Nishino and Nayar 2006]
• Light arriving at the cornea
• partially refracts and enters the eye
• partially reflects back into the environment
→ We can observe the human view through corneal reflections.
[Diagram: cornea acting as a mirror; camera + mirror configuration]
• Similar optics to a catadioptric imaging system
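A minimal sketch of these catadioptric optics, assuming a spherical corneal mirror of radius about 7.8 mm (a typical anatomical value; the rest of the geometry is made up): intersect a camera ray with the corneal sphere and apply the mirror reflection law to recover the scene direction imaged at that pixel.

```python
import numpy as np

def reflect(d, n):
    """Reflect direction d about unit surface normal n (mirror law r = d - 2(d.n)n)."""
    d = d / np.linalg.norm(d)
    return d - 2.0 * np.dot(d, n) * n

def corneal_hit(cam_pos, ray_dir, center, r=0.0078):
    """First intersection of a camera ray with the corneal sphere, plus its normal."""
    d = ray_dir / np.linalg.norm(ray_dir)
    oc = cam_pos - center
    b = np.dot(oc, d)
    disc = b * b - (np.dot(oc, oc) - r * r)
    if disc < 0:
        return None                           # ray misses the cornea
    p = cam_pos + (-b - np.sqrt(disc)) * d    # nearer intersection point
    return p, (p - center) / r                # point and outward unit normal

cam = np.array([0.0, 0.0, -0.3])              # camera 30 cm in front of the eye
hit = corneal_hit(cam, np.array([0.0, 0.001, 0.3]), np.array([0.0, 0.0, 0.0]))
if hit:
    p, n = hit
    print("scene direction imaged at this pixel:", reflect(p - cam, n).round(3))
```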
• Display-Eye Calibration using corneal reflections [ICCV2009]
• Solve fundamentals of corneal reflection analysis [J. Computer Vision and Applications 2011]
• Non-calibrated gaze estimation using corneal imaging [ECCV2012]
• Super-resolution scene reconstruction using corneal reflections [BMVC2012]
• Remote non-calibrated EGT system for infants and babies (with James Rehg’s group @ Georgia Tech.)
• Robust registration of scene and eye reflection using RANRESAC algorithm [MIRU2014]
• Corneal imaging camera [WeSAX2015]
• USB2 micro camera (1280 x 1024, 15 fps)
• Two visible LEDs (to identify the corneal center position)
• Tracking speed: 10 fps for 3D eye pose estimation and GRP estimation
Flow overview: eye image projection
• Uses weak-perspective projection.
• The eye 3D pose has five parameters (x, y, s, t, f).
• These five parameters are estimated using a particle filter (a generic sketch follows).
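A generic predict / weight / resample sketch for the five pose parameters. The Gaussian "likelihood" below is a stand-in for the real image-matching score; all numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500
particles = rng.normal([320, 240, 1.0, 0.0, 0.0],        # initial guess for (x, y, s, t, f)
                       [50, 50, 0.2, 0.3, 0.3], (N, 5))

def likelihood(p, true_pose=np.array([300.0, 250.0, 1.1, 0.1, -0.2])):
    # Stand-in score; the real tracker would render the limbus/iris contour
    # under each particle's pose and compare it with the observed eye image.
    z = (p - true_pose) / np.array([20, 20, 0.1, 0.1, 0.1])
    return np.exp(-0.5 * np.sum(z ** 2, axis=1))

for frame in range(10):
    particles += rng.normal(0, [2, 2, 0.01, 0.02, 0.02], (N, 5))  # motion model
    w = likelihood(particles)
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]             # resample by weight

print("estimated pose (x, y, s, t, f):", particles.mean(axis=0).round(2))
```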
Geometric eye model
[Diagram: eyeball, corneal surface, optical axis]
Problem: find the corneal surface point that reflects the light arriving from the human viewing direction → the Gaze Reflection Point (GRP).
[3D geometry model: GRP → point of gaze]
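A minimal numerical sketch of the GRP idea under assumed geometry: search the corneal sphere for the surface point whose mirror reflection of the camera ray is parallel to the viewing direction (exact for a distant gaze target). The positions, radius, and the brute-force grid search are illustrations, not the paper's solver.

```python
import numpy as np

R = 0.0078                                     # corneal radius of curvature (~7.8 mm)
C = np.array([0.0, 0.0, 0.0])                  # corneal sphere center
cam = np.array([0.0, 0.02, -0.3])              # eye camera, 30 cm in front of the eye
gaze = np.array([0.0, 0.1, -1.0])              # viewing direction (toward the scene)
gaze = gaze / np.linalg.norm(gaze)

def reflected_camera_ray(p):
    """Direction a camera-to-p ray takes after mirror reflection at surface point p."""
    n = (p - C) / R                            # outward unit normal on the sphere
    d = (p - cam) / np.linalg.norm(p - cam)    # incoming ray, camera -> surface
    return d - 2.0 * np.dot(d, n) * n

# The time-reversed reflected ray at the GRP must point toward the gazed scene
# point, i.e. be parallel to the gaze direction for a distant target.
best, best_err = None, np.inf
for theta in np.linspace(-0.6, 0.6, 121):
    for phi in np.linspace(-0.6, 0.6, 121):
        p = C + R * np.array([np.cos(theta) * np.sin(phi),
                              np.sin(theta),
                              -np.cos(theta) * np.cos(phi)])
        err = np.linalg.norm(reflected_camera_ray(p) - gaze)
        if err < best_err:
            best, best_err = p, err

print("GRP (m):", best.round(5), " residual:", round(best_err, 4))
```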
• Designing the 3rd generation → practical use in the marketing and health-care fields.
• Much higher resolution (full HD) and frequency (30 Hz).
• Smarter!
• We want to capture the eye gaze of infants in everyday scenes.
• Problems in existing methods:
– Require body-attached devices.
– Narrow field of movement.
– Restricted to indoor environments.
– Require calibration.
– System errors, including drift and parallax errors.
[Franchak2010] (NYU)
• Strong relation between ASD and gaze
• Develop a method to observe the gaze behavior of infants / children
• Requirements
• Easy measurement (w/o body-attached devices, w/o headmount)
• W/o calibration
Camera array with zoom and focus controller
[Video]
• Point of gaze estimation during a card magic trick.
• Eye tracking alone cannot tell whether one is looking attentively or not.
• We want to estimate human concentration → pupillary response.
[Images: normal vs. concentrating (pupil dilates)]
• Approach: observe pupil dilation. When one is concentrating, the pupil dilates. → Build a pupil dilation model with respect to task difficulty.
Task
• Ask 10 users to play a path-tracking game (iraira-bou, a "steady hand" game).
Experiment
• Observe pupil size using an eye observation camera (with IR).
[Game screen: Start, Goal, Cursor]
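A minimal sketch of pupil-size measurement from an IR eye image (OpenCV 4), assuming the pupil is the darkest region: threshold, take the largest contour, and fit an ellipse. The threshold value and the synthetic test image are assumptions.

```python
import cv2
import numpy as np

def pupil_size(gray: np.ndarray, thresh: int = 40) -> float:
    """Mean axis length (pixels) of an ellipse fitted to the darkest blob."""
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    pupil = max(contours, key=cv2.contourArea)
    (_, _), (w, h), _ = cv2.fitEllipse(pupil)
    return (w + h) / 2.0

# Synthetic test: a dark disk ("pupil") on a brighter iris/sclera background.
img = np.full((240, 320), 150, np.uint8)
cv2.circle(img, (160, 120), 25, 10, -1)
print(pupil_size(img))  # ~50 pixels (diameter of the synthetic pupil)
```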
[Plot: pupil size (pixels) vs. time (ms) while entering (IN) and exiting (OUT) pathways of width 5 and 10 pixels. The pupil dilates when entering the narrow pathway and gets smaller after going out to the wider pathway.]
$a_i$ (random-effects analysis): $a_i > 0$ ** (p = 0.0013)
→ Statistically confirmed that the difficulty of the game (pathway width) is related to pupil dilation.
The model between ID (task difficulty) and pupil change ratio:
$$ID = \frac{1}{PathWidth}, \qquad PupilChangeRatio_i = a_i \cdot ID + b_i = \frac{a_i}{PathWidth} + b_i$$
[Plot: pupil dilation (ratio) vs. pathway width (pixels)]
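A minimal sketch of fitting the per-subject model above by least squares, using made-up sample data (not the experiment's measurements):

```python
import numpy as np

# Fit PupilChangeRatio_i = a_i * ID + b_i for one subject by least squares.
path_width = np.array([5.0, 10.0, 20.0, 40.0])   # pixels (made-up trials)
ratio      = np.array([1.20, 1.10, 1.05, 1.02])  # pupil change ratio (made-up)
ID = 1.0 / path_width                            # task difficulty index

a_i, b_i = np.polyfit(ID, ratio, deg=1)          # slope first, then intercept
print(f"a_i = {a_i:.3f} (expected > 0), b_i = {b_i:.3f}")
```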
[Illustration: "real-scene page views": how many people per day look at each spot in a city (1,000/day, 3,000/day, 5,000/day, 10,000/day).]
• Collecting mass gaze data
• What is the most viewed object in a city?
• How many people looked at a particular advertisement?
• Where should we show critical information?
[Diagram: memorability of an object is influenced by personal difference, how the subject sees the object, and the status of the subject (assumed to be constant in our experiments). Features are retrieved from the first-person-view (FPV) image of the object in a scene.]
Features retrieved from the FPV image:
• Frequency: how many times the object appeared in the first-person-view video. [Timeline: first time, second time, ...; dwell time]
• Duration: how many seconds the object appeared in the first-person-view video.
• Position: where the object appeared in the first-person-view image (central 1/4 region vs. peripheral region).
Scene memorability is related to how the subject sees the object.
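A minimal sketch of computing these three features from per-frame FPV observations. The record format and the time gap separating two "appearances" are assumptions.

```python
from collections import defaultdict

# Each record: (frame_time_sec, object_id, in_central_quarter).
def gaze_features(records, fps=30, gap_sec=1.0):
    feats = defaultdict(lambda: {"frequency": 0, "duration": 0.0, "central": 0})
    last_seen = {}
    for t, obj, central in records:
        f = feats[obj]
        if obj not in last_seen or t - last_seen[obj] > gap_sec:
            f["frequency"] += 1              # a new, separate appearance
        f["duration"] += 1.0 / fps           # seconds visible (one frame's worth)
        f["central"] += int(central)         # frames seen in the central region
        last_seen[obj] = t
    return dict(feats)

records = [(0.00, "poster", True), (0.03, "poster", True), (5.00, "poster", False)]
print(gaze_features(records))
# poster: frequency 2 (two separate appearances), duration 0.1 s, central 2
```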
• Looking more frequently and for longer leads to higher memorability.
[Scatter plots for Subjects 1-4 and the total: duration (sec) vs. frequency (times). RED: correctly remembered; BLUE: wrongly remembered.]
• A novel eye gaze tracking method using the corneal imaging technique.
• Calibration-free, no headmount drift.
• Obtains the human peripheral view as well (not only the point of gaze).
• Applications for marketing and health-care purposes, as well as user interfaces.
• Pupillary response indicates human concentration.
• Showed a model between task difficulty and pupil dilation.
• Potentially usable as a new kind of biosensor.