EMOTIC: Emotions in Context Dataset
Ronak Kosti∗, Jose M. Alvarez†, Adria Recasens‡, Agata Lapedriza∗
Universitat Oberta de Catalunya∗
Data61 / CSIRO†
Massachusetts Institute of Technology‡
{rkosti, alapedriza}@uoc.edu∗, [email protected]†, [email protected]‡
Abstract
Recognizing people’s emotions from their frame of ref-
erence is very important in our everyday life. This capac-
ity helps us to perceive or predict the subsequent actions
of people, to interact effectively with them and to be sympa-
thetic and sensitive toward them. Hence, one should ex-
pect that a machine needs to have a similar capability of
understanding people’s feelings in order to correctly inter-
act with humans. Current research on emotion recognition
has focused on the analysis of facial expressions. However,
recognizing emotions also requires understanding the scene
in which a person is immersed. The unavailability of suit-
able data to study such a problem has made research in
emotion recognition in context difficult. In this paper, we
present the EMOTIC database (from EMOTions In Con-
text), a database of images with people in real environ-
ments, annotated with their apparent emotions. We defined
an extended list of 26 emotion categories to annotate the
images, and combined these annotations with three com-
mon continuous dimensions: Valence, Arousal, and Dom-
inance. Images in the database are annotated using the
Amazon Mechanical Turk (AMT) platform. The resulting
set contains 18,316 images with 23,788 annotated people.
The goal of this paper is to present the EMOTIC database,
detailing how it was created and what information is available.
We expect this dataset can help open new horizons for
creating systems capable of recognizing rich information about
people's apparent emotional states.
1. Introduction
When we look at a person it is very easy for us to put
ourselves in her situation, and even to feel, to some degree,
things that this person appears to be feeling. We constantly
use this exceptional ability to guess how others feel in
our daily lives. Such empathizing capacity helps us to be
more helpful, sensitive, sympathetic, affectionate and cor-
Figure 1: What can we estimate about these people's emotional states? Each row (a, b, c) shows the same person at three levels of context: face/head, body, and person in context.
dial in our social interactions. More generally, this capacity
helps us to better understand other people, to understand the
motivations and goals behind their actions and to predict
how they will react to different events.
In this paper we introduce the EMOTIC (Emotions in
Context) database. The EMOTIC database is a large scale
annotated image database with people in context. In this
database people are annotated according to their apparent
emotions, with a rich set of 26 emotional categories and also
with continuous dimensions (valence, arousal and domi-
nance). The images, which show the context of the person,
cover a wide range of scene types and activities, allowing
the study of emotion recognition beyond the analysis of fa-
cial expressions.
There has been a lot of research in emotion recogni-
tion from images. In particular, remarkable research has
been done in the direction of recognizing the 6 basic emo-
tions [11] (anger, disgust, fear, happiness, sadness, and sur-
prise), mainly from facial expression, but also from body
language analysis. In section 2, we give a brief overview
of some of the most relevant works. However, despite these
efforts, machines are still far from recognizing emotional
states as we do.
The problem of emotional state recognition is extremely
complex, but our hypothesis is that there are two main im-
portant limitations in the current approaches: (1) First, most
of the existing databases for emotion recognition lack fine-
grained labels of human emotions. Most studies classify emo-
tions according to 6 categories, but this is far from the fine-
grained categorization that humans are capable of. In this
work we introduce a more sophisticated set of 26 emotion
categories and combine them with the common continuous
dimensions (valence, arousal and dominance). This combi-
nation provides a rich description of the emotional state of
a person. (2) Second, the context (the surroundings of the
person) is an important source of information and has been
traditionally dismissed. The EMOTIC database attempts to
overcome these two issues.
Recent research in psychology highlights the importance
of context in the perception of emotions [4]. The impor-
tance of the context to infer fine grain information on ap-
parent emotional states is illustrated in Figure 1. For ex-
ample, in Figure 1.a, when we look only at the face of the
person (first column), we can guess that the person is feel-
ing Happiness, but it is hard to infer additional information
about her emotions. The body pose and clothing (second
column) give additional clues and we can infer that she
is practicing some sport. However, when we consider the
whole context (third column), we can see that she was in-
volved in a competition and she got the first prize. From
this information, we can say she probably feels Excitement
and Confidence. A similar story can be told for the person in
Figure 1.b. We only see part of the face (first column) which
is not very informative, but the body (second column) indi-
cates that the person is looking away toward something or
someone - which apparently has his attention. Even now
we cannot tell much. It is only when we look at the whole
context (third column) that it becomes clear that the person is
in a meeting room and he is paying attention to a person
talking, probably feeling Engagement. Figure 1.c shows
an even more challenging situation. We just see the back of
the person’s head (first column) which does not give any
information about the emotional state of the person. The
body pose (second column) reveals part of the story, but it
is only the whole image (third column) that tells us that the
boy is playing, so he probably feels Engagement with some
other kids, and he is probably in a state of Anticipation of
the trajectory of the ball. The goal of the EMOTIC dataset
is to provide data for developing automatic systems that can
make these types of inferences.
The EMOTIC database is introduced in a paper accepted
for the Computer Vision and Pattern Recognition 2017 con-
ference [18]. In this paper we give more details on the
dataset, on the image annotation, and on the annotation
consistency among different annotators. We also present
extended statistics and algorithmic analysis of the data us-
ing state-of-the-art methods for scene category and scene
attribute recognition. Our analytics show different distribu-
tion patterns of emotions depending on the different scenes
and attributes. The obtained results indicate that current
systems of scene understanding can be used to incorpo-
rate the analysis of context in the understanding of people’s
emotions. Thus, we think that the EMOTIC dataset, in com-
bination with previous datasets for emotion estimation (see
Section 2.1 for an overview), can help to make further steps
in the direction of designing systems capable of recognizing
people’s emotions as humans do.
2. Related Work
Most of the research in computer vision to recognize
emotional states is contextualized in facial expression anal-
ysis (e.g.,[5, 13]). Some of these methods are based on the
Facial Action Coding System [15, 29]. This system uses
a set of specific localized movements of the face, called
Action Units, to encode the facial expression. These Ac-
tion Units can be recognized from geometric-based fea-
tures and/or appearance features extracted from face images
[23, 19, 12]. Recent works for emotion recognition based
on facial expression use CNNs to recognize the emotions
and/or the Action Units [5].
Instead of recognizing emotion categories, some recent
works on facial expression [27] use the continuous dimen-
sions of the VAD Emotional State Model [21] to represent
emotions. The VAD model describes emotions using 3 nu-
merical dimensions: Valence (V), which measures how pos-
itive or pleasant an emotion is, ranging from negative to
positive; Arousal (A), which measures the agitation level of
the person, ranging from non-active / calm to agitated /
ready to act; and Dominance (D), which measures the person's
level of control over the situation, ranging from submissive
/ no control to dominant / in control. On the other hand,
Du et al. [10] proposed a set of 21 facial emotion categories,
defined as different combinations of the basic emotions, like
‘happily surprised’ or ‘happily disgusted’. This categoriza-
tion gives more detail about the expressed emotion.
Although most of the works in recognizing emotions are
focused on face analysis, there are a few works in computer
vision that address emotion recognition using other visual
cues apart from the face. For instance, some works [22]
consider the location of shoulders as additional information
to the face features to recognize basic emotions. More gen-
erally, Schindler et al. [26] used the body pose to recognize
the 6 basic emotions, performing experiments on a small
dataset of non-spontaneous poses acquired under controlled
conditions.
2.1. Related datasets
In recent years we have observed a significant emergence
of affective datasets for recognizing people's emotions. The
GENKI database [1] contains frontal face images of a single
person under wide-ranging illumination and geographical,
personal and ethnic settings, and the images are labeled as smil-
ing or non-smiling. The ICML face-expression recognition
dataset [2] consists of 28k images annotated with 6 basic
emotions and a neutral category. The UCDSEE dataset [28]
has a set of 9 emotion expressions acted by 4 persons ac-
quired using strictly the same lab setting in order to focus
mainly on the facial expression of the person.
Dynamic body movement is also an essential source of
emotional information. The studies [16, 17] establish the relationship
between affect and body posture using as ground truth the
base-rate of human observers. The data used consists of
a spontaneous subset acquired under a controlled setting
(people playing Wii games). The GEMEP database [3] is
a multi-modal (audio and video) dataset comprising 10 actors playing 18 affective states. The dataset has videos
of actors showing emotions through acting - body pose and
facial expression combined.
The EMOTIW challenges [7] host three databases: 1) The
AFEW database [6] focuses on emotion recognition from
video frames taken from movies and TV shows, where the
actions are semi-spontaneous and are annotated with at-
tributes like actor name, actor age, character age, pose, gen-
der, the expression of the person, the overall clip expression and
the 6 basic emotions plus a neutral category; 2) The SFEW
dataset [8] is a subset of the AFEW database consisting of im-
ages of face frames annotated specifically with the 6 basic
emotions and a neutral category; and 3) The HAPPEI
database [9] focuses on the problem of group-level emo-
tion estimation and is a first attempt to use context for the
problem of predicting happiness in groups of people.
Finally, the MSCOCO dataset has been recently anno-
tated with object attributes [24], including some emotion
categories for people, such as happy and curious. These
attributes show some overlap with the categories that we
define in this paper. However, COCO attributes are not in-
tended to be exhaustive for emotion recognition, and not ev-
ery person in the dataset is annotated with affect attributes.
1. Peace: well-being and relaxed; no worry; having positive thoughts or sensations; satisfied
2. Affection: fond feelings; love; tenderness
3. Esteem: feelings of favorable opinion or judgment; respect; admiration; gratefulness
4. Anticipation: state of looking forward; hoping on or getting prepared for possible future events
5. Engagement: paying attention to something; absorbed into something; curious; interested
6. Confidence: feeling of being certain; conviction that an outcome will be favorable; encouraged; proud
7. Happiness: feeling delighted; feeling enjoyment or amusement
8. Pleasure: feeling of delight in the senses
9. Excitement: feeling enthusiasm; stimulated; energetic
10. Surprise: sudden discovery of something unexpected
11. Sympathy: state of sharing others' emotions, goals or troubles; supportive; compassionate
12. Doubt/Confusion: difficulty to understand or decide; thinking about different options
13. Disconnection: feeling not interested in the main event of the surrounding; indifferent; bored; distracted
14. Fatigue: weariness; tiredness; sleepy
15. Embarrassment: feeling ashamed or guilty
16. Yearning: strong desire to have something; jealous; envious; lust
17. Disapproval: feeling that something is wrong or reprehensible; contempt; hostile
18. Aversion: feeling disgust, dislike, repulsion; feeling hate
19. Annoyance: bothered by something or someone; irritated; impatient; frustrated
20. Anger: intense displeasure or rage; furious; resentful
21. Sensitivity: feeling of being physically or emotionally wounded; feeling delicate or vulnerable
22. Sadness: feeling unhappy, sorrow, disappointed, or discouraged
23. Disquietment: nervous; worried; upset; anxious; tense; pressured; alarmed
24. Fear: feeling suspicious or afraid of danger, threat, evil or pain; horror
25. Pain: physical suffering
26. Suffering: psychological or emotional pain; distressed; anguished

Table 1: Proposed emotion categories with definitions.
3. EMOTIC Dataset Creation
Our aim was to create a database of natural images, cap-
turing subjects and their contexts in their natural, uncon-
strained environments. We started by collecting images
from well-established datasets like MSCOCO [20] and
ADE20K [33], which host a good number of images
that satisfy our criteria. We also downloaded images
returned by the Google search engine, using various
combinations of words representing a varied mixture of
subjects, locations, situations and contexts. This resulted
in a challenging collection that combines images of people
in different situations, performing different tasks and
showing a wide range of emotional states. Currently, the
EMOTIC database consists of 18,316 images with 23,788
annotated people. The dataset is split into training (70%),
validation (10%), and testing (20%) sets.
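As a rough illustration of the split above, the following Python sketch partitions a list of image identifiers into 70/10/20 subsets. The helper name, the seed, and the random splitting procedure are assumptions for illustration, not the released dataset tools.

```python
import random

def split_dataset(image_ids, seed=0):
    """Randomly split image ids into train (70%), validation (10%) and test (20%)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)              # deterministic shuffle for reproducibility
    n = len(ids)
    n_train, n_val = int(0.70 * n), int(0.10 * n)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

splits = split_dataset(range(18316))              # e.g. one id per image in the dataset
```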
3.1. Emotion Representation
The EMOTIC dataset combines two methods to represent emo-
tions:
• Discrete Categories: We define an extended list of 26
emotional categories. Table 1 gives the definition of
each emotion category. Two examples of images for
each of the emotion categories are shown in Figure 2.
• Continuous Dimensions: We also use the VAD Emo-
tional State Model to represent emotions. The contin-
uous dimension annotations in the database are on a
1-10 scale. Figure 3 shows examples of people with
different levels of each of these three dimensions.
In Figure 4 we show images of the EMOTIC database
along with their annotations.
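To make the combined representation concrete, here is a minimal sketch of how one annotated person could be stored in memory. The class name, field names and example values are hypothetical and do not correspond to the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PersonAnnotation:
    bbox: Tuple[int, int, int, int]                       # person bounding box (x1, y1, x2, y2)
    categories: List[str] = field(default_factory=list)   # subset of the 26 discrete categories
    valence: int = 5                                       # VAD scores on the 1-10 scale
    arousal: int = 5
    dominance: int = 5

# Example: the prize-winning athlete from Figure 1.a might be annotated as
ann = PersonAnnotation(bbox=(34, 20, 180, 410),
                       categories=["Happiness", "Excitement", "Confidence"],
                       valence=9, arousal=8, dominance=8)
```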
To define the 26 emotion categories, we collected a vo-
cabulary of affective states. Using word connections (syn-
onyms, affiliations, relevance) and the inter-dependence of a
group of words (psychological research and affective com-
puting [14, 25]), we started forming word-groupings. After
multiple iterations and cross-referencing with dictionaries
and research in affective computing, we obtained the final
26 categories (see Table 1). While forming this group of 26
emotion categories, we considered it necessary for the group
to satisfy two important conditions: (1) Disjointness and (2)
Visual Separability. By disjointness, we mean that given
any category pair, {c1, c2}, we could always find an exam-
ple image where just one of the categories applies (and not
the other). By visual separability we mean that two affec-
tive states were assigned to the same emotion group if
we found, qualitatively, that the two could not be visually sep-
arated under the conditions of our database. For example,
in Figure 2 the images for 7.Happiness and 1.Peace clearly
show the visual separability of these categories in spite
of being similar. However, the category Excitement, for in-
stance, includes the subcategories “enthusiastic, stimulated,
and energetic.” Each of these three words has a specific
and different meaning, but it is very difficult to separate one
from another just from a single image. In our list of
categories we decided to avoid a neutral category since
we think that, in general, at least one category applies, even
though it may apply only with low intensity.
Notice that our 26 categories also include the 6 basic
emotions (categories 7, 10, 18, 20, 22, 24) defined by Ek-
man [11]. Note that Aversion is a general form of the basic
emotion Disgust, hence it makes more sense to keep it as a
main emotion category.
3.2. Image Annotation
Images were annotated on the Amazon Mechanical
Turk (AMT) platform. Figure 6 shows the two different an-
notation interfaces created for labeling images with emotion
categories (Figure 6.a) and continuous dimensions (Figure
6.b). In the AMT interface for continuous dimensions we
also asked AMT workers to annotate the gender and the
estimated age range of the person in the bounding box ac-
cording to the following age ranges: kid (0-12 years old),
teenager (13-20 years old), adult (more than 20 years old).
We adopted two methods to control the annotation qual-
ity, apart from providing the AMT workers with extended an-
notation instructions and examples. First, we launched a
qualification task to monitor the annotation of one image
according to discrete categories and one image according to
continuous dimensions. In each of the control figures we
manually selected all the categories that we thought did
not clearly apply to that image, and also those ranges of
continuous dimensions that were, in our opinion, outside an
acceptable response. For instance, for the image shown in Figure 6,
a worker that selected pleasure, disconnection, sadness,
fear, pain, or suffering, or a worker that selected 1-2 for
valence, arousal or dominance, would not pass. We con-
sidered that a worker whose labels respected these light
restrictions understood the task well and did not label
randomly. Only those workers that labeled the control
images within the mentioned restrictions were allowed
to label images of the dataset. Secondly, we inserted 2 con-
trol images, with restrictions similar to the ones mentioned
before, for every 18 images to track consistency during
the annotation. The annotations of our database come from
those workers that kept passing the control images during
their HITs.
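The control-image check can be sketched as follows. The function, its arguments and the threshold of 3 (i.e. rejecting scores of 1-2, as in the example above) are illustrative assumptions, not the exact AMT backend logic.

```python
def passes_control(selected_categories, vad_scores, excluded_categories,
                   min_allowed_score=3):
    """Return True if a worker's answer on a control image respects the restrictions."""
    # Fail if any manually excluded category was selected for the control image.
    if set(selected_categories) & set(excluded_categories):
        return False
    # Fail if any continuous score falls in the disallowed low range (e.g. 1-2).
    return all(score >= min_allowed_score for score in vad_scores.values())

# Example check against the control image discussed above (Figure 6).
ok = passes_control(
    selected_categories=["Happiness", "Engagement"],
    vad_scores={"valence": 7, "arousal": 6, "dominance": 6},
    excluded_categories=["Pleasure", "Disconnection", "Sadness",
                         "Fear", "Pain", "Suffering"])
```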
3.3. Annotation Consistency
The Validation set was annotated by 5 different workers,
to check the annotation consistency among different peo-
ple. Although there is an inherent subjective nature in this
annotation task, we observed that there is a degree of agree-
ment among different annotators. To measure this agree-
ment quantitatively, we computed Fleiss’ Kappa (κ) statis-
tic using the 5 annotations of each image in the validation
set. The obtained result is κ = 0.31. Furthermore, more
than 50% of the images in the validation set have κ > 0.31,
which indicates a much higher agreement level than ran-
dom chance (notice that random annotations would produce
κ ≈ 0).
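One way to reproduce this kind of agreement measure is sketched below, under the assumption that each (image, category) pair is treated as one item with a binary decision per annotator; statsmodels is used here for convenience and is not necessarily the tool used by the authors.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

def fleiss_kappa_binary(votes):
    """votes: (n_items, n_annotators) 0/1 array; one item = one (image, category) pair."""
    votes = np.asarray(votes)
    ones = votes.sum(axis=1, keepdims=True)      # annotators who selected the category
    zeros = votes.shape[1] - ones                # annotators who did not
    table = np.hstack([zeros, ones])             # per-item counts of each rating value
    return fleiss_kappa(table)

# e.g. four items, each judged by the 5 validation annotators
kappa = fleiss_kappa_binary([[1, 1, 1, 0, 1],
                             [0, 0, 1, 0, 0],
                             [1, 1, 1, 1, 1],
                             [0, 1, 0, 0, 1]])
```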
For every image in the validation set, there is at least one cat-
egory which is annotated by 2 or more annotators. In 60% of the
validation images, there is at least one category which is
annotated by at least 3 annotators. These statistics, which are
reflected in Figure 5, are a good indicator of the agree-
ment amongst the human annotators.
For the continuous dimensions, the standard deviation
among the 3 different workers is, on average, 1.41, 0.70 and
2.12 for valence, arousal and dominance respectively. Dom-
inance shows a higher dispersion in its values compared to the
other dimensions, meaning that the agreement deviation
is higher for dominance than for the other dimensions. Note
that the range of values for each dimension is from 1 to
10, 1 being the lowest and 10 the highest level.

Figure 2: Visual examples of the 26 emotion categories defined in Table 1. For each category we show two images where the person marked with the red bounding box has been annotated with the corresponding category.

Figure 3: Examples of images from the EMOTIC dataset with different scores of Valence, Arousal, and Dominance.
4. Dataset Statistics
Of the 23,788 persons annotated in the dataset, 66%
are males and 34% are females. The age distribution of
the annotated people is the following: 11% children, 11%
teenagers, and 78% adults.
Figure 7.a shows the number of people for each emo-
tional category, while Figures 7.b, 7.c and 7.d show the
number of people for valence, arousal and dominance con-
tinuous dimensions for each score, respectively.
Figure 4: Annotated images from the EMOTIC dataset, shown together with their category and continuous-dimension annotations.

It is interesting to see how the values (1, lowest, to 10,
highest) of a given continuous dimension are spread across
the dataset for each emotional category. Figure 8.a shows
the plot for the spread of Valence values across each one
of them. The categories are sorted according to the mean
value of Valence across each category. We can clearly see
that the lowest mean of Valence is for emotions like Suffer-
ing, Pain, Sadness, indicating that Valence values are, on av-
erage, low for these categories. This shows consistency
in the annotations of our EMOTIC dataset, since one would
expect that images with emotion categories like Suffering,
Pain, Sadness should exhibit, on average, low levels of Va-
lence.

Figure 5: For images in the validation set, percentage of annotated categories selected by more than 1, 2, 3, or 4 annotators.

Figure 6: AMT interfaces for emotion category (a) and continuous dimension (b) annotation.

Similarly, it is clear that emotions like Happiness,
Affection, Pleasure should have a higher level of Valence
and this fact is apparent from the plot itself. Finally, as ex-
pected, the plot also shows the correlation between a person
in a disconnected emotional state and a mid-level Valence
value - Disconnection lies in the mid-range of the sorted
emotion categories. Similarly, in Figure 8.b we can see the
same type of information for the arousal dimension.

Figure 7: a) Number of people per emotion category in the EMOTIC dataset; b), c), d) number of people per score in each of the continuous dimensions (total: 23,788 annotated people in 18,316 images).

Notice
that fatigue and sadness are the categories that have the lowest
arousal scores on average, meaning that when these feelings
are present, people are usually at a low level of agitation.
On the other hand, confidence and excitement are the cate-
gories with highest arousal level. Finally, Figure 8.c shows
the distribution of the dominance scores. The categories
with lowest dominance level (people feeling they are not in
control of the situation) are suffering and pain, while the
highest dominance levels in average are shown with the cat-
egories confidence and excitement. We observe that these
types of category sorting are consistent with our common
sense knowledge. However, we also observe in these graph-
ics that for each category there is some relevant variabil-
ity in the continuous dimension scores. This suggests that
the information contributed by each type of annotation can
be complementary and not redundant.
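The per-category analysis behind Figure 8 can be sketched as follows, over hypothetical annotation records with "categories" and "valence" fields as in the earlier sketch; this is illustrative and is not the authors' analysis code.

```python
from collections import defaultdict

def mean_valence_per_category(annotations):
    """Sort the categories by the mean Valence of the people annotated with them."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ann in annotations:                     # each ann: {"categories": [...], "valence": 1-10}
        for cat in ann["categories"]:
            sums[cat] += ann["valence"]
            counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return sorted(means.items(), key=lambda kv: kv[1])   # lowest mean Valence first

# Categories such as Suffering, Pain or Sadness should appear near the start of this
# list, and Happiness, Affection or Pleasure near the end, mirroring Figure 8.a.
```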
We also computed the co-occurrence probability of cate-
gories. These probabilities are shown in Figure 9. Given
a row, corresponding to the emotional category c1, and
a column, corresponding to the emotional category c2,
each entry corresponds to the probability P(c2|c1). From
these results we can observe interesting patterns of cat-
egory co-occurrences. For instance, we see that when
a person feels affection it is very likely that she also
feels happiness, or that when a person feels anger she
is also likely to feel annoyance. More generally, we
used k-means to cluster category annotations and we observed that some category groups appear frequently in the
EMOTIC database. Some examples are {anticipation, en-
gagement, confidence}, {affection, happiness, pleasure},
{doubt/confusion, disapproval, annoyance}, {yearning, an-
noyance, disquietment}.

Figure 8: For each of the continuous dimensions, the distribution of scores across the different emotion categories.
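A sketch of the co-occurrence computation described above is given below; the record format is the same hypothetical one used in the earlier sketches.

```python
import numpy as np

def cooccurrence_matrix(annotations, categories):
    """Return a matrix M where M[i, j] = P(c_j | c_i) over annotated people."""
    idx = {c: i for i, c in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)))
    for ann in annotations:                          # ann["categories"]: list of category names
        present = [idx[c] for c in ann["categories"]]
        for i in present:
            for j in present:
                counts[i, j] += 1                    # the diagonal ends up holding N(c_i)
    # Normalize each row by the number of people labeled with that row's category.
    return counts / np.maximum(counts.diagonal()[:, None], 1)
```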
5. Dataset Algorithmic Analysis
We classified the scenes of the EMOTIC dataset using a
state-of-the-art Convolutional Neural Network model for
scene recognition [32]. Figure 10 shows two examples of
scene category and scene attribute recognition on images of the
EMOTIC dataset. The figure shows the original images, the
class activation maps [30] (which correspond to the region of
the image that supports the decision of the classifier), and the
scene categories (top 2) and scene attributes (top 5) automat-
ically recognized in each image. As we can see, the results
are very accurate. In general, as reported by the authors in
[31], the recognition accuracy of these systems is around
78%, according to the feedback provided by the users of the
online demo for scene recognition.
Figure 9: Category co-occurrence probabilities.
Figure 10: Examples of scene category and attribute recognition in images from the EMOTIC dataset, with the corresponding class activation maps. In one example, the recognized scene categories are "home office, living room" with attributes "no horizon, enclosed area, man-made, dry, metal"; in the other, "ocean, coast" with attributes "warm, open area, far-away horizon, natural light, cloth".
Using the classification labels obtained with these CNN
models we can now study patterns on emotion distribu-
tion in different scene categories and under different scene
attributes. Using our data, we computed the probabil-
ity of each emotion at each place as P(emo|place) =
N_emo / N_people, where N_people is the number of people
that we observed in the specific place category, and N_emo
is the number of those people that have been labeled with
the specific emo category. Figure 11 shows representa-
tive examples of the emotion category distribution in dif-
ferent scenes. We see that, in our data, there are differ-
ent patterns. For instance, the most frequent emotions in a
bedroom are engagement, happiness, pleasure, peace,
and affection, while in a baseball field, the most frequent
emotions are engagement, anticipation, confidence, and
excitement. Among these examples, we also see that emo-
tions like disquietment, annoyance, or anger are more
frequent in bedrooms (where people have some degree of
privacy), in offices (where people can feel tired of working) or
in hospital rooms (where people can be worried).

Figure 11: Emotion distribution per place category (bedroom, game room, waiting room, office, hospital room, baseball field).
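The place-conditioned probabilities above can be sketched as follows; the input format (pairs of a predicted scene category and the set of emotion labels of one annotated person) is an assumption for illustration.

```python
from collections import defaultdict

def emotion_given_place(people):
    """people: iterable of (place_category, emotion_label_set) pairs, one per annotated person.
    Returns P(emo | place) = N_emo / N_people for every place and emotion observed."""
    n_people = defaultdict(int)
    n_emo = defaultdict(lambda: defaultdict(int))
    for place, emotions in people:
        n_people[place] += 1
        for emo in emotions:
            n_emo[place][emo] += 1
    return {place: {emo: n / n_people[place] for emo, n in emo_counts.items()}
            for place, emo_counts in n_emo.items()}

probs = emotion_given_place([("bedroom", {"Peace", "Happiness"}),
                             ("bedroom", {"Engagement"}),
                             ("baseball_field", {"Anticipation", "Excitement"})])
```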
We can do a similar analysis using the recognized scene
attributes. Representative results are shown in Figure 12.
Here, we also observe some interesting patterns. Among
these examples, we see that the category peace is signif-
icantly higher when scenes have the attribute vegetation.
This seems reasonable, since natural areas are more suitable
for relaxing. Similarly, the highest frequency for sympathy
is shown for the attribute socializing, followed by the at-
tribute vegetation. Taking into account the 6 attributes
shown in Figure 12, it seems reasonable to see more fre-
quently sympathy in scenes with these two attributes than,
for instance, in scenes of sports or competing. We also ob-
serve very high frequencies for the categories engagement,
anticipation, confidence, and excitement in scenes with
the attributes sports and competing. Notice that, in gen-
eral, the most frequent emotions correlate with the most fre-
quent emotions in the whole database (see Figure 7), as ex-
pected. For instance, there are many people in the database
showing engagement, since usually people are “involved
in something.” As a consequence, we see a high frequency
of the emotion engagement in all of the scene categories
and attributes. However, even though the main pattern of
the general emotion distribution is preserved, we see spe-
cific variations in each scene category and attribute.
Regarding the continuous dimensions, we also observe
interpretable patterns. Figure 13 shows an overview of the
scene categories (top section of each box) and scene at-
tributes (lower section of each box) with the highest and lowest
average valence, arousal, and dominance, respectively.
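The summary in Figure 13 can be approximated with a sketch like the one below, again over hypothetical (place, score) pairs; the same routine applies to arousal and dominance, and to scene attributes instead of categories.

```python
from collections import defaultdict

def extreme_places(place_scores, k=5):
    """place_scores: iterable of (place_category, score) pairs for one continuous dimension.
    Returns the k places with the highest and the k with the lowest average score."""
    sums, counts = defaultdict(float), defaultdict(int)
    for place, score in place_scores:
        sums[place] += score
        counts[place] += 1
    ranked = sorted(sums, key=lambda p: sums[p] / counts[p])  # ascending mean score
    return ranked[-k:], ranked[:k]                            # (highest, lowest)
```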
Figure 12: Emotion distribution per place attribute (vegetation, working, competing, training, socializing, sports).

Figure 13: Place categories and attributes with the highest and lowest average scores for valence, arousal, and dominance.

6. Conclusions

In this paper we present the EMOTIC database, a
database of images with people in context, annotated ac-
cording to their apparent emotional states. The EMOTIC
database combines 2 different annotation approaches: an
extended set of 26 emotional categories and a set of 3 con-
tinuous dimensions (Valence, Arousal, and Dominance). In
this paper we present the details on the EMOTIC database
creation and statistics. We also present an algorithmic anal-
ysis of the data performed using state-of-the-art methods for
scene category and scene attribute recognition. Our results
suggest that current scene understanding techniques can be
used to incorporate the analysis of context into emotional
state recognition. Thus, the EMOTIC dataset, in combina-
tion with previous datasets on emotion estimation, can open
the door to new approaches for apparent emotion estimation
in the wild from visual information.
Acknowledgments
This work has been partially supported by the Ministerio
de Economía, Industria y Competitividad (Spain), under the
Grant Ref. TIN2015-66951-C2-2-R. The authors also thank
NVIDIA for their generous hardware donations.
References
[1] GENKI database. http://mplab.ucsd.edu/
wordpress/?page_id=398. Accessed: 2017-04-12.
[2] ICML face expression recognition dataset. https://
goo.gl/nn9w4R. Accessed: 2017-04-12.
[3] T. Banziger, H. Pirker, and K. Scherer. Gemep-geneva mul-
timodal emotion portrayals: A corpus for the study of multi-
modal emotional expressions. In Proceedings of LREC, vol-
ume 6, pages 15–19, 2006.
[4] L. F. Barrett, B. Mesquita, and M. Gendron. Context in emo-
tion perception. Current Directions in Psychological Sci-
ence, 20(5):286–290, 2011.
[5] C. F. Benitez-Quiroz, R. Srinivasan, and A. M. Martinez.
Emotionet: An accurate, real-time algorithm for the auto-
matic annotation of a million facial expressions in the wild.
In Proceedings of IEEE International Conference on Com-
puter Vision & Pattern Recognition (CVPR16), Las Vegas,
NV, USA, 2016.
[6] A. Dhall et al. Collecting large, richly annotated facial-
expression databases from movies. 2012.
[7] A. Dhall, R. Goecke, J. Joshi, J. Hoey, and T. Gedeon.
Emotiw 2016: Video and group-level emotion recogni-
tion challenges. In Proceedings of the 18th ACM Interna-
tional Conference on Multimodal Interaction, pages 427–
432. ACM, 2016.
[8] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Static fa-
cial expression analysis in tough conditions: Data, evalua-
tion protocol and benchmark. In Computer Vision Workshops
(ICCV Workshops), 2011 IEEE International Conference on,
pages 2106–2112. IEEE, 2011.
[9] A. Dhall, J. Joshi, I. Radwan, and R. Goecke. Finding hap-
piest moments in a social context. In Asian Conference on
Computer Vision, pages 613–626. Springer, 2012.
[10] S. Du, Y. Tao, and A. M. Martinez. Compound facial expres-
sions of emotion. Proceedings of the National Academy of
Sciences, 111(15):E1454–E1462, 2014.
[11] P. Ekman and W. V. Friesen. Constants across cultures in the
face and emotion. Journal of personality and social psychol-
ogy, 17(2):124, 1971.
[12] S. Eleftheriadis, O. Rudovic, and M. Pantic. Discriminative
shared gaussian processes for multiview and view-invariant
facial expression recognition. IEEE transactions on image
processing, 24(1):189–204, 2015.
[13] S. Eleftheriadis, O. Rudovic, and M. Pantic. Joint facial ac-
tion unit detection and feature fusion: A multi-conditional
learning approach. IEEE Transactions on Image Processing,
25(12):5727–5742, 2016.
[14] E. G. Fernández-Abascal, B. García, M. Jiménez, M. Martín,
and F. Domínguez. Psicología de la emoción. Editorial Uni-
versitaria Ramón Areces, 2010.
[15] E. Friesen and P. Ekman. Facial action coding system: a
technique for the measurement of facial movement. Palo
Alto, 1978.
[16] A. Kleinsmith and N. Bianchi-Berthouze. Recognizing af-
fective dimensions from body posture. In Proceedings of the
2Nd International Conference on Affective Computing and
Intelligent Interaction, ACII ’07, pages 48–58, Berlin, Hei-
delberg, 2007. Springer-Verlag.
[17] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed. Au-
tomatic recognition of non-acted affective postures. IEEE
Transactions on Systems, Man, and Cybernetics, Part B (Cy-
bernetics), 41(4):1027–1038, Aug 2011.
[18] R. Kosti, J. M. Alvarez, A. Recasens, and A. Lapedriza.
Emotion recognition in context. In The IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2017.
[19] Z. Li, J.-i. Imai, and M. Kaneko. Facial-component-based
bag of words and phog descriptor for facial expression recog-
nition. In SMC, pages 1353–1358, 2009.
[20] T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B.
Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollar, and
C. L. Zitnick. Microsoft COCO: common objects in context.
CoRR, abs/1405.0312, 2014.
[21] A. Mehrabian. Framework for a comprehensive description
and measurement of emotional states. Genetic, social, and
general psychology monographs, 1995.
[22] M. A. Nicolaou, H. Gunes, and M. Pantic. Continuous pre-
diction of spontaneous affect from multiple cues and modali-
ties in valence-arousal space. IEEE Transactions on Affective
Computing, 2(2):92–105, 2011.
[23] M. Pantic and L. J. Rothkrantz. Expert system for automatic
analysis of facial expressions. Image and Vision Computing,
18(11):881–905, 2000.
[24] G. Patterson and J. Hays. Coco attributes: Attributes for
people, animals, and objects. In European Conference on
Computer Vision, pages 85–100. Springer, 2016.
[25] R. Picard. Affective computing, volume 252. MIT press Cam-
bridge, 1997.
[26] K. Schindler, L. Van Gool, and B. de Gelder. Recognizing
emotions expressed by body pose: A biologically inspired
neural model. Neural networks, 21(9):1238–1246, 2008.
[27] M. Soleymani, S. Asghari-Esfeden, Y. Fu, and M. Pantic.
Analysis of eeg signals and facial expressions for continuous
emotion detection. IEEE Transactions on Affective Comput-
ing, 7(1):17–28, 2016.
[28] J. L. Tracy, R. W. Robins, and R. A. Schriber. Development
of a facs-verified set of basic and self-conscious emotion ex-
pressions. Emotion, 9(4):554, 2009.
[29] W.-S. Chu, F. De la Torre, and J. F. Cohn. Selective transfer ma-
chine for personalized facial expression analysis. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
2016.
[30] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Tor-
ralba. Learning deep features for discriminative localization.
In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pages 2921–2929, 2016.
[31] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva.
Places: An image database for deep scene understanding.
arXiv preprint arXiv:1610.02055, 2016.
[32] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva.
Learning deep features for scene recognition using places
database. In Advances in neural information processing sys-
tems, pages 487–495, 2014.
[33] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Tor-
ralba. Semantic understanding of scenes through ade20k
dataset. 2016.