Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect...

30
BJöRN SCHULLER | ANTON BATLINER COMPUTATIONAL PARALINGUISTICS EMOTION, AFFECT AND PERSONALITY IN SPEECH AND LANGUAGE PROCESSING

Transcript of Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect...

Page 1: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Björn Schuller | Anton BAtliner

Computational paralinguistiCs

Emotion, affECt and pErsonality in spEECh and languagE proCEssing

Page 2: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 3: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

COMPUTATIONALPARALINGUISTICS

Page 4: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 5: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

COMPUTATIONALPARALINGUISTICSEMOTION, AFFECT AND PERSONALITY INSPEECH AND LANGUAGE PROCESSING

Bjorn W. SchullerTechnische Universitat Munchen, Germany

Anton M. BatlinerFriedrich-Alexander-Universitat Erlangen-Nurnberg, Germany

Page 6: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

This edition first published 2014C© 2014 John Wiley & Sons, Ltd

Registered officeJohn Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply forpermission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright,Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in anyform or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UKCopyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not beavailable in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names andproduct names used in this book are trade names, service marks, trademarks or registered trademarks of theirrespective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparingthis book, they make no representations or warranties with respect to the accuracy or completeness of the contents ofthis book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It issold on the understanding that the publisher is not engaged in rendering professional services and neither thepublisher nor the author shall be liable for damages arising herefrom. If professional advice or other expertassistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Schuller, Bjorn.Computational paralinguistics : emotion, affect and personality in speech and language

processing / Bjorn Schuller, Anton Batliner. – First Edition.pages cm

Includes bibliographical references and index.ISBN 978-1-119-97136-8 (hardback)

1. Psycholinguistics–Data processing. 2. Linguistic models–Data processing. 3. Paralinguistics.4. Language and emotions. 5. Speech processing systems. 6. Human-computer interaction.7. Emotive (Linguistics) 8. Computational linguistics. I. Batliner, Anton. II. Title.

P37.5.D37S38 2014401′.90285–dc23

2013019325

A catalogue record for this book is available from the British Library.

Typeset in 10/12pt Times by Aptara Inc., New Delhi, India

1 2014

Page 7: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

To Dagmar and Gisela

Page 8: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 9: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Contents

Preface xiii

Acknowledgements xv

List of Abbreviations xvii

Part I Foundations

1 Introduction 31.1 What is Computational Paralinguistics? A First Approximation 31.2 History and Subject Area 71.3 Form versus Function 101.4 Further Aspects 12

1.4.1 The Synthesis of Emotion and Personality 121.4.2 Multimodality: Analysis and Generation 131.4.3 Applications, Usability and Ethics 15

1.5 Summary and Structure of the Book 17References 18

2 Taxonomies 212.1 Traits versus States 212.2 Acted versus Spontaneous 252.3 Complex versus Simple 302.4 Measured versus Assessed 312.5 Categorical versus Continuous 332.6 Felt versus Perceived 352.7 Intentional versus Instinctual 372.8 Consistent versus Discrepant 382.9 Private versus Social 392.10 Prototypical versus Peripheral 402.11 Universal versus Culture-Specific 412.12 Unimodal versus Multimodal 432.13 All These Taxonomies – So What? 44

2.13.1 Emotion Data: The FAU AEC 452.13.2 Non-native Data: The C-AuDiT corpus 47References 48

Page 10: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

viii Contents

3 Aspects of Modelling 533.1 Theories and Models of Personality 533.2 Theories and Models of Emotion and Affect 553.3 Type and Segmentation of Units 583.4 Typical versus Atypical Speech 603.5 Context 613.6 Lab versus Life, or Through the Looking Glass 623.7 Sheep and Goats, or Single Instance Decision versus Cumulative Evidence

and Overall Performance 643.8 The Few and the Many, or How to Analyse a Hamburger 653.9 Reifications, and What You are Looking for is What You Get 673.10 Magical Numbers versus Sound Reasoning 68

References 74

4 Formal Aspects 794.1 The Linguistic Code and Beyond 794.2 The Non-Distinctive Use of Phonetic Elements 81

4.2.1 Segmental Level: The Case of /r/ Variants 814.2.2 Supra-segmental Level: The Case of Pitch and Fundamental

Frequency – and of Other Prosodic Parameters 824.2.3 In Between: The Case of Other Voice Qualities, Especially

Laryngealisation 864.3 The Non-Distinctive Use of Linguistics Elements 91

4.3.1 Words and Word Classes 914.3.2 Phrase Level: The Case of Filler Phrases and Hedges 94

4.4 Disfluencies 964.5 Non-Verbal, Vocal Events 984.6 Common Traits of Formal Aspects 100

References 101

5 Functional Aspects 1075.1 Biological Trait Primitives 109

5.1.1 Speaker Characteristics 1115.2 Cultural Trait Primitives 112

5.2.1 Speech Characteristics 1145.3 Personality 1155.4 Emotion and Affect 1195.5 Subjectivity and Sentiment Analysis 1235.6 Deviant Speech 124

5.6.1 Pathological Speech 1255.6.2 Temporarily Deviant Speech 1295.6.3 Non-native Speech 130

5.7 Social Signals 1315.8 Discrepant Communication 135

5.8.1 Indirect Speech, Irony, and Sarcasm 1365.8.2 Deceptive Speech 1385.8.3 Off-Talk 139

Page 11: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Contents ix

5.9 Common Traits of Functional Aspects 140References 141

6 Corpus Engineering 1596.1 Annotation 160

6.1.1 Assessment of Annotations 1616.1.2 New Trends 164

6.2 Corpora and Benchmarks: Some Examples 1646.2.1 FAU Aibo Emotion Corpus 1656.2.2 aGender Corpus 1656.2.3 TUM AVIC Corpus 1666.2.4 Alcohol Language Corpus 1686.2.5 Sleepy Language Corpus 1686.2.6 Speaker Personality Corpus 1696.2.7 Speaker Likability Database 1706.2.8 NKI CCRT Speech Corpus 1716.2.9 TIMIT Database 1716.2.10 Final Remarks on Databases 172References 173

Part II Modelling

7 Computational Modelling of Paralinguistics: Overview 179References 183

8 Acoustic Features 1858.1 Digital Signal Representation 1858.2 Short Time Analysis 1878.3 Acoustic Segmentation 1908.4 Continuous Descriptors 190

8.4.1 Intensity 1908.4.2 Zero Crossings 1918.4.3 Autocorrelation 1928.4.4 Spectrum and Cepstrum 1948.4.5 Linear Prediction 1988.4.6 Line Spectral Pairs 2028.4.7 Perceptual Linear Prediction 2038.4.8 Formants 2058.4.9 Fundamental Frequency and Voicing Probability 2078.4.10 Jitter and Shimmer 2128.4.11 Derived Low-Level Descriptors 214References 214

9 Linguistic Features 2179.1 Textual Descriptors 2179.2 Preprocessing 218

Page 12: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

x Contents

9.3 Reduction 2189.3.1 Stopping 2189.3.2 Stemming 2199.3.3 Tagging 219

9.4 Modelling 2209.4.1 Vector Space Modelling 2209.4.2 On-line Knowledge 222References 227

10 Supra-segmental Features 23010.1 Functionals 23110.2 Feature Brute-Forcing 23210.3 Feature Stacking 233

References 234

11 Machine-Based Modelling 23511.1 Feature Relevance Analysis 23511.2 Machine Learning 238

11.2.1 Static Classification 23811.2.2 Dynamic Classification: Hidden Markov Models 25611.2.3 Regression 262

11.3 Testing Protocols 26411.3.1 Partitioning 26411.3.2 Balancing 26611.3.3 Performance Measures 26711.3.4 Result Interpretation 272References 277

12 System Integration and Application 28112.1 Distributed Processing 28112.2 Autonomous and Collaborative Learning 28412.3 Confidence Measures 286

References 287

13 ‘Hands-On’: Existing Toolkits and Practical Tutorial 28913.1 Related Toolkits 28913.2 openSMILE 290

13.2.1 Available Feature Extractors 29313.3 Practical Computational Paralinguistics How-to 294

13.3.1 Obtaining and Installing openSMILE 29513.3.2 Extracting Features 29513.3.3 Classification and Regression 302References 303

14 Epilogue 304

Page 13: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Contents xi

Appendix 307A.1 openSMILE Feature Sets Used at Interspeech Challenges 307A.2 Feature Encoding Scheme 310

References 314

Index 315

Page 14: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 15: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Preface

It might be safe to claim that 20 years ago, neither the term ‘computational paralinguistics’nor the field it denotes existed. Some 10 years ago, the term did not yet exist either. However,in hindsight, the field had begun to exist if we think of the first steps towards the automaticprocessing of emotions in speech in the mid-1990s. For example, Picard’s book on AffectiveComputing published in 1997, and the International Speech Communication Association(ISCA) workshop on emotion and speech in 2000, just to mention some of the many topicsand events related to and belonging to computational paralinguistics. The term ‘paralinguistics’had already been coined in the 1950s – with different broad or narrow denotations; we will tryand sketch this history in Part I of this book. Yet, in the realm of ‘hard core’ automatic (i.e.,computational) processing of speech, the topic was still not fully acknowledged; as one of ourcolleagues said: ‘Emotion recognition, that’s esoterics with HMMs.’

Today, it might be safe to claim that computational paralinguistics has been established asa discipline in its own right – although surprisingly, not the term itself. It is only natural that,as a new and still somewhat exotic field, it has to cope with prejudices on the one hand, andunrealistic promises on the other hand.

This book represents the first attempt towards a unified overview of the field, its extremelyramified and diverse ‘genealogy’, its methodology, and the state of the art. ‘Computationalparalinguistics’ is not an established subject that can be studied, and this fact is mirroredin the ‘scientific CVs’ of both authors. B.S. studied electrical engineering and informationtechnology. However, his doctoral thesis dealt with one aspect of computational paralinguistics:the automatic recognition of human emotion in speech. During his habilitation period hebroadened the scope of his work to ‘intelligent audio analysis’ – dealing with quite a numberof further paralinguistic aspects, including those found in sung language and many other audioprocessing problems such as emotion in music and general sound. At the time of finalisationof the manuscript, he was a professor in computer science. A.B. started within philology, camefrom diachronic phonology to phonetics in general to prosody in particular, and via prosodyin the interface for and within syntax and semantics to the automatic processing of acted andvery soon naturalistic emotions – realising, moreover, that he had been dealing with differenttopics for a long time that all can be located within (computational) paralinguistics.

Originally, the intended focus of this book was on the computational processing of emotionsand affect in speech and language, taking into account personality as well. In the earlyconceptual stages, however, we realised that this would be sub-optimal and thus decided todeal with everything ‘besides’ linguistics – namely, the computational processing of ‘para’-linguistics in a broad sense. However, we confine ourselves to the acoustic/phonetic/linguistic

Page 16: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

xiv Preface

aspects, that is, we only deal with one modality, namely speech/language including non-verbalcomponents, and disregard other modalities such as facial gestures or body posture. Moreover,we do not aim at a complete description of human–human or human–machine communicationwhich would include the generation and production (synthesis) of speech, the interaction withother components within a multimodal system, the role within application systems, or real-lifeapplications and their evaluation. Apart from the fact that most of these aspects would notbe part of our core competence, we feel that it makes sense to try to establish computationalparalinguistics as one building block amongst several others. Besides, there are already goodoverview and introductory books available on these other topics. And last but not least, itwould be rather too complex for one book.

We wish to provide the reader with a sort of map presenting an overview of the field, anduseful for finding one’s own way through. The scale of this map is medium-size, and we canonly display a few of the houses in this virtual paralinguistic ‘city’ with their interiors, on anexemplary basis. In so doing, we hope to provide guidelines for the novice and to present atleast a few new insights and perspectives to the expert. Many studies are referred to and coreresults are summarised. For all of them, the caveat holds that basically all such studies arerestricted – confined to a specific choice of subjects, research questions, operationalisations,and features employed, just to mention a few of the decisive factors. There are errors such asthe famous erroneous decimal point that made spinach more healthy than anything else – notethat reports of this error might be erroneous themselves. And of course, there is much morethat can go wrong – and hopefully, we will find out: scrutiny of results and replications ofstudies will eventually converge to more stable claims.

We decided not to describe basic phonetic/linguistic knowledge such as vowel or consonantcharts, details on pitch versus F0, loudness, morphological and grammatical systems, basicson production and perception of speech, and the like. Such information can easily be obtainedin introductory and overview books from the respective fields, as well as from online sources.In a similar way, selections had to be made for the computation aspects. For example, manyapproaches to linguistic modelling exist, and the fields of machine learning and signal pro-cessing each deserve at least one book in its own right. Thus, we limited our choices to themethods most established and common in the field – serving as a solid basis and inspirationfor the interested reader to look further. A connected resource of information is the book’shomepage found at http://www.cp.openaudio.eu which includes features such as links to theopenSMILE toolkit and (part of) the data described.

Bjorn W. Schuller and Anton M. BatlinerMunich, February 2013

Page 17: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Acknowledgements

We would like to thank in particular Florian Eyben and Felix Weninger at Technische Uni-versitat Munchen in Munich, Germany, for their great help in finishing Chapters 7–13; wealso extend our thanks to Martina Rompp for her help with the illustrations in this part,and to Tobias Bocklet and Florian Honig at the Friedrich Alexander Universitat in Erlangen-Nuremberg, Germany, for commenting on Chapter 5.

We stand on the proverbial shoulders of giants – and of many other ‘normal’ people whoall contributed to our understanding. In the same way, we are grateful to colleagues who donot share our points of view: opinions are not only reinforced by encouragement but also bydissent which helps us to rethink or revise our own approaches and theoretical positions, orsimply to stick with them.

There are too many people in the field(s) that supported us, discussed with us, and exchangedideas with us, too many to list in detail, so we wish to express our gratitude to all of them in ageneric way.

For the encouragement in the first place, excellent continued support and guidance as wellas patience, we sincerely thank the editor and publisher – John Wiley & Sons.

Finally, we would like to thank the HUMAINE Association (henceforth the Associationfor the Advancement of Affective Computing) for providing an excellent network not onlyfor affective computing, but also for the broader field of computational paralinguistics. Theauthors also acknowledge funding from the European Commission under grant agreementno. 289021 from the ASC Inclusion Project committed to provide interactive serious emotiongames for children with autism spectrum condition.

Page 18: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 19: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

List of Abbreviations

ACF Autocorrelation FunctionACM Association for Computing MachineryAD(H)D Attention Deficit (Hyperactivity) DisorderAEC (FAU) Aibo Emotion CorpusAL Active LearningALC Alcohol Language CorpusAM Acoustic ModelANN Artificial Neural NetworkAPI Application Programming InterfaceARFF Attribute Relation File Format (WEKA)ARMA Autoregressive Moving AverageASC Autism Spectrum ConditionASCII American Standard Code for Information ExchangeASD Autism Spectrum DisorderASR Automatic Speech RecognitionAUC Area Under CurveAVEC Audiovisual Emotion ChallengeAVIC (TUM) Audiovisual Interest CorpusBAC Blood Alcohol ContentBES Berlin Emotional Speech DatabaseBFI Big Five InventoryBLSTM Bidirectional Long Short-Term MemoryBoCNG Bag-of-Character-N -GramsBoNG Bag-of-N -GramsBoW Bag-of-wordsBPTT Back-Propagation Through TimeBRAC Breath Alcohol Concentration TestBRNN Bidirectional Recurrent Neural NetworkC-AuDiT Computer-Assisted Pronunciation and Dialogue TrainingCALL Computer-Aided Language LearningCAPT Computer-Aided Pronunciation TrainingCC (Pearson) Correlation CoefficientCD Compact Disc

Page 20: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

xviii List of Abbreviations

CEICES Combining Efforts for Improving Automatic Classification of Emotion inSpeech

CFS Correlation based Feature SelectionCGN Spoken Dutch Corpus from Centre for Genetic Resources, the NetherlandsCR Compression RateCSV Comma Separated Value (format)DAT Digital Audio TapeDBN Dynamic Bayesian NetworksDCT Discrete Cosine TransformationDES Danish Emotional Speech (Database)DET Detection Error Trade-off (curve)DFT Discrete Fourier TransformDLL Dynamic Link LibraryDT Decision Tree (machine learning) / Determiner (linguistics)EC Emotion Challenge (Interspeech 2009)ECA Embodied Conversational AgentEER Equal Error RateEM Expectation MaximisationEMMA Extensible Multi-Modal Annotation (markup language, XML-style)ERB Equivalent Rectangular BandwidthETSI European Telecommunications Standards InstituteEU European UnionEWE Evaluator Weighted EstimatorF0 Fundamental FrequencyFAU Friedrich Alexander UniversityFFT Fast Fourier TransformationFN False NegativesFNN Feed-forward Neural NetworkFNR False Negative Rate/RatioFP False PositivesFPR False Positive Rate/RatioGM Gaussian MixtureGMM Gaussian Mixture ModelGPL GNU/General Public LicenceGUI Graphical User InterfaceHMM Hidden Markov ModelHNR Harmonics-to-Noise RatioHTK Hidden Markov (model) ToolkitICA Independent Component AnalysisIDF Inverse Document FrequencyIG Information GainIGR Information Gain RatioIIR Infinite Impulse ResponseIP Interruption PointISCA International Speech Communication AssociationISLE Italian and German Spoken Learners English (corpus)

Page 21: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

List of Abbreviations xix

ITU International Telecommunication UnionKL Kullback–Leibler (divergence/distance)L1 first languageL2 second languageLDA Linear Discriminant AnalysisLDC Linguistic Data ConsortiumLIWC Linguistic Inquiry and Word CountLLD Low-Level DescriptorLM Language ModelLOO Leave One OutLP Linear Predictor / Linear PredictionLPC Linear Predictive CodingLPCC Linear Predictive Cepstral CoefficientLSF Line Spectral (pair) FrequencyLSP Line Spectral PairLSTM Long Short-Term MemoryLSTM-RNN Long Short-Term Memory Recurrent Neural NetworkLVCSR Large Vocabulary Continuous Speech RecognitionMAE Mean Absolute ErrorMAPE Mean Absolute Percentage ErrorMFB Mel-Frequency BandMFCC Mel-Frequency Cepstral CoefficientMIML Multimodal Interaction Markup LanguageML Maximum LikelihoodMLE Mean Linear ErrorMLP Multilayer PerceptronMSE Mean Square ErrorNCSC NKI CCRT Speech CorpusNHD Null Hypothesis DecisionNHR Noise-to-Harmonics RatioNHT Null Hypothesis TestingNL Non-likeableNMF Non-negative Matrix FactorisationNN Noun (linguistic)NP Noun Phrase (linguistic)NPV Negative Predictive ValueOCEAN Openness, Conscientiousness, Extraversion, Agreeableness, NeuroticismOOV Out Of Vocabulary (words)openEAR open-source Emotion and Affect Recognition (toolkit)openSMILE open-source Speech and Music Interpretation by Large Scale Extraction

(toolkit)PC Paralinguistic Challenge (Interspeech 2010)PCA Principal Component AnalysisPCM Pulse Code ModulationPDA Pitch Detection AlgorithmPDF Probability Density Function

Page 22: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

xx List of Abbreviations

PLP Perceptual Linear PredictionPLP-CC Perceptual Linear Prediction Cepstral CoefficientPLTT Post Laryngectomy Telephone TestPOS Part Of SpeechPP Prepositional Phrase (linguistic)PR PrecisionRASTA RelAtive SpecTrARASTA-PLP RelAtive SpecTrA Perceptual Linear Prediction (Coefficients)RE RecallRIFF Resource Interchange File FormatRMS Root Mean SquareRMSE Root Mean Squared ErrorRNN Recurrent Neural NetworkROC Receiver Operating CharacteristicSAL Sensitive Artificial ListenerSAMPA Speech Assessment Methods Phonetic AlphabetSEMAINE Sustained Emotionally coloured Machine-human Interaction using Non-

verbal Expression (project)SFFS Sequential Floating Forward SearchSFS Speech Filing SystemSHS Sub-Harmonic SummationSI International System of Units (French: Systeme international d’unites)SIFT Simplified Inverse Filtering TechniqueSIMIS Speech In Minimal Invasive Surgery (database)SLC Sleepy Language CorpusSLD Speaker Likability DatabaseSNR Signal-to-Noise RatioSP SPecificitySPC Speaker Personality CorpusSSC Speaker State ChallengeSTC Speaker Trait ChallengeSVM Support Vector MachineSVQ Split Vector QuantisationSVR Support Vector RegressionT-expression Ternary expressionTF Term FrequencyTFIDF Term Frequency and Inverse Document FrequencyTN True NegativesTP True PositivesTPR True Positive Rate/RatioTUM Technische Universitat MunchenUA Unweighted AccuracyUAR Unweighted Average RecallVAM Vera-Am-Mittag (German TV show, corpus)VB VerbVP Verb Phrase

Page 23: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

List of Abbreviations xxi

VQ Vector QuantisationWA Weighted Accuracy / Word Accuracy (ASR)WAR Weighted Average RecallWYALFIWYG What You Are Looking For Is What You GetWYSIWYG What You See Is What You GetXML eXtensible Mark-up LanguageZCR Zero Crossing Rate

Page 24: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 25: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Part IFoundations

Page 26: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.
Page 27: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

1Introduction

1.1 What is Computational Paralinguistics? A First Approximation

So difficult it is to show the various meanings and imperfections of words when we havenothing else but words to do it with.

(John Locke)

The term computational paralinguistics is not yet a well-established term, in contrast to com-putational linguistics or even computational phonetics; the reader might like to try comparingthe hits for each of these terms – or for any other combination of ‘computational’ with thename of a scientific field such as psychology or sociology – in a web search. This terminolog-ical gap is a little puzzling given the fact that there is a plethora of studies on, for example,affective computing (Picard 1997) and speech – which can partly be conceived as a sub-fieldof computational paralinguistics (as far as speech and language are concerned). But let us firsttake a look at the coarse meanings of the two words this term consists of: ‘computational’ and‘paralinguistics’.

Here, ‘computational’ means roughly that something is done by a computer and not by ahuman being; this can mean analysing the phenomenon in question, or generating humanlikebehaviour. Note that nowadays computers are used for practically all systematic and scientificwork, even if it is only for listing data, detailed information on subjects, or annotations inan ASCII (American Standard Code for Information Exchange) file. In traditional phoneticor psychological approaches, this can go along with the use of highly sophisticated signalextraction and statistical programs. A borderline between the ‘simple’ use of computers fortedious work and the use of computers for actually modelling and performing human behaviouris of course difficult to define. Here, we simply mean both: doing the work with the help ofcomputers, and letting computers do the work of analysing and processing.

‘Paralinguistics’ means ‘alongside linguistics’ (from the Greek preposition παρα); thusthe phenomena in question are not typical linguistic phenomena such as the structure of alanguage, its phonetics, its grammar (syntax, morphology), or its semantics. It is concernedwith how you say something rather than what you say.

Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing, First Edition.Bjorn W. Schuller and Anton M. Batliner. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.

Page 28: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

4 Computational Paralinguistics

the sciences of the universe andof the world and of everything

the sciences of mankind the sciences of everything else

communication anthropologysociologypsychology …

unimodality multimodality

speech & languagevocal factors vision: face, gesture, posture extracommununicative context

PARALINGUISTICS PhoneticsLinguistics

analysis synthesis

statestraits

gender, age, personality … intoxication, sleepiness, friendliness, mood … emotion

long-term short-term

observation, recording, annotation → automatic processing → classification, correlation

harnessing in end-to-end systems and applications

Figure 1.1 The realm of computational paralinguistics

In Figure 1.1 we try to narrow down the realm of paralinguistics in a reasonable way, as weconceive it and as we will deal with it in this book. Of course, there are other conceptualisationsof paralinguistics, some broader, some narrower in scope. Figure 1.1 is a sort of flowchart thatwe will follow from top to bottom. A grey font indicates fields and topics that are not part ofparalinguistics, for instance, the global science of mankind or of everything else that can befound in this world. Dashed lines lead to fields that are more or less disregarded in this book.

Page 29: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

Introduction 5

The first word shown in black is ‘communication’, denoting that interactions between humanbeings are focal. Paralinguistics deals with speech and language which both are primarilymeans of communication; even a soliloquy has to be overheard and eventually recorded andprocessed by the computer in order to be an object of investigation. The same holds for a privatediary in its written form: it might not be intended as communication with others, but as soon asit is read by someone else, it is. Of course, human communication is an important part of relatedfields such as psychology, sociology, or anthropology. Thus, we have to follow the flowchartfurther down to point out what distinguishes paralinguistics from all these related fields.

In traditional linguistics, the term ‘language’ refers to the (innate and/or acquired) mentalcompetence, and the term ‘speech’ to the performance, that is, to the ability to convert thiscompetence into motor signals, acoustic waves, and percepts. In this book we adhere to ashallower definition of these two terms, based on their use in speech and language technology.Language is more or less synonymous with ‘natural language’ which is modelled and processedwithin computational linguistics; speech is the object of investigation within automatic speechprocessing, that is, ‘spoken language’, as opposed to written language.

We want to restrict paralinguistics to the unimodal processing of events primarily producedwith the voice, or secondarily encoded in written language. ‘Communication’ is used in abroad sense: speech and language are primarily means of interaction between human beings;however, they can be decoupled from this function and analysed on their own, that is, whennot used in a communicative setting. Note that this sort of secondary communication is naturalfor written language, because here the communication of sender and addressee is normallydecoupled as for time and place.

We do not want to distinguish between extralinguistics and paralinguistics; Laver (1994,pp. 22f) attributes extralinguistics to informative functions denoting age, sex, and suchlike,and paralinguistics to communicative functions. Implicitly, extralinguistic functions are alwayscommunicated, as a sort of background; we will subsume these functions under biologicaltrait primitives; see Section 5.1. Moreover, it would be simply too cumbersome to introduce‘computational extralinguistics’ as an additional field.

There are alternative conceptualisations, the most important one arguably paralinguistics inthe sense of ‘multimodality’; this holds practically for every aspect, whether it be emotion, orpersonality, or social signals (see Chapter 5). Undoubtedly, the most natural human–humaninteraction is face-to-face, with each partner employing all the means available to them: voice,linguistic message, face, gestures, and body. However, there are some common and natural(interaction) scenarios where only the acoustic channel is used, for instance, in telephoneconversations or in radio plays. Moreover, it is simply natural that interaction/conversationpartners sometimes speak, and sometimes only listen. In the latter case, there is no speechavailable that can be investigated; in the first case, when people are talking, analysing faces ismore difficult because the movements of speech are superposed onto the facial gestures.

In this book, apart from vocal factors, speech, and language, everything else such as facialexpressions, gestures, body posture, and any extracommunicative context is not part of pa-ralinguistics. Needless to say, all these other aspects are very important within human–humanand human–machine communication; we will return to such multimodal aspects throughoutthis book.

Moreover, paralinguistics is sort of defined ex negativo; it comprises everything which is notthe object of investigation in phonetics or linguistics: It does not address the systematic aspectsof speech and language which are dealt with in sub-fields such as phonology, morphology,

Page 30: Computational paralinguisti Cs · 2013-10-03 · Computational paralinguistics : emotion, affect and personality in speech and language processing / Bj¨orn Schuller, Anton Batliner.

6 Computational Paralinguistics

syntax, or semantics. Note, however, that the use of specific phonological or grammatical struc-tures within a specific context may very well be object of investigation within paralinguistics.All this will be illustrated more extensively below.

In this book we will focus on analysis, basically excluding generation and synthesis. At firstsight, this might appear exotic: after all, analysis and generation/synthesis can be considered astwo sides of the same coin. Both are necessary for a complete account. However, methodologiesdiffer considerably; in Batliner and Mobius (2005) the methodological differences betweenanalysis on the one hand and generation/synthesis on the other hand have been detailed for themodelling and processing of prosody. From a methodological point of view, the analysis ofgestures has perhaps more in common with the analysis of speech than its synthesis. Moreover,analysis and not synthesis is the core competence of the authors. We therefore decided to treatsynthesis the same way as vision and extracommunicative context, as a fringe phenomenon inthis book.

So far, we have addressed broad fields of science, either including or excluding them fromour definition of paralinguistics. We will now briefly sketch the phenomena we are dealingwith, as well as the processing chain. All this will be dealt with in more depth in the chapters tofollow. In simple terms, paralinguistics deals with traits and states; traits are long-term events,whereas states are short-term. Examples are given in Figure 1.1. Typical traits are gender,age, and personality, and typical states are emotions. Then, there are phenomena which aresomehow in between: People can be friendly towards everybody, or, towards a specific person,only for a very short time. You can get tipsy, that is, intoxicated, for a short time, or you canbe a regular heavy drinker. The title of this book mentions three exemplary phenomena:

personality denoting long-term character traits which are specific to individualsor groups. In a broader sense, this encompasses everything that characterises aspecific individual, including traits such as age, gender, race, and suchlike. In anarrower sense, this encompasses psychological traits such as neuroticism.

emotion denoting short-term states: prototypical ones such as anger, fear, joy, orless prototypical ones such as surprise.

affect as a broader term, encompassing all kinds of manifestations of personalitysuch as mood, interpersonal stances, or attitudes as displayed in Table 2.1 – a verycommon term since Picard (1997).

The last terms to be commented upon in the title are speech and language processing: Inbasic research, the two fields of phonetics and linguistics deal with different data: phoneticswith (the production/acoustics/perception of) speech, and linguistics with (written) language.Accordingly, there are two different lines of research traditions in paralinguistics: one deal-ing mostly with the acoustic signal (called, for instance, ‘emotion/affect processing’), andone dealing exclusively with written language (called, for instance, ‘sentiment analysis’). Inautomatic speech processing, the approach is different: acoustic and linguistic informationare combined in a hybrid fashion. Following this tradition, we will address both acoustic andlinguistic phenomena in this book.

With observations, recordings, and annotations, we decide which phenomena we are dealingwith, and how long the single event takes. In computational paralinguistics, we then try toprocess these phenomena automatically. Ultimately, this means producing some performance