Speech perception and production


Overview

Elizabeth D. Casserly1* and David B. Pisoni2

Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. © 2010 John Wiley & Sons, Ltd. WIREs Cogn Sci 2010 1 629–647

Historically, language research focusing on the spoken (as opposed to written) word has been split into two distinct fields: speech perception and speech production. Psychologists and psycholinguists worked on problems of phoneme perception, whereas phoneticians examined and modeled articulation and speech acoustics. Despite their common goal of discovering the nature of the human capacity for spoken language communication, the two broad lines of inquiry have experienced limited mutual influence. The division has been partially practical, because methodologies and analysis are necessarily quite different when aimed at direct observation of overt behavior, as in speech production, or examination of hidden cognitive and neurological function, as in speech perception. Academic specialization has also played a part, since there is an overwhelming volume of knowledge available, but single researchers can only learn and use a small portion. In keeping with the goal of this series, however, we argue that the greatest prospects for progress in speech research over the next few years lie at the intersection of insights from research on speech perception and production, and in investigation of the inherent links between these two processes.

* Correspondence to: [email protected]
1 Department of Linguistics, Speech Research Laboratory, Indiana University, Bloomington, IN 47405, USA
2 Department of Psychological and Brain Sciences, Speech Research Laboratory, Cognitive Science Program, Indiana University, Bloomington, IN 47405, USA

DOI: 10.1002/wcs.63

In this article, therefore, we will discuss the major theoretical and conceptual issues in research dedicated first to speech perception and then to speech production, as well as the successes and lingering problems in these domains. Then we will turn to several exciting new directions in experimental evidence and theoretical models which begin to close the gap between the two research areas by suggesting ways in which they may work together in everyday speech communication and by highlighting the inherent links between speaking and listening.

SPEECH PERCEPTION

Before the advent of modern signal processing technology, linguists and psychologists believed that speech perception was a fairly uncomplicated, straightforward process. Theoretical linguistics’ description of spoken language relied on the use of sequential strings of abstract, context-invariant segments, or phonemes, which provided the mechanism of contrast between lexical items (e.g., distinguishing pat from bat).1,2

FIGURE 1 | Speech waveform of the words typical and yesteryear as produced by an adult male speaker, representing variations in amplitude over time. Vowels are generally the most resonant speech component, corresponding to the most extreme amplitude levels seen here. The identifying formant frequency information in the acoustics is not readily accessible from visual inspection of waveforms such as these.

FIGURE 2 | A wide-band speech spectrogram of the same utterance as in Figure 1, showing the change in component frequencies over time. Frequency is represented along the y-axis and time on the x-axis. Darkness corresponds to greater signal strength at the corresponding frequency and time.

The immense analytic success and relative ease of approaches using such symbolic structures led language researchers to believe that the physical implementation of speech would adhere to the segmental ‘linearity condition,’ so that the acoustics corresponding to consecutive phonemes would concatenate like an acoustic alphabet or a string of beads stretched out in time. If that were the case, perception of the linguistic message in spoken utterances would be a trivial matching process of acoustics to contrastive phonemes.3

Understanding the true nature of the physical speech signal, however, has turned out to be far from easy. Early signal processing technologies, prior to the 1940s, could detect and display time-varying acoustic amplitudes in speech, resulting in the familiar waveform seen in Figure 1. Phoneticians have long known that it is the component frequencies encoded within speech acoustics, and how they vary over time, that serve to distinguish one speech percept from another, but waveforms do not readily provide access to this key information. A major breakthrough came in 1946, when Ralph Potter and his colleagues at Bell Laboratories developed the speech spectrogram, a representation which uses the mathematical Fourier transform to uncover the strength of the speech signal hidden in the waveform amplitudes (as shown in Figure 1) at a wide range of possible component frequencies.4 Each calculation finds the signal strength through the frequency spectrum of a small time window of the speech waveform; stringing the results of these time-window analyses together yields a speech spectrogram or voiceprint, representing the dynamic frequency characteristics of the spoken signal as it changes over time (Figure 2).
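The short-time analysis described above can be sketched in a few lines of code. The following is a minimal illustration of the idea using standard scientific Python; the file name, window length, and overlap are illustrative choices, not parameters taken from the original work.

```python
# Minimal sketch of the short-time Fourier analysis underlying a spectrogram.
# The WAV file name and analysis parameters are illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("typical_yesteryear.wav")   # sampling rate (Hz), waveform samples
x = x.astype(float)
if x.ndim > 1:
    x = x[:, 0]                                  # keep one channel if the file is stereo

# Wide-band analysis: short windows give good time resolution (as in Figure 2).
f, t, Sxx = spectrogram(
    x, fs,
    window="hamming",
    nperseg=int(0.005 * fs),        # ~5 ms analysis window
    noverlap=int(0.004 * fs),       # heavy overlap for a smooth time axis
    scaling="spectrum",
)

# Convert power to dB; each column of S_db is one time window's spectrum,
# so stringing the columns together yields the spectrogram.
S_db = 10 * np.log10(Sxx + 1e-12)
print(S_db.shape)                   # (n_frequencies, n_time_windows)
```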

Phonemes—An Unexpected Lack of Evidence

As can be seen in Figure 2, the content of a speech spectrogram does not visually correspond to the discrete segmental units listeners perceive in a straightforward manner. Although vowels stand out due to their relatively high amplitudes (darkness) and clear internal frequency structure, reflecting harmonic resonances or ‘formant frequencies’ in the vocal tract, their exact beginning and ending points are not immediately obvious to the eye. Even the seemingly clear-cut amplitude rises after stop consonant closures, such as for the [p] in typical, do not directly correlate with the beginning of a discrete vowel segment, since these acoustics simultaneously provide critical information about both the identity of the consonant and the following vowel. Demarcating consonant/vowel separation is even more difficult in the case of highly sonorant (or resonant) consonants such as [w] or [r].

The simple ‘acoustic alphabet’ view of speech received another setback in the 1950s, when Franklin Cooper of Haskins Laboratories reported his research group’s conclusion that acoustic signals composed of strictly serial, discrete units designed to correspond to phonemes or segments are actually impossible for listeners to process at speeds near those of normal speech perception.5 No degree of signal simplicity, contrast between units, or user training with the context-free concatenation system could produce natural rates of speech perception for listeners. Therefore, the Haskins group concluded that speech must transmit information in parallel, through use of the contextual overlap observed in spectrograms of the physical signal. Speech does not look like a string of discrete, context-invariant acoustic segments, and in order for listeners to process its message as quickly as they do, it cannot be such a system. Instead, as Alvin Liberman proposed, speech is a ‘code,’ taking advantage of parallel transmission of phonetic content on a massive scale through co-articulation3 (see section ‘Variation in Invariants,’ below).

As these discoveries came to light, the ‘speech perception problem’ began to appear increasingly insurmountable. On the one hand, phonological evidence (covered in more depth in the ‘Variation in Invariants’ section) implies that phonemes are a genuine property of linguistic systems. On the other hand, it has been shown that the acoustic speech signal does not directly correspond to phonological segments. How could a listener use such seemingly unhelpful acoustics to recover a speaker’s linguistic message? Hockett encapsulated early speech scientists’ bewilderment when he famously likened the speech perception task to that of the inspector in the following scenario:

Imagine a row of Easter eggs carried along a moving belt; the eggs are of various sizes, and variously colored, but not boiled. At a certain point, the belt carries the row of eggs between two rollers of a wringer, which quite effectively smash them and rub more or less into each other. The flow of eggs before the wringer represents the series of impulses from the phoneme source; the mess that emerges from the wringer represents the output of the speech transmitter. At a subsequent point, we have an inspector whose task it is to examine the passing mess and decide, on the basis of the broken and unbroken yolks, the variously spread out albumen, and the variously colored bits of shell, the nature of the flow of eggs which previously arrived at the wringer.

(Ref 1, p. 210)

For many years, researchers in the field of speech perception focused their efforts on trying to solve this enigma, believing that the heart of the speech perception problem lay in the seemingly impossible task of phoneme recognition—putting the Easter eggs back together.

Synthetic Speech and the Haskins Pattern Playback

Soon after the speech spectrogram enabled researchers to visualize the spectral content of speech acoustics and its changes over time, that knowledge was put to use in the development of technology able to generate speech synthetically. One of the early research synthesizers was the Pattern Playback (Figure 3, top panel), developed by scientists and engineers, including Cooper and Liberman, at Haskins Laboratories.6 This device could take simplified sound spectrograms like those shown in Figure 3 and use the component frequency information to produce highly intelligible corresponding speech acoustics. Hand-painted spectrographic patterns (Figure 3, lower panel) allowed researchers tight experimental control over the content of this synthetic, very simplified Pattern Playback speech. By varying its frequency content and transitional changes over time, investigators were able to determine many of the specific aspects in spoken language which are essential to particular speech percepts, and many which are not.3,6

Perceptual experiments with the Haskins Pattern Playback and other speech synthesizers revealed, for example, the pattern of complex acoustics that signals the place of articulation of English stop consonants, such as [b], [t] and [k].3 For voiced stops ([b], [d], [g]) the transitions of the formant frequencies from silence to the vowel following the consonant largely determine the resulting percept. For voiceless stops ([p], [t], [k]), however, the acoustic frequency of the burst of air following the release of the consonant plays the largest role in identification. The experimental control gained from the Pattern Playback allowed researchers to alter and eliminate many aspects of naturally produced speech signals, discovering the identities of many such sufficient or necessary acoustic cues for a given speech percept. This early work attempted to pare speech down to its bare essentials, hoping to reveal the mechanisms of speech perception. Although largely successful in identifying perceptually crucial aspects of speech acoustics and greatly increasing our fundamental understanding of speech perception, these pioneering research efforts did not yield invariant, context-independent acoustic features corresponding to segments or phonemes. If anything, this research program suggested alternative bases for the communication of linguistic content.7,8
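In the same spirit as the Pattern Playback, a crude synthetic vowel can be generated by exciting simple resonators at chosen formant frequencies. The sketch below only illustrates that general principle; the formant frequencies, bandwidths, and voicing rate are invented values, not stimuli from the Haskins experiments.

```python
# Crude, Pattern-Playback-spirited synthesis: excite resonators at chosen
# formant frequencies. All frequencies and bandwidths here are illustrative.
import numpy as np
from scipy.signal import lfilter

fs = 16000                       # sampling rate (Hz)
dur, f0 = 0.3, 120               # 300 ms of signal, 120 Hz voicing
n = int(fs * dur)

# Impulse-train "glottal" source.
source = np.zeros(n)
source[::int(fs / f0)] = 1.0

def resonator(signal, freq, bw, fs):
    """Second-order IIR resonator (one formant) at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2 * r * np.cos(theta), r ** 2]   # pole pair sets the resonance
    b = [1.0 - r]                               # rough gain normalization
    return lfilter(b, a, signal)

# Two steady formants with roughly [a]-like values (illustrative only).
out = resonator(source, 700, 90, fs)     # F1
out = resonator(out, 1200, 110, fs)      # F2
out /= np.max(np.abs(out))               # normalize amplitude for playback
```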

Phoneme Perception—Positive Evidence

Some of the research conducted with the aim of understanding phoneme perception, however, did lead to results suggesting the reality of psychological particulate units such as phonemes. For instance, in some cases listeners show evidence of ‘perceptual constancy,’ or abstraction from signal variation to more generalized representations—possibly phonemes. Various types of such abstraction have been found in speech perception, but we will address two of the most influential here.

FIGURE 3 | Top panel: A diagram of the principles and components at work in the Haskins Pattern Playback speech synthesizer. (Reprinted with permission from Ref 68. Copyright 1951 National Academy of Sciences.) Lower panel: A series of hand-painted schematic spectrographic patterns used as input to the Haskins Pattern Playback in early research on perceptual ‘speech cues.’ The patterns represent the syllables ba, da, ga; pa, ta, ka; and ma, na, arranged by place of articulation (front, middle, back) and manner of articulation (voiced stops, unvoiced stops, nasals). (Reprinted with permission from Ref 69. Copyright 1957 American Institute of Physics.)

Categorical Perception Effects

Phoneme representations split potential acoustic continuums into discrete categories. The duration of aspiration occurring after the release of a stop consonant, for example, constitutes a potential continuum ranging from 0 ms, where vocalic resonance begins simultaneously with release of the stop, to an indefinitely long period between the stop release and the start of the following vowel. Yet stops in English falling along this continuum are split by native listeners into two functional groups—voiced [b], [d], [g] or voiceless [p], [t], [k]—based on the length of this ‘voice onset time.’ In general, this phenomenon is not so strange: perceptual categories often serve to break continuous variation into manageable chunks.

Speech categories appear to be unique in one aspect, however: listeners are unable to reliably discriminate between two members of the same category. That is, although we may assign two different colors both to the category ‘red,’ we can easily distinguish between the two shades in most cases. When speech scientists give listeners stimuli varying along an acoustic continuum, however, their discrimination between different tokens of the same category (analogous to two shades of red) is very close to chance.9 They are highly accurate at discriminating tokens spanning category boundaries, on the other hand. The combination of sharp category boundaries in listeners’ labeling of stimuli and their within-category insensitivity in discrimination, as shown in Figure 4, appears to be unique to human speech perception, and constitutes some of the strongest evidence in favor of robust segmental categories underlying speech perception.

According to this evidence, listeners sacrifice sensitivity to acoustic detail in order to make speech category distinctions more automatic and perhaps also less subject to the influence of variability. This type of category robustness is observed more strongly in the perception of consonants than vowels. Not coincidentally, as discussed briefly above and in more detail in the ‘Acoustic Phonetics’ section, below, the stop consonants which listeners have the most difficulty discriminating also prove to be the greatest challenge to define in terms of invariant acoustic cues.10
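The logic connecting labeling to discrimination can be made concrete with a small sketch. Under the classical idealization that listeners discriminate only by covertly labeling each stimulus (the kind of assumption behind predicted discrimination curves like the open circles in Figure 4), ABX accuracy for a pair of stimuli reduces to 0.5 + 0.5(p1 − p2)², where p1 and p2 are the probabilities of assigning each stimulus to one of the two categories. The logistic identification function and its boundary and slope below are illustrative, not fitted to any data.

```python
# Sketch: identification along a VOT continuum and a label-based ABX prediction.
# The logistic boundary and slope values are illustrative, not fitted data.
import numpy as np

vot = np.arange(0, 65, 5)                      # voice onset time steps (ms)
boundary, slope = 30.0, 0.6                    # hypothetical category boundary
p_voiceless = 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))   # P(label "pa")

def predicted_abx(p1, p2):
    """ABX accuracy if responses rely only on covert category labels:
    A, B, and X are each labeled; when A and B get different labels, X is
    matched by label, otherwise the listener guesses. This idealization
    works out to 0.5 + 0.5 * (p1 - p2)**2."""
    return 0.5 + 0.5 * (p1 - p2) ** 2

# One-step discrimination: compare adjacent continuum steps.
for a, b, pa, pb in zip(vot[:-1], vot[1:], p_voiceless[:-1], p_voiceless[1:]):
    print(f"{a:2d} vs {b:2d} ms VOT -> predicted accuracy {predicted_abx(pa, pb):.2f}")
# Pairs straddling the boundary score well above chance; within-category pairs stay near 0.5.
```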

Perceptual Constancy

Categorical perception effects are not the only case of such abstraction or perceptual constancy in speech perception; listeners also appear to ‘translate’ the speech they hear into more symbolic or idealized forms, encoding based on expectations of gender and accent. Niedzielski, for example, found that listeners identified recorded vowel stimuli differently when they were told that the original speaker was from their own versus another dialect group.11 For these listeners, therefore, the mapping from physical speech characteristics to linguistic categories was not absolute, but mediated by some abstract conceptual unit. Johnson summarizes the results of a variety of studies showing similar behavior,12 which corroborates the observation that, although indexical or ‘extra-linguistic’ information such as speaker gender, dialect, and speaking style is not inert in speech perception, more abstract linguistic units play a role in the process as well.

Far from being exotic, this type of ‘perceptual equivalence’ corresponds very well with language users’ intuitions about speech. Although listeners are aware that individuals often sound drastically different, the feeling remains that something holds constant across talkers and speech tokens. After all, cat is still cat no matter who says it. Given the signal variability and complexity observed in speech acoustics, such consistency certainly seems to imply the influence of some abstract unit in speech perception, possibly contrastive phonemes or segments.

FIGURE 4 | Data for a single subject from a categorical perception experiment. The upper panel gives labeling or identification data (percent identification of [b], [d], and [g]) for each step on a place-of-articulation continuum. The lower graph gives this subject’s ABX discrimination data (filled circles) for the same stimuli with one step difference between pairs, as well as the predicted discrimination performance (open circles). Discrimination accuracy is high at category boundaries and low within categories, as predicted. (Reprinted with permission from Ref 9. Copyright 1957 American Psychological Association.)

Phoneme Perception—Shortcomings and Roadblocks

From the discussion above, it should be evident that speech perception research with the traditional focus on phoneme identification and discrimination has been unable either to confirm or deny the psychological reality of context-free symbolic units such as phonemes. Listeners’ insensitivity to stimulus differences within a linguistic category and their reference to an abstract ideal in identification support the cognitive role of such units, whereas synthetic speech manipulation has simultaneously demonstrated that linguistic percepts simply do not depend on invariant, context-free acoustic cues corresponding to segments. This paradoxical relationship between signal variance and perceptual invariance constitutes one of the fundamental issues in speech perception research.

Crucially, however, the research discussed until now focused exclusively on the phoneme as the locus of language users’ perceptual invariance. This approach stemmed from the assumption that speech perception can essentially be reduced to phoneme identification, relating yet again back to theoretical linguistics’ analysis of language as sequences of discrete, context-invariant units. Especially given the roadblocks and contradictions emerging in the field, however, speech scientists began to question the validity of those foundational assumptions. By attempting to control variability and isolate perceptual effects on the level of the phoneme, experimenters were asking listeners to perform tasks that bore little resemblance to typical speech communication. Interest in the field began to shift toward the influence of larger linguistic units such as words, phrases, and sentences and how speech perception processes are affected by them, if at all.

Beyond the Phoneme—Spoken Word Recognition Processes

Both new and revisited experimental evidence readily confirmed that the characteristics of word-level units do exert massive influence in speech perception. The lexical status (word vs non-word) of experimental stimuli, for example, biases listeners’ phoneme identification such that they hear more tokens as [d] in a dish/tish continuum, where the [d] percept creates a real word, than in a da/ta continuum where both perceptual options are non-words.13 Moreover, research into listeners’ perception of spoken words has shown that there are many factors that play a major role in word recognition but almost never influence phoneme perception.

Perhaps the most fundamental of these factors is word frequency: how often a lexical item tends to be used. The more frequently listeners encounter a word over the course of their daily lives, the more quickly and accurately they are able to recognize it, and the better they are at remembering it in a recall task (e.g., Refs 14,15). High-frequency words are more robust in noisy listening conditions, and whenever listeners are unsure of what they have heard through such interference, they are more likely to report hearing a high-frequency lexical item than a low-frequency one.16 In fact, the effects of lexical status mentioned above are actually only extreme cases of frequency effects; phonotactically legal non-words (i.e., non-words which seem as though they could be real words) are treated psychologically like real words with a frequency of zero. Like cockroaches, these so-called ‘frequency effects’ pop up everywhere in speech research.

The nature of a word’s ‘lexical neighborhood’ also plays a pervasive role in its recognition. If a word is highly similar to many other words, such as cat is in English, then listeners will be slower and less accurate to identify it, whereas a comparably high-frequency word with fewer ‘neighbors’ to compete with it will be recognized more easily. ‘Lexical hermits’ such as Episcopalian and chrysanthemum, therefore, are particularly easy to recognize despite their low frequencies (and long durations). As further evidence of frequency effects’ ubiquitous presence, however, the frequencies of a word’s neighbors also influence perception: a word with a dense neighborhood of high-frequency items is more difficult to recognize than a word with a dense neighborhood of relatively low-frequency items, which has weaker competition.17,18
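Neighborhood structure of this kind is typically computed over a phonemically transcribed lexicon, counting as neighbors all words that differ from the target by a single phoneme substitution, addition, or deletion. The sketch below illustrates the computation on a tiny invented lexicon with made-up frequency counts.

```python
# Toy sketch: neighborhood density and frequency-weighted competition for
# phonemically transcribed words. The mini-lexicon and counts are invented.

def is_neighbor(w1, w2):
    """True if w2 differs from w1 by one phoneme substitution, addition, or deletion."""
    if w1 == w2:
        return False
    n1, n2 = len(w1), len(w2)
    if abs(n1 - n2) > 1:
        return False
    if n1 == n2:                                  # one substitution
        return sum(a != b for a, b in zip(w1, w2)) == 1
    short, long_ = (w1, w2) if n1 < n2 else (w2, w1)
    # addition/deletion: removing one phoneme from the longer word yields the shorter
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

lexicon = {                      # phoneme tuples -> rough frequency counts (invented)
    ("k", "ae", "t"): 900,       # cat
    ("b", "ae", "t"): 400,       # bat
    ("k", "ae", "p"): 300,       # cap
    ("k", "ah", "t"): 250,       # cut
    ("ae", "t"): 800,            # at
    ("s", "k", "ae", "t"): 20,   # scat
}

target = ("k", "ae", "t")
neighbors = [w for w in lexicon if is_neighbor(target, w)]
density = len(neighbors)
competition = sum(lexicon[w] for w in neighbors)   # frequency-weighted competition
print(density, competition)   # a dense, high-frequency neighborhood -> harder recognition
```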

Particularly troublesome for abstractionist phoneme-based views of speech perception, however, was the discovery that the indexical properties of speech (see ‘Perceptual Constancy,’ above) also influence word recognition. Goldinger, for example, has shown that listeners are more accurate at word recall when they hear stimuli repeated by the same versus different talkers.19 If speech perception were mediated only by linguistic abstractions, such ‘extra-linguistic’ detail should not be able to exert this influence. In fact, this and an increasing number of similar results (e.g., Ref 20) have caused many speech scientists to abandon traditional theories of phoneme-based linguistic representation altogether, instead positing that lexical items are composed of maximally detailed ‘episodic’ memory traces.19,21

Conclusion—Speech Perception

Regardless of the success or failure of episodic representational theories, a host of new research questions remain open in speech perception. The variable signal/common percept paradox remains a fundamental issue: what accounts for the perceptual constancy across highly diverse contexts, speech styles and speakers? From a job interview in a quiet room to a reunion with an old friend at a cocktail party, from a southern belle to a Detroit body builder, what makes communication possible? Answers to these questions may lie in discovering the extent to which the speech perception processes tapped by experiments in word recognition and phoneme perception are related, and uncovering the nature of the neural substrates of language that allow adaptation to such diverse situations. Deeply connected to these issues, Goldinger, Johnson and others’ results have prompted us to wonder: what is the representational specificity of speech knowledge and how does it relate to perceptual constancy?

Although speech perception research over the last 60 years has made substantial progress in increasing our understanding of perceptual challenges and particularly the ways in which they are not solved by human listeners, it is clear that a great deal of work remains to be done before even this one aspect of speech communication is truly understood.

SPEECH PRODUCTION

Speech production research serves as the complement to the work on speech perception described above. Where investigations of speech perception are necessarily indirect, using listener response time latencies or recall accuracies to draw conclusions about underlying linguistic processing, research on speech production can be refreshingly direct. In typical production studies, speech scientists observe articulation or acoustics as they occur, then analyze this concrete evidence of the physical speech production process. Conversely, where speech perception studies give researchers exact experimental control over the nature of their stimuli and the inputs to a subject’s perceptual system, research on speech production severely limits experimental control, making the investigators observe more or less passively while speakers do as they will in response to their prompts.

Such fundamentally different experimental conditions, along with focus on the opposite side of the perceptual coin, allow speech production research to ask different questions and draw different conclusions about spoken language use and speech communication. As we discuss below, in some ways this ‘divide and conquer’ approach has been very successful in expanding our understanding of speech as a whole. In other ways, however, it has met with many of the same roadblocks as its perceptual complement and similarly leaves many critical questions unanswered in the end.

A Different Approach

When the advent of the speech spectrogram made it obvious that the speech signal does not straightforwardly mirror phonemic units, researchers responded in different ways. Some, as discussed above, chose to question the perceptual source of phoneme intuitions, trying to define the acoustics necessary and sufficient for an identifiable speech percept. Others, however, began separate lines of work aiming to observe the behavior of speakers more directly. They wanted to know what made the speech signal as fluid and seamless as it appeared, whether the observed overlap and contextual dependence followed regular patterns or rules, and what evidence speakers might show in support of the reality of the phonemic units. In short, these speech scientists wanted to demystify the puzzling acoustics seen on spectrograms by investigating them in the context of their source.

The Continuing Search

It may seem odd, perhaps, that psychologists, speech scientists, engineers, phoneticians, and linguists were not ready to abandon the idea of phonemes as soon as it became apparent that the physical speech signal did not straightforwardly support their psychological reality. Dating back to Panini’s grammatical study of Sanskrit, however, the use of abstract units such as phonemes has provided enormous gains to linguistic and phonological analysis. Phonemic units appear to capture the domain of many phonological processes, for example, and their use enables linguists to make sense of the multitude of patterns and distributions of speech sounds across the world’s languages. It has even been argued22 that their discrete, particulate nature underlies humanity’s immense potential for linguistic innovation, allowing us to make ‘infinite use of finite means.’23

Beyond these theoretical gains, phonemes were argued to be empirically supported by research on speech errors or ‘slips of the tongue,’ which appeared to operate over phonemic units. That is, the kinds of errors observed during speech production, such as anticipations (‘a leading list’), perseverations (‘pulled a pantrum’), reversals (‘heft lemisphere’), additions (‘moptimal number’), and deletions (‘chrysanthemum pants’), appear to involve errors in the ordering and selection of whole segmental units, and always result in legal phonological combinations, whose domain is typically described as the segment.24 Without evidence to the contrary, these errors seemed to provide evidence for speakers’ use of discrete phonological units.

Although there have been dissenters25 and shifts in the conception of the units thought to underlie speech, abstract features or phoneme-like units of some type have remained prevalent in the literature. In light of the particulate nature of linguistic systems, the enhanced understanding gained with the assumption of segmental analysis, and the empirical evidence observed in speech planning errors, researchers were and are reluctant to give up the search for the basis of phonemic intuitions in physically observable speech production.


Acoustic Phonetics

One of the most fruitful lines of research into speech production focused on the acoustics of speech. This body of work, part of ‘Acoustic Phonetics,’ examines the speech signals speakers produce in great detail, searching for regularities, invariant properties, and simply a better understanding of the human speech capacity. Although the speech spectrograph did not immediately show the invariants researchers anticipated, they reasoned that such technology would also allow them to investigate the speech signal at an unprecedented level of scientific detail. Because speech acoustics are so complex, invariant cues corresponding to phonemes may be present, but difficult to pinpoint.10,26

While psychologists and phoneticians in speech perception were generating and manipulating synthesized speech in an effort to discover the acoustic ‘speech cues,’ therefore, researchers in speech production refined signal processing techniques enabling them to analyze the content of naturally produced speech acoustics. Many phoneticians and engineers took on this problem, but perhaps none has been as tenacious and successful as Kenneth Stevens of MIT.

An electrical engineer by training, Stevens took the problem of phoneme-level invariant classification and downsized it, capitalizing on the phonological theories of Jakobson et al.27 and Chomsky and Halle’s The Sound Pattern of English,28 which postulated linguistic units below the level of the phoneme called distinctive features. Binary values of universal features such as [sonorant], [continuant], and [high], these linguists argued, constituted the basis of phonemes. Stevens and his colleagues thought that invariant acoustic signals might correspond to distinctive features rather than phonemes.10,26 Since phonemes often share features (e.g., /s/ and /z/ share specification for all distinctive features except [voice]), it would make sense that their acoustics are not as unique as might be expected from their contrastive linguistic function alone.

Stevens, therefore, began a thorough search for invariant feature correlates that continued until his retirement in 2007. He enjoyed several notable successes: many phonological features, it turns out, can be reliably specified by one or two signal characteristics or ‘acoustic landmarks.’ Phonological descriptors of vowel quality, such as [high] and [front], were found to correspond closely to the relative spacings between the first and second harmonic resonances of the vocal tract (or ‘formants’) during the production of sonorant vowel segments.10
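The feature-bundle idea itself is easy to make concrete: each phoneme is a set of binary feature values, and contrasts can be counted as feature mismatches. The specification below is deliberately partial and simplified, intended only to illustrate the /s/ and /z/ example from the text.

```python
# Sketch: phonemes as bundles of binary distinctive features.
# A deliberately simplified, partial feature specification (illustrative only).
features = {
    "s": {"sonorant": False, "continuant": True,  "voice": False, "coronal": True},
    "z": {"sonorant": False, "continuant": True,  "voice": True,  "coronal": True},
    "t": {"sonorant": False, "continuant": False, "voice": False, "coronal": True},
    "d": {"sonorant": False, "continuant": False, "voice": True,  "coronal": True},
}

def feature_difference(p1, p2):
    """Return the features on which two phonemes disagree."""
    return {f for f in features[p1] if features[p1][f] != features[p2][f]}

print(feature_difference("s", "z"))   # {'voice'}: a single-feature contrast
print(feature_difference("s", "d"))   # {'voice', 'continuant'}
```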

FIGURE 5 | Observations from early perceptual speech cue studies, organized as articulation, acoustics, and percept. In the first case, two different acoustic signals (rising versus falling consonant/vowel formant frequency transitions) result in the same percept. In the latter case, identical acoustics (a release burst at 1440 Hz) result in two different percepts, depending on the vocalic context. In both cases, however, perception reflects articulatory, rather than acoustic, contrast. (Adapted and reprinted with permission from Ref 29. Copyright 1996 American Institute of Physics.)

Some features, however, remained more difficult to define acoustically. Specifically, the acoustics corresponding to consonant place of articulation seemed to depend heavily on context—the exact same burst of noise transitioning from a voiceless stop to a steady vowel might result from the lip closure of a [p] or the tongue-dorsum occlusion of a [k], depending on the vowel following the consonant. Equally problematic, the acoustics signaling the alveolar ridge closure location of coronal stop [t] are completely different before different vowels.29 The articulation/acoustic mismatch, and the tendency for linguistic percepts to mirror articulation rather than acoustics, is represented in Figure 5.

Variation in Invariants

Why do listeners’ speech percepts show this dissociation from raw acoustic patterns? Perhaps the answer becomes more intuitive when we consider that even the most reliable acoustic invariants described by Stevens and his colleagues tend to be somewhat broad, dealing in relative distances between formant frequencies in vowels and relative abruptness of shifts in amplitude and so on. This dependence on relative measures comes from two major sources: individual differences among talkers and contextual variation due to co-articulation. Individual speakers’ vocal tracts are shaped and sized differently, and therefore they resonate differently (just as different resonating sounds are produced by blowing over the necks of differently sized and shaped bottles), making the absolute formant frequencies corresponding to different vowels, for instance, impossible to generalize across individuals.
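The talker-dependence of absolute formant values follows directly from the textbook uniform-tube approximation of the vocal tract: a tube closed at the glottis and open at the lips resonates at odd multiples of c/4L, so shortening the tract raises every resonance proportionally. The worked numbers below use this standard approximation; the tract lengths are illustrative.

```python
# Textbook uniform-tube approximation of vocal tract resonances:
# a tube closed at the glottis and open at the lips resonates at
# F_n = (2n - 1) * c / (4 * L). The tract lengths below are illustrative.
c = 35000.0                     # approximate speed of sound in warm, moist air (cm/s)

def formants(tract_length_cm, n=3):
    """First n resonances (Hz) of an idealized uniform tube of the given length."""
    return [(2 * k - 1) * c / (4 * tract_length_cm) for k in range(1, n + 1)]

print(formants(17.5))   # ~[500, 1500, 2500] Hz: schwa-like values for a longer tract
print(formants(14.0))   # ~[625, 1875, 3125] Hz: a shorter tract shifts every formant up
```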

Perhaps more obviously problematic, though, is the second source: speech acoustics’ sensitivity to phonetic context. Not only do the acoustic cues for [p], [t], or [k] depend on the vowel following the stop closure, for example, but because the consonant and vowel are produced nearly simultaneously, the identity of the consonant reciprocally affects the acoustics of the vowel. Such co-articulatory effects are extremely robust, even operating across syllable and word boundaries. This extensive interdependence makes the possibility of identifying reliable invariance in the acoustic speech signal highly remote.

Although some researchers, such as Stevens, attempted to factor out or ‘normalize’ these co-articulatory effects, others believed that they are central to the functionality of speech communication. Liberman et al. at Haskins Laboratories pointed out that co-articulation of consonants and vowels allows the speech signal to transfer information in parallel, transmitting messages more quickly than it could if spoken language consisted of concatenated strings of context-free discrete units.3 Co-articulation therefore enhances the efficiency of the system, rather than being a destructive or communication-hampering force. Partially as a result of this view, some speech scientists focused on articulation as a potential key to understanding the reliability of phonemic intuitions, rather than on its acoustic consequences. They developed the research program called ‘articulatory phonetics,’ aimed at the study of the visible and hidden movements of the speech organs.

Articulatory Phonetics

Techniques

In many ways articulatory phonetics constitutes as much of an engineering challenge as a linguistic one. Because the majority of the vocal tract ‘machinery’ lies hidden from view (see Figure 6), direct observation of the mechanics of speech production requires technology, creativity, or both. And any potential solution to the problem of observation cannot disrupt natural articulation too extensively if its results are to be useful in understanding natural production of speech. The challenge, therefore, is to investigate aspects of speech articulation accurately and to a high level of detail, while keeping interference with the speaker’s normal production as minor as possible.

Various techniques have been developed that manage to satisfy these requirements, spanning from the broadly applicable to the highly specialized. Electromyography (EMG), for instance, allows researchers to measure directly the activity of muscles within the vocal tract during articulation via surface or inserted pin electrodes.30 These recordings have broad applications in articulatory phonetics, from determining the relative timing of tongue movements during syllable production to measures of pulmonary function from activity in speakers’ diaphragms to examining tone production strategies via raising and lowering of speakers’ larynxes. EMG electrode placement can significantly impact articulation, however, which does impose limits on its use. More specialized techniques are typically still more disruptive of typical speech production, but interfere minimally with their particular investigational target. In transillumination of the glottis, for example, a bundle of fiberoptic lights is fed through a speaker’s nose until the light source is positioned just above their larynx.31 A light-sensitive photocell is then placed on the neck just below the glottis to detect the amount of light passing through the vocal folds at any given moment, which correlates directly with the size of glottal opening over time. Although transillumination is clearly not an ideal method to study the majority of speech articulation, it nevertheless provides a highly accurate measure of various glottal states during speech production.

FIGURE 6 | A sagittal view of the human vocal tract showing the main speech articulators as labeled: the lips, tongue tip, tongue blade, tongue back, tongue root, alveolar ridge, hard palate, soft palate (velum), uvula, nasal cavity, pharynx, epiglottis, larynx, and vocal folds (whose aperture is the glottis). (Reprinted with permission from Ref 70. Copyright 2001 Blackwell Publishers Inc.)

Perhaps the most currently celebrated articulatory phonetics methods are also the least disruptive to speakers’ natural articulation. Simply filming speech production in real-time via X-ray provided an excellent, complete view of unobstructed articulation, but for health and safety reasons can no longer be used to collect new data.32 Methods such as X-ray microbeam and Electromagnetic Mid-Sagittal Articulometer (EMMA) tracking attempt to approximate that ‘X-ray vision’ by recording the movements of speech articulators in real-time through other means. The former uses a tiny stream of X-ray energy aimed at radio-opaque pellets attached to a speaker’s lips, teeth, and tongue to monitor the movements of the shadows created by the pellets as the speaker talks. The latter, EMMA, generates similar positional data for the speech organs by focusing alternating magnetic fields on a speaker and monitoring the voltage induced in small electromagnetic coils attached to the speaker’s articulators as they move through the fields during speech. Both methods track the movements of speech articulators despite their inaccessibility to visible light, providing reliable position-over-time data while minimally disrupting natural production.33,34 However, comparison across subjects can be difficult due to inconsistent placement of tracking points from one subject to another and simple anatomical differences between subjects.
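Position-over-time records of this kind are commonly reduced to kinematic landmarks such as peak velocity and movement onset before analysis. The sketch below shows one conventional style of reduction applied to a made-up, EMMA-like tongue-tip trace; the sampling rate, trace shape, and 20% velocity threshold are all illustrative assumptions.

```python
# Sketch: reducing an EMMA/microbeam-style position trace to kinematic landmarks.
# The synthetic trace, sampling rate, and 20% velocity threshold are illustrative.
import numpy as np

fs = 200.0                                   # samples per second
t = np.arange(0, 0.5, 1 / fs)                # 500 ms of "tongue-tip height" data
position = 5 * (1 - np.cos(2 * np.pi * t / 0.5)) / 2   # smooth 5 mm raising-lowering

velocity = np.gradient(position, 1 / fs)     # mm/s
speed = np.abs(velocity)

peak_vel = speed.max()
threshold = 0.2 * peak_vel                   # one common convention: 20% of peak velocity
onset_idx = np.argmax(speed > threshold)     # first sample exceeding the threshold
print(f"peak velocity {peak_vel:.0f} mm/s, gesture onset at {t[onset_idx] * 1000:.0f} ms")
```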

Ultrasound provides another, even less disruptive, articulatory phonetics technique that has been gaining popularity in recent years (e.g., Refs 35,36). Using portable machinery that does nothing more invasive than send sound waves through a speaker’s tissue and monitor their reflections, speech scientists can track movements of the tongue body, tongue root, and pharynx that even X-ray microbeam and EMMA cannot capture, as these articulators are all but completely manually inaccessible. By placing an ultrasound wand at the juncture of the head and neck below the jaw, however, images of the tongue from its root in the larynx to its tip can be viewed in real-time during speech production, with virtually no interference to the speech act itself. The tracking cannot extend beyond cavities of open air, making this method inappropriate for studies of precise place of articulation against the hard palate or of velum movements, for example, but these are areas in which X-ray microbeam and EMMA excel. The data recently captured using these techniques are beginning to give speech scientists a more complete picture of speech articulation than ever before.

Impact on the Search for Phonemes

Unfortunately for phoneme-based theories of speech production and planning, the results of recent articulatory studies of speech errors do not seem to paint a compatible picture. As discussed above, the categorical nature of speech errors has served as important support for the use of phonemic units in speech production. Goldstein, Pouplier, and their colleagues, however, used EMMA to track speakers’ production of errors in a repetition task similar to a tongue twister. Confirming earlier suspicions (e.g., Ref 25), they found that while speakers’ articulation sometimes followed a categorically ‘correct’ or ‘errorful’ gestural pattern, it was more frequently somewhere between two opposing articulations. In these cases, small ‘incorrect’ movements of the articulators would intrude upon the target speech gesture, both gestures would be executed simultaneously, or the errorful gesture would completely overshadow the target articulation. Only the latter reliably resulted in the acoustic percept of a speech error.37 As Goldstein and Pouplier point out, such non-categorical, gradient speech errors cannot constitute support for discrete phonemic units in speech planning.

Importantly, this finding was not isolated in the articulatory phonetics literature: speakers frequently appear to execute articulatory movements that do not result in any acoustic consequences. Specifically, X-ray microbeam tracking of speakers’ tongue tip, tongue dorsum, and lip closures during casual pronunciation of phrases such as perfect memory reveals that speakers raise their tongue tips for [t]-closure, despite the fact that the preceding [k] and following [m] typically obscure the acoustic realization of the [t] completely.38 Although they could minimize their articulatory effort by not articulating the [t] where it will not be heard, speakers faithfully proceed with their complete articulation, even in casual speech.

Beyond the Phoneme

So far we have seen that, while technological, methodological and theoretical advances have enabled speech scientists to understand the speech signal and its physical production better than ever before, the underlying source of spoken language’s systematic nature remains largely mysterious. New research questions continue to be formulated, however, using results that were problematic under old hypotheses to motivate new theories and new approaches to the study of speech production.

The theory of ‘Articulatory Phonology’ stands as a prominent example; its proponents took the combination of gradient speech error data, speakers’ faithfulness to articulation despite varying means of acoustic transmission, and the lack of invariant acoustic speech cues as converging evidence that speech is composed of articulatory, rather than acoustic, fundamental units that contain explicit and detailed temporal structure.8,38 Under this theory, linguistic invariants are underlyingly motor-based articulatory gestures which specify the degree and location of constrictions in the vocal tract and delineate a certain amount of time relative to other gestures for their execution. Constellations of these gestures, related in time, constitute syllables and words without reference to strictly sequential segmental or phonemic units. Speech perception, then, consists of determining the speech gestures and timing responsible for creating a received acoustic signal, possibly through extension of experiential mapping between the perceiver’s own gestures and their acoustic consequences, as in Liberman and Mattingly’s Motor Theory of Speech Perception7 or Fowler’s Direct Realist approach.39 Recent evidence from neuroscience may provide a biological mechanism for this process40 (see ‘Neurobiological Evidence—Mirror Neurons,’ below).
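One way to make the notion of a gestural constellation concrete is as a small data structure in which each gesture names a constricting organ, a constriction target, and a time interval relative to the other gestures. The sketch below is only an informal illustration of that idea; the gestures and timing values for the word pan are invented, not drawn from Articulatory Phonology sources.

```python
# Illustrative sketch of a gestural "constellation" for the word "pan":
# each gesture names an organ, a constriction target, and a time interval.
# Values are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Gesture:
    organ: str            # e.g., lips, tongue tip, velum, glottis
    constriction: str     # target location/degree of constriction
    start: float          # onset, in arbitrary normalized time units
    end: float            # offset

pan = [
    Gesture("lips",        "closure",            0.00, 0.20),  # [p] closure
    Gesture("glottis",     "wide (voiceless)",   0.00, 0.25),  # devoicing for [p]
    Gesture("tongue body", "pharyngeal, wide",   0.10, 0.60),  # low vowel
    Gesture("tongue tip",  "alveolar closure",   0.50, 0.80),  # [n] closure
    Gesture("velum",       "open (nasal)",       0.45, 0.80),  # nasality overlaps the vowel
]

# Overlap in time, not strict sequence, carries the co-articulation described above.
overlapping = [(a.organ, b.organ) for i, a in enumerate(pan)
               for b in pan[i + 1:] if a.start < b.end and b.start < a.end]
print(overlapping)
```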

And although researchers like Stevens continued to focus on speech acoustics as opposed to articulation, the separate lines of inquiry actually appear to be converging on the same fundamental solution to the invariance problem. The most recent instantiation of Stevens’ theory posits that some distinctive phonological features are represented by sets of redundant invariant acoustic cues, only a subset of which are necessary for recognition in any single token. As Stevens recently wrote, however, the distinction between this most recent feature-based account and theories of speech based on gestures may no longer be clear:

The acoustic cues that are used to identify the underlying distinctive features are cues that provide evidence for the gestures that produced the acoustic pattern. This view that a listener focuses on acoustic cues that provide evidence for articulatory gestures suggests a close link between the perceptually relevant aspects of the acoustic pattern for a distinctive feature in speech and the articulatory gestures that give rise to this pattern.

(Ref 10, p. 142)

Just as in speech perception research, however, some speech production scientists are beginning to wonder if the invariance question was even the right question to be asked in the first place. In the spirit of Lindblom’s hyper-articulation and hypo-articulation theory41 (see ‘Perception-Driven Adaptation in Speech Production,’ below), these researchers have begun investigating control and variability in production as a means of pinning down the nature of the underlying system. Speakers are asked to produce the same sentence in various contextual scenarios such that a target elicited word occurs as the main element of focus, as a carrier of stress, as a largely unstressed element, and as though a particular component of the word was misheard (e.g., in an exchange such as ‘Boy?’ ‘No, toy’), while their articulation and speech acoustics are recorded. Then the data are examined for regularities. If the lengths of onset consonants and following vowels remain constant across emphasized, focused, stressed, and unstressed conditions, for example, that relationship may be specified in the representation of syllables, whereas the absolute closure and vocalic durations vary freely and therefore must not be subject to linguistic constraint. Research of this type seeks to determine which articulatory variables are under active, regular control and which (if any) are mere derivatives or side effects of deliberate actions.42–44
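The kind of regularity being sought can be expressed as a simple check over measured durations: if, say, the consonant-to-vowel duration ratio stays constant while absolute durations vary across prosodic conditions, the ratio becomes a candidate for active linguistic control. The measurements in the sketch below are invented for illustration.

```python
# Sketch: checking whether a consonant/vowel duration relationship holds across
# prosodic conditions while absolute durations vary. Measurements are invented.
durations_ms = {                      # condition: (onset consonant, following vowel)
    "focused":    (95, 190),
    "stressed":   (80, 160),
    "unstressed": (55, 110),
    "corrective": (105, 210),
}

for condition, (c, v) in durations_ms.items():
    print(f"{condition:11s}  C={c:3d} ms  V={v:3d} ms  C/V ratio={c / v:.2f}")
# A stable ratio despite large absolute changes would suggest that the relative
# timing, not the raw durations, is what the linguistic system actively controls.
```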

Conclusion—Speech Production

Despite targeting a directly observable, overt linguistic behavior, speech production research has had no more success than its complement in speech perception at discovering decisive answers to the foundational questions of linguistic representational structure or the processes governing spoken language use. Due to the joint endeavors of acoustic and articulatory phonetics, our understanding of the nature of the acoustic speech signal and how it is produced has increased tremendously, and each new discovery points to new questions. If the basic units of speech are gesture-based, what methods and strategies do listeners use in order to perceive them from acoustics? Are there testable differences between acoustic and articulatory theories of representation? What aspects of speech production are under demonstrable active control, and how do the many components of the linguistic and biological systems work together across speakers and social and linguistic contexts? Although new lines of inquiry are promising, speech production research seems to have only begun to scratch the surface of the complexities of speech communication.

SPEECH PERCEPTION AND PRODUCTION LINKS
As Roger Moore recently pointed out, the nature of the standard scientific method is such that ‘it leads inevitably to greater and greater knowledge about smaller and smaller aspects of a problem’ (Ref 45, p. 419). Speech scientists followed good scientific practice when they effectively split the speech communication problem, one of the most complex behaviors of a highly complex species, into more manageable chunks. And the perceptual and productive aspects of speech each provided enough of a challenge, as we have seen, that researchers had plenty to work on without adding anything. Yet we have also seen that neither discipline on its own has been able to answer fundamental questions regarding linguistic knowledge, representation, and processing.

Although the scientific method serves to separate aspects of a phenomenon, the ultimate goal of any scientific enterprise is to unify individual discoveries, uncovering connections and regularities that were previously hidden.46 One of the great scientific breakthroughs of the 19th century, for example, brought together the physics of electricity and magnetism, previously separate fields, and revealed them to be variations of the same basic underlying principles. Similarly, where research isolated to either speech perception or production has failed to find success, progress may lie in the unification of the disciplines. And unlike electricity and magnetism, the a priori connection between speech perception and speech production is clear: they are two sides of the same process, two links in Denes and Pinson’s famous ‘speech chain’.47 Moreover, information theory demands that the signals generated in speech production match those received in perception, a criterion known as ‘signal parity’ which must be met for successful communication to take place; therefore, the two processes must at some point even deal in the same linguistic currency.48

In this final section, we will discuss theories and experimental evidence that highlight the deep, inherent links between speech perception and production. Perhaps by bringing together the insights achieved within each separate line of inquiry, the recent evidence pointing to the nature of the connection between them, and several theories of how they may work together in speech communication, we can point to where the most exciting new research questions lie in the future.

Early Evidence—Audiovisual Speech Perception

Lurking behind the idea that speech perception and production may be viewed as parts of a unified speech communication process is the assumption that speech consists of more than just acoustic patterns and motor plans that happen to coincide. Rather, the currency of speech must somehow combine the domains of perception and production, satisfying the criterion of signal parity discussed above. Researchers such as Kluender, Diehl, and colleagues take the former, more ‘separatist’ stance, believing that speech is processed by listeners like any other acoustic signal, without input from or reference to complementary production systems.49,50 Much of the research described in this section runs counter to such a ‘general auditory’ view, but none so directly challenges its fundamental assumptions as the phenomenon of audiovisual speech perception.

The typical view of speech, fully embraced thus far here, puts audition and acoustics at the fore. However, visual and other sensory cues also play important roles in the perception of a speech signal, augmenting or occasionally even overriding a listener’s auditory input. Very early in speech perception research, Sumby and Pollack showed that simply seeing a speaker’s face during communication in background noise can provide listeners with massive gains in speech intelligibility, with no change in the acoustic signal.51 Similarly, it has been well documented that access to a speaker’s facial dynamics improves the accuracy and ease of speech perception for listeners with mild to severe hearing loss52 and even deaf listeners with cochlear implants.53,54 Perhaps no phenomenon demonstrates this multimodal integration as clearly or has attracted more attention in the field than the effect reported by McGurk and MacDonald in 1976.55 When listeners receive simultaneous, mismatching visual and auditory speech input, such as a face articulating the syllable ba paired with the acoustics for ga, they typically experience a unified percept da that appears to combine features of both signals while matching neither. In cases of a closer match—between visual va and auditory ba, for example—listeners tend to perceive va, adhering to the visual rather than auditory signal. The effect is robust even when listeners are aware of the mismatch, and has been observed with conflicting tactile rather than visual input56 and with pre-lingual infants.57 As these last cases show, the effect cannot be due to extensive experience linking visual and auditory speech information. Instead, the McGurk effect and the intelligibility benefits of audiovisual speech perception provide strong evidence for the inherently multimodal nature of speech processing, contrary to a ‘general auditory’ view. As a whole, the audiovisual speech perception evidence supports the assumptions which make possible the discussion of evidence for links between speech perception and production below.

Phonetic Convergence

Recent work by Pardo builds on the literature of linguistic ‘alignment’ to find further evidence of an active link between speech perception and production in ‘real-time,’ typical communicative tasks. She had pairs of speakers play a communication game called the ‘map task,’ where they must cooperate to copy a path marked on one speaker’s map to the other’s blank map without seeing one another. The speakers refer repeatedly to certain landmarks on the map, and Pardo examined their productions of these target words over time. She asked naive listeners to compare a word from one speaker at both the beginning and end of the game with a single recording of the same word said by the other speaker. Consistently across pairs, she found that the recordings from the end of the task were judged to be more similar than those from the beginning. Previous studies have shown that speakers may align in their patterns of intonation,58 for example, but Pardo’s are the first results demonstrating such alignment at the phonetic level in an ecologically valid speech setting.
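To make the logic of this kind of AXB similarity test concrete, the sketch below tallies hypothetical naive-listener judgments into a simple convergence score. The data structure, field names, and example values are our own illustration, not Pardo’s materials or analysis.

```python
from collections import Counter

def convergence_score(axb_trials):
    """Tally naive-listener AXB judgments into a convergence score.

    Each trial asks which of a talker's tokens, the LATE one (end of the
    map task) or the EARLY one (beginning), sounds more like the partner's
    reference token.  A score above 0.5 means late tokens were judged
    closer, i.e. the pair converged.  Purely illustrative.
    """
    votes = Counter(trial["judged_more_similar"] for trial in axb_trials)
    total = votes["late"] + votes["early"]
    return votes["late"] / total if total else float("nan")

# Hypothetical judgments for one talker pair and one landmark word.
trials = [
    {"word": "waterfall", "judged_more_similar": "late"},
    {"word": "waterfall", "judged_more_similar": "late"},
    {"word": "waterfall", "judged_more_similar": "early"},
    {"word": "waterfall", "judged_more_similar": "late"},
]
print(f"Proportion of 'late' choices: {convergence_score(trials):.2f}")  # 0.75
```

On this toy scoring, values reliably above 0.5 across pairs correspond to the pattern Pardo reports: end-of-task recordings judged more similar to the partner than beginning-of-task recordings.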

This ‘phonetic convergence’ phenomenon defies explanation unless the processes of speech perception and subsequent production are somehow linked within an individual. Otherwise, what a speaker hears his or her partner say could not affect subsequent productions. Further implications of the convergence phenomenon become apparent in light of the categorical perception literature described in ‘Categorical Perception Effects’ above. In these robust speech perception experiments, listeners appear to be unable to reliably detect differences in acoustic realization of particular segments.9 Yet the convergence observed in Pardo’s work seems to operate at the sub-phonemic level, affecting subtle changes within linguistic categories (i.e., convergence results do not depend on whole-segment substitutions, but on much more fine-grained effects).

As Pardo’s results show, the examination of links between speech perception and production has already pointed toward new answers to some old questions. Perhaps we do not understand categorical perception effects as well as we thought—if the speech listeners hear can have these gradient within-category effects on their own speech production, then why is it that they cannot access these details in the discrimination tasks of classic categorical perception experiments? And what would the answer imply for various representational theories of speech?

Perception-Driven Adaptation in Speech Production

Despite the typical separation between speech perception and production, the idea that the two processes interact or are coupled within individual speakers is not new. In 1990, Bjorn Lindblom introduced his ‘hyper-articulation and hypo-articulation’ (H&H) theory, which postulated that speakers’ production of speech is subject to two conflicting forces: economy of effort and communicative contrast.41 The first pressures speech to be ‘hypo-articulated,’ with maximally reduced articulatory movements and maximal overlap between movements. In keeping with the theory’s roots in speech production research, this force stems from a speaker’s motor system. The contrasting pressure for communicative distinctiveness pushes speakers toward ‘hyper-articulated’ speech, executed so as to be maximally clear and intelligible, with minimal co-articulatory overlap. Crucially, this force stems from listener-oriented motivation. Circumstances that make listeners less likely to correctly perceive a speaker’s intended message—ranging from physical factors like presence of background noise, to psychological factors such as the lexical neighborhood density of a target word, to social factors such as a lack of shared background between the speaker and listener—cause speakers to hyper-articulate, expending greater articulatory effort to ensure transmission of their linguistic message.
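As a purely illustrative gloss on this push and pull, the toy computation below treats degree of hyper-articulation as a number between 0 and 1 and lets a speaker choose whatever value balances articulatory effort against the risk of being misunderstood. The cost terms, weights, and numbers are assumptions of our own, not Lindblom’s formulation.

```python
import numpy as np

def total_cost(h, listener_demand, effort_weight=1.0):
    """Toy H&H-style trade-off (our illustration, not Lindblom's math).

    h                degree of hyper-articulation, 0 (hypo) to 1 (hyper)
    listener_demand  how badly the listener needs a clear signal; noise,
                     a dense lexical neighborhood, or little shared
                     background would all push this value up
    Effort grows with h; the expected cost of being misheard shrinks.
    """
    effort = effort_weight * h ** 2
    miscommunication = listener_demand * (1.0 - h) ** 2
    return effort + miscommunication

h_grid = np.linspace(0.0, 1.0, 101)
for demand in (0.2, 1.0, 5.0):  # quiet room ... noisy street
    best_h = h_grid[np.argmin(total_cost(h_grid, demand))]
    print(f"listener demand {demand}: speaker settles on h = {best_h:.2f}")
```

The minimum shifts toward hyper-articulation as the listener-oriented pressure grows, which is the qualitative behavior H&H theory attributes to speakers.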

For nearly a hundred years, speech scientists have known that physical conditions such as background noise affect speakers’ production. As Lane and Tranel neatly summarized, a series of experiments stemming from the work of Etienne Lombard in 1911 unequivocally showed that the presence of background noise causes speakers not only to raise the level of their speech relative to the amplitude of the noise, but also to alter their articulation style in ways similar to those predicted by H&H theory.59

No matter the eventual status of H&H theory in all its facets, this ‘Lombard Speech’ effect empirically demonstrates a real and immediate link between what speakers are hearing and the speech they produce. As even this very early work demonstrates, speech production does not operate in a vacuum, free from the influences of its perceptual counterpart; the two processes are coupled and closely linked.

Much more recent experimental work has demonstrated that speakers’ perception of their own speech can be subject to direct manipulation, as opposed to the more passive introduction of noise used in inducing Lombard speech, and that the resulting changes in production are immediate and extremely powerful. In one experiment conducted by Houde and Jordan, for example, speakers repeatedly produced a target vowel [ε], as in bed, while hearing their speech only through headphones. The researchers ran the speech through a signal processing program which calculated the formant frequencies of the vowel and shifted them incrementally toward the frequencies characteristic of [æ], raising the first formant and lowering the second. Speakers were completely unaware of the real-time alteration of the acoustics corresponding to their speech production, but they incrementally shifted their articulation of [ε] to compensate for the researchers’ manipulation: they began producing lower first formants and higher second formants. This compensation was so dramatic that speakers who began by producing [ε] ended the experiment by saying vowels much closer to [i] (when heard outside the formant-shifting influence of the manipulation).60
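The sketch below caricatures the logic of such a perturbation study: feedback formants drift in small steps toward [æ] while a toy speaker adjusts its production so that the vowel it hears stays near its [ε] target. The frequencies, step sizes, and update rule are invented for illustration and do not reproduce Houde and Jordan’s signal processing.

```python
# Toy sensorimotor-adaptation loop: feedback is shifted toward [ae]
# (higher F1, lower F2), and the simulated speaker compensates in the
# opposite direction.  All numbers are illustrative assumptions.

TARGET_F1, TARGET_F2 = 550.0, 1800.0   # rough [eps]-like targets, in Hz
STEP_F1, STEP_F2 = 2.5, -4.0           # per-trial feedback shift toward [ae]
LEARNING_RATE = 0.5                    # fraction of the heard error corrected

produced_f1, produced_f2 = TARGET_F1, TARGET_F2
for trial in range(1, 61):
    shift_f1, shift_f2 = STEP_F1 * trial, STEP_F2 * trial
    heard_f1, heard_f2 = produced_f1 + shift_f1, produced_f2 + shift_f2
    # The speaker nudges production so the *heard* vowel moves back
    # toward the stored target.
    produced_f1 -= LEARNING_RATE * (heard_f1 - TARGET_F1)
    produced_f2 -= LEARNING_RATE * (heard_f2 - TARGET_F2)

print(f"Produced after 60 trials: F1 = {produced_f1:.0f} Hz, F2 = {produced_f2:.0f} Hz")
# F1 ends well below and F2 well above the starting values: the
# compensation pattern (toward [i]-like formants) described in the text.
```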

Houde, Jordan, and other researchers working in this paradigm point out that such ‘sensorimotor adaptation’ phenomena demonstrate an extremely powerful and constantly active feedback system in operation during speech production.61,62 Apparently, a speaker’s perception of his or her own speech plays a significant role in the planning and execution of future speech production.

The Role of Feedback—Modeling Spoken Language Use

In his influential theory of speech production planning and execution, Levelt makes explicit use of such perceptual feedback systems in production.63 In contrast to Lindblom’s H&H theory, Levelt’s model (WEAVER++) was designed primarily to provide an account of how lexical items are selected from memory and translated into articulation, along with how failures in the system might result in typical speech errors. In Levelt’s model, speakers’ perception of their own speech allows them to monitor for errors and execute repairs. The model goes a step further, however, to posit another feedback loop entirely internal to the speaker, based on their experience with mappings between articulation and acoustics.

According to Levelt’s model, then, for any given utterance a speaker has several levels of verification and feedback. If, for example, a speaker decides to say the word day, the underlying representation of the lexical item is selected and prepared for articulation, presumably following the various steps of the model not directly relevant here. Once the articulation has been planned, the same ‘orders’ are sent to both the real speech organs and a mental emulator or ‘synthesizer’ of the speaker’s vocal tract. This emulator generates the acoustics that would be expected from the articulatory instructions it received, based on the speaker’s past experience with the mapping. The expected acoustics feed back to the underlying representation of day to check for a match with remembered instances of the word. Simultaneous to this process, the articulators are actually executing their movements and generating acoustics. That signal enters the speaker’s auditory pathway, where the resulting speech percept feeds back to the same underlying representation, once again checking for a match.
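A minimal sketch of these two routes, assuming toy stand-ins for the stored word form, the internal emulator, and the auditory pathway (none of which correspond to the actual machinery of WEAVER++), might look as follows.

```python
# Two feedback routes in the spirit of Levelt's account: an internal loop
# that checks *predicted* acoustics against the stored word form before
# any sound is produced, and an external loop that checks what the
# speaker actually hears.  Every function and value here is a stand-in.

STORED_FORMS = {"day": "dey"}            # remembered sound pattern (toy notation)

def plan_articulation(word):
    return f"<gestures for '{word}'>"

def internal_emulator(plan):
    # Predicts acoustics from the plan via learned articulation-to-acoustics
    # mappings; a trivial lookup stands in for that knowledge here.
    return "dey" if "day" in plan else "???"

def speak_and_listen(plan, noise=False):
    # External route: actually articulate, then perceive the result.
    produced = "dey" if "day" in plan else "???"
    return "de?" if noise else produced   # noise can corrupt what is heard

def produce(word, noise=False):
    plan = plan_articulation(word)
    target = STORED_FORMS[word]
    internal_ok = internal_emulator(plan) == target        # fast, pre-output check
    external_ok = speak_and_listen(plan, noise) == target  # slower, post-output check
    return internal_ok, external_ok

print(produce("day"))               # (True, True): both loops report a match
print(produce("day", noise=True))   # (True, False): only the external loop notices
```

The contrast in the final two lines anticipates the point developed below: only the external loop registers a disturbance imposed from outside the speaker, such as noise.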

Such a system may seem redundant, but each component has important properties. As Moore points out for his own model (see below), internal feedback loops of the type described in Levelt’s work allow speakers to repair errors much more quickly than reliance on external feedback would permit, which translates to significant evolutionary advantages.45 Without external loops backing up the internal systems, however, speakers might miss changes to their speech imposed by physical conditions (e.g., noise). Certainly the adaptation observed in Houde and Jordan’s work demonstrates active external feedback control over speech production: only an external loop could catch the disparity between the acoustics a speaker actually perceives and his or her underlying representation. And indeed, similar feedback-reliant models have been proposed as the underpinnings of non-speech movements such as reaching.64

As suggested above, Moore has recently proposed a model of speech communication that also incorporates multiple feedback loops, both internal and external.45 His Predictive Sensorimotor Control and Emulation (PRESENCE) model goes far beyond the specifications of Levelt’s production model, however, to incorporate additional feedback loops that allow the speaker to emulate the listener’s emulation of the speaker, and active roles for traditionally ‘extra-linguistic’ systems such as the speaker’s affective or emotional state. In designing his model, Moore attempts to take the first step in what he argues is the necessary unification of not just research on speech perception and production, but the work related to speech in many other fields as well, such as neuroscience, automated speech recognition, text-to-speech synthesis, and biology, to name just a few.45

Perhaps most fundamental to our discussion here, however, is the role of productive emulation or feedback during speech perception in the model. Where Levelt’s model deals primarily with speech production, Moore’s PRESENCE incorporates both speech perception and production, deliberately emphasizing their interdependence and mutually constraining relationship. According to his model, speech perception takes place with the aid of listener-internal emulation of the acoustic-to-articulatory mapping potentially responsible for the received signal. As Moore puts it, speech perception in his model is essentially a revisiting of the idea of ‘recognition-by-synthesis’ (e.g., Ref 65), whereas speech production is (as in Levelt) ‘synthesis by recognition.’
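A bare-bones rendering of the recognition-by-synthesis idea, using an invented candidate set and a two-formant stand-in for the acoustic signal (neither taken from Moore’s PRESENCE), is sketched below: each articulatory hypothesis is run forward to its expected acoustic consequences, and the hypothesis whose consequences best match the input is selected.

```python
import numpy as np

# "Recognition by synthesis" in miniature: emulate the acoustics each
# candidate articulation would produce and pick the best match to the
# incoming signal.  Candidates and formant values are illustrative.

CANDIDATE_VOWELS = {        # articulatory hypothesis -> expected (F1, F2) in Hz
    "i":   (280, 2250),
    "eps": (550, 1800),
    "ae":  (700, 1650),
    "a":   (750, 1100),
}

def recognize_by_synthesis(observed):
    """Return the candidate whose emulated acoustics best match the input."""
    observed = np.asarray(observed, dtype=float)
    def mismatch(item):
        _, expected = item
        return float(np.linalg.norm(observed - np.asarray(expected, dtype=float)))
    best_label, _ = min(CANDIDATE_VOWELS.items(), key=mismatch)
    return best_label

print(recognize_by_synthesis((560, 1750)))   # -> 'eps'
print(recognize_by_synthesis((300, 2200)))   # -> 'i'
```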

Neurobiological Evidence—Mirror Neurons

The experimental evidence we considered above suggests pervasive links between what listeners hear and the speech they produce. Conversational partners converge in their production of within-category phonetic detail, speakers alter their speech styles in adverse listening conditions, and manipulation of speakers’ acoustic feedback from their own speech can dramatically change the speech they produce in response. As we also considered, various theoretical models of spoken language use have been proposed to account for these phenomena and the observed perceptual and productive links. Until recently, however, very little neurobiological evidence supported these proposals. The idea of a speaker-internal vocal tract emulator, for instance, seemed highly implausible to many speech scientists; how would the brain possibly implement such a structure?

Cortical populations of newly discovered ‘mirror neurons,’ however, seem to provide a plausible neural substrate for proposals of direct, automatic, and pervasive links between speech perception and production. These neurons ‘mirror’ in the sense that they fire both when a person performs an action themselves and when they perceive someone else performing the same action, either visually or through some other (auditory, tactile) perceptual mode. Human mirror neuron populations appear to be clustered in several cortical areas, including the pre-frontal cortex, which is often implicated in behavioral inhibition and other executive functions, and areas typically recognized as centers of speech processing, such as Broca’s area (for an in-depth review of the literature and implications, see Ref 66).

Neurons which physically equate (or at least directly link) an actor’s production and perception of a specific action have definite implications for theories linking speech perception and production: they provide a potential biological mechanism. The internal feedback emulators hypothesized most recently by Levelt and Moore could potentially be realized in mirror neuron populations, which would emulate articulatory-to-acoustic mappings (and vice versa) via their mutual sensitivity to both processes and their connectivity to both sensory and motor areas. Regardless of their specific applicability to Levelt and Moore’s models, however, these neurons do appear to be active during speech perception, as one study using Transcranial Magnetic Stimulation (TMS) demonstrates elegantly. TMS allows researchers to temporarily either attenuate or elevate the background activation of a specific brain area, respectively inducing a state similar to the brain damage caused by a stroke or lesion, or making it so that any slight increase in the activity of the area produces overt behavior whose consequences would not normally be observable. The latter excitation technique was used by Fadiga and colleagues, who raised the background activity of specific motor areas controlling the tongue tip. When the ‘excited’ subjects then listened to speech containing consonants that curl the tongue upward, their tongues twitched correspondingly.67 Perceiving the speech caused activation of the motor plans that would be used in producing the same speech—direct evidence of the link between speech perception and production.

Perception/Production Links—Conclusion

Clearly, the links between speech perception and production are inherent in our use of spoken language. They are active during typical speech perception (TMS mirror neuron study), are extremely powerful, automatic, and rapid (sensorimotor adaptation), and influence even highly ecologically valid communication tasks (phonetic convergence). Spoken language processing, therefore, seems to represent a linking of sensory and motor control systems, as the pervasive effects of visual input on speech perception suggest. Indeed, speech perception cannot be just sensory interpretation and speech production cannot be just motor execution. Rather, both processes draw on common resources, using them in tandem to accomplish remarkable tasks such as generalization from talker to talker and acquiring new lexical items. As new information regarding these links comes to light, models such as Lindblom’s H&H, Levelt’s WEAVER++, and Moore’s PRESENCE will both come to reflect more fully the actual capabilities of language users (simultaneous speakers and listeners) and be subject to greater constraint in their hypotheses and mechanisms. And hopefully, theory and experimental evidence will converge to discover how speech perception and production interact in the highly complex act of vocal communication.

CONCLUSIONS

Despite the strong intuitions and theoretical traditions of linguists, psychologists, and speech scientists, spoken language does not appear to straightforwardly consist of linear sequences of discrete, idealized, abstract, context-free symbols such as phonemes or segments. This discovery raises further questions, however: how does speech convey equivalent information across talkers, dialects, and contexts? And how do language users mentally represent both the variability and constancy in the speech they hear?

New directions in research on speech perception include theories of exemplar-based representation of speech and experiments designed to discover the specificity, generalized application, and flexibility of listeners’ perceptual representations. Efforts to focus on more ecologically valid tasks such as spoken word recognition also promise fruitful progress in coming years, particularly those which provide tests of theoretical and computational models. In speech production, meanwhile, the apparent convergence of acoustic and articulatory theories of representation points to the emerging potential for exciting new lines of research combining their individual successes. At the same time, more and more speech scientists are turning their research efforts toward variability in speech, and what patterns of variation can reveal about speakers’ language-motivated control and linguistic knowledge.

Perhaps the greatest potential for progress and discovery, however, lies in continuing to explore the behavioral and neurobiological links between speech perception and production. Although made separate by practical and conventional scientific considerations, these two processes are inherently and intimately coupled, and it seems that we will never truly be able to understand the human capacity for spoken communication until they have been conceptually reunited.

REFERENCES

1. Hockett CF. A Manual of Phonology. Baltimore: Waverly Press; 1955.

2. Chomsky N, Miller GA. Introduction to the formal analysis of natural languages. In: Luce RD, Bush RR, Galanter E, eds. Handbook of Mathematical Psychology. New York: John Wiley & Sons; 1963, 269–321. DOI:10.1016/S0010-0277(98)00034-1.

3. Liberman AM, Cooper FS, Shankweiler DP, Studdert-Kennedy M. Perception of the speech code. Psychol Rev 1967, 74:431–461. DOI:10.1037/h0020279.

4. Potter RK. Visible patterns of sound. Science 1945, 9:463–470.

5. Cooper FS. Research on reading machines for the blind. In: Zahl PA, ed. Blindness: Modern Approaches to the Unseen Environment. Princeton: Princeton University Press; 1950, 512–543.

6. Cooper FS, Borst JM, Liberman AM. Analysis and synthesis of speech-like sounds. J Acoust Soc Am 1949, 21:461–461.

7. Liberman AM, Mattingly IG. The motor theory of speech perception revised. Cognition 1985, 21:1–36.

8. Goldstein L, Fowler CA. Articulatory phonology: a phonology for public language use. In: Schiller NO, Meyer AS, eds. Phonetics and Phonology in Language Comprehension and Production. Berlin: Mouton de Gruyter; 2003, 159–207.

9. Liberman AM, Harris KS, Hoffman HS, Griffith BC. The discrimination of speech sounds within and across phoneme boundaries. J Exp Psychol 1957, 54:358–368.

10. Stevens KN. Features in speech perception and lexical access. In: Pisoni DB, Remez RE, eds. Handbook of Speech Perception. Malden: Blackwell Science; 2005, 125–155.

11. Niedzielski N. The effect of social information on the perception of sociolinguistic variables. J Lang Soc Psychol 1999, 18:62–85.

12. Johnson KA. Speaker normalization in speech perception. In: Pisoni DB, Remez RE, eds. Handbook of Speech Perception. Malden: Blackwell Science; 2005.


13. Ganong WF. Phonetic categorization in auditory word perception. J Exp Psychol 1980, 6:110–125. DOI:10.1037/0096-1523.6.1.110.

14. Howes D. On the relation between the intelligibility and frequency of occurrence of English words. J Acoust Soc Am 1957, 29:296–305.

15. Oldfield RC. Things, words and the brain. Q J Exp Psychol 1966, 18:340–353.

16. Goldiamond I, Hawkins WF. Vexierversuch: the log relationship between word-frequency and recognition obtained in the absence of stimulus words. J Exp Psychol 1958, 56:457–463.

17. Luce PA, Pisoni DB. Recognizing spoken words: the neighborhood activation model. Ear Hearing 1998, 19:1–36.

18. Luce PA, Goldinger SD, Auer ET, Vitevitch MS. Phonetic priming, neighborhood activation, and PARSYN. Percept Psychophys 2000, 62:615–625.

19. Goldinger SD. Echoes of echoes? An episodic theory of lexical access. Psychol Rev 1998, 105:251–279.

20. Nygaard LC, Sommers MS, Pisoni DB. Speech perception as a talker-contingent process. Psychol Sci 1994, 5:42–46.

21. Port R. How are words stored in memory? Beyond phones and phonemes. New Ideas Psychol 2007, 25:143–170.

22. Abler WL. On the particulate principle of self-diversifying systems. J Soc Biol Struct 1989, 12:1–13.

23. Humboldt Wv. Linguistic Variability and Intellectual Development. Baltimore: University of Miami Press; 1836.

24. Fromkin VA. Slips of the tongue. Sci Am 1973, 229:110–117.

25. Mowrey RA, MacKay IRA. Phonological primitives: electromyographic speech error evidence. J Acoust Soc Am 1990, 88:1299–1312.

26. Stevens KN. Models of phonetic recognition II: a feature-based model of speech recognition. Montreal Satellite Symposium on Speech Recognition (Twelfth International Congress of Acoustics); 1986, 67–68.

27. Jakobson R, Fant G, Halle M. Preliminaries to Speech Analysis: The Distinctive Features. Cambridge: MIT; 1952.

28. Chomsky N, Halle M. The Sound Pattern of English. New York: Harper and Row; 1968.

29. Fowler CA. Listeners do hear sounds, not tongues. J Acoust Soc Am 1996, 99:1730–1741. DOI:10.1121/1.415237.

30. Borden GJ, Harris KS, Raphael LJ. Speech Science Primer: Physiology, Acoustics and Perception of Speech. 4th ed. Baltimore: Lippincott Williams & Wilkins; 2003.

31. Lisker L, Abramson AS, Cooper FS, Schvey MH. Transillumination of the larynx in running speech. J Acoust Soc Am 1969, 45:1544–1546. DOI:10.1121/1.1911636.

32. Perkell JS. Physiology of Speech Production: Results and Implications of a Quantitative Cineradiographic Study. Cambridge: MIT Press; 1969.

33. Nadler R, Abbs JH, Fujimura O. Speech movement research using the new x-ray microbeam system. 11th International Congress of Phonetic Sciences; Tallinn; 1987, 221–224.

34. Perkell J, Cohen M, Svirsky M, Matthies M, Garabieta I, et al. Electromagnetic mid-sagittal articulometer (EMMA) systems for transducing speech articulatory movements. J Acoust Soc Am 1992, 92:3078–3096.

35. Stone M, Lundberg A. Three-dimensional tongue surface shapes of English consonants and vowels. J Acoust Soc Am 1996, 99:3728–3737.

36. Stone M. A guide to analysing tongue motion from ultrasound images. Clin Linguist Phon 2005, 19:455–501.

37. Goldstein L, Pouplier M, Chen L, Saltzman E, Byrd D. Dynamic action units in speech production errors. Cognition 2007, 103:386–412. DOI:10.1016/j.cognition.2006.05.010.

38. Browman C, Goldstein L. Tiers in articulatory phonology, with some implications for casual speech. In: Kingston J, Beckman M, eds. Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech. Cambridge: Cambridge University Press; 1990, 341–376.

39. Fowler CA. An event approach to the study of speech perception from a direct realist perspective. J Phon 1986, 14:3–28.

40. Rizzolatti G, Fadiga L, Gallese V, Fogassi L. Premotor cortex and the recognition of motor actions. Cogn Brain Res 1996, 3:131–141.

41. Lindblom BEF. Explaining phonetic variation: a sketch of H&H theory. In: Hardcastle WJ, Marchal A, eds. Speech Production and Speech Modeling, NATO ASI Series D: Behavioural and Social Sciences. Dordrecht: Kluwer Academic Press; 1990, 403–439.

42. Kent RD, Netsell R. Effects of stress contrasts on certain articulatory parameters. Phonetica 1971, 24:23–44.

43. de Jong K. The supraglottal articulation of prominence in English: linguistic stress as localized hyperarticulation. J Acoust Soc Am 1995, 97:491–504.

44. de Jong K. Stress, lexical focus, and segmental focus in English: patterns of variation in vowel duration. J Phon 2004, 32:493–516. DOI:10.1016/j.wocn.2004.05.002.

45. Moore RK. Spoken language processing: piecing together the puzzle. Speech Commun 2007, 49:418–435.


46. Wilson EO. Consilience: The Unity of Knowledge. New York: Knopf; 1998.

47. Denes P, Pinson E. The Speech Chain. Garden City: Anchor Press/Doubleday; 1963.

48. Mattingly IG, Liberman AM. Specialized perceiving systems for speech and other biologically significant sounds. In: Edelman GM, Gall E, Cowan WM, eds. Auditory Function: Neurological Bases of Hearing. New York: John Wiley & Sons; 1988, 775–793.

49. Kluender KR. Speech perception as a tractable problem in cognitive science. In: Gernsbacher M, ed. Handbook of Psycholinguistics. San Diego: Academic Press; 1994, 172–217.

50. Diehl RL, Lotto AJ, Holt LL. Speech perception. Annu Rev Psychol 2004, 55:149–179.

51. Sumby WH, Pollack I. Visual contribution to speech intelligibility in noise. J Acoust Soc Am 1954, 26:212–215.

52. Geers A, Brenner C. Speech perception results: audition and lipreading enhancement. Volta Rev 1994, 96:97–108.

53. Lachs L, Pisoni DB, Kirk KI. Use of audiovisual information in speech perception by prelingually deaf children with cochlear implants: a first report. Ear Hearing 2001, 22:236–251.

54. Bergeson TR, Pisoni DB. Audiovisual speech perception in deaf adults and children following cochlear implantation. In: Calvert GA, Spence C, Stein BE, eds. The Handbook of Multisensory Processes. Cambridge: MIT Press; 2004, 749–772.

55. McGurk H, MacDonald J. Hearing lips and seeing voices. Nature 1976, 264:746–748.

56. Fowler CA, Dekle DJ. Listening with eye and hand: crossmodal contributions to speech perception. J Exp Psychol 1996, 17:816–828.

57. Rosenblum LD, Schmuckler MA, Johnson JA. The McGurk effect in infants. Percept Psychophys 1997, 59:347–357.

58. Gregory SW, Webster S. A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. J Pers Soc Psychol 1996, 70:1231–1240.

59. Lane H, Tranel B. The Lombard sign and the role of hearing in speech. J Speech Hearing Res 1971, 4:677–709.

60. Houde JF, Jordan MI. Sensorimotor adaptation in speech production. Science 1998, 279:1213–1216.

61. Houde JF, Jordan MI. Sensorimotor adaptation of speech I: compensation and adaptation. J Speech Hearing Res 2002, 45:295–310.

62. Villacorta VM, Perkell J, Guenther FH. Sensorimotor adaptation to feedback perturbations of vowel acoustics and its relation to perception. J Acoust Soc Am 2007, 122:2306–2319. DOI:10.1121/1.2773966.

63. Levelt WJ, Roelofs A, Meyer AS. A theory of lexical access in speech production. Behav Brain Sci 1999, 22:1–75.

64. Kawato M. Adaptation and learning in control of voluntary movement by the central nervous system. Adv Robot 1989, 3:229–249.

65. Halle M, Stevens KN. Speech recognition: a model and a program for research. IRE Trans Information Theory 1962, 8:155–159. DOI:10.1109/TIT.1962.1057686.

66. Rizzolatti G, Sinigaglia C. Mirrors in the Brain: How Our Minds Share Actions and Emotions. Oxford: Oxford University Press; 2008.

67. Fadiga L, Craighero L, Buccino G, Rizzolatti G. Speech listening specifically modulates the excitability of tongue muscles: a TMS study. Eur J Neurosci 2002, 15:399–402. DOI:10.1046/j.0953-816x.2001.01874.x.

68. Cooper FS, Liberman AM, Borst JM. The interconversion of audible and visible patterns as a basis for research in the perception of speech. Proc Natl Acad Sci 1951, 37:318–325.

69. Liberman AM. Some results of research on speech perception. J Acoust Soc Am 1957, 29:117–123.

70. Cleary M, Pisoni DB. Speech perception and spoken word recognition: research and theory. In: Goldstein EB, ed. Handbook of Perception. Malden: Blackwell Publishers; 2001, 499–534.

FURTHER READING

Speech Perception
Johnson K, Mullennix JW, eds. Talker Variability in Speech Processing. San Diego: Academic Press; 1997.
Pisoni DB, Remez RE, eds. The Handbook of Speech Perception. Malden: Blackwell Science; 2005.

Acoustic & Articulatory Phonetics
Ladefoged P. A Course in Phonetics. 5th ed. Boston: Thomson/Wadsworth; 2005.
Stevens KN. Acoustic Phonetics. Cambridge: MIT Press; 1998.


Perception/Production Links
Bradlow AR, Pisoni DB, Akahane-Yamada R, Tohkura Y. Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. J Acoust Soc Am 1997, 101:2299–2310.
Iacoboni M. Mirroring People: The New Science of How We Connect with Others. New York: Farrar, Straus and Giroux; 2008.
