Theories of Speech Perception


Theories of Speech Perception: Introduction; Theories: Motor Theory, Synthesis Theory, Direct Realism Theory, Stage Theories

Transcript of Theories of Speech Perception

Page 1: Theories of Speech Perception
Page 2: Theories of Speech Perception

Presenters: Syeda Urooj, Asma Agha, Yasmeen Jamil, Rahat Umer

Page 3: Theories of Speech Perception
Page 4: Theories of Speech Perception

SPEECH PERCEPTION

Articulatory phonetics: production based; place and manner of articulation.

Acoustic phonetics: based on the acoustic signal; formants, transitions, co-articulation, etc.

Page 5: Theories of Speech Perception

Speech Production to Perception

Acoustic cues are extracted and stored in sensory memory and then mapped onto linguistic information.

Air is pushed from the lungs through the larynx, across the vocal cords, and into the mouth and nose, producing different types of sounds.

The different qualities of the sounds are represented in formants.

The formants and other features are mapped onto phonemes.
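The last step above, mapping formants onto phonemes, can be sketched as a nearest-prototype lookup. This is only an illustration and not a model taken from the slides: the function name `classify_vowel`, the table `VOWEL_PROTOTYPES`, and the rounded F1/F2 values in it are all assumptions chosen for the example.

```python
import math

# Assumed, rounded (F1, F2) prototypes in Hz for three vowels.
# Illustrative values only; real formants vary across and within speakers.
VOWEL_PROTOTYPES = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def classify_vowel(f1, f2):
    """Return the label of the prototype nearest to the point (f1, f2)."""
    return min(
        VOWEL_PROTOTYPES,
        key=lambda label: math.dist((f1, f2), VOWEL_PROTOTYPES[label]),
    )


if __name__ == "__main__":
    # A token with low F1 and high F2 lands nearest the "i" prototype.
    print(classify_vowel(280.0, 2200.0))
```

A fixed table like this ignores speaker variability and co-articulation, which is one reason a simple lookup does not amount to a theory of speech perception.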

Page 6: Theories of Speech Perception

Theoretical approaches

Page 7: Theories of Speech Perception

Theories of Speech Perception

Theories of speech perception must be able to account for certain facts about the acoustic speech signal, e.g.:

There is inter-speaker and intra-speaker variability among signals that convey information about equivalent phonetic events.

The acoustic speech signal is continuous even though it is perceived as and represents a series of discrete units.

Speech signals contain cues that are transmitted very quickly (20 to 25 sounds per second) and simultaneously.

Page 8: Theories of Speech Perception

Scope of the problem

Speech perception involves the mapping of speech acoustic signals onto linguistic messages (e.g., phonemes, distinctive features, syllables, words, phrases…)

Page 9: Theories of Speech Perception

Types of Theories

Theories of speech perception fall into one of three broad classes:

Motor Theories: Perception involves processes related to the production of speech. Examples include Motor Theory and Analysis-by-Synthesis.

Direct Perception: Perception recovers the sound-producing objects directly. Examples include Fowler’s Direct Realist Approach.

Stage Theories: Perception involves a sequence of transforms from sound to object. Examples include TRACE and LAFS.

Page 10: Theories of Speech Perception
Page 11: Theories of Speech Perception

Motor Theory of Speech Perception (Liberman et al., 1967)

"...overlapping activity of several neural networks - those that supply control signals to the articulators, and those that process incoming neural patterns from the ear..." and "... that information can be correlated by these networks and passed through them in either direction." (Liberman et al., 1967)

"…the candidate signal descriptions are computed by an analogue of the production process - an internal, innately specified vocal-tract synthesizer… - that [incorporates] complete information about the anatomical and physiological characteristics of the vocal tract and also about the articulatory and acoustic consequences of linguistically significant gestures". (Liberman & Mattingly, 1985, revised theory)

Page 12: Theories of Speech Perception

Motor Theory

Motor Theory has, as its core, the premise that perception involves a reference to articulation. This view is often associated with the idea that speech is somehow “special” and involves specialized, species-specific mechanisms in perception.

Page 13: Theories of Speech Perception

Motor Theory

This model was developed in 1967 by Liberman and colleagues.

The basic principle of this model lies with the production of speech sounds in the speaker's vocal tract.

The Motor Theory proposes that a listener specifically perceives a speaker's phonetic gestures while they are speaking.

Speech is perceived in humans by means of a specialized speech module.

Page 14: Theories of Speech Perception

Motor Theory (… contd)

A phonetic gesture is a representation of the speaker's vocal tract constriction while producing a speech sound.

Each phonetic gesture is produced uniquely in the vocal tract.

The different places of producing gestures permit the speaker to produce salient phonemes for listeners to perceive.

The Motor Theory model functions by using separate embedded models within the main model. It is the interaction of these models that makes Motor Theory possible.

Page 15: Theories of Speech Perception

Motor Theory (… contd)

According to the motor theory of speech perception:

We have a special system for processing speech.

Perception and production are closely linked.

Motor commands in the brain that control movements of the muscles used to speak help us perceive speech.

Humans are born with a module that connects sounds with mental commands - we have an innate speech processing module.

Page 16: Theories of Speech Perception

The Speech Chain(s)

Page 17: Theories of Speech Perception
Page 18: Theories of Speech Perception

Analysis-by-Synthesis Theory of Speech Perception (Stevens and Halle, 1967)

Stevens and Halle (1967) have postulated that "... the perception of speech involves the internal synthesis of patterns according to certain rules, and a matching of these internally generated patterns against the pattern under analysis. ... moreover, ... the generative rules in the perception of speech [are] in large measure identical to those utilized in speech production, and that fundamental to both processes [is] an abstract representation of the speech event."

Page 19: Theories of Speech Perception

Analysis-by-Synthesis Model

In this model the incoming acoustic signal is subjected to an initial analysis at the periphery of the auditory system.

This information is then passed upward to a master control unit, where it is processed along with certain contextual constraints derived from preceding segments.

This produces a hypothesized abstract representation defined in terms of a set of generative rules.

Page 20: Theories of Speech Perception

Analysis-by-Synthesis Model

This representation is then used to generate motor commands. During speech perception, however, articulation is inhibited, and the commands instead produce a hypothetical auditory pattern. A comparator module compares this pattern with the original signal, which is held in a temporary store. If a mismatch occurs, the procedure is repeated until a suitable match is found.
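The hypothesize, synthesize, and compare cycle can be sketched as a small loop. This is a toy illustration under stated assumptions, not Stevens and Halle's implementation: the table `SYNTHESIS_RULES`, its three-number "auditory patterns", the squared-error comparator, and the `tolerance` threshold are all hypothetical.

```python
# Hypothetical generative rules: each candidate unit maps to the auditory
# pattern it would produce. The numbers are made up for illustration.
SYNTHESIS_RULES = {
    "ba": [1.0, 0.2, 0.1],
    "da": [0.2, 1.0, 0.1],
    "ga": [0.1, 0.2, 1.0],
}


def synthesize(candidate):
    """Generate the internal auditory pattern for a hypothesized unit."""
    return SYNTHESIS_RULES[candidate]


def mismatch(pattern_a, pattern_b):
    """Comparator: sum of squared differences between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(pattern_a, pattern_b))


def analysis_by_synthesis(stored_signal, candidates, tolerance=0.05):
    """Try hypotheses in order until one matches the stored signal.

    Returns the first candidate whose synthesized pattern is within
    `tolerance` of the stored signal, or None if no candidate matches.
    """
    for candidate in candidates:
        predicted = synthesize(candidate)
        if mismatch(predicted, stored_signal) <= tolerance:
            return candidate
    return None


if __name__ == "__main__":
    signal_in_store = [0.25, 0.95, 0.1]
    print(analysis_by_synthesis(signal_in_store, ["ba", "da", "ga"]))
```

In the model as described, the order of hypotheses would be guided by the preliminary analysis and contextual constraints. The sketch simply walks a fixed candidate list.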

Page 21: Theories of Speech Perception

Analysis-by-synthesis Model (after Stevens, 1972)

Page 22: Theories of Speech Perception
Page 23: Theories of Speech Perception

Direct Realism Theory of Speech Perception (Carol Fowler, 1986)

The direct realist theory of speech perception is a part of the more general theory of direct realism, which postulates that perception allows us to have direct awareness of the world because it involves direct recovery of the distal source of the event that is perceived.

Page 24: Theories of Speech Perception

Direct Realism Theory

The theory asserts that the objects of perception are actual vocal tract movements, or gestures, and not abstract phonemes or (as in the Motor Theory) events that are causally antecedent to these movements, i.e. intended gestures.

Listeners perceive gestures not by means of a specialized decoder (as in the Motor Theory) but because information in the acoustic signal specifies the gestures that form it.

By claiming that the actual articulatory gestures that produce different speech sounds are themselves the units of speech perception, the theory bypasses the problem of lack of invariance.

Page 25: Theories of Speech Perception

You say you have a theory?

The result of underestimating the complexity of perceptual processing in a theory.

Page 26: Theories of Speech Perception
Page 27: Theories of Speech Perception

Stage Theories

A diverse set of theories that do not assume a link between production and perception.

The role and nature of segmental (phonetic) representation vary across these theories.

Page 28: Theories of Speech Perception

Stage Theories – Key Elements

Coding is based on auditory processes.

All use intermediate representations, though the nature of the representations is diverse.

All use an information processing framework (perception is the result of a sequence of transformations).

Page 29: Theories of Speech Perception

LAFS THEORY

LAFS – Lexical Access From Spectra

Page 30: Theories of Speech Perception

LAFS - Lexical Access From Spectra (Klatt, 1979)

In LAFS, Klatt proposed that the input is an auditory representation of the signal. This representation is a series of spectral sections.

A finite-state network parses the input. The path through the network that results from parsing an input is a word. That is, this system maps a sequence of spectral sections onto a word. Parts of the network that correspond to sequences of spectral sections are isomorphic to “diphones” (a type of context-sensitive allophone).
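The idea of a finite-state network that maps a sequence of spectral sections onto a word can be sketched as follows. This is a heavily simplified illustration, not Klatt's system: the symbolic section labels (`b_burst`, `ae_steady`, and so on), the `TRANSITIONS` table, and the two-word lexicon are hypothetical, and exact symbol matching stands in for any comparison of real spectra.

```python
# Hypothetical network: states are integers and each arc is labelled with a
# symbolic "spectral section". Real spectra are not discrete symbols.
TRANSITIONS = {
    (0, "b_burst"): 1,
    (1, "ae_steady"): 2,
    (2, "t_closure"): 3,
    (2, "d_closure"): 4,
}

# States at which a complete path through the network spells a word.
FINAL_WORDS = {3: "bat", 4: "bad"}


def lexical_access(sections):
    """Follow the arcs named by `sections`; return the word reached, if any."""
    state = 0
    for section in sections:
        key = (state, section)
        if key not in TRANSITIONS:
            return None  # no arc for this section: the parse fails
        state = TRANSITIONS[key]
    return FINAL_WORDS.get(state)


if __name__ == "__main__":
    print(lexical_access(["b_burst", "ae_steady", "t_closure"]))
```

Note that the parse goes straight from spectral sections to a word, with no separate phoneme-level decision along the way.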

Page 31: Theories of Speech Perception

LAFS - Key Elements

The invariant for perception is a characterization of the spectral shape over time.

The “perceptual unit” is the context-sensitive allophone, but listeners have no direct access to this representation (phonetic perception is lexically mediated).

Processing is controlled by a temporal parsing process (implemented as a finite state machine).

Note that Hillenbrand’s model of vowel recognition is similar to LAFS.

Page 32: Theories of Speech Perception

TRACE THEORY

Page 33: Theories of Speech Perception

TRACE (McClelland & Elman, 1986)

McClelland and Elman proposed TRACE as a stage model that consists of an auditory (ear) front end, auditory feature extraction, a phonetic level, and a lexical level.

TRACE is implemented in a connectionist architecture and has both ascending and descending (feedback) connections as well as connections within each level.

TRACE is both a theory and, in its two versions, a model of perception.

Page 34: Theories of Speech Perception

TRACE – Key Elements

Invariant cues are not required. Perception is the result of a cascade of stages involving one-to-many and many-to-one mappings (it behaves like a prototype system).

There are two variants of TRACE. One uses a triphone (context-sensitive allophone) representation and the other an abstract phoneme representation.

Feedback and competition among nodes at the same level are used to stabilize perception.
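Competition among nodes at the same level can be sketched with a small interactive-activation loop. This is a generic illustration and not the TRACE implementation: the function name `compete`, the update rule, and the parameter values are assumptions, and top-down feedback from the lexical level is left out.

```python
def compete(inputs, excitation=0.2, inhibition=0.2, decay=0.1, steps=20):
    """Run within-level competition and return the final activations.

    `inputs` maps each node name to its bottom-up support. On every step a
    node gains activation from its input, loses some to decay, and is
    inhibited by the summed activation of its rivals. Activations are
    clipped to the range [0, 1].
    """
    acts = {node: 0.0 for node in inputs}
    for _ in range(steps):
        total = sum(acts.values())
        updated = {}
        for node, act in acts.items():
            rivals = total - act
            net = excitation * inputs[node] - inhibition * rivals
            updated[node] = min(1.0, max(0.0, act + net - decay * act))
        acts = updated  # all nodes are updated simultaneously
    return acts


if __name__ == "__main__":
    # Two phoneme nodes with similar bottom-up support: the slightly
    # better-supported node ends up dominating.
    print(compete({"b": 1.0, "p": 0.8}))
```

With these assumed parameters, a small initial advantage grows over the iterations until the weaker node is suppressed, which is the sense in which competition stabilizes the percept.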

Page 35: Theories of Speech Perception
Page 36: Theories of Speech Perception