Nature of Speech

8/3/2019 Nature of Speech


1

The nature of speech

1.1 More than a word

'Darling?' came an anxious voice as he edged open the

unlocked door to the darkened hotel room. He knew at once it was not his

wife's voice. It was the voice of a younger woman, an Australian, he

thought.

Sadly, perhaps, the rest of this book will not reveal whether our hero was a

leading scientist about to be seduced by a beautiful spy, or just the kind of

average fellow who is prone to get out of the hotel lift at the wrong floor

without noticing and blunder into someone else's room. Instead, the book will

concentrate on an ultimately more intriguing mystery: how speech conveys

information. This, broadly, is the subject matter of the discipline of phonetics.

In the fictional example above the unseen woman spoke only one word. It is

clear, however, that as she did so a number of quite different kinds of

information were conveyed. The hearer identified a particular word (darling)

of the language he shares with the speaker. As indicated by the question mark,

he interpreted it to be a query of some kind. Something about the utterance

told the hearer that the speaker was anxious; perhaps it was spoken quickly,

quietly, and a little breathily. He knew immediately from the voice that the

speaker was not his wife; nor, presumably, anyone else he knew very well. But

even so he was able to infer, with at least a fair degree of confidence, further

facts about the unknown speaker: her sex, her age, and her geographical

background. Unlike the word, and its function as a query, these other kinds of

information cannot be directly represented in the way an utterance is written,

and so an author has to resort to commentary to convey them.

The main point, then, is that any utterance conveys a number of distinct

types of information. Two further points need to be made. Firstly, not all types

of information conveyed by an utterance are equally intended by the speaker.

Speakers clearly do intend to say particular words appropriate to given



contexts; but it would be odd to suggest that they normally intend to sound

like a tired young woman, or a large man with a cold. The pair of terms

communicative and informative have been used for this kind of distinction

(Lyons 1977:33). Any aspect of an utterance is informative if it potentially makes the hearer aware of something. Only those aspects which are intended

by the speaker to be informative are communicative.

Secondly, whilst the various types of information are distinct, they are all

conveyed by a single complex speech signal. The speech signal is the physical

link between a speaker and a hearer (the only one if they are out of sight and

touch of each other). It consists of very rapid pressure variations in the air,

caused by the speaker's speech organs, and sensed by the hearer's ear.

Although the speech signal can be analysed in terms of a number of separate acoustic dimensions, corresponding to what we perceive as, for instance, pitch,

loudness, rhythm, and so on, it is far from being the case that each type of

information will be carried by its own acoustic dimension. Any acoustic

dimension will help to carry a variety of information. The pitch of the speech

signal, for instance, might tell a listener that the utterance is a question, and

that the speaker is a man, and that he is bored. Humans are very skilled at

unravelling the information in the speech signal, but the difficulty of building

a machine that will replicate some of this skill (an automatic speech recogniser, for example) demonstrates the complexity of the way in which

information is represented in the speech signal.

1.2 Information carried by the speech signal

A speaker is often described as communicating a message. Since

a speaker in fact uses the speech signal to convey a variety of information, it is

better to think of the speaker having not a simple message but a complex

communicative intent. This is made up of a number of distinct kinds of information.

Cognitive information is essentially factual, or propositional; it consists of

things we know, or could know, as opposed, for instance, to how we feel.

Words, and their combination into phrases and sentences, are the primary

vehicle of cognitive information, and it is the kind of information which

writing copes with best.

Affective information has to do with a speaker's feelings and attitudes. In

everyday terms a speaker's tone of voice is one way of conveying affective information, as in 'I didn't mind what he said to me, it was his patronising

tone which I objected to'. Choice of words, too, can be important.



Social information in speech might seem to be something the speaker does

not choose, and therefore to be merely informative rather than communicative.

We think of speakers having an accent which indicates they belong to particular geographically and socially defined communities, and generally see

this as an unalterable attribute of the person. But in fact most speakers vary

their way of speaking according to the situation and their addressee(s). A

speaker may also adjust his or her accent in the direction of that of another

person, probably as a way of indicating friendliness or solidarity with that

person.

Self-presentational information concerns the speaker's self-image. A

speaker who wishes to present an authoritative, knowledgeable persona to the world may adopt a confident tone of voice (probably relatively loud, and

involving a moderate degree of muscular tension and clear, precise

pronunciation).

Finally here, though this list does not necessarily exhaust the kinds of

information a speaker may choose to convey in an utterance, is regulative

information. This concerns the management of a spoken interaction. For a

conversation to proceed smoothly there have to be some traffic rules, to

avoid the counterproductive situation of both participants speaking simultaneously or being silent simultaneously. The participant who is

speaking will encode signals in the utterance to communicate that he or she is

in full flow, and shouldn't be interrupted (maybe by speaking louder and

speeding up a little), or nearing the end of his or her conversational turn (by

lowering the voice and slowing).

In contrast to the above types of information which speakers may intend to

communicate are those which they convey willy-nilly. The latter kinds of

information leave their trace in the speech signal without intent on the part of the speaker, and are sometimes called indexical because they serve as an

index or indicator of aspects of an individual (e.g. Abercrombie, 1967:6). Such

aspects include the speaker's social background, age, sex, physique,

psychological state, and health.

The routes by which these various kinds of information leave their traces in

the speech signal will be discussed in section 1.4.

1.3 The speech machine

We have considered various kinds of information which originate



within the speaker, and an external speech signal in which that information is

conveyed. The machine which produces this signal is made up of two parts,

the vocal mechanism and the linguistic mechanism. In a sense these are

rather like a computer and its software. The vocal mechanism is the physical device concerned with speaking, while the linguistic mechanism is the

software which controls it.

The vocal mechanism is shown schematically in Fig. 1.1, which is like an

x-ray picture of someone facing to the left. The vocal mechanism consists most

obviously of the speaker's mouth, throat, nose, larynx, and associated

structures such as the tongue and the vocal cords. But it also consists of the

lungs and the muscles which control breathing, since speech requires air; and

very importantly those parts of the brain and nervous system which control the rest of the vocal mechanism.



Fig. 1.1 Sagittal section of the vocal mechanism

In the production of speech, broadly, air is expelled in a controlled way

from the lungs, and the airstream is interfered with at various points. These

various kinds of interference create acoustic energy (sound) of different

kinds, which is further modified by the shape of the vocal tract (the air passage through the mouth and nose). These processes will be dealt with in detail in

Chapter 2.



The linguistic mechanism includes the speaker's language, but goes

beyond what is most commonly thought of as a language. The linguistic

mechanism consists of the set of conventions which the speaker shares with

the relevant language community. These conventions range over aspects such as the following (assuming, say, that the relevant language community is

English speaking and in the South East of England): that the word for a canine

quadruped is dog; that adjectives come before nouns (a big dog not a dog big);

that a rising pitch towards the end of an utterance often signals a question; that

shifting the pronunciation of the word time in the direction of toym is less

educated, while shifting it in the direction of tame is posh or affected; and

that the use of a whispery voice can mark what is being said as in some way

confidential. The last two of these conventions would be excluded by many definitions of a language, but are an integral part of the linguistic mechanism

as a whole.

Linguistics gives us a more structured way of looking at some of these

conventions. Fig. 1.2 shows several components of the linguistic mechanism.

These could be thought of as a set of resources at the speaker's disposal. The

lexicon is the mental dictionary shared by speakers of a language, linking

meanings and pronunciations, and including grammatical information.

Arguably it is organised on the basis of morphemes, meaningful sub-word elements such as point, -ing, aim, -less, -ly, and a set of rules for combining

them into words such as aiming and pointlessly. Syntax is a set of

conventions governing the combination of words: grammar in its most

familiar sense. It is responsible for the sense the English speaker has, for

instance, that pass me the butter is a usable combination of words, and me

butter the pass is not. Phonology is a set of conventions specifying how a

language organises sound. The fact that English has a th sound as in thin and

French does not, or that palm and farm rhyme in some varieties of English and not in others, are two small examples of phonological differences between

languages or varieties. Prosody provides a set of conventionalised patterns of

pitch and timing which can signal the organisation of the words of an

utterance, refine the meaning of the utterance, and organise sounds within

words. So in 'You're broke again? You never have any money!' the punctuation

partially captures the prosody (the utterance is organised into two parts; the

first has a questioning role despite its declarative syntax, and the second is

spoken with marked emphasis); and the bold type conveys a particular prominence lent to never by prosodic features. Not indicated in the written

form is the organisation of never into a more prominent part nev and a less



prominent part -er.

Tone of voice lies outside the usual definition of language (see 1.5), and

provides for the communication of additional information in ways such as loudness, voice quality, and whisper. Core language relies on mapping

meaning via discrete abstract categories. Tone of voice involves a more direct

signalling system where gradual changes in meaning are mapped onto gradual

changes in phonetic signals. Increasing anger may correlate gradiently with

increasing loudness. Likewise, different pitch ranges may express a continuum

of involvement or enthusiasm across utterances of the words 'Oh, that's great'.

The categorical and gradient elements of the linguistic mechanism are

represented in Fig. 1.2 by the two cylinders abutting over prosody. This acknowledges that part of prosody, for instance intonation pitch range, is not

categorical.

Fig. 1.2 The linguistic mechanism

There must be a point of contact between the linguistic mechanism and the vocal mechanism. In the view adopted here, that point of contact is the

phonetic plan of an utterance. From the speaker's point of view this is a

specification of all the sound properties which the vocal mechanism will have

to achieve during the utterance. It is also likely that the listener will have to

derive something similar to the phonetic plan as a stage in interpreting the

utterance. The exact nature of the phonetic plan is a difficult issue, and

depends on assumptions about how speech is produced and perceived.

1.4 The mapping of information onto the speech signal



This section combines the conceptualisation of the speech

machine developed in section 1.3 with the different sources of information

discussed in section 1.2. The purpose is to show the variety of routes by which

information gets into, or is mapped onto, the speech signal. Fig 1.3 gives an

overview of the mapping process.

Fig. 1.3 Overview of the mapping of communicative intent

1.4.1 The encoding of communicative intent

Communicative intent is shown at the top of Fig 1.4. Its mapping

onto the speech signal is mediated by the linguistic mechanism. The linguistic

mechanism can be thought of as defining a kind of code shared between

speakers of a language, and the process of mapping communicative intent onto

the speech signal as encoding. There is not, however, a simple one-to-one

relation between aspects of communicative intent and the distinct resources of

the linguistic mechanism. Cognitive information, for example, is not mapped



exclusively through lexical choices, and affective information is not conveyed

solely through choices of tone of voice.

Fig. 1.4 Mapping of communicative intent onto the linguistic mechanism

Communicating cognitive information, the most factual, message-like

element behind speaking, depends on selecting appropriate words, combining

them into grammatical structures, choosing the right prosody (e.g. question or

statement), and, less obviously, using the right tone of voice (it is possible to

override the apparent meaning of an utterance by using an ironic or

sarcastic tone of voice).

Affective communicative intent (the speaker's attitude) can likewise be

conveyed in multiple ways: by choices of prosody and tone of voice certainly, but also by the words chosen, and perhaps by syntax (the lines are not shown

in Fig. 1.4 to avoid a visual cat's cradle). Social intent may affect any of the

four resources, since finding a way of speaking appropriate to a particular

social setting may involve the precise variety of a language chosen, and a

particular tone of voice.

Self-presentation may depend on choosing the right words, opting for more

or less complex syntax, and making phonological and tone-of-voice choices.

Regulation of an interaction perhaps recruits fewest resources, the completeness or incompleteness of a turn being communicated to interlocutors

mainly by prosodic choices and tone of voice.



1.4.2 The imprinting of indexical factors

It would be less appropriate to talk of indexical factors being

encoded, as there is no intention on the part of the speaker, and no obvious

code. The metaphor used by Laver (1994:20-21) is that of a handworker

producing artifacts, and leaving traces of the apparatus used to produce the

artifact and of his or her personal style. Both the apparatus and the style can

leave what will be called here their imprint. A cast metal object might have a

detectable seam where two halves of the mould in which it was cast joined,

and a particular detail of working in its finish characteristic of the individual

who worked it. Indexical factors carried in the speech signal will be regarded

as the result of a similar kind of imprinting.

Fig. 1.5 Imprinting of indexical factors on speech

At the left of Fig. 1.5 indexical factors are shown divided into two

overlapping sets, according to whether their imprint is left mainly via the

linguistic mechanism or the vocal mechanism. Consider first two extreme

cases. We can tell social indexical information, including geographical, from a

person's dialect or accent, as did our hero in section 1.1 when he guessed

'Australian'. A dialect consists in features of the linguistic mechanism specific

to a given geographical and/or social group. Dialect is often taken to refer

more broadly to any linguistic resource, including for instance grammar ('I

don't know nothing about it' is grammatical in many dialects of English) and

vocabulary ('to laik' means 'to play' in many parts of the North of England),

whilst accent refers specifically to regular pronunciation differences, such as

the use of a particular set of vowels, or the use of a glottal stop for certain

consonants. Admittedly it was pointed out above that people have some

socially oriented communicative choices in how they speak, but for most

people such choices only cover a tiny part of the total range of variation which

exists in a particular language, and so individuals are likely to reveal

themselves reliably as for instance a Texan, or a middle class Liverpudlian. It

is fairly difficult, however, to imagine how such indexical information would

have an effect directly on the vocal mechanism.

Contrast the case of a speaker's health. A simple cold can have a drastic

effect on the state of the vocal apparatus: a blocked nose makes it hard to

produce words like man properly, which contain nasal consonants, and

inflammation of the vocal cords makes the whole voice sound croaky. In the

longer term, persistent hoarseness can be a cue to serious diseases of the

larynx such as cancer, and in the short term, normal tiredness will also be

reflected in aspects of a person's voice. Similarly psychological state, such as

momentary stress or longer term conditions such as depression, may also leave

its imprint on the speech signal as a result of bio-chemical effects on the

performance of the vocal mechanism. On the other hand a person's health or

psychological state would not be encoded through the linguistic mechanism

(utterances such as 'I've got a throat like sandpaper' or 'I'm really depressed'

encode a cognitive analysis of the states giving rise to the indexical

information, not the indexical information itself).

Between these extremes there will be indexical information which the

listener may be able to glean from the speaker's linguistic resources and from

acoustic effects encoded in the speech wave directly by the vocal tract. Age,

for instance, has direct effects on the physiology of the vocal tract, including a

lowering of the larynx toward middle age which results in a deepening of the

voice, and a hardening of the vocal cords in old age which contributes to a

very old person's characteristically quavery voice. But age may also be

indicated by aspects of a person's linguistic mechanism, for instance the use of

particular words, such as wireless rather than radio; the use of slang



expressions, which have a notoriously short life-span (e.g. groovy; far out)

and, of more interest to phonetics, the use of particular sounds and

pronunciations. It is less apparent that pronunciation, as opposed to words and

expressions, changes within a lifetime; but there is no doubt that it does.

Informally we may be aware of this listening to sound recordings (e.g. in

films) from some decades ago. Speakers using an educated (i.e. prestige)

pronunciation of South East England were much more likely to pronounce off

as awf (as in awful), and much less likely to use glottal stops (see Chapter N)

than their equivalents today. Speakers may modify their pronunciations to

keep up with the developing trends of their speech community, but in general

they get left behind enough for pronunciation to be informative about their

age.

A speaker's physique is frequently reflected in the speech signal. Large

objects have lower natural pitches (compare a violin and a cello). Larger vocal

tracts have lower resonances, and larger vocal cords vibrate more slowly.

Since there is a tendency for size of vocal tract and vocal cords to correlate

with size of person, we have a better than chance ability to guess which of two

speakers whose voices we hear is larger. Physique leaves its imprint on the

speech signal directly through the vocal tract.
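The link between size and resonance can be made concrete with a deliberately crude model. The model is an assumption brought in for illustration and is not part of the account above: the vocal tract is treated as a uniform tube, closed at the vocal cords and open at the lips, whose resonances fall at odd multiples of c/4L. The tube lengths and the speed of sound are round illustrative figures, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 350.0  # m/s, a rough figure for warm, moist air (assumed)


def tube_resonances(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, count + 1)]


# Hypothetical vocal tract lengths, chosen only to show the direction of the effect.
for label, length in [("shorter tract (14 cm)", 0.14), ("longer tract (17.5 cm)", 0.175)]:
    print(label, [round(f) for f in tube_resonances(length)])
```

Under these assumptions every resonance of the longer tube lies below the corresponding resonance of the shorter one, which is the sense in which a larger vocal tract sounds 'lower'. A real vocal tract is not a uniform tube, so the figures are indicative at best.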

Sex might be regarded as a special case of physique, and it can be inferred

with fair reliability. Apart from men's and women's differing (though

overlapping) ranges of vocal tract and vocal cord sizes, there are also

differences in the proportions of the vocal tract (concerning the pharynx,

which is proportionately longer in men) which may help the cuing of sex.

However there are also said to be languages where women speak a different

dialect, either in terms of pronunciation or grammar, and there are well

attested differences in pronunciation trends between the sexes in English (see

e.g. Trudgill 1974: 84ff). If it is truly the case that men and women have

different dialects, rather than merely being free to make choices from a shared

linguistic system, then sex is imprinted not only through the vocal mechanism

but also through the linguistic mechanism.

1.4.3 Speaker identity and other problems

This overview of the mapping of information onto the

speech signal should be seen as a suggestive outline rather than a definitive

analysis. Where, for instance, does personality fit in? There is evidence that

personality traits correlate with certain features of speech, for instance that



extroversion is associated with greater loudness (cf. Scherer 1979:191), but is

this a matter of self-presentation, chosen by the speaker, or is it purely

indexical, determined in a biological way? Nor are the boundaries between

categories as clear-cut as the boxes in Fig. 1.4 would suggest. Adjusting one's

way of speaking to make it more like that of an addressee may involve

changes in accent, apparently of a social kind; but it may be perceived as a cue

to affective information, equivalent to the use of a friendly voice quality and

intonation.

Another question is how accurate hearers are at utilising information carried

by the speech signal. Undoubtedly many of the factors discussed are often

inferred quite inaccurately. We can be surprised to discover the real age of

someone who sounded quite young over the phone; we can inadvertently butt

in to another person's conversational turn thinking they have signalled the

end of their own turn; and of course we can even misunderstand the

cognitive content of speech by mishearing a word or misparsing a sentence.

A surprising omission, perhaps, from the list of indexical factors, is

identity. In everyday life, particularly over the telephone, we successfully

identify a person just from speech, and techniques for identifying speakers for

forensic and other purposes (see Chapter N) exploit information about identity

in the speech signal. The omission reflects the problems surrounding the

concept of identity. It can mean the biological entity constituting a person

(this is the sense which is of relevance to forensic applications), or for instance

a person's membership of certain subgroups of the population. The biological

entity is what a fingerprint defines, with very few exceptions. But the

indirectness of the relationship between the biological person and the speech

signal, and the multiplicity of information mapped onto it, makes it

questionable whether much information about the biological individual is

available. An alternative is that our sense of hearing a particular person is

actually derived from other indexical factors. If one's uncle Herbert is a large

middle-aged man from Birmingham with chronic hoarseness, then those

factors in themselves suffice for an identification when his habitual Sunday

morning telephone call arrives.

The possibility of voice disguise, and of mimicry, highlights further the

limitations of a model such as the one in Fig. 1.3. Some issues can be dealt

with, albeit imperfectly. What kind of communicative intent is involved?

Perhaps a special case of self-presentation. What linguistic resources are being

manipulated? Presumably all four may be, though we tend in particular to



think of disguise and mimicry primarily in terms of phonology and tone of

voice (adopting a different accent, and speaking loudly and fast, for instance).

But the model does not really allow for that part of disguise or mimicry which

consists in distorting one's vocal mechanism to make it sound like that of

someone else (known in the case of mimicry or unknown in the case of

disguise). This is not part of the linguistic mechanism (the shared conventions

of the language community hardly include the recipe for sounding like each

person in that community), and the model fails to provide a direct input from

intention to the vocal mechanism.

So in this respect, and many others, the model is imperfect. Nonetheless it

does show how the speech signal at any given moment is determined by a

wide variety of factors, and how it is potentially informative to a hearer in

many different ways, intended and unintended by the speaker. The next

section compares the ways in which different kinds of information are carried

by the speech signal.

1.5 Gradience, discreteness, and componentiality

Imagine someone asking 'Have you ever air-mailed a miniskirt to

Iceland?' Imagine, too, that the speaker has a cold. As a result of the utterance

the hearer should have gleaned two very different kinds of information. The

first comprises the speaker's communicative intent, the cognitive content of

which is almost certainly novel to the hearer. The second comprises the

speaker's state of health, about which the speaker has unintentionally informed

the hearer. The way the speech signal carries these two kinds of information

contrasts in a number of respects.

As discussed in section 1.4, the cold imprints itself rather directly on the

speech signal through its effect on the vocal mechanism: the blocked nose

and inflamed vocal cords. 'Miniskirt' will sound a bit like 'bidiskirt' because

of the blocked nose, and the whole utterance may sound rather as if it came

from a talking frog. The imprinting can vary, but only in the sense that the

severity of the cold will be reflected in the degree of distortion of the speaker's

normal voice. The imprinting of the cold on the speech signal is gradient, that

is, it varies continuously.

If the listener chose to respond not to the communicative intent but to the

indexical information, the response, depending on the severity of the cold,

might be 'You've got a cold', or 'You've got a bad cold', or 'You've got a

terrible cold' or even 'You've got a really terrible cold'. This illustrates that



the language code is not gradient but works in terms of choices which are

discrete. The respondent either uses an adjective, or not, chooses the word

bad, or terrible or some other word, and so on. This discreteness is central to how language works. It is as though language partitions our mental

experience, and allocates a label or symbol to stand for each partition.
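The contrast between gradient imprinting and discrete coding can be sketched in a few lines of Python. Everything here is an illustrative assumption: severity is represented as a number between 0 and 1, the cut-off points between responses are arbitrary, and the function names are hypothetical.

```python
def imprint(severity):
    """Gradient: the distortion of the voice varies continuously with severity."""
    return max(0.0, min(1.0, severity))


def describe(severity):
    """Discrete: the language code offers only a finite set of choices."""
    # The cut-off points are arbitrary assumptions made for this sketch.
    if severity < 0.25:
        return "You've got a cold"
    if severity < 0.5:
        return "You've got a bad cold"
    if severity < 0.75:
        return "You've got a terrible cold"
    return "You've got a really terrible cold"


for severity in (0.30, 0.31, 0.80):
    print(severity, imprint(severity), describe(severity))
```

A small change in severity always changes the imprint, but it changes the verbal response only when it happens to cross one of the boundaries between partitions.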

Let's imagine now a simplified language, in which each partition is a

complete meaning, and the symbol standing for each meaning is a unique,

simple sound. The meaning 'I'm exhausted' might be conveyed by a long

ffffff; 'You bore me' by ssssss; 'look at this' by ooooo, and so on. This situation

is schematised below, where geometric symbols are used instead of sound

symbols.

[Diagram: each of the meanings M1, M2, M3, M4 is linked directly to its own symbol or sound]

But what about Have you ever air-mailed a miniskirt to Iceland? Despite the

versatility of our vocal tracts, we would soon run out of adequately distinct

(and memorable) noises to convey the infinite variety of messages we might

need to convey. Not surprisingly, no human language uses this kind of direct

mapping of messages onto sound.

Instead, all human languages exhibit componentiality: they construct larger

units of communication out of smaller discrete components. At one level,

roughly speaking, areas of our mental experience are mapped onto a finite

number of words (the vocabulary of the language). At another level, these

words are mapped onto a comparatively small set (often fewer than 50) of

meaningless sound units. This is represented schematically as follows:

[Diagram: the meanings M1, M2, M3, M4 ... M? are each linked to a word, and the words are in turn built from a small set of symbols or sounds]

The meaningless sound units are often called phonemes and are what are

represented, albeit often inconsistently, by letters in alphabetic writing. So for



instance in pin, tin, kin, we have three different words composed of the same

sound units except for the first one which crucially differentiates them. By

adding s to the beginning of each of these sequences, three new words are

created (stin, in fact, is not an existing word of English, but is one which could be adopted if needed). The crucial fact that all human languages associate

meanings with abstract units (words or morphemes) and designate these units

by sequences of discrete meaningless sound elements is sometimes known as

the dual structure of language. The meaningful units can then be combined

into an infinite number of more complex sequential structures (phrases,

sentences) according to the syntax of the language in order to convey a

limitless set of complex meanings including that of the novel and improbable

utterance 'Have you ever air-mailed a miniskirt to Iceland?'. More precisely, dual structure may be taken to refer to two levels of structure, the

grammatical level and the phonological level (Laver 1994:18). Meaningless

sound elements at the phonological level serve as building blocks which

designate meaningful structures at the grammatical level. Componentiality and

sequential structure thus circumvent the problems that would be encountered if

a distinct sound had to be found for every message we wanted to

communicate, and they are taken here as definitional properties of language.
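The advantage of componentiality is at bottom a matter of counting, and can be sketched as follows. The inventory size of 40 is an illustrative figure within the 'fewer than 50' range mentioned above, the function names are hypothetical, and the calculation ignores the restrictions every real language places on which sequences are permissible, so it gives an upper bound only.

```python
def direct_mapping_capacity(distinct_sounds):
    """One unique sound per message: capacity equals the number of sounds."""
    return distinct_sounds


def componential_capacity(sound_units, max_length):
    """Number of sequences of 1 to max_length units drawn from the inventory."""
    return sum(sound_units ** k for k in range(1, max_length + 1))


print(direct_mapping_capacity(40))   # 40 messages
print(componential_capacity(40, 3))  # 40 + 1600 + 64000 = 65640 possible forms
```

Even with sequences no longer than three units, the same 40 sounds provide tens of thousands of potential word forms, and these forms can then be combined again by the syntax.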

But it is not the case that the speaker has no control over non-componential, gradient aspects of speech. The listener could, for instance, emphasise the

severity of the respiratory ailment perceived in the speaker by saying 'You've

got a REALLY TERRIBLE COLD!', that is, by giving the words in capitals

extra pitch variation and length, and perhaps speaking them with a breathy

voice, as if gasping at the seriousness of the situation. These uses of extra

pitch and length, and of breathy voice, do not share the properties with the

units of language of being discrete elements which can be structured

sequentially, and so are not regarded as part of language itself. They are considered here to be choices from a linguistic resource of tone of voice lying

outside the language system, but closely allied to it.

Tone of voice may mimic indexical effects. A person genuinely shocked

may undergo physiological changes which make an extended pitch range and a

breathy voice an automatic consequence. But in the present example the vocal

effects are deliberately chosen by the speaker, and are part of the conventions

shared by the speaker and the hearer. Tone of voice perhaps constitutes an

exploitation and incorporation into the linguistic mechanism of the kind of gradual vocal effects which naturally inform of certain indexical facts.



1.6 Sameness and variation

A language, as an abstract code, can be embodied in more than

one medium or physical carrier. Speech is one, and writing another. The written medium, as exemplified in the words on this page, clearly reflects the

componentiality of language. Words are separated by spaces, and made up of

letters which are discrete elements. Each occurrence or token of a letter is an

exact replica of the other occurrences of its type. So for instance in 'big dogs

scare her', the spaces reflect the linguistic structure of an utterance consisting

of four words, and the two tokens of the letter <g> are identical, as are the two

of <s>. The structure of the language system is closely mirrored in its

realisation in the printed medium.

In speech, there is a potential conflict between the discrete, componential

structure of language, and the natural behaviour of the vocal mechanism. Like

a gymnast, the vocal mechanism does not, indeed could not, move abruptly

from one static posture to another. The laws of physics, governing the

velocity, acceleration, and momentum of the gymnast's body, or the vocal

organs, conspire against such abruptness. The gymnast, and the vocal

mechanism, produce a flowing movement. This can be seen reflected in the

spectrogram of the phrase 'a real worry' in Fig. 1.6 [p.20]. A spectrogram is a way of displaying an acoustic analysis of speech produced by a computer or

other device. In the picture, time runs from left to right, so the start of the

utterance is at the left, while the vertical axis shows the breakdown of the

complex speech signal into the different pitches or frequencies which make it

up. The darker the pattern, the more sound there is at a particular frequency.

Note that there are no breaks between the words, and there are no successive

static patterns corresponding to successive sounds. This is a carefully selected

example, but it vividly demonstrates the fact that in the spoken medium the discrete units of language are being realised in a continuous and flowing event.
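The kind of analysis a spectrogram displays can be sketched in a few lines. The example below is a minimal illustration, not the method used to produce Fig. 1.6: it builds a synthetic tone whose frequency glides upward (a stand-in for continuously changing speech), cuts it into short frames, and finds the strongest frequency in each frame. The sample rate, frame length, frequency range, and all function names are assumptions chosen for the illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for the illustration)
FRAME = 256         # samples per analysis frame (assumed)


def gliding_tone(n_samples):
    """Synthetic signal whose frequency glides from 500 Hz up to 1500 Hz."""
    samples = []
    phase = 0.0
    for n in range(n_samples):
        freq = 500 + 1000 * n / n_samples
        phase += 2 * math.pi * freq / SAMPLE_RATE
        samples.append(math.sin(phase))
    return samples


def frame_spectrum(frame):
    """Magnitudes of the discrete Fourier transform of one frame
    (bins 0 .. N/2 - 1, i.e. frequencies below half the sample rate)."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(signal):
    """One magnitude spectrum per frame: columns run left to right in time,
    and positions within a column run from low to high frequency."""
    return [frame_spectrum(signal[start:start + FRAME])
            for start in range(0, len(signal) - FRAME + 1, FRAME)]


def peak_frequencies(signal):
    """Frequency (in Hz) of the strongest component in each frame."""
    hz_per_bin = SAMPLE_RATE / FRAME
    return [max(range(len(column)), key=column.__getitem__) * hz_per_bin
            for column in spectrogram(signal)]


if __name__ == "__main__":
    print(peak_frequencies(gliding_tone(2048)))
```

In a plotted spectrogram each magnitude would be drawn as a shade of grey, darker for larger values. The peak frequency rises from frame to frame here because the underlying signal changes continuously rather than stepping between static states.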

In the written medium of language, it is fluent (even sloppy) handwriting

rather than printing or typescript which provides a slightly closer analogy with

speech. In a handwritten realisation of 'big dogs scare her',

although the words are still separated (unlike in speech), the letters within the words are not. More interestingly, the two occurrences of <g> and <s> are

different. In the second <g>, the descender loops back up again across itself on



the way to the following letter, and in the second <s>, there is no diagonal

rising stroke at the start of the letter, as seen below:

The marks on the paper corresponding to a particular letter-type vary

according to the context in which they occur. Everyone's handwriting is

different, but most will exhibit equivalent examples of contextual variation.

Some of the variation can be interpreted as short-cuts, but in other cases the

contextual variation may be purely a matter of habit or convention.

Something very similar is true of speech. A sound will vary in its

pronunciation according to adjacent sounds and its position. Both instances of

[k] in 'khaki' (the <h> in the spelling is irrelevant to the English pronunciation)

require the body of the tongue to make a closure against the roof of the mouth,

but the first will be further back in the mouth than the second. This is in

keeping with the fact that the first vowel requires the body of the tongue to be

near the back of the mouth, and the second requires it to be raised at the front

of the mouth. The difference can be felt best if the word is said silently. If the

word is whispered, emphasising the explosion, or 'burst' in phonetic terms, of

each [k], the second burst will be heard as higher in pitch. This is because of

the smaller mouth cavity in front of the release which has higher resonant

frequencies.

1.7 Phonology

The situation would be chaotic if such variation were random. We

could never be sure which linguistic unit a particular burst was representing.

What is needed is a set of conventions which express what variation is

allowable with the representatives of a particular unit, and what is not. We can

consider this with an abstract example using geometric shapes, which we

could imagine to be elements in a visual code. The shape on the left of the

arrow represents the abstract (or ideal) element of the code. The arrow means

'the following ways of writing the symbol are acceptable', while the crossed-out

arrow means 'these variants are not acceptable'.



In fact, it is neither necessary nor feasible to list what is unacceptable; instead

we just assume that everything else not mentioned in the positive part of the

rule is not acceptable.

Much the same sort of rules can be expressed for sounds. The following

rules, which simplify the situation, express what the khaki case exemplified:

/k/ → [k̠ʰ] / __ [ɑ]

/k/ → [k̟ʰ] / __ [i]

This means that the linguistic unit or phoneme /k/ can be made at the back of

the mouth or nearer the front (the minus and plus under the symbols mean

further back and further forward in the vocal tract respectively) depending

on the vowel following. In fact a whole range of realisations between these is

possible. Notice that the abstract linguistic unit, the phoneme, is shown in

slants / /, while its realisation is shown in square brackets [ ]. This is an

important convention in phonetics. The variants realising a phoneme are called

allophones.
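The khaki rules can be expressed as a small lookup from phoneme and context to allophone. The following is a simplified sketch, assuming only two vowel classes and ignoring the range of intermediate realisations mentioned above. The vowel sets and function names are hypothetical choices for the illustration, not an analysis taken from the text.

```python
# Assumed, deliberately small vowel classes for the illustration.
BACK_VOWELS = {"ɑ", "ɔ", "u"}
FRONT_VOWELS = {"i", "ɪ", "e"}

RETRACTED = "\u0320"   # combining minus sign below: further back
ADVANCED = "\u031f"    # combining plus sign below: further forward
ASPIRATION = "\u02b0"  # superscript h


def realise_k(following_vowel: str) -> str:
    """Return a bracketed allophone of /k/ for the given following vowel."""
    if following_vowel in BACK_VOWELS:
        return "[k" + RETRACTED + ASPIRATION + "]"
    if following_vowel in FRONT_VOWELS:
        return "[k" + ADVANCED + ASPIRATION + "]"
    # Neither rule applies: fall back to an unmodified aspirated stop.
    return "[k" + ASPIRATION + "]"


def allophones_of_k(phonemes):
    """Realise every /k/ that is directly followed by another segment."""
    return [realise_k(phonemes[i + 1])
            for i, segment in enumerate(phonemes[:-1])
            if segment == "k"]


if __name__ == "__main__":
    # 'khaki' as a sequence of phonemes (broad transcription, assumed).
    print(allophones_of_k(["k", "ɑ", "k", "i"]))
```

The input uses a single invariant symbol for /k/, and the variation appears only in the output, which is the division of labour between phoneme and allophone that the rules describe.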

Such a rule reconciles a coding system using invariant abstract elements

with the pressures towards variability arising from the physical constraints of

the vocal mechanism. Phonology is the part of the language system which

deals with sound patterns, including patterns of variability. Like other terms

('syntax', 'the lexicon', etc.) describing parts of the language system,

'phonology' can mean both the phenomena we hypothesise to be part of

language, and the scientific study of those phenomena.

The phonology of a language also consists of its inventory of phonemes

(sound-units), and principles governing their combination. For instance the

phonology of many varieties of English specifies seven short-vowel

phonemes, as exemplified in the contrasting words pit, pet, pat, pot, putt, put,

and the first vowel of potato. And it specifies that spray, stray, and scray are

all good sequences of phonemes (scray happens not to be a word, but could

be), while fpray, stlay, and tsray are ill-formed.
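Combination principles of this kind can be modelled as a membership test on the consonants that begin a word. The sketch below is a rough illustration only: the set of permitted three-consonant onsets is a small assumed subset, the vowel set is incomplete, each character is treated as one phoneme, and the function names are hypothetical.

```python
# Assumed vowel symbols, enough to find where the onset ends.
VOWELS = set("aeiouɪɛæɒʌʊəɑɔ")

# Assumed, partial list of three-consonant onsets for the illustration.
PERMITTED_TRIPLE_ONSETS = {"spr", "str", "skr", "spl", "skw"}


def onset(phonemes: str) -> str:
    """Return the consonants that precede the first vowel."""
    for i, segment in enumerate(phonemes):
        if segment in VOWELS:
            return phonemes[:i]
    return phonemes


def is_permitted_triple_onset(phonemes: str) -> bool:
    """True if the word begins with one of the listed three-consonant onsets."""
    return onset(phonemes) in PERMITTED_TRIPLE_ONSETS


if __name__ == "__main__":
    for form in ["spreɪ", "streɪ", "skreɪ", "fpreɪ", "stleɪ", "tsreɪ"]:
        print(form, is_permitted_triple_onset(form))
```

The test accepts scray on the same footing as spray and stray, since well-formedness depends on the sequence of phonemes and not on whether the result happens to be an existing word.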

Phonology also has a role to play in the combination of morphemes.




Morphemes are the units which are used in word-building, such as point and -less. Although there is arguably a constant relationship between a morpheme

and a meaning, there can be variability in the relationship between a

morpheme and its pronunciation. English provides many examples of this, but the spelling often disguises them. So when the noun-forming suffix -ity is

added to the adjective electric to make electricity, the [k] at the end of electric

changes to [s] and the stress shifts from -lec- to -tric- ([ɪˈlɛktrɪk] → [ɪlɛkˈtrɪsɪti]).

Like the variation in /k/ according to what vowel followed it, this is another

example of invariant units of a language receiving variable realisation. The

part of phonology which specifies these patterns of variation in morphemes is

sometimes called morphophonology.
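The electric/electricity alternation can be sketched as a rule applied when a suffix is attached. This is a minimal illustration under stated assumptions: the transcriptions are broad and omit stress, so the stress shift is not modelled, the rule is restricted to the suffix -ity, and the function and constant names are hypothetical.

```python
# Assumed broad transcription of the suffix -ity, without stress marking.
SUFFIX_ITY = "ɪti"


def add_ity(stem: str) -> str:
    """Attach -ity to a transcribed stem, changing stem-final [k] to [s]."""
    if stem.endswith("k"):
        stem = stem[:-1] + "s"
    return stem + SUFFIX_ITY


if __name__ == "__main__":
    print(add_ity("ɪlɛktrɪk"))  # electric + -ity
    print(add_ity("ɒd"))        # odd + -ity: stem unchanged
```

The morpheme is stored in a single form, and the variant with [s] is derived only in the context of the suffix.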

Various aspects of phonology are dealt with in Chapter N. In the next chapter, however, the focus switches from the linguistic mechanism to the

vocal mechanism as we examine how it generates the speech signal.

Fig. 1.6 Spectrogram of 'a real worry', with speech wave at the top and pitch trace at the

bottom
