8/3/2019 Nature of Speech
1 The nature of speech
1.1 More than a word
"Darling?" came an anxious voice as he edged open the
unlocked door to the darkened hotel room. He knew at once it was not his
wife's voice. It was the voice of a younger woman, an Australian, he
thought.
Sadly, perhaps, the rest of this book will not reveal whether our hero was a
leading scientist about to be seduced by a beautiful spy, or just the kind of
average fellow who is prone to get out of the hotel lift at the wrong floor
without noticing and blunder into someone else's room. Instead, the book will
concentrate on an ultimately more intriguing mystery: how speech conveys
information. This, broadly, is the subject matter of the discipline of phonetics.
In the fictional example above the unseen woman spoke only one word. It is
clear, however, that as she did so a number of quite different kinds of
information were conveyed. The hearer identified a particular word (darling)
of the language he shares with the speaker. As indicated by the question mark,
he interpreted it to be a query of some kind. Something about the utterance
told the hearer that the speaker was anxious; perhaps it was spoken quickly,
quietly, and a little breathily. He knew immediately from the voice that the
speaker was not his wife; nor, presumably, anyone else he knew very well. But
even so he was able to infer, with at least a fair degree of confidence, further
facts about the unknown speaker: her sex, her age, and her geographical
background. Unlike the word, and its function as a query, these other kinds of
information cannot be directly represented in the way an utterance is written,
and so an author has to resort to commentary to convey them.
The main point, then, is that any utterance conveys a number of distinct
types of information. Two further points need to be made. Firstly, not all types
of information conveyed by an utterance are equally intended by the speaker.
Speakers clearly do intend to say particular words appropriate to given
contexts; but it would be odd to suggest that they normally intend to sound
like a tired young woman, or a large man with a cold. The pair of terms
communicative and informative have been used for this kind of distinction
(Lyons 1977:33). Any aspect of an utterance is informative if it potentially
makes the hearer aware of something. Only those aspects which are intended
by the speaker to be informative are communicative.
Secondly, whilst the various types of information are distinct, they are all
conveyed by a single complex speech signal. The speech signal is the physical
link between a speaker and a hearer (the only one if they are out of sight and
touch of each other). It consists of very rapid pressure variations in the air,
caused by the speaker's speech organs, and sensed by the hearer's ear.
Although the speech signal can be analysed in terms of a number of separate
acoustic dimensions, corresponding to what we perceive as, for instance, pitch,
loudness, rhythm, and so on, it is far from being the case that each type of
information will be carried by its own acoustic dimension. Any acoustic
dimension will help to carry a variety of information. The pitch of the speech
signal, for instance, might tell a listener that the utterance is a question, and
that the speaker is a man, and that he is bored. Humans are very skilled at
unravelling the information in the speech signal, but the difficulty of building
a machine that will replicate some of this skill (an automatic speech
recogniser, for example) demonstrates the complexity of the way in which
information is represented in the speech signal.
1.2 Information carried by the speech signal
A speaker is often described as communicating a message. Since
a speaker in fact uses the speech signal to convey a variety of information, it is
better to think of the speaker having not a simple message but a complex
communicative intent. This is made up of a number of distinct kinds of
information.
Cognitive information is essentially factual, or propositional; it consists of
things we know, or could know, as opposed, for instance, to how we feel.
Words, and their combination into phrases and sentences, are the primary
vehicle of cognitive information, and it is the kind of information which
writing copes with best.
Affective information has to do with a speaker's feelings and attitudes. In
everyday terms a speaker's tone of voice is one way of conveying affective
information, as in "I didn't mind what he said to me, it was his patronising
tone which I objected to." Choice of words, too, can be important.
Social information in speech might seem to be something the speaker does
not choose, and therefore to be merely informative rather than communicative.
We think of speakers having an accent which indicates they belong to
particular geographically and socially defined communities, and generally see
this as an unalterable attribute of the person. But in fact most speakers vary
their way of speaking according to the situation and their addressee(s). A
speaker may also adjust his or her accent in the direction of that of another
person, probably as a way of indicating friendliness or solidarity with that
person.
Self-presentational information concerns the speaker's self-image. A
speaker who wishes to present an authoritative, knowledgeable persona to the
world may adopt a confident tone of voice (probably relatively loud, and
involving a moderate degree of muscular tension and clear, precise
pronunciation).
Finally here, though this list does not necessarily exhaust the kinds of
information a speaker may choose to convey in an utterance, is regulative
information. This concerns the management of a spoken interaction. For a
conversation to proceed smoothly there have to be some "traffic rules", to
avoid the counterproductive situation of both participants speaking
simultaneously or being silent simultaneously. The participant who is
speaking will encode signals in the utterance to communicate that he or she is
in full flow, and shouldn't be interrupted (maybe by speaking louder and
speeding up a little), or nearing the end of his or her conversational turn (by
lowering the voice and slowing).
In contrast to the above types of information which speakers may intend to
communicate are those which they convey willy-nilly. The latter kinds of
information leave their trace in the speech signal without intent on the part of
the speaker, and are sometimes called indexical because they serve as an
index or indicator of aspects of an individual (e.g. Abercrombie, 1967:6). Such
aspects include the speaker's social background, age, sex, physique,
psychological state, and health.
The routes by which these various kinds of information leave their traces in
the speech signal will be discussed in section 1.4.
1.3 The speech machine
We have considered various kinds of information which originate
within the speaker, and an external speech signal in which that information is
conveyed. The machine which produces this signal is made up of two parts,
the vocal mechanism and the linguistic mechanism. In a sense these are
rather like a computer and its software. The vocal mechanism is the physical
device concerned with speaking, while the linguistic mechanism is the
software which controls it.
The vocal mechanism is shown schematically in Fig. 1.1, which is like an
x-ray picture of someone facing to the left. The vocal mechanism consists most
obviously of the speaker's mouth, throat, nose, larynx, and associated
structures such as the tongue and the vocal cords. But it also consists of the
lungs and the muscles which control breathing, since speech requires air; and,
very importantly, those parts of the brain and nervous system which control the
rest of the vocal mechanism.
Fig. 1.1 Sagittal section of the vocal mechanism
In the production of speech, broadly, air is expelled in a controlled way
from the lungs, and the airstream is interfered with at various points. These
various kinds of interference create acoustic energy (sound) of different
kinds, which is further modified by the shape of the vocal tract (the air passage
through the mouth and nose). These processes will be dealt with in detail in
Chapter 2.
The linguistic mechanism includes the speaker's language, but goes
beyond what is most commonly thought of as a language. The linguistic
mechanism consists of the set of conventions which the speaker shares with
the relevant language community. These conventions range over aspects such
as the following (assuming, say, that the relevant language community is
English speaking and in the South East of England): that the word for a canine
quadruped is "dog"; that adjectives come before nouns ("a big dog" not "a dog big");
that a rising pitch towards the end of an utterance often signals a question; that
shifting the pronunciation of the word "time" in the direction of "toym" is "less
educated", while shifting it in the direction of "tame" is "posh" or "affected"; and
that the use of a whispery voice can mark what is being said as in some way
confidential. The last two of these conventions would be excluded by many
definitions of a language, but are an integral part of the linguistic mechanism
as a whole.
Linguistics gives us a more structured way of looking at some of these
conventions. Fig. 1.2 shows several components of the linguistic mechanism.
These could be thought of as a set of resources at the speaker's disposal. The
lexicon is the mental dictionary shared by speakers of a language, linking
meanings and pronunciations, and including grammatical information.
Arguably it is organised on the basis of morphemes, meaningful sub-word
elements such as "point", "-ing", "aim", "-less", "-ly", and a set of rules for combining
them into words such as "aiming" and "pointlessly". Syntax is a set of
conventions governing the combination of words (grammar in its most
familiar sense). It is responsible for the sense the English speaker has, for
instance, that "pass me the butter" is a usable combination of words, and "me
butter the pass" is not. Phonology is a set of conventions specifying how a
language organises sound. The fact that English has a "th" sound as in "thin" and
French does not, or that "palm" and "farm" rhyme in some varieties of English and
not in others, are two small examples of phonological differences between
languages or varieties. Prosody provides a set of conventionalised patterns of
pitch and timing which can signal the organisation of the words of an
utterance, refine the meaning of the utterance, and organise sounds within
words. So in "You're broke again? You never have any money!" the punctuation
partially captures the prosody (the utterance is organised into two parts; the
first has a questioning role despite its declarative syntax, and the second is
spoken with marked emphasis); and the bold type conveys a particular
prominence lent to "never" by prosodic features. Not indicated in the written
form is the organisation of "never" into a more prominent part nev and a less
prominent part -er.
Tone of voice lies outside the usual definition of language (see 1.5), and
provides for the communication of additional information in ways such as
loudness, voice quality, and whisper. Core language relies on mapping
meaning via discrete abstract categories. Tone of voice involves a more direct
signalling system where gradual changes in meaning are mapped onto gradual
changes in phonetic signals. Increasing anger may correlate gradiently with
increasing loudness. Likewise, different pitch ranges may express a continuum
of involvement or enthusiasm across utterances of the words "Oh, that's great".
The categorical and gradient elements of the linguistic mechanism are
represented in Fig. 1.2 by the two cylinders abutting over prosody. This
acknowledges that part of prosody, for instance intonation pitch range, is not
categorical.
Fig. 1.2 The linguistic mechanism [label in figure: PHONETIC PLAN]
There must be a point of contact between the linguistic mechanism and the
vocal mechanism. In the view adopted here, that point of contact is the
phonetic plan of an utterance. From the speaker's point of view this is a
specification of all the sound properties which the vocal mechanism will have
to achieve during the utterance. It is also likely that the listener will have to
derive something similar to the phonetic plan as a stage in interpreting the
utterance. The exact nature of the phonetic plan is a difficult issue, and
depends on assumptions about how speech is produced and perceived.
1.4 The mapping of information onto the speech signal
This section combines the conceptualisation of the speech
machine developed in section 1.3 with the different sources of information
discussed in section 1.2. The purpose is to show the variety of routes by which
information gets into, or is mapped onto, the speech signal. Fig 1.3 gives an
overview of the mapping process.
Fig. 1.3 Overview of the mapping of communicative intent
1.4.1 The encoding of communicative intent
Communicative intent is shown at the top of Fig 1.4. Its mapping
onto the speech signal is mediated by the linguistic mechanism. The linguistic
mechanism can be thought of as defining a kind of code shared between
speakers of a language, and the process of mapping communicative intent onto
the speech signal as encoding. There is not, however, a simple one-to-one
relation between aspects of communicative intent and the distinct resources of
the linguistic mechanism. Cognitive information, for example, is not mapped
exclusively through lexical choices, and affective information is not conveyed
solely through choices of tone of voice.
Fig. 1.4 Mapping of communicative intent onto the linguistic mechanism
Communicating cognitive information, the most factual, message-like
element behind speaking, depends on selecting appropriate words, combining
them into grammatical structures, choosing the right prosody (e.g. question or
statement), and, less obviously, using the right tone of voice (it is possible to
override the apparent meaning of an utterance by using an ironic or
sarcastic tone of voice).
Affective communicative intent (the speaker's attitude) can likewise be
conveyed in multiple ways: by choices of prosody and tone of voice certainly,
but also by the words chosen, and perhaps by syntax (the lines are not shown
in Fig. 1.4 to avoid a visual cat's cradle). Social intent may affect any of the
four resources, since finding a way of speaking appropriate to a particular
social setting may involve the precise variety of a language chosen, and a
particular tone of voice.
Self-presentation may depend on choosing the right words, opting for more
or less complex syntax, and making phonological and tone-of-voice choices.
Regulation of an interaction perhaps recruits fewest resources, the
completeness or incompleteness of a turn being communicated to interlocutors
mainly by prosodic choices and tone of voice.
1.4.2 The imprinting of indexical factors
It would be less appropriate to talk of indexical factors being
encoded, as there is no intention on the part of the speaker, and no obvious
code. The metaphor used by Laver (1994:20-21) is that of a handworker
producing artifacts, and leaving traces of the apparatus used to produce the
artifact and of his or her personal style. Both the apparatus and the style can
leave what will be called here their imprint. A cast metal object might have a
detectable seam where two halves of the mould in which it was cast joined,
and a particular detail of working in its finish characteristic of the individual
who worked it. Indexical factors carried in the speech signal will be regarded
as the result of a similar kind of imprinting.
Fig. 1.5 Imprinting of indexical factors on speech
At the left of Fig. 1.5 indexical factors are shown divided into two
overlapping sets, according to whether their imprint is left mainly via the
linguistic mechanism or the vocal mechanism. Consider first two extreme
cases. We can tell social indexical information, including geographical, from a
person's dialect or accent, as did our hero in section 1.1 when he guessed
"Australian". A dialect consists in features of the linguistic mechanism specific
to a given geographical and/or social group. Dialect is often taken to refer
more broadly to any linguistic resource, including for instance grammar ("I
don't know nothing about it" is grammatical in many dialects of English) and
vocabulary ("to laik" means "to play" in many parts of the North of England),
whilst accent refers specifically to regular pronunciation differences, such as
the use of a particular set of vowels, or the use of a glottal stop for certain
consonants. Admittedly it was pointed out above that people have some
socially oriented communicative choices in how they speak, but for most
people such choices only cover a tiny part of the total range of variation which
exists in a particular language, and so individuals are likely to reveal
themselves reliably as for instance a Texan, or a middle class Liverpudlian. It
is fairly difficult, however, to imagine how such indexical information would
have an effect directly on the vocal mechanism.
Contrast the case of a speaker's health. A simple cold can have a drastic
effect on the state of the vocal apparatus: a blocked nose makes it hard to
produce words like "man" properly, which contain nasal consonants, and
inflammation of the vocal cords makes the whole voice sound croaky. In the
longer term, persistent hoarseness can be a cue to serious diseases of the
larynx such as cancer, and in the short term, normal tiredness will also be
reflected in aspects of a person's voice. Similarly psychological state, such as
momentary stress or longer term conditions such as depression, may also leave
its imprint on the speech signal as a result of bio-chemical effects on the
performance of the vocal mechanism. On the other hand a person's health or
psychological state would not be encoded through the linguistic mechanism
(utterances such as "I've got a throat like sandpaper" or "I'm really depressed"
encode a cognitive analysis of the states giving rise to the indexical
information, not the indexical information itself).
Between these extremes there will be indexical information which the
listener may be able to glean from the speaker's linguistic resources and from
acoustic effects encoded in the speech wave directly by the vocal tract. Age,
for instance, has direct effects on the physiology of the vocal tract, including a
lowering of the larynx toward middle age which results in a deepening of the
voice, and a hardening of the vocal cords in old age which contributes to a
very old person's characteristically quavery voice. But age may also be
indicated by aspects of a person's linguistic mechanism, for instance the use of
particular words, such as "wireless" rather than "radio"; the use of slang
expressions, which have a notoriously short life-span (e.g. "groovy", "far out");
and, of more interest to phonetics, the use of particular sounds and
pronunciations. It is less apparent that pronunciation, as opposed to words and
expressions, changes within a lifetime; but there is no doubt that it does.
Informally we may be aware of this listening to sound recordings (e.g. in
films) from some decades ago. Speakers using an educated (i.e. prestige)
pronunciation of South East England were much more likely to pronounce "off"
as "awf" (as in "awful"), and much less likely to use glottal stops (see Chapter N)
than their equivalents today. Speakers may modify their pronunciations to
keep up with the developing trends of their speech community, but in general
they get left behind enough for pronunciation to be informative about their
age.
A speaker's physique is frequently reflected in the speech signal. Large
objects have lower natural pitches (compare a violin and a cello). Larger vocal
tracts have lower resonances, and larger vocal cords vibrate more slowly.
Since there is a tendency for size of vocal tract and vocal cords to correlate
with size of person, we have a better than chance ability to guess which of two
speakers whose voices we hear is larger. Physique leaves its imprint on the
speech signal directly through the vocal tract.
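The relation between tract size and resonance can be illustrated with a rough calculation. The sketch below rests on a simplification that is not part of the text above: the vocal tract is treated as a uniform tube closed at one end (the larynx) and open at the other (the lips), so that its resonances fall at odd multiples of c/4L. The function name, the speed-of-sound figure, and the two tube lengths are illustrative assumptions, not measurements.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed round figure for warm, moist air


def tube_resonances(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical lengths, chosen only to contrast a longer and a shorter tube.
longer_tract = tube_resonances(17.5)
shorter_tract = tube_resonances(14.0)
```

Under these assumptions every resonance of the longer tube lies below the corresponding resonance of the shorter one, which is the direction of the effect described above. A real vocal tract is not a uniform tube, so the figures indicate a tendency only.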
Sex might be regarded as a special case of physique, and it can be inferred
with fair reliability. Apart from men's and women's differing (though
overlapping) ranges of vocal tract and vocal cord sizes, there are also
differences in the proportions of the vocal tract (concerning the pharynx,
which is proportionately longer in men) which may help the cuing of sex.
However there are also said to be languages where women speak a different
dialect, either in terms of pronunciation or grammar, and there are well
attested differences in pronunciation trends between the sexes in English (see
e.g. Trudgill 1974: 84ff). If it is truly the case that men and women have
different dialects, rather than merely being free to make choices from a shared
linguistic system, then sex is imprinted not only through the vocal mechanism
but also through the linguistic mechanism.
1.4.3 Speaker identity and other problems
This overview of the mapping of information onto the
speech signal should be seen as a suggestive outline rather than a definitive
analysis. Where, for instance, does personality fit in? There is evidence that
personality traits correlate with certain features of speech, for instance that
extroversion is associated with greater loudness (cf. Scherer 1979:191), but is
this a matter of self-presentation, chosen by the speaker, or is it purely
indexical, determined in a biological way? Nor are the boundaries between
categories as clear-cut as the boxes in Fig. 1.4 would suggest. Adjusting one's
way of speaking to make it more like that of an addressee may involve
changes in accent, apparently of a social kind; but it may be perceived as a cue
to affective information, equivalent to the use of a friendly voice quality and
intonation.
Another question is how accurate hearers are at utilising information carried
by the speech signal. Undoubtedly many of the factors discussed are often
inferred quite inaccurately. We can be surprised to discover the real age of
someone who sounded quite young over the phone; we can inadvertently butt
in to another person's conversational turn thinking they have signalled the
end of their own turn; and of course we can even misunderstand the
cognitive content of speech by mishearing a word or misparsing a sentence.
A surprising omission, perhaps, from the list of indexical factors, is
identity. In everyday life, particularly over the telephone, we successfully
identify a person just from speech, and techniques for identifying speakers for
forensic and other purposes (see Chapter N) exploit information about identity
in the speech signal. The omission reflects the problems surrounding the
concept of identity. It can mean the biological entity constituting a person
(this is the sense which is of relevance to forensic applications), or for instance
a person's membership of certain subgroups of the population. The biological
entity is what a fingerprint defines, with very few exceptions. But the
indirectness of the relationship between the biological person and the speech
signal, and the multiplicity of information mapped onto it, make it
questionable whether much information about the biological individual is
available. An alternative is that our sense of hearing a particular person is
actually derived from other indexical factors. If one's uncle Herbert is a large
middle-aged man from Birmingham with chronic hoarseness, then those
factors in themselves suffice for an identification when his habitual Sunday
morning telephone call arrives.
The possibility of voice disguise, and of mimicry, highlights further the
limitations of a model such as the one in Fig. 1.3. Some issues can be dealt
with, albeit imperfectly. What kind of communicative intent is involved?
Perhaps a special case of self-presentation. What linguistic resources are being
manipulated? Presumably all four may be, though we tend in particular to
think of disguise and mimicry primarily in terms of phonology and tone of
voice (adopting a different accent, and speaking loudly and fast, for instance).
But the model does not really allow for that part of disguise or mimicry which
consists in distorting one's vocal mechanism to make it sound like that of
someone else (known in the case of mimicry or unknown in the case of
disguise). This is not part of the linguistic mechanism (the shared conventions
of the language community hardly include the recipe for sounding like each
person in that community), and the model fails to provide a direct input from
intention to the vocal mechanism.
So in this respect, and many others, the model is imperfect. Nonetheless it
does show how the speech signal at any given moment is determined by a
wide variety of factors, and how it is potentially informative to a hearer in
many different ways, intended and unintended by the speaker. The next
section compares the ways in which different kinds of information are carried
by the speech signal.
1.5 Gradience, discreteness, and componentiality
Imagine someone asking "Have you ever air-mailed a miniskirt to
Iceland?" Imagine, too, that the speaker has a cold. As a result of the utterance
the hearer should have gleaned two very different kinds of information. The
first comprises the speaker's communicative intent, the cognitive content of
which is almost certainly novel to the hearer. The second comprises the
speaker's state of health, about which the speaker has unintentionally informed
the hearer. The way the speech signal carries these two kinds of information
contrasts in a number of respects.
As discussed in section 1.4, the cold imprints itself rather directly on the
speech signal through its effect on the vocal mechanism: the blocked nose
and inflamed vocal cords. "Miniskirt" will sound a bit like "bidiskirt" because
of the blocked nose, and the whole utterance may sound rather as if it came
from a talking frog. The imprinting can vary, but only in the sense that the
severity of the cold will be reflected in the degree of distortion of the speaker's
normal voice. The imprinting of the cold on the speech signal is gradient, that
is, it varies continuously.
If the listener chose to respond not to the communicative intent but to the
indexical information, the response, depending on the severity of the cold,
might be "You've got a cold", or "You've got a bad cold", or "You've got a
terrible cold" or even "You've got a really terrible cold". This illustrates that
the language code is not gradient but works in terms of choices which are
discrete. The respondent either uses an adjective, or not, chooses the word
"bad", or "terrible" or some other word, and so on. This discreteness is central
to how language works. It is as though language partitions our mental
experience, and allocates a label or symbol to stand for each partition.
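The contrast between gradient imprinting and discrete encoding can be made concrete with a toy model. In the sketch below, the 0-to-1 severity scale, the function names, and the equal-width category boundaries are all hypothetical; only the four responses are taken from the text. It is an illustration of the distinction, not a claim about how speakers actually choose their words.

```python
def distortion(severity):
    """Gradient imprint: the output varies continuously with the input."""
    return max(0.0, min(1.0, severity))


# Discrete choices offered by the language code (wording taken from the text).
RESPONSES = [
    "You've got a cold",
    "You've got a bad cold",
    "You've got a terrible cold",
    "You've got a really terrible cold",
]


def verbal_response(severity):
    """Discrete encoding: every severity falls into one of a few categories."""
    clamped = max(0.0, min(1.0, severity))
    # Assumed equal-width bins; the last bin also absorbs severity == 1.0.
    index = min(int(clamped * len(RESPONSES)), len(RESPONSES) - 1)
    return RESPONSES[index]
```

Two slightly different severities, say 0.30 and 0.45, produce two different degrees of distortion but the same verbal response. That is the sense in which the imprint is gradient while the code partitions the same continuum into labelled categories.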
Let's imagine now a simplified language, in which each partition is a
complete meaning, and the symbol standing for each meaning is a unique,
simple sound. The meaning "I'm exhausted" might be conveyed by a long
"fffff"; "You bore me" by "ssssss"; "look at this" by "ooooo", and so on. This situation
is schematised below, where geometric symbols are used instead of sound
symbols.
[Schematic: each meaning (M1, M2, M3, M4) is paired directly with its own symbol/sound]
But what about "Have you ever air-mailed a miniskirt to Iceland?" Despite the
versatility of our vocal tracts, we would soon run out of adequately distinct
(and memorable) noises to convey the infinite variety of messages we might
need to convey. Not surprisingly, no human language uses this kind of direct
mapping of messages onto sound.
Instead, all human languages exhibit componentiality: they construct larger
units of communication out of smaller discrete components. At one level,
roughly speaking, areas of our mental experience are mapped onto a finite
number of words (the vocabulary of the language). At another level, these
words are mapped onto a comparatively small set (often fewer than 50) of
meaningless sound units. This is represented schematically as follows:
[Schematic: meanings (M1, M2, M3, M4, ... M?) are mapped onto words, and words onto a small set of symbols/sounds]
The meaningless sound units are often called phonemes and are what are
represented, albeit often inconsistently, by letters in alphabetic writing. So for
instance in "pin", "tin", "kin", we have three different words composed of the same
sound units except for the first one which crucially differentiates them. By
adding "s" to the beginning of each of these sequences, three new words are
created ("stin", in fact, is not an existing word of English, but is one which could
be adopted if needed). The crucial fact that all human languages associate
meanings with abstract units (words or morphemes) and designate these units
by sequences of discrete meaningless sound elements is sometimes known as
the dual structure of language. The meaningful units can then be combined
into an infinite number of more complex sequential structures (phrases,
sentences) according to the syntax of the language in order to convey a
limitless set of complex meanings including that of the novel and improbable
utterance "Have you ever air-mailed a miniskirt to Iceland?". More precisely,
dual structure may be taken to refer to two levels of structure, the
grammatical level and the phonological level (Laver 1994:18). Meaningless
sound elements at the phonological level serve as building blocks which
designate meaningful structures at the grammatical level. Componentiality and
sequential structure thus circumvent the problems that would be encountered if
a distinct sound had to be found for every message we wanted to
communicate, and they are taken here as definitional properties of language.
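A little arithmetic shows why componentiality removes the need for one distinct sound per message. The sketch below is a toy under stated assumptions: the inventory size of 40 is a hypothetical figure within the "fewer than 50" range mentioned above, letters stand in for sound units, and the count ignores the restrictions any real language places on which sequences are permitted.

```python
def count_sequences(inventory_size, max_length):
    """Number of distinct sequences of 1..max_length units from the inventory."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))


# The pin / tin / kin example: three words differing only in their first unit.
onsets = ["p", "t", "k"]
words = [onset + "in" for onset in onsets]

# Prefixing "s" gives three further sequences; "stin" is possible but unused.
with_s = ["s" + word for word in words]

# Hypothetical inventory of 40 units, sequences of up to 5 units.
possible_forms = count_sequences(40, 5)
```

With these assumed figures there are over a hundred million possible forms of five units or fewer, so a small inventory of meaningless units already supplies far more distinct word shapes than any vocabulary needs. Syntax then combines the words themselves into an unbounded set of utterances.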
But it is not the case that the speaker has no control over non-componential,
gradient aspects of speech. The listener could, for instance, emphasise the
severity of the respiratory ailment perceived in the speaker by saying "You've
got a REALLY TERRIBLE COLD!", that is, by giving the words in capitals
extra pitch variation and length, and perhaps speaking them with a breathy
voice, as if gasping at the seriousness of the situation. These uses of extra
pitch and length, and of breathy voice, do not share the properties with the
units of language of being discrete elements which can be structured
sequentially, and so are not regarded as part of language itself. They are
considered here to be choices from a linguistic resource of tone of voice lying
outside the language system, but closely allied to it.
Tone of voice may mimic indexical effects. A person genuinely shocked
may undergo physiological changes which make an extended pitch range and a
breathy voice an automatic consequence. But in the present example the vocal
effects are deliberately chosen by the speaker, and are part of the conventions
shared by the speaker and the hearer. Tone of voice perhaps constitutes an
exploitation and incorporation into the linguistic mechanism of the kind of
gradual vocal effects which naturally inform of certain indexical facts.
1.6 Sameness and variation
A language, as an abstract code, can be embodied in more than
one medium or physical carrier. Speech is one, and writing another. The
written medium, as exemplified in the words on this page, clearly reflects the
componentiality of language. Words are separated by spaces, and made up of
letters which are discrete elements. Each occurrence or token of a letter is an
exact replica of the other occurrences of its type. So for instance in 'big dogs
scare her', the spaces reflect the linguistic structure of an utterance consisting
of four words, and the two tokens of the letter <g> are identical, as are the two
of <s>. The structure of the language system is closely mirrored in its
realisation in the printed medium.
In speech, there is a potential conflict between the discrete, componential
structure of language, and the natural behaviour of the vocal mechanism. Like
a gymnast, the vocal mechanism does not, indeed could not, move abruptly
from one static posture to another. The laws of physics, governing the
velocity, acceleration, and momentum of the gymnast's body, or the vocal
organs, conspire against such abruptness. The gymnast, and the vocal
mechanism, produce a flowing movement. This can be seen reflected in the
spectrogram of the phrase 'a real worry' in Fig. 1.6 [p.20]. A spectrogram is a
way of displaying an acoustic analysis of speech produced by a computer or
other device. In the picture, time runs from left to right, so the start of the
utterance is at the left, while the vertical axis shows the breakdown of the
complex speech signal into the different pitches or frequencies which make it
up. The darker the pattern, the more sound there is at a particular frequency.
Note that there are no breaks between the words, and there are no successive
static patterns corresponding to successive sounds. This is a carefully selected
example, but it vividly demonstrates the fact that in the spoken medium the
discrete units of language are being realised in a continuous and flowing event.
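The kind of analysis a spectrogram displays can be sketched in a few lines. The example below is a minimal illustration, not the method used to produce Fig. 1.6: it synthesises a tone whose frequency glides smoothly upwards (a stand-in for real speech), cuts it into short overlapping frames, and computes the strength of each frequency in each frame. The sample rate, frame length, and all function names are assumptions made for the illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def synth_glide(duration=0.2, f_start=500.0, f_end=1500.0):
    """Synthesise a tone whose frequency glides smoothly from f_start to f_end."""
    n = int(duration * SAMPLE_RATE)
    samples, phase = [], 0.0
    for i in range(n):
        freq = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * freq / SAMPLE_RATE
        samples.append(math.sin(phase))
    return samples


def spectrogram(samples, frame_len=256, hop=128):
    """Return one magnitude spectrum per frame (rows: time, columns: frequency)."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (frame_len - 1))
              for k in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + k] * window[k] for k in range(frame_len)]
        spectrum = []
        for b in range(frame_len // 2):  # frequencies up to half the sample rate
            total = sum(frame[k] * cmath.exp(-2j * math.pi * b * k / frame_len)
                        for k in range(frame_len))
            spectrum.append(abs(total))
        frames.append(spectrum)
    return frames


def peak_frequencies(frames, frame_len=256):
    """Frequency (Hz) of the strongest component in each frame."""
    return [max(range(len(s)), key=s.__getitem__) * SAMPLE_RATE / frame_len
            for s in frames]


if __name__ == "__main__":
    for t, f in enumerate(peak_frequencies(spectrogram(synth_glide()))):
        print(f"frame {t:2d}: strongest frequency about {f:.0f} Hz")
```

Printing the strongest frequency per frame shows it rising steadily from frame to frame rather than jumping between fixed values, which is the numerical counterpart of the unbroken patterns described above.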
In the written medium of language, it is fluent (even sloppy) handwriting
rather than printing or typescript which provides a slightly closer analogy with
speech. In a handwritten realisation of 'big dogs scare her',
although the words are still separated (unlike in speech), the letters within the
words are not. More interestingly the two occurrences of <g> and <s> are
different. In the second <g>, the descender loops back up again across itself on
the way to the following letter, and in the second <s>, there is no diagonal
rising stroke at the start of the letter, as seen below:
The marks on the paper corresponding to a particular letter-type vary
according to the context in which they occur. Everyone's handwriting is
different, but most will exhibit equivalent examples of contextual variation.
Some of the variation can be interpreted as short-cuts, but in other cases the
contextual variation may be purely a matter of habit or convention.
Something very similar is true of speech. A sound will vary in its
pronunciation according to adjacent sounds and its position. Both instances of
/k/ in khaki (the <h> in the spelling is irrelevant to the English pronunciation)
require the body of the tongue to make a closure against the roof of the mouth,
but the first will be further back in the mouth than the second. This is in
keeping with the fact that the first vowel requires the body of the tongue to be
near the back of the mouth, and the second requires it to be raised at the front
of the mouth. The difference can be felt best if the word is said silently. If the
word is whispered, emphasising the explosion, or 'burst' in phonetic terms, of
each /k/, the second burst will be heard as higher in pitch. This is because
the smaller mouth cavity in front of the second release has higher resonant
frequencies.
1.7 Phonology
The situation would be chaotic if such variation were random. We
could never be sure which linguistic unit a particular burst was representing.
What is needed is a set of conventions which express what variation is
allowable with the representatives of a particular unit, and what is not. We can
consider this with an abstract example using geometric shapes, which we
could imagine to be elements in a visual code. The shape on the left of the
arrow represents the abstract (or ideal) element of the code. The arrow means
'the following ways of writing the symbol are acceptable', while the crossed-out
arrow means 'these variants are not acceptable'.
In fact, it is neither necessary nor feasible to list what is unacceptable; instead
we just assume that everything else not mentioned in the positive part of the
rule is not acceptable.
Much the same sort of rules can be expressed for sounds. The following
rules, which simplify the situation, express what the khaki case exemplified:
/k/ → [k̠ʰ] / __ [ɑː]
/k/ → [k̟ʰ] / __ [i]
This means that the linguistic unit or phoneme /k/ can be made at the back of
the mouth or nearer the front (the minus and plus under the symbols mean
further back and further forward in the vocal tract respectively) depending
on the vowel following. In fact a whole range of realisations between these is
possible. Notice that the abstract linguistic unit, the phoneme, is shown in
slants / /, while its realisation is shown in square brackets [ ]. This is an
important convention in phonetics. The variants realising a phoneme are called
allophones.
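Rules of this kind amount to a lookup from a phoneme and its context to an allophone. The following is a minimal sketch under stated assumptions: the rule table, the function name, and the vowel symbols used for khaki are illustrative choices, and only the two contexts discussed above are covered. Anything not listed in the table is simply passed through unchanged, so the sketch says nothing about how other sounds vary.

```python
# Allophones of /k/, written with explicit code points so the diacritics are unambiguous.
RETRACTED_K = "k\u0320\u02b0"  # aspirated [k] with the "minus below" (further back) mark
ADVANCED_K = "k\u031f\u02b0"   # aspirated [k] with the "plus below" (further forward) mark

BACK_VOWEL = "\u0251\u02d0"    # assumed symbol for the first vowel of khaki
FRONT_VOWEL = "i"              # second vowel of khaki

# Hypothetical rule table: (phoneme, following segment) -> allophone.
ALLOPHONE_RULES = {
    ("k", BACK_VOWEL): RETRACTED_K,
    ("k", FRONT_VOWEL): ADVANCED_K,
}


def realise(phonemes):
    """Map each phoneme to an allophone chosen by the segment that follows it."""
    realised = []
    for index, phoneme in enumerate(phonemes):
        following = phonemes[index + 1] if index + 1 < len(phonemes) else None
        # Segments with no matching rule are left as they are.
        realised.append(ALLOPHONE_RULES.get((phoneme, following), phoneme))
    return realised


KHAKI = ["k", BACK_VOWEL, "k", FRONT_VOWEL]

if __name__ == "__main__":
    print(" ".join(realise(KHAKI)))
```

The input is the same abstract unit /k/ in both positions; only the output differs, which is the distinction the slants and square brackets are used to mark.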
Such a rule reconciles a coding system using invariant abstract elements
with the pressures towards variability arising from the physical constraints of
the vocal mechanism. Phonology is the part of the language system which
deals with sound patterns, including patterns of variability. Like other terms
(syntax, the lexicon etc.) describing parts of the language system,
phonology can mean both the phenomena we hypothesise to be part of
language, and the scientific study of those phenomena.
The phonology of a language also consists of its inventory of phonemes
(sound-units), and principles governing their combination. For instance the
phonology of many varieties of English specifies seven short-vowel
phonemes, as exemplified in the contrasting words pit, pet, pat, pot, putt, put,
and the first vowel of potato. And it specifies that spray, stray, and scray are
all good sequences of phonemes (scray happens not to be a word, but could
be), while fpray, stlay, and tsray are ill-formed.
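A combination principle of this sort can be written as a small well-formedness check. The sketch below encodes one simplified, assumed generalisation about three-consonant word beginnings in English: /s/, then a voiceless stop, then one of /l r w/, with a few excluded pairings. It operates on phoneme symbols rather than spelling (so scray begins /skr/), and it is an illustration of the idea, not a full statement of English phonology.

```python
VOICELESS_STOPS = {"p", "t", "k"}
APPROXIMANTS = {"l", "r", "w"}
# Pairings assumed not to occur after /s/ at the start of a word.
EXCLUDED_PAIRS = {("t", "l"), ("p", "w"), ("t", "w")}


def is_well_formed_onset(first, second, third):
    """Check a three-consonant word beginning against the simplified rule."""
    if first != "s":
        return False
    if second not in VOICELESS_STOPS or third not in APPROXIMANTS:
        return False
    return (second, third) not in EXCLUDED_PAIRS


if __name__ == "__main__":
    examples = {
        "spray": ("s", "p", "r"),
        "stray": ("s", "t", "r"),
        "scray": ("s", "k", "r"),
        "fpray": ("f", "p", "r"),
        "stlay": ("s", "t", "l"),
        "tsray": ("t", "s", "r"),
    }
    for word, onset in examples.items():
        verdict = "well-formed" if is_well_formed_onset(*onset) else "ill-formed"
        print(f"{word}: {verdict}")
```

Note that scray passes the check although it is not a word: the rule describes possible sequences, not the actual vocabulary.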
Phonology also has a role to play in the combination of morphemes.
Morphemes are the units which are used in word-building, such as point and
-less. Although there is arguably a constant relationship between a morpheme
and a meaning, there can be variability in the relationship between a
morpheme and its pronunciation. English provides many examples of this, but
the spelling often disguises them. So when the noun-forming suffix -ity is
added to the adjective electric to make electricity, the [k] at the end of electric
changes to [s] and the stress shifts from -lec- to -tric- ([ɪˈlɛktrɪk] → [ɪlɛkˈtrɪsɪti]).
Like the variation in /k/ according to what vowel followed it, this is another
example of invariant units of a language receiving variable realisation. The
part of phonology which specifies these patterns of variation in morphemes is
sometimes called morphophonology.
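The electric/electricity alternation can be sketched as a rule applied when a suffix is attached. The example below is a deliberately narrow illustration: the transcriptions are assumed, the function name is hypothetical, the stress shift is not modelled, and the single hard-coded rule (stem-final /k/ becomes /s/ before -ity) is not claimed to hold for every stem ending in /k/.

```python
# Assumed phoneme sequences, with stress marks omitted.
ELECTRIC = ["ɪ", "l", "ɛ", "k", "t", "r", "ɪ", "k"]
ITY = ["ɪ", "t", "i"]


def attach_ity(stem):
    """Attach the suffix -ity, changing a stem-final /k/ to /s/."""
    stem = list(stem)  # work on a copy so the stored morpheme stays invariant
    if stem and stem[-1] == "k":
        stem[-1] = "s"
    return stem + ITY


if __name__ == "__main__":
    print("".join(ELECTRIC))
    print("".join(attach_ity(ELECTRIC)))
```

The stored form of the morpheme is left untouched; only its realisation in the derived word changes, mirroring the invariant-unit, variable-realisation pattern described above.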
Various aspects of phonology are dealt with in Chapter N. In the next
Chapter, however, the focus switches from the linguistic mechanism to the
vocal mechanism as we examine how it generates the speech signal.
Fig. 1.6 Spectrogram of 'a real worry', with speech wave at the top and pitch trace at the
bottom