Nature of Speech

  • 8/3/2019 Nature of Speech

    1 The nature of speech

    1.1 More than a word

    "Darling?", came an anxious voice as he edged open the

    unlocked door to the darkened hotel room. He knew at once it was not his

    wife's voice. It was the voice of a younger woman, an Australian, he

    thought.

    Sadly, perhaps, the rest of this book will not reveal whether our hero was a

    leading scientist about to be seduced by a beautiful spy, or just the kind of

    average fellow who is prone to get out of the hotel lift at the wrong floor

    without noticing and blunder into someone else's room. Instead, the book will

    concentrate on an ultimately more intriguing mystery: how speech conveys

    information. This, broadly, is the subject matter of the discipline of phonetics.

    In the fictional example above the unseen woman spoke only one word. It is

    clear, however, that as she did so a number of quite different kinds of

    information were conveyed. The hearer identified a particular word ("darling")

    of the language he shares with the speaker. As indicated by the question mark,

    he interpreted it to be a query of some kind. Something about the utterance

    told the hearer that the speaker was anxious; perhaps it was spoken quickly,

    quietly, and a little breathily. He knew immediately from the voice that the

    speaker was not his wife; nor, presumably, anyone else he knew very well. But

    even so he was able to infer, with at least a fair degree of confidence, further

    facts about the unknown speaker: her sex, her age, and her geographical

    background. Unlike the word, and its function as a query, these other kinds of

    information cannot be directly represented in the way an utterance is written,

    and so an author has to resort to commentary to convey them.

    The main point, then, is that any utterance conveys a number of distinct

    types of information. Two further points need to be made. Firstly, not all types

    of information conveyed by an utterance are equally intended by the speaker.

    Speakers clearly do intend to say particular words appropriate to given

    contexts; but it would be odd to suggest that they normally intend to sound

    like a tired young woman, or a large man with a cold. The pair of terms

    "communicative" and "informative" have been used for this kind of distinction

    (Lyons 1977:33). Any aspect of an utterance is informative if it potentially

    makes the hearer aware of something. Only those aspects which are intended

    by the speaker to be informative are communicative.

    Secondly, whilst the various types of information are distinct, they are all

    conveyed by a single complex speech signal. The speech signal is the physical

    link between a speaker and a hearer (the only one if they are out of sight and

    touch of each other). It consists of very rapid pressure variations in the air,

    caused by the speaker's speech organs, and sensed by the hearer's ear.

    Although the speech signal can be analysed in terms of a number of separate

    acoustic dimensions, corresponding to what we perceive as, for instance, pitch,

    loudness, rhythm, and so on, it is far from being the case that each type of

    information will be carried by its own acoustic dimension. Any acoustic

    dimension will help to carry a variety of information. The pitch of the speech

    signal, for instance, might tell a listener that the utterance is a question, and

    that the speaker is a man, and that he is bored. Humans are very skilled at

    unravelling the information in the speech signal, but the difficulty of building

    a machine that will replicate some of this skill (an automatic speech

    recogniser, for example) demonstrates the complexity of the way in which

    information is represented in the speech signal.

    1.2 Information carried by the speech signal

    A speaker is often described as communicating a message. Since

    a speaker in fact uses the speech signal to convey a variety of information, it is

    better to think of the speaker having not a simple message but a complex

    communicative intent. This is made up of a number of distinct kinds of

    information.

    Cognitive information is essentially factual, or propositional; it consists of

    things we know, or could know, as opposed, for instance, to how we feel.

    Words, and their combination into phrases and sentences, are the primary

    vehicle of cognitive information, and it is the kind of information which

    writing copes with best.

    Affective information has to do with a speaker's feelings and attitudes. In

    everyday terms a speaker's "tone of voice" is one way of conveying affective

    information, as in "I didn't mind what he said to me, it was his patronising

    tone which I objected to". Choice of words, too, can be important.

    Social information in speech might seem to be something the speaker does

    not choose, and therefore to be merely informative rather than communicative.

    We think of speakers having an accent which indicates they belong to

    particular geographically and socially defined communities, and generally see

    this as an unalterable attribute of the person. But in fact most speakers vary

    their way of speaking according to the situation and their addressee(s). A

    speaker may also adjust his or her accent in the direction of that of another

    person, probably as a way of indicating friendliness or solidarity with that

    person.

    Self-presentational information concerns the speaker's self-image. A

    speaker who wishes to present an authoritative, knowledgeable persona to the

    world may adopt a confident tone of voice (probably relatively loud, and

    involving a moderate degree of muscular tension and clear, precise

    pronunciation).

    Finally here, though this list does not necessarily exhaust the kinds of

    information a speaker may choose to convey in an utterance, is regulative

    information. This concerns the management of a spoken interaction. For a

    conversation to proceed smoothly there have to be some "traffic rules", to

    avoid the counterproductive situations of both participants speaking

    simultaneously or being silent simultaneously. The participant who is

    speaking will encode signals in the utterance to communicate that he or she is

    in full flow, and shouldn't be interrupted (maybe by speaking louder and

    speeding up a little), or nearing the end of his or her conversational turn (by

    lowering the voice and slowing).

    In contrast to the above types of information which speakers may intend to

    communicate are those which they convey willy-nilly. The latter kinds of

    information leave their trace in the speech signal without intent on the part of

    the speaker, and are sometimes called indexical because they serve as an

    index or indicator of aspects of an individual (e.g. Abercrombie 1967:6). Such

    aspects include the speaker's social background, age, sex, physique,

    psychological state, and health.

    The routes by which these various kinds of information leave their traces in

    the speech signal will be discussed in section 1.4.

    1.3 The speech machine

    We have considered various kinds of information which originate

    within the speaker, and an external speech signal in which that information is

    conveyed. The machine which produces this signal is made up of two parts,

    the vocal mechanism and the linguistic mechanism. In a sense these are

    rather like a computer and its software. The vocal mechanism is the physical

    device concerned with speaking, while the linguistic mechanism is the

    software which controls it.

    The vocal mechanism is shown schematically in Fig. 1.1, which is like an

    x-ray picture of someone facing to the left. The vocal mechanism consists most

    obviously of the speaker's mouth, throat, nose, larynx, and associated

    structures such as the tongue and the vocal cords. But it also consists of the

    lungs and the muscles which control breathing, since speech requires air; and,

    very importantly, those parts of the brain and nervous system which control the

    rest of the vocal mechanism.

    Fig. 1.1 Sagittal section of the vocal mechanism

    In the production of speech, broadly, air is expelled in a controlled way

    from the lungs, and the airstream is interfered with at various points. These

    various kinds of interference create acoustic energy (sound) of different

    kinds, which is further modified by the shape of the vocal tract (the air passage

    through the mouth and nose). These processes will be dealt with in detail in

    Chapter 2.

    The linguistic mechanism includes the speaker's language, but goes

    beyond what is most commonly thought of as "a language". The linguistic

    mechanism consists of the set of conventions which the speaker shares with

    the relevant language community. These conventions range over aspects such

    as the following (assuming, say, that the relevant language community is

    English speaking and in the South East of England): that the word for a canine

    quadruped is "dog"; that adjectives come before nouns ("a big dog", not "a dog big");

    that a rising pitch towards the end of an utterance often signals a question; that

    shifting the pronunciation of the word "time" in the direction of "toym" is "less

    educated", while shifting it in the direction of "tame" is "posh" or "affected"; and

    that the use of a whispery voice can mark what is being said as in some way

    confidential. The last two of these conventions would be excluded by many

    definitions of "a language", but are an integral part of the linguistic mechanism

    as a whole.

    Linguistics gives us a more structured way of looking at some of these

    conventions. Fig. 1.2 shows several components of the linguistic mechanism.

    These could be thought of as a set of resources at the speaker's disposal. The

    lexicon is the mental dictionary shared by speakers of a language, linking

    meanings and pronunciations, and including grammatical information.

    Arguably it is organised on the basis of morphemes, meaningful sub-word

    elements such as "point", "-ing", "aim", "-less", "-ly", and a set of rules for combining

    them into words such as "aiming" and "pointlessly". Syntax is a set of

    conventions governing the combination of words: grammar in its most

    familiar sense. It is responsible for the sense the English speaker has, for

    instance, that "pass me the butter" is a usable combination of words, and "me

    butter the pass" is not. Phonology is a set of conventions specifying how a

    language organises sound. The fact that English has a "th" sound as in "thin" and

    French does not, or that "palm" and "farm" rhyme in some varieties of English and

    not in others, are two small examples of phonological differences between

    languages or varieties. Prosody provides a set of conventionalised patterns of

    pitch and timing which can signal the organisation of the words of an

    utterance, refine the meaning of the utterance, and organise sounds within

    words. So in "You're broke again? You NEVER have any money!" the punctuation

    partially captures the prosody (the utterance is organised into two parts; the

    first has a questioning role despite its declarative syntax, and the second is

    spoken with marked emphasis); and the capitals convey a particular

    prominence lent to "never" by prosodic features. Not indicated in the written

    form is the organisation of "never" into a more prominent part "nev-" and a less

    prominent part "-er".

    Tone of voice lies outside the usual definition of language (see 1.5), and

    provides for the communication of additional information in ways such as

    loudness, voice quality, and whisper. Core language relies on mapping

    meaning via discrete abstract categories. Tone of voice involves a more direct

    signalling system where gradual changes in meaning are mapped onto gradual

    changes in phonetic signals. Increasing anger may correlate gradiently with

    increasing loudness. Likewise, different pitch ranges may express a continuum

    of involvement or enthusiasm across utterances of the words "Oh, that's great".

    The categorical and gradient elements of the linguistic mechanism are

    represented in Fig. 1.2 by the two cylinders abutting over prosody. This

    acknowledges that part of prosody, for instance intonation pitch range, is not

    categorical.

    Fig. 1.2 The linguistic mechanism

    There must be a point of contact between the linguistic mechanism and the

    vocal mechanism. In the view adopted here, that point of contact is the

    phonetic plan of an utterance. From the speaker's point of view this is a

    specification of all the sound properties which the vocal mechanism will have

    to achieve during the utterance. It is also likely that the listener will have to

    derive something similar to the phonetic plan as a stage in interpreting the

    utterance. The exact nature of the phonetic plan is a difficult issue, and

    depends on assumptions about how speech is produced and perceived.

    1.4 The mapping of information onto the speech signal

    This section combines the conceptualisation of the speech

    machine developed in section 1.3 with the different sources of information

    discussed in section 1.2. The purpose is to show the variety of routes by which

    information gets into, or is mapped onto, the speech signal. Fig. 1.3 gives an

    overview of the mapping process.

    Fig. 1.3 Overview of the mapping of communicative intent

    1.4.1 The encoding of communicative intent

    Communicative intent is shown at the top of Fig. 1.4. Its mapping

    onto the speech signal is mediated by the linguistic mechanism. The linguistic

    mechanism can be thought of as defining a kind of code shared between

    speakers of a language, and the process of mapping communicative intent onto

    the speech signal as encoding. There is not, however, a simple one-to-one

    relation between aspects of communicative intent and the distinct resources of

    the linguistic mechanism. Cognitive information, for example, is not mapped

    exclusively through lexical choices, and affective information is not conveyed

    solely through choices of tone of voice.

    Fig. 1.4 Mapping of communicative intent onto the linguistic mechanism

    Communicating cognitive information, the most factual, message-like

    element behind speaking, depends on selecting appropriate words, combining

    them into grammatical structures, choosing the right prosody (e.g. question or

    statement), and, less obviously, using the right tone of voice (it is possible to

    override the apparent meaning of an utterance by using an ironic or

    sarcastic tone of voice).

    Affective communicative intent (the speaker's attitude) can likewise be

    conveyed in multiple ways: by choices of prosody and tone of voice certainly,

    but also by the words chosen, and perhaps by syntax (the lines are not shown

    in Fig. 1.4 to avoid a visual cat's cradle). Social intent may affect any of the

    four resources, since finding a way of speaking appropriate to a particular

    social setting may involve the precise variety of a language chosen, and a

    particular tone of voice.

    Self-presentation may depend on choosing the right words, opting for more

    or less complex syntax, and making phonological and tone-of-voice choices.

    Regulation of an interaction perhaps recruits fewest resources, the

    completeness or incompleteness of a turn being communicated to interlocutors

    mainly by prosodic choices and tone of voice.

    1.4.2 The imprinting of indexical factors

    It would be less appropriate to talk of indexical factors being

    encoded, as there is no intention on the part of the speaker, and no obvious

    code. The metaphor used by Laver (1994:20-21) is that of a handworker

    producing artifacts, and leaving traces of the apparatus used to produce the

    artifact and of his or her personal style. Both the apparatus and the style can

    leave what will be called here their imprint. A cast metal object might have a

    detectable seam where two halves of the mould in which it was cast joined,

    and a particular detail of working in its finish characteristic of the individual

    who worked it. Indexical factors carried in the speech signal will be regarded

    as the result of a similar kind of imprinting.

    Fig. 1.5 Imprinting of indexical factors on speech

    At the left of Fig. 1.5 indexical factors are shown divided into two

    overlapping sets, according to whether their imprint is left mainly via the

    linguistic mechanism or the vocal mechanism. Consider first two extreme

    cases. We can tell social indexical information, including geographical, from a

    person's dialect or accent, as did our hero in section 1.1 when he guessed

    "Australian". A dialect consists in features of the linguistic mechanism specific

    to a given geographical and/or social group. Dialect is often taken to refer

    more broadly to any linguistic resource, including for instance grammar ("I

    don't know nothing about it" is grammatical in many dialects of English) and

    vocabulary ("to laik" means "to play" in many parts of the North of England),

    whilst accent refers specifically to regular pronunciation differences, such as

    the use of a particular set of vowels, or the use of a glottal stop for certain

    consonants. Admittedly it was pointed out above that people have some

    socially oriented communicative choices in how they speak, but for most

    people such choices only cover a tiny part of the total range of variation which

    exists in a particular language, and so individuals are likely to reveal

    themselves reliably as for instance a Texan, or a middle class Liverpudlian. It

    is fairly difficult, however, to imagine how such indexical information would

    have an effect directly on the vocal mechanism.

    Contrast the case of a speaker's health. A simple cold can have a drastic

    effect on the state of the vocal apparatus: a blocked nose makes it hard to

    produce words like "man" properly, which contain nasal consonants, and

    inflammation of the vocal cords makes the whole voice sound croaky. In the

    longer term, persistent hoarseness can be a cue to serious diseases of the

    larynx such as cancer, and in the short term, normal tiredness will also be

    reflected in aspects of a person's voice. Similarly psychological state, such as

    momentary stress or longer term conditions such as depression, may also leave

    its imprint on the speech signal as a result of bio-chemical effects on the

    performance of the vocal mechanism. On the other hand a person's health or

    psychological state would not be encoded through the linguistic mechanism

    (utterances such as "I've got a throat like sandpaper" or "I'm really depressed"

    encode a cognitive analysis of the states giving rise to the indexical

    information, not the indexical information itself).

    Between these extremes there will be indexical information which the

    listener may be able to glean from the speaker's linguistic resources and from

    acoustic effects encoded in the speech wave directly by the vocal tract. Age,

    for instance, has direct effects on the physiology of the vocal tract, including a

    lowering of the larynx toward middle age which results in a deepening of the

    voice, and a hardening of the vocal cords in old age which contributes to a

    very old person's characteristically quavery voice. But age may also be

    indicated by aspects of a person's linguistic mechanism, for instance the use of

    particular words, such as "wireless" rather than "radio"; the use of slang

    expressions, which have a notoriously short life-span (e.g. "groovy", "far out");

    and, of more interest to phonetics, the use of particular sounds and

    pronunciations. It is less apparent that pronunciation, as opposed to words and

    expressions, changes within a lifetime; but there is no doubt that it does.

    Informally we may be aware of this listening to sound recordings (e.g. in

    films) from some decades ago. Speakers using an educated (i.e. prestige)

    pronunciation of South East England were much more likely to pronounce "off"

    as "awf" (as in "awful"), and much less likely to use glottal stops (see Chapter N)

    than their equivalents today. Speakers may modify their pronunciations to

    keep up with the developing trends of their speech community, but in general

    they get left behind enough for pronunciation to be informative about their

    age.

    A speaker's physique is frequently reflected in the speech signal. Large

    objects have lower natural pitches (compare a violin and a cello). Larger vocal

    tracts have lower resonances, and larger vocal cords vibrate more slowly.

    Since there is a tendency for size of vocal tract and vocal cords to correlate

    with size of person, we have a better than chance ability to guess which of two

    speakers whose voices we hear is larger. Physique leaves its imprint on the

    speech signal directly through the vocal tract.
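
    The claim that larger vocal tracts have lower resonances can be illustrated
    with a rough sketch. It relies on an idealisation that the text itself does not
    introduce: the vocal tract is treated as a uniform tube closed at the vocal
    cords and open at the lips, whose resonances fall at odd multiples of
    speed / (4 x length). The function name, the speed of sound, and the two
    tract lengths are illustrative assumptions, not measurements.

```python
# Idealised model (an assumption, not taken from the text): the vocal tract
# as a uniform tube closed at the vocal cords and open at the lips.
SPEED_OF_SOUND_M_PER_S = 350.0  # approximate value for warm, moist air


def tube_resonances(tract_length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the first `count` resonance frequencies (Hz) of the tube."""
    if tract_length_m <= 0:
        raise ValueError("tract length must be positive")
    # A tube closed at one end resonates at odd multiples of speed / (4 * length).
    return [(2 * n - 1) * speed / (4.0 * tract_length_m) for n in range(1, count + 1)]


# Hypothetical lengths, chosen only to show the direction of the effect:
# the longer tube has the lower resonances.
for label, length_m in [("shorter tract, 0.145 m", 0.145), ("longer tract, 0.175 m", 0.175)]:
    print(label, [round(f) for f in tube_resonances(length_m)])
```

    Real vocal tracts are not uniform tubes, so the figures are only indicative;
    the sketch shows the direction of the relationship, not its exact size.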

    Sex might be regarded as a special case of physique, and it can be inferred

    with fair reliability. Apart from men's and women's differing (though

    overlapping) ranges of vocal tract and vocal cord sizes, there are also

    differences in the proportions of the vocal tract (concerning the pharynx,

    which is proportionately longer in men) which may help the cuing of sex.

    However there are also said to be languages where women speak a different

    dialect, either in terms of pronunciation or grammar, and there are well

    attested differences in pronunciation trends between the sexes in English (see

    e.g. Trudgill 1974: 84ff). If it is truly the case that men and women have

    different dialects, rather than merely being free to make choices from a shared

    linguistic system, then sex is imprinted not only through the vocal mechanism

    but also through the linguistic mechanism.

    1.4.3 Speaker identity and other problems

    This overview of the mapping of information onto the

    speech signal should be seen as a suggestive outline rather than a definitive

    analysis. Where, for instance, does personality fit in? There is evidence that

    personality traits correlate with certain features of speech, for instance that

    extroversion is associated with greater loudness (cf. Scherer 1979:191), but is

    this a matter of self-presentation, chosen by the speaker, or is it purely

    indexical, determined in a biological way? Nor are the boundaries between

    categories as clear-cut as the boxes in Fig. 1.4 would suggest. Adjusting one's

    way of speaking to make it more like that of an addressee may involve

    changes in accent, apparently of a social kind; but it may be perceived as a cue

    to affective information, equivalent to the use of a friendly voice quality and

    intonation.

    Another question is how accurate hearers are at utilising information carried

    by the speech signal. Undoubtedly many of the factors discussed are often

    inferred quite inaccurately. We can be surprised to discover the real age of

    someone who sounded quite young over the phone; we can inadvertently butt

    in to another person's conversational turn thinking they have signalled the

    end of their own turn; and of course we can even misunderstand the

    cognitive content of speech by mishearing a word or misparsing a sentence.

    A surprising omission, perhaps, from the list of indexical factors, is

    identity. In everyday life, particularly over the telephone, we successfully

    identify a person just from speech, and techniques for identifying speakers for

    forensic and other purposes (see Chapter N) exploit information about identity

    in the speech signal. The omission reflects the problems surrounding the

    concept of identity. It can mean the biological entity constituting a person

    (this is the sense which is of relevance to forensic applications), or for instance

    a person's membership of certain subgroups of the population. The biological

    entity is what a fingerprint defines, with very few exceptions. But the

    indirectness of the relationship between the biological person and the speech

    signal, and the multiplicity of information mapped onto it, makes it

    questionable whether much information about the biological individual is

    available. An alternative is that our sense of hearing a particular person is

    actually derived from other indexical factors. If one's uncle Herbert is a large

    middle-aged man from Birmingham with chronic hoarseness, then those

    factors in themselves suffice for an identification when his habitual Sunday

    morning telephone call arrives.

    The possibilities of voice disguise and mimicry highlight further the

    limitations of a model such as the one in Fig. 1.3. Some issues can be dealt

    with, albeit imperfectly. What kind of communicative intent is involved?

    Perhaps a special case of self-presentation. What linguistic resources are being

    manipulated? Presumably all four may be, though we tend in particular to

    think of disguise and mimicry primarily in terms of phonology and tone of

    voice (adopting a different accent, and speaking loudly and fast, for instance).

    But the model does not really allow for that part of disguise or mimicry which

    consists in distorting one's vocal mechanism to make it sound like that of

    someone else (known in the case of mimicry or unknown in the case of

    disguise). This is not part of the linguistic mechanism (the shared conventions

    of the language community hardly include the recipe for sounding like each

    person in that community), and the model fails to provide a direct input from

    intention to the vocal mechanism.

    So in this respect, and many others, the model is imperfect. Nonetheless it

    does show how the speech signal at any given moment is determined by a

    wide variety of factors, and how it is potentially informative to a hearer in

    many different ways, intended and unintended by the speaker. The next

    section compares the ways in which different kinds of information are carried

    by the speech signal.

    1.5 Gradience, discreteness, and componentiality

    Imagine someone asking "Have you ever air-mailed a miniskirt to

    Iceland?" Imagine, too, that the speaker has a cold. As a result of the utterance

    the hearer should have gleaned two very different kinds of information. The

    first comprises the speaker's communicative intent, the cognitive content of

    which is almost certainly novel to the hearer. The second comprises the

    speaker's state of health, about which the speaker has unintentionally informed

    the hearer. The way the speech signal carries these two kinds of information

    contrasts in a number of respects.

    As discussed in section 1.4, the cold imprints itself rather directly on the

    speech signal through its effect on the vocal mechanism: the blocked nose

    and inflamed vocal cords. "Miniskirt" will sound a bit like "bidiskirt" because

    of the blocked nose, and the whole utterance may sound rather as if it came

    from a talking frog. The imprinting can vary, but only in the sense that the

    severity of the cold will be reflected in the degree of distortion of the speaker's

    normal voice. The imprinting of the cold on the speech signal is gradient, that

    is, it varies continuously.

    If the listener chose to respond not to the communicative intent but to the

    indexical information, the response, depending on the severity of the cold,

    might be "You've got a cold", or "You've got a bad cold", or "You've got a

    terrible cold", or even "You've got a really terrible cold". This illustrates that

    the language code is not gradient but works in terms of choices which are

    discrete. The respondent either uses an adjective, or not, chooses the word

    "bad", or "terrible", or some other word, and so on. This discreteness is central

    to how language works. It is as though language partitions our mental

    experience, and allocates a label or symbol to stand for each partition.
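
    The contrast between gradient imprinting and discrete linguistic choice can
    be illustrated with a small sketch. The severity of the cold is treated as a
    continuous value, while the verbal response has to be one of a few fixed
    options. The 0 to 1 severity scale, the thresholds, and the function name are
    hypothetical; the text supplies only the four responses.

```python
# Hypothetical thresholds on a 0-to-1 severity scale; the text gives the
# four responses but no numerical boundaries between them.
RESPONSES = [
    (0.25, "You've got a cold"),
    (0.50, "You've got a bad cold"),
    (0.75, "You've got a terrible cold"),
    (1.00, "You've got a really terrible cold"),
]


def describe_cold(severity):
    """Map a continuous severity in [0, 1] onto one of a few discrete responses."""
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie between 0 and 1")
    for upper_bound, response in RESPONSES:
        if severity <= upper_bound:
            return response
    return RESPONSES[-1][1]  # not reached for valid input


# Two noticeably different severities can receive the same discrete label.
print(describe_cold(0.30))
print(describe_cold(0.45))
```

    However finely the severity varies, the response moves only in steps from
    one partition to the next, which is the sense of discreteness at issue.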

    Let's imagine now a simplified language, in which each partition is a

    complete meaning, and the symbol standing for each meaning is a unique,

    simple sound. The meaning "I'm exhausted" might be conveyed by a long

    "ffffff"; "You bore me" by "ssssss"; "Look at this" by "ooooo", and so on. This situation

    is schematised below, where geometric symbols are used instead of sound

    symbols.

    [Diagram: meanings M1, M2, M3, M4, each mapped directly onto its own symbol/sound]

    But what about "Have you ever air-mailed a miniskirt to Iceland?" Despite the

    versatility of our vocal tracts, we would soon run out of adequately distinct

    (and memorable) noises to convey the infinite variety of messages we might

    need to convey. Not surprisingly, no human language uses this kind of direct

    mapping of messages onto sound.

    Instead, all human languages exhibit componentiality: they construct larger

    units of communication out of smaller discrete components. At one level,

    roughly speaking, areas of our mental experience are mapped onto a finite

    number of words (the vocabulary of the language). At another level, these

    words are mapped onto a comparatively small set (often less than 50) of

    meaningless sound units. This is represented schematically as follows:

    [Diagram: meanings M1, M2, M3, M4, ... M?, each mapped onto a word, and each word composed of a small set of symbols/sounds]

    The meaningless sound units are often called phonemes and are what are

    represented, albeit often inconsistently, by letters in alphabetic writing. So for

    instance in "pin", "tin", "kin", we have three different words composed of the same

    sound units except for the first one which crucially differentiates them. By

    adding "s" to the beginning of each of these sequences, three new words are

    created ("stin", in fact, is not an existing word of English, but is one which could

    be adopted if needed). The crucial fact that all human languages associate

    meanings with abstract units (words or morphemes) and designate these units

    by sequences of discrete meaningless sound elements is sometimes known as

    the dual structure of language. The meaningful units can then be combined

    into an infinite number of more complex sequential structures (phrases,

    sentences) according to the syntax of the language in order to convey a

    limitless set of complex meanings including that of the novel and improbable

    utterance "Have you ever air-mailed a miniskirt to Iceland?". More precisely, dual structure may be taken to refer to two levels of structure, the

    grammatical level and the phonological level (Laver 1994:18). Meaningless

    sound elements at the phonological level serve as building blocks which

    designate meaningful structures at the grammatical level. Componentiality and

    sequential structure thus circumvent the problems that would be encountered if

    a distinct sound had to be found for every message we wanted to

    communicate, and they are taken here as definitional properties of language.
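The economy of dual structure can be made concrete with a little arithmetic. The sketch below is illustrative only: the inventory size of 40 and the maximum length of 4 are assumed figures, not facts about any particular language, and the function name is invented for the example. It counts how many distinct sequences a small inventory could in principle supply, ignoring the restrictions every real language places on which sounds may combine.

```python
def count_sequences(inventory_size: int, max_length: int) -> int:
    """Count the distinct sequences of 1..max_length units drawn from the inventory."""
    return sum(inventory_size ** length for length in range(1, max_length + 1))


# Direct mapping: one distinct sound per message, so 40 sounds give 40 messages.
direct_messages = 40

# Componential mapping: the same 40 units, combined into sequences of up to 4.
possible_sequences = count_sequences(inventory_size=40, max_length=4)

print(direct_messages, possible_sequences)  # 40 2625640
```

Even under these toy assumptions, the same 40 sounds go from 40 messages to more than two and a half million candidate word shapes, before syntax multiplies the possibilities again.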

    But it is not the case that the speaker has no control over non-componential, gradient aspects of speech. The listener could, for instance, emphasise the

    severity of the respiratory ailment perceived in the speaker by saying "you've

    got a REALLY TERRIBLE COLD!", that is, by giving the words in capitals

    extra pitch variation and length, and perhaps speaking them with a breathy

    voice, as if gasping at the seriousness of the situation. These uses of extra

    pitch and length, and of breathy voice, do not share the properties with the

    units of language of being discrete elements which can be structured

    sequentially, and so are not regarded as part of language itself. They are considered here to be choices from a linguistic resource of tone of voice lying

    outside the language system, but closely allied to it.

    Tone of voice may mimic indexical effects. A person genuinely shocked

    may undergo physiological changes which make an extended pitch range and a

    breathy voice an automatic consequence. But in the present example the vocal

    effects are deliberately chosen by the speaker, and are part of the conventions

    shared by the speaker and the hearer. Tone of voice perhaps constitutes an

    exploitation and incorporation into the linguistic mechanism of the kind of gradual vocal effects which naturally inform of certain indexical facts.

    1.6 Sameness and variation

    A language, as an abstract code, can be embodied in more than

    one medium or physical carrier. Speech is one, and writing another. The written medium, as exemplified in the words on this page, clearly reflects the

    componentiality of language. Words are separated by spaces, and made up of

    letters which are discrete elements. Each occurrence or token of a letter is an

    exact replica of the other occurrences of its type. So for instance in big dogs

    scare her, the spaces reflect the linguistic structure of an utterance consisting

    of four words, and the two tokens of the letter "g" are identical, as are the two

    of "s". The structure of the language system is closely mirrored in its

    realisation in the printed medium.

    In speech, there is a potential conflict between the discrete, componential

    structure of language, and the natural behaviour of the vocal mechanism. Like

    a gymnast, the vocal mechanism does not, indeed could not, move abruptly

    from one static posture to another. The laws of physics, governing the

    velocity, acceleration, and momentum of the gymnast's body, or the vocal

    organs, conspire against such abruptness. The gymnast, and the vocal

    mechanism, produce a flowing movement. This can be seen reflected in the

    spectrogram of the phrase a real worry in Fig. 1.6 [p.20]. A spectrogram is a way of displaying an acoustic analysis of speech produced by a computer or

    other device. In the picture, time runs from left to right, so the start of the

    utterance is at the left, while the vertical axis shows the breakdown of the

    complex speech signal into the different pitches or frequencies which make it

    up. The darker the pattern, the more sound there is at a particular frequency.

    Note that there are no breaks between the words, and there are no successive

    static patterns corresponding to successive sounds. This is a carefully selected

    example, but it vividly demonstrates the fact that in the spoken medium the discrete units of language are being realised in a continuous and flowing event.
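How a signal is broken down into frequencies over time can be sketched in a few lines. What follows is a minimal illustration rather than the analysis behind Fig. 1.6: it uses a naive discrete Fourier transform where real tools use faster methods, and the sample rate, frame length, hop size, test tone, and function names are all assumptions made for the example. Each row of the result corresponds to one vertical slice of a spectrogram, and larger values correspond to darker marks.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return the magnitude at each frequency bin from 0 up to half the frame length."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=64, hop=32):
    """Analyse overlapping frames; each row is one time slice of the picture."""
    # A Hann window tapers each frame, reducing the smearing caused by its abrupt edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    return rows


SAMPLE_RATE = 8000  # samples per second (assumed for the example)
# A steady 1000 Hz tone stands in for speech; real speech would change from slice to slice.
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(512)]
rows = spectrogram(tone)

# The strongest frequency bin in each time slice, converted to Hz.
peak_hz = [row.index(max(row)) * SAMPLE_RATE / 64 for row in rows]
print(len(rows), peak_hz[0])  # 15 1000.0
```

For a steady tone every slice looks the same; in a spectrogram of speech the pattern shifts continuously from one slice to the next, which is the flowing quality the text describes.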

    In the written medium of language, it is fluent (even sloppy) handwriting

    rather than printing or typescript which provides a slightly closer analogy with

    speech. In a handwritten realisation of big dogs scare her,

    although the words are still separated (unlike in speech), the letters within the words are not. More interestingly the two occurrences of "g" and "s" are

    different. In the second "g", the descender loops back up again across itself on

    the way to the following letter, and in the second "s", there is no diagonal

    rising stroke at the start of the letter, as seen below:

    The marks on the paper corresponding to a particular letter-type vary

    according to the context in which they occur. Everyone's handwriting is

    different, but most will exhibit equivalent examples of contextual variation.

    Some of the variation can be interpreted as short-cuts, but in other cases the

    contextual variation may be purely a matter of habit or convention.

    Something very similar is true of speech. A sound will vary in its

    pronunciation according to adjacent sounds and its position. Both instances of

    "k" in khaki (the "h" in the spelling is irrelevant to the English pronunciation)

    require the body of the tongue to make a closure against the roof of the mouth,

    but the first will be further back in the mouth than the second. This is in

    keeping with the fact that the first vowel requires the body of the tongue to be

    near the back of the mouth, and the second requires it to be raised at the front

    of the mouth. The difference can be felt best if the word is said silently. If the

    word is whispered, emphasising the explosion, or burst in phonetic terms, of

    each "k", the second burst will be heard as higher in pitch. This is because of

    the smaller mouth cavity in front of the release which has higher resonant

    frequencies.

    1.7 Phonology

    The situation would be chaotic if such variation were random. We

    could never be sure which linguistic unit a particular burst was representing.

    What is needed is a set of conventions which express what variation is

    allowable with the representatives of a particular unit, and what is not. We can

    consider this with an abstract example using geometric shapes, which we

    could imagine to be elements in a visual code. The shape on the left of the

    arrow represents the abstract (or ideal) element of the code. The arrow means

    "the following ways of writing the symbol are acceptable", while the crossed-out

    arrow means "these variants are not acceptable".

    In fact, it is neither necessary nor feasible to list what is unacceptable; instead

    we just assume that everything else not mentioned in the positive part of the

    rule is not acceptable.

    Much the same sort of rules can be expressed for sounds. The following

    rules, which simplify the situation, express what the khaki case exemplified:

    /k/ → [k̠ʰ] / __ [ɑ]

    /k/ → [k̟ʰ] / __ [i]

    This means that the linguistic unit or phoneme /k/ can be made at the back of

    the mouth or nearer the front (the minus and plus under the symbols mean

    further back and further forward in the vocal tract respectively) depending

    on the vowel following. In fact a whole range of realisations between these is

    possible. Notice that the abstract linguistic unit, the phoneme, is shown in

    slants / /, while its realisation is shown in square brackets [ ]. This is an

    important convention in phonetics. The variants realising a phoneme are called

    allophones.
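The khaki rules can be read as a small procedure: look at the following vowel, then choose the variant. The sketch below is one way of writing that down, under stated assumptions. The vowel classes are deliberately tiny and hypothetical, the function names are invented, and the choice is reduced to two variants even though a whole range of intermediate realisations is possible.

```python
ADVANCED = "\u031f"   # combining plus sign below: further forward
RETRACTED = "\u0320"  # combining minus sign below: further back
ASPIRATED = "\u02b0"  # superscript h

# Hypothetical, minimal vowel classes, just enough for the khaki example.
FRONT_VOWELS = {"i"}
BACK_VOWELS = {"\u0251"}  # the open back vowel of the first syllable of khaki


def realise_k(following):
    """Choose an allophone of /k/ according to the sound that follows it."""
    if following in FRONT_VOWELS:
        return "k" + ADVANCED + ASPIRATED
    if following in BACK_VOWELS:
        return "k" + RETRACTED + ASPIRATED
    # No rule applies: leave the place of articulation unmarked.
    return "k" + ASPIRATED


def realise(phonemes):
    """Map a list of phonemes to a list of realisations, context by context."""
    realised = []
    for position, phoneme in enumerate(phonemes):
        following = phonemes[position + 1] if position + 1 < len(phonemes) else None
        realised.append(realise_k(following) if phoneme == "k" else phoneme)
    return realised


khaki = ["k", "\u0251", "k", "i"]
print(" ".join(realise(khaki)))
```

The input contains the same unit /k/ twice; the output contains two different variants, a retracted one before the back vowel and an advanced one before [i].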

    Such a rule reconciles a coding system using invariant abstract elements

    with the pressures towards variability arising from the physical constraints of

    the vocal mechanism. Phonology is the part of the language system which

    deals with sound patterns, including patterns of variability. Like other terms

    (syntax, the lexicon etc.) describing parts of the language system,

    phonology can mean both the phenomena we hypothesise to be part of

    language, and the scientific study of those phenomena.

    The phonology of a language also consists of its inventory of phonemes

    (sound-units), and principles governing their combination. For instance the

    phonology of many varieties of English specifies seven short-vowel

    phonemes, as exemplified in the contrasting words pit, pet, pat, pot, putt, put,

    and the first vowel of potato. And it specifies that spray, stray, and scray are

    all good sequences of phonemes (scray happens not to be a word, but could

    be), while fpray, stlay, and tsray are ill-formed.
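A principle of combination of this kind can be stated as a test on sequences. The following sketch covers only three-consonant beginnings and rests on a simplified assumption: that such a beginning must consist of s, then one of p, t, k, then l or r, with the combination t plus l excluded. It is a toy statement of the pattern, using invented names, and not a full account of English.

```python
VOICELESS_STOPS = {"p", "t", "k"}
LIQUIDS = {"l", "r"}


def is_well_formed_onset(onset):
    """Check a three-consonant onset against the simplified pattern s + stop + liquid."""
    if len(onset) != 3:
        return False
    first, second, third = onset
    if first != "s" or second not in VOICELESS_STOPS or third not in LIQUIDS:
        return False
    # Assumed gap in the pattern: t followed by l is excluded, so "stl" fails.
    return (second, third) != ("t", "l")


# The beginnings of spray, stray, scray, fpray, stlay, tsray, written as sounds.
for onset in ["spr", "str", "skr", "fpr", "stl", "tsr"]:
    print(onset, is_well_formed_onset(onset))
```

The test accepts the beginning of scray although no such word exists, and rejects the beginning of stlay; it is concerned with possible words rather than actual ones.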

    Phonology also has a role to play in the combination of morphemes.

    Morphemes are the units which are used in word-building, such as point and

    -less. Although there is arguably a constant relationship between a morpheme

    and a meaning, there can be variability in the relationship between a

    morpheme and its pronunciation. English provides many examples of this, but the spelling often disguises them. So when the noun-forming suffix -ity is

    added to the adjective electric to make electricity, the [k] at the end of electric

    changes to [s] and the stress shifts from -lec- to -tric- ([ɪˈlɛktrɪk] → [ɪlɛkˈtrɪsɪti]).

    Like the variation in /k/ according to what vowel followed it, this is another

    example of invariant units of a language receiving variable realisation. The

    part of phonology which specifies these patterns of variation in morphemes is

    sometimes called morphophonology.
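The electric and electricity alternation can likewise be sketched as a rule applied when morphemes are joined. The version below is a deliberate over-simplification: it assumes that any stem-final [k] becomes [s] before -ity, it leaves out the shift of stress, and the transcriptions and function name are supplied for illustration only.

```python
ITY = ["ɪ", "t", "i"]  # assumed transcription of the suffix -ity


def add_ity(stem):
    """Attach -ity to a stem, changing a stem-final [k] to [s] (stress is ignored)."""
    result = list(stem)  # copy, so the stem itself stays invariant
    if result and result[-1] == "k":
        result[-1] = "s"
    return result + ITY


electric = ["ɪ", "l", "ɛ", "k", "t", "r", "ɪ", "k"]
electricity = add_ity(electric)
print("".join(electric), "".join(electricity))
```

The stem is stored once, with [k]; the [s] appears only in the derived form. That is the sense in which an invariant unit receives a variable realisation.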

    Various aspects of phonology are dealt with in Chapter N. In the next chapter, however, the focus switches from the linguistic mechanism to the

    vocal mechanism as we examine how it generates the speech signal.

    Fig. 1.6 Spectrogram of a real worry, with speech wave at the top and pitch trace at the

    bottom