REPORT RESUMES
ED 012 923 AL 000 639
COMPUTATIONAL LINGUISTICS--PROCEDURES AND PROBLEMS.
BY- LEHMANN, W.P.
TEXAS UNIV., AUSTIN, LINGUISTICS RES. CTR.
REPORT NUMBER LRC-65-WA-1 PUB DATE JAN 65
EDRS PRICE MF-$0.25 HC-$1.48 37P.
DESCRIPTORS- *COMPUTATIONAL LINGUISTICS, LANGUAGE, LINGUISTIC THEORY, *MACHINE TRANSLATION, MATHEMATICAL LINGUISTICS, LINGUISTIC PATTERNS, STRUCTURAL ANALYSIS, *DATA PROCESSING, COMPUTERS, CLASSIFICATION,
BASED ON A LECTURE GIVEN AT THE UNIV. OF TEXAS SCIENCE CONFERENCE, NOV. 20, 1964, THIS PAPER PRESENTS IN RELATIVELY NON-TECHNICAL TERMINOLOGY A DESCRIPTION OF THE "STRUCTURAL" APPROACH TO THE STUDY OF LANGUAGE WHICH UNDERLIES THE WORK OF THE LINGUISTICS RESEARCH CENTER. THIS APPROACH ANALYZES LANGUAGE IN SUCH A WAY THAT IT CAN BE MANIPULATED WITH A COMPUTER. STRESSING THE NECESSITY FOR A MORE COMPLETE UNDERSTANDING OF LANGUAGE AS THE BASIS FOR MACHINE TRANSLATION AND COMPUTATIONAL LINGUISTICS, THE AUTHOR DEALS WITH (1) THE FORMAL STRUCTURE OF LANGUAGE, (2) SIMULATION, (3) LANGUAGE DATA PROCESSING, (4) AUTOMATIC CLASSIFICATION, (5) ANALYSIS OF MEANING, AND (6) ACCOMPLISHMENTS IN THE FIELD OF LINGUISTIC RESEARCH. INCLUDED ARE REPRODUCTIONS OF THE ANALYSIS OF A SENTENCE WITH A PARSING DIAGRAM, AND A CHART OF THE LINGUISTICS RESEARCH SYSTEM. (AM)
THIS IS A WORKING PAPER. IT MAY BE EXPANDED, MODIFIED,
OR WITHDRAWN AT ANY TIME. THE VIEWS, CONCLUSIONS,
AND RECOMMENDATIONS EXPRESSED HEREIN DO NOT
NECESSARILY REFLECT THE OFFICIAL VIEWS OF THE SPONSOR.
LINGUISTICS RESEARCH CENTER
THE UNIVERSITY OF TEXAS
BOX 7247 UNIVERSITY STATION AUSTIN 12, TEXAS
U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.
COMPUTATIONAL LINGUISTICS:
PROCEDURES AND PROBLEMS
"PERMISSION TO REPRODUCE THIS
COPYRIGHTED MATERIAL HAS BEEN GRANTED
BY [signature illegible]
TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE U.S. OFFICE OF
EDUCATION. FURTHER REPRODUCTION OUTSIDE
THE ERIC SYSTEM REQUIRES PERMISSION OF
THE COPYRIGHT OWNER."
W. P. Lehmann
prepared for
National Science Foundation
Grant NSF GN-308
LINGUISTICS RESEARCH CENTER
The University of Texas
Box 7247, University Station
Austin, Texas 78712
LRC 65 WA-1 January 1965
CONTENTS
Abstract
Foreword
1 Introduction
2 Formal Structure
3 Simulation
4 Language Data Processing
5 Automatic Classification
6 Analysis of Meaning
7 Accomplishments
Appendix
ABSTRACT
The necessity for a more complete understand-
ing of language as the basis for machine translation
and computational linguistics is stressed. Other
benefits which will result from this long-term
research -- including information retrieval and
automatic classification -- are also mentioned.
FOREWORD
This paper is based on a lecture given at
The University of Texas Science Conference, November 20,
1964. The conference provided a means for scientists
in The University of Texas faculties to get acquainted
with one another and to listen to brief expositions
of research in progress on the campus. Part of the
research mentioned here was performed under grant
NSF GN-708.
1 INTRODUCTION
This paper deals with a relatively new approach
to the study of language, one underlying the work of
the Linguistics Research Center. This view regards
language in such a way that it can be manipulated
with a computer. Yet the view cannot be related to
technological developments, for it preceded the computer.
The resultant approach to language has often been
called structural linguistics.
Language may be studied from many points of
view. One may wish to acquire a graceful mastery of
one or more languages, either for writing special
kinds of texts called poems, or simply to impress as
well as inform an audience. One may wish to learn
about the history of specific languages, how English
is related to Hindi, Greek, Armenian or Irish. The
most prominent interest in language in Western culture
arose from a desire to understand venerated texts,
primarily texts in Hebrew, Greek, Latin, the Bible
and the classics. The understanding of these texts
led to the development of special techniques and
attitudes about language. For us, oddly enough in
contrast with the Greeks and Romans, the written
language has seemed more fundamental than the spoken,
and we have spent more time learning to read than to
speak languages, whether French, German, Russian, or
the cited classical languages. Further, since in our
day written materials are broken up into units called
words, these seem to us the fundamental entities of
language.
Moreover, since these languages, especially
Latin, seem somehow to be model languages, we have
sought a mastery of current languages, including our
own, through descriptions--grammars--which are modeled
on the grammar of Latin. To master a language,
including English, our grammarians, teachers and students
note its resemblances to Latin and also the differences
from it. This procedure may be compared with that of a
geographer who adopts one location, for example, New
York City, as the ideal and describes all other locations
by their resemblances to it. From such a geographer
we would not get a map of London, Paris or Moscow, but
rather various maps of New York modified in accordance
with deviations from New York in these cities.
It is not my aim to present a critique of
any view of language, or of our methods of teaching
languages, or even of any type of research on language.
But since we have all studied languages in accordance
with the Latin-based approach, we regard any language
in accordance with the views given us in our schools.
These views must therefore be specified if we are to
understand one another. I might also mention that the
first attempts at computer processing of language
failed because the scholars concerned viewed the es-
sential problem as the manipulation of words.
Besides discussing a somewhat different
approach to language I will touch on the linguistic
investigations it has prompted and is continuing to
require. I will also deal briefly with the requirements
this approach is making on computer programming
and may make on computer technology.
2 FORMAL STRUCTURE
Possibly the most important feature of struc-
tural linguistics is the understanding that language
has a formal structure, composed of various sub-struc-
tures. In these structures the function or value of
any entity is determined largely by its relationship
with other entities. Entities then are not defined
by their relationship to the outside world; a noun
for example is not defined as the name of a person,
place or thing. Viewing language in this way seems
to a linguist somewhat similar to presenting mathe-
matics through concrete objects, never to add 2 and 2,
but always two apples to two apples, and so on. There
is little doubt that our understanding of numbers,
our progress in mathematics, would have been hampered
if we had dealt with them only in connection with the
outside world rather than as abstract signs. Lin-
guists hold that such a view of language has impeded
our understanding of it.
Some decades ago a few scholars began to
examine language as a system of signs whose function
was specified by their interrelationships. This
approach to language--this theory, if you wish--
would define a noun in any given language by its
relationship to other entities in that language. A
noun in English, for example, might be defined as an
entity with certain relationships to inflectional
elements, to a Z-like entity in the plural: arm :
arms, or in the possessive: man : man's. Other
languages might not have nominal inflection and accordingly
would not have a class of nouns. In Japanese,
for example, only verbs are inflected; we cannot
then speak of a class of inflected nouns. Another
basis of definition of entities might be by their
relationship to independent entities, for example,
articles: the, a, or to verbs. By such a definition
man is a noun because it can follow the; went on the
other hand is not. Using such a procedure, we can
identify nouns in Japanese. Further, we can also
identify larger acceptable entities, such as sentences,
by their entities and the interrelationships of these;
men talk, for example, is an acceptable English sentence,
but not men happily, or even men happy.
When this approach to language was pursued, the
work of linguistics came to be looked on as the deter-
mination of the entities of any language and their
interrelationships.
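The distributional definition just described can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the list ATTESTED is an invented toy corpus of two-word sequences, and the names follows_article and is_acceptable are hypothetical, not part of any program of the Center. A word is treated as a noun candidate if it is attested directly after an article, and a sequence is treated as acceptable only if it has actually been elicited.

```python
# Toy corpus of attested two-word sequences (invented for illustration).
ATTESTED = [
    ("the", "man"), ("the", "men"), ("a", "man"),
    ("men", "talk"), ("men", "walk"), ("man", "went"),
]

ARTICLES = {"the", "a"}


def follows_article(word, attested=ATTESTED):
    """True if `word` is attested directly after an article."""
    return any(first in ARTICLES and second == word
               for first, second in attested)


def is_acceptable(sequence, attested=ATTESTED):
    """True if this exact two-word sequence appears in the toy corpus."""
    return tuple(sequence) in set(attested)


print(follows_article("man"))             # True: "the man" is attested
print(follows_article("went"))            # False: "the went" is not
print(is_acceptable(["men", "talk"]))     # True
print(is_acceptable(["men", "happily"]))  # False
```

A real description would of course rest on far more material than six sequences; the point is only that class membership is read off relationships among entities, not off their reference to the outside world.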
Two requirements are necessary before one
can deal usefully with language in this way. We must
first determine whether the materials are genuine,
whether for example an English speaker permits the
sequence: men talk. Next, we ascertain whether the
entities in this sequence have a characteristic meaning.
In such determination we elicit comparable sequences,
e.g. men walk; then talk, and so on. With such con-
trasting sequences we would satisfy ourselves that m
is a characteristic marker, for it is the only entity
distinguishing men talk from then talk, or distin-
guishing man talks (a statement an anthropologist
might make) from Ann talks (a statement one might make
about a very young lady). Similarly, t is a characteristic
sound marker, distinguishing ten from men,
talk from walk, and so on.
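The search for contrasting sequences can be sketched as a search for pairs of words that differ in exactly one sound. The Python below is a minimal sketch, not a description of any procedure used at the Center. It assumes that each word is already written as a tuple of sound units; the units shown (with "th" and "aw" each counted as one unit) are rough spelling-based stand-ins chosen for this example, and the function name minimal_pairs is hypothetical.

```python
from itertools import combinations

# Words as tuples of sound units; the transcription is an assumption.
WORDS = {
    "men":  ("m", "e", "n"),
    "then": ("th", "e", "n"),
    "ten":  ("t", "e", "n"),
    "talk": ("t", "aw", "k"),
    "walk": ("w", "aw", "k"),
}


def minimal_pairs(words):
    """Yield (word_a, word_b, unit_a, unit_b) for every pair of words
    of equal length that differ in exactly one sound unit."""
    for (a, units_a), (b, units_b) in combinations(sorted(words.items()), 2):
        if len(units_a) != len(units_b):
            continue
        diffs = [(x, y) for x, y in zip(units_a, units_b) if x != y]
        if len(diffs) == 1:
            yield a, b, diffs[0][0], diffs[0][1]


pairs = list(minimal_pairs(WORDS))
for a, b, unit_a, unit_b in pairs:
    print(f"{a} / {b}: {unit_a} contrasts with {unit_b}")
```

On this toy word list the sketch reports four contrasts, among them men / then (m against th) and talk / walk (t against w), which are the contrasts cited in the text.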
In addition to determining the characteristic
entities of sound in a language, we may also determine
larger entities, for example, talk as opposed to walk.
These differ from entities like the first consonants
of men, ten and then in that they have established
relationships to certain concepts. Briefly, we say
they have meaning. The first consonants of men, ten
and then do not. They serve to distinguish meanings,
but we cannot associate with them any given concepts,
such as 'animateness', 'number' or 'temporality'.
Rigorous techniques for determining entities
of both kinds have been developed.
When such entities are specified in a given
language, linguists set out to determine their role--
one might say, their properties. In English there
are about forty entities like m and t. These may be
regarded as signs, comparable to other kinds of signs
man uses, e.g. 3, 4. Just as a mathematician might
investigate relationships between such units in a
given number system, setting up various classes, e.g.
primes, so a linguist might investigate the role of
such entities in a given language. He might determine
what relationships t has with regard to the other
entities of sound. In English, for example, t may
precede e if n follows, but not e alone; there is no
English sequence te. Nor are there sequences like
tne, etn, and so on. Other such problems will occur
to any linguist, mathematician, or to anyone who en-
joys manipulating signs. Yet few such problems have
been investigated, even in a widely used language like
English, to say nothing of 5,000 other languages. We
have not had the personnel nor the physical resources.
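One simple way to pose such a problem to a machine is to record which adjacent pairs of entities occur in a stock of known words and then to test new sequences against that record. The Python below is a sketch under several assumptions: letters stand in for sounds, the five-word LEXICON is invented, "#" marks the beginning and end of a word, and the function names are hypothetical. Checking adjacent pairs only is a deliberate simplification and not a full statement of English sound patterns.

```python
# Small stand-in lexicon; letters stand in for sounds (an assumption).
LEXICON = ["ten", "men", "net", "pen", "tent"]


def bigrams(word):
    """Adjacent pairs in a word, with '#' marking the word boundaries."""
    padded = "#" + word + "#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


# Every adjacent pair attested somewhere in the lexicon.
ATTESTED_PAIRS = set().union(*(bigrams(w) for w in LEXICON))


def is_possible(candidate):
    """True if every adjacent pair in `candidate` is attested."""
    return bigrams(candidate) <= ATTESTED_PAIRS


for form in ["ten", "te", "tne", "etn", "tet"]:
    print(form, is_possible(form))
```

With this lexicon ten is accepted, while te, tne and etn are rejected. The form tet is accepted although it is not an English word: a pair-by-pair check states only what is permitted, not what happens to exist.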
A similar range of problems might be cited
in the investigation of entities like walk, talk, take,
brake, and so on. We might find sequences in which man:
men precede any of these, e.g. the man walked, the man
braked around curves, etc. But we do not find sequences
like: walkman, talkman, takeman, paralleling brakeman.
A complete description of any language would specify
which of such sequences occur.
At this stage of his investigations a linguist
does not deal with meaning. He has determined that
brake differs from take, that it has a meaning; but in
examining possible sequences like brakeman he deals
only with its properties of occurrence. Nonetheless
this second type of investigation, noting the interrelationships
between entities like take, brake,
man, provides even more problems than does the
first.
Still other entities must be identified in
language and investigated similarly. But the two types
of entities I have selected may exemplify the approach
of structural linguistics.
Those structural linguists who concern them-
selves exclusively with the study of sets of linguistic
entities and their interrelationships are sometimes
called mathematical or computational linguists. Other
linguists may deal with other language problems--the
pronunciation of talk, walk in various areas, the
stylistic differences between talk and speak, and
so on. But a computational linguist limits his concern
to sets of entities and their interrelationships.
If his approach to language is valid, in
using language men acquire a number of entities and
learn how to manipulate them in relation to one
another. Further, if a machine could be devised
which would store the number of entities stored by
man, with rules specifying their relationships to
other entities, the machine might simulate man's mastery
of language.
3 SIMULATION
As is well-known, about twenty years ago a
machine was developed which seemed to have the essen-
tial capabilities, the computer. Possibly the manip-
ulation of language would never have engaged the at-
tention of computer specialists if the problem of
rapid intercommunication had not become so prominent.
To be sure computation centers might have found it
amusing in time to have a few language games available
for visitors, when they became bored with tic-tac-toe,
checkers, chess or go. But since the scientists, who
were nursing along the infant computer and contemplating
uses for it when it matured, had just been involved in
international struggles which pointed up the importance
of reading the scientific publications of the other
side, they suggested that the computer might solve the
problem of intercommunication. The computer therefore
was looked on as the machine to take over the unin-
spiring activity of translation; supporting agencies
provided time on computers and a small amount of money
to research workers, whose goal was to be machine translation.
This seemingly overriding goal was the prime
activity for which language specialists might use
computers. To the outside world today still, linguists
doing research with computers are working on machine
translation.
With a million words a day of important
materials awaiting translation from Russian to English,
let alone materials of secondary importance or materials
in Chinese, Japanese, German, French and so on, machine
translation would be a fine accomplishment. But the
problem requires a bit of preliminary work. We may view
the essential requirement as one of synthesizing sentences.
This activity may be compared to synthesizing protein
molecules--though nothing like the expenditure of time
and money has been applied to linguistic investigation
as to that of chemistry.
One of the first problems we may note is that
language is not a simple linear structure; rather, it
consists of numerous structures. One is made up of
entities like t, m, and so on, which might be compared
with atoms; this structure contains relatively few
entities, but their rules of interrelationship are com-
plex. A second structure is made up of entities like
then, men, walk, which might be compared with radicals;
this structure contains a great number of entities,
possibly with somewhat less complex rules of inter-
relationships. From these the smallest free form of
language is constructed, the sentence. In making
sentences, in using language man has somehow learned to
master both of these structures. More, he has learned
the relationships of the entities in the second struc-
ture to a totally different structure, that of concepts.
Since computer manipulation of language is a type of
simulation, before we can use computers effectively for
managing language, we must understand how these various
structures relate to one another, how language functions.
4 LANGUAGE DATA PROCESSING
Of the various problems, some are straightforward,
for example, the amassing of entities and their
rules. We spend the first ten years of our lives acquiring
control over one language, continue to add to
our stock of entities, and rarely achieve mastery over
a second language. To give the computer similar
opportunities we must have large-scale programs, by
means of which we can store genuine materials and
materials with a characteristic meaning. A great deal
of effort has been expended by members of the Linguistics
Research Center over the past five years to develop the
system of programs which handle the data of language and
their interrelationships.
Man has taken care of this problem very clev-
erly. He reduces language to a set of entities of
sound, about forty in a language, and accordingly has
relatively few building blocks to control. Unfortunately
no machine has been devised to match man's discriminatory
powers in managing the entities of speech.
Accordingly at present, machine manipulation of language
must be based on the second level with its tremendous
number of entities.
Since our work in the Center is still experi-
mental, it is difficult to forecast how many such
entities must be stored in a computer. Some estimates
put the number of chemical terms in German at two
million; the rules for relating them to other entities
of the language will obviously be fewer.
In the relatively small computers available today
the rules indicating interrelationships and vocabulary
items of a language must be stored on magnetic tape.
The programming system developed at the Linguistics
Research Center has been successful in analyzing
materials of limited vocabulary and syntactic complex-
ity, consisting of about 50,000 rules and items in
each language.
Until one has dealt with the highly rigorous
computer it is almost impossible to visualize the prob-
lems involved in a thorough analysis of language. A
simple example from a physics textbook may illustrate
some of them:
Loudness is the property of sound determined
by the effect of the power
of the sound waves on our ears.
Let us suppose that we write a computer routine--a
syntactic rule--relating of to a following noun, as in
of sound; the rule will not then handle the sequence of
the power, for here of is followed by the definite
article. If we modify our rule accordingly, we still
have not handled the use of of in of the sound waves,
for here the article is followed by a noun used as
adjective. In putting a sentence like this into a
computer, we must therefore provide for sequences
of of and a variety of entities. Obviously our rules
cannot be simple, though our example may have been.
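The successive revisions of the rule for of can be followed in a short sketch. The Python below is an illustration only: the tag names PREP, ART and NOUN, the TAGS table, and the function handled are assumptions made for this example and do not reproduce the notation of the Center's grammars. Each rule set is a list of tag sequences that a phrase must match exactly.

```python
# Hypothetical part-of-speech labels for the words of the example.
TAGS = {
    "of": "PREP", "the": "ART",
    "sound": "NOUN", "power": "NOUN", "waves": "NOUN",
}

# Three successive versions of the rule, each extending the last.
RULES_1 = [("PREP", "NOUN")]                         # of sound
RULES_2 = RULES_1 + [("PREP", "ART", "NOUN")]        # of the power
RULES_3 = RULES_2 + [("PREP", "ART", "NOUN", "NOUN")]  # of the sound waves


def handled(phrase, rules):
    """True if the tag sequence of `phrase` matches one of the rules."""
    tags = tuple(TAGS[word] for word in phrase.split())
    return tags in rules


for phrase in ["of sound", "of the power", "of the sound waves"]:
    print(phrase,
          handled(phrase, RULES_1),
          handled(phrase, RULES_2),
          handled(phrase, RULES_3))
```

The first rule set handles only of sound, the second adds of the power, and only the third handles of the sound waves, where a noun is used as an adjective before another noun.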
Another entity of the sentence, on, may
illustrate a further type of problem. If we relate
on to the surrounding entities, we arrive at the
possible sequence: the sound waves on our ears. This
sequence might compare to that of the flag waves on the
flag-pole or the policeman waves on the traffic. But
such relationships which would appear identical to a
computer trouble us; the sentence from our physics
textbook seems absurd to us if on is related to waves
in either of these ways. We have learned that on is
related to effect and that the meaningful sequence is
effect on our ears. We scarcely need to discuss the in-
adequate translations that would be produced if such
sentences were put word-for-word into German, Russian
or other languages.
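The choice an English speaker makes here can be imitated, very roughly, by recording which nouns are known to be followed by which prepositions. The Python below is a sketch under stated assumptions: the table GOVERNS, with its single entry for effect, is a hypothetical lexicon entry, and the function attach is invented for this illustration. It prefers the nearest earlier noun that is recorded as taking the preposition, and otherwise falls back on the nearest noun, which is the relation a purely positional rule would choose.

```python
# Nouns recorded as taking a following preposition (hypothetical entry).
GOVERNS = {"effect": {"on"}}


def attach(preposition, preceding_nouns, governs=GOVERNS):
    """Return the noun to which the preposition is related.

    Prefer the nearest earlier noun recorded as governing the
    preposition; if none is recorded, fall back on the nearest noun.
    """
    for noun in reversed(preceding_nouns):
        if preposition in governs.get(noun, set()):
            return noun
    return preceding_nouns[-1]


# Nouns preceding "on" in the textbook sentence, in order.
nouns_before_on = ["effect", "power", "sound", "waves"]

print(attach("on", nouns_before_on))              # effect
print(attach("on", nouns_before_on, governs={}))  # waves
```

Without the lexicon entry the sketch relates on to waves, the reading that seems absurd to us; with it, on is related to effect.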
Since an English speaker understands his
language in this way, a computer must also be prepared
to manipulate it accordingly. To arrange such manip-
ulations we must describe English far more precisely
than has ever been done before. The required detail of
description has never been provided before because
native speakers master such sequences, and we are
charitable to foreigners who learn inadequate English
from our inadequate grammars. But if a computer makes
any requirement, it is for precision. A computer would
not be happy with our simple sentence until it knows
what to do with every entity, including on. Conse-
quently a linguist has to determine the role of an
entity first of all, then describe it. Since even the
large dictionaries which have been produced for Eng-
lish, German and the other widely studied languages
have not described these languages adequately, linguists
in the Linguistics Research Center are now at
work producing such descriptions--writing rules for
English, German, Russian and other languages. Figures
1 - 5 illustrate the procedures involved in making
a syntactic analysis of an English sentence, in
accordance with a grammar written by Dr. Wayne Tosh
of our Center.
The resultant rules are many and intricate.
When produced, they must be handled by the computer,
but kept independent of it through use of generalized
computer programming. If, for example, a specialized
program were written for handling combinations of
prepositions plus nouns or prepositions plus articles
plus nouns, it would have to be revised to handle
sequences of prepositions plus articles plus noun-
adjectives plus nouns, as in of the sound waves. The
Linguistic Research System, produced under the direction
of Eugene Pendergraft, was devised to meet this require-
ment of generalization. With this system linguistic
rules, independent of specialized computer programs,
may be produced to handle phrases of various length--
preposition plus noun as in of sound, preposition plus
article plus noun, as in of the power, and longer phrases
like of the sound waves. Other instructions alert the
computer to watch out for prepositions like on after
a noun like effect. A chart of the system indicates
the demands placed on computer manipulation of language
and also one of the results of five years of work,
supported by the US Army Electronics Laboratories and
by the National Science Foundation.
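The separation of linguistic rules from the program that applies them can be illustrated in miniature. The Python below is a sketch of the general idea only and makes no claim about how the Linguistic Research System is built: the category names, the GRAMMAR and LEXICON tables, and the functions ends and recognizes are all assumptions made for this example. The rules are plain data, and one general routine applies whatever rules it is given, so that covering a new kind of phrase means adding a rule, not revising the program. The routine would loop on a rule that begins with its own category, a case this toy grammar avoids.

```python
# Grammar rules held as data; category names are hypothetical.
GRAMMAR = {
    "PP": [["PREP", "NP"]],
    "NP": [["NOUN"], ["ART", "NOUN"]],
}
LEXICON = {"of": "PREP", "on": "PREP", "the": "ART",
           "sound": "NOUN", "power": "NOUN", "waves": "NOUN"}


def ends(symbol, tags, start, grammar):
    """Return every position at which `symbol` can end if it starts
    at position `start` in the list of tags."""
    if symbol not in grammar:  # a word class, matched directly
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    result = set()
    for expansion in grammar[symbol]:
        positions = {start}
        for part in expansion:
            positions = {end for pos in positions
                         for end in ends(part, tags, pos, grammar)}
        result |= positions
    return result


def recognizes(symbol, phrase, grammar):
    """True if the whole phrase can be analyzed as `symbol`."""
    tags = [LEXICON[word] for word in phrase.split()]
    return len(tags) in ends(symbol, tags, 0, grammar)


print(recognizes("PP", "of the power", GRAMMAR))        # True
print(recognizes("PP", "of the sound waves", GRAMMAR))  # False

# Adding one rule as data covers the longer phrase; the program is unchanged.
EXTENDED = dict(GRAMMAR, NP=GRAMMAR["NP"] + [["ART", "NOUN", "NOUN"]])
print(recognizes("PP", "of the sound waves", EXTENDED))  # True
```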
One of our practical problems is to achieve
an understanding by outsiders of the use of computers
in processing linguistic material. Most of us had
our notions about scientific procedures determined by
elementary science class, in which we probably used a
Bunsen burner. This early activity seems to leave the
indelible impression that scientific equipment, for
example a computer, is like a Bunsen burner. There is
little variety of use for a Bunsen burner--it merely
heats things. The heat isn't different if one lights
it with a flint, or a match--if one strikes the match
on a piece of sandpaper or one's thumbnail. By anal-
ogy it is assumed that the machine is the essential
part of computation; after one switches on the power,
a computer can cook your data as well as mine. Yet
in language, as in the social sciences, the important
part of computation is the program. The importance
of how one utilizes a machine rather than the makeup
of the machine may be one of the essential differences
between work in the social sciences and that in the
natural sciences. Possibly software and hardware
sciences would be more appropriate names.
5 AUTOMATIC CLASSIFICATION
Yet even a system with programs of the com-
plexity of those illustrated is inadequate for handling
language. In a sentence of fewer than twenty elements,
for example, there are more than a million possibilities
of analysis. But this figure, large though it is,
fails to take into account an analysis for meaning, for
determining among other things that in our sentence
sound is similar in meaning to noise rather than to
healthy, valid, as in a sound mind or a sound theory.
When we handle the multitude of entities necessary in
analysis of meaning we will deal with many more
possibilities of interpretation than are found for on.
In managing these, our present computers would be choked.
Even the larger computers now becoming available would
deal with the quantities of data slowly. Adequate
speed seems possible only by refinements of computer
theory and in improved techniques of classification.
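How quickly the possibilities multiply can be suggested by a simple count. The lecture does not state how its figure of a million was reached; the sketch below assumes, purely for illustration, that an analysis is a binary bracketing of the sequence and ignores the class names attached to the brackets. Under that assumption the number of bracketings of n elements is the Catalan number C(n-1), computed here with Python's math.comb. The function name bracketings is hypothetical.

```python
from math import comb


def bracketings(n):
    """Number of distinct binary bracketings of a sequence of n
    elements, i.e. the Catalan number C(n - 1)."""
    k = n - 1
    return comb(2 * k, k) // (k + 1)


for n in (5, 10, 14, 15, 19):
    print(n, bracketings(n))
```

On this count a sequence of fourteen elements has 742,900 bracketings and one of fifteen has 2,674,440, so the million mark is passed well before twenty elements, even before any alternative class names or meanings are considered.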
A few years ago R. M. Needham of Cambridge
University pointed the way to such classification
with his clumping theory. His procedures are being
expanded for application to larger sets of data by
A. G. and N. Dale of our Center. Details are provided
in the paper, A Programming System for Automatic
Classification with Applications in Linguistic and
Information Retrieval Research, LRC 64 WTM-40,
written by A. G. and N. Dale and E. D. Pendergraft.
With other papers, this is available from the Center.
Even the procedures described in this paper require
a great deal of computer time for handling a relatively
small number of entities. Further research is being
pursued to improve and speed up the procedure. I have
time merely to mention such research; but would also
like to point out that it was not even envisaged be-
fore language analysis with computers was undertaken.
The amount of linguistic data which must be manipulated,
as well as its complexity, has pointed up the need for
research in fields of applied logic or mathematics that
would not have been related to language investigation
a few years ago. Students in the sciences, for whom
the required language courses may seem to have little
lasting value, might well consider applying themselves
to these problems. Solutions will follow only from a
quantitative approach, generally lacking in previous
students of language.
6 ANALYSIS OF MEANING
But though we face numerous problems in the
development of computer systems and in the theoretical
work which must be carried out before systems and com-
puters can manage efficiently the huge and complex
amounts of data, our largest problems remain in the
understanding of language. Chief among these is the
treatment of meaning. In dealing with meaning we are
probably a bit farther along than Plato, though not
much. Our dictionaries largely sidestep the problem;
they set out to provide synonyms, whether monolingually
or bilingually. Since they are fairly effective
tools, we can handle translation of a sort without
understanding meaning. But for competent translation,
for automating indexing and abstracting, for problems
in artificial intelligence, we will have to control
meaning as we now do syntactic relationships.
Our theoretical approach is clear. We assume
that language is structured at the level of meaning
similarly to its structure at the levels of sound
and syntax. Again, we do not relate entities to the
outside world, but to concepts. Still the problems
of analysis are staggering. The sheer magnitude of
the data--all human knowledge--is troublesome enough.
But how to classify it? By specialties as we do in real
life? Should one computer handle nuclear physics,
another the physics of light, another molecular biology,
and so on? (If we did, we would not welcome a physicist
who also concerns himself with biology.) But if we
divide the universe of concepts in this way, what type
of hierarchical arrangement should we use? If, for
example, we define man as 'male human being', should
we distinguish between the concepts 'male' and 'human
being' because 'male' is automatically supplied in such
sequences as 'he was a man who ..., the king is a man
who ...'? It will be difficult to answer such questions
until we carry on a fair bit of investigation. Before
then, it will even be difficult to pose the proper
questions.
7 ACCOMPLISHMENTS
It may be disappointing for non-linguists
to hear that linguistic work has scarcely begun, with
or without computers. But we have some accomplishments.
Some theoretical positions seem supportable. We are
on our way to an extensive and flexible linguistic
research system, and expect to have adequate computers
to make use of it. The traditionally lone linguist
is beginning to work with specialists in related fields.
Even the achievement of analyzing language syntactically
may seem small. But our tools are still inadequate.
Given satisfactory scanning devices and more powerful
computers we will be able to use our system for ana-
lyzing more than a snatch of language. Already straight-
forward linguistic applications may be carried out,
if adequate resources are provided; any book may be auto-
matically indexed, and accordingly among other things
more readily proof-read. Bibliographical and other
data may be managed automatically; in a pilot project,
the Center has listed all Slavic books in the University
Library, so that anyone interested in Tolstoy, in
Russian novels or the like, may be given an immediate
print-out of the titles. Other such projects need only
financial support for achievement. The chief aim of
the Center, however, is to continue theoretical investi-
gations of language and data processing techniques, and
the preparation of computer programs, so that ultimately
a computer will be able to manipulate language with
somewhat the same proficiency as does man.
THIS IS A SENTENCE ANALYZED BY THE LINGUISTICS RESEARCH SYSTEM.

[Figure 1. INPUT SENTENCE TO ANALYSIS PROGRAM. Corpus display, University of Texas Science Conference corpus, November 20, 1964, showing the input sentence above with its corpus identification numbers (see accompanying displays).]
[Figure 2. MATRIX IDENTIFYING CHARACTER POSITIONS IN INPUT SENTENCE. Science Conference corpus display, page 1, 20 November 1964. Sentence begins in col. 2 and ends in col. 64.]
[Figure 3. GRAMMAR USED IN AUTOMATIC ANALYSIS. Symbols in DESIGNATUM column are class names of construction lying to right of symbol. Each entry is separate rule identified uniquely by number in FORM column.]

SCIENCE CONFERENCE GRAMMAR, 20 NOVEMBER 1964
SYNTACTIC DATA -- STRATUM 2 -- FORM SORT

FORM   DESIGNATUM   CONSTRUCTION
67     DTRMNR       * THIS
72     N5A          * SEN * TENCE
75     V4C          * ANALYZ
77     PRPSTN       * BY
79     DTRMNR       * THE
82     N3A          * LIN * GUIS * TICS
84     N5H          * RE * SEARCH
85     N5A          * SYS * TEM

[Each Stratum 2 entry carries the note P 1.000000. The table is reconstructed from a damaged listing; FORM numbers are paired with entries in the order in which they appear.]

[SCIENCE CONFERENCE GRAMMAR, 20 NOVEMBER 1964, SYNTACTIC DATA -- STRATUM 3 -- FORM SORT, pages 1-2. Columns: NOTES, FORM, DESIGNATUM. Rules numbered 65, 66, 68-71, 73, 74, 76, 78, 80, 81 and 83 define the classes SNTNC, CLS, BE, NP, NMNL, VRBL and PRPSTN PHRS. The listing is not legible in this copy.]
[Figure 4. ANALYSIS display for CORPUS 01, SAMPLE 001, pages 2-3. Columns: FROM, TO, PROBABILITY, NOTES. Each entry gives the character positions spanned by a construction, its probability, and the numbers of the grammar rules used; positions 30-31 (BY), for example, are analyzed by rule 77. The full listing is not legible in this copy.]
[Figure 5. PARSING DIAGRAM. Reconstructed from ANALYSIS display to illustrate analysis provided by computer. The tree relates the classes SNTNC, CLS, DTRMNR, BE, NP, NMNL, VRBL and PRPSTN PHRS, through the word classes N5A, V4C, N3A and N5H, to the words THIS IS A SENTENCE ANALYZED BY THE LINGUISTICS RESEARCH SYSTEM. The diagram is not legible in this copy.]
[Chart. THE UNIVERSITY OF TEXAS LINGUISTICS RESEARCH SYSTEM. Circles represent magnetic data tapes. Boxes represent programs; those with heavy lines are scheduled for completion by the end of this year. Components named on the chart:
CORPUS MAINTENANCE: input corpus, corpus revision, corpus display, corpus selection.
MONOLINGUAL RECOGNITION: input distribution; lexical, syntactic and semantic analysis and choice; analysis displays.
INTERLINGUAL RECOGNITION: input transfer; lexical, syntactic and semantic analysis; analysis displays.
TRANSFER MAINTENANCE: monolingual and interlingual transfer revision, display and selection; substitution; distribution.
INTERLINGUAL PRODUCTION: interlingua; output transfer; semantic, syntactic and lexical synthesis.
MONOLINGUAL PRODUCTION: semantic, syntactic and lexical choice and synthesis; output distribution; output corpus.
GRAMMAR MAINTENANCE: rule revision, probability revision, grammar display, input and output grammar selection.]