
Page 1: LANGUAGE AND INDEXING LANGUAGE NALIMOV AND GARDIN …nopr.niscair.res.in/bitstream/123456789/27811/1/ALIS 35(2) 58-68.pdf · synonyms which is characteristic of 'model' and non-synonyms

Annals of Library Science and Documentation 1988, 35(2), 58-68

LANGUAGE AND INDEXING LANGUAGE
NALIMOV AND GARDIN REVISED

F J DEVADASON
Documentation Research and Training Centre
Indian Statistical Institute
Bangalore-560059

M R KUMBHAR
Librarian
Manipur University Library
Canchipur
Imphal-795003

A study of the salient features of ordinary language is necessary for the design of information storage and retrieval systems in general and indexing systems in particular. This paper, based on the works of Nalimov and Gardin, attempts to discuss some of these salient features. Only those characteristics of ordinary language that are reflected in indexing languages are mentioned. The important features of indexing language and its metalanguage character are also mentioned. Common analogical features of both subject analysis and linguistic analysis are presented along with a [...] language. It is indicated that the Postulate-based Permuted Subject Indexing (POPSI) language, together with its vocabulary control tool Classaurus, also has the important components of an indexing language.

INTRODUCTION

Language is a means of communication and thinking - for dissemination of information and, to an extent, the creation of it. In indexing, language plays a very special role in information organisation and documentation. However, it is not language which is the object of study here, but rather the use of language to formulate and communicate ideas and opinions [1]. A study of the salient features of language is helpful in the design of information storage and retrieval systems in general and indexing systems in particular [2]. This paper, based on the works of Nalimov [3] and Gardin [4], presents some of the salient features of language and indexing language.

LANGUAGE

Language is a system of symbols serving as the means of human communication, thinking and expression. The primary function of language is communication between individuals. "It is difficult to imagine any satisfactory definition of the term 'language' that did not incorporate some reference to the notion of communication" [5]. Language is a specific social means of information storage and transfer as well as of controlling human behaviour. It is a purely human and non-instinctive method of communicating ideas, emotions and desires by means of a system of voluntarily produced symbols. Only the communicative aspect of language plays an important part in information systems, "the essence of which is storage of information for use and manipulation by an individual" [6].

One of the features of modern investigation of language is the recognition that 'ordinary language' - be it written or spoken - is only a member of a class of coherent symbolic communication systems. All conceivable symbol systems have been considered as languages, for instance the language of music, the language of dance, the languages used for computer programming etc. But fulfilment of the communication function alone cannot be considered a necessary and sufficient requirement for elevating a symbol system to the rank of 'ordinary language'. Exchange of information may be effected not only with 'words' but also with other symbols. Exchange of information may also take place between a human being and a computer, and between inanimate mechanisms, for example between two computers. Many phenomena of the physical world can be regarded in terms of receipt and transfer of information. Even the photoelectric effect can be regarded as information transfer. But these do not become 'ordinary languages'. The essential functional characteristic of ordinary language (referred to as just 'language' hereafter) is its role in information reduction or synthesis, abstraction and generalisation; in elaboration or expansion; and in particularisation or concretisation - in short, thinking.

58 Ann Lib Sci Docu


STRUCTURE OF LANGUAGE

Signs/Symbols

First, a language is composed of a plurality of signs - articulate sounds as signs/symbols of ideas.

Second, in a language each sign/symbol has a signification common to a number of interpreters.

Third, signs constituting a language must be 'common signs', that is, producible by the members of the interpreter-family and having the same signification to the producers which they have to other interpreters.

Fourth, the signs which constitute a language are plurisituational signs, that is, signs with a relative constancy of signification in every situation in which a sign of the sign-family in question appears. A sign in a language is a sign-family and not merely a uni-situational vehicle [7]. A sign and its meaning do not completely cover each other. Their boundaries do not coincide at all points. One and the same sign has several functions. One and the same meaning is expressed by several signs. "If signs were fixed and each of them fulfilled only one function, language would become a mere collection of labels" [3]. It is not enough for the perfection of language that sounds can be made signs of ideas, unless those signs can be so made use of as to comprehend several particular things: for the multiplication of signs would have perplexed their use, had every particular thing needed a distinct name to be signified by. To remedy this inconvenience, language had yet a further improvement in the use of general terms, whereby one word was made to mark a multitude of particular existences.

Fifth, the different signs in a language should fit into a semantic hierarchic structure. "I see two things: a chair and a furniture" is not an acceptable sentence. Semantic hierarchy among the different signs may be viewed as one of the conditions necessary for regarding a system of signs/symbols as a language. The signs in a language must constitute a system of interconnected signs combinable in some ways and not in others in order to form a variety of complex symbols of different levels [7].
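The fifth condition can be illustrated with a small sketch. It assumes a toy hierarchy in which each sign points to its broader sign; the table `BROADER` and the functions `ancestors` and `can_coordinate` are hypothetical names introduced only for this illustration and are not taken from Nalimov or Gardin. Under that assumption, 'chair' and 'furniture' cannot be counted as "two things" because one subsumes the other.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: each sign maps to its broader (parent) sign.
BROADER: Dict[str, Optional[str]] = {
    "furniture": None,
    "chair": "furniture",
    "table": "furniture",
    "fruit": None,
    "mango": "fruit",
    "lemon": "fruit",
}


def ancestors(sign: str) -> List[str]:
    """Return the chain of broader signs above `sign`, nearest first."""
    chain: List[str] = []
    parent = BROADER.get(sign)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


def can_coordinate(a: str, b: str) -> bool:
    """Two signs may be listed side by side as 'two things' only if
    they differ and neither is a broader term of the other."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)
```

With this table, `can_coordinate("chair", "table")` holds, while `can_coordinate("chair", "furniture")` does not, which mirrors the unacceptable sentence quoted above.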

Symbol system

The structure of language starts with

1. subelementary linguistic symbols: the letters or alphabets, morphemes (a morpheme is a meaningful part of a word - the root and the affixes, that is, prefixes, suffixes etc.);

2. the elementary symbols: the words (a word is a symbol for one of the smallest completely satisfying bits of isolated 'meaning' - a fragment of a text between spaces);

3. phrases (a phrase is a fragment of a text between two punctuation marks);

4. clauses and sentences (a sentence is a fragment of a text between full stops); paragraphs and so on.

Vol 35 No 2 June 1988

The complex symbols constructed from simpler ones form a hierarchical symbol system of several levels. Phenomenologically, thinking is a process of constructing complicated symbols from simpler ones, which is outwardly reflected in the hierarchical structure of language [3]. It is reflected even in the sequence of sentences, paragraphs and even larger portions of text. A text does not consist of a sequence of unrelated sentences. If it did, it would not convey the intended message properly. In fact, every preceding sentence strives to fix the context for the next sentence, helping the precise understanding of the following sentences and infusing a kind of hierarchy among them. "It should be acknowledged that logical hierarchy of statements exists in language, but it is so concealed that in practice, it cannot be directly observed" [3, p. 31]. The narration of events in chronological sequence in a text is an example of this kind of 'hierarchy'.
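The levels of the symbol system can be made concrete with a minimal sketch that takes the working definitions literally: a sentence is a fragment between full stops, a phrase a fragment between punctuation marks, and a word a fragment between spaces. The function name `split_levels` is hypothetical, and the sketch is deliberately naive (an abbreviation such as "etc." would be treated as a sentence end); it only shows how simpler symbols nest inside more complex ones.

```python
import re
from typing import List


def split_levels(text: str) -> List[List[List[str]]]:
    """Nest a text as sentences -> phrases -> words, using the working
    definitions: sentences end at full stops, phrases lie between
    punctuation marks (here , ; :), and words lie between spaces."""
    hierarchy: List[List[List[str]]] = []
    for sentence in text.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip the empty fragment after the final full stop
        phrases = [p.strip() for p in re.split(r"[,;:]", sentence) if p.strip()]
        hierarchy.append([phrase.split() for phrase in phrases])
    return hierarchy
```

Each element of the result is one sentence, each sentence is a list of phrases, and each phrase is a list of words, so the depth of nesting corresponds to the levels listed above.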

Grammar

In order to make operative the rules for constructing complex logical structures using the simpler symbols, the symbols are viewed as belonging to grammatical categories or parts of speech such as noun, verb, adverb etc. The rules of syntax (grammar) are defined with respect to these categories. But Chinese makes no use of the formal categories devised for the Indo-European languages. The same is true of the polysynthetic languages of the American Indians. Symbols and grammar are, of course, structural elements of language; they are clearly seen in the majority of the symbol systems perceived as languages [3, p. 28].


Characteristics of Language

Interpretability

A sign system has a right to be called a language if it can be interpreted into another language. Speaking to a foreigner, we interpret our mother tongue in the system of another language, and actually it is not a translation but a mere interpretation. This is because of the metaphors, special phrases, idioms etc., linked to culture and historical experiences, that are present in language.

Polymorphism

Consider the following three phrases:

1 This is a good mango,
2 This is a good lemon,
3 This is a good carving knife.

It is readily seen that the word 'good' has quite different meanings in these phrases. A good lemon should be sour, a good mango should not be sour, and a good carving knife should be sharp, which has nothing to do either with the quality of a lemon or with that of a mango. Both in everyday language and in many other languages every special symbol is connected in a probabilistic manner with a variety of meanings. One of the characteristics of polymorphism is synonymy. This diversity of everyday language is considered its most essential characteristic and is no longer regarded as an index of its deficiency. It is due to its polymorphism that natural language is richer than any artificially created one.

METALANGUAGE

Languages with highly developed logic emerge to constitute metalanguages, whereby one may judge the correctness of statements made in the object-languages. Our everyday language is a metalanguage in relation to the 'language' of the things surrounding us. In terms of everyday language we operate not with things but with their names. Making judgements about the things of the outer world, we try to arrange them in some consistent structures, which is equivalent to searching for logical foundations of the world of things. Generally the metalanguage is formalised to a greater extent than the object language. A metalanguage should be so rich that everything stated in terms of the object language could be said in the metalanguage; in particular, it should have the means for constructing names of the object-language. "The metalanguage of everyday speech uses the same [...] language" [3, p. 39].

LANGUAGE OF SCIENCE

Components of the Language of Science

Science is a linguistic or symbolic representation of experience. Scientific development is reflected in the development of the language of science. In the language of science the symbols and grammar of ordinary language are used. New terms (or new combinations of old words) and new meanings ascribed to old words borrowed from everyday vocabulary are introduced. This gives an esoteric character to the specific languages of the sciences: they prove comprehensible only to the initiated. Terms in science are closely connected with the respective theoretical concepts, though on the surface many terms seem to be no more than names of some object or phenomenon. For example, the term 'Raman effect' would seem to be a name of some physically observed phenomenon. In fact, the meaning of this term becomes clear only through an understanding of the theory of this phenomenon. No doubt, in the language of science there are some terms which may be clearly defined. But the meaning changes with time, together with the development of scientific concepts. The meaning ascribed to the word 'atom' now differs considerably both from that ascribed by the ancient Greeks and from that used at the beginning of this century. The meanings ascribed to terms change in different theories though they may be closely related. Both 'Classical mechanics' and 'Relativity theory' make use of such terms as 'mass' and 'length', but they are interpreted differently.

Polymorphism of the Language of Science

Scientific terms have a more polymorphic character than the words of ordinary language. The term 'model' has been studied by Cho Yuan-Ren [8]. He has given a list of several synonyms which are characteristic of 'model' and non-synonyms which are notions contrasted to 'model'. Scientists express new notions with the help of rather unusual combinations of old, well known and familiar expressions. Scientists have permitted rather different senses for old terms with the emergence of new theoretical concepts. In science, theories are continuously changing, but the change does not cause a waterfall of new words. The new phenomena are interpreted through the old familiar ones, through old words whose meaning is slightly, continuously changed. For instance, everybody knows the common meaning of the word 'isolate'. In Colon Classification its meaning is completely changed (that which in isolation is not fit to form the name of a subject and has to have a Basic Subject component along with it to represent a subject). In the language of science, new concepts emerge, and old notions are often assigned new meanings. Because of their continuously changing nature, specific languages of science are accessible only to those working in the field and thus constantly interacting with the informational flow in science. It can be said that the emergence of a new independent discipline must be accompanied by the emergence of a new specific language or a dialect of the discipline. "In the realm of professional activities too there evolves a restricted and accommodated language, a so called professional language. A professional language is developed and used by people for some professional purpose, that is, performing professional tasks in a specific field and communicating about them. A professional language consists of vocabulary (terminology and concepts) and types of communicative acts (including typical intentions). A professional language is not only used for talking about an activity field (a universe of discourse); it is also a part constituting it" [9]. Of course, the Library and Information Processing profession has its own professional language and dialects too.

BIBLIOGRAPHIC INFORMATION SYSTEM

The functions of a Bibliographic Information System are: collecting, organising, storing, citing and disseminating documents or the information/ideas recorded in the documents. Such information processing activity can be generally divided into the following stages:

1. Collection of information/documents embodying information with the purpose of supplying, most fully, the necessary information/documents in accordance with the requests of the users;

2. Information analysis (analytic and synthetic processing of information/document surrogates/documents themselves), categorisation and systematisation of the incoming flow of information; and

3. Storage and retrieval of information/documents and the development and use of information retrieval procedures with the use of modern means for achieving the desired results.

Of the above mentioned three stages, the second one, namely information analysis or document analysis or subject analysis or content analysis, decides the efficiency of the system. It deals with systematising or organising the information/documents, accompanied by classification and indexing of the information content of the documents; the creation of new secondary documents or document surrogates on the basis of certain rules and procedures depending on specific tasks of information practice to form the index file or enquiry file; and the retrieval procedures constituting the interface in the information retrieval process. A model Bibliographic Information System is given in Fig. 1.

Documents are selected and received, and their information content is analysed to determine the subject categories into which they are to be classified as well as the index terms associated with the documents, according to a specified language having a grammar, called the information/documentation/indexing language. After the completion of this analysis, there is a branching which allows for more than one method of 'organising' the 'files'. The documents themselves are stored physically in a pre-defined sequence constituting the 'storage file'. Since only one physical arrangement of documents is possible without expensive duplication of the documents, alternative paths of access to them are provided through the 'index file', which contains the index terms determined at the analysis step.
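The branching into a storage file and an index file can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of the paper: the class `BibliographicStore` and its methods are hypothetical names, the index terms are supplied by hand (standing in for the analysis step), and a search question is taken to be a conjunction of index terms.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class BibliographicStore:
    """Toy model: one physical sequence of documents plus an index file
    that offers alternative paths of access to them."""

    def __init__(self) -> None:
        # Documents held once, in a single fixed order (the 'storage file').
        self.storage_file: List[str] = []
        # Index term -> positions in the storage file (the 'index file').
        self.index_file: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, document: str, index_terms: List[str]) -> int:
        """Store a document and record the index terms assigned at analysis."""
        position = len(self.storage_file)
        self.storage_file.append(document)
        for term in index_terms:
            self.index_file[term.lower()].add(position)
        return position

    def search(self, *terms: str) -> List[str]:
        """Return the documents indexed under every one of `terms`."""
        if not terms:
            return []
        postings = [self.index_file.get(t.lower(), set()) for t in terms]
        hits = set.intersection(*postings)
        return [self.storage_file[p] for p in sorted(hits)]
```

The design point is that each document is stored once, while any number of index terms may lead to it; the search only succeeds when the question is phrased in the same terms that the analysis step assigned.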

The user requests information. She/he interacts with the index and/or storage file. To do so, she/he must convert her/his request for information into a well defined search question to which the system can respond. The formulation of the search question again has to be according to, or in, the information/indexing language. When the user is satisfied with the documents which have been located in response to this request, she/he reads them to obtain new facts and ideas. Presumably, she/he will now become a generator of new documents, which will in turn find their way into the system.

[Fig. 1: model Bibliographic Information System (diagram; legible labels: SELECTION, ANALYSIS, STORAGE, INDEX)]

The nucleus of the Bibliographic Information System is the information/indexing language. According to it the documents are analysed. The creation and organisation of the 'index file' and the setting up of the retrieval procedures are all based on the analysis of the documents according to the prescriptions of the information/indexing language.

Natural language is a tool for thinking and a means of communication. In such a language there are synonymous correspondences between words and meanings. The meaning of words, sentences and even speech may change with time. Hence, the use of a natural language as such in information retrieval systems for the description of the semantic contents of documents is associated with difficult problems. In order to overcome these problems, special artificial languages called information languages/indexing languages/documentation languages are used. These artificial languages are formalised auxiliary languages, created by information professionals for use in bibliographic information storage and retrieval systems. Because of their formalised structure they are regarded as metalanguages for information organisation and retrieval. They are used in

1 analysing the information content of documents;

2 formulating names for the information content or names for the subject of the documents;

3 verifying the correctness and completeness of the statements of the names of subjects of documents; and

4 constructing other information handling tools.

INFORMATION/INDEXING LANGUAGE AS METALANGUAGE

In information/indexing languages, the meaning extracted from the text of a document (denoting the information content) is designated by concepts and their inter-relations (which are not necessarily found in the text), by 'ad hoc' symbols (descriptors, role indicators, facet indicators, links etc.), in a formalised way. Because these sets of symbols are external to the object-language in which the documents under analysis are written, and are intended to facilitate the manipulation of these documents in various ways and for various purposes, they can be called 'metalanguages' [4]. They are used not only in analysing 'what the information content of documents is' but also in constructing statements of the names of subjects of documents and in verifying the correctness and completeness of these statements. In the formalised statements of these metalanguages "some of the 'logical semantic' relations, specifically those of implication are specified, but not in the surface-structure of the object-language, that is, natural language" [10]. Moreover, these information/indexing languages, though developed under independent conditions for the analysis of documents of many different kinds - not only scientific texts, but also sociological records, scripture, folklore etc. - have striking similarities [4]. An account of these metalanguages has been given by Gardin [11].

FEATURES OF INFORMATION/SUBJECT AND LINGUISTIC ANALYSIS

Information/document analysis in the context of a particular science obviously implies some knowledge 'about' the universe of discourse. Such knowledge is needed to write summarising statements about the content of the document/information being analysed. It remains to be decided whether language analysis in a more general, less specialised field of speech or writing does not also imply some knowledge about that field.

Information/indexing languages do not feel it an obligation to consider the natural language sentence as the proper unit of analysis, as a large number of linguists would have it. For one thing, the definitions of what a sentence is in any given natural language are not so stable as to provide a firm analytical framework for the kind of semantic analysis required in information/indexing languages. Conversely, infra-sentential units often provide a more convenient basis on which to conduct the derivation of information/indexing language units, which are later chained to one another through procedures that again overstep the boundaries of the natural language sentence.

In language analysis priority is given to syntactical analysis, which is considered the necessary step by which to initiate the description or generation of natural language text and the understanding of language. The standard unit of analysis is the sentence, as it is more or less intuitively defined in traditional grammar; any larger unit is felt to exceed the scope of syntactical analysis. The basic language units for syntactical analysis are the grammatical categories (nouns, verbs, adjectives etc.) and grammatical functions (subject, object etc.) of former times, but they are being further refined. In accordance with the priority given to syntactical analysis, semantic analysis is confined to a subsidiary position, as illustrated by Katz and Fodor [12], which still seems to form the basis of the idea that the properties of 'surface structures' (in general terms) play a primary role in the determination of meaning [13].

But linguists themselves have felt the need to broaden the basis of language analysis. Grimes [14] has pointed out that it is unwise to go on ignoring the findings of other disciplines also concerned with the analysis of text, such as rhetoric, criticism etc., and information science itself, especially since some of the better defined procedures recently set forth in these areas occasionally unveil facts that linguists would be ill-advised to neglect. A number of linguists have proposed the broader concept of 'paraphrase' as a basis upon which to decide the proper level where sentences should be assigned the same 'deep structures' [15]. Also, the notion of syntactic analysis as a less discriminating form of semantic analysis [16], and the notion that "syntactic structures are derived from semantic graphs" [17], point to the fact that syntax is no longer considered to be the first step of language analysis. The derivation of syntactic structures from 'semantic graphs' is exactly the strategy adopted in information/indexing languages in connection with subject/document analysis.

An investigation of 'prelexical' structures, equivalent to representations in information/indexing languages, prior to the development of grammatical transformations has also been advocated by linguists [18]. Again, the adoption of "semantic categories (that) can guide the interpretation of sentences, independently of and in parallel with perceptual processing of the syntactic structure" [19] has also been advocated. Though some linguists have favoured units smaller than the sentence [16, p. 101-102], others on the contrary have demonstrated that larger units are necessary to provide a proper understanding of 'coordination', 'pronominalisation', 'subordination' etc. [20-22]. More generally, there have been many signs of a revival of interest in the analysis of 'discourse' over and above the analysis of sentences, as done in document/subject analysis.

It has also been observed by linguists [20, p. 32; 19, p. 297; 14, p. 141] that the 'proposition' defined in more or less logical terms provides a more convenient analytical framework than the grammatically defined sentence. In the presentation of the basic structures that are supposed to underlie the formation of speech, linguists have tended to use logical concepts often very close to those which underlie document/subject analysis. From the modest use of 'markers' [12] came the view of semantic representations as 'trees', from which has come the broader concept of semantic networks holding the relational or combinational information 'about the world' that has to be taken into account for a proper understanding of language. The function of such semantic networks is to provide for the enunciation and application of rules of 'well-formedness' that were presented as a necessary extension of syntactical theory. 'Presuppositions' have been given more and more importance later [23]. Though the notion of 'presupposition' needs "some sort of conceptual straightening up" [24], it can still be observed that it is being used for theoretical purposes in the same way as the notion of semantic or 'paradigmatic' organisation has been used for applied purposes in information/subject analysis.

Another parallelism with the procedures of document/subject analysis can be seen if the kind of 'categories' used in the formulation of presuppositions is considered. In brief, the presuppositions associated with a given word consist in a statement of some relational/combinational properties of the given word with respect to other words or classes of other words. The basic idea is that 'role structures' can be attached to particular words [22, p. 66] in order to express the range of their possible relations with other words, from a syntactico-semantic viewpoint. These role structures (the 'case frames' of Fillmore [25], the 'basic relation structures' of Bever [13, p. 279-352], the 'mental templates' etc.) naturally differ according to the particular word under consideration. However, identical role structures have been assigned to classes of words. By carrying the abstraction process further, a set of 'universal roles' can be suggested. According to a well known linguist, Langendoen: "eventually it will be necessary for linguistic theory to come up with a universal inventory of roles" [20, p. 79]. Vermier [26] too has come up with such conclusions.

Such universal roles, then, would have to be of a rather abstract nature (such as Agent, Instrument, Object, Process, Property etc.). They would thus tend to converge with logical operators of the kind used in 'syntactical' analysis, even though the initial purpose was to account for 'semantic' structures. Specific field-bound roles have been used in document/subject analysis for a long time, for example in the various faceted classification schemes for different special fields. The process of abstraction leading to the more general 'role operators' or 'categories' too is a well known path used in the theory of document analysis to relate semantics and syntax. Examples are the categories of Kaiser's systematic indexing, the Fundamental Categories Personality, Matter, Energy, Space and Time of Ranganathan's Colon Classification and his search for an 'absolute syntax' of facets [27, 28]. Moreover, the suggestion that "deep structures should be stated in terms of role relationships rather than syntactic relationships" [19, p. 279; 32, p. 62] suggests that the broad conceptual framework inferred from the practice of document/subject analysis should bear some (at least analogical) relation to linguistic theory and language analysis. Linguists also have come up with similar proposals, for instance Grimes' categories: States, Processes, Actions etc. [14, p. 167-8] and Bach's new category 'Contentive' to cover nouns, verbs and adjectives [29].
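The idea of a role structure attached to a word can be sketched as a small data structure. The sketch assumes, purely for illustration, that the abstract roles named above form a closed inventory and that a word's role structure is simply the subset of roles it admits; the names `UNIVERSAL_ROLES`, `RoleStructure` and the entry for 'cut' are hypothetical and do not reproduce Fillmore's case frames or any other cited formalism.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Abstract roles named in the text; treating them as a closed set is an assumption.
UNIVERSAL_ROLES: FrozenSet[str] = frozenset(
    {"Agent", "Instrument", "Object", "Process", "Property"}
)


@dataclass(frozen=True)
class RoleStructure:
    """Roles that other words may fill with respect to `word`."""

    word: str
    roles: FrozenSet[str]

    def accepts(self, fillers: Dict[str, str]) -> bool:
        """True if every supplied filler occupies a role this word allows."""
        return set(fillers) <= self.roles


# Hypothetical entry: 'cut' admits an agent, an instrument and an object.
CUT = RoleStructure("cut", frozenset({"Agent", "Instrument", "Object"}))
```

For example, `CUT.accepts({"Agent": "cook", "Instrument": "knife", "Object": "mango"})` holds, whereas a filler offered in the role 'Property' is rejected. Assigning the same `roles` set to many words corresponds to the observation that identical role structures have been assigned to classes of words.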

Further, meaning representation systems such as Sager's automatic conversion of texts to a structured information base [30], Schank's computerised paraphrase and inference system [31, 32], Wilks's intelligent analyser and understander of English [33], the question answering systems of Grishman and Hirschman [34] and of Lehnert [35], and so on, all depend on some sort of underlying semantic categorisation and deep syntactic relations.

Formally stated, the common thread between the theories of document/subject analysis and of language analysis is the role assigned to n-place predicates (the term 'predicate' in this context designates any relation holding between two or more entities, or any property of any entity) [10, p. 210] to express semantic as well as syntactic representations, with:

1 the need to categorise the symbols of the vocabulary (words, descriptors) in such a way that formation rules equivalent to the phrase-structure rules of grammar can be stated adequately with no regard to the grammatical status customarily assigned to the words concerned; and

2 the need to account for the derivation of propositions from one another, as a necessary component in the understanding of language behaviour.
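The two needs listed above can be sketched together: formation rules stated over categories of vocabulary symbols, and a derivation rule that produces one proposition from another. This is a minimal sketch under assumptions; the predicates, categories and the single derivation rule are hypothetical examples, not rules given by Gardin or Montgomery.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple


class Proposition(NamedTuple):
    """An n-place predicate applied to n vocabulary symbols."""
    predicate: str
    arguments: Tuple[str, ...]


# Hypothetical categorisation of vocabulary symbols (need 1).
CATEGORY: Dict[str, str] = {"aspirin": "Substance", "headache": "Condition"}

# Hypothetical formation rules: predicate -> category required at each place.
FORMATION_RULES: Dict[str, Tuple[str, ...]] = {
    "treats": ("Substance", "Condition"),      # two-place relation
    "treated_by": ("Condition", "Substance"),  # two-place relation
    "soluble": ("Substance",),                 # one-place property
}


def well_formed(p: Proposition) -> bool:
    """Check arity and argument categories against the formation rules."""
    expected = FORMATION_RULES.get(p.predicate)
    if expected is None or len(expected) != len(p.arguments):
        return False
    return all(CATEGORY.get(a) == c for a, c in zip(p.arguments, expected))


# Hypothetical derivation rule (need 2): treats(x, y) yields treated_by(y, x).
DERIVATION_RULES: Dict[str, Callable[[Proposition], Proposition]] = {
    "treats": lambda p: Proposition("treated_by", p.arguments[::-1]),
}


def derive(p: Proposition) -> Optional[Proposition]:
    """Derive a new proposition from a well-formed one, if a rule applies."""
    rule = DERIVATION_RULES.get(p.predicate)
    return rule(p) if rule is not None and well_formed(p) else None
```

Note that the formation rules refer only to the assumed categories 'Substance' and 'Condition', not to whether a word is customarily a noun or a verb.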

COMPONENTS OF INFORMATION/INDEXING LANGUAGE

In documentation/information/indexing languages, there is a focus on the concept of 'summary' or 'aboutness' which may not be considered important in 'other' languages. Information/indexing languages are used to formulate statements as answers to questions of the type 'what is the information content of particular documents about?'. These statements stand as names of the topics or subjects of the documents. In short, they are languages for naming subjects for the special purpose of storage and retrieval. The minimum essential components of any such metalanguage are [4, p.147]:

1 The 'lexicon', a list of content terms, either extracted as they occur in a given natural language (keywords) or redefined for the purpose of the analysis (descriptors). If the list of content terms carries no relational information of any kind, then the metalanguage is said to be 'unorganised' (uniterm lists).

2 A set of relational data provided 'a priori' with the lexicon, irrespective of the way in which they are expressed (cross references, hierarchy, tree, factoring etc.), and whether they be regarded as semantic data as in taxonomies or as syntactic templates as in some faceted classification schemes. This forms the paradigmatic structure of the metalanguage.

3 A set of rules of syntax, to take into account the immensely diverse relational data observed in documents. These rules constitute the syntagmatic structure of the metalanguage, which is contrasted with the paradigmatic structure not in essence but in use.
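The three components can be held in one small data structure, which also makes the 'unorganised' case explicit. The sketch below is an assumption-laden illustration: the class `IndexingLanguage`, the category names and the rule "terms must follow a fixed category order" are hypothetical stand-ins for whatever relational data and syntax a real indexing language defines.

```python
from dataclasses import dataclass, field


@dataclass
class IndexingLanguage:
    lexicon: set                                     # component 1: content terms
    relations: dict = field(default_factory=dict)    # component 2: paradigmatic data
    categories: dict = field(default_factory=dict)   # term -> category (assumed)
    syntax: tuple = ()                               # component 3: category order

    @property
    def is_unorganised(self):
        """A lexicon with no relational data at all (e.g. a uniterm list)."""
        return not self.relations

    def admissible(self, statement):
        """Syntagmatic check: known terms, in the category order of `syntax`."""
        positions = []
        for term in statement:
            category = self.categories.get(term)
            if term not in self.lexicon or category not in self.syntax:
                return False
            positions.append(self.syntax.index(category))
        return positions == sorted(positions)


# A bare keyword list: lexicon only.
uniterms = IndexingLanguage(lexicon={"rice", "disease", "india"})

# The same lexicon with hypothetical relational data and a syntax rule.
organised = IndexingLanguage(
    lexicon={"rice", "disease", "india"},
    relations={"rice": {"BT": ["cereal"]}},
    categories={"rice": "Entity", "disease": "Property", "india": "Space"},
    syntax=("Entity", "Property", "Space"),
)
```

Here the relational data exist before any document is indexed, whereas `admissible` is applied to each statement formulated for a particular document, which mirrors the contrast "not in essence but in use".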

POSTULATE BASED PERMUTED SUBJECT INDEXING LANGUAGE

The Postulate-based Permuted Subject Indexing (POPSI) language of Bhattacharyya [36-42] has all the essential features mentioned above. It is based on

1 a set of postulated Elementary Categories of the elements fit to form components of names of subjects;

2 a set of syntax rules with reference to the Elementary Categories for formulating admissible names of subjects; and

3 a faceted hierarchic scheme of terms for vocabulary control called Classaurus.

Each of the terms occurring in POPSI language statements has its appropriate broader terms prefixed to it. Hence the POPSI language has been shown to be a source language for producing and organising associated indexes [43]. It has also been shown to be a metalanguage for computer aided generation of information retrieval thesaurus [44], as well as of Classaurus [45, 46]. POPSI language is amenable to computerisation of index generation, especially in producing different types of indexes including chain index and PRECIS-format index entries [47, 48]. It is also possible to use POPSI language in computer-based online information retrieval. In such an online information retrieval system the searcher need not know any query language, and the system would have a built-in vocabulary control mechanism [48, 49].
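The prefixing of broader terms and the production of several index entries from one statement can be illustrated as follows. This is not the POPSI procedure described in [36-49]; it is a minimal sketch that assumes a hypothetical broader-term table, a hypothetical ' > ' separator and simple cyclic rotation as the way of bringing each term to the lead position.

```python
# Hypothetical broader-term table standing in for a vocabulary control tool.
BROADER = {
    "rice": "cereal",
    "cereal": "crop",
    "blast": "fungal disease",
    "fungal disease": "disease",
}


def with_broader_terms(term, broader=BROADER):
    """Prefix a term with its chain of broader terms, broadest first."""
    chain = [term]
    seen = {term}
    while chain[0] in broader:
        parent = broader[chain[0]]
        if parent in seen:  # guard against a cyclic table
            break
        chain.insert(0, parent)
        seen.add(parent)
    return " > ".join(chain)


def rotated_entries(statement):
    """One entry per term: each expanded term takes the lead position once."""
    expanded = [with_broader_terms(term) for term in statement]
    return [expanded[i:] + expanded[:i] for i in range(len(expanded))]
```

For the statement `["rice", "blast"]`, this yields one entry led by "crop > cereal > rice" and one led by "disease > fungal disease > blast", so either term gives access to the same subject statement.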

CONCLUSIONS

A study of the salient features of language and language analysis is necessary for the design of information retrieval systems in general and indexing systems in particular [2]. It would help in incorporating the required features, if not exactly then at least analogically, in indexing systems. POPSI has incorporated all the essential features necessary for making it a universal indexing system. Its resilience and amenability to computerisation have paved the way for research in this direction, the ultimate aim of which is to develop an online information retrieval system which would be user friendly.

BIBLIOGRAPHICAL REFERENCES

1. Beling, G. The use of EDP in terminological work. In Overcoming the language barrier: Proceedings of the Third European Congress on Information Systems and Networks. Luxembourg, 3-6 May, 1977. K.G. Saur, London. 1979. v.1, p 102-103.

2. Mitchell, Gillian. The natural language foundations of indexing language relations. Can Jl of Inf Sc 1979,4,99-104.

3. Nalimov, V.V. In the labyrinths of language: A mathematician's journey. ISI Press, Philadelphia. 1981. p 3.

4. Gardin, J.C. Document analysis and linguistic theory. Jl of Doc 1973,29(2),144.

5. Lyons, J. Semantics. Cambridge University Press, Cambridge. 1977. p 32.

6. Sloman, A. The primacy of non-communicative language. In The analysis of meaning: Proceedings of a conference held by the Aslib Informatics Group and the BCS Information Retrieval Specialist Group. (Informatics 5). Oxford, 26-28 March 1979. Aslib, London. 1979. p 1.

7. Morris, C. Signs, language and behaviour. Prentice Hall, New York. 1946.

8. Chao Yuan-Ren. Models in linguistics and models in general. In Logic, methodology and philosophy of science: Proceedings of the International Congress. Stanford University, Palo Alto, California. 1962. p 558-66.

9. Israel, J. The language of dialectics and the dialectics of language. Munksgaard, Copenhagen. 1979.

10. Montgomery, C.A. Linguistics and information science. Jl of Amer Soc for Inf Sc 1972,23(3),214-215.


11. Gardin, J C. Semantic analysis procedures in the science of man. Soc Sc Inf 1969,9,1342.

12. Katz, J and Fodor, J.A. The structure of semantic theory. Language 1963,39,170-210.

13. Chomsky, N. Deep structure, surface structure and semantic interpretation. In Steinberg, K and Jakobovits, L., Ed. Semantics: An interdisciplinary reader in philosophy, linguistics and psychology. Cambridge University Press, Cambridge. 1971. p 214.

14. Grimes, J E. The thread of discourse. Cornell University, Dept. of Modern Languages and Linguistics, Ithaca, New York. (Technical Report No.1). (NSF Grant. GS-3180). 1972. p 13-34.

15. Partee, B H. On the requirement that transformations preserve meaning. In Fillmore, C.J. and Langendoen, D.T., Ed. Studies in linguistic semantics. Holt, Rinehart and Winston, New York. 1971. p 3.

16. Wilks, Y A. Grammar, meaning and the machine analysis of language. Routledge and Kegan Paul, London. 1972. p 94.

17. Hutchins, W J. The generation of syntactic structures from a semantic base. North Holland, Amsterdam. 1971. p 2.

18. Postal, P M. On the surface verbs 'remind'. In above cited ref. 14; p 249-50.

19. Bever, T G. The cognitive basis for linguistic structures. In Hayes, J.R., Ed. Cognition and the development of language. John Wiley, New York. 1970. p 297.

20. Langendoen, D T. Essentials of English Grammar. Holt, Rinehart and Winston, New York. 1970. p 141.

21. Lakoff, R. If's, and's and But's about conjunction. In above cited ref. 14; p 55-70.

22. Thompson, S A. The deep structure of relative clauses. In above cited ref. 14; p 79-96.

23. Fillmore, C J. Verbs of judging: An exercise in semantic description. In above cited ref. 14; p 273-89.


24. Fillmore, C J and Langendoen, D.T., Ed. Studies in linguistic semantics. Holt, Rinehart and Winston, New York. 1971.

25. Fillmore, C J. The case for case. In Bach, E and Harms, R., Ed. Universals in linguistic theory. Holt, Rinehart and Winston, New York. 1968. p 1090.

26. Vermier, D. Semantic hierarchies and abstractions in conceptual schemata. Inf Systems 1983,8(2),117-24.

27. Ranganathan, S R. Hidden roots of classification. Inf Stor and Retr 1967,3,407-9.

28. Neelameghan, A. Absolute syntax and structure of an indexing and switching language. In Ordering systems for global information networks: Proceedings of the Third International Study Conference on Classification Research. Bombay, 6-11 Jan, 1975. (FID/CR Pub No. 553). DRTC, Bangalore. 1979. p 170.

29. Bach, E. Nouns and noun phrases. In above cited ref. 24; p 91.

30. Sager, N. Natural language information formatting: The automatic conversion of texts to a structured data base. Advances in Computers 1979,17,89-162.

31. Schank, R C. Inference and paraphrase by computer. Jl of Asso for Comput Machinery 1975,22,309-28.

32. Schank, R C and Abelson, R P. Scripts, plans and knowledge. In Proceedings of the Fourth International Joint Conference on Artificial Intelligence. Tbilisi. 1975. p 151-157.

33. Wilks, Y A. An intelligent analyser and understander of English. Comm of the Asso for Comput Machinery 1975,18,264-74.

34. Grishman, R. and Hirschman, L. Question answering for natural language medical data bases. Artif Intell 1978,11,2543.

35. Lehnert, W A. The process of question answering. Dept of Computer Science, Yale University. Research Report. 88. 1977.


36. Bhattacharyya, G and Neelameghan, A. Postulate-based subject headings for dictionary catalogue system. (DRTC Annual Seminar. 7; 1969; paper CA).

37. Bhattacharyya, G. A general theory of SIL, POPSI and Classaurus: Results of current classification research in India. (Paper presented at the International Classification Research Forum, organised by SIG/CR of ASIS, Minneapolis, Oct 1979).

38. Bhattacharyya, G. POPSI: Its fundamentals and procedure based on a general theory of subject indexing languages. Lib Sc with a slant to Doc 1979,16(1),142.

39. Bhattacharyya, G. Some significant results of current classification research in India. Intl Forum on Inf and Doc 1981,6(1),11-21.

40. Bhattacharyya, G. Subject indexing language: Its theory and practice. (DRTC Refresher Seminar, 13; 1981; paper BA).

41. Bhattacharyya, G. Elements of POPSI. In Rajan, T.N., Ed. Indexing systems: Concepts, models and techniques. IASLIC, Calcutta. 1981. p 82.

42. Bhattacharyya, G. Classaurus: Its fundamentals, design and use. In Dahlberg, I., Ed. Proceedings of the 4th International Study Conference on Classification Research. Augsburg, 28 June-2 July, 1982. Indeks Verlag, Frankfurt. v 1. 1982. p 13948.

43. Bhattacharyya, G. POPSI: A source language for organising and associative classifications. Lib Sc with a slant to Doc 1982,19,243-52.

44. Devadason, F J. Postulate based Permuted Subject Indexing language as a metalanguage for computer aided generation of information retrieval thesaurus. Intl Forum on Inf and Doc 1983,8(1),22-9.

45. Devadason, F.J. and Kothanda Ramanujam, M. Computer aided construction of alphabetic classaurus. In above cited ref. 41; p 173-82.

46. Devadason, F J. Online construction of alphabetic classaurus: A vocabulary control and indexing tool. Inf Process and Mgt 1985,21(1),14.


47. Devadason, F J. Computerisation of deep structure based indexes. Intl Classif 1985,12(2),87.

48. Devadason, F J. Computer-based system for generating different types of subject indexes and alphabetical classaurus based on 'deep structure' of subject indexing languages. Ph.D Thesis. (Guide: M.R. Kumbhar). Karnatak University, Dharwad. 1985.

49. Devadason, F.J. Computerised Deep Structure Indexing System. FID/CR Report. 21. Indeks Verlag, Frankfurt. 1986. 42 p.
