
E.-P. Lim et al. (Eds.): ICADL 2002, LNCS 2555, pp. 315-327, 2002. © Springer-Verlag Berlin Heidelberg 2002

A Workbench for Acquiring Semantic Information and Constructing Dictionary for Compound Noun Analysis

Kyung-Soon Lee1, Do-Wan Kim2, Kyo Kageura1, and Key-Sun Choi2

1 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
{kslee, kyo}@nii.ac.jp

2 Division of Computer Science, KAIST/KORTERM, 373-1 Kusung Yusong Daejeon, 305-701, Korea
[email protected], [email protected]

Abstract. This paper describes a workbench system for constructing a dictionary to interpret compound nouns, which integrates the acquisition of semantic information with the interpretation of compound nouns. First, we extract semantic information from a machine-readable dictionary (MRD) and corpora using regular expressions. Then, the semantic relations of compound nouns are interpreted based on semantic relations, automatically extracted semantic features, and subcategorization information, according to the characteristics of the head noun, i.e. attributive or predicative. Experimental results show that our method, which uses hybrid knowledge depending on the characteristics of the head noun, improves the accuracy rate by 40.30% and the coverage rate by 12.73% over previous research using semantic relations extracted from MRDs. As compound nouns are highly productive and their interpretation requires hybrid knowledge, we propose a workbench for compound noun interpretation in which the necessary knowledge, such as semantic patterns, semantic relations, and interpretation instances, can be extended, rather than assuming pre-defined lexical knowledge.

1 Introduction

The semantic interpretation of compound nouns (consisting of two nouns), that is, the analysis of the semantic relations between the constituent nouns, is useful for various applications. For instance, it makes possible the regeneration or paraphrasing of natural language sentences. It is also useful for syntagmatic query expansion in information retrieval and for the classification of answer types in question answering.

Much work has been done on the interpretation of compound nouns or noun sequences ([3], [16], [10], [14]). The methods used fall into two categories: those based on semantic relations and those based on semantic features. Methods based on semantic relations ([17], [16], [15]) interpret a compound noun using rules for lexical patterns and their semantic relations, which are extracted from machine-readable dictionaries (MRDs). MRD data is structured but incomplete, since dictionaries use a limited range of expressions to define terms. Corpus data, on the other hand, is unstructured but more complete, and includes a variety of expressions relating terms. What is necessary is a combination of the corpora and the MRD data, each of which is inadequate, but which, when combined, creates a rich source of semantic information. Methods based on semantic features ([1], [6], [8]) consider all possible combinations of semantic features for a modifier noun and a head noun. The feature-based method has difficulty with ambiguous cases in which the same feature sequence can take different relations.

In Korean text, compound nouns are common and highly productive. Simply juxtaposing a modifier noun and a head noun forms a compound noun, and compounds of this kind constitute the majority of Korean compound nouns. In addition, a predicative noun as the head of a compound can take a wide variety of case relations with its modifier noun. However, most predicative nouns are expressed with the same postpositions in MRDs and corpora, which makes it difficult to interpret their semantic relations. Since predicative nouns have selectional restrictions like verbs, we can interpret compound nouns with predicative heads using subcategorization information. In machine translation, word-to-word translation works in some cases, but in many cases semantic interpretation of compound nouns is necessary. For example, ‘[jang-ae-in] (the handicapped) [hak-gyo] (school)’ should be translated as ‘a school for the handicapped’ through semantic interpretation of the compound noun, not as ‘the handicapped school’ by word-to-word translation.

In this paper, we present a workbench system that integrates the acquisition of semantic information and the interpretation of compound nouns for Korean semantic analysis. The semantic relations of compound nouns are interpreted based on semantic relations and semantic features extracted automatically from an MRD and corpora, and on subcategorization information of predicative nouns, according to the characteristics of the head noun. Because compound nouns are highly productive and the information necessary for interpreting them is complex, we propose a workbench which integrates knowledge acquisition and compound noun interpretation with user feedback. The system keeps logs of interpretation errors, which are useful for analyzing error patterns between lexical patterns and semantic relations.

In the following, we first explain the method we propose for interpreting compound nouns in section 2. Then, in section 3, we explain the actual workbench system, which incorporates the mechanisms explained in section 2 and some additional features.

2 Interpretation of Korean Compound Nouns

Fig. 1 shows the overall architecture of our system for the interpretation of Korean compound nouns. The system acquires semantic information such as semantic relations and semantic features from an MRD and corpora, and constructs a semantic network of nouns. Using this semantic network, together with the subcategorization information of predicative nouns and interpretation rules, compound nouns are interpreted. Below, we explain the construction of the semantic network of nouns and the interpretation of compound nouns. Then we show the experimental results for the extraction of semantic information and the interpretation of compound nouns.


2.1 Semantic Information Automatically Extracted from MRDs and Corpora

As knowledge resources for interpreting compound nouns, we extract semantic information such as semantic relations and semantic features from MRDs and corpora by defining regular patterns for each. What is necessary is a combination of the corpora and the MRD data, each of which is inadequate, but which, when combined, creates a rich source of semantic information.

The semantic relations extracted are as follows: <subject>, <object>, <location>, <time>, <possessive>, <whole-part>, <part-whole>, <instrument>, <purpose>, <material>, <cause>, <caused-by>, and <by-means-of>. The <hypernym> relation between nouns is also extracted. These classifications account for most of the compound noun classes studied previously in theoretical linguistics ([2], [4], [17]). The semantic features extracted are <±abstract>, <±animal>, <±organization>, <±person>, <±location>, <±material>, and <±time>. The ‘+’ and ‘-’ signs indicate whether a noun has the feature. These features are used to define interpretation rules unambiguously. The extracted semantic information forms a semantic network in which a link represents a semantic relation and a node represents a noun. Each node can have semantic features, which are used for generalization.
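
The network structure described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the class name `SemanticNetwork`, its method names, and the use of English glosses as node labels are assumptions made for illustration. The two links for "manual" follow Table 3; the feature assigned to "leather" is an illustrative value.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SemanticNetwork:
    """Nouns are nodes; each directed link carries a semantic relation."""
    links: dict = field(default_factory=lambda: defaultdict(list))
    features: dict = field(default_factory=lambda: defaultdict(set))

    def add_relation(self, source, relation, target):
        # Store a labelled, directed link: source -<relation>-> target.
        self.links[source].append((relation, target))

    def add_feature(self, noun, feature):
        # Attach a semantic feature such as "+material" to a node.
        self.features[noun].add(feature)

    def relations_between(self, source, target):
        # Return the relations on direct links from source to target.
        return [rel for rel, tgt in self.links.get(source, []) if tgt == target]


net = SemanticNetwork()
net.add_relation("manual", "hypernym", "writings")  # MRD-derived link (Table 3)
net.add_relation("manual", "purpose", "user")       # corpus-derived link (Table 3)
net.add_feature("leather", "+material")             # illustrative feature value
print(net.relations_between("manual", "user"))
```

Keeping the features separate from the links mirrors the distinction in the text: links are looked up for a specific pair of nouns, while features allow rules to generalize over classes of nouns.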

For MRDs, semantic information is extracted from a head word and its definition sentence. A regular pattern consists of words, part-of-speech tags, and matching symbols: the symbol ‘|’ denotes alternatives and ‘*’ matches anything. We defined the patterns by analyzing head words and their definitions in part of an MRD. The semantic feature of a head word is determined by its hypernym. Corpora contain richer terms and more varied semantic relations than MRDs, although each regular pattern occurs with low frequency. The regular patterns for corpora have a simple sentence structure, which differs from that of the MRD patterns because in an MRD the head word is the implicit subject (Table 1). Table 2 shows examples of compound nouns with their semantic relations and interpretations. Table 3 shows the different semantic information extracted from an MRD and from corpora for the noun ‘[seol-myeong-seo] (manual)’. Using corpora, we can acquire a <purpose> relation which is not extracted from MRDs.

From the extracted semantic information, we see that MRDs are useful for extracting semantic features and hypernym relations between nouns, but make it difficult to extract a variety of semantic relations. Corpora, on the other hand, are very useful for extracting various semantic relations between nouns. Using a variety of corpora helps to alleviate the sparseness of semantic information.

[Figure 1 components: MRD; Corpora; Regular Expressions; Semantic Relation Extractor; Construction of Semantic Network; Compound Noun; Compound Noun Interpreter; Subcategorization of Predicative Nouns; Interpretation Rules; Interpretation of Compound Noun; Result of Interpretation]

Fig. 1. System architecture for acquiring knowledge and interpreting compound nouns.

Table 1. Regular patterns to extract <purpose> relation from MRDs and corpora.

Source    Field               Value
MRD       regular pattern     A/ncn */jco  /pvg+ /ecs
          semantic relation   [head word] –<purpose>
Corpora   regular pattern     B/ncn ( | )/jxt C/ncn */jco  /pvg+ /etm D
          semantic relation   [B] –<purpose>
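
To make the pattern-based extraction concrete, the following is a minimal Python sketch of applying one corpus-style pattern to a POS-tagged sentence. It is an illustration under stated assumptions, not the patterns actually used: the marker words `TOP`, `FOR`, `USE`, and `REL` are placeholders standing in for Korean morphemes, the choice of noun C as the target of the <purpose> relation is an assumption, and only the tag names (ncn, jxt, jco, pvg, etm) are taken from Table 1.

```python
import re

# Hypothetical corpus pattern in the spirit of Table 1:
#   B/ncn  <particle>/jxt  C/ncn  */jco  <verb>/pvg+<ending>/etm
# The literal words TOP, USE and REL are placeholders, not real Korean tokens.
PURPOSE_PATTERN = re.compile(
    r"(?P<B>\S+)/ncn\s+TOP/jxt\s+(?P<C>\S+)/ncn\s+\S+/jco\s+USE/pvg\+REL/etm"
)


def extract_purpose(tagged_sentence):
    """Return (B, '<purpose>', C) triples found in a POS-tagged sentence."""
    return [
        (m.group("B"), "<purpose>", m.group("C"))
        for m in PURPOSE_PATTERN.finditer(tagged_sentence)
    ]


# Toy sentence using English glosses instead of Korean words.
sentence = "manual/ncn TOP/jxt user/ncn FOR/jco USE/pvg+REL/etm book/ncn"
print(extract_purpose(sentence))
```

The named groups play the role of the pattern variables (B, C) in Table 1, and `\S+/jco` corresponds to the ‘*’ wildcard over words with a fixed tag.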

Table 2. Korean compound nouns and their interpretation

Compound Noun (N1 N2)   Semantic Relation   Interpretation
(user) (manual)         <purpose>           (manual for user)
(leather) (bag)         <material>          (bag made of leather)
(car) (wheel)           <whole-part>        (car’s wheel)

Table 3. Different semantic information acquired from an MRD and corpora

Source   Semantic information
MRD      [seol-myeong-seo] (manual) –<hypernym> [geul] (writings)
Corpus   [seol-myeong-seo] (manual) –<purpose> [sa-yong-ja] (user)

2.2 Interpretation of Korean Compound Nouns Depending on Head Nouns

Compound nouns are interpreted using the information in the semantic network and the subcategorization information of predicative nouns. The interpretation procedure differs according to the type of head noun, i.e. attributive or predicative. For attributive head nouns, the system interprets the compound based on semantic relations and semantic features. For predicative head nouns, the system interprets it based on semantic relations and subcategorization information.

In Korean, a predicative noun as the head of a compound can take a wide variety of case relations with its modifier noun. However, most predicative nouns are expressed with the same postpositions in MRDs and corpora, such as ‘[i]’ for the subjective postposition and ‘[leul]’ for the objective postposition, which makes it difficult to interpret their semantic relations. Since predicative nouns have selectional restrictions like verbs, subcategorization information is useful for deciding the semantic relation in ambiguous lexical patterns. Semantic features are used to supplement the interpretation based on semantic relations. The interpretation system weights the results produced by the interpretation rules according to their distance in the semantic network and the types of semantic relations.


2.2.1 Interpreting Compound Nouns with Attributive Heads

When the head noun of a compound noun is attributive, the system interprets the compound based on the semantic network and interpretation rules.

In the semantic network, some compound nouns are connected by a direct link (‘[ga-juk] (leather) [ga-bang] (bag)’), and some are connected indirectly through several links (‘[hwa-jang-pum] (cosmetics) [ga-ge] (store)’ in Fig. 2).

Fig. 2. Compound nouns connected with direct or indirect links on semantic network.

To interpret a compound noun whose nouns are connected by several links in the semantic network, the system uses interpretation rules for inference. Interpretation rules are constructed from semantic relations and semantic features. The rules with semantic features are used when semantic information is lacking or insufficient for determining an interpretation. The <possessive> relation is interpreted by a rule based on semantic features, since the relation is not extracted from MRDs and corpora, where it is always represented by the same expression, such as ‘[eui]’. Table 4 shows the interpretation rules for the <material> relation.

Table 4. Interpretation rules for <material> relation.

Rule      Modifier noun      Head noun
Rule 15   <material>         semantic network
Rule 16   semantic network   <object>
Rule 17   <+material>        <-abstract>

Two nouns in the semantic network can be connected by both direct and indirect links, so the system has to choose the proper interpretation among those links. The system selects the best interpretation by weighting the links according to the distance and the type of semantic relation. If two nouns are connected by a direct link, the interpretation given by that link has the highest weight. As the number of bridge nodes connecting the two nouns increases, the weight of the interpretation decreases. If two nouns are connected by several indirect links, we assign priority according to the type of semantic relation as follows: priority 1: <hypernym>; priority 2: <part>, <material>; priority 3: <object>, <subject>.
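
The weighting scheme can be sketched as follows. This is a minimal illustration, not the system's implementation: the ordering of the sort key (number of bridge nodes first, then the best relation priority on the path), the `max_links` cutoff, the function names, and the toy links (including the bridge noun `hide`) are assumptions. Only the three priority classes are taken from the text.

```python
from collections import deque

# Priority classes from the text; unlisted relations get the lowest priority (4).
PRIORITY = {"hypernym": 1, "part": 2, "material": 2, "object": 3, "subject": 3}


def find_paths(links, start, goal, max_links=3):
    """Enumerate link sequences from start to goal, shortest first (BFS)."""
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_links:
            continue
        visited = {start} | {target for _, target in path}
        for relation, target in links.get(node, []):
            if target not in visited:  # avoid cycles
                queue.append((target, path + [(relation, target)]))
    return paths


def rank_paths(paths):
    """Order candidate paths: fewer bridge nodes first, then relation priority."""
    def key(path):
        bridge_nodes = len(path) - 1
        best_type = min(PRIORITY.get(rel, 4) for rel, _ in path)
        return (bridge_nodes, best_type)
    return sorted(paths, key=key)


# Toy network: a direct <material> link and a hypothetical two-link detour.
links = {
    "leather": [("material", "bag"), ("hypernym", "hide")],
    "hide": [("object", "bag")],
}
best = rank_paths(find_paths(links, "leather", "bag"))[0]
print(best)
```

Because the number of bridge nodes is the first component of the sort key, a direct link always outranks an indirect path, whatever the relation types on that path.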

2.2.2 Interpreting Compound Nouns with Predicative Heads

When the head noun is predicative, i.e. represents a state or an action, the system interprets the compound based on semantic relations and subcategorization information.


In Korean, a predicative noun and the suffix ‘[ha-da] (do)’ or ‘[doi-da] (become)’ form a verb. For example, the combination of the predicative noun ‘[geo-lae] (transaction)’ and the suffix ‘[ha-da]’ makes the verb ‘[geo-lae ha-da] (transact)’. The resulting verb therefore governs cases. Subcategorization is useful for the subject and object relations, because these are difficult to extract from MRDs: most regular patterns are expressed with ambiguous markers such as ‘[i]’ and ‘[leul]’, the subjective and objective postpositions, respectively. For example, the compound noun ‘[jusik] (stock) [geo-lae] (transaction)’ is interpreted as an <object> relation based on the subcategorization of the predicative noun ‘[geo-lae]’ and on ‘[jeung-kwon]’, which has a <hypernym> relation to ‘[jusik]’ in the semantic network. Matching is tried using the nouns of N2 or concepts such as <thing>, and their links in the semantic network (Fig. 3).

Fig. 3. Example of interpreting a compound noun with a predicative head.
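
The matching for predicative heads can be sketched in Python as below. This is a simplified illustration under assumptions: the subcategorization frame for "transaction" (its slot fillers) and the hypernym chain are invented toy data using English glosses, with "securities" standing in for ‘[jeung-kwon]’; the function names are hypothetical.

```python
# Toy hypernym links (child -> parent) and a toy subcategorization frame.
HYPERNYM = {"stock": "securities", "securities": "thing"}
SUBCAT = {
    "transaction": {
        "object": {"securities"},
        "subject": {"person", "organization"},
    }
}


def hypernym_chain(noun):
    """Return the noun followed by its successive hypernyms."""
    chain = [noun]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def interpret_predicative(modifier, head):
    """Return the case relation whose slot accepts the modifier, or None."""
    frame = SUBCAT.get(head, {})
    for concept in hypernym_chain(modifier):  # most specific concept first
        for relation, fillers in frame.items():
            if concept in fillers:
                return relation
    return None


print(interpret_predicative("stock", "transaction"))
```

Walking up the hypernym chain lets a frame stated over a general concept cover modifiers that are never listed in the frame themselves.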

2.3 Experiments

We conducted experiments on the extraction of semantic information and on the interpretation of compound nouns. Table 5 shows statistics for the MRD and corpora used to extract semantic information.

Table 5. The statistics of experimental data from an MRD and corpora.

Resources           Number of sentences          Ratio
MRD (definition)    8780 (for 5956 head words)   3.22%
Corpora: Fiction    251279                       91.89%
Corpora: Essay      13385                        4.85%
Total               273444                       100%

2.3.1 Experiment 1: Extraction of Semantic Information

We extracted 18,262 semantic relations among 5,235 terms from the MRD. From the MRD and the corpus together, we extracted 53,644 semantic relations among 10,255 terms by defining 128 regular expressions; 3,160 terms and 6,298 semantic relations were extracted redundantly from both the MRD and the corpus. The average number of semantic relations per term is 5.23. We evaluated 500 randomly selected semantic relations from each resource. The precision is 80.6% for the MRD and 82.6% for the corpora (Table 6). To extract the 7 semantic features, we used only the MRD and determined each feature from the hypernym of the head word. The precision is 97.32% for 996 terms.

Table 6. The number of semantic relations and evaluation results (Corr: the number of correct answers, Samp: the number of samples randomly selected).

Semantic Relation     MRD Total   MRD Corr/Samp        Corpus Total   Corpus Corr/Samp
BY-MEANS-OF           317         4 / 5                576            2 / 5
CAUSE                 4           -                    7              -
CAUSE-BY              14          -                    25             -
HAS-OBJECT            5523        144 / 164            6798           43 / 49
OBJECT-OF             0           -                    21379          283 / 335
HAS-PART              14          0 / 1                21             -
PART-OF               50          2 / 2                1              -
HAS-SUBJECT           2054        43 / 69              2365           6 / 11
SUBJECT-OF            0           -                    8877           76 / 93
HYPERNYM              7947        182 / 221            0              -
INSTRUMENT-FOR        14          1 / 1                3              -
LOCATED-AT            71          2 / 4                1157           2 / 4
LOCATION-OF           287         4 / 8                17             -
MADE-INTO             3           -                    16             -
MADE-OF               35          1 / 1                45             -
PURPOSE               309         8 / 8                182            1 / 1
TIME-OF               624         12 / 16              211            0 / 2
Total (Precision %)   18262       403 / 500 (80.6%)    41680          413 / 500 (82.6%)
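
As a worked check of the figures reported for Experiment 1, the short Python sketch below recomputes them from the totals in the text and in Table 6. It assumes that the combined relation count is the sum of the MRD and corpus totals minus the redundantly extracted relations counted once; the variable names are illustrative.

```python
# Totals reported in the text and in Table 6.
mrd_relations = 18262
corpus_relations = 41680
shared_relations = 6298   # extracted from both the MRD and the corpus
combined_terms = 10255

# Assumption: the combined count removes the shared relations once.
combined_relations = mrd_relations + corpus_relations - shared_relations
relations_per_term = combined_relations / combined_terms

# Precision over the 500 sampled relations per resource (Corr / Samp).
mrd_precision = 100 * 403 / 500
corpus_precision = 100 * 413 / 500

print(combined_relations, round(relations_per_term, 2))
print(mrd_precision, corpus_precision)
```

Under this assumption the computed values agree with the reported 53,644 relations, 5.23 relations per term, and precisions of 80.6% and 82.6%.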

2.3.2 Experiment 2: Interpretation of Compound Nouns

Using subcategorization information ([7]) and the automatically extracted semantic information, we interpreted compound nouns and evaluated the performance on 450 compound nouns randomly selected from the compound noun list constructed by Nam ([13]). Five evaluators judged the results. We regard the interpretation of a compound noun as correct when three or more evaluators judge it correct.

Fig. 4. The ratio of compound nouns according to interpretability.

Fig. 4 shows the composition of the compound nouns used in our experiment by interpretability: 303 compound nouns are interpretable, 54 are hard to interpret, and 93 are beyond the scope of our semantic relations. The out-of-scope relations include <equal> (‘son daughter’) and <color> (‘black shoes’). More semantic relations are needed to interpret such compound nouns.


We evaluated the 303 interpretable compound nouns by precision and recall:

Precision = C / (C + W) * 100    (1)
Recall = (C + W) / (C + W + F) * 100    (2)

where C is the number of correct answers, W is the number of incorrect answers, and F is the number of interpretation failures.

Table 7 shows the effect of the resources used on interpretation. The performance using semantic information extracted from an MRD, corpora, and subcategorization improves by +40.30% in precision and +12.73% in coverage over previous research using semantic relations extracted only from MRDs.

Table 7. The performance for interpretation of compound nouns according to resources used.

Resources                      Correct   Incorrect   Failure   Precision   Recall
MRD                            35        48          220       42.2%       27.4%
MRD+Corpus                     54        46          203       54.0%       33.0%
MRD+Subcategorization          55        41          207       57.3%       31.7%
MRD+Corpus+Subcategorization   73        38          192       65.8%       36.6%
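
The measures in (1) and (2) can be recomputed from the counts in Table 7 with a few lines of Python. The function and variable names are illustrative; the counts are those reported in the table. Note that recall as defined in (2) is the share of compounds for which the system returned any interpretation, which the text also refers to as coverage.

```python
def precision(correct, incorrect):
    """Equation (1): share of attempted interpretations that are correct."""
    return 100 * correct / (correct + incorrect)


def recall(correct, incorrect, failure):
    """Equation (2): share of compounds for which an interpretation was produced."""
    return 100 * (correct + incorrect) / (correct + incorrect + failure)


# (Correct, Incorrect, Failure) counts from Table 7.
TABLE_7 = {
    "MRD": (35, 48, 220),
    "MRD+Corpus": (54, 46, 203),
    "MRD+Subcategorization": (55, 41, 207),
    "MRD+Corpus+Subcategorization": (73, 38, 192),
}

for name, (c, w, f) in TABLE_7.items():
    print(name, round(precision(c, w), 1), round(recall(c, w, f), 1))
```

Each row sums to the 303 interpretable compound nouns, and the rounded values reproduce the Precision and Recall columns of Table 7.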

Table 8. Error example for exceptional nouns applied to a regular pattern.

Head word             Definition in an MRD           Semantic information
(sack)                (paper), (textile) (bag)       –<material> O, –<material> O
(wheel)               (round) (shape) (thing)        –<material> X
(artificial flower)   (artificiality) (flower)       –<material> X

2.3.3 Error Analysis

Error types in extracting semantic information

(1) Error in part-of-speech tagging: Because of POS tagging errors, regular patterns were not applied or wrong semantic information was extracted.

(2) Error in syntactic analysis of parallel structures: Since the regular patterns are defined to extract only simple parallel structures, we failed to acquire semantic information from complex parallel structures.

(3) Error from exceptional nouns for a regular pattern: The combination of certain nouns with a regular pattern causes errors. For example, when the regular pattern ‘[eu-lo man-deun] (made by, of, from)’ for the <material> relation is applied to nouns such as ‘[in-gong] (artificiality)’ and ‘[mo-yang] (shape)’, the semantic information extracted is wrong (Table 8).


Error types in interpreting compound nouns

(1) Lack of semantic information and errors in the semantic network: Errors in the semantic network directly cause interpretation errors. Elaborating the regular patterns and curating the semantic network will reduce these errors. The lack of semantic information can be addressed by expanding the regular patterns and by using a thesaurus and inference.

(2) Error in interpretation rules: In the semantic network, we could not interpret by inference when two nouns were connected by <subject> or <object> links, even though they were connected. Interpretation rules using semantic features also produce errors. We interpreted the <possessive> relation based on semantic features, which caused many errors because the rule cannot handle lexical-level interpretation. More fine-grained interpretation rules are needed.

To interpret a compound noun, its two nouns must be connected in the semantic network by some link. By constructing the semantic network with semantic relations extracted from corpora as well as MRDs, the network gains many more terms and links, so the system can obtain more correct interpretations. By interpreting compound nouns according to the characteristics of the head noun, i.e. attributive or predicative, using hybrid knowledge, the coverage of interpretation can be extended.

3 A Workbench for Integration of Knowledge Acquisition and Interpretation

Interpreting compound nouns requires a very large amount of high-quality lexical and semantic information about nouns, but extracting large-scale, error-free semantic information is very difficult. Through experiments and analysis, we observed that interpretation systems require more fine-grained regular patterns, a wider variety of semantic relations, and more interpretation rules. We propose a workbench system which integrates knowledge acquisition and the compound noun interpretation procedure for Korean semantic analysis with user feedback. The system consists of a semantic relation pattern extractor, a knowledge indexer, a compound noun extractor, and a compound noun interpreter. The system keeps a log of the user’s error corrections, which makes it possible to handle exceptional usages specifically.

3.1 Semantic Relation Pattern Extractor

The semantic relation pattern extractor defines regular patterns and their semantic relations by searching for lexical patterns in a POS-tagged corpus (Fig. 5).

1. Input a POS-tagged corpus.
2. Define the lexical patterns to examine. The first and last parts consist of nouns. For example, a lexical pattern to search for is:

   N1/n* */j* #4 N2/n*    (3)

   where N1 is a modifier noun, N2 is a head noun, the ‘/’ symbol separates a lexical item from its POS tag, the ‘*’ symbol matches anything, ‘#<number>’ represents the maximum distance between N1 and N2, ‘n*’ stands for all noun tags, and ‘j*’ stands for all postposition tags.
3. The system lists all possible compound nouns (CN) for the regular patterns. If a user defines the regular pattern in (3), the results will include:

   /ncn+ /ncpa+ /jca /pvg+ /etm /ncn    (4)

4. The system provides all possible regular patterns between the two nouns. For (4), the system gives two results: ‘ /jca’ and ‘ /jca /pvg+ /etm’.
5. The user decides the regular pattern and the semantic relation for the compound noun; predefined semantic relations such as <by-means-of> and <purpose> are shown for selection. The user can also define new semantic relations.

3.2 Compound Nouns and Their Interpretations Extractor

The compound noun extractor extracts compound nouns and their possible interpretations based on semantic relations, inference, and the semantic network (Fig. 6).

1. The user selects the block of the POS-tagged corpus from which to extract compound nouns and presses the extraction button.
2. The system then provides a list of all possible compound nouns and their semantic relations.
3. For a selected passage that includes a compound noun, the system provides the compound noun and its interpretation according to the knowledge of the current system.
4. The result can be obtained by searching the indexed knowledge for semantic relation patterns and the existing semantic network.
5. Here the user can select the level of hypernym, which makes generalization possible.
6. Regular patterns are presented. For a wrong interpretation of a compound noun, the system records an error log.

Fig. 5. Semantic relation pattern extractor.

Fig. 6. Compound noun extractor.


3.3 Knowledge Indexer

The knowledge indexer constructs an internal indexing structure for nouns and their semantic relations in the semantic network. To interpret a compound noun, the system uses this indexing information. By indexing incoming semantic information and extracted compound nouns, we can extend the knowledge database.

Indexing for semantic relation patterns: For each regular pattern (the key part of the index), the order of nouns and the possible semantic relations are indexed.

Key field (regular pattern)        Data field (<order of nouns, relation>)
 /jca /pvg+ /etm (by means of)     N1, N2 <cause>, <by-means-of>
 /jca /pvg+ /etm (made from)       N1, N2 <material>

Indexing for compound nouns and their semantic relations: For each noun of a compound noun, the position and semantic relation are indexed. The key part of the index is a noun, and the data part is the position, i.e. front or rear, the other noun, and their semantic relation. For the compound noun ‘(leather bag)’, the indexing structure is as follows:

Key field (noun)   Data field (<position, noun, relation>)
(leather)          front (bag) <material>
(bag)              rear (leather) <material>
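
The two index structures can be sketched with plain Python dictionaries. This is only an illustration of the key and data fields described above: the function names are hypothetical, English glosses replace the Korean nouns, and the pattern key `X/jca Y/pvg+Z/etm` is a placeholder for a real regular pattern.

```python
from collections import defaultdict

pattern_index = {}               # regular pattern -> (order of nouns, relations)
noun_index = defaultdict(list)   # noun -> [(position, other noun, relation)]


def index_pattern(pattern, noun_order, relations):
    """Index a regular pattern with its noun order and possible relations."""
    pattern_index[pattern] = (noun_order, list(relations))


def index_compound(modifier, head, relation):
    """Index both nouns of a compound, recording position and relation."""
    noun_index[modifier].append(("front", head, relation))
    noun_index[head].append(("rear", modifier, relation))


def lookup(modifier, head):
    """Return the relations stored for modifier + head as a compound."""
    return [
        relation
        for position, other, relation in noun_index.get(modifier, [])
        if position == "front" and other == head
    ]


index_pattern("X/jca Y/pvg+Z/etm", ("N1", "N2"), ["material"])  # placeholder key
index_compound("leather", "bag", "material")
print(lookup("leather", "bag"))
```

Indexing each compound under both of its nouns means that an interpretation instance can be found starting from either the modifier or the head.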

3.4 Compound Noun Interpreter

The compound noun interpreter interprets a compound noun using interpretation rules and inference. Fig. 7 shows the interpretation of ‘(bus accident)’ by a direct link, and of ‘(car accident)’ as a <cause> relation through the <hypernym> relation between ‘(bus)’ and ‘(car)’.

Fig. 7. Examples for compound noun interpretation of ‘bus accident’ and ‘car accident’


4 Conclusion

This paper has described a workbench system for constructing a dictionary to interpret Korean compound nouns, which integrates the acquisition of semantic information and the interpretation of compound nouns. To acquire knowledge for the interpretation of compound nouns, we extracted semantic relations and semantic features from MRDs and corpora. The precision of the extracted semantic information is 80.6% for the MRD and 82.6% for the corpora. To interpret compound nouns, we used hybrid knowledge, namely semantic relations, semantic features, and subcategorization information, depending on the characteristics of the head noun. Experimental results show that our method improved the accuracy rate by 40.30% and the coverage rate by 12.73% over the rates obtained in previous work using semantic relations extracted from MRDs. By constructing the semantic network with semantic relations extracted from corpora as well as MRDs, the network gains many more terms and links, so the system can give more correct interpretations. By interpreting compound nouns based on hybrid knowledge depending on the characteristics of the head noun, the coverage of interpretation can be extended.

As compound nouns are highly productive and their interpretation requires complex knowledge, we built a workbench for compound noun interpretation in which the necessary knowledge, such as semantic patterns and interpretation instances of compound nouns, can be extended, rather than assuming pre-defined lexical knowledge. The workbench integrates knowledge acquisition and interpretation.

References

1. Dolan, B., Vanderwende, L., Richardson, S.: Automatically deriving structured knowledge base from on-line dictionaries. In Pacific Association for Computational Linguistics. (1993).

2. Downing, P.: On the creation and use of English compound nouns. Language 53. (1977) pp. 810-842.

3. Finin, T.W.: The semantic interpretation of compound nominals. U. of Illinois at Urbana-Champaign. University Microfilms International. (1980).

4. Jespersen, O.: A modern English grammar on historical principles, VI. George Allen & Unwin Ltd., London, 1909-49; reprinted 1954. (1954).

5. Kang, I.H.: Korean part-of-speech tagging based on maximum entropy model. MS thesis. KAIST (1999) (in Korean).

6. Kang, Y.H.: Noun semantic classification for Korean-to-English machine translation. MS thesis. University of Kyungpook (1989) (in Korean).

7. Kim, I.T.: Research on subcategorization of verb for Korean sentence analysis. Report of SERI. (1997).

8. Kim, S.N., Won, S.Y., Kwon, H.C. et al.: Korean compound noun analysis using semantic information. In KISS Fall. (1998) (in Korean).

9. Kurohashi, Sadao, Sakai, Yasuyuki: Semantic analysis of Japanese noun phrases: A New Approach to Dictionary-Based Understanding. In ACL99. (1999).

10. Lehnert, W.: The analysis of nominal compounds. In U. Eco, M. Santambrogio, and P. Violi Ed., Meaning and Mental Representations, VS 44/45, Bompiani, Milan. (1988).

11. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: an on-line lexical database. International Journal of Lexicography 3. (1990).

12. Montemagni, S., Vanderwende, L.: Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries. In COLING92. (1992).

13. Nam, Ji-Sun and Choi, Key-Sun: Korean electronic dictionary. Technical report. CAIR-TR-97-72. (1997) (in Korean).

14. Rosario, B., Hearst, M.: Classifying the semantic relation in noun compounds via a domain-specific lexical hierarchy. In EMNLP-2001. (2001).

15. Richardson, S., Dolan, W., Vanderwende, L.: MindNet: acquiring and structuring semantic information from text. In COLING-98. (1998).

16. Sparck Jones, K.: So what about parsing compound nouns? In K. Sparck Jones and Y. Wilks Ed., Automatic Natural Language Parsing, Ellis Horwood, Chichester, England. (1983) pp. 164-168.

17. Vanderwende, L.: The analysis of noun sequences using semantic information extracted from on-line dictionaries. Ph.D. thesis, Georgetown University. (1995).