Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). Advisor: Dr. Hsu. Graduate: Chun Kai Chen. Authors: Qing Ma, Kyoko Kanzaki, Yujie Zhang, Masaki Murata, Hitoshi Isahara. Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel. Neural Networks 17 (2004) 1241–1253.


Transcript of Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Page 1: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Intelligent Database Systems Lab

國立雲林科技大學 (National Yunlin University of Science and Technology)

Advisor: Dr. Hsu

Graduate: Chun Kai Chen

Authors: Qing Ma, Kyoko Kanzaki, Yujie Zhang, Masaki Murata, Hitoshi Isahara

Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Neural Networks 17 (2004) 1241–1253

Page 2: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Outline

• Motivation
• Objective
• Introduction
• Self-organizing monolingual semantic maps
• Experimental Results
• Conclusions
• Personal Opinion

Page 3: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Motivation

A number of corpus-based statistical approaches have been used to compute word similarity

With these approaches, however, it is difficult to recognize the relationships between groups or the relationships between words within a group

Page 4: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Objective

We need a technique that can map words from a very large lexicon into a small semantic space

A visual representation in which words with similar meanings are placed at the same or neighboring points, so that the distance between points reflects the semantic similarity of the words

Semantic maps can be automatically constructed with self-organization

Page 5: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Introduction

The paper presents a method for self-organizing monolingual semantic maps for Chinese and Japanese, using a self-organizing map (SOM) for a specific purpose

The purpose is to construct semantic maps of nouns from the point of view of their adnominal constituents

The method is then extended to the construction of Japanese–Chinese bilingual semantic maps

Page 6: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Self-organizing monolingual semantic maps

Data coding

• Baseline method
• Frequency term-weighting method
• TFIDF term-weighting method

dij is the word similarity


Page 7: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Data coding

Word wi can be defined by a set of its co-occurring words as

V(wi) is the input to the SOM

dij only reflects the relationship between a pair of words
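To make the role of V(wi) concrete, the following is a minimal sketch of how a SOM could place input vectors on a two-dimensional grid. It is not the authors' implementation: the grid size, the learning-rate and neighborhood schedules, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration, and the toy vectors stand in for real V(wi) codes.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))


def train_som(vectors, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim).

    The linear decay of the learning rate and neighborhood width is an
    assumption, not a schedule taken from the paper.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(vectors, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the Gaussian neighborhood.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        for x in data[rng.permutation(len(data))]:
            bmu = best_matching_unit(weights, x)
            grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbors toward the input.
            weights += lr * h[..., None] * (x - weights)
    return weights


# Hypothetical coded words: each row plays the role of one V(wi).
codes = {
    "word_a": [0.0, 0.2, 0.9, 1.0],
    "word_b": [0.2, 0.0, 1.0, 0.9],
    "word_c": [0.9, 1.0, 0.0, 0.3],
}
som = train_som(list(codes.values()))
for word, vec in codes.items():
    print(word, best_matching_unit(som, np.asarray(vec)))
```

Each word ends up at the grid node whose weight vector best matches its code, which is how a map position can be read off for every word.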

Page 8: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Data coding method

Baseline method

─ dij: word similarity

─ ai and aj: the numbers of co-occurring words of wi and wj

─ cij: the number of co-occurring words that wi and wj have in common

Frequency term-weighting method

TFIDF term-weighting method
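The baseline formula itself was an image on the slide and does not survive in this transcript. The sketch below is therefore only an illustration built from the three quantities the slide defines (ai, aj, cij). It assumes a Jaccard-style measure in which dij is 0 for identical co-occurrence sets and 1 for disjoint ones; the actual formula in the paper may differ. The function names and the toy word sets are hypothetical.

```python
def baseline_distance(cooc_i, cooc_j):
    """Assumed Jaccard-style d_ij computed from a_i, a_j and c_ij."""
    a_i, a_j = len(cooc_i), len(cooc_j)   # numbers of co-occurring words
    c_ij = len(cooc_i & cooc_j)           # co-occurring words in common
    union = a_i + a_j - c_ij
    if union == 0:
        return 0.0
    return (a_i + a_j - 2 * c_ij) / union


def code_words(cooc):
    """Build one vector per word: V(w_i) = [d_i1, ..., d_in] (d_ii = 0)."""
    words = sorted(cooc)
    return {
        w: [0.0 if w == v else baseline_distance(cooc[w], cooc[v]) for v in words]
        for w in words
    }


# Hypothetical nouns with the sets of words they co-occur with.
cooc = {
    "feeling": {"happy", "sad", "strange"},
    "mood": {"happy", "sad", "good"},
    "shape": {"round", "strange"},
}
for word, vector in code_words(cooc).items():
    print(word, [round(d, 2) for d in vector])
```

Under this assumption each word is coded by its measure against every other word in the vocabulary, so the vector reflects more than a single pairwise relationship.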

Page 9: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Table 1. Comparative results for various coding methods and clustering

Page 10: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Evaluation methods

Numerical evaluation
─ precision
─ recall
─ F-measure

Intuitive evaluation
─ our ‘common sense’

Comparison with other methods
─ multivariate statistical analyses
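The three numerical measures listed above can be computed as in the sketch below. It uses the standard set-based definitions. How the paper decides which words count as correctly classified is not stated in this transcript, so the word sets here are purely hypothetical.

```python
def precision_recall_f(predicted, gold):
    """Standard precision, recall and balanced F-measure over two sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: words a method placed in one area of the map
# versus the words a human judged to belong to that area.
placed = {"feeling", "mood", "shape"}
judged = {"feeling", "mood", "emotion", "sense"}
p, r, f = precision_recall_f(placed, judged)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```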

Page 11: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel


Experimental Results

Page 12: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel


Page 13: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

TFIDF comparison with PCA

Fig. 1. Chinese semantic map based on TFIDF term-weighted coding

Fig. 2. Chinese semantic map using principal component analysis
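For readers who want to reproduce the kind of PCA map used as a comparison, the sketch below projects coded word vectors onto their first two principal components with NumPy's SVD. The function name `pca_project` and the input vectors are assumptions for illustration. The preprocessing actually used in the paper is not given in this transcript.

```python
import numpy as np


def pca_project(vectors, n_components=2):
    """Project row vectors onto their leading principal components."""
    data = np.asarray(vectors, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical coded words (one row per word).
codes = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 1.0, 0.9],
    [0.9, 1.0, 0.0, 0.3],
    [1.0, 0.9, 0.3, 0.0],
]
for point in pca_project(codes):
    print(np.round(point, 3))
```

Unlike a SOM, this projection is linear, which is one reason the two kinds of map can look quite different for the same input.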

Page 14: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Table 2. Clustering results with TFIDF term-weighted coding

The underlined words are those classified into incorrect areas.

Page 15: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel


Page 16: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Semantic map comparison with PCA

Fig. 3. Japanese semantic map based on the TFIDF term-weighted coding method

Fig. 4. Japanese semantic map using principal component analysis

Page 17: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Self-organizing bilingual semantic maps (1/2)

Consider a translation pair of sentences such as:

(Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta.
(Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai.
(English) We can see that upper management has realized that the economy is fixed in an era of slow growth.

Each Japanese word can be automatically aligned to a Chinese word on the bilingual semantic map by measuring the distance between them

For example, if the Chinese word keyi (can) is closest to the Japanese word seta (can), then seta is regarded as being aligned to keyi
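The nearest-word alignment rule can be sketched as follows. This is an illustration only: the map coordinates are invented, the function name `align_words` is hypothetical, and Euclidean distance on the grid is an assumption, since the transcript does not say which distance the paper uses.

```python
import math


def align_words(japanese_positions, chinese_positions):
    """Align each Japanese word to the Chinese word nearest to it on the map."""
    alignment = {}
    for j_word, j_pos in japanese_positions.items():
        alignment[j_word] = min(
            chinese_positions,
            key=lambda c_word: math.dist(j_pos, chinese_positions[c_word]),
        )
    return alignment


# Invented (row, col) map positions for a few words from the example sentences.
japanese = {"seta": (2, 3), "jidai": (7, 0)}
chinese = {"keyi": (2, 4), "shidai": (7, 1), "jingji": (5, 5)}
print(align_words(japanese, chinese))
```

With these made-up positions, seta is aligned to keyi because keyi is the closest Chinese word on the grid.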

Page 18: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Self-organizing bilingual semantic maps (2/2)

A small-scale (10 translation pairs) experimental comparison with the baseline method

Comparison with hierarchical clustering and multivariate statistical analysis

Page 19: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Data coding (1/2)

(Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta.
(Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai.
(English) We can see that upper management has realized that the economy is fixed in an era of slow growth.

Ji (i = 1, ..., m) are the Japanese words forming the Japanese sentence

Ci (i = 1, ..., n) are the Chinese words forming the translated Chinese sentence

Page 20: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Data coding (2/2)

is a co-occurring word of Ji

is the normalized co-occurrence frequency

is a co-occurring word of one or several of Jj1, ..., Jj,ni

is the normalized co-occurrence frequency
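The symbols and formulas on this slide were images and are missing from the transcript, so the exact normalization is unknown. As a rough illustration of what a normalized co-occurrence frequency can look like, the sketch below counts the words that share a sentence with a target word and divides by the total count. Both the sentence-level co-occurrence window and the sum-to-one normalization are assumptions, and `normalized_cooccurrence` is a hypothetical name.

```python
from collections import Counter


def normalized_cooccurrence(word, sentences):
    """Relative frequency of each word that shares a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        if word in sentence:
            counts.update(w for w in sentence if w != word)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


# Hypothetical tokenized corpus.
corpus = [
    ["keiei", "toppu", "jidai"],
    ["keiei", "jidai"],
    ["seichou", "jidai"],
]
print(normalized_cooccurrence("keiei", corpus))
```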

Page 21: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Semantic map comparison with PCA

Page 22: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Semantic map comparison with baseline

Table 3. Word alignment results obtained from the semantic map

Table 4. Baseline word alignment results

Page 23: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Conclusions and Future Work

Proposed a method of self-organizing monolingual semantic maps for Japanese and Chinese

Experimental results showed that these maps were generally consistent with our intuition

The comparison demonstrated that hierarchical clustering is inferior to SOM in classifying ability

Furthermore, multivariate statistical analyses such as principal component analysis and factor analysis gave worse results than SOM

Page 24: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Conclusions and Future Work

The method was extended to the automatic construction of bilingual semantic maps of Japanese and Chinese

Future work: develop an automatic method of transforming both Japanese and Chinese words

Page 25: Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel

Personal Opinion

…..