Overview of Part-of-Speech Tagging Systems

2005. 1. 5

Contents: Statistical Approach, Rule-based Approach, Hybrid Approach

What is Part-of-speech Tagging?

Part-of-speech Tag Set

- Unknown Word

- Data Sparseness: Balanced Corpus, Smoothing

- Unsupervised Learning, Supervised Learning

- Adaptability

- '+' .

, ,

=> ++ or ++
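As an illustration of the Smoothing item listed under Data Sparseness, here is a minimal sketch of add-alpha (Laplace) smoothing for word-given-tag probabilities. The function name `emission_probs`, the toy corpus, and the choice of add-alpha rather than any other smoothing method are assumptions made for this example; the slides do not say which method is meant.

```python
from collections import Counter

def emission_probs(tagged_words, vocab_size, alpha=1.0):
    """Return p(word, tag) estimated with add-alpha smoothing (hypothetical helper)."""
    pair_counts = Counter(tagged_words)
    tag_counts = Counter(tag for _, tag in tagged_words)

    def p(word, tag):
        # Unseen (word, tag) pairs get alpha pseudo-counts instead of zero.
        return (pair_counts[(word, tag)] + alpha) / (tag_counts[tag] + alpha * vocab_size)

    return p

# Toy tagged corpus, invented for illustration only.
corpus = [("flies", "N"), ("flies", "V"), ("like", "V"),
          ("like", "P"), ("a", "ART"), ("flower", "N")]
vocab = {w for w, _ in corpus}

# One extra vocabulary slot reserves probability mass for unknown words.
p = emission_probs(corpus, vocab_size=len(vocab) + 1)

print(p("flies", "N"))  # seen pair
print(p("zebra", "N"))  # unseen word still gets non-zero probability
```

With smoothing, an unseen word no longer forces the probability of a whole tag sequence to zero, which is the practical effect of the data sparseness problem.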

Statistical Approach

Rule-based Approach

Hybrid Approach

Statistical Approach

HMM (Hidden Markov Model)

N () ,

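- Chain rule (1), expanding on w, with w_{1,0} and c_{1,0} empty: w_1 ... w_{i-1} w_i, c_1 ... c_{i-1} c_i

- Chain rule (2), expanding on c, with w_{1,0} and c_{1,0} empty: w_1 ... w_{i-1} w_i, c_1 ... c_{i-1} c_i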

- C1,i i . (), . 6 9 .

[ 6] [ 1] : .[ 2] : .

- [ 1], [ 2] [ 6] [ 3] [ 10] c1,N [1]

HMM [ 9] [ 3], [ 4] [ 3] : .[ 4] : .

HMM - [ 3] [ 4] [ 9] , [ 3]

Tri-gram [2][3]
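For reference, a common textbook form of the derivation behind the HMM and Tri-gram models is sketched below. It uses the w, c notation of the slides, but the equation and assumption numbers cited above may not correspond one-to-one to these lines; the two independence assumptions shown are the standard ones and are stated here as an assumption about what the slides intend.

```latex
% Tagging picks the category sequence that maximizes the joint probability.
\hat{c}_{1,N} = \arg\max_{c_{1,N}} P(c_{1,N} \mid w_{1,N})
             = \arg\max_{c_{1,N}} P(w_{1,N}, c_{1,N})

% Chain rule over positions i = 1..N (w_{1,0} and c_{1,0} are empty).
P(w_{1,N}, c_{1,N}) = \prod_{i=1}^{N}
    P(c_i \mid w_{1,i-1}, c_{1,i-1}) \, P(w_i \mid w_{1,i-1}, c_{1,i})

% Independence assumptions (standard bigram HMM).
P(c_i \mid w_{1,i-1}, c_{1,i-1}) \approx P(c_i \mid c_{i-1}), \qquad
P(w_i \mid w_{1,i-1}, c_{1,i}) \approx P(w_i \mid c_i)

% Resulting bigram HMM tagger.
\hat{c}_{1,N} \approx \arg\max_{c_{1,N}} \prod_{i=1}^{N}
    P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)

% Tri-gram variant: condition each category on the two preceding categories.
\hat{c}_{1,N} \approx \arg\max_{c_{1,N}} \prod_{i=1}^{N}
    P(c_i \mid c_{i-2}, c_{i-1}) \, P(w_i \mid c_i)
```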

[Figure: HMM network for the word sequence "flies like a flies", with candidate categories N, V, P, ART and the end marker $, annotated with probabilities]

HMM (Hidden Markov Model): an HMM tagger uses two probabilities. P(c_i | c_{i-1}) is the State Transition Probability, and P(w_i | c_i) is the Observation Symbol Probability.
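The two probabilities above are enough to decode the best category sequence with the Viterbi algorithm. The following is a minimal sketch, assuming a bigram HMM; the function name `viterbi`, the start distribution, and all probability values are hypothetical toy numbers chosen for illustration and are not the values from the slide's figure.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence under a bigram HMM."""
    def lg(x):
        # Log-space avoids underflow; zero probability maps to -inf.
        return math.log(x) if x > 0 else float("-inf")

    # best[i][c]: log-probability of the best path ending in tag c at position i.
    best = [{c: lg(start_p.get(c, 0)) + lg(emit_p[c].get(words[0], 0)) for c in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for c in tags:
            # State transition probability P(c_i | c_{i-1}).
            prev, score = max(
                ((p, best[i - 1][p] + lg(trans_p[p].get(c, 0))) for p in tags),
                key=lambda x: x[1],
            )
            # Observation symbol probability P(w_i | c_i).
            best[i][c] = score + lg(emit_p[c].get(words[i], 0))
            back[i][c] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda c: best[-1][c])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model, not taken from the slides.
tags = ["N", "V", "P", "ART"]
start_p = {"N": 0.5, "V": 0.1, "P": 0.1, "ART": 0.3}
trans_p = {
    "N": {"N": 0.2, "V": 0.5, "P": 0.3},
    "V": {"N": 0.3, "P": 0.3, "ART": 0.4},
    "P": {"N": 0.4, "ART": 0.6},
    "ART": {"N": 1.0},
}
emit_p = {
    "N": {"flies": 0.4, "like": 0.1, "flower": 0.5},
    "V": {"flies": 0.3, "like": 0.7},
    "P": {"like": 1.0},
    "ART": {"a": 1.0},
}

tagged = viterbi("flies like a flower".split(), tags, start_p, trans_p, emit_p)
print(tagged)  # ['N', 'P', 'ART', 'N'] with these toy numbers
```

Dynamic programming keeps the cost linear in sentence length, instead of enumerating every possible category sequence.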


- HMM: [Figure: network of categories N, V, P, ART and the end marker $ over the word sequence "flies like a flies $"]


- HMM: Multiple Observation, Shared Word Sequence, Virtual Word

Twoply HMM: h_i is the Head Category at position i, and t_i is the Tail Category at position i.

HMM

Rule-based Approach: rules are either Positive or Negative. Methods are either Disambiguation Rule-based or Transformation Rule-based.

Systems covered: Klein & Simmons; Greene & Rubin (TAGGIT); Hindle; Chanod & Tapanainen; Voutilainen (ENGCG); Brill

Klein & Simmons: 400, 30, 90%

Greene & Rubin (TAGGIT)

- 3,000 context rules, Negative and Positive
- Rule form: W X ? Y Z -> not A (Negative), or W X ? Y Z -> A (Positive). In the context W, X, ?, Y, Z, the word at ? is not A (or is A).
- 286 , Brown 100 => 77% 25% 80% CLAWS Brown
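The W X ? Y Z rule form can be sketched as a filter over each word's set of candidate tags. This is an illustrative sketch only, not TAGGIT's actual implementation: the names `matches` and `apply_rules`, the rule encoding, and the two example rules are all hypothetical, and a context position is assumed to match only when the neighbouring word is already unambiguous.

```python
def matches(result, i, context):
    """context = (W, X, Y, Z): required tags at offsets -2, -1, +1, +2; None means any."""
    for offset, required in zip((-2, -1, 1, 2), context):
        if required is None:
            continue
        j = i + offset
        # Assumption: a neighbour matches only if it is unambiguous with that tag.
        if j < 0 or j >= len(result) or result[j] != {required}:
            return False
    return True

def apply_rules(candidates, rules):
    """candidates: one set of possible tags per word. Returns the filtered sets."""
    result = [set(c) for c in candidates]
    for i, tags in enumerate(result):
        if len(tags) == 1:
            continue  # already unambiguous
        for context, tag, positive in rules:
            if tag not in tags or not matches(result, i, context):
                continue
            if positive:
                tags.intersection_update({tag})  # W X ? Y Z -> A
            elif len(tags) > 1:
                tags.discard(tag)                # W X ? Y Z -> not A
    return result

# Hypothetical rules for illustration.
rules = [
    ((None, "ART", None, None), "V", False),  # after an article: not V
    ((None, "N", "ART", None), "V", True),    # between a noun and an article: V
]

# "the flies like a flower" with ambiguous candidate tags.
sentence = [{"ART"}, {"N", "V"}, {"V", "P"}, {"ART"}, {"N"}]
resolved = apply_rules(sentence, rules)
print(resolved)
```

A negative rule never removes the last remaining tag, so every word keeps at least one reading.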

Hindle (136)

Default rules:
- [ADJ+N+V] -> [N] [*] [*]
- [PREP+TNS] -> TNS [N+V]

Hindle - if correct goto next Else 46 100 Brown 5 ; 35,000 : 98%( 95%) : 97%( 90%)

Chanod & Tapanainen: 16 : 50%, 97 : 2/3 . 50 1

- Principle rule
- Heuristic rules: Principle rule
- Non-Contextual rules: Heuristic rule

11 Finite State Transducer

Chanod & Tapanainen -37 5,752 98.7% 12 , 97.5%


ENGCG (Voutilainen)

- Tokenizer
- ENGCG
- ENGCG disambiguator
- Finite state syntactic disambiguator

ENGCG disambiguator, Finite-State Intersection Grammar

ENGCG disambiguator: negative

D0 ()            39.0%   67,737   1.77    31   0.08%
D1 (D0+ENGCG)     6.2%   40,450   1.06   124   0.32%
D2 (D1+)          3.2%   38,946   1.02   226   0.59%
D3 (D2+)          0.6%   38,342   1.00   281   0.74%

Brill

Transformation-based Error-driven Learning. Components: Preprocessor, Scoring, Rule Template, Learner.

[Figure: Unannotated text -> Preprocessor -> Annotated text; Annotated text and Truth -> Learner -> Rule]

Brill - Scoring / 2.-5.
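The learning loop of Transformation-based Error-driven Learning can be sketched as follows. This is a simplified illustration under stated assumptions, not Brill's implementation: it uses a single hypothetical rule template ("change tag A to B when the previous tag is Z"), scores a rule by the number of errors left against the truth, and applies a rule to all positions simultaneously. The names `apply_rule`, `errors`, and `learn` are invented for this example.

```python
from itertools import product

def apply_rule(tags, rule):
    """rule = (from_tag, to_tag, prev_tag): retag from_tag as to_tag after prev_tag."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked on the input tags, so application is simultaneous.
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out

def errors(tags, truth):
    """Scoring: number of positions that disagree with the truth."""
    return sum(t != g for t, g in zip(tags, truth))

def learn(initial, truth, tagset, max_rules=10):
    """Greedily pick the rule that leaves the fewest errors until nothing improves."""
    current, learned = list(initial), []
    for _ in range(max_rules):
        # Instantiate the single rule template with every tag combination.
        candidates = [r for r in product(tagset, repeat=3) if r[0] != r[1]]
        best = min(candidates, key=lambda r: errors(apply_rule(current, r), truth))
        if errors(apply_rule(current, best), truth) >= errors(current, truth):
            break  # no rule reduces the error count
        current = apply_rule(current, best)
        learned.append(best)
    return learned, current

# Toy data: the initial annotation tags the second word NN, the truth says VB.
tagset = ["TO", "NN", "VB", "DT"]
initial = ["TO", "NN", "DT", "NN"]
truth = ["TO", "VB", "DT", "NN"]

learned, final = learn(initial, truth, tagset)
print(learned)  # [('NN', 'VB', 'TO')]
print(final)
```

The output is an ordered rule list; tagging new text means running the same initial annotation and then applying the learned rules in order.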

( )

http://infocom.chonan.ac.kr/~limhs/ (/cwb-data/data/nlp/%C7%B0%BB%E7%C5%C2%B1%EB.ppt)
