Overview of Part-of-Speech Tagging Systems
-
2005. 1. 5
-
What is part-of-speech tagging?
Statistical approach
Rule-based approach
Hybrid approach
-
What is part-of-speech tagging?
-
Part-of-speech tag set
-
- Unknown words (words missing from the lexicon)
-
- Data sparseness: remedies include a balanced corpus and smoothing.
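The slide names smoothing as a remedy for data sparseness but not which method. As a minimal sketch, assuming add-one (Laplace) smoothing, which is only one of many techniques, tag-transition probabilities can be estimated so that tag pairs never seen in training still get a non-zero probability. The function name and the `<s>` start symbol are illustrative choices, not part of the source.

```python
from collections import Counter


def add_one_transition_probs(tag_sequences, tagset):
    """Estimate P(c_i | c_{i-1}) with add-one (Laplace) smoothing.

    tag_sequences: list of tag lists from a (hypothetical) training corpus.
    "<s>" is an assumed sentence-start symbol.
    """
    bigrams = Counter()   # count of (previous tag, tag)
    history = Counter()   # count of each previous tag
    for tags in tag_sequences:
        prev = "<s>"
        for t in tags:
            bigrams[(prev, t)] += 1
            history[prev] += 1
            prev = t
    v = len(tagset)
    # every (prev, t) pair gets one extra count, so unseen pairs are never zero
    return {
        (p, t): (bigrams[(p, t)] + 1) / (history[p] + v)
        for p in ["<s>", *tagset]
        for t in tagset
    }
```

With the corpus `[["N", "V", "ART", "N"], ["N", "N", "V"]]`, the unseen pair ART -> V still receives probability 1/4.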
-
- Unsupervised learning vs. supervised learning
-
- Adaptability
-
- '+' .
, ,
=> ++ or ++
-
Statistical approach
Rule-based approach
Hybrid approach
-
Statistical approach
HMM (Hidden Markov Model)
-
N () ,
-
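- Chain rule (1), expanding the word first ($w_{1,0}$ and $c_{1,0}$ denote empty sequences):
$$P(w_{1,N}, c_{1,N}) = \prod_{i=1}^{N} P(w_i \mid w_{1,i-1}, c_{1,i-1})\, P(c_i \mid w_{1,i}, c_{1,i-1})$$
-
- Chain rule (2), expanding the tag first ($w_{1,0}$ and $c_{1,0}$ denote empty sequences):
$$P(w_{1,N}, c_{1,N}) = \prod_{i=1}^{N} P(c_i \mid w_{1,i-1}, c_{1,i-1})\, P(w_i \mid w_{1,i-1}, c_{1,i})$$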
-
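- $c_{1,i}$ denotes the tag sequence $c_1 \ldots c_i$ (and $w_{1,i}$ the word sequence $w_1 \ldots w_i$). The full histories in these products are too long to estimate reliably, so simplifying (Markov) assumptions are applied, which leads to Equations 6 and 9.
-
[Equation 6] [Assumption 1]: the tag $c_i$ depends only on the preceding tag $c_{i-1}$. [Assumption 2]: the word $w_i$ depends only on its own tag $c_i$.
-
- Applying [Assumption 1] and [Assumption 2] to [Equation 6] and substituting into [Equation 3] gives [Equation 10], which selects the tag sequence $c_{1,N}$ [1]:
$$\hat{c}_{1,N} = \arg\max_{c_{1,N}} \prod_{i=1}^{N} P(c_i \mid c_{i-1})\, P(w_i \mid c_i)$$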
-
HMM: [Equation 9] with [Assumption 3] and [Assumption 4]
-
HMM (continued): applying [Assumption 3] and [Assumption 4] to [Equation 9]
Tri-gram [2][3]
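Assuming "Tri-gram" here means the usual second-order extension, in which each tag is conditioned on the two preceding tags instead of one, the score of a tagged sentence can be sketched as below. The function name, the double `<s>` padding and the probability tables are hypothetical; the source's own trigram equation is not recoverable from the slide.

```python
def trigram_score(words, tags, trans3, emit):
    """Score prod_i P(c_i | c_{i-2}, c_{i-1}) * P(w_i | c_i).

    trans3: {(c_{i-2}, c_{i-1}, c_i): prob}, emit: {(c_i, w_i): prob}.
    Two "<s>" symbols pad the history; missing entries count as 0.
    """
    history = ("<s>", "<s>")
    score = 1.0
    for w, t in zip(words, tags):
        score *= trans3.get((history[0], history[1], t), 0.0) * emit.get((t, w), 0.0)
        history = (history[1], t)  # slide the two-tag window forward
    return score
```

For example, with made-up tables `{("<s>", "<s>", "N"): 0.5, ("<s>", "N", "V"): 0.4}` and `{("N", "dogs"): 0.1, ("V", "bark"): 0.2}`, the tagging dogs/N bark/V scores 0.5 × 0.1 × 0.4 × 0.2 = 0.004.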
-
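HMM - example
$$P(\text{flies}/N\;\text{like}/V\;\text{a}/\mathrm{ART}\;\text{flower}/N) = A \times B$$
$$A = P(N \mid \varnothing)\,P(V \mid N)\,P(\mathrm{ART} \mid V)\,P(N \mid \mathrm{ART}) = 0.29 \times 0.43 \times 0.65 \times 1.0 \approx 0.081$$
$$B = P(\text{flies} \mid N)\,P(\text{like} \mid V)\,P(\text{a} \mid \mathrm{ART})\,P(\text{flower} \mid N) = 0.025 \times 0.1 \times 0.36 \times 0.063 \approx 5.67 \times 10^{-5}$$
$$P = A \times B \approx 4.6 \times 10^{-6}$$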
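The arithmetic of this example can be checked directly. The sketch below multiplies the transition factors (A) and the observation factors (B) listed for the tag path N V ART N; the variable names are illustrative.

```python
import math

# Tag path N V ART N for "flies like a flower"
transitions = [0.29, 0.43, 0.65, 1.0]   # P(N|start), P(V|N), P(ART|V), P(N|ART)
emissions = [0.025, 0.1, 0.36, 0.063]   # P(flies|N), P(like|V), P(a|ART), P(flower|N)

A = math.prod(transitions)  # product of state transition probabilities
B = math.prod(emissions)    # product of observation symbol probabilities
P = A * B                   # joint probability of words and tags
print(f"A = {A:.4f}, B = {B:.3e}, P = {P:.3e}")
```

Multiplying out the listed factors gives A ≈ 0.081, B ≈ 5.67 × 10^-5 and P ≈ 4.6 × 10^-6.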
-
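HMM - example lattice
[Figure: HMM state lattice for "flies like a flower" (candidate tags N, V, P, ART; $ marks the end of the sentence), labeled with the transition and observation probabilities of the example]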
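The tag path in the example is one of several paths through the lattice. As a minimal sketch of how the most probable path can be found, assuming standard Viterbi decoding for the bigram model ∏ P(c_i | c_{i-1}) P(w_i | c_i): only the probabilities on the N-V-ART-N path (0.29, 0.43, 0.65, 1.0 and 0.025, 0.1, 0.36, 0.063) come from the example; every other entry in `TRANS` and `EMIT` is a made-up value for illustration.

```python
START = "<s>"
TAGS = ["N", "V", "ART", "P"]
# Hypothetical toy tables: values off the N-V-ART-N path are invented.
TRANS = {
    (START, "N"): 0.29, (START, "V"): 0.01,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "ART"): 0.65, ("V", "N"): 0.35,
    ("P", "ART"): 0.74,
    ("ART", "N"): 1.0,
}
EMIT = {
    ("N", "flies"): 0.025, ("V", "flies"): 0.076,
    ("N", "like"): 0.012, ("V", "like"): 0.1, ("P", "like"): 0.068,
    ("ART", "a"): 0.36,
    ("N", "flower"): 0.063, ("V", "flower"): 0.05,
}


def viterbi(words):
    """Return (best_tags, best_prob) maximizing prod P(c_i|c_{i-1}) P(w_i|c_i)."""
    # delta[t]: probability of the best tag path ending in tag t at the current word
    delta = {t: TRANS.get((START, t), 0.0) * EMIT.get((t, words[0]), 0.0) for t in TAGS}
    backptrs = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in TAGS:
            # best previous tag from which to move into t
            prev = max(TAGS, key=lambda p: delta[p] * TRANS.get((p, t), 0.0))
            new_delta[t] = delta[prev] * TRANS.get((prev, t), 0.0) * EMIT.get((t, w), 0.0)
            ptr[t] = prev
        delta = new_delta
        backptrs.append(ptr)
    # follow back-pointers from the best final tag
    last = max(TAGS, key=lambda t: delta[t])
    tags = [last]
    for ptr in reversed(backptrs):
        tags.append(ptr[tags[-1]])
    return list(reversed(tags)), delta[last]
```

For `["flies", "like", "a", "flower"]` this toy model returns the path N, V, ART, N with probability about 4.6 × 10^-6. Plain probabilities are multiplied here for readability; for long sentences, summing log probabilities avoids underflow.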
-
HMM - this model is a hidden Markov model (HMM): $P(c_i \mid c_{i-1})$ is the state transition probability and $P(w_i \mid c_i)$ is the observation symbol probability.
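The slides do not say how the two probability tables are obtained. Assuming relative-frequency (maximum-likelihood) estimation from a hand-tagged corpus, a minimal sketch is shown below; the function name, the `<s>` start symbol and the two-sentence corpus are hypothetical.

```python
from collections import Counter


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of P(c_i | c_{i-1}) and P(w_i | c_i).

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    """
    trans, emit = Counter(), Counter()
    prev_count, tag_count = Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[(prev, tag)] += 1   # state transition count
            prev_count[prev] += 1
            emit[(tag, word)] += 1    # observation count
            tag_count[tag] += 1
            prev = tag
    p_trans = {k: v / prev_count[k[0]] for k, v in trans.items()}
    p_emit = {k: v / tag_count[k[0]] for k, v in emit.items()}
    return p_trans, p_emit
```

Unsmoothed estimates like these give zero probability to anything unseen, which is the data-sparseness problem that smoothing addresses.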
-
N () ,
HMM
Tri-gram
-
- HMM variant: [Figure: HMM tag lattice (tags N, V, P, ART; $ marks the end) with '+' marking joined units]
-
N () ,
HMM
HMM
-
- HMM extensions: multiple observation, shared word sequence, virtual word
Two-ply HMM: $h_i$ is the head category of the $i$-th word and $t_i$ is the tail category of the $i$-th word
HMM
-
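Rule-based approach: rules decide the tag of an ambiguous word from its context. Rules are either positive (select a tag) or negative (eliminate a tag), and rule-based systems divide into disambiguation-rule-based and transformation-rule-based approaches.
Klein & Simmons, Greene & Rubin (TAGGIT), Hindle, Chanod & Tapanainen, Voutilainen (ENGCG), Brill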
-
Klein & Simmons
Klein and Simmons: a lexicon of about 400 words, 30 tags, 90% accuracy
-
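Greene & Rubin (TAGGIT)
Greene and Rubin's TAGGIT uses about 3,000 context-frame rules, both negative and positive:
W X ? Y Z -> not A (or, as a positive rule, W X ? Y Z -> A): if the ambiguous word ? occurs in the context W, X, ?, Y, Z, it is not (or is) tag A.
286; applied to the 1-million-word Brown corpus => 77%, 25%, 80%. CLAWS later used the tagged Brown corpus.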
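A context-frame rule of the form "W X ? Y Z -> not A" can be sketched as a check on the two words on either side of the ambiguous word. This is a minimal illustration, not TAGGIT's actual rule format: the function name, the use of `None` as a "don't care" slot, and the requirement that context words already be unambiguous are assumptions of this sketch.

```python
def apply_context_frame(cands, i, frame, target, positive=False):
    """Apply one TAGGIT-style context-frame rule to word i.

    cands    : list of candidate-tag sets, one per word
    frame    : required tags (W, X, Y, Z) at positions i-2, i-1, i+1, i+2;
               None means "any tag" (a convenience of this sketch)
    target   : tag A
    positive : False -> "W X ? Y Z -> not A", True -> "W X ? Y Z -> A"
    Returns the (possibly reduced) candidate set for word i.
    """
    for offset, need in zip((-2, -1, 1, 2), frame):
        if need is None:
            continue
        j = i + offset
        # context word must exist and already be unambiguously tagged `need`
        if not 0 <= j < len(cands) or cands[j] != {need}:
            return cands[i]  # frame does not match: leave the word alone
    if positive:
        return {target} if target in cands[i] else cands[i]
    remaining = cands[i] - {target}
    return remaining or cands[i]  # never remove the last candidate
```

With candidates `[{"PRON"}, {"N", "V"}, {"ART"}]`, the negative rule with frame `(None, "PRON", "ART", None)` and A = N leaves word 1 with `{"V"}`.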
-
Hindle
Hindle (136)
Default rule: [ADJ+N+V] -> [N] [*] [*]
[PREP+TNS] -> TNS [N+V] (a word ambiguous between PREP and TNS, followed by a word ambiguous between N and V, is resolved to TNS)
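Reading the second rule as "a PREP/TNS-ambiguous word followed by an N/V-ambiguous word becomes TNS", it can be sketched over per-word candidate sets as below. The function name and the set-based representation are hypothetical; Hindle's own rule engine is not described in the source.

```python
def apply_prep_tns_rule(candidates):
    """Sketch of [PREP+TNS] -> TNS [N+V] over per-word candidate tag sets."""
    result = [set(c) for c in candidates]  # copy so the input is not modified
    for i in range(len(result) - 1):
        # current word ambiguous PREP/TNS, next word ambiguous N/V
        if result[i] == {"PREP", "TNS"} and result[i + 1] == {"N", "V"}:
            result[i] = {"TNS"}
    return result
```

If the next word is not N/V-ambiguous, the rule does not fire and the PREP/TNS ambiguity is left for other rules.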
-
Hindle (continued): if correct, go to the next word; else a new rule is learned. 46; 100; Brown; 5; 35,000 rules; accuracy: 98% (95%), 97% (90%)
-
Chanod & Tapanainen
16: 50%; 97: 2/3; 50, 1
- Principle rules
- Heuristic rules
- Non-contextual rules
11 heuristic rules; Finite State Transducer
-
Chanod & Tapanainen (continued): 37; 5,752; 98.7%; 12; 97.5%
-
ENGCG (Voutilainen)
Pipeline: Tokenizer -> ENGCG morphological analyzer -> ENGCG disambiguator -> Finite state syntactic disambiguator
ENGCG disambiguator; Finite-State Intersection Grammar
-
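ENGCG disambiguator
The ENGCG disambiguator uses negative rules (constraints that discard readings).
Stage             Ambiguous words  Readings  Readings/word  Errors  Error rate
D0                39.0%            67,737    1.77           31      0.08%
D1 (D0 + ENGCG)    6.2%            40,450    1.06           124     0.32%
D2 (extends D1)    3.2%            38,946    1.02           226     0.59%
D3 (extends D2)    0.6%            38,342    1.00           281     0.74%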
-
Brill
Transformation-based error-driven learning
[Figure: Unannotated text -> Preprocessor -> Annotated text; the Learner compares the annotated text with the Truth, using Scoring and Rule Templates, and outputs Rules]
-
Brill - scoring (steps 2-5)
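The learner loop can be sketched with a single rule template, "change tag a to b when the previous tag is c", scored as errors corrected minus errors introduced. This is a minimal illustration under stated assumptions: the one template, the function names and the simultaneous (non-cascading) rule application are choices of this sketch, not Brill's full template set.

```python
from itertools import product


def best_transformation(current, truth, tagset):
    """Return (score, (a, b, c)) for the best rule "a -> b if previous tag is c".

    score = errors corrected - errors introduced; (0, None) if nothing helps.
    """
    best = (0, None)
    for a, b, c in product(tagset, repeat=3):
        if a == b:
            continue
        score = 0
        for i in range(1, len(current)):
            if current[i] == a and current[i - 1] == c:
                if truth[i] == b:
                    score += 1  # rule fixes an error
                elif truth[i] == a:
                    score -= 1  # rule breaks a correct tag
        if score > best[0]:
            best = (score, (a, b, c))
    return best


def apply_rule(tags, rule):
    a, b, c = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == c:  # condition checked on the input tags
            out[i] = b
    return out


def learn(initial, truth, tagset, max_rules=10):
    """Greedily learn rules until no rule has a positive score."""
    rules, tags = [], list(initial)
    for _ in range(max_rules):
        score, rule = best_transformation(tags, truth, tagset)
        if rule is None:
            break
        rules.append(rule)
        tags = apply_rule(tags, rule)
    return rules, tags
```

Starting from an initial tagging `["ART", "V", "V", "ART", "V"]` against the truth `["ART", "N", "V", "ART", "N"]`, the learner picks "V -> N after ART" (score 2) and then stops because the tagging matches the truth.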
-
(Source)
http://infocom.chonan.ac.kr/~limhs/ (/cwb-data/data/nlp/%C7%B0%BB%E7%C5%C2%B1%EB.ppt)