Overview of Part-of-Speech Tagging Systems

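Overview of Part-of-Speech Tagging Systems. 2005. 1. 5, Hwang Myeong-jin (황명진). Contents: what is part-of-speech tagging?; considerations in part-of-speech tagging; research approaches to part-of-speech tagging: statistical approach (Statistical Approach), rule-based approach (Rule-based Approach), hybrid approach (Hybrid Approach); evaluation criteria for part-of-speech tagging systems; conclusion; references.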

Transcript of Overview of Part-of-Speech Tagging Systems

  • 2005. 1. 5

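  • Contents: what is part-of-speech tagging?; considerations in tagging; research approaches: statistical approach (Statistical Approach), rule-based approach (Rule-based Approach), hybrid approach (Hybrid Approach); evaluation criteria; conclusion; references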

  • Part-of-speech tagging (Part-of-speech Tagging): what is part-of-speech tagging?

  • Part-of-speech tag set (Part-of-speech Tag Set)

  • Considerations in tagging - unknown words (Unknown Word)

  • Considerations in tagging - data sparseness (Data Sparseness): balanced corpus (Balanced Corpus), smoothing (Smoothing)
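
Smoothing, named above as a remedy for data sparseness, can be illustrated with a minimal sketch. The toy corpus, the function name `emission_prob`, and the choice of add-k (Laplace) smoothing are assumptions made for illustration; the slides do not say which smoothing method was intended.

```python
from collections import Counter

# Hypothetical toy corpus of (word, tag) pairs; not taken from the slides.
TAGGED = [("flies", "N"), ("like", "V"), ("a", "ART"), ("flower", "N"),
          ("flies", "V"), ("a", "ART"), ("flower", "N")]

pair_counts = Counter(TAGGED)
tag_counts = Counter(tag for _, tag in TAGGED)
vocab = {word for word, _ in TAGGED}


def emission_prob(word, tag, k=1.0):
    """Add-k estimate of P(word | tag); one extra slot is reserved for unseen words."""
    vocab_size = len(vocab) + 1  # +1 for a generic unknown-word type
    return (pair_counts[(word, tag)] + k) / (tag_counts[tag] + k * vocab_size)


# A pair that never occurred in the corpus still gets a small non-zero probability.
print(emission_prob("flower", "N"))  # seen twice with tag N
print(emission_prob("like", "N"))    # word seen, but never with tag N
print(emission_prob("rose", "N"))    # word never seen at all
```

With unsmoothed relative frequencies the last two values would be 0, which would zero out every tag sequence containing them; the extra slot keeps the distribution for each tag summing to 1.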

  • Considerations in tagging - unsupervised learning (Unsupervised Learning) and supervised learning (Supervised Learning)

  • Considerations in tagging - adaptability (Adaptability)

  • - '+' .

    , ,

    => ++ or ++

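  • Research approaches to part-of-speech tagging: statistical approach (Statistical Approach)

    Rule-based approach (Rule-based Approach)

    Hybrid approach (Hybrid Approach)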

  • Statistical approach (Statistical Approach)

    HMM (Hidden Markov Model)

  • N () ,

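  • - Chain rule (1), expanding $w$ first, with $w_{1,0}$ and $c_{1,0}$ the empty sequences: $P(w_{1,N}, c_{1,N}) = \prod_{i=1}^{N} P(w_i \mid w_{1,i-1}, c_{1,i-1})\, P(c_i \mid w_{1,i}, c_{1,i-1})$

  • - Chain rule (2), expanding $c$ first, with $w_{1,0}$ and $c_{1,0}$ the empty sequences: $P(w_{1,N}, c_{1,N}) = \prod_{i=1}^{N} P(c_i \mid w_{1,i-1}, c_{1,i-1})\, P(w_i \mid w_{1,i-1}, c_{1,i})$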

  • - C1,i i . (), . 6 9 .

  • [ 6] [ 1] : .[ 2] : .

  • - [ 1], [ 2] [ 6] [ 3] [ 10] c1,N [1]

  • -

  • HMM [ 9] [ 3], [ 4] [ 3] : .[ 4] : .

  • HMM - [ 3] [ 4] [ 9] , [ 3]

    Tri-gram [2][3]

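  • HMM - worked example

    $P(\text{flies}/N\ \text{like}/V\ \text{a}/ART\ \text{flower}/N) = A \times B$

    $A = P(N \mid \varnothing)\, P(V \mid N)\, P(ART \mid V)\, P(N \mid ART) = 0.29 \times 0.43 \times 0.65 \times 1.0 \approx 0.081$

    $B = P(\text{flies} \mid N)\, P(\text{like} \mid V)\, P(\text{a} \mid ART)\, P(\text{flower} \mid N) = 0.025 \times 0.1 \times 0.36 \times 0.063 \approx 5.67 \times 10^{-5}$

    $P = A \times B \approx 4.60 \times 10^{-6}$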
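
The arithmetic of this example can be checked with a short script. This is a minimal sketch: the dictionary layout, the start symbol `"<s>"`, and the function name `sequence_probability` are assumptions, while the eight probabilities are the ones listed in the example. Multiplying the four listed lexical factors directly gives B of about 5.67 × 10^-5, so P comes out at about 4.6 × 10^-6.

```python
import math

# Transition probabilities P(tag_i | tag_{i-1}) from the example; "<s>" is an assumed start symbol.
TRANSITION = {("<s>", "N"): 0.29, ("N", "V"): 0.43, ("V", "ART"): 0.65, ("ART", "N"): 1.0}
# Lexical probabilities P(word | tag) from the example.
LEXICAL = {("flies", "N"): 0.025, ("like", "V"): 0.1, ("a", "ART"): 0.36, ("flower", "N"): 0.063}


def sequence_probability(words, tags):
    """Return (A, B, A*B) for a bigram HMM: A = tag-transition product, B = lexical product."""
    previous = ["<s>"] + list(tags[:-1])
    a = math.prod(TRANSITION[(p, t)] for p, t in zip(previous, tags))
    b = math.prod(LEXICAL[(w, t)] for w, t in zip(words, tags))
    return a, b, a * b


a, b, p = sequence_probability(["flies", "like", "a", "flower"], ["N", "V", "ART", "N"])
print(f"A = {a:.3f}, B = {b:.3g}, P = {p:.3g}")
```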

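  • HMM - [Figure: HMM tag lattice between boundary markers $, with tag nodes N, V, P, ART over the words flies, like, a, flies; labels 0.29, 0.025, 0.43, 0.1, 0.65, 0.36, 0, 1, 0.063]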

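  • HMM - Hidden Markov Model (Hidden Markov Model): the HMM uses two probabilities: $P(c_i \mid c_{i-1})$, the state transition probability (State Transition Probability), and $P(w_i \mid c_i)$, the observation symbol probability (Observation Symbol Probability)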
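
Given these two probabilities, a tag sequence can be decoded from a sentence. The sketch below uses Viterbi search, a standard decoder for HMMs that the surviving slide text does not name, so the choice is an assumption. Only the values 0.29, 0.43, 0.65, 1.0, 0.025, 0.1, 0.36 and 0.063 come from the slides' example; every entry marked hypothetical, the tag list, and the function name `viterbi` are illustrative.

```python
TAGS = ["N", "V", "ART", "P"]

# P(c_1): 0.29 is from the slides; the other entries are hypothetical.
START = {"N": 0.29, "V": 0.0001, "ART": 0.71}
# P(c_i | c_{i-1}): the first three entries are from the slides, the rest hypothetical.
TRANS = {("N", "V"): 0.43, ("V", "ART"): 0.65, ("ART", "N"): 1.0,
         ("N", "N"): 0.13, ("N", "P"): 0.44, ("V", "N"): 0.35,
         ("P", "ART"): 0.74, ("P", "N"): 0.26}
# P(w_i | c_i): the first four entries are from the slides, the rest hypothetical.
EMIT = {("flies", "N"): 0.025, ("like", "V"): 0.1, ("a", "ART"): 0.36, ("flower", "N"): 0.063,
        ("flies", "V"): 0.076, ("like", "P"): 0.068, ("like", "N"): 0.012,
        ("a", "N"): 0.001, ("flower", "V"): 0.05}


def viterbi(words, tags, start, trans, emit):
    """Most probable tag sequence under a bigram HMM (missing table entries count as 0)."""
    # best[t] = (probability, path) of the best tag sequence ending in tag t
    best = {t: (start.get(t, 0.0) * emit.get((words[0], t), 0.0), [t]) for t in tags}
    for word in words[1:]:
        step = {}
        for t in tags:
            # Choose the predecessor tag that maximises path probability times transition.
            prob, path = max(
                ((best[p][0] * trans.get((p, t), 0.0), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            step[t] = (prob * emit.get((word, t), 0.0), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])


prob, path = viterbi(["flies", "like", "a", "flower"], TAGS, START, TRANS, EMIT)
print(path, prob)
```

With these tables the decoder selects N V ART N, the same sequence scored in the worked example, because each step keeps only the best path into every tag instead of enumerating all tag sequences.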

  • N () ,

    HMM

    Tri-gram

  • - HMM: [Figure: tag lattice over N, V, P, ART for the words flies, like, a, flies between boundary markers $]

  • N () ,

    HMM

    HMM

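  • - HMM: multiple observation (Multiple Observation), shared word sequence (Shared Word Sequence), virtual word (Virtual Word)

    Two-ply HMM (Twoply HMM): $h_i$ is the head category (Head Category) and $t_i$ the tail category (Tail Category) at position $i$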

    HMM

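  • Rule-based approach (Rule-based Approach): rules are positive (Positive) or negative (Negative); methods are either disambiguation rule-based (Disambiguation Rule-based) or transformation rule-based (Transformation Rule-based).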

    Klein & Simmons; Greene & Rubin (TAGGIT); Hindle; Chanod & Tapanainen; Voutilainen (ENGCG); Brill

  • Klein & Simmons: 400; 30; 90%

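  • Greene & Rubin (TAGGIT): 3,000 rules, negative (Negative) and positive (Positive). A negative rule has the form W X ? Y Z -> not A and a positive rule the form W X ? Y Z -> A, where W, X, Y, Z are the context around the ambiguous word ? and A is the tag excluded (or assigned). 286; Brown 100 => 77%; 25%; 80%; CLAWS; Brown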
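
The rule shape `W X ? Y Z -> not A` (and its positive counterpart `W X ? Y Z -> A`) can be sketched as a function over a set of candidate tags. This is an illustrative reading, not TAGGIT's actual implementation: the tuple layout, the `None` wildcard, the function name `apply_rule`, the example rule, and the safeguard of never removing the last candidate are all assumptions.

```python
def apply_rule(candidates, left2, right2, rule):
    """Apply one context rule to the candidate tag set of an ambiguous word.

    left2 / right2 hold the tags of the two words on each side of the ambiguous word;
    rule = (W, X, Y, Z, polarity, A), where None in the pattern matches any tag.
    """
    w, x, y, z, polarity, tag = rule
    context = (*left2, *right2)
    if any(p is not None and p != c for p, c in zip((w, x, y, z), context)):
        return candidates                       # context does not match: no change
    if polarity == "negative":                  # W X ? Y Z -> not A
        reduced = candidates - {tag}
        return reduced or candidates            # assumed safeguard: keep at least one tag
    return {tag} if tag in candidates else candidates   # W X ? Y Z -> A


# Hypothetical rule: directly after an article, the ambiguous word is not a verb.
no_verb_after_article = (None, "ART", None, None, "negative", "V")
print(apply_rule({"N", "V"}, ("<s>", "ART"), ("V", "ART"), no_verb_after_article))
```

Negative rules only narrow the candidate set, so several of them can fire in any order, whereas a positive rule commits to one tag as soon as its context matches.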

  • Hindle (136)

    Default rule (default rule): [ADJ+N+V] -> [N] [*] [*]; [PREP+TNS] -> TNS [N+V]

  • Hindle - if correct goto next Else 46 100 Brown 5 ; 35,000 : 98%( 95%) : 97%( 90%)

  • Chanod & Tapanainen: 16: 50%; 97: 2/3; 50; 1. Rule types: principle rules (Principle rule); heuristic rules (Heuristic rules), applied after the principle rules; non-contextual rules (Non-Contextual rules), applied after the heuristic rules; 11; Finite State Transducer

  • Chanod & Tapanainen -37 5,752 98.7% 12 , 97.5%

    . .

  • ENGCG (Voutilainen): Tokenizer; ENGCG; ENGCG disambiguator; Finite state syntactic disambiguator; ENGCG disambiguator (disambiguator); Finite-State Intersection Grammar

  • ENGCG - ENGCG disambiguator (disambiguator) - negative (negative)

    Stage            Ambiguous words   Readings   Readings/word   Errors   Error rate
    D0               39.0%             67,737     1.77            31       0.08%
    D1 (D0+ENGCG)    6.2%              40,450     1.06            124      0.32%
    D2 (D1+)         3.2%              38,946     1.02            226      0.59%
    D3 (D2+)         0.6%              38,342     1.00            281      0.74%

    .

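  • Brill

    Transformation-based error-driven learning (Transformation-based Error-driven Learning). Components: Preprocessor; Learner (Scoring, Rule Template). [Figure: Unannotated text -> Preprocessor -> Annotated text; Annotated text + Truth -> Learner -> Rule]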

  • Brill - Scoring / 2.-5.
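
The learner's score-and-select loop in transformation-based error-driven learning can be sketched as follows. This is a simplified illustration rather than Brill's implementation: it assumes a single rule template ("change tag `old` to `new` when the previous tag is `prev`"), scores a rule by net error reduction against the truth, and all function names and the toy data are hypothetical.

```python
def apply_transformation(rule, tags):
    """Rewrite tag `old` to `new` wherever the preceding tag is `prev`."""
    old, new, prev = rule
    # Conditions are checked against the tagging as it was before this rule fires.
    return [new if i > 0 and t == old and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]


def count_errors(tags, truth):
    """Number of positions where the current tagging disagrees with the truth."""
    return sum(a != b for a, b in zip(tags, truth))


def score(rule, tags, truth):
    """Net error reduction: errors before minus errors after applying the rule."""
    return count_errors(tags, truth) - count_errors(apply_transformation(rule, tags), truth)


def learn(tags, truth, max_rules=10):
    """Greedy error-driven loop: pick the best-scoring rule, apply it, repeat."""
    learned = []
    for _ in range(max_rules):
        # Instantiate the template only at positions that are currently wrong.
        candidates = {(tags[i], truth[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != truth[i]}
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: score(r, tags, truth))
        if score(best, tags, truth) <= 0:
            break                                # no rule reduces the error any further
        learned.append(best)
        tags = apply_transformation(best, tags)
    return learned, tags


# Hypothetical initial tagging (e.g. most frequent tag per word) and its truth.
initial = ["TO", "NN", "DT", "NN"]               # "to race the race"
truth = ["TO", "VB", "DT", "NN"]
rules, final = learn(initial, truth)
print(rules, final)
```

The output of the learner is an ordered rule list; tagging new text means running the same initial annotation and then applying the learned rules in that order.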

  • ( )

    http://infocom.chonan.ac.kr/~limhs/ (/cwb-data/data/nlp/%C7%B0%BB%E7%C5%C2%B1%EB.ppt)