Chapter 4: Natural Language Processing, Artificial Intelligence, Machine Learning (cs.kangwon.ac.kr/~leeck/NLP/04_ml.pdf)

Chapter 4

Natural Language Processing, Artificial Intelligence, Machine Learning

Contents

• Artificial intelligence

• Machine learning

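Artificial Intelligence

• Definition (Wikipedia)

– Philosophically, artificial intelligence means intelligence created by humans, by other intelligent beings, or by a system; that is, artificial intelligence

– It is generally assumed to be realized on a general-purpose computer

– The term also refers to the scientific field that studies methodologies for building such intelligence and its feasibility

• Diverse research topics

– Knowledge representation, search, reasoning, problem solving, learning, perception, action, natural language processing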

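Knowledge Representation and Reasoning

• Knowledge representation

– Propositional logic

• Prolog, Lisp

– Semantic network

• Represents the relations between concepts as a network

• Reasoning

– Expert systems

– Theorem prover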

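Search and Problem Solving

• Game theory

– Search: branch and bound, minimax

– Chess, Go, Janggi (Korean chess)

• In chess, a computer has beaten the world champion

• Optimization and search methods

– Greedy search

– Beam search

– Gradient methods

– Simulated annealing

– Genetic algorithms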

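Machine Learning

• Definition (Wikipedia)

– Machine learning is a branch of artificial intelligence concerned with developing algorithms and techniques that enable computers to learn

– For example, machine learning can be used to train a system to decide whether an incoming email is spam

• Related fields

– Artificial intelligence

– Bayesian Methods

– Computational Complexity Theory

– Control Theory

– Information Theory

– Statistics

– Philosophy

– Psychology and Neurobiology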

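Natural Language Processing and Artificial Intelligence

• Natural language processing as a research area of artificial intelligence

– Speech recognition, morphological analysis, syntactic analysis, semantic analysis

– Language understanding → artificial intelligence

• Artificial intelligence techniques for natural language processing

– Morphological, syntactic, semantic, and pragmatic linguistic knowledge → knowledge representation (WordNet)

– Solving natural language processing problems → machine learning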

WordNet

• A network of relations among English words, built for natural language processing

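Corpus Data

• Composed of a wide variety of sentences extracted from newspapers, magazines, textbooks, and similar sources

• Various linguistic annotations

– Parts of speech, sentence constituents, parsing results

• KIBS, Sejong Corpus

• Brown Corpus, Penn Treebank, …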

Brown Corpus

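Machine Learning-Based Natural Language Processing

• Disambiguation → classification problem

– Structural tagging, part-of-speech tagging, disambiguation, prepositional phrase attachment, etc.

• Language acquisition and understanding

– Rule inference, information extraction and retrieval, automatic summarization, machine translation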

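An Example Taxonomy of Machine Learning Methods

• Symbolic learning

– Instance-based learning, decision trees, inductive logic, …

• Non-symbolic learning

– Neural networks, genetic algorithms, …

• Probabilistic learning

– Bayesian networks, hidden Markov models, probabilistic grammars

• Transformation-based learning, active learning, reinforcement learning, …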

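Symbolic Learning

• Classification problem

– Determining the class of a given object from its various features

• Symbolic learning

– Describes the relation between features and classes with a small number of rules

• if-then rules, etc.

– Learns the rules from the given data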

Symbolic Learning Methods

• Decision trees

• Decision lists

• Transformation-based error-driven learning

• Linear separators

• Instance-based learning

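Decision Trees

• Decision tree

– A practical method for inductive learning

– Approximating a discrete-valued function = building a set of rules

– Easy to construct; a learned decision tree can be read as a set of rules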

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Example training data for decision tree learning

Play tennis?

Decision Tree Representation

• <outlook, humidity, wind, playtennis>

– How many different trees can be built?

[Figure: decision tree for PlayTennis]

outlook
  sunny: humidity
    high: No
    low: Yes
  overcast: Yes
  rain: wind
    strong: No
    weak: Yes

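Decision Tree Learning

• Top-down greedy search through the space of possible decision trees.

• Place the attribute that best classifies the training examples at the root (or at the higher node)

– Uses entropy, information gain, etc.

– ID3 and C4.5 algorithms

• Data fragmentation

– Generalization performance degrades when data are scarce

– Pruning

– Decision list

• An ordered list of rules in conjunctive form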

Entropy

• Minimum number of bits of information needed to encode the classification of an arbitrary member of S

• entropy = 0, if all members in the same class

• entropy = 1, if |positive examples|=|negative examples|

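$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

For two classes, with $p_+$ and $p_-$ the proportions of positive and negative examples in $S$:

$$\mathrm{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$$

[Figure: Entropy(S) plotted against the proportion P of positive examples; both axes run from 0.0 to 1.0]

$$\mathrm{Entropy}([9+, 5-]) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.940$$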
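The entropy definition can be checked numerically. The following is a minimal sketch, assuming class membership is summarized as a list of per-class counts; the function name `entropy` is an illustrative choice, not something from the slides.

```python
import math


def entropy(class_counts):
    """Entropy (in bits) of a collection described by its per-class counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:  # an empty class contributes nothing (0 * log2(0) is taken as 0)
            p = count / total
            result -= p * math.log2(p)
    return result


# The [9+, 5-] collection from the slide
print(f"{entropy([9, 5]):.3f}")
# Equal numbers of positive and negative examples
print(entropy([7, 7]))
# All members in the same class
print(entropy([14, 0]))
```

The three calls correspond to the worked example and the two boundary cases listed in the bullets.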

Information Gain

• Expected reduction in entropy caused by partitioning the examples according to attribute A

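• The amount by which entropy shrinks once the value of attribute A is known

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$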

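Information Gain – cont’d

$$\mathrm{Values}(\mathrm{Wind}) = \{\mathrm{Weak}, \mathrm{Strong}\}$$

$$S = [9+, 5-], \qquad S_{\mathrm{weak}} = [6+, 2-], \qquad S_{\mathrm{strong}} = [3+, 3-]$$

$$
\begin{aligned}
\mathrm{Gain}(S, \mathrm{Wind}) &= \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{Weak}, \mathrm{Strong}\}} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \\
&= \mathrm{Entropy}(S) - (8/14)\,\mathrm{Entropy}(S_{\mathrm{weak}}) - (6/14)\,\mathrm{Entropy}(S_{\mathrm{strong}}) \\
&= 0.940 - (8/14)\,0.811 - (6/14)\,1.00 \\
&= 0.048
\end{aligned}
$$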

Which Attribute is the Best Classifier?

S: [9+, 5-], E = 0.940

Attribute  Value   Examples  Entropy
Humidity   High    [3+, 4-]  0.985
Humidity   Normal  [6+, 1-]  0.592
Wind       Weak    [6+, 2-]  0.811
Wind       Strong  [3+, 3-]  1.000

$$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.940 - (7/14)\,0.985 - (7/14)\,0.592 = 0.151$$

$$\mathrm{Gain}(S, \mathrm{Wind}) = 0.940 - (8/14)\,0.811 - (6/14)\,1.0 = 0.048$$

Classifying examples by Humidity provides more information gain than by Wind.
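The comparison between Humidity and Wind can be reproduced from the class counts alone. This is a minimal sketch, assuming each partition is given as a list of [positive, negative] counts; `information_gain` and its parameter names are illustrative. Computed at full precision, the Humidity gain comes out near 0.152 rather than 0.151, because the slide works from entropies already rounded to three decimals.

```python
import math


def entropy(class_counts):
    """Entropy (in bits) of a collection described by its per-class counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, partition_counts):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = sum(parent_counts)
    remainder = sum(sum(part) / total * entropy(part) for part in partition_counts)
    return entropy(parent_counts) - remainder


# S = [9+, 5-]; Humidity splits it into High [3+, 4-] and Normal [6+, 1-]
gain_humidity = information_gain([9, 5], [[3, 4], [6, 1]])
# Wind splits it into Weak [6+, 2-] and Strong [3+, 3-]
gain_wind = information_gain([9, 5], [[6, 2], [3, 3]])

print(f"Gain(S, Humidity) = {gain_humidity:.3f}")
print(f"Gain(S, Wind)     = {gain_wind:.3f}")
```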

Hypothesis Space Search

• Find a single hypothesis that fits the training examples.

• ID3's hypothesis space

– The set of possible decision trees

• Hill-climbing search

– Information gain

• Guides the hill climbing

– Maintains only a single current hypothesis

– No back-tracking
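The greedy, no-backtracking search can be sketched as a short recursive procedure over the PlayTennis data. This is an illustrative ID3-style sketch under simplifying assumptions (discrete attributes, no pruning, no handling of attribute values unseen in training); the names `id3`, `gain`, and `classify` and the tuple-based tree encoding are choices made here, not part of the slides.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

# PlayTennis training data D1..D14; the last field is the class label.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
EXAMPLES = [dict(zip(ATTRIBUTES + ["PlayTennis"], row)) for row in ROWS]


def entropy(examples):
    counts = Counter(e["PlayTennis"] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gain(examples, attribute):
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder


def id3(examples, attributes):
    """Return a leaf label, or a pair (attribute, {value: subtree})."""
    labels = [e["PlayTennis"] for e in examples]
    if len(set(labels)) == 1:  # pure node: stop with a leaf
        return labels[0]
    if not attributes:  # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: commit to the attribute with the highest gain, never revisit it
    best = max(attributes, key=lambda a: gain(examples, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        branches[value] = id3(subset, remaining)
    return (best, branches)


def classify(tree, example):
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


TREE = id3(EXAMPLES, ATTRIBUTES)
print(TREE)
```

On this data the sketch places Outlook at the root, then Humidity under Sunny and Wind under Rain. Temperature is never used.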

Pruning

• The overfitting problem

– The model is trained to fit only the training data and loses generality

• Occam’s razor

– Prefer the simplest hypothesis that fits the data

– Shorter trees are preferred over larger trees → pruning

• Cross-validation

– Measure performance on a validation set and stop learning (or prune) once validation performance starts to drop

Decision Tree Example

Instance-Based Learning

• Stores "all" of the training data

– Inductive supervised learning

– k-nearest neighbor

– Sensitive to noise

– Slow at run time

• TiMBL (Tilburg Memory-Based Learner)
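A k-nearest-neighbor classifier illustrates why instance-based learning stores every training example and why classification is slow: all the work happens at query time. The following is a minimal sketch, assuming numeric feature vectors, Euclidean distance, and majority voting; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training_data, query, k=3):
    """training_data is a list of (feature_vector, label) pairs."""
    # Every stored example is compared with the query: O(n) work per prediction
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional toy data with two classes
TRAIN = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B"),
]

print(knn_classify(TRAIN, (1.1, 1.0)))
print(knn_classify(TRAIN, (4.1, 4.0)))
```

With a small k, a single mislabeled neighbor can flip the vote, which is one way to read the "sensitive to noise" point.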

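Non-Symbolic Learning

• Neural networks

– A learning model that tries to mimic information processing in the human brain

– Based on parallel processing

– Applied to regression and classification problems

• Genetic algorithms

– A learning method that mimics biological evolution

– Aims to escape local optima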

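Neural Network Representation

• Learns a mapping between inputs and outputs

– y = f(x1, x2, ..., xn)

[Figure: network with inputs x1, x2, x3, ..., xn, hidden units h1, h2, ..., hk, and output y]

Connection Weights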

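[Figure: a single unit with inputs x1, x2, x3, ..., xn, connection weights w1, w2, w3, ..., wn, bias weight w0, and output o]

$$o = \frac{1}{1 + \exp\left(-\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)\right)}$$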
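A single unit of this kind can be sketched in a few lines: a weighted sum of the inputs plus the bias weight, passed through the logistic function. This is a minimal sketch; the function name `sigmoid_unit` and the example weights are assumptions made for illustration.

```python
import math


def sigmoid_unit(weights, bias, inputs):
    """Output of one unit: logistic function of w0 + sum_i w_i * x_i."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))


# With all weights at zero the unit outputs exactly 0.5 for any input
print(sigmoid_unit([0.0, 0.0], 0.0, [1.0, 2.0]))
# Hypothetical weights: net = 0.5 + 2.0 * 1.0 - 1.0 * 1.0 = 1.5
print(sigmoid_unit([2.0, -1.0], 0.5, [1.0, 1.0]))
```

For very large negative weighted sums `math.exp` can overflow, so a production version would need a numerically stable form; that is left out here.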

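Linear Separators

• Learn by updating weights

• Well suited to noisy, high-dimensional problems

– Spelling correction, part-of-speech tagging, document classification

• SNoW (Sparse Network of Winnows)

• Widrow-Hoff rule, EG (exponentiated gradient)

• Perceptron

• SVM (Support Vector Machine)

– Linear kernel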

Linear Functions

[Figure: scatter of negative examples (-) together with the separating hyperplane w · x = 0]

Perceptron learning rule

• On-line, mistake-driven algorithm.

• Rosenblatt (1959) suggested that when a target output value is provided for a single neuron with fixed input, it can incrementally change weights and learn to produce the output using the Perceptron learning rule

• Perceptron == Linear Threshold Unit

[Figure: linear threshold unit with inputs x1, ..., x6, weights w1, ..., w6, threshold T, weighted sum Σ_i w_i x_i, and output y]

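Perceptron learning rule

• We learn $f: X \to \{-1, +1\}$ represented as $f = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$, where $X = \{0,1\}^n$ or $X = \mathbb{R}^n$, and $\mathbf{w} \in \mathbb{R}^n$

• Given labeled examples: $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}$

1. Initialize $\mathbf{w} = \mathbf{0} \in \mathbb{R}^n$

2. Cycle through all examples

a. Predict the label of instance $\mathbf{x}$ to be $y' = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$

b. If $y' \neq y$, update the weight vector:

$\mathbf{w} = \mathbf{w} + r\,y\,\mathbf{x}$ ($r$: a constant, the learning rate)

Otherwise, if $y' = y$, leave the weights unchanged.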
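The steps above translate almost line for line into code. The following is a minimal sketch with several assumptions that are not in the slides: sgn(0) is treated as -1, training stops after an epoch with no mistakes or after `max_epochs`, and a constant input of 1 is appended to each example so that the weight on it can act as a threshold. The toy data encode the Boolean OR function.

```python
def sgn(value):
    # Assumption: sgn(0) is treated as -1; the slides do not fix this case
    return 1 if value > 0 else -1


def predict(w, x):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))


def train_perceptron(examples, rate=1.0, max_epochs=100):
    """examples is a list of (x, y) pairs with y in {-1, +1}."""
    w = [0.0] * len(examples[0][0])  # 1. initialize w = 0
    for _ in range(max_epochs):  # 2. cycle through all examples
        mistakes = 0
        for x, y in examples:
            if predict(w, x) != y:  # update only on a mistake
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: stop
            break
    return w


# Boolean OR; the third component is the appended constant input
OR_EXAMPLES = [
    ((0, 0, 1), -1),
    ((0, 1, 1), 1),
    ((1, 0, 1), 1),
    ((1, 1, 1), 1),
]

WEIGHTS = train_perceptron(OR_EXAMPLES)
print(WEIGHTS)
print([predict(WEIGHTS, x) for x, _ in OR_EXAMPLES])
```

Because OR is linearly separable, the loop reaches a mistake-free pass well before the epoch limit.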

Footnote About the Threshold

• On previous slide, Perceptron has no threshold

• But we don’t lose generality:

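$$\mathbf{w} \leftarrow \langle \mathbf{w}, -\theta \rangle, \qquad \mathbf{x} \leftarrow \langle \mathbf{x}, 1 \rangle$$

$$\mathbf{w} \cdot \mathbf{x} = \theta \iff \langle \mathbf{w}, -\theta \rangle \cdot \langle \mathbf{x}, 1 \rangle = 0$$

Here $\theta$ denotes the threshold.

[Figure: the two decision boundaries drawn on axes x0 and x1]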

Geometric View


Perceptron Learnability

• Only linearly separable functions

• Minsky and Papert (1969) wrote an influential book demonstrating Perceptron’s representational limitations

– Parity functions can’t be learned (XOR)
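The XOR limitation can be illustrated by brute force. The sketch below searches a small grid of weights for a linear threshold unit that reproduces a target Boolean function. It is a demonstration under assumptions (a hypothetical grid from -4.0 to 4.0 in steps of 0.5, and sgn(0) treated as -1), not a proof, although for XOR the failure holds for any real-valued weights.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def find_separator(targets, grid):
    """Return some (w1, w2, w0) whose threshold unit matches targets, else None."""
    for w1, w2, w0 in product(grid, repeat=3):
        outputs = [1 if w1 * x1 + w2 * x2 + w0 > 0 else -1 for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, w0)
    return None


GRID = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0 (hypothetical grid)
AND_TARGETS = [-1, -1, -1, 1]
XOR_TARGETS = [-1, 1, 1, -1]

print("AND:", find_separator(AND_TARGETS, GRID))
print("XOR:", find_separator(XOR_TARGETS, GRID))
```

AND is linearly separable, so some weight triple on the grid is found; for XOR the search comes back empty.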

Neural Network Learning

• Weight adjustment

– Hebbian learning rule, error backpropagation, Boltzmann method

• Multi-layer perceptron

– Universal approximator

• Recurrent network

– Dynamic data

• Self-organizing map

– Clustering

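Other Machine Learning Methods

• Maximum entropy

– Combines and exploits diverse statistical evidence according to the maximum entropy principle

• SVM

– Based on computational learning theory

– Document classification

• Hidden Markov model

– Speech recognition and synthesis, part-of-speech tagging

– Viterbi algorithm (dynamic programming)

• Bayesian network

– Probabilistic graphical model

– Inference of causal relations

• Clustering

– Unsupervised learning

• Ensemble machines

– Part-of-speech tagging, spelling correction

– Bagging, boosting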
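The Viterbi algorithm mentioned for hidden Markov models is a dynamic program over the best path into each state at each time step. The following is a minimal sketch on a two-tag part-of-speech toy model; every probability, tag, and word in it is a hypothetical value chosen for illustration, and raw probabilities are multiplied where a practical tagger would work in log space.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model: N = noun, V = verb
STATES = ["N", "V"]
START_P = {"N": 0.7, "V": 0.3}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"dogs": 0.8, "bark": 0.2}, "V": {"dogs": 0.1, "bark": 0.9}}

PATH, PROB = viterbi(["dogs", "bark"], STATES, START_P, TRANS_P, EMIT_P)
print(PATH, PROB)
```

Each column keeps only the best predecessor per state, so the cost grows linearly with sentence length instead of exponentially with the number of possible tag sequences.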
