Using Structural SVMs and the Pegasos Algorithm for Korean Named Entity...

Deep Learning

Contents

• Current state of deep learning technology

• Deep learning

• Deep learning-based natural language processing

Object Recognition

https://www.youtube.com/watch?v=n5uP_LP9SmM



Semantic Segmentation

https://youtu.be/ZJMtDRbqH40



Semantic Segmentation

VGGNet + Deconvolution network

Image Completion

https://vimeo.com/38359771



Neural Art

• Artistic style transfer using CNN

Handwriting by Machine

[Figure: input text "recurrent neural network handwriting generation demo", a sample handwriting style, and the handwriting generated by an LSTM RNN]

http://www.cs.toronto.edu/~graves/handwriting.html



Music Composition

https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/




















Image Caption Generation

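Example captions (translated from Korean):

• A young girl is standing in a grassy field.

• A man standing in front of a building.

• A woman and a woman with a pink dog.

• A little girl wearing a life jacket is smiling.

[Figure: model architecture with components Image, CNN, Embedding, GRU, Multimodal, Softmax; input word Wt, predicted next word Wt+1]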

Visual Question Answering

Facebook: Visual Q&A

Word Analogy

King – Man + Woman ≈ Queen

Queen – King + Kings ≈ Queens

http://deeplearner.fz-qqq.net/

Japan – Korean + Hangul =?
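Analogies like these are usually solved by vector arithmetic in embedding space. The answer is the vocabulary word whose vector is most cosine-similar to, e.g., King − Man + Woman, with the three query words excluded. The following is a minimal sketch that assumes hypothetical 3-dimensional toy vectors (real embeddings are learned and much larger):

```python
import numpy as np

# Hypothetical 3-dimensional toy vectors; learned embeddings are much larger.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.5, 0.5, 0.5]),
}


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c, emb):
    """Word closest (by cosine) to emb[a] - emb[b] + emb[c], excluding the query words."""
    target = emb[a] - emb[b] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))


print(analogy("king", "man", "woman", emb))  # queen
```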





Neural Machine Translation

Alignment (attention) weights of a Korean-to-English neural MT model for the source sentence "777 항공편은 3시간 동안 지상에 있겠습니다." ("Flight 777 will be on the ground for three hours."). Rows are target (T) words and columns are source (S) tokens.

T|S 777 항공편 은 3 시간 동안 지상 에 있 겠 습니다 . </s>
flight 0.5 0.4 0 0 0 0 0 0 0 0 0 0 0
777 0.3 0.6 0 0 0 0 0 0 0 0 0 0 0
is 0 0.1 0 0 0.1 0.2 0 0.4 0 0.1 0 0 0
on 0 0 0 0 0 0 0 0.7 0.2 0.1 0 0 0
the 0 0 0 0.2 0.3 0.3 0.1 0 0 0 0 0
ground 0 0 0 0.1 0.2 0.5 0.3 0 0 0 0 0 0
for 0 0 0 0.1 0.2 0.5 0.1 0.1 0 0 0 0 0
three 0 0 0 0.2 0.2 0.6 0 0 0 0 0 0 0
hours 0 0 0 0.1 0.3 0.5 0 0 0 0 0 0 0
. 0 0 0 0.4 0 0.1 0.2 0.1 0.1 0.1 0 0 0
</s> 0 0 0 0 0 0 0 0.1 0 0.1 0.1 0.3 0.3

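Neural Conversation Model

• "My washing machine won't turn on."

– A) That must have been very inconvenient.

– A) Is it a drum washer or a regular washing machine?

• "When can you come for a visit?"

– A) When would be a convenient time for us to visit?

– A) What time would suit you for a visit?

• "How much is the call-out fee?"

– A) The on-site inspection fee is 15,000 won.

– A) The call-out fee is fifteen thousand won.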

Abstractive Text Summarization

"A stray cat was spotted keeping watch beside its friend that had been killed on the road." (translated from Korean)

RNN_search+input_feeding+CopyNet

Learning to Execute LSTM RNN

Learning Approximate Solutions

• Travelling Salesman Problem: NP-hard

• Pointer Network can learn approximate solutions: O(n^2)
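A pointer network differs from a normal decoder in one respect: each output is a softmax over the *input positions*, u_j = vᵀ tanh(W1 e_j + W2 d). It therefore "points" at a city instead of emitting a word from a fixed vocabulary. With n decoding steps that each score n inputs, decoding costs O(n^2). The following is a hedged sketch of greedy decoding only. The weights are random and untrained, and the decoder state is simply the embedding of the last chosen city rather than an LSTM state. Both are simplifications for illustration, so the tour is valid but not good.

```python
import numpy as np


def pointer_decode(cities, d_model=16, seed=0):
    """Greedy pointer-network decoding: every output is an index into the input.

    Hypothetical simplifications: weights are random (untrained) and the decoder
    state is the embedding of the last selected city instead of an LSTM state.
    """
    rng = np.random.default_rng(seed)
    n = len(cities)
    W_in = rng.normal(size=(cities.shape[1], d_model))   # input embedding
    W1 = rng.normal(size=(d_model, d_model))              # applied to encoder states e_j
    W2 = rng.normal(size=(d_model, d_model))              # applied to the decoder state
    v = rng.normal(size=d_model)
    enc = np.tanh(cities @ W_in)                          # (n, d_model) encoder states
    dec = enc.mean(axis=0)                                # initial decoder state
    visited = np.zeros(n, dtype=bool)
    tour = []
    for _ in range(n):                                    # n steps, each scoring n inputs: O(n^2)
        # u_j = v^T tanh(W1 e_j + W2 d), written in row-vector convention
        u = np.tanh(enc @ W1 + dec @ W2) @ v
        u[visited] = -np.inf                              # never point to a visited city again
        p = np.exp(u - u.max())
        p /= p.sum()                                      # softmax over input positions
        j = int(np.argmax(p))                             # greedy choice
        tour.append(j)
        visited[j] = True
        dec = enc[j]
    return tour


cities = np.random.default_rng(1).random((6, 2))          # 6 random points in the unit square
print(pointer_decode(cities))                              # a permutation of 0..5
```

Masking visited positions ensures that the output is always a permutation of the cities. A trained model learns scores that make the permutation a short tour.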

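One-Shot Learning

• Learning from a few examples

• Matching Nets use attention and memory

a(x1, x2) is an attention kernel; the prediction for a query x̂ is ŷ = Σ_i a(x̂, x_i) y_i over the labeled support examples (x_i, y_i)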
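Below is a minimal sketch of the Matching Nets prediction rule, in which the attention kernel is a softmax over cosine similarities. It assumes the query and the support examples are already embedded. The real model learns those embedding functions, and they are left out here.

```python
import numpy as np


def matching_net_predict(x_hat, support_x, support_y, n_classes):
    """y_hat = sum_i a(x_hat, x_i) y_i with a = softmax over cosine similarities.

    Assumes x_hat and support_x are already embedded; Matching Nets learn these
    embedding functions, which are omitted here.
    """
    def unit(m):
        return m / np.linalg.norm(m, axis=-1, keepdims=True)

    sims = unit(support_x) @ unit(x_hat)            # cosine similarity to each support example
    a = np.exp(sims - sims.max())
    a /= a.sum()                                     # attention kernel a(x_hat, x_i)
    y_onehot = np.eye(n_classes)[support_y]          # one-hot labels, shape (k, n_classes)
    return a @ y_onehot                              # distribution over classes


support_x = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
support_y = np.array([0, 1, 0])
print(matching_net_predict(np.array([1.0, 0.05]), support_x, support_y, n_classes=2))
```

No parameters are fitted to the new classes. A single labeled example per class is enough to make a prediction, which is what makes this one-shot.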

Contents


• Deep learning


Neural Networks


Deep Neural Networks

• Deep Neural Network = Neural Network + multiple levels of nonlinear operations.


Why Deep Neural Networks?

• Resembles the human cognitive process

– Abstraction: low-level representations → high-level representations


Why Deep Neural Networks?: Integrated Learning

• Conventional machine learning methods – Hand-crafting features is time-consuming

• Deep Neural Network: Feature Extractor + Classifier

<Based on the Winter School '14 Deep Learning materials>

Why Deep Neural Networks?: Unsupervised Feature Learning

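• Machine learning needs a lot of training data – but only a small amount of labeled training data is available

• Building training data costs money and time

– Large raw corpora (unlabeled data) are available

• Semi-supervised, Unsupervised …

• Deep Neural Network – Learns features from large raw corpora through pre-training

– Restricted Boltzmann Machines (RBM)

– Stacked Autoencoder, Stacked Denoising Autoencoder

– Word Embedding (for NLP)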


DNN Difficulties → Now

• Training does not work well → Unsupervised pre-training

– The back-propagation algorithm alone fails

• Heavy computation required → Advances in hardware/GPUs

– Many parameters

• Over-fitting → Pre-training, dropout, …


Deep Belief Network [Hinton06]

• Key idea

– Pre-train layers with an unsupervised learning algorithm in phases

– Then, fine-tune the whole network by supervised learning

• DBNs are stacks of Restricted Boltzmann Machines (RBMs)


Restricted Boltzmann Machine

• A Restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs

• Major applications – Dimensionality reduction

– Topic modeling, …
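RBMs are usually trained with contrastive divergence (CD), which the slides do not describe in detail. The code below sketches a single CD-1 update for binary visible and hidden units. For the negative statistics it uses reconstruction probabilities instead of samples, which is a common practical choice. Layer sizes, learning rate, and data are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM, in place.

    v0: (batch, n_visible) binary data, W: (n_visible, n_hidden),
    b: visible biases, c: hidden biases. Returns the reconstruction error.
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + c)                          # p(h=1 | v0): positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b)                        # reconstruct the visible units
    ph1 = sigmoid(pv1 @ W + c)                         # negative phase
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n           # <v h>_data - <v h>_reconstruction
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return float(((v0 - pv1) ** 2).mean())             # for monitoring only


rng = np.random.default_rng(0)
data = (rng.random((20, 6)) < 0.5).astype(float)      # toy binary data
W = 0.01 * rng.normal(size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
for epoch in range(100):
    err = cd1_step(data, W, b, c, rng=rng)
```

To build a DBN, the hidden probabilities of a trained RBM, `sigmoid(data @ W + c)`, serve as the training data for the next RBM up.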


Training DBN: Pre-Training

• 1. Layer-wise greedy unsupervised pre-training – Train one layer at a time, starting from the bottom layer


Training DBN: Fine-Tuning

• 2. Supervised fine-tuning for the classification task


The Back-Propagation Algorithm
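The slide gives only the title, so here is a minimal sketch of back-propagation. It uses one hidden sigmoid layer and a linear output with squared-error loss, an arbitrary small architecture chosen for illustration. The error at the output is multiplied by each layer's local derivative and propagated backwards. A finite-difference check at the end confirms that one gradient entry is correct.

```python
import numpy as np


def forward_backward(x, y, W1, W2):
    """Loss 0.5 * ||W2 sigmoid(W1 x) - y||^2 and its gradients w.r.t. W1 and W2."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))      # hidden activations
    out = W2 @ h                              # linear output
    loss = 0.5 * np.sum((out - y) ** 2)
    d_out = out - y                           # dL/d(out)
    dW2 = np.outer(d_out, h)
    d_h = W2.T @ d_out                        # propagate the error to the hidden layer
    d_pre = d_h * h * (1 - h)                 # multiply by the sigmoid derivative
    dW1 = np.outer(d_pre, x)
    return loss, dW1, dW2


rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
loss, dW1, dW2 = forward_backward(x, y, W1, W2)

# Central finite-difference check of one entry of dW1
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[1, 2] += eps
Wm[1, 2] -= eps
numeric = (forward_backward(x, y, Wp, W2)[0] - forward_backward(x, y, Wm, W2)[0]) / (2 * eps)
print(abs(numeric - dW1[1, 2]) < 1e-6)   # True
```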

Autoencoder

• An autoencoder is a neural network whose desired output is the same as its input

– It learns a compressed representation (encoding) of a set of data.

– Find weights A and B that minimize the reconstruction error $\sum_i (y_i - x_i)^2$, where $y$ is the network's output for input $x$


Stacked Autoencoders

• After training, the hidden nodes extract features from the input nodes

• Stacking autoencoders constructs a deep network
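The following sketch shows greedy layer-wise stacking. Each sigmoid autoencoder is trained to reconstruct its own input, and its hidden codes then become the input of the next autoencoder. The roles are our assumption: A and a are the encoder, B and b the decoder. Full-batch gradient descent, the learning rate, and the layer sizes are illustrative choices. The slides do not specify them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, lr=0.5, epochs=300, seed=0):
    """Train one sigmoid autoencoder to reconstruct X; returns encoder weights A, a.

    Minimizes 0.5 * sum_i (y_i - x_i)^2, averaged over examples, by gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = rng.normal(scale=0.1, size=(d, n_hidden))   # encoder weights
    a = np.zeros(n_hidden)
    B = rng.normal(scale=0.1, size=(n_hidden, d))   # decoder weights
    b = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ A + a)                      # hidden code
        Y = sigmoid(H @ B + b)                      # reconstruction
        dY = (Y - X) * Y * (1 - Y) / n              # error at the output pre-activation
        dH = (dY @ B.T) * H * (1 - H)               # back-propagated to the hidden layer
        B -= lr * (H.T @ dY)
        b -= lr * dY.sum(axis=0)
        A -= lr * (X.T @ dH)
        a -= lr * dH.sum(axis=0)
    return A, a


def stack_autoencoders(X, layer_sizes):
    """Greedy layer-wise pre-training: each autoencoder learns from the codes below it."""
    layers, inputs = [], X
    for size in layer_sizes:
        A, a = train_autoencoder(inputs, size)
        layers.append((A, a))
        inputs = sigmoid(inputs @ A + a)            # features become the next layer's input
    return layers, inputs


X = (np.random.default_rng(1).random((30, 8)) < 0.5).astype(float)
layers, codes = stack_autoencoders(X, [6, 4])
print([A.shape for A, _ in layers], codes.shape)
```

Once stacked, the encoder weights can initialize a deep network, which is then fine-tuned with supervised back-propagation as in the DBN recipe.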


Dropout (Hinton12)


• During training, hidden units are randomly dropped with probability p.

<Based on the Winter School '14 Deep Learning materials>
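Below is a minimal sketch of dropout on a matrix of hidden activations. It uses the common "inverted" variant: surviving units are scaled by 1/(1 − p) during training, so nothing has to change at test time. The original formulation instead scales the weights at test time. The expected activations are the same either way.

```python
import numpy as np


def dropout(h, p, training, rng=None):
    """Drop each hidden unit with probability p during training (inverted dropout)."""
    if not training or p == 0.0:
        return h                               # test time: use all units unchanged
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape) >= p            # keep a unit with probability 1 - p
    return h * mask / (1.0 - p)                # rescale so the expected value is unchanged


h = np.ones((1000, 50))
out = dropout(h, 0.5, training=True, rng=np.random.default_rng(0))
print(out.mean())                              # close to 1.0
```

Each training step samples a new mask, so the network effectively trains a large ensemble of thinned sub-networks that share weights.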

Non-linearity (Activation Function)
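The activation figure from this slide did not survive extraction. For illustration, the sketch below defines two common non-linearities, sigmoid and ReLU (tanh is simply `np.tanh`), together with the derivatives that back-propagation multiplies through. The choice of functions is ours.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)                        # at most 0.25: long products shrink gradients


def d_relu(z):
    return (np.asarray(z) > 0).astype(float)    # 1 for active units (0 at z = 0 by convention)


z = np.array([-2.0, 0.0, 3.0])
print(relu(z), d_sigmoid(z), d_relu(z))
```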


Convolutional Neural Network (LeCun98)

• Convolutional NN

– Convolution Layer • Sparse Connectivity

• Shared Weights

• Multiple feature maps

– Sub-sampling Layer • Average/max pooling

• N×N → 1

• e.g., LeNet

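The sketch below shows the two CNN layer types described above. A convolution layer slides one kernel over the image, so its weights are shared and each output sees only a local patch (sparse connectivity). A sub-sampling layer then reduces each N×N block to one value. Max pooling is shown here, and average pooling would replace `.max` with `.mean`. As in most CNN code, the "convolution" is written as cross-correlation. Kernel values and sizes are illustrative.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) with one shared kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output depends only on a local kh x kw patch (sparse connectivity)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(fmap, n):
    """Non-overlapping N x N max pooling: each N x N block is reduced to 1 value."""
    H, W = fmap.shape
    H2, W2 = H // n, W // n
    return fmap[:H2 * n, :W2 * n].reshape(H2, n, W2, n).max(axis=(1, 3))


def conv_layer(image, kernels):
    """Multiple feature maps: one map per kernel, then ReLU and 2 x 2 max pooling."""
    return [max_pool(np.maximum(conv2d_valid(image, k), 0.0), 2) for k in kernels]


img = np.arange(36, dtype=float).reshape(6, 6)
k = np.array([[1.0, 0.0], [0.0, -1.0]])              # toy diagonal-difference kernel
maps = conv_layer(img, [k, -k])
print([m.shape for m in maps])
```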

CNN Architectures

CNN for Audio

Recurrent Neural Network

• The “recurrent” property → a dynamical system over time
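The following minimal sketch of a simple (Elman) RNN makes the "dynamical system" view concrete. One state update, h(t) = tanh(W_xh x(t) + W_hh h(t−1) + b_h), is applied at every time step with the same weights. The weight shapes are illustrative assumptions.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, b_h, h0=None):
    """Run h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b_h) over a sequence; return all h(t)."""
    h = np.zeros(W_hh.shape[0]) if h0 is None else h0
    hs = []
    for x in xs:                                  # the same weights are reused at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))                      # 5 time steps, 3 input features
W_xh = rng.normal(scale=0.5, size=(4, 3))
W_hh = rng.normal(scale=0.5, size=(4, 4))
b_h = np.zeros(4)
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)                                   # (5, 4)
```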

Bidirectional RNN

• Exploit future context as well as past
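A bidirectional RNN can be sketched as two independent RNNs. One reads the sequence left to right (past context), the other right to left (future context), and their states are concatenated at each position. Concatenation is the usual choice, though summing is also used. The sizes below are illustrative.

```python
import numpy as np


def rnn_pass(xs, W_xh, W_hh, b_h):
    """Simple tanh RNN over xs; returns the hidden state at every position."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)


def birnn(xs, fwd_params, bwd_params):
    """Concatenate forward (past) and backward (future) states at each position."""
    h_fwd = rnn_pass(xs, *fwd_params)
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]   # re-align backward states with positions
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))
fwd = (rng.normal(scale=0.5, size=(4, 3)), rng.normal(scale=0.5, size=(4, 4)), np.zeros(4))
bwd = (rng.normal(scale=0.5, size=(4, 3)), rng.normal(scale=0.5, size=(4, 4)), np.zeros(4))
out = birnn(xs, fwd, bwd)
print(out.shape)                                    # (5, 8)
```

At position 0, the backward half of the state has already read the whole sentence. This is what lets a tagger use words that come after the current one.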

Long Short-Term Memory RNN

• An LSTM can preserve gradient information over long time spans (its gated memory cell counters the vanishing gradient problem)
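The following is a minimal sketch of one LSTM step. The memory cell C(t) is updated additively through the forget gate f(t) and input gate i(t). When f(t) ≈ 1 and i(t) ≈ 0, the cell content, and the gradient flowing through it, is carried across steps almost unchanged. Stacking the four gate weight matrices into one W is an implementation choice made for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, D + H) stacks input, forget, output and candidate
    weights; b: (4H,). Returns the new hidden state h(t) and memory cell C(t)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])                  # input gate i(t)
    f = sigmoid(z[H:2 * H])             # forget gate f(t)
    o = sigmoid(z[2 * H:3 * H])         # output gate o(t)
    g = np.tanh(z[3 * H:])              # candidate cell content
    c = f * c_prev + i * g              # additive cell update
    h = o * np.tanh(c)
    return h, c


# Forget gate forced open (large bias), input gate shut: the cell is copied unchanged.
H, D = 3, 2
W = np.zeros((4 * H, D + H))
b = np.concatenate([np.full(H, -50.0), np.full(H, 50.0), np.zeros(2 * H)])
c_prev = np.array([0.5, -1.0, 2.0])
h, c = lstm_step(np.ones(D), np.zeros(H), c_prev, W, b)
print(c)                                # same values as c_prev
```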

Contents


• Deep learning


Text Representation

• One-hot representation (or symbolic)

– Ex. [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

– Dimensionality • 50K (PTB) – 500K (big vocab) – 3M (Google 1T)

– Problem • Motel [0 0 0 0 0 0 0 0 1 0 0] AND Hotel [0 0 0 0 0 0 1 0 0 0 0] = 0: distinct one-hot vectors never overlap, so the representation encodes no similarity between words

• Continuous representation

– Latent Semantic Analysis, Random projection

– Latent Dirichlet Allocation, HMM clustering

– Neural Word Embedding • Dense vector

• Adding supervision from other tasks can improve the representation
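The short example below illustrates the one-hot problem described above. It uses hypothetical 3-dimensional dense vectors for three words. One-hot vectors of different words always have cosine similarity 0, whereas dense vectors can place related words close together.

```python
import numpy as np

vocab = ["motel", "hotel", "banana"]
one_hot = np.eye(len(vocab))                        # each word gets its own axis

# Hypothetical dense embeddings (in practice learned, typically 50-300 dimensions)
dense = np.array([[0.8, 0.1, 0.3],                  # motel
                  [0.7, 0.2, 0.3],                  # hotel
                  [-0.2, 0.9, -0.5]])               # banana


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine(one_hot[0], one_hot[1]))   # 0.0: one-hot vectors of distinct words are orthogonal
print(cosine(dense[0], dense[1]))       # close to 1: similar words get similar vectors
```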

Neural Network Language Model (Bengio00,03)

Shared weights = Word embedding

• Idea

– A word and its context is a positive training sample

– A random word in that same context → negative training sample

– Score(positive) > Score(neg.)

• Training complexity is high

– Hidden layer output

– Softmax in the output layer

• Hierarchical softmax

• Negative sampling

• Ranking (hinge loss)

Input     Dim 1  Dim 2  Dim 3  Dim 4  Dim 5
1 (boy)   0.01   0.2    -0.04  0.05   -0.3
2 (girl)  0.02   0.22   -0.05  0.04   -0.4

LT: |V|×d, Input (one-hot) I: |V|×1, word vector = $LT^\top I$
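The sketch below covers two pieces of the slide. The lookup LTᵀ·I picks out one row of the table, which is why it is implemented as indexing in practice. The ranking criterion asks a true window to outscore a corrupted one by a margin. The two table rows are taken from the slide. The margin of 1 and the function names are illustrative assumptions.

```python
import numpy as np

vocab = {"boy": 0, "girl": 1}
# Lookup table LT (|V| x d), using the two example rows from the table above
LT = np.array([[0.01, 0.20, -0.04, 0.05, -0.3],
               [0.02, 0.22, -0.05, 0.04, -0.4]])


def lookup(word):
    """Word vector = LT^T I, where I is the |V| x 1 one-hot input (equivalent to LT[index])."""
    I = np.zeros((LT.shape[0], 1))
    I[vocab[word]] = 1.0
    return (LT.T @ I).ravel()


def ranking_hinge_loss(score_pos, score_neg, margin=1.0):
    """Zero once the true window outscores the corrupted window by at least the margin."""
    return max(0.0, margin - score_pos + score_neg)


print(lookup("girl"))
print(ranking_hinge_loss(2.5, 1.0), ranking_hinge_loss(0.2, 0.5))
```

Since the loss ranks two scores rather than normalizing over the whole vocabulary, it sidesteps the expensive softmax mentioned above.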

Korean Word Embedding: NNLM

Transition-Based Korean Dependency Parsing: Forward

• Transition-based (Arc-Eager): O(N)

• Example: CJ그룹이1 대한통운2 인수계약을3 체결했다4 ("CJ Group signed a contract to acquire Korea Express")

– Initial configuration [stack], [buffer], {arcs as (head → dependent)}: [root], [CJ그룹이1 대한통운2 …], {}

• 1: Shift

– [root CJ그룹이1], [대한통운2 인수계약을3 …], {}

• 2: Shift

– [root CJ그룹이1 대한통운2], [인수계약을3 체결했다4], {}

• 3: Left-arc(NP_MOD)

– [root CJ그룹이1], [2인수계약을3 체결했다4], {(인수계약을3 → 대한통운2)}

• 4: Shift

– [root CJ그룹이1 2인수계약을3], [체결했다4], {(인수계약을3 → 대한통운2)}

• 5: Left-arc(NP_OBJ)

– [root CJ그룹이1], [3체결했다4], {(체결했다4 → 인수계약을3), …}

• 6: Left-arc(NP_SUB)

– [root], [(1,3)체결했다4], {(체결했다4 → CJ그룹이1), …}

• 7: Right-arc(VP)

– [root4 (1,3)체결했다4], [], {(root → 체결했다4), …}
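The walkthrough above can be replayed mechanically. The following is a minimal sketch, assuming tokens are numbered 1 to 4 as in the example (0 = root) and that the transitions are given rather than predicted by a classifier. It also implements arc-eager's Reduce transition, although this example does not need it. Each word enters the stack once and leaves it at most once, so a sentence of N words needs at most 2N transitions, hence O(N).

```python
def arc_eager(n_words, transitions):
    """Replay arc-eager transitions. Tokens are 1..n_words, 0 is the root.
    Returns the final stack, buffer and the set of (head, dependent, label) arcs."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    for op, label in transitions:
        if op == "SHIFT":                      # move the buffer front onto the stack
            stack.append(buffer.pop(0))
        elif op == "LEFT":                     # stack top becomes a dependent of the buffer front
            assert stack[-1] != 0, "root cannot be a dependent"
            arcs.add((buffer[0], stack.pop(), label))
        elif op == "RIGHT":                    # buffer front becomes a dependent of the stack top
            dep = buffer.pop(0)
            arcs.add((stack[-1], dep, label))
            stack.append(dep)
        elif op == "REDUCE":                   # pop a stack token that already has its head
            stack.pop()
        else:
            raise ValueError(f"unknown transition {op}")
    return stack, buffer, arcs


words = ["CJ그룹이", "대한통운", "인수계약을", "체결했다"]
steps = [("SHIFT", None), ("SHIFT", None), ("LEFT", "NP_MOD"), ("SHIFT", None),
         ("LEFT", "NP_OBJ"), ("LEFT", "NP_SUB"), ("RIGHT", "VP")]
stack, buffer, arcs = arc_eager(len(words), steps)
names = ["root"] + words
for head, dep, label in sorted(arcs):
    print(f"{names[head]} -> {names[dep]} ({label})")
```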

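Deep Learning-Based Korean Dependency Parsing (Hangul and Korean Information Processing '14)

• Transition-based + Backward

– O(N)

– Sejong Corpus converted to dependency structures

• Post-processing of auxiliary and pseudo-auxiliary predicates

• Deep learning-based

– ReLU (> Sigmoid) + Dropout

– Korean Word Embedding

• NNLM, Ranking (hinge, logit)

• Word2Vec

– Feature Embedding

• POS (stack + buffer)

– Automatically tagged (errors included)

• Dependency Label (stack)

• Distance information

• Valency information

• Mutual Information

– Computed from automatic parsing of a large corpus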

[Figure: network architecture. Input words S[wt-2 wt-1] B[wt …] → word lookup tables LT1 … LTN; input features f1 f2 f3 f4 … → feature lookup tables LT1 … LTD; concat → Linear (M1 x h) → ReLU → Linear (M2 x #output)]
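The architecture in the figure can be sketched as a forward pass that scores every possible transition for one parser configuration. Several details are assumptions not stated on the slides: a single word table shared by all N word positions, a single feature table shared by all D features, bias terms b1 and b2, and the illustrative dimensions.

```python
import numpy as np


def parser_scores(word_ids, feat_ids, word_LT, feat_LT, M1, b1, M2, b2):
    """Score all transitions for one parser configuration.

    word_ids: ids of the N words taken from the stack/buffer (e.g. S[wt-2 wt-1], B[wt ...])
    feat_ids: ids of the D discrete features (POS, dependency label, distance, ...)
    """
    x = np.concatenate([word_LT[word_ids].ravel(),     # word lookups, concatenated
                        feat_LT[feat_ids].ravel()])    # feature lookups, concatenated
    h = np.maximum(0.0, M1 @ x + b1)                   # Linear (M1 x h) + ReLU
    return M2 @ h + b2                                 # Linear (M2 x #output): one score per transition


rng = np.random.default_rng(0)
word_LT = rng.normal(size=(100, 50))                   # hypothetical 100-word vocabulary, d = 50
feat_LT = rng.normal(size=(30, 10))                    # hypothetical 30 feature values, d = 10
N, D, hidden, n_out = 4, 6, 64, 7
M1, b1 = rng.normal(size=(hidden, N * 50 + D * 10)), np.zeros(hidden)
M2, b2 = rng.normal(size=(n_out, hidden)), np.zeros(n_out)
scores = parser_scores(np.array([1, 5, 7, 9]), np.arange(6), word_LT, feat_LT, M1, b1, M2, b2)
print(scores.argmax())                                 # index of the highest-scoring transition
```

The parser applies the highest-scoring legal transition and then rebuilds the input for the new configuration. The number of transitions therefore stays linear in sentence length.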

Korean Dependency Parsing Results

• Previous work: UAS 85–88%

• Structural SVM-based: UAS = 89.99%, LAS = 87.74%

• Pre-training > no pre-training

• Dropout > no dropout

• ReLU > Sigmoid

• MI features > no MI features

• Word embedding ranking:

1. NNLM 2. Ranking (logit loss) 3. Word2vec 4. Ranking (hinge loss)

[Figure: LSTM cell with input x(t), hidden state h(t), memory cell C(t) and gates i(t), f(t), o(t); taggers unrolled over inputs x(t-1), x(t), x(t+1) with hidden states h(t-1), h(t), h(t+1) and output labels y]

Proposed: LSTM-CRF (LSTM RNN + CRF)
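In an LSTM-CRF, the LSTM supplies per-position label scores and the CRF adds label-to-label transition scores. At test time the Viterbi algorithm finds the single highest-scoring label sequence. The sketch below covers only this decoding step. It assumes the LSTM outputs are already given as an emission matrix, and CRF training (the forward algorithm) is omitted.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Best label sequence for a linear-chain CRF.

    emissions: (T, K) per-position label scores (e.g. from the LSTM outputs)
    transitions: (K, K) score of moving from label i to label j
    Returns (best_path, best_score).
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in label i at t-1, then label j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers to recover the path
        best.append(int(backptr[t, best[-1]]))
    return best[::-1], float(score.max())


rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))                    # 4 tokens, 3 labels (e.g. B, I, O)
A = rng.normal(size=(3, 3))
print(viterbi(E, A))
```

Dynamic programming reduces decoding from K^T candidate sequences to O(T·K²) work.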

English Named Entity Recognition (KCC 15; submitted to a journal)

Model (CoNLL03 data set)                            F1 (dev)  F1 (test)
SENNA (Collobert)                                          -      89.59
Structural SVM (baseline + Word embedding feature)         -      85.58
FFNN (Sigm + Dropout + Word embedding)                 91.58      87.35
RNN (Sigm + Dropout + Word embedding)                  91.83      88.09
LSTM RNN (Sigm + Dropout + Word embedding)             91.77      87.73
GRU RNN (Sigm + Dropout + Word embedding)              92.01      87.96
CNN+CRF (Sigm + Dropout + Word embedding)              93.09      88.69
RNN+CRF (Sigm + Dropout + Word embedding)              93.23      88.76
LSTM+CRF (Sigm + Dropout + Word embedding)             93.82      90.12
GRU+CRF (Sigm + Dropout + Word embedding)              93.67      89.98

Korean Sentiment Analysis – CNN

• Mobile data

– Train: 4543, Test: 500

• Applied the EMNLP14 model (CNN) – implemented in Matlab

– Word embedding: 100,000 Korean words + 1,420 domain-specific words

Data set: Mobile (Train: 4543, Test: 500)

Model                                                   Accuracy
SVM (word feature)                                      85.58
CNN(relu,kernel3,hid50)+Word embedding (word feature)   91.20
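The forward pass of such a model can be sketched as follows. This rests on two interpretations of ours: that the "EMNLP14 model" is the single-layer sentence-classification CNN of Kim (2014), and that "kernel3, hid50" means 50 filters of width 3. The slides' version was written in Matlab, so this NumPy sketch is only an approximation of it, and the vocabulary size and dimensions are illustrative.

```python
import numpy as np


def cnn_sentence_logits(word_ids, E, F, b_f, W_out, b_out):
    """Forward pass of a one-layer CNN sentence classifier.

    E: (|V|, d) word embeddings; F: (n_filters, k, d) filters spanning k words.
    Convolution over word windows -> ReLU -> max-over-time pooling -> linear layer.
    """
    X = E[word_ids]                                   # (T, d) sentence matrix, needs T >= k
    n_filters, k, d = F.shape
    T = X.shape[0]
    windows = np.stack([X[t:t + k].ravel() for t in range(T - k + 1)])   # (T-k+1, k*d)
    fmap = np.maximum(0.0, windows @ F.reshape(n_filters, -1).T + b_f)  # (T-k+1, n_filters)
    pooled = fmap.max(axis=0)                         # max over time: one value per filter
    return W_out @ pooled + b_out                     # class scores (e.g. positive/negative)


rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 20))                       # hypothetical vocabulary and dimension
F, b_f = rng.normal(size=(50, 3, 20)), np.zeros(50)   # 50 filters of width 3
W_out, b_out = rng.normal(size=(2, 50)), np.zeros(2)
logits = cnn_sentence_logits(np.array([3, 14, 15, 92, 65]), E, F, b_f, W_out, b_out)
print(logits)
```

Max-over-time pooling yields a fixed-size vector regardless of sentence length, which is why one classifier can handle reviews of any length.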

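LSTM RNN-Based Korean Sentiment Analysis

• LSTM RNN-based encoding – the sentence embedding is the input

– A fully connected NN produces the output

– GRU encoding works similarly

[Figure: inputs x(1), x(2), …, x(t) → hidden states h(1), h(2), …, h(t) → output y]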

Data set: Mobile (Train: 4543, Test: 500)

Model                                                   Accuracy
SVM (word feature)                                      85.58
CNN(relu,kernel3,hid50)+Word embedding (word feature)   91.20
GRU encoding + Fully connected NN                       91.12
LSTM RNN encoding + Fully connected NN                  90.93
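The GRU variant can be sketched as follows. The final GRU state serves as the sentence embedding, and a small fully connected network maps it to class probabilities. Biases inside the GRU are omitted and the layer sizes are illustrative assumptions. The update h = (1 − z)·h + z·h̃ is one of two conventions in the literature, which differ only in which side of z is which.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_encode(xs, Wz, Wr, Wh):
    """Encode a sequence into its final GRU state. Each W*: (H, D + H)."""
    h = np.zeros(Wz.shape[0])
    for x in xs:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                                # update gate
        r = sigmoid(Wr @ xh)                                # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h                                                 # sentence embedding


def classify(sentence_vec, W1, b1, W2, b2):
    """Fully connected NN on the sentence embedding; returns class probabilities."""
    hidden = np.maximum(0.0, W1 @ sentence_vec + b1)
    logits = W2 @ hidden + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()


rng = np.random.default_rng(0)
D, H = 5, 8
Wz, Wr, Wh = (rng.normal(scale=0.5, size=(H, D + H)) for _ in range(3))
s = gru_encode(rng.normal(size=(7, D)), Wz, Wr, Wh)          # 7 word vectors of size D
W1, b1 = rng.normal(size=(16, H)), np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)), np.zeros(2)
print(classify(s, W1, b1, W2, b2))
```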

Recurrent NN Encoder–Decoder for Statistical Machine Translation (EMNLP14)

Sequence to Sequence Learning with Neural Networks (NIPS14 – Google)

• Source vocab: 160,000; target vocab: 80,000

• Deep LSTMs with 4 layers

• Training: 7.5 epochs (12M sentences, 10 days on an 8-GPU machine)

Neural MT by Jointly Learning to Align and Translate (ICLR15)

• GRU RNN + Alignment Encoding

• GRU RNN Decoding

• Vocab: 30,000 (src, tgt)

• Training: 5 days
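The alignment model of this approach (RNNsearch) works as follows at each decoder step. It scores every source annotation h_j against the previous decoder state s with e_j = v_aᵀ tanh(W_a s + U_a h_j), normalizes the scores with a softmax, and passes the weighted average of the annotations to the decoder as the context vector. The following is a minimal sketch of one such step with random, illustrative dimensions. The GRU encoder and decoder themselves are omitted.

```python
import numpy as np


def additive_attention(s_prev, enc_states, W_a, U_a, v_a):
    """One step of additive attention.

    s_prev: (n,) previous decoder state; enc_states: (T, m) source annotations h_j.
    Returns the alignment weights alpha (T,) and the context vector c (m,).
    """
    e = np.tanh(s_prev @ W_a.T + enc_states @ U_a.T) @ v_a   # e_j = v_a^T tanh(W_a s + U_a h_j)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                                      # softmax over source positions
    c = alpha @ enc_states                                    # c = sum_j alpha_j h_j
    return alpha, c


rng = np.random.default_rng(0)
s = rng.normal(size=4)                    # decoder state
Hs = rng.normal(size=(6, 8))              # 6 source positions (e.g. bidirectional GRU outputs)
W_a, U_a, v_a = rng.normal(size=(5, 4)), rng.normal(size=(5, 8)), rng.normal(size=5)
alpha, c = additive_attention(s, Hs, W_a, U_a, v_a)
print(alpha.round(2))
```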

J-to-E Neural MT (WAT) – 1/2

• ASPEC-JE data

• Neural MT (RNN-search) – GRU RNN + Alignment Encoding

– GRU RNN Decoding

– Vocab size: 20,000 (src, tgt)

– BLEU(test): 21.63 (beam=10)

• WAT14 (Juman): PBMT=18.45, HPBMT=18.72, NAIST (1st place, forest-to-string)=23.29
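The test BLEU above was obtained with beam=10. Rather than keeping only the single best partial translation, beam search keeps the beam_size best at each step. The following is a generic sketch that assumes a hypothetical `step_log_probs(prefix)` callback standing in for one decoder step. It applies no length normalization, which real systems often add. The toy model shows a case where beam search finds a better sequence than greedy search.

```python
import numpy as np


def beam_search(step_log_probs, bos, eos, beam_size=10, max_len=50):
    """Return (tokens, total_log_prob) of the best hypothesis found.

    step_log_probs(prefix) is a hypothetical stand-in for one decoder step: it returns
    a 1-D array of next-token log-probabilities. No length normalization is applied.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logp = step_log_probs(prefix)
            for tok in np.argsort(logp)[-beam_size:]:      # best extensions of this prefix
                candidates.append((prefix + [int(tok)], score + float(logp[tok])))
        candidates.sort(key=lambda cand: cand[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:      # keep the beam_size best overall
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)                                # hypotheses still open at max_len
    return max(finished, key=lambda hyp: hyp[1])


def toy_step(prefix):
    # Hypothetical toy model over tokens {0: <s>, 1: A, 2: B, 3: </s>}
    table = {(0,): [0.0, 0.6, 0.4, 0.0],
             (0, 1): [0.0, 0.25, 0.25, 0.5],
             (0, 2): [0.0, 0.05, 0.05, 0.9]}
    p = np.array(table.get(tuple(prefix), [0.0, 0.0, 0.0, 1.0]))
    with np.errstate(divide="ignore"):
        return np.log(p)


print(beam_search(toy_step, bos=0, eos=3, beam_size=1)[0])   # greedy: [0, 1, 3]
print(beam_search(toy_step, bos=0, eos=3, beam_size=2)[0])   # beam: [0, 2, 3]
```

Greedy search commits to A, the more likely first token, and ends with probability 0.6 × 0.5 = 0.3. The beam keeps B alive as well and finds the better sequence, with probability 0.4 × 0.9 = 0.36.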

J-to-E Neural MT Experiment (WAT) – 2/2

• 最後/ncc:0 に/ps:1 ,/sl:2 将来/nca:3 展望/ncs:4 に/ps:5 つい/vc:6 て/pj:7 記述/ncs:8 </s>:9

• the/dt:0 future/jj:1 view/nn:2 is/vbz:3 described/vbn:4 ./.:5 </s>:6

• 食物/ncc:0 アレルギー/ncc:1 は/pc:2 アナフィラキシー/ncc:3 の/ps:4 主要/dc:5 な/vx:6 原因/ncs:7 抗原/ncc:8 の/ps:9 一/nn:10 つ/xnn:11 で/vx:12 ある/vd:13 。/op:14 </s>:15

• the/dt:0 food/nn:1 allergy/nn:2 is/vbz:3 one/cd:4 of/in:5 the/dt:6 main/jj:7 causal/jj:8 antigen/nn:9 of/in:10 the/dt:11 anaphylaxis/nn:12 ./.:13 </s>:14


End-to-End Neural Speech Recognition (15)

Neural Image Caption Generator (14)

Korean Image Caption Generation

