Fast Methods for Kernel-based Text Analysis
Taku Kudo (工藤 拓) and Yuji Matsumoto (松本 裕治)
NAIST (Nara Institute of Science and Technology)
41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan
Background
Kernel methods (e.g., SVMs) have become popular
Prior knowledge can be incorporated independently of the machine learning algorithm by supplying a task-dependent kernel (a generalized dot product)
High accuracy
Problem
Kernel-based text analyzers are too slow for real NL applications (e.g., QA or text mining) because classification at test time is inefficient
Some kernel-based parsers take 2-3 seconds per sentence
Goals
Build fast but still accurate kernel-based text analyzers
Make it possible to use them in a wider range of NL applications
Outline
Polynomial Kernel of degree d
Fast Methods for Polynomial Kernels
  PKI
  PKE
Experiments
Conclusions and Future Work
Kernel Methods
Training data: $T = \{X_1, X_2, \ldots, X_L\}$
$$f(X) = \sum_{i=1}^{L} \alpha_i\, \phi(X_i) \cdot \phi(X) = \sum_{i=1}^{L} \alpha_i K(X_i, X)$$
No need to represent examples as explicit feature vectors
Complexity of testing is $O(L \cdot |X|)$
Kernels for Sets (1/3)
Focus on the special case where examples are represented as sets
Instances in NLP are usually represented as sets (e.g., bag-of-words)
Feature set: $F = \{i_1, i_2, \ldots, i_N\}$
Training data: $T = \{X_1, X_2, \ldots, X_L\}$, $X_j \subseteq F$
Kernels for Sets (2/3)
$X_1 = \{a, b, c, d\}$, $X_2 = \{a, b, d, e\}$
Simple definition: $K(X_1, X_2) = |X_1 \cap X_2| = |\{a, b, d\}| = 3$
Combinations (subsets) of features:
2nd order: $\{\{a, b\}, \{a, d\}, \{b, d\}\}$
3rd order: $\{\{a, b, d\}\}$
Kernels for Sets (3/3)
Sentence: I/PRP ate/VBD a/DT cake/NN
Head: ate; modifier: cake. Dependent (+1) or independent (-1)?
Basic features: X = {Head-word: ate, Head-POS: VBD, Modifier-word: cake, Modifier-POS: NN}
After heuristic selection of combinations: X = {Head-word: ate, Head-POS: VBD, Modifier-word: cake, Modifier-POS: NN, Head-POS/Modifier-POS: VBD/NN, Head-word/Modifier-POS: ate/NN, …}
Subsets (combinations) of basic features are critical for improving overall accuracy in many NL tasks
Previous approaches select combinations heuristically
Polynomial Kernel of degree d
Implicit form:
$$K_d(X_1, X_2) = (|X_1 \cap X_2| + 1)^d, \quad d \in \{0, 1, 2, \ldots\}$$
Explicit form:
$$K_d(X_1, X_2) = \sum_{r=0}^{d} c_d(r) \cdot |P_r(X_1 \cap X_2)|$$
$P_r(X)$ is the set of all subsets of $X$ with exactly $r$ elements
$c_d(r)$ is the prior weight given to subsets of size $r$ (subset weight), with the convention $0^0 = 1$:
$$c_d(r) = \sum_{l=r}^{d} \binom{d}{l} \left( \sum_{m=0}^{r} (-1)^{r-m} \, m^{l} \binom{r}{m} \right)$$
Example (Cubic Kernel, d = 3)
$X_1 = \{a, b, c, d\}$, $X_2 = \{a, b, d, e\}$
Implicit form:
$$K_3(X_1, X_2) = (|X_1 \cap X_2| + 1)^3 = (3 + 1)^3 = 64$$
Explicit form:
$c_3(0) = 1$, $P_0(X_1 \cap X_2) = \{\emptyset\}$
$c_3(1) = 7$, $P_1(X_1 \cap X_2) = \{\{a\}, \{b\}, \{d\}\}$
$c_3(2) = 12$, $P_2(X_1 \cap X_2) = \{\{a, b\}, \{a, d\}, \{b, d\}\}$
$c_3(3) = 6$, $P_3(X_1 \cap X_2) = \{\{a, b, d\}\}$
$$K_3(X_1, X_2) = 1 \cdot 1 + 7 \cdot 3 + 12 \cdot 3 + 6 \cdot 1 = 64$$
Subsets of up to 3 features are used as new features
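The agreement between the implicit and explicit forms can be checked numerically. The following is a minimal Python sketch, not code from the talk: the function names `subset_weight`, `kernel_implicit`, and `kernel_explicit` are illustrative assumptions, and the subset weight follows the $c_d(r)$ formula from the slide on the polynomial kernel of degree d.

```python
from itertools import combinations
from math import comb


def subset_weight(d: int, r: int) -> int:
    """c_d(r): weight the degree-d kernel gives to subsets of size r."""
    # Python evaluates 0 ** 0 as 1, which is the convention the formula needs.
    return sum(
        comb(d, l) * sum((-1) ** (r - m) * m ** l * comb(r, m) for m in range(r + 1))
        for l in range(r, d + 1)
    )


def kernel_implicit(x1: set, x2: set, d: int) -> int:
    """(|X1 ∩ X2| + 1)^d, computed directly."""
    return (len(x1 & x2) + 1) ** d


def kernel_explicit(x1: set, x2: set, d: int) -> int:
    """Sum over r of c_d(r) * |P_r(X1 ∩ X2)|, enumerating the subsets."""
    common = sorted(x1 & x2)
    return sum(
        subset_weight(d, r) * sum(1 for _ in combinations(common, r))
        for r in range(d + 1)
    )


x1, x2 = {"a", "b", "c", "d"}, {"a", "b", "d", "e"}
weights = [subset_weight(3, r) for r in range(4)]
print(weights)  # [1, 7, 12, 6]
print(kernel_implicit(x1, x2, 3), kernel_explicit(x1, x2, 3))  # 64 64
```

Enumerating subsets is only for illustration; the point of the explicit form is that each subset becomes a feature with a fixed weight.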
Toy Example
Feature set: F = {a, b, c, d, e}
Examples (#SVs L = 3):

| j | α_j | X_j |
|---|---|---|
| 1 | 1 | {a, b, c} |
| 2 | 0.5 | {a, b, d} |
| 3 | -2 | {b, c, d} |

Kernel: $K_3(X_1, X_2) = (|X_1 \cap X_2| + 1)^3$
Test example: X = {a, c, e}
PKB (Baseline)
Support vectors: α_1 = 1, X_1 = {a, b, c}; α_2 = 0.5, X_2 = {a, b, d}; α_3 = -2, X_3 = {b, c, d}
Kernel: $K(X, X') = (|X \cap X'| + 1)^3$
Test example: X = {a, c, e}
$$f(X) = \sum_{j} \alpha_j K(X_j, X) = 1 \cdot (2+1)^3 + 0.5 \cdot (1+1)^3 - 2 \cdot (1+1)^3 = 15$$
Complexity is always $O(L \cdot |X|)$
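A minimal Python sketch of the baseline computation on the toy example follows. The function name `pkb_classify` and the list-of-pairs layout for the support vectors are assumptions made for illustration, not details from the talk.

```python
def pkb_classify(support_vectors, x, d=3):
    """f(X) = sum_j alpha_j * (|X_j ∩ X| + 1)^d, visiting every support vector."""
    return sum(alpha * (len(xj & x) + 1) ** d for alpha, xj in support_vectors)


# Toy example: (alpha_j, X_j) pairs.
support_vectors = [
    (1.0, {"a", "b", "c"}),
    (0.5, {"a", "b", "d"}),
    (-2.0, {"b", "c", "d"}),
]

score = pkb_classify(support_vectors, {"a", "c", "e"})
print(score)  # 15.0
```

Every support vector is intersected with the test example, which is where the cost proportional to L comes from.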
PKI (Inverted Representation)
Support vectors: α_1 = 1, X_1 = {a, b, c}; α_2 = 0.5, X_2 = {a, b, d}; α_3 = -2, X_3 = {b, c, d}
Kernel: $K(X, X') = (|X \cap X'| + 1)^3$
Inverted index (feature → ids of the support vectors that contain it): a → {1, 2}; b → {1, 2, 3}; c → {1, 3}; d → {2, 3}
Test example: X = {a, c, e}
$$f(X) = 1 \cdot (2+1)^3 + 0.5 \cdot (1+1)^3 - 2 \cdot (1+1)^3 = 15$$
Average complexity is $O(B \cdot |X| + L)$, where B is the average size of an inverted-index entry
Efficient if the feature space is sparse
Suitable for many NL tasks
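The inverted representation can be sketched in Python as below. This is an illustration under stated assumptions: the names `build_inverted_index` and `pki_classify` are hypothetical, and support vectors are numbered from 0 here rather than from 1 as on the slide.

```python
from collections import defaultdict


def build_inverted_index(support_sets):
    """Map each feature to the ids of the support vectors that contain it."""
    index = defaultdict(list)
    for j, xj in enumerate(support_sets):
        for feature in xj:
            index[feature].append(j)
    return index


def pki_classify(alphas, index, x, d=3):
    """Accumulate |X_j ∩ X| through the index, then apply the kernel once per SV."""
    overlap = [0] * len(alphas)
    for feature in x:                      # |X| lookups ...
        for j in index.get(feature, ()):   # ... each touching about B ids
            overlap[j] += 1
    # The final pass over all L support vectors is the "+ L" term.
    return sum(a * (n + 1) ** d for a, n in zip(alphas, overlap))


support_sets = [{"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}]
alphas = [1.0, 0.5, -2.0]
index = build_inverted_index(support_sets)

score = pki_classify(alphas, index, {"a", "c", "e"})
print(score)  # 15.0
```

Features of the test example that occur in no support vector, such as e, cost a single failed lookup, which is why sparsity helps.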
PKE (Expanded Representation)
$$f(X) = \sum_{i=1}^{L} \alpha_i K(X_i, X) = \sum_{i=1}^{L} \alpha_i\, \phi(X_i) \cdot \phi(X) = \left( \sum_{i=1}^{L} \alpha_i\, \phi(X_i) \right) \cdot \phi(X) = w \cdot \phi(X)$$
Convert into linear form by calculating the vector $w$
$\phi(X)$ projects $X$ into its subset space
PKE (Expanded Representation)
Kernel: $K(X, X') = (|X \cap X'| + 1)^3$, with $c_3(0) = 1$, $c_3(1) = 7$, $c_3(2) = 12$, $c_3(3) = 6$
Support vectors: α_1 = 1, X_1 = {a, b, c}; α_2 = 0.5, X_2 = {a, b, d}; α_3 = -2, X_3 = {b, c, d}
Expansion table W (for example, w({b,d}) = 12 · (0.5 - 2) = -18):

| s | c | w |
|---|---|---|
| ∅ | 1 | -0.5 |
| {a} | 7 | 10.5 |
| {b} | 7 | -3.5 |
| {c} | 7 | -7 |
| {d} | 7 | -10.5 |
| {a,b} | 12 | 18 |
| {a,c} | 12 | 12 |
| {a,d} | 12 | 6 |
| {b,c} | 12 | -12 |
| {b,d} | 12 | -18 |
| {c,d} | 12 | -24 |
| {a,b,c} | 6 | 6 |
| {a,b,d} | 6 | 3 |
| {a,c,d} | 6 | 0 |
| {b,c,d} | 6 | -12 |

Test example: X = {a, c, e}, with subsets {∅, {a}, {c}, {e}, {a,c}, {a,e}, {c,e}, {a,c,e}}
f(X) = -0.5 + 10.5 - 7 + 12 = 15
Complexity is $O(|X|^d)$, independent of the number of SVs (L)
Efficient if the number of SVs is large
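The expansion table for the toy example can be built and queried with a short Python sketch. The names `subsets_up_to`, `build_expansion_table`, and `pke_classify` are illustrative assumptions; the table is built exhaustively here, which is only feasible because the example is tiny.

```python
from collections import defaultdict
from itertools import combinations

C3 = {0: 1, 1: 7, 2: 12, 3: 6}  # subset weights c_3(r) for the cubic kernel


def subsets_up_to(x, d):
    """Yield every subset of x with at most d elements, as a frozenset."""
    items = sorted(x)
    for r in range(d + 1):
        for s in combinations(items, r):
            yield frozenset(s)


def build_expansion_table(support_vectors, d=3, weights=C3):
    """w(s) = c_d(|s|) * (sum of alpha_j over support vectors containing s)."""
    table = defaultdict(float)
    for alpha, xj in support_vectors:
        for s in subsets_up_to(xj, d):
            table[s] += alpha * weights[len(s)]
    return dict(table)


def pke_classify(table, x, d=3):
    """f(X) = sum of w(s) over the subsets of X; no support vector is visited."""
    return sum(table.get(s, 0.0) for s in subsets_up_to(x, d))


support_vectors = [
    (1.0, {"a", "b", "c"}),
    (0.5, {"a", "b", "d"}),
    (-2.0, {"b", "c", "d"}),
]
table = build_expansion_table(support_vectors)

print(table[frozenset({"b", "d"})])           # -18.0
print(pke_classify(table, {"a", "c", "e"}))   # 15.0
```

Subsets contained in no support vector, such as {a, c, d}, never enter the dictionary; their weight is 0, so lookups fall back to 0.0.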
PKE in Practice
Hard to calculate the Expansion Table exactly
Use an Approximated Expansion Table
Subsets with small |w| can be removed, since |w| represents a subset's contribution to the final classification
Use a subset mining (a.k.a. basket mining) algorithm for efficient calculation
Subset Mining Problem
Transaction database:

| id | set |
|---|---|
| 1 | { a c d } |
| 2 | { a b c } |
| 3 | { a b d } |
| 4 | { b c e } |

Results (σ = 2): {a}: 3, {b}: 3, {c}: 3, {d}: 2, {a b}: 2, {b c}: 2, {a c}: 2, {a d}: 2
Extract all subsets that occur in no less than σ sets of the transaction database
σ = 1 and no size constraints → NP-hard
Efficient algorithms have been proposed (e.g., Apriori, PrefixSpan)
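To make the problem statement concrete, here is a small level-wise search in the spirit of Apriori, written in Python. It is a simplified sketch for the four-transaction example, not the implementation used in the talk; the function name `frequent_subsets` and its parameters are assumptions.

```python
from itertools import combinations


def frequent_subsets(transactions, min_support):
    """Return {subset: count} for subsets occurring in >= min_support transactions."""
    def support(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = sorted({item for t in transactions for item in t})
    level = {frozenset([i]) for i in items}
    result = {}
    while level:
        counted = {c: support(c) for c in level}
        frequent = {c: n for c, n in counted.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-subsets into (k+1)-candidates and keep a candidate
        # only if all of its k-subsets are frequent: a subset can never occur
        # more often than any of its own subsets.
        size = len(next(iter(level))) + 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        }
    return result


transactions = [{"a", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "e"}]
result = frequent_subsets(transactions, min_support=2)
for subset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subset), count)
```

With a minimum support of 2 the search stops after the third level, because the only surviving 3-element candidate, {a, b, c}, occurs in a single transaction.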
Feature Selection as Mining
Support vectors: α_1 = 1, X_1 = {a, b, c}; α_2 = 0.5, X_2 = {a, b, d}; α_3 = -2, X_3 = {b, c, d}
Exhaustive generation and testing (build the full table, then keep the subsets with |w| ≥ σ) → Impractical!
Full table (s: w): ∅: -0.5, {a}: 10.5, {b}: -3.5, {c}: -7, {d}: -10.5, {a,b}: 18, {a,c}: 12, {a,d}: 6, {b,c}: -12, {b,d}: -18, {c,d}: -24, {a,b,c}: 6, {a,b,d}: 3, {a,c,d}: 0, {b,c,d}: -12
Direct generation with subset mining (σ = 10):
Approximated table W (s: w): {a}: 10.5, {d}: -10.5, {a,b}: 18, {a,c}: 12, {b,c}: -12, {b,d}: -18, {c,d}: -24, {b,c,d}: -12
• Can efficiently build the approximated table
• σ controls the rate of approximation
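The effect of the threshold σ can be illustrated in Python. Note the limits of this sketch: it uses the exhaustive generate-and-test route that the slide calls impractical, purely to show which subsets survive on the toy data, and the names `expansion_table` and `approximate_table` are hypothetical. Direct generation with a mining algorithm would avoid building the full table.

```python
from collections import defaultdict
from itertools import combinations

C3 = {0: 1, 1: 7, 2: 12, 3: 6}  # subset weights c_3(r) for the cubic kernel


def expansion_table(support_vectors):
    """Full cubic-kernel table: w(s) = c_3(|s|) * sum of alpha_j with s ⊆ X_j."""
    table = defaultdict(float)
    for alpha, xj in support_vectors:
        for r in range(4):
            for s in combinations(sorted(xj), r):
                table[frozenset(s)] += alpha * C3[r]
    return dict(table)


def approximate_table(table, sigma):
    """Keep only the subsets whose weight magnitude reaches the threshold."""
    return {s: w for s, w in table.items() if abs(w) >= sigma}


support_vectors = [
    (1.0, {"a", "b", "c"}),
    (0.5, {"a", "b", "d"}),
    (-2.0, {"b", "c", "d"}),
]
table = expansion_table(support_vectors)
approx = approximate_table(table, sigma=10)

print(len(table), len(approx))  # 14 8
for s, w in sorted(approx.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), w)
```

The full dictionary has 14 entries rather than 15 because {a, c, d} is contained in no support vector and so is never generated. Raising σ shrinks the table further at the cost of a coarser approximation of f(X).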
Experimental Settings
Three NL tasks:
  English Base-NP Chunking (EBC)
  Japanese Word Segmentation (JWS)
  Japanese Dependency Parsing (JDP)
Kernel settings:
  Quadratic kernel is applied to EBC
  Cubic kernel is applied to JWS and JDP
Results (English Base-NP Chunking)

| Method | Time (sec./sent.) | Speedup ratio | F-score |
|---|---|---|---|
| PKB | .164 | 1.0 | 93.84 |
| PKI | .020 | 8.3 | 93.84 |
| PKE (σ=.01) | .0016 | 105.2 | 93.79 |
| PKE (σ=.005) | .0016 | 101.3 | 93.85 |
| PKE (σ=.001) | .0017 | 97.7 | 93.84 |
| PKE (σ=.0005) | .0017 | 96.8 | 93.84 |

Results (Japanese Word Segmentation)

| Method | Time (sec./sent.) | Speedup ratio | Accuracy (%) |
|---|---|---|---|
| PKB | .85 | 1.0 | 97.94 |
| PKI | .49 | 1.7 | 97.94 |
| PKE (σ=.01) | .0024 | 358.2 | 97.93 |
| PKE (σ=.005) | .0028 | 300.1 | 97.95 |
| PKE (σ=.001) | .0034 | 242.6 | 97.94 |
| PKE (σ=.0005) | .0035 | 238.8 | 97.94 |

Results (Japanese Dependency Parsing)

| Method | Time (sec./sent.) | Speedup ratio | Accuracy (%) |
|---|---|---|---|
| PKB | .285 | 1.0 | 89.29 |
| PKI | .0226 | 12.6 | 89.29 |
| PKE (σ=.01) | .0042 | 66.8 | 88.91 |
| PKE (σ=.005) | .0060 | 47.8 | 89.05 |
| PKE (σ=.001) | .0086 | 33.3 | 89.26 |
| PKE (σ=.0005) | .0090 | 31.8 | 89.29 |

Results
2- to 12-fold speedup with PKI
30- to 300-fold speedup with PKE
Accuracy is preserved when an appropriate σ is set
Comparison with related work
XQK [Isozaki et al. 02]:
  Same concept as PKE
  Designed only for the quadratic kernel
  Exhaustively creates the expansion table
PKE:
  Designed for general polynomial kernels
  Uses subset mining algorithms to create the expansion table
Conclusions
Proposed two fast methods for the polynomial kernel of degree d:
  PKI (Inverted)
  PKE (Expanded)
2- to 12-fold speedup with PKI, 30- to 300-fold speedup with PKE
Accuracy is preserved
Future Work
Examine the effectiveness on general machine learning datasets
Apply PKE to other convolution kernels
  Tree Kernel [Collins 00]
    Dot product between trees
    Feature space is all sub-trees
    Apply a sub-tree mining algorithm [Zaki 02]
English Base-NP Chunking
Extract non-overlapping noun phrases from text
[NP He ] reckons [NP the current account deficit ] will narrow to [NP only # 1.8 billion ] in [NP September ] .
BIO representation (treating chunking as a tagging task):
  B: beginning of a chunk
  I: non-initial word of a chunk
  O: outside any chunk
Pairwise method applied to the 3-class problem
Training: wsj15-18, test: wsj20 (standard set)
32
Japanese Word Segmentation
太 郎 は 花 子 に 本 を 読 ま せ た ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
Sentence:Boundaries:
},,,,,{ 321,12 iiiiiii ccccccX
Distinguish the relative position Use also the character types of Japanese Training: KUC 01-08, Test: KUC 09
If there is a boundary between and i 1i1iY , otherwise 1iY
Taro made Hanako read a book
Japanese Dependency Parsing
私は ケーキを 食べる (I-top cake-acc. eat: "I eat a cake")
Identify the correct dependency relations between bunsetsu (roughly, base phrases in English)
Linguistic features related to the modifier and head (word, POS, POS-subcategory, inflections, punctuation, etc.)
Binary classification (+1 dependent, -1 independent)
Cascaded Chunking Model [Kudo et al. 02]
Training: KUC 01-08, Test: KUC 09
Kernel Methods (1/2)
Suppose a learning task $g: X \to \{+1, -1\}$ with training examples $T = \{X_1, \ldots, X_L\}$
$$g(X) = \mathrm{sgn}(f(X)), \quad f(X) = \sum_{i=1}^{L} \alpha_i\, \phi(X_i) \cdot \phi(X)$$
$X$: example to be classified
$X_i$: training examples
$\alpha_i$: weight for examples
$\phi$: a function that maps examples to another vector space
PKE (Expanded Representation)
$$f(X) = \sum_{i=1}^{L} \alpha_i \sum_{r=0}^{d} c_d(r) \cdot |P_r(X_i \cap X)|$$
If we calculate
$$w(s) = \sum_{i=1}^{L} \alpha_i\, c_d(|s|)\, I\big(s \in P_{|s|}(X_i)\big)$$
in advance for all subsets $s \in \Gamma_d(F) = \bigcup_{r=0}^{d} P_r(F)$ ($I$ is the indicator function), then
$$f(X) = \sum_{s \in \Gamma_d(X)} w(s), \quad \Gamma_d(X) = \bigcup_{r=0}^{d} P_r(X)$$
36
TRIE representation
{a}{d}{a,b}{a,c}{b,c}{b,d}{c,d}{b,c,d}
10.5-10.5 12 12 -12-18-24-12
w
a db
b c c d
c
d
d
root
10.5
12 12
-10.5
-24-18-12
-12
Compress redundant structures Classification can be done by simply
traversing the TRIE
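A dictionary-based trie is enough to sketch how classification by traversal might work. Everything about the layout is an assumption for illustration: the node format (`"children"` and `"weight"` keys), the function names `trie_insert` and `trie_score`, and the choice to store each subset along the path of its alphabetically sorted features. The weight of {a, b} is c_3(2) · (α_1 + α_2) = 12 · 1.5 = 18.

```python
def trie_insert(root, subset, weight):
    """Store a subset along the path of its sorted features."""
    node = root
    for feature in sorted(subset):
        node = node.setdefault("children", {}).setdefault(feature, {})
    node["weight"] = weight


def trie_score(node, features, start=0):
    """Sum the weights of all stored subsets contained in the sorted feature list."""
    total = node.get("weight", 0.0)
    children = node.get("children", {})
    for i in range(start, len(features)):
        child = children.get(features[i])
        if child is not None:
            # Only later features may extend this path, so each stored subset
            # is reached at most once.
            total += trie_score(child, features, i + 1)
    return total


# Approximated expansion table of the toy example (sigma = 10).
approx_table = {
    ("a",): 10.5, ("d",): -10.5,
    ("a", "b"): 18.0, ("a", "c"): 12.0,
    ("b", "c"): -12.0, ("b", "d"): -18.0, ("c", "d"): -24.0,
    ("b", "c", "d"): -12.0,
}
root = {}
for subset, weight in approx_table.items():
    trie_insert(root, subset, weight)

print(trie_score(root, sorted({"a", "c", "e"})))  # 22.5
```

For the test example {a, c, e} only the paths a and a → c carry weights, giving 10.5 + 12 = 22.5. The exact table gives 15, so the approximation changes the value but not the sign in this case.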