[Page 1]
Text Classification Using String Kernels
Presented by Dibyendu Nath & Divya Sambasivan
CS 290D : Spring 2014
Huma Lodhi, Craig Saunders, et al., Department of Computer Science, Royal Holloway, University of London
[Page 2]
Intro: Text Classification
• Task of assigning a document to one or more categories.
• Done manually (library science) or algorithmically (information science, data mining, machine learning).
• Learning systems (e.g. neural networks or decision trees) work on feature vectors transformed from the input space.
• Text documents cannot readily be described by explicit feature vectors.
[Page 3]
Problem Definition
• Input: a corpus of documents.
• Output: a kernel representing the documents. This kernel can then be used to classify, cluster, etc. with existing algorithms that work on kernels, e.g. SVM, perceptron.
• Methodology: find a mapping and a kernel function so that any of the standard kernel methods for classification, clustering, etc. can be applied to the corpus of documents.
[Page 4]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 5]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 6]
Motivation
• Text documents cannot readily be described by explicit feature vectors.
• Feature Extraction - requires extensive domain knowledge; possible loss of important information.
• Kernel Methods – an alternative to explicit feature extraction
[Page 7]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 8]
The Kernel Trick
• Map data into feature space via a mapping φ.
• The mapping is accessed implicitly via a kernel function.
• Construct a linear function in the feature space.
slide from Huma Lodhi
[Page 9]
Kernel Function
slide from Huma Lodhi
Kernel function: a measure of similarity that returns the inner product between mapped data points:
K(xᵢ, xⱼ) = ⟨Φ(xᵢ), Φ(xⱼ)⟩
[Page 10]
Kernels for Sequences
• Word Kernel [WK]: bag of words; a word is a sequence of characters followed by punctuation or a space.
• n-Grams Kernel [NGK]: all contiguous substrings of n characters. Example, the 3-grams of "quick brown": qui, uic, ick, ck_, k_b, _br, bro, row, own
• String Subsequence Kernel [SSK]: all (non-contiguous) subsequences of n symbols.
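The contiguous n-gram features above can be enumerated with a sliding window; a minimal sketch (the function name and the `_`-for-space convention are just for illustration):

```python
def ngrams(text, n):
    """All contiguous character n-grams of a string (spaces shown as '_')."""
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

# 3-grams of "quick brown" -> qui, uic, ick, ck_, k_b, _br, bro, row, own
```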
[Page 11]
Word Kernels
• Documents are mapped to a very high-dimensional space, where the dimensionality of the feature space equals the number of unique words in the corpus.
• Each entry of the vector represents the occurrence or non-occurrence of a word.
• Kernel: the inner product between mapped sequences gives a sum over all common (weighted) words.

|       | fish | tank | sea |
|-------|------|------|-----|
| Doc 1 | 2    | 0    | 1   |
| Doc 2 | 1    | 1    | 0   |
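The word-kernel inner product can be sketched as follows, a minimal bag-of-words version that uses raw term counts as weights (`word_kernel` is a hypothetical name):

```python
from collections import Counter

def word_kernel(doc1, doc2):
    """Bag-of-words kernel: inner product of the term-count vectors."""
    c1, c2 = Counter(doc1.split()), Counter(doc2.split())
    return sum(c1[w] * c2[w] for w in c1 if w in c2)
```

For the table above, K(Doc 1, Doc 2) = 2·1 + 0·1 + 1·0 = 2.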
[Page 12]
String Subsequence Kernels: Basic Idea
Non-contiguous substrings, e.g. the subsequence "c-a-r":
• in "card": length of the occurrence = 3
• in "custard": length of the occurrence = 6
The more subsequences (of length n) two strings have in common, the more similar they are considered.
Decay factor: substrings are weighted according to their degree of contiguity in a string by a decay factor λ ∈ (0,1).
[Page 13]
Example (n = 2)
Documents we want to compare: "car" and "cat"

|     | c-a | c-t | a-t | c-r | a-r |
|-----|-----|-----|-----|-----|-----|
| car | λ²  | 0   | 0   | λ³  | λ²  |
| cat | λ²  | λ³  | λ²  | 0   | 0   |

K(car, car) = 2λ⁴ + λ⁶
K(cat, cat) = 2λ⁴ + λ⁶
K(car, cat) = λ⁴
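The table above can be reproduced by brute-force enumeration of all length-n subsequences, weighting each occurrence by λ raised to its span. This is exponential in |s| and for illustration only; `ssk_naive` and the `lam` parameter are hypothetical names:

```python
from itertools import combinations

def ssk_naive(s, t, n, lam):
    """String subsequence kernel by brute force: for every length-n index
    tuple, weight the occurrence by lam**(i_n - i_1 + 1), then take the
    inner product of the two feature maps."""
    def phi(x):
        feats = {}
        for idx in combinations(range(len(x)), n):
            u = "".join(x[i] for i in idx)
            span = idx[-1] - idx[0] + 1  # degree of (non-)contiguity
            feats[u] = feats.get(u, 0.0) + lam ** span
        return feats
    fs, ft = phi(s), phi(t)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)
```

With λ = 0.5 this reproduces K(car, cat) = λ⁴ and K(car, car) = 2λ⁴ + λ⁶.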
[Page 14]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 15]
Algorithm Definitions
• Alphabet: let Σ be the finite alphabet.
• String: a string s is a finite sequence of characters from the alphabet, with length |s|.
• Subsequence: a vector of indices i = [i₁, …, iₙ], sorted in ascending order, in a string s such that they pick out the letters of a sequence.
Eg: the subsequence "car" in "lancasters" has indices [4,5,9]; length of the subsequence l(i) = iₙ − i₁ + 1 = 9 − 4 + 1 = 6.
[Page 16]
Algorithm Definitions
• Feature space: indexed by all strings u ∈ Σⁿ.
• Feature mapping: the feature mapping φ for a string s is given by defining the u coordinate φᵤ(s) for each u ∈ Σⁿ:
φᵤ(s) = Σ_{i: u = s[i]} λ^{l(i)}
These features measure the number of occurrences of subsequences in the string s, weighting them according to their lengths.
[Page 17]
String Kernel
• The inner product between two mapped strings is a sum over all the common weighted subsequences:
Kₙ(s, t) = Σ_{u∈Σⁿ} φᵤ(s)·φᵤ(t) = Σ_{u∈Σⁿ} Σ_{i: u=s[i]} Σ_{j: u=t[j]} λ^{l(i)+l(j)}
For "car" and "cat" (n = 2; features c-a, c-t, a-t, c-r, a-r):
car: (λ², 0, 0, λ³, λ²); cat: (λ², λ³, λ², 0, 0)
K(car, cat) = λ⁴
[Page 18]
Intermediate Kernel K′
Instead of counting from the first to the last index of the subsequence, count the length from the beginning of the subsequence through to the end of the strings s and t.

|     | c-a | c-t | a-t | c-r | a-r |
|-----|-----|-----|-----|-----|-----|
| car | λ³  | 0   | 0   | λ³  | λ²  |
| cat | λ³  | λ³  | λ²  | 0   | 0   |
[Page 19]
Recursive Computation (Lodhi et al.)
• K′₀(s, t) = 1, for all s, t (null substring)
• Kᵢ(s, t) = K′ᵢ(s, t) = 0, if min(|s|, |t|) < i (target string is shorter than search substring)
• K′ᵢ(sx, t) = λ·K′ᵢ(s, t) + Σ_{j: t[j]=x} K′ᵢ₋₁(s, t[1..j−1])·λ^{|t|−j+2}, for i = 1, …, n−1
• Kₙ(sx, t) = Kₙ(s, t) + Σ_{j: t[j]=x} K′ₙ₋₁(s, t[1..j−1])·λ²
[Page 20]
K′ (n = 2) for s = car, t = cat:

|     | c-a | c-t | a-t | c-r | a-r |
|-----|-----|-----|-----|-----|-----|
| car | λ³  | 0   | 0   | λ³  | λ²  |
| cat | λ³  | λ³  | λ²  | 0   | 0   |

K′(car, cat) = λ⁶

Appending x = t to s (sx = cart), against t = cat:

|      | c-a | c-t | a-t | c-r | a-r |
|------|-----|-----|-----|-----|-----|
| cart | λ⁴  | λ⁴  | λ³  | λ⁴  | λ³  |
| cat  | λ³  | λ³  | λ²  | 0   | 0   |

K′(cart, cat) = λ⁷ + λ⁷ + λ⁵
[Page 21]
K (n = 2) for s = car, t = cat:

|     | c-a | c-t | a-t | c-r | a-r |
|-----|-----|-----|-----|-----|-----|
| car | λ²  | 0   | 0   | λ³  | λ²  |
| cat | λ²  | λ³  | λ²  | 0   | 0   |

K(car, cat) = λ⁴

For sx = cart, t = cat:

|      | c-a | c-t | a-t | c-r | a-r |
|------|-----|-----|-----|-----|-----|
| cart | λ²  | λ⁴  | λ³  | λ³  | λ²  |
| cat  | λ²  | λ³  | λ²  | 0   | 0   |

K(cart, cat) = λ⁴ + λ⁷ + λ⁵
[Page 22]
Recursive Computation (recap)
• Base cases: K′₀(s, t) = 1 (null substring); K′ᵢ(s, t) = Kᵢ(s, t) = 0 if the target string is shorter than the search substring.
• Naive recursion: O(n·|s|·|t|²). Dynamic programming: O(n·|s|·|t|).
[Page 23]
Efficiency
• Direct computation over all subsequences of length n: O(|Σ|ⁿ)
• Recursive computation: O(n·|s|·|t|²)
• Dynamic programming: O(n·|s|·|t|)
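The O(n·|s|·|t|) dynamic program from the paper can be sketched as follows; it builds a table of the intermediate kernel K′ᵢ over string prefixes, using a running accumulator for the inner sum (`ssk_dp` is a hypothetical name):

```python
def ssk_dp(s, t, n, lam):
    """SSK via dynamic programming, O(n * |s| * |t|).
    Kp[i][a][b] holds K'_i(s[:a], t[:b])."""
    S, T = len(s), len(t)
    Kp = [[[0.0] * (T + 1) for _ in range(S + 1)] for _ in range(n)]
    for a in range(S + 1):
        for b in range(T + 1):
            Kp[0][a][b] = 1.0  # K'_0 = 1 for all prefixes
    for i in range(1, n):
        for a in range(1, S + 1):
            acc = 0.0  # accumulates the inner sum over matching positions
            for b in range(1, T + 1):
                acc = lam * acc
                if s[a - 1] == t[b - 1]:
                    acc += lam ** 2 * Kp[i - 1][a - 1][b - 1]
                Kp[i][a][b] = lam * Kp[i][a - 1][b] + acc
    # Final kernel: sum lam^2 * K'_{n-1} over all matching character pairs
    k = 0.0
    for a in range(1, S + 1):
        for b in range(1, T + 1):
            if s[a - 1] == t[b - 1]:
                k += lam ** 2 * Kp[n - 1][a - 1][b - 1]
    return k
```

This reproduces the worked examples: K(car, cat) = λ⁴ and K(cart, cat) = λ⁴ + λ⁷ + λ⁵.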
[Page 24]
Kernel Normalization
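The kernel is normalized to remove the bias introduced by document length, K̂(s,t) = K(s,t) / √(K(s,s)·K(t,t)). A minimal sketch, with `normalize` as a hypothetical helper over any kernel function:

```python
import math

def normalize(kernel, s, t):
    """Length-normalized kernel: K_hat(s,t) = K(s,t) / sqrt(K(s,s) * K(t,t)).
    `kernel` is any symmetric kernel function of two arguments."""
    return kernel(s, t) / math.sqrt(kernel(s, s) * kernel(t, t))
```

The normalized value lies in [-1, 1] and equals 1 when the two feature vectors point in the same direction, regardless of their lengths.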
[Page 25]
Setting Algorithm Parameters
[Page 26]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 27]
Kernel Approximation
Suppose we have some training points (xᵢ, yᵢ) ∈ X × Y, and some kernel function K(x, z) corresponding to a feature-space mapping φ : X → F such that K(x, z) = ⟨φ(x), φ(z)⟩.
Consider a set S of vectors S = {sᵢ ∈ X}.
If the cardinality of S is equal to the dimensionality of the space F and the vectors φ(sᵢ) are orthogonal (i.e. K(sᵢ, sⱼ) = C·δᵢⱼ), then the following is true:
K(x, z) = (1/C) Σᵢ K(x, sᵢ)·K(sᵢ, z)
[Page 28]
Kernel Approximation
If, instead of forming a complete orthogonal basis, the cardinality of a subset S_Q ⊆ S is less than the dimensionality of F, or the vectors sᵢ are not fully orthogonal, then we can construct an approximation to the kernel K:
K̂(x, z) = (1/C) Σ_{sᵢ∈S_Q} K(x, sᵢ)·K(sᵢ, z)
If the set S_Q is carefully constructed, a Gram matrix closely aligned with the true Gram matrix can be produced at a fraction of the computational cost.
Problem: choose the set S_Q so that the vectors φ(sᵢ) are orthogonal.
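The approximation above can be sketched directly: project every document onto the basis set, then take inner products of the projections (names hypothetical; with an exactly orthogonal basis and the right C this reproduces the true Gram matrix):

```python
def approx_gram(K, xs, basis, C):
    """Approximate Gram matrix via a basis set S_Q:
    K_hat(x, z) = (1/C) * sum_i K(x, s_i) * K(s_i, z)."""
    P = [[K(x, s) for s in basis] for x in xs]  # projections onto the basis
    n, q = len(xs), len(basis)
    return [[sum(P[a][i] * P[b][i] for i in range(q)) / C
             for b in range(n)] for a in range(n)]
```

Only len(xs) · |S_Q| kernel evaluations are needed, instead of one per pair of documents.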
[Page 29]
Selecting a Feature Subset
The heuristic for obtaining the set S_Q is as follows:
1. Choose a substring size n.
2. Enumerate all possible contiguous strings of length n.
3. Choose the x strings of length n which occur most frequently in the dataset; these form the set S_Q.
By definition, all such strings of length n are orthogonal (i.e. K(sᵢ, sⱼ) = C·δᵢⱼ for some constant C) when used in conjunction with the string kernel of degree n.
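The three-step heuristic above can be sketched as a frequency count over contiguous length-n windows (`top_ngrams` and the parameter names are hypothetical):

```python
from collections import Counter

def top_ngrams(docs, n, x):
    """Pick the x most frequent contiguous length-n strings in the corpus
    to serve as the basis set S_Q."""
    counts = Counter()
    for d in docs:
        counts.update(d[i:i + n] for i in range(len(d) - n + 1))
    return [g for g, _ in counts.most_common(x)]
```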
[Page 30]
Kernel Approximation Results
[Page 31]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 32]
Evaluation
Dataset: Reuters-21578, ModApte split.
Precision = relevant documents categorized as relevant / total documents categorized as relevant
Recall = relevant documents categorized as relevant / total relevant documents
F1 = 2 · Precision · Recall / (Precision + Recall)
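Given counts of true positives, false positives and false negatives, the three measures above can be computed directly (`prf1` is a hypothetical helper):

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```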
[Page 33]
Evaluation
[Page 34]
Evaluation
[Page 35]
Evaluation: Effectiveness of Sequence Length
(Plots of performance vs. subsequence length per category; the best-performing length is mostly k = 5, with k = 6 or k = 7 for some categories.)
[Page 36]
Evaluation: Effectiveness of Decay Factor
(Plots of performance vs. decay factor per category; the best-performing values are λ = 0.3, 0.03, 0.05 and 0.03.)
[Page 37]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 38]
Follow Up
• String kernels using sequences of words rather than characters: less computationally demanding, no fixed decay factor, combinations of string kernels.
Cancedda, Nicola, et al. "Word sequence kernels." The Journal of Machine Learning Research 3 (2003): 1059-1082.
• Extracting semantic relations between entities in natural-language text, based on a generalization of subsequence kernels.
Bunescu, Razvan, and Raymond J. Mooney. "Subsequence kernels for relation extraction." NIPS. 2005.
[Page 39]
Follow Up
• Homology: a computational-biology method for identifying the ancestry of proteins. The model should be able to tolerate up to m mismatches. The kernels used in this method measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches.
Leslie, Christina, et al. "Mismatch string kernels for discriminative protein classification." Bioinformatics 20.4 (2004): 467-476.
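The mismatch-counting idea can be sketched naively as follows. Assumption worth flagging: this sketch restricts the feature index to k-mers actually observed in the two strings, rather than all |Σ|ᵏ strings as in the full mismatch kernel, so it can under-count shared neighbourhoods; names are hypothetical:

```python
def mismatch_kernel(s, t, k, m):
    """Naive (k,m)-mismatch kernel sketch: phi_u(x) counts the k-mers of x
    within Hamming distance m of u; the kernel is the inner product of the
    two feature vectors. Index u is restricted to observed k-mers (a
    simplification relative to indexing all of Sigma**k)."""
    def kmers(x):
        return [x[i:i + k] for i in range(len(x) - k + 1)]

    def hamming(a, b):
        return sum(c != d for c, d in zip(a, b))

    universe = set(kmers(s)) | set(kmers(t))

    def phi(x):
        return {u: sum(hamming(u, w) <= m for w in kmers(x)) for u in universe}

    fs, ft = phi(s), phi(t)
    return sum(fs[u] * ft[u] for u in universe)
```

With m = 0 this reduces to counting exactly shared k-mer occurrences.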
[Page 40]
Overview
• Motivation
• Kernel Methods
• Algorithms - with increasingly better efficiency
• Approximation
• Evaluation
• Follow Up
• Conclusion
[Page 41]
Conclusion
Key idea: use non-contiguous string subsequences to compute similarity between documents, with a decay factor that discounts similarity according to the degree of contiguity.
• Highly computationally intensive method: the authors reduced the time complexity from O(|Σ|ⁿ) to O(n·|s|·|t|) with a dynamic-programming approach.
• An even less intensive method: kernel approximation by feature-subset selection.
• k and λ are estimated empirically from experimental results.
• Showed promising results only for small datasets.
• Seems to mimic stemming for small datasets.
[Page 42]
Any Questions? Thank You :)