A Neural Network Approach to Topic Spotting
Presented by: Loulwah AlSumait
INFS 795 Spec. Topics in Data Mining
4.14.2005
Article Information
Published in: Proceedings of SDAIR-95, 4th Annual Symposium on Document Analysis and Information Retrieval, 1995
Authors: Wiener, E., Pedersen, J.O., and Weigend, A.S.
54 citations
Summary
Introduction
Related Work
The Corpus
Representation
- Term Selection
- Latent Semantic Indexing: Generic LSI; Local LSI (Cluster-Directed LSI, Topic-Directed LSI); Relevancy Weighting LSI
Summary
Neural Network Classifier
Neural Networks for Topic Spotting
- Linear vs. Non-Linear Networks
- Flat Architecture vs. Modular Architecture
Experiment Results
- Evaluating Performance
- Results & Discussion
Introduction
Topic Spotting = Text Categorization = Text Classification
Problem of identifying which of a set of predefined topics are present in a natural language document.
[Diagram: a document mapped to Topic 1, Topic 2, ..., Topic n]
Introduction
Classification Approaches
Expert system approach: manually construct a system of inference rules on top of a large body of linguistic and domain knowledge. It can be extremely accurate, but is very time consuming and brittle to changes in the data environment.
Data-driven approach: induce a set of rules from a corpus of labeled training documents. Practically better.
Introduction – Related Work
The major remarks regarding the related work:
- A separate classifier was constructed for each topic.
- A different set of terms was used to train each classifier.
Introduction – The Corpus
Reuters-22173 corpus of Reuters newswire stories from 1987:
- 21,450 stories: 9,610 for training, 3,662 for testing
- mean length 90.6 words (SD 91.6)
- 92 topics appear at least once in the training set; mean 1.24 topics/doc (up to 14 topics for some documents)
- 11,161 unique terms after preprocessing: inflectional stemming, stop-word removal, conversion to lower case, and elimination of words appearing in fewer than three documents
Representations
Starting point: the Document Profile, a term-by-document matrix containing word-frequency entries; each document is represented by a vector d_i of term frequencies.
Representation
[Example: a document represented by normalized term frequencies such as 3/33, 1/33, 1/33, 2/33]
Thorsten Joachims. 1997. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. http://citeseer.ist.psu.edu/joachims97text.html
Representation - Term Selection
Select the subset of the original terms that are most useful for the classification task.
It is difficult to select terms that discriminate between 92 classes while keeping the set small enough to serve as the feature set for a neural network, so:
- Divide the problem into 92 independent classification tasks.
- For each, search for the terms that best discriminate between documents with the topic and those without.
Representation - Term Selection: Relevancy Score
Measures how unbalanced the term is across documents with vs. without the topic.
Highly positive and highly negative scores indicate useful terms for discrimination.
Using about 20 terms yielded the best classification performance.
r_k = log( (w_{tk}/d_t + 1/6) / (w_{t'k}/d_{t'} + 1/6) )
where w_{tk} = number of documents with topic t that contain term k, d_t = total number of documents with topic t, and t' denotes the complement (documents without topic t). The 1/6 terms smooth the ratio when a count is zero.
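The relevancy score above can be sketched in a few lines of Python (a toy illustration, not the authors' code; documents are modeled as sets of terms, and the 1/6 smoothing follows the formula above):

```python
import math

def relevancy_score(docs_with_topic, docs_without_topic, term):
    """r_k: log ratio of the term's (smoothed) document frequency inside
    vs. outside the topic. Documents are modeled as sets of terms."""
    w_t = sum(1 for d in docs_with_topic if term in d)
    w_not = sum(1 for d in docs_without_topic if term in d)
    d_t, d_not = len(docs_with_topic), len(docs_without_topic)
    # the 1/6 smoothing keeps the ratio finite when a count is zero
    return math.log((w_t / d_t + 1 / 6) / (w_not / d_not + 1 / 6))

topic_docs = [{"wheat", "grain"}, {"wheat", "export"}]  # toy docs with the topic
other_docs = [{"gold", "mine"}, {"zinc"}]               # toy docs without it
print(relevancy_score(topic_docs, other_docs, "wheat"))  # log(7) ≈ 1.95, strongly positive
```

A term concentrated in on-topic documents scores highly positive; a term concentrated in off-topic documents scores highly negative, and both are useful discriminators.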
Representation - Term Selection
Representation - Term Selection
Advantages: little computation is required, and the resulting features have direct interpretability.
Drawbacks: many of the best individual predictors contain redundant information, and a term that appears to be a very poor predictor on its own may turn out to have great discriminative power in combination with other terms, and vice versa (e.g., Apple vs. Apple Computers).
Selected Term Representation (TERMS) with 20 features.
Representation – LSI
Transform the original documents into a lower-dimensional space by analyzing the correlational structure of terms in the document collection:
- Training set: apply a singular-value decomposition (SVD) to the original term-by-document matrix to obtain U, Σ, V.
- Test set: transform document vectors by projecting them into the LSI space.
Property of LSI: higher dimensions capture less of the variance of the original data, so they can be dropped with minimal loss. Found: performance continues to improve up to at least 250 dimensions, but the improvement rapidly slows down after about 100 dimensions.
Generic LSI Representation (LSI) with 200 features.
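The train/test steps above can be sketched with NumPy's SVD (a toy matrix and k = 2 dimensions, where the paper keeps 200; the folding-in convention for test documents is one common choice, not necessarily the paper's exact one):

```python
import numpy as np

# hypothetical tiny term-by-document matrix (rows = terms, columns = documents)
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])

k = 2  # LSI dimensions kept (the paper keeps 200)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# training documents in the k-dimensional LSI space: rows of V_k
docs_lsi = Vtk.T  # shape (4 documents, 2 dimensions)

# fold a new (test) document into the same space: d_hat = Sigma_k^{-1} U_k^T d
d_new = np.array([1., 1., 0., 0.])
d_lsi = np.diag(1.0 / sk) @ Uk.T @ d_new
print(d_lsi.shape)  # (2,)
```

Dropping the trailing singular values discards the low-variance directions, which is exactly the "minimal loss" property the slide describes.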
Representation – LSI
[Diagram: a single SVD over the full Reuters corpus (topics such as Wool, Barley, Wheat, Money-supply, Zinc, Gold), yielding the Generic LSI Representation with 200 features]
Representation – Local LSI
Global LSI performs worse as topic frequency decreases: infrequent topics are usually indicated by infrequent terms, and infrequent terms may be projected out of the LSI space and treated as mere noise.
The paper proposes two task-directed methods that make use of prior knowledge of the classification task.
Representation – Local LSI
What is Local LSI?
- Model only the local portion of the corpus related to the target topics, including documents that use terminology related to the topics (even if they do not have any of the topics assigned).
- Perform the SVD over only this local set of documents.
- This makes the representation more sensitive to small, localized effects of infrequent terms, and more effective for classifying topics related to that local structure.
Representation – Local LSI
Types of Local LSI: Cluster-Directed representation
- 5 meta-topics (clusters): Agriculture, Energy, Foreign Exchange, Government, and Metals.
- How to construct the local regions? Break the corpus into 5 clusters, each containing all documents on the corresponding meta-topic, and perform an SVD for each meta-topic region.
- Cluster-Directed LSI Representation (CD/LSI) with 200 features.
Representation – Local LSI
[Diagram: instead of one SVD over the full Reuters corpus, the corpus is broken into the five meta-topic regions (Agriculture, Energy, Foreign Exchange, Government, Metals) and a separate SVD is performed on each, yielding the Cluster-Directed LSI Representation (CD/LSI) with 200 features]
Representation – Local LSI
Types of Local LSI: Topic-Directed representation
- A more fine-grained approach to local LSI: a separate representation for each topic.
- How to construct the local region? Use the 100 most predictive terms for the topic and pick the N most similar documents, where N = 5 × (number of documents containing the topic), with 110 ≤ N ≤ 350. The final documents in the topic region are these N documents plus 150 random documents.
- Topic-Directed LSI Representation (TD/LSI) with 200 features.
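The region-building recipe above can be sketched as follows (an illustrative Python sketch with toy sizes; scoring documents by their mass on the predictive terms is an assumption standing in for the paper's similarity ranking, and the sample sizes here are much smaller than N and the 150 random documents):

```python
import numpy as np

def topic_region(doc_vectors, topic_term_idx, n_local, n_random=3, rng=None):
    """Sketch: rank documents by their mass on the topic's most predictive
    terms (a stand-in for the paper's similarity ranking), keep the top
    n_local, then mix in n_random other documents for contrast."""
    rng = rng if rng is not None else np.random.default_rng(0)
    scores = doc_vectors[:, topic_term_idx].sum(axis=1)
    top = np.argsort(scores)[::-1][:n_local]
    rest = np.setdiff1d(np.arange(len(doc_vectors)), top)
    extra = rng.choice(rest, size=min(n_random, len(rest)), replace=False)
    return np.concatenate([top, extra])

docs = np.random.default_rng(1).random((20, 10))  # 20 toy documents, 10 terms
region = topic_region(docs, topic_term_idx=[0, 1], n_local=5)
print(len(region))  # 8
```

The SVD for that topic would then be computed over `doc_vectors[region]` only.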
Representation – Local LSI
[Diagram: rather than a single SVD over the whole Reuters corpus, a separate SVD is performed on each topic's local region (e.g., Wool, Barley, Wheat, Money-supply, Zinc, Gold), yielding the Topic-Directed LSI Representation (TD/LSI) with 200 features]
Representation – Local LSI
Drawbacks of Local LSI:
- The narrower the region, the lower the flexibility of the representation for modeling the classification of multiple topics.
- High computational overhead.
Representation - Relevancy Weighting LSI
Use term weights to emphasize the importance of particular terms before applying the SVD.
IDF weighting raises the importance of low-frequency terms and lowers the importance of high-frequency terms; it assumes low-frequency terms are better discriminators than high-frequency terms.
Representation - Relevancy Weighting LSI
Relevancy Weighting tunes the IDF assumption: emphasize terms in proportion to their estimated topic-discrimination power.
The Global Relevancy Weighting of term k (GRW_k) is derived from the per-topic relevancy scores:
r_k = log( (w_{tk}/d_t + 1/6) / (w_{t'k}/d_{t'} + 1/6) )
Final weighting of term k = IDF^2 * GRW_k
All low-frequency terms are pulled up by IDF; poor predictors are pushed down, leaving only relevant low-frequency terms with high weights.
Relevancy Weighted LSI Representation (REL/LSI) with 200 features.
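The final weighting rule (IDF squared, scaled by the global relevancy weight) can be illustrated directly (a minimal sketch; the function names are illustrative, and GRW values are taken as given rather than computed):

```python
import math

def idf(n_docs, doc_freq):
    """Standard inverse document frequency."""
    return math.log(n_docs / doc_freq)

def final_weight(n_docs, doc_freq, grw):
    """Final LSI input weight for a term: IDF^2 scaled by its global
    relevancy weight (GRW), so only relevant low-frequency terms stay high."""
    return idf(n_docs, doc_freq) ** 2 * grw

# two equally rare terms: the one with the high relevancy weight wins
print(final_weight(1000, 5, grw=2.0) > final_weight(1000, 5, grw=0.1))  # True
```

This is the mechanism the slide describes: IDF alone pulls up every rare term, and the GRW factor pushes the irrelevant ones back down.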
Neural Network Classifier (NN)
A NN consists of processing units (neurons) and weighted links connecting the neurons.
Neural Network Classifier (NN)
Major components of a NN model:
- Architecture: defines the functional form relating input to output (network topology, unit connectivity, and activation functions, e.g. the logistic function).
Neural Network Classifier (NN): Logistic Regression Function
p = 1 / (1 + e^{-z})
where z = w_0 + w_1 x_1 + ... + w_n x_n is a linear combination of the input features.
p lies in (0, 1); the unit can be converted to a binary classifier by thresholding the output probability.
[Figure: the sigmoid (logistic) activation function]
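The logistic unit above, written out (a minimal sketch with made-up weights):

```python
import math

def logistic_unit(x, w, w0=0.0):
    """p = 1 / (1 + e^{-z}) with z = w0 + w1*x1 + ... + wn*xn."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

p = logistic_unit([1.0, -2.0], [0.5, 0.25], w0=0.1)
print(0.0 < p < 1.0)  # True: the output is a probability, thresholdable at e.g. 0.5
```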
Neural Network Classifier (NN)
Major components of a NN model (cont.):
- Search algorithm: the search in weight space for a set of weights that minimizes the error between the actual and expected outputs (the training process), e.g. the backpropagation method.
- Error measures: mean squared error, or the cross-entropy error performance function
  C = - sum over all cases and outputs of [ d*log(y) + (1-d)*log(1-y) ]
  where d is the desired output and y the actual output.
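The cross-entropy error function above translates almost line-for-line into code (a small clamp `eps` is added here to guard log(0); otherwise it follows the slide's formula):

```python
import math

def cross_entropy(desired, actual, eps=1e-12):
    """C = - sum over cases/outputs of d*log(y) + (1-d)*log(1-y)."""
    return -sum(d * math.log(y + eps) + (1 - d) * math.log(1 - y + eps)
                for d, y in zip(desired, actual))

good = cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
bad = cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good < bad)  # True: the cost penalizes confident mistakes heavily
```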
NN for Topic Spotting
Network outputs are estimates of the probability of topic presence given the feature vector of a document.
- Generic LSI representation: each network uses the same representation.
- Local LSI representation: a different representation for each network.
NN for Topic Spotting
Linear NN: output units with logistic activation and no hidden layer.
[Diagram: input features feeding directly into the output units 1 through n]
NN for Topic Spotting
Non-Linear NN: simple networks with a single hidden layer of logistic sigmoid units (6 to 15 units).
NN for Topic Spotting: Flat Architecture
- A separate network for each topic; the entire training set is used to train each topic's network.
- Overfitting is avoided by adding a penalty term to the cross-entropy cost function to encourage the elimination of small weights, and by early stopping based on cross-validation.
NN for Topic Spotting: Modular Architecture
- Decompose the learning problem into smaller problems.
- Meta-Topic Network: trained on the full training set to estimate the presence probability of the five meta-topics in a document; uses 15 hidden units.
NN for Topic Spotting: Modular Architecture
- Five groups of local topic networks: one local topic network for each topic within a meta-topic, each trained only on the meta-topic region.
NN for Topic Spotting: Modular Architecture
- Five groups of local topic networks (cont.). Example: the wheat network is trained on the Agriculture meta-topic region, so it can focus on finer distinctions (e.g., wheat vs. grain) and not waste time on easier distinctions (e.g., wheat vs. gold).
- Each local topic network uses 6 hidden units.
NN for Topic Spotting: Modular Architecture
To compute topic predictions for a given document:
- Present the document to the meta-topic network.
- Present the document to each of the local topic networks.
- Final topic estimates = outputs of the meta-topic network multiplied by the estimates of the topic networks.
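The combination step can be sketched as follows (topic and meta-topic names are illustrative, and reading the slide's combination rule as a product of the meta-topic and local probabilities is an assumption):

```python
def modular_estimate(meta_probs, local_probs, meta_of_topic):
    """Final topic estimate = P(meta-topic present) * P(topic | local net).
    The topic/meta-topic names used below are illustrative."""
    return {topic: meta_probs[meta_of_topic[topic]] * p
            for topic, p in local_probs.items()}

meta = {"agriculture": 0.8, "metals": 0.1}           # meta-topic network outputs
local = {"wheat": 0.9, "gold": 0.7}                  # local topic network outputs
belongs = {"wheat": "agriculture", "gold": "metals"}
est = modular_estimate(meta, local, belongs)
print(est["wheat"] > est["gold"])  # True: wheat is backed by a confident meta-topic
```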
Experimental Results
Evaluating Performance
- Mean squared error between actual and predicted values is an inadequate measure.
- Instead, compute precision and recall from contingency tables constructed over a range of decision thresholds.
- How to get the decision thresholds?
Experimental Results
Evaluating Performance: How to get the decision thresholds?
Proportional assignment:
- Predicted Topic = ‘wool’ iff the document's output probability ≥ the output probability of the kp-th highest-ranked document, where k is an integer and p is the prior probability of the ‘wool’ topic.
- Predicted Topic ≠ ‘wool’ iff the output probability is below that threshold.
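Proportional assignment can be sketched as follows (how the kp-th rank is rounded to an integer is an assumption here, taken as round(k × p × n_docs); the slide leaves it unspecified):

```python
def proportional_assign(probs, k, prior):
    """Sketch of proportional assignment: flag the top-ranked documents,
    with the cutoff rank proportional to the topic's prior probability
    (the rounding rule is assumed; the slide leaves it unspecified)."""
    n_pos = max(1, round(k * prior * len(probs)))
    threshold = sorted(probs, reverse=True)[n_pos - 1]
    return [p >= threshold for p in probs]

probs = [0.9, 0.2, 0.8, 0.1, 0.05]  # hypothetical per-document outputs
print(proportional_assign(probs, k=2, prior=0.2))  # top 2 documents flagged
```

Varying k sweeps the threshold, which is what produces the range of contingency tables mentioned above.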
Experimental Results
Evaluating Performance: How to get the decision thresholds?
Fixed recall-level approach:
- Determine a set of recall levels.
- Analyze the ranked documents to determine which decision thresholds lead to the desired set of recall levels.
- Predicted Topic = ‘wool’ iff the output probability ≥ that of the document at which the number of higher-probability documents yields the target recall level; Predicted Topic ≠ ‘wool’ otherwise.
Experimental Results: Performance by Microaveraging
- Add all the contingency tables together across topics at a given threshold, then compute precision and recall.
- Proportional assignment was used for picking the decision thresholds.
- Does not weight the topics evenly; used for comparison to previously reported results.
- The breakeven point is used as a summary value.
Experimental Results
Performance by Macroaveraging
- Compute precision and recall for each topic, then take the average across topics.
- Uses a fixed set of recall levels; summary values for particular topics are obtained by averaging precision over the 19 evenly spaced recall levels between 0.05 and 0.95.
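The difference between the two averaging schemes shows up in a few lines (the (tp, fp, fn) contingency triples below are illustrative toy numbers):

```python
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

def micro_average(tables):
    """Micro: add the contingency tables across topics, then compute P/R."""
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    return precision_recall(tp, fp, fn)

def macro_average(tables):
    """Macro: compute P/R per topic, then average, weighting topics evenly."""
    prs = [precision_recall(*t) for t in tables]
    n = len(prs)
    return sum(p for p, _ in prs) / n, sum(r for _, r in prs) / n

tables = [(90, 10, 10), (1, 1, 3)]  # (tp, fp, fn): a frequent and a rare topic
print(micro_average(tables))  # dominated by the frequent topic
print(macro_average(tables))  # the rare topic pulls both averages down
```

This is exactly why the slides note that microaveraging "does not weight the topics evenly" while macroaveraging does.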
Experimental Results
Microaveraged performance: breakeven points, compared to the best previously reported algorithm, a rule-induction method based on heuristic search with a breakeven point of 0.789.
[Chart: breakeven points of 0.82, 0.801, 0.795, and 0.775 for the representations]
Experimental Results
Macroaveraged performance: TERMS appears much closer to the other three representations.
The relative effectiveness of the representations at low recall levels is reversed at high recall levels.
Slight improvement from non-linear networks.
LSI performance degrades compared to TERMS as topic frequency decreases.
Performance of the six techniques on the 54 most frequent topics: considerable variation of performance across topics; the relative ups and downs are mirrored in both plots.
Experimental Results: Performance of Combinations of Techniques and Their Improvements
The experiments cross each document representation (TERMS, LSI, CD-LSI, TD-LSI, REL-LSI, and a Hybrid of CD-LSI + TERMS) with each NN architecture: Flat or Modular (with the meta-topic network trained using the LSI representation), each either Linear or Non-Linear.
[Original slide: a grid in which matching color and shape identified each experiment]
Experimental Results
Flat Networks
Experimental Results
Modular Networks: only 4 clusters were used, and average precision was recomputed for the flat networks.
Non-linear networks seem to perform better than the linear models, but the difference is very slight.
The LSI representation is able to equal or exceed TERMS performance for high-frequency topics, but performs poorly for low-frequency ones.
Task-directed LSI representations improve performance in the low-frequency domain.
Trade-offs: TD/LSI carries a high computational cost; REL/LSI gives lower performance on medium- and high-frequency topics.
Modular CD/LSI improves performance further for low-frequency topics, because the individual networks are trained only in the domain over which the LSI was performed.
TERMS proves competitive with the more sophisticated LSI techniques: most topics are predictable from a small set of terms.
Discussion
- A rich solution: many representations and many models.
- A totally supervised approach.
- Results are lower than expected. Is the dataset responsible?
- High computational overhead.
- Does the NN deserve a place in DM toolboxes?
Questions?