Intelligent Database Systems Lab
National Yunlin University of Science and Technology
TIARA: A Visual Exploratory Text Analytic System
Presenter: Wei-Hao Huang
Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, Qiang Zhang
SIGKDD 2010
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
· Reading through a large collection of text to locate needed information, or simply to decide what is relevant, is costly and time-consuming.
· Although a number of text analysis technologies exist, their results are often abstract and complex and may not be easily consumable by end users.
Objectives
• To present an exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics).
• To combine text analytics and interactive visualization to help users explore and analyze large collections of text.
(Figure: documents are fed into the TIARA system)
Methodology
· TIARA
─ Topic Analysis
─ Topic Ranking
─ Keyword-based Topic Summarization
─ Time-sensitive Keyword Extraction
TIARA
TIARA System Architecture
(Figure: system architecture diagram; components shown include a database and a file system)
Topic Analysis
· Uses unsupervised learning methods.
· Model quantities:
─ the number of documents
─ the words of each document
─ a vocabulary of a given size
─ K: the number of topics
─ the document-topic distribution matrix
─ the topic-word distribution matrix
Topics (K1, K2) by documents (N1, N2):
      N1   N2
K1    0    1
K2    1    1

Term frequencies in each cluster, words (V1, V2) by topics (K1, K2):
      K1   K2
V1    0.3  0.7
V2    0.8  0.1
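The slide does not name the unsupervised method. A widely used model that produces exactly these two outputs (a document-topic matrix and a topic-word matrix) is Latent Dirichlet Allocation (LDA). Below is a minimal collapsed Gibbs sampler for LDA, offered as an illustration under that assumption rather than as TIARA's implementation; the function name `lda_gibbs` and the hyperparameters `alpha`, `beta`, `iters`, and `seed` are hypothetical choices.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], vocab_size: int, num_topics: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical LDA sketch using collapsed Gibbs sampling.

    docs: each document is a list of word ids in [0, vocab_size).
    Returns (theta, phi): theta[d][k] estimates p(topic k | document d)
    (document-topic matrix); phi[k][w] estimates p(word w | topic k)
    (topic-word matrix).
    """
    rng = random.Random(seed)
    D, K, V = len(docs), num_topics, vocab_size
    n_dk = [[0] * K for _ in range(D)]  # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # tokens assigned to topic k overall
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = list(range(K))
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(docs[d]) + K * alpha) for k in topics]
             for d in range(D)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in topics]
    return theta, phi


# Toy corpus over a 4-word vocabulary: words {0, 1} and {2, 3} tend to co-occur.
docs = [[0, 0, 1, 1], [2, 2, 3, 3], [0, 1, 0, 1], [2, 3, 2, 3]]
theta, phi = lda_gibbs(docs, vocab_size=4, num_topics=2)
```

Here `theta` is 4 × 2 (documents by topics) and `phi` is 2 × 4 (topics by words), and every row is a probability distribution. Gibbs sampling was chosen only because it fits in a few dependency-free lines; a real system would likely rely on an optimized library implementation.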
Topic Ranking
· Topic rank is measured by a combination of topic content coverage and topic variance.
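The slide names the two criteria but not how they are computed or combined. The sketch below assumes that coverage is a topic's average proportion across all documents, that variance is the variance of its average proportion across time segments, and that both are scaled to [0, 1] and combined linearly with a hypothetical weight `weight`. These definitions, and the names `rank_topics` and `doc_segment`, are illustrative assumptions rather than TIARA's published formula.

```python
from statistics import pvariance
from typing import List, Tuple


def rank_topics(theta: List[List[float]], doc_segment: List[int],
                weight: float = 0.5) -> Tuple[List[int], List[float]]:
    """Hypothetical topic ranking: weighted sum of coverage and variance.

    theta[d][k]: document-topic matrix; doc_segment[d]: time segment of doc d.
    Returns (topic ids sorted best-first, score of each topic).
    """
    D, K = len(theta), len(theta[0])
    num_segments = max(doc_segment) + 1

    # Content coverage: average share of the topic across all documents.
    coverage = [sum(theta[d][k] for d in range(D)) / D for k in range(K)]

    # Topic variance: how much the topic's average strength changes over time.
    variance = []
    for k in range(K):
        seg_sum = [0.0] * num_segments
        seg_cnt = [0] * num_segments
        for d in range(D):
            seg_sum[doc_segment[d]] += theta[d][k]
            seg_cnt[doc_segment[d]] += 1
        strength = [s / c for s, c in zip(seg_sum, seg_cnt) if c > 0]
        variance.append(pvariance(strength))

    # Scale each criterion by its maximum so the weight compares like with like.
    def scaled(xs: List[float]) -> List[float]:
        top = max(xs)
        return [x / top if top > 0 else 0.0 for x in xs]

    cov_s, var_s = scaled(coverage), scaled(variance)
    scores = [weight * c + (1 - weight) * v for c, v in zip(cov_s, var_s)]
    order = sorted(range(K), key=lambda k: scores[k], reverse=True)
    return order, scores


# Topic 0 is large but flat over time; topics 1 and 2 are smaller but shift sharply.
order, scores = rank_topics([[0.4, 0.5, 0.1], [0.4, 0.1, 0.5]], doc_segment=[0, 1])
```

With `weight = 0.5`, topic 0 ends up last despite having the highest coverage, because its strength never changes between the two segments; raising `weight` shifts the ranking back toward coverage.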
Keyword-based Topic Summarization
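The slide gives only the heading. The simplest reading of keyword-based summarization, assumed here, is to describe each topic by the highest-probability words in its row of the topic-word matrix; TIARA's actual keyword selection may be more elaborate. The function name `topic_keywords` and the parameter `n` are hypothetical.

```python
from typing import List


def topic_keywords(phi: List[List[float]], vocab: List[str], n: int = 3) -> List[List[str]]:
    """Hypothetical keyword summary: the n most probable words of each topic.

    phi[k][w]: topic-word matrix (one row per topic); vocab[w]: word for id w.
    """
    summaries = []
    for row in phi:
        top_ids = sorted(range(len(row)), key=lambda w: row[w], reverse=True)[:n]
        summaries.append([vocab[w] for w in top_ids])
    return summaries


# Illustrative values from the Topic Analysis slide, transposed so each row is a topic:
# K1 = (V1: 0.3, V2: 0.8), K2 = (V1: 0.7, V2: 0.1).
print(topic_keywords([[0.3, 0.8], [0.7, 0.1]], ["V1", "V2"], n=1))  # [['V2'], ['V1']]
```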
Time-sensitive Keyword Extraction
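The slides give no formula for this step. One simple way to pick time-sensitive keywords, assumed here and not taken from the paper, is a tf-idf-style score computed across time segments: a word scores high in a segment if it is frequent there and appears in few other segments. The per-segment word counts for a single topic, the function name `segment_keywords`, and the smoothing choice are all hypothetical.

```python
import math
from typing import List


def segment_keywords(counts: List[List[float]], vocab: List[str], n: int = 2) -> List[List[str]]:
    """Hypothetical time-sensitive keywords for one topic.

    counts[t][w]: frequency of word w in time segment t (for example, weighted
    by the documents' topic proportions). Returns the top-n words per segment.
    """
    T, V = len(counts), len(vocab)
    # Number of segments in which each word occurs at all.
    seg_freq = [sum(1 for t in range(T) if counts[t][w] > 0) for w in range(V)]
    result = []
    for t in range(T):
        total = sum(counts[t]) or 1.0
        scored = []
        for w in range(V):
            if counts[t][w] > 0:
                tf = counts[t][w] / total
                # Smoothed idf: words shared by every segment keep a nonzero score.
                idf = math.log(1 + T / seg_freq[w])
                scored.append((tf * idf, w))
        scored.sort(reverse=True)
        result.append([vocab[w] for _, w in scored[:n]])
    return result


counts = [[5, 5, 0, 0],
          [5, 0, 5, 0],
          [5, 0, 0, 5]]
print(segment_keywords(counts, ["a", "b", "c", "d"]))
# [['b', 'a'], ['c', 'a'], ['d', 'a']]
```

Each segment leads with the word unique to it, while the word shared by all segments still appears; the smoothing trades some distinctiveness for completeness, the two properties evaluated in the experiments.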
Experiments
· Time-sensitive keyword extraction procedure
─ Completeness
─ Distinctiveness
· Response time
· Data sets:
─ A personal email collection with 8,326 email messages
─ An emergency room data set containing 23,501 patient records
Completeness
· Defined as whether the original keywords of a topic can be recovered by combining the keywords associated with each of its time segments.
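The slide defines completeness qualitatively. One way to quantify it, assumed here rather than taken from the paper, is the fraction of a topic's original keywords that reappear in the union of its per-segment keyword lists. The function name `completeness` is hypothetical.

```python
from typing import Iterable, List


def completeness(topic_keywords: Iterable[str], segment_keywords: List[List[str]]) -> float:
    """Hypothetical completeness score: share of the topic's original keywords
    found in the union of its per-segment keywords (1.0 = fully recovered)."""
    original = set(topic_keywords)
    if not original:
        return 1.0
    recovered = set().union(*segment_keywords) & original
    return len(recovered) / len(original)


print(completeness(["a", "b", "c", "d"], [["b", "a"], ["c", "a"]]))  # 0.75
```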
Distinctiveness
· Defined as whether one topic segment can be distinguished from another based on their associated keywords, so that the keywords avoid redundancy.
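As with completeness, the slide gives no formula. A minimal sketch, assuming distinctiveness is measured as 1 minus the average pairwise Jaccard similarity between the segments' keyword sets (a swapped-in measure, not necessarily the paper's); the function name `distinctiveness` is hypothetical.

```python
from itertools import combinations
from typing import List


def distinctiveness(segment_keywords: List[List[str]]) -> float:
    """Hypothetical distinctiveness score: 1 minus the average pairwise Jaccard
    similarity of the segments' keyword sets (1.0 = no keyword shared)."""
    sets = [set(s) for s in segment_keywords]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0

    def jaccard(a: set, b: set) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


print(distinctiveness([["b", "a"], ["c", "a"], ["d", "a"]]))
```

In this example each pair of segments shares one of three distinct words, so every pairwise similarity is 1/3 and the score is 1 - 1/3, about 0.67; fully disjoint keyword sets would score 1.0.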
Completeness and Distinctiveness Results
Response Time
Conclusions
• TIARA tightly integrates text analytics with interactive visualization to support effective exploratory text analysis.
• Future work
─ Add sentence-based summaries
─ Support other languages
─ Improve performance
Comments
· Advantages
─ To explore and analyze large text collections with interactive visualization
· Applications
─ Text mining