Progress Report: Related work in KM
Advisor: Prof. Hahn-Ming Lee
Prof. Jan-Ming Ho
Reporter: Shou-Wei Ho
Chung-Hung Lin
2009.08.31
Related work in KM (Knowledge Management)
Mining Academic Community
Social network analysis
Research domain detector
Conflict of Interest Analysis (家慶,俊佑,秋宜)
NSC Expert Finding System
Expert Finding (家慶,俊佑,泰良)
Researcher Ranking (泰良)
Building Academic Database
Mining Academic Network
Researcher page finder
Page parser
WAD: Web Appearance Disambiguation (坤彥)
Chinese Name Translation (威達)
Data parser
BibPro: Citation Parser (建智)
PLF: Publication List Pages Finder (任明)
Authorship Disambiguation (建毅,信璁)
CRE: Citation Record Extractor (水石)
Author Personal Data Parser
English Name Translation (任明)
Automatic Survey
Publication Representation Evaluation
Academic Contribution
System
Academic Contribution Analysis
Citation Matching (大為)
Research Domain Analysis (紹威)
Document Topic Discovery (桀宏)
Indexing Integration (大為)
Challenges in Chinese name translation
• Many pronunciation rules in different areas
– 陳 Chen (Taiwan)
– 陳 Tsun (Hong Kong)
– 陳 Tan (Fukien)
• Some translated names contain additional words.
– Ex: 黃光明 (Kwang-Ming Frank Hwang)
– Ex: 張韻詩 (Jane Win-Shih Liu)
Anchor text mapping
A. Personal main page
B. NVIDIA web site page
Search Name: Bill Mark
CRE: Why do we extract information from publication list web pages? (水石)
• A publication list page is an important resource for many value-added applications, such as citation analysis and academic network analysis.
• What could we get from publication list pages?
– Some up-to-date literature before it is formally published
– Some reference materials, such as slides and talks.
An automatic extractor
[Figure: Web Page → Extract → Citation String → Structured Data]
Detect 3 relationships (cont.)
Citation extracting
Citation
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory. 2(3) 113--124.
Our System
MetaData
Author: Chomsky, Noam
Title: Three models for the description of language
Journal: IRE Transactions on Information Theory
Volume: 2
Issue: 3
Page: 113-124
Month:
Year: 1956
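The citation-to-metadata mapping on this slide can be illustrated with a minimal sketch. It is not the BibPro or CRE method; it assumes a single hypothetical citation layout ("Author. Year. Title. Journal. Volume(Issue) Pages.") and uses one regular expression. The names `CITATION_RE` and `parse_citation` are made up for illustration.

```python
import re

# Hypothetical pattern covering ONE citation layout only:
#   "Author. Year. Title. Journal. Volume(Issue) FirstPage--LastPage."
# A real citation parser has to cope with many layouts; this is only a sketch.
CITATION_RE = re.compile(
    r"^(?P<author>.+?)\.\s+"
    r"(?P<year>\d{4})\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\.\s+"
    r"(?P<volume>\d+)\((?P<issue>\d+)\)\s+"
    r"(?P<first>\d+)-+(?P<last>\d+)\.?$"
)


def parse_citation(text):
    """Return a metadata dict for a citation string, or None if it does not fit."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    return {
        "Author": match.group("author"),
        "Title": match.group("title"),
        "Journal": match.group("journal"),
        "Volume": match.group("volume"),
        "Issue": match.group("issue"),
        "Page": f"{match.group('first')}-{match.group('last')}",
        "Month": "",  # this layout carries no month, so the field stays empty
        "Year": match.group("year"),
    }


if __name__ == "__main__":
    citation = (
        "Chomsky, Noam. 1956. Three models for the description of language. "
        "IRE Transactions on Information Theory. 2(3) 113--124."
    )
    for field, value in parse_citation(citation).items():
        print(f"{field}: {value}")
```

A string that does not follow the assumed layout yields `None` rather than a partial record, which keeps the sketch honest about what it cannot parse.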
Name Disambiguation (信璁)
• Problem
– Given a set of citations with the same author name, how do we identify which one belongs to whom?
• Goal
– To group the citations into several clusters, so that each cluster represents an author
[Figure: Grouping. Input: a set of citations with the same author name. Output: clusters, where a cluster is a citation set of an author. Suppose the number of authors is unknown.]
Procedure
A pair of citations (Citation A, Citation B)
• Coauthor correlation
• Author information correlation
• Title correlation
• Venue correlation
• Web correlation
• Topic correlation
SVM: classify whether a pair of citations is published by the same author
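As a rough illustration of how a pair of citations could be turned into classifier input, the sketch below computes three of the listed correlations. The slides do not define the correlations, so the Jaccard overlap and exact venue match used here are assumptions, as are the dictionary keys (`authors`, `title`, `venue`) and the function names. The author information, web, and topic correlations need external data and are left out.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_features(cit_a, cit_b, ambiguous_name):
    """Build a feature vector for one pair of citations.

    Assumed (hypothetical) definitions:
      coauthor correlation = Jaccard overlap of coauthors, excluding the shared name
      title correlation    = Jaccard overlap of lowercased title words
      venue correlation    = 1.0 if the venue strings are equal, else 0.0
    """
    coauthors_a = set(cit_a["authors"]) - {ambiguous_name}
    coauthors_b = set(cit_b["authors"]) - {ambiguous_name}
    coauthor = jaccard(coauthors_a, coauthors_b)
    title = jaccard(cit_a["title"].lower().split(), cit_b["title"].lower().split())
    venue = 1.0 if cit_a["venue"] == cit_b["venue"] else 0.0
    return [coauthor, title, venue]


if __name__ == "__main__":
    a = {"authors": ["J. Lee", "A. Wang", "B. Chen"],
         "title": "Mining academic networks", "venue": "CIKM"}
    b = {"authors": ["J. Lee", "A. Wang"],
         "title": "Mining citation networks", "venue": "CIKM"}
    print(pair_features(a, b, "J. Lee"))
```

A vector like this, one per citation pair, is the kind of input a binary classifier such as an SVM would be trained on, with the label "same author" or "different author".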
Procedure
• Use the classification result to group citations into several clusters
– Each cluster contains citations belonging to the same author
Grouping
If the SVM determines that two citations are authored by the same person, they are connected to each other
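The grouping step amounts to finding connected components: citations are nodes, and every pair the classifier labels "same author" is an edge. A minimal sketch using union-find follows; the function name and the integer citation indices are assumptions for illustration, and the pair list stands in for the classifier output.

```python
def group_citations(num_citations, same_author_pairs):
    """Group citations 0..num_citations-1 into clusters.

    same_author_pairs: (i, j) index pairs that the classifier judged to be
    written by the same person. Each connected component becomes one cluster,
    so the number of authors does not need to be known in advance.
    """
    parent = list(range(num_citations))

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in same_author_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for idx in range(num_citations):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Five citations; the classifier linked 0-1, 1-2 and 3-4.
    print(group_citations(5, [(0, 1), (1, 2), (3, 4)]))
```

Note that connectivity is transitive: citations 0 and 2 end up in the same cluster even though only 0-1 and 1-2 were linked, so a single wrong "same author" decision can merge two authors.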
Citation Correspondence (大為)
• Query construction:
– A good query
• If the proper records are archived in digital libraries, a good query should retrieve them in the search result, and the proper records should rank highly.
• The search result should be small.
• Citation correspondence:
– Find the proper records in the search result by matching the local citation string against the records in the search result.
• Field-by-field comparison.
– May not be enough due to errors in digital libraries (optional).
• Metrics: precision, recall, and F-measure.
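The field-by-field comparison and the three metrics can be sketched as follows. The field names, the normalization (case folding and whitespace collapsing), and both function names are assumptions for illustration; the F-measure is taken as the usual harmonic mean of precision and recall.

```python
def fields_match(local, record, fields=("author", "title", "year")):
    """Field-by-field comparison after light normalization.

    Only fields present in both dicts are compared; exact matching like this
    is what breaks when the digital library record contains errors.
    """
    def norm(value):
        return " ".join(str(value).casefold().split())

    shared = [f for f in fields if f in local and f in record]
    return bool(shared) and all(norm(local[f]) == norm(record[f]) for f in shared)


def precision_recall_f(retrieved, relevant):
    """Precision, recall and F-measure for sets of record identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    local = {"title": "Three models for the description of language", "year": "1956"}
    record = {"title": "Three Models  for the Description of Language", "year": 1956}
    print(fields_match(local, record))
    print(precision_recall_f({"r1", "r2", "r3", "r4"}, {"r1", "r2", "r5"}))
```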
Partial Solution: Abbreviation Matching
v1
v2
Example: CIKM = Conference on Information and Knowledge Management
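One simple way to test whether an abbreviation corresponds to a full venue name is to build an acronym from the initials of the content words and compare. This is only a sketch of that idea, not necessarily the matching used in the reported system; the stopword list and function names are assumptions.

```python
import re

# Hypothetical list of short function words that acronyms usually skip.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def acronym(full_name):
    """Join the uppercase initials of all words that are not stopwords."""
    words = re.findall(r"[A-Za-z]+", full_name)
    return "".join(w[0].upper() for w in words if w.lower() not in STOPWORDS)


def abbreviation_matches(abbreviation, full_name):
    """True if the abbreviation equals the acronym derived from the full name."""
    return abbreviation.upper() == acronym(full_name)


if __name__ == "__main__":
    name = "Conference on Information and Knowledge Management"
    print(acronym(name))
    print(abbreviation_matches("CIKM", name))
```

Initial-letter matching misses abbreviations that take several letters from one word or that keep a stopword initial, so it covers only part of the cases.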