
Progress Report: Related work in KM

Advisor: Prof. Hahn-Ming Lee

Prof. Jan-Ming Ho

Reporter: Shou-Wei Ho

Chung-Hung Lin

2009.08.31

1

Related work in KM (Knowledge Management)

Mining Academic Community

Social network analysis

Research domain detector

Conflict of Interest Analysis (家慶, 俊佑, 秋宜)

NSC Expert Finding System

Expert Finding (家慶, 俊佑, 泰良)

Researcher Ranking (泰良)

Building Academic Database

Mining Academic Network

Researcher page finder

Page parser

WAD: Web Appearance Disambiguation (坤彥)

Chinese Name Translation (威達)

Data parser

BibPro: Citation Parser (建智)

PLF: Publication List Pages Finder (任明)

Authorship Disambiguation (建毅,信璁)

CRE: Citation Record Extractor (水石)

Author Personal Data Parser

English Name Translation (任明)

Automatic Survey

Publication Representation Evaluation

Academic Contribution System

Academic Contribution Analysis

Citation Matching (大為)

Research Domain Analysis (紹威)

Document Topic Discovery (桀宏)

Indexing Integration (大為)

2

Problems in searching for Chinese names (威達)

Only a Chinese corpus

3

Challenges in Chinese name translation

• Different regions follow different pronunciation (romanization) rules
– 陳 Chen (Taiwan)
– 陳 Tsun (Hong Kong)
– 陳 Tan (Fukien)

• Some translations contain additional words, such as an English given name
– Ex: 黃光明 (Kwang-Ming Frank Hwang)
– Ex: 張韻詩 (Jane Win-Shih Liu)
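To make the first challenge concrete, here is a minimal sketch that expands one Chinese surname into a query per regional romanization. It assumes a hypothetical lookup table `SURNAME_ROMANIZATIONS` holding only the 陳 entries from this slide, plus a given name that is already romanized; `query_variants` is an illustrative name, not part of the described system.

```python
# Hypothetical lookup table: only the 陳 entries come from the slide;
# a real system would need a much larger, curated table.
SURNAME_ROMANIZATIONS = {
    "陳": {"Taiwan": "Chen", "Hong Kong": "Tsun", "Fukien": "Tan"},
}


def query_variants(surname: str, given_name: str) -> list:
    """Return one English-order query string per known regional romanization."""
    regional = SURNAME_ROMANIZATIONS.get(surname, {})
    # Dict insertion order is preserved, so variants follow the table order.
    return [f"{given_name} {roman}" for roman in regional.values()]


print(query_variants("陳", "Xiao-Ming"))  # hypothetical given name
```

The given name itself also varies by region, and additional words such as "Frank" or "Jane" cannot be generated from a table at all, which is why they appear as a separate challenge.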

4

Ambiguous pages in the WWW (坤彥)

[Figure: pages for the same name may refer to a CMU professor or to a guitar player and singer]

5

Anchor text mapping

[Figure: anchor text mapping for the search name "Bill Mark"; two sets of links numbered 1 to 3 map to A. Personal main page and B. NVIDIA web site page]

6

CRE: Why do we extract information from publication list web pages? (水石)

• A publication list page is an important resource for many value-added applications, such as citation analysis and academic network analysis.

• What can we get from publication list pages?
– Up-to-date papers before they are formally published
– Reference materials, such as slides and talks

7

An automatic extractor

[Figure: the extractor detects citation strings in a web page and extracts them into structured data]

8

Citation extraction

Citation:
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory. 2(3) 113--124.

Our system's output (metadata):
Author: Chomsky, Noam
Title: Three models for the description of language
Journal: IRE Transactions on Information Theory
Volume: 2
Issue: 3
Page: 113-124
Month:
Year: 1956
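The mapping from citation string to metadata on this slide can be illustrated with a single regular expression. This is a hand-written sketch for this one citation style ("Author. Year. Title. Journal. Volume(Issue) Pages."), not BibPro or CRE; `CITATION_RE` and `parse_citation` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical pattern for one citation style only:
# "Author. Year. Title. Journal. Volume(Issue) StartPage--EndPage."
CITATION_RE = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<year>\d{4})\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^.]+)\.\s+"
    r"(?P<volume>\d+)\((?P<issue>\d+)\)\s+"
    r"(?P<start>\d+)-+(?P<end>\d+)\.?$"
)


def parse_citation(text: str) -> Optional[dict]:
    """Return the metadata fields shown on the slide, or None if the style differs."""
    m = CITATION_RE.match(text.strip())
    if m is None:
        return None
    return {
        "Author": m["author"],
        "Title": m["title"],
        "Journal": m["journal"],
        "Volume": m["volume"],
        "Issue": m["issue"],
        "Page": f"{m['start']}-{m['end']}",  # normalize "--" to a single "-"
        "Month": "",  # this citation style carries no month
        "Year": m["year"],
    }


print(parse_citation(
    "Chomsky, Noam. 1956. Three models for the description of language. "
    "IRE Transactions on Information Theory. 2(3) 113--124."
))
```

A fixed pattern like this fails as soon as an author is written with initials ("Chomsky, N.") or the field order changes, which is one reason a more robust parser is needed for real publication list pages.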

9

Authorship Disambiguation (建毅)

[Figure: Prof. A vs. Prof. A' (?)]

10

Detect 3 relationships (COI)

1. Teacher-Student

[Figure: Prof. A, Prof. B, Student C]

11

Detect 3 relationships (cont.)

2. Co-author

[Figure: Prof. S, Prof. A, Prof. B]

12

Detect 3 relationships (cont.)

3. Colleagues

[Figure: Prof. W, Prof. A, Prof. E]
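The three relationship checks can be sketched as simple lookups, assuming the relationships are already available as dictionaries (advisor records, co-author lists, affiliations). The data structures, the example data, and the name `detect_coi` are hypothetical placeholders, not the COI system's actual data model; the person names reuse the placeholders from the slides.

```python
# Hypothetical example data; the real system would have to mine these relations.
ADVISOR_OF = {"Student C": "Prof. A"}  # student -> advisor
COAUTHORS = {"Prof. S": {"Prof. A"}}  # author -> set of co-authors
AFFILIATION = {"Prof. W": "University X", "Prof. E": "University X"}


def detect_coi(x: str, y: str, advisor_of: dict, coauthors: dict, affiliation: dict) -> list:
    """Return which of the three COI relationships hold between persons x and y."""
    reasons = []
    # 1. Teacher-student: either person advised the other.
    if advisor_of.get(x) == y or advisor_of.get(y) == x:
        reasons.append("teacher-student")
    # 2. Co-author: either person lists the other as a co-author.
    if y in coauthors.get(x, set()) or x in coauthors.get(y, set()):
        reasons.append("co-author")
    # 3. Colleagues: both have the same known affiliation.
    aff_x = affiliation.get(x)
    if aff_x is not None and aff_x == affiliation.get(y):
        reasons.append("colleagues")
    return reasons


print(detect_coi("Prof. S", "Prof. A", ADVISOR_OF, COAUTHORS, AFFILIATION))
```

An empty list means none of the three relationships was found in the given tables.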

13

Mining a Chinese Person’s Name from the English Translation (任明)

14

15

Name Disambiguation (信璁)

[Figure: a set of citations with the same author name is grouped into clusters, where each cluster is the citation set of one author; the number of authors is unknown]

• Problem
– Given a set of citations with the same author name, how do we identify which citation belongs to which author?

• Goal
– Group the citations into clusters so that each cluster represents one author

16

Procedure

[Figure: a pair of citations (Citation A, Citation B) is compared using coauthor, author information, title, venue, web, and topic correlations; an SVM then classifies whether the pair of citations is published by the same author]
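The pairwise step of this procedure can be sketched as follows, with two loud simplifications. Only three of the six correlations are computed (coauthor, title, and venue, each as a Jaccard overlap), because web, topic, and author information correlations need external data. The SVM is represented only by a linear decision function with hand-picked, hypothetical weights instead of weights learned from labeled pairs. `Citation`, `pair_features`, and `same_author` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    title: str
    venue: str
    coauthors: set  # co-author names, excluding the ambiguous author name


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets in [0, 1]; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def words(text: str) -> set:
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pair_features(a: Citation, b: Citation) -> list:
    """Feature vector for a citation pair: [coauthor, title, venue] correlation."""
    return [
        jaccard(a.coauthors, b.coauthors),
        jaccard(words(a.title), words(b.title)),
        jaccard(words(a.venue), words(b.venue)),
    ]


# Hypothetical weights and bias standing in for a trained linear SVM
# (decision function w . x + b); they are NOT learned from data here.
WEIGHTS = [2.0, 1.0, 1.0]
BIAS = -0.5


def same_author(a: Citation, b: Citation) -> bool:
    """Positive side of the linear decision function means 'same author'."""
    score = sum(w * f for w, f in zip(WEIGHTS, pair_features(a, b))) + BIAS
    return score > 0


c1 = Citation("Mining academic networks", "KDD", {"Author Y"})
c2 = Citation("Academic network mining at scale", "KDD", {"Author Y", "Author Z"})
c3 = Citation("Protein folding dynamics", "Nature", {"Author W"})
print(same_author(c1, c2), same_author(c1, c3))  # True False
```

In a real setup the weights would come from training an SVM on citation pairs labeled as same author or different author.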

17

Procedure

• Use the classification results to group citations into clusters
– Each cluster contains the citations belonging to the same author

Grouping

If the SVM determines that two citations are authored by the same person, they are connected to each other
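One way to read the grouping rule is as connected components: every pair the classifier accepts becomes an edge, and each component is one author's cluster, so the number of authors does not have to be known in advance. The sketch below assumes that reading and uses a union-find structure; `group_citations` is a hypothetical name.

```python
def group_citations(n: int, same_author_pairs: list) -> list:
    """Cluster citations 0..n-1 into connected components of the 'same author' graph."""
    parent = list(range(n))  # union-find forest: each citation starts alone

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each pair the classifier accepted connects two citations.
    for a, b in same_author_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


print(group_citations(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```

The trade-off of taking connected components is that a single false-positive pair merges two authors' clusters.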

18

Citation Correspondence (大為)

• Query construction
– A good query:
• If the matching records are archived in digital libraries, a good query should retrieve them, and the matching records should be ranked highly.
• The search result should be small.

• Citation correspondence
– Find the matching records in the search result by matching the local citation string against the records in the search result.
• Field-by-field comparison.
– May not be enough because of errors in digital libraries (optional).

• Metrics: precision, recall, and F-measure.
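Field-by-field comparison and the evaluation metrics can be sketched together. The sketch assumes the local citation and each search result are already parsed into dicts, compares only the `title` and `year` fields after normalizing case and punctuation, and uses hypothetical record ids; `fields_match`, `match_records`, and `precision_recall_f1` are illustrative names.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and keep only alphanumeric tokens, so punctuation differences are ignored."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def fields_match(local: dict, record: dict, fields=("title", "year")) -> bool:
    """True when every compared field is non-empty and equal after normalization."""
    for name in fields:
        a = normalize(local.get(name, ""))
        b = normalize(record.get(name, ""))
        if not a or a != b:
            return False
    return True


def match_records(local: dict, results: list) -> set:
    """Ids of the search results that match the local citation field by field."""
    return {r["id"] for r in results if fields_match(local, r)}


def precision_recall_f1(predicted: set, relevant: set) -> tuple:
    """Standard set-based precision, recall, and F-measure (harmonic mean)."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


local = {"title": "Three models for the description of language", "year": "1956"}
results = [
    {"id": "r1", "title": "Three Models for the Description of Language.", "year": "1956"},
    {"id": "r2", "title": "Syntactic Structures", "year": "1957"},
]
print(match_records(local, results))  # {'r1'}
```

Exact equality after normalization is exactly what breaks when a digital library record contains errors, which is where the optional, more tolerant matching would come in.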

19

Partial Solution: Abbreviation Matching

[Figure: v1, v2]

Example: CIKM = Conference on Information and Knowledge Management
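A simple heuristic for the CIKM example is to compare the abbreviation with the initials of the full name's significant words, skipping short function words. This is an assumed heuristic, not necessarily the one used here; the stop-word list and the names `initials` and `is_abbreviation` are hypothetical.

```python
import re

# Hypothetical stop-word list: function words usually dropped from venue acronyms.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def initials(full_name: str) -> str:
    """Upper-case first letters of the non-stop words in a venue name."""
    words = re.findall(r"[A-Za-z]+", full_name)
    return "".join(w[0].upper() for w in words if w.lower() not in STOP_WORDS)


def is_abbreviation(abbr: str, full_name: str) -> bool:
    """True when the abbreviation equals the initials, ignoring case."""
    return abbr.upper() == initials(full_name)


print(initials("Conference on Information and Knowledge Management"))  # CIKM
```

Initials alone miss acronyms built from word fragments (for example "SIGMOD"), which is consistent with the slide calling abbreviation matching only a partial solution.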

20

Reviewer Recommendation (泰良)

21

COI in incomplete collaboration networks via social interaction (秋宜)

22