Aspect Based Clustering for Turkish News

Seher Acer, Başak Çakar, Elif Demirli, Şadiye Kaptanoğlu


Page 2

Introduction
Motivation
Aspect Based Clustering
◦ Modeling Aspects
◦ Aspect Extraction
◦ Framing Cycle-Aware Clustering
User Interface & Demo
Conclusion
References

Page 3

News is produced in multiple stages:
◦ Gathering, writing, editing, etc.

Subjective opinions of producers, owners, and advertisers create a biased environment

Reaching a comprehensive and balanced understanding of a news event takes effort

A system that guides and encourages readers to read news from different perspectives

Page 4

Current systems offer a limited presentation of news:
◦ News is listed arbitrarily or by date

A system that helps users reach news from different viewpoints through a single portal

Capture the differences in aspects among articles reporting the same news story

Use advanced information retrieval techniques

Page 6

Aspect: a set of keyword-weight pairs

Keywords are extracted from:
◦ Headline, subheadline, and lead

GATE (General Architecture for Text Engineering) [3]:
◦ Named entities: person, organization, location

Event extraction (Zemberek):
◦ Frequently used action words/phrases
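The slides define an aspect as keyword-weight pairs but do not say how the weights are assigned. The following is a minimal Python sketch of one possible representation. It assumes the keywords have already been extracted (for example, entities from GATE and action phrases from Zemberek) and are passed in as plain strings. The per-source weights in SOURCE_WEIGHTS and the helper names build_aspect and turkish_lower are hypothetical.

```python
from collections import Counter

# Hypothetical per-source weights (an assumption, not from the slides):
# keywords from more prominent parts of the article count more.
SOURCE_WEIGHTS = {
    "headline": 3.0,
    "subheadline": 2.0,
    "lead": 1.5,
    "entity": 1.0,  # e.g. person/organization/location names found by GATE
    "event": 1.0,   # e.g. action words/phrases found with Zemberek
}


def turkish_lower(text: str) -> str:
    """Lowercase Turkish text; plain str.lower() maps 'I' to 'i' instead of 'ı'."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def build_aspect(extracted: dict[str, list[str]]) -> dict[str, float]:
    """Merge already-extracted keywords into one keyword -> weight mapping.

    `extracted` maps a source name (a key of SOURCE_WEIGHTS) to the keywords
    found in that source. A keyword gains its source's weight each time it
    appears, so terms repeated across sources end up with higher weights.
    """
    aspect = Counter()
    for source, keywords in extracted.items():
        weight = SOURCE_WEIGHTS.get(source, 1.0)  # unknown sources get weight 1.0
        for keyword in keywords:
            aspect[turkish_lower(keyword)] += weight
    return dict(aspect)
```

For example, build_aspect({"headline": ["Ankara", "deprem"], "entity": ["Ankara", "AFAD"]}) assigns "ankara" a weight of 4.0 (3.0 from the headline plus 1.0 as an entity), "deprem" 3.0, and "afad" 1.0. The Turkish-aware lowercasing keeps forms such as "IRAK" and "ırak" from being treated as different keywords.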

Page 8

The set of articles on a news story shows head-tail characteristics:
◦ Head: common aspects
◦ Tail: uncommon aspects

Separating the head from the tail makes classification more effective

Two steps:
◦ Head-tail partitioning
◦ Tail-side clustering

Page 9

Generate common and uncommon keyword sets
◦ HgP: head group proportion

Calculate commonness and uncommonness:
◦ Commonness: an article scores high if it contains many common keywords with high weight values
◦ Uncommonness: an article scores high if it contains many uncommon keywords with high weight values
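The slides name the ingredients of head-tail partitioning (common and uncommon keyword sets, the head group proportion HgP, and commonness and uncommonness) but do not give the formulas. The following is a minimal Python sketch under explicit assumptions:
- A keyword is common when it appears in at least a fraction min_df_ratio of the articles.
- An article's commonness and uncommonness are the summed weights of its common and uncommon keywords.
- The top HgP fraction of articles, ranked by commonness minus uncommonness, forms the head.

The function names and min_df_ratio are hypothetical.

```python
import math
from collections import Counter


def keyword_sets(articles, min_df_ratio=0.5):
    """Split all keywords into (common, uncommon) sets.

    Assumption: a keyword is common if it occurs in at least
    `min_df_ratio` of the articles; the slides do not state the rule.
    """
    doc_freq = Counter()
    for aspect in articles:
        doc_freq.update(aspect.keys())  # count each keyword once per article
    common = {k for k, n in doc_freq.items() if n / len(articles) >= min_df_ratio}
    return common, set(doc_freq) - common


def commonness_and_uncommonness(aspect, common, uncommon):
    """Sum the weights of an article's common and uncommon keywords."""
    c = sum(w for k, w in aspect.items() if k in common)
    u = sum(w for k, w in aspect.items() if k in uncommon)
    return c, u


def head_tail_partition(articles, hgp=0.5, min_df_ratio=0.5):
    """Return (head, tail) as sorted lists of article indices.

    Assumption: articles are ranked by commonness minus uncommonness and
    the top `hgp` (head group proportion) fraction becomes the head.
    """
    common, uncommon = keyword_sets(articles, min_df_ratio)
    scores = []
    for idx, aspect in enumerate(articles):
        c, u = commonness_and_uncommonness(aspect, common, uncommon)
        scores.append((c - u, idx))
    scores.sort(key=lambda pair: pair[0], reverse=True)  # most "common" first
    head_size = math.ceil(hgp * len(articles))
    head = sorted(idx for _, idx in scores[:head_size])
    tail = sorted(idx for _, idx in scores[head_size:])
    return head, tail
```

With four articles on an earthquake story, two articles sharing "deprem" and "afad" would form the head, and the two built around rarer keywords ("bina", "denetim", "müteahhit") would fall into the tail. A score difference is only one way to combine the two measures; a ratio or a two-threshold rule would fit the slide's description equally well.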

Page 10

Agglomerative hierarchical clustering

Similarity measure: cosine similarity

During agglomerative clustering:
◦ Each article starts as a singleton cluster
◦ Pairs of clusters are merged iteratively until a stopping criterion is met
◦ In the merging process, the similarity between two clusters is the similarity of their most similar pair of articles, one from each cluster (the single-link approach)
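The merging loop above can be sketched in Python as single-link agglomerative clustering over keyword-weight dictionaries, using cosine similarity. The slides leave the stopping criterion unspecified. This sketch assumes that merging stops once the best inter-cluster similarity falls below a hypothetical threshold. The function names are also illustrative.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse keyword -> weight vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_link_clustering(articles, threshold=0.3):
    """Cluster articles; returns a list of clusters (lists of indices).

    Assumed stopping criterion: stop when the most similar pair of
    clusters is less similar than `threshold`.
    """
    n = len(articles)
    sim = [[cosine(articles[i], articles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # each article starts as a singleton
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            # Single link: similarity of the most similar cross-cluster pair.
            s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
            if s > best:
                best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting y leaves index x valid
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters
```

Given two tail articles about building inspection and two about aid from Kızılay, this returns [[0, 1], [2, 3]]. The two groups share no keywords, so their single-link similarity is 0 and the loop stops. Recomputing the cross-cluster maximum on every pass is cubic in the number of articles. That cost is acceptable for the handful of articles on one story but would need a priority queue or an MST-based method at scale.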

Page 11

Simple and user-friendly

Presents news from different aspects fairly

Motivates readers to read news from different aspects

Page 12

Existing systems: Google News, Yahoo News
◦ Limited presentation
◦ News listed arbitrarily

Proposed system:
◦ Gathers the same news as existing systems
◦ Clusters news according to aspects
◦ Simple user interface
◦ Easy to track news stories

The approach is suitable for Turkish news

Page 13

[1] Park, S., Kang, S., Lee, S., Chung, S., Song, J. Mitigating Media Bias: A Computational Approach. ACM, 2008, pp. 47-51.

[2] Park, S., Kang, S., Chung, S., Song, J. NewsCube: Delivering Multiple Aspects of News to Mitigate Media Bias. ACM, 2009.

[3] Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), 2002.

[4] Park, S., Lee, S., Song, J. Aspect-level News Browsing: Understanding News Events from Multiple Viewpoints. ACM, 2010, pp. 41-50.
