Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering - Cicling12

Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering. Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, Distance Learning University (UNED). CICLing 2012, New Delhi, India, March 15, 2012.

description

Slides for CICLing 2012, New Delhi, India. http://nlp.uned.es/~alpgarcia/pub_index.php

Transcript of Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering - Cicling12

Page 1: Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering - Cicling12

Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering

Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez

NLP & IR Group, Distance Learning University (UNED)

CICLing 2012, New Delhi, India

March 15, 2012

Motivation Understanding the system Improving the Combination Summary

Table of Contents

1 Motivation
   Web Page Representation
   Linear Combination of Criteria
   Nonlinear Combination of Criteria

2 Understanding the system
   Experimental Settings
   Dimension Reduction Analysis
   Study of Individual Criteria

3 Improving the Combination

4 Summary

Motivation

Main goal

To understand how to represent web pages for clustering.

Question

How to combine different page features to represent web pages?

Web Page Representation

Hypothesis

A good document representation should be based on how humans read documents.

Different Criteria for Web Page Representation

Criteria: Title, Emphasis

Word positions: Preferential, Standard

Linear Combination of Criteria

For example: Analytical Combination of Criteria (acc) [1].

Importance of a term in a document:

\[ I_k = t_k\, i_t + e_k\, i_e + f_k\, i_f + p_k\, i_p \tag{1} \]

\[ I_k = 1 \cdot 0.4 + 0.6 \cdot 0.3 + 0 \cdot 0.2 + 0 \cdot 0.1 = 0.58 \tag{2} \]

Drawback

The importance of a term in a component is calculated regardless of the rest of the components.

[1] V. Fresno and A. Ribeiro. An analytical approach to concept extraction in HTML environments. J. Intell. Inf. Syst., 22(3):215–235, 2004.
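
As a minimal sketch of equation (1), the weighted sum can be written in a few lines of Python. Reading t_k, e_k, f_k and p_k as the title, emphasis, frequency and position scores of term k, and i_t, i_e, i_f and i_p as the weights of those criteria, is an assumption based on the subscripts; the names `CriteriaScores` and `linear_importance` are hypothetical. The operands are the ones that appear in equation (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaScores:
    """One value per criterion (hypothetical container, used for both
    the scores of a term and the weights of the criteria)."""
    title: float
    emphasis: float
    frequency: float
    position: float


def linear_importance(scores: CriteriaScores, weights: CriteriaScores) -> float:
    """Weighted sum of the four criteria, following equation (1)."""
    return (scores.title * weights.title
            + scores.emphasis * weights.emphasis
            + scores.frequency * weights.frequency
            + scores.position * weights.position)


# Operands taken from equation (2); mapping them to criteria by the order
# of the terms in equation (1) is an assumption.
weights = CriteriaScores(title=0.4, emphasis=0.3, frequency=0.2, position=0.1)
term = CriteriaScores(title=1.0, emphasis=0.6, frequency=0.0, position=0.0)
importance = linear_importance(term, weights)
```

Each criterion adds its share independently of the others, which is the drawback noted above.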

Example: acc

Call to Arms

Example: acc

Example of a rhetorical title

“Call to arms” is the title of a page that contains an article about the new trades made by the New York Yankees baseball team and how these trades affect the Boston Red Sox, their main rival in Major League Baseball.

Drawback

Title terms are not related to the document topic.

Nonlinear Combination of Criteria

Fuzzy Combination of Criteria (fcc) [2] allows nonlinear combinations of criteria.

It is possible to define conditions that relate one criterion to another.

It produces vectors within the vector space model (VSM).

[2] A. Ribeiro, V. Fresno, M. C. Garcia-Alegre, and D. Guinea. A fuzzy system for the web page representation. 2003.

Example: fcc

Example of a rhetorical title

Now we can express that a term should appear in the title and be emphasized to be considered important.

Nonlinearity

Title terms can be considered not important because they do not appear in the rest of the text.
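
A rough illustration of this nonlinearity, assuming the common choice of `min` as the fuzzy AND (the slides do not say which operator fcc uses) and hypothetical membership degrees in [0, 1]:

```python
def fuzzy_and(*degrees: float) -> float:
    """Fuzzy conjunction using the minimum t-norm (an assumed operator)."""
    return min(degrees)


def title_and_emphasis(title: float, emphasis: float) -> float:
    """Degree to which 'the term is in the title AND emphasized' holds."""
    return fuzzy_and(title, emphasis)


# Hypothetical degrees: a rhetorical title term that is never emphasized,
# versus a term that is in the title and also emphasized in the body.
rhetorical = title_and_emphasis(title=1.0, emphasis=0.0)
topical = title_and_emphasis(title=1.0, emphasis=0.8)
```

With a conjunction, a high title score alone cannot make a term important, whereas in a weighted sum it always contributes its share.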

A quick glance at fcc

Close to natural language.

Knowledge base: defined by a set of IF-THEN rules.

Rules are based on how humans read documents.

Basic Clustering Settings

Preprocessing: we remove stopwords and punctuation, and strip suffixes with Porter's algorithm.

Clustering: Cluto-rbr with default parameters.

Web page representations: tf-idf and fcc.

Dimension reduction techniques (100, 500, 1000, 2000 and 5000 features): mft and lsi.

Datasets: Banksearch and Webkb.

Evaluation: F-measure to assess clustering quality.
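
The slides do not spell out which F-measure variant is used. One common definition for clustering matches each gold class with the cluster that maximizes F1 and weights the result by class size; the sketch below implements that variant as an assumption, with the hypothetical name `clustering_f_measure` and invented toy labels.

```python
from collections import Counter


def clustering_f_measure(gold: list[str], predicted: list[int]) -> float:
    """Class-size-weighted best-match F1 between gold classes and clusters."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    n = len(gold)
    class_sizes = Counter(gold)
    cluster_sizes = Counter(predicted)
    overlap = Counter(zip(gold, predicted))  # (class, cluster) -> shared pages

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue
            precision = shared / clu_size
            recall = shared / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


# Toy labels, invented for illustration only.
gold = ["bank", "bank", "sport", "sport"]
perfect = clustering_f_measure(gold, [0, 0, 1, 1])
mixed = clustering_f_measure(gold, [0, 1, 0, 1])
```

A clustering that reproduces the gold classes scores 1, while one that splits every class evenly across two clusters scores lower.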

Dimension Reduction Analysis

Hypothesis

If lsi improves on mft, then the weighting function is not able to find the most representative terms.

Rep.          Avg.    S.D.
Banksearch
tf-idf mft    0.748   0.028
tf-idf lsi    0.756   0.005
fcc mft       0.756   0.019
fcc lsi       0.769   0.011
Webkb
tf-idf mft    0.460   0.051
tf-idf lsi    0.507   0.006
fcc mft       0.469   0.009
fcc lsi       0.466   0.011

Conclusion

The weighting function is not working as well as it could.

Conclusion

Results for fcc on the Webkb dataset are surprisingly bad.

Results for Criteria Analysis

Banksearch
Rep.\Dim.    100     500     1000    2000    5000
fcc mft      0.723   0.757   0.768   0.765   0.768
title        0.626   0.646   0.632   0.634   0.639
emphasis     0.586   0.671   0.674   0.685   0.693
frequency    0.689   0.715   0.720   0.724   0.731
position     0.310   0.525   0.538   0.599   0.608

For Banksearch, fcc always achieves higher values than the individual criteria, so the combination works better in all cases.

Frequency seems to be the best among the individual criteria.
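
The Avg. and S.D. figures reported earlier for fcc mft on Banksearch appear to match the mean and sample standard deviation of the fcc mft row across the five dimensions. The slides do not state how those summary figures were computed, so the following is only a quick check under that assumption:

```python
from statistics import mean, stdev

# F-measure of fcc mft on Banksearch at 100, 500, 1000, 2000 and 5000 features.
fcc_mft_banksearch = [0.723, 0.757, 0.768, 0.765, 0.768]

avg = round(mean(fcc_mft_banksearch), 3)
# stdev() is the sample standard deviation (n - 1 in the denominator).
sd = round(stdev(fcc_mft_banksearch), 3)
print(avg, sd)
```

The rounded values are 0.756 and 0.019, the same figures listed for fcc mft on Banksearch in the dimension reduction results.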

Results for Criteria Analysis

Webkb
Rep.\Dim.    100     500     1000    2000    5000
fcc mft      0.453   0.472   0.475   0.468   0.475
title        0.432   0.433   0.404   0.488   0.479
emphasis     0.415   0.431   0.433   0.465   0.489
frequency    0.441   0.460   0.460   0.468   0.446
position     0.301   0.283   0.317   0.281   0.286

For Webkb, fcc does not always outperform the others.

Frequency is not always the best among the individual criteria.

When title and emphasis could lead to a better clustering, the combination gets worse.

Improving the Combination

Frequency should influence the decision more than position.

IF Title   AND Frequency   AND Emphasis   AND Position       THEN Importance
   Low         Medium          Low            Preferential    ⇒ Low
   Low         Medium          Low            Standard        ⇒ No
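
One way to picture such a knowledge base is as a list of rules, each firing to the degree that its weakest antecedent holds. The sketch below encodes only the two rules shown above. The `min`/`max` operators, the `Rule` type, the `fire` function and the membership degrees are assumptions for illustration, not details taken from fcc.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    title: str
    frequency: str
    emphasis: str
    position: str
    importance: str


# The two rules listed on the slide.
RULES = [
    Rule("Low", "Medium", "Low", "Preferential", "Low"),
    Rule("Low", "Medium", "Low", "Standard", "No"),
]


def fire(rules: list[Rule], degrees: dict[str, dict[str, float]]) -> dict[str, float]:
    """Return, per output label, the strongest rule activation.

    degrees[criterion][label] is the membership degree of the term in that
    fuzzy set; AND is taken as min and rule aggregation as max (assumed).
    """
    activation: dict[str, float] = {}
    for rule in rules:
        strength = min(
            degrees["title"].get(rule.title, 0.0),
            degrees["frequency"].get(rule.frequency, 0.0),
            degrees["emphasis"].get(rule.emphasis, 0.0),
            degrees["position"].get(rule.position, 0.0),
        )
        activation[rule.importance] = max(activation.get(rule.importance, 0.0), strength)
    return activation


# Hypothetical membership degrees for a single term.
term_degrees = {
    "title": {"Low": 1.0},
    "frequency": {"Medium": 0.7},
    "emphasis": {"Low": 0.9},
    "position": {"Preferential": 0.2, "Standard": 0.8},
}
result = fire(RULES, term_degrees)
```

In this toy setting the two rules differ only in the position label, so position alone decides between the Low and No outputs.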

Extended Fuzzy Combination of Criteria (efcc)

IF Title AND Frequency AND Emphasis AND Position THEN Importance

High  High                  ⇒ Very High
High  Medium  Preferential  ⇒ High
High  Medium  Standard      ⇒ Medium
High  Low     Preferential  ⇒ Medium
High  Low     Standard      ⇒ Low
Low   High    Preferential  ⇒ High
Low   High    Standard      ⇒ Medium
Low   Medium  Preferential  ⇒ Medium
Low   Medium  Standard      ⇒ Low
Low   Low     Preferential  ⇒ Low
Low   Low     Standard      ⇒ No

High    ⇒ Very High
Medium  ⇒ Medium
Low     ⇒ No

System Comparison

With efcc, both reduction methods get similar results.

Rep.          Avg.    S.D.
Banksearch
tf-idf lsi    0.756   0.005
fcc lsi       0.769   0.011
efcc mft      0.760   0.014
efcc lsi      0.758   0.013
Webkb
tf-idf lsi    0.507   0.006
fcc mft       0.469   0.009
efcc mft      0.532   0.032
efcc lsi      0.483   0.000

efcc solves the problems of fcc in Webkb.

Summary

We present a term weighting function based on how humans read documents.

The representation is not oriented to specific sets of web pages.

Nonlinear systems help express relations among criteria.

With a good term weighting function it is possible to use lightweight dimension reduction techniques.

Our system tries to ease the communication between technical and linguistic experts.

Anchor texts were also studied as a way of adding contextual information.

Thank You!
