Solving Some Text Mining Problems with Conceptual Graphs

Solving Some Text Mining Problems with Conceptual Graphs M. Bogatyrev, V. Tuhtin Tula State University Faculty of Cybernetics Laboratory of Information Systems 2008


Page 1: Solving Some Text Mining Problems with Conceptual Graphs

Solving Some Text Mining Problems with Conceptual Graphs

M. Bogatyrev, V. Tuhtin

Tula State University, Faculty of Cybernetics

Laboratory of Information Systems

2008

Page 2: Solving Some Text Mining Problems with Conceptual Graphs

The Nature of Text Mining

1. W. Frawley, G. Piatetsky-Shapiro and C. Matheus (Fall 1992). Knowledge Discovery in Databases: An Overview. AI Magazine: pp. 213–228.

2. D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.

Data mining:
• "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1]
• "the science of extracting useful information from large data sets or databases." [2]

Text mining:
• process of deriving high-quality information from text;
• text data mining;
• text analytics

Text mining is interdisciplinary:
• information retrieval,
• machine learning,
• statistics,
• computational linguistics

Page 3: Solving Some Text Mining Problems with Conceptual Graphs

Computational (Corpora) Linguistics

Text Mining

Problems:
• text categorization,
• text clustering,
• concept/entity extraction,
• sentiment analysis,
• document summarization

Global Problems: Knowledge Discovery
• clusters,
• trends,
• associations,
• deviations

Natural Language Processing

Analysis of:
• syntax
• grammar
• morphology
• semantics

Corpora:
• large and structured text
• tagging

Objects of tagging:
• annotation
• abstraction
• ontologies
• semantic roles

Data: Plain text

Metadata

Processing objects

Knowledge Models:
• rules;
• ontologies

Page 4: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graph: Example: “John is going to Boston by bus”

Concepts

Conceptual relations

Representations. Conceptual Graph Standard by J. Sowa [8]

1. Conceptual Graph Interchange Form (CGIF)

[City*a:'Boston'] [Bus*b:''] [Person*c:'John'] [Going*d:''] (agent?d?c) (dest?d?a) (instrument?d?b)

2. XML Form

<graph id="35979486054" owner="0"> <type> <label>Proposition</label> </type>

<layout> <rectangle x="0.0" y="0.0" width="1500.0" height="1500.0"/> <color foreground="0,0,175" background="0,0,175"/> </layout> … </layout> </arrow>

</graph></conceptualgraph>

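Applying Predicate Calculus (CGIF + NOTIO)

$$(\exists x : \mathrm{Go})(\exists y : \mathrm{Person})(\exists z : \mathrm{City})(\exists w : \mathrm{Bus})\,(\mathrm{Name}(y, \text{'John'}) \wedge \mathrm{Name}(z, \text{'Boston'}) \wedge \mathrm{Agnt}(x, y) \wedge \mathrm{Dest}(x, z) \wedge \mathrm{Inst}(x, w))$$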
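The CGIF line for the example sentence can be generated from a small in-memory graph. The sketch below is only an illustration: the `Concept` and `Relation` classes and the `to_cgif` helper are hypothetical names, not part of NOTIO or of the CG standard, and the output format simply mirrors the CGIF string shown on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str          # coreference label, e.g. "a"
    type: str           # concept type, e.g. "City"
    referent: str = ""  # individual name; empty for a generic concept


@dataclass(frozen=True)
class Relation:
    type: str    # conceptual relation, e.g. "agent"
    args: tuple  # labels of the linked concepts, in order


def to_cgif(concepts, relations):
    """Serialize concepts and relations in the CGIF style used on the slide."""
    parts = [f"[{c.type}*{c.label}:'{c.referent}']" for c in concepts]
    parts += ["(" + r.type + "".join("?" + a for a in r.args) + ")"
              for r in relations]
    return " ".join(parts)


# "John is going to Boston by bus"
concepts = [
    Concept("a", "City", "Boston"),
    Concept("b", "Bus"),
    Concept("c", "Person", "John"),
    Concept("d", "Going"),
]
relations = [
    Relation("agent", ("d", "c")),
    Relation("dest", ("d", "a")),
    Relation("instrument", ("d", "b")),
]

cgif = to_cgif(concepts, relations)
print(cgif)
```

Keeping concepts and relations as separate flat lists matches the bipartite structure of a conceptual graph and makes later counting of concept and relation nodes straightforward.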

Page 5: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs in Digital Libraries

Supporting CGs in Digital Libraries:

1. Building and storing CGs
• Automated building of CGs
• Organizing access to CGs in Datastore

2. Solving applied problems with CGs
• Automated building and developing catalogues and rubricators of DLs
• KDD problems

Page 6: Solving Some Text Mining Problems with Conceptual Graphs

Supporting Conceptual Graphs: Building and storing CGs

Semantic Role Labelling helps to create conceptual relations in CGs

Standard way of building CG

• The sentences are marked with part-of-speech tags.

• Some titles and sentences from abstracts are filtered.

• The selected sentences are parsed to obtain their syntactic trees.

• Each syntactic tree is traversed, and the canonical conceptual graphs related to its nodes are joined.

Lexical restrictions are needed:

1. DL contains scientific papers

2. Only abstracts are transformed to CGs

http://framenet.icsi.berkeley.edu/

http://wordnet.princeton.edu/

Page 7: Solving Some Text Mining Problems with Conceptual Graphs

Semantic Role Labelling for CGs Building

“The working of a genetic algorithm is usually explained by the search for superior building blocks”.

“John is going to Boston by bus”

http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
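How a labelled frame might be turned into conceptual relations can be sketched as follows. This is a toy illustration under stated assumptions: the role names (`agent`, `destination`, `instrument`), the `ROLE_TO_RELATION` table and the `frame_to_graph` function are hypothetical and do not reproduce the label set or the output format of the SRL demo linked above.

```python
# Hypothetical mapping from semantic role names to conceptual relation names.
ROLE_TO_RELATION = {
    "agent": "agent",
    "destination": "dest",
    "instrument": "instrument",
}


def frame_to_graph(predicate, roles):
    """Build concepts and relations from one labelled predicate frame.

    roles maps a role name to a (concept_type, referent) pair.
    Returns (concepts, relations), where concepts maps a label to
    (type, referent) and relations is a list of (relation, from, to).
    """
    concepts = {"p": (predicate, "")}  # the predicate becomes a concept node
    relations = []
    for i, (role, (ctype, referent)) in enumerate(sorted(roles.items())):
        label = f"x{i}"
        concepts[label] = (ctype, referent)
        # Each semantic role becomes a relation from the predicate concept.
        relations.append((ROLE_TO_RELATION[role], "p", label))
    return concepts, relations


# "John is going to Boston by bus", with roles assigned by hand.
frame = {
    "agent": ("Person", "John"),
    "destination": ("City", "Boston"),
    "instrument": ("Bus", ""),
}
concepts, relations = frame_to_graph("Going", frame)
for relation in relations:
    print(relation)
```

In a real pipeline the frame would come from a semantic role labeller rather than being written by hand; the point of the sketch is only that each labelled argument yields one concept node and one relation node.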

Page 8: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs in Some Text Mining Problems

1. Building Association Rules

$R = \{r_1, \dots, r_n\}$ - initial set

$T = \{t_1, \dots, t_m\}$ - transactional set (subsets of $R$ represented in $T$)

$X \subset R,\; Y \subset R \setminus X$; $X \Rightarrow Y$ - Association Rule

Supported by

$$\frac{\sum_{j=1}^{m} [\,t_j \supseteq X \cup Y\,]}{m}$$

Having Confidence as

$$\frac{\sum_{j=1}^{m} [\,t_j \supseteq X \cup Y\,]}{\sum_{j=1}^{m} [\,t_j \supseteq X\,]}$$

$G = \{g_1, \dots, g_n\}$ - Set of CGs

$\{\hat{g}_1, \dots, \hat{g}_m\}$ - Set of generalized CGs

$\hat{g}_i \Rightarrow g_k$ - Association Rule on CGs

Generalization for concepts; Disjoin for relations

$$\frac{N[\hat{g}, g]}{n}, \qquad \frac{N[\hat{g}, g]}{N[\hat{g}]}$$
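Support and confidence of an association rule X ⇒ Y over a transactional set T can be computed directly from their definitions. The sketch below is a minimal illustration; the function names and the toy transactions are assumptions made for the example and are not data from the presentation.

```python
def support(transactions, x, y):
    """Fraction of transactions that contain every item of X and of Y."""
    items = set(x) | set(y)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Among the transactions containing X, the fraction also containing Y."""
    x, y = set(x), set(y)
    with_x = sum(x <= t for t in transactions)
    with_xy = sum((x | y) <= t for t in transactions)
    return with_xy / with_x if with_x else 0.0


# Toy transactional set: each transaction is a subset of the initial set R.
T = [
    {"graph", "concept", "relation"},
    {"graph", "concept"},
    {"graph", "cluster"},
    {"concept", "relation"},
]

s = support(T, {"graph"}, {"concept"})      # 2 of 4 transactions
c = confidence(T, {"graph"}, {"concept"})   # 2 of the 3 containing "graph"
print(s, c)
```

The same two ratios carry over to conceptual graphs if "transaction contains X" is replaced by "graph g is covered by the generalized graph ĝ", which is how the N[·] counts on this slide can be read.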

Page 9: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs in Some Text Mining Problems

2. Building Ontologies by Aggregation of CGs

Supporting Contexts:

- with CGs:

- with Corpora:

In analyzing the ambiguities, Wittgenstein developed his theory of language games, which allow words to have different senses in different contexts, applications, or modes of use.

Page 10: Solving Some Text Mining Problems with Conceptual Graphs

Solving Text Mining Problems by CGs Clustering

CGs Hierarchy

• CGs Contexts problem

• CGs Similarity problem

?Clustering algorithm for specific similarity measures

Page 11: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs Clustering

Similarity Measures

Conceptual similarity

$$s_c = \frac{2\,n(G_c)}{n(G_1) + n(G_2)}$$

Relational similarity

$$s_r = \frac{2\,m(G_c)}{m_{G_c}(G_1) + m_{G_c}(G_2)}$$

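Some modifications of similarity measures

$$s_c = \frac{2\,n(G_c)}{n(G_1) + n(G_2)}\, l, \qquad l = \begin{cases} k\,\dfrac{n(G_1)}{n(G_2)}, & \text{if } n(G_1) \le n(G_2) \\[1ex] k\,\dfrac{n(G_2)}{n(G_1)}, & \text{if } n(G_1) > n(G_2) \end{cases}$$

[Formula not recoverable from the transcript: $m_{G_c}(G)$ is expressed through terms $b_1, b_2, \dots$ with $i = 1, \dots, m_{both}$.]

Unified similarity measure

$$s = d_1 s_c + d_2 s_r$$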
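The conceptual, relational and unified similarity measures can be tried out on two small graphs. The sketch below rests on assumptions that the slide does not spell out: n(G) is taken as the number of concept nodes, G_c as the overlap of the two graphs, m(G_c) as the number of relations shared by both graphs, and m_{G_c}(G_i) as the number of relations of G_i that touch at least one concept of G_c. The graph representation (a set of concept names and a set of relation triples) and the default weights d1 = d2 = 0.5 are also illustrative choices.

```python
def conceptual_similarity(concepts1, concepts2):
    """Dice-style overlap of the concept nodes of two graphs."""
    total = len(concepts1) + len(concepts2)
    return 2 * len(concepts1 & concepts2) / total if total else 0.0


def relational_similarity(concepts1, relations1, concepts2, relations2):
    """Shared relations relative to the relations around the common concepts."""
    common_concepts = concepts1 & concepts2
    common_relations = relations1 & relations2

    def around_overlap(relations):
        # Relations with at least one end in the common concepts.
        return sum(1 for _, src, dst in relations
                   if src in common_concepts or dst in common_concepts)

    denominator = around_overlap(relations1) + around_overlap(relations2)
    return 2 * len(common_relations) / denominator if denominator else 0.0


def unified_similarity(s_c, s_r, d1=0.5, d2=0.5):
    """Weighted combination of conceptual and relational similarity."""
    return d1 * s_c + d2 * s_r


# G1: "John is going to Boston by bus"; G2: "Mary is going to Boston by car".
c1 = {"Person:John", "Going", "City:Boston", "Bus"}
r1 = {("agent", "Going", "Person:John"),
      ("dest", "Going", "City:Boston"),
      ("instrument", "Going", "Bus")}
c2 = {"Person:Mary", "Going", "City:Boston", "Car"}
r2 = {("agent", "Going", "Person:Mary"),
      ("dest", "Going", "City:Boston"),
      ("instrument", "Going", "Car")}

s_c = conceptual_similarity(c1, c2)
s_r = relational_similarity(c1, r1, c2, r2)
s = unified_similarity(s_c, s_r)
print(s_c, s_r, s)
```

Here the two graphs share two of their four concepts and one of their three relations, so the relational measure is lower than the conceptual one; the weights d1 and d2 decide which of the two dominates the clustering.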

Page 12: Solving Some Text Mining Problems with Conceptual Graphs

Genetic algorithm: speciality of decisions

Ackley test function $y(x_1, x_2)$ [the formula is garbled in the transcript; the surviving fragments are exponential terms in $0.5\,(x_1^2 + x_2^2)$ and in a half-sum of cosines of $x_1$ and $x_2$, together with the constants 20 and 30]

[Figure: surface plots of the Ackley test function with the initial population and the final population; horizontal axes from -20 to 20, vertical axis from 0 to 30]

[Figure: fitness function trajectories; horizontal axis: number of generations (50 to 300); vertical axis: fitness (1000 to 2000)]
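For reference, the Ackley test function in its common textbook form for two variables can be written as below. This is an assumption about the general shape only: the variant used on the slide may differ in its constants or sign, so the code should not be read as the authors' fitness function.

```python
import math


def ackley(x1, x2, a=20.0, b=0.2, c=2 * math.pi):
    """Two-dimensional Ackley function (textbook form, minimum 0 at the origin)."""
    root = math.sqrt(0.5 * (x1 * x1 + x2 * x2))
    cosines = 0.5 * (math.cos(c * x1) + math.cos(c * x2))
    return -a * math.exp(-b * root) - math.exp(cosines) + a + math.e


origin_value = ackley(0.0, 0.0)
other_value = ackley(1.0, 1.0)
print(origin_value, other_value)
```

The function has a single global minimum at the origin surrounded by many shallow local minima, which is why it is a popular check that a genetic algorithm population moves from a spread-out initial state to a concentrated final one.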

Page 13: Solving Some Text Mining Problems with Conceptual Graphs

Genetic algorithm for clustering

$\{\{X_1, X_3, X_6\}, \{X_2, X_4, X_5\}\}$

GA chromosomes representing the clustering for various encoding schemes for clustering[5]:

(a) group number; (b) matrix; (c) permutation with the separator character 7; (d) greedy permutation; (e) order based.

Our encoding scheme:

$a_i$ picks the number of an object which is in the same cluster as the $i$-th object

a1 a2 … ai … an

n objects

Clusters: {X1, X3, X6}, {X2, X4, X5}
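One way to read this encoding is that every gene links object i to another object of its cluster, so the clusters are the connected components of these links. The decoder below is a sketch under that assumption; the function name `decode_clusters` and the sample chromosome are illustrative and do not come from the presentation.

```python
def decode_clusters(chromosome):
    """Decode a chain-encoded chromosome into clusters.

    chromosome[i] is the 1-based number of an object that shares a cluster
    with object i + 1. Clusters are the connected components of these links.
    """
    n = len(chromosome)
    parent = list(range(n))

    def find(i):
        # Root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(chromosome):
        root_i, root_a = find(i), find(a - 1)
        if root_i != root_a:
            parent[root_a] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i + 1)
    return sorted(clusters.values())


# a1=3, a3=6, a6=1 form one chain; a2=4, a4=5, a5=2 form another.
chromosome = [3, 4, 6, 5, 2, 1]
clusters = decode_clusters(chromosome)
print(clusters)
```

With this reading any chromosome of numbers from 1 to n decodes to a valid partition, so crossover and mutation never produce an infeasible individual.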

Page 14: Solving Some Text Mining Problems with Conceptual Graphs

Genetic algorithm for clustering

Chain encoding for Conceptual Graphs:

• realizes implicit parallelism of genetic algorithms;

• makes the clustering algorithm work faster;

• is invariant under the similarity measure on CGs

a1 a2 … ai … an

n objects

Idea: the fitness function of the GA may be varied by varying its parameters

Page 15: Solving Some Text Mining Problems with Conceptual Graphs

EVO – LIB Project

System’s architecture

[Figure: system architecture diagram. Components: user, agent, system core, GA module, individual evaluation module, aggregation module, module for extracting associative links, database (DB), metadatabase. Labelled data flows: individuals, fitness, objects, object set, relations, queries, characteristic queries, results, DB structure, associative links, semantic information.]

Page 16: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs Clustering: Data Example for Clustering

1. We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)).

2. We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function.

3. At such extremes our intuition breaks down.

4. A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs.

5. A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs.

6. Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of “attraction” to our evolutionary algorithms.

7. Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator.

8. We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima.

9. This allows us to create functions with multiple local optima attractive to crossover.

10. The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover.

11. This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover.

12. This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.

Page 17: Solving Some Text Mining Problems with Conceptual Graphs

Conceptual Graphs Clustering: clustering results

- applying conceptual nearness

- applying relational nearness

Page 18: Solving Some Text Mining Problems with Conceptual Graphs

Summary

1. Conceptual graphs are a promising tool for modelling the semantics of texts in DLs.

2. The process of creating ontologies can be based on technologies which use conceptual graphs.

3. Clustering of conceptual graphs helps in solving structural problems in DLs and in understanding their data.

4. The evolutionary approach is promising for semantic modelling with conceptual graphs.

5. To advance CG technologies, joint efforts of computer specialists and linguists are needed.

Page 19: Solving Some Text Mining Problems with Conceptual Graphs

References

1. A World of Conceptual Graphs: http://conceptualgraphs.org/

2. Boytcheva, S., Dobrev, P., Angelova, G. CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science Vol. 2120, Springer, 2001.

3. Southey, F., Linders, J. G. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, 1999, pp. 262-271.

4. Hirst, G. Ontology and the Lexicon. Handbook on Ontologies in Information Systems, Berlin: Springer, 2003.

5. Cole, R. M. Clustering With Genetic Algorithms. http://citeseer.ist.psu.edu/cole98clustering.html

6. Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates. Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science Vol. 2393, Springer-Verlag, 2002, pp. 122-136.

7. Sarbo, J. Formal conceptual structure in language. In Dubois, D. M., editor, Proceedings of Computing Anticipatory Systems (CASYS'98), pp. 289-300, Woodbury, New York, 1999.

8. Sowa, J. Conceptual Graphs: Draft Proposed American National Standard. International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer, 1999.

9. Bogatyrev M.Yu., Latov V.E. Investigation of genetic clustering algorithms (in Russian). Izv. TulGU. Ser. Mathematics. Mechanics. Informatics. Vol. 8, issue 3. Informatics. Tula, 2002, pp. 101-107.

10. Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press. Reprinted by MIT, 1992.

11. Rastrigin L.A. Adaptation of Complex Systems (in Russian). Riga: Zinatne, 1981. 375 p.

12. Emelyanov V.V., Kureichik V.V., Kureichik V.M. Theory and Practice of Evolutionary Modelling (in Russian). Moscow: Fizmatlit, 2003. 432 p.

13. Bogatyrev M.Yu. Genetic Algorithms: Principles of Operation, Modelling, Application (in Russian). Tula, TulGU, 2003. 152 p.

14. M. Bogatyrev. Modelling Systems With Symmetry. Proceedings of the 4th International IMACS Symposium of Mathematical Modelling. Vienna, Austria, February 5-7, 2003. ARGESIM-Verlag, Vienna, 2003, pp. 270-275.

15. M. Bogatyrev, V. Latov, K. Avdeev. Symmetry Based Decomposition and its Application in Evolutionary Modelling. Applied Mathematica: Proc. of 8th International Mathematica Symposium. Avignon, France, 19-23 June 2006.