Solving Some Text Mining Problems with Conceptual Graphs
M. Bogatyrev, V. Tuhtin
Tula State University, Faculty of Cybernetics
Laboratory of Information Systems
2008
The Nature of Text Mining
Data mining:
• "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1]
• "the science of extracting useful information from large data sets or databases" [2]
Sources for the definitions:
1. W. Frawley, G. Piatetsky-Shapiro and C. Matheus (Fall 1992). Knowledge Discovery in Databases: An Overview. AI Magazine, pp. 213–228.
2. D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
Text mining:
• process of deriving high quality information from text;
• text data mining;
• text analytics
Text mining is interdisciplinary:
• information retrieval, • machine learning, • statistics, • computational linguistics
Computational (Corpora) Linguistics
• text categorization, • text clustering, • concept/entity extraction, • sentiment analysis, • document summarization
Natural Language Processing
• annotation, • abstraction, • ontologies, • semantic roles
Objects of tagging
• clusters,• trends,• associations,• deviations
Corpora: • large and structured text, • tagging
Data: plain text
Knowledge Models: • rules, • ontologies
Processing objects
Metadata
Knowledge Discovery
Text Mining
Global Problems
Analysis of: • syntax, • grammar, • morphology, • semantics
Problems
Conceptual Graph: Example
“John is going to Boston by bus”
Concepts
Conceptual relations
Representations. Conceptual Graph Standard by J. Sowa [8]
1. Conceptual Graph Interchange Form (CGIF)
[City*a:'Boston'] [Bus*b:''] [Person*c:'John'] [Going*d:''] (agent?d?c) (dest?d?a) (instrument?d?b)
2. XML Form
<graph id="35979486054" owner="0"> <type> <label>Proposition</label> </type>
<layout> <rectangle x="0.0" y="0.0" width="1500.0" height="1500.0"/> <color foreground="0,0,175" background="0,0,175"/> </layout> … </layout> </arrow>
</graph></conceptualgraph>
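Applying Predicate Calculus (CGIF + NOTIO)
$$(\exists x{:}\,\mathrm{Go})(\exists y{:}\,\mathrm{Person})(\exists z{:}\,\mathrm{City})(\exists w{:}\,\mathrm{Bus})\,(\mathrm{Name}(y, \text{'John'}) \wedge \mathrm{Name}(z, \text{'Boston'}) \wedge \mathrm{Agnt}(x, y) \wedge \mathrm{Dest}(x, z) \wedge \mathrm{Inst}(x, w))$$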
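A minimal sketch of how the example graph could be held in memory and written out in the CGIF-like notation shown on this slide. The class and function names (`Concept`, `Relation`, `to_cgif`) are hypothetical and are not part of Notio or of any CG standard; the output format simply mirrors the string on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    type: str           # concept type, e.g. "City"
    label: str          # defining label used by relations, e.g. "a"
    referent: str = ""  # individual name; empty for a generic concept


@dataclass(frozen=True)
class Relation:
    type: str    # relation type, e.g. "agent"
    args: tuple  # labels of the concepts the relation links


def to_cgif(concepts, relations):
    """Serialize concepts and relations in the notation used on the slide."""
    parts = [f"[{c.type}*{c.label}:'{c.referent}']" for c in concepts]
    parts += ["(" + r.type + "".join("?" + a for a in r.args) + ")"
              for r in relations]
    return " ".join(parts)


# "John is going to Boston by bus"
concepts = [
    Concept("City", "a", "Boston"),
    Concept("Bus", "b"),
    Concept("Person", "c", "John"),
    Concept("Going", "d"),
]
relations = [
    Relation("agent", ("d", "c")),
    Relation("dest", ("d", "a")),
    Relation("instrument", ("d", "b")),
]

cgif_text = to_cgif(concepts, relations)
print(cgif_text)
```

Keeping concepts and relations as separate lists follows the bipartite structure of a conceptual graph: relations refer to concepts only through their labels.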
Conceptual Graphs in Digital Libraries
Supporting CGs in Digital Libraries:
1. Building and storing CGs
• Automated building of CGs
• Organizing access to CGs in Datastore
2. Solving applied problems with CGs
• Automated building and developing catalogues and rubricators of DLs
• KDD problems
Semantic Role Labelling helps to create conceptual relations in CGs
Supporting Conceptual Graphs: Building and storing CGs
Standard way of building CG
• The sentences are marked with part-of-speech tags.
• Some titles and sentences from abstracts are filtered.
• The selected sentences are parsed to obtain their syntactic trees.
• The syntactic tree is traversed and the canonical conceptual graphs related to its nodes are joined.
Lexical restrictions are needed:
1. DL contains scientific papers
2. Only abstracts are transformed to CGs
http://framenet.icsi.berkeley.edu/
http://wordnet.princeton.edu/
Semantic Role Labelling for CGs Building
“The working of a genetic algorithm is usually explained by the search for superior building blocks”.
“John is going to Boston by bus”
http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
Conceptual Graphs in Some Text Mining Problems
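1. Building Association Rules
$R = \{r_1, \ldots, r_n\}$ - initial set
$T = \{t_1, \ldots, t_m\}$ - transactional set (subsets of $R$ represented in $T$)
$X \subset R,\; Y \subset R \setminus X$
$X \Rightarrow Y$ - Association Rule
Supported by
$$\frac{1}{m} \sum_{j=1}^{m} [\, t_j \supseteq X \cup Y \,]$$
Having Confidence as
$$\frac{\sum_{j=1}^{m} [\, t_j \supseteq X \cup Y \,]}{\sum_{j=1}^{m} [\, t_j \supseteq X \,]}$$
$G = \{g_1, \ldots, g_n\}$ - Set of CGs
$\hat{G} = \{\hat{g}_1, \ldots, \hat{g}_m\}$ - Set of generalized CGs
Generalization for concepts
Disjoin for relations
$\hat{g}_i \Rightarrow g_k$ - Association Rule on CGs
Confidence: $\dfrac{N[\hat{g}, g]}{N[\hat{g}]}$
Support: $\dfrac{N[\hat{g}, g]}{n}$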
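The support and confidence of a rule X ⇒ Y over a transactional set can be computed by direct counting. The sketch below assumes that a transaction "contains" an itemset when the itemset is a subset of it, which is the standard reading of the bracketed condition; the function names and the toy transactions are illustrative only.

```python
def support(transactions, x, y):
    """Fraction of transactions that contain every item of X and of Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, x, y):
    """Among transactions containing X, the fraction that also contain Y."""
    both = x | y
    n_x = sum(1 for t in transactions if x <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_x if n_x else 0.0


# Toy transactional set: each transaction is a set of terms (made-up data).
transactions = [
    {"graph", "concept", "relation"},
    {"graph", "concept"},
    {"graph", "cluster"},
    {"text", "mining"},
]
x, y = {"graph"}, {"concept"}

rule_support = support(transactions, x, y)
rule_confidence = confidence(transactions, x, y)
print(rule_support, rule_confidence)
```

For rules on conceptual graphs the same counting scheme would apply once "contains" is replaced by a test of whether a graph is covered by a generalized graph; that test is not shown here.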
Conceptual Graphs in Some Text Mining Problems
2. Building Ontologies by Aggregation of CGs
Supporting Contexts:
- with CGs:
- with Corpora:
In analyzing the ambiguities, Wittgenstein developed his theory of language games, which allow words to have different senses in different contexts, applications, or modes of use.
Solving Text Mining Problems by CGs Clustering
CGs Hierarchy
• CGs Contexts problem
• CGs Similarity problem
• Clustering algorithm for specific similarity measures
Conceptual Graphs Clustering
Similarity Measures
Conceptual similarity
$$s_c = \frac{2\,n(G_c)}{n(G_1) + n(G_2)}$$
Relational similarity
$$s_r = \frac{2\,m(G_c)}{m_{G_c}(G_1) + m_{G_c}(G_2)}$$
Some modifications of similarity measures
$$s_c = \frac{2\,n(G_c)}{n(G_1) + n(G_2)}\, l,$$
$$l = \begin{cases} k\,\dfrac{n(G_1)}{n(G_2)}, & \text{if } n(G_1) \le n(G_2) \\[2ex] k\,\dfrac{n(G_2)}{n(G_1)}, & \text{if } n(G_1) > n(G_2) \end{cases}$$
ibothG bbbmGmc
...)( 21
bothmmi ,...,1
Unified similarity measure
$$s = d_1 s_c + d_2 s_r$$
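A small sketch of the conceptual and relational similarity measures, under simplifying assumptions that are not from the slides: the overlap G_c is taken as the exact-match intersection of the two graphs (no generalization through a type hierarchy), the neighbourhood count m_Gc(G) is taken as the number of relations of G that touch at least one common concept, and the unified measure is taken as a weighted sum with hypothetical weights d1 and d2. Graphs are plain Python sets, with relations written as (type, source, target) triples.

```python
def conceptual_similarity(concepts1, concepts2):
    """s_c = 2 n(G_c) / (n(G_1) + n(G_2)), with G_c as the common concepts."""
    total = len(concepts1) + len(concepts2)
    if total == 0:
        return 0.0
    return 2 * len(concepts1 & concepts2) / total


def relational_similarity(concepts1, relations1, concepts2, relations2):
    """s_r = 2 m(G_c) / (m_Gc(G_1) + m_Gc(G_2)) under the assumptions above."""
    common_concepts = concepts1 & concepts2
    common_relations = relations1 & relations2

    def neighbourhood(relations):
        # Relations with at least one end attached to a common concept.
        return sum(1 for _, src, dst in relations
                   if src in common_concepts or dst in common_concepts)

    total = neighbourhood(relations1) + neighbourhood(relations2)
    return 2 * len(common_relations) / total if total else 0.0


def unified_similarity(s_c, s_r, d1=0.5, d2=0.5):
    """Weighted sum of both measures; d1 and d2 are assumed weights."""
    return d1 * s_c + d2 * s_r


# G1: "John is going to Boston by bus"; G2: the same trip by train.
c1 = {"Person:John", "Going", "City:Boston", "Bus"}
r1 = {("agent", "Going", "Person:John"),
      ("dest", "Going", "City:Boston"),
      ("instrument", "Going", "Bus")}
c2 = {"Person:John", "Going", "City:Boston", "Train"}
r2 = {("agent", "Going", "Person:John"),
      ("dest", "Going", "City:Boston"),
      ("instrument", "Going", "Train")}

s_c = conceptual_similarity(c1, c2)
s_r = relational_similarity(c1, r1, c2, r2)
s = unified_similarity(s_c, s_r)
print(s_c, s_r, s)
```

Both measures have the form of a Dice coefficient, so they lie between 0 (nothing shared) and 1 (identical graphs).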
Genetic algorithm: specifics of solutions
2 21 2 1 20.5 ( ) 0.5(cos2 cos2 )
1 2( , ) (20 20( )) 30x x x xy x x e e e
(Surface plot: both horizontal axes from -20 to 20, vertical axis from 0 to 30)
Ackley test function
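For reference, a sketch of the two-variable Ackley function in its common textbook form, with the usual constants a = 20, b = 0.2 and c = 2π. The variant plotted on the slide may differ (for example by a sign change or an offset), so this is an assumption rather than the authors' exact formula.

```python
import math


def ackley(x1, x2, a=20.0, b=0.2, c=2 * math.pi):
    """Textbook two-variable Ackley function; global minimum 0 at (0, 0)."""
    root_mean_square = math.sqrt(0.5 * (x1 * x1 + x2 * x2))
    mean_cosine = 0.5 * (math.cos(c * x1) + math.cos(c * x2))
    return (a + math.e
            - a * math.exp(-b * root_mean_square)
            - math.exp(mean_cosine))


value_at_origin = ackley(0.0, 0.0)
value_at_one = ackley(1.0, 1.0)
print(value_at_origin, value_at_one)
```

The cosine term creates a regular grid of local optima around the single global optimum, which is why the function is a common test of whether a genetic algorithm gets trapped.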
Fitness function trajectories (horizontal axis: number of generations, 50 to 300; vertical axis: fitness, 1000 to 2000)
Initial population
Final population
Genetic algorithm for clustering
GA chromosomes representing the clustering $\{\{X_1, X_3, X_6\}, \{X_2, X_4, X_5\}\}$ for various encoding schemes [5]:
(a) group number; (b) matrix; (c) permutation with the separator character 7; (d) greedy permutation; (e) order based.
Our encoding scheme:
gene $a_i$ holds the number of an object that is in the same cluster as the $i$-th object
Chromosome: $a_1\; a_2\; \ldots\; a_i\; \ldots\; a_n$ ($n$ objects)
Clusters: {X1, X3, X6}, {X2, X4, X5}
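A sketch of how a chromosome in this encoding could be decoded into clusters, assuming each gene holds the 1-based number of some object in the same cluster as its own position. Clusters are then the connected groups formed by these links, found here with a small union-find; the function name and the sample chromosomes are illustrative only.

```python
def decode_chain(chromosome):
    """Turn genes a_1..a_n (1-based object numbers) into a list of clusters."""
    n = len(chromosome)
    parent = list(range(n + 1))  # index 0 is unused; objects are 1..n

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Gene a_i says: object i and object a_i belong to the same cluster.
    for i, a in enumerate(chromosome, start=1):
        root_i, root_a = find(i), find(a)
        if root_i != root_a:
            parent[root_i] = root_a

    clusters = {}
    for i in range(1, n + 1):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two different chromosomes that decode to {X1, X3, X6}, {X2, X4, X5}.
clusters_a = decode_chain([3, 4, 6, 5, 2, 1])
clusters_b = decode_chain([3, 4, 1, 2, 4, 3])
print(clusters_a, clusters_b)
```

Several chromosomes map to the same clustering, so the encoding is redundant; in exchange, any sequence of valid object numbers decodes to a legal clustering and the number of clusters does not have to be fixed in advance.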
Genetic algorithm for clustering
Chain encoding for Conceptual Graphs:
• realizes implicit parallelism of genetic algorithms;
• makes the clustering algorithm work faster;
• is invariant under similarity measure on CGs
Chromosome: $a_1\; a_2\; \ldots\; a_i\; \ldots\; a_n$ ($n$ objects)
Idea: the fitness function of the GA can be varied by varying its parameters.
EVO – LIB Project
System architecture
Components: user, agent, system core, GA module, individual evaluation module, aggregation module, association extraction module, database (DB), metadatabase.
Labels on the links between components: individuals, fitness, objects, object set, relations, associative links, queries, characteristic queries, results, DB structure, semantic information.
Conceptual Graphs Clustering: Data Example for Clustering
1. We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)).
2. We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function.
3. At such extremes our intuition breaks down.
4. A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs.
5. A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs.
6. Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of “attraction” to our evolutionary algorithms.
7. Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator.
8. We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima.
9. This allows us to create functions with multiple local optima attractive to crossover.
10. The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover.
11. This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover.
12. This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.
Conceptual Graphs Clustering: clustering results
- applying conceptual nearness
- applying relational nearness
Summary
1. Conceptual graphs are a promising tool for modelling the semantics of texts in DLs.
2. The process of creating ontologies can be based on technologies that use conceptual graphs.
3. Conceptual graph clustering helps in solving structural problems in DLs and in understanding their data.
4. The evolutionary approach is promising for semantic modelling with conceptual graphs.
5. Progress in CG technologies requires the joint efforts of computer specialists and linguists.
References
1. A World of Conceptual Graphs: http://conceptualgraphs.org/
2. Boytcheva, S., Dobrev, P., Angelova, G. CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science No. 2120, Springer, 2001.
3. Southey, F., Linders, J. G. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, 1999. Pp. 262-271.
4. Hirst, G. Ontology and the Lexicon. Handbook on Ontologies in Information Systems. Berlin: Springer, 2003.
5. Cole, R. M. Clustering With Genetic Algorithms. http://citeseer.ist.psu.edu/cole98clustering.html
6. Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates. Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science Vol. 2393. Springer-Verlag, 2002. Pp. 122-136.
7. Sarbo, J. Formal conceptual structure in language. In Dubois, D. M., editor, Proceedings of Computing Anticipatory Systems (CASYS'98), pp. 289-300, Woodbury, New York, 1999.
8. Sowa, J. F. Conceptual Graphs: Draft Proposed American National Standard. International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer, 1999.
9. Bogatyrev, M. Yu., Latov, V. E. A Study of Genetic Clustering Algorithms. Izvestiya TulGU, Ser. Mathematics. Mechanics. Informatics. Vol. 8, issue 3. Informatics. Tula, 2002. Pp. 101-107. (In Russian.)
10. Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press. Reprinted by MIT, 1992.
11. Rastrigin, L. A. Adaptation of Complex Systems. Riga: Zinatne, 1981. 375 p. (In Russian.)
12. Emelyanov, V. V., Kureichik, V. V., Kureichik, V. M. Theory and Practice of Evolutionary Modelling. Moscow: Fizmatlit, 2003. 432 p. (In Russian.)
13. Bogatyrev, M. Yu. Genetic Algorithms: Principles of Operation, Modelling, Application. Tula: TulGU, 2003. 152 p. (In Russian.)
14. Bogatyrev, M. Modelling Systems With Symmetry. Proceedings of the 4th International IMACS Symposium of Mathematical Modelling. Vienna, Austria, February 5-7, 2003. ARGESIM-Verlag, Vienna, 2003. Pp. 270-275.
15. Bogatyrev, M., Latov, V., Avdeev, K. Symmetry Based Decomposition and its Application in Evolutionary Modelling. Applied Mathematica: Proc. of 8th International Mathematica Symposium. Avignon, France, 19-23 June 2006.