
Slides from the seminar of the OEG group at UPM, Madrid.


Page 1: Data Mining OPtimization Ontology and its application to meta-mining of knowledge discovery processes

Data Mining OPtimization Ontology and its application to meta-mining of knowledge discovery processes

Agnieszka Lawrynowicz, in collaboration with C. Maria Keet, Melanie Hilario, Claudia d’Amato,

Jedrzej Potoniec and others - see acknowledgements

Poznan University of Technology, Poland

25th September 2014, OEG group seminar at Universidad Politécnica de Madrid (UPM)

Agnieszka Lawrynowicz collaboration with C. Maria Keet, Melanie Hilario, Claudia d’Amato, Jedrzej Potoniec and others - see acknowledgements ( Poznan University of Technology, Poland )Data Mining OPtimization Ontology and its application to meta-mining of knowledge discovery processes25th September 2014 OEG group seminar at Universidad Politecnica de Madrid (UPM) 1

/ 31

Page 2

Outline

Overview of DMOP: purpose, scope, core classes

Modeling issues
▸ meta-modeling in DMOP;
▸ alignment of DMOP with the DOLCE foundational ontology;
▸ qualities and attributes;
▸ property chains in DMOP;
▸ other modeling considerations;

Meta-mining of KDD processes
▸ RapidMiner
▸ RMonto
▸ Fr-ONT-Qu
▸ experimental evaluation

Page 3

Data Mining OPtimization Ontology (DMOP)

the primary goal of DMOP is to support all decision-making steps that determine the outcome of the data mining process;

development started in EU FP7 project e-LICO (2009-2012);

DMOP v5.4: ∼ 750 classes, ∼ 200 properties, ∼ 3200 axioms;

highly axiomatized;

uses almost all OWL 2 features;

Page 4

Overview of meta-learning

Meta-learning: learning to learn

application of machine learning techniques to meta-data about past machine learning experiments;

the goal: to modify some aspect of the learning process to improve the performance of the resulting model;

meta-mining: meta-learning applied to full data mining process
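A minimal sketch of the idea, assuming each past experiment has been reduced to a vector of dataset meta-features plus the name of the algorithm that performed best on it. The meta-features, the algorithm labels and the 1-nearest-neighbour meta-learner are illustrative assumptions, not the e-LICO meta-miner:

```python
import math
from typing import List, Tuple

# One meta-example: (dataset meta-features, best-performing algorithm).
# Hypothetical meta-features: (log #instances, log #features, class entropy).
MetaExample = Tuple[Tuple[float, ...], str]


def recommend(history: List[MetaExample], new_meta: Tuple[float, ...]) -> str:
    """1-nearest-neighbour meta-learner: return the algorithm that won on the
    past dataset whose meta-features are closest (Euclidean) to the new one."""
    if not history:
        raise ValueError("no past experiments to learn from")
    _, algorithm = min(history, key=lambda ex: math.dist(ex[0], new_meta))
    return algorithm


# Hypothetical meta-data about past experiments.
history = [
    ((3.0, 1.0, 0.9), "C4.5"),
    ((2.0, 3.5, 1.0), "SVM"),
    ((4.5, 1.2, 0.5), "NaiveBayes"),
]
print(recommend(history, (2.1, 3.4, 0.95)))  # prints "SVM"
```

A nearest-neighbour meta-learner is simply the easiest "learning to learn" model to read; any classifier trained on the same meta-examples would fit the definition above.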

Page 5

Overview of the e-LICO system

The scenario outlined in Section 2 yields a set of requirements on the underlying data mining platform. This section presents the different components of the e-LICO architecture (Figure 1) and shows how they interact to achieve the user's knowledge discovery goal.

The e-LICO infrastructure (depicted in the figure under the dashed line) is the means by which the data-mining platform is delivered to scientists. The innovative core of the e-LICO platform is the Intelligent Discovery Assistant (IDA, above the dashed line) with its planner and meta-learner. However, to deliver the data-mining platform to its scientist users, there are several other services and components. Figure 1 shows an overview of e-LICO's components and how they interact with each other.

Figure 1. Overview of the e-LICO system.

There are two user-facing components for the e-LICO platform; these allow scientists to access data-mining operators and/or other data processing services, to compose them into workflows and execute them, collecting the results for interpretation or further analysis. These two central infrastructure components are:

1. RapidMiner: An application that gives access to a wide variety of data-mining operators, together with the means to compose them into workflows.

2. Taverna: A workflow creation and enactment workbench that gives access to arbitrary Web services and many other kinds of services. It is widely used in bioinformatics, but also in many other disciplines.

Page 6

Competency questions

“Given a data mining task/data set, which of the valid or applicable workflows/algorithms will yield optimal results (or at least better results than the others)?”

“Given a set of candidate workflows/algorithms for a given task/data set, which data set/workflow/algorithm characteristics should be taken into account in order to select the most appropriate one?”

and others more fine-grained, e.g.:

“Which induction algorithms should I use (or avoid) when my dataset has many more variables than instances?”

Page 7

Architecture of DMOP knowledge base and its satellite triple stores


Figure 5: Architecture of DMOP knowledge base and its satellite triple stores

explicitly but derived from equivalent class definitions and property chains. Unless these inferred facts on algorithms, datasets and hypotheses are extracted, many interesting patterns may remain undetected.

2. Gather dataset, workflow, and experiment descriptions and parse them into 3 temporary RDF triple files. Dataset descriptions are extracted from a given dataset by the Data Characterization Tool; workflow descriptions and other experimental meta-data (e.g. learned hypotheses, performance measures) are gleaned from the DMER.

3. Infer new facts by running a reasoner on the inferred DMOP and the three RDF triple files from step 2.

4. Store the expanded dataset, workflow, and experiment files in their respective databases. These application-specific DBs are stored in an OWLIM semantic repository.

The resulting databases (DSET-DB, WFLO-DB and DMEX-DB) can then be queried using SPARQL, and the retrieved results used for meta-analysis by humans or as training data for a meta-miner.
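Steps 3 and 4 amount to materializing inferred triples and then querying the expanded store. The sketch below is a toy stand-in, assuming just two RDFS-style rules (transitive subClassOf, and type inheritance along subClassOf) over in-memory triples and a one-pattern query; the real pipeline used a full OWL reasoner, an OWLIM repository and SPARQL, and the class and instance names here are hypothetical:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two rules until a fixpoint is reached:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)"""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "type", b) for x, p, a in facts if p == "type"
                for a2, b in sub if a == a2}
        if new <= facts:  # nothing new derived: fixpoint reached
            return facts
        facts |= new


def instances_of(facts: Set[Triple], cls: str) -> Set[str]:
    """Answer the single-triple query pattern  ?x type cls."""
    return {s for s, p, o in facts if p == "type" and o == cls}


kb = {
    ("CategoricalLabeledDataSet", "subClassOf", "LabeledDataSet"),
    ("LabeledDataSet", "subClassOf", "DataSet"),
    ("iris", "type", "CategoricalLabeledDataSet"),
}
expanded = materialize(kb)
print(instances_of(kb, "DataSet"), instances_of(expanded, "DataSet"))  # prints: set() {'iris'}
```

Querying the raw triples misses iris entirely; only after materialization does the query find it, which is the reason the pipeline stores the expanded files rather than the originals.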

3.2 Building the DMO Foundry

The creation of the DMO Foundry portal is an important off-DoW achievement of the e-LICO project. The original plan, as described in the DoW, was simply to develop a collaborative ontology development platform (CoDeP) on the e-LICO website and invite interested data miners to give their feedback or offer potential contributions to the DM Ontology. As work on DMOP progressed, the idea gained ground that our CoDeP could serve as the infrastructure of a Data Mining Ontology Foundry. This was envisioned as an independent portal that would be maintained and co-managed by committed volunteers from the worldwide DM and ontology research communities, and where authors of other DM ontologies could seek collaboration using the software tools developed in e-LICO. A DMO Jamboree was held at JSI, Ljubljana, in November 2010 and gathered participants from e-LICO and beyond, in particular from K.U. Leuven, Aberystwyth University and the University of Economics, Prague. It was agreed that the DMO Foundry should play, in the data mining field, a role similar to that of the OBO Foundry in the life sciences.

3.2.1 Infrastructure

The DMO Foundry infrastructure is based on the e-LICO CoDeP, which contains the following tools:

• the DMO browser, an adaptation of the Manchester OWL Browser to specific needs of DMOP, such as the display of mathematical formulas as annotation properties.

Page 8

The core concepts of DMOP

Fig. 1. The core concepts of DMOP.

more than specify their input/output types; only processes called DM-Operations have actual inputs and outputs. A process that executes a DM-Operator also realizes the DM-Algorithm implemented by the operator and achieves the DM-Task addressed by the algorithm. Finally, a DM-Workflow is a complex structure composed of DM operators, and a DM-Experiment is a complex process composed of operations (or operator executions). An experiment is described by all the objects that participate in the process: a workflow, data sets used and produced by the different data processing phases, the resulting models, and meta-data quantifying their performance. In the following, the basic elements of DMOP are detailed.

DM Tasks: The top-level DM tasks are defined by their inputs and outputs. A DataProcessingTask receives and outputs data. Its three subclasses produce new data by cleansing (DataCleaningTask), reducing (DataReductionTask), or otherwise transforming the input data (DataTransformationTask). These classes are further articulated in subclasses representing more fine-grained tasks for each category. An InductionTask consumes data and produces hypotheses. It can be either a ModelingTask or a PatternDiscoveryTask, depending on whether it generates hypotheses in the form of global models or local pattern sets. Modeling tasks can be predictive (e.g., classification) or descriptive (e.g., clustering), while pattern discovery tasks are further subdivided into classes based on the nature of the extracted patterns: associations, dissociations, deviations, or subgroups. A HypothesisProcessingTask consumes hypotheses and transforms them (e.g., rewrites or prunes them) to produce enhanced (less complex or more readable) versions of the input hypotheses.
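Because the top-level tasks are characterized by what they consume and produce, the classification can be sketched as a lookup on an (input, output) signature plus a walk up the subclass links. This plain-Python encoding is only an illustration of the hierarchy described above, restricted to class names given in the text; DMOP itself expresses it with OWL class axioms:

```python
from typing import Dict, Optional, Tuple

# Top-level DMOP task classes, keyed by (consumed type, produced type).
TOP_LEVEL_TASKS: Dict[Tuple[str, str], str] = {
    ("Data", "Data"): "DataProcessingTask",
    ("Data", "Hypothesis"): "InductionTask",
    ("Hypothesis", "Hypothesis"): "HypothesisProcessingTask",
}

# Direct superclass of some of the subclasses named in the text.
PARENT: Dict[str, str] = {
    "DataCleaningTask": "DataProcessingTask",
    "DataReductionTask": "DataProcessingTask",
    "DataTransformationTask": "DataProcessingTask",
    "ModelingTask": "InductionTask",
    "PatternDiscoveryTask": "InductionTask",
}


def top_level_task(consumes: str, produces: str) -> Optional[str]:
    """Return the top-level task class for an I/O signature, or None."""
    return TOP_LEVEL_TASKS.get((consumes, produces))


def is_subtask_of(task: str, ancestor: str) -> bool:
    """Walk up the PARENT links from task, looking for ancestor."""
    current: Optional[str] = task
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


print(top_level_task("Data", "Hypothesis"))            # InductionTask
print(is_subtask_of("ModelingTask", "InductionTask"))  # True
```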

Data: As the primary resource that feeds the knowledge discovery process, data have been a natural research focus for data miners. Over the past decades, meta-learning researchers have actively investigated data characteristics that might explain generalization success or failure. Fig. 2 shows the characteristics associated with the different Data subclasses (shaded boxes). Most of these are statistical measures, such as the number of

Page 9

Meta-modeling in DMOP 1/4

only processes (executions of workflows) and operations (executions of operators) consume inputs and produce outputs

DM algorithms (as well as operators and workflows) can only specify the type of input or output

inputs and outputs (the DM-Dataset and DM-Hypothesis class hierarchies, respectively) are modeled as subclasses of the IO-Object class

Page 10

Meta-modeling in DMOP 2/4

DM algorithms: classes or individuals? Individuals.

Problem: expressing the types of inputs/outputs associated with an algorithm

“C4.5 specifiesInputClass CategoricalLabeledDataSet” ✘

(C4.5: individual, instance of DM-Algorithm; CategoricalLabeledDataSet: class, subclass of DM-Dataset)

Page 11

Meta-modeling in DMOP 3/4

Initial solution: one artificial class per algorithm, with a single instance corresponding to that particular algorithm

Problem: hasInput, hasOutput, specifiesInputClass and specifiesOutputClass were all assigned a common range, IO-Object

“C4.5 specifiesInputClass Iris” ?

(C4.5: individual, instance of DM-Algorithm; Iris: individual, instance of DM-Dataset)

Iris is a concrete dataset. Clearly, no DM algorithm is designed to handle only one particular dataset.

Page 12

Meta-modeling in DMOP 4/4

Final solution: weak form of punning available in OWL 2

IO-Class: a meta-class, i.e., the class of all classes of input and output objects

“C4.5 specifiesInputClass CategoricalLabeledDataSet” ✔

(C4.5: individual, instance of DM-Algorithm; CategoricalLabeledDataSet: individual, instance of IO-Class)

“DM-Process hasInput some CategoricalLabeledDataSet” ✔

(DM-Process: class, subclass of dolce:process; CategoricalLabeledDataSet: class, subclass of IO-Object)
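Python offers a loose analogy to this punning: classes are themselves objects, so the same name can act as a value at the meta level (what an algorithm specifies) and as a type at the instance level (what a process actually consumes). The sketch below mirrors the two statements of the slide; it is an analogy only, since OWL 2 punning interprets a class and an individual with the same IRI independently, and the Python names are hypothetical renderings of DMOP terms:

```python
class IOObject:
    """Stands in for DMOP's IO-Object."""


class DMDataset(IOObject):
    """Stands in for DM-Dataset."""


class CategoricalLabeledDataSet(DMDataset):
    """A dataset class, used below both as a value and as a type."""


class DMAlgorithm:
    def __init__(self, name: str, specifies_input_class: type) -> None:
        # Meta level: the value is a class (cf. an instance of IO-Class).
        if not (isinstance(specifies_input_class, type)
                and issubclass(specifies_input_class, IOObject)):
            raise TypeError("specifiesInputClass must be a subclass of IOObject")
        self.name = name
        self.specifies_input_class = specifies_input_class


class DMProcess:
    def __init__(self, algorithm: DMAlgorithm, has_input: IOObject) -> None:
        # Instance level: the actual input must instantiate the class
        # that the executed algorithm specifies.
        if not isinstance(has_input, algorithm.specifies_input_class):
            raise TypeError(f"{algorithm.name} cannot consume {type(has_input).__name__}")
        self.algorithm = algorithm
        self.has_input = has_input


c45 = DMAlgorithm("C4.5", CategoricalLabeledDataSet)  # C4.5 specifiesInputClass CategoricalLabeledDataSet
iris = CategoricalLabeledDataSet()                     # one concrete dataset
run = DMProcess(c45, iris)                             # a process with hasInput iris
```

Passing a plain DMDataset() to DMProcess(c45, ...) raises TypeError, and so does DMAlgorithm("C4.5", iris), because an individual dataset is not a class: the same separation of levels that the slide draws between IO-Class and IO-Object.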

Page 13

Alignment of DMOP with DOLCE 1/2

Two main reasons to align DMOP with a foundational ontology:

considerations about attributes and data properties: extant non-foundational ontology solutions were partial re-inventions of how these are treated in a foundational ontology;

reuse of the ontology’s object properties;

Page 14

Alignment of DMOP with DOLCE 2/2

Perdurant: DM-Experiment and DM-Operation are subclasses of dolce:process;

Endurant: most DM classes, such as algorithm, software, strategy, task, and optimization problem, are subclasses of dolce:non-physical-endurant;

Quality: characteristics and parameters of DM entities are made subclasses of dolce:abstract-quality;

Abstract: for identifying discrete values, classes are added as subclasses of dolce:abstract-region;

object properties: DMOP reuses mainly DOLCE’s parthood, quality, and quale relations;

each of the four main DOLCE branches has been used.
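The alignment can be summarized as a lookup from DMOP classes to the DOLCE class they specialize, plus DOLCE's own taxonomy up to its four main branches. This is an illustrative sketch covering only classes named on these slides; the chains process ⊑ stative ⊑ perdurant and abstract-region ⊑ region ⊑ abstract follow DOLCE's taxonomy, and the prefixed names mirror the slides' dolce: notation:

```python
from typing import Dict

# DOLCE taxonomy fragment: each category mapped to its direct parent.
DOLCE_PARENT: Dict[str, str] = {
    "dolce:process": "dolce:stative",
    "dolce:stative": "dolce:perdurant",
    "dolce:non-physical-endurant": "dolce:endurant",
    "dolce:abstract-quality": "dolce:quality",
    "dolce:abstract-region": "dolce:region",
    "dolce:region": "dolce:abstract",
}
MAIN_BRANCHES = {"dolce:endurant", "dolce:perdurant", "dolce:quality", "dolce:abstract"}

# DMOP classes (as named on the slides) mapped to the DOLCE class they specialize.
ALIGNMENT: Dict[str, str] = {
    "DM-Experiment": "dolce:process",
    "DM-Operation": "dolce:process",
    "DM-Algorithm": "dolce:non-physical-endurant",
    "DM-Task": "dolce:non-physical-endurant",
    "Characteristic": "dolce:abstract-quality",
    "Parameter": "dolce:abstract-quality",
    "Eager-Lazy": "dolce:abstract-region",
}


def main_branch(dmop_class: str) -> str:
    """Climb from the aligned DOLCE class to one of the four main branches."""
    node = ALIGNMENT[dmop_class]
    while node not in MAIN_BRANCHES:
        node = DOLCE_PARENT[node]
    return node


covered = {main_branch(c) for c in ALIGNMENT}
print(sorted(covered))
```

The final set contains all four branches, which is the claim of the last bullet.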

Page 15

Qualities and attributes 1/4

How to handle ‘attributes’ in OWL ontologies and, in a broader context, measurements?

easy way: an attribute is a binary functional relation between a class and a datatype

Elephant ⊑ =1 hasWeight.integer
Elephant ⊑ =1 hasWeightPrecise.real
Elephant ⊑ =1 hasWeightImperial.integer (in lbs)

drawback: this builds application decisions about how to store the data (and in which unit) into one’s ontology

Page 16

Qualities and attributes 2/4

How to handle ‘attributes’ in OWL ontologies and, in a broader context, measurements?

more elaborate way: unfold the notion of an object’s property (e.g., weight) from one attribute/OWL data property into at least two properties: one OWL object property from the object to the ‘reified attribute’ (a “quality property” represented as an OWL class), and another property to the value(s)

▸ favoured in foundational ontologies;
▸ solves the problem of non-reusability of the ‘attribute’ and prevents duplication of data properties;
▸ neither ontology has any solution to represent actual values and units of measurements;

for DMOP, measurements are more like values for parameters;

Page 17

Qualities and attributes 3/4

[Figure: DMOP data classes aligned with DOLCE. DMOP classes: DM-Data, DataTable, DataType, DataFormat, TableFormat, Characteristic, Parameter, DataCharacteristic. DOLCE classes: dolce:particular, dolce:non-physical-endurant, dolce:abstract, dolce:quality, dolce:abstract-quality, dolce:region, dolce:abstract-region, dolce:quale; datatype anyType. Properties: dolce:has-quality, dolce:has-quale, dolce:q-location, hasDataValue, hasDataType, hasTableFormat.]

Page 18

Qualities and attributes 4/4

ModelingAlgorithm ⊑ =1 has-quality.LearningPolicy

LearningPolicy is a dolce:quality

LearningPolicy ⊑ =1 has-quale.Eager-Lazy

Eager-Lazy is a subclass of dolce:abstract-region

Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType

In this way, the ontology can be linked to many different applications, which may even use different data types, yet still agree on the meaning of the characteristics and parameters (‘attributes’) of the algorithms, tasks, and other DM endurants.
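The axioms above can be mimicked with plain data classes: the algorithm points to a quality, the quality to a region, and only the region carries an application-chosen value. This is a hypothetical Python rendering for illustration (it enforces the ≤ 1 hasDataValue restriction when a region is constructed), and the two "applications" at the end are invented to show different datatypes coexisting under the same quality:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AbstractRegion:
    """Stands in for a dolce:abstract-region such as Eager-Lazy."""
    name: str
    data_values: List[Any] = field(default_factory=list)  # hasDataValue

    def __post_init__(self) -> None:
        if len(self.data_values) > 1:  # Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType
            raise ValueError(f"{self.name} allows at most one data value")


@dataclass
class Quality:
    """Stands in for a dolce:quality such as LearningPolicy."""
    name: str
    quale: AbstractRegion  # exactly one has-quale


@dataclass
class ModelingAlgorithm:
    name: str
    learning_policy: Quality  # exactly one has-quality.LearningPolicy


def describe(algorithm_name: str, value: Any) -> ModelingAlgorithm:
    """Build the quality/region chain with an application-specific value."""
    region = AbstractRegion("Eager-Lazy", [value])
    return ModelingAlgorithm(algorithm_name, Quality("LearningPolicy", region))


# Two hypothetical applications encode the same fact with different datatypes.
knn_in_app_a = describe("k-NN", "lazy")  # app A stores a string label
knn_in_app_b = describe("k-NN", False)   # app B stores an is_eager flag
```

Both applications agree that k-NN's LearningPolicy is located in the Eager-Lazy region; only the stored value and its datatype differ.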

Page 19

Property chains

DMOP has 11 property chains;

the principal issues in declaring safe property chains (i.e., chains guaranteed not to cause unsatisfiable classes or other undesirable deductions) are the declaration and choice of the properties and of their domain and range axioms;

all chains were investigated in detail (Keet, EKAW 2012) and adjusted where necessary;

Example: hasMainTable ○ hasFeature ⊑ hasFeature
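What the example chain licenses can be shown with a small saturation loop: whenever a dataset has a main table that has a feature, the dataset itself gets that feature. This is a naive fixpoint sketch over hypothetical triples, not how an OWL reasoner evaluates property chains:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_chain(triples: Set[Triple], first: str, second: str, implied: str) -> Set[Triple]:
    """Saturate the rule  first ∘ second ⊑ implied:
    (x first y), (y second z) => (x implied z), repeated until nothing new."""
    facts = set(triples)
    while True:
        derived = {
            (x, implied, z)
            for x, p1, y in facts if p1 == first
            for y2, p2, z in facts if p2 == second and y2 == y
        }
        if derived <= facts:
            return facts
        facts |= derived


kb = {
    ("ds1", "hasMainTable", "table1"),
    ("table1", "hasFeature", "age"),
    ("table1", "hasFeature", "income"),
}
closed = apply_chain(kb, "hasMainTable", "hasFeature", "hasFeature")
print(sorted(z for x, p, z in closed if x == "ds1" and p == "hasFeature"))  # ['age', 'income']
```

Because the implied property is also the second link of the chain, the loop runs to a fixpoint; this recursive shape is exactly the kind of chain whose domain and range axioms must be checked for safety.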

Page 20

Other modeling considerations

several other OWL 2 features were used;

ObjectInverseOf;

“object property characteristics” were used sparingly; only the basic ‘functional’ characteristic was asserted;

local reflexivity was investigated on a subsumes property for instances in DMOP v5.2, but was eventually modeled differently, with classes and metamodeling/punning;

DOLCE’s parthood is transitive and should be transitive in DMOP as well, but it was discovered after the release of v5.3 that the object property copy function in Protégé does not copy any property characteristics;

Page 21

What is RapidMiner? 1/2

Page 22

What is RapidMiner? 2/2

Page 23

What is RapidMiner? 2/2

Page 24

RMonto - plugin for RapidMiner

Page 25

Fr-ONT-Qu

algorithm for mining patterns in RDF(S) data

patterns expressed as SPARQL queries

2 thresholds: one for keeping good-enough patterns and one for refining the best patterns

several quality measures to select for thresholds (e.g. support on KB)

for the classification task, it outperformed state-of-the-art approaches to classification of Semantic Web data on tasks with available results and datasets (see: “Pattern based feature construction in semantic data mining” by A. Lawrynowicz, J. Potoniec, IJSWIS 10(1), 2014):

▸ kernel methods: Bloehdorn et al. (2007), Loesch et al. (ESWC 2012 best paper) on the SWRC AIFB dataset,

▸ statistical relational classifier SPARQL-ML by Kiefer et al. (ESWC 2008 best paper) on the SWRC AIFB and OWLS-TC v2.1 datasets,

▸ concept learning algorithms DL-FOIL by Fanizzi et al. (2008) and the cutting-edge CELOE variant of DL-Learner by Lehmann (2009), on all measures, on the BioPax, NTN and Financial datasets
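The two-threshold idea can be sketched as a level-wise refinement search. In this simplified, hypothetical version a pattern is a set of atomic features rather than a SPARQL query, the quality measure is plain support over the examples, patterns scoring at least keep_threshold are kept, and only those scoring at least refine_threshold are extended further; the last function turns kept patterns into binary features for an ordinary classifier, in the spirit of pattern-based feature construction. It is not the published Fr-ONT-Qu implementation:

```python
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def support(pattern: Pattern, examples: List[Set[str]]) -> float:
    """Fraction of examples whose feature set contains every atom of the pattern."""
    if not examples:
        return 0.0
    return sum(1 for ex in examples if pattern <= ex) / len(examples)


def mine(examples: List[Set[str]], atoms: Set[str], keep_threshold: float,
         refine_threshold: float, max_len: int = 3) -> Dict[Pattern, float]:
    """Level-wise search: keep patterns with quality >= keep_threshold, and
    refine (extend by one atom) only those with quality >= refine_threshold."""
    kept: Dict[Pattern, float] = {}
    seen: Set[Pattern] = set()
    frontier: List[Pattern] = [frozenset()]
    while frontier:
        next_frontier: List[Pattern] = []
        for pattern in frontier:
            for atom in sorted(atoms - pattern):
                candidate = pattern | {atom}
                if candidate in seen or len(candidate) > max_len:
                    continue
                seen.add(candidate)
                quality = support(candidate, examples)
                if quality >= keep_threshold:
                    kept[candidate] = quality
                if quality >= refine_threshold:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return kept


def to_features(examples: List[Set[str]], patterns: List[Pattern]) -> List[List[int]]:
    """Propositionalize: one 0/1 column per pattern, 1 if the example matches it."""
    return [[int(p <= ex) for p in patterns] for ex in examples]


examples = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
kept = mine(examples, {"a", "b", "c"}, keep_threshold=0.3, refine_threshold=0.5)
print(to_features(examples, [frozenset({"a", "b"}), frozenset({"c"})]))  # [[1, 0], [1, 1], [0, 0], [0, 1]]
```

Setting refine_threshold above keep_threshold lets the search keep many moderately good patterns while spending refinement effort only on the strongest ones; since support can only drop as a pattern grows, pruning on it never discards a pattern that could later become good.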

Page 26

Fr-ONT-Qu - pattern based classification

Page 27

Fr-ONT-Qu - trie data structure

Page 28

Semantic meta-mining experimental setup

baseline DM experiment set: 1581 RapidMiner workflows solving a predictive modeling task on 11 UCI datasets

dataset characteristics meta-data stored in DMEX-DB, containing over 85 million RDF triples

workflow patterns represented as SPARQL queries using DMOP entities

Page 29

Semantic meta-mining results

McNemar’s test for pairs of classifiers, performed with the null hypothesis that a classifier built using dataset characteristics and a mined pattern set has the same error rate as the baseline, which used dataset characteristics and only the names of the machine learning DM operators

The test confirmed that classifiers trained using workflow patterns performed significantly better (accuracy 0.927) than the baseline (accuracy 0.890)
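For reference, McNemar's test looks only at the cases on which the two classifiers disagree: b counts cases the first classifier gets right and the second gets wrong, c the reverse. A minimal standard-library sketch using the common continuity-corrected statistic (|b - c| - 1)² / (b + c), compared against a chi-square distribution with one degree of freedom; the slides do not report b and c, so the counts in the example are made up:

```python
import math
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> Tuple[int, int]:
    """b: A right and B wrong; c: A wrong and B right."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar statistic and its chi-square (1 df) p-value."""
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi2 with 1 df
    return stat, p_value


stat, p = mcnemar(10, 2)  # made-up discordant counts
print(round(stat, 3), round(p, 3))
```

Cases where both classifiers are right, or both wrong, carry no information about which one is better, which is why only b and c enter the statistic.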

Page 30

Acknowledgements

EU FP7 ICT-2007.4.4 (No 231519) “e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science”

Foundation for Polish Science under the PARENT/BRIDGE programme, co-financed by the European Union Regional Development Fund (No POMOST/2013-7/8)

Contributors to the development of DMOP and/or other e-LICO infrastructure used in the research described in this presentation: Claudia d’Amato, Huyen Do, Simon Fischer, Dragan Gamberger, Melanie Hilario, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, C. Maria Keet, Joerg-Uwe Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul Palma, Jedrzej Potoniec, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry Wijaya, Adam Woznica

RMonto and meta-mining experiments done jointly with Jedrzej Potoniec

Thanks to Veli Bicer for sharing the AIFB dataset

Page 31

Bibliography

Keet, C.M., Lawrynowicz, A., d’Amato, C., Hilario, M.: Modeling issues & choices in the data mining optimization ontology. In: Rodriguez-Muro, M., et al. (eds.) OWLED. CEUR Workshop Proceedings, vol. 1080. CEUR-WS.org (2013)

Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A.: Ontology-Based Meta-Mining of Knowledge Discovery Workflows. In: Jankowski, N., Duch, W., Grabczewski, K. (eds.) Meta-Learning in Computational Intelligence, pp. 273-316. Springer (2011)

Potoniec, J., Lawrynowicz, A.: RMonto: Ontological extension to RapidMiner. Poster and Demo Session of ISWC 2011, the 10th International Semantic Web Conference (2011)

Lawrynowicz, A., Potoniec, J.: Pattern Based Feature Construction in Semantic Data Mining. IJSWIS 10(1) (2014)

Keet, C.M.: Detecting and Revising Flaws in OWL Object Property Expressions. In: EKAW 2012, pp. 252-266 (2012)

Serban, F., Vanschoren, J., Kietz, J.-U., Bernstein, A.: A survey of intelligent assistants for data analysis. ACM Computing Surveys (2012)
