1
Solving ILP Problems in the EELA infrastructure
Inês Dutra
Departamento de Ciência de Computadores
Universidade do Porto, Portugal
2
Outline
• Introduction
  – ILP
  – Examples
  – Motivation
• Experiments
• Conclusions
• Future Work
3
Introduction
• EELA selected application
• Task 3.3: additional applications
4
Introduction
• What is ILP?
  – It is NOT Instruction Level Parallelism
  – It is NOT Integer Linear Programming
• So, what is it????
• .......
5
Introduction
• It is Inductive Logic Programming
  – Data mining
  – Machine learning
  – Knowledge/information extraction
• Where:
  – Given:
    • Set of observations (positive and negative)
    • Background knowledge (descriptions)
    • Language bias
  – Find:
    • A hypothesis (in a first-order language) that best explains all positive observations and none of the negatives.
6
Introduction
• Advantages:
  – Use of an understandable description language
  – Relational knowledge
7
Introduction: example
[Figure: TRAINS GOING EAST | TRAINS GOING WEST]
8
Introduction: example
short(car_12). short(car_14).
closed(car_12).
long(car_11). long(car_13).
open_car(car_11). open_car(car_13). open_car(car_14).
shape(car_11,rectangle). shape(car_12,rectangle).
shape(car_13,rectangle). shape(car_14,rectangle).
load(car_11,rectangle,3). load(car_12,triangle,1).
load(car_13,hexagon,1). load(car_14,circle,1).
wheels(car_11,2). wheels(car_12,2). wheels(car_13,3). wheels(car_14,2).
has_car(east1,car_11). has_car(east1,car_12).
has_car(east1,car_13). has_car(east1,car_14).
9
Introduction: example
[Figure: TRAINS GOING EAST | TRAINS GOING WEST]
10
Introduction: example
eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)
[Figure: TRAINS GOING EAST | TRAINS GOING WEST]
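As a minimal sketch, the eastbound rule can be checked against the car descriptions from the earlier example slide. The Python encoding (sets and a dict) and the function name `eastbound` only mirror the Prolog predicates; they are assumptions made for illustration and not part of the talk.

```python
# Facts copied from the example slide; the Python data layout is illustrative.
short = {"car_12", "car_14"}
closed = {"car_12"}
has_car = {"east1": ["car_11", "car_12", "car_13", "car_14"]}


def eastbound(train):
    """eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)."""
    # True if at least one car of the train is both short and closed.
    return any(c in short and c in closed for c in has_car.get(train, []))


print(eastbound("east1"))  # car_12 is both short and closed
```

An ILP system searches for such a rule automatically; this sketch only evaluates one candidate rule on one train.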
11
Another less “toyish” example: extracting knowledge from mammograms
is_malignant(A) if 'BIRADS_category'(A,b5), 'MassPAO'(A,present), 'Age'(A,age6570), previous_finding(A,B,C), 'MassesShape'(B,none), 'Calc_Punctate'(B,notPresent), previous_finding(A,C), 'BIRADS_category'(C,b3).
This rule states that finding (A) IS malignant IF it is:
classified as BI-RADS 5 AND
had a mass present
in a patient who:
  was between the ages of 65 and 70
  had two prior mammograms (B, C)
and prior mammogram (B):
  had no mass shape described
  had no punctate calcifications
and prior mammogram (C) was classified as BI-RADS 3
12
Introduction: Motivation
• Applications:
  – Link discovery
  – Social Network Analysis
  – Equivalent identities
  – Drug design
  – Protein unfolding
  – Protein metabolism
  – Why not? Classifying grid failures
  – And...many others!
13
Introduction: Motivation
• Why does ILP need a grid?
  – Search space can become large very quickly
  – Need many experiments to have statistically significant results
    • Cross-validation
    • Training, tuning, testing
  – Can combine classifiers: ensembles
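The slides do not say how the classifiers are combined. One common option is a simple majority vote, sketched here as an assumption; the function name `majority_vote` and the labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers in the ensemble."""
    # most_common(1) yields the single (label, count) pair with the highest count.
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical classifiers voting on one example.
votes = ["active", "inactive", "active"]
print(majority_vote(votes))
```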
14
Introduction: Motivation
• Assume we want to run a task for one domain: find a “good” hypothesis that describes the positive examples
• Assume we run 5x4-fold cross-validation
• Assume we have 100 classifiers per fold
• # of experiments: 5 x 4 x 100 = 2,000
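The count above is the product of repetitions, folds and classifiers per fold. A small sketch that enumerates the experiments makes this explicit; the constant names are illustrative only.

```python
from itertools import product

REPEATS = 5        # the "5x" in 5x4-fold cross-validation
FOLDS = 4          # folds per repetition
CLASSIFIERS = 100  # classifiers learned per fold

# One experiment per (repetition, fold, classifier) combination.
experiments = list(product(range(REPEATS), range(FOLDS), range(CLASSIFIERS)))
print(len(experiments))  # 2000
```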
15
Introduction: Motivation
• Now assume each experiment takes 1 hour to run
• How long would it take to generate the 2,000 classifiers to be combined?
  ~ 83 days!!!
• If we consider varying learning parameters and learning algorithms, this number can be really big!!
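The 83-day figure is 2,000 one-hour jobs run one after another. The sketch below reproduces it and shows the idealised effect of running jobs in parallel; the function name and the worker count of 100 are hypothetical, and queueing, scheduling and failures are ignored.

```python
def wall_clock_days(n_jobs, hours_per_job, workers=1):
    """Idealised running time in days, ignoring queueing and failures."""
    rounds = -(-n_jobs // workers)  # ceiling division: batches of `workers` jobs
    return rounds * hours_per_job / 24


print(round(wall_clock_days(2000, 1), 1))       # sequential: 83.3
print(round(wall_clock_days(2000, 1, 100), 1))  # 100 workers (hypothetical): 0.8
```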
16
Experiment
• Predict carcinogenicity in rodents
  – Difficult task
  – Large search space!
  – Important problem
• Phase 1:
  – Tuning using 5x4-fold cross-validation
  – Generating ensembles up to 100
• Aleph: well-known ILP system
• YAP: Yet Another Prolog
17
Experiment: one of the classifiers
active(A) if atom(A,_,n,32,B), B ≤ -0.401, has_property(A,cytogen_sce,n), methyl(A,_).
Sister Chromatid Exchange (SCE)
SCE is used for the determination of mutagenicity
18
Experiment
• 2 submissions:
  – From LA
  – From EU
19
Submitting jobs from LA....
20
Experiment
EELA resources utilised

Resource       # of jobs
CERN           1,160
CIEMAT         279
CETA-CIEMAT    173
UniCan         98
LIP            10
INFN           38
UNAM           16
BIOF.UFRJ      159
IF.UFRJ        8
UFCG           28
Total          1,969

~ 300 resources in LA
211 jobs in LA
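The figures on this slide can be cross-checked with a short sketch. The grouping of UNAM, BIOF.UFRJ, IF.UFRJ and UFCG as the Latin American sites is an assumption; it is chosen because those four counts add up to the 211 jobs quoted.

```python
# Job counts per resource, copied from the slide.
jobs = {
    "CERN": 1160, "CIEMAT": 279, "CETA-CIEMAT": 173, "UniCan": 98, "LIP": 10,
    "INFN": 38, "UNAM": 16, "BIOF.UFRJ": 159, "IF.UFRJ": 8, "UFCG": 28,
}
# Assumption: these four sites are the Latin American (LA) ones.
LA_SITES = {"UNAM", "BIOF.UFRJ", "IF.UFRJ", "UFCG"}

total = sum(jobs.values())
la_jobs = sum(n for site, n in jobs.items() if site in LA_SITES)
not_completed = 2000 - total  # 2,000 experiments were planned

print(total, la_jobs, not_completed)  # 1969 211 31
```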
21
Experiments
• Why 1,969 out of 2,000???
• 2 reasons:
  – Proxy expiration:
    • On submission (takes loooooong!!!)
    • On execution
  – Use of dynamic libraries
22
• Submitting jobs from EU...
• from a non-EELA site, BUT
• Using the EELA VO:
  – Jobs run only on EU resources...
• Reasons:
  – Misconfiguration?
  – Closer brokers with more machines?
23
Conclusions
• Happiness: EELA is working!!!
• We can run thousands of experiments!
• Frida is happy!!! (see Condor introductory tutorials, if you feel curious about Frida)
• Experiment showed good utilization of EELA resources in LA and EU
• Low failure rate (31 of 2,000 jobs, about 1.5%)
• Failures caused by:
  – Dynamic libs not available on the remote machine
  – Proxy expiration
24
Future work
• More detailed analysis of jobs and logs
• Full ILP experiment
• More domains
• Other kinds of experiments based on Statistical Relational Learning
• And, do not forget: ILP can help to model and diagnose errors in the grid environment!
25
Collaborators
• Fernando Silva (DCC-UPorto)
• Vítor Santos Costa (DCC-UPorto)
• Rui Camacho (FE-UPorto)
• Nuno Fonseca (IBMC/IBMEC, Porto)
• Beth Burnside (UW-Madison hospital)
• David Page (UW-Madison)
• Jesse Davis (UWashington)
26
Thanks!!!
Questions??