2 partners ed_kickoff_dtai

20
Declarative Languages and Artificial Intelligence Research Group Wannes Meert, Luc De Raedt EluciDATA November 2014

description

Elucidata Kick-off event - KULeuven - Declarative languages and artificial intelligence research group

Transcript of 2 partners ed_kickoff_dtai

Page 1: 2 partners ed_kickoff_dtai

Declarative Languages and Artificial Intelligence Research Group

Wannes Meert, Luc De Raedt

EluciDATA November 2014

Page 2: 2 partners ed_kickoff_dtai

Who is DTAI?


Machine Learning: 5 ZAP, 1 ERC StG, ±13 post-docs, ±35 Ph.D. students

Declarative Languages and Systems: 5 ZAP, ±2 post-docs, ±11 Ph.D. students

Page 3: 2 partners ed_kickoff_dtai


LSTAT Research Centres

Page 4: 2 partners ed_kickoff_dtai

Mission Statement

1. To design languages to express complex, relational and uncertain knowledge
2. To develop techniques, theory, systems and software solutions for machine learning and data mining
3. To apply these in various application domains

Page 5: 2 partners ed_kickoff_dtai

Basic Research


Page 6: 2 partners ed_kickoff_dtai

Machine Learning and Data Mining


Complex Data

Lots of Knowledge

Clinical Genetic

Papers Knowledge Base

Learn predictive models: MassSize ≥ 10 ⇒ Cancer

Reason about the data: Prob( Flu | Fever ) = ?

Discover patterns: Smoking ⋀ Cancer

Find solutions for problem: Room1:CourseB, Room2:CourseA

Network

Include new data: Model_t ⟵ Model_{t+1} + data
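The "Reason about the data" query Prob( Flu | Fever ) can be illustrated with Bayes' rule. The following is a minimal sketch only: the three input probabilities are made-up values for illustration and do not come from the slides, and the function name `posterior` is hypothetical.

```python
# Illustrative numbers only; none of these probabilities come from the slides.

def posterior(prior, likelihood, false_alarm):
    """P(H | E) via Bayes' rule for a binary hypothesis H and evidence E.

    prior       = P(H)
    likelihood  = P(E | H)
    false_alarm = P(E | not H)
    """
    evidence = likelihood * prior + false_alarm * (1.0 - prior)  # P(E)
    return likelihood * prior / evidence

p_flu = 0.05                 # assumed P(Flu)
p_fever_given_flu = 0.9      # assumed P(Fever | Flu)
p_fever_given_no_flu = 0.1   # assumed P(Fever | no Flu)

p_flu_given_fever = posterior(p_flu, p_fever_given_flu, p_fever_given_no_flu)
print(f"P(Flu | Fever) = {p_flu_given_fever:.3f}")
```

Even with a high P(Fever | Flu), the posterior stays well below 1 here because the assumed prior P(Flu) is small.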

Page 7: 2 partners ed_kickoff_dtai

Structure and Uncertainty: Statistical Relational Learning

Abnormality  Patient  Date  Calcification Fine/Linear  …  Mass Size  Loc  Cancer
1            P1       5/02  Absent                     …  3mm        RU4  No
2            P1       5/04  Present                    …  5mm        RU4  Yes
3            P1       5/04  Absent                     …  4mm        LL3  No
4            P2       6/00  Absent                     …  2mm        RL2  No
…            …        …     …                          …  …          …    …

Interdependent samples

Non-deterministic dependencies

[Figure: precision-recall curves (Precision and Recall axes, 0.0 to 1.0) comparing Radiologist and SAYU-VISTA]

57% reduction in FPs at same detection rate
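To see what a 57% reduction in false positives at the same detection rate means for the precision-recall trade-off, here is a small sketch. The counts `tp`, `fn` and `fp_before` are hypothetical and chosen only for illustration; only the 57% figure comes from the slide.

```python
def precision(tp, fp):
    """Fraction of positive calls that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true cases that are detected (the detection rate)."""
    return tp / (tp + fn)

# Hypothetical screening counts, chosen only for illustration.
tp, fn, fp_before = 90, 10, 100

# Apply the 57% false-positive reduction quoted on the slide.
fp_after = round(fp_before * (1 - 0.57))

print("recall (unchanged):", recall(tp, fn))
print("precision before:", round(precision(tp, fp_before), 3))
print("precision after:", round(precision(tp, fp_after), 3))
```

Recall depends only on true positives and false negatives, so it stays fixed while precision rises as false positives drop.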

Page 8: 2 partners ed_kickoff_dtai

Patterns in Graphs: Graph Mining


Page 9: 2 partners ed_kickoff_dtai

Models to act upon: Interpretable Results


Rich database, communicate with medical professionals

0.2 IF temp(T)+5 < temp(T+1) THEN failure(kidney)
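One possible reading of the rule above, sketched in Python: each time step T where the temperature jumps by more than 5 units is treated as an independent cause that triggers failure(kidney) with probability 0.2, and the causes are combined with a noisy-or. This interpretation of the 0.2 weight, the function names, and the temperature series are all assumptions for illustration; the slide does not specify the semantics.

```python
def rule_firings(temps, jump=5.0):
    """Time steps T where temp(T) + jump < temp(T+1)."""
    return [t for t in range(len(temps) - 1) if temps[t] + jump < temps[t + 1]]

def failure_probability(temps, rule_prob=0.2):
    """Noisy-or: each firing independently causes failure with rule_prob."""
    k = len(rule_firings(temps))
    return 1.0 - (1.0 - rule_prob) ** k

# Hypothetical temperature readings, one per time step.
temps = [36.5, 37.0, 43.0, 38.0, 44.5]

fired = rule_firings(temps)
p_failure = failure_probability(temps)
print("rule fires at T =", fired)
print(f"P(failure(kidney)) = {p_failure:.2f}")
```

The appeal of such a rule is that a medical professional can read the condition and the weight directly, which is harder with an opaque model.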

Page 10: 2 partners ed_kickoff_dtai

Probabilistic Programming


Page 11: 2 partners ed_kickoff_dtai

Constraint Programming


Page 12: 2 partners ed_kickoff_dtai

Applications


Page 13: 2 partners ed_kickoff_dtai

Robotics:


Example scenario:
1. Localise door
2. Localise handle
3. Recognize grasping points
4. Find correct action to apply

Hinge

Light switch

Handle

http://www.first-mm.eu

Page 14: 2 partners ed_kickoff_dtai

Mol. BioSyst. This journal is © The Royal Society of Chemistry 2013

carA and carB which are also selected by PheNetic, are involved in the conversion of glutamine to glutamate, which is one of the major known acid resistance mechanisms in E. coli [30]. This gene encoding pepA seems barely altered at its expression level, explaining why it might have been missed by previous studies. Another strongly prioritized gene/gene product is TyrR, the main regulator of tyrosine synthesis, which has previously been associated with acid conditions in Salmonella typhimurium [38].

TyrR regulates the amino acid metabolism regulator Mtr, which in turn regulates the tryptophan or indole metabolism operon, many genes of which were retrieved in the sub-network selected by PheNetic. The indole biosynthesis operon was found to be down-regulated in many of the KO strains we analysed, but none of its known regulators ranked well by the differential expression-based ranking method, explaining why it has been largely overlooked in the past. So far tryptophan biosynthesis has been only associated with acid resistance through the tryptophanases TnaA and TnaC [26,28,29].

Discussion

In this work, we developed an analysis method that allows interpreting in-house generated omics data in the light of publicly available information, represented as an interaction network. The developed method extracts sub-networks describing the mechanism behind the omics data from this interaction network.

The method was applied to reanalyse a previously published expression study, assessing gene expression of 27 KO strains under mild acid growth conditions in E. coli [19]. To this end, an E. coli interaction network was compiled, spanning multiple layers of interactions. Applying PheNetic on this KO expression dataset using this E. coli interaction network, allowed recovering mechanisms known to be involved in acid resistance.

According to the classification of network inference methods of De Smet and Marchal [39], PheNetic can be considered as an integrative inference scheme that uses next to expression data also omics derived network information to prioritize genes involved in the process of interest. Comparing PheNetic with classical differential expression-based ranking illustrates the added value of using such integrative network-based approaches to analyse omics derived gene lists. This integrative inference strategy [39] allowed reaching higher sensitivities at lower FPR: false positive genes (e.g. genes that were erroneously found differentially expressed under the tested conditions) are filtered out more easily as these genes cannot be connected to

Fig. 4 Detailed view of the sub-network involved in acid resistance identified by PheNetic. The sub-network was decomposed into different subgroups centred around the overrepresented GO categories. For visualization purposes, genes are grouped based on their annotation in KEGG and GO (shaded areas). Genes contained in both benchmark sets are indicated as yellow nodes.

Method Molecular BioSystems

Published on 26 March 2013. Downloaded by KU Leuven University Library on 28/05/2013 09:26:45.

Bio- and Cheminformatics

with sets of cause–effect pairs resulting from in-house experiments (see Fig. 1). Here, a cause corresponds to a mutation that is expected to trigger an alteration in downstream genes. If the alteration affects the expression level, the downstream genes will be visible in an expression profiling experiment (and referred to as effects). The network, in which nodes represent genes and edges the interactions between the nodes, will be used as a scaffold to connect causes to their effects. Every edge in the network is assigned a probability that expresses our belief in the interaction being truly present in the organism's interactome [9] and each node is annotated with a probability that reflects its centrality in the network (see Materials and methods). The goal is to extract from this interaction network the sub-network involved in transducing the perturbation from the causes to their corresponding effects. This sub-network will comprise genes related to the processes highlighted by the in-house dataset.

Because of the probabilistic nature of the network, we can obtain for each path in the interaction network a probability (see Materials and methods). This probability is determined by the probabilities of the edges, expressing the belief in the edge, and the nodes, expressing the network centrality of the node, composing the path. The latter term penalizes paths through highly connected or hub nodes. By doing so the amount of redundancy between paths is quantified in the probability of a path. Highly probable paths will thus avoid hub nodes as these have low centrality probabilities. Given the probabilities on the paths we can formulate the sub-network selection problem as a path-finding problem in the decision theoretic framework of DT ProbLog [18]. Briefly, we first determine for each cause–effect pair the set of most likely paths that connect them. Subsequently, we merge these paths into a sub-network in which causes (i.e. mutated genes) are connected to the most and preferentially strongest differentially expressed effects using the most probable paths in a parsimonious way (using the smallest sub-network).

This is achieved by assigning a reward to the selected sub-network based on the cause–effect pairs that are connected in the sub-network. Cause–effect pairs connected with a high probability, and having a high level of differential expression will obtain higher rewards. On the other hand a cost will be assigned with increasing size of the selected sub-network. By maximizing the reward minus the cost, the most sparse sub-network will be selected that best explains the input data (see Materials and methods). The motivation for selecting the most parsimonious solution is based on the assumption that all cause–effect pairs are involved in the same phenotype and therefore should trigger common paths in the interaction network.
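The path probability and the reward-minus-cost objective described in the paragraphs above can be sketched as follows. This is not the DT ProbLog formulation used by PheNetic; it is a plain-Python illustration under stated assumptions: a path's probability is taken to be the product of its edge and node probabilities, the reward is taken to be probability times differential expression, and the cost is taken to be linear in the number of edges. The toy network, the gene names, and all numbers are hypothetical.

```python
import heapq
import math

def most_probable_path(edges, node_prob, source, target):
    """Return (probability, path) of the most probable source-target path.

    Assumes path probability = product of edge and node probabilities, so
    maximising it equals minimising the sum of -log terms (Dijkstra).
    """
    graph = {}
    for (u, v), p in edges.items():
        graph.setdefault(u, []).append((v, p))
        graph.setdefault(v, []).append((u, p))  # interactions treated as undirected
    start_cost = -math.log(node_prob[source])
    best = {source: start_cost}
    heap = [(start_cost, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost), path
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, p in graph.get(node, []):
            new_cost = cost - math.log(p) - math.log(node_prob[nxt])
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return 0.0, []

def subnetwork_score(connected_pairs, n_edges, edge_cost):
    """Assumed objective: sum of (path probability * differential expression)
    over connected cause-effect pairs, minus a linear size penalty."""
    reward = sum(prob * diff_expr for prob, diff_expr in connected_pairs)
    return reward - edge_cost * n_edges

# Hypothetical toy network: the hub node has a low centrality probability.
edges = {("ko", "hub"): 0.9, ("hub", "effect"): 0.9,
         ("ko", "a"): 0.8, ("a", "effect"): 0.8}
node_prob = {"ko": 1.0, "hub": 0.2, "a": 0.9, "effect": 1.0}

prob, path = most_probable_path(edges, node_prob, "ko", "effect")
score = subnetwork_score([(prob, 2.0)], n_edges=len(path) - 1, edge_cost=0.1)
print(path, round(prob, 3), round(score, 3))
```

In this toy example the route through the hub has stronger edges but is still avoided, because the hub's low node probability dominates the product.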

Network analysis in Escherichia coli

To use PheNetic on E. coli datasets, we compiled a comprehensive interaction network for this organism from publicly available omics datasets and predictions (see Materials and methods). The network, consisting of 16 794 physical or metabolic interactions between 3063 nodes, covers protein–protein, transcriptional and metabolic interactions.

This network was used in combination with PheNetic to reanalyse the KO compendium published by Stincone et al. This dataset [19] profiles the expression of 27 E. coli KO strains, known to be involved in acid resistance (referred to as causes). For each KO strain, expression was compared to that of the wild type strain under similar conditions [19], resulting in lists of genes differentially expressed between wild type and KO strain (referred to as effects). As all mutated genes were supposed to be involved in the same acid resistance phenotype, the cause–effect pairs of the 27 different experiments were pooled in a single list of cause–effect pairs which was then interpreted by means of the interaction network (see Materials and methods).

To optimize parameter selection and test algorithmic performance, two benchmark sets were defined consisting of genes previously associated with acid resistance in E. coli. A first stringent, but small benchmark consisting of 53 genes was based on literature curated information. A second more relaxed benchmark was composed of genes, reported to be

Fig. 1 Schematic overview of the experimental setup. KO genes are referred to as causes and differentially expressed genes as effects (left panel). PheNetic connects causes to effects over an interaction network derived from publicly available data (middle panel) by searching for paths in the interaction network that connect causes (blue dots) to as many possible effects (red and green nodes) in the most parsimonious way (right panel). PheNetic allows extracting from the global interaction network, the molecular mechanism that is activated by the KO experiment.


SBO Research Proposal

Network-based approaches for the identification and mode of action determination of anti-bacterial agents

KU Leuven

Ghent University

Page 15: 2 partners ed_kickoff_dtai

Transportation: Energy:


http://icon-fet.eu

Integrate constraints and data mining to dynamically optimise:
- Public transportation schedules
- Energy distribution

Page 16: 2 partners ed_kickoff_dtai

Resource Efficient Machine Learning


Figure 24.2.1: (left) Architectural representation of voice activity detector detailing hierarchical information extraction (right) energy consumption at different levels of hierarchy.

Figure 24.2.2: Schematic representation of (top) Wakeup detector (bottom) Analog feature extractor

Figure 24.2.3: (top) Measured response of Wakeup to audio input (bottom left) measured band frequency response and (bottom right) measured performance summary of analog feature extraction block and energy detector

Figure 24.2.4: (left) Schematic and decision tree algorithm for mixed-signal classifier (right) Measurement results for HR speech / Non speech for different contexts.

Figure 24.2.5: Measured power consumption and Speech / Non Speech Hit rates for different operating modes and contexts

Figure 24.2.6: Comparison to state-of-the-art.

Demonstration proposal for

“Context-aware hierarchical information-sensing in a 6 µW 90nm CMOS voice activity detector”

Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst; KU Leuven

We want to propose a demo of our chip for the academic demo session at ISSCC, which will highlight the power savings achieved in the sensor interface through hierarchical sensing, and through embedded cost- and context-aware classification. To this end, a little board carrying our chip and the micro-processor will be fed with pre-recorded, as well as live audio signals. The sensing system will constantly do voice activity detection on the incoming signals. A GUI on a connected PC will at the same time give more information to the bystanders, being:

- On the input signals: the noise context, the recorded audio signal and the current acoustic SNR
- On the system’s operating mode:
  - The parameters of the decision tree programmed on the chip
  - Which blocks in the chip are on, and which ones are off. This demonstrates the automatic run-time scalability of the proposed system.
- On the live performance results:
  - The current (measured) power consumption of the chip
  - The current achieved detection accuracy (voice miss detects and false alarms)
  - The current classification output of the system, as well as the averaged accuracy.

We can then dynamically play with this setup, to show its power saving potential, as well as the trade-offs involved. E.g.:

- Dynamically shut down expensive frequency features, to show the resulting power savings, yet also the resulting decrease in accuracy
- Dynamically change noise context, resulting in a drop of accuracy, which is restored when other feature banks (more relevant for this context) are switched in.

Fig. 1: System setup with chip and μP and the GUI demonstrating the current activated blocks, DT and performance for live audio or prerecorded audio.

Reduce:
- features’ energy consumption
- features’ collection cost
- classification cost
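The idea of hierarchical, cost-aware sensing can be sketched as a two-stage cascade: a cheap wake-up check on signal energy decides whether the more expensive frequency-band features are computed at all. This is only an illustration, not the chip's actual classifier: the thresholds, the cost units, the frame data, and the use of a simple mean-of-bands test in place of the programmed decision tree are all assumptions.

```python
def classify_frame(energy, band_features, wake_threshold, band_threshold, costs):
    """Return (is_speech, energy_spent) for one audio frame."""
    spent = costs["wakeup"]
    if energy < wake_threshold:
        # Cheap stage: frame too quiet, expensive features stay switched off.
        return False, spent
    spent += costs["bands"]  # costly stage runs only after wake-up
    is_speech = sum(band_features) / len(band_features) > band_threshold
    return is_speech, spent

# Hypothetical per-stage costs in arbitrary units.
costs = {"wakeup": 1, "bands": 10}

# Hypothetical frames: (signal energy, frequency-band features).
frames = [(0.05, [0.1, 0.2]), (0.9, [0.8, 0.7]),
          (0.02, [0.0, 0.1]), (0.7, [0.2, 0.1])]

results = [classify_frame(e, bands, 0.5, 0.5, costs) for e, bands in frames]
decisions = [is_speech for is_speech, _ in results]
hierarchical_cost = sum(spent for _, spent in results)
always_on_cost = len(frames) * (costs["wakeup"] + costs["bands"])

print("decisions:", decisions)
print("hierarchical cost:", hierarchical_cost, "vs always-on:", always_on_cost)
```

The saving grows with the fraction of quiet frames, since those never pay for the expensive feature stage.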

Efficient Diagnostics Smart Hardware

[Figure: problem statement, sequence: walking, …, stairs up; noise, noise, noise]

Figure 24.2.7: Chip micrograph highlighting different sections

Sensor Data

Page 17: 2 partners ed_kickoff_dtai

Fig. 6: Skeleton estimates after cylinder correction, hierarchical particle filter (particle filter in white, NITE in blue).

Fig. 7: Skeleton estimates after cylinder correction, non-hierarchical particle filter (particle filter in white, NITE in blue).

Sports Analytics


http://dtai.cs.kuleuven.be/sports

Soccer, Basketball, Runners, Rehabilitation

Page 18: 2 partners ed_kickoff_dtai

Social Profit


Page 19: 2 partners ed_kickoff_dtai

Educational Tools


http://eng.kuleuven.be/innovationlab

Page 20: 2 partners ed_kickoff_dtai

http://dtai.cs.kuleuven.be