Learning the Structure of Related Tasks

Learning the Structure of Related Tasks, by A. Niculescu-Mizil and R. Caruana. Presented by Lihan He, Machine Learning Reading Group, Duke University, 02/03/2006.


Page 1: Learning the Structure of Related Tasks

Learning the Structure of Related Tasks

Presented by Lihan He

Machine Learning Reading Group

Duke University

02/03/2006

A. Niculescu-Mizil, R. Caruana

Page 2: Learning the Structure of Related Tasks

Outline

Introduction

Learning single Bayes networks from data

Learning from related tasks

Experimental results

Conclusions

Page 3: Learning the Structure of Related Tasks

Introduction

Graphical model:

Nodes represent random variables; edges represent dependencies.

Undirected graphical model: Markov network

Directed graphical model: Bayesian network

[Figure: example Bayesian network over x1, x2, x3, x4, with edges x1→x2, x1→x3, x2→x4 and x3→x4]

Causal relationships between nodes;

Directed acyclic graph (DAG) : No directed cycles allowed;

B={G,θ}

$$p(X_1, X_2, X_3, X_4) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_1)\, p(X_4 \mid X_2, X_3)$$
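As a small illustration of this factorization, the sketch below multiplies the four conditional terms for binary variables and checks that the result is a valid joint distribution. The probability tables and the helper names (`bernoulli`, `joint`) are made up for the example and do not come from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
# Each entry is P(X = 1 | parents); the numbers are invented for illustration.
p_x1 = 0.6
p_x2_given_x1 = {0: 0.2, 1: 0.7}
p_x3_given_x1 = {0: 0.5, 1: 0.1}
p_x4_given_x2_x3 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}


def bernoulli(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) is `p_one`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x1, x2, x3, x4):
    """p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2, x3)."""
    return (bernoulli(p_x1, x1)
            * bernoulli(p_x2_given_x1[x1], x2)
            * bernoulli(p_x3_given_x1[x1], x3)
            * bernoulli(p_x4_given_x2_x3[(x2, x3)], x4))


# Summing the factorized joint over all 16 assignments should give 1.
total = sum(joint(*x) for x in product([0, 1], repeat=4))
print(total)
```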

Page 4: Learning the Structure of Related Tasks

Introduction

Goal: simultaneously learn Bayes Net structures for multiple tasks.

Different tasks are related;

Structures might be similar, but not identical.

Example: gene expression data.

1) Learn a single structure from data.

2) Generalize to multi-task learning by placing a joint prior on the structures.

Page 5: Learning the Structure of Related Tasks

Single Bayesian network learning from data

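Bayes network B={G, θ} over a set of n random variables X={X1, X2,…, Xn}

Joint probability P(X) can be factorized as

$$P(X) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$

where Pa(Xi) denotes the parents of Xi in G.

Given a dataset D={x1, x2, …, xm}, where each xi = (x1, x2, …, xn) is one observation of the n variables, we can learn the structure G and the parameters θ from D.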

Page 6: Learning the Structure of Related Tasks

Single Bayesian network learning from data

Model selection: find the structure G with the highest posterior P(G|D) among all possible G

Exhaustive search over all possible G is infeasible:

n=4, there are 543 possible DAGs

n=10, there are O(10^18) possible DAGs

Question: how can we search for the best structure among the huge number of possible DAGs?
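The DAG counts quoted on this slide can be reproduced with Robinson's recurrence for the number of labeled DAGs on n nodes. The recurrence is a standard combinatorial result that the slides do not mention; the function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in (4, 10):
    print(n, count_dags(n))
```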

Page 7: Learning the Structure of Related Tasks

Single Bayesian network learning from data

Algorithm:

1) Randomly generate an initial DAG and evaluate its score;

2) Evaluate the scores of all the neighbors of the current DAG;

3) while {some neighbor has a higher score than the current DAG}

       move to the neighbor that has the highest score;

       evaluate the scores of all the neighbors of the new DAG;

   end

4) Repeat (1) - (3) a number of times, starting from a different DAG every time.
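The greedy search with random restarts described on this slide can be sketched as follows. This is a minimal sketch, not the authors' implementation: the score is a toy stand-in (negative number of edges that differ from a hypothetical `TARGET` graph) rather than a posterior, and the toy neighbor function only toggles edges i→j with i<j, which keeps every candidate acyclic without an explicit cycle check.

```python
import random
from itertools import combinations


def hill_climb(start, neighbors, score):
    """Steps 1-3: move to the best neighbor until no neighbor improves the score."""
    current, current_score = start, score(start)
    while True:
        scored = [(score(g), g) for g in neighbors(current)]
        if not scored:
            return current, current_score
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score


def with_restarts(random_start, neighbors, score, restarts, rng):
    """Step 4: repeat the climb from different random DAGs and keep the best result."""
    results = [hill_climb(random_start(rng), neighbors, score) for _ in range(restarts)]
    return max(results, key=lambda pair: pair[1])


# Toy setup (all names and values below are hypothetical).
PAIRS = list(combinations(range(4), 2))             # allowed edges i -> j with i < j
TARGET = frozenset({(0, 1), (0, 2), (1, 3), (2, 3)})


def toy_score(graph):
    return -len(graph ^ TARGET)                      # 0 when graph equals TARGET


def toy_neighbors(graph):
    return [graph ^ {edge} for edge in PAIRS]        # add or remove one edge


def random_start(rng):
    return frozenset(edge for edge in PAIRS if rng.random() < 0.5)


best, best_score = with_restarts(random_start, toy_neighbors, toy_score,
                                 restarts=5, rng=random.Random(0))
print(sorted(best), best_score)
```

With a real score the surface has local maxima, which is why the restarts in step 4 matter; the toy score here has a single maximum, so every restart reaches it.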

Page 8: Learning the Structure of Related Tasks

Single Bayesian network learning from data

Neighbors of a structure G: the set of all DAGs that can be obtained by adding, removing or reversing a single edge in G.

Every neighbor must satisfy the acyclicity constraint.

[Figure: five example DAGs over the nodes x1, x2, x3, x4]
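The neighbor definition on this slide can be made concrete with a short enumeration. The sketch below is illustrative only: graphs are represented as sets of (parent, child) pairs, the function names are chosen here, and the example graph is the four-node network from the introduction.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in turn."""
    indegree = {v: 0 for v in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for parent, child in edges:
            if parent == v:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return removed == len(nodes)


def neighbors(nodes, edges):
    """All DAGs reachable by adding, removing or reversing exactly one edge."""
    edges = frozenset(edges)
    result = set()
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in edges:
                result.add(edges - {(a, b)})                     # removal never creates a cycle
                flipped = (edges - {(a, b)}) | {(b, a)}
                if is_acyclic(nodes, flipped):
                    result.add(flipped)                          # reversal
            elif (b, a) not in edges:
                added = edges | {(a, b)}
                if is_acyclic(nodes, added):
                    result.add(added)                            # addition
    return result


nodes = ["x1", "x2", "x3", "x4"]
graph = {("x1", "x2"), ("x1", "x3"), ("x2", "x4"), ("x3", "x4")}
print(len(neighbors(nodes, graph)))
```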

Page 9: Learning the Structure of Related Tasks

Learning from related tasks

Given iid datasets D1, D2, …, Dk,

simultaneously learn the networks B1={G1, θ1}, B2={G2, θ2}, …, Bk={Gk, θk}

Structures (G1, G2, …, Gk) are similar, but not identical

Page 10: Learning the Structure of Related Tasks

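Learning from related tasks

One more assumption: the parameters of different networks are independent:

$$P(\theta_1, \ldots, \theta_k \mid G_1, \ldots, G_k) = \prod_{s=1}^{k} P(\theta_s \mid G_s)$$

This is not true in general, but it makes structure learning more efficient. Since we focus on structure learning, not parameter learning, this is acceptable.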

Page 11: Learning the Structure of Related Tasks

Learning from related tasks

Prior:

If structures are not related: G1,…,Gk are independent a priori

Structures are learned independently for each task.

If structures are identical: $p(G_1, \ldots, G_k) = c \cdot \mathbb{1}(G_1 = \cdots = G_k)$

Learning the same structure: augment the variables with a task variable TSK,

$$(X_1, X_2, \ldots, X_n) \rightarrow (X_1, X_2, \ldots, X_n, \mathrm{TSK}), \quad \mathrm{TSK} \in \{1, 2, \ldots, k\}$$

Learn a single structure under the restriction that TSK is always a parent of all the other nodes.

Common structure: remove the node TSK and all the edges connected to it.
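The TSK construction amounts to simple bookkeeping around an ordinary single-task learner, which the sketch below illustrates. The structure learner itself is not shown; the function names, the toy datasets and the `learned` edge set are hypothetical.

```python
def pool_with_task_variable(datasets):
    """Append the task index (1..k) as an extra TSK column and pool all rows."""
    pooled = []
    for task_id, dataset in enumerate(datasets, start=1):
        for row in dataset:
            pooled.append(tuple(row) + (task_id,))
    return pooled


def strip_task_variable(edges, tsk="TSK"):
    """Recover the common structure: drop TSK and every edge touching it."""
    return {(a, b) for (a, b) in edges if tsk not in (a, b)}


# Two toy datasets over (X1, X2, X3); the values are invented.
d1 = [(0, 1, 1), (1, 0, 0)]
d2 = [(1, 1, 0)]
pooled = pool_with_task_variable([d1, d2])
print(pooled)

# A hypothetical structure learned on the pooled data, with TSK as a parent of every node.
learned = {("TSK", "X1"), ("TSK", "X2"), ("TSK", "X3"), ("X1", "X2")}
common = strip_task_variable(learned)
print(common)
```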

Page 12: Learning the Structure of Related Tasks

Learning from related tasks

Prior:

Between independent and identical:

Penalize each edge (Xi, Xj) that differs between the two DAGs

δ=0: independent

δ=1: identical

0<δ<1

For the k task prior
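The effect of the penalty can be sketched numerically. The code below is an assumption, not the prior from the slides: it charges a factor (1 − δ) for every directed edge present in one DAG but not the other, and for k tasks it sums the log-penalty over all task pairs. Normalization constants and the per-task structure priors are left out. The form is chosen only because it reproduces the two endpoints stated above (no penalty at δ=0, zero probability for any difference at δ=1).

```python
import math
from itertools import combinations


def log_pair_penalty(g1, g2, delta):
    """log of (1 - delta) ** (number of directed edges on which g1 and g2 differ)."""
    n_diff = len(set(g1) ^ set(g2))
    if n_diff == 0:
        return 0.0
    if delta >= 1.0:
        return -math.inf          # delta = 1 forbids any difference
    return n_diff * math.log(1.0 - delta)


def log_joint_penalty(graphs, delta):
    """Assumed k-task extension: sum the pairwise log-penalties."""
    return sum(log_pair_penalty(a, b, delta) for a, b in combinations(graphs, 2))


g_a = {("x1", "x2"), ("x1", "x3")}
g_b = {("x1", "x2")}
for delta in (0.0, 0.5, 1.0):
    print(delta, log_joint_penalty([g_a, g_b, g_b], delta))
```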

Page 13: Learning the Structure of Related Tasks

Learning from related tasks

Model selection: find the configuration with the highest posterior P(G1,…,Gk|D1,…,Dk)

Same idea as single task structure learning.

Question: what is a neighbor of (G1,…,Gk)?

Def 1: $\mathrm{neighbor}(G_1) \times \mathrm{neighbor}(G_2) \times \cdots \times \mathrm{neighbor}(G_k)$

Size of neighbors: O(n^{2k})

Def 2: Def 1 + one more constraint:

All the changes of edges happen between the same two nodes for all DAGs in (G1,…,Gk)

Size of neighbors: O(n^2 3^k)
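To get a feel for the gap between the two neighborhood sizes, the sketch below evaluates both growth rates with all constant factors ignored. The function names are chosen here, and the printed values are orders of magnitude rather than exact neighbor counts.

```python
def def1_size(n, k):
    """Growth of Def 1: about n^2 choices per task, multiplied over k tasks."""
    return (n * n) ** k


def def2_size(n, k):
    """Growth of Def 2: n^2 node pairs times 3^k joint edge states for that pair."""
    return n * n * 3 ** k


for k in (1, 2, 5):
    print(k, def1_size(10, k), def2_size(10, k))
```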

Page 14: Learning the Structure of Related Tasks

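Learning from related tasks

Acceleration:

At each iteration, the algorithm must find the best score among a set of neighbors.

It is not necessary to search all the elements of this set.

Partial configuration $C_i = (G_1, G_2, \ldots, G_i)$: the first i tasks are specified and the remaining k-i tasks are not specified.

[Equation: upper bound on the score of a partial configuration, built from the pairwise prior terms P(G_p, G_q) over the specified tasks and P(Ĝ_r, Ĝ_s) over the unspecified tasks, and from the likelihood terms P(D_p | G_p) for p ≤ i and P(D_r | Ĝ_r) for r > i]

This is the upper bound of the neighbor subset $(G_1, G_2, \ldots, G_i, \hat{G}_{i+1}, \ldots, \hat{G}_k)$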
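The pruning idea can be illustrated with a generic branch-and-bound over partial configurations. This is a sketch under stated assumptions, not the bound from the slides: the score is a sum of per-task log-likelihoods plus pairwise log-prior terms that are never positive, so an optimistic bound for the unspecified tasks is their best possible likelihood with zero prior penalty. The names `branch_and_bound`, `log_lik`, `log_pair_prior` and the toy data are hypothetical.

```python
import math
from itertools import product


def branch_and_bound(candidates, log_lik, log_pair_prior):
    """candidates[t] lists the candidate graphs for task t.

    log_lik(t, g) is a per-task log-likelihood; log_pair_prior(g, h) must be <= 0.
    """
    k = len(candidates)
    best_lik = [max(log_lik(t, g) for g in candidates[t]) for t in range(k)]
    # optimistic[i] bounds the total contribution of the unspecified tasks i..k-1.
    optimistic = [0.0] * (k + 1)
    for t in range(k - 1, -1, -1):
        optimistic[t] = optimistic[t + 1] + best_lik[t]
    best = {"score": -math.inf, "config": None}

    def extend(partial, partial_score):
        i = len(partial)
        if partial_score + optimistic[i] <= best["score"]:
            return                                   # no completion can beat the incumbent
        if i == k:
            best["score"], best["config"] = partial_score, tuple(partial)
            return
        for g in candidates[i]:
            gain = log_lik(i, g) + sum(log_pair_prior(h, g) for h in partial)
            extend(partial + [g], partial_score + gain)

    extend([], 0.0)
    return best["config"], best["score"]


# Toy problem: three tasks, graphs are subsets of three possible edges.
EDGES = [("x1", "x2"), ("x1", "x3"), ("x2", "x4")]
all_graphs = [frozenset(e for e, keep in zip(EDGES, bits) if keep)
              for bits in product([0, 1], repeat=len(EDGES))]
targets = [frozenset(EDGES), frozenset(EDGES[:2]), frozenset(EDGES[:1])]
candidates = [all_graphs] * 3


def log_lik(t, g):
    return -2.0 * len(g ^ targets[t])                # invented stand-in for log P(D_t | g)


def log_pair_prior(g, h):
    return len(g ^ h) * math.log(0.5)                # penalty per differing edge


config, score = branch_and_bound(candidates, log_lik, log_pair_prior)
print(score)
```

The bound is valid only because the pairwise terms cannot increase the score; a prior with positive log terms would need a different optimistic estimate.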

Page 15: Learning the Structure of Related Tasks

Results

Start from an original network and delete edges with probability Pdel to create 5 related tasks.

1000 data points.

10 trials
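The task-generation step of this setup can be sketched as below, assuming each edge of the original network is deleted independently with probability Pdel (the slide does not spell this out). The function name and the example network are hypothetical. Deleting edges from a DAG can never introduce a cycle, so every generated task is still a DAG.

```python
import random


def make_related_tasks(edges, p_del, n_tasks, rng):
    """Create n_tasks structures by deleting each edge independently with probability p_del."""
    return [frozenset(e for e in edges if rng.random() >= p_del)
            for _ in range(n_tasks)]


original = {("x1", "x2"), ("x1", "x3"), ("x2", "x4"), ("x3", "x4")}
tasks = make_related_tasks(original, p_del=0.2, n_tasks=5, rng=random.Random(0))
for task in tasks:
    print(sorted(task))
```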

Compute the KL-divergence and the edit distance between the learned structure and the true structure.

[Figure: two result panels, KL-divergence and edit distance]
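The two evaluation measures can be sketched as follows. Both definitions are assumptions made for illustration: the edit distance counts the edge additions, removals and reversals needed to turn the learned graph into the true one (a reversed edge counts once), and the KL-divergence is computed between two discrete distributions given as dictionaries, with the second assumed to be strictly positive.

```python
import math


def edit_distance(learned, true):
    """Number of single-edge additions, removals and reversals separating two graphs."""
    learned, true = set(learned), set(true)
    extra = learned - true
    missing = true - learned
    reversed_edges = {(a, b) for (a, b) in extra if (b, a) in missing}
    # Each reversed edge appears once in `extra` and once in `missing`.
    return len(extra) + len(missing) - len(reversed_edges)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same outcomes."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


true_graph = {("x1", "x2"), ("x1", "x3"), ("x2", "x4"), ("x3", "x4")}
learned_graph = {("x2", "x1"), ("x1", "x3"), ("x2", "x4"), ("x1", "x4")}
print(edit_distance(learned_graph, true_graph))

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(kl_divergence(p, q))
```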

Page 16: Learning the Structure of Related Tasks

Learning from related tasks