Complex Networks and Data Mining on genetic databases

Post on 25-Jan-2015

description

This is my presentation at WaFiS last year, on the use of complex networks for data mining in genetic databases.

Transcript of Complex Networks and Data Mining on genetic databases

Knowledge Discovery in Databases through Complex Networks: application to phylodynamics

Luiz Max F. de Carvalho

Scientific Computing Programme (PROCC), Fiocruz

Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)

WaFiS 2012

Knowledge Discovery in Databases (KDD)

Complex Networks

Example 1: Chitin pathway phylogeny

Example 2: Foot-and-mouth disease virus in South America

Who’s this guy?

Knowledge Discovery in Databases through Complex Networks: application to phylodynamics

Luiz Max F. de Carvalho

Scientific Computing Programme (PROCC), Fiocruz

Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)

WaFiS 2012

September 28, 2012

Outline

1 Knowledge Discovery in Databases (KDD)

2 Complex Networks

3 Example 1: Chitin pathway phylogeny

4 Example 2: Foot-and-mouth disease virus in South America

Knowledge Discovery in Databases (KDD)

Lots of data

The human brain has very limited processing capacity

Information → Knowledge

Increasing amount of molecular data (sequences, 3D structures, antigenicity, ...)

Is it possible to explore these databases to discover useful stuff?

Well... Let’s see

[We may use] Complex Networks

Graphs → $G = (V, E)$

Yeah, but how?

We can explore the "dynamic signature" of these Complex Networks, i.e., study and compare their structural properties. Some useful formulas:

Clustering coefficient: $\langle c \rangle = \dfrac{3 \times \#\text{triangles}}{\#\text{triples}}$

Degree distribution: $P_K = \sum_{K'=K}^{\infty} p_{K'}$

Diameter: $\max_{i,j} d(i, j)$
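The three quantities above can be sketched in plain Python. This is only an illustration, not the code behind the slides: the adjacency-dict representation, the function names and the toy graph are all assumptions made here, and the graph is taken to be undirected and unweighted.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Global clustering: 3 * (#triangles) / (#connected triples)."""
    closed = 0   # closed triples; each triangle is counted once per corner
    triples = 0  # all connected triples, counted at their centre node
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return closed / triples if triples else 0.0


def cumulative_degree_distribution(adj):
    """P_K = sum of p_K' over K' >= K, returned as a dict {K: P_K}."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in range(max(degrees) + 1)}


def diameter(adj):
    """Largest shortest-path length d(i, j) over reachable pairs (BFS per node)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(toy))
print(cumulative_degree_distribution(toy))
print(diameter(toy))
```

Counting closed triples at each centre node counts every triangle three times, which is exactly the factor 3 in the numerator of the clustering formula.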

OK, let’s work then

1 Grab $n$ sequences;

2 Create an $n \times n$ matrix using some kind of (normalized) distance (say, $S$);

3 For each $\sigma \in [0, 1]$ build $M(\sigma)$ such that:

$$m_{ij}(\sigma) = \begin{cases} 1 & \text{if } S_{ij} > \sigma, \\ 0 & \text{if } S_{ij} < \sigma. \end{cases}$$

In a sense, we are transforming a single network into a family of networks.
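The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy matrix are hypothetical, self-loops are dropped, and the tie case $S_{ij} = \sigma$ (which the slide leaves undefined) is mapped to 0.

```python
def threshold_network(S, sigma):
    """Binary adjacency matrix M(sigma): m_ij = 1 if S_ij > sigma, else 0.

    Assumptions (not stated on the slide): the diagonal is ignored and
    ties S_ij == sigma produce no edge.
    """
    n = len(S)
    return [[1 if i != j and S[i][j] > sigma else 0 for j in range(n)]
            for i in range(n)]


def network_family(S, sigmas):
    """One network per threshold value, keyed by sigma."""
    return {sigma: threshold_network(S, sigma) for sigma in sigmas}


# Hypothetical normalized 3 x 3 matrix S with entries in [0, 1].
S = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
family = network_family(S, [0.1, 0.4, 0.8])
for sigma, M in family.items():
    n_edges = sum(map(sum, M)) // 2  # symmetric matrix: each edge appears twice
    print(sigma, n_edges)
```

Raising $\sigma$ can only remove edges, so the family runs from a dense network at low thresholds to a sparse one at high thresholds.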

Analysis

We shall explore the relationships between these networks. First, define a higher-order neighborhood indicator function, such that you binarize the adjacency matrix with regard to the path length $\ell$, obtaining a matrix $M = \sum_{\ell=1}^{D} \ell\, M(\ell)$. Then

$$\delta(\alpha, \beta) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{m_{ij}(\alpha)}{D(\alpha)} - \frac{m_{ij}(\beta)}{D(\beta)} \right) \qquad (1)$$

Evaluating $\delta(\sigma, \sigma + \Delta\sigma)$ can give some interesting insights.
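A rough sketch of Eq. (1), assuming that $D$ is the network diameter and that the entries $m_{ij}$ in Eq. (1) are those of the neighborhood matrix, i.e. the shortest-path length between $i$ and $j$. Both readings are assumptions, as are the function names, the toy networks and the choice to give unreachable pairs the entry 0. The code follows the formula as written on the slide, which is a signed difference.

```python
from collections import deque


def neighborhood_matrix(A):
    """Return (M, D) for a binary adjacency matrix A.

    M[i][j] is the shortest-path length between i and j, which is what
    sum_l l * M(l) gives when M(l) marks pairs at path length l.
    Unreachable pairs and the diagonal are set to 0 (an assumption).
    D is the largest entry of M, i.e. the diameter.
    """
    n = len(A)
    M = [[0] * n for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if A[u][w] and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for t, d in dist.items():
            M[s][t] = d
    D = max(max(row) for row in M)
    return M, D


def delta(A_alpha, A_beta):
    """Eq. (1) as transcribed: mean signed difference of diameter-normalized entries."""
    M_a, D_a = neighborhood_matrix(A_alpha)
    M_b, D_b = neighborhood_matrix(A_beta)
    n = len(A_alpha)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a = M_a[i][j] / D_a if D_a else 0.0  # empty network: all entries 0
            b = M_b[i][j] / D_b if D_b else 0.0
            total += a - b
    return total / n ** 2


# Hypothetical networks on three nodes: a path 0-1-2 and a triangle.
A_path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(delta(A_path, A_tri))
```

In practice `A_alpha` and `A_beta` would be the networks obtained at two neighboring thresholds $\sigma$ and $\sigma + \Delta\sigma$, so that a large value of $\delta$ flags a threshold at which the structure changes abruptly.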

Example 1: Chitin pathway phylogeny

Proteins related to the chitin metabolic pathway from 1605 complete genomes;

BLAST distances (which are asymmetric);

Search for phylogenetic relationships

Example 1: Some results

Example 1: Some more results

Example 1: The expected Network(s)

Example 2: Foot-and-mouth disease virus in South America

$S$ was built with phylogenetic (TN93) distances for nucleotide (NT) sequences and JTT distances for amino acid (AA) sequences;

Try to make sense of a somewhat big data set (167 sequences);

Extract some nice patterns;

Indexes × σ

[Figure: panels (a) and (b)]

A nice network

Some more developments

Related Work

Identify transmission clusters (HIV, HCV) (Lewis et al., 2008, PLoS Medicine)

Explore scale-free behavior in phylodynamics (Shiino, 2012, Frontiers in Microbiology)

Future Directions

Explore the spatial aspect in the construction of S

Maybe S = µ + S(G)α

Power law analysis

Implement assortativity

Suggestions...

Thank You!