Complex Networks and Data Mining on genetic databases
Who’s this guy?
Knowledge Discovery in Databases through Complex Networks: application to phylodynamics
Luiz Max F. de Carvalho
Scientific Computing Programme (PROCC), Fiocruz
Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)
WaFiS 2012
September 28, 2012
Outline
1 Knowledge Discovery in Databases (KDD)
2 Complex Networks
3 Example 1: Chitin pathway phylogeny
4 Example 2: Foot-and-mouth disease virus in South America
Knowledge Discovery in Databases (KDD)
Lots of data
The human brain has very limited processing capacity
Information → Knowledge
Increasing amount of molecular data (sequences, 3D structures, antigenicity, ...)
Is it possible to explore these databases to discover useful stuff?
Well... Let’s see
[We may use] Complex Networks
Graphs → G = (V, E)
Yeah, but how?
We can explore the "dynamic signature" of these Complex Networks, i.e., study and compare their structural properties.
Some useful formulas:
Clustering coefficient: $\langle c \rangle = \dfrac{3 \times \#\text{triangles}}{\#\text{triples}}$
Degree distribution: $P_K = \sum_{K'=K}^{\infty} p_{K'}$
Diameter: $\max(d(i, j))$
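A minimal sketch of how the three indexes could be computed, assuming the network is given as a symmetric 0/1 adjacency matrix (a list of lists). The function names and the toy graph are hypothetical and are not taken from the talk; the diameter here ignores unreachable pairs, which is also an assumption.

```python
from collections import deque
from itertools import combinations


def neighbors(adj):
    """Turn a symmetric 0/1 adjacency matrix into neighbor sets (self-loops ignored)."""
    n = len(adj)
    return [{j for j in range(n) if adj[i][j] and j != i} for i in range(n)]


def clustering_coefficient(adj):
    """3 * #triangles / #triples; returns 0.0 when the graph has no triples."""
    nbrs = neighbors(adj)
    # A triple is a pair of neighbors sharing a centre node.
    triples = sum(len(s) * (len(s) - 1) // 2 for s in nbrs)
    # A triple is closed when its two end nodes are linked; every triangle
    # closes three triples (one per corner), so this count is 3 * #triangles.
    closed = sum(
        1 for s in nbrs for u, v in combinations(sorted(s), 2) if v in nbrs[u]
    )
    return closed / triples if triples else 0.0


def cumulative_degree_distribution(adj):
    """P_K = sum over K' >= K of p_K', i.e. the fraction of nodes with degree >= K."""
    degrees = [len(s) for s in neighbors(adj)]
    n = len(degrees)
    return {k: sum(d >= k for d in degrees) / n for k in range(max(degrees) + 1)}


def diameter(adj):
    """Largest finite shortest-path length d(i, j), via BFS from every node."""
    nbrs = neighbors(adj)
    best = 0
    for source in range(len(nbrs)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(clustering_coefficient(toy))
print(cumulative_degree_distribution(toy))
print(diameter(toy))
```

On the toy graph there is one triangle and five triples, so the clustering coefficient is 3/5.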
Ok, Let’s work then
1 Grab n sequences;
2 Create an n × n matrix using some kind of (normalized) distance (say, S);
3 For each σ ∈ [0, 1] build M(σ) such that:
$$m_{ij}(\sigma) = \begin{cases} 1 & \text{if } S_{ij} > \sigma, \\ 0 & \text{if } S_{ij} < \sigma. \end{cases}$$
In a sense, we are transforming a single network into a family of networks.
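The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: the matrix `S` below is made up, ties (S_ij exactly equal to σ, a case the slide leaves undefined) are mapped to 0, the diagonal is excluded so there are no self-loops, and σ is swept over a uniform grid. All function names are hypothetical.

```python
def threshold_network(S, sigma):
    """Build M(sigma): m_ij = 1 if S_ij > sigma, else 0.

    Assumptions: ties (S_ij == sigma) give 0, and the diagonal is set to 0.
    """
    n = len(S)
    return [
        [1 if i != j and S[i][j] > sigma else 0 for j in range(n)]
        for i in range(n)
    ]


def network_family(S, steps=10):
    """Return [(sigma, M(sigma))] for sigma = 0, 1/steps, ..., 1."""
    return [(k / steps, threshold_network(S, k / steps)) for k in range(steps + 1)]


def edge_count(M):
    """Number of undirected edges in a symmetric 0/1 matrix."""
    return sum(sum(row) for row in M) // 2


# Hypothetical normalized matrix for three sequences.
S = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.2],
    [0.4, 0.2, 1.0],
]

for sigma, M in network_family(S, steps=5):
    print(sigma, edge_count(M))
```

Because the rule only ever removes edges as σ grows, the edge count along the family is non-increasing, which is a quick sanity check on any implementation.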
Analysis
We shall explore the relationships between these networks:
First, define a higher-order neighborhood indicator function, such that you binarize the adjacency matrix with regard to the path length $\ell$, obtaining a matrix $M = \sum_{\ell=1}^{D} \ell\, M(\ell)$. Then
$$\delta(\alpha, \beta) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{m_{ij}(\alpha)}{D(\alpha)} - \frac{m_{ij}(\beta)}{D(\beta)} \right) \qquad (1)$$
Evaluating $\delta(\sigma, \sigma + \Delta\sigma)$ can give some interesting insights.
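A rough sketch of equation (1), implemented literally as printed on the slide (signed differences, so terms may cancel). It assumes that the entry m_ij of the neighborhood matrix is the shortest-path length between i and j, that D is the largest such length, and that unreachable pairs and the diagonal contribute 0. None of these conventions is spelled out in the talk, and the function names are hypothetical.

```python
from collections import deque


def neighborhood_matrix(adj):
    """Return (M, D) for a 0/1 adjacency matrix.

    M[i][j] is the shortest-path length between i and j (0 on the diagonal
    and, by assumption, for unreachable pairs); D is the largest entry of M.
    """
    n = len(adj)
    M = [[0] * n for _ in range(n)]
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v != u and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            M[source][v] = d
    D = max(max(row) for row in M)
    return M, D


def delta(adj_alpha, adj_beta):
    """Equation (1) as printed: mean over (i, j) of m_ij(a)/D(a) - m_ij(b)/D(b)."""
    Ma, Da = neighborhood_matrix(adj_alpha)
    Mb, Db = neighborhood_matrix(adj_beta)
    n = len(Ma)
    # Assumption: a network without edges (D == 0) contributes zeros.
    scale_a = 1 / Da if Da else 0.0
    scale_b = 1 / Db if Db else 0.0
    total = sum(
        Ma[i][j] * scale_a - Mb[i][j] * scale_b
        for i in range(n)
        for j in range(n)
    )
    return total / n**2


# Hypothetical toy networks on three nodes: a path 0-1-2 and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(delta(path, path))
print(delta(path, triangle))
```

To evaluate δ(σ, σ + Δσ), one would call `delta` on the adjacency matrices of two consecutive thresholds in the family of networks.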
Example 1: Chitin pathway phylogeny
Proteins related to the chitin metabolic pathway from 1605 complete genomes;
BLAST distances (which are asymmetric);
Search for phylogenetic relationships
Example 1: Some results
Example 1: Some more results
Example 1: The expected Network(s)
Example 2: Foot-and-mouth disease virus in South America
S was built with phylogenetic distances: TN93 for nucleotides (NT) and JTT for amino acids (AA);
Try to make sense of a somewhat big data set (167 sequences);
Extract some nice patterns.
Indexes × σ (figure with panels (a) and (b))
A nice network
Some more developments
Related Work
Identify transmission clusters (HIV, HCV) (Lewis et al., 2008, PLoS Medicine)
Explore scale-free behavior in phylodynamics (Shiino, 2012, Frontiers in Microbiology)
Future Directions
Explore the spatial aspect in the construction of S
Maybe S = µ + S(G)α
Power law analysis
Implement assortativity
Suggestions...
Thank You!