Bioinformatic analysis of ‘omic’ data in genetic...
Transcript of Bioinformatic analysis of ‘omic’ data in genetic...
![Page 1: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/1.jpg)
Jornades de consultoria estadística i software IIUAB, Octubre 2013
Bioinformatic analysis of ‘omic’ datain genetic epidemiology studies
www.creal.cat
Jornades de consultoria estadística i software IIUAB, Octubre 2013
Juan R GonzalezBRGE – Bioinformatics Research Group in EpidemiologyCenter for Research in Environmental Epidemiology (CREAL)
![Page 2: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/2.jpg)
OMICS
DISEASOME (PHENOTYPE)DISEASOME (PHENOTYPE)
All the disordersand diseases ofan organism,viewed as awhole
www.creal.cat
All the disordersand diseases ofan organism,viewed as awhole
![Page 3: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/3.jpg)
OMICS
EPIGENOMEchanges in gene expression caused by mechanisms other than DNA sequence
tissue and time especific
EPIGENOMEchanges in gene expression caused by mechanisms other than DNA sequence
tissue and time especific
TRANSCRIPTOMEgene expression (RNA)tissue and time especific
TRANSCRIPTOMEgene expression (RNA)tissue and time especific
GENOMEhereditary information (DNA)
stable>99% equal between individuals
1.5% coding genes
GENOMEhereditary information (DNA)
stable>99% equal between individuals
1.5% coding genes
EXPOSOMEdynamic
diet, metalls, air pollution, stress…
EXPOSOMEdynamic
diet, metalls, air pollution, stress…
www.creal.cat
METAGENOME(metatranscriptome, virome…)
bacteria and virus1-3% body’s mass
trilions of microorganisms
METAGENOME(metatranscriptome, virome…)
bacteria and virus1-3% body’s mass
trilions of microorganisms
PROTEOMEtissue and time especific
PROTEOMEtissue and time especific
TRANSCRIPTOMEgene expression (RNA)tissue and time especific
TRANSCRIPTOMEgene expression (RNA)tissue and time especific
DISEASOME (PHENOTYPE)DISEASOME (PHENOTYPE)
METABOLOMEtissue and time especificMETABOLOMEtissue and time especific
![Page 4: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/4.jpg)
OMICS
Why OMICS?
To identifybiomarkers
exposuredisease
To understandmolecularmechanims
www.creal.cat
To identifybiomarkers
exposuredisease
![Page 5: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/5.jpg)
Metabolomic data [serum/plasma] Mass-spectometry data (up to 2500 metabolites)
Proteomic data [plasma] Quantification of up to 30 proteins
Transcriptomic data [DNA whole blood] Gene expression and splice variant expression Also information about regulatory non-coding RNAs(snc-RNAs and miRnAs)
Epigenomic data [DNA whole blood] DNA methylation data
Genomic data [DNA whole blood] (Existing GWAS) SNP array data
OMICS
www.creal.cat
Metabolomic data [serum/plasma] Mass-spectometry data (up to 2500 metabolites)
Proteomic data [plasma] Quantification of up to 30 proteins
Transcriptomic data [DNA whole blood] Gene expression and splice variant expression Also information about regulatory non-coding RNAs(snc-RNAs and miRnAs)
Epigenomic data [DNA whole blood] DNA methylation data
Genomic data [DNA whole blood] (Existing GWAS) SNP array data
![Page 6: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/6.jpg)
SNP array data (GWAS)
Imputed will be analyzedSNP -> 4,000,000 markers
GENOMICS
www.creal.cat
Can bepredictedusing existingGWAS data
![Page 7: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/7.jpg)
samplesge
nes
CasesControls
MOLECULAR PROFILES
www.creal.cat
gene
s
Pathway 1
![Page 8: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/8.jpg)
Data pre-processing Quality control (filtering) Normalization (control for batcheffect)
Statistical analysis Clustering and enrichment/pathwayanalysis Visualization and Annotation
PIPELINE FOR OMIC DATA
www.creal.cat
Data pre-processing Quality control (filtering) Normalization (control for batcheffect)
Statistical analysis Clustering and enrichment/pathwayanalysis Visualization and Annotation
![Page 9: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/9.jpg)
Step 2: Statistical Analysis
RNA-seq: 0, 1, 2, ….., 2456, …, 34567, …Generalized linear models(Negative Binomial)
Transcriptome
www.creal.catMicroarray
RNA-seq
Microarrays: 6.1, 6.9, 12.4, 11.4, 8.5, …Generalized linear models(Gaussian)
![Page 10: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/10.jpg)
Step 2: Statistical Analysis
Peaks: 11234, 1353, 1234, 12, 455, 122, …Clustering methods(Non-parametric)
Metabolome
www.creal.cat
CpG islands: 1, 0, 1, 1, 0, 1, 0, …Beta regression(Beta distribution)
Epigenome
![Page 11: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/11.jpg)
Step 3: Enrichment/pathway analysis
Gene set (Under- or over-expressed)
www.creal.cat
![Page 12: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/12.jpg)
GO, KEGG, “Exposure-related”, …
Step 3: Enrichment/pathway analysis
www.creal.cat
![Page 13: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/13.jpg)
Step 4: Annotation and visualization
www.creal.cat
![Page 14: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/14.jpg)
Step 4: Annotation and visualization
5
2
0
2
Z-value
www.creal.cat
Controls
Cases
5
![Page 15: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/15.jpg)
BMI
SCZAsthma-obesity
Step 4: Annotation and visualization
www.creal.cat
IQ
BMIAsthma-obesity
![Page 16: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/16.jpg)
Softwarewww.bioconductor.org
www.creal.cat
![Page 17: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/17.jpg)
GENOMet Network (IP Juan R González, MTM2010-09526-E)
www.creal.cat/jrgonzalez/software.htm
Software
www.creal.cat
![Page 18: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/18.jpg)
Bioinformatics & Data Analysis
Software and statistical methods – SNP arraysSNPassoc (CRAN) – paper 190 cites [Bioinformatics] [SNPs]CNVassoc (CRAN) – 4th most viewed paper [BMC Genomics] [CNVs]R-GADA (R-forge) – Highly accessed [BMC Bioinformatics] [CNVs]inveRsion (Bioconductor) – Highly accessed [BMC Bioinformatics] [Inversions]invClust[Bioinformatics] [Inversions]BayesGen (CRAN) [Statistics in Medicine] [CNVs & SNPs]MLPAstats (CRAN) [BMC Bioinformatics] [CNVs]MAD (R-forge) [BMC Bioinformatics] [Mosaicisms]
Used to analyzed ~58,000 genomes [Nat Gen, 2012]
Software and statistical methods – SequencingtweeDEseq (Bioconductor) [BMC Bioinformatics] [RNAseq]GRIAL [Bioinformatics] [Inversions]MADseq (Bioconductor) [In progress] [Mosaicisms]RASP (Bioconductor) [In progress] [Alternative Splicing]
www.creal.cat
Software and statistical methods – SNP arraysSNPassoc (CRAN) – paper 190 cites [Bioinformatics] [SNPs]CNVassoc (CRAN) – 4th most viewed paper [BMC Genomics] [CNVs]R-GADA (R-forge) – Highly accessed [BMC Bioinformatics] [CNVs]inveRsion (Bioconductor) – Highly accessed [BMC Bioinformatics] [Inversions]invClust[Bioinformatics] [Inversions]BayesGen (CRAN) [Statistics in Medicine] [CNVs & SNPs]MLPAstats (CRAN) [BMC Bioinformatics] [CNVs]MAD (R-forge) [BMC Bioinformatics] [Mosaicisms]
Used to analyzed ~58,000 genomes [Nat Gen, 2012]
Software and statistical methods – SequencingtweeDEseq (Bioconductor) [BMC Bioinformatics] [RNAseq]GRIAL [Bioinformatics] [Inversions]MADseq (Bioconductor) [In progress] [Mosaicisms]RASP (Bioconductor) [In progress] [Alternative Splicing]
![Page 19: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/19.jpg)
Limitations
Data storage Computing time
www.creal.cat
Data storage Computing time
![Page 20: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/20.jpg)
Bioinformatics & Data Analysis
Infrastructure @ CREAL
3 workstations (one more by the end of the next month)WK1: CPU with 2GHz and 32G of RAM memory (8 cores)WK2: CPU with 2GHz and 32G of RAM memory (8 cores)WK3: CPU with 2GHz and 64G of RAM memory (24 cores)
Disk space1Tb (WK1) + 2Tb (WK1) + 6Tb (WK1)+ External device 14Tb
www.creal.cat
Infrastructure @ CREAL
3 workstations (one more by the end of the next month)WK1: CPU with 2GHz and 32G of RAM memory (8 cores)WK2: CPU with 2GHz and 32G of RAM memory (8 cores)WK3: CPU with 2GHz and 64G of RAM memory (24 cores)
Disk space1Tb (WK1) + 2Tb (WK1) + 6Tb (WK1)+ External device 14Tb
![Page 21: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/21.jpg)
Limitations
Cloud computing ???
www.creal.cat
![Page 22: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/22.jpg)
Challenges
www.creal.cat
![Page 23: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/23.jpg)
Challenges
www.creal.cat
![Page 24: Bioinformatic analysis of ‘omic’ data in genetic ...jornades.uab.cat/consultoriaestadistica/sites/... · Bioinformatic analysis of ‘omic’ data in genetic epidemiology studies](https://reader033.fdocuments.net/reader033/viewer/2022043004/5f86f55f73608e3e933968cf/html5/thumbnails/24.jpg)
Bioinformatics & Data Analysis
Human Resources
Luis Pérez-JuradoArmand Gutiérrez
Tonu EskoEva ReinmaaLili MilaniAndreas Mestpalu
www.creal.cat
Luis Pérez-JuradoArmand Gutiérrez
Benjamín Rodríguez-Santiago
Tonu EskoEva ReinmaaLili MilaniAndreas Mestpalu
Marta PuigMario Cáceres