interPopula: Database and tool integration for population genetics
With a focus on the HapMap project
Tiago Rodrigues Antao
http://popgen.eu/soft/interPop
Liverpool School of Tropical Medicine, UK
interPopula – p. 1
Preamble – the HapMap project (and UCSC Known Genes)
HapMap
The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors. The information produced by the Project will be made freely available.
http://hapmap.ncbi.nlm.nih.gov/
What is there?
11 pops, 90–180 individuals/pop (some cases with family trios), >3M SNPs
Frequencies (e.g. for population P and SNP S, there are 30% of As and 70% of Cs)
Genotypes (data per individual)
Phasing data
Pedigree info
LD (linkage disequilibrium) computations
Copy Number Variation (CNV) info – New!
A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-861. 2007.
UCSC Known Genes
A gene set constructed by an automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank
Inside the UCSC Genome Browser
http://genome.ucsc.edu/
Not only for humans (but options are limited, fewer than a handful of species)
Really useful for HapMap data (makes it much easier to relate SNPs to gene information than Entrez SNP does)
Hsu et al, Bioinformatics, 2006 22(9):1036-1046 (but see Genome Browser updates in NAR)
We now return to our regularly scheduled program – interPopula
Introduction – 1
A Python library to access HapMap and UCSC Known Genes data
A set of scripts providing integration examples: interPopula with Biopython, matplotlib, Genepop and Entrez SNP. Interaction with the ecology of PopGen databases and Python tools is encouraged
A set of guidelines to deal with inconsistencies across databases
Very easy to use, many examples
For Perl: Ensembl Variation API (Rios et al. BMC Bioinformatics 2010, 11:238)
Introduction – 2
Python (2.6) based. Test coverage very high
Uses sqlite (Python built-in, no extra dependencies)
Creates a local SQL database from ftp data files
Can be disk and network intensive
Intelligent download: on-demand, and never fetches the same data twice
Database not normalized (for performance and space reasons)
Family support (triage of offspring)
Data export (Genepop). X and Y aware.
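The "on-demand, never twice" idea from this slide can be sketched with the built-in sqlite3 module. This is only an illustration of the pattern, not interPopula's actual code: the table layout, the function names `make_db` and `require_chr_pop`, and the `fetch` callback (standing in for the FTP download) are all assumptions.

```python
import sqlite3


def make_db(path=":memory:"):
    """Create a local database with a bookkeeping table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded "
        "(chrom TEXT, pop TEXT, PRIMARY KEY (chrom, pop))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS freq "
        "(chrom TEXT, pop TEXT, rs TEXT, pos INTEGER)"
    )
    return conn


def require_chr_pop(conn, chrom, pop, fetch):
    """Load data for (chrom, pop) only if it is not already stored locally.

    `fetch` is a stand-in for the real download step and must return
    an iterable of (rs, pos) pairs. Returns True if a fetch happened.
    """
    cur = conn.execute(
        "SELECT 1 FROM loaded WHERE chrom = ? AND pop = ?", (chrom, pop)
    )
    if cur.fetchone() is not None:
        return False  # already in the local database, nothing to download
    rows = [(chrom, pop, rs, pos) for rs, pos in fetch(chrom, pop)]
    with conn:  # commit data and bookkeeping together
        conn.executemany("INSERT INTO freq VALUES (?, ?, ?, ?)", rows)
        conn.execute("INSERT INTO loaded VALUES (?, ?)", (chrom, pop))
    return True
```

Recording what has been loaded in the same transaction as the data itself means an interrupted download is simply retried on the next request.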
HapMap example
To get a feel for the interface...
# chr, pop, startPos and endPos are assumed to be defined beforehand
freqDB = Frequency()
freqDB.requireChrPop(chr, pop)
RSs = freqDB.getRSsForInterval(chr, startPos, endPos)
for rs in RSs:
    # We get frequency information
    freqSNP = freqDB.getPopSNPs(pop, rs)
    nuc1, nuc2 = freqSNP[5], freqSNP[6]
    a1a1, a2a2, a1a2 = \
        freqSNP[7], freqSNP[8], freqSNP[9]
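As a possible next step, the three genotype fields can be turned into allele frequencies. The sketch below assumes that `a1a1`, `a2a2` and `a1a2` are genotype counts (the slide names them but does not say so explicitly); `allele_frequencies` is a hypothetical helper, not part of interPopula.

```python
def allele_frequencies(a1a1, a2a2, a1a2):
    """Return (freq of allele 1, freq of allele 2) from genotype counts.

    Assumes the three arguments are counts of individuals with each
    genotype; each individual carries two alleles.
    """
    total_alleles = 2 * (a1a1 + a2a2 + a1a2)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    # Homozygotes contribute two copies, heterozygotes one copy.
    p = (2 * a1a1 + a1a2) / float(total_alleles)
    return p, 1.0 - p


# Made-up counts: 9 A/A, 49 C/C and 42 A/C individuals
p, q = allele_frequencies(9, 49, 42)
```

With these made-up counts the result is 30% for the first allele and 70% for the second, the same shape as the frequency example on the HapMap slide.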
UCSC Known Genes support
Everything is supported (not that much, just a long text file plus a link table)
Get different IDs (Accession ID, Prot ID, other links)
What is near a certain genomic position (chromosome and position in chromosome)
Get exons for a certain gene
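The last two queries can be illustrated without any library. The sketch below is not the interPopula API: the `Gene` tuple, `exons` and `genes_near` are hypothetical names, and the records are made up. It only assumes the UCSC knownGene convention of storing exon starts and ends as comma-separated strings with a trailing comma.

```python
from collections import namedtuple

# Hypothetical, simplified record; field names are illustrative only.
Gene = namedtuple("Gene", "name chrom tx_start tx_end exon_starts exon_ends")


def exons(gene):
    """Return a list of (start, end) pairs for the gene's exons."""
    starts = [int(s) for s in gene.exon_starts.split(",") if s]
    ends = [int(e) for e in gene.exon_ends.split(",") if e]
    return list(zip(starts, ends))


def genes_near(genes, chrom, pos, window=10000):
    """Return genes on `chrom` whose transcript lies within `window` bp of `pos`."""
    return [
        g for g in genes
        if g.chrom == chrom
        and g.tx_start - window <= pos <= g.tx_end + window
    ]


# Made-up records for illustration
genes = [
    Gene("geneA", "chr1", 1000, 5000, "1000,3000,", "1500,5000,"),
    Gene("geneB", "chr1", 50000, 60000, "50000,", "60000,"),
    Gene("geneC", "chr2", 1000, 5000, "1000,", "5000,"),
]
nearby = genes_near(genes, "chr1", 5500, window=1000)
```

A linear scan is enough for a sketch; a real lookup over a whole genome would more likely be an indexed SQL query against the local database.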
Integration
Many examples provided on interoperability (with matplotlib, Entrez SNP, Genepop and Biopython)
Integrating heterogeneous databases
Databases do use different reference assemblies
Example: The exon positions given by the latest version of the UCSC Table Browser are not compatible with HapMap (v37 vs v36)
Silent bug: applications rarely crash and the results seem correct
This issue is discussed in the context of HapMap/Table Browser/Entrez SNP and might be useful in other cases
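Because mismatched assemblies fail silently, one defensive option is to compare assembly labels explicitly before combining coordinates. The sketch below is an assumption about how such a guard could look, not the guideline shipped with interPopula; `AssemblyMismatch` and `require_same_assembly` are hypothetical names, and the labels are whatever the caller records for each source.

```python
class AssemblyMismatch(ValueError):
    """Raised when data sources use different reference assemblies."""


def require_same_assembly(builds):
    """Check that all sources report the same assembly label.

    `builds` maps a source name to its assembly label, e.g.
    {"HapMap": "36", "UCSC Table Browser": "37"}.
    Returns the shared label, or None if `builds` is empty.
    """
    distinct = set(builds.values())
    if len(distinct) > 1:
        details = ", ".join(
            "%s=%s" % (name, build) for name, build in sorted(builds.items())
        )
        raise AssemblyMismatch("reference assemblies differ: " + details)
    return distinct.pop() if distinct else None
```

Calling this before any position-based join turns the silent bug into an immediate, visible error.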
Examples – Known Genes
Examples – HapMap/Integration
Future work
Focus on HapMap and maybe 1000 Genomes project
The whole UCSC Table Browser support will be spun off later into a different project
Copy Number Variation support (since June on HapMap)
Phasing support due very soon (like next week)
Provide examples with genome-wide association studies