Webinar about JASPAR BioPython module and MANTA.
www.cmmt.ubc.ca
JASPAR BioPython & MANTA
Anthony Mathelier, David Arenillas & Wyeth Wasserman
[email protected] & [email protected]
Wasserman Lab
Outline
● JASPAR BioPython module
  – What is JASPAR?
  – How to construct matrices from JASPAR files using the JASPAR BioPython module.
● MANTA
  – What is stored in MANTA?
  – How to interrogate the MANTA DB using Python and our web application.
http://jaspar.genereg.net
Mathelier et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014 PMID 24194598
Modelling Transcription Factor Binding Sites (TFBS)
Example: FOXD1
PFM – Position Frequency Matrix
A [  1  0 19 20 18  1 20  7 ]
C [  1  0  1  0  1 18  0  2 ]
G [ 17  0  0  0  1  0  0  3 ]
T [  1 20  0  0  0  1  0  8 ]
Logo
gctaaGTAACAATgcgcacttaaGTAAACATcgctcccaatGTAAACAAacggagaaagGTAAACAAtgggc GTAAACATgtactcttgtGTAAACAAaaagccttaaGTAAACACgtccgcttatGTCAACAGtgggt tGTAAACATtgcat GTAAACAAtgcgacttagGTAAACATtttcgTTAAGTAAaca caaaATAAACAAcgtgcgctaaCTAAACAGagagagtgttGTAAACATtggaa taatGTAAACAAtgcgggaaagGTAAACATaagaacctaaGTAAACACaacgccctaaGTAAACATtcttatGTAAACAGaggtc
Known binding sites
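A minimal sketch of how the PFM follows from the known binding sites: count each nucleotide at each column of the aligned sites. The list below holds only the upper-case cores of the 20 sites; the function name `build_pfm` is an illustrative assumption, not part of any library.

```python
# Upper-case cores of the 20 known FOXD1 binding sites listed on the slide.
SITES = [
    "GTAACAAT", "GTAAACAT", "GTAAACAA", "GTAAACAA", "GTAAACAT",
    "GTAAACAA", "GTAAACAC", "GTCAACAG", "GTAAACAT", "GTAAACAA",
    "GTAAACAT", "TTAAGTAA", "ATAAACAA", "CTAAACAG", "GTAAACAT",
    "GTAAACAA", "GTAAACAT", "GTAAACAC", "GTAAACAT", "GTAAACAG",
]


def build_pfm(sites):
    """Count each nucleotide at each column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in "ACGT"}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            pfm[base][pos] += 1
    return pfm


pfm = build_pfm(SITES)
for base in "ACGT":
    print(base, pfm[base])
```

Each column sums to 20, the number of sites.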
Scoring putative TFBS sequences
PFM to PWM – Position Weight Matrix (aka PSSM – Position Specific Scoring Matrix)
PFM
A [  1  0 19 20 18  1 20  7 ]
C [  1  0  1  0  1 18  0  2 ]
G [ 17  0  0  0  1  0  0  3 ]
T [  1 20  0  0  0  1  0  8 ]
PWM
A [ -1.5 -2.5  1.7  1.8  1.6 -1.5  1.8  0.4 ]
C [ -1.5 -2.5 -1.5 -2.5 -1.5  1.6 -2.5 -1.0 ]
G [  1.6 -2.5 -2.5 -2.5 -1.5 -2.5 -2.5 -0.6 ]
T [ -1.5  1.8 -2.5 -2.5 -2.5 -1.5 -2.5  0.6 ]
Sequence: A C G A G T T A A A C A A G C T A
PWM: sum the score at each position of the scanned window.
Score = 9.2
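A minimal sketch of the scoring step: slide the PWM along the sequence and sum one weight per position. It assumes the weights are signed log-odds values, negative for bases that are rare or absent at a position. The helper names `score_window` and `best_hit` are illustrative only.

```python
# FOXD1 weights from the slide, taken as signed log-odds (assumption).
PWM = {
    "A": [-1.5, -2.5, 1.7, 1.8, 1.6, -1.5, 1.8, 0.4],
    "C": [-1.5, -2.5, -1.5, -2.5, -1.5, 1.6, -2.5, -1.0],
    "G": [1.6, -2.5, -2.5, -2.5, -1.5, -2.5, -2.5, -0.6],
    "T": [-1.5, 1.8, -2.5, -2.5, -2.5, -1.5, -2.5, 0.6],
}


def score_window(pwm, window):
    """Sum the weight of the observed base at each position of the window."""
    return sum(pwm[base][pos] for pos, base in enumerate(window))


def best_hit(pwm, sequence):
    """Return (score, start, window) for the best-scoring forward-strand window."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((score_window(pwm, window), start, window))
    return max(hits)


score, start, window = best_hit(PWM, "ACGAGTTAAACAAGCTA")
print(round(score, 1), start, window)
```

Under that assumption, the window TTAAACAA gives the score of 9.2 shown on the slide.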
JASPAR Biopython modules
➢ Bio.motifs.jaspar
➢ Read / write motifs encoded in the JASPAR flat file formats: sites, PFM and jaspar
➢ Bio.motifs.jaspar.db
➢ Search / fetch motifs from a JASPAR formatted database.
http://biopython.org*
*Cock et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. PMID: 19304878
Extend Biopython's Bio.motifs module to support construction of TFBS matrices from JASPAR supported formats.
Constructing a matrix from a JASPAR sites formatted file
The JASPAR sites format consists of a list of known binding sites for a motif.
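A minimal sketch of reading a sites file with Biopython's `motifs.read`. The file content here is a hypothetical three-site excerpt held in memory (the headers are made up); the binding site is in upper case and the flanks in lower case.

```python
from io import StringIO

from Bio import motifs

# Hypothetical excerpt of a JASPAR sites file: one FASTA-like record per site.
SITES_TEXT = """\
>FOXD1 1
gctaaGTAACAATgcgca
>FOXD1 2
cttaaGTAAACATcgctc
>FOXD1 3
ccaatGTAAACAAacgga
"""

with StringIO(SITES_TEXT) as handle:
    motif = motifs.read(handle, "sites")

print(motif.length)   # width of the upper-case site
print(motif.counts)   # frequency matrix built from the sites
```

With a real file, replace the `StringIO` handle with `open("your_file.sites")`.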
Constructing a matrix from a JASPAR pfm formatted file
The JASPAR pfm format simply describes a frequency matrix for a single motif.
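A minimal sketch of reading a pfm file, using the FOXD1 counts from the earlier slide as in-memory content. The four rows are taken in the order A, C, G, T.

```python
from io import StringIO

from Bio import motifs

# FOXD1 counts in JASPAR pfm layout: four rows, in the order A, C, G, T.
PFM_TEXT = """\
 1  0 19 20 18  1 20  7
 1  0  1  0  1 18  0  2
17  0  0  0  1  0  0  3
 1 20  0  0  0  1  0  8
"""

with StringIO(PFM_TEXT) as handle:
    motif = motifs.read(handle, "pfm")

print(motif.counts)
print(motif.consensus)  # most frequent base at each position
```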
Constructing matrices from a JASPAR jaspar formatted file
Note the use of the parse rather than the read method to read multiple motifs.
The JASPAR jaspar format allows for multiple motifs. Each record consists of a header line followed by four lines defining the frequency matrix.
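A minimal sketch of parsing a multi-motif jaspar file with `motifs.parse`. The two records are in-memory examples; the IDs and names (`EX0001.1`, `EXAMPLE_A`, and so on) are hypothetical placeholders, not real JASPAR entries.

```python
from io import StringIO

from Bio import motifs

# Two hypothetical records: a header line, then one bracketed row per base.
JASPAR_TEXT = """\
>EX0001.1 EXAMPLE_A
A [ 1  0 19 20 18  1 20  7 ]
C [ 1  0  1  0  1 18  0  2 ]
G [17  0  0  0  1  0  0  3 ]
T [ 1 20  0  0  0  1  0  8 ]
>EX0002.1 EXAMPLE_B
A [ 4 19  0  0 ]
C [16  0 20  0 ]
G [ 0  1  0 20 ]
T [ 0  0  0  0 ]
"""

with StringIO(JASPAR_TEXT) as handle:
    records = motifs.parse(handle, "jaspar")

for m in records:
    print(m.matrix_id, m.name, m.length)
```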
Constructing matrices from a JASPAR jaspar formatted file cont'd
The frequency portions of the file can be specified in a simpler format identical to the pfm format.
The JASPAR DB module
Connect to a JASPAR database:
Modelled after the Perl TFBS modules*.
Specifically, the Bio.motifs.jaspar.db.JASPAR5 BioPython class is modelled after the TFBS::DB::JASPAR5 Perl class.
Fetch a specific motif by its JASPAR ID:
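A minimal sketch of both steps, connecting and fetching by ID, wrapped in one helper. The helper name `fetch_motif` is an assumption; the host, database name, user and password depend on your own JASPAR installation and are not given in the slides.

```python
def fetch_motif(matrix_id, host, name, user, password):
    """Connect to a JASPAR5-formatted MySQL database and fetch one motif."""
    # Imported here because the module requires the MySQLdb driver.
    from Bio.motifs.jaspar.db import JASPAR5

    jdb = JASPAR5(host=host, name=name, user=user, password=password)
    return jdb.fetch_motif_by_id(matrix_id)
```

Call it with the credentials of your own database and a JASPAR matrix ID; printing the returned motif shows its metadata and matrix.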
* Lenhard et al. TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics. 2002 PMID 12176838
JASPAR DB module cont'd
Fetch multiple motifs according to various attributes.
Example: fetch the motifs of all the vertebrate and insect transcription factors from the CORE JASPAR collection which are part of the Forkhead family and which have an information content of at least 12 bits:
Note that selection criteria (such as 'tax_group' and 'tf_family') which allow multiple values may be specified either as a single value or as a list of values.
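A minimal sketch of the Forkhead example as a `fetch_motifs` call. `jdb` is assumed to be an already connected `JASPAR5` instance; the names `FORKHEAD_CRITERIA` and `fetch_forkhead_motifs` are illustrative only.

```python
# Selection criteria for the example: 'tax_group' is given as a list,
# 'tf_family' as a single value.
FORKHEAD_CRITERIA = {
    "collection": "CORE",
    "tax_group": ["vertebrates", "insects"],
    "tf_family": "Forkhead",
    "min_ic": 12,  # minimum information content, in bits
}


def fetch_forkhead_motifs(jdb, criteria=FORKHEAD_CRITERIA):
    """Fetch all motifs matching the criteria from a connected JASPAR5 object."""
    return jdb.fetch_motifs(**criteria)
```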
For more information...
For an overview and examples of using these modules, please see the JASPAR sub-section under the “Reading motifs” section of the BioPython Tutorial and Cookbook: http://biopython.org/DIST/docs/tutorial/Tutorial.html
For more technical information see the Bio.motifs.jaspar section of the BioPython API docs: http://biopython.org/DIST/docs/api
MANTA
MongoDB for Analysis of TFBS Alteration
Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID 25903198
MANTA
DB
...gctaaGTAACAATgcgca...
...cttaaGTAAACATcgctc...
...ccaatGTAAACAAacgga...
Adapted from Szalkowski and Schmid (2010). Briefings in Bioinformatics.
MANTA Statistics
ChIP-seq experiments 477
Transcription factors 103
TFBSs 9,510,336
Unique bases covered 76,160,599 (~2.25% of the human genome)
AMIA TBI&CRI, March 19th-23rd, 2012
Variations may impact TF binding
Binding sequence: TF recognizes binding site; transcription initiated.
AGCTAGCTATATTTAAACAACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA
Mutated binding sequence: TF fails to recognize binding site; transcription fails to initiate.
AGCTAGCTATATTTAATCCACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA
[Figure: TF binding upstream of the 5’ UTR and exon, for the binding sequence and the mutated binding sequence]
Assessing the impact of variations on TF binding
DNA with an SNV: record the best TFBS hit with the mutated sequence.
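A minimal sketch of an alt/ref comparison, using the two sequences from the "Variations may impact TF binding" slide and the FOXD1 weights. Everything about the scoring is an assumption made for illustration: signed log-odds weights, forward strand only, and a relative score scaled between the lowest and highest possible PWM score. MANTA's actual impact score may be defined differently.

```python
# FOXD1 weights from the scoring slide, taken as signed log-odds (assumption).
PWM = {
    "A": [-1.5, -2.5, 1.7, 1.8, 1.6, -1.5, 1.8, 0.4],
    "C": [-1.5, -2.5, -1.5, -2.5, -1.5, 1.6, -2.5, -1.0],
    "G": [1.6, -2.5, -2.5, -2.5, -1.5, -2.5, -2.5, -0.6],
    "T": [-1.5, 1.8, -2.5, -2.5, -2.5, -1.5, -2.5, 0.6],
}
REF = "AGCTAGCTATATTTAAACAACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA"
ALT = "AGCTAGCTATATTTAATCCACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA"


def best_score(pwm, sequence):
    """Best forward-strand PWM score over all windows of the sequence."""
    width = len(pwm["A"])
    return max(
        sum(pwm[base][pos] for pos, base in enumerate(sequence[s:s + width]))
        for s in range(len(sequence) - width + 1)
    )


def relative_score(pwm, score):
    """Scale a raw score to [0, 1] between the PWM's minimum and maximum."""
    width = len(pwm["A"])
    lowest = sum(min(pwm[b][i] for b in "ACGT") for i in range(width))
    highest = sum(max(pwm[b][i] for b in "ACGT") for i in range(width))
    return (score - lowest) / (highest - lowest)


ref_rel = relative_score(PWM, best_score(PWM, REF))
alt_rel = relative_score(PWM, best_score(PWM, ALT))
impact = alt_rel / ref_rel  # below 1: the mutated sequence binds less well
print(round(ref_rel, 2), round(alt_rel, 2), round(impact, 2))
```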
[Figure: DNA with a TFBS; density plot of the alt/ref score ratio (x-axis: alt/ref, 0.80 to 1.10; y-axis: Density, 0 to 7)]
[Figure: DNA with an SNV; same density plot of alt/ref, with the Alternative allele marked]
Example of Application of MANTA
Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID
The MANTA Database
Implemented with MongoDB (http://www.mongodb.org)
Consists of 3 collections:
Experiments
- experiment name, type, TF name, JASPAR matrix ID, etc.
Peaks
- peak position (chromosome, start, end), score, position of maximum peak height, etc.
TFBSs / SNVs
- position (chromosome, start, end), strand, score for the unmutated TFBS plus similar information and impact score for each position / alt. allele mutation.
MANTA DB with Python
Example: connect to MANTA DB and fetch all TFBS affected by an SNV at position 6425005 on chromosome 19.
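A minimal sketch of such a query with `pymongo`. The database name, collection name and field names (`manta`, `tfbs_snvs`, `chrom`, `start`, `end`) and the chromosome label `chr19` are hypothetical; the slides do not give the actual schema, so check the MANTA source code for the real names.

```python
def tfbs_overlap_query(chrom, position):
    """MongoDB filter for TFBSs whose interval contains the position.

    Field names are hypothetical placeholders for the real MANTA schema.
    """
    return {
        "chrom": chrom,
        "start": {"$lte": position},
        "end": {"$gte": position},
    }


def fetch_affected_tfbs(chrom, position, host="localhost", port=27017,
                        db_name="manta", collection="tfbs_snvs"):
    """Return all TFBS documents overlapping the given position."""
    # Imported here so the query builder works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(host, port)
    try:
        query = tfbs_overlap_query(chrom, position)
        return list(client[db_name][collection].find(query))
    finally:
        client.close()


print(tfbs_overlap_query("chr19", 6425005))
```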
MANTA Web Interface
URL: http://manta.cmmt.ubc.ca/manta
Source code: https://github.com/wassermanlab/MANTA
Thanks!
Any questions?
Contacts:
Anthony Mathelier, [email protected]
David Arenillas, [email protected]
URLs:
Wasserman Lab: www.cisreg.ca
BioPython: http://biopython.org
MANTA: manta.cmmt.ubc.ca/manta