1 CSE Dept., University of Connecticut 2 Immunology Dept., UCHC
Analysis Pipeline
[Figure: pipeline flowchart. Box labels: Genome sequence, mRNA reads, Consensus coding sequences, CCDS mapping, Genome mapping, CCDS mapped reads, Genome mapped reads, Read merging, Merged mapped reads, SNP calling, Known SNPs, Expressed SNPs, Epitope prediction, Immunogenic mutations]
Approaches to Phasing
• We maximize the number of accurately mapped reads by using Maq to map them against both the reference genome and the reference transcripts (from the CCDS database)
• To combine the read mapping results we implemented two approaches, called hard merging and soft merging
– Hard merging throws away reads that are mapped uniquely by one procedure and to multiple places by the other, while soft merging keeps the unique alignment for these reads
– Both merging methods keep reads mapped uniquely to the same place by both mapping procedures and reads mapped by one procedure and not mapped by the other
– Both methods throw away reads mapped uniquely to different places by both mapping procedures and reads mapped multiple times by both procedures
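The merging rules above can be sketched as a per-read decision function. This is a minimal illustration, not the authors' implementation: the name `merge_read`, the `Alignment` tuple, and the assumption that CCDS alignments have already been converted to genome coordinates are all hypothetical. The poster does not say what happens to a read mapped to multiple places by one procedure and unmapped by the other; the sketch assumes it is discarded.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (reference name, position) in genome coordinates.
Alignment = Tuple[str, int]


def merge_read(genome_hits: List[Alignment],
               ccds_hits: List[Alignment],
               soft: bool) -> Optional[Alignment]:
    """Return the alignment to keep for one read, or None to discard it."""
    g_unique = len(genome_hits) == 1
    c_unique = len(ccds_hits) == 1

    # Unique in both procedures: keep only if both agree on the location.
    if g_unique and c_unique:
        return genome_hits[0] if genome_hits[0] == ccds_hits[0] else None

    # Uniquely mapped by one procedure and unmapped by the other: keep.
    if g_unique and not ccds_hits:
        return genome_hits[0]
    if c_unique and not genome_hits:
        return ccds_hits[0]

    # Unique in one, multiple in the other: only soft merging keeps the unique hit.
    if g_unique and len(ccds_hits) > 1:
        return genome_hits[0] if soft else None
    if c_unique and len(genome_hits) > 1:
        return ccds_hits[0] if soft else None

    # Multiple in both, unmapped in both, or (by assumption) multiple in one
    # and unmapped in the other: discard.
    return None
```

Calling `merge_read(genome_hits, ccds_hits, soft=True)` gives the soft-merging decision and `soft=False` the hard-merging one; the two differ only in the unique-versus-multiple case.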
Mapping Reads
SNP Calling Methods
• Binomial: binomial test used in [3,7] for calling SNPs from genomic DNA
– Binomial probability based on the two highest allele counts under the null hypothesis that the genotype is heterozygous
• Posterior: uses base quality scores and read mapping probabilities
– The conditional probability of observing the read data given each possible genotype G is computed as a product of read contributions, assuming independence between reads. A base b mapped with error probability e_b contributes as follows:
  • For G=XX: 1-e_b if X=b, e_b/3 otherwise
  • For G=XY, X≠Y: (1-e_b)/2 + e_b/6 if X=b or Y=b, e_b/3 otherwise
– Maq mapping probabilities are taken into account by raising the corresponding term to the probability that the read is mapped correctly at this location
– The posterior probability of each genotype is then evaluated assuming uniform priors. A variant is called in this approach if the genotype with the highest posterior probability is different from homozygous reference and its posterior exceeds a user-specified threshold
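The posterior method described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the pileup representation (a list of `(base, error probability, mapping probability)` tuples), and the small clamp on the error probability (added only to avoid taking the logarithm of zero) are assumptions. The computation is done in log space, so raising a term to the mapping probability becomes multiplying its logarithm by that probability.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
Genotype = Tuple[str, str]
# Hypothetical pileup entry: (observed base, base error prob., prob. read is mapped correctly)
Observation = Tuple[str, float, float]


def base_likelihood(genotype: Genotype, base: str, err: float) -> float:
    """Contribution of one observed base given a genotype, as stated in the text."""
    x, y = genotype
    if x == y:
        return 1.0 - err if base == x else err / 3.0
    if base in (x, y):
        return (1.0 - err) / 2.0 + err / 6.0
    return err / 3.0


def genotype_posteriors(pileup: List[Observation]) -> Dict[Genotype, float]:
    """Posterior of each of the 10 genotypes under uniform priors."""
    log_like = {}
    for g in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, err, map_prob in pileup:
            err = min(max(err, 1e-6), 1.0 - 1e-6)  # numerical guard (assumption)
            # Raising the term to map_prob is a multiplication in log space.
            total += map_prob * math.log(base_likelihood(g, base, err))
        log_like[g] = total
    # With uniform priors the posterior is the normalized likelihood.
    top = max(log_like.values())
    weights = {g: math.exp(v - top) for g, v in log_like.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


def call_variant(pileup: List[Observation], ref_base: str,
                 threshold: float) -> Optional[Genotype]:
    """Return the called genotype, or None if no variant is called."""
    post = genotype_posteriors(pileup)
    best = max(post, key=post.get)
    if best != (ref_base, ref_base) and post[best] > threshold:
        return best
    return None
```

Genotypes are represented as alphabetically ordered base pairs, so the homozygous reference genotype for reference base `A` is `("A", "A")`.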
Results on Cancer Cell Line Reads
Alt. Coverage 1
Threshold      0.999   0.99    0.95    0.9     0.1
transcripts    2777    4121    20006   29623   102874
genome         1879    2924    16712   25333   93499
hardmerged     2204    3393    19045   28532   100215
softmerged     2591    3859    19874   29561   102567
Alt. Coverage 3
Threshold      0.999   0.99    0.95    0.9     0.1
transcripts    2508    3376    4661    5351    7638
genome         1727    2422    3505    4093    6108
hardmerged     2001    2773    3973    4609    6775
softmerged     2351    3186    4447    5114    7371
Results on Hapmap Reads
[Figure: four plots of True Positives versus False Positives, arranged in two columns labeled Alt. coverage 1 and Alt. coverage 3. One pair of plots compares SNP calling methods (Posterior, Binomial Exact, Binomial Cumulative; Maq appears in one panel only). The other pair compares mapping strategies (Transcripts, Genome, Hard Merge, Soft Merge).]
Validation Results
• Predicted immunogenic mutations are currently being validated by Sanger sequencing
• For confirmed mutations, peptide immunogenicity will be validated experimentally
• Proposed pipeline has improved accuracy for detecting mutations compared to previous methods
• Ongoing work includes refining the posterior method to increase mutation detection robustness in the presence of differential allelic expression and detecting short immunogenic indels
Conclusions & Ongoing Work
Experimental Setup
• We tested the performance of the implemented methods on 63 million Illumina mRNA reads generated from blood cell tissue of Hapmap individual NA12878 [2] (NCBI SRA database accession number SRX000566)
• We included in the evaluation Hapmap SNPs in known exons for which there was at least one mapped read by any method
• Hapmap genotypes for these SNPs: 22,362 homozygous reference, 7,893 heterozygous or homozygous variant
• True positives: called SNPs for which the Hapmap genotype is heterozygous or homozygous variant
• False positives: called SNPs for which the Hapmap genotype is homozygous reference
• We also ran our analysis pipeline on 6.75 million Illumina reads from mRNA isolated from a mouse cancer tumor cell line
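The true positive and false positive definitions above amount to a simple count over called sites. The sketch below is an illustration only: the function name `count_tp_fp`, the `(chromosome, position)` site key, and the genotype labels `"hom_ref"`, `"het"`, and `"hom_var"` are hypothetical and not taken from the poster.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # hypothetical key: (chromosome, position)


def count_tp_fp(called_sites: Iterable[Site],
                hapmap_genotypes: Dict[Site, str]) -> Tuple[int, int]:
    """Count true and false positives among called SNPs.

    hapmap_genotypes maps each evaluated Hapmap SNP to one of the
    hypothetical labels "hom_ref", "het", or "hom_var".
    """
    tp = fp = 0
    for site in set(called_sites):
        genotype = hapmap_genotypes.get(site)
        if genotype is None:
            continue  # site is not among the evaluated Hapmap SNPs
        if genotype == "hom_ref":
            fp += 1  # called, but Hapmap says homozygous reference
        else:
            tp += 1  # called, and Hapmap says het or homozygous variant
    return tp, fp
```

Sweeping the calling threshold and recording the resulting `(tp, fp)` pair at each value yields one curve of true positives versus false positives per method.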
• Immunotherapy is a promising cancer treatment approach that relies on awakening the immune system to the presence of antigens associated with tumor cells
• The success of this approach depends on the ability to reliably detect immunogenic cancer mutations, the vast majority of which are expected to be tumor-specific [6]
• In this poster we present a bioinformatics pipeline for detecting immunogenic cancer mutations from high throughput mRNA sequencing data
• Immunogenic mutations predicted by our pipeline from Illumina mRNA reads generated from a mouse cancer tumor cell line are currently under experimental validation
Introduction
Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing
Jorge Duitama (1), Ion Mandoiu (1), and Pramod Srivastava (2)
(1) CSE Dept., University of Connecticut; (2) Immunology Dept., UCHC