2013 pag-poultry-workshop
![Page 1: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/1.jpg)
C. Titus Brown
Asst Prof, CSE and Microbiology; BEACON NSF STC
Michigan State University
[email protected]
Evaluating and improving the chick genome & transcriptome
![Page 2: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/2.jpg)
Acknowledgements
This is joint work with Hans Cheng (USDA ADOL) and Jerry Dodgson (MSU).
Likit Preeyanon (MSU) and Alexis Black Pyrkosz (ADOL) did the work.
All of the software discussed in this talk is available.
This work was primarily supported by the USDA NIFA through a grant to me.
![Page 3: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/3.jpg)
Simulations show that incomplete gene reference => inaccurate differential expression from mRNAseq
Alexis Black Pyrkosz
![Page 4: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/4.jpg)
Existing chick gene models lack exons, isoforms
*This gene contains at least 4 isoforms.
Our data
Models
Likit Preeyanon
![Page 5: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/5.jpg)
(Exon detection is pretty good.)
Likit Preeyanon
![Page 6: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/6.jpg)
Different approaches to gene set prediction yield distinct splice junction predictions
More than 95% of the assembly-based splice junctions are supported by 4 or more independent reads.
Likit Preeyanon
![Page 7: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/7.jpg)
mRNAseq analysis with a combined de novo and genome-based approach.
Likit Preeyanon
![Page 8: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/8.jpg)
We can produce combined gene models.
Cufflinks (ref based) + de novo assembly + known mRNA
![Page 9: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/9.jpg)
Gene Model Summary (note: spleen mRNAseq)

| Method | Genes | Transcripts |
|---|---|---|
| Global Assembly | 14,832 | 32,311 |
| Local Assembly | 15,297 | 23,028 |
| Global + Local Assembly | 15,934 | 46,797 |

*Number of genes and transcripts might be overestimated due to incomplete assembly and spurious splice junctions.
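To make the summary easier to compare across methods, the counts can be reduced to transcripts per gene. The snippet below is only a worked calculation on the numbers from the table; the function name `transcripts_per_gene` is an illustrative choice, not something from the talk.

```python
# Gene and transcript counts copied from the Gene Model Summary table.
GENE_MODEL_SUMMARY = {
    "Global Assembly": (14_832, 32_311),
    "Local Assembly": (15_297, 23_028),
    "Global + Local Assembly": (15_934, 46_797),
}


def transcripts_per_gene(genes, transcripts):
    """Average number of transcripts (isoforms) per gene model."""
    return transcripts / genes


if __name__ == "__main__":
    for method, (genes, transcripts) in GENE_MODEL_SUMMARY.items():
        ratio = transcripts_per_gene(genes, transcripts)
        print(f"{method}: {ratio:.2f} transcripts per gene")
```

The combined set has the highest ratio, which is consistent with the caveat that some of the extra transcripts may come from incomplete assembly or spurious splice junctions.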
![Page 10: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/10.jpg)
Cross-validation with technical replicates
| Dataset | Single-end mapped | Single-end unmapped | Paired-end mapped | Paired-end unmapped |
|---|---|---|---|---|
| Line 6 uninfected | 18,375,966 (77.93%) | 5,203,586 (22.07%) | 21,598,218 (64.16%) | 12,065,659 (35.84%) |
| Line 6 infected | 17,160,695 (73.18%) | 6,288,286 (26.82%) | 15,274,638 (63.89%) | 8,633,855 (36.11%) |
| Line 7 uninfected | 18,130,072 (75.77%) | 5,795,737 (24.22%) | 20,961,033 (63.67%) | 11,960,299 (36.33%) |
| Line 7 infected | 19,912,046 (78.51%) | 5,450,521 (21.49%) | 22,485,833 (65.22%) | 11,992,002 (34.78%) |

Single-end reads were used to generate gene models; paired-end data were used for technical-replicate cross-validation.
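The percentages in the mapping table follow directly from the read counts. The snippet below recomputes the single-end mapping rates as a check; the function name `mapping_rate` and the dictionary layout are illustrative assumptions, and only the counts come from the slide.

```python
# Single-end (mapped, unmapped) read counts copied from the table.
SINGLE_END_COUNTS = {
    "Line 6 uninfected": (18_375_966, 5_203_586),
    "Line 6 infected": (17_160_695, 6_288_286),
    "Line 7 uninfected": (18_130_072, 5_795_737),
    "Line 7 infected": (19_912_046, 5_450_521),
}


def mapping_rate(mapped, unmapped):
    """Percentage of reads that mapped, out of all reads in the dataset."""
    return 100.0 * mapped / (mapped + unmapped)


if __name__ == "__main__":
    for dataset, (mapped, unmapped) in SINGLE_END_COUNTS.items():
        print(f"{dataset}: {mapping_rate(mapped, unmapped):.2f}% mapped")
```

Recomputed values can differ from the slide in the last digit: the Line 7 uninfected single-end rate comes out as 75.78%, against 75.77% in the table, which looks like truncation rather than rounding.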
![Page 11: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/11.jpg)
Gene Modeler Pipeline (“gimme”)
Merge transcripts together based on transcript mapping to genome; can include existing gene predictions, & iteratively combine predictions.
Construct gene models
Remove redundant sequences
Predict strands and ORFs
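As a rough illustration of merging transcripts by their genome mapping, the sketch below drops transcripts with identical exon structure and then groups the rest into gene models whenever their exons overlap. This is an assumption-laden toy, not gimme's actual algorithm: the function names, the `(chrom, start, end)` exon representation with half-open coordinates, and the overlap rule are all hypothetical, and strand and ORF prediction are not covered.

```python
from collections import defaultdict


def exons_overlap(a, b):
    """True if two (chrom, start, end) exons overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_gene_models(transcripts):
    """Group transcripts into gene models.

    transcripts: dict of transcript id -> list of (chrom, start, end) exons.
    Returns a sorted list of gene models, each a sorted list of transcript ids.
    """
    # Step 1: remove redundant transcripts (identical exon structure),
    # keeping the alphabetically first id for each structure.
    first_id_for_structure = {}
    for tid in sorted(transcripts):
        structure = tuple(sorted(transcripts[tid]))
        first_id_for_structure.setdefault(structure, tid)
    unique = {tid: list(s) for s, tid in first_id_for_structure.items()}

    # Step 2: union-find over transcripts that share overlapping exons.
    parent = {tid: tid for tid in unique}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    ids = sorted(unique)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if any(exons_overlap(ea, eb)
                   for ea in unique[a] for eb in unique[b]):
                parent[find(a)] = find(b)

    # Step 3: collect transcripts by their union-find root.
    genes = defaultdict(list)
    for tid in ids:
        genes[find(tid)].append(tid)
    return sorted(sorted(members) for members in genes.values())


if __name__ == "__main__":
    # Hypothetical transcripts from reference-based, de novo, and known-mRNA sources.
    example = {
        "cufflinks_1": [("chr1", 100, 200), ("chr1", 300, 400)],
        "denovo_1": [("chr1", 150, 200), ("chr1", 300, 400), ("chr1", 500, 600)],
        "denovo_2": [("chr1", 100, 200), ("chr1", 300, 400)],  # same as cufflinks_1
        "mrna_1": [("chr2", 100, 200)],
    }
    print(build_gene_models(example))
```

The pairwise comparison is quadratic in the number of transcripts, which is fine for a toy example; a real pipeline would index exons by position first.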
Likit Preeyanon
![Page 12: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/12.jpg)
Next problem: chick reference!
We like using the reference genome to scaffold RNAseq contigs; purely de novo RNAseq assembly is messy.
Genomes are also useful for other things, we hear.
Problems:
Poor sensitivity: the chick genome is missing a substantial number of genes from microchromosomes:
723 genes from HSA19q missing from chicken galGal4.
ESTs and RNAseq transcripts for many or most.
Gaps
9900 gaps on ordered chromosomes
21k gaps on chr-aligned but low-confidence/unaligned
Over-collapsed tandem dups and under-collapsed het
![Page 13: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/13.jpg)
Sensitivity – where is the problem?
Are microchromosomes hard to sequence or is microchromosomal sequence hard to assemble?
Sequences that simply don’t show up in the data are hard to include in the assembly…
Unclonable (Sanger)
Strong GC or AT bias
Sequences with biased (generally low) coverage are often discarded by assemblers.
![Page 14: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/14.jpg)
Can we “even out” coverage? (Digital normalization)
If you have two loci, or two mRNA species, with uneven coverage, can you remove the extra coverage?
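A minimal sketch of the idea, assuming the streaming rule usually described for digital normalization: keep a read only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. The function names, `k=5`, and `cutoff=3` are illustrative assumptions, and exact counting with a `Counter` stands in for whatever memory-efficient counting a production tool would use; this is not the software from the talk.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=5, cutoff=3):
    """Return the subset of reads kept after coverage normalization.

    A read is kept if the median abundance of its k-mers (counted over
    previously kept reads) is below `cutoff`; only kept reads update counts.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: no coverage estimate possible
        estimated_coverage = median(counts[km] for km in read_kmers)
        if estimated_coverage < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # Two hypothetical loci with uneven coverage: 10 reads vs. 2 reads.
    high = "ACGTTGCAAG"
    low = "GGATCCTAGA"
    reads = [high] * 10 + [low] * 2
    kept = digital_normalize(reads)
    print(f"kept {kept.count(high)} high-coverage and "
          f"{kept.count(low)} low-coverage reads")
```

In this toy input the high-coverage locus is capped at the cutoff while every read from the low-coverage locus is retained, so the extra coverage is removed without losing the rare sequence.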
![Page 15: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/15.jpg)
![Page 16: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/16.jpg)
![Page 17: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/17.jpg)
![Page 18: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/18.jpg)
![Page 19: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/19.jpg)
![Page 20: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/20.jpg)
![Page 21: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/21.jpg)
Coverage before digital normalization:
(MD amplified)
![Page 22: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/22.jpg)
Coverage after digital normalization:
Normalizes coverage
Discards redundancy
Eliminates majority of errors
Scales assembly dramatically.
Assembly is 98% identical.
![Page 23: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/23.jpg)
Prelim results from digital normalization
Reassembled chick genome contigs from 70x Illumina -> normalized reads in ~24 hours.
Obtained 40 Mbp of assembled contigs that were not present in galGal4.
Contig assembly contained partial or complete matches to 70% of previously unmappable transcripts assembled from chick spleen mRNAseq.
Bioinformatics remedies may help but are probably not sufficient.
Likit Preeyanon
![Page 24: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/24.jpg)
Can we improve the assembly?
![Page 25: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/25.jpg)
Longer reads!
Long reads can span repeats and heterozygous regions
(Figure labels: Repeat copy 1, Repeat copy 2; Contig 1, Polymorphic contig 2, Polymorphic contig 3, Contig 4)
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
![Page 26: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/26.jpg)
PacBio: first results (cod/salmon)
Raw reads
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
![Page 27: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/27.jpg)
Cod: PacBio results
Mapping to the published genome
11.4 kbp subread
10.6 kbp subread
10.9 kbp subread
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
![Page 28: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/28.jpg)
Need to combine Illumina + PacBio still.
(Figure: two read sets combined, at 2.7x and 23x coverage)
24 CPUs, 4.5 days, 100 GB RAM
Alignments of at least 1 kb to cod published assembly: raw reads vs. error-corrected reads
P_errorCorrection pipeline from
93% of reads recovered
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
![Page 29: 2013 pag-poultry-workshop](https://reader037.fdocuments.net/reader037/viewer/2022103114/554e74deb4c9054a698b4ce9/html5/thumbnails/29.jpg)
Concluding thoughts/comments
Gene models and reference genome both need work.
This is going to be a continuing process…
Together with Wes Warren (WUSTL), Hans Cheng (USDA ADOL), and Jerry Dodgson (MSU), we are proposing to apply PacBio sequencing and digital normalization to improve the chick genome and regularly integrate community improvements; this should be a generalizable approach.
Questions? Contact me at: [email protected]