Single Cell Isoform Sequencing (scIso-Seq) Identifies ... · Title: Single Cell Isoform Sequencing...
Transcript of Single Cell Isoform Sequencing (scIso-Seq) Identifies ... · Title: Single Cell Isoform Sequencing...
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2019 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Agilent Technologies Inc. All other trademarks are the sole property of their respective owners.
Single Cell Isoform Sequencing (scIso-Seq) Identifies Novel
Full-length mRNAs and Cell Type-specific Expression Elizabeth Tseng1, Jason G. Underwood1, Aparna Bhuduri2, Calum Marrs3, Filip Konopacki3, Daniel Wong3, Katherine M. Munson5, Alexandra Lewis5, Alex A. Pollen2 and Evan E. Eichler4,5
1 PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025 2 Department of Neurology, University of California-San Francisco, San Francisco, CA 3 Dolomite/Blacktrace Holdings Ltd; 27 Jarman Way, Royston, SG8 5HW, UK 4 Howard Hughes Medical Institute, University of Washington, Seattle, WA5 Department of Genome Sciences, University of Washington, Seattle, WA
Single cell RNA-seq (scRNA-seq) is an emerging
field for characterizing cell heterogeneity in
complex tissues. However, most scRNA-seq
methodologies are limited to gene count
information due to short read lengths.
Here, we combine the microfluidics scRNA-seq
technique, Drop-Seq, with PacBio Single
Molecule, Real-Time (SMRT) Sequencing to
generate full-length transcript isoforms that can be
confidently assigned to individual cells.
We generated single cell Iso-Seq (scIso-Seq)
libraries for chimp and human cerebral organoid
samples on the Dolomite Nadia platform and
sequenced each library with two SMRT Cells 8M
on the PacBio Sequel II System. We developed a
bioinformatics pipeline to identify, classify, and
filter full-length isoforms at the single-cell level. We
show that scIso-Seq reveals full-length isoform
information not accessible using short reads that
can reveal differences between cell types and
amongst different species.
Abstract scIso-Seq on Cerebral Organoids Using SQANTI2 for QC
Single Cell Iso-Seq (scIso-Seq)
Full Splice Match = perfect match
Incomplete Splice Match = partial match
Novel In Catalog = novel isoform with known junctions
Novel Not in Catalog = at least one novel junction
Within intron
Overlap with intron and exons
Figure 1. Full-length single cell libraries were generated
using the Dolomite Nadia system and made into PacBio Iso-
Seq libraries. Sequencing was performed on the Sequel II
System.
SQANTI2 [2] matching Iso-Seq transcripts to
existing annotations and provides a wide range of
descriptors of transcript quality.
(a) Comparison against Gencode v29 transcript annotation reveals ~46%
scIso-Seq transcripts are novel.
(b) Novel scIso-Seq isoforms retain coding potential
GENCODE v29
87.3%
3.8%
6.1%
2.8%
Human Developing Cortex
Single Cell
Intropolis v1
Novel
(c) Most junctions are validated by existing annotations and RNA-seq data.
Class # Isoforms% with CAGE Peak
≤50 bp
Full Splice Match 18,344 78%
Incomplete Splice Match 13,802 37%
Novel In Catalog 19,033 44%
Novel Not In Catalog 9,197 67%
Intergenic 245 29%
(d) Matching FANTOM5 CAGE peak data with scIso-Seq transcripts
shows full splice matches (perfect junction matches to annotations)
are enriched for known TSS.
[1] cDNA_Cupcake https://github.com/Magdoll/cDNA_Cupcake
[2] SQANTI2 https://github.com/Magdoll/SQANTI2/
REFERENCES
Post PCR
PacBio. Shear
Iso-Seq
3’ RNA-Seq
5’ primer UMI 3’ primer(AAA)n BarcodeFull-Length cDNA
SAMPLE
POL
READS
(bp)
POL
BASE
(GB)
POL
LENGTH
(kb)
Chimp Organoid 3,157,575 199 62.9
Chimp Organoid 3,637,178 206 56.7
Human Organoid 2,856,715 177 62.1
Human Organoid 5,277,118 265 50.2
Table 1. Sequencing yield for the chimp and human organoid
single cell Iso-Seq (scIso-Seq) libraries on each of two SMRT
Cells 8M run on the Sequel II System.
SAMPLEFLNC
(filtered)
UNIQUE
READS
UNIQUE
GENES
UNIQUE
ISOFORMS
Chimp
Organoid2,303,267 418,542 14,049 58,892
Human
Organoid2,291,947 382,734 14,737 60,815
1. HiFi (CCS) reads
2. Remove cDNA primers
3. Clip UMIs and BCs
4. Remove polyA tail &
artificial concatemers
5. Align to genome
6. Collapsed redundancy
7. Compare w/ annotation
8. Filter library artifacts
9. Process into CSV
report for UMI/BC error
correction
Figure 2. Bioinformatics analysis for scIso-Seq data. The pipeline
is described in Cupcake [1].
Table 2. Number of unique (de-duplicated) full-length, non-
concatemer (FLNC) reads and corresponding number of unique
genes and transcript isoforms.
33456 57
PacBio Illumina
Barcodes
Figure 3. Matching cell barcodes between PacBio scIso-Seq data
and Illumina short-read data shows high concordance. The scIso-
Seq cumulative read plot indicates ~400 STAMPS (single cells).
20 kb hg38
74,025,000 74,030,00074,035,000 74,040,000 74,045,00074,050,000 74,055,000 74,060,00074,065,000 74,070,00074,075,000
Human Organoid Astrocytes
Human Organoid RG/glycolysis/choroid
Human Organoid G2M
Human Organoid Mitochondia/S phase cells
Human Organoid Neuron
Human Organoid Outlier
Human Organoid Progenitor
GENCODE v29 Comprehensive Transcript Set (only Basic displayed by default)
NCBI RefSeq genes, curated subset (NM_*, NR_*, and YP_*) - Annotation Release NCBI Homo sapiens Annotation Release 109 (2018-03-29) ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
ELN
1 _
1 _
Figure 4. The tropoelastin gene is found only to be expressed in
astrocytes but not other human organoid cell types and shows
exon skipping events and usage of alternative start/end sites.
Scale
chr19:
20 kb hg38
53,520,000 53,525,000 53,530,000 53,535,000 53,540,000 53,545,000 53,550,000 53,555,000 53,560,000 53,565,000 53,570,000 53,575,000 53,580,000Chimp Organoid Astrocytes
Chimp Organoid Choroid
Chimp Organoid CyclingChimp Organoid FibroblastChimp Organoid Neuron
Chimp Organoid Unknown
Human Organoid Astrocytes
Human Organoid RG/glycolysis/choroid
Human Organoid G2M
Human Organoid Mitochondia/S phase cellsHuman Organoid Neuron
Human Organoid OutlierHuman Organoid Progenitor
GENCODE v29 Comprehensive Transcript Set (only Basic displayed by default)
merged_final.bam
PB.11696.5
PB.11696.11
PB.11696.4
PB.11696.12
PB.16078.4
PB.16078.9
PB.16078.1
PB.16078.3
PB.16078.2
PB.16078.8
PB.16078.12
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
ZNF331
Figure 5. Alternative transcription start site (TSS) usage in chimp
and human in ZNF331.
Figure 6. SQANTI2 comparison against existing annotations
confirms scIso-Seq data consists of full-length (5’-3’) transcript
isoforms.