AMOS tools for assembly validation
Automatically scan an assembly to locate misassembly signatures for further analysis and correction
- Load Assembly Data into Bank
- Evaluate Mate Pairs & Libraries
- Evaluate Read Alignments
- Evaluate Read Breakpoints
- Analyze Depth of Coverage
- Identify “Surrogates”
- Load Misassembly Signatures into Bank
AMOS Bank
http://amos.sourceforge.net
Assembly QC: mate happiness
Evaluate mate “happiness” across the assembly. Happy = correct orientation and distance.
Finds regions with multiple:
- Compressed mates (too close together)
- Expanded mates (too far apart)
- Invalid same orientation (both reads point the same way)
- Invalid “outie” orientation (reads point away from each other)
- Missing mates
- Linking mates (mate in a different scaffold)
- Singleton mates (mate is not in any contig)
Also finds regions with a high C/E statistic
Mate happiness
Excision: skip reads between flanking repeats
[Figure: Truth vs. misassembly]
Misassembly: compressed mates, missing mates
Mate happiness
Insertion: additional reads between flanking repeats
[Figure: Truth vs. misassembly]
Misassembly: expanded mates, missing mates
Mate happiness
Rearrangement: reordering of reads
[Figure: Truth (A B) vs. misassembly (B A)]
Misassembly: misoriented mates
Note: if A and B are so far apart that no mate pair spans both, the mates may all be “happy”
Compression/Expansion (C/E) Statistic
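Individual compressed or expanded mates are rare, but some are expected in any library
Do the inserts spanning a given position differ from the rest of the library?
Flag large differences as potential misassemblies, even if each individual mate is “happy”
Compute the statistic at all positions: $\text{C/E} = \frac{\text{Local Mean} - \text{Global Mean}}{\text{Scaling Factor}}$, where the scaling factor is $\sigma/\sqrt{n}$ ($\sigma$ = library standard deviation, $n$ = number of inserts spanning the position)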
Introduced by Jim Yorke’s group at UMD
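A minimal Python sketch of the statistic, assuming the scaling factor is σ/√n (as in the worked examples that follow) and that each insert is given as a hypothetical (start, end) span on the contig. The function names are illustrative, not AMOS code. It recomputes the local mean naively at every position, which is fine for illustration but slow on real contigs.

```python
import math
from typing import List, Optional, Tuple


def ce_statistic(spanning_sizes: List[float], global_mean: float, global_sd: float) -> float:
    """C/E value at one position: (local mean - global mean) / (global sd / sqrt(n))."""
    n = len(spanning_sizes)
    if n == 0:
        raise ValueError("no inserts span this position")
    local_mean = sum(spanning_sizes) / n
    scaling_factor = global_sd / math.sqrt(n)  # standard error of a mean of n inserts
    return (local_mean - global_mean) / scaling_factor


def ce_profile(inserts: List[Tuple[int, int]], contig_len: int,
               global_mean: float, global_sd: float) -> List[Optional[float]]:
    """C/E value at every contig position; None where no insert spans the position.

    Each insert is a hypothetical (start, end) pair in contig coordinates,
    end exclusive, so its size is end - start.
    """
    profile: List[Optional[float]] = []
    for pos in range(contig_len):
        sizes = [end - start for start, end in inserts if start <= pos < end]
        profile.append(ce_statistic(sizes, global_mean, global_sd) if sizes else None)
    return profile
```

For example, eight spanning inserts averaging 3488 bp in a 4000 ± 400 bp library give `ce_statistic([3488] * 8, 4000, 400)` of about -3.62.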
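Library size variation (library mean 4000 bp, standard deviation 400 bp)
8 inserts span the position, ranging from 3 kb to 6 kb
Local Mean: 4048
C/E Stat: $\frac{4048 - 4000}{400/\sqrt{8}} \approx +0.34$
Near 0 indicates overall happiness
[Figure: spanning inserts drawn on a 0 to 6 kb axis]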
C/E statistic: Compression
8 inserts span the position, ranging from 3.2 kb to 4.8 kb
Local Mean: 3488
C/E Stat: $\frac{3488 - 4000}{400/\sqrt{8}} \approx -3.62$
C/E Stat ≤ -3.0 indicates compression
[Figure: spanning inserts drawn on a 0 to 6 kb axis]
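The threshold rule can be applied along a whole C/E profile to report contiguous flagged regions. Below is a minimal Python sketch; the function name `flag_ce_regions` is hypothetical, and treating C/E ≥ +3.0 as expansion is an assumption made by symmetry with the ≤ -3.0 compression cutoff stated above.

```python
from typing import List, Optional, Tuple


def flag_ce_regions(profile: List[Optional[float]],
                    threshold: float = 3.0) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) runs where C/E <= -threshold
    ("compression") or C/E >= +threshold ("expansion", assumed by symmetry)."""
    regions: List[Tuple[int, int, str]] = []
    current: Optional[Tuple[int, str]] = None  # (start, label) of the open run
    for pos, value in enumerate(profile + [None]):  # trailing None closes the last run
        if value is not None and value <= -threshold:
            label = "compression"
        elif value is not None and value >= threshold:
            label = "expansion"
        else:
            label = None  # unflagged, or no spanning inserts
        if current is not None and label != current[1]:
            regions.append((current[0], pos, current[1]))
            current = None
        if label is not None and current is None:
            current = (pos, label)
    return regions
```

Reporting runs rather than single positions matches how a compression shows up: a dip that persists across every position spanned by the same set of short inserts.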
Read Alignment
Multiple reads with the same conflicting base are unlikely to arise from base-calling errors alone:
- 1 read at QV 30: 1/1,000 chance of a base-calling error
- 2 reads at QV 30: 1/1,000,000
- 3 reads at QV 30: 1/1,000,000,000
Correlated SNPs are likely to be assembly errors, usually collapsed repeats
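The error rates above follow from Phred scaling, error = 10^(-QV/10), together with the assumption that reads err independently, so their QVs add. A short Python check (the function names are illustrative):

```python
def phred_to_error(qv: float) -> float:
    """Probability that a base call with Phred quality `qv` is wrong: 10^(-qv/10)."""
    return 10 ** (-qv / 10)


def all_wrong_probability(qvs: list) -> float:
    """Chance that every one of several independent base calls is wrong.

    Multiplying the per-read error probabilities is the same as converting
    the cumulative (summed) QV back into a probability.
    """
    return phred_to_error(sum(qvs))
```

Strictly, these figures are the chance that every read is wrong; for the reads to also agree on the same wrong base, the probability is lower still.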
AMOS Tools: analyzeSNPs & clusterSNPs
Locate regions with a high rate of correlated SNPs
Parameterized thresholds:
- Multiple positions within a 100 bp sliding window
- 2+ conflicting reads
- Cumulative QV >= 40 (1/10,000 base-calling error)
[Figure: read alignment in which one group of reads carries A, G, C and another carries C, T, A at the same three columns]
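The three thresholds can be combined into a small clustering routine. This Python sketch is hypothetical and is not the analyzeSNPs/clusterSNPs source. It assumes SNP columns are given as a map from consensus position to the QVs of the reads that disagree with the consensus, and it takes "cumulative QV" to mean the sum of those QVs. It keeps columns with 2+ conflicting reads and cumulative QV >= 40, then greedily groups qualifying columns that fit inside one 100 bp window and share conflicting reads with the previous column.

```python
from typing import Dict, List

# Hypothetical input: consensus position -> {read_id: QV of that read's conflicting base}
Columns = Dict[int, Dict[str, int]]


def qualifying_columns(columns: Columns, min_reads: int = 2, min_cum_qv: int = 40) -> List[int]:
    """Positions with enough conflicting reads and enough cumulative quality."""
    return sorted(pos for pos, reads in columns.items()
                  if len(reads) >= min_reads and sum(reads.values()) >= min_cum_qv)


def correlated_snp_clusters(columns: Columns, window: int = 100,
                            min_reads: int = 2, min_cum_qv: int = 40) -> List[List[int]]:
    """Greedily group qualifying positions that fit in one `window`-bp span and
    share at least `min_reads` conflicting reads with the previous position."""
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in qualifying_columns(columns, min_reads, min_cum_qv):
        shared = set(columns[pos]) & set(columns[current[-1]]) if current else set()
        if current and pos - current[0] < window and len(shared) >= min_reads:
            current.append(pos)
        else:
            if len(current) >= 2:  # "multiple positions" in one window
                clusters.append(current)
            current = [pos]
    if len(current) >= 2:
        clusters.append(current)
    return clusters
```

Requiring shared reads between neighbouring columns is what makes the SNPs "correlated": the same reads disagree with the consensus at each position, as expected when reads from two repeat copies have been collapsed together.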
Read breakpoints: compression error
[Figure: “chimeric” reads and mates at the ribosomal RNA repeats of B. anthracis]
QC method:
- Align singleton reads to the consensus assembly
- Find any breakpoints shared by multiple reads
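The QC method can be sketched in two parts: derive a breakpoint wherever a singleton read's alignment stops well short of the read end, then report positions where breakpoints from several reads pile up. The Python below is a hypothetical illustration. The `min_unaligned` clip length and the `tolerance` used to merge nearby breakpoints are assumed values, and for simplicity every read is treated as forward-strand.

```python
from typing import List, Tuple


def alignment_breakpoints(cons_start: int, cons_end: int, q_start: int, q_end: int,
                          read_len: int, min_unaligned: int = 20) -> List[int]:
    """Consensus positions where a (forward-strand) read's alignment stops early.

    Read bases [q_start, q_end) align to consensus [cons_start, cons_end).
    A breakpoint is reported at an alignment end if at least `min_unaligned`
    read bases beyond it are left unaligned.
    """
    points = []
    if q_start >= min_unaligned:
        points.append(cons_start)
    if read_len - q_end >= min_unaligned:
        points.append(cons_end)
    return points


def shared_breakpoints(breakpoints: List[int], min_reads: int = 2,
                       tolerance: int = 5) -> List[Tuple[int, int]]:
    """Group breakpoints lying within `tolerance` bp of their neighbour and
    return (middle position, count) for groups with >= min_reads breakpoints."""
    result: List[Tuple[int, int]] = []
    group: List[int] = []
    for pos in sorted(breakpoints):
        if group and pos - group[-1] > tolerance:
            if len(group) >= min_reads:
                result.append((group[len(group) // 2], len(group)))
            group = []
        group.append(pos)
    if len(group) >= min_reads:
        result.append((group[len(group) // 2], len(group)))
    return result
```

A single read ending early may just be a chimera or poor-quality tail; several independent reads breaking at the same consensus position is the signature worth inspecting.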
“Uncompress” by creating a new repeat copy
Tandem duplication
[Figure: reference B. anthracis Ames ‘ancestor’ strain compared with B. anthracis Ames Porton Down strain]
Read Coverage
Find regions of contigs where the depth of coverage is unusually high
AMOS Tool: analyzeReadDepth (threshold: 2.5x mean coverage)
[Figure: repeat copies R1 and R2 collapsed into one copy (A R1+R2 B) versus the true layout (A R1 B R2)]
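The coverage check can be sketched as a depth profile followed by a cutoff at 2.5x the mean depth. This minimal Python version assumes reads are given as hypothetical (start, end) consensus intervals, end exclusive, and flags positions strictly above the cutoff; analyzeReadDepth's exact rule may differ.

```python
from typing import List, Tuple


def depth_profile(reads: List[Tuple[int, int]], contig_len: int) -> List[int]:
    """Per-position read depth from (start, end) intervals, end exclusive."""
    diff = [0] * (contig_len + 1)
    for start, end in reads:
        diff[start] += 1  # read starts covering here
        diff[end] -= 1    # and stops covering here
    depth, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        depth.append(running)
    return depth


def high_depth_regions(depth: List[int], factor: float = 2.5) -> List[Tuple[int, int]]:
    """(start, end_exclusive) runs whose depth exceeds factor * mean depth."""
    cutoff = factor * sum(depth) / len(depth)
    regions: List[Tuple[int, int]] = []
    start = None
    for i, d in enumerate(depth):
        if d > cutoff and start is None:
            start = i
        elif d <= cutoff and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions
```

The difference-array trick builds the depth profile in one pass over the reads plus one pass over the contig, instead of incrementing every covered position for every read.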
Hawkeye: assembly viewer and debugger
Launch Pad
Histograms & Statistics
Insert Size
GC Content
Read Length
Overall Statistics
Bird’s eye view of data and assembly quality
Scaffold View
a. Statistical Plots
b. Scaffold
c. Features
d. Clone inserts
e. Overview
f. Control Panel
g. Details
Standard Feature Types
[B] Breakpoint: alignment ends at this position
[C] Coverage: location of unusual mate coverage (asmQC)
[S] SNPs: location of correlated SNPs
[U] Unitig: used to report the location of surrogate unitigs in CA (Celera Assembler) assemblies
[X] Other: all other features
Insert (mate) Happiness
Both mates present:
- Happy: oriented correctly && |Insert Size - Library.mean| <= Happy-Distance * Library.sd
- Stretched: oriented correctly && Insert Size > Library.mean + Happy-Distance * Library.sd
- Compressed: oriented correctly && Insert Size < Library.mean - Happy-Distance * Library.sd
- Misoriented: same orientation or outies
Only 1 read present:
- Linking: read's mate is in some other scaffold
- Singleton: read's mate is a singleton
- Unmated: no mate was provided for the read
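The rules above translate directly into a classifier. A minimal Python sketch follows; the `Library` type, the `mate_location` labels, and the default `happy_distance` of 3.0 are assumptions for illustration, not Hawkeye's actual code or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Library:
    mean: float  # Library.mean
    sd: float    # Library.sd


def classify_insert(library: Library, mate_location: str, oriented_correctly: bool = True,
                    insert_size: Optional[float] = None, happy_distance: float = 3.0) -> str:
    """Classify one insert. `mate_location` is a hypothetical label:
    'same_scaffold', 'other_scaffold', 'singleton' or 'none'."""
    # Only one read present
    if mate_location == "none":
        return "unmated"
    if mate_location == "singleton":
        return "singleton"
    if mate_location == "other_scaffold":
        return "linking"
    if mate_location != "same_scaffold":
        raise ValueError(f"unknown mate_location: {mate_location!r}")
    # Both mates present
    if not oriented_correctly:
        return "misoriented"  # same orientation or outies
    if insert_size is None:
        raise ValueError("insert_size is required when both mates are present")
    tolerance = happy_distance * library.sd
    if insert_size > library.mean + tolerance:
        return "stretched"
    if insert_size < library.mean - tolerance:
        return "compressed"
    return "happy"  # |insert_size - mean| <= tolerance
```

For a 4000 ± 400 bp library with Happy-Distance 3, inserts from 2800 to 5200 bp are happy; a 5300 bp insert is stretched and a 2700 bp insert is compressed.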
Contig View: detailed alignment of reads to contigs
Consensus & Position
Scrollable Read Tiling
Read Orientation
Discrepancy Highlight
Discrepancy Summary
Discrepancy Navigation
Contig Quick Select
Regular Expression Consensus Search
SNP View
SNP-Sorted Reads
Polymorphism View
Zoom Out
SNP Barcode
SNP-Sorted Reads
Colored rectangles indicate the positions and composition of the SNPs
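A SNP barcode can be thought of as the string of bases a read carries at the SNP columns; sorting reads by that string places reads with the same alleles next to each other, which is what makes correlated SNPs stand out. The Python sketch below assumes each read is given as a hypothetical (offset, gapped sequence) pair in consensus coordinates. It illustrates the idea and is not Hawkeye's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical input: read_id -> (consensus column of the read's first base,
#                                 gapped read sequence as laid out in the alignment)
Reads = Dict[str, Tuple[int, str]]


def snp_sorted_reads(reads: Reads, snp_columns: List[int]) -> List[Tuple[str, str]]:
    """Return (read_id, barcode) pairs sorted by barcode.

    The barcode is the read's base at each SNP column, or '.' where the read
    does not cover that column.
    """
    def barcode(offset: int, seq: str) -> str:
        return "".join(seq[c - offset] if 0 <= c - offset < len(seq) else "."
                       for c in snp_columns)

    coded = [(read_id, barcode(offset, seq)) for read_id, (offset, seq) in reads.items()]
    # Sort by barcode first so reads sharing the same alleles end up adjacent
    return sorted(coded, key=lambda item: (item[1], item[0]))
```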
Scaffold View
CE Statistic
Coverage
SNP Feature
Happy
Stretched
Compressed
Misoriented
Linking
Collapsed Repeat
- 68 correlated SNPs
- C/E dip to -5.5
- Cluster of compressed mates
- Read coverage spike
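One way to look for regions like this automatically is to intersect the intervals reported by the individual detectors and keep the stretches where several signature types coincide. This is a hypothetical Python sketch (`colocated_signatures` and the signature names are made up); the slides do not state that AMOS combines signatures this way.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end_exclusive) in consensus coordinates


def colocated_signatures(features: Dict[str, List[Interval]],
                         min_types: int = 2) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, signature types) for stretches covered by at least
    `min_types` different signature types; adjacent stretches with the same
    set of types are merged."""
    # Cut the axis at every interval boundary into elementary segments
    points = sorted({p for ivs in features.values() for s, e in ivs for p in (s, e)})
    result: List[Tuple[int, int, List[str]]] = []
    for left, right in zip(points, points[1:]):
        active = sorted(name for name, ivs in features.items()
                        if any(s <= left and right <= e for s, e in ivs))
        if len(active) < min_types:
            continue
        if result and result[-1][1] == left and result[-1][2] == active:
            result[-1] = (result[-1][0], right, active)  # extend the previous stretch
        else:
            result.append((left, right, active))
    return result
```

A region backed by several independent kinds of evidence, as in the collapsed repeat above, is a stronger misassembly candidate than one flagged by a single detector.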
Example 1: Compression in the Prevotella intermedia 17 assembly, found by the CE statistic
Green inserts are within 2 standard deviations of the mean, and the orange inserts are compressed by more than 2 standard deviations.
Vertical yellow line shows the most likely place of a compression misassembly.
Only one insert in this case is compressed by more than 3 standard deviations.
Example 2: Compression in the Prevotella intermedia 17 assembly, found by the CE statistic
Fixing collapsed repeats with AMOS
[Figure: Before: original contig with compression point, plus a patch contig. After: resolved “stitched” contig]
Assemblies can be preserved at NCBI’s Assembly Archive
http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi