Optical Mapping Data: Data Generation and Algorithms

The Burrows-Wheeler Transform (BWT) is a...


Page 1

Optical Mapping Data: Data Generation and Algorithms

Page 2

Figure: Sample Preparation → Sequencing → Assembly → Analysis (producing Fragments, Reads, and Contigs along the way)

Page 3

What is an Optical Map?

GGCTTCCGACCACCACAACCGAATTATGAAGGATACCGAA

6, 19, 35

Optical maps are ordered, genome-wide, high-resolution restriction maps.

- Much longer than reads. For example, the average optical map for goat covers 360,000 bp.

- Now commercially available.

Page 4

Isolated DNA, Microfluidic device

DNA is elongated and cleaved on the optical mapping surface

Epifluorescence microscope with CCD camera

Page 8

6 3 3 49

Page 9

6 3 3 49

6 3 9 4

Genome-wide optical map

Page 10

“There is [..] a critical need for the continued development and public release of software tools for processing optical mapping data ..”

- GigaScience 2014

Page 11

Goal: a tool to align a contig to a segment of an optical map

Figure: Sample Preparation → Sequencing → Assembly → Analysis; the contigs from assembly are compared against the genome-wide optical map built from the optical map data.

Page 12

Challenges

• Previous approaches use dynamic programming.
• The Burrows-Wheeler Transform (BWT) would improve time efficiency.
• Challenges in applying the BWT: (1) sizing error and (2) alphabet size.

Actual optical map values:             6   3   9     4
Optical map obtained from experiment:  5   4   9.5   6
Sizing error:                          1   1   0.5   2
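The sizing-error row above is plain arithmetic: each error is the absolute difference between the actual fragment size and the measured size. The minimal sketch below reproduces that row and adds a tolerance check. The function names and the tolerance values 2 and 1 are illustrative assumptions and do not come from the slides.

```python
def sizing_errors(actual, measured):
    """Absolute sizing error of each fragment (same units for both lists)."""
    if len(actual) != len(measured):
        raise ValueError("fragment lists must have the same length")
    return [abs(a - m) for a, m in zip(actual, measured)]


def within_tolerance(actual, measured, tol):
    """True if every measured fragment lies within +/- tol of its actual size."""
    return all(err <= tol for err in sizing_errors(actual, measured))


actual = [6, 3, 9, 4]        # actual optical map values (slide example)
measured = [5, 4, 9.5, 6]    # optical map obtained from experiment
print(sizing_errors(actual, measured))        # [1, 1, 0.5, 2]
print(within_tolerance(actual, measured, 2))  # True
print(within_tolerance(actual, measured, 1))  # False
```

Because sizing error is unavoidable, exact symbol matching cannot be used. This is the first of the two challenges listed above.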


Page 14

Challenges (continued)

Number of unique fragment sizes > 16,000

Page 15

Twin

Figure: Sample Preparation → Sequencing → Assembly → Analysis, with the contigs and the genome-wide optical map (from the optical map data) feeding the alignment of contigs to the optical map.

Page 16

Figure labels: Contig 1, Contig 2, Contig 3, Contig 4, Contig 5

Page 17

Twin Algorithm

1. In silico digest contigs into optical maps.

TTTCCGACCACTTTTCCGAATTATGACCGAA

4, 13, 24
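The sketch below shows one way to perform an in silico digest. It finds every occurrence of an enzyme's recognition site, cuts at a fixed offset inside the site, and reports the fragment lengths between consecutive cuts, which is the form an optical map takes. The defaults assume NheI (G^CTAGC, the enzyme used later in the deck). The function names and the toy contig are hypothetical. Because GCTAGC is palindromic, scanning one strand is enough. This is not Twin's actual implementation.

```python
def cut_positions(seq, site, offset):
    """0-based cut positions: each occurrence of `site`, cut `offset` bases in."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # also catches overlapping occurrences
    return positions


def in_silico_digest(seq, site="GCTAGC", offset=1):
    """Fragment lengths from cutting `seq` at every site (NheI by default)."""
    cuts = cut_positions(seq, site, offset)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


contig = "AAAGCTAGCTTTTTTGCTAGCAA"   # hypothetical contig
print(cut_positions(contig, "GCTAGC", 1))  # [4, 16]
print(in_silico_digest(contig))            # [4, 12, 7]
```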

Page 18

Twin Algorithm

1. In silico digest contigs into optical maps.
2. Build the FM-index* and auxiliary data structures on the genome-wide optical map.

* A data structure that allows compression of the input text while still permitting fast substring queries.

Page 19

BWT and FM-index

A suffix array (SA) of a string S lists the starting positions of the suffixes of S in alphabetical (lexicographic) order of those suffixes.

S = acaaacgn

Suffixes of S      Sorted suffixes (SA)
1 acaaacgn         3 aaacgn
2 caaacgn          4 aacgn
3 aaacgn           1 acaaacgn
4 aacgn            5 acgn
5 acgn             2 caaacgn
6 cgn              6 cgn
7 gn               7 gn
8 n                8 n
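The table above can be reproduced by sorting suffix start positions by the suffixes they begin. The naive sketch below does this and then lists the SA rows whose suffixes start with a given pattern. Those rows form one contiguous block, which is the property highlighted on the next slide. The function names are hypothetical. Real indexes use much faster construction algorithms than this.

```python
def suffix_array(s):
    """1-based starting positions of the suffixes of s, in lexicographic order.

    Naive construction (sorts the suffixes themselves); fine for illustration,
    far too slow for genome-scale input.
    """
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def rows_with_prefix(s, sa, pattern):
    """SA rows (0-based) whose suffix starts with pattern."""
    return [row for row, i in enumerate(sa) if s[i - 1:].startswith(pattern)]


s = "acaaacgn"
sa = suffix_array(s)
print(sa)                                 # [3, 4, 1, 5, 2, 6, 7, 8]
rows = rows_with_prefix(s, sa, "ac")
print(rows, [sa[r] for r in rows])        # [2, 3] [1, 5]
```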

Page 20

BWT and FM-index

The suffix array clusters all the occurrences of every pattern together into a contiguous range!


Page 24

BWT and FM-index

The Burrows-Wheeler Transform (BWT) is a permutation of the string such that BWT[i] = S[SA[i] - 1], where S[0] wraps around to the last character of S.

Sorted rotations of S = acaaacgn:
3 aaacgnac
4 aacgnaca
1 acaaacgn
5 acgnacaa
2 caaacgna
6 cgnacaaa
7 gnacaaac
8 nacaaacg

Extract the last column of the sorted rotations: BWT = canaaacg
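The sketch below checks the definition BWT[i] = S[SA[i] - 1] against the last-column construction for the slide's example. The function names are hypothetical. Real implementations usually append a unique terminator such as `$` so that sorting suffixes and sorting rotations are guaranteed to give the same order. For this example the two orders already agree.

```python
def suffix_array(s):
    """1-based suffix start positions in lexicographic order (naive)."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def bwt_from_sa(s, sa):
    """BWT[i] = S[SA[i] - 1] (1-based); when SA[i] == 1 this wraps to the last char."""
    # 1-based S[j] is s[j - 1], so S[SA[i] - 1] is s[SA[i] - 2]; s[-1] gives the wrap.
    return "".join(s[i - 2] for i in sa)


def bwt_from_rotations(s):
    """Last column of the lexicographically sorted rotations of s."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


s = "acaaacgn"
print(bwt_from_sa(s, suffix_array(s)))  # canaaacg
print(bwt_from_rotations(s))            # canaaacg
```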

Page 25

rank_K(i): return the number of K's in BWT[1, i - 1]

BWT:   c  a  n  a  a  a  c  g
rank:  0  0  0  1  2  3  1  0

(The rank row gives rank_K(i) with K = BWT[i] at each position i.)

Page 26

rank_a[5] = 2
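The rank row and the example rank_a[5] = 2 both treat rank as a count over the positions strictly before i. The sketch below is written under that reading. It also precomputes a count table so that each query becomes a lookup, which is the general idea behind FM-index rank structures. The table layout and the function names are hypothetical, and real FM-indexes use compressed or sampled counts instead.

```python
def rank(bwt, k, i):
    """Number of occurrences of symbol k in BWT[1, i - 1] (1-based positions)."""
    return bwt[: i - 1].count(k)


def rank_table(bwt):
    """Precomputed counts so that table[k][i - 1] == rank(bwt, k, i)."""
    table = {k: [0] for k in sorted(set(bwt))}
    for ch in bwt:
        for k, counts in table.items():
            counts.append(counts[-1] + (ch == k))
    return table


bwt = "canaaacg"  # BWT of "acaaacgn" from the slides
print([rank(bwt, bwt[i - 1], i) for i in range(1, len(bwt) + 1)])  # [0, 0, 0, 1, 2, 3, 1, 0]
print(rank(bwt, "a", 5))            # 2
print(rank_table(bwt)["a"][5 - 1])  # 2
```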

Page 27

BWT and FM-index

The FM-index is the compressed version of the BWT and rank.

Page 28

Twin Algorithm

1. In silico digest contigs into optical maps.
2. Build the FM-index and auxiliary data structures on the genome-wide optical map.
3. Using the FM-index, find all alignments between the optical map and the in silico digested contigs.
   - Modified FM-index backward search algorithm

Page 29

FM-Index Backward Search

A recursive algorithm for finding substrings using rank and the BWT.

Figure labels: rank[c], rank[a], rank[a]
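The sketch below is a textbook FM-index backward search, written as a loop rather than recursion. It processes the pattern from right to left and narrows the suffix-array interval with C[c] + rank at each step, so searching for "aac" takes the rank[c], rank[a], rank[a] steps shown in the figure. Assumptions: a `$` terminator is appended (so the BWT built here, nca$aaacg, differs from the slide's canaaacg), and rank is a naive count. The helper names are hypothetical.

```python
def build_fm_index(text):
    """Return (bwt, C, sa) for text + '$', where '$' sorts before every letter."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])    # 0-based, naive
    bwt = "".join(s[i - 1] for i in sa)                 # s[-1] is '$'
    C, total = {}, 0
    for c in sorted(set(s)):                            # C[c]: count of symbols < c
        C[c] = total
        total += s.count(c)
    return bwt, C, sa


def occ(bwt, c, i):
    """Occurrences of c in bwt[0:i]; a real FM-index answers this in O(1)."""
    return bwt[:i].count(c)


def backward_search(bwt, C, pattern):
    """Half-open SA interval [sp, ep) of suffixes that start with pattern."""
    sp, ep = 0, len(bwt)
    for c in reversed(pattern):          # extend the match one symbol to the left
        if c not in C:
            return 0, 0
        sp = C[c] + occ(bwt, c, sp)
        ep = C[c] + occ(bwt, c, ep)
        if sp >= ep:
            return 0, 0
    return sp, ep


def locate(text, pattern):
    bwt, C, sa = build_fm_index(text)
    sp, ep = backward_search(bwt, C, pattern)
    return sorted(sa[k] + 1 for k in range(sp, ep))   # 1-based positions


print(locate("acaaacgn", "aac"))  # [4]
print(locate("acaaacgn", "ac"))   # [1, 5]
```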

Page 30

Modified FM-Index Backward Search

• Sizing error and alphabet size are challenges to overcome.
• We cannot afford a brute-force enumeration of the alphabet at each step in the backward search.
• Novelty for optical maps: Wavelet Tree

Page 31

Wavelet Tree

A Wavelet Tree converts a string into a balanced binary tree of bit vectors, where a 0 replaces half of the symbols and a 1 replaces the other half. This definition is applied recursively to each half.

Page 32

Wavelet Tree

{A,C,G,T} is encoded as {0,0,1,1}

ACGTATATAGGAAGA
001101010110010


Page 34

Wavelet Tree

No ambiguity!

Root:
ACGTATATAGGAAGA
001101010110010

0-branch, {A,C} is encoded as {0,1}:
ACAAAAAA
01000000

Page 35

Wavelet Tree

1-branch, {G,T} is encoded as {0,1}:
GTTTGGG
0111000

Which symbols in {A, G} exist in the input string?
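The pointer-based sketch below uses the split rule from the slides: each node sends the lower half of its sorted alphabet to 0 and the upper half to 1. Taking the larger half first when the alphabet size is odd is an assumption, chosen because it reproduces the deck's optical-map tree. The sketch supports rank and a range query that returns the distinct symbols within a value range and their counts, pruning every subtree whose alphabet falls outside that range. This pruning is how a range question like the one on the slide can be answered without enumerating the whole alphabet. The class and method names are hypothetical, and this is not Twin's own data structure.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a sequence of sortable symbols.

    Each internal node keeps one bit per element: 0 if the element's symbol is
    in the lower half of the node's sorted alphabet, 1 if it is in the upper half.
    """

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.bits, self.ones = [], [0]   # ones[i] = number of 1-bits in bits[:i]
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return                       # leaf: all elements equal alphabet[0]
        mid = (len(self.alphabet) + 1) // 2
        lower = set(self.alphabet[:mid])
        for sym in seq:
            bit = 0 if sym in lower else 1
            self.bits.append(bit)
            self.ones.append(self.ones[-1] + bit)
        self.left = WaveletTree([x for x in seq if x in lower], self.alphabet[:mid])
        self.right = WaveletTree([x for x in seq if x not in lower], self.alphabet[mid:])

    def rank(self, sym, i):
        """Occurrences of sym in seq[:i]."""
        if self.left is None:
            return i if self.alphabet and self.alphabet[0] == sym else 0
        mid = (len(self.alphabet) + 1) // 2
        if sym < self.alphabet[mid]:
            return self.left.rank(sym, i - self.ones[i])
        return self.right.rank(sym, self.ones[i])

    def range_symbols(self, l, r, lo, hi):
        """{symbol: count} for symbols v with lo <= v <= hi occurring in seq[l:r]."""
        if l >= r or not self.alphabet or self.alphabet[-1] < lo or self.alphabet[0] > hi:
            return {}                    # empty range, or no overlap: prune subtree
        if self.left is None:
            return {self.alphabet[0]: r - l}
        l1, r1 = self.ones[l], self.ones[r]   # map [l, r) into the 1-branch
        found = self.left.range_symbols(l - l1, r - r1, lo, hi)
        found.update(self.right.range_symbols(l1, r1, lo, hi))
        return found


dna = WaveletTree("ACGTATATAGGAAGA")
print("".join(map(str, dna.bits)))         # 001101010110010
print("".join(map(str, dna.left.bits)))    # 01000000
print("".join(map(str, dna.right.bits)))   # 0111000
print(dna.range_symbols(0, 15, "A", "G"))  # {'A': 7, 'C': 1, 'G': 4}
```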

Page 36

Modified FM-Index Backward Search

To match a fragment of size x, we need to find all the fragment sizes within the range [x - y, x + y], for tolerance y.

Page 37

Modified FM-Index Backward Search

To match 9, we need to find all the fragment sizes within the range [6, 12], for tolerance 3.

Genome-wide optical map:  2, 11, 10, 23, 53, 3, 5, 10, 14, 9, 11
Wavelet tree root bits:   0,  1,  0,  1,  1, 0, 0,  0,  1, 0,  1

Page 38

Modified FM-Index Backward Search

To match 9, we need to find all the fragment sizes within the range [6, 12], for tolerance 3.

Figure: wavelet tree over the genome-wide optical map
Root:      2, 11, 10, 23, 53, 3, 5, 10, 14, 9, 11   (bits 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1)
0-branch:  2, 10, 3, 5, 10, 9                        (bits 0, 1, 0, 0, 1, 1)
1-branch:  11, 23, 53, 14, 11                        (bits 0, 1, 1, 0, 0)
The next level holds {2, 3, 5}, {9, 10}, {11, 14}, and {23, 53}, and so on down to single values.


Page 40

Modified FM-Index Backward Search

A recursive algorithm for finding substrings using rank and the BWT.

Figure labels: rank[c], rank[a], rank[a]; Wavelet Tree Query
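The sketch below shows tolerance-based backward search over integer fragment sizes. At each step, every current suffix-array interval is extended by each distinct size in [x - tol, x + tol] that actually occurs in the interval's BWT slice, so one query can branch into several intervals. In Twin, that "which sizes occur in this slice" step is the wavelet-tree query. To stay short, this sketch replaces it with a plain set scan. Further assumptions: only sizing error is handled (no missing or extra cut sites), and `TERMINATOR`, `tolerant_search`, and the other names are hypothetical.

```python
TERMINATOR = -1  # unique end marker, smaller than any fragment size


def build_index(fragments):
    """Suffix array, BWT, and C table over an integer fragment-size sequence."""
    s = list(fragments) + [TERMINATOR]
    sa = sorted(range(len(s)), key=lambda i: s[i:])   # naive suffix sorting
    bwt = [s[i - 1] for i in sa]
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)
    return bwt, C, sa


def occ(bwt, c, i):
    """Occurrences of fragment size c in bwt[0:i]."""
    return bwt[:i].count(c)


def tolerant_search(bwt, C, query, tol):
    """SA intervals of map substrings whose fragments each lie within +/- tol of query."""
    intervals = [(0, len(bwt))]
    for x in reversed(query):
        extended = []
        for sp, ep in intervals:
            # Twin answers this step with a wavelet-tree range query; this sketch
            # simply scans bwt[sp:ep] for distinct sizes in [x - tol, x + tol].
            candidates = {v for v in bwt[sp:ep] if v != TERMINATOR and x - tol <= v <= x + tol}
            for c in sorted(candidates):
                extended.append((C[c] + occ(bwt, c, sp), C[c] + occ(bwt, c, ep)))
        intervals = extended
        if not intervals:
            break
    return intervals


def locate(fragments, query, tol):
    bwt, C, sa = build_index(fragments)
    hits = tolerant_search(bwt, C, query, tol)
    return sorted(sa[k] + 1 for sp, ep in hits for k in range(sp, ep))  # 1-based


genome_map = [2, 11, 10, 23, 53, 3, 5, 10, 14, 9, 11]   # map from the slides
print(locate(genome_map, [9], 3))        # [2, 3, 8, 10, 11]
print(locate(genome_map, [10, 14], 1))   # [8]
```

The first call reproduces the slide's example: matching 9 with tolerance 3 accepts every 9, 10, and 11 in the map.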

Page 41

Twin Algorithm

1. In silico digest contigs into optical maps.
2. Build the FM-index and auxiliary data structures on the genome-wide optical map.
3. Using the FM-index, find all alignments between the optical map and the in silico digested contigs.
4. Output the alignments in PSL format.

Page 42

TWIN Test Datasets

Page 43

TWIN Results

Page 44

TWIN: Optical Map Aligner

Twin is the first alignment method that is capable of handling large genome sizes.

• The only index-based tool, and orders of magnitude faster than existing approaches (patent pending)
• Pine tree (20 Gb) would take ~84 machine-years with SOMA but a couple of hours with Twin

Page 45

CORRECTING ERRORS IN GENOMES

Page 46

Mis-assembly in Genomes

Mis-assembly: a significantly large insertion, deletion, inversion, or rearrangement that results from decisions made by the assembly program.

Figure (segments A, R, B): correct assembly A R R B; rearrangement; deletion A R B; insertion (an extra copy of R)

Page 48

Extensive vs. Local Mis-assemblies

Extensive mis-assembly: 1 kbp in size and regions align to different strands or different chromosomes.

Local mis-assembly: smaller in size and on the same strand and same chromosome.

Page 49

De Bruijn Graph of a Genome

Example Genome: ABCDEFGHICDEFGKL

Figure: nodes ABC, BCD, CDE, DEF, EFG, FGK, GKL along the main path, plus a loop EFG → FGH → GHI → HIC → ICD → CDE; edge labels 1, 2, 3.
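The sketch below builds the slide's graph, using the 3-mers as nodes as the figure does, and shows why the repeat CDEFG is a problem. The node EFG gets two successors, and both the true genome and the erroneous genome ABCDEFGKL are valid walks through the graph. An assembler that picks the shorter walk therefore deletes the loop. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(seq, k):
    """Following the slide: nodes are distinct k-mers, edges join consecutive k-mers."""
    graph = defaultdict(set)
    ks = kmers(seq, k)
    for a, b in zip(ks, ks[1:]):
        graph[a].add(b)
    return graph


def is_walk(graph, seq, k):
    """True if every consecutive pair of k-mers in seq is an edge of graph."""
    ks = kmers(seq, k)
    return all(b in graph.get(a, ()) for a, b in zip(ks, ks[1:]))


genome = "ABCDEFGHICDEFGKL"
g = de_bruijn(genome, 3)
print(sorted(g["EFG"]))                # ['FGH', 'FGK']: the repeat creates a branch
print(is_walk(g, genome, 3))           # True
print(is_walk(g, "ABCDEFGKL", 3))      # True: the erroneous genome is also a walk
```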

Page 50

De Bruijn Graph of a Genome

ABC BCD CDE DEF EFG FGK GKL

Example Genome: ABCDEFGHICDEFGKL

Page 51

De Bruijn Graph of a Genome

ABC BCD CDE DEF EFG FGK GKL

Example Genome: ABCDEFGHICDEFGKL
Resulting Erroneous Genome: ABCDEFGKL

Page 52

Figure: Sample Preparation → Sequencing → Assembly → Analysis (producing Fragments, Reads, and Contigs along the way)

Page 53

misSEQuel*

Figure: Sample Preparation → Sequencing → Assembly → Analysis (Fragments, Reads, Contigs); misSEQuel takes the reads, contigs, and optical map data and produces refined contigs.

*(Muggli, Puglisi, Ronen, Boucher, ISMB 2015)

Page 54

misSEQuel Algorithm

1. Align sequence reads to contigs using a standard alignment tool.

GGCTTCCGACCACCACAAATGGATTATGAAGGATATATGGA

Page 55

misSEQuel Algorithm

1. Align sequence reads to contigs using a standard alignment tool.

GGCTTCCGACCACCACAAATGGATATGAAGGATATATGGATTATGAAGGATATAGGCTTCCGACCACCACAAATGGATTATGAAGGATATATGGA


Page 57

1 9

Page 58

misSEQuel Algorithm

1. Align sequence reads to contigs using a standard alignment tool.
2. Build the red-black positional de Bruijn graph based on the alignment.

Page 59

Next Generation Sequencing (NGS)

Figure: Sample Preparation → Sequencing (Fragments → Reads)

Example reads:
ACGTAGAATCGACCATG
GGGACGTAGAATACGAC
ACGTAGAATACGTAGAA

Page 60

Paired-End Reads / Mate-Pair Reads

Figure: Sample Preparation → Sequencing (Fragments)

Read pair: ACGTAGAATCGACCATG ... GGGACGTAGAATACGA

Page 61

Read Mate-Pair Concordance

Figure (segments A, R, B): correct assembly A R R B; rearrangement; inversion
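The sketch below shows a basic concordance check for a read pair. The slides do not give the library layout or thresholds, so every specific here is an assumption: a forward/reverse (FR) paired-end library, an insert size accepted within `n_sd` standard deviations of the mean (3 by default), and the hypothetical `Alignment` record and `is_concordant` function. Under these assumptions, a rearrangement or inversion in the assembly shows up as mates that land on different contigs, have the wrong orientation, or sit at an unexpected distance.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    contig: str
    start: int      # 0-based leftmost aligned base
    end: int        # exclusive
    reverse: bool   # aligned to the reverse strand


def is_concordant(a, b, mean_insert, sd_insert, n_sd=3.0):
    """Assumes a forward/reverse paired-end library (hypothetical defaults)."""
    if a.contig != b.contig or a.reverse == b.reverse:
        return False                        # different contigs or same orientation
    fwd, rev = (b, a) if a.reverse else (a, b)
    if fwd.start > rev.start:
        return False                        # mates point away from each other
    insert = rev.end - fwd.start            # outer distance spanned by the pair
    return abs(insert - mean_insert) <= n_sd * sd_insert


r1 = Alignment("contig1", 100, 200, False)
r2 = Alignment("contig1", 400, 500, True)
print(is_concordant(r1, r2, mean_insert=400, sd_insert=50))               # True
print(is_concordant(r1, Alignment("contig1", 400, 500, False), 400, 50))  # False
```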

Page 62

Read Depth

Figure (segments A, R, B): correct assembly A R R B; insertion; deletion
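Read-depth signals like the ones in this figure start from a per-position coverage count over each contig. The sketch below computes that count from aligned read intervals using a difference array. It assumes 0-based, end-exclusive intervals, and the `read_depth` name is hypothetical.

```python
def read_depth(length, alignments):
    """Per-base depth over a contig of `length` bases.

    `alignments` holds 0-based, end-exclusive (start, end) intervals of aligned reads.
    A difference array keeps this O(length + number of reads).
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(start, 0), min(end, length)   # clip to the contig
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


print(read_depth(10, [(0, 5), (3, 8), (3, 10)]))  # [1, 1, 1, 3, 3, 2, 2, 2, 1, 1]
```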

Page 63

Red-Black Positional De Bruijn Graph

A positional k-mer is a k-mer with an approximate position.

I. Choose a value of k and Δ.
II. Each positional k-mer (s_k) is an edge between two positional (k-1)-mers: the prefix and suffix of s_k.
III. Positional (k-1)-mers s_{k-1} and s'_{k-1} are glued if they have the same label and their positions differ by at most Δ.
IV. A positional (k-1)-mer s_{k-1} is red if its read depth deviates from the mean by two or more standard deviations, or if there is a significant number of discordant read alignments; otherwise, it is black.
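The sketch below implements steps II to IV under several assumptions. Positions come from where each read aligns. Gluing is chained, so positions 10, 12, 14 merge for Δ = 2. "A significant number of discordant read alignments" is modeled as a hypothetical fixed threshold `min_discordant`. The function names are also hypothetical. This is not misSEQuel's implementation.

```python
import statistics
from collections import defaultdict


def positional_edges(seq, k, offset=0):
    """Each positional k-mer becomes an edge from its (k-1)-mer prefix to its suffix.

    `offset` is the (approximate) position of seq[0], e.g. from a read alignment.
    """
    return [((seq[i:i + k - 1], offset + i), (seq[i + 1:i + k], offset + i + 1))
            for i in range(len(seq) - k + 1)]


def glue(positional_mers, delta):
    """Merge (label, position) pairs with equal labels whose positions are within delta."""
    by_label = defaultdict(list)
    for label, pos in positional_mers:
        by_label[label].append(pos)
    nodes = []
    for label, positions in sorted(by_label.items()):
        positions.sort()
        group = [positions[0]]
        for p in positions[1:]:
            if p - group[-1] <= delta:      # chained: compare with the last member
                group.append(p)
            else:
                nodes.append((label, group))
                group = [p]
        nodes.append((label, group))
    return nodes


def color(depths, discordant, min_discordant=5):
    """Red if depth is >= 2 SD from the mean or discordant pairs >= min_discordant."""
    mean = statistics.mean(depths)
    sd = statistics.pstdev(depths)
    colors = []
    for depth, n_disc in zip(depths, discordant):
        unusual_depth = sd > 0 and abs(depth - mean) >= 2 * sd
        colors.append("red" if unusual_depth or n_disc >= min_discordant else "black")
    return colors


print(positional_edges("ACGT", 3))
# [(('AC', 0), ('CG', 1)), (('CG', 1), ('GT', 2))]
print(glue([("ACG", 10), ("ACG", 12), ("ACG", 40), ("CGT", 11)], delta=3))
# [('ACG', [10, 12]), ('ACG', [40]), ('CGT', [11])]
print(color([10] * 20 + [40], [0] * 21)[-2:])  # ['black', 'red']
```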

Page 64

Positional Red-Black de Bruijn Graph

Figure panels: reads aligned to contigs; positional k-mers with read depth; the positional red-black de Bruijn graph.

Page 65

misSEQuel Algorithm

1. Align sequence reads to contigs using a standard alignment tool.
2. Build the red-black positional de Bruijn graph based on the alignment.
3. Remove all bulges and whirls from the red-black positional de Bruijn graph.

Page 66

Figure: correctly assembled contigs vs. mis-assembled contigs (segments A, R, B)

Page 67

misSEQuel Algorithm

1. Align sequence reads to contigs using a standard alignment tool.
2. Build the red-black positional de Bruijn graph based on the alignment.
3. Remove all bulges and whirls from the red-black positional de Bruijn graph.
4. Refine contigs using optical map alignment.

Page 68

Optical Map Alignment

NheI = G^CTAGC

E. coli optical map segment

Figure: contig A R R B

Page 69

Optical Map Alignment

NheI = G^CTAGC (recognition site “GCTAGC”)

Figure: contig B A R R

Page 70

NheI = G^CTAGC

Correctly Assembled Contigs Align

Figure: contig B A R R

Page 71

NheI = G^CTAGC

Figure: contig A R B R R

Mis-assembled Contigs Don't Align
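The idea on this and the previous slide can be illustrated with a simplified sliding-window comparison. It is not misSEQuel's actual alignment procedure. A contig's in silico fragment list aligns at an offset if every fragment falls within a tolerance of the optical map fragment it covers. Reordering pieces, as a mis-assembly does, breaks every window. The map values, contig values, and tolerance below are made-up illustrations, and `align_offsets` is a hypothetical name.

```python
def align_offsets(contig_frags, map_frags, tol):
    """Offsets where every contig fragment is within +/- tol of the map fragment it covers."""
    n = len(contig_frags)
    return [j for j in range(len(map_frags) - n + 1)
            if all(abs(c - m) <= tol for c, m in zip(contig_frags, map_frags[j:j + n]))]


optical_map = [12, 30, 7, 18, 25, 9]   # hypothetical optical map segment (kbp)
good = [29, 8, 17]                     # hypothetical contig matching at offset 1
bad = [29, 25, 8]                      # hypothetical contig with pieces out of order
print(align_offsets(good, optical_map, 1.5))  # [1]
print(align_offsets(bad, optical_map, 1.5))   # []
```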


Page 73

Results on Tularensis

Page 83

Results on Pine

Page 84

Improve Prediction

Figure labels: B; B A R R; A R R R

Page 85

Improve Prediction

Figure labels: B; A R R R

Deletion between two aligned regions