Recovered File 1 - cs.colostate.educs680/Slides/lecture3.pdf · SAMformat’...
Sequence Alignment Continued
Lecture 3: August 28, 2012
Review from Last Lecture: Existing Tools
Different Sequence Alignment Tools
• Database Search:
– BLAST, FASTA, HMMER
• Multiple Sequence Alignment:
– ClustalW, FSA
• Genomic Analysis:
– BLAT
• Short Read Sequence Alignment:
– BWA, Bowtie, drFAST, GSNAP, SHRiMP, SOAP, MAQ
Short Read Alignment Software
• Bowtie: a memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour.
• Burrows-Wheeler Aligner (BWA): an aligner that implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200 bp and the latter for longer sequences up to around 100 kbp.
Sequence Alignment/Map Format
(Figure: input query and reference sequences → Alignment Software → SAM File → Resequencing, RNA-Seq, SNPs)
Understanding the Input and Output of BWA
Sequence Alignment/Map Format
(Figure: Sequence Reads + Reference Sequence → Alignment Software → SAM File → Resequencing, RNA-Seq, SNPs)
• Reads: Illumina or 454 reads. Reference: whole genome, contig, or chromosome.
• Alignment software: BWA, Bowtie, mrsFAST, GSNAP.
• Most of the analysis happens when working with the SAM files.
SAM format: “A tab-delimited text format consisting of a header section, which is optional, and an alignment section”
Example Headers:
@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:249250621 AS:NCBI37 UR:file:/data/local/ref/GATK/human_g1k_v37.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:2 LN:243199373 AS:NCBI37 UR:file:/data/local/ref/GATK/human_g1k_v37.fasta M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:3 LN:198022430 AS:NCBI37 UR:file:/data/local/ref/GATK/human_g1k_v37.fasta M5:fdfd811849cc2fadebc929bb925902e5
@RG ID:UM0098:1 PL:ILLUMINA PU:HWUSI-EAS1707-615LHAAXX-L001 LB:80 DT:2010-05-05T20:00:00-0400 SM:SD37743
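Each header line starts with an @ record type (HD, SQ, RG, ...) followed by TAG:VALUE fields. As a minimal sketch, assuming real tab-separated input (the example above lost its tabs during extraction), the header can be parsed by splitting on tabs and then on the first colon of each field. The function names `parse_header_line` and `reference_lengths` are hypothetical helpers, not part of any library.

```python
def parse_header_line(line):
    """Parse one tab-separated SAM header line (e.g. an @SQ line).

    Returns (record_type, tags); all tag values are kept as strings.
    """
    fields = line.rstrip("\n").split("\t")
    if not fields[0].startswith("@"):
        raise ValueError("header lines start with '@'")
    record_type = fields[0][1:]  # e.g. 'HD', 'SQ', 'RG'
    if record_type == "CO":
        # @CO lines carry free text rather than TAG:VALUE pairs
        return record_type, {"text": "\t".join(fields[1:])}
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, so values such as
        # 'file:/data/...' (UR) or 'UM0098:1' (ID) stay intact.
        tag, value = field.split(":", 1)
        tags[tag] = value
    return record_type, tags


def reference_lengths(header_lines):
    """Map each @SQ sequence name (SN) to its length (LN) in bases."""
    lengths = {}
    for line in header_lines:
        record_type, tags = parse_header_line(line)
        if record_type == "SQ":
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```

Splitting on the first colon only matters here: the UR value of the @SQ lines and the ID of the @RG line both contain colons themselves.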
Example Alignments:
1:497:R:-272+13M17D24M 113 1 497 37 37M 15 100338662 0 CGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAG 0;==-==9;>>>>>=>>>>>>>>>>>=>>>>>>>>>> XT:A:U NM:i:0 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37
19:20389:F:275+18M2D19M 99 1 17644 0 37M = 17919 314 TATGACTGCTAATAATACCTACACATGTTAGAACCAT >>>>>>>>>>>>>>>>>>>><<>>><<>>4::>>:<9 RG:Z:UM0098:1 XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:4 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:37
19:20389:F:275+18M2D19M 147 1 17919 0 18M2D19M = 17644 -314 GTAGTACCAACTGTAAGTCCTTATCTTCATACTTTGT ;44999;499<8<8<<<8<<><<<<><7<;<<<>><< XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:4 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:18^CA19
9:21597+10M2I25M:R:-209 83 1 21678 0 8M2I27M = 21469 -244 CACCACATCACATATACCAAGCCTGGCTGTGTCTTCT <;9<<5><<<<><<<>><<><>><9>><>>>9>>><> XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:5 X1:i:0 XM:i:0 XO:i:1 XG:i:2 MD:Z:35
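Each alignment line has 11 mandatory tab-separated columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields such as NM:i:2 or MD:Z:18^CA19. The sketch below is one possible way to split such a line into named fields; `SamRecord`, `parse_tag`, and `parse_sam_line` are hypothetical names, and only the integer (i) and float (f) tag types are converted, with everything else kept as a string.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    # The 11 mandatory columns, in file order, plus the optional tags.
    qname: str
    flag: int
    rname: str
    pos: int     # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_tag(field):
    """Convert an optional field TAG:TYPE:VALUE into (tag, value)."""
    tag, typ, value = field.split(":", 2)
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    return tag, value  # A, Z, H and B values are left as strings here


def parse_sam_line(line):
    """Parse one tab-separated alignment line into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("an alignment line needs 11 mandatory fields")
    tags = dict(parse_tag(f) for f in fields[11:])
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )
```

Applied to the third example alignment (with its tabs restored), this would give flag 147, POS 17919, CIGAR 18M2D19M, and TLEN -314, with the MD tag kept as the string 18^CA19.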
The Alignment Column
Harvesting Information from SAM
• Query name: QNAME (SAM) / read_name (BAM).
• FLAG provides the following information:
– are there multiple fragments?
– are all fragments properly aligned?
– is this fragment unmapped?
– is the next fragment unmapped?
– is this query the reverse strand?
– is the next fragment the reverse strand?
– is this the last fragment?
– is this a secondary alignment?
– did this read fail quality controls?
– is this read a PCR or optical duplicate?
Bitwise Flags
FLAG: a bitwise flag. Each bit is explained in the following table:
Bitwise Representation
1 = 00000001 paired-end read
2 = 00000010 mapped as proper pair
4 = 00000100 unmapped read
8 = 00001000 read mate unmapped
16 = 00010000 read mapped on reverse strand
Example: the flag 11 = 1 + 2 + 8 = 00001011 (conditions 1, 2, and 8 hold).
• For single-end reads, the flag values 0 (mapped to the forward strand), 4 (unmapped), and 16 (mapped to the reverse strand) are the most common.
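Decoding a FLAG value is a matter of testing each bit with a bitwise AND, exactly as in the 11 = 1 + 2 + 8 example. A minimal sketch, with short descriptions paraphrased from the SAM specification's bit table (the wording of the descriptions and the helper names are illustrative choices):

```python
# Bit values from the SAM specification, lowest bit first.
FLAG_BITS = {
    0x1: "multiple segments (paired)",
    0x2: "all segments properly aligned",
    0x4: "segment unmapped",
    0x8: "next segment unmapped",
    0x10: "reverse strand",
    0x20: "next segment reverse strand",
    0x40: "first segment",
    0x80: "last segment",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
}


def decode_flag(flag):
    """List the descriptions of every bit set in a FLAG value."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


def flag_bits(flag, width=8):
    """Binary string as on the slide, e.g. 11 -> '00001011'."""
    return format(flag, f"0{width}b")


# Values from the slides and from the example alignments above.
for value in (11, 0, 4, 16, 99, 147):
    print(value, flag_bits(value, 12), decode_flag(value))
```

The paired-end flags from the example alignments decode the same way: 147 = 128 + 16 + 2 + 1, a properly paired last segment on the reverse strand.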
The Alignment Column
Mapping Quality
• A Phred-scaled score, on the same scale as the base qualities in a FASTQ file. For quality Q, the probability P that the alignment is wrong is:
$P = 10^{-Q/10}$
• If Q = 30, then P = 1/1000: on average, one out of 1000 alignments will be wrong.
• As good as this sounds, such a quality is not easy to compute.
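The Phred transform and its inverse, $Q = -10 \log_{10} P$, are easy to check numerically. A short sketch (the function names are hypothetical):

```python
import math


def phred_to_error_prob(q):
    """P = 10^(-Q/10): probability that the alignment is wrong."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Inverse: Q = -10 * log10(P), for 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


for q in (0, 10, 20, 30, 37):
    print(f"Q={q:2d}  P={phred_to_error_prob(q):.6f}")
```

Every 10 points of quality divides the error probability by 10, which is why Q = 30 corresponds to one wrong alignment in 1000.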
Mapping Quality
• Repeat structure. Reads falling in repetitive regions usually get very low mapping quality.
• Base quality of the read. Low quality means the observed read sequence may be wrong, and a wrong sequence may lead to a wrong alignment.
• Sensitivity of the alignment algorithm. An algorithm with low sensitivity is more likely to miss the true hit, which also causes mapping errors.
• Paired-end or not. Reads mapped in pairs are more likely to be correct.
BWA-Specific High Scores
A read alignment with a mapping quality of 30 or above usually implies:
– The overall base quality of the read is good.
– The best alignment has few mismatches.
– The read has few or just one “good” hit on the reference, which means the current alignment is still the best even if one or two bases are actually mutations or sequencing errors.
BWA-Specific Low Scores
The exact behavior is surprisingly difficult to track down.
• Q = 0: if a read can be aligned equally well to multiple positions, BWA randomly picks one position and gives it a mapping quality of zero.
• Q = 25: the edit distance equals the number of mismatches and is greater than zero.
What to do with low quality scores?
• Find repeat structures in the genome/contig.
• Determine whether there is a problem with your alignment or data (e.g., all the reads mapped with low quality scores).
• Filter them out. It is very common to write a Perl or Python script to filter out poorly aligned reads.
• Many, many other possibilities.
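A filtering script of the kind mentioned above can be very short. The sketch below keeps the header section, drops unmapped reads (bit 0x4), and drops alignments below a MAPQ threshold; the default of 30 is an assumption borrowed from the BWA high-score slide, and `filter_sam` / `filter_sam_file` are hypothetical names. Note that the SAM specification reserves MAPQ 255 for "not available", and this sketch lets such reads through.

```python
def filter_sam(lines, min_mapq=30, drop_unmapped=True):
    """Yield header lines unchanged and alignment lines that pass the filters.

    min_mapq=30 is an assumed default; pick a threshold that suits your data.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header section as-is
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if drop_unmapped and flag & 0x4:
            continue  # bit 0x4: read is unmapped
        if mapq < min_mapq:
            continue  # low mapping quality, e.g. BWA's Q=0 for repeats
        # note: MAPQ 255 ("not available") passes this test
        yield line


def filter_sam_file(in_path, out_path, min_mapq=30):
    """Stream a SAM file through filter_sam and write the survivors."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_sam(src, min_mapq))
```

Because `filter_sam` is a generator, it streams the file line by line and never holds the whole SAM file in memory. For everyday use, `samtools view -h -q 30 -F 4 in.sam` performs a similar filtering.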
The Alignment Column
CIGAR String
• The CIGAR string is a compact representation of how the read aligned to the reference genome at that exact position.
• More specifically, the CIGAR string is a sequence of lengths, each paired with an operation:
– M: match/mismatch with the reference.
– I / D: bases inserted relative to the reference / deleted from the reference.
Example of CIGAR
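RefPos:     1  2  3  4  5  6  7     8  9 10 11 12 13 14 15 16 17 18 19
Reference:  C  C  A  T  A  C  T  -  G  A  A  C  T  G  A  C  T  A  A  C
Read:                   A  C  T  A  G  A  A  -  T  G  G  C  T
In the SAM file you will have the following fields:
• POS: 5
• CIGAR: 3M1I3M1D5M
Note that M covers both matches and mismatches: the read's G aligned to reference position 14 differs from the reference A but is still part of the final 5M.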
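Walking the CIGAR operations from POS is enough to rebuild the picture above. The sketch below handles only M, I, and D (the operations used on the slide) and raises an error for the others; `parse_cigar`, `render_alignment`, and `reference_end` are hypothetical names. For the reference span it counts the operations the SAM specification defines as consuming the reference (M, D, N, =, X).

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def render_alignment(reference, read, pos, cigar):
    """Return (ref_row, read_row) with '-' marking gaps.

    pos is the 1-based leftmost reference position (SAM POS).
    Only M, I and D are handled in this sketch.
    """
    r = pos - 1  # 0-based index into the reference
    q = 0        # 0-based index into the read
    ref_row, read_row = [], []
    for length, op in parse_cigar(cigar):
        if op == "M":    # aligned bases, match or mismatch
            ref_row.append(reference[r:r + length])
            read_row.append(read[q:q + length])
            r += length
            q += length
        elif op == "I":  # bases present in the read only
            ref_row.append("-" * length)
            read_row.append(read[q:q + length])
            q += length
        elif op == "D":  # bases present in the reference only
            ref_row.append(reference[r:r + length])
            read_row.append("-" * length)
            r += length
        else:
            raise NotImplementedError(f"operation {op} not handled here")
    return "".join(ref_row), "".join(read_row)


def reference_end(pos, cigar):
    """1-based position of the last reference base covered by the alignment."""
    consumed = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    return pos + consumed - 1


reference = "CCATACTGAACTGACTAAC"  # reference positions 1..19 from the slide
read = "ACTAGAATGGCT"
ref_row, read_row = render_alignment(reference, read, 5, "3M1I3M1D5M")
print(ref_row)
print(read_row)
print("alignment ends at reference position", reference_end(5, "3M1I3M1D5M"))
```

The read consumes 3 + 1 + 3 + 5 = 12 bases and the reference 3 + 3 + 1 + 5 = 12 bases, so the alignment spans reference positions 5 to 16.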
Final Comments
• BAM is a compressed binary version of the SAM file format. There are multiple programs that convert BAM files to SAM files and vice versa.
• Tablet (http://bioinf.scri.ac.uk/tablet/) is an easy-to-use program for visualizing an alignment.
– You give it a SAM file and a FASTA file; it reads the SAM file and displays the alignment.
Finding Short Read Data
Where to obtain data?
• Answer: the NCBI website.
– NCBI contains multiple reference genomes (large FASTA files) and short read data (FASTA files, primarily from Illumina and 454).
– Finding data is pretty straightforward, either by going to the NCBI website directly or by using Google.
• For example, googling “e coli k12 reference genome fasta file” will take you directly to the Broad Institute's website and a link to the NCBI reference genome.
NCBI Short Read Archive: http://trace.ncbi.nlm.nih.gov/Traces/sra/
Using NCBI SRA
If you can download a movie or a TV show, then you can download short read data from SRA (it's even easier); you just have to know what to look for:
1. Go to “search”.
2. Type in the organism name, and the strain if you know it, e.g. “Escherichia coli str. K-12 substr. MG1655”.
3. Look at the query results, then download.
Running BWA
Steps in using BWA
Download and install BWA on Linux/Mac. If you are using the CS servers, you should not need this step. Either add the BWA directory to your PATH or call bwa by its full path.
bunzip2 bwa-0.5.9.tar.bz2
tar xvf bwa-0.5.9.tar
cd bwa-0.5.9
make
Download the reference genome using wget.
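Create the index for the reference genome (assuming the reference sequences are in wg.fa). This only needs to be done once per genome. The -p option sets the prefix of the index files, and -a selects the indexing algorithm: bwtsw is needed for large genomes such as the human genome, while is (the default) is intended for small genomes.
bwa index -p hg19bwaidx -a bwtsw wg.fa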
Mapping short reads to the reference genome.
1. Align the reads using multiple threads (e.g., 4 CPUs), assuming the short reads are in the file s_3_sequence.txt.gz:
bwa aln -t 4 hg19bwaidx s_3_sequence.txt.gz > s_3_sequence.txt.bwa
2. Generate the alignments in SAM format (a generic format for storing large nucleotide sequence alignments):
bwa samse hg19bwaidx s_3_sequence.txt.bwa s_3_sequence.txt.gz > s_3_sequence.txt.sam
Mapping long reads (454) can be done using the bwasw command:
bwa bwasw hg19bwaidx 454seqs.txt > 454seqs.sam
Recap and Looking Forward
De novo vs. Re-sequencing
• De novo assembly (“from the beginning”) implies that you have no prior knowledge of the genome: no reference, no contigs, only reads.
• Re-sequencing assembly assumes you have a copy of the reference genome (one that has been verified to a certain degree).
• Programs that work for re-sequencing will not work for de novo assembly, and vice versa. However, both approaches produce a copy of the genome.
Sample Preparation
(Figure: fragments)
Re-sequencing (LOCAS, SHRiMP) requires 15x to 30x coverage. With less coverage, re-sequencing programs either produce no results or produce questionable ones.
De novo assembly requires higher coverage: at least 30x, and up to 100x. Most de novo assemblers require paired-end data.
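The coverage targets above translate directly into how many reads to sequence. Assuming the standard definition of average coverage, C = N × L / G for N reads of length L over a genome of G bases, a short sketch estimates the read count; the 4.6 Mb genome (roughly E. coli sized) and 36 bp reads are illustrative assumptions, and the function names are hypothetical.

```python
import math


def average_coverage(num_reads, read_length, genome_size):
    """C = N * L / G: mean number of reads covering each base."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest read count N with N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative numbers only: a ~4.6 Mb genome sequenced with 36 bp reads.
GENOME_SIZE = 4_600_000
READ_LENGTH = 36
for target in (15, 30, 100):  # re-sequencing low/high end, de novo high end
    n = reads_needed(target, READ_LENGTH, GENOME_SIZE)
    print(f"{target}x needs about {n:,} reads")
```

This is only an average: real coverage is uneven, which is part of why the slides recommend generous margins, especially for de novo assembly.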