File Types in Bioinformatics · 151116 Martin Dahl
http://xkcd.com
Overwhelming at first
Overview
FASTA – reference sequences
FASTQ – reads in raw form
SAM – aligned reads
BAM – compressed SAM file
CRAM – even more compressed SAM file
GTF/GFF/BED – annotations
FASTA
Used for: nucleotide or peptide sequences
Simple structure
> header
sequence
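The header/sequence structure above can be read with a few lines of Python. This is a minimal sketch, assuming a sequence may be wrapped over several lines; the function name `parse_fasta` and the example records are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example: one nucleotide record and one peptide record.
example = """>seq1 hypothetical example
ACGTACGT
ACGT
>seq2
MKV
"""
records = list(parse_fasta(example.splitlines()))
print(records)
```

For real work an established parser (for example Biopython's `SeqIO`) is usually the safer choice; the sketch only shows how little structure the format has.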
FASTQ
Just like FASTA, but with quality values
Used for: raw data from sequencing (unaligned reads)
@ header
sequence
+
quality
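The four-line record above can be sketched in Python as follows. This assumes the common case where each record is exactly four lines (no wrapped sequence or quality lines); `FastqRecord`, `parse_fastq` and the example reads are hypothetical names and data.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four-line block (assumes no wrapped lines)."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must have a multiple of four lines")
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record #{i // 4 + 1}")
        # Every base needs exactly one quality character.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


# Hypothetical example with two reads.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "!!5I"]
reads = list(parse_fastq(example))
for read in reads:
    print(read.name, read.sequence, read.quality)
```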
FASTQ
Quality (Phred score) 0-40 (Illumina 1.8+: up to 41)
Higher = better (40 = best, or 41 on Illumina 1.8+)
ASCII encoded: one character per base
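To make the ASCII encoding concrete, here is a small Python sketch that turns a quality string back into numbers. It assumes the Phred+33 offset used by Sanger and Illumina 1.8+ files (older Illumina pipelines used other offsets), and the standard Phred relation Q = -10 * log10(P); the function names are made up for illustration.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ encoding


def decode_quality(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert each quality character to its numeric Phred score."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that the base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


scores = decode_quality("!5IJ")
print(scores)
print(error_probability(40))
```

With this offset, `!` is the lowest score (0), `I` is 40 and `J` is 41; a score of 40 corresponds to roughly one wrong call in 10,000 bases.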
SAM
Used for: aligned reads
Lots of columns, e.g.:
Read name
chr
Start position (bp)
Sequence
Quality
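The columns can be made explicit by splitting one alignment line on tabs. The sketch below uses the eleven mandatory field names from the SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the function name `parse_sam_line` and the example alignment are hypothetical, and header lines starting with `@` are not handled.

```python
from typing import Dict

# The eleven mandatory columns, in order.
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one alignment line into named fields plus optional tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("an alignment line needs at least 11 columns")
    record: Dict[str, object] = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    record["TAGS"] = columns[len(SAM_FIELDS):]  # optional extra columns
    return record


# Hypothetical alignment: read1 maps to chr1 starting at position 100.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["QNAME"], rec["RNAME"], rec["POS"], rec["SEQ"], rec["QUAL"])
```

Read name, chr, start position, sequence and quality from the slide correspond to QNAME, RNAME, POS, SEQ and QUAL here.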
BAM
Binary SAM (compressed)
About 25% of the size of the SAM file
SAMtools to convert between SAM and BAM
.bai = BAM index
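As a rough sketch of the SAMtools steps, the Python helper below only builds the command lines for converting a SAM file to BAM, sorting it, and indexing it (the index step writes the `.bai` file). It assumes a reasonably recent `samtools` (1.x) on the PATH; the file names and the helpers `samtools_commands` and `run_if_available` are made up for illustration.

```python
import shutil
import subprocess
from typing import List


def samtools_commands(sam_path: str, prefix: str) -> List[List[str]]:
    """Return the view/sort/index commands as argument lists."""
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "view", "-b", "-o", bam, sam_path],  # SAM -> BAM
        ["samtools", "sort", "-o", sorted_bam, bam],      # coordinate sort
        ["samtools", "index", sorted_bam],                # writes the .bai
    ]


def run_if_available(commands: List[List[str]]) -> bool:
    """Run the commands only when samtools is installed."""
    if shutil.which("samtools") is None:
        return False
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return True


# Hypothetical input file name.
commands = samtools_commands("reads.sam", "reads")
for cmd in commands:
    print(" ".join(cmd))
```

The sort step is included because indexing needs a coordinate-sorted BAM file.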
BAM
Reads are in random order
Have to sort before indexing
Sorted by position: Chr1 Chr2 Chr3 Chr4 Chr5
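Why sorting has to come first can be seen in a toy model: once reads are ordered by chromosome and position, each chromosome occupies one contiguous block, so an index only needs to remember where each block starts and ends. The Python sketch below is an illustration under that simplification, not the real `.bai` layout; all names and reads are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Tuple[str, int, str]  # (chromosome, start position, read name)


def sort_by_coordinate(
    alignments: List[Alignment], chrom_order: List[str]
) -> List[Alignment]:
    """Order reads by chromosome (in reference order), then by position."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    return sorted(alignments, key=lambda a: (rank[a[0]], a[1]))


def build_index(sorted_alignments: List[Alignment]) -> Dict[str, Tuple[int, int]]:
    """Map each chromosome to the [first, last + 1) slice holding its reads."""
    index: Dict[str, Tuple[int, int]] = {}
    for i, (chrom, _pos, _name) in enumerate(sorted_alignments):
        first, _ = index.get(chrom, (i, i))
        index[chrom] = (first, i + 1)
    return index


# Hypothetical reads in the random order an aligner might emit them.
reads = [("Chr2", 50, "r1"), ("Chr1", 900, "r2"), ("Chr1", 10, "r3"), ("Chr3", 5, "r4")]
ordered = sort_by_coordinate(reads, ["Chr1", "Chr2", "Chr3", "Chr4", "Chr5"])
index = build_index(ordered)
print(ordered)
print(index)
```

On unsorted input the same chromosome would be scattered across the file, and a single start/end pair per chromosome could not describe it.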
CRAM
Very complex format
Used together with a reference genome
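The reason a reference genome is needed can be illustrated with a toy example: if a read matches the reference almost everywhere, it is enough to store where it aligns and which bases differ. The Python sketch below handles substitutions only and is not the actual CRAM encoding; the function names and sequences are made up for illustration.

```python
from typing import List, Tuple

Encoded = Tuple[int, int, List[Tuple[int, str]]]  # (start, length, differences)


def encode_read(reference: str, start: int, read: str) -> Encoded:
    """Store a read as its position plus the bases that differ from the reference."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return start, len(read), diffs


def decode_read(reference: str, encoded: Encoded) -> str:
    """Rebuild the read; this only works with the same reference."""
    start, length, diffs = encoded
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


# Hypothetical reference and a read with a single mismatch.
reference = "ACGTACGTAC"
encoded = encode_read(reference, 2, "GTACCT")
restored = decode_read(reference, encoded)
print(encoded, restored)
```

Decoding without the original reference is impossible in this scheme, which is why the reference has to be kept alongside the compressed file.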
CRAM
Quality scores? 3 modes:
Lossless
Binned
No quality
Binned: quality values 1-41 are grouped into ranges of five:
1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45
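The binning shown above can be sketched in Python as a function that maps a score to the range it falls in. The width of five comes from the ranges on the slide; the function names are hypothetical, and real tools may use different bin boundaries.

```python
from typing import List, Tuple

BIN_WIDTH = 5  # bin size taken from the ranges above


def quality_bin(q: int, width: int = BIN_WIDTH) -> Tuple[int, int]:
    """Return the (low, high) range that contains quality score q."""
    if q < 1:
        raise ValueError("this sketch only covers scores >= 1")
    low = ((q - 1) // width) * width + 1
    return low, low + width - 1


def bin_qualities(scores: List[int]) -> List[Tuple[int, int]]:
    """Replace every score by its bin; the exact value is lost."""
    return [quality_bin(q) for q in scores]


binned = bin_qualities([1, 5, 6, 40, 41])
print(binned)
```

Fewer distinct values compress better, at the cost of no longer knowing the exact score of each base.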
Not widespread, yet
GTF/GFF/BED
Used for: annotations
Simple structure
Usually (tab-separated columns):
chr, start, stop, extra info
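A minimal Python sketch of reading one such line, here in BED layout. One detail worth knowing when mixing formats: BED coordinates are 0-based and half-open, while GFF/GTF coordinates are 1-based and inclusive. The names `Interval`, `parse_bed_line`, `bed_to_one_based` and the example feature are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    extra: List[str]


def parse_bed_line(line: str) -> Interval:
    """Split a tab-separated BED line into chr, start, stop and extra info."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chr, start and stop")
    return Interval(fields[0], int(fields[1]), int(fields[2]), fields[3:])


def bed_to_one_based(interval: Interval) -> Tuple[int, int]:
    """Convert 0-based half-open BED coordinates to 1-based inclusive ones."""
    return interval.start + 1, interval.end


# Hypothetical feature covering bases 100 to 200 of chr1.
feature = parse_bed_line("chr1\t99\t200\tgeneA\t0\t+")
print(feature)
print(bed_to_one_based(feature))
```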
Laboratory time! (yet again)