Post on 02-Jul-2015
Genomics?
Genomics - Wikipedia
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).[1][2] Advances in genomics have triggered a revolution in discovery-based research to understand even the most complex biological systems, such as the brain.[3] The field includes efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome.[4]
In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research on single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.[5][6]
Figure: Number of prokaryotic genomes and sequencing costs (Estevezj - CC3 Wikimedia)
http://upload.wikimedia.org/wikipedia/commons/7/73/Number_of_prokaryotic_genomes_and_sequencing_costs.svg
• Genomics
• Biodiversity assessments
• Stool microbiome sequencing
• Personalized medicine
• Cancer genomics
Challenges
1. Getting up and running with Unix
2. Algorithms in Bioinformatics: strengths & weaknesses
3. Bioinformatics databases
4. DIY: genome assembly & identifying variants.
Getting up and running with Unix & High Performance Computing (HPC)
ITS Research Team (Lukasz Zalewski):
1. Install VirtualBox & BioLinux
2. Introduction to Unix
3. Using Apocrita HPC = “the cluster”
Algorithms for sequence alignment.
• Dotplots
• The concept of distance: Euclidean, Hamming, Levenshtein
• Dynamic programming and the Smith-Waterman algorithm
• Local, global, semiglobal alignments
• Gap penalty models
• Basics of approximate methods (BLAST)
• Scoring matrices (PAM, BLOSUM)
• Profiles and PSI-BLAST
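The distance and dynamic-programming topics listed above can be illustrated with a minimal Python sketch. It is not taken from the course material: the function names and the scoring values (match=2, mismatch=-1, gap=-2, a linear gap penalty) are assumptions chosen only for illustration, and the Smith-Waterman function returns just the best local score, without the traceback that recovers the alignment itself.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (illustrative scores, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # The floor at 0 is what makes the alignment local rather than global.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(hamming("GATTACA", "GACTATA"))
    print(levenshtein("kitten", "sitting"))
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Hamming distance only compares strings of equal length, whereas Levenshtein distance also allows insertions and deletions, which is why alignment algorithms build on the latter kind of recurrence.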
Take home message?
• Algorithms are approximate
• Results aren’t perfect
• Computers can get it wrong
Algorithms for sequence alignment.
BLAST is unable to detect any similarity between these 2 sequences:
Gp-9    1 ATGAAGACGTTCGTATTGCATATTTTTATTTTTGCTCTCGTGGCTTTCGCTTCTGCATCT 60
          ||||||||||| |||||||||| |||||||||   |||||||| |||||||||| |||||
K2000   1 ATGAAGACGTTGGTATTGCATAATTTTATTTT---TCTCGTGGATTTCGCTTCTCCATCT 57

Gp-9   61 CGTGATAGCGCGAGGAAGATAGGATCCCAATATGACAATTACGCGACTTGCTTAGCCGAA 120
          ||||| ||||||| || ||| ||||||||| |||||| |||||| ||||||||| |||||
K2000  58 CGTGAGAGCGCGAAGACGATGGGATCCCAACATGACATTTACGCCACTTGCTTACCCGAA 117

Gp-9  121 CATAGTCTAACAGAGGATGACATCTTCTCGATTGGTGAAGTATCAAGTGGCCAGCACAAA 180
          |||| ||||| || |||| || | ||||||||| ||||||||| |||||||||| |||||
K2000 118 CATAATCTAAGAGGGGATAACGTTTTCTCGATTCGTGAAGTATAAAGTGGCCAGGACAAA 177

Gp-9  181 ACCAATCATGAAGATACCGAACTACACAAAAATGGTTGCGTCATGCAATGTTTGTTAGAA 240
          |||| ||||||||| |||||||| ||||||||| || ||||||| |||||||| ||||||
K2000 178 ACCAGTCATGAAGAAACCGAACTCCACAAAAATCGTCGCGTCATACAATGTTTATTAGAA 237

Gp-9  241 AAAGATGGACTGATGTCTGGAGCTGATTATGATGAAGAGAAAATGCGTGAGGACTATATC 300
           |||||||| |||||| ||| ||| ||||||||| ||| ||||||||||  |||||||||
K2000 238 TAAGATGGAATGATGTGTGGGGCTAATTATGATGGAGAAAAAATGCGTGCTGACTATATC 297

Gp-9  301 AAGGAA------ACAGGTGCTCAACCAGGAGATCAAAGGATAGAAGCTCTGAATGCCTGC 354
          | ||||      || |||| |||||||||| |||| |||| |||| |||||||||| | |
K2000 298 AGGGAATCAGGTACCGGTGGTCAACCAGGACATCAGAGGAGAGAACCTCTGAATGCGTAC 357

Gp-9  355 ATGCAAGAAACAAAAGACATGGAGGATAAATGTGACAAAAGCTTGCTCCTTGTAGCATGT 414
          ||||||||| ||||||| ||| ||| ||||||  |||||||||   | || ||| |||||
K2000 358 ATGCAAGAATCAAAAGATATGCAGGTTAAATGGCACAAAAGCT---TTCTAGTAACATGT 414

Gp-9  415 GTCTTAGCAGCTGAAGCTGTGCTCGCCGATTCTAACGAAGGAGCATAA 462
           | |||||||| | |||||| ||||| |||||| ||||||||| ||||
K2000 415 ATTTTAGCAGCGGGAGCTGTTCTCGCGGATTCTCACGAAGGAGAATAA 462
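A pairwise alignment like the one above can be summarised column by column. The following Python sketch counts matches, mismatches and gap columns for the first 60 columns of the Gp-9 / K2000 alignment; the function name `alignment_stats` and the variable names are hypothetical, and the two strings are copied from the first row of the alignment shown on the slide.

```python
def alignment_stats(row_a: str, row_b: str) -> tuple[int, int, int]:
    """Count (matches, mismatches, gap columns) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1          # a gap in either sequence
        elif x == y:
            matches += 1       # identical bases ("|" in the alignment)
        else:
            mismatches += 1    # substitution
    return matches, mismatches, gaps


# First alignment row from the slide (Gp-9 1-60, K2000 1-57).
GP9_ROW = "ATGAAGACGTTCGTATTGCATATTTTTATTTTTGCTCTCGTGGCTTTCGCTTCTGCATCT"
K2000_ROW = "ATGAAGACGTTGGTATTGCATAATTTTATTTT---TCTCGTGGATTTCGCTTCTCCATCT"

if __name__ == "__main__":
    matches, mismatches, gaps = alignment_stats(GP9_ROW, K2000_ROW)
    identity = 100.0 * matches / len(GP9_ROW)
    print(f"matches={matches} mismatches={mismatches} gaps={gaps} "
          f"identity={identity:.1f}%")
```

Percent identity is computed here over all alignment columns, gaps included; other conventions (for example, excluding gap columns) give slightly different numbers.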
Take home message?
• Algorithms are approximate
• Results depend on:
  • underlying biology
  • approximations made by algorithms
  • search and database size
Databases for Bioinformatics
• Biological databases & access to the annotated genomes
  • NCBI
  • Ensembl
  • UCSC
  • Entrez & BioMart
  • GenBank/UniProt
• Cancer resources and data portals
  • TCGA, ICGC and COSMIC
Take home message?
Genome Assembly & variant calling
• Processing raw data
• Genome assembly algorithms
• Read mapping
• Quality Assurance processes
• Calling & visualising variants
• Automated gene prediction
• Doing things in the command-line
Bruno Vieira
Rodrigo Pracana
Old & modern assembly algorithms
• Overlap-layout-consensus
• De Bruijn graph-based
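The core data structures behind the two assembly approaches can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no graph traversal or consensus step), not how any particular assembler is implemented; the function names `overlap` and `de_bruijn_edges` and the default `min_len=3` are hypothetical.

```python
from collections import defaultdict


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    This suffix-prefix comparison is the 'overlap' step of
    overlap-layout-consensus; returns 0 if no overlap reaches min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def de_bruijn_edges(reads: list[str], k: int) -> dict[str, list[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its (k-1)-prefix to its (k-1)-suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    print(overlap("ACGTAC", "TACGGA"))
    print(de_bruijn_edges(["ACGTAC", "TACGGA"], k=3))
```

The overlap approach compares reads pairwise, while the de Bruijn approach never compares reads directly: it breaks them into k-mers, so overlaps are implied by shared (k-1)-mers.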