Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique

LF-2005.02

Introduction to Bioinformatics

SIB and EMBnet: bioinformatics resources for biomedical scientists

The Swiss Institute of Bioinformatics

- Founded in March 1998
- Collaborative structure: Lausanne, Geneva, Basel
- Groups at ISREC, Ludwig Institute, Unil, HUG, UniGe, UniBas and soon ETHZ
- Several roles: teaching, services, research
- Currently ~160 employees

Projects at SIB

Databases: SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL; TrEST, TrGEN (predicted proteins), tromer (transcriptome)

Software: Melanie, Deep View, proteomics tools, ESTScan, pftools, Java applets

Services: web servers (ExPASy, EMBnet, MyHits), teaching and helpdesk

Research: mostly sequence and expression analysis, 3D structure, and proteomics

Teaching

- Master's degrees in Bioinformatics (Bologna type): 90 ECTS credits at UniGe, Unil and UniBas
- EMBnet courses: four one-week courses per year in Lausanne, Basel, Bern or Zürich
- Undergraduate courses at the universities of Geneva, Fribourg and Lausanne
- Other courses at CHUV and EPFL
- Courses in other countries: Colombia, Cambodia, Peru, …

Research

- New algorithms (faster alignments…)
- New technology (GRID or cluster computing)
- New tools (protein analysis, microarrays, confocal microscopy)
- New databases (microarrays, transcriptome, proteome)

Collaborations with lab researchers!

Three levels of services

Simple web access to software and databases
- easy to use for occasional, basic research on a few sequences
- potentially insecure

Command-line access with a local Unix account
- more powerful (automation) and secure
- requires an understanding of Unix and frequent practice

Collaboration with SIB
- access to experts in the field (helpdesk)
- for projects requiring extensive programming or special hardware resources

Helpdesk: by e-mail or via http://www.expasy.org/contact.html

SIB’s important sites

Home: www.isb-sib.ch

ExPASy - Expert Protein Analysis System: www.expasy.org

MyHits database and tools: myhits.isb-sib.ch

EMBnet Switzerland: www.ch.embnet.org

Geneva Bioinformatics: www.genebio.ch

SIB home

Expert Protein Analysis System

MyHits http://myhits.isb-sib.ch

Swiss node http://www.ch.embnet.org

EMBnet organisation

Founded in Europe in 1988, now spread worldwide: 32 country nodes and 8 special nodes

Role:
- training and education (EMBER)
- software development (EMBOSS, SRS)
- computing resources (databases, websites, services)
- helpdesk and technical support
- publications (EMBnet.news, Briefings in Bioinformatics)

Access: www.embnet.org. Each node has its own site at www.xx.embnet.org, where xx is the country code (e.g., ch for Switzerland).
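
The node naming convention is regular enough to generate automatically. The small Python sketch below assumes, beyond what the slide states, two-letter lowercase country codes and plain http:// addresses like the Swiss node's. The function name is hypothetical, and these 2005-era addresses may no longer resolve.

```python
def embnet_node_url(country_code: str) -> str:
    """Build an EMBnet national node address following the www.xx.embnet.org scheme.

    Assumption: codes are two-letter country codes (e.g. "ch" for Switzerland).
    """
    code = country_code.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return f"http://www.{code}.embnet.org"


print(embnet_node_url("ch"))
```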

EMBnet home

European Molecular Biology Open Software Suite

- Free, open source (for most Unix platforms)
- GCG successor (compatible with the GCG file format)
- More than 150 programs (version 2.9.0)
- Easy to install locally

But: no graphical interface (Unix command line only), and requires local databases

Interfaces:
- with an account: Jemboss, wEMBOSS, www2gcg, w2h…
- without an account: Pise, EMBOSS-GUI, SRSWWW
- local: Staden, Kaptain, CoLiMate, Jemboss

Access: www.emboss.org or emboss.sourceforge.net
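
EMBOSS programs run from the Unix command line, and that is what makes them easy to automate. As a hedged sketch, the Python below builds, and can optionally run, a call to `needle`, the EMBOSS global pairwise aligner. It uses the `-asequence`, `-bsequence`, `-gapopen`, `-gapextend` and `-outfile` qualifiers. The file names are hypothetical, the gap penalties are the usual `needle` defaults, and actually running it requires a local EMBOSS installation.

```python
import shlex
import subprocess


def needle_command(seq_a: str, seq_b: str, outfile: str,
                   gapopen: float = 10.0, gapextend: float = 0.5) -> list[str]:
    """Argument list for an EMBOSS `needle` global alignment of two sequence files."""
    return [
        "needle",
        "-asequence", seq_a,
        "-bsequence", seq_b,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-outfile", outfile,
    ]


def run_needle(seq_a: str, seq_b: str, outfile: str) -> None:
    """Run needle; requires EMBOSS installed with `needle` on the PATH."""
    subprocess.run(needle_command(seq_a, seq_b, outfile), check=True)


if __name__ == "__main__":
    # Hypothetical input files. Print the commands instead of running them,
    # so the sketch works even without EMBOSS installed.
    pairs = [("query.fasta", "hit1.fasta"), ("query.fasta", "hit2.fasta")]
    for a, b in pairs:
        out = f"{a.rsplit('.', 1)[0]}_vs_{b.rsplit('.', 1)[0]}.needle"
        print(shlex.join(needle_command(a, b, out)))
```

Looping over many sequence pairs like this is exactly the kind of batch job that is tedious through a web form.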

Other important sites

ExPASy - Expert Protein Analysis System: www.expasy.org

EBI - European Bioinformatics Institute: www.ebi.ac.uk

NCBI - National Center for Biotechnology Information: www.ncbi.nlm.nih.gov

Sanger - The Sanger Institute: www.sanger.ac.uk

Bioinformatics: definition

Every application of computer science to biology: sequence analysis, image analysis, sample management, population modelling, …

Analysis of data coming from large-scale biological projects: genomes, transcriptomes, proteomes, metabolomes, etc.

The new biology

Traditional biology:
- small teams working on a specialized topic
- well-defined experiments to answer precise questions

New « high-throughput » biology:
- large international teams using cutting-edge technology define the project
- results are released raw to the scientific community, without any underlying hypothesis

Examples of « high-throughput » biology

- Complete genome sequencing
- Large-scale sampling of the transcriptome (ESTs)
- Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)
- Large-scale sampling of the proteome
- Large-scale protein-protein interaction analysis by two-hybrid screens (yeast, worm)
- Large-scale 3D structure determination (yeast)
- Metabolism modelling
- Simulations
- Biodiversity

Role of bioinformatics

- Control and management of the data
- Analysis of primary data, e.g.:
  - base calling from chromatograms
  - mass spectra analysis
  - DNA microarray image analysis
- Statistics
- Database storage and access
- Analysis of results in a biological context

First information: a sequence?

Nucleotide
- RNA (or cDNA) or genomic (introns and exons)?
- Complete or incomplete?
  - mRNA with 5’ and 3’ UTR regions
  - entire chromosome

Protein
- Precursor (pre/pro form) or mature, functional protein?
- Function prediction
- Post-translational modifications?
- Holy Grail: 3D structure?
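
The first practical question about an unknown sequence is often simply whether it is nucleotide or protein. A common heuristic, sketched here in Python, counts the fraction of letters that are A, C, G, T, U or N. The 90% threshold and the function name are assumptions chosen for illustration. Short protein fragments rich in Ala, Cys, Gly and Thr can fool this heuristic.

```python
def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Heuristically classify a raw sequence as 'nucleotide' or 'protein'.

    Assumption: if at least `threshold` of the letters are A, C, G, T, U or N,
    the sequence is treated as nucleotide; otherwise as protein.
    """
    letters = [c for c in seq.upper() if c.isalpha()]  # ignore spaces, digits, gaps
    if not letters:
        raise ValueError("sequence contains no residues")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_sequence_type("ATGGCCATTGTAATGGGCCGC"))
print(guess_sequence_type("MAIVMGRWKGAR"))
```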

Genomes in numbers

Sizes:
- virus: $10^3$ to $10^5$ nt
- bacteria: $10^5$ to $10^7$ nt
- yeast: $1.35 \times 10^7$ nt
- mammals: $10^8$ to $10^{10}$ nt
- plants: $10^{10}$ to $10^{11}$ nt

Gene number:
- virus: 3 to 100
- bacteria: ~1000
- yeast: ~7000
- mammals: ~30’000
- plants: 30’000-50’000?
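
Dividing genome size by gene number gives a rough feel for gene density. The Python sketch below does this for yeast, using the figures above. It also does it for human, combining the $3 \times 10^9$ nt haploid size of the human genome with the ~30’000 mammalian gene estimate. The result includes introns, intergenic and repetitive DNA, so it is an average spacing between genes, not a gene length.

```python
# Approximate values taken from the slides.
GENOMES = {
    "yeast": {"size_nt": 1.35e7, "genes": 7_000},
    "human": {"size_nt": 3e9, "genes": 30_000},
}


def nt_per_gene(size_nt: float, genes: int) -> float:
    """Average genome length per gene (includes all non-coding DNA)."""
    return size_nt / genes


for name, g in GENOMES.items():
    print(f"{name}: ~{nt_per_gene(g['size_nt'], g['genes']):,.0f} nt per gene")
```

On these numbers, the yeast genome has roughly 50 times more genes per nucleotide than the human genome.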

Sequencing projects

« Small » genomes (< $10^7$ nt): bacteria, viruses
- many already sequenced (not counting genomes sequenced privately by industry)
- more than 150 microbial genomes already in the public domain
- more to come! (one new genome every two weeks…)

« Large » genomes ($10^7$ to $10^{10}$ nt): eukaryotes
- more than 30 finished (S. cerevisiae, S. pombe, E. cuniculi, G. theta, C. elegans, D. melanogaster, A. gambiae, P. falciparum, P. yoelii, D. rerio, F. rubripes, A. thaliana, O. sativa (2x), M. musculus, Homo sapiens, P. troglodytes, R. norvegicus, C. familiaris, G. gallus…)
- many more to come: cat, elephant, pig, cow, maize (and other plants), insects, fish, many pathogenic parasites (Leishmania…)

EST sequencing: partial mRNA sequences, ~$40 \times 10^6$ sequences in the public domain

Human genome

- Size: $3 \times 10^9$ nt for a haploid genome
- Highly repetitive sequences: 25%; moderately repetitive sequences: 25-30%
- Size of a gene: from 900 to >2’000’000 bases (introns included)
- Proportion of the genome coding for proteins: 5-7%
- Number of chromosomes (haploid set): 22 autosomes and 1 sex chromosome
- Size of a chromosome: $5 \times 10^7$ to $5 \times 10^8$ bases

[Figure: schematic chromosome labelled with centromere, telomere, exons of a gene, regulatory elements, repetitive sequences and a locus control region]
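
As a rough worked example, using only the figures on this slide and treating the percentages as fractions of the $3 \times 10^9$ nt haploid genome, the protein-coding and repetitive portions come to approximately:

$$
\begin{aligned}
\text{coding DNA} &\approx (0.05 \text{ to } 0.07) \times 3 \times 10^9 \text{ nt} = 1.5 \times 10^8 \text{ to } 2.1 \times 10^8 \text{ nt} \\
\text{repetitive DNA} &\approx (0.25 + 0.25 \text{ to } 0.25 + 0.30) \times 3 \times 10^9 \text{ nt} = 1.5 \times 10^9 \text{ to } 1.65 \times 10^9 \text{ nt}
\end{aligned}
$$

On these numbers, about half of the genome is repetitive and less than a tenth codes for protein.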

How to sequence the human genome?

« International » consortium approach:
- generate genetic maps (meiotic recombination) and pseudo-genetic maps (chromosome hybrids) for marker sequences
- generate a physical map based on large clones (BACs or PACs)
- sequence enough large clones to cover the genome

« Commercial » approach (Celera):
- generate random libraries of fixed-length genomic clones (2 kb and 10 kb)
- sequence both ends of enough clones to obtain 10x coverage
- use computational methods to reconstruct the chromosomal sequences, checked against the public project's physical map
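
How many reads does 10x coverage mean? A hedged back-of-the-envelope sketch uses the Lander-Waterman (Poisson) model. In this model, coverage is $c = NL/G$ (N reads of length L over a genome of size G), and the expected fraction of bases covered by no read is $e^{-c}$. The 500 nt read length is an assumption typical of capillary sequencing, not a figure from the slides. The model also ignores repeats and cloning bias, which make real assemblies worse.

```python
import math


def reads_needed(genome_size: float, read_length: float, coverage: float) -> float:
    """Number of reads N such that N * L / G equals the target coverage."""
    return coverage * genome_size / read_length


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming reads land
    uniformly at random (Poisson / Lander-Waterman model)."""
    return math.exp(-coverage)


G = 3e9   # human haploid genome size (nt), from the slides
L = 500   # ASSUMED average read length (nt), not from the slides
c = 10    # target coverage of the shotgun approach

print(f"reads needed: {reads_needed(G, L, c):.1e}")
print(f"expected uncovered fraction: {fraction_uncovered(c):.1e}")
```

Under these assumptions, about $6 \times 10^7$ reads are needed, or about $3 \times 10^7$ clones if both ends of each clone are sequenced. The expected uncovered fraction is about $4.5 \times 10^{-5}$ of the genome.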

Interpretation of the human draft

All chromosomes considered as finished (last freeze: NCBI build 34, July 2003).

Even a genomic sequence does not tell you where the genes are encoded: the genome is far from being « decoded ». One must combine genome and transcriptome data to get a better idea.

The transcriptome

- The set of all functional RNAs (tRNA, rRNA, mRNA, etc.) that can potentially be transcribed from the genome
- The documentation of the localization (cell type) and conditions under which these RNAs are expressed
- The documentation of the biological function(s) of each RNA species

Public draft transcriptome

Information about the expression specificity and the function of mRNAs:
- « full » cDNA sequences of known function
- « full » cDNA sequences (HTC), but « anonymous » (e.g. KIAA or DKFZ collections)
- EST sequences
  - cDNA libraries derived from many different tissues
  - rapid random sequencing of the ends of all clones
  - ORESTES sequences
- Growing set of expression data (microarrays, SAGE, etc.)
- Increasing evidence for multiple alternative splicing and polyadenylation

Example: mapping of ESTs and mRNAs

[Figure: aligned tracks labelled ESTs, mRNAs and Computer prediction]

The proteome

- Set of proteins present in a particular cell type under particular conditions
- Set of proteins potentially expressed from the genome
- Information about the specific expression and function of the proteins

Information on the proteome

Separation of a complex mixture of proteins:
- 2D PAGE (IEF + SDS-PAGE)
- capillary chromatography

Individual characterisation of proteins:
- tryptic peptide signature (MS)
- sequencing by chemistry or MS/MS

All post-translational modifications (PTMs)!
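
The tryptic peptide signature relies on trypsin's predictable specificity: it cleaves after lysine (K) or arginine (R), usually not when the next residue is proline (P). The Python sketch below applies that simplified rule to predict peptides in silico, which is a first step toward comparing them with measured masses. It assumes complete digestion with no missed cleavages and leaves out the mass calculation. The function name is hypothetical.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Predict tryptic peptides: cut after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages).
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if p]  # drop the empty piece after a C-terminal K/R


print(tryptic_digest("MAKRPLSKGEFR"))
```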

Three-dimensional structures

Methods to determine structures:
- X-ray crystallography
- NMR

Data format: atom coordinates (except H) in a Cartesian space

Databases:
- for proteins and nucleic acids: RCSB (formerly PDB)
- independent databases for sugars and small organic molecules
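
Concretely, a structure file is largely a list of atoms with x, y, z coordinates in ångströms. The Python sketch below reads coordinates from PDB-format ATOM/HETATM records using the format's fixed column positions, then computes an interatomic distance. The two sample records are made up for illustration (hypothetical coordinates), not taken from a real entry.

```python
import math

# Two made-up ATOM records (hypothetical coordinates) in PDB fixed-column layout.
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00
ATOM      2  CA  MET A   1      12.560  13.000   2.400  1.00 20.00
"""


def parse_atoms(pdb_text: str) -> list[tuple[str, tuple[float, float, float]]]:
    """Return (atom name, (x, y, z)) for every ATOM/HETATM record.

    PDB records are fixed-width: atom name in columns 13-16 and x, y, z in
    columns 31-38, 39-46 and 47-54 (1-based, inclusive), hence the slices below.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            name = line[12:16].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((name, xyz))
    return atoms


atoms = parse_atoms(SAMPLE_PDB)
(name_a, xyz_a), (name_b, xyz_b) = atoms[0], atoms[1]
print(f"{name_a}-{name_b} distance: {math.dist(xyz_a, xyz_b):.2f} Å")
```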

Visualisation of the structures

Secondary structure elements: alpha helices, beta sheets, others

Software:
- various representations (atoms, bonds, secondary structure…)
- wide choice of commercial and free software (e.g., DeepView)

Sequence information, and so what?

How to store and organise it?
- databases (next lecture)

How to access, search, compare?
- pairwise alignments, dot plots (Tuesday)
- BLAST searches in databases (Tuesday)
- EST clustering (Wednesday)
- multiple alignments (Wednesday)
- patterns, PSI-BLAST, profiles and HMMs (Thursday)
- gene prediction (Thursday)
- protein function prediction (Friday)
- users' problems (Friday)
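
The idea behind a dot plot fits in a few lines: compare every position of one sequence with every position of the other and mark identical words. This Python sketch uses exact word matching with an optional window length. The window parameter and function names are illustrative assumptions; real tools typically use scoring windows and thresholds to filter out noise. Runs of marks along a diagonal indicate similar regions.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> list[list[bool]]:
    """Cell (i, j) is True when the words of length `window` starting at
    position i of seq_a and position j of seq_b are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [[seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
            for i in range(rows)]


def render(seq_a: str, seq_b: str, matrix: list[list[bool]]) -> str:
    """Text rendering: seq_b across the top, seq_a down the side, '*' = match."""
    lines = ["  " + seq_b]
    for base, row in zip(seq_a, matrix):
        lines.append(base + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GATTACA", dot_plot("GATTACA", "GATTACA")))
```

Comparing a sequence with itself, as here, always gives a full main diagonal. The off-diagonal marks reveal internal repeats.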

Thank you