Graduate School Bioinformatics Sequence Analysis...

Page 1: Graduate School Bioinformatics Sequence Analysis Introduction · bioinformatics.amc.nl/.../gs-sequence-analysis/... · Bioinformatics Laboratory, KEBB, Academic Medical Center · b.d.vanschaik@amsterdamumc.nl

Introduction

Barbera van Schaik

Welcome

Scale of sequence data

DNA sequencing

Genome projects

Bioinformatics databases and tools

Databases

Sequence analysis

Handling sequence data

Computing

Application areas

Graduate School Bioinformatics Sequence Analysis

Introduction

Barbera van Schaik

Bioinformatics Laboratory, KEBB, Academic Medical Center

b.d.vanschaik@amsterdamumc.nl

March 9, 2020

Related Graduate School courses

• DNA technology

• Unix

• Computing in R

• Practical biostatistics

• Advanced biostatistics

• Bioinformatics

• Bioinformatics Sequence Analysis

• Research Data Management

https://www.amc.nl/web/leren/graduate-school.htm

In this course

Bioinformatics Sequence Analysis

You will learn what is behind commonly used methods for sequence analysis and how to analyze datasets with (reasonably) user-friendly interfaces, and you will be introduced to command-line tools for next-generation sequencing (NGS)

Not in this course

1 Sequence assembly

2 Bisulphite sequencing

3 Protein sequence analysis

4 Metagenomics

Bioinformatics Sequence Analysis

1 Introduction to sequence analysis

2 Sequencing techniques

3 Brief introduction to Linux and R (self-study)

4 NGS pre-processing

5 (Multiple) sequence alignment

6 Case: Neuroblastoma

7 Introduction to R2

8 Exome sequence analysis

9 RNAseq

The focus is on human data, but many techniques are also applicable to other organisms

Practical things

Certificate

• Attend all sessions (one day can be skipped; ask about the possibility of self-study)

• Active participation

Other things

• Lunch is not included

• Coffee is available at the machines with your AMC card

• Slides and exercises are published on https://bioinformatics.amc.nl/

In this hour

Introduction

You will get an indication of the scale of sequence data, how to handle the data, where to find publicly available data and tools, and what can be done with NGS

Overview

1 Welcome

2 Scale of sequence data

• DNA sequencing

• Genome projects

3 Bioinformatics databases and tools

• Databases

• Sequence analysis

4 Handling sequence data

• Computing

• Application areas

Sanger

Automated sequencing

Sequencing centers

Next generation sequencing

Genome projects

• HGP

• 1000g

• UK10K >100K genomes

• Personal genomes

Human Genome Project

http://web.ornl.gov/sci/techresources/Human_Genome/index.shtml

1000 genomes project

http://www.1000genomes.org/

UK10K

4000 genomes

6000 exomes

http://www.uk10k.org/

The 100K genomes project

The project will focus on patients with a rare disease and their families and patients with cancer. The first samples for sequencing are being taken from patients living in England with discussions taking place with Scotland, Wales and Northern Ireland about potential future involvement.

http://www.genomicsengland.co.uk/

Personal genomes

100,000 genomes plus medical records

http://www.personalgenomes.org/

Sequencers around the world

http://omicsmaps.com/

Sequencers around the world 2015

http://omicsmaps.com/

Big data

DNA sequencing rate

Stephens et al. (2015) PLoS Biology

GenBank, EMBL and DDBJ

International Nucleotide Sequence Database Collaboration

Daily exchange of sequence data

https://www.ncbi.nlm.nih.gov/

https://www.ebi.ac.uk/

http://www.ddbj.nig.ac.jp/

Nucleotide sequence databases

From: http://www.davelunt.net/

GenBank

Release 236 (Feb 2020) has 399,376,854,872 base pairs from 216,214,215 sequences. In addition, there are 1,206,720,688 WGS records containing 6,968,991,265,752 base pairs of sequence data.

https://www.ncbi.nlm.nih.gov/genbank/statistics/

GenBank has doubled approximately every 18 months
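
As a rough illustration of what doubling every 18 months implies, the sketch below projects the base-pair count forward from the Release 236 figure. It assumes a constant doubling time, which is a simplification, and the function name `project_bases` is made up for this example.

```python
def project_bases(current_bases: float, months_ahead: float,
                  doubling_months: float = 18.0) -> float:
    """Project a base-pair count forward, assuming a constant doubling time."""
    return current_bases * 2 ** (months_ahead / doubling_months)


# Base pairs in GenBank Release 236 (Feb 2020), as quoted on the slide.
release_236_bases = 399_376_854_872

if __name__ == "__main__":
    for years in (1.5, 3, 6):
        projected = project_bases(release_236_bases, months_ahead=years * 12)
        print(f"after {years} years: about {projected:.2e} base pairs")
```

Under this assumption the database grows 16-fold in six years, which is why storage and computing are treated as a separate topic later in these slides.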

Core databases and derivatives

Nucleotide sequence databases

• Core (RNA, DNA): GenBank, EMBL, DDBJ

• RNA grouped per gene: UniGene

• Genome assemblies: human, model organisms, bacteria, plants, etc.

• Genome comparisons: conserved regions

• DNA motifs: protein binding sites, conserved regions, DNA structure, restriction sites

• Gene expression: Expressed Sequence Tags (ESTs), RNAseq

• Variants: SNPs, insertions and deletions; structural variants; allele databases

• Specialized databases: gene specific, disease specific, genome projects

• Metagenomics: microbiome, environment samples

• Protein translations

Where to start?

https://www.oxfordjournals.org/nar/database/c/

Sequence analysis

Sequence alignment

• Needleman-Wunsch

• Smith-Waterman

• BLAST

• BLAT

• ClustalW

• BWA, BFAST, Bowtie, Tophat, etc, etc

Sequence suites/packages

• Emboss package

• CLCbio workbench

• Galaxy

• R Bioconductor

Hundreds of tools to analyse sequence data...
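
To make the first entry in the alignment list concrete, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Globally align two sequences; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    def substitution(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "ACT")
    print(total)
    print(top)
    print(bottom)
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths. That cost is one reason heuristic tools such as BLAST and the NGS read mappers in the list exist.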

Tools

https://academic.oup.com/nar/article/47/W1/W1/5524725

Tools

Most tools are only available via the command line (on Linux systems)

Open source

• Free as in freedom

• You can use, change, integrate, and review the code

• Open source allows sharing and promotes collaboration

• No vendor lock-in

Open source

• Software

• Databases

• Journals

• Standards

• Hardware

• Art

• Money

• Drinks

• Medicine

• Fashion

• Education

https://en.wikipedia.org/wiki/Open_source

Handling sequence data

Buy a bigger cluster (centralized model)

Dutch life science grid

http://surfsara.nl/

Cloud computing

HPC cloud at SurfSara

You will use a Linux environment that runs on the HPC cloud to get acquainted with command-line tools

NGS application areas

Whole genomes

• De novo sequencing

• Re-sequencing

• Copy number variations

• Rearrangements

• New insertions/deletions/mutations

Structural variation

The Human Genome Structural Variation Working Group, Nature 2007

SNP / haplotype analysis

Linkage studies

Forensic research

Gene expression

https://en.wikipedia.org/wiki/Regulation_of_gene_expression

• Full-length transcripts

• EST sequencing

• 5’ transcript ends (5’-RATE, CAGE)

• SAGE ditag sequencing

• SAGE-like 3’ end sequencing

• Nebulized fragments

• ncRNA sequencing

Epigenetics

• Treatment with sodium bisulfite

• Unmethylated cytosines change into uracil

• Methylated cytosines are unchanged

• Compare sequences with reference sequence
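
The comparison step can be sketched in a few lines. The example below simulates bisulfite conversion on a toy sequence and then calls methylation by comparing the converted read with the reference. It relies on the fact that uracil is read as T after amplification and sequencing. The function names, the toy sequence, and the simplifications (forward strand only, read already aligned without gaps, no true C-to-T variants) are assumptions for illustration.

```python
from typing import Dict, Set


def bisulfite_convert(sequence: str, methylated_positions: Set[int]) -> str:
    """Simulate bisulfite treatment plus sequencing on the forward strand.

    Unmethylated C becomes U and is read as T; methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and pos not in methylated_positions else base
        for pos, base in enumerate(sequence)
    )


def call_methylation(reference: str, read: str) -> Dict[int, bool]:
    """Compare a converted read with the reference, position by position.

    Returns {position: True if methylated, False if unmethylated}
    for every reference C that can be called.
    """
    if len(reference) != len(read):
        raise ValueError("this sketch expects an ungapped, full-length alignment")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[pos] = True    # C survived treatment: methylated
        elif read_base == "T":
            calls[pos] = False   # C was converted: unmethylated
        # Any other base is left uncalled (sequencing error or variant).
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"            # toy sequence, not real data
    read = bisulfite_convert(reference, methylated_positions={1})
    print(read)
    print(call_methylation(reference, read))
```

In real data a C-to-T difference can also be a genuine variant, and reads from the reverse strand show G-to-A changes instead, so dedicated bisulfite aligners handle both strands and use coverage from many reads per position.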

Metagenomics and microbial diversity

Study genomic content in a complex mixture of microorganisms (bacteria or viruses in some environment)

Identify new species

Paleogenomics

Sequencing of ancient DNA

• Mummies

• Sabretooth

• Mammoth

• Neanderthal

Gene regulation

Sequence analysis

Usually starts with sequence alignment or sequence assembly

Depending on the application, other tools/methods are used or developed

With a click of a button...

... or perhaps not. You will find out during this course.

Computer exercises sequence analysis:

1 Via web tools

2 Creating pipelines online

3 With command-line tools in a Linux environment

Bioinformatics Sequence Analysis

1 Introduction to sequence analysis

2 Sequencing techniques

3 Brief introduction to Linux and R (self-study)

4 NGS pre-processing

5 (Multiple) sequence alignment

6 Case: Neuroblastoma

7 Introduction to R2

8 Exome sequence analysis

9 RNAseq
