Day 2: Intro to CLIMB at the MRC Unit, Gambia

Introduction to nullarbor
The Milner Centre for Evolution, Department of Biology & Biochemistry, University of Bath
http://www.climb.ac.uk/
http://www.sheppardlab.com/
Ben Pascoe with Maciej Filocha (Warwick) & Mark Pallen (Warwick)

Page 2: Day 2: Intro to CLIMB at the MRC Unit, Gambia

1. Introduction to nullarbor
2. Setting up nullarbor with test campy dataset
3. Launching your own VM
4. (something for you to try later – Shigellosis nullarbor tutorial)
5. Checking your nullarbor output

Introduction to nullarbor

Page 3: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Sequencing

Page 4: Day 2: Intro to CLIMB at the MRC Unit, Gambia

https://github.com/tseemann/nullarbor

Nullarbor - "Reads to report" for public health and clinical microbiology

http://www.slideshare.net/torstenseemann/bioinformatics-tools-for-the-diagnostic-laboratory-tseemann-antimicrobials-2016-melb-au-sat-27-feb-2016

Page 5: Day 2: Intro to CLIMB at the MRC Unit, Gambia

What is nullarbor?

Per isolate
Clean/trim sequence reads (Trimmomatic)
• Remove adaptors, trim low-quality bases
Species identification (Kraken)
• K-mer analysis against KNOWN database
De novo assembly (MEGAHIT/SPAdes)
• Fast, confident genome assembly
Annotation (Prokka)
• Genome annotation
MLST calling
• From KNOWN databases
Resistome (Abricate)
• ID AMR genes from KNOWN database
Variant calling from reads compared to reference

Per dataset
Core genome SNPs (Snippy – from reads)
Phylogenetic trees (FastTree)
Accessory genome (Roary)
Report generation

Page 6: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Workshop isolates

4 Campylobacter isolates
All LAB strains – should all be VERY similar…

Run nullarbor

How similar are the isolates?
Is there an explanation for any difference observed?

Implications: 11168 is widely used as a lab strain, and molecular studies are based on this reference strain

Page 7: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Campylobacter: background

Sheppard et al. (2009) Clinical Infectious Diseases 48:1072–1078

Sheppard et al. (2010) Applied and Environmental Microbiology 76:5269–5277

Page 8: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Campylobacter: source attribution

Page 9: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Campylobacter: introgression

Page 10: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Campylobacter: GWAS

Page 11: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Linking phenotypes and genotypes using GWAS:
Asymptomatic isolates vs symptomatic isolates

Weights association compared to relative position on the tree

Sheppard et al., PNAS 2013; Pascoe et al., Environmental Microbiology 2015; Monteil et al., Microbial Genomics 2016; Yahara & Meric et al., Environmental Microbiology 2017

Page 12: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Development of GWAS for use with bacteria: GWAS within clonal complex

Sheppard et al (2013) PNAS 110: (29) 11923-11927

Cattle isolates Vs Chicken isolates

Pascoe et al (2015) Environmental Microbiology, DOI: 10.1111/1462-2920.13051

Good Vs Bad Biofilm isolates

Previous studies were confined to single clonal complex:

Bacteria are clonal: associations are difficult to detect because they are biased by lineage effects (inheritance from a common ancestor).

Accessory genome – bacterial genomes are all different sizes!

Page 13: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Development of GWAS for use with bacteria: pan-genome GWAS

Symptomatic
Asymptomatic

Paired isolates for pan-genome GWAS
FastML tree of 36 paired isolates (pan-genome)

Reduce false positives
Maintain statistical power
Not confined to single clonal complex
Zero unmapped words

Mageiros & Meric et al, unpublished; Pascoe et al, unpublished

Previous studies were confined to single clonal complex:

Association weighted against the clonal frame (tree)

Paired isolates from many CCs.

Use of reference pan-genome instead of 1 single reference genome.

Page 14: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Genome-wide association of Campylobacter genetic elements with disease severity / asymptomatic carriage

Pascoe et al, unpublished

High statistical association: glycosylation genes

Iron uptake
Motility

*scores for all genes in pan-genome from all 77 isolates – 2,996 genes

Thousands of ‘this’ in ~3,000 genes!

Page 15: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Access VM using VM box

Using your IP address:
gambia-1: 137.205.69.151
gambia-2: 137.205.69.153
gambia-3: 137.205.69.154
gambia-4: 137.205.69.155
gambia-5: 137.205.69.156
gambia-6: 137.205.69.157
gambia-7: 137.205.69.158
gambia-8: 137.205.69.159
gambia-9: 131.251.130.226
gambia-10: 131.251.130.227

For all:
User: ubuntu
Password: password123

Page 16: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Check we have all the files you need

What do we need?
• Input file: allinput.tab
• Reference genome: al111168.fasta
• Reads from MiSeq: *.fastq.gz

(8 files, 4 isolates)
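
The file check above can be sketched as a small shell function. This is only an illustration under stated assumptions: it assumes allinput.tab is tab-separated with three columns per isolate (isolate ID, forward reads, reverse reads), and the function name check_inputs is hypothetical, not part of nullarbor.

```shell
# Hypothetical helper: confirm the files nullarbor needs are present.
# Assumes the input file has three tab-separated columns per isolate:
#   isolate ID, forward reads (R1), reverse reads (R2).
# The body runs in a subshell "( ... )" so its variables stay local.
check_inputs() (
    tab=$1
    ref=$2
    missing=0
    isolates=0
    sep=$(printf '\t')

    if [ ! -s "$ref" ]; then
        echo "MISSING reference: $ref"
        missing=$((missing + 1))
    fi
    if [ ! -s "$tab" ]; then
        echo "MISSING input file: $tab"
        exit 1                      # leaves the subshell only
    fi

    # Read one isolate per line; the "|| [ -n ... ]" part also handles
    # a final line that has no trailing newline.
    while IFS="$sep" read -r id r1 r2 || [ -n "$id" ]; do
        [ -n "$id" ] || continue    # skip blank lines
        isolates=$((isolates + 1))
        for f in "$r1" "$r2"; do
            if [ ! -s "$f" ]; then
                echo "MISSING reads for $id: $f"
                missing=$((missing + 1))
            fi
        done
    done < "$tab"

    echo "$isolates isolates, $missing missing files"
    [ "$missing" -eq 0 ]            # non-zero exit status if anything is missing
)
```

Run it from the directory that holds the data, for example `check_inputs allinput.tab al111168.fasta`. If the workshop files are all in place it should report 4 isolates and 0 missing files. Relative paths in the input file are resolved against the current directory.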

Page 17: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Setup nullarbor

nullarbor.pl --name gambia --mlst campylobacter --ref al111168.fasta --input allinput.tab --outdir output --verbose

• Type the command above to set up nullarbor
• Nullarbor will perform checks and give you the command to use to start the run:

nice make -j 1 -C /home/ubuntu/gambia/output

• Run this command
• Can also run with ‘no hangup’:

nohup nice make -j 1 -C /home/ubuntu/gambia/output &
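
Because the run takes a long time, it can help to send the output to a named log file and keep the process ID, so progress can be checked after logging back in. The sketch below is one way to do that; the function name start_background and the log file name nullarbor.log are hypothetical choices, not something nullarbor provides. Without a redirect, nohup writes to nohup.out when started from a terminal.

```shell
# Hypothetical helper: start a long job that survives logout, writing
# its output to a log file and its process ID to "<log>.pid".
start_background() {
    log=$1
    shift
    # nohup: ignore hangup on logout; nice: run at lower priority.
    nohup nice "$@" > "$log" 2>&1 &
    echo $! > "$log.pid"
}

# Usage on the workshop VM (shown as comments, not run here):
#   start_background nullarbor.log make -j 1 -C /home/ubuntu/gambia/output
#   tail -f nullarbor.log        # watch progress; Ctrl-C stops watching only
#   kill -0 "$(cat nullarbor.log.pid)" && echo "still running"
```

`kill -0` sends no signal; it only reports whether the process still exists, so it is a safe way to check on the run.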

Page 18: Day 2: Intro to CLIMB at the MRC Unit, Gambia

It will run for a couple of hours…

Page 19: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Launching your own VM

https://discourse.climb.ac.uk/

Page 20: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Nullarbor output: report example

Page 21: Day 2: Intro to CLIMB at the MRC Unit, Gambia

Workshop isolates

Are all four isolates very similar?
Which of the 4 isolates were contaminated?
Which isolate was passaged through a chicken?

Page 22: Day 2: Intro to CLIMB at the MRC Unit, Gambia

https://discourse.climb.ac.uk/

Nullarbor tutorials on discourse.climb.ac.uk: Can you run this on your own VM?