NOT ALL EXOMES ARE EQUAL: A COMPARISON OF THREE KITS

Post on 25-May-2020


Probe Designs Are Very Different

There are four common filaggrin mutations in Scottish/Irish populations. Only NimbleGen is able to capture them all reliably. Nextera is misleading: Illumina do not publish their probe sets, so it is not possible to know whether the mutations are well covered.

Methodology

Four patient samples had individual libraries made for each of the kits, run in duplicate across two lanes per kit. The 24 datasets were aligned to the human genome (Ensembl r71) with bowtie2 (v2.1.0) and had PCR duplicates removed with Picard (v1.89). Variant calling was performed with GATK (v2.2-8-g99996f2) using vendor-provided probeset definitions, and variants were annotated with Variant Effect Predictor (v2.7).
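The count of 24 datasets follows from the design described above: three kits, four samples and two lanes each. A minimal sketch of that enumeration is given here; the label format and the `dataset_names` helper are illustrative assumptions, and only the counts come from the text.

```python
from itertools import product

# Labels are illustrative; only the counts (3 kits, 4 samples, 2 lanes)
# are taken from the methodology text.
KITS = ["nextera", "agilent", "nimblegen"]
SAMPLES = [1, 2, 3, 4]
LANES = [1, 2]


def dataset_names(kits=KITS, samples=SAMPLES, lanes=LANES):
    """Return one label per kit/sample/lane combination."""
    return [
        f"{kit}_smpl{sample}_lane{lane}"
        for kit, sample, lane in product(kits, samples, lanes)
    ]


names = dataset_names()
print(len(names))  # 3 kits x 4 samples x 2 lanes
print(names[0])
```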

Which is Best?

Samples were genotyped with the Illumina OmniExpress Exome array and the results compared to the three WES platforms. 15,068 positions were common to the genotype array and WES. Globally, the Agilent and NimbleGen kits performed similarly, with the Illumina kit being significantly worse. The Epidermal Differentiation Complex (EDC) region includes 63 genes that are required for the normal development of the stratum corneum in skin. Within the EDC, NimbleGen has the best probe coverage, best 20x coverage and lowest disagreement with the genotyping array results (Table 1).
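The array comparison can be pictured as a genotype check at positions shared by both platforms. The sketch below is one plausible way to compute a disagreement percentage, not the pipeline used for the poster. It assumes genotypes are already normalised strings keyed by (chromosome, position), and the function name and toy calls are hypothetical.

```python
def array_disagreement(array_calls, wes_calls):
    """Compare genotypes at positions present in both call sets.

    Both arguments map (chromosome, position) -> genotype string, e.g. "0/1".
    Returns (number of shared positions, percentage that disagree).
    """
    shared = array_calls.keys() & wes_calls.keys()
    if not shared:
        raise ValueError("no positions in common")
    mismatches = sum(1 for key in shared if array_calls[key] != wes_calls[key])
    return len(shared), 100.0 * mismatches / len(shared)


# Toy data, invented for illustration only.
array_calls = {("1", 100): "0/1", ("1", 200): "1/1", ("1", 300): "0/0", ("2", 50): "0/1"}
wes_calls = {("1", 100): "0/1", ("1", 200): "0/1", ("1", 300): "0/0", ("3", 10): "1/1"}

n_shared, pct = array_disagreement(array_calls, wes_calls)
print(n_shared, round(pct, 1))
```

Restricting the comparison to shared positions mirrors the poster's use of the positions common to the array and WES; positions seen by only one platform are ignored rather than counted as disagreements.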

Acknowledgements

We thank the patients for providing the samples used in this study.

Exome Sequencing

Sequencing libraries are prepared as for a normal genomic DNA sample, but the DNA fragments are hybridised in solution to probes designed to enrich for coding regions in the genome. Non-coding DNA is washed away and captured fragments are eluted ready for sequencing on an Illumina HiSeq2000.

Quality Assessment

Agilent v4 shows the best base coverage: 30x (largest circle). Agilent v5 shows the worst duplicate rate: ~17%. Nextera and Agilent v5 have a poor on-target rate: <50%. NimbleGen shows a good on-target rate (~74%) and low duplicate reads (~4%).
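The duplicate and on-target percentages above are simple ratios of read counts. A minimal sketch follows, assuming both are expressed relative to all aligned reads, since the poster does not state the denominators. The `qc_metrics` name and the read counts are hypothetical, chosen only so that the output resembles the NimbleGen figures.

```python
def qc_metrics(total_reads, duplicate_reads, on_target_reads):
    """Return (duplicate %, on-target %) for one dataset.

    Assumption: both percentages use all aligned reads as the denominator.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    duplicate_pct = 100.0 * duplicate_reads / total_reads
    on_target_pct = 100.0 * on_target_reads / total_reads
    return duplicate_pct, on_target_pct


# Invented counts for illustration; not data from the study.
dup_pct, on_target_pct = qc_metrics(
    total_reads=1_000_000, duplicate_reads=40_000, on_target_reads=740_000
)
print(dup_pct, on_target_pct)
```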

Variant Calling Reproducibility

Technical reproducibility is high with Agilent and NimbleGen (~91%), but Nextera is worse: ~85%. Sample reproducibility is quite variable: 35-50%. Background noise is high.
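The poster does not define how percentage agreement between two call sets was calculated. One common choice, shown here purely as an assumption, is the number of shared variants divided by the number of variants seen in either set. The function name and the toy variants are hypothetical.

```python
def percent_agreement(calls_a, calls_b):
    """Shared variants as a percentage of all variants seen in either set.

    Each variant is any hashable key, here (chromosome, position, ref, alt).
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 100.0  # two empty call sets agree trivially
    return 100.0 * len(a & b) / len(union)


# Toy call sets for two lanes of one library; invented for illustration.
lane1 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
lane2 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "C")}

print(percent_agreement(lane1, lane2))
```

Under this definition, comparing two lanes of the same library gives a technical reproducibility figure, and comparing different samples gives a sample reproducibility figure.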

Sample Clustering

Samples cluster primarily by kit. Protocol has more impact on results than biology. Clusters are robust as determined by bootstrap scores (au > 95%).
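The clustering figure reports euclidean distance with complete linkage. The sketch below illustrates that combination on toy profiles using only the standard library. It does not reproduce the bootstrap (au) scoring, and the function name and the data points are assumptions for illustration.

```python
from math import dist


def complete_linkage(points):
    """Agglomerative clustering with euclidean distance and complete linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(
                    dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy profiles: indices 0-1 stand for one kit, 2-3 for another (invented values).
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for members_a, members_b, distance in complete_linkage(profiles):
    print(members_a, members_b, round(distance, 2))
```

In this toy case the two profiles from each kit merge first and the kits join last, which is the pattern the poster describes as clustering primarily by kit.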

Christian Cole1,2, J Ward1,2,3, M Lee2,3, D Ross2,3, N Wilson3, FJD Smith2, SJ Brown2, A Irvine4, WHI McLean2, GJ Barton1, and M Febrer3

1Computational Biology, College of Life Sciences, University of Dundee, UK; 2Centre for Dermatology and Genetic Medicine, College of Life Sciences and College of Medicine, Dentistry and Nursing, University of Dundee, UK; 3Genomic Sequencing Unit, College of Life Sciences, University of Dundee, UK; 4Department of Dermatology, Our Lady’s Children’s Hospital, Dublin, Ireland

NOT ALL EXOMES ARE EQUAL: A COMPARISON OF THREE KITS

Aim

Whole exome sequencing as a protocol is highly dependent on the probe design of the kit manufacturers. Here we present results from four human patient samples run against Illumina’s Nextera, Agilent’s SureSelect v5 and NimbleGen’s SeqCap v3 library preparation kits, sequenced across four lanes of a HiSeq2000. The data were processed through a variant calling pipeline based around the Genome Analysis ToolKit (GATK). A comparison is made with Illumina OmniExpressExome genotyping array data for validation. The significant differences found have particular relevance to dermatology-related studies, which are an important focus for DGEM in Dundee, but are also more generally applicable to exome sequencing.

[Figure: Probe coverage of the filaggrin gene on chromosome 1 for Illumina Nextera, NimbleGen SeqCap v3, Agilent SureSelect v4 and Agilent SureSelect v5. Common mutations in atopic eczema are marked: p.R501X, c.2285del4, p.R2447X and p.S3247X.]

[Figure: Variant calling reproducibility. Pairwise % agreement (scale 20-100) between datasets labelled by sample and lane (smpl1-4, lane1-2) for the Agilent v4, Agilent v5, Nextera and NimbleGen kits.]

[Figure: Quality assessment. On-target reads (%) plotted against duplicate reads (%) for Agilent v4, Agilent v5, Nextera and NimbleGen. Circle area = base coverage.]

[Figure: Sample clustering dendrogram of the Nextera, NimbleGen and Agilent datasets (labelled by kit, sample and lane) with au bootstrap values at each node. Bootstrap: 1000. Distance: euclidean. Clustering: complete.]

[Figure: Bar charts of count (+/− SE) by kit (Agilent, Illumina, NimbleGen). Panels: No. Agreeing Variants (count axis 0-9000) and No. Disagreeing Zygosity (count axis 0-750). Annotated p-values, in the order they appear: 0.048, 0.033, 0.0012, 6.7x10^-4, 0.028.]

Table 1

WES Kit    EDC Coverage  WES Variants  20x Coverage  Array Genotypes  WES Disagreement
Agilent    37%           376           82%           105              1.9%
Illumina   -             1011          46%           191              5.2%
NimbleGen  69%           669           92%           138              1.4%