Genotyping & Sequencing Technologies in Cancer Studies · • Next Generation Sequencing –...
Transcript of Genotyping & Sequencing Technologies in Cancer Studies · • Next Generation Sequencing –...
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genotyping & SequencingGenotyping & SequencingTechnologies in Cancer StudiesTechnologies in Cancer Studies
Jeanette Papp
© 2008 Jeanette Papp
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
www.genoseq.ucla.edu
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Core EquipmentCore Equipment• Roche GS FLX (454) Next Gen Sequencer• 3730 capillary sequencers• Taqman 7900 Real-time PCR instrument• CEQ 8000 sequencers• Roche LightCycler 480• PSQ96 Pyrosequencer• Liquid handling robots• PCR Machines
Human Genetics Department• Affymetrix• Illumina• Solexa
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genetic AssaysGenetic Assays• SNP Genotyping• DNA Methylation Analysis• Gene Expression• LOH (loss of heterozygosity)• siRNA/RNAi, microRNA• In/Del Analysis• Microsatellite Genotyping• Large Fragment Sizing• AFLP• RFLP
• BAC Fingerprinting• SAGE• HLA Typing• Conformation Analysis• Allele Quantification• Sequencing• Resequencing• Comparative Sequencing
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Maximize DNA QualityMaximize DNA QualityAll genotyping methods are sensitive to DNA…
• Quantity – Do you have sufficient DNA for your assay?Should you consider Whole Genome Amplification(WGA)? Advantages of WGA can be quantity andconsistency. Disadvantages – amplification bias, expense.
• Quality – What is the quality of your DNA? Highconcentrations of poor quality DNA will not help you.
• Consistency – Even if the quantity and quality areadequate, if they vary widely from sample to sample, youare liable to get poor results
Is your technology more robust to DNA quality or quantity?What is the dynamic range of detection?
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Sample CollectionSample Collection
• Where will you get your DNA?
• Blood or Buccal Swab / Saliva?• Quality• Consistency• Compliance
DNA / RNA CollectionDNA / RNA Collection
1 2 3 4 5 6 7
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Next Generation GeneticNext Generation GeneticTechnologiesTechnologies
Strategy
• Automation• Parallelization• Miniaturization
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genetic Marker MapsGenetic Marker Maps
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genetic MarkersGenetic Markers• Variable Number Tandem Repeats
(VNTRs)• Short Tandem Repeats
(STRs)• Dinucleotide Repeats (and tri- and tetra-)• CA Repeats• Microsatellites• Single Nucleotide Polymorphisms
(SNPs)
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genotyping Gel AutoradiographGenotyping Gel Autoradiograph
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Fluorescent Genotyping GelFluorescent Genotyping Gel
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Capillary SequencingCapillary Sequencing
• Background
•Methods• Applications
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
SNPsSNPs• Single Nucleotide Polymorphism• Responsible for 90% of all human genetic variation• A SNP occurs every 100-300 base pairs• Currently almost 12 million SNPs in the NCBI SNP database• May be within genes (coding SNP, cSNP) or outside gene (non-
coding, the majority)• May cause amino acid changes or not. If it causes an amino acid
change it is called non-synonymous (nsSNP)• Most SNPs are not responsible for a disease.• Like microsatellites, they are used as markers for pinpointing a
disease on the genome map. SNPs make particularly goodmarkers because- They occur frequently throughout the genome.- They are older and more stable genetically.
SNP PlatformsSNP PlatformsStudy SizeStudy Size
• Pyrosequencing – few SNPs, few samples, low-throughput,labor intensive
• Taqman – fewer SNPs, fewer samples, moderate-throughput• SNPlex – moderate numbers of SNPs and samples, high-
throughput (hundreds of thousands per day)• Sequenom – moderate numbers of SNPs and samples,high-
throughput (hundreds of thousands per day)• Affymetrix – many SNPs, many samples, ultra-high-
throughput (millions per day)• Illumina – many SNPs, many samples, ultra-high-throughput
(millions per day)• Next Generation Sequencing – massively parallel,
ultra-ultra-high-throughput
SNP PlatformsSNP PlatformsTechnologyTechnology
• Pyrosequencing – Sequencing by synthesis. Addition ofdNTPs one at a time. Luciferase generates light whennucleotides incorporated. Gives you the neighboringsequence. Labor intensive.
• Taqman – Allelic discrimination. Fluorescent probe for eachSNP variant. Simple, robust chemistry.
• SNPlex – Multiplexed, fluorescently labeled oligonucleotideson capillary electrophoresis.
• Sequenom – Mass Spectrometry. Highly accurate andreproducible.
• Affymetrix – Microarray of probe DNA spots printed on a glass or plastic chip.
• Illumina – Microarray of tiny beads with boundoligonucleotides. Sample DNA binds to the bead oligo, and is detected by an optical fiber.
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
InstrumentsInstruments
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
InstrumentsInstruments
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
PyrosequencingPyrosequencing
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Allelic DiscriminationAllelic Discrimination
SNPlex SNPlex Chemistry OverviewChemistry OverviewEncoding
Generation of genotype (GT) specificproducts through multiplex
oligonucleotide ligation reaction (OLA)
AmplificationMultiplex PCR with universal primers
DecodingHybridization of universal ZipChute
probes to amplicons and identificationof eluted ZipChutes by CE
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
LigationLigation
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Enzymatic PurificationEnzymatic Purification
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
PCRPCR
Hybridization and ElectrophoresisHybridization and Electrophoresis
SNPlex Raw DataSNPlex Raw Data
SNPlex Raw DataSNPlex Raw Datafrom the Corefrom the Core
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
ExamplesExamples
Good
Good Bad
Bad
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
ExamplesExamples
Bad
SNP MicroArraySNP MicroArray
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
SNP BeadArraySNP BeadArray
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
SNP PlatformsSNP PlatformsCostCost
• Affymetrix – $0.0006-??
• Illumina – $0.0002-??
• Sequenom – $0.05-$0.20
• SNPlex – $0.08-$0.14
• Taqman – $0.20-$0.70
• Pyrosequencing – $0.35-$1.60
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
When Choosing a TechnologyWhen Choosing a TechnologyPlatform, Consider:Platform, Consider:
• Conversion rate – How many of the SNPs youchose worked - in silico, in vitro?
• Reproducibility – Do you get the same result whenyou repeat the whole process?
• Error Rate – Compared to “true” result.
• Concordance – Agreement with some othermethod.
• Call Rate – How much missing data?
• Cost – What can you afford?
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Aarray Aarray DataDataChip-based GenomicsChip-based Genomics
•• Each spot represents one genetic markerEach spot represents one genetic marker
•• New generation chips holdNew generation chips hold 2.3 million 2.3 million SNPsSNPs
•• To find genes for common, complex traits To find genes for common, complex traits ititmay require DNA from 2000 individualsmay require DNA from 2000 individuals
4,600,000,000 4,600,000,000 SNP datasetSNP dataset
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Using High-Resolution MeltingUsing High-Resolution Meltingon the Roche on the Roche LightCycler LightCycler toto
Determine Determine CpG-siteCpG-siteMethylation Methylation statusstatus
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Bisulfite Bisulfite ConversionConversion• Methyl-C and C are indistinguishable to
DNA polymerase• Methylation state is lost in PCR
amplification• Bisulfite treatment converts unmethylated
cytosine to uracil, which becomes thymineafter PCR
• Methylated cytosines are protected frombisulfite and thus unchanged
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Principle of High ResolutionPrinciple of High ResolutionMeltingMelting
• C to T proportion significantly changes the meltingtemperature of the product
• Degree of DNA methylation gives different melting profiles
• The high resolution dye from Roche intercalates and saturatesevenly giving a sharp, precise melting profile
HRM Dye vs. SYBRGreen
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Example of High ResolutionExample of High ResolutionMelting curves on Roche LC480Melting curves on Roche LC480
fluorescence
Temperature
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Analysis using Tm CallingAnalysis using Tm Calling• Significantly different melting temperatures of the two species
allows quantitative analysis using the Tm calling
• By comparing the area of each peak relative to the sum of bothpeaks a quantitative percentage can be obtained
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Analysis using Analysis using GeneGene Scan Scan
Method ComparisonsMethod Comparisons
Optimizationdependent
Accurate, scalable,low cost
High precision
One tube method
QuantitativeMethylationSensitive HighResolution Melting
High setup costHigh accuracy.
Can be scaled up
QuantitativeMethyLight
Labor intensive.
3rd primer necessaryfor extra specificity
Sequence contextQuantitativePyrosequencing
Labor intensive, highper sample cost.
Time consuming
Sequence context.Gives preciseinformation for entiresequence
Qualitative/semi-quantitative
BisulfiteSequencing
Extensive PCRoptimization
Low cost.
Variable precision
QualitativeMethylationSpecific PCR
ConsProsQuantitative/
Qualitative
Technique
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Sequencing TechnologiesSequencing Technologies
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
A Brief History of SequencingA Brief History of Sequencing……
• A adenine
• C cytosine
• T thymine
• G guanine
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Sanger SequencingSanger SequencingChain TerminationChain Termination
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Sequencing Gel AutoradiographSequencing Gel Autoradiograph
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Dye TerminationDye Termination
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Fluorescent GelFluorescent Gel
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
DNA SequencingDNA Sequencing
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Capillary SequencingCapillary Sequencing
• Background
•Methods• Applications
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Human Genome ProjectHuman Genome Project
3 Billion Base Pairs
• 1990 - Begin: estimate 15 years, $3 billion• 1998 - Craig Venter starts new companyCelera, “will sequence human genome in 3years for $300 million”
• 2000 - Working draft• 2003 - Complete in 13 years, $2.7 billion
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
NIH $1000 Genome InitiativeNIH $1000 Genome InitiativeRELEASE DATE: February 12, 2004:
The purpose of this Request for Applications (RFA) is to solicitgrant applications to develop novel technologies that willenable extremely low-cost genomic DNA sequencing. Currenttechnologies are able to produce the sequence of amammalian-sized genome of the desired data quality for $10to $50 million; the goal of this initiative is to reduce costs byat least four orders of magnitude, so that a mammalian-sizedgenome could be sequenced for approximately $1000.Substantial fundamental research is needed to develop thescientific and technological knowledge underpinning such amajor advance. Therefore, it is anticipated that the realizationof the goals of this RFA is a long-range effort that is likely torequire as much as ten years to achieve.
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Next Generation SequencingNext Generation SequencingTechnologiesTechnologies
Strategy
• Automation• Parallelization• Miniaturization
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Commercially Available NowCommercially Available Now• 454 (now Roche GS FLX) - 2005
~300 base pair reads. Good for novel genomes 100 million base pairs in 8 hour run
• Solexa - 2006 Short reads, ~ 23-35 bp. Best when there is a reference
sequence (“resequencing”) or for analyzing small moleculeslike RNA.
4 gigabases per 3 day run - the equivalent of one-third of thehuman genome
• SOLiD - 2007 Short reads, ~35 bp 3 gigabase per 3 day run
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
TechnologyTechnology• 454 (now Roche GS FLX)
DNA is fragmented, bound to beads, amplified till each bead has100,00 copies of the DNA fragment
Pyrosequencing - When a base is added to the DNA strandpyrophosphate is released, used as a substrate for luciferase, lightis emitted and detected by a camera
• Solexa Sequencing by synthesis with fluorescently labeled bases and
DNA fragments bound to a slide.
• SOLiD Sequencing by ligation. Bead-bound DNA molecule is interrogated
with each of the 16 possible 2 base pair combinations in afluorescently labeled oligonucleotide.
Next Gen SequencerNext Gen SequencerRoche 454 GS FLXRoche 454 GS FLX
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Current CostCurrent Cost
• ~$100,000 for research
• $350,000 from Knome personalgenomics company
Cost to Sequence Human GenomeCost to Sequence Human Genome
$1000Coming Next???
$60,000Applied BiosystemsSOLiD
< $100,000Illumina Solexa
$100,000Roche GS FLX (454)
$3 millionCurrent CapillarySequencer
$2.7 billionHuman GenomeProject
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Coming NextComing Next…… Single Molecule Sequencing -Single Molecule Sequencing -without amplificationwithout amplificationNanoporeA single-stranded DNA molecule is driven through a pore so tinythat it partially blocks the pore as it moves through. Each baseblocking the pore has different electrical properties and is read offas it passes through the pore.
Nano-knife edgeA single-stranded DNA molecule is stretched and immobilized overa nano-edge probe. Each base in turn is excited by electrons, andthe base’s unique vibration pattern is read.
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
ApplicationsApplications
As the introduction of the personalcomputer created new applications incomputing, whole genome sequencingtechnology has generated newapplications
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Whole Genome SequencingWhole Genome Sequencing
De novo sequence of novel genomes
• Mycobacterium tuberculosis
• Vibrio cholerae
• Streptococcus pneumoniae
• Haemophilus influenzae
• Helicobacter pylori
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Genomic Diversity andGenomic Diversity andMetagenomicsMetagenomics
Microbial diversity in
• Human microbiome• Microbes in honey bee colony
collapse disorder• Deep sea microenvironments• Deep mine microbial ecology• Environmental sampling
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Comparative Genomics,Comparative Genomics,PaleogenomicsPaleogenomics,,
Ancient DNA AnalysisAncient DNA Analysis• Neanderthal Genome
• Mammoth
• Ancient wolves
• Mitochondria from ancient hair shafts
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Chromosome StructureChromosome Structure• Deletions
• Duplications
• Copy number variation
• Insertions
• Inversions
• Translocations
• Methylation Analysis
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Transcriptome Transcriptome AnalysisAnalysis
• Use massively parallel sequencing toquantitatively measure gene expressionacross tens of thousands of samples
• Develop a genomic signature of
Cell differentiation state Disease state Therapeutic response Tumor signature
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
Deep Deep ResequencingResequencing
Sequencing against a reference genome
• Identify rare disease-causing variants
• Cancer, HIV mutation detection
Research Field Application Method
De Novo
Sequencing
Whole Genomes of Plants, Yeasts,
Fungi, Bacteria, Viruses
Shotgun Sequencing,
Paired-End Sequencing, De Novo Assembly
Whole Genomes of Humans, Plants,
Yeasts, Fungi, Bacteria, Viruses
Shotgun Sequencing,
Mapping to Reference
Sequence
Genomic rearrangements, Copy
number variation Paired-End Sequencing
Resequencing
Indels, SNPs, Somatic Mutations Amplicon Sequencing
Full-length transcripts
Multiplex Sequencing of Paired-End Ditags (Singapore MS-PET)
Transcriptome
Analysis
Serial Analysis of Gene Expression
(SAGE)
Small non-coding RNAs Gene Regulation
Studies Transcription Factor Binding Sites (ChIP-Sequencing)
cDNA fragment sequencing
DNA Methylation Pattern Amplicon Sequencing Epigenetic
Changes Nucleosome Modifications DNA fragment sequencing
Analysis of Environmental DNA Shotgun Sequencing Metagenomics &
Microbial
Diversity 16S rRNA Amplicon Sequencing
Paleogenomics Whole Genome Sequencing of Ancient
DNA Shotgun sequencing
Epidemiology 243 - Molecular Epidemiology of Cancer May 13, 2008
DiagnosticsDiagnostics
• $1000 genome will allow individualsequence as diagnostic. Will it come toosoon to be useful?
• In genomic research technology anddata have often come before our abilityto extract information and knowledgefrom them.