ENCODE 2012. The Human Genome project sequenced “the human genome” “the human genome” that...

• The Human Genome Project sequenced “the human genome”

• “The human genome” that we have labeled as such doesn’t actually exist

• What we call the human genome sequence is really just a reference

• Furthermore, the current reference genome sequence is haploid

Whose genome did Celera sequence?

Supposedly: African-American, Asian-Chinese, Hispanic-Mexican, Caucasian, Caucasian

Actually: Celera’s genome is Craig Venter’s

Science v. 291, pp 1304-1351

• Every time a cell divides, new mutations arise; no two cells, even within a single individual, have an identical sequence.

ENCODE

• The Encyclopedia of DNA Elements (ENCODE) is a public research consortium initiated by the US National Human Genome Research Institute (NHGRI) in September 2003.

• The goal is to find all functional elements in the human genome.

• All data generated in the course of the project will be released “rapidly” into public databases.

• Pilot phase, 2003-2007
– method evaluation
– 1% of genome

• Production phase, 2007-2012
– September 2012: 30 papers published
– 442 scientists
– 31 labs
– 147 different types of cells with 24 types of experiments
– 1,642 experiments
– Data released

• Identification and quantification of RNA species in cells and subcellular compartments

• Mapping of noncoding and protein-coding genes

• Delineation of chromatin and DNA accessibility

• Mapping of histone modifications and transcription factor-binding sites

• Measurement of DNA methylation

Credits: Darryl Leja (NHGRI), Ian Dunham (EBI)

What did they find?

• Controversy!

• Assigned biochemical functions to over 80% of the genome.

• Junk DNA or no?

• What is a biochemical function?

• “a reproducible biochemical signature”

• “millions of switches”

• The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type.

• Primate-specific elements as well as elements without detectable mammalian constraint show, in aggregate, evidence of negative selection; thus, some of them are expected to be functional.

• Classifying the genome into seven chromatin states indicates an initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features, as well as hundreds of thousands of quiescent regions.

• It is possible to correlate quantitatively RNA sequence production and processing with both chromatin marks and transcription factor binding at promoters, indicating that promoter functionality can explain most of the variation in RNA expression.
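
A figure like the 80.4% above is, in spirit, the share of bases covered by at least one annotated interval once all assays and cell types are pooled. The following is a minimal sketch of that arithmetic on toy data, not the consortium's actual pipeline. The assay names, coordinates, the 100-base "genome", and the function name `covered_fraction` are all made up for illustration, and intervals are assumed to be half-open `[start, end)`.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of positions covered by at least one half-open interval [start, end)."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: bank its length, start a new block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy annotations from two hypothetical assays on a 100-base "genome".
toy_assays = {
    "rna": [(0, 30), (50, 60)],
    "chromatin": [(20, 45), (55, 70)],
}
all_intervals = [iv for ivs in toy_assays.values() for iv in ivs]
fraction = covered_fraction(all_intervals, genome_length=100)
print(f"{fraction:.1%}")  # 65.0%
```

Overlapping intervals are merged before counting, so a base hit by several assays is counted once. That is why pooling many assays and cell types can only raise the covered fraction.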

• Many non-coding variants in individual genome sequences lie in ENCODE-annotated functional regions; this number is at least as large as those that lie in protein-coding genes.

• Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched within non-coding functional elements, with a majority residing in or near ENCODE-defined regions that are outside of protein-coding genes. In many cases, the disease phenotypes can be associated with a specific cell type or transcription factor.
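
The first step behind a statement like "SNPs lie in annotated regions" is a position-in-interval lookup. The sketch below counts how many toy SNP positions fall inside a set of regions. Everything in it is hypothetical (the coordinates, the name `in_any_region`), and it assumes the regions are sorted, non-overlapping, and half-open `[start, end)`. A real enrichment analysis would also compare the count against a background expectation, which is not shown here.

```python
from bisect import bisect_right


def in_any_region(position, starts, ends):
    """True if position falls in one of the sorted, non-overlapping regions [start, end)."""
    # Index of the last region whose start is <= position (or -1 if none).
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < ends[i]


# Hypothetical toy data: regions are sorted and non-overlapping.
regions = [(100, 200), (500, 650), (900, 1000)]
snp_positions = [150, 300, 510, 649, 650, 950]

starts = [s for s, _ in regions]
ends = [e for _, e in regions]
hits = [p for p in snp_positions if in_any_region(p, starts, ends)]
fraction_in_regions = len(hits) / len(snp_positions)
print(hits, f"{fraction_in_regions:.2f}")
```

Binary search keeps each lookup logarithmic in the number of regions, which matters once the region list runs to hundreds of thousands of entries.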

Changing how we view a gene?

• Genes should be defined by transcripts.

• Transcripts are the basic unit that’s affected by mutation and selection.

• A “gene” then becomes a collection of transcripts, united by some common factor.
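
One way to picture the transcript-centric view is as a data structure in which the transcript is the primary record and a "gene" is only a grouping over transcripts. The sketch below assumes the "common factor" can be reduced to a single label per transcript, here a hypothetical `shared_key` field. The slides do not say what the factor is, so the field, the identifiers, and the coordinates are purely illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    shared_key: str  # hypothetical stand-in for the "common factor"
    start: int
    end: int


def group_into_genes(transcripts):
    """Group transcripts by their shared key; each group plays the role of a 'gene'."""
    genes = defaultdict(list)
    for t in transcripts:
        genes[t.shared_key].append(t)
    return dict(genes)


# Made-up transcripts for illustration only.
toy_transcripts = [
    Transcript("tx1", "locusA", 100, 900),
    Transcript("tx2", "locusA", 150, 900),
    Transcript("tx3", "locusB", 2000, 2600),
]
genes = group_into_genes(toy_transcripts)
for key, members in genes.items():
    print(key, [t.transcript_id for t in members])
```

Changing the grouping rule changes what counts as a gene without touching the transcript records, which is the point of treating transcripts as the basic unit.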

• Another related challenge is understanding the genome’s three-dimensional shape. Far from being arranged in a line, chromosomes are folded in fantastically complicated fractal patterns, and these topographies appear to shape network interaction.

• “Every gene is surrounded by an ocean of regulatory elements. They’re everywhere. There are only 25,000 genes, and probably more than 1 million regulatory elements,” said Job Dekker, a molecular biophysicist at the University of Massachusetts Medical School who worked on ENCODE’s structural descriptions of the genome.

• He continued, “It’s not just one gene touching one regulator. It can touch and interact with a whole collection of them. It must involve a very complicated three-dimensional structure. At this scale, chromosome topography turns out to be incredibly dynamic, complex and cell type-specific.”