Global Analysis of Functional Units of Plant Chromosomes:



DNA Replication, Domain Structure, and Transcription

PI: Bill Thompson (a); Co-PIs: George Allen (b), Linda Hanley-Bowdoin (c), Doreen Main (d), Rob Martienssen (e), Bryon Sosinski (f), Matthew Vaughn (e)

(a) Plant Biology, Genetics, and Crop Science, NC State University, Raleigh, NC 27695; (b) Horticultural Science, NC State University; (c) Biochemistry and Genetics, NC State University; (d) Horticulture & Landscape Architecture, Washington State University; (e) Cold Spring Harbor Laboratory, New York; (f) Horticultural Science, NC State University

Analysis of DNA Replication Timing on Arabidopsis Chromosome 4

In this project we are identifying and characterizing regions of the genome that are replicated at different times during S phase. We have developed a FACS-based procedure, combined with BrdU pulse-labeling and immunoprecipitation, to analyze replication timing in an asynchronous cell population. This approach is intended to define functional domains of chromatin by determining their preferred time of replication.

Figure 1 shows a chromosome-wide view of genomic features and DNA replication in early, mid and late S phase for Arabidopsis chr 4. Panel A shows gene coverage (orange line) and TE coverage (purple line). Panel B shows GC percentage, calculated in 1-kb non-overlapping windows. Panel C shows a schematic representation of chr 4, omitting the telomeres and NOR: the gene-rich euchromatic distal short and distal long arms are shaded light gray, the heterochromatic knob and pericentromere are shaded black, and the proximal portions of both arms, which have intermediate characteristics, are shaded dark gray. Panel D shows replication profiles, expressed as the log2 ratio of BrdU incorporation in early (blue), mid (green) or late (red) S-phase cells relative to total DNA from the same cells. Gene and TE coverage, GC percentage, and replication profiles are loess-smoothed using a 150-kb window.

Early replication is most prevalent in the distal long arm, a predominantly euchromatic region. Late replication predominates in the heterochromatic knob and pericentromere, and additional regions of late replication are dispersed through other portions of the chromosome, especially the centromere-proximal portions of the long and short arms. A striking feature of these data is that the early and late replication profiles in panel D are strongly complementary (R = -0.83), while the early and mid S-phase profiles are very similar to each other (R = 0.87). The most evident difference between the early and mid S profiles is a broadening and merging of early-replicating regions in mid S. In other words, the DNA replicating in mid S phase represents nearly the same population of sequences as that replicating in early S phase. The mid S-phase profile is also distinct from the late profile (R = -0.85). These data indicate that replication in Arabidopsis is basically a two-phase process. The similarity of the early and mid S profiles is best explained by assuming considerable heterogeneity in the order of replication in different cells.

Figure 2 displays a schematic representation of replicons, replication timing and replication domains for chr 4. In the top panel, each vertical bar represents a replicon, with the width of the bar proportional to the length of the replicon. Subdivisions within the bar indicate the percentage of probes within the replicon with a given replication time. The middle panel illustrates the clustering of replicons with similar timing into replication domains. The lower panel is a cartoon of the major regions of the chromosome, as in panel C of Figure 1. The complexity of replication timing within many replicons likely reflects several factors, including the time and efficiency of origin firing, the number of origins within initiation zones, and the speed of elongation by DNA polymerase in specific contexts. Many of the replication domains we found on Arabidopsis chr 4 are considerably smaller than those observed in mammalian cells. However, several larger replication domains do occur, including a 4.5-Mb late replication domain at coordinates 2.6-7.1 Mb and a 2.3-Mb early/mid replication domain at coordinates 16.2-18.5 Mb.

Figure 1. Landscape of Arabidopsis chr 4 with replication timing profiles for early, mid, and late S-phase cells (panels A-D).

Figure 2. Replication timing and replicon structure on Arabidopsis chr 4. Legend categories: Early; Early and Mid; Mid; Mid and Late; Late; Early and Late; Early, Mid and Late; Indeterminate.
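The comparison of early, mid and late profiles in this section rests on two simple quantities: a per-probe log2 ratio of BrdU-labeled signal to total DNA, and a correlation coefficient between two profiles. The sketch below illustrates both on invented toy intensities. The function names, the pseudocount, and the use of Pearson's R are assumptions made for illustration; the poster does not state which correlation statistic was used, and the loess smoothing step is not reproduced here.

```python
import math

def log2_ratio(labeled, total, pseudocount=1.0):
    """Per-probe log2(labeled / total).

    The pseudocount is an assumption added here to avoid log(0);
    it is not taken from the poster.
    """
    return [math.log2((l + pseudocount) / (t + pseudocount))
            for l, t in zip(labeled, total)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical probe intensities along a short stretch of chromosome.
total = [100, 100, 100, 100, 100, 100]
early = [400, 300, 200, 100, 50, 25]
mid = [350, 320, 220, 110, 60, 30]
late = [25, 50, 100, 200, 300, 400]

early_profile = log2_ratio(early, total)
mid_profile = log2_ratio(mid, total)
late_profile = log2_ratio(late, total)

r_early_mid = pearson_r(early_profile, mid_profile)
r_early_late = pearson_r(early_profile, late_profile)
print(f"early vs mid:  R = {r_early_mid:.2f}")
print(f"early vs late: R = {r_early_late:.2f}")
```

In this toy case the early and mid profiles correlate positively and the early and late profiles correlate negatively, which is the qualitative pattern reported for chr 4.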

DNA Replication Timing in Rice

We are optimizing the essential conditions for analysis of DNA replication timing on rice chr 4L and 10L using a rice cell culture (cultivar Nipponbare). All the technologies we developed for Arabidopsis replication timing will be applied to this rice work under the optimized conditions. Panel A: The highest BrdU incorporation was observed 12 to 16 hours after 7-day-old cultures were supplemented with fresh medium (“7 day split”). Panel B: Analytical FACS profile of rice nuclei isolated from cells that were pulse-labeled for 1 hour, starting 16 hours after the 7-day split. Gates are defined for cell populations in G1, S and G2/M phases.

Mapping of Short Nascent Strands

DNA replication is a strictly regulated process that preserves the genetic information necessary for future generations. Despite its importance, very little is known about the regulation of DNA replication in higher eukaryotes. Our goal is to define where DNA replication originates in the Arabidopsis thaliana genome. We are using newly synthesized leading strands (short nascent strands, SNS), which are thought to be initiated at origins. The SNS are being analyzed on a custom-designed NimbleGen tiling array that covers all of chromosome IV.

We have developed two techniques to isolate SNS. In the first (A), we isolate SNS by size on an alkaline sucrose gradient. The collected DNA between 1 and 3 kb (which includes SNS) is then amplified, labeled and hybridized to the array. The second technique (B) was developed to enrich and purify SNS. During DNA synthesis, primase lays down a short RNA primer that DNA polymerase then extends. To isolate SNS, we use lambda exonuclease, which digests unprotected DNA but cannot digest ssDNA that carries an RNA primer at its 5′ end. This allows us to recover newly synthesized DNA that lies close to the origin. The recovered SNS are then amplified, labeled and hybridized to the array.

(C) Comparison of preliminary SNS data from the two procedures. The data show that 62% of the peaks found with the lambda exonuclease method are shared with the size-selection method. While this suggests that we have recovered putative SNS, more analysis is needed to confirm our results.
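A shared-peak percentage like the 62% quoted for the two SNS procedures can be computed by asking, for each peak from one method, whether it overlaps any peak from the other. The following is a minimal sketch under stated assumptions: peaks are half-open (start, end) intervals on one chromosome, any overlap of at least 1 bp counts as shared, and the coordinates are invented. The poster does not describe the overlap criterion that was actually used.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]

def fraction_shared(query_peaks, reference_peaks):
    """Fraction of query peaks that overlap at least one reference peak."""
    if not query_peaks:
        return 0.0
    shared = sum(
        1 for q in query_peaks
        if any(overlaps(q, r) for r in reference_peaks)
    )
    return shared / len(query_peaks)

# Hypothetical peak calls (bp coordinates); not data from the poster.
lambda_exo_peaks = [(100, 900), (5000, 5600), (12000, 12800), (20000, 20500)]
size_selected_peaks = [(500, 1500), (5500, 6000), (30000, 31000)]

shared = fraction_shared(lambda_exo_peaks, size_selected_peaks)
print(f"{shared:.0%} of lambda exonuclease peaks overlap a size-selected peak")
```

The all-pairs scan is quadratic in the number of peaks, which is acceptable for a few thousand peaks on a single chromosome; a sorted sweep or interval tree would be the usual choice for larger sets.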

Profiling Histone Modifications

During the last year we have profiled histone modifications, RNA Pol II occupancy and gene expression patterns in cell suspension culture (samples Cells4 and Cells7, taken at 96 h and 16 h after culture split, respectively) and in young rosette leaf samples from wild-type Col plants using Illumina GA2 high-throughput DNA sequencers. We used antibodies against histone H3K4me2, H3K4me3, H3K9me2, H3K27me2, H3K27me3, H3K14ac and H3K56ac, as well as antibodies against RNA Pol II, to immunoprecipitate DNA associated with these histone modifications or with the initiating or elongating forms of RNA Pol II. This generated, on average, over 1.5x genome coverage per sequenced ChIP library. We have also sequenced over 2M mRNA fragments (40-50 nt in length) in a strand-specific manner from the same samples. In addition, we have sequenced between 1.7M and 3M small RNAs from the leaf, Cells4 and Cells7 samples.

We are currently focusing on analysis of the ChIP-seq and RNA-seq data in the context of DNA replication timing. Genomic regions that are significantly enriched by immunoprecipitation relative to the input control DNA are being identified computationally in 100-bp windows. The windows showing enrichment will be displayed as intensity maps in the Generic Genome Browser 2 environment.
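The windowed comparison of ChIP signal against input DNA described in this section can be sketched as follows. This is only an illustration: the read positions are invented, and a library-size-scaled fold-change threshold stands in for the significance test, because the poster does not say which statistic is used to call a 100-bp window enriched. The names `window_counts`, `enriched_windows`, `min_fold` and `pseudocount` are hypothetical.

```python
from collections import Counter

WINDOW = 100  # window size in bp, as stated in the text

def window_counts(read_starts, window=WINDOW):
    """Count reads per non-overlapping window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)

def enriched_windows(chip_starts, input_starts, min_fold=2.0,
                     pseudocount=1.0, window=WINDOW):
    """Return (start, end, fold) for windows where ChIP exceeds input.

    ChIP counts are scaled to the input library size. The fold-change
    cutoff and pseudocount are assumptions standing in for a proper
    statistical test.
    """
    if not chip_starts or not input_starts:
        return []
    chip = window_counts(chip_starts, window)
    ctrl = window_counts(input_starts, window)
    scale = len(input_starts) / len(chip_starts)
    hits = []
    for w in sorted(chip):
        fold = (chip[w] * scale + pseudocount) / (ctrl[w] + pseudocount)
        if fold >= min_fold:
            hits.append((w * window, (w + 1) * window, fold))
    return hits

# Hypothetical read start positions (bp).
chip_reads = [105, 110, 120, 130, 150, 160, 170, 180, 305, 510]
input_reads = [20, 115, 210, 310, 410, 515, 610, 710, 810, 910]

hits = enriched_windows(chip_reads, input_reads)
for start, end, fold in hits:
    print(f"window {start}-{end}: fold enrichment {fold:.1f}")
```

Each returned tuple maps directly onto a feature with a score, which is the kind of record a genome browser track would display as an intensity.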

Validation of ORC2 and MCM5 ChIP

(A) Proteins from the final ChIP input fraction (40 µg, lane 1), whole cell extract (40 µg WCE, lane 2) and volume equivalents from the non-chromatin-associated (S, lane 3) and chromatin-bound (P, lane 4) fractions were resolved by SDS-PAGE, and the blots were probed with the indicated antibodies. (B, C) Chromatin was immunoprecipitated with the indicated quantity of anti-ORC2 serum (B) or anti-MCM5 serum (C). Proteins remaining in the supernatant (depletion) and in the immunoprecipitated (enrichment) fractions were resolved by SDS-PAGE, and the blots were probed with anti-ORC2 serum (B) or anti-MCM5 serum (C). (D) Sheared chromatin. Control (lane 2) and sonicated (lane 3) DNA were resolved and visualized with ethidium bromide.

ORC2 binding sites in Arabidopsis Ch4 euchromatin often map with early replication timing peaks

The plot shows the loess-smoothed early S replication profile. Vertical lines indicate the termination zones for putative replicons. ORC2-binding sites are marked by orange spheres. The vertical position of each ORC2 site follows the replication profile and does not indicate ORC2 enrichment.

Preliminary analysis of ORC2 binding sites

The experiment included 3 biological replicates, each with an IP technical replicate, for a total of 6 ORC2 ChIP samples. The samples were hybridized to NimbleGen chr 4 tiling arrays. The raw data were loess-normalized and scaled in limma, and peak finding was performed with NimbleScan using an 800-bp window. This analysis identified 563 putative ORC2-binding sites at an FDR ≤ 0.05, of which 289 were in constitutive heterochromatin. Euchromatic ORC2-binding sites tend to be intergenic: for example, the distal long arm is only ~22% intergenic, but 56% of the ORC2-binding sites in this region are intergenic. ORC2-binding sites are also AT-rich (68% AT for ORC2-binding sites compared to 64% for sites not binding ORC2).
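The figures quoted for ORC2 can be turned into two quick derived numbers: how many sites fall outside constitutive heterochromatin, and how over-represented intergenic sequence is among distal long arm sites. The short calculation below uses only values stated in the text; the variable names are illustrative, and no significance test is attempted because the number of sites in the distal long arm is not given.

```python
# Values taken from the text.
total_sites = 563            # putative ORC2-binding sites at FDR <= 0.05
heterochromatic_sites = 289  # sites in constitutive heterochromatin

# Sites outside constitutive heterochromatin.
other_sites = total_sites - heterochromatic_sites
heterochromatic_fraction = heterochromatic_sites / total_sites

# Distal long arm: share of the region that is intergenic (approximate)
# versus the share of ORC2 sites there that are intergenic.
intergenic_share_of_region = 0.22
intergenic_share_of_sites = 0.56
intergenic_fold = intergenic_share_of_sites / intergenic_share_of_region

print(f"sites outside constitutive heterochromatin: {other_sites}")
print(f"heterochromatic fraction: {heterochromatic_fraction:.1%}")
print(f"intergenic over-representation: {intergenic_fold:.1f}-fold")
```

Because the 22% figure is approximate, the fold value should be read as a rough indication of about 2.5-fold over-representation rather than a precise estimate.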

Mapping Nuclear Matrix Attachment Regions (MARs)

Genomic DNA is packaged and organized within the nucleus by histones. When the histones are extracted, the DNA forms large loops (nuclear haloes), which remain bound through matrix attachment regions, or MARs, to a substructure composed of RNA and protein called the nuclear matrix. While the biological significance of MARs remains largely unknown, several studies have suggested that MARs may function as origins of DNA replication in higher eukaryotes. We used lithium diiodosalicylate (LIS) to extract the histones from A. thaliana nuclei and produce nuclear haloes, which were then digested with EcoRI and HindIII. Matrix-associated DNA was separated from unbound DNA by low-speed centrifugation. The MAR DNA was then amplified and labeled for microarray analysis.

Preliminary Mapping Results

We have carried out four array hybridization experiments, representing two biological replicates and two technical replicates, using our custom-designed NimbleGen tiling array for chromosome IV. Our “first pass” analysis used a combination of limma and NimbleScan to resolve 933 putative MARs at an estimated FDR < 0.05. The median length of the putative MARs from this analysis is 800 bp. Panel A shows that the median AT content of the putative MARs (histogram) is 71%, in contrast to a median AT content of 63% along chromosome IV (red curve). These data are consistent with earlier studies showing that MARs are AT-rich. The distance between neighboring MARs can be used to estimate loop size. Panel B depicts the uneven spacing of MARs, which encompasses a range of loop sizes across chromosome IV. The frequency of MAR spacing shows a unimodal distribution. The largest peak contained MARs with spacing that ranged from 42 bp to 265 kb, with an average loop size of 19 kb and a median loop domain size of 10 kb.
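The loop-size estimate described in this section, the distance between neighboring MARs, can be sketched in a few lines. The coordinates below are invented for illustration, and measuring each gap from the end of one MAR to the start of the next is an assumption; the poster does not say whether spacing was measured edge to edge or between midpoints.

```python
from statistics import mean, median

def loop_sizes(mars):
    """Gaps (bp) between consecutive MARs given as (start, end) tuples.

    Measured from the end of one MAR to the start of the next;
    overlapping or abutting MARs contribute no loop.
    """
    ordered = sorted(mars)
    return [nxt[0] - cur[1]
            for cur, nxt in zip(ordered, ordered[1:])
            if nxt[0] > cur[1]]

# Hypothetical MAR coordinates on one chromosome (bp), each 800 bp long.
mars = [(1000, 1800), (6800, 7600), (9600, 10400), (40400, 41200)]

sizes = loop_sizes(mars)
print("loop sizes (bp):", sizes)
print(f"mean loop size:   {mean(sizes):.0f} bp")
print(f"median loop size: {median(sizes):.0f} bp")
```

A few long gaps pull the mean above the median, which is the same direction of skew as the reported 19-kb average and 10-kb median.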