Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome
ECS289A Presentation
By Hua Chen
2003-3-3
Background Knowledge
A significant characteristic of cis-regulatory sites: the binding sites for different transcription factors tend to cluster together in one region around the gene, forming Cis-Regulatory Modules (CRMs).
Searching for individual cis-regulatory sites yields too many candidate positions, which makes it difficult to tell the true ones from the false ones;
This clustering property of CRMs provides a feasible method to identify the cis-regulatory sites in the genome.
One example of a CRM in Drosophila: the eve gene
Targets:
Use the clustering of binding sites in cis-regulatory modules as a method to identify functional motifs;
Test the method with some known real CRM regions;
Search the genome to discover CRMs and confirm the results by experiments.
The System Investigated:
The early Drosophila embryo.
Five transcription factors are investigated: Bcd, Cad, Hb, Kr and Kni.
Methods:
Collect transcription factor binding sequences from previous laboratory work and align them;
Construct Position Weight Matrices (PWMs) for the conserved motifs;
Test the method with the known CRMs;
Search genome-wide for unknown regulatory regions;
Use mRNA hybridization and microarray hybridization to test whether the predicted regions are near genes regulated by the transcription factors;
One special case: the giant gene, further investigated with transgenic and mutant embryos.
Step 1: Collection and Alignment of TF Binding Sites
Bcd, Cad, Hb, Kr and Kni binding sequences are determined by in vitro DNase protection assays;
The sequences are aligned with MEME.
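As a rough illustration of how a set of aligned binding sites can be turned into a weight matrix, here is a minimal Python sketch. It is not the algorithm used by MEME or Patser: the function names `build_pwm` and `score_site`, the uniform background, the pseudocount scheme, and the toy sequences are all assumptions made for this example, and the sequences are not real footprinted sites.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(aligned_sites, background=None, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sites.

    Returns a list with one dict {base: log2 odds score} per alignment column.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")

    n = len(aligned_sites)
    pwm = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned_sites)
        column = {}
        for b in BASES:
            # Smoothed base frequency in this column of the alignment.
            freq = (counts[b] + pseudocount * background[b]) / (n + pseudocount)
            column[b] = math.log2(freq / background[b])
        pwm.append(column)
    return pwm


def score_site(pwm, site):
    """Sum the per-column log-odds scores for a candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix width")
    return sum(column[base] for column, base in zip(pwm, site))


# Hypothetical toy alignment, for illustration only.
toy_sites = ["TTTATG", "TTTACG", "TTTATG", "CTTATG"]
toy_pwm = build_pwm(toy_sites)
print(score_site(toy_pwm, "TTTATG"), score_site(toy_pwm, "GGGCGC"))
```

A sequence that resembles the aligned sites scores higher than an unrelated one, which is the property a genome scan relies on when it ranks candidate positions.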
Step 2: Construction of PWMs and Searching:
Patser is used to construct the Position Weight Matrix;
Cis-Analyst is used to identify the potential binding sites matching the PWM in the Drosophila genome:
  A user-defined cutoff parameter (site_p) eliminates predicted low-affinity sites;
  The sequence is searched with a window of specified length;
  Windows that contain at least min_sites binding sites are retained;
  All overlapping windows are merged into a “cluster”.
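The filtering, windowing and merging steps above can be sketched in a few lines of Python. This is a simplified illustration, not Cis-Analyst's actual code: the function name `find_clusters` is hypothetical, `site_p` is assumed here to be an upper bound on a per-site p-value, and windows are assumed to start at each retained site rather than sliding over every base.

```python
def find_clusters(sites, site_p=1e-3, window=700, min_sites=13):
    """Return merged (start, end) intervals that are dense in predicted sites.

    sites: iterable of (position, p_value) pairs for predicted binding sites.
    site_p: assumed p-value cutoff; sites with a larger p-value are dropped.
    window: window length in bp.
    min_sites: minimum number of retained sites a window must contain.
    """
    # Step 1: drop predicted low-affinity sites.
    positions = sorted(pos for pos, p_value in sites if p_value <= site_p)

    # Step 2: open a window at each retained site and count the sites in it.
    windows = []
    right = 0
    for left, start in enumerate(positions):
        while right < len(positions) and positions[right] < start + window:
            right += 1
        # Step 3: keep only windows holding at least min_sites sites.
        if right - left >= min_sites:
            windows.append((start, start + window))

    # Step 4: merge overlapping windows into clusters.
    clusters = []
    for start, end in windows:
        if clusters and start <= clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters


# Hypothetical toy input: three strong sites close together, one weak site
# nearby, and one isolated strong site.
toy_sites = [(100, 1e-5), (150, 1e-5), (200, 1e-5), (300, 0.5), (5000, 1e-5)]
print(find_clusters(toy_sites, site_p=1e-3, window=700, min_sites=3))
```

Anchoring windows at site positions keeps the scan linear in the number of sites after sorting; a base-by-base sliding window would find the same dense regions but report slightly different boundaries.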
Binding Site Sequence for Cad:
Binding Sites:
Step 3: Collection of Known CRMs:
Successful result: 14 of the 19 known CRMs are detected with the search criteria window size = 700 bp and number of predicted sites >= 13.
Step 4: Genome-wide Searching:
28 clusters identified;
23 out of 28 fall in regions between genes;
5 in intron regions;
49 genes in the nearby regions.
Step 5: Examine the expression pattern of the 49 genes by RNA in situ hybridization and microarray hybridization:
The 49 genes are examined by hybridization to see whether they show an expression pattern consistent with regulation by the TFs;
10 out of the 28 clusters are near at least one gene that shows an anterior-posterior expression pattern (under regulation of the five TFs).
Step 6: The special case: giant gene
The posterior expression is regulated by Cad, Hb and Kr;
The cis-regulatory sites are still unknown;
The predicted CRM nearest to the giant gene is cloned upstream of a lacZ reporter gene.
The lacZ gene shows an expression pattern similar to that of the giant mRNA.
(Figure panels: +/+ and Kr/Kr)
Conclusions:
Binding site clustering is an effective method to identify cis-regulatory modules;
A major obstacle is the paucity of binding data for most transcription factors, which calls for systematic work;
Real CRM structures are more complex, so the method needs to incorporate more complex rules.
Reference
Berman, B.P., Nibu, Y. et al. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99:757-762.