Data mining of circadian clock genes
Introduction Promoter Analysis Identifying genes: Methods Identifying genes: Results
The Identification of Circadian Clock Genes by Data Mining Microarray Data
Atreyi Banerjee and Martin Hunt
The University of Leicester
June 27, 2008
Outline
• Introduction
• How to find circadian clock genes
• Promoter Analysis
What is circadian rhythm?
Circadian: circa (about) + dies (a day)
• A circadian rhythm is a self-sustained cycle with a period of about 24 hours that controls rest/activity, time awareness, photosynthesis, etc.
• Common among eukaryotes (Neurospora, Drosophila, mammals)
• Reserved for living organisms (daily traffic congestion is not a circadian rhythm)
• Circannual: a 1-year period (e.g. migration)
Circadian rhythm properties
• Circadian rhythm properties are conserved across the plant and animal kingdoms
• Basic properties of a circadian rhythm: an endogenous free-running period of about 24 hours; synchronization (entrainment) to external stimuli such as light; a period that stays essentially unchanged across temperatures (temperature compensation)
• Advantage: we can learn from studying simpler model organisms (Drosophila, Neurospora, mouse)
• The mechanisms are similar, but the genes are different
• The main cycling genes: PER, TIM, CLK, CYC, BMAL
Drosophila
• Affymetrix gene chip (Drosgenome 1) assay
• Identifying circadian genes
• Clustering and heatmaps
• Promoter analysis
Drosophila circadian oscillator
Circadian clock control in Drosophila
ADD REFERENCE
Experiments
• Drosophila were entrained to a 12:12 hour light:dark (LD) cycle
• They were then left in constant darkness and sampled every 4 hours
• The final dataset included replicates of 4 chips for CT0, CT4, CT8, CT12, CT16 and CT20 (CT: circadian time in hours; CT0 is the start of the subjective day)
Promoter analysis
Aim: to detect genes that share the same regulatory mechanism
• Extract the 5′ untranslated region (UTR) of the genes
• Find the over-represented motifs in the sequences
• Find the cis-regulatory modules (combinations of binding sites) in sets of co-expressed or co-regulated genes
• Obtain the putative transcription factor binding sites (TFBS)
• Functional analysis
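One simple way to check whether a motif is over-represented in a cluster of co-expressed genes is a binomial test: count the cluster sequences that contain the motif and compare with the rate in a background set. The Python sketch below assumes exact string matching on both strands and uses the E-box CACGTG (the CLK/CYC binding site) as the example motif; the function names and toy sequences are hypothetical, and it is not TOUCAN's implementation.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_motif(seq, motif):
    """True if the exact motif occurs on either strand of seq."""
    seq = seq.upper()
    return motif in seq or reverse_complement(motif) in seq


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_enrichment(cluster_seqs, background_seqs, motif):
    """One-sided test for over-representation of motif-containing sequences.

    Returns (hits in cluster, cluster size, background rate, p-value).
    """
    k = sum(contains_motif(s, motif) for s in cluster_seqs)
    n = len(cluster_seqs)
    p_bg = sum(contains_motif(s, motif) for s in background_seqs) / len(background_seqs)
    return k, n, p_bg, binomial_tail(k, n, p_bg)


# Hypothetical toy data: one co-expressed cluster and a background set.
cluster = ["AACACGTGTT", "GGCACGTGAA", "TTTTTTTT"]
background = ["ACGTACGTAA", "CACGTGAAAA", "TTTTGGGG", "CCCCAAAA"]
print(motif_enrichment(cluster, background, "CACGTG"))  # → (2, 3, 0.25, 0.15625)
```

With thousands of candidate motifs the resulting p-values would themselves need a multiple-testing correction.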
Effects of clock mutations on enhancers regulating circadian gene expression
Stempfl, T. et al. Genetics 2002;160:571-593
TOUCAN software
• An interactive Java display
• Maps genes onto the sequence-set space
• Flexibility to use any identifier (Affy ID, EMBL, RefSeq, etc.)
• Performs statistical tests for finding regulatory sequences, selecting parts of sequences and finding CpG islands in metazoan genomes
Predict instances of known motifs with MotifScanner
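Scanning for instances of a known motif usually relies on a position weight matrix (PWM). The Python sketch below converts hypothetical per-position base counts into log2-odds scores against a uniform background and reports every forward-strand window whose total score passes a threshold. MotifScanner's own scoring model is more elaborate; the threshold, pseudocount and count matrix here are illustrative assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores vs. a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2((column.get(b, 0) + pseudocount) / total / background) for b in BASES}
        )
    return matrix


def scan(seq, matrix, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical count matrix for an E-box-like site (CACGTG), 10 sites per column.
consensus = "CACGTG"
counts = [{base: 10} for base in consensus]
pwm = log_odds_matrix(counts)
print(scan("TTCACGTGTTGGATCC", pwm, threshold=8.0))
```

Lowering the threshold admits weaker, partially matching sites at the cost of more false hits.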
The significant motifs found in each cluster
Predict cis-regulatory modules with MotifSampler
The co-expression of Dorsal 2 and Myf showing
The cis-regulatory modules in each cluster
The cis-regulatory module in genes listed with p-values
Genscan output of cluster 1
List of unknown TFBS found in each cluster
de novo discovery of unknown TFBS
• The MotifSampler tool in TOUCAN was used to find unknown motifs, which could be binding sites for novel transcription factors
• The 5′ UTR sequences were also extracted from Ensembl BioMart
• The over-represented TFBS were extracted using MATCH and OTFBS
• Dorsal 2 and Myf were over-represented modules
• ARNT was also found in cycle, an important clock gene
• Genscan predicted the genes in each cluster
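MotifSampler finds motifs de novo by Gibbs sampling. The bare-bones site sampler below illustrates the idea under simplifying assumptions (exactly one site per sequence, a uniform background, sequences containing only A/C/G/T): repeatedly set one sequence aside, build a motif profile from the sites in the others, and resample that sequence's site in proportion to how well each window fits the profile. The real tool adds refinements such as a higher-order background model; all names and sequences here are hypothetical.

```python
import random

BASES = "ACGT"


def profile_without(seqs, starts, width, skip, pseudocount=1.0):
    """Base frequencies per motif column, built from every sequence except `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for idx, (seq, start) in enumerate(zip(seqs, starts)):
        if idx != skip:
            for j in range(width):
                counts[j][seq[start + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def gibbs_motif_search(seqs, width, iterations=1000, seed=0):
    """Site-sampling Gibbs sampler with one motif occurrence per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is re-sampled
        profile = profile_without(seqs, starts, width, skip=i)
        seq = seqs[i]
        weights = []
        for start in range(len(seq) - width + 1):
            ratio = 1.0  # motif likelihood / uniform-background likelihood
            for j in range(width):
                ratio *= profile[j][seq[start + j]] / 0.25
            weights.append(ratio)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def consensus(seqs, starts, width):
    """Most frequent base in each column of the sampled sites."""
    sites = [s[p:p + width] for s, p in zip(seqs, starts)]
    return "".join(
        max(BASES, key=lambda b: [site[j] for site in sites].count(b)) for j in range(width)
    )


seqs = ["TTGACACGTGATTA", "CCACGTGGGATTCG", "AATTTGCACGTGAA", "GCACGTGTTTTAGC"]
starts = gibbs_motif_search(seqs, width=6, seed=1)
print(starts, consensus(seqs, starts, 6))
```

Because the sampler is stochastic, different seeds can return different sites, so in practice it is run several times and the best-scoring motif is kept.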
Outline
• Introduction
• Identifying circadian clock genes
• Promoter Analysis
Identifying circadian genes: an outline
Microarray experiment
↓
Data (spreadsheet)
↓
Process data in R
↓
Data analysis in R
↓
List of circadian genes
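The "process data" step depends on how the spreadsheet was laid out, which the slides do not show. As a Python analogue of that R step, the sketch below assumes a hypothetical CSV with one probe ID column and one column per chip named `<time point>_<replicate>`, log2-transforms the signals and averages the replicates, giving one six-point time series per probe. Both the layout and the choice of log2 plus averaging are assumptions.

```python
import csv
import io
import math
from collections import defaultdict

TIMEPOINTS = ["CT0", "CT4", "CT8", "CT12", "CT16", "CT20"]


def load_time_series(handle, id_column="probe"):
    """Read a CSV export into {probe ID: mean log2 signal at each CT}.

    Assumes (hypothetically) columns named like 'CT4_r2' and positive raw signals.
    """
    series = {}
    for row in csv.DictReader(handle):
        by_time = defaultdict(list)
        for column, value in row.items():
            if column == id_column:
                continue
            time_point = column.split("_")[0]
            by_time[time_point].append(math.log2(float(value)))
        series[row[id_column]] = [sum(by_time[t]) / len(by_time[t]) for t in TIMEPOINTS]
    return series


# Tiny in-memory example with two replicates per time point (hypothetical values).
header = ["probe"] + [f"{t}_r{r}" for t in TIMEPOINTS for r in (1, 2)]
rows = [
    ["probe_a"] + ["2", "8"] * len(TIMEPOINTS),
    ["probe_b"] + ["1", "1"] * len(TIMEPOINTS),
]
text = "\n".join(",".join(r) for r in [header] + rows)
print(load_time_series(io.StringIO(text)))
# → {'probe_a': [2.0, 2.0, 2.0, 2.0, 2.0, 2.0], 'probe_b': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
```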
Identifying circadian genes: an outline
Four methods considered, all of which were implemented in R:
GeneCycle based
• The Fisher Method (Wichert et al. 2004)
• The Robust Method (Ahdesmäki et al. 2005)
“Sine wave” based
• The M&R Method (McDonald & Rosbash 2001)
• The Sine Method
The Fisher Method
Implemented by the R package GeneCycle, based on Fourier methods and Fisher’s g test
Time series: CT0 = 1.2, CT4 = 4.9, CT8 = 9.5, CT12 = 0.4, CT16 = 1.5, CT20 = −42
→ Fisher’s g test → p-value = 0.3213
Repeat this process for each time series
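Fisher's g statistic is the largest periodogram ordinate divided by the sum of all ordinates, and under a Gaussian white-noise null its exact distribution is known (Wichert et al. 2004 apply this formula to gene expression). The Python sketch below implements the test for a single time series; it is an illustration, and details such as which Fourier frequencies are included may differ from GeneCycle's `fisher.g.test`. The example profile is hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Periodogram at Fourier frequencies 2*pi*k/N for k = 1..floor((N-1)/2).

    Frequency 0 (the mean) and, for even N, the Nyquist frequency are excluded.
    """
    n = len(x)
    m = (n - 1) // 2
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))) ** 2 / n
        for k in range(1, m + 1)
    ]


def fisher_g_test(x):
    """Return (g, p) for Fisher's g test of one dominant periodic component."""
    if len(x) < 3 or max(x) == min(x):
        return 0.0, 1.0  # too short or constant: no periodic signal to test
    spec = periodogram(x)
    m = len(spec)
    g = max(spec) / sum(spec)
    # Exact null distribution:
    # P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) * C(m, j) * (1 - j*g)^(m-1)
    p = sum(
        (-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
        for j in range(1, math.floor(1 / g) + 1)
    )
    return g, min(max(p, 0.0), 1.0)


# Hypothetical expression profile at CT0, CT4, ..., CT20.
g, p = fisher_g_test([2.0, 5.0, 6.5, 4.0, 1.0, 0.5])
print(f"g = {g:.4f}, p = {p:.4f}")
```

With only six time points, two Fourier frequencies remain (periods of 24 h and 12 h), so the p-value reduces to p = 2(1 − g): reaching p < 0.05 requires the 24-hour component to carry more than 97.5% of the periodogram mass.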
The Fisher Method: FDR
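Oops! We’ve carried out over 6000 tests, one per time series: even if no gene were circadian, a cut-off of p < 0.05 would still flag about 300 of them by chance. The solution: false discovery rate (FDR) control, implemented by the R package fdrtool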
Definition
The FDR is the expected proportion of false positives among the genes we call significant
0.011, 0.021, 0.042, 0.045, 0.056, 0.065, 0.066, . . .
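fdrtool estimates tail-area and local FDRs by modelling the distribution of the p-values; a simpler, widely used procedure that conveys the same idea is the Benjamini-Hochberg step-up method. The Python sketch below computes Benjamini-Hochberg adjusted p-values as a stand-in for illustration, not as fdrtool's method; the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(pvals)
print([round(q, 4) for q in adj])  # → [0.04, 0.0533, 0.0533, 0.5]
print([i for i, q in enumerate(adj) if q <= 0.05])  # → [0]
```

Calling every gene with an adjusted value at or below 0.05 gives a list in which about 5% of the genes are expected to be false positives.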
The Robust Method
Also implemented by the R package GeneCycle
The M&R Method
The Sine Method
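The slides do not spell out how the M&R and Sine methods score a gene, so the following is only a sketch of the general "sine wave" idea: a cosinor fit, which finds by least squares the 24-hour sinusoid closest to a gene's profile and reports its mean level (mesor), amplitude, peak time and goodness of fit (R²). The function name and the example profile are hypothetical.

```python
import math


def cosinor_fit(times, values, period=24.0):
    """Least-squares fit of y(t) = mesor + b*cos(w*t) + c*sin(w*t), w = 2*pi/period.

    Assumes the sampling times are equally spaced and cover exactly one period
    (true for CT0, CT4, ..., CT20), so the three regressors are orthogonal and
    the least-squares coefficients take this closed form.
    """
    n = len(values)
    w = 2 * math.pi / period
    mesor = sum(values) / n
    b = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    c = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    fitted = [mesor + b * math.cos(w * t) + c * math.sin(w * t) for t in times]
    ss_total = sum((y - mesor) ** 2 for y in values)
    ss_resid = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return {
        "mesor": mesor,
        "amplitude": math.hypot(b, c),
        "peak_ct": (math.atan2(c, b) / w) % period,  # time of the fitted maximum
        "r_squared": 1 - ss_resid / ss_total if ss_total > 0 else 0.0,
    }


times = [0, 4, 8, 12, 16, 20]
values = [2.0, 5.0, 6.5, 4.0, 1.0, 0.5]  # hypothetical expression profile
print(cosinor_fit(times, values))
```

Genes could then be ranked by R² or by amplitude, with a cut-off chosen by the analyst.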
Heatmap: The Fisher Method
heatmap of Fisher method
Heatmap: The Robust Method
The Numbers
How many genes the methods have in common, etc.
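Comparing the methods comes down to set operations on their gene lists. A minimal Python sketch with hypothetical gene IDs: count the genes shared by each pair of methods and list those found by all four.

```python
from itertools import combinations


def pairwise_overlaps(gene_lists):
    """Number of genes shared by every pair of methods."""
    sets = {method: set(genes) for method, genes in gene_lists.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sorted(sets), 2)}


def found_by_all(gene_lists):
    """Genes called circadian by every method."""
    return set.intersection(*(set(genes) for genes in gene_lists.values()))


# Hypothetical gene lists, one per method.
calls = {
    "Fisher": ["g1", "g2", "g3", "g4"],
    "Robust": ["g1", "g2", "g5"],
    "M&R": ["g1", "g3", "g5", "g6"],
    "Sine": ["g1", "g2", "g3"],
}
for pair, n in pairwise_overlaps(calls).items():
    print(pair, n)
print(sorted(found_by_all(calls)))  # → ['g1']
```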
Fisher vs. Sine Methods
What’s so different about them?
Conclusions
• Why only use sine waves as a model?
• Is FDR control really better than other multiple-testing corrections (e.g. Bonferroni)?
• Why use GeneCycle?
Conclusions
• All methods find some circadian clock genes
• . . . and some false positives
• Best approach: use many methods
• There is always a new, better method around the corner . . .