Staging of Bladder Tumors. · The data has been gathered by Skejby Sygehus and it cannot be used...


Staging of Bladder Tumors.

Steen Knudsen

April 7, 2003


Contents

1 Introduction

2 Materials and Methods
2.1 Statistical Analysis
2.2 Array Normalization
2.3 Expression index calculation
2.4 Clustering and PCA on chips
2.5 Statistical Significance
2.6 Analysis of Variance
2.7 Log fold change calculation
2.8 Gene Clustering
2.9 Correspondence Analysis
2.10 Gene Annotation
2.11 Protein Function Prediction
2.12 Promoter analysis

3 Results
3.1 Normalization
3.2 PCA and clustering of chips
3.3 Classification of chips
3.4 Statistical Analysis
3.5 Functional categories
3.6 Prediction of orphan function
3.7 Signal transduction pathway analysis
3.8 Metabolic pathway analysis
3.9 Clustering of Genes
3.10 Promoter analysis
3.11 Correspondence Analysis

4 Appendix A: parameters used in this report

Abstract

A DNA microarray experiment was performed using a chip of type HU6800. Principal component analysis and clustering were performed to reveal groupings in the samples. A statistical analysis was performed to reveal genes differentially expressed between the categories. A correspondence analysis was performed to identify genes associated with the individual categories and experiments.

Significantly regulated genes with unknown function were analyzed for properties of the encoded proteins, and their function was predicted using the ProtFun software. The TRANSPATH and KEGG databases were searched for differentially expressed genes annotated on known signal transduction or metabolic pathways.

1 Introduction

This report was generated automatically by the GenePublisher automatic DNA microarray analysis system¹.

Guide to interpretation of results: first look at the MVA plots before and after normalization to see if there are any obvious outlying chips (high variance and steep slope). Outlying chips may also be identified in the chip clustering, the PCA, or the KNN classifier. Next, look at the table of genes with significant changes in expression. Help in interpreting the biology of these genes may come from LocusLink (if available) and from the TRANSPATH and KEGG analyses. Typically, one or more genes on this list need to be verified as differentially regulated by another method before publication, for example quantitative PCR against the messenger RNA or an immunoassay against the protein. The gene cluster analysis is usually only of interest if more than two conditions are compared in the experiment. Whether there are two or more conditions, you may look at the promoter analysis. The list of potential promoter elements may be overwhelming, but you can look for elements that are found by more than one method, or elements that show up in genes with a related role or function. For more information on the analysis methods used in this report, see Knudsen, S. (2002) A Biologist's Guide to Analysis of DNA Microarray Data. Wiley, New York.

The purpose of this study is to identify differences between stages/types of bladder cancer based on DNA chips run on a biopsy. Patients with a suspicious growth in the bladder epithelium undergo an endoscopic biopsy. RNA is extracted from the biopsy and run on a DNA chip. The biopsy is also given to a histopathologist, who uses a microscope to evaluate and stage the suspicious growth as superficial (Ta), intermediate (T1), or invasive (T2-T4). This report identifies differences in gene expression between these stages. Such differences can be used not only to learn more about the molecular basis of the disease and its progression from benign to malignant, but also to classify tumors based on a biopsy.

The data has been gathered by Skejby Sygehus and it cannot be used without their permission.

¹ Knudsen, S., Workman, C., Sicheritz-Ponten, T., and Friis, C. (2003) GenePublisher: Automated Analysis of DNA Microarray Data. Manuscript submitted to Nucleic Acids Research.


2 Materials and Methods

2.1 Statistical Analysis

The statistical analysis was performed using the R statistics programming environment, available from www.r-project.org. False-positive rates were assessed by multiplying P-values by the number of genes (a Bonferroni-type correction for multiple testing²).
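As a minimal sketch of the correction described above (the report's own analysis ran in R; this Python version is only illustrative), each raw P-value is multiplied by the number of genes tested. Capping the result at 1 is an assumption added here so that the output remains a valid probability, and the function name `bonferroni` is hypothetical.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests (genes), capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw P-values for five genes.
raw_p = [2.3e-07, 1.6e-06, 0.01, 0.2, 0.5]
print(bonferroni(raw_p))
```

With N genes, a raw P-value p becomes N·p; for a hypothetical 7000 genes, a raw P-value of 1e-05 corresponds to an adjusted value of 0.07.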

2.2 Array Normalization

The individual chips were made comparable to each other by applying the qspline³ method. Qspline is a robust non-linear normalization method based on array signal distribution analysis and cubic splines. It fits cubic splines to the quantiles of the array signal distribution and uses those splines to normalize signals in an intensity-dependent manner.
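The sketch below illustrates the idea of quantile-based, intensity-dependent normalization with NumPy. It is a simplification, not the qspline implementation: the mapping between the quantiles of a chip and those of a reference chip is interpolated piecewise-linearly instead of with cubic splines, and the choice of reference chip and the number of quantiles (100) are assumptions. The function name `quantile_map_normalize` is hypothetical.

```python
import numpy as np


def quantile_map_normalize(signal, reference, n_quantiles=100):
    """Map the intensity distribution of `signal` onto that of `reference`.

    Matching quantiles of the two chips are paired, and every intensity is
    mapped by piecewise-linear interpolation between those pairs (qspline
    itself fits cubic splines here).
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q_signal = np.quantile(signal, probs)
    q_reference = np.quantile(reference, probs)
    # np.interp needs increasing x values, so drop duplicated signal quantiles.
    q_signal, idx = np.unique(q_signal, return_index=True)
    q_reference = q_reference[idx]
    return np.interp(signal, q_signal, q_reference)
```

Because quantiles 0 and 1 are the minimum and maximum of the chip, no intensity falls outside the interpolation range.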

2.3 Expression index calculation

For each gene, the expression index was calculated from its probes using the Li-Wong Model-Based Expression Index⁴. This model takes into account that probe pairs respond differently to changes in expression of a gene and that the variation between replicates is also probe-pair dependent.

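If "Background correction" is specified as "subtractmm", then the weighted average difference is used:

$$\tilde{\theta}_i = \frac{1}{J} \sum_{j=1}^{J} \left( PM_{ij} - MM_{ij} \right) \phi_j$$

where $\tilde{\theta}_i$ is the expression index of the gene on chip $i$, $PM_{ij}$ and $MM_{ij}$ are the perfect match and mismatch intensities of probe pair $j$, $J$ is the number of probe pairs, and $\phi_j$ is a scaling factor that is specific to probe pair $j$ and is obtained by fitting a statistical model to a series of experiments (with $\sum_j \phi_j^2 = J$).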

The model can also be run without the mismatch (MM) probes, using only perfect match (PM) probe information, by specifying "Background correction" as "bg.adjust". This uses a model-based background subtraction from the PM probes⁵. This PM-bg method is preferred over PM-MM methods because the resulting noise level is lower and negative expression values are avoided.
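The model-based background subtraction assumes that each observed PM intensity is the sum of an exponentially distributed signal (rate α) and normally distributed background noise N(μ, σ²). The corrected value is the expected signal given the observation: the mean of a normal distribution with mean a = PM − μ − ασ² truncated at zero, which is always positive. This sketch takes μ, σ and α as given (in practice they are estimated from each chip's PM distribution, a step not shown), and `rma_background` is a hypothetical name.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    # erfc keeps precision in the lower tail better than 1 + erf(...).
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rma_background(pm, mu, sigma, alpha):
    """Expected signal given each observed PM under the convolution model."""
    corrected = []
    for x in pm:
        a = x - mu - alpha * sigma ** 2
        # Mean of N(a, sigma**2) truncated to positive values.
        corrected.append(a + sigma * normal_pdf(a / sigma) / normal_cdf(a / sigma))
    return corrected
```

Low intensities near the background level are shrunk toward small positive values instead of becoming negative, as they would with PM - MM subtraction.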

² Benjamini, Y., and Hochberg, Y. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Statist. Soc. B 57:289-300.

³ Workman, C., Jensen, L.J., Jarmer, H., Berka, R., Saxild, H.H., Gautier, L., Nielsen, C., Nielsen, H.B., Brunak, S., and Knudsen, S. (2002) A new non-linear method for reducing variance between DNA microarray experiments. Genome Biology 3(9):0048.

⁴ Li, C., and Wong, W. H. (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98:31-36.

⁵ Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U.,


2.4 Clustering and PCA on chips

Before any statistical analysis was performed, all genes on the chip were used for a hierarchical cluster analysis and a principal component analysis to discover any grouping in the data (chips).
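A naive complete-linkage agglomerative clustering of chips on Euclidean distance (the combination used for Figure 3) can be sketched in plain Python as follows. It is illustrative only, not the software used for this report, and is suitable only for a small number of chips; `complete_linkage` and `euclidean` are hypothetical names, and each chip is assumed to be a list of expression values over all genes.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of chip indices.
    """
    clusters = [(i,) for i in range(len(profiles))]
    dist = {(i, j): euclidean(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def linkage(c1, c2):
        # Complete linkage: distance between the two most distant members.
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```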

2.5 Statistical Significance

Differentially expressed genes between two categories of replicated experiments were identified by applying the Welch t-test, which assumes unequal variances in the two populations. The P-values calculated for each gene were Bonferroni corrected by the number of genes to estimate the family-wise error rate. A paired t-test can be specified instead in the parameter file.
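The Welch statistic itself is simple to compute. The plain-Python sketch below (hypothetical function name `welch_t`) returns the t statistic and the Welch-Satterthwaite degrees of freedom for one gene; converting them into a two-sided P-value requires the t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`, which performs the whole test.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two groups with possibly unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

The degrees of freedom fall between the smaller group size minus one and the pooled value n1 + n2 - 2, shrinking when one group is much noisier.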

2.6 Analysis of Variance

Differentially expressed genes between more than two categories of replicated experiments were identified by applying an analysis of variance (ANOVA). The P-values calculated for each gene were Bonferroni corrected by the number of genes to estimate the family-wise error rate.
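For more than two categories, such as the Ta, T1 and T2 stages here, the one-way ANOVA F statistic compares between-category and within-category variation for each gene. A plain-Python sketch (hypothetical name `one_way_anova_f`) follows; the P-value would come from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom of a one-way ANOVA.

    groups: one list of expression values per category.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```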

2.7 Log fold change calculation

The logarithm of the fold change of gene expression was calculated in order to obtain a symmetric distribution of regulation around zero (upregulated genes have positive logfold values, downregulated genes have negative logfold values). Expression values less than 20 were set to 20 before calculating the log fold change, in order to avoid negative expression values, which can occur if mismatch probe values are subtracted.
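A sketch of the floored log fold change in Python. The report does not state the base of the logarithm, so base 2 is an assumption here, as are the function name `log_fold_change` and applying the floor of 20 to both numerator and denominator.

```python
import math

FLOOR = 20.0  # expression values below this are raised to 20 (see text)


def log_fold_change(value, reference, base=2.0):
    """Log fold change of `value` relative to `reference`, after flooring."""
    return math.log(max(value, FLOOR) / max(reference, FLOOR), base)


print(log_fold_change(80, 20))   # fourfold up
print(log_fold_change(-35, 40))  # negative value floored to 20: twofold down
```

The floor also keeps the ratio finite and the logarithm defined when a subtracted expression value is zero or negative.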

2.8 Gene Clustering

Hierarchical clustering was performed using the ClusterExpress software developed by Christopher Workman. Distances were calculated as the angle between vectors, and the expression values were visualized as the logarithm of the fold change relative to the average of category A.
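The angle between two expression vectors is the arccosine of their cosine similarity, so two genes whose profiles differ only by a positive scale factor are at distance zero. A minimal plain-Python sketch follows (hypothetical name `angle_distance`; ClusterExpress's own implementation is not described in this report).

```python
import math


def angle_distance(u, v):
    """Angle in radians between two expression vectors.

    0 means the profiles point in the same direction, pi/2 that they are
    orthogonal, and pi that they point in opposite directions.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding error
```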

Speed, T.P. (2002) Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Accepted for publication in Biostatistics. Available at http://biosun01.biostat.jhsph.edu/~ririzarr/


2.9 Correspondence Analysis

Associations between categories and genes significant in the statistical test were visualized with correspondence analysis. Expression values were first made non-negative by setting all negative values to zero. After correspondence analysis, genes and experiments were plotted in the same plot using the first two principal components⁶.
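Correspondence analysis can be computed from a singular value decomposition of the standardized residuals of the table. The NumPy sketch below follows the standard textbook formulation: negative values are clipped to zero as described above, and principal coordinates are returned for genes (rows) and experiments (columns) so that both can be drawn in one plot. The function name is hypothetical, and rows or columns with a zero total must be removed first.

```python
import numpy as np


def correspondence_analysis(x, n_dims=2):
    """Principal coordinates of rows (genes) and columns (experiments)."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, None)  # negatives -> 0
    p = x / x.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    # Standardized residuals from the independence model r c^T.
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)
    u, d, vt = np.linalg.svd(s, full_matrices=False)
    rows = (u[:, :n_dims] * d[:n_dims]) / np.sqrt(r)[:, None]
    cols = (vt.T[:, :n_dims] * d[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols, d[:n_dims]
```

Genes and experiments that lie in the same direction from the origin are associated; a table with no association at all (an outer product of row and column totals) has all singular values equal to zero.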

2.10 Gene Annotation

Genes were annotated with Gene Ontology (GO) terms (www.geneontology.org); the Gene Ontology provides a unique identifier for each biological process, molecular function, and cellular component that a gene product can be associated with. Genes were grouped according to high-level function categories in the Gene Ontology database. Genes grouped under more than one functional category were only counted once.

Genes were matched to the KEGG⁷ (Kyoto Encyclopedia of Genes and Genomes) description of known cellular pathways (http://www.genome.ad.jp). For genes matching more than one pathway, only one pathway is shown.

Genes were matched to the TRANSPATH⁸ database of signal transduction (www.gene-regulation.com). If genes match more than one pathway, only one pathway is shown.

2.11 Protein Function Prediction

For genes that have not been assigned a Gene Ontology number and whose function has not been inferred by homology to another protein, an attempt was made to predict the function using the ProtFun⁹ method. ProtFun predicts function not from homology but from properties of the protein sequence and from predicted features such as post-translational modifications.

⁶ Fellenberg, K., Hauser, N. C., Brors, B., Neutzner, A., Hoheisel, J. D., and Vingron, M. (2001) Correspondence analysis applied to microarray data. Proc. Natl. Acad. Sci. USA 98:10781-10786.

⁷ Kanehisa, M., Goto, S., Kawashima, S., and Nakaya, A. (2002) The KEGG databases at GenomeNet. Nucleic Acids Res. 30(1):42-46.

⁸ Krull, M., Voss, N., Choi, C., Pistor, S., Potapov, A., and Wingender, E. (2003) TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 31(1):97-100.

⁹ Jensen, L. J., Gupta, R., Blom, N., Devos, D., Tamames, J., Kesmir, C., Nielsen, H., Staerfeldt, H. H., Rapacki, K., Workman, C., Andersen, C. A. F., Knudsen, S., Krogh, A., Valencia, A., and Brunak, S. (2002) Ab initio prediction of human orphan protein function from post-translational modifications and localization features. Journal of Molecular Biology 319:1257-1265.


2.12 Promoter analysis

Upstream regions (5000 bp for human, 300 bp for yeast) were extracted from the genes of each cluster using Ensembl (www.ensembl.org) or GenBank. The software program saco patterns¹⁰ was run on each cluster to identify significantly overrepresented patterns in the upstream regions. saco patterns looks for conserved (identical) patterns in sequences; it does not allow degenerate positions in a pattern.
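To illustrate what searching for overrepresented exact patterns means, the sketch below counts, for every k-mer (k = 6 by default), how many upstream regions of a cluster contain it and compares this with its frequency among background regions using a binomial tail probability. This is a simplified stand-in, not the saco patterns algorithm or its statistics; the background set, the pseudocount, the P-value threshold and all function names are assumptions.

```python
import math


def kmers_in(seq, k):
    """Set of all exact k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def binomial_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))


def overrepresented_kmers(cluster_seqs, background_seqs, k=6, max_p=1e-3):
    """Exact (non-degenerate) k-mers found in more cluster sequences than
    expected from their frequency among background sequences."""
    cluster_sets = [kmers_in(s, k) for s in cluster_seqs]
    background_sets = [kmers_in(s, k) for s in background_seqs]
    hits = []
    for kmer in set().union(*cluster_sets):
        observed = sum(kmer in s for s in cluster_sets)
        # Pseudocount keeps p > 0 for k-mers absent from the background.
        p_bg = (sum(kmer in s for s in background_sets) + 1) / (len(background_sets) + 2)
        p_value = binomial_tail(len(cluster_sets), observed, p_bg)
        if p_value <= max_p:
            hits.append((kmer, observed, p_value))
    return sorted(hits, key=lambda h: h[2])
```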

The Gibbs sampler¹¹ was run on the same upstream regions. The Gibbs sampler looks for degenerate patterns, which it tries to capture with a weight matrix description. For each sequence, the best match to this weight matrix is shown in the output. The Gibbs sampler starts from a new random matrix every time and is non-deterministic, meaning that it may give different results each time it is run.
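A basic Gibbs site sampler in the spirit of Lawrence et al. (1993) fits in a few dozen lines. This sketch assumes exactly one motif occurrence per sequence, a uniform background of 0.25 per base, and +1 pseudocounts. It starts from random site positions, which is why repeated runs without a fixed `seed` can give different results. The function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def profile(seqs, positions, width, skip=None):
    """Weight matrix (with +1 pseudocounts) from the current sites,
    leaving out sequence number `skip`."""
    counts = [{b: 1.0 for b in ALPHABET} for _ in range(width)]
    for idx, (seq, pos) in enumerate(zip(seqs, positions)):
        if idx == skip:
            continue
        for j, base in enumerate(seq[pos:pos + width]):
            counts[j][base] += 1.0
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]


def gibbs_motif_sampler(seqs, width, n_iter=2000, seed=None):
    rng = random.Random(seed)
    # Random starting sites: the source of run-to-run variation.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        i = rng.randrange(len(seqs))
        pwm = profile(seqs, positions, width, skip=i)
        seq = seqs[i]
        # Odds of each candidate site under the matrix vs a uniform background.
        weights = [
            math.prod(pwm[j][seq[p + j]] / 0.25 for j in range(width))
            for p in range(len(seq) - width + 1)
        ]
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions, profile(seqs, positions, width)
```

Each step removes one sequence, rebuilds the matrix from the others, and resamples that sequence's site in proportion to how well each window matches, so the matrix gradually sharpens around a shared pattern.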

The transcription factor binding site matrices in the TRANSFAC¹² database were matched against the same upstream regions. Matrix hits scoring more than 95% of the maximal score of the matrix were recorded.
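The 95% criterion can be sketched as a simple scan: each window is scored by summing the matrix weights of its bases, and a hit is recorded when the score exceeds 95% of the best possible score for that matrix. This follows the wording above literally; dedicated TRANSFAC search tools may normalize scores differently, so the scoring scheme and the name `scan_matrix` are assumptions.

```python
def scan_matrix(seq, matrix, threshold=0.95):
    """Report windows whose score exceeds threshold * maximal matrix score.

    matrix: list of dicts (one per motif position) mapping base -> weight.
    Returns (start, window, score / max_score) tuples.
    """
    width = len(matrix)
    max_score = sum(max(col.values()) for col in matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(base, 0.0) for col, base in zip(matrix, window))
        if score > threshold * max_score:
            hits.append((start, window, score / max_score))
    return hits
```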

¹⁰ Jensen, L.J., and Knudsen, S. (2000) Automatic Discovery of Regulatory Patterns in Promoter Regions Based on Whole Cell Expression Data and Functional Annotation. Bioinformatics 16:326-333.

¹¹ Lawrence, Altschul, Boguski, Liu, Neuwald, and Wootton (1993) Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262:208-214.

¹² Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S., and Wingender, E. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31(1):374-378.


[MVA plot: M (vertical axis) versus A (horizontal axis) for pairwise chip comparisons; diagonal panels labelled Ta and T1]

Figure 1: M versus A for all chip-to-chip comparisons before normalization. The diagonal shows the names of the chips being compared. The lower triangle shows the variance of the ratios between the two chips being compared; two identical chips should have a variance of zero. Look for bad chips in this plot. They are revealed by a higher variance in comparisons to the other chips and by a consistent curvature when compared to other chips (indicating a low amount of hybridization). The comparison is limited to 10 chips versus 10 chips.


[MVA plot: M (vertical axis) versus A (horizontal axis) for pairwise chip comparisons; diagonal panels labelled Ta and T1]

Figure 2: M versus A for all chip-to-chip comparisons after normalization. The diagonal shows the names of the chips being compared. The lower triangle shows the variance of the ratios between the two chips being compared; two identical chips should have a variance of zero. The comparison is limited to 10 chips versus 10 chips.


[Dendrogram; leaf order: T2 T1 T1 T2 T2 T1 T1 T1 Ta Ta Ta Ta Ta Ta T2 T2 T1 T1]

Figure 3: Hierarchical clustering of chips (labelled by category) using Euclidean distance between vectors of all genes and complete linkage.

3 Results

3.1 Normalization

Figure 1 shows a comparison of all chips before normalization. This is a so-called M versus A plot: instead of plotting each probe on one chip against the same probe on another, the scales are changed so that, for each probe, the logarithm of the ratio of expression between the two chips (M) is plotted as a function of the logarithm of the mean expression of the two chips (A). Two identical chips would yield a straight, flat line through zero. Two comparable chips ideally show a straight, flat line through zero, with a few probes off the line indicating differential expression. Deviation of the line from zero reveals a need for normalization before the two chips can be compared, and deviation from a straight line reveals a need for non-linear normalization (different normalization factors for highly and weakly expressed genes).

Figure 2 shows the comparison of all the chips after normalization.
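To make the plotted quantities concrete, here is a small sketch computing M and A for two chips. It assumes base-2 logarithms and computes A as the average of the two log intensities (the log of the geometric mean), which is the common convention; the report does not state either choice. It also assumes that the number in the lower triangle of Figures 1 and 2 is the variance of M. The function names are hypothetical.

```python
import math
from statistics import variance


def ma_values(chip_x, chip_y):
    """M (log2 ratio) and A (mean log2 intensity) for each probe."""
    m = [math.log2(x / y) for x, y in zip(chip_x, chip_y)]
    a = [0.5 * (math.log2(x) + math.log2(y)) for x, y in zip(chip_x, chip_y)]
    return m, a


def ma_variance(chip_x, chip_y):
    """Variance of the log ratios; 0 for two identical chips."""
    m, _ = ma_values(chip_x, chip_y)
    return variance(m)
```

A constant scale difference between two chips shifts every M by the same amount without adding variance, which is the kind of offset a linear normalization removes; intensity-dependent curvature is what calls for a non-linear method.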

3.2 PCA and clustering of chips

All chips were clustered based on the Euclidean distance of all genes (Figure 3). Such a clustering shows the relationship between individual chips, in particular whether they cluster together in the categories they have been assigned to. If they do not cluster together in the assigned categories, or if one chip clusters separately, this may indicate a problem, for example an outlier (poor-quality) chip. In that case, the analysis should be repeated without that chip to see whether the significance of the statistical results increases.

[Scatter plot of PC1 (horizontal axis) versus PC2 (vertical axis); points labelled Ta, T1, T2]

Figure 4: Principal Component Analysis showing all chips plotted according to their first two principal components.

Another way to look at the same information is through the first two principal components. Figure 4 shows a principal component analysis of the individual chips, used to reveal any structure in the relationship between chips. The PCA is based on all genes.
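A PCA of chips can be computed from the singular value decomposition of the chip-by-gene matrix. Whether the genes were centered or scaled before the PCA in Figure 4 is not stated in the report; this NumPy sketch centers each gene across chips and applies no further scaling, and `pca_scores` is a hypothetical name.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """Project chips onto their first principal components.

    data: array of shape (n_chips, n_genes). Returns the chip scores and the
    fraction of total variance explained by each returned component.
    """
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)  # center each gene across chips
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

The signs of principal components are arbitrary, so a plot may appear mirrored between implementations without any change in the grouping of chips.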

3.3 Classification of chips

A k-nearest-neighbor classifier was built to classify chips based on the expression of all genes. Each chip was compared to all other chips, and the category assignments of its k closest chips in Euclidean gene expression space were used to predict its category. Table 1 shows the prediction for each chip with k=1 and k=3. The total accuracy of class prediction was 67 percent for both the k=1 and the k=3 classifier. It may be possible to improve on this accuracy by selecting predictive genes and by optimizing the number of nearest neighbors k. Doing this, however, will necessitate an evaluation on an independent test set that was not used for optimizing the classifier.

Table 1: Predictions of the k-nearest-neighbor classifier

Chip  Assigned category  Predicted category (k=1)  Predicted category (k=3)
Ta    B                  B                         B
Ta    B                  B                         B
Ta    B                  B                         B
Ta    B                  B                         B
Ta    B                  B                         B
Ta    B                  B                         B
T1    C                  C                         C
T1    C                  C                         C
T1    C                  B                         B
T1    C                  B                         B
T1    C                  C                         C
T1    C                  B                         B
T1    C                  C                         C
T2    D                  D                         D
T2    D                  C                         C
T2    D                  C                         C
T2    D                  D                         D
T2    D                  C                         C
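The leave-one-out procedure can be sketched in plain Python as follows. Table 1 lists 6 + 4 + 2 = 12 correct predictions out of 18 chips for both values of k, which is the 67 percent quoted above. The tie-breaking rule (a tied vote goes to the tied category whose member is nearest) and the name `knn_loocv` are assumptions, not taken from the report.

```python
import math
from collections import Counter


def knn_loocv(profiles, labels, k=3):
    """Leave-one-out k-nearest-neighbor predictions in Euclidean space.

    Each chip is predicted from the labels of its k nearest other chips.
    Returns the predictions and the fraction predicted correctly.
    """
    predictions = []
    for i, query in enumerate(profiles):
        neighbors = sorted(
            (math.dist(query, other), labels[j])
            for j, other in enumerate(profiles) if j != i
        )[:k]
        votes = Counter(label for _, label in neighbors)
        best = max(votes.values())
        # Among tied labels, take the one whose member is nearest.
        predictions.append(next(lab for _, lab in neighbors if votes[lab] == best))
    accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
    return predictions, accuracy
```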

3.4 Statistical Analysis

Table 2 and Table 3 show the top 100 up- and downregulated genes after statistical analysis, together with their estimated P-values. For each gene there is a list of Gene Ontology (GO) annotations, if available. Information on the P-values and expression levels of all genes on the array is available in the file Pvalues.abs in the same directory as this report.


Table 2: The top-ranking upregulated genes in the statistical analysis. The numbers in parentheses help evaluate the significance and relevance of the result: expression level of the gene on the first chip, P-value from the statistical analysis, and the average logfold change between the last and the first category.

Rank Gene Annotations (expression level, P-value, logfold)

3 M37766 a CD48 antigen (B-cell membrane protein). GO: plasma membrane ; defense response ; lymphocyte antigen ; integral plasma membrane protein ; (2664 2.9e-06 0.5)

9 M62628 s NA. (5430 2.5e-05 0.2)

11 X66087 a v-myb myeloblastosis viral oncogene homolog (avian)-like 1. GO: chromatin ; regulation of transcription from Pol II promoter ; (491 3.1e-05 0.5)

12 AFFX-Bio NA. (19563 3.3e-05 0.4)

13 K02405 f major histocompatibility complex, class II, DQ beta 1. (10162 3.5e-05 0.3)

16 M31165 a tumor necrosis factor, alpha-induced protein 6. GO: extracellular ; signal transduction ; cell-cell signaling ; inflammatory response ; cell adhesion receptor ; hyaluronic acid binding ; (998 4.8e-05 0.7)

18 M97347 s glucosaminyl (N-acetyl) transferase 1, core 2 (beta-1,6-N-acetylglucosaminyltransferase). GO: O-linked glycosylation ; integral membrane protein ; beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase ; (629 5.8e-05 0.2)

19 M74719 a transcription factor 4. GO: nucleus ; RNA polymerase II transcription factor ; regulation of transcription from Pol II promoter ; (1336 6.6e-05 1.1)

20 M62505 a complement component 5 receptor 1 (C5a ligand). GO: chemotaxis ; immune response ; plasma membrane ; activation of MAPK ; signal transduction ; chemosensory perception ; cellular defense response ; phospholipase C activation ; C5a anaphylatoxin receptor ; integral plasma membrane protein ; cytosolic calcium ion concentration elevation ; (2730 6.7e-05 0.7)

23 D38498 f postmeiotic segregation increased 2-like 9. (8456 1.1e-04 0.3)

24 M29696 a interleukin 7 receptor. GO: antibody ; immune response ; signal transduction ; interleukin-7 receptor ; antimicrobial humoral response (sensu Invertebrata) ; regulation of DNA recombination ; cell surface receptor linked signal transduction ; (3162 1.1e-04 0.6)

25 D38437 f postmeiotic segregation increased 2-like 3. (8749 1.2e-04 0.3)

28 D30715 x pancreatitis-associated protein. GO: lectin ; cytoplasm ; cell adhesion ; cell adhesion molecule ; soluble fraction ; cell proliferation ; extracellular space ; development ; (1451 1.4e-04 0.6)


32 HG4069-H small inducible cytokine A2 (monocyte chemotactic protein 1). GO: receptor binding ; chemokine ; chemotaxis ; oncogenesis ; cell adhesion ; protein kinase ; JAK-STAT cascade ; defense response ; viral replication ; extracellular space ; signal transducer ; signal transduction ; cell-cell signaling ; inflammatory response ; calcium ion homeostasis ; protein amino acid phosphorylation ; humoral immune response ; histogenesis and organogenesis ; response to pathogenic bacteria ; response to pest/pathogen/parasite ; cell surface receptor linked signal transduction ; G-protein coupled receptor protein signaling pathway ; G-protein signaling, coupled to cyclic nucleotide second messenger ; (2781 1.9e-04 0.6)

36 U75362 a ubiquitin specific protease 13 (isopeptidase T-3). GO: deubiquitination ; cysteine-type endopeptidase ; ubiquitin-specific protease ; (1465 2.9e-04 0.4)

37 HG3286-H crystallin, alpha A. GO: vision ; chaperone ; protein folding ; (9326 3.0e-04 0.3)

39 L36644 a EphA5. GO: transmembrane receptor protein tyrosine kinase ; (948 3.4e-04 0.1)

41 X17042 a proteoglycan 1, secretory granule. GO: proteoglycan ; (2072 3.8e-04 2.2)

42 Y00062 a protein tyrosine phosphatase, receptor type, C. GO: protein tyrosine phosphatase ; integral plasma membrane protein ; cell surface receptor linked signal transduction ; transmembrane receptor protein tyrosine phosphatase ; (1716 3.9e-04 0.8)

44 M58597 a fucosyltransferase 4 (alpha (1,3) fucosyltransferase, myeloid-specific). GO: membrane fraction ; fucosyltransferase ; carbohydrate metabolism ; (3032 4.1e-04 0.3)

48 D29642 a KIAA0053 gene product. (2424 4.2e-04 0.4)

49 D17357 a inhibin, beta A (activin A, activin AB alpha polypeptide). GO: extracellular ; signal transduction ; cell-cell signaling ; skeletal development ; cell growth and/or maintenance ; transforming growth factor-beta receptor ligand ; (623 4.2e-04 0.8)

50 U16307 a glioma pathogenesis-related protein. GO: pathogenesis ; (467 4.2e-04 0.4)

54 X13334 a CD14 antigen. GO: apoptosis ; phagocytosis ; plasma membrane ; inflammatory response ; peptidoglycan recognition ; antibacterial peptide ; GPI-anchored membrane-bound receptor ; cell surface receptor linked signal transduction ; (4224 4.9e-04 1.2)

55 U05259 r CD79A antigen (immunoglobulin-associated alpha). GO: defense response ; cell surface receptor linked signal transduction ; (5768 4.9e-04 0.4)

58 HG3928-H surfactant, pulmonary-associated protein A2. (559 5.3e-04 0.5)

60 X89426 a endothelial cell-specific molecule 1. (232 5.6e-04 0.0)

61 U05681 s B-cell CLL/lymphoma 3. GO: oncogenesis ; regulation of cell cycle ; cytoplasmic sequestering of NF-kappaB ; (8339 5.8e-04 0.3)

65 U00928 a fusion, derived from t(12. GO: nucleus ; RNA binding ; (1631 6.6e-04 0.3)

66 M57731 s GRO2 oncogene. GO: chemokine ; cytokine ; chemotaxis ; soluble fraction ; extracellular space ; inflammatory response ; G-protein coupled receptor protein signaling pathway ; (2679 7.1e-04 1.0)


67 U19713 s allograft inflammatory factor 1. GO: nucleus ; stress response ; cell cycle arrest ; inflammatory response ; negative regulation of cell proliferation ; (1276 7.2e-04 1.0)

68 L38025 a ciliary neurotrophic factor receptor. GO: neurogenesis ; signal transduction ; GPI-anchored membrane-bound receptor ; ciliary neurotrophic factor receptor ; (6295 7.3e-04 0.2)

70 J00207 r interferon, alpha 2. GO: cell-cell signaling ; inflammatory response ; induction of apoptosis ; interferon-alpha/beta receptor ligand ; cell surface receptor linked signal transduction ; hematopoietin/interferon-class (D200-domain) cytokine receptor ligand ; (990 7.6e-04 0.3)

71 M98539 a prostaglandin D2 synthase (21kD, brain). GO: membrane ; prostaglandin-D synthase ; (4750 7.6e-04 0.7)

72 Z34974 s plakophilin 1 (ectodermal dysplasia/skin fragility syndrome). GO: cell adhesion molecule ; plasma membrane ; signal transducer ; signal transduction ; cell-cell signaling ; intermediate filament ; intercellular junction ; (3642 7.6e-04 0.2)

73 L25286 s collagen, type XV, alpha 1. GO: collagen ; collagen type XV ; (1286 7.7e-04 0.7)

76 X04729 s serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1. GO: blood coagulation ; endopeptidase inhibitor ; (4314 7.9e-04 0.3)

77 J04990 a cathepsin G. GO: cathepsin G ; immune response ; insoluble fraction ; proteolysis and peptidolysis ; (2663 8.0e-04 0.3)

79 HG3395-H DnaJ (Hsp40) homolog, subfamily B, member 2. GO: chaperone ; co-chaperone ; protein folding ; heat shock protein ; (5187 8.1e-04 0.6)

80 U17743 s mitogen-activated protein kinase kinase 4. GO: JNK cascade ; protein kinase ; signal transduction ; (977 8.2e-04 0.1)

83 AFFX-Bio NA. (4786 8.8e-04 0.7)

84 X93017 a solute carrier family 8 (sodium-calcium exchanger), member 3. (1559 8.8e-04 0.5)

87 L13720 a growth arrest-specific 6. GO: receptor binding ; cell proliferation ; signal transduction ; (6130 9.2e-04 0.4)

88 X14046 a CD37 antigen. GO: plasma membrane ; N-linked glycosylation ; integral plasma membrane protein ; (9849 9.2e-04 0.1)

89 M28882 s melanoma cell adhesion molecule. GO: cell adhesion ; cell adhesion molecule ; tumor antigen ; plasma membrane ; embryogenesis and morphogenesis ; integral plasma membrane protein ; (5122 9.5e-04 0.4)

91 Z31560 s SRY (sex determining region Y)-box 2. GO: enhancer binding ; transcription from Pol II promoter ; (2054 9.7e-04 0.3)

93 X85750 a monocyte to macrophage differentiation-associated. GO: receptor ; membrane fraction ; integral plasma membrane protein ; (757 9.8e-04 1.0)

95 U60269 c NA. (5390 9.9e-04 0.2)

96 M34516 a NA. (7895 1.0e-03 1.6)


98 D79984 s suppressor of Ty 6 homolog (S. cerevisiae). GO: nucleus ; transcription factor ; chromatin assembly/disassembly ; regulation of transcription from Pol II promoter ; (4026 1.1e-03 0.5)

Table 3: The top ranking downregulated genes in the statistical analysis. The numbers in parentheses help evaluate the significance and relevance of the result: expression level of the gene on the first chip, P-value from the statistical analysis, and the average log fold change between the last and the first category.

Rank Gene Annotations (expression level, P-value, log fold change)

1 L11708 a hydroxysteroid (17-beta) dehydrogenase 2. GO: estrogen biosynthesis ; endoplasmic reticulum membrane ; (16448 2.3e-07 -2.1)

2 Y11999 a inositol 1,4,5-trisphosphate 3-kinase C. (4456 1.6e-06 -1.1)

4 M81637 a grancalcin, EF-hand calcium binding protein. GO: cytoplasm ; membrane fusion ; plasma membrane ; calcium ion binding ; (4010 5.0e-06 -0.8)

5 Z26491 s catechol-O-methyltransferase. GO: microsome ; soluble fraction ; O-methyltransferase ; (12169 6.8e-06 -1.6)

6 U68385 a Meis1, myeloid ecotropic viral integration site 1 homolog 3 (mouse). GO: transcription factor ; (2597 1.0e-05 -0.8)

7 U42408 a ladinin 1. GO: basement membrane ; structural molecule ; (15551 1.1e-05 -0.8)

8 X94453 a pyrroline-5-carboxylate synthetase (glutamate gamma-semialdehyde synthetase). GO: proline biosynthesis ; N-acetyl-gamma-glutamyl-phosphate reductase ; (3532 2.3e-05 -0.4)

10 U72649 a BTG family, member 2. GO: DNA repair ; tumor suppressor ; cell cycle regulator ; transcription factor ; DNA damage response, activation of p53 ; negative regulation of cell proliferation ; (17310 3.0e-05 -0.9)

14 U49352 a 2,4-dienoyl CoA reductase 1, mitochondrial. GO: mitochondrion ; 2,4-dienoyl-CoA reductase (NADPH) ; (8385 3.7e-05 -1.0)

15 Y08999 a actin related protein 2/3 complex, subunit 1A (41 kD). GO: actin binding ; actin cytoskeleton ; regulation of cell shape and cell size ; actin cytoskeleton reorganization ; (6928 4.3e-05 -0.8)

17 U90549 a high-mobility group (nonhistone chromosomal) protein 17-like 3. (6361 5.7e-05 -0.6)

21 M99701 a transcription elongation factor A (SII)-like 1. GO: nucleus ; transcription factor ; RNA polymerase II transcription factor ; negative regulation of transcription from Pol II promoter ; (5038 7.2e-05 -0.6)

22 M60094 r H1 histone family, member T (testis-specific). GO: spermatogenesis ; (2098 9.5e-05 -0.5)

26 Z48199 a syndecan 1. GO: syndecan ; integral plasma membrane proteoglycan ; (25909 1.2e-04 -1.0)


27 Z22548 a peroxiredoxin 2. GO: cytoplasm ; killer activity ; electron transporter ; thioredoxin peroxidase ; oxidative stress response ; (16458 1.2e-04 -0.9)

29 M58525 s catechol-O-methyltransferase. GO: microsome ; soluble fraction ; O-methyltransferase ; (14923 1.4e-04 -1.4)

30 Z23064 a RNA binding motif protein, X chromosome. (5540 1.6e-04 -0.6)

31 U90916 a NA. (5776 1.8e-04 -2.0)

33 U59914 a MAD, mothers against decapentaplegic homolog 6 (Drosophila). GO: protein binding ; signal transducer ; inhibitory SMAD protein ; receptor signaling protein serine/threonine kinase signaling protein ; (1622 2.4e-04 -1.6)

34 U13991 a TAF10 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 30 kD. GO: TFIID complex ; RNA polymerase II transcription factor ; (5618 2.5e-04 -0.6)

35 L07956 a glucan (1,4-alpha-), branching enzyme 1 (glycogen branching enzyme, Andersen disease, glycogen storage disease type IV). GO: energy pathways ; glycogen metabolism ; 1,4-alpha-glucan branching enzyme ; (1850 2.7e-04 -0.7)

38 X75861 a testis enhanced gene transcript (BAX inhibitor 1). GO: nucleus ; insoluble fraction ; endoplasmic reticulum ; integral plasma membrane protein ; (21676 3.4e-04 -0.8)

40 X87176 a hydroxysteroid (17-beta) dehydrogenase 4. GO: peroxisome ; sterol carrier ; sterol transporter ; estradiol 17 beta-dehydrogenase ; (3610 3.5e-04 -1.0)

43 U25789 a ribosomal protein L21. GO: RNA binding ; protein biosynthesis ; structural constituent of ribosome ; cytosolic large ribosomal subunit (sensu Eukarya) ; (22739 4.0e-04 -0.4)

45 X99325 a serine/threonine kinase 25 (STE20 homolog, yeast). GO: protein kinase ; signal transduction ; oxidative stress response ; (5312 4.1e-04 -0.2)

46 S73591 a thioredoxin interacting protein. (28908 4.1e-04 -1.1)

47 X68194 a synaptophysin-like protein. GO: synaptic vesicle ; synaptic transmission ; integral membrane protein ; non-selective vesicle transport ; integral plasma membrane protein ; (6522 4.2e-04 -1.2)

51 M36429 s guanine nucleotide binding protein (G protein), beta polypeptide 2. GO: heterotrimeric G-protein GTPase, beta-subunit ; G-protein coupled receptor protein signaling pathway ; (8871 4.2e-04 -0.5)

52 D45370 a adipose specific 2. (36202 4.4e-04 -0.8)

53 U96915 a sin3-associated polypeptide, 18kD. GO: transcription co-repressor ; histone deacetylase complex ; regulation of transcription from Pol II promoter ; (8254 4.5e-04 -0.8)

56 HG311-HT ribosomal protein L24. GO: RNA binding ; protein biosynthesis ; structural constituent of ribosome ; cytosolic large ribosomal subunit (sensu Eukarya) ; (27082 5.1e-04 -0.4)

57 Y08915 a immunoglobulin (CD79A) binding protein 1. (7050 5.3e-04 -0.7)

59 Y00815 a protein tyrosine phosphatase, receptor type, F. GO: cell adhesion ; protein tyrosine phosphatase ; integral plasma membrane protein ; transmembrane receptor protein tyrosine phosphatase ; transmembrane receptor protein tyrosine phosphatase signaling pathway ; (12545 5.4e-04 -0.8)

62 U03886 a GS2 gene. (1596 5.9e-04 -0.4)


63 S82470 a leukocyte receptor cluster (LRC) member 4. (12237 6.3e-04 -0.5)

64 U77948 a general transcription factor II, i. GO: protein binding ; signal transduction ; transcription factor ; transcription initiation from Pol II promoter ; general RNA polymerase II transcription factor ; (9029 6.6e-04 -0.8)

69 D87453 a mitochondrial ribosomal protein S27. (2162 7.4e-04 -0.7)

74 U04241 a amino-terminal enhancer of split. GO: development ; histogenesis and organogenesis ; (22909 7.8e-04 -0.5)

75 Z46788 a cylicin, basic protein of sperm head cytoskeleton 2. GO: structural constituent of cytoskeleton ; regulation of cell shape and cell size ; (1446 7.8e-04 -0.8)

78 D16481 a hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/enoyl-Coenzyme A hydratase (trifunctional protein), beta subunit. GO: mitochondrion ; enoyl-CoA hydratase ; mitochondrial membrane ; fatty acid beta-oxidation ; acetyl-CoA C-acyltransferase ; 3-hydroxyacyl-CoA dehydrogenase ; (9151 8.0e-04 -0.7)

81 X82676 a protein tyrosine phosphatase, non-receptor type 14. GO: protein amino acid dephosphorylation ; protein tyrosine phosphatase ; (1670 8.3e-04 -0.3)

82 J04093 s UDP glycosyltransferase 1 family, polypeptide A6. (10672 8.4e-04 -1.0)

85 X91788 a chloride channel, nucleotide-sensitive, 1A. GO: vision ; circulation ; plasma membrane ; small molecule transport ; auxiliary transport protein ; (4558 8.9e-04 -0.5)

86 X76013 a glutaminyl-tRNA synthetase. GO: cytoplasm ; soluble fraction ; protein biosynthesis ; glutamine-tRNA ligase ; glutaminyl-tRNA aminoacylation ; (12086 8.9e-04 -0.8)

90 M37104 a ATP synthase, H+ transporting, mitochondrial F0 complex, subunit F6. GO: transporter ; mitochondrion ; energy pathways ; membrane fraction ; adenosinetriphosphatase ; mitochondrial inner membrane ; hydrogen-transporting two-sector ATPase ; (8244 9.6e-04 -0.8)

92 L20298 a core-binding factor, beta subunit. GO: oncogenesis ; transcription factor ; transcription from Pol II promoter ; RNA polymerase II transcription factor ; (5129 9.8e-04 -0.6)

94 U90915 a cytochrome c oxidase subunit IV isoform 1. GO: energy pathways ; cytochrome c oxidase ; (24108 9.9e-04 -0.7)

97 X98307 a UV-B repressed sequence, HUR 7. (2710 1.0e-03 -0.3)

99 M35128 a cholinergic receptor, muscarinic 1. GO: oncogenesis ; neurogenesis ; plasma membrane ; membrane fraction ; cell proliferation ; signal transduction ; protein modification ; protein kinase C activation ; integral plasma membrane protein ; muscarinic acetylcholine receptor ; positive regulation of cell proliferation ; phosphatidylinositol-4,5-bisphosphate hydrolysis ; G-protein coupled receptor protein signaling pathway ; acetyl choline receptor signaling, muscarinic pathway ; muscarinic acetyl choline receptor, phospholipase C activating pathway ; (9906 1.1e-03 -0.5)

100 U94855 a eukaryotic translation initiation factor 3, subunit 5 (epsilon, 47kD). GO: translation initiation factor ; regulation of translational initiation ; eukaryotic translation initiation factor 3 complex ; (13822 1.1e-03 -0.8)
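Table 3 summarizes each gene's change as an average log fold change between the last and the first category. The exact calculation belongs to Section 2.7 and is not reproduced here; the sketch below assumes one common definition, the mean log2 expression of the last category minus that of the first (equivalently, the log of the ratio of geometric means). The base-2 logarithm, the function name and the input values are all assumptions for illustration.

```python
import math
from statistics import mean
from typing import Sequence


def average_log_fold_change(first: Sequence[float], last: Sequence[float],
                            base: float = 2.0) -> float:
    """Mean log expression in the last category minus that in the first.

    Assumed definition (not taken from the report): the difference of the
    per-category means of log-transformed expression levels, i.e. the log of
    the ratio of geometric means. Negative values mean downregulation.
    """
    if not first or not last:
        raise ValueError("each category needs at least one chip")
    if min(first) <= 0 or min(last) <= 0:
        raise ValueError("expression levels must be positive to take logarithms")
    mean_log_first = mean(math.log(x, base) for x in first)
    mean_log_last = mean(math.log(x, base) for x in last)
    return mean_log_last - mean_log_first


# Hypothetical expression levels for one gene on the chips of two categories.
print(average_log_fold_change([800.0, 800.0], [200.0, 200.0]))  # four-fold drop
```

Averaging on the log scale keeps a two-fold increase and a two-fold decrease symmetric (+1 and -1 in log2 units), which is why negative values in Table 3 mark downregulated genes.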


We expect 0.00163262 false positive predictions of differential regulation with a P-value as small as that of the top ranking gene (after correction for multiple testing of 7129 genes). We expect 7.81381 false positive predictions of differential regulation with a P-value as small as that of the bottom ranking gene (after the same correction for multiple testing).
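Both figures are consistent with multiplying a gene's P-value by the 7129 genes tested: 2.3e-07 × 7129 ≈ 0.0016 and 1.1e-03 × 7129 ≈ 7.8 (the P-values in the tables are rounded, so the products differ slightly from the figures quoted above). A minimal sketch of that calculation, assuming it is the one the report used; the function name is illustrative.

```python
def expected_false_positives(p_value: float, n_tests: int) -> float:
    """Expected number of genes reaching a P-value this small by chance alone.

    Under the null hypothesis a (continuous) P-value is uniform on [0, 1], so
    by linearity of expectation about p_value * n_tests of n_tests unchanged
    genes are expected to reach a P-value at least this small.
    Assumption: the report's figures equal P-value x number of genes tested.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return p_value * n_tests


N_GENES = 7129  # genes tested, as stated in the report
for label, p in [("top ranking gene", 2.3e-07), ("bottom ranking gene", 1.1e-03)]:
    print(f"{label}: {expected_false_positives(p, N_GENES):.3g} expected false positives")
```

Read this way, the top gene is very unlikely to be a chance finding, while roughly eight genes with P-values as small as the bottom one would be expected by chance if most genes were unchanged, so a handful of the 100 listed genes are probably false positives.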

In the Adobe Acrobat (PDF) version of this report, the probe ID is hyperlinked to the LocusLink database (if available). Clicking on the probe ID will take you to a detailed description of the gene in that database.

3.5 Functional categories

The top ranking genes that have a function annotated by Gene Ontology terms have been placed into functional and process categories as defined by the Gene Ontology Consortium. Figure 5 shows the distribution of the upregulated and downregulated genes by function. Whether a gene counts as upregulated or downregulated is determined by comparing the last category with the first. Figure 6 compares upregulated and downregulated genes directly by category.
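The grouping behind Figures 5 and 6 amounts to a tally per direction of change. The sketch below assumes each annotated gene comes with its log fold change (last category versus first) and its top-level Gene Ontology function categories; counting a gene once in every category it belongs to is an assumption, the function name is illustrative, and the example genes are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Gene = Tuple[float, Sequence[str]]  # (log fold change, GO function categories)


def tally_categories(genes: Iterable[Gene]) -> Tuple[Counter, Counter]:
    """Count GO function categories separately for up- and downregulated genes.

    Genes without a category contribute nothing (only annotated genes are
    shown in Figure 5). A gene with several categories is counted once in
    each. A log fold change of exactly zero is treated as neither up nor down.
    """
    up: Counter = Counter()
    down: Counter = Counter()
    for log_fold, categories in genes:
        distinct = set(categories)  # avoid counting a repeated term twice
        if log_fold > 0:
            up.update(distinct)
        elif log_fold < 0:
            down.update(distinct)
    return up, down


# Hypothetical input: three annotated genes and one without a category.
example = [(0.7, ["enzyme"]), (0.4, ["signal transducer"]),
           (-0.8, ["binding", "enzyme"]), (-1.0, [])]
up, down = tally_categories(example)
print("up:", dict(up))
print("down:", dict(down))
```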

3.6 Prediction of orphan function

Among the top ranking genes are genes with unknown function. For those genes where the complete amino acid sequence is known or predicted, the ProtFun software was used to predict the function in general categories (Table 4).

Table 4: ProtFun prediction of orphan gene function, if any.

Gene | ProtFun predicted categories
D29642_at | Purines and pyrimidines; Enzyme; Ligase; Ion channel
D45370_at | Energy metabolism; Nonenzyme; Transcription regulation
U90915_at | Translation; Nonenzyme; Ion channel
D79984_s_at | Cell envelope; Enzyme; Transporter

3.7 Signal transduction pathway analysis


[Figure 5, two panels. Functional Categories of upregulated genes: binding (2), cell adhesion molecule (2), chaperone (2), enzyme (7), enzyme regulator (1), signal transducer (12), transcription regulator (3). Functional Categories of downregulated genes: binding (7), enzyme (11), signal transducer (1), transcription regulator (5), structural molecule (2), translation regulator (1), transporter (4).]

Figure 5: Gene ontology function categories of those top ranking genes that have been annotated. The number of genes in each category is shown in parentheses. Note that only a fraction of the top ranking genes have been categorized with a gene ontology function.


[Figure 6 chart: number of genes (0 to 12) per Gene Ontology function category: binding, cell adhesion molecule, chaperone, enzyme, enzyme regulator, signal transducer, transcription regulator, structural molecule, translation regulator, transporter.]

Figure 6: Gene ontology function categories of those top ranking genes that have been annotated. Upregulated genes are shown in red, downregulated genes are shown in green.


The top genes were searched against the TRANSPATH¹³ signal transduction database (www.transpath.de or www.gene-regulation.com). Table 5 shows the results.

Table 5: Table of top ranking genes found in TRANSPATH. Expression refers to the absolute expression level of the gene on the first chip, the P-value of differential expression and the log fold change in expression. Pathway refers to the name of the pathway in TRANSPATH in which the gene was found, and the gene name refers to the name used for the gene in that pathway. If you click on a gene identifier, your browser will take you to a database description of it.

Gene | Expression | Gene name in pathway | Pathway | Figure
U17743 s | (977 8.2e-04 0.1) | MKK4 | p53 | 7

The figures shown on the following pages give a schematic overview of the signal transduction pathways in which differentially expressed genes were found. Remember that the signal is usually transmitted by protein-to-protein contact. Such protein-to-protein contact is not detected in a DNA microarray experiment. What is detected instead is whether any genes encoding the proteins in the pathway are regulated, or whether any target genes of the pathway are regulated.

The signal transduction pathway analysis was extended beyond the top ranking genes to look for all genes in the experiment that could be mapped to a TRANSPATH annotated pathway. The purpose is to discover pathways containing a number of differentially regulated genes, even if those genes do not pass a statistical significance test individually.

Figure 8 shows all the TRANSPATH pathways in which genes were found andsummarizes their rank in the statistical analysis.
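Figure 8 plots, for every TRANSPATH pathway, the P-values of all genes mapped to it. The same grouping can be sketched as follows. The probe-to-pathway mapping would come from TRANSPATH; the function name, the 1e-3 cut-off (an arbitrary illustration) and the probe and pathway names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def summarize_pathways(p_values: Dict[str, float],
                       pathways: Dict[str, Sequence[str]],
                       threshold: float = 1e-3) -> List[Tuple[str, int, int, float]]:
    """Group gene P-values by pathway, in the spirit of Figure 8.

    p_values maps probe ID -> P-value from the statistical analysis.
    pathways maps probe ID -> names of the pathways the gene was mapped to.
    Returns (pathway, genes mapped, genes with P <= threshold, smallest P),
    with pathways holding the most low P-values first.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for probe, names in pathways.items():
        if probe not in p_values:
            continue  # no P-value available for this probe
        for name in set(names):
            grouped[name].append(p_values[probe])
    summary = [(name, len(ps), sum(p <= threshold for p in ps), min(ps))
               for name, ps in grouped.items()]
    summary.sort(key=lambda row: (-row[2], row[3]))
    return summary


# Hypothetical probe IDs and pathway names, for illustration only.
p_values = {"probe_1": 1e-4, "probe_2": 5e-4, "probe_3": 0.2, "probe_4": 0.9}
pathways = {"probe_1": ["pathway_A"], "probe_2": ["pathway_A", "pathway_B"],
            "probe_3": ["pathway_B"], "probe_4": ["pathway_C"],
            "probe_5": ["pathway_C"]}
for row in summarize_pathways(p_values, pathways):
    print(row)
```

A pathway with several genes near the top of such a list is the numerical counterpart of a row that "stands out from the background" in Figure 8.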

¹³ Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. "TRANSPATH: an integrated database on signal transduction and a tool for array analysis." Nucleic Acids Res. 2003 Jan 1;31(1):97-100.


Figure 7: The p53 signal transduction pathway.


[Figure 8 plot: P-value of genes (x-axis, logarithmic, 1e-03 to 1e+03) against pathway number (y-axis, 0 to 30); each point is labeled with its TRANSPATH pathway name, such as p53, beta-catenin, E2F, TGFbetamap, cancernet, IL-1, IL-8, wnt and apoptosis.]

Figure 8: A list of all signal transduction pathways in which genes were found. The x-axis shows the P-value of each gene assigned to each pathway. A P-value close to 1 means there is no evidence that the gene changed in the experiment. The smaller the P-value, the stronger the evidence for differential regulation. Pathways with differential expression should stand out from the background level.


3.8 Metabolic pathway analysis

A pathway analysis was performed on the top ranking genes by running them against the KEGG database of cellular pathways. Table 6 shows the results.

Table 6: Table of top ranking genes found in KEGG. The pathway of the top gene can be seen in Figure 9, and the E.C. number refers to the step in that pathway. If you click on a pathway name, your browser will take you to a figure of the pathway. You can locate the E.C. numbers on the figures. If you click on a gene identifier, your browser will take you to a database description of it.

Gene | Description (expression level, P-value, log fold change) | Pathway
M97347 s | beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase [EC:2.4.1.102] (629 5.8e-05 0.2) | O-Glycans biosynthesis
L07956 a | 1,4-alpha-glucan branching enzyme [EC:2.4.1.18] (1850 2.7e-04 -0.7) | Starch and sucrose metabolism
X87176 a | HSD17B; estradiol 17beta-dehydrogenase [EC:1.1.1.62] (3610 3.5e-04 -1.0) | Androgen and estrogen metabolism
M98539 a | prostaglandin-H2 D-isomerase [EC:5.3.99.2] (4750 7.6e-04 0.7) | Prostaglandin and leukotriene metabolism
D16481 a | fadA; acetyl-CoA acyltransferase [EC:2.3.1.16] (9151 8.0e-04 -0.7) | Benzoate degradation via hydroxylation
X76013 a | glutaminyl-tRNA synthetase [EC:6.1.1.18] (12086 8.9e-04 -0.8) | Aminoacyl-tRNA biosynthesis
M37104 a | H+-transporting ATPase [EC:3.6.3.14] (8244 9.6e-04 -0.8) | Photosynthesis

The KEGG pathway analysis was extended beyond the top ranking genes to look for all genes in the experiment that could be mapped to a KEGG pathway. The purpose is to discover pathways containing a number of differentially regulated genes, even if those genes do not pass a statistical significance test individually.

Figure 10 shows all the KEGG pathways in which genes were found and summarizes their rank in the statistical analysis.
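Figure 10 leaves it to the eye to decide which pathways stand out from the background. The report does not describe a formal test; one standard way to quantify this, shown here as an alternative rather than as the method used, is a one-sided hypergeometric over-representation test. With N genes on the array, K of them mapped to a pathway and n genes in the top ranking list, it gives the probability of finding k or more pathway genes among the top ranking ones by chance. The function name and the example pathway are hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P-value, P(X >= k).

    N: genes on the array; K: genes mapped to the pathway;
    n: genes in the top ranking list; k: pathway genes in that list.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("need 0 <= K <= N, 0 <= n <= N and k >= 0")
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)  # exact integer arithmetic, then one division


# Hypothetical pathway: 40 of the 7129 genes, 3 of them among the top 100.
print(overrepresentation_p(3, 7129, 40, 100))
```

When many pathways are screened this way, the resulting P-values need their own correction for multiple testing, just like the per-gene P-values.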

3.9 Clustering of Genes

The expression of the top ranking genes in each of the experiments was visualized by clustering with the ClusterExpress software (Figure 11).

A number of K-means clusterings were performed as well. First, the number of clusters, K, was optimized by measuring how it affects the quality of the clustering (Figure 12). Then a K-means clustering with the optimal number of clusters, 2, was performed (Figure 13).
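The report does not say which quality measure was used to choose K. As an illustration only, the sketch below runs plain K-means (Lloyd's algorithm) on gene expression profiles, one vector per gene with one value per chip, and scores each K with the mean silhouette width, a common choice that may differ from the measure behind Figure 12. The function names and the example profiles are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[float]  # expression of one gene across the chips


def kmeans(points: List[Profile], k: int, iterations: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's K-means with random initial centres; returns a label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assign every point to its nearest centre (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Move each centre to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Profile], labels: List[int]) -> float:
    """Mean silhouette width: near 1 for tight, well-separated clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c: int) -> float:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(ds) / len(ds)

        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # usual convention for a point alone in its cluster
            continue
        a = mean_dist(labels[i])  # cohesion: own cluster
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points: List[Profile], k_values: Sequence[int] = range(2, 6),
             seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Return the K with the highest mean silhouette width, and all scores."""
    scores: Dict[int, float] = {}
    for k in k_values:
        if k >= len(points):
            continue  # need more genes than clusters
        labels = kmeans(points, k, seed=seed)
        if len(set(labels)) < 2:
            continue  # degenerate run: silhouette undefined
        scores[k] = silhouette(points, labels)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical profiles: four genes low on every chip, four genes high.
profiles = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.0],
            [5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0], [5.1, 5.1, 5.0]]
best_k, scores = choose_k(profiles)
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

Splitting a tight group lowers the silhouette of its members, so the score peaks at the number of groups that are genuinely separated; real expression data are rarely this clean, and the curve over K (as in Figure 12) is usually flatter.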


Figure 9: The KEGG pathway of the highest ranking gene from Table 6


[Figure 10 plot: P-value of genes (x-axis, logarithmic, 1e-04 to 1e+04) against KEGG pathway number (y-axis, 0 to 100); each point is labeled with the name of its KEGG pathway, starting with the pathways listed in Table 6.]

Figure 10: A list of all KEGG pathways in which genes were found. The x-axis shows the P-value of each gene assigned to each pathway. A P-value close to 1 means there is no evidence that the gene changed in the experiment; the smaller the P-value, the stronger the evidence for differential regulation. Pathways with differential expression should stand out from the background level.
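One way to make "standing out from the background" concrete is to compare the fraction of a pathway's genes below a P-value cutoff with the same fraction over all pathway-assigned genes. The Python sketch below is illustrative only: the input mapping, the 0.05 cutoff and the function name `enrichment_ratios` are assumptions, and it is not the procedure used to draw Figure 10, which simply plots the P-values.

```python
from typing import Dict, List


def enrichment_ratios(pathway_pvalues: Dict[str, List[float]],
                      cutoff: float = 0.05) -> Dict[str, float]:
    """Fraction of genes with P < cutoff in each pathway, divided by the same
    fraction over all pathway-assigned genes (the background).

    Ratios above 1 flag pathways with more small P-values than the background.
    A gene assigned to several pathways is counted once per pathway here.
    """
    all_p = [p for ps in pathway_pvalues.values() for p in ps]
    background = sum(p < cutoff for p in all_p) / len(all_p)
    ratios: Dict[str, float] = {}
    for name, ps in pathway_pvalues.items():
        if not ps:
            continue  # skip pathways without any measured genes
        fraction = sum(p < cutoff for p in ps) / len(ps)
        # With no significant genes anywhere, the ratio is undefined.
        ratios[name] = fraction / background if background > 0 else float("nan")
    return ratios


# Hypothetical P-values for two pathways.
example = {"Pathway A": [0.001, 0.01, 0.2], "Pathway B": [0.5, 0.9, 0.7, 0.03]}
print(enrichment_ratios(example))
```

In this toy input, Pathway A has two of its three genes below the cutoff against three of seven overall, so its ratio is above 1, while Pathway B falls below 1. A formal test (for example a hypergeometric tail) could replace the plain ratio.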


[Figure 11 heatmap: columns are the 18 chips (6 Ta, 7 T1, 5 T2); rows are the top 100 genes in clustering order.]

Row labels in figure order (probe set ID, gene number, P-value):
U94855_at 100 1.1e-03; U90549_at 17 5.7e-05; X76013_at 86 8.9e-04; HG311-HT311_at 56 5.1e-04; Z23064_at 30 1.6e-04; Y08915_at 57 5.3e-04; M81637_at 4 5.0e-06; U13991_at 34 2.5e-04; D87453_at 69 7.4e-04; X98307_at 97 1.0e-03;
Y08999_at 15 4.3e-05; M37104_at 90 9.6e-04; L20298_at 92 9.8e-04; L07956_at 35 2.7e-04; Z46788_at 75 7.8e-04; U90916_at 31 1.8e-04; U49352_at 14 3.7e-05; D16481_at 78 8.0e-04; X68194_at 47 4.2e-04; U42408_at 7 1.1e-05;
Y11999_at 2 1.6e-06; U72649_at 10 3.0e-05; U59914_at 33 2.4e-04; L11708_at 1 2.3e-07; M99701_at 21 7.2e-05; U68385_at 6 1.0e-05; X82676_at 81 8.3e-04; M60094_rna1_at 22 9.5e-05; M36429_s_at 51 4.2e-04; J04093_s_at 82 8.4e-04;
X87176_at 40 3.5e-04; X94453_at 8 2.3e-05; X75861_at 38 3.4e-04; U04241_at 74 7.8e-04; U25789_at 43 4.0e-04; U96915_at 53 4.5e-04; U03886_at 62 5.9e-04; X91788_at 85 8.9e-04; X99325_at 45 4.1e-04; X89426_at 60 5.6e-04;
M35128_at 99 1.1e-03; U90915_at 94 9.9e-04; Z26491_s_at 5 6.8e-06; M58525_s_at 29 1.4e-04; Z22548_at 27 1.2e-04; S73591_at 46 4.1e-04; Z48199_at 26 1.2e-04; U77948_at 64 6.6e-04; S82470_at 63 6.3e-04; Y00815_at 59 5.4e-04;
D45370_at 52 4.4e-04; M97347_s_at 18 5.8e-05; D79984_s_at 98 1.1e-03; D30715_xpt5_s_at 28 1.4e-04; X66087_at 11 3.1e-05; Z31560_s_at 91 9.7e-04; HG3928-HT4198_s_at 58 5.3e-04; U05681_s_at 61 5.8e-04; X04729_s_at 76 7.9e-04; M58597_at 44 4.1e-04;
Z34974_s_at 72 7.6e-04; M62628_s_at 9 2.5e-05; L38025_at 68 7.3e-04; HG3286-HT3463_at 37 3.0e-04; U05259_rna1_at 55 4.9e-04; U60269_cds3_at 95 9.9e-04; X93017_at 84 8.8e-04; J04990_at 77 8.0e-04; X14046_at 88 9.2e-04; AFFX-BioDn-3_at 12 3.3e-05;
L13720_at 87 9.2e-04; D29642_at 48 4.2e-04; AFFX-BioC-3_at 83 8.8e-04; HG3395-HT3573_s_at 79 8.1e-04; J00207_rna2_at 70 7.6e-04; U17743_s_at 80 8.2e-04; L36644_at 39 3.4e-04; U00928_at 65 6.6e-04; D38498_f_at 23 1.1e-04; D38437_f_at 25 1.2e-04;
X85750_at 93 9.8e-04; M28882_s_at 89 9.5e-04; M98539_at 71 7.6e-04; M74719_at 19 6.6e-05; U19713_s_at 67 7.2e-04; Y00062_at 42 3.9e-04; M57731_s_at 66 7.1e-04; X13334_at 54 4.9e-04; X17042_at 41 3.8e-04; D17357_at 49 4.2e-04;
U75362_at 36 2.9e-04; HG4069-HT4339_s_at 32 1.9e-04; M31165_at 16 4.8e-05; M29696_at 24 1.1e-04; M62505_at 20 6.7e-05; K02405_f_at 13 3.5e-05; M37766_at 3 2.9e-06; U16307_at 50 4.2e-04; L25286_s_at 73 7.7e-04; M34516_at 96 1.0e-03

Color key (log fold change): -1.68, -0.09, 1.51

Figure 11: Hierarchical clustering of the top-ranking genes based on their vector angle distance. The color scale shows, for each gene, the logarithm of the fold change relative to the average expression in the first category. For each gene, the probe set ID, the number referring to Table 2 or Table 3, and the P-value are given.
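The two quantities named in the caption can be sketched in a few lines of NumPy. This sketch assumes expression values in a genes x chips array, a boolean mask marking the chips of the first category (Ta), log2 as set in Table 10, and a floor of 1 applied before taking logarithms to mirror the "Minimum cutoff for logfold calculation" parameter; how the original software applies that cutoff is an assumption. The vector angle distance is taken as the angle, in radians, between two genes' log-fold-change profiles.

```python
import numpy as np


def log_fold_change(expr: np.ndarray, first_category: np.ndarray,
                    min_cutoff: float = 1.0) -> np.ndarray:
    """log2 fold change of every chip relative to the mean of the first category.

    expr: genes x chips expression matrix.
    first_category: boolean mask over chips (e.g. the Ta chips).
    min_cutoff: values below this are raised to it before taking logs
    (assumed to mirror Table 10's minimum cutoff).
    """
    floored = np.maximum(expr, min_cutoff)
    reference = floored[:, first_category].mean(axis=1, keepdims=True)
    return np.log2(floored / reference)


def vector_angle_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in radians between two profiles: 0 means the same direction,
    pi means exactly opposite regulation. All-zero profiles are undefined."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to [-1, 1] so rounding errors cannot push arccos out of its domain.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


# Hypothetical data: two genes, two Ta chips followed by one T2 chip.
expr = np.array([[2.0, 2.0, 8.0], [4.0, 4.0, 1.0]])
is_ta = np.array([True, True, False])
lfc = log_fold_change(expr, is_ta)
print(lfc)                                   # [[0, 0, 2], [0, 0, -2]]
print(vector_angle_distance(lfc[0], lfc[1]))  # pi: opposite directions
```

Because the angle ignores the length of each profile, two genes that change in the same direction but by different amounts end up close together, which is the property that makes this distance suitable for grouping co-regulated genes.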


[Figure 12 plot: clustering quality (y-axis, 0.50 to 0.70) against the number of clusters K (x-axis, 2 to 9)]

Figure 12: Optimization of the number of clusters K. For each value of K, clustering quality was measured as the ratio of between-cluster variance to within-cluster variance; the higher this ratio, the better the separation into clusters.
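The quality measure in Figure 12 can be sketched as a ratio of sums of squares. This is an assumption about normalization: the report does not say whether the variances are divided by degrees of freedom (as in the Calinski-Harabasz index), so the sketch uses plain sums of squares, and the function name `between_within_ratio` is hypothetical.

```python
import numpy as np


def between_within_ratio(data: np.ndarray, labels: np.ndarray) -> float:
    """Between-cluster sum of squares divided by within-cluster sum of squares.

    data: items x features (e.g. genes x chips); labels: cluster index per item.
    Higher values mean clusters that are tighter and further apart.
    """
    grand_mean = data.mean(axis=0)
    between = 0.0
    within = 0.0
    for k in np.unique(labels):
        members = data[labels == k]
        centroid = members.mean(axis=0)
        # Spread of the cluster centroid around the grand mean, weighted by size.
        between += len(members) * float(np.sum((centroid - grand_mean) ** 2))
        # Spread of the members around their own centroid.
        within += float(np.sum((members - centroid) ** 2))
    return between / within


points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = between_within_ratio(points, np.array([0, 0, 1, 1]))  # split along x
poor = between_within_ratio(points, np.array([0, 1, 0, 1]))  # split along y
print(good, poor)  # 25.0 0.04
```

To reproduce a curve like Figure 12, one would cluster the data once for each K from 2 to 9 and plot this ratio against K. Note that an unnormalized ratio tends to grow with K, which is one reason indices such as Calinski-Harabasz rescale it.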


[Figure 13 heatmap: columns are the 18 chips (6 Ta, 7 T1, 5 T2); rows are the top 100 genes grouped into the two K-means clusters listed below.]

Cluster 1, 48 genes (probe set ID, gene number, P-value):
M37766_at 3 2.9e-06; AFFX-BioDn-3_at 12 3.3e-05; K02405_f_at 13 3.5e-05; M31165_at 16 4.8e-05; M29696_at 24 1.1e-04; HG3286-HT3463_at 37 3.0e-04; Y00062_at 42 3.9e-04; M58597_at 44 4.1e-04;
D17357_at 49 4.2e-04; U16307_at 50 4.2e-04; X13334_at 54 4.9e-04; L38025_at 68 7.3e-04; M98539_at 71 7.6e-04; AFFX-BioC-3_at 83 8.8e-04; X93017_at 84 8.8e-04; L13720_at 87 9.2e-04;
M28882_s_at 89 9.5e-04; X85750_at 93 9.8e-04; U60269_cds3_at 95 9.9e-04; M34516_at 96 1.0e-03; M74719_at 19 6.6e-05; M62505_at 20 6.7e-05; HG4069-HT4339_s_at 32 1.9e-04; U75362_at 36 2.9e-04;
X17042_at 41 3.8e-04; D29642_at 48 4.2e-04; U05259_rna1_at 55 4.9e-04; M57731_s_at 66 7.1e-04; U19713_s_at 67 7.2e-04; L25286_s_at 73 7.7e-04; J04990_at 77 8.0e-04; HG3395-HT3573_s_at 79 8.1e-04;
X14046_at 88 9.2e-04; Z31560_s_at 91 9.7e-04; D79984_s_at 98 1.1e-03; M62628_s_at 9 2.5e-05; X66087_at 11 3.1e-05; D38498_f_at 23 1.1e-04; D38437_f_at 25 1.2e-04; D30715_xpt5_s_at 28 1.4e-04;
L36644_at 39 3.4e-04; HG3928-HT4198_s_at 58 5.3e-04; U05681_s_at 61 5.8e-04; U00928_at 65 6.6e-04; J00207_rna2_at 70 7.6e-04; Z34974_s_at 72 7.6e-04; X04729_s_at 76 7.9e-04; U17743_s_at 80 8.2e-04

Cluster 2, 52 genes (probe set ID, gene number, P-value):
L11708_at 1 2.3e-07; Y11999_at 2 1.6e-06; M81637_at 4 5.0e-06; Z26491_s_at 5 6.8e-06; U42408_at 7 1.1e-05; X94453_at 8 2.3e-05; U49352_at 14 3.7e-05; Y08999_at 15 4.3e-05;
Z48199_at 26 1.2e-04; U59914_at 33 2.4e-04; U13991_at 34 2.5e-04; X87176_at 40 3.5e-04; U25789_at 43 4.0e-04; D45370_at 52 4.4e-04; HG311-HT311_at 56 5.1e-04; S82470_at 63 6.3e-04;
U77948_at 64 6.6e-04; D87453_at 69 7.4e-04; Z46788_at 75 7.8e-04; J04093_s_at 82 8.4e-04; X91788_at 85 8.9e-04; M37104_at 90 9.6e-04; L20298_at 92 9.8e-04; U90915_at 94 9.9e-04;
X98307_at 97 1.0e-03; U68385_at 6 1.0e-05; U72649_at 10 3.0e-05; M99701_at 21 7.2e-05; Z22548_at 27 1.2e-04; M58525_s_at 29 1.4e-04; U90916_at 31 1.8e-04; L07956_at 35 2.7e-04;
X75861_at 38 3.4e-04; X99325_at 45 4.1e-04; S73591_at 46 4.1e-04; X68194_at 47 4.2e-04; Y00815_at 59 5.4e-04; U04241_at 74 7.8e-04; D16481_at 78 8.0e-04; X82676_at 81 8.3e-04;
M35128_at 99 1.1e-03; U90549_at 17 5.7e-05; M60094_rna1_at 22 9.5e-05; Z23064_at 30 1.6e-04; M36429_s_at 51 4.2e-04; U96915_at 53 4.5e-04; Y08915_at 57 5.3e-04; X89426_at 60 5.6e-04;
U03886_at 62 5.9e-04; X76013_at 86 8.9e-04; U94855_at 100 1.1e-03; M97347_s_at 18 5.8e-05

Color key (log fold change): -1.68, -0.09, 1.51

Figure 13: K-means clustering of the top-ranking genes based on their vector angle distance. The color scale shows, for each gene, the logarithm of the fold change relative to the average expression in the first category. For each gene, the probe set ID, the number referring to Table 2 or Table 3, and the P-value are given. The number of clusters, 2, was selected by optimization (Figure 12).
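K-means with a vector angle distance can be approximated by spherical K-means: every profile is scaled to unit length, genes are assigned to the centroid with the largest cosine similarity (the smallest angle), and centroids are re-normalized after each update. This is a sketch of that common variant, not necessarily what the original software does; the function name, the random initialization and the iteration limit are assumptions.

```python
import numpy as np


def spherical_kmeans(data: np.ndarray, k: int, n_iter: int = 100,
                     seed: int = 0) -> np.ndarray:
    """Cluster rows of `data` (genes x chips) by the angle between profiles.

    Returns one cluster label per row. Rows must not be all zero.
    """
    rng = np.random.default_rng(seed)
    # Unit-length profiles: only their direction (angle) matters from here on.
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Start from k distinct genes chosen at random (fancy indexing copies).
    centroids = unit[rng.choice(len(unit), size=k, replace=False)]
    labels = np.full(len(unit), -1)
    for _ in range(n_iter):
        # Largest cosine similarity corresponds to the smallest vector angle.
        new_labels = np.argmax(unit @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = unit[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                direction = members.sum(axis=0)
                centroids[j] = direction / np.linalg.norm(direction)
    return labels


# Hypothetical profiles pointing in two distinct directions.
profiles = np.array([[1.0, 0.1], [2.0, 0.1], [3.0, 0.2],
                     [0.1, 1.0], [0.1, 2.0], [0.2, 3.0]])
print(spherical_kmeans(profiles, k=2))
```

The first three and last three profiles should land in different clusters even though their lengths differ, because only direction enters the assignment. The cluster numbers themselves are arbitrary.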


3.10 Promoter analysis

The upstream regions of the genes in each K-means cluster were extracted, and the program saco patterns¹⁴ was run on each cluster to identify overrepresented patterns in these regions. Table 7 shows the most overrepresented patterns for each cluster.

Table 7: Analysis of the upstream regions of the K-means clusters with saco patterns. The number of exact matches to each pattern is shown in the cluster (cluster size given in parentheses) and in the background data set (set size given in parentheses), together with the negative logarithm of the probability of overrepresentation from the hypergeometric distribution. For each pattern, the genes in which it was found are listed; if a pattern was found more than once in a gene, that gene appears more than once in the list. The sequence numbers refer to the numbers used in the clustering and in the tables of up- and down-regulated genes.

Pattern   -log(P)   In cluster   In bg (4409 genes)   Found in genes
Cluster number 1 (cluster size=48, upstream regions extracted=25)
GTATT     0.93      22           2169                 98 70 70 70 70 70 77 77 77 77 13 13 13 13 73 39 39 39 39 89 89 24 24 24 24 3 3 19 55 55 55 61 50 50 80 80 80 80 80 67 36 76 76 88 41 41 41 11 11 93 93 93 93 84 84 84 91 91 91 72
Cluster number 2 (cluster size=52, upstream regions extracted=40)
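The two ingredients behind Table 7 can be sketched as follows: counting exact, possibly overlapping matches of a pattern in each upstream sequence (so that a gene with two matches appears twice, as in the "Found in genes" column), and the upper tail of the hypergeometric distribution. How saco patterns defines its counts (per match or per sequence, one strand or both) is not stated here, so this sketch is an assumption and is not expected to reproduce the table's values; the function names are hypothetical.

```python
from math import comb, log10
from typing import Dict, List


def pattern_hits(pattern: str, sequences: Dict[str, str]) -> List[str]:
    """Gene IDs with an exact match to `pattern` on the given strand,
    repeated once per (possibly overlapping) match."""
    hits = []
    for gene, seq in sequences.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append(gene)
            start = seq.find(pattern, start + 1)
    return hits


def hypergeom_upper_tail(k: int, population: int, successes: int,
                         draws: int) -> float:
    """P(X >= k) when `draws` items are taken without replacement from
    `population` items, `successes` of which are marked."""
    upper = min(successes, draws)
    favourable = sum(comb(successes, x) * comb(population - successes, draws - x)
                     for x in range(k, upper + 1))
    # Exact integer arithmetic; the final division returns a float.
    return favourable / comb(population, draws)


def neg_log10_p(k: int, population: int, successes: int, draws: int) -> float:
    """-log10 of the overrepresentation probability (requires P > 0)."""
    return -log10(hypergeom_upper_tail(k, population, successes, draws))


upstream = {"g1": "AAGTATTGTATT", "g2": "CCCCCC", "g3": "GTATTC"}
print(pattern_hits("GTATT", upstream))  # ['g1', 'g1', 'g3']
print(neg_log10_p(2, population=4, successes=2, draws=2))
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for population sizes in the thousands, as in the 4409-gene background.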

Overrepresentation alone is not enough to establish biological relevance. To further substantiate a pattern, its occurrences can be extracted from the upstream regions and aligned together with their flanking context. Conservation in the regions surrounding the pattern would further support its biological relevance. The final determination has to come from experimental verification, for example by site-directed mutagenesis or bandshift assays.
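Extracting each occurrence with its flanking context, as suggested above, can be sketched as follows. The function name and the default flank of 10 bases are assumptions; windows are padded with "-" at sequence ends so that the pattern sits in the same columns in every row, which makes conservation (or its absence) around the pattern easy to see.

```python
from typing import Dict, List, Tuple


def pattern_contexts(pattern: str, sequences: Dict[str, str],
                     flank: int = 10) -> List[Tuple[str, str]]:
    """Return (gene, window) for every exact match of `pattern`, with `flank`
    bases on each side, padded with '-' so all windows align on the pattern."""
    rows = []
    for gene, seq in sequences.items():
        start = seq.find(pattern)
        while start != -1:
            left = seq[max(0, start - flank):start].rjust(flank, "-")
            end = start + len(pattern)
            right = seq[end:end + flank].ljust(flank, "-")
            rows.append((gene, left + pattern + right))
            start = seq.find(pattern, start + 1)
    return rows


# Hypothetical upstream fragments for two genes.
for gene, window in pattern_contexts("GTATT", {"g1": "AAGTATTCC",
                                               "g2": "TTTCAGTATTCCAG"}, flank=3):
    print(gene, window)
```

Printed one above the other, the windows line up column by column, so shared bases just outside the pattern stand out by eye; a multiple alignment tool could take the same windows as input.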

¹⁴ Jensen, L.J. and Knudsen, S. (2000) Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics 16:326-333.


The Gibbs sampler¹⁵ was run on the same clusters as saco patterns. Rather than exact matches, the Gibbs sampler looks for degenerate patterns, which it describes with a weight matrix. For each sequence, the best match to this weight matrix is shown in the output, and the resulting alignment allows the degree of conservation to be judged. The results are shown below:

Table 8: Weight matrices describing Gibbs patterns in the upstream regions of the K-means clusters. The hypergeometric sample statistic (HYP) is given as the logarithm of the P-value, where i is the number of times the matrix matches the positive set above threshold, m is the number of times the matrix matches the negative set above threshold, and N and n are the sizes of the negative and positive sets, respectively. Matrix entries are the percentage of each base at each position. For each pattern, the genes in which it was found are listed.

Cluster number 1 (cluster size=48, upstream regions extracted=25)
HYP -4.029246, i=2, m=38, N=4434, n=25
Consensus: AGCAGCAGCAG
Found in genes: 19 19 19 19 19 72
Base    1    2    3    4    5    6    7    8    9   10   11
A      82    0    0   82    0    0  100    0    0  100    0
C       9    0  100    0    0  100    0    9  100    0    0
G       9  100    0    0  100    0    0   91    0    0  100
T       0    0    0   18    0    0    0    0    0    0    0

Cluster number 2 (cluster size=52, upstream regions extracted=40)
HYP -1.825490, i=7, m=723, N=4449, n=40
Consensus: CTGGGATTACA
Found in genes: 51 74 74 7 7 45 15 15 15 75
Base    1    2    3    4    5    6    7    8    9   10   11
A       0    0    0   14    0  100    0    0   91    0  100
C     100    9    5    0    0    0   23   18    0   91    0
G       0    0   95   86  100    0    0    0    0    9    0
T       0   91    0    0    0    0   77   82    9    0    0
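The percentage matrices and consensus strings in Table 8 relate to aligned sites in a simple way, sketched below. The scoring used to find the best match in a sequence (log2 odds against a uniform 0.25 background, with a pseudocount) is an assumption and may differ from the Gibbs sampler's own; the function names are hypothetical, and only the letters A, C, G and T are supported.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def percent_matrix(sites: List[str]) -> Dict[str, List[int]]:
    """Percentage of each base at each position of equal-length aligned sites,
    laid out like the A/C/G/T rows of Table 8."""
    width = len(sites[0])
    return {base: [round(100 * sum(site[i] == base for site in sites) / len(sites))
                   for i in range(width)]
            for base in BASES}


def consensus(matrix: Dict[str, List[int]]) -> str:
    """Most frequent base at each position (ties go to the first in ACGT)."""
    width = len(matrix["A"])
    return "".join(max(BASES, key=lambda base: matrix[base][i]) for i in range(width))


def best_match(seq: str, matrix: Dict[str, List[int]],
               pseudocount: float = 1.0) -> Tuple[int, str, float]:
    """Start, window and log2-odds score of the best-scoring window in `seq`."""
    width = len(matrix["A"])

    def score(window: str) -> float:
        total = 0.0
        for i, base in enumerate(window):
            # Pseudocounts keep bases never seen at a position from giving log2(0).
            freq = (matrix[base][i] + pseudocount) / (100 + 4 * pseudocount)
            total += math.log2(freq / 0.25)
        return total

    best_score, start = max((score(seq[i:i + width]), i)
                            for i in range(len(seq) - width + 1))
    return start, seq[start:start + width], best_score


# Hypothetical aligned sites, shorter than the 11-base matrices above.
m = percent_matrix(["AGCAG", "AGCAG", "AGCTG", "CGCAG"])
print(m["A"], consensus(m))             # [75, 0, 0, 75, 0] AGCAG
print(best_match("TTTAGCAGTTT", m)[:2])  # (3, 'AGCAG')
```

A column such as position 4 of cluster 1 (A 82, T 18) is exactly the kind of degeneracy a weight matrix captures and an exact-match pattern would miss.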

The transcription factor binding sites in Transfac¹⁶ were also checked against the same clusters. All eukaryotic factors were matched, and the results are shown below:

¹⁵ Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wootton, J.C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208-214.

¹⁶ Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S. and Wingender, E. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31(1):374-378.


Table 9: Analysis of the upstream regions of the K-means clusters with Transfac. The matches to each factor are shown for each cluster (cluster size given in parentheses). More information about the factors can be found by looking them up in the public version of Transfac at www.gene-regulation.de. For each factor, the genes in which it was found are listed; if a factor was found more than once in a gene, that gene appears more than once in the list.

Factor name   Found in sequences
Cluster number 1 (cluster size=48, upstream regions extracted=25)
CF2-II@fruit 98 11 11 11 93
ATF@unknown 89
Kr@fruit 50
Hb@fruit 39
Elk-1@human 41
HSF@fruit 98 98 70 70 70 77 13 13 39 39 39 39 39 39 39 24 24 24 24 24 3 3 3 3 3 19 19 19 19 55 55 61 50 50 50 50 50 50 50 80 80 80 80 80 80 80 80 80 67 67 67 67 36 36 36 36 36 36 36 76 76 76 76 76 76 76 76 76 76 76 76 88 88 88 88 41 41 41 11 11 93 93 93 93 93 93 84 84 84 84 91 91 72 72 72
HSF@yeast 98 98 98 70 70 70 70 70 70 70 70 77 77 77 77 77 13 13 13 13 13 13 13 39 39 39 39 39 24 24 24 24 24 24 3 3 3 3 3 3 19 19 55 55 61 61 61 61 50 50 50 50 50 50 50 50 50 80 80 80 80 80 80 80 80 80 80 67 67 36 36 36 36 76 76 76 76 76 76 76 76 76 76 76 76 76 54 88 41 41 41 41 41 11 11 11 11 11 93 93 93 93 93 93 93 93 84 84 84 84 84 91 91 91 91 72 72 72
c-Ets-1(p54)@mouse 41
NF-E2@mouse 24
CREB@human 89
CRE-BP1/c-Jun@mouse 89
Sn@fruit 76
ADR1@yeast 98 73 89 89 89 89 89 3 19 19 55 55 55 55 55 61 61 61 61 80 67 67 67 36 36 36 36 36 36 54 88 88 88 84 84 91 72 72
c-Rel@human 73
c-Ets-1(p54)@chick 41
GATA-1@mouse 77 13 67
MZF1@human 77 73 67 36 54 93 84
S8@mouse 13
CdxA@chick 55 50 80 36 36 41 91
CdxA@chick 70 70 77 19 55 55 61 50 80 36 36 88 41 84 84 91
CF1
C/EBPbeta@mouse 70
Oct-1@human 13
Lyf-1@mouse 84
NIT2@Neurospora 70 73 39 61 61 50 50 67 88 84
HSF1@human 11
HSF2@mouse 24 50 11
SRY@mouse 98 39 24 3 41 41
STRE@unknown 98 55 91
HSF@fruit 76
HSF@yeast 76
Adf-1@unknown 73 91 91
AP-1@unknown 36
AP-4@unknown 89 3 76
MyoD@unknown 89 3 88
AP-1@unknown 55 36
P@maize 39 3 55 36 91
VBP@chick 73
Nkx-2.5@mouse 88 88
Nkx-2.5@mouse 13 41
cap@unknown 77 73 73 89 89 3 19 55 55 55 61 61 61 36 76 76 54 54 88 41 11 11 84 91
Cluster number 2 (cluster size=52, upstream regions extracted=40)
Sp1@human 35 26
TtkCF2-II@fruit 18 60
CF2-II@fruit 18
Hb@fruit 100 38
HSF@fruit 78 78 78 78 78 78 78 52 52 69 69 69 35 35 35 35 35 35 35 1 1 1 1 1 92 92 99 99 99 99 99 99 99 99 51 51 51 51 51 90 90 90 90 29 29 29 22 22 18 18 18 18 18 46 46 46 46 46 63 63 62 62 62 62 62 74 74 74 34 34 34 34 34 14 14 14 14 6 6 6 6 10 10 10 64 64 64 64 64 64 17 17 17 17 100 100 100 100 53 53 53 53 38 38 38 38 38 38 40 40 40 40 40 60 60 60 60 60 8 8 8 45 45 45 59 59 59 57 57 15 15 15 30 30 30 30 5 5 5 75 75 75 26 26 26 26
HSF@yeast 78 78 78 78 78 78 52 52 52 52 69 69 69 69 69 69 35 35 35 35 35 35 35 35 1 1 1 1 1 1 1 1 1 92 92 92 92 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 51 51 51 90 90 90 90 29 22 18 18 18 46 46 63 63 62 62 62 62 62 62 62 62 62 74 74 74 74 74 34 34 34 34 34 14 14 14 14 14 14 6 10 10 64 64 64 64 64 64 17 17 100 100 100 100 100 100 53 53 53 53 53 53 47 38 38 38 38 38 38 38 38 38 38 38 38 40 40 40 40 60 60 60 60 8 8 8 8 45 45 45 45 59 59 59 57 57 57 15 15 15 15 27 30 30 30 30 5 75 75 26 26 26 26 26
MATa1@yeast 35
MATalpha2@yeast 78 22
p300@human 29 59 5
NF-E2@mouse 1
Sox-5@mouse 78 1 92 64 17 57
ADR1@yeast 78 78 52 52 52 69 35 35 35 92 29 22 22 18 46 63 63 63 63 62 74 74 74 74 74 34 14 14 6 6 6 6 10 10 64 100 100 38 38 40 45 45 59 57 57 15 27 27 27 27 27 30 5 75 26 26 26 26 26 26 26
NF-kappaBc-Rel@human 40
NF-kappaB@rat 40
deltaEF1@chick 100
GATA-1@mouse 52 29 63 63 34 38 40 5
GATA-2@human 40
GATA-3@human 40
MZF1@human 52 63 10 57
S8@mouse 59
CdxA@chick 90 18 46 62 17 17 100 45 75 75
CdxA@chick 78 90 22 18 46 62 100 100 38 38 8 8 45 15 75 75 75
dl@fruit 18 38
GATA-1@chick 1
GATA-1@chick 1
HFH-2@unknown 45
HNF-3beta@rat 90 8
Bcd@fruit 40
Lyf-1@mouse 69 7 40
NIT2@Neurospora 99 99 18 14 10 100 100 53 38 40 15
HSF1@human 30
HSF2@mouse 6 60 30
SRY@mouse 78 52 35 35 1 92 22 7 47 47 38 60
STRE@unknown 51 64 40 8 75
HSF@fruit 64
HSF@yeast 51
AP-1@unknown 27
AP-1@clawed 15 27
AP-1@unknown 27
MyoD@unknown 52 62 64 45
AP-1@unknown 27
AP-2@unknown 27
NF-kappaB@unknown 40
Sp1@unknown 47 26
P@maize 52 1 17 75
v-Myb@AMV 100 59 27 26
Skn-1@Caenorhabditis 14
Nkx-2.5@mouse 78 52 51 51 29 74 74 74 7 47 60 45 45 5
cap@unknown 78 78 52 69 35 1 99 99 51 90 74 74 74 34 7 14 14 6 6 10 64 64 64 17 53 38 40 60 45 59 15 27 27 75 26
GC
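The "Found in sequences" lists in Table 9 record one entry per match, so a gene appears as often as a factor's matrix scores above threshold in its upstream region. A minimal sketch of that kind of threshold scan over both strands follows. The log-odds matrix format, the threshold and the function names are assumptions made for illustration; Transfac's own matching tools use their own similarity scores and cut-offs.

```python
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def window_score(window: str, log_odds: List[Dict[str, float]]) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(log_odds, window))


def scan_hits(sequences: Dict[str, str], log_odds: List[Dict[str, float]],
              threshold: float) -> List[str]:
    """Gene IDs, repeated once per window on either strand scoring >= threshold.
    A palindromic site can therefore be counted twice."""
    width = len(log_odds)
    hits = []
    for gene, seq in sequences.items():
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - width + 1):
                if window_score(strand[i:i + width], log_odds) >= threshold:
                    hits.append(gene)
    return hits


# Hypothetical matrix rewarding the exact word GATA (+2 match, -2 mismatch).
gata = [{b: (2.0 if b == c else -2.0) for b in "ACGT"} for c in "GATA"]
upstream = {"g1": "CCGATACC", "g2": "TATCTT", "g3": "CCCC"}
print(scan_hits(upstream, gata, threshold=8.0))  # ['g1', 'g2']
```

In the example, g2 is found only on the reverse strand (TATC reads GATA on the opposite strand), which is why scanning both strands matters for upstream regions.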

3.11 Correspondence Analysis

A correspondence analysis was performed on the 50 top-ranking genes to look for strong associations between genes and experiments (Figure 14). Genes and experiments are projected into the same two-dimensional space. A gene that lies far from the center of the plot (0,0) is associated with an experiment if that experiment also lies far from the center in the same direction.
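Correspondence analysis of a genes x experiments table is commonly computed from the singular value decomposition of the standardized residuals. The NumPy sketch below shows that textbook route; it assumes a non-negative table with no all-zero rows or columns, and it is not necessarily how the report's software computed Figure 14. The function name is hypothetical.

```python
import numpy as np


def correspondence_analysis(table: np.ndarray, n_dims: int = 2):
    """Principal coordinates for rows (genes) and columns (experiments).

    Returns (row_coords, col_coords, singular_values). The sum of squared
    singular values is the total inertia (chi-square statistic / grand total).
    """
    p = table / table.sum()
    r = p.sum(axis=1)  # row masses
    c = p.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    # Standardized residuals: deviation from independence, scaled by masses.
    residuals = (p - expected) / np.sqrt(expected)
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    rows = (u[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]
    cols = (vt.T[:, :n_dims] * s[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols, s


# Hypothetical table: gene 0 is high in experiment 0, gene 1 in experiment 1.
counts = np.array([[10.0, 2.0], [2.0, 10.0], [5.0, 5.0]])
gene_xy, exp_xy, sv = correspondence_analysis(counts)
print(gene_xy[:, 0], exp_xy[:, 0])
```

On the first axis, gene 0 and experiment 0 come out with the same sign, and gene 2, whose profile matches the average, sits at the origin. This is the geometric reading described above: association shows up as points lying far from the center in the same direction.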


[Figure 14 plot: genes (numbered 1 to 50) and chips (labelled Ta, T1 and T2) projected onto the first two correspondence analysis axes]

Figure 14: Correspondence analysis of the top 50 ranking genes and the experiments. Genes and experiments are shown in different colors. Gene numbers refer to Table 2 or Table 3.


4 Appendix A: parameters used in this report


Table 10: Parameters set in the parameter file.

Parameter Value (options in parentheses)

Name of file none

File names 709 Tagr2.CEL, 795-4 Tagr3.CEL, 928 Tagr2.CEL, 930 Tagr2.CEL, 934 Tagr2.CEL, 968-1 Tagr2.CEL, pool T1gr3.CEL, 1098-3 T1gr3.CEL, 625 T1gr3.CEL, 812 T1gr3.CEL, 847 T1gr3.CEL, 880 T1gr3.CEL, 919 T1gr3.CEL, 1078-1 T2gr3.CEL, 1133-1 T2gr3.CEL, 1169-1 T2gr4.CEL, 875-1 T2gr3.CEL, 937-1 T2gr3.CEL

Categories B B B B B B C C C C C C C D D D D D

Chip Type HU6800 (HG Focus, HU6800, HG U95Av2, HG-U133A, MG U74Av2, RG U34A, DrosGenome1, YG S98, Ecoli, Pae G1a, AG, Other)

Compressed CEL files FALSE (TRUE FALSE)

Experiment name Staging of Bladder Tumors.

Author Steen Knudsen

Organism hsa (bsu rno pae eco sce dro mmu pae)

B Ta

C T1

D t2

Category Names Ta Ta Ta Ta Ta Ta T1 T1 T1 T1 T1 T1 T1 T2 T2 T2 T2 T2

Normalization method qspline (qspline quantile constant loess invariantset contrasts)

Expression index li.wong (li.wong avdiff medianpolish)

Remove outliers FALSE (TRUE FALSE; affects only the li.wong calculation)

Background correction bg.adjust (FALSE bg.adjust subtractmm)

Statistical analysis parametric (parametric non-parametric)

Paired t-test FALSE (TRUE FALSE) (if TRUE, experiments must appear in the order in which they are paired)

Minimum cutoff for logfold calculation 1 (1-20)

Show results on X display FALSE (TRUE FALSE)

Max number of genes to analyze further 100

Bonferroni cutoff (max number of false pos.) 10

Logfold log2 (log2 log10 hlog)

Color scheme red-green (blue-yellow)

Include table of all genes NO (YES NO)
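Table 10 expresses the Bonferroni cutoff as a maximum number of false positives rather than as a P-value. Under the usual Bonferroni reasoning, testing m genes at threshold α gives at most α·m expected false positives, so a budget of 10 corresponds to α = 10/m. The sketch below applies that reading together with the 100-gene cap ("Max number of genes to analyze further"); whether the original software combines the two parameters this way is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def select_genes(pvalues: List[Tuple[str, float]],
                 max_false_positives: float = 10,
                 max_genes: int = 100) -> Tuple[float, List[str]]:
    """Return the P-value cutoff implied by the false-positive budget and the
    genes passing it, most significant first, capped at `max_genes`."""
    cutoff = max_false_positives / len(pvalues)
    passing = sorted((p, gene) for gene, p in pvalues if p <= cutoff)
    return cutoff, [gene for _, gene in passing[:max_genes]]


# Hypothetical input: 1000 genes, three with small P-values.
genes = [("a", 0.001), ("b", 0.02), ("c", 0.005)]
genes += [(f"n{i}", 0.5) for i in range(997)]
print(select_genes(genes))  # cutoff 0.01; genes 'a' and 'c' pass
```

With 1000 genes the cutoff is 10/1000 = 0.01, so gene "b" (P = 0.02) is excluded even though it would pass a conventional 0.05 threshold.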
