Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the...

20
Matching the Pemetrexed Heatmap Keith A. Baggerly September 24, 2009 Contents 1 Executive Summary 1 1.1 Introduction .............................................. 1 1.2 Methods ................................................ 1 1.3 Results ................................................. 1 1.4 Conclusions .............................................. 2 2 Options and Libraries 2 3 Loading and Parsing Data 2 3.1 Earlier Rda Files ........................................... 2 4 Matching the Reported Genes 2 5 Drawing Heatmaps of the Reported (and Offset) Genes 3 5.1 A Heatmap of the Reported Genes ................................. 3 5.2 A Heatmap of the Reported Genes, Offset ............................. 3 6 Using Correlation to Guess Starting Cell Lines 3 7 Steepest Ascent with Binreg 3 7.1 Exporting the Novartis A NCI-60 Data ............................... 7 7.2 Exporting the Starting Guess .................................... 7 7.3 Invoking Matlab ........................................... 7 8 Loading Matlab Output and Comparing Gene Lists 16 8.1 Loading Scores from Each Iteration ................................. 16 8.2 Extracting the Cell Lines Used ................................... 16 8.3 Loading the Heatmap Genes ..................................... 17 8.4 Comparing Gene Lists ........................................ 17 9 Save Rda File 17 10 Appendix 18 10.1 File Location ............................................. 18 10.2 Saves .................................................. 18 10.3 SessionInfo .............................................. 18 1

Transcript of Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the...

Page 1: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

Matching the Pemetrexed Heatmap

Keith A. Baggerly

September 24, 2009

Contents

1 Executive Summary 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Options and Libraries 2

3 Loading and Parsing Data 23.1 Earlier Rda Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

4 Matching the Reported Genes 2

5 Drawing Heatmaps of the Reported (and Offset) Genes 35.1 A Heatmap of the Reported Genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35.2 A Heatmap of the Reported Genes, Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

6 Using Correlation to Guess Starting Cell Lines 3

7 Steepest Ascent with Binreg 37.1 Exporting the Novartis A NCI-60 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77.2 Exporting the Starting Guess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77.3 Invoking Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

8 Loading Matlab Output and Comparing Gene Lists 168.1 Loading Scores from Each Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168.2 Extracting the Cell Lines Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168.3 Loading the Heatmap Genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178.4 Comparing Gene Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

9 Save Rda File 17

10 Appendix 1810.1 File Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1810.2 Saves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1810.3 SessionInfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

1

Page 2: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 2

List of Figures

1 Heatmap of centered expression values from the Novartis A set expression data for the peme-trexed probesets reported by Hsu et al. [3] As these genes were chosen to separate sensitivefrom resistant lines, we expect to see fairly pervasive structure separating two subgroups. Wedo not see this structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Heatmap of centered expression values from the Novartis A set expression data for the probe-sets reported by Hsu et al. [3] after “offsetting” by one row. As genes were chosen to separatesensitive from resistant lines, we expect to see fairly pervasive structure separating two sub-groups. We see this structure here, suggesting that the reported list is incorrect due to anindexing error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Heatmap of correlations between samples using the offset probesets from Figure 2. The clearestgroups of cell lines are those in the upper right and lower left. . . . . . . . . . . . . . . . . . 6

4 First iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving TK-10from Unused to Resistant) is indicated with an offset circle. This shift increases the score from58 to 66. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

5 Second iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SW-620from Resistant to Unused) is indicated with an offset circle. This shift increases the score from66 to 71. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

6 Third iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SNB-75from Sensitive to Unused) is indicated with an offset circle. This shift increases the score from71 to 73. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

7 Fourth iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increasesor decreses as a given change is made. The shift producing the maximum score (movingKM12 from Resistant to Unused) is indicated with an offset circle. This shift leaves the scoreunchanged at 73. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

8 Fifth iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SF-258from Sensitive to Unused) is indicated with an offset circle. This shift increases the score from73 to 77. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

9 Sixth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decresesas a given change is made. The shift producing the maximum score (moving CCRF-CEM fromResistant to Unused) is indicated with an offset circle. This shift increases the score from 77to 85. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

10 Seventh iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. Any shift from the current guess decreases the score. Ourbest score is 85 (a perfect match). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Page 3: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 3

11 Heatmap for the pemetrexed signature using the final set of cell lines obtained through oursearch procedure. This heatmap is a perfect match for that in Figure 1B of Hsu et al. [3],indicating that these are the cell lines and genes involved. . . . . . . . . . . . . . . . . . . . 15

List of Tables

1 Executive Summary

1.1 Introduction

Hsu et al. [3] construct a signature for pemetrexed using the NCI-60 cell lines. They note that pemetrexedand cisplatin sensitivity appear to be inversely correlated in NSCLC, suggesting that patients resistant toone would likely respond to the other. In this report, we outline our reconstruction of the heatmap provided,and our inferences about the specific genes and cell lines involved.

1.2 Methods

We loaded two previously constructed Rda files: novartisA and hsuReportedGeneLists. We extracted quan-tifications for the reported probesets, and examined heatmaps to see if there was clear separation of sensitiveand resistant cell lines. We repeated this procedure after “offsetting” the probesets by one row to accountfor possible problems with binreg. We then constructed pairwise sample correlation matrices to suggest thespecific cell lines used in each group. Starting from this initial guess, we then examined all other sets of celllines in a local “neighborhood” and assigned each set a “score” of the number of reported probesets (offset ornot) that were matched. We shifted to the set with the highest score in the neighborhood and repeated theprocess until a local maximum was reached. This scoring makes use of Matlab scripts which call the binregsoftware; the primary file is buildPemetrexedHeatmap.m.

1.3 Results

Using probesets“off-by-one” from those reported, we were able to perfectly reconstruct the heatmap reported,thus identifying the specific genes and cell lines involved. The cell lines areResistant (8): K-562, MOLT-4, HL-60(TB), MCF7, HCC-2998, HCT-116, NCI-H460, and TK-10,Sensitive (10): SNB-19, HS 578T, MDA-MB-231/ATCC, MDA-MB-435, NCI-H226, M14, MALME-3M,SKMEL-2, SK-MEL-28, and SN12C. We match all 85 of the genes reported after offsetting.

The various intermediate files and gene lists for pemetrexed are saved in RDataObjects as pemetrexedAll.Rda.

1.4 Conclusions

The reported signature for pemetrexed is incorrect due to an off-by-one indexing error.

2 Options and Libraries

> options(width = 80)

Page 4: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 4

3 Loading and Parsing Data

3.1 Earlier Rda Files

We begin by loading two Rda files assembled earlier: novartisA and hsuReportedGenelists.

> rdaList <- c("novartisA", "hsuReportedGenelists")

> for (rdaFile in rdaList) {

+ rdaFullFile <- file.path("RDataObjects", paste(rdaFile, "Rda",

+ sep = "."))

+ if (file.exists(rdaFullFile)) {

+ cat("loading ", rdaFullFile, " from cache\n")

+ load(rdaFullFile)

+ }

+ else {

+ cat("building ", rdaFullFile, " from raw data\n")

+ Stangle(file.path("RNowebSource", paste("buildRda", rdaFile,

+ "Rnw", sep = ".")))

+ source(paste("buildRda", rdaFile, "R", sep = "."))

+ }

+ }

loading RDataObjects/novartisA.Rda from cacheloading RDataObjects/hsuReportedGenelists.Rda from cache

4 Matching the Reported Genes

Hsu et al. [3] used 18 cell lines to form their pemetrexed signature: 8 resistant and 10 sensitive. Fromthese, the binreg software was used to select the 85 genes having the most extreme two-sample t-test valuesseparating resistant from sensitive.

We loaded the list of reported genes above. We now check that all of these genes reported are present inthe novartisA data matrix.

> unmatchedRows <- which(is.na(match(pemetrexedReportedProbesets[,

+ "probesetID"], rownames(novartisA))))

> unmatchedRows

integer(0)

> matchedPemetrexedProbesets <- rownames(pemetrexedReportedProbesets)

We match all 85 of the reported probesets.

5 Drawing Heatmaps of the Reported (and Offset) Genes

5.1 A Heatmap of the Reported Genes

We now draw a heatmap showing the expression levels of the reported genes across all of the samples.Since these genes were chosen specifically to separate one group of cell lines from another, we expect to see

Page 5: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 5

some clear differential structure. We center and scale the results for each row (probeset) before display (theheatmap function in R does this scaling by default). We also use an analog of the Matlab jet colormap. Theheatmap is shown in Figure 1. There is essentially no separating structure visible.

5.2 A Heatmap of the Reported Genes, Offset

Coombes et al. [1] noted that all of the gene lists initially reported by Potti et al. [4] had been “offset by one”due to an indexing error. We try introducing the same type of offset (e.g., replacing 1100 at with 1101 at)and redrawing the heatmap. This heatmap is shown in Figure 2. There is now clearly visible structureseparating groups of cell lines. This suggests that the probesets shown were part of the “true” signature, andthat the reported list is incorrect due to the same indexing error that affected the initial gene lists reportedby Potti et al. [4].

6 Using Correlation to Guess Starting Cell Lines

We know of no simple analytic way of determining which cell lines were involved, and the names are notgiven in Hsu et al. [3] or in their supplementary material. We rely instead on a combination of informedguessing and brute force. We begin by looking at correlations between samples using the expression valuesfor the offset genes identified above. A heatmap is shown in Figure 3. Looking at the correlation heatmapsuggests some clear candidates for the larger (sensitive) group: the 12 samples in the cluster at the topright. Reading from the top, these are SNB-75, HS 578T, SNB-19, SF-268, NCI-H226, SK-MEL-2, M14,MALME-3M, SK-MEL-28, MDA-MB-231/ATCC, SN12C, and MDA-MB-435. Likewise, the 10 samples inthe cluster at the bottom left are good candidates for the smaller (resistant) group. Reading from the top,these are SW-620, NCI-H460, KM12, HCT-116, HCC-2998, K-562, CCRF-CEM, HL-60(TB), MCF7, andMOLT-4.

> startingSensitiveLines <- c("SNB-75", "HS 578T", "SNB-19", "SF-268",

+ "NCI-H226", "SK-MEL-2", "M14", "MALME-3M", "SK-MEL-28", "MDA-MB-231/ATCC",

+ "SN12C", "MDA-MB-435")

> startingResistantLines <- c("SW-620", "NCI-H460", "KM12", "HCT-116",

+ "HCC-2998", "K-562", "CCRF-CEM", "HL-60(TB)", "MCF7", "MOLT-4")

The starting lists involve 12 and 10 cell lines, respectively, whereas Hsu et al. [3] use 10 and 8.

7 Steepest Ascent with Binreg

In order to improve our guess, we use a process of steepest ascent with the binreg software used by Potti etal. [4] Specifically, we give each set of cell lines a score corresponding to the number of target probesets wecan match. From our starting set, we score all sets that can be reached by changing the status of at mostone cell line (e.g., from sensitive to resistant, from sensitive to excluded, from excluded to resistant). Wethen shift our central location to the set in the neighborhood with the highest score and repeat until a localmaximum is reached.

7.1 Exporting the Novartis A NCI-60 Data

To conduct our search, we first export the novartisA NCI-60 in a format usable by binreg.

Page 6: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 6

> tempMat <- log2(novartisA[matchedPemetrexedProbesets, ])

> source(file.path("Scripts", "jet.colors.R"))

> heatmap(as.matrix(tempMat), col = jet.colors(64), margins = c(7,

+ 4), cexRow = 0.45, cexCol = 0.65)M

CF

7T

−47

DH

CT

−11

6K

M12

HC

T−

15R

PM

I−82

26K

−56

2H

L−60

(TB

)S

RC

CR

F−

CE

MM

OLT

−4

SW

−62

0O

VC

AR

−4

OV

CA

R−

3H

S 5

78T

SN

B−

75S

F−

295

SF

−53

9N

CI−

H32

2MO

VC

AR

−5

A49

8U

O−

31E

KV

XA

549/

ATC

CR

XF

393

786−

0IG

RO

V1

OV

CA

R−

8N

CI/A

DR

−R

ES

HT

29D

U−

145

SN

12C

AC

HN

TK

−10

CA

KI−

1N

CI−

H22

6LO

X IM

VI

NC

I−H

23M

DA

−M

B−

231/

ATC

CH

OP

−92

HO

P−

62S

F−

268

BT

−54

9P

C−

3S

K−

ME

L−5

UA

CC

−25

7U

AC

C−

62N

CI−

H52

2U

251

SN

B−

19C

OLO

205

HC

C−

2998

NC

I−H

460

SK

−O

V−

3M

DA

−M

B−

435

M14

SK

−M

EL−

2M

ALM

E−

3MS

K−

ME

L−28

31545_at37746_r_at32748_at241_g_at40864_at33451_s_at41833_at39247_at40393_at36584_at40683_at40952_at797_at31462_f_at32892_at41448_at33377_at1227_g_at37484_at34245_at41402_at41643_at39350_at41442_at36191_at41459_at31537_at33144_at1355_g_at41738_at38545_at39799_at40327_at36118_at590_at33918_s_at41853_at39797_at33613_at39543_at40212_at32317_s_at37374_at41127_at37344_at34318_at242_at41435_at37744_r_at32225_at35699_at38908_s_at39328_at33854_at35762_at40492_at32835_at32259_at38404_at38119_at39329_at35747_at32573_at39018_at34859_at39169_at1100_at40821_at32433_at39749_at35351_at38287_at31510_s_at1318_at40854_at32251_at33361_at35434_at39149_at41757_at36535_at41234_at36988_at34858_at38478_at

Figure 1: Heatmap of centered expression values from the Novartis A set expression data for the pemetrexedprobesets reported by Hsu et al. [3] As these genes were chosen to separate sensitive from resistant lines, weexpect to see fairly pervasive structure separating two subgroups. We do not see this structure.

Page 7: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 7

> offsetPemetrexedProbesets <- rownames(novartisA)[match(matchedPemetrexedProbesets,

+ rownames(novartisA)) + 1]

> tempMat <- log2(novartisA[offsetPemetrexedProbesets, ])

> heatmap(as.matrix(tempMat), col = jet.colors(64), margins = c(7,

+ 4), cexRow = 0.45, cexCol = 0.65)R

PM

I−82

26 SR

HT

29H

L−60

(TB

)K

−56

2H

CC

−29

98C

CR

F−

CE

MM

OLT

−4

MC

F7

HC

T−

116

KM

12N

CI−

H46

0T

−47

DN

CI−

H32

2MN

CI−

H23

NC

I−H

522

DU

−14

5IG

RO

V1

SK

−O

V−

3T

K−

10O

VC

AR

−4

OV

CA

R−

3H

CT

−15

SW

−62

0P

C−

3O

VC

AR

−5

A54

9/AT

CC

EK

VX

A49

8R

XF

393

AC

HN

786−

0C

OLO

205

NC

I/AD

R−

RE

SC

AK

I−1

OV

CA

R−

8S

K−

ME

L−5

UA

CC

−25

7U

AC

C−

62LO

X IM

VI

BT

−54

9S

N12

CH

OP

−92

MD

A−

MB

−23

1/AT

CC

UO

−31

HO

P−

62U

251

SN

B−

19S

K−

ME

L−2

M14

MA

LME

−3M

MD

A−

MB

−43

5S

K−

ME

L−28

NC

I−H

226

SF

−26

8H

S 5

78T

SF

−29

5S

F−

539

SN

B−

75

40328_at1319_at38288_at41739_s_at39544_at32252_at40394_at40855_at1101_at38405_at34319_at37485_at39248_at33145_at41854_at38909_at36536_at35352_at33452_at38546_at40213_at38120_at33362_at591_s_at242_at39750_at37375_at37745_s_at32893_s_at41443_at41128_at1228_s_at33855_at41460_at356_at35763_at40684_at41436_at33378_at798_at40822_at35435_s_at40865_at32318_s_at31511_at33614_at39798_at31538_at32749_s_at41235_at39800_s_at35748_at38479_at31463_s_at31546_at40953_at36119_at41449_at34246_at33919_at32434_at36192_at39150_at39351_at41644_at32574_at40493_at36989_at39170_at37345_at1356_at41403_at41834_g_at36585_at39019_at34860_g_at34859_at32836_at39330_s_at39329_at32260_at243_g_at32226_at41758_at37747_at

Figure 2: Heatmap of centered expression values from the Novartis A set expression data for the probesetsreported by Hsu et al. [3] after “offsetting” by one row. As genes were chosen to separate sensitive fromresistant lines, we expect to see fairly pervasive structure separating two subgroups. We see this structurehere, suggesting that the reported list is incorrect due to an indexing error.

Page 8: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 8

> pemetrexedCor <- cor(t(scale(t(tempMat))))

> heatmap(pemetrexedCor, scale = "none", margins = c(7, 7), cexCol = 0.7,

+ cexRow = 0.7, col = jet.colors(64))M

OLT

−4

MC

F7

HL−

60(T

B)

CC

RF

−C

EM

K−

562

HC

C−

2998

HC

T−

116

KM

12N

CI−

H46

0S

W−

620

RP

MI−

8226 SR

HC

T−

15T

K−

10IG

RO

V1

SK

−O

V−

3O

VC

AR

−3

OV

CA

R−

4D

U−

145

CO

LO 2

05P

C−

3T

−47

DN

CI−

H23

NC

I−H

522

HT

29N

CI−

H32

2MA

549/

ATC

CLO

X IM

VI

EK

VX

OV

CA

R−

5S

K−

ME

L−5

UA

CC

−25

7U

AC

C−

62C

AK

I−1

OV

CA

R−

8N

CI/A

DR

−R

ES

AC

HN

RX

F 3

93A

498

786−

0H

OP

−62

SF

−29

5S

F−

539

BT

−54

9H

OP

−92

UO

−31

U25

1M

DA

−M

B−

435

SN

12C

MD

A−

MB

−23

1/AT

CC

SK

−M

EL−

28M

ALM

E−

3M M14

SK

−M

EL−

2N

CI−

H22

6S

F−

268

SN

B−

19H

S 5

78T

SN

B−

75

MOLT−4MCF7HL−60(TB)CCRF−CEMK−562HCC−2998HCT−116KM12NCI−H460SW−620RPMI−8226SRHCT−15TK−10IGROV1SK−OV−3OVCAR−3OVCAR−4DU−145COLO 205PC−3T−47DNCI−H23NCI−H522HT29NCI−H322MA549/ATCCLOX IMVIEKVXOVCAR−5SK−MEL−5UACC−257UACC−62CAKI−1OVCAR−8NCI/ADR−RESACHNRXF 393A498786−0HOP−62SF−295SF−539BT−549HOP−92UO−31U251MDA−MB−435SN12CMDA−MB−231/ATCCSK−MEL−28MALME−3MM14SK−MEL−2NCI−H226SF−268SNB−19HS 578TSNB−75

Figure 3: Heatmap of correlations between samples using the offset probesets from Figure 2. The clearestgroups of cell lines are those in the upper right and lower left.

Page 9: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 9

> affyControlRows = grep("^AFFX", rownames(novartisA))

> write.table(novartisA[-affyControlRows, ], file = file.path("MatlabFiles",

+ "NCI60Data", "nci60_numbers.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE)

> write.table(rownames(novartisA)[-affyControlRows], file = file.path("MatlabFiles",

+ "NCI60Data", "nci60_probesets.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE, quote = FALSE)

> write.table(colnames(novartisA), file = file.path("MatlabFiles",

+ "NCI60Data", "nci60_samples.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE, quote = FALSE)

7.2 Exporting the Starting Guess

We likewise export our starting guesses for the resistant and sensitive cell lines, and the target probeset idsthat we want to match.

> write.table(startingResistantLines, file = file.path("MatlabFiles",

+ "Pemetrexed", "startingResistantLines.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE, quote = FALSE)

> write.table(startingSensitiveLines, file = file.path("MatlabFiles",

+ "Pemetrexed", "startingSensitiveLines.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE, quote = FALSE)

> write.table(offsetPemetrexedProbesets, file = file.path("MatlabFiles",

+ "Pemetrexed", "targetProbesets.csv"), sep = ",", row.names = FALSE,

+ col.names = FALSE, quote = FALSE)

7.3 Invoking Matlab

The main Matlab script is buildPemetrexedHeatmap.m. This script iteratively explores the neighborhoodto increase the score until a maximum is found. At each iteration, it produces a figure indicating how thecurrent guess should be changed. For pemetrexed, this search and change process takes seven steps, asillustrated in Figures 4-10. The baseline score is 58 in agreement. The biggest jump (to 66) is seenwhen we add TK-10 to the resistant group. The biggest jump in the second neighborhood (to 71) is seenwhen we drop SW-620 from the resistant group. The biggest jump in the third neighborhood (to 73) is seenwhen we drop SNB-75 from the sensitive group. There is no change in the fourth neighborhood that causesan increase, and only one alteration to the list that leaves things at 73 – dropping KM12 from the resistantgroup (we make this alteration). The biggest jump in the fifth neighborhood (and there is one, to 77) is seenwhen we drop SF-268 from the sensitive group. The biggest jump in the sixth neighborhood (to 85, a perfectscore) is seen when we drop CCRF-CEM from the resistant group. Exploring the seventh neighborhoodshows that we are indeed at a maximum; any further alteration of the list drops the score. The numbers ofresistant and sensitive lines now match those shown by Hsu et al. [3]

After reaching a local maximum, we produce a heatmap using the final set of cell lines selected. Thisheatmap, shown in Figure 11, perfectly matches the one shown in Figure 1B of Hsu et al. [3]

8 Loading Matlab Output and Comparing Gene Lists

8.1 Loading Scores from Each Iteration

We begin by loading the numerical scores and starting vectors for each iteration of the fitting procedure.

Page 10: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 10

Figure 4: First iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving TK-10 from Unused to Resistant) isindicated with an offset circle. This shift increases the score from 58 to 66.

Page 11: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 11

Figure 5: Second iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SW-620 from Resistant to Unused) isindicated with an offset circle. This shift increases the score from 66 to 71.

Page 12: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 12

Figure 6: Third iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SNB-75 from Sensitive to Unused) isindicated with an offset circle. This shift increases the score from 71 to 73.

Page 13: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 13

Figure 7: Fourth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving KM12 from Resistant to Unused) isindicated with an offset circle. This shift leaves the score unchanged at 73.

Page 14: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 14

Figure 8: Fifth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SF-258 from Sensitive to Unused) isindicated with an offset circle. This shift increases the score from 73 to 77.

Page 15: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 15

Figure 9: Sixth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving CCRF-CEM from Resistant to Unused)is indicated with an offset circle. This shift increases the score from 77 to 85.

Page 16: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 16

Figure 10: Seventh iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. Any shift from the current guess decreases the score. Our best score is 85 (a perfect match).

Page 17: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 17

Figure 11: Heatmap for the pemetrexed signature using the final set of cell lines obtained through our searchprocedure. This heatmap is a perfect match for that in Figure 1B of Hsu et al. [3], indicating that these arethe cell lines and genes involved.

Page 18: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 18

> pemetrexedIteration1 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration1.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration2 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration2.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration3 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration3.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration4 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration4.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration5 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration5.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration6 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration6.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration7 <- read.table(file.path("MatlabFiles", "Pemetrexed",

+ "pemetrexedIteration7.csv"), header = TRUE, sep = ",", row.names = 1)

> pemetrexedIteration7[1:5, ]

startStatus Resistant Unused SensitiveCCRF-CEM Unused 77 85 50K-562 Resistant 85 72 45MOLT-4 Resistant 85 71 31HL-60(TB) Resistant 85 76 37RPMI-8226 Unused 71 85 57

8.2 Extracting the Cell Lines Used

Next, we extract the cell lines used to obtain the perfectly matching heatmap.

> pemetrexedResistantLines <- rownames(pemetrexedIteration3)[pemetrexedIteration7[,

+ "startStatus"] == "Resistant"]

> pemetrexedSensitiveLines <- rownames(pemetrexedIteration3)[pemetrexedIteration7[,

+ "startStatus"] == "Sensitive"]

> pemetrexedResistantLines

[1] "K-562" "MOLT-4" "HL-60(TB)" "MCF7" "HCC-2998" "HCT-116"[7] "NCI-H460" "TK-10"

> pemetrexedSensitiveLines

[1] "SNB-19" "HS 578T" "MDA-MB-231/ATCC" "MDA-MB-435"[5] "NCI-H226" "M14" "MALME-3M" "SK-MEL-2"[9] "SK-MEL-28" "SN12C"

8.3 Loading the Heatmap Genes

Next, we load the probesets used in the final heatmap.

> softwarePemetrexedProbesets <- read.table(file.path("MatlabFiles",

+ "Pemetrexed", "topPemetrexedGenesInHeatmapOrder.txt"), header = FALSE,

+ sep = ",", strip.white = TRUE)

> softwarePemetrexedProbesets <- as.character(softwarePemetrexedProbesets[,

Page 19: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 19

+ 1])

> sort(softwarePemetrexedProbesets)

[1] "1101_at" "1228_s_at" "1319_at" "1356_at" "242_at"[6] "243_g_at" "31463_s_at" "31511_at" "31538_at" "31546_at"[11] "32226_at" "32252_at" "32260_at" "32318_s_at" "32434_at"[16] "32574_at" "32749_s_at" "32836_at" "32893_s_at" "33145_at"[21] "33362_at" "33378_at" "33452_at" "33614_at" "33855_at"[26] "33919_at" "34246_at" "34319_at" "34859_at" "34860_g_at"[31] "35352_at" "35435_s_at" "356_at" "35748_at" "35763_at"[36] "36119_at" "36192_at" "36536_at" "36585_at" "36989_at"[41] "37345_at" "37375_at" "37485_at" "37745_s_at" "37747_at"[46] "38120_at" "38288_at" "38405_at" "38479_at" "38546_at"[51] "38909_at" "39019_at" "39150_at" "39170_at" "39248_at"[56] "39329_at" "39330_s_at" "39351_at" "39544_at" "39750_at"[61] "39798_at" "39800_s_at" "40213_at" "40328_at" "40394_at"[66] "40493_at" "40684_at" "40822_at" "40855_at" "40865_at"[71] "40953_at" "41128_at" "41235_at" "41403_at" "41436_at"[76] "41443_at" "41449_at" "41460_at" "41644_at" "41739_s_at"[81] "41758_at" "41834_g_at" "41854_at" "591_s_at" "798_at"

8.4 Comparing Gene Lists

Now we see which genes we can and cannot match.

> sum(!is.na(match(offsetPemetrexedProbesets, softwarePemetrexedProbesets)))

[1] 85

> setdiff(softwarePemetrexedProbesets, offsetPemetrexedProbesets)

character(0)

As noted above, the software output matches the offset gene list perfectly.

9 Save Rda File

Finally, we save the data associated with our examination of the pemetrexed signature.

> save(pemetrexedCor, pemetrexedIteration1, pemetrexedIteration2,

+ pemetrexedIteration3, pemetrexedIteration4, pemetrexedIteration5,

+ pemetrexedIteration6, pemetrexedIteration7, pemetrexedReportedProbesets,

+ pemetrexedResistantLines, pemetrexedSensitiveLines, pemetrexedTable,

+ matchedPemetrexedProbesets, offsetPemetrexedProbesets, softwarePemetrexedProbesets,

+ file = file.path("RDataObjects", "pemetrexedAll.Rda"))

Page 20: Matching the Pemetrexed Heatmap - Bioinformatics...matchPemetrexedHeatmap.Rnw 3 11 Heatmap for the pemetrexed signature using the nal set of cell lines obtained through our search

matchPemetrexedHeatmap.Rnw 20

10 Appendix

10.1 File Location

> getwd()

[1] "/Users/kabagg/ReproRsch/AnnAppStat"

10.2 Saves

10.3 SessionInfo

> sessionInfo()

R version 2.9.1 (2009-06-26)i386-apple-darwin8.11.1

locale:en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:[1] stats graphics grDevices utils datasets methods base

other attached packages:[1] geneplotter_1.22.0 lattice_0.17-25 annotate_1.22.0[4] AnnotationDbi_1.6.1 Biobase_2.4.1 XML_2.6-0

loaded via a namespace (and not attached):[1] DBI_0.2-4 grid_2.9.1 KernSmooth_2.23-2 RColorBrewer_1.0-2[5] RSQLite_0.7-1 xtable_1.5-5

References

[1] Coombes KR, Wang J, Baggerly KA: Microarrays: retracing steps. Nat Med, 13:1276-7, 2007. Authorreply, 1277-8.

[2] Gyorffy B, Surowiak P, Kiesslich O, et al.: Gene expression profiling of 30 cancer cell lines predictsresistance towards 11 anticancer drugs at clinically achieved concentrations. Int J Cancer, 118:1699-712, 2006.

[3] Hsu DS, Balakumaran BS, Acharya CR, et al.: Pharmacogenomic strategies provide a rational approachto the treatment of cisplatin-resistant patients with advanced cancer. J Clin Oncol, 25:4350-4357, 2007

[4] Potti A, Dressman HK, Bild A, et al: Genomic signatures to guide the use of chemotherapeutics. NatMed, 12:1294-1300, 2006.