Day 2 Track 2 - IASCT

Day 2 Track 2 Conference for Statistical Programmers In Clinical Research (ConSPIC) 2011 Bengaluru Indian Association for Statistics in Clinical Trials

Transcript of Day 2 Track 2 - IASCT

Page 1: Day 2 Track 2 - IASCT

Day 2 Track 2

Conference for Statistical Programmers In Clinical Research (ConSPIC) 2011

Bengaluru

Indian Association for Statistics in Clinical Trials

Page 2: Day 2 Track 2 - IASCT

ConSPIC 2011 Page iv

Day 2

Time | Track 1 (Lalit 1&2) | Track 2 (Lalit 3&4)

9.30 – 11.00 | Session 1 – Reporting (D2S1T1) | Session 1 – Statistics (D2S1T2)
| Subject Narratives through SAS - Devayani Deodhar | Personalized medicines – Role of ROC analysis using SAS 9.2 - Muralikrishna C
| Techniques in RTF formatting in clinical trial reports - Sameer Bamnote | Comparison of multiple SAS procedures to perform statistical activity for same objective - Pradeep Acharya
| Overview of NLS (National Language Support) in SAS - K. E. Sudarshan | Simulating clinical trial data using SAS - Ramsathish S
| Template procedure, and customizing style template - Rubia Shaik | Statistical and Graphical Methods used in clinical trials - Meghana Marathe

11.00 – 11.30 | Break

11.30 – 1.00 | Session 2 – SAS and Beyond (D2S2T1) | Session 2 – Efficiency (D2S2T2)
| Graphics across languages - Tapas Chakraborty | Hashing Unleashed!!! - Pratibha Jalui
| Bookmarking and hyperlinking in PDF and HTML outputs using SAS - Sugunesh Sivalingam | Dorfman-Whitlock DO- (DOW-) Loop – A Loop for N to one data step programming - Periasamy K
| Oracle Clinical for SAS programmers - Sandeep Kumar | Lets go for a picture (format) - Senthilkumar Karuppiah
| Using R to validate SAS outputs - Nikhil Abhyankar | Proc transpose for Horizontal data - Ramesh Sundaram

1.00 – 2.00 | Lunch

2.00 – 3.30 | Session 3 – SDTM/CDISC (D2S3T1) | Session 3 - Quality (D2S3T2)
| Clinical Data Standardization Methodology - Pankaj Bharadwaj | 10 simple ways to do quality checks on database - Jayapandian N
| SDTM data conversion system - Ashwin Venkat | Why should stats have all the fun? – Priya Iyer
| Journey into the world of SAS clinical standards toolkit and Proc CDISC - Melanie Vaz | Electronic validation for clinical data and summary reports - Amruta Pathak
| Validation of SDTM metadata using an utility - Jegan Pillaiyar | RTF to SAS datasets - Vijaybhaskar Reddy

3.30 – 4.00 | Break

4.00 – 5.15 | AGM (Lalit 1 – 4)

Page 3: Day 2 Track 2 - IASCT

Data Scrambling

- By Vinay Mahajan, Mahesh Babu, Ramsathish S


Page 4: Day 2 Track 2 - IASCT

Disclaimer

All material in these slides is the opinion of the speakers and does not reflect the views of Novartis Pharmaceuticals.

| ConSPIC 2011| Author | Sep 29-30, 2011 2


Page 5: Day 2 Track 2 - IASCT

Purpose

• To blind (scramble) the data

• To produce blinded outputs for clinical team review

• To create output for client review before un-blinding the study

• To create test data

• To provide data for edit check testing

3 | ConSPIC 2011| Vinay Mahajan, Mahesh Babu, Ramsathish S | Sep 29 – 30, 2011


Page 6: Day 2 Track 2 - IASCT

Different types of studies

• Blinded studies (single blind, double blind)

• Un-blinded studies (open label)

• FDA recommendation: keep the level of blinding of the study as high as possible


Page 7: Day 2 Track 2 - IASCT

Scrambling the data: what does it mean? How to do this?

• For a given study, if data for 100 patients are available in the form of SAS datasets, then

• completely dismantle the linkage across and within CRF pages for patient-level information,

• but keep the number of patients in the resulting datasets close to the original number.


Page 8: Day 2 Track 2 - IASCT

Types of scrambling data


• Type 1: Subject level patient number changing

• Type 2: Record level patient number changing


Page 9: Day 2 Track 2 - IASCT

Scrambling data: flowchart for Type 1

• Identify a study with available SAS data.

• Go through the source data library to find all the datasets and the unique identifier.

• USER INPUT: set up the macro variables (1) STYSID1A (unique ID), (2) input library name, (3) output library name.

• Create a dataset of unique dataset names [one record per dataset].

• Create a new dataset with a new variable by using _n_.

• Using the new dataset, create a new order for the new variable using PROC PLAN.

• Merge the new dataset and the main dataset, with the new variable as the BY variable.

• Send the final dataset to the output library.

• Output library: scrambling complete.
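The Type 1 steps can be sketched in SAS for a single dataset. This is a minimal illustration under stated assumptions, not the authors' macro: the library names rawlib and scrlib, the dataset demog and the helper datasets ids, shuffled and idmap are hypothetical, and only the identifier STYSID1A comes from the slide. A random sort key from RANUNI is used here in place of the PROC PLAN step to keep the sketch short.

```sas
/* Assumed user inputs; only STYSID1A is named on the slide */
%let idvar  = STYSID1A;   /* unique subject identifier    */
%let inlib  = rawlib;     /* hypothetical input library   */
%let outlib = scrlib;     /* hypothetical output library  */

/* One record per subject, in sorted order */
proc sort data=&inlib..demog(keep=&idvar) out=ids nodupkey;
  by &idvar;
run;

/* The same list of IDs, given a random sort key */
data shuffled;
  set ids(rename=(&idvar=new_id));
  _key = ranuni(2011);
run;

proc sort data=shuffled;
  by _key;
run;

/* One-to-one merge: the i-th original ID is paired with the i-th shuffled ID */
data idmap;
  merge ids shuffled(drop=_key);
run;

/* Swap in the new ID; all records of a subject move together */
proc sql;
  create table &outlib..demog(drop=_old_id) as
  select m.new_id as &idvar, d.*
  from &inlib..demog(rename=(&idvar=_old_id)) as d
       inner join idmap as m
       on d._old_id = m.&idvar;
quit;
```

In this sketch, reusing idmap for every dataset would keep a subject's records linked across datasets, while building a fresh idmap per dataset breaks that link.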


Page 10: Day 2 Track 2 - IASCT

Type 1 example:


Page 11: Day 2 Track 2 - IASCT

Scrambling data: flowchart for Type 2

• USER INPUT: set up the macro variable (1) patient ID.

• Find the unique datasets in the WORK library.

• Create a macro variable containing the number of unique datasets in the library.

• On each iteration, create a dataset named new with two variables: patient ID and _n_. Patient IDs are picked in random order from the source dataset, without excluding any patient ID.

• Create the variable _n_ in each original dataset.

• Merge the records of the original dataset with the newly created dataset new by the common variable _n_.

• The real number of patients in a dataset is the number obtained from the dataset new.
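Record-level (Type 2) scrambling can be sketched as follows for one dataset. The dataset name vitals and the identifier PATID are hypothetical, and a one-to-one merge by record position stands in for the merge on an explicit _n_ variable described above; the two pair records in the same way.

```sas
/* Hypothetical inputs: dataset WORK.VITALS with patient ID variable PATID */
%let idvar = PATID;

/* Dataset NEW: the patient ID of every record, tagged with a random key */
data new;
  set vitals(keep=&idvar);
  _key = ranuni(2011);
run;

/* Put the IDs in random order; no ID is dropped */
proc sort data=new;
  by _key;
run;

/* One-to-one merge by record position. PATID exists in both datasets,
   so the value from NEW (read last) overwrites the original value. */
data vitals_scrambled;
  merge vitals new(drop=_key);
run;
```

Because the IDs are only permuted, each patient keeps the same number of records, but those records now come from different patients.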


Page 12: Day 2 Track 2 - IASCT

Type 2 example:


Page 13: Day 2 Track 2 - IASCT

Pros and Cons

Pros
• Complete dismantling of the data
• Patient data delinked at visit level as well as across CRF pages
• Could be a very useful way to create dummy data
• Useful tool for creating many data issues, which in turn gives useful data for writing data validation programs

Cons
• No way to check data consistency across reports
• May produce some odd data issues, e.g. visit dates appearing after death


Page 14: Day 2 Track 2 - IASCT

Creating Dummy Data

- Vinay Mahajan, Mahesh Babu, Ramsathish S


Page 15: Day 2 Track 2 - IASCT

Purpose

• To test new programs

• To develop new code

• To create a dummy project

• To train new programmers


Page 16: Day 2 Track 2 - IASCT

About the program

• User input part

• Automatic part


Page 17: Day 2 Track 2 - IASCT

User Input part

Defining key variables:
• Subject ID
• Site ID
• Visits
• Treatments
• Days for each visit
• Sex
• Ethnicity
• Race
• Age range
• Child-bearing potential options
• Output library name
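As an illustration of how a few of these user inputs could drive dummy data generation, the following minimal sketch builds a small demographics dataset. All macro variable names, the dataset name dm and the variable names are assumptions made for this example; the authors' program is not shown in the slides.

```sas
/* Hypothetical user inputs */
%let nsubj  = 20;     /* number of subjects        */
%let nsite  = 4;      /* number of sites           */
%let agemin = 18;     /* lower bound of age range  */
%let agemax = 65;     /* upper bound of age range  */
%let outlib = work;   /* output library            */

data &outlib..dm;
  length subjid $8 siteid $4 sex $1 trt $1;
  call streaminit(2011);                        /* reproducible random numbers */
  do i = 1 to &nsubj;
    siteid = put(ceil(rand('uniform') * &nsite), z4.);   /* site 0001..000n   */
    subjid = cats(siteid, put(i, z4.));                  /* site + sequence   */
    sex    = ifc(rand('uniform') < 0.5, 'M', 'F');
    trt    = ifc(rand('uniform') < 0.5, 'A', 'B');
    age    = &agemin + floor(rand('uniform') * (&agemax - &agemin + 1));
    output;
  end;
  drop i;
run;
```

Visit-level datasets could be produced the same way by adding an inner loop over the visits and the days defined for each visit.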


Page 18: Day 2 Track 2 - IASCT

Automatic Part

Example:
• Number of adverse events
• Severity levels and grades
• Number of medical history events
• Etc.


Page 19: Day 2 Track 2 - IASCT


Medical History Dataset


Page 20: Day 2 Track 2 - IASCT

Pros and Cons

Pros
• We can generate our own data
• We can use this data in any training
• Generation is very simple
• It is very useful for training purposes

Cons
• Limited to a few datasets
• Requires many user inputs
• AE terms and MH terms are not original terms
• We may get a few data issues
• We may not get accurate data


Page 21: Day 2 Track 2 - IASCT


Page 22: Day 2 Track 2 - IASCT


Page 23: Day 2 Track 2 - IASCT

CONFIDENTIAL Corporate Presentation - 1 -

STATISTICAL & GRAPHICAL METHODS USED IN EXPLORATORY TRIALS

Meghana Marathe, Pinakin Jani


Page 24: Day 2 Track 2 - IASCT

Agenda

• Introduction
• Exploratory Analysis
• Statistical Methods Overview
• Case Study
• SAS Macro for Cancer Exploratory Trials
• Output Display
• Conclusion


Page 25: Day 2 Track 2 - IASCT

Exploratory Analysis

Why exploratory analysis?

At present we see a lot of Investigational New Drug (IND) discovery activity, and this trend will increase in the coming years.

Given current concerns about efficacy and safety and the length of the drug development process, there is a critical need for efficiency in drug development procedures.

In exploratory data review, the goals are to quickly extract, display and review the salient safety and efficacy information in the data, generally using graphical and statistical methods that facilitate quick interpretation and communication.

More emphasis is given to published proposals and materials on exploratory trial analyses, which may act as a benchmark for presenting and analyzing the data further in clinical trials.


Page 26: Day 2 Track 2 - IASCT

Statistical Methods Overview

Introduction to Correlation

Correlation is the extent to which two or more quantitative variables are related:

• Positively correlated: the values of one continuous variable "vary somewhat in step" with the values of another variable

• Negatively correlated: the values of one continuous variable "vary somewhat in opposite step" with the values of another variable

• Not correlated: the values of one continuous variable "vary randomly" with respect to the values of another variable

Pearson's Linear Correlation Coefficient (r):

The Pearson product-moment correlation coefficient is a measure of the correlation between two variables x and y.

Pearson's r reflects the strength of the linear relationship between two variables. It ranges from -1 to +1.

• value of r near 1: positive correlation
• value of r near -1: negative correlation
• value of r near 0: no or poor correlation
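For reference, the standard definition of the sample Pearson coefficient for n paired observations (x_i, y_i) is given below. The notation is the usual textbook one and is not taken from the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```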

Page 27: Day 2 Track 2 - IASCT

Statistical Methods Overview (continued)

Assumptions of Pearson's r

• There is a linear relationship between x and y
• Both x and y are continuous random variables
• Both variables are normally distributed
• Equal differences between measurements represent equivalent intervals

Spearman's Rank Correlation Coefficient (ρ, or rho):

• Spearman's rank correlation is a non-parametric measure of the strength of the monotonic association between two variables. It makes no assumptions about the distribution of the variables, i.e. about the linearity, normality or scale of the relationship.

• value of ρ near 1: positive correlation
• value of ρ near -1: negative correlation
• value of ρ near 0: no or poor correlation
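Spearman's coefficient is the Pearson coefficient computed on the ranks of the data. When there are no ties it reduces to the well-known shortcut below, where d_i is the difference between the ranks of x_i and y_i. The symbols are standard notation, not taken from the slides.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)
```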


Page 28: Day 2 Track 2 - IASCT

Case Study

Overview of graphical and statistical methods used in an exploratory cancer trial:

• Data domain: Laboratory

• Graphical method: Scatter plot with regression fit

• Statistical methods used: Spearman rank correlation / Pearson correlation

• Inferential statistics: 95% confidence interval (CI) using parameters from the Spearman correlation

In our examples we show how appropriate statistical graphics elements may be used to extract and highlight salient information in the clinical study data.
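A minimal sketch of how the methods listed above could be run in SAS 9.2 follows. It is not the authors' macro, which is not reproduced in this transcript. The dataset name lab and the treatment variable trt are assumptions; NK and CD3_TUM are the laboratory variables named in the output interpretation.

```sas
/* Hypothetical dataset LAB with one record per subject:
   TRT (treatment A/B/C), NK and CD3_TUM (laboratory parameters) */
proc sort data=lab;
  by trt;
run;

/* Spearman rank correlation with Fisher z confidence limits (95% by default) */
proc corr data=lab spearman fisher;
  by trt;
  var nk cd3_tum;
run;

/* Scatter plot with a linear regression fit, one plot per treatment */
proc sgplot data=lab;
  by trt;
  reg x=nk y=cd3_tum;
run;
```

Adding the PEARSON option to the PROC CORR statement would report both coefficients side by side.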


Page 29: Day 2 Track 2 - IASCT


SAS Macro for Cancer Exploratory Trials


Page 30: Day 2 Track 2 - IASCT

Output Display (End of Treatment Period)

Interpretation:

• The output shows that the values of the two variables are negatively correlated.

• For Treatment A, the value of rho is greater than -0.9, which implies that NK is negatively correlated with CD3_TUM.

• A similar trend is observed for Treatments B and C.


Page 31: Day 2 Track 2 - IASCT

Conclusion

In exploratory cancer trials, we recommend using correlation analysis to find the relationship between the target variables. It provides scientific evidence that is easy to interpret and hence supports decision-making.

A standard macro can be developed for use across similar studies and can be modified for other functional areas.

Similarly, there are several statistical and graphical methods that can be optimized for better analysis and presentation of clinical data. These would play a vital role in the Investigational New Drug (IND) research market and hence provide a robust platform for further analysis in clinical trials.


Page 32: Day 2 Track 2 - IASCT


References

• http://www.springer.com/cda/content/document/cda_downloaddocument/9780387988146-c7.pdf?SGWID=0-0-45-101848-p2018642

• http://support.sas.com/resources/papers/proceedings10/234-2010.pdf

Acknowledgement

We gratefully acknowledge the support provided by the TCS SPA management team, and thank IASCT for providing this opportunity.


Page 33: Day 2 Track 2 - IASCT


Thank You


Page 34: Day 2 Track 2 - IASCT

Personalized Medicines – Role of ROC Curve analysis using SAS® 9.2

Muralikrishna Chakravarthula
Senior Statistical Analyst - I
Novartis Oncology


Page 35: Day 2 Track 2 - IASCT

Disclaimer

All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis

| ConSPIC 2011| Muralikrishna Chakravarthula | Sep 29 – 30, 2011 2


Page 36: Day 2 Track 2 - IASCT

AGENDA

• Introduction

• ROC curve and its uses

• Conceptual definitions – measurements of the quality of the test

• Example: PSA as a biomarker for prostate cancer

• ROC curve - PROC LOGISTIC – SAS 9.2

• Area under the ROC curve


Page 37: Day 2 Track 2 - IASCT

Introduction

There is an increasing trend towards developing personalized (individualized) medicines through the evaluation of new biomarkers.

Selecting the right patients is a primary goal.
• Diagnostic tests play a major role.

Build a robust diagnostic/clinical test.

Evaluate the diagnostic test using Receiver Operating Characteristic (ROC) curve analysis.

Note: Receiver operating characteristic (ROC) analysis was originally developed during World War II for radar images. The first applications of this theory in the medical area occurred during the late 1960s.

Page 38: Day 2 Track 2 - IASCT

ROC curve - Uses

ROC curve analysis is a standard analytical tool for evaluating diagnostic tests.
• It is the plot of sensitivity on the vertical axis and 1-specificity on the horizontal axis for all possible thresholds in the study data set.
• The area under the ROC curve is an effective way to summarize the overall diagnostic accuracy of the test.

Generally it is used to

• Describe a diagnostic test.
- Associated measures: sensitivity, specificity, accuracy, area under the curve
- Often, the calculated sensitivity and specificity of the test depend on the specific threshold selected.
• Determine a cutoff (threshold) value for a clinical/diagnostic test.

Page 39: Day 2 Track 2 - IASCT

Definitions – Four possible decisions

Test result | Condition (disease) positive (present) | Condition (disease) negative (absent) | Row total
Positive | TP (# of true positives) | FP (# of false positives) | TP + FP (total # of subjects with positive test)
Negative | FN (# of false negatives) | TN (# of true negatives) | FN + TN (total # of subjects with negative test)
Column total | TP + FN (total # of subjects with disease) | FP + TN (total # of subjects without disease) | N = TP + TN + FP + FN (total # of subjects in study)


Page 40: Day 2 Track 2 - IASCT

Measurements of the quality of the test.

Sensitivity: the probability of a positive test among the patients who have a positive diagnosis (disease present) = TP/(TP + FN)

Specificity: the probability of a negative test among the patients who have a negative diagnosis (disease absent) = TN/(TN + FP)

Efficiency: TP + TN

Accuracy: (TN + TP)/(TN + TP + FN + FP) = (sensitivity)(prevalence) + (specificity)(1 - prevalence). Here prevalence is the probability of disease in the population at a given time, and it is assumed to be known.

The area under the ROC curve is an effective way to summarize the overall diagnostic accuracy of the test.
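The accuracy identity quoted above can be verified in one step from the 2×2 counts. In this short derivation the prevalence is taken as the sample proportion (TP + FN)/N, which is an assumption made so that the identity holds exactly for the study data.

```latex
\begin{aligned}
\text{Accuracy} = \frac{TP + TN}{N}
  &= \frac{TP}{TP + FN}\cdot\frac{TP + FN}{N}
   + \frac{TN}{TN + FP}\cdot\frac{TN + FP}{N} \\
  &= \text{sensitivity}\times\text{prevalence}
   + \text{specificity}\times(1 - \text{prevalence}),
\qquad \text{prevalence} = \frac{TP + FN}{N}
\end{aligned}
```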

Page 41: Day 2 Track 2 - IASCT

Example 1: PSA as biomarker for Prostate cancer

Example: Simulated prostate cancer data.

Biomarker: Prostate-specific antigen (PSA)

Cutoff point: 4 ng/mL as the threshold

Decision:
1 = Positive (abnormal), if PSA > 4 ng/mL
0 = Negative (normal), if PSA < 4 ng/mL

Actual disease status: 1 = Yes, 0 = No

Note: Although PSA alone may not serve as an accurate marker for detecting prostate cancer, I have used these data for the purpose of explanation.

Page 42: Day 2 Track 2 - IASCT

Example 1: Prostate cancer data (cont.)

Data set – PROSTPCA

PSA result (0 = < 4 ng/mL = -ve; 1 = > 4 ng/mL = +ve) | Actual disease condition (0 = -ve = no cancer; 1 = +ve = cancer)
1 | 1
1 | 1
1 | 0
0 | 1
0 | 0
0 | 0
0 | 1
1 | 1
1 | 0
..... | .....


Page 43: Day 2 Track 2 - IASCT

Example 1: Prostate cancer (cont.) – SAS code for Freq counts

/* Formats for the test result and the disease status */
PROC FORMAT;
  VALUE cutoffmt 0 = "< 4 ng/ml"
                 1 = "4+ ng/ml";
  VALUE prostfmt 0 = "No Cancer"
                 1 = "Cancer";
RUN;

/* 2x2 table of test result by disease status, ordered by formatted value */
PROC FREQ DATA=PROSTPCA ORDER=FORMATTED;
  FORMAT psa cutoffmt. resp prostfmt.;
  LABEL psa='PSA level' resp='Prostate Cancer';
  TABLES psa * resp / NOROW NOPERCENT;
RUN;

Page 44: Day 2 Track 2 - IASCT

Example 1: Prostate cancer (cont.) – Frequency counts from SAS output

Test results | Disease status: cancer | Disease status: no cancer | Total
Positive | 66 (TP) | 18 (FP) | 84
Negative | 13 (FN) | 25 (TN) | 38
Total | 79 (TP + FN) | 43 (FP + TN) | 122

Sensitivity = TP/(TP + FN) = 0.83

Specificity = TN/(TN + FP) = 0.58

Accuracy = (TN + TP)/(TN + TP + FN + FP) = 0.74
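The arithmetic behind these three measures, worked from the counts in the frequency table (TP = 66, FP = 18, FN = 13, TN = 25) and shown to three decimal places:

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{66}{66 + 13} = \frac{66}{79} \approx 0.835 \\
\text{Specificity} &= \frac{25}{25 + 18} = \frac{25}{43} \approx 0.581 \\
\text{Accuracy}    &= \frac{66 + 25}{122} = \frac{91}{122} \approx 0.746
\end{aligned}
```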

Page 45: Day 2 Track 2 - IASCT

Example 1: Prostate cancer (cont.) - 95 % CI

/* Formats for the test result and the disease status */
PROC FORMAT;
  VALUE cutoffmt 0 = "< 4 ng/ml"
                 1 = "4+ ng/ml";
  VALUE prostfmt 0 = "No Cancer"
                 1 = "Cancer";
RUN;

/* Sensitivity with 95% CI: proportion of positive tests among subjects with cancer */
PROC FREQ DATA=PROSTPCA ORDER=FORMATTED;
  FORMAT psa cutoffmt. resp prostfmt.;
  LABEL psa='PSA level' resp='Prostate Cancer';
  TABLES psa / BINOMIAL;
  EXACT BINOMIAL;
  WHERE resp = 1;
RUN;

Page 46: Day 2 Track 2 - IASCT

Example 1: Confidence Intervals from SAS output

NOTE: SAS generates confidence intervals for the proportion in the first row of the output (in this instance, 83.54%). Therefore, you should make sure that the proportions are listed in the order that places sensitivity (or specificity) in the first row. The code above uses ORDER=FORMATTED to instruct SAS to use formatted values for ordering. Otherwise, the output would have generated confidence intervals for 16.46%.

The BINOMIAL option in the TABLES statement, along with the EXACT BINOMIAL statement, instructs SAS to produce confidence intervals using both the normal approximation to the binomial distribution (asymptotic standard error, or ASE) and the exact binomial distribution. SAS is able to calculate exact confidence intervals efficiently, and the exact interval is the preferred method.
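As a rough hand check of the asymptotic (normal approximation) interval for sensitivity, the calculation below uses the counts from this example: 66 positive tests among 79 subjects with cancer. The figures are computed here from those counts and are not copied from the SAS output, so small rounding differences are possible. The exact binomial limits need an iterative calculation and are not reproduced.

```latex
\begin{aligned}
\hat{p} &= \frac{66}{79} \approx 0.8354, \qquad
\text{ASE} = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
           = \sqrt{\frac{0.8354 \times 0.1646}{79}} \approx 0.0417 \\
\hat{p} \pm 1.96 \times \text{ASE} &\approx 0.8354 \pm 0.0818
  \;\Rightarrow\; (0.754,\; 0.917)
\end{aligned}
```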

Page 47: Day 2 Track 2 - IASCT

Example 1: Prostate Cancer data - ROC Analysis using the LOGISTIC procedure in SAS 9.2

GENERATING THE ROC CURVE

The empirical ROC curve is the plot of sensitivity on the vertical axis and 1-specificity on the horizontal axis for all possible thresholds in the study data set. It is often used to explore thresholds for the application of a new biomarker in clinical practice or to visually assess the overall performance of the biomarker.

With the release of SAS 9.2, ROC curves can be generated using standard ODS Statistical Graphics and simple LOGISTIC procedure statements. The code below generates an ROC curve for the prostate cancer data.

ODS GRAPHICS ON;
PROC LOGISTIC DATA=prostpsa PLOTS(ONLY)=ROC;
  MODEL resp (EVENT='1') = PSA;
RUN;
ODS GRAPHICS OFF;


Page 48: Day 2 Track 2 - IASCT

Example 1: Prostate Cancer data - ROC Curve

The PLOTS(ONLY)=ROC option directs ODS Statistical Graphics to plot an ROC curve without plotting the other standard graphs associated with PROC LOGISTIC.

The MODEL statement is constructed using the standard PROC LOGISTIC syntax (dependent variable = covariates), with EVENT='1' specified as the outcome we want to predict.

ODS GRAPHICS ON;
PROC LOGISTIC DATA=prostpsa PLOTS(ONLY)=ROC;
  MODEL resp (EVENT='1') = PSA;
RUN;
ODS GRAPHICS OFF;

Figure 1: ROC curve for biomarker PSA.


Page 49: Day 2 Track 2 - IASCT

Example 1: Prostate Cancer Data – The Area under the ROC curve


The Area Under the curve

The area under the ROC curve (AUC) is the average sensitivity of the biomarker over the range of specificities. It is often used as a summary statistic representing the overall performance of the biomarker. A biomarker with no predictive value would have an AUC of 0.5 (also represented by the diagonal “chance” line above), while a biomarker with perfect ability to predict disease would have an AUC of 1.

In SAS 9.2, the empirical AUC is calculated and printed at the top of the ROC curve generated by PROC LOGISTIC. As shown in Figure 1, the PSA biomarker has an AUC of 0.7084 for the diagnosis of prostate cancer in the sample population.
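The reported AUC can be reproduced by hand if we assume that PSA enters the model as the binary 0/1 test result shown in the data listing. In that case the empirical ROC curve has a single interior point, (1 - specificity, sensitivity), joined by straight lines to (0, 0) and (1, 1), and the trapezoidal area reduces to the average of sensitivity and specificity. Using the counts from this example (TP = 66, FN = 13, TN = 25, FP = 18):

```latex
\begin{aligned}
\text{AUC} &= \tfrac{1}{2}\,(1 - \text{spec})\,\text{sens}
            + \tfrac{1}{2}\,\text{spec}\,(\text{sens} + 1)
            = \frac{\text{sens} + \text{spec}}{2} \\
           &= \frac{1}{2}\left(\frac{66}{79} + \frac{25}{43}\right)
            \approx \frac{0.8354 + 0.5814}{2} \approx 0.7084
\end{aligned}
```

This agrees with the value of 0.7084 printed on the ROC plot.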


Page 50: Day 2 Track 2 - IASCT

Accuracy classification Rule for a diagnostic test

A commonly used classification of a diagnostic test by its AUC is summarized in the table below.

AUC range | Classification
0.9 < AUC < 1.0 | Excellent
0.8 < AUC < 0.9 | Good
0.7 < AUC < 0.8 | Fair
0.6 < AUC < 0.7 | Not good

In short, the ROC curve is a good tool for selecting a possible optimal cut-point for a given diagnostic test.


Page 51: Day 2 Track 2 - IASCT

Example 1: Comparing with Chance AUC of 0.5

The AUC of a biomarker is often compared to chance, which has an AUC of 0.5. The statistical test involves estimating AUCtest - AUCchance, which is asymptotically normal. The code below performs this task using standard features in PROC LOGISTIC of SAS 9.2 for comparing ROC curves.

ODS GRAPHICS ON;
PROC LOGISTIC DATA=PROSTPSA PLOTS=ROC ROCOPTIONS(NODETAILS);
  MODEL resp (EVENT='1') = PSA / NOFIT;
  ROC 'PSA' PSA;
  ROC 'Chance';
  ROCCONTRAST REFERENCE('Chance') / ESTIMATE;
RUN;
ODS GRAPHICS OFF;
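In general form, a comparison of this kind rests on a Wald-type statistic such as the one below. The standard error is estimated by the procedure and is not given in the slides, so no numerical value is shown; the notation is illustrative.

```latex
z = \frac{\widehat{\text{AUC}}_{\text{test}} - 0.5}
         {\widehat{\text{SE}}\big(\widehat{\text{AUC}}_{\text{test}}\big)},
\qquad z^{2} \;\sim\; \chi^{2}_{1} \ \text{(approximately, under } H_0:\ \text{AUC} = 0.5\text{)}
```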


Page 52: Day 2 Track 2 - IASCT

Example 1: Comparing with Standard AUC of 0.5

In our example, the estimated AUC for PSA is statistically significantly greater than 0.5, providing evidence that the PSA biomarker is useful for distinguishing prostate cancer patients from patients without prostate cancer.


Page 53: Day 2 Track 2 - IASCT

Example 2: Pancreatic cancer, CA-125 and CA19-9 Biomarkers

Pancreatic cancer data (Wieand et al. 1989)

Page 54: Day 2 Track 2 - IASCT

Example 2: Pancreatic cancer, CA-125 and CA19-9 Biomarkers (Contd..)

As displayed in the figure, CA19-9 appears to perform better than CA-125, particularly in the area of the curve representing high specificity. Overall, the AUC of 0.86 for CA19-9 is significantly greater than the AUC of 0.71 for CA-125 (p=0.0065).
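The code for this comparison is not shown in the slides. A sketch following the same PROC LOGISTIC pattern as Example 1 might look like this; the dataset name pancreas and the variable names disease, ca199 and ca125 are assumptions.

```sas
/* Hypothetical dataset PANCREAS: DISEASE (1 = cancer, 0 = control),
   CA199 and CA125 (biomarker values) */
ods graphics on;
proc logistic data=pancreas plots=roc;
  /* NOFIT: no overall model is fitted, only the ROC models below */
  model disease(event='1') = ca199 ca125 / nofit;
  roc 'CA19-9' ca199;
  roc 'CA-125' ca125;
  /* Compare the AUC of CA19-9 against CA-125 as the reference curve */
  roccontrast reference('CA-125') / estimate;
run;
ods graphics off;
```

The ESTIMATE option requests the estimated AUC difference with its standard error, confidence limits and test.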

Page 55: Day 2 Track 2 - IASCT

Q & A


Page 56: Day 2 Track 2 - IASCT


Thank you


Page 57: Day 2 Track 2 - IASCT

Comparison of Multiple SAS Procedures to Perform Statistical Activity for the same Objective

Pradeep Acharya

IASCT – ConSPIC 2011


Page 58: Day 2 Track 2 - IASCT

Contents

• Introduction to hypothetical study and its objectives

• Introduction to Analysis of Covariance (ANCOVA)

• SAS procedures to perform ANCOVA

• ANCOVA results with GLM procedure

• ANCOVA results with MIXED procedure

• Comparison of GLM and MIXED procedures

• Conclusion

• Acknowledgement

• References

Pradeep Acharya | IASCT - ConSPIC 2011 | Bangalore | 30-Sep-2011 2


Page 59: Day 2 Track 2 - IASCT

Introduction to hypothetical study & its objectives

A multi-center, randomized, double-blind, 24-week study to compare the efficacy and safety of Treatment A with Treatment B in patients with hypertension

Page 60: Day 2 Track 2 - IASCT

Introduction to the study continued...

Inclusion Criteria:

• Clinical diagnosis of hypertension

• Age between 18 years and 75 years

Exclusion Criteria:

• Systolic blood pressure > 170 mmHg and/or diastolic blood pressure > 105 mmHg

• Second- or third-degree atrioventricular block

• Neurological disorders (such as Parkinson's disease, multiple sclerosis or peripheral neuropathy)

Page 61: Day 2 Track 2 - IASCT

Introduction to the study continued...

Primary objective :

• To evaluate the efficacy of Treatment-A versus Treatment-B in patients with hypertension by assessing the reduction in Blood pressure from baseline after 24 weeks of treatment

Secondary objective:

• To evaluate the safety of Treatment-A versus Treatment-B in patients with hypertension over 24 weeks of treatment


Page 62: Day 2 Track 2 - IASCT

Structure of the dataset


Page 63: Day 2 Track 2 - IASCT

Introduction to ANCOVA

Analysis of Covariance (ANCOVA) is a statistical method that combines Analysis of Variance (ANOVA) and regression analysis.

Purposes:

• To increase the precision of comparisons between groups by accounting for variation in important prognostic variables

• To "adjust" comparisons between groups for imbalances in important prognostic variables between these groups
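For the hypothetical study, a one-covariate ANCOVA model can be written as below. The notation is the standard textbook form rather than the speaker's: y_ij is the change in blood pressure for patient j in treatment group i, x_ij is that patient's baseline value, τ_i is the treatment effect and β is the common slope.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^{2}),
\qquad i \in \{A, B\}
```

The adjusted treatment comparison is then the estimate of τ_A - τ_B, that is, the difference between groups at a common baseline value.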


Page 64: Day 2 Track 2 - IASCT

Introduction to ANCOVA continued...


Page 65: Day 2 Track 2 - IASCT

SAS procedures to perform ANCOVA
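The slide content is not captured in this transcript. A minimal sketch of the two procedures applied to the hypothetical hypertension study might look like this; the dataset name bp and the variables chg, base and trt are assumptions.

```sas
/* Hypothetical analysis dataset BP:
   CHG  = change from baseline in blood pressure at week 24
   BASE = baseline blood pressure (covariate)
   TRT  = treatment group (A or B) */

/* ANCOVA with PROC GLM */
proc glm data=bp;
  class trt;
  model chg = trt base / solution;
  lsmeans trt / stderr pdiff cl;   /* adjusted means and their difference */
run;
quit;

/* The same model with PROC MIXED (no RANDOM or REPEATED statement) */
proc mixed data=bp;
  class trt;
  model chg = trt base / solution;
  lsmeans trt / diff cl;           /* adjusted means and their difference */
run;
```

With fixed effects only, the two procedures should give the same adjusted means and treatment difference; they differ mainly in output layout and in how they extend to random effects and repeated measures.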


Page 66: Day 2 Track 2 - IASCT

SAS procedures to perform ANCOVA continued...


Page 67: Day 2 Track 2 - IASCT

ANCOVA results with GLM procedure


Page 68: Day 2 Track 2 - IASCT

ANCOVA results with GLM procedure continued...

ANCOVA results with MIXED procedure

ANCOVA results with MIXED procedure continued...

Comparison of GLM and MIXED procedures

(Comparison table not recovered in this transcript; two of its entries carried the callout "Ignore for ANCOVA".)

Conclusion

• Depending on the programmer’s confidence and the flexibility required, either of these procedures can be used to perform ANCOVA

• Both procedures are robust for performing ANCOVA

Acknowledgement

• Sincere thanks to my manager Priti Pandey and my colleagues for their support and inspiration, which enabled me to attend and present at this conference

Dorfman-Whitlock DO- (DOW-) Loop: A Loop for N-to-One DATA Step Programming

Periasamy K, Novartis Healthcare Pvt. Ltd.

Disclaimer

All opinions expressed in this presentation are the author’s personal views and do not reflect the views or opinions of Novartis.

2 | ConSPIC 2011 | Periasamy K| Sep 30, 2011

Do Until - loop

o The DOW-loop is, at its core, a DO UNTIL loop.

o The name is short for Dorfman-Whitlock DO-loop; the W comes from Whitlock, one of the people credited with the technique.

o The technique moves the DATA step SET statement inside an explicitly coded DO-loop.

o The programmer controls the retention of variable values and the population of the Program Data Vector (PDV) by choosing the break-event that ends the loop.

o A DOW-loop reads every record of a block (by-group) within a single iteration of the DATA step, so variables in the PDV are not re-initialised between those records.

Structure of Do until - Loop

data ... ;
  <statements before the DO-loop> ;
  do <iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <statements inside the loop> ;
    output ;
  end ;
  <statements after the break-event> ;
run ;

Double Do Until- Loop

data ... ;
  <statements before the DO-loop> ;
  do <iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <statements inside the loop> ;
  end ;
  <statements after the break-event> ;
  do <iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <statements inside the loop> ;
    output ;   /* explicit OUTPUT statement */
  end ;
run ;

Dow Loop

o Statements before the loop: steps required before the first record in the by-group is read.

o Statements inside the DOW-loop: executed for each record in the by-group.

o Statements after the loop: steps to be done after the last record in the by-group has been processed.

o Double DOW-loop: the second loop re-reads the by-group and writes the calculated value to every observation in it.

o An explicit OUTPUT statement in the second loop writes every observation to the output dataset; without it, only one observation per by-group is written.
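
The pieces described above can be seen together in a small, self-contained sketch. The dataset visits and the variables total and pct_of_total are hypothetical and are used only for illustration: the first loop accumulates a per-patient total, and the second loop attaches that total to every record of the same patient.

```sas
/* Hypothetical input, already sorted by patient */
data visits;
  input patient visit value;
  datalines;
1 1 10
1 2 20
2 1 5
2 2 15
2 3 30
;
run;

data pct;
  /* First DOW-loop: accumulate the total over the by-group.
     TOTAL is not retained, so it starts as missing for each patient. */
  do until (last.patient);
    set visits;
    by patient;
    total = sum(total, value);
  end;
  /* Second DOW-loop: re-read the same by-group and write every record */
  do until (last.patient);
    set visits;
    by patient;
    pct_of_total = 100 * value / total;
    output;   /* explicit OUTPUT: one row per input record */
  end;
run;
```

Each iteration of the DATA step handles one patient, which is why no RETAIN statement and no separate merge step are needed.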

Useful scenarios

o Baseline and change-from-baseline calculation

o Baseline calculation using a summary statistic (mean of the pre-dose assessments)

o Abnormal value flagging

o Transposing more than one variable in one DATA step

Baseline and change from baseline calculation

o Baseline value is the last non-missing value on or before the treatment start day

o Calculate change from baseline

o Approach 1: Two DATA steps, one for the baseline and one for the change from baseline

o Approach 2: One DATA step with a DOW-loop
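
For comparison with the DOW-loop version, Approach 1 might look like the sketch below. It is an assumed implementation, not the presenter's code: the sample rows in bpp and the intermediate dataset names baseline and base_2step are hypothetical.

```sas
/* Hypothetical input, sorted by PATIENT and DAY */
data bpp;
  input PATIENT DAY SUPSYS;
  datalines;
1 -7 150
1 0 .
1 14 140
2 -7 .
2 0 162
2 14 155
;
run;

/* Step 1: last non-missing value on or before treatment start, one row per patient */
data baseline(keep=PATIENT b_supsys);
  set bpp;
  by PATIENT;
  retain b_supsys;
  if first.PATIENT then b_supsys = .;
  if DAY <= 0 and not missing(SUPSYS) then b_supsys = SUPSYS;
  if last.PATIENT then output;
run;

/* Step 2: merge the baseline back and derive the change */
data base_2step;
  merge bpp baseline;
  by PATIENT;
  if DAY > 0 then c_sysbp = SUPSYS - b_supsys;
run;
```

The DOW-loop approach folds these two steps, and the RETAIN logic, into a single DATA step.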

Baseline and change from baseline in one step

/* Assumes bpp is sorted by PATIENT and, within patient, by DAY */
data base;
  /* First pass: baseline is the last non-missing value on or before treatment start */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY <= 0 and not missing(SUPSYS) then b_supsys = SUPSYS;
  end;
  /* Second pass: change from baseline for each post-baseline record */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY > 0 then c_sysbp = SUPSYS - b_supsys;
    else c_sysbp = .;
    output; /* explicit OUTPUT statement to keep all the observations from the dataset */
  end;
run;

Baseline calculation with summary

o Baseline is the mean of all pre-dose assessments

o Number of non-missing assessments used in the mean

o Minimum and maximum value across all visits for the subject

o Approach 1: One PROC step and two DATA steps

o Approach 2: One DATA step with a DOW-loop

Baseline calculation with summary

/* Assumes bpp is sorted by PATIENT */
data base2;
  max = .; min = .; nb = 0;
  /* First pass: count and sum the non-missing values on or before treatment start,
     and track the minimum and maximum over all visits */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY <= 0 and SUPSYS > . then do;
      nb = nb + 1;
      b_supsys = sum(SUPSYS, b_supsys);
    end;
    if SUPSYS > . then do;   /* skip missing values, which would otherwise become the minimum */
      if max = . or SUPSYS > max then max = SUPSYS;
      if min = . or SUPSYS < min then min = SUPSYS;
    end;
  end;
  /* Baseline is the mean of the pre-dose values */
  if nb > 0 then s_supsys = round(b_supsys / nb, 0.001);
  /* Second pass: change from baseline on every post-dose record */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY > 0 then c_sysbp = SUPSYS - s_supsys;
    output;
  end;
run;

Output dataset

o Every record carries max, min, the number of pre-dose values (nb), and the pre-dose mean (s_supsys).

Abnormal value flagging

o Flag subjects with an abnormal BP value (< 120)

o Flag ‘Y’ for a subject with at least one abnormal BP value

o Flag ‘N’ for a subject with normal BP values only

o Approach 1: Two DATA steps. Step 1: flag the individual abnormal values; Step 2: merge with the source dataset

o Approach 2: PROC SQL with a subquery

o Approach 3: One DATA step with a DOW-loop
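
Approach 2 is not shown on the slides; a possible version is sketched here so it can be set against the DOW-loop code. The sample rows in bpp2 and the output name base_sql are hypothetical.

```sas
/* Hypothetical input */
data bpp2;
  input PATIENT DAY SUPSYS;
  datalines;
1 1 118
1 2 130
2 1 125
2 2 .
;
run;

proc sql;
  create table base_sql as
  select a.*,
         case
           when a.PATIENT in
                (select PATIENT
                   from bpp2
                  where SUPSYS is not null and SUPSYS < 120)
           then 'Y'
           else 'N'
         end as LOW length=1
    from bpp2 as a
   order by a.PATIENT, a.DAY;
quit;
```

The explicit IS NOT NULL test matters because SAS treats a missing numeric value as smaller than any number, so a missing reading would otherwise count as below 120.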

Abnormal value flagging

data base;
  /* First pass: note whether the subject has any non-missing BP value below 120 */
  do until (last.patient);
    set bpp2;
    by PATIENT;
    if 120 > SUPSYS > . then miss = 1;
  end;
  /* Second pass: carry the subject-level flag (Y or N) onto every observation */
  do until (last.patient);
    set bpp2;
    by PATIENT;
    if miss = 1 then LOW = "Y";
    else LOW = "N";
    output;
  end;
run;

Output dataset

o Every observation for the subject carries the flag

Transpose more than 1 variable in 1 data step

o Transpose 3 variables by group

o The summary dataset variables (N, Mean, Median) need to be transposed by the group variable

o Approach 1: Three PROC TRANSPOSE steps and one DATA step merge

o Approach 2: One DATA step with a DOW-loop
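
Approach 1 could be written roughly as follows. This is an assumed sketch rather than the presenter's code: the sample rows in trns and the intermediate names t_n, t_mean, t_median and trans1 are hypothetical.

```sas
/* Hypothetical summary dataset: three treatment rows per group, sorted by group */
data trns;
  input group treatmnt $ SUPSYS_N SUPSYS_Mean SUPSYS_Median;
  datalines;
1 A 10 130.2 129
1 B 11 128.4 127
1 C 9 131.0 132
2 A 10 125.1 124
2 B 12 126.3 126
2 C 10 124.8 125
;
run;

/* One PROC TRANSPOSE per statistic; PREFIX= yields n1-n3, Mean1-Mean3, Median1-Median3 */
proc transpose data=trns out=t_n(drop=_name_) prefix=n;
  by group;
  var SUPSYS_N;
run;

proc transpose data=trns out=t_mean(drop=_name_) prefix=Mean;
  by group;
  var SUPSYS_Mean;
run;

proc transpose data=trns out=t_median(drop=_name_) prefix=Median;
  by group;
  var SUPSYS_Median;
run;

/* Bring the three transposed pieces back together */
data trans1;
  merge t_n t_mean t_median;
  by group;
run;
```

Four steps are needed here, against the single DATA step of the DOW-loop approach.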

(The source dataset and the transposed dataset were shown as screenshots, which are not recovered in this transcript.)

Transpose more than 1 variable in 1 data step

data Trans2(drop=i treatmnt SUPSYS_N SUPSYS_Mean SUPSYS_Median);
  /* Define one array for each variable to transpose.
     The array names avoid clashing with the MEAN and MEDIAN functions. */
  array n_arr n1-n3;
  array mean_arr Mean1-Mean3;
  array med_arr Median1-Median3;
  /* i counts the records within the by-group and indexes the arrays;
     at most three records per group are expected */
  do i = 1 by 1 until (last.group);
    set trns;
    by group;
    n_arr(i) = SUPSYS_N;
    mean_arr(i) = SUPSYS_Mean;
    med_arr(i) = SUPSYS_Median;
  end;
  output; /* one record per group */
run;

Questions ?

THANK YOU!

Strategic Engagements , Integrating Excellence !

Hashing Unleashed!!!!

Pratibha Jalui

Alliance Manager,

SCEDAM (SBU of SIRO Clinpharm Pvt. Ltd)

Introduction

Merging datasets is a common and, at times, challenging task.

The popular approaches (sort and merge, SQL joins) have limitations: performance declines and turnaround time increases as data volumes grow.

This matters in today’s context: with large volumes of clinical data, efficient merging is critical.

Introduction

Hash tables can help . . .

Achieve great I/O efficiencies

Enormous time savings when merging.

Fast, easy way to perform lookups without sorting or indexing.

Consume memory only as needed, at run time.

What is a Hash object ?

In-memory lookup table accessible from the DATA step.

Loaded with records, only available from the DATA step that creates it.

Two parts:

– Key part: one or more character and numeric values.

– Data part: zero or more character and numeric values.

How does a hash object work?

Once a hash object is loaded with records, a lookup occurs by passing a key to the hash object's FIND method.

If a record with the particular key is found, the data part of the record is copied into DATA step variables.

In addition to being able to add and find records, there are methods to replace records, remove records, and output records to a data set.
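
The methods named above can be tried out in a few lines. The sketch below builds a small hash object in code and calls ADD, REPLACE, FIND, REMOVE and OUTPUT in turn. The variable names and the output dataset work.hash_contents are hypothetical.

```sas
data _null_;
  length name $ 8 weight 8;
  declare hash h();
  h.defineKey('name');
  h.defineData('name', 'weight');
  h.defineDone();

  /* Add two records */
  name = 'John';  weight = 160; rc = h.add();
  name = 'Alice'; weight = 130; rc = h.add();

  /* Replace the data part stored for an existing key */
  name = 'John';  weight = 155; rc = h.replace();

  /* Look up a key: on success (rc = 0) the data part is copied into NAME and WEIGHT */
  weight = .;
  rc = h.find(key: 'Alice');
  put 'Alice lookup: ' rc= weight=;

  /* Remove a record, then write what is left in the hash object to a dataset */
  rc = h.remove(key: 'Alice');
  rc = h.output(dataset: 'work.hash_contents');
run;
```

Each method returns zero on success, which is why the return code is checked after a lookup.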

Syntax for Hashing

data New_Dataset_Name;
  length variable1 format1 … variableN $ formatN;
  if _N_ = 1 then do;
    declare hash h(dataset:'lookup_dataset_name');
    h.defineKey('common_variable');
    h.defineData('variable1', …, 'variableN');
    h.defineDone();
    call missing(variable1, …, variableN);
  end;
  set Input_Dataset;
  if h.find() = 0 then output;
run;

Applications of Hashing

Example to show how a hash object is declared, loaded, and how lookups occur.

Dataset 01: Participants.sas7bdat

NAME     GENDER  TREATMENT
John     M       Placebo
Ronald   M       Drug-A
Barbara  F       Drug-B
Alice    F       Drug-A

Applications of Hashing

Dataset 02: Weight.sas7bdat

DATE      NAME     WEIGHT
5-May-06  Barbara  125
5-May-06  Alice    130
5-May-06  Ronald   170
5-May-06  John     160
4-Jun-06  Barbara  122
4-Jun-06  Alice    133
4-Jun-06  Ronald   168
4-Jun-06  John     155

Applications of Hashing

Objective: To merge the name, gender, and treatment with the weight data for a later analysis. The expected result dataset, e.g. Results.sas7bdat, should look like this:

NAME     TREATMENT  GENDER  DATE      WEIGHT
Barbara  Drug-B     F       5-May-06  125
Alice    Drug-A     F       5-May-06  130
Ronald   Drug-A     M       5-May-06  170
John     Placebo    M       5-May-06  160
Barbara  Drug-B     F       4-Jun-06  122
Alice    Drug-A     F       4-Jun-06  133
Ronald   Drug-A     M       4-Jun-06  168
John     Placebo    M       4-Jun-06  155

Applications of Hashing

How to do it using hashing

data results;
  length name treatment $ 8 gender $ 1;
  if _N_ = 1 then do;
    declare hash h(dataset: 'participants');
    h.defineKey('name');
    h.defineData('gender', 'treatment');
    h.defineDone();
    call missing(gender, treatment);
  end;
  set weight;
  if h.find() = 0 then output;
run;

Steps to define a hash object

STEP 01: LENGTH STATEMENT

length name treatment $ 8 gender $ 1;

Each variable used to define the hash table must be defined in a LENGTH statement.

Otherwise SAS raises an error such as: ERROR: Variable <variable name> has been defined as both character and numeric.

Steps to define a hash object

STEP 02: DECLARE THE HASH OBJECT

declare hash h(dataset:'participants');

The DECLARE (or DCL) statement creates an object.

The keyword HASH is specified after DECLARE.

To manipulate the hash object in the code, it must be given a name (here h) after the keyword HASH.

'participants' is the name of the SAS® dataset from which the hash table h is populated. It must be enclosed in single or double quotation marks.

Steps to define a hash object

STEP 03: DEFINE THE KEY FOR THE HASH OBJECT

rc = h.defineKey('name');  OR  h.defineKey('name');

DefineKey is the method used to define the hash table’s key.

h is the name of the hash object.

rc is a numeric variable that stores the return code from executing the DefineKey method.

'name' is the column from the dataset 'participants' that is used as the hash table key. It must be enclosed in single or double quotation marks.

Steps to define a hash object

STEP 04: DEFINE THE DATA VALUES FOR THE HASH OBJECT

rc = h.defineData('gender', 'treatment');  OR  h.defineData('gender', 'treatment');

DefineData is the method used to define the values returned when the hash table is searched.

It is optional.

'gender' and 'treatment' are the columns from the dataset 'participants' that are used as the hash table’s data values. They must be enclosed in single or double quotation marks.

Steps to define a hash object

STEP 05: CONCLUDE THE HASH OBJECT DECLARATION

rc = h.defineDone();  OR  h.defineDone();

DefineDone is the method used to conclude the declaration of the hash table.

STEP 06: INITIALIZE THE VARIABLES USED IN THE HASH OBJECT TO MISSING VALUES

call missing(gender, treatment);

This suppresses the following notes in the SAS® log:

NOTE: Variable gender is uninitialized.

NOTE: Variable treatment is uninitialized.

Steps to define a hash object

STEP 07: SEARCHING THE HASH OBJECT

rc = h.find();

Find is the method used to search the hash table.

A return code of zero indicates that the Find method succeeded, i.e. the key was found.

The key can also be passed explicitly: h.find(key: variable_name);

NOTE: Only variables listed in the DefineKey and DefineData methods are stored in the hash table.
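
The sketch below shows one way to use the return code and the explicit key form together: every weight record is kept and a flag shows whether the lookup succeeded. The sample rows are a simplified, hypothetical copy of the two datasets (including an unmatched name, Peter), and results_all and matched are names invented for this sketch.

```sas
data participants;
  length name $ 8 gender $ 1 treatment $ 8;
  input name gender treatment;
  datalines;
John M Placebo
Ronald M Drug-A
Barbara F Drug-B
Alice F Drug-A
;
run;

data weight;
  length name $ 8;
  input name weight;
  datalines;
Barbara 125
Alice 130
Peter 181
;
run;

data results_all;
  length name treatment $ 8 gender $ 1;
  if _N_ = 1 then do;
    declare hash h(dataset: 'participants');
    h.defineKey('name');
    h.defineData('gender', 'treatment');
    h.defineDone();
    call missing(gender, treatment);
  end;
  set weight;
  rc = h.find(key: name);   /* key passed explicitly */
  matched = (rc = 0);       /* 1 when the name was found, 0 otherwise */
  drop rc;
run;
```

Because there is no subsetting on the return code, unmatched weight records stay in the output with GENDER and TREATMENT missing, much like a left join.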

Applications of Hashing

Example to ADD, REPLACE, AND OUTPUT using hashing

Objective: To concatenate each player's goal times into a comma-separated list and output the results to a dataset.

Input dataset: goals.sas7bdat

PLAYER  WHEN
Hill    1st 01:24
Jones   1st 09:43
Santos  1st 12:45
Santos  2nd 00:42
Santos  2nd 03:46
Jones   2nd 11:15

Applications of Hashing

Example to ADD, REPLACE, AND OUTPUT using hashing

Output dataset: goal_summary

PLAYER  GOALS_LIST
Hill    1st 01:24
Santos  1st 12:45, 2nd 00:42, 2nd 03:46
Jones   1st 09:43, 2nd 11:15

Applications of Hashing

Example: ADD, REPLACE, AND OUTPUT using hashing

data _null_;
  length goals_list $ 64;
  if _N_ = 1 then do;
    declare hash h();
    h.defineKey('player');
    h.defineData('player', 'goals_list');
    h.defineDone();
  end;
  set goals end=done;
  if h.find() ^= 0 then do; /* key variable player has to exist in goals */
    goals_list = when;
    h.add();
  end;
  else do;
    goals_list = trim(goals_list) || ', ' || when;
    h.replace();
  end;
  if done then h.output(dataset:'goal_summary');
run;

Benefits of Hashing

Processing time for huge datasets can be reduced by almost 90%.

Key lookup occurs in memory, avoiding costly disk access.

When a key lookup occurs, only a small subset of the records is searched, which keeps each lookup cheap.

The hash object allocates memory as records are added, so memory is used efficiently.

When loading a hash object from a dataset, the dataset need not be sorted or indexed.

A smart strategy for programming!

Common messages & missteps of Hashing

NOTE: VARIABLE XXXX IS UNINITIALIZED.

To avoid these notes, use the CALL MISSING routine to initialize the variables to missing.

ERROR: UNDECLARED DATA SYMBOL XXXX FOR HASH OBJECT AT LINE N COLUMN M.

Ensure there is a LENGTH statement for the variable names passed to DEFINEKEY and DEFINEDATA, and give those variables an initial value as well.

Common messages & missteps of Hashing

ERROR 559-185: INVALID OBJECT ATTRIBUTE REFERENCE XX.YY.

Ensure there are parentheses after the FIND method.

ERROR 557-185: VARIABLE XXXX IS NOT AN OBJECT.

Ensure the name of the hash object (H) is not misspelled in any of the method calls.

DUPLICATE KEYS

Ensure that the keys are unique.
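
Where duplicate keys are expected rather than a mistake, SAS 9.2 lets the hash object hold several records per key. The sketch below assumes the MULTIDATA: argument and the FIND_NEXT method of SAS 9.2; the sample rows and the dataset name all_weights are hypothetical.

```sas
data participants;
  length name $ 8 treatment $ 8;
  input name treatment;
  datalines;
Alice Drug-A
Barbara Drug-B
;
run;

/* Several weight records per name, so NAME is not a unique key here */
data weight;
  length name $ 8;
  input name weight;
  datalines;
Barbara 125
Alice 130
Barbara 122
Alice 133
;
run;

data all_weights;
  length name $ 8 weight 8;
  if _N_ = 1 then do;
    /* multidata:'y' keeps every record for a key instead of only the first */
    declare hash w(dataset: 'weight', multidata: 'y');
    w.defineKey('name');
    w.defineData('weight');
    w.defineDone();
    call missing(weight);
  end;
  set participants;
  rc = w.find();              /* first record for this name */
  do while (rc = 0);
    output;
    rc = w.find_next();       /* next record with the same key, if any */
  end;
  drop rc;
run;
```

If duplicates should instead be treated as an error, the DUPLICATE: 'error' argument on the DECLARE statement is the documented way to have SAS report them when the dataset is loaded.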

Conclusion

The SAS hash object is much more than an in-memory table look-up.

When it comes to very large datasets and multiple common variables (keys), hashing is one of the most efficient and viable solutions.

SAS programmers should see the hash object as an alternative to any PROC SQL join or DATA step merge.

The only limit on the amount of data that can be loaded into a hash object is the amount of memory available to the SAS session.

Contact information

Your comments, suggestions, and questions are valued and encouraged.

Contact the author at:

Pratibha Jalui

SIRO Clinpharm Pvt. Ltd.

Email id: [email protected]

Any questions?

Proc Transpose for Horizontal Data

Author: Ramesh Sundaram
Date: 30-Sep-2011
Company: Novartis Healthcare Pvt. Ltd.

2 | ConSPIC 2011 | Ramesh sundaram | Sep 29 – 30, 2011

Disclaimer

All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis

Outline

Proc Transpose

Horizontal data

Vertical data

Baseline calculation for Horizontal data
• Without Transpose
• With Transpose

Summary Report for Horizontal data
• Without Transpose
• With Transpose

Conclusion

Proc Transpose

Converts horizontal data into vertical data, or vice versa.

Syntax

PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET> <NAME=name> <OUT=output-data-set> <PREFIX=prefix>;

BY <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;

COPY variable(s);

ID variable;

IDLABEL variable;

VAR variable(s);
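
A small worked example may make the syntax easier to read. The sketch assumes a hypothetical horizontal vital-signs dataset vs_h with one row per subject and one column per visit; it is not the dataset shown on the slides.

```sas
/* Hypothetical horizontal data: one row per subject, one column per visit */
data vs_h;
  input subjid $ sysbp_v1 sysbp_v2 sysbp_v3;
  datalines;
101 120 118 115
102 135 . 128
;
run;

/* Horizontal to vertical: one row per subject and visit column */
proc transpose data=vs_h
               out=vs_v(rename=(col1=sysbp))
               name=visit_var;
  by subjid;
  var sysbp_v1-sysbp_v3;
run;
```

NAME= sets the name of the column that holds the original variable names, and the transposed values arrive in COL1, which is renamed here. The input must be sorted by the BY variable unless NOTSORTED is specified.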

Horizontal data

VS: Vital sign dataset

Vertical data

VS: Vital sign dataset

Baseline calculation for Horizontal data

Baseline calculation (LOCF) for the following data as input

Baseline calculation for Horizontal data

Output: (screenshot not recovered in this transcript)

Code without Transpose: Method1

Code without Transpose: Method 2

Code with Transpose: Method 3

Advantages of method 3 over 1 & 2

Simple

No redundancies

Macros/Macro functions are not used

Loops are not used
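
The code for the three methods did not survive in this transcript. The sketch below is therefore not the presenter's Method 3. It is only an assumed illustration of how a transpose-based LOCF baseline can avoid macros and loops, using a hypothetical dataset vs_h with screening columns scr1 to scr3.

```sas
/* Hypothetical horizontal input: pre-dose assessments in columns */
data vs_h;
  input subjid $ scr1 scr2 scr3;
  datalines;
101 120 118 .
102 135 . .
103 . . .
;
run;

/* Turn the visit columns into rows */
proc transpose data=vs_h out=vs_v(rename=(col1=value)) name=visit;
  by subjid;
  var scr1-scr3;
run;

/* Baseline (LOCF): last non-missing pre-dose value per subject */
data baseline(keep=subjid baseline);
  set vs_v;
  by subjid;
  retain baseline;
  if first.subjid then baseline = .;
  if not missing(value) then baseline = value;
  if last.subjid then output;
run;
```

Once the data are vertical, the same two steps work unchanged when more visit columns are added; only the VAR list needs to grow.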

Summary Report for Horizontal data

Code without Transpose: Method1

Code without Transpose: Method 2

Code with Transpose: Method 3

Conclusion

Applying the TRANSPOSE procedure to horizontal data makes the program simpler and more efficient in the following scenarios:
- Baseline calculation
- Endpoint calculation
- Summary reports

From a programming perspective, vertical data is more flexible than horizontal data in the above scenarios.

Let’s go for a Picture

by SenthilKumar

@ ConSPIC, September 29 – 30, 2011

What is a Format?

Formats:

- contain the instructions used by the SAS System to display, or portray, the values of variables

- fall into two broad classes of SAS formats:

1. VALUE - either “supplied” or “internal” to the SAS System; assigns a label or text string to a range of values

2. PICTURE - creates a series of templates for displaying the data

• What is Picture Format?
• Why do we need it?
• How to use it?
• Options in Picture Format
• What’s happening behind?
• Getting into the picture with examples
• Challenges of using Picture Format

What is Picture Format?

• Picture format: creates a template that is used to display the values of a variable

• Template: controls how the values are displayed in our SAS-generated output

• Picture formats are created with PROC FORMAT in the same way as value formats.

• As with value formats, PROC FORMAT tools such as FMTLIB, CNTLIN, CNTLOUT and MULTILABEL can be used with picture formats

Why do we need a Picture Format?

(The slide compared the default output with what the mockup requires; the screenshots are not recovered in this transcript.)

Why do we need a Picture Format? (contd…)

(Code screenshot not recovered in this transcript; its numbered callouts are listed below.)

1. Name of the picture format

2. Range of values to which the format will be applied (0, 1-9)

3. PREFIX option, which, like the DEFAULT option

4. The template, showing a series of digit selectors

5. Default length of the picture format: 16 characters
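
Since the annotated code is lost, here is a hypothetical picture format in which the five components listed above can be pointed out. The format name pctfmt, the range and the template are assumptions made for this sketch, not the presenter's original.

```sas
proc format;
  picture pctfmt (round default=8)   /* 1: format name, 5: default length */
    0 - high                         /* 2: range of values the picture applies to */
      = '0009.9%)'                   /* 4: template with digit selectors and trailing text */
        (prefix='(');                /* 3: PREFIX= option */
run;

data _null_;
  do pct = 5.26, 45.678, 100;
    put pct= pct pctfmt.;
  end;
run;
```

The template has four digit positions before the decimal point, although percentages need at most three. The spare position leaves room for the prefix, which would otherwise be dropped for a value of 100.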

What’s happening behind the Picture Format?

(Illustration and output not recovered in this transcript.)

Options in Picture Format...

Control the attributes of each picture in the format

FILL= Specify a character that completes the formatted value.

MULTIPLIER= Specify a number to multiply the variable's value by before it is formatted.

NOEDIT Specify that numbers are message characters rather than digit selectors.

PREFIX= Specify a character prefix for the formatted value.
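
The picture-level options can be tried with a short sketch. The format names thous and pay and the test values are hypothetical; the pay template is the one used later in the Challenges slide.

```sas
proc format;
  /* MULT=: the value is multiplied by 0.001 before its digits are placed
     in the template; the trailing K is literal text */
  picture thous (round)
    0 -< 1000   = '009'
    1000 - high = '000,009K' (mult=.001);

  /* FILL= and PREFIX=: fill the unused leading positions and add a currency sign */
  picture pay
    low - high = '00,000,000.00' (fill='*' prefix='$');
run;

data _null_;
  do amount = 250, 12345, 1259.45;
    put amount= amount thous. amount pay.;
  end;
run;
```

Running the DATA step writes each amount to the log under both formats, which makes the effect of each option easy to compare.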

Options in Picture Format... (contd..)

Control the attributes of the format

DATATYPE= Specify that you can use directives in the picture as a template to format date, time, or datetime values.

DEFAULT= Specify the default length of the format.

DECSEP= Specify the separator character for the fractional part of a number.

DIG3SEP= Specify the three-digit separator character for a number.

FUZZ= Specify a fuzz factor for matching values to a range.

MAX= Specify a maximum length for the format.

MIN= Specify a minimum length for the format.

MULTILABEL Specify multiple pictures for a given value or range and for overlapping ranges.

NOTSORTED Store values or ranges in the order that you define them.

ROUND Round the value to the nearest integer before formatting.
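
DATATYPE= is perhaps the least obvious option in this list, so a short sketch follows. The format name isodt and the variable visit_date are hypothetical; %Y, %0m and %0d are date directives for the four-digit year and the zero-padded month and day.

```sas
proc format;
  /* Single quotes keep the % directives away from the macro processor */
  picture isodt (default=10)
    low - high = '%Y-%0m-%0d' (datatype=date);
run;

data _null_;
  visit_date = '30SEP2011'd;
  put visit_date= isodt.;   /* expected to print the date as 2011-09-30 */
run;
```

DEFAULT=10 is set explicitly so that the output width matches the ten characters of a yyyy-mm-dd date.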

Getting into the Picture..

Picture - 1: Frequency Report

(Mockup requirement and output screenshots not recovered in this transcript.)

Getting into the Picture..

Picture - 2: Summary Report

(Mockup requirement screenshot not recovered in this transcript.)

Challenges..

Any text in front of the first digit selector is ignored.

Ex : low - high = 'XX999) - 999 – 9999'

To include a text message rather than digit selectors, use the NOEDIT option. Otherwise SAS interprets any digits in the text as digit selectors.

Ex : picture miles 1-1000 = '0000' 1000<-high = ' >1000 miles' (noedit);

If you use the FILL= and PREFIX= options in the same picture, then the format places the prefix and then the fill characters.

Ex : low-high='00,000,000.00' (fill='*' prefix='$'); Output : ****$1,259.45

A nonzero digit selector (such as 9) prints leading zeros; the digit selector 0 leaves leading zeros blank.

Challenges..

Use the ROUND option for continuous data so that values are rounded rather than truncated; by default SAS truncates the value.

It is easy to produce misleading output with a PICTURE format if the template does not accommodate the data, or if the specified ranges do not capture all the data values.

Thank You!

PICTURE FORMAT is a powerful tool for displaying data. (If handled with CARE)

Page 157: Day 2 Track 2 - IASCT

CONFIDENTIALCorporate Presentation- 1 -

Electronic Validation For Clinical Data And Summary Reports.

Amruta Pathak, Bhushan Kulkarni

Page 158: Day 2 Track 2 - IASCT

Agenda

• Validation
• Electronic Validation
• Comparing Proc Report Datasets
• Examples of Report dataset
• Use of “diff” command in Unix
• Example of output by “diff” command
• Other methods of Electronic Validation
• Advantages of Electronic Validation

Page 159: Day 2 Track 2 - IASCT

Validation

• Importance
  – High Quality deliverables
  – Required at different phases of clinical trial study
  – General Practice
• Create two reports and compare
• Methods
  – Traditional method
    • Manual checks
    • Spot checks (Random Checks), Cross check reports
  – Programming method
    • Use of SAS procedures
    • 100 % check

Page 160: Day 2 Track 2 - IASCT

Electronic Validation

• Required for multiple runs
• No manual checks
• Input 1 (1st line report) and Input 2 (QC report) ---------- Difference between two reports
• Various ways of automated approach
  – Comparing Proc Report Datasets
  – Use of “diff” command on UNIX
  – Use of customized shells
  – Use of SAS/ASSIST
  – Use of SAS Format

Page 161: Day 2 Track 2 - IASCT

Comparing Proc Report Datasets

• Most common procedure to produce reports: PROC REPORT
• Proc Report Dataset: the dataset going into the SAS procedure PROC REPORT
• Compare the two datasets by PROC COMPARE
• Transforming report to dataset
• Need to define the specification of Proc Report Dataset
• QC report needs to be in proper format
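A minimal sketch of the comparison step, assuming two small report datasets built independently by the production and QC programmers. The dataset and variable names (rpt_prod, rpt_qc, rowlabel, col1) and the values are hypothetical.

```sas
/* Hypothetical first-line (production) report dataset */
data work.rpt_prod;
   length rowlabel $20 col1 $12;
   rowlabel = 'Male';   col1 = '12 (48.0)'; output;
   rowlabel = 'Female'; col1 = '13 (52.0)'; output;
run;

/* Hypothetical QC report dataset with one deliberate discrepancy */
data work.rpt_qc;
   length rowlabel $20 col1 $12;
   rowlabel = 'Male';   col1 = '12 (48.0)'; output;
   rowlabel = 'Female'; col1 = '13 (52.1)'; output;
run;

/* Observation-by-observation comparison of the two report datasets */
proc compare base=work.rpt_prod compare=work.rpt_qc listall;
run;

/* SYSINFO is set by PROC COMPARE; 0 means no differences were found */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

Because both datasets must share the same variables, lengths and row order, the report dataset specification mentioned above needs to be agreed before programming starts.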

Page 162: Day 2 Track 2 - IASCT

Example of Reports-1

Report 1

Report 2

Page 163: Day 2 Track 2 - IASCT

Output by Comparing Proc Report Datasets

Report 1 Dataset

Report 2 Dataset

Page 164: Day 2 Track 2 - IASCT

Proc Compare Output

Page 165: Day 2 Track 2 - IASCT

Example of reports-2

Report 1

Report 2

Page 166: Day 2 Track 2 - IASCT

Output by Comparing Proc Report Datasets

Report 1 Dataset

Report 2 Dataset

Page 167: Day 2 Track 2 - IASCT

Example of listing

Listing 1

Listing 2

Page 168: Day 2 Track 2 - IASCT

Validation of listing

Listing 1 Dataset

Listing 2 Dataset

Page 169: Day 2 Track 2 - IASCT

Use of “diff” command in Unix

• Need to provide paths for the two reports
• Does a line by line comparison
• Difference of a space is also captured
• Used to compare various versions of same code
• Used during huge number of re-runs

Page 170: Day 2 Track 2 - IASCT

Example of output by “diff” command

Page 171: Day 2 Track 2 - IASCT

Other Methods of Electronic Validation

• Use of customized shells
  – If multiple tables have the same template
  – Reports created by Macros and validated
  – Macros written for validation codes
  – Proc Compare of the input data to the Macro
• Use of SAS/ASSIST
  – SAS/ASSIST provides tools for producing output through report writing and graphing, which help facilitate the verification process
  – To validate listings, Proc Prints of the data set(s) being used are compared to the listing output
  – The listing output is created with a Data _null_ step
  – The same data is printed using a Proc Print defined through SAS/ASSIST
  – Assurance comes from spot-checking the Proc Print against the Data _null_ output
  – Summary tables can be verified using the interactive Proc Tabulate definition through SAS/ASSIST

Page 172: Day 2 Track 2 - IASCT

Other Methods of Electronic Validation

• Use of SAS Format
  – The SAS solution described here involves a macro and a user-defined format that perform validation checks on a data set and then produce a report.
  – Create several formats that contain validation checks. E.g. the numeric formats qcpat, qclab, and qcdose contain collections of checks specific to the data sets PATIENTS, LABS, and DOSE, respectively.
  – You can add or delete validation checks simply by modifying the format.
  – The format library containing the validation checks and the data sets can reside in separate data libraries.
  – Each format is an independent collection of Boolean expressions pertinent to a specific SAS data set, used for the sole purpose of validating it.
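One way to picture the format-driven approach is the sketch below. It is an illustration under assumptions, not the referenced authors' macro: the format qcpat follows the naming on the slide, while the checks themselves, the PATIENTS test data, and the macro name %run_checks with its parameters are hypothetical.

```sas
/* Each label is a Boolean expression that is true when a record FAILS a check */
proc format;
   value qcpat
      1 = "age < 18 or age > 90"
      2 = "sex not in ('M','F')";
run;

/* Hypothetical PATIENTS data with two deliberate problems */
data work.patients;
   input patid age sex $;
   datalines;
1 45 M
2 17 F
3 60 X
;
run;

/* Hypothetical driver macro: turns each format label into an IF condition */
%macro run_checks(data=, fmt=, nchecks=);
   %local i expr;
   data work.qc_findings;
      set &data;
      length check $60;
      %do i = 1 %to &nchecks;
         %let expr = %sysfunc(putn(&i, &fmt..));
         if &expr then do;
            check = "&expr";
            output;
         end;
      %end;
   run;

   proc print data=work.qc_findings;
   run;
%mend run_checks;

%run_checks(data=work.patients, fmt=qcpat, nchecks=2)
```

Adding or dropping a check then only means editing the format, which is the maintenance benefit the slide points to.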

Page 173: Day 2 Track 2 - IASCT

Advantages of Electronic Validation

• Saves time when there is a huge number of re-runs
• Can be applicable for Pooled study
• Less risk of manual errors
• More accurate for Huge data
• Reduces the time to market

Page 174: Day 2 Track 2 - IASCT

References:

http://www2.sas.com/proceedings/sugi31/018-31.pdf
http://www.phuse.eu/download.aspx?type=cms&docID=536
http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER230.PDF
http://www.nesug.org/Proceedings/nesug97/advtut/gerlach.pdf

Acknowledgement:

• We gratefully acknowledge the support provided by the TCS Management Team and IASCT for providing this opportunity.

Contact:
• [email protected]
• [email protected]

Page 175: Day 2 Track 2 - IASCT

Thank You.!

Page 176: Day 2 Track 2 - IASCT

Why should stats have all the fun?

Priya Iyer, Neha Singh and Anand Boopalan
Oncology Biometrics, Hyderabad

Page 177: Day 2 Track 2 - IASCT

Disclaimer

All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis

ConSPIC 2011 | Priya Iyer, Neha Singh & Anand Boopalan | Sep 29 – 30, 2011

Page 178: Day 2 Track 2 - IASCT

Agenda

• Quality concerns and the facts
• The Way Forward
• Are we First Time Right? - Let’s review
• Explore the story behind the statistical procedures
• What to keep in mind - A checklist

Page 179: Day 2 Track 2 - IASCT

What makes them say... “This report is not what I want”

• Alignment & Indentation
• Spelling Mistakes
• Missing Titles and Footnotes
• Truncation
• Precision & Accuracy
• Formatting
• Data used
• Logical concept
• Reports not making sense
• Reporting measures
• Procedures used
• Information displayed

Page 180: Day 2 Track 2 - IASCT

The fact

Science: Are we sure of the science behind the analysis?
• Reports cannot always be generalized.

Reports: Do we understand what our reports are trying to say?
• Every report tells a story and has its own importance.

Procedures: Are we always sure of what is produced when STAT procedures are run?
• We may be unaware of the concepts behind the SAS procedures, and sometimes unsure what to choose from the large volume of output these procedures display.

Page 181: Day 2 Track 2 - IASCT

The way forward

Today, we shall focus on
• Helping the programmer with primary checks on SAS/STAT based reports before sending them out for review
  - Showcasing the common mistakes made while displaying summary statistics, counts and percentages
  - An overview of some advanced statistical procedures such as LIFETEST, LOGISTIC and PHREG
  - Recommending which information to select from what these procedures produce
  - Some hand-check tips for verifying small statistical concepts

Page 182: Day 2 Track 2 - IASCT

Are we first time right?

Precision: Up to what level?

Indentation on decimal points?

Mean? Isn’t that looking strange

Summaries speak a lot of information

Page 183: Day 2 Track 2 - IASCT

Are we first time right?

Reporting need not be generalized

Page 184: Day 2 Track 2 - IASCT

Are we first time right?

Are we missing some stars here?

A report is complete when everything speaks the same language

Page 185: Day 2 Track 2 - IASCT

Explore the story behind statistical procedures

PROC LOGISTIC

PROC LIFETEST

PROC PHREG

Page 186: Day 2 Track 2 - IASCT

Reports and Procedures - Using Odds ratio

Required: To find the number of responses in each arm and also calculate the Odds Ratio

Page 187: Day 2 Track 2 - IASCT

Programming Steps

Procedures that could be used

PROC FREQ    PROC LOGISTIC

Page 188: Day 2 Track 2 - IASCT

Confirm your output is right

Odds in Arm A = 1/(5-1) = 1/4 = 0.25
Odds in Arm B = 20/(76-20) = 20/56 = 0.357
Odds Ratio = 0.25/0.357 = 0.700
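The hand check above can be reproduced alongside a procedure-based estimate. This is a sketch: the counts (1 of 5 in Arm A, 20 of 76 in Arm B) come from the slide, while the dataset and variable names (resp, arm, response, count) are assumptions.

```sas
/* Cell counts derived from the slide: responders and non-responders per arm */
data work.resp;
   input arm $ response count;
   datalines;
A 1 1
A 0 4
B 1 20
B 0 56
;
run;

/* Hand check, mirroring the arithmetic on the slide */
data _null_;
   odds_a     = 1 / (5 - 1);
   odds_b     = 20 / (76 - 20);
   odds_ratio = odds_a / odds_b;
   put odds_a= 5.3 odds_b= 5.3 odds_ratio= 5.3;
run;

/* ORDER=DATA keeps Arm A and response=1 in the first row and column,
   so the reported odds ratio is A versus B for responders */
proc freq data=work.resp order=data;
   weight count;
   tables arm * response / relrisk;
run;
```

If the odds ratio reported by PROC FREQ does not agree with the hand calculation, the row or column ordering is the first thing to check.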

Page 189: Day 2 Track 2 - IASCT

Some facts on the procedures

Choose the right information from the output generated, or you may end up displaying the wrong information
• Don’t confuse Relative Risk with the Odds Ratio
  OR = ad/bc    RR = (a/(a + b))/(c/(c + d))

Defining the right values
- Event = 1 and Non-event = 0

Response level ordering
- PROC LOGISTIC by default models the probability of the response level with the lower ordered value
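Because of the default response ordering, it helps to name the event level explicitly. The sketch below assumes the same hypothetical counts dataset as before (names resp, arm, response, count are illustrative) and uses Arm B as the reference level.

```sas
data work.resp;
   input arm $ response count;
   datalines;
A 1 1
A 0 4
B 1 20
B 0 56
;
run;

proc logistic data=work.resp;
   freq count;
   /* Arm B as reference, so the odds ratio is reported as A versus B */
   class arm (ref='B') / param=ref;
   /* EVENT='1' models the probability of response rather than non-response */
   model response(event='1') = arm;
run;
```

Without EVENT='1' (or the DESCENDING option), the model would describe response=0, and the odds ratio would be the reciprocal of the one wanted.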

Page 190: Day 2 Track 2 - IASCT

Reports and Procedures - Using LIFETEST

Required: To find the number of deaths and censored, produce the KM estimates, Median and 25th, 75th percentiles for OS

Page 191: Day 2 Track 2 - IASCT

Programming Steps

Procedures to Use:
• PROC LIFETEST

No of events and censored

Time variable * censor variable

Page 192: Day 2 Track 2 - IASCT

Programming Steps

KM estimates and confidence intervals

Median and Quartile estimates

Confidence intervals

Page 193: Day 2 Track 2 - IASCT

Reporting graphs

For reporting graphs

Page 194: Day 2 Track 2 - IASCT

Some facts on the procedures

The OUTSURV= option outputs the survival probabilities at each time point, along with their upper and lower confidence limits

To output the other information, use the specific ODS OUTPUT table

Some of the default options are
- ALPHA = 0.05
- METHOD = KM/PL
- CONFTYPE = LOGLOG

Wilcoxon and Log-rank tests are produced for testing the homogeneity of survival curves over strata
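The options and output tables mentioned above might be combined as in the following sketch. The data are invented for illustration, and the names os, arm, os_months and censor (1 = censored, 0 = death) are assumptions.

```sas
/* Hypothetical overall survival data: censor = 1 means censored */
data work.os;
   input arm $ os_months censor @@;
   datalines;
A 2.1 0  A 4.5 0  A 6.0 1  A 7.3 0  A 9.8 1  A 12.4 0
B 3.0 0  B 5.5 1  B 8.1 0  B 10.2 0 B 14.6 1 B 16.0 0
;
run;

/* Capture events/censored counts, quartiles and homogeneity tests as datasets */
ods output CensoredSummary=work.km_counts
           Quartiles=work.km_quartiles
           HomTests=work.km_tests;

/* Defaults written out explicitly; OUTSURV= holds KM estimates and limits */
proc lifetest data=work.os method=km alpha=0.05 conftype=loglog
              outsurv=work.km_est;
   time os_months * censor(1);
   strata arm;
run;
```

Writing the defaults out explicitly makes the report specification visible in the program, which helps when a reviewer asks which confidence interval type was used.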

Page 195: Day 2 Track 2 - IASCT

Reports and Procedures - Using PHREG

Required: To find the Hazard Ratio and CI based

Page 196: Day 2 Track 2 - IASCT

Programming Steps

Procedures to Use:
• PROC PHREG
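A minimal sketch of a PROC PHREG call that produces a hazard ratio with confidence limits. The data and the names os, arm, os_months and censor (1 = censored) are hypothetical, and the choice of Arm B as reference is an assumption.

```sas
/* Hypothetical overall survival data: censor = 1 means censored */
data work.os;
   input arm $ os_months censor @@;
   datalines;
A 2.1 0  A 4.5 0  A 6.0 1  A 7.3 0  A 9.8 1  A 12.4 0
B 3.0 0  B 5.5 1  B 8.1 0  B 10.2 0 B 14.6 1 B 16.0 0
;
run;

proc phreg data=work.os;
   /* Reference arm chosen explicitly so the direction of the HR is known */
   class arm (ref='B') / param=ref;
   /* RISKLIMITS adds confidence limits for the hazard ratio */
   model os_months * censor(1) = arm / risklimits;
run;
```

The reference level and the censoring value are the two settings most likely to flip or distort the reported hazard ratio, so both are stated in the code rather than left to defaults.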

Page 197: Day 2 Track 2 - IASCT

What to keep in mind – A checklist

Setting the data
• Choose the right population for determining the N’s
• Ensure the censor and time variables are defined clearly (e.g. days, months)
• Definitions of event and non-event should be consistent

Using the procedures
• Use the right procedure for the requirement (e.g. stratified, PHREG)
• Pass the right options to the procedure (e.g. reference variables, censor(1))

Choose what to display from the output
• Base the choice on what needs to be reported (e.g. survival, failure)

Reporting
• Use functions consistently across summary statistics (e.g. PUT, ROUND)
• Graphs and the statistics displayed should be in sync

Page 198: Day 2 Track 2 - IASCT

What to keep in mind – A checklist

Check your reports for P values and CI’s
• Confidence Intervals
  - The summary value should fall within the interval
  - If the null value lies within the CI, then the p-value should be insignificant (p-val > 0.05)
• Probabilities
  - Always between 0 and 1
  - Survival probabilities never increase over time
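These hand checks can also be automated. The sketch below runs them over a small, invented table of survival estimates; the dataset and variable names (km_est, survival, sdf_lcl, sdf_ucl) are assumptions, and the last row is deliberately wrong so that a flag is raised.

```sas
/* Hypothetical survival estimates with limits; the last row breaks monotonicity */
data work.km_est;
   input time survival sdf_lcl sdf_ucl;
   datalines;
0 1.00 1.00 1.00
3 0.90 0.73 0.97
6 0.80 0.61 0.91
9 0.85 0.60 0.93
;
run;

data work.km_flags;
   set work.km_est;
   length issue $40;
   prev = lag(survival);   /* previous estimate, for the monotonicity check */
   if not (0 <= survival <= 1) then do;
      issue = 'probability outside 0-1'; output;
   end;
   if not (sdf_lcl <= survival <= sdf_ucl) then do;
      issue = 'estimate outside its CI'; output;
   end;
   if _n_ > 1 and survival > prev then do;
      issue = 'survival increases over time'; output;
   end;
run;

proc print data=work.km_flags;
run;
```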

Page 199: Day 2 Track 2 - IASCT

Page 200: Day 2 Track 2 - IASCT

10 SIMPLE WAYS TO PERFORM QUALITY CHECK ON CLINICAL DATABASE

Presenter: Ravinder Arakati

Page 201: Day 2 Track 2 - IASCT

Disclaimer

All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis

2 | ConSPIC 2011 | Ravinder Arakati; Jayapandian N| Sep 29 – 30, 2011

Page 202: Day 2 Track 2 - IASCT

Possible Data Issues

Duplicate records
• Duplicate treatment records, multiple demographic records, vital signs data, etc.

Missing data
• Demog, lab data, vital signs data, adverse event terms, etc.

Same data collected in different units
• Height, Weight, lab data, etc.

Data Ranges
• Height, Weight, lab data, etc.

Inconsistency between dates
• startdate > enddate; following visit date < previous visit date; overlapping dates

Page 203: Day 2 Track 2 - IASCT

Possible Data Issues (cont.)

Study specific data or mandatory laboratory data not collected
• Hematology, biochemistry, and urine parameters

Different date formats
• Contains letters, length of dates is not consistent

Data variation between two consecutive visits

Page 204: Day 2 Track 2 - IASCT

Macro 1: %check_variation

It checks the variation of the value between two consecutive visits, or between a visit and the baseline value

Parameters:
datain   = entry dataset
unik_id  = unique ID for a patient
baseline = value of the parameter at baseline - needed if the variation is calculated from baseline
visit    = variable containing the chronological order - needed if the variation is calculated by visit
param    = name of the parameter for which the variation is calculated - needed
rang_inf = lower limit of the variation; anything below it is reported as incorrect
rang_sup = upper limit of the variation; anything above it is reported as incorrect
var_pct  = Y if the variation is in percentage, else the variation is in absolute value - needed - Y by default
centr_id = create an additional report by center, ex ctr1n - not needed (only 1)

Page 205: Day 2 Track 2 - IASCT

Macro 1: %check_variation (Example)

%check_variation (
   datain=vsn,
   unik_id=stysid1a,
   baseline=,
   visit=vis1n,
   param=wgt1n,
   rang_inf=-15,
   rang_sup=30,
   var_pct=Y,
   centr_id=ctr1n
);

Here, we want to be sure that no patient has lost more than 15% of their weight or gained more than 30% of their weight.
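The macro source is not shown in the slides, so the following is only a sketch of how a visit-to-visit percentage check of this kind could work. Variable names follow the example call; the data values and the output dataset name are hypothetical.

```sas
/* Hypothetical vital signs data: weight by patient and visit */
data work.vsn;
   input stysid1a $ vis1n wgt1n;
   datalines;
P001 1 80
P001 2 66
P001 3 67
P002 1 60
P002 2 81
;
run;

proc sort data=work.vsn;
   by stysid1a vis1n;
run;

data work.variation_flags;
   set work.vsn;
   by stysid1a;
   prev_wgt = lag(wgt1n);                   /* value at the previous record */
   if first.stysid1a then prev_wgt = .;     /* do not compare across patients */
   if prev_wgt > 0 then do;
      pct_change = 100 * (wgt1n - prev_wgt) / prev_wgt;
      /* Same limits as rang_inf=-15 and rang_sup=30 in the example call */
      if pct_change < -15 or pct_change > 30 then output;
   end;
run;
```

With these invented values, the second visit of each patient is written to the flags dataset (a 17.5% loss and a 35% gain).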

Page 206: Day 2 Track 2 - IASCT

Macro 2: %check_order_date

It checks the consistency between two consecutive dates
- Find the overlap between dates
- No discrepancy between visit numbers. Example: visit number 3 cannot have a date before visit number 1

Parameters:
datain   = entry dataset
unik_id  = unique ID for a patient
visitstt = variable containing the chronological order, start date - needed
visitend = stop date - needed if this date exists
visitnum = variable containing the chronological order, visit number - needed if the comparison is made between the visit number and the visit date
grp_by   = grouping variable, ex pt_txt - not needed (only 1)
add_rep  = parameters to add to the report - not needed (<10)
centr_id = create an additional report by center, ex ctr1n - not needed (only 1)

Page 207: Day 2 Track 2 - IASCT

Macro 2: %check_order_date (Example)

• Example 1

%check_order_date (
   datain=dar,
   unik_id=stysid1a,
   visitstt=smdstt1o,
   visitend=smdend1o,
   visitnum=,
   grp_by=,
   add_rep=dartyp1c tdd1n tddunt1a rsndos2c doschg4c,
   centr_id=
);

Here, we want to be sure that there is no overlap between treatment dates and that no treatment start date is after the treatment end date.

• Example 2

%check_order_date (
   datain=import_s.vis,
   unik_id=stysid1a,
   visitstt=vis1o,
   visitend=,
   visitnum=vis1n,
   grp_by=,
   add_rep=visnam1a,
   centr_id=ctr1n
);

Here, we want to be sure that there are no discrepancies between visit number and date. Ex: visit number 3 can't have a date before visit number 1.
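As an illustration of the first example, the sketch below flags records whose start date is after the end date or that overlap the previous record for the same patient. It is not the macro's actual code; variable names follow the example call and the data are invented.

```sas
/* Hypothetical dosing records with one overlap and one reversed date pair */
data work.dar;
   input stysid1a $ smdstt1o :date9. smdend1o :date9.;
   format smdstt1o smdend1o date9.;
   datalines;
P001 01JAN2010 31JAN2010
P001 15JAN2010 28FEB2010
P002 10MAR2010 01MAR2010
;
run;

proc sort data=work.dar;
   by stysid1a smdstt1o;
run;

data work.date_flags;
   set work.dar;
   by stysid1a;
   length issue $30;
   format prev_end date9.;
   prev_end = lag(smdend1o);               /* end date of the previous record */
   if first.stysid1a then prev_end = .;    /* do not compare across patients */
   if smdend1o ne . and smdstt1o > smdend1o then do;
      issue = 'start after end'; output;
   end;
   if prev_end ne . and smdstt1o <= prev_end then do;
      issue = 'overlaps previous record'; output;
   end;
run;
```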

Page 208: Day 2 Track 2 - IASCT

Macro 3: %check_missing_value

This macro checks the number and the percentage of missing values for a reference variable

Parameters:
datain   = entry dataset - needed
unik_id  = unique ID for a patient - not needed - stysid1a by default
missvar  = variable containing potential missing values - needed - _all_ by default. This parameter can be a list of up to 9 parameters.
centr_id = create an additional report by center, ex ctr1n - not needed (only 1)
del_lst  = this option deletes the last record, as ordered by the &del_lst. variable
printopt = if equal to Y, this option prints the listing of unik_id with missing parameters - not needed - missing by default

Page 209: Day 2 Track 2 - IASCT

Macro 3: %check_missing_value (Example)

%check_missing_value (
   datain=dar,
   unik_id=stysid1a,
   missvar=smdend1o,
   del_lst=smdstt1o,
   printopt=Y
);

Here, we want to print the list of all missing treatment end dates that are not the latest treatment end date, as the study is not frozen and patients are still on treatment.
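The count and percentage of missing values can be obtained with a simple format and PROC FREQ, as in this sketch. It does not reproduce the macro's del_lst or printopt behaviour; the format name missfmt and the data are hypothetical.

```sas
/* Hypothetical dosing records, some with a missing end date */
data work.dar;
   input stysid1a $ smdstt1o :date9. smdend1o :date9.;
   format smdstt1o smdend1o date9.;
   datalines;
P001 01JAN2010 31JAN2010
P001 01FEB2010 .
P002 10MAR2010 .
P002 01APR2010 30APR2010
;
run;

/* Collapse every value into Missing / Present (standard numeric missing only) */
proc format;
   value missfmt
      .     = 'Missing'
      other = 'Present';
run;

proc freq data=work.dar;
   tables smdend1o / missing;
   format smdend1o missfmt.;
run;
```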

Page 210: Day 2 Track 2 - IASCT

Macro 4: %check_lab_units

This helps to check whether the same data has been collected in different units

Parameters:
dsn     = one or two level SAS dataset name
parm    = parameter name
unitvar = name of the variable that holds the unit information
parmvar = name of the parameter variable

Page 211: Day 2 Track 2 - IASCT

Macro 4: %check_lab_units (Example)

%check_lab_units(
   dsn=data_a.a_lrs,
   parm=FERR,
   unitvar=parunt1c,
   parmvar=parnam1c
);

Here, it lists the number of distinct units for the parameter SERUM FERRITIN in the laboratory dataset
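A check of this kind can be sketched with PROC SQL. The variable names follow the example call, while the data values are invented and the query is an assumption about what such a check might report.

```sas
/* Hypothetical laboratory data with ferritin reported in two units */
data work.a_lrs;
   length stysid1a $8 parnam1c $8 parunt1c $10;
   input stysid1a $ parnam1c $ parunt1c $;
   datalines;
P001 FERR ng/mL
P002 FERR ug/L
P003 FERR ng/mL
P001 HGB g/dL
;
run;

/* One row per unit found for the parameter, with the number of records */
proc sql;
   select parnam1c, parunt1c, count(*) as n_records
   from work.a_lrs
   where parnam1c = 'FERR'
   group by parnam1c, parunt1c;
quit;
```

More than one row in the result means the parameter was collected in more than one unit and needs conversion before summarising.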

Page 212: Day 2 Track 2 - IASCT

Macro 5: %check_lrs_range

This checks how far values deviate from the lower and upper normal limits

Parameters:
dsn      = one or two level dataset name
parm     = parameter name
parmvar  = name of the variable that contains the parameter
labvar   = variable that contains the laboratory value
lowvar   = lower normal limit of the parameter
uppvar   = upper normal limit of the parameter
percent  = preferred percentage deviation from the lower or upper normal limits
patidvar = patient id variable

Page 213: Day 2 Track 2 - IASCT

Macro 5: %check_lrs_range (Example)

%check_lrs_range(
   dsn=data_s.lrs,
   parm=FERR,
   parmvar=parnam1c,
   lowvar=nrgllm1n,
   uppvar=nrgulm1n,
   labvar=labrsl1n,
   percent=50,
   patidvar=stysid1a
);

Here, it lists all the patient ids whose lab values deviate by 50% from the lower or upper normal limits.
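The slides do not define how the 50% deviation is computed. The sketch below assumes it means a value more than 50% below the lower limit or more than 50% above the upper limit; the data and the macro variable name percent are illustrative.

```sas
/* Hypothetical ferritin results with normal range limits */
data work.lrs;
   length stysid1a $8 parnam1c $8;
   input stysid1a $ parnam1c $ labrsl1n nrgllm1n nrgulm1n;
   datalines;
P001 FERR 150 30 300
P002 FERR 10 30 300
P003 FERR 500 30 300
;
run;

%let percent = 50;

data work.range_flags;
   set work.lrs;
   where parnam1c = 'FERR';
   /* Assumed definition: cut-offs sit &percent% outside the normal limits */
   low_cut  = nrgllm1n * (1 - &percent / 100);
   high_cut = nrgulm1n * (1 + &percent / 100);
   if not missing(labrsl1n)
      and (labrsl1n < low_cut or labrsl1n > high_cut);
run;
```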

Page 214: Day 2 Track 2 - IASCT

Macro 6: %check_dar_incorrect_date

This macro identifies any incorrect date formats present

Parameters:
dsn      = one or two level dataset name
chardt   = character date variable
percent  = preferred percentage deviated from lower or upper normal limits
patidvar = patient id variable

Page 215: Day 2 Track 2 - IASCT

Macro 6: %check_dar_incorrect_date (Example)

%check_dar_incorrect_date(
   dsn=data_s.dar,
   chardt=smdstt1d,
   patidvar=stysid1a
);

It displays all patient ids with medication start dates that have an incorrect date format
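One possible way to detect badly formatted character dates is to test both the length and whether the value can be read with a date informat. The sketch assumes the expected layout is DDMONYYYY (DATE9.), which the slides do not state; the data are invented.

```sas
/* Hypothetical character start dates in mixed layouts */
data work.dar;
   length stysid1a $8 smdstt1d $12;
   input stysid1a $ smdstt1d $;
   datalines;
P001 01JAN2010
P002 2010-01-15
P003 1JAN10
P004 UNK2010
;
run;

data work.bad_dates;
   set work.dar;
   /* ?? suppresses log notes when the value cannot be read as a date */
   if length(smdstt1d) ne 9
      or missing(input(smdstt1d, ?? date9.)) then output;
run;

proc print data=work.bad_dates;
run;
```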

Page 216: Day 2 Track 2 - IASCT

Macro 7: %check_dar_dup_dose

This checks for duplicate dose records or dose interruptions

Parameters:
dsn      = one or two level dataset name
patidvar = patient id variable
bygrpdup = list of variables to identify duplicates
dosevar  = variable that contains the dose information

Page 217: Day 2 Track 2 - IASCT

Macro 7: %check_dar_dup_dose (Example)

%check_dar_dup_dose(
   dsn=data_s.dar,
   patidvar=stysid1a,
   bygrpdup=smdstt1o smdend1o,
   dosevar=tdd1n
);

It displays the list of patient ids with a missing dose or interruptions, and finds duplicate records using the by-group variables smdstt1o and smdend1o.
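Duplicate and missing-dose records can be picked out with BY-group processing, as in this sketch. It is an assumption about the kind of logic involved, not the macro itself; variable names follow the example call and the data are invented.

```sas
/* Hypothetical dosing records: one duplicated period, one missing dose */
data work.dar;
   input stysid1a $ smdstt1o :date9. smdend1o :date9. tdd1n;
   format smdstt1o smdend1o date9.;
   datalines;
P001 01JAN2010 31JAN2010 400
P001 01JAN2010 31JAN2010 400
P002 01FEB2010 28FEB2010 .
P003 01MAR2010 31MAR2010 600
;
run;

proc sort data=work.dar out=work.dar_sorted;
   by stysid1a smdstt1o smdend1o;
run;

data work.dup_dose;
   set work.dar_sorted;
   by stysid1a smdstt1o smdend1o;
   /* Keep records that share their by-group with another record,
      or that have no dose recorded */
   if not (first.smdend1o and last.smdend1o) or missing(tdd1n);
run;
```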

Page 218: Day 2 Track 2 - IASCT

Macro 8: %check_aev_fpfv_lplv

This helps to identify records where
- Any patient visit start and end date is after Last Patient Last Visit (LPLV)
- Any patient visit start and end date is before First Patient First Visit (FPFV)
- End date is less than Start date

Parameters:
dsn       = one or two level dataset name
patidvar  = patient id variable
lplvdate  = last patient last visit date
fpfvdate  = first patient first visit date
aevstvar  = adverse event start date
aevendvar = adverse event end date

Page 219: Day 2 Track 2 - IASCT

Macro 8: %check_aev_fpfv_lplv (Example)

%check_aev_fpfv_lplv(
   dsn=data_s.aev,
   lplvdate=,
   fpfvdate="01Feb2006"d,
   aevstvar=aevstt1o,
   aevendvar=aevend1o,
   patidvar=stysid1a
);

It displays the list of patient ids who have adverse events before first patient first visit.

It also displays the list of patient ids whose adverse event end date is earlier than the adverse event start date.

Page 220: Day 2 Track 2 - IASCT

Macro 9: %check_aev_reason

This helps to identify records with AEVs leading to discontinuation but no reason given for discontinuation

Parameters:
dsn       = one or two level dataset name
patidvar  = patient id variable
aevservar = name of the variable that holds the seriousness of adverse events
acntknvar = name of the variable that indicates whether action was taken

Example:

%check_aev_reason(
   dsn=data_s.aev,
   aevservar=aevser1c,
   acntknvar=acntak1n,
   patidvar=stysid1a
);

Page 221: Day 2 Track 2 - IASCT

Macro 10: %check_aev_meddra

This helps to check for missing MedDRA terms (SOC, PT) and duplicate records

Parameters:
dsn      = one or two level dataset name
patidvar = patient id variable
socvar   = SOC variable name
ptvar    = Preferred Term variable name
dtvar    = adverse event start and/or end date

Example:

%check_aev_meddra(
   dsn=data_s.aev,
   socvar=soc_txt,
   ptvar=pt_txt,
   patidvar=stysid1a,
   dtvar=aev
);

Page 222: Day 2 Track 2 - IASCT

Acknowledgement

Laura Robin, Senior Statistical Programmer, Novartis, Basel

Jayapandian N, Senior Statistical Analyst, Novartis, Hyderabad

Page 223: Day 2 Track 2 - IASCT

QUESTIONS!

Page 224: Day 2 Track 2 - IASCT

TAKE Solutions

RTF_Read

Presented By
Vijayabhaskar Reddy
Sr. Clinical SAS Programmer

Page 225: Day 2 Track 2 - IASCT

Agenda

• Why RTF_READ.
• Know about RTF.
• Control Words in RTF.
• Identify the Titles and Footnotes in RTF.
• Dynamic Titles and Footnotes.
• Extract Unique Titles and Footnotes into a SAS dataset.
• Read Spanning Headers.
• Group Column Headers.
• Read Special Symbols.
• Extract the Data into SAS dataset.
• Distinguish between different rows and columns.
• Create Horizontal Dataset.
• Create Vertical Dataset.

Page 226: Day 2 Track 2 - IASCT

Why RTF_READ?

• Validation of RTF Tables programmatically
• Comparison of multiple versions of the same RTF Table file

Page 227: Day 2 Track 2 - IASCT

RTF File Generated by SAS

Page 228: Day 2 Track 2 - IASCT

RTF File - NOTEPAD

Page 229: Day 2 Track 2 - IASCT

Validate RTF

• Confirm that the RTF was generated by SAS
• Confirm that the RTF was not modified outside SAS

PRXPARSE Function:
• Compiles a Perl regular expression (PRX) that can be used for pattern matching of a character value

PRXMATCH Function:
• Searches for a pattern match and returns the position at which the pattern is found.

Page 230: Day 2 Track 2 - IASCT

Validate RTF

SAS CODE

Data READFILE;
   length ver $1000;
   infile "&RTFFile." missover length = l end = lastobs lrecl = 2000;
   input string $varying1500. l;
   rownum = _n_;
   string=_infile_;
   rc1=prxparse("/\\*\\generator/");
   rc2=prxparse("/\\version\d+/");
   if prxmatch(rc1,string) then MWFLG=1;
   ……………

Page 231: Day 2 Track 2 - IASCT

Control Words

No.  Control Word  Comment
1    \header       Identify Title in Header Section
2    \headery      Identify Titles in Document section
3    \footer       Identify Footnotes in Footer Section
4    \footery      Identify Footnotes in Document Section
5    \trhdr        Identify Column header
6    \trowd        Identify Table Row
7    \cell         Identify the Table Cell
8    \cellx        Identify the cell size in twips
9    \line         Identify line break
10   \li200        Works like a tab; used to enter extra space

Note: the numeric value represents twips.

Although many control words are used in RTF, only a few are needed to understand the basic structure of the RTF file and extract data from it.
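To illustrate how a few of these control words are enough to pull cell text out of an RTF table, here is a small sketch that reads RTF lines from in-stream data. It is not the presenter's RTF_Read code: the input lines are shortened versions of the sample RTF in this deck, and the regular expression and dataset name are assumptions.

```sas
data work.rtf_cells;
   infile datalines truncover;
   input string $char200.;
   length cell_text $100;
   retain rx;
   /* Text between { and \cell}, with no nested groups or control words */
   if _n_ = 1 then rx = prxparse('/\{([^{}\\]*)\\cell\}/');
   is_row_start  = (index(string, '\trowd') > 0);   /* table row definition */
   is_header_row = (index(string, '\trhdr') > 0);   /* column header row */
   start = 1;
   stop  = length(string);
   call prxnext(rx, start, stop, string, pos, len);
   do while (pos > 0);
      cell_text = prxposn(rx, 1, string);   /* captured cell content */
      output;
      call prxnext(rx, start, stop, string, pos, len);
   end;
   datalines;
\trowd\trkeep\trhdr\trgaph10\cellx6929\cellx13858
\pard\plain\intbl\sb10\sa10\ql\f1\fs20\cf1{Company Name - Protocol123\cell}
\pard\plain\intbl\sb10\sa10\qr\f1\fs20\cf1{Page 1 of 1\cell}
{\row}
;
run;

proc print data=work.rtf_cells;
   var cell_text;
run;
```

Cells that contain further control words, such as \line inside a title, would need a more permissive pattern than the one assumed here.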

Page 232: Day 2 Track 2 - IASCT

{\rtf1\ansi\ansicpg1252\uc1\deff0\deflang1033\deflangfe1033{\fonttbl..........\pard\sectd\linex0\endnhere\pgwsxn15840\pghsxn12240\lndscpsxn\headery1440\footery540\marglsxn540\margrsxn1440\margtsxn1440\margbsxn540

\trowd\trkeep\trhdr\trgaph10\cltxlrtb\clvertalt\cellx6929\cltxlrtb\clvertalt\cellx13858\pard\plain\intbl\keepn\sb10\sa10\ql\f1\fs20\cf1{Company Name - Protocol123\cell}\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{Page 1 of 1\cell}{\row}........{\pard}.......\trowd\trkeep\trqc\trgaph10\cltxlrtb\clvertalt\cellx13860\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Names of input datasets: ADSL\cell}{\row}.......\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838....\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Parameter\cell}....{\row}\trowd\trkeep\trqc\trgaph22\cltxlrtb\clvertalt\cellx4838.....\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Age at Screening (Years)\cell}.....{\row}

Control Words

RTF File


Page 233: Day 2 Track 2 - IASCT

Titles & Footnotes in Document or Header/Footer Section

[Figure: sample outputs with the Header Section and the Document Section marked]


Page 234: Day 2 Track 2 - IASCT

Titles & Footnotes in Document or Header/Footer Section

%* Header Section;
if index(string,'\header')>0 and index(string,'\footer')=0 then hdrflg=1;
if hdrflg=1 then do; %* HEADER SECTION;
  %* PATTERN 1 HEADER INFORMATION *;
  if index(string,'\trowd')>0 then hdr= hdr + 1;
  else if index(string,'\pard{\par}')>0 then do;
    hdr= 0;
    hdrflg=0;
  end;
end;
else do; %* DOCUMENT SECTION;
  %* PATTERN 2 HEADER INFORMATION **;
  if index(string,'\headery')>0 and index(string,'\footery')>0 then hdrflg1=1;
  if index(string,'\pard{\par}')>0 then hdrflg1=0;
  if hdr=1 and index(string,'\trowd')>0 then tblid = tblid + 1;
  …….

SAS CODE


Page 235: Day 2 Track 2 - IASCT

Dynamic Titles & Footnotes


Titles as rendered:

Company Name – Protocol123          Page 1 of 1
Table 14.1.1.1 DEMOGRAPHICS AND BASELINE CHARACTERISTICS
FULL ANALYSIS SET

RTF code for the titles:

…..
\pard\plain\intbl\keepn\sb10\sa10\ql\f1\fs20\cf1{Company Name - Protocol123\cell}
\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{Page 1 of 1\cell}
{\row}
…
\pard\plain\intbl\keepn\sb10\sa10\qc\f1\fs20\cf1{Table 14.1.1.1 \line DEMOGRAPHICS AND BASELINE CHARACTERISTICS\line FULL ANALYSIS SET\cell}
{\row}
…..
\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{\cell}
{\row}

Footnotes as rendered:

Names of input datasets: ADSL
Program name: ‘Path’          Creation date of output 14.1.1.1: 22SEP2009 4:43

RTF code for the footnotes:

…..
\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{\cell}
{\row}
…..
\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Names of input datasets: ADSL\cell}
{\row}
…..
\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Program name: ‘Path’ \cell}
\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\qr\f1\fs18\cf1{Creation date of output 14.1.1.1: 22SEP2011 4:43\cell}
{\row}


Page 236: Day 2 Track 2 - IASCT

Extract Titles and Footnotes into a SAS dataset

Create a Title and Footnote (T&F) dataset with unique titles and footnotes.


Page 237: Day 2 Track 2 - IASCT

Spanning Headers


\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx5776
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx10108
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx12280
\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Treatment\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{\cell}
{\row}

\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx5776
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx7942
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx10108
\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx12280
\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Parameter\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Statistics\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Trt Group 1\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Trt Group 2\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Total\cell}
{\row}


Page 238: Day 2 Track 2 - IASCT

Spanning Headers


SAS CODE

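%* Extract twip values and generate twip data;
data &prefix._twipdata;
  set &prefix._hdr1;
  by tblid hdr;
  where index(string,'\cellx')>0;
  %* SCAN treats each character of its third argument as a delimiter, so this returns the trailing digits of the line;
  if index(string,'\cellx')>0 then twpval=scan(string,-1,'\cellx');
run;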

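The "Treatment" header is a spanning header: its cell runs from twip position 5776 to 10108, which covers both the "Trt Group 1" cell (5776 to 7942) and the "Trt Group 2" cell (7942 to 10108) in the row below.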
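To make the twip comparison concrete, the sketch below pulls every \cellx boundary out of one row definition. The input string is a shortened, hypothetical version of the spanning-header row, and the names twips, colno, and word are assumptions made for illustration.

```sas
data twips;
  length string $200 word $50;
  /* Hypothetical row definition with four cell boundaries */
  string = '\trowd\trhdr\cellx4838\cellx5776\cellx10108\cellx12280';
  colno = 0;
  /* Split the line on backslashes and keep only the \cellxN control words */
  do i = 1 to countw(string, '\');
    word = scan(string, i, '\');
    if word =: 'cellx' then do;
      colno + 1;
      /* The digits start at position 6, right after the text cellx */
      twpval = input(substr(word, 6), 8.);
      output;
    end;
  end;
  keep colno twpval;
run;
```

Comparing each boundary with the boundaries of the row below shows which lower cells fall inside a wider upper cell, which is how a spanning header can be recognised.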


Page 239: Day 2 Track 2 - IASCT

Group Column Headers

Header as rendered:

                         Treatment
Parameter   Statistics   Trt Group 1   Trt Group 2   Total

Grouped column headers (spanning header | column header):

|Parameter
|Statistics
Treatment|Trt Group 1
Treatment|Trt Group 2
|Total


Page 240: Day 2 Track 2 - IASCT

Special Symbols

\\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{BMI (kg/m\super 2\nosupersub{})\uc1\u0956\\uc1\u0945\\cell}

μα

RTF Code

PRXCHANGE Function
• Performs a pattern-matching replacement.

CALL PRXNEXT Routine
• Returns the position and length of a substring that matches a pattern, and iterates over multiple matches within one string.


Page 241: Day 2 Track 2 - IASCT

Special Symbols – SAS Code


if _N_=1 then do;
  %* Takes care of tokens for Super and Subscripts;
  re1=prxparse('s/\\super ?(.+?)\\nosupersub(\{\}| ?)/_super-\1_/');
  %* Handling Special Symbols present in $token format;
  tokenw_re = prxparse('/\\uc1\\u\d{4}\\~|\\uc1\\u\d{4}/');
end;
%* Replace the Special characters with SAS tokens;
call prxchange(re1,2,newvar);
start=1;
stop=length(newvar);
call prxnext(tokenw_re,start,stop,newvar,stpos,len);
do i=1 to 5 while(stpos>0);
  tok[i]=substr(newvar,stpos,len);
  call prxnext (tokenw_re,start,stop,newvar,stpos,len);
  fval[i]=put(compress(tok[i],'() '),$token.);
end;

proc format;
  value $token
    "\uc1\u0956" = '_mu_'
    "\uc1\u0945" = '_alpha_'
    "¹" = '_super_1'
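The token lookup above can be tried in isolation with a small, self-contained sketch. The input text, the dataset name symbols, and the scalar variables tok and fval (used here in place of the arrays) are assumptions for illustration; only the two format entries for mu and alpha are taken from the slide.

```sas
proc format;
  value $token
    '\uc1\u0956' = '_mu_'
    '\uc1\u0945' = '_alpha_';
run;

data symbols;
  length newvar $200 tok $20 fval $20;
  /* Hypothetical cell text containing two RTF Unicode tokens */
  newvar = 'Mean \uc1\u0956 and level \uc1\u0945';
  re = prxparse('/\\uc1\\u\d{4}/');
  start = 1;
  stop = length(newvar);
  /* CALL PRXNEXT advances START past each match, so repeated calls walk the string */
  call prxnext(re, start, stop, newvar, pos, len);
  do i = 1 to 5 while (pos > 0);
    tok = substr(newvar, pos, len);
    fval = put(tok, $token.);   /* map the RTF token to a readable SAS token */
    output;
    call prxnext(re, start, stop, newvar, pos, len);
  end;
  keep i tok fval;
run;
```

Each output record holds one matched RTF token and its replacement, so unmapped symbols are easy to spot before they reach the comparison step.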


Page 242: Day 2 Track 2 - IASCT

Extract the Data into SAS dataset

{<data>\cell} is the pattern used to extract the table cell data into col1:
• { saves all the current character formatting attributes.
• <data> is any text contained in the column.
• } restores the character formatting attributes to their most recently saved values.

The \cell control word can appear in several layouts, for example with the cell text split across lines. The code below handles each case so that the correct data is captured.

SAS CODE

%* Extract data from file and store it in Col1 variable.;
if index(string,'{')>0 and index(string,'\cell')>0 and index(string,'}')>0 then do;
  _col1='';
  col1=trim(left(substr(string,(index(string,'{')+1))));
  col1=tranwrd(col1, '\cell}' , '');
end;


Page 243: Day 2 Track 2 - IASCT

Extract the Data into SAS dataset

else if index(string,'{')>0 and index(string,'\cell')=0 and index(string,'}')=0 then do;
  _col1=trim(left(substr(string,(index(string,'{')+1))));
end;
else if index(string,'{')=0 and index(string,'\cell')=0 and index(string,'}')=0 and index(string,'\')=0 then do;
  * if \ is missing means no tokens only data, read full string;
  _col1=trim(left(_col1))|| " " || trim(left(string));
end;
else if index(string,'{')=0 and index(string,'\cell')>0 and index(string,'}')>0 then do;
  col1=trim(left(substr(string,1,(index(string,'\cell')-1))));
  col1=trim(left(_col1)) || " " || trim(left(col1));
  _col1='';
end;
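The branches above rely on INDEX and SUBSTR. For the simplest layout, where the whole cell sits on one line, the same extraction can be sketched with a regular expression. This is an alternative illustration rather than the presenters' code: the dataset name cells and the sample lines are assumptions, and cells whose text is split across lines are not handled.

```sas
data cells;
  length string $200 col1 $100;
  infile datalines truncover;
  input string $char200.;
  /* Capture the text between the opening brace and the closing \cell} */
  re = prxparse('/\{(.*?)\\cell\}/');
  if prxmatch(re, string) then do;
    col1 = strip(prxposn(re, 1, string));
    output;
  end;
  keep col1;
  datalines;
\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Parameter\cell}
\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Statistics\cell}
{\row}
;
run;
```

Lines without a cell, such as the row terminator, produce no match and are simply skipped.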


Page 244: Day 2 Track 2 - IASCT

Distinguish between different rows and columns

* Initialize counters for identifying Rows and Cols;
data DATARECS;
  set PARSEFILE1;
  by tblid;
  where indexw(string,'\cellx')=0; * Keep only data records;
  retain rowid colid 0;
  if first.tblid then rowid=0;
  if index(string, '\trowd') or index(string,'\pard{\par}')>0 then do;
    rowid=rowid + 1;
    colid=0;
  end;
  else do;
    if index(string,'\cell')>0 then colid=colid+1;
    if (index(string,'\bkmkstart')>0 or index(string,'\bkmkend')>0) then colid= colid + 1;
    if index(string,'{\row}')>0 or index(string,'\footer')>0 then colid=0;
  end;
run;


Page 245: Day 2 Track 2 - IASCT

Distinguish between different rows and columns

* Transpose Vertical data to Horizontal table like structure;
proc transpose data=DATARECS1 out=TDATARECS(drop=_NAME_) prefix=c;
  by tblid rowid hdr;
  id colid;
  var col1;
run;


Page 246: Day 2 Track 2 - IASCT

Horizontal Data Set Structure


Protocol:123                                                  Page 1 of 1
TABLE 14.3.1.6.1
Treatment Emergent Adverse Events Considered Related (Possibly, Probably, or Definitely) to Treatment by System Organ Class and Preferred Term
Safety Set

Body System or Organ Class    Treatment 1 (N=3)         Treatment 2 (N=3)         Treatment 3 (N=2)         Treatment 4 (N=2)         Treatment 5 (N=1)         Overall (N=11)
  Dictionary-Derived Term     Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)
Gastrointestinal disorders    0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         1         1 ( 50)         2         1 ( 100)        4         3 ( 27)
  Nausea                      0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         0         0 ( 0)          1         1 ( 100)        2         2 ( 18)
  Abdominal pain              0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         0         0 ( 0)          1         1 ( 9)
  Vomiting                    0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          1         1 ( 100)        1         1 ( 9)

Note: MedDRA Dictionary Version 11.1 was used for coding
Names of input datasets: ADS.ADAE and ADS.ADSL
Program name: ‘Path’\
Creation date of output 14.3.1.6.1: 12JUN2011 11:17


Page 247: Day 2 Track 2 - IASCT

Output Data Set Structure


Problem: How do you represent ALL types of tables and listings in a data set?

Solution? Turn every column into a variable?

What about:
• Multiple pages
• Tables that flow horizontally onto another page


Page 248: Day 2 Track 2 - IASCT

Some don’t fit on one page


RTF_READ training                                             Page 1 of 2
This is Title 1
Safety Population

Parameter                        Long Treatment Name 1   Long Treatment Name 2   Long Treatment Name 3   Long Treatment Name 4
This is a categorical Variable   14                      13                      13                      14


Page 249: Day 2 Track 2 - IASCT

Some don’t fit on one page


RTF_READ training                                             Page 2 of 2
This is Title 1
Safety Population

Parameter                        Long Treatment Name 5   Long Treatment Overall
This is a categorical Variable   13                      1


Page 250: Day 2 Track 2 - IASCT

Some don’t fit on one page

• What is column 2? Treatment 1 or Treatment 5?
• How do you know (without looking) how many columns fit across the page?
• Are you sure no columns are going to be added or removed?
  • Request 1: Can you make it fit on one page?
  • Request 2: Can you add a total column?


Page 251: Day 2 Track 2 - IASCT

Our Solution

Normalize the structure of the table. There are two ways to think about it:

• Instead of reading across, read each column all the way down the table, stopping when the header text changes or the table ends. Then move to the next column.

• Or, think of it as transposing the table down: PROC TRANSPOSE DATA=table; BY statistic; VAR columns;
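The transposed-down view can be sketched as follows. The dataset table, its variables (statistic, trt1, trt2, total), and the values are hypothetical stand-ins for one horizontal table, and the output names column and cvalue are assumptions chosen to resemble a vertical structure.

```sas
/* Hypothetical horizontal table: one row per statistic, one variable per column */
data table;
  length statistic $30 trt1 trt2 total $10;
  infile datalines dlm='|' truncover;
  input statistic trt1 trt2 total;
  datalines;
Abdominal pain|0|1|1
Nausea|1|0|2
;
run;

/* One output record per statistic and column, whatever the number of columns */
proc transpose data=table
               out=vertical(rename=(_name_=column col1=cvalue));
  by statistic;
  var trt1 trt2 total;
run;
```

Because every cell becomes its own record, adding or removing a column changes the number of records but not the variables, which keeps downstream comparison code stable.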


Page 252: Day 2 Track 2 - IASCT

Vertical (a.k.a. Normalized)

• Always the same variables
• Number of records in table [R]: 4
• Number of data columns across treatments [N(C)]: 12 (6 treatment groups × 2 columns each)
• Number of records after RTF_READ [N(Obs)]: 48 (R × N(C) = 4 × 12)

Protocol:123                                                  Page 1 of 1
TABLE 14.3.1.6.1
Treatment Emergent Adverse Events Considered Related (Possibly, Probably, or Definitely) to Treatment by System Organ Class and Preferred Term
Safety Set

Body System or Organ Class    Treatment 1 (N=3)         Treatment 2 (N=3)         Treatment 3 (N=2)         Treatment 4 (N=2)         Treatment 5 (N=1)         Overall (N=11)
  Dictionary-Derived Term     Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)  Events n  Patients n (%)
Gastrointestinal disorders    0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         1         1 ( 50)         2         1 ( 100)        4         3 ( 27)
  Nausea                      0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         0         0 ( 0)          1         1 ( 100)        2         2 ( 18)
  Abdominal pain              0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          1         1 ( 50)         0         0 ( 0)          1         1 ( 9)
  Vomiting                    0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          0         0 ( 0)          1         1 ( 100)        1         1 ( 9)

Note: MedDRA Dictionary Version 11.1 was used for coding
Names of input datasets: ADS.ADAE and ADS.ADSL
Program name: ‘Path’\
Creation date of output 14.3.1.6.1: 12JUN2011 11:17


Page 253: Day 2 Track 2 - IASCT

Vertical Data Structure


Obs  tf_id  H1                                                    C1                          hvalue                            cvalue
1    1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 1|(N=3)|Events n        0
2    1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 1|(N=3)|Events n        0
3    1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 1|(N=3)|Events n        0
4    1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 1|(N=3)|Events n        0
5    1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 1|(N=3)|Patients n (%)  0 ( 0)
6    1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 1|(N=3)|Patients n (%)  0 ( 0)
7    1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 1|(N=3)|Patients n (%)  0 ( 0)
8    1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 1|(N=3)|Patients n (%)  0 ( 0)
9    1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 2|(N=3)|Events n        0
10   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 2|(N=3)|Events n        0
11   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 2|(N=3)|Events n        0
12   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 2|(N=3)|Events n        0
13   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 2|(N=3)|Patients n (%)  0 ( 0)
14   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 2|(N=3)|Patients n (%)  0 ( 0)
15   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 2|(N=3)|Patients n (%)  0 ( 0)
16   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 2|(N=3)|Patients n (%)  0 ( 0)
17   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 3|(N=2)|Events n        1
18   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 3|(N=2)|Events n        1
19   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 3|(N=2)|Events n        0
20   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 3|(N=2)|Events n        0
21   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 3|(N=2)|Patients n (%)  1 ( 50)
22   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 3|(N=2)|Patients n (%)  1 ( 50)
23   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 3|(N=2)|Patients n (%)  0 ( 0)
24   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 3|(N=2)|Patients n (%)  0 ( 0)


Page 254: Day 2 Track 2 - IASCT

Vertical Data Structure


25   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 4|(N=2)|Events n        1
26   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 4|(N=2)|Events n        0
27   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 4|(N=2)|Events n        1
28   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 4|(N=2)|Events n        0
29   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 4|(N=2)|Patients n (%)  1 ( 50)
30   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 4|(N=2)|Patients n (%)  0 ( 0)
31   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 4|(N=2)|Patients n (%)  1 ( 50)
32   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 4|(N=2)|Patients n (%)  0 ( 0)
33   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 5|(N=1)|Events n        2
34   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 5|(N=1)|Events n        1
35   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 5|(N=1)|Events n        0
36   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 5|(N=1)|Events n        1
37   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 5|(N=1)|Patients n (%)  1 ( 100)
38   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 5|(N=1)|Patients n (%)  1 ( 100)
39   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 5|(N=1)|Patients n (%)  0 ( 0)
40   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 5|(N=1)|Patients n (%)  1 ( 100)
41   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Overall|(N=11)|Events n           4
42   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Overall|(N=11)|Events n           2
43   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Overall|(N=11)|Events n           1
44   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Overall|(N=11)|Events n           1
45   1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Overall|(N=11)|Patients n (%)     3 ( 27)
46   1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Overall|(N=11)|Patients n (%)     2 ( 18)
47   1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Overall|(N=11)|Patients n (%)     1 ( 9)
48   1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Overall|(N=11)|Patients n (%)     1 ( 9)


Page 255: Day 2 Track 2 - IASCT

Return on Investment

Significant increase in quality
– 100% QC on all tables
  • Whether 1 page or 1000 pages
– QC that we can re-run
  • Initial QC takes longer
  • Re-QC is much faster


Page 256: Day 2 Track 2 - IASCT

Questions?
