SAS Macro Coding for Jackknife Repeated Replication

SI Workshop: July 15, 2005

• Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language handles iterative, parameterized tasks flexibly

• This presentation will demonstrate how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)


Analysis of Complex Sample Survey Data

• Data from complex sample surveys must be analyzed using techniques that adjust for the clustering of the sample design

• The standard procedures in SAS, SPSS, and Stata assume a simple random sample and therefore do not calculate variances and standard errors correctly for complex sample data


Analysis of Complex Survey Data

• SAS offers SURVEY procedures and Stata offers svy commands, both of which use the Taylor Series Linearization approach

• JRR is a widely used replication approach that offers an alternative to the Taylor Series method

• JRR is flexible and can be adapted to many different types of statistics such as means, regression coefficients, and other statistics of interest


Visual Representation of JRR process

• JRR systematically removes a small portion of the sample, and the statistics of interest are computed repeatedly, once for each sub-sample

• In this example, the cases with str=42 and secu=2 are deleted and the weights of the cases with str=42 and secu=1 are doubled

• This process is repeated for each stratum until the entire dataset is covered
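The delete-and-double step can be seen on a toy dataset. This is a minimal sketch, not part of the workshop code: the dataset names toy and toyrep, the id variable, and the weights are hypothetical, and only 2 strata are used instead of 42. The variable names str, secu, and pwt1, pwt2 follow the convention used in the macros.

```sas
/* Toy illustration only: 2 strata with 2 SECUs each, hypothetical weights */
data toy ;
  input id str secu wt ;
  datalines ;
1 1 1 1.0
2 1 2 1.5
3 2 1 2.0
4 2 2 0.5
;
run ;

/* One replicate weight per stratum */
data toyrep ;
  set toy ;
  array pwt{2} pwt1-pwt2 ;
  do h=1 to 2 ;
    pwt{h}=wt ;                               /* other strata: weight unchanged */
    if str=h and secu=1 then pwt{h}=wt*2 ;    /* retained SECU is doubled */
    if str=h and secu=2 then pwt{h}=0 ;       /* deleted SECU gets zero weight */
  end ;
  drop h ;
run ;

proc print data=toyrep noobs ;
run ;
```

In the printed result, case 1 has pwt1=2.0 and pwt2=1.0, case 2 has pwt1=0 and pwt2=1.5, case 3 has pwt1=2.0 and pwt2=4.0, and case 4 has pwt1=0.5 and pwt2=0.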


SAS JRR Macro: Logistic Regression

*Logistic Regression Jackknife for Analysis of Complex Survey Data ;

*Pat Berglund, July 2003 for Summer Institute Workshop ;

libname d 'd:\sumclass' ;
options compress=yes nofmterr symbolgen ;
options macrogen mprint ;

*create outer jackknife macro with parameters ;
*Parameters to fill in: ;
*ncluster=number of clusters, in the NCS I dataset this is 42 ;
*weight=case weight ;
*depend=dependent variable for the logistic model ;
*preds=predictor variables entered with a space between each one ;
*indata=input dataset ;

%macro jacklogods(ncluster,weight,depend,preds,indata);


*section 1: jackknife using strata and secu variables to do 42 jackknife selections ;
*each iteration of the do loop selects one strata*secu combination and doubles the contribution of strata=x and secu=1 while setting strata=x and secu=2 to zero ;
*all other combinations stay the same ;

%let nclust=%eval(&ncluster);

data one;
  set &indata;

%macro wgtcal ;
  %do i=1 %to &nclust ;
    pwt&i=&weight;
    if str=&i and secu=1 then pwt&i=pwt&i*2 ;
    if str=&i and secu=2 then pwt&i=0 ;
  %end;
%mend;

%wgtcal ;

run ;


**section 2: run base model/statistic of interest for entire sample using full weight* ;

%macro base ;

ods output parameterestimates=parms (keep=variable estimate ) ;

ods listing close ;

proc logistic des data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

ods listing ;

proc print data=parms ;

run ;

proc sort ;

by variable ;

run ;

%mend base ;

%base ;


*Section 3: Run Replicate Models* ;

* replicate models, one for each strata using weight developed in jackknife section 1* ;

*save statistic of interest for use with variance estimation* ;

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j
  (keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc logistic des data=ONE ;
  model &depend=&preds ;
  weight pwt&j ;
run ;

proc sort data=parms&j ;
  by variable ;
run ;

%end ;

%mend reps;

%reps ;


*Section 4: Merge Base and Replicate files together for calculation of statistics of interest* ;

data rep ;

merge parms

%do k=1 %to &nclust;

parms&k

%end;;

by variable ;

proc print ;

run ;


*Section 5: Calculate complex design corrected variance and standard errors ;

*variance = sum of the squared differences between the base statistic and the replicate statistics ;

*standard error= square root of the sum of the squared differences (variance) ;

*Odds Ratio=exponent of the coefficient ;

*Confidence Intervals=OR+-1.96*corrected standard error* ;

ods listing ;

data calculate ;

set rep ;

%macro it ;

%do j=1 %to &nclust ;

sqdiff&j=(estimate-estimate&j)**2;

%end;

sumdiff=sum(of sqdiff1-sqdiff&nclust);

stderr=sqrt(sumdiff) ;

or=exp(estimate) ;

lowor=or-(1.96*stderr) ;

upor=or+(1.96*stderr) ;

%mend it ;

%it;

run ;


proc print ;

var variable estimate stderr or lowor upor ;

run ;

%mend jacklogods ;

%jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3 ) ;

*comparison with SRS logistic regression* ;

proc logistic des data=d.ncsdxdm3 ;

weight p2wtv3 ;

model deplt1=sexf ;

run ;

*comparison with SAS surveylogistic ;

proc surveylogistic data=d.ncsdxdm3 ;
  strata str ;
  cluster secu ;
  weight p2wtv3 ;
  model deplt1 (event='1') =sexf ;
run ;
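The variance rule stated in the Section 5 comments can be written compactly. The notation is an assumption introduced here, not the author's: \hat{\beta} is the coefficient from the full-weight base model, \hat{\beta}_{(h)} is the coefficient from the replicate that uses pwt h, and H is the number of replicates (42 in the macro call above).

```latex
\[
v_{\mathrm{JRR}}(\hat{\beta}) = \sum_{h=1}^{H} \bigl(\hat{\beta}_{(h)} - \hat{\beta}\bigr)^{2},
\qquad
\mathrm{se}_{\mathrm{JRR}}(\hat{\beta}) = \sqrt{v_{\mathrm{JRR}}(\hat{\beta})}
\]
```

In the macro, each squared difference is stored in sqdiff1 through sqdiff42, their sum in sumdiff, and its square root in stderr.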


Results from Logistic JRR

Design Corrected Results:

Variable Estimate stderr or lowor upor

SEXF 0.7434 0.088842 2.10315 1.92902 2.27728
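The limits printed above equal or ± 1.96*stderr (2.10315 ± 1.96 × 0.088842), which applies a standard error estimated on the log-odds scale directly to the odds ratio. Limits that are comparable to the Wald limits reported by the SAS logistic procedures are usually formed on the coefficient scale and then exponentiated. A minimal sketch of that alternative, using only the estimate and standard error printed above; the dataset name orci and the variable names oddsratio, lowci, and upci are hypothetical:

```sas
/* Sketch only: exponentiate the limits of the coefficient interval */
data orci ;
  estimate=0.7434 ;                         /* coefficient printed above */
  stderr=0.088842 ;                         /* JRR standard error printed above */
  oddsratio=exp(estimate) ;
  lowci=exp(estimate-(1.96*stderr)) ;       /* lower limit on the odds ratio scale */
  upci=exp(estimate+(1.96*stderr)) ;        /* upper limit on the odds ratio scale */
run ;

proc print data=orci noobs ;
  var estimate stderr oddsratio lowci upci ;
run ;
```

With these inputs the limits come out at roughly 1.77 and 2.50, an interval that is asymmetric around the odds ratio, as expected for an exponentiated interval.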


SRS Results

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
SEXF        1    0.7434     0.0724       105.3802          <.0001


SAS Surveylogistic Results

Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     2.0084    0.0776       669.6525          <.0001
SEXF        1    -0.7434    0.0889        70.0103          <.0001

Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
SEXF     0.475            0.399   0.566
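The PROC SURVEYLOGISTIC results above use the procedure's default Taylor Series variance method. Later SAS/STAT releases than the one used for this workshop also provide a built-in jackknife through the VARMETHOD= option on the SURVEY procedures. A minimal sketch, assuming such a release is installed. The built-in method deletes each PSU in turn rather than one SECU per stratum, so its standard errors need not match the macro's exactly:

```sas
/* Sketch only: built-in delete-one-PSU jackknife, not the paired JRR of the macro */
proc surveylogistic data=d.ncsdxdm3 varmethod=jackknife ;
  strata str ;
  cluster secu ;
  weight p2wtv3 ;
  model deplt1 (event='1') = sexf ;
run ;
```

The macro remains useful when the statistic of interest comes from a procedure that has no survey counterpart.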


Another approach: Linear Regression

%macro jackgenmod(ncluster,weight,depend,preds,indata);

%let nclust=%eval(&ncluster);

data one;

set &indata;

%macro wgtcal ;

%do i=1 %to &nclust ;

pwt&i=&weight;

if str=&i and secu=1 then pwt&i=pwt&i*2 ;

if str=&i and secu=2 then pwt&i=0 ;

%end;

%mend;

%wgtcal ;


Base Model for OLS

%macro base ;

ods output parameterestimates=parms

(keep=variable estimate ) ;

title "Example of Proc Reg without design correction" ;

proc reg data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

proc sort ;

by variable ;

run ;

%mend base ;

%base ;


Replicate Models

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j
  (keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc reg data=ONE ;
  model &depend=&preds ;
  weight pwt&j ;
run ;

proc sort data=parms&j ;
  by variable ;
run ;

%end ;

%mend reps;

%reps ;


Merge Replicate Datasets with Base Dataset

data rep ;

merge parms

%do k=1 %to &nclust;

parms&k

%end;;

by variable ;

proc print ;

run ;

ods listing ;


Calculate Corrected Standard Errors from Distribution of Replicate Coefficients

data calculate ;

set rep ;

%macro it ;

%do j=1 %to &nclust ;

sqdiff&j=(estimate-estimate&j)**2;

%end;

sumdiff=sum(of sqdiff1-sqdiff&nclust);

stderr=sqrt(sumdiff) ;

%mend it ;

%it;

run ;


Code to Print Results from JRR and Execute Outer Macro

proc print ;

title "Results from JRR for OLS regression" ;

var variable estimate stderr ;

run ;

%mend jackgenmod ;

%jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3 ) ;


Proc SurveyReg Code

proc surveyreg data=d.ncsdxdm3 ;

title "Example of Proc SurveyReg" ;

strata str ;

cluster secu ;

weight p2wtv3 ;

model incpers=sexf ag25 ag35 ag45 ;

run ;


Parameter Estimates from OLS SRS Regression

Variable    DF   Parameter Estimate   Std. Error   t Value
Intercept   1     11077               485.53334     22.81
SEXF        1    -12096               434.45468    -27.84
AG25        1     15227               586.69609     25.95
AG35        1     22194               600.60265     36.95
AG45        1     21404               683.46087     31.32


JRR Results

Results from JRR for OLS regression

Obs Variable Estimate stderr

1 Intercept 11077 529.49

2 AG25 15227 698.83

3 AG35 22194 1026.29

4 AG45 21404 1055.67

5 SEXF -12096 689.31


Proc SurveyReg Results

Estimated Regression Coefficients

Parameter    Estimate      Standard Error   t Value   Pr > |t|
Intercept     11077.003     532.95062        20.78    <.0001
SEXF         -12095.819     690.29149       -17.52    <.0001
AG25          15227.170     698.54031        21.80    <.0001
AG35          22194.355    1017.50689        21.81    <.0001
AG45          21403.763    1062.42802        20.15    <.0001
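The three sets of standard errors shown above (SRS PROC REG, the JRR macro, and PROC SURVEYREG) are easier to compare as ratios. A minimal sketch that re-enters the printed values by hand; the dataset name compare and the variable names srs_se, jrr_se, taylor_se, jrr_to_srs, and jrr_to_taylor are hypothetical:

```sas
/* Sketch only: standard errors copied from the three listings above */
data compare ;
  length variable $ 10 ;
  input variable $ srs_se jrr_se taylor_se ;
  jrr_to_srs=jrr_se/srs_se ;            /* above 1: SRS understates the standard error */
  jrr_to_taylor=jrr_se/taylor_se ;      /* near 1: JRR and Taylor Series agree */
  datalines ;
Intercept 485.53334 529.49 532.95062
SEXF 434.45468 689.31 690.29149
AG25 586.69609 698.83 698.54031
AG35 600.60265 1026.29 1017.50689
AG45 683.46087 1055.67 1062.42802
;
run ;

proc print data=compare noobs ;
  format jrr_to_srs jrr_to_taylor 6.3 ;
run ;
```

Every JRR standard error is larger than its SRS counterpart, while the JRR and SurveyReg values differ by only a percent or so.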


Conclusions

• JRR is a flexible and convenient alternative to canned software procedures/programs

• Any statistic/procedure can be used within JRR structure, assuming it makes statistical sense

• SAS Macro coding allows parsimonious syntax and is ideal for repetitive and flexible coding
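The same structure carries over to a weighted mean. The macro below is a minimal sketch under stated assumptions, not part of the workshop code: the macro name jackmean, its var parameter, and the dataset names base, rep1 to rep42, and calculate are hypothetical, while str, secu, the weight variable, and the input dataset follow the regression macros.

```sas
/* Sketch only: JRR standard error for a weighted mean */
%macro jackmean(ncluster,weight,var,indata);

  /* replicate weights, built as in the regression macros */
  data one ;
    set &indata ;
    %do i=1 %to &ncluster ;
      pwt&i=&weight ;
      if str=&i and secu=1 then pwt&i=pwt&i*2 ;
      if str=&i and secu=2 then pwt&i=0 ;
    %end ;
  run ;

  /* base estimate using the full-sample weight */
  proc means data=one noprint ;
    var &var ;
    weight &weight ;
    output out=base(keep=estimate) mean=estimate ;
  run ;

  /* one replicate estimate per stratum */
  %do j=1 %to &ncluster ;
    proc means data=one noprint ;
      var &var ;
      weight pwt&j ;
      output out=rep&j(keep=estimate&j) mean=estimate&j ;
    run ;
  %end ;

  /* sum of squared differences between base and replicate means */
  data calculate ;
    merge base %do k=1 %to &ncluster ; rep&k %end ; ;
    sumdiff=0 ;
    %do j=1 %to &ncluster ;
      sumdiff=sumdiff+(estimate-estimate&j)**2 ;
    %end ;
    stderr=sqrt(sumdiff) ;
    keep estimate stderr ;
  run ;

  proc print data=calculate noobs ;
  run ;

%mend jackmean ;

%jackmean(42,p2wtv3,incpers,d.ncsdxdm3) ;
```

Only the procedure that produces the statistic changes. The replicate weights and the variance calculation are the same as in the logistic and OLS macros.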
