SAS Macro Coding for Jackknife Repeated Replication

• Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language handles iterative, flexible code well

• This presentation will demonstrate how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)

Analysis of Complex Sample Survey Data

• Data from complex sample surveys must be analyzed using techniques which adjust for the clustering of the sample design

• The standard procedures in SAS, SPSS, and Stata assume a simple random sample and therefore do not calculate variances and standard errors correctly for complex sample data

Analysis of Complex Survey Data

• SAS offers SURVEY procedures and Stata offers svy commands, both of which use the Taylor Series Linearization approach

• JRR, a widely used replication approach, offers an alternative to the Taylor Series method

• JRR is flexible and can be adapted to many different types of statistics such as means, regression coefficients, and other statistics of interest
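For the paired design used in the macros that follow (two SECUs per stratum, one replicate per stratum), the JRR variance estimate can be written as below. The notation is introduced here for illustration and is not from the slides: H is the number of strata, \hat{\theta} is the full-sample estimate, and \hat{\theta}_{(h)} is the estimate from the replicate formed in stratum h.

```latex
\[
\hat{v}_{\mathrm{JRR}}(\hat{\theta}) = \sum_{h=1}^{H} \left(\hat{\theta}_{(h)} - \hat{\theta}\right)^{2},
\qquad
\mathrm{se}(\hat{\theta}) = \sqrt{\hat{v}_{\mathrm{JRR}}(\hat{\theta})}
\]
```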

Visual Representation of JRR process

• JRR systematically removes a small portion of the sample, and the statistics of interest are computed repeatedly, once for each sub-sample

• In this example, the cases with str=42 and secu=2 are deleted and the cases with str=42 and secu=1 are doubled (their weights are multiplied by 2)

• This process is repeated for each stratum until the entire dataset is covered
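The delete-and-double step can be tried on a tiny dataset before running the full macro. This is a minimal sketch, assuming two strata with two SECUs each; the dataset names toy and toyrep, the weight wt, and its values are hypothetical, and a data step array stands in for the macro loop used later in the presentation.

```sas
/* Hypothetical toy data: 2 strata, 2 SECUs per stratum */
data toy ;
  input str secu wt ;
  datalines ;
1 1 10
1 2 12
2 1 8
2 2 9
;
run ;

/* One replicate weight per stratum: double secu=1, zero out secu=2 */
data toyrep ;
  set toy ;
  array pwt{2} pwt1-pwt2 ;
  do i = 1 to 2 ;
    pwt{i} = wt ;                                      /* start from the full weight   */
    if str = i and secu = 1 then pwt{i} = wt * 2 ;     /* retained SECU counts twice   */
    if str = i and secu = 2 then pwt{i} = 0 ;          /* deleted SECU drops out       */
  end ;
  drop i ;
run ;

proc print data=toyrep ;
run ;
```

Each row keeps its full weight in every replicate except the one built from its own stratum, which is the pattern the pwt1 through pwt42 variables follow in the macros.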

SAS JRR Macro: Logistic Regression

*Logistic Regression Jackknife for Analysis of Complex Survey Data****************** ;

*Pat Berglund, July 2003 for Summer Institute Workshop ;

libname d 'd:\sumclass' ;
options compress=yes nofmterr symbolgen ;
options macrogen mprint ;

*create outer jackknife macro with parameters ;
*Parameters to fill in: ;
*ncluster=number of clusters, in the NCS I dataset this is 42 ;
*weight=case weight ;
*depend=dependent variable for the logistic model ;
*preds=predictor variables entered with a space between each one ;
*indata=input dataset ;

%macro jacklogods(ncluster,weight,depend,preds,indata);

*section 1: jackknife using strata and secu variables to do 42 jackknife selections ;
*each iteration of do loop selects one strata*secu combination and doubles the contribution of strata=x and secu=1 while setting strata=x and secu=2 to zero ;
*all other combinations stay the same ;

%let nclust=%eval(&ncluster);

data one ;
set &indata ;

%macro wgtcal ;
%do i=1 %to &nclust ;
pwt&i=&weight ;
if str=&i and secu=1 then pwt&i=pwt&i*2 ;
if str=&i and secu=2 then pwt&i=0 ;
%end ;
%mend wgtcal ;

%wgtcal ;

run ;

**section 2: run base model/statistic of interest for entire sample using full weight* ;

%macro base ;

ods output parameterestimates=parms (keep=variable estimate ) ;

ods listing close ;

proc logistic des data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

ods listing ;

proc print data=parms ;

run ;

proc sort data=parms ;

by variable ;

run ;

%mend base ;

%base ;

*Section 3: Run Replicate Models* ;

* replicate models, one for each strata using weight developed in jackknife section 1* ;

*save statistic of interest for use with variance estimation* ;

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j
(keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc logistic des data=ONE ;

model &depend=&preds ;

weight pwt&j ;

run ;

proc sort data=parms&j ;

by variable ;

run ;

%end ;

%mend reps;

%reps ;

*Section 4: Merge Base and Replicate files together for calculation of statistics of interest* ;

data rep ;

merge parms
%do k=1 %to &nclust ;
parms&k
%end ;
;

by variable ;

proc print ;

run ;

*Section 5-Calculate complex design corrected variance and standard errors ;

*variance = sum of the squared differences between the base statistic and the replicate statistics ;

*standard error= square root of the sum of the squared differences (variance) ;

*Odds Ratio=exponent of the coefficient ;

*Confidence Intervals=OR+-1.96*corrected standard error* ;

ods listing ;

data calculate ;

set rep ;

%macro it ;
%do j=1 %to &nclust ;
sqdiff&j=(estimate-estimate&j)**2 ;
%end ;
sumdiff=sum(of sqdiff1-sqdiff&nclust);
stderr=sqrt(sumdiff) ;
or=exp(estimate) ;
lowor=or-(1.96*stderr) ;
upor=or+(1.96*stderr) ;
%mend it ;

%it ;

run ;

proc print ;

var variable estimate stderr or lowor upor ;

run ;

%mend jacklogods ;

%jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3 ) ;

*comparison with SRS logistic regression* ;

proc logistic des data=d.ncsdxdm3 ;

weight p2wtv3 ;

model deplt1=sexf ;

run ;

*comparison with SAS surveylogistic ;

proc surveylogistic data=d.ncsdxdm3 ;
strata str ;
cluster secu ;
weight p2wtv3 ;
model deplt1 (event='1') =sexf ;
run ;

Results from Logistic JRR

Design Corrected Results:

Variable   Estimate   stderr     or        lowor     upor
SEXF       0.7434     0.088842   2.10315   1.92902   2.27728

SRS Results

Analysis of Maximum Likelihood Estimates

                             Std.      Wald
Parameter   DF   Estimate    Error     Chi-Square   Pr > ChiSq
SEXF        1    0.7434      0.0724    105.3802     <.0001

SAS Surveylogistic Results

Parameter   DF   Estimate    Error     Chi-Square   Pr > ChiSq
Intercept   1    2.0084      0.0776    669.6525     <.0001
SEXF        1    -0.7434     0.0889    70.0103      <.0001

Odds Ratio Estimates

          Point       95% Wald
Effect    Estimate    Confidence Limits
SEXF      0.475       0.399     0.566
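The macro forms its odds-ratio limits by adding and subtracting 1.96 standard errors of the coefficient from the odds ratio itself. A common alternative, sketched here as an assumption rather than as the author's method, builds the interval on the log-odds scale and exponentiates the endpoints. The dataset name ci and the variables lowor_log and upor_log are hypothetical; the estimate and standard error are copied from the design-corrected output above.

```sas
/* Values copied from the design-corrected JRR results */
data ci ;
  estimate = 0.7434 ;      /* JRR base coefficient for SEXF      */
  stderr   = 0.088842 ;    /* JRR standard error of coefficient  */
  or = exp(estimate) ;
  /* limits formed on the log-odds scale, then exponentiated */
  lowor_log = exp(estimate - 1.96*stderr) ;
  upor_log  = exp(estimate + 1.96*stderr) ;
run ;

proc print data=ci ;
run ;
```

Limits built this way are asymmetric around the odds ratio and always positive, which is how Wald limits for odds ratios are usually reported.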

Another approach: Linear Regression

%macro jackgenmod(ncluster,weight,depend,preds,indata);

%let nclust=%eval(&ncluster);

data one;

set &indata;

%macro wgtcal ;
%do i=1 %to &nclust ;
pwt&i=&weight ;
if str=&i and secu=1 then pwt&i=pwt&i*2 ;
if str=&i and secu=2 then pwt&i=0 ;
%end ;
%mend wgtcal ;

%wgtcal ;

run ;

Base Model for OLS

%macro base ;

ods output parameterestimates=parms

(keep=variable estimate ) ;

title "Example of Proc Reg without design correction" ;

proc reg data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

proc sort data=parms ;

by variable ;

run ;

%mend base ;

%base ;

Replicate Models

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j

(keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc reg data=ONE ;

model &depend=&preds ;

weight pwt&j ;

run ;

proc sort data=parms&j ;

by variable ;

run ;

%end ;

%mend reps;

%reps ;

Merge Replicate Datasets with Base Dataset

data rep ;

merge parms
%do k=1 %to &nclust ;
parms&k
%end ;
;

by variable ;

proc print ;

run ;

ods listing ;

Calculate Corrected Standard Errors from Distribution of Replicate Coefficients

data calculate ;

set rep ;

%macro it ;
%do j=1 %to &nclust ;
sqdiff&j=(estimate-estimate&j)**2 ;
%end ;
sumdiff=sum(of sqdiff1-sqdiff&nclust);
stderr=sqrt(sumdiff) ;
%mend it ;

%it ;

run ;

Code to Print Results from JRR and Execute Outer Macro

proc print ;

title "Results from JRR for OLS regression" ;

var variable estimate stderr ;

run ;

%mend jackgenmod ;

%jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3 ) ;

Proc SurveyReg Code

proc surveyreg data=d.ncsdxdm3 ;

title "Example of Proc SurveyReg" ;

strata str ;

cluster secu ;

weight p2wtv3 ;

model incpers=sexf ag25 ag35 ag45 ;

run ;

Parameter Estimates from OLS SRS Regression

Parameter Estimates

                   Parameter   Std.
Variable     DF    Estimate    Error        t Value
Intercept    1     11077       485.53334    22.81
SEXF         1     -12096      434.45468    -27.84
AG25         1     15227       586.69609    25.95
AG35         1     22194       600.60265    36.95
AG45         1     21404       683.46087    31.32

JRR Results

Results from JRR for OLS regression

Obs   Variable     Estimate   stderr
1     Intercept    11077      529.49
2     AG25         15227      698.83
3     AG35         22194      1026.29
4     AG45         21404      1055.67
5     SEXF         -12096     689.31

Proc SurveyReg Results

Estimated Regression Coefficients

                           Standard
Parameter    Estimate      Error         t Value   Pr > |t|
Intercept    11077.003     532.95062     20.78     <.0001
SEXF         -12095.819    690.29149     -17.52    <.0001
AG25         15227.170     698.54031     21.80     <.0001
AG35         22194.355     1017.50689    21.81     <.0001
AG45         21403.763     1062.42802    20.15     <.0001

• JRR is a flexible and convenient alternative to canned software procedures/programs

• Any statistic/procedure can be used within JRR structure, assuming it makes statistical sense

• SAS Macro coding allows parsimonious syntax and is ideal for repetitive and flexible coding
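The same structure carries over to other statistics. The sketch below adapts it to a weighted mean, under stated assumptions: the macro name jackmean, the toy dataset, and the variables wt and y are hypothetical, and PROC MEANS stands in for the regression procedures used earlier. It is an illustration of the pattern, not code from the presentation.

```sas
/* Hypothetical toy data: 2 strata x 2 SECUs, outcome y, weight wt */
data toy ;
  input str secu wt y ;
  datalines ;
1 1 10 3
1 2 12 5
2 1 8 4
2 2 9 6
;
run ;

%macro jackmean(nclust, weight, var, indata) ;
  /* Section 1: replicate weights, one per stratum */
  data one ;
    set &indata ;
    %do i = 1 %to &nclust ;
      pwt&i = &weight ;
      if str = &i and secu = 1 then pwt&i = pwt&i * 2 ;
      if str = &i and secu = 2 then pwt&i = 0 ;
    %end ;
  run ;

  /* Section 2: full-sample weighted mean */
  proc means data=one noprint ;
    var &var ;
    weight &weight ;
    output out=base (keep=estimate) mean=estimate ;
  run ;

  /* Section 3: one weighted mean per replicate weight */
  %do j = 1 %to &nclust ;
    proc means data=one noprint ;
      var &var ;
      weight pwt&j ;
      output out=rep&j (keep=estimate&j) mean=estimate&j ;
    run ;
  %end ;

  /* Sections 4 and 5: combine and sum squared differences */
  data calculate ;
    merge base %do k = 1 %to &nclust ; rep&k %end ; ;
    sumdiff = 0 ;
    %do j = 1 %to &nclust ;
      sumdiff = sumdiff + (estimate - estimate&j)**2 ;
    %end ;
    stderr = sqrt(sumdiff) ;
  run ;

  proc print data=calculate ;
    var estimate stderr ;
  run ;
%mend jackmean ;

%jackmean(2, wt, y, toy) ;
```

Because each PROC MEANS step returns a single row, the base and replicate datasets are merged one to one without a BY statement; the regression macros need BY variable because they return one row per coefficient.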