SI Workshop: July 15, 2005 1
SAS Macro Coding for Jackknife Repeated Replication
• Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language handles iterative, parameterized code well
• This presentation will demonstrate how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)
Analysis of Complex Sample Survey Data
• Data from complex sample surveys must be analyzed using techniques that adjust for the clustering of the sample design
• The standard procedures in SAS, SPSS, and Stata assume a simple random sample and therefore do not correctly calculate variances and standard errors for such data
Analysis of Complex Survey Data
• SAS offers the SURVEY procedures and Stata the svy commands, both of which use the Taylor Series Linearization approach
• JRR is a widely used replication approach that offers an alternative to the Taylor Series method
• JRR is flexible and can be adapted to many different types of statistics such as means, regression coefficients, and other statistics of interest
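The variance computation used by the macros in this presentation (the sum of squared differences between the base estimate and each replicate estimate) can be written compactly. The notation is an assumption introduced here for illustration and does not appear on the slides: $\hat\theta$ is the statistic computed with the full-sample weight, $\hat\theta_{(h)}$ is the same statistic recomputed with the replicate weight for stratum $h$, and $H$ is the number of strata (42 in the NCS example).

```latex
\widehat{\operatorname{var}}_{\mathrm{JRR}}(\hat\theta)
  = \sum_{h=1}^{H} \bigl(\hat\theta_{(h)} - \hat\theta\bigr)^{2},
\qquad
\widehat{\operatorname{se}}_{\mathrm{JRR}}(\hat\theta)
  = \sqrt{\widehat{\operatorname{var}}_{\mathrm{JRR}}(\hat\theta)}
```

Because only $\hat\theta$ and the $\hat\theta_{(h)}$ enter the formula, the same calculation applies whether the statistic is a mean or a regression coefficient.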
Visual Representation of JRR process
• JRR systematically removes a small portion of the sample, and the statistic of interest is recomputed for each resulting sub-sample
• In this example, the cases with str=42 and secu=2 are deleted and the weights of the cases with str=42 and secu=1 are doubled
• This process is repeated for each stratum until the entire dataset is covered
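The delete-one, double-the-other step described above can be sketched outside SAS. The following is a minimal Python illustration, not the presenter's code: the function name `jrr_replicate_weights`, the key `weight`, and the toy records are hypothetical, while `str`, `secu`, and the `pwt` naming follow the SAS code in this presentation. It assumes exactly two SECUs per stratum, coded 1 and 2.

```python
def jrr_replicate_weights(rows, n_strata, weight_key="weight"):
    """Return copies of the rows with replicate weights pwt1..pwtH added.

    For replicate h, cases in stratum h with secu == 1 get double weight,
    cases in stratum h with secu == 2 get weight 0, and every other case
    keeps its original weight.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for h in range(1, n_strata + 1):
            w = row[weight_key]
            if row["str"] == h and row["secu"] == 1:
                w = w * 2  # retained SECU stands in for the whole stratum
            elif row["str"] == h and row["secu"] == 2:
                w = 0      # deleted SECU
            new_row[f"pwt{h}"] = w
        out.append(new_row)
    return out


# Toy data (hypothetical): 2 strata, each with 2 SECUs.
sample = [
    {"str": 1, "secu": 1, "weight": 1.5},
    {"str": 1, "secu": 2, "weight": 2.0},
    {"str": 2, "secu": 1, "weight": 1.0},
    {"str": 2, "secu": 2, "weight": 3.0},
]
replicates = jrr_replicate_weights(sample, n_strata=2)
for r in replicates:
    print(r["str"], r["secu"], r["pwt1"], r["pwt2"])
```

Each replicate weight differs from the original weight only within its own stratum, which is why one replicate per stratum covers the whole dataset.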
SAS JRR Macro: Logistic Regression
*Logistic Regression Jackknife for Analysis of Complex Survey Data****************** ;
*Pat Berglund, July 2003 for Summer Institute Workshop ;
libname d 'd:\sumclass' ;
options compress=yes nofmterr symbolgen ;
options macrogen mprint ;
*create outer jackknife macro with parameters ;
*Parameters to fill in: ;
*ncluster=number of strata to loop over (each holds two SECUs), in the NCS I dataset this is 42 ;
*weight=case weight ;
*depend=dependent variable for the logistic model ;
*preds=predictor variables entered with a space between each one ;
*indata=input dataset* ;
%macro jacklogods(ncluster,weight,depend,preds,indata);
*section 1: jackknife using strata and secu variables to do 42 jackknife selections* ;
*each iteration of do loop selects one strata*secu combination and doubles the contribution of strata=x and secu=1 while setting strata=x and secu=2 to zero ;
*all other combinations stay the same* ;
%let nclust=%eval(&ncluster);
data one;
set &indata;
%macro wgtcal ;
%do i=1 %to &nclust ;
pwt&i=&weight;
if str=&i and secu=1 then pwt&i=pwt&i*2 ;
if str=&i and secu=2 then pwt&i=0 ;
%end;
%mend;
%wgtcal ;
**section 2: run base model/statistic of interest for entire sample using full weight* ;
%macro base ;
ods output parameterestimates=parms (keep=variable estimate ) ;
ods listing close ;
proc logistic des data=ONE ;
model &depend=&preds ;
weight &weight ;
run ;
ods listing ;
proc print data=parms ;
run ;
proc sort data=parms ;
by variable ;
run ;
%mend base ;
%base ;
*Section 3: Run Replicate Models* ;
* replicate models, one for each strata using weight developed in jackknife section 1* ;
*save statistic of interest for use with variance estimation* ;
%macro reps ;
%do j=1 %to &nclust ;
ods output parameterestimates=parms&j
(keep=estimate variable rename=(estimate=estimate&j )) ;
ods listing close ;
proc logistic des data=ONE ;
model &depend=&preds ;
weight pwt&j ;
run ;
proc sort data=parms&j ;
by variable ;
run ;
%end ;
%mend reps;
%reps ;
*Section 4: Merge Base and Replicate files together for calculation of statistics of interest* ;
data rep ;
merge parms
%do k=1 %to &nclust;
parms&k
%end;;
by variable ;
proc print ;
run ;
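*Section 5: Calculate complex design corrected variance and standard errors ;
*variance = sum of the squared differences between the base statistic and the replicate statistics ;
*standard error = square root of the sum of the squared differences (variance) ;
*Odds Ratio = exponent of the coefficient ;
*Confidence Intervals = OR +- 1.96*corrected standard error ;
*Note: stderr is on the coefficient (log-odds) scale, so this symmetric interval mixes scales. The usual interval exponentiates the coefficient limits, exp(estimate +- 1.96*stderr) ;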
ods listing ;
data calculate ;
set rep ;
%macro it ;
%do j=1 %to &nclust ;
sqdiff&j=(estimate-estimate&j)**2;
%end;
sumdiff=sum(of sqdiff1-sqdiff&nclust);
stderr=sqrt(sumdiff) ;
or=exp(estimate) ;
lowor=or-(1.96*stderr) ;
upor=or+(1.96*stderr) ;
%mend it ;
%it;
run ;
proc print ;
var variable estimate stderr or lowor upor ;
run ;
%mend jacklogods ;
%jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3 ) ;
*comparison with SRS logistic regression* ;
proc logistic des data=d.ncsdxdm3 ;
weight p2wtv3 ;
model deplt1=sexf ;
run ;
*comparison with SAS surveylogistic ;
proc surveylogistic data=d.ncsdxdm3 ;
strata str ;
cluster secu ;
weight p2wtv3 ;
model deplt1 (event='1') =sexf ;
run ;
Results from Logistic JRR
Design Corrected Results:
Variable  Estimate    stderr       or    lowor     upor
SEXF        0.7434  0.088842  2.10315  1.92902  2.27728
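The lowor and upor columns above come from the macro's formula or ± 1.96*stderr, where stderr is the JRR standard error of the coefficient. A common alternative builds the interval on the coefficient scale and then exponentiates both limits. The sketch below applies that alternative to the reported estimate and standard error. It is written in Python purely for illustration, and the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(estimate, stderr, z=1.96):
    """Odds ratio with confidence limits obtained by exponentiating
    the limits of the coefficient (log-odds) interval."""
    lower = math.exp(estimate - z * stderr)
    upper = math.exp(estimate + z * stderr)
    return math.exp(estimate), lower, upper


# Estimate and JRR standard error for SEXF as reported on the slide.
or_, lo, hi = odds_ratio_ci(0.7434, 0.088842)
print(round(or_, 3), round(lo, 3), round(hi, 3))

# Reciprocal limits, for comparison with a run whose coefficient has the opposite sign.
print(round(1 / hi, 3), round(1 / lo, 3))
```

With these inputs the exponentiated limits are roughly 1.77 and 2.50. Their reciprocals, roughly 0.40 and 0.57, are close to the 0.399 and 0.566 confidence limits that PROC SURVEYLOGISTIC reports later in the presentation for the coefficient with the opposite sign.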
SRS Results
Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Std. Error  Wald Chi-Square  Pr > ChiSq
SEXF        1    0.7434      0.0724         105.3802      <.0001
SAS Surveylogistic Results
Parameter  DF  Estimate  Std. Error  Wald Chi-Square  Pr > ChiSq
Intercept   1    2.0084      0.0776         669.6525      <.0001
SEXF        1   -0.7434      0.0889          70.0103      <.0001

Odds Ratio Estimates
Effect  Point Estimate  95% Wald Confidence Limits
SEXF             0.475         0.399         0.566
Another approach: Linear Regression
%macro jackgenmod(ncluster,weight,depend,preds,indata);
%let nclust=%eval(&ncluster);
data one;
set &indata;
%macro wgtcal ;
%do i=1 %to &nclust ;
pwt&i=&weight;
if str=&i and secu=1 then pwt&i=pwt&i*2 ;
if str=&i and secu=2 then pwt&i=0 ;
%end;
%mend;
%wgtcal ;
Base Model for OLS
%macro base ;
ods output parameterestimates=parms
(keep=variable estimate ) ;
title "Example of Proc Reg without design correction" ;
proc reg data=ONE ;
model &depend=&preds ;
weight &weight ;
run ;
proc sort data=parms ;
by variable ;
run ;
%mend base ;
%base ;
Replicate Models
%macro reps ;
%do j=1 %to &nclust ;
ods output parameterestimates=parms&j
(keep=estimate variable rename=(estimate=estimate&j )) ;
ods listing close ;
proc reg data=ONE ;
model &depend=&preds ;
weight pwt&j ;
run ;
proc sort data=parms&j ;
by variable ;
run ;
%end ;
%mend reps;
%reps ;
Merge Replicate Datasets with Base Dataset
data rep ;
merge parms
%do k=1 %to &nclust;
parms&k
%end;;
by variable ;
proc print ;
run ;
ods listing ;
Calculate Corrected Standard Errors from Distribution of Replicate Coefficients
data calculate ;
set rep ;
%macro it ;
%do j=1 %to &nclust ;
sqdiff&j=(estimate-estimate&j)**2;
%end;
sumdiff=sum(of sqdiff1-sqdiff&nclust);
stderr=sqrt(sumdiff) ;
%mend it ;
%it;
run ;
Code to Print Results from JRR and Execute Outer Macro
proc print ;
title "Results from JRR for OLS regression" ;
var variable estimate stderr ;
run ;
%mend jackgenmod ;
%jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3 ) ;
Proc SurveyReg Code
proc surveyreg data=d.ncsdxdm3 ;
title "Example of Proc SurveyReg" ;
strata str ;
cluster secu ;
weight p2wtv3 ;
model incpers=sexf ag25 ag35 ag45 ;
run ;
Parameter Estimates from OLS SRS Regression
Variable   DF  Parameter Estimate  Std. Error  t Value
Intercept   1               11077   485.53334    22.81
SEXF        1              -12096   434.45468   -27.84
AG25        1               15227   586.69609    25.95
AG35        1               22194   600.60265    36.95
AG45        1               21404   683.46087    31.32
JRR Results
Results from JRR for OLS regression
Obs Variable Estimate stderr
1 Intercept 11077 529.49
2 AG25 15227 698.83
3 AG35 22194 1026.29
4 AG45 21404 1055.67
5 SEXF -12096 689.31
Proc SurveyReg Results
Estimated Regression Coefficients
Parameter    Estimate  Standard Error  t Value  Pr > |t|
Intercept   11077.003       532.95062    20.78    <.0001
SEXF       -12095.819       690.29149   -17.52    <.0001
AG25        15227.170       698.54031    21.80    <.0001
AG35        22194.355      1017.50689    21.81    <.0001
AG45        21403.763      1062.42802    20.15    <.0001
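The three sets of OLS standard errors reported above (SRS Proc Reg, JRR, and Proc SurveyReg) can be lined up to show how much the SRS analysis understates them. The short Python sketch below does only that arithmetic. The numbers are copied from the output slides, and the dictionary names are hypothetical.

```python
# Standard errors copied from the three output slides.
srs_se = {"Intercept": 485.53334, "SEXF": 434.45468, "AG25": 586.69609,
          "AG35": 600.60265, "AG45": 683.46087}
jrr_se = {"Intercept": 529.49, "SEXF": 689.31, "AG25": 698.83,
          "AG35": 1026.29, "AG45": 1055.67}
svy_se = {"Intercept": 532.95062, "SEXF": 690.29149, "AG25": 698.54031,
          "AG35": 1017.50689, "AG45": 1062.42802}

# Ratio of each design-corrected standard error to the SRS standard error.
ratios = {name: (jrr_se[name] / srs_se[name], svy_se[name] / srs_se[name])
          for name in srs_se}

print(f"{'Variable':<10}{'JRR/SRS':>9}{'Svy/SRS':>9}")
for name, (jrr_ratio, svy_ratio) in ratios.items():
    print(f"{name:<10}{jrr_ratio:>9.2f}{svy_ratio:>9.2f}")
```

Every ratio exceeds 1, and the JRR and SurveyReg ratios for each variable are within about one percent of each other, which is the agreement the comparison slides are meant to show.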
Conclusions
• JRR is a flexible and convenient alternative to canned software procedures/programs
• Any statistic or procedure can be used within the JRR structure, provided it makes statistical sense
• SAS Macro coding allows parsimonious syntax and is ideal for repetitive and flexible coding
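The point that any statistic fits the JRR structure can be illustrated with a small generic routine. This is a hedged Python sketch, not a translation of the SAS macros: `jrr_standard_error` and `weighted_mean` are hypothetical names, the data are made up, and the routine assumes two SECUs per stratum coded 1 and 2, as in the examples in this presentation.

```python
import math


def weighted_mean(values, weights):
    """Example statistic: the weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(statistic, values, strata, secus, weights):
    """Return (base estimate, JRR standard error) for any statistic
    that accepts (values, weights)."""
    base = statistic(values, weights)
    sum_sq = 0.0
    for h in sorted(set(strata)):
        # Replicate weight for stratum h: double secu 1, zero out secu 2.
        rep_weights = []
        for s, c, w in zip(strata, secus, weights):
            if s == h and c == 1:
                rep_weights.append(w * 2)
            elif s == h and c == 2:
                rep_weights.append(0.0)
            else:
                rep_weights.append(w)
        sum_sq += (statistic(values, rep_weights) - base) ** 2
    return base, math.sqrt(sum_sq)


# Toy data (hypothetical): 2 strata, 2 SECUs each, equal weights.
values = [10.0, 14.0, 20.0, 24.0]
strata = [1, 1, 2, 2]
secus = [1, 2, 1, 2]
weights = [1.0, 1.0, 1.0, 1.0]

base, se = jrr_standard_error(weighted_mean, values, strata, secus, weights)
print(base, se)
```

Swapping `weighted_mean` for another function with the same arguments changes the statistic without touching the replication loop, which mirrors how the SAS macros swap proc logistic for proc reg.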