SAS Sort Accum Total
Uploaded by sarathannapareddy
Sort and Accumulating Totals
Last Updated: 29 June 2004
Center of Excellence
Data Warehousing
Objectives
Understand how the SAS System initializes the value of a variable in the PDV.
Prevent reinitialization of a variable in the PDV.
Create an accumulating variable.
Creating an Accumulating Variable
The SAS data set prog2.daysales contains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt).
SaleDate     SaleAmt
01APR2001    498.49
02APR2001    946.50
03APR2001    994.97
04APR2001    564.59
05APR2001    783.01
06APR2001    228.82
07APR2001    930.57
08APR2001    211.47
09APR2001    156.23
10APR2001    117.69
11APR2001    374.73
12APR2001    252.73

The store manager also wants to see a running total of sales for the month as of each day.

Partial Output
SaleDate     SaleAmt    Mth2Dte
01APR2001    498.49      498.49
02APR2001    946.50     1444.99
03APR2001    994.97     2439.96
04APR2001    564.59     3004.55
05APR2001    783.01     3787.56

Creating Mth2Dte
By default, variables created with an assignment statement are initialized to missing at the top of the DATA step.
An accumulating variable must retain its value from one observation to the next.
Mth2Dte=Mth2Dte+SaleAmt;
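As a minimal sketch of why the assignment statement alone is not enough, the step below runs it without a RETAIN statement. The data set work.daysales and its three rows (taken from the listing above) are stand-ins assumed here for prog2.daysales, and work.noretain is a hypothetical output name. Because Mth2Dte is reset to missing at the top of each iteration, and a missing value plus a number is missing, every value of Mth2Dte should come out missing.

```sas
/* Assumed stand-in for prog2.daysales: first three days only */
data work.daysales;
   input SaleDate :date9. SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
;
run;

/* No RETAIN: Mth2Dte is set to missing at the top of every iteration */
data work.noretain;
   set work.daysales;
   Mth2Dte=Mth2Dte+SaleAmt;   /* missing + number = missing */
run;

proc print data=work.noretain noobs;
run;
```

The log for the second step should also carry a note that missing values were generated by an operation on missing values.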
The RETAIN Statement
General form of the RETAIN statement:

    RETAIN variable-name <initial-value> … ;

The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step.
Previous values of retained variables are available for processing across iterations of the DATA step.

The RETAIN statement
  - retains the value of the variable in the PDV across iterations of the DATA step
  - initializes the retained variable to missing before the first execution of the DATA step if an initial value is not specified
  - is a compile-time-only statement.
Retain Mth2Dte and Set an Initial Value
If you do not supply an initial value, all the values of Mth2Dte will be missing.

    retain Mth2Dte 0;

    data mnthtot;
       set prog2.daysales;
       retain Mth2Dte 0;
       Mth2Dte=Mth2Dte+SaleAmt;
    run;
Creating an Accumulating Variable
The trace below follows the PDV through the first three iterations of this DATA step. R marks Mth2Dte as retained. SaleDate is shown as an unformatted SAS date value (15066 is 01APR2001).

    data mnthtot;
       set prog2.daysales;
       retain Mth2Dte 0;
       Mth2Dte=Mth2Dte+SaleAmt;
    run;

Input data (prog2.daysales)
SaleDate    SaleAmt
15066       498.49
15067       946.50
15068       994.97
15069       564.59
15070       783.01

PDV
Step                                        SaleDate   SaleAmt   Mth2Dte (R)
Compile: PDV is created                     (no values yet)
Execute: PDV is initialized                 .          .         0
SET reads observation 1                     15066      498.49    0
Assignment: 0+498.49                        15066      498.49    498.49
Implicit output: write out observation to mnthtot
Implicit return
Top of DATA step: values are kept           15066      498.49    498.49
SET reads observation 2                     15067      946.50    498.49
Assignment: 498.49+946.50                   15067      946.50    1444.99
Implicit output: write out observation to mnthtot
Implicit return
Top of DATA step: values are kept           15067      946.50    1444.99
SET reads observation 3                     15068      994.97    1444.99
Assignment: 1444.99+994.97                  15068      994.97    2439.96
Implicit output: write out observation to mnthtot
Implicit return

Continue processing until the end of the SAS data set.
    proc print data=mnthtot noobs;
       format SaleDate date9.;
    run;

Partial PROC PRINT Output
SaleDate     SaleAmt    Mth2Dte
01APR2001    498.49      498.49
02APR2001    946.50     1444.99
03APR2001    994.97     2439.96
04APR2001    564.59     3004.55
05APR2001    783.01     3787.56
Accumulating Totals: Missing Values
What happens if there are missing values for SaleAmt?

    data mnthtot;
       set prog2.daysales;
       retain Mth2Dte 0;
       Mth2Dte=Mth2Dte+SaleAmt;
    run;

Undesirable Output
SaleDate     SaleAmt    Mth2Dte
01APR2001    498.49     498.49
02APR2001    .          .         <- missing value
03APR2001    994.97     .
04APR2001    564.59     .
05APR2001    783.01     .

Adding a missing value with the + operator yields a missing result, and that missing total is then retained, so all subsequent values of Mth2Dte are missing.
The Sum Statement
When creating an accumulating variable, an alternative to the RETAIN statement is the sum statement.
General form of the sum statement:

    variable + expression;

The sum statement
  - creates the variable on the left side of the plus sign if it does not already exist
  - initializes the variable to zero before the first iteration of the DATA step
  - automatically retains the variable
  - adds the value of the expression to the variable at execution
  - ignores missing values.
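The properties listed above correspond to a longer form that spells each one out: a RETAIN statement with an initial value of 0 plus the SUM function, which ignores missing arguments. The sketch below puts both forms side by side. The data set work.daysales2 is a three-row stand-in assumed here for prog2.daysales2, and the names SumStmt, LongForm, and work.compare are hypothetical.

```sas
/* Assumed stand-in for prog2.daysales2, with one missing SaleAmt */
data work.daysales2;
   input SaleDate :date9. SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 .
03APR2001 994.97
;
run;

data work.compare;
   set work.daysales2;
   /* Sum statement: creates SumStmt, starts it at 0, and retains it */
   SumStmt+SaleAmt;
   /* Long form: explicit RETAIN plus the SUM function */
   retain LongForm 0;
   LongForm=sum(LongForm,SaleAmt);
run;

proc print data=work.compare noobs;
run;
```

Both columns should carry the running total past the missing SaleAmt on 02APR2001 instead of turning missing.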
Accumulating Totals: Missing Values

    data mnthtot2;
       set prog2.daysales2;
       Mth2Dte+SaleAmt;
    run;

    proc print data=mnthtot2 noobs;
       format SaleDate date9.;
    run;

Partial PROC PRINT Output
SaleDate     SaleAmt    Mth2Dte
01APR2001    498.49      498.49
02APR2001    .           498.49
03APR2001    994.97     1493.46
04APR2001    564.59     2058.05
05APR2001    783.01     2841.06
c03s1d1.sas
Objectives
Define First. and Last. processing.
Calculate an accumulating total for groups of data.
Use a subsetting IF statement to output selected observations.
Accumulating Totals for Groups
The SAS data set prog2.empsals contains each employee’s identification number (EmpID), salary (Salary), and division (Div). There is one observation for each employee.

EmpID     Salary    Div
E00004    42000     HUMRES
E00009    34000     FINACE
E00011    27000     FLTOPS
E00036    20000     FINACE
E00037    19000     FINACE
E00048    19000     FLTOPS
E00077    27000     APTOPS
E00097    20000     APTOPS
E00107    31000     FINACE
E00123    20000     APTOPS
E00155    27000     APTOPS
E00171    44000     SALES
Desired Output
Human resources wants a new data set that shows total salary paid for each division.

Div       DivSal
APTOPS    410000
FINACE    163000
FLTOPS    318000
HUMRES    181000
SALES     373000
Grouping the Data
You must group the data in the SAS data set before you can perform processing.
[Figure: observations arranged into groups A, B, C, D, and E]
Review of the SORT Procedure
You can rearrange the observations into groups using the SORT procedure.
General form of a PROC SORT step:

    PROC SORT DATA=input-SAS-data-set <OUT=output-SAS-data-set>;
       BY <DESCENDING> BY-variable ...;
    RUN;

The SORT Procedure
The SORT procedure
  - rearranges the observations in a SAS data set
  - can sort on multiple variables
  - can create a SAS data set that is a sorted copy of the input SAS data set (OUT= option)
  - replaces the input data set by default (when OUT= is not specified).
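The later examples read a sorted data set named salsort, but the slides do not show the step that creates it. A plausible sketch, assuming it is simply the employee data sorted by Div, is shown here. The data set work.empsals and its five rows are a stand-in assumed for prog2.empsals.

```sas
/* Assumed stand-in for prog2.empsals: a few rows from the listing */
data work.empsals;
   input EmpID $ Salary Div $;
   datalines;
E00004 42000 HUMRES
E00009 34000 FINACE
E00011 27000 FLTOPS
E00036 20000 FINACE
E00077 27000 APTOPS
;
run;

/* OUT= writes the sorted copy to salsort and leaves work.empsals unchanged */
proc sort data=work.empsals out=salsort;
   by Div;
run;

proc print data=salsort noobs;
run;
```

Without OUT=, the same step would overwrite work.empsals with the sorted observations.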
Processing Data in Groups

Div       Salary    DivSal
APTOPS     20000
APTOPS    100000
APTOPS     50000    170000
FINACE     25000
FINACE     20000
FINACE     23000
FINACE     27000     95000
SALES      10000
SALES      12000     22000
BY-Group Processing
The BY statement in the DATA step enables you to process your data in groups.
General form of a BY statement used with the SET statement:

    DATA output-SAS-data-set;
       SET input-SAS-data-set;
       BY BY-variable … ;
       <additional SAS statements>
    RUN;

    data divsals(keep=Div DivSal);
       set salsort;
       by Div;
       additional SAS statements
    run;
A BY statement in a DATA step creates temporary variables for each variable listed in the BY statement.
General form of the names of BY variables in a DATA step:
    First.BY-variable
    Last.BY-variable
First. and Last. Values
The First. variable has a value of 1 for the first observation in a BY group; otherwise, it equals 0.
The Last. variable has a value of 1 for the last observation in a BY group; otherwise, it equals 0.
Use these temporary variables to conditionally process sorted, grouped, or indexed data.
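Because First. and Last. variables are temporary, they are not written to the output data set. One way to inspect them, sketched below, is to copy them into ordinary variables. The data set work.salsort here is a small stand-in typed in with DATALINES (already in Div order), and FirstDiv, LastDiv, and work.flags are hypothetical names.

```sas
/* Assumed sample data, already grouped by Div */
data work.salsort;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

data work.flags;
   set work.salsort;
   by Div;
   FirstDiv=First.Div;   /* 1 on the first observation of each Div */
   LastDiv=Last.Div;     /* 1 on the last observation of each Div  */
run;

proc print data=work.flags noobs;
run;
```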
First. / Last. Example
SAS looks ahead to the next observation to determine the value of Last.Div.

Div       Salary    First.Div    Last.Div
APTOPS     20000    1            0
APTOPS    100000    0            0
APTOPS     50000    0            1
FINACE     25000    1            0
FINACE     20000    0            0
FINACE     23000    0            0
FINACE     27000    0            1
SALES      10000    1            0
SALES      12000    0            1
What Must Happen When?
There is a three-step process for accumulating totals:
1. Set the accumulating variable to 0 at the start of each BY group.
2. Increment the accumulating variable with a sum statement (automatically retains).
3. Output only the last observation of each BY group.
Accumulating Totals for Groups
1. Set the accumulating variable to 0 at the start of each BY group.

    data divsals(keep=Div DivSal);
       set salsort;
       by Div;
       if First.Div then DivSal=0;
       additional SAS statements
    run;

Accumulating Totals for Groups
2. Increment the accumulating variable with a sum statement (automatically retains).

    data divsals(keep=Div DivSal);
       set salsort;
       by Div;
       if First.Div then DivSal=0;
       DivSal+Salary;
       additional SAS statements
    run;
First. / Last. Example

Div       Salary    DivSal
APTOPS     20000     20000
APTOPS    100000    120000
APTOPS     50000    170000
FINACE     25000     25000
FINACE     20000     45000
FINACE     23000     68000
FINACE     27000     95000
SALES      10000     10000
SALES      12000     22000
Subsetting IF Statement
The subsetting IF defines a condition that the observation must meet to be further processed by the DATA step.
General form of the subsetting IF statement:

    IF expression;

If the expression is true, the DATA step continues processing the current observation.
If the expression is false, SAS returns to the top of the DATA step.

Accumulating Totals for Groups
3. Output only the last observation of each BY group.

    data divsals(keep=Div DivSal);
       set salsort;
       by Div;
       if First.Div then DivSal=0;
       DivSal+Salary;
       if Last.Div;
    run;
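For comparison, a subsetting IF can be written as an IF-THEN statement with DELETE, which makes the "return to the top of the DATA step" behavior explicit. The sketch below applies that form to the three-step pattern. The data set work.salsort is a small stand-in typed in with DATALINES rather than the course data.

```sas
/* Assumed sample data, already grouped by Div */
data work.salsort;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

data work.divsals(keep=Div DivSal);
   set work.salsort;
   by Div;
   if First.Div then DivSal=0;     /* step 1: reset at start of group   */
   DivSal+Salary;                  /* step 2: accumulate (auto-retained) */
   if not Last.Div then delete;    /* step 3: same effect as: if Last.Div; */
run;

proc print data=work.divsals noobs;
run;
```

With this sample data the output should have one row per division, holding the group totals 170000, 95000, and 22000.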
Subsetting IF Statement (Review)
Flow of a DATA step iteration that contains a subsetting IF (IF condition;):
1. Initialize PDV.
2. Execute program statements.
3. Is the condition true?
   NO: return to the top of the DATA step.
   YES: execute additional program statements, then output observation to SAS data set.
Accumulating Totals for Groups
Partial Log

    NOTE: There were 39 observations read from the data set WORK.SALSORT.
    NOTE: The data set WORK.DIVSALS has 5 observations and 2 variables.
    NOTE: DATA statement used:
          real time 0.74 seconds
          cpu time 0.33 seconds

    proc print data=divsals noobs;
    run;

PROC PRINT Output
Div       DivSal
APTOPS    410000
FINACE    163000
FLTOPS    318000
HUMRES    181000
SALES     373000
c03s2d1.sas
Input Data
The SAS data set prog2.regsals contains each employee’s ID number (EmpID), salary (Salary), region (Region), and division (Div). There is one observation for each employee.

EmpID     Salary    Region    Div
E00004     42000    E         HUMRES
E00009     34000    W         FINACE
E00011     27000    W         FLTOPS
E00036     20000    W         FINACE
E00037     19000    E         FINACE
E00077     27000    C         APTOPS
E00097     20000    E         APTOPS
E00107     31000    E         FINACE
E00123     20000    NC        APTOPS
E00155     27000    W         APTOPS
E00171     44000    W         SALES
E00188     37000    W         HUMRES
E00196     43000    C         APTOPS
E00210     31000    E         APTOPS
E00222    250000    NC        SALES
E00236     41000    W         APTOPS

Desired Output
Human resources wants a new data set that shows the total salary paid and the total number of employees for each division in each region.

Partial Output
Region    Div       DivSal    NumEmps
C         APTOPS     70000    2
E         APTOPS     83000    3
E         FINACE    109000    4
E         FLTOPS    122000    3
E         HUMRES    178000    5
NC        APTOPS     37000    2
NC        FLTOPS     28000    1
Sorting by Region and Div
The data must be sorted by Region and Div. Region is the primary sort variable. Div is the secondary sort variable.

    proc sort data=prog2.regsals out=regsort;
       by Region Div;
    run;

    proc print data=regsort noobs;
    run;

Partial PROC PRINT Output
Region    Div       Salary
C         APTOPS    27000
C         APTOPS    43000
E         APTOPS    20000
E         APTOPS    31000
E         APTOPS    32000
E         FINACE    19000
E         FINACE    31000
Multiple BY Variables

    data regdivsals;
       set regsort;
       by Region Div;
       additional SAS statements
    run;
Multiple BY Variables: Example
SAS looks ahead to the next observation to determine the Last. values.

Region    Div       First.Region    Last.Region    First.Div    Last.Div
C         APTOPS    1               0              1            0
C         APTOPS    0               0              0            0
C         APTOPS    0               1              0            1
E         APTOPS    1               0              1            1
E         FINACE    0               0              1            0
E         FINACE    0               1              0            1
NC        FINACE    1               0              1            1
NC        SALES     0               0              1            0
NC        SALES     0               0              0            0
NC        SALES     0               0              0            0
NC        SALES     0               1              0            1
Multiple BY Variables
When you use more than one variable in the BY statement, a change in the primary variable forces Last.BY-variable=1 for the secondary variable.

Region    Div       First.Region    Last.Region    First.Div    Last.Div
C         APTOPS    1               0              1            0
C         APTOPS    0               1              0            1
E         APTOPS    1               0              1            0
E         APTOPS    0               0              0            0
E         APTOPS    0               0              0            1
E         FINACE    0               0              1            0
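The flags in the table above can be reproduced with a short step that copies the temporary variables into ordinary ones. This is a sketch only. The data set work.regsort holds six rows typed in from the sorted listing, and the names FirstRegion, LastRegion, FirstDiv, LastDiv, and work.byflags are hypothetical.

```sas
/* Assumed sample data, already sorted by Region and Div */
data work.regsort;
   input Region $ Div $ Salary;
   datalines;
C APTOPS 27000
C APTOPS 43000
E APTOPS 20000
E APTOPS 31000
E APTOPS 32000
E FINACE 19000
;
run;

data work.byflags;
   set work.regsort;
   by Region Div;
   FirstRegion=First.Region;
   LastRegion=Last.Region;
   FirstDiv=First.Div;
   LastDiv=Last.Div;
run;

proc print data=work.byflags noobs;
run;
```

Because this sample stops after the first E FINACE row, that row is also the end of the data, so its LastRegion and LastDiv values should both be 1 here, unlike in the partial table, where more E FINACE rows follow.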
Multiple BY Variables

    /* Summarize salaries by division */
    data regdivsals(keep=Region Div DivSal NumEmps);
       set regsort;
       by Region Div;
       if First.Div then do;
          DivSal=0;
          NumEmps=0;
       end;
       DivSal+Salary;
       NumEmps+1;
       if Last.Div;
    run;
Multiple BY Variables
Partial Log

    NOTE: There were 39 observations read from the data set WORK.REGSORT.
    NOTE: The data set WORK.REGDIVSALS has 14 observations and 4 variables.
    NOTE: DATA statement used:
          real time 0.07 seconds
          cpu time 0.07 seconds

    proc print data=regdivsals noobs;
    run;

Partial PROC PRINT Output
Region    Div       DivSal
C         APTOPS     70000
E         APTOPS     83000
E         FINACE    109000
E         FLTOPS    122000
c03s2d2.sas