The REPORT Procedure
We have seen the PRINT procedure used to display data in various forms, and the MEANS procedure for summarizing data.
The REPORT procedure can do many of these same things in a more versatile format.
General syntax:
PROC REPORT <option(s)>;
BREAK location break-variable </ option(s)>;
BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;
COLUMN column-specification(s);
DEFINE report-item / <usage> <attribute(s)> <option(s)> <justification> <COLOR=color> <'column-header-1' <...'column-header-n'>> <style>;
FREQ variable;
RBREAK location </ option(s)>;
WEIGHT variable;
The COLUMN Statement
The result of the REPORT procedure can be viewed as a table: columns represent different variables, and rows hold their values.
In general, we will use the column statement like the var statement in PROC PRINT:
column variable(s);
Variables will appear in the order listed in the column statement.
Interactive vs. Non-Interactive
In versions of SAS prior to 9.4, the REPORT procedure opens an interactive window by default.
In those versions, you can suppress the interactive mode and direct output to the output window using the NOWD option in the PROC REPORT line:
proc report data=data-set nowd;
Example
A simple example:
proc report data=mysas.projects nowd;
column Region Pol_type equipmnt personel JobTotal;
run;
The result is very much like that of the PRINT procedure.
The DEFINE Statement
The define statement allows attributes to be set for each variable.
Syntax:
define variable / options;
Many different options are available…
Some options:
A quoted string: 'Any Text' provides a column label
width=n: sets the column width
format=SAS-format: sets the format for display
Example
Modifying the previous example:
proc report data=mysas.projects nowd;
column Region Pol_type equipmnt personel JobTotal;
define region/'Regional Office';
define pol_type/'Pollutant' width=10;
define equipmnt/'Equipment Cost' format=dollar10.;
define personel/'Personnel Cost' format=dollar10.;
define jobtotal/'Total Cost' format=dollar10.;
run;
This output could also be generated using PROC PRINT with appropriate label and format statements.
There must be more options available in PROC REPORT…
The ORDER Option
The order option:
Orders the values of the variable in question.
Displays each value only once at the beginning of its output block.
To see this, add the order option to the define statements for the Region and Pol_type variables in the previous example.
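One way this change might look is sketched below. It assumes the mysas library is assigned as in the earlier examples and reuses the labels and formats from the previous example; only the order keyword on the first two define statements is new.

```sas
/* Sketch: the previous example with the order option added. */
/* Assumes the mysas libref is already assigned. */
proc report data=mysas.projects nowd;
  column Region Pol_type equipmnt personel JobTotal;
  /* order sorts the rows and prints each value once per block */
  define region   / order 'Regional Office';
  define pol_type / order 'Pollutant' width=10;
  define equipmnt / 'Equipment Cost' format=dollar10.;
  define personel / 'Personnel Cost' format=dollar10.;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```

Each observation still gets its own row; only the repeated Region and Pol_type values are left blank within a block.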
Output (2nd Page)
Can PRINT do this?
The GROUP Option
The group option:
Orders the values of the variable
Condenses multiple observations (if possible)
Replaces any quantitative variable(s) with a summary statistic (the sum by default)
To see this, include the group option for both the Region and Pol_type variables.
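A minimal sketch of that change follows, again assuming the mysas library is assigned as in the earlier examples. The labels and formats are carried over from the DEFINE example; the cost columns are left at their default statistic, the sum.

```sas
/* Sketch: group replaces order for Region and Pol_type. */
/* Assumes the mysas libref is already assigned. */
proc report data=mysas.projects nowd;
  column Region Pol_type equipmnt personel JobTotal;
  /* one row per Region/Pol_type combination */
  define region   / group 'Regional Office';
  define pol_type / group 'Pollutant' width=10;
  /* numeric columns are summed within each group by default */
  define equipmnt / 'Equipment Cost' format=dollar10.;
  define personel / 'Personnel Cost' format=dollar10.;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```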
Can PRINT do this?
Summary Statistics
To change the default statistic for a numeric variable, include a statistic keyword as an option in its define statement.
Some statistic keywords:
N MIN MAX MEAN STD MEDIAN
(Similar to what is available with PROC MEANS)
Example: compute means for each cost variable:
proc report data=mysas.projects nowd;
column Region Pol_type equipmnt personel JobTotal;
define region/'Regional Office' group;
define pol_type/'Pollutant' group width=10;
define equipmnt/mean 'Average Equipment Cost' format=dollar10.;
define personel/mean 'Average Personnel Cost' format=dollar10.;
define jobtotal/mean 'Average Total Cost' format=dollar10.;
run;
Suppose we want a summary of multiple statistics on a single variable, jobtotal in this example. How is that accomplished?
The same variable can be used in multiple columns of a report, with differing definitions, through the use of aliases.
Aliases
Aliases are set in the column statement:
column variable=alias …
The alias must be a legal SAS name and must not be a variable name from the current data set.
The define statements then refer to the alias rather than to the variable name.
proc report data=mysas.projects nowd;
column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;
define region/'Regional Office' group;
define pol_type/'Pollutant' group width=10;
define num/n 'Number of Jobs' width=8;
define avg/mean 'Mean Total Cost' format=dollar12.2;
define med/median 'Median Total Cost' format=dollar12.2;
define std/std 'Std. Deviation of Total Cost' format=dollar12.2;
run;
Using Formats with Grouping
In past work, formats have been used to define categories.
For groups, categories are determined by the format if one is specified.
Consider:
proc format;
value gestation
low-<259='Premature'
259-<999='Normal'
999='Unknown'
;
run;
proc report data=mysas.birthweight nowd;
column gestation birth_wt=n birth_wt=avg birth_wt=sd;
define gestation/group 'Gestation' format=gestation.;
define n/n 'Number of Births' width=8;
define avg/mean 'Avg. Birth Weight (oz)' format=9.2;
define sd/std 'Std. Deviation' format=9.2;
run;
Result
Try without the format.
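As a sketch of what trying this looks like, the step below repeats the report with format=gestation. removed from the define statement. Assuming the gestation variable has no format permanently attached in mysas.birthweight, each distinct gestation value should then form its own group instead of being collapsed into the three categories.

```sas
/* Sketch: same report, but without format=gestation. on the group variable. */
/* Assumes the mysas libref is already assigned. */
proc report data=mysas.birthweight nowd;
  column gestation birth_wt=n birth_wt=avg birth_wt=sd;
  /* groups are now formed from the raw gestation values */
  define gestation / group 'Gestation';
  define n   / n 'Number of Births' width=8;
  define avg / mean 'Avg. Birth Weight (oz)' format=9.2;
  define sd  / std 'Std. Deviation' format=9.2;
run;
```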
BREAK and RBREAK
The rbreak statement allows for whole-report summaries.
Syntax:
rbreak location </ option(s)>;
Location: before or after
Options:
summarize: gives whole-report summary statistic(s)
ol or dol: overline or double overline
ul or dul: underline or double underline
The overline and underline options affect only traditional listing output.
Example
proc report data=mysas.projects nowd;
column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;
define region/'Regional Office' group;
define pol_type/'Pollutant' group width=10;
define num/n 'Number of Jobs' width=8;
define avg/mean 'Mean Total Cost' format=dollar12.2;
define med/median 'Median Total Cost' format=dollar12.2;
define std/std 'Std. Deviation of Total Cost' format=dollar12.2;
rbreak after / summarize dol;
run;
A summary line is added at the bottom of the report, separated from the rows above by a double overline.
The break statement allows for summaries by group.
Syntax:
break location variable </ option(s)>;
Location: before or after
Options (in addition to those for rbreak):
skip: skips a line between groups
page: skips a page between groups
suppress: suppresses printing of the break variable values at the end of each group
Example:
proc report data=mysas.projects nowd;
column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;
define region/'Regional Office' group;
define pol_type/'Pollutant' group width=10;
define num/n 'Number of Jobs' width=8;
define avg/mean 'Mean Total Cost' format=dollar12.2;
define med/median 'Median Total Cost' format=dollar12.2;
define std/std 'Std. Deviation of Total Cost' format=dollar12.2;
break after region/summarize ol skip;
rbreak after / summarize dol;
run;
A summary line is added at the bottom of each region, with an overline separator and a blank line before the next group.
The suppress option removes the break variable (region) values from these summary lines.
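A sketch of the previous example with suppress added to the break statement is given below. It assumes the mysas library is assigned as before; everything except the suppress keyword is copied from that example.

```sas
/* Sketch: suppress added to the break statement of the previous example. */
/* Assumes the mysas libref is already assigned. */
proc report data=mysas.projects nowd;
  column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;
  define region   / 'Regional Office' group;
  define pol_type / 'Pollutant' group width=10;
  define num / n 'Number of Jobs' width=8;
  define avg / mean 'Mean Total Cost' format=dollar12.2;
  define med / median 'Median Total Cost' format=dollar12.2;
  define std / std 'Std. Deviation of Total Cost' format=dollar12.2;
  /* suppress leaves the Region column blank on each group summary line */
  break after region / summarize ol skip suppress;
  rbreak after / summarize dol;
run;
```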
Other Options
In addition to the nowd option in the PROC REPORT line, you may wish to specify:
headline: underlines the set of column headings
headskip: skips a line after the headings before beginning the report
box: draws borders for rows and columns of the table
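These options go in the PROC REPORT line alongside nowd. The sketch below adds headline and headskip to the grouped report used earlier; it assumes the mysas library is assigned and that output is routed to the traditional listing destination, since these options have no visible effect elsewhere.

```sas
/* Sketch: headline and headskip added to the PROC REPORT line. */
/* Assumes the mysas libref is assigned and listing output is in use. */
proc report data=mysas.projects nowd headline headskip;
  column Region Pol_type JobTotal;
  define region   / group 'Regional Office';
  define pol_type / group 'Pollutant' width=10;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```

Adding box to the same line should draw borders around the rows and columns of the table.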
Exercise 1
Using the delay data set, create:
Exercise 2
Using the birthweight data and the following:
Normal gestation is 259 days or more; less is premature (missing values are coded as 999)
Smoking status is coded as:
0, Never smoked
1, Currently smoke
2, Stopped smoking at pregnancy
3, Stopped before current pregnancy
9, Unknown
Create:
Exercise 3
Use the fish data set to create the report that follows.
The standards for mercury levels are:
less than 0.5ppm is acceptable
more than 1.0ppm is toxic
anything in between is considered dangerous and requires action