The Report Procedure

Source: people.uncw.edu/blumj/stt305/ppt/The Report Procedure.pdf

Post on 12-Mar-2020

The REPORT Procedure

We have seen the PRINT procedure used to display data in various forms, and the MEANS procedure for summarizing data.

The REPORT procedure can do many of these same things in a versatile format.

The REPORT Procedure General Syntax:

PROC REPORT <option(s)>;
BREAK location break-variable </ option(s)>;
BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;
COLUMN column-specification(s);
DEFINE report-item / <usage> <attribute(s)> <option(s)> <justification> <COLOR=color> <'column-header-1' <...'column-header-n'>> <style>;
FREQ variable;
RBREAK location </ option(s)>;
WEIGHT variable;

The COLUMN Statement

The result of the REPORT procedure can be viewed as a table—columns represent different variables, rows are values.

In general, we will use the column statement like the var statement:

column variable(s);

Variables will appear in the order listed in the column statement.

Interactive vs. Non-Interactive

In previous versions of SAS, the REPORT procedure opened an interactive window by default.

In versions prior to 9.4, you can suppress the interactive mode and direct output to the output window by using the NOWD option in the PROC REPORT line:

proc report data=data-set nowd;

Example

A simple example:

proc report data=mysas.projects nowd;

column Region Pol_type equipmnt personel JobTotal;

run;

The result is very much like that of the PRINT procedure.

The DEFINE Statement

The define statement allows attributes to be set for each variable.

Syntax: define variable / options;

Many different options are available…

The DEFINE Statement

Some options:

'Any Text' (a quoted string): provides a column label

width=n: sets the column width

format=SAS-format: sets the format for display

Example (modifying the previous example):

proc report data=mysas.projects nowd;

column Region Pol_type equipmnt personel JobTotal;

define region/'Regional Office';

define pol_type/'Pollutant' width=10;

define equipmnt/'Equipment Cost' format=dollar10.;

define personel/'Personnel Cost' format=dollar10.;

define jobtotal/'Total Cost' format=dollar10.;

run;

The DEFINE Statement

This output could also be generated using PROC PRINT with appropriate label and format statements.

There must be more options available in PROC REPORT…

The ORDER Option

The order option:

Orders the values of the variable in question.

Displays each value only once at the beginning of its output block.

To see this, add the order option to the define statements for the Region and Pol_type variables in the previous example.
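As a minimal sketch of that change, the earlier example could be rewritten as below. It assumes the mysas libref is already assigned and that mysas.projects contains the variables used in the earlier slides; only the order keyword is new.

```sas
/* Sketch: the earlier example with ORDER added for Region and Pol_type. */
/* Assumes the mysas libref is already assigned, as in the slides.       */
proc report data=mysas.projects nowd;
  column Region Pol_type equipmnt personel JobTotal;
  define region   / order 'Regional Office';
  define pol_type / order 'Pollutant' width=10;
  define equipmnt / 'Equipment Cost' format=dollar10.;
  define personel / 'Personnel Cost' format=dollar10.;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```

Every observation still gets its own row; the region and pollutant values are simply sorted and printed once per block.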

Output (2nd Page)

Can PRINT do this?

The GROUP Option

The group option:

Orders the values of the variable

Condenses multiple observations (if possible)

Replaces any quantitative variable(s) with a summary statistic (the sum by default)

To see this, include the group option for both the Region and Pol_type variables.
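A sketch of the grouped version follows, again assuming the mysas libref is assigned and that the three cost variables are numeric, so that PROC REPORT treats them as analysis variables and sums them within each group.

```sas
/* Sketch: GROUP on Region and Pol_type collapses rows to one per        */
/* region/pollutant combination; numeric columns show sums by default.  */
/* Assumes the mysas libref is already assigned, as in the slides.       */
proc report data=mysas.projects nowd;
  column Region Pol_type equipmnt personel JobTotal;
  define region   / group 'Regional Office';
  define pol_type / group 'Pollutant' width=10;
  define equipmnt / 'Equipment Cost' format=dollar10.;
  define personel / 'Personnel Cost' format=dollar10.;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```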

The GROUP Option

Can PRINT do this?

Summary Statistics

To change the default statistic for a numeric variable, include a statistic keyword as an option in its define statement.

Some statistics keywords: N, MIN, MAX, MEAN, STD, MEDIAN (similar to what is available with PROC MEANS).

Summary Statistics

Example: compute means for each cost variable:

proc report data=mysas.projects nowd;

column Region Pol_type equipmnt personel JobTotal;

define region/'Regional Office' group;

define pol_type/'Pollutant' group width=10;

define equipmnt/mean 'Average Equipment Cost' format=dollar10.;

define personel/mean 'Average Personnel Cost' format=dollar10.;

define jobtotal/mean 'Average Total Cost' format=dollar10.;

run;

Summary Statistics

Suppose we want several statistics for a single variable (jobtotal in this example). How is that accomplished?

The same variable can be used in multiple columns of a report, with a different definition for each, through the use of aliases.

Aliases

Aliases are set in the column statement:

column variable=alias …

The alias must be a legal SAS name and must not be a variable name from the current data set.

Each define statement then refers to the alias rather than to the variable name.

proc report data=mysas.projects nowd;

column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;

define region/'Regional Office' group;

define pol_type/'Pollutant' group width=10;

define num/n 'Number of Jobs' width=8;

define avg/mean 'Mean Total Cost' format=dollar12.2;

define med/median 'Median Total Cost' format=dollar12.2;

define std/std 'Std. Deviation of Total Cost' format=dollar12.2;

run;

Using Formats with Grouping

In past work, formats have been used to define categories.

For groups, categories are determined by the format if one is specified.

Consider:

proc format;

value gestation

low-<259='Premature'

259-<999='Normal'

999='Unknown'

;

run;

proc report data=mysas.birthweight nowd;

column gestation birth_wt=n birth_wt=avg birth_wt=sd;

define gestation/group 'Gestation' format=gestation.;

define n/n 'Number of Births' width=8;

define avg/mean 'Avg. Birth Weight (oz)' format=9.2;

define sd/std 'Std. Deviation' format=9.2;

run;

Result

Try without the format.
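One way to try this is to drop format=gestation. from the define statement, as sketched below. This assumes the mysas libref is assigned and that gestation has no format stored in the data set; in that case each distinct gestation value should form its own group instead of being collapsed into the three categories.

```sas
/* Sketch: the same report with no format on the grouping variable.   */
/* Assumes the mysas libref is already assigned, as in the slides.    */
proc report data=mysas.birthweight nowd;
  column gestation birth_wt=n birth_wt=avg birth_wt=sd;
  define gestation / group 'Gestation';   /* format=gestation. removed */
  define n   / n 'Number of Births' width=8;
  define avg / mean 'Avg. Birth Weight (oz)' format=9.2;
  define sd  / std 'Std. Deviation' format=9.2;
run;
```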

BREAK and RBREAK

The rbreak statement adds a summary for the whole report.

Syntax: rbreak location </ option(s)>;

Location: before or after

Options:

summarize: gives whole-report summary statistic(s)

ol or dol: overline or double overline

ul or dul: underline or double underline

BREAK and RBREAK

Example

proc report data=mysas.projects nowd;

column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;

define region/'Regional Office' group;

define pol_type/'Pollutant' group width=10;

define num/n 'Number of Jobs' width=8;

define avg/mean 'Mean Total Cost' format=dollar12.2;

define med/median 'Median Total Cost' format=dollar12.2;

define std/std 'Std. Deviation of Total Cost' format=dollar12.2;

rbreak after / summarize dol;

run;

BREAK and RBREAK

Summary line added at bottom with double over-line separator.

BREAK and RBREAK

The break statement adds summaries by group.

Syntax: break location variable </ option(s)>;

Location: before or after

Options (in addition to the previous ones):

skip: skips a line between groups

page: starts a new page between groups

suppress: suppresses printing of the break variable values at the end of each group

BREAK and RBREAK

Example:

proc report data=mysas.projects nowd;

column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;

define region/'Regional Office' group;

define pol_type/'Pollutant' group width=10;

define num/n 'Number of Jobs' width=8;

define avg/mean 'Mean Total Cost' format=dollar12.2;

define med/median 'Median Total Cost' format=dollar12.2;

define std/std 'Std. Deviation of Total Cost' format=dollar12.2;

break after region/summarize ol skip;

rbreak after / summarize dol;

run;

BREAK and RBREAK

A summary line is added at the bottom of each region, with an overline separator and a blank line before the next group.

By default the summary line repeats the region value; the suppress option removes the break variable's value from the summary lines.
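A sketch of the same report with suppress added to the break statement is shown below. It assumes the mysas libref is assigned, as in the earlier examples; everything except the suppress keyword is taken from the preceding example.

```sas
/* Sketch: SUPPRESS keeps the region value off each group summary line. */
/* Assumes the mysas libref is already assigned, as in the slides.      */
proc report data=mysas.projects nowd;
  column Region Pol_type JobTotal=num JobTotal=avg JobTotal=med JobTotal=std;
  define region   / 'Regional Office' group;
  define pol_type / 'Pollutant' group width=10;
  define num / n 'Number of Jobs' width=8;
  define avg / mean 'Mean Total Cost' format=dollar12.2;
  define med / median 'Median Total Cost' format=dollar12.2;
  define std / std 'Std. Deviation of Total Cost' format=dollar12.2;
  break after region / summarize ol skip suppress;
  rbreak after / summarize dol;
run;
```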

Other Options

In addition to the nowd option in the PROC REPORT line, you may wish to specify:

headline: underlines the set of column headings

headskip: skips a line after the headings before beginning the report

box: draws borders for rows and columns of the table
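As a brief sketch, these options go on the PROC REPORT line next to nowd. The example below reuses the simple report from earlier and assumes the mysas libref is assigned. Note that these options, like the overline and underline options, apply to the traditional listing output and have no visible effect in other ODS destinations such as HTML.

```sas
/* Sketch: HEADLINE and HEADSKIP added alongside NOWD.              */
/* Assumes the mysas libref is already assigned, as in the slides.  */
proc report data=mysas.projects nowd headline headskip;
  column Region Pol_type equipmnt personel JobTotal;
  define region   / 'Regional Office';
  define pol_type / 'Pollutant' width=10;
  define equipmnt / 'Equipment Cost' format=dollar10.;
  define personel / 'Personnel Cost' format=dollar10.;
  define jobtotal / 'Total Cost' format=dollar10.;
run;
```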

Exercise 1

Using the delay data set, create:

Exercise 2

Using the birthweight data and the following:

Normal gestation is 259 days or more; less is premature (missing values are coded as 999)

Smoking status is coded as:

0, Never smoked

1, Currently smoke

2, Stopped smoking at pregnancy

3, Stopped before current pregnancy

9, Unknown

Create:

Exercise 3

Use the fish data set to create the report that follows.

The standards for mercury levels are:

less than 0.5 ppm is acceptable

more than 1.0 ppm is toxic

anything in between is considered dangerous and requires action
