Data Step Manipulations
• New variables should be created during a Data step
• Existing variables should be manipulated during a data step
Missing Values in SAS
• SAS uses a period (.) to represent missing values in a SAS data set
• Different SAS procedures and functions treat missing values differently - always be careful when your SAS data set contains missing values
Working With Numeric Variables
• SAS uses the standard arithmetic operators: +, -, *, /, ** (exponentiation)
Note on Missing Values: Arithmetic operators propagate missing values.
• SAS has many built-in numeric functions
round(variable, value): Rounds variable to the nearest unit given by value.
sum(variable1, variable2, …): Adds any number of variables and ignores missing values
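The difference between the + operator and the sum function shows up as soon as one value is missing. A minimal sketch, assuming a made-up data set named demo with variables a and b (these names are illustrative and not from the slides):

```sas
* Hypothetical data where b is missing on the second record;
data demo;
   input a b;
   total_op  = a + b;        * missing whenever a or b is missing;
   total_fn  = sum(a, b);    * ignores missing arguments;
   a_rounded = round(a, 10); * rounds a to the nearest 10;
   datalines;
12 5
27 .
;
run;

proc print data=demo;
run;
```

On the second record, total_op is missing while total_fn equals 27, because sum skips the missing b.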
Acting on Selected Observations
• Working with selected observations - subsets of a SAS data set - is easy in SAS
• First, you must decide on a selection process. What is the distinguishing characteristic of the observations you want to work with?
Selecting Observations: IF-THEN Statements
• The IF-THEN statement is the most common way to select observations. Format:
IF condition THEN action;
• condition is one or more comparisons. For any observation, condition is either true or false. If condition is true, SAS performs the action.
IF-THEN Statement: Example
• Suppose INC is a variable representing annual household income and you want to create a dummy variable, DUM, that takes the value 1 when income is less than $10,000 and 0 otherwise.
IF INC<10000 THEN DUM=1;
IF INC >=10000 THEN DUM=0;
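One caution with this kind of comparison: SAS treats a missing numeric value as smaller than any number, so INC<10000 is true when INC is missing and DUM would be set to 1. The sketch below is one possible way to guard against that, assuming a made-up data set named households with illustrative values; it is not the only way to handle missing income.

```sas
* Hypothetical data with one missing income;
data households;
   input inc;
   if inc = . then dum = .;           * leave the dummy missing;
   else if inc < 10000 then dum = 1;  * low-income household;
   else dum = 0;
   datalines;
8500
.
42000
;
run;
```

Using ELSE also means SAS stops checking conditions once one of them is true for a record.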
Using OBS in condition
• In a SAS data set, each record has an observation number, which appears in printed output in the column labeled OBS
• OBS itself is not a variable you can use in a condition; in a Data step, refer to the observation number using the automatic variable _n_
• Example: set INC equal to zero for the first 10 observations
IF _n_ <= 10 THEN INC=0;
Comparison Operators
• There are 6 comparison operators
• Can use either the symbol or mnemonic
Symbol Mnemonic Meaning
= EQ Equal to
^= NE Not equal to
> GT Greater than
< LT Less than
>= GE Greater than or equal to
<= LE Less than or equal to
Multiple Comparisons
• Can make more than one comparison in condition by using AND/OR
• AND / &: All parts must be true for condition to be true
• OR / |: At least one part must be true for condition to be true
• Be careful when combining AND and OR: SAS evaluates AND before OR
• Can use parentheses in condition to make the intended order explicit
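To see why parentheses matter, the sketch below applies the same three comparisons with and without them. The data set and the variables AGE and KIDS are hypothetical and used only for illustration.

```sas
* Hypothetical data with income, age, and number of children;
data demo;
   input inc age kids;
   * Without parentheses, AND is evaluated before OR;
   flag1 = 0;
   if inc < 10000 and age >= 65 or kids > 0 then flag1 = 1;
   * Parentheses force the OR to be evaluated first;
   flag2 = 0;
   if inc < 10000 and (age >= 65 or kids > 0) then flag2 = 1;
   datalines;
50000 40 2
8000 70 0
;
run;
```

For the first record, flag1 is 1 (because kids > 0 alone satisfies the condition) while flag2 is 0 (because income is not below 10,000). Both flags are 1 for the second record.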
Selecting Observations for New SAS Data Sets
• Can use IF-THEN statements to create new SAS data sets
• Either delete or keep selected observations based on condition
Deleting Observations
• Format for IF-THEN:
IF condition THEN DELETE;
• Example: Removing missing observations. Suppose the variable INC is missing for some households and you want to drop these observations
IF INC=. THEN DELETE;
Keeping Selected Observations
• A more straightforward way to create new SAS data sets is to keep only those observations that meet some condition. Format:
IF condition;
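As a rough illustration of keeping selected observations, the sketch below builds a small made-up data set and then keeps only the records with nonmissing income. The data set names households and complete are assumptions for the example.

```sas
* Hypothetical data with one missing income;
data households;
   input inc;
   datalines;
8500
.
42000
;
run;

* Keep only records where INC is not missing;
data complete;
   set households;
   if inc ne .;
run;
```

Here the statement IF INC NE .; keeps the same two records that IF INC=. THEN DELETE; would leave behind.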
Example
• The file salary.dat contains data for 93 employees of a Chicago bank. The file contains the following variables:
Y: Salary
X: Years of education
E: Months of previous work experience
T: Number of months after 1/1/69 that the individual was hired
• First 61 observations are females, last 32 males
Example: Create Dummy for Males
*Program to create dummy variables and;
*new SAS data sets ;
data salary;
infile 's:\mysas\salary.dat';
input y x e t;
IF _n_ > 61 THEN G=1;
IF _n_ <= 61 THEN G=0;
run;
Example: Create Data Set for Males
*Make a new SAS data set composed of only;
*records for males ;
data males; *New SAS data set;
set salary; *Created from salary;
IF G=1;
run;
Example: Create Data Set for Females
*Make a new SAS data set composed of only;
*records for females ;
data females; *New SAS data set;
set salary; *Created from salary;
IF G=0;
run;
Describing Data: Sample Statistics
• Format:
PROC UNIVARIATE <option-list>;
VAR variable-list;
BY variable-list;
FREQ variable;
WEIGHT variable;
Selected Options
DATA=SAS-data-set    Specify data set. If omitted, uses most recent SAS data set
FREQ                 Generate frequency table
NOPRINT              Suppress printed output
VAR Statement
• List of variables to calculate sample statistics for.
• If no variables are specified, sample statistics are generated for all numeric variables
WEIGHT Statement
• Specifies a numeric variable in the SAS data set whose values are used to weight each observation
BY Statement
• Can be used to obtain separate analyses on observations in groups defined by some value of a variable.
• Example: Suppose SEX=1 if individual is male, SEX=0 if individual is female; EARN=annual earnings.
PROC UNIVARIATE; *Generates statistics;
VAR EARN; *on earnings for men;
BY SEX; *and women;
RUN;
BY Statements and Sorting
• Before using a BY statement, the SAS data set must be sorted on the variable specified
• SAS puts the observations in order, based on the values of the variables specified in the BY statement.
• Use PROC SORT
PROC SORT
• FORMAT:
PROC SORT <options>;
BY <options> variables;
• Sort Order: ascending. For descending, put DESCENDING on the BY line before the variable name
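Putting the sort and the BY-group analysis together might look like the sketch below. The data set earnings and its values are made up for illustration; SEX and EARN follow the earlier example.

```sas
* Hypothetical earnings data, deliberately not sorted by SEX;
data earnings;
   input sex earn;
   datalines;
1 32000
0 28000
1 41000
0 35000
;
run;

* Sort first so the BY statement below is valid;
proc sort data=earnings;
   by sex;
run;

* Separate sample statistics for women and for men;
proc univariate data=earnings;
   var earn;
   by sex;
run;
```

To sort from highest to lowest instead, the BY line in PROC SORT would read BY DESCENDING SEX; and the BY statement in the later procedure would need the same DESCENDING keyword.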
Describing Data: Frequencies
• FORMAT:
PROC FREQ <options>;
BY variables;
TABLES requests </ options>;
WEIGHT variable;
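A minimal sketch of a frequency request, assuming a made-up data set named people with two 0/1 variables, SEX and MARRIED (both the data and the variable MARRIED are hypothetical):

```sas
* Hypothetical data with two categorical variables;
data people;
   input sex married;
   datalines;
1 1
0 1
1 0
0 0
1 1
;
run;

proc freq data=people;
   tables sex;                         * one-way frequency table;
   tables sex*married / norow nocol;   * two-way table without row and column percents;
run;
```

The options after the slash in a TABLES statement apply only to that request.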