Page 1: Economics 206 A Brief Introduction to SAS for Windowsecon.korea.ac.kr/~kimecon/econ206/sasintro.pdf · 2010-09-30 · Economics 206 A Brief Introduction to SAS for Windows Beomsoo

Economics 206 A Brief Introduction to SAS for Windows

Beomsoo Kim

SAS for Windows is a powerful statistical program with the flexibility to estimate many models used in applied econometric work. Although SAS is not as fast as other mathematical packages, it is the most widely used statistical software package. The principal advantage of SAS is that one can easily create, alter, and manipulate large data sets and estimate complicated models all within the same environment.

This handout gives a brief introduction to the SAS system. I illustrate the structure of the SAS programming language and show how to write and submit basic programs. To assist you in your programming, I recommend purchasing Delwiche and Slaughter’s The Little SAS Book, which is a nice introduction to the SAS system. For each technique we will use, I will provide a sample program and all necessary documentation. In the problem sets, I will also indicate which sections of The Little SAS Book are most useful. Questions concerning the proper syntax of a procedure can sometimes be answered by the online help.

I can answer your questions concerning SAS, and in most cases I can detect mistakes quickly. Feel free to email me your questions at [email protected]. When you send me questions, please include a copy of the program as well as the SAS log. I cannot answer your questions until I see the program and the error messages in the SAS log.

The Basics of SAS

Once you’ve called up the SAS program, you see a screen like the one displayed below. The screen displays two or three different boxes, depending on your machine. For each SAS job you run, there are three components: the program editor, the SAS log, and the program output. The box marked “PROGRAM EDITOR” is the window where programs are written, edited, and executed. The box marked “LOG” is a printout of the program statements submitted plus messages generated by the system. These messages include such items as how long a procedure took, how many observations were read into the data set, and all error messages. The SAS LOG is your friend. It lists programming errors and sometimes suggests ways to fix the code. As you begin to use SAS, you will spend a great deal of time looking through error messages in the SAS LOG. The final box is called “OUTPUT”, and this is where results from your programs are written. Since there is no output yet, the output box is not displayed.

You can move between the boxes by clicking on the “WINDOWS” text in the tool bar at the top of the page and then clicking on the box (program editor/log/output) you want. Alternatively, you can use the function keys: F7 moves to OUTPUT, F6 moves to the LOG, and F5 moves to the PROGRAM EDITOR. To review the hot keys, hit F9.

Once in the appropriate box, you can expand the view to full screen by clicking on the “up” arrow icon in the upper right-hand corner of the box. All output and logs remain in the boxes until they are cleared by the user. Output can also be saved to a disk. The easiest way to handle these tasks is to click the right mouse button and use the pop-up menus. Please do not save programs or output to the hard disks.

There are two types of SAS steps that can be performed: DATA steps and PROCedures. DATA steps allow the user to read raw files from another medium into a SAS data set, manipulate the data (for example, creating or transforming variables, creating dummy variables, deleting observations, or merging different data sets), or write a SAS data set out to an ASCII file. PROCedures use SAS data sets and perform the appropriate statistical tests. PROCedures vary and can be as simple as generating frequencies or as complicated as nonlinear three-stage least squares with cross-equation restrictions.

Note that a number of lines of code in the sample program (Appendix C) begin with an asterisk (*) and end with a semicolon (;). These are comments and not actual programming statements. They provide a road map of what the program is trying to accomplish. It is good programming practice to use comments in your work.

The SAS Data Step

In order to perform statistical operations in SAS, raw data must first be read into a SAS data file. A SAS data file is essentially a group of variables all stored under the same name. Most SAS PROCedures operate by first accessing a data set, then choosing particular variables from within the data set.

The first block of executable statements in the program is a SAS DATA step. Every time a DATA statement is invoked, a new SAS data file is created. SAS data files can be created by reading raw data into a SAS file, by editing an existing SAS file, or by merging two or more SAS files.

SAS executes a block of DATA statements together. The group of statements begins with the DATA command and ends with the RUN statement, and all statements in between are executed as one step. Statements are terminated by a semicolon (;), not by the end of a line. One SAS statement may take 10 lines of code, but SAS will keep reading each line as part of the same statement until it finds the next semicolon.
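The points about DATA, RUN, and semicolons can be seen in a very small program. The following is only a sketch: the data set name DEMO, the variables X, Y, and Z, and the two data rows are invented for illustration and are not part of the course data.

```sas
* a comment starts with an asterisk and ends with a semicolon;
data demo;
    * one input statement written across two lines;
    input x
          y;
    z = x + y;
    datalines;
1 2
3 4
;
run;

proc print data=demo;
run;
```

Because SAS reads up to the next semicolon, the two-line INPUT statement is treated as a single statement. The printed data set should contain two observations, with Z equal to 3 and 7.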

A Reading raw data into a SAS data file

In the first DATA step, the SAS data file named ONE is created by reading in data from an ASCII data set. Data set names must start with a letter or an underscore and, in older releases of SAS, can be no longer than eight characters (Version 7 and later allow up to 32). This new SAS data set is a temporary file that will be erased once you end your SAS session. The second line tells the computer where the raw data is located. In this instance, the data is in the ASCII file s:\econ\econ321\marcps98.asc.

Just a brief word about the data set. It contains demographic and labor supply data for 844 Maryland residents, aged 19-64, from the March 1998 Current Population Survey. The CPS is a nationally representative monthly survey of about 50,000 households, conducted by the U.S. Census Bureau for the Bureau of Labor Statistics. The survey is the primary source of state and national labor market information such as the unemployment rate. In the March survey, all adults 15 and above are also asked detailed questions about labor market experiences in the previous year, so the 1998 survey contains data on work in 1997. This data set is a much smaller version of the original March CPS.

Each row in the raw data file contains information on 9 variables. The names and definitions of the 9 variables are listed in order in Appendix A, and Appendix B lists the first 20 lines of the file. The raw data file is in matrix form, with 844 rows, one for each person in the data set. The variables in the data set are “packed” to conserve space, which means that there are no blanks between variables. To input packed data into a SAS data file, you must a) list the variables you want read into the data set and b) tell the computer which columns to look in for each variable. This is accomplished with the INPUT statement.

The INPUT statement contains the names of all 9 variables and the column positions of each. In older releases of SAS, variable names must be eight characters or shorter (Version 7 and later allow up to 32). They must begin with a letter or an underscore and can contain letters, digits, and underscores. Notice that the INPUT statement covers 10 lines. The computer continues to read code until it finds a semicolon (;). I put each variable on a different line so it is easier to see where each variable is located in the data set.

The column numbers for each variable are given in Appendix A and illustrated in Appendix B. Since AGE is the first variable and it takes up 2 spaces, it occupies the first 2 columns of each row. The next 5 variables take up 1 character each. WEEKS worked and HOURS per week can take up to 2 characters each, and up to 6 characters are reserved for the annual EARNINGS variable. Consequently, we read data from columns 1-2 for AGE, 3 for SEX, 4 for RACE, .... 8-9 for WEEKS, 10-11 for HOURS, and 12-17 for EARNINGS.

When data is packed, not all columns may be used for each person. For example, the final 6 spaces have been reserved for the annual earnings variable. If someone has no labor earnings, then only column 17 is used and it contains a zero (0). If a person makes $25,000 per year, only the last 5 columns are used. However, if a person makes between $100,000 and $999,999 in income, all of the final 6 columns are used.

Looking at Appendix B, we can map the variable definitions for each column to describe the characteristics of the respondents. The first person in the data set is a 58-year-old female (F) who is black (B), non-Hispanic (2), has a college degree (6), is married (1), and worked 0 weeks and 0 hours per week in 1997, generating no labor income. The second person is a 33-year-old male (M) who is white (W), non-Hispanic (2), college educated (6), never married (5), and who worked 50 weeks per year and 40 hours per week, generating $21,000 in labor income in 1997.

Notice in the variable list from Appendix A that 7 variables are numeric, but two (RACE and SEX) contain characters. SAS assumes that it should read the data as numeric variables unless it is told otherwise. To read a character variable, we add a dollar sign ($) after the variable name and before the column identifiers. This lets SAS know that the variables SEX and RACE will be stored as characters rather than numeric values.
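As a self-contained illustration of column input, the sketch below reads three records with the same layout directly from the program instead of from an external file. The data set name DEMO is hypothetical, and the three records are the first three lines of Appendix B, written with blanks padded so that each field sits in the columns stated above.

```sas
* hypothetical demo data set using the column layout described above;
data demo;
    input age       1-2
          sex     $ 3
          race    $ 4
          hispanic  5
          educ      6
          marital   7
          weeks     8-9
          hours     10-11
          earnings  12-17;
    datalines;
58FB261 0 0     0
33MW2655040 21000
46MB2715240 28000
;
run;

proc print data=demo;
run;
```

The data lines must start in column 1, since the column numbers in the INPUT statement are counted from the left edge of each record.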

After the variables are loaded into a data set, the variables are LABELed. Labels provide a short description of the variable. In older releases of SAS, labels can be no longer than 40 characters (later releases allow up to 256). It is good programming practice to label all your variables. Labels make it much easier to return to a project in the future. The syntax for labels is as follows: LABEL variable='variable label'; where the label text must be enclosed in apostrophes and the statement must end in a semicolon. Note again that every SAS statement must end with a semicolon, and each group of DATA or PROCedure statements should end with a “RUN;” command.
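A short sketch of labeling, assuming a hypothetical two-variable data set named DEMO. A single LABEL statement may carry several variable='label' pairs, and PROC CONTENTS then shows the labels next to the variable names.

```sas
* hypothetical demo data set with two labeled variables;
data demo;
    input age 1-2 sex $ 3;
    label age='age in years'
          sex='M=male, F=female';
    datalines;
58F
33M
;
run;

proc contents data=demo;
run;
```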

B Constructing a new data set from an existing SAS data file

The next series of data statements demonstrates how to create a new SAS file from an existing SAS file. You will want to construct a new file when you want to add new variables by transforming existing variables. In these next lines of code, I create the SAS file named TWO by SETting the original file ONE. This second DATA step does not erase data set ONE. Instead, both the old (ONE) and the new (TWO) data sets can be accessed later in the program by PROCedures or other data steps.

In this series of data steps, I construct new variables by transforming existing ones. The simplest type of variable transformation uses a mathematical operator. For example, you can construct the total number of hours worked last year by multiplying HOURS and WEEKS together.

One of the more useful types of variables we will use in class is the “dummy variable.” These variables are simple indicators that equal 1 or 0. For example, you can use them to describe whether a respondent is a male, a worker, a student, etc. The variables are generated in SAS through the use of a “logical operator,” which constructs a new variable that equals 1 when the expression is true and 0 otherwise. For example, suppose the data set already contains the variable X1. The statement X2=X1>0; constructs a new variable X2 that equals 1 if the expression (X1>0) is true and equals 0 otherwise. Logical expressions can have multiple parts. For example, the statement X3=((X1>0) or (Y1>0)); constructs the variable X3, which equals 1 if X1>0 or Y1>0. The sample program constructs five different dummy variables in the lines:

hsgrad=educ>=4;
worked97=weeks>0;
male=sex='M';
nonwhite=race~='W';
fulltime=((weeks>=40) and (hours>=30));

The first line creates a new variable HSGRAD that equals 1 if the person has at least a high school degree (EDUC>=4). The second constructs a variable WORKED97 that equals 1 if a person worked at some point in the previous year (WEEKS>0). The third constructs a variable MALE that equals 1 if the variable SEX equals M. In the next statement, the variable NONWHITE equals 1 if the race variable does not equal (the “~=” expression) the character W. Note from these last two variables that when the logical operator examines character values, the value must be enclosed in apostrophes. Finally, the construction of the variable FULLTIME illustrates how one can make a dummy variable with compound statements. In this case, fulltime workers are defined as people who work 40 or more weeks AND 30 or more hours per week.

Another handy type of data statement is the “IF/THEN” clause. If/then statements are a series of logical expressions that are used to recode variables. For example, suppose you wanted to group respondents into broad age ranges. The next series of statements constructs a discrete variable named AGEGROUP that equals 1 when respondents are 19-29, 2 when they are 30-39, 3 when 40-49, and 4 when 50 and above.

Note that the new SAS data file TWO contains the new variables (HOURS_YR, HSGRAD, WORKED97, MALE, NONWHITE, FULLTIME, and AGEGROUP) plus the 9 original variables in the data set ONE.
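The dummy-variable and if/then logic can be tried on a few made-up records. This is a minimal sketch: the data sets DEMO and DEMO2 and the three rows are hypothetical, and the age cutoffs simply mirror the AGEGROUP definition described above.

```sas
* three hypothetical respondents;
data demo;
    input age weeks hours sex $;
    datalines;
25 52 40 M
45 20 35 F
61  0  0 F
;
run;

data demo2;
    set demo;
    * a comparison returns 1 when true and 0 when false;
    male     = (sex='M');
    worked   = (weeks>0);
    fulltime = ((weeks>=40) and (hours>=30));
    * recode age into four groups with if-then-else;
    if age<=29 then agegroup=1;
    else if age<=39 then agegroup=2;
    else if age<=49 then agegroup=3;
    else agegroup=4;
run;

proc print data=demo2;
run;
```

One caution when adapting this to other data: SAS treats a missing numeric value as smaller than any number, so a record with a missing AGE would satisfy AGE<=29 and land in group 1, and a missing WEEKS would give WORKED=0. If missing values are possible, they need to be handled explicitly before recoding.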

V Generating Simple Statistics

Once we’ve created a SAS file, we can use the data to generate simple statistics. In this section, I describe some of these simple procedures. For discrete variables with a small number of groups (like SEX, RACE, EDUC and AGEGROUP), you may want to examine the frequency of responses in the sample. This is easily done with a PROCedure called FREQ. The syntax for the procedure is:

proc freq data=name;
title 'output title';
tables variable names;
run;

Note that in any procedure, you must tell SAS which data set you want to use. The list of variables you want frequencies for is given in the TABLES line. Before the TABLES command, you may want to add titles to the printed output that describe the procedure and the data. In the sample program, I include 2 titles for this PROC FREQ. Titles are contained within apostrophes and must end in a semicolon.

For continuous variables or numeric variables with a large number of categories, you may want to describe the data with PROC MEANS. The syntax for PROC MEANS is as follows:

proc means data=name;
title 'output title';
var variable names;
run;

In the sample program, I ask for descriptive information on 7 variables. By default, MEANS prints 5 pieces of information for each variable: the number of nonmissing observations (N), the mean, the standard deviation, and the minimum and maximum values.
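The two procedures can be tried on a small made-up data set. In this sketch, the data set DEMO and its four rows are hypothetical. The second step also lists statistic keywords on the PROC MEANS statement, which is one way to request statistics other than the defaults (here the median is added).

```sas
* four hypothetical workers;
data demo;
    input sex $ agegroup earnings;
    datalines;
F 1 14000
M 2 21000
M 3 28000
F 3 40000
;
run;

proc freq data=demo;
    title 'frequencies, hypothetical demo data';
    tables sex agegroup;
run;

proc means data=demo n mean median min max;
    title 'selected statistics for earnings';
    var earnings;
run;
```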

VI Restricting the Observations Used in Your Analysis

The data sets ONE and TWO contain information on all Maryland residents, aged 19-64, who responded to the survey. One may be interested in calculating statistics on only a subset of respondents, such as males, high school graduates, or workers. In the next set of statements, we construct a new data set called THREE that contains data only for people who worked at some point in the previous year. This is done with the logical “IF” statement. When the logical statement is true, observations are kept, and when the statement is false, the observations are dropped. We construct the new data set THREE by SETting data set TWO. By using the “IF WEEKS>0” statement, we keep only those observations where people had positive weeks worked in the previous year.

In this data step, I also illustrate the use of some other mathematical operators. Having restricted the sample to workers, we may want to calculate average weekly or hourly earnings.
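A minimal sketch of the subsetting IF, using three hypothetical records in a data set named DEMO. The variable names EARN_WKE and WAGE_HR follow the sample program; everything else is invented for illustration.

```sas
* three hypothetical respondents, one of whom did not work;
data demo;
    input weeks hours earnings;
    datalines;
50 40 21000
 0  0     0
 4 40  4000
;
run;

data workers;
    set demo;
    * keep only people with positive weeks worked;
    if weeks>0;
    * weekly and hourly earnings for the remaining observations;
    earn_wke = earnings/weeks;
    wage_hr  = earn_wke/hours;
run;

proc print data=workers;
run;
```

The data set WORKERS should contain two observations, with weekly earnings of 420 and 1000 and hourly wages of 10.5 and 25. Because the subsetting IF comes before the divisions, the statements after it are not executed for the nonworker, so the step never divides by zero weeks.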

VII More Detailed Descriptive Statistics

Next, using PROC MEANS, I obtain descriptive statistics for the new variables. There are 729 workers in this sample. The average worker works 47.4 weeks per year and 41.0 hours per week, makes $33,962.91 per year in wages and salary, earns $708.23 per week, and makes $16.78 per hour.

A researcher may be interested in looking not only at sample means, but also at means of variables for certain groups of observations. For example, we may want to know the average annual earnings of males and of females. Means by subgroup can be generated by inserting a “CLASS” statement into the program, where the variables listed in CLASS describe the categories for which we want means. The syntax for this type of procedure is listed below.

proc means data=name;
title 'output title';
class discrete variable name;
var variable names;
run;

In the sample program, I construct sample means for three variables for both males and females. Looking at the output, we see that females earn an average of $28,056.71 per year, much lower than the $39,917.92 per year that males earn.

Next, I construct sample means of annual earnings by age group. Notice in the output that average annual earnings rise steadily with age. Those aged 19-29 (AGEGROUP=1) earn on average $20,198.64, rising to $32,646.23 for those 30-39 (AGEGROUP=2), to $38,857.84 for those 40-49 (AGEGROUP=3), and to $45,575.55 for those 50 and above (AGEGROUP=4).

Finally, a researcher may want to know more than the mean of a particular variable. A more complete description of the data can be obtained with the PROC UNIVARIATE command. The syntax for this procedure is identical to that of PROC MEANS:

proc univariate data=name;
title 'output title';
var variable names;
run;

In this case, although the average person makes $33,962.91, the median (50th percentile) value is only $28,000.

Finally, if at any time you want to know what variables are in your data set, run a PROC CONTENTS:

proc contents data=name;
run;

CONTENTS lists the number of observations, plus all variable names and labels.
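To see the CLASS statement at work outside the course data, the sketch below computes mean earnings by sex for four made-up workers and then lists the contents of the data set. The data set name DEMO and the rows are hypothetical.

```sas
* four hypothetical workers;
data demo;
    input sex $ earnings;
    datalines;
F 14000
M 21000
M 28000
F 40000
;
run;

proc means data=demo;
    title 'mean earnings by sex, hypothetical demo data';
    class sex;
    var earnings;
run;

proc contents data=demo;
run;
```

With these four rows, the reported means should be 27000 for F and 24500 for M.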

VIII Submitting a SAS Job

Now that you have some idea of what the SAS program will try to do, it is time to execute the program. To submit a program written in the program editor, make sure the cursor is in the program editor, then click on the little running man in the box on the tool bar. As the program executes, a description of the program will appear in the LOG box. When the job has finished executing, you can view the LOG or OUTPUT generated by the program. Hit F7 to move to the OUTPUT box and hit F6 to move to the LOG. I have included a copy of the output in Appendix D.

If your program “bombs,” look at the SAS LOG (F6) to detect any errors. Once you’ve located the error, you can correct the program by first recalling, then editing the text. First, hit F5 to go to the PROGRAM EDITOR. The function key F4 recalls the previous program. You can recall any program run during the current SAS session by hitting F4 as many times as needed. If you have run 10 programs, you can hit F4 10 times to recall the first program.

IX Ending Your SAS Session

To end a SAS session, double-click on the SAS trademark box in the upper left-hand corner of the SAS window. You will be asked if you want to exit SAS; click “YES.” To log off the network, click on the START button in the lower left-hand corner of the screen, click on Shut Down, and choose to log off the network. Make sure you exit the network! If you do not exit the network, someone can use your print card.

Appendix A Variable definitions for marcps98.asc

Appendix B: First 20 Lines of the ASCII File marcps98.asc

58FB261 0 0     0
33MW2655040 21000
46MB2715240 28000
40FW241 0 0     0
20MW255 440  4000
46FW2735260 40000
51MW2745280330659
56MW2615260 37000
52FW251 0 0     0
38FW2715220 14000
41MW2715260 90000
32FW2415240 27000
32MW2715245 53000
29FW2555235 14000
26FW2555240 13000
25MW2555230 10920
25MW2652440 24000
32FW2755220 19000
30FW2755220 19000
39MA2715250 90000

Variable Placement in Raw Data Set

Appendix C: marcps98.sas

* this program reads in data from;
* the ascii file marcps98.asc;
* and puts it into sas format;
* the information is taken from;
* maryland residents, aged 19-64, from;
* the 1998 march current population survey;

options ls=132 ps=500;

* read the raw data into a sas file and label variables;
data one;
    infile '??\marcps98.asc';
    input
        age 1-2
        sex $ 3
        race $ 4
        hispanic 5
        educ 6
        marital 7
        weeks 8-9
        hours 10-11
        earnings 12-17;

    *label variables;
    label age='age in years';
    label sex='M=male, F=female';
    label race='W=white,B=black,I=Amer Indian,A=Asian';
    label hispanic='1=hispanic origin, 2=non hispanic';
    label educ='educational attainment';
    label marital='1=married,2=widow,3=div,4=sep,5=nev mar';
    label weeks='weeks worked in 1997';
    label hours='hours worked per week in 1997';
    label earnings='wage and salary earnings in 1997';
run;

* construct a new sas data set named two;
* from the original data one;
data two;
    set one;

    * illustrate some mathematical operations;
    * basic algebraic functions;
    hours_yr=hours*weeks;
    label hours_yr='hours worked last year';

    * illustrate use of logical operators;
    hsgrad=educ>=4;
    worked97=weeks>0;
    male=sex='M';
    nonwhite=race~='W';
    fulltime=((weeks>=40) and (hours>=30));
    label hsgrad='=1 if a hs graduate, 0 otherwise';
    label male='=1 if male, =0 otherwise';
    label worked97='=1 if worked in 1997, =0 otherwise';
    label fulltime='=1 if worked fulltime in 1997';
    label nonwhite='=1 if nonwhite, 0 otherwise';

    * illustrate if-then statements;
    if age<=29 then agegroup=1;
    else if 30<=age<=39 then agegroup=2;
    else if 40<=age<=49 then agegroup=3;
    else agegroup=4;
    label agegroup='ages,1=(<30),2=(30-39),3=(40-49),4=(50+)';
run;

* get frequencies of important variables;
proc freq data=two;
    title1 'frequencies of discrete variables';
    title2 'maryland adults, 19-64, march cps';
    tables sex race marital educ agegroup;
run;

* get means of important variables;
proc means data=two;
    title1 'means of some variables';
    title2 'maryland adults, 19-64, march cps';
    var age male hsgrad worked97 fulltime nonwhite hours_yr;
run;

* reduce the sample to only include workers;
* construct some new variables;
data three;
    set two;
    if weeks>0;

    * illustrate mathematical operations;
    earn_wke=earnings/weeks;
    wage_hr=earn_wke/hours;
run;

* get means of some labor market outcomes;
proc means data=three;
    title1 'means of labor market variables';
    title2 'maryland adults, 19-64, march cps';
    title3 'working subsample';
    var earnings earn_wke wage_hr hours weeks;
run;

* get means of earnings by sex;
proc means data=three;
    title1 'means of labor market variables by sex';
    title2 'maryland adults, 19-64, march cps';
    title3 'working subsample';
    class sex;
    var earnings wage_hr hours;
run;

* get means of earnings by age group;
proc means data=three;
    title1 'means of labor market variables by age group';
    title2 'maryland adults, 19-64, march cps';
    title3 'working subsample';
    class agegroup;
    var earnings;
run;

* get entire distribution of earnings among workers;
proc univariate data=three;
    var earnings;
run;

* get contents of data set;
proc contents data=three;
run;

Appendix D: marcps98.out

frequencies of discrete variables
maryland adults, 19-64, march cps

The FREQ Procedure

M=male, F=female

                                 Cumulative   Cumulative
sex        Frequency   Percent    Frequency      Percent
--------------------------------------------------------
F                448     53.08          448        53.08
M                396     46.92          844       100.00

W=white,B=black,I=Amer Indian,A=Asian

                                 Cumulative   Cumulative
race       Frequency   Percent    Frequency      Percent
--------------------------------------------------------
A                 44      5.21           44         5.21
B                201     23.82          245        29.03
I                  4      0.47          249        29.50
W                595     70.50          844       100.00

1=married,2=widow,3=div,4=sep,5=nev mar

                                 Cumulative   Cumulative
marital    Frequency   Percent    Frequency      Percent
--------------------------------------------------------
1                486     57.58          486        57.58
2                 20      2.37          506        59.95
3                 70      8.29          576        68.25
4                 35      4.15          611        72.39
5                233     27.61          844       100.00

educational attainment

                                 Cumulative   Cumulative
educ       Frequency   Percent    Frequency      Percent
--------------------------------------------------------
1                 26      3.08           26         3.08
2                 62      7.35           88        10.43
3                 11      1.30           99        11.73
4                267     31.64          366        43.36
5                208     24.64          574        68.01
6                171     20.26          745        88.27
7                 99     11.73          844       100.00

ages,1=(<30),2=(30-39),3=(40-49),4=(50+)

                                 Cumulative   Cumulative
agegroup   Frequency   Percent    Frequency      Percent
--------------------------------------------------------
1                201     23.82          201        23.82
2                231     27.37          432        51.18
3                212     25.12          644        76.30
4                200     23.70          844       100.00

means of some variables                                        16:06 Thursday, March 27, 2008   2
maryland adults, 19-64, march cps

The MEANS Procedure

Variable  Label                                  N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------------------------------------------------------
age       age in years                         844    39.7488152    11.9890593    19.0000000    64.0000000
male      =1 if male, =0 otherwise             844     0.4691943     0.4993460             0     1.0000000
hsgrad    =1 if a hs graduate, 0 otherwise     844     0.8827014     0.3219665             0     1.0000000
worked97  =1 if worked in 1997, =0 otherwise   844     0.8637441     0.3432635             0     1.0000000
fulltime  =1 if worked fulltime in 1997        844     0.6990521     0.4589421             0     1.0000000
nonwhite  =1 if nonwhite, 0 otherwise          844     0.2950237     0.4563238             0     1.0000000
hours_yr  hours worked last year               844       1704.01   949.1269222             0       4160.00
----------------------------------------------------------------------------------------------------------

means of labor market variables                                16:06 Thursday, March 27, 2008   3
maryland adults, 19-64, march cps
working subsample

The MEANS Procedure

Variable  Label                                  N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------------------------------------------------------
earnings  wage and salary earnings in 1997     729      33962.91      32794.83    20.0000000     330659.00
earn_wke                                       729   708.2339670   837.7281899     7.5000000      16504.75
wage_hr                                        729    16.7788008    17.3449756     0.1875000   330.0950000
hours     hours worked per week in 1997        729    41.0082305    11.2090494     4.0000000    99.0000000
weeks     weeks worked in 1997                 729    47.4019204    11.0849143     1.0000000    52.0000000
----------------------------------------------------------------------------------------------------------

means of labor market variables by sex                         16:06 Thursday, March 27, 2008   4
maryland adults, 19-64, march cps
working subsample

The MEANS Procedure

M=male,      N
F=female   Obs  Variable  Label                                N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------------------------------------------------------
F          366  earnings  wage and salary earnings in 1997   366      28056.71      24757.37    20.0000000     306468.00
                wage_hr                                      366    14.8779490    12.8456093     0.1875000   150.0000000
                hours     hours worked per week in 1997      366    38.3715847    11.4859610     4.0000000    99.0000000

M          363  earnings  wage and salary earnings in 1997   363      39917.92      38393.36   700.0000000     330659.00
                wage_hr                                      363    18.6953621    20.7679312     1.3333333   330.0950000
                hours     hours worked per week in 1997      363    43.6666667    10.2731569    10.0000000    80.0000000
------------------------------------------------------------------------------------------------------------------------

means of labor market variables by age group                   16:06 Thursday, March 27, 2008   5
maryland adults, 19-64, march cps
working subsample

The MEANS Procedure

Analysis Variable : earnings wage and salary earnings in 1997

ages,1=(<30),
2=(30-39),3=(
40-49),           N
4=(50+)         Obs      N          Mean       Std Dev       Minimum       Maximum
-----------------------------------------------------------------------------------
1               177    177      20198.64      27444.04   420.0000000     330095.00
2               205    205      32646.23      19812.42   300.0000000      95000.00
3               197    197      38857.84      31813.99       2400.00     330659.00
4               150    150      45575.55      45841.48    20.0000000     330659.00
-----------------------------------------------------------------------------------

means of labor market variables by age group                   16:06 Thursday, March 27, 2008   6
maryland adults, 19-64, march cps
working subsample

The UNIVARIATE Procedure
Variable: earnings (wage and salary earnings in 1997)

Moments

N                        729    Sum Weights               729
Mean              33962.9122    Sum Observations     24758963
Std Deviation     32794.8316    Variance           1075500982
Skewness          4.46789575    Kurtosis           33.6209832
Uncorrected SS    1.62385E12    Corrected SS       7.82965E11
Coeff Variation   96.5607173    Std Error Mean     1214.62339

Basic Statistical Measures

    Location                    Variability
Mean     33962.91    Std Deviation               32795
Median   28000.00    Variance               1075500982
Mode     15000.00    Range                      330639
                     Interquartile Range         29000

Tests for Location: Mu0=0

Test           -Statistic-     -----p Value------
Student's t    t   27.96168    Pr > |t|    <.0001
Sign           M      364.5    Pr >= |M|   <.0001
Signed Rank    S   133042.5    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile      Estimate
100% Max        330659
99%             140000
95%              82000
90%              65000
75% Q3           44000
50% Median       28000
25% Q1           15000
10%               6500
5%                3120
1%                1000
0% Min              20

Extreme Observations

----Lowest----    -----Highest----
Value      Obs     Value       Obs
   20       22    181345       331
   56      499    306468       190
  300      595    330095       556
  420      551    330659         5
  700      312    330659        29