freeway 408 3684 3459 surface Martin Luther King Jr. Blvd. 1590 1234

10
• Here’s another problem (see section 2.13 on page 54). A file contains two different types of records (say A’s and B’s) and we only want to read in the A’s... there are a number of ways to do this, but an easy way is given on p. 54 using the so-called “trailing @” : DATA XXXX; INPUT TYPE $ @; IF TYPE=“B” THEN DELETE; INPUT other variables for the A record; See the program on the next page...

description

Here’s another problem (see section 2.13 on page 54). A file contains two different types of records (say A’s and B’s) and we only want to read in the A’s... there are a number of ways to do this, but an easy way is given on p. 54 using the so-called “trailing @” : DATA XXXX; INPUT TYPE $ @; - PowerPoint PPT Presentation

Transcript of freeway 408 3684 3459 surface Martin Luther King Jr. Blvd. 1590 1234

Page 1: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• Here’s another problem (see section 2.13 on page 54). A file contains two different types of records (say A’s and B’s) and we only want to read in the A’s... there are a number of ways to do this, but an easy way is given on p. 54 using the so-called “trailing @” :

DATA XXXX; INPUT TYPE $ @; IF TYPE=“B” THEN DELETE;INPUT other variables for the A record;

See the program on the next page...

Page 2: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

freeway 408 3684 3459surface Martin Luther King Jr. Blvd. 1590 1234surface Broadway 1259 1290surface Rodeo Dr. 1890 2067freeway 608 4583 3860freeway 808 2386 2518surface Lake Shore Dr. 1590 1234surface Pennsylvania Ave. 1259 1290

* Use a trailing @ to delete surface streets;DATA freeways; INFILE 'c:\MyRawData\Traffic.dat'; INPUT Type $ @; IF Type = 'surface' THEN DELETE; INPUT Name $ 9-38 AMTraffic PMTraffic;PROC PRINT DATA = freeways; TITLE 'Traffic for Freeways';RUN;

Page 3: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• Note there are two INPUT statements; the first one reads the value of TYPE and the IF … THEN statement performs the test; if the value of TYPE is “surface”, then the second INPUT statement never executes; instead SAS returns to the beginning of the DATA step to process the next observation.

• Look at the log to check this out…• See the last paragraph on page 55:

– the trailing @ holds the data line for subsequent INPUT statements but releases the line when SAS returns to the top of the DATA step

– the @@ holds for subsequent INPUT statements even when SAS starts building a new observation

Page 4: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• There are other ways of controlling the way we input data when using the INFILE statement (these are options with INFILE)– FIRSTOBS= This tells SAS the line on which to begin reading from the

file. I use this a lot when I’ve placed a “counter” on line #1 to count columns- then use FIRSTOBS=2

– OBS= This option tells SAS the line on which to end the reading from the file. Most often used to limit the number of observations read into the SAS dataset; e.g., if you have a huge dataset and you want to test your code on a small portion of the data. Can also be used with FIRSTOBS= to choose a subset of data between the value of FIRSTOBS and the value of OBS.

– MISSOVER This option is used to help deal with missing data that happen to occur at the end of the dataline; i.e., don’t go to the next line if you run out of data. Go over the example on p. 57 to see how it works.

– TRUNCOVER This option tells SAS to read data until it gets to the end of the line or the last column specified in the column input range, whichever comes first; i.e., it’s used when you’re using column input and some data lines are shorter than others. See example, p. 57

Page 5: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

*Put this data in c:\MyRawData\Scores.dat;

*NOTE that Nguyen and Robbins have incomplete data...;

Nguyen 89 76 91 82

Ramos 67 72 80 76 86

Robbins 76 65 79

SAS program - try it both with and without the MISSOVER statement to see what will happen in each case…

DATA class102;

INFILE 'c:\MyRawData\Scores.dat' MISSOVER;

INPUT Name $ Test1 Test2 Test3 Test4 Test5;

RUN;

Page 6: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

*Put this data in c:\MyRawData\Address.dat;

*NOTE that the addresses are diffent lengths...;

John Garcia 114 Maple Ave.Sylvia Chung 1302 Washington DriveMartha Newton 45 S.E. 14th St.

SAS program - try it both with and without the TRUNCOVER statement to see what will happen in each case…

DATA homeaddress; INFILE 'c:\MyRawData\Address.dat' TRUNCOVER; INPUT Name $ 1-15 Number 16-19 Street $ 22-37;RUN;

Page 7: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• A file is delimited if it is a raw datafile that uses special characters to separate (or delimit) the individual values of the variables... Examples of delimiters are commas, tabs, etc.

• Use the INFILE statement with the DLM= option to handle this situation:– DLM=‘,’ will tell SAS that the file is comma-

delimited– DLM=‘09’x will tell SAS that the file uses the

tab character to delimit values of the variables. (09 is the hexadecimal representation of tab)

Page 8: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

Grace,3,1,5,2,6Martin,1,2,4,1,3Scott,9,10,4,8,6

Can be read with:

DATA reading; INFILE ‘C:\MYRAWDATA\BOOKS.DAT’ DLM=‘,’;INPUT NAME $ Week1-Week5;RUN; QUIT;

But what about missing values? 2 Commas in a row? Use the DSD option (Delimited Separated Data) in the INFILE statement as seen in the following:

INFILE ‘C:\MYRAWDATA\XXXX.DAT’ DLM=‘,’ DSD;

Page 9: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• DSD does 3 things:– ignores delimiters inside data values enclosed in quotes

– does not read quotes as part of the data– treats two delimiters in a row as a missing value

NOTE: DSD assumes the delimiter is a comma - If it’s not, be sure to use the DLM option along with DSD…

• CSV files are a type of file written by Excel and the DSD option will read them… CSV files use comma as delimiter and two commas in a row for missing value… Let’s try an example...

Page 10: freeway 408                           3684 3459 surface Martin Luther King Jr. Blvd.  1590 1234

• Look at the file co2.csv, consisting of the countries of the world with population at least 20 million. The values of the co2 variable are the number of metric tons of co2 emissions per person in each of the corresponding countries.

• Open it with Notepad to see it in text form

• Two ways to proceed:

– first use the INFILE statement with the appropriate options (DLM=‘,’ DSD) to read the CSV version of the Excel file. If you have an EXCEL .xls file, you may save it as .csv.

– a second possibility is to use the IMPORT option under the File menu to read the Excel file directly into SAS. Let’s try this method next…

• HW:Make sure you’ve completely read 2.13-2.16 and worked through the examples (skip 2.17-2.18)