SPSS MR Utils

SPSS MR Utilities User’s Guide QUT000223DU


SPSS MR Utilities

User’s Guide

QUT000223DU


COPYRIGHT 2000 BY SPSS LTD

All rights reserved as an unpublished work, and the existence of this notice shall not be construed as an admission or presumption that publication has occurred. No part of the materials may be used, reproduced, disclosed or transmitted to others in any form or by any means except under license by SPSS Ltd. or its authorized distributors.

SPSS Ltd
Maygrove House
67 Maygrove Road
LONDON NW6 2EG
England

Please address any comments or queries about this manual to the Support Department at the above address, or via e-mail to: [email protected]

All trademarks acknowledged.


Contents

About this Guide
Typographical conventions

1 Reading non-Quantum data files
1.1 Which program to use
1.2 Using rcolbin
1.3 Using mtr
    Data format to read
    Input and output files
    Record length and block size
    Files with one record per block
    Reading only a given number of records
    Byte swapping
1.4 Using mtread
1.5 Restrictions
1.6 How to read a mystery data file

2 Converting Quantum data to foreign formats
2.1 Which program to use
2.2 Using wcolbin
2.3 Using mtw
    Data format to write
    Input and output files
    Record length and block size
    Writing only a given number of records
2.4 Using mtwrite
2.5 Restrictions

3 Checking for corrupt Quantum data files

4 Editing Quantum data
4.1 Using ded
4.2 Record-editing commands
4.3 Card-editing commands
4.4 Restrictions
4.5 Diagnostics

5 Replacing text with sequential numeric values
5.1 Preparing the text file
5.2 Using mc

6 Printing selected fields from a file
6.1 Which columns and fields to print
6.2 Text and column separators in the output
6.3 Dealing with blank or short records
6.4 Line numbers
6.5 Restrictions

7 Sorting files

8 ANSI carriage control sequences in files
8.1 Adding ANSI control sequences
8.2 Removing ANSI control sequences


About this Guide

The SPSS MR Utilities User’s Guide describes a set of useful programs that are distributed to users of Quantum and Quancept. These programs are designed to overcome some of the problems and restrictions that come with the standard MS-DOS and Unix tools.

Unless noted otherwise, all the programs documented in this manual work on MS-DOS and on the Unix operating systems on which Quantum and Quancept are available.

Typographical conventions

The following typographical conventions have been used in this manual:

Bold text is used in syntax statements to show words that you must type exactly as they are shown.

Italic text is used in syntax statements to show words where you must substitute information of your own. For example, the word filename indicates that the program requires a filename and that you should enter the name of a file in place of the filename parameter.

Italic text is also used in the main body of the text to refer to variable parameters from the command line and also to show MS-DOS or Unix commands.

Fixed width type is used to show examples.

In statements of syntax, [square brackets] are used to show optional parameters.


1 Reading non-Quantum data files

Data for Quantum runs does not always start off in Quantum format. Many market research data files are created in 360 or 1130 column binary and need converting if they are to be used with Quantum. The programs that do this are rcolbin, mtr and mtread, and the foreign formats they know about are:

• ASCII text

• EBCDIC

• 360/370 column binary

• 1130 column binary

• Quantum internal (binary) format. (This uses the 12 lower-order bits (0FFF in hexadecimal) of a 16-bit word to represent the codes &–0123456789 in that order; that is, &=0800 and 9=0001.)
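To make the bit layout of Quantum internal format concrete, here is a minimal Python sketch that decodes and encodes a single 16-bit word. It assumes only what the bullet above states: bit 0800 (hex) is &, bit 0001 is 9, and the codes in between follow the order &–0123456789. The function names are hypothetical and are not part of the SPSS MR utilities.

```python
# Codes in bit order, per the description above: & is bit 0800 (hex),
# - is 0400, 0 is 0200, and so on down to 9 at bit 0001.
CODES = "&-0123456789"


def decode_word(word: int) -> str:
    """Return the codes present in the 12 low-order bits of a 16-bit word."""
    word &= 0x0FFF  # the top 4 bits carry no codes
    return "".join(code for i, code in enumerate(CODES) if word & (0x800 >> i))


def encode_codes(codes: str) -> int:
    """Build a 16-bit word with one bit set for each code in `codes`."""
    word = 0
    for code in codes:
        word |= 0x800 >> CODES.index(code)
    return word


print(hex(encode_codes("&1")))  # 0x900
print(decode_word(0x0300))      # 01
```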

Normally, you will be reading foreign data files directly from a tape, but the instructions in this document apply to any input device as long as the records are of a fixed length.

✎ SPSS MR has no utilities for reading variable length records. If you receive a tape or file in this format, you should ask the person who created the tape or file to create another version in fixed-record format.

1.1 Which program to use

mtr is the main data conversion program. rcolbin and mtread are shell programs/batch files that use mtr in specific ways. They exist to make it easier for beginners or nontechnical users to convert data into Quantum data files.

rcolbin converts a complete 360/370 column binary file into Quantum format. Each record is 160 characters long and records are read in blocks of 1,600 characters. If your data is in this format, use rcolbin. If you want to do anything else, or the data you want to convert is not 80-column card images, use either mtr or mtread.

mtread is simply an interactive interface to mtr. Instead of assuming that you will provide all the information about the foreign data format on the command line, mtread prompts you for it as it is running.

The main advantage that mtr has over mtread is that it allows you to convert only part of a data file, whereas mtread always converts the whole file. mtr can also swap bytes before conversion, which mtread cannot do.


1.2 Using rcolbin

rcolbin reads a 360/370 column binary data file with a record length of 160 characters and a block size of 1,600 characters. To use it, type:

rcolbin foreign_file quantum_file

where foreign_file is the name of the 360/370 column binary data file you wish to convert and quantum_file is the name of the Quantum data file you wish to create.

Normally, you will be reading the data file directly from a tape, so foreign_file will be the device name of the tape drive you are using.

Here are some examples. The first one creates a file called qtdata by converting the file on a tape in a SCSI tape drive called /dev/rst1:

rcolbin /dev/rst1 qtdata

The next example reads a 360/370 column binary file called cbdata from the current directory and creates from it a Quantum data file called qtdata:

rcolbin cbdata qtdata

✎ If you want to convert a 360/370 column binary file that has a different record length and/or block size, or you want to convert data from a different format, use mtr or mtread. If you want to convert a certain number of records only, use mtr, since mtread always converts the whole file.

1.3 Using mtr

The full syntax of the mtr command is:

mtr format –rrecord_length –bblock_size –iinput –ooutput [other_options]

The order of parameters in the command line is unimportant.

Data format to read

The format parameter defines the type of data you are trying to read and may be one of:

–asc ASCII text

–ebc EBCDIC using the IBM translation

–col or –360 360 or 370 column binary

–1130 1130 column binary

–qin Quantum internal (binary) format

If you omit the format option from the command line, mtr prompts for each data type in turn, asking whether the tape is of the particular format in question. For example:

Is this an EBCDIC tape?

Is this an ASCII tape?

Type y and press ENTER at the appropriate prompt.

If you answer n to all prompts, mtr displays the message ‘Don’t know how to read this tape’ and stops.

Input and output files

You define the name of the foreign file you wish to read with the option –ifilename. Most times, you will be reading data from a tape, so the input filename will be the name of the tape device you are using. For example, if you are using a SCSI tape drive called /dev/rst0, you would enter this on the command line as:

mtr -360 -i/dev/rst0

The next command assumes that the data is being read from a ½-inch magnetic tape (/dev/rmt1):

mtr -360 -i/dev/rmt1

The next command names an input file rather than an input device, so mtr will search for a file called efile in the current directory:

mtr -1130 -iefile

If you forget to enter the name of the input file or device, mtr does not prompt you for it. Instead, it waits for you to type in the data; to cancel mtr and re-enter the command with a filename, press CTRL+D.

If a tape contains more than one data file and you want to read any file after the first, you need to tell mtr to skip over the preceding files on the tape. To do this, add the option –fnumber to the command line, where number is the number of files to skip. For example, if you want to read the third file on the tape, your command could be:

mtr -c -f2 -i/dev/rst0


You enter the name of the Quantum data file you wish to create in a similar way, using the option –ofilename. For example:

mtr -360 -i/dev/rst0 -oqtdata

This example uses a simple filename as the name of the Quantum file, so mtr will create the file in your current directory. If you want to create the file in a different directory, you may enter a full pathname here instead. In both cases, mtr overwrites the file if it already exists.

If you omit the output filename from the command, mtr displays the data on the screen as it reads it.

Record length and block size

The –r and –b parameters tell mtr about the overall structure of the data on the tape. The record length is the number of characters in each record of the data file. You define it as –rnn. If the incoming data is in a binary format, the record length must be an even number because each column of data is held as two characters (bytes). A common length for column binary records is 160 characters (this is left over from the days when market research data was held on IBM punched cards, which had 80 columns per card), but this is not a requirement.

The block size is the number of characters in each block. This must be a multiple of the record length, for example, 1600 if the record length is 160 characters. If you are reading data from a raw character device such as a tape drive, the block size must be equal to or greater than the size of the largest block to be read. You define the block size as –bnn.
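The relationship between record length and block size can be sketched in Python. The split_block helper below is hypothetical (it is not how mtr is implemented); it only checks the rules stated above, an even record length for binary formats and a block size that is a multiple of the record length, and then cuts one block into records.

```python
def split_block(block: bytes, record_length: int, block_size: int,
                binary: bool = True) -> list[bytes]:
    """Split one block of a fixed-length file into its records."""
    if binary and record_length % 2:
        raise ValueError("binary formats need an even record length")
    if block_size % record_length:
        raise ValueError("block size must be a multiple of the record length")
    if len(block) > block_size:
        raise ValueError("block is larger than the stated block size")
    return [block[i:i + record_length]
            for i in range(0, len(block), record_length)]


# rcolbin's layout: 160-character records in 1,600-character blocks,
# so each full block holds 10 records (80 columns x 2 bytes each).
records = split_block(bytes(1600), record_length=160, block_size=1600)
print(len(records), len(records[0]))  # 10 160
```

A short final block simply yields a short final record here; the Restrictions section describes how mtr itself reports that case.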

For example, to convert a data file that is in 360 column binary format with a record length of 160 characters and a block size of 1,600 characters, you would type:

mtr -360 -r160 -b1600 -i/dev/rmt1 -oqtdata

To convert a file in 1130 column binary format that has a record length of 160 characters and a block size of 3,200 characters, you would type:

mtr -1130 -r160 -b3200 -iefile -oqtdata

Files with one record per block

In some files, each record is written as a separate block, so the record length and block size are the same. In this case, you can either enter the same value with both –r and –b, or you can enter just the record or block size and use an additional option, –v, instead. For example:

mtr -360 -r160 -v -iefile -oqtdata

reads a 360/370 column binary data file in which the record length and block size are both 160characters.


It is the same as typing either:

mtr -360 -b160 -v -iefile -oqtdata

or

mtr -360 -r160 -b160 -iefile -oqtdata

If you omit the record length from your command but have defined the block size, mtr asks whether the incoming data has variable length records. If records are written one per block, type y and the record length is set to the same as the block size. The same is true, in reverse, if you define a record length without a block size.

✎ Using the –v option or pressing y to the question about variable length records does not mean that the incoming data file contains records of different lengths. mtr expects all records in a file to be the same length.

Reading only a given number of records

You do not have to read or convert a whole file if you do not want to. To read a given number of records from a file, add the option –mnn to the command line, where nn is the number of records you wish to read. For example:

mtr -360 -r160 -b1600 -m50 -i/dev/rst1 -oqtdata

to read and convert 50 records from SCSI tape drive 1. The command assumes that records have a record length of 160 characters, a block size of 1,600 characters and are in 360/370 column binary format. The Quantum data will be written to a file called qtdata in the current directory.

Byte swapping

Some computers hold the two bytes (characters) that make up a column in the opposite order to that used by the majority of computers. This does not affect text formats such as ASCII and EBCDIC, but with binary data the Quantum data will be incorrect if the bytes are not swapped as they are read in. To have mtr swap bytes before it converts to Quantum format, add the option –s to the command line.
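A minimal Python sketch of what byte swapping means for binary data: each column is a two-byte word, and swapping reverses the order in which the two bytes are combined. Treating the first byte as the high-order byte by default is an assumption made for illustration; this section does not say which byte order mtr treats as normal, and read_words is a hypothetical helper.

```python
def read_words(data: bytes, swap: bool = False) -> list[int]:
    """Combine each pair of bytes into one 16-bit column value.

    By default the first byte of each pair is treated as the high-order
    byte (an assumption); swap=True reverses the order within each pair.
    """
    if len(data) % 2:
        raise ValueError("binary data must contain whole two-byte columns")
    order = "little" if swap else "big"
    return [int.from_bytes(data[i:i + 2], order)
            for i in range(0, len(data), 2)]


raw = bytes([0x08, 0x00])
print(hex(read_words(raw)[0]))             # 0x800
print(hex(read_words(raw, swap=True)[0]))  # 0x8
```

Using the bit order given for Quantum internal format at the start of this chapter, 0800 is the & code, whereas 0008 is the 6 code, so reading a column with the wrong byte order silently produces a different code.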


1.4 Using mtread

mtread is an interactive version of mtr which prompts you for the record length and block size of the incoming data file. Running mtread is exactly the same as running mtr with only the –i and –o parameters on the command line. To use it, type:

mtread foreign_file quantum_file

✎ Do not use mtread if you want to convert a few records only or if the data needs to be byte swapped before conversion.

1.5 Restrictions

• Invalid binary data is corrected and converted without warning.

• If mtr reaches the end of the file midway through a record, the partial record is ignored and cannot be converted. A message to this effect is issued:

Bad block size: Block 58 expected 160 got 100 assuming 80

1.6 How to read a mystery data file

Sometimes, you will be confronted with a data file whose format is a mystery to you. Often data files are transferred from machine to machine in such a way as to lose any clues as to which type of data they contain. You may have used mtr with several different options and have created output that is definitely not a data file. Here are some clues to help you cope.

You should have on your machine a program that lets you look at a file on the screen. The standard MS-DOS program is called type, but this does not provide much flexibility. Other programs that you may have on a PC are browse and more. Under Unix, the program you want is more.

You would also benefit from using a program that can show files in hexadecimal notation as well as in regular decimal. Hexadecimal (or hex) notation lets you write the value of any character in two digits rather than the three that decimal needs. An 8-bit character set such as extended ASCII has 256 characters, and this is exactly 16 times 16. Hexadecimal notation is arithmetic in base 16: the numerals in this notation are 0 to 9 and A to F. So, 17 in decimal is 11 in hex (that is, 1 times 16, plus 1) and 255 in decimal is FF in hex. Note that each of the 256 characters (0 to 255 in decimal, 0 to FF in hex) takes up only one byte. You can look at files in hex in MS-DOS by using debug, or any number of other utilities. In Unix, the utility is od.
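The decimal and hexadecimal values quoted above are easy to check with Python's built-in formatting, and a few lines give a rough hex listing similar in spirit to what od or debug shows. The hex_dump function is a hypothetical illustration with decimal offsets (od prints octal offsets by default), not a replacement for either tool.

```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Return listing lines: a decimal offset, then each byte in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:07d} " + " ".join(f"{b:02X}" for b in chunk))
    return lines


print(format(17, "X"), format(255, "X"))  # 11 FF
print(int("FF", 16))                      # 255
for line in hex_dump(b"0001 ABC"):
    print(line)                           # 0000000 30 30 30 31 20 41 42 43
```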


It would also be useful to know how many respondents should be in the data file, and how manycards each respondent should have. Even if this information is approximate, it can give you a hintas to whether the data file is close to the correct size or not.

When looking in hex at a file which may be column binary, you may find that every 160 charactersor so you will see a repeated pattern, or a similar pattern. When you are dealing with an 80-columnrecord, the serial number will be repeated in approximately the same place on each record, every160 bytes. When looking at a file as a regular text file, you will see that most of the file consists ofblanks, interspersed with blocks, triangles and other graphics-style characters.

✎ Keep in mind that while much of column binary is record length 160 (two bytes per column),this is not a requirement and any even number may be used.

EBCDIC is different. EBCDIC in regular text mode looks mostly like line-drawing characters. Thereare few spaces. However, the best way to tell is to look at the file in hex. The hex codes mostly arenumbers of the form Fn, where n is the data code. For example, a code 1 would be a hex F1, a code 2would be a hex F2, and so on.
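Python ships with an EBCDIC codec (cp037, the US/Canada code page), which makes the Fn pattern easy to see. cp037 may not match mtr's ‘IBM translation’ for every character, although the digits F0 to F9 are the same across the common EBCDIC code pages. The ebcdic_digit_share helper is hypothetical; a high share of Fn bytes is only a hint that a file is EBCDIC, not proof.

```python
def ebcdic_digit_share(data: bytes) -> float:
    """Fraction of bytes that are EBCDIC digits (hex F0 to F9)."""
    if not data:
        return 0.0
    return sum(0xF0 <= b <= 0xF9 for b in data) / len(data)


sample = "0001 1234 5".encode("cp037")
print(sample.hex(" ").upper())               # F0 F0 F0 F1 40 F1 F2 F3 F4 40 F5
print(sample.decode("cp037"))                # 0001 1234 5
print(round(ebcdic_digit_share(sample), 2))  # 0.82
```

Note that the EBCDIC space is hex 40, which an ASCII viewer shows as @.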

Most market research data files you will be converting are in the 80-column format, no matterwhich type of record you are converting from. So, think 80, 160 and the like when trying todiscover the record length.

When you are converting, and are not sure of the record length, do not use the –m option to read only a few records. With this option, mtr does not tell you if there is anything wrong with the conversion factors you used. It just reports how many records it found and returns you to the system prompt. If you let mtr convert the whole file, it will issue an error message if it found extra characters left over at the end of the operation. For example:

Bad block size: Block 69 expected 1600 got 1437 assuming 1280

This means that the last record was not completely filled before the end of the file was reached. This is the pointer that suggests that your mtr command was not correct for the data file you are converting.
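The figures in these messages fit a simple rule: the ‘assuming’ value is the number of bytes actually received, rounded down to a whole number of records. That rule is inferred from the two example messages in this chapter, not taken from mtr's own documentation, and the record lengths used below (160 for the message above, 80 for the one under Restrictions) are assumptions that make the numbers work. Treat the hypothetical usable_bytes helper as an aid to reading the message only.

```python
def usable_bytes(received: int, record_length: int) -> tuple[int, int]:
    """Split a short final block into whole-record bytes and leftover bytes."""
    whole = received - received % record_length
    return whole, received - whole


# "Block 69 expected 1600 got 1437 assuming 1280", if records are 160 long:
print(usable_bytes(1437, 160))  # (1280, 157)
# "Block 58 expected 160 got 100 assuming 80", if records are 80 long:
print(usable_bytes(100, 80))    # (80, 20)
```

A large leftover, such as 157 bytes here, is the kind of sign the text describes: the record length or block size in the command probably does not match the file.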


Once you have tried to convert the file and have received a message similar to the one shown above, look at the output file using, say, type in MS-DOS or more in Unix. If you have the type of file correct, but the block size, record length, or both, is incorrect, you will see a file where you have numbers but they will seem to march diagonally across the screen, displaced out of their correct spots. For example:

00019595695969696797979879695949392692393459697695954493932932939339393933933

00

02409824098724t0989873245097309874098247124098712340987123409871240983408 00

03

244098723409874t3450987345098723098723409871234098712340987123409871234 0004

309873087959569596969679392692393459697695954493932932939339393933933 000594

0486467638362526474746253939393939 ....

Notice how the serial numbers 0001, 0002, 0003, 0004 and 0005 (shown in bold) seem to move diagonally across the page. The record length used to convert this file was 160 when it should have been 164.
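The diagonal drift happens because each record is shifted by the difference between the true and the assumed record length (here 4 bytes). The same idea can be turned round to estimate the record length from the raw file: a byte pattern that repeats every n bytes suggests records of n bytes. The sketch below is a hypothetical helper, not an SPSS MR utility; the candidate range and the 0.9 match threshold are assumptions, and it is demonstrated on synthetic data rather than a real tape.

```python
def guess_record_length(data: bytes, candidates=range(2, 402, 2),
                        threshold: float = 0.9):
    """Return the smallest candidate length at which the data repeats.

    For each candidate n, count how often byte i equals byte i + n.
    Multiples of the true length also score well, so the first
    candidate that clears the threshold is returned (or None).
    """
    for length in candidates:
        pairs = len(data) - length
        if pairs <= 0:
            break
        matches = sum(data[i] == data[i + length] for i in range(pairs))
        if matches / pairs >= threshold:
            return length
    return None


# Synthetic file: 30 records, each a 4-digit serial plus the same 160 bytes.
body = bytes((i * 37 + 11) % 256 for i in range(160))
data = b"".join(f"{n:04d}".encode("ascii") + body for n in range(1, 31))
print(guess_record_length(data))  # 164
```

Only even candidates are tried because, as noted earlier, binary record lengths must be even; for a text format such as EBCDIC you would pass range(1, 401) instead.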

When you have converted a multicoded file into Quantum format, you may see a five-sided box symbol or, if you look at the file with vi, the characters ^?, followed by a list of letters, numbers and symbols. This special symbol is the character with decimal code 127 (hex 7F), which separates the end of the data from the multicodes in each record. In the body of the record, each multicode is shown as an asterisk. If a record contains no multicodes, you will not see this special symbol in that record.

☞ For a complete description of Quantum data format, see Appendix C of the Quantum User’s Guide Volume 4.


2 Converting Quantum data to foreign formats

From time to time, you may need to give your client a copy of the data file on tape. Not every computer can read data in Quantum format, and you may need to convert the data as you create the tape. The programs that do this are wcolbin, mtw and mtwrite, and the foreign formats they know about are:

• ASCII text

• EBCDIC

• 360/370 column binary

• 1130 column binary

• Quantum internal (binary) format. (This uses the 12 lower-order bits (0FFF in hexadecimal) of a 16-bit word to represent the codes &–0123456789 in that order; that is, &=0800 and 9=0001.)

2.1 Which program to use

mtw is the main data conversion program. wcolbin and mtwrite are shell programs/batch files that use mtw. They exist to make it easier for beginners or nontechnical users to convert Quantum data files.

wcolbin converts a complete Quantum data file to 360/370 column binary. Each record is 160 characters long and records are written in blocks of 1,600 characters. If this is what you want to do, use wcolbin. If you want to do anything else, or the data you want to convert is not 80-column card images, use either mtw or mtwrite.

mtwrite is simply an interactive interface to mtw. Instead of assuming that you will provide all the information about the foreign data format on the command line, mtwrite prompts you for it as it is running.

The one advantage that mtw has over mtwrite is that it allows you to convert only part of a data file, whereas mtwrite always converts the whole file.


2.2 Using wcolbin

wcolbin creates a 360/370 column binary data file with a record length of 160 characters and a block size of 1,600 characters. To use it, type:

wcolbin quantum_file foreign_file

where quantum_file is the name of the Quantum data file and foreign_file is the name of the 360/370 column binary data file you wish to create.

Normally, you will be writing the data file directly to a tape, so foreign_file will be the device name of the tape drive you are using.

Here are some examples. The first writes to a tape in a SCSI tape drive called /dev/rst1:

wcolbin qtdata /dev/rst1

The next example creates a 360/370 column binary file called cbdata in the current directory:

wcolbin qtdata cbdata

✎ If you want to create a 360/370 column binary file with a different record length and/or block size, or you want to create data in a different format, use mtw or mtwrite. If you want to create a certain number of records only, use mtw, since mtwrite always converts the whole file.

2.3 Using mtw

The full syntax of the mtw command is:

mtw format –rrecord_length –bblock_size [–mmax_recs] –iinput –ooutput

The order of parameters in the command line is unimportant. Parameters in square brackets are optional.

Data format to write

The format parameter defines the type of data you are trying to write and may be one of:

–asc ASCII text

–ebc EBCDIC using the IBM translation

–col or –360 360 or 370 column binary

–1130 1130 column binary

–qin Quantum internal (binary) format

In conversions to ASCII format, any multicodes in the Quantum data that do not correspond to standard ASCII characters are written out as asterisks. For example, a multicode of ‘&1’ corresponds to the letter A and will be written out as such, whereas a multicode of ‘123’ has no ASCII equivalent and will therefore be written out as an asterisk.

In conversions to EBCDIC, the Quantum data is converted first to ASCII and then from ASCII into EBCDIC. The notes for conversions to ASCII therefore apply.

If you omit the format option from the command line, mtw prompts for each data type in turn, asking whether the conversion is of the particular format in question. For example:

Is this an EBCDIC conversion?

Is this an ASCII conversion?

Type y and press ENTER at the appropriate prompt.

If you answer n to all prompts, mtw displays the message ‘Don’t know how to do this conversion’ and stops.

Input and output files

You define the name of the Quantum data file you wish to convert with the option –ifilename. For example:

mtw -360 -iqdata

This example uses a simple filename as the name of the Quantum data file, so mtw will look for the file in your current directory. If you want to convert a file in a different directory, you may enter a full pathname here instead.

If you forget to enter the name of the input file, mtw does not prompt you for it. Instead, it waits for you to type in the data; to cancel mtw and re-enter the command with a filename, press CTRL+D.

You enter the name of the foreign data file you wish to create in a similar way, using the option –ofilename. Most times, you will be writing data to a tape, so the output filename will be the name of the tape device you are using. For example, if you are using a SCSI tape drive called /dev/rst0 and are converting the file called qtdata, you would enter this on the command line as:

mtw -360 -iqtdata -o/dev/rst0

The next command assumes that you are writing to a ½-inch magnetic tape (/dev/rmt1):

mtw -360 -iqtdata -o/dev/rmt1


The next command names an output file rather than an output device, so mtw will create a file called efile in the current directory:

mtw -1130 -iqtdata -oefile

If you omit the output filename or device name from the command, mtw displays the data on the screen as it converts it. You may find this useful if you want to use the converted data as the input to another program, since it means that you can pipe the converted data directly from mtw into the second program. There is no need to store an intermediate data file unless you wish to do so.

Record length and block size

The –r and –b parameters tell mtw how to write the data on the tape. The record length is the number of characters in each record of the foreign data file. You define it as –rnn. If you are writing data in a binary format, the record length must be an even number because each column of data is held as two characters (bytes). A common length for column binary records is 160 characters (this is left over from the days when market research data was held on IBM punched cards, which had 80 columns per card), but this is not a requirement.

The block size is the number of characters in each block. This must be a multiple of the record length, for example, 1600 if the record length is 160 characters. If you are writing data to a raw character device such as a tape drive, records are grouped into blocks of the given size before being written out. You define the block size as –bnn.
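Grouping records into blocks on output is the reverse of splitting blocks on input. The pack_blocks function below is a hypothetical sketch of that grouping; it deliberately does not pad the final block, because whether mtw pads a short last block is not stated here.

```python
def pack_blocks(records: list[bytes], record_length: int,
                block_size: int) -> list[bytes]:
    """Group fixed-length records into blocks of block_size bytes."""
    if block_size % record_length:
        raise ValueError("block size must be a multiple of the record length")
    if any(len(r) != record_length for r in records):
        raise ValueError("every record must be exactly record_length bytes")
    per_block = block_size // record_length
    return [b"".join(records[i:i + per_block])
            for i in range(0, len(records), per_block)]


# 25 records of 160 bytes in 1,600-byte blocks: two full blocks and a short one.
blocks = pack_blocks([bytes(160)] * 25, record_length=160, block_size=1600)
print([len(b) for b in blocks])  # [1600, 1600, 800]
```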

For example, to convert a Quantum data file into 360 column binary format with a record length of 160 characters and a block size of 1,600 characters, and write it to a ½-inch magnetic tape, you would type:

mtw -360 -r160 -b1600 -iqtdata -o/dev/rmt1

To create a file in 1130 column binary format that has a record length of 160 characters and a block size of 3,200 characters, you would type:

mtw -1130 -r160 -b3200 -iqtdata -oefile

If you omit either the record length or the block size from your command, mtw prompts you for them.


Writing only a given number of records

You do not have to convert or write out a whole file if you do not want to. To write out a given number of records, add the option –mnn, where nn is the number of records you wish to write. For example:

mtw -360 -r160 -b1600 -m50 -iqtdata -o/dev/rst1

to convert and write out 50 records to SCSI tape drive 1. Records will have a record length of 160 characters, a block size of 1,600 characters and will be in 360 column binary format.

✎ mtw does not skip to the end of the input file before stopping. If you are reading data from a tape, the tape stops in the middle of the Quantum data file and should be repositioned or rewound manually.

2.4 Using mtwrite

mtwrite converts a whole Quantum data file into a format of your choice. It prompts you for the record length and block size of the file you wish to create. Running mtwrite is exactly the same as running mtw with only the –i and –o parameters on the command line. To use it, type:

mtwrite quantum_file foreign_file

2.5 Restrictions

wcolbin, mtw and mtwrite have no facilities for positioning the tape before writing to it. If you want to write more than one file to a tape, you must either use a non-rewinding tape drive or reposition the tape at the end of the last file before writing the second file to the tape.


3 Checking for corrupt Quantum data files

Quantum data files hold information about multicodes in a special format. Each multicoded column whose codes cannot be shown as a letter or other symbol is shown as an asterisk. The codes that make up the multicodes are stored at the end of the line, separated from the end of the data with a special character. If you list a multicoded data file under Unix, you will see this character displayed as ^?, whereas under MS-DOS it appears as a small, five-sided box. Each multicode is represented by two characters, so you would expect to see an even number of characters after the multicode symbol. Data in which the number of character pairs at the end of the lines does not match the number of asterisks in the earlier part of the line is corrupt and will be rejected by Quantum.

☞ For further information on Quantum data formats, see Appendix C of the Quantum User’s Guide Volume 4.

Tab characters have no meaning in Quantum data files and will cause a run to fail. To check for tabs and corrupt multicodes, use the badata program.

To run badata, type:

badata [-v] [-x] [-o output_file] [input_files]

input_files is a list of one or more filenames separated by spaces. A hyphen instead of a filename tells badata to read from the standard input (that is, data you type on your keyboard) rather than from a file.

The -v option displays the program version number, and -x displays a summary of usage.

Errors are normally displayed on the screen, but you can use the -o option to redirect the output to a file. For example:

badata -o errors data1 data2

badata issues two types of error messages. If it finds a tab character, it reports:

Line number: Tab character in data

If it finds too few or too many characters after the multicode symbol, it reports:

Line number: corrupt record (x multi-punched columns, y character codes)
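The check that badata performs can be sketched as follows. This is a simplified Python illustration, not the badata source: it assumes that the special character shown as ^? under Unix is ASCII DEL (0x7F), treats every asterisk before that character as a multicoded column, and assumes that y in the message is the number of characters found after the marker. check_line and check_file are hypothetical names.

```python
from typing import List

# Assumption: the character displayed as ^? under Unix is ASCII DEL (0x7F).
MULTICODE_MARKER = "\x7f"


def check_line(line: str, line_no: int) -> List[str]:
    """Return badata-style messages for one line of Quantum data (simplified sketch)."""
    line = line.rstrip("\r\n")
    messages = []
    if "\t" in line:
        messages.append(f"{line_no}: Tab character in data")
    # partition() gives an empty codes string when the line has no marker.
    data, _, codes = line.partition(MULTICODE_MARKER)
    # Simplification: every asterisk before the marker counts as a multicoded column.
    multicoded = data.count("*")
    # Each multicode needs exactly two characters after the marker.
    if len(codes) != 2 * multicoded:
        messages.append(f"{line_no}: corrupt record ({multicoded} multi-punched "
                        f"columns, {len(codes)} character codes)")
    return messages


def check_file(path: str) -> List[str]:
    """Check every line of a data file; latin-1 keeps all byte values readable."""
    messages = []
    with open(path, encoding="latin-1") as data_file:
        for line_no, line in enumerate(data_file, start=1):
            messages.extend(check_line(line, line_no))
    return messages
```

A line such as 12*4* followed by the marker and four code characters passes, while the same line with three code characters is reported as corrupt.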


4 Editing Quantum data

ded is a data editor designed for handling Quantum data files. You can edit single-card and multicard data files on a record by record basis, where each record is treated as a separate line. With multicard records, you can also edit the individual cards that make up a record.

4.1 Using ded

To edit a Quantum data file, type:

ded filename [ser=s1,s2] [crd=c1[,c2]]

ser= defines the start and end columns of the serial number and crd= defines the start and end columns of the card type (if the card type is held in a single column, you may omit the end column). These parameters are optional with single-card records, but must be used with multicard records. For example, if your data has two cards per record, with the serial number in columns 1 to 5 and the card type in column 6, you would type:

ded qtdata ser=1,5 crd=6

This example separates the start and end columns of each field with commas, but ded accepts any character except a space, a digit or a tab character as a separator.

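Here is a minimal Python sketch of how arguments such as ser=1,5 and crd=6 can be interpreted, including the rule that any character other than a space, a digit or a tab may separate the start and end columns. parse_field and field_value are hypothetical helpers, not part of ded; column numbers are assumed to be 1-based and inclusive, as in Quantum.

```python
import re
from typing import Tuple

# A start column, then optionally any separator other than a space, digit or tab,
# followed by an end column.
FIELD_SPEC = re.compile(r"(\d+)(?:[^\d \t](\d+))?")


def parse_field(arg: str, name: str) -> Tuple[int, int]:
    """Parse 'ser=1,5' or 'crd=6' into a (start, end) column pair (hypothetical helper)."""
    prefix = name + "="
    if not arg.startswith(prefix):
        raise ValueError(f"expected an argument starting with {prefix}")
    match = FIELD_SPEC.fullmatch(arg[len(prefix):])
    if match is None:
        raise ValueError(f"bad column specification: {arg!r}")
    start = int(match.group(1))
    # A single column such as crd=6 is treated as start = end.
    end = int(match.group(2)) if match.group(2) else start
    return start, end


def field_value(card: str, columns: Tuple[int, int]) -> str:
    """Return the text in a 1-based, inclusive column range of a card."""
    start, end = columns
    return card[start - 1:end]


ser = parse_field("ser=1,5", "ser")
crd = parse_field("crd=6", "crd")
card = "001562" + "9438"  # serial number 00156, card type 2
print(field_value(card, ser), field_value(card, crd))  # 00156 2
```

ser=1:5 parses the same way as ser=1,5, whereas ser=1 5 is rejected because a space is not an allowed separator.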
Editing commands are divided into record-editing commands and card-editing commands. The prompt for record-editing commands is a colon, and for card-editing commands, it is c:.

4.2 Record-editing commands

When you are working in record-editing mode, ded prompts for commands with a colon. Record-editing commands are as follows:

prompt Switches the colon prompt on and off. pro is an abbreviation for prompt.

pb Switches brief-printing mode on. For each record displayed, ded prints the record serial number and the number of cards in the record. If you named the card type columns on the command line, ded tells you which cards these are. For example:

Record 156 has 5 cards, these are 1,2,2,3,4

This is the default printing mode if you name the serial number field on the command line.


pc Switches character-printing mode on. The contents of each record are displayed with multicodes shown as asterisks. For example:

0015616267*7575*16231204521321**

001562 9438 21232* &- *23

This is the default if you do not name the serial number field on the command line.

pp Switches punch (code) printing mode on. The punches in a multicoded column are displayed vertically in that column. Ranges of consecutive codes are shown using the notation start/end (for example, 1/5 for codes 1 to 5 inclusive). For example:

00156162671757541623120452132113

3 / 25

7 9&

p Prints the record in the current printing mode. In a multicard record, all cards are printed at once.

ruler Displays an 80-column ruler above each record printed in pp or pc mode. Entering this command when a ruler is already displayed removes the ruler.

eol Prints cards in a multicard record double-spaced. To switch off double-spacing, re-enter the eol (end of line) command.

n Prints the nth record in the file (for example, 156).

m,n Prints records m to n in the file (for example, 156,160).

+/-[n] Prints the next (+) or previous (-) record in the file. If you enter a number after the + or - sign, ded skips forward or back that number of records and prints the record at that position. For example, typing -5 at record 156 prints record 151. The printed record becomes the current record to which any changes will apply.

$ Goes to the last record in the file and prints it.

(sernum) Locates the record with the given serial number. If the serial number is shorter than the serial number field width, ded pads it on the left with zeros. If you type (156), for example, and the serial number field is five columns wide, ded searches for a record whose serial number is 00156. This command is only valid if you defined the serial number field on the command line.

g(sernum) Locates and displays all records with the given serial number. At the end of the search, the pointer is left at the end of the file. This is useful for looking at records with duplicate serial numbers with a view to deciding which one, if any, should be deleted.

1,$ Lists the whole file.

= Reports the number of records in the file.


/data/ Searches for a record containing the given data and displays that record.

s col=data Overwrites the contents of column/field col with the given data. As in Quantum, the column specification for multicard records must define both the card type and column numbers. Punches must be enclosed in single quotes and strings must be enclosed in dollar signs. Here are some examples:

s 15='&'

s 45,50=$123456$

s 252='156&'

A space is an alternative to the = sign for separating the column and code specifications.

a Appends a new record after the current record. Type the data on a new line. Each character you type goes into a new column unless you enclose a string of codes in single quotes. In this case, those codes are treated as a multicode in the current column. At the end of the data, press ENTER and then type a dot on a line by itself to terminate the record. For example:

0015718462'137&'123

generates a record with 14 columns of data; column 11 is multicoded.

i Inserts a new record before the current record. Rules for data entry are asdescribed for the append command.

d Deletes the current record. In a multicard record, all cards are deleted.

sort Sorts the data by serial number (you must have defined the serial number field on the command line).

gsort Sorts the data by card type within serial number (you must have defined the serial number and card type fields on the command line).

merge Merges adjacent records with identical serial numbers. Cards in the resultant records are not sorted, nor are duplicate card types merged into a single card.

When used with the insert or append commands, this is a useful facility for dealing with missing cards. For example, if card 5 is missing from record 156, you could enter the data for this card by appending it after record 156. If you then run the merge command, the new card 5 will be merged with the rest of the data for respondent 156.

cs sernum Changes the serial number to the given number. If the number you give is shorter than the serial number field, it will be padded on the left with zeros. To force blank padding, precede the number with a colon followed by the required number of blanks. For example:

cs : 156

e Switches into card-editing mode for the current record. Type q to revert to record-editing mode.

w Writes out the data file, saving any changes. To write out a range of records, type the first and last record numbers in the range at the start of the command. To write data out to a different file, type the new filename at the end of the command. Here are some variations of the w command:

1,100 w

w newdata

1,100 w newdata

q Leaves the data editor.

!command Executes the given MS-DOS/Unix command without terminating the editing session. When the command finishes, you are returned to the editor.

shell Starts a subshell in which you can run MS-DOS/Unix commands. When you close the subshell, you are returned to the editor.

4.3 Card-editing commands

To switch from record-editing mode into card-editing mode, type e. The prompt changes to c:. You may use any record-editing commands apart from pb, (sernum), merge and w in this mode to refer to individual cards in the current record rather than to the current record as a whole. For example, if you are looking at card 4 of record 156 and you type d, ded deletes card 4 while leaving the rest of record 156 intact.

Card-editing commands are as follows:

rc Displays the serial number of the current record. This is useful if the serial number is coded somewhere other than at the start of the card.

ct Lists the card type numbers present in the record.

cs sernum Changes the serial number using the same rules as described above. ded searches through the rest of the data for a record with the given serial number and, if one is found, moves the current card to the end of that record. If no such record is found, ded converts the card to a new record with the given serial number, but leaves it in its current position in the data file. In both cases, the original card is removed from the current record.

For example, suppose you are working on card 3 of record 156 and you type cs 200. If there is already a record with serial number 200, the data on the current card is copied into record 200 as a card 3 and the current card is deleted from record 156. However, if there is not already a record 200 in the file, ded copies the data from the current card into a new record with serial number 200, and places the new record immediately after record 156. Card 3 is then deleted from record 156.

q Returns to record-editing mode.

4.4 Restrictions

• ded does not recognize the Quantum notation –/& meaning all 12 punches.

• The dot notation used in the Unix ed, ex and vi editors for referring to the current line has not been implemented in ded.

• It is not possible to delete specific punches from a column, nor to add new punches into a column. Use s (set) commands to name the exact punches required in each column.

4.5 Diagnostics

Various error messages are displayed, mainly to do with buffer errors while reading or writing data. If the file is too large to handle, ded advises you to use the Unix split command to make it smaller.

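The sort, gsort and merge commands, together with zero-padding of serial numbers, can be modeled on a list of records in which each record is a list of card strings. This Python sketch is an interpretation of the descriptions in 4.2 rather than ded's actual implementation: the function names are hypothetical, the serial number is assumed to be read from the first card of each record, and serial numbers are compared as text (which orders zero-padded serials numerically).

```python
from itertools import groupby
from typing import List, Tuple

Record = List[str]          # one record = a list of cards, each card a string
Columns = Tuple[int, int]   # 1-based, inclusive (start, end) columns


def pad_serial(sernum: str, width: int) -> str:
    """Pad a serial number on the left with zeros, as (sernum) and cs do."""
    return sernum.rjust(width, "0")


def serial(record: Record, ser: Columns) -> str:
    # Assumption: the serial number is read from the record's first card.
    return record[0][ser[0] - 1:ser[1]]


def sort_records(records: List[Record], ser: Columns) -> List[Record]:
    """Like sort: order records by serial number (stable for equal serials)."""
    return sorted(records, key=lambda record: serial(record, ser))


def gsort_records(records: List[Record], ser: Columns, crd: Columns) -> List[Record]:
    """Like gsort: order cards by card type within each record, then records by serial."""
    by_card = [sorted(record, key=lambda card: card[crd[0] - 1:crd[1]])
               for record in records]
    return sort_records(by_card, ser)


def merge_records(records: List[Record], ser: Columns) -> List[Record]:
    """Like merge: join adjacent records with identical serials; cards keep their order."""
    return [[card for record in group for card in record]
            for _, group in groupby(records, key=lambda record: serial(record, ser))]


SER, CRD = (1, 5), (6, 6)
# Card 5 for respondent 156 has been appended directly after record 156.
records = [["001561aaa"], ["001565eee"], ["001571bbb"]]
print(pad_serial("156", 5))           # 00156
print(merge_records(records, SER))    # [['001561aaa', '001565eee'], ['001571bbb']]
```

As in the merge description, only adjacent records are joined and cards are not re-ordered, so a card added further down the file has to be moved next to its record (or the data sorted first) before merging.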

5 Replacing text with sequential numeric values

The mc program replaces all occurrences of a given character with a numeric value. The numeric value is different for each replacement made, but the increment between each value and the previous replacement value is the same. For example, you could choose to replace all occurrences of the # symbol with the numbers 1, 3, 5, 7, and so on. The first occurrence of # would be replaced by the number 1, the second occurrence of # would be replaced by the number 3, and so on.

When replacing a character in this way, you choose:

• The start value for the first replacement.

• The incremental value used to calculate subsequent replacement values.

• The width of the replacement field if it is wider than the replacement value.

• What to do with blank columns in replacement fields that are longer than the replacement value.

• The character to be replaced.

5.1 Preparing the text file

Each mc command performs replacements using one character only. The default text that signals a replacement is the @ sign, but you can use any character you like, as long as it is identical for each mc command you want to run.

5.2 Using mc

To use mc, type:

mc start_value increment field_width[format] [text] [input_file] [output_file]

In the mc command, start_value is the numeric value for the first replacement and increment is the incremental value for each subsequent replacement. The default for both values is 1.

field_width is the width of the replacement field. This may be between 1 and 11 columns; the default is 5 columns.


format is a single character defining what to do when the replacement value is shorter than the field width. The default is to right-justify the replacement value in the field and pad the field on the left with zeroes. You may choose to use blanks instead of zeroes or to suppress them altogether. To do this, enter a format character immediately after the field width. Valid format characters are:

B Pad the field on the left with blanks.

S Suppress leading zeroes. The width of the replacement field then depends on the number of digits in the replacement value.

Z Pad the field on the left with zeroes (the default).

text is the text to be replaced. The default is the @ symbol.

You can also use mc from within the ex or vi editors. To do this, edit the text file containing the special replacement symbols using one or other of these editors. Then, type:

:lines!mc start_value increment field_width[format] [text]

lines is any ex or vi syntax that is valid for referring to lines in the file. Examples are 1,$ and % for all lines in the file, and 10,50 for lines 10 to 50 only.

Here is an example. Suppose you have a large file containing a list of magazines. From time to time you want to extract various titles from the list and number them sequentially from 1. Here is part of the master magazine list:

Gardener’s World @

Amateur Gardening @

Garden News @

Practical Gardening @

Amateur Photographer @

Photography @

Practical Photography @

Practical Woodworking @

Practical Householder @

Do It Yourself @

Suppose you want to extract a list of all photography magazines and number them sequentially from 1. Here are the steps you would take:

1. Edit the file with ex or vi and delete all magazines that are not to do with photography. It is helpful if all titles to do with a particular topic are grouped, as in the above example, but this is not necessary.

2. Type:

:1,$!mc 1 1 3B

This tells mc to replace all @ symbols (the default text is used because no other text is defined with mc) with numbers starting at 1 and incremented by 1. The replacement field is 3 characters wide and is padded on the left with blanks.

3. Save your work in a new file (:w filename) and quit.

Here is the result of running these commands on the example file:

Amateur Photographer   1

Photography   2

Practical Photography   3

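The replacement rules can be expressed as a short function. This Python sketch mirrors the start_value, increment, field_width, format and text parameters described above, but it is an illustration, not mc itself: mc is used here as a hypothetical function name, and because this guide does not say what happens when a value has more digits than the field width, the sketch simply lets the value overflow the field.

```python
from itertools import count
from typing import Iterable, List


def mc(lines: Iterable[str], start: int = 1, increment: int = 1, width: int = 5,
       fmt: str = "Z", text: str = "@") -> List[str]:
    """Replace each occurrence of text with start, start + increment, ... (sketch)."""
    if not 1 <= width <= 11:
        raise ValueError("field width must be between 1 and 11")
    values = count(start, increment)  # one sequence shared by the whole file

    def render(value: int) -> str:
        if fmt == "Z":
            return str(value).zfill(width)  # pad on the left with zeroes
        if fmt == "B":
            return str(value).rjust(width)  # pad on the left with blanks
        if fmt == "S":
            return str(value)  # no padding: width follows the digits
        raise ValueError(f"unknown format character {fmt!r}")

    result = []
    for line in lines:
        pieces = line.split(text)
        # Put the next value wherever text occurred, left to right.
        result.append(pieces[0] + "".join(render(next(values)) + piece
                                          for piece in pieces[1:]))
    return result


titles = ["Amateur Photographer @", "Photography @", "Practical Photography @"]
for line in mc(titles, 1, 1, 3, "B"):  # the equivalent of :1,$!mc 1 1 3B
    print(line)
```

Using a single counter across all lines is what makes the numbering continue from one line to the next, which is the behavior the magazine example relies on.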

6 Printing selected fields from a file

MS-DOS does not provide any programs for extracting and printing information from lines in a file. The cut, paste and join utilities available with Unix provide some of this functionality, but are generally used in shell scripts rather than on the command line. An alternative is to use awk which is, again, standard on Unix systems. However, although it is extremely flexible, awk is more a programming language than a simple one-line utility.

Another option is to use bycol. This easy-to-use program reads a file and prints selected columnsor fields from each line. You may also define additional texts to be printed as part of the output.

To use bycol, type:

bycol [-anpx] [-sseparator] [what_to_print] [filename]

To see a reminder of the command syntax, type:

bycol -x

6.1 Which columns and fields to print

The what_to_print parameter is a list of one or more column or field references defining the parts of the lines to print.

To print a single column, just type its number. To print a field, type the start and end column numbers separated by a comma. If the start column of a field reference is higher than the end column, the field is read from right to left and its columns are printed in that order. All references in the list must be separated by spaces. For example:

bycol 1,2 15,10 4 9 1 29 myfile

prints columns 1 to 2, 15 to 10, 4, 9, 1 and 29 of myfile in that order. The contents of these columns are printed as a single string with no spaces in between them.

The $ symbol represents the end of the line, so the notation 1,$ prints the whole line. However, $ often has a special meaning to the shell, so it is advisable to enclose references of this kind in single quotes (Unix) or double quotes (MS-DOS). For example, under Unix:

bycol '1,$' myfile

prints the whole of myfile and is the same as typing cat myfile under Unix. Under MS-DOS, the command is:

bycol "1,$" myfile

and is the same as typing type myfile under MS-DOS.
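The way bycol resolves column and field references can be sketched in a few lines. In this Python illustration (column_text and bycol are hypothetical names, not part of the utility), columns are 1-based, $ stands for the end of the line, a field whose start column is higher than its end column is read from right to left, and references past the end of a short line simply yield fewer characters, which matches bycol printing as many columns as it can.

```python
from typing import List


def column_text(line: str, ref: str) -> str:
    """Text for one reference: 'n', 'start,end' or 'start,$' (1-based columns)."""
    if "," not in ref:
        n = int(ref)
        return line[n - 1:n]  # a single column; empty if the line is too short
    first, last = ref.split(",", 1)
    start = int(first)
    if last == "$":
        return line[start - 1:]  # $ stands for the end of the line
    end = int(last)
    if start <= end:
        return line[start - 1:end]
    # Start column higher than end column: read the field from right to left.
    return line[end - 1:start][::-1]


def bycol(line: str, refs: List[str]) -> str:
    """Concatenate the referenced columns, in order, with no separators."""
    return "".join(column_text(line, ref) for ref in refs)


line = "ABCDEFGHIJKLMNOPQRSTUVWXYZabc"  # 29 columns
print(bycol(line, "1,2 15,10 4 9 1 29".split()))  # ABONMLKJDIAc
```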


bycol displays its output on the screen. To write the output to a file, end the line with > filename. For example:

bycol 1,2 15,10 4 9 1 29 myfile > opfile

6.2 Text and column separators in the output

You can print other things besides just what is in the columns you have chosen. You can define additional texts of your own and you can define a character or string that is to be used as a separator between each column, field or text printed.

To print a text, type it as part of the what_to_print parameter at the point in the string that you want it to appear, preceded by a + sign. If the text contains spaces, enclose it in single quotes (Unix) or double quotes (MS-DOS). For example:

Unix bycol +'Before: ' 25 +' After: ' 26 myfile

MS-DOS bycol +"Before: " 25 +" After: " 26 myfile

prints the word Before, then the contents of column 25, then the word After, and finally the contents of column 26.

If you wish, you can define column separators as texts. Here is the very first example again, this time with spaces used as separators:

Unix bycol 1,2 ' ' 15,10 ' ' 4 ' ' 9 ' ' 1 ' ' 29 myfile

MS-DOS bycol 1,2 " " 15,10 " " 4 " " 9 " " 1 " " 29 myfile

If you want to use the same separator across the whole line, it is quicker to define it once using the -s option. You could rewrite the previous example to produce the same output by typing:

Unix bycol -s' ' 1,2 15,10 4 9 1 29 myfile

MS-DOS bycol -s" " 1,2 15,10 4 9 1 29 myfile

The characters you use in text strings or as separators are not restricted to letters and numbers. You can use other characters from the list below, but be sure to enclose them in single quotes (Unix) or double quotes (MS-DOS):

To print a          Type   Or the octal value
New line            \n     12
Carriage return     \r     15
Backspace           \b     10
Tab                 \t     11
Formfeed            \f     14
Backslash           \\     134

Here is an example that uses the tab character as the column separator:

Unix bycol -s'\t' 1,4 10,12 15 56,62 data

MS-DOS bycol -s"\t" 1,4 10,12 15 56,62 data

6.3 Dealing with blank or short records

bycol normally ignores blank records. If you want to include blank records in your output, use the -a option on the command line.

It may happen that records in your files are not all the same length and that some of the columns named on the command line do not exist in some records. If this happens, bycol prints as many columns as it can. If you are printing data in columns with text in the last column, this could mean that the text appears in the wrong column on short records.

If you would like all records that are shorter than the highest column named on the command line to be padded with blanks to this length before being printed, include the option -p in your command.

6.4 Line numbers

Use the option -n to print a line number at the start of each line.

6.5 Restrictions

• bycol cannot output lines longer than 1,024 characters.

• The maximum number of column references, field references, and texts in a command is 1,024.

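The options described in 6.2 to 6.4 can be combined in one sketch. This Python illustration uses hypothetical names (format_line, bycol_lines), represents a text as a plain string and a column or field as a (start, end) pair with start no higher than end, and assumes details the guide does not specify: a blank record is one containing only whitespace, and the -n line number is the input line number followed by a space.

```python
from typing import List, Tuple, Union

# A reference is a (start, end) column pair (1-based, inclusive, start <= end here)
# or a literal text, which on the bycol command line is written as +'text'.
Ref = Union[Tuple[int, int], str]


def format_line(line: str, refs: List[Ref], sep: str = "", pad: bool = False) -> str:
    """One output line: texts and fields joined by the -s separator; pad is like -p."""
    if pad:
        highest = max((ref[1] for ref in refs if isinstance(ref, tuple)), default=0)
        line = line.ljust(highest)  # pad short records with blanks
    parts = [ref if isinstance(ref, str) else line[ref[0] - 1:ref[1]] for ref in refs]
    return sep.join(parts)


def bycol_lines(lines: List[str], refs: List[Ref], sep: str = "", pad: bool = False,
                keep_blank: bool = False, numbers: bool = False) -> List[str]:
    """Apply -a (keep blank records) and -n (line numbers) across a file."""
    output = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() and not keep_blank:
            continue  # blank records are skipped unless -a is given
        text = format_line(line, refs, sep, pad)
        # Assumed -n format: the input line number, then a space.
        output.append(f"{number} {text}" if numbers else text)
    return output


print(format_line("abc", [(1, 2), (5, 5), "!"], sep="|", pad=True))  # ab| |!
```

Without pad=True, column 5 of the short record "abc" is simply missing, so the trailing text moves left; padding keeps it in the same output position on every line, which is the problem -p is meant to solve.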
7 Sorting files

The standard ASCII file sorting programs provided with MS-DOS and Unix have shortcomings when used to sort files based on the contents of more than one field in the line. Under MS-DOS, lines are sorted based on the contents of column 1 or a single column that you choose. Sorting based on fields of columns is not possible. The Unix sorting program is much more sophisticated and allows sorting based on the contents of one or more fields, but the syntax for specifying the fields is not straightforward.

asort is a utility that overcomes these limitations and makes it easy to specify sorts using any number of columns and fields.

To use asort under Unix, type:

asort [options] input_file output_file start1 end1 [… startn endn]

To use asort under MS-DOS, type:

asort input_file output_file start1 end1

In these commands, input_file is the name of the unsorted input file, output_file is the name of the sorted output file, and start1 and end1 are the start and end positions of the first field you want to sort on. To sort on a single column, enter the same value for the start and end columns.

Under Unix, you can sort on more than one field by entering the pairs of start and end positions in order of importance, most important first. This is not possible under MS-DOS, and asort will issue an error message if you specify more than one field.

The options under Unix are:

Option Explanation

o Call sort using the old method of specifying the sort key. This has been provided for backwards compatibility. You may find this option useful in the unlikely event that sort now gives different results from previous versions. By using this option, you should get the same results as you did using the previous version of asort. The default is that this option is off.

v Call sort in verbose mode. The default is that this option is off.

Here is an example for Unix systems:

asort unsort.txt sort.txt 1 5 10 12 80 80

This command produces a sorted version of unsort.txt in the file sort.txt. Lines are sorted first on the contents of columns 1 to 5 (the highest sort level) and within that on columns 10 to 12. The lowest level of sorting is on column 80 within columns 10 to 12.

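The multi-level ordering that asort produces can be sketched in Python. This is an illustration, not asort itself: asort hands the work to the system sort program, whose collation rules may differ from the plain string comparison used here, and asort_lines and parse_positions are hypothetical names. Each (start, end) pair becomes one element of the sort key, most important first, so later fields only matter when earlier ones are equal.

```python
from typing import List, Sequence, Tuple


def asort_lines(lines: Sequence[str], keys: Sequence[Tuple[int, int]]) -> List[str]:
    """Sort lines on one or more 1-based, inclusive (start, end) column fields.

    Assumption: fields are compared as plain strings; columns missing from a
    short line compare as an empty (lowest) value.
    """
    return sorted(lines, key=lambda line: tuple(line[s - 1:e] for s, e in keys))


def parse_positions(args: Sequence[str]) -> List[Tuple[int, int]]:
    """Turn command-line positions such as '1 5 10 12 80 80' into key pairs."""
    if not args or len(args) % 2 != 0:
        raise ValueError("positions must be given as start/end pairs")
    numbers = [int(arg) for arg in args]
    return list(zip(numbers[0::2], numbers[1::2]))


# The positions from: asort unsort.txt sort.txt 1 5 10 12 80 80
keys = parse_positions("1 5 10 12 80 80".split())
print(keys)  # [(1, 5), (10, 12), (80, 80)]
```

Because Python's sort compares the key tuples element by element, columns 10 to 12 only decide the order of lines whose columns 1 to 5 are identical, and column 80 only breaks ties left by both earlier fields.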

8 ANSI carriage control sequences in files

Nowadays, computer systems and printers print text files as they appear on your screen. The exceptions are that a CTRL+L character (ASCII Formfeed) normally starts a new page, a CTRL+J character (ASCII Linefeed) normally starts a new line, and a CTRL+M character (ASCII Carriage return) normally returns to the start of the current line.

In the past, many systems adopted the ANSI standard for formatting text output. This specifies that the text to be printed on each line starts in position two on the line. The first character position is reserved for printing control characters that determine how the text is to be printed. These control characters are:

1 Print this line on a new page (equates to ASCII CTRL+L).

+ Start this line at the beginning of the current line (equates to ASCII CTRL+M).

0 Start this line at the beginning of the next but one line (equates to two consecutive ASCII CTRL+J characters).

Anything else means print this text at the beginning of the next line, which equates to the ASCII CTRL+J character. The accepted character to use in ANSI files is the space character.

If you have a file with printer controls marked in ANSI format, you can convert it into ASCII format by running the program deftn. Similarly, if you have a file in ASCII format that needs to be converted into ANSI format, you can convert the file using ftnise.

8.1 Adding ANSI control sequences

To add ANSI control sequences to a file, type:

ftnise -o output_file existing_file

For example:

ftnise -o list1.ans list1

to create a file called list1.ans by adding ANSI carriage control characters to the lines in the file called list1.

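One way to derive ANSI control characters from ASCII line breaks is sketched below, following the table of control characters in this chapter. This Python function is an illustration, not ftnise itself: the name ftnise_lines is hypothetical, and the handling of runs of more than two line feeds (blank lines followed by a 0 line) and of a final line feed is an assumption.

```python
import re
from typing import List


def ftnise_lines(text: str) -> List[str]:
    """Sketch: turn ASCII text into lines that start with an ANSI control character."""
    # Split on runs of line feeds, single carriage returns and formfeeds, keeping them.
    pieces = re.split(r"(\n+|\r|\f)", text)
    ansi = [" " + pieces[0]]  # assumption: the first line gets a space (next line) control
    pairs = list(zip(pieces[1::2], pieces[2::2]))
    for index, (separator, line) in enumerate(pairs):
        if index == len(pairs) - 1 and separator == "\n" and line == "":
            break  # a final line feed only ends the last line
        if separator == "\f":
            ansi.append("1" + line)  # new page
        elif separator == "\r":
            ansi.append("+" + line)  # overprint the current line
        elif len(separator) == 1:
            ansi.append(" " + line)  # next line
        else:
            # n line feeds: n - 2 blank lines, then a line after one skipped line.
            ansi.extend([" "] * (len(separator) - 2))
            ansi.append("0" + line)
    return ansi


for ansi_line in ftnise_lines("Title\fPage two\nline\n\nafter a gap\n"):
    print(repr(ansi_line))
```

re.split keeps the captured separators in the result, so separators and line texts alternate and can be paired up directly.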

8.2 Removing ANSI control sequences

To remove ANSI carriage control sequences from a file, type:

deftn existing_file new_file

For example:

deftn list1.ans list1.unx

to remove the ANSI carriage control sequences from the file list1.ans and to save the results in a file called list1.unx.

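The conversion that deftn performs can be sketched in Python. In this illustration (deftn_text is a hypothetical name, not the utility itself), each ANSI control character is turned into the ASCII characters that produce the same movement before the line is printed. Dropping one line feed before the first line and ending the output with a single line feed are assumptions about conventions the guide does not specify.

```python
from typing import List

# ASCII characters written before a line for each ANSI control character.
# Anything else (normally a space) means "next line", that is, one line feed.
CONTROL_TO_ASCII = {"1": "\f", "+": "\r", "0": "\n\n"}


def deftn_text(ansi_lines: List[str]) -> str:
    """Sketch: convert lines with ANSI carriage control into plain ASCII text."""
    output = []
    for index, line in enumerate(ansi_lines):
        control, text = line[:1], line[1:]
        movement = CONTROL_TO_ASCII.get(control, "\n")
        if index == 0 and movement.startswith("\n"):
            # Assumption: printing starts on a fresh line, so drop one line feed.
            movement = movement[1:]
        output.append(movement + text)
    return "".join(output) + "\n"  # assumption: end the file with one line feed


print(repr(deftn_text([" Title", "1Page two", " line", "0after a gap"])))
```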