Nagler - Coding Style and Good Computing Practices


Coding Style and Good Computing Practices

Author(s): Jonathan Nagler
Source: PS: Political Science and Politics, Vol. 28, No. 3 (Sep., 1995), pp. 488-492
Published by: American Political Science Association
Stable URL: http://www.jstor.org/stable/420315

Accessed: 06/10/2008 05:15



Coding Style and Good Computing Practices

Jonathan Nagler, University of California, Riverside

Replication of scholarly analysis depends on individual researchers being able to explain exactly what they have done. And being able to explain exactly what one has done requires keeping good records of it. This article describes basic good computing practices[1] and offers advice for writing clear code that facilitates the task of replicating the research. The goals are simple. First, the researcher should be able to replicate his or her own work six hours later, six months later, and even six years later. Second, others should be able to look at the code and understand what was being done (and preferably why it was being done). In addition, following good computing practices encourages the researcher to maintain a thorough grasp of what is being done with the data. This allows for more efficient research: one is not always rereading one's own work and retracing one's own steps to perform the smallest bit of additional analysis.

This article is not meant only for sophisticated statistical researchers. In fact, the statistical procedures you ultimately use have nothing to do with the topic of this article. These practices should be used even if you are doing nothing more complex than producing 2 x 2 tables with a particular data set. The sequence this article is written in leads the reader from more general themes to more specific ones. I encourage readers who haven't the patience for the big picture to skip ahead rather than skip the whole article. Even learning basic conventions about variable names will put you ahead of most coders!

First, what do I mean by computing practices and "code"? Computing practices covers everything you do from the time you open the codebook to a data set, or begin to enter your own data, to the time you produce the actual numbers that will be placed in a table of an article. "Code" refers to the actual computer syntax (or computer program) used to perform the computations. This most likely means the set of commands issued to a higher-level statistics package such as SAS or SPSS. A given file of commands is a program. Most political scientists do not think of themselves as computer programmers. But when you write a line of syntax in SAS or SPSS, that is exactly what you are doing: programming a computer. It is coincidental that the language you use is SAS instead of Fortran or C. The paradox is that most political scientists are not trained as computer programmers. And so they never learned basic elements of programming style, nor are they practiced in the art of writing clean, readable, maintainable code. The classic text on programming style remains Kernighan and Plauger, The Elements of Programming Style (1974, 1978), and most books on programming include a chapter on style. I recommend including a section on coding style in every graduate methods sequence.

This article starts from the point at which a raw data set exists somewhere on a computer. It breaks analysis into two basic parts: coding the data to put it into a usable form, and computing with the data. It is useful to break the first part into two component steps: reading the essential data from a larger data set, and recoding it and computing new variables to be used in data analysis.

Lab Books

Our peers in the laboratory sciences maintain lab books. These books indicate what they did to perform their experiments. We are not generally performing experiments, but we have identical goals. We want to be able to retrieve a record of every step we have followed to produce a particular result. This means the lab book should include the name of every file of commands you wrote, with a brief synopsis of what the file was intended to do. The lab book should also indicate which files produced results worth noting.

It is a good idea to have a template that you follow for each lab book entry; this encourages you to avoid becoming careless in your entries. A template could include date, file, author, purpose, results, and machine. You might have a set of purposes (recoding, data extraction, data analysis) that you feel each file fits into. It may seem superfluous to indicate on what machine the file was executed. But should you develop the habit of computing on several machines, or should you move from one machine to another in the course of a project, this information becomes invaluable in making sure you can locate all your files.
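A minimal entry template built from the fields just listed (the particular layout here is only a suggestion) might look like this:

Date:
File:
Author:
Machine:
Purpose: (recoding / data extraction / data analysis)
Results: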

It makes a lot of sense to have the lab book on-line. It can be in either a WordPerfect file, or a plain ASCII text file, or whatever you are most comfortable writing in. First, it is easy to search for particular events if you remember their names. Second, it can be accessed by more than one researcher if you are doing joint work.

Rule: Maintain a lab book from the beginning of a project to the end.

Lab books should provide an audit trail of any of your results. The lab book should contain all the information necessary to take a given piece of data analysis and trace back where all of the data came from, including its source and all recode steps. So although you might want to keep many sorts of entries in a lab book, and there are many different styles in which to keep them, the point to keep in mind is that whatever style you choose, it should meet this purpose.



Command Files

The notion of command files may appear as an anachronism to some. After all, can't we all just work interactively by pointing and clicking to do our analysis, not having to go through the tedium of typing separate lines for each regression we want to run and submitting our job for analysis? Yes, we can. But we probably don't want to. And even if we do, modern software will keep a log for us of what we have done (a fact that seems to escape many users of the software). The reason I am not a fan of interactive work has to do with the nature of the analysis we do. The model we ultimately settle on was usually arrived at after several estimates testing many models. And this is appropriate: we all ought to be testing the robustness of our results to changes in model specification, changes in variable measurement, and so forth. By writing command files to perform estimates, and keeping each version, one has a record of all this.

You generally want a large set of comments at the beginning of each file, indicating what the file is intended to do. And each file should not do too much. Comments at the top of a file should list the following:

1. State the date it was written, and by whom.
2. Include a description of what the file does.
3. Note from what file it was immediately derived.
4. Note all changes from its predecessors, where appropriate.
5. Indicate any data sets the file uses as input. If the file uses a data set, there should be a comment indicating the source of the data set. This could be either another file you have that produced the data or a description of a public-use data set. If it is a public-use data set, include the date and version number.
6. Indicate any output files or data sets created.

For example, the first nine lines of a command file might be:

\* File-Name:   mnpl52.g
   Date:        Feb 2, 1994
   Author:      JN
   Purpose:     This file does multinomial probits on our basic model.
   Data Used:   nes921.dat (created by mkascic.cmd)
   Output File: mnpl52.out
   Data Output: None
   Machine:     billandal (IBM/RS6000)
\*

You could keep a template with the fields above left blank, and read in the template to start each new command file. You should treat the comments at the top of a file the way you would the notes on a table; they should allow the file to stand alone and be interpreted by anyone opening the file without access to other files.

Know the Goal of the Code

It makes no sense to start coding variables if you do not know what the point of the analysis using the variables will be. You end up with a bunch of variables that all need to be recoded later on, confusing you to no end. Before you start manipulating the data, figure out what you will be testing and how the variables need to be set up.

Example: say you want to test the claim that people who voted in 1992, but did not vote in 1988, were more likely to support Perot than Bush in 1992. How do you code the independent variable indicated here? Well, you really want a "new voter" variable, because your substantive hypothesis is stated most directly in terms of "new voters." Thus, the variable should be coded so that

voted in 1992, did not vote in 1988 = 1
voted in 1992, voted in 1988 = 0

Rule: Code each variable so that it corresponds as closely as possible to a verbal description of the substantive hypothesis the variable will be used to test.
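As an illustration, a recode along these lines might look as follows in SAS, one of the packages mentioned above. The variable and data set names are hypothetical; vote92 and vote88 are assumed to be coded 1 = voted and 0 = did not vote:

/* Create a "new voter" dummy from two hypothetical turnout variables. */
data work.voters;
  set work.nes92;                                        /* hypothetical input data set */
  if vote92 = . or vote88 = . then newvoter = .;         /* keep missing data missing */
  else if vote92 = 1 and vote88 = 0 then newvoter = 1;   /* voted in 1992, not in 1988 */
  else if vote92 = 1 and vote88 = 1 then newvoter = 0;   /* voted in both years */
  /* respondents who did not vote in 1992 are never assigned, so they remain missing */
run;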

Fixing Mistakes

If you find errors, the errors should be corrected where they first appear in your code, not patched over later. This may mean rerunning many files. But this is preferable to the alternative. If you patch the error further downstream in the code, then you will need to remember to repeat the patch should you make any change in the early part of the data preparation (i.e., should you decide you need to pull additional variables from a data set, etc.). If the patch is downstream, you are also likely to get confused as to which of your analysis runs are based on legitimate (patched) data, and which are based on incorrect data.

Rule: Errors in code should be corrected where they occur and the code rerun.

Data Manipulation Versus Data Analysis

Most data go through many manipulations from raw form to the form we use in our analyses. It makes sense to isolate as much of this as possible in a separate file. For instance, suppose you are using the 1992 National Election Study (NES) presidential election file. You have 100 variables in mind that you might use in your analysis. It makes sense to have a program that will pull those 100 variables from the NES data set, give them the names you want, do any basic recoding that you know you will maintain for all of your analysis, and create a "system file" of these 100 named and recoded variables that can be read by your statistics package. There are at least two reasons for this. First, it saves a lot of time. You are going to estimate at least 50 models before settling on one. Do you really want to read the whole NES data set off the disk 50 times when you could read a file 1/20th the size instead? Second, why do all that recoding and naming 50 times? You might accidentally alter the recodes in one of your files.

Rule: Separate tasks related to data manipulation vs. data analysis into separate files.
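A minimal sketch of such an extraction step, written here in SAS rather than SST, might look as follows. The library, file, and path names are hypothetical; the variable numbers and names follow the correspondence implied by the examples later in this article:

libname nes 'c:\data\nes92';                             /* hypothetical location of the raw NES file */

data work.nes92sub;                                      /* small "system file" read by all later runs */
  set nes.full92 (keep=V5609 V3908 V4201 V3634 V3531);   /* pull only the variables actually needed */
  rename V5609=preschc                                   /* presidential vote choice */
         V3908=educ                                      /* education */
         V4201=women                                     /* respondent gender */
         V3634=partyid                                   /* party identification */
         V3531=natlecr;                                  /* national economy, retrospective */
run;

Later command files would then read work.nes92sub instead of the full NES file.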

Modularity

Separating data manipulation and data analysis is an example of modularity.


Modularity refers to the concept that tasks should be split up. If two tasks can be performed sequentially, rather than two at a time, then perform them sequentially. The logic for this is simple. Lots of things can go wrong. You want to be able to isolate what went wrong. You also want to be able to isolate what went right. After what specific change did things improve? Also, this makes for much more readable code.

Thus if you will be producing some tables before multivariate analysis, you might have a series of programs: descrip1.cmd, descrip2.cmd, . . ., descrip9.cmd. Following this, you might produce: reg1.cmd, reg2.cmd, . . ., reg99.cmd. You need not constrain yourself to one regression per file. But the regressions in each file should constitute a coherent set. For instance, one file might contain your three most likely models of vote choice, each disaggregated by sex. This does tend to lead to a proliferation of files. One can start with reg1.cmd and finish with reg243.cmd. But disk space is cheap these days, and the files can easily be compressed and stored on floppies if disk space is getting tight.

Rule: Each program should perform only one task.

KISS

Keep it simple and don't get too clever. You may think of a very clever way to code something this week. Unfortunately, you may not be as clever next week and you might not be able to figure out what you did. Also, the next person to read your code might not be so clever. Finally, you might not be as clever as you think: the clever way you think of to do three steps at once might only work for five out of six possible cases you are implementing it on. Or it might create nonsense values out of what should be missing data. Why take the chance? Computers are very fast. Any gains you make through efficiency will be dwarfed by the confusion caused later on in trying to figure out what exactly your code is doing so efficiently.

Rule: Do not try to be as clever as possible when coding. Try to write code that is as simple as possible.

Variable Names

There is basically no place for variables named X1 other than in simulated data. Our data are real; they should have names that impart as much meaning as possible. Unfortunately, many statistical packages still limit us to eight-character names (and for portability's sake, we are forced to stick with eight-character names even in packages that don't impose the limit). However, your keyboard has 84 keys, and the alphabet has 52 letters: 26 lower-case and 26 upper-case. Indulge yourself and make liberal use of them. There are also several additional useful characters, such as the underscore and the digits 0-9, at your disposal. It is a convention in programming to use UPPER-CASE characters to indicate constants and lower-case characters to indicate variables. This might not be as useful in statistical programming. You might adopt the convention that capitals refer to computed quantities (such as PROBCHC1: the estimated PROBability of CHoosing Choice 1). And if you are trying to have your code closely follow the notation of a particular econometrics article, you might use a capital U for utility, or a capital V for the systematic component of utility. Obviously in such a case comments would be in order! Some people like variable names such as NatlEcR because the use of capitals allows for clearly indicating where one word stops and another starts. NatlEcR makes it easier to think of 'National Economic-Retrospective' than natlecr might. You will need to make some tradeoffs in the conventions you choose. The important thing is to adopt a convention on the use of capitals and stick with it.

Rule: Use a consistent style regarding lower- and upper-case letters.

Rule: Use variable names that have substantive meaning.

When possible a variable name should reveal subject and direction. The simplest case is probably a dummy variable for a respondent's gender; imagine it is coded so that 0 = men, 1 = women. We could call the variable either "SEX" or "WOMEN." It is clear that "WOMEN" is the better name because it indicates the direction of the variable. When we see our coefficients in the output we won't have to guess whether we coded men = 1 or women = 1.

Rule: Use variable names that indicate direction where possible.

Similarly, value labels are useful for packages that permit them. The examples of computer syntax I use in this article are written in SST (Dubin/Rivers 1992), but they can be translated easily into SAS, SPSS, or most statistical packages. Here is a simple example. The variable natlecr indicates the respondent's retrospective view of the performance of the national economy. Notice that the variable name can indicate only so much information in 8 characters. But the label of it and the values tell us what we need. And the fact that the label tells us where to look the variable up in the codebook is further protection.

label var[natlecr]\
  lab[v3531: national economy - retro]\
  val[1 gotbet 3 same 5 gotworse]\

Some people using NES data (or any data produced by someone else and accompanied by a codebook) follow the convention of naming the variable by its codebook number (i.e., V3531), and using labels for substantive meaning. I think this is a poor practice. Consider which of the following statements is easier to read:

logit dep[preschc]\ind[one educ women partyid]

or:

logit dep[V5609]\ind[one V3908 V4201 V3634]

The codebook name for the variable should definitely be retained; but it can be retained in the label statement. Without the codebook name one would not know which of the several party-identification variables the variable partyid refers to.
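In SAS the same protection can come from a LABEL statement. A hedged sketch, using the hypothetical name-to-codebook-number correspondence from the extraction example above:

data work.labeled;
  set work.nes92sub;
  label preschc = 'V5609: presidential vote choice'
        partyid = 'V3634: party identification';
run;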


Writing Cleanly

Anything that reduces ambiguity is good. Parentheses do so, and so parentheses are good. A reader should not need to remember the precedence of operators. But in most cases parentheses are more valuable as visual cues for grouping expressions than for the actual precedence of operators. Almost as useful as parentheses is white space. Proper spacing in your code can make it much easier to read. This means both vertical space between sections of a program that do different tasks, and indenting to make things easier to follow.

Rule: Use appropriate white space in your programs, and do so in a consistent fashion to make them easy to read.

Comments

There is probably nothing more important than having adequate comments in your code. Basically, one should have comments before any distinct task that the code is about to perform. Beyond this, one can have a comment to describe what a single line does. The basic rule of thumb is this: is the line of code absolutely, positively self-explanatory to someone other than yourself without the comment? If there is any ambiguity, go ahead and put the comment in.

Remember, though, that the comments should add to the clarity of the code. Don't put a comment before each line repeating the content of the line. Put comments in before specific blocks of code. Only add a comment for a line where the individual line might not be clear. And remember, if the individual line is not clear without a comment, maybe you should rewrite it.

Rule: Include comments before each block of code describing the purpose of the code.

Rule: Include comments for any line of code if the meaning of the line will not be unambiguous to someone other than yourself.

Rule: Rewrite any code that is not clear.

Following is a case where a single comment lets us know what is going on. Most programmers think that well-written code should be self-documenting. This is partly true. But no matter how well written your code is, some comments can make it much clearer.

rem ***********************************************************
rem Create party-id dummy variables.
rem Missing values are handled correctly here by SST.
rem In other statistics packages these three variables might
rem have to be initialized as missing first.
rem ***********************************************************
set dem = (pid < 3)
set ind = (pid == 3)
set rep = (pid > 3)

Recodes and Creating New Variables

Probably the most important thing to keep track of, both when recoding variables and when creating new variables, is missing data. There is no general rule that can specify exactly how to do this, because treatment of missing data can vary across statistics packages. Thus the best rule is:

Rule: Verify that missing data are handled correctly on any recode or creation of a new variable.

In some statistics packages you may be best served by initializing all new variables as missing data, and allowing them to take on legitimate values only when they are explicitly assigned a legitimate value. The best advice is to recode and create new variables defensively.

Rule: After creating each new variable or recoding any variable, produce frequencies or descriptive statistics of the new variable and examine them to be sure that you achieved what you intended.
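To make these two rules concrete, here is a rough SAS translation of the party-identification dummies shown earlier, initialized as missing and then checked with frequencies. The data set names are hypothetical, and partyid plays the role of pid in the SST example:

data work.recoded;
  set work.nes92sub;
  dem = .; ind = .; rep = .;                /* initialize the new variables as missing */
  if not missing(partyid) then do;          /* assign values only when partyid is legitimate */
    dem = (partyid < 3);
    ind = (partyid = 3);
    rep = (partyid > 3);
  end;
run;

proc freq data=work.recoded;
  tables partyid dem ind rep / missing;     /* confirm the recode did what was intended */
run;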

Generally it is poor style to hard-wire values into your code. Any specific values are likely to change when some related piece of code somewhere else is altered or when the data set changes.

Rule: When possible, automate things and avoid placing hard-wired values (those computed "by hand") in code.

Finally, after you have done recodes and created new variables, it is a good idea to list all the variables. This way you can confirm that you and your statistics package agree on what data are available and how many observations are available for each variable. In SST this would be done with a list command; in SAS, PROC CONTENTS will produce a clean list of variables. Most statistics packages offer similar commands.
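In SAS, for example, the check is short. The data set name is the hypothetical one used above, and the PROC MEANS step is an addition that also reports how many non-missing observations each variable has:

proc contents data=work.recoded;            /* clean list of the variables in the data set */
run;

proc means data=work.recoded n nmiss;       /* non-missing and missing counts per variable */
run;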

Procedures or Macros

Most political scientists do not ever have to write a macro or procedure, but maybe that's why they do so little secondary analysis once they generate some estimates. The purpose of a well-defined procedure is to automate a particular sequence of steps. Procedures and macros are useful both in making your code more readable and in allowing you to perform the same operation multiple times on different values or on different variables. The use of procedures and macros is a topic for a separate article, but political scientists should realize that the tools are available in most statistics packages.
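As a rough illustration (not from the original article) of what such a tool buys you, a small SAS macro might estimate the same vote-choice model separately for men and for women, using the hypothetical variable and data set names from the earlier sketches:

%macro votelogit(sexval);
  /* estimate the basic vote-choice model for one gender group */
  proc logistic data=work.recoded descending;
    where women = &sexval;
    model preschc = educ partyid;
  run;
%mend votelogit;

%votelogit(0);   /* men */
%votelogit(1);   /* women */

The same macro could then be pointed at other subsets or other specifications without retyping the model.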

Summing Up

The rules presented here represent one way of accomplishing the goal you should have in mind. That goal is to write clear code that will function reliably, that can be read and understood by you and others, and that can serve as a road map for replicating and extending your research.

Most people are in a huge hurry when they write their code. Either they are excited about getting the results and want them as fast as possible, or they figure the code will be run once and then thrown out. If your program is not worth documenting, it probably is not worth running. The time you save by writing clean code and commenting it carefully may be your own.

Rules

1. Maintain a lab book from the beginning of a project to the end.
2. Code each variable so that it corresponds as closely as possible to a verbal description of the substantive hypothesis the variable will be used to test.
3. Correct errors in code where they occur, and rerun the code.
4. Separate tasks related to data manipulation vs. data analysis into separate files.
5. Design each program to perform only one task.
6. Do not try to be as clever as possible when coding. Try to write code that is as simple as possible.


7. Set up each section of a program to perform only one task.
8. Use a consistent style regarding lower- and upper-case letters.
9. Use variable names that have substantive meaning.
10. Use variable names that indicate direction where possible.
11. Use appropriate white space in your programs, and do so in a consistent fashion to make the programs easy to read.
12. Include comments before each block of code describing the purpose of the code.
13. Include comments for any line of code if the meaning of the line will not be unambiguous to someone other than yourself.
14. Rewrite any code that is not clear.
15. Verify that missing data are handled correctly on any recode or creation of a new variable.
16. After creating each new variable or recoding any variable, produce frequencies or descriptive statistics of the new variable and examine them to be sure that you achieved what you intended.


17. When possible, automate things and avoid placing hard-wired values (those computed "by hand") in code.

Note

1. A version of this article appeared in The Political Methodologist, vol. 6, no. 2, spring 1995, 2-8. I thank Charles Franklin, Bob Hanneman, Gary King, and Burt Kritzer for useful comments and suggestions.

References

Dubin, Jeffrey, and R. Douglas Rivers. 1992. Statistical Software Tools Users Guide: Volume 2.0. Pasadena, CA: Dubin/Rivers Research.

Kernighan, Brian W., and P. J. Plauger. 1978. The Elements of Programming Style, 2nd edition. New York: McGraw-Hill.

About the Author

Jonathan Nagler is associate professor at the University of California, Riverside. His research interests include qualitative analysis, campaigns and elections. He can be reached at [email protected].


Response: Potential Research Policies for Political Science

Paul S. Herrnson, University of Maryland, College Park


As evidenced by the thoughtful symposium participants, political scientists have a variety of opinions about verification, replication, and data archiving. I continue to have strong reservations regarding the proposed replication, verification, and data relinquishment rules debated here. The symposium, however, has convinced me of one important thing: a still broader discussion is needed before editors or organizations change the norms governing research or publication. Walter Stone's (1995) proposal to survey political scientists to obtain their opinions on verification/replication is, therefore, a good one.

The survey should ask political scientists to consider a variety of policies applicable to quantitative studies, including the following:

1. A "True" Replication Policy
Journals would reserve a certain amount of space for studies that replicate a piece of research in its entirety, including data collection and analysis. Authors who seek to publish studies that are based on original data would be required to provide more details on their data collection process than are included in articles currently published in most political science journals, making it easier for others to replicate the original research. Neulip (1991) is an example of a publication based entirely on replication.

2. A Verification Policy
Journals would require scholars to include with their manuscript submissions the printout from which their statistical results were generated. The printout would include variable distributions, scatterplots, and other
