Statistics for Cross-Cultural Research (eclectic.ss.uci.edu/~drwhite/xc/!XC-BK2.doc)

Chapter 2
Using a Database: Comparative Research with a Standard Sample

The first steps in testing hypotheses with a database are to search the codebook for variables relevant to the hypothesis, study how these variables are coded, and decide which variables to retrieve from the database. In the case of the Standard Sample codebook (http://eclectic.ss.uci.edu/~drwhite/courses/SCCCodes.htm), as with other social science datasets, there will be additional information somewhere in the codebook about the specific criteria and methods the original investigators used when coding the variables. You might not look at this background information right away, but once you take a serious interest in a particular set of variables, you should. In the SCCS codes, for example, variables come in clusters that are numbered and grouped according to the original investigators who contributed them. Thus the first SCCS study, by Murdock and Morrow (1970), contributed 22 variables, and these were numbered 1-22 in the cumulative codebook. Within the codebook itself, a header statement precedes each set of variables; for this set it reads as follows:

SUBSISTENCE ECONOMY AND SUPPORTIVE PRACTICES
George P. Murdock and Diana O. Morrow. 1970. ETHNOLOGY 9:302-330.

Datafile: STDS01.DAT Vars. 1- 22 subsistence

The first line is the title of the article as it first appeared in the journal ETHNOLOGY. As noted in the next line, the article can be found in volume 9, pp. 302-330, under the authorship of George P. Murdock and Diana O. Morrow (1970). This gives you all the information needed to find the article in the library if you use variables coded by these authors (in which case you should also cite them in the research paper resulting from your analysis).
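
Because each header records the datafile and the range of variable numbers a study contributed, you can keep a small lookup table of these headers and use it to find which study (and hence which article to cite) a given variable comes from. The Python sketch below is only an illustration: it assumes header lines in the "Datafile: ... Vars. ..." format shown above, only the Murdock and Morrow (1970) entry is taken from the codebook, and the names StudyBlock, parse_block and source_of are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class StudyBlock(NamedTuple):
    """One cluster of variables contributed by a single study (hypothetical helper)."""
    citation: str
    datafile: str
    first_var: int
    last_var: int


# Matches header lines such as "Datafile: STDS01.DAT Vars. 1- 22 subsistence".
HEADER_RE = re.compile(r"Datafile:\s*(\S+)\s+Vars\.\s*(\d+)\s*-\s*(\d+)")


def parse_block(citation: str, datafile_line: str) -> StudyBlock:
    """Build a StudyBlock from a citation and its 'Datafile:' header line."""
    match = HEADER_RE.search(datafile_line)
    if match is None:
        raise ValueError(f"unrecognized header: {datafile_line!r}")
    datafile, first, last = match.groups()
    return StudyBlock(citation, datafile, int(first), int(last))


def source_of(var_number: int, blocks: list) -> Optional[StudyBlock]:
    """Return the study block whose variable range contains var_number, if any."""
    for block in blocks:
        if block.first_var <= var_number <= block.last_var:
            return block
    return None


blocks = [
    parse_block(
        "George P. Murdock and Diana O. Morrow. 1970. ETHNOLOGY 9:302-330.",
        "Datafile: STDS01.DAT Vars. 1- 22 subsistence",
    ),
]

# Which article should be cited for variable 3?
print(source_of(3, blocks).citation)
```

Adding further header blocks to the list as you use variables from other studies keeps your citation list in step with your variables.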

In the first section of this chapter we show how to open and search the codebook for relevant information and variables, how to copy and store critical information from the codebook that you will be using in your research, and how to think about what types of variables may be relevant to the topics you are thinking of studying. It is important to note the general context of the study, such as SUBSISTENCE ECONOMY; the name of each variable, such as INTERCOMMUNITY TRADE AS FOOD SOURCE; and the specific categories listed under that variable's heading, which are used to classify cases (societies as described by one or more ethnographers at a particular time period) into named coding categories (e.g., >50% of food). The coding categories are compressed and telegraphic: “>50% of food” means that if this code applies to a given society, that society depends on intercommunity trade for over half of its subsistence. If you are unsure about the meaning of a code, consult the original article in which it was published, and in any case take a good look at the title of the article. Do not assume, as did one hapless soul, that the code “absent” under a variable named Political and Legal System in an article on Modernization means “no political or legal system,” when an examination of the article itself and of the other variables coded would clearly indicate that what was coded was the presence or absence of recent changes in the political and legal system. Because of that kind of confusion, the variable labels in the codebook for that study were changed to indicate more clearly that change was what was coded, and the “absent” category is now labeled “no changes.” The coding labels, however, were intended as shorthand to appear in tables produced by a computer program with access to the coded data, so the shorthand label is not always sufficient to understand what was coded. Do enough background research to understand what your variables and codes consist of, if not initially, then once you have decided which variables to investigate.

Section 2 of this chapter shows the next steps: working with the computer database in SPSS to extract the variables you are interested in, so that you can use them to test hypotheses. The SCCS database is organized in rows that represent cases (societies) and columns that represent variables. In earlier editions of the database there were separate files for each set of variables from a single article or study, such as the datafile STDS01.DAT for Murdock and Morrow’s (1970) variables 1-22 on subsistence.1 Now these reside in a single SPSS file, “SCCSDatabase.sav,” which is distributed on CD-ROM by the World Cultures journal.2

If you subscribe to the journal and have received the CD-ROM, or if you have purchased this book with a CD-ROM, first copy the files “SCCSDatabase.sav” and “Codes.doc” from the accompanying CD to the hard disk of your computer. It may make sense to put them in a special directory named, e.g., CROSSCULT. Also be sure to have a notebook and a pen at hand.

If you are a student in a computer lab at a school where this database is used in class, your instructor will have placed all the material you need on the lab's hard drives. Most of the material used in this book is available at the home site of the UCI course in which it was first used for instruction, http://eclectic.ss.uci.edu/~drwhite/courses. If you are concerned about time as a variable, since each SCCS case is coded for only a single time period, you will find links on that site to an archaeological site database comparable to the SCCS. Now we can begin.

Section 1: Finding Variables

1 STDS and SCCS both stand for STandarD Cross-Cultural Sample (Murdock and White 1969), using variations of the acronym. The sample is described in http://eclectic.ss.uci.edu/~drwhite/pub/SCCS1969.pdf, where you will also find links to the societies in the sample. William Divale later converted the raw STDS .dat files into the labeled SPSS files that we use today, and these were named similarly, e.g., STDS01.SAV, where ‘.sav’ is the extension used for SPSS data files. More recently, Andrey Korotayev combined nearly all these files (except the newest ones, whose variables had not been sequentially numbered) into a single file, “SCCSDatabase.sav.”

2 http://eclectic.ss.uci.edu/~drwhite/worldcul/world.htm is the electronic site for the journal, which is also published on paper. Codebooks, datasets and articles are distributed on the accompanying CD-ROM.

Cross-Cultural Research: Starting up

Exercise 1

As an example, let us use the Standard Cross-Cultural Sample database to test the hypothesis that the transition to agriculture tends to lead to the transition to fixed settlement patterns. (This hypothesis predicts that reliance on agriculture should correlate positively with fixity of settlement, i.e., the more a culture relies on agriculture, the more fixed [as opposed to nomadic or migratory] its settlement is likely to be.)
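
To make “correlate positively” concrete: with two ordinal codes per society, a rank correlation such as Spearman's rho is positive when societies ranked higher on reliance on agriculture also tend to rank higher on fixity of settlement. The Python sketch below computes rho as the Pearson correlation of tie-averaged ranks. The five societies and their codes are invented purely for illustration and are not SCCS data; the fixity coding direction (higher = more fixed) is likewise an assumption. Chapter 5 treats correlation and significance properly.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL codes for five societies (not real SCCS values):
agriculture = [1, 3, 4, 6, 6]  # reliance on agriculture, 1 = none ... 6 = primarily agricultural
fixity = [1, 2, 2, 5, 6]       # settlement fixity, higher = more fixed (assumed coding)

rho = spearman(agriculture, fixity)
print(f"rho = {rho:.3f}")  # prints: rho = 0.947
```

A positive rho on real data would be consistent with the hypothesis; a rho near zero or negative would count against it.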

To do this we must first find the relevant variables in the database. To find them, open the file “CODES.DOC” and press CTRL + F.

You will see the following window:

The independent variable in our case is reliance on agriculture. However, we do not advise simply typing the name of the variable in the “Find what” line, because the same variable could be named in a number of different ways. Instead, type a keyword, i.e., a word that is bound to appear in the variable name however it is phrased. In our case this word seems to be simply “agriculture.” So type this word in the “Find what” line (in some versions it is labeled “Search for”), then press the “Find” (or “Find Next”) button.
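
The keyword strategy can be sketched in Python as a case-insensitive scan of the codebook's variable headers, which in this codebook begin with the variable number followed by a period. This is a sketch under assumptions: it supposes the codebook has been saved as plain text, it searches only header lines (unlike a word processor's Find, which also matches code labels), and the tiny excerpt contains only the two variable names quoted in this chapter, with the number 1 for INTERCOMMUNITY TRADE AS FOOD SOURCE taken from the “v1” column described in Section 2.

```python
import re

# Abbreviated, partly reconstructed excerpt of the codebook (assumption: the
# real CODES.DOC has been saved as plain text with one header per variable).
CODEBOOK_EXCERPT = """\
1. INTERCOMMUNITY TRADE AS FOOD SOURCE
3. AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY
   1 = None
   6 = Primarily agricultural
"""

# A variable header is a line starting with its number and a period.
HEADER_RE = re.compile(r"^(\d+)\.\s+(.*)$")


def find_variables(codebook_text, keyword):
    """Return (number, name) for every variable header containing keyword.

    The match is case-insensitive, like the default Find dialog, and the scan
    runs to the end of the codebook instead of stopping at the first hit.
    """
    hits = []
    needle = keyword.lower()
    for line in codebook_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match and needle in match.group(2).lower():
            hits.append((int(match.group(1)), match.group(2)))
    return hits


print(find_variables(CODEBOOK_EXCERPT, "agriculture"))
```

Because the function returns every matching header rather than the first one, it also fits the advice below to keep searching to the end of the variable list and compare all candidates.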

As a result you will get the following window:

As we can see, the first variable we have found is described as follows:

3. AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY (see note 3)
35   1 = None
3    2 = Non-food Crops
17   3 = < 10%
12   4 = < 50%, and less than any other single source, incl. trade
42   5 = < 50%, and more than any other single source, incl. trade
77   6 = Primarily agricultural

This variable appears to be quite appropriate for our task, so write down its number (3).
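
The numbers at the beginning of each code line are frequencies: how many societies in the sample received each code. As a quick check, they should add up to the size of the sample (186 societies, assuming no missing cases for this variable), and converting them to percentages shows how the sample is distributed. A minimal Python sketch using the counts listed above:

```python
# Frequencies for variable 3 (AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY),
# copied from the codebook entry above: code -> (label, number of societies).
V3_FREQUENCIES = {
    1: ("None", 35),
    2: ("Non-food Crops", 3),
    3: ("< 10%", 17),
    4: ("< 50%, and less than any other single source, incl. trade", 12),
    5: ("< 50%, and more than any other single source, incl. trade", 42),
    6: ("Primarily agricultural", 77),
}

SCCS_SIZE = 186  # number of societies in the Standard Cross-Cultural Sample

total = sum(count for _, count in V3_FREQUENCIES.values())
print(f"coded societies: {total} of {SCCS_SIZE}")  # prints: coded societies: 186 of 186

# Percentage of coded societies falling in each category.
for code, (label, count) in V3_FREQUENCIES.items():
    print(f"{code}  {count:3d}  {100 * count / total:5.1f}%  {label}")
```

If the total falls short of 186 for some other variable, the difference is the number of societies for which that variable was not coded.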

IMPORTANT NOTE: If you do serious cross-cultural research, we strongly advise you to continue your search. Many cross-cultural variables (including the one under consideration) were coded more than once, so more than one version of them has been published and is available in electronic form. The first variable you find is not always the best one for your purposes. We therefore suggest continuing the search until you reach the end of the variable list, writing down the numbers of relevant variables, comparing them, and choosing the most appropriate one. It often makes sense to perform several tests using all the variables you have found.

3 The figures at the beginning of each code line denote the number of Standard Cross-Cultural Sample cultures possessing the respective characteristic. E.g., the number 77 at the beginning of the last line indicates that 77 Standard Sample cultures are primarily agricultural.

However, since this is our first exercise, we will simplify our task and perform just one test, using the first variable on reliance on agriculture that we found.

Now we also have to find a variable on settlement fixity. To do this, press CTRL + F and type a new keyword in the “Find what” (or “Search for”) line. What keyword would you suggest? Perhaps the most obvious one here is simply “fixity.” So type this word and press the “Find” (or “Find Next”) button. You will see the following:

Again, the first variable we have found turns out to be quite appropriate (though, let us remind you, this will not always be the case!). So let us write down its number; after that we can go to the database.

Before doing so, however, let us find variables for testing another hypothesis: that the growth of population density leads to an increase in political complexity (so population density should correlate positively with political complexity). We need to find variables in the Standard Cross-Cultural Sample to test this hypothesis.

Let us start with population density. Follow the procedure described above, using “density” as the keyword. The result should look as follows:

Here too, the first variable we have found is appropriate for our task. So, after writing down its number, we can start looking for the other variable, political complexity.

You will soon see that in this case our task is not so simple. If you follow the procedure described above using “complexity” as the keyword, your first hit will look as follows:

As you can see, this is not the kind of “complexity” we need. If you continue searching, you will find “complexity” three more times in the codebook, but in all three cases the hits turn out to be just as useless.

So in this case we need a less straightforward solution. Let us recall the levels of political complexity we know. Perhaps the most frequently used scheme of political complexity levels is the one designed by Service (1962): band – tribe – chiefdom – state. Of these four terms, the one least likely to be used outside discussions of political complexity is “chiefdom,” so let us use it as the keyword.

The result will look as follows:

Now let us study the variable we have found. Its name may not suggest that it is what we need, but if we study how this variable is coded, we will see immediately that it is JUST what we need. So let us write down its number and move on.

Section 2: Working with the database

Our natural next step is to start working with the database itself. To do this, open the file SCCSDatabase.sav, which is on the CD that accompanies this manual and which you should already have copied to the hard disk of your computer. Find the file in its directory, then double-click the SPSS icon next to the file's name (we assume that SPSS is already installed on your computer).

You will see the following window:

Note that by default SPSS displays the raw numeric codes rather than their value labels. For example, the first line of the database looks as follows:

sccs#   socname          foc_year   v1   v2   v3
1       Nama Hottentot   1860       4    1    1

“1” is simply the case number. “socname” is the name of the culture, which in our case is “Nama Hottentot.” “1860” is the “focal year,” i.e., the year when the data were collected. If you are dealing with an SPSS database for the first time, however, the rest may look entirely puzzling. For example, what could “v1 = 4” mean? In fact, it is not at all difficult to find out what v1 is: just move your mouse pointer over “v1” and you will see the following:
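
The layout just described, with one row per case, one column per variable, and a descriptive label attached to each column name, can be mimicked in Python to show what the mouse-over is doing: looking up the label stored for a column name. This is a sketch, not how SPSS stores the file. The row is the first line shown above; the labels for v1 and v3 are the variable names quoted in this chapter, v2's label is not given here and is left as a placeholder, and the describe helper is hypothetical.

```python
# First row of the database as shown above: column name -> value.
first_row = {"sccs#": 1, "socname": "Nama Hottentot", "foc_year": 1860,
             "v1": 4, "v2": 1, "v3": 1}

# Variable labels (what each column measures). v2's label is not quoted in
# this chapter, so it is only a placeholder here.
VARIABLE_LABELS = {
    "v1": "INTERCOMMUNITY TRADE AS FOOD SOURCE",
    "v2": "(label not shown in this chapter)",
    "v3": "AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY",
}


def describe(row, column):
    """Return 'column (label) = value' for one cell, like hovering over a column in SPSS."""
    label = VARIABLE_LABELS.get(column, "no label")
    return f"{column} ({label}) = {row[column]}"


print(describe(first_row, "v1"))  # prints: v1 (INTERCOMMUNITY TRADE AS FOOD SOURCE) = 4
```

The variable label tells you what a column measures, but not yet what the code 4 stands for; that is the job of value labels.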

In this way you can easily find out that the database column marked “v1” contains information on intercommunity trade as a food source. (You can find the same information for every other variable in the same way.) But what does the “4” in the v1 column of the Nama Hottentot row mean? Finding out what this number means (as well as all the other numbers/codes in the database) is very easy. As shown in the following window, just choose from the menu bar:

VIEW →VALUE LABELS
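
Turning on value labels replaces each numeric code with the category name defined for that variable. The same lookup can be sketched in Python with one dictionary per variable. Here the labels for v3 are copied from the codebook entry for variable 3 quoted in Section 1, and a code with no defined label is shown as the raw number. The dictionary layout and the label_value helper are illustrative assumptions, not SPSS's internal format.

```python
# Value labels for variable 3, copied from the codebook entry in Section 1.
VALUE_LABELS = {
    "v3": {
        1: "None",
        2: "Non-food Crops",
        3: "< 10%",
        4: "< 50%, and less than any other single source, incl. trade",
        5: "< 50%, and more than any other single source, incl. trade",
        6: "Primarily agricultural",
    },
}


def label_value(column, code):
    """Return the value label for code, or the code itself if no label is defined."""
    return VALUE_LABELS.get(column, {}).get(code, str(code))


# The Nama Hottentot row has v3 = 1 in the first line of the database.
print(label_value("v3", 1))  # prints the label string: None
print(label_value("v3", 9))  # no label defined, so the raw code is shown: 9
```

If you prefer to work in Python rather than SPSS, the third-party pyreadstat package can read a .sav file together with its metadata (pyreadstat.read_sav returns the data and a metadata object that includes column_names_to_labels and variable_value_labels); check its documentation for the exact options before relying on it.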

Immediately, the database will look entirely different:

(Unlike the file shown in this window, SCCSDatabase.sav does not have a foc_year column.)

Now you can easily see what the top cell in the “v1” column contains: the Nama Hottentot in 1860 got less than 10% of their food through intercommunity trade. The information in all the other cells of the database can now be understood just as easily.

Now we can start analyzing the database. Chapter 3 begins with scatterplots and maps, and Chapter 4 continues with how to do cross-tabulations. You will then be ready for Chapter 5, a mini-course in statistics that you can use as a reference in choosing and interpreting correlations, in evaluating tests of significance, and in evaluating your hypotheses generally.

Chapter 6 provides additional help in reading and interpreting cross-tabulations from the broader perspective of the anthropological sciences, and Chapter 7 presents advanced methods for developing and evaluating your measures through one-factor constructed variables and for testing hypotheses using third factors as controls.

In this chapter you have begun your journey into the anthropological sciences through a cross-cultural research project; the following chapters will let you finish it. Because this book is electronic, and therefore easily revised, we would appreciate your feedback on the good rapids and the bad. As in river-rafting, perhaps the best approach is to deal with each turn and problem as you encounter it, returning to these chapters and to the online links for the course as needed.
