Analysing Survey Data


Part II

Analysing Survey Data

Choosing the Right Data Analysis Techniques


Introduction to Part II

In data construction the focus is on the entry of a single value at a time for a single respondent on a single variable. The focus of data analysis, by contrast, is upon data in the aggregate, and the individual respondent along with his or her associated values 'disappear' in the sense that they can no longer be identified from the results. While data construction and data analysis are separate processes, they are nevertheless interdependent. Both the design of the data matrix and its analysis must reflect the objectives of the research as outlined by the researcher or as agreed between researcher and client. The particular analysis procedures deployed must reflect a clear understanding of how the data were constructed, and in particular the measurement and scaling procedures used. The fact that data construction and data analysis are separate processes, however, means that good quality data can be used for a number of purposes using a variety of different data analysis techniques. What is called 'secondary analysis' and the emergence of large data archives depend on this ability to separate data construction and data analysis.

Data analysis is the process whereby researchers take the raw data that have been entered into the data matrix and create information that can be used to tackle the objectives for which the research was undertaken. The raw data are of little informative value themselves until they have been structured, summarised and a range of conclusions drawn from them. Such conclusions, furthermore, need to be relevant to the objectives of the research.

Data will have been entered row by row, and analysis now proceeds by performing a range of operations on the columns. Before any researcher can decide what analysis techniques to deploy, three key questions need to be answered:

• What does the researcher want to do with the data?

• On what type of scale are the data recorded?

• How many variables are to be entered into the analysis?

Answering these questions may be seen as three key steps that have to do with establishing the objectives of the analysis, the scale type and the number of variables.

ANALYSIS OBJECTIVES

The researcher may wish to do one or more of three main things with the data in the data matrix:

• display the data,

• summarise the data,

• draw conclusions from the data.

Data display takes the raw data and presents them in tables, charts or graphs so that it is possible for readers to 'eyeball' the total distribution on a single variable or to observe the pattern of relationships between two or more variables. Chapters 4 and 5 look at tables and charts for categorical and for interval variables respectively.

Summarising data uses statistical methods like calculating an average on a single variable, or measures of association or correlation on two or more variables, to reduce the data to a few key summary measures. Chapters 6 and 7 explain summary measures, again respectively, for categorical and for interval variables.

Data display and data summaries are components of what is commonly referred to as 'descriptive' statistics. Drawing conclusions may involve one or more of three main activities:

• statistical inference,

• evaluating hypotheses against the data,

• explaining discovered relationships between variables.

Sometimes the data in the data matrix relate to a set of respondents who are part of a sample that was chosen using probability (random) methods. The issues of when survey researchers take samples, sample design and the errors that may arise from the sampling process are taken up in Chapter 8. The researcher who has taken a sample may wish to make estimates based on the sample of total, proportional or average values for the population of cases from which the sample was drawn. Alternatively, the researcher may wish to test statements or hypotheses made about the population of cases against the probability that survey research findings were, in fact, a result of random sampling fluctuations. Chapters 9 and 10 explain how researchers make inferences, again respectively, for categorical and for interval variables.

These procedures are known variously as 'inferential' statistics, 'statistical inference', 'significance testing' or 'testing statistical significance'. Making estimates is, unsurprisingly, generally referred to as 'estimation'. However, the second procedure is almost universally called 'hypothesis-testing', which in many ways is unfortunate, because testing the statistical significance of a statement is only one of several ways in which hypotheses may be evaluated.

In evaluating hypotheses, the researcher is more concerned about the extent to which the data in the data matrix 'fit' or support his or her initial ideas, hunches or formally stated hypotheses. Hypotheses come in many different forms: they may be stated formally before the data analysis begins, or they may emerge during, or even after, the analysis. All these different circumstances affect the ways in which they may be appropriately evaluated. The first part of Chapter 11 takes up the theme of evaluating hypotheses in rather more detail.

Once hypotheses have been evaluated, it might still be necessary to explain why the research findings appear to be as they are. What counts as an 'explanation', however, can vary enormously from causal analysis to providing understanding or discovering a dialectic. The second part of Chapter 11 looks at these issues.

SCALE TYPE

Chapter 2 introduced you to different types of scale. It made a basic distinction between categorical and interval scales with the former sub-divided into labelling, binary, nominal, ordinal and ranked, and the latter sub-divided into discrete and continuous. The type of scale crucially affects the kind of statistical operations that may be performed on the data, so after clarifying what he or she wants to do with the data, the researcher must be very clear about the nature of the scale for each variable being mapped into the matrix. Chapters 4, 6 and 9 deal specifically with categorical variables and what can be done with them by way of tables, charts, summaries and statistical inference. The various sub-types of categorical scale also assume an important role. Chapters 5, 7 and 10 look at interval variables and what can be done with these by way of tables, charts, summaries and statistical inference.

THE NUMBER OF VARIABLES

When approaching a data matrix the first thing a researcher needs to do is to look at the distribution of each variable, one at a time. This is usually called 'univariate' analysis. So, the researcher might use data display, data summary or statistical inference separately on each variable. Which particular techniques are used depends on the scale involved. For categorical variables it would be usual to get SPSS to create univariate (or 'one-way') tables, bar charts or pie charts. Chapter 4 explains how this is done. For interval variables it is also possible to create one-way tables for discrete variables, but for continuous variables it would normally be necessary to group the values into class intervals before this can be done. The procedures are explained in Chapter 5 along with other ways of displaying interval variables. Which summary measures and what procedures for statistical inference can be used on variables one at a time similarly depend on the scale involved. Chapter 6 deals with summary measures for categorical variables and Chapter 7 with summary measures for interval variables. (Univariate statistical inference is explained in Chapters 9 and 10.)
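The idea of a one-way table, and of grouping a continuous variable into class intervals before tabulating it, can be sketched outside SPSS. The Python fragment below is only an illustration: the function names, the variables (tenure, income) and the data values are all invented for the example, and the interval width of 10 is an arbitrary assumption.

```python
from collections import Counter


def one_way_table(values):
    """Return (value, count, percent) rows for a one-way frequency table."""
    counts = Counter(values)
    total = len(values)
    return [(v, n, round(100 * n / total, 1)) for v, n in sorted(counts.items())]


def group_into_intervals(values, width, start=0):
    """Replace each value by the lower bound of its class interval [lo, lo + width)."""
    return [start + width * int((v - start) // width) for v in values]


# Hypothetical data, invented for illustration only.
tenure = ["own", "rent", "own", "other", "rent", "own"]   # categorical variable
income = [12.5, 31.0, 18.2, 44.9, 27.3, 30.0]             # continuous variable

# A categorical variable can be tabulated directly.
tenure_table = one_way_table(tenure)

# A continuous variable is first grouped into class intervals of width 10.
income_table = one_way_table(group_into_intervals(income, width=10))
```

Each row of the result holds a category (or the lower bound of a class interval), its frequency and its percentage of all cases, which is roughly the information a one-way table reports.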

Univariate analysis, however, tells the researcher nothing about the relationships between the variables. 'Bivariate' analysis takes variables two at a time to see whether there is any pattern in the way the values of the two variables jointly occur. If the two variables are categorical it is possible to display the relationship between them in a crosstabulation. Crosstabulations are explained in Chapter 4, while Chapter 6 considers how it is possible to calculate summary measures for two crosstabulated variables. If both are interval then the relationship may be graphed in a scattergram. (These are explained in Chapter 5.) Relationships between two interval variables may be summarised by using correlation and regression. Similarly, it is possible to undertake statistical inference for bivariate relationships (Chapters 9 and 10).
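As a rough illustration of the two bivariate cases, the sketch below counts joint occurrences for two categorical variables (a crosstabulation) and computes a Pearson correlation coefficient for two interval variables. Pearson's r is used here as one common correlation measure; the text does not say which coefficient the later chapters adopt. The function names, variable names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def crosstab(rows, cols):
    """Count how often each (row value, column value) pair occurs jointly."""
    return Counter(zip(rows, cols))


def pearson_r(x, y):
    """Pearson product-moment correlation between two interval variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
sex = ["f", "m", "f", "m", "f"]
voted = ["yes", "no", "yes", "yes", "no"]
table = crosstab(sex, voted)      # cell counts keyed by (sex, voted)

hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
r = pearson_r(hours, score)       # the two variables are perfectly linearly related
```

A cell such as `table[("f", "yes")]` gives the number of cases that fall into both categories at once, which is exactly what a crosstabulation displays.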

Bivariate analysis is limited to looking at the relationships between variables two at a time; multivariate analysis techniques allow the analysis of three or more variables simultaneously. Multivariate analysis has a number of advantages over univariate and bivariate procedures, namely:

• it permits conclusions to be drawn about the nature of causal connections between variables (establishing causality is discussed in Chapter 11),

• it facilitates the grouping together of variables that are inter-related, or cases that are similar in terms of their characteristics,

• it provides the ability to predict dependent variables from two or more independent variables and hence improve on predictions made on the basis of only one variable.

Where all the variables to be used in multivariate analysis are categorical, then it is possible to 'layer' or 'control' the relationship between two variables by a third, fourth, fifth variable and so on in the process of crosstabulation. How this is done is explained in Chapter 4. Where all the variables are interval then much more sophisticated techniques like multiple regression, factor analysis, and cluster analysis are possible. This book considers these techniques briefly and shows you how to obtain them using SPSS. However, for a more detailed consideration, other sources are recommended.

Where variables to be used in multivariate analysis are a mixture of categorical and interval then other techniques like analysis of variance may be used. (This is explained in Chapter 10.)
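The 'layering' of a crosstabulation by a third categorical variable, mentioned above, amounts to producing one two-way table for each value of the control variable. A minimal sketch, assuming three hypothetical variables (sex, voted, age_group) with invented values, might look like this:

```python
from collections import Counter, defaultdict


def layered_crosstab(rows, cols, layers):
    """Build one table of (row, col) counts for each value of the control variable."""
    tables = defaultdict(Counter)
    for r, c, layer in zip(rows, cols, layers):
        tables[layer][(r, c)] += 1
    return dict(tables)


# Hypothetical data, invented for illustration only.
sex = ["f", "m", "f", "m", "f", "m"]
voted = ["yes", "no", "yes", "yes", "no", "no"]
age_group = ["young", "young", "old", "old", "old", "young"]

# The sex-by-voted relationship, controlled for age group.
tables = layered_crosstab(sex, voted, age_group)
```

Comparing the table for one layer with the table for another shows whether the relationship between the first two variables changes once the third is held constant.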

CHOOSING DATA ANALYSIS TECHNIQUES

Data analysis is not a 'one-off' enterprise that a researcher undertakes on one single occasion. Rather, it is an iterative process in which the researcher moves backwards and forwards between the objectives of the research and a number of 'sessions' of analysing data from the matrix. For each session, the researcher needs to answer the three questions posed at the beginning of this section:

• What do I want to do with the data - display, summarise or draw conclusions?

• On what type of scale are the variables recorded - categorical or interval?

• How many variables do I want to enter into a single analysis - one, two or more than two?

Thus a researcher in one session may wish to display the relationship between two categorical variables and in another may wish to summarise a single interval variable. The various factors that will affect the researcher's choice of technique are summarised in Figure II.1, which can be seen as a kind of 'map' of the rest of this book. Notice that, for the sake of completeness, there is a fourth element in Figure II.1, namely respondents. If these constitute a random sample then it is possible to deploy statistical inference. If the sample is non-random or is a census or an attempt at a census, then such techniques are not appropriate.
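The way these factors combine can be sketched as a small lookup. The fragment below is not a reproduction of the figure; it simply encodes the chapter assignments given in this introduction (Chapters 4 to 7, 9 and 10) together with the rule that statistical inference presupposes a random sample. The function names and the labels "display", "summarise" and "infer" are assumptions made for the example, and "infer" covers statistical inference only, not the evaluation or explanation of hypotheses taken up in Chapter 11.

```python
# Chapter covering each combination of objective and scale type,
# as stated in this introduction.
CHAPTER = {
    ("display", "categorical"): 4,
    ("display", "interval"): 5,
    ("summarise", "categorical"): 6,
    ("summarise", "interval"): 7,
    ("infer", "categorical"): 9,
    ("infer", "interval"): 10,
}


def analysis_label(n_variables):
    """Name the kind of analysis implied by the number of variables."""
    if n_variables < 1:
        raise ValueError("at least one variable is needed")
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"


def plan_session(objective, scale, n_variables, random_sample):
    """Return (kind of analysis, chapter) for one analysis session."""
    if objective == "infer" and not random_sample:
        # Inference is not appropriate for non-random samples or censuses.
        raise ValueError("statistical inference assumes a random sample")
    return analysis_label(n_variables), CHAPTER[(objective, scale)]


# The two sessions mentioned in the text above.
first = plan_session("display", "categorical", 2, random_sample=False)
second = plan_session("summarise", "interval", 1, random_sample=False)
```

Each session answers the three questions afresh, so the same data matrix may pass through this kind of choice many times in the course of one project.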

Further reading

Fink, A. (1995) How to analyse survey data, London, Sage. A very simple and straightforward introduction to the types of statistics that are useful for analysing data from surveys. A good book to begin with if you are approaching statistics for the first time. However, there is little on analysing categorical data and no mention of crosstabulation. Selecting statistical methods for surveys appears to entail picking your test of statistical significance. There is a brief introduction to SPSS output, but it refers to SPSS/PC+, which is little used these days.

78 - Data Construction and Data Analysis for Survey Research

Figure II.1 Factors determining choice of technique

-univariate