Sociology Name of Paper: Methodology of Research in Sociology
Name of Module: Processing and Analyzing Quantitative Data
Module Detail and its Structure
Subject Name Sociology
Paper Name Methodology of Research in Sociology
Module Name/Title Processing and Analyzing Quantitative Data
Module Id RMS 20
Pre-requisites Some knowledge of social statistics
Objectives This module will deal with the issues involved in handling, managing and
interpreting quantitative data collected in the course of research. It will also discuss the
basic statistical tools with the help of which we analyse social phenomena.
Keywords Coding, editing, statistics, quantitative research, measures of central tendency,
dispersion, correlation coefficient and regression.
Role in Content
Development
Name Affiliation
Principal Investigator Prof. Sujata Patel Dept. of Sociology,
University of Hyderabad
Paper Co-ordinator Prof. Biswajit Ghosh Professor, Department of Sociology, The
University of Burdwan, Burdwan 713104
Email: [email protected]
Ph. M +91 9002769014
Content Writer Dr. Udita Mitra Assistant Professor, Department of Sociology,
Shri Shikshayatan College, Kolkata-700095
Email: [email protected]
Ph. M +91 9433213816
Ph. L (O) 033-24140594
Content Reviewer (CR)
& Language Editor
Prof. Biswajit Ghosh Professor, Department of Sociology, The
University of Burdwan, Burdwan 713104
Contents
1. Objective
2. Introduction
3. Learning Outcome
4. Data Processing
   4.1 Editing
   4.2 Coding
   4.3 Classification
   4.4 Tabulation
   Self-check Exercise – 1
5. Data Analysis
6. Statistics in Social Research
   Self-check Exercise – 2
   6.1 Measures of Central Tendency
   6.2 Measures of Dispersion
   6.3 Chi-Square Test
   6.4 T-test
   6.5 Measures of Relationship
   Self-check Exercise – 3
7. Limitations of Statistics in Sociology
8. Summary
9. References
1. Objective
This module will deal with the issues involved in handling, managing and interpreting quantitative
data collected in the course of research. It will also discuss the basic statistical tools with the help of
which we analyse social phenomena.
2. Introduction
Quantitative research can be construed as a research strategy that emphasizes quantification in the
collection and analysis of data. It entails a deductive approach to the relationship between theory and
research in which the accent is placed on testing the theories. Quantitative research usually
incorporates the practices and norms of the natural scientific model and of positivism in particular and
it also embodies a view of social reality as an external, objective reality (Bryman 2004: 19). It also
has a preoccupation with measurement and involves collecting a large amount of data. These data
may be collected in various ways, such as through surveys and field research. The data, after
collection, have to be processed in order to ensure their proper analysis and interpretation. According
to Kothari (2004), technically, processing implies editing, coding, classification and tabulation of
collected data so that they are amenable to analysis. These endeavours help us to search for patterns of
relationship that exist among data-groups (Ibid.: 122).
3. Learning Outcome
This module will help you to understand different issues involved in processing and analysing
quantitative data. It will also help you to grasp the essential steps of applying various statistical
measures in order to interpret data collected through social research.
4. Data Processing
Data reduction or processing mainly involves various steps necessary for preparing the data for
analysis. These steps involve editing, categorising the open-ended questions, coding, computerization
and preparation of tables (Ahuja 2007: 304). The processing of data is an essential step before
analysis because it enables us to overcome the errors at the stage of data collection.
4.1. Editing
According to Majumdar (2005), error can creep in at any stage of social research, especially during
data collection. These errors have to be kept to a minimum to avoid errors in the results of the
research. Editing, or checking the completed questionnaires for errors, is a laborious exercise and
needs to be done meticulously. Interviewers tend to commit mistakes: some questions are missed out,
and some answers remain unrecorded or are recorded in the wrong places. The questionnaires
therefore need to be checked for completeness, accuracy and uniformity (Ibid.: 310).
4.2. Coding
Coding is the process of assigning numbers or other symbols to answers so that they can be
categorized into specific classes. Such classes should be appropriate to the research problem under
consideration (Kothari 2004: 123). Care should be taken not to leave any response uncoded.
According to Majumdar (2005: 313), a set of categories is referred to as a “coding
frame” or “code book”. Code book explains how to assign numerical codes for response categories
received in the questionnaire/schedule. It also indicates the location of a variable on computer cards.
Ahuja (2007: 306) provides an example to illustrate how variables can be coded. In a question
regarding the religion of the respondent the answer categories of Hindu, Muslim, Sikh, and Christian
can be coded as 1, 2, 3, and 4 respectively. In such cases, the counting of frequencies will not be
according to Hindus, Muslims etc., but as 1, 2 and so on. Coding can be done manually or with the
help of computers.
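The coding step above can be sketched in a few lines. The code book below follows Ahuja's religion example; the particular responses are made up for illustration.

```python
from collections import Counter

# Code book following Ahuja's example: religion categories assigned codes 1-4.
code_book = {"Hindu": 1, "Muslim": 2, "Sikh": 3, "Christian": 4}

# Illustrative (hypothetical) responses from a questionnaire.
responses = ["Hindu", "Muslim", "Hindu", "Christian", "Sikh", "Hindu"]

# Coding: replace each answer by its numerical code.
coded = [code_book[r] for r in responses]
print(coded)            # [1, 2, 1, 4, 3, 1]

# Frequencies are then counted over the codes, not the labels.
print(Counter(coded))
```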
4.3. Classification
Besides editing and coding of data, classification is another important method to process data.
Classification has been defined as the process of arranging data into groups and classes on the basis of
some common characteristics (Kothari 2004: 123). Classification can be of two types:
i. classification according to attributes, i.e. common characteristics like gender, literacy etc., and
ii. classification according to class intervals, whereby the entire range of data is divided into a
number of classes or class intervals.
4.4. Tabulation
Tabulation is the process of summarising raw data and displaying the same in compact form for
further analysis (Kothari 2004: 127). Tabulating raw data is necessary because:
i. it conserves space and reduces explanatory and descriptive statements to a minimum, and
ii. it provides a basis for various statistical computations.
Tabulation can be done manually as well as with electronic and mechanical devices like computers.
When the data are not large in number, tabulation can be done by hand with the help of tally marks.
Self-check Exercise – 1
Question 1. Tabulate the following examination grades for 80 students.
72, 49, 81, 52, 31,38,81, 58,68, 73, 43, 56, 45, 54, 40, 81, 60, 52, 52, 38, 79, 83, 63, 58, 59, 71, 89, 73,
77, 60, 65, 60, 69, 88, 75, 59, 52, 75, 70, 93, 90, 62, 91, 61, 53, 83, 32, 49, 39, 57, 39, 28, 67, 74, 61,
42, 39, 76, 68, 65, 58, 49, 72, 29, 70, 56, 48, 60, 36, 79, 72, 65, 40, 49, 37, 63, 72, 58, 62, 46 (Levin
and Fox 2006).
Procedures for Tabulation/Grouping of Data
The above is an array of scores which otherwise would not be very handy to use. In order to make the
data meaningful and useful it must be organized and classified into frequency tables. There are certain
easy steps to be followed in order to convert the raw scores into frequency tables.
i. We must first find the difference between the highest and the lowest score in the series. In
the above case the difference is 65 (93-28). To it we must add 1 to bring in the entire range
of scores. So it becomes 66.
ii. Next, we would have to assume the number of class intervals that would best summarise the
entire range of scores. In this case we assume the number of intervals as 10.
iii. Now we would divide the range of scores by the number of class intervals to obtain the
width (denoted as i) of the class interval. Here it would be 66/10 = 6.6, which we round off
to a whole number; we take i = 6.
iv. To the lowest score in the series we add (i – 1) to get the first class interval. In this case it
would be 28 + (6 – 1) = 33, giving the interval 28 to 33.
v. We take the higher integer from the upper limit of the class interval and repeat step iv to get
the next class interval. In this way we would obtain the class intervals and put the
frequencies in the respective class intervals (Elifson 1997).
Answer: The complete frequency distribution of examination grades for the 80 students is the following:
Class Interval Frequencies
28-33 4
34-39 7
40-45 5
46-51 6
52-57 9
58-63 16
64-69 7
70-75 12
76-81 7
82-87 2
88-93 5
N = 80
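The grouping procedure above can be sketched in a short program, assuming width-6 intervals starting from the lowest score as in the worked answer.

```python
from collections import Counter

# The 80 examination grades from the exercise.
scores = [72, 49, 81, 52, 31, 38, 81, 58, 68, 73, 43, 56, 45, 54, 40, 81, 60,
          52, 52, 38, 79, 83, 63, 58, 59, 71, 89, 73, 77, 60, 65, 60, 69, 88,
          75, 59, 52, 75, 70, 93, 90, 62, 91, 61, 53, 83, 32, 49, 39, 57, 39,
          28, 67, 74, 61, 42, 39, 76, 68, 65, 58, 49, 72, 29, 70, 56, 48, 60,
          36, 79, 72, 65, 40, 49, 37, 63, 72, 58, 62, 46]

width = 6                      # class-interval width i
low = min(scores)              # lowest score, 28
# Assign each score to a width-6 interval and count the frequencies.
counts = Counter((s - low) // width for s in scores)
for k in sorted(counts):
    lower = low + k * width
    print(f"{lower}-{lower + width - 1}: {counts[k]}")
```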
5. Data Analysis
The term ‘data analysis’ refers to the computation of certain indices or measures along with searching
for patterns of relationship that exist among the data groups. Analysis, particularly in case of survey
or experimental data (quantitative data), involves estimating the values of unknown parameters of the
population and testing of hypothesis for drawing inferences (Kothari 2004: 130). Quantitative data
analysis occurs typically at a late stage in the research process. But this does not mean that the
researchers should not be considering how they will analyse their data at the beginning of the
research. During the designing phase of the questionnaire or observation schedule, the researchers
should be fully aware of the techniques of data analysis. In other words, the kinds of data the
researchers will collect and the size of the sample have implications for the sorts of analysis that can
be applied (Bryman 2004).
6. Statistics in Social Research
The task of analysing quantitative data in research is carried out by social statistics. Social statistics
has two major areas of function in research: the descriptive and the inferential. Descriptive statistics
is concerned with organizing raw data obtained in the process of research. Tabulation and
classification of data are instances of descriptive statistics. Inferential statistics is concerned with
making inferences or conclusions from the data collected from the sample and drawing
generalisations on the entire population (Elifson 1997). Inferential statistics is also known as sampling
statistics and it is concerned with two major types of problems:
the estimation of population parameters, and
the testing of statistical hypotheses (Kothari 2004: 131)
Some of the most important and useful statistical measures that would be taken up for discussion in
the present module are:
measures of central tendency or statistical averages
measures of dispersion
chi-square test
t-test
measures of relationship
From the next section we are going to take up each for discussion.
Self-check Exercise – 2
1. How does descriptive statistics work?
Descriptive statistics tries to describe and summarize the mass of data that is obtained in the
process of conducting research. It tries to do so with the help of some specific measures. The
very first step of organizing data would be to arrange the raw scores into a number of categories
known as frequency tables. After it is done, the next step would be to represent the data through
various graphs and figures. Some of these would be bar graph, pie chart, frequency polygon etc.
2. What is inferential statistics?
Inferential statistics deals with the task of drawing inferences on the population by studying the
sample drawn from that population. The reasons why we infer on the findings of a sample can be
many. Insufficient resources in terms of money and man power can force a researcher to draw a
sample from the population. Time available for a research may also be short and inadequate to
study an entire population. Statistics can be of great help in generalizing findings. It needs to be
mentioned here that errors inevitably appear in the process of sampling, but researchers may
adopt various methods to minimize them. The prefix ‘social’ is attached to statistics due to its
application to interpret social phenomena.
6.1. Measures of Central Tendency
When the scores have been tabulated into a frequency distribution, the next task is to calculate a
measure of central tendency or central position. The measure of central tendency defines a value
around which items have a tendency to cluster. The importance of the Measure of Central Tendency is
twofold. First, it is an “average” which represents all the scores in a distribution and gives a precise
picture of the entire distribution. Second, it enables us to compare two or more groups in terms of
typical performance. Three “averages” or measures of central tendency are commonly used:
Arithmetic Mean, Median and Mode (Garrett 1981: 27).
i) Arithmetic Mean: Mean is known as arithmetic average and is the most stable measure of central
tendency. It is defined as the summation of all the values given in the series of numbers divided by the
number of values. Mean can be calculated through different methods:
a) Calculation of the Mean from Ungrouped Scores: This can be computed by the following equation:
X̄ = (X₁ + X₂ + X₃ + … + Xₙ)/n, where the X values are the individual scores and ‘n’ is the number
of scores.
In the case of the following scores, the mean can be found out by the above formula (Garrett 1981):
8, 5, 4, 7, 9, 10
X̄ = (8 + 5 + 4 + 7 + 9 + 10)/6 = 43/6 ≈ 7.17
b) Calculation of the Mean from Grouped Scores: In case of computing the mean from a grouped
frequency distribution, the mean is calculated by a slightly different method from that given above.
Thus, it can be computed by the following formula:
X̄ = ∑fX/N, where ‘X’ is the midpoint of the class interval, ‘f’ is the frequency assigned to each class
interval, ∑ is the summation operator and ‘N’ is the total frequency. The calculation is shown in the
table below (see Garrett 1981, for details):
Class Intervals Frequencies Midpoint (X) fX
140-144 1 142 142
145-149 3 147 441
150-154 2 152 304
155-159 4 157 628
160-164 4 162 648
165-169 6 167 1002
170-174 10 172 1720
175-179 8 177 1416
180-184 5 182 910
185-189 4 187 748
190-194 2 192 384
195-199 1 197 197
N = 50 ∑fX = 8540
The Mean will be ∑fX/N = 8540/50 = 170.8.
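The grouped-mean calculation above can be sketched as:

```python
# Midpoints and frequencies from the table above.
midpoints   = [142, 147, 152, 157, 162, 167, 172, 177, 182, 187, 192, 197]
frequencies = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

n = sum(frequencies)                                   # N = 50
# Weight each midpoint by its frequency, then divide by the total frequency.
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
print(mean)  # 170.8
```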
ii) Median: Median is the middle most value in the entire distribution of data. It divides the
distribution into two equal parts: one half of the distribution falls below the median value and the
other half falls above it. Before calculating the median we have to arrange the values in either
ascending or descending order. It is a positional average. It is shown by the following formula:
M = value of the ((n+1)/2)th item
It should be mentioned in this context that the median is usually used to describe qualitative
phenomena like intelligence. It is not often used in sampling statistics (Kothari 2004: 133).
a) Computation of the Median when data are Ungrouped: Two situations arise in the computation of
the Median from ungrouped data: a) when N is odd, and b) when N is even. To consider the first case
where N is odd, suppose we have the following numbers: 7, 10, 8, 12, 9, 11, 7. First we have to
arrange these data in an ascending order like 7, 7, 8, 9, 10, 11, 12. Then we apply the above equation
to compute the median.
M = value of the ((n+1)/2)th item, where ‘n’ is the number of scores
= value of the ((7+1)/2)th, that is, the 4th item
M = 9.
When the total number of scores is even like 7, 8, 9, 10, 11, 12, the median is the average of the two
middlemost numbers. In the above numbers, the two middlemost numbers are 9 and 10. The average
of these numbers is 19/2 or 9.5.
b) Computation of the Median when data are Grouped: When the scores are arranged into a
frequency distribution, the median by definition is the 50% point in the distribution. We calculate the
cumulative frequency of the distribution and divide N by 2 to locate the class interval in which the
median falls. The following equation would help us to compute the median from a grouped frequency
distribution:
Mdn = l + ((N/2 − F)/fm) × i, where l is the exact lower limit of the class interval upon which the
median lies, N/2 is one half of the total number of scores, F is the sum of the frequencies on all
intervals below l, fm is the frequency within the interval upon which the median falls, and i is the
width of the class interval.
The computation of the median is shown in the following table:
Class Intervals Frequencies Cumulative Frequencies
140-144 1 1
145-149 3 4
150-154 2 6
155-159 4 10
160-164 4 14
165-169 6 20
170-174 10 30
175-179 8 38
180-184 5 43
185-189 4 47
190-194 2 49
195-199 1 50
N = 50
When we divide N, or 50, by 2 we get 25. With its help we locate the class interval in which the
median lies as 170-174 (since its cumulative frequency of 30 includes 25). Next we compute the
median with the help of the equation above:
Here ‘l’ would be (170 − 0.5) = 169.5
Mdn = 169.5 + ((25 − 20)/10) × 5
= 169.5 + 2.5 = 172.
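Both median computations can be sketched as follows (the helper function names are mine):

```python
def median_ungrouped(scores):
    """Middle value of the sorted scores; the average of the two middle
    values when the number of scores is even."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

print(median_ungrouped([7, 10, 8, 12, 9, 11, 7]))       # 9
print(median_ungrouped([7, 8, 9, 10, 11, 12]))          # 9.5

def median_grouped(l, N, F, fm, i):
    """Mdn = l + ((N/2 - F) / fm) * i, with the symbols defined above."""
    return l + ((N / 2 - F) / fm) * i

print(median_grouped(l=169.5, N=50, F=20, fm=10, i=5))  # 172.0
```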
iii) Mode: When a rough and quick estimate of central tendency is wanted, mode is usually the most
preferred measure. Mode is that value which has the greatest frequency in the given series of scores.
Like median, mode is also a positional average and is therefore unaffected by extreme scores in the
series of numbers. It is useful in all situations where we want to eliminate the effect of extreme
variations (Kothari 2004: 133).
a) Calculating Mode from Ungrouped Data: In a simple ungrouped data, the mode is that single
measure or score which occurs most frequently. For instance, in the series of numbers 10, 11, 11,
12, 12, 13, 13, 13, 14, 14, the crude mode is 13 (the most frequently occurring value).
b) Calculating the Mode from Grouped Data: When the data are grouped into a frequency
distribution, the crude mode is found out by the midpoint of the interval which contains the highest
frequency. In the case of the above table, the value of the mode would be 172, the midpoint of the
class interval 170-174 (Garrett 1981). We can also calculate the true mode from a grouped frequency
distribution. The formula for calculating the true mode in a normal or symmetrical distribution is:
Mode = 3 Mdn – 2 Mean (ibid).
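The crude mode and Garrett's true-mode formula can be sketched as:

```python
from collections import Counter

# Crude mode of ungrouped scores: the value occurring most frequently.
scores = [10, 11, 11, 12, 12, 13, 13, 13, 14, 14]
crude_mode = Counter(scores).most_common(1)[0][0]
print(crude_mode)  # 13

# True mode of a roughly symmetrical grouped distribution, using the
# median (172) and mean (170.8) computed earlier in this module.
true_mode = 3 * 172 - 2 * 170.8
print(round(true_mode, 1))  # 174.4
```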
iv) When to Use the Various Measures of Central Tendency: The situations in which the three
measures are used are stated below:
a) The Mean is used when
The scores are distributed symmetrically around a central point
The central tendency having the greatest stability is wanted
Other statistics like standard deviation and correlation coefficient are to be computed later.
b) The Median is used when
The exact midpoint of the distribution is all that is wanted
There are extreme scores which would distort the mean but do not affect the median.
c) The Mode is used when
A rough and quick estimate of central tendency is all that is wanted
The measure of central tendency should be the most typical value (Garrett 1981).
The choice of average depends on the researcher and the objectives of the study. Only then will the
statistical computation of averages be effective and useful in the interpretation of data.
6.2. Measures of Dispersion (Range, Interquartile Range, Mean Deviation or Average Deviation
and Standard Deviation)
Measures of central tendency like the mean, median and mode can only be representative of the entire
series of scores. But they cannot fully describe the nature of a frequency distribution. For instance, they
cannot state how far a given score in a series deviates from the average. In other words, how much a
score is lower or higher than the average? Therefore, in order to measure this spread of score from the
central tendency, we calculate the measures of dispersion or variability. There are different measures
of dispersion. They are the range, mean deviation and standard deviation.
i) Range: Range is the simplest and the easiest measure of variability. It is usually calculated by
subtracting the lowest score from the highest score in the given series of data. The value of the range
depends on only two values and this is its main limitation. It ignores the remaining values in the
distribution and therefore it fails to provide an accurate and stable picture of the dispersed scores.
a) Range for Ungrouped Data: In a distribution of ungrouped scores, if the scores are arranged in an
array, the range is defined as the largest score minus the smallest score plus one.
Range = (Highest value of an item in a series) ─ (Lowest value of an item in a series) +1
In a distribution that has 103 as the highest score and 30 as the lowest score, the range is computed as
range = (103- 30)+1 = 74 (Leonard 1996).
b) Range for Grouped Data: In case of grouped data, the range is the difference between the upper
true limit of the highest class interval and the lower true limit of the lowest class interval. Let us look
into the following data:
Class Interval Frequency
31-33 3
34-36 0
37-39 1
40-42 5
43-45 7
46-48 6
49-51 24
52-54 18
55-57 14
58-60 15
61-63 16
64-66 7
In case of the above data, the upper true limit of the highest class interval is 66.5 (64-66) and the
lower true limit of the lowest class interval is 30.5 (31-33). Therefore, the range would be 66.5-
30.5=36. Here, 1 is not added because the difference is between the two true limits (Leonard 1996).
Please note that range does not represent the entire series of scores as its computation requires only
the two extreme values.
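Both range computations can be sketched as follows (the function names are mine):

```python
def range_ungrouped(scores):
    """Largest score minus smallest score plus one."""
    return max(scores) - min(scores) + 1

def range_grouped(lowest_stated_limit, highest_stated_limit):
    """Difference between the true limits, which extend each stated limit
    by 0.5 for integer-valued data; no +1 is added here."""
    return (highest_stated_limit + 0.5) - (lowest_stated_limit - 0.5)

# Distribution with highest score 103 and lowest score 30; the scores in
# between (here a single hypothetical 55) do not affect the range.
print(range_ungrouped([103, 55, 30]))  # 74

# Grouped data running from interval 31-33 up to interval 64-66.
print(range_grouped(31, 66))           # 36.0
```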
ii) Mean Deviation or Average Deviation: It is the average of the differences of the values of items from
some average of the series (Kothari 2004: 135). It is based on the absolute deviations of scores from the
centre (Leonard 1996). This procedure is designed to avoid the algebraic sum of deviations from the
mean equalling zero, in which case it would be impossible to compute indices of variability.
a) Average Deviation for Ungrouped Scores:
Mean Deviation from the Mean = ∑|X − X̄| / n, where X denotes a particular score, X̄ the mean of the
scores, and n stands for the total number of scores. Let us look into the calculation for the following
scores:
Observation No. X X̄ |X − X̄| or x
1 26 16 10
2 24 16 8
3 22 16 6
4 20 16 4
5 18 16 2
6 16 16 0
7 14 16 2
8 10 16 6
9 6 16 10
10 4 16 12
N = 10 ∑X = 160 ∑|X − X̄| = 60
For the above scores, we first calculated the mean which is 16 (160/10). Then, we have subtracted the
mean from the scores in order to know their deviation and ignored the sign of the scores. After this,
the absolute deviations have been summed up (60). To find the average deviation, we divided 60 by n,
or 10, and obtained 6. Here 6 is our mean deviation (Ibid.).
b) Average Deviation for Grouped Data: The formula for calculating the average deviation is
A.D. = ∑f|X − X̄| / N
The average deviation or mean deviation from the grouped data is calculated below:
Class Intervals Midpoints (m) Frequencies (f) mf x = |X − X̄| fx
140-144 142 1 142 28.8 28.8
145-149 147 3 441 23.8 71.4
150-154 152 2 304 18.8 37.6
155-159 157 4 628 13.8 55.2
160-164 162 4 648 8.8 35.2
165-169 167 6 1002 3.8 22.8
170-174 172 10 1720 1.2 12.0
175-179 177 8 1416 6.2 49.6
180-184 182 5 910 11.2 56.0
185-189 187 4 748 16.2 64.8
190-194 192 2 384 21.2 42.4
195-199 197 1 197 26.2 26.2
N = 50 ∑mf = 8540 ∑fx = 502
Mean or 𝑋 of the above group of scores is 8540/50= 170.8. The rest of the calculations have been
shown in the table. Therefore A.D. would be 502/50 or 10.04.
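Both average-deviation computations can be sketched as follows (the function names are mine):

```python
def mean_deviation(scores):
    """Average of the absolute deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

# Ungrouped example from above: mean 16, sum of absolute deviations 60.
print(mean_deviation([26, 24, 22, 20, 18, 16, 14, 10, 6, 4]))  # 6.0

def mean_deviation_grouped(midpoints, frequencies):
    """A.D. = sum(f * |X - mean|) / N over the class midpoints."""
    n = sum(frequencies)
    m = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    return sum(f * abs(x - m) for f, x in zip(frequencies, midpoints)) / n

midpoints   = [142, 147, 152, 157, 162, 167, 172, 177, 182, 187, 192, 197]
frequencies = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
print(round(mean_deviation_grouped(midpoints, frequencies), 2))  # 10.04
```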
iii) Standard Deviation: Standard Deviation (S.D) is the most stable measure of dispersion or
variability. It is defined as the square root of the average of the squares of deviations when such
deviations for the values of individual items in a series are obtained from the arithmetic average. In
finding the S.D, we avoid the difficulty of signs by squaring the separate deviations (Garrett 1981).
a) Standard Deviation for Ungrouped Scores: The formula for computing S.D. from ungrouped
scores is σ (S.D.) = √(∑x²/N), where ‘x’ is the deviation of a score from the mean and ‘N’ is the
total number of scores.
We can calculate the standard deviation from the scores below in the following manner (Leonard 1996):
X x = X − X̄ x²
2 −6 36
2 −6 36
4 −4 16
6 −2 4
8 0 0
14 6 36
20 12 144
N = 7 ∑X = 56 ∑x² = 272
The mean of the scores is 56/7 = 8, and the deviations x are taken from it. Dividing ∑x² by N gives
272/7 or 38.86. If we find the square root of 38.86, we will get the standard deviation. So √38.86 or
6.23 is the S.D.
b) Standard Deviation for Grouped Data: The following is the formula for computing standard
deviation for grouped data.
Standard deviation for grouped data: σ = √(∑fx²/N), where f stands for the individual frequency, ‘x’ is
the value of the deviation of the individual score from the mean and N stands for the total frequency
(Garrett 1981). The calculation is shown in the table below:
Class Interval Midpoint (X) Frequency (f) fX x = X − X̄ x² fx²
140-144 142 1 142 −28.8 829.44 829.44
145-149 147 3 441 −23.8 566.44 1699.32
150-154 152 2 304 −18.8 353.44 706.88
155-159 157 4 628 −13.8 190.44 761.76
160-164 162 4 648 −8.8 77.44 309.76
165-169 167 6 1002 −3.8 14.44 86.64
170-174 172 10 1720 1.2 1.44 14.40
175-179 177 8 1416 6.2 38.44 307.52
180-184 182 5 910 11.2 125.44 627.20
185-189 187 4 748 16.2 262.44 1049.76
190-194 192 2 384 21.2 449.44 898.88
195-199 197 1 197 26.2 686.44 686.44
N = 50 ∑fX = 8540 ∑fx² = 7978
The mean score of the above distribution of scores is 8540/50 or 170.8.
The computed value of σ is √(∑fx²/N) = √(7978/50) = √159.56 ≈ 12.63.
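The grouped standard deviation can be sketched as:

```python
from math import sqrt

# Midpoints and frequencies from the table above.
midpoints   = [142, 147, 152, 157, 162, 167, 172, 177, 182, 187, 192, 197]
frequencies = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

n = sum(frequencies)                                            # N = 50
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n   # 170.8
# sigma = sqrt(sum(f * x^2) / N), with x the deviation of each midpoint
# from the mean.
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
print(round(sqrt(variance), 2))  # 12.63
```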
iv) When to Use the Various Measures of Variability: The rules for using the measures of
dispersion are as follows:
a) Range can be used when
the scores are scanty in number or are too dispersed
a knowledge of the extreme scores or of the total spread of scores is wanted.
b) Average Deviation can be computed when
it is desirable to weigh all deviations from the mean according to their size
extreme deviations would influence the S.D. unduly.
c) S.D. is to be used when
the statistic having the greatest stability is wanted
coefficient of correlation and other statistics are subsequently to be computed (Garrett 1981).
6.3. Chi-square Test
The chi-square test is an important one among several tests of significance developed by statisticians.
It is symbolically written as χ² and can be used to determine whether categorical data show dependency
or whether two classifications are independent. It can be used to make comparisons between theoretical
populations and actual data when categories are used. The test is, in fact, a technique by the use of
which it is possible for researchers to test a) goodness of fit, and b) the significance of association
between two attributes (Kothari 2008).
a) Test of Goodness of Fit: As a test of goodness of fit, chi-square enables us to see how well the
theoretical distribution fits the observed data. If the calculated value of χ² is less than its table value
at a certain level of significance, the fit is considered to be a good one. When the calculated value of
χ² is greater than the table value, we do not consider the fit to be a good one (Kothari op. cit.).
Illustrative Problem
Given below is the data on the number of students entering the University from each school.
School 1 – 22, School 2 – 25, school 3 – 26, School 4 – 28, School 5 – 33.
Is there a difference in the quality of school? N=50
In the case of the above data the most suitable technique of statistical application would be chi-square
goodness of fit test because the data are at the nominal level and the hypothesis is to be tested on one
variable, that is, the quality of schools on the basis of the prospect of entering the University from
each school.
The steps for calculating the chi-square are shown below.
14
Sociology Name of Paper: Methodology of Research in Sociology
Name of Module: Processing and Analyzing Quantitative Data
1. Stating the Null and the Alternative Hypothesis: The null hypothesis assumes that there is no
difference in the quality of the schools. Whereas the alternative hypothesis would state that
there is a difference in the quality of the schools.
2. Choice of a Statistical Test: As has been stated above, the appropriate statistical test
applicable here would be Chi-square goodness of fit test.
3. Level of Significance and Sample Size: Here the level of significance would be 0.5, that
means only 5 times in 100. The sample size is 50.
4. One versus the two tailed test: It is a two tailed test because no direction is indicated in the
alternative hypothesis. It only suggests that there is a difference in the number of students
entering the University from each school.
5. The Sampling Distribution: The sampling distribution is a function of the degrees of freedom
which are quantities that are free to vary. Here it can be computed by (k-1) where ‘k’ is the
number of categories into which observations are divided. Here there are 5 categories, that
means degrees of freedom (df) = (5-1) = 4.
6. The Region of Rejection: The point of intersection of the ‘df’ and the level of significance
gives the critical value of x², which is 9.488. The computed value of the chi-square has to be
greater than the table value in order to reject the null hypothesis. It is computed by the formula:

x² = ∑(Of − Ef)²/Ef

where Of is the observed frequency in each category and Ef is the expected frequency. In an
ideal situation, ten students from each school would enter the University; therefore our
expected frequency in each case would be 50/5 = 10. The computation of x² is shown in the
table below:
School   Of   Ef   Of − Ef   (Of − Ef)²   (Of − Ef)²/Ef
1        22   10   12        144          14.4
2        25   10   15        225          22.5
3        26   10   16        256          25.6
4        28   10   18        324          32.4
5        33   10   23        529          52.9
                             x² = ∑(Of − Ef)²/Ef = 147.8
Since the computed value of chi-square is 147.8, which is greater than its table value of 9.488, the
alternative hypothesis is upheld that there are differences in the quality of schools. This is understood
from the different number of students entering the University from each school.
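The arithmetic of the goodness-of-fit test above can be sketched in a few lines of Python (a minimal illustration, not part of the original module; the numbers follow the worked example):

```python
# observed admissions from the five schools in the worked example
observed = [22, 25, 26, 28, 33]
expected = 10  # Ef = 50/5, as fixed in the example

# chi-square = sum of (Of - Ef)^2 / Ef over all categories
chi_square = sum((of - expected) ** 2 / expected for of in observed)
# chi_square -> 147.8, well above the critical value of 9.488 at df = 4
```

The loop reproduces the last column of the table above cell by cell; with real data the critical value would be read from a chi-square table for the chosen level of significance and degrees of freedom.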
b) Chi-square Test of Independence: As a test of independence, chi square test enables us to explain
whether or not two attributes are associated. If the table value of chi is greater than its computed
value, we can conclude that there is no association between the attributes, that is, the null hypothesis
is upheld. But if the computed value of chi is greater than its table value, we uphold that the two
attributes are associated and the association is not due to chance factors but it exists in reality (Kothari
2008). For the test of association, the formula for computing the chi-square remains the same as
above.
Illustrative Problem
Let us look into the following data:
                    Level of Job Satisfaction
Union Membership   Not Satisfied   Satisfied   Total
No                 75 (A)          125 (B)     200
Yes                65 (C)          135 (D)     200
Total              140             260         400
From the above data, we have to find out if a relation exists between the two variables.
Here, we will apply a chi square test of independence because the data are at the nominal level and
there are two variables in the data, namely job satisfaction and union membership. The steps 1 to 6 are
to be written in the same manner as above. Only the sample size is 400. The degrees of freedom will
be computed by (c-1)(r-1)where ‘c’ is the number of columns and ‘r’ means the number of rows into
which observations are divided. Here the degrees of freedom (df) = (2-1)(2-1) = 1. The point of
intersection between the ‘df’ and the level of significance (0.05) gives the critical or table value of x²,
which is 3.841. The computed value of the chi-square would have to be more than its table value in
order to reject the null hypothesis.
Next, we calculate the expected frequencies against each observed frequency by the following
formula:
Cell A = (A+B)(A+C)/N = (200 × 140)/400 = 70
Cell B = (A+B)(B+D)/N = (200 × 260)/400 = 130
Cell C = (C+D)(A+C)/N = (200 × 140)/400 = 70
Cell D = (C+D)(B+D)/N = (200 × 260)/400 = 130
Now we would compute the value of Chi-square in the following table:
Cell   Of    Ef    Of − Ef   (Of − Ef)²   (Of − Ef)²/Ef
A      75    70    5         25           0.35
B      125   130   −5        25           0.19
C      65    70    −5        25           0.35
D      135   130   5         25           0.19
                             x² = 1.08
Since the computed value of Chi-square (1.08) is less than its table value (3.841), the null
hypothesis is upheld. It may hence be argued that there is no significant association between job
satisfaction and union membership. The chi-square test is one of the most frequently used tests, but
it should be applied correctly, in situations where the individual observations of the sample are
independent (Kothari 2008: 295).
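As an illustration (not part of the original module), the test-of-independence arithmetic can be sketched in Python; the expected frequency of each cell is the product of its row and column totals divided by N:

```python
# 2x2 table from the example: rows = union membership (no, yes),
# columns = job satisfaction (not satisfied, satisfied)
table = [[75, 125],
         [65, 135]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, of in enumerate(row):
        ef = row_totals[i] * col_totals[j] / n  # expected frequency for the cell
        chi_square += (of - ef) ** 2 / ef
# chi_square is about 1.099; the module, rounding each cell to two decimals,
# reports 1.08 -- either way it falls below the table value of 3.841
```

Because the computed value stays below the critical value, the null hypothesis of independence is retained.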
6.4. T-test
The Central Limit Theorem states that, if the sample size N is large, the sample statistic approaches
the Z distribution (explained above). When a sample is taken from a normally distributed population
with a known mean (µ) and standard deviation (σ), and a z-score is computed for each observation,
the resulting scores will have a z-distribution, that is, a normal distribution with mean = 0 and
standard deviation = 1. The problem is that in most cases the population standard deviation is
unknown, and since the Central Limit Theorem involves the use of the standard deviation, it cannot
be ignored. One solution here is to substitute the sample standard deviation (s1) for the population
standard deviation (Vito and Latessa 1989). To test samples of small size, we have the “t” statistic.
The t-test can be of two types, namely the two sample t-test and the related sample t-test. The type
of test chosen will depend upon whether the two samples are independent or related. Related
samples occur when -
both samples have been matched according to some trait like race or gender, or
repeated measurements of the same sample are taken (before-after or time series design)
(Ibid. 1989).
a) Two Sample t – test: When two samples are to be tested on any trait or variable, then we apply for a
two sample t- test. The formula for computation of the value of t is as follows:
t = (X̄1 − X̄2) / √[((n1s1² + n2s2²)/(n1 + n2 − 2)) × ((n1 + n2)/(n1n2))]

where X̄1 is the mean of the first sample, X̄2 is the mean of the second sample, s1 and s2 are the
standard deviations of the first and the second samples respectively, and n1 and n2 are the sizes of
the two samples respectively.
Illustrative Problem
The data for two schools have been provided below:
State Funded School: N1 = 20, X̄1 = 64, S1 = 18.5
Private Schools: N2 = 24, X̄2 = 46, S2 = 18.5 (C.U. 2001)
The steps for computing the value of t would be summarized below.
1. Stating the Null and the Alternative Hypothesis: The null hypothesis assumes that there would
be no differences in the samples. The alternative hypothesis assumes a difference between
two samples.
2. Choice of Statistical Test: The statistical test chosen is the two sample t-test.
3. Level of Significance and Sample Size: The level of significance is .05 which means that 5
times in 100, we can reject the null hypothesis incorrectly or 5 times in 100 our result can be
due to chance. The sample sizes are 20 and 24 respectively.
4. One Versus Two Tailed Test: It is a two tailed test because no direction is implied in the
alternative hypothesis. It only suggests a difference between two sample means.
5. The Sampling Distribution: It is a function of the degrees of freedom, that is, the quantities
which are free to vary. It can be calculated by the formula (N1 + N2 − 2), which here would
be (20 + 24 − 2) = 42.
6. The Region of Rejection: The point of intersection between the degrees of freedom and the
level of significance gives the table value of ‘t’. Here the critical or table value of “t” would
be 1.684. The computed value of “t” has to be more than this in order to reject the null
hypothesis. The computed value of “t” can be found out from the formula given above. We
just substitute the values in it.
t = (X̄1 − X̄2) / √[((n1s1² + n2s2²)/(n1 + n2 − 2)) × ((n1 + n2)/(n1n2))]
  = (64 − 46) / √[((20(342.25) + 24(342.25))/(20 + 24 − 2)) × ((20 + 24)/(20 × 24))]
  = 3.16.
Since the computed value of “t” is 3.16 and is greater than its critical value which is 1.684, the
alternative hypothesis is upheld. In other words, there are significant differences between the two
school systems.
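The substitution above can be reproduced in Python (a minimal sketch using the module's pooled-variance formula; the summary statistics are those of the example):

```python
import math

# summary statistics from the example (C.U. 2001)
n1, mean1, s1 = 20, 64, 18.5  # state funded school
n2, mean2, s2 = 24, 46, 18.5  # private schools

# pooled variance term: (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)
pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
t = (mean1 - mean2) / math.sqrt(pooled * (n1 + n2) / (n1 * n2))
# t comes out to about 3.14 (the module reports 3.16 after intermediate
# rounding); either value exceeds the critical value of 1.684
```

The small discrepancy with the module's 3.16 comes from rounding at intermediate steps; the conclusion is unchanged.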
b) T-test for Related Samples: This is applicable when there are repeated measurements of the same
sample (time series design). The formula for computing the value of “t” for related samples is:
t = (X̄1 − X̄2)/SD̄

where X̄1 and X̄2 are again the means of the two sets of scores respectively and SD̄ is the estimate
of the standard error of the mean difference scores. The standard error is calculated by the formula:

SD̄ = √(SD²/N)

where SD² is the pooled variance of the difference scores and N is the total number of scores. The
pooled variance is computed by the formula:

SD² = [N∑D² − (∑D)²]/[N(N − 1)]

where D is the difference between the two measurements for each case in the related sample
(Vito and Latessa 1989).
Illustrative Problem
The governor of Florida wants a report on the effects of the death penalty. Homicide rates (per
100,000 population) in Florida cities, two weeks before and two weeks after an execution are noted
below (Vito and Latessa 1989):
City            Rate Before (Test 1)   Rate After (Test 2)   D = Test 2 − Test 1   D²
Pompano Beach   23                     19                    −4                    16
Tallahassee     15                     16                    1                     1
Tampa           12                     18                    6                     36
Miami           20                     17                    −3                    9
Orlando         13                     11                    −2                    4
Total           83                     81                    −2                    66
At first we would calculate the mean of the two tests:
X̄1 = 83/5 = 16.6
X̄2 = 81/5 = 16.2

Now, we calculate the pooled variance of the difference scores. The formula is:

SD² = [N∑D² − (∑D)²]/[N(N − 1)] = [5(66) − (−2)²]/[5(5 − 1)] = 326/20 = 16.3

Next, we calculate the standard error of the mean difference scores, the formula for which is:

SD̄ = √(SD²/N) = √(16.3/5) = √3.26 = 1.80

From the above values we calculate the value of “t” as:

t = (X̄1 − X̄2)/SD̄ = (16.6 − 16.2)/1.80 = 0.22
The steps for computing the value of ‘t’ are the same as above. Here the sample size is 5. The
sampling distribution would be calculated by (N-1) or (5-1) or 4. The point of intersection between
the degrees of freedom and the level of significance (0.05) gives the table value of ‘t’. Here the
critical or table value of “t” would be 2.776. The computed value of “t” has to be more than this in
order to reject the null hypothesis. The computed value of “t” is found out from the above formula.
We have just substituted the values in it and found out “t” to be 0.22. Since the computed value of t is
less than its table value, the null hypothesis is upheld. Our findings can be due to chance factors.
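The related-samples computation above can be written out in Python (a minimal sketch following the module's formulas, with the homicide-rate data of the example):

```python
import math

before = [23, 15, 12, 20, 13]  # homicide rates two weeks before an execution
after = [19, 16, 18, 17, 11]   # rates two weeks after
n = len(before)

d = [a - b for a, b in zip(after, before)]  # D = Test 2 - Test 1 for each city
# pooled variance of the difference scores: [N*sum(D^2) - (sum D)^2] / [N(N-1)]
sd2 = (n * sum(x * x for x in d) - sum(d) ** 2) / (n * (n - 1))
se = math.sqrt(sd2 / n)                     # standard error, about 1.80
t = (sum(before) / n - sum(after) / n) / se  # (16.6 - 16.2) / 1.80, about 0.22
```

With t ≈ 0.22 against a critical value of 2.776 at df = 4, the null hypothesis is retained, matching the module's conclusion.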
6.5. Measures of Relationship (Correlation co-efficient, Simple Linear Regression and Bivariate
Contingency Tables)
The statistical measures discussed so far have dealt with univariate populations, that is, populations
which have one variable as their characteristic feature. Observations based on two variables are
known as bivariate relationships. If for every measurement of a variable X we have a corresponding
value of a second variable Y, the resulting pairs of values are called a bivariate population. We have
to answer two types of questions about a bivariate population:
Does there exist an association or correlation between the two variables? If yes, to what degree?
Is there any cause and effect relation between the two variables? (Kothari 2004: 138)
i) Coefficient of correlation or simple correlation: It is the most widely used method of measuring
the degree of relationship between two variables. At times we want to know if there is a relation
between the variables incidence of child labour and broken homes or that between drug addiction and
involvement into criminal activities. In all such cases, it would be appropriate to use coefficient
correlation. The Pearson correlation coefficient or Pearson’s ‘r’ (also known as Pearson product-
moment coefficient correlation) is a measure of the straight line relationship between two interval-
level variables (Elifson 1997). To employ Pearson’s correlation coefficient correctly as a measure of
association between X and Y variables, the following requirements must be taken into account:
Interval data: Both X and Y variables must be measured at the interval level so that
scores may be assigned to the respondents
Normally distributed characteristics: Testing the significance of Pearson’s ‘r’ requires
both X and Y variables to be normally distributed in the population (Levin and Fox
2006: 357).
Computation of the Pearson’s ‘r’ by Mean Deviation Method: The mean deviation computational
equation for ‘r’ is:
‘r’ = ∑(X − X̄)(Y − Ȳ) / √[∑(X − X̄)²∑(Y − Ȳ)²]

where X and Y stand for the scores on the two variables, X̄ and Ȳ are their respective means, and
(X − X̄) and (Y − Ȳ) are the deviations of the scores from the means. The calculation is shown in the
following table. An effort is made here to find out the nature and strength of the relationship
between the variables mothers’ education and daughters’ education (Elifson 1997).
Respondent   Mother’s education (X)   (X − X̄)   (X − X̄)²   Daughter’s education (Y)   (Y − Ȳ)   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
A            1                        −6         36          7                          −6         36         36
B            3                        −4         16          4                          −9         81         36
C            5                        −2         4           13                         0          0          0
D            7                        0          0           16                         3          9          0
E            9                        2          4           10                         −3         9          −6
F            11                       4          16          22                         9          81         36
G            13                       6          36          19                         6          36         36
                                                 ∑ = 112                                           ∑ = 252    ∑ = 138
Now we substitute the values into the above equation and compute Pearson’s ‘r’:

‘r’ = 138/√[(112)(252)] = 138/√28224 = 138/168 = 0.82
The value of ‘r’ lies in between (+1) and (-1). The direction of a relationship is indicated by the sign
of the correlation coefficient. A positive relationship (or direct relationship) indicates that high scores
on one variable tend to be associated with high scores on a second variable and conversely low scores
on one variable tend to be associated with low scores on the second variable. A negative relationship
(also referred to as an inverse or indirect relationship) indicates that low scores on one variable tend to
be associated with high scores on a second variable. Conversely high scores on one variable tend to be
associated with low scores on the second variable (Elifson 1997: 201). In the above example there is
found to be a strong positive correlation between mothers’ education and their daughters’ education.
In a concluding note it can be said that although there is no established rule so as to specify what
constitutes a weak, moderate or strong relationship, yet there are certain guidelines to follow. A weak
relationship is one where the score varies between ± 0.01 to ± 0.30, moderate when the scores vary
between ± 0.31 to ± 0.70, and strong relationship between ± 0.71 to ± 0.99. A perfect relationship is ±
1.00 and no relationship is indicated when ‘r’ = 0 (Elifson 1997: 208).
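The mean-deviation computation of Pearson's 'r' above translates directly into Python (a minimal sketch, not part of the original module; the data are the mother/daughter education scores):

```python
import math

x = [1, 3, 5, 7, 9, 11, 13]     # mother's education
y = [7, 4, 13, 16, 10, 22, 19]  # daughter's education
mx, my = sum(x) / len(x), sum(y) / len(y)  # means: 7 and 13

# numerator: sum of the cross-products of deviations (138 in the example)
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
# denominator: sqrt of the product of the two sums of squared deviations
den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
r = num / den  # about 0.82: a strong positive correlation
```

The value agrees with the worked example: 138/√(112 × 252) = 138/168 ≈ 0.82.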
ii) Simple Regression Analysis: Regression analysis is very closely related to correlation. It is the
statistical determination of a relationship between two or more variables (Kothari 2004). When we
use regression analysis, we are essentially interested in the description of a predictive relationship
(Vito and Latessa 1989). The independent variable in the relationship is known as the cause and the
dependent variable is the effect. In regression analysis, we can state accurately the degree of change in
the two variables. In other words, how much each unit change in X produces a change Y (Kothari
2004). The basic equation of simple linear regression is as follows:
Ŷ = a + bX

where Ŷ is the predicted score of the dependent variable, X is the score of the independent variable,
‘a’ is the Y intercept (the point at which the regression line crosses the Y axis, representing the
predicted value of Y when X = 0), and ‘b’ is the regression coefficient, the slope of the regression
line, which indicates the expected change in Y with a change of one unit in X (Vito and Latessa
1989).
Vito and Latessa (1989) state the example of the theory of prisonization in correction. According to
the theory, the longer a person is incarcerated, the more ‘prisonized’ the person will become and their
readjustment to society will be hampered. The hypothesis was tested with a random sample of inmates
using a scale designed to test the degree of prisonization, where 0 indicates no prisonization and 10
equals a high degree of prisonization. Here prisonization is the dependent variable (Y) whereas the
time served in prison (in years) is the independent variable (X). The computation is shown in the
table below:

Prisoner   X    x = (X − X̄)   x²      Y    y = (Y − Ȳ)   y²      xy
A          0    −3.4           11.56   1    −3.6           12.96   12.24
B          2    −1.4           1.96    3    −1.6           2.56    2.24
C          5    1.6            2.56    4    −0.6           0.36    −0.96
D          4    0.6            0.36    6    1.4            1.96    0.84
E          6    2.6            6.76    9    4.4            19.36   11.44
N = 5      ∑X = 17             ∑x² = 23.2   ∑Y = 23        ∑y² = 37.2   ∑xy = 25.8
The mean value of X is X̄ = 17/5 = 3.4
The mean value of Y is Ȳ = 23/5 = 4.6

The value of the regression coefficient ‘b’ can be found from the formula:
b = ∑xy/∑x² = 25.8/23.2 = 1.11

Now we can find the value of ‘a’ from the formula:
a = Ȳ − b(X̄) = 4.6 − 1.11(3.4) = 0.83

‘b’ is the slope of the regression line, or the ratio of the change in Y corresponding to a change in X.
Therefore when X changes by 1 unit, Y will change by 1.11 units. ‘a’ is the Y-intercept, or the value
of Y when X = 0 (Vito and Latessa 1989).
When the value of X (time spent in prison) is 2, the predicted value Ŷ (the degree of prisonization)
would be:

Ŷ = a + bX = 0.83 + 1.11(2) = 0.83 + 2.22 = 3.05.

In this way we can calculate the value of the dependent variable from the regression equation and
infer what amount of change in X will lead to what amount of change in Y.
To conclude, we can state that the regression analysis is a statistical method to deal with the
formulation of mathematical model depicting relationship amongst variables which can be used for
the purpose of prediction of the values of the dependent variable, given the values of the independent
variable (Kothari 2004: 142).
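The prisonization example above can be sketched in Python (a minimal illustration of the least-squares formulas used in the module; `predict` is a hypothetical helper, not from the source):

```python
x = [0, 2, 5, 4, 6]  # years served (X)
y = [1, 3, 4, 6, 9]  # prisonization score (Y)
mx, my = sum(x) / len(x), sum(y) / len(y)  # 3.4 and 4.6

# slope b = sum(xy) / sum(x^2) over mean deviations: 25.8 / 23.2
b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
     / sum((a - mx) ** 2 for a in x))
a_intercept = my - b * mx  # a = Y-bar - b * X-bar, about 0.83

def predict(time_served):
    """Predicted prisonization score for a given number of years served."""
    return a_intercept + b * time_served
# predict(2) is about 3.04; the module, carrying rounded coefficients
# (b = 1.11, a = 0.83), reports 3.05
```

The tiny gap between 3.04 and 3.05 is rounding in the module's hand computation; the regression line itself is the same.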
iii) Contingency Tables: Contingency Tables are another way of explaining and interpreting
relationship between variables. In the present module, we would be concerned only with the bivariate
contingency tables where the focus of discussion would be on two variables – one an independent
variable or the predictor variable (symbolized by X) and the other a dependent variable (symbolized
by Y). Here we would discuss a relationship between marital status (X) and employment status (Y) of
women. The hypothesis is that marital status exerts an influence on the employment status of women.
The study has been carried out on 200 respondents (Elifson 1997). The data have been presented in
the table below:
                          Marital Status (X)
Employment Status (Y)   Never Married   Married   Divorced   Widowed   Total
Employed                21              60        11         6         98
Not Employed            14              65        4          19        102
Total                   35              125       15         25        N = 200
Contingency tables can be interpreted by percentaging it in three ways as follows.
Percentaging Down: This is one of the most common ways of calculating percentages. Here the
column marginals (35, 125, 15 and 25) are taken as the base on which the percentages are calculated.
Percentaging down is also referred to as percentaging on the independent variable when it is the
column variable. Percentaging down allows us to determine the effect of the independent variable by
comparing across the percentages within a row that is by comparing people in different categories of
the independent variable (Elifson 1997: 172). The method will be shown below:
                          Marital Status (X)
Employment Status (Y)   Never Married   Married   Divorced   Widowed
Employed                60%             48%       73.3%      24%
Not Employed            40%             52%       26.7%      76%
Total                   100%            100%      100%       100%
While interpreting from the above table, we say 60% (21/35x100) of the never married respondents
are employed, 48% (60/125x100) of the married respondents are employed, 73.3% (11/15x100) of the
divorced respondents are employed and 24% (6/25x100) of the widowed respondents are employed. If
we interpret it in this way we get a logical relationship between marital status and employment status
of women.
Percentaging Across: When we are percentaging across we are taking row marginal as the base and
calculating percentages. Here we are percentaging across and comparing up and down. An advantage
of doing this is that a profile of the employed versus those who are not employed can be established in
terms of their marital status (Elifson 1997: 172). This is also shown in the table below:
                          Marital Status (X)
Employment Status (Y)   Never Married   Married   Divorced   Widowed   Total
Employed                21.4%           61.2%     11.2%      6.1%      99.9%
Not Employed            13.7%           63.7%     3.9%       18.6%     99.9%
From the above table, we can say that 21.4% (21/98×100) of the employed respondents have never
married, and 13.7% (14/102×100) of the not-employed respondents have never married. Moreover,
61.2% of the employed respondents are married, whereas 63.7% of the not-employed respondents
are married; 11.2% of the employed respondents are divorced, while 3.9% of the not-employed
respondents are divorced; and 6.1% of the employed respondents are widowed, against 18.6% of the
not-employed respondents. In the above table, the totals do not come to exactly 100% due to
rounding (Elifson 1997).
Percentaging on the total number of cases: This is another method of interpreting bivariate
contingency tables. Here the percentages are calculated on the total number of cases (N). The
following table shows this:
                          Marital Status (X)
Employment Status (Y)   Never Married   Married   Divorced   Widowed
Employed                10.5%           30%       5.5%       3%
Not Employed            7%              32.5%     2%         9.5%
(All cells together total 100% of N = 200)
From the above table we infer that 10.5% (21/200×100) of the respondents have never married and
are employed, whereas 7% (14/200×100) of the respondents have never married and are not
employed. Like the second method (percentaging across), this way of percentaging does not allow
us to see the influence of the independent variable on the dependent one and is rarely used, though it
is useful in certain instances (Elifson 1997: 172).
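As an illustration (not part of the original module), the percentaging-down step can be sketched in Python; the counts are those of the marital-status example:

```python
# observed counts (employed, not employed) for each marital status
counts = {
    "Never Married": (21, 14),
    "Married": (60, 65),
    "Divorced": (11, 4),
    "Widowed": (6, 19),
}

# percentaging down: each column (marital-status) total is the base,
# so we compare categories of the independent variable across a row
down = {status: round(100 * employed / (employed + unemployed), 1)
        for status, (employed, unemployed) in counts.items()}
# down reproduces the first percentage table, e.g. 73.3% of the divorced
# respondents are employed, against 24% of the widowed
```

Percentaging across or on N would simply change the denominator to the row total (98 or 102) or to N = 200 respectively.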
Self-Check Exercise – 3
1. What is measurement?
Measurement is the assignment of numbers to objects or events according to some
predetermined (or arbitrary) rules. The different levels of measurement represent different
levels of numerical information contained in a set of observations.
2. What are the levels of measurement that are used by the social scientists?
There are four levels of measurement namely – nominal, ordinal, interval and ratio. The
characteristics of each will decide the kind of statistical application we can use.
The nominal level does not involve highly complex measurement but rather involves rules for
placing individuals or objects into categories.
The ordinal scales possess all the characteristics of the nominal and in addition the categories
represent a rank-ordered series of relationships like poorer, healthier, greater than etc.
The interval and ratio scales are the highest level of measurement in science and employ
numbers. The numerical values associated with these scales permit the use of mathematical
operations such as adding, subtracting, multiplying and dividing. The only difference between
the two is that the ratio level has a true zero point which the interval does not have. With both
these levels we can state the exact differences between categories (Elifson 1997).
7. Limitations of Statistics in Sociology
Statistics plays a role in Sociology, especially in Applied Sociology. There is a debate which has been
going on since the middle of the twentieth century between researchers who are committed to the use
of quantitative methods and computer application and those who believe in qualitative approach in
sociology. The latter group argues that statistics, if its importance is overemphasized, will become a
substitute for sociology. They argue that it is not always appropriate to conduct research with
quantitative variables that can be handled by statistical analysis. The decision to apply statistics to the
research would depend on factors like the nature of the problem, the subjects of study and the
availability of previously collected data, to name a few (Weinstein 2011). Researchers nowadays
increasingly depend on the use of mixed methods. In general, mixed methods combine both
qualitative and quantitative techniques so that their respective weaknesses cancel out. Triangulation
is a particular application of mixed methods (Guthrie 2010). One way in which a qualitative research
approach is introduced into quantitative research is through ethnostatistics, which involves the study
of the construction, interpretation and display of statistics in quantitative social research. The idea of
ethnostatistics can be applied in many ways, but one predominant way is to treat statistics as
rhetoric. More specifically, this implies examining the language used in persuading audiences about
the validity of the research (Bryman 2004: 446). To conclude, we can say that statistics will be a
necessary tool for effective research but can never be a substitute for sociological reasoning. It can
give the data some precision and make it manageable and smart for presentation (Weinstein 2011).
8. Summary
The present module has tried to analyse the processes and methods to examine quantitative data that
is, data that can be reduced to numbers. This process comes at a time when the researcher is through
with the process of data collection. The data are first processed through various methods of coding,
tabulation and classification. These help to reduce the data to manageable proportions and make
them ready for interpretation. After the data are processed, different methods of
statistics like measures of central tendency, dispersion, chi-square, t-test, coefficient correlation,
simple regression and contingency tables are used to interpret data. The choice of the use of statistical
application depends on the nature of the research and the availability of the levels of data. But it has to
be remembered that statistical analysis is only a helping tool of research. It can never be a substitute
for the efforts of the researcher and the quality of the data collected. A combination of quantitative
and qualitative methods of analysis is essential for the interpretation of data in social research.
9. References
Ahuja, Ram. Research Methods. Jaipur: Rawat Publications, 2007.
Bryman, Alan. Social Research Methods. New York: Oxford University Press, 2004.
Elifson, Kirk W., Richard P. Runyon and Audrey Haber. Fundamentals of Social Statistics. United
States: McGraw-Hill, 1997.
Garrett, Henry E. Statistics in Psychology and Education. New York: David McKay Company, Inc.,
1981.
Guthrie, Gerard. Basic Research Methods: An Entry to Social Science Research. New Delhi: Sage
Publications India Private Limited, 2010.
Kothari, C.R. Research Methodology: Methods and Techniques. New Delhi: New Age International
(P) Limited, Publishers, 2008.
Leonard, Wilbert Marcellus. Basic Social Statistics. Illinois: Stipes Publishing L.L.C., 1996.
Levin, Jack and James Alan Fox. Elementary Statistics in Social Research. New Delhi: Dorling
Kindersley (India) Pvt. Ltd., 2006.
Majumdar, P. K. Research Methods in Social Science. New Delhi: Viva Books Private Limited,
2005.
Morrison, Ken. Marx, Durkheim, Weber. London: Sage Publications, 1995.
Vito, Gennaro and Edward Latessa. Statistical Applications in Criminal Justice. London: Sage
Publications, 1989.
Weinstein, Jay Alan. Applying Social Statistics. United Kingdom: Rowman and Littlefield
Publishers Inc., 2011.