data processing and analysis of data
Uploaded by ankita3031
A Talk On
‘Data Processing and Analysis of Data’ (Research Methodology)
Introduction
• The collected data have to be processed and analyzed in accordance with the research plan.
• This is essential for a scientific study and for making comparisons.
• Processing implies
– Editing
– Coding
– Classification
– Tabulation
• Analysis implies
– Computation of certain measures
– Searching for patterns of relationship that exist among data groups
Processing Operations
1. Editing
– The process of examining the collected raw data to detect errors and omissions and to correct them where possible.
– It involves scrutiny of the completed questionnaires and/or schedules.
– There are two variations of editing:
• Field editing
• Central editing
• Field editing
– Consists of a review of the reporting forms by the investigator to complete (rewrite) what was written in abbreviated form when the response was recorded.
– This editing should be done as soon as possible after the interview.
– While field editing, the investigator should not correct errors or omissions by simply guessing the suitable option.
• Central editing
– Takes place when all forms or schedules have been completed and returned to the office.
– All the forms should be edited by a single editor in a small study, or by a team of editors in a large inquiry.
– Corrections are allowed in this editing.
– Editors should keep the following points in view while performing their work:
a) Editors should be familiar with the instructions given to the interviewers and coders.
b) An original entry should be crossed out with a single line.
c) Entries should be made in a distinctive color and in a standardized form.
d) Editors should initial all answers that they change or supply.
e) The editor’s initials and the date of editing should be placed on each completed form or schedule.
2. Coding
– Refers to the process of assigning numerals or other symbols to answers so that the responses can be put into a limited number of categories.
– Necessary for efficient analysis.
– Coding decisions are usually taken at the design stage of the questionnaire.
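The coding step can be illustrated with a minimal Python sketch. The codebook, its categories, the missing-value code and the function name `code_response` are all hypothetical; a real study would take them from the questionnaire design.

```python
# Hypothetical codebook: the categories and numerals are illustrative only.
CODEBOOK = {"yes": 1, "no": 2, "don't know": 3}
MISSING = 9  # code reserved here for blank or unclassifiable answers (an assumption)


def code_response(answer: str) -> int:
    """Map a raw answer to its numeral; anything outside the codebook becomes MISSING."""
    return CODEBOOK.get(answer.strip().lower(), MISSING)


raw_answers = ["Yes", "no ", "Don't know", "", "maybe"]
coded = [code_response(a) for a in raw_answers]
print(coded)  # [1, 2, 3, 9, 9]
```

Fixing the codebook before data collection keeps the number of categories limited, which is the point made above.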
3. Classification
– Individual data should be reduced into homogeneous groups to get meaningful relationships.
– Classification is the process of arranging data in groups or classes on the basis of common characteristics.
• Broadly, there are two types of classification, depending on the nature of the phenomenon involved:
a) Classification according to attributes
b) Classification according to class intervals
• Classification according to attributes:
– Data are classified on the basis of common characteristics, which may be either descriptive or numerical.
– Descriptive characteristics refer to qualitative phenomena that cannot be measured quantitatively.
– Data obtained this way are known as statistics of attributes.
– This classification can be either simple or manifold.
– In simple classification, we consider only one attribute and form two classes: one possessing the attribute and the other devoid of it.
– In manifold classification, two or more attributes are considered simultaneously and the data are divided into a number of classes.
• Classification according to class intervals:
– Data relating to income, production, age, etc. are known as statistics of variables and are classified on the basis of class intervals.
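Both kinds of classification can be sketched in a few lines of Python. The records, the literacy attribute, the class width of 1000 and the helper `class_interval` are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical records: (is_literate, monthly_income)
records = [(True, 1200), (False, 800), (True, 2500), (True, 3100), (False, 1500)]

# Simple classification by one attribute: literate versus not literate.
by_attribute = Counter("literate" if lit else "not literate" for lit, _ in records)


def class_interval(value, width=1000):
    """Return the (lower, upper) class containing value; the upper limit is exclusive."""
    lower = (value // width) * width
    return (lower, lower + width)


# Classification of a variable (income) by class intervals.
by_interval = Counter(class_interval(income) for _, income in records)

print(dict(by_attribute))
print(dict(by_interval))
```

Treating the upper limit as exclusive is one common convention; an inclusive convention would need a different `class_interval`.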
4. Tabulation
– Tabulation refers to the process of summarizing the raw data and displaying them in compact form.
– It is essential because it:
• Conserves space and reduces explanatory statements to a minimum.
• Facilitates the process of comparison.
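As a rough illustration of tabulation, the sketch below condenses a list of coded responses into a frequency table. The responses, labels and the function name `frequency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical coded responses (1 = agree, 2 = neutral, 3 = disagree).
responses = [1, 3, 1, 2, 1, 3, 2, 1]
labels = {1: "Agree", 2: "Neutral", 3: "Disagree"}


def frequency_table(values):
    """Return rows of (code, frequency, percentage), sorted by code."""
    counts = Counter(values)
    total = len(values)
    return [(code, n, 100 * n / total) for code, n in sorted(counts.items())]


table = frequency_table(responses)
for code, n, pct in table:
    print(f"{labels[code]:<10}{n:>5}{pct:>8.1f}%")
```

Eight raw answers reduce to three rows, which is the space saving and ease of comparison that tabulation is meant to provide.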
Elements/Types of Analysis
• In the case of survey or experimental data, analysis involves
– Estimating the values of unknown parameters of the population
– Testing hypotheses for drawing inferences
• Categories of analysis:
a) Descriptive
b) Inferential
• Correlation analysis:
– Studies the joint variation of two or more variables to determine the amount of correlation between them.
• Causal analysis:
– Studies how one or more variables affect changes in another variable.
• Multivariate analysis:
– “All statistical methods which simultaneously analyze more than two variables on a sample of observations.”
– It involves:
a) Multiple regression analysis
b) Multiple discriminant analysis
c) Multivariate analysis of variance
d) Canonical analysis
STATISTICS IN RESEARCH
• Statistics in research functions as a tool in designing research, analyzing its data and drawing conclusions therefrom.
• The important statistical measures used to summarize survey/research data are:
1) Measures of central tendency or statistical averages
2) Measures of dispersion
3) Measures of asymmetry (skewness)
4) Measures of relationship
5) Other measures
Measure of Central Tendency
– It indicates the point about which items have a tendency to cluster.
– Mean, median and mode are the most popular averages.
– Mean is also known as the arithmetic average.
– Median is the value of the middle item of a series when it is arranged in ascending or descending order.
– Mode is the most frequently occurring value in a series.
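The three averages can be computed with Python's standard `statistics` module. The series below is hypothetical and chosen so that the mode is unique.

```python
import statistics

series = [12, 15, 11, 15, 18, 15, 20]  # hypothetical observations

mean = statistics.mean(series)      # arithmetic average: sum of items / number of items
median = statistics.median(series)  # middle item of the sorted series
mode = statistics.mode(series)      # most frequently occurring value

print(mean, median, mode)
```

Here the sorted series is 11, 12, 15, 15, 15, 18, 20, so both the median and the mode are 15, while the mean is 106 / 7.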
Measure of Dispersion
– It gives an idea of the scatter of the values of a variable around the true value of the average.
– Important measures of dispersion are:
a) Range
b) Mean deviation
c) Standard deviation
• Range
– The simplest possible measure of dispersion.
– Defined as the difference between the values of the extreme items of a series.
• Mean deviation
– The average of the absolute differences of the values of items from some average of the series.
• Standard deviation
– The most widely used measure of dispersion.
– Denoted by the symbol σ.
– Standard deviation is defined as the square root of the average of the squares of deviations, where the deviations are taken from the arithmetic mean:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$
Where $X_i$ is the $i$-th item, $\bar{X}$ is the arithmetic mean and $n$ is the number of items.
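The three measures of dispersion can be worked through on a small series. The data are hypothetical, the mean deviation is taken about the arithmetic mean (one possible choice of average), and the standard deviation divides by n, following the definition as the average of squared deviations.

```python
import math

series = [4, 8, 6, 10, 12]  # hypothetical observations

n = len(series)
mean = sum(series) / n

# Range: difference between the extreme items.
value_range = max(series) - min(series)

# Mean deviation: average absolute deviation, taken here about the mean.
mean_deviation = sum(abs(x - mean) for x in series) / n

# Standard deviation: square root of the average squared deviation from the mean.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in series) / n)

print(value_range, mean_deviation, std_dev)
```

With a mean of 8, the squared deviations are 16, 0, 4, 4 and 16, so σ is the square root of 40 / 5 = 8.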
Measure of Asymmetry
– When the distribution of the items in a series is perfectly symmetrical, the resulting curve is bell-shaped. Technically, such a curve is described as a normal curve.
• If the curve is distorted, the distribution is asymmetrical, which indicates the presence of skewness.
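The skewness formula from the original slides did not survive in this transcript. As an assumption, the sketch below uses one common measure, Karl Pearson's coefficient of skewness, (mean − mode) / σ; the function name and both series are hypothetical.

```python
import statistics


def pearson_skewness(series):
    """Pearson's coefficient of skewness: (mean - mode) / standard deviation."""
    mean = statistics.mean(series)
    mode = statistics.mode(series)
    sigma = statistics.pstdev(series)  # population standard deviation (divides by n)
    return (mean - mode) / sigma


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # mean and mode coincide
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]  # long tail toward high values

print(pearson_skewness(symmetric))
print(pearson_skewness(right_skewed))
```

A value of zero corresponds to a symmetrical series; a positive value indicates that the mean has been pulled above the mode by a tail on the right.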
Measures of Relationship
– In the context of bivariate and multivariate populations, we need to know how the two or more variables in the data are related to one another.
– Association (correlation) and cause-and-effect relationships are studied using the technique of correlation and the technique of regression, respectively.
• In the case of a bivariate population:
– Correlation can be studied through:
a) Cross tabulation
b) Charles Spearman’s coefficient of correlation
c) Karl Pearson’s coefficient of correlation
– Cause-and-effect relationships can be studied through the simple regression technique.
1. Cross tabulation:
– Useful when the data are in nominal form.
– Classify each variable into two or more categories and then cross-classify the variables in these categories.
– The relationship between the variables can be:
• Symmetrical
• Reciprocal
• Asymmetrical
• In a symmetrical relationship, the two variables vary together.
• In a reciprocal relationship, the two variables mutually influence or reinforce each other.
• In an asymmetrical relationship, one variable (the independent variable) is responsible for another variable (the dependent variable).
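Cross-classifying two nominal variables amounts to counting how often each pair of categories occurs together. The sketch below does this with the standard library; the observations and category names are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: (gender, preferred drink)
observations = [
    ("male", "tea"), ("female", "coffee"), ("male", "coffee"),
    ("female", "tea"), ("female", "coffee"), ("male", "tea"),
]

cells = Counter(observations)  # count of each (row, column) pair
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

# Nested dict: crosstab[row][column] -> frequency (0 if the pair never occurs).
crosstab = {r: {c: cells[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, crosstab[r])
```

The table only shows whether the categories tend to occur together; deciding whether the relationship is symmetrical, reciprocal or asymmetrical is a matter of interpretation, not of the counts themselves.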
2. Charles Spearman’s coefficient of correlation:
– This technique deals with ordinal data, where ranks are given to the different values of the variables.
– The objective is to determine the extent to which the two sets of rankings are similar or dissimilar.
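The slides do not state the formula, so the sketch below assumes the usual rank-difference form for untied ranks, ρ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks given to the same item. The function name and the two judges' rankings are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical rankings of five candidates by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_1, judge_2)
print(rho)
```

Identical rankings give +1 and exactly reversed rankings give −1; the example, with Σd² = 4, lies close to the similar end of that scale.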
3. Karl Pearson’s coefficient of correlation:
– The most widely used method of measuring the degree of relationship between two variables.
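A minimal sketch of Pearson's coefficient follows, assuming the standard product-moment form: the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The function name and the data (study hours and test scores) are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [1, 2, 3, 4, 5]         # hypothetical study hours
scores = [52, 55, 61, 64, 68]   # hypothetical test scores

r = pearson_r(hours, scores)
print(round(r, 4))
```

The coefficient lies between −1 and +1; a value near +1, as here, indicates a strong positive linear relationship.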
• Simple regression analysis:
– Regression is the determination of a statistical relationship between two or more variables. In simple regression there are only two variables, and one (the independent variable) is treated as the cause of the behavior of the other (the dependent variable).
– If X is the independent variable and Y is the dependent variable, the regression equation of Y on X is:
$$\hat{Y} = a + bX$$
where $\hat{Y}$ is the estimated value of Y for a given value of X, and $a$ and $b$ are constants estimated from the data.
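One way to obtain the constants of a regression line of Y on X is the method of least squares, sketched below. The slides do not say which estimation method they use, so least squares is an assumption here, and the function name and data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


hours = [1, 2, 3, 4, 5]         # hypothetical independent variable X
scores = [52, 55, 61, 64, 68]   # hypothetical dependent variable Y

a, b = fit_simple_regression(hours, scores)
predicted = a + b * 6  # estimated Y for X = 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```

For these data the slope is 41 / 10 = 4.1 and the intercept is 60 − 4.1 × 3 = 47.7.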
• In the case of a multivariate population:
– Correlation can be studied through:
a) Coefficient of multiple correlation
b) Coefficient of partial correlation
– Cause-and-effect relationships can be studied through multiple regression equations.
1. Multiple Correlation and Regression
– When there are two or more independent variables, the analysis of the relationship is known as multiple correlation.
– The equation describing such a relationship is known as the multiple regression equation.
• With two independent variables and one dependent variable, the equation can be written as:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$
where $X_1$ and $X_2$ are the independent variables, $\hat{Y}$ is the estimated value of the dependent variable, and $a$, $b_1$ and $b_2$ are constants.
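For two independent variables, the least-squares constants can be found by solving two normal equations written in terms of deviations from the means. The sketch below assumes least-squares estimation, which the slides do not specify; the function name and data are hypothetical, and the data are generated exactly from Y = 1 + 2X1 + 3X2 so the result is easy to check.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(p * p for p in d1)
    s22 = sum(q * q for q in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    s1y = sum(p * r for p, r in zip(d1, dy))
    s2y = sum(q * r for q, r in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    # Cramer's rule on: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * p + 3 * q for p, q in zip(x1, x2)]  # exact, noise-free data

a, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

With more than two independent variables the same idea applies, but a general linear-algebra routine is more practical than hand-written formulas.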
• Partial correlation:
– Partial correlation measures separately the relationship between two variables in such a way that the effects of other related variables are eliminated.
– In other words, the aim is to measure the relation between a dependent variable and a particular independent variable while holding all other variables constant.
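With three variables, holding the third constant can be sketched with the standard first-order partial correlation formula, r12.3 = (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). The slides do not give this formula, so it is supplied here as an assumption; the function name and the three simple correlations are hypothetical.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlations among three variables.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_correlation(r12, r13, r23)
print(round(r12_3, 4))
```

When the third variable is uncorrelated with both of the others, the partial correlation reduces to the simple correlation r12.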
Other Measures
1. Index number:
– Used when the series are expressed in different units.
– In such a scenario each series is converted into a series of index numbers.
– For example, the given figures can be expressed as percentages of a base figure.
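Converting series into index numbers can be sketched as expressing every figure as a percentage of a chosen base figure. The base position, the function name `to_index` and both series are hypothetical.

```python
def to_index(series, base_position=0):
    """Express each figure as a percentage of the figure at base_position."""
    base = series[base_position]
    return [100 * value / base for value in series]


# Hypothetical series in different units over three years.
wheat_tonnes = [200, 220, 250]
steel_units = [50, 60, 70]

wheat_index = to_index(wheat_tonnes)
steel_index = to_index(steel_units)
print(wheat_index)  # [100.0, 110.0, 125.0]
print(steel_index)  # [100.0, 120.0, 140.0]
```

Once both series share the base value of 100, their relative growth can be compared directly even though the original units differ.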
2. Time-series analysis:
– When the data collected relate to some period of time concerning a given phenomenon, particularly in economic and business contexts, such data are labeled ‘time series’.
– Factors affecting such series are:
I. Secular trend (T): changes that take place over a long period of time
II. Short-time oscillations: changes that take place over a short period of time
• Short-time oscillations arise from the following fluctuations:
a) Cyclical fluctuations (C): fluctuations that result from business cycles.
b) Seasonal fluctuations (S): fluctuations of short duration that occur in a regular sequence at specific intervals of time.
c) Irregular fluctuations (I): fluctuations that take place in a completely unpredictable fashion.
• For analyzing time series there are two models:
a) Multiplicative model
b) Additive model
• The multiplicative model assumes that the various components interact in a multiplicative manner to produce the given values of the overall time series, and can be stated as:
$$Y = T \times C \times S \times I$$
• The additive model treats the given values of the overall time series as the total of the various components, and can be stated as:
$$Y = T + C + S + I$$
where Y is the observed value of the time series, T the secular trend, C the cyclical fluctuations, S the seasonal fluctuations and I the irregular fluctuations.
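The two models can be contrasted with a small sketch that combines the components T, C, S and I for a single period. The component values and function names are hypothetical; in the multiplicative case C, S and I are taken as ratios around 1, and in the additive case as amounts in the units of the series.

```python
def multiplicative_model(t, c, s, i):
    """Observed value when the components interact multiplicatively."""
    return t * c * s * i


def additive_model(t, c, s, i):
    """Observed value when the components simply add up."""
    return t + c + s + i


# Hypothetical components for one period.
y_mult = multiplicative_model(t=200, c=1.1, s=0.9, i=1.0)
y_add = additive_model(t=200, c=20, s=-15, i=5)

# Removing the seasonal component under each model.
deseasonalized_mult = y_mult / 0.9
deseasonalized_add = y_add - (-15)

print(y_mult, y_add)
```

The choice of model determines how a component is removed: by division under the multiplicative model and by subtraction under the additive model.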