Cent Tend SD Corr Reg
Date posted: 08-Apr-2018
8/6/2019 Cent Tend SD Corr Reg
1/69
Measurement of central tendency
Measurement of dispersion
Correlation
Regression
Statistical methods
Data and its types
Definition of data: facts, figures, enumerations and other materials, past and present, serving as the basis for study and analysis. They are the raw material for analysis and provide the basis for testing hypotheses and developing scales and tables.
Data help researchers draw inferences on specific issues and problems.
The quality of findings depends on the relevance, adequacy and reliability of the data.
Types of data (not in the statistical sense)
A.
1. Personal data (individual as a source)
   Demographic & socio-economic characteristics
   Behaviour variables
   Attitude, behaviour, opinions
   Awareness, preferences, knowledge
   Practices, intentions
2. Organisational data (organisational sources)
   Archives, manuscript libraries, museums
3. Territorial data
   Economic structure, occupation pattern
B. I Secondary (paper method)
Methods & Techniques of Data Collection
I-Secondary data
How to scrutinize
Published & unpublished
Methods where used
A-Meta analysis
B- Historical method
C-Content analysis
D-Informetrics
E-Use studies
II-Primary data
A-Records & relics
B-Observation
C-Experimentation
D-Simulation
E-Ask people orally
F-Ask people in writing
G-Panel study
H-Projective techniques
I-Sociometry
J-Case study
-Interview / Depth interview / Schedule
-Mail survey / questionnaire
-Mechanical devices
Data Sources
Primary Data
Secondary Data
1. Internet sites / web pages of different companies and organizations
2. Central and local govt. studies and reports
3. Rules on international trading, import and exports, state budgets
4. FICCI (Federation of Indian Chambers of Commerce and Industry), CII (Confederation of Indian Industry), ASSOCHAM (Associated Chambers of Commerce and Industry)
5. Policies on foreign direct investment
Skewness and Kurtosis: some examples
[Figure: histogram of Educational Attainment; x-axis: Educational Attainment, y-axis: Frequency; Std. Dev = 1.81]
[Figure: histogram of Reason for Termination; x-axis: Reason for Termination, y-axis: Frequency]
Pictogram
Annotated box plot
Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Overview
Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
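Arithmetic Mean
The arithmetic mean (mean) is the most common measure of central tendency
For a population of N values:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
where $x_i$ are the population values and $N$ is the population size
For a sample of size n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where $x_i$ are the observed values and $n$ is the sample size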
Arithmetic Mean (continued)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Values 1, 2, 3, 4, 5: $\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$, so Mean = 3
Values 1, 2, 3, 4, 10: $\frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$, so Mean = 4
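The effect of a single extreme value on the mean can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `arithmetic_mean` is illustrative, and the two data sets (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10) are read from the slide's arithmetic, whose sums are 15 and 20 over 5 values.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

mean_baseline = arithmetic_mean(baseline)
mean_with_outlier = arithmetic_mean(with_outlier)

# Changing one value out of five moves the mean by a full unit.
print(mean_baseline, mean_with_outlier)
```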
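Median
In an ordered list, the median is the middle number (50% above, 50% below)
Not affected by extreme values
For grouped data: Median = $L + \left[\frac{N/2 - C}{f}\right] h$ = Q2, where L is the lower limit of the median class, N the total frequency, C the cumulative frequency before the median class, f the frequency of the median class and h the class width
Example use: compare the knowledge level in two subjects for a group of students by the median
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]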
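Quartiles, Deciles, Percentiles
Like the median, which divides the data into two parts, quartiles divide the data into four parts, deciles into ten parts and percentiles into 100 parts
$Q_j = L + \left[\frac{jN/4 - C}{f}\right] h,\quad j = 1, 2, 3$
$D_j = L + \left[\frac{jN/10 - C}{f}\right] h,\quad j = 1, 2, \ldots, 9$
$P_j = L + \left[\frac{jN/100 - C}{f}\right] h,\quad j = 1, 2, \ldots, 99$
where L is the lower limit of the class containing the quantile, N the total frequency, C the cumulative frequency of the preceding class, f the frequency of that class and h the class width
Empirical relation: Mode = 3 Median - 2 Mean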
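The quartile, decile and percentile formulas for grouped data share one pattern, L + ((jN/k - C)/f)·h with k = 4, 10 or 100, so a single function can cover all three. The sketch below is an illustration under assumptions rather than the slides' own method: the name `grouped_quantile` and its parameters are hypothetical, equal class widths are assumed, and the frequency table is the one from the transcript's worked example (classes 5-10 through 40-45).

```python
def grouped_quantile(lower_limits, frequencies, width, j, parts):
    """Return the j-th of `parts` quantiles as L + ((j*N/parts - C) / f) * h.

    lower_limits: lower limit L of each class, in ascending order
    frequencies:  class frequencies f
    width:        common class width h (equal widths are assumed)
    """
    if not 0 < j < parts:
        raise ValueError("j must satisfy 0 < j < parts")
    total = sum(frequencies)        # N
    if total == 0:
        raise ValueError("the table contains no observations")
    target = j * total / parts      # j*N/4, j*N/10 or j*N/100
    cumulative = 0                  # C: cumulative frequency of preceding classes
    for lower, freq in zip(lower_limits, frequencies):
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("lower_limits and frequencies have different lengths")


# Frequency table from the transcript's worked example.
limits = [5, 10, 15, 20, 25, 30, 35, 40]
freqs = [5, 6, 15, 10, 5, 4, 2, 2]

q2 = grouped_quantile(limits, freqs, 5, j=2, parts=4)       # second quartile
d5 = grouped_quantile(limits, freqs, 5, j=5, parts=10)      # fifth decile
p50 = grouped_quantile(limits, freqs, 5, j=50, parts=100)   # 50th percentile
q3 = grouped_quantile(limits, freqs, 5, j=3, parts=4)       # third quartile
print(q2, d5, p50, q3)
```

The second quartile, fifth decile and 50th percentile all target the same position N/2, so they coincide with the grouped median.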
Finding the Median
The location of the median:
Median position = $\frac{n+1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers
Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
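The position rule can be turned into a short routine. This is a minimal Python sketch: the function name `median_by_position` and the sample data are illustrative and do not come from the slides.

```python
def median_by_position(values):
    """Find the median by locating position (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, the 1-based position
        # of the middle value.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: (n + 1) / 2 falls halfway between positions n/2 and n/2 + 1,
    # so average the two middle values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_case = median_by_position([4, 1, 5, 3, 2])        # position 3 of 5
with_outlier = median_by_position([1, 2, 3, 4, 10])   # extreme value, same median
even_case = median_by_position([1, 2, 3, 4])          # average of 2 and 3
print(odd_case, with_outlier, even_case)
```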
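Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be several modes
For grouped data: Mode = $L + \left[\frac{f - f_{-1}}{2f - f_{-1} - f_{1}}\right] h$, where L is the lower limit of the modal class, f its frequency, $f_{-1}$ the frequency before the modal class, $f_{1}$ the frequency after the modal class and h the class width
[Figure: dot plot on a 0 to 14 axis; Mode = 9]
[Figure: dot plot on a 0 to 6 axis; No Mode]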
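For ungrouped data, the points above (several modes possible, numerical or categorical values, possibly no mode) can be sketched in Python as follows. The function name `modes` is illustrative, and treating a data set in which every value occurs exactly once as having no mode is a convention assumed here, not something the slides define. The numeric sample is the first data column of the transcript's worked example; the colour list is made up.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: all values occur once, so report no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


single_mode = modes([5, 9, 7, 9, 10, 9, 5])                    # 9 occurs three times
no_mode = modes([0, 1, 2, 3, 4, 5, 6])                         # all values distinct
two_modes = modes(["red", "blue", "red", "blue", "green"])     # categorical, tied
print(single_mode, no_mode, two_modes)
```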
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example: Summary Statistics
House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000 (Sum: $3,000,000)
Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
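The three summary statistics for the house prices can be reproduced with Python's standard `statistics` module. This is a small sketch for checking the arithmetic; the variable names are illustrative.

```python
import statistics

# House prices from the review example, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum / count
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often the more representative figure for data like this.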
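Example
Ungrouped data (three columns of values):
Column 1: 5, 9, 7, 9, 10, 9, 5
Column 2: 1, 2, 3, 4, 5, 7, 7
Column 3: 19, 20, 22, 22, 17

Statistic              Column 1    Column 2    Column 3
Mean                   7.714286    4.142857    20
Mode                   9           7           22
Median                 9           4           20
Variance (n - 1)       4.238095    5.47619     4.5
SD                     2.0587      2.3401      2.1213

Grouped data:
Class    Frequency   C.F. (less than)   C.F. (more than)
5-10     5           5                  49
10-15    6           11                 44
15-20    15          26                 38
20-25    10          36                 23
25-30    5           41                 13
30-35    4           45                 8
35-40    2           47                 4
40-45    2           49                 2

Median = $L + \left[\frac{N/2 - C}{f}\right] h$
Mode = $L + \left[\frac{f - f_{-1}}{2f - f_{-1} - f_{1}}\right] h$
Median class = class in which the cumulative frequency first reaches (total frequency)/2
Modal class = class with the maximum frequency
Median class = 15-20, i.e. cumulative frequency 26 (N/2 = 24.5)
Modal class = 15-20, i.e. 15 is the maximum frequency
Median = $15 + \left[\frac{\frac{1}{2}(49) - 11}{15}\right] 5$
Mode = $15 + \left[\frac{15 - 6}{2 \times 15 - 6 - 10}\right] 5$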
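Carrying the two substitutions through to numbers gives the values below. Only the arithmetic is added here; the inputs (L = 15, N = 49, C = 11, f = 15, neighbouring frequencies 6 and 10, h = 5) are the ones in the example, and the mode is rounded to three decimals.

```latex
\begin{align*}
\text{Median} &= 15 + \frac{\tfrac{49}{2} - 11}{15} \times 5
               = 15 + \frac{13.5}{15} \times 5
               = 15 + 4.5 = 19.5 \\
\text{Mode}   &= 15 + \frac{15 - 6}{2(15) - 6 - 10} \times 5
               = 15 + \frac{9}{14} \times 5
               \approx 15 + 3.214 = 18.214
\end{align*}
```

Both values fall inside the 15-20 class, as they must, since that class is both the median class and the modal class.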
Which measure of location is the best?
Mean is generally used, unless extreme values (outliers) exist
Then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region, since they are less sensitive to outliers
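Geometric mean & Harmonic mean
Geometric mean is the nth root of the product of n observations (e.g. average percent increase in sales or production). It is best suited to constructing index numbers.
$\text{G.M} = \text{antilog}\left(\frac{\sum \log X}{N}\right)$
Combined geometric mean of two groups: $\log G = \frac{N_1 \log G_1 + N_2 \log G_2}{N_1 + N_2}$
Harmonic mean: restricted use, such as the average rate of increase of profits or the average price at which an article has been sold
$\text{H.M} = \frac{N}{\sum \frac{1}{X}}$, and for a frequency distribution $\text{H.M} = \frac{N}{\sum \frac{f}{X}}$ with $N = \sum f$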
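The geometric and harmonic mean formulas can be sketched in Python as below. This is an illustration under assumptions: the function names are hypothetical, natural logarithms are used (any base gives the same antilog result), all values must be strictly positive, and the sample numbers are made up. `statistics.geometric_mean` (Python 3.8+) and `statistics.harmonic_mean` are standard-library functions used only as a cross-check.

```python
import math
import statistics


def geometric_mean(values):
    """G.M = antilog(sum(log X) / N); all values must be strictly positive."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """H.M = N / sum(1 / X); all values must be strictly positive."""
    return len(values) / sum(1 / x for x in values)


def combined_geometric_mean(g1, n1, g2, n2):
    """log G = (N1 log G1 + N2 log G2) / (N1 + N2) for two groups."""
    return math.exp((n1 * math.log(g1) + n2 * math.log(g2)) / (n1 + n2))


# Hypothetical data: two groups of positive observations.
group_a = [2, 8]
group_b = [1, 3, 9]

gm_a = geometric_mean(group_a)      # square root of 2 * 8
gm_b = geometric_mean(group_b)      # cube root of 1 * 3 * 9
gm_all = combined_geometric_mean(gm_a, len(group_a), gm_b, len(group_b))
hm_prices = harmonic_mean([40, 60])  # hypothetical prices per unit

# Cross-check against the standard library.
gm_library = statistics.geometric_mean(group_a + group_b)
hm_library = statistics.harmonic_mean([40, 60])
print(gm_a, gm_b, gm_all, hm_prices)
```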
Measures of Variability
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center, different variation]
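The five measures listed here can be computed together. The following Python sketch is illustrative only: the function name `describe_variation` is hypothetical, the sample variance (divisor n - 1) is assumed, and the interquartile range depends on the quartile convention, here the default "exclusive" method of `statistics.quantiles` (Python 3.8+). The sample values are the first data column of the transcript's worked example.

```python
import statistics


def describe_variation(values):
    """Return the range, IQR, sample variance, SD and coefficient of variation."""
    ordered = sorted(values)
    mean = statistics.mean(ordered)
    std_dev = statistics.stdev(ordered)              # square root of the sample variance
    q1, _, q3 = statistics.quantiles(ordered, n=4)   # quartile cut points
    return {
        "range": ordered[-1] - ordered[0],
        "interquartile_range": q3 - q1,
        "variance": statistics.variance(ordered),    # divisor n - 1
        "standard_deviation": std_dev,
        "coefficient_of_variation": std_dev / mean * 100,  # in percent; mean must be non-zero
    }


scores = [5, 9, 7, 9, 10, 9, 5]
summary = describe_variation(scores)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```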
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = $X_{\text{largest}} - X_{\text{smallest}}$
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two differently shaped distributions over the values 7 to 12]
Range = 12 - 7 = 5 in both cases
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
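The outlier sensitivity of the range is easy to reproduce. In this minimal Python sketch the function name `data_range` is illustrative; the two lists are the data sets from the slide, which differ only in their last value.

```python
def data_range(values):
    """Return the largest observation minus the smallest observation."""
    return max(values) - min(values)


typical = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

range_typical = data_range(typical)
range_with_outlier = data_range(with_outlier)

# One changed observation out of 25 drives the whole measure.
print(range_typical, range_with_outlier)
```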