Section 9.1 Samples and Central Tendency
Central Tendency and Correlation Coefficient
QTBD 2013
UNIT-1
Measures of Central Tendency
Definition:
An average is a measure that condenses a large volume of data into a single numerical value.
An average gives us an idea about the concentration of the values in the central part of the distribution.
Averages are the typical values around which the other values of the distribution concentrate.
Types of Measures
1) Arithmetic Mean (or) Average
2) Median
3) Mode
4) Geometric Mean
5) Harmonic Mean
Characteristics of Measures of Central Tendency
It should be easy to understand and easy to calculate.
It should be based on all items.
It should be capable of further algebraic calculations.
It should be rigidly defined.
It should not be affected by extreme observations.
It should not be affected by fluctuations of sampling.
Demerits of Measures of Central Tendency (Arithmetic Mean)
It cannot be determined by inspection, nor can it be located graphically.
Arithmetic mean cannot be used for qualitative characteristics that cannot be measured quantitatively. Ex. honesty, intelligence, beauty, etc.
Arithmetic mean cannot be used for open-ended class intervals. Ex. below 90 and above 100.
Arithmetic mean is affected by extreme values.
Arithmetic mean leads to wrong conclusions if the details of the data from which it is computed are not given.
Arithmetic mean cannot be obtained if a single observation is missing or lost.
Arithmetic mean is not a suitable measure for extremely asymmetric distributions.
Methods to Calculate the Average
1) Direct method.
2) In-direct method (or) Deviation method.
3) Step Deviation method.
1) Direct method :
Raw Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
K.V.RAMESH BABUM.SC.STATISATICS @ ASSISTANT PROFESSOR Page 1
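Discrete Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$
Continuous Data: $\bar{X} = \dfrac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$, where $m_i$ is the mid-value of the $i$-th class.
2) Deviation Method :
Here $A$ is an assumed mean and $d_i = X_i - A$ (for continuous data, $d_i = m_i - A$).
Raw Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} d_i}{n}$
Discrete Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$
Continuous Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$
3) Step-Deviation Method :
Here $C$ is a common factor (the class width for continuous data) and $d_i' = d_i / C$.
Raw Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} d_i'}{n} \times C$
Discrete Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i'}{\sum_{i=1}^{n} f_i} \times C$
Continuous Data: $\bar{X} = A + \dfrac{\sum_{i=1}^{n} f_i d_i'}{\sum_{i=1}^{n} f_i} \times C$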
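The three methods are algebraically equivalent, which a short computation can confirm. The following is a minimal Python sketch, not part of the original notes; the function names are hypothetical, and the X and f values are those of the discrete-data problem worked in these notes, with assumed mean A = 40 and common factor C = 10.

```python
# Illustrative sketch: function names are hypothetical, not from the notes.

def mean_direct(xs, fs):
    """Direct method: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_deviation_method(xs, fs, assumed_mean):
    """Deviation method: A + sum(f * d) / sum(f), with d = x - A."""
    total = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + total / sum(fs)


def mean_step_deviation(xs, fs, assumed_mean, width):
    """Step-deviation method: A + (sum(f * d') / sum(f)) * C, with d' = (x - A) / C."""
    total = sum(f * ((x - assumed_mean) / width) for x, f in zip(xs, fs))
    return assumed_mean + (total / sum(fs)) * width


xs = [10, 20, 30, 40, 50, 60]
fs = [5, 15, 25, 20, 10, 5]

# Each call gives 33.75 for this data.
print(mean_direct(xs, fs))
print(mean_deviation_method(xs, fs, assumed_mean=40))
print(mean_step_deviation(xs, fs, assumed_mean=40, width=10))
```

The choice of A and C affects only the size of the intermediate numbers, not the result, which is why the deviation methods are convenient for hand calculation.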
2) Median:
The median is the 'middle-most' or 'central' value of a set of observations when the observations are arranged in ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is also known as a 'positional average', whereas the mean is known as a 'calculated average'.
When a series consists of an even number of terms, the median is the arithmetic mean of the two central items. It is denoted by $M_d$.
Formulas:
Raw Data: Arrange the given set of data in ascending or descending order.
Case i) If n is odd, the median is
$M_d$ = value of the $\left(\dfrac{n+1}{2}\right)$th term, where n = number of observations.
Case ii) If n is even, the median is
$M_d = \dfrac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}$
Discrete Data:
STEP-1: Find the cumulative frequencies of the given data.
STEP-2: Find $N = \sum_{i=1}^{n} f_i$.
STEP-3: Find the cumulative frequency equal to or just greater than $N/2$; the corresponding value of X is the median.
Continuous Data:
STEP-1: Find the cumulative frequencies of the given data.
STEP-2: Find $N = \sum_{i=1}^{n} f_i$.
STEP-3: The value of the median is then given by
$M_d = L + \left\{\dfrac{N/2 - m}{f}\right\} \times C$
where L = lower limit of the median class
f = frequency of the median class
m = cumulative frequency preceding the median class
C = width of the class interval
$N = \sum_{i=1}^{n} f_i$ = sum of the frequencies.
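The raw-data and discrete-data rules above can be sketched in Python as follows. This is an illustrative sketch, not the author's method; the function names are hypothetical, and the discrete rule is implemented as "first value whose cumulative frequency reaches N/2", which is one reading of STEP-3.

```python
# Illustrative sketch: function names are hypothetical.

def median_raw(values):
    """Median of raw data: middle term (odd n) or mean of the two middle terms (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_discrete(xs, fs):
    """Median of a discrete series: value whose cumulative frequency first reaches N/2."""
    total = sum(fs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    cumulative = 0
    for x, f in sorted(zip(xs, fs)):
        cumulative += f
        if cumulative >= total / 2:
            return x


print(median_raw([120, 170, 100, 110, 180, 220, 160]))                     # 160
print(median_discrete([10, 20, 30, 40, 50, 60], [5, 15, 25, 20, 10, 5]))   # 30
```

Sorting inside each function mirrors the first instruction of the hand method: the data must be arranged in order before any position is read off.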
3) MODE:
The mode is the value in a series which occurs most frequently. In a frequency distribution, the mode is the value which has the maximum frequency. In other words, the mode is the value which has the greatest frequency density in its neighbourhood. The mode is also known as the most frequent value, typical value, predominant value, fashionable value or norm.
FORMULAS:
Raw Data: The value which occurs with the maximum frequency is the mode.
Discrete Data: The mode is the value of X corresponding to the maximum frequency.
Continuous Data:
STEP-1: Find the modal class, that is, the class with the maximum frequency.
STEP-2: Note $f_1$ = frequency of the modal class, $f_0$ = frequency of the class preceding it, $f_2$ = frequency of the class succeeding it, L = lower limit of the modal class and C = width of the class interval.
STEP-3: The value of the mode is then given by
$M_O = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times C$
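The grouped-data mode formula can be sketched as a small Python function. This is an illustrative sketch; the function name is hypothetical, equal class widths are assumed, and treating a missing neighbouring class as frequency 0 is an assumption made here, not something the notes state. The sample data are those of the continuous-data mode problem in these notes.

```python
# Illustrative sketch: function name is hypothetical; equal class widths assumed.

def mode_grouped(lower_limits, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * C for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class is not a strict peak")
    return lower_limits[i] + (f1 - f0) / denominator * width


lower_limits = [0, 400, 800, 1200, 1600, 2000, 2400, 2800]
freqs = [4, 12, 40, 41, 27, 13, 9, 4]
print(round(mode_grouped(lower_limits, 400, freqs), 2))  # 1226.67
```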
4) GEOMETRIC MEAN:
The geometric mean of n observations is the n-th root of the product of the observations.
Let $X_1, X_2, X_3, \ldots, X_n$ be a given set of n observations; then the geometric mean is given by
G.M. $= \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 \cdots X_n)^{1/n}$
If n = 2, the geometric mean is the square root of the product of the observations.
EXAMPLE: The geometric mean of 4 and 16 is
G.M. $= \sqrt{(4)(16)} = \sqrt{64} = 8$
If the number of observations is greater than 2, computing the n-th root directly is inconvenient; in that case we take logarithms.
$\log(\text{G.M.}) = \log (X_1 X_2 \cdots X_n)^{1/n} = \dfrac{1}{n} \log (X_1 X_2 \cdots X_n)$
$= \dfrac{1}{n} \{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_n\}$
FORMULAS:
Raw Data: G.M. $= \text{Antilog}\left\{\dfrac{1}{n} \sum_{i=1}^{n} \log X_i\right\}$
Discrete Data: G.M. $= \text{Antilog}\left\{\dfrac{1}{N} \sum_{i=1}^{n} f_i \log X_i\right\}$
Continuous Data: G.M. $= \text{Antilog}\left\{\dfrac{1}{N} \sum_{i=1}^{n} f_i \log m_i\right\}$
where $N = \sum f_i$ and $m_i$ is the mid-value of the $i$-th class.
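The logarithm route above translates directly into code. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, base-10 logarithms are used to match the Antilog notation, and all observations are assumed to be strictly positive.

```python
import math

# Illustrative sketch: function name is hypothetical; observations must be > 0.

def geometric_mean(values, freqs=None):
    """G.M. = Antilog( sum(f * log10(x)) / N ); freqs default to 1 for raw data."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n_total)


print(round(geometric_mean([4, 16]), 4))   # 8.0
```

For continuous data the same function can be called with the class mid-values in place of `values`.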
5) Harmonic Mean:
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations.
If $X_1, X_2, X_3, \ldots, X_n$ are a given set of n observations, then the harmonic mean is given by
H.M. $= \dfrac{1}{\dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{X_i}} = \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{X_i}}$
FORMULAS:
Raw Data: H.M. $= \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{X_i}}$
Discrete Data: H.M. $= \dfrac{N}{\sum_{i=1}^{n} \dfrac{f_i}{X_i}}$
Continuous Data: H.M. $= \dfrac{N}{\sum_{i=1}^{n} \dfrac{f_i}{m_i}}$
where $N = \sum f_i$ and $m_i$ is the mid-value of the $i$-th class.
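As a quick check of the definition, the harmonic mean can be computed in a few lines of Python. This is an illustrative sketch; the function name is hypothetical, and the observations are assumed to be non-zero. The values 40 and 60 are an illustrative example, not data from the notes.

```python
# Illustrative sketch: function name is hypothetical; observations must be non-zero.

def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / x); freqs default to 1 for raw data."""
    if freqs is None:
        freqs = [1] * len(values)
    reciprocal_sum = sum(f / x for x, f in zip(values, freqs))
    return sum(freqs) / reciprocal_sum


print(round(harmonic_mean([40, 60]), 4))   # 48.0
```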
Measures of Dispersion
Definition:
The meaning of dispersion is 'scatteredness'. The measure of the scatter of the given data about the average is said to be a measure of dispersion.
Characteristics of a Good Measure of Dispersion
It should be easy to understand.
It should be based on all items.
It should be readily comprehensible.
Its procedure should be simple.
It should be rigidly defined.
It should be capable of further algebraic calculations.
It should not be affected by extreme observations.
It should not be affected by fluctuations of sampling.
Types of Measures
1) Range.
2) Quartile Deviation.
3) Standard Deviation.
4) Mean Deviation.
Of the above, the first two are known as 'positional' measures and the remaining two are known as 'calculated' measures.
Formulas:
1) Range :
Range is the difference between the two extreme values. It is denoted by R.
Raw Data: Range = R = (Largest value - Smallest value) = L - S
Discrete Data: Range = R = (Largest value - Smallest value) = L - S
Continuous Data: Range = R = (Upper limit of the highest class - Lower limit of the lowest class) = L - S
Coefficient of Range
Coefficient of range $= \dfrac{L - S}{L + S}$
2) Quartile deviation :
Quartile deviation is denoted by Q.D. If $Q_1$ is the first quartile and $Q_3$ is the third quartile, then the quartile deviation is
Q.D. $= \dfrac{Q_3 - Q_1}{2}$
The same formula applies to raw, discrete and continuous data; only the way $Q_1$ and $Q_3$ are located differs.
3) Mean Deviation :
If $X_1, X_2, X_3, \ldots, X_n$ are n observations and $d_i = X_i - \bar{X}$, where $\bar{X}$ is the mean, then the mean deviation is denoted by M.D. and is given by
M.D. $= \dfrac{\sum_{i=1}^{n} |d_i|}{n}$
Raw Data: M.D. $= \dfrac{\sum_{i=1}^{n} |d_i|}{n}$, where $d_i = X_i - \bar{X}$
Discrete Data: M.D. $= \dfrac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = X_i - \bar{X}$
Continuous Data: M.D. $= \dfrac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = m_i - \bar{X}$
Coefficient of Mean Deviation:
Coefficient of Mean Deviation $= \dfrac{\text{Mean Deviation}}{\text{Mean}}$
4) Standard Deviation :
If $X_1, X_2, X_3, \ldots, X_n$ are n observations and $d_i = X_i - A$, where A is an assumed mean, then the standard deviation is denoted by S.D. (or $\sigma$) and is given by
S.D. $= \sqrt{\dfrac{\sum_{i=1}^{n} d_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
If A is taken as the actual mean $\bar{X}$, the second term is zero.
Raw Data: S.D. $= \sqrt{\dfrac{\sum d_i^2}{n} - \left(\dfrac{\sum d_i}{n}\right)^2}$, where $d_i = X_i - A$
Discrete Data: S.D. $= \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2}$, where $d_i = X_i - A$ and $N = \sum f_i$
Continuous Data: S.D. $= \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2}$, where $d_i = m_i - A$ and $N = \sum f_i$
Coefficient of Variation:
C.V. $= 100 \times \dfrac{\sigma}{\bar{X}}$
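The assumed-mean form of the standard deviation and the coefficient of variation can be sketched in Python as follows. This is an illustrative sketch, not the author's procedure; the function names are hypothetical, and the divisor is N (population form), in line with the formulas above. The data 8, 10, ..., 26 with A = 16 are those of the raw-data standard deviation problem in these notes.

```python
import math

# Illustrative sketch: function names are hypothetical; divisor is N, not N - 1.

def mean_and_std(values, freqs=None, assumed_mean=0.0):
    """Return (mean, S.D.) using d = x - A: sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2)."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    sum_fd2 = sum(f * (x - assumed_mean) ** 2 for x, f in zip(values, freqs))
    mean = assumed_mean + sum_fd / n_total
    variance = sum_fd2 / n_total - (sum_fd / n_total) ** 2
    return mean, math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding


def coefficient_of_variation(mean, std):
    """C.V. = 100 * sigma / mean."""
    return 100 * std / mean


data = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
mean, std = mean_and_std(data, assumed_mean=16)
print(mean, round(std, 4))                            # 17.0 5.7446
print(round(coefficient_of_variation(mean, std), 2))  # 33.79
```

The result does not depend on the choice of A; changing `assumed_mean` only changes the intermediate sums.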
PROBLEMS ON MEASURES OF CENTRAL TENDENCY:
1) PROBLEMS ON ARITHMETIC MEAN:
a) Direct Method:
Raw Data:
1) Find the average for the following data
Solution:
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{620}{10} = 62$
Discrete Data:
1) Find the arithmetic mean for the following data
X:  10  20  30  40  50  60
f:   5  15  25  20  10   5
Solution:
X      f      fX
10     5      50
20     15     300
30     25     750
40     20     800
50     10     500
60     5      300
Total  Σf = 80   ΣfX = 2700
$\bar{X} = \dfrac{\sum fX}{\sum f} = \dfrac{2700}{80} = 33.75$
b) In-Direct Method or Deviation Method:
Raw Data
Problem -1 Calculate the average for the following data
Family:  A   B   C   D    E    F   G   H    I    J
Income:  90  75  60  100  125  50  80  120  500  400
Solution: Take the assumed mean A = 100.
Family  Income  d = X - A
A       90      -10
B       75      -25
C       60      -40
D       100     0
E       125     25
F       50      -50
G       80      -20
H       120     20
I       500     400
J       400     300
Total           Σd = 600
$\bar{X} = A + \dfrac{\sum d_i}{n} = 100 + \dfrac{600}{10} = 100 + 60 = 160$
Discrete Data:
Problem -1 Calculate the average for the following data
X:  10  20  30  40  50  60
f:   5  15  25  20  10   5
Solution: Take the assumed mean A = 40.
X      f      d = X - A   fd
10     5      -30         -150
20     15     -20         -300
30     25     -10         -250
40     20     0           0
50     10     10          100
60     5      20          100
Total  Σf = 80            Σfd = -500
$\bar{X} = A + \dfrac{\sum f_i d_i}{\sum f_i} = 40 + \left[\dfrac{-500}{80}\right] = 40 - 6.25 = 33.75$
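Continuous Data :
1) Find the arithmetic mean for the following data
C.I:  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
f:    1     4      10     22     30     35     10     7      1
Solution: Take the assumed mean A = 55 and class width c = 10.
C.I     f    m    fm     d = m - A   fd     d' = d/c   fd'
0-10    1    5    5      -50         -50    -5         -5
10-20   4    15   60     -40         -160   -4         -16
20-30   10   25   250    -30         -300   -3         -30
30-40   22   35   770    -20         -440   -2         -44
40-50   30   45   1350   -10         -300   -1         -30
50-60   35   55   1925   0           0      0          0
60-70   10   65   650    10          100    1          10
70-80   7    75   525    20          140    2          14
80-90   1    85   85     30          30     3          3
Total   Σf = 120   Σfm = 5620   Σfd = -980   Σfd' = -98
Direct method: $\bar{X} = \dfrac{\sum f_i m_i}{\sum f_i} = \dfrac{5620}{120} = 46.83$
Deviation method: $\bar{X} = A + \dfrac{\sum f_i d_i}{\sum f_i} = 55 + \dfrac{-980}{120} = 55 - 8.17 = 46.83$
Step-deviation method: $\bar{X} = A + \dfrac{\sum f_i d_i'}{\sum f_i} \times c = 55 + \dfrac{-98}{120} \times 10 = 46.83$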
2) PROBLEMS ON MEDIAN:
Raw Data :
Problem -1 Find the median for the following data; also calculate the Q1 and Q3 values.
X:  120  170  100  110  180  220  160
Solution: Arrange the given data in ascending order; n = 7.
Position  X
1         100
2         110 → Q1
3         120
4         160 → Q2
5         170
6         180 → Q3
7         220
$Q_2$ or $M_d = \left(\dfrac{n+1}{2}\right)$th term $= \left(\dfrac{7+1}{2}\right)$th term $= \left(\dfrac{8}{2}\right)$th term = 4th term = 160 ⟹ $M_d$ = 160
$Q_1 = \left(\dfrac{n+1}{4}\right)$th term $= \left(\dfrac{7+1}{4}\right)$th term $= \left(\dfrac{8}{4}\right)$th term = 2nd term = 110 ⟹ $Q_1$ = 110
$Q_3 = \left(\dfrac{3(n+1)}{4}\right)$th term $= \left(\dfrac{3(7+1)}{4}\right)$th term $= \left(\dfrac{24}{4}\right)$th term = 6th term = 180 ⟹ $Q_3$ = 180
Discrete Data:
Problem -1 Find the median for the following data; also calculate the Q1 and Q3 values.
X:  10  20  30  40  50  60
f:   5  15  25  20  10   5
Solution:
X    f    C.f
10   5    5
20   15   20
30   25   45
40   20   65
50   10   75
60   5    80
N = 80. In each case, take the value of X whose cumulative frequency is equal to or just greater than the required position.
$\dfrac{N}{4} = \dfrac{80}{4} = 20$ ⟹ $Q_1$ = 20
$\dfrac{N}{2} = \dfrac{80}{2} = 40$ ⟹ $M_d$ or $Q_2$ = 30
$\dfrac{3N}{4} = \dfrac{3(80)}{4} = 60$ ⟹ $Q_3$ = 40
Continuous Data
Problem -1 Find the median for the following data; also calculate the Q1 and Q3 values.
C.I:  0-10  10-20  20-30  30-40  40-50  50-60
f:    4     6      10     15     8      7
Solution: N = 50
C.I     f     C.f
0-10    4     4
10-20   6     10   (c.f. preceding the Q1 class)
20-30   10    20   (Q1 class; c.f. preceding the Q2 class)
30-40   15    35   (Q2 class; c.f. preceding the Q3 class)
40-50   8     43   (Q3 class)
50-60   7     50
$\dfrac{N}{4} = \dfrac{50}{4} = 12.5$, $\dfrac{N}{2} = \dfrac{50}{2} = 25$, $\dfrac{3N}{4} = \dfrac{3(50)}{4} = 37.5$
In each formula, L is the lower limit, f the frequency and m the preceding cumulative frequency of the class containing the quartile, and c is the class width.
$Q_1 = L_1 + \left[\dfrac{N/4 - m_1}{f_1}\right] \times c = 20 + \left[\dfrac{12.5 - 10}{10}\right] \times 10 = 20 + 2.5 = 22.5$
$Q_2 = L_2 + \left[\dfrac{N/2 - m_2}{f_2}\right] \times c = 30 + \left[\dfrac{25 - 20}{15}\right] \times 10 = 30 + 3.33 = 33.33$
$Q_3 = L_3 + \left[\dfrac{3N/4 - m_3}{f_3}\right] \times c = 40 + \left[\dfrac{37.5 - 35}{8}\right] \times 10 = 40 + 3.125 = 43.125$
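The three quartile calculations for continuous data follow one pattern, which can be written as a single function. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, class widths are equal, and `p` is the fraction of the total frequency (0.25, 0.5, 0.75 for Q1, Q2, Q3).

```python
# Illustrative sketch: function name is hypothetical; equal class widths assumed.

def grouped_quantile(lower_limits, width, freqs, p):
    """Return L + ((p*N - m) / f) * c for the class in which position p*N falls."""
    n_total = sum(freqs)
    target = p * n_total
    cumulative = 0                      # cumulative frequency before the current class (m)
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


lower_limits = [0, 10, 20, 30, 40, 50]
freqs = [4, 6, 10, 15, 8, 7]
q1 = grouped_quantile(lower_limits, 10, freqs, 0.25)
q2 = grouped_quantile(lower_limits, 10, freqs, 0.50)
q3 = grouped_quantile(lower_limits, 10, freqs, 0.75)
print(q1, round(q2, 2), q3)   # 22.5 33.33 43.125
```

The quartile deviation then follows as `(q3 - q1) / 2`.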
3) PROBLEMS ON MODE:
Raw Data :
Problem -1 Find the mode for the following data
0, 6, 1, 7, 2, 3, 7, 6, 6, 2, 6, 6, 5, 6, 0
Solution:
X    Tally     f
0    II        2
1    I         1
2    II        2
3    I         1
5    I         1
6    IIII I    6   → $M_O$
7    II        2
∴ MODE = 6
Discrete Data :
Problem -1 Find the mode for the following data
Height (in inches):  57  59  61  62  63  64  65  66  67  69
f:                   3   5   7   10  20  22  24  5   2   2
Solution:
Height (in inches)   f
57                   3
59                   5
61                   7
62                   10
63                   20
64                   22
65                   24   → $M_O$
66                   5
67                   2
69                   2
The maximum frequency is 24, which corresponds to a height of 65 inches.
∴ MODE = 65
Continuous Data :
Problem -1 Find the mode for the following data
C.I:  0-400  400-800  800-1200  1200-1600  1600-2000  2000-2400  2400-2800  2800-3200
f:    4      12       40        41         27         13         9          4
Solution:
C.I          f
0-400        4
400-800      12
800-1200     40   → $f_0$
1200-1600    41   → $f_1$ (modal class, L = 1200)
1600-2000    27   → $f_2$
2000-2400    13
2400-2800    9
2800-3200    4
$M_O = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times C = 1200 + \left[\dfrac{41 - 40}{2(41) - 40 - 27}\right] \times 400 = 1200 + \dfrac{400}{15} = 1200 + 26.67 = 1226.67$
4) Problems on Geometric Mean:
Raw Data :
Problem -1 Find the geometric mean for the following data
X:  2000  200  20  12  8
Solution:
X      log X
2000   3.3010
200    2.3010
20     1.3010
12     1.0792
8      0.9030
Total  Σ log X = 8.8852
G.M. $= \text{Antilog}\left[\dfrac{\sum \log X_i}{n}\right] = \text{Antilog}\left[\dfrac{8.8852}{5}\right] = \text{Antilog}[1.7770] = 59.84$
Discrete Data :
Problem -1 Find the geometric mean for the following data
X:  10  20  30  40  50  60
f:  15  18  22  16  12  7
Solution:
X      f     log X    f log X
10     15    1.0000   15.0000
20     18    1.3010   23.4180
30     22    1.4771   32.4962
40     16    1.6021   25.6336
50     12    1.6989   20.3868
60     7     1.7781   12.4467
Total  Σf = 90        Σ f log X = 129.3813
G.M. $= \text{Antilog}\left[\dfrac{\sum f_i \log X_i}{N}\right] = \text{Antilog}\left[\dfrac{129.3813}{90}\right] = \text{Antilog}[1.4376] = 27.39$
Continuous Data :
Problem -1 Find the geometric mean for the following data.
C.I:  15-20  20-25  25-30  30-35  35-40  40-45
f:    4      20     38     24     10     4
Solution:
C.I     f     m      log m    f log m
15-20   4     17.5   1.2430   4.9720
20-25   20    22.5   1.3521   27.0420
25-30   38    27.5   1.4390   54.6820
30-35   24    32.5   1.5118   36.2832
35-40   10    37.5   1.5740   15.7400
40-45   4     42.5   1.6283   6.5132
Total   Σf = 100              Σ f log m = 145.2324
G.M. $= \text{Antilog}\left[\dfrac{\sum f_i \log m_i}{N}\right] = \text{Antilog}\left[\dfrac{145.2324}{100}\right] = \text{Antilog}[1.4523] = 28.33$
5) Problems on Harmonic Mean:
Raw Data :
Problem -1 Calculate the harmonic mean for the following data
X:  200  300  20  12  8  0.8
Solution:
X      1/X
200    0.0050
300    0.0033
20     0.0500
12     0.0833
8      0.1250
0.8    1.2500
Total  Σ(1/X) = 1.5166
H.M. $= \dfrac{n}{\sum (1/X_i)} = \dfrac{6}{1.5166} = 3.956$
Discrete Data :
Problem -1 Calculate the harmonic mean for the following data
X:  24  26  30  42  17  11
f:  2   9   7   14  24  5
Solution:
X      f     f/X
24     2     0.083
26     9     0.346
30     7     0.233
42     14    0.333
17     24    1.412
11     5     0.455
Total  Σf = 61   Σ(f/X) = 2.862
H.M. $= \dfrac{\sum f_i}{\sum (f_i / X_i)} = \dfrac{61}{2.862} = 21.31$
Continuous Data :
Problem-1 Calculate the harmonic mean for the following data
C.I:  100-110  110-120  120-130  130-140  140-150
f:    12       18       25       22       18
Solution:
C.I       f     m     f/m
100-110   12    105   0.1143
110-120   18    115   0.1565
120-130   25    125   0.2000
130-140   22    135   0.1630
140-150   18    145   0.1241
Total     Σf = 95     Σ(f/m) = 0.7579
H.M. $= \dfrac{\sum f_i}{\sum (f_i / m_i)} = \dfrac{95}{0.7579} = 125.35$
Problems on Measures of Dispersion :
1) Problems on Range
Discrete Data :
Problem-1 Find the range for the following data
X:  12  12  14  15  16  17
f:  6   14  10  7   5   3
Solution: Range = L - S = 17 - 12 = 5
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{17 - 12}{17 + 12} = \dfrac{5}{29} = 0.1724$
Continuous Data
Problem-1 Find the range for the following data
C.I:  0-10  10-20  20-30  30-40  40-50  50-60  60-70
f:    5     8      12     20     15     7      3
Solution: Range = L - S = 70 - 0 = 70
Coefficient of Range $= \dfrac{L - S}{L + S} = \dfrac{70 - 0}{70 + 0} = \dfrac{70}{70} = 1$
2) Problems on Quartile Deviation :
Raw Data :
Problem-1 Find the quartile deviation for the following data
S.NO.:  1   2   3   4   5   6   7
Marks:  25  35  45  17  35  20  55
Solution:
S.NO.  Marks (X)  Ascending order
1      25         17
2      35         20 → Q1
3      45         25
4      17         35
5      35         35
6      20         45 → Q3
7      55         55
$Q_1 = \left(\dfrac{n+1}{4}\right)$th term $= \left(\dfrac{7+1}{4}\right)$th term = 2nd term = 20
$Q_3 = \left(\dfrac{3(n+1)}{4}\right)$th term $= \left(\dfrac{3(7+1)}{4}\right)$th term = 6th term = 45
Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{45 - 20}{2} = 12.5$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{45 - 20}{45 + 20} = \dfrac{25}{65} = 0.3846$
Discrete Data :
Problem-1 Find the quartile deviation for the following data
X:  30  20  40  50  10  60
f:  15  7   8   7   4   2
Solution: Arrange the values of X in ascending order and find the cumulative frequencies.
X (ascending)  f     Cumulative frequency (c.f.)
10             4     4
20             7     11 → Q1
30             15    26
40             8     34 → Q3
50             7     41
60             2     43
$\dfrac{N}{4} = \dfrac{43}{4} = 10.75 \cong 11$ ⟹ $Q_1$ = 20
$\dfrac{3N}{4} = \dfrac{3(43)}{4} = 32.25 \cong 32$ ⟹ $Q_3$ = 40
Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{40 - 20}{2} = 10$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{40 - 20}{40 + 20} = \dfrac{20}{60} = 0.3333$
Continuous Data :
Problem-1 Find the quartile deviation for the following data
C.I:  0-10  10-20  20-30  30-40  40-50  50-60  60-70
f:    4     8      10     16     11     7      3
Solution: N = 59
C.I     f     Cumulative frequency (c.f.)
0-10    4     4
10-20   8     12   → $m_1$
20-30   10    22   (Q1 class: $L_1$ = 20, $f_1$ = 10)
30-40   16    38   → $m_3$
40-50   11    49   (Q3 class: $L_3$ = 40, $f_3$ = 11)
50-60   7     56
60-70   3     59
$Q_1 = L_1 + \left[\dfrac{N/4 - m_1}{f_1}\right] \times C = 20 + \left[\dfrac{14.75 - 12}{10}\right] \times 10 = 20 + 2.75 = 22.75$
$Q_3 = L_3 + \left[\dfrac{3N/4 - m_3}{f_3}\right] \times C = 40 + \left[\dfrac{44.25 - 38}{11}\right] \times 10 = 40 + 5.68 = 45.68$
Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{45.68 - 22.75}{2} = 11.465$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{45.68 - 22.75}{45.68 + 22.75} = \dfrac{22.93}{68.43} = 0.3351$
3) Problems on Mean Deviation :
Raw Data:
Problem-1 Find the mean deviation for the following data
X:  7  4  10  9  15  12  7  9  7
Solution:
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{80}{9} = 8.9$
X (ascending)  |d| = |X - 8.9|
4              4.9
7              1.9
7              1.9
7              1.9
9              0.1
9              0.1
10             1.1
12             3.1
15             6.1
ΣX = 80        Σ|d| = 21.1
M.D. $= \dfrac{\sum |d_i|}{n} = \dfrac{21.1}{9} = 2.344$
Coefficient of M.D. $= \dfrac{\text{M.D.}}{\text{Mean}} = \dfrac{2.344}{8.9} = 0.2634$
Discrete Data :
Problem-1 Find the mean deviation for the following data
X:  10  15  20  30  40  50
f:  8   12  15  10  3   2
Solution:
X      f     fX     |d| = |X - 21.6|   f|d|
10     8     80     11.6               92.8
15     12    180    6.6                79.2
20     15    300    1.6                24.0
30     10    300    8.4                84.0
40     3     120    18.4               55.2
50     2     100    28.4               56.8
Total  N = 50   ΣfX = 1080             Σf|d| = 392
$\bar{X} = \dfrac{\sum f_i X_i}{N} = \dfrac{1080}{50} = 21.6$
M.D. $= \dfrac{\sum f_i |d_i|}{N} = \dfrac{392}{50} = 7.84$
Coefficient of M.D. $= \dfrac{\text{M.D.}}{\text{Mean}} = \dfrac{7.84}{21.6} = 0.3630$
Continuous Data :
Problem-1 Find the mean deviation for the following data
C.I:  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
f:    5     8      7      12     28     20     10     10
Solution:
C.I     f     m     fm      |d| = |m - 45|   f|d|
0-10    5     5     25      40               200
10-20   8     15    120     30               240
20-30   7     25    175     20               140
30-40   12    35    420     10               120
40-50   28    45    1260    0                0
50-60   20    55    1100    10               200
60-70   10    65    650     20               200
70-80   10    75    750     30               300
Total   N = 100     Σfm = 4500               Σf|d| = 1400
$\bar{X} = \dfrac{\sum f_i m_i}{N} = \dfrac{4500}{100} = 45$
M.D. $= \dfrac{\sum f_i |d_i|}{N} = \dfrac{1400}{100} = 14$
Coefficient of M.D. $= \dfrac{\text{M.D.}}{\text{Mean}} = \dfrac{14}{45} = 0.3111$
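The mean-deviation calculations for all three data types share one computation, sketched here in Python. This is an illustrative sketch; the function name is hypothetical, and for continuous data the class mid-values are passed as `values`. The sample call uses the continuous-data problem from these notes.

```python
# Illustrative sketch: function name is hypothetical.

def mean_deviation(values, freqs=None):
    """Return (mean, M.D. about the mean, coefficient of M.D.)."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n_total
    return mean, md, md / mean


mids = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]
mean, md, coeff = mean_deviation(mids, freqs)
print(mean, md, round(coeff, 4))   # 45.0 14.0 0.3111
```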
4) Problems on Standard Deviation :
Raw Data :
Problem-1 Find the standard deviation for the following data
X:  8  10  12  14  16  18  20  22  24  26
Solution: Take the assumed mean A = 16.
X        d = X - A   d²
8        -8          64
10       -6          36
12       -4          16
14       -2          4
16 → A   0           0
18       2           4
20       4           16
22       6           36
24       8           64
26       10          100
Total    Σd = 10     Σd² = 340
$\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{10}{10} = 1$, so $\bar{X} = A + \bar{d} = 16 + 1 = 17$
S.D. $(\sigma) = \sqrt{\dfrac{\sum d_i^2}{n} - \left(\dfrac{\sum d_i}{n}\right)^2} = \sqrt{\dfrac{340}{10} - (1)^2} = \sqrt{34 - 1} = \sqrt{33} = 5.7446$
C.V. $= \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{5.7446}{17} \times 100 = 33.79$
Discrete Data:
Problem-1 Find the standard deviation for the following data
X:  5  15  25  35  45  55  65
f:  3  10  20  30  15  12  10
Solution: Take the assumed mean A = 35.
X        f     d = X - A   fd      fd²
5        3     -30         -90     2700
15       10    -20         -200    4000
25       20    -10         -200    2000
35 → A   30    0           0       0
45       15    10          150     1500
55       12    20          240     4800
65       10    30          300     9000
Total    Σf = 100          Σfd = 200   Σfd² = 24,000
$\bar{d} = \dfrac{\sum f_i d_i}{N} = \dfrac{200}{100} = 2$, so $\bar{X} = A + \bar{d} = 35 + 2 = 37$
S.D. $(\sigma) = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} = \sqrt{\dfrac{24000}{100} - (2)^2} = \sqrt{240 - 4} = \sqrt{236} = 15.3623$
Continuous Data :
Problem-1 Find the standard deviation for the following data
C.I:  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
f:    5     8      7      12     28     20     10     10
Solution: Take the assumed mean A = 45.
C.I     f     m     d = m - A   fd      fd²
0-10    5     5     -40         -200    8000
10-20   8     15    -30         -240    7200
20-30   7     25    -20         -140    2800
30-40   12    35    -10         -120    1200
40-50   28    45    0           0       0
50-60   20    55    10          200     2000
60-70   10    65    20          200     4000
70-80   10    75    30          300     9000
Total   Σf = 100                Σfd = 0   Σfd² = 34,200
$\bar{X} = A + \dfrac{\sum f_i d_i}{N} = 45 + \dfrac{0}{100} = 45 + 0 = 45$
S.D. $(\sigma) = \sqrt{\dfrac{\sum f_i d_i^2}{N} - \left(\dfrac{\sum f_i d_i}{N}\right)^2} = \sqrt{\dfrac{34200}{100} - (0)^2} = \sqrt{342} = 18.4932$
C.V. $= \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{18.4932}{45} \times 100 = 41.096$
PERMUTATIONS:
Definition:
Each arrangement made by choosing r objects from among n objects is called a 'Permutation'. The total number of such arrangements is $^nP_r$, also written as P(n, r).
$^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n(n-1)(n-2)\cdots 1}{(n-r)(n-r-1)\cdots 1} = \dfrac{n!}{(n-r)!}$
NOTE: i) P(n, n) = n! ii) P(n, n-1) = P(n, n) = n!
PERMUTATIONS WITH REPETITIONS:
Suppose there are n objects. If repetitions are allowed, then the number of permutations taking r at a time is $n^r$.
i. The number of permutations of n objects, of which $r_1$ are alike of type 1, $r_2$ are alike of type 2 and the rest are different, is $\dfrac{n!}{r_1!\, r_2!}$
ii. The number of permutations of n objects, of which $r_1$ are alike of type 1, $r_2$ are alike of type 2, $r_3$ are alike of type 3 and the rest are different, is $\dfrac{n!}{r_1!\, r_2!\, r_3!}$
RESTRICTED PERMUTATIONS:
1. Suppose there are n objects and we have to arrange r of them such that s particular objects are never selected; then the number of permutations is $^{n-s}P_r$.
2. Suppose there are n objects and we have to arrange r of them such that s particular objects are always selected; then the number of permutations is $^{n-s}P_{r-s} \cdot {}^rP_s$.
CIRCULAR PERMUTATIONS:
The number of ways of seating n people in circular seats is $(n-1)!$
COMBINATIONS:
Definition:
A selection of r different objects from among n objects, where the order of selection is not important, is called a 'combination'.
If we select r objects, then the number of possible ways is
$^nC_r = C(n, r) = \dfrac{n!}{r!\,(n-r)!}$
NOTE:
i) If the order is important and repetitions are not allowed, then we can select r objects from among n objects in $\dfrac{n!}{(n-r)!}$ ways.
ii) The number of ways of placing n identical stones in r boxes such that there is at least one stone in each box is $C(n-1, n-r) = C(n-1, r-1)$.
iii) Suppose the set $A = (a_1, a_2, \ldots, a_n)$ and the counts $r_1, r_2, \ldots, r_n$ are given. The number of permutations of A in which each element $a_i$ is repeated $r_i$ times is $\dfrac{(r_1 + r_2 + \cdots + r_n)!}{r_1!\, r_2! \cdots r_n!}$
REPETITIONS ARE ALLOWED:
1) The number of combinations of r objects from among n objects, if repetitions are allowed and the order is not important, is $C(n+r-1, n-1) = \dfrac{(n+r-1)!}{r!\,(n-1)!}$
2) The number of ways of distributing n chocolates to r children so that each child gets at least one is $C(n-1, n-r)$.
3) The number of positive integer solutions of $X_1 + X_2 + \cdots + X_r = n$, that is, solutions with every $X_i > 0$, is $C(n-1, n-r)$.
PROBLEMS ON PERMUTATIONS:
PROBLEM -1: In how many ways can you arrange 9 different books such that a special book is in the 4th place?
SOLUTION: There are 9 books and one is fixed in the 4th place, so the remaining 8 books can be arranged in the other 8 places in 8! ways, i.e. 40,320 ways.
PROBLEM-2: How many different eight-digit numbers can be formed by arranging the digits 1, 1, 1, 1, 2, 3, 3, 3?
SOLUTION: The number of digits = 8. The digit 1 occurs 4 times, the digit 2 occurs 1 time and the digit 3 occurs 3 times.
The number of ways $= \dfrac{n!}{r_1!\, r_2!\, r_3!} = \dfrac{8!}{4!\, 1!\, 3!} = 280$ ways.
PROBLEMS ON COMBINATIONS:
PROBLEM-1: Find the number of permutations of the letters of the word CALCULUS.
SOLUTION: There are 8 letters in the word. The letters C, L and U are each repeated twice.
So the number of permutations is $\dfrac{8!}{2!\, 2!\, 2!} = 5040$
PROBLEM-2: How many possible committees of 6 people can be chosen from 15 men and 10 women, if at least 3 men and at least 2 women must be on each committee?
SOLUTION: Three men and three women = C(15, 3) × C(10, 3) = 54,600.
Four men and two women = C(15, 4) × C(10, 2) = 61,425.
The total number of possible ways = 54,600 + 61,425 = 1,16,025
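The counts in these problems can be checked with Python's standard `math` module. This is an illustrative sketch; `multiset_permutations` is a hypothetical helper name, and `math.comb` assumes Python 3.8 or later.

```python
import math

# Illustrative sketch: the helper name is hypothetical; math.comb needs Python 3.8+.

def multiset_permutations(counts):
    """Arrangements of objects with repeated types: (sum of counts)! / (c1! * c2! * ...)."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total


books = math.factorial(8)                          # 9 books, one fixed in 4th place
digits = multiset_permutations([4, 1, 3])          # digits 1,1,1,1,2,3,3,3
calculus = multiset_permutations([2, 2, 2, 1, 1])  # C, L, U twice; A, S once
committees = (math.comb(15, 3) * math.comb(10, 3)
              + math.comb(15, 4) * math.comb(10, 2))

print(books, digits, calculus, committees)         # 40320 280 5040 116025
```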
BAYES' THEOREM
Statement: Let $E_1, E_2, \ldots, E_n$ be n mutually exclusive events, and let A be an event that can occur only in combination with one of them. If the event A has occurred, then the probability that it was preceded by the particular event $E_i$ is
$P(E_i / A) = \dfrac{P(E_i)\, P(A / E_i)}{\sum_{j=1}^{n} P(E_j)\, P(A / E_j)}$
PROBLEMS ON BAYES' THEOREM
PROBLEM -1 In a bolt factory, machines A, B and C manufacture 20 %, 30 % and 50 % of the total output, and 6 %, 3 % and 2 % of their respective outputs are defective. A bolt is drawn at random and found to be defective. Find the probabilities that it was manufactured by i) Machine A ii) Machine B iii) Machine C.
SOLUTION: Let
A = the event that the bolt is manufactured by Machine A.
B = the event that the bolt is manufactured by Machine B.
C = the event that the bolt is manufactured by Machine C.
D = the event that the drawn bolt is defective.
P(A) = the probability that the bolt is manufactured by Machine A = 20/100
P(B) = the probability that the bolt is manufactured by Machine B = 30/100
P(C) = the probability that the bolt is manufactured by Machine C = 50/100
P(D/A) = the probability that the drawn bolt is defective, given that it is manufactured by Machine A = 6/100
P(D/B) = the probability that the drawn bolt is defective, given that it is manufactured by Machine B = 3/100
P(D/C) = the probability that the drawn bolt is defective, given that it is manufactured by Machine C = 2/100
i) If the drawn bolt is defective, then the probability that it is from machine A is
$P(A/D) = \dfrac{P(A)\,P(D/A)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{(0.20)(0.06)}{(0.20)(0.06) + (0.30)(0.03) + (0.50)(0.02)}$
$= \dfrac{0.012}{0.012 + 0.009 + 0.010} = \dfrac{0.012}{0.031} = 0.3871$
ii) If the drawn bolt is defective, then the probability that it is from machine B is
$P(B/D) = \dfrac{P(B)\,P(D/B)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{0.009}{0.012 + 0.009 + 0.010} = \dfrac{0.009}{0.031} = 0.2903$
iii) If the drawn bolt is defective, then the probability that it is from machine C is
$P(C/D) = \dfrac{P(C)\,P(D/C)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \dfrac{0.010}{0.012 + 0.009 + 0.010} = \dfrac{0.010}{0.031} = 0.3226$
PROBLEM -2 Urn A contains 3 red and 5 white marbles, Urn B contains 2 red and 1 white marbles, and Urn C contains 2 red and 3 white marbles. An urn is selected at random and a marble is drawn from it. If the marble is red, what is the probability that it came from Urn A?
SOLUTION: Let
A = the event of choosing Urn A.
B = the event of choosing Urn B.
C = the event of choosing Urn C.
R = the event that the drawn marble is red.
P(A) = the probability of selecting the 1st urn = 1/3
P(B) = the probability of selecting the 2nd urn = 1/3
P(C) = the probability of selecting the 3rd urn = 1/3
P(R/A) = the probability of drawing 1 red marble from Urn A $= {}^3C_1 / {}^8C_1 = 3/8$
P(R/B) = the probability of drawing 1 red marble from Urn B $= {}^2C_1 / {}^3C_1 = 2/3$
P(R/C) = the probability of drawing 1 red marble from Urn C $= {}^2C_1 / {}^5C_1 = 2/5$
From Bayes' theorem we have
P(A/R) = the probability that the marble came from Urn A, given that it is red
$= \dfrac{P(A)\,P(R/A)}{P(A)\,P(R/A) + P(B)\,P(R/B) + P(C)\,P(R/C)} = \dfrac{(1/3)(3/8)}{(1/3)(3/8) + (1/3)(2/3) + (1/3)(2/5)}$
$= \dfrac{1/8}{1/8 + 2/9 + 2/15} = \dfrac{0.125}{0.125 + 0.2222 + 0.1333} = \dfrac{0.125}{0.4806} = 0.2601$
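Both problems apply the same formula, so a single helper can reproduce them. The following is a minimal Python sketch; the function name `bayes_posteriors` is hypothetical, and the events are assumed to be mutually exclusive and exhaustive so that the denominator is the total probability of the observed event.

```python
# Illustrative sketch: function name is hypothetical.

def bayes_posteriors(priors, likelihoods):
    """Return P(E_i | A) = P(E_i) P(A | E_i) / sum_j P(E_j) P(A | E_j) for every i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                      # P(A), the total probability
    return [j / total for j in joint]


# Problem 1: machines A, B, C and a defective bolt
machines = bayes_posteriors([0.20, 0.30, 0.50], [0.06, 0.03, 0.02])
print([round(p, 4) for p in machines])      # [0.3871, 0.2903, 0.3226]

# Problem 2: urns A, B, C and a red marble
urns = bayes_posteriors([1 / 3, 1 / 3, 1 / 3], [3 / 8, 2 / 3, 2 / 5])
print(round(urns[0], 4))                    # 0.2601
```

The posteriors always sum to 1, which is a useful check on hand calculations.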
BINOMIAL DISTRIBUTION:
Definition: A random variable X is said to follow the Binomial distribution if it assumes only non-negative values and its probability mass function (p.m.f.) is as follows:
$P(X = x) = {}^nC_x\, p^x q^{n-x}$ ; x = 0, 1, 2, 3, ..., n ; q = 1 - p
$= 0$ ; otherwise
Examples:
1) The number of heads obtained in 3 tosses of a coin
2) The number of defectives in a lot of 10 items
3) The number of boys in a family of 4 children
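The p.m.f. above can be evaluated directly. This is an illustrative Python sketch; the function name is hypothetical, `math.comb` assumes Python 3.8 or later, and the fair coin (p = 0.5) in the sample call is an assumption added for the first example in the list.

```python
import math

# Illustrative sketch: function name is hypothetical; math.comb needs Python 3.8+.

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x) for x = 0..n, and 0 otherwise."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


# Number of heads in 3 tosses of a fair coin (p = 0.5 is assumed here)
print([binomial_pmf(x, 3, 0.5) for x in range(4)])   # [0.125, 0.375, 0.375, 0.125]
```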
POISSON DISTRIBUTION:
Definition: A random variable X is said to follow the Poisson distribution if it assumes only non-negative values and its probability mass function (p.m.f.) is given by
$P(X; \lambda) = P(X) = \dfrac{e^{-\lambda} \lambda^X}{X!}$ ; X = 0, 1, 2, ... ; λ > 0
$= 0$ ; otherwise
It is denoted by X ~ P(λ).
Examples:
1) The typing mistakes per page in a book
2) The number of accidents on a road in a particular time
3) The number of telephone calls received by an operator
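The Poisson p.m.f. is equally easy to evaluate. This is an illustrative Python sketch; the function name is hypothetical, and λ = 1.8 (an average of 1.8 accidents per day) is used as the illustrative parameter value.

```python
import math

# Illustrative sketch: function name is hypothetical.

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..., and 0 otherwise."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 1.8
at_least_one = 1 - poisson_pmf(0, lam)                   # P(X >= 1)
at_most_one = poisson_pmf(0, lam) + poisson_pmf(1, lam)  # P(X <= 1)
print(round(at_least_one, 4), round(at_most_one, 4))     # 0.8347 0.4628
```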
EXPONENTIAL DISTRIBUTION
Definition: A continuous random variable X is said to follow the exponential distribution with parameter θ if its probability density function is given by
$f(X) = \theta e^{-\theta X}$ ; X ≥ 0 ; θ > 0
$= 0$ ; otherwise
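Integrating this density gives $P(a < X < b) = e^{-\theta a} - e^{-\theta b}$ for $0 \le a \le b$, and the mean of the distribution is $1/\theta$. The following is a minimal Python sketch of both facts; the function names are hypothetical, and θ = 1/4 (a mean of 4 minutes) with the interval from 6 to 7 minutes is used as an illustrative case.

```python
import math

# Illustrative sketch: function names are hypothetical.

def exponential_pdf(x, theta):
    """f(x) = theta * e^(-theta * x) for x >= 0, and 0 otherwise."""
    return theta * math.exp(-theta * x) if x >= 0 else 0.0


def exponential_prob_between(a, b, theta):
    """P(a < X < b) = e^(-theta * a) - e^(-theta * b), for 0 <= a <= b."""
    return math.exp(-theta * a) - math.exp(-theta * b)


theta = 1 / 4                                               # mean = 1 / theta = 4
print(round(exponential_prob_between(6, 7, theta), 4))      # 0.0494
```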
NORMAL DISTRIBUTION
Definition: A random variable X is said to have a Normal distribution with parameters µ and σ if its probability density function is given by
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right\}$$ ; −∞ < x < ∞ ; −∞ < µ < ∞ ; σ > 0
STANDARD NORMAL VARIATE
If X ~ N (µ, σ²), then putting Z = (X − µ)/σ in the p.d.f. of the normal distribution gives
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$$ ; −∞ < z < ∞
where φ(z) = the p.d.f. of the standard normal variate, which satisfies $\int_{-\infty}^{\infty} \varphi(z)\,dz = 1$.
PROBLEMS ON BINOMIAL DISTRIBUTION:
PROBLEM -1
The probability of a defective bolt is 0.2. Find i) the mean and ii) the standard deviation of the number of defective bolts in a total of 400 bolts.
SOLUTION: Given that n = Number of trials = 400
p = Probability of success = Probability of getting a defective bolt = 0.2
q = 1 − p = 1 − 0.2 = 0.8
i) Mean = np = 400 (0.2) = 80
ii) Variance = npq = 400 (0.2) (0.8) = 64, so standard deviation = √npq = √64 = 8
PROBLEMS ON POISSON DISTRIBUTION:
PROBLEM -1
Average number of accidents on any day on a national highway is 1.8. Determine the probability that the number of accidents is i) at least one ii) at most one.
Solution: Given that mean = λ = 1.8
The p.m.f. of the Poisson distribution is
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!} = \dfrac{e^{-1.8}(1.8)^{x}}{x!}$ → ①
i) The probability that the number of accidents is at least one is
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-1.8}(1.8)^{0}}{0!} = 1 - e^{-1.8}$ = 1 − 0.1653 = 0.8347
ii) The probability that the number of accidents is at most one is
$P(X \le 1) = P(X = 0) + P(X = 1) = \dfrac{e^{-1.8}(1.8)^{0}}{0!} + \dfrac{e^{-1.8}(1.8)^{1}}{1!} = e^{-1.8}(1 + 1.8)$ = (0.1653) (2.8) = 0.4628
PROBLEMS ON EXPONENTIAL DISTRIBUTION:
PROBLEM -1
The time taken by a person while speaking over a telephone follows an exponential distribution with mean 4 minutes. Find
i) The probability that he speaks for more than 6 minutes but less than 7 minutes.
ii) Out of 6 calls he makes, the probability that exactly 2 calls take him more than 3 minutes.
iii) How many calls out of 100 are expected to take more than 3 minutes each?
Solution: Let X = the time taken (in minutes) per call. Given that X follows an exponential distribution with mean 1/θ = 4 minutes, we have θ = 1/4 and
$f(x) = \frac{1}{4} e^{-x/4}$ ; x ≥ 0 → ①
i) P (The time taken for one call is between 6 and 7 minutes)
$= P(6 < X < 7) = \int_{6}^{7} f(x)\,dx = \frac{1}{4}\int_{6}^{7} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{6}^{7} = -e^{-7/4} + e^{-6/4}$ = 0.04936
ii) P (The time taken for one call is more than 3 minutes)
$= P(X > 3) = \int_{3}^{\infty} f(x)\,dx = \frac{1}{4}\int_{3}^{\infty} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{3}^{\infty} = 0 + e^{-3/4}$ = 0.4724
Treating the 6 calls as independent, the number of calls longer than 3 minutes is binomial with n = 6 and p = 0.4724, so
P (exactly 2 of the 6 calls take more than 3 minutes) = $^{6}C_{2}\,(0.4724)^{2}(0.5276)^{4}$ ≈ 0.2594
iii) Expected number of calls out of 100 that will be longer than 3 minutes each = 100 × P (X > 3) = 100 (0.4724) = 47.24
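The three parts of the telephone-call problem can be re-computed numerically. This is a sketch only: it assumes the calls are independent for part (ii), and the helper name `exp_cdf` is illustrative.

```python
import math

theta = 1 / 4  # rate parameter; mean call length = 1 / theta = 4 minutes


def exp_cdf(x, theta):
    """P(X <= x) for an exponential variate with rate theta."""
    return 1 - math.exp(-theta * x) if x >= 0 else 0.0


# i) P(6 < X < 7)
p_between_6_and_7 = exp_cdf(7, theta) - exp_cdf(6, theta)
# P(X > 3), needed for parts ii) and iii)
p_more_than_3 = 1 - exp_cdf(3, theta)
# ii) exactly 2 of 6 calls exceed 3 minutes (calls assumed independent)
p_exactly_2_of_6 = math.comb(6, 2) * p_more_than_3**2 * (1 - p_more_than_3) ** 4
# iii) expected number of such calls out of 100
expected_out_of_100 = 100 * p_more_than_3

print(round(p_between_6_and_7, 5), round(p_more_than_3, 4),
      round(p_exactly_2_of_6, 4), round(expected_out_of_100, 2))
```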
PROBLEMS ON NORMAL DISTRIBUTION:
Problem -1
If X is a Normal variate with mean 30 and standard deviation 5, find the probabilities that
i) 26 ≤ X ≤ 40 ii) X ≥ 45
Solution: Given that Mean = µ = 30 and S.D. = σ = 5
i) When X = 26 ⟹ Z = (X − µ)/σ = (26 − 30)/5 = −4/5 = −0.8 = −Z1
When X = 40 ⟹ Z = (X − µ)/σ = (40 − 30)/5 = 10/5 = 2 = Z2
∴ P (26 ≤ X ≤ 40) = P (−0.8 ≤ Z ≤ 2) = P (0 ≤ Z ≤ 2) + P (0 ≤ Z ≤ 0.8)
(From the normal table we have P (0 ≤ Z ≤ 2) = 0.4772 & P (0 ≤ Z ≤ 0.8) = 0.2881)
= 0.4772 + 0.2881 = 0.7653 ⟹ P (26 ≤ X ≤ 40) = 0.7653
ii) When X = 45 ⟹ Z = (X − µ)/σ = (45 − 30)/5 = 15/5 = 3 = Z1
∴ P (X ≥ 45) = P (Z ≥ 3) = 0.5 − P (0 ≤ Z ≤ 3) = 0.5 − 0.49865 = 0.00135
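The table look-ups above can be cross-checked without a printed normal table. The sketch below uses the error function from Python's standard library to build the standard normal c.d.f.; the function names are illustrative, and small differences from the 4-digit table values are expected.

```python
import math


def std_normal_cdf(z):
    """P(Z <= z) for a standard normal variate, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_prob_between(lo, hi, mu, sigma):
    """P(lo <= X <= hi) for X ~ N(mu, sigma^2), by standardising to Z."""
    return std_normal_cdf((hi - mu) / sigma) - std_normal_cdf((lo - mu) / sigma)


mu, sigma = 30, 5
p_i = normal_prob_between(26, 40, mu, sigma)     # P(26 <= X <= 40)
p_ii = 1 - std_normal_cdf((45 - mu) / sigma)     # P(X >= 45)
print(round(p_i, 4), round(p_ii, 5))
```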
JOINT PROBABILITY MASS FUNCTION:
Definition: Let X and Y be 2 random variables defined on the same probability space S, with image sets X(S) = {x1, x2, ....., xi, ...., xn} and Y(S) = {y1, y2, ....., yj, ...., ym}. The product set
X(S) × Y(S) = {x1, x2, ....., xi, ...., xn} × {y1, y2, ....., yj, ...., ym}
consists of the ordered pairs (xi, yj). The probability of the ordered pair (xi, yj) is defined as P (X = xi, Y = yj) and is given by
$p_{ij} = P(X = x_i, Y = y_j) = P_{XY}(x_i, y_j) = P(x_i, y_j)$
Then P (xi, yj) is known as the joint probability mass function of X & Y. The values of P (xi, yj) can be represented in the following table.
X \ Y    y1     y2     y3     ....   yj     ....   ym     Total
x1       p11    p12    p13    ....   p1j    ....   p1m    p1.
x2       p21    p22    p23    ....   p2j    ....   p2m    p2.
x3       p31    p32    p33    ....   p3j    ....   p3m    p3.
...
xi       pi1    pi2    pi3    ....   pij    ....   pim    pi.
...
xn       pn1    pn2    pn3    ....   pnj    ....   pnm    pn.
Total    p.1    p.2    p.3    ....   p.j    ....   p.m    1
where $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij} = 1$
Marginal probability mass function:
Definition: Let (X, Y) be a bi-variate random variable and P (X, Y) be the probability mass function of the bi-variate random variable (X, Y).
The marginal probability mass function of X is denoted by P (X) or PX (x) and is given by
P (X) = PX (X = xi) = P (X = xi ∩ Y = y1) + P (X = xi ∩ Y = y2) + .... + P (X = xi ∩ Y = yj) + .... + P (X = xi ∩ Y = ym)
= P (xi, y1) + P (xi, y2) + ..... + P (xi, yj) + .... + P (xi, ym)
$= p_{i1} + p_{i2} + \dots + p_{ij} + \dots + p_{im} = \sum_{j=1}^{m} p_{ij} = \sum_{j=1}^{m} P(x_i, y_j) = p_{i.} = P_X(x_i)$
The marginal probability mass function of Y is denoted by P (Y) or PY (y) and is given by
P (Y) = PY (Y = yj) = P (X = x1 ∩ Y = yj) + P (X = x2 ∩ Y = yj) + .... + P (X = xi ∩ Y = yj) + .... + P (X = xn ∩ Y = yj)
= P (x1, yj) + P (x2, yj) + ..... + P (xi, yj) + .... + P (xn, yj)
$= p_{1j} + p_{2j} + \dots + p_{ij} + \dots + p_{nj} = \sum_{i=1}^{n} p_{ij} = \sum_{i=1}^{n} P(x_i, y_j) = p_{.j} = P_Y(y_j)$
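The row and column totals of a joint probability table give the two marginal p.m.f.s. The sketch below uses a small hypothetical 2 × 3 joint table (the values are invented purely for illustration and do not come from the notes) and exact fractions so that the totals can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint p.m.f. p[i][j] = P(X = x_i, Y = y_j); rows are x, columns are y.
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(3, 10), F(2, 10), F(1, 10)],
]


def marginal_x(joint):
    """p_i. = sum over j of p_ij (row totals)."""
    return [sum(row) for row in joint]


def marginal_y(joint):
    """p_.j = sum over i of p_ij (column totals)."""
    return [sum(col) for col in zip(*joint)]


px = marginal_x(joint)   # row totals: 2/5 and 3/5
py = marginal_y(joint)   # column totals: 2/5, 2/5 and 1/5
print(px, py, sum(px), sum(py))
```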
Matrix
Definition: A system of mn numbers (real or complex) arranged in the form of an ordered set of m rows, each row consisting of an ordered set of n numbers, enclosed between [ ] or ( ) or || ||, is called a matrix of order (or type) m×n.
Each of the mn numbers constituting the m×n matrix is called an element of the matrix.
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}]_{m \times n}$$ where 1 ≤ i ≤ m ; 1 ≤ j ≤ n
In relation to matrices we call the numbers scalars.
Operations of Matrices:
Equal matrices:
Definition: Two matrices A = $[a_{ij}]$ and B = $[b_{ij}]$ are said to be equal if and only if
i) A and B are of the same type ii) $a_{ij} = b_{ij}$ for every i & j
Multiplication of a matrix by a scalar
Definition: Let A be a matrix. The matrix obtained by multiplying every element of A by k, a scalar, is called the product of A by k and is denoted by kA or Ak.
If A = $[a_{ij}]_{m \times n}$ then kA = $[k\,a_{ij}]_{m \times n}$ = k $[a_{ij}]_{m \times n}$
Properties:
i) 0A = O (Null matrix), (−1) A = −A, called the negative of A
ii) $k_1(k_2 A) = (k_1 k_2) A = k_2(k_1 A)$ where $k_1, k_2$ are scalars.
iii) kA = O ⟹ A = O if k ≠ 0
iv) $k_1 A = k_2 A$ and A is not a null matrix ⟹ $k_1 = k_2$
Addition of matrices:
Definition: Let A = $[a_{ij}]_{m \times n}$ and B = $[b_{ij}]_{m \times n}$ be 2 matrices. The matrix C = $[c_{ij}]_{m \times n}$, where $c_{ij} = a_{ij} + b_{ij}$, is called the sum of the matrices A & B and is denoted by A + B.
Thus $[a_{ij}]_{m \times n} + [b_{ij}]_{m \times n} = [a_{ij} + b_{ij}]_{m \times n}$
Difference of matrices:
Definition: If A & B are matrices of the same type then A + (−B) is written A − B.
Matrix Multiplication:
Definition: Let A = $[a_{ik}]_{m \times n}$ and B = $[b_{kj}]_{n \times p}$ be 2 matrices. The matrix C = $[c_{ij}]_{m \times p}$, where $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, is called the product of the matrices A & B in that order, and we write C = AB.
Types of Matrices:
1) Square Matrix: If A = $[a_{ij}]_{m \times n}$ and m = n, then A is called a square matrix. A square matrix A of order (n×n) is sometimes called an "n-rowed matrix A".
Example: A = $\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ is a 2nd order matrix.
2) Rectangular Matrix: A matrix which is not a square matrix is called a rectangular matrix.
Example: A = $\begin{bmatrix} 1 & -1 & 2 \\ 2 & 3 & 4 \end{bmatrix}$ is a (2×3) matrix.
3) Row Matrix: A matrix of order (1×m) is called a row matrix.
Example: A = $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}_{(1 \times 3)}$
4) Column Matrix: A matrix of order (n×1) is called a column matrix.
Example: A = $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}_{(3 \times 1)}$
5) Unit Matrix: If A = $[a_{ij}]_{n \times n}$ is such that $a_{ij}$ = 1 for i = j and $a_{ij}$ = 0 for i ≠ j, then A is called a unit matrix. It is denoted by $I_n$.
Example: $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
6) Null Matrix (or) Zero Matrix: If A = $[a_{ij}]_{m \times n}$ is such that $a_{ij}$ = 0 ∀ i & j, then A is called a zero matrix or a null matrix. It is denoted by O.
Example: O = $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}_{(2 \times 3)}$
Definitions:
1) Diagonal Elements
Definition: In a matrix A = $[a_{ij}]_{n \times n}$, the elements $a_{ij}$ of A for which i = j (i.e. $a_{11}, a_{22}, ..., a_{nn}$) are called the diagonal elements of A.
2) Principal Diagonal
Definition: The line along which the diagonal elements lie is called the principal diagonal of A.
3) Diagonal Matrix
Definition: A square matrix all of whose elements except those in the leading diagonal are zero is called a diagonal matrix. If $d_1, d_2, ....., d_n$ are the diagonal elements of a diagonal matrix A, then A is written as A = diag ($d_1, d_2, ....., d_n$)
Example: A = diag (3, 1, −2) = $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$
4) Scalar Matrix:
Definition: A diagonal matrix whose diagonal elements are all equal is called a "scalar matrix".
Example: A = $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
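The matrix product and the diagonal, unit and scalar matrices defined above can be illustrated with plain Python lists. This is a sketch only: the helper names `mat_mul` and `diag` are illustrative, and the matrix A is the (2×3) rectangular example used earlier in this section.

```python
def mat_mul(A, B):
    """Product C = AB with c_ij = sum over k of a_ik * b_kj (A is m x n, B is n x p)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


def diag(*d):
    """Diagonal matrix diag(d1, ..., dn): zero everywhere off the leading diagonal."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, -1, 2], [2, 3, 4]]   # the (2 x 3) example matrix
I3 = diag(1, 1, 1)            # unit matrix of order 3
S = diag(3, 3, 3)             # scalar matrix with diagonal element 3

print(mat_mul(A, I3))  # [[1, -1, 2], [2, 3, 4]]
print(mat_mul(A, S))   # [[3, -3, 6], [6, 9, 12]]
```

Multiplying by the unit matrix leaves A unchanged, and multiplying by the scalar matrix has the same effect as the scalar multiple 3A.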
CURVE FITTING
Types of Curve Fitting:
Fitting of Straight Line
Fitting of Second degree parabola
Fitting of Exponential Curve
Fitting of Power Curve
1) Fitting of Straight Line:
Let us consider the fitting of a straight line
Y = a + b X → ①
to a set of n points $(x_i, y_i)$; i = 1, 2, ...., n. Equation ① represents a family of straight lines for different values of the arbitrary constants a and b. The problem is to determine a and b so that the line is the line of best fit.
The best fit can be obtained with Legendre's principle of least squares, which consists in minimising the sum of squares of the deviations of the actual values of y from their estimated values as given by the line of best fit.
Let $P_i(x_i, y_i)$ be any general point in the scatter diagram. Draw $P_i M$ perpendicular to the X axis, meeting the line in $H_i$. Since $H_i$ lies on the straight line, its ordinate is $a + b x_i$. Hence the co-ordinates of $H_i$ are $[x_i, (a + b x_i)]$.
$P_i H_i = P_i M - H_i M = y_i - (a + b x_i)$ ⟹ $e_i = y_i - (a + b x_i)$ → ②
Here $e_i$ is called the error of estimate or "residual" of $y_i$. According to the principle of least squares, we have to determine a & b so that
$E = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} (y_i - a - b x_i)^{2}$ is minimum → ③
From the principle of maxima and minima, we take the partial derivatives of E w.r.to a & b and equate them to zero, i.e. $\frac{\partial E}{\partial a} = 0$ and $\frac{\partial E}{\partial b} = 0$.
$\frac{\partial E}{\partial a} = 0$ ⟹ $\frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i)^{2} = 0$
⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i)(-1) = 0$
⟹ $\sum_{i=1}^{n} (y_i - a - b x_i) = 0$ ⟹ $\sum_{i=1}^{n} y_i - n a - b \sum_{i=1}^{n} x_i = 0$
⟹ $\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$ → ④
$\frac{\partial E}{\partial b} = 0$ ⟹ $\frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^{2} = 0$
⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i)(-x_i) = 0$
⟹ $\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^{2} = 0$
⟹ $\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^{2}$ → ⑤
Normal Equations: The normal equations for the straight line are
$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$ → ④
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^{2}$ → ⑤
Solving these normal equations gives the values of a & b. Substituting these values in equation ① gives the line of best fit to the given set of points $(x_i, y_i)$, i = 1, 2, ...., n.
The line of best fit to the given set of n points is Y = a + b X
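The two normal equations for the straight line can be solved directly for a and b. The sketch below does this by Cramer's rule; the function name `fit_line` is illustrative, and the data are those of the straight-line problem worked in the problems part of these notes.

```python
def fit_line(xs, ys):
    """Solve the normal equations
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    for the intercept a and slope b of Y = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


xs = [1, 2, 3, 4, 6, 8]
ys = [2.4, 3, 3.6, 4, 5, 6]
a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))  # about 1.9765 and 0.5059
```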
2. FITTING OF SECOND DEGREE PARABOLA:-
Let $Y = a + bX + cX^{2}$ → ①
be a 2nd degree parabola to be fitted to the given set of observations $(x_i, y_i)$ (i = 1, 2, 3, ....., n).
According to the principle of least squares, to determine the constants a, b, c consider the residual
$e_i = y_i - \hat{y}_i$ → ② where $\hat{y}_i = a + b x_i + c x_i^{2}$
$e_i = y_i - (a + b x_i + c x_i^{2})$ → ③
Squaring both sides of eq ③ and summing over i,
$E = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})^{2}$ → ④
Taking partial derivatives w.r.to the parameters a, b, c and equating them to 0, we get the "normal equations".
The normal equations for the second degree parabola are obtained as follows.
$\frac{\partial E}{\partial a} = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-1) = 0$ ⟹ $\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2}) = 0$
⟹ $\sum_{i=1}^{n} y_i - n a - b \sum_{i=1}^{n} x_i - c \sum_{i=1}^{n} x_i^{2} = 0$
⟹ $\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} x_i^{2}$ → ⑤
$\frac{\partial E}{\partial b} = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-x_i) = 0$ ⟹ $\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})\,x_i = 0$
⟹ $\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^{2} - c \sum_{i=1}^{n} x_i^{3} = 0$
⟹ $\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^{2} + c \sum_{i=1}^{n} x_i^{3}$ → ⑥
$\frac{\partial E}{\partial c} = 0$ ⟹ $2 \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-x_i^{2}) = 0$ ⟹ $\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})\,x_i^{2} = 0$
⟹ $\sum_{i=1}^{n} x_i^{2} y_i - a \sum_{i=1}^{n} x_i^{2} - b \sum_{i=1}^{n} x_i^{3} - c \sum_{i=1}^{n} x_i^{4} = 0$
⟹ $\sum_{i=1}^{n} x_i^{2} y_i = a \sum_{i=1}^{n} x_i^{2} + b \sum_{i=1}^{n} x_i^{3} + c \sum_{i=1}^{n} x_i^{4}$ → ⑦
NORMAL EQUATIONS OF SECOND DEGREE PARABOLA
$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} x_i^{2}$
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^{2} + c \sum_{i=1}^{n} x_i^{3}$
$\sum_{i=1}^{n} x_i^{2} y_i = a \sum_{i=1}^{n} x_i^{2} + b \sum_{i=1}^{n} x_i^{3} + c \sum_{i=1}^{n} x_i^{4}$
After solving these normal equations we get the estimated values of a, b, c. Substituting these estimated values in eq ① gives the "best fit" for the given set of data:
$Y = a + b x + c x^{2}$
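The three normal equations of the second degree parabola form a 3 × 3 linear system in a, b and c. The sketch below builds that system and solves it by Gauss-Jordan elimination; the helper names `solve_linear` and `fit_parabola` are illustrative, and the data are those of the parabola problem worked in the problems part of these notes.

```python
def solve_linear(M, v):
    """Solve M x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [[float(c) for c in row] + [float(rhs)] for row, rhs in zip(M, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system of normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola(xs, ys):
    """Fit Y = a + bX + cX^2 by solving the three normal equations."""
    sx = lambda p: sum(x**p for x in xs)                       # sum of x^p
    sxy = lambda p: sum((x**p) * y for x, y in zip(xs, ys))    # sum of x^p * y
    M = [[len(xs), sx(1), sx(2)],
         [sx(1), sx(2), sx(3)],
         [sx(2), sx(3), sx(4)]]
    v = [sxy(0), sxy(1), sxy(2)]
    return solve_linear(M, v)


a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
print(round(a, 2), round(b, 2), round(c, 2))  # 1.42 -1.07 0.55
```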
3. FITTING OF EXPONENTIAL CURVE $Y = ab^{x}$
Let $Y = ab^{x}$ → ①
Taking logarithms on both sides we get
$\log y = \log(a \cdot b^{x}) = \log a + \log b^{x} = \log a + x \log b$
[∵ $\log x^{m} = m \log x$ and $\log(m \cdot n) = \log m + \log n$]
U = A + Bx → ②
Where U = log y, A = log a, B = log b
This is a linear equation in x and U
The normal equations for estimating A & B are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^{2}$ → ④
After solving these normal equations we get the values of A & B. Finally we get the values of a and b as follows
a = Anti log (A)
b = Anti log (B)
Substituting these values of a & b in eq ① gives the "best fit" to the given set of n points.
The best fit of the required equation is $y = ab^{x}$
4. FITTING OF EXPONENTIAL CURVE $Y = ae^{bx}$
Let $Y = ae^{bx}$ → ①
Taking logarithms (to base 10) on both sides of eq ①, we get
$\log y = \log[ae^{bx}]$
$\log y = \log a + \log e^{bx}$
$\log y = \log a + bx \log e$
$\log y = \log a + x\,[b \log e]$
U = A + Bx → ②
Where U = log y, A = log a, B = b log e
This is a linear equation in x and U
The normal equations are:-
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^{2}$ → ④
From these we find A and B, and consequently
a = Anti log (A) and B = b log e ⟹ $b = \dfrac{B}{\log e} = \dfrac{B}{0.4343}$
The best fit to the given set of n points is
$y = ae^{bx}$
5. FITTING OF A POWER CURVE $Y = ax^{b}$
Let $y = ax^{b}$ → ①
Taking logarithms on both sides of eq ①, we get
$\log y = \log[ax^{b}]$
$\log y = \log a + \log[x^{b}]$
$\log y = \log a + b \log x$
U = A + Bv → ②
Where U = log y, A = log a, B = b, v = log x
This is a linear equation in v and U
The normal equations are
$\sum U = nA + B \sum v$ → ③
$\sum Uv = A \sum v + B \sum v^{2}$ → ④
From these we find A and B, and consequently
a = Anti log (A), b = B
The best fit to the given set of n points is $y = ax^{b}$
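The log-transformation approach for the power curve can be sketched as follows: take logs of both x and y, fit a straight line to the transformed points, and transform the intercept back. The helper names are illustrative, the data are those of the power-curve problem worked in the problems part of these notes, and because the code uses full-precision logarithms the results differ slightly from a hand calculation with 4-figure log tables.

```python
import math


def fit_line(us, vs):
    """Least-squares line vs = A + B*us, from the two normal equations."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    det = n * suu - su * su
    return (sv * suu - su * suv) / det, (n * suv - su * sv) / det


def fit_power_curve(xs, ys):
    """Fit y = a * x**b via U = A + B*v with U = log10(y), v = log10(x)."""
    v = [math.log10(x) for x in xs]
    U = [math.log10(y) for y in ys]
    A, B = fit_line(v, U)
    return 10**A, B   # a = antilog(A), b = B


xs = [1, 2, 3, 4, 5, 6]
ys = [6.2, 8.3, 15.4, 33.1, 65.2, 127.4]
a, b = fit_power_curve(xs, ys)
print(round(a, 3), round(b, 3))  # roughly 3.91 and 1.675
```

Note that this minimises the squared residuals of log y rather than of y itself, which is the same simplification the normal equations above make.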
1. PROBLEMS ON FITTING OF STRAIGHT LINE:
Problem - 1 Fit a straight line to the following data.
X    1      2      3      4      6      8
Y    2.4    3      3.6    4      5      6
Solution:
The straight line equation is
Y = a + b X → ①
The normal equations for the straight line are
$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$ → ②
$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^{2}$ → ③
X      Y      X²     XY
1      2.4    1      2.4
2      3.0    4      6.0
3      3.6    9      10.8
4      4.0    16     16.0
6      5.0    36     30.0
8      6.0    64     48.0
ΣX = 24   ΣY = 24   ΣX² = 130   ΣXY = 113.2
From the above table we have n = 6, ΣX = 24, ΣY = 24, ΣX² = 130, ΣXY = 113.2
24 = 6 (a) + b (24) → ④
113.2 = a (24) + b (130) → ⑤
Multiplying ④ by 4 and subtracting it from ⑤:
24 (a) + 96 (b) = 96
24 (a) + 130 (b) = 113.2
34 (b) = 17.2 ⟹ b = 17.2/34 ⟹ b = 0.5059
Substituting b in eq ④ ⟹ 6 (a) + 24 (0.5059) = 24 ⟹ 6 (a) + 12.1416 = 24 ⟹ 6 (a) = 24 − 12.1416 ⟹ 6 (a) = 11.8584 ⟹ a = 11.8584/6 ⟹ a = 1.9764
∴ a = 1.9764 & b = 0.5059
Hence the required equation of the straight line is
Y = a + b X ⟹ Y = 1.9764 + (0.5059) X
Problems on second degree parabola:
Problem -1 Fit a parabola of second degree to the following data.
X    0      1      2      3      4
Y    1      1.8    1.3    2.5    6.3
Solution:
X      Y      X²     X³     X⁴     XY     X²Y
0      1      0      0      0      0      0
1      1.8    1      1      1      1.8    1.8
2      1.3    4      8      16     2.6    5.2
3      2.5    9      27     81     7.5    22.5
4      6.3    16     64     256    25.2   100.8
ΣX = 10   ΣY = 12.9   ΣX² = 30   ΣX³ = 100   ΣX⁴ = 354   ΣXY = 37.1   ΣX²Y = 130.3
From the table we have n = 5, ΣX = 10, ΣY = 12.9, ΣX² = 30, ΣX³ = 100, ΣX⁴ = 354, ΣXY = 37.1, ΣX²Y = 130.3
The second degree parabola equation is
$Y = a + bX + cX^{2}$ → ①
The normal equations for the 2nd degree parabola are
$\sum y_i = n a + b \sum x_i + c \sum x_i^{2}$ → ②
$\sum x_i y_i = a \sum x_i + b \sum x_i^{2} + c \sum x_i^{3}$ → ③
$\sum x_i^{2} y_i = a \sum x_i^{2} + b \sum x_i^{3} + c \sum x_i^{4}$ → ④
⟹ 12.9 = 5 (a) + b (10) + c (30) → ⑤
⟹ 37.1 = a (10) + b (30) + c (100) → ⑥
⟹ 130.3 = a (30) + b (100) + c (354) → ⑦
From ⑤ & ⑥ (multiplying ⑤ by 2 and subtracting it from ⑥):
10 (a) + 20 (b) + 60 (c) = 25.8
10 (a) + 30 (b) + 100 (c) = 37.1
⟹ 10 (b) + 40 (c) = 11.3 → ⑧
From ⑥ & ⑦ (multiplying ⑥ by 3 and subtracting it from ⑦):
30 (a) + 90 (b) + 300 (c) = 111.3
30 (a) + 100 (b) + 354 (c) = 130.3
⟹ 10 (b) + 54 (c) = 19 → ⑨
From ⑧ & ⑨ (subtracting ⑧ from ⑨):
14 (c) = 7.7 ⟹ c = 7.7/14 = 0.55 ⟹ c = 0.55
Substituting c = 0.55 in eq ⑧:
10 (b) + 40 (0.55) = 11.3 ⟹ 10 (b) + 22 = 11.3 ⟹ 10 (b) = 11.3 − 22 ⟹ 10 (b) = −10.7 ⟹ b = −10.7/10 = −1.07
Substituting b = −1.07 & c = 0.55 in eq ⑤:
5 (a) + 10 (−1.07) + 30 (0.55) = 12.9 ⟹ 5 a − 10.7 + 16.5 = 12.9 ⟹ 5 a = 12.9 + 10.7 − 16.5 ⟹ 5 a = 23.6 − 16.5 ⟹ 5 a = 7.1 ⟹ a = 7.1/5 = 1.42 ⟹ a = 1.42
∴ a = 1.42, b = −1.07, c = 0.55
Thus the required equation of the second degree parabola is $Y = a + bX + cX^{2}$ ⟹ $Y = 1.42 - 1.07\,X + 0.55\,X^{2}$
PROBLEMS ON POWER CURVE $Y = ax^{b}$:
Problem - 1 For the given data fit a power curve of the type $Y = ax^{b}$
X    1      2      3      4      5      6
Y    6.2    8.3    15.4   33.1   65.2   127.4
Solution:
X    Y       U = log Y    V = log X    V²        UV
1    6.2     0.7924       0            0         0
2    8.3     0.9191       0.3010       0.0906    0.2766
3    15.4    1.1875       0.4771       0.2276    0.5665
4    33.1    1.5198       0.6020       0.3624    0.9149
5    65.2    1.8142       0.6990       0.4886    1.2681
6    127.4   2.1052       0.7781       0.6054    1.6380
Total        ΣU = 8.3382  ΣV = 2.8572  ΣV² = 1.7746  ΣUV = 4.6641
Let the power curve be $Y = ax^{b}$ → ①
Taking logarithms on both sides, we get
$\log y = \log[ax^{b}]$ ⟹ $\log y = \log a + b \log x$ ⟹ U = A + B v → ②
where U = log y, A = log a, B = b, v = log x
The normal equations are
$\sum U = nA + B \sum v$ → ③
$\sum Uv = A \sum v + B \sum v^{2}$ → ④
8.3382 = 6 (A) + B (2.8572) → ⑤
4.6641 = A (2.8572) + B (1.7746) → ⑥
Solving these equations by cross-multiplication we get
$\dfrac{A}{(2.8572)(-4.6641) - (-8.3382)(1.7746)} = \dfrac{B}{(-8.3382)(2.8572) - 6(-4.6641)} = \dfrac{1}{6(1.7746) - (2.8572)(2.8572)}$
⟹ $\dfrac{A}{-13.326 + 14.7970} = \dfrac{B}{-23.8239 + 27.9846} = \dfrac{1}{10.6476 - 8.1636}$
⟹ $\dfrac{A}{1.471} = \dfrac{B}{4.1607} = \dfrac{1}{2.484}$
⟹ A = 1.471/2.484 = 0.5921 ⟹ B = 4.1607/2.484 = 1.675
⟹ a = Anti log (A) = Anti log (0.5921) = 3.9093 ⟹ a = 3.9093
⟹ b = B = 1.675 ⟹ b = 1.675
Substituting a & b in equation ① we get the best fit of the power curve.
Hence for the given data, the fitted power curve is
⟹ $Y = aX^{b}$ ⟹ $Y = (3.9093)\,X^{1.675}$
PROBLEMS ON EXPONENTIAL CURVE $Y = ae^{bx}$
Problem -2 Fit an exponential curve of the form $Y = ae^{bx}$ for the following data
X    1      2      3      4      5      6
Y    1.4    4.1    13.2   39.3   125    303
Solution:
X    Y       U = log Y    X²     XU
1    1.4     0.1461       1      0.1461
2    4.1     0.6128       4      1.2256
3    13.2    1.1206       9      3.3618
4    39.3    1.5944       16     6.3776
5    125     2.0969       25     10.4845
6    303     2.4814       36     14.8884
ΣX = 21   ΣU = 8.0522   ΣX² = 91   ΣXU = 36.484
The exponential curve is $Y = ae^{bx}$ → ①
Taking logarithms (to base 10) on both sides
⟹ $\log y = \log[ae^{bx}]$ ⟹ $\log y = \log a + \log e^{bx}$ ⟹ $\log y = \log a + bx \log e$
⟹ $\log y = \log a + x\,[b \log e]$ ⟹ U = A + B X → ②
Where U = log y, A = log a, B = b log e
The normal equations are:-
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^{2}$ → ④
From the table we have
ΣX = 21, ΣU = 8.0522, ΣX² = 91, ΣXU = 36.484
8.0522 = 6 (A) + B (21) → ⑤
36.484 = 21 (A) + B (91) → ⑥
Solving ⑤ and ⑥ by cross-multiplication:
$\dfrac{A}{(21)(-36.484) - (-8.0522)(91)} = \dfrac{B}{(-8.0522)(21) - 6(-36.484)} = \dfrac{1}{6(91) - (21)(21)}$
⟹ $\dfrac{A}{-766.164 + 732.7502} = \dfrac{B}{-169.0962 + 218.904} = \dfrac{1}{546 - 441}$
⟹ $\dfrac{A}{-33.4138} = \dfrac{B}{49.8078} = \dfrac{1}{105}$
⟹ A = −33.4138/105 = −0.3182 ⟹ B = 49.8078/105 = 0.4744
a = Anti log (A) = Anti log (−0.3182) = 0.4806
$b = \dfrac{B}{\log_{10} e} = \dfrac{B}{0.4343} = \dfrac{0.4744}{0.4343} = 1.0923$
Substituting a = 0.4806 & b = 1.0923 in equation ①, we get the best fit of the given curve.
Hence for the given data the fitted exponential curve is
⟹ $Y = ae^{bX}$ ⟹ $Y = (0.4806)\,e^{1.0923\,X}$
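The exponential fit above can be cross-checked numerically. The sketch below uses natural logarithms instead of base-10 logarithms, so that ln y = ln a + b x and the slope of the fitted line is b directly, with no division by log e; this is an equivalent reformulation, not the method used in the worked solution. The helper names are illustrative, and full-precision logs give values that differ from the hand calculation only in the later decimals.

```python
import math


def fit_line(xs, ys):
    """Least-squares line ys = A + B*xs, from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) via ln(y) = ln(a) + b*x."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


xs = [1, 2, 3, 4, 5, 6]
ys = [1.4, 4.1, 13.2, 39.3, 125, 303]
a, b = fit_exponential(xs, ys)
print(round(a, 3), round(b, 3))  # roughly 0.481 and 1.092
```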
PROBLEMS ON FITTING OF EXPONENTIAL CURVE $Y = ab^{x}$
Problem -1 Fit an exponential curve of the form $Y = ab^{x}$ for the following data
X    1      2      3      4      5      6      7      8
Y    1.0    1.2    1.8    2.5    3.6    4.7    6.6    9.1
Solution: Let $Y = ab^{x}$ → ①
Taking logarithms on both sides we get
$\log y = \log(a \cdot b^{x}) = \log a + \log b^{x} = \log a + x \log b$
U = A + Bx → ②
Where U = log y, A = log a, B = log b
The normal equations for estimating A & B are
$\sum U = nA + B \sum x$ → ③
$\sum xU = A \sum x + B \sum x^{2}$ → ④
X    Y      U = log Y    XU        X²
1    1.0    0            0         1
2    1.2    0.0792       0.1584    4
3    1.8    0.2553       0.7659    9
4    2.5    0.3979       1.5916    16
5    3.6    0.5563       2.7815    25
6    4.7    0.6721       4.0326    36
7    6.6    0.8195       5.7365    49
8    9.1    0.9590       7.6720    64
ΣX = 36   ΣY = 30.5   ΣU = 3.7393   ΣXU = 22.7385   ΣX² = 204
From the above table we have
n = 8, ΣX = 36, ΣY = 30.5, ΣU = 3.7393, ΣXU = 22.7385, ΣX² = 204
3.7393 = 8 (A) + B (36) → ⑥, multiplied by 36 ⟹ 288 (A) + 1296 (B) = 134.6148
22.7385 = A (36) + B (204) → ⑦, multiplied by 8 ⟹ 288 (A) + 1632 (B) = 181.908
Subtracting: 336 (B) = 47.2932
⟹ B = 47.2932/336 ⟹ B = 0.1408
Substituting B in equation ⑥ ⟹ 8 (A) + 36 (0.1408) = 3.7393 ⟹ 8 (A) + 5.0688 = 3.7393 ⟹ 8 (A) = 3.7393 − 5.0688 ⟹ 8 (A) = −1.3295 ⟹ A = −1.3295/8 ⟹ A = −0.1662
⟹ a = Anti log (A) ⟹ a = Anti log (−0.1662) = 0.6821 ⟹ a = 0.6821
⟹ b = Anti log (B) ⟹ b = Anti log (0.1408) = 1.383 ⟹ b = 1.383
Hence for the given data the fitted curve is $Y = (0.6821)\,(1.383)^{X}$
CORRELATION
Uni-variate Distribution
Bi-variate Distribution
Multi-variate Distribution
1. Uni-variate Distribution: The distribution involving only one variable is called "uni-variate distribution".
Example: The heights of a certain group of persons.
2. Bi-variate Distribution: The distribution involving only 2 variables is called "bi-variate distribution".
Example: The heights and weights of a certain group of persons.
3. Multi-variate Distribution: The distribution involving more than 2 variables is called "multi-variate distribution".
Correlation:
Definition 1 If a change in one variable is accompanied by a change in the other variable, then the variables are said to be "correlated variables".
Definition 2 Correlation is an analysis of the 'co-variation' between 2 or more variables.
Types of Correlation:
Positive Correlation (or) Direct Correlation
Negative Correlation (or) Inverse Correlation
Perfect Correlation
1) Positive Correlation:
Definition 1 If the variables deviate in the same direction then the variables are said to have "positive correlation".
Definition 2 In other words, if an increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the other variable, then the variables are said to be "directly correlated variables".
Examples: 1) Price & supply of goods. 2) Income & expenditure of a group of persons.
2) Negative Correlation:
Definition 1 If the variables deviate in opposite directions then the variables are said to have "negative correlation".
Definition 2 In other words, if an increase in the value of one variable is accompanied by a decrease in the value of the other variable, or a decrease in the value of one variable is accompanied by an increase in the other variable, then the variables are said to be "inversely correlated variables".
Examples: 1) Volume & pressure of a perfect gas. 2) Price & demand of goods.
3) Perfect Correlation:
Definition: If the deviation in one variable is followed by a corresponding and proportional deviation in the other variable, then the variables are said to be "perfectly correlated variables".
Linear Correlation:
Definition: If the 'ratio' of the change is 'uniform', then there will be "linear correlation" between the variables. If we plot these on the graph then we get a 'straight line'.
Example: We can see that the ratio of the change between the variables is the same (A increases by 5 each time B increases by 6).
A    2    7    12    17
B    3    9    15    21
Non-linear Correlation:
Definition: If the amount of change of one variable does not bear a constant ratio to the amount of change in the other variable, then the correlation is called "non-linear correlation". Non-linear correlation is also called 'curvilinear correlation'.
Uses (or) Applications of Correlation:
1) Correlation is a measure of the extent of the relation between 2 variables.
2) By using the correlation coefficient we can predict the future.
3) The correlation coefficient contributes to the understanding of economic behaviour.
4) By using the correlation coefficient we can estimate the value of one variable if the value of another variable is given.
Perfect Linear Correlation:
Definition: If all the points lie exactly on a "straight line", then the correlation is said to be "perfect linear correlation".
Perfect Positive Correlation:
Definition: If the correlation is linear and the line runs from the lower left hand corner to the upper right hand corner, then the correlation is called "perfect positive correlation". It is denoted by r = +1.
Perfect Negative Correlation:
Definition: If the correlation is linear and the line runs from the upper left hand corner to the lower right hand corner, then the correlation is called "perfect negative correlation". It is denoted by r = −1.
No Correlation:
If the plotted points lie scattered all over the graph paper, then there is no correlation between the 2 variables, and the variables are said to be "uncorrelated". If r = 0 the variables X & Y are uncorrelated; independent variables always have r = 0, but r = 0 alone does not guarantee independence.
[Figure: scatter diagrams illustrating perfect +ve correlation, perfect −ve correlation, and no correlation]
Methods of Studying Correlation:
There are 2 different methods for finding out the relationship between the variables.
1) Graphical Method 2) Mathematical Method
1) Graphical Method:
a) Scatter Diagram b) Scatter gram
2) Mathematical Method:
a) Karl Pearson's Correlation Coefficient.
b) Spearman's Rank Correlation.
c) Coefficient of Concurrent Deviation.
d) Method of Least Squares.
Mathematical Method:
a) Karl Pearson's Correlation Coefficient:
As a measure of the 'intensity' or 'degree' of linear relationship between 2 variables, Karl Pearson, a British biometrician, developed a formula called the "correlation coefficient".
The correlation coefficient between 2 variables X & Y is usually denoted by r (x, y) or $r_{XY}$ and is given by
$r(x, y) = r_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)}\,\sqrt{V(Y)}} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_x\,\sigma_y}$ → ①
Where
$\mathrm{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E\{(X - \bar{X})(Y - \bar{Y})\} = E(XY) - \bar{X}\bar{Y} = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$
$V(X) = E\{(X - E(X))^{2}\} = E(X^{2}) - [E(X)]^{2} = \frac{1}{n}\sum X^{2} - \bar{X}^{2}$
$V(Y) = E\{(Y - E(Y))^{2}\} = E(Y^{2}) - [E(Y)]^{2} = \frac{1}{n}\sum Y^{2} - \bar{Y}^{2}$
$r(x, y) = \dfrac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^{2} - \bar{X}^{2}}\;\sqrt{\frac{1}{n}\sum Y^{2} - \bar{Y}^{2}}}$
Properties of Correlation Coefficient:
1) The correlation coefficient lies between −1 & +1, i.e. −1 ≤ r (x, y) ≤ +1.
2) The correlation coefficient is independent of change of origin & scale.
3) Two independent variables are uncorrelated. The converse need not be true.
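Karl Pearson's formula translates directly into a few lines of code. The sketch below computes r from the sums ΣX, ΣY, ΣX², ΣY² and ΣXY; the function name `pearson_r` is illustrative, and the data are the heights of fathers and sons from the correlation problem in these notes.

```python
import math


def pearson_r(xs, ys):
    """r = Cov(X, Y) / (sigma_x * sigma_y), using the 1/n forms given above."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x**2
    var_y = sum(y * y for y in ys) / n - mean_y**2
    return cov / math.sqrt(var_x * var_y)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(fathers, sons)
print(round(r, 4))  # 0.603
```

For these data the means are 68 and 69, Cov (X, Y) = 3, V (X) = 4.5 and V (Y) = 5.5, so r = 3/√24.75.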
Regression: Definition: “Regression Analysis” is a mathematical measure of average relationship between 2 or more variables in terms of the original units of the data.In regression Analysis there 2 types of variables, dependent variable & independent variable. The variable whose value is ‘influenced’ or is to be ‘predicted’ is called ‘Dependent variable’The variable which ‘influences’ or is used for ‘prediction’ is called “independent variable”.
K.V.RAMESH BABUM.SC.STATISATICS @ ASSISTANT PROFESSOR Page 44
![Page 45: central tendency and correlation coeeficent](https://reader035.fdocuments.net/reader035/viewer/2022062616/54b588434a795971418b456d/html5/thumbnails/45.jpg)
QTBD 2013
Lines of Regression: The line of regression is the line which gives the best estimate to the of one variable
for any specific value of the other variable. Thus the line of regression is the line of ‘best fit’, Which can be obtained by using “principle of least square “technique.
Linear Regression: If the points in the scatter diagram are a straight line, then it is called “linear
Regression”. Non-Linear Regression:
If the points in the scatter diagram is a curve, then is is called “non-linear Regression” or “curvy-linear regression”.
Curve of Regression: If the variables in a bi- variate distribution are related, we find that the points in the
Scatter diagram will cluster round some curve is called “curve of regression”.Let us suppose that in the bi- variate distribution (x, y) i= 1, 2, ...., n where
X= independent variable Y = dependent variable. Let the line of the regression Y on X beY = a + b X → 1According to the principle of least squares, the normal equations for estimating
a & b are
∑i=1
n
yi = n.a +b ∑i=1
n
x i →2 ∑i=1
n
( xi ) .( y¿¿ i)¿ ¿a∑i=1
n
(X i)+¿b ∑i=1
n
x i2→3
Regression Equations:
1) Regression equation of Y on X
2) Regression equation of X on Y

Regression Equation of Y on X:
Since b is the slope of the line of regression of Y on X, and since the line of regression passes through the point $(\bar{x}, \bar{y})$, its equation is

$$Y - \bar{y} = b_{yx}\,(X - \bar{x}) \;\Longrightarrow\; Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{x})$$

where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X and r is the correlation coefficient.

Regression Equation of X on Y:
The regression equation of X on Y is given by

$$X - \bar{x} = b_{xy}\,(Y - \bar{y}) \;\Longrightarrow\; X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{y})$$

where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y and r is the correlation coefficient.
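The slope formula for the line of Y on X is stated here without proof. The sketch below shows one way it follows from the normal equations, assuming the population (divide-by-n) definitions of covariance and variance that the worked problems in these notes use. Interchanging the roles of x and y gives the corresponding result for the line of X on Y.

```latex
\begin{align*}
% Normal equations for the line Y = a + bX
\sum y_i &= n a + b \sum x_i, &
\sum x_i y_i &= a \sum x_i + b \sum x_i^2 \\
% Divide the first equation by n: the line passes through the point of means
\bar{y} &= a + b\,\bar{x} & &\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
% Substitute a into the second equation and divide by n
\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}
  &= b\left(\frac{1}{n}\sum x_i^2 - \bar{x}^2\right) & &\Longrightarrow\; \operatorname{Cov}(x, y) = b\,\sigma_x^2 \\
% Use Cov(x, y) = r sigma_x sigma_y
b_{yx} &= \frac{\operatorname{Cov}(x, y)}{\sigma_x^2}
  = \frac{r\,\sigma_x \sigma_y}{\sigma_x^2}
  = r\,\frac{\sigma_y}{\sigma_x}
\end{align*}
```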
Regression Coefficients:
The slope of a regression line is called the "coefficient of regression". The coefficient of regression of Y on X indicates the change in the value of variable Y corresponding to a unit change in the value of variable X, and is given by

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
Similarly, the coefficient of regression of X on Y indicates the change in the value of variable X corresponding to a unit change in the value of variable Y, and is given by

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
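As a rough illustration of the "unit change" interpretation, the sketch below computes both regression coefficients from r and the two standard deviations and checks that moving X by one unit moves the predicted Y by b_yx. The function names and the summary statistics are hypothetical, chosen only for the example.

```python
def regression_coefficients(r, sd_x, sd_y):
    """Return (b_yx, b_xy) from the correlation coefficient and the standard deviations."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    b_yx = r * sd_y / sd_x  # slope of the line of Y on X
    b_xy = r * sd_x / sd_y  # slope of the line of X on Y
    return b_yx, b_xy


def predict_y(x, mean_x, mean_y, b_yx):
    """Evaluate the line of Y on X: Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


# Hypothetical summary statistics, for illustration only
r, sd_x, sd_y = 0.6, 2.0, 3.0
mean_x, mean_y = 10.0, 20.0

b_yx, b_xy = regression_coefficients(r, sd_x, sd_y)
# Change in predicted Y when X increases by one unit
step = predict_y(11.0, mean_x, mean_y, b_yx) - predict_y(10.0, mean_x, mean_y, b_yx)
print(b_yx, b_xy, step)
```

With these inputs b_yx is about 0.9 and b_xy about 0.4, and the one-unit step in X changes the predicted Y by b_yx.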
Properties of Regression Coefficients:
1) The geometric mean (G.M.) of the regression coefficients is equal to the correlation coefficient in magnitude, and r takes the common sign of the two coefficients:

$$r = \pm\sqrt{b_{xy}\, b_{yx}}$$

2) If one of the regression coefficients is greater than unity, then the other must be less than unity, i.e. $b_{yx} > 1 \Longrightarrow b_{xy} < 1$. This follows from $b_{xy}\, b_{yx} = r^2 \le 1$.
3) The arithmetic mean (A.M.) of the regression coefficients is greater than or equal to the correlation coefficient (for r > 0):

$$\tfrac{1}{2}\left(b_{xy} + b_{yx}\right) \ge r$$

4) The regression coefficients are independent of a change of origin but not of a change of scale.
5) The angle θ between the two regression lines is given by

$$\theta = \tan^{-1}\left\{ \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \right\}$$

PROBLEMS ON CORRELATION COEFFICIENT:
Problem 1
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y).
| X | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|----|----|----|----|----|----|----|----|
| Y | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Solution:

| X | Y | X² | Y² | XY |
|---|---|----|----|----|
| 65 | 67 | 4225 | 4489 | 4355 |
| 66 | 68 | 4356 | 4624 | 4488 |
| 67 | 65 | 4489 | 4225 | 4355 |
| 67 | 68 | 4489 | 4624 | 4556 |
| 68 | 72 | 4624 | 5184 | 4896 |
| 69 | 72 | 4761 | 5184 | 4968 |
| 70 | 69 | 4900 | 4761 | 4830 |
| 72 | 71 | 5184 | 5041 | 5112 |
| ΣX = 544 | ΣY = 552 | ΣX² = 37028 | ΣY² = 38132 | ΣXY = 37560 |

From the above table we have ΣX = 544, ΣY = 552, ΣX² = 37028, ΣY² = 38132 and ΣXY = 37560, so

$$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$$

The correlation coefficient is given by

$$r(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x\, \sigma_y} = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2}\;\sqrt{\frac{1}{n}\sum Y^2 - \bar{Y}^2}} = \frac{\frac{37560}{8} - (68)(69)}{\sqrt{\frac{37028}{8} - 68^2}\;\sqrt{\frac{38132}{8} - 69^2}}$$

$$= \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)(4766.5 - 4761)}} = \frac{3}{\sqrt{(4.5)(5.5)}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$$
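The hand computation for the heights data can be cross-checked with a few lines of code. This is a minimal sketch using the same divide-by-n formulas as the worked solution. The function name `pearson_r` is chosen here for illustration and does not come from the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r = cov(x, y) / (sigma_x * sigma_y), population form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / sqrt(var_x * var_y)


# Heights (in inches) from the problem statement
fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r = pearson_r(fathers, sons)
print(f"r = {r:.4f}")  # prints r = 0.6030
```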
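Problem 2
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y), this time working with the deviations U = X - 68 and V = Y - 69.

| X | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|----|----|----|----|----|----|----|----|
| Y | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Solution:

| X | Y | U = X - 68 | V = Y - 69 | U² | V² | UV |
|---|---|------------|------------|----|----|----|
| 65 | 67 | -3 | -2 | 9 | 4 | 6 |
| 66 | 68 | -2 | -1 | 4 | 1 | 2 |
| 67 | 65 | -1 | -4 | 1 | 16 | 4 |
| 67 | 68 | -1 | -1 | 1 | 1 | 1 |
| 68 | 72 | 0 | 3 | 0 | 9 | 0 |
| 69 | 72 | 1 | 3 | 1 | 9 | 3 |
| 70 | 69 | 2 | 0 | 4 | 0 | 0 |
| 72 | 71 | 4 | 2 | 16 | 4 | 8 |
| ΣX = 544 | ΣY = 552 | ΣU = 0 | ΣV = 0 | ΣU² = 36 | ΣV² = 44 | ΣUV = 24 |

The correlation coefficient is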
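$$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U\, \sigma_V} \qquad (1)$$

$$\bar{U} = \frac{\sum U}{n} = \frac{0}{8} = 0, \qquad \bar{V} = \frac{\sum V}{n} = \frac{0}{8} = 0$$

$$\operatorname{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\,\bar{V} = \frac{24}{8} - (0)(0) = 3 - 0 = 3$$

$$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{36}{8} - 0 = 4.5$$

$$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{44}{8} - 0 = 5.5$$

$$\therefore\; r(U, V) = \frac{3}{\sqrt{4.5}\,\sqrt{5.5}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$$

Since the correlation coefficient is unaffected by a change of origin, r(X, Y) = r(U, V) = 0.6030, in agreement with Problem 1.

PROBLEMS ON REGRESSION LINES
Problem 1
Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.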
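| Price index of cotton (X) | 78 | 77 | 85 | 88 | 87 | 82 | 81 | 77 | 76 | 83 | 97 | 93 |
|---|----|----|----|----|----|----|----|----|----|----|----|----|
| Price index of wool (Y) | 84 | 82 | 82 | 85 | 89 | 90 | 88 | 92 | 83 | 89 | 98 | 99 |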
Solution:

| X | Y | U = X - 84 | V = Y - 88 | U² | V² | UV |
|---|---|------------|------------|----|----|----|
| 78 | 84 | -6 | -4 | 36 | 16 | 24 |
| 77 | 82 | -7 | -6 | 49 | 36 | 42 |
| 85 | 82 | +1 | -6 | 1 | 36 | -6 |
| 88 | 85 | +4 | -3 | 16 | 9 | -12 |
| 87 | 89 | +3 | +1 | 9 | 1 | 3 |
| 82 | 90 | -2 | +2 | 4 | 4 | -4 |
| 81 | 88 | -3 | 0 | 9 | 0 | 0 |
| 77 | 92 | -7 | +4 | 49 | 16 | -28 |
| 76 | 83 | -8 | -5 | 64 | 25 | 40 |
| 83 | 89 | -1 | +1 | 1 | 1 | -1 |
| 97 | 98 | +13 | +10 | 169 | 100 | 130 |
| 93 | 99 | +9 | +11 | 81 | 121 | 99 |
| ΣX = 1004 | ΣY = 1061 | ΣU = -4 | ΣV = +5 | ΣU² = 488 | ΣV² = 365 | ΣUV = 287 |
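$$\bar{X} = \frac{\sum X}{n} = \frac{1004}{12} = 83.67, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{1061}{12} = 88.42$$

$$\bar{U} = \frac{\sum U}{n} = \frac{-4}{12} = -0.33, \qquad \bar{V} = \frac{\sum V}{n} = \frac{5}{12} = 0.42$$

$$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U\, \sigma_V} \qquad (1)$$

Because $\bar{U}$ is negative, the product term is added rather than subtracted:

$$\operatorname{Cov}(U, V) = \frac{1}{n}\sum UV - \bar{U}\,\bar{V} = \frac{287}{12} - (-0.33)(0.42) = 23.92 + 0.14 = 24.06$$

$$\sigma_U^2 = \frac{1}{n}\sum U^2 - \bar{U}^2 = \frac{488}{12} - (-0.33)^2 = 40.67 - 0.11 = 40.56 \;\Longrightarrow\; \sigma_U = 6.37$$

$$\sigma_V^2 = \frac{1}{n}\sum V^2 - \bar{V}^2 = \frac{365}{12} - (0.42)^2 = 30.42 - 0.18 = 30.24 \;\Longrightarrow\; \sigma_V = 5.50$$

$$r(U, V) = \frac{24.06}{(6.37)(5.50)} \approx \frac{24.06}{35.03} = 0.687$$

A change of origin does not affect the standard deviations or the correlation coefficient, so $\sigma_x = \sigma_U = 6.37$, $\sigma_y = \sigma_V = 5.50$ and r(X, Y) = r(U, V) = 0.687.

The regression equation of Y on X is

$$Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{x}) \;\Longrightarrow\; Y - 88.42 = 0.687\left(\frac{5.50}{6.37}\right)(X - 83.67) \;\Longrightarrow\; Y - 88.42 = 0.687\,(0.86)\,(X - 83.67)$$

$$\Longrightarrow\; Y - 88.42 = 0.59\,(X - 83.67)$$

The regression equation of X on Y is

$$X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{y}) \;\Longrightarrow\; X - 83.67 = 0.687\left(\frac{6.37}{5.50}\right)(Y - 88.42) \;\Longrightarrow\; X - 83.67 = 0.687\,(1.16)\,(Y - 88.42)$$

$$\Longrightarrow\; X - 83.67 = 0.80\,(Y - 88.42)$$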
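The price-index problem can also be recomputed from the raw data without intermediate rounding. The sketch below does this both on the original values and on the deviations from the assumed origins 84 and 88, and then checks three of the stated properties of the regression coefficients. The function name `regression_summary` and the dictionary keys are illustrative choices, not part of the notes.

```python
from math import sqrt

# Price indices from the problem statement
cotton = [78, 77, 85, 88, 87, 82, 81, 77, 76, 83, 97, 93]  # X
wool = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]    # Y


def regression_summary(xs, ys):
    """Means, population standard deviations, r and both regression coefficients."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    sd_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    r = cov / (sd_x * sd_y)
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "sd_x": sd_x, "sd_y": sd_y, "r": r,
        "b_yx": r * sd_y / sd_x,  # slope of the line of Y on X
        "b_xy": r * sd_x / sd_y,  # slope of the line of X on Y
    }


raw = regression_summary(cotton, wool)
# Same computation on the deviations U = X - 84 and V = Y - 88
shifted = regression_summary([x - 84 for x in cotton], [y - 88 for y in wool])

# Change of origin leaves r, the standard deviations and both slopes unchanged
origin_invariant = all(
    abs(raw[k] - shifted[k]) < 1e-9 for k in ("r", "sd_x", "sd_y", "b_yx", "b_xy")
)
# Geometric mean of the two coefficients equals |r|
gm_equals_abs_r = abs(sqrt(raw["b_yx"] * raw["b_xy"]) - abs(raw["r"])) < 1e-9
# Arithmetic mean of the two coefficients is at least r (r is positive here)
am_at_least_r = (raw["b_yx"] + raw["b_xy"]) / 2 >= raw["r"]

print(f"r = {raw['r']:.4f}, b_yx = {raw['b_yx']:.4f}, b_xy = {raw['b_xy']:.4f}")
print(f"Y - {raw['mean_y']:.2f} = {raw['b_yx']:.2f} (X - {raw['mean_x']:.2f})")
print(f"X - {raw['mean_x']:.2f} = {raw['b_xy']:.2f} (Y - {raw['mean_y']:.2f})")
print(origin_invariant, gm_equals_abs_r, am_at_least_r)
```

Working from unrounded sums avoids the small discrepancies that two-decimal intermediate values can introduce in the last digit of the slopes.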