Central Tendency and Correlation Coefficient

K.V. Ramesh Babu, M.Sc. (Statistics), Assistant Professor

QTBD 2013

UNIT-1

Measures of Central Tendency

Definition:

An average is a measure that condenses a large volume of data into a single numerical value.

An average gives us an idea of where the values concentrate in the central part of the distribution.

Averages are the typical values around which the other values of the distribution cluster.

Types of Measures

1) Arithmetic Mean (or) Average
2) Median
3) Mode
4) Geometric Mean
5) Harmonic Mean

Characteristics of a Good Measure of Central Tendency

It should be easy to understand and easy to calculate.
It should be based on all items.
It should be capable of further algebraic treatment.
It should be rigidly defined.
It should not be affected by extreme observations.
It should not be affected by fluctuations of sampling.

Demerits of Measures of Central Tendency (Arithmetic Mean)

The arithmetic mean cannot be determined by inspection, nor can it be located graphically.
The arithmetic mean cannot be used for qualitative characteristics that cannot be measured quantitatively, e.g. honesty, intelligence, beauty.
The arithmetic mean cannot be computed for open-ended class intervals, e.g. "below 90" and "above 100".
The arithmetic mean is affected by extreme values.
The arithmetic mean may lead to wrong conclusions if the details of the data from which it is computed are not given.
The arithmetic mean cannot be obtained if even a single observation is missing or lost.
The arithmetic mean is not a suitable measure for extremely asymmetric distributions.

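Methods to Calculate the Average

1) Direct method.
2) Indirect method (or) Deviation method.
3) Step-Deviation method.

Notation: $n$ = number of observations, $f_i$ = frequency, $m_i$ = mid-value of the $i$-th class interval, $A$ = assumed mean, $C$ = width of the class interval.

1) Direct Method:

Raw Data: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Discrete Data: $\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$

Continuous Data: $\bar{X} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$

2) Deviation Method (with $d_i = X_i - A$, or $d_i = m_i - A$ for continuous data):

Raw Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}$

Discrete Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$

Continuous Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$

3) Step-Deviation Method (with $d_i' = d_i / C$):

Raw Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} d_i'}{n} \times C$

Discrete Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i'}{\sum_{i=1}^{n} f_i} \times C$

Continuous Data: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i'}{\sum_{i=1}^{n} f_i} \times C$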

2) Median:

The median is defined as the "middle-most" or "central" value of a set of observations when the observations are arranged in ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is also known as a "positional average", whereas the mean is known as a "calculated average".

When a series consists of an even number of terms, the median is the arithmetic mean of the two central items. It is denoted by $M_d$.

Formulas:

Raw Data: Arrange the given data in ascending or descending order.

Case i) If $n$ is odd, the median is the value of the $\left(\frac{n+1}{2}\right)$th term, where $n$ = number of observations.

Case ii) If $n$ is even, the median is the mean of the two middle terms:

$M_d = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}$

Discrete Data:

STEP-1: Find the cumulative frequencies of the given data.

STEP-2: Find $N = \sum_{i=1}^{n} f_i$.

STEP-3: Find the cumulative frequency just greater than $N/2$; the corresponding value of X is the median.

Continuous Data:

STEP-1: Find the cumulative frequencies of the given data.

STEP-2: Find $N = \sum_{i=1}^{n} f_i$.

STEP-3: Locate the median class (the class whose cumulative frequency is just greater than $N/2$). The median is then given by

$M_d = L + \left\{\frac{N/2 - m}{f}\right\} \times C$

where
L = lower limit of the median class
f = frequency of the median class
m = cumulative frequency preceding the median class
C = width of the class interval
$N = \sum_{i=1}^{n} f_i$ = sum of the frequencies.

3) Mode:

The mode is the value in a series that occurs most frequently. In a frequency distribution, the mode is the value that has the maximum frequency. In other words, the mode is the value with the greatest frequency density in its neighbourhood. The mode is also known as the most frequent value, the most typical value, the predominant value, the most fashionable value, or the norm.

Formulas:

Raw Data: The value that occurs the maximum number of times is the mode.

Discrete Data: The mode is the value of X corresponding to the maximum frequency.

Continuous Data:

STEP-1: Locate the modal class, i.e. the class with the maximum frequency.

STEP-2: The mode is then given by

$M_O = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C$

where L = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ = frequency of the class preceding it, $f_2$ = frequency of the class succeeding it, and C = width of the class interval.

4) Geometric Mean:

The geometric mean of n observations is the n-th root of the product of the observations.

Let $X_1, X_2, X_3, \ldots, X_n$ be a given set of n observations. Then the geometric mean is given by

$G.M. = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdots X_n)^{1/n}$

If n = 2, the geometric mean is the square root of the product of the observations.

Example: The geometric mean of 4 and 16 is

$G.M. = \sqrt{(4)(16)} = \sqrt{64} = 8$

If the number of observations is greater than 2, computing the n-th root directly is inconvenient; in that case we take logarithms:

$\log(G.M.) = \log (X_1 \cdot X_2 \cdots X_n)^{1/n} = \frac{1}{n} \log (X_1 \cdot X_2 \cdots X_n)$

$= \frac{1}{n} \{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_n\}$

Formulas:

Raw Data: $G.M. = \text{Antilog}\left\{\frac{1}{n} \sum_{i=1}^{n} \log X_i\right\}$

Discrete Data: $G.M. = \text{Antilog}\left\{\frac{1}{N} \sum_{i=1}^{n} f_i \log X_i\right\}$

Continuous Data: $G.M. = \text{Antilog}\left\{\frac{1}{N} \sum_{i=1}^{n} f_i \log m_i\right\}$

5) Harmonic Mean:

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations.

If $X_1, X_2, X_3, \ldots, X_n$ are a given set of n observations, then the harmonic mean is given by

$H.M. = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$

Formulas (with $N = \sum f_i$):

Raw Data: $H.M. = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}}$

Discrete Data: $H.M. = \frac{1}{\frac{1}{N} \sum_{i=1}^{n} \frac{f_i}{X_i}}$

Continuous Data: $H.M. = \frac{1}{\frac{1}{N} \sum_{i=1}^{n} \frac{f_i}{m_i}}$

Measures of Dispersion

Definition:

The meaning of dispersion is "scatteredness". A measure of the scatter of the given data about the average is said to be a measure of dispersion.

Characteristics of a Good Measure of Dispersion

It should be easy to understand.
It should be based on all items.
It should be readily comprehensible.
Its procedure should be simple.
It should be rigidly defined.
It should be capable of further algebraic treatment.
It should not be affected by extreme observations.
It should not be affected by fluctuations of sampling.

Types of Measures

1) Range.
2) Quartile Deviation.
3) Standard Deviation.
4) Mean Deviation.

Of these, the first two are known as "positional measures" and the remaining two are known as "calculated measures".

Formulas:

1) Range:

The range is the difference between the two extreme values. It is denoted by R.

Raw Data: Range = R = (Largest value - Smallest value) = L - S
Discrete Data: Range = R = (Largest value - Smallest value) = L - S
Continuous Data: Range = R = (Largest value - Smallest value) = L - S

Coefficient of Range

Coefficient of range = $\frac{L - S}{L + S}$

2) Quartile Deviation:

The quartile deviation is denoted by Q.D. If $Q_1$ is the first quartile and $Q_3$ is the third quartile, then the quartile deviation is

$Q.D. = \frac{Q_3 - Q_1}{2}$

The same formula applies to raw, discrete and continuous data; only the way $Q_1$ and $Q_3$ are located differs.

3) Mean Deviation:

If $X_1, X_2, X_3, \ldots, X_n$ are n observations and $d_i = X_i - \bar{X}$, where $\bar{X}$ is the mean, then the mean deviation is denoted by M.D. and is given by

$M.D. = \frac{\sum_{i=1}^{n} |d_i|}{n}$

Raw Data: $M.D. = \frac{\sum_{i=1}^{n} |d_i|}{n}$, where $d_i = X_i - \bar{X}$

Discrete Data: $M.D. = \frac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = X_i - \bar{X}$

Continuous Data: $M.D. = \frac{\sum_{i=1}^{n} f_i |d_i|}{\sum_{i=1}^{n} f_i}$, where $d_i = m_i - \bar{X}$

Coefficient of Mean Deviation:

Coefficient of Mean Deviation = $\frac{\text{Mean Deviation}}{\text{Mean}}$

4) Standard Deviation:

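If $X_1, X_2, X_3, \ldots, X_n$ are n observations and $d_i = X_i - A$, where A is an assumed mean, then the standard deviation is denoted by S.D. (or $\sigma$) and is given by

$S.D. = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$

If A is taken as the actual mean $\bar{X}$, the second term is zero.

Raw Data: $S.D. = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$, where $d_i = X_i - A$

Discrete Data: $S.D. = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$, where $d_i = X_i - A$ and $N = \sum f_i$

Continuous Data: $S.D. = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$, where $d_i = m_i - A$ and $N = \sum f_i$

Coefficient of Variation:

$C.V. = 100 \times \frac{\sigma}{\bar{X}}$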

PROBLEMS ON MEASURES OF CENTRAL TENDENCY:

1) PROBLEMS ON ARITHMETIC MEAN:

a) Direct Method:

Raw Data:
Problem-1: Find the average for the following data.

Solution: $\bar{X} = \frac{\sum X}{n} = \frac{620}{10} = 62$

Discrete Data:
Problem-1: Find the arithmetic mean for the following data.

X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution:

X       f       Xf
10      5       50
20      15      300
30      25      750
40      20      800
50      10      500
60      5       300
Total   Σf = 80   ΣXf = 2700

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{2700}{80} = 33.75$

b) Indirect Method or Deviation Method:

Raw Data:
Problem-1: Calculate the average for the following data.

Family   A    B    C    D     E     F    G    H     I     J
Income   90   75   60   100   125   50   80   120   500   400

Solution: Take the assumed mean A = 100.

Family   Income   d_i = X_i - A
A        90       -10
B        75       -25
C        60       -40
D        100      0
E        125      25
F        50       -50
G        80       -20
H        120      20
I        500      400
J        400      300
Total             Σd_i = 600

$\bar{X} = A + \frac{\sum d_i}{n} = 100 + \frac{600}{10} = 100 + 60 = 160$

Discrete Data:
Problem-1: Calculate the average for the following data.

X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution: Take the assumed mean A = 40.

X       f_i     d_i = X_i - A   f_i d_i
10      5       -30             -150
20      15      -20             -300
30      25      -10             -250
40      20      0               0
50      10      10              100
60      5       20              100
Total   Σf_i = 80               Σf_i d_i = -500

$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} = 40 + \left[\frac{-500}{80}\right] = 40 - 6.25 = 33.75$

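Continuous Data:
Problem-1: Find the arithmetic mean for the following data.

C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90
f     1      4       10      22      30      35      10      7       1

Solution: Take the assumed mean A = 55 and the class width c = 10.

C.I     f_i   m_i   f_i m_i   d_i = m_i - A   f_i d_i   d_i' = d_i / c   f_i d_i'
0-10    1     5     5         -50             -50       -5               -5
10-20   4     15    60        -40             -160      -4               -16
20-30   10    25    250       -30             -300      -3               -30
30-40   22    35    770       -20             -440      -2               -44
40-50   30    45    1350      -10             -300      -1               -30
50-60   35    55    1925      0               0         0                0
60-70   10    65    650       10              100       1                10
70-80   7     75    525       20              140       2                14
80-90   1     85    85        30              30        3                3
Total   Σf_i = 120  Σf_i m_i = 5620           Σf_i d_i = -980            Σf_i d_i' = -98

Direct method: $\bar{X} = \frac{5620}{120} = 46.83$

Deviation method: $\bar{X} = 55 + \frac{-980}{120} = 55 - 8.17 = 46.83$

Step-deviation method: $\bar{X} = 55 + \frac{-98}{120} \times 10 = 55 - 8.17 = 46.83$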
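The three methods for the arithmetic mean can be cross-checked with a short Python sketch. The function names (`mean_direct`, `mean_assumed`, `mean_step`) are illustrative choices, not part of the notes; the data are the class mid-values and frequencies of the continuous problem above, with assumed mean 55 and class width 10.

```python
# Class mid-values and frequencies from the continuous-data problem above.
mid_values = [5, 15, 25, 35, 45, 55, 65, 75, 85]
freqs = [1, 4, 10, 22, 30, 35, 10, 7, 1]


def mean_direct(m, f):
    """Direct method: sum(f * m) / sum(f)."""
    return sum(fi * mi for mi, fi in zip(m, f)) / sum(f)


def mean_assumed(m, f, a):
    """Deviation method: A + sum(f * d) / sum(f), with d = m - A."""
    return a + sum(fi * (mi - a) for mi, fi in zip(m, f)) / sum(f)


def mean_step(m, f, a, c):
    """Step-deviation method: A + C * sum(f * d') / sum(f), with d' = (m - A) / C."""
    return a + c * sum(fi * ((mi - a) / c) for mi, fi in zip(m, f)) / sum(f)


direct = mean_direct(mid_values, freqs)
assumed = mean_assumed(mid_values, freqs, a=55)
step = mean_step(mid_values, freqs, a=55, c=10)
print(round(direct, 4), round(assumed, 4), round(step, 4))
```

All three calls should agree (about 46.83), since the assumed mean and the class width only simplify the hand arithmetic and do not change the result.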

2) PROBLEMS ON MEDIAN:

Raw Data:
Problem-1: Find the median for the following data; also calculate the Q1 and Q3 values.

X   120   170   100   110   180   220   160

Solution: Arrange the given data in ascending order.

X
100
110 → Q1
120
160 → Q2
170
180 → Q3
220

n = 7

$Q_2$ or $M_d$ = $\left(\frac{n+1}{2}\right)$th term = $\left(\frac{7+1}{2}\right)$th term = $\left(\frac{8}{2}\right)$th term = 4th term = 160 ⟹ $M_d$ = 160

$Q_1$ = $\left(\frac{n+1}{4}\right)$th term = $\left(\frac{7+1}{4}\right)$th term = $\left(\frac{8}{4}\right)$th term = 2nd term = 110 ⟹ $Q_1$ = 110

$Q_3$ = $\left(\frac{3(n+1)}{4}\right)$th term = $\left(\frac{3(7+1)}{4}\right)$th term = $\left(\frac{24}{4}\right)$th term = 6th term = 180 ⟹ $Q_3$ = 180

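Discrete Data:
Problem-1: Find the median for the following data; also calculate the Q1 and Q3 values.

X    10   20   30   40   50   60
f     5   15   25   20   10    5

Solution:

X    f    c.f.
10   5    5
20   15   20
30   25   45
40   20   65
50   10   75
60   5    80

N = 80

$\frac{N}{4} = \frac{80}{4} = 20$: the cumulative frequency reaches 20 at X = 20 ⟹ $Q_1$ = 20

$\frac{N}{2} = \frac{80}{2} = 40$: the cumulative frequency just greater than 40 is 45 ⟹ $M_d$ = $Q_2$ = 30

$\frac{3N}{4} = \frac{3(80)}{4} = 60$: the cumulative frequency just greater than 60 is 65 ⟹ $Q_3$ = 40

Continuous Data:
Problem-1: Find the median for the following data; also calculate the Q1 and Q3 values.

C.I   0-10   10-20   20-30   30-40   40-50   50-60
f     4      6       10      15      8       7

Solution:

C.I     f     c.f.
0-10    4     4
10-20   6     10 → m1
20-30   10    20 → m2   (Q1 class, f1 = 10)
30-40   15    35 → m3   (median class, f2 = 15)
40-50   8     43        (Q3 class, f3 = 8)
50-60   7     50

N = 50

$\frac{N}{4} = \frac{50}{4} = 12.5$, $\frac{N}{2} = \frac{50}{2} = 25$, $\frac{3N}{4} = \frac{3(50)}{4} = 37.5$

$Q_1 = L_1 + \left[\frac{(N/4) - m_1}{f_1}\right] \times c = 20 + \left[\frac{12.5 - 10}{10}\right] \times 10 = 20 + 2.5 = 22.5$

$Q_2 = L_2 + \left[\frac{(N/2) - m_2}{f_2}\right] \times c = 30 + \left[\frac{25 - 20}{15}\right] \times 10 = 30 + 3.33 = 33.33$

$Q_3 = L_3 + \left[\frac{(3N/4) - m_3}{f_3}\right] \times c = 40 + \left[\frac{37.5 - 35}{8}\right] \times 10 = 40 + 3.125 = 43.125$

Here $L_1, L_2, L_3$ are the lower limits of the Q1, median and Q3 classes, and $m_1, m_2, m_3$ are the cumulative frequencies preceding those classes.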
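The interpolation used for the continuous-data quartiles can be written once and reused for Q1, the median and Q3. The sketch below is a minimal illustration; the function name `grouped_quantile` and its parameters are assumptions made for this example, and it assumes equal-width classes given by their lower limits.

```python
def grouped_quantile(lower_limits, freqs, width, p):
    """Interpolated quantile of grouped data at proportion p (0 < p <= 1).

    Implements L + ((p * N - m) / f) * C, where m is the cumulative
    frequency before the class that contains the p * N-th observation.
    """
    n = sum(freqs)
    target = p * n
    cum = 0  # cumulative frequency preceding the current class
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            return low + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie in (0, 1]")


# Data from the continuous median problem above.
lower_limits = [0, 10, 20, 30, 40, 50]
freqs = [4, 6, 10, 15, 8, 7]

q1 = grouped_quantile(lower_limits, freqs, 10, 0.25)
median = grouped_quantile(lower_limits, freqs, 10, 0.50)
q3 = grouped_quantile(lower_limits, freqs, 10, 0.75)
print(q1, round(median, 2), q3)
```

The same function gives the quartile deviation as `(q3 - q1) / 2`, so it can also be applied to the grouped quartile-deviation problems later in these notes.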

3) PROBLEMS ON MODE:

Raw Data:
Problem-1: Find the mode for the following data.

0, 6, 1, 7, 2, 3, 7, 6, 6, 2, 6, 6, 5, 6, 0

Solution:

X          Tally    f
0          II       2
1          I        1
2          II       2
3          I        1
5          I        1
6 → M_O    IIII I   6
7          II       2

∴ Mode = 6

Discrete Data:
Problem-1: Find the mode for the following data.

Height (in inches)   57   59   61   62   63   64   65   66   67   69
f                    3    5    7    10   20   22   24   5    2    2

Solution:

Height (in inches)   f
57                   3
59                   5
61                   7
62                   10
63                   20
64                   22
65 → M_O             24
66                   5
67                   2
69                   2

The maximum frequency is 24, which corresponds to a height of 65 inches.

∴ Mode = 65

Continuous Data:
Problem-1: Find the mode for the following data.

C.I   0-400   400-800   800-1200   1200-1600   1600-2000   2000-2400   2400-2800   2800-3200
f     4       12        40         41          27          13          9           4

Solution:

C.I             f
0-400           4
400-800         12
800-1200        40 → f0
L → 1200-1600   41 → f1
1600-2000       27 → f2
2000-2400       13
2400-2800       9
2800-3200       4

The modal class is 1200-1600, so L = 1200 and C = 400.

$M_O = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C = 1200 + \left[\frac{41 - 40}{2(41) - 40 - 27}\right] \times 400 = 1200 + \frac{400}{15} = 1200 + 26.67 = 1226.67$
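The modal-class formula is easy to get wrong by hand because the class width must multiply the fraction. A small Python sketch of the same formula follows; `grouped_mode` is a hypothetical helper name, and it assumes equal-width classes and a single class with the highest frequency.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * C."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # frequency of the preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # frequency of the succeeding class
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


lower_limits = [0, 400, 800, 1200, 1600, 2000, 2400, 2800]
freqs = [4, 12, 40, 41, 27, 13, 9, 4]
mode = grouped_mode(lower_limits, freqs, 400)
print(round(mode, 2))
```

For this data the modal class is 1200-1600 and the correction term is 400/15, so the mode comes out a little under 1227.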

4) Problems on Geometric Mean:

Raw Data:
Problem-1: Find the geometric mean for the following data.

X   2000   200   20   12   8

Solution:

X       log X_i
2000    3.3010
200     2.3010
20      1.3010
12      1.0792
8       0.9031
Total   Σ log X_i = 8.8853

$G.M. = \text{Antilog}\left[\frac{\sum \log X_i}{n}\right] = \text{Antilog}\left[\frac{8.8853}{5}\right] = \text{Antilog}[1.7771] \approx 59.85$

Discrete Data:
Problem-1: Find the geometric mean for the following data.

X    10   20   30   40   50   60
f    15   18   22   16   12   7

Solution:

X       f     log X_i   f (log X_i)
10      15    1.0000    15.0000
20      18    1.3010    23.4180
30      22    1.4771    32.4962
40      16    1.6021    25.6336
50      12    1.6990    20.3880
60      7     1.7782    12.4474
Total   Σf_i = 90       Σf (log X_i) = 129.3832

$G.M. = \text{Antilog}\left[\frac{\sum f_i \log X_i}{N}\right] = \text{Antilog}\left[\frac{129.3832}{90}\right] = \text{Antilog}[1.4376] \approx 27.39$

Continuous Data:
Problem-1: Find the geometric mean for the following data.

C.I   15-20   20-25   25-30   30-35   35-40   40-45
f     4       20      38      24      10      4

Solution:

C.I     f     m_i    log m_i   f (log m_i)
15-20   4     17.5   1.2430    4.9720
20-25   20    22.5   1.3522    27.0440
25-30   38    27.5   1.4393    54.6934
30-35   24    32.5   1.5119    36.2856
35-40   10    37.5   1.5740    15.7400
40-45   4     42.5   1.6284    6.5136
Total   Σf_i = 100             Σf (log m_i) = 145.2486

$G.M. = \text{Antilog}\left[\frac{\sum f_i \log m_i}{N}\right] = \text{Antilog}\left[\frac{145.2486}{100}\right] = \text{Antilog}[1.4525] \approx 28.35$
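Log tables introduce small rounding differences, so it can help to check geometric means numerically. The sketch below uses Python's `math.log10`; the function name `geometric_mean` and its optional `freqs` argument are choices made for this illustration (Python's standard library also offers `statistics.geometric_mean` for unweighted data).

```python
import math


def geometric_mean(values, freqs=None):
    """Antilog of the (frequency-weighted) mean of log10 of the values."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every value counted once
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)


gm_pair = geometric_mean([4, 16])
gm_raw = geometric_mean([2000, 200, 20, 12, 8])
gm_discrete = geometric_mean([10, 20, 30, 40, 50, 60], [15, 18, 22, 16, 12, 7])
gm_grouped = geometric_mean([17.5, 22.5, 27.5, 32.5, 37.5, 42.5], [4, 20, 38, 24, 10, 4])
print(round(gm_pair, 4), round(gm_raw, 2), round(gm_discrete, 2), round(gm_grouped, 2))
```

Results computed this way may differ from hand calculations in the second decimal place, because four-figure logarithms are rounded before they are summed.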

5) Problems on Harmonic Mean:

Raw Data:
Problem-1: Calculate the harmonic mean for the following data.

X   200   300   20   12   8   0.8

Solution:

X       1/X_i
200     0.0050
300     0.0033
20      0.0500
12      0.0833
8       0.1250
0.8     1.2500
Total   Σ(1/X_i) = 1.5166

$H.M. = \frac{n}{\sum \left(\frac{1}{X_i}\right)} = \frac{6}{1.5166} \approx 3.956$

Discrete Data:
Problem-1: Calculate the harmonic mean for the following data.

X    24   26   30   42   17   11
f    2    9    7    14   24   5

Solution:

X       f_i   f_i / X_i
24      2     0.0833
26      9     0.3462
30      7     0.2333
42      14    0.3333
17      24    1.4118
11      5     0.4545
Total   Σf_i = 61   Σ(f_i / X_i) = 2.8624

$H.M. = \frac{\sum f_i}{\sum \frac{f_i}{X_i}} = \frac{61}{2.8624} \approx 21.31$

Continuous Data:
Problem-1: Calculate the harmonic mean for the following data.

C.I   100-110   110-120   120-130   130-140   140-150
f     12        18        25        22        18

Solution:

C.I       f_i   m_i   f_i / m_i
100-110   12    105   0.1143
110-120   18    115   0.1565
120-130   25    125   0.2000
130-140   22    135   0.1630
140-150   18    145   0.1241
Total     Σf_i = 95   Σ(f_i / m_i) = 0.7579

$H.M. = \frac{\sum f_i}{\sum \left(\frac{f_i}{m_i}\right)} = \frac{95}{0.7579} \approx 125.3$
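As a check on the reciprocal sums, the harmonic mean for raw and frequency data can be computed directly. This is a minimal sketch; `harmonic_mean` and its optional `freqs` parameter are names invented for the example.

```python
def harmonic_mean(values, freqs=None):
    """N / sum(f / x): reciprocal of the weighted mean of the reciprocals."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every value counted once
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


hm_raw = harmonic_mean([200, 300, 20, 12, 8, 0.8])
hm_discrete = harmonic_mean([24, 26, 30, 42, 17, 11], [2, 9, 7, 14, 24, 5])
hm_grouped = harmonic_mean([105, 115, 125, 135, 145], [12, 18, 25, 22, 18])
print(round(hm_raw, 3), round(hm_discrete, 2), round(hm_grouped, 1))
```

Note how the single small value 0.8 dominates the raw-data result: the harmonic mean is pulled strongly towards the smallest observations.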

Problems on Measures of Dispersion:

1) Problems on Range:

Discrete Data:
Problem-1: Find the range for the following data.

X    12   12   14   15   16   17
f    6    14   10   7    5    3

Solution: Range = L - S = 17 - 12 = 5

Coefficient of Range = $\frac{L - S}{L + S} = \frac{17 - 12}{17 + 12} = \frac{5}{29} = 0.1724$

Continuous Data:
Problem-1: Find the range for the following data.

C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70
f     5      8       12      20      15      7       3

Solution: Range = L - S = 70 - 0 = 70

Coefficient of Range = $\frac{L - S}{L + S} = \frac{70 - 0}{70 + 0} = \frac{70}{70} = 1$

2) Problems on Quartile Deviation:

Raw Data:
Problem-1: Find the quartile deviation for the following data.

S.No.   1    2    3    4    5    6    7
Marks   25   35   45   17   35   20   55

Solution:

S.No.   Marks (X_i)   Ascending order
1       25            17
2       35            20 → Q1
3       45            25
4       17            35
5       35            35
6       20            45 → Q3
7       55            55

$Q_1$ = $\left(\frac{n+1}{4}\right)$th term = $\left(\frac{7+1}{4}\right)$th term = $\left(\frac{8}{4}\right)$th term = 2nd term = 20

$Q_3$ = $\left(\frac{3(n+1)}{4}\right)$th term = $\left(\frac{3(7+1)}{4}\right)$th term = $\left(\frac{24}{4}\right)$th term = 6th term = 45

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{45 - 20}{2} = 12.5$

Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45 - 20}{45 + 20} = \frac{25}{65} = 0.3846$

Discrete Data:
Problem-1: Find the quartile deviation for the following data.

X    30   20   40   50   10   60
f    15   7    8    7    4    2

Solution:

X (ascending order)   f     Cumulative frequency (c.f.)
10                    4     4
20 → Q1               7     11
30                    15    26
40 → Q3               8     34
50                    7     41
60                    2     43

N = 43

$\frac{N}{4} = \frac{43}{4} = 10.75 \cong 11$: the corresponding value of X is 20 ⟹ $Q_1$ = 20

$\frac{3N}{4} = \frac{3(43)}{4} = 32.25 \cong 32$: the cumulative frequency just greater than this is 34 ⟹ $Q_3$ = 40

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$

Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.3333$

Continuous Data:
Problem-1: Find the quartile deviation for the following data.

C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70
f     4      8       10      16      11      7       3

Solution:

C.I          f           Cumulative frequency (c.f.)
0-10         4           4
10-20        8           12 → m1
L1 → 20-30   10 → f1     22
30-40        16          38 → m3
L3 → 40-50   11 → f3     49
50-60        7           56
60-70        3           59

N = 59, so $\frac{N}{4} = 14.75$ and $\frac{3N}{4} = 44.25$.

$Q_1 = L_1 + \left[\frac{(N/4) - m_1}{f_1}\right] \times C = 20 + \left[\frac{14.75 - 12}{10}\right] \times 10 = 20 + 2.75 = 22.75$

$Q_3 = L_3 + \left[\frac{(3N/4) - m_3}{f_3}\right] \times C = 40 + \left[\frac{44.25 - 38}{11}\right] \times 10 = 40 + 5.68 = 45.68$

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{45.68 - 22.75}{2} = 11.465$

Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.68 - 22.75}{45.68 + 22.75} = \frac{22.93}{68.43} = 0.3351$

3) Problems on Mean Deviation:

Raw Data:
Problem-1: Find the mean deviation for the following data.

X   7   4   10   9   15   12   7   9   7

Solution:

$\bar{X} = \frac{\sum X_i}{n} = \frac{80}{9} = 8.9$

X       Ascending order (X_i)   |d_i| = |X_i - X̄|
7       4                       4.9
4       7                       1.9
10      7                       1.9
9       7                       1.9
15      9                       0.1
12      9                       0.1
7       10                      1.1
9       12                      3.1
7       15                      6.1
ΣX_i = 80                       Σ|d_i| = 21.1

$M.D. = \frac{\sum |d_i|}{n} = \frac{21.1}{9} = 2.344$

Coefficient of M.D. = $\frac{M.D.}{\text{Mean}} = \frac{2.344}{8.9} = 0.2634$

Discrete Data:
Problem-1: Find the mean deviation for the following data.

X    10   15   20   30   40   50
f    8    12   15   10   3    2

Solution:

X       f     X_i f_i   |d_i| = |X_i - X̄|   f_i |d_i|
10      8     80        11.6                92.8
15      12    180       6.6                 79.2
20      15    300       1.6                 24
30      10    300       8.4                 84
40      3     120       18.4                55.2
50      2     100       28.4                56.8
Total   N = 50   ΣX_i f_i = 1080            Σf_i |d_i| = 392

$\bar{X} = \frac{\sum f_i X_i}{N} = \frac{1080}{50} = 21.6$

$M.D. = \frac{\sum f_i |d_i|}{N} = \frac{392}{50} = 7.84$

Coefficient of M.D. = $\frac{M.D.}{\text{Mean}} = \frac{7.84}{21.6} = 0.3630$

Continuous Data:
Problem-1: Find the mean deviation for the following data.

C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
f     5      8       7       12      28      20      10      10

Solution:

C.I     f     m_i   f_i m_i   |d_i| = |m_i - X̄|   f_i |d_i|
0-10    5     5     25        40                  200
10-20   8     15    120       30                  240
20-30   7     25    175       20                  140
30-40   12    35    420       10                  120
40-50   28    45    1260      0                   0
50-60   20    55    1100      10                  200
60-70   10    65    650       20                  200
70-80   10    75    750       30                  300
Total   N = 100     Σf_i m_i = 4500               Σf_i |d_i| = 1400

$\bar{X} = \frac{\sum f_i m_i}{N} = \frac{4500}{100} = 45$

$M.D. = \frac{\sum f_i |d_i|}{N} = \frac{1400}{100} = 14$

Coefficient of M.D. = $\frac{M.D.}{\text{Mean}} = \frac{14}{45} = 0.3111$
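The mean-deviation tables above follow one pattern: compute the mean, take absolute deviations from it, and average them with the frequencies as weights. A hedged Python sketch of that pattern is shown below; `mean_deviation` is an illustrative name, and for grouped data the class mid-values are passed as `values`.

```python
def mean_deviation(values, freqs=None):
    """Return (mean, mean deviation about the mean, coefficient of M.D.)."""
    if freqs is None:
        freqs = [1] * len(values)  # raw data: every value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    return mean, md, md / mean


# Discrete problem above.
mean_d, md_d, coeff_d = mean_deviation([10, 15, 20, 30, 40, 50], [8, 12, 15, 10, 3, 2])

# Continuous problem above, using class mid-values.
mean_c, md_c, coeff_c = mean_deviation(
    [5, 15, 25, 35, 45, 55, 65, 75], [5, 8, 7, 12, 28, 20, 10, 10]
)
print(round(mean_d, 2), round(md_d, 2), round(coeff_d, 4))
print(round(mean_c, 2), round(md_c, 2), round(coeff_c, 4))
```

For raw data the unrounded mean is used, so the result can differ slightly from a hand calculation that rounds the mean to one decimal place first.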

4) Problems on Standard Deviation:

Raw Data:
Problem-1: Find the standard deviation for the following data.

X   8   10   12   14   16   18   20   22   24   26

Solution: Take the assumed mean A = 16.

X        d_i = X_i - A   d_i^2
8        -8              64
10       -6              36
12       -4              16
14       -2              4
16 → A   0               0
18       2               4
20       4               16
22       6               36
24       8               64
26       10              100
Total    Σd_i = 10       Σd_i^2 = 340

$\bar{d} = \frac{\sum d_i}{n} = \frac{10}{10} = 1$, so $\bar{X} = A + \bar{d} = 16 + 1 = 17$

$S.D. (\sigma) = \sqrt{\frac{\sum d_i^2}{n} - \bar{d}^2} = \sqrt{\frac{340}{10} - (1)^2} = \sqrt{34 - 1} = \sqrt{33} = 5.7446$

$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{5.7446}{17} \times 100 = 33.79$

Discrete Data:
Problem-1: Find the standard deviation for the following data.

X    5   15   25   35   45   55   65
f    3   10   20   30   15   12   10

Solution: Take the assumed mean A = 35.

X        f     d_i = X_i - A   f_i d_i   f_i d_i^2
5        3     -30             -90       2700
15       10    -20             -200      4000
25       20    -10             -200      2000
35 → A   30    0               0         0
45       15    10              150       1500
55       12    20              240       4800
65       10    30              300       9000
Total    Σf = 100              Σf_i d_i = 200   Σf_i d_i^2 = 24,000

$\bar{d} = \frac{\sum f_i d_i}{N} = \frac{200}{100} = 2$, so $\bar{X} = A + \bar{d} = 35 + 2 = 37$

$S.D. (\sigma) = \sqrt{\frac{\sum f_i d_i^2}{N} - \bar{d}^2} = \sqrt{\frac{24000}{100} - (2)^2} = \sqrt{240 - 4} = \sqrt{236} = 15.3623$

Continuous Data:
Problem-1: Find the standard deviation for the following data.

C.I   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
f     5      8       7       12      28      20      10      10

Solution: Take the assumed mean A = 45.

C.I     f     m_i   d_i = m_i - A   f_i d_i   f_i d_i^2
0-10    5     5     -40             -200      8000
10-20   8     15    -30             -240      7200
20-30   7     25    -20             -140      2800
30-40   12    35    -10             -120      1200
40-50   28    45    0               0         0
50-60   20    55    10              200       2000
60-70   10    65    20              200       4000
70-80   10    75    30              300       9000
Total   Σf = 100                    Σf_i d_i = 0   Σf_i d_i^2 = 34,200

$\bar{X} = A + \frac{\sum f_i d_i}{N} = 45 + \frac{0}{100} = 45 + 0 = 45$

$S.D. (\sigma) = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2} = \sqrt{\frac{34200}{100} - (0)^2} = \sqrt{342} = 18.4932$

$C.V. = \frac{\sigma}{\bar{X}} \times 100 = \frac{18.4932}{45} \times 100 = 41.10$
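The assumed-mean formula for the standard deviation subtracts the square of the mean deviation from A, not the square of the mean itself, which is an easy slip to make by hand. The following Python sketch implements the formula as stated in these notes (dividing by N, i.e. the population form); `sd_assumed_mean` is a hypothetical helper name.

```python
import math


def sd_assumed_mean(values, freqs, a):
    """Return (mean, S.D., C.V.) using deviations d = x - A from an assumed mean A."""
    n = sum(freqs)
    d = [x - a for x in values]
    mean_d = sum(f * di for di, f in zip(d, freqs)) / n        # sum(f d) / N
    mean_d2 = sum(f * di * di for di, f in zip(d, freqs)) / n  # sum(f d^2) / N
    sd = math.sqrt(mean_d2 - mean_d ** 2)
    mean = a + mean_d
    return mean, sd, 100 * sd / mean


# Raw data: every value has frequency 1, assumed mean A = 16.
raw = list(range(8, 27, 2))
mean_r, sd_r, cv_r = sd_assumed_mean(raw, [1] * len(raw), 16)

# Discrete data, A = 35.
mean_x, sd_x, cv_x = sd_assumed_mean(
    [5, 15, 25, 35, 45, 55, 65], [3, 10, 20, 30, 15, 12, 10], 35
)

# Continuous data (class mid-values), A = 45.
mean_c, sd_c, cv_c = sd_assumed_mean(
    [5, 15, 25, 35, 45, 55, 65, 75], [5, 8, 7, 12, 28, 20, 10, 10], 45
)
print(round(sd_r, 4), round(sd_x, 4), round(sd_c, 4))
```

Changing the assumed mean `a` should leave the standard deviation unchanged, which makes a convenient sanity check on any hand-built table.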

PERMUTATIONS:

Definition:

Each arrangement made by choosing r objects from among n objects is called a "permutation". The total number of such arrangements is ${}^{n}P_{r}$, also written as P(n, r).

${}^{n}P_{r} = n (n-1)(n-2) \cdots (n-r+1) = \frac{n (n-1)(n-2) \cdots 1}{(n-r)(n-r-1) \cdots 1} = \frac{n!}{(n-r)!}$

NOTE:
i) P(n, n) = n!
ii) P(n, n-1) = P(n, n)

PERMUTATIONS WITH REPETITIONS:

Suppose there are n objects. If repetitions are allowed, then the number of permutations taking r at a time is $n^r$.

i. The number of permutations of n objects, of which $r_1$ are alike of type 1, $r_2$ are alike of type 2, and the rest are different, is $\frac{n!}{(r_1)! \, (r_2)!}$

ii. The number of permutations of n objects, of which $r_1$ are alike of type 1, $r_2$ are alike of type 2, $r_3$ are alike of type 3, and the rest are different, is $\frac{n!}{(r_1)! \, (r_2)! \, (r_3)!}$

RESTRICTED PERMUTATIONS:

1. Suppose there are n objects and we have to select r of them such that s particular objects are never selected. Then the number of permutations is ${}^{n-s}P_{r}$.

2. Suppose there are n objects and we have to select r of them such that s particular objects are always selected. Then the number of permutations is ${}^{n-s}P_{r-s} \cdot {}^{r}P_{s}$.

CIRCULAR PERMUTATIONS:

The number of ways of seating n people in circular seats is (n-1)!

COMBINATIONS:

Definition:

A selection of r different objects from among n objects, where the order of selection is not important, is called a "combination".

If we select r objects, then the number of possible ways is

${}^{n}C_{r} = C(n, r) = \frac{n!}{r! \, (n-r)!}$

NOTE:
i) If the order is important and repetitions are not allowed, then we can select r objects from among n objects in $\frac{n!}{(n-r)!}$ ways.

ii) The number of ways of arranging n stones in r boxes such that there is at least one stone in each box is C(n-1, r-1) = C(n-1, n-r).

iii) Suppose the set A = $\{a_1, a_2, \ldots, a_n\}$ and $r_1, r_2, \ldots, r_n$ are given. The number of permutations of A in which each element $a_i$ is repeated $r_i$ times is $\frac{(r_1 + r_2 + \cdots + r_n)!}{(r_1)! \, (r_2)! \cdots (r_n)!}$

REPETITIONS ARE ALLOWED:

1) The number of combinations of r objects from among n objects, if repetitions are allowed and the order is not important, is C(n+r-1, n-1) = $\frac{(n+r-1)!}{r! \, (n-1)!}$

2) The number of ways of distributing n chocolates to r children so that each child gets at least one is C(n-1, n-r).

3) The number of integer solutions of $X_1 + X_2 + \cdots + X_r = n$ such that each $X_i > 0$ is C(n-1, n-r).

PROBLEMS ON PERMUTATIONS:

PROBLEM-1: In how many ways can you arrange 9 different books such that a special book is in the 4th place?

SOLUTION: There are 9 books and one of them is fixed in the 4th place. The remaining 8 books can be arranged in the other 8 places in 8! ways, i.e. 40,320 ways.

PROBLEM-2: How many different eight-digit numbers can be formed by arranging the digits 1, 1, 1, 1, 2, 3, 3, 3?

SOLUTION: The number of digits = 8. The digit 1 occurs 4 times, the digit 2 occurs once, and the digit 3 occurs 3 times.

The number of ways = $\frac{n!}{(r_1)! \, (r_2)! \, (r_3)!} = \frac{8!}{(4)! \, (1)! \, (3)!} = \frac{40320}{144} = 280$ ways.

PROBLEMS ON COMBINATIONS:

PROBLEM-1: Find the number of permutations of the letters of the word CALCULUS.

SOLUTION: There are 8 letters in the word. The letters C, L and U are each repeated twice.

So the number of permutations is $\frac{8!}{(2)! \, (2)! \, (2)!} = 5040$

PROBLEM-2: How many possible committees of 6 people can be chosen from 15 men and 10 women, if at least 3 men and at least 2 women must be on each committee?

SOLUTION: Three women and 3 men = C(15, 3) × C(10, 3) = 54,600.

Two women and 4 men = C(15, 4) × C(10, 2) = 61,425.

The total number of possible ways = 54,600 + 61,425 = 116,025.
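Counting problems like these are quick to verify with Python's standard `math` module (`math.factorial` and `math.comb`). The helper `arrangements_with_repeats` below is a name introduced only for this sketch; it evaluates n! / (r1! r2! ...) for a collection with repeated items.

```python
import math


def arrangements_with_repeats(counts):
    """Number of distinct arrangements of a collection whose items repeat
    counts[0], counts[1], ... times: n! / (r1! * r2! * ...)."""
    result = math.factorial(sum(counts))
    for r in counts:
        result //= math.factorial(r)
    return result


# 9 books with one fixed in the 4th place: arrange the other 8.
books = math.factorial(8)

# Digits 1,1,1,1,2,3,3,3: repeat counts 4, 1, 3.
digits = arrangements_with_repeats([4, 1, 3])

# CALCULUS: C, L, U twice each; A and S once each.
calculus = arrangements_with_repeats([2, 2, 2, 1, 1])

# Committees of 6 with (3 men, 3 women) or (4 men, 2 women).
committees = math.comb(15, 3) * math.comb(10, 3) + math.comb(15, 4) * math.comb(10, 2)

print(books, digits, calculus, committees)
```

Integer division is safe in the helper because every partial quotient of a multinomial coefficient is itself a whole number.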

BAYES' THEOREM

Statement: Suppose an event A can occur only in combination with one of n mutually exclusive events $E_1, E_2, \ldots, E_n$. If the event A has occurred, then the probability that it was preceded by the particular event $E_i$ is

$P(E_i / A) = \frac{P(E_i) \cdot P(A / E_i)}{\sum_{i=1}^{n} P(E_i) \cdot P(A / E_i)}$

PROBLEMS ON BAYES' THEOREM

PROBLEM-1: In a bolt factory, machines A, B and C manufacture 20%, 30% and 50% of the total output, and 6%, 3% and 2% of their respective outputs are defective. A bolt is drawn at random and found to be defective. Find the probabilities that it was manufactured by i) Machine A ii) Machine B iii) Machine C.

SOLUTION: Let
A = the event that the bolt is manufactured by Machine A.
B = the event that the bolt is manufactured by Machine B.
C = the event that the bolt is manufactured by Machine C.
D = the event that the drawn bolt is defective.

P(A) = the probability that the bolt is manufactured by Machine A = $\frac{20}{100}$

P(B) = the probability that the bolt is manufactured by Machine B = $\frac{30}{100}$

P(C) = the probability that the bolt is manufactured by Machine C = $\frac{50}{100}$

P(D/A) = the probability that the drawn bolt is defective, given that it is manufactured by Machine A = $\frac{6}{100}$

P(D/B) = the probability that the drawn bolt is defective, given that it is manufactured by Machine B = $\frac{3}{100}$

P(D/C) = the probability that the drawn bolt is defective, given that it is manufactured by Machine C = $\frac{2}{100}$

i) If the drawn bolt is defective, the probability that it is from Machine A is

$P(A/D) = \frac{P(A) \cdot P(D/A)}{P(A) \cdot P(D/A) + P(B) \cdot P(D/B) + P(C) \cdot P(D/C)} = \frac{\left(\frac{20}{100}\right)\left(\frac{6}{100}\right)}{\left(\frac{20}{100}\right)\left(\frac{6}{100}\right) + \left(\frac{30}{100}\right)\left(\frac{3}{100}\right) + \left(\frac{50}{100}\right)\left(\frac{2}{100}\right)}$

$= \frac{120/10000}{120/10000 + 90/10000 + 100/10000} = \frac{12/1000}{(12 + 9 + 10)/1000} = \frac{0.012}{0.031} = 0.3871$

ii) If the drawn bolt is defective, the probability that it is from Machine B is

$P(B/D) = \frac{P(B) \cdot P(D/B)}{P(A) \cdot P(D/A) + P(B) \cdot P(D/B) + P(C) \cdot P(D/C)} = \frac{90/10000}{120/10000 + 90/10000 + 100/10000}$

$= \frac{0.009}{0.012 + 0.009 + 0.010} = \frac{0.009}{0.031} = 0.2903$

iii) If the drawn bolt is defective, the probability that it is from Machine C is

$P(C/D) = \frac{P(C) \cdot P(D/C)}{P(A) \cdot P(D/A) + P(B) \cdot P(D/B) + P(C) \cdot P(D/C)} = \frac{100/10000}{120/10000 + 90/10000 + 100/10000}$

$= \frac{0.010}{0.012 + 0.009 + 0.010} = \frac{0.010}{0.031} = 0.3226$

PROBLEM-2: Urn A contains 3 red and 5 white marbles, Urn B contains 2 red and 1 white marbles, and Urn C contains 2 red and 3 white marbles. An urn is selected at random and a marble is drawn from it. If the marble is red, what is the probability that it came from Urn A?

SOLUTION: Let
A = the event of choosing Urn A.
B = the event of choosing Urn B.
C = the event of choosing Urn C.
R = the event that the drawn marble is red.

P(A) = the probability of selecting the 1st urn = 1/3
P(B) = the probability of selecting the 2nd urn = 1/3
P(C) = the probability of selecting the 3rd urn = 1/3

P(R/A) = the probability of drawing 1 red marble from Urn A = $\frac{{}^{3}C_{1}}{{}^{8}C_{1}} = \frac{3}{8}$

P(R/B) = the probability of drawing 1 red marble from Urn B = $\frac{{}^{2}C_{1}}{{}^{3}C_{1}} = \frac{2}{3}$

P(R/C) = the probability of drawing 1 red marble from Urn C = $\frac{{}^{2}C_{1}}{{}^{5}C_{1}} = \frac{2}{5}$

From Bayes' theorem, the probability that the marble came from Urn A, given that it is red, is

$P(A/R) = \frac{P(A) \cdot P(R/A)}{P(A) \cdot P(R/A) + P(B) \cdot P(R/B) + P(C) \cdot P(R/C)} = \frac{\left(\frac{1}{3}\right)\left(\frac{3}{8}\right)}{\left(\frac{1}{3}\right)\left(\frac{3}{8}\right) + \left(\frac{1}{3}\right)\left(\frac{2}{3}\right) + \left(\frac{1}{3}\right)\left(\frac{2}{5}\right)}$

$= \frac{1/8}{1/8 + 2/9 + 2/15} = \frac{0.125}{0.125 + 0.2222 + 0.1333} = \frac{0.125}{0.4806} = 0.2601$
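Both Bayes problems have the same shape: multiply each prior by its likelihood, then divide by the total. A minimal Python sketch of that calculation follows; the function name `posteriors` is an assumption for this example, and `fractions.Fraction` from the standard library is used so the results stay exact.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Bayes' theorem: P(E_i / A) = P(E_i) P(A / E_i) / sum_j P(E_j) P(A / E_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # total probability of the observed event
    return [j / total for j in joint]


# Problem 1: machines A, B, C and their defect rates.
bolts = posteriors(
    [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)],
    [Fraction(6, 100), Fraction(3, 100), Fraction(2, 100)],
)

# Problem 2: three equally likely urns and the chance of a red marble from each.
urns = posteriors(
    [Fraction(1, 3)] * 3,
    [Fraction(3, 8), Fraction(2, 3), Fraction(2, 5)],
)

print([round(float(p), 4) for p in bolts])
print(round(float(urns[0]), 4))
```

The posterior probabilities always sum to 1, which is a useful check: in Problem 1 the three answers are 12/31, 9/31 and 10/31.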

BINOMIAL DISTRIBUTION:

Definition: A random variable X is said to follow a Binomial distribution if it assumes only non-negative values and its probability mass function (p.m.f.) is given by

$P(X = x) = {}^{n}C_{x} \, p^{x} q^{n-x}$ ; x = 0, 1, 2, 3, ..., n ; q = 1 - p
$= 0$ ; otherwise

Examples:
1) The number of heads obtained in 3 tosses of a coin
2) The number of defectives in a lot of 10 items
3) The number of boys in a family of 4 children

POISSON DISTRIBUTION:

Definition: A random variable X is said to follow a Poisson distribution if it assumes only non-negative values and its probability mass function (p.m.f.) is given by

$P(X; \lambda) = P(X) = \frac{e^{-\lambda} \lambda^{X}}{X!}$ ; X = 0, 1, 2, ... ; λ > 0
$= 0$ ; otherwise

It is denoted by X ~ P(λ).

Examples:
1) The number of typing mistakes per page in a book
2) The number of accidents on a road in a particular time
3) The number of telephone calls received by an operator

EXPONENTIAL DISTRIBUTION

Definition: A continuous random variable X is said to follow an exponential distribution with parameter θ if its probability density function is given by

$f(X) = \theta e^{-\theta X}$ ; X ≥ 0 ; θ > 0
$= 0$ ; otherwise

NORMAL DISTRIBUTION

Definition: A random variable X is said to have a Normal distribution with parameters µ and σ if its probability density function is given by

$f(X; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{-\frac{1}{2} \left(\frac{X - \mu}{\sigma}\right)^2\right\}$ ; -∞ < X < ∞ ; -∞ < µ < ∞ ; σ > 0

STANDARD NORMAL VARIATE

If X ~ N(µ, σ²) and we put $Z = \frac{X - \mu}{\sigma}$ in the p.d.f. of the normal distribution, then Z ~ N(0, 1) and

$\varphi(Z) = \frac{1}{\sqrt{2\pi}} e^{-Z^2/2}$ ; -∞ < Z < ∞

where φ(Z) = the p.d.f. of the standard normal variate.

PROBLEMS ON BINOMIAL DISTRIBUTION:
PROBLEM -1
The probability of a defective bolt is 0.2. Find i) the mean and ii) the standard deviation of the number of defective bolts in a lot of 400.
SOLUTION: Given that n = number of trials = 400
p = probability of success = probability of getting a defective bolt = 0.2
q = 1 − p = 1 − 0.2 = 0.8

i) Mean = np = 400(0.2) = 80
ii) Variance = npq, so standard deviation = √(npq) = √(400 × 0.2 × 0.8) = √64 = 8

PROBLEMS ON POISSON DISTRIBUTION:
PROBLEM -1

The average number of accidents on any day on a national highway is 1.8. Determine the probability that the number of accidents is i) at least one ii) at most one.
Solution: Given that mean = λ = 1.8. The p.m.f. of the Poisson distribution is

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.8}(1.8)^{x}}{x!}$$ → ①

i) The probability that the number of accidents is at least one is

$$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{e^{-1.8}(1.8)^{0}}{0!} = 1 - e^{-1.8} = 1 - 0.1653 = 0.8347$$

ii) The probability that the number of accidents is at most one is

$$P(X \le 1) = P(X = 0) + P(X = 1) = \frac{e^{-1.8}(1.8)^{0}}{0!} + \frac{e^{-1.8}(1.8)^{1}}{1!} = e^{-1.8}(1 + 1.8) = (0.1653)(2.8) = 0.4628$$

PROBLEMS ON EXPONENTIAL DISTRIBUTION:


PROBLEM -1
The time taken by a person while speaking over a telephone follows an exponential distribution with mean 4 minutes. Find
i) The probability that he speaks for more than 6 minutes but less than 7 minutes.
ii) Out of 6 calls he makes, the probability that exactly 2 calls take him more than 3 minutes each.
iii) How many calls out of 100 are expected to take more than 3 minutes each?

Solution: Let X = the time taken (in minutes) per call. Given that X follows an exponential distribution with mean 4 minutes, θ = 1/4 and

$$f(x) = \frac{1}{4} e^{-x/4}, \quad x \ge 0$$ → ①

i) P (the time taken for one call is between 6 and 7 minutes)

$$P(6 < X < 7) = \int_{6}^{7} \frac{1}{4} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{6}^{7} = e^{-6/4} - e^{-7/4} = 0.04936$$

ii) P (the time taken for one call is more than 3 minutes)

$$P(X > 3) = \int_{3}^{\infty} \frac{1}{4} e^{-x/4}\,dx = \left[-e^{-x/4}\right]_{3}^{\infty} = 0 + e^{-3/4} = 0.4724$$

The number of calls out of 6 that last more than 3 minutes is a binomial variate with n = 6 and p = 0.4724, so

$$P(\text{exactly 2 of 6 calls}) = \binom{6}{2}(0.4724)^{2}(0.5276)^{4} \approx 0.2594$$

iii) Expected number of calls out of 100 that will be longer than 3 minutes each = 100 × P(X > 3) = 100(0.4724) = 47.24, i.e. about 47 calls.
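The three answers above can be cross-checked numerically. This is a small sketch under the problem's own assumption of an exponential call length with mean 4 minutes; the helper name `survival` and the variable names are illustrative only.

```python
from math import comb, exp

theta = 1 / 4  # rate parameter = 1 / mean (mean call length is 4 minutes)

def survival(t):
    """P(X > t) for an exponential variate with rate theta."""
    return exp(-theta * t)

# i) call lasts between 6 and 7 minutes
p_6_to_7 = survival(6) - survival(7)

# ii) exactly 2 of 6 calls last more than 3 minutes (binomial with p = P(X > 3))
p_more_than_3 = survival(3)
p_exactly_2_of_6 = comb(6, 2) * p_more_than_3**2 * (1 - p_more_than_3)**4

# iii) expected number of calls out of 100 lasting more than 3 minutes
expected_of_100 = 100 * p_more_than_3

print(round(p_6_to_7, 5), round(p_exactly_2_of_6, 4), round(expected_of_100, 2))
```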

PROBLEMS ON NORMAL DISTRIBUTION:
Problem -1
If X is a Normal variate with mean 30 and standard deviation 5, find the probabilities that i) 26 ≤ X ≤ 40 ii) X ≥ 45.
Solution: Given that mean = µ = 30 and S.D. = σ = 5.

i) When X = 26, Z = (X − µ)/σ = (26 − 30)/5 = −4/5 = −0.8 = −Z1

When X = 40, Z = (X − µ)/σ = (40 − 30)/5 = 10/5 = 2 = Z2

∴ P (26 ≤ X ≤ 40) = P (−0.8 ≤ Z ≤ 2) = P (0 ≤ Z ≤ 2) + P (0 ≤ Z ≤ 0.8)
(From the normal table we have P (0 ≤ Z ≤ 2) = 0.4772 and P (0 ≤ Z ≤ 0.8) = 0.2881)

= 0.4772 + 0.2881 = 0.7653 ⟹ P (26 ≤ X ≤ 40) = 0.7653

ii) When X = 45, Z = (X − µ)/σ = (45 − 30)/5 = 15/5 = 3 = Z1

∴ P (X ≥ 45) = P (Z ≥ 3) = 0.5 − P (0 ≤ Z ≤ 3) = 0.5 − 0.49865 = 0.00135
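Instead of a printed normal table, the same areas can be obtained from the error function. The sketch below is an illustration only: `normal_cdf` is a hypothetical helper built on the standard library's `math.erf`, using the identity Φ(z) = ½[1 + erf(z/√2)].

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 30, 5
p_between = normal_cdf(40, mu, sigma) - normal_cdf(26, mu, sigma)  # P(26 <= X <= 40)
p_above = 1 - normal_cdf(45, mu, sigma)                            # P(X >= 45)
print(round(p_between, 4), round(p_above, 5))
```

The values agree with the table-based answers to within the rounding of a four-figure table.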

JOINT PROBABILITY MASS FUNCTION:


Definition: Let X and Y be 2 random variables defined on the same probability space S, with image sets X(S) = {x1, x2, ..., xi, ..., xn} and Y(S) = {y1, y2, ..., yj, ..., ym}. Their product set is

X(S) × Y(S) = {x1, x2, ..., xi, ..., xn} × {y1, y2, ..., yj, ..., ym}.

The probability of the ordered pair (xi, yj) is defined as P(X = xi, Y = yj). This defines a probability function on the product set, given by

$$p_{ij} = P(X = x_i, Y = y_j) = P_{XY}(x_i, y_j)$$

Then P(xi, yj) is known as the joint probability mass function of X and Y. The values of P(xi, yj) can be represented in the following table.

X \ Y    y1    y2    y3    ...   yj    ...   ym    Total
x1       p11   p12   p13   ...   p1j   ...   p1m   p1.
x2       p21   p22   p23   ...   p2j   ...   p2m   p2.
x3       p31   p32   p33   ...   p3j   ...   p3m   p3.
...
xi       pi1   pi2   pi3   ...   pij   ...   pim   pi.
...
xn       pn1   pn2   pn3   ...   pnj   ...   pnm   pn.
Total    p.1   p.2   p.3   ...   p.j   ...   p.m   1

where the grand total is $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij} = 1$.

Marginal probability mass function:
Definition: Let (X, Y) be a bi-variate random variable and P(X, Y) be its joint probability mass function.

The marginal probability mass function of X is denoted by P(X) or PX(x) and is given by

PX(xi) = P(X = xi) = P(X = xi ∩ Y = y1) + P(X = xi ∩ Y = y2) + .... + P(X = xi ∩ Y = yj) + .... + P(X = xi ∩ Y = ym)

$$= p_{i1} + p_{i2} + \dots + p_{ij} + \dots + p_{im} = \sum_{j=1}^{m} p_{ij} = p_{i.}$$

The marginal probability mass function of Y is denoted by P(Y) or PY(y) and is given by

PY(yj) = P(Y = yj) = P(X = x1 ∩ Y = yj) + P(X = x2 ∩ Y = yj) + .... + P(X = xi ∩ Y = yj) + .... + P(X = xn ∩ Y = yj)

$$= p_{1j} + p_{2j} + \dots + p_{ij} + \dots + p_{nj} = \sum_{i=1}^{n} p_{ij} = p_{.j}$$
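The row and column totals of a joint table are exactly the marginal distributions. The sketch below uses a small hypothetical joint p.m.f. (the numbers are invented for illustration and do not come from the notes), stored as exact fractions so the totals can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint p.m.f.: rows are x1, x2; columns are y1, y2, y3.
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(3, 10), F(1, 10), F(2, 10)],
]

# Marginal of X: sum each row over j.  Marginal of Y: sum each column over i.
marginal_x = [sum(row) for row in joint]
marginal_y = [sum(col) for col in zip(*joint)]
grand_total = sum(marginal_x)

print(marginal_x, marginal_y, grand_total)
```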

Matrix
Definition: A system of mn numbers (real or complex) arranged in the form of an ordered set of m rows, each row consisting of an ordered set of n numbers, enclosed between [ ], ( ) or || ||, is called a matrix of order (or type) m × n.

Each of the mn numbers constituting the m × n matrix is called an element of the matrix.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}]_{m \times n}, \quad 1 \le i \le m; \; 1 \le j \le n$$

In relation to matrices, we call the numbers scalars.
Operations of Matrices:
Equal matrices:
Definition: Two matrices A = [aij] and B = [bij] are said to be equal if and only if
i) A and B are of the same type
ii) aij = bij for every i and j
Multiplication of a matrix by a scalar
Definition: Let A be a matrix. The matrix obtained by multiplying every element of A by a scalar k is called the product of A by k and is denoted by kA or Ak.

If A = [aij] of order m × n, then kA = [k aij] of order m × n.
Properties:
i) 0A = O (null matrix); (−1)A = −A, called the negative of A
ii) k1(k2 A) = (k1 k2) A = k2(k1 A), where k1, k2 are scalars
iii) kA = O ⟹ A = O if k ≠ 0
iv) k1 A = k2 A and A is not a null matrix ⟹ k1 = k2

Addition of matrices:
Definition: Let A = [aij] and B = [bij] be 2 matrices, both of order m × n. The matrix C = [cij] of order m × n, where cij = aij + bij, is called the sum of the matrices A and B and is denoted by A + B.

Thus [aij] + [bij] = [aij + bij].

Differences of matrices:
Definition: If A and B are matrices of the same type, then A + (−B) is written as A − B.
Matrix Multiplication:
Definition: Let A = [aik] be of order m × n and B = [bkj] be of order n × p. The matrix C = [cij] of order m × p, where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

is called the product of the matrices A and B in that order, and we write C = AB.
Types of Matrices:

1) Square Matrix: If A = [aij] is of order m × n and m = n, then A is called a square matrix. A square matrix A of order n × n is sometimes called an "n-rowed matrix A".

Example: $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$ is a matrix of 2nd order.

2) Rectangular Matrix: A matrix which is not a square matrix is called a rectangular matrix.

Example: $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 3 & 4 \end{bmatrix}$ is a 2 × 3 matrix.

3) Row Matrix: A matrix of order 1 × n is called a row matrix.
Example: $A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ is a 1 × 3 matrix.

4) Column Matrix: A matrix of order n × 1 is called a column matrix.

Example: $A = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ is a 3 × 1 matrix.

5) Unit Matrix: If A = [aij] is a square matrix of order n × n such that aij = 1 for i = j and aij = 0 for i ≠ j, then A is called a unit matrix. It is denoted by In.

Example: $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

6) Null Matrix (or) Zero Matrix: If A = [aij] of order m × n is such that aij = 0 for all i and j, then A is called a zero matrix or a null matrix. It is denoted by O.

Example: $O = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ is the 2 × 3 null matrix.

Definitions:
1) Diagonal Elements
Definition: In a square matrix A = [aij], the elements aij of A for which i = j (i.e. a11, a22, ..., ann) are called the diagonal elements of A.
2) Principal Diagonal
Definition: The line along which the diagonal elements lie is called the principal diagonal of A.

3) Diagonal Matrix
Definition: A square matrix all of whose elements, except those in the leading diagonal, are zero is called a diagonal matrix. If d1, d2, ..., dn are the diagonal elements of a diagonal matrix A, then A is written as A = diag (d1, d2, ..., dn).

Example: $A = \mathrm{diag}(3, 1, -2) = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$

4) Scalar Matrix:
Definition: A diagonal matrix whose diagonal elements are all equal is called a "scalar matrix".

Example: $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
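The matrix operations defined in this section (scalar multiple, sum and product) can be sketched with plain nested lists. This is an illustrative sketch rather than a recommended library; the function names are assumptions, and the matrices reuse the 2 × 3 rectangular example and diag (3, 1, −2) from the text.

```python
def scalar_multiply(k, a):
    """kA: multiply every element of A by the scalar k."""
    return [[k * x for x in row] for row in a]

def add(a, b):
    """A + B for two matrices of the same order."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must be of the same order")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def multiply(a, b):
    """AB for A of order m x n and B of order n x p; the result is m x p."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of A must equal rows of B")
    p = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)]
            for row in a]

A = [[1, -1, 2], [2, 3, 4]]                 # 2 x 3 rectangular matrix
D = [[3, 0, 0], [0, 1, 0], [0, 0, -2]]      # diag(3, 1, -2)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # unit matrix of order 3

product = multiply(A, D)                    # 2 x 3 result
difference = add(A, scalar_multiply(-1, A)) # A + (-A) gives the null matrix
print(product, difference)
```

Multiplying by the unit matrix leaves A unchanged, and A + (−A) gives the 2 × 3 null matrix, matching the definitions above.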

CURVE FITTING
Types of Curve Fitting:
Fitting of Straight Line
Fitting of Second degree parabola
Fitting of Exponential Curve
Fitting of Power Curve
1) Fitting of Straight Line:


Let us consider the fitting of a straight line

$$Y = a + bX$$ → ①

to a set of n points (xi, yi), i = 1, 2, ...., n. Equation ① represents a family of straight lines for different values of the arbitrary constants a and b. The problem is to determine a and b so that the line is the line of best fit.

The best fit can be obtained with Legendre's principle of least squares, which consists in minimising the sum of squares of the deviations of the actual values of y from their estimated values as given by the line of best fit.

Let Pi (xi, yi) be any general point in the scatter diagram. Draw PiM perpendicular to the X-axis, meeting the line in Hi. Since Hi lies on the straight line, its ordinate is a + b xi. Hence the co-ordinates of Hi are (xi, a + b xi), and

$$P_iH_i = P_iM - H_iM = y_i - (a + b x_i) \implies e_i = y_i - (a + b x_i)$$ → ②

Here ei is called the error of estimate or "residual" of yi. According to the principle of least squares, we have to determine a and b so that

$$E = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} (y_i - a - b x_i)^{2} \text{ is minimum}$$ → ③

From the principle of maxima and minima, we take the partial derivatives of E with respect to a and b and equate them to zero, i.e. ∂E/∂a = 0 and ∂E/∂b = 0.

For ∂E/∂a = 0:

$$\frac{\partial}{\partial a}\sum_{i=1}^{n} (y_i - a - b x_i)^{2} = 0 \implies 2\sum_{i=1}^{n} (y_i - a - b x_i)(-1) = 0 \implies \sum_{i=1}^{n} y_i - na - b\sum_{i=1}^{n} x_i = 0$$

$$\implies \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i$$ → ④

For ∂E/∂b = 0:

$$\frac{\partial}{\partial b}\sum_{i=1}^{n} (y_i - a - b x_i)^{2} = 0 \implies 2\sum_{i=1}^{n} (y_i - a - b x_i)(-x_i) = 0 \implies \sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} x_i^{2} = 0$$

$$\implies \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^{2}$$ → ⑤

Normal Equations: The normal equations for the straight line are

$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i$$ → ④

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^{2}$$ → ⑤

Solving these normal equations gives the values of a and b. Substituting these values in equation ① gives the line of best fit to the given set of points (xi, yi), i = 1, 2, ...., n:

Y = a + b X
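The two normal equations ④ and ⑤ can be solved in closed form. The sketch below is a minimal illustration, not the notes' own procedure; the function name `fit_line` is an assumption, and the sample data are those of the straight-line problem worked later in these notes.

```python
def fit_line(xs, ys):
    """Solve the normal equations
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    for the intercept a and slope b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only if all x values coincide
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

xs = [1, 2, 3, 4, 6, 8]
ys = [2.4, 3, 3.6, 4, 5, 6]
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))
```

The result agrees with the hand computation (a ≈ 1.976, b ≈ 0.506) up to the rounding used in the manual working.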

2. FITTING OF SECOND DEGREE PARABOLA:

Let

$$Y = a + bX + cX^{2}$$ → ①

be a second degree parabola to be fitted to the given set of observations (xi, yi), i = 1, 2, 3, ...., n.

According to the principle of least squares, to determine the constants a, b, c consider the residual

$$e_i = y_i - \hat{y}_i$$ → ②, where $\hat{y}_i = a + b x_i + c x_i^{2}$

$$e_i = y_i - (a + b x_i + c x_i^{2})$$ → ③

Squaring both sides of eq ③ and summing over i,

$$E = \sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})^{2}$$ → ④

Taking partial derivatives of E with respect to the parameters a, b, c and equating them to 0, we get the "normal equations".

For ∂E/∂a = 0:

$$2\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-1) = 0 \implies \sum_{i=1}^{n} y_i - na - b\sum_{i=1}^{n} x_i - c\sum_{i=1}^{n} x_i^{2} = 0$$

$$\implies \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^{2}$$ → ⑤

For ∂E/∂b = 0:

$$2\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-x_i) = 0 \implies \sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} x_i^{2} - c\sum_{i=1}^{n} x_i^{3} = 0$$

$$\implies \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^{2} + c\sum_{i=1}^{n} x_i^{3}$$ → ⑥

For ∂E/∂c = 0:

$$2\sum_{i=1}^{n} (y_i - a - b x_i - c x_i^{2})(-x_i^{2}) = 0 \implies \sum_{i=1}^{n} x_i^{2} y_i - a\sum_{i=1}^{n} x_i^{2} - b\sum_{i=1}^{n} x_i^{3} - c\sum_{i=1}^{n} x_i^{4} = 0$$

$$\implies \sum_{i=1}^{n} x_i^{2} y_i = a\sum_{i=1}^{n} x_i^{2} + b\sum_{i=1}^{n} x_i^{3} + c\sum_{i=1}^{n} x_i^{4}$$ → ⑦

NORMAL EQUATIONS OF SECOND DEGREE PARABOLA

$$\sum y_i = na + b\sum x_i + c\sum x_i^{2}$$

$$\sum x_i y_i = a\sum x_i + b\sum x_i^{2} + c\sum x_i^{3}$$

$$\sum x_i^{2} y_i = a\sum x_i^{2} + b\sum x_i^{3} + c\sum x_i^{4}$$

Solving these normal equations gives the estimated values of a, b, c. Substituting these estimated values in eq ① gives the "best fit" for the given set of data:

Y = a + b X + c X²
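The three normal equations form a 3 × 3 linear system in a, b, c. The sketch below builds that system and solves it by Gauss-Jordan elimination; the helper names `solve_linear` and `fit_parabola` are assumptions made for illustration, and the data are those of the parabola problem worked later in these notes.

```python
def solve_linear(m, rhs):
    """Solve m * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)
    s = lambda p: sum(x**p for x in xs)                        # sum of x^p
    t = lambda p: sum((x**p) * y for x, y in zip(xs, ys))      # sum of x^p * y
    m = [[n,    s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    return solve_linear(m, [t(0), t(1), t(2)])

xs = [0, 1, 2, 3, 4]
ys = [1, 1.8, 1.3, 2.5, 6.3]
a, b, c = fit_parabola(xs, ys)
print(round(a, 2), round(b, 2), round(c, 2))
```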

3. FITTING OF EXPONENTIAL CURVE Y = a b^x

Let

$$Y = a b^{x}$$ → ①

Taking logarithms on both sides we get

$$\log y = \log(a b^{x}) = \log a + \log b^{x} = \log a + x \log b$$

[∵ log x^m = m log x and log (m · n) = log m + log n]

$$U = A + Bx$$ → ②

where U = log y, A = log a, B = log b.

This is a linear equation in x and U.

The normal equations for estimating A and B are

$$\sum U = nA + B\sum x$$ → ③

$$\sum xU = A\sum x + B\sum x^{2}$$ → ④

Solving these normal equations gives the values of A and B. Finally we get the values of a and b as follows:

a = Anti log (A)

b = Anti log (B)

Substituting these values of a and b in eq ① gives the "best fit" to the given set of n points.

The required equation of best fit is y = a b^x

4. FITTING OF EXPONENTIAL CURVE Y = a e^{bx}

Let

$$Y = a e^{bx}$$ → ①

Taking logarithms (to base 10) on both sides of eq ①, we get

$$\log y = \log(a e^{bx}) = \log a + \log e^{bx} = \log a + bx \log e = \log a + x\,(b \log e)$$

$$U = A + Bx$$ → ②

where U = log y, A = log a, B = b log e.

This is a linear equation in x and U.

The normal equations are:

$$\sum U = nA + B\sum x$$ → ③

$$\sum xU = A\sum x + B\sum x^{2}$$ → ④

From these we find A and B, and consequently

a = Anti log (A), and since B = b log e,

$$b = \frac{B}{\log e} = \frac{B}{0.4343}$$

The best fit to the given set of n points is

y = a e^{bx}

5. FITTING OF A POWER CURVE Y = a x^b

Let

$$y = a x^{b}$$ → ①

Taking logarithms on both sides of eq ①, we get

$$\log y = \log(a x^{b}) = \log a + \log x^{b} = \log a + b \log x$$

$$U = A + Bv$$ → ②

where U = log y, A = log a, B = b, v = log x.

This is a linear equation in v and U.

The normal equations are

$$\sum U = nA + B\sum v$$ → ③

$$\sum Uv = A\sum v + B\sum v^{2}$$ → ④

From these we find A and B, and consequently

a = Anti log (A), b = B

The best fit to the given set of n points is y = a x^b
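The log-log linearisation above can be sketched as follows. The function name `fit_power` is an assumption, and the data are those of the power-curve problem worked later in these notes; because the code uses full-precision logarithms instead of four-figure tables, the last digits may differ slightly from a hand computation.

```python
from math import log10

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on U = log10(y), v = log10(x):
       U = A + B*v with A = log10(a) and B = b."""
    n = len(xs)
    us = [log10(y) for y in ys]
    vs = [log10(x) for x in xs]
    sv, su = sum(vs), sum(us)
    svv = sum(v * v for v in vs)
    suv = sum(u * v for u, v in zip(us, vs))
    B = (n * suv - sv * su) / (n * svv - sv * sv)
    A = (su - B * sv) / n
    return 10 ** A, B   # a = antilog(A), b = B

xs = [1, 2, 3, 4, 5, 6]
ys = [6.2, 8.3, 15.4, 33.1, 65.2, 127.4]
a, b = fit_power(xs, ys)
print(round(a, 2), round(b, 2))
```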

1. PROBLEMS ON FITTING OF STRAIGHT LINE:

Problem – 1 Fit a straight line to the following data.

X    1     2     3     4     6     8
Y    2.4   3     3.6   4     5     6

Solution:

The straight line equation is

Y = a + b X → ①

The normal equations for the straight line are

$$\sum Y = na + b\sum X$$ → ②

$$\sum XY = a\sum X + b\sum X^{2}$$ → ③

X      Y      X²     XY
1      2.4    1      2.4
2      3.0    4      6.0
3      3.6    9      10.8
4      4.0    16     16.0
6      5.0    36     30.0
8      6.0    64     48.0
ΣX = 24   ΣY = 24   ΣX² = 130   ΣXY = 113.2

From the above table we have n = 6, ΣX = 24, ΣY = 24, ΣX² = 130, ΣXY = 113.2, so

24 = 6a + 24b → ④

113.2 = 24a + 130b → ⑤

Multiplying ④ by 4:

24a + 96b = 96

24a + 130b = 113.2

Subtracting, 34b = 17.2 ⟹ b = 17.2/34 ⟹ b = 0.5059

Substituting b in eq ④ ⟹ 6a + 24(0.5059) = 24 ⟹ 6a + 12.1416 = 24 ⟹ 6a = 24 − 12.1416 ⟹ 6a = 11.8584 ⟹ a = 11.8584/6 ⟹ a = 1.9764

∴ a = 1.9764 and b = 0.5059

Hence the required equation of the straight line is

Y = a + b X ⟹ Y = 1.9764 + 0.5059 X

Problems on second degree parabola:

Problem -1 Fit a parabola of second degree to the following data.

X    0    1     2     3     4
Y    1    1.8   1.3   2.5   6.3

Solution:

X     Y      X²    X³    X⁴     XY     X²Y
0     1      0     0     0      0      0
1     1.8    1     1     1      1.8    1.8
2     1.3    4     8     16     2.6    5.2
3     2.5    9     27    81     7.5    22.5
4     6.3    16    64    256    25.2   100.8
ΣX = 10   ΣY = 12.9   ΣX² = 30   ΣX³ = 100   ΣX⁴ = 354   ΣXY = 37.1   ΣX²Y = 130.3

From the table we have

n = 5, ΣX = 10, ΣY = 12.9, ΣX² = 30, ΣX³ = 100, ΣX⁴ = 354, ΣXY = 37.1, ΣX²Y = 130.3

The second degree parabola equation is

$$Y = a + bX + cX^{2}$$ → ①

The normal equations for the second degree parabola are

$$\sum y_i = na + b\sum x_i + c\sum x_i^{2}$$ → ②

$$\sum x_i y_i = a\sum x_i + b\sum x_i^{2} + c\sum x_i^{3}$$ → ③

$$\sum x_i^{2} y_i = a\sum x_i^{2} + b\sum x_i^{3} + c\sum x_i^{4}$$ → ④

⟹ 12.9 = 5a + 10b + 30c → ⑤

⟹ 37.1 = 10a + 30b + 100c → ⑥

⟹ 130.3 = 30a + 100b + 354c → ⑦

From ⑤ and ⑥ (multiplying ⑤ by 2 and subtracting from ⑥):

10a + 30b + 100c = 37.1
10a + 20b + 60c = 25.8
⟹ 10b + 40c = 11.3 → ⑧

From ⑥ and ⑦ (multiplying ⑥ by 3 and subtracting from ⑦):

30a + 100b + 354c = 130.3
30a + 90b + 300c = 111.3
⟹ 10b + 54c = 19 → ⑨

From ⑧ and ⑨ (subtracting ⑧ from ⑨):

14c = 7.7 ⟹ c = 7.7/14 ⟹ c = 0.55

Substituting c = 0.55 in eq ⑧:

10b + 40(0.55) = 11.3 ⟹ 10b + 22 = 11.3 ⟹ 10b = 11.3 − 22 = −10.7 ⟹ b = −10.7/10 = −1.07

Substituting b = −1.07 and c = 0.55 in eq ⑤:

5a + 10(−1.07) + 30(0.55) = 12.9 ⟹ 5a − 10.7 + 16.5 = 12.9 ⟹ 5a = 12.9 + 10.7 − 16.5 ⟹ 5a = 7.1 ⟹ a = 7.1/5 ⟹ a = 1.42

∴ a = 1.42, b = −1.07, c = 0.55
Thus the required equation of the second degree parabola is Y = a + b X + c X² ⟹ Y = 1.42 − 1.07 X + 0.55 X²

PROBLEMS ON POWER CURVE Y = a x^b:

Problem – 1 For the given data fit a power curve of the type Y = a x^b

X    1     2     3      4      5      6
Y    6.2   8.3   15.4   33.1   65.2   127.4

Solution:

X     Y       U = log Y   V = log X   V²       UV
1     6.2     0.7924      0           0        0
2     8.3     0.9191      0.3010      0.0906   0.2766
3     15.4    1.1875      0.4771      0.2276   0.5665
4     33.1    1.5198      0.6020      0.3624   0.9149
5     65.2    1.8142      0.6990      0.4886   1.2681
6     127.4   2.1052      0.7781      0.6054   1.6380
Total         ΣU = 8.3382   ΣV = 2.8572   ΣV² = 1.7746   ΣUV = 4.6641

Let the power curve be Y = a x^b → ①

Taking logarithms on both sides, we get

log y = log (a x^b) ⟹ log y = log a + b log x ⟹ U = A + B V → ②

where U = log y, V = log x, A = log a, B = b.

The normal equations are

$$\sum U = nA + B\sum V$$ → ③

$$\sum UV = A\sum V + B\sum V^{2}$$ → ④

8.3382 = 6A + 2.8572 B → ⑤

4.6641 = 2.8572 A + 1.7746 B → ⑥

Solving these equations by cross-multiplication we get

A / [(2.8572)(−4.6641) − (−8.3382)(1.7746)] = B / [(−8.3382)(2.8572) − 6(−4.6641)] = 1 / [6(1.7746) − (2.8572)(2.8572)]

⟹ A / (−13.326 + 14.7970) = B / (−23.8239 + 27.9846) = 1 / (10.6476 − 8.1636)

⟹ A / 1.471 = B / 4.1607 = 1 / 2.484

⟹ A = 1.471/2.484 = 0.5921 and B = 4.1607/2.484 = 1.675

⟹ a = Anti log (A) = Anti log (0.5921) = 3.9093 ⟹ a = 3.9093

⟹ b = B = 1.675 ⟹ b = 1.675

Substituting a and b in equation ① we get the best fit of the power curve.

Hence for the given data, the fitted power curve is Y = a X^b ⟹ Y = 3.9093 X^1.675

PROBLEMS ON EXPONENTIAL CURVE Y = a e^{bx}
Problem -2 Fit an exponential curve of the form Y = a e^{bx} for the following data

X    1     2     3      4      5     6
Y    1.4   4.1   13.2   39.3   125   303

Solution:

X     Y      U = log Y   X²    XU
1     1.4    0.1461      1     0.1461
2     4.1    0.6128      4     1.2256
3     13.2   1.1206      9     3.3618
4     39.3   1.5944      16    6.3776
5     125    2.0969      25    10.4845
6     303    2.4814      36    14.8884
ΣX = 21      ΣU = 8.0522   ΣX² = 91   ΣXU = 36.484

The exponential curve is Y = a e^{bx} → ①

Taking logarithms (to base 10) on both sides,

log y = log (a e^{bx}) ⟹ log y = log a + log e^{bx} ⟹ log y = log a + b x log e ⟹ log y = log a + x (b log e) ⟹ U = A + B X → ②

where U = log y, A = log a, B = b log e.

The normal equations are:

$$\sum U = nA + B\sum x$$ → ③

$$\sum xU = A\sum x + B\sum x^{2}$$ → ④

From the table we have ΣX = 21, ΣU = 8.0522, ΣX² = 91, ΣXU = 36.484, so

8.0522 = 6A + 21B → ⑤

36.484 = 21A + 91B → ⑥

Solving by cross-multiplication,

A / [(21)(−36.484) − (−8.0522)(91)] = B / [(−8.0522)(21) − 6(−36.484)] = 1 / [6(91) − (21)(21)]

⟹ A / (−766.164 + 732.7502) = B / (−169.0962 + 218.904) = 1 / (546 − 441)

⟹ A / (−33.4138) = B / 49.8078 = 1 / 105

⟹ A = −33.4138/105 = −0.3182 and B = 49.8078/105 = 0.4744

a = Anti log (A) = Anti log (−0.3182) = 0.4806

b = B / log e = B / 0.4343 = 0.4744 / 0.4343 = 1.0923

Substituting a = 0.4806 and b = 1.0923 in equation ①, we get the best fit of the given curve.

Hence for the given data the fitted exponential curve is

Y = a e^{bX} ⟹ Y = 0.4806 e^{1.0923 X}
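The fit can be cross-checked in a few lines. Note one deliberate difference from the working above: this sketch takes natural logarithms, so that ln y = ln a + b x and the fitted slope is b itself, with no division by log e = 0.4343. The function name `fit_a_exp_bx` is an assumption made for illustration.

```python
from math import exp, log

def fit_a_exp_bx(xs, ys):
    """Fit y = a * exp(b*x) by least squares on ln(y) = ln(a) + b*x."""
    n = len(xs)
    us = [log(y) for y in ys]              # natural logarithms
    sx, su = sum(xs), sum(us)
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    b = (n * sxu - sx * su) / (n * sxx - sx * sx)
    ln_a = (su - b * sx) / n
    return exp(ln_a), b

xs = [1, 2, 3, 4, 5, 6]
ys = [1.4, 4.1, 13.2, 39.3, 125, 303]
a, b = fit_a_exp_bx(xs, ys)
print(round(a, 3), round(b, 3))
```

Both routes give the same curve, since changing the base of the logarithm only rescales U.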

PROBLEMS ON FITTING OF EXPONENTIAL CURVE Y = a b^x
Problem -1 Fit an exponential curve of the form Y = a b^x for the following data

X    1     2     3     4     5     6     7     8
Y    1.0   1.2   1.8   2.5   3.6   4.7   6.6   9.1

Solution: Let Y = a b^x → ①

Taking logarithms on both sides we get

$$\log y = \log(a b^{x}) = \log a + \log b^{x} = \log a + x \log b$$

$$U = A + Bx$$ → ②

where U = log y, A = log a, B = log b.

The normal equations for estimating A and B are

$$\sum U = nA + B\sum x$$ → ③

$$\sum xU = A\sum x + B\sum x^{2}$$ → ④

X     Y      U = log Y   XU       X²
1     1.0    0           0        1
2     1.2    0.0792      0.1584   4
3     1.8    0.2553      0.7659   9
4     2.5    0.3979      1.5916   16
5     3.6    0.5563      2.7815   25
6     4.7    0.6721      4.0326   36
7     6.6    0.8195      5.7365   49
8     9.1    0.9590      7.6720   64
ΣX = 36   ΣY = 30.5   ΣU = 3.7393   ΣXU = 22.7385   ΣX² = 204

From the above table we have n = 8, ΣX = 36, ΣU = 3.7393, ΣXU = 22.7385, ΣX² = 204, so

3.7393 = 8A + 36B → ⑤

22.7385 = 36A + 204B → ⑥

Multiplying ⑤ by 36 and ⑥ by 8:

288A + 1296B = 134.6148

288A + 1632B = 181.908

Subtracting, 336B = 47.2932 ⟹ B = 47.2932/336 ⟹ B = 0.1408

Substituting B in equation ⑤ ⟹ 8A + 36(0.1408) = 3.7393 ⟹ 8A + 5.0688 = 3.7393 ⟹ 8A = 3.7393 − 5.0688 ⟹ 8A = −1.3295 ⟹ A = −1.3295/8 ⟹ A = −0.1662

⟹ a = Anti log (A) = Anti log (−0.1662) = 0.6821 ⟹ a = 0.6821
⟹ b = Anti log (B) = Anti log (0.1408) = 1.383 ⟹ b = 1.383

Hence for the given data the fitted exponential curve is Y = 0.6821 (1.383)^X
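As a check on the arithmetic, the same base-10 linearisation can be sketched in code. The function name `fit_a_b_pow_x` is an assumption; because it uses full-precision logarithms, the results match the hand computation only to about three significant figures.

```python
from math import log10

def fit_a_b_pow_x(xs, ys):
    """Fit y = a * b**x by least squares on U = log10(y) = A + B*x,
       with A = log10(a) and B = log10(b)."""
    n = len(xs)
    us = [log10(y) for y in ys]
    sx, su = sum(xs), sum(us)
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    B = (n * sxu - sx * su) / (n * sxx - sx * sx)
    A = (su - B * sx) / n
    return 10 ** A, 10 ** B   # a = antilog(A), b = antilog(B)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1]
a, b = fit_a_b_pow_x(xs, ys)
print(round(a, 3), round(b, 3))
```

Note that A comes out negative, which is why a is less than 1.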


CORRELATION
Uni-variate Distribution
Bi-variate Distribution
Multi-variate Distribution

1. Uni-variate Distribution: A distribution involving only one variable is called a "uni-variate distribution".
Example: The heights of a certain group of persons.

2. Bi-variate Distribution: A distribution involving only 2 variables is called a "bi-variate distribution".
Example: The heights and weights of a certain group of persons.

3. Multi-variate Distribution: A distribution involving more than 2 variables is called a "multi-variate distribution".

Correlation:
Definition 1 If a change in one variable is accompanied by a change in the other variable, then the variables are said to be "correlated variables".
Definition 2 Correlation is an analysis of the 'co-variation' between 2 or more variables.

Types of Correlation:
Positive Correlation (or) Direct Correlation
Negative Correlation (or) Inverse Correlation
Perfect Correlation

1) Positive Correlation:
Definition 1 If the variables deviate in the same direction, then the variables are said to be "positively correlated".
Definition 2 In other words, if an increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the other variable, then the variables are said to be "directly correlated variables".
Examples: 1) Price & supply of goods. 2) Income & expenditure of a group of persons.

2) Negative Correlation:
Definition 1 If the variables deviate in opposite directions, then the variables are said to be "negatively correlated".
Definition 2 In other words, if an increase in the value of one variable is accompanied by a decrease in the value of the other variable, or a decrease in the value of one variable is accompanied by an increase in the other variable, then the variables are said to be "inversely correlated variables".
Examples: 1) Volume & pressure of a perfect gas. 2) Price & demand of goods.

3) Perfect Correlation:
Definition: If the deviation in one variable is followed by a corresponding and proportional deviation in the other variable, then the variables are said to be "perfectly correlated variables".

Linear Correlation:
Definition: If the ratio of the change between the variables is uniform, then there is "linear correlation" between the variables. If we plot these values on a graph, we get a straight line.
Example: In the following data the ratio of the change between the variables is the same: every increase of 5 in A is accompanied by an increase of 6 in B.

A    2    7    12    17
B    3    9    15    21

Non-linear Correlation:
Definition: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, then the correlation is called "non-linear correlation". Non-linear correlation is also called "curvilinear correlation".

Uses (or) Applications of Correlation:
1) Correlation is a measure of the extent of the relation between 2 variables.
2) By using the correlation coefficient we can make predictions about the future.
3) The correlation coefficient contributes to the understanding of economic behaviour.
4) By using the correlation coefficient we can estimate the value of one variable if the value of the other variable is given.

Perfect Linear Correlation:
Definition: If all the points lie exactly on a straight line, then the correlation is said to be "perfect linear correlation".

Perfect Positive Correlation:
Definition: If the correlation is linear and the line runs from the lower left hand corner to the upper right hand corner, then the correlation is called "perfect positive correlation". It is denoted by r = +1.

Perfect Negative Correlation:
Definition: If the correlation is linear and the line runs from the upper left hand corner to the lower right hand corner, then the correlation is called "perfect negative correlation". It is denoted by r = −1.

No Correlation:
If the plotted points lie scattered all over the graph paper, then there is no correlation between the 2 variables. If r = 0, the variables X and Y are said to be "uncorrelated".

[Figure: scatter diagrams illustrating perfect positive correlation, perfect negative correlation, and no correlation.]

Methods of Studying Correlation:
There are 2 different methods for finding out the relationship between the variables.
1) Graphical Method 2) Mathematical Method

1) Graphical Method:
a) Scatter Diagram b) Scatter gram

2) Mathematical Method:
a) Karl Pearson's Correlation Coefficient.
b) Spearman's Rank Correlation.
c) Coefficient of Concurrent Deviation.
d) Method of Least Squares.

Mathematical Method:
a) Karl Pearson's Correlation Coefficient:
As a measure of the 'intensity' or 'degree' of linear relationship between 2 variables, Karl Pearson, a British biometrician, developed a formula called the "correlation coefficient".

The correlation coefficient between 2 variables X and Y is usually denoted by r (x, y) or r_XY and is given by

$$r(x, y) = r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$ → ①

where

$$\mathrm{Cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\} = E(XY) - \bar{X}\bar{Y} = \frac{1}{n}\sum XY - \bar{X}\bar{Y}$$

$$V(X) = \sigma_X^{2} = E\{(X - E(X))^{2}\} = E(X^{2}) - [E(X)]^{2} = \frac{1}{n}\sum X^{2} - \bar{X}^{2}$$

$$V(Y) = \sigma_Y^{2} = E\{(Y - E(Y))^{2}\} = E(Y^{2}) - [E(Y)]^{2} = \frac{1}{n}\sum Y^{2} - \bar{Y}^{2}$$

so that

$$r(x, y) = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\sqrt{\frac{1}{n}\sum X^{2} - \bar{X}^{2}} \cdot \sqrt{\frac{1}{n}\sum Y^{2} - \bar{Y}^{2}}}$$

Properties of Correlation Coefficient:
1) The correlation coefficient lies between −1 and +1, i.e. −1 ≤ r (x, y) ≤ +1.
2) The correlation coefficient is independent of change of origin and scale.
3) Two independent variables are uncorrelated. The converse need not be true.
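The formula for r and property 2 can be illustrated together. In this sketch the function name `pearson_r` is an assumption; the data are the father and son heights used in the correlation problems later in these notes, and the particular shift and scale factors in the second call are arbitrary choices for the demonstration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    var_x = sum(x * x for x in xs) / n - mx * mx
    var_y = sum(y * y for y in ys) / n - my * my
    return cov / sqrt(var_x * var_y)

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(fathers, sons)

# Property 2: shifting the origin and rescaling (by positive factors) leaves r unchanged.
r_shifted = pearson_r([2 * (x - 68) for x in fathers], [(y - 69) / 3 for y in sons])
print(round(r, 4), round(r_shifted, 4))
```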

Regression:
Definition: "Regression analysis" is a mathematical measure of the average relationship between 2 or more variables in terms of the original units of the data.
In regression analysis there are 2 types of variables: the dependent variable and the independent variable.
The variable whose value is 'influenced' or is to be 'predicted' is called the "dependent variable".
The variable which 'influences' or is used for 'prediction' is called the "independent variable".


Lines of Regression:
The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of 'best fit', which can be obtained by using the "principle of least squares" technique.

Linear Regression:
If the points in the scatter diagram cluster around a straight line, then it is called "linear regression".

Non-Linear Regression:
If the points in the scatter diagram cluster around a curve, then it is called "non-linear regression" or "curvilinear regression".

Curve of Regression:
If the variables in a bi-variate distribution are related, we find that the points in the scatter diagram cluster round some curve, called the "curve of regression".

Let us suppose that in the bi-variate distribution (xi, yi), i = 1, 2, ...., n, X is the independent variable and Y is the dependent variable. Let the line of regression of Y on X be

Y = a + b X → ①

According to the principle of least squares, the normal equations for estimating a and b are

$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i$$ → ②

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^{2}$$ → ③

Regression Equations:
1) Regression Equation of Y on X 2) Regression Equation of X on Y

Regression Equation of Y on X:
Since b is the slope of the line of regression of Y on X, and since the line of regression passes through the point (x̄, ȳ), its equation is

$$Y - \bar{y} = b_{yx}(X - \bar{x}) \implies Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{x})$$

where b_yx = r (σy/σx) = the regression coefficient of Y on X and r = the correlation coefficient.

Regression Equation of X on Y:
The regression equation of X on Y is given by

$$X - \bar{x} = b_{xy}(Y - \bar{y}) \implies X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{y})$$

where b_xy = r (σx/σy) = the regression coefficient of X on Y and r = the correlation coefficient.

Regression Coefficients:
The slope of the regression line is called the "coefficient of regression". The coefficient of regression of Y on X indicates the change in the value of the variable Y corresponding to a unit change in the value of the variable X and is given by

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Similarly, the coefficient of regression of X on Y indicates the change in the value of the variable X corresponding to a unit change in the value of the variable Y and is given by

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
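The two regression coefficients follow directly from r and the standard deviations. The sketch below is illustrative only: the function name `regression_coefficients` is an assumption, and the data are the father and son heights from the correlation problems in these notes, chosen because their means and variances come out as round numbers.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, r, b_yx, b_xy) using
       b_yx = r * sigma_y / sigma_x and b_xy = r * sigma_x / sigma_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    sd_x = sqrt(sum(x * x for x in xs) / n - mx * mx)
    sd_y = sqrt(sum(y * y for y in ys) / n - my * my)
    r = cov / (sd_x * sd_y)
    return mx, my, r, r * sd_y / sd_x, r * sd_x / sd_y

xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
x_bar, y_bar, r, b_yx, b_xy = regression_coefficients(xs, ys)

# Line of Y on X:  Y - y_bar = b_yx * (X - x_bar)
# Line of X on Y:  X - x_bar = b_xy * (Y - y_bar)
print(x_bar, y_bar, round(b_yx, 4), round(b_xy, 4))
```

For these data the product of the two coefficients equals r², which is the first property listed in the next subsection.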

Properties of Regression Coefficient:
1) The geometric mean (G.M.) of the regression coefficients is equal to the correlation coefficient (r takes the common sign of the two regression coefficients):

$$\sqrt{b_{xy} \cdot b_{yx}} = r$$

2) If one of the regression coefficients is greater than unity, then the other must be less than unity, i.e. b_xy ≥ 1 ⟹ b_yx ≤ 1.

3) The arithmetic mean (A.M.) of the regression coefficients is greater than or equal to the correlation coefficient:

$$\frac{1}{2}\,[b_{xy} + b_{yx}] \ge r$$

4) Regression coefficients are independent of change of origin but not of scale.

5) The angle between the 2 regression lines is

$$\theta = \tan^{-1}\left\{\frac{1 - r^{2}}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^{2} + \sigma_y^{2}}\right\}$$

PROBLEMS ON CORRELATION COEFFICIENT:
Problem -1 Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y)

X 65 66 67 67 68 69 70 72Y 67 68 65 68 72 72 69 71

Solution:X Y X2 Y 2 XY

65 67 4225 4489 435566 68 4356 4624 448867 65 4489 4225 435567 68 4489 4624 455668 72 4624 5184 489669 72 4761 5184 496870 69 4900 4761 483072 71 5184 5041 5112𝞢 X =

544𝞢 Y = 552 𝞢X2= 37028 𝞢Y 2= 38132 𝞢 XY = 37560

From the above table we have 𝞢 X = 544 𝞢 Y = 552 𝞢X2= 37028 𝞢Y 2= 38132 𝞢 XY = 37560 X =

Σ Xn

= 5448

= 68 Y = Σ Yn

= 3528

= 69

The correlation coefficient is given by

r (x, y) = cov (x , y )√ x .√ y

= 1n¿¿ =

375608

− (68 )(69)

√ 370288 −682 .√ 381328 −692


$$= \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)\,(4766.5 - 4761)}} = \frac{3}{\sqrt{(4.5)(5.5)}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$$

∴ r(x, y) = 0.6030

Problem -2
Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y).

X:  65  66  67  67  68  69  70  72
Y:  67  68  65  68  72  72  69  71
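These are the same heights as in Problem 1, so the result can be cross-checked numerically. The sketch below computes r from the raw values and again after shifting the origin to 68 and 69 (the two means), which illustrates that r does not depend on the choice of origin. The helper name `pearson_r` is an illustrative assumption, and population (divide-by-n) moments are used as in the notes.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient using population (divide-by-n) moments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

# Direct method on the raw heights
r_raw = pearson_r(fathers, sons)

# Shifted-origin method: U = X - 68, V = Y - 69
u = [x - 68 for x in fathers]
v = [y - 69 for y in sons]
r_shifted = pearson_r(u, v)

print(f"{r_raw:.4f} {r_shifted:.4f}")  # prints "0.6030 0.6030"
```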

Solution:

X      Y      U = X-68   V = Y-69   U²     V²     UV
65     67     -3         -2         9      4      6
66     68     -2         -1         4      1      2
67     65     -1         -4         1      16     4
67     68     -1         -1         1      1      1
68     72     0          3          0      9      0
69     72     1          3          1      9      3
70     69     2          0          4      0      0
72     71     4          2          16     4      8
ΣX = 544   ΣY = 552   ΣU = 0   ΣV = 0   ΣU² = 36   ΣV² = 44   ΣUV = 24

The correlation coefficient is

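$$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U \, \sigma_V}$$ → ①

$$\bar{U} = \frac{\Sigma U}{n} = \frac{0}{8} = 0, \qquad \bar{V} = \frac{\Sigma V}{n} = \frac{0}{8} = 0$$

$$\operatorname{Cov}(U, V) = \frac{1}{n}\Sigma UV - \bar{U}\bar{V} = \frac{24}{8} - (0)(0) = 3 - 0 = 3$$

$$\sigma_U^2 = \frac{1}{n}\Sigma U^2 - \bar{U}^2 = \frac{36}{8} - 0 = 4.5$$

$$\sigma_V^2 = \frac{1}{n}\Sigma V^2 - \bar{V}^2 = \frac{44}{8} - 0 = 5.5$$

Substituting in ①:

$$r(U, V) = \frac{3}{\sqrt{4.5} \cdot \sqrt{5.5}} = \frac{3}{\sqrt{24.75}} = \frac{3}{4.9749} = 0.6030$$

∴ r(U, V) = 0.6030, which agrees with the value obtained in Problem 1, since the correlation coefficient is unaffected by a change of origin.

PROBLEMS ON REGRESSION LINES
Problem -1
Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.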

Price index of cotton (X):  78  77  85  88  87  82  81  77  76  83  97  93
Price index of wool (Y):    84  82  82  85  89  90  88  92  83  89  98  99

Solution:

X      Y      U = X-84   V = Y-88   U²     V²     UV
78     84     -6         -4         36     16     24
77     82     -7         -6         49     36     42
85     82     +1         -6         1      36     -6
88     85     +4         -3         16     9      -12
87     89     +3         +1         9      1      3
82     90     -2         +2         4      4      -4
81     88     -3         0          9      0      0
77     92     -7         +4         49     16     -28
76     83     -8         -5         64     25     40
83     89     -1         +1         1      1      -1
97     98     +13        +10        169    100    130
93     99     +9         +11        81     121    99
ΣX = 1004   ΣY = 1061   ΣU = -4   ΣV = +5   ΣU² = 488   ΣV² = 365   ΣUV = 287
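The column totals of the working table can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and the assumed origins 84 and 88 are the ones used in the column headings U = X-84 and V = Y-88.

```python
cotton = [78, 77, 85, 88, 87, 82, 81, 77, 76, 83, 97, 93]
wool = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]

# Deviations from the assumed origins used in the table
u = [x - 84 for x in cotton]
v = [y - 88 for y in wool]

totals = {
    "sum_x": sum(cotton),
    "sum_y": sum(wool),
    "sum_u": sum(u),
    "sum_v": sum(v),
    "sum_u2": sum(a * a for a in u),
    "sum_v2": sum(b * b for b in v),
    "sum_uv": sum(a * b for a, b in zip(u, v)),
}
print(totals)
```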

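$$\bar{X} = \frac{\Sigma X}{n} = \frac{1004}{12} = 83.67, \qquad \bar{Y} = \frac{\Sigma Y}{n} = \frac{1061}{12} = 88.42$$

$$\bar{U} = \frac{\Sigma U}{n} = \frac{-4}{12} = -0.33, \qquad \bar{V} = \frac{\Sigma V}{n} = \frac{5}{12} = 0.42$$

$$r(U, V) = \frac{\operatorname{Cov}(U, V)}{\sigma_U \, \sigma_V}$$ → ①

$$\operatorname{Cov}(U, V) = \frac{1}{n}\Sigma UV - \bar{U}\bar{V} = \frac{287}{12} - (-0.33)(0.42) = 23.92 + 0.14 = 24.06$$

$$\sigma_U^2 = \frac{1}{n}\Sigma U^2 - \bar{U}^2 = \frac{488}{12} - (-0.33)^2 = 40.67 - 0.11 = 40.56 \implies \sigma_U = 6.37$$

$$\sigma_V^2 = \frac{1}{n}\Sigma V^2 - \bar{V}^2 = \frac{365}{12} - (0.42)^2 = 30.42 - 0.18 = 30.24 \implies \sigma_V = 5.50$$

Substituting in ①:

$$r(U, V) = \frac{24.06}{\sqrt{40.56} \cdot \sqrt{30.24}} = \frac{24.06}{(6.37)(5.50)} = \frac{24.06}{35.04} \approx 0.69$$

Since U and V differ from X and Y only by a change of origin, σx = σU = 6.37, σy = σV = 5.50 and r(X, Y) = r(U, V) = 0.69.

The regression equation of Y on X is

$$Y - \bar{y} = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{x})$$

⟹ (Y - 88.42) = 0.69 (5.50 / 6.37) (X - 83.67)
⟹ (Y - 88.42) = 0.69 (0.86) (X - 83.67)
⟹ (Y - 88.42) = 0.59 (X - 83.67)

The regression equation of X on Y is

$$X - \bar{x} = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{y})$$

⟹ (X - 83.67) = 0.69 (6.37 / 5.50) (Y - 88.42)
⟹ (X - 83.67) = 0.69 (1.16) (Y - 88.42)
⟹ (X - 83.67) = 0.80 (Y - 88.42)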
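The properties of regression coefficients stated earlier can be checked on the cotton and wool indices. The sketch below is an illustration, not part of the original solution: it works at full floating-point precision, so the last digits may differ from the two-decimal hand working, and all variable names are assumptions. The angle uses σx σy in the numerator, the standard form of the formula.

```python
import math

cotton = [78, 77, 85, 88, 87, 82, 81, 77, 76, 83, 97, 93]  # X
wool = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]    # Y

n = len(cotton)
mean_x = sum(cotton) / n
mean_y = sum(wool) / n
cov = sum(x * y for x, y in zip(cotton, wool)) / n - mean_x * mean_y
var_x = sum(x * x for x in cotton) / n - mean_x ** 2
var_y = sum(y * y for y in wool) / n - mean_y ** 2

r = cov / math.sqrt(var_x * var_y)
b_yx = cov / var_x  # same as r * sd_y / sd_x
b_xy = cov / var_y  # same as r * sd_x / sd_y

# Property 1: geometric mean of the coefficients equals r (both positive here)
geometric_mean = math.sqrt(b_yx * b_xy)
# Property 3: arithmetic mean of the coefficients is at least r
arithmetic_mean = (b_yx + b_xy) / 2
# Property 5: angle between the two regression lines, in radians
theta = math.atan((1 - r ** 2) / abs(r)
                  * math.sqrt(var_x * var_y) / (var_x + var_y))

print(f"Y on X: Y - {mean_y:.2f} = {b_yx:.2f} (X - {mean_x:.2f})")
print(f"X on Y: X - {mean_x:.2f} = {b_xy:.2f} (Y - {mean_y:.2f})")
print(f"r = {r:.4f}, G.M. = {geometric_mean:.4f}, A.M. = {arithmetic_mean:.4f}")
print(f"angle between the lines = {math.degrees(theta):.2f} degrees")
```

Computing b_yx as cov / σx² rather than r σy / σx avoids rounding r before it is reused, which is the main source of small differences from the hand calculation.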
