Survey carried out by team


Statistics Assignment .

Submitted by:

Syed Naser Mohiuddin : 192

Syed Tabriz Zaidie : 193

Tanay Agarwal : 194

Tanay Salpekar : 195

Tanaya Gaikwad : 196


Project Assignment No. 1

Source : http://mospi.nic.in/mospi_new/upload/SYB2014/index1.html

Table 32.1: ESTIMATED EMPLOYMENT IN THE PUBLIC AND PRIVATE SECTORS('000 number)

(As on 31st March)

Central Government employment ('000)

State/ Union Territory     2009   2010   2011
Andhra Pradesh              193    207    199
Assam                        68     67     67
Bihar                        92     92     92
Chhattisgarh                 43     27     27
Goa                           7      6      6
Gujarat                      86     80     79
Haryana                      23     23     23
Himachal Pradesh             14     14     14
Jammu & Kashmir              28     28     28
Jharkhand                    87     96     89
Karnataka                    95     97     96
Kerala                       63     61     61
Madhya Pradesh              135    129    129
Maharashtra                 357    306    339
Manipur                       4      4      4
Meghalaya                     4      5      5
Mizoram                       1      1      -
Nagaland                      4      5      5
Orissa                       63     61     69
Punjab                       60     62     62
Rajasthan                   121    118    115
Tamil Nadu                  223    217    216
Tripura                       7      2      7
Uttar Pradesh               329    324    319
Uttarakhand                  21     22     22
West Bengal                 308    272    167

Union Territory:
A. & N. Islands               5      5      5
Chandigarh                   12     11     11
Delhi                       203    203    202
Daman & Diu                   -      -      -
Puducherry                    5      5      4


Purpose: The Ministry of Statistics and Programme Implementation (MOS&PI) is concerned with the coverage and quality of the statistics it releases. MOS&PI is the nodal agency for coordinating the statistical system in the country and is responsible for improving the quality and timeliness of existing datasets. The data above is maintained on its site to keep track of central government employment in every state and union territory, year by year. Because it is taken from the Statistical Year Book 2014 (SYB2014 in the source URL), it was the most recent data available for this assignment.

Type of data:

The data is quantitative: each value is a count of employees (in thousands) and is measured on a ratio scale.

The data considered for all the calculations below is that of central government employment during 2011.

Concept Name : Frequency distribution

The frequency distribution shows the number of observations in the data set that fall into each class. For example, the first row indicates that there are fifteen states and union territories in which the number of people employed is in the range 0 to 49,999.

Selection of variable: Central government employment during 2011.

Findings and Interpretation of results:

The largest group of states and union territories (15 of 31) has employment in the range 0 to 49,999.

Range ('000)   Frequency
0-49           15
50-99          8
100-149        2
150-199        2
200-249        2
250-299        0
300-349        2
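The counts in the table can be reproduced with a short sketch. It assumes the 2011 column of Table 32.1 with the "-" entries (Mizoram, Daman & Diu) taken as 0, as the later tables do; the names `employment_2011` and `frequency_distribution` are illustrative only.

```python
# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
employment_2011 = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]

CLASS_WIDTH = 50  # thousand employees per class


def frequency_distribution(values, width, n_classes):
    """Count how many values fall in each class [k*width, (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        counts[v // width] += 1
    return counts


frequencies = frequency_distribution(employment_2011, CLASS_WIDTH, 7)
for k, f in enumerate(frequencies):
    print(f"{k * CLASS_WIDTH}-{(k + 1) * CLASS_WIDTH - 1}: {f}")
```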


Concept name: Cumulative Frequency

Cumulative frequency shows the number of observations below the upper limit of each class.

Selection of variable: Central government employment during 2011.

Findings and Interpretation of results:

Range ('000)   Cumulative Frequency
<50            15
<100           23
<150           25
<200           27
<250           29
<300           29
<350           31
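As a small illustration, the cumulative column is a running total of the class frequencies. The sketch below starts from the frequency counts given earlier; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the classes 0-49, 50-99, ..., 300-349 ('000).
frequencies = [15, 8, 2, 2, 2, 0, 2]
upper_limits = [50, 100, 150, 200, 250, 300, 350]

# Running total: number of observations below each upper class limit.
cumulative = list(accumulate(frequencies))

for limit, count in zip(upper_limits, cumulative):
    print(f"<{limit}: {count}")
```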

Concept Name: Relative Frequency

Relative frequency presents frequencies in terms of fractions for different classes.

Selection of variable: Central government employment during 2011.

Findings and Interpretation of results:

From the following table, we can see that the largest share of states and union territories (about 48%) falls in the class below 50,000 employees. Relative frequency is simply the frequency of each class expressed as a fraction of the total number of observations (31).

Class ('000)   Relative Frequency
0-49           0.483871
50-99          0.258065
100-149        0.064516
150-199        0.064516
200-249        0.064516
250-299        0
300-349        0.064516
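A minimal sketch of the calculation, assuming the class frequencies listed earlier: each frequency is divided by the total number of observations.

```python
# Class frequencies for the classes 0-49, 50-99, ..., 300-349 ('000).
frequencies = [15, 8, 2, 2, 2, 0, 2]

total = sum(frequencies)  # 31 states and union territories
relative = [f / total for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:2d} / {total} = {r:.6f}")
```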

Concept Name: Histogram

A histogram is a series of rectangles, each proportional in width to the range of values within a class and proportional in height to the number of items falling in the class. It is a pictorial representation of the frequency distribution.

Selection of variable: Central government employment during 2011.

Findings and Interpretation of results:

The data is concentrated in the first class (0 to 49,999 employees).


Concept name: Pareto chart

Selection of variable: State wise employment and State wise % employment contribution in Central Government (2011)

Formula and calculation steps:

Step 1) Arrange the data in decreasing order. This puts the states contributing the most employees to Central Govt. first.
Step 2) Calculate the cumulative count from the sorted data, and from it the cumulative %.
Step 3) Plot the Pareto chart using these values.
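The steps above can be sketched as follows. This is only an illustration: it assumes the 2011 figures of Table 32.1 with "-" treated as 0, and it prints the sorted values and cumulative percentages rather than drawing the chart.

```python
# 2011 central government employment ('000) by state / union territory.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
employment_2011 = {
    "Andhra Pradesh": 199, "Assam": 67, "Bihar": 92, "Chhattisgarh": 27,
    "Goa": 6, "Gujarat": 79, "Haryana": 23, "Himachal Pradesh": 14,
    "Jammu & Kashmir": 28, "Jharkhand": 89, "Karnataka": 96, "Kerala": 61,
    "Madhya Pradesh": 129, "Maharashtra": 339, "Manipur": 4, "Meghalaya": 5,
    "Mizoram": 0, "Nagaland": 5, "Orissa": 69, "Punjab": 62,
    "Rajasthan": 115, "Tamil Nadu": 216, "Tripura": 7, "Uttar Pradesh": 319,
    "Uttarakhand": 22, "West Bengal": 167, "A. & N. Islands": 5,
    "Chandigarh": 11, "Delhi": 202, "Daman & Diu": 0, "Puducherry": 4,
}

# Step 1: sort in decreasing order of employment.
ranked = sorted(employment_2011.items(), key=lambda item: item[1], reverse=True)

# Step 2: cumulative count and cumulative percentage.
total = sum(employment_2011.values())
running = 0
cumulative_pct = []
for _state, count in ranked:
    running += count
    cumulative_pct.append(100 * running / total)

# Step 3 would plot bars (count) and a line (cumulative %); here we print them.
for (state, count), pct in zip(ranked, cumulative_pct):
    print(f"{state:18s} {count:4d} {pct:6.1f}%")
```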

[Figure: Histogram. X-axis: Range mid value (25, 75, 125, 175, 225, 275, 325); Y-axis: Frequency (0 to 16).]

[Figure: Pareto Chart. X-axis: states and union territories in decreasing order of 2011 employment, from Maharashtra to Daman & Diu; series: State wise employment (0 to 400) and State wise % employment contribution (0 to 100).]


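Findings and Interpretation of results: From the above Pareto chart, we can infer that the top ten states (Maharashtra to Bihar) contribute about 76% of the employment in Central Government jobs; adding the eleventh, Jharkhand, brings the share to about 80%.

Notes if any: 1) This chart is broadly in accordance with the Pareto Principle (80-20 rule), i.e. roughly 80% of the effects come from 20% of the causes. Here about a third of the states and union territories account for roughly 80% of the employment.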

Concept Name: Frequency Polygon

In a frequency polygon, a line graph is drawn by joining the midpoints of the tops of the bars of a histogram.

Selection of variable:
Range: The number of employees (in thousands) is divided into 7 classes of width 50.
Frequency: The number of states falling in each of these classes.

Formula and calculation steps:
Step 1) The number of employees in Central Government in 2011 ranged from 0 to just under 350 (thousand). Dividing this into 7 classes gives a class width of (350-0)/7 = 50, so the classes are [0-49], [50-99], …, [300-349].
Step 2) Count the states and U.T.s falling in each class and record the count against that class.
Step 3) Plot the points with Range on the X-axis and Frequency on the Y-axis, and join them to get the Frequency Polygon.

[Figure: Frequency Polygon. X-axis: Range (0-49, 50-99, 100-149, 150-199, 200-249, 250-299, 300-349); Y-axis: Frequency (0 to 16).]


Findings and Interpretation of results: 1) Most states contribute fewer than 150,000 employees; the classes between 150,000 and 349,000 each contain about the same number of states (two each, except the empty 250-299 class).

Notes if any: 1) This shows that about half of the states (15 of 31, or 48%) contributed fewer than 50,000 employees to Central Government jobs in 2011.

Concept Name: Ogives

Selection of variable: Cumulative Relative Frequency and 7 equal classes of width 50

Formula and calculation steps:
Step 1) From the frequency distribution, calculate the cumulative relative frequencies
Step 2) Put the classes on the X-axis and the Cumulative Relative Frequency on the Y-axis
Step 3) Plot the points

Findings and Interpretation of results: About 80% of the states (25 of 31) contribute fewer than 150,000 employees to Central Govt jobs in 2011.
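The points of the ogive can be sketched from the class frequencies as below. The starting point (0, 0) and the variable names are assumptions made for illustration; the plotting itself is left out.

```python
from itertools import accumulate

# Class frequencies for the classes 0-49, 50-99, ..., 300-349 ('000).
frequencies = [15, 8, 2, 2, 2, 0, 2]
upper_limits = [50, 100, 150, 200, 250, 300, 350]

total = sum(frequencies)
# Step 1: cumulative relative frequency at each upper class limit.
cumulative_relative = [c / total for c in accumulate(frequencies)]

# Steps 2 and 3: (x, y) points of the ogive, starting from (0, 0).
points = [(0, 0.0)] + list(zip(upper_limits, cumulative_relative))

for x, y in points:
    print(f"<{x}: {y:.3f}")
```

The point at 150 is 25/31, which is the "about 80%" figure quoted in the findings.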

[Figure: Ogives. X-axis: Employment Number (<0, <50, <100, <150, <200, <250, <300, <350); Y-axis: Relative Frequency (0 to 1).]


Concept Name: Pie Chart

Selection of variable: State wise contribution in employment in Central Govt jobs in 2011

Formula and calculation steps:
1) For each state, note the number of employees.
2) Add these figures; the total corresponds to the full circle of 360 degrees.
3) Convert each state's figure to an angle: (state figure / total) x 360.
4) Draw each angle as a sector of the circle, with a unique color or shade for each state.
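As a sketch of steps 2 and 3, the sector angle for each state is its share of the total multiplied by 360. The data is the 2011 column with "-" treated as 0; `sector_angle` is an illustrative helper, not part of the original assignment.

```python
# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
employment_2011 = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]


def sector_angle(value, total):
    """Angle in degrees of the pie sector for one state."""
    return 360 * value / total


total = sum(employment_2011)
angles = [sector_angle(v, total) for v in employment_2011]

# Largest sector (Maharashtra, 339 thousand employees).
print(f"Maharashtra: {sector_angle(339, total):.2f} degrees")
```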

Findings and Interpretation of results: The top 3 contributors to Central Govt. jobs in 2011 are Maharashtra, U.P. and Tamil Nadu. The lowest contributors are Mizoram and Daman & Diu.


Concept Name: Stem and Leaf Chart

A stem and leaf plot splits each value into a stem (its leading digits) and a leaf (its last digit), and lists the leaves of all values that share a stem on one row. Here the stem is the number of tens and the leaf is the units digit, so 31 | 9 stands for 319 (thousand).

Selection of variable: The central government employment during 2011.

Stem | Leaf
 0 | 0 0 4 4 5 5 5 6 7
 1 | 1 4
 2 | 2 3 7 8
 3 |
 4 |
 5 |
 6 | 1 2 7 9
 7 | 9
 8 | 9
 9 | 2 6
10 |
11 | 5
12 | 9
13 |
14 |
15 |
16 | 7
17 |
18 |
19 | 9
20 | 2
21 | 6
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 | 9
32 |
33 | 9
34 |
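A stem and leaf display of this data can be generated with a short sketch, assuming the usual convention that the stem is the number of tens and the leaf is the units digit. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
employment_2011 = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]


def stem_and_leaf(values, leaf_unit=10):
    """Group sorted values by stem (value // 10), keeping the last digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // leaf_unit].append(v % leaf_unit)
    return stems


plot = stem_and_leaf(employment_2011)
for stem in range(max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:2d} | {leaves}")
```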


Table showing mean, median, mode, standard deviation, Coefficient of variation.

Employment in different states in Central Government during 2011 in thousands

S No  State               x     (x-µ)   (x-µ)²
1     Andhra Pradesh      199    120    14400
2     Assam                67    -12      144
3     Bihar                92     13      169
4     Chhattisgarh         27    -52     2704
5     Goa                   6    -73     5329
6     Gujarat              79      0        0
7     Haryana              23    -56     3136
8     Himachal Pradesh     14    -65     4225
9     Jammu & Kashmir      28    -51     2601
10    Jharkhand            89     10      100
11    Karnataka            96     17      289
12    Kerala               61    -18      324
13    Madhya Pradesh      129     50     2500
14    Maharashtra         339    260    67600
15    Manipur               4    -75     5625
16    Meghalaya             5    -74     5476
17    Mizoram               0    -79     6241
18    Nagaland              5    -74     5476
19    Orissa               69    -10      100
20    Punjab               62    -17      289
21    Rajasthan           115     36     1296
22    Tamil Nadu          216    137    18769
23    Tripura               7    -72     5184
24    Uttar Pradesh       319    240    57600
25    Uttarakhand          22    -57     3249
26    West Bengal         167     88     7744
27    A. & N. Islands       5    -74     5476
28    Chandigarh           11    -68     4624
29    Delhi               202    123    15129
30    Daman & Diu           0    -79     6241
31    Puducherry            4    -75     5625

∑x                         2462
Sum of (x-µ)²              257665
Range                      339
Arithmetic Mean            79
Mode                       5
Median                     61
Geometric mean             0
Variance                   8311.77419
Standard Deviation         91.1689322
Coefficient of Variation   114.79435
Mean Absolute Deviation    70.1613


Concept Name : Range

It is the difference between the maximum value and the minimum value. It signifies the spread of the data.

Range = 339 - 0 = 339.

Concept Name : Arithmetic Mean

It is the sum of a collection of numbers divided by the number of numbers in the collection.

Arithmetic Mean = (∑x) / N = 2462/31 = 79.42, rounded to 79 in the tables.

Concept Name : Geometric Mean

It is the nth root of the product of the values of all the observations.

Geometric Mean = ⁿ√(∏x) = 0, because two of the observations (Mizoram and Daman & Diu) are 0, which makes the product 0.

Concept Name : Median

It is the middle value of the observations when arranged in increasing or decreasing order.

For an odd number of observations it is the middle value, and for an even number of observations it is the average of the two middle values. With 31 observations, the median is the 16th value in sorted order, which is 61.

Concept Name : Mode

It is the most frequently repeated observation in the data. Here it is 5, which occurs three times.

Concept Name : Variance

Variance is a measure of dispersion, denoted by σ².

σ² = ∑(x-µ)² / N = 257665/31 = 8311.77

Concept Name : Standard Deviation

It is the square root of the variance, denoted by σ.

σ = √8311.77 = 91.17
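The summary figures can be checked with a short sketch. Two assumptions are made to match the table: deviations are taken from the mean rounded to 79, and the coefficient of variation divides by the unrounded mean (2462/31). Variable names are illustrative.

```python
import statistics
from math import sqrt

# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
data = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]

n = len(data)
data_range = max(data) - min(data)
mean_exact = sum(data) / n            # 2462 / 31
mu = round(mean_exact)                # rounded mean used for the deviations
median = statistics.median(data)
mode = statistics.mode(data)

# Population variance and standard deviation about the rounded mean.
variance = sum((x - mu) ** 2 for x in data) / n
sd = sqrt(variance)

# Coefficient of variation (%) and mean absolute deviation.
cv = 100 * sd / mean_exact
mad = sum(abs(x - mu) for x in data) / n

print(data_range, mu, median, mode)
print(round(variance, 2), round(sd, 2), round(cv, 2), round(mad, 4))
```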


Concept Name : Scatter Diagram The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will hug the line.

State               2011 central   2011 state
Andhra Pradesh      199            394
Assam                67            315
Bihar                92            206
Chattisgarh          27            147
Goa                   6             55
Gujarat              79            170
Haryana              23            248
Himachal Pradesh     14            189
Jammu & Kashmir      28            148
Jharkhand            89            183
Karnataka            96            574
Kerala               61            269
Madhya Pradesh      129            464
Maharashtra         339            543
Manipur               4             61
Meghalaya             5             32
Mizoram               0              5
Nagaland              5             63
Orissa               69            361
Punjab               62            254
Rajasthan           115            603
Tamil Nadu          216            549
Tripura               7            113
Uttar Pradesh       319            692
Uttarakhand          22            106
West Bengal         167            279
A. & N. Islands       5             26
Chandigarh           11             22
Delhi               202            131
Daman & Diu           0              1
Puducherry            4             15


Inference:

The values of central and state government employment are positively related, with one stray value (Delhi).
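One way to put a number on the positive relationship is Pearson's correlation coefficient, which the assignment itself does not compute. The sketch below is an illustration under that assumption, using the two columns of the scatter table; `pearson_r` is an illustrative helper.

```python
from math import sqrt

# 2011 employment ('000), same state order in both lists (scatter table).
central = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]
state = [
    394, 315, 206, 147, 55, 170, 248, 189, 148, 183, 574, 269, 464, 543,
    61, 32, 5, 63, 361, 254, 603, 549, 113, 692, 106, 279, 26, 22, 131, 1, 15,
]


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(central, state)
print(f"r = {r:.2f}")  # a value near +1 indicates a strong positive relation
```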

[Figure: Scatter diagram for 2011 Employment in Central government Vs State government. X-axis: Central government Employment (0 to 400); Y-axis: State Government (0 to 800).]


Concept Name : Kurtosis & Skewness

State               X     x-x̄    (x-x̄)^2   (x-x̄)^4
Andhra Pradesh      199    120    14400     207360000
Assam                67    -12      144         20736
Bihar                92     13      169         28561
Chattisgarh          27    -52     2704       7311616
Goa                   6    -73     5329      28398241
Gujarat              79      0        0             0
Haryana              23    -56     3136       9834496
Himachal Pradesh     14    -65     4225      17850625
Jammu & Kashmir      28    -51     2601       6765201
Jharkhand            89     10      100         10000
Karnataka            96     17      289         83521
Kerala               61    -18      324        104976
Madhya Pradesh      129     50     2500       6250000
Maharashtra         339    260    67600    4569760000
Manipur               4    -75     5625      31640625
Meghalaya             5    -74     5476      29986576
Mizoram               0    -79     6241      38950081
Nagaland              5    -74     5476      29986576
Orissa               69    -10      100         10000
Punjab               62    -17      289         83521
Rajasthan           115     36     1296       1679616
Tamil Nadu          216    137    18769     352275361
Tripura               7    -72     5184      26873856
Uttar Pradesh       319    240    57600    3317760000
Uttarakhand          22    -57     3249      10556001
West Bengal         167     88     7744      59969536
A. & N. Islands       5    -74     5476      29986576
Chandigarh           11    -68     4624      21381376
Delhi               202    123    15129     228886641
Daman & Diu           0    -79     6241      38950081
Puducherry            4    -75     5625      31640625

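∑(x-x̄)^2 = 257665
∑(x-x̄)^4 = 9104395021
Mean = 79
Median = 61
Standard Deviation (with divisor N-1) = 92.67
∑(x-x̄)^4 / [∑(x-x̄)^2]^2 = 0.137132449
Kurtosis = N x 0.137132449 = 31 x 0.137132449 = 4.25
Skewness, 3(mean-median)/σ = 0.59626

Kurtosis is the degree of peakedness of a distribution, defined as a normalized form of the fourth central moment of a distribution: [∑(x-x̄)^4 / N] divided by the square of the variance [∑(x-x̄)^2 / N]. For a normal distribution it equals 3.

A distribution with a high peak is called leptokurtic (kurtosis above 3), a flat-topped curve is called platykurtic (kurtosis below 3), and the normal distribution is called mesokurtic.

The ratio 0.137132449 is ∑(x-x̄)^4 divided by the square of ∑(x-x̄)^2; it has to be multiplied by N = 31 to give the kurtosis, which is about 4.25. Because this is greater than 3, the data we have selected is leptokurtic: its peakedness is greater than that of the normal distribution.

Skewness: Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. From the above value of skewness, which is positive, we can infer that the data is positively skewed, i.e. right skewed.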
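Both measures can be checked with a short sketch. It assumes the moment definition of kurtosis (fourth central moment divided by the squared variance, which equals N times the ratio ∑(x-x̄)^4 / [∑(x-x̄)^2]^2), deviations taken from the rounded mean 79 as in the table, and Pearson's second skewness coefficient with the sample standard deviation.

```python
import statistics

# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
data = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]

n = len(data)
mu = 79  # mean rounded as in the table

# Second and fourth central moments about the rounded mean.
m2 = sum((x - mu) ** 2 for x in data) / n
m4 = sum((x - mu) ** 4 for x in data) / n
kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution

# Pearson's second skewness coefficient: 3 (mean - median) / s.
mean = statistics.mean(data)
median = statistics.median(data)
sample_sd = statistics.stdev(data)  # divisor N-1
skewness = 3 * (mean - median) / sample_sd

print(f"kurtosis = {kurtosis:.2f}, skewness = {skewness:.3f}")
```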

Weighted Average:

A weighted average is an average in which each quantity to be averaged is assigned a weight. These weights determine the relative importance of each quantity in the average. Volume weighted average price (VWAP), used in share trading, is one such weighted average.

VWAP: A trading benchmark used especially in pension plans. VWAP is calculated by adding up the dollars traded for every transaction (price multiplied by number of shares traded) and then dividing by the total shares traded for the day. (From Investopedia.)

Thus, if during a trading hour 500 shares are sold at Rs 1000 in the first 30 minutes and 300 shares are sold at Rs 1500 in the other 30 minutes, then the weighted average price is: (1000 x 500 + 1500 x 300) / 800 = 1187.5.
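The share-trading example can be worked through in a few lines. `weighted_average` is an illustrative helper; the prices act as the values and the share counts as the weights.

```python
def weighted_average(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


prices = [1000, 1500]  # Rs per share in each half hour
shares = [500, 300]    # shares traded at each price

vwap = weighted_average(prices, shares)
print(vwap)  # (1000 x 500 + 1500 x 300) / 800 = 1187.5
```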


Concept: Percentile, Quartile & Decile:

Ranking  State               2011   Percentile   Quartile   Decile
1        Maharashtra         339    100.00%      25.00      10.00
2        Uttar Pradesh       319     96.77%      24.19       9.68
3        Tamil Nadu          216     93.55%      23.39       9.35
4        Delhi               202     90.32%      22.58       9.03
5        Andhra Pradesh      199     87.10%      21.77       8.71
6        West Bengal         167     83.87%      20.97       8.39
7        Madhya Pradesh      129     80.65%      20.16       8.06
8        Rajasthan           115     77.42%      19.35       7.74
9        Karnataka            96     74.19%      18.55       7.42
10       Bihar                92     70.97%      17.74       7.10
11       Jharkhand            89     67.74%      16.94       6.77
12       Gujarat              79     64.52%      16.13       6.45
13       Orissa               69     61.29%      15.32       6.13
14       Assam                67     58.06%      14.52       5.81
15       Punjab               62     54.84%      13.71       5.48
16       Kerala               61     51.61%      12.90       5.16
17       Jammu & Kashmir      28     48.39%      12.10       4.84
18       Chattisgarh          27     45.16%      11.29       4.52
19       Haryana              23     41.94%      10.48       4.19
20       Uttarakhand          22     38.71%       9.68       3.87
21       Himachal Pradesh     14     35.48%       8.87       3.55
22       Chandigarh           11     32.26%       8.06       3.23
23       Tripura               7     29.03%       7.26       2.90
24       Goa                   6     25.81%       6.45       2.58
25       Meghalaya             5     22.58%       5.65       2.26
25       Nagaland              5     22.58%       5.65       2.26
25       A. & N. Islands       5     22.58%       5.65       2.26
28       Manipur               4     12.90%       3.23       1.29
28       Puducherry            4     12.90%       3.23       1.29
30       Mizoram               0      6.45%       1.61       0.65
31       Daman & Diu           0      3.23%       0.81       0.32
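The Percentile column appears to be the share of observations at or below each value; that reading is an assumption inferred from the figures, not something the assignment states. Under it, the column can be sketched as below. Tied values get the same percentile, so the two zero entries both come out at 6.45%, whereas the table ranks Daman & Diu separately at 3.23%.

```python
# 2011 central government employment ('000), in the order of Table 32.1.
# Assumption: "-" entries (Mizoram, Daman & Diu) are treated as 0.
data = [
    199, 67, 92, 27, 6, 79, 23, 14, 28, 89, 96, 61, 129, 339, 4, 5,
    0, 5, 69, 62, 115, 216, 7, 319, 22, 167, 5, 11, 202, 0, 4,
]


def percentile_rank(value, values):
    """Percentage of observations that are less than or equal to value."""
    at_or_below = sum(1 for v in values if v <= value)
    return 100 * at_or_below / len(values)


for x in (339, 319, 61, 5, 4, 0):
    print(f"{x:3d}: {percentile_rank(x, data):6.2f}%")
```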

Overall Findings and Interpretations: We can see that the data is concentrated on the lower side, i.e. 15 of the 31 states and union territories have fewer than 50,000 employees working in the central government in the year 2011. Thus we can say that the data is skewed towards the right.

Also, the coefficient of variation of the data is high (about 115%), so the standard deviation is larger than the mean. Thus we can say that the data is spread over a very wide range.