stats form 4

TOPIC 7 : STATISTICS (FORM 4) Presented by Nur Shakirin Sabri Nur Ayuni Daud Nornaimah Rodzi

Transcript of stats form 4

Page 1: stats form 4

TOPIC 7 : STATISTICS (FORM 4)

Presented by

Nur Shakirin Sabri

Nur Ayuni Daud

Nornaimah Rodzi

Page 2: stats form 4

Statistics

Measures of central tendency

Mean

Median

Mode

Measures of dispersion

Range

Quartiles

Variance and standard deviation

Page 3: stats form 4

Ungrouped and Grouped Data

Ungrouped data is where the values are handled individually.

e.g. 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17

Grouped data is where the values are grouped into classes, because sometimes we may collect a large number of data with varying values, e.g.

Circumference (cm) | 1 - 2 | 3 - 4 | 5 - 6 | 7 - 8
No. of branches | 2 | 10 | 14 | 8

Page 4: stats form 4

Mean

The mean of an ungrouped set of data is found by adding up all the values in the set and dividing the sum by the number of values.

Let x be any value in the set of data, N be the number of values, and $\bar{x}$ be the mean of the set of data. Then

$\bar{x} = \frac{\sum x}{N}$

Example: Find the mean of the following data: 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17

$\bar{x} = \frac{1 + 3 + 6 + 6 + 6 + 6 + 7 + 7 + 12 + 12 + 17}{11} = \frac{83}{11} \approx 7.545$
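The same calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not from the slides.

```python
# Mean of ungrouped data: sum of the values divided by the number of values.
data = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]

total = sum(data)   # sum of all x
n = len(data)       # N, the number of values
mean = total / n    # x-bar

print(total, n, round(mean, 3))
```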

Page 5: stats form 4

Mean of grouped data

For grouped data, we take the midpoint of the class, known as the class mark, to represent the class.

Let f be the frequency of each class and x be the corresponding class mark. Then

$\bar{x} = \frac{\sum fx}{\sum f}$

Example:

Circumference (cm) | 1 - 2 | 3 - 4 | 5 - 6 | 7 - 8
No. of branches, f | 2 | 10 | 14 | 8
Midpoint, x | (1 + 2)/2 = 1.5 | (3 + 4)/2 = 3.5 | (5 + 6)/2 = 5.5 | (7 + 8)/2 = 7.5

$\bar{x} = \frac{(1.5)(2) + (3.5)(10) + (5.5)(14) + (7.5)(8)}{2 + 10 + 14 + 8} = \frac{175}{34} \approx 5.147$
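As a quick check, the class marks and the grouped mean for the branch data can be computed as follows. This is a minimal sketch; the list names are assumptions made for illustration.

```python
# Grouped mean: each class is represented by its class mark (midpoint).
classes = [(1, 2), (3, 4), (5, 6), (7, 8)]  # circumference classes (cm)
freqs = [2, 10, 14, 8]                      # number of branches, f

marks = [(lo + hi) / 2 for lo, hi in classes]       # class marks, x
sum_fx = sum(f * x for f, x in zip(freqs, marks))   # sum of f*x
sum_f = sum(freqs)                                  # sum of f
grouped_mean = sum_fx / sum_f

print(marks, sum_fx, sum_f, round(grouped_mean, 3))
```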

Page 6: stats form 4

Median

When the values of a set of data are arranged in either ascending or descending order, the value that lies in the middle is called the median.

If n (the number of values) is odd, the median is the $\frac{n + 1}{2}$th value.

If n is even, the median is the mean of the $\frac{n}{2}$th value and the $\left(\frac{n}{2} + 1\right)$th value.

Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17

Here n = 11, so the median is the $\frac{11 + 1}{2}$th = 6th value, which is 7.
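The odd and even cases can be sketched as one small function. The name `median_of` is hypothetical; Python lists are 0-based, so the $\frac{n + 1}{2}$th value sits at index n // 2.

```python
def median_of(values):
    """Return the middle value of the data, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the (n + 1)/2-th value (1-based) is at index n // 2
        return ordered[mid]
    # even n: mean of the n/2-th and (n/2 + 1)-th values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = median_of([1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17])
even_example = median_of([30, 28, 120, 43, 35, 9])
print(odd_example, even_example)
```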

Page 7: stats form 4

Mode

Mode is the value that occurs the most frequently in a set of data.

Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17

Mode = 6

Page 8: stats form 4

Modal Class of Grouped Data

The class of grouped data that has the highest frequency is known as the modal class.

Example:

Circumference (cm) | 10-19 | 20-29 | 30-39 | 40-49
No. of trees | 14 | 19 | 23 | 10

Q: Determine the modal class.

A: The class with the highest frequency is 30-39 cm. Hence, the modal class is 30-39 cm.

Page 9: stats form 4

Finding the mode from histogram

If the distribution of a set of grouped data is represented by a histogram, we can estimate the mode of the distribution from the histogram.

[Figure: histogram with the mode marked on the horizontal axis]

Page 10: stats form 4

Steps to find mode from histogram.

1. Determine the modal class in the histogram.

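2. Join the top-left vertex of the modal class bar to the top-left vertex of the bar on its right, and the top-right vertex of the modal class bar to the top-right vertex of the bar on its left.

3. Determine the value on the horizontal axis at the intersection of the two lines. This value represents the mode.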
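The intersection found graphically in these steps can also be worked out algebraically. For classes of equal width, the crossing point of the two lines is at L + d1/(d1 + d2) × C, where L is the lower boundary of the modal class, C is the class width, and d1 and d2 are the differences between the modal frequency and the frequencies of the classes before and after it. This formula and the function name `histogram_mode` do not appear in the slides; the sketch below applies it to the tree circumference table (classes 10-19, 20-29, 30-39, 40-49 with frequencies 14, 19, 23, 10).

```python
def histogram_mode(lower_boundary, class_width, f_prev, f_modal, f_next):
    """Estimate the mode as the x-coordinate where the two diagonals cross.

    Assumes all classes have the same width.
    """
    d1 = f_modal - f_prev   # modal frequency minus the class before it
    d2 = f_modal - f_next   # modal frequency minus the class after it
    return lower_boundary + d1 / (d1 + d2) * class_width

# Modal class 30-39: lower boundary 29.5, width 10, neighbours 19 and 10.
mode_estimate = histogram_mode(29.5, 10, 19, 23, 10)
print(round(mode_estimate, 2))
```

A reading taken from a hand-drawn histogram should land close to this value, with small differences due to drawing accuracy.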

[Figure: histogram with the modal class and the mode labelled]

Page 11: stats form 4

Median from cumulative frequency distribution table

The median of a grouped set of data can be calculated from the cumulative frequency distribution table using the following formula:

$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f_m}\right) C$

where
L = lower boundary of the class in which the median lies
N = total frequency
F = cumulative frequency before the class in which the median lies
C = class interval
$f_m$ = frequency of the class in which the median lies

Page 12: stats form 4

Example: Calculate the median.

Circumference (cm) | 1 - 2 | 3 - 4 | 5 - 6 | 7 - 8
No. of branches, f | 2 | 10 | 14 | 8
Cumulative frequency, F | 2 | 12 | 26 | 34

The median lies between the 17th and 18th branches, which is within the class 5 - 6.

Cumulative frequency before the class 5 - 6, F = 12
Lower boundary, L = 4.5
Class interval, C = 2
Frequency of the class, $f_m$ = 14

$\text{Median} = 4.5 + \left(\frac{\frac{34}{2} - 12}{14}\right)(2) \approx 5.21 \text{ cm}$
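Substituting these values into Median = L + ((N/2 − F)/f_m) × C can be checked numerically. The sketch below also rebuilds the cumulative frequency row; the variable names mirror the symbols on the slide.

```python
from itertools import accumulate

freqs = [2, 10, 14, 8]                # frequencies of classes 1-2, 3-4, 5-6, 7-8
cumulative = list(accumulate(freqs))  # running totals, the cumulative frequency row

N = cumulative[-1]  # total frequency
L = 4.5             # lower boundary of the median class 5-6
F = cumulative[1]   # cumulative frequency before the median class
C = 2               # class interval
fm = freqs[2]       # frequency of the median class

median = L + (N / 2 - F) / fm * C
print(cumulative, round(median, 2))
```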

Page 13: stats form 4

Estimating the median of grouped data from an ogive.

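We can use an ogive to estimate the median of grouped data: locate N/2 on the cumulative frequency axis, move across to the curve, then read the corresponding value on the horizontal axis.

[Figure: ogive with the median marked on the horizontal axis]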

Page 14: stats form 4

Effects of uniform changes in data on mean, median and mode

Activity: To find the effects on the measures of central tendency if every value of the data is changed uniformly.

Given 4, 12, 5, 7, 9, 6, 10, 6, 13

a) Find the mean, median, and mode.

b) Add 3 to every value. Find the mean, median, and mode and compare with the answers in (a).

c) Multiply every value by 2. Find the mean, median, and mode and compare with the answers in (a).

d) Discuss the findings.

Page 15: stats form 4

Effects of uniform changes in data on mean, median and mode

From the activity, we find that if u is the original measure of central tendency (mean, median or mode) and v is its new value after a uniform change in every value of the data, then we have

v = u + k if k is added to every value of the data.

v = cu if every value of the data is multiplied by c.
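The two rules can be checked on the activity data 4, 12, 5, 7, 9, 6, 10, 6, 13 using Python's standard `statistics` module. This is a sketch only; the helper name `summary` is an assumption made for illustration.

```python
from statistics import mean, median, mode

data = [4, 12, 5, 7, 9, 6, 10, 6, 13]

def summary(values):
    """Return (mean, median, mode) for a list of values."""
    return mean(values), median(values), mode(values)

original = summary(data)                  # u
shifted = summary([x + 3 for x in data])  # k = 3 added to every value
scaled = summary([2 * x for x in data])   # every value multiplied by c = 2

# v = u + k for the shifted data, v = c * u for the scaled data
shift_rule_holds = shifted == tuple(u + 3 for u in original)
scale_rule_holds = scaled == tuple(2 * u for u in original)
print(original, shifted, scaled, shift_rule_holds, scale_rule_holds)
```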

Page 16: stats form 4

Extreme values in the data

30, 28, 120, 43, 35, 9

The numbers 9 and 120 are considered extreme values in the set of data.

Extreme values significantly affect the mean of the data, but have little or no effect on the median and mode.
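One way to see this is to compare the mean and median of 30, 28, 120, 43, 35, 9 with and without the two extreme values. The sketch below uses the standard `statistics` module. Dropping 9 and 120 is done only for comparison and is not a step from the slides, and the mode is left out because no value repeats in this set.

```python
from statistics import mean, median

data = [30, 28, 120, 43, 35, 9]
without_extremes = [x for x in data if x not in (9, 120)]

mean_all = mean(data)
median_all = median(data)
mean_trimmed = mean(without_extremes)
median_trimmed = median(without_extremes)

# The mean moves by about 10, while the median stays the same.
print(round(mean_all, 2), median_all, mean_trimmed, median_trimmed)
```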

Page 17: stats form 4

Effect of adding or removing a value from a set of data.

The effect is uncertain.

Generally, the mean or median will be shifted to a higher value when values greater than the original mean or median are added, or when values smaller than the original mean or median are removed.

But, the mean, mode or median will remain unchanged if the value added or removed is equal to the corresponding measure of central tendency.

Page 18: stats form 4

Determining the most suitable measure of central tendency.

Question: Determine the most suitable measure of central tendency for each of these sets of data.

1. 42, 30, 39, 40, 35, 30 Mean = 36, Median = 37, Mode = 30

2. 42, 30, 39, 40, 35, 30, 120 Mean = 48, Median = 39, Mode = 30

Answer:

1. Mean and median

2. Median

Page 19: stats form 4

Determining the most suitable measure of central tendency.

Mean is widely used because all the values in a set of data are taken into account.

But, if a set of data contains extreme values, the mean could deviate from its central tendency and may not represent the general characteristics of the data.

Under such circumstances, median will be a better representative measure of central tendency because median eliminates the effects of extreme values in the set of data.

Mode is usually used to represent a set of data containing a large number of values which take only some specific values and many repeated values.

Page 20: stats form 4

Range

Range of ungrouped data = largest value – smallest value

Example: Find the range of the data 2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24.

Answer: Range = 24 – 2 = 22

Page 21: stats form 4

Interquartile range of ungrouped data

Quartiles are three values in a set of data which divide the data into four quarters, with each quarter having the same number of values.

Q1 = lower quartile, Q2 = median, Q3 = upper quartile

Example: 2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24

Here Q1 = 6, Q2 = 15 and Q3 = 20, with 2 numbers in each of the four quarters.

Interquartile range = Q3 – Q1 = 20 – 6 = 14
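A short sketch of this example in Python: the data is split at the median, and Q1 and Q3 are taken as the medians of the lower and upper halves. It assumes that, when the number of values is odd, the median itself is left out of both halves, which matches the example above. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves approach."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value (the median) belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24])
interquartile_range = q3 - q1
print(q1, q2, q3, interquartile_range)
```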

Page 22: stats form 4

Range of grouped data

Range of grouped data = largest class mark – smallest class mark

Example:

Number of goals | 0 – 4 | 5 – 9 | 10 – 14 | 15 – 19 | 20 – 24 | 25 – 29
Number of players | 12 | 9 | 4 | 3 | 1 | 1

Largest class mark = $\frac{25 + 29}{2}$ = 27 goals

Smallest class mark = $\frac{0 + 4}{2}$ = 2 goals

Hence, the range = 27 – 2 = 25 goals

Page 23: stats form 4

Interquartile range of grouped data from cumulative frequency table.

Construct the cumulative frequency table first.

Number of durians | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79
Frequency | 2 | 5 | 11 | 16 | 10 | 4
Cumulative frequency | 2 | 7 | 18 | 34 | 44 | 48

Q1 = $\frac{1}{4}(48)$th value = 12th value

Q3 = $\frac{3}{4}(48)$th value = 36th value

Thus, Q1 lies within the class 40-49 and Q3 lies within the class 60-69.

Page 24: stats form 4

Since the cumulative frequency before the class 40-49 is 7, Q1 (the 12th value) is the fifth value within the class 40-49.

The size of the class 40-49 = 49.5 – 39.5 = 10

Assume that the 11 values are distributed evenly within the class. Then the size between two values = $\frac{10}{11}$

Hence, Q1 = $39.5 + 5 \times \frac{10}{11} \approx 44.05$


Page 25: stats form 4

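Since the cumulative frequency before the class 60-69 is 34, Q3 (the 36th value) is the second value within the class 60-69.

The size of the class 60-69 = 69.5 – 59.5 = 10

Assuming that the 10 values are distributed evenly within the class, the size between two values = $\frac{10}{10} = 1$

Hence, Q3 = $59.5 + 2 \times 1 = 61.5$

Therefore, the interquartile range = Q3 – Q1 = 61.5 – 44.05 = 17.45 durians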
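The even-spread method can be sketched as a small function: find the class containing the required position, then move that many steps of size (class size ÷ class frequency) from the lower boundary. The function name `value_at_position` is hypothetical, and the code assumes integer class limits, so that each boundary lies 0.5 beyond the limit. With the frequency of 10 in the class 60-69, the step is 10/10 = 1, so the second value in that class is 59.5 + 2 = 61.5.

```python
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [2, 5, 11, 16, 10, 4]

def value_at_position(classes, freqs, position):
    """Estimate the value at a position, assuming even spread in each class."""
    cum_before = 0
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum_before + f >= position:
            lower_boundary = lo - 0.5
            size = (hi + 0.5) - lower_boundary
            steps = position - cum_before  # how far into the class we go
            return lower_boundary + steps * size / f
        cum_before += f
    raise ValueError("position exceeds the total frequency")

n = sum(freqs)
q1 = value_at_position(classes, freqs, n / 4)      # 12th value
q3 = value_at_position(classes, freqs, 3 * n / 4)  # 36th value
iqr = q3 - q1
print(round(q1, 2), q3, round(iqr, 2))
```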


Page 26: stats form 4

Determining the interquartile range of grouped data from an ogive.

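We are given an ogive.

[Figure: ogive with 24.5 and 31.5 marked on the horizontal axis]

Q1 = $\frac{1}{4}N$th value = 10th value = 24.5 mm

Q3 = $\frac{3}{4}N$th value = 30th value = 31.5 mm

Hence, the interquartile range = 31.5 – 24.5 = 7.0 mm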

Page 27: stats form 4

Variance

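Variance of ungrouped data:

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$ or $\sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2$, where $\bar{x} = \frac{\sum x}{N}$

Variance of grouped data:

$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$ or $\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2$, where $\bar{x} = \frac{\sum fx}{\sum f}$, f = frequency of each class and x = class mark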

Page 28: stats form 4

Standard deviation

Standard deviation is a statistical measurement which measures how much the values in a set of data are scattered around the mean. It is defined as the positive square root of the variance.

Standard deviation is a good measure of dispersion because it has the same unit as the values of the data whereas variance has a unit which is the square of the unit of the values of the data.

Page 29: stats form 4

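Example:

Time taken | Class mark, x | f | fx | $x^2$ | $fx^2$
5-9 | 7 | 8 | 56 | 49 | 392
10-14 | 12 | 15 | 180 | 144 | 2160
15-19 | 17 | 4 | 68 | 289 | 1156
20-24 | 22 | 3 | 66 | 484 | 1452
Total | | 30 | 370 | | 5160

Mean: $\bar{x} = \frac{370}{30} \approx 12.33$

Variance: $\sigma^2 = \frac{5160}{30} - \left(\frac{370}{30}\right)^2 \approx 19.89$

s.d.: $\sigma = \sqrt{19.89} \approx 4.46$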
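The column totals and the three results for this table can be reproduced with a short script. It is a sketch that uses the grouped form variance = Σfx²/Σf − (mean)²; the variable names are illustrative.

```python
from math import sqrt

marks = [7, 12, 17, 22]  # class marks, x
freqs = [8, 15, 4, 3]    # frequencies, f

sum_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))

mean = sum_fx / sum_f
variance = sum_fx2 / sum_f - mean ** 2
sd = sqrt(variance)

print(sum_f, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```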

Page 30: stats form 4

Effects on measures of dispersion when:

If every value of the data is changed uniformly, i.e. when every value in a set of data is multiplied by a positive constant k, then we have

new range = k × original range

new interquartile range = k × original interquartile range

new s.d. = k × original standard deviation

new variance = $k^2$ × original variance
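These scaling rules can be checked numerically for the range, standard deviation and variance. The sketch below uses the standard `statistics` module; the data set and the value k = 3 are assumptions chosen only for illustration.

```python
from math import isclose
from statistics import pstdev, pvariance

data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
k = 3  # assumed positive constant, chosen only for illustration
scaled = [k * x for x in data]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

range_ratio = value_range(scaled) / value_range(data)  # expected: k
sd_ratio = pstdev(scaled) / pstdev(data)               # expected: k
var_ratio = pvariance(scaled) / pvariance(data)        # expected: k squared

print(range_ratio, round(sd_ratio, 6), round(var_ratio, 6))
```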

Page 31: stats form 4

Effects on measures of dispersion when:

If there are extreme values in the set of data, this will significantly increase the range of the set of data but have little or no effect on the interquartile range.

Extreme values also significantly increase the standard deviation and variance, but the s.d. is affected to a smaller degree than the variance.

Hence, the interquartile range eliminates the effect of extreme values.

Other measures of dispersion are affected to different degrees by extreme values.

Page 32: stats form 4

Effects on measures of dispersion when:

If certain values are added or removed from a set of data, the effect on the measures of dispersion is uncertain.

In general, the range and the interquartile range are less affected as compared to the variance and the standard deviation.

Variance and standard deviation are more significantly affected when the added or removed value differs more from the mean.

Page 33: stats form 4

Comparing the measures of central tendency and dispersion

There are many situations where we need to compare two or more sets of data and subsequently make a conclusion.

The measure of central tendency may not provide us with enough information for comparison.

We need to determine the measures of dispersion of a set of data to provide us with a better picture of the characteristics of the set of data and eventually help us arrive at a more meaningful and acceptable conclusion.

Page 34: stats form 4

Team | Marks
P | 60, 65, 85, 76, 64, 88
Q | 68, 62, 76, 80, 81, 71

A teacher would like to select one of these teams to represent the school in a Mathematics quiz. The teacher is more concerned about a steady performance of the team in the quiz. Which team would the teacher select?

Team P: mean = $\frac{438}{6} = 73$, s.d. = $\sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{692}{6}} \approx 10.74$

Team Q: mean = $\frac{438}{6} = 73$, s.d. = $\sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{272}{6}} \approx 6.73$

Both teams have the same mean, so statistically they are considered equally good. However, team Q has a lower s.d., which implies smaller differences in performance between the members of the team. Hence, team Q is expected to perform more consistently in the quiz than team P. Therefore, the teacher would select team Q.
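The comparison can be reproduced with Python's `statistics` module, where `pstdev` divides by N in the same way as the s.d. used in this topic. This is a sketch; the rule of picking the team with the smaller s.d. simply encodes the teacher's preference for steady performance.

```python
from statistics import mean, pstdev

team_p = [60, 65, 85, 76, 64, 88]
team_q = [68, 62, 76, 80, 81, 71]

mean_p, mean_q = mean(team_p), mean(team_q)
sd_p, sd_q = pstdev(team_p), pstdev(team_q)

# Equal means, so the smaller spread decides the choice.
steadier_team = "P" if sd_p < sd_q else "Q"
print(mean_p, mean_q, round(sd_p, 2), round(sd_q, 2), steadier_team)
```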

Page 35: stats form 4

Thank you for your patience and attention.

THE END