Revision workshop 17 january 2013


Page 1: Revision workshop 17 january 2013

REVISION WORKSHOP

NUBE 17TH JANUARY 2013

Page 2: Revision workshop 17 january 2013

2

• Frequency table consists of a number of classes and each observation is counted and recorded as the frequency of the class.

• If n observations need to be classified into a frequency table, determine:

– Number of classes: $c = 1 + 3{,}3 \log n$

– Class width: $\dfrac{x_{\max} - x_{\min}}{c}$

Organising and graphing quantitative data in a frequency distribution table.

Page 3: Revision workshop 17 january 2013

3

Example: The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

8 11 12 20 18 10 14 18 16 9

5 7 11 12 15 14 16 9 17 11

6 18 9 15 13 12 11 6 10 8

11 13 22 11 11 14 11 10 9

19 14 17 9 3 3 16 8 2

Organising and graphing quantitative data in a frequency

distribution table.

Page 4: Revision workshop 17 january 2013

Number of classes $= 1 + 3{,}3 \log n = 1 + 3{,}3 \log 48 = 6{,}5 \approx 7$

Class width $= \dfrac{x_{\max} - x_{\min}}{k} = \dfrac{22 - 2}{7} = 2{,}86 \approx 3$
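The two calculations above can be checked with a short Python sketch. It assumes the logarithm is base 10 and that both results are rounded up to the next whole number, which is what the slide's 6,5 ≈ 7 and 2,86 ≈ 3 suggest; the variable names are illustrative only.

```python
import math

# Hourly call counts from the example (48 observations).
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]

n = len(calls)
raw_classes = 1 + 3.3 * math.log10(n)        # about 6.5
k = math.ceil(raw_classes)                   # assumption: round up to 7
raw_width = (max(calls) - min(calls)) / k    # (22 - 2) / 7, about 2.86
width = math.ceil(raw_width)                 # assumption: round up to 3

print(n, round(raw_classes, 1), k, round(raw_width, 2), width)
```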

Frequency distribution


Page 5: Revision workshop 17 january 2013

– First class: $[x_{\min}\,;\,x_{\min} + \text{class width}) = [2\,;\,2 + 3) = [2\,;\,5)$

– Second class: $[5\,;\,5 + \text{class width}) = [5\,;\,5 + 3) = [5\,;\,8)$

Frequency distribution


“[” value is included in the class

“)” value is excluded from the class

Page 6: Revision workshop 17 january 2013

Frequency distribution

Classes Tally Count

[2;5) ||| 3

[5;8) |||| 4

[8;11) ||||| ||||| | 11

[11;14) ||||| ||||| ||| 13

[14;17) ||||| |||| 9

[17;20) ||||| | 6

[20;23) || 2

Page 7: Revision workshop 17 january 2013

7

Classes Frequency (f)

[2;5) 3

[5;8) 4

[8;11) 11

[11;14) 13

[14;17) 9

[17;20) 6

[20;23) 2

Total 48
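As a cross-check of the tally, the sketch below counts the 48 observations into the seven half-open classes [lower; lower + 3). The helper name `frequency_table` is hypothetical and not part of the workshop material.

```python
calls = [
    8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
    5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
    6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
    11, 13, 22, 11, 11, 14, 11, 10, 9,
    19, 14, 17, 9, 3, 3, 16, 8, 2,
]

def frequency_table(data, start, width, k):
    """Count observations per class [lower, lower + width)."""
    counts = [0] * k
    for x in data:
        # Floor division maps a value to its class index;
        # the lower boundary is included, the upper one excluded.
        counts[int((x - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(k)]

table = frequency_table(calls, start=2, width=3, k=7)
frequencies = [f for _, _, f in table]

for lower, upper, f in table:
    print(f"[{lower};{upper}) {f}")
```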

Frequency distribution

Page 8: Revision workshop 17 january 2013

8

Classes f % frequency

[2;5) 3 3/48×100 = 6,3

[5;8) 4 4/48×100 = 8,3

[8;11) 11 11/48×100 = 22,9

[11;14) 13 27,1

[14;17) 9 18,8

[17;20) 6 12,5

[20;23) 2 4,2

Total 48 100

Frequency distribution

Page 9: Revision workshop 17 january 2013

9

Classes f % f Cumulative frequency (F)

[2;5) 3 6,3 3

[5;8) 4 8,3 3 + 4 = 7

[8;11) 11 22,9 7 + 11 = 18

[11;14) 13 27,1 18 + 13 = 31

[14;17) 9 18,8 31 + 9 = 40

[17;20) 6 12,5 40 + 6 = 46

[20;23) 2 4,2 46 + 2 = 48

Total 48 100

Frequency distribution

Page 10: Revision workshop 17 january 2013

10

Classes f % f F % F

[2;5) 3 6,3 3 3/48×100 = 6,3

[5;8) 4 8,3 7 7/48×100 = 14,6

[8;11) 11 22,9 18 18/48×100 = 37,5

[11;14) 13 27,1 31 64,6

[14;17) 9 18,8 40 83,3

[17;20) 6 12,5 46 95,8

[20;23) 2 4,2 48 100

Total 48 100

Frequency distribution

Page 11: Revision workshop 17 january 2013

11

Classes f F Class mid-points (x)

[2;5) 3 3 (2 + 5)/2 = 3,5

[5;8) 4 7 (5 + 8)/2 = 6,5

[8;11) 11 18 (8 + 11)/2 = 9,5

[11;14) 13 31 (11 + 14)/2 = 12,5

[14;17) 9 40 15,5

[17;20) 6 46 18,5

[20;23) 2 48 21,5

Total 48

Frequency distribution

Page 12: Revision workshop 17 january 2013

12

Classes f % f F % F (x)

[2;5) 3 6,3 3 6,3 3,5

[5;8) 4 8,3 7 14,6 6,5

[8;11) 11 22,9 18 37,5 9,5

[11;14) 13 27,1 31 64,6 12,5

[14;17) 9 18,8 40 83,3 15,5

[17;20) 6 12,5 46 95,8 18,5

[20;23) 2 4,2 48 100 21,5

Total 48 100
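The derived columns of the table can be reproduced from the frequencies alone. This is a minimal sketch with illustrative names; it keeps the unrounded values, so 6.25 appears where the slide shows 6,3.

```python
from itertools import accumulate

lower_bounds = [2, 5, 8, 11, 14, 17, 20]
width = 3
f = [3, 4, 11, 13, 9, 6, 2]
n = sum(f)

pct_f = [fi * 100 / n for fi in f]                    # % f
F = list(accumulate(f))                               # cumulative frequency F
pct_F = [Fi * 100 / n for Fi in F]                    # % F
midpoints = [lo + width / 2 for lo in lower_bounds]   # (lower + upper) / 2

for lo, fi, pf, Fi, pF, x in zip(lower_bounds, f, pct_f, F, pct_F, midpoints):
    print(f"[{lo};{lo + width}) {fi} {pf:.2f} {Fi} {pF:.2f} {x}")
```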

Frequency distribution

Page 13: Revision workshop 17 january 2013

13

Classes f % f

[2;5) 3 6,3

[5;8) 4 8,3

[8;11) 11 22,9

[11;14) 13 27,1

[14;17) 9 18,8

[17;20) 6 12,5

[20;23) 2 4,2

Classes go on the x-axis; frequencies (f or % f) go on the y-axis.

Histograms

Page 14: Revision workshop 17 january 2013

Histograms

[Figure: histogram titled "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls, with class boundaries 2, 5, 8, 11, 14, 17, 20, 23; y-axis: Number of hours, from 0 to 14.]

Page 15: Revision workshop 17 january 2013

Definitions

Frequency Polygon

A line graph of a frequency distribution that offers a useful alternative to a histogram. A frequency polygon is useful in conveying the shape of the distribution.

Ogive

A graphic representation of the cumulative frequency distribution. It is used for approximating the number of values less than or equal to a specified value.

Page 16: Revision workshop 17 january 2013

16

Class mid-points (x) f % f

3,5 3 6,3

6,5 4 8,3

9,5 11 22,9

12,5 13 27,1

15,5 9 18,8

18,5 6 12,5

21,5 2 4,2

Class mid-points (x) go on the x-axis; frequencies (f or % f) go on the y-axis.

Frequency polygons

Page 17: Revision workshop 17 january 2013

Frequency polygons

[Figure: frequency polygon titled "Number of telephone calls per hour at a municipal call centre". x-axis: Number of calls, plotted at the class mid-points (x) 3,5; 6,5; 9,5; 12,5; 15,5; 18,5; 21,5; y-axis: Number of hours, from 0 to 14. Arbitrary mid-points at 0,5 and 24,5 are added to close the polygon.]

Page 18: Revision workshop 17 january 2013

18

Classes F % F

[2;5) 3 6,3

[5;8) 7 14,6

[8;11) 18 37,5

[11;14) 31 64,6

[14;17) 40 83,3

[17;20) 46 95,8

[20;23) 48 100

Class boundaries go on the x-axis; cumulative frequencies (F or % F) go on the y-axis.

Ogives

Page 19: Revision workshop 17 january 2013

Ogives

[Figure: ogive titled "Ogive of number of calls received at a call centre per hour". x-axis: Number of calls, with class boundaries 2 to 23; y-axis: % cumulative number of hours, from 0 to 100.]

None of the hours had fewer than 2 calls.

Page 20: Revision workshop 17 january 2013

Ogives

[Figure: ogive titled "Ogive of number of calls received at a call centre per hour". x-axis: Number of calls, with class boundaries 2 to 23; y-axis: % cumulative number of hours, from 0 to 100.]

Readings from the ogive:

• 50% of the hours had fewer than 12 calls per hour.

• 80% of the hours had fewer than 17 calls per hour.

• 20% of the hours had more than 17 calls per hour.

Page 21: Revision workshop 17 january 2013

Exam question 2 A garbage removal company would like to start charging by the weight of a customer's bin rather than by the number of bins put out. They select a sample of 25 customers and weigh their garbage bins. The weights in kg are given below:

14.5 5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5 7.4

13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1

14.8 15.1 4.9 17.0 10.3

1. Construct a frequency table to describe the data. Include a frequency and relative (%) frequency column. (Hint: start the class intervals with the whole number just smaller than the lowest value in the dataset.)

Page 22: Revision workshop 17 january 2013

Procedure

1. Calculate the range of the dataset

2. Calculate the no of classes

3. Calculate the class width

4. Construct table showing the intervals calculated in 1 to 3

5. Put in the tally for each interval and then show as frequency

6. Calculate the relative (%) frequency

13 marks

Page 23: Revision workshop 17 january 2013

Range

31.4 − 4.9 = 26.5

No of classes

K or c = 1 + 3.3 log n

n = 25: K or c = 1 + 3.3 log(25) = 5.61 ≈ 6

Class width

$= \dfrac{x_{\max} - x_{\min}}{c} = \dfrac{26.5}{6} = 4.42 \approx 5$

Page 24: Revision workshop 17 january 2013

INTERVALS TALLY FREQUENCY (f) RELATIVE FREQUENCY (% f)

4 - < 9 ||| 3 12

9 - < 14 ||||| | 6 24

14 - < 19 ||||| |||| 9 36

19 - < 24 ||| 3 12

24 - < 29 ||| 3 12

29 - < 34 | 1 4

Total 25 100

No of classes = 6 Class width = 5
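The table can be reproduced with a few lines of Python. This is only a sketch: it assumes the classes start at 4 with width 5, as in the worked answer, and the variable names are illustrative.

```python
weights = [
    14.5, 5.2, 16.0, 14.7, 15.6, 18.9, 13.5, 24.6, 24.5, 7.4,
    13.2, 23.4, 13.9, 12.0, 22.5, 31.4, 16.1, 10.9, 25.1, 22.1,
    14.8, 15.1, 4.9, 17.0, 10.3,
]

start, width, k = 4, 5, 6   # first lower limit, class width, number of classes

counts = [0] * k
for w in weights:
    counts[int((w - start) // width)] += 1   # class index of this weight

n = len(weights)
percentages = [c * 100 / n for c in counts]  # relative (%) frequency

for i, (c, p) in enumerate(zip(counts, percentages)):
    lower = start + i * width
    print(f"{lower} - < {lower + width}  f = {c}  % f = {p:.0f}")
```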

Page 25: Revision workshop 17 january 2013

Exam question 2

2. Comment on the interval containing the lowest percentage.

Answer: 4% of bins weighed between 29 and 34 kg.

3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?

Answer: The largest number of bins weighed between 14 and 19 kg. We assume the mode will fall in this interval (highest frequency).

4. Comment on the shape of the distribution without drawing a graph. Give reasons.

Answer: Positively (+ve) skewed, as more values are located in the lower intervals.

7 MARKS

Page 26: Revision workshop 17 january 2013

Quartiles & Box & Whisker Plots

Page 27: Revision workshop 17 january 2013

27

• Quartiles • Percentiles • Interquartile range

Page 28: Revision workshop 17 january 2013

QUARTILES

28

Page 29: Revision workshop 17 january 2013

29

• QUARTILES

– Order data in ascending order.

– Divide data set into four quarters.

Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max

Page 30: Revision workshop 17 january 2013

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Determine Q1 for the sample of nine measurements:

• Order the measurements:

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

• $Q_1$ is the $\tfrac{1}{4}(n + 1) = \tfrac{1}{4}(9 + 1) = 2{,}5^{\text{th}}$ value.

• Find the difference between the 2nd and 3rd values: 2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5.

• Add this to the smaller value: −3 + 2,5, so Q1 = −0,5.

Page 31: Revision workshop 17 january 2013

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Determine Q3 for the sample of nine measurements:

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

$Q_3$ is the $\tfrac{3}{4}(n + 1) = \tfrac{3}{4}(9 + 1) = 7{,}5^{\text{th}}$ value.

Q3 = 5 + 0,5(6 − 5) = 5,5

Page 32: Revision workshop 17 january 2013

Interquartile range

Q1 = −0,5

Q3 = 5,5

Interquartile range = Q3 – Q1 = 5,5 – (−0,5) = 6

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4
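The position rule used in this example, where the q-th quartile is the q(n + 1)/4-th ordered value with linear interpolation between neighbours, can be sketched as follows. The function names are hypothetical; they are not taken from the workshop material.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    whole = int(position)            # integer part of the position
    fraction = position - whole      # decimal portion of the position
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)

def quartile(data, q):
    """q-th quartile (q = 1, 2, 3) using the q(n + 1)/4 position rule."""
    ordered = sorted(data)
    return value_at_position(ordered, q * (len(ordered) + 1) / 4)

data = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = quartile(data, 1)   # 2.5th value
q3 = quartile(data, 3)   # 7.5th value
iqr = q3 - q1

print(q1, q3, iqr)
```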

Page 33: Revision workshop 17 january 2013

INTERQUARTILE RANGE (IQR)

• Difference between the third and first quartiles

• Indicates how far apart the first and third quartiles are

IQR = Q3 – Q1

33

Page 34: Revision workshop 17 january 2013

BOX & WHISKER PLOT

• Provides a graphical summary of data based on 5 summary measures or values

– First quartile, median, third quartile ,lower limit, upper limit

• Box and whisker plot detects outliers in a data set

LL = Q1 – 1,5 (IQR)

UL = Q3 + 1,5 (IQR)

34

Page 35: Revision workshop 17 january 2013

BOX-AND-WHISKER PLOT

[Figure: box-and-whisker plot on an axis from 0 to 28, with the spans labelled 1,5(IQR), IQR and 1,5(IQR).]

Q1 = 9,36

Me = 12,38

Q3 = 15,67

IQR = 6,31

LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11

UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14

• Any value smaller than −0,11 will be an outlier.

• Any value larger than 25,14 will be an outlier.

Page 36: Revision workshop 17 january 2013

Exam question 3 The Tubeka brothers spent the following amounts in Rand on groceries over the last 8 weeks:

54 56 89 67 74 57 43 51

1. Calculate a five number summary table

2. Construct a box and whisker plot for the data

3. Determine whether there are any outliers. Show calculations

20 MARKS

PROCEDURE

1. Reorder the data set

2. Identify maximum and minimum values in dataset

3. Calculate median

4. Calculate Q1 & Q3

5. Construct plot

6. Calculate upper & lower limits for dataset to determine if outliers present

Page 37: Revision workshop 17 january 2013

43 51 54 56 57 67 74 89

xmin = 43

xmax = 89

median = (56 + 57)/2 = 56.5

Q1 = 51.75

Q3 = 72.25

Q1 position = (n + 1)(¼) = (8 + 1) × ¼ = 2.25th value, between 51 & 54

54 − 51 = 3; multiply by the decimal portion of the position: 3 × 0.25 = 0.75; and add the lower value:

Q1 = 51 + 0.75 = 51.75

Q3 position = (n + 1)(¾) = (8 + 1) × ¾ = 6.75th value, between 67 & 74

74 − 67 = 7; multiply by the decimal portion of the position: 7 × 0.75 = 5.25; and add the lower value:

Q3 = 67 + 5.25 = 72.25

Page 38: Revision workshop 17 january 2013

43 51 54 56 57 67 74 89

xmin = 43, xmax = 89, median = (56 + 57)/2 = 56.5, Q1 = 51.75, Q3 = 72.25

OUTLIERS

1. Calculate upper & lower limits

LL = Q1 – 1.5 (IQR)

UL = Q3 + 1.5 (IQR)

IQR = 72.25 – 51.75 = 20.5

LL = 51.75 – 1.5(20.5) = 21

UL = 72.25 + 1.5(20.5) = 103

No values are smaller than 21 or greater than 103, therefore no outliers are present.
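The whole answer can be cross-checked with Python's standard library. The sketch below relies on `statistics.quantiles` with its default "exclusive" method, which interpolates at position i(n + 1)/4 and so reproduces the hand calculation for this data; treating it as equivalent to the tables-book method in general is an assumption.

```python
import statistics

spend = [54, 56, 89, 67, 74, 57, 43, 51]

# Quartiles via the (n + 1)-based position rule (default method="exclusive").
q1, median, q3 = statistics.quantiles(spend, n=4)

five_number = (min(spend), q1, median, q3, max(spend))

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Any observation outside the limits counts as an outlier.
outliers = [x for x in spend if x < lower_limit or x > upper_limit]

print(five_number, iqr, lower_limit, upper_limit, outliers)
```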

Page 39: Revision workshop 17 january 2013

MEASURES OF LOCATION

Page 40: Revision workshop 17 january 2013

• ARITHMETIC MEAN

– Data is given in a frequency table

– Only an approximate value of the mean

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $f_i$ = frequency of the $i^{\text{th}}$ class interval

$x_i$ = class midpoint of the $i^{\text{th}}$ class interval

Page 41: Revision workshop 17 january 2013

• MEDIAN

$$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$$

where $l_i$ = lower boundary of the median interval

$u_i$ = upper boundary of the median interval

$F_{i-1}$ = cumulative frequency of the interval foregoing the median interval

$f_i$ = frequency of the median interval

– Data is given in a frequency table.

– First cumulative frequency ≥ n/2 will indicate the median class interval.

– Median can also be determined from the ogive.
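As an illustration, the median formula can be applied to the call-centre frequency table from the earlier slides. The function name and the (lower, upper, frequency) layout are assumptions made for this sketch.

```python
def grouped_median(classes):
    """Median of grouped data; classes = [(lower, upper, frequency), ...] in order."""
    n = sum(f for _, _, f in classes)
    cumulative = 0                       # F of the intervals before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:      # first cumulative frequency >= n/2
            return lower + (upper - lower) * (n / 2 - cumulative) / f
        cumulative += f

calls_table = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

median = grouped_median(calls_table)
print(round(median, 2))
```

Here n/2 = 24, the median interval is [11;14), and the result is 11 + 3(24 − 18)/13, which rounds to the Me = 12,38 shown on the box-and-whisker slide.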

Page 42: Revision workshop 17 january 2013

42

• MODE

– Class interval that has the largest frequency value will contain the mode.

– Mode is the class midpoint of this class.

– Mode must be determined from the histogram.

Page 43: Revision workshop 17 january 2013

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

To calculate the mean for the sample of the 48 hours, determine the class midpoints:

Number of calls | Number of hours (fi) | xi

[2–under 5) 3 3,5

[5–under 8) 4 6,5

[8–under 11) 11 9,5

[11–under 14) 13 12,5

[14–under 17) 9 15,5

[17–under 20) 6 18,5

[20–under 23) 2 21,5

n = 48

Page 44: Revision workshop 17 january 2013

Using the class midpoints and frequencies from the table (n = 48):

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} = 12{,}44$$

Average number of calls per hour is 12,44.
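The sum of 597 is not broken down on the slide; a short sketch makes the arithmetic explicit. Names are illustrative only.

```python
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]   # x_i
frequencies = [3, 4, 11, 13, 9, 6, 2]                 # f_i

# Products f_i * x_i for each class, then their total.
products = [f * x for f, x in zip(frequencies, midpoints)]
total_fx = sum(products)
n = sum(frequencies)
mean = total_fx / n

print(products, total_fx, n, round(mean, 2))
```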

Page 45: Revision workshop 17 january 2013

Exam question 3 The number of overtime hours worked by 40 part-time employees of a security company in 1 week is shown in the following frequency distribution:

Hours per week | Frequency (f)

2.1 - < 2.8 12

2.8 - < 3.5 13

3.5 - < 4.2 7

4.2 - < 4.9 5

4.9 - < 5.6 2

5.6 - < 6.3 1

1. Estimate the mean number of overtime hours worked

2. What % of employees worked at least 4.2 hours overtime?

8 marks

Page 46: Revision workshop 17 january 2013

Exam question 3 Procedure

1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2

2. Multiply f by the midpoint x

3. Total the fx and f columns

4. Divide ∑fx by ∑f

Page 47: Revision workshop 17 january 2013

Exam question 3

Hours per week | Frequency (f) | Mid point (x) | fx

2.1 - < 2.8 12 (2.1 + 2.8)/2 = 2.45 29.4

2.8 - < 3.5 13 3.15 40.95

3.5 - < 4.2 7 3.85 26.95

4.2 - < 4.9 5 4.55 22.75

4.9 - < 5.6 2 5.25 10.5

5.6 - < 6.3 1 5.95 5.95

Total 40 136.5

Mean = 136.5/40 = 3.41 hrs

Employees who worked at least 4.2 hrs = 5 + 2 + 1 = 8; 8/40 × 100 = 20%

Page 48: Revision workshop 17 january 2013

PERCENTILES

48

Page 49: Revision workshop 17 january 2013

• PERCENTILES

– Order data in ascending order.

– Divide data set into hundred parts.

Min | 10% | P10 | 90% | Max

Min | 50% | P50 = Q2 | 50% | Max

Min | 80% | P80 | 20% | Max

Page 50: Revision workshop 17 january 2013

Example – Given the following data set:

2 5 8 −3 5 2 6 5 −4

Determine P20 for the sample of nine measurements:

Value: −4 −3 2 2 5 5 5 6 8

Position: 1 2 3 4 5 6 7 8 9

$P_{20}$ is the $\tfrac{p}{100}(n + 1) = \tfrac{20}{100}(9 + 1) = 2^{\text{nd}}$ value.

P20 = −3

Page 51: Revision workshop 17 january 2013

Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Number of calls | Number of hours (fi) | F

[2–under 5) 3 3

[5–under 8) 4 7

[8–under 11) 11 18

[11–under 14) 13 31

[14–under 17) 9 40

[17–under 20) 6 46

[20–under 23) 2 48

n = 48

To locate P60: np/100 = 48(60)/100 = 28,8

The first cumulative frequency ≥ 28,8 is F = 31, so P60 lies in the class [11–under 14).

Page 52: Revision workshop 17 january 2013

For the class [11–under 14), which contains P60 (n = 48, np/100 = 28,8):

$$P_p = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p}$$

$$P_{60} = 11 + \frac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$$

60% of the time there were fewer than 13,49 calls per hour, or 40% of the time there were more than 13,49 calls per hour.
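The percentile formula generalises the grouped median (the median is the case p = 50). A minimal sketch, with a hypothetical function name and the call-centre table as input:

```python
def grouped_percentile(classes, p):
    """p-th percentile of grouped data; classes = [(lower, upper, frequency), ...]."""
    n = sum(f for _, _, f in classes)
    target = n * p / 100                 # np/100
    cumulative = 0                       # F of the intervals before the current one
    for lower, upper, f in classes:
        if cumulative + f >= target:     # first cumulative frequency >= np/100
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f

calls_table = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

p60 = grouped_percentile(calls_table, 60)
print(round(p60, 2))
```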

Page 53: Revision workshop 17 january 2013

Exam question 3

1. John, one of the part-time workers, was told he falls on the 70th percentile. Calculate the value and explain what it means.

PROCEDURE

1. Calculate the cumulative frequencies

2. Calculate which class the required percentile falls into by using P =np/100

3. Once you have identified the class use the percentile formula given in the tables book to calculate the value. Take CARE to order the calculation correctly.

4 MARKS

Page 54: Revision workshop 17 january 2013

Exam question 3

Hours per week | Frequency (f) | Cumulative F

2.1 - < 2.8 12 12

2.8 - < 3.5 13 25

3.5 - < 4.2 7 32

4.2 - < 4.9 5 37

4.9 - < 5.6 2 39

5.6 - < 6.3 1 40

Total 40

P = np/100 = 40 × 70/100 = 28

The first cumulative frequency ≥ 28 is 32, so P70 lies in the interval 3.5 - < 4.2.

P70 = 3.5 + [(4.2 − 3.5)(28 − 25)]/7

= 3.5 + 0.3

= 3.8

70% of the workers worked fewer overtime hours than John, that is, fewer than 3.8 hrs. 30% of the workers worked more overtime hours than John, that is, more than 3.8 hrs.

Page 55: Revision workshop 17 january 2013

CONFIDENCE INTERVALS

Page 56: Revision workshop 17 january 2013

Confidence interval

– An interval is calculated around the sample statistic

[Figure: a confidence interval drawn around the sample statistic, with the population parameter included in the interval.]

Page 57: Revision workshop 17 january 2013

Confidence interval

– An upper and lower limit within which the population parameter is expected to lie

– Limits will vary from sample to sample

– Specify the probability that the interval will include the parameter

– Typically used: 90%, 95%, 99%

– Probability denoted by (1 – α)

• (1 – α) is known as the level of confidence

• α is the significance level

Example:

Meaning of a 90% confidence interval: 90% of all possible samples taken from the population will produce an interval that includes the population parameter.

Page 58: Revision workshop 17 january 2013

• An interval estimate consists of a range of values with an upper & lower limit

• The population parameter is expected to lie within this interval with a certain level of confidence

• Limits of an interval vary from sample to sample therefore we must also specify the probability that an interval will contain the parameter

• Ideally probability should be as high as possible

58

Page 59: Revision workshop 17 january 2013

SO REMEMBER

•We can choose the probability

•Probability is denoted by (1-α)

•Typical values are 0.9 (90%); 0.95 (95%) and 0.99 (99%)

•The probability is known as the LEVEL OF CONFIDENCE

•α is known as the SIGNIFICANCE LEVEL

•α corresponds to an area under a curve

•Since we take the confidence level into account when we estimate an interval, the interval is called CONFIDENCE INTERVAL

59

Page 60: Revision workshop 17 january 2013

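Confidence interval for Population Mean, n ≥ 30

- population need not be normally distributed

- the sample mean will be approximately normally distributed

$$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \text{ if } \sigma \text{ is known}$$

$$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}, \text{ if } \sigma \text{ is not known}$$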

Page 61: Revision workshop 17 january 2013

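Example: 90% confidence interval

$1 - \alpha = 0{,}90$, so $\alpha = 0{,}10$ and $\dfrac{\alpha}{2} = \dfrac{0{,}10}{2} = 0{,}05$

$$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \text{ if } \sigma \text{ is known}$$

$$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}, \text{ if } \sigma \text{ is not known}$$

[Figure: normal curve centred on x̄ between the lower and upper confidence limits. The central area is the confidence level 1 − α = 0,90 (90% of all sample means fall in this area). Each tail has area α/2 = 0,05; the two tail areas added together equal α, i.e. 10%.]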

Page 62: Revision workshop 17 january 2013

62

Page 63: Revision workshop 17 january 2013

• Confidence interval for Population Mean, n < 30

– For a small sample from a normal population where σ is known, the normal distribution can be used.

– If σ is unknown we use s to estimate σ

– We need to replace the normal distribution with the t-distribution

$$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}$$

[Figure: standard normal curve compared with the t-distribution curve.]

Page 64: Revision workshop 17 january 2013

t Distribution

64

Page 65: Revision workshop 17 january 2013

• Example – The manager of a small departmental store is concerned about the decline of his weekly sales.

– He calculated the average and standard deviation of his sales for the past 12 weeks: $\bar{x}$ = R12400 and s = R1346.

– Estimate with 99% confidence the population mean sales of the departmental store.

$$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}} = 12400 \pm 3{,}106 \cdot \frac{1346}{\sqrt{12}} = 12400 \pm 1206{,}86 = [11193{,}14\,;\,13606{,}86]$$

where $t_{11;\,0{,}995} = 3{,}106$.

We are 99% confident that the mean weekly sales will be between R11 193,14 and R13 606,86.
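The interval can be recomputed in a few lines. In this sketch the critical value 3.106 is taken from the t-table as on the slide rather than computed, and the function name is illustrative.

```python
import math

def mean_confidence_interval(x_bar, s, n, critical_value):
    """x_bar +/- critical_value * s / sqrt(n); critical value read from a table."""
    margin = critical_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Weekly sales example: x_bar = R12400, s = R1346, n = 12, t(11; 0.995) = 3.106
lower, upper = mean_confidence_interval(12400, 1346, 12, 3.106)

print(round(lower, 2), round(upper, 2))
```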

Page 66: Revision workshop 17 january 2013

• Confidence interval for Population proportion

– Each element in the population can be classified as a success or failure

– Proportion always between 0 and 1

– For large samples the sample proportion is approximately normal

$$\text{Sample proportion} = \hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}$$

$$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Page 67: Revision workshop 17 january 2013

Exam question 7

1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.

2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.

15 MARKS

Page 68: Revision workshop 17 january 2013

Exam question 7 PROCEDURE

1. Determine what measure you are looking at: mean, proportion or standard deviation

2. Select appropriate formula based on 1. and sample size (t for small sample sizes <30; z for larger sample sizes)

3. Put the numbers into the formula and calculate the confidence intervals

Page 69: Revision workshop 17 january 2013

Exam question 7

1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.

$$\text{Sample proportion} = \hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}$$

$$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

$\hat{p}$ = 120/200 = 0.6

$Z_{1-\frac{\alpha}{2}}$ = 1.96

CI = 0.6 ± 1.96 √((0.6)(0.4)/200)

CI = 0.6 ± 0.07

0.53 < p < 0.67

At a confidence level of 95%, between 53% and 67% of residents believe the tax rate is too high.
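A quick numerical check of this interval follows. The sketch uses `statistics.NormalDist().inv_cdf` from the standard library to obtain the z-value instead of reading it from the table; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) for a large sample."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z for 1 - alpha/2
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z, p_hat - margin, p_hat + margin

p_hat, z, lower, upper = proportion_confidence_interval(120, 200, 0.95)

print(p_hat, round(z, 2), round(lower, 2), round(upper, 2))
```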

Page 70: Revision workshop 17 january 2013

Exam question 7

2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.

$$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}$$

$t_{19;\,0.975}$ = 2.093

CI = 45 ± 2.093 × 14/√20

= 45 ± 6.55

38.45 < µ < 51.55

At a confidence level of 95%, the population average time to complete a tune up is between 38.45 and 51.55 minutes.

Page 71: Revision workshop 17 january 2013

HYPOTHESIS TESTING

Page 72: Revision workshop 17 january 2013

STEPS OF A HYPOTHESIS TEST

Step 1 • State the null and alternative hypotheses

Step 2 • State the values of α

Step 3 • Calculate the value of the test statistic

Step 4 • Determine the critical value

Step 5 • Make a decision using decision rule or graph

Step 6 • Draw a conclusion

72

Page 73: Revision workshop 17 january 2013

• Hypothesis test for Population Mean, n < 30

– If σ is unknown we use s to estimate σ

– We need to replace the normal distribution with the t-distribution with (n − 1) degrees of freedom

Testing H0: μ = μ0 for n < 30

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Alternative hypothesis | Decision rule: Reject H0 if

H1: μ ≠ μ0 | $|t| \ge t_{n-1;\,1-\alpha/2}$

H1: μ > μ0 | $t \ge t_{n-1;\,1-\alpha}$

H1: μ < μ0 | $t \le -t_{n-1;\,1-\alpha}$

Page 74: Revision workshop 17 january 2013

• Hypothesis testing for Population proportion

– Proportion always between 0 and 1

$$\text{Sample proportion} = \hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}$$

Testing H0: p = p0 for n ≥ 30

Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

Alternative hypothesis | Decision rule: Reject H0 if

H1: p ≠ p0 | $|z| \ge Z_{1-\alpha/2}$

H1: p > p0 | $z \ge Z_{1-\alpha}$

H1: p < p0 | $z \le -Z_{1-\alpha}$

Page 75: Revision workshop 17 january 2013

Exam question 8

1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.

2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criterion? Use a 0.05 significance level.

16 MARKS

Page 76: Revision workshop 17 january 2013

Exam question 8 Procedure

1. State H0 and Ha

2. Determine the critical value from the appropriate test table using α, and n

3. Compute test statistic (t or z value??)

4. Draw conclusion

Page 77: Revision workshop 17 january 2013

Exam question 8

1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at 10% level of significance and interpret.

State hypothesis

H0: µ = 42.5

Ha: µ > 42.5

Determine critical value

$t_{n-1;\,1-\alpha} = t_{23;\,0.9} = 1.319$

Reject H0 if the test statistic is > 1.319

Calculate test statistic

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{40 - 42.5}{2 / \sqrt{24}} = -6.12$$

Do not reject H0, since −6.12 < 1.319. At the 10% level of significance there is insufficient evidence that cars remain in the short term car park longer than 42.5 minutes on average.
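The test statistic and decision can be verified as follows. The critical value 1.319 is read from the t-table, as in the worked answer, and the helper name is illustrative.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

t = t_statistic(x_bar=40, mu_0=42.5, s=2, n=24)
critical = 1.319             # t(23; 0.90), read from the t-table
reject_h0 = t >= critical    # right-tailed test, Ha: mu > 42.5

print(round(t, 2), reject_h0)
```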

Page 78: Revision workshop 17 january 2013

Exam question 8

2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criterion? Use a 0.05 significance level.

State hypothesis

H0: p = 0.55

Ha: p > 0.55

Determine critical value

α = 0.05, Z = 1.64

Reject H0 if Z test > 1.64

Calculate test statistic

$$\hat{p} = \frac{x}{n} = \frac{42}{70} = 0.6$$

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.6 - 0.55}{\sqrt{\dfrac{(0.55)(0.45)}{70}}} = 0.84$$

Do not reject H0, since 0.84 < 1.64. At the 0.05 significance level there is insufficient evidence that more than 55% of commuters would use the route, so the route does not meet the criterion.
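The same check for the proportion test, as a sketch. Here the critical value comes from `statistics.NormalDist().inv_cdf` (about 1.645, which the slide writes as 1.64); the function name is an assumption.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p_0):
    """z statistic for H0: p = p_0, using the standard error under H0."""
    p_hat = successes / n
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

z = proportion_z(successes=42, n=70, p_0=0.55)
critical = NormalDist().inv_cdf(1 - 0.05)   # right-tailed test at alpha = 0.05
reject_h0 = z >= critical

print(round(z, 2), round(critical, 3), reject_h0)
```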

Page 79: Revision workshop 17 january 2013

CORRELATION COEFFICIENT

Page 80: Revision workshop 17 january 2013

80

Coefficient of correlation

• The coefficient of correlation is used to measure the strength of association between two variables.

• The coefficient values range between -1 and 1.

– If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line.

– If r = 0 there is no linear pattern.

• The coefficient can be used to test for linear relationship between two variables.

Page 81: Revision workshop 17 january 2013

[Figure: six scatter plots of Y against X illustrating: perfect positive (r = +1), high positive (r = +0,9), low positive (r = +0,3), perfect negative (r = −1), high negative (r = −0,8) and no correlation (r = 0).]

Page 82: Revision workshop 17 january 2013

Exam question 10 The cost of repairing cars that were involved in accidents is one reason that insurance premiums are so high. In an experiment 5 cars were driven into a wall. The speeds were varied between 20 km/h and 80 km/h (X). The costs of repair (Y) were estimated and are listed below:

SPEED (km/h) (X) | COST OF REPAIR (R’000) (Y)

20 3

30 5

40 8

60 24

80 34

1. Use your calculator to calculate the coefficient of correlation. Interpret your answer.

2. Calculate and interpret the coefficient of determination for this data.

3. Use your calculator to construct the regression line equation and predict the repair cost at 50 km/h.

10 MARKS

Page 83: Revision workshop 17 january 2013

Exam question 10

1. Put data into calculator

2. Select regression function and select r

3. Calculate coefficient of determination = r² × 100%

4. Interpret results

5. Using Y = A + BX select regression function on calculator and determine values for A & B

6. Put x = 50 into formula and calculate result

Page 84: Revision workshop 17 january 2013

Exam question 10

1. r = 0.98

There is a very strong positive linear relationship between the repair cost and speed.

2. r² × 100% = 0.98² × 100 = 96%

96% of the variation in the cost of repair is explained by the variation in the speed at which the car crashed.

3. Y = −10.7 + 0.55X

For X = 50: Y = −10.7 + 0.55(50) = 16.8, a predicted repair cost of about R16 800.
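For readers without a statistical calculator, the same quantities can be obtained from the sums of squares. This is a sketch using the textbook least-squares formulas, not the calculator procedure itself. With unrounded coefficients the prediction comes out near 17.0 and r² near 97%; the 16.8 and 96% above follow from rounding r, A and B first.

```python
import math

speed = [20, 30, 40, 60, 80]   # X, km/h
cost = [3, 5, 8, 24, 34]       # Y, R'000

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(cost) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, cost))
s_xx = sum((x - mean_x) ** 2 for x in speed)
s_yy = sum((y - mean_y) ** 2 for y in cost)

r = s_xy / math.sqrt(s_xx * s_yy)   # coefficient of correlation
b = s_xy / s_xx                     # slope B
a = mean_y - b * mean_x             # intercept A
predicted = a + b * 50              # repair cost at 50 km/h, in R'000

print(round(r, 2), round(r * r * 100), round(a, 1), round(b, 2), round(predicted, 1))
```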