Chapter 18
Comparing categorical data

Contents:
A Categorical data
B Examining categorical data
C Comparing and reporting categorical data
D Data collection
E Misleading graphs
HISTORICAL NOTE: FLORENCE NIGHTINGALE
Florence Nightingale (1820 - 1910) was born into an upper class English family. Her father believed that women should have an education, and she learnt Italian, Latin, Greek and history, and had an excellent early preparation in mathematics.
She served as a nurse during the Crimean War, and became known as ‘the lady with the lamp’. During this time she collected data and kept systematic records.
After the war she came to believe that most of the soldiers in hospital were killed by insanitary living conditions rather than dying from their wounds.
She wrote detailed statistical reports and represented her statistical data graphically. She demonstrated that statistics provided an organised way of learning and this led to improvements in medical and surgical practices.

OPENING PROBLEM
A construction company is building a new high-rise apartment building in Tokyo. It will be 24 floors high with 8 apartments on each floor.
The company needs to know some information about the people who will be buying the apartments. They prepare a form which is published in all local papers and online:

HANAKO CONSTRUCTIONS: NEW APARTMENTS
¥70 to 400 million
Please respond only if you have some interest in owning your own residence in this prestigious new block.
Name: ..........
Current address: ..........
Phone number: ..........
Marital status: ☐ married ☐ single
Age group: ☐ 18 to 35 ☐ 36 to 59 ☐ 60+
Desired number of bedrooms: ☐ 1 ☐ 2 ☐ 3

The statistical officer receives 272 responses and these are typed in coded form.
Marital status: Married (M), Single (S)
Age group: 18 to 35 (Y), 36 to 59 (I), 60+ (O)
Apartment size: 1, 2 or 3 bedrooms
The results are:
MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2
SY1 MO2 MY1 MI3 MO2 SO1 MI3 SO2 MO2 MO2
MI3 SO3 SO2 MI3 MI1 MO3 SI3 MO2 SO2 SO1
SO1 MI3 MO2 SO1 SY2 MO1 MY1 MI2 MO1 MO1
MO2 SO1 SO2 MI3 MO1 MI3 SI1 SI2 MO2 MO1
SO1 MO2 MI3 MI3 MO1 MI2 MO2 MO2 MO1 MO1
MO2 MI3 SY2 MO3 MO1 MI3 MI3 MI3 MO1 SO3
SO1 MO2 SI2 SO1 MO3 MI3 SI2 MO1 MI3 MO1
MO2 MO1 MI3 MY2 MY3 MI3 MI1 MY1 SY2 MI3
SO1 MY2 MI3 MO1 SI3 SI1 SY3 MO1 MO1 SO1
MY1 MI3 MI3 MI3 MY2 MO3 MO2 SO2 MI3 MO1
MO1 MI1 SI2 MO3 MI1 MI3 MI3 MY3 MO2 MO1
MO2 MY2 SO2 MY2 SO1 SI2 SO3 MO3 MI3 MI3
SO2 MI3 MI3 SO1 MY2 MI3 SY2 MO1 MI2 MI3
SO1 SO2 MI3 MO3 SO2 SY1 SO2 SI1 MY2 SI1
MI2 MI3 MI3 MY2 MY2 MI3 MO2 MO3 MO1 MI3
MO1 SO1 MO1 MO2 MO2 SO2 MI3 SO1 MI3 SI1
MI2 MY2 MI3 SI1 MI3 MO2 MI3 MI3 MO1 MO2
MI3 SI1 MI3 MI3 SY2 SO2 MO1 SI2 SO2 SO1
SO1 MI2 MO2 MO2 MO1 MI3 MI3 MI3 MO3 MO2
MI2 MI3 MO1 MI3 SO1 SO2 SI2 SO1 SI2
SO1 MI3 MI3 MO3 MO2 MY1 MO2 MI3 MO3
MI1 SY2 MO3 SO1 MY2 SI2 MI2 MI3 SI1
MO1 MO2 MO3 MI3 MO1 SO1 MI2 MI3 MO2
MI3 MI3 MI3 SO1 MI3 MI3 SY2 SI3 MO2
MI1 SO1 MI3 MY2 SY3 MI3 MI2 SO2 MO2
SO1 MI3 MI3 MY1 MI1 MO2 MY1 MI2 MO3
MI1 MI3 MI3 SI1 MO3 MO1 SI1 SO1 SI1
Things to think about:
• What problems are the construction company trying to solve?
• Is the company’s investigation a census or a survey?
• What are the variables?
• Are the variables categorical or quantitative?
• What are the categories of the categorical variables?
• Can you explain why the construction company is interested in these categories?
• Is the data being collected in an unbiased way?
• Why were the names, addresses and phone numbers of respondents asked for?
• Can you make sense of the data in its present form?
• How could you reorganise the data so that it can be summarised and displayed?
• What methods of display are appropriate here?
• Can you make a conclusion regarding the data and write a report of your findings?
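
One possible way to reorganise the coded responses is to tally them. The sketch below is only an illustration in Python, which the chapter itself does not use; it works on the first row of the results, and the function names `tally` and `tally_by_position` are made up for this example.

```python
from collections import Counter

# First row of the coded results from the Opening Problem.
responses = "MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2".split()


def tally(codes):
    """Count how many times each three-character code occurs."""
    return Counter(codes)


def tally_by_position(codes, position):
    """Count one variable only.

    position 0 = marital status, 1 = age group, 2 = number of bedrooms.
    """
    return Counter(code[position] for code in codes)


code_counts = tally(responses)
status_counts = tally_by_position(responses, 0)
bedroom_counts = tally_by_position(responses, 2)

print(code_counts)
print(status_counts)
print(bedroom_counts)
```

Replacing `responses` with all 272 codes would give the full tally, from which tables and graphs can be drawn.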
A CATEGORICAL DATA

Statistics is the art of solving problems and answering questions by collecting and analysing data.
The facts or pieces of information we collect are called data.
A single piece of information is called a datum (singular), whereas many pieces of information are called data (plural).
A list of information is called a data set. If it is not in organised form it is called raw data.

VARIABLES
There are two types of variables that we commonly deal with:
• A categorical variable describes a particular quality or characteristic. The data is divided into categories, and the information collected is called categorical data.
  Examples of categorical variables are:
  Getting to school: the categories could be train, bus, car and walking.
  Colour of eyes: the categories could be blue, brown, hazel, green, grey.
• A quantitative variable has a numerical value and is often called a numerical variable. The information collected is called numerical data.
  Quantitative variables can be either discrete or continuous.
  A quantitative discrete variable takes exact number values and is often a result of counting.
  Examples of discrete quantitative variables are:
  The number of people in a household: the variable could take the values 1, 2, 3, ....
  The score out of 30 for a test: the variable could take the values 0, 1, 2, 3, ...., 30.
  A quantitative continuous variable takes numerical values within a certain continuous range. It is usually a result of measuring.
  Examples of quantitative continuous variables are:
  The weight of newborn babies: the variable could take any positive value on the number line but is likely to be in the range 0.5 kg to 8 kg.
  The heights of 14 year old students: the variable would be measured in centimetres. A student whose height is recorded as 145 cm could have exact height anywhere between 144.5 cm and 145.5 cm.
CENSUS OR SAMPLE
The two methods of data collection are by census or sample.
A census involves collecting data about every individual in a whole population.
The individuals in a population may be people or objects. A census is detailed and accurate
but is expensive, time consuming, and often impractical.
A sample involves collecting data about a part of the population only.
A sample is cheaper and quicker than a census but
is not as detailed or as accurate. Conclusions drawn
from samples always involve some error.
A sample must truly reflect the characteristics of the
whole population. To ensure this it must be unbiased
and large enough.
Just how large a sample needs to be is discussed in
future courses.
In a biased sample, the data has been unfairly influenced by the collection process.
It is not truly representative of the whole population.
STATISTICAL GRAPHS
Two variables under consideration are usually linked by one being dependent on the other.
For example: The total cost of a dinner depends on the number of guests present.
The total cost of a dinner is the dependent variable.
The number of guests present is the independent variable.
When drawing graphs involving two variables,
the independent variable is usually placed on the
horizontal axis and the dependent variable is
placed on the vertical axis. An exception to this
is when we draw a horizontal bar chart.
[Diagram: a pair of axes with the independent variable on the horizontal axis and the dependent variable on the vertical axis]
Acceptable graphs which display categorical data are:
• Vertical column graph
• Horizontal bar chart
• Pie chart
• Segment bar chart
The mode of a set of categorical data is the category which occurs most frequently.
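
As a rough illustration of finding the mode of categorical data, here is a short Python sketch. The list `travel` is invented sample data using the "getting to school" categories, and the helper `modes` is a hypothetical name; it returns a list because two or more categories can tie for the highest frequency.

```python
from collections import Counter


def modes(data):
    """Return every category that occurs with the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return [category for category, count in counts.items() if count == highest]


# Invented responses to "How do you get to school?"
travel = ["bus", "car", "bus", "walking", "train", "bus", "car"]

print(modes(travel))
```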
THE STATISTICAL METHOD
The process of statistical enquiry or investigation includes the following steps:
Step 1: Examine the problem which may be solved using data. Pose the correct
questions.
Step 2: Collect unbiased data.
Step 3: Organise the data.
Step 4: Summarise and display the data.
Step 5: Analyse the data and make a conclusion in the form of a conjecture.
Step 6: Write a report.
GRAPHING USING A COMPUTER PACKAGE
Click on the icon to obtain a computer package which can be used to draw:
• graphs of a single set of categorical data using:
  ▸ a vertical column graph
  ▸ a horizontal bar chart
  ▸ a pie chart
  ▸ a segment bar chart
• comparative graphs of categorical data using:
  ▸ a side-by-side column graph
  ▸ a back-to-back bar chart.
EXERCISE 18A
1 Classify the following variables as either categorical or numerical:
a the number of text messages you send in a day
b the places where you access the internet
c the brands of breakfast cereal
d the heights of students in your class
e the daily maximum temperature for your city
f the number of road fatalities each day
g the breeds of horses
h the number of hours you sleep each night.
2 Write down possible categories for the following categorical variables:
a brands of cars b methods of transport
c types of instruments in a band d methods of advertising.
3 For each of the following possible investigations, classify the variable as categorical,
quantitative discrete, or quantitative continuous:
a the types of flowers available from a florist
b the numbers on playing cards in a pack of cards
c the heights of trees that were planted one year ago
d the masses of oranges in a 5 kg bag
e the times for runners in a 400 metre race
f the number of oranges in the 5 kg bags at a supermarket
g the varieties of peaches
h the amount of rain each day for a month
i the speeds of cars passing through an intersection
j the types of fiction
k the pulse rates of horses at the end of a race
l the weekly cost of groceries for your family
m the number of passengers for a taxi driver each day for a month
n the number of students absent from school each day for a term.
4 State whether a census or a sample would be used for these investigations:
a the country of origin of the parents of students in your class
b the number of people in your country who are concerned about global warming
c people’s opinions about the public transport system in your capital city
d the favourite desserts in your local restaurant
e the most popular candidate for the next election in your state or county.
5 Comment on any possible bias in the following situations:
a Members of a dog club are asked if dogs make the best pets.
b School students are asked about the benefits of homework.
c Commuters at peak hour are asked about crowding in buses.
6 Guests of a hotel in Paris were asked which country they lived in. The results are shown in the vertical column graph.
[Vertical column graph: frequency (scale 0 to 10) against country (Canada, England, France, Germany, Spain, United States, Australia)]
a What are the variables in this investigation?
b What is the dependent variable?
c What is the sample size?
d Find the mode of the data.
e Construct a pie chart for the data. If possible, use a spreadsheet.
7 Fifty households of one street were asked which brand of television they owned. The data alongside was collected.

TV Brand   Frequency
A              9
B              4
C             12
D              8
E              7
F             10

a What are the dependent and independent variables in this investigation?
b If we are trying to determine the buying patterns of a whole city, is the sample unbiased? Explain your answer.
c Find the mode of the data.
d Construct a horizontal bar chart for the data.
8 Find the sector angle of a pie chart if the frequency of the category is:
a 23 in a sample of 180 b 128 in a sample of 720
c 238 in a sample of 1250.
9 A sample of many people was taken, asking them their favourite fruit. On a pie chart a sector angle of 68° represented 277 people whose favourite fruit was an orange.
Find, to the nearest 10, the size of the sample used.

B EXAMINING CATEGORICAL DATA

For the Opening Problem data, the statistical officer first extracts the data for single people responding to the survey. Her findings are:

SY1  2    SI1 11    SO1 26    39
SY2  7    SI2  9    SO2 15    31
SY3  2    SI3  3    SO3  3     8
    11        23        44    78

In order to make a report to the construction company, she displays this data in the form of a two way table:

Number of           Age of single respondent
bedrooms     18 to 35   36 to 59   60+   Totals
1                2         11       26     39
2                7          9       15     31
3                2          3        3      8
Totals          11         23       44     78

She then uses a spreadsheet to create a series of graphs. Here are two of them:
[Side-by-side column graph "Housing by age group": age groups 18 to 35, 36 to 59 and 60+ on the horizontal axis, with columns for 1, 2 and 3 bedrooms; vertical scale 0 to 30]
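
The singles' two way table can be rebuilt from the nine code counts. The following is a minimal Python sketch rather than the method used in the text; the variable names are chosen for this illustration only, while the counts are the officer's findings for single respondents.

```python
# Counts for each single-respondent code, taken from the findings above.
singles = {
    "SY1": 2, "SI1": 11, "SO1": 26,
    "SY2": 7, "SI2": 9,  "SO2": 15,
    "SY3": 2, "SI3": 3,  "SO3": 3,
}

ages = ["Y", "I", "O"]        # 18 to 35, 36 to 59, 60+
bedrooms = ["1", "2", "3"]

# One row per number of bedrooms, one column per age group.
table = [[singles["S" + age + beds] for age in ages] for beds in bedrooms]

row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]
grand_total = sum(row_totals)

for beds, row, total in zip(bedrooms, table, row_totals):
    print(beds, row, total)
print("Totals", column_totals, grand_total)
```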
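If we transpose the data by interchanging the rows and columns, we also get an interesting comparison.
Find out how to transpose your table without having to retype the data into new cells.
[Side-by-side column graph "Housing by dwelling type": 1 bedroom, 2 bedrooms and 3 bedrooms on the horizontal axis, with columns for ages 18 to 35, 36 to 59 and 60+; vertical scale 0 to 30]
From the table of counts and from graphs, various questions can be answered. In many cases tables containing percentages may be more appropriate to use.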
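
Transposing a table of counts can also be tried outside a spreadsheet. This Python sketch is an illustration only; it interchanges the rows and columns of the singles' counts, and the name `transpose` is a hypothetical helper.

```python
def transpose(rows):
    """Interchange the rows and columns of a rectangular table."""
    return [list(column) for column in zip(*rows)]


# Rows: 1, 2, 3 bedrooms. Columns: 18 to 35, 36 to 59, 60+.
by_bedrooms = [
    [2, 11, 26],
    [7, 9, 15],
    [2, 3, 3],
]

# Rows: 18 to 35, 36 to 59, 60+. Columns: 1, 2, 3 bedrooms.
by_age_group = transpose(by_bedrooms)

for row in by_age_group:
    print(row)
```

Transposing twice returns the original table, which is a quick check that no counts were lost.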
Example 1
A survey was conducted to determine willingness to be an organ donor. The results are shown alongside:

Organ          Marital status
donor      Single   Married   Totals
Yes           63       79
No            25       27
Totals

a Complete the table to find the total of each row and column.
b How many people surveyed were married but not willing to be an organ donor?
c What percentage of single people surveyed were willing to be organ donors?
d What percentage of people surveyed were married?

a
Organ          Marital status
donor      Single   Married   Totals
Yes           63       79       142
No            25       27        52
Totals        88      106       194

b From the table, 27 people were married but were not willing to be an organ donor.
c Percentage of single people who were willing to be organ donors = 63/88 × 100% ≈ 71.6%
d Percentage of people surveyed who were married = 106/194 × 100% ≈ 54.6%
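
The arithmetic in Example 1 can be checked with a few lines of Python. This is a sketch for checking only, not part of the original solution; the dictionary layout and variable names are assumptions made for the illustration.

```python
# Counts from the organ donor survey in Example 1.
counts = {
    "Yes": {"Single": 63, "Married": 79},
    "No":  {"Single": 25, "Married": 27},
}

row_totals = {answer: sum(row.values()) for answer, row in counts.items()}
column_totals = {
    status: sum(counts[answer][status] for answer in counts)
    for status in ("Single", "Married")
}
grand_total = sum(row_totals.values())

# c: willing single people as a percentage of all single people.
pct_single_willing = round(100 * counts["Yes"]["Single"] / column_totals["Single"], 1)

# d: married people as a percentage of everyone surveyed.
pct_married = round(100 * column_totals["Married"] / grand_total, 1)

print(row_totals, column_totals, grand_total)
print(pct_single_willing, pct_married)
```

Note that part c divides by the column total for single people (88), while part d divides by the grand total (194); choosing the right denominator is the main step in each percentage question.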
EXERCISE 18B
1 Residents of a suburb were sent a survey in the mail. It asked them to indicate their gender and whether they would prefer a tennis court or a basketball court built in their suburb. The results are given alongside.

Preference          Gender
              Male   Female   Totals
Basketball      21      20
Tennis           9      35
Totals

a Complete the table to find the total of each row and column.
b Did more males or females respond to the survey?
c What percentage of people responding to the survey preferred a basketball court?
d What percentage of people responding to the survey who preferred a tennis court
were female?
2 To determine where the need for public transport is greatest, residents of a city were asked to indicate whether they lived to the north or south of the city centre, and the method of transport they used to go to work.

Transport         Position
method      North   South   Totals
Public        85      64      149
Car           71      57      128
Totals       156     121      277

a What percentage of people living north of the city centre take public transport to work?
b What percentage of people surveyed live south of the city centre and drive their car to work?
c Is the percentage of people using public transport greater to the north or the south of the city centre?

3 A survey was conducted in 1997, and another in 2007, to investigate how many people had a computer in their home. The results are given in the table alongside:

Home             Year
computer    1997   2007   Totals
Yes          113    281
No           159     23
Totals

a Complete the table to find the total of each row and column.
b What percentage of people surveyed in 1997 had a computer in their home?
c What percentage of people surveyed in 2007 did not have a computer in their home?
d Find the increase in computer ownership percentage from 1997 to 2007, according to the survey.

4 A country town in England is organising a food festival. Residents interested in attending were asked to indicate their age group and whether they preferred Italian or Greek food.

Food              Age group
preference   Under 30   Over 30   Totals
Italian         29         38       67
Greek           55         24       79
Totals          84         62      146

a How many people responding to the survey expressed a preference for Greek food?
b What percentage of people who indicated they preferred Italian food were under 30?
c What percentage of people over 30 preferred Greek food?
d Which age group showed the most interest in the festival, and which type of food did that group prefer?

5 For the Singles’ data from the Opening Problem:
a create your own tally on a spreadsheet and obtain the side-by-side column graph
b transpose the data on the spreadsheet and draw the new side-by-side column graph
c answer the following questions:
i How many singles responded to the survey?
ii What percentage of the total respondents were single?
iii Which singles’ age group showed the most interest in the new development and what form of housing interested them most?
iv What percentage of the 36 to 59 singles age group were interested in 3 bedroom apartments?
v What percentage of the single respondents were interested in buying a 2 bedroom apartment?
d True or false?
i The 18 to 35 singles group has shown little interest in the new apartments.
ii There is much interest amongst the single respondents in one bedroom apartments.
iii The 60+ singles age group has shown most interest in the new apartments and the vast majority of them want them with 2 or 3 bedrooms.

INVESTIGATION: THE OPENING PROBLEM
Your task now is to organise the data from the married respondents for
the Opening Problem. You could do this without using a spreadsheet and
simply count from the raw data originally given. However, you could use
the spreadsheet found on your CD. Click on the icon to find it.
It contains all 272 responses so the singles data should first be eliminated.
What to do:
1 Follow these steps to analyse the apartment size by age group:
Step 1: Open the spreadsheet by clicking on the icon.
Step 2: Enter the formula =COUNTIF($A:$A, "M"&D$3&$C4) into cell D4. In this formula:
        • COUNTIF is the formula to count the number of times "MY1" appears in the data
        • $A:$A is the data in column A
        • "M"&D$3&$C4 constructs "MY1" from the table row and column headings.
Step 3: Fill the formula in D4 down and across to cell F6.
Step 4: Highlight the cell range D4:G7 and click the sum button on the toolbar.
Step 5: Highlight the cell range C3:F6 and click on the Chart Wizard button on the toolbar. Choose the first type of column graph and click the Finish button. A column graph of your tabulated data should appear.
2 The next task is to suitably graph the married respondents’ data so that it can be compared:
• in categories
• with the singles’ data to find similarities and differences.
Produce similar graphs for the singles’ data.
3 Construct your own report for the construction company. It should include:
• tabulation of the data in summarised form
• graphical representation of the data
• discussion and conclusion.
4 What conclusions can you draw from the data?
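
For readers without the spreadsheet, the COUNTIF formula from Step 2 of the investigation can be imitated in Python. This is a sketch under stated assumptions: it assumes the table headings are the age codes Y, I, O across the top and the bedroom numbers 1, 2, 3 down the side, it uses only the first row of the raw data as sample input, and `countif` is a hypothetical helper written to mirror the spreadsheet function.

```python
def countif(data, criterion):
    """Count the entries equal to the criterion, like the spreadsheet COUNTIF."""
    return sum(1 for entry in data if entry == criterion)


# First row of the Opening Problem results, used here as sample data.
responses = "MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2".split()

ages = ["Y", "I", "O"]        # assumed column headings
bedrooms = ["1", "2", "3"]    # assumed row headings

# "M" + age + beds builds a code such as "MY1" from the headings,
# in the same way as "M"&D$3&$C4 in the spreadsheet formula.
married_table = {
    beds: {age: countif(responses, "M" + age + beds) for age in ages}
    for beds in bedrooms
}

for beds in bedrooms:
    print(beds, married_table[beds])
```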
C COMPARING AND REPORTING CATEGORICAL DATA

DISPLAY
To display categorical data sets for comparison we could use:
• a side-by-side column graph
• a back-to-back bar chart.
[Diagrams: a side-by-side column graph (frequency against value, categories A to G) and a back-to-back bar chart (value against frequency, categories A to G)]

Example 2
In order to compare the popularity of car colours in London and Paris, a sample of 120 people from each city were asked their favourite car colour.

City            Colour
          Red   Blue   Black   White
London     35    26      38      21
Paris      27    19      34      40

a Draw a side-by-side column graph comparing the data from London and Paris.
b Which colour is most popular in London?
c In which city are blue cars more popular?
d Which colour shows the largest difference in popularity between the two cities?
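a [Side-by-side column graph "Car colours": number of votes (scale 0 to 50) against colour (red, blue, black, white), with columns for London and Paris]
b Black is the most popular colour in London, with 38 people preferring black.
c From the graph, blue cars are more popular in London.
d The largest difference in popularity occurs in white cars, with 21 people in London compared to 40 people in Paris.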
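
The answers to parts b and d of Example 2 can also be read off by calculation. The Python sketch below is an illustration only; the counts come from the example's table, and the variable names are chosen for this sketch.

```python
# Favourite car colours from Example 2 (120 people in each city).
london = {"Red": 35, "Blue": 26, "Black": 38, "White": 21}
paris = {"Red": 27, "Blue": 19, "Black": 34, "White": 40}

# b: the colour with the highest count in London.
most_popular_london = max(london, key=london.get)

# d: the size of the gap between the two cities for each colour.
differences = {colour: abs(london[colour] - paris[colour]) for colour in london}
largest_gap = max(differences, key=differences.get)

print(most_popular_london)
print(differences)
print(largest_gap)
```

Because both samples contain 120 people, the raw counts can be compared directly. With unequal sample sizes, relative frequencies would be the fairer comparison.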
EXERCISE 18C
1 100 boys and girls were asked to indicate their favourite subjects. The results were:

Gender               Subject
         Maths   Science   Geography   Art
Boys       30       26         21       22
Girls      20       15         40       25

a Draw a back-to-back bar chart comparing the data for boys and girls.
b What is the most popular subject for girls?
c Do boys or girls show less variation in their subject preferences?
d In which subject were the boys’ and girls’ preferences closest?

2 People from two different age groups were surveyed about their favourite type of movie.

Age                  Movie type
            Action   Comedy   Drama   Horror
Under 30      38       42       29      31
Over 30       17       21       25       7

a Draw a side-by-side column graph comparing movie preferences of the under 30s and over 30s.
b Is this a sensible way to compare the groups? Give a reason for your answer.
c Draw a side-by-side column graph again, this time using relative frequencies. Is this a more sensible way of comparing the groups?
d Which age group likes drama movies more?
e Which type of movie is preferred equally by each age group?

3 In the lead up to an election, 100 people from the electorates of Arton and Burnley were polled to see how they intended to vote on election day. The results are shown in the following table:

Electorate                  Party
             Labor   Liberal   Independent   Undecided
Arton          37       28          14           21
Burnley        33       36          15           16

a Draw a back-to-back bar chart comparing the data from Arton and Burnley.
b Which party is the most popular in Arton?
c In which electorate is the Liberal Party performing the best?
d In which electorate is the intended voting closest?
D DATA COLLECTION
There are a number of ways in which data can be misleading. It is always a good idea to
check the source and method of collection of a set of data before making major decisions
based on statistics.
Before data is collected the following decisions need to be made:
1 Should data for the investigation be collected from the whole population or a sample
of the population?
In most statistical investigations surveying the whole population is impractical; it is either
too time consuming or too costly. In these cases a sample is chosen to represent the
population. Conclusions for an investigation from a sample will not provide the same
degree of accuracy as conclusions made from the population.
In this context, population means all the people or things that the conclusions of a
statistical investigation would apply to.
For example, if you want to investigate a theory related only to your school then the
population is all the students who attend your school. The population in this case would
be accessible although the investigation could also be done with a carefully selected
sample.
2 How should the sample be collected?
If data is to be collected from a sample then a sample that represents the population
must be chosen so that reliable conclusions about the population can be made.
Samples must be chosen so that the results will not show bias towards a particular
outcome. For example, if the purpose of a survey is to get an accurate indication of how
the population of a city is going to vote at the next election then surveying a sample
of voters from only one suburb would not provide information that represents all of the
city. It would give a biased sample.
One way of choosing an unbiased sample is to use simple random sampling where
every member of the population has an equal chance of being chosen.
3 What should the sample size be?
The sample size is an important feature to be
considered if conclusions about the population
are to be made from the sample.
For example, measuring a group of three fifteen-
year-olds would be insufficient to give a very
reliable estimate of the height of fifteen-year-
olds.
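
Simple random sampling, where every member of the population has an equal chance of being chosen, can be sketched in Python as follows. The population of 500 student ID numbers and the sample size of 40 are invented for this illustration, and `simple_random_sample` is a hypothetical helper built on the standard library's `random.Random.sample`.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Choose `size` different members, each equally likely to be picked.

    A seed can be given so that the same sample is produced again.
    """
    rng = random.Random(seed)
    return rng.sample(population, size)


# Invented population: ID numbers 1 to 500 for a hypothetical school.
students = list(range(1, 501))

chosen = simple_random_sample(students, 40, seed=2008)
print(sorted(chosen))
```

Sampling without replacement, as here, means no student can be selected twice.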
PROJECT
Decide on a worthwhile project of a statistical nature.
The data collected should be categorical in nature.
• Discuss the project with your teacher who will judge if it is appropriate.
• Make sure that your sample is sufficiently large to accurately reflect the population.
• Make sure that your sample is not biased.
• Write a detailed, factual report including your data and your conclusions.
EXERCISE 18D
1 A school has 820 students. An investigation concerning the school uniform is being
conducted. 40 students from the school are randomly selected to complete the survey on
their school uniform.
a What is the population size?
b What is the size of the sample?
c Explain why data collected in the following ways would not produce a sample
representative of the population.
i The surveyor’s ten best friends are asked to complete the survey.
ii All the students in one class are surveyed.
iii Volunteers are asked to complete the survey.
2 A research company wants to know people’s opinions on whether smoking should be banned in all public places. They ask people standing outside buildings in the city during office hours. Explain why the data collected is likely to be biased.
3 A polling agency is employed to survey the voting intention of residents of a particular electorate in the coming election. From the data collected they are to predict the election result in that electorate.
Explain why each of the following situations would produce a biased sample:
a A random selection of people in the local large shopping complex is surveyed between 1 pm and 3 pm on a weekday.
b All the members of the local golf club are surveyed.
c A random sample of people on the local train station between 7 am and 9 am are surveyed.
d A doorknock is undertaken, surveying every voter in a particular street.
E MISLEADING GRAPHS

Graphs can also be misleading. There are two ways this usually happens:

USING A ‘CUT-OFF’ SCALE ON THE VERTICAL AXIS
For example, consider the graph shown:
[Graph titled "Profits skyrocket!": profit ($1000’s) against month (Jan to Apr), with the vertical scale running from 14 to 17]
A close look at the graph reveals that the vertical scale does not start at zero and so has exaggerated the increase in profits.
The graph should look like the following, which gives a better picture of the profit increases. It probably should be labelled ‘A slow but steady increase in profits’.
[Graph: profit ($1000’s) against month (Jan to Apr), with the vertical scale marked from 3 to 18 in steps of 3]

MISREPRESENTING THE ‘BARS’ ON A BAR CHART OR COLUMN GRAPH
Sometimes the ‘bars’ on a bar chart or column graph are shown with misleading area or volume.
For example, consider the graph below comparing sales of different flavours of drink.
[Graph: sales ($m’s) against flavour of drink (Lime, Lemon, Orange), with the ‘bars’ drawn to look three-dimensional]
By giving the ‘bars’ the appearance of volume, the sales of lemon drinks look to be about eight times the sales of lime drinks.
However, on a bar chart the frequency is proportional to the height of the bar only, and so the graph should look like this:
[Graph: sales ($m’s) against flavour of drink (Lime, Lemon, Orange), drawn with plain bars]
The sales of lemon are just over twice the sales of lime.
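
The 'about eight times' impression can be explained with a small calculation. The sketch below assumes, as an illustration, that the misleading picture enlarges each 'bar' by the same factor in every dimension, so a bar twice as tall is also twice as wide and twice as deep. The function name `apparent_ratio` is hypothetical.

```python
def apparent_ratio(height_ratio, dimensions):
    """How many times bigger a shape looks when every one of its
    `dimensions` is scaled by `height_ratio`.

    dimensions = 1: plain bars of equal width (height only)
    dimensions = 2: pictures scaled in height and width (area)
    dimensions = 3: solid shapes scaled in all directions (volume)
    """
    return height_ratio ** dimensions


for dims in (1, 2, 3):
    print(dims, apparent_ratio(2, dims))
```

With a height ratio of 2, the apparent size is 2 for plain bars, 4 for scaled pictures, and 8 for scaled solids, which matches the exaggeration described for the lemon and lime sales.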
EXERCISE 18E
1 Describe the misleading or poor features of each of the following graphs:
[Four graphs, labelled a to d: "Fish sold at markets" (cases (100’s) against week), "Milk production" (’000s L against factory A and B), "Exports" (tonnes (millions) against year, 2005 to 2007), and "Interstate bus fares" (dollars against year, 2004 to 2007)]
2 [Graph A: sales of chocolates against year (03 to 09), with the vertical scale running from 2000 to 2600. Graph B: sales of chocolates against year (03 to 09), with the vertical scale running from 0 to 2500]
a Which graph gives the impression of rapidly increasing sales?
b Have sales in fact rapidly increased over this 6 year period?
c According to graph A, the sales for 2006 appear to be double those of 2005. Is this true?

REVIEW SET 18A
1 For each of the following investigations, classify the variable as categorical, quantitative discrete, or quantitative continuous:
a the favourite television programs watched by class members
b the number of visitors to an art gallery each week.
2 To find out which foods at a school canteen the students eat most, 80 year 12 students were asked to nominate the item they purchased most often. The following data was collected:

Type of food   Frequency
Pie               20
Hot dog           16
Pasta              9
Sandwich          17
Apple             13
Chips              5

a What are the variables in the investigation?
b What are the dependent and independent variables?
c In what way is the sample biased?
d Construct a vertical column graph to illustrate the data.

3 A sample of people were asked whether they owned an analogue or a digital radio. The results were sorted by the age of the people surveyed, and are shown in the table alongside:

Age              Radio type
            Analogue   Digital   Totals
Under 30s      32         23
Over 30s       38         21
Totals

a Complete the table to find the total of each row and column.
b How many people surveyed were over 30?
c What percentage of under 30s surveyed own an analogue radio?
d What percentage of people surveyed were over 30s who have a digital radio?

4 The ticket sales for a movie theatre over a two day period are given in the table alongside:

Day              Ticket Group
           Adult   Concession   Children
Friday      121        71          63
Saturday    139        34          82

a Draw a side-by-side column graph comparing the ticket sales from Friday and Saturday for each ticket group.
b Which day is more popular with adults?
c Which ticket group was least popular on Friday?
d Which ticket group was most influenced by the day of the week?

5 To compare the sporting preferences of Hillsvale and Greensdale High Schools, 100 students from each school were asked to indicate their favourite sports:

School                          Sport
             Gridiron   Ice Hockey   Basketball   Baseball   Football
Hillsvale       24          28           17          22          9
Greensdale      18          30           14          12         26
a Draw a back-to-back bar graph comparing the results for Hillsvale and
Greensdale High Schools.
b Which is the most favoured sport at Hillsvale?
c Basketball is more popular in which school?
d Which sport is best described as “a mainly Greensdale School Sport”?
6 To investigate the public’s opinion on whether money should be spent
upgrading the local library, a questionnaire form is placed in the library for
members of the public to fill out.
Explain why this sample is likely to be biased.
REVIEW SET 18B
1 For each of the following investigations, classify the variable as categorical,
quantitative discrete, or quantitative continuous:
a the area of each block of land on a street
b the marital status of people in a particular suburb.
2 State whether a census or a sample would be used for these investigations:
a finding the percentage of people who own a dog
b finding the number of pets that students in a particular class own
c finding the amount of rain a city receives each month.
3 To investigate whether being left or right handed has any effect on eyesight,
a sample of people were asked whether they were left or right handed and
whether they required glasses for driving. The results are given in the table
below:
               Require glasses
               Yes    No
Left handed    39     56
Right handed   229    311
a How many people were surveyed?
b What percentage of people surveyed were left handed?
c What percentage of right handed people required glasses for driving?
d Was there a significant difference between the percentages of right handed and
left handed people who required glasses for driving?
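Working of this kind can be checked with a short script. The sketch below is not part of the original exercise: it assumes the counts from the handedness table for question 3, and the names `table` and `percentage` are hypothetical, chosen only for illustration. It shows the key step, which is choosing the right total to divide by.

```python
# Counts copied from the two-way table for question 3 (handedness vs glasses).
# Variable and function names are illustrative assumptions.
table = {
    "Left handed": {"Yes": 39, "No": 56},
    "Right handed": {"Yes": 229, "No": 311},
}


def percentage(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Row totals: how many left and right handed people were surveyed.
row_totals = {hand: sum(counts.values()) for hand, counts in table.items()}

# Grand total: everyone surveyed.
total_surveyed = sum(row_totals.values())

# Share of all people surveyed who were left handed (divide by grand total).
left_share = percentage(row_totals["Left handed"], total_surveyed)

# Share of each group that required glasses (divide by that group's row
# total, because the question is about that group only).
glasses_share = {
    hand: percentage(counts["Yes"], row_totals[hand])
    for hand, counts in table.items()
}

print(total_surveyed, left_share, glasses_share)
```

A percentage "of people surveyed" uses the grand total as the denominator, while a percentage "of right handed people" uses the right handed row total.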
4 A newspaper surveys 150 people aged under 30 and 150 people over 30 about an
upcoming election. They want to find out which issues are most important to
each age group.
            Election Issues
            Unemployment   Inflation   Health   Education
Under 30    68             20          23       39
Over 30     52             27          50       21
a Draw a back-to-back bar graph comparing the results for the under 30s and the
over 30s.
b Which issue is most important to the people aged under 30?
c Which issue is the least important to the people aged over 30?
d Which issue is mainly an “over 30s issue”?
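The layout of a back-to-back bar graph can be sketched in plain text. The example below is an illustration only, not the textbook's method: it assumes the counts from the election issues table for question 4, and the function `back_to_back` and its `scale` parameter are hypothetical names. Bars for one group grow to the left of the category labels and bars for the other grow to the right.

```python
# Counts copied from the election issues table for question 4.
issues = ["Unemployment", "Inflation", "Health", "Education"]
under_30 = [68, 20, 23, 39]
over_30 = [52, 27, 50, 21]


def back_to_back(labels, left, right, scale=2):
    """Build the rows of a text back-to-back bar graph.

    Each '#' stands for `scale` responses, rounded half up to a whole
    symbol. Left bars are right-aligned so they grow leftwards from the
    centre labels; right bars grow rightwards.
    """
    def symbols(value):
        # Integer round-half-up of value / scale.
        return (2 * value + scale) // (2 * scale)

    left_width = max(symbols(v) for v in left)
    label_width = max(len(name) for name in labels)
    rows = []
    for name, l, r in zip(labels, left, right):
        left_bar = ("#" * symbols(l)).rjust(left_width)
        right_bar = "#" * symbols(r)
        rows.append(f"{left_bar} | {name.center(label_width)} | {right_bar}")
    return rows


chart = back_to_back(issues, under_30, over_30)
for row in chart:
    print(row)
```

Because both groups share one set of category labels, the longest bar on each side and any category that is much longer on one side than the other are easy to spot.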
5 A council is considering whether to occupy a block of land with a hotel, a
shopping centre, or a park. To find the local residents’ opinions, the council
surveys 200 spectators at the local football match. The results are shown
below:
          Preference
          Hotel   Shopping   Park
Male      83      39         48
Female    4       16         10
a Which option was most preferred by the people surveyed?
b Would it be reasonable for the council to make a decision based on the answer
given in a? Give a reason for your answer.
c Draw a side-by-side column graph comparing the preferences of males and
females, using relative frequencies.
d Is the option of a park preferred more by males or females?
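The relative frequencies needed for part c can be computed as in the following sketch. It is an illustration under stated assumptions: the counts are copied from the preference table for question 5, and the names `preferences` and `relative_frequencies` are hypothetical. Relative frequencies matter here because the two groups are very different in size (170 males and 30 females), so raw counts cannot be compared directly.

```python
# Counts copied from the preference table for question 5.
preferences = {
    "Male": {"Hotel": 83, "Shopping": 39, "Park": 48},
    "Female": {"Hotel": 4, "Shopping": 16, "Park": 10},
}


def relative_frequencies(counts):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(counts.values())
    return {option: round(100 * n / total, 1) for option, n in counts.items()}


# One set of relative frequencies per group, each based on its own total.
relative = {
    group: relative_frequencies(counts)
    for group, counts in preferences.items()
}

for group, shares in relative.items():
    print(group, shares)
```

The column heights in the side-by-side graph would then be these percentages rather than the raw counts.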
6 The Government releases the following graph showing the increase in
employment in the tourism industry over recent years.
[Graph not reproduced: employment (’000) against year, 2005 to 2008, with the
vertical axis running from 30 to 35.]
a Explain why the graph is misleading.
b Redraw the graph in a way that more accurately indicates the increase in
employment.
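The effect of a vertical axis that does not start at zero can be illustrated numerically. The sketch below is hedged: the two employment figures are hypothetical, because the plotted values cannot be read from the extracted text, and only the axis starting value of 30 comes from the question. The function name `drawn_height_ratio` is also an assumption made for illustration.

```python
def drawn_height_ratio(value_a, value_b, axis_start):
    """Ratio of drawn heights of two values above the bottom of the axis."""
    return (value_b - axis_start) / (value_a - axis_start)


# Hypothetical employment figures in thousands, NOT taken from the graph.
first_year, last_year = 31.0, 34.0

# With the axis starting at 0, drawn heights are proportional to the values.
true_ratio = drawn_height_ratio(first_year, last_year, axis_start=0)

# With the axis starting at 30, only the part above 30 is drawn.
truncated_ratio = drawn_height_ratio(first_year, last_year, axis_start=30)

print(round(true_ratio, 2), round(truncated_ratio, 2))
```

With these assumed figures the later value is drawn four times as high as the earlier one on the truncated axis, although the actual increase is only about 10%.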