TYPES OF DATA - TSFX – The School for Excellence


The School For Excellence 2011 The Essentials – Further Mathematics – Core Materials Page 4

TYPES OF DATA

Univariate data – Examines the distribution features of one variable.

Bivariate data – Explores the relationship between two variables.

Univariate and bivariate analysis will be revised separately.

TYPES OF VARIABLES

NUMERICAL VARIABLES

Numerical variables represent quantities. They have numerical values and are either measured or counted. Numerical variables can be either continuous or discrete.

A continuous variable can take any value in a given range. Continuous variables are usually measured. Examples: Height, weight, number of litres of fuel.

A discrete variable can take only certain distinct values in a given range. Discrete variables are often counted, e.g. 0, 1, 2, 3, etc. Examples: Number of siblings, number of goals in netball.

CATEGORICAL VARIABLES

Categorical variables represent qualities. The answer to the statistical question is a word rather than a number. Examples: Eye colour, favourite brand of cereal, type of car driven.

Categorical variables can be separated into two types, nominal and ordinal. While this distinction sometimes helps with understanding, the Further Maths Study Design refers only to categorical variables.


Nominal categorical variables have no natural order, such as eye colour (blue, brown, green…). Ordinal categorical variables have a natural order, such as salary (high, medium or low). Sometimes numerical values or scores can be assigned to ordinal data, for example ‘tidies up his/her room’ (1 = regularly, 2 = sometimes, 3 = rarely, 4 = never). However, these numerical values are artificial, so such variables are still considered categorical.

SUMMARY

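The different types of variables collected can be summarised as follows:

Numerical variables: continuous (usually measured) or discrete (usually counted).

Categorical variables: nominal (no natural order) or ordinal (natural order).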

Warning! It is not the variable name itself that determines whether the data is numerical or not. It is the way the data was recorded. Length, for example, could be recorded in metres and hence be numerical, BUT if length was recorded as ‘long’, ‘average’ or ‘short’ it is categorical.

QUESTION 1 The level of water usage of 200 homes was rated in a survey as low, medium or high, and the size of the houses as small, standard or large. The results of the survey are displayed in the table below.

                        House Size
Water Usage     Large     Standard     Small
Low               2          10           7
Medium           19          60          13
High             19          40          30

The variables, level of water usage and size of house, as recorded in this survey, are:

A Both numerical variables.

B Both categorical variables.

C Neither numerical nor categorical variables.

D Numerical and categorical variables respectively.

E Categorical and numerical variables respectively.


QUESTION 2 Which one of the following is an example of continuous numerical data?

A Number of runs made by a cricket player.

B Speed of a car captured by a speed camera.

C Your favourite secondary school year level.

D Shoe sizes.

E Labor / Liberal preference of 100 people surveyed.

QUESTION 3 30 shoppers using the express lane were surveyed to find the number of items being purchased. The results were as follows.

3 5 6 12 8 15 6 8 7 5 9 11 4 8 7 12 6 8 10 8 11 9 4 8 13 1 2 7 5 11 7.

The type of data is:

A Numerical discrete.

B Numerical continuous.

C Categorical ordinal.

D Categorical nominal.

E Categorical discrete.

UNIVARIATE ANALYSIS

The prime objectives of univariate analysis are to determine:

Type of distribution (i.e. its shape and whether there are any outliers)

Central tendency

Spread

ORGANISING DATA

Data needs to be organised so that further analysis can be completed promptly and accurately. The data is organised either as it is collected or afterwards. The techniques used in Further Maths are:

Frequency tables

Dot plots

Bar charts

Histograms

Stem and leaf plots

Boxplots


In organising the data there will often be a need to group the data.

Categorical data needs no further grouping: each category is labelled by name and is distinct from the others.

Numerical continuous data will often be grouped. No hard and fast rules apply, but generally the data should be placed in at least 5 and no more than 15 groups. The size of the interval is usually 1, 2, 5 or 10, with other sizes used as the need arises. To choose a suitable size, find the range (highest value minus lowest value) and divide it by a candidate interval size to estimate the number of intervals, checking that there are at least 5 and no more than 15 groups. Often the size of the interval is relevant in the “big picture”.

Numerical discrete data can be left ungrouped if there are fewer than 15 different scores. If there are more, then group the data in intervals of 2, 5 or 10 so that there are at least 5 and no more than 15 groups.
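The grouping guideline above can be tried out with a short Python sketch. The function name `suggest_intervals` and its parameters are assumptions made for illustration only; the data is the set of golf scores from Question 5.

```python
import math

def suggest_intervals(data, widths=(1, 2, 5, 10), min_groups=5, max_groups=15):
    """Return {interval width: number of groups} for widths that give 5 to 15 groups."""
    lowest, highest = min(data), max(data)
    suggestions = {}
    for width in widths:
        # Start the first interval at a multiple of the width at or below the lowest value.
        start = math.floor(lowest / width) * width
        # Count the intervals needed to reach the highest value.
        groups = math.floor((highest - start) / width) + 1
        if min_groups <= groups <= max_groups:
            suggestions[width] = groups
    return suggestions

golf_scores = [63, 72, 74, 71, 70, 75, 66, 73, 69, 75,
               79, 68, 85, 78, 76, 72, 73, 71, 70, 81,
               73, 74, 82, 87, 78, 79, 68, 75, 76, 72]

print(suggest_intervals(golf_scores))
```

For the golf scores (lowest 63, highest 87), intervals of 2 give 13 groups and intervals of 5 give 6 groups, so either size satisfies the guideline, while intervals of 1 or 10 do not.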

QUESTION 4 The following is a survey of 30 students recording the number of people in their family living at home. Summarise in a frequency table.

2 4 5 6 4 3 4 3 2 5 6 6 5 4 5 3 3 4 5 8 7 4 3 4 2 5 4 6 4 5

Solution

1. Identify that the data is numerical discrete.

2. There are only 7 different scores, so leave the data ungrouped.

3. Use a frequency table of two (or three) columns. The first column is for the possible scores (x) and can be labelled Number of members in a family. The second column is for the frequency (f) and can be labelled Number of students or families. An optional third column can be used as a tally for very large sets of data.

4. Add up the frequencies (Σf) to confirm that scores from 30 students have been recorded.

Number of people in the family (x)     Tally     Number of families (f)
2                                                3
3                                                5
4                                                9
5                                                7
6                                                4
7                                                1
8                                                1
                                                 Σf = 30
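The same frequency table can be checked with a few lines of Python. This is a minimal sketch using the standard library's `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

family_sizes = [2, 4, 5, 6, 4, 3, 4, 3, 2, 5, 6, 6, 5, 4, 5,
                3, 3, 4, 5, 8, 7, 4, 3, 4, 2, 5, 4, 6, 4, 5]

# Count how many times each score occurs.
frequency = Counter(family_sizes)

print("x   f")
for score in sorted(frequency):
    print(f"{score}   {frequency[score]}")

# The frequencies should add up to the number of students surveyed.
total = sum(frequency.values())
print("Sum of f =", total)
```

The final line confirms step 4 of the solution: the frequencies add up to 30.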


VISUAL DISPLAYS

Stem plots (stem and leaf plots):

A stem and leaf plot, or stemplot, is a most useful technique for summarising numerical data. It has two main columns: the stem holds the group value, while the leaf holds the final digit of each recorded value. The following stem and leaf plot shows the CD collection sizes of 20 students.

Features of a stem and leaf plot display:

1. Title: CD collection size of 20 students

2. Column headings: Stem | Leaf

3. Data values summarised (keep the final digits in neat columns):

Stem | Leaf
   0 | 4
   1 | 4 6
   2 | 2 3 5 8
   3 | 0 4 4 6 7 7 9
   4 | 1 3 5 9
   5 | 4 6

4. Legend: 2 | 5 = 25 CDs

From the stemplot shown, the scores collected were 4, 14, 16, 22, 23, 25, 28, 30, 34, 34, 36, 37, 37, 39, 41, 43, 45, 49, 54, 56.

The benefit of using a stemplot is that the actual values of the scores are not lost, as they would be in a grouped frequency table.

Stemplots visually display the shape of the distribution.

Each group is represented as a stem. The stem represents the first digit or set of digits of the data (called leading digits). Example: A stem of 0 can be used to represent data values within the interval 0 – 9. A stem of 1 can be used to represent data values within the interval 10 – 19.

The leaf represents the last digit (called the trailing digit) of each data value.

If more than one data value is the same, each one must be represented by its own leaf.

Stem and leaf plots can also be written for very large or very small numbers, using the key to overcome the need for decimal points or many zeros.

An ordered stem and leaf plot is more useful than an unordered one, as the position of values in the distribution can be easily found.
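As a rough illustration of how an ordered stem and leaf plot is built, the Python sketch below splits each whole-number score into a stem (leading digits) and a leaf (last digit). The function name `stem_and_leaf` is an assumption made for this example, and the sketch only handles non-negative whole numbers with stems of width 10.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of an ordered stem and leaf plot for whole numbers."""
    leaves = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # stem = leading digits, leaf = last digit
        leaves[stem].append(leaf)
    rows = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return rows

cd_counts = [4, 14, 16, 22, 23, 25, 28, 30, 34, 34,
             36, 37, 37, 39, 41, 43, 45, 49, 54, 56]

for row in stem_and_leaf(cd_counts):
    print(row)
```

Because the scores are sorted before they are split, the leaves on each stem come out in order, which is what makes the plot an ordered one.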


QUESTION 5 The golf scores for 30 golfers were recorded as follows. Summarise as an ordered stemplot grouped in 5’s. Comment on the distribution of scores.

63 72 74 71 70 75 66 73 69 75 79 68 85 78 76 72 73 71 70 81 73 74 82 87 78 79 68 75 76 72

Golf scores for 30 golf players

Stem | Leaf
  6  |
  6⁵ |
  7  |
  7⁵ |
  8  |
  8⁵ |

Key: 7 | 4 = 74 strokes

QUESTION 6 For the stem & leaf plot shown:

Stem | Leaf
   7 | 8
   8 | 0 8 9
   9 | 1 6 7 8
  10 | 3 5 8
  11 | 2

Key: 7 | 6 = 7.6

Which of the following data sets matches the above stem & leaf plot?

A 78 80 88 89 91 96 97 98 103 105 108 112

B 8 0 8 9 1 6 7 8 3 5 8 2

C 7.8 8.0 8.8 8.9 9.1 9.6 9.7 9.8 1.03 1.05 1.08 1.12

D 7.8 8.0 8.8 8.9 9.1 9.6 9.7 9.8 10.3 10.5 10.8 11.2

E 8.8 8.9 9.1 9.6 9.7 9.8 10.3 10.5 10.8


Bar Graphs:

For categorical data.

The length of each bar represents the frequency, relative frequency or percentage frequency.

The width of the bars and the spaces between bars must be kept uniform.

Note: Bars usually don’t touch one another.

Segmented bar charts:

For categorical data.

A segmented bar chart is very similar to a bar chart except that the bars are stacked on one another to give a single bar.

The lengths of the segments correspond to the frequencies in the data set.

Usually the percentage frequency is shown, rather than the frequency count. This is then called a percentaged segmented bar chart.

The height of the bar gives the total frequency, or 100% if percentages are used.

Dot Plots:

Dot plots are graphical displays of discrete numerical or categorical data sets.

A single dot represents each data point.

They show the shape of the distribution.

They are often used for small data sets or to compare more than one data set.

A dot plot is constructed by drawing a scale and then evenly placing dots above the appropriate values to represent the data points. There is no vertical axis.


QUESTION 7 At a major family reunion of 200 members of the Ryan clan, the age group of each person who attended was recorded. Display the data, summarised in the frequency table below, as a percentaged bar chart.

Age group       Number of people
Babies          20
Toddlers        24
Children        34
Teenagers       36
Adults          50
Pensioners      36

Barchart:

1. Give the barchart a suitable title.

2. Label the horizontal axis with the names of the categories of age groups. Leave a space between each category.

3. Label the vertical axis as frequency (percentage) or Percentage of family members.

4. Size each column to the percentage calculated from the table.

Segmented Bar chart:

1. Give the segmented barchart a suitable title.

2. Label the horizontal axis with the variable ‘age groups’.

3. Label the vertical axis as frequency (percentage) or Percentage of family members.

4. Stack the columns on top of each other.
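The percentages needed for either chart come from dividing each frequency by the total of 200. A minimal Python sketch of that calculation follows; the dictionary name and the rough text bars (one '#' per percentage point) are assumptions made for illustration.

```python
age_groups = {
    "Babies": 20,
    "Toddlers": 24,
    "Children": 34,
    "Teenagers": 36,
    "Adults": 50,
    "Pensioners": 36,
}

total = sum(age_groups.values())

# Percentage frequency = 100 x frequency / total frequency.
percentages = {group: 100 * count / total for group, count in age_groups.items()}

for group, pct in percentages.items():
    # One '#' per percentage point gives a rough sideways bar.
    print(f"{group:<11}{pct:5.1f}%  " + "#" * round(pct))
```

The percentages add up to 100, which is the full height of the single stacked bar in the segmented version.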


QUESTION 8 Display the following data set as (i) a bar chart and (ii) a dot plot: Type of sport played by 32 students on the weekend.

basketball football basketball soccer tennis netball basketball football football tennis netball netball football soccer netball karate football football athletics basketball netball tennis football soccer football soccer basketball athletics soccer tennis soccer hockey

For Barcharts:

1. Identify all the different categories.

2. Give the barchart a suitable title.

3. Label the horizontal axis with the names of the categories of sport. Leave a space between each category.

4. Label the vertical axis as frequency or number of students.

5. Count the number of students for each sport and make the height of the bar equal to that frequency.

For Dot plots:

1. Identify all the different categories.

2. Give the dot plot a suitable title.

3. Label the horizontal axis with the names of the categories of sport. Leave a space between each category.

4. Use a single dot to represent each data point.
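Counting the categories is the step most prone to slips, so it can help to check the tallies. The Python sketch below is one way to do so; the variable names are illustrative, and the dot plot is printed sideways (one row per sport) rather than with dots stacked above a horizontal axis.

```python
from collections import Counter

sports = (
    "basketball football basketball soccer tennis netball basketball football "
    "football tennis netball netball football soccer netball karate "
    "football football athletics basketball netball tennis football soccer "
    "football soccer basketball athletics soccer tennis soccer hockey"
).split()

# Frequency of each category.
counts = Counter(sports)

# Sideways dot plot: one 'o' per student, most popular sport first.
for sport, count in counts.most_common():
    print(f"{sport:<11}" + "o " * count)
```

The counts give the bar heights for the bar chart and the number of dots per category for the dot plot.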

QUESTION 9 The segmented bar chart shows the distribution of the degree of text-messaging use by a random sample of school students.

[Figure: segmented bar chart, "Degree of text messaging"; not reproduced]

For these students, the percentage in year levels 9 – 10 who are ‘occasional text messagers’ is closest to:

A 25%

B 32%

C 42%

D 50%

E 75%


HISTOGRAMS

For numerical data, either discrete or continuous, including grouped data.

The height of each bar gives the frequency (count or percentage).

The end of one rectangle must be the beginning of the next rectangle, i.e. there are no gaps.

We can also plot percentage frequencies or relative frequencies on the vertical axis for both grouped and ungrouped data.

If the data consists of individual discrete values, the bars are centred directly above the appropriate values.

If the data is grouped into continuous class intervals, each bar starts at the lower end of its class interval and extends to the upper end.

QUESTION 10 The following is a survey of 30 students recording the number of people in their family living at home.

2 4 5 6 4 3 4 3 2 5 6 6 5 4 5 3 3 4 5 8 7 4 3 4 2 5 4 6 4 5

Display as a histogram.

Solution

FOR HISTOGRAMS

1. Identify that the data is numerical discrete with a range of less than 15. Label the x-axis as Number of members in a family, leave a space at the beginning and scale the x-axis from the lowest to the highest score. Label each score in the middle of its column.

2. Label the y-axis as frequency (or number of families).

3. Draw the first column centred on its score on the x-axis, with a height equal to its frequency as read from the y-axis.

4. Repeat for all other scores.

[Figure: Histogram of Members in Families. Horizontal axis: Number of members in a family (1 to 9). Vertical axis: Number of families (0 to 10).]


USING A TI 84 GRAPHIC CALCULATOR

Entering data

1. Press STAT and select 1:Edit.

2. Enter raw scores into list L1.

Setting up display (histogram)

3. Press 2nd STATPLOT and select 1:Plot1.

4. Select ON, then cursor down and select the histogram icon.

5. Cursor down and set the score list to L1 (Xlist:L1) and the frequency to 1 (Freq:1).

Display histogram

6. Press ZOOM and select 9:ZoomStat. Press WINDOW to fine tune the x (scores) and y (frequency) axis values and the interval size (Xscl).

Press GRAPH, then TRACE, and cursor left and right to investigate the value and frequency of each column.

USING THE CASIO CLASSPAD CALCULATOR

In the Stat menu, enter the data into List 1.

SetGraph → Settings → set the screen as shown → Set →

Set the interval (here we want to start at 10 and go up in class intervals of 10) → OK

The graph will appear in the bottom half of the screen.

Resize the screen if necessary. Analysis → Trace will allow you to find the heights of the columns, using the 4-way key to move across.


QUESTION 11 The distances travelled to school by 300 students are summarised in the table below:

Distance (km)     Number of students
0 - < 2           112
2 - < 4           65
4 - < 6           56
6 - < 8           44
8 - < 10          15
10 - < 12         8

Represent this information as a histogram.

Solution
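Before drawing the histogram by hand, a quick text preview can confirm the overall shape. The Python sketch below is only an illustration, not the worked solution: the variable names and the scale of one '#' per 4 students (rounded) are assumptions, and the bars are printed sideways, one row per class interval.

```python
# Each entry is ((lower bound, upper bound), frequency) for one class interval.
distance_table = [((0, 2), 112), ((2, 4), 65), ((4, 6), 56),
                  ((6, 8), 44), ((8, 10), 15), ((10, 12), 8)]

total_students = sum(frequency for _, frequency in distance_table)

rows = []
for (lower, upper), frequency in distance_table:
    bar = "#" * round(frequency / 4)  # one '#' per 4 students, rounded
    rows.append(f"{lower:>2} - <{upper:<3}{bar} {frequency}")

print("\n".join(rows))
print("Total students:", total_students)
```

In the hand-drawn histogram the same intervals sit on the horizontal axis with no gaps between the columns, and the frequencies are read from the vertical axis.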


SHAPES OF DISTRIBUTIONS

The shape or skewness describes how symmetrical the distribution of results is. Most distributions are skewed.

Perfect symmetry: The mean, mode and median are similar. The frequency polygon will often be a bell-shaped curve. Note that data can be symmetric without being bell shaped (normal).

Negative skew: Tail of curve stretches to the left. Most of the data is concentrated around the higher values. Mode, mean and median will not coincide (not the same values). The mean will usually be lower than the median. Mode, mean and median are located to the right (in the main body of curve).

Positive skew: Tail of curve stretches to the right. Most of the data is concentrated around the lower values. Mode, mean and median will not coincide (not the same values). The mean will usually be higher than the median. Mode, mean and median are located to the left (in the main body of curve).

QUESTION 12 For the frequency table shown the shape of the distribution is most likely to be:

Score 0 1 2 3 4 5

Frequency 13 14 5 2 0 2

A Positively skewed

B Negatively skewed

C Skewed to the left

D Bimodal

E Bell-shaped


PARAMETERS IN STATISTICS

Measures of central tendency are given by the mode, median and mean. These measures can be applied to both grouped and ungrouped data.


MODE

The value which occurs most frequently, i.e. the most popular value.

Data sets can have 2 or more modes or modal classes.

As the mode is simply the score (or interval) with the highest frequency, it is not always a measure of centre.

However, when the number of scores is very large and the distribution is symmetric, the mode often does occur near the central value.

MEDIAN

The median represents the central value or middle value of the given data set.

The median can be viewed as the value below which 50% of the results lie.

The median is not affected by extreme values, so it is a good measure of centre when outliers are present or when the distribution is skewed.

The median is located at the $\left(\frac{n+1}{2}\right)$th position in the ordered data set.

If n is odd, the median is the middle data point.

If n is even, the median is the average of the two middle data points.
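The position rule can be sketched in Python as follows. The function name `median_by_position` and the two small data sets are made up for illustration; they are not taken from the course material.

```python
def median_by_position(data):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(data)   # the rule only works on ordered data
    n = len(ordered)
    position = (n + 1) / 2   # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that data point.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the two middle data points.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_position([7, 1, 3]))     # n = 3, position 2
print(median_by_position([9, 1, 7, 3]))  # n = 4, position 2.5
```

For the three-value set the median is the 2nd ordered value, 3. For the four-value set the position 2.5 falls between the 2nd and 3rd ordered values (3 and 7), giving 5.0.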

MEAN

The mean represents the average value of the given data set.

It is found by adding together all the values of the data and dividing by the number of values.

For Ungrouped Data:

$\bar{x} = \frac{\sum x}{n}$, i.e. $\frac{\text{Sum of the values}}{\text{Number of values}}$

When there is a large amount of data, group data together in a frequency table.

For Grouped Data:

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

When values are given in class intervals:

$\text{Mean} = \frac{\text{Sum of (midpoint of class} \times \text{frequency)}}{\text{Sum of frequencies}}$


For continuous variables: Assume that all the values in a class interval are equal to the midpoint of that interval, i.e.

x = midpoint of interval

f = frequency of that class interval

Total for a class = Midpoint of class × Frequency

For discrete variables: The midpoint of an interval is calculated by taking the mean of the first and last values of the interval.

The mean is not necessarily equal to one of the given values in the data set.

The mean is affected by extreme values and by skewing, unless there are extremes at both ends that balance out. This is because the mean takes into account all of the given data.
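To illustrate the midpoint method, the Python sketch below estimates the mean distance for the grouped data in Question 11. The function name `grouped_mean` is an assumption made for this example, and the result is only an estimate because every value in a class is treated as equal to the class midpoint.

```python
def grouped_mean(classes):
    """Estimate the mean from ((lower, upper), frequency) pairs using class midpoints."""
    total_fx = 0
    total_f = 0
    for (lower, upper), frequency in classes:
        midpoint = (lower + upper) / 2
        total_fx += midpoint * frequency  # total for a class = midpoint x frequency
        total_f += frequency
    return total_fx / total_f

# Distance (km) class intervals and number of students, from Question 11.
distance_classes = [((0, 2), 112), ((2, 4), 65), ((4, 6), 56),
                    ((6, 8), 44), ((8, 10), 15), ((10, 12), 8)]

print(round(grouped_mean(distance_classes), 2))
```

Here the sum of midpoint × frequency is 1118 and the sum of frequencies is 300, so the estimated mean distance is about 3.73 km.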

QUESTION 13 For the following raw data:

4 8 5 6 7 3 9 4 4 5

Calculate the: (a) Median (b) Mean (c) Mode

Solution

(a) Median

REORDER from LOWEST to HIGHEST: 3 4 4 4 5 5 6 7 8 9

Median score = $\left(\frac{n+1}{2}\right)$th score = $\left(\frac{10+1}{2}\right)$th score = 5.5th score,

that is, between the 5th and 6th scores, which are 5 and 5, so Median = 5.

(b) Mean

Mean = $\bar{x} = \frac{\sum x}{n} = \frac{\text{sum of all scores}}{\text{number of scores}}$

$\bar{x} = \frac{4+8+5+6+7+3+9+4+4+5}{10} = \frac{55}{10} = 5.5$

(c) Mode

Look for a score that occurs the most often in the list.

4 8 5 6 7 3 9 4 4 5

The Mode is 4 as it occurred 3 times.
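These three answers can be cross-checked with Python's standard `statistics` module. This is a quick check rather than a replacement for the working above.

```python
import statistics

scores = [4, 8, 5, 6, 7, 3, 9, 4, 4, 5]

median = statistics.median(scores)  # average of the 5th and 6th ordered scores
mean = statistics.mean(scores)      # sum of scores divided by number of scores
mode = statistics.mode(scores)      # most frequently occurring score

print("Median:", median)
print("Mean:", mean)
print("Mode:", mode)
```

The results agree with the hand calculation: median 5, mean 5.5 and mode 4.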


ALTERNATIVE METHOD USING GRAPHIC CALCULATOR

1. Press STAT and select 1:EDIT.

2. Enter the scores in the L1 table.

3. Press STAT and select CALC

4. Select 1:1-Var Stats and press ENTER.

5. The mean is given as x̄.

6. The number of scores is given as n.

7. Scroll down the screen for the median, given as Med =

Note: Some graphic calculators do not give the mode. Examine the raw data for the most frequent score.

Mean = 5.5. Number of scores is 10. Median is 5. Mode is 4 (it occurred 3 times).

QUESTION 14 A survey of 30 people recorded the number of cars in their household. The results were summarised in a frequency table as shown. Calculate the: (a) Median (b) Mean (c) Mode

Number of cars     Number of households
0                  0
1                  3
2                  12
3                  8
4                  4
5                  2
6                  1
Σf                 30

Solution


METHOD USING GRAPHIC CALCULATOR

1. Press STAT and select 1:EDIT.

2. Enter the scores in the L1 table and the frequencies in L2.

3. Press STAT and select CALC

4. Select 1:1-Var Stats and press ENTER

5. Press 2nd L1, followed by COMMA and then 2nd L2.

6. Press ENTER

The mean is given as x̄.

The number of scores is given as n.

Scroll down the screen for the median, given as Med =

Mean = 2.767 (to three decimal places). Number of scores is 30.

The median is 2.5. The mode is 2 (it occurred 12 times).
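The calculator results can also be reproduced from the frequency table in Python. The sketch below is illustrative (the variable names are assumptions); it expands the table back into 30 individual scores so that the median can be found in the usual way.

```python
import statistics

# Number of cars -> number of households, from the frequency table.
car_table = {0: 0, 1: 3, 2: 12, 3: 8, 4: 4, 5: 2, 6: 1}

# Expand the table into a list of individual scores.
scores = [cars for cars, freq in car_table.items() for _ in range(freq)]
n = len(scores)

# Mean for grouped data: sum of (f x x) divided by sum of f.
mean = sum(cars * freq for cars, freq in car_table.items()) / sum(car_table.values())
median = statistics.median(scores)        # average of the 15th and 16th scores
mode = max(car_table, key=car_table.get)  # score with the highest frequency

print("n =", n)
print("Mean =", round(mean, 3))
print("Median =", median)
print("Mode =", mode)
```

The sum of f × x is 83, so the mean is 83 ÷ 30 ≈ 2.767. The 15th score is 2 and the 16th is 3, which gives the median of 2.5.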

QUESTION 15 The results of a mathematics test marked out of 60 are represented by this stem-and-leaf plot.

Stem | Leaf
   0 | 8
   1 | 8 9
   2 | 2 7 8
   3 | 0 1 3 6 9
   4 | 2 4 5 5 6 7 9
   5 | 4 7 8 8 9 9
   6 | 0 0 0

Key: 5|4 means 54

The mean, median and mode, respectively, are:

A 60, 42, 49

B 40, 45, 60

C 46, 60, 40

D 42, 45, 60

E 60, 45, 42