Welcome to MDM4U (Mathematics of Data Management, University Preparation)
Uploaded by alyson-heath
http://www.wordle.net/
AGENDA
Attendance
Course Outline
Chapter 1 Problem (CP1)
Assign textbooks
1.1 Displaying Data Visually
Learning goals: classify data by type; create appropriate graphs
MSIP / Home Learning: p. 11 #2, 3ab, 4, 7, 8
Chapter 1 Problem
Log on to a computer
  You may pair up if no computers are available
Click MDM4U.LIEFF.CA
  Save the file MDM4U CP1.PDF to your M:\ drive
  Create an MDM4U folder
  Create a Ch1 folder
Answer CP1 and CP2 in a Word document
Why do we collect data?
We learn by observing
Collecting data is a systematic method of making observations
  Allows others to repeat our observations
Good definitions for this chapter at: http://www.stats.gla.ac.uk/steps/glossary/alphabet.html
Types of Data
1) Quantitative - can be represented by a number
   E.g., age, height, weight, number of siblings
   a) Discrete Data
      Data where a fraction/decimal is impossible
      E.g., age (in whole years), number of siblings
   b) Continuous Data
      Data where fractions/decimals are possible
      E.g., weight, height, academic average
2) Qualitative - cannot be measured numerically
   E.g., eye colour, hair colour, favourite band
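As a rough illustration of the classification above, the sketch below guesses the type from how the values are stored. This is only a heuristic: the function name `classify` and the sample values are hypothetical, and stored values cannot settle the question on their own (a height recorded as 170 is still continuous), so the final call belongs to whoever knows how the data was measured.

```python
def classify(values):
    """Guess the type of a list of observations (heuristic, not a rule)."""
    # Anything that is not a plain number (e.g. a colour name) is qualitative.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "qualitative"
    # Only whole-number counts: treat as discrete.
    if all(isinstance(v, int) for v in values):
        return "discrete"
    # At least one value is stored with a decimal part: treat as continuous.
    return "continuous"


# Made-up sample values, for illustration only.
print(classify(["brown", "blue", "green"]))  # eye colour
print(classify([0, 2, 1, 3]))                # number of siblings
print(classify([61.5, 70.2, 58.0]))          # weight in kg
```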
Who do we collect data from?
Population - the entire group from which we can collect data / draw conclusions
  NOTE: Data does NOT have to be collected from every member
Census - data collected from every member of the population
  Data is representative of the population
  Can be time-consuming and/or expensive
Sample - data collected from some members of the population (min. 10%)
  A good sample must be representative of the population
  Sampling methods in Ch2
Organizing Data
A frequency table is often used to display data, listing the variable and the frequency.
What type of data does this table contain?
Intervals can’t overlap
Use from 3 to 12 intervals / categories

Day          Number of absences
Monday       5
Tuesday      4
Wednesday    2
Thursday     0
Friday       8
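A frequency table like the one above can be tallied from raw observations. The sketch below is one possible way to do it in Python; the raw list `absences` is an assumption, rebuilt from the table's counts, since the slides give only the totals.

```python
from collections import Counter

# Raw observations: one entry per absence, rebuilt from the table's counts (assumed).
absences = ["Monday"] * 5 + ["Tuesday"] * 4 + ["Wednesday"] * 2 + ["Friday"] * 8

# List the categories explicitly so a day with no absences still gets a row.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

counts = Counter(absences)  # a Counter returns 0 for a day it never saw
frequency_table = {day: counts[day] for day in days}

print(f"{'Day':<10} Number of absences")
for day, frequency in frequency_table.items():
    print(f"{day:<10} {frequency}")
```

Listing the categories separately from the data keeps Thursday in the table with a frequency of 0, which a plain tally of the observations would drop.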
Organizing Data (cont’d)
Another useful organizer is a stem and leaf plot.
This table represents the following data:

101 103 107
112 114 115 115
121 123 125 127 127
133 134 134 136 137 138
141 144 146 146 146
152 152 154 159
165 167 168

Stem (first 2 digits)   Leaf (last digit)
10                      1 3 7
11                      2 4 5 5
12                      1 3 5 7 7
13                      3 4 4 6 7 8
14                      1 4 6 6 6
15                      2 2 4 9
16                      5 7 8
Organizing Data (cont’d)
What type of data is this?
The class interval is the size of the grouping, and is 10 units here
  100-109, 110-119, 120-129, etc.
  No decimals required
Stem can have as many digits as needed
A leaf must be recorded each time the number occurs

Stem   Leaf
10     1 3 7
11     2 4 5 5
12     1 3 5 7 7
13     3 4 4 6 7 8
14     1 4 6 6 6
15     2 2 4 9
16     5 7 8
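The rules above (stem = leading digits, leaf = last digit, one leaf per occurrence) can be sketched in a few lines of Python. The function name `stem_and_leaf` and the `|` separator are choices made for this example; the data is the set listed on the previous slide.

```python
from collections import defaultdict

data = [
    101, 103, 107, 112, 114, 115, 115, 121, 123, 125, 127, 127,
    133, 134, 134, 136, 137, 138, 141, 144, 146, 146, 146,
    152, 152, 154, 159, 165, 167, 168,
]


def stem_and_leaf(values, leaf_unit=10):
    """Return the plot as a list of text lines, one per stem."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)  # 146 -> stem 14, leaf 6
        leaves[stem].append(leaf)              # a repeated value repeats its leaf
    return [f"{stem} | {' '.join(map(str, leaves[stem]))}" for stem in sorted(leaves)]


plot_lines = stem_and_leaf(data)
print("\n".join(plot_lines))
```

Note that this sketch only prints stems that have at least one leaf; a stem with no data would need to be added by hand.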
Measures of Central Tendency
Used to indicate one value that best represents a group of values
Mean (Average)
  Add all numbers and divide by the number of values
  Affected greatly by outliers (values that are significantly different from the rest)
Median
  Middle value
  Place all values in order and choose the middle number
  For an even # of values, average the 2 middle ones
  Not affected as much by outliers
Mode
  Most common number
  There can be none, one or many modes
  Only choice for qualitative data
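To see how an outlier affects each measure, here is a small Python sketch. The values in `scores` are made up for illustration, and the helper `modes` is hypothetical; it follows the "none, one or many" rule by returning an empty list when no value repeats (conventions for "no mode" vary between textbooks).

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every most-common value, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:  # no value occurs more than once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]


scores = [4, 5, 5, 6, 7]      # made-up values
with_outlier = scores + [40]  # the same values plus one outlier

print(mean(scores), median(scores))              # before the outlier
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median barely moves
print(modes(scores))
print(modes(["brown", "blue", "brown", "green"]))  # the mode also works for qualitative data
```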
Displaying Data - Bar Graphs
Typically used for qualitative/discrete data
Shows how certain categories compare
Why are the bars separated?
Would it be incorrect if you didn’t separate them?
[Figure: Number of police officers in Crimeville, 1993 to 2001]
Bar graphs (cont’d)
Double bar graph
  Compares 2 sets of data
  [Figure: Internet use at Redwood Secondary School, by sex, 1995 to 2002]
Stacked bar graph
  Compares 2 variables
  Can be scaled to 100%
Displaying Data - Histograms
Typically used for continuous data
The bars are attached because the x-axis represents intervals
Choice of class interval size is important. Why?
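One way to explore the question is to count the same data with two different class interval sizes and compare the shapes. The sketch below reuses the data from the stem and leaf slide, which is an assumption (the histogram slide's own data is not in the transcript); the helper `bin_counts` is hypothetical.

```python
data = [
    101, 103, 107, 112, 114, 115, 115, 121, 123, 125, 127, 127,
    133, 134, 134, 136, 137, 138, 141, 144, 146, 146, 146,
    152, 152, 154, 159, 165, 167, 168,
]


def bin_counts(values, start, width):
    """Count the values in each interval, keyed by the interval's lower bound."""
    counts = {}
    for value in values:
        k = (value - start) // width  # index of the interval holding this value
        lower = start + k * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


for width in (10, 20):
    print(f"class interval = {width}")
    for lower, count in bin_counts(data, start=100, width=width).items():
        # The "lower to upper" label assumes whole-number data.
        print(f"  {lower}-{lower + width - 1}: {'#' * count}")
```

With a width of 10 the text histogram shows seven bars rising to a peak and falling again; with a width of 20 the same data collapses into four bars and much of that detail disappears. Intervals with no values are simply not printed in this sketch.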
Displaying Data - Pie / Circle Graphs
A circle divided up to represent the data
Shows each category as a portion of the whole
See p. 8 of the text for an example of creating these by hand
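The textbook's method is not reproduced in these slides, but the usual approach is to give each category a sector whose angle is its share of 360 degrees. A minimal sketch, assuming the absences table from the earlier slide as the data:

```python
# Frequencies taken from the absences table (reused here as an example).
absences = {"Monday": 5, "Tuesday": 4, "Wednesday": 2, "Thursday": 0, "Friday": 8}

total = sum(absences.values())

# Each sector's central angle is its share of the total times 360 degrees.
angles = {day: count / total * 360 for day, count in absences.items()}

for day, angle in angles.items():
    share = absences[day] / total * 100
    print(f"{day:<10} {share:5.1f}%  {angle:6.1f} degrees")
```

The angles should always add up to 360 degrees, which is a quick check when drawing the graph by hand.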
Scatter Plot
A scatter plot shows the relationship between two numeric variables
This relationship, called a correlation, can be positive, negative or none
A line or curve of best fit (regression line) can be used to model the relationship
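As an illustration of a line of best fit, the sketch below computes a least-squares line and the correlation coefficient r by hand. The paired values in `x` and `y` are made up for this example; least squares is one common way to fit the line, not necessarily the method used in the course software.

```python
from math import sqrt

# Made-up paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-deviations from the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                    # least-squares line: y = slope * x + intercept
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)            # correlation coefficient, between -1 and 1

print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```

A positive r goes with an upward-sloping line (positive correlation), a negative r with a downward-sloping line, and an r near 0 with no linear correlation.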
Examining Trends
A line graph shows long-term trends over time e.g. stock price, currency, moving average
Examining the spread of data
A box and whisker plot shows the spread of data
The quartiles divide the ordered data into 4 groups, with 25% of the data in each
Instructions for creating these may be found on page 9 of the text or at:
http://regentsprep.org/Regents/math/data/boxwhisk.htm
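The five values needed to draw a box and whisker plot (minimum, Q1, median, Q3, maximum) can be sketched as follows. This example assumes the "median of each half" method for the quartiles, which may differ slightly from the method on page 9 of the text or from what a spreadsheet reports; the data is reused from the stem and leaf slide.

```python
from statistics import median

data = [
    101, 103, 107, 112, 114, 115, 115, 121, 123, 125, 127, 127,
    133, 134, 134, 136, 137, 138, 141, 144, 146, 146, 146,
    152, 152, 154, 159, 165, 167, 168,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median of each half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # values below the middle
    upper = ordered[-half:]  # values above the middle (skips the median when n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


summary = five_number_summary(data)
print(summary)
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.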
MSIP / Home Learning
p. 11 #2, 3ab, 4, 7, 8
Mystery Data
[Figure: Gas prices in the GTA. Horizontal axis: dates from 3-Jan-08 to 29-Oct-08; vertical axis: 0.000 to 1.600. Trend line: f(x) = −1.78984476996036E-05 x² + 1.41853083716074 x − 28104.9051549717, R² = 0.818508472651409]
Hint: These values should get you pumped!
An example…
These are prices for Internet service packages
  find the mean, median and mode
  determine what type of data this is
  create a suitable frequency table, stem and leaf plot and graph

13.60 15.60 17.20 16.00 17.50 18.60 18.70
12.20 18.60 15.70 15.30 13.00 16.40 14.30
18.10 18.60 17.60 18.40 19.30 15.60 17.10
18.30 15.20 15.70 17.20 18.10 18.40 12.00
16.40 15.60
Answers…
Mean = 494.30/30 ≈ 16.48
Median = average of the 15th and 16th numbers (in sorted order)
Median = (16.40 + 17.10)/2 = 16.75
Mode = 15.60 and 18.60
The data is numerical, so it is at least interval data.
It has an absolute starting point, so it is ratio data.
The values have decimals, so the data is quantitative and continuous.
Given this, a histogram is appropriate
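These answers can be checked with a short Python sketch using the standard `statistics` module. The grouped frequency table at the end uses class intervals of width 1.00, which is a choice made for this example (it gives 8 intervals, within the 3 to 12 guideline), not something stated on the slide.

```python
from collections import Counter
from statistics import mean, median, multimode

prices = [
    13.60, 15.60, 17.20, 16.00, 17.50, 18.60, 18.70,
    12.20, 18.60, 15.70, 15.30, 13.00, 16.40, 14.30,
    18.10, 18.60, 17.60, 18.40, 19.30, 15.60, 17.10,
    18.30, 15.20, 15.70, 17.20, 18.10, 18.40, 12.00,
    16.40, 15.60,
]

print(f"mean   = {mean(prices):.2f}")
print(f"median = {median(prices):.2f}")
print(f"modes  = {multimode(prices)}")

# Grouped frequency table: the whole-dollar part of each price picks its interval.
intervals = Counter(int(price) for price in prices)
for lower in sorted(intervals):
    print(f"{lower}.00-{lower}.99: {intervals[lower]}")
```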
1.2 Conclusions and Issues in Two Variable Data
Learning goal: Draw conclusions from two-variable graphs
MSIP / Home Learning
Read pp. 16–19
Complete p. 20–24 #1, 4, 9, 11, 14
Having the data is not enough. [You] have to show it in ways people both enjoy and understand. - Hans Rosling
What conclusions are possible?
To draw a conclusion, a number of conditions must apply
  data must be representative of the population
  sample size must be large enough
  data must address the question
Types of statistical relationships
Correlation
  two variables appear to be related
  i.e., a change in one variable is associated with a change in the other
  e.g., salary increases as age increases
Causation
  a change in one variable is proven to cause a change in the other
  usually requires an in-depth study i.e. WE WILL NOT DO THIS IN THIS COURSE!!!
  e.g., incidence of cancer among smokers
Do not use the P-word!!!
Example 1 – Split bar graph
Do females like school more than males do?
Example 2 – Is there a correlation between attitude and performance?
Example 3 – Examine all 1046 students
Drawing Conclusions
Do females seem more likely to be interested in student government?
Does gender appear to have an effect on interest in student government?
Is this a correlation?
Is it likely that being female causes interest?
[Figure: Students Interested in Student Government. Bar graph of Yes / No responses for Female and Male students; vertical axis from 0 to 50]