And Here We Go … Get ready to study for the AP Stats test! Only 1050 minutes of class time until...
gerard-porter
And Here We Go … Get ready to study for the AP Stats test!
Only 1050 minutes of class time until the big day…
Friday, MAY 10!
The Exam Itself
To maximize your score on the AP Statistics Exam, you first need to know how the exam is organized and how it will be scored.
The AP Statistics Exam consists of two separate sections:
Section I: 40 multiple-choice questions
90 minutes; counts 50 percent of exam score
Section II: Free-response questions
90 minutes; counts 50 percent of exam score
Questions are designed to test your statistical reasoning and your communication skills.
SCORING:
• Five open-ended problems @ 13 minutes each; each counts 15 percent of the free-response score
• One investigative task @ 25 minutes; counts 25 percent of the free-response score
Your work is graded holistically, meaning that your entire response to a problem is considered before a score is assigned.
Each free-response question is scored on a 0 to 4 scale. General descriptors for each of the scores are:
4 Complete Response: NO statistical errors and clear communication
3 Substantial Response: Minor statistical error/omission or fuzzy communication
2 Developing Response: Important statistical error/omission or lousy communication
1 Minimal Response: A "glimmer" of statistical knowledge related to the problem
0 Inadequate Response: No glimmer; statistically dangerous to himself and others
Calculator Policy
Each student is expected to bring to the exam a graphing calculator with statistical capabilities. The computational capabilities should include standard statistical univariate and bivariate summaries, through linear regression. The graphical capabilities should include common univariate and bivariate displays such as histograms, boxplots, and scatterplots.
• You can bring two calculators to the exam.
• The calculator memory will not be cleared but you may only use the memory to store programs, not notes.
• For the exam, you're not allowed to access any information in your graphing calculators or elsewhere if it's not directly related to upgrading the statistical functionality of older graphing calculators to make them comparable to statistical features found on newer models. The only acceptable upgrades are those that improve the computational functionalities and/or graphical functionalities for data you key into the calculator while taking the examination. Unacceptable enhancements include, but aren't limited to, keying or scanning text or response templates into the calculator.
• During the exam, you can't use minicomputers, pocket organizers, electronic writing pads, or calculators with QWERTY (i.e., typewriter) keyboards.
2008-09 List of Graphing Calculators
Graphing calculators having the expected built-in capabilities listed above are indicated with an asterisk (*). However, students may bring any calculator on the list to the exam; any model within each series is acceptable.
Casio: FX-6000 series, FX-6200 series, FX-6300 series, FX-6500 series, FX-7000 series, FX-7300 series, FX-7400 series, FX-7500 series, FX-7700 series, FX-7800 series, FX-8000 series, FX-8500 series, FX-8700 series, FX-8800 series, FX-9700 series *, FX-9750 series *, FX-9860 series *, CFX-9800 series *, CFX-9850 series *, CFX-9950 series *, CFX-9970 series *, FX 1.0 series *, Algebra FX 2.0 series *
Hewlett-Packard: HP-9G, HP-28 series *, HP-38G *, HP-39 series *, HP-40 series *, HP-48 series *, HP-49 series *, HP-50 series *
Radio Shack: EC-4033, EC-4034, EC-4037
Sharp: EL-5200, EL-9200 series *, EL-9300 series *, EL-9600 series *†, EL-9900 series *
Texas Instruments: TI-73, TI-80, TI-81, TI-82 *, TI-83/TI-83 Plus *, TI-83 Plus Silver *, TI-84 Plus *, TI-84 Plus Silver *, TI-85 *, TI-86 *, TI-89 *, TI-89 Titanium *, TI-Nspire *, TI-Nspire CAS *
Other: Datexx DS-883, Micronta, Smart2
Exam grade | 2008 Statistics | Goins 2008
5 | 14,009 (12.8%) | 3 (12%)
4 | 24,528 (22.6%) | 7 (28%)
3 | 25,707 (23.8%) | 8 (32%)
2 | 20,403 (18.8%) | 4 (16%)
1 | 23,637 (21.9%) | 3 (12%)
Number of students | 108,284 | 25
3 or higher / % | 64,244 (59.2%) | 18 (72%)
Mean grade | 2.86 | 3.12
Standard deviation | 1.34 |
1st AP Statistics test: 1997 ~ 7500 students
2008 AP Stat test: ~ 100,000 students
Exam grade | 2009 Statistics | Goins 2009
5 | 12.3% | 2 (4.3%)
4 | 22.3% | 6 (12.8%)
3 | 24.2% | 17 (36.2%)
2 | 19.1% | 12 (25.5%)
1 | 22.2% | 10 (21.3%)
Number of students | 116,876 | 47
3 or higher / % | 68,679 (58.8%) | 25 (53.3%)
Mean grade | 2.83 | 2.56
Standard deviation | 1.33 |
1st AP Statistics test: 1997 ~ 7500 students
2009 AP Stat test: 116,876 students
Exam grade | 2010 Statistics | Goins 2010
5 | 12.8% | 5 (13.9%)
4 | 22.4% | 10 (27.8%)
3 | 23.5% | 11 (30.6%)
2 | 18.2% | 6 (16.7%)
1 | 23.1% | 4 (11.1%)
Number of students | 129,899 | 36
3 or higher / % | 58.7% | 72.3%
Mean grade | 2.84 | 3.167
Standard deviation | 1.35 | 1.2
1st AP Statistics test: 1997 ~ 7500 students
2010 AP Stat test: ~ 109,609 students
Exam grade | 2011 Statistics | Goins 2011
5 | 12.1% | 8 (16.0%)
4 | 21.3% | 18 (36.0%)
3 | 25.0% | 14 (28.0%)
2 | 17.8% | 7 (14.0%)
1 | 23.9% | 3 (6.0%)
Number of students | 142,910 | 50
3 or higher / % | 58.8% | 80.0%
Mean grade | 2.82 | 3.42
Standard deviation | 1.34 | 1.1
1st AP Statistics test: 1997 ~ 7500 students
2011 AP Stat test: ~ 137,498 students
Exam grade | 2012 Statistics | Goins 2012
5 | 12.5% | 5 (8.2%)
4 | 21.1% | 13 (21.3%)
3 | 25.6% | 17 (27.9%)
2 | 18.0% | 16 (26.2%)
1 | 22.8% | 10 (16.4%)
Number of students | 153,859 | 61
3 or higher / % | 59.2% | 57.4%
Mean grade | 2.83 | 2.62
Standard deviation | 1.33 |
1st AP Statistics test: 1997 ~ 7500 students
2012 AP Stat test: ~ 143,554 students
The AP Statistics Exam covers material in these areas:
I. Exploring data: describing patterns and departures from patterns (20–30%)
• Analyze data using graphical and numerical techniques
• Emphasis on interpreting information from graphical and numerical displays and summaries
II. Sampling and experimentation: planning and conducting a study (10–15%)
• Collecting data with a well-developed plan
• Clarifying the question and deciding on a method of data collection and analysis
III. Anticipating patterns: exploring random phenomena using probability and simulations (20–30%)
• Anticipating what the distribution of data should look like under a given model
IV. Statistical inference: estimating population parameters and testing hypotheses (30–40%)
• Selecting appropriate models for statistical inferences
What are the two types of univariate data sets?
Categorical: qualitative (brand)
Numerical: quantitative (numerical in nature)
Categorical examples: type of computer you use, car you drive, area codes
Numerical examples: height, price of textbook, amount of cola in can
What are the two types of numerical data?
Discrete: possible values are isolated points on a number line
Continuous: possible values form an interval (measurements are usually continuous)
Discrete example: number of AP classes
Continuous example: distance a student lives from school
What are appropriate graphical displays for categorical data?
Bar Graphs
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• To describe – comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
[Figure: bar graph "Subject Preference" with bars for History, Math, Science, English, Business, Foreign language]
[Figure: double bar graph "Subject preference by gender" comparing Male and Female for the same subjects]
What are appropriate graphical displays for categorical data?
Pie Charts
• To make:
– Proportion × 360°
– Using a protractor, mark off each part
• To describe – comment on which occurred the most often or least often
[Figure: pie chart "Subject Preference": History 6%, Math 44%, Science 27%, English 13%, Business 2%, Foreign language 8%]
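The "proportion × 360°" rule can be checked with a short Python sketch. It is only an illustration: the percentages are copied from the Subject Preference pie chart, and the dictionary name and rounding to one decimal place are arbitrary choices.

```python
# Subject Preference percentages from the pie chart, as proportions
preferences = {
    "History": 0.06,
    "Math": 0.44,
    "Science": 0.27,
    "English": 0.13,
    "Business": 0.02,
    "Foreign language": 0.08,
}

# Central angle of each slice = proportion x 360 degrees
angles = {subject: round(p * 360, 1) for subject, p in preferences.items()}

for subject, angle in angles.items():
    print(f"{subject}: {angle} degrees")

# The slices together should make up the whole circle
print("Total:", round(sum(angles.values()), 1))
```

Measuring each of these angles with a protractor is the "mark off each part" step.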
What are appropriate graphical displays for numerical data?
Dot Plot
• Used with numerical data (either discrete or continuous)
• Made by putting dots (or X’s) on a number line
• Can make comparative dotplots by using the same axis for multiple groups
Stem (and leaf) Plot
• Used with univariate, numerical data
• Must have key so that we know how to read numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups (back to back)
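As a sketch of how a stemplot is built, the Python below splits each value into a stem (tens digit) and a leaf (ones digit) and prints a key. It assumes non-negative whole numbers, and `stem_and_leaf` is a hypothetical helper name; the data are the 17 observations from the right-skewed mean & median example later in these notes.

```python
def stem_and_leaf(data):
    """Group whole numbers into stems (tens digit) and leaves (ones digit).

    Assumes non-negative integers; empty stems are kept so gaps stay visible.
    """
    stems = {stem: [] for stem in range(min(data) // 10, max(data) // 10 + 1)}
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    return stems


data = [22, 29, 28, 22, 24, 25, 28, 21, 25, 23, 24, 23, 26, 36, 38, 62, 23]
plot = stem_and_leaf(data)

print("Key: 2 | 1 means 21")
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems 4 and 5 makes the gap before 62 visible, which is part of what a stemplot is good at showing.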
What are appropriate graphical displays for numerical data?
Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
– Discrete: bars are centered over discrete values
– Continuous: bars cover a class (interval) of values
• For comparative histograms – use two separate graphs with the same scale on the horizontal axis
• Use no fewer than 5 classes (bars)
• Check to see if scale is misleading
• Look for symmetry and skewness
• . . . is used to answer questions about percentiles.
• The pth percentile is the value at or below which p percent of the individuals fall.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data.
IQR = Q3 – Q1
What are appropriate graphical displays for numerical data?
Cumulative Relative Frequency Plot (Ogive)
What are appropriate graphical displays for numerical data?
Boxplot (and whisker)
• Used with numerical data (either discrete or continuous)
• Modified boxplot shows outliers
• Can make comparative boxplots by showing them side-by-side on the same scale
• Good for comparing quartiles, medians, and spread
Why use boxplots?
• ease of construction
• convenient handling of outliers
• construction is not subjective (unlike histograms)
• used with medium or large size data sets (n > 10)
• useful for comparative displays
Why not use boxplots?
• does not retain the individual observations
• should not be used with small data sets (n < 10)
How to construct
• find five-number summary: Min, Q1, Med, Q3, Max
• draw box from Q1 to Q3
• draw median as center line in the box
• extend whiskers to min & max
Modified boxplots
• display outliers
• fences mark off mild & extreme outliers
• whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence: Q1 – 1.5IQR and Q3 + 1.5IQR
Any observation outside this fence is an outlier! Put a dot for the outliers.
Interquartile Range (IQR) – is the range (length) of the box: Q3 – Q1
Modified Boxplot . . . Draw the “whisker” from the quartiles to the most extreme observation that is still within the fence!
Outer fence: Q1 – 3IQR and Q3 + 3IQR
Any observation outside this fence is an extreme outlier!
Any observation between the inner and outer fences is considered a mild outlier.
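The fence rules can be sketched in Python. This is an illustration under one assumption: Q1 and Q3 are found as medians of the lower and upper halves, leaving the overall median out of both halves when n is odd (the method described in the quartile notes; some software uses other quartile rules and can give slightly different fences). `quartiles` and `fence_check` are hypothetical helper names, and the data are the 17 observations from the right-skewed mean & median example later in these notes.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves.

    When n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])


def fence_check(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    extreme = [x for x in sorted(data) if x < outer[0] or x > outer[1]]
    mild = [x for x in sorted(data)
            if (x < inner[0] or x > inner[1]) and x not in extreme]
    # Whiskers stop at the most extreme observations still inside the inner fence
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {"Q1": q1, "Q3": q3, "IQR": iqr, "inner": inner, "outer": outer,
            "mild": mild, "extreme": extreme,
            "whiskers": (min(inside), max(inside))}


data = [22, 29, 28, 22, 24, 25, 28, 21, 25, 23, 24, 23, 26, 36, 38, 62, 23]
summary = fence_check(data)
print(summary)
```

Here Q1 = 23 and Q3 = 28.5, so the inner fence runs from 14.75 to 36.75 and the outer fence from 6.5 to 45: 38 is a mild outlier, 62 is an extreme outlier, and the upper whisker stops at 36.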
the average number of texts sent per month
the Math SAT Score for students at your school
the area code of an individual
the favorite movie type of AP Stat students by gender
the birth weights of female babies born at a large hospital
the number of speeding tickets each student in AP Stat received
Histogram
the number of TVs in the homes of AP Stat students
the color of M&M candies selected at random from a bag
Continuous numerical
the income of adults in your city
Graph
the heights of male students in your school
Type of variable
Variable
Discrete numerical
Categorical Bar graph
Dot Plot
Stem Plot
Discrete numerical
Discrete numerical Dot Plot
Continuous numerical
Histogram
Categorical
Categorical
Bar graph – segmented or double
Bar graph
Discrete numerical
Cumulative frequency plot (ogive)
Histogram
Continuous numerical
Illustrated Distribution Shapes
[Figure: example distributions labeled Unimodal, Bimodal, Multimodal, Symmetric, Skewed positively (right), Skewed negatively (left)]
Measures of Central Tendency
• Median - the middle of the data; 50th percentile
– Observations must be in numerical order
– Is the middle single value if n is odd
– The average of the middle two values if n is even
NOTE: n denotes the sample size
Measures of Central Tendency
• Mean - the arithmetic average
– Use $\mu$ to represent a population mean (a parameter)
– Use $\bar{x}$ to represent a sample mean (a statistic)
Formula: $\bar{x} = \frac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow
Measures of Central Tendency
• Mode – the observation that occurs the most often
–Can be more than one mode
–If all values occur only once – there is no mode
–Not used as often as mean & median
Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd – so find the middle observation.
The median is 4 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. The median is …
2 3 4 6 8 12
The numbers are in order & n is even – so find the middle two observations.
Now, average these two values: $\frac{4 + 6}{2} = 5$
The median is 5 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops, add the observations and divide by n.
$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = \frac{35}{6} \approx 5.833$
What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . $\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 20}{6} \approx 7.17$
What happened?
What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . $\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 50}{6} \approx 12.17$
What happened?
Resistant -
• Statistics that are relatively unaffected by outliers
• Is the median resistant? YES
• Is the mean resistant? NO
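The lollipop example can be replayed with Python's standard `statistics` module. This minimal sketch only recomputes the three samples from the slides; the labels are made up for printing.

```python
from statistics import mean, median

samples = {
    "original": [2, 3, 4, 6, 8, 12],
    "12 becomes 20": [2, 3, 4, 6, 8, 20],
    "20 becomes 50": [2, 3, 4, 6, 8, 50],
}

for label, data in samples.items():
    # The median only looks at the middle two values (4 and 6);
    # the mean uses every value, so the large observation drags it upward.
    print(f"{label}: median = {median(data)}, mean = {mean(data):.2f}")
```

The median stays at 5 in every case while the mean climbs from about 5.83 to 7.17 to 12.17, which is what "resistant" versus "not resistant" means here.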
Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
$\bar{x} = 25.5$
Now find how each observation deviates from the mean. $x - \bar{x}$ is the deviation from the mean.
What is the sum of the deviations from the mean?
$\sum (x - \bar{x}) = 0$
Will this sum always equal zero? YES
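A quick Python check of the deviations, assuming nothing beyond the data on the slide; `deviations_from_mean` is just a hypothetical helper name. The second data set shows that the sum is zero in general, up to floating-point rounding.

```python
def deviations_from_mean(data):
    x_bar = sum(data) / len(data)
    return [x - x_bar for x in data]


data = [22, 23, 24, 25, 25, 26, 29, 30]
devs = deviations_from_mean(data)
print(devs)        # [-3.5, -2.5, -1.5, -0.5, -0.5, 0.5, 3.5, 4.5]
print(sum(devs))   # 0.0

# Any other data set works too, up to floating-point rounding
other = deviations_from_mean([2, 3, 4, 6, 8, 12])
print(abs(sum(other)) < 1e-9)   # True
```

The positive and negative deviations always cancel because the mean is the balance point of the data.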
Look at the following data set. Find the mean & median.
21 23 23 24 25 25 26 26 26 27
27 27 27 28 30 30 30 31 32 32
Create a histogram with the data (use x-scale of 2). Then find the mean and median.
Mean = 27
Median = 27
Look at the placement of the mean and median in this symmetrical distribution.
Look at the following data set. Find the mean & median.
22 29 28 22 24 25 28 21 25
23 24 23 26 36 38 62 23
Create a histogram with the data (use x-scale of 8). Then find the mean and median.
Mean = 28.176
Median = 25
Look at the placement of the mean and median in this right-skewed distribution.
Look at the following data set. Find the mean & median.
21 46 54 47 53 60 55 55 60
56 58 58 58 58 62 63 64
Create a histogram with the data. Then find the mean and median.
Mean = 54.588
Median = 58
Look at the placement of the mean and median in this left-skewed distribution.
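The three mean-versus-median examples can be recomputed in one pass. This sketch uses Python's `statistics` module and the three data sets exactly as listed on the slides; the dictionary labels are illustrative.

```python
from statistics import mean, median

data_sets = {
    "symmetric": [21, 23, 23, 24, 25, 25, 26, 26, 26, 27,
                  27, 27, 27, 28, 30, 30, 30, 31, 32, 32],
    "right-skewed": [22, 29, 28, 22, 24, 25, 28, 21, 25,
                     23, 24, 23, 26, 36, 38, 62, 23],
    "left-skewed": [21, 46, 54, 47, 53, 60, 55, 55, 60,
                    56, 58, 58, 58, 58, 62, 63, 64],
}

for name, data in data_sets.items():
    print(f"{name}: mean = {mean(data):.3f}, median = {median(data)}")
```

In the skewed sets the mean (28.176 and 54.588) is pulled toward the long tail, away from the median (25 and 58).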
Recap:
• In a symmetrical distribution, the mean and median are equal (or approximately equal).
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!
Trimmed mean:
Purpose is to remove outliers from a data set
To calculate a trimmed mean:
• Multiply the % to trim by n
• Truncate (remove) that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean with the shortened data set
Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1
So remove one observation from each side!
$\frac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = \frac{176}{8} = 22$
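The trimmed-mean recipe translates into a few lines of Python. This is a sketch, not a standard library function: `trimmed_mean` is a hypothetical name, and rounding the trim count down with `int()` is an assumption for cases where (trim % × n) is not a whole number, where textbooks differ.

```python
from statistics import mean


def trimmed_mean(data, proportion):
    """Remove int(proportion * n) observations from EACH end, then average.

    Rounding the trim count down is an assumption; conventions vary.
    """
    xs = sorted(data)
    k = int(proportion * len(xs))
    return mean(xs[k:len(xs) - k])


data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 35]
print(trimmed_mean(data, 0.10))  # 22
```

With 10% of n = 10, one value (12 and 35) is dropped from each end, matching the slide's answer of 22.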
Why is the study of variability important?
• Allows us to distinguish between usual & unusual values
• In some situations, want more/less variability
– scores on standardized tests
– time bombs
– medicine
Range: max - min
• Single number – not an interval
• Sensitive to outliers
• Midrange – average of the max and min values - VERY sensitive to outliers
Interquartile Range (IQR): $IQR = Q_3 - Q_1$
Quartiles: The first quartile (Q1) is the value below which 25% of the observations fall. It is the median of the first (lower) half of the set of observations (the 25th percentile).
The third quartile (Q3) is the value below which 75% of the observations fall. It is the median of the second (upper) half of the set of observations (the 75th percentile).
IQR is insensitive (resistant) to outliers.
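To see which measures of spread react to an outlier, this Python sketch compares range, midrange, and IQR for the lollipop data before and after the 12 becomes 50. Quartiles are computed as medians of the lower and upper halves, as described above; `quartiles` and `spread_summary` are hypothetical helper names, and other software may use a different quartile rule.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])


def spread_summary(data):
    q1, q3 = quartiles(data)
    return {
        "range": max(data) - min(data),
        "midrange": (max(data) + min(data)) / 2,
        "IQR": q3 - q1,
    }


small = spread_summary([2, 3, 4, 6, 8, 12])
big = spread_summary([2, 3, 4, 6, 8, 50])
print(small)
print(big)
```

The range jumps from 10 to 48 and the midrange from 7 to 26, while the IQR stays at 5.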
The average of the deviations squared is called the variance.
Population variance: $\sigma^2$ (parameter)
Sample variance: $s^2$ (statistic)
Suppose that we have this population:
24 34 26 30 37 16 28 21 35 29
Find the mean: $\mu$
Find the deviations: $x - \mu$
What is the sum of the deviations from the mean? $\sum (x - \mu)$
Square the deviations: $(x - \mu)^2$
Find the average of the squared deviations:
$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
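The population-variance steps can be followed line by line in Python, with `statistics.pvariance` and `statistics.pstdev` (which divide by n) as a cross-check. The numbers printed are computed from the slide's population; they are not given on the slide itself.

```python
from statistics import pvariance, pstdev

population = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]
n = len(population)

mu = sum(population) / n                    # population mean
deviations = [x - mu for x in population]   # x - mu (these sum to 0)
squared = [d ** 2 for d in deviations]      # (x - mu)^2
sigma_sq = sum(squared) / n                 # divide by n for a population

print(mu, sum(deviations), sigma_sq)
print(pvariance(population), round(pstdev(population), 3))
```

For this population the mean is 28, the squared deviations add to 384, and the variance is 384 / 10 = 38.4 (so σ is about 6.197).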
Degrees of Freedom (df)
• n deviations contain (n - 1) independent pieces of information about variability
Calculation of standard deviation of a sample
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
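A sketch of the sample standard deviation formula in Python, dividing by the degrees of freedom n - 1, and checked against `statistics.stdev`, which uses the same n - 1 divisor. The data set is the eight-value example used earlier for deviations.

```python
import math
from statistics import stdev

data = [22, 23, 24, 25, 25, 26, 29, 30]
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations
s = math.sqrt(ss / (n - 1))                # divide by df = n - 1, not n

print(ss, round(s, 3), round(stdev(data), 3))
```

Here the squared deviations add to 54, so s = √(54 / 7), roughly 2.777.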
When to use what??????
Note: Variance and standard deviation are used to measure spread when the mean is used to describe center.
Note: IQR is typically used to describe spread when the median is used to describe center.
Note: When the distribution is approximately symmetric, the mean and standard deviation are generally used to summarize the distribution. If the distribution is skewed, a five-number summary is generally used.
Linear transformation rule
• When adding a constant to a random variable, the mean changes but not the standard deviation.
• When multiplying a random variable by a constant, both the mean and the standard deviation change (the standard deviation is multiplied by the absolute value of the constant).
An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with a standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation of the repair charges?
$\mu = 30 + 25(1.25) = \$61.25$
$\sigma = 25\left(\frac{1}{3}\right) \approx \$8.33$
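The repair-shop answer follows directly from the two rules. In this Python sketch the charge is modeled as C = 30 + 25T, with T the repair time in hours; the variable names are illustrative.

```python
# Charge C = 30 + 25 * T, where T is the repair time in hours
fee, rate = 30, 25
mean_time, sd_time = 1.25, 1 / 3

mean_charge = fee + rate * mean_time   # the $30 shifts the mean...
sd_charge = abs(rate) * sd_time        # ...but does not affect the spread

print(f"mean = ${mean_charge:.2f}, SD = ${sd_charge:.2f}")
```

This prints a mean of $61.25 and an SD of $8.33; the $30 service call appears only in the mean.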
Rules for Combining two variables
• To find the mean for the sum (or difference), add (or subtract) the two means
• To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root.
• Formulas:
$\mu_{a+b} = \mu_a + \mu_b$
$\mu_{a-b} = \mu_a - \mu_b$
$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$ (if variables are independent)
Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means & standard deviations (in minutes). What are the mean and standard deviation for the total bicycle setup times?
Phase | Mean | SD
Unpacking | 3.5 | 0.7
Assembly | 21.8 | 2.4
Tuning | 12.3 | 2.7
$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes
$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} \approx 3.680$ minutes
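The bicycle setup totals can be checked in Python. The sketch relies on the stated independence of the three phases, which is what allows the variances (not the standard deviations) to be added.

```python
import math

# (mean, SD) in minutes for each independent phase
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (12.3, 2.7),
}

mean_total = sum(m for m, _ in phases.values())
# Independence lets us add the VARIANCES, then take the square root
sd_total = math.sqrt(sum(sd ** 2 for _, sd in phases.values()))

print(f"mean = {mean_total:.1f} min, SD = {sd_total:.3f} min")
```

Adding the three SDs directly would give 5.8 minutes, which overstates the spread of the total.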
Normal Distributions
• Symmetrical bell-shaped (unimodal) density curve
• Above the horizontal axis
• $N(\mu, \sigma)$
• The transition points occur at $\mu \pm \sigma$
• Probability is calculated by finding the area under the curve
• As $\sigma$ increases, the curve flattens & spreads out
• As $\sigma$ decreases, the curve gets taller and thinner
How is this done mathematically?
Normal distributions occur frequently.
• Length of newborn child
• Height
• Weight
• ACT or SAT scores
• Intelligence
• Number of typing errors
• Chemical processes
[Figure: two normal curves, labeled A and B]
Do these two normal curves have the same mean? If so, what is it? YES, 6
Which normal curve has a standard deviation of 3? B
Which normal curve has a standard deviation of 1? A
Empirical Rule
• Approximately 68% of the observations fall within $\sigma$ of $\mu$
• Approximately 95% of the observations fall within $2\sigma$ of $\mu$
• Approximately 99.7% of the observations fall within $3\sigma$ of $\mu$
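Suppose that the height of male students at SHS is normally distributed with a mean of 71 inches and standard deviation of 2.5 inches. What is the probability that the height of a randomly selected male student is more than 73.5 inches?
73.5 = 71 + 2.5 is one standard deviation above the mean, so about 68% of heights fall between 68.5 and 73.5 inches.
The remaining 1 - .68 = .32 is split equally between the two tails: .32 / 2 = .16
P(X > 73.5) = 0.16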
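Python's `statistics.NormalDist` gives exact normal areas, which makes it easy to see that the empirical rule's 68%, 95%, and 99.7% are rounded values. The same tool checks the heights example; the exact answer is about .1587, and the slide's .16 is the empirical-rule approximation.

```python
from statistics import NormalDist

standard = NormalDist()  # mu = 0, sigma = 1

# Exact area within k standard deviations of the mean
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")

# Heights example: X ~ N(71, 2.5); 73.5 is one SD above the mean
heights = NormalDist(mu=71, sigma=2.5)
p_taller = 1 - heights.cdf(73.5)
print(f"P(X > 73.5) = {p_taller:.4f}")
```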
Standard Normal Density Curves
Always has $\mu = 0$ & $\sigma = 1$
To standardize:
$z = \frac{x - \mu}{\sigma}$
Must have this memorized!
Strategies for finding probabilities or proportions in normal distributions
1. State the probability statement
2. Draw a picture
3. Calculate the z-score
4. Look up the probability (proportion) in the table
The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?
1. Write the probability statement: P(X < 220)
2. Draw & shade the curve
3. Calculate z-score: $z = \frac{220 - 200}{15} = 1.33$
4. Look up z-score in table: P(X < 220) = .9082
The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?
$z = \frac{220 - 200}{15} = 1.33$
P(X > 220) = 1 - .9082 = .0918
The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%? P(X > ?) = .05
The area to the left of the cutoff is 1 - .05 = .95. Look up .95 in the table to find the z-score: 1.645
$1.645 = \frac{x - 200}{15}$
$x = 224.675$ hours
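The three battery questions can be sketched with `statistics.NormalDist`: `cdf` plays the role of the calculator's normalcdf with a very small lower bound, and `inv_cdf` plays the role of invNorm. The answers differ slightly from the table values because the table step rounds z to 1.33.

```python
from statistics import NormalDist

battery = NormalDist(mu=200, sigma=15)   # lifetimes in hours

p_less = battery.cdf(220)        # P(X < 220), like normalcdf(-1E99, 220, 200, 15)
p_more = 1 - battery.cdf(220)    # P(X > 220)
top_5 = battery.inv_cdf(0.95)    # 95% of lifetimes fall below this, like invNorm(.95, 200, 15)

print(round(p_less, 4), round(p_more, 4), round(top_5, 2))
```

The exact values are about .9088, .0912, and 224.67 hours, close to the table-based .9082, .0918, and 224.675.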
The heights of the female students at SHS are normally distributed with a mean of 65 inches. What is the standard deviation of this distribution if 18.5% of the female students are shorter than 63 inches? P(X < 63) = .185
What is the z-score for the 63? From the table, z ≈ -0.9
$-0.9 = \frac{63 - 65}{\sigma}$
$\sigma \approx 2.22$ inches
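Working backward from an area to σ can be sketched in Python: find z with `NormalDist().inv_cdf`, then solve $z = \frac{x - \mu}{\sigma}$ for σ. The unrounded z is about -0.897, so this gives σ ≈ 2.23; rounding z to -0.9, as the table step does, gives 2.22.

```python
from statistics import NormalDist

mu, x, area_below = 65, 63, 0.185

z = NormalDist().inv_cdf(area_below)   # about -0.90 (table value: -0.9)
sigma = (x - mu) / z                   # rearranged from z = (x - mu) / sigma

print(round(z, 3), round(sigma, 2))
```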
The heights of female teachers at SHS are normally distributed with mean of 65.5 inches and standard deviation of 2.25 inches. The heights of male teachers are normally distributed with mean of 70 inches and standard deviation of 2.5 inches.
• Describe the distribution of the differences in heights (male – female) of teachers.
Assuming the two heights are independent: normal distribution with $\mu = 70 - 65.5 = 4.5$ & $\sigma = \sqrt{2.5^2 + 2.25^2} \approx 3.3634$
• What is the probability that a randomly selected male teacher is shorter than a randomly selected female teacher?
$z = \frac{0 - 4.5}{3.3634} \approx -1.34$
P(X < 0) = .0901
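Python's `NormalDist` supports subtracting one distribution from another, treating the two as independent, so it can reproduce this example directly. The independence of the two randomly selected teachers is an assumption carried over from the combining rules.

```python
from statistics import NormalDist

male = NormalDist(mu=70, sigma=2.5)
female = NormalDist(mu=65.5, sigma=2.25)

# Subtracting two NormalDist objects treats them as independent:
# mean 70 - 65.5, SD sqrt(2.5**2 + 2.25**2)
diff = male - female

print(diff.mean, round(diff.stdev, 4))
print(round(diff.cdf(0), 4))   # P(male - female < 0)
```

The exact probability is about .0905; the slide's .0901 comes from rounding z to -1.34 before using the table.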
Will my calculator do any of this normal stuff?
• Normalpdf – use for graphing ONLY
• Normalcdf – will find probability of area from lower bound to upper bound
• Invnorm (inverse normal) – will find the z-score (or x-value, if μ and σ are entered) that has a given area to its left
Bivariate data
• x – variable: is the independent or explanatory variable
• y – variable: is the dependent or response variable
• Use x to predict y
$\hat{y} = a + bx$
b – is the slope – it is the approximate amount by which y increases when x increases by 1 unit
a – is the y-intercept – it is the approximate height of the line when x = 0 – in some situations, the y-intercept has no meaning
$\hat{y}$ (y-hat) means the predicted y
Be sure to put the hat on the y
Least Squares Regression Line (LSRL)
• The line that gives the best fit to the data set
• The line that minimizes the sum of the squares of the deviations from the line
Interpretations
Slope: For each unit increase in x, there is an approximate increase/decrease of b in y.
Correlation coefficient: There is a (direction), (strength), linear association between x and y.
Identify as having a positive association, a negative association, or no association.
1. Heights of mothers & heights of their adult daughters: +
2. Age of a car in years and its current value: -
3. Weight of a person and calories consumed: +
4. Height of a person and the person’s birth month: NO
5. Number of hours spent in safety training and the number of accidents that occur: -
Correlation Coefficient (r)
• A quantitative assessment of the strength & direction of the linear relationship between bivariate, quantitative data
• Pearson’s sample correlation is used most
• parameter: $\rho$ (rho)
• statistic: r
$r = \frac{1}{n - 1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
Properties of r (correlation coefficient)
• legitimate values of r are in [-1, 1]
[Figure: scale of r from -1 to 1, marked at -.8, -.5, 0, .5, .8, with labels for strong, moderate, and weak correlation and no correlation]
• value of r is not changed by a change of units (adding a constant to, or multiplying by a positive constant, either variable)
• value of r does not depend on which of the two variables is labeled x
• value of r is non-resistant
• value of r is a measure of the extent to which x & y are linearly related
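The correlation formula, "average" the products of z-scores using n - 1, can be written directly in Python. The data below are hypothetical, chosen only for illustration, and `correlation` is a hypothetical helper name. The last two lines check two of the properties above: swapping x and y, and a change of units (inches to centimeters), leave r unchanged.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))
print(round(correlation(y, x), 4))                      # swap x and y: same r
print(round(correlation([2.54 * v for v in x], y), 4))  # change of units: same r
```

For these data r is about 0.7746 in all three cases.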
Correlation does not imply causation
Interpolation (good): Using a regression line for estimating predicted values between known values.
Extrapolation (bad): It is unknown whether the pattern observed in the scatterplot continues outside the range of the observed x values. The LSRL should not be used to predict y for values of x outside the data set.
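The following statistics are found for the variables posted speed limit and the average number of accidents.
$\bar{x} = 40$, $s_x = 11.6$, $\bar{y} = 18$, $s_y = 8.4$, $r = .9981$
Find the LSRL & predict the number of accidents for a posted speed limit of 50 mph.
$b = r\frac{s_y}{s_x} = .9981\left(\frac{8.4}{11.6}\right) \approx .723$ and $a = \bar{y} - b\bar{x} = 18 - .723(40) = -10.92$
$\hat{y} = -10.92 + .723x$
$\hat{y} = -10.92 + .723(50) = 25.23$ accidents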
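The LSRL can be built from summary statistics alone, using $b = r\frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$. This Python sketch plugs in the speed-limit statistics (x̄ = 40, s_x = 11.6, ȳ = 18, s_y = 8.4, r = .9981). Because it does not round the slope first, it shows an intercept of about -10.91 instead of -10.92; the prediction still rounds to 25.23.

```python
# Summary statistics for the speed-limit example
x_bar, s_x = 40, 11.6     # posted speed limit (mph)
y_bar, s_y = 18, 8.4      # average number of accidents
r = 0.9981

b = r * s_y / s_x          # slope
a = y_bar - b * x_bar      # the LSRL passes through (x_bar, y_bar)
y_hat = a + b * 50         # prediction at 50 mph

print(f"y-hat = {a:.2f} + {b:.3f}x; prediction at 50 mph: {y_hat:.2f}")
```

Since 50 mph lies inside the range of the speed-limit data, this is interpolation rather than extrapolation.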
Residuals (error)
• The vertical deviation between the observations & the LSRL
• the sum of the residuals is always zero
• error = observed - predicted
$\text{residual} = y - \hat{y}$
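A small Python sketch shows residuals summing to zero for a least-squares line. The data are hypothetical, and the slope uses the standard least-squares formula $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, which is equivalent to $b = r\frac{s_y}{s_x}$.

```python
from statistics import mean

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))    # least-squares slope
a = y_bar - b * x_bar                         # line passes through (x_bar, y_bar)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # observed - predicted
print([round(e, 2) for e in residuals])
print(abs(sum(residuals)) < 1e-9)   # True, up to floating-point rounding
```

Here the line is ŷ = 2.2 + 0.6x, and the residuals -0.8, 0.6, 1.0, -0.6, -0.2 cancel out.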
Residual plot
• A scatterplot of the (x, residual) pairs.
• Residuals can be graphed against other statistics besides x
• Purpose is to tell if a linear association exists between the x & y variables
• If no pattern exists between the points in the residual plot, then the association is linear (a linear model is appropriate).
Coefficient of determination
• $r^2$
• gives the approximate proportion of variation in y that can be attributed to a linear relationship between x & y
• remains the same no matter which variable is labeled x
Interpretation of $r^2$
Approximately $r^2$% of the variation in y can be explained by the LSRL of x & y.
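One way to see what $r^2$ measures: compare the variation left over around the LSRL with the total variation in y. This Python sketch uses hypothetical data and computes $r^2 = 1 - \frac{\text{SSE}}{\text{SST}}$, then checks that it equals the square of r.

```python
from statistics import mean

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)     # total variation in y around y_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left around the LSRL

r_squared = 1 - sse / syy
r = sxy / (sxx * syy) ** 0.5

print(round(r_squared, 4), round(r ** 2, 4))
```

Both values come out to 0.6, so about 60% of the variation in y is explained by the LSRL for these data.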
Outlier
• In a regression setting, an outlier is a data point with a large residual
• Influential point: a point that influences where the LSRL is located. If removed, it will significantly change the slope of the LSRL.
Which of these measures are resistant?
• LSRL
• Correlation coefficient
• Coefficient of determination
NONE – all are affected by outliers