Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student...

12
A Sequence of Activities for Developing Statistical Concepts Christine Franklin & Gary Kader Introduction The Board of Directors of the American Statistical Association (ASA) at its May 2005 meet- ing endorsed the report, “A Curriculum Framework for Pre K-12 Statistics Education.” The develop- ment of this Framework was supported by the ASA though funding of a Strategic Initiative Grant pro- posed by the ASA Advisory Committee on Teacher Enhancement. The Framework is designed to give educators guidance towards developing statistical- ly literate citizens. The writers of the Framework were Christine Franklin, Gary Kader, Denise Mewborn, Jerry Moreno, Mike Perry, Roxy Peck, and Richard Scheaffer. The Framework Model Statistical Problem Solving and the Evolution of Statistical Concepts The Framework presents statistical problem solving as an investigative process that involves four components: (1) Question formulation, (2) Data collection, (3) Data analysis, (4) Interpretation. The Framework stresses the importance of understanding variability in the practice of this process. The formulation of a statistics ques- tion requires an understanding of the difference Number 68 ASA/NCTM Joint Committee on the Curriculum in Statistics and Probability Winter 2006 S tatistics T eacher N etwork The www.amstat.org/education/stn/index.html between a question that anticipates a deterministic answer and a question that anticipates an answer based on data that vary. The anticipation of variability is the basis for understanding this distinction. Data collection designs must acknowledge variability in data and frequently are intended to reduce variabil- ity. An understanding of data collection designs that acknowledge variability is required for effective collec- tion of data. The main purpose of statistical analysis is to give an accounting of the variability in the data. Accounting for variability with the use of distributions is the key idea in the analysis of data. Statistical inter- pretations are made in the presence of variability and must allow for it. Looking beyond the data to make generalizations must allow for variability in the data. Understanding the role of variability in the statisti- cal problem solving process requires maturation in statistical thinking. The beginning student cannot be expected to make all of these linkages. Statistical edu- cation should be viewed as a developmental process, and this report provides a framework for statistical education over three developmental levels, A, B, and C. Although these three levels may parallel grade levels, they are based on development in statistical thinking, not age. Thus, a middle school student who has had no prior experience with statistics will need to begin with Level A concepts and activities before mov- ing to Level B. This holds true for a secondary student as well. If a student hasn’t had Level A and B experi- Also In This Issue… ASA AMERICAN STATISTICAL ASSOCIATION American Statistical Association Letter from the Editor ...................................... 12

Transcript of Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student...

Page 1: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

A Sequence of Activities for Developing Statistical Concepts

Christine Franklin & Gary Kader

IntroductionThe Board of Directors of the American

Statistical Association (ASA) at its May 2005 meet-ing endorsed the report, “A Curriculum Framework for Pre K-12 Statistics Education.” The develop-ment of this Framework was supported by the ASA though funding of a Strategic Initiative Grant pro-posed by the ASA Advisory Committee on Teacher Enhancement. The Framework is designed to give educators guidance towards developing statistical-ly literate citizens. The writers of the Framework were Christine Franklin, Gary Kader, Denise Mewborn, Jerry Moreno, Mike Perry, Roxy Peck, and Richard Scheaffer.

The Framework ModelStatistical Problem Solving and the Evolution of Statistical Concepts

The Framework presents statistical problem solving as an investigative process that involves four components:

(1) Question formulation,

(2) Data collection,

(3) Data analysis,

(4) Interpretation.

The Framework stresses the importance of understanding variability in the practice of this process. The formulation of a statistics ques-tion requires an understanding of the difference

Number 68 ASA/NCTM Joint Committee on the Curriculum in Statistics and Probability Winter 2006

S tatisticsT eacherN etwork

The

www.amstat.org/education/stn/index.html

between a question that anticipates a deterministic answer and a question that anticipates an answer based on data that vary. The anticipation of variability is the basis for understanding this distinction. Data collection designs must acknowledge variability in data and frequently are intended to reduce variabil-ity. An understanding of data collection designs that acknowledge variability is required for effective collec-tion of data. The main purpose of statistical analysis is to give an accounting of the variability in the data. Accounting for variability with the use of distributions is the key idea in the analysis of data. Statistical inter-pretations are made in the presence of variability and must allow for it. Looking beyond the data to make generalizations must allow for variability in the data.

Understanding the role of variability in the statisti-cal problem solving process requires maturation in statistical thinking. The beginning student cannot be expected to make all of these linkages. Statistical edu-cation should be viewed as a developmental process, and this report provides a framework for statistical education over three developmental levels, A, B, and C. Although these three levels may parallel grade levels, they are based on development in statistical thinking, not age. Thus, a middle school student who has had no prior experience with statistics will need to begin with Level A concepts and activities before mov-ing to Level B. This holds true for a secondary student as well. If a student hasn’t had Level A and B experi-

Also In This Issue…

ASAAMERICAN STATISTICAL

ASSOCIATION

AmericanStatisticalAssociation

Letter from the Editor ...................................... 12

Page 2: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network2Winter 2006 2

ences prior to high school, then it is not appropriate to jump into Level C expectations. The learning is more teacher-driven at Level A, but becomes student driven at Levels B and C.

The Framework presents a conceptual structure for statistics education in a two-dimensional model. One dimension is defined by the components of the statisti-cal problem-solving process along with the nature of and the focus on variability. The second dimension is comprised of three developmental levels. In this article we will investigate a problem from the Framework that demonstrates the statistical problem-solving process across the developmental levels and illustrates how statistical concepts evolve over the three levels.

The Problem – What’s your favorite type of popular music?

Almost all students have an interest in popular music. However, there are many different kinds of popular music, not all students like the same kind of music, and students’ tastes in music may change over time. In planning for a class party or a school dance, it would be useful to know which kind of pop-ular music is appealing to most students. Preceding the investigation of this problem at each level, the overall objectives that level will be given followed by the specific recommendations that will be illustrated through the investigation.

LEVEL AObjectives of Level A

Children are surrounded by data. It is in Level A that children need to develop data sense -- an under-standing that data can be used to answer questions and that statistics is involved with collecting data to address the question at hand, organizing and summa-rizing the data in various ways, and interpreting the data to provide an answer to our question. Students at Level A:

• Should have opportunities to generate questions about a particular context (such as their classroom) and determine what data might be collected to answer these questions.

• Should learn how to use basic statistical tools to analyze the data and make informal or casual infer-ences in answering the posed questions.

• Students should develop basic ideas of probability in order to support their later use of probability in drawing inferences at Levels B and C.

The specific Level A recommendations addressed in the favorite-type music investigation as they relate to the statistical solving process are:

1. Formulate the QuestionTeachers help pose questions (Questions in con-texts of interest to the student)

Students distinguish between statistical solution and fixed answer

2. Collect Data to Answer the QuestionStudents conduct a census of the Classroom

3. Analyze the DataStudents compare individual to individual

Students compare individual to a group

Students understand the idea of a distribution

Students describe a distribution

Students use tools for exploring distributions including:

Picture GraphFrequency Table and Bar GraphModal Category

4. Interpret ResultsStudents draw conclusions about their class

Investigation 1: Choosing the Music for the End of the Year Party – Conducting a Census

In planning an end of year party, children at Level A would be interested in the exploring the favorite type of music among students in their class. The class might investigate the question:

What type of music is most popular among students in our class?

This question attempts to measure a characteristic of the children who will attend the party. The charac-teristic, favorite music type, is a categorical variable - each child in that grade would be placed in a par-ticular non-numerical category based on his or her favorite music type. A Level A class might conduct a census of the students in their classroom to gauge what type of music should be played at the party.

A picture graph is one graph that should be intro-duced at Level A to represent the results for each cat-egory. A picture graph uses a picture of some sort of object (such as a guitar) to represent each response to the question. Alternatively, a “dot”, an “X”, or a “stick note” can be used instead of a picture. Each child would put the picture over her/his response. The pic-ture might even include the student’s name so that each student’s preference can be identified from the graph. For example, a child who prefers ‘Country’ would go to the board and place a guitar above the column labeled “Country.” In creating a picture graph, there is a deliberate recording of each element one at a time. The picture graph for data from a first grade class with 24 students is shown in Figure A1.

Notice that a picture graph does not have a verti-cal axis and that the different categories are listed along the horizontal axis. After creating a picture graph, students might use tally marks to count the number of responses for each category.

Page 3: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

3The Statistics Teacher Network Winter 20063The Statistics Teacher Network

Figure A1: Picture Graph of Our Favorite Type of Tunes

| | | Country Pop Rock

Type of Music

At Level A, students should recognize that there is variability in responses or measurements from individual-to-individual, and a picture graph pro-vides an illustration into the nature of this variabil-ity. Young children will naturally identify the value that occurs most frequently in a picture graph. This quantity is called the mode. In the favorite-music example, the most popular type of music was Pop preferred by 12 students, so the mode (or modal category) is pop music. Students could use this information to determine the type of music to play at the end of the year party.

As students develop through Level A, they should progress from representing data in relatively simple ways, such as a picture graph and/or tallying, to representations that are more summative in nature.

For example, the counts from tallying data sum-marized in a picture graph could be summarized in a table. Table A1 is the frequency count table for the favorite-music example data summarized in the Picture Graph in Figure A1.

Table A1—Frequency Count Table

Favorite Frequency or Count Country 8 Pop 12 Rock 4

A frequency bar graph is a graph that illustrates categorical data that has been summarized in a fre-quency count table. A bar graph has two axes (hori-zontal and vertical). The different possible categories are listed along horizontal axis and the vertical axis is scaled to accommodate the frequencies for the dif-ferent categories. Over each category on the horizon-tal axis, a rectangle reaching up to the desired count is drawn. The bar graph below is for the data sum-marized in Table A1 for the favorite-music data.

Note that each student’s individual response is no longer available from these representations. For this reason, a frequency count table and its corresponding bar graph are called summative representations for data. However, using either Table A1 or Figure A2, the number of students who responded in each category can be determined, and students can determine that the most popular type of music was pop music and that the least popular music was rock. This informa-

Figure A2: Bar graph of music preferences

Page 4: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network4Winter 2006

tion can be used by the students to recommend the playing of pop music at the end of year party.

Transitioning from Level A to Level BAs students transition from Level A toward Level B

they should begin to make use of representations of data that require the use of multiplicative or propor-tional reasoning. For example, a picture graph (used in Level A) refers to a graph where a symbol is used to represent one response. A pictograph is a similar display in which one symbol is used to represent more that one response. For example, each guitar over a category might represent the two responses, in which case a “half” a guitar would represent one response. A Level B student should have the math-ematical maturity to interpret a pictograph.

Another standard graph for summarizing categori-cal data is a circle graph or pie chart. The circle for the favorite-music example is shown in Figure A3.

Figure A3: Circle Graph

Notice that this pie graph does not include the counts for each group. However, students should observe that based on this graph, half the students prefer pop music, more that a quarter prefer country and less than a quarter prefer rock. Since a circle graph requires a basic understanding of proportional reasoning, they should only be used as students transition from Level A to Level B statistics.

For many situations it is useful to report not only the count (frequency) of each response but to also give the count relative to the total number of responses. These relative frequencies are often reported as percentages. The relative frequency table reports these percentages and the vertical axis on the corresponding bar graph is scaled to accom-modate these percentages. For example, the relative frequency table and bar graph for the favorite-music example are shown below in Table A2 and the rela-tive frequency histogram is shown is Figure A4.

Table A2—Relative Frequency Table for Favorite Type of Music

Favorite Percentage Country 33% Pop 50% Rock 17%

Expressing counts as percentages is useful in that the percentages allow us to think of having com-parable results for groups of size 100. Once again, as students transition from Level A to Level B, they need experiences that allow them to use proportional or multiplicative reasoning.

LEVEL BObjectives of Level B

Instruction at Level B should build on the statisti-cal base developed at Level A and set the stage for statistics at Level C. Instructional activities at Level B should continue to emphasize the four main com-ponents in the investigative process and should have the spirit of genuine statistical practice. Students who complete Level B should see statistical reason-ing as a process for solving problems through data and quantitative reasoning. At Level B:

• Students become more aware of the statistical ques-tion distinction (a question with an answer based on data that vary versus a question with a determin-istic answer).

• Students make decisions about what variables to measure and how to measure them in order to address the question posed.

• Students use and expand the graphical, tabular and numerical summaries introduced at Level A to investigate more sophisticated problems.

Figure A4: Percentage Bar Graph

Page 5: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

5The Statistics Teacher Network Winter 20065

• Students develop a basic understanding of the role that probability plays in random selection when selecting a sample and in random assignment when conducting an experiment.

• Students investigate problems with more empha-sis placed on possible associations among two or more variables and understand how a more sophisticated collection of graphical, tabular and numerical summaries is used to address these questions.

• Students recognize ways that statistics is used or misused in their world.

The specific Level B recommendations addressed in the favorite-type music investigation as they relate to the statistical solving process are:

1. Formulate Questions Students begin to pose their own questions.

Students address questions involving a group larger than their classroom and begin to recognize the distinction among a population, a census, and a sample.

2. Collect Data Students conduct censuses of two or more class-

rooms.

3. Analyze Data Students expand their understanding of a data

distribution.

Students compare two or more distributions using graphical displays and summary measures.

Students explore association between two vari-ables.

4. Interpret Results Students acknowledge that a sample may not be

representative of a larger population.

Students understand basic interpretations of mea-sures of association.

Students recognize sampling variability in sum-mary measures such as the sample mode and the sample proportion.

Investigation 2: Choosing a Band for the School Dance

In planning for a school dance, a Level B class might investigate the questions:

What type of music is most popular among students at our school?

How do the favorite types of music differ between different classes?

In the favorite-type of music problem from Level A, the class was considered to be the entire popula-tion and data were collected on every member of the population. A similar investigation at Level B would include recognition that one class may not be rep-resentative of the opinions of all students at their school, and Level B students might want to compare the opinions of their class with other classes from their school.

Since class sizes may be different, in order to make comparisons, results should be summarized with relative frequencies. Level B students will see

more emphasis on multi-plicative and proportional reasoning throughout the mathematics curriculum, and they should be com-fortable summarizing and interpreting data in terms of percentages or fractions.

The results from two classes are summarized in the Table B1 using both frequency counts and rela-tive frequency percentages.

Notice that no students at Level B identified “Pop” as their favorite music, but

Class 1 Class 2

Favorite Frequency Relative Frequency Percentage

Favorite Frequency Relative Frequency Percentage

Country 8 33% Country 5 17%

Rap 12 50% Rap 11 37%

Rock 4 17% Rock 14 47%

Total 24 100% Total 30 101%

Table B1—Frequency counts and relative frequency percentages

Figure B1: Comparative bar graphs for music pref-erences

Page 6: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network6Winter 2006 6

many did include “Rap.” The comparative bar graph in Figure B1 compares the percentage of each favor-ite music category for the two classes.

Students at Level B should begin to recognize that there is not only variability from one individual to another within a group, but that there is variability in results from one group to another. This second type of variability is illustrated by the fact that in Class 1 the most popular music is rap music while in Class 2 it is rock music. That is, the mode for Class 1 is rap music, while the mode for Class 2 is rock music.

Connecting Two Categorical VariablesSince rap was the most popular music for the com-

bined two classes, the students might argue for a rap band for the dance. However, more than half of those surveyed prefer either rock or country music. Will these students be unhappy if a rap band is chosen? Not necessarily since many students who like rock (or country) music may also like rap music as well. To investigate this problem, students might explore an additional question.

Do students who like rock music tend to like or dislike rap music?

To address this question, the survey should ask students not only their favorite type of music, but also whether or not they like rap, rock, and country music.

The two-way frequency table (or contingency table) in Table B2 provides a way to investigate possible con-nections between two categorical variables.

Table B2—Two-way frequency table

Like Rap Music?

Yes No Row Totals

Like Rock

Music?

Yes 27 6 33

No 4 17 21

Column Totals 31 23 54

According to these results, of the 33 students who liked rock music, 27 also liked rap music. That is, 82% (27/33) of the students who like rock music also like rap music. Figure B2 shows a segmented bar graph for the data summarized in Table B2. The display pro-vides a visual display of the percentage students who like rap music across those students who like rock music and those students who do not like rock music.

These results indicate that students who like rock music tend to like rap music as well, and that stu-dents who do not like rock tend to not like rap. That is, there appears to be an association (connection) between liking rock music and liking rap music. Once again, notice the use of proportional reasoning in

Figure B2: Segmented Bar Graph

interpreting these results. A similar analysis could be performed to determine if students who like country tend to like or dislike rap music.

Transitioning from Level B to Level CThe results from the two classes summarized

in Table B1 might be combined in order to have a larger sample of students from the entire school. The combined results indicate that rap music was the favorite type of music for 43% of the students, rock music was preferred by 33%, while only 24% of the students selected country music as their favorite. Level B students should recognize that although this is a larger sample, it still may not be representative of the entire population (all students at their school). In statistics, randomness and probability are incor-porated into the sample selection procedure in order to provide a method that is fair and to improve the chances of selecting a representative sample. For example, if the class decides to select what is called a simple random sample of 54 students, then each possible sample of 54 students has the same proba-bility of being selected. This application of probability illustrates one of the roles of probability in statistics. Although Level B students may not actually employ a random selection procedure for investigating this problem, issues related to obtaining representa-tive samples should be discussed at this level. This is especially important as students transition from Level B to Level C and as they begin understand the role that probability and random variability plays in the decision making process in statistics.

As indicated earlier, Level B students should rec-ognize that the value of sample summary measures, such as the proportion of students who like rap music, will vary from one sample to another. If ran-dom samples of the same size are repeatedly select-ed, recognizing and understanding the pattern in variation for the values of the sample proportion that should occur is crucial in making the transition to

Page 7: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

7The Statistics Teacher Network Winter 2006

inferential statistics, the primary focus at Level C. One type of statistical inference relates to conjec-tures (hypotheses) made before the data were col-lected. Suppose in the favorite-music example, a stu-dent says “I think more than 40% of the students in the school like rap music.” Notice that in the sample, 43% prefer rap music, which supports the claim, but since the sample proportion varies from one sample to another, this does not conclusively indicate the more that 40% of all students prefer rap music. The statistical question then is how strong is the sample data in support this claim?

To explore this problem, students might set up a hypothetical population that has 40% successes (likes rap music) and repeatedly simulate taking random samples of size 54 from it. One way to do this is to use a 10-sided die, which should produce the digits 0,1,2,3 about 40% of the time. Each time a 0, 1, 2, or 3 occurs, this corresponds to someone responding “rap”, otherwise, the response is “not rap”. Have each student toss the die 54 times and determine the per-cent of times the digits 0-3 occur. Another way to do this simulation is to use a random number generator or table, which should produce the digits 0,1,2,3 about 40% of the time. So, students might generate 54 ran-dom digits and determine the percent of digits 0-3 that occur. The distribution of percents generated from 100 such simulations will be similar to the one in Figure B3. This distribution is called the empirical sampling distribution of the sample percent. The distribution illustrates the nature of the behavior of the sample percent when random samples of size 54 are repeat-edly selected from a population with 40% successes. The key points here are for students to recognize that:

• the sample proportion varies from one sample to another,

• when selecting random samples, a recognizable pat-tern emerges in the distribution, and

• many samples have 43% percent and even higher percentages occurring.

The final point above is important in the develop-ment of statistical thinking. The fact that 43% or more preferring rap music is quite common in samples of size 54 when the population has only 40% who prefer rap music suggests that the sample evidence support-ing the student’s conjecture that “more than 40% pre-fer rap” is not very strong.

Objectives of Level CLevel C is designed to build on the foundation devel-

oped in Levels A and B. At Level C, additional concepts are developed for the interpretation and the use of sta-tistical methods to answer questions. Level C recom-mendations include:

• Students should be able to formulate questions and determine how data can be collected and analyzed to provide an answer.

• Students should understand what constitutes good practice in conducting a sample survey.

• Students should understand what constitutes good practice in conducting an experiment.

• Students should understand what constitutes good practice in conducting an observational study.

• Students should be able to design and implement a data collection plan for statistical studies, including observational studies, sample surveys, and simple comparative experiments.

• Students should be able to summarize numerical and categorical data using tables, graphical dis-plays, and numerical summary statistics such as the mean and standard deviation.

• Students should understand how sampling distri-butions (developed through simulation) are used to describe sample-to-sample variability.

• Students should be able to recognize association between two categorical variables.

• Students should be able to describe relationships between two numerical variables using linear regression and the correlation coefficient.

• Students should understand the meaning of statis-tical significance and the difference between statisti-cal significance and practical significance.

• Students should understand the role of P-values in determining statistical significance.

• Students should be able to interpret the margin of error associated with an estimate of a population characteristic.

The specific Level C recommendations addressed in the favorite-type music investigation as they relate to the statistical solving process are:

1. Formulate Questions Students address questions involving a group

Figure B3: Sample Percentages for 100 Samples of Size 54 from a Population with 40% SuccessesLEVEL C

Page 8: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network8Winter 2006

larger than their classroom and begin to recognize the distinction among a population, a census, and a sample.

2. Collect Data Students design and conduct random sample sur-

veys.

3. Analyze Data Students expand their understanding of a data dis-

tribution.

Students compare two or more distributions using graphical displays and summary measures.

Students explore association between two variables.

4. Interpret Results Students understand sampling variability in sum-

mary measures such as the sample proportion.Students understand the role of P-values in deter-mining statistical significance.

Investigation 3: A Survey of Music Preferences

A survey of student music preferences was intro-duced at Level A, where the analysis consisted of making counts of student responses and displaying the data in a bar graph. At Level B the analysis was expanded to consider relative frequencies of prefer-ences between two classes and the responses for two types of music were cross-classified and displayed in a two-way table. In order to be able to generalize to all students at the school, a representative sample of students from a school is needed. This could be accomplished by selecting a simple random sample of 50 students from the school. The results can then be generalized to the school (but not beyond) and the Level C discussion will center on basic principles of generalization, or statistical inference.

A Level C analysis begins with a two-way table of counts that summarizes the data on two of the ques-tions, “Do you like rock music?” and “Do you like rap music?” The table provides a way to examine the responses to each question separately as well as a way to explore possible connections (association) between the two categorical variables. Suppose the survey data resulted in the two-way table shown in Table C1.

Table C1—Two-way frequency table

Like Rock Music?

Yes No Row Totals

Like Rap

Music?

Yes 25 4 29

No 6 15 21

Column Totals 31 19 50

As demonstrated at Level B, there are a variety of ways to interpret data summarized in a two-way table such as Table C1. Some examples based on all 50 stu-dents in the survey include:

25 of the 50 students (50%) liked both rap and rock music.

29 of the 50 students (58%) liked rap music.

19 of the 50 students (38%) did not like rock music.

One type of statistical inference relates to conjec-tures (hypotheses) made before the data were collected. Suppose a student says “I think more than 50% of the students in the school like rap music.” The statistical question then is whether the sample data support this claim or not. One way to arrive at an answer is to set up a hypothetical population that has 50% successes (like even and odd digits produced by a random num-ber generator) and repeatedly take samples of size 50 from it, each time recording the proportion of even dig-its. The empirical sampling distribution of proportions so generated will be similar to the one in Figure C1.

A sample proportion greater than or equal to the observed .58 occurs 12 times out of 100 just by chance when the actual population proportion is only .50. This suggests that the result of .58 is not a very unusual occurrence when sampling from a population with .50 as the “true” proportion of students who like rap music, so the evidence in support of the student’s claim is not strong enough. A population value of .50 (or maybe even something larger) is plausible based on what was observed in the sample. The fraction of times the observed result is matched or exceeded (.12 in this investigation) is called the approximate p-value. The p-value represents the chance of observing a sample result as extreme as the one observed in the sample when the hypothesized value is in fact correct.

Figure C1: Dotplot of Empirical Sampling Distribution

Page 9: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

9The Statistics Teacher Network Winter 2006

A small p-value would have supported the student’s claim, because this would have indicated that if the population proportion was .50, it would have been very unlikely that a sample proportion of .58 would have been observed.

Another type of question we investigated at Level B on student preferences on music is of the form “Do those who like rock music also tend to like rap music?” In other words, is there an association between liking rock music and liking rap music? The same data from the random sample of 50 Level C stu-dents can be used to answer this question.

According to Table C1 a total of 31 students in survey liked rock music. Among those students, the proportion who also like rap music is (25/31) = .81. Among the 19 students who do not like rock music, 4/19 = .21 is the proportion who like rap music. The large difference between these two proportions (.60) suggests that most students who like rock also like rap. There appears to be a strong association between liking rock music and liking rap music.

But could this association simply be due to chance (a consequence only of the random sampling)? If there

were no association between the two groups the 31 students who like rock would behave as a random selection from the 50 in the sample. To simulate this situation we make up a population of 29 1’s (those who like rap) and 21 0’s (those who do not like rap) and mix them together. Then, we select 31 (representing those who like rock) at random and see how many 1’s (those who like rap) we get. It is this entry that goes into the (yes, yes) cell of the table, and from that data the dif-ference in proportions can be calculated. Repeating the process many times produces a simulated sampling distribution for the difference between two proportions, as shown in Figure C2.

The observed difference of .6 was never reached in 100 trials, indicating that the observed difference can-not be attributed to chance alone. There is evidence of a real association between liking rock music and liking rap music.

Summary of the Three InvestigationsAt each of the three maturation levels, the “favorite-

music” scenario has allowed us to demonstrate the sta-

In the music problem investigation, data were col-lected and analyzed on several categorical variables. Many problems require data on numerical variables. For example, an elementary grade level class might be planning an order of long-sleeve tee shirts with the school logo. The shirts come in various sizes and sleeve lengths. To simplify measuring arm length, the manufacturer asks that the arm span (from fin-ger tip to finger tip) be measured in centimeters. In planning for the order, a Level A class might inves-tigate the statistics question: How do the arm spans for students in our class vary?

This question requires data on the numerical variable arm span for each child in the class. A use-ful graph for summarizing numeric data is the dot plot. A dot plot is obtained by scaling a horizontal axis from the minimum to maximum measurements and placing a “dot” or an “X” over each student’s measurement. Figure 5A displays data on the arm spans from a third grade class with 27 students. We will use these data throughout this investigation at Levels B and C.

Figure A5: Dotplot of Arm Spans for 27 Students

When describing a distribution for numeric data, students should focus on three general properties – center, shape, and spread. Additionally, students should identify any interesting features of the distri-bution as well as any unusual observations. Based on the dot plot, these arm spans appear to be cen-tered around 120 cm in a fairly symmetric manner. The arm spans have a wide range of 145 – 92 = 53 cm, indicating a lot of spread or variability in the data. However, 19 of the 27 arm spans are contained in two clusters with fairly narrow ranges. The first cluster (111 cm to 118 cm) has 9 values and the second cluster (122 to 126 cm) has 10 values. There appears to be one usually short arm span (92 cm) and two fairly large arm spans (136 cm and 145 cm). The class could use these results to plan their order for tee shirts.

In transitioning from Level A to Level B explora-tions, the class might seek an explanation for the two clusters in the dot plot. One possible explanation is that boys and girls may have different lengths for arm span, and the class might investigate the ques-tion: Is there a difference in the distributions of arm spans between boys and girls?

In transitioning from Level A to Level B compara-tive dotplots using the same horizontal scale are useful for making comparisons between two or more distributions. Comparative dot plots for the arm span data are given in Figure A6.

continued on next page

SidebarAnother Problem - Purchasing Tee Shirts

Page 10: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network10Winter 2006

Sidebar continued

Figure A6: Comparative Dot Plots for Arm Spans (Boys versus Girls)

Based on these dotplots, there appear to be differ-ences between the two distributions. The center for girls appears to be around 125 cm, higher than the center for boys, around 112 cm. Both groups appear to be reasonably symmetric and the ranges for both groups are similar (Range for Boys is 122-92 = 30; Range for Girls is 145 -111 = 34). Children will often make comparison by dividing the data based on a single point. For example, at the point 120 cm (the approximate center of the aggregated data in Figure A5), only two boys are above 120 cm while eleven girls are above this point. Based on these data, it appears that third grade girls generally have longer arms than third grade boys.

At Level B the comparison between two or more distributions would be based on the Five-Number Summaries and the corresponding boxplots. The Five-Number Summary is a division of the ordered

data into four groups with approximately 25% of the data in each group. The boxplot is a graphical dis-play of the ranges of each of these four groups. The comparative boxplots for the data given in Figures A5 and A6 are shown in Figure B4.

Figure B4: Comparative Dot Plots for Arm Spans (Boys versus Girls)

Comparing boxplots requires proportional reason-ing. In this example, our conclusions would be simi-lar to those made when comparing dotplots.

Notice that in the boxplots, the median arm span for girls (124 cm) is larger than the median armspan for boys (113.5). If these two groups can be consid-ered as random samples, then at Level C we might ask: Is the difference between the median arm span between boys and girls statistically significant? Using simulation, we could investigate how often this large a difference occurs simply due to random variation. ■

tistical problem solving process and the evolution of sta-tistical concepts and statistical thinking. At level A, the question posed is restricted to the classroom with data

collected by taking a census of the classroom. The data is analyzed using a picture graph and tallies and then collectively with a frequency table and a frequency bar graph. The interpretation is focused on comparing indi-vidual to individual variability and individual to group variability within the context of the question posed. No generalizations are expected beyond the classroom.

Transitioning to level B, the question posed is explored beyond the classroom with a random selec-tion process being considered for gathering the data. Using multiplicative and proportional reasoning, level B students are able to transition to pictographs, circle graphs (pie charts) and relative frequency tables and bar graphs for summarizing the data. The interpreta-tion is focused on both comparing within group vari-ability and between group variability. Students make use of comparative relative frequency tables, compara-tive bar graphs, conditional percentages, and segment-ed bar graphs. At level B, students begin to understand the scope of inference is based on the manner in which the data are collected.

Transitioning to level C, students are expected to ask questions that apply beyond their individual class and attempt to generalize to a larger group. Random selec-tion is an expectation at level C, not simply a consid-

Figure C2: Dotplot showing simulated sampling distribution

Page 11: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

11The Statistics Teacher Network Winter 2006

eration when collecting data. Levels A and B focus on interpreting variability through descriptive statistics. At level C, through the use of simulation, students are transitioned to forming empirical sampling distribu-tions and introduced to the concept of sampling vari-ability. The foundation is formed for introducing the key inferential concept of a p-value.

Table 1 summarizes how these examples illustrate the statistical problem solving process across the three maturation levels.

SummaryA good deal of progress has been made in statistics

education in recent years, but there is still plenty of room for improvement. State standards and assess-ments are all over the map and the descriptions for statistics and data analysis are often poorly struc-tured. Textbooks and other teaching materials tend to be unfocused and often include errors (unless these materials have statistics educators as part of the writing team). It is hoped that this report will provide stakeholders such as writers of state standards, writ-

ers of assessment items, educators at teacher prepa-ration programs, curriculum directors, and Pre K-12 teachers with guidance in developing standards and assessments in data analysis. The Framework has already made a positive impact in the state of Georgia with the current revisions of the Georgia state math-ematical standards. ■

References“A Curriculum Framework for Pre K-12 Statistics

Education” (http://www.amstat.org/education/gaise/)

Process Component Level A Level B Level C

Formulate Question Questions restricted to classroom

Questions not restricted to classroom

Questions seek generalization

Collect Data Census of classroom Non-Random Sample Surveys

Begin to discuss random selection

Sample design using random selection

Analyze Data Display variability within a group

Compare individual to individual

Compare individual to group

Quantify variability within a group

Compare group to group (between) variability in displays

Acknowledge sampling error

Some quantification of association

Measure variability within a group

Measure variability between groups

Compare group to group using displays and measures of vari-ability

Describe and quantify sampling error based on a simulated sampling distribution with margin of error and p-values. Understand sampling variabil-ity.

Quantification of association

Interpret Results Do not look beyond the data

No generalization beyond the classroom

Acknowledge that looking beyond the data is feasible

Acknowledge that a sample may or may not be representative of larger population

Basic interpretation of models for association

Look beyond the data in some contexts

Generalize from sample to population

Interpret models for association

Table 1—Problem Solving Across Maturation Levels

Christine Franklin is a faculty member in the Department of Statistics at the University of Georgia and Gary Kader is a faculty member in the Department of Mathematical Sciences at Appalachian State University. They are both actively involved in statistical education at grade levels K-16.

Page 12: Statistical AMERICAN STATISTICAL ASSOCIATION · teacher-driven at Level A, but becomes student driven at Levels B and C. The Framework presents a conceptual structure for statistics

The Statistics Teacher Network (STN) is a free newsletter published by the ASA/NCTM Joint Committee on Curriculum in Statistics and Probability for Grades K-12. After having taken a few months off, STN is alive and well and in a new electronic format! Visit STN issue 72 at www.amstat.org/education/stn. If you want to be informed of when future editions are available online, please subscribe to our new STN email list by sending an email message to [email protected] with subscribe STN-L in the body. You will receive an email confirmation upon completion. Only the list moderator will send announcements to the list when new STN issues are available online.

Thank you for you interest in STN and your continued interest in statistics education for grades K-12. Please inform your colleagues about STN, and we welcome your feedback on the new electronic format. As always, readers are encouraged to submit articles for publication. Please email STN Editor Derek Webb at [email protected]. Jerry Moreno, Chair ASA/NCTM Joint Committee on Curriculum in Probability and Statistics for Grades K-12 [email protected]