Using Descriptive Statistics

https://www.ncsu.edu/labwrite/res/gt/gt-stat-home.html 11/8/2015 10:07 PM


Graphing Resources

Using Error Bars in your Graph

The knowledge that any individual measurement you make in a lab will lack perfect precision often leads a researcher to take multiple measurements at each level of the independent variable. Though no one of these measurements is likely to be more precise than any other, this group of values, it is hoped, will cluster about the true value you are trying to measure. This distribution of data values is often represented by showing a single data point, representing the mean value of the data, and error bars to represent the overall distribution of the data.

Let's take, for example, the impact energy absorbed by a metal at various temperatures. In this case, the temperature of the metal is the independent variable being manipulated by the researcher and the amount of energy absorbed is the dependent variable being recorded. Because there is not perfect precision in recording this absorbed energy, five different metal bars are tested at each temperature level. The resulting data (and graph) might look like this:

For clarity, the data for each level of the independent variable (temperature) has been plotted on the scatter plot in a different color and symbol. Notice the range of energy values recorded at each of the temperatures. At -195 degrees, the energy values (shown in blue diamonds) all hover around 0 joules. On the other hand, at both 0 and 20 degrees, the values range quite a bit. In fact, there are a number of measurements at 0 degrees (shown in purple squares) that are very close to measurements taken at 20 degrees (shown in light blue triangles). These ranges in values represent the uncertainty in our measurement. Can we say there is any difference in energy level at 0 and 20 degrees? One way to answer this question is to use a descriptive statistic, the mean.

The mean, or average, of a group of values describes a middle point, or central tendency, about which data points vary. Without going into detail, the mean is a way of summarizing a group of data and stating a best guess at what the true value of the dependent variable is for that independent variable level. In this example, it would be a best guess at what the true energy level was for a given temperature. The above scatter plot can be transformed into a line graph showing the mean energy values:

Note that instead of creating a graph using all of the raw data, now only the mean value is plotted for impact energy. The mean was calculated for each temperature by using the AVERAGE function in Excel. You use this function by typing =AVERAGE in the formula bar and then putting the range of cells containing the data you want the mean of within parentheses after the function name, like this:
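The same calculation can also be sketched outside Excel. The short Python example below is only an illustration: the function name mean_by_level and the energy values are invented here and are not the data behind the graphs in this article. It computes one mean per temperature level, which is what AVERAGE does for one column of the spreadsheet.

```python
from statistics import mean

# Hypothetical impact energies in joules, five bars per temperature (deg C).
# These values are invented for illustration; they are NOT the article's data.
impact_energy = {
    -195: [1.0, 1.5, 1.5, 2.0, 1.5],
    0: [2.0, 2.5, 3.0, 1.5, 3.5],
}


def mean_by_level(data):
    """Return a dict mapping each independent-variable level to its mean."""
    return {level: mean(values) for level, values in data.items()}


if __name__ == "__main__":
    for level, average in mean_by_level(impact_energy).items():
        print(f"{level} deg C: mean = {average} J")
```

Each dictionary entry plays the role of one spreadsheet column, so adding a temperature level means adding one more entry.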


In this case, the values in cells B82 through B86 are averaged (the mean calculated) and the result placed in cell B87. Once you have calculated the mean for the -195 values, then copy this formula into the cells C87, etc. If you look back at the line graph above, we can now say that the mean impact energy at 20 degrees is indeed higher than the mean impact energy at 0 degrees. However, though you can say that the means of the data you collected at 20 and 0 degrees are different, you can't say for certain the true energy values are different. Can we ever know the true energy values? No, but you can include additional information to indicate how closely the means are likely to reflect the true values. You can do this with error bars.

There are two common ways you can statistically describe uncertainty in your measurements. One is with the standard deviation of a single measurement (often just called the standard deviation) and the other is with the standard deviation of the mean, often called the standard error. Since what we are representing in our graph are the means, the standard error is the appropriate measurement to use to calculate the error bars. While we were able to use a function to directly calculate the mean, the standard error calculation is a little more roundabout. First you have to calculate the standard deviation with the STDEV function. It is used much the same way AVERAGE was:
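For reference, Excel's STDEV function returns the sample standard deviation, which divides by N - 1 rather than N. The symbols below are introduced here for illustration and do not appear in the original: x_i are the individual measurements at one temperature, \bar{x} is their mean, and N is the number of measurements.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```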

The standard error is calculated by dividing the standard deviation by the square root of the number of measurements that make up the mean (often represented by N). In this case, 5 measurements were made (N = 5) so the standard deviation is divided by the square root of 5. By dividing the standard deviation by the square root of N, the standard error grows smaller as the number of measurements (N) grows larger. This reflects the greater confidence you have in your mean value as you make more measurements. You can make use of the square root function, SQRT, in calculating this value:
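As a rough sketch of the same two-step calculation outside Excel, the Python function below divides the sample standard deviation by the square root of N. The function name standard_error and the readings are assumptions made for this example, not part of the original spreadsheet. Python's statistics.stdev, like STDEV, computes the sample standard deviation.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Return the standard error: sample standard deviation / sqrt(N)."""
    n = len(values)
    if n < 2:
        # A standard deviation needs at least two measurements.
        raise ValueError("need at least two measurements")
    return stdev(values) / sqrt(n)


if __name__ == "__main__":
    # Hypothetical readings in joules, invented for illustration only.
    readings = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(f"standard deviation = {stdev(readings):.3f}")
    print(f"standard error     = {standard_error(readings):.3f}")
```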

In words, you can state that, based on five measurements, the impact energy at -195 deg C is 1.4 +/- 0.2 joules. The +/- value is the standard error and expresses how confident you are that the mean value (1.4) represents the true value of the impact energy. Graphically, you can represent this with error bars.
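If you produce such statements from calculated values, a small helper can keep the formatting consistent. The sketch below is illustrative only: the function name format_measurement and its parameters are assumptions, and the values 1.4 and 0.2 are the ones quoted in the text above.

```python
def format_measurement(mean_value, std_error, unit="joules", decimals=1):
    """Format a mean and its standard error as 'mean +/- error unit'."""
    return f"{mean_value:.{decimals}f} +/- {std_error:.{decimals}f} {unit}"


if __name__ == "__main__":
    # Values quoted in the text for -195 deg C.
    print(format_measurement(1.4, 0.2))  # 1.4 +/- 0.2 joules
```

Rounding the mean and the standard error to the same number of decimal places keeps the two values directly comparable.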

With the standard error calculated for each temperature, error bars can now be created for each mean. First click the line in the graph so it is highlighted. Now select Format > Selected Data Series...

Select the Y Error Bars tab and then choose to Display Both (top and bottom error bars). Now click on the Custom button as the method for entering the Error amount. You will want to use the standard error to represent both the + and the - values for the error bars, B89 through E89 in this case. Note: it is critical to highlight the standard error values for all of the temperatures. This way the unique standard error value is associated with each mean. The easiest way to do this is to click on the up arrow button as shown in the figure above. The dialog box will now shrink and allow you to highlight cells representing the standard error values:


When you are done, click on the down arrow button and repeat for the other value cell. When both values are entered, click OK. Your graph should now look like this:

The error bars shown in the line graph above represent a description of how confident you are that the mean represents the true impact energy value. The more the original data values range above and below the mean, the wider the error bars and the less confident you are in a particular value. Compare these error bars to the distribution of data points in the original scatter plot above. Tight distribution of points around 100 degrees - small error bars; loose distribution of points around 0 degrees - large error bars. More precisely, the part of the error bar above each point represents plus one standard error and the part of the bar below represents minus one standard error.
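The link between the spread of the raw data and the length of the error bar can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the function error_bar_limits and both sets of readings are invented and are not the article's data.

```python
from math import sqrt
from statistics import mean, stdev


def error_bar_limits(values):
    """Return (lower, center, upper), where the bar spans the mean
    minus one standard error to the mean plus one standard error."""
    center = mean(values)
    std_error = stdev(values) / sqrt(len(values))
    return center - std_error, center, center + std_error


if __name__ == "__main__":
    # Hypothetical readings in joules, invented for illustration only.
    tight = [10.0, 10.1, 9.9, 10.0, 10.0]  # points cluster closely
    loose = [8.0, 12.0, 9.0, 11.0, 10.0]   # points range widely
    for name, values in (("tight", tight), ("loose", loose)):
        lower, center, upper = error_bar_limits(values)
        print(f"{name}: {lower:.2f} to {upper:.2f} around {center:.2f}")
```

Both sets have nearly the same mean, but the loosely distributed set produces a much longer bar.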

With the error bars present, what can you say about the difference in mean impact values for each temperature? If the upper error bar for one temperature overlaps the range of impact values within the error bar of another temperature, there is a much lower likelihood that these two impact values differ significantly. Therefore, we can say with some confidence that the impact energy at 0, 20, and 100 degrees is significantly greater than at -195 degrees. We can say the same of the impact energy at 100 degrees compared with 0 degrees. However, we are much less confident that there is a significant difference between 20 and 0 degrees or between 20 and 100 degrees. How can we improve our confidence? One way would be to take more measurements and shrink the standard error. However, remember that the standard error shrinks only in proportion to 1 over the square root of N, so it may take quite a few measurements to decrease the standard error noticeably. It is also possible that your equipment is simply not sensitive enough to record these differences or that, in fact, there is no real significant difference in some of these impact values.
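Both ideas in this paragraph can be sketched in Python. The function names and sample numbers below are assumptions introduced for illustration. The overlap check is only the rough visual comparison described above, not a formal significance test, and the estimate of required measurements assumes the standard deviation of the data stays the same as more measurements are added.

```python
from math import ceil


def error_bars_overlap(mean_a, se_a, mean_b, se_b):
    """Return True if the two (mean +/- one standard error) ranges overlap."""
    low_a, high_a = mean_a - se_a, mean_a + se_a
    low_b, high_b = mean_b - se_b, mean_b + se_b
    return low_a <= high_b and low_b <= high_a


def measurements_for_target(current_n, current_se, target_se):
    """Estimate how many measurements are needed to reach target_se.

    Because the standard error is s / sqrt(N), halving it takes about
    four times as many measurements (assuming s does not change).
    """
    return ceil(current_n * (current_se / target_se) ** 2)


if __name__ == "__main__":
    # Hypothetical means and standard errors, invented for illustration.
    print(error_bars_overlap(10.0, 1.0, 11.5, 1.0))
    print(error_bars_overlap(1.0, 0.5, 5.0, 0.5))
    print(measurements_for_target(5, 1.0, 0.5))
```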

If you are also going to represent the data shown in this graph in a table or in the body of your lab report, you may want to refer to the resources on significant digits and designing tables.

© Copyright NC State University 2004
Sponsored and funded by National Science Foundation (DUE-9950405 and DUE-0231086)

Site design by Rosa Wallace

Rev. RW 5/16/05
