Analysis. Start with describing the features you see in the data.
Starting point
Overlap:
I notice there is a lot of overlap between the boys’ and girls’ foot lengths.
Starting point
Shift:
The boys’ foot lengths are shifted further up the scale than the girls’ foot lengths.
Starting point
Unusual features:
There is an unusually low foot length of 15 cm in the girls’ data. I suspect that this value is a mistake, as it seems too low in comparison with the other data for girls.
OR: One of the girls has a recorded foot length far shorter than any other girl’s.
After the initial overall visual non-numerical comparisons:
• Make more detailed comparative descriptions of the features including use of summary statistics and specific observation values where appropriate.
• Reflect and perhaps comment on some of the features using “I wonder . . .” and “I expect . . .” type statements, i.e., comment on any inferential thoughts.
Comparative descriptions of the features including use of summary statistics
The median foot length of the boys (25cm) is 3cm longer than the median foot length of the girls (22cm).
The mean foot length of the boys (25.5cm) is 2.1cm longer than the mean foot length of the girls (23.4cm).
The range of foot lengths for the boys (9cm) is the same as the range of foot lengths for the girls (9cm) if we ignore the unusual value.
Also the interquartile range for the foot lengths for the boys (3cm) is the same as for the girls (3cm).
The most common foot length for the boys was 25cm, but for the girls it was 22cm and 23cm.
On each of these measures of centre (median, mean and most common value), the boys’ foot lengths are higher than the girls’ by about 2 to 3cm.
The median foot length for the boys is the same as the upper quartile (UQ) for the girls (25cm).
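The summary statistics compared above can be computed with a short sketch like the one below. The two lists are hypothetical foot lengths invented for illustration, not the data behind this transcript, and the quartile rule (medians of the lower and upper halves) is an assumption; other quartile conventions give slightly different values.

```python
from statistics import mean, median, multimode


def quartiles(values):
    """Return (LQ, UQ) as the medians of the lower and upper halves.

    Assumption: when the sample size is odd, the middle value is left
    out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)


def summarise(values):
    """Collect the summary statistics used in the comparison statements."""
    lq, uq = quartiles(values)
    return {
        "median": median(values),
        "mean": mean(values),
        "range": max(values) - min(values),
        "iqr": uq - lq,
        "modes": multimode(values),  # every most common value
    }


# Hypothetical foot lengths in cm, invented for illustration only.
boys = [22, 24, 24, 25, 25, 25, 26, 27, 28, 30]
girls = [20, 21, 22, 22, 23, 23, 24, 25, 26, 27]

boys_summary = summarise(boys)
girls_summary = summarise(girls)
median_difference = boys_summary["median"] - girls_summary["median"]
```

Each comparative statement then pairs a boys’ value with the matching girls’ value, for example the two medians and their difference.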
Make comparisons
• Between the groups (e.g., overlap, shift, spread and shape statements)
• Within each group (e.g., unusual observations)
Overlap
Be aware of sampling variation:
• Sampling alone can produce shifts.
• These shifts are small in large samples.
• They can be large in small samples.
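A small simulation can illustrate these points. The sketch below draws two samples from the same hypothetical population and measures how far apart their medians land. The normal population with mean 24cm and standard deviation 2cm is an assumption made purely for illustration; it does not come from the data.

```python
import random
from statistics import median


def typical_shift(sample_size, trials=500, seed=1):
    """Average gap between the medians of two samples from ONE population.

    Any gap here is produced by sampling alone, because both samples
    come from the same (hypothetical) population.
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        # Assumed population: normal, mean 24 cm, standard deviation 2 cm.
        sample_a = [rng.gauss(24, 2) for _ in range(sample_size)]
        sample_b = [rng.gauss(24, 2) for _ in range(sample_size)]
        total_gap += abs(median(sample_a) - median(sample_b))
    return total_gap / trials


shift_small_samples = typical_shift(10)
shift_large_samples = typical_shift(400)
print(f"samples of 10:  typical shift {shift_small_samples:.2f} cm")
print(f"samples of 400: typical shift {shift_large_samples:.2f} cm")
```

The typical shift for samples of 10 should come out noticeably larger than for samples of 400, even though there is no real difference between the groups.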
There is some overlap of the boxes, but the median of the girls’ foot lengths is outside the boys’ box and the median of the boys’ foot lengths is the same as the UQ of the girls’ foot lengths.
OR: There is some overlap between the middle 50% of the boys’ right foot lengths and the middle 50% of the girls’ right foot lengths.
Shift
The boys’ values are shifted to the right of the girls’ values: the boys’ minimum, LQ, median, UQ and maximum foot lengths are all higher than the girls’.
The middle 50% of the boys’ foot lengths (the box) is shifted much further along the scale than the middle 50% of the girls’ foot lengths.
Spread
The spread of the boys’ foot lengths is the same as the spread of the girls’ foot lengths, i.e., the range is 9cm in both cases (ignoring the unusual value in the girls’ data) and the IQR is 3cm for both.
The middle 50% of boys have a right foot measuring between 24cm and 27cm (IQR = 3cm), whereas the middle 50% of the girls are between 22cm and 25cm (IQR = 3cm). This means that the foot lengths for these boys vary by about the same amount as these girls’ do.
I expect that the boys’ and girls’ foot length distributions back in the two populations have similar variability.
Note: The range should not be used, as it tends to be an unstable estimate of the population spread.
The range is highly likely to vary greatly from sample to sample for samples of these sizes.
The range is also prone to be severely affected by the occasional extreme observation.
This is why we use other more resistant measures of spread such as the IQR.
The IQR is not disturbed by the presence of a few very large or very small observations.
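The difference in resistance can be seen in a short sketch. The foot lengths below are hypothetical values invented for illustration, and the quartile rule (medians of the lower and upper halves, leaving out the middle value when the sample size is odd) is an assumption.

```python
from statistics import median


def value_range(values):
    """Distance from the smallest to the largest observation."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range, using medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_quartile = median(ordered[:half])
    upper_quartile = median(ordered[len(ordered) - half:])
    return upper_quartile - lower_quartile


# Hypothetical foot lengths in cm, invented for illustration only.
typical = [20, 21, 22, 22, 22, 22, 23, 23, 24, 25, 25, 25, 26, 27]
with_extreme_value = typical + [15]  # add one unusually small foot length

print("range:", value_range(typical), "->", value_range(with_extreme_value))
print("IQR:  ", iqr(typical), "->", iqr(with_extreme_value))
```

For these made-up values, the single extreme observation stretches the range by 5cm while the IQR stays the same. With other data the IQR may move a little, but far less than the range.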
From the dot plot: Some of the boys have longer right feet than some of the girls, and vice versa.
Shape
The shape of the distributions is not clear from the dot plots, but each appears to be unimodal, as would be expected, and maybe slightly skewed to the right, as indicated by the box plots.
To get a more accurate view, we would need to increase the sample size.
OR:
The sample distribution for the boys’ foot lengths is roughly symmetrical with a mound around 24 to 27cm, i.e., unimodal.
The sample distribution for the girls’ foot lengths shows a large mound around 22 to 24cm.
I wonder if the boys’ and girls’ foot length distributions back in the two populations are roughly symmetric and unimodal. I expect so for a body measurement such as foot length, for both girls and boys.
Unusual value
I notice one of the girls has a foot length (15cm) far smaller than any other girl’s.
I worry that this may be a mistake. It could be a measurement or recording mistake or perhaps this girl is much younger than 13 years. I wouldn’t expect a 13 year-old girl to have a foot size this small. I need to check her other measurements such as age, height etc. to further investigate this extreme value.
Gaps and clusters
I notice the dots are stacked on whole numbers. This is because the foot lengths are measured to the nearest cm.
There is a gap in the girls’ group at 28cm and gaps in the boys’ group at 22cm and 29cm.
The boys’ and girls’ foot length distributions back in the two populations would not have gaps at these same values. The gaps appear in the sample because the sample is small.
Sampling
If a new random sample of 24 13-year-old boys and a new random sample of 22 13-year-old girls were taken, I would expect the plots to look different because of sampling variability. With these sample sizes, I would expect each IQR to change slightly and each box to sit slightly further down or up the scale.
I wonder:
• if I repeated this sampling process many times, the boys’ foot lengths would, just about always, be shifted further up the scale than the girls’
• if boys tend to have a greater foot length than girls back in the two populations
• if the median foot length of boys really is greater than that of girls back in the two populations
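The first of these questions can be explored with a simulation, sketched below under loud assumptions. The two populations are hypothetical: each is taken to be normal with a standard deviation of 2cm, with the means set to the sample means quoted earlier (25.5cm for boys, 23.4cm for girls). Real populations may differ, so the result only shows what repeated sampling would look like if those assumptions held.

```python
import random
from statistics import median


def share_boys_median_higher(trials=2000, seed=7):
    """Fraction of repeated samples in which the boys' median is higher."""
    rng = random.Random(seed)
    times_higher = 0
    for _ in range(trials):
        # Assumed populations (hypothetical): normal, SD 2 cm.
        # Values are rounded to the nearest cm, as in the sample.
        boys = [round(rng.gauss(25.5, 2)) for _ in range(24)]
        girls = [round(rng.gauss(23.4, 2)) for _ in range(22)]
        if median(boys) > median(girls):
            times_higher += 1
    return times_higher / trials


share = share_boys_median_higher()
print(f"boys' median was higher in {share:.1%} of repeated samples")
```

A share close to 100% would match the expectation that the boys’ foot lengths are just about always shifted further up the scale.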
Conclusion
I notice that the distance between the medians is greater than 1/3 of the “overall visible spread”.
I am going to claim that the right foot lengths of 13 year-old New Zealand boys tend to be longer than the right foot lengths of 13 year-old New Zealand girls back in the two populations. I am prepared to make this call because, in my data, the distance between the boys’ and the girls’ median foot lengths is big relative to the overall visible spread. To make this call, with sample sizes of around 30, the difference between the two foot length medians needs to be more than about 1/3 of the overall visible spread. This is true for my data.
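The check behind this call can be written out as a small sketch. The box values are the ones quoted in this analysis (boys: LQ 24, median 25, UQ 27; girls: LQ 22, median 22, UQ 25). The transcript does not define “overall visible spread”, so the code assumes it is the distance from the lowest lower quartile to the highest upper quartile across the two boxes. The function name is hypothetical.

```python
def median_gap_check(box_a, box_b, threshold=1 / 3):
    """Compare the gap between medians with the overall visible spread.

    Each box is (lower quartile, median, upper quartile). Assumption:
    the overall visible spread runs from the lowest LQ to the highest UQ.
    """
    lq_a, median_a, uq_a = box_a
    lq_b, median_b, uq_b = box_b
    overall_visible_spread = max(uq_a, uq_b) - min(lq_a, lq_b)
    if overall_visible_spread == 0:
        raise ValueError("boxes have no visible spread")
    gap = abs(median_a - median_b)
    ratio = gap / overall_visible_spread
    return ratio, ratio > threshold


boys_box = (24, 25, 27)   # LQ, median, UQ in cm
girls_box = (22, 22, 25)  # LQ, median, UQ in cm

ratio, can_make_call = median_gap_check(boys_box, girls_box)
print(f"gap / overall visible spread = {ratio:.2f}; make the call: {can_make_call}")
```

Under that assumption the gap of 3cm is 3/5 of the 5cm overall visible spread, which is well above the 1/3 guideline for samples of around 30.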
I don’t believe that the pattern in my data of the boys tending to have longer foot lengths than the girls is just due to who happened to be randomly selected in the girls’ group and who happened to be randomly selected in the boys’ group, i.e., I don’t believe this data pattern has just happened by chance. I am prepared to claim that this pattern in the data is real, i.e., that this pattern persists back in the two populations.
Notes: We use ‘… right foot length …’ because the investigative question asks about the right foot length.
Notes: Using statistics, there is always the possibility that the calls (decisions) that we make are wrong, i.e., we are making calls in the face of uncertainty. For example, we want to make a call on who tends to be taller (back in the two populations), 13 year-old boys or 13 year-old girls. We may make the call that it’s 13 year-old boys when in fact it’s girls who tend to be taller. Or, we may not want to make a call even though boys tend to be taller than girls.
Explanatory
I expected that boys tend to have bigger feet than girls back in the populations, and the information I collected (my data) supports this belief.
I can’t think of any other factor which can explain the difference in foot size other than gender.
Notes: In this explanatory element we ask ourselves if our conclusion makes sense with what we know, i.e., whether our contextual knowledge matches our conclusions.
We must try to think of other factors which may lead to alternative explanations when measuring foot lengths. These suggestions should also be present in the conclusion.