Biderman's Psychology 201 Handouts

PSY 2010 Lecture Notes Chapter 2 - Frequency Distributions

The Goals of Descriptive Statistics

Characteristics of people: kind, aloof, gregarious, tall, friendly, mean, spacy, etc.

Characteristics of cities: forward-looking, violent, progressive, outdoorsy

Characteristics of cars: fast, economical, hybrid, 8-seat, etc.

Just as there are certain characteristics which seem to "belong" to people or cities or cars, there are a few characteristics which "belong" to collections of numbers and which statisticians feel should be mentioned whenever an attempt is made to describe a collection.

The Big Three Characteristics of collections of numbers

1: Central Tendency (known loosely as “average value”)

Consider the following weights: 230, 260, 305, 195.

Compare them with the following: 115, 120, 105, 94, 110, 115, 100, 90, 85.

Central tendency refers to how large the numbers in a collection are overall, that is, what a typical score in the collection looks like.

The central tendency of the first collection is larger: its scores are generally larger than those in the second.
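For anyone who wants to check this kind of comparison by computer, here is a minimal Python sketch (variable names are illustrative) that computes the mean and the median, two common summaries of central tendency, for each collection.

```python
from statistics import mean, median

# The two collections of weights from the text.
heavier = [230, 260, 305, 195]
lighter = [115, 120, 105, 94, 110, 115, 100, 90, 85]

# Mean and median are two common numeric summaries of central tendency.
for label, weights in [("first", heavier), ("second", lighter)]:
    print(f"{label}: mean = {mean(weights):.1f}, median = {median(weights)}")

# Prints:
# first: mean = 247.5, median = 245.0
# second: mean = 103.8, median = 105
```

Both summaries agree with the eyeball judgment: the first collection's scores are larger.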

2: Variability

Consider the following collection: 150, 155, 158, 160, 153, 156, 152.

Compare it with: 85, 175, 305, 95, 130.

Variability refers to how different the numbers in the collection are from each other.

Small variability: All numbers are approximately the same value.

Large variability: The numbers differ considerably from one another.

Note that the second collection is more variable than the first.
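A similar sketch for variability, using the range (largest score minus smallest score) and the standard deviation, two of the numeric summaries listed in the overview. Python's `statistics.stdev` computes the sample standard deviation (dividing by n - 1); which version the course uses is not stated here, so treat the SD values as illustrative.

```python
from statistics import stdev

# The two collections from the text.
similar = [150, 155, 158, 160, 153, 156, 152]
spread_out = [85, 175, 305, 95, 130]

def value_range(scores):
    """Range: the largest score minus the smallest score."""
    return max(scores) - min(scores)

for label, scores in [("first", similar), ("second", spread_out)]:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{label}: range = {value_range(scores)}, SD = {stdev(scores):.1f}")
```

The second collection has both the larger range (220 vs. 10) and the larger standard deviation.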

3: Shape of the distribution of scores

Shape refers to the way score values are positioned or placed on the number line.

In some distributions, the scores are all piled up on one side or the other.

In others, the scores are piled up in the middle.

Shape will be considered in detail after graphical methods of description have been introduced.

Other Characteristics

We will consider the correlation between paired data later in the course.

Descriptive Techniques

To describe

1) central tendency

2) variability, and

3) shape

Overview (Boldfaced are most important for this class)

1. Tables

Regular frequency distribution

Grouped frequency distribution

Stem and leaf displays

Chapter 2

2. Graphs.

Bar graph

Histogram

Frequency polygon

Dot plot

Scatterplots

3. Numeric summaries

Mean, median, mode

Chapter 14

Chapter 3

Standard deviation, range

Measures of skewness and kurtosis

Correlation Coefficient

Ungrouped Frequency Distributions (Corty pp. 35-36)

Definition: A list of all possible score values, from the largest down to the smallest, along with the frequency of occurrence of each score value, even if that frequency is 0.

Example . . .

Hypothetical responses to a survey item: “Please indicate how much you agree with the following statement on a scale from 1 = Strongly Disagree to 7 = Strongly Agree . . .”

“I think Taylor Swift is the best vocalist, male or female, of this decade.”

Hypothetical answers of 20 university undergraduates

7 4 6 3 3 4 6 4 6 4 5 4 5 1 5 4 6 6 7 4

Ungrouped Frequency Distribution of the Responses

Response                 Frequency   Percentage
7 (Strongly Agree)           2          10.00
6                            5          25.00
5                            3          15.00
4                            7          35.00
3                            2          10.00
2                            0           0.00
1 (Strongly Disagree)        1           5.00
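The construction rule can be sketched in Python. This is a minimal illustration, not an SPSS procedure, and the function name `ungrouped_distribution` is hypothetical. The lowest and highest possible values are passed in explicitly because a value nobody chose (2 here) never appears in the raw data, so it could not be found by looking at the data alone.

```python
from collections import Counter

# The 20 hypothetical survey responses from the text (1-7 agreement scale).
responses = [7, 4, 6, 3, 3, 4, 6, 4, 6, 4, 5, 4, 5, 1, 5, 4, 6, 6, 7, 4]

def ungrouped_distribution(scores, lowest, highest):
    """Rows of (value, frequency, percentage) from highest down to lowest.

    Every possible value is listed, even when its frequency is 0.
    """
    counts = Counter(scores)  # a missing value counts as 0
    n = len(scores)
    return [(value, counts[value], 100 * counts[value] / n)
            for value in range(highest, lowest - 1, -1)]

for value, freq, pct in ungrouped_distribution(responses, 1, 7):
    print(f"{value}  {freq}  {pct:.2f}")
```

The printed rows match the table above, including the 0 row for response 2.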

Same table, annotated . . .

Response                 Frequency   Percentage
7 (Strongly Agree)           2          10.00    <- Largest score value is at the top of the table.
6                            5          25.00
5                            3          15.00
4                            7          35.00
3                            2          10.00
2                            0           0.00    <- Score values with 0 frequency are included in the table.
1 (Strongly Disagree)        1           5.00    <- Smallest score value is at the bottom of the table.

Frequency Distributions from the ATV Data

Whether wearing helmet at time of crash

Value          Frequency
Yes                63
No                344
Unavailable        93

OK, so from the above, we can easily see that most of the people involved in ATV accidents that ended in the hospital (344 of 500) were NOT wearing helmets.

Whether drinking alcohol before time of crash (Start here on 8/27/15)

Value          Frequency
Yes               141
No                296
No info            39
No test            24

From this we can see that most people involved in hospitalization accidents on ATVs were not using alcohol.

Helmet usage by age group

           Age 20 or Younger     Age 21 or Older
Value      Freq     Pct          Freq     Pct
Yes          35     22%            27     12%
No          138     78%           202     88%

We can see that the percentage NOT wearing helmets was greater for older drivers (88% vs. 78%).

Alcohol usage by age group

           Age 20 or Younger     Age 21 or Older
Value      Freq     Pct          Freq     Pct
Yes          13      8%           125     48%
No          156     92%           138     52%

We can easily see that the older drivers were more likely to have been drinking.
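Both age-group tables compute each percentage within its own age group (for example, 125 of the 125 + 138 = 263 older drivers is about 48%), which is what makes groups of different sizes comparable. A minimal Python sketch of that step, using the alcohol frequencies above; the names are illustrative.

```python
# Frequencies from the alcohol-by-age table above.
alcohol_by_age = {
    "Age 20 or Younger": {"Yes": 13, "No": 156},
    "Age 21 or Older": {"Yes": 125, "No": 138},
}

def within_group_percentages(table):
    """Convert each group's frequencies to percentages of that group's total."""
    result = {}
    for group, freqs in table.items():
        total = sum(freqs.values())
        result[group] = {value: 100 * f / total for value, f in freqs.items()}
    return result

for group, pcts in within_group_percentages(alcohol_by_age).items():
    print(group, {value: round(p) for value, p in pcts.items()})
```

Rounded to whole percentages, this reproduces the 8%/92% and 48%/52% in the table.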

The point of these examples is that the questions we asked were easily answered once we’d created frequency distributions. They’d have been much harder to answer with just the raw data.

Grouped Frequency Distribution

Definition: A list of equal-sized score groups, ordered from the group with the largest scores down to the group with the smallest scores, along with the number of scores in each group.

Rules

1) Groups must be equal size.

2) The left label of each interval must be divisible without remainder by the interval width.

Example

ISS: Injury Severity Scores from the ATV data

ISS: A number representing how severe the injury is. Larger scores represent more severe injuries.

More than you ever wanted to know about the ISS . . .

From http://www.trauma.org/archive/scores/iss.html

Injury Severity Score

The Injury Severity Score (ISS) is an anatomical scoring system that provides an overall score for patients with multiple injuries.

Each injury is assigned an Abbreviated Injury Scale (AIS) score and is allocated to one of six body regions (Head, Face, Chest, Abdomen, Extremities (including Pelvis), External). Only the highest AIS score in each body region is used.

The highest AIS scores of the 3 most severely injured body regions are squared and added together to produce the ISS.
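The rule in the previous paragraph can be written as a short Python function. This is a sketch only: the injuries below describe a made-up, hypothetical patient, and the full published scoring rules (see the trauma.org page above) include details this sketch omits.

```python
# Hypothetical patient: AIS scores of each injury, grouped by body region.
injuries = {
    "Head": [3],
    "Face": [1],
    "Chest": [4, 2],        # two chest injuries; only the higher AIS (4) counts
    "Abdomen": [2],
    "Extremities": [3, 1],
    "External": [1],
}

def injury_severity_score(ais_by_region):
    """Square the highest AIS of the 3 most severely injured regions and sum."""
    highest_per_region = [max(scores) for scores in ais_by_region.values() if scores]
    top_three = sorted(highest_per_region, reverse=True)[:3]
    return sum(ais ** 2 for ais in top_three)

print(injury_severity_score(injuries))  # 4**2 + 3**2 + 3**2 = 34
```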

An example of the ISS calculation is shown here:

An attempt to create an Ungrouped Frequency Distribution of all the ISS scores in the ATV data. ARGH!!!!

iss     Frequency   Percent   Valid Percent   Cumulative Percent
75          1          .2          .2                .2
50          1          .2          .2                .4
45          1          .2          .2                .6
43          1          .2          .2                .8
35          1          .2          .2               1.0
34          1          .2          .2               1.2
33          1          .2          .2               1.4
30          1          .2          .2               1.6
29          5         1.0         1.0               2.6
27          2          .4          .4               3.0
26          7         1.4         1.4               4.4
25          8         1.6         1.6               6.0
24          5         1.0         1.0               7.0
22          7         1.4         1.4               8.4
21         13         2.6         2.6              11.0
20          2          .4          .4              11.4
19          6         1.2         1.2              12.6
18          4          .8          .8              13.4
17         25         5.0         5.0              18.4
16          9         1.8         1.8              20.2
14         26         5.2         5.2              25.4
13         30         6.0         6.0              31.4
12          6         1.2         1.2              32.6
11          5         1.0         1.0              33.6
10         36         7.2         7.2              40.8
9          67        13.4        13.4              54.2
8          22         4.4         4.4              58.6
6          13         2.6         2.6              61.2
5          79        15.8        15.8              77.0
4          72        14.4        14.4              91.4
3           1          .2          .2              91.6
2           9         1.8         1.8              93.4
1          33         6.6         6.6             100.0
Total     500       100.0       100.0

This just won’t work. The table will have to be too big.

Problem: The above is not an appropriate regular frequency distribution.

That’s because not all internal score values are listed: values with a frequency of 0 (for example, 23, 15, and 7) are missing.

If ALL the possible score values were listed, here is what it would look like:

ISS   Frequency
75        1
74        0
73        0
72        0
71        0
70        0
69        0
68        0
67        0
66        0
65        0
64        0
63        0
62        0
61        0
60        0
59        0
58        0
57        0
56        0
55        0
54        0
53        0
52        0
51        0
50        1
.
.
(Table continues)

The problem with this is that it’ll be much too tall.

Whenever a table would require more than about 10 lines, the data should be grouped.

A Grouped Frequency Distribution of the ISS Variable

ISS Interval   Frequency   Percent
70-79              1         0.2
60-69              0         0.0
50-59              1         0.2
40-49              2         0.4
30-39              4         0.8
20-29             49         9.8
10-19            147        29.4
0-9              296        59.2
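The grouping step can be checked in Python. This sketch assumes integer scores (so an interval with lower limit 10 and width 10 is labeled 10-19) and uses the ungrouped ISS frequencies from the SPSS table above; the function name `grouped_distribution` is hypothetical.

```python
# Ungrouped ISS frequencies (score: frequency) from the SPSS table above.
iss_counts = {
    75: 1, 50: 1, 45: 1, 43: 1, 35: 1, 34: 1, 33: 1, 30: 1,
    29: 5, 27: 2, 26: 7, 25: 8, 24: 5, 22: 7, 21: 13, 20: 2,
    19: 6, 18: 4, 17: 25, 16: 9, 14: 26, 13: 30, 12: 6, 11: 5, 10: 36,
    9: 67, 8: 22, 6: 13, 5: 79, 4: 72, 3: 1, 2: 9, 1: 33,
}

def grouped_distribution(counts, width):
    """Rows of (lower, upper, frequency, percent), highest interval first.

    Lower limits are multiples of `width`, and every interval between the
    lowest and highest occupied ones is listed, even if its frequency is 0.
    """
    n = sum(counts.values())
    grouped = {}
    for score, freq in counts.items():
        lower = (score // width) * width  # left label divisible by the width
        grouped[lower] = grouped.get(lower, 0) + freq
    lowest, highest = min(grouped), max(grouped)
    rows = []
    for lower in range(highest, lowest - 1, -width):
        freq = grouped.get(lower, 0)
        rows.append((lower, lower + width - 1, freq, 100 * freq / n))
    return rows

for lower, upper, freq, pct in grouped_distribution(iss_counts, 10):
    print(f"{lower}-{upper}  {freq}  {pct:.1f}")
```

The output reproduces the grouped table, including the empty 60-69 interval. Calling it with a width of 5 instead gives 16 groups.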

Extended rules for Grouped Frequency Distributions . . .

1. Number of groups: About 10, although you might have a few more or fewer.

2. Group width of 3, 5, 10, 20, 50, 100, etc.

I suggest that you try to use 3, 5, or 10 as the interval width. A large majority of data sets will fit one of those choices.

3. Group width is the same for ALL groups, including the top group and the bottom group.

4. Left-hand labels are divisible without remainder by the width.

The group width above is 10. 0, 10, 20, 30, 40, 50, 60, 70 are all divisible without remainder by 10.

5. Groups are contiguous.
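Rules 1, 2, and 4 can be combined into a rough way of picking a width. The tie-breaking choice in this sketch, taking the candidate width whose number of groups is closest to 10, is an assumption, not a rule from the text; the function names are hypothetical.

```python
# Candidate widths from rule 2 above.
CANDIDATE_WIDTHS = [3, 5, 10, 20, 50, 100]

def number_of_groups(lowest, highest, width):
    """Groups needed when every left label is a multiple of `width` (rule 4)."""
    return highest // width - lowest // width + 1

def choose_width(lowest, highest, target=10):
    """Assumed heuristic: the candidate whose group count is closest to `target`."""
    return min(CANDIDATE_WIDTHS,
               key=lambda w: abs(number_of_groups(lowest, highest, w) - target))

# ISS scores in the ATV data run from 1 to 75.
width = choose_width(1, 75)
print(width, number_of_groups(1, 75, width))  # 10 8
```

For the ISS data this picks a width of 10 (8 groups), the same width used in the grouped table; a width of 5 would need 16 groups.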

Biderman's 201 Lecture Notes: Frequency Distributions - 8/27/2015

Some example frequency distributions

From the UTC Factbook: www.utc.edu -> About UTC -> Academic & Institutional Research -> Factbook (not Facebook) -> Fact Summary Sheet

Grades awarded in Psychology

Grade    Number   Percent
A         1004      43.0
B          674      28.9
C          422      18.1
D          116       5.0
F          119       5.1

Stem & Leaf Displays (Not in Corty)

Stem & Leaf Display: An ordered representation of scores in which rows (the stems) represent score intervals and the numbers within rows (the leaves) represent individual values.

The most straightforward such display is one representing two-digit scores. In this case, each row corresponds to a first (tens) digit, and within each row, each number is represented by its last (ones) digit.

For example, consider the following two-digit values . . .

24 29 40 58 42 9 15 20 78 90 96 26 10 16 38 46 29 65 82 71 81 45 52 68 49 94

These would be represented in a stem & leaf display as follows . . .

Stem | Leaves
0 | 9
1 | 5 0 6
2 | 4 9 0 6 9
3 | 8
4 | 0 2 6 5 9
5 | 8 2
6 | 5 8
7 | 8 1
8 | 2 1
9 | 0 6 4

Usually, the leaves are ordered from smallest to largest within stems . . .

Stem | Ordered Leaves
0 | 9
1 | 0 5 6
2 | 0 4 6 9 9
3 | 8
4 | 0 2 5 6 9
5 | 2 8
6 | 5 8
7 | 1 8
8 | 1 2
9 | 0 4 6
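A minimal Python sketch that builds the ordered display for two-digit scores (0 to 99). The function name `stem_and_leaf` is hypothetical, and any empty stems are still listed so that the display keeps its number-line spacing.

```python
from collections import defaultdict

# The 26 two-digit scores from the text.
scores = [24, 29, 40, 58, 42, 9, 15, 20, 78, 90, 96, 26, 10,
          16, 38, 46, 29, 65, 82, 71, 81, 45, 52, 68, 49, 94]

def stem_and_leaf(values):
    """Stem = tens digit, leaf = ones digit; leaves sorted within each stem."""
    leaves_by_stem = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest, including any empty ones.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(leaves_by_stem[stem]))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

print("\n".join(stem_and_leaf(scores)))
```

Dropping the `sorted(...)` call would give the unordered version, with leaves in the order the scores were listed.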

More than you ever wanted to know about stem and leaf displays. This page is left in the notes for your reference, but you will not be required to “split” the stems for this class.

For large samples, data analysts will “split” the stems so that the table won’t be too wide.

Consider the following scores

24 29 40 58 42 9 15 20 78 90 96 26 10 16 38 46 29 65 82 71 81 45 52 68 49 94

56 61 74 84 90 88 79 83 86 83 76 75 79 80 75 98 97 93 95 92 81 80 78 94 92 91

Here’s the stem and leaf display for them:

0 | 9
1 | 0 5 6
2 | 4 9 0 6 9
3 | 8
4 | 0 2 6 5 9
5 | 8 2 6
6 | 5 8 1
7 | 8 1 4 9 6 5 9 5 8
8 | 2 1 4 8 3 6 3 0 1 0
9 | 0 6 4 0 8 7 3 5 2 4 2 1

Splitting replaces each stem with two rows that have the same stem label: the first holds all values ending in 0 through 4, and the second holds all values ending in 5 through 9. With split (and ordered) stems, the above display would be:

0 |
0 | 9
1 | 0
1 | 5 6
2 | 0 4
2 | 6 9 9
3 |
3 | 8
4 | 0 2
4 | 5 6 9
5 | 2
5 | 6 8
6 | 1
6 | 5 8
7 | 1 4
7 | 5 5 6 8 8 9 9
8 | 0 0 1 1 2 3 3 4
8 | 6 8
9 | 0 0 1 2 2 3 4 4
9 | 5 6 7 8
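The splitting rule can be sketched the same way. This hypothetical helper sends leaves 0 through 4 to the first row of each stem and leaves 5 through 9 to the second, and it lists empty rows too.

```python
from collections import defaultdict

# The 52 scores from the text.
scores = [24, 29, 40, 58, 42, 9, 15, 20, 78, 90, 96, 26, 10, 16, 38, 46, 29,
          65, 82, 71, 81, 45, 52, 68, 49, 94,
          56, 61, 74, 84, 90, 88, 79, 83, 86, 83, 76, 75, 79, 80, 75, 98, 97,
          93, 95, 92, 81, 80, 78, 94, 92, 91]

def split_stem_and_leaf(values):
    """Two rows per stem: leaves 0-4 on the first row, 5-9 on the second."""
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf >= 5)].append(leaf)
    stems = [value // 10 for value in values]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for upper_half in (False, True):
            leaves = " ".join(str(leaf) for leaf in sorted(rows[(stem, upper_half)]))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

print("\n".join(split_stem_and_leaf(scores)))
```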

Graphic Representation

Bar Graphs / Bar Charts (Corty p. 50)

A graph of bars in which each bar represents a different value and the length of the bar represents frequency of occurrence. Bars do not touch each other.

Used for nominal/categorical data – Gender, Handedness, College major, Religion, Letter grade

Gives same information as a Regular Frequency Distribution.

Mike – Open SPSS and demonstrate this.

Example from Employee Data.sav

Histograms (Corty p. 52)

A graph in which each value or interval of values is represented by a column whose length corresponds to frequency.

Columns may touch.

Mike - Demo this.

Used for Quantitative data: GPA, IQ, Height, Weight

Used for the same kinds of data as a Grouped Frequency Distribution

Example from Employee Data.sav

Frequency Polygons, p. 54 (Start here on 9/2/15)

Classic Positively Skewed Distribution

A graph in which each value is represented by a point on an axis, and frequency of occurrence at each value is represented by the height of a line above the axis.

Note: SPSS’s Line graph does not create this display correctly. SPSS can only create an Idealized Frequency Polygon, created assuming that the data follow a normal distribution. (More on the normal distribution in a later chapter.)

Used for Quantitative data, typically for very large samples.

Idealized Frequency Polygon

Scores of 329 UTC students on a measure of Conscientiousness

Note that the idealized frequency polygon pretty much matches the observed histogram.

ACTComp scores of 4700+ UTC students.

Note that the idealized frequency polygon doesn’t match the histogram of actual scores very well.

This means that the observed data are not distributed as pictured by the idealized curve.

Idealized Frequency Polygon

Dot plots (Not covered in the text)

A graph in which each score is represented by a symbol placed at a location corresponding to the score’s value.

Used for quantitative data, often with paper and pencil.

If two or more scores have the same value, a “pile” of symbols at the location is created.
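A paper-and-pencil-style dot plot can be imitated in plain text. In this sketch (hypothetical function, made-up scores) the plot is turned on its side: each line is one score value, from smallest to largest, and the pile of symbols for that value runs to the right.

```python
from collections import Counter

def text_dot_plot(scores, symbol="*"):
    """One line per value from smallest to largest; one symbol per score.

    Values with no scores still get a line, so gaps on the number line show.
    """
    counts = Counter(scores)
    return [f"{value:>3} {symbol * counts[value]}".rstrip()
            for value in range(min(scores), max(scores) + 1)]

# Hypothetical quiz scores, made up for illustration.
quiz = [6, 7, 7, 8, 8, 8, 9, 9, 10, 4]
print("\n".join(text_dot_plot(quiz)))
```

Passing `symbol="|"` gives the vertical-line style mentioned below.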

Following are dot plots of PSY 1010 test scores of 185 UTC students. (from Sebren data)

Menu sequence . . .

Scores on a Psych 1010 test of 185 students – Argh!! SPSS made the dots too big.

To edit the dot sizes, 1) double-click on the graph, 2) then right-click on a dot, and 3) change the size in the Properties Window. Following is the same plot with smaller dots . . .

Who did better overall on the test – females or males?

Answer: It seems as if males and females did about equally well.

The dots don’t even have to be dots. I use vertical lines for my final grade distributions . . .

Here’s an example of a paper and pencil dot plot.

Box and Whisker Plots (Not in Chapter 2)

A plot in the form of a rectangular box with whiskers at the top and bottom that represents 5 key quantities. The example shown is of scores on a PSY 1010 test.

1) Smallest non-outlying value

2) 25th Percentile

3) 50th Percentile

4) 75th Percentile

5) Largest value

Possible outliers are also marked.
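A sketch of how the quantities in a box and whisker plot might be computed. Two assumptions, since the text does not specify them: percentiles use the default method of Python's `statistics.quantiles`, and possible outliers are scores more than 1.5 interquartile ranges beyond the box (a common convention). Statistical packages, SPSS included, may compute percentiles and flag outliers somewhat differently. The scores are made up.

```python
from statistics import quantiles

def box_plot_summary(scores):
    """Quantities shown in a box and whisker plot (assumed 1.5 x IQR outlier rule)."""
    q1, q2, q3 = quantiles(scores, n=4)   # 25th, 50th, 75th percentiles
    iqr = q3 - q1                         # length of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    return {
        "smallest non-outlying": min(inside),
        "25th percentile": q1,
        "50th percentile": q2,
        "75th percentile": q3,
        "largest non-outlying": max(inside),  # equals the largest value if no high outliers
        "possible outliers": sorted(s for s in scores if s < low_fence or s > high_fence),
    }

# Hypothetical test scores, made up for illustration.
exam = [12, 35, 38, 40, 41, 42, 44, 45, 46, 47, 48, 50]
for name, value in box_plot_summary(exam).items():
    print(f"{name}: {value}")
```

Here the score of 12 falls below the lower fence, so it is flagged as a possible outlier and the bottom whisker stops at 35.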

Box and whisker plots used to compare groups . . .

Eventually you’ll be able to see from such a comparison that

1) there is not a huge difference in “average” performance of males and females on this particular test.

2) Both distributions are slightly negatively skewed. (The bottom whiskers are longer than the top whiskers. See Distribution Shapes, below.)

3) There were more females who did really poorly on the test than males who did really poorly. What’s that about?

Distribution Shapes

Symmetric Distribution

A distribution for which each tail is a mirror image of the other.

Unimodal and Symmetric (US) distributions

The US distribution is the statistician’s favorite. (Who says statisticians aren’t patriotic?)

Most statistical techniques have been developed for such distributions.

A roughly symmetric and unimodal distribution – conscientiousness scores of 300+ students.

Skewed Distributions

Chapter 9

Positively skewed distribution

Long tail points in positive direction

A positively skewed distribution has a long tail in the positive direction.

Example: Income or money-related scores

Dot plots of male and female salaries; box and whisker plots of the same salaries

From the text

Negatively skewed distributions

Long tail points in negative direction

The long tail is in the negative direction.

Examples:

Scores on an easy test.

Faked personality tests

A Dot plot of Conscientiousness scores of 166 students instructed to respond honestly.

Note that the distribution is symmetric.

Dot plot of conscientiousness scores of same 166 students, instructed to “fake good”.

When people are told to fake a personality test, they do a great job of increasing their scores from what they would be without faking.

Many of the students achieved the highest possible Conscientiousness score.

Example of comparison of graphs - Effectiveness of playing Bridge as an enrichment activity

In a school system in the Midwest one hour per week was dedicated as an “enrichment” hour. Students were given a variety of activities in which they could participate.

A group of parents volunteered to teach students to play bridge during the hour.

The parents believed that playing bridge would lead to higher math scores than those of students who did other activities. Among other notables who have endorsed bridge as a worthwhile enrichment activity are Bill Gates, co-founder of Microsoft, and Warren Buffett, successful investor.

The results for the first measurement of math skills after about 6 months of activity

Inspection of the two dot plots suggests that there is no huge difference in math achievement test scores between the Bridge group and the Control group. Statistical tests we’ll learn about later in the semester confirmed that there is no statistically significant difference between the groups.

Conclusion: Playing bridge during the one-hour “enrichment” period did not result in higher math achievement scores than doing other activities.



Figure (Employee Data.sav): bar graph of Count by Employment Category (Clerical, Custodial, Manager).

Figure (Employee Data.sav): histogram of Beginning Salary ($10,000 to $80,000) by Frequency; Mean = $17,016.09, Std. Dev. = $7,870.638, N = 474.
