Transforming data: Some very valuable tools


Page 1: Transforming data: Some very valuable tools


S-012

Page 2

Transforming scores: Shifting scales can be a big help

Some common transformations:
1. Proportions or percentages
2. Rank order
3. The Z transformation (standardizing)
4. Square root
5. Logarithm

Pages 3–8

1. Raw scores to proportions or percentages

Probably the most common transformation. We do this all the time.

Obs   Raw score (# correct)   Total number of items   Proportion   Percentage
1     3                       15                      .20          20%
2     5                       15                      .33          33%
3     10                      15                      .67          67%
4     15                      15                      1.00         100%
...
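A minimal Python sketch of the arithmetic in the table above (the names `raw_scores`, `total_items`, and `to_proportion` are illustrative, not from the slides): each proportion is the raw score divided by the total number of items, and the percentage is that proportion times 100.

```python
# Raw scores (# correct) for observations 1-4, each out of 15 items.
raw_scores = [3, 5, 10, 15]
total_items = 15


def to_proportion(correct: int, total: int) -> float:
    """Proportion correct = number correct / total number of items."""
    return correct / total


for obs, correct in enumerate(raw_scores, start=1):
    p = to_proportion(correct, total_items)
    # :.2f shows the proportion; :.0% multiplies by 100 and adds a percent sign.
    print(f"{obs}  {correct}/{total_items}  {p:.2f}  {p:.0%}")
```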

Page 9

1. Raw scores to proportions or percentages: Another example

Analyzing conversations at dinner tables:
• Recordings of conversations
• Adjust for length of conversation
• Proportion of turns, or proportion of utterances

Obs   Raw score (# correct)   Incorrect items   Total attempted   Percentage correct
1     10                      10                20                50%
2     5                       5                 10                50%
3     3                       2                 5                 60%
4     15                      30                45                33%
...
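Here the denominator is the number of items attempted (correct + incorrect) rather than a fixed total, in the same spirit as adjusting the dinner-table data for the length of each conversation. A minimal sketch; the names `results` and `percent_correct` are illustrative.

```python
# (correct, incorrect) for observations 1-4 in the table above.
results = [(10, 10), (5, 5), (3, 2), (15, 30)]


def percent_correct(correct: int, incorrect: int) -> float:
    """Percentage of attempted items that were answered correctly."""
    attempted = correct + incorrect
    return 100 * correct / attempted


for obs, (correct, incorrect) in enumerate(results, start=1):
    print(obs, correct + incorrect, f"{percent_correct(correct, incorrect):.0f}%")
```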

Pages 10–12

2. Transforming to ranks

Obs   Pages read   Rank (high to low)   Rank (low to high)
1     15           4                    1
2     200          2                    3
3     25           3                    2
4     400          1                    4
...

Example: Grade 4 students reading. Number of pages reported in one week.

(Stata likes to rank from lowest to highest.)

Ranking preserves the order, but it ignores the distances between the scores.

Ranking is a very common and very useful transformation.
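A minimal Python sketch of ranking in both directions. The slides do not say how tied scores are handled; this sketch assumes that tied scores share the average of the ranks they occupy, which is a common convention. The function names are illustrative.

```python
def rank_low_to_high(values):
    """Rank from lowest (rank 1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Positions i .. j-1 in sorted order hold tied values.
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg_rank
        i = j
    return ranks


def rank_high_to_low(values):
    """Rank from highest (rank 1) to lowest, by ranking the negated values."""
    return rank_low_to_high([-v for v in values])


pages_read = [15, 200, 25, 400]
print(rank_high_to_low(pages_read))  # [4.0, 2.0, 3.0, 1.0]
print(rank_low_to_high(pages_read))  # [1.0, 3.0, 2.0, 4.0]
```

Notice that 15 and 25 pages differ by 10 while 200 and 400 differ by 200, yet each pair ends up exactly one rank apart: the order survives, the distances do not.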

Pages 13–17

The “Z” transformation: My favorite! The best!

Example: Students’ scores at two different times.

Obs   Time 1   z      Time 2   z
1     10       -1.0   25       -1.0
2     15        0     35        0
3     20       +1.0   30       -0.5
4     16       +0.2   40       +0.5
5     13       -0.4   37       +0.2
...

Time 1: Mean = 15.0, SD = 5.0
Time 2: Mean = 35.0, SD = 10.0

How well did student #1 do at time 1? How about students 2, 3, and so on? How did they do at time 2?

Use the group mean and SD to create z-scores.

The z-scores now help us a lot in comparing individual performance at time 1 and time 2.
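A minimal Python sketch of the calculation: subtract the group mean, then divide by the group SD. The means and SDs are passed in rather than computed, because on the slide they describe the whole group (including the rows marked "..."), not just these five students. The names are illustrative.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many SDs the score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Scores for the five students shown, with the group means and SDs from the slide.
time1 = [10, 15, 20, 16, 13]
time2 = [25, 35, 30, 40, 37]
mean1, sd1 = 15.0, 5.0
mean2, sd2 = 35.0, 10.0

for obs, (x1, x2) in enumerate(zip(time1, time2), start=1):
    z1 = z_score(x1, mean1, sd1)
    z2 = z_score(x2, mean2, sd2)
    print(f"{obs}  z1 = {z1:+.1f}  z2 = {z2:+.1f}")
```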

Page 18

The “Z” transformation formula

There are two versions of the formula.

1. $z = \frac{x - \bar{x}}{s}$
Here we use the sample mean and the sample SD.
• Use the mean and SD of the sample.
• How far is each score from the sample mean?
• How many standard deviations away?

2. $z = \frac{x - \mu}{\sigma}$
Here we use the population mean and the population SD.
• Use the mean and SD of a population.
• How far is each score from the population mean?
• How many standard deviations away?
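A minimal Python sketch of the two versions. It assumes the sample SD uses n - 1 in the denominator (`statistics.stdev`); `statistics.pstdev` divides by n instead. The sample scores and the population values are hypothetical.

```python
import statistics


def z_scores_from_sample(scores: list[float]) -> list[float]:
    """Version 1: standardize each score with the sample mean and sample SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, n - 1 in the denominator
    return [(x - mean) / sd for x in scores]


def z_from_population(x: float, mu: float, sigma: float) -> float:
    """Version 2: standardize one score with a known population mean and SD."""
    return (x - mu) / sigma


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
z = z_scores_from_sample(scores)
print([round(v, 2) for v in z])
# Standardizing with the sample's own mean and SD gives z-scores with mean 0 and SD 1.
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))

print(z_from_population(7, mu=5.0, sigma=2.0))  # hypothetical population values
```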

Page 19

Z transformation example: GRE scores

$z = \frac{x - \mu}{\sigma}$

The old version of the GRE was scaled so that the mean was 500, with a standard deviation of 100 (Mean = 500, SD = 100).
• If a student had a score of 600, how good is that score?
• X = 600, so z = (600 - 500)/100 = 1.0. (One SD above the GRE population mean.)

The new version of the GRE is rescaled so that the mean is 150, with a standard deviation of 9.0 (Mean = 150, SD = 9).
• If a student had a score of 160, how good is that score?
• X = 160, so z = (160 - 150)/9 ≈ 1.1. (A bit more than 1 SD above the GRE population mean.)
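The same arithmetic as a minimal Python sketch, using the means and SDs stated on the slide (the slide's round figures); the function name is illustrative.

```python
def z_from_population(x: float, mu: float, sigma: float) -> float:
    """z = (x - mu) / sigma, using the stated population mean and SD."""
    return (x - mu) / sigma


old_gre = z_from_population(600, mu=500, sigma=100)
new_gre = z_from_population(160, mu=150, sigma=9)
print(f"Old GRE, 600: z = {old_gre:.1f}")  # Old GRE, 600: z = 1.0
print(f"New GRE, 160: z = {new_gre:.2f}")  # New GRE, 160: z = 1.11
```

Two scores on completely different scales (600 and 160) turn out to be almost equally far above their respective means.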

Page 20

The “Z” transformation

My favorite! The best!

• Key idea:
  – How many SDs away from the mean
  – How far from the mean, in SD units
  – What a great idea!
  – Lets us compare things even when we use different tests or different scoring systems

Other examples?

Pages 21–29

The square-root transformation: Often useful when things are positively skewed

Obs   Score (x)   √x
1     4           2
2     9           3
3     1           1
4     25          5
5     64          8
6     49          7
7     4           2
8     9           3
9     1           1
10    100         10

Original scores (x): Mean = 26.6, Median = 9

[Figure: histogram of the original scores, x-axis from 0 to 99]

Mean pulled way up beyond the median. Not “normal” at all. (Not bell-shaped.) Skewed to the right, positively skewed.

• Let’s look at each score and take the square root.
• This will pull in the high scores.

Square-root scores (√x): Mean = 4.2, Median = 3
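A minimal Python sketch that reproduces the summary numbers on these slides with the standard-library `math` and `statistics` modules.

```python
import math
import statistics

scores = [4, 9, 1, 25, 64, 49, 4, 9, 1, 100]
# math.sqrt raises ValueError for negative inputs; these scores are all >= 0.
roots = [math.sqrt(x) for x in scores]

print(f"x:       mean = {statistics.mean(scores):.1f}, median = {statistics.median(scores):.0f}")
print(f"sqrt(x): mean = {statistics.mean(roots):.1f}, median = {statistics.median(roots):.0f}")
# x:       mean = 26.6, median = 9
# sqrt(x): mean = 4.2, median = 3
```

On the original scale the mean is nearly three times the median (26.6 vs 9); after the square root the ratio drops to 1.4 (4.2 vs 3), a sign that the long right tail has been pulled in.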

Pages 30–34

The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)

Here I am using “base 10” logs. The logs (the logarithms) are the exponents:
$10^{1} = 10$
$10^{2} = 100$
$10^{3} = 1000$
$10^{4} = 10000$

Obs   Score (x)   Log(x)
1     10          1
2     100         2
3     1000        3
4     10000       4
5     90          1.95
6     9           0.95
7     50          1.70

What will the log of 90 be? We need the exponent in $10^{?} = 90$. Because 90 lies between 10 and 100, its log lies between 1 and 2:
$10^{1.95} \approx 90$
$10^{0.95} \approx 9$
$10^{1.70} \approx 50$
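A minimal Python sketch using `math.log10`: the log is the exponent, and raising 10 to that exponent gives back the original score.

```python
import math

scores = [10, 100, 1000, 10000, 90, 9, 50]
# Base-10 logs; math.log10 raises ValueError for x <= 0.
logs = [math.log10(x) for x in scores]

for x, lx in zip(scores, logs):
    back = 10 ** lx  # undo the transformation
    print(f"{x:>6}  log10(x) = {lx:.2f}  10**log10(x) = {back:.1f}")
```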

Pages 35–37

The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)

The log transformation has a dramatic effect on the scores. It changes the distances between the scores, and this has a huge effect on the distribution. When scores are spread out widely on the scale (e.g., 10, 100, 1000, etc.), the log pulls in the very high scores, and it can also help to spread out the low scores. This is a very useful and very common transformation (widely used in economics, biology, demography, etc.).

Original scores: Mean = 1608, Median = 90. The low scores (9, 10, 50, 100) are all clustered together at the left side, so we cannot really see them. The large values are far away from the small values.

[Figure: distributions of the original scores and the log scores]

Log scores: High scores pulled in. Lower scores more spread out. The scale has changed.
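A minimal Python sketch comparing the mean and median on both scales. On the original scale the single very large score drags the mean far above the median; on the log scale the two land close together.

```python
import math
import statistics

scores = [10, 100, 1000, 10000, 90, 9, 50]
logs = [math.log10(x) for x in scores]

print(f"Original: mean = {statistics.mean(scores):.0f}, median = {statistics.median(scores)}")
print(f"Log10:    mean = {statistics.mean(logs):.2f}, median = {statistics.median(logs):.2f}")
```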

Page 38

The log transformation: Also often useful when we are studying growth over time

Example: Studying children’s vocabulary growth. How many words are they learning?

Age (months)   Vocabulary
6              3
7              4
8              4
9              5
10             7
11             8
12             10
13             12
14             15
15             18
16             22
17             27
18             33
19             40
20             49
21             59
22             72
23             88
24             108
25             131
26             160
27             195
28             238
29             291
30             355
31             433
32             528
33             644
34             786
35             958
36             1169

During early months, the “scores” (the vocabulary sizes) are low, so they are bunched together.

But at older ages, the growth continues, and so the scores are much more spread out.

The scale changes quite a bit here.
• The early scores are 4, 5, 7, 10, 20.
• The later scores are 400, 600, 1100.
• So this is another example where the log transform may be helpful.
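A minimal Python sketch using the table above: the month-to-month change in raw vocabulary keeps growing, while the change in log10(vocabulary) settles near a constant, which is what steady percentage growth looks like on a log scale. The early steps are noisier because small counts are rounded to whole words.

```python
import math

# Age (months) and vocabulary size, from the table above.
ages = list(range(6, 37))
vocab = [3, 4, 4, 5, 7, 8, 10, 12, 15, 18, 22, 27, 33, 40, 49, 59, 72,
         88, 108, 131, 160, 195, 238, 291, 355, 433, 528, 644, 786, 958, 1169]

log_vocab = [math.log10(v) for v in vocab]

# Month-to-month change on each scale.
raw_steps = [b - a for a, b in zip(vocab, vocab[1:])]
log_steps = [b - a for a, b in zip(log_vocab, log_vocab[1:])]

print("raw steps, first three and last three:", raw_steps[:3], raw_steps[-3:])
print("log10 steps, last five:", [round(s, 3) for s in log_steps[-5:]])
```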

Page 39

The log transformation: Also often useful when we are studying growth over time

Check these graphs.

Original scale: Vocabulary is growing, and it seems to be growing faster and faster!

Log scale: Wait! Now we see that the growth is steady. (Here the growth is 20% per month.) The log transformation is helpful here!

Page 40

The log transformation: Also often useful when we are studying growth over time

Check these graphs.

Growth in vocabulary for two children. (Both growing rapidly!) But the gap is getting larger and larger over time.

The log transformation shows us the difference in the growth rates. (Here the difference is only one percent per month.) But this monthly difference is steady, so it ends up producing a big difference over time.
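A minimal sketch of the point on this slide, with hypothetical numbers (both children start at 3 words; one grows 20% per month, the other 21%). These are not the data behind the graphs. The raw gap widens faster and faster, but the gap between the logs grows by the same small amount every month.

```python
import math

START = 3.0                  # hypothetical starting vocabulary for both children
RATE_A, RATE_B = 0.20, 0.21  # hypothetical monthly growth rates, one point apart


def vocab_at(month: int, rate: float, start: float = START) -> float:
    """Vocabulary after `month` months of steady percentage growth."""
    return start * (1 + rate) ** month


def log_gap(month: int) -> float:
    """Difference between the two children's log10 vocabularies."""
    return math.log10(vocab_at(month, RATE_B)) - math.log10(vocab_at(month, RATE_A))


for m in (0, 10, 20, 30):
    raw_gap = vocab_at(m, RATE_B) - vocab_at(m, RATE_A)
    print(f"month {m:>2}: raw gap = {raw_gap:7.1f}, log10 gap = {log_gap(m):.3f}")
```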

Page 41

Transformations: They can help, but they require lots of thought

1. Percentages are useful
• Very common
• Adjust things to rates rather than simple counts
• Easy to understand

2. Rank ordering
• Preserves the order
• Ignores the distances
• Very common
• Several important statistical tests use the rank order

3. Square-root transformation
• Often useful with count data (days absent, household size)
• When there is positive skew (skewed to the right)
• Pulls in the long tail
• Works with zero and positive values

4. Log transformation
• Useful when scores are spread out over a very wide scale
• When we look at things that change in percentage terms (e.g., growth rates: 1-percent growth, or 5-percent growth)
• Works only with positive values (Sometimes we add a constant so we can use the square-root or log transform.)
• Sometimes harder to interpret
• Very commonly used in economics, biology, ecology, etc.
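A minimal sketch of the “add a constant” idea on hypothetical days-absent counts. Adding 1 is a common choice but an assumption here; the slides do not specify the constant, and the choice affects the transformed values.

```python
import math

days_absent = [0, 0, 1, 2, 3, 5, 8, 20]  # hypothetical count data, including zeros

# The square root is defined at zero, so no constant is needed here.
sqrt_days = [math.sqrt(x) for x in days_absent]

# math.log10(0) raises ValueError, so shift every value up by 1 first (assumed constant).
log_days = [math.log10(x + 1) for x in days_absent]

print([round(v, 2) for v in sqrt_days])
print([round(v, 2) for v in log_days])
```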

Page 42

But the best, most important, most valuable, most versatile, all-around most-cool transformation is . . .

Z

Call it “Zee.” Call it “Zed.” However you pronounce it, it is a great concept.

• How far away? How many SDs away?
• Z is a standard score, on a standard scale.
• Helps us compare results on different tests (different test scales).
• Helps compare results of different studies.
• Helps us judge differences when we are comparing groups.