Measurement Myth Busters 101 Joe Adams, Ph.D.


www.joeadams.net

Things to keep in mind:

All measurement contains error.

All measures are human creations.

All measures require an observer or instrument user.

More things to keep in mind:

Measurement is a discipline.
– What do you see?
– What do you hear?
– How do you look and listen objectively?
– How do you describe/define the observation?

Myth #1: You can’t measure (fill in the blank).

The Best Measures are Simple

Measures are a shorthand for experience or observations.

Knowing your subject matter counts!

If they can do it, you can too!

Don’t be fooled by naysayers.

Gilley’s song inspired four teams of researchers to test his hypothesis!

And he was almost right!

The Beat Goes On…

And so did the research:
– On Attractiveness
– On Mate Selection
– On Stability of Relationships
– On Genetic Cues, etc.
– On a lot of things you really don’t want to know…

Myth #2: It’s all subjective!

“Beauty is in the eye of the beholder!”

The Distorted Cultural Legacy of A.J. Ayer (1910 – 1989)

Language, Truth, and Logic (1936)

The most famous spokesman for the fact/value dichotomy.

Claimed that all statements about values are merely expressions of emotion, with no logical significance.

Also a formidable opponent to Mike Tyson.

Ayer v. Tyson

“[Ayer] taught or lectured several times in the United States, including serving as a visiting professor at Bard College in the fall of 1987. At a party that same year held by fashion designer Fernando Sanchez, Ayer, then 77, confronted Mike Tyson harassing the (then little-known) model Naomi Campbell. When Ayer demanded that Tyson stop, the boxer said: "Do you know who the f*** I am? I'm the heavyweight champion of the world," to which Ayer replied: "And I am the former Wykeham Professor of Logic. We are both pre-eminent in our field. I suggest that we talk about this like rational men". Ayer and Tyson then began to talk, while Naomi Campbell slipped out.” - Wikipedia

TKO – First Round!

Verifiable on Wikipedia

“The fact of twilight does not prevent us from distinguishing between day and night.”

Attributed to Dr. Samuel Johnson (1709-1784)

The Real Issues Are:

Validity and Reliability

Validity – Relevance – Logic

DESIRABLE QUALITIES:
– RELEVANCE: Measures should mean something important to those who use them – performance measures should drive performance!
– PURITY: Measures should deal with a clearly defined domain or dimension of a particular quality.
– REPRESENTATIVENESS: Measures should capture something about a phenomenon without distorting the phenomenon.

Invalid Measures

Tend to obscure reality, not illuminate it.

May lead to erroneous, spurious, or absurd conclusions.

In Application of Measures

Internal Threats to Validity
– Selection – picking facts that fit hypothesis
– History – observations taken at different times
– Maturation Effect – subjects or effects mature
– Repeated Testing – subjects get test-wise
– Instrumentation – “breaks down” or used incorrectly
– Experimental Mortality – people drop out
– Experimenter Bias – creates expectations

Threats to External Validity

Generalizability of results may be limited by:
– TIME – Sample taken on Fat Tuesday!
– SETTING – During the Superbowl.
– PLACES – As they come out of Sugars…
– PEOPLE (SAMPLE) – Inside Sugars…
– OBSERVER – Barney Fife

Threats to External Validity (Continued)

Generalizability of results may be limited by:
– Placebo Effect – MSU Health Plan
– Novelty Effect – Ooo wow!
– Hawthorne Effect – More below.

Summary of Validity Issues

Does the measure capture what you intend it to capture?

Artifacts of measurement

Artifacts of Measuring

Measures that pretend to be one thing, but are actually something else (e.g. pleasing answers).

An artifact might mean that the act of measuring caused something to register that wasn’t there (e.g. questions about non-existent opinions).

The act of measurement disturbs the same reality it is measuring, a problem popularly (if loosely) called the Heisenberg Principle and more precisely known as the observer effect (interviewers may make people self-conscious).

The Hawthorne Effect

Western Electric plant at the Hawthorne Works, outside Chicago in Cicero, Illinois

A series of studies done by Harvard professors between 1924 and 1932.

They were testing hypotheses about working conditions and productivity.

Treatment groups increased productivity regardless of conditions…

Why did they improve?

They felt “special” for being chosen to participate in the experiment.

The experiments spawned the whole Human Relations school of thought in the field of management.

The Rosenthal Effect

Studies done by Robert Rosenthal and Lenore Jacobson (1968/1992).

Also called the Pygmalion Effect.

Observer / Teacher expectations improved student results… more than different “treatments.”

That’s the good news about teaching: It matters.

Reliability – Consistency

DESIRABLE QUALITIES:
– ROBUSTNESS: Measures should work well under a variety of extraneous conditions.
– PRECISENESS: Measures should differentiate between different qualities or gradations.
– SENSITIVITY: Measures should detect change.

Intercoder Reliability

Inter-coder or inter-rater reliability: The results of two or more people correlate with each other on a particular item, using the same scale or instrument.

Problem: They see the same thing looking through the same lenses (but they were drunk).

– In the example from the Girls All Get Prettier at Closing Time, inter-coder reliability on the attractiveness of females typically reaches .90, or 90 percent, depending on how you define reliability. Most research in this area indicates a high degree of consistency from both sexes. Does drinking help?
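The phrase “.90, or 90 percent, depending on how you define reliability” points at two different statistics. A minimal sketch, assuming inter-rater reliability is taken either as a Pearson correlation or as simple percent agreement; the function names and the ratings are made up for illustration and are not from the studies mentioned above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a rater with constant scores has no variance")
    return cov / sqrt(var_x * var_y)


def percent_agreement(x, y):
    """Share of items on which both raters gave exactly the same score."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical ratings on a 1-10 scale (illustration only).
rater_a = [7, 5, 9, 3, 6, 8]
rater_b = [8, 5, 9, 2, 6, 7]

print("correlation:", round(pearson_r(rater_a, rater_b), 2))
print("agreement:  ", percent_agreement(rater_a, rater_b))

# The "they were drunk" problem: a bias shared by both raters
# leaves the correlation untouched, so high reliability says
# nothing about validity.
biased_a = [score + 2 for score in rater_a]
biased_b = [score + 2 for score in rater_b]
print("correlation with shared bias:", round(pearson_r(biased_a, biased_b), 2))
```

The two numbers can differ a lot on the same data, which is why the definition of reliability has to be stated along with the figure.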

Internal Consistency

Internal consistency: The results of one measure correlate with other similar, but different, measures measuring the same thing.

Problem: Error in the measures may be correlated more than the content. It’s the correlation between the measures that is the key to knowing whether the measures are reliable, but that might be a problem:

The observer was drunk again. (GIGO)

Test-retest Reliability

Test-retest reliability: Try measuring the same thing with the same instrument more than once to see if the results are the same.

Problem: The Barney Fife problem – the person using the instrument is part of the instrument (retest won’t catch this).
– Examples: Racial differences between interviewer and subject may shift responses on surveys dealing with race. Male versus female interviewers asking about sexual issues has the same problem.

Split-Half Reliability

Split-half reliability: Use two equivalent forms of a scale to see if they correlate.

Example: Use two different questions in the same survey to measure the same thing. If they are correlated, you’ve demonstrated the reliability of the instrument(s).
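One common way to run the split-half check is to split the items of a single scale into odd- and even-numbered halves and correlate the half scores. The sketch below does that and also applies the standard Spearman-Brown correction, which is an addition not mentioned in the slides; the survey responses are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("one half has no variance")
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses):
    """responses: one list of item scores per respondent.

    Returns (correlation between halves, Spearman-Brown estimate
    for the full-length scale).
    """
    odd_half = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_half = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    if r_half <= -1:
        raise ValueError("Spearman-Brown is undefined when r = -1")
    full_scale = 2 * r_half / (1 + r_half)
    return r_half, full_scale


# Hypothetical 4-item survey, 5 respondents, 1-5 scale (illustration only).
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 1],
]
r_half, full_scale = split_half_reliability(responses)
print("half-to-half correlation:", round(r_half, 2))
print("Spearman-Brown estimate: ", round(full_scale, 2))
```

As with the other reliability checks, a high value only shows that the halves agree with each other, not that either half measures the right thing.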

Half Goofy: The MMPI

The Minnesota Multiphasic Personality Inventory (1952 - )
– It’s the pattern, not the questions alone.
– Different axes (dimensions).
– The Diagnostic and Statistical Manual of Mental Disorders (DSM)

Provides standardized diagnoses.

Describes some treatment protocols

Resources for Testing Validity and Reliability

G. David Garson, Quantitative Research in Public Administration http://www2.chass.ncsu.edu/garson/pA765/reliab.htm

Wikipedia, Validity (Statistics) http://en.wikipedia.org/wiki/Validity_%28statistics%29

Wikipedia, Validity (Logic) http://en.wikipedia.org/wiki/Validity

Myth #3: Madison Avenue is home to the world’s greatest scientific minds (“Data proves (fill in the blank)”).

How often have you heard:

“Scientific research proves….”

Science does not prove, it disproves.

Key things to understand:
– In science, a null hypothesis is rejected or fails to be rejected.
– The outcome of any experiment or statistical comparison counts as only one observation, regardless of the number of data points.
– Different observations at different times may yield different results.
– Eternity is not ours to observe.

Key References

David Hume (1711 – 1776) – Noted that there is nothing logically necessary about the repetition of a pattern continuing in the future.

Ludwig Wittgenstein (1889 – 1951) – Wrote the Tractatus Logico-Philosophicus, which outlines almost all of the rules of scientific endeavor, one of the most important points of which is that the notion of causation is a purely intellectual construction and is never a fact.

Myth #4: The whole is equal to the sum of the parts.

AKA: The Ecological Fallacy

The Level of Measurement Matters

(A Logical Validity Issue)

Levels of Analysis: Examples

Individual – a person, single cell, atom, i.e. the smallest discrete unit.

Group – may meet face-to-face

Organization – does not generally meet face-to-face

State – a geopolitical jurisdiction

Nation – Like Texas y’all.

Aggregate measures cannot generally be used to estimate disaggregated behavior.

Conclusions about individual-level behavior cannot be drawn from aggregate comparisons.

Example: Emile Durkheim’s Study of Suicide.

Just because more Bavarians commit suicide does NOT mean that Catholics are more likely to commit suicide.
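A toy illustration of why aggregate comparisons can mislead about individuals. The numbers are synthetic, built only to make the point, and have nothing to do with Durkheim’s data: within every group the individual-level relationship between x and y is negative, yet the group averages line up in a perfectly positive relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic (x, y) observations for individuals in three groups.
groups = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Individual level: correlation inside each group.
within = {
    name: pearson_r([x for x, _ in rows], [y for _, y in rows])
    for name, rows in groups.items()
}

# Aggregate level: correlation between the group averages.
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in groups.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in groups.values()]
between = pearson_r(mean_x, mean_y)

print("within-group correlations:", within)
print("correlation of group means:", between)
```

Reading the positive aggregate pattern as a statement about individuals would get the sign of the relationship wrong for every person in the data.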

Disaggregated data cannot generally be used to estimate aggregate behavior.

Conclusions about aggregate behavior cannot be drawn from individual-level data.

Example: Hydrogen and Oxygen burn. H2O does not.

Not ALL Texans carry guns and wear cowboy hats.

Not ALL Austinites wear speedos and ride 10-speeds downtown.

Maybe Not?

Gary King (1997). A Solution to the Ecological Inference Problem, Princeton University Press.

Within limits, there may be “probable” statements about inferences between levels. The level of certainty about such statements can be estimated.

http://gking.harvard.edu/stats.shtml

Myth #5: Attitudes indicate behavior.

Attitudes ≠ Behavior

Classic Case:
– LaPiere, Richard T. “Attitudes vs. Actions,” Social Forces, Vol. 13, No. 2 (Dec., 1934), pp. 230-237.

Actual Behavior

Customer Satisfaction?

Case #2 (1983)

Cenaré Italian Cuisine
404 East University Drive
College Station, Texas

The tale of the half-price special!

Dr. Robert A. Peterson

Associate Dean for Research at the University of Texas’ McCombs School of Business

Robert A. Peterson and William R. Wilson (1992). Measuring Customer Satisfaction: Fact and Artifact, Journal of the Academy of Marketing Science, Vol. 20, No. 1, 61-71.

Customer satisfaction surveys may be measuring how many happy people or unhappy people are in the sample, nothing more.

Myth #6: Quantitative data are different than qualitative data.

Developing Measures

“Quantification is merely a second order matching of primary qualities.”

Karl Wolfgang Deutsch (1912-1992)

Develop Powerful Measures!

Three levels of measurement:
– Nominal – The weakest measure
– Ordinal – Mediocre, but not awful.
– Interval/Ratio – The best possible.

Nominal Measures

– Nominal (Categorical) – refers to opaque qualities, color, sex, nationality, groups, etc. Must have no order or rank.

Problem: There might be a hidden order to the measure that is not immediately identifiable, particularly in cases where social status may correlate with other measures (income, education, etc.). The existence of some hidden order is an empirical question that can be tested.

Ordinal Measures

Ordinal Measures – have direction or dimension, with greater and lesser ends to the measure. Likert or Guttman Scales, 7-point, 5-point, but no specific distance between points. Example: Scalding, hot, warm, cool, cold, freezing, etc…

Problem: Survey question construction may prompt an order (preference among candidates). Randomization is a partial remedy.

Interval/Ratio

Interval / Ratio Measures – Most precise kind of measures. They have a constant interval of some kind and admit of degrees or gradations, sometimes referred to as a “common metric.”

Problem: Intervals may not be constant (linear). The measures may hide uneven increments. An example is education in years. A year of college is not equal to a year of elementary school

(unless you went to t.u.)

Develop Powerful Measures!

The more precise the measure, the more powerful the analytical techniques that can be used
– Nominal: Crosstabs, Chi-square
– Ordinal: Tau-b, rank order correlation, etc.
– Interval/ratio: Regression, time-series, etc.

Definitions Precision

The precision of the measure depends on two critical items:
– The quality of the definition, and
– The quality of the data collection system.

Parts of a Good Definition

A clear description of the purpose

A clear description of what the measure is supposed to measure

A clear description of how the measure is to be applied, which includes:
– Every step in the data collection process
– A means for identifying error in the collection process (what the measure is not)

An explanation of how the measure will be used.

“There are no facts, only interpretations.”

Friedrich Nietzsche (1844-1900)

Context Matters

What is the theory, hypothesis, or logic model that makes this measure sensible?

Is the measurement tied to a particular problem?

Is the problem an intellectual/academic question or a practical problem requiring a solution?

What question is the measure supposed to answer?

Some call them Paradigms

Concept popularized by Thomas Kuhn in The Structure of Scientific Revolutions (1962).

The paradigm includes all the methods related to the practice of a scientific endeavor, including the instrumentation and operating assumptions.

Example: “Tell me about your mother…”

http://en.wikipedia.org/wiki/Thomas_Samuel_Kuhn

What is your context?

Why do you need to measure something?
– To test a hypothesis?
– To make decisions about agency operations?
– To calculate cost/benefits?
– To demonstrate effectiveness?
– To understand what is happening?
– To find someone to blame?

Theories that Work!

On Good Theories: On the characteristics of a good theory, see the work of Imre Lakatos, especially his book, The Methodology of Scientific Research Programmes: Philosophical Papers Volume 1 (1977); and Harry G. Frankfurt's On Truth. (See also On Bullshit.)

Good theories exemplify the characteristics of parsimony (simplicity, elegance), explanatory power (apply in a wide variety of situations), robustness (they operate in contaminated environments), and empirical support (fit facts better than others).

Feeling Good…

was good enough for me

and Bobby McGee…

Kris Kristofferson

(b. 1936, Brownsville, Texas)

Flow: The Science of Optimal Experience

by Mihaly Csikszentmihalyi

[Figure: flow diagram with axes Challenges and Skills; regions labeled Anxiety, Boredom, and Flow]

The Good Work Project

Recommended Reading: Martin E.P. Seligman, Authentic Happiness.com (Book Website). See his What You Can Change and What You Can't and The Optimistic Child; also see The Science of Optimism and Hope: Research Essays in Honor of Martin E. P. Seligman. Mihaly Csikszentmihalyi's Flow: The Psychology of Optimal Experience.

Also, see The Good Work Project website for applications of these theories.

Myth #7: Measures have to be exact.

“…it is the mark of an educated man to look for precision in each class of things just so far as the nature of the subject admits...”

- Aristotle, Nicomachean Ethics

Special Cases for Estimation

Measures that estimate ranges and compare proportions across two or more dimensions.

Measures that show relationships, trade-offs, and thresholds.

Measures that show what is not seen, residuals.

Flight Envelope Summarizes

Flight envelopes are estimated from available data which show the following characteristics:
– a: Take-off speed
– b: Stalling speed
– c: Ceiling, with corresponding speed
– d: Maximum level speed
– e: Maximum speed at altitude
– f: Maximum sea level speed

Two-dimensions: Flight Envelope

1. Altitude (expressed in ranges)
2. Speed (expressed in ranges)

Comparing Flight Envelopes

1. Combat helicopter (ex. Boeing AH-64 Apache)
2. Cargo aircraft (ex. Lockheed C-130J)
3. Subsonic transport aircraft (ex. Airbus A-300)
4. Supersonic fighter aircraft (ex. Lockheed F-16C)

http://www.aerodyn.org/Atm-flight/flimit.html

Measuring InequalityMeasuring Inequality

The Lorenz Curve describes any The Lorenz Curve describes any distribution of a quantity across any distribution of a quantity across any population.population.– The Gini coefficient provides a global The Gini coefficient provides a global

estimation of the degree of inequality estimation of the degree of inequality within that population.within that population.

The Gini Coefficient
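One common way to compute the Gini coefficient is as one minus twice the area under the Lorenz curve. The sketch below is illustrative only: the function names and sample values are assumptions, not part of the presentation, and it assumes non-negative values with a positive total.

```python
def lorenz_points(values):
    """Cumulative (population share, quantity share) points, starting at (0, 0).

    Assumes non-negative values whose total is greater than zero.
    """
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule."""
    pts = lorenz_points(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


# Hypothetical populations: everyone equal vs. one member holding everything.
equal_shares = gini([5, 5, 5, 5])
one_holds_all = gini([0, 0, 0, 10])
```

With perfectly equal shares the Lorenz curve is the diagonal and the coefficient is zero; the more the curve sags below the diagonal, the closer the coefficient gets to one.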

Trade-offs
Bounded by a zero point (no trade-off).
Change in A = Change in B = 0

Trade-offs between A & B may occur six ways:
– A increases, B decreases
– B increases, A decreases
– A increases more than B
– B increases more than A
– A decreases more than B
– B decreases more than A
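The six cases and the zero point can be written as a small classifier. This is a sketch under stated assumptions: the function name is hypothetical, and the handling of equal non-zero changes (which the list does not cover) is an assumption of this example.

```python
def classify_tradeoff(change_a, change_b):
    """Label a pair of changes using the six ways listed on the slide."""
    if change_a == 0 and change_b == 0:
        return "zero point (no trade-off)"
    if change_a > 0 and change_b < 0:
        return "A increases, B decreases"
    if change_b > 0 and change_a < 0:
        return "B increases, A decreases"
    if change_a == change_b:
        # Assumption: equal non-zero changes are outside the six listed ways.
        return "equal change (not one of the six listed ways)"
    if change_a >= 0 and change_b >= 0:
        if change_a > change_b:
            return "A increases more than B"
        return "B increases more than A"
    # Remaining case: both changes are zero or negative, and unequal.
    if change_a < change_b:
        return "A decreases more than B"
    return "B decreases more than A"
```

For example, budget changes of +3 for A and +1 for B fall under "A increases more than B", while -4 and -1 fall under "A decreases more than B".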

Four Trade-off Conditions

Potential Trade-offs
                A Wins    B Wins
Net Increase    A > B     A < B
Net Decrease    A > B     A < B

Four Basic Conditions

More on this later…

A Real-Life Measurement Problem

The Mississippi Department of Wildlife, Fisheries, and Parks has an 8-week backlog in boat registration and sportsman’s licenses.
Delays do not discriminate between individuals, whether they be:
– Farmers
– Bankers
– Legislators, or
– Governors.

Myth #8: You have to observe subjects directly.

Measuring the Unseen

The Sherlock Holmes Approach

“We must fall back upon the old axiom that when all other contingencies fail, whatever remains, however improbable, must be the truth.”
– Sherlock Holmes

The Adventure of the Bruce-Partington Plans

(Sir Arthur Conan Doyle)

We’re on the Case!

Whatever is left…

Using residuals to measure something indirectly has been a very useful technique in several arenas.

The Most Famous Example

The double helix of DNA was not observed directly. In essence, Crick and Watson used Rosalind Franklin’s X-rays of wet and dry strands of DNA.

Essentially, they were looking at the shadow of DNA, not the DNA itself.

Example 2: Relative Political Capacity

Initial observations:
All political systems must have resources.
Those that are able to obtain resources are stronger than those that cannot.
Wealthier populations are able to pay more taxes than poorer populations.
Some economies are easier to tax than others.
People don’t like to pay taxes, unless they know they’ll get the money back (e.g. Social Security).

Predicted/Model vs. Actual

Observations that fall on the regression line were given a score of 1.00.

Those above the line were scored as the ratio of actual to predicted: double the predicted value scores 2.00, three times scores 3.00, and so on…

Those below their predicted tax rate were given scores from 0 to .99, based on the fraction of the predicted value they actually reached.
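The scoring rule above amounts to dividing the actual value by the model’s prediction. A minimal sketch, assuming the predicted value has already come from a regression model; the function name and the sample numbers are hypothetical.

```python
def rpc_score(actual, predicted):
    """Ratio of actual to predicted: 1.00 on the line, above 1 over it, below 1 under it."""
    if predicted <= 0:
        # Assumption of this sketch: a ratio only makes sense for a positive prediction.
        raise ValueError("predicted value must be positive")
    return actual / predicted


# Hypothetical tax rates (percent of GDP) against a predicted 12.0.
on_the_line = rpc_score(12.0, 12.0)
double_predicted = rpc_score(24.0, 12.0)
half_predicted = rpc_score(6.0, 12.0)
```

The score is a residual expressed as a ratio: it measures what the model does not explain, which is the point of the Sherlock Holmes approach.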

Results: Uses of RPCs

Explains demographic transitions (population explosions or lack thereof).

Outcomes of wars between relatively equal/unequal opponents.

Black market exchange rates for currencies in unstable countries.

Let’s Talk Performance!

Real Men and Women Use Performance Measures!

(Weenies Don’t)

Performance measures should drive performance.
– There should be thresholds at which management takes action to do something different.
Example: Watch the altimeter for sudden drops, pull up on the yoke if the numbers go down.
– Those actions should be defined in some sort of plan.
Example: At 500 feet, eject.
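The altimeter example can be sketched as a threshold-to-action rule. The 500-foot threshold comes from the slide; the function name, the action labels, and the rule for ordering the checks are assumptions made for illustration.

```python
EJECT_ALTITUDE_FT = 500  # threshold taken from the slide's example


def planned_action(previous_ft, current_ft):
    """Map two consecutive altimeter readings to the action the plan calls for."""
    if current_ft <= EJECT_ALTITUDE_FT:
        return "eject"
    if current_ft < previous_ft:
        return "pull up"
    return "no action"
```

The measure earns its keep only because each reading maps to a predefined action; a reading with no attached action is just a barometer.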

A Barometer is not a Performance Measure!

Benchmarking

Choosing the Right Comparisons

Myth #9: Collin County Community College is the perfect peer.

Peer-to-Peer

Choose statistical neighbors (like you).
– Comparisons need to make sense.

Choose those with a similar environment.
– Environments need to be “controlled” analytically.

Choose those who differ on performance.
– Variation requires explanation and understanding.
– Lack of variation means nothing can be learned.

Best of Breed

Choose those who out-perform the competition.
– That is the “benchmark” to beat.

Include those who do not perform well.
– This avoids the mistake of Tom Peters.

Compare environments, but choose on performance.

Establish Baseline: Compare Trends

Track your own performance over time.
– Identify key internal and external factors.
– Test explanations (hypotheses).

Identify variations.

If there are no variations, you cannot draw any conclusions about causes. A constant explains nothing.
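“A constant explains nothing” has a concrete statistical meaning: the Pearson correlation divides by each series’ variation, so it is undefined when either series never changes. A minimal sketch; the function name and the choice to return None for the undefined case are assumptions of this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation, or None when either series has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant series has no variation to relate to anything else.
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data: funding rises each year, but the measure never moves.
funding = [1, 2, 3, 4]
flat_measure = [7, 7, 7, 7]
result = pearson_r(funding, flat_measure)  # None: nothing can be learned
```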

Myth #10: Good measures don’t vary.

Measures are Variables
And Variables Vary

No Variance?
– No chance of improvement
– No Gains
– No Learning

A Costly Example of No Variance

Parties, Ideologies, and Budgets: A Study of Budget Trade-offs in 18 OECD Countries
– Based on data from 1960 to 1990
– 65,000 cells of data drawn from more than 50 sources, taking six months to enter by hand.

Results for Health vs. Defense

Results for Education vs. Defense

All is not lost

Mona Lisa

“Discovery”

Myth #11: Performance measures will improve performance.

Do Performance Measures Improve Performance?

The Case of Texas State Agencies

Myth #12: Data integrity is exclusively a reporting issue.

Reporting is an operational issue.

CREATING INTEGRITY BY DESIGN

Alabama SMART Budgeting Training

Qualities of Good Performance Measures in the Real World

RELIABILITY

Consistency – Data can be replicated by a competent, trained professional (e.g. Auditor).

Accuracy – The indicators are true to the facts.

VALIDITY

Relevance – Measures relate to progress toward realistic agency/organizational goals.

Usefulness – They provide actionable indicators.

Data Integrity Starts with People

Checklist:
Are reporting roles clearly defined?
Is there documentation?
– A ‘paper trail’ for auditing?
– Written procedures for verifying data accuracy?
– Clear responsibility for reviewing and approving performance measure reports?

Is there management ownership for performance measurement reports?

If the answer to any of the first five questions is “No”:

Go back to the beginning.

Check every step from start to finish until the error or problem is identified.

If everything checks out, then it is time to look at program operations for answers.*

*This is a job that is the exclusive responsibility of program management.

Question 6: Identify Root Causes

Is the change in performance the result of an internal or external factor?
– Can the relationship between internal or external factors and performance be demonstrated with data?
Do they correlate?
What are the patterns, trends, etc.?
– What factors can be changed by management?
Can staff, training, technology or funding change the result?
What do data indicate about these connections?

Response to 6: Action Plan

What is required to change results?
– What new activities will be required to make those changes?
– What resources (or authority) would be required to implement those new activities?
– Who will implement new activities?
– When can the new activities begin?
– How long will it take for the new activities to have an effect?

Measurement Disasters

Tennessee Sour Mash: Corn and Student Test Scores

The Situation

A University of Tennessee Ag Economics Professor proposes using crop yield formulas for measuring the “value-added” increases in student test scores.
The Tennessee General Assembly promptly enacts the idea, granting the professor a contract as the sole-source provider, naming him personally in statute (name later removed in the Tennessee Code).

Question:

How do student test scores differ from corn?

Photo Credit: Lloyd Wolf/U.S. Census Bureau www.freephoto.com

Student Test Scores Crop Yields

What Type of Measures Are They?
Nominal?

Ordinal?

Interval?

“INTERVAL”

(BOTH MEASURES)

Corn can always grow taller!

Figure: Test Scores (bounded at 100%) vs. Corn (unbounded).

Which School Would Do Better?

Figure: two schools with starting points of 95% and 50% on a scale capped at 100%.

Which is more likely to show a gain? School A? School B?
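The ceiling problem is simple arithmetic: on a scale capped at 100%, the room left to gain is the cap minus the starting point. A minimal sketch using the two starting points shown on the slide; the function name is hypothetical, and which school starts where is not stated in the transcript.

```python
def max_possible_gain(start_pct, ceiling_pct=100.0):
    """Largest gain a bounded score can still show from a given starting point."""
    if not 0 <= start_pct <= ceiling_pct:
        raise ValueError("starting point must lie between 0 and the ceiling")
    return ceiling_pct - start_pct


room_from_95 = max_possible_gain(95.0)  # 5 points of headroom
room_from_50 = max_possible_gain(50.0)  # 50 points of headroom
```

A crop-yield formula has no such cap, which is why a "value-added" gain measure borrowed from corn penalizes the school that is already near the top.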

“Not everything that counts can be measured, and not everything that can be measured counts”
Albert Einstein (1879-1955)

Before we accept the first premise, we have to ask,

“Have we tried?”

Myth #13: Measures can’t detect management issues.

Measuring What is Important

Organizational Culture
– Turnover – Big Clue!
– Absenteeism – Big Clue #2!
– Lack of initiative, passivity – Clue #3
– Low morale – Starting to see a pattern?
– Anger, frustration, discipline problems…
– Sense of hopelessness!!!!

How do we measure this?

Possible Index?
TEN RULES FOR STIFLING INNOVATION

1. Regard any new idea from below with suspicion—because it’s new, and because it’s from below.

2. Insist that people who need your approval to act first go through several other levels of management to get their signatures.

3. Ask departments or individuals to challenge and criticize each other’s proposals. (That saves you the job of deciding; you just pick the survivor.)

4. Discuss your criticisms freely, and withhold your praise. That keeps people on their toes. Let them know they can be fired at any time.

5. Treat identification of problems as signs of failure, to discourage people from letting you know when something in their area isn’t working.

Cont…
TEN RULES FOR STIFLING INNOVATION (continued):

6. Control everything carefully. Make sure people count anything that can be counted, frequently.

7. Make decisions to reorganize or change policies in secret, and spring them on people unexpectedly. (That also keeps people on their toes.) Let them know that they can be fired at any time.

8. Make sure that requests for information are fully justified, and make sure that it is not given out to managers freely. (You don’t want data to fall into the wrong hands.)

9. Assign to lower-level managers, in the name of delegation and participation, responsibilities for figuring out how to cut back, lay off, move people around, or otherwise implement threatening decisions you have made, and get them to do it quickly.

10. And above all, never forget that you, the higher-ups, already know everything important about this business.

“These ‘rules’ reflect pure segmentalism in action—a culture and an attitude that make it unattractive and difficult for people in the organization to take initiative to solve problems and develop innovative solutions… Segmentalist companies may not suffer from a lack of potential innovators so much as from failure to make the power available to those embryonic entrepreneurs that they can use to innovate.

And, …when innovations do occur, segmentalist organizations may not even be able to take advantage of them.”

Rosabeth Moss Kanter, The Change Masters, 1982, p. 101.

Myth #14: Counting people is easy.

How many people did you serve?

Recidivism or Repeat Customers?

Do unduplicated counts make more sense than duplicated counts? Why?

How do we count level of service?

What if wrap-around services are effective and one-shot taps on the head are not?

What counts as service?

How do we count costs for repeat customers or those that consume more than one menu item?
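The difference between duplicated and unduplicated counts is easy to see in a visit log. This is a sketch only: the function name, the dictionary keys, and the client IDs are hypothetical, and it assumes one log entry per service encounter.

```python
from collections import Counter


def service_counts(visits):
    """Summarize a visit log given as a list of client IDs, one per encounter."""
    per_client = Counter(visits)
    return {
        "duplicated": len(visits),          # every encounter counts
        "unduplicated": len(per_client),    # each person counts once
        "repeat_clients": sum(1 for n in per_client.values() if n > 1),
    }


# Hypothetical log: client "a17" was served three times, "b02" twice.
log = ["a17", "b02", "a17", "c44", "b02", "a17"]
summary = service_counts(log)
```

Both numbers are legitimate answers to "How many people did you serve?"; which one makes sense depends on whether the question is about workload or about reach.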

“Life is not divided into federal block grants.”

Robert Greenstein
Center for Budget and Policy Priorities
NCSL Conference in Burlington, VT, September 1995

Myth #15: We’ve already counted everything that’s important.

People are strange.

Your measures need to capture reality!

All relevant observations must fit somewhere on the measure.

If they don’t, you’re missing reality.

Anomalies are as important as the “normal” observations.

We learn from measurement when it helps us see something we would have missed.

Outcome Measures: Telling the Tale that Wags the Dog?
Is anybody better off?

Is anybody worse off?

How can you tell?

Adapted from Mark Friedman’s Trying Hard Is Not Good Enough.