Computer Adaptive Testing and the Patient-Reported Outcomes Measurement Information System (PROMIS)*

Kitty S. Chan, PhD
Associate Professor
Department of Health Policy and Management
Johns Hopkins Bloomberg School of Public Health

November 1, 2011

* Acknowledgement: Many graphic slides were taken or adapted from a lecture for a similar course by Bryce B. Reeve, PhD, then Psychometrician and Program Director, Outcomes Research Branch, National Cancer Institute.


What is CAT?

CAT integrates the item response theory (IRT) measurement framework with computer technology to administer a patient-reported outcome (PRO) measure that selects each question based on a person’s responses to previously administered questions.

What is CAT? Hypothetical Example: Nutrition during Infancy

[Figure: example questions placed along a difficulty scale from easy to hard]

What are the benefits of breastfeeding?

What vitamins does your baby need?

What are the minimum daily nutrition requirements for your baby?

When should you introduce solid foods?

What are the signs that your baby is ready to feed him/herself?

What are the signs that your baby is hungry?

Presenter
Presentation Notes
But what exactly is CAT? To start, a computerized adaptive test consists of a pool of many items, or questions, for which we know from empirical data which are easy and which are hard. What is neat about CAT is that, unlike the traditional test format where everyone gets the same questions, the next item presented to a respondent is selected based on his or her response to a previous item.

For example, let’s say we have two respondents, one who is highly knowledgeable in infant nutrition and one who is less so. We would start by asking both respondents the same question, usually one in the middle range of difficulty, say “What are the signs that your baby is ready to feed him/herself?” When the more knowledgeable parent responds correctly, he or she is asked a more difficult question, say “What vitamins does your baby need?” When the less knowledgeable parent responds incorrectly to the first question, he or she is asked an easier question, like “What are the signs that your baby is hungry?” This tailors the set of questions to a particular parent’s knowledge level, so that we get a good measure of that parent’s knowledge even with a small number of questions.
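The selection logic described in these notes can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by any particular CAT engine. It assumes a Rasch (one-parameter) model, a standard normal prior, and a grid-based posterior mean as the ability estimate. The item names and difficulty values in `POOL` are hypothetical and chosen only so that the example follows the scenario in the notes.

```python
import math

# Hypothetical item pool: name -> difficulty (illustrative values only).
POOL = {
    "very_easy_item": -2.0,
    "signs_baby_is_hungry": -0.5,
    "ready_to_self_feed": 0.0,
    "vitamins_baby_needs": 0.5,
    "very_hard_item": 2.0,
}

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4.0 to 4.0


def p_correct(theta, b):
    """Rasch model: probability of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses):
    """Posterior mean of theta on the grid, using a standard normal prior.

    responses is a list of (difficulty, correct) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total


def next_item(pool, asked, theta):
    """Pick the unasked item whose difficulty is closest to the current estimate."""
    candidates = [name for name in pool if name not in asked]
    return min(candidates, key=lambda name: abs(pool[name] - theta))


def run_cat(pool, answer_fn, n_items=3):
    """Administer n_items adaptively; answer_fn(name) returns True if correct."""
    theta, asked, responses = 0.0, [], []
    for _ in range(n_items):
        name = next_item(pool, asked, theta)
        correct = answer_fn(name)
        asked.append(name)
        responses.append((pool[name], correct))
        theta = eap_estimate(responses)
    return asked, theta
```

With these assumed values, a respondent who answers the first (middle-difficulty) item correctly is routed to a harder item, and one who answers it incorrectly is routed to an easier one, which mirrors the two parents in the example.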

What are CAT’s advantages?

CAT provides an accurate estimate of a person’s score with a minimal number of questions.
• Questions are selected to match the health status of the respondent.

CAT minimizes floor and ceiling effects.
• People near the top or bottom of a scale receive items that are designed to assess their health status.

Before you have a CAT, you need an item bank

What is an “Item Bank”?

A large collection of items measuring a single domain.

The items have been evaluated and tested to ensure their relevance, clarity, and psychometric robustness.
– Items are selected to maximize precision and retain clinical relevance
– Items in the same bank are linked on a common metric

How Do You Link Different Measures?

Different Linking Designs
– One group takes two (or more) tests
– Two different but equivalent groups take two (or more) tests
– Tests given to two different groups, with common items internal or external to tests*
– Two groups take different tests, but a common group of individuals takes both tests

* More commonly used due to feasibility

Presenter
Presentation Notes
So how do you do this? One thing to know is that during IRT estimation, we often don’t know the item parameters or the ability levels of subjects beforehand. For estimation to proceed, a scale has to be established. Usually, the scale is set so that the group has a mean of 0 and a standard deviation of 1. Item parameters and scores are then estimated with this scale as their reference point. So when we have two tests, we need a bridge to link the scores from one test to the other. Four designs may be used to do this. Designs two and three are generally considered more feasible because respondent burden tends to be lower.

The Idea Behind IRT Score Calibration and Linking

Use one of the linking designs to bridge measures.

When bridged by a common group or common (or anchor) items:
– Parameters for “new” or “different” items are linked on the same scale

Find any differentially functioning items.

With standard item parameters, and after modeling differences in item functioning, scores should be calibrated.
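One common way to place a new calibration on a reference scale with anchor items is the mean-sigma method. The slides do not say which linking method was used, so the sketch below is only an illustration of the general idea. It assumes each anchor item has a difficulty estimate from both calibrations. The function names and the numeric values in the usage lines are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_anchor_new, b_anchor_ref):
    """Slope A and intercept B mapping the new scale onto the reference scale.

    Both arguments are difficulty estimates for the same anchor items,
    listed in the same order, from the new and the reference calibration.
    """
    A = pstdev(b_anchor_ref) / pstdev(b_anchor_new)
    B = mean(b_anchor_ref) - A * mean(b_anchor_new)
    return A, B


def rescale_item(a, b, A, B):
    """Place a (discrimination, difficulty) pair on the reference scale."""
    return a / A, A * b + B


def rescale_theta(theta, A, B):
    """Place a person score from the new calibration on the reference scale."""
    return A * theta + B


# Hypothetical anchor-item difficulties from two separate calibrations.
new_scale = [-1.0, 0.0, 1.0]
ref_scale = [-0.5, 1.5, 3.5]
A, B = mean_sigma_constants(new_scale, ref_scale)
a_linked, b_linked = rescale_item(1.2, 0.5, A, B)
```

Once `A` and `B` are known, every non-anchor item from the new calibration, and every person score, can be transformed with the same two constants, which is what puts the “new” items on the same scale as the reference items.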

Graphic representation of Common Items Linking Design

[Figure: four item sets (SET 1 to SET 4). Item labels shown include Items A, B, C, M, N, O, P, Q, R, S, T, X, Y, and Z; several items appear in more than one set as common (anchor) items.]

How does CAT work?

Item Bank (Validated & IRT-Calibrated Depression Items)

[Figure sequence: a Depressive Symptoms scale from 20 to 80 (very low, mild, moderate, severe), shown with a curve plotted from 0.0 to 1.0 over the range -3.00 to 3.00. The display is repeated after each response below.]

Item 1: In the past 7 days, I felt depressed. (Never, Rarely, Sometimes, Often, Always)
Response: Sometimes

Item 2: In the past 7 days, I felt helpless.
Response: Sometimes

Item 3: In the past 7 days, I felt that nothing could cheer me up.
Response: Rarely
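The way each response narrows the score estimate can be illustrated with a short Python sketch. It assumes a graded response model, which is a common IRT model for ordered response options (the slides do not state which model is used), a standard normal prior, and a T-score metric with mean 50 and standard deviation 10 for the 20 to 80 scale. All item parameters below are hypothetical, not calibrated values from any item bank.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4.0 to 4.0
NEVER, RARELY, SOMETIMES, OFTEN, ALWAYS = range(5)


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response category.

    thresholds must be increasing; K thresholds give K + 1 categories.
    """
    # P(response >= k) for k = 1..K, bracketed by 1.0 and 0.0
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def posterior_summary(responses):
    """Posterior mean and SD of theta given (a, thresholds, category) triples."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for a, thresholds, category in responses:
            w *= grm_category_probs(theta, a, thresholds)[category]
        weights.append(w)
    total = sum(weights)
    m = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - m) ** 2 * w for t, w in zip(GRID, weights)) / total
    return m, math.sqrt(var)


def to_t_score(theta):
    """Convert theta (mean 0, SD 1) to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


# Hypothetical parameters: (discrimination, four increasing thresholds).
DEPRESSED = (2.5, [-0.5, 0.3, 1.0, 1.8])
HELPLESS = (2.2, [-0.2, 0.5, 1.2, 2.0])
NOTHING_CHEERS = (2.0, [0.0, 0.7, 1.4, 2.2])

answers = [
    (*DEPRESSED, SOMETIMES),
    (*HELPLESS, SOMETIMES),
    (*NOTHING_CHEERS, RARELY),
]
for n in range(1, len(answers) + 1):
    est, sd = posterior_summary(answers[:n])
    print(f"after {n} item(s): T = {to_t_score(est):.1f}, SE = {10 * sd:.1f}")
```

The loop replays the three responses in order and prints the running estimate, so the shrinking standard error plays the role of the narrowing curve in the figure sequence.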

NIH Roadmap Initiative: PROMIS

Patient-Reported Outcomes Measurement Information System

http://www.nihpromis.org

PROMIS: Goals

NIH Roadmap Initiatives: Re-Engineering the Clinical Research Enterprise

Improve assessment of self-reported symptoms and other health-related quality of life domains across many chronic diseases.

Advance the science and technology to:
– Facilitate the collection of standardized patient reports of their health, functioning, and well-being
– Integrate their responses to inform decision-making in research and healthcare delivery

PROMIS accomplishes these goals by…

Developing item banks for patient-reported outcome domains. Once calibrated, items from an item bank can be used to:
– Develop a CAT for the domain
– Develop short form measures of the domain
– Select a pre-existing short form measure of the domain

Creating an online gateway to these item banks, CATs, and measures to enable their use in clinical research and practice (“Assessment Center”)

PROMIS: Structure

PROMIS Domains for Item Banking
– Core: pain, fatigue, depression, anxiety, anger, physical function, social function, and overall general health
– Additional: sleep/wake function, cognitive function, sex functioning, illness impact
– Pediatric PROMIS

Continued Development and Validation

Neuro-QoL, a related NIH-supported resource

PROMIS Item Banks

What Does PROMIS Measure?


PROMIS: Overall Picture

Neuro-QoL Framework (NINDS)

[Figure: development of a PROMIS item bank. Elements shown: an item pool containing items from Instruments A, B, and C plus new items; content expert review, focus groups, cognitive testing, and secondary data analysis; a questionnaire administered to a large representative sample; psychometric testing; the item bank (IRT-calibrated items reviewed for reliability, validity, and sensitivity); and short form instruments and CAT. Two plots accompany the item bank: information versus theta, and probability of response versus theta.]

[Figure: Depression Item Bank (Item 1 through Item n) along a continuum from no depression through mild, moderate, and severe to extreme depression, with Depression Short Forms A, B, and C.]

Develop short forms from PROMIS Item Banks

Advantages:
– Select a set of items that are matched to the severity level of the target population.
– All scales built from the same item bank are linked on a common metric.

PROMIS CAT Outperforms Legacy Questionnaires

[Figure: standard error (y-axis, 0 to 0.6) across the fatigue continuum from no fatigue to severe fatigue (x-axis, -2.5 to 2.5), comparing the 4-item SF-36 Vitality scale, a 4-item CAT, the 13-item FACIT-Fatigue, a 13-item CAT, and the 98-item bank. Lower standard error means higher precision. The US general population mean is marked on the x-axis.]
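The link between item selection and the standard error plotted here can be sketched as follows. This is a simplified illustration that assumes two-parameter logistic items with right/wrong scoring, not the fatigue items or data behind the figure. The bank, the fixed form, and all parameter values are hypothetical.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a two-parameter logistic item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """SE(theta) = 1 / sqrt(total information of the administered items)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


def pick_most_informative(theta, bank, n):
    """Choose the n items from the bank with the most information at theta."""
    ranked = sorted(bank, key=lambda ab: item_information(theta, *ab), reverse=True)
    return ranked[:n]


# Hypothetical bank: equal discrimination, difficulties from -2.5 to 2.5.
BANK = [(2.0, b / 2.0) for b in range(-5, 6)]
# Hypothetical fixed 4-item form targeted near the middle of the scale.
FIXED_FORM = [(2.0, -0.5), (2.0, 0.0), (2.0, 0.5), (2.0, 1.0)]

theta = 2.0  # a respondent far from the middle of the scale
se_fixed = standard_error(theta, FIXED_FORM)
se_adaptive = standard_error(theta, pick_most_informative(theta, BANK, 4))
se_bank = standard_error(theta, BANK)
print(f"fixed: {se_fixed:.2f}, adaptive: {se_adaptive:.2f}, full bank: {se_bank:.2f}")
```

Under these assumptions, four items chosen for the respondent's level give a smaller standard error than four fixed items targeted at the middle of the scale, and the full bank gives the smallest. That is the pattern the figure attributes to CAT relative to fixed-length questionnaires of the same length.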

PROMIS Assessment Center

Goal: To enable administration of item banks of standardized patient-reported outcomes measures for use in clinical research, population surveillance, and clinical practice.

Assessment Center Features
– An online, dynamic application that allows researchers to centralize all research activities
– Includes features that promote instrument development, study administration, data management, and storage of statistical analysis results
– Houses a library of instruments and items with an emphasis on health-related quality of life

Language Notes

Available Translations
– Most banks available in Spanish
– Some translations in other languages in progress: Chinese (Mainland, simplified), Portuguese

See the full set of available and in-progress translations:

http://www.nihpromis.org/measures/translations

Let’s Try it Out

http://www.assessmentcenter.net/ac1/

Thank you for your attention!

Contact Information: [email protected]

Hampton House 633