ITEM RESPONSE THEORY:
APPLICATIONS OF MODERN TEST THEORY
IN SKIN CANCER RESEARCH
Ngadiman Djaja
B. Psy (Hons), M.Ed (Research, Assessment & Evaluation)
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy
CENTRE FOR RESEARCH EXCELLENCE IN SUN & HEALTH
Institute of Health and Biomedical Innovation
School of Public Health and Social Work | Faculty of Health
QUEENSLAND UNIVERSITY OF TECHNOLOGY
2017
Item Response Theory: Applications of modern test theory in skin cancer research i
Keywords
Computer adaptive test, item response theory, Rasch model, risk factors, skin cancer,
sun-exposure behaviours, sun-protection behaviours.
ii Chapter 1: Introduction
Abstract
The overall objective of this PhD study was to assess the feasibility of applying Item
Response Theory (IRT) to self-reported skin cancer risk questionnaires. This process
was divided into separate studies described in the five articles presented in this
dissertation.
The first study used secondary data from the Queensland University of Technology’s
Skin Awareness study. The objective of the first study was to determine how well the
ten-item skin self-examination attitude scale fit the requirements of a Rasch rating
scale model. The Rasch rating scale model is the most common model used to
analyse Likert-scale questions. The skin self-examination attitude scale showed good
internal reliability; eight of the ten items fit the Rasch model and thus
possessed unidimensional measurement characteristics. The skin self-examination
attitude scale can be improved in the future by adding items that measure a strong
positive attitude towards skin self-examination. This study was published in Health
and Quality of Life Outcomes:
(http://hqlo.biomedcentral.com/articles/10.1186/s12955-014-0189-x).
The second research study, “Changes in self-reported sun-protection behaviours due
to concern about vitamin D status,” investigated changes in skin cancer prevention
behaviour people may undertake due to concern about vitamin D. Any decrease in
skin cancer prevention behaviours may increase future skin cancer risk. The study
used secondary data from the AusD study. The study conducted a cross-sectional
survey across the four seasons (2009-10) and at latitudes ranging from 19°S to 43°S. The
survey assessed vitamin D attitudes and changes in sun protection behaviours arising
from concerns about low vitamin D levels. Rasch partial credit models were used to
illustrate the potential effect of changing sun-protection behaviours due to concern
about vitamin D. This study was published in Photochemistry and Photobiology:
(http://onlinelibrary.wiley.com/doi/10.1111/php.12582/full).
The third study, “Advantages of Mobile Computer-Adaptive Testing (CAT) to
Quickly Estimate Skin Cancer Risk,” used secondary data from the QSkin study.
This study was devoted to the application of item response theory for computer
adaptive testing to reduce response burden in skin cancer risk assessment. The study
compared the efficiency of non-adaptive testing and computer adaptive testing
facilitated by the partial credit model derived calibration of the QSkin skin cancer
risk questionnaire. Computer adaptive testing yielded a smaller standard error
of the estimated measure than non-adaptive testing and was substantially more
efficient without loss of precision, reducing response burden by 48%, 66%, and
66% for the dichotomous, rating scale, and partial credit models, respectively. This study
was published in Journal of Medical Internet Research
(http://www.jmir.org/2016/1/e22/).
The fourth study, “Diagnostic Discrimination of a Skin Cancer Risk Scale” used
secondary data from the QSkin skin cancer risk questionnaire. The study objective
was to calibrate existing skin cancer-related questionnaires using a partial credit
model and examine their predictive discrimination of non-melanoma skin cancer
prospectively. Diagnostic discrimination, measured as the area under the curve, was
.753 (p < .001), .530 (p < .001), and .487 (p = .093) for the phenotype, sun exposure
behaviours, and sun protection behaviours subscales, respectively. A full paper was
presented at the International Outcome Measurement Conference in Chicago, April
2015.
The fifth study, “Development and Psychometric Evaluation of Skin Cancer Risk
Scale Utilising Item Response Theory” aimed to develop a skin cancer risk scale
with strong measurement qualities utilising a modern test theory approach. The study
combined the best questions from existing skin cancer questionnaires used in Studies
2, 3, and 4, then calibrated them using a partial credit item response theory model to
create a scale measuring underlying skin cancer risk. The study found that 50 items
within three skin cancer risk subscales had good psychometric properties (validity
and reliability). A draft manuscript is presented in Chapter 6.
To the best of our knowledge, this work is the first study that comprehensively uses
an item response theory approach to analyse data collected from skin cancer-related
questionnaires and then develop a comprehensive questionnaire measuring skin
cancer risk. Overall, this dissertation explores current and item response
theory-based approaches to skin cancer risk-related measurement and provides empirical
evidence regarding the benefits of integrating item response theory modelling. The
five studies presented in this thesis demonstrate the advantages of item response
theory in various applications within skin cancer research, including providing a first
set of items that could form part of a future skin cancer risk measurement item bank.
The development of the SunAus scale extended item response theory application into
the skin cancer field. The successful implementation of the scale in discriminating a
person’s risk based on their phenotype provides a good model for future studies. The
scale offers improvements over previous measures, such as greater content
coverage and precision, and, if used with computer adaptive testing, it will be more
economical and less burdensome for respondents. More research is
required to establish the benefits of modern test theory to measure skin cancer risk
and behaviours more accurately. This thesis makes a significant contribution to
knowledge generation, public health practice, and policy-related issues in the field of
skin cancer prevention by working towards more efficient and precise measurement
tools for future use.
A Note Regarding Format
This dissertation is a thesis by publication. It contains five publications that have
either been published or are under blind peer review by refereed journals; therefore,
the wording of the articles is as published. The logical flow of the thesis is
maintained by introducing these articles where they fit most appropriately into the
thesis structure. The thesis uses the AMA numbered referencing style, with each
publication chapter containing its own reference list, and the references for Chapters
1 and 7 contained in the main reference list at the end of the document. The articles
have been reformatted in Word to provide consistent formatting throughout the
thesis. Moreover, tables and figures have been numbered continuously throughout
the thesis, for consistency.
Table of Contents
Keywords
Abstract
A Note Regarding Format
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Definition of Key Terms
List of Publications and Presentations
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
  Background
  Brief literature review
    1.2.1 Skin cancer-related measures
    1.2.2 Brief overview of Test Theory
    1.2.3 Differences between Classical Test Theory and Item Response Theory
    1.2.4 Test Development using Classical Test Theory and Item Response Theory
    1.2.5 1-Parameter Logistic (1-PL) Model or The Rasch Model
    1.2.6 2-Parameter Logistic (2-PL) Model
    1.2.7 3-Parameter Logistic (3-PL) Model
    1.2.8 Partial Credit Model
    1.2.9 Rating Scale Model
  Choosing a model
  Limitations of item response theory
  Purpose of this doctoral work
  Research questions
  Significance of the thesis
  Thesis outline
Chapter 2: Evaluation of Skin Self-examination Attitude Scale Using an Item Response Theory Model Approach
  Abstract
  Introduction
  Methods
  Results
  Discussion
  Conclusion
  References
Chapter 3: Self-reported Changes in Sun-Protection Behaviours at Different Latitudes in Australia
  Abstract
  Introduction
  Material and methods
  Results
  Discussion
  Conclusion
  References
Chapter 4: Estimating Skin Cancer Risk: Evaluating Mobile Computer Adaptive Testing
  Abstract
  Introduction
  Methods
  Results
  Discussion
  Conclusions
  References
Chapter 5: Diagnostic Discrimination of the Skin Cancer Risk (SCR) Scale: Application of Item Response Theory
  Abstract
  Introduction
  Methods
  Results
  Discussion
  References
Chapter 6: Development and Psychometric Evaluation of Item Banks for the Assessment of Skin Cancer Risk Using Item Response Theory
  Abstract
  Introduction
  Methods
  Methods
  Discussion
  Conclusions
  References
Chapter 7: Discussion
  Summary of the Main Findings
  Discussion of the Main Findings
    7.2.1 Item Response Theory as a tool for evaluating psychometric properties of a questionnaire.
    7.2.2 Item Response Theory as a tool for developing a new questionnaire.
    7.2.3 Use of Computer Adaptive Test to reduce participants’ burden.
  The Assessment of Skin Cancer Risk.
  Methodological Considerations and Future Studies
    7.4.1 Limitations of the research
    7.4.2 Suggestions for Future Implementation
  Conclusion
References
Appendices
List of Figures
Figure 1.1-1: Illustration of IRT
Figure 1.1-2: Uniform differential item functioning
Figure 1.1-3: Non-uniform differential item functioning
Figure 1.1-4: Item Characteristic Curve from 2 dichotomous items
Figure 1.1-5: Category response function of a polytomous item
Figure 1.1-6: Item information functions for 8 items
Figure 1.1-7: Standard error of measurement
Figure 1.1-8: Illustration of computer adaptive testing
Figure 1.1-9: Sample of test linking
Figure 1.1-10: One-Parameter Item Characteristic Curves for Four Typical Items
Figure 1.1-11: 2-Parameter Item Characteristic Curves for Four Typical Items
Figure 1.1-12: 3-Parameter Item Characteristic Curves for Four Typical Items
Figure 1.1-13: Overview of current doctoral work
Figure 2.1: Wright map/item person map of Skin Self-Examination Attitude Scale with the mean theta of person on the left and mean theta of items on the right.
Figure 3.1: Item Information Functions from two items plotted along the latent trait logits of skin cancer predisposition
Figure 3.2: Supplement 2. The distribution of the skin cancer predisposition (scores converted to T score)
Figure 4.1: Sample selection flowchart
Figure 4.2: Study simulation and CAT flowchart.
Figure 4.3: Determining a cut-off point
Figure 4.4: Generated with 3 Rasch Models
Figure 4.5: Efficiency and precision of CAT compared to using 10, 20 or 30 items in static NAT format.
Figure 4.6: A graphical CAT report shown after each response (top); the more items administered, the smaller the standard error in the CAT process (bottom)
Figure 5.1: Items person map of PH subscale.
Figure 5.2: Items person map of SE subscale.
Figure 5.3: Items person map of SP subscale.
Figure 5.4: Example of most probable response for a person with skin cancer risk in PH scale of 0.5 logits.
Figure 5.5: Category probability curves of item PH3
Figure 5.6: ROC curve
Figure 6.1: Recruitment of participants.
Figure 6.2: Steps in data analysis
Figure 6.3: The Wright map for phenotype subscale.
Figure 6.4: Distribution of Standard Error of Measurement for each domain: a) phenotype, b) sun exposure, c) sun protection
Figure 6.5: ROC Curves of outcome variables
Figure 6.6: A1. Sun exposure behaviours scale item map
Figure 6.7: A2. Sun protection behaviours scale item map
Figure 7.1: Roadmap for an International Skin Cancer Risk Item Bank
List of Tables
Table 1.1: Studies examining the psychometric properties of skin cancer related measures
Table 1.2: Differences between Classical and Item Response Theories
Table 1.3: Steps in test development
Table 1.4: Taxonomy of IRT Models
Table 2.1: Item total correlation, fit statistics and item difficulty for the 10-item Skin Self-Examination Attitude Scale
Table 2.2: DIF statistics for the 8-item skin self-examination attitude scale
Table 3.1: Differences in vitamin D attitudes and sun protection behaviours by location¹
Table 3.2: Vitamin D-related attitudes and self-reported changes in sun protection behaviours
Table 3.3: Multivariable logistic regression models of associations between vitamin D-related attitudes and changes made during the last summer to the way people protected themselves from the sun so they can get enough vitamin D*
Table 3.4: Item location and fit statistics of sun protection behaviour items calibrated within a skin cancer predisposition model.
Table 3.5: Supplement 1. Demographic and Phenotypic characteristics of the participants (n=1,002)
Table 4.1: 10, 20, or 30 items in static NAT format.
Table 4.2: Precision of CAT.
Table 4.3: Efficiency of CAT
Table 5.1: Item parameter estimations and fit statistics of skin cancer risk (SCR) scale
Table 5.2: Skin cancer risk score for each subscale in validation sample
Table 6.1: Overview of items measured on the SunAus Scale
Table 6.2: Characteristics of Study Participants (N=1,177) and 2011 Australia census – Queensland (QLD) State only [43]
Table 6.3: Item parameter estimations and fit statistics of phenotype scale
Table 6.4: Item parameter estimations and fit statistics of sun exposure behaviours scale
Table 6.5: Item parameter estimations and fit statistics of sun protection behaviours scale
Table 6.7: Supplement 1: Table for conversion of phenotype scale summed item scores to Rasch measures
Table 6.8: Supplement 2: Table for conversion of sun exposure behaviour scale summed item scores to Rasch measures
Table 6.9: Supplement 3: Table for conversion of sun protection behaviour scale summed item scores to Rasch measures.
List of Abbreviations
1PL One-Parameter Logistic Model
2PL Two-Parameter Logistic Model
3PL Three-Parameter Logistic Model
CAT Computer Adaptive Test
CTT Classical Test Theory
DIF Differential Item Functioning
IRT Item Response Theory
PCM Partial Credit Model
RSM Rating Scale Model
ROC Receiver Operating Characteristics
SEM Standard Error of Measurement
Definition of Key Terms
1PL-IRT: An item response model that estimates one item parameter (item
difficulty).
2PL-IRT: An item response model that estimates two item parameters (item
difficulty and item discrimination).
3PL-IRT: An item response model that estimates three item parameters (item
difficulty, item discrimination, and pseudo-guessing).
Ability: The quality of being able to do something.
a parameter: Known as the item discrimination (slope) parameter in item
response theory.
b parameter: Known as the item difficulty parameter in item response theory.
Bias: The effect of any factor that the researcher did not expect to
influence the dependent variable.
c parameter: Known as the pseudo-guessing parameter in item response theory.
Calibration: The procedure of estimating a person’s ability or item difficulty
by converting raw scores to logits on an objective measurement scale.
Classical test theory: A model in which any observed test score (O) is
envisioned as the composite of two hypothetical components: a true score (T)
and a random error component (E).
Construct: A single latent trait, characteristic, attribute, or dimension
assumed to be underlying a set of items.
Dichotomous: Data that have only two values, such as right/wrong, pass/fail,
yes/no, agree/disagree, male/female.
Differential item functioning: The loss of invariance of item estimates across
testing occasions. Differential item functioning is evidence of item bias.
Item: An individual question or statement that measures a single content area.
Item bank: A collection of items.
Item characteristic curve: A curve that describes the probability of a response
on an item given a certain ability level.
Item response theory: Mathematical models of how examinees at different ability
levels for a given trait should respond to a test item.
Likert scale: A series of statements or questions that is used to measure
people’s attitudes, behaviours, values, and opinions.
Measurement error: Inaccuracy resulting from a flaw in measuring instruments.
Measurement precision: The accuracy of any measurement.
Objective measurement: The repetition of a unit amount that maintains its size,
within an allowable range of error, no matter which instrument intended to
measure the variable of interest is used, and no matter which relevant person
or thing is measured.
Partial credit model: An item response theory model for polytomous data, which
allows the number of ordered item categories and/or their threshold values to
vary from item to item.
Polytomous: An item having more than two response categories, for example, a
five-point Likert-type scale.
Reliability: A measure of the consistency of an instrument’s score over time.
Scale: Consists of multiple items that measure a single domain, such as
anxiety.
Standard error of measurement: Describes the expected fluctuation in an
observed score due to error in the measurement tool; the standard deviation of
error about an estimated score. In classical test theory, the standard error of
measurement is the same for all score levels; in item response theory it can
vary from score to score, and therefore can be used as a termination criterion
in computer adaptive testing when the number of items is allowed to vary until
a threshold standard error of measurement is reached.
Theta (θ): The unobservable construct (or latent variable) being measured by a
scale. It is estimated from the responses people give to test items that have
been previously calibrated by an item response theory model.
Threshold: The level at which the likelihood of failure to agree with or
endorse a given response category below the threshold turns to the likelihood
of agreeing with or endorsing the category above the threshold.
Trait: An unobservable latent dimension, such as stress, well-being, or pain,
which is thought to give rise to a set of observed item responses. In item
response theory, the latent trait being measured by a scale is denoted as
theta (θ).
Unidimensional: A basic concept in scientific measurement that only one
attribute of an object be measured at a time. The item response theory model
requires a single construct to underlie the items that form a hierarchical
continuum.
Validity: Refers to the degree to which evidence and theory support the
interpretations of test scores entailed by the proposed use of the tests.
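The model definitions above (1PL-IRT, 2PL-IRT, 3PL-IRT, and standard error of measurement) can be illustrated with a minimal sketch. It assumes the common logistic form P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))) with no additional scaling constant, and it is not the estimation software used in the studies. All function names are hypothetical, and the stopping routine is deliberately simplified: theta is held fixed rather than re-estimated after each response, as an operational computer adaptive test would do.

```python
import math


def icc_3pl(theta, b, a=1.0, c=0.0):
    """Probability of a positive response under the 3PL model.

    theta: person location in logits; b: item difficulty;
    a: item discrimination; c: pseudo-guessing lower asymptote.
    With c=0 this is the 2PL model; with a=1 and c=0 it is the 1PL (Rasch) model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, b, a=1.0, c=0.0):
    """Fisher information of one dichotomous item at theta (3PL form)."""
    p = icc_3pl(theta, b, a, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p


def standard_error(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(total information).

    items: sequence of tuples (b,), (b, a) or (b, a, c).
    """
    total = sum(item_information(theta, *item) for item in items)
    return math.inf if total == 0 else 1.0 / math.sqrt(total)


def items_needed(theta, items, se_target):
    """Number of items administered before the standard error reaches se_target.

    Illustrative simplification (an assumption, not the thesis procedure):
    theta is fixed and items are taken in descending order of information.
    Returns None if the target is never reached.
    """
    ranked = sorted(items, key=lambda it: item_information(theta, *it), reverse=True)
    for n in range(1, len(ranked) + 1):
        if standard_error(theta, ranked[:n]) <= se_target:
            return n
    return None
```

For example, a Rasch item located exactly at the person's theta contributes 0.25 units of information, so `items_needed(0.0, [(0.0,)] * 10, 1.0)` returns 4: four such items are needed before the standard error falls to 1.0 logit.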
List of Publications and Presentations
THE FOLLOWING PAPERS HAVE BEEN PUBLISHED DURING MY
CANDIDATURE
Publications included in the thesis:
Djaja N, Youl P, Aitken J, Janda M. Evaluation of a skin self-examination attitude
scale using an item response theory model approach. Health and Quality of Life
Outcomes. 2014; 12(1): 189. doi: 10.1186/s12955-014-0189-x
Djaja N, Janda M, Lucas RM, Harrison SL, van der Mei I, Ebeling PR, Neale RE,
Whiteman DC, Nowak M, Kimlin MG. Self-Reported Changes in Sun-Protection
Behaviours at Different Latitudes in Australia. Photochem Photobiol. 2016; 92: 495–
502. doi:10.1111/php.12582
Djaja N, Janda M, Olsen CM, Whiteman DC, Chien TW. Estimating Skin Cancer
Risk: Evaluating Mobile Computer-Adaptive Testing. J Med Internet Res. 2016;
18(1): e22. doi: 10.2196/jmir.4736.
Djaja N, Janda M, Olsen CM, Whiteman DC. Diagnostic Discrimination of the Skin
Cancer Risk (SCR) scale: Application of Item Response Theory. International
Outcome Measurement Conference. Chicago. 2015.
Djaja N, Youl P, Whiteman DC, White K, Kimlin M, Janda M. "Development and
Psychometrics Evaluation of Skin Cancer Risk Scale Utilising Item Response
Theory". (Draft). 2016.
Relevant publications (with QUT affiliation) not included in the thesis:
1. Kimlin JA, Black AA, Djaja N, Wood JM. Development and validation of a
vision and night driving questionnaire. Ophthalmic and Physiological Optics. 2016;
36(4), 465-476. doi:10.1111/opo.12307
Conference publications during candidature:
1. Djaja N, Youl P, Aitken J, Janda M. An Item Response Theory analysis of The
Skin Self-Examination Awareness Scale: An application of modern measurement
theory in skin cancer prevention research. Paper presented at the meeting of the
Pacific-Rim Objective Measurement Symposium, Kaohsiung, Taiwan. August 2013.
2. Djaja N, Youl P, Aitken J, Janda M. An Item Response Theory analysis of The
Skin Self-Examination Awareness Scale: An application of modern measurement
theory in skin cancer prevention research. Poster session presented at the meeting of
the Global Controversies and Advances in Skin Cancer. Brisbane, Australia. 2013.
3. Djaja N, Janda M, Olsen CM, Whiteman DC. Assessing the measurement quality
of a Skin Cancer Risk questionnaire using a Rasch modelling approach. Paper
presented at the meeting of the Pacific-Rim Objective Measurement Symposium,
Guangzhou, China. August 2014.
4. Djaja N, Janda M, Olsen CM, Whiteman DC. Assessing the measurement quality
of a Skin Cancer Risk questionnaire using a Rasch modelling approach. Paper
presented at the meeting of the International Objective Measurement Conference,
Chicago, IL. April 2015
5. Djaja N, Janda M, Olsen CM, Whiteman DC. Should we continue to measure skin
cancer risk factors using outdated methods? Paper presented at the meeting of the
3rd International Conference on UV and Skin Cancer Prevention, Melbourne,
Australia. December 2015.
Awards and grants during candidature:
1. Travel Grant Awards
a) Applied Psychological Measurement (USD 1,000)
2. Scholarships
a) Centre of Research Excellence in Sun and Health (CRESH) Scholarship.
b) QUT Tuition Fee Waiver Scholarship, Queensland University of Technology,
Australia.
c) Top-Up Scholarship, Queensland University of Technology, Australia.
Acknowledgements
I would like to take this opportunity to express my thanks to those who assisted me
with various aspects of conducting the research and writing of this thesis.
Special thanks to Professor Michael Kimlin, Director of Centre for Research
Excellence in Sun and Health (CRESH) and other CRESH investigators for their
support throughout my scholarship for my PhD. Thanks also to Applied
Psychological Measurement Inc. and Journal of Computer Adaptive Testing
(Minneapolis, U.S.A.) for the student travel grant that enabled me to present some of
my study at a conference.
I am grateful for my co-authors: 1). Dr Catherine Olsen and Professor David C
Whiteman from the QIMR Berghofer Medical Research Institute; 2). Associate
Professor Pip Youl and Professor Joanne Aitken from Cancer Council Queensland;
3). AusD investigators Professor Robyn M Lucas, Dr Simone L Harrison, Professor
Ingrid van der Mei, Professor Peter R Ebeling, Professor Dr Rachel E Neale, Dr
Madeline Nowak, and Professor Michael Kimlin; and 4). Dr Tsar-Wei Chien from
the Chimei Medical Centre.
Special thanks also to Director of Assessment Research Centre at The Hong Kong
Institute of Education, Professor Wang Wen Chung and Dr Tsar-Wei Chien from the
Chimei Medical Centre, Taiwan, for the internship opportunity to learn computer
adaptive testing.
I would also like to thank Dr Martin Reese who proofread this thesis during my
candidature. My thanks also to professional editor, Kylie Morris, who provided
copyediting and proofreading services of the non-published portions of this thesis,
according to university-endorsed guidelines and the Australian Standards for editing
research theses.
I want to thank my dear friends and colleagues at the iHop Research group, Linda
Finch, Anna Finnane, Benjamin Singh, Caitlin Horsham, Jena Buchan, Kelly
Prosser, Melissa Creed, Saira Sanjida, and Professor Sandi Hayes; and CRESH
students Lindsay Brandon, Shanchita Khan, and Huong Tran Cam Dang.
Last but not least, I especially wish to thank my supervisors, Professor Monika
Janda, Professor Michael Kimlin, and Professor David Whiteman. It is a privilege
simply to be associated with them, to learn from the best in the field. Words cannot
express my gratitude to them.
This thesis is dedicated to my partner and family in Indonesia.
Chapter 1: Introduction
1.1 BACKGROUND
Skin cancers include three common types: melanomas,1,2 basal cell carcinomas (BCC),3-5 and
squamous cell carcinomas (SCC). A large amount of research focuses on skin cancer
prevention, better understanding the risk factors for skin cancer, and improving early
detection and treatment of skin cancers. In the past five years, there has also been increasing
interest in vitamin D, a steroid hormone that requires exposure of the skin to the sun for
synthesis,6-10 which is hypothesised to be inversely associated with several types of
cancer.11,12 Australia is one of the countries that has led the research in both fields; this is
likely due to the high rates of skin cancer in Australia,13 and the unexpectedly high rates of
vitamin D deficiency, despite high ambient ultraviolet radiation.14,15
Melanoma, BCC, and SCC incidence rates all vary according to geographic locality, with the
highest rates in the northern parts of Australia,16 such as Queensland. On average, two out of
every three Australians will be treated for one of these three types of skin cancer at some
stage during their lives and it is estimated that approximately 80 percent of all new cancers
diagnosed in Australia are skin cancers.13 The diagnosis and treatment of non-melanoma skin
cancer (the collective term for BCC and SCC) was estimated to have cost the Australian
community $703 million (95% CI, $674.6–$731.4 million) in 2015, with an estimated
940,000 people receiving treatment for skin cancers each year.17
Many self-reported measures (questionnaires, surveys, rating scales, or interviews) have been
developed to assess melanoma risk or its components.18-20 Various terms are used in the
scientific community that have similar meaning to self-reported measure, such as
questionnaire,21-25 assessment,26-28 test,29-32 scale,33,34 survey,35-38 instrument,39 or inventory.40
All of these terms are generally used to refer to any procedure that aims to obtain self-
reported data in education, psychology, public health, and other fields by people giving
answers to questions in a paper-pencil form. In this thesis, these terms are used
interchangeably when referring to any procedure aiming to obtain self-reported data from
participants.
The questionnaires used in skin cancer-related studies are commonly designed for people to
self-complete. Broadly similar questionnaires that aim to assess risk, attitudinal, or
behavioural dimensions have been used in different geographical, cultural contexts and
among differing sub-populations.41-44 However, the questionnaires often appear not to have
been developed according to current psychometric standards, including assessment of
objectivity (clear meaning, understood the same way by different people or subgroups of the
population); validity, reliability, stability, or sensitivity to change; error of measurement;
norms; or score comparability (for those measuring the same construct).45 The few
questionnaires that reported a limited set of psychometric properties were developed using
traditional test theory (classical test theory).25,33,46,47 Only one study48 was found that used
a more modern method, item response theory (IRT), to assess questionnaire quality
during measurement development. IRT is a group of mathematical models suitable for the
assessment of the quality of a questionnaire that aims to capture constructs such as people’s
attitudes, intentions, or self-reported behaviours.49 Attitudes, intentions, or self-reported
behaviours are also called latent traits, as they cannot be directly observed.
Figure 1.1-1 below helps to demonstrate how IRT works. Assume the line marked with an
arrow at either end represents the skin cancer risk continuum, and that three items are used
to measure skin cancer risk. Items are placed in order of difficulty/severity, with easier items
on the left and harder items on the right. “Easy” items are those that are likely to be answered
in the affirmative even by people with low skin cancer risk, and “hard” items are those that are
likely to be answered in the affirmative only by people with high skin cancer risk. The concept
of easy and hard items is similar to the concept of relative risk or risk ratio in epidemiology; a
“hard” item will have a larger risk ratio compared to an “easy” item. Consider, for example, a
polytomous item with five response categories measuring frequency of sunscreen use: never,
rarely, sometimes, often, and always. A person who chooses never will have a higher risk ratio
than a person who selects always. In IRT, the answers that people give to questions
place them along the unobservable latent trait continuum depending on their
characteristics. For example, Figure 1.1-1 displays a person with low skin cancer risk called
Alex. Hypothetically, he should have dark hair and be an indoor worker who has not been
sunburnt in the last five years. In contrast, Bob, a person with high skin cancer risk, is
hypothesised to have ancestors from Ireland, light skin, blue eyes, and many freckles.
This illustrates one of the main benefits of IRT: for each item, and indeed each answer
category of each item, one knows where on the underlying skin cancer risk continuum it
measures.
Figure 1.1-1: Illustration of IRT
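The placement of items and people on a single continuum can be sketched numerically. The following is a minimal illustration that assumes a dichotomous Rasch model; the item locations and the risk values assigned to Alex and Bob are hypothetical numbers chosen for illustration, not estimates from this thesis.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical item locations in logits, ordered from "easy" to "hard".
item_difficulties = {"item_1": -1.5, "item_2": 0.0, "item_3": 1.5}

# Hypothetical person locations in logits on the skin cancer risk continuum.
persons = {"Alex": -2.0, "Bob": 2.0}

for name, theta in persons.items():
    probabilities = {
        item: round(rasch_probability(theta, b), 2)
        for item, b in item_difficulties.items()
    }
    print(name, probabilities)
```

Under these assumed values, the person placed higher on the continuum has a higher probability of endorsing every item, and for both persons the probability falls as the item location moves to the right.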
As mentioned above, Queensland has the highest incidence of melanoma in the world,13,50-52
and is ideally placed to study the natural history, risk factors, and treatment patterns of skin
cancer. Research conducted in this area of high incidence, and therefore interest in skin
cancer more broadly, provides an ideal opportunity for initial item validation and assessment
of the psychometric properties of new measures and comparison with other skin cancer-related
measures previously used in Queensland, Australia, and worldwide.
1.2 BRIEF LITERATURE REVIEW
1.2.1 Skin cancer-related measures
Sun exposure, such as sunbathing, sun bed use, and other types of exposure to ultraviolet
radiation (UVR), as well as resulting sunburn are the major preventable risk factors for skin
cancer.12,53-56 Many studies have been conducted worldwide in an attempt to gather detailed
information regarding what contributes to skin cancer risk or protection, including sun
exposure behaviours, sun protection behaviours, and knowledge and concern about vitamin
D.8,10,13,57-59
Self-reported measures have been used frequently in those studies as the preferred method to
obtain skin cancer risk factor information, as they are considered more convenient and less
burdensome than other methods (sun diary, sunscreen swabbing, direct observation, or UV
personal dosimeter). This is evidenced in Table 1.1, which summarises the studies conducted
during the last 10 years that assessed psychometric properties of skin cancer-related
measures. These studies24,46,47,53 attempted to develop measures to assess skin cancer risk,
solar UVR exposure, or sun protection behaviour using classical psychometric theory. The
results in Table 1.1 show that only one study applied an IRT approach to assess the
psychometric properties of its measures.48
However, previous research has at least two major limitations. First, the majority of studies
were not comprehensive, and only focused on a few aspects of phenotype, sun exposure, or
sun protection behaviours, often due to time or budget constraints.39 Second, time and effort
spent on measure design and development was often limited, and little methodological
research has been conducted to ascertain the appropriate comprehensive validity and
reliability testing of the questions used in skin cancer research studies.60
Bränström et al60 and Horsburgh-McLeod et al48 are two examples of the few studies33,53,61-63
that have focused more strongly on psychometric properties in their work. Bränström et
al60 investigated the stability (test-retest reliability) of measuring behaviours and attitudes
related to sun exposure. They found that items assessing people’s skin type and tendency to
burn showed moderate stability (Kw = 0.67 to 0.81), while items assessing self-efficacy and
risk perception with regards to sunbathing were less stable (Kw = 0.40 to 0.73).
Horsburgh-McLeod et al48 applied IRT to a suntan attitude scale. They examined attitudes
toward sun-tanning among 6,200 New Zealand adults (15-69 years) using seven items answered on
five-point Likert-type scales. In this study, they used Rasch rating scale models to assess the
construct validity of the scale. Based on their results, the scale
had acceptable fit with the Rasch model (infit and outfit statistics within 0.6-1.4),64 with the
exception of one item (infit = 1.96; outfit = 2.20), which fell outside the acceptable range.
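To make the infit and outfit statistics concrete, the following minimal sketch computes both mean squares for a single item. It is a simplification: it assumes a dichotomous Rasch model rather than the rating scale model used in the study cited above, and the function names, person locations, and responses are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(responses, thetas, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        variances.append(p * (1.0 - p))          # model variance of the response
        squared_residuals.append((x - p) ** 2)   # squared raw residual
    # Outfit: unweighted mean of the squared standardised residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical person locations (logits) and responses to an item located at 0.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [0, 0, 1, 1, 1]     # endorsement rises with theta
unexpected_pattern = [1, 1, 0, 0, 0]   # endorsement falls with theta

print(item_fit(expected_pattern, thetas, 0.0))
print(item_fit(unexpected_pattern, thetas, 0.0))
```

Values near 1 indicate responses about as variable as the model predicts; the reversed pattern produces much larger mean squares, which is the kind of misfit flagged by values above the upper limit of the acceptable range.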
Although IRT has not been used extensively during questionnaire development for the
assessment of skin cancer risk, it is widely used in other areas. Several large-scale projects are
currently underway that have developed large databases of IRT-tested item banks. Examples
include the Patient-Reported Outcome Measurement Information System,65 Programme for
International Student Assessment,66 and Trends in International Mathematics and Science
Study.67 IRT is increasingly used in the development of scales in health research and other
fields, for example: (i) psychological scales such as the Postpartum Depression Screening
Scale,68 the Hamilton Depression Rating Scale,69 the Positive and Negative Syndrome Scale,70
and the NEO five factor inventory;71 (ii) selection tests such as the Test of English as a Foreign
Language,72 Graduate Record Examination,73 and Graduate Management Admission Test;74,75 and
(iii) licensure examinations such as the national nurse licensure examinations76 and National
Board of Medical Examiners tests.77
Table 1.1: Studies examining the psychometric properties of skin cancer-related measures
| No | Author | Ref | Measured variable(s) | Methods | No of items | Psychometric analysis performed |
|---|---|---|---|---|---|---|
| 1 | Oh et al., 2004 | 221 | Sun protection | Self-report | 43 items | Inter-rater agreement (Kappa): 0.76 to 0.97 |
| 2 | Tripp et al., 2003 | 33 | Sun protection | Self-report | 42 items | Confirmatory Factor Analysis: Sunscreen-use behavioural scale (CFI = 0.94; GFI = 0.93); Sun-avoidance scale (CFI = 0.91; GFI = 0.98) |
| 3 | Glanz et al., 2009 | 46 | Sunscreen use | Self-report; Diary; Swabbing | N/A | Inter-rater agreement (Kappa): Children 0.40; Lifeguards 0.34; Parents 0.27 |
| 4 | Jennings et al., 2012 | 53 | Sun exposure; Sun protective practices | Self-report | 15 items | Test-retest reliability (Kappa): 0.35 to 1.00; Construct validation (logistic regression); Internal consistency (Cronbach alpha): 0.77 to 0.80 |
| 5 | Bränström et al., 2002 | 60 | Behaviours and attitudes toward sun exposure | Self-report | 33 items | Test-retest reliability (Kappa and Pearson): 0.81, 0.88 and 0.71 |
| 6 | Dusza, Oliveria, Geller, Marghoob, & Halpern, 2005 | 62 | Sun exposure; Sun protective practices | Self-report | 10 items | Kappa: 0.52 to 0.73 |
| 7 | Horsburgh-McLeod et al., 2010 | 48 | Attitude toward suntan | Self-report | 7 items | Internal consistency (Cronbach alpha): 0.77; Validity (Spearman & Pearson); Rasch; 2-PL IRT models; 3-PL IRT models |
| 8 | Morze et al., 2012 | 63 | Phenotypic characteristics; Sun exposure | Self-report | 37 items | Intraclass correlation coefficient; Kappa: 0.87 |
| 9 | O'Riordan, Glanz, Gies, & Elliott, 2008 | 217 | Sun exposure; Sun protective practices | Self-report; Sunscreen swabbing; Direct observation; Diary; Polysulphone dosimeters | N/A | Kappa: 0.21 to 0.72; ANOVA |
| 10 | Cargill et al., 2013 | 165 | Sun exposure; Skin pigmentation | Sun diary; UV dosimeter | N/A | Correlation |
| 11 | Thieden, Philipsen, & Wulf, 2006 | 246 | Sun exposure | Sun diary; UV dosimeter | N/A | Mann-Whitney U; Wilcoxon |
| 12 | Hedges & Scriven, 2010 | 47 | Attitude and behaviour to sun protection | Interview; Observation | N/A | Chi-square |
| 13 | Humayun et al., 2012 | 24 | Sun exposure | Questionnaire (interviewer administered and self-administered); UV dosimeter | N/A | Correlation |
| 14 | Detert, Hedlund, Anderson, Rodvall, Festin, Whiteman, Falk | 61 | Sun exposure and protection index (SEPI); Readiness to Alter Sun Protective Behaviour questionnaire (RASP-B) | Questionnaire | SEPI: 8 + 5 items; RASP-B: 12 items | Cronbach alpha (0.69 – 0.73) |
1.2.2 Brief overview of Test Theory
The history of classical test theory (CTT) began in the early 20th century, when in
1904 Charles Spearman demonstrated how to obtain an index of reliability to correct
a correlation coefficient for attenuation due to measurement error.78 This work laid
the foundation for what later became known as the Spearman–Brown prophecy
formula.79 The CTT model has been used for around a century, as the model is simple
and many researchers know the basic terms and procedures, which makes classical
test theory easy to apply and interpret.
CTT is based on the proposition that the observed score is composed of a true score
(called a latent variable) and a measurement error; it postulates that if the
measurement error is zero then the observed score should be equal to the true score.79
CTT attempts to estimate the true score by reducing the measurement error of a test
as a whole. CTT contributes to the science of test development by providing a
framework to assess some aspects of the quality of measurement (e.g. test reliability).
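The decomposition of an observed score into a true score and an error can be sketched numerically. The following is a minimal sketch, not taken from the thesis: the function name `ctt_summary` and the variance values are hypothetical, and it relies on the standard CTT assumption that true scores and errors are uncorrelated, so that their variances add.

```python
import math


def ctt_summary(true_variance, error_variance):
    """Return observed variance, reliability, and standard error of measurement."""
    # With uncorrelated true scores and errors, the variances add.
    observed_variance = true_variance + error_variance
    # Reliability: proportion of observed variance due to true scores.
    reliability = true_variance / observed_variance
    # Standard error of measurement: one value for all score levels.
    sem = math.sqrt(observed_variance) * math.sqrt(1.0 - reliability)
    return observed_variance, reliability, sem


# Hypothetical variances on the raw-score scale.
observed_variance, reliability, sem = ctt_summary(true_variance=80.0, error_variance=20.0)
print(observed_variance, reliability, sem)
```

Because the standard error of measurement here depends only on test-level quantities, it is the same for every respondent, which is the constant-error property of CTT discussed in this section.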
Although CTT has been used extensively in test development and assessment of
quality of measurement, the method has shortcomings; CTT has weak assumptions
regarding its framework.80 In CTT, the individual ability parameter is dependent on
the particular questions administered in a given situation, and the item difficulty
parameter depends on the specific group of participants, which is assumed to be a
representative sample of a given population. These inherent characteristics of sample
dependence and item dependence in CTT make it impossible to predict an individual’s
response to an item unless that item has been previously administered to similar
individuals.81 Other limitations include: (a) an error estimate that is assumed to be
constant (common) across all raw scores; and (b) due to the focus on the overall test
score, the lack of a model (theory) that allows the prediction of the probability of
success or failure (or probability of endorsement) on a given item by an individual
with a given ability (latent construct) estimate.82 These limitations make CTT
unsuitable for advanced psychometric applications such as computer adaptive testing
or test equating (more details regarding applications of these methods are given on
pages 19-21), which require detailed information about each individual item’s performance.
IRT was first proposed by Georg Rasch in Denmark and Birnbaum in the United
States83 to overcome the limitations of CTT. It provides an alternative to CTT by
proposing a different approach to constructing new tests, modelling existing
scales/tests, interpreting the results of an assessment, and evaluating the quality of
measurement.80,84 IRT is a family of mathematical models that describe the
probability of a person giving a certain response to a question as a function of the
person’s position on the latent trait plus one or more parameters (1-parameter to
3-parameter logistic models are described on pages 25 to 28) characterising that
particular item.85 The history of IRT can be traced back to the work of Binet (1905)
and Thurstone (1925).86 For a brief history of IRT and a review of the reasons for the
transition from CTT, see Baker,87 Bock,86 and Hambleton.82
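The idea that the response probability is a function of the person's position and one or more item parameters can be sketched as a single function. This is a minimal sketch using the conventional logistic form; the parameter names `a` (discrimination), `b` (difficulty), and `c` (lower asymptote) follow common usage and are assumptions here, as are the numerical values.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Logistic item response function.

    With a=1 and c=0 this is the 1-parameter (Rasch-type) model, with c=0
    the 2-parameter model, and with all three free the 3-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical person location and item parameters (logits).
theta = 0.5
print(irt_probability(theta, b=0.0))                # 1-parameter model
print(irt_probability(theta, b=0.0, a=1.7))         # 2-parameter model
print(irt_probability(theta, b=0.0, a=1.7, c=0.2))  # 3-parameter model
```

Each additional parameter changes the shape of the curve (its slope or its lower bound) rather than the principle that the probability rises monotonically with theta.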
1.2.3 Differences between Classical Test Theory and Item Response
Theory
Classical test theory focuses on the analysis of the total score, keeping all items in a
predetermined order to retain the reliability of the whole test, and uses the frequency of
correct responses to indicate item difficulty (see page 7 for detail).79 Compared with
classical psychometrics, IRT has several advantages.88,89 Below are some of the key
differences between classical test theory and IRT:
1. Model: In CTT, the relationship between the number of items answered
and the underlying construct is assumed to be linear. IRT assumes the
relationship is nonlinear. As a consequence, the mathematical equation
describing the association between a respondent’s underlying level on a
latent trait and the probability of a particular item response follows a
nonlinear monotonic function.90 The correspondence between the
predicted responses to an item and the latent trait is known as the item
characteristic curve; more detail on the item characteristic curve is given
on pages 12-15. The number of item parameters considered in IRT
models depends on the model being used, as explained on pages 25-30.
2. Level of analysis: The advantages of IRT over CTT are specifically at the
item level.91 CTT usually focusses on the test as a whole. Internal
reliability indices (such as Cronbach alpha) are most frequently reported as
an indicator of overall test quality. Cronbach alpha indices can also be
used to indicate which items can safely be deleted without compromising
the reliability index, although this is rarely done in practice. In IRT, a
greater number of item characteristics are considered, including whether:
1) items fit a particular model (1-parameter logistic model,
2-parameter logistic model, 3-parameter logistic model, partial credit model
(PCM), rating scale model (RSM) or other); and 2) items advantage or
disadvantage certain groups of people (differential item functioning/DIF).
DIF analysis refers to differences in the way a test item functions across
different subgroups of participants (e.g. male and female) that are matched
(equal) on the attribute measured by the test.92 Consideration of DIF is
important to establish the adequacy of each question for use in diverse
populations.93 In the context of skin cancer and sun protection behaviour,
persons of different ages, education levels, and genders who have equal
levels of sun protection should be equally likely to endorse a particular
category of a specific sun protection item. For example, males and females
who are equal in their levels of sun protection should be equally likely to
respond ‘‘yes” to the item: “Do you routinely apply sunscreen, including
moisturisers or makeup with a sun protective factor, regardless of whether
or not you are going out in the sun?”. However, the literature shows that
males and females differ in their sun protection behaviour, especially in
sunscreen use,94 and differential item functioning may be expected on
items similar to this example. During item development, care must
therefore be taken to construct well performing items, which allow people
with the same risk (in item response theory usually termed “ability” or
symbolised with the Greek letter θ) to not differ according to their gender
or other characteristics not associated with the latent construct. Ideally, the
probability of endorsing a specified question response should be
independent of subgroup membership.95,96 To illustrate this, Figure 1.1-2
shows the item characteristic curve (explained in more detail on pages 12-
14) of two subgroups of participants. The blue line represents the reference
group (outdoor workers) and the red line represents the focal group (indoor
workers). The blue line always remains above the red line, which means
that outdoor workers always have a higher probability of endorsing this
particular item than indoor workers, regardless of
their skin cancer risk. When items function similarly across demographic
groups (do not exhibit differential item functioning), direct comparison of
group scores is justified. If an item is differentially more difficult to
endorse for an identifiable subgroup, the item may be measuring
something different from the intended construct, at least in one of the
groups. As a result, DIF statistics are used to identify potential sources of
item bias.97-99 Subsequent review by subject matter experts and bias
committees is required to determine and resolve the source of attitude or
behaviour differences. Items with high DIF either need to be changed,
dropped or split for each group.
There are two types of differential item functioning: the first is called
“uniform differential item functioning”,92 where differential item
functioning is consistent across the range of the domain being measured.
Figure 1.1-2 shows a uniform DIF where the reference group is favoured
at all levels.
Figure 1.1-2: Uniform differential item functioning
The second type of DIF is “non-uniform differential item functioning”,92
where the impact varies at different levels of the construct being
measured. As shown in Figure 1.1-3, in a non-uniform DIF curve, the focal
group is favoured at low theta (skin cancer risk construct); however, the
reference group is favoured at high theta. For example, if males are the
reference group and females the focal group, males have a lower
probability of endorsing the item than females at low theta, but
a higher probability than females at high theta.
Figure 1.1-3: Non-uniform differential item functioning
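The two types of DIF described above can be illustrated with a pair of item characteristic curves per case. This is a minimal sketch assuming a 2-parameter logistic item response function; all parameter values are hypothetical and chosen only so that the curves are parallel in the uniform case and cross in the non-uniform case.

```python
import math


def icc(theta, a, b):
    """2-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF: same discrimination, different difficulty, so one group is
# favoured at every level of theta.
reference_uniform = [icc(t, a=1.0, b=-0.5) for t in thetas]
focal_uniform = [icc(t, a=1.0, b=0.5) for t in thetas]

# Non-uniform DIF: different discrimination, so the curves cross and the
# favoured group changes along theta.
reference_nonuniform = [icc(t, a=2.0, b=0.0) for t in thetas]
focal_nonuniform = [icc(t, a=0.5, b=0.0) for t in thetas]

for t, r, f in zip(thetas, reference_nonuniform, focal_nonuniform):
    print(t, round(r, 3), round(f, 3))
```

With these assumed parameters, the focal group has the higher endorsement probability below the crossing point and the reference group has the higher probability above it, mirroring the pattern described for Figure 1.1-3.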
3. Model assumptions: According to Hambleton & Jones,80 the CTT model
rests on a group of weak theoretical assumptions, which makes it easy to apply in
many test construction and test utilisation settings.100,101 The assumptions in
CTT are that: (a) true scores and error scores are uncorrelated, (b) the
average error score in the population of examinees is zero, and (c) error
scores on parallel tests are uncorrelated.80,102 In contrast, IRT models are
referred to as strong models,103 as the underlying assumptions are strict,
and therefore less likely to be met by test data. Most applications of IRT
assume unidimensionality of the latent construct being measured, and all
models require local independence of each item.104 Unidimensionality
means that only one underlying construct is measured by the items in a
scale. Local independence means that the items are not highly correlated
with each other once the latent trait has been controlled for. In other
words, local independence is obtained when the complete latent trait space
is specified in the model.84,105 If the assumption of unidimensionality
holds, then only the underlying latent trait is influencing item responses
and local independence is obtained.
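Local independence can be illustrated with a small calculation. The sketch below assumes a dichotomous Rasch model and a population in which each listed theta value is equally likely; the function names and all values are hypothetical. Responses are taken to be independent given theta, and the function returns the covariance between two item responses in the population.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def response_covariance(thetas, b1, b2):
    """Covariance of two item responses when the listed thetas are equally likely.

    Assumes the two responses are independent once theta is fixed.
    """
    n = len(thetas)
    p1 = [rasch_probability(t, b1) for t in thetas]
    p2 = [rasch_probability(t, b2) for t in thetas]
    joint = sum(x * y for x, y in zip(p1, p2)) / n   # P(both endorsed)
    return joint - (sum(p1) / n) * (sum(p2) / n)


# People at different trait levels: the items are correlated overall.
print(response_covariance([-1.0, 1.0], b1=0.0, b2=0.0))

# People at a single trait level: the correlation disappears.
print(response_covariance([0.5], b1=0.0, b2=0.0))
```

The items covary across a mixed population only because both depend on theta; once theta is held fixed, no association remains, which is what controlling for the latent trait means.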
4. Item-ability (response) relationship: The relationship between item and
ability (response) in CTT is not specified; however, in item response
models, this relationship must follow a specific item response function. In
CTT, the relationship between item and ability (item characteristics) might
change depending on the population administered a questionnaire.80,106
Borrowing an example from educational assessment, if a high-ability
subpopulation (e.g. high-achieving students) answered a test, all items
would appear to be easy. On the other hand, when a low-ability
subpopulation (e.g. low-achieving students) is considered, the same set of
items would be classed as “difficult”. This limitation makes it difficult to
assess individuals’ abilities by using different test-forms. The terms “easy
item”, “hard item” and “item difficulty”, which are used frequently
throughout this thesis, stem from the educational assessment literature,
where IRT was developed, and are commonly used.107 “Easy items” are
items that are likely to be answered correctly by most participants and
are therefore most useful in identifying the people who have lower
abilities. Meanwhile, “difficult items” are likely to be answered correctly
only by a small number of high-performing individuals and are useful in
identifying people who have high abilities. In health-related studies, easy
items are the items that are likely to be answered in the affirmative by most
participants, and hard items are the items likely to be answered in the
affirmative only by people with certain characteristics, for example, only
those with high skin cancer risk.
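The sample dependence of CTT item difficulty can be shown with one item and two groups. This is a minimal sketch assuming a dichotomous Rasch model; the group compositions and the item location are hypothetical. The CTT difficulty index is taken as the expected proportion of the group answering correctly (or endorsing the item).

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct (or affirmative) response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def proportion_correct(thetas, difficulty):
    """Expected CTT difficulty index (proportion correct) for a group."""
    return sum(rasch_probability(t, difficulty) for t in thetas) / len(thetas)


item_location = 0.0  # the same item, fixed on the logit scale

# Hypothetical high-ability and low-ability groups (logits).
high_ability_group = [0.5, 1.0, 1.5, 2.0]
low_ability_group = [-2.0, -1.5, -1.0, -0.5]

print(proportion_correct(high_ability_group, item_location))
print(proportion_correct(low_ability_group, item_location))
```

The item location on the latent scale is identical in both calls, yet the proportion correct makes the item look easy in one group and difficult in the other.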
5. The central concern in IRT is the relationship between the latent construct
(trait) being measured and the probabilities of respondents endorsing each
of the item’s response categories. In order to show this relationship, an
item response function can be drawn, called an item characteristic
curve.
Figure 1.1-4 below illustrates the item characteristic curves for two
dichotomous items measuring sun protection behaviours. This example
uses the 3-parameter logistic (3-PL) model item response function
discussed further on page 28. The horizontal axis is the underlying
construct being measured on a logit scale; in this example, it is assumed
to be sun protection behaviour. A positive score means better sun
protection behaviour. In educational assessment (the origin of IRT), the
vertical (y) axis represents the probability of answering the item correctly;
however, in other fields such as psychology and health, it represents the
probability of endorsing an item. Each plot (item characteristic curve)
represents the model’s prediction of the probability of answering “Yes” to
an item about sun protection behaviour. The figure shows that the
probability of answering yes to item 1 (use sunscreen) is higher than to
item 2 (wear a hat). Item 1 can be said to be the easier item to endorse, as
the probability of answering yes to the question is already high for people
who are low in their overall sun protection behaviour. In contrast, item 2
(wear a hat) can be said to be the harder item to endorse, as only people
with high sun protection behaviours are likely to say yes to that item.
Figure 1.1-4: Item Characteristic Curve from 2 dichotomous items
It can also be seen that item 1 (use sunscreen) is more informative for
people with poor (low) sun protective behaviour, and item 2 (wear a hat) is
the more informative item for people with good (high) sun protective
behaviour. This information about items’ measurement location allows the
administration of items with maximum information only for particular
groups of people, and is commonly applied in computer adaptive testing,
discussed on pages 19-20 and in Study 3 (page 82).
For polytomous items such as Likert scales, the item characteristic curves
are more complex. In polytomous IRT models, the response function is
called a category response function. In Figure 1.1-5, each curved line
represents the model’s estimate of the probability of endorsing each
response category of a given item according to overall sun protection
behaviour. The horizontal axis is the scaled score of sun protection
behaviour on a T scale (a standardised scale with a mean of 50 and a
standard deviation of 10). A higher score means better sun protection
behaviour. The vertical axis is the probability of endorsing a category,
ranging from 0 to 1.
How often did you apply sunscreen during the past year?
Figure 1.1-5: Category response function of a polytomous item
Item-ability relationships can also be observed by plotting item
information functions, as shown in Figure 1.1-6, in which each curved line
represents the item information function for one item. Each item provides
different amounts of information at different trait (θ) levels. For
example, item 2 (pink line) provides the most information for people with
θ around 1; in contrast, item 5 provides the most information for people
with θ around 2. It is useful to include a range of items in a scale to
ensure coverage of a wide range of θ. CTT cannot provide such detailed
information about the optimal measurement location for each item.77
Figure 1.1-6: Item information functions for 8 items
6. Ability: In CTT, ability (θ) or test scores are often reported on the
test-score scale (or a transformed test-score scale such as T scale,
Stanine, etc.) and are usually calculated by summing or averaging the
scores from each item. Every person must answer the same items and complete
all of them in the same order to allow comparison of scores.79,80 In
contrast, in IRT, ability scores are reported as theta (θ) scores ranging
from −∞ to +∞ (or a transformed scale). Theta scores can still be
compared, even when people answered different items, as long as those items
were calibrated on the same scale and their item information functions are
known.49,84,108,109 This leads to the next point: invariance of item and
person statistics.
7. Invariance of item and person statistics (only applies to the Rasch
model): In CTT, item and person parameters are sample dependent.80,84
This means that item difficulties depend on the ability of the sample
answering the questions, and a person’s score depends on the number and
difficulty of the items answered by that person. For example, because CTT
item difficulty is calculated as the number of persons answering the item
correctly divided by the total number of participants,102 the same items
will have different item difficulties when administered to a group of
people who like to sunbathe than when administered to sun avoiders. In
Rasch models, the item and person parameters are sample independent (a
person’s position on the latent trait can be estimated from any items
with known item response functions, and item characteristics are
population-independent within a linear transformation),80,84,101 if the test
data fit the model. This means that the person’s ability (score) does not
depend on the particular items administered. A person can answer any
combination of items from the item bank and should still receive an
equivalent score.
8. Item statistics: p (item difficulty) and r (item-total correlation) are
usually reported in CTT.80,102 Item difficulty is the proportion of people
endorsing an item, or the prevalence of exposure, and depends on the
calibration sample (see point 7 above). In item response models, b (item
difficulty parameter), a (item discrimination parameter), and c
(pseudo-guessing parameter), plus the corresponding item information
functions, are reported87,104 (as described on page 28).
9. Sample size requirements: CTT usually requires a sample of 200 to 500
to allow adequate item parameter estimation;80 however, due to the greater
number of parameters, item response models require larger samples
(generally over 500) depending on the model being used.80 More complex
models will require even larger samples to allow robust item parameter
estimation.104 This is one of the drawbacks of IRT; recruiting a large
sample can be challenging, especially in clinical research.110
10. Assumption of equal response category distance: In CTT, the distances
between successive response categories are assumed to be equal, while in
IRT the distances between successive response categories are derived from
the data.111 For instance, the distance between “Strongly Disagree” and
“Disagree” may not be the same as the distance between “Agree” and
“Strongly Agree”. Consider a four-point scale on which an individual is
asked to indicate their lifetime sunburn with 1 (never), 2 (seldom), 3
(sometimes) and 4 (often) as the possible answer categories. An individual
who was never exposed long enough to get sunburnt would likely indicate a
very low frequency of sunburn (e.g., a response of 1). As the individual’s
frequency and duration of sun exposure increase past a threshold value,
he/she will likely endorse the next highest category. IRT (especially rating
scale models) does not assume that these steps (thresholds) are equally
spaced. In other words, a relatively small increase in sun exposure duration
and frequency may be enough for a person to cross the threshold from
choosing ‘never’ to choosing ‘seldom’, but a much larger increase may
be required for a person to choose ‘often’ rather than ‘sometimes’. IRT
specifically assesses these distances112-114 (example displayed in Figure
1.1-5).
11. Standard Error of Measurement: CTT assumes that the standard error of
measurement (SEM = s√(1 − r)) is constant across all levels of ability
(latent trait),106,115 so people with low, medium, or high ability are
assigned the same standard error of measurement. In contrast, as shown in
Figure 1.1-7, the SEM in IRT is conditional (dependent) on a person’s
ability. This conditional SEM is inversely related to the test information
function (SEM(θ) = 1/√I(θ), where I(θ) is the test information at θ), and
estimates the amount of error in theta estimation for each level of theta.
Because test information is summed across the items, the conditional SEM
provides a useful index of the amount of measurement error from a test.115
Usually the conditional SEM is high at both ends of the score range and
low in the middle; this is because the conditional SEM increases as test
information decreases, and test information is usually lowest at both ends
of the continuum. Assume Figure 1.1-7 measures skin cancer risk and people
can score between -3 and +3 theta. IRT shows that the standard error of
measurement of the scale is much greater between -3 and -1 and also between
+2 and +3, but is small between -1 and +2. The test information function in
Figure 1.1-7 shows that the highest level of information (or certainty in
the estimate) is for people who score between -1 and +2. The conditional
SEM provides a good summary overview for a questionnaire developer, clearly
indicating where additional items require development, and also allows the
creation of a tailored test with a pre-specified acceptable measurement
error.80,115
Figure 1.1-7: Standard error of measurement (curves shown: standard error of measurement and test information function)
12. Number of response categories: In CTT, responses to several Likert-like
items can only be summed providing that all of the items use the same
Likert scale (e.g., a four-point Likert scale and a six-point Likert scale
cannot be mixed to calculate an average).79,102,116 However, in IRT, certain
models, as described on page 29, allow the use of different response
categories in the same scale.106 This is important, as a questionnaire
sometimes consists of items measuring the same underlying construct on
different Likert scales.
13. Detection of redundant items using item statistics: IRT models can
detect redundant items from their item parameters,83 in contrast with item
statistics of CTT, which only provide information regarding how strong
the correlation between the item and total score is, or how well the item
discriminates between people with low or high levels of the construct
being measured.79 Items that have the same item parameter location (b
parameter) on the latent trait continuum can be interpreted as measuring
the same level of latent construct and seen as redundant. In developing an
item bank, these redundant items can be useful as alternative items in
parallel forms or to enhance the items available for CAT.
14. Test administration: Most currently used health-assessment
questionnaires are based on CTT.117 Therefore, all items or questions must
be administered to every person tested in order to retain the validity of the
scale. Missing items or responses will be an issue when determining the
total score of an individual.118 An advantage of IRT over CTT is the ability
to administer the test using computer adaptive testing (CAT). This mode of
test administration enables shortening of the test or use of a tailored test
(where individuals may receive different items targeted to their specific
health risk level).119,120 CAT-based item administration results in shorter
assessments without the trade-off of losing measurement precision.121,122
However, a fully functioning CAT requires a large, calibrated item
bank,123 and is considered costly to develop.
Another important component related to test administration using CAT
methods is the stopping rule. One of the most common stopping rules is to
stop once the SEM falls below a pre-specified value.124 As discussed earlier, in IRT the SEM can
vary across different trait levels.125 Figure 1.1-8 illustrates the use of a
conditional SEM as one of the stopping rules when using CAT for
diagnostic purposes. The red line represents the cut-off score, the black
squares represent the participant’s latent trait estimate, and black vertical
lines represent the confidence intervals around these measurements. For
example, the first item in a CAT measuring skin cancer risk is the
following: “Thinking about ALL of the times when you were outside in the
sun during the past year, how often did you apply sunscreen?” Assuming
that this person answers the first question by selecting the “Always”
category, this will result in their estimated skin cancer risk being
calculated as below average. As the person answers more and more
questions, the confidence interval around the point estimate for this
person’s skin cancer risk gets smaller as the risk estimate becomes more
precise. Around item 30, the confidence interval is clearly below the cut-
off score, but further questions are still being presented to cover all
content. In CAT, other stopping rules could be selected, for example, a
desired confidence interval, a minimum or maximum test length, or a time
limit.126 In many operational CAT programs, a combination of two or more
stopping (test termination) rules is used, usually a minimum standard error
criterion and a maximum test length. The maximum test length serves to
ensure that the entire item bank is not administered.124
Figure 1.1-8: Illustration of computer adaptive testing
15. Test linking and equating: If two or more questionnaires measure the
same concepts, it can be advantageous to link or equate them.127,128 The
objective of test linking is to establish a common reporting metric between
two or more tests that allows for the prediction of success (or endorsing in
non-cognitive items) on construct-linked items.129,130 Examples of recent
linking studies in patient reported outcome research are: 1. Linking
between the physical and mental health scores on the Veterans RAND 12-
Item Health Survey and the Patient Reported Outcomes Measurement
Information System Global Health scores,131 and 2. Linking between NIH
Patient Reported Outcomes Measurement Information System Physical
Function item bank and the Short Form-36 physical function ten-item PF
scale.132 Only traditional equating methods (linear linking and
equipercentile equating) can be performed under a CTT approach. This
can be a limitation, as these methods require that the same person
completes both tests.133 Non-linear linking and equating based on item
response models do not require the same person to complete both tests; they
only require some common items (anchor items for calibration) in each of
the tests to be completed.128 Although this method is frequently used in
educational assessment, to date no study in skin cancer research has
applied it.
Figure 1.1-9 demonstrates how test linking with a single-group design could
be performed in skin cancer research: three scales that measure aspects
of the same underlying latent construct (e.g. sun exposure behaviours) are
completed by participants.128 Using IRT approaches, these scales can be
linked and placed on a common scale, as long as they have some
common items. This means that once these linking functions are established,
scores from one scale can be converted to another.
Figure 1.1-9: Sample of test linking
Table 1.2 below provides a summary of methodological distinctions between CTT
and IRT.80
Table 1.2: Differences between Classical and Item Response Theories
Area | Classical test theory | Item response theory
1. Model | Linear | Non-linear
2. Level | Test | Item
3. Assumptions | Weak (i.e., easy to meet with test data) | Strong (i.e., difficult to meet with test data)
4. Item-ability relationship | Not specified | Item characteristic functions
5. Ability | Test scores or estimated true scores are reported on the test-score scale (or a transformed test-score scale) | Ability scores are reported on the scale −∞ to +∞ (or a transformed scale)
6. Invariance of item and person statistics | No: item and person parameters are sample dependent | Yes: item and person parameters are sample independent, if the model fits the test data
7. Item statistics | p (item difficulty); r, the item discrimination index (item-total correlation) | b, a, and c (for the three-parameter model) plus corresponding item information functions*
8. Sample size (for item parameter estimation) | 200 to 500 (in general) | Depends on the IRT model, but larger samples (i.e., over 500) are generally needed
9. Assumptions of item/response category distance | Equal distances assumed | Distances not assumed equal; true interval-level measurement
10. Standard error of measurement (SEM) | Constant across ability | Conditional on person ability
11. Number of response categories | Same response categories | Possible to use different response categories in the same scale
12. Detection of redundant items using item statistics | Not possible | Possible
13. Test/scale administration | All items must be administered | Possible to administer a shorter or tailored test (computer adaptive test)
14. Test linking and equating | Traditional equating methods: linear linking and equipercentile equating | Non-linear equating: IRT linking and IRT equating
*b = item difficulty, a = item discrimination, c = item guessing
Given this summary it could be beneficial to apply an IRT approach for the
evaluation of psychometric properties of questionnaires used in skin cancer research.
In addition, there would be merit to developing a scale measuring skin cancer risk
(phenotype, sun exposure behaviours, and sun protection behaviours) using modern
test theory. By using IRT models, a precise measurement with small standard errors
could be achieved, requiring fewer items to be completed, thus significantly reducing
the participants’ burden and the potential sample size required in future studies.134-136
1.2.4 Test Development using Classical Test Theory and Item Response Theory
In test development, CTT and IRT share many steps but differ in others.
Table 1.3 displays the typical steps in test development. Important differences
between test development using classical and item response measurement theories
occur at steps 3, 5, and 9.80
Table 1.3: Steps in test development
Steps in test development
Step 1 Preparation of test specifications
Step 2 Preparation of the item pool
Step 3* Field testing the items
Step 4 Revision of the test items
Step 5* Test development
Step 6 Pilot testing
Step 7 Final test development
Step 8 Test administration (for norming and technical data)
Step 9* Technical analyses (e.g., compiling norms, standard setting, and equating scores)
Step 10 Preparation of administrative instructions and technical manual
Step 11 Printing and distribution of tests and manuals
In step 3, test developers applying CTT are concerned with how well the
field-test sample represents the overall population for whom the test is
intended.137,138 They use simple mathematical techniques, moderate sample
sizes, and heterogeneous samples to achieve higher estimates of item
discrimination indices, as measured by biserial or point-biserial
correlation coefficients.79 In contrast, test developers applying IRT
require complex mathematical techniques and large sample sizes.
In step 5, under CTT, items are selected for the test based on two indices:
item difficulty (prevalence of exposure) and item discrimination. An item
with too high or too low an item difficulty is considered a poor item and
must be removed from the test to maximise discrimination among all test
takers. In IRT, by contrast, items are selected based on goodness-of-fit
criteria to detect those that do not fit the specified response model. Test
developers can determine the contribution of each test item to the test
information function independently of other items in the test. This means
that test developers can create multiple forms of a test to maximise test
information targeted at specific regions of the latent construct.81,109,139
Items at either extreme of the latent construct may still be valuable, even
if they are relevant to only a few people, because they help to
discriminate among those people.
In step 9, a test developer using CTT will typically compile norms based on specific
demographic information, such as gender and age. A person’s score must be
compared against the performance results of a selected group of participants who
have already taken the test.79,140 In contrast, in IRT, a person’s score is usually
interpreted with regard to their level of proficiency (in achievement tests) or
severity (in health-related tests) and the cut scores corresponding to those levels.
This process is known as standard setting.141,142
In summary, the major differences in test development using CTT and IRT are in
item calibration, selection, and scoring processes.80
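The IRT approach to item selection in step 5, where each item's contribution to test information can be evaluated independently, can be sketched as follows. This is a minimal illustration assuming Rasch items, whose information is P(1 − P); the item identifiers, difficulties and function names are hypothetical, and an operational procedure would also apply the goodness-of-fit screening described above.

```python
import math

def rasch_info(theta, b):
    """Information of a Rasch item with difficulty b at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_items(bank, target_theta, k):
    """Return the k item ids that are most informative at target_theta."""
    ranked = sorted(bank,
                    key=lambda item: rasch_info(target_theta, bank[item]),
                    reverse=True)
    return ranked[:k]

# Hypothetical calibrated item bank: item id -> difficulty in logits.
bank = {"item_a": -2.0, "item_b": -0.5, "item_c": 0.0,
        "item_d": 1.0, "item_e": 2.5}

high_form = select_items(bank, 2.0, 2)   # form targeted at high trait levels
low_form = select_items(bank, -1.5, 2)   # form targeted at low trait levels
print(high_form, low_form)
```

Because the contribution of each item is computed on its own, the same bank yields different forms for different target regions, and items at the extremes are retained for the forms where they are most informative.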
Popular models in item response theory
More than ten IRT models143 have currently been defined, as shown in Table 1.4;
however, only a few models are frequently implemented in applied research. Those
models are the Rasch model/1-parameter logistic model, 2-parameter logistic model,
3-parameter logistic model, rating scale model, and partial credit model. Each of
those frequently used models is discussed in the following paragraphs.
Table 1.4: Taxonomy of IRT Models
Dichotomous Data | Polytomous Data
1-Parameter Logistic Model / Rasch model (1-PLM) | Rating Scale Model (RSM)
2-Parameter Logistic Model / Birnbaum model (2-PLM) | Partial Credit Model (PCM)
3-Parameter Logistic Model (3-PLM) | Generalised Partial Credit Model (G-PCM)
4-Parameter Logistic Model (4-PLM) | One-Parameter Logistic Model for polytomous items (OPLM-po)
One-Parameter Logistic Model with Imputed Slopes (OPLM) | Rating Scale version of the Graded Response Model (RS-GRM)
1-Parameter Normal Ogive Model (1-PNOM) | Graded Response Model (GRM)
2-Parameter Normal Ogive Model (2-PNOM) | Model of Monotone Homogeneity for polytomous items (MHM-po)
3-Parameter Normal Ogive Model (3-PNOM) | Weak Double Monotonicity Model (WEAK DMM)
Model of Monotone Homogeneity for Dichotomous Data (MHM-di) | Strong Double Monotonicity Model (STRONG DMM)
Model of Double Monotonicity for Dichotomous Data (DMM-di) | Isotonic Ordinal Probabilistic Model (ISOP)
1.2.5 1-Parameter Logistic (1-PL) Model or The Rasch Model
The simplest, and one of the most widely used, IRT models is the 1-parameter logistic
model (1-PL model), also called the Rasch model. Although the Rasch model was
derived from an initial Poisson model and differs conceptually from the
one-parameter logistic model, for most practical purposes these models are identical. In
the 1950s, the Danish mathematician Georg Rasch developed this model for reading
tests, together with a model for intelligence and achievement tests, which is now
called the Rasch model. It is called a one-parameter model because it is concerned with
only a single item parameter (i.e., the item difficulty (b) parameter). Under the Rasch
model, guessing is treated as negligible and discrimination as constant across items. It
predicts the probability of a response to an item based on the interaction between item
difficulty and individual ability.144 The 1-PL model can be expressed by Equation 1 below.144
Equation 1
$$P_i(\theta) = \frac{e^{(\theta - b_i)}}{1 + e^{(\theta - b_i)}}$$
Where Pi(θ) is the probability of an individual with a given ability theta (θ)
correctly answering (or endorsing) a particular item i with difficulty level bi (b
parameter), and e is the base of the natural logarithm. This probability represents the
interaction between the person’s ability and the item difficulty. An item (question)
difficulty (threshold) represents the position in logits that the item occupies on the
linear skin cancer risk scale.145 It can be displayed visually as an item characteristic
curve (also called an item response function or trace line), as explained previously.
Figure 1.1-10 presents an example of item characteristic curves for the 1-PL model. The
four curves in the figure represent four items with different difficulties. It can be seen
that for a person with a given ability in the range -3 to +3, the probability of getting
the answer correct is determined only by each item’s difficulty or location on the latent
construct continuum (b1= -2, b2= -1, b3=1, and b4=2).
Figure 1.1-10: One-Parameter Item Characteristic Curves for Four Typical Items
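As a rough illustration of Equation 1, the following Python sketch evaluates the 1-PL item response function for the four difficulty values quoted for Figure 1.1-10. The function name and the chosen θ values are hypothetical choices made for this example only.

```python
import math

def p_1pl(theta, b):
    """Equation 1: probability of endorsing an item with difficulty b."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Difficulties of the four items described for Figure 1.1-10.
difficulties = [-2.0, -1.0, 1.0, 2.0]

for theta in (-3.0, 0.0, 3.0):
    probs = [round(p_1pl(theta, b), 3) for b in difficulties]
    print("theta =", theta, "->", probs)
```

When a person's θ equals an item's difficulty, the predicted probability is exactly 0.5, and at any given θ the items with lower difficulty have the higher probability of endorsement, so the four curves never cross.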
1.2.6 2-Parameter Logistic (2-PL) Model
An extension of the 1-PL model is the two-parameter logistic (2-PL) model.
This model was originally developed by Lord146 based on the normal ogive function
(the curve of a cumulative distribution function); Birnbaum147 then
proposed a logistic function as a simpler alternative, because calculation using the
normal ogive function was considered too computationally demanding for the
computers of that era (1960s).
In addition to item difficulty (b parameter), this model incorporates the item
discrimination parameter (usually denoted a) as a second item parameter. Item
discrimination (a parameter) is represented by the slope of the item characteristic
curve. The usual range for item discrimination parameters is (0, 2). High values of the a
parameter result in very “steep” item characteristic curves, and such items are more
discriminating than items with flatter curves. The equation for the 2-PL model144 is
Equation 2
$$P_i(\theta) = \frac{e^{D a_i(\theta - b_i)}}{1 + e^{D a_i(\theta - b_i)}}$$
Where Pi(θ) and bi are defined just as in Equation 1. D is a constant scaling factor
that makes the logistic function as close as possible to the normal ogive function: the
difference in Pi(θ) between the two-parameter normal ogive function and the
2-parameter logistic function is less than 0.01 when D=1.7. The additional element in
the two-parameter model is ai, the item discrimination parameter (slope). The item
characteristic curves in Figure 1.1-11 facilitate understanding of the 2-PL model.
Figure 1.1-11: 2-Parameter Item Characteristic Curves for Four Typical Items
Figure 1.1-11 shows that, unlike the item characteristic curves of the 1-PL model, in
which the curves have the same slope (usually fixed to 1, see Figure 1.1-10), the curves
in the 2-PL model have different slopes.
Figure 1.1-11 presents four items with different item discriminations. For item 1,
b1=1.0 and a1=1.0; for item 2, b2=1.0 and a2=0.5; for item 3, b3= -1.0 and a3=1.5; for
item 4, b4=0 and a4=1.3. Items 1 and 2 have the same item difficulty (b=1) but
differ in item discrimination (a1=1.0 and a2=0.5); thus, item 1 has higher
discrimination than item 2. In the 2-PL model, the probability that a person with a
given ability level gets the right answer is determined by the item difficulty (b
parameter) and the item discrimination (a parameter) simultaneously.
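The effect of the discrimination parameter can be checked numerically with a short Python sketch of Equation 2, using the parameter values listed for Figure 1.1-11. The item information formula used here, (D·a)²·P·(1 − P), is the standard result for the 2-PL model and is not stated in the text above; the function names are hypothetical.

```python
import math

D = 1.7  # scaling constant from Equation 2

def p_2pl(theta, a, b):
    """Equation 2: 2-PL probability for discrimination a and difficulty b."""
    z = D * a * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))

def info_2pl(theta, a, b):
    """Standard 2-PL item information: (D * a)^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)

# (a, b) pairs for items 1 to 4 as listed for Figure 1.1-11.
items = {1: (1.0, 1.0), 2: (0.5, 1.0), 3: (1.5, -1.0), 4: (1.3, 0.0)}

# Items 1 and 2 share b = 1.0, so both give P = 0.5 at theta = 1.0,
# but the steeper item 1 separates people above and below that point better.
for theta in (0.0, 1.0, 2.0):
    print(theta,
          round(p_2pl(theta, *items[1]), 3),
          round(p_2pl(theta, *items[2]), 3))
print("information ratio at theta = 1:",
      info_2pl(1.0, *items[1]) / info_2pl(1.0, *items[2]))
```

At θ = 1 the two items have the same probability of endorsement, yet item 1 carries four times the information of item 2 because information scales with the square of the discrimination parameter.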
1.2.7 3-Parameter Logistic (3-PL) Model
The three-parameter logistic (3-PL) model extends the 2-PL model by adding a third
item parameter, a guessing parameter (also known as the pseudo-chance parameter,
usually denoted c). The c parameter is the probability of endorsing the item for a
person with “zero symptoms”; it is the lower asymptote of the item characteristic curve
as θ approaches −∞ on the horizontal axis. The 3-PL model144 can be expressed by the
following equation.
Equation 3
$$P_i(\theta) = c_i + (1 - c_i)\frac{e^{D a_i(\theta - b_i)}}{1 + e^{D a_i(\theta - b_i)}}$$
Where Pi(θ), bi, ai and D are defined as for the 2-PL model and ci
(pseudo-chance) is the third item parameter. Figure 1.1-12 shows example item
characteristic curves for the 3-PL model. For item 1, b1=1.0, a1=1.8 and c1=0.1; for item
2, b2= -1.5, a2=1.8 and c2=0.5; for item 3, b3=1.0, a3=1.8 and c3=0.3; for item 4,
b4=2.0, a4=1.8 and c4=0.1. Items 1 and 4 have the same item discrimination
and pseudo-chance parameters (a=1.8 and c=0.1), but they differ in item difficulty
(b1=1.0 and b4=2.0); meanwhile, items 1 and 3 have the same item
difficulty (b=1.0) and item discrimination (a=1.8), but differ in the pseudo-chance
parameter (c1=0.1 and c3=0.3). It can be seen that the probability of getting
these two items right by guessing for a person with “zero ability” is different.
Figure 1.1-12: 3-Parameter Item Characteristic Curves for Four Typical Items
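The role of the pseudo-chance parameter can be illustrated with a minimal Python sketch of Equation 3, using the parameter values listed for Figure 1.1-12. The function name and the θ values at which the curve is evaluated are hypothetical choices for this example.

```python
import math

D = 1.7  # scaling constant, as in the 2-PL model

def p_3pl(theta, a, b, c):
    """Equation 3: 3-PL probability with lower asymptote c."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

# (a, b, c) triples for items 1 to 4 as listed for Figure 1.1-12.
items = {1: (1.8, 1.0, 0.1), 2: (1.8, -1.5, 0.5),
         3: (1.8, 1.0, 0.3), 4: (1.8, 2.0, 0.1)}

# Far below the item difficulty the curve flattens out at c ...
print(round(p_3pl(-10.0, *items[1]), 3), round(p_3pl(-10.0, *items[3]), 3))
# ... and at theta = b the probability is c + (1 - c) / 2 rather than 0.5.
print(round(p_3pl(1.0, *items[1]), 3), round(p_3pl(1.0, *items[3]), 3))
```

Items 1 and 3 share the same difficulty and discrimination, so the difference between their curves at very low θ comes entirely from c.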
1.2.8 Partial Credit Model
The partial credit models (PCM) [148] are an extension of the dichotomous Rasch
model to polytomous data (items with more than two response categories). To
illustrate this, consider an example item used in Study 5: “What was your natural hair
colour at the age 18 years?” It would be expected that people with the highest level of
skin cancer risk would get a score of 3 and people with the lowest level of skin cancer
risk would get a score of 0. The scoring for this item is: 3 for red hair, 2 for blonde
hair, 1 for brown hair, and 0 for black hair. The general equation for the partial credit
model is [113]:
Equation 4
$$P_{ig}(\theta) = \frac{e^{\sum_{g=0}^{l}(\theta - b_{ig})}}{\sum_{h=0}^{m} e^{\sum_{g=0}^{h}(\theta - b_{ig})}}$$
Where Pig is the probability of responding in a specific category of item i under the
partial credit model, b is the location parameter, and g is the category boundary.
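A minimal Python sketch of the partial credit model equation is given below for the four-category hair colour item. It assumes the common convention that the g = 0 term of each sum equals zero, so that category 0 has an exponent of 0; this convention is not stated in the text above. The step parameters are hypothetical values chosen for illustration and are not estimates from Study 5.

```python
import math

def pcm_probs(theta, steps):
    """Category probabilities for one item under the partial credit model.

    steps holds the category boundary parameters b_i1 .. b_im. The g = 0
    term of each sum is taken to be 0 (an assumed convention).
    """
    exponents = [0.0]                       # exponent for category 0
    for b in steps:
        # Each higher category adds one more (theta - b_ig) term.
        exponents.append(exponents[-1] + (theta - b))
    numerators = [math.exp(v) for v in exponents]
    total = sum(numerators)                 # denominator: sum over h = 0..m
    return [n / total for n in numerators]

# Hypothetical boundaries between 0 = black, 1 = brown, 2 = blonde, 3 = red.
hair_steps = [-1.0, 0.5, 2.0]

for theta in (-4.0, 0.0, 5.0):
    print(theta, [round(p, 3) for p in pcm_probs(theta, hair_steps)])
```

The probabilities for the m + 1 categories always sum to 1, with the lowest category most likely at very low θ and the highest category most likely at very high θ.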
1.2.9 Rating Scale Model
The rating scale model (RSM)148 is a restricted version of the partial credit model,
where the distances between adjacent step difficulties are the same across all items.
The general equation for the rating scale model113 is:
Equation 5
$$P_{ig}(\theta) = \frac{e^{\sum_{g=0}^{l}[\theta - (b_i + \tau_g)]}}{\sum_{h=0}^{m} e^{\sum_{g=0}^{h}[\theta - (b_i + \tau_g)]}}$$
Where h=0,1,…,g,…,m, and g represents the specific category being modelled from
among m+1 categories; bi is the item location parameter estimated for each
individual item in a scale, and τg are the threshold parameters that define the
boundaries between the categories of the rating scale. The τg are estimated once for
the entire set of items. Likert-type questionnaires are commonly scored using the RSM.
All items in the Skin Self-Examination Attitude Scale have five ordered response
categories: strongly disagree, disagree, neither agree nor disagree, agree, and
strongly agree. The lowest category is scored 0 and the highest category is
scored 4 (the number of categories minus 1). For example, for the item from Study 1
taken from the Skin Self-Examination Attitude Scale149 “Checking my skin regularly is a
priority for me”, a participant who answered strongly agree will get a score of 4. RSMs
assume that the distances between response categories are the same across all items
although, unlike in CTT, these distances are estimated from the data.
RSMs are commonly used for questionnaires with Likert-type scales.64
CHOOSING A MODEL
As described above, there are many IRT models, and the investigator is faced with
the decision of which one to use. Future research beyond this thesis will explore
other IRT models for the current items, as suggested by the examiner.
One determining factor is the type of item. For example, an item with two categories,
such as yes/no, would usually be analysed using a dichotomous model. Dichotomous
items can be analysed using the 1-PL, 2-PL, or 3-PL IRT
model. Researchers decide which model to use based on a prior assumption
regarding whether the items have more than one parameter to be estimated. If there is
a high likelihood of guessing, then it is appropriate to use the 3-PL model; however,
this would seem unnecessary in the context of skin cancer risk, where guessing is
unlikely. The 3-PL model is only appropriate in situations where multiple-choice
items are used, and most health applications do not usually require guessing
parameters.
For items with more than two answer categories, polytomous models (rating scale
model, partial credit model or others) are more appropriate (see Table 1.4 for a
summary of available IRT models). In educational assessment, the 3-PL model is the
most common choice for multiple-choice items, as it is reasonable to assume that a
person without the required knowledge will have a non-zero probability of choosing
the correct answer through chance alone.150 However, the Rasch model is often
chosen because it has some desirable mathematical properties that cannot be obtained
with other IRT models (e.g. invariance, sufficiency of the raw score, etc.).151 In the
Rasch model, the raw (observed) score is a sufficient statistic for
ability (θ). This means that all individuals with the same raw score will have the
same estimated latent score (θ). In analysing any data, the principle of parsimony152
should be followed: a simple model that can be estimated accurately usually
produces better results than a complex model (with more parameters) that is
estimated poorly.153 The main advantage of the Rasch model is its parsimony, which
allows models to converge successfully with smaller samples than the 2-PL or 3-PL
models require.154
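The sufficiency property described above can be demonstrated with a short Python sketch: two different response patterns with the same raw score on the same items lead to the same maximum-likelihood estimate of θ under the dichotomous Rasch model. The item difficulties and response patterns are hypothetical, and the simple ternary search used to maximise the (concave) log-likelihood is an illustrative choice rather than the estimation method of any particular IRT package.

```python
import math

def rasch_prob(theta, b):
    """Probability of endorsing a Rasch item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    """Log-likelihood of a 0/1 response pattern under the Rasch model."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def ml_theta(responses, difficulties, lo=-6.0, hi=6.0, iterations=200):
    """Maximise the log-likelihood by ternary search on [lo, hi].

    Only meaningful for non-extreme raw scores (not all 0s or all 1s).
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (log_likelihood(m1, responses, difficulties)
                < log_likelihood(m2, responses, difficulties)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical calibrated items
pattern_a = [1, 1, 1, 0, 0]                 # raw score 3
pattern_b = [0, 1, 0, 1, 1]                 # raw score 3, different items

theta_a = ml_theta(pattern_a, difficulties)
theta_b = ml_theta(pattern_b, difficulties)
print(round(theta_a, 3), round(theta_b, 3))
```

The two log-likelihoods differ only by a constant that does not depend on θ, so they peak at the same value. This equality would generally not hold under the 2-PL or 3-PL models, where the pattern of responses also matters.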
For this PhD program of research, various unidimensional Rasch models were used,
including the dichotomous model (Study 3), rating scale model (Studies 1 and 3), and
partial credit model (Studies 2, 3, 4, and 5) to investigate how each model can be
applied to different datasets. The selection of the models depended on the available
item characteristics in each study.
LIMITATIONS OF ITEM RESPONSE THEORY
Although there are many advantages of using IRT in analysing items or developing a
new scale, there are some limitations. First, the models rest on a number of restrictive
assumptions144 that are difficult to meet. Second, large sample
sizes (both in terms of items and respondents) are required during scale development.
There is no exact guideline on sample size requirements for IRT analysis, but samples
over 500 are generally recommended.80
Third, the lack of training and user-friendly computer programs to perform IRT
analysis is a barrier,112 although this is becoming less of an issue because
some IRT software packages now provide user-friendly graphical
interfaces with point-and-click capability.155,156 Fourth, IRT has been developed and
used extensively in large-scale educational assessment, but it is less well known and
less commonly used in the development and assessment of health-related
scales and measures, especially in skin cancer-related research. Many researchers are
therefore not aware of its advantages and do not use the IRT approach as an
alternative to CTT. Considering those factors, the purpose of this thesis is to
introduce IRT methods to researchers in epidemiology and public health, and to
familiarise them with these methods, by demonstrating their application in skin
cancer research.
PURPOSE OF THIS DOCTORAL WORK
The overarching purpose of this thesis was to:
1. Demonstrate the applications of IRT to skin cancer-related measurement
by examining the psychometric properties of existing skin-cancer-related
questionnaires. To achieve this goal, access to several large datasets was
negotiated with Centre of Research Excellence in Sun and Health
(CRESH) investigators (from QUT’s Skin Awareness Study, Cancer
Council Queensland’s Melanoma Screening trial and Melanoma
Case-Control Study, QUT and Australian National University’s AusD study,
and QIMR Berghofer’s QSkin study). Secondary analyses of the
questionnaire data from all of these studies were conducted, and the data
were calibrated using a range of IRT models, including the dichotomous model,
rating scale model, and partial credit model (see Figure 1.1-13). Items with good
psychometric properties were integrated, retained, or modified and used in
the final study (Study 5).
2. Develop a new measure of skin cancer risk utilising IRT.
The objective of the final study (Study 5) was to develop a set of measures with
good psychometric and IRT properties to measure: a) phenotype, b) sun
exposure behaviour, and c) sun protection behaviour. The significance of
this approach is described in the Significance of the Thesis section.
Figure 1.1-13: Overview of current doctoral work
RESEARCH QUESTIONS
The present study aimed to address the following research questions related to the
measurement of skin cancer risk:
1. Is it possible to calibrate existing attitude and behaviour scales related to
risk of skin cancer using item response theory approaches? (Studies 1, 3
and 4)
2. Can item response theory analysis be used to show the potential impact of
self-reported behaviour change on skin cancer risk due to concerns
regarding vitamin D? (Study 2)
3. Can item response theory reduce participants’ burden when measuring
skin cancer risk by using computer adaptive tests? (Study 3)
4. How much efficiency does a computer adaptive test offer compared to
non-adaptive testing? (Study 3)
5. Can item response theory calibrated skin cancer risk scales predict
future development of non-melanoma skin cancer? (Study 4)
6. Is it possible to develop a skin cancer risk scale (SCRS) using an approach
grounded in item response theory that integrates all indicators of skin
cancer risk? (Study 5)
7. To what extent does the newly developed ‘Skin Cancer Risk Score’ fit the
Rasch model, and how well can it predict self-reported skin cancer
history? (Study 5).
SIGNIFICANCE OF THE THESIS
The research conducted for, and the outcomes of, this thesis have the potential to make
several important contributions to scholarship and practice within the fields of skin
cancer research, public health, and psychometrics. The significance of the thesis is
summarised in the following points:
1. The IRT-based skin cancer risk scale developed through this PhD provides
an interval-scaled measure that facilitates interpretation of skin cancer risk
and comparisons among people. People’s skin cancer risk can currently be
ordered based on various indicators (e.g. eye colour). People with blue
eyes have higher skin cancer risk compared to those with brown eyes;
however, their exact position on the skin cancer risk continuum is not yet
known. This becomes more complicated if more indicators are added, for
example: Does a person with blue eyes, who wears sunscreen and works
indoors, have a higher risk compared to a person with brown eyes, who
never wears sunscreen and works outdoors? Calibrating these items on a
common IRT scale allows the combination of these various indicators and
creates one single score for skin cancer risk, allowing further comparison
of the differences in skin cancer risk levels between persons and items.
2. An IRT scale provides a sample distribution-free and item distribution-free
measure.82,157 In other words, there is no need for any specific reference
norm to provide a person’s percentile.79,158 This is in contrast to CTT,
where a score must be interpreted with regard to the sample calibration
(obtained from a normative sample) to become meaningful. In IRT, both
persons and skin cancer risk indicators (items) can be directly located on
the common skin cancer risk scale,49,87,159 making it easy to compare
people’s risk on different skin cancer risk indicators, as long as the
indicator is calibrated on the scale.
3. A thoroughly developed scale that is in line with, and fits, the Rasch model
makes it possible to construct an overall skin cancer risk indicator that
summarises a person’s skin cancer risk in different components (e.g., hair
colour, eye colour, skin colour, etc.). If the overall skin cancer risk
indicator works well, a person’s overall skin cancer risk levels could be
calibrated on the common scale, even if he/she answers only some of the
items from the whole scale (see Study 3 for more detail). This is important
in order to reduce the participants’ burden of completing long skin cancer
related questionnaires in the future.
4. This study was the first to develop and validate a comprehensive skin
cancer risk scale for Australian adults using an IRT approach. Skin cancer
risk assessment of this population is invaluable for understanding the
contribution that sun exposure and sun protection behaviours make in
addition to phenotype, so that preventive strategies can be
implemented. By applying the latest psychometric methods,
this new measure is expected to improve the credibility of skin cancer
research findings in future studies.
From a practice standpoint, the results of this research will be directly relevant to
Australian public health by providing a measure that allows more accurate
estimation, with known measurement quality and standard errors, of people’s
phenotype, sun exposure behaviour, and sun protection behaviour. In so doing,
people may be better informed about their personal risk and may take precautionary
action to better protect themselves from the sun.
THESIS OUTLINE
The rest of this dissertation is organised around a collection of five studies (four
already published and one currently under peer-review). The first research study is
“Evaluation of a skin self-examination attitude scale using an IRT model approach.”
The purpose of this study is to illustrate applications of IRT in analysing survey data
in a public health setting, especially in skin cancer research. General assumptions and
characteristics of IRT models, such as unidimensionality and item fit are also
discussed. This study was published in Health and Quality of Life Outcomes.
The second research study, “Changes in self-reported sun-protection behaviours due
to concern about vitamin D status,” studies behaviour changes due to vitamin D
concern that may increase future skin cancer risk. The study examines a
cross-sectional survey, conducted across the four seasons (2009-10) at latitudes
ranging from 19°S to 43°S, that assessed vitamin D attitudes and changes in sun
protection behaviours made out of concern about vitamin D. IRT was used to
illustrate the potential effect of changing sun-protection behaviour due to concern
about vitamin D. This study was published in
Photochemistry and Photobiology.
The third research study, “Advantages of Mobile Computer-Adaptive Testing (CAT)
to Quickly Estimate Skin Cancer Risk,” studied the use of CAT to reduce response
burden in skin cancer risk assessment. The study compared the efficiency of
non-adaptive testing and computer adaptive testing, both based on a calibration
derived from a partial credit model. This study was published in the Journal of Medical Internet
Research.
The fourth research study, “Diagnostic Discrimination of Skin Cancer Risk Scale”,
examined the psychometric properties of an existing skin cancer risk questionnaire and
assessed the scale’s ability to predict prospective skin cancer. The partial credit IRT
model was used in this study. The results were presented as a full paper at the
International Outcome Measurement Conference (Chicago, April 2015).
While the first four studies applied IRT to existing scales, the last used IRT to select
the items expected to measure skin cancer risk well and tested them
in an original sample of participants. The final (fifth) research study, “Development
and Psychometrics Evaluation of Skin Cancer Risk Scale Utilising IRT”, constructed
a skin cancer risk scale utilising a modern test theory approach. The study combined
the best questions from existing skin cancer questionnaires and calibrated them using
a partial credit IRT model to create an underlying construct of skin cancer risk. A
draft manuscript has been prepared.
Chapter 2: Evaluation of Skin Self-examination Attitude Scale Using an Item Response Theory Model Approach
Evaluation of a skin self-examination attitude scale using an item response
theory model approach
This chapter includes a peer-reviewed journal article published in Health and Quality
of Life Outcomes. This article evaluates the psychometric properties of the Skin
Self-Examination Attitude Scale, a brief measure that allows for the assessment of
attitudes in relation to skin self-examination. A Rating Scale Model was applied to
the data.
Djaja, N., Youl, P., Aitken, J., & Janda, M. (2014). Evaluation of a
skin self-examination attitude scale using an item response theory
model approach. Health and Quality of Life Outcomes, 12(1), 189.
doi:10.1186/s12955-014-0189-x
ABSTRACT
Introduction: The Skin Self-Examination Attitude Scale (SSEAS) is a brief measure
that allows for the assessment of attitudes in relation to skin self-examination. This
study evaluated the psychometric properties of the SSEAS using Item Response
Theory (IRT) methods in a large sample of men ≥ 50 years in Queensland, Australia.
Methods: A sample of 831 men (420 intervention and 411 control) completed a
telephone assessment at the 13-month follow-up of a randomised-controlled trial of a
video-based intervention to improve skin self-examination (SSE) behaviour.
Descriptive statistics (mean, standard deviation, item–total correlations, and
Cronbach’s alpha) were compiled and difficulty parameters were computed with
Winsteps using the polytomous Rasch Rating Scale Model (RRSM). An item person
(Wright) map of the SSEAS was examined for content coverage and item targeting.
Results: The SSEAS has good psychometric properties, including good internal
consistency (Cronbach’s alpha = 0.80) and fit with the model, and no evidence of
differential item functioning (DIF) due to experimental trial grouping was detected.
Conclusions: The present study confirms the SSEA scale as a brief, useful and
reliable tool for assessing attitudes towards skin self-examination in a population of
men 50 years or older in Queensland, Australia. The 8-item scale shows
unidimensionality, allowing levels of SSE attitude, and the item difficulties, to be
ranked on a single continuous scale. In terms of clinical practice, it is very important
to assess skin cancer self-examination attitude to identify people who may need a
more extensive intervention to allow early detection of skin cancer.
Keywords: Skin cancer, Skin self-examination, Attitude scale, Item response theory,
Rating scale, Rasch model
INTRODUCTION
Melanoma is the fourth most common cancer among men and women in Australia.
Men aged 50 years or older are more likely than other groups to be diagnosed with
thick melanomas and have the highest mortality [1]. Skin self-examination (SSE)
has been shown to increase the detection of thin melanoma [2-4]. A case-control
study in the United States found a 60% reduced risk of melanoma mortality (OR
0.37; 95% CI = 0.16-0.84) in people who examined their own skin [4]. While the US
Preventive Services Task Force currently does not recommend population-based
screening for skin cancer due to the absence of randomised trials investigating the
mortality benefit of screening [5], the American Cancer Society does recommend
that adults perform SSE monthly [6] and Australian Cancer Councils suggest SSE at
three-monthly intervals [7]. SSE may be one method of identifying suspicious skin
lesions early, particularly given that patients are more likely to detect their own
melanomas [8]. A large case-control study conducted in Queensland, Australia
found that melanomas detected during deliberate SSE were thinner than those found
incidentally [9]. As about half of all melanomas occur on parts of the
body that are difficult to see (especially the back) [10], it has been suggested that
whole-body SSE is necessary to optimise melanoma detection rate [11].
While melanoma incidence and mortality are highest in men 50 years or older, this
group is less likely to detect their own melanomas and less likely to undergo
whole-body clinical skin examination compared to other population groups [12,13],
both of which could contribute to their higher melanoma mortality rates. The
increased risk of thick melanoma in this group may be due, at least in part, to low
awareness and uptake of early detection behaviours, including SSE.
Several aspects of SSE are under-researched, and few studies have measured factors
which may contribute to whether or not people conduct SSE. One study by Manne
and Lessin [14], who developed a 17-item SSE benefits and barriers scale, found
only barriers (but no benefits) were associated with SSE performance in melanoma
survivors. The authors suggested that melanoma survivors rely strongly on their
doctors’ recommendation, minimising the impact of their personal attitudes, and that
further assessment among the general population is needed. Swetter et al. [15] found
that SSE awareness (defined as having heard about the ABCD rule, reading about
skin cancer detection, and requesting information about skin cancer detection from
a doctor) among female spouses of men with melanoma was significantly higher than that
of the men themselves.
We previously used several attitude or outcome expectation items within a large
study of melanoma screening, and found that positive attitudes were strongly
associated with intention to conduct SSE in the future [16]. However, the
psychometric qualities of the measure as a whole have not been assessed.
Measurement of subjective and latent constructs like SSE attitudes requires
rigorously developed and tested instruments in order to obtain data of the highest
possible quality. While in the past questionnaire quality, including reliability and
validity, was often assessed using classical psychometric approaches, the advantages
of item response theory (IRT) methods are increasingly recognised; these include more
precise estimates, assessment of unidimensionality, adaptive testing, and assessment
of differential item functioning. IRT methods are now applied to
measurement tools across a wide variety of health outcomes [17-21]. It was the aim
of this study to evaluate the measurement properties and unidimensionality of the
SSE attitude scale using a Rasch modelling approach.
METHODS
To examine measurement properties of the SSE attitudes scale we used data
collected from the Skin Awareness study [22]. The primary aim of that study was to
examine the impact of a video-delivered intervention with two mailed reminder
postcards compared to a written-materials-only control group on the prevalence of
SSE in men aged 50 years or older. The primary hypothesis was that the prevalence
of SSE in the video intervention group would increase by at least 10% more than in
the control. A 10% increase was determined as the minimal change deemed to be
clinically significant. Approval for this study was obtained from the Queensland
University of Technology ethics committee, and the trial was registered with the
Australian New Zealand Clinical Trials Registry (ANZCTR N12608000384358).
Trial methods and baseline participant characteristics as well as primary and
secondary outcomes have previously been reported in detail [22-24].
Study population
In total, 5000 potential participants (men aged 50 or older) were randomly selected
from the Australian electoral roll (enrolling to vote is compulsory in Australia), of
which 2899 potential participants with a valid telephone number were contacted by
mail. The study pack included a letter of invitation and a colored brochure featuring a
well-known sports and TV personality, with follow-up of non-respondents via one
postal reminder and up to two follow-up phone calls. Men who were too ill, could
not speak English, or had a previous history of melanoma were excluded. The overall
consent rate was 37% (969 of 2610 eligible); however, 39 men withdrew before the
study began, leaving a final
sample of 930 men who were randomised to the control or intervention condition.
Men completed telephone interviews at baseline and at 7 and 13 months after receiving
either the video intervention or the written-brochures-only control package.
For the present analysis, we used data from 831 men who completed the 13-month
assessment time point. Similar to factor analysis, where a minimum sample size of 10
respondents per item is required by convention, a minimum sample size of 250 is
generally recommended for analyses such as those conducted here [25].
Skin self-examination attitude scale
The skin self-examination attitude scale (SSEAS) was developed, and previously used,
in a large community-based pilot trial of skin cancer screening [16], and was modified
for the Skin Awareness study to include items measuring SSE outcome expectancy
and planning for future SSE. The SSEAS comprises 10 items, each answered on a
five-point Likert scale (strongly disagree, disagree, unsure, agree, strongly agree;
all items are listed in Table 2.1). The total score of the SSEAS can vary
between 0 and 40, where 0 indicates low and 40 high SSE attitudes. Good reliability
for the scale was found when assessing its internal consistency (Cronbach’s alpha =
0.80).
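As a brief illustration of the internal consistency statistic reported above, the following Python sketch computes Cronbach’s alpha from a respondents-by-items matrix of scores coded 0 to 4. It is not the authors’ code: the function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to four items (0 = strongly disagree,
# 4 = strongly agree); these are not data from the Skin Awareness study.
responses = [
    [4, 3, 4, 4],
    [3, 3, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances.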
Data analysis
To test the measurement quality of the SSEAS beyond classical test theory, item
response theory (IRT) modelling was applied. In brief, an IRT model describes the
relationship between an individual’s ability and an item’s difficulty, and models this as
a probabilistic function. Specifically, raw data from a rating scale are converted to an
“equal interval scale” in logits (log-odds units), reflecting the item difficulty and the
individual’s ability [26,27]. Data were analysed using the Winsteps Rasch
measurement software [28]. To analyse the SSEAS, with 5 answer options per item, the
polytomous Rasch Rating Scale Model (RRSM) was used.
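To make the probabilistic function concrete, the sketch below computes category probabilities under the standard Andrich formulation of the rating scale model, in which every item shares one set of thresholds. It is a minimal illustration and not the Winsteps implementation: the function names and the four threshold values are assumptions chosen for the example, while the person measure (1.71 logits) and item difficulty (0.54 logits, item SSE_3) are taken from the results reported later in this chapter.

```python
import math


def rsm_category_probs(theta, difficulty, thresholds):
    """Category probabilities (0..m) under the Andrich rating scale model.

    thresholds holds the m Rasch-Andrich thresholds shared by all items.
    """
    # Cumulative sums of (theta - difficulty - tau_k); category 0 has sum 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - difficulty - tau))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-expected item score given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical thresholds for a five-category (0-4) Likert item.
thresholds = [-1.5, -0.5, 0.5, 1.5]

probs_mean_person = rsm_category_probs(1.71, 0.54, thresholds)
probs_matched = rsm_category_probs(0.54, 0.54, thresholds)
print([round(p, 3) for p in probs_mean_person])
print(round(expected_score(probs_mean_person), 2))
```

When the person measure equals the item difficulty and the thresholds are symmetric around zero, the category probabilities are symmetric and the expected score sits at the middle category.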
The following data quality parameters were assessed:
Dimensionality analysis
We assessed whether the data derived from the men’s answers fitted the Rasch model
in order to assess unidimensionality of the underlying trait. To assess the fit of the
data to the Rasch model, item difficulty and fit statistics were calculated for each
item.
Item difficulty
The difficulty of each SSEA item is its location, in logits, on the SSEA scale when
SSEA is expressed as a unidimensional continuum. For polytomous scales including the
SSEAS, this is the point at which each answer category has a 50% probability of
being endorsed. Winsteps ranks the items in a hierarchical order based on their item
difficulty. The item at the top has the highest item difficulty and is thus difficult
for people to endorse; the item at the bottom of the ranking is easy to endorse. Item
difficulty is calculated in logits and placed on a linear interval continuum. The higher
the logit, the higher the level of SSEA at which the item measures.
Item fit statistics
To determine item fit statistics, infit and outfit mean square (MNSQ) statistics were
calculated, which specify how well each item fits the Rasch model. Infit and outfit
MNSQ values should range from 0.6 to 1.4 [29]. These fit statistics represent the
difference between expected responses and observed responses. An item fits the model
perfectly if it has an MNSQ of 1.0. Values less than 1.0 (overfit) show that the model
predicts the data too well, causing summary statistics (e.g., reliability) to be
inflated. Values greater than 1.0 (underfit) show unmodelled noise (there is
another source of variance in the data), which will degrade measurement.
The infit and outfit MNSQ represent the unstandardised degree of fit of the observed
data to the responses expected under the Rasch model. While the infit MNSQ is
sensitive to unexpected patterns, the outfit MNSQ statistic is more sensitive to
outliers.
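The difference between the two statistics can be seen in a small sketch. For simplicity it uses the dichotomous Rasch model rather than the rating scale model applied in this study, and the function name `item_fit`, the person measures, and the responses are all hypothetical. Outfit is the unweighted mean of the squared standardised residuals, whereas infit weights each squared residual by its model variance, so a single surprising answer from a person far from the item’s difficulty inflates outfit much more than infit.

```python
import math


def item_fit(responses, thetas, difficulty):
    """Return (infit MNSQ, outfit MNSQ) for one dichotomous Rasch item."""
    squared_residuals, variances, z_squared = [], [], []
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))  # expected response
        w = p * (1.0 - p)                                   # model variance
        r2 = (x - p) ** 2                                   # squared residual
        squared_residuals.append(r2)
        variances.append(w)
        z_squared.append(r2 / w)                            # squared standardised residual
    infit = sum(squared_residuals) / sum(variances)         # information-weighted
    outfit = sum(z_squared) / len(z_squared)                # unweighted mean
    return infit, outfit


# Hypothetical person measures (logits) and responses to an item of difficulty 0.
# The last person (theta = 3) unexpectedly fails to endorse the item.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
responses = [0, 0, 1, 1, 1, 0]

infit, outfit = item_fit(responses, thetas, difficulty=0.0)
print(round(infit, 2), round(outfit, 2))
```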
Differential item functioning (DIF)
DIF was assessed to examine whether the intervention condition had an effect on the
hierarchy of item difficulties. The Rasch model assumes the hierarchy of the items to
be the same across groups: it should work uniformly, irrespective of group, in our
case for men in the intervention and control groups. For example, if the items are
invariant across groups, the item with the lowest difficulty on the SSEA continuum for
the intervention group also has the lowest difficulty for the control group. Instead of
calculating the item difficulties for the whole sample, in DIF analysis they are
calculated separately for each group. The current study used a multi-step method of
initially flagging items for potential DIF using the Mantel chi-square statistic,
followed by confirmation of DIF with two other tests (Standardised Liu-Agresti
Cumulative Common Log-Odds Ratio (LOR Z) and Standardised Cox’s
Noncentrality Parameter (COX Z)). All Mantel-Haenszel (MH)-based statistics were
computed using DIFAS 5.0 [30].
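The flagging step can be sketched as follows. This is not the DIFAS implementation; it is a minimal Python version of the Mantel chi-square for an ordinal item, as commonly formulated for polytomous DIF, in which respondents are matched on a stratifying variable (typically the total score) and the summed item score of one group is compared with its expectation under no DIF. The function name, the group labels "R" and "F", and the example data are hypothetical.

```python
from collections import defaultdict


def mantel_chi_square(item_scores, groups, strata):
    """Mantel chi-square (1 df) for one ordinal item.

    item_scores: item score per person; groups: "R" (reference) or "F" (focal);
    strata: matching variable per person, e.g. the total scale score.
    """
    by_stratum = defaultdict(list)
    for score, group, stratum in zip(item_scores, groups, strata):
        by_stratum[stratum].append((score, group))

    observed = expected = variance = 0.0
    for members in by_stratum.values():
        n = len(members)
        n_focal = sum(1 for _, g in members if g == "F")
        n_ref = n - n_focal
        if n < 2 or n_focal == 0 or n_ref == 0:
            continue  # stratum carries no information about group differences
        sum_y = sum(y for y, _ in members)
        sum_y2 = sum(y * y for y, _ in members)
        observed += sum(y for y, g in members if g == "F")
        expected += n_focal * sum_y / n
        variance += n_ref * n_focal * (n * sum_y2 - sum_y ** 2) / (n ** 2 * (n - 1))
    if variance == 0:
        return 0.0
    return (observed - expected) ** 2 / variance


# Hypothetical single stratum in which the focal group scores much higher.
chi2_dif = mantel_chi_square([0, 0, 1, 3, 4, 4], ["R"] * 3 + ["F"] * 3, [20] * 6)
# Hypothetical stratum with identical score distributions in both groups.
chi2_none = mantel_chi_square([0, 1, 2, 0, 1, 2], ["R"] * 3 + ["F"] * 3, [20] * 6)
print(round(chi2_dif, 2), chi2_none)
```

A value above 3.84, the critical value noted under Table 2.2, would flag the item for the confirmatory tests.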
RESULTS
SSEAS data were available from 831 participants: 411 (49.5%) control group
participants with a mean SSEAS score of 4.1 (SD 0.49) and 420 (50.5%) intervention
group participants with a mean SSEAS score of 4.1 (SD 0.50).
Unidimensionality
The Rasch analysis showed good reliability. Item reliability (replicability of item
placements along the scale) was 0.98 and person reliability was 0.68. Individual item
difficulty levels ranged from -0.58 to 0.54 logits, with a mean ± standard deviation
(SD) of 0 ± 0.41. In contrast, person measures had a mean ± SD of 1.71 ± 1.40,
indicating that the items did not adequately target the SSEA levels of this sample.
Results of the unidimensionality analysis are shown in Table 2.1.
Item difficulty
Item difficulty estimates showed that the easiest item for participants to endorse
was item SSE_1 (-0.58): “It is important to check my skin for skin cancer even if I
have no symptoms”, while the most difficult item to endorse was item SSE_3 (0.54):
“Checking my skin regularly is a priority for me”.
Items SSE_3 and SSE_9 had nearly the same item difficulty (0.54 logits and
0.53 logits, respectively), indicating overlap between the items and thus redundancy.
In addition, items SSE_4 (0.23) and SSE_8 (0.18) measure a similar level of
SSEA, as evidenced by a separation distance of only 0.05 logits.
We also assessed the spread of item difficulty using the item-person map (Wright
map) displayed in Figure 2.1. This map indicates both the distribution of
participants’ SSEA propensity scores and the item difficulty levels. Both the items and
the respondents are displayed on a logit scale; respondents whose SSEA propensity
score equals an item’s difficulty have a 50% chance of endorsing that item. The
left-hand side of Figure 2.1 shows the distribution of respondents’ level of SSEA:
people with a higher SSEA are placed in higher positions and people with a lower SSEA
are placed in lower positions. The right-hand side shows the distribution of item
calibrations: items reflecting a higher SSEA level are placed in higher positions and
items reflecting a lower SSEA level are placed in lower positions.
M marks the mean (by default, the mean of the item difficulties is set to 0), while S
marks one standard deviation and T marks two standard deviations of the item and
person distributions. The map shows that the participants’ mean SSEA was
1.71 logits above the items’ mean, implying that participants have a high level of
SSEA.
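To give a sense of what a 1.71 logit gap means, the sketch below converts logit differences into endorsement probabilities. It uses the dichotomous Rasch model as a simplification (the SSEAS itself is polytomous), and the function name `endorse_probability` is hypothetical; the values 1.71, 0.54, and 0 are the person mean, the SSE_3 difficulty, and the item mean reported in this chapter.

```python
import math


def endorse_probability(theta, difficulty):
    """Probability of endorsing a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# A person located exactly at an item's difficulty has a 50% chance.
p_matched = endorse_probability(0.54, 0.54)

# The average participant (1.71 logits) facing an item of average difficulty (0 logits).
p_average_item = endorse_probability(1.71, 0.0)

print(p_matched)                  # 0.5
print(round(p_average_item, 2))   # 0.85
```

Under this simplification the average participant would endorse an item of average difficulty about 85% of the time, which is consistent with the items being easy for this sample.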
Figure 2.1: Wright map/item person map of Skin Self-Examination Attitude Scale
with the mean theta of person on the left and mean theta of items on the right.
Content coverage and item targeting
A ceiling effect was evident in the results displayed in Figure 2.1, with many
participants located in the upper part of the map, and few items located at the
corresponding level. The SSEA of this sample was higher than that reflected in the
items. The mean of item measures was more than 1 standard deviation lower than the
mean of person measures, which indicates that all items were easily endorsed by this
sample, and additional items with greater difficulty are needed to complement the
scale.
Table 2.1: Item total correlation, fit statistics and item difficulty for the 10-item Skin
Self-Examination Attitude Scale

Item | Statement | Item-total correlation | Infit MNSQ | Outfit MNSQ | Item difficulty (SE)
SSE_1 | It is important to check my skin for skin cancer even if I have no symptoms | 0.457 | 0.92 | 1.03 | -0.58 (0.07)
SSE_2* | I think checking my skin would make me anxious | 0.081 | - | - | -
SSE_3 | Checking my skin regularly is a priority for me | 0.526 | 1.05 | 1.25 | 0.54 (0.05)
SSE_4 | I think I could find something suspicious on my skin if it was there | 0.495 | 0.99 | 1.06 | 0.23 (0.06)
SSE_5 | If I saw something suspicious on my skin, I'd go to the doctor straight away | 0.446 | 1.03 | 1.06 | -0.36 (0.06)
SSE_6 | I am confident in a doctor's ability to diagnose skin cancer | 0.373 | 1.20 | 1.32 | -0.07 (0.06)
SSE_7** | I have made plans on when to examine my own skin | 0.461 | - | - | -
SSE_8 | I am confident that I can take up examining my own skin again even if I have not looked at my skin in the past few months | 0.579 | 0.82 | 0.81 | 0.18 (0.06)
SSE_9 | I am able to keep examining my own skin regularly, even if I have no one to help me | 0.474 | 1.03 | 1.34 | 0.53 (0.06)
SSE_10 | If I regularly examine my skin, then I am helping to look after my own health | 0.582 | 0.75 | 0.67 | -0.46 (0.07)

*Item was removed due to low item total correlation
**Item was removed during calibration due to fit statistics beyond acceptable range
Item fit statistics
After an iterative process of calibration, all items of the SSEAS except SSE_2: “I
think checking my skin would make me anxious” and SSE_7: “I have made plans on
when to examine my own skin” were found to have MNSQ infit and outfit values
within the recommended range of 0.6 to 1.4 [26] (Table 2.1); these two items
did not meet the fit criteria. SSE_2 also did not contribute to the measurement of a
unidimensional construct.
Differential item functioning (DIF) assessment
The eight-item SSEA scale was used to assess DIF by group condition (DVD
intervention and control). The results of the DIF analysis are presented in Table 2.2
and reveal that none of the eight items showed DIF according to participants’ group
condition.
Table 2.2: DIF statistics for the 8-item skin self-examination attitude scale

Item | Mantel¹ | LOR Z² | COX Z²
SSE_1 | 0.870 | 0.954 | 0.931
SSE_3 | 0.268 | 0.524 | 0.519
SSE_4 | 0.719 | -0.842 | -0.849
SSE_5 | 0.177 | -0.422 | -0.419
SSE_6 | 0.238 | -0.485 | -0.491
SSE_8 | 0.063 | 0.253 | 0.250
SSE_9 | 0.814 | -0.904 | -0.903
SSE_10 | 1.540 | 1.238 | 1.240

¹ The critical value of this statistic is 3.84 for a Type I error rate of 0.05.
² A value greater than 2.0 or less than -2.0 may be considered evidence of the presence of DIF.
DISCUSSION
Regular monthly or 3-monthly SSE is currently recommended by a number of cancer
control agencies, particularly for those at high risk such as older men, who carry the
greatest burden of skin cancer. SSE could improve skin awareness and prompt
rapid clinical skin examination. In combination, these have the potential to reduce the
physical burden, including mortality, caused by late diagnosis of melanoma [31,32].
Studies have shown that melanomas detected during a deliberate SSE rather than
found accidentally are thinner [2,33]. Attitudes towards SSE form an important
component in explaining the likelihood of conducting an SSE [34].
IRT has been used widely in evaluating education and health measures [19,35,36].
The current study used IRT analysis to further assess the psychometric properties of
the SSEAS. Data were analysed using the Rasch Rating Scale Model [37], which has
ideal metric properties for ranking an individual’s ability (the level of the attribute
measured) along with the item difficulty on a common scale. This Rasch model
allows for the comparison of individuals regardless of items used in the measurement
[38]. It also enables the generation of a joint measurement (common scale) of items
and people, provided that the data fit the model’s requirements.
In this study, the overall fit statistics and reliabilities of the SSEAS were satisfactory.
However, the spread of item difficulty was not satisfactory, with most items located
on the lower end of the scale. This means the SSEAS will give more accurate
information for individuals who have a low skin self-examination attitude. Two items
(item SSE_2 and item SSE_7) did not perform as expected and were removed to
achieve better fit to the Rasch model expectations. This suggested that those two
items may be measuring a different domain of SSE. Item SSE_2: "I think checking
my skin would make me anxious" was suspected to measure anxiety rather than
attitude. Item SSE_7: "I have made plans on when to examine my own skin"
probably measures the planning aspect of SSE.
This item could form part of a separate scale once additional planning items that
address the specific aspects of optimal SSE performance (such as having a partner to
help, having a full-size and a hand-held mirror available, or good lighting) have been added.
The distribution of the SSEAS items reflects a wide range of individual differences,
with the average level of this trait in the current sample being higher than the average
difficulty level of the items. The difficulty level of the 8 items reflected a narrow
range of levels of skin self-examination attitude among men ≥ 50 years, thus not
allowing for the optimal discrimination of more positive attitude in this sample.
The DIF analysis according to study group showed that the 8 items of the SSEAS
functioned consistently, and were equally difficult, for both the intervention and
control groups. The items were sufficiently robust to allow for the
assessment of SSE attitudes regardless of the participant’s group. Thus, the answers
quantified only the individual’s level of SSE attitude, measured according
to the difficulty of the items, and not other constructs related to the
participant’s subgroup.
The present study has some limitations. People with a high risk of skin cancer may
feel social pressure to report higher SSEA than others, and this may have resulted in
a positive reporting bias in our SSEA scores. Although there is no objective measure
of SSE, adding a social desirability scale such as the Marlowe-Crowne social
desirability scale [39] in future studies could allow assessment of the SSEA scale
against this criterion. The addition of more high SSEA items to extend the difficulty
range of the measure may also help to improve the scale. Finally, our sample
consisted entirely of men aged 50 years or older. Future research should examine
whether these results also hold for the broader population, including samples from
other states in Australia, women and younger age groups.
CONCLUSION
Overall, the present study confirms the SSEA scale as a brief, useful and reliable tool
for assessing attitudes towards skin self-examination in a population of men 50 years
or older in Queensland, Australia. The 8-item scale shows unidimensionality,
allowing levels of SSE attitude, and the item difficulties, to be ranked on a single
continuous scale. In terms of clinical utility, the skin awareness scale can identify
people who may need a more extensive intervention. Clinicians can encourage these
people to start regular skin self-examination, looking for any abnormal growths or
unusual changes, so that they have a better chance of a cure.
Competing interests
The authors declare they have no competing interests.
Authors’ contributions
ND drafted the original manuscript and performed the data analysis and interpretation. PY, JA and MJ
were involved in the conception and design of the study and the acquisition of data. All
authors were involved in the review of draft manuscripts and read and approved a
final version prior to submission.
REFERENCES
1. Geller AC, Swetter SM, Brooks K, Demierre MF, Yaroch AL. Screening,
early detection, and trends for melanoma: current status (2000-2006) and
future directions. J Am Acad Dermatol. 2007; 57:555–572.
2. Berwick M, Begg CB, Fine JA, Roush GC, Barnhill RL. Screening for
cutaneous melanoma by skin self-examination. J Natl Cancer Inst. 1996;
88:17–23.
3. Carli P, De Giorgi V, Palli D, et al. Dermatologist detection and skin self-
examination are associated with thinner melanomas: results from a survey of
the Italian Multidisciplinary Group on Melanoma. Arch Dermatol. 2003;
139(5):607–612.
4. Berwick M, Armstrong BK, Ben-Porat L, et al. Sun exposure and mortality
from melanoma. J Natl Cancer Inst. 2005; 97:195–199.
5. United States Preventive Services Task Force. Screening for skin cancer:
Recommendations and rationale. Am J Prev Med. 2001; 20(3 Suppl):44–46.
6. Skin Cancer Prevention and Early Detection.
[http://www.cancer.org/cancer/cancercauses/sunanduvexposure/skincancerpreventionandearlydetection/skin-cancer-prevention-and-early-detection-toc]
7. National Cancer Prevention Policy. Ultraviolet radiation.
[http://wiki.cancer.org.au/policy/UV/Effective_interventions/Melanoma_screening]
8. Baade PD, Balanda KP, Lowe JB. Changes in skin protection behaviors,
attitudes, and sunburn: in a population with the highest incidence of skin
cancer in the world. Cancer Detect Prev. 1995; 20:566–575.
9. Baade PD, Youl PH, English DR, Mark Elwood J, Aitken JF. Clinical
pathways to diagnose melanoma: a population-based study. Melanoma Res.
2007; 17:243–249.
10. Youl PH, Janda M, Aitken JF, Del Mar CB, Whiteman DC, Baade PD.
Body-site distribution of skin cancer, pre-malignant and common benign pigmented
lesions excised in general practice. Br J Dermatol. 2011; 165:35–43.
11. Weinstock MA, Martin RA, Risica PM, et al. Thorough skin examination for
the early detection of melanoma. Am J Prev Med. 1999; 17:169–175.
12. Janda M, Youl PH, Lowe JB, et al. What motivates men age > or =50 years to
participate in a screening program for melanoma? Cancer. 2006; 107:815–
823.
13. Kasparian NA, McLoone JK, Meiser B. Skin cancer-related prevention and
screening behaviors: a review of the literature. J Behav Med. 2009; 32:406–
428.
14. Manne S, Lessin S. Prevalence and correlates of sun protection and skin
self-examination practices among cutaneous malignant melanoma survivors. J
Behav Med. 2006; 29:419–434.
15. Swetter SM, Layton CJ, Johnson TM, Brooks KR, Miller DR, Geller AC.
Gender differences in melanoma awareness and detection practices between
middle-aged and older men with melanoma and their female spouses. Arch
Dermatol. 2009; 145:488–490.
16. Janda M, Youl PH, Lowe JB, Elwood M, Ring IT, Aitken JF. Attitudes and
intentions in relation to skin checks for early signs of skin cancer. Prev Med.
2004; 39:11–18.
17. Velozo CA, Lai JS, Mallinson T, Hauselman E. Maintaining instrument
quality while reducing items: application of Rasch analysis to a self-report of
visual function. J Outcome Meas. 2000; 4:667–680.
18. Hawthorne G, Densley K, Pallant JF, Mortimer D, Segal L. Deriving utility
scores from the SF-36 health instrument using Rasch analysis. Qual Life Res.
2008; 17:1183–1193.
19. Belvedere SL, de Morton NA. Application of Rasch analysis in health care is
increasing and is applied for variable reasons in mobility instruments. J Clin
Epidemiol. 2010; 63:1287–1297.
20. Franchignoni F, Salaffi F, Giordano A, Carotti M, Ciapetti A, Ottonello M.
Rasch analysis of the 22 knee injury and osteoarthritis outcome
score–physical function items in Italian patients with knee osteoarthritis. Arch Phys
Med Rehabil. 2013; 94:480–487.
21. Cook CE, Richardson JK, Pietrobon R, Braga L, Silva HM, Turner D.
Validation of the NHANES ADL scale in a sample of patients with report of
cervical pain: factor analysis, item response theory analysis, and line item
validity. Disabil Rehabil. 2006; 28(15):929–935.
22. Janda M, Baade PD, Youl PH, et al. The skin awareness study: promoting
thorough skin self-examination for skin cancer among men 50 years or older.
Contemp Clin Trials. 2010; 31:119–130.
23. Auster J, Neale R, Youl P, et al. Characteristics of men aged 50 years or older
who do not take up skin self-examination following an educational
intervention. Journal of the American Academy of Dermatology. 2012;
67:e57–e58.
24. Janda M, Neale RE, Youl P, Whiteman DC, Gordon L, Baade PD. Impact of
a video-based intervention to improve the prevalence of skin self-
examination in men 50 years or older: the randomized skin awareness trial.
Arch Dermatol. 2011; 147:799–806.
25. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans.
1994; 7:328.
26. Bond TG, Fox CM. Applying the Rasch Model: Fundamental Measurement
in the Human Sciences. New York: Routledge; 2012.
27. Fox CM, Jones JA. Uses of Rasch modelling in counselling psychology
research. J Couns Psychol. 1998; 45:30.
28. Linacre J. Winsteps Rasch Model Computer Program. Version 3.69.1.16.
2010.
29. Wright BD, Linacre JM, Gustafson J, Martin-Lof P. Reasonable mean-square
fit values. Rasch Meas Trans. 1994; 8:370.
30. Penfield RD. DIFAS 5.0 - Differential Item Functioning Analysis System.
2012.
31. Kelly JW. Melanoma in the elderly–a neglected public health challenge. Med
J Aust. 1998; 169:403.
32. Pollitt RA, Geller AC, Brooks DR, Johnson TM, Park ER, Swetter SM.
Efficacy of skin self-examination practices for early melanoma detection.
Cancer Epidemiol Biomarkers Prev. 2009; 18:3018–3023.
33. McPherson M, Elwood M, English DR, Baade PD, Youl PH, Aitken JF.
Presentation and detection of invasive melanoma in a high-risk population.
Journal of the American Academy of Dermatology. 2006; 54:783–792.
34. Auster J, Hurst C, Neale RE, et al. Determinants of uptake of whole-body
skin self-examination in older men. Behav Med. 2013; 39:36–43.
35. Lim SM, Rodger S, Brown T. Using Rasch analysis to establish the construct
validity of rehabilitation assessment tools. Int J Ther Rehabil. 2009;
16:251–260.
36. Chen HF, Lin KC, Wu CY, Chen CL. Rasch validation and predictive
validity of the action research arm test in patients receiving stroke
rehabilitation. Arch Phys Med Rehabil. 2012; 93:1039–1045.
37. Andrich D. A rating formulation for ordered response categories.
Psychometrika. 1978; 43:561–573.
38. Andrich D. Rasch Models for Measurement. Thousand Oaks: Sage; 1988.
39. Fischer DG, Fick C. Measuring social desirability: short forms of the
Marlowe-Crowne social desirability scale. Educ Psychol Meas. 1993;
53:417–424.
58 Chapter 3: Self-reported Changes in Sun-Protection Behaviours at Different Latitudes in Australia
Self-Reported Changes in Sun-Protection Behaviours at different latitudes in
Australia
This chapter includes a peer-reviewed journal article published in Photochemistry
and Photobiology. This article investigates attitudes toward vitamin D and changes
in sun-protection behaviours due to concern about adequate vitamin D among people
living at four different latitudes with very different UV radiation exposure levels in
Australia.
Djaja, N., Janda, M., Lucas, R. M., Harrison, S. L., van der Mei, I.,
Ebeling, P. R., Neale, R. E., Whiteman, D. C., Nowak, M., and
Kimlin, M. G. (2016). Self-Reported Changes in Sun-Protection
Behaviours at different latitudes in Australia. Photochemistry and
Photobiology. doi:10.1111/php.12582
ABSTRACT
Sun exposure is the most important source of vitamin D, but is also a risk factor for
skin cancer. This study investigated attitudes toward vitamin D, and changes in sun
exposure behaviour due to concern about adequate vitamin D. Participants (n=1,002)
were recruited from four regions of Australia and completed self- and interviewer-
administered surveys. Chi-square tests were used to assess associations between
participants’ latitude of residence, vitamin D-related attitudes and changes in sun
exposure behaviours during the last summer. Multivariate logistic regression
analyses were used to model the association between attitudes and behaviours.
Overall, people who worried about their vitamin D status were more likely to have
altered their sun protection and spent more time in the sun than people not concerned
about vitamin D. Concern about vitamin D was also more common with increasing
latitude. Use of novel Item Response Theory analysis highlighted the potential
impact of self-reported behaviour change due to concern about vitamin D on skin
cancer predisposition. This cross-sectional study shows that the strongest determinants of
self-reported sun-protection behaviour changes due to concerns about vitamin D were
attitudes and location, with people at higher latitudes worrying more.
Keywords: self-report; sun exposure; sun-protection behaviours; vitamin D; item
response theory; Rasch; population-based study
INTRODUCTION
Exposure to ultraviolet (UV) radiation from the sun causes about 90% of the global
skin cancer burden (1-3). The International Agency for Research on Cancer
summarised the most recent evidence for the carcinogenicity of solar radiation.
While there are some differences in the patterns and timing of exposure that give rise
to different types of skin cancer, overall, greater sun exposure significantly increases
skin cancer risk (4). Therefore, minimising sun exposure or protecting the skin when
outdoors by using clothing, shade and sunscreen is recommended when the UV
Index is 3 or above (5).
Vitamin D is synthesised when the skin is exposed to sunlight, or is consumed in
vitamin D-containing foods (naturally or fortified) or supplements (6). Research
indicates that vitamin D deficiency may not only increase the risk of bone
diseases but also contribute to a wide range of other adverse outcomes such as
cancer and immune-modulated diseases (7-10). This has led to interest in defining
the optimal level of vitamin D and determining how to best achieve such a level (11,
12). To overcome concerns that sun-protection practices may lead to vitamin D
deficiency, safe durations of unprotected sun exposure at different latitudes of
Australia have been proposed, and sun protection is not recommended when
the UV Index drops below 3 (13). Exactly how much sun exposure is required to
achieve sufficient levels of vitamin D is contentious, as there is little consensus on
the level considered ‘sufficient’ and vitamin D synthesis varies according to location,
time of year, time of day, weather, and personal factors such as skin type and body
mass index (14). In Australia, sun protection is currently not recommended during
late autumn and winter in those parts of the country where the UV Index is below 3
(5, 15). During these times, to support vitamin D production it is
recommended that people are outdoors in the middle of the day with some skin
uncovered on most days of the week. Being physically active while outdoors will
further assist with vitamin D levels.
Consequently, health promotion messages for sun protection have become
complicated, with different messages conveyed for different latitudes, seasons, times
of day and skin types. People are confused about when they need to protect
themselves, how much time they can spend outside, and how to balance the risk of
skin cancer versus that of vitamin D deficiency (16-19). Reflecting these concerns,
there has been a huge increase in vitamin D testing in Australia, with costs to the
health care system rising from AUS$3.2 million in 2003 to AUS$143 million in 2013
(20-23). It is unknown, however, whether changes in sun-exposure behaviour are
more common in people who are concerned about achieving optimal vitamin D, and
whether this depends on where they live. The challenge now is finding the best way
to balance the risks and benefits of sun exposure and how to communicate this to the
general public (24).
Previous studies investigating knowledge, attitudes and behaviours related to
vitamin D and sun-protection have been limited by small sample sizes (16, 25-27) or
a focus on specific populations (28, 29). In this paper, we used Item Response
Theory (IRT) to assess the potential impact that behaviour change due to concern about
vitamin D may have on skin cancer predisposition. IRT (modern test theory) offers
many advantages compared to classical test theory. It offers mathematical modelling
that specifies the probability of selecting each questionnaire item’s response option
as a function of the target latent trait (in our case skin cancer predisposition) being
measured. It therefore allows economical and precise assessment of the
characteristics under study and highlights specific targets for personalised
intervention. IRT is increasingly used in health research; examples include assessing
activity for post-acute care (30), and measures of physical functioning, health status,
and adolescent health risk behaviour (31-33). IRT allows computation of health
measures on an interval measurement scale (rather than ordinal scores provided by
most classical test theory-constructed health scales) and exploration of the
performance of each individual item rather than the scale as a whole. IRT
encompasses any mathematical model which attempts to predict observations from
locations on a latent variable. It uses logistic models including Rasch models,
Generalised Rating Scale models, or Samejima's Graded-Response models (34).
These models are widely used in education and patient-reported outcome
assessments (35-38). IRT-calibrated scales place both respondents and items on a
common scale of the latent trait, such as skin cancer predisposition. IRT enables
researchers to better visualise how changes in sun-protective behaviours may
influence underlying skin cancer predisposition.
The present study used data from a large population-based cross-sectional study (the
AusD Study), designed to assess vitamin D status and determinants across a range of
latitudes and seasons (39, 40). We aimed to a) assess the variation in attitudes and
behaviours according to residential location; b) identify the association between
participants’ attitudes about vitamin D and their self-reported changes to sun-
protection or exposure behaviours; and c) use IRT models to model the potential
effect on skin cancer predisposition that may occur if sun-protection behaviours
change due to concerns about vitamin D.
MATERIALS AND METHODS
The design, recruitment and main outcome measures of the multi-centre AusD Study
have been described previously in detail (39). Approval was obtained from four
institutional ethics committees. Potentially eligible participants were residents of 4
Australian cities (Hobart, Canberra, Brisbane, and Townsville) registered on the
Australian Electoral Roll (a compulsory register of Australian adults aged 18+ years)
and aged between 18 and 75 years. Exclusion criteria were: insufficient command of
English; an impairment or illness that prevented attendance at the interview; a
bleeding disorder; or positivity for hepatitis B virus, hepatitis C virus, or human
immunodeficiency virus. Participants completed a mailed health questionnaire
followed by two personal interviews at their local study site. At the end of the second
interview, a 20-mL venous blood sample was collected from each participant to
measure concentrations of serum 25-hydroxyvitamin D (25OHD). The serum and
buffy coat were processed using standard procedures, before storage locally in a
−80°C freezer. The final study sample was representative of the underlying population
based on a set of parameters (gender, age group, country of birth, perceived health
status, body mass index and smoking status) available from the population-based
2007–2008 National Health Survey; most participants (80.4%) had been born in
Australia, were full-time workers who worked primarily indoors, and considered
themselves to have fair-to-medium skin colour (participants were asked to self-report
their skin colour by reference to a Fitzpatrick Skin Type chart (41)), brown hair,
and blue or grey eyes, as previously reported (39). This analysis uses data from the
self-administered questionnaire (demographic characteristics and questions about the
way participants protect themselves from the sun), interviewer-administered
questions (phenotypic characteristics, skin cancer- and vitamin D-related attitudes,
use of sun protection, and changes in sun exposure behaviours due to concern about
vitamin D), and the blood sample used to measure concentrations of 25OHD.
Attitudes towards vitamin D:
Three questions assessed attitudes towards vitamin D (‘I worry about getting enough
vitamin D’; ‘I need to spend more time in the sun during summer for a healthy
vitamin D level’; ‘It is more important to stay out of the sun than to get enough
vitamin D’), with answer categories using 5-point Likert scales ranging from
strongly agree to strongly disagree. An option for ‘can’t say’ was included.
Participants were also asked whether they had noticed any news stories about
vitamin D (yes, no, unsure).
Change in sun-exposure or sun-protection behaviours due to concern about
vitamin D
Participants were asked if they had made changes to their personal sun-protection or
sun-exposure behaviours during the previous summer in order to get enough vitamin
D (“Did you try to wear shorts more often? Did you try to wear a hat less often? Did
you try to wear sunscreen less often? Did you spend more time in the sun? Any other
changes?”). Answer categories were yes, no, or can’t say. Fewer than 2.5% of
participants answered ‘can’t say’; these responses were combined with the ‘no’
category. Excluding participants who answered ‘can’t say’ did not significantly
change the results.
Statistical analysis: Prior to analysis we grouped the response categories into two
groups: strongly disagree/disagree/neutral and strongly agree/agree. For the Rasch
analysis, we recoded the items so that a higher score indicated higher skin cancer
predisposition. Sun-protection behaviour items (items 1-6 in Table 3.4) were scored as
follows: strongly disagree = 4, disagree = 3, neutral = 2, agree = 1, and strongly
agree = 0. Items on reduced sun-protection behaviour (items 7-11, e.g. 'try to wear a
hat less often') were scored as follows: strongly disagree = 0, disagree = 1,
neutral = 2, agree = 3, and strongly agree = 4. We used chi-square
tests to compare attitudes, and changes in sun-protection behaviours, stratified by
participants’ locations. We also used chi-square tests to determine whether reported
changes in sun-protection behaviours to get more vitamin D varied according to
attitudes towards vitamin D or having heard news reports about vitamin D. Bivariate
logistic regression analyses were used to determine sociodemographic and skin
cancer risk factors associated with changes in sun protection behaviours. Factors that
were statistically significant (p<0.2) in the bivariate analyses and did not show
evidence of multicollinearity were then included as adjustment factors in the
multivariable logistic regression analyses. Multivariable logistic regression models
were used to assess whether changes in sun-exposure or -protection behaviours (yes
or no) were influenced by vitamin D related attitudes, adjusted for age, sex, location,
education, indoor or outdoor work, ability to tan and participants’ measured
concentrations of serum 25OHD. We repeated the models adjusting for season (data
not shown), but results remained unchanged and the former more parsimonious
models are reported.
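As an illustration only, the recoding and grouping described above could be sketched in Python as follows. The dictionary and function names are hypothetical and are not taken from the original analysis; the category labels and scores follow the text.

```python
# Hypothetical score tables; the labels and values follow the text above.
# Items 1-6 (sun-protection behaviours) are reverse-scored so that a higher
# score means a higher skin cancer predisposition.
PROTECTION_SCORES = {
    "strongly disagree": 4, "disagree": 3, "neutral": 2,
    "agree": 1, "strongly agree": 0,
}
# Items 7-11 (reduced sun protection) are scored in the direct direction.
REDUCED_PROTECTION_SCORES = {
    "strongly disagree": 0, "disagree": 1, "neutral": 2,
    "agree": 3, "strongly agree": 4,
}


def recode(response, item_number):
    """Return the Rasch-analysis score for one response to one item."""
    if 1 <= item_number <= 6:
        table = PROTECTION_SCORES
    elif 7 <= item_number <= 11:
        table = REDUCED_PROTECTION_SCORES
    else:
        raise ValueError("item_number must be between 1 and 11")
    return table[response.strip().lower()]


def dichotomise(response):
    """Group attitude responses: 1 for agree/strongly agree, otherwise 0."""
    return 1 if response.strip().lower() in {"agree", "strongly agree"} else 0
```

For example, `recode("Strongly agree", 1)` gives 0 for a sun-protection item, whereas the same response to item 7 gives 4.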
Item response theory: The matrix of responses of 1,002 participants to the attitude
items was subjected to Rasch analysis using the Andrich rating scale model for
polytomous data (42). Rasch models are a variant of IRT that model the relationship
between the level of a latent trait (for this study, skin cancer predisposition) and
the items used for measurement. In clinical assessment, the concept behind IRT is
that participants respond to items in a questionnaire based on the extent of the latent
trait (equivalent to person ability in Rasch analysis of a physical disability
instrument). Therefore, a person with an average level of skin cancer
predisposition will likely report fewer sun-exposure behaviours than
people with greater skin cancer predisposition. Severity of skin cancer predisposition
is expressed in terms of log odds or “logits,” and persons and items are mapped
along the same scale. Logit-transformed measures represent linear measures of skin
cancer predisposition. For an item, a logit represents the log odds of the extent of an
item relative to the position of that item within the total set of items analysed. Logits
of higher positive magnitude represent a participant who has higher skin cancer
predisposition. We applied IRT models to assess the item information functions of 29
self- and interviewer-administered questions placed on an underlying latent trait of skin
cancer predisposition: eighteen items measured phenotype and typical sun exposure
behaviours, six items measured sun-protection behaviours and five items measured
changing sun-protection behaviours due to concern about getting enough vitamin D.
Item information is the contribution that an individual item makes to the total
information of a measured latent construct and shows where on the underlying latent
construct each item measures optimally (43). In general, item information functions
tend to look bell-shaped. Highly discriminating items have tall, narrow information
functions; they contribute greatly but over a narrow range. Less discriminating items
provide less information but over a wider range. Plots of item information can be
used to see how much information an item contributes and to what portion of the
scale score range. Calibration into the Rasch Partial Credit Model (RPCM) (44) was
completed using ACER ConQuest software (45). Calibration is the procedure of
estimating a person’s ability (in this case the person’s skin cancer predisposition) and
item difficulty (propensity to endorse an item) by converting (scaling) raw scores to
logits on an underlying uni-dimensional measurement scale.
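The following minimal sketch, which assumes the standard form of the partial credit model, shows how category probabilities and item information can be computed for a single item at a given level of the latent trait. The function names and the threshold values in the usage note are hypothetical; this is not the estimation procedure implemented in ACER ConQuest.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities (scores 0..m) under the partial credit model.

    theta is the person location in logits; thresholds are the item's step
    parameters in logits (hypothetical values, not estimates from this study).
    """
    # Cumulative sums of (theta - delta_j); the sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


def item_information(theta, thresholds):
    """Item information at theta: the variance of the item score."""
    probabilities = pcm_probabilities(theta, thresholds)
    expected = sum(k * p for k, p in enumerate(probabilities))
    return sum((k - expected) ** 2 * p for k, p in enumerate(probabilities))
```

Evaluating `item_information` over a grid of theta values traces the bell-shaped curve described above; for a dichotomous item the curve peaks at the item's own location.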
Unweighted and weighted fit statistics were used to check the quality of the scale
from the Rasch model perspective. The mean square error (MNSQ) fit statistic is a
measure of the extent to which the data match the specifications of the model. As is
common practice in Rasch analysis, items that do not fit the model are removed.
Values of unweighted and weighted MNSQ can range from 0 to positive infinity with
an ideal value of 1.0 indicating that the data perfectly fit the model. Values below 1.0
suggest that variation in the observed data is over-predicted by the Rasch model
while values above 1.0 show that variation in the observed data is greater than that
predicted by the model. Currently there is no standard cut-off value for MNSQ;
different acceptable ranges are used to indicate good fit of the model. We used a
relatively strict standard (unweighted MNSQ values between 0.75 and 1.33) as the
criterion for good fit (46). Once the skin cancer predisposition scale was
calibrated, we plotted each item and its response categories along this underlying
latent trait logit scale which is expressed as theta, with 0 representing the mean skin
cancer predisposition. To illustrate the potential effect of changing sun-protection
behaviour due to concern about vitamin D, we plotted a hypothetical example for a
person endorsing items that confer a high or low skin cancer predisposition to show
the impact on the underlying construct of skin cancer predisposition.
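To make the fit statistics concrete, the sketch below computes unweighted (outfit-style) and weighted (infit-style) mean squares in their conventional residual-based form. For brevity it assumes a dichotomous Rasch item and known person locations; the function names are hypothetical, and the calculation in ACER ConQuest for polytomous items may differ in detail.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_mnsq(responses, thetas, difficulty):
    """Return (unweighted, weighted) mean-square fit for one item.

    responses are 0/1 scores and thetas the matching person locations.
    """
    squared_residuals = []
    variances = []
    for score, theta in zip(responses, thetas):
        expected = rasch_probability(theta, difficulty)
        squared_residuals.append((score - expected) ** 2)
        variances.append(expected * (1.0 - expected))
    # Unweighted: mean of the squared standardised residuals.
    unweighted = sum(
        r / v for r, v in zip(squared_residuals, variances)
    ) / len(squared_residuals)
    # Weighted: squared residuals weighted by the model variance.
    weighted = sum(squared_residuals) / sum(variances)
    return unweighted, weighted


def within_bounds(mnsq, lower=0.75, upper=1.33):
    """Apply the 0.75 to 1.33 acceptance range quoted in the text."""
    return lower <= mnsq <= upper
```

An item whose observed residual variation matches the model expectation returns values close to 1.0 for both statistics.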
RESULTS
Of 11,713 people approached, 1,269 agreed to participate and 1,002 provided data
(overall study participation rate 9.1%). Demographic and phenotypic characteristics
of the sample have been previously reported (39). The distribution of participants
was approximately equally spread between the four study locations. The average age
of participants was 48 years (SD 16) and 46% were male. Over 80% of participants
were born in Australia and most had fair or medium skin colour (90%) and green,
hazel, grey or blue eyes (80%), placing them environmentally and constitutionally
(having a phenotype that confers overall higher than average risks of developing skin
cancer based on accumulated epidemiologic evidence, e.g. skin type 1, red hair, lack
of tanning ability and propensity to freckle and burn) at risk of skin cancer. Fifty-six
participants had serum 25(OH)D levels below 25 nmol/L; a significantly greater
proportion of these participants (32.1%) were worried about not getting enough
vitamin D compared to participants with levels above 25 nmol/L (24.0%, p<0.03).
Participants from Canberra were more likely than those from other locations to: work
indoors (81% vs 68%; p<0.001); have a bachelor degree (30% vs 22%; p<0.001);
and have been born outside Australia (29% vs 16%; p<0.001). Participants from Canberra
were less likely to report fair skin than other participants (50% vs 68%; p<0.001),
while participants from Hobart were more likely to report blue, grey or green eye
colour compared to participants from elsewhere (64% vs 52%; p<0.001). A larger
proportion of participants from Hobart entered the study in spring while a larger
proportion of participants from Canberra participated during winter.
Vitamin D-related attitudes and change in sun-protection/exposure behaviours
due to concern about vitamin D, stratified by location
Concerns about vitamin D, and reported change in sun-protection or sun-exposure
behaviour due to those concerns, increased with increasing latitude (Table 3.1). For
example, 18% of participants from Townsville, 21% of participants from Brisbane,
31% from Canberra and 40% from Hobart agreed with the statement ‘I need to spend
more time in the sun during summer for a healthy vitamin D level’ (p<0.001).
Overall, between 4 and 15% of participants reported that they had changed their sun-
exposure or -protection behaviours during the previous summer to get sufficient
vitamin D. People from Hobart were significantly (p<0.001) more likely to report
wearing shorts more often (24%) and spending more time in the sun due to concern about
vitamin D (28%) than those from Brisbane or Townsville (8-10%). There were no
significant differences in hat and sunscreen use or other sun-protective behaviours
according to participants’ locations (Table 3.1), although these behaviours also
followed a latitudinal gradient.
Associations between vitamin D-related attitudes and sun protection behaviours
A larger proportion of people who worried about vitamin D or who felt they needed
to spend more time in the sun for vitamin D production reported that they had altered
their sun-exposure behaviours during the last summer (Table 3.2). In adjusted
multivariable logistic regression analyses, those who worried about getting enough
vitamin D wore sunscreen less often (adjusted OR=3.2; 95% CI 1.6-6.2; p=0.001) and
shorts more often (adjusted OR=1.6; 95% CI 1.0-2.6; p=0.04) and tended to spend more
time in the sun (adjusted OR=2.4; 95% CI 1.5-3.7; p<0.001). Those who agreed that
they needed to spend more time in the sun in summer for a healthy vitamin D level
were less likely to wear a hat (adjusted OR=2.6; 95% CI 1.2-5.6; p=0.04) or sunscreen
(adjusted OR=2.6; 95% CI 1.3-5.0; p=0.004), and more likely to wear shorts (adjusted
OR=3.0; 95% CI 1.9-4.7; p<0.001) and increase the amount of time spent in the sun
(adjusted OR=4.2; 95% CI 2.8-6.4; p<0.001) (Table 3.3).
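For readers less familiar with how such estimates arise, the sketch below shows the usual relationship between a logistic regression coefficient, its standard error, and the reported odds ratio with a 95% confidence interval. It assumes Wald-type intervals, which the text does not state explicitly, and the function names are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for one coefficient.

    Assumes a Wald-type interval: exp(beta +/- z * standard_error).
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper


def standard_error_from_ci(lower, upper, z=1.96):
    """Recover the approximate standard error of beta from a reported CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)
```

Applied to the sunscreen result above, `standard_error_from_ci(1.6, 6.2)` gives the approximate standard error implied by the reported interval, and `odds_ratio_with_ci(math.log(3.2), se)` then returns limits close to those reported, with small differences due to rounding.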
There were no significant differences in participants’ self-reported sun-protection
behaviours according to whether or not they had heard any ‘news about vitamin D’
or agreed or disagreed with the statement ‘it’s more important to stay out of the sun
than to get enough vitamin D’.
Potential effect of changes in sun-protection behaviour and underlying skin
cancer predisposition
For ease of interpretation, we transformed the person ability score (the skin cancer
predisposition score) from a logit score into a T-score (see supplement 2), which
has a mean of 50 and a standard deviation of 10.
Overall the current participants were found to have skin cancer predisposition below
the mean (mean = 44.10). Table 3.4 shows the item locations and the scale and fit
statistics (MNSQ statistic) of selected sun exposure behaviour items within the
calibrated skin cancer predisposition latent trait continuum, expressed on a logit
scale. Estimates below 0 (negative) represent a low skin cancer predisposition, while
those above 0 (positive) represent an increasingly high skin cancer predisposition,
based on the self- and interviewer administered questions. The overall item
parameter estimates show that all 11 items fitted the skin cancer predisposition scale
well, as all were located within the recommended MNSQ bounds of 0.75 – 1.33.
Figure 3.1 visualises two items assessing hat wearing behaviours on calibrated skin
68 Chapter 3: Self-reported Changes in Sun-Protection Behaviours at Different Latitudes in Australia
cancer predisposition scale. Compared to a hypothetical person who agrees with the
item “wear a hat” (i.e., skin cancer predisposition <0), a person who endorses the
item ‘try to wear a hat less often’ will be assigned a score well above 0. A Wilcoxon
Signed-Ranks Test indicated that the item location of concern about vitamin D were
statistically significantly higher than the item location of sun protection behaviour
(Z=-2.023, p=.043). This shows the potential effect of changing sun protection
behaviours.
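The logit-to-T-score conversion described at the start of this section can be sketched as a linear rescaling. The reference mean and standard deviation used to standardise the logits are not given in the text, so `ref_mean` and `ref_sd` below are hypothetical parameters, as is the function name.

```python
def logit_to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a logit-scale person estimate to a T-score (mean 50, SD 10).

    ref_mean and ref_sd are assumed reference values on the logit scale;
    they are placeholders, not values reported in the study.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (theta - ref_mean) / ref_sd  # standardised score
    return 50.0 + 10.0 * z

print(logit_to_t_score(0.0))   # a person at the reference mean
print(logit_to_t_score(-0.5))  # half a reference SD below the mean
```

Because the transformation is linear, it changes only the units of the scale; the ordering of persons and the relative distances between them are preserved.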
DISCUSSION
Approximately one quarter of the participants were concerned about their vitamin D
status and believed they needed to spend more time in the sun. Although only 4%
reported changing their hat-wearing behaviours, 15% reported that they tried to
spend more time in the sun in the previous summer to synthesise enough vitamin D.
Attitudes about vitamin D and changes in sun-protection behaviours were
significantly related to each other and differed according to the latitude at which the
participant lived.
The United States Preventive Services Task Force recently reviewed the evidence on
the effect of vitamin D on fractures, cancers and other chronic disease prevention,
and concluded that while there is some positive evidence for fracture prevention, the
evidence for other chronic diseases is still inconclusive (47, 48). Given the
uncertainties surrounding the role of vitamin D in health, the known skin cancer-
inducing effects of sun exposure, and our findings suggesting a close association
between attitudes towards vitamin D and sun exposure behaviour, it is important to
ensure that public concern about vitamin D does not jeopardise skin cancer
prevention messages (29, 28).
IRT models graphically highlight the potential impact of self-reported behaviour
change on skin cancer predisposition. Cancer Council Australia’s Skin Cancer
Committee has updated their skin cancer prevention messages to accommodate the
balance between the risks and benefits of sun exposure; for example, they have
contributed to a position statement which recommends sun protection if the UV
Index is ≥ 3 but also “exposing the face, arms and hands or the equivalent area of
skin to a few minutes of sunlight on either side of the peak UV periods on most days
of the week” (49). A previous study (50) found that exposing the arms and
legs to the sun as little as twice per week for 5 minutes may be sufficient to
maintain adequate vitamin D (>30 nmol/L), depending on time of day, season, etc. One
of the concerns with changing the sun-protection messages provided by preventive
health authorities is that people may be confused. For example, should they discard
hats and sunscreen in order to optimise vitamin D regardless of where they live? Our
finding that vitamin D-related attitudes and self-reported changes in sun-protective
behaviours increased with increasing latitude is reassuring and is consistent with the
messages and position statements issued by health authorities which recommend, for
example, discarding hats only in the southern states of Australia in winter (51).
Once adjusted for relevant confounders, latitude and 25OHD level, only people who
worried about vitamin D, and those who specifically thought that they needed to
spend more time in the sun for vitamin D production, had higher odds of having
changed their sun-exposure behaviours. These findings suggest that people make
choices about their sun exposure based on their attitudes and environment (latitude),
and more research on these interactions is needed to determine what influences these
attitudes. We previously found that people obtained information through the media
(19, 28), but in this study we did not observe a strong association between having
heard about vitamin D on the news and change in either attitudes or behaviours.
Future work needs to explore this in more detail and should address important issues
such as adding some questions about participants’ knowledge of sun protection and
vitamin D.
Study limitations: The main limitation of this study was its cross-sectional design.
Further research incorporating longitudinal assessment of 25OHD is needed to
determine whether people who are worried about vitamin D status actually have
lower 25OHD levels, and if so, whether additional sun exposure helps to increase
these levels.
While the AusD Study recruited participants from a population-based register of all
Australian voting adults, the participation rate was low, and only through its
sampling requirements achieved a similar proportion of men and women. Results
from this study may not be generalisable to the general adult Australian population owing
to the low response rate (9.1%). Participants were more likely than non-participants to be
female (54.2% vs. 47.2%; P < 0.001) and older than 39 years (P < 0.001) (39).
Overall, this study attracted a higher proportion of women and older, indoor-working,
well-educated participants compared with the underlying population. It is possible
that these participants may have been more motivated to participate because they
were more concerned about vitamin D than non-participants.
CONCLUSION
We found that the strongest and most consistent determinants of self-reported sun-
protection behaviour changes due to concerns about vitamin D were attitudes and
location, with those at higher latitudes worrying more. Further research is needed to
understand what drives people’s vitamin D-related attitudes. This information may
be useful to inform public health strategies or to help people to make behavioural
choices that are consistent with their values.
Acknowledgements
The authors thank the AusD investigators who provided the data extract used in this
study. Ngadiman Djaja is supported by the National Health and Medical Research
Council of Australia (NHMRC) CRESH PhD scholarship. Rachel E. Neale is funded
by an NHMRC Senior Research Fellowship. Robyn M. Lucas is supported by an
NHMRC Career Development Fellowship.
REFERENCES
1. Armstrong BK, Kricker A, English DR. Sun exposure and skin cancer.
Australasian Journal of Dermatology. 1997; Feb 1; 38(S1):S1-6.
2. International Agency for Research on Cancer. Solar and Ultraviolet
Radiation. Vol. 55. (Edited by International Agency for Research on Cancer), Lyon, France. 1992.
3. Armstrong BK, Kricker A. How much melanoma is caused by sun exposure?
Melanoma research. 1993; Nov 1; 3(6):395-402.
4. The International Agency for Research on Cancer (2009) Radiation : A
review of human carcinogens. In IARC monographs on the evaluation of
carcinogenic risks to humans Vol. 100 D. Lyon, France.
5. World Health Organization, World Meteorological Organization, United
Nations Environment Programme and International Commission on Non-
Ionizing Radiation Protection. Global Solar UV Index: A Practical Guide.
(Edited by WHO). 2002.
6. Ross AC, Manson JE, Abrams SA, et al. The 2011 report on dietary reference
intakes for calcium and vitamin D from the Institute of Medicine: what
clinicians need to know. The Journal of Clinical Endocrinology &
Metabolism. 2011; Jan; 96(1):53-8.
7. Barnard K, Colón-Emeric C. Extraskeletal effects of vitamin D in older
adults: cardiovascular disease, mortality, mood, and cognition. The American
journal of geriatric pharmacotherapy. 2010; Feb 28;8(1):4-33.
8. Ginde AA, Scragg R, Schwartz RS, Camargo CA. Prospective Study of
Serum 25‐Hydroxyvitamin D Level, Cardiovascular Disease Mortality, and
All‐Cause Mortality in Older US Adults. Journal of the American Geriatrics
Society. 2009; Sep 1; 57(9):1595-603.
9. Tomson J, Emberson J, Hill M, et al. Vitamin D and risk of death from
vascular and non-vascular causes in the Whitehall study and meta-analyses of
12 000 deaths. European heart journal. 2013; May 7; 34(18):1365-74.
10. Bischoff-Ferrari HA, Giovannucci E, Willett WC, Dietrich T, Dawson-
Hughes B. Estimation of optimal serum concentrations of 25-hydroxyvitamin
D for multiple health outcomes. The American journal of clinical nutrition.
2006; Jul 1; 84(1):18-28.
11. Ben-Shoshan M. Vitamin D deficiency/insufficiency and challenges in
developing global vitamin D fortification and supplementation policy in
adults. Int J Vitam Nutr Res 2012; 82, 237-259.
12. Holick MF. Vitamin D, sunlight and cancer connection. Anti-Cancer Agents
in Medicinal Chemistry (Formerly Current Medicinal Chemistry-Anti-Cancer
Agents). 2013; Jan 1; 13(1):70-82.
13. The Australia and New Zealand Bone and Mineral Society, Osteoporosis
Australia, Australasian College of Dermatologists and The Cancer Council
Australia. Risks and benefits of sun exposure: Position statement. 2007.
Available at: http://www.cancer.org.au/policy-and-advocacy/position-statements/sun-smart/.
14. Samanek AJ, Croager EJ, Gies P, Milne E, Prince R, McMichael AJ, Lucas
RM, Slevin T. Estimates of beneficial and harmful sun exposure times during
the year for major Australian population centres. Medical journal of
Australia. 2006; Apr 3; 184(7):338.
15. Hartley M, Hoare S, Lithander FE, et al. Comparing the effects of sun
exposure and vitamin D supplementation on vitamin D insufficiency, and
immune and cardio-metabolic function: the Sun Exposure and Vitamin D
Supplementation (SEDS) Study. BMC public health. 2015; Feb 10; 15(1):1.
16. Janda M, Kimlin M, Whiteman D, Aitken J, Neale R. Sun protection and low
levels of vitamin D: are people concerned? Cancer Causes & Control. 2007;
Nov 1;18(9):1015-9.
17. Scully M, Wakefield M, Dixon H. Trends in news coverage about skin cancer
prevention, 1993‐2006: increasingly mixed messages for the public.
Australian and New Zealand journal of public health. 2008; Oct 1; 32(5):461-
6.
18. Dixon H, Warne C, Scully M, Dobbinson S, Wakefield M. Agenda-setting
effects of sun-related news coverage on public attitudes and beliefs about
tanning and skin cancer. Health communication. 2014; Feb 7; 29(2):173-81.
19. Langbecker D, Youl P, Kimlin M, Remm K, Janda M. Factors associated
with recall of media reports about vitamin D and sun protection. Australian
and New Zealand journal of public health. 2011; Apr 1; 35(2):159-62.
20. Bilinski K, Boyages S. The rise and rise of vitamin D testing. BMJ. 2012.
21. Bilinski K, Boyages S. Evidence of overtesting for vitamin D in Australia: an
analysis of 4.5 years of Medicare Benefits Schedule (MBS) data. BMJ open.
2013; Jan 1; 3(6):e002955.
22. The Department of Human Services. Medicare Benefits Schedule (MBS).
2014. Available at:
http://www.medicareaustralia.gov.au/provider/medicare/mbs.jsp.
23. Bilinski KL, Boyages SC. The rising cost of vitamin D testing in Australia:
time to establish guidelines for testing. The Medical journal of Australia.
2012; Jul 16; 197(2):90.
24. Glanz K, Rimer BK, Viswanath K, editors. Health behavior and health
education: theory, research, and practice. John Wiley & Sons; 2008; Aug 28.
25. Vu LH, van der Pols JC, Whiteman DC, Kimlin MG, Neale RE. Knowledge
and attitudes about vitamin D and impact on sun protection practices among
urban office workers in Brisbane, Australia. Cancer Epidemiology
Biomarkers & Prevention. 2010; Jun 22:1055-9965.
26. Youl PH, Janda M, Kimlin M. Vitamin D and sun protection: the impact of
mixed public health messages in Australia. International Journal of Cancer.
2009; Apr 15; 124(8):1963-70.
27. Janda M, Youl P, Bolz K, Niland C, Kimlin M. Knowledge about health
benefits of vitamin D in Queensland Australia. Preventive medicine. 2010;
Apr 30; 50(4):215-6.
28. Nowak M, Harrison SL, Buettner PG, et al. Vitamin D status of adults from
tropical Australia determined using two different laboratory assays:
implications for public health messages. Photochemistry and photobiology.
2011; Jul 1; 87(4):935-43.
29. Harrison S, Büttner P, Nowak M. Maternal beliefs about the reputed
therapeutic uses of sun exposure in infancy and the postpartum period.
Australian Midwifery. 2005; Aug 31; 18(2):22-8.
30. Reid CA, Kolakowsky-Hayner SA, Lewis AN, Armstrong AJ. Modern
psychometric methodology applications of item response theory.
Rehabilitation Counseling Bulletin. 2007; Apr 1; 50(3):177-88.
31. Cella D, Chang CH. Response to Hays et al and McHorney and Cohen: A
discussion of item response theory and its applications in health status
assessment. Medical Care. 2000; Sep 1; 38(9):II-66.
32. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes
measurement in the 21st century. Medical care. 2000; Sep; 38(9 Suppl):II28.
33. Warne RT, McKyer EJ, Smith ML. An introduction to item response theory
for health behavior researchers. American journal of health behavior. 2012;
Jan 1; 36(1):31-43.
34. Linacre JM. What is item response theory, IRT? A tentative taxonomy. Rasch
Measurement Transactions. 2003; 17(2):926-7.
35. da Rocha NS, Chachamovich E, de Almeida Fleck MP, Tennant A. An
introduction to Rasch analysis for psychiatric practice and research. Journal
of psychiatric research. 2013; Feb 28; 47(2):141-8.
36. Leung YY, Png ME, Conaghan P, Tennant A. A systematic literature review
on the application of Rasch analysis in musculoskeletal disease—A special
interest group report of OMERACT 11. The Journal of rheumatology. 2014;
Jan 1; 41(1):159-64.
37. Lundgren-Nilsson Å, Jonsdottir IH, Ahlborg G, Tennant A. Construct validity
of the psychological general well being index (PGWBI) in a sample of
patients undergoing treatment for stress-related exhaustion: a rasch analysis.
Health and quality of life outcomes. 2013; Jan 7; 11(1):1.
38. Waller J, Ostini R, Marlow LA, McCaffery K, Zimet G. Validation of a
measure of knowledge about human papillomavirus (HPV) using item
response theory and classical test theory. Preventive medicine. 2013; Jan 31;
56(1):35-40.
39. Brodie AM, Lucas RM, Harrison SL, et al. The AusD Study: a population-
based study of the determinants of serum 25-hydroxyvitamin D concentration
across a broad latitude range. American journal of epidemiology. 2013; Mar
22; kws322.
40. Kimlin MG, Lucas RM, Harrison SL, et al. The contributions of solar
ultraviolet radiation exposure and other determinants to serum 25-
hydroxyvitamin D concentrations in Australian adults: the AusD Study.
American journal of epidemiology. 2014; Apr 1; 179(7):864-74.
41. Fitzpatrick TB. Soleil et peau. J Med Esthet. 1975; 2(7):33-4.
42. Andrich D. A rating formulation for ordered response categories.
Psychometrika. 1978; Dec 1; 43(4):561-73.
43. De Ayala RJ. The theory and practice of item response theory. Guilford
Publications; 2013; Oct 15.
44. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;
Jun 1; 47(2):149-74.
45. Adams R, Wu M, Wilson M. ACER ConQuest 3.0.1. ACER, Melbourne,
Australia. 2013.
46. Wilson M. Constructing measures: An item response modeling approach.
Routledge; 2004; Dec 13.
47. Chung M, Lee J, Terasawa T, Lau J, Trikalinos TA. Vitamin D with or
without calcium supplementation for prevention of cancer and fractures: an
updated meta-analysis for the US Preventive Services Task Force. Annals of
internal medicine. 2011; Dec 20; 155(12):827-38.
48. Lips P, Gielen E, van Schoor NM. Vitamin D supplements with or without
calcium to prevent fractures. BoneKEy reports. 2014; Mar 5; 3.
49. Cancer Council Australia. Position statement: Screening and early detection
of skin cancer. 2007.
50. Holick MF. Vitamin D deficiency. New England Journal of Medicine. 2007;
Jul 19; 357(3):266-81.
51. Nowson CA, McGrath JJ, Ebeling PR, et al. Vitamin D and health in adults in
Australia and New Zealand: a position statement. Med J Aust. 2012; Jun 18;
196(11):686-7.
Table 3.1: Differences in vitamin D attitudes and sun protection behaviours by location¹

Item (overall N, %) | Townsville 19.3°S, N=259 (%) | Brisbane 27.5°S, N=254 (%) | Canberra 35.3°S, N=252 (%) | Hobart 42.8°S, N=237 (%) | p-value³
I worry about getting enough vitamin D: Agree², N=237 (24.0%) | 30 (11.7) | 53 (21.0) | 63 (25.5) | 91 (39.1) | 0.001
I need to spend more time in the sun during summer for a healthy vitamin D: Agree², N=270 (27.2%) | 47 (18.3) | 53 (20.9) | 78 (31.1) | 92 (39.5) | <0.001
It is more important to stay out of the sun than to get enough vitamin D: Agree², N=160 (16.2%) | 60 (23.3) | 35 (13.9) | 40 (16.1) | 25 (10.8) | <0.001
Have you ever heard news reports about getting vitamin D from sunlight: Yes, N=633 (64.7%) | 139 (55.2) | 165 (66.0) | 167 (67.6) | 162 (70.7) | 0.002
Last summer did you make any changes to the way you protected yourself from the sun so you could get enough vitamin D?
Wear hat less often, N=41 (4.1%) | 8 (3.1) | 7 (2.8) | 11 (4.4) | 15 (6.5) | 0.46
Wear sunscreen less often, N=57 (5.8%) | 8 (3.1) | 13 (5.1) | 18 (7.2) | 18 (7.8) | 0.18
Wear shorts more often, N=136 (13.8%) | 21 (8.2) | 26 (10.3) | 33 (13.5) | 56 (24.2) | <0.001
Spend more time in the sun, N=153 (15.5%) | 20 (7.8) | 23 (9.2) | 44 (17.7) | 66 (28.4) | <0.001
Any other changes, N=81 (8.2%) | 20 (7.8) | 19 (7.5) | 21 (8.5) | 21 (9.1) | 0.29

¹ n may vary slightly due to some missing values
² Agree = combined categories of strongly agree/agree
³ p-value from Chi Square test
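As a rough check on the chi-square comparisons in Table 3.1, the sketch below computes the Pearson chi-square statistic for "spend more time in the sun" across the four cities. The "no" counts are an assumption: they are taken as each city's N minus the "yes" count, which ignores the missing values noted in the table footnote. The function names are hypothetical, and the closed-form tail probability applies only to 3 degrees of freedom.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Rows: Townsville, Brisbane, Canberra, Hobart; columns: yes, no.
# "No" counts are ASSUMED to be city N minus "yes" (missing values ignored).
city_n = [259, 254, 252, 237]
yes = [20, 23, 44, 66]
table = [[y, n - y] for y, n in zip(yes, city_n)]

stat, df = chi_square_statistic(table)
p_value = chi2_sf_df3(stat)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p_value:.2g}")
```

Under these assumed denominators the p-value falls well below 0.001, in line with the value reported in the table for this item.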
Table 3.2: Vitamin D-related attitudes and self-reported changes in sun protection behaviours¹
Last summer did you make any changes to the way you protect yourself from the sun so you could get enough vitamin D? Cells are N (%).

Attitude and response (overall N, %) | Wear hat less often: Yes | Wear hat less often: No | Wear sunscreen less often: Yes | Wear sunscreen less often: No | Wear shorts more often: Yes | Wear shorts more often: No | Spend more time in the sun: Yes | Spend more time in the sun: No | Any other changes: Yes | Any other changes: No

I worry about getting enough vitamin D
Strongly disagree/disagree/neutral, N=752 (76.0%) | 22 (55.0) | 730 (76.9) | 26 (47.3) | 726 (77.7) | 80 (59.7) | 672 (78.6) | 78 (51.3) | 674 (80.5) | 62 (77.5) | 690 (75.9)
Strongly agree/agree, N=237 (24.0%) | 18 (45.0) | 219 (23.1) | 29 (52.7) | 208 (22.3) | 54 (40.3) | 183 (21.4) | 74 (48.7) | 163 (19.5) | 18 (22.4) | 219 (24.1)
p-value (per behaviour) | p < 0.001 | | p < 0.001 | | p < 0.001 | | p < 0.001 | | p = 0.75 |

I need to spend more time in the sun during summer for a healthy vitamin D level
Strongly disagree/disagree/neutral, N=724 (72.8%) | 19 (46.3) | 705 (74.0) | 27 (47.4) | 697 (74.4) | 65 (47.8) | 659 (76.8) | 61 (39.9) | 663 (78.8) | 52 (64.2) | 672 (73.6)
Strongly agree/agree, N=270 (27.2%) | 22 (53.7) | 248 (26.0) | 30 (52.6) | 240 (25.6) | 71 (52.2) | 199 (23.2) | 92 (60.1) | 178 (21.2) | 29 (35.8) | 241 (26.4)
p-value (per behaviour) | p < 0.001 | | p < 0.001 | | p < 0.001 | | p < 0.001 | | p = 0.07 |

It is more important to stay out of the sun than to get enough vitamin D
Strongly disagree/disagree/neutral, N=829 (83.8%) | 35 (87.5) | 794 (83.7) | 49 (87.5) | 780 (83.6) | 122 (91.0) | 707 (82.7) | 135 (89.4) | 694 (82.8) | 70 (86.4) | 759 (83.6)
Strongly agree/agree, N=160 (16.2%) | 5 (12.5) | 155 (16.3) | 7 (12.5) | 153 (16.4) | 12 (9.0) | 148 (17.3) | 16 (10.6) | 144 (17.2) | 11 (13.6) | 149 (16.4)
p-value (per behaviour) | p = 0.52 | | p = 0.44 | | p = 0.01 | | p = 0.04 | | p = 0.51 |

Have you ever heard news reports about getting vitamin D from sunlight
No, N=345 (35.3%) | 11 (27.5) | 334 (35.6) | 20 (35.7) | 325 (35.2) | 47 (35.6) | 298 (35.2) | 59 (39.6) | 286 (34.5) | 26 (32.1) | 319 (35.6)
Yes, N=633 (64.7%) | 29 (72.5) | 604 (64.4) | 36 (64.3) | 597 (64.8) | 85 (64.4) | 548 (64.8) | 90 (60.4) | 543 (65.5) | 55 (67.9) | 578 (64.4)
p-value (per behaviour) | p = 0.29 | | p = 0.94 | | p = 0.93 | | p = 0.23 | | p = 0.53 |

¹ N may vary slightly due to some missing values
Table 3.3: Multivariable logistic regression models of associations between vitamin D-related attitudes and changes made during the last summer
to the way people protected themselves from the sun so they can get enough vitamin D*
Cells are OR (95% CI); p value.

Attitude and response | Try to wear a hat less often? | Try to use sunscreen less often? | Try to wear shorts or short sleeved clothing more often? | Try to spend more time out in the sun?

I worry about getting enough vitamin D
Strongly disagree/disagree/neutral | 1.0 | 1.0 | 1.0 | 1.0
Strongly agree/agree | 1.5 (0.7-3.4); 0.31 | 3.2 (1.6-6.2); 0.001 | 1.6 (1.0-2.6); 0.04 | 2.4 (1.5-3.7); 0.001

I need to spend more time in the sun during summer for healthy vitamin D level
Strongly disagree/disagree/neutral | 1.0 | 1.0 | 1.0 | 1.0
Strongly agree/agree | 2.6 (1.2-5.6); 0.04 | 2.6 (1.3-5.0); 0.004 | 3.0 (1.9-4.7); <0.001 | 4.2 (2.8-6.4); 0.001

It is more important to stay out of the sun than to get enough vitamin D
Strongly disagree/disagree/neutral | 1.0 | 1.0 | 1.0 | 1.0
Strongly agree/agree | 0.9 (0.3-2.7); 0.82 | 1.5 (0.6-3.8); 0.34 | 0.6 (0.3-1.1); 0.11 | 0.7 (0.4-1.3); 0.28

Have you ever heard news reports about getting vitamin D from sunlight
No | 1.0 | 1.0 | 1.0 | 1.0
Yes | 1.1 (0.5-2.3); 0.93 | 0.7 (0.4-1.3); 0.25 | 0.9 (0.6-1.4); 0.61 | 0.6 (0.4-1.0); 0.05

*For ease of reporting, all models are adjusted for age, sex, latitude, season, education, occupational exposure, ability to tan, measured 25OHD (continuous)
Table 3.4: Item location and fit statistics of sun protection behaviour items calibrated
within a skin cancer predisposition model.

Item No | Item | Estimated delta (standard error)* | Unweighted Fit MNSQ**
1 | Wear a broad-brimmed hat | -0.997 (0.032) | 1.01
2 | Wear a cap | -1.246 (0.039) | 1.02
3 | Wear any other head covering | -1.834 (0.070) | 1.00
4 | Wear a shirt with long sleeves | -0.634 (0.030) | 0.99
5 | Wear long trousers or clothing that covers all or most of your legs | -0.399 (0.031) | 1.01
6 | Wear sun glasses | -0.351 (0.031) | 1.00
7 | Try to wear a hat less often | 2.542 (0.155) | 1.00
8 | Try to use sunscreen less often | 2.188 (0.133) | 1.00
9 | Try to wear shorts or short sleeved clothing more often | 1.218 (0.091) | 0.99
10 | Try to spend more time out in the sun | 1.092 (0.087) | 0.99
11 | Make any other changes to the way you protect yourself from the sun | 1.802 (0.113) |

Abbreviations: MNSQ = Mean Square
*The estimated delta is the item location within a skin cancer predisposition continuum on a
logit scale. The score can range from negative infinity to positive infinity. Scores below 0
(negative) represent a low skin cancer predisposition score and scores above 0 (positive)
represent an increasingly high skin cancer predisposition score.
** The fit of the items is evaluated using the unweighted Mean Square (MNSQ). An MNSQ near
1 indicates a good fit. MNSQ <1 indicates an overfit, that is, the item discriminates more
than assumed in the model. MNSQ scores >1 usually occur if the discrimination of the item
is low; this is considered to be a more serious violation of model fit than MNSQ <1.
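To make the logit scale in Table 3.4 concrete, the sketch below evaluates the dichotomous Rasch response function for two of the tabulated item locations. Treating the items as dichotomous is a simplifying assumption made here for illustration, and `rasch_probability` is a hypothetical helper name.

```python
import math

def rasch_probability(theta, delta):
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta: person location (logits); delta: item location (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# Item locations copied from Table 3.4.
item_deltas = {
    "Wear a broad-brimmed hat": -0.997,
    "Try to wear a hat less often": 2.542,
}

for theta in (-1.0, 0.0, 1.0, 2.5):
    for item, delta in item_deltas.items():
        p = rasch_probability(theta, delta)
        print(f"theta={theta:+.1f}  {item}: P(endorse)={p:.2f}")
```

When the person location equals the item location the probability is 0.5, so an item with a high delta is expected to be endorsed mainly by people located high on the predisposition continuum.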
Figure 3.1: Item Information Functions from two items plotted along the latent trait
logits of skin cancer predisposition
[Figure: curves for the items “Wear a broad-brimmed hat” and “Try to wear a hat less often”; x-axis: skin cancer predisposition]
*Average skin cancer predisposition is located at zero.
**The graph indicates that a person who endorses the item “wear a broad brimmed hat” has a skin cancer risk
below average, whereas a person who endorses the item “Last summer …, tried to wear a hat less often” has a risk above 0
(average).
Table 3.5: Supplement 1. Demographic and Phenotypic characteristics of the
participants (n=1,002)
Characteristics Participants
No. %
Sex
Male 459 45.8
Female 543 54.2
Age group, years
18-24 72 7.2
25-44 358 35.7
45-64 377 37.6
65-75 195 19.5
Country of birth
Australia 806 80.4
Other countries 196 19.6
Skin colour
Dark/Black 11 1.1
Olive 94 9.4
Medium 258 25.7
Fair 628 62.7
Missing 11 1.1
Natural hair colour at 18
Black 102 10.2
Brown 655 65.4
Blonde 200 20.0
Red 37 3.7
Missing 8 0.8
Eye colour
Brown 201 20.1
Hazel 247 24.7
Blue or Grey 482 48.1
Green 62 6.2
Missing 10 1.0
*The x-axis shows the participants’ skin cancer predisposition (converted to T score with Mean = 50,
SD=10)
Figure 3.2: Supplement 2. The distribution of the skin cancer predisposition (scores
converted to T score)
84 Chapter 4: Estimating Skin Cancer Risk: Evaluating Mobile Computer Adaptive Testing
Estimating Skin Cancer Risk: Evaluating Mobile Computer-Adaptive Testing
This chapter includes a peer-reviewed journal article published in Journal of Medical
Internet Research. This article evaluates the efficiency of non-adaptive testing and
computer adaptive testing to estimate skin cancer risk. A Dichotomous Model,
Rating Scale Model and Partial Credit Model were applied to the simulation data.
Djaja, N., Janda, M., Olsen, C. M., Whiteman, D. C., & Chien, T.-
W. (2016). Estimating Skin Cancer Risk: Evaluating Mobile
Computer-Adaptive Testing. Journal of Medical Internet
Research, 18(e22). doi:10.2196/jmir.4736
ABSTRACT
Background: Response burden is a major detriment to questionnaire completion
rates. Computer adaptive testing may offer advantages over non-adaptive testing,
including a reduction in the number of items required for precise measurement.
Objective: To compare the efficiency of non-adaptive (NAT) and computer adaptive
testing (CAT) facilitated by Partial Credit Model (PCM) derived calibration to
estimate skin cancer risk.
Method: We used a random sample (two thirds) drawn from a population-based
Australian cohort study of skin cancer risk (n=43,794). All 30 items of the skin
cancer risk scale (SCRS) were calibrated with the Rasch PCM. A total of 1,000 cases
generated following a normal distribution (mean=0, SD=1) were simulated using
three Rasch models with three fixed-item (dichotomous, rating scale and partial
credit) scenarios, respectively. We calculated the comparative efficiency and
precision of CAT and NAT (shortening of questionnaire length and a count
difference number ratio of less than 5%, using independent t tests).
Results: We found that use of CAT led to a smaller person standard error (SE) of the
estimated measure than NAT, with substantially higher efficiency but no loss of
precision, reducing response burden by 48%, 66%, and 66% for the dichotomous, Rating
Scale, and Partial Credit models, respectively.
Conclusions: CAT-based administrations of the SCRS could substantially reduce
participant burden without compromising measurement precision. A mobile on-line
computer adaptive test was developed to help people efficiently assess their skin
cancer risk.
Keywords: computer adaptive testing, skin cancer risk scale, Non Adaptive Test,
Rasch analysis, partial credit model
INTRODUCTION
In Australia, skin cancers account for approximately 80% of all newly diagnosed
cancers [1]. There are three main types of skin cancer: (1) melanoma (the most
dangerous form of skin cancer), (2) basal cell carcinoma (BCC), and (3) squamous
cell carcinoma (SCC). BCC and SCC are often grouped together as non-melanoma or
keratinocyte skin cancers. Australia’s incidence of skin cancer is one of the highest
in the world: two to three times the rates observed in Canada, the United States, and
the United Kingdom [2], with age-standardised incidence rates for cutaneous
melanoma at 65.3 × 10⁻⁵ and 1878 × 10⁻⁵ for keratinocyte cancer [1]. From a
population of only 23 million, more than 434,000 people are treated for one or more
non-melanoma skin cancers in Australia each year [1]. Ultraviolet radiation exposure
from sunlight is the major causal factor for skin cancer [2]. Personal behaviours to
reduce excessive sunlight exposure are important modifiable factors for the
prevention of skin cancers. The World Health Organization recommends several
suitable behaviours such as appropriate use of sunscreens, staying in the shade,
covering with sun protective clothing, giving up sunbathing, and abstaining from
using sunbeds [3].
Requirement for Model-Data-Fit Detection
In practice, we do not know the real skin cancer risk for a person. Thus, assuming a
person has characteristic attributes that correlate highly with the underlying construct
of skin cancer, risk can be assessed through questions (i.e., questionnaire items); for
example, phenotypic measures such as freckles, hair color, eye color, tendency to
burn, or behavioural factors such as attitudes to tanning and use of sunbeds. Using
the responses to these items, it should be possible to create a unidimensional (i.e.,
addable) scale to measure these attributes and calculate an overall skin cancer risk
score. Ideally, such a score would be precise and characterised by a small standard
error (SE).
Statistical validity is the correlation between each person’s measures (or scores) on a
questionnaire and those persons’ unobservable true status [4]. Such unobservable
variables (e.g., true score or behaviours relating to sun protection and sun exposure)
are considered latent traits (i.e., they exist but cannot be directly observed). The question
is how to obtain optimal correlation (or validity) between the items when the true
score is unknown. Rasch models [5] can be a gateway to assess how well the items
measure the underlying latent trait [6-8]. That is, a unidimensional scale can be
verified by Rasch analysis: when the data fit the Rasch model, all items can be
added.
Questionnaires that are built and tested using the Rasch model have been common
in educational assessment for many years but are now also increasingly appreciated
in health assessment, including measures of patient outcomes (quality of life, pain,
depression) and other diverse latent traits such as perceptions of patient
hospitalisation and nurse bullying [9,10]. We previously applied the Rasch model to
the assessment of the quality of an instrument to measure attitudes to skin self-
examination [11]. Rasch analysis allows researchers to calculate a precise estimate of
the latent trait by assessment of unidimensionality of the items, assessment of
differential item functioning [12] (e.g., probability of giving a certain response on an
item by people from different groups with the same latent trait), and the possibility of
transferring static questionnaires to computer adaptive testing (CAT) [13].
Multimedia Graphical Representations to Improve Patients’ Health Literacy
Patients’ health literacy is increasingly recognised as a critical factor affecting
patient-physician communication and health outcomes [14], as a mediator for cancer
screening behaviour [15], and as a pathway between health literacy and cancer
screening [16]. Adults with below basic or basic health literacy are less likely than
adults with higher health literacy to get information about health issues from written
sources (e.g., newspapers, magazines, books, brochures, or the Internet) and more
likely than adults with higher health literacy to get a lot of information about health
issues from radio and television [17]. A mobile CAT with multimedia graphical
representations (i.e., similar to radio and television) could increase awareness of the
risk of developing skin cancer (i.e., health literacy) and motivate patient-physician
communication and subsequently behavioural change. However, no mobile CAT app
with graphical representations has been available until now.
Study Aims
Using data from a large cohort study of skin cancer from Queensland, Australia [18],
we conducted a simulation study with a methodological focus to apply Rasch models
to an existing skin cancer risk questionnaire. Further, we sought to compare static
(non-adaptive) presentation as commonly used in paper and pencil questionnaires
versus computer adaptive testing (CAT) for its precision in measurement. We
hypothesised that compared to non-adaptive testing (NAT), CAT would result in
greater precision (lower SE) for a similar item number or a shorter questionnaire of
similar SE.
METHODS
Data Source
De-identified data from the QSkin Sun and Health study baseline questionnaire were
used [18]. This is a population-based cohort study of 43,794 men and women aged
40-69 years randomly sampled from the population of Queensland, Australia, in
2011 (Figure 4.1). We randomly partitioned the data into a calibration dataset (two-
thirds, n=29,314) and a validation dataset (one-third, n=14,480). In the calibration
dataset, 7213 participants had a history of skin cancer and 22,101 participants did not
(Figure 4.2).
Approval for this study was obtained from the QIMR Berghofer Medical Research
Institute Human Research Ethics Committee (approval #P1309). Participants joined
the study by completing consent forms and the survey and returning them in a reply-
paid envelope. Participants completed two consent forms. The first consent form
covered the use of information provided in the survey, permission for data linkage to
cancer registries, pathology laboratories, and public hospital databases. The second
consent form gave permission for data linkage to Medicare Australia (Australia’s
universal national health insurance scheme) to ascertain whether or not participants
had developed skin cancer.
Figure 4.1: Sample selection flowchart
The baseline questionnaire consisted of 46 items and was answered by all QSkin
participants. All items were examined using the Rasch Partial Credit Model (PCM)
[19] (Figure 4.2). For optimal fit, the Rasch model requires a unidimensional
measurement with criteria of Infit and Outfit mean square errors of each item <1.5
[20]. PCM allows for items to have a variable number of thresholds and step
difficulties in contrast to the more commonly used Rating Scale Model (RSM)
[8,9,21], which requires all items to use the same response categories.
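To make the distinction concrete, the following is a minimal sketch of the PCM category probabilities, in which each item carries its own list of step difficulties. It is not the WINSTEPS implementation used in this study; the function name `pcm_probabilities` and the two example items are hypothetical and chosen only for illustration.

```python
import math

def pcm_probabilities(theta, step_difficulties):
    """Return category probabilities (categories 0..m) under the Partial Credit Model.

    theta: person measure in logits.
    step_difficulties: step (threshold) difficulties d_1..d_m in logits.
    """
    # Cumulative sums of (theta - d_j); category 0 corresponds to an empty sum of 0.
    cumulative = [0.0]
    for d in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - d))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical items with different numbers of steps, which the PCM permits.
two_step_item = [-1.0, 1.0]
four_step_item = [-2.0, -1.0, 0.0, 1.0]
for steps in (two_step_item, four_step_item):
    probs = pcm_probabilities(0.0, steps)
    print(len(probs), [round(p, 3) for p in probs])
```

Under the RSM, by contrast, every item would share one common set of step values shifted by the item's overall difficulty.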
For item invariance, the item estimation should be independent of the subgroups of
individuals completing the questions and should work equally across populations
[22]. Items not demonstrating invariance are commonly referred to as exhibiting
differential item functioning (DIF) [23,24] or item bias. The chi-square test used for
detecting DIF was computed from a comparison of the observed overall performance
of each trait group on the item with its expected performance [25]. Its probability
(e.g., P<.05) reports the probability of observing a chi-square value at least that large when
the data fit the Rasch model. We used WINSTEPS [26] to detect items above the
thresholds for DIF.
In addition, the category structure for each of the items in the skin cancer item bank
should display monotonically increasing thresholds following Linacre’s
guidelines [27] to improve the utility of the resulting measures.
Determining a Cut-Off Point of Skin Cancer Risk
Traditionally in clinical practice, researchers use the C-statistic, or area under the
receiver operating characteristic (ROC) curve, which plots the true positive rate
(sensitivity) against the false positive rate (1 - specificity) at various threshold
settings [28]. In this study, we plotted the two sample normal distributions together
with the ROC curve in Figure 4.3, using their known means and standard deviations.
Information such as the cut point, the area under the ROC curve, and a vertical
bar marking the cut point can be displayed on a single plot. WINSTEPS software [26] was
used to estimate means and standard deviations of cases with and without previous
skin cancers to determine a cut-off point of skin cancer risk with maximal sensitivity
and specificity in MS Excel (Figure 4.3). Providing the cut-off points in graphical
form makes the results clear and easily understandable for readers or clinicians to
interpret.
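As a rough illustration of this step, the sketch below derives a cut-off, sensitivity, specificity, and area under the ROC curve from the means and standard deviations of two normal distributions. It is not the MS Excel module used in this study. It assumes that "maximal sensitivity and specificity" means maximising their sum (Youden's index), which is our reading rather than something stated above, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def binormal_cutoff(mean_neg, sd_neg, mean_pos, sd_pos, step=0.01):
    """Summarise the ROC curve implied by two normal score distributions."""
    neg = NormalDist(mean_neg, sd_neg)  # people without the condition
    pos = NormalDist(mean_pos, sd_pos)  # people with the condition
    # Closed-form AUC for two independent normal distributions.
    auc = NormalDist().cdf((mean_pos - mean_neg) / math.hypot(sd_neg, sd_pos))
    low = min(mean_neg - 3 * sd_neg, mean_pos - 3 * sd_pos)
    high = max(mean_neg + 3 * sd_neg, mean_pos + 3 * sd_pos)
    best = None
    # Grid search for the cut point with the largest Youden index (assumption).
    for k in range(int(round((high - low) / step)) + 1):
        cut = low + k * step
        sensitivity = 1.0 - pos.cdf(cut)  # cases scoring above the cut
        specificity = neg.cdf(cut)        # non-cases scoring below the cut
        youden = sensitivity + specificity - 1.0
        if best is None or youden > best["youden"]:
            best = {"cut": cut, "sensitivity": sensitivity,
                    "specificity": specificity, "youden": youden}
    best["auc"] = auc
    return best

# Hypothetical inputs (logits), not the study estimates.
print(binormal_cutoff(mean_neg=0.0, sd_neg=1.0, mean_pos=1.0, sd_pos=1.0))
```

A different optimality criterion (for example, equal sensitivity and specificity) would give a different cut point from the same two distributions.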
Figure 4.2: Study simulation and CAT flowchart.
Mobile Computer Adaptive Testing Designed for Examining Personal Skin
Cancer Risk
The CAT item bank (fitting the Rasch model’s requirements regarding
unidimensionality, local independence, and monotonicity, as well as absence of DIF by
gender) was constructed, consisting of the parameters of all 31 items obtained from the
calibration using WINSTEPS [26].
To start the CAT, an initial item was selected randomly from the item bank. Using
this initial item, a provisional person measure was estimated by the expected a
posteriori (EAP) method [29] in an iterative Newton-Raphson procedure [9,30].
After each item was answered, EAP was recalculated, until the final score for the
person was determined by the maximum of the log-likelihood function before
terminating the CAT (Figure 4.2). The next item selection was based on the highest
Fisher information (i.e., item variance) of the remaining unanswered items
interacting with the provisional person measure.
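The selection and estimation loop described above can be sketched as follows. This is a simplified illustration, not the VBA module used in this study: the EAP estimate is computed on a fixed grid with a standard normal prior (an assumption; the text describes an iterative Newton-Raphson procedure), item information is taken as the variance of the item score under the PCM, and the item bank and response shown are hypothetical.

```python
import math

def pcm_probs(theta, steps):
    """PCM category probabilities for one item with the given step difficulties."""
    cumulative = [0.0]
    for d in steps:
        cumulative.append(cumulative[-1] + (theta - d))
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def item_information(theta, steps):
    """Fisher information of a PCM item: the variance of the item score at theta."""
    probs = pcm_probs(theta, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def eap_estimate(responses, bank, grid_points=81):
    """Posterior mean and SD of theta on a grid from -4 to 4 (standard normal prior)."""
    grid = [-4.0 + 8.0 * i / (grid_points - 1) for i in range(grid_points)]
    posterior = []
    for t in grid:
        weight = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior
        for item_id, category in responses.items():
            weight *= pcm_probs(t, bank[item_id])[category]
        posterior.append(weight)
    total = sum(posterior)
    mean = sum(t * w for t, w in zip(grid, posterior)) / total
    variance = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior)) / total
    return mean, math.sqrt(variance)

def next_item(theta, bank, answered):
    """Pick the unanswered item with the highest information at theta."""
    remaining = [item_id for item_id in bank if item_id not in answered]
    return max(remaining, key=lambda item_id: item_information(theta, bank[item_id]))

# Hypothetical four-item bank (step difficulties in logits) and one response.
bank = {"item_a": [-1.5, -0.5], "item_b": [0.0], "item_c": [1.0, 2.0], "item_d": [-0.2, 0.3]}
responses = {"item_b": 1}
theta, posterior_sd = eap_estimate(responses, bank)
print(round(theta, 2), round(posterior_sd, 2), next_item(theta, bank, responses))
```

In a full CAT this estimate-then-select cycle repeats after every response until a termination rule is met.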
Two termination rules were set. The first was a minimum standard error of
measurement (SEM) of 0.47 required for stopping the CAT. This SEM was set based
on the internal consistency of the calibration sample (Cronbach alpha=.78). SEi was
the person SE of the estimated measure, based on the item variances of the
items completed in the CAT, where $SEM = SD \times \sqrt{1 - \text{reliability}}$ and
$SE_i = 1/\sqrt{\sum \text{information}(i)}$, where i refers to the CAT items responded to
by a person [31], and SD is the person standard deviation of the derivation sample of
29,314 cases. The second termination rule was that each person must answer at least
10 items according to a simulation study on the data bank for attaining a minimal
average personal reliability at a desired level (e.g., 0.78) [32].
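The two termination rules can be sketched as below. The function names are hypothetical, and the person SD of 1.0 logit is an assumption made only so that the threshold can be computed; the SD of the derivation sample is not reported in this section.

```python
import math

def sem_threshold(person_sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return person_sd * math.sqrt(1.0 - reliability)

def person_se(item_informations):
    """Person SE = 1 / sqrt(sum of information over the items answered so far)."""
    return 1.0 / math.sqrt(sum(item_informations))

def should_stop(item_informations, threshold, min_items=10):
    """Stop only after the minimum item count is met and the SE is at or below the threshold."""
    if len(item_informations) < min_items:
        return False
    return person_se(item_informations) <= threshold

# Assumed person SD of 1.0 logit with the reported reliability of .78.
threshold = sem_threshold(person_sd=1.0, reliability=0.78)
print(round(threshold, 2))  # → 0.47

# Hypothetical information values for ten answered items.
print(should_stop([0.5] * 10, threshold))
```

With the assumed SD of 1.0, the formula reproduces the 0.47 threshold quoted above.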
Simulation to Compare Efficiency and Precision of Computer Adaptive Testing
and Non-Adaptive Testing
Using the item parameters generated from the derivation cohort, 1000 cases
following a normal distribution (mean logit 0, SD logit 1) were simulated [33-35]
using three Rasch models (i.e., dichotomous, 5-point RSM, and PCM) with three
respective fixed-item scenarios (i.e., 10, 20, and 30 items; see Tables 4.1-4.3).
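The data-generation step can be illustrated with the following sketch, which draws person measures from a normal distribution (mean 0, SD 1 logit) and samples a response category for each item from the PCM probabilities. A dichotomous item is simply a PCM item with a single step. The function names, the seed, and the three-item bank are hypothetical and do not correspond to the calibrated item parameters.

```python
import math
import random

def pcm_probs(theta, steps):
    """PCM category probabilities for one item with the given step difficulties."""
    cumulative = [0.0]
    for d in steps:
        cumulative.append(cumulative[-1] + (theta - d))
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_responses(n_persons, bank, seed=0):
    """Simulate person measures ~ N(0, 1) and one response per person per item."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    data = []
    for theta in thetas:
        row = []
        for steps in bank:
            probs = pcm_probs(theta, steps)
            # Draw one category with probability proportional to probs.
            row.append(rng.choices(range(len(probs)), weights=probs)[0])
        data.append(row)
    return thetas, data

# Hypothetical bank: one dichotomous item and two polytomous items.
bank = [[0.0], [-1.0, 1.0], [-2.0, -1.0, 0.0, 1.0]]
thetas, data = simulate_responses(1000, bank, seed=1)
print(len(data), data[0])
```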
Table 4.1: 10, 20, or 30 items in static NAT format.
Datasets      Dichotomous          RSM                  PCM
              Mean      SE         Mean      SE         Mean      SE
10 items      -0.007    0.829      0.03      0.414      -0.179    0.398
20 items      -0.008    0.555      0.02      0.289      -0.19     0.272
30 items      0.045     0.439      -0.039    0.235      -0.084    0.224
CAT           -0.021    0.613      0.021     0.361      -0.154    0.32
Table 4.2: Precision of CAT.
Precision     Dichotomous              RSM                      PCM
              Diff. (%)a   Corr.b      Diff. (%)a   Corr.b      Diff. (%)a   Corr.b
10 items      0.40         0.863       0.30         0.952       0.00         0.931
20 items      0.00         0.957       0.00         0.988       0.00         0.986
CAT           0.13         0.925       0.05         0.958       0.10         0.946
aDiff. (%): Different number ratio compared to the 30-item dataset.
bCorr.: Correlation coefficient of person theta to NAT.
Table 4.3: Efficiency of CAT.
Efficiency    Dichotomous                 RSM                         PCM
              CAT item length   %a        CAT item length   %a        CAT item length   %a
CAT           15.55             48.20     10                66.70     10.13             67.32
aEfficiency=1-CIL/30, where CIL is the CAT item length.
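The efficiency formula in the footnote to Table 4.3 can be applied directly to the reported CAT item lengths, as in the short sketch below (the function name is hypothetical). To one decimal place, the results match the burden reductions quoted in the Results text (48.20%, 66.70%, and 66.20%).

```python
def efficiency_percent(cat_item_length, full_length=30):
    """Efficiency = 1 - CIL / full length, expressed as a percentage."""
    return (1.0 - cat_item_length / full_length) * 100.0

# CAT item lengths as reported in Table 4.3.
for model, cil in (("Dichotomous", 15.55), ("RSM", 10), ("PCM", 10.13)):
    print(model, round(efficiency_percent(cil), 1))
```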
To allow testing of dichotomous and 5-point rating scale Rasch models, all item (or
step) difficulties were converted from the calibrated results of the PCM. The overall
difficulty for each item was designated to be the respective threshold of the
dichotomous scale. In contrast, the step difficulties of the 5-point RSM [21] ranged
from -2 to 2 logits, advancing in 1.0-logit intervals, and were added to the overall difficulty of the
respective item, as in the PCM.
We calculated the comparative efficiency and precision for CAT and NAT by
varying the number of items presented (10, 20, and 30 items) and by testing the
difference in precision and efficiency compared to answering all 31 available
items, using independent t tests to count the different number ratio (with less than 5% as
the criterion), as shown in the following formula [36]:
$$t = \frac{|\hat{\theta}_{CAT} - \hat{\theta}_{30}|}{\sqrt{SE_{CAT}^{2} + SE_{30}^{2}}}$$
where $\hat{\theta}$ is the person measure and the subscripts denote the CAT and 30-item conditions.
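The t statistic above can be sketched as a per-person comparison, with the different number ratio taken as the share of persons whose two estimates differ significantly. The critical value of 1.96 and the example measures are assumptions made for illustration; they are not taken from this study.

```python
import math

def differs(theta_a, se_a, theta_b, se_b, critical=1.96):
    """True if two person measures differ by more than the critical t value."""
    t = abs(theta_a - theta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return t > critical

def different_number_ratio(short_form, full_form, critical=1.96):
    """Share of persons whose short-form and full-form measures differ significantly.

    Each argument is a list of (theta, se) pairs, one pair per person, in the same order.
    """
    flags = [differs(ta, sa, tb, sb, critical)
             for (ta, sa), (tb, sb) in zip(short_form, full_form)]
    return sum(flags) / len(flags)

# Hypothetical measures for two persons under a short form and the 30-item form.
short_form = [(0.0, 0.5), (2.0, 0.3)]
full_form = [(0.1, 0.4), (0.5, 0.3)]
print(different_number_ratio(short_form, full_form))  # → 0.5
```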
In addition, a comparison of average person SEs achieved across all different
conditions was made to verify precision for CAT and NAT. We ran an author-
created Visual Basic for Applications module in MS Excel to conduct the simulation
study (Figure 4.2) and mobile CAT.
RESULTS
Determining a Cut-Off Point
The mean and SD of skin cancer risk for participants without skin cancer (mean
-0.79, SD 1.67) or with skin cancer (mean 2.29, SD 2.21) were calculated and used to
determine the optimal cut-off point at 0.88 logit with sensitivity at 0.79 and
specificity at 0.74. The area under the ROC curve was 0.88 (see
Figure 4.3).
Figure 4.3: Determining a cut-off point
Simulation to Compare Efficiency and Precision of Computer Adaptive Testing
and Non-Adaptive Testing
Using simulation data, we found that using more items yielded higher Cronbach
alpha scores (Figure 4.4). Dichotomous scales had the lowest Cronbach alpha and
dimension coefficient [37]. The PCM scales had the highest Cronbach alpha. The
RSM scales gained the highest dimension coefficient.
As shown in Figure 4.4, CAT gained a relatively smaller SE corresponding to item
length (i.e., compared to NAT, shorter CATs result in larger SE). At equivalent
precision, CAT reduces the response burden by 48.20%, 66.70%, and 66.20%,
respectively for dichotomous, RSM, and PCM models. See Figure 4.5.
Figure 4.4: Generated with 3 Rasch Models
Mobile Computer Adaptive Testing Evaluating Skin Cancer Risk
We developed a mobile CAT survey procedure (see QR code in Figure 4.2 and
Multimedia Appendix 1) to practically demonstrate the newly designed PCM-type
CAT app in action. The CAT process was demonstrated item by item and is shown at
the top of Figure 4.6. Person theta is the provisional ability estimated by the CAT
module. The standard error at the bottom of Figure 4.6 was generated by the
formula $1/\sqrt{\sum \text{information}(i)}$, where i refers to the CAT presented items
responded to by a person [31]. In addition, the residual at the top of Figure 4.6 was
the average of the last five differences between the pre- and post-response estimated
abilities at each CAT step. The CAT stops if the residual value is <0.05. The “corr” refers
to the correlation coefficient between the CAT estimated measures and the step
series numbers using the last 5 estimated theta values. The flatter the theta trend,
the higher the probability that the person measure has converged to a final estimate.
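The residual and “corr” diagnostics can be sketched as follows. This is an illustration rather than the code of the CAT module: it assumes that the change differences are taken as absolute values, and the function names and theta history are hypothetical.

```python
import math

def residual(theta_history, window=5):
    """Mean absolute change over the last `window` updates of the ability estimate."""
    if len(theta_history) < window + 1:
        return None  # not enough steps yet
    recent = theta_history[-(window + 1):]
    changes = [abs(after - before) for before, after in zip(recent, recent[1:])]
    return sum(changes) / window

def trend_correlation(theta_history, window=5):
    """Pearson correlation between the last `window` estimates and their step numbers."""
    if len(theta_history) < window:
        return None
    y = theta_history[-window:]
    x = list(range(1, window + 1))
    mean_x, mean_y = sum(x) / window, sum(y) / window
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if syy == 0:
        return 0.0  # perfectly flat trend
    return sxy / math.sqrt(sxx * syy)

def converged(theta_history, tolerance=0.05, window=5):
    """Stop rule: residual below the tolerance (0.05 in the text)."""
    value = residual(theta_history, window)
    return value is not None and value < tolerance

# Hypothetical sequence of provisional ability estimates.
history = [0.0, 0.5, 0.8, 0.9, 0.92, 0.93, 0.93]
print(residual(history), trend_correlation(history), converged(history))
```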
Figure 4.5: Efficiency and precision of CAT compared to using 10, 20 or 30
items in static NAT format.
DISCUSSION
Principal Findings
We used two different approaches to measure risk of skin cancer: non-adaptive
testing and computer adaptive testing. Using data from a very large cohort of more
than 43,000 people, we were able to show that our scale was able to accurately
identify people at highest risk for skin cancer. On our risk scale, we identified a very
high discriminatory accuracy of 0.88 (i.e., the proportion of area under ROC curve)
using a cut-off of 0.88 logits (the higher, the worse). Using CAT results in a smaller
SE at high efficiency (fewer items answered), and therefore without compromising
test precision, reduces response burden by 48.20%, 66.70%, and 66.20% for
dichotomous, RSM, and PCM models, respectively. A prototype mobile online CAT
for evaluating skin cancer risk has been developed and could be used to assess skin
cancer risk at considerable reduction of respondent burden.
Consistent with the literature [8,9,30,34,35], the efficiency of CAT over NAT was
supported for this skin cancer risk scale. We confirm the PCM-type CAT (i.e.,
different from others by using simpler Rasch family models) requires significantly
fewer items to measure a person’s risk than NAT but does not compromise the
precision of measurement. This mobile assessment could be used to quickly estimate
a person’s skin cancer risk and educate them about the need for skin protection on a
personal level [38-40]. We confirm that participants with a history of skin cancer had
a higher mean score of responses than those without a history of skin cancer.
Figure 4.6: A graphical CAT report shown after each response (top) and the decrease in
standard error with increasing item length during the CAT process (bottom)
Implications
Patients’ health literacy (e.g., understanding their own skin cancer risk) is
increasingly recognised as a critical factor affecting patient-physician
communication and health outcomes [14]. Adults with below basic or basic health
literacy are more likely than adults with higher health literacy to get information
about health issues from multimedia graphical representation [17], rather than the
traditional newspapers, magazines, books, brochures, or pamphlets. A brief CAT
such as the one we developed could be used to inform people quickly about their skin
cancer risk and how to improve their sun protection behaviours.
This CAT module is a practical tool that can gather responses from patients
efficiently and precisely. The tool offers diagnostics that can help practitioners assess
whether responses are distorted or abnormal. For example, outfit mean-square values
of 2.0 or greater suggest an unusual response. In instances where responses do not fit
with the model’s requirement, they can be highlighted for suspected cheating,
careless responding, lucky guessing, creative responding, or random responding [41];
otherwise, one can take follow-up action [8,34,35] if the result shows a high cancer
risk. For example, if a person’s measure/risk is 1.0 logit (i.e., log odds), their
probability of developing skin cancer approaches 0.53 ($= e^{1-0.88}/(1+e^{1-0.88})$).
Interested readers can run a test of the mobile CAT through the QR code shown in
Figure 4.2.
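The conversion from a person measure to a probability in the example above is a logistic transformation of the distance from the cut-off, which can be checked with a short sketch (the function name is hypothetical; the cut-off of 0.88 logits is the value reported in this chapter).

```python
import math

def risk_probability(measure, cutoff=0.88):
    """Logistic transform of the distance (in logits) between a measure and the cut-off."""
    return math.exp(measure - cutoff) / (1.0 + math.exp(measure - cutoff))

print(round(risk_probability(1.0), 2))  # → 0.53
```

A measure exactly at the cut-off gives a probability of 0.5 under this transformation.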
A mobile online CAT could be used for evaluating skin cancer risk and might reduce
the item length in clinical settings. The CAT can be improved in the future by
expanding the item pool allowing use among more diverse samples. It must be noted
that (1) item overall (i.e., on average) and step (threshold) difficulties of the
questionnaire must be calibrated in advance using Rasch analysis or other item
response theory models before creating an item bank, (2) pictures used for the
subject or response categories for each question should be well prepared with a Web
link that can be shown simultaneously with the item appearing in the animation
module of CAT, and (3) the model can be used for many kinds of models based on
item response theory.
Strengths and Limitations
There are two major forms of standardised assessments in clinical settings [42]: (1) a
traditional self-administered questionnaire, and (2) a rapid short-form scale [43,44].
Each has its advantages and drawbacks. Traditional pencil-and-paper questionnaires
have a large respondent burden, often because they require patients to answer
questions that do not provide additional information about their risk of disease in
order to achieve adequate precision measurement [45]. CAT can target the optimal
question for a specific person and therefore end at an appropriate number of items
more economically according to the required SE (or say, criterion of person
reliability). However, along with the advantages offered by CAT, there are some
drawbacks as well, such as impossibility of estimating the ability in case of all
extreme responses, CAT algorithms requiring serious item calibration, several items
from the item bank being overexposed, and other test items not being used at all [46].
The strengths of this study include its very large sample size of more than 40,000
participants, permitting detailed analysis of the performance of questionnaire items
and the ability to further test the performance of the items in a validation dataset. We
simulated data by varying the types of models and item length to execute the CAT.
(Interested readers who wish to see the video demonstration or use the MS Excel-
type module can contact the corresponding author).
As with all forms of Web-based technology, advances in mobile health (mHealth)
and health communication technology are rapidly emerging [47]. Use of mobile
online CAT is promising and worth considering in many fields of health assessment,
similar to its prominent role in education and staff selection testing. However,
several issues should be considered more thoroughly in further studies. First, the scale’s
Cronbach alpha (.78, from the 29,314 calibration cases), sensitivity (0.79), and
specificity (0.74) are slightly low. Second, the CAT module has a potential
limitation for people using languages other than English because the interface may
need to be modified for use in the real world. A multiple-language interface should be
developed in the future. Third, the CAT graphical representation shown in Figure 4.6
might be confusing and difficult to interpret for people unfamiliar with CAT and may
need to be improved to become a standard part of CAT routine.
CONCLUSIONS
The PCM-type CAT for skin cancer risk can reduce respondents’ burden without
compromising measurement precision and increases endorsement efficiency. The
CAT module can be used for mobile phones and easy online assessment of patients’
disease risks. This is a novel and promising way to capture information about skin
cancer risk, for example while waiting outside physician consultation offices.
Authors’ Contributions
All authors read and approved the final manuscript. ND and T-WC developed the
study concept and design. MJ and CMO analysed and interpreted the data. ND, T-
WC, and DCW drafted the manuscript, and all authors provided critical revisions for
important intellectual content. The study was supervised by T-WC.
Conflicts of Interest
None declared
Abbreviations
BCC: basal cell carcinoma
CAT: computer adaptive testing
DIF: differential item functioning
NAT: non-adaptive testing
PCM: Partial Credit Model
ROC: receiver operating characteristic
RSM: Rating Scale Model
SCC: squamous cell carcinoma
SE: standard error
SEM: standard error of measurement
REFERENCES
1. Australian Institute of Health and Welfare & Australasian Association of
Cancer Registries. Cancer in Australia: an overview. Cancer series no. 74.
Cat. no. CAN 70. Canberra: AIHW; 2012.
URL:http://www.aihw.gov.au/WorkArea/DownloadAsset.aspx?id=60129542353
[accessed 2015-11-20]
2. Narayanan DL, Saladi RN, Fox JL. Review: Ultraviolet radiation and skin
cancer. International Journal of Dermatology. 2010; 49(9):978-986.
3. World Health Organization. Global Solar UV Index: A Practical Guide.
Geneva: World Health Organization; 2002.
URL:http://www.who.int/uv/publications/en/GlobalUVI.pdf [accessed
2015-11-20]
4. Linacre JM. True-score reliability or Rasch statistical validity? Rasch
Measurement Transactions. 1996; 9(4), 455.
5. Rasch G. Probabilistic models for some intelligence and attainment tests.
Chicago: University of Chicago Press; 1960.
6. Lerdal A, Kottorp A, Gay CL, Grov EK, Lee KA. Rasch analysis of the
Beck Depression Inventory-II in stroke survivors: A cross-sectional study.
Journal of Affective Disorders. 2014; 158:48-52.
7. Forkmann T, Boecker M, Wirtz M, et al. Development and validation of
the Rasch-based depression screening (DESC) using Rasch analysis and
structural equation modelling. Journal of Behavior Therapy and
Experimental Psychiatry. 2009; 40(3):468-478.
8. Sauer S, Ziegler M, Schmitt M. Rasch analysis of a simplified Beck
Depression Inventory. Personality and Individual Differences. 2013;
54(4):530–535.
9. Chien TW, Wang WC, Huang SY, Lai WP, Chow CJ. A Web-Based
Computerized Adaptive Testing (CAT) to Assess Patient Perception in
Hospitalization. J Med Internet Res. 2011; 13(3):e61.
10. Ma SC, Chien TW, Wang HH, Li YC, Yui MS. Applying computerized
adaptive testing to the Negative Acts Questionnaire-Revised: Rasch
analysis of workplace bullying. Journal of Medical Internet Research.
2014; 16(2):e50.
11. Djaja N, Youl P, Aitken J, Janda M. Evaluation of a skin self examination
attitude scale using an item response theory model approach. Health and
Quality of Life Outcomes. 2014; 12(6).
12. Bjorner JB, Kreiner S, Ware JE, Damsgaard MT, Bech P. Differential
Item Functioning in the Danish Translation of the SF-36. Journal of
Clinical Epidemiology. 1998; 51(11):1189-1202.
13. Ruo B, Choi SW, Baker DW, Grady KL, Cella D. Development and
Validation of a Computer Adaptive Test for Measuring Dyspnea in Heart
Failure. Journal of Cardiac Failure. 2010; 16(8):659-668.
14. Williams MV, Davis T, Parker RM, Weiss BD. The role of health literacy
in patient-physician communication. Fam Med. 2002; 34(5):383-389.
15. Lee HY, Rhee TG, Kim NK. Cancer literacy as a mediator for cancer
screening behaviour in Korean adults. Health Soc Care Community. 2015;
May 14.
16. Kim K, Han HR. Potential links between health literacy and cervical
cancer screening behaviors: a systematic review. Psychooncology. 2015;
Jun 18.
17. Cutilli CC, Bennett IM. Understanding the Health Literacy of America
Results of the National Assessment of Adult Literacy. Orthop Nurs. 2009;
28(1): 27–34.
18. Olsen CM, Green AC, Neale RE, et al. Cohort profile: The QSkin Sun and
Health Study. International Journal of Epidemiology. 2012;
41(4):929-929i.
19. Masters GN. A Rasch model for partial credit scoring. Psychometrika
1982; 47(2), 149-174.
20. Lai WP, Chien TW, Lin HJ, Su SB, Chang CH. A screening tool for
dengue fever in children. The Pediatric Infectious Disease Journal. 2013;
32(4):320-324.
21. Andrich D. A rating formulation for ordered response categories.
Psychometrika. 1978; 43, 561-73.
22. Smith RM, Suh KK. Rasch fit statistics as a test of the invariance of item
parameter estimates. J Appl Meas 2003; 4(2):153-163.
23. Holland PW, Wainer H. Differential Item Functioning. Hillsdale, NJ:
Lawrence Erlbaum; 1993.
24. Tennant A, Pallant J. DIF matters: A practical approach to test if
Differential Item Functioning makes a difference. Rasch Measurement
Transactions. 2007; 20(4),1082-1084.
25. Linacre JM. RUMM2020 Item-Trait Chi-Square and Winsteps DIF Size.
Rasch Measurement Transactions. 2007; 21(1):1096.
26. Linacre JM. WINSTEPS. URL: http://www.winsteps.com/index.htm
[accessed 2014-03-27].
27. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas
2002; 3(1):85-106.
28. Stephan C, Wesseling S, Schink T, Jung K. Comparison of eight computer
programs for receiver-operating characteristic analysis. Clinical
Chemistry. Mar 2003; 49(3):433-439.
29. Bock RD, Aitkin M. Marginal maximum likelihood estimation of item
parameters: Application of an EM algorithm. Psychometrika. Dec 1981;
46(4):443-459.
30. Embretson SE, Reise SP. Item response theory for psychologists.
Lawrence Erlbaum Associates; 2000.
31. Linacre JM. Computer-Adaptive Tests (CAT), Standard Errors and
Stopping Rules. Rasch Measurement Transactions. 2006; 20(2):1062.
32. Hsueh IP, Chen JH, Wang CH, Hou WH, Hsieh CL. Development of a
computerized adaptive test for assessing activities of daily living in
outpatients with stroke. Physical Therapy. 2013; 93(5):681-693.
33. Linacre JM. How to Simulate Rasch Data. Rasch Measurement
Transactions. 2007; 21(3):1125.
34. Chien TW, Wu HM, Wang WC, Castillo RV, Chou W. Reduction in
patient burdens with graphical computerized adaptive testing on the ADL
scale: tool development and simulation. Health and Quality of Life
Outcomes. 2009; 7:39.
35. Wainer H, Dorans NJ, Flaugher R, Green BF, Mislevy RJ. Computerized
adaptive testing: A primer. Routledge; 2000.
36. Smith EV Jr. Detecting and evaluating the impact of multidimensionality
using item fit statistics and principal component analysis of residuals. J
Appl Meas. 2002; 3(2):205-231.
37. Chien T. Cronbach's alpha with the dimension coefficient to jointly assess
a scale's quality. Rasch Meas Trans. 2012; 26(3):1379
38. Robinson KJ, Gaber R, Hultgren B, et al. Skin Self-Examination
Education for Early Detection of Melanoma: A Randomized Controlled
Trial of Internet, Workbook, and In-Person Interventions. J Med Internet
Res. 2014; 16(1):e7.
39. Brady MS, Oliveria SA, Christos P, et al. Patterns of detection in patients
with cutaneous melanoma. Cancer. 2000; Jul 15;89(2):342-347.
40. Berwick M, Begg C, Fine J, Roush G, Barnhill R. Screening for
Cutaneous Melanoma by Skin Self-Examination. JNCI Journal of the
National Cancer Institute. 1996; Jan 03;88(1):17-23
41. Karabatsos G. Comparing the aberrant response detection performance of
thirty-six person-fit statistics. Applied Measurement in Education. 2003;
16(4), 277-298.
42. Eack SM, Singer JB, Greeno CG. Screening for anxiety and depression in
community mental health: the Beck Anxiety and Depression Inventories.
Community Mental Health Journal. 2008; 44(6):465-474.
43. Shear MK, Greeno C, Kang J, Ludewig D, et al. Diagnosis of
nonpsychotic patients in community clinics. The American Journal of
Psychiatry. 2000; 157(4):581-587.
44. Ramirez BM, Bostic JQ, Davies D, et al. Methods to improve diagnostic
accuracy in a community mental health setting. The American Journal of
Psychiatry. 2000; 157(10):1599-1605.
45. De Beurs PD, de Vries LMA, de Groot HM, de Keijser J, Kerkhof JFMA.
Applying Computer Adaptive Testing to Optimize Online Assessment of
Suicidal Behavior: A Simulation Study. J Med Internet Res. 2014;
16(9):e207.
46. Antal M, Imre A. Computerized adaptive testing: implementation issues.
Acta Univ. Sapientiae, Informatica, 2, 2, 2010; 168–183
47. Mitchell JS, Godoy L, Shabazz K, Horn BI. Internet and Mobile
Technology Use Among Urban African American Parents: Survey Study
of a Clinical Population. Journal of Medical Internet Research.
2014; 16(1):e9.
Chapter 5: Diagnostic Discrimination of the Skin Cancer Risk (SCR) Scale: Application of Item Response
Theory 107
Diagnostic Discrimination of the Skin Cancer Risk (SCR) scale: Application of
Item Response Theory
Ngadiman Djaja1,2, Monika Janda1,2, Catherine M. Olsen3, David C. Whiteman1,2,3
1 School of Public Health and Social Work, Institute for Health and Biomedical
Innovation, Queensland University of Technology, Brisbane, Australia.
2 National Health and Medical Research Council Centre for Research Excellence in
Sun and Health (CRESH), Brisbane, Australia
3 QIMR Berghofer Medical Research Institute, Brisbane, Australia.
Citation
Djaja, N., M. Janda, C. M. Olsen and D. C. Whiteman (2015). Diagnostic
Discrimination of the Skin Cancer Risk (SCR) scale: Application of Item Response
Theory. International Outcome Measurement Conference. Chicago.
ABSTRACT
Aims:
Queensland, Australia has the world’s highest incidence of skin cancer. Self-
administered scales are commonly used to measure risk factors such as phenotype,
sun exposure and sun protection, or overall skin cancer risk (SCR). We sought to
develop new scales for measuring skin cancer risk and calibrate them using the Partial Credit Model (PCM).
Subjects:
Prospective skin cancer risk cohort of 43,794 men and women aged 40–69 years
randomly sampled from the population of Queensland, Australia.
Analysis:
Dimensionality of the scale and calibration of items were studied using the Partial
Credit Model. Receiver operating characteristic (ROC) curve analyses were used to
assess how well the final items predicted future development of skin cancer.
Results:
Four of twenty-nine items had mean square values outside acceptable boundaries,
indicating item misfit. Item calibration found item measures between -2.800 and
+1.950 logits on the SCR scale. Diagnostic discrimination showed area under the
curve (AUC) statistics of .753 (p < .001), .530 (p < .001) and .487 (p=0.093), for the
phenotype (PH), sun exposure (SE) and sun protection (SP) subscales, respectively.
Conclusion:
The results show unidimensional structure of each SCR subscale. Item calibration
shows they are distributed along the continuum. Only the PH subscale shows good
predictive discrimination.
INTRODUCTION
Queensland (Australia) has the highest rates of melanoma and other skin cancers in
the world [1, 2]. On average, two out of every three Australians will be treated for
skin cancer at some stage during their lives, and skin cancers form approximately 80
percent of all new cancers diagnosed [2]. In the USA, 76,100 new cases of
melanoma and 9,710 deaths were estimated for 2014 [3].
To appropriately stratify patients for management and counselling, doctors are
seeking tools to accurately estimate a person’s future risk of skin cancer. Many self-
administered questionnaires [4-6] have been developed. However, the measures often
do not appear to have been developed rigorously according to current psychometric
standards such as validity, reliability, errors of measurement, norms, or score
comparability [7]. The few measures which reported their psychometric properties
used classical test theory [8-11]. The purpose of this paper was to apply a Partial
Credit Model (PCM) to establish optimal scale composition and establish predictive
validity to contribute evidence for the value of measuring skin cancer risk (SCR) by
self-report.
METHODS
Participants
Data were obtained from the QSkin Sun and Health Study, a prospective cohort study of
43,794 men and women aged 40–69 years randomly sampled from the population of
Queensland, Australia in 2011 [12]. The primary aim of the QSkin study is to
improve understanding of skin cancer risk. The QSkin study was approved by the
human research ethics committee of the QIMR Berghofer Medical Research
Institute.
Instruments
The present study used 29 items of the SCR scale for measuring three subscales: (1)
Phenotype/PH (twelve items), (2) Sun Exposure/SE (eleven items), and (3) Sun
Protection/SP (six items). A partial credit score coded the response to each item, with
scores ranging from 0 (low risk) to 5 (high risk).
Procedure and Analysis
IRT analyses
ACER ConQuest software [13] was used to calibrate models and to examine item-level
fit statistics for each of the SCR subscales separately; two-thirds of the study population was
used for item calibration. In this study the Rasch PCM [14] was used because it is
the most appropriate model for items with more than two ordered response categories.
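For reference, the Rasch PCM of Masters [14] is commonly written as below. The notation is a conventional choice assumed here rather than taken from this chapter: \theta is the person location (skin cancer risk in logits), \delta_{ij} is the j-th step parameter of item i, and m_i is the highest score category of item i.

```latex
P(X_i = k \mid \theta) =
  \frac{\exp\left(\sum_{j=0}^{k} (\theta - \delta_{ij})\right)}
       {\sum_{h=0}^{m_i} \exp\left(\sum_{j=0}^{h} (\theta - \delta_{ij})\right)},
  \qquad k = 0, 1, \dots, m_i,
  \qquad \text{where } \sum_{j=0}^{0} (\theta - \delta_{ij}) \equiv 0 .
```

Because each item has its own set of step parameters, items may differ both in the number of categories and in the spacing between them.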
Diagnostic discrimination
To investigate the diagnostic discrimination of the SCR scale, we used receiver
operating characteristic (ROC) curve analysis. Item parameters
obtained from the calibration sample served as anchor parameters to estimate the SCR of
the validation sample (one-third of the study population: 13,178 persons, of whom 11,528
had no keratinocyte cancer (KC) and 1,650 (12.5%) reported KC). The resulting risk scores
were then assessed for how well they predicted the development of a new skin cancer.
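The area under the ROC curve can be read as the probability that a randomly chosen person with KC has a higher risk score than a randomly chosen person without KC. The Python sketch below implements that pairwise definition. The function name `roc_auc` and the six scores are hypothetical illustrations, not QSkin data, and the chapter does not state which software produced the ROC analysis.

```python
def roc_auc(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(non_cases))


# Hypothetical risk scores in logits; 1 = KC reported, 0 = no KC.
scores = [-0.60, -0.45, -0.40, -0.35, -0.30, -0.20]
labels = [0, 0, 1, 0, 1, 1]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The double loop is quadratic in sample size, which is fine for a sketch; a rank-based formula would be the usual choice for a validation sample of this size.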
RESULTS
Item fit
Unweighted and weighted mean square (MNSQ) statistics were calculated to
examine item fit. Adams and Khoo [15], cited in Wilson [16], recommended that these values
should be within the tolerance bounds of 0.7-1.3. Table 5.1 shows the fit statistics for
all items. Three items in the Sun Exposure (SE) subscale (SE1, SE2 and SE3) and one
item in the Sun Protection (SP) subscale (SP6) showed misfit and were removed from
further analyses.
Table 5.1: Item parameter estimates and fit statistics of the skin cancer risk (SCR) scale
Item | Description | Estimate | Error | Unweighted MNSQ | Unweighted T | Weighted MNSQ | Weighted T
Phenotype (PH) scale
PH1 | Sex | -0.043 | 0.015 | 1.01 | 0.80 | 1.01 | 4.00
PH2 | Skin colour | -2.682 | 0.027 | 0.94 | -5.50 | 0.96 | -3.80
PH3 | Skin burn ability | -0.37 | 0.011 | 0.96 | -4.20 | 0.96 | -5.00
PH4 | Skin tan | -0.652 | 0.011 | 1.13 | 11.80 | 1.11 | 10.70
PH5 | Eye colour | -0.311 | 0.005 | 1.03 | 2.70 | 1.02 | 3.80
PH6 | Hair colour | 0.294 | 0.008 | 1.09 | 7.80 | 1.07 | 11.90
PH7 | Freckles | 0.49 | 0.01 | 0.96 | -4.10 | 0.97 | -3.50
PH8 | Moles | 0.48 | 0.014 | 1.01 | 0.80 | 1.01 | 1.10
PH9 | Sunbeds use | 0.977 | 0.022 | 1.10 | 9.40 | 1.02 | 0.60
PH10 | Number of skin cancer cut off | 0.76 | 0.014 | 0.90 | -9.30 | 0.94 | -5.30
PH11 | Number of skin cancer frozen | 0.291 | 0.006 | 0.91 | -8.90 | 0.94 | -5.20
PH12 | Close blood have melanoma | 0.767* | | 0.99 | -1.30 | 0.99 | -1.20
Sun exposure (SE) scale
SE1 | Sunburn frequency when child | 1.06 | 0.014 | 1.54 | 43.60 | 1.45a | 33.30
SE2 | Sunburn frequency when teenager | 0.442 | 0.012 | 1.41 | 33.80 | 1.37a | 30.70
SE3 | Sunburn frequency when adult | 1.094 | 0.015 | 1.40 | 33.00 | 1.31a | 22.90
SE4 | Outdoor duration – weekday – past year | 0.585 | 0.009 | 0.99 | -1.00 | 0.98 | -2.00
SE5 | Outdoor duration – weekday – age 10-19 | -0.584 | 0.011 | 0.88 | -11.40 | 0.89 | -12.00
SE6 | Outdoor duration – weekday – age 20-29 | -0.036 | 0.009 | 0.81 | -19.10 | 0.83 | -19.70
SE7 | Outdoor duration – weekday – age 30-39 | 0.248 | 0.009 | 0.83 | -16.50 | 0.86 | -16.20
SE8 | Outdoor duration – weekend – past year | -0.086 | 0.009 | 0.93 | -6.70 | 0.93 | -7.90
SE9 | Outdoor duration – weekend – age 10-19 | -1.272 | 0.014 | 0.81 | -18.90 | 0.84 | -15.10
SE10 | Outdoor duration – weekend – age 20-29 | -0.927 | 0.012 | 0.69 | -32.30 | 0.72 | -31.20
SE11 | Outdoor duration – weekend – age 30-39 | -0.526* | | 0.75 | -25.80 | 0.76 | -27.10
Sun Protection (SP) scale
SP1 | SPF to face | 0.487 | 0.017 | 0.83 | -17.40 | 0.88 | -16.70
SP2 | SPF to hands | -1.273 | 0.021 | 0.63 | -40.50 | 0.84 | -11.10
SP3 | SPF to other body parts | -2.227 | 0.027 | 0.59 | -45.10 | 0.90 | -4.50
SP4 | Use of SPF | 0.576 | 0.017 | 0.79 | -21.00 | 0.85 | -20.90
SP5 | Sunscreen usage frequency last year | 0.607 | 0.013 | 1.01 | 0.90 | 1.02 | 1.70
SP6 | Hat usage frequency last year | 1.830* | | 1.45 | 37.10 | 1.39a | 34.20
a Items beyond acceptable boundaries.
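The tolerance check behind the flagged items can be reproduced in a few lines. The sketch below applies the 0.7-1.3 bounds to the weighted MNSQ values copied from Table 5.1. That the bounds were applied to the weighted rather than the unweighted statistic is an assumption inferred from where the footnote marker appears in the table, and the function name `flag_misfit` is hypothetical.

```python
# Weighted MNSQ values copied from Table 5.1.
weighted_mnsq = {
    "PH1": 1.01, "PH2": 0.96, "PH3": 0.96, "PH4": 1.11, "PH5": 1.02,
    "PH6": 1.07, "PH7": 0.97, "PH8": 1.01, "PH9": 1.02, "PH10": 0.94,
    "PH11": 0.94, "PH12": 0.99,
    "SE1": 1.45, "SE2": 1.37, "SE3": 1.31, "SE4": 0.98, "SE5": 0.89,
    "SE6": 0.83, "SE7": 0.86, "SE8": 0.93, "SE9": 0.84, "SE10": 0.72,
    "SE11": 0.76,
    "SP1": 0.88, "SP2": 0.84, "SP3": 0.90, "SP4": 0.85, "SP5": 1.02,
    "SP6": 1.39,
}


def flag_misfit(mnsq_by_item, lower=0.7, upper=1.3):
    """Return the items whose MNSQ lies outside [lower, upper]."""
    return [item for item, value in mnsq_by_item.items()
            if value < lower or value > upper]


misfit = flag_misfit(weighted_mnsq)
print(misfit)
```

Under that assumption the check returns the same four items that the text reports as removed.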
The item-person map in Figure 5.1 shows the calibration of the PH subscale items
and the position of persons on the SCR continuum. The common logit scale is
represented on the vertical line in the centre of the map. An “X” in the person
column represents the position of persons on the skin cancer risk continuum; in this
large dataset, each “X” represents a group of 103.6 persons.
Figure 5.1: Items person map of PH subscale.
Figure 5.2: Items person map of SE subscale.
Figure 5.3: Items person map of SP subscale.
Person measures on the common logit scale make it possible to identify the most
likely response to each item of the SCR scale. For example, in Figure 5.4,
Person P has a moderate PH score (PH = 0.5). For item PH3 (PH3 = -0.37),
the most likely response of this person is option 3 (burn
moderately), and for item PH10 (PH10 = 0.76) the most likely response is option 3
(2-10 skin cancers).
Figure 5.4: Example of most probable response for a person with skin cancer risk in
PH scale of 0.5 logits.
Category Probability Curves
The Category Probability Curves (CPC) represent the ability threshold parameters
of the item steps (m). These curves provide information on the functioning of the
response alternatives. The intersections between the curves (thresholds) define the limits
of the “most probable response regions” on the scale continuum. As shown in Figure
5.5 (CPC of item PH3), every response category is the most probable in some
section of the continuum, which indicates that the categories are functioning properly. For
item PH3, a score of 0 is the most probable response for persons located between -∞ and
-1.99 logits, a score of 1 is the most probable for persons located between -1.99
and -0.15 logits, and so on.
Figure 5.5: Category probability curves of item PH3
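The most probable response regions described above can be checked numerically. The sketch below computes PCM category probabilities for item PH3, taking the three threshold values shown on Figure 5.5 (-1.99, -0.15 and 1.04) as the item's step parameters. Treating those labels as PCM step parameters is an assumption, and `pcm_probabilities` is a hypothetical helper, not ConQuest output.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the Rasch PCM.

    theta: person location in logits.
    thresholds: step parameters, one per step between adjacent categories.
    """
    # Running sums of (theta - delta_j); category 0 has sum 0 by convention.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)  # subtracted before exponentiating for stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


ph3_thresholds = [-1.99, -0.15, 1.04]   # read from the Figure 5.5 labels
probs = pcm_probabilities(0.5, ph3_thresholds)
most_probable = max(range(len(probs)), key=probs.__getitem__)
print(most_probable, [round(p, 3) for p in probs])
```

For a person at 0.5 logits the highest probability falls on score 2, that is, the third response option, which agrees with the "burn moderately" example given for Person P.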
Validation sample characteristics
Table 5.2: Skin cancer risk score for each subscale in the validation sample
Subscale | No KC, M (SD) | KC, M (SD) | p value
Phenotype | -.429 (.149) | -.303 (.158) | p < 0.001
Sun Exposure | .350 (1.514) | .500 (1.489) | p < 0.001
Sun Protection | .253 (2.161) | .157 (2.172) | p = 0.095
[Figure 5.5 labels: x-axis, skin cancer risk; y-axis, probability; response curves R1 to R4; threshold 1 = -1.99, threshold 2 = -0.15, threshold 3 = 1.04]
Table 5.2 shows that people with KC had higher scores than people with no KC on the PH
(p < 0.001) and SE (p < 0.001) subscales. Their SP scores were lower, but this difference
was not statistically significant (p = 0.095).
Diagnostic discrimination
The optimal cut-off scores for all three subscales were assessed using ROC analysis.
Prediction of KC was used as the outcome variable. The areas under the curve (AUC)
for the PH, SE and SP subscales were .753 (p < .001), .530 (p < .001) and .487 (p = 0.093),
respectively, indicating that the PH subscale differentiated well between people who will and
will not develop a new KC, whereas the SE and SP subscales had little or no predictive ability.
DISCUSSION
This study showed that an IRT-calibrated PH subscale has good ability to predict
the development of future skin cancers, while less can be gained from the SP or SE
subscales.
This study has some limitations. First, the study population consisted of people from the
location with the highest incidence of skin cancer in the world; thus the calibrated
instrument may not suit other populations. Secondly, the SP subscale only
measures use of sunscreen, and not other ways of protecting oneself from the sun. In
future studies, we aim to add more items, such as protective clothing and related
protective behaviours.
Figure 5.6: ROC curve
REFERENCES
1. Queensland Cancer Registry, Cancer in Queensland: Incidence and
Mortality, 1982 to 2007. Cancer Council Queensland: Brisbane, Australia.
2010.
2. Australian Institute of Health and Welfare, Australian Cancer Incidence and
Mortality (ACIM) Books. Canberra: Australian Institute of Health and
Welfare. 2012.
3. Siegel R, DeSantis C, Jemal A. Colorectal cancer statistics, 2014. CA: a
cancer journal for clinicians. 2014; Mar 1;64(2):104-17.
4. Mackie R, Freudenberger T, Aitchison TC. Personal risk-factor chart for
cutaneous melanoma. The Lancet. 1989; Aug 26;334(8661):487-90.
5. Tacke J, Dietrich J, Steinebrunner B, Reifferscheid A. Assessment of a new
questionnaire for self-reported sun sensitivity in an occupational skin cancer
screening program. BMC dermatology. 2008; Oct 24;8(1):1.
6. Weinstock MA. Assessment of sun sensitivity by questionnaire: validity of
items and formulation of a prediction rule. Journal of clinical epidemiology.
1992; May 31;45(5):547-52.
7. American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education, Standards
for educational and psychological testing. 1999; Amer Educational Research
Assn.
8. de Troya-Martin M, Blázquez-Sánchez N, Rivas-Ruiz F, et al. Validation of a
Spanish questionnaire to evaluate habits, attitudes, and understanding of
exposure to sunlight:“the beach questionnaire”. Actas Dermo-Sifiliográficas
(English Edition). 2009; Dec 31;100(7):586-95.
9. Tripp MK, Carvajal SC, McCormick LK, et al. Validity and reliability of the
parental sun protection scales. Health Education Research. 2003; Feb 1;
18(1):58-73.
10. Glanz K, McCarty F, Nehl EJ, et al. Validity of self-reported sunscreen use
by parents, children, and lifeguards. American journal of preventive medicine.
2009; Jan 31; 36(1):63-9.
11. Hedges T, Scriven A. Young park users’ attitudes and behaviour to sun
protection. Global health promotion. 2010; Dec 1; 17(4):24-31.
12. Olsen CM, et al. Cohort profile: the QSkin sun and health study. International
journal of epidemiology. 2012; Aug 1; 41(4):929-i.
13. Adams R, Wu M, Wilson M. ACER ConQuest 3.0.1. ACER: Melbourne,
Australia. 2013.
14. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982
Jun 1; 47(2):149-74.
15. Adams RJ, Khoo ST. Quest: the interactive test analysis system. Melbourne:
Australian Council for Educational Research. 1996.
16. Conrad KJ, Wilson M. Constructing measures: An item response modeling
approach. Erlbaum Associates Mahwah, NJ. Evaluation and Program
Planning. 2005; Nov 30; 28(4):433-4.
Chapter 6: Development and Psychometric Evaluation of Item Banks for the Assessment of Skin Cancer Risk
Using Item Response Theory 123
Development and Psychometric Evaluation of Item Banks for the Assessment of
Skin Cancer Risk Using Item Response Theory
Ngadiman Djaja1,3,4*, David C. Whiteman1,4,6, Philippa Youl1,4,5, Katherine M.
White2,3, Michael Kimlin1,4,7, Monika Janda1,3,4
1 School of Public Health and Social Work, Faculty of Health, Queensland
University of Technology, Brisbane, Australia
2 School of Psychology and Counselling, Faculty of Health, Queensland University
of Technology, Brisbane, Australia
3 Institute of Health and Biomedical Innovation, Queensland University of
Technology, Brisbane, Australia
4 National Health and Medical Research Council Centre for Research Excellence in
Sun and Health, Institute of Health and Biomedical Innovation, Queensland
University of Technology, Brisbane, Australia
5 Cancer Council Queensland, Brisbane, Australia
6 QIMR Berghofer Medical Research Institute, Brisbane, Australia
7 Health Research Institute (HRI), The University of the Sunshine Coast, Australia
ABSTRACT
Objective: Accurate assessment of skin cancer risk using self-reported questionnaire
items is important for epidemiological studies and appropriate targeting of
interventions. We assessed the psychometric properties of previously used items for
assessing skin cancer risk, and then evaluated those with good properties for their
reliability and stability using item response theory.
Methods: A cohort of 1,177 participants aged 18-75 years living in Queensland,
Australia completed an online questionnaire between Winter and Spring 2015. The
questionnaire contained 51 items from a previously developed skin cancer risk item
bank. We assessed whether items measured risk on a unidimensional scale, and
whether item response categories represented increasing levels of risk. We examined
internal consistency using Cronbach’s alpha. To measure scale stability over time,
201 of these participants completed the questionnaire again within eight to ten
weeks. We measured the discriminative accuracy of the tool by calculating the area
under the receiver operating characteristic curve (AUC) for correctly identifying people with
previous self-reported melanoma or keratinocyte skin cancers.
Results: Three of 19 questions from the phenotype scale were removed due to misfit
with the model. All items from the sun exposure and sun protection subscales
showed good fit. Internal consistency was high (Cronbach's alpha range: 0.73-0.89), as
was stability over time (retest coefficient: 0.74-0.95). Diagnostic discriminatory
accuracy was high for self-reported history of melanoma for the phenotype subscale
(AUC 0.72, 95% CI 0.65-0.78); moderate for the sun exposure scale (AUC 0.62,
95% CI 0.54-0.69); and low for the sun protection scale (AUC 0.36, 95% CI 0.29-
0.43). Similar discriminative ability scores were observed for self-reported non-
melanoma skin cancer (AUC phenotype scale 0.82, 95% CI 0.78-0.86; AUC sun
exposure scale 0.61, 95% CI 0.56-0.66; AUC sun protection scale 0.36, 95% CI
0.31-0.41).
Conclusions: The newly derived risk assessment scale for skin cancer performs well
and could be used in classical paper-and-pencil or computer adaptive assessment. Due to
its brevity and precision it may be an attractive tool for clinicians or researchers
seeking to measure personal skin cancer risk.
Keywords
Skin cancer, skin cancer risk, sun protection, sun exposure, Item Response Theory,
Partial Credit model, Rasch model, psychometrics
INTRODUCTION
Questionnaires (also commonly called surveys or scales) are frequently used in
health research to gather information on health-related behaviours, especially those
that are not easily directly observed. They are crucial for epidemiological studies,
and are also commonly used to evaluate the outcomes of health interventions. Before
a questionnaire can be used, however, its measurement properties (objectivity,
dimensionality, reliability, validity, non-differential item function, sensitivity to
change, and discrimination between known groups) must be demonstrated [1]. The
International Epidemiological Association European questionnaire group highlighted
the need to improve the quality of questionnaires administered especially for the
purpose of assessing risk factors, given that they provide important information
essential for health policy planning [2, 3].
In skin cancer prevention research, besides questionnaires, there are several other
commonly used data collection methods. These include sun diaries to record the time
outdoors or clothing worn [4], skin swabbing to assess whether sunscreen has been
applied [5], direct observation of sun exposure or sun protection behaviours [6], or
ultraviolet radiation dosimeters [5, 7]. Compared to questionnaires, these methods are
often more burdensome for the participants and researchers and are usually much
more costly. While questionnaires are convenient and cost-effective, to
date no standardised or commonly agreed questionnaire for measuring
behaviours related to skin cancer risk, sun exposure and sun protection is available.
Instead, many questionnaires used in intervention studies, population studies
or randomised controlled trials [8] have been developed de novo. They differ in content
and design, and many have not been formally validated [9]. Some
previous efforts have aimed to provide a standard for measuring skin cancer related
behaviours. For example, Glanz [10] proposed standardised core survey items for the
measurement of sun exposure and sun protection practices in epidemiologic
research. Similarly, the National Human Genome Research Institute [11]
recommended a questionnaire to assess the main melanoma risk factors, such as
family history, number of nevi, sun exposure, freckling tendency and skin type [12-14].
A recent systematic review analysed twenty-five risk models for the prediction
of melanoma, with 144 possible risk factors identified [15]; only four validation
articles were included in the synthesis. While all models demonstrated good
discrimination, most did not assess psychometric properties. Therefore, the objective
of the current study was to develop a reliable and valid questionnaire to measure skin
cancer risk using Item Response Theory (IRT) models. IRT is a modern
psychometric approach that has rarely been applied to skin cancer risk questionnaire
development, although it is widely used in other fields such as educational [16,
17] and patient-reported outcome assessment [18, 19]. IRT models have several
advantages over classical test theory: they allow more precise estimates of the
outcome of interest; confirm the unidimensionality of the outcome measure; show
whether the response scale is used consistently by participants; make it possible to
equate and link different scales that measure the same underlying construct [20]; provide
information about each individual item’s reliability, including whether an item is
biased toward a certain group (called differential item functioning) [21]; and allow
computer adaptive, tailored questionnaire presentation that facilitates
economical assessment [22-24]. Thus, we used data from previous studies to select
the best performing questionnaire items to derive a new risk assessment scale
(SunAus scale), and then tested the psychometric properties of this scale in a newly
recruited sample of participants.
METHODS
Sample
This study was approved by the institutional ethics committee at Queensland
University of Technology (1200000553) and was undertaken in compliance with the
ethical guidelines of the National Health and Medical Research Council (NHMRC).
An online sample of 1,177 participants aged 18 years and older living in Queensland,
Australia was recruited during the southern hemisphere Winter (June-August) 2015.
During these months the ultraviolet radiation index commonly ranges between 4
(moderate) and 10 (very high) depending on latitude. Recruitment was conducted
through traditional and social media including email lists and a study Facebook page
(https://www.facebook.com/SunSurveyAustralia), Twitter, local radio and
newspaper, as well as an online research panel. Inclusion criteria were age ≥18 years
and currently living in Queensland. Exclusion criteria were inability to access the
Internet and difficulty reading or understanding the English language. At the
end of their baseline online survey, participants recruited via university, social media
and local channels were invited to complete the survey a second time within eight to ten
weeks to examine test-retest reliability and stability over time.
Figure 6.1: Recruitment of participants.
Development of SunAus Scale
We started the development of the new scale by assessing the quality of items from
four previous studies conducted in Queensland which had captured information on
skin cancer risk factors using different items and instruments. The Melanoma
Screening Trial (MST) was a randomised trial of population screening for melanoma
in Queensland with a total of 3,110 participants (1,559 men and 1,551 women) [25].
The second study was the Queensland Cancer Risk Study (QCRS), a population-based
study of 9,419 Queensland residents aged 20-75 years that aimed to describe
the population prevalence of key cancer risk behaviours in Queensland [26]. The
third study, QSkin Sun and Health (QSkin) [27], was a population-based
cohort study of 43,794 men and women aged 40-69 years randomly sampled from
the population of Queensland, Australia, in 2011. Lastly, we used data from the
AusD Study (AusD), a population-based cross-sectional study of 1,002 participants [28].
AusD was designed to assess vitamin D status and determinants across a
range of latitudes and seasons. It also aimed to identify the association between
participants’ attitudes about vitamin D and their self-reported changes to sun-
protection or exposure behaviours.
[Figure 6.1 flowchart: baseline SunAus participants (N = 1,177), recruited via university, social media and local channel (n = 677) or online panel (n = 500); test-retest within 8 weeks (n = 201)]
From each of the study questionnaires, we grouped items into the following three
subscales: phenotype, sun exposure behaviours, and sun protection behaviours. All
items were assessed for their item response theory properties, including item fit and
category disordering (non-sequential categories), and only items with good
psychometric properties were retained for further development. The eye colour
question is an example of category disordering. The response categories for this item
differed between studies: the QSkin study offered the options blue, grey, green, hazel
and brown, whereas the MST and QCRS offered blue or grey, green or hazel, and brown or
black. An examination of the category thresholds showed that the analyst-assigned
category order did not accord with the underlying latent construct and that the average
measures for the categories were out of sequence [29]. The item showed no
category disordering after the responses were combined into three categories: brown or
black, green or hazel, and blue or grey. More detailed information about disordered
thresholds can be found in Andrich [30] and Adams [31]. After item fit and category
disordering had been assessed, the proposed items were presented for content validation
to four content experts (two psychology researchers and two epidemiologists) with
extensive experience in skin cancer research [32]. The purpose of the discussion was to
eliminate redundant items and to add new items not covered in previous studies. In total,
240 items were reviewed: 30 were selected, 15 were changed and 8 were added.
Most changes either made items more specific (e.g., adding a
‘volunteer/unpaid’ response category to questions about main
occupation) or collapsed answer categories (e.g., in the eye colour question, brown,
black, green, hazel, blue and grey became brown or black, green or hazel, and blue or
grey). Overall, the new scale included 53 items measuring a broad range of possible
determinants of skin cancer. Item content is summarised in Table 6.1 and a full
description of the items is available in the appendix. These 53 items, as well as
demographic questions (year of birth, sex, education, employment status,
language usually spoken at home, and ethnicity), were administered online.
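The eye colour example can be illustrated with a small check of whether average person measures rise with the category code before and after collapsing. The sketch below is only an illustration: the responses and person measures are made-up values, the scoring direction (brown or black lowest, blue or grey highest) and the six-code uncollapsed scheme are assumptions, and the check covers average measures only, not the threshold estimates that the IRT software reports.

```python
from statistics import mean

# Assumed scoring direction: 0 = lowest risk, 2 = highest risk.
COLLAPSED = {"brown": 0, "black": 0, "green": 1, "hazel": 1, "blue": 2, "grey": 2}
# Hypothetical uncollapsed coding with one code per colour.
UNCOLLAPSED = {"brown": 0, "black": 1, "green": 2, "hazel": 3, "blue": 4, "grey": 5}


def average_measures(responses, measures, coding):
    """Mean person measure (logits) per category code, in code order."""
    groups = {}
    for colour, measure in zip(responses, measures):
        groups.setdefault(coding[colour], []).append(measure)
    return [mean(groups[code]) for code in sorted(groups)]


def is_ordered(averages):
    """True when every category has a higher average than the one below it."""
    return all(a < b for a, b in zip(averages, averages[1:]))


# Made-up responses and person measures, for illustration only.
responses = ["brown", "black", "green", "hazel", "blue", "grey", "brown", "blue"]
measures = [-1.2, -0.8, -0.1, 0.2, 0.9, 0.6, -0.5, 1.1]

before = average_measures(responses, measures, UNCOLLAPSED)
after = average_measures(responses, measures, COLLAPSED)
print(is_ordered(before), is_ordered(after))
```

In this toy data the separate blue and grey codes are out of sequence, and merging them restores a monotonic order, which mirrors the reasoning given for collapsing the item to three categories.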
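Table 6.1: Overview of items measured on the SunAus Scale
Domain | Item | Item ID
Phenotype | Skin colour | 1
Phenotype | Skin type | 2, 3, 4
Phenotype | Hair colour | 5
Phenotype | Eye colour | 6
Phenotype | Moles at 18 yrs | 7
Phenotype | Freckles at 18 yrs | 8
Phenotype | Melanoma status | 12
Phenotype | NMSC status | 13
Phenotype | Number of skin cancer that has been cut-off | 14
Phenotype | Number of sunspots that has been frozen | 15
Phenotype | Close blood relatives with skin cancer | 16
Phenotype | Moles larger than 2mm | 31
Phenotype | Moles larger than 5mm | 32
Phenotype | Ancestor: Father’s father | 33a
Phenotype | Ancestor: Father’s mother | 33b
Phenotype | Ancestor: Mother’s father | 33c
Phenotype | Ancestor: Mother’s mother | 33d
Sun Exposure | Use of solarium | 9
Sun Exposure | Attempt to get a suntan | 11
Sun Exposure | Lifetime sunburn | 17
Sun Exposure | Last 12 months sunburn | 18
Sun Exposure | Lifetime sunburn – child | 19a
Sun Exposure | Lifetime sunburn – teenager | 19b
Sun Exposure | Lifetime sunburn – adult | 19c
Sun Exposure | Main occupation – lifetime | 20
Sun Exposure | Main occupation – current | 21
Sun Exposure | Weekdays sun exposure | 22
Sun Exposure | Weekends sun exposure | 23
Sun Exposure | Weekdays sun exposure – 5 to 12 years | 24a
Sun Exposure | Weekdays sun exposure – 13 to 19 years | 24b
Sun Exposure | Weekdays sun exposure – 20 to 39 years | 24c
Sun Exposure | Weekdays sun exposure – 40 to 65 years | 24d
Sun Exposure | Weekdays sun exposure – After 65 years | 24e
Sun Exposure | Weekends sun exposure – 5 to 12 years | 25a
Sun Exposure | Weekends sun exposure – 13 to 19 years | 25b
Sun Exposure | Weekends sun exposure – 20 to 39 years | 25c
Sun Exposure | Weekends sun exposure – 40 to 65 years | 25d
Sun Exposure | Weekends sun exposure – After 65 years | 25e
Sun Protection | Sunscreen – face | 26
Sun Protection | Sunscreen – other body parts | 27
Sun Protection | Frequency of sunscreen use | 28
Sun Protection | Sunscreen use during last weekend | 29
Sun Protection | Wear a broad-brimmed hat | 30a
Sun Protection | Wear a cap | 30b
Sun Protection | Wear any other head covering | 30c
Sun Protection | Wear a shirt with long sleeves | 30d
Sun Protection | Wear long trousers | 30e
Sun Protection | Wear sunglasses | 30f
Sun Protection | Stay in the shade | 30g
Sun Protection | Use an umbrella | 30h
Sun Protection | Limit time in the sun during peak UV hours | 30i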
Data Analysis
To derive the optimal items for testing, data from all four existing studies were
calibrated using the IRT model and misfitting items were removed; expert
content review then determined the final items. Figure 6.2 shows the data analysis
process in this study.
Figure 6.2: Steps in data analysis
Item response theory
Item response theory comprises a series of probabilistic models that describe the
relationship between a non-observable behaviour (also called a latent trait) and a
person’s response to each questionnaire question (also called an item, hence the name
Item Response Theory); see [33-35] for further information regarding item response
theory frameworks and estimation methods. The locations (thresholds) along the
continuum of latent trait values were estimated for each item. A commonly used
item response model, the Rasch Partial Credit Model (PCM) [36], was used for
item calibration in our study because it allows different thresholds for each category
across the items on the scale. Rasch-based item response theory software,
ACER ConQuest [37], was used to evaluate the psychometric properties of our new
scale. The IRT analysis steps were undertaken twice: the first analysis
assessed the characteristics of existing items from the four studies, and the
second analysis assessed the final items suggested by subject matter experts.
[Figure 6.2 flowchart: MST, QCRS, QSkin and AusD data -> IRT analysis of existing items -> items grouped into 3 domains -> focus group discussion and telephone interview -> final items: 53 items -> IRT analysis of new items -> final items]
The following data quality parameters were assessed:
Assessment of unidimensionality
We assessed unidimensionality to identify the presence of additional explanatory
dimensions in the data, using fit statistics and principal components analysis
(PCA) of the Rasch residuals [34]. Rasch residuals are the differences between the
observed responses and the responses predicted by the Rasch model; mean square fit
statistics summarise them for each person and item. A large number of misfitting items
is an indication of a multidimensional construct (a single theoretical concept that is
measured by several related constructs). In the PCA of the Rasch residuals, the scale
was considered unidimensional if the first latent dimension
explained at least 50% of the total variance and the unexplained variance in the first
contrast (factor) was less than 10% [38].
Assessment of item fit
To determine item fit, infit and outfit mean square (MNSQ) statistics were
calculated. These fit statistics specify how well each item fits the Rasch partial credit
model and therefore help to identify problematic items. Although there are no
commonly agreed criteria for infit and outfit mean square values, Wilson [33]
suggested that values between 0.75 and 1.33 indicate good fit.
Item mean square statistics less than 1.0 (also called overfit) show that the Rasch
model predicts the data too well, which causes summary statistics (e.g., reliability) to be
inflated. On the other hand, mean square statistics greater than 1.0 (also
called underfit) show unpredictability and unmodelled noise, indicating that there is
another source of variance in the data.
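The two statistics can be sketched from their textbook definitions: outfit is the unweighted mean of squared standardised residuals, and infit is the information-weighted version (sum of squared residuals divided by the sum of model variances). The Python sketch below assumes person locations and item step parameters are already known; the helper names and the toy data are hypothetical, and ACER ConQuest's own computation differs in detail.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the Rasch PCM."""
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(responses, thetas, thresholds):
    """Return (outfit MNSQ, infit MNSQ) for a single item."""
    squared_residuals = 0.0
    model_variances = 0.0
    standardised = 0.0
    for x, theta in zip(responses, thetas):
        probs = pcm_probabilities(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        residual_sq = (x - expected) ** 2
        squared_residuals += residual_sq
        model_variances += variance
        standardised += residual_sq / variance
    outfit = standardised / len(responses)
    infit = squared_residuals / model_variances
    return outfit, infit


def within_bounds(mnsq, lower=0.75, upper=1.33):
    """Apply the 0.75-1.33 range suggested by Wilson [33]."""
    return lower <= mnsq <= upper


# Toy dichotomous item: two persons far from the item give unexpected answers.
outfit, infit = item_fit(responses=[1, 0], thetas=[-2.0, 2.0], thresholds=[0.0])
print(round(outfit, 2), round(infit, 2), within_bounds(outfit))
```

In the toy case both responses contradict the model, so the outfit value lands well above 1.33, the underfit situation described above.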
Assessment of item difficulty, content coverage and item targeting
Item-person maps (also called Wright maps) were used to show item difficulty,
content coverage and item targeting for the new scale [33]. Wright maps consist of
two vertical histograms (see Figure 2). The left-hand histogram shows the
distribution of the measured latent trait (e.g. skin cancer risk) of the participants, from most
at risk (x’s located at the top left of the map) to least at risk (x’s located at the bottom
left of the map). The right-hand histogram shows the distribution of the
items from the most difficult (high risk items/answer categories) at the top to the
least difficult (low risk items) at the bottom. The Wright map can also be used to
assess content coverage and item targeting of a scale by visually inspecting whether
items are spread evenly along the latent trait.
Assessment of validity and reliability
For concurrent validation, correlations between scores from the newly developed
questionnaire and two previously published questionnaires were calculated. The first
previously published questionnaire measured sun exposure and sun protection habits
(the sun protection habits scale) [10] and the second measured
phenotypical risk of skin cancer (the 7-item skin cancer protocol from PhenX
Measures) [11]. Both questionnaires were administered online. Although
previous research [5, 10, 39-42] has addressed validation, no comprehensive study has
assessed the internal consistency, test-retest reliability, concurrent validity and
criterion validity of either of these previously published questionnaires. We expected
that the core measures of sun exposure and sun protection habits would correlate at
least moderately with our new sun exposure and sun protection subscales. Likewise,
the PhenX measures should correlate highly with our new phenotype subscale.
To assess reliability, we evaluated internal consistency using
Cronbach's alpha coefficient and person separation reliability. To test questionnaire
stability over time, we used data from 201 participants who completed the
questionnaire a second time at eight to ten weeks. The coefficient of test stability was
calculated using Pearson’s product moment correlations between baseline and retest
logits (the mathematical unit of Rasch measurement; person measures in logits are
termed locations rather than scores).
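Both coefficients named above have short closed forms. The sketch below computes Cronbach's alpha from a persons-by-items score matrix and Pearson's r between baseline and retest locations. All numbers are hypothetical illustrations rather than SunAus data, and person separation reliability is not covered because it depends on the standard errors estimated by the IRT software.

```python
import math
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """item_scores: one list of k item scores per person."""
    k = len(item_scores[0])
    item_variances = [variance([person[i] for person in item_scores])
                      for i in range(k)]
    total_variance = variance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five persons on three items.
item_scores = [[2, 3, 3], [1, 1, 2], [4, 4, 5], [3, 2, 3], [5, 4, 4]]
alpha = cronbach_alpha(item_scores)

# Hypothetical baseline and retest locations (logits) for four persons.
baseline = [-1.0, 0.0, 1.0, 2.0]
retest = [-0.8, 0.1, 0.9, 2.2]
stability = pearson_r(baseline, retest)
print(round(alpha, 3), round(stability, 3))
```

Sample variances are used throughout; because alpha depends only on the ratio of variances, the choice between sample and population variance does not change its value.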
RESULTS
Characteristics of the study sample
Sociodemographic characteristics of the 1,177 study participants are presented in
Table 6.2. Participants were aged between 18 and 75 years (median 37 years).
Most participants were female (76.3%), born in Australia (73.0%), Caucasian
Chapter 6: Development and Psychometric Evaluation of Item Banks for the Assessment of Skin Cancer Risk
Using Item Response Theory 135
(82.4%), indoor workers (79.9%), almost half were university educated (41.8%) and
one third were full-time workers (32.4%).
The majority of participants reported medium or fair skin and brown or lighter hair.
Over 60% reported green or blue eyes.
Table 6.2: Characteristics of Study Participants (N=1,177) and 2011 Australia census
– Queensland (QLD) State only [43]
Characteristics Participants QLD population
No. % No. %
Sex
Female 898 76.3 2,184,519 50.4
Male 272 23.1 2,148,220 49.6
Missing 7 0.6
Age
Less than 25 years 340 28.9 1,463,625 33.8
25 - 34 years 292 24.8 587,406 13.6
35 - 44 years 211 17.9 620,750 14.3
45 - 54 years 195 16.6 590,886 13.6
More than 55 years 139 11.8 1,070,072 24.7
Median age, years (range) 37 (18-75)
Country of birth
Australia 859 73.0
Other Countries 312 26.5
Missing 6 0.5
Ethnic origin
Caucasian 970 82.4
Other 199 16.9
Missing 8 0.7
Mother’s father ethnicity
Asia and Middle East 187 15.9
S.Europe, E.Europe, N.Europe 201 17.1
Scotland and England 170 14.4
Australia and New Zealand 595 50.6
Missing 24 2.0
Mother’s mother ethnicity
Asia and Middle East 190 16.1
S.Europe, E.Europe, N.Europe 175 14.9
Scotland and England 151 12.8
Australia and New Zealand 639 54.3
Missing 22 1.9
Father’s father ethnicity
Asia and Middle East 188 16.0
S.Europe, E.Europe, N.Europe 184 15.6
Scotland and England 205 17.4
Australia and New Zealand 576 48.9
Missing 24 2.0
Father’s mother ethnicity
Asia and Middle East 183 15.5
S.Europe, E.Europe, N.Europe 181 15.4
Scotland and England 199 16.9
Australia and New Zealand 588 50.0
Missing 26 2.2
Highest qualification
No school certificate or other qualification 22 1.9
School or intermediate certificate 76 6.5
Higher school or leaving certificate 289 24.6
Trade / apprenticeship 61 5.2
Certificate / diploma 228 19.4
University degree or higher 492 41.8
Missing 9 0.8
Employment status
Full-time worker 458 32.4
Part-time worker 297 21.0
Home duties 74 5.2
Unemployed 70 5.0
Student 410 29.0
Retired 69 4.9
Other 36 2.5
Indoor/outdoor work
Mainly indoors 941 79.9
Half indoors and half outdoors 191 16.2
Mainly outdoors 36 3.1
Missing 9 0.8
Self-reported skin colour
Black 8 0.7
Olive/Brown 153 13.0
Medium 323 27.4
Fair 691 58.7
Missing 2 0.2
Natural hair colour
Black 211 17.9
Brown 668 56.8
Blonde 221 18.8
Red 46 3.9
Missing 31 2.6
Eye colour
Brown or Black 434 36.9
Green or Hazel 364 30.9
Blue or Grey 377 32.0
Missing 2 0.2
A comparison of the SunAus study cohort with the 2011 Queensland census data
[43] showed that SunAus participants were more likely than the Queensland population
to be female (76.3% vs. 50.4%) and aged 25-34 years (24.8% vs. 13.6%).
Unidimensionality
The assumption of unidimensionality was met for the phenotype and sun exposure
subscales. The first factor explained 65.3% of the variance for the phenotype
subscale (with 6.3% unexplained variance in the first contrast) and 50.5% for the sun
exposure subscale (with 7.9% unexplained variance in the first contrast). However,
the sun protection subscale did not meet the assumption of unidimensionality, as its
first factor explained only 44.3% of the variance, with 10.5% unexplained variance in
the first contrast.
Assessment of item fit
Tables 6.3 to 6.5 show item locations and item fit for the skin cancer risk
subscales. Two items from the phenotype subscale, SCR31 (Outfit Mnsq=1.55) and SCR32
(Outfit Mnsq=1.44), had weighted fit indices >1.33 and likely did not contribute to
the scale’s ability to differentiate participants’ skin cancer risk [33]. These items
asked about the number of moles on the left upper arm (SCR31 = larger than 2mm and
SCR32 = larger than 5mm). After removing these items and recalibrating the scale, we
found that item 7 (SCR07), “When you were 18 years age, how many moles did you have
on your skin?”, had a fit index of 1.39, and it was removed as well. All other items
in the three subscales showed good fit.
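The screening step described above can be sketched as a simple filter over mean square fit values. The fit values below are copied from a few rows of Table 6.3, and 1.33 is the cut-off quoted in the text; the function name and the rule of flagging an item when either index exceeds the cut-off are assumptions made for this illustration, not a description of the software used.

```python
MISFIT_CUTOFF = 1.33  # cut-off quoted in the text

# (infit MNSQ, outfit MNSQ) for a few phenotype items, copied from Table 6.3
fit_statistics = {
    "SCR01": (0.98, 1.02),
    "SCR07": (1.12, 1.11),
    "SCR12": (0.96, 0.99),
    "SCR31": (2.85, 1.55),
    "SCR32": (5.40, 1.44),
}


def flag_misfit(fit_table, cutoff=MISFIT_CUTOFF):
    """Return item codes whose infit or outfit MNSQ exceeds the cut-off.

    Assumption for this sketch: either index above the cut-off flags the item.
    """
    return sorted(
        code
        for code, (infit, outfit) in fit_table.items()
        if infit > cutoff or outfit > cutoff
    )


misfitting_items = flag_misfit(fit_statistics)
```

With these first-calibration values only SCR31 and SCR32 are flagged; SCR07 exceeded the cut-off only after the scale was recalibrated, so a screening like this has to be repeated after each removal.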
Table 6.3: Item parameter estimations and fit statistics of phenotype scale
Item Order  Item Code  Estimate  Infit MNSQ  Infit T  Outfit MNSQ  Outfit T
1 SCR01 -2.428 0.98 -0.5 1.02 0.4
2 SCR02 -1.388 0.97 -0.8 0.97 -0.8
3 SCR03 -0.442 1.03 0.7 1.01 0.2
4 SCR04 -0.191 1.12 2.8 1.10 2.6
5 SCR05 0.159 0.88 -2.9 0.88 -2.8
6 SCR06 -0.374 0.89 -2.6 0.92 2.7
7* SCR07 0.431 1.12 2.7 1.11 .4
8 SCR08 0.667 1.02 0.4 0.99 -0.3
9 SCR12 2.665 0.96 -1.0 0.99 -0.1
10 SCR13 1.788 0.81 -4.8 0.94 -0.8
11 SCR14 1.270 0.86 -3.4 0.97 -0.3
12 SCR15 0.982 1.10 2.3 1.06 0.6
13 SCR16 0.034 0.86 -3.4 0.91 -4.5
14* SCR31 0.301 2.85 30.4 1.55 9.1
15* SCR32 0.703 5.40 54.9 1.44 3.9
16 SCR33a -1.022 0.86 -3.5 0.81 -5.1
17 SCR33b -1.048 0.78 -5.8 0.78 -5.9
18 SCR33c -1.035 0.81 -4.8 0.79 -5.7
19 SCR33d -1.072 0.80 -5.2 0.76 -6.1
The mean square fit statistic (MNSQ) is the item goodness-of-fit statistic of the Rasch model. The
Estimate column gives item logits, which indicate the difference between the mean item measure for
the 19 items and the measure for each item.
* misfit item
Table 6.4: Item parameter estimations and fit statistics of sun exposure behaviours scale
Item Order  Item Code  Estimate  Infit MNSQ  Infit T  Outfit MNSQ  Outfit T
1 SCR9 2.038 1.20 4.6 1.04 0.5
2 SCR11 0.808 1.16 3.6 1.11 2.8
3 SCR17 -0.800 1.22 4.9 1.20 4.9
4 SCR18 1.409 1.05 1.3 1.13 1.7
5 SCR19a -0.008 1.14 3.1 1.13 3.4
6 SCR19b -0.716 0.99 -0.2 0.99 -0.1
7 SCR19c 0.377 1.07 1.5 1.06 1.3
8 SCR20 1.117 0.99 -0.2 0.97 -0.6
9 SCR21 1.466 0.99 -0.3 1.00 0.1
10 SCR22 0.651 0.97 -0.6 0.95 -1.3
11 SCR23 -0.640 0.97 -0.8 0.97 -0.8
12 SCR24a -1.089 1.05 1.2 1.05 1.3
13 SCR24b -0.946 0.90 -2.3 0.91 -2.6
14 SCR24c 0.095 0.82 -4.2 0.86 -3.7
15 SCR24d 0.377 0.84 -2.6 0.87 -2.3
16 SCR24e 0.018 0.89 -0.7 0.94 -0.5
17 SCR25a -1.656 1.01 0.3 1.00 -0.0
18 SCR25b -1.447 0.86 -3.5 0.86 -4.0
19 SCR25c -0.772 0.83 -4.1 0.84 -4.5
20 SCR25d -0.222 0.83 -2.7 0.84 -2.9
21 SCR25e -0.057 0.91 -0.5 0.95 -0.3
The mean square fit statistic (MNSQ) is the item goodness-of-fit statistic of the Rasch model. The
Estimate column gives item logits, which indicate the difference between the mean item measure for
the 21 items and the measure for each item.
Table 6.5: Item parameter estimations and fit statistics of sun protection behaviours scale
Item Order  Item Code  Estimate  Infit MNSQ  Infit T  Outfit MNSQ  Outfit T
1 SCR26 0.084 0.91 -2.3 0.92 -4.2
2 SCR27 -1.432 0.74 -6.8 0.91 -1.5
3 SCR28 0.126 1.00 0.1 1.03 0.8
4 SCR29 -0.323 0.85 -3.8 0.90 -4.0
5 SCR30a -0.087 1.02 0.6 1.02 0.6
6 SCR30b -0.030 1.22 5.0 1.17 4.3
7 SCR30c -0.689 1.08 1.8 1.06 0.9
8 SCR30d -0.151 0.89 -2.7 0.90 -2.5
9 SCR30e 0.172 1.10 2.4 1.07 1.8
10 SCR30f 1.146 1.15 3.6 1.14 3.7
11 SCR30g 1.117 0.98 -0.5 0.98 -0.6
12 SCR30h -0.817 0.84 -4.1 0.91 -1.4
13 SCR30i 0.884 1.03 0.7 1.03 0.8
The mean square fit statistic (MNSQ) is the item goodness-of-fit statistic of the Rasch model. The
Estimate column gives item logits, which indicate the difference between the mean item measure for
the 13 items and the measure for each item.
Assessment of item difficulty, content coverage and item targeting
The Wright map for the phenotype subscale (Figure 6.3) shows that respondents’
answers to the questions placed them between -3.5 and +2.5 logits (also called
log-odds units). Participants with high skin cancer risk based on their phenotypical
characteristics were located at around 0.90 logits, and those with the lowest skin
cancer risk were located at around -2.50 logits. Few respondents were found at either
extreme. Most respondents were located between -1.00 and +0.50 logits.
Inspection of the Wright map also shows no evidence of ceiling or floor effects, with
all participants located between the lowest and highest risk items. This result means
that the scale has good content coverage (spread) of the latent construct being
measured, and that all items targeted the latent construct well. The easiest item is
item 1.1 (SCR01): “How would you rate your natural skin colour on areas never exposed
to the sun (on the underside of your arm)?” located at > -3 logits. Meanwhile, the
most difficult item is item 9 (SCR12): “Have you ever been diagnosed with melanoma?”
located at > +2 logits. The Wright maps for the other two subscales are presented in
the supplementary file and revealed similar patterns, although the sun protection
behaviour subscale had fewer items. Items of both subscales were spread well across
the latent construct continuum, suggesting that the content matched the distribution
of the participants.
Figure 6.3: The Wright map for phenotype subscale.
The map represents the relationship between person risk and item difficulty measures
in logits. Each “X” on the left side of the map represents 8.3 participants. The
labels on the right side of the map show the levels of item and step, respectively.
The Wright map shows Thurstonian thresholds for each of the items. The notation
x.y is used to indicate the y-th threshold of the x-th item. As an example, the red
circle represents item 5 (hair colour) with four answer categories (black, brown,
blonde and red). Item 5 has three thresholds (number of answer categories minus 1),
and 5.1 represents threshold 1 of item 5.
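To make the threshold notation concrete, the sketch below computes Thurstonian thresholds for a single polytomous item, assuming a partial credit parameterisation. The k-th Thurstonian threshold is taken here as the location at which the probability of responding in category k or above equals 0.5, found by bisection. The step parameters and function names are hypothetical and are not estimates from this study.

```python
import math


def pcm_category_probs(theta, steps):
    """Category probabilities for one item under the partial credit model."""
    exponents = [0.0]  # category 0 has exponent 0
    for delta in steps:
        exponents.append(exponents[-1] + (theta - delta))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thurstonian_thresholds(steps, lo=-10.0, hi=10.0, tol=1e-8):
    """Locations where P(response >= k) = 0.5, for k = 1..number of steps."""
    thresholds = []
    for k in range(1, len(steps) + 1):
        a, b = lo, hi
        while b - a > tol:
            mid = (a + b) / 2.0
            p_at_or_above = sum(pcm_category_probs(mid, steps)[k:])
            if p_at_or_above < 0.5:
                a = mid  # threshold lies to the right of mid
            else:
                b = mid  # threshold lies to the left of mid
        thresholds.append((a + b) / 2.0)
    return thresholds


# Hypothetical step parameters for a four-category item such as item 5
hypothetical_steps = [-1.0, 0.0, 1.0]
item5_thresholds = thurstonian_thresholds(hypothetical_steps)
labels = [f"5.{k}" for k in range(1, len(item5_thresholds) + 1)]
```

A four-category item yields three thresholds, labelled 5.1, 5.2 and 5.3 in the notation used on the map.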
Assessment of standard error of measurement
Figure 6.4 shows the standard error of measurement at each logit along the Rasch
scale continuum. The phenotype subscale had low measurement error for respondents
between 0 and +1 logit and high error for respondents below 0 logits. The sun
exposure behaviour subscale had low measurement error for respondents at each logit
except for those at logits of -2.00 or lower (i.e., people with low exposure to the
sun). For the sun protection behaviour subscale, the standard error of measurement
was high for respondents above 1 logit. This result means that the new scale best
measures people with moderate skin cancer risk.
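The link between item targeting and measurement error can be illustrated with a small sketch. For simplicity it assumes dichotomous Rasch items, where the standard error at a location is the reciprocal of the square root of the test information. The SunAus items are polytomous, so this simplified version only shows the general shape of the relationship; the item difficulties are hypothetical.

```python
import math


def rasch_information(theta, difficulties):
    """Test information at theta for dichotomous Rasch items: sum of p * (1 - p)."""
    information = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        information += p * (1.0 - p)
    return information


def standard_error(theta, difficulties):
    """Standard error of measurement at theta: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(rasch_information(theta, difficulties))


# Hypothetical item difficulties clustered around the middle of the scale
hypothetical_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]

se_centre = standard_error(0.0, hypothetical_difficulties)
se_low = standard_error(-3.0, hypothetical_difficulties)
se_high = standard_error(3.0, hypothetical_difficulties)
```

The error is smallest where items are concentrated and grows towards locations with few nearby items, which is the pattern the standard error curves display.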
Figure 6.4: Distribution of Standard Error Measurement for each domain: a)
phenotype, b) sun exposure, c) sun protection
Concurrent Validity
Concurrent validity was assessed by correlating each person’s risk score, or person
location (called the theta score in item response theory), on each subscale with the
total scores obtained for the core measures of sun exposure and sun protection habits
[10] and the PhenX measure [11]. The results are reported in Table 6.6. All three
subscales of the skin cancer risk scale were significantly correlated with both the
measures of sun exposure and sun protection habits and the PhenX measures. The
results also show that the phenotype subscale had the highest correlation (r=0.77,
p < .001) with the PhenX measure, and the sun exposure subscale had the lowest but
still substantial correlation (r=0.52, p < .001) with Glanz’s measure of sun exposure
habits.
Table 6.6: Correlation between the SunAus scale, core measure of sun exposure and
sun protection habits and PhenX measure
Measure                                  SunAus Phenotype   SunAus Sun Exposure   SunAus Sun Protection
PhenX                                    0.77* (N=877**)    -                     -
Core measures of sun exposure habits     -                  0.52* (N=999**)       -
Core measures of sun protection habits   -                  -                     0.62* (N=978**)
* p < .001 (2-tailed)
** Sample varies between each of the scales
Test Reliability
Internal consistency for all subscales was good, with Cronbach's alpha coefficients
of 0.86, 0.88 and 0.73 for the phenotype, sun exposure and sun protection subscales,
respectively. Person separation reliability showed similar indices for the three
subscales (0.86, 0.82 and 0.72, respectively).
When we examined stability over time, we found high test-retest reliability at eight
weeks after the first assessment for the phenotype (r=0.95, n=201, p < 0.001), sun
exposure behaviour (r=0.80, n=201, p < 0.001) and sun protection behaviour (r=0.74,
n=201, p < 0.001) subscales.
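The stability coefficient is a Pearson product moment correlation between baseline and retest person locations. A minimal sketch follows; the logit values are hypothetical and the function name is an assumption made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical person locations (logits) at baseline and at retest
baseline_logits = [-1.2, -0.4, 0.1, 0.6, 1.3]
retest_logits = [-1.0, -0.5, 0.2, 0.5, 1.1]

stability = pearson_r(baseline_logits, retest_logits)
```

The same calculation applies to the concurrent validity coefficients, with subscale theta scores paired against total scores on the comparison questionnaire.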
Diagnostic discrimination
The optimal cut-off scores for all three subscales were assessed using ROC analysis,
based on participants’ self-reported diagnosis of melanoma (“Have you ever been
diagnosed with melanoma?”) or non-melanoma skin cancer (“Have you ever been
diagnosed with other sorts of skin cancer (keratinocyte cancer, basal cell carcinoma,
or squamous cell carcinoma)?”) as the outcome. The areas under the curve (AUC) for
melanoma were 0.72 (95% CI, 0.65-0.78), 0.62 (95% CI, 0.54-0.69) and 0.36 (95%
CI, 0.29-0.43) for the phenotype, sun exposure and sun protection subscales,
respectively; and 0.82 (95% CI, 0.78-0.86), 0.61 (95% CI, 0.56-0.66) and 0.36 (95%
CI, 0.31-0.41), respectively, for non-melanoma skin cancer.
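As a sketch of what the AUC measures, the function below uses its rank-based interpretation: the probability that a randomly chosen participant with the outcome has a higher subscale score than a randomly chosen participant without it, with ties counted as one half. The scores are hypothetical, and this pairwise calculation is an illustration rather than the procedure used in the study.

```python
def auc(scores_with_outcome, scores_without_outcome):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    wins = 0.0
    for case_score in scores_with_outcome:
        for control_score in scores_without_outcome:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical theta scores for participants with and without a diagnosis
theta_with_diagnosis = [0.9, 0.4, 1.3, 0.1]
theta_without_diagnosis = [-0.8, 0.2, -0.3, 0.5, -1.1]

example_auc = auc(theta_with_diagnosis, theta_without_diagnosis)
```

Under this definition a value of 0.5 corresponds to chance-level separation, and swapping the two groups (or reversing the direction of the score) turns an AUC of a into 1 - a.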
Figure 6.5: ROC Curves of outcome variables
Conversion of raw scores to Rasch-scaled scores (theta scores)
We have provided conversion tables for researchers who wish to use the SunAus scale
and gain the interval scoring benefits of Rasch analysis without performing Rasch
analysis themselves. The tables convert the raw (ordinal) SunAus scores to Rasch
measurement estimates. These tables and the questionnaire can be obtained by
contacting the corresponding author by email or by visiting our study website
(www.sunaus.org).
DISCUSSION
To date, many different skin cancer risk scales have been developed in Australia and
worldwide, but to our knowledge, no prior instruments were developed by
systematically assessing the performance and discriminant ability of each individual
item within the scale. Identifying individuals at high risk of developing skin cancer is
important for future studies, as well as counselling and prevention, and may also aid
early detection of skin cancer [44, 45].
Our risk prediction model differs from previous models which focused mainly on
phenotypic factors and used logistic regression models to develop their prediction
algorithms [46-49]. We used item response theory to assess skin cancer risk and
validate it against commonly used measures of phenotype and sun exposure. This
method is becoming increasingly popular in clinical research and health outcome
research [50-54]. In contrast to questionnaires developed using classical
psychometric approaches in which reliability and validity are only calculated for the
whole scale, IRT allows the assessment of each item’s individual contribution. This
method can significantly improve the reliability and accuracy of measurement while
providing significant reductions in assessment time through implementation of
computer adaptive testing [55].
The analyses found that 50 items had good fit with the model, and provided adequate
estimates of the underlying latent construct. The analyses furthermore suggested that,
from a statistical perspective, the three items used to assess participants’ mole counts
and sizes did not fit within the underlying construct. The misfit may have occurred
due to the way the questions were constructed, or due to either random or systematic
misclassification. After removing the misfitting items, the three subscales provided
reasonable content coverage for all respondents, as shown by the item-person map,
with moderate to high concurrent validity of the subscales with previously validated
questionnaires. Future iterations of the SunAus scale will need to use better items
for measuring moliness, as this is commonly seen as one of the most important risk
factors for skin cancer [13, 14, 46, 48, 56].
Our assessment of test-retest reliability of the SunAus scale showed high stability
over time, with coefficients of 0.95, 0.80 and 0.74 for the phenotype, sun exposure
behaviour and sun protection behaviour subscales, respectively. Our study yielded
similar reliabilities to the earlier Glanz study [40], despite using different methods to
calculate the coefficient of reproducibility.
In terms of diagnostic discriminatory accuracy against participants’ self-reported
history of skin cancer, the best performing subscale was the phenotype subscale (AUC
0.72), followed by the sun exposure behaviour subscale (AUC 0.62). The discrimination
accuracy of the phenotype subscale was higher than that reported by Cho [48], with an
AUC of 0.62 (95% CI, 0.58-0.65), and slightly higher than that reported by Vuong
[57], with an AUC of 0.70 (95% CI, 0.67-0.73). This result suggests that the
phenotype and sun exposure behaviour subscales differentiate well between people who
did or did not have melanoma and non-melanoma skin cancer in the past. In contrast,
the sun protection behaviour subscale showed no discriminant ability, which may be
caused by the lack of evidence of unidimensionality or may reflect a true
independence between the use of sun protection and skin cancer, at least in a high
UVR environment such as Queensland. Further investigation of items that better fit
the construct is needed.
While previous measures including the QSkin questionnaire have shown excellent
reliability and predictive ability (Chapter 5), the SunAus scale had similar
discriminatory performance when compared with an existing model [58], and this study
shows improvement in several aspects of the scale. First, the SunAus scale had more
content coverage (more items) compared to previous measures. Second, the use of theta
scores as a personal risk score has advantages over a risk score based on odds ratios
[12], as it gives the exact location of each individual, and of each item, on the
skin cancer risk continuum. Third, the SunAus scale can be compared across studies
through methods such as scale linking and equating, once common/anchor items across
studies are established. Finally, the SunAus scale can predict risk with smaller
measurement error while using fewer items by incorporating computer adaptive testing
approaches, which select the best performing items in sequence depending upon a
person’s responses to preceding items, as demonstrated in a previous study using
QSkin data [55]. This approach can reduce respondents’ burden and be more economical
to implement.
Our study had several limitations. First, we used self-reported skin cancer status as
the outcome variable to examine diagnostic discrimination accuracy of the scale.
This can be improved in the future by conducting prospective studies with objective
outcome data from clinics or cancer registries. Second, due to time and funding
limitations, we did not validate our scale against objective measures such as
dosimeter UV readings or sunscreen swabbing. Future research should integrate
these objective measures in the validation process. Third, the sun protection
behaviour domain showed a lack of evidence of unidimensionality. This finding means
that there is at least one other dimension measured by the scale, which could not be
identified in this study. Thus, future studies will need to compare the subscales
against objective outcomes such as skin colour spectrometry or sunscreen swabbing
and determine better items that fit with the sun protection domain [42, 59]. Fourth,
even though our sample size was large (n=1,177), it consisted of Queensland
residents only. We expect that with a larger sample from more diverse geographic
locations, we may have observed larger variation in phenotype, sun exposure or sun
protection behaviours. Lastly, although beyond the scope of this study, testing the
invariance of the scale factor structure between different groups would allow us to
further refine the scale and provide evidence of its applicability in different
populations.
CONCLUSIONS
This work presented a comprehensive set of research studies culminating in the
development of a new SunAus skin cancer risk scale measuring phenotype, sun
exposure behaviour and sun protection behaviours based on the best items selected
from previous scales. The scale can serve as the framework to develop an
international standard measurement tool for skin cancer risk assessment, and could
also be used to develop a computer adaptive test for use in research and public health
practice.
Competing interests:
The authors declare they have no competing interests.
Acknowledgements
Ngadiman Djaja is supported by the National Health and Medical Research Council
of Australia (NHMRC) CRESH PhD scholarship. The authors are deeply grateful for
the support by Associate Professor Peter Newcombe, Amanda Weaver and QUT
Media staff during data collection.
Supplementary content
Scoring of the Skin Cancer Risk Scale
A score conversion table can be used to convert a raw score to a Rasch (theta) score.
A response in the lowest category scores 0, and each subsequent category scores an
additional 1 point up to the last category. The maximum score for each item depends
on the number of categories in that particular item. To use the conversion table,
simply sum the item scores within each scale and refer to the corresponding table to
find the corresponding theta score.
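The scoring procedure amounts to a sum followed by a table lookup, as in the minimal sketch below. The dictionary holds only the first five rows of Table 6.9 (sun protection behaviour scale); the function name and the example item scores are hypothetical.

```python
# First rows of Table 6.9: raw summed score -> (theta, standard error)
SUN_PROTECTION_CONVERSION = {
    0: (-3.57570, 1.42140),
    1: (-2.45271, 0.80219),
    2: (-1.95418, 0.61143),
    3: (-1.64207, 0.51348),
    4: (-1.41526, 0.45389),
}


def raw_to_theta(item_scores, conversion_table):
    """Sum the item scores and look up the Rasch measure and its standard error."""
    raw_score = sum(item_scores)
    if raw_score not in conversion_table:
        raise KeyError(f"raw score {raw_score} is not in the conversion table")
    return conversion_table[raw_score]


# Hypothetical item scores, each counted from 0 for the lowest category
theta, se = raw_to_theta([1, 0, 2, 0], SUN_PROTECTION_CONVERSION)
```

A complete implementation would load every row of the relevant table, so that any attainable summed score can be converted.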
Table 6.7: Supplement 1: Table for conversion of phenotype scale summed item
scores to Rasch measures
Score Theta SE
0 -6.21624 1.73465
1 -4.66725 1.04120
2 -3.82164 0.80700
3 -3.26776 0.67548
4 -2.86424 0.58914
5 -2.55017 0.52733
6 -2.29540 0.48076
7 -2.08246 0.44491
8 -1.89986 0.41707
9 -1.73950 0.39544
10 -1.59558 0.37847
11 -1.46360 0.36564
12 -1.34021 0.35616
13 -1.22274 0.34956
14 -1.10901 0.34552
15 -0.99712 0.34378
16 -0.88533 0.34420
17 -0.77208 0.34668
18 -0.65566 0.35116
19 -0.53440 0.35754
20 -0.40656 0.36572
21 -0.27038 0.37533
22 -0.12428 0.38596
23 0.03283 0.39660
24 0.20112 0.40591
25 0.37924 0.41213
26 0.56326 0.41321
27 0.74632 0.40817
28 0.91992 0.39812
29 1.07829 0.38614
30 1.22125 0.37517
31 1.35226 0.36688
32 1.47564 0.36187
33 1.59523 0.36032
34 1.71454 0.36239
35 1.83616 0.36838
36 1.96309 0.37884
37 2.09878 0.39526
38 2.24931 0.42010
39 2.42589 0.45844
40 2.65208 0.51947
41 2.98134 0.62728
42 3.54202 0.84140
43 4.76171 1.49072
Table 6.8: Supplement 2: Table for conversion of sun exposure behavior scale
summed item scores to Rasch measures
Score Theta SE
0 -5.27595 1.47207
1 -4.11926 0.87104
2 -3.54991 0.69122
3 -3.15451 0.59780
4 -2.84381 0.53924
5 -2.58325 0.49877
6 -2.35588 0.46874
7 -2.15206 0.44553
8 -1.96578 0.42702
9 -1.79311 0.41187
10 -1.63131 0.39924
11 -1.47840 0.38851
12 -1.33292 0.37926
13 -1.19374 0.37130
14 -1.05998 0.36432
15 -0.93091 0.35820
16 -0.80593 0.35284
17 -0.68450 0.34817
18 -0.56613 0.34415
19 -0.45037 0.34075
20 -0.33677 0.33795
21 -0.22495 0.33571
22 -0.11446 0.33405
23 -0.00489 0.33294
24 0.10415 0.33236
25 0.21306 0.33230
26 0.32222 0.33270
27 0.43197 0.33354
28 0.54265 0.33474
29 0.65452 0.33624
30 0.76778 0.33793
31 0.88249 0.33971
32 0.99858 0.34156
33 1.11579 0.34341
34 1.23375 0.34539
35 1.35211 0.34779
36 1.47086 0.35104
37 1.59051 0.35581
38 1.71241 0.36286
39 1.83890 0.37314
40 1.97354 0.38791
41 2.12162 0.40891
42 2.29121 0.43880
43 2.49504 0.48180
44 2.75405 0.54554
45 3.10653 0.64666
46 3.64318 0.83571
47 4.77595 1.44204
Table 6.9: Supplement 3: Table for conversion of sun protection behavior scale
summed item scores to Rasch measures.
Score Theta SE
0 -3.57570 1.42140
1 -2.45271 0.80219
2 -1.95418 0.61143
3 -1.64207 0.51348
4 -1.41526 0.45389
5 -1.23512 0.41363
6 -1.08340 0.38496
7 -0.95034 0.36376
8 -0.83016 0.34772
9 -0.71939 0.33520
10 -0.61554 0.32553
11 -0.51687 0.31819
12 -0.42221 0.31264
13 -0.33047 0.30872
14 -0.24074 0.30626
15 -0.15202 0.30524
16 -0.06346 0.30557
17 0.02586 0.30724
18 0.11692 0.31027
19 0.21070 0.31464
20 0.30823 0.32043
21 0.41061 0.32743
22 0.51871 0.33576
23 0.63339 0.34532
24 0.75536 0.35614
25 0.88526 0.36860
26 1.02403 0.38301
27 1.17340 0.40040
28 1.33656 0.42235
29 1.51930 0.45142
30 1.73192 0.49183
31 1.99349 0.55155
32 2.34286 0.64880
33 2.87682 0.83638
34 4.01705 1.44694
Each 'X' represents 7.9 cases
The labels for thresholds show the levels of item, and step, respectively
Figure 6.6: A1. Sun exposure behaviours scale item map
Each 'X' represents 7.5 cases
The labels for thresholds show the levels of item, and step, respectively
Figure 6.7: A2. Sun protection behaviours scale item map
REFERENCES
1. American Educational Research Association (AERA), American
Psychological Association (APA), and National Council on Measurement in
Education (NCME), The Standards for Educational and Psychological
Testing. 1999.
2. Olsen J. Epidemiology deserves better questionnaires. International Journal
of Epidemiology. 1998; Dec 1; 27(6):935.
3. Wilcox AJ. The quest for better questionnaires. American journal of
epidemiology. 1999; Dec 15; 150(12):1261-2.
4. Cargill J, et al. Validation of brief questionnaire measures of sun exposure
and skin pigmentation against detailed and objective measures including
vitamin D status. Photochemistry and photobiology. 2013; Jan 1; 89(1):219-
26.
5. O’Riordan DL, Glanz K, Gies P, Elliott T. A Pilot Study of the Validity of
Self‐reported Ultraviolet Radiation Exposure and Sun Protection Practices
Among Lifeguards, Parents and Children. Photochemistry and photobiology.
2008; May 1; 84(3):774-8.
6. Shoveller JA, Savoy DM, Roberts RE. Sun protection among parents and
children at freshwater beaches. Canadian Journal of Public Health/Revue
Canadienne de Santé Publique. 2002; Mar 1;146-8.
7. O’Riordan DL, Steffen AD, Lunde KB, Gies P. A day at the beach while on
tropical vacation: sun protection practices in a high-risk setting for UV
radiation exposure. Archives of dermatology. 2008; Nov 17; 144(11):1449-55.
8. Youl PH, Soyer HP, Baade PD, Marshall AL, Finch L, Janda M. Can skin
cancer prevention and early detection be improved via mobile phone text
messaging? A randomised, attention control trial. Preventive medicine. 2015;
Feb 28; 71:50-6.
9. Hillhouse J, Turrisi R, Jaccard J, Robinson J. Accuracy of self-reported sun
exposure and sun protection behavior. Prevention Science. 2012; Oct 1;
13(5):519-31.
10. Glanz K, et al. Measures of sun exposure and sun protection practices for
behavioral and epidemiologic research. Archives of Dermatology. 2008; Feb
1; 144(2):217-22.
11. National Human Genome Research Institute. PhenX Measure : Skin Cancer
2010 [cited 2015 1 June ]; Available from:
https://www.phenxtoolkit.org/toolkit_content/PDF/PX170601.pdf.
12. Quereux G, et al. Development of an individual score for melanoma risk.
European Journal of Cancer Prevention. 2011; May 1; 20(3):217-24.
13. Mar V, Wolfe R, Kelly JW. Predicting melanoma risk for the Australian
population. Australasian Journal of Dermatology. 2011; May 1; 52(2):109-
16.
14. Fortes C, et al. Identifying individuals at high risk of melanoma: a simple
tool. European Journal of Cancer Prevention. 2010; Sep 1; 19(5):393-400.
15. Usher-Smith JA, Emery J, Kassianos AP, Walter FM. Risk prediction models
for melanoma: a systematic review. Cancer Epidemiology Biomarkers &
Prevention. 2014; Jun 3:cebp-0295.
16. Schulz W. Validating Questionnaire Constructs in International Studies: Two
Examples from PISA 2000. 2003.
17. Gonzalez EJ, Galia J, Li I. Scaling methods and procedures for the TIMSS
2003 mathematics and science scales. TIMSS. 2003; 252-73.
18. Chakravarty EF, Bjorner JB, Fries JF. Improving patient reported outcomes
using item response theory and computerized adaptive testing. The Journal of
Rheumatology. 2007; Jun 1; 34(6):1426-31.
19. Flynn KE, Dombeck CB, DeWitt EM, Schulman KA, Weinfurt KP. Using
item banks to construct measures of patient reported outcomes in clinical
trials: investigator perceptions. Clinical Trials. 2008; Dec 1; 5(6):575-86.
20. Cook LL, Eignor DR. IRT equating methods. Educational measurement:
Issues and practice. 1991; Sep 1; 10(3):37-45.
21. Teresi JA. Different approaches to differential item functioning in health
applications: Advantages, disadvantages and some neglected topics. Medical
care. 2006; Nov 1; 44(11):S152-70.
22. Dodd BG, De Ayala RJ, Koch WR. Computerized adaptive testing with
polytomous items. Applied psychological measurement. 1995. 19(1): p. 5-22.
23. Elhan AH, Öztuna D, Kutlay Ş, Küçükdeveci AA, Tennant A. An initial
application of computerized adaptive testing (CAT) for measuring disability
in patients with low back pain. BMC Musculoskeletal Disorders. 2008; Dec
18;9(1):1.
24. Ware JE Jr, et al. Applications of computerized adaptive testing (CAT) to the
assessment of headache impact. Quality of Life Research. 2003; Dec
1;12(8):935-52.
25. Aitken JF, Elwood JM, Lowe JB, Firman DW, Balanda KP, Ring IT. A
randomised trial of population screening for melanoma. Journal of Medical
Screening. 2002; Mar 1; 9(1):33-7.
26. DiSipio T, et al. The Queensland cancer risk study: behavioural risk factor
results. Australian and New Zealand journal of public health. 2006; Aug 1;
30(4):375-82.
27. Olsen CM, et al. Cohort profile: the QSkin sun and health study. International
journal of epidemiology. 2012; Aug 1; 41(4):929-i.
28. Brodie AM, et al. The AusD Study: a population-based study of the
determinants of serum 25-hydroxyvitamin D concentration across a broad
latitude range. American journal of epidemiology. 2013; Mar 22; kws322.
29. Linacre JM. Category, step and threshold: definitions & disordering. Rasch
measurement transactions. 2001; 15(1):794.
30. Andrich D. An expanded derivation of the threshold structure of the
polytomous rasch model that dispels any “threshold disorder controversy”.
Educational and Psychological Measurement. 2013; Feb 1;73(1):78-124.
31. Adams RJ, Wu ML, Wilson M. The Rasch rating model and the disordered
threshold controversy. Educational and Psychological Measurement. 2012;
Aug 1; 72(4):547-73.
32. Collins D. Pretesting survey instruments: an overview of cognitive methods.
Quality of life research. 2003; May 1; 12(3):229-38.
33. Conrad KJ, Wilson M. Constructing measures: An item response modeling
approach. Erlbaum Associates Mahwah, NJ. Evaluation and Program
Planning. 2005; Nov 30; 28(4):433-4.
34. Bond T, Fox CM. Applying the Rasch model: Fundamental measurement in
the human sciences. Routledge. 2015; Jun 5.
35. de Ayala RJ. An introduction to polytomous item response theory models.
Measurement and evaluation in Counseling and Development. 1993; Jan.
25(4): p. 172.
36. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;
Jun 1; 47(2):149-74.
37. Wu M, et al, ACER ConQuest Version 2.0 manual : Generalised Item
Response Modelling Software. ACER Press: Victoria, Australia. 2007.
38. Linacre JM. A user’s guide to WINSTEPS MINISTEP Rasch-model computer
programs. Chicago IL: Winsteps. com. 2006.
39. Glanz K, et al. Validity of self-reported solar UVR exposure compared with
objectively measured UVR exposure. Cancer Epidemiology Biomarkers &
Prevention. 2010; Dec 1; 19(12):3005-12.
40. Glanz K, Schoenfeld E, Weinstock MA, Layi G, Kidd J, Shigaki DM.
Development and reliability of a brief skin cancer risk assessment tool.
Cancer detection and prevention. 2003; Dec 31; 27(4):311-5.
41. O'Riordan DL, et al. Validity of covering-up sun-protection habits:
association of observations and self-report. Journal of the American Academy
of Dermatology. 2009; May 31; 60(5):739-44.
42. Glanz K, et al. Validity of self-reported sunscreen use by parents, children,
and lifeguards. American journal of preventive medicine. 2009; Jan 31;
36(1):63-9.
Chapter 7: Discussion
Questionnaires have been used frequently in many studies as a preferred method to
measure skin cancer-related risk factors because they are widely considered to be
efficient, convenient, and economical. Most of these questionnaires, however, were
developed using classical psychometric methods and many do not appear to have
been rigorously tested according to current psychometric standards. As a result, few
previous studies could be found that have investigated the psychometric properties of
the questions to obtain precise estimates of the underlying latent trait (Table 1.1,
pages 5-6). This thesis has provided a general introduction to item response theory
(IRT), presented an IRT framework that could be used in skin cancer research, and
aimed to familiarise clinicians and researchers in the skin cancer area with the
use of IRT to develop and test questionnaires for their suitability to the research
question and target group. The main findings of each chapter of this thesis are
summarised below and the findings and implications discussed. Methodological
considerations and suggestions for future studies are detailed in the last section of this
discussion chapter.
SUMMARY OF THE MAIN FINDINGS
Following the introduction and presentation of the IRT framework in Chapter 1, a
Rasch Rating Scale Model was applied in Chapter 2 (page 38) to demonstrate that it is
possible to calibrate a skin self-examination attitude scale using an IRT approach.
This was one of the first studies worldwide to use IRT for skin cancer-related attitude
assessment.
The results show the skin self-examination attitude scale is a brief, useful, and
reliable tool for assessing attitudes towards skin self-examination, thereby
legitimising its use in a population of men 50 years or older. It was also
demonstrated that the scale requires the addition of more items measuring positive
skin self-examination attitudes to cover the full range of the latent construct.
In Chapter 3 (page 56), IRT was used to assess the potential impact of
self-reported sun protection and sun exposure behaviour change due to concern about
vitamin D on skin cancer risk at different latitudes in Australia. Visualising each
item’s location on the underlying latent construct of skin cancer risk enabled the
quantification of the impact that a potential change in people’s behaviours due to
concern about vitamin D deficiency may have on skin cancer risk.
In Chapter 4 (page 82), IRT was applied to assess whether it is possible to measure
phenotypes and behaviours related to skin cancer more efficiently. Using a computer
adaptive test simulation approach demonstrated that providing the questions in a
computer adaptive way can reduce participants’ burden by up to 66% compared to
non-adaptive testing, while retaining excellent measurement precision.
In Chapter 5 (page 105), data were evaluated from a large prospective cohort study of skin
cancer risk in Queensland (the QSkin Sun and Health Study). Questions measuring
skin cancer risk factors such as phenotype, sun exposure behaviours, and sun protection
behaviours were calibrated using the Rasch partial credit model. The phenotype subscale
was found to be a good predictor of future development of non-melanoma skin
cancer (0.75); however, there was a lower explanatory power of the sun exposure
(0.53) or sun protection behaviour (0.49) scales.
Based on the best items from studies used in Chapters 3-5, Chapter 6 (page 121)
presented the development and evaluation of a new proposed scale for measuring
skin cancer risk consisting of three subscales: a phenotype, sun exposure behaviours,
and a sun protection behaviours subscale. The results show that the scale provides
adequate content coverage for all respondents. They also show moderate to high
correlation with existing validated questionnaires, moderate to high internal
consistency, and stability for all subscales. Only the phenotype (AUC = 0.72) and sun
exposure behaviour (AUC = 0.62) subscales differentiated well and moderately well,
respectively, between people who self-reported that they did or did not have
non-melanoma skin cancer in the past, while the sun protection scale (AUC = 0.36) had little
association with skin cancer status.
DISCUSSION OF THE MAIN FINDINGS
The following section discusses how IRT was applied in this thesis.
7.2.1 Item Response Theory as a tool for evaluating psychometric
properties of a questionnaire.
Until now, in health research, classical test theory (CTT) has predominantly been
used as a method for evaluating the qualities of a questionnaire, and this also applies
to many skin cancer questionnaires.23,61,160,161 Two indicators are usually derived
through this approach: item-total correlation and Cronbach’s alpha reliability;25,40,162
some studies have also assessed the validity of self-reported summary scales against
objective measurements, such as sunscreen cotton swabbing,46,163 physician clinical
examination,63 or UVR dosimetry,24,164 by correlating the overall scale score with the
outcome, for example, sunscreen yes/no. However, these indicators do not provide
information about the discriminative value of each individual question. For example,
item-total correlation only allows the investigator to assess how strongly each item
correlates with the scale total score, and assumes each question contributes equally to
the total score. Using an item response theory approach allows one to better evaluate
the psychometric properties of a questionnaire, including how well each item
discriminates between people with different skin cancer risk, whether those with higher
risk correctly use the relevant response categories of the item, the degree of
information of each item, its standard error, how well it fits the model, and
whether it is prone to differential item functioning (DIF).
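As an illustration of the two CTT indicators mentioned above, the following minimal Python sketch computes Cronbach's alpha and corrected item-total correlations from a small response matrix. The function names and the example data are hypothetical and are not taken from any study in this thesis.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same length each)."""
    k = len(rows[0])
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical responses: 5 respondents, 4 Likert-type items.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 4, 3, 4],
    [4, 4, 5, 4],
    [5, 5, 4, 5],
]
print(cronbach_alpha(example))
print(item_total_correlations(example))
```

Both indicators summarise the scale as a whole or an item's agreement with the total; neither places items and persons on a common metric, which is the additional information the IRT approach provides.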
Content coverage and item targeting.
Item-person maps (see Figure 2.1 in Study 1, page 48 and Figures
5.1, 5.2, 5.3 in Study 4, pages 113-115) can be used to determine whether the scale covers a
wide range of content area, whether items have good content targeting (i.e., the items
are distributed evenly across the latent construct continuum), and whether there is
evidence of ceiling or floor effects. This can only be done using IRT, where the item
and person parameters are calibrated on the same metric.49,165 Using this approach in
Study 1 showed that more items were required to measure high skin self-examination
attitude, as the current eight-item scale contains few items covering that area of the
latent trait.149
Item location
The item map and item location (b parameter) provide an estimate of each item’s
difficulty, or likelihood of being endorsed by people with different levels of the underlying trait. For
example, Study 1 showed that for item 3 “Checking my skin regularly is a priority
for me” b = 0.54. This means that only people who are quite aware of the importance
of skin self-examination are likely to answer yes to this item, compared to item 1 “It
is important to check my skin for skin cancer even if I have no symptom” which had
a b = -0.58. This additional information about the items cannot be obtained using a
CTT approach.
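To make the interpretation of the b parameter concrete, the sketch below evaluates the dichotomous Rasch model for the two item locations quoted above (b = 0.54 and b = -0.58). This is a simplification: Study 1 used a rating scale model with several response categories, so the dichotomous form and the person locations chosen here are illustrative assumptions only.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing an item with location b for a person at theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Item locations reported for Study 1; person locations are hypothetical.
item_locations = {"item 3 (priority)": 0.54, "item 1 (important)": -0.58}

for theta in (-1.0, 0.0, 1.0):
    for label, b in item_locations.items():
        print(f"theta={theta:+.1f}  {label}: P(endorse)={rasch_prob(theta, b):.2f}")
```

At any given person location the item with the higher b value has the lower endorsement probability, and a person located exactly at b has a 50% chance of endorsing the item.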
Differential Item Functioning (DIF)
IRT provides a procedure to check whether items function differently for different
subgroups of the population, for example men and women. During instrument
development and validation, it is important to ensure that items are as unbiased as
possible.93,158,166 Where differential item functioning is present and cannot be
avoided, the analyst must then be aware and adjust the analyses accordingly. For
example, Study 1149 found there was no significant DIF effect between the
intervention and control group, which means that all items were invariant across
groups.
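One simple way to screen for DIF, sketched below, is to calibrate the items separately in each subgroup and compare the centred item locations. The item names, the difficulty values, and the 0.5 logit flagging threshold are assumptions made for illustration; this is not the procedure or the output of Study 1.

```python
from statistics import mean


def centred(difficulties):
    """Centre item locations at zero so the two calibrations share an origin."""
    m = mean(list(difficulties.values()))
    return {item: b - m for item, b in difficulties.items()}


def flag_dif(group_a, group_b, threshold=0.5):
    """Return items whose centred locations differ by at least `threshold` logits."""
    a, b = centred(group_a), centred(group_b)
    return {
        item: a[item] - b[item]
        for item in a
        if abs(a[item] - b[item]) >= threshold
    }


# Hypothetical item locations from two separate calibrations.
intervention = {"item1": -0.60, "item2": 0.10, "item3": 0.50}
control = {"item1": -0.55, "item2": 0.05, "item3": 0.50}
print(flag_dif(intervention, control))  # no item exceeds the threshold
```

An empty result corresponds to the situation described above, where all items are invariant across groups.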
7.2.2 Item Response Theory as a tool for developing a new questionnaire.
IRT focuses on the statistical analysis of individual items, in contrast to CTT, where
development efforts focus on the test as a whole, which is intended to be provided in
its entirety for measurement purposes.80,167,168 For each item, IRT provides item
specific information about reliability, as well as unique information called an item
characteristic curve (presented in detail on pages 12-14, Chapter 1), which visualises
where along the latent trait continuum the item measures optimally and the amount
of information it provides.169
Information function and standard error
IRT also estimates this information for each item, which allows the investigator to
create targeted item banks using the most informative items for each person. This has
the advantage that each person may answer a completely different set of items, but
still provide an equally accurate estimate of skin cancer risk.
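The relationship between item information and the standard error of the person estimate can be sketched as follows for the dichotomous Rasch model, where the information of an item is P(1 - P) and the standard error is the inverse square root of the summed information. The item locations and person locations are hypothetical, and the dichotomous model is a simplifying assumption.

```python
import math


def rasch_prob(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of one Rasch item at person location theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def standard_error(theta, difficulties):
    """Standard error of theta given the items that were administered."""
    info = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(info)


# Two hypothetical people answer completely different, well-targeted item sets.
se_low_risk = standard_error(-1.0, [-1.2, -1.0, -0.8])
se_high_risk = standard_error(1.0, [0.8, 1.0, 1.2])

# The same high-risk person answering poorly targeted items is measured less precisely.
se_off_target = standard_error(1.0, [-2.0, -1.8, -1.6])

print(se_low_risk, se_high_risk, se_off_target)
```

The two well-targeted item sets give the same standard error even though they share no items, whereas items located far from the person contribute little information.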
7.2.3 Use of Computer Adaptive Test to reduce participants’ burden.
As described in the Introduction (pages 19-20), one major advantage of IRT is that it
allows the implementation of adaptive tests that tailor the difficulty of the test to each
individual participant, an advantage that has long been known to education,170-173 and
human resources selection testing,134,174 and is now increasingly applied to health-
related outcome measurement.135,175,176 The simulation modelling in Study 3177
(pages 94-95 Tables 4.1-3) demonstrated that CAT can be successfully applied to
reduce the length of the Skin Cancer Risk Scale by more than 60%. CAT was also
associated with smaller standard errors compared to non-adaptive testing. This thesis
argues that the use of CAT can improve accuracy and reduce the response burden
when assessing skin cancer risk. Interested readers can test an example of the CAT
used in this thesis at the publisher’s website:
(http://www.jmir.org/article/downloadSuppFile/4736/26665).
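The logic of a computer adaptive test can be illustrated with a small simulation. The sketch below is not the CAT used in Study 3: it assumes dichotomous Rasch items, a hypothetical 40-item bank, expected a posteriori (EAP) scoring with a standard normal prior, and a stopping rule based on a posterior standard deviation of 0.5. All of these choices are assumptions made for illustration.

```python
import math
import random


def rasch_prob(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))


GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to 4


def eap(responses):
    """EAP estimate and posterior SD from a list of (difficulty, response) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for b, x in responses:
            p = rasch_prob(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(GRID, weights)) / total
    return est, math.sqrt(var)


def run_cat(bank, true_theta, se_stop=0.5, rng=None):
    """Administer items adaptively until the posterior SD drops to se_stop."""
    rng = rng or random.Random(0)
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and se > se_stop:
        # For Rasch items the most informative item is the one closest to theta.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        x = 1 if rng.random() < rasch_prob(true_theta, b) else 0  # simulated answer
        responses.append((b, x))
        theta, se = eap(responses)
    return theta, se, len(responses)


# Hypothetical bank: 40 items spread evenly between -3 and 3 logits.
bank = [round(-3.0 + 6.0 * i / 39, 3) for i in range(40)]
theta_hat, se, n_used = run_cat(bank, true_theta=1.0)
print(f"estimate={theta_hat:.2f}, SE={se:.2f}, items used={n_used} of {len(bank)}")
```

Because each new item is chosen close to the current estimate, the stopping rule is reached after only part of the bank has been administered, which is the mechanism behind the reduction in response burden described above.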
THE ASSESSMENT OF SKIN CANCER RISK.
Several self-administered questionnaires have been developed to calculate skin
cancer risk scores, and researchers commonly calculate relative risks or odds ratios
estimated by logistic regression as scoring methods.178-180 A systematic review listed
twenty-five risk models for the prediction of melanoma,181 with 144 possible risk
factors, including 18 different measures of the number of nevi and 26 measures of
UVR exposure. The number of nevi, freckles, history of sunburn, skin colour, and
hair colour were frequently included in the final risk estimation model. All models
had similar discrimination, with area under the curve (AUC) of approximately 0.70 –
0.80,181 which was also similar to the result in Study 5, using the phenotype subscale
score only, which achieved an AUC of 0.72 (page 146, Figure 6.5). A major
weakness of most previous studies has been that only internal validation has been
conducted (using the original development population data set). Only one study180
has been validated in an external population. More recently, two studies attempted to
externally validate the performance of their melanoma risk prediction models. Olsen
and colleagues182 assessed the discriminatory performance of six melanoma
prediction models18,178-180,183,184 by using two independent data sets from The
Epigene185 and the QSkin studies.186 The results showed high discriminatory
performance for the six models with AUC values ranging from 0.73 (95% CI 0.71-
0.75) to 0.93 (95% CI 0.92-0.95). Vuong et al187 developed a melanoma risk
prediction model using the Australian Melanoma Family study,188 and then validated
it externally using four independent population-based studies: the Western Australia
Melanoma Study,189 Leeds Melanoma Case-Control Study,190,191 Epigene-QSkin
study,185,186 and Swedish Women’s Lifestyle and Health Cohort Study.192,193 The
model included hair colour, nevus density, previous melanoma skin cancer, first-
degree family history of melanoma, and lifetime sunbed use. The results showed
high discriminatory performance with AUC statistic of 0.70 (95% CI, 0.67-0.73) and
0.63 (95% CI, 0.60-0.67) to 0.67 (95% CI, 0.65-0.70) for internal validation and
external validation, respectively.
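For readers less familiar with the AUC statistic used to compare these models, the sketch below computes it directly from its definition: the probability that a randomly chosen person with the outcome has a higher risk score than a randomly chosen person without it, with ties counted as one half. The scores are hypothetical and the function name is an assumption.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical subscale scores (logits) for people with and without skin cancer.
cases = [0.9, 1.4, 0.2, 1.1]
controls = [0.1, 0.8, -0.5, 0.3, 1.0]
print(auc(cases, controls))
```

An AUC of 0.5 indicates discrimination no better than chance, and values closer to 1.0 indicate that higher scores are more consistently found among people with the outcome.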
Similar to the overarching aims of the risk prediction models discussed above, the
primary objective of this study was to apply IRT methods to create a reliable, valid,
and precise tool for assessing skin cancer risk. After extensive preparatory work
using data from several existing studies (pages 128-129, Chapter 6), this research
selected 53 items for further testing. It was hypothesised that skin cancer risk could
be measured by three main subscales: phenotype, sun exposure behaviours, and sun
protection behaviours. People with high scores in those scales tend to have a high
probability of getting skin cancer. Each item in the subscales was calibrated using the
Rasch partial credit model and the participant’s risk score was estimated. Scores
from each subscale should be able to discriminate and predict whether the participant
has low or high skin cancer risk. Prospective non-melanoma skin cancer status (Study 4) or
self-reported past skin cancer status (Study 5) was used as the outcome variable. This
work expanded the item bank, with good content coverage, from 29 items in the QSkin
Study to 53 items. The new scales showed moderate to high correlation with existing
tools,54,194 and the discriminatory performance of the phenotype scale with AUC for
self-reported past melanoma of 0.72 (95% CI, 0.65-0.78) showed similar
discrimination compared to previous studies.181,182,187 While it is difficult to compare
results from different studies, as researchers have developed and used different
questions, this new scale has advantages compared to other questionnaires. The new
scale can be used to compare results from different studies through methods called
IRT linking and equating,130,195 and can also be used to create tailored assessment
using computer adaptive tests.177
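The Rasch partial credit model used to calibrate the items can be sketched as follows. For an item with step thresholds delta_1 to delta_m, the probability of responding in category k is proportional to the exponential of the sum of (theta - delta_j) over the first k steps. The threshold values below are hypothetical and do not come from the calibrated item bank.

```python
import math


def pcm_probs(theta, thresholds):
    """Category probabilities (0..m) under the Rasch partial credit model."""
    logits = [0.0]  # category 0 has an empty sum
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    largest = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(v - largest) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]


def expected_score(theta, thresholds):
    """Expected item score for a person at theta."""
    return sum(k * p for k, p in enumerate(pcm_probs(theta, thresholds)))


# Hypothetical four-category item (three step thresholds, in logits).
thresholds = [-1.0, 0.0, 1.2]
for theta in (-2.0, 0.0, 2.0):
    probs = [round(p, 2) for p in pcm_probs(theta, thresholds)]
    print(theta, probs, round(expected_score(theta, thresholds), 2))
```

As the person location increases, probability mass shifts towards the higher response categories, so the expected item score rises with the underlying risk.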
Among the three subscales, the phenotype subscale was found to have the best
discriminatory performance (AUC = 0.72). In contrast, the sun protection behaviours
scale showed low discriminant performance (AUC = 0.36). This exemplifies a
considerable advantage of IRT allowing the user to determine which of the items
contributes most to the risk prediction score, and highlighting where further
development of suitable questions is required. While this was not attempted by the
previous studies mentioned above,182,187 it is likely a reason for the predominant use
of phenotype items in all risk prediction models.182 The low discriminant diagnostic
power of the sun protection items may be due to:
a) Recall effect: Few studies have examined recall bias in self-reported melanoma
risk factors;196-198 all of them asked about sun exposure recall,199-201 and none
of them investigated sun protection recall bias, for example, reporting more
sun protection methods than actually applied. A recall effect on sun
protection behaviours is suspected in the current study, as all self-report
measures are subject to recall errors.202,203 Future studies should investigate
how accurately people can remember their sun protection behaviours and
compare their reports with observations or objective measures.
b) Social desirability bias: Similar to many constructs commonly investigated in
psychology, self-reported measures in this study may be subject to social
desirability.202,204,205 Participants may feel social pressure and report
favourable responses towards sun protection behaviours. Research into
sunscreen application has found that people apply less sunscreen than the
recommended amount of 2 mg/cm², and practise few sun protection
behaviours in day-to-day life.206,207 A study undertaken by Hall et al208 found
social norms supporting sun safety were associated with more sun protection
habits. The “Slip! Slop! Slap!” sun safety campaign may create social
desirability bias in people’s reported behaviour towards sun protection in
Australia. It is one of the most successful health campaigns in Australia’s
history and was launched by the Cancer Council in 1981.209 The objective of
this campaign was to reduce population exposure to sunlight and increase sun
protection to reduce the burden of skin cancer in Australia. The data suggest
that campaigns using the “Slip! Slop! Slap!” slogan continue to have a high
level of recall among adolescents.210 One explanation could be that this
campaign made people aware of the risk of sun exposure and the need to
protect themselves when outside,211 leading to people reporting changing
their behaviours, including wearing hats, sunscreen, and protective
clothing.212 The effectiveness of this campaign is also shown by nine
cross-sectional surveys from 1987 to 2002213 that investigated weekend sun
protection and sunburn in Australia and their association with SunSmart
advertising. The studies found a trend of improvement of sun-protection
behaviours compared with the period prior to the launch of the campaign.
An online survey by the melanoma genetics consortium (GenoMEL)
covering 12 countries (Australia, Germany, Israel, Italy, Latvia, the
Netherlands, Poland, Slovenia, Spain, Sweden, the United Kingdom, and the
United States) reported that Australians had the highest use of sun protection
compared with all other countries.214 Despite this, sunburn prevalence is still
high and comparable to rates 20 years ago, indicating that people may over-report
their sun protection use. If the social desirability bias is high in the current
study, this would mean that these questions cannot be used to discriminate
between people at high and low risk of skin cancer. These hypotheses require
further examination and more studies are needed.
c) All previous risk prediction models for melanoma182,215,216 have focused
mainly on phenotypic factors, such as freckles, number of nevi, hair colour
and skin colour; none included sun protection behaviours. Furthermore, the
validation studies of self-reported sun protection behaviours commonly
assessed criterion validity only, mainly comparing against objective
sunscreen presence,46,217,218 observation or sun-related diaries.219,220 None of
these studies have examined the predictive validity of their measure using
skin cancer status as the outcome variable. It is possible that the low
discriminatory performance of the sun protection subscale in the current
study was caused by a lack of association between non-phenotypic
factors and skin cancer risk. The association may also be mediated
by other variables such as lifestyle,221 beliefs, knowledge, attitude,222-225 and
latitude not measured in this thesis;169 further investigation is required to
confirm this hypothesis.
METHODOLOGICAL CONSIDERATIONS AND FUTURE STUDIES
Relatively few studies in public health, especially in skin cancer research, have used
IRT approaches to develop and evaluate the psychometric properties of their
questionnaires. An important focus of this thesis was to investigate whether using
such approaches would provide benefit for public health researchers, practitioners,
and the general public. This has been partially fulfilled with the finding that using a
skin cancer risk scale in a computer adaptive test mode could be used with high
precision to identify people at risk, and could reduce the response burden. The
research also demonstrated that further work is required to improve the precision of
sun protection behaviours measurement.
7.4.1 Limitations of the research
1. Sample representativeness: Studies 1 to 4 used data from existing
studies, most with large and reasonably representative samples.
Although Study 5 had a large number of participants (n=1,177), it
collected its data based on a convenience sample and most of the
participants were from the Queensland metropolitan area. Therefore,
results may not be representative of the general adult Australian
population, which was 49.4% male and 50.6% female, whereas this study had
more females (76.3%) than males (23.1%). A few studies have investigated the
effect of gender and found that women respond to web surveys at higher
rates than men.226,227 This phenomenon has been previously reported by Sax
et al.226 Future studies will need to replicate the findings of this thesis, and
obtain a diverse sample to test the measurement invariance of the scale
factor structure between different groups. This future work would refine
the scale and provide evidence of its applicability in a representative
population. It is also recommended that an international sample be
recruited, so that these items can be tested for global use.
2. The effects of item sequence (position) in the questionnaire: A number
of studies showed that item difficulty (b parameters) can be influenced by
the order of items in a questionnaire (or the effect of changing item
position) in personality and educational assessment.228-232 In this study,
items were always presented in the following order: phenotype items, sun
exposure behaviours items, and sun protection behaviour items. Effects of
order of presentation, or questionnaire length, especially fatigue, rather
than the items themselves, may have led to the lower discriminative ability
of the sun protection behaviours items. The question of whether subscales
positioned later in a study are answered more uniformly than those
positioned near the beginning was investigated by Galesic, who provided
evidence for such a phenomenon.233 Future studies need to consider effects
of the positioning of items, both for online and paper-and-pencil presentation of
questionnaires.
3. Differences in presentation to locally recruited and online panel: In
this study, a number of items were presented simultaneously on each page
for locally recruited participants, as opposed to the online panel in which
each item was presented one at a time (to ensure data synchronisation with
the third-party panel database). No previous study has investigated the
effect that such a difference in online presentation may have on
participants’ answer patterns. Results from studies that assess the
comparability of a paper and computer version of a questionnaire are
inconsistent,234-237 with some affirming234,235,237 and others denying236 a
difference in answer.
4. Advanced Item Response Theory models: As this study aimed to
demonstrate that IRT is useful in skin cancer risk assessment broadly, the
most commonly used models in IRT were applied so that they would also
be easily accessible for other researchers interested in using this approach.
Psychometricians are increasingly using more advanced statistics,
including multidimensional238-241 and multilevel (hierarchical)
models.242,243 These methods allow for the inclusion of interaction terms
between multiple risk factors and other variables, such as group and time-
specific effects in the model, and should be applied to skin cancer risk self-
reported data in future studies.
5. Longitudinal study design: The cross-sectional study design in Study 5
did not allow for analysis of various aspects of item performance, such as
item parameter stability, item parameter drift, or predictive validity for
future development of melanoma or non-melanoma skin cancers. A
longitudinal study is required to provide more detail on these
aspects of measurement in the future. Furthermore, a longitudinal
study would allow assessment of the sensitivity to change of the skin cancer
risk scale compared to an external anchor, such as objectively measured skin
cancer.
6. The use of objective measurements for validation: A strength of this
study was the availability of prospective skin cancer diagnosis, which was
used in Study 4 to determine the predictive value of the three QSkin
phenotype, sun protection, and sun exposure subscales. Several previous
studies have compared self-reported UVR exposure and sun protection
behaviours scales with objective methods such as UVR dosimeter27,164,244-246
and sunscreen cotton swab results;163 however, this was not conducted in
the current study due to time and budget limitations. Future iterations of
the AusSun scale can be improved by incorporating objective methods and
obtaining prospective clinical skin cancer status from the cancer registry or
Medicare link database.
7.4.2 Suggestions for Future Implementation
This dissertation concludes with suggestions for future implementation of IRT
approaches in skin cancer-related research.
1. Given the significant reduction in response burden through the CAT
presented in Chapter 4, the development of a native app is suggested
(which is installed directly onto the smart phone and can work, in most
cases, without internet connection) for Android and iOS, instead of using
web-based apps for skin cancer risk assessment. The benefits of a native
app are that it can work independently of the browser, work much faster
than a web application by optimising the power of the processor, has the
ability to connect to various wearable devices such as a Fitbit®, and can
access the hardware of the mobile phone, such as the light sensor and GPS.
Nowadays, almost all people have access to a smartphone, in addition to
personal computers. Mobile apps may attract more people to use the scale
and could then be linked with objective data, such as real-time location
[through the internal Global Positioning System (GPS)] and UVR data, that
could be incorporated in more precise prediction models in the future.
2. Extension of the Computer Adaptive Testing module to other languages.
While the present study tested CAT with good success, the current CAT
module has the limitation of being unsuitable for people using languages
other than English. A multilingual interface could be added in the future to
overcome this limitation.
3. Dissemination of the Item Response Theory framework among skin cancer
researchers. Many people in skin cancer research are not trained in item
response theory; it would therefore be beneficial to familiarise them with
this framework so that they could consider using these modern
psychometric approaches in their own research.
Despite its limitations, this thesis could be a first step towards an international skin
cancer risk item bank. This work was presented and discussed with international
experts at the 3rd International Conference on UV and Skin Cancer Prevention held in
Melbourne in 2015, during a pre-conference workshop about surveillance of skin
cancer risk factors. The workshop was attended by world experts in skin cancer
research, including representatives from the National Cancer Institute, Centers for
Disease Control and Prevention, and Cancer Council Victoria. It was initiated to
highlight the need for greater standardisation of questionnaires used in skin cancer
research, as current practice prohibits comparison of findings across countries. Each
country, and even most studies or population-based surveys, currently use slightly
different questions. Many of these questionnaires are used for historical reasons, and
cannot easily be changed, as the performance of individual items and their
contribution to the overall risk estimates is unknown. This lack of standardisation
makes it difficult to compare results across countries. One way to overcome this
problem is to create an International Item Bank for Skin Cancer Risk of items with
known reliability, risk estimates, and discrimination. IRT methods could therefore be
used to link various questionnaires so that their overlap and unique measurement
properties can be explored. This has been achieved successfully in educational
assessment (such as the Trends in International Mathematics and Science Study,247-
249 the Progress in International Reading Literacy Study, and Programme for
International Student Assessment)250-252 and work is underway to provide item
databanks in patient outcome assessment (NIH-PROMIS).253-255 This study can be
the first step in developing an international item bank for skin cancer assessment.
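The linking step described above can be sketched with mean-mean linking under the Rasch model, in which item locations from a new questionnaire are shifted onto a reference scale using the items the two questionnaires share (anchor items). The item names and location values are hypothetical, and mean-mean linking is only one of several available linking methods.

```python
from statistics import mean


def linking_constant(reference, new):
    """Shift that places `new` item locations on the `reference` scale."""
    anchors = sorted(set(reference) & set(new))
    if not anchors:
        raise ValueError("at least one common anchor item is required")
    ref_mean = mean([reference[i] for i in anchors])
    new_mean = mean([new[i] for i in anchors])
    return ref_mean - new_mean


def link_to_reference(new, constant):
    """Apply the linking constant to every item location in `new`."""
    return {item: b + constant for item, b in new.items()}


# Hypothetical calibrations: two anchor items shared, one item unique to each.
reference_bank = {"anchor1": 0.0, "anchor2": 1.0, "ref_only": -0.7}
new_questionnaire = {"anchor1": -0.5, "anchor2": 0.5, "new_only": 2.0}

shift = linking_constant(reference_bank, new_questionnaire)
print(shift)
print(link_to_reference(new_questionnaire, shift))
```

Once the unique items of a new questionnaire are expressed on the reference scale, they can be added to a common item bank and compared with items from other countries.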
Similar to the NIH-PROMIS’ roadmap,256 this thesis proposes a roadmap for the
development of an International Skin Cancer Risk Item Bank in Figure 7.1. The most
important step to accomplish this goal is to explore culturally, ethnically, and
linguistically diverse perceptions of skin cancer risk, sun exposure behaviours, and
sun protection behaviours. Studies have shown different patterns of sun exposure and
protection behaviours between cultures;257-260 it is therefore very important to have
an item bank that is linguistically equivalent, culturally relevant, and
psychometrically sound. It should be provided in multiple languages, supporting the
development of a standardised skin cancer risk assessment and allowing cross-cultural
skin cancer research to be conducted. The 53 items of the SunAus scale developed in the
present research could be the first to be entered into the item database.
Figure 7.1: Roadmap for an International Skin Cancer Risk Item Bank
[Figure 7.1 shows four stages: (1) Items from participating country: item writing
(anchor and unique items) and item mapping; (2) Field test: questionnaire
administered to a large representative sample in each country; (3) International
Item Bank: IRT calibrated item bank reviewed for reliability, validity, and
sensitivity; (4) CAT SKIN: CAT version of the International Item Bank for Skin
Cancer Risk.]
* This work needs to be done by an international community/consortium
CONCLUSION
This thesis explored a new approach to skin cancer-related measurement that is
rarely discussed in current health literature. The thesis provided empirical evidence
for the benefits of using IRT in questionnaire development and psychometric testing
in skin cancer research. The studies conducted in this thesis have therefore
demonstrated some of the advantages of IRT in various applications. The large
number of participants in this study and the successful implementation of the
phenotype subscale in discriminating people’s risk make the findings relevant to the
research community and also provide directions for future studies.
This new scale shows improvement compared to previous measures, such as wider
content coverage; being more precise, as it provides the exact location of individuals
and items on the skin cancer risk continuum; and being more economical, as it is able to
predict risk with fewer items. These findings offer not only the initial evidence of the
usefulness of IRT in analysing and developing skin cancer-related questionnaires, but
also demonstrate the potential of its application to reduce participant burden without
compromising measurement precision via the implementation of computer adaptive
testing.
Lastly, IRT is not a universal solution for every assessment problem: it does not
correct biased items or poor predictive ability. Nor is IRT a substitute for
classical methods, which were influential and remain important. However,
IRT is a valuable tool that can and should be used to increase the quality of
assessment in epidemiological research. More research is required to demonstrate a
greater impact of IRT, especially within the field of skin cancer research and
practice. The major findings and implications for practice should contribute
to knowledge generation, clinical practice, and policy in the field of
skin cancer prevention. In conclusion, the new IRT-based skin cancer risk
scale appears to be a promising tool for the assessment of skin cancer risk and is
recommended for use in other studies.
References
1. Geller AC, Emmons K, Brooks DR, et al. Skin cancer prevention and
detection practices among siblings of patients with melanoma. Journal of the
American Academy of Dermatology. 2003; 49(4):631-638.
2. Hirst NG, Gordon LG, Scuffham PA, Green AC. Lifetime Cost-Effectiveness
of Skin Cancer Prevention through Promotion of Daily Sunscreen Use. Value
in Health. 2012; 15(2):261-268.
3. Kyrgidis A, Tzellos TG, Vahtsevanos K, Triaridis S. New Concepts for Basal
Cell Carcinoma. Demographic, Clinical, Histological Risk Factors, and
Biomarkers. A Systematic Review of Evidence Regarding Risk for Tumor
Development, Susceptibility for Second Primary and Recurrence. Journal of
Surgical Research. 2010; 159(1):545-556.
4. Sánchez G, Nova J, de la Hoz F. Risk Factors for Basal Cell Carcinoma: A
Study From the National Dermatology Center of Colombia. Actas Dermo-
Sifiliográficas (English Edition). 2012; 103(4):294-300.
5. Wright TI, Spencer JM, Flowers FP. Chemoprevention of nonmelanoma skin
cancer. Journal of the American Academy of Dermatology. 2006;
54(6):933-946.
6. Janda M, Kimlin M, Whiteman D, Aitken J, Neale R. Sun protection and low
levels of vitamin D: are people concerned? Cancer Causes Control.
2007; 18(9):1015-1019.
7. Van Der Pols JC, Russell A, Bauer U, Neale RE, Kimlin MG, Green AC.
Vitamin D status and skin cancer risk independent of time outdoors: 11-year
prospective study in an Australian community. Journal of Investigative
Dermatology. 2013; 133(3):637-641.
8. Janda M, Youl P, Bolz K, Niland C, Kimlin M. Knowledge about health
benefits of vitamin D in Queensland Australia. Preventive Medicine. 2010;
50(4):215-216.
9. Youl PH, Janda M, Kimlin M. Vitamin D and sun protection: The impact of
mixed public health messages in Australia. International Journal of Cancer.
2009; 124(8):1963-1970.
10. Jayaratne N, Russell A, van der Pols JC. Sun protection and vitamin D status
in an Australian subtropical community. Preventive Medicine. 2012;
55(2):146-150.
11. Garland CF, Garland FC, Gorham ED, Lipkin M, et al. The Role of Vitamin
D in Cancer Prevention. American Journal of Public Health. 2006;
96(2):252-261.
12. Hughes A, Hoffman J, Hoffman A. Vitamin D and sun exposure: To bare all
or cover up? Expert Review of Dermatology. 2012; 7(6):495-497.
13. Australian Institute of Health and Welfare. Australian Cancer Incidence and
Mortality (ACIM) Books. Canberra: Australian Institute of Health and
Welfare; 2012.
14. McGrath J, Kimlin M, Saha S, Eyles D, Parisi A. Vitamin D insufficiency in
south-east Queensland. Medical Journal of Australia. 2001; 174(3):150.
15. Nowson CA, Margerison C. Vitamin D intake and vitamin D status of
Australians. Medical Journal of Australia. 2002; 177(3):149-152.
16. Marks R, Staples M, Giles GG. Trends in non-melanocytic skin cancer
treated in Australia: The second national survey. International Journal of
Cancer. 1993; 53(4):585-590.
17. Fransen M, Karahalios A, Sharma N, English DR, Giles GG, Sinclair RD.
Non-melanoma skin cancer in Australia. Medical Journal of Australia. 2012;
197(10):565-568.
18. MacKie R, Freudenberger T, Aitchison T. Personal risk-factor chart for
cutaneous melanoma. The Lancet. 1989; 334(8661):487-490.
19. Tacke J, Dietrich J, Steinebrunner B, Reifferscheid A. Assessment of a new
questionnaire for self-reported sun sensitivity in an occupational skin cancer
screening program. BMC Dermatology. 2008; 8(1):4.
20. Weinstock MA. Assessment of sun sensitivity by questionnaire: validity of
items and formulation of a prediction rule. Journal of Clinical Epidemiology.
1992; 45(5):547-552.
21. Tacke J, Dietrich J, Steinebrunner B, Reifferscheid A. Assessment of a new
questionnaire for self-reported sun sensitivity in an occupational skin cancer
screening program. BMC Dermatology. 2008; 8(1):1-10.
22. Gillespie H, Watson T, Emery J, Lee A, Murchie P. A questionnaire to
measure melanoma risk, knowledge and protective behaviour: Assessing
content validity in a convenience sample of Scots and Australians. BMC
Medical Research Methodology. 2011; 11(1):123.
23. Morales-Sánchez MA, Peralta-Pedrero ML, Domínguez-Gómez MA.
Validation of a questionnaire to quantify the risk for skin cancer. Gaceta
Medica de Mexico. 2014; (150):409-419.
24. Humayun Q, Iqbal R, Azam I, Khan A, Siddiqui A, Baig-Ansari N.
Development and validation of sunlight exposure measurement questionnaire
(SEM-Q) for use in adult population residing in Pakistan. BMC Public
Health. 2012; 12(1):421.
25. de Troya-Martín M, Blázquez-Sánchez N, Rivas-Ruiz F, et al. Validation of a
Spanish Questionnaire to Evaluate Habits, Attitudes, and Understanding of
Exposure to Sunlight: “The Beach Questionnaire”. Actas Dermo-
Sifiliográficas (English Edition). 2009; 100(7):586-595.
26. Glanz K, Schoenfeld E, Weinstock MA, Layi G, Kidd J, Shigaki DM.
Development and reliability of a brief skin cancer risk assessment tool.
Cancer Detection and Prevention. 2003; 27(4):311-315.
27. Dwyer T, Blizzard L, Gies P, Ashbolt R, Roy C. Assessment of habitual sun
exposure in adolescents via questionnaire--a comparison with objective
measurement using polysulphone badges. Melanoma Research. 1996;
6(3):231.
28. McCarty CA. Sunlight exposure assessment: can we accurately assess
vitamin D exposure from sunlight questionnaires? The American Journal of
Clinical Nutrition. 2008; 87(4):1097S-1101S.
29. Crescentini A, Zanolla G. The Evaluation of Mathematical Competency:
Elaboration of a Standardized Test in Ticino (Southern Switzerland).
Procedia - Social and Behavioral Sciences. 2014; 112:180-189.
30. Spooren AIF, Arnould C, Smeets RJEM, Bongers HMH, Seelen HAM.
Improvement of the Van Lieshout hand function test for Tetraplegia using a
Rasch analysis. Spinal Cord. 2013; 51(10):739-744.
31. El-Korashy AF. Applying the Rasch Model to the Selection of Items for a
Mental Ability Test. Educational and Psychological Measurement. 1995;
55(5):753-763.
32. Lee SH. Multidimensional item response theory: A SAS MDIRT macro and
empirical study of PIAT math test [Ph.D.]. Ann Arbor, The University of
Oklahoma; 2007.
33. Tripp MK, Carvajal SC, McCormick LK, et al. Validity and
reliability of the Parental Sun Protection Scales. Health Education Research.
2003; 18(1).
34. Day AK, Wilson C, Roberts RM, Hutchinson AD. The Skin Cancer and Sun
Knowledge (SCSK) Scale: Validity, Reliability, and Relationship to Sun-
Related Behaviors Among Young Western Adults. Health Education &
Behavior. 2014; 41(4):440-448.
35. Staples MP, Elwood M, Burton RC, Williams JL, et al. Non-melanoma skin
cancer in Australia: the 2002 national survey and trends since 1985. Medical
Journal of Australia. 2006; 184(1):6-10.
36. Diffey BL, Norridge Z. Reported sun exposure, attitudes to sun protection
and perceptions of skin cancer risk: a survey of visitors to Cancer Research
UK’s SunSmart campaign website. British Journal of Dermatology. 2009;
160(6):1292-1298.
37. Newman WG, Agro AD, Woodruff SI, Mayer JA. A survey of recreational
sun exposure of residents of San Diego, California. American Journal of
Preventive Medicine. 1996.
38. Tempark T, Chatproedprai S, Wananukul S. Attitudes, knowledge, and
behaviors of secondary school adolescents regarding protection from sun
exposure: a survey in Bangkok, Thailand. Photodermatology,
Photoimmunology & Photomedicine. 2012; 28(4):200-206.
39. Falk M, Anderson CD. Measuring sun exposure habits and sun protection
behaviour using a comprehensive scoring instrument – An illustration of a
possible model based on Likert scale scorings and on estimation of readiness
to increase sun protection. Cancer Epidemiology. 2012; 36(4):e265-e269.
40. Jennings L, Karia PS, Jambusaria-Pahlajani A, Whalen FM, Schmults CD.
The Sun Exposure and Behaviour Inventory (SEBI): Validation of an
instrument to assess sun exposure and sun protective practices. Journal of the
European Academy of Dermatology and Venereology. 2013; 27(6):706-715.
41. Wild D, Eremenco S, Mear I, et al. Multinational Trials—Recommendations
on the Translations Required, Approaches to Using the Same Language in
Different Countries, and the Approaches to Support Pooling the Data: The
ISPOR Patient-Reported Outcomes Translation and Linguistic Validation
Good Research Practices Task Force Report. Value in Health. 2009;
12(4):430-440.
42. Gothwal VK, Bagga DK, Sumalini R. Rasch validation of the PHQ-9 in
people with visual impairment in South India. Journal of Affective Disorders.
2014; 167:171-177.
43. Lundgren-Nilsson Å, Dencker A, Jakobsson S, Taft C, Tennant A. Construct
Validity of the Swedish Version of the Revised Piper Fatigue Scale in an
Oncology Sample—A Rasch Analysis. Value in Health. 2014; 17(4):360-363.
44. Pilatti A, Read JP, Vera BdV, Caneto F, Garimaldi JA, Kahler CW. The
Spanish version of the Brief Young Adult Alcohol Consequences
Questionnaire (B-YAACQ): A Rasch Model analysis. Addictive Behaviors.
2014; 39(5):842-847.
45. American Educational Research Association, American Psychological
Association, National Council on Measurement in Education. Standards for
educational and psychological testing. Amer Educational Research Assn;
1999.
46. Glanz K, McCarty F, Nehl EJ, et al. Validity of Self-Reported Sunscreen Use
by Parents, Children, and Lifeguards. American Journal of Preventive
Medicine. 2009; 36(1):63-69.
47. Hedges T, Scriven A. Young park users' attitudes and behaviour to sun
protection. Global Health Promotion. 2010; 17(4):24-31,90,96.
48. Horsburgh-McLeod GF, Gray AR, Reeder AI, McGee R. Applying Item
Response Theory (IRT) to a suntan attitudes scale. Australasian
Epidemiologist. 2010; 17(1):40.
49. van der Linden WJ, Hambleton RK. Handbook of modern item response
theory. Springer Science & Business Media; 2013.
50. Coory M, Baade P, Aitken J, Smithers M, McLeod GRC, Ring I. Trends for
in situ and invasive melanoma in Queensland, Australia, 1982–2002. Cancer
Causes Control. 2006; 17(1):21-27.
51. Jones WO, Harman CR, Ng AK, Shaw JH. Incidence of malignant melanoma
in Auckland, New Zealand: highest rates in the world. World Journal of
Surgery. 1999; 23(7):732-735.
52. Green A, Siskind V. Geographical distribution of cutaneous melanoma in
Queensland. The Medical Journal of Australia. 1983; 1(9):407.
53. Jennings L, Karia PS, Jambusaria-Pahlajani A, Whalen FM, Schmults CD.
The Sun Exposure and Behaviour Inventory (SEBI): validation of an
instrument to assess sun exposure and sun protective practices. Journal of the
European Academy of Dermatology and Venereology. 2012.
54. Glanz K, Yaroch AL, Dancel M, et al. Measures of sun exposure and sun
protection practices for behavioral and epidemiologic research. Archives of
Dermatology. 2008; 144(2):217-222.
55. Oliveria SA, Saraiya M, Geller AC, Heneghan MK, Jorgensen C. Sun
exposure and risk of melanoma. Archives of Disease in Childhood. 2006;
91(2):131-138.
56. Armstrong BK. How sun exposure causes skin cancer: an epidemiological
perspective. Prevention of skin cancer: Springer; 2004:89-116.
57. Vu LH, van der Pols JC, Whiteman DC, Kimlin MG, Neale RE. Knowledge
and Attitudes about Vitamin D and Impact on Sun Protection Practices among
Urban Office Workers in Brisbane, Australia. Cancer Epidemiology
Biomarkers & Prevention. 2010; 19(7):1784-1789.
58. Dobbinson S, Wakefield M, Hill D, et al. Children's sun exposure and sun
protection: Prevalence in Australia and related parental factors. Journal of the
American Academy of Dermatology. 2012; 66(6):938-947.
59. Schofield PE, Freeman JL, Dixon HG, Borland R, Hill DJ. Trends in sun
protection behaviour among Australian young adults. Australian and New
Zealand Journal of Public Health. 2001; 25(1):62-65.
60. Bränström R, Kristjansson S, Ullen H, Brandberg Y. Stability of
questionnaire items measuring behaviours, attitudes and stages of change
related to sun exposure. Melanoma Research. 2002; 12(5):513-519.
61. Detert H, Hedlund S, Anderson CD, et al. Validation of sun exposure and
protection index (SEPI) for estimation of sun habits. Cancer Epidemiology.
2015; 39(6):986-993.
62. Dusza SW, Oliveria SA, Geller AC, Marghoob AA, Halpern AC. Student-
parent agreement in self-reported sun behaviors. Journal of the American
Academy of Dermatology. 2005; 52(5):896-900.
63. Morze CJ, Olsen CM, Perry SL, et al. Good test-retest reproducibility for an
instrument to capture self-reported melanoma risk factors. Journal of Clinical
Epidemiology. 2012; 65(12):1329-1336.
64. Bond TG, Fox CM. Applying the Rasch model: Fundamental measurement in
the human sciences. Psychology Press; 2013.
65. Fries J, Bruce B, Cella D. The promise of PROMIS: using item response
theory to improve assessment of patient-reported outcomes. Clinical and
Experimental Rheumatology. 2005; 23(5):S53.
66. Schulz W. Validating Questionnaire Constructs in International Studies: Two
Examples from PISA 2000. Melbourne, Australia: Australian Council for
Educational Research; 2003.
67. Gonzalez EJ, Galia J, Li I. Scaling methods and procedures for the TIMSS
2003 mathematics and science scales. TIMSS. 2003:252-273.
68. Beck CT, Gable RK. Postpartum Depression Screening Scale: development
and psychometric testing. Nursing Research. 2000; 49(5):272-282.
69. Revicki DA, Chen WH, Frank L, Feltner D, Morlock R. Development and
Analysis of Item Response Theory-based Short-form Depression Severity
Scales Based on the HDRS and MADRS. Health Outcomes Research in
Medicine. 2010; 1(2):e111-e122.
70. Levine SZ, Rabinowitz J, Rizopoulos D. Recommendations to improve the
Positive and Negative Syndrome Scale (PANSS) based on item response
theory. Psychiatry Research. 2011; 188(3):446-452.
71. Spence R, Owens M, Goodyer I. Item response theory and validity of the
NEO-FFI in adolescents. Personality and Individual Differences. 2012;
53(6):801-807.
72. Wainer H, Wang X. Using a New Statistical Model for Testlets to Score
TOEFL. Journal of Educational Measurement. 2000; 37(3):203-220.
73. Kingston NM, Dorans NJ. The Feasibility of Using Item Response Theory as
a Psychometric Model for the GRE Aptitude Test. GRE Board Professional
report GREB No. 79-12P. ETS Research Report 82-12. 1982.
74. Kingston N. An Exploratory Study of the Applicability of Item Response
Theory Methods to the Graduate Management Admission Test. Distributed
by ERIC Clearinghouse [Washington, D.C.]; 1985.
<http://www.eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=ED268141>.
75. McKinley RL, Kingston NM. Exploring the use of IRT equating for the GRE
subject test in mathematics. Educational Testing Service; 1987.
76. Zara AR. Using Computerized Adaptive Testing to Evaluate Nurse
Competence for Licensure: Some History and Forward Look. Advances in
Health Sciences Education. 1999; 4(1):39-48.
77. Downing SM. Item response theory: applications of modern test theory in
medical education. Medical Education. 2003; 37(8):739-745.
78. Traub RE. Classical test theory in historical perspective. Educational
Measurement: Issues and Practice. 1997; 16(4):8-14.
79. Anastasi A, Urbina S. Psychological testing (7th ed). New York, Upper
Saddle River: Macmillan; 1997.
80. Hambleton RK, Jones RW. Comparison of classical test theory and item
response theory and their applications to test development. Educational
Measurement: Issues and Practice. 1993; 12(3):38-47.
81. Lord FM. Applications of item response theory to practical testing problems.
Routledge; 1980.
82. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response
theory. Newbury Park, California: Sage Publications; 1991.
83. Streiner DL. Measure for Measure: New Developments in Measurement and
Item Response Theory. Canadian Journal of Psychiatry. 2010; 55(3):180-
186.
84. Hays RD, Morales LS, Reise SP. Item response theory and health outcomes
measurement in the 21st century. Medical Care. 2000; 38(9 Suppl):II28.
85. Molenaar IW. Some Background for Item Response Theory and the Rasch
Model. In: Fischer GH, Molenaar IW, eds. Rasch Models: Foundations,
Recent Developments, and Applications. New York, NY: Springer;
1995:3-14.
86. Bock RD. A Brief History of Item Theory Response. Educational
Measurement: Issues and Practice. 1997; 16(4):21-33.
87. Baker FB. The basics of item response theory. ERIC; 2001.
88. Yen WM. The Choice of Scale for Educational Measurement: An Art
Perspective. Journal of Educational Measurement. 1986; 23(4):299-325.
89. Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to
questionnaire development, evaluation, and refinement. Qual Life Res.
2007; 16(1):5-18.
90. Reeve BB, Fayers P. Applying item response theory modeling for evaluating
questionnaire item and scale properties. Assessing quality of life in clinical
trials: methods of practice. 2005; 2:55-73.
91. Kline T. Psychological testing: A practical approach to design and
evaluation. Sage; 2005.
92. Osterlind SJ, Everson HT. Differential Item Functioning. Thousand Oaks,
CA: SAGE Publications, Inc.; 2010.
93. Walker CM. What's the DIF? Why Differential Item Functioning Analyses
Are an Important Part of Instrument Development and Validation. Journal of
Psychoeducational Assessment. 2011.
94. Abroms L, Jorgensen CM, Southwell BG, Geller AC, Emmons KM. Gender
differences in young adults’ beliefs about sunscreen use. Health Education &
Behavior. 2003; 30(1):29-43.
95. Groenvold M, Bjorner JB, Klee MC, Kreiner S. Test for item bias in a quality
of life questionnaire. Journal of Clinical Epidemiology. 1995; 48(6):805-816.
96. Camilli G, Shepard LA. Methods for identifying biased test items. Sage; 1994.
97. Clauser BE, Mazor KM. Using Statistical Procedures to Identify
Differentially Functioning Test Items. Educational Measurement: Issues and
Practice. 1998; 17(1):31-44.
98. Douglas JA, Roussos LA, Stout W. Item-Bundle DIF Hypothesis Testing:
Identifying Suspect Bundles and Assessing Their Differential Functioning.
Journal of Educational Measurement. 1996; 33(4):465-484.
99. Allalouf A, Hambleton RK, Sireci SG. Identifying the Causes of DIF in
Translated Verbal Items. Journal of Educational Measurement. 1999;
36(3):185-198.
100. Novick MR. The axioms and principal results of classical test theory. Journal
of Mathematical Psychology. 1966; 3(1):1-18.
101. Fan X. Item Response Theory and Classical Test Theory: An Empirical
Comparison of their Item/Person Statistics. Educational and Psychological
Measurement. 1998; 58(3):357-381.
102. Algina J, Crocker L. Introduction to classical and modern test theory. New
York: Wadsworth Publishing; 1986.
103. Degreef E, Buggenhaut JV, eds. Trends in mathematical psychology.
Amsterdam: North-Holland; 1984. Mathematical social sciences; No. 13.
104. De Ayala RJ. The theory and practice of item response theory. New York:
Guilford Press; 2009.
105. Green JL, Camilli G, Elmore PB. Handbook of complementary methods in
education research. Routledge; 2012.
106. Hays RD, Morales LS, Reise SP. Item Response Theory and Health
Outcomes Measurement in the 21st Century. Medical Care. 2000; 38(9
Suppl):II28-II42.
107. Hambleton RK, Swaminathan H. Item response theory: Principles and
applications. Springer Science & Business Media; 2013.
108. Van der Linden WJ, Glas CAW, Interuniversitair Centrum voor
Onderwijskundig Onderzoek. Computerized adaptive testing : theory and
practice. Dordrecht; Boston: Kluwer Academic; 2000.
109. Revicki DA, Cella DF. Health status assessment for the twenty-first century:
item response theory, item banking and computer adaptive testing. Qual Life
Res. 1997; 6(6):595-600.
110. Sygna K, Johansen S, Ruland CM. Recruitment challenges in clinical
research including cancer patients and caregivers. Trials. 2015; 16(1):1-9.
111. Ong AD, Van Dulmen MH. Oxford handbook of methods in positive
psychology. New York: Oxford University Press; 2007.
112. Nering ML, Ostini R. Handbook of polytomous item response theory models.
Taylor & Francis; 2011.
113. Ostini R, Nering ML. Polytomous item response theory models. Sage; 2006.
114. Drasgow F, Levine MV, Tsien S, Williams B, Mead AD. Fitting Polytomous
Item Response Theory Models to Multiple-Choice Tests. Applied
Psychological Measurement. 1995; 19(2):143-166.
115. Kolen MJ, Zeng L, Hanson BA. Conditional Standard Errors of Measurement
for Scale Scores Using IRT. Journal of Educational Measurement. 1996;
33(2):129-140.
116. Friborg O, Martinussen M, Rosenvinge JH. Likert-based vs. semantic
differential-based scorings of positive psychological constructs: A
psychometric comparison of two versions of a scale measuring resilience.
Personality and Individual Differences. 2006; 40(5):873-884.
117. Prieto L, Alonso J, Lamarca R. Classical test theory versus Rasch analysis for
quality of life questionnaire reduction. Health and Quality of Life Outcomes.
2003; 1:27.
118. Downey RG, King CV. Missing Data in Likert Ratings: A Comparison of
Replacement Methods. The Journal of General Psychology. 1998;
125(2):175-191.
119. Schaeffer GA, Bridgeman B, Golub-Smith ML, Lewis C, Potenza MT,
Steffen M. Comparability of paper-and-pencil and computer adaptive test
scores on the GRE General Test. ETS Research Report Series. 1998.
120. Wainer H. CATs: Whither and whence. Psicológica: revista de metodología y
psicología experimental. 2000; 21(1):121-134.
121. Jette AM, Haley SM, Tao W, et al. Prospective evaluation of the AM-PAC-
CAT in outpatient rehabilitation settings. Physical Therapy. 2007; 87(4):385-
398.
122. Gardner W, Shear K, Kelleher K, et al. Computerized adaptive measurement
of depression: A simulation study. BMC Psychiatry. 2004; 4(1):13.
123. Wainer H, Dorans NJ, Green BF, et al. Computerized adaptive testing: A
primer. Lawrence Erlbaum Associates, Inc; 1990.
124. Thompson NA, Weiss DJ. A framework for the development of computerized
adaptive tests. Practical Assessment, Research, and Evaluation. 2011; 16(1).
125. Embretson SE, Reise SP. Item Response Theory. Hoboken: Taylor and
Francis; 2013: http://QUT.eblib.com.au/patron/FullRecord.aspx?p=1166563.
126. Boyd AM. Strategies for controlling testlet exposure rates in computerized
adaptive testing systems [3110732]. United States -- Texas, The University of
Texas at Austin; 2003.
127. Orlando M, Sherbourne CD, Thissen D. Summed-score linking using item
response theory: Application to depression measurement. Psychological
Assessment. 2000; 12(3):354-359.
128. Kolen MJ, Brennan RL. Test Equating, Scaling, and Linking. New York:
Springer 2014.
129. Stocking M, Lord FM. Developing a common metric in item response theory.
Appl Psychol Meas. 1983; 7.
130. Kim SH, Cohen AS. A Comparison of Linking and Concurrent Calibration
Under Item Response Theory. Applied Psychological Measurement. 1998;
22(2):131-143.
131. Schalet BD, Rothrock NE, Hays RD, et al. Linking Physical and Mental
Health Summary Scores from the Veterans RAND 12-Item Health Survey
(VR-12) to the PROMIS® Global Health Scale. Journal of General Internal
Medicine. 2015; 30(10):1524-1530.
132. Schalet BD, Revicki DA, Cook KF, Krishnan E, Fries JF, Cella D.
Establishing a Common Metric for Physical Function: Linking the HAQ-DI
and SF-36 PF Subscale to PROMIS® Physical Function. Journal of General
Internal Medicine. 2015; 30(10):1517-1523.
133. Linn RL. Linking Results of Distinct Assessments. Applied Measurement in
Education. 1993; 6(1):83-102.
134. Kantrowitz TM, Dawson CR, Fetzer MS. Computer Adaptive Testing (CAT):
A Faster, Smarter, and More Secure Approach to Pre-Employment Testing.
Journal of Business and Psychology. 2011; 26(2):227-232.
135. Fayers PM. Applying item response theory and computer adaptive testing: the
challenges for health outcomes assessment. Qual Life Res. 2007; 16(1):187-
194.
136. Rebollo P, Castejon I, Cuervo J, et al. Validation of a computer-adaptive test
to evaluate generic health-related quality of life. Health and Quality of Life
Outcomes. 2010; 8(1):147.
137. Rao CR, Sinharay S. Handbook of statistics: Psychometrics. Vol 26: Elsevier;
2006.
138. Kline P. Handbook of psychological testing. Routledge; 2013.
139. Hambleton RK. Fundamentals of item response theory. Vol 2: Sage
publications; 1991.
140. Groth-Marnat G. Handbook of psychological assessment. John Wiley & Sons;
2009.
141. Hambleton RK. Test score validity and standard-setting methods. Criterion-
referenced measurement: The state of the art. 1980; 80:123.
142. Cizek GJ, Bunch MB. Standard setting: A guide to establishing and
evaluating performance standards on tests. SAGE Publications Ltd; 2007.
143. Sijtsma K, Hemker BT. A taxonomy of IRT models for ordering persons and
items using simple sum scores. Journal of Educational and Behavioral
Statistics. 2000; 25(4):391-415.
144. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response
theory. Newbury Park, California: Sage; 1991.
145. Wolfe F, Michaud K, Pincus T. Development and validation of the health
assessment questionnaire II: A revised version of the health assessment
questionnaire. Arthritis & Rheumatism. 2004; 50(10):3296-3305.
146. Lord F. A Theory of Test Scores (Psychometric Monograph No.7).
Richmond, VA: Psychometric Corporation; 1952.
147. Birnbaum A. Some latent trait models and their use in inferring an
examinee's ability. Statistical theories of mental test scores. 1968:395-479.
148. Andrich D. Application of a psychometric rating model to ordered categories
which are scored with successive integers. Applied Psychological
Measurement. 1978; 2:325-359.
149. Djaja N, Youl P, Aitken J, Janda M. Evaluation of a skin self examination
attitude scale using an item response theory model approach. Health and
Quality of Life Outcomes. 2014; 12(1):189.
150. Harris D. Comparison of 1-, 2-, and 3-Parameter IRT Models. Educational
Measurement: Issues and Practice. 1989; 8(1):35-41.
151. DeMars C. Item response theory. Oxford University Press, USA; 2010.
152. Nguyen T, Han HR, Kim M, Chan K. An Introduction to Item Response
Theory for Patient-Reported Outcome Measurement. Patient. 2014;
7(1):23-35.
153. Barnes LLB, Wise SL. The Utility of a Modified One-Parameter IRT Model
With Small Samples. Applied Measurement in Education. 1991;
4(2):143-157.
154. Embretson SE, Hershberger SL. The new rules of measurement: what every
psychologist and educator should know. Mahwah, NJ: Erlbaum Associates;
1999.
155. Paek I, Han KT. IRTPRO 2.1 for Windows (Item Response Theory for
Patient-Reported Outcomes). Applied Psychological Measurement. 2013;
37(3):242-252.
156. Guyer R, Thompson NA. User's manual for XCalibre 4.1 [computer
program]. St. Paul MN: Assessment Systems Corporation; 2011.
157. Teresi JA. Overview of quantitative measurement methods: Equivalence,
invariance, and differential item functioning in health applications. Medical
Care. 2006; 44(11):S39-S49.
158. Downing SM, Haladyna TM. Handbook of test development. L. Erlbaum
Mahwah, NJ; 2006.
159. Wilson M. Constructing measures: An item response modeling approach.
Mahwah, New Jersey: Lawrence Erlbaum Associates; 2005.
160. Aygun O, Ergun A. Validity and Reliability of Sun Protection Behavior Scale
among Turkish Adolescent Population. Asian Nursing Research. 2015;
9(3):235-242.
161. Wu S, Ho SC, Lam TP, et al. Development and validation of a lifetime
exposure questionnaire for use among Chinese populations. Scientific
Reports. 2013; 3:2793.
162. Borschmann RD, Cottrell D. Developing the readiness to alter sun-protective
behaviour questionnaire (RASP-B). Cancer Epidemiology. 2009; 33(6):451-
462.
163. O'Riordan DL, Lunde KB, Steffen AD, Maddock JE. Validity of beachgoers'
self-report of their sun habits. Archives of Dermatology. 2006;
142(10):1304-1311.
164. Cargill J, Lucas RM, Gies P, et al. Validation of brief questionnaire measures
of sun exposure and skin pigmentation against detailed and objective
measures including vitamin D status. Photochemistry and Photobiology.
2013; 89(1):219-226.
165. Reid CA, Kolakowsky-Hayner SA, Lewis AN, Armstrong AJ. Modern
Psychometric Methodology: Applications of Item Response Theory.
Rehabilitation Counseling Bulletin. 2007; 50(3):177-188.
166. Hambleton RK. Good practices for identifying differential item functioning.
Medical Care. 2006; 44(11):S182-S188.
167. Edelen MO, Reeve BB. Applying item response theory (IRT) modeling to
questionnaire development, evaluation, and refinement. Qual Life Res. 2007;
16:5-18.
168. Hambleton RK. Emergence of Item Response Modeling in Instrument
Development and Data Analysis. Medical Care. 2000; 38(9):II60-II65.
169. Djaja N, Janda M, Lucas RM, et al. Self-Reported Changes in Sun-Protection
Behaviours at different latitudes in Australia. Photochemistry and
Photobiology. 2016.
170. Weiss DJ, Kingsbury GG. Application of Computerized Adaptive Testing to
Educational Problems. Journal of Educational Measurement. 1984;
21(4):361-375.
171. Chalhoub-Deville M, Deville C. Computer Adaptive Testing in Second
Language Contexts. Annual Review of Applied Linguistics. 1999; 19:273-299.
172. Weiss DJ. Computerized adaptive testing for effective and efficient
measurement in counseling and education. Measurement and Evaluation in
Counseling and Development. 2004; 37(2):70.
173. Lilley M, Barker T, Britton C. The development and evaluation of a software
prototype for computer-adaptive testing. Computers & Education. 2004;
43(1–2):109-123.
174. Tippins NT, Beaty J, Drasgow F, et al. Unproctored Internet Testing in
Employment Settings. Personnel Psychology. 2006; 59(1):189-225.
175. Crins MHP, Roorda LD, Smits N, et al. Calibration and Validation of the
Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with
Chronic Pain. PLoS ONE. 2015; 10(7):e0134094.
176. Dijkers MP. A computer adaptive testing simulation applied to the FIM
instrument motor component. Archives of Physical Medicine and
Rehabilitation. 2003; 84(3):384-393.
177. Djaja N, Janda M, Olsen CM, Whiteman DC, Chien TW. Estimating Skin
Cancer Risk: Evaluating Mobile Computer-Adaptive Testing. Journal of
Medical Internet Research. 2016; 18(1):e22.
178. Quereux G, Moyse D, Lequeux Y, et al. Development of an individual score
for melanoma risk. European Journal of Cancer Prevention. 2011; 20(3):217-
224.
179. Mar V, Wolfe R, Kelly JW. Predicting melanoma risk for the Australian
population. Australasian Journal of Dermatology. 2011; 52(2):109-116.
180. Fortes C, Mastroeni S, Bakos L, et al. Identifying individuals at high risk of
melanoma: a simple tool. European Journal of Cancer Prevention. 2010;
19(5):393-400.
181. Usher-Smith JA, Emery J, Kassianos AP, Walter FM. Risk Prediction Models
for Melanoma: A Systematic Review. Cancer Epidemiology Biomarkers &
Prevention. 2014; 23(8):1450-1463.
182. Olsen CM, Neale RE, Green AC, et al. Independent Validation of Six
Melanoma Risk Prediction Models. J Invest Dermatol. Published online 29 January 2015.
183. Fears TR, Guerry D, Pfeiffer RM, et al. Identifying Individuals at High Risk
of Melanoma: A Practical Predictor of Absolute Risk. Journal of Clinical
Oncology. 2006; 24(22):3590-3596.
184. Cho E, Rosner BA, Feskanich D, Colditz GA. Risk Factors and Individual
Probabilities of Melanoma for Whites. Journal of Clinical Oncology. 2005;
23(12):2669-2675.
185. Kvaskoff M, Pandeya N, Green AC, et al. Site-Specific Determinants of
Cutaneous Melanoma: A Case–Case Comparison of Patients with Tumors
Arising on the Head or Trunk. Cancer Epidemiology Biomarkers &
Prevention. 2013; 22(12):2222-2231.
186. Olsen CM, Green AC, Neale RE, et al. Cohort profile: The QSkin Sun and
Health Study. International Journal of Epidemiology. 2012;
41(4):929-929i.
187. Vuong K, Armstrong BK, Weiderpass E, et al. Development and external
validation of a melanoma risk prediction model based on self-assessed risk
factors. JAMA Dermatology. 2016.
188. Cust AE, Schmid H, Maskiell JA, et al. Population-based, Case-Control-
Family Design to Investigate Genetic and Environmental Influences on
Melanoma Risk: Australian Melanoma Family Study. American Journal of
Epidemiology. 2009; 170(12):1541-1554.
189. English DR, Armstrong BK. Identifying people at high risk of cutaneous
malignant melanoma: results from a case-control study in Western Australia.
British medical journal (Clinical research ed.). 1988; 296(6632):1285.
190. Newton-Bishop JA, Chang YM, Elliott F, et al. Relationship between sun
exposure and melanoma risk for tumours in different body sites in a large
case-control study in a temperate climate. European Journal of Cancer.
2011; 47(5):732-741.
191. Newton-Bishop JA, Chang YM, Iles MM, et al. Melanocytic Nevi, Nevus
Genes, and Melanoma Risk in a Large Case-Control Study in the United
Kingdom. Cancer Epidemiology Biomarkers & Prevention. 2010;
19(8):2043-2054.
192. Veierød MB, Adami HO, Lund E, Armstrong BK, Weiderpass E. Sun and
Solarium Exposure and Melanoma Risk: Effects of Age, Pigmentary
Characteristics, and Nevi. Cancer Epidemiology Biomarkers & Prevention.
2010; 19(1):111-120.
193. Roswall N, Sandin S, Adami HO, Weiderpass E. Cohort Profile: The Swedish
Women’s Lifestyle and Health cohort. International Journal of
Epidemiology. June 10, 2015.
194. National Human Genome Research Institute. PhenX Measure: Skin Cancer
2010; https://www.phenxtoolkit.org/toolkit_content/PDF/PX170601.pdf.
Accessed 1 June 2015.
195. Mislevy RJ. Linking Educational Assessments: Concepts, Issues, Methods,
and Prospects. Princeton, NJ: Educational Testing Service; 1992.
196. Van Der Mei IAF, Blizzard L, Ponsonby AL, Dwyer T. Validity and
reliability of adult recall of past sun exposure in a case-control study of
multiple sclerosis. Cancer Epidemiology Biomarkers and Prevention. 2006;
15(8):1538-1544.
197. Cockburn M, Hamilton A, Mack T. Recall Bias in Self-reported Melanoma
Risk Factors. American Journal of Epidemiology. 2001;
153(10):1021-1026.
198. Weinstock MA, Colditz GA, Willett WC, Stampfer MJ, Rosner B, Speizer FE.
Recall (Report) Bias and Reliability in the Retrospective Assessment of
Melanoma Risk. American Journal of Epidemiology. 1991;
133(3):240-245.
199. van der Mei IAF, Blizzard L, Ponsonby AL, Dwyer T. Validity and
Reliability of Adult Recall of Past Sun Exposure in a Case-Control Study of
Multiple Sclerosis. Cancer Epidemiology Biomarkers & Prevention. 2006;
15(8):1538-1544.
200. Wu S, Ho SC, Lam TP, et al. Development and validation of a lifetime
exposure questionnaire for use among Chinese populations. Scientific reports.
2013; 3.
201. Rosso S, Miñarro R, Schraub S, Tumino R, Franceschi S, Zanetti R.
Reproducibility of skin characteristic measurements and reported sun
exposure history. International Journal of Epidemiology. 2002;
31(2):439-446.
202. Buller DB, Cokkinides V, Hall HI, et al. Prevalence of sunburn, sun
protection, and indoor tanning behaviors among Americans: Review from
national surveys and case studies of 3 states. Journal of the American
Academy of Dermatology. 2011; 65(5, Supplement 1):S114.e111-
S114.e111.
203. Adams A, Soumerai S, Lomas J, Ross-Degnan D. Evidence of self-report bias
in assessing adherence to guidelines. International Journal for Quality in
Health Care. 1999; 11(3):187-192.
204. Manne S, Lessin S. Prevalence and Correlates of Sun Protection and Skin
Self-Examination Practices Among Cutaneous Malignant Melanoma
Survivors. J Behav Med. 2006; 29(5):419-434.
205. Weinstock MA, Risica PM, Martin RA, et al. Reliability of assessment and
circumstances of performance of thorough skin self-examination for the early
detection of melanoma in the Check-It-Out Project. Preventive Medicine.
2004; 38(6):761-765.
206. Stenberg C, Larkö O. Sunscreen application and its importance for the sun
protection factor. Archives of Dermatology. 1985; 121(11):1400-1402.
207. Stokes R, Diffey B. How well are sunscreen users protected?
Photodermatology, Photoimmunology & Photomedicine. 1997; 13(5-6):186-
188.
208. Hall DM, McCarty F, Elliott T, Glanz K. Lifeguards' sun protection habits
and sunburns: Association with sun-safe environments and skin cancer
prevention program participation. Archives of Dermatology. 2009;
145(2):139-144.
209. Montague M, Borland R, Sinclair C. Slip! Slop! Slap! and SunSmart, 1980-
2000: Skin Cancer Control and 20 Years of Population-Based Campaigning.
Health Education & Behavior. 2001; 28(3):290-305.
210. Smith BJ, Ferguson C, McKenzie J, Bauman A, Vita P. Impacts from
repeated mass media campaigns to promote sun protection in Australia.
Health Promotion International. 2002; 17(1):51-60.
211. Paul C, Tzelepis F, Girgis A, Parfitt N. The Slip Slop Slap years: Have they
had a lasting impact on today's adolescents? Health Promotion Journal of
Australia. 2003; 14(3):219-221.
212. Hill D, White V, Marks R, Borland R. Changes in sun-related attitudes and
behaviours, and reduced sunburn prevalence in a population at high risk of
melanoma. European journal of cancer prevention. 1993; 2(6):447-456.
213. Dobbinson SJ, Wakefield MA, Jamsen KM, et al. Weekend Sun Protection
and Sunburn in Australia: Trends (1987–2002) and Association with
SunSmart Television Advertising. American Journal of Preventive Medicine.
2008; 34(2):94-101.
214. Bränström R, Kasparian NA, Chang YM, et al. Predictors of Sun Protection
Behaviors and Severe Sunburn in an International Online Study. Cancer
Epidemiology Biomarkers & Prevention. 2010;
19(9):2199-2210.
215. Usher-Smith JA, Emery J, Kassianos AP, Walter FM. Risk prediction models
for melanoma: A systematic review. Cancer Epidemiology Biomarkers &
Prevention. June 3, 2014.
216. Quéreux G, Nguyen JM, Volteau C, Lequeux Y, Dréno B. Creation and test
of a questionnaire for self-assessment of melanoma risk factors. European
Journal of Cancer Prevention. 2010; 19(1):48-54.
217. O'Riordan D, Glanz K, Gies P, Elliott T. A pilot study of the validity of self-
reported ultraviolet radiation exposure and sun protection practices among
lifeguards, parents and children. Photochem Photobiol. 2008; 84:774-778.
218. O'Riordan D, Lunde KB, Steffen AD, Maddock JE. Validity of beachgoers'
self-report of their sun habits. Archives of Dermatology. 2006; 142(10):1304-
1311.
219. O'Riordan DL, Nehl E, Gies P, et al. Validity of covering-up sun-protection
habits: Association of observations and self-report. Journal of the American
Academy of Dermatology. 2009; 60(5):739-744.
220. Oh SS, Mayer JA, Lewis EC, et al. Validating outdoor workers' self-report of
sun protection. Preventive Medicine. 2004; 39(4):798-803.
221. Santmyire BR, Feldman SR, Fleischer AB. Lifestyle high-risk behaviors and
demographics may predict the level of participation in sun-protection
behaviors and skin cancer primary prevention in the United States. Cancer.
2001; 92(5):1315-1324.
222. Bränström R, Brandberg Y, Holm L, Sjöberg L, Ullen H. Beliefs, knowledge
and attitudes as predictors of sunbathing habits and use of sun protection
among Swedish adolescents. European Journal of Cancer Prevention. 2001;
10(4):337-345.
223. Stone VB, Parker V, Quarterman M, Lee C. The relationship between skin
cancer knowledge and preventive behaviors used by parents. Dermatology
Nursing. 1999; 11(6):411.
224. Jackson KM, Aiken LS. A psychosocial model of sun protection and
sunbathing in young women: The impact of health beliefs, attitudes, norms,
and self-efficacy for sun protection. Health Psychology. 2000; 19(5):469-478.
225. Kim BH, Glanz K, Nehl EJ. Vitamin D beliefs and associations with
sunburns, sun exposure, and sun protection. International Journal of
Environmental Research and Public Health. 2012; 9(7):2386-2395.
226. Sax LJ, Gilmartin SK, Bryant AN. Assessing Response Rates and
Nonresponse Bias in Web and Paper Surveys. Research in Higher Education.
2003; 44(4):409-432.
227. Underwood D, Kim H, Matier M. To Mail or To Web: Comparisons of
Survey Response Rates and Respondent Characteristics. Paper presented at:
40th Annual Forum of the Association for Institutional Research; 21-24 May
2000; Cincinnati, OH.
228. Ortner TM. On changing the position of items in personality questionnaires:
Analysing effects of item sequence using IRT. Psychology Science. 2004;
46(4):466-476.
229. Kingston NM, Dorans NJ. The effect of the position of an item within a test
on item responding behavior: An analysis based on item response theory. ETS
Research Report Series. 1982; 1982(1):i-26.
230. Meyers JL, Miller GE, Way WD. Item position and item difficulty change in
an IRT-based common item equating design. Applied Measurement in
Education. 2008; 22(1):38-60.
231. Hohensinn C, Kubinger KD, Reif M, Holocher-Ertl S, Khorramdel L, Frebort
M. Examining item-position effects in large-scale assessment using the
Linear Logistic Test Model. Psychology Science. 2008; 50(3):391.
232. Hambleton RK, Traub RE. The Effects of Item Order on Test Performance
and Stress. The Journal of Experimental Education. 1974; 43(1):40-46.
233. Galesic M, Bosnjak M. Effects of Questionnaire Length on Participation and
Indicators of Response Quality in a Web Survey. Public Opinion Quarterly.
2009; 73(2):349-360.
234. Choi IC, Kim KS, Boo J. Comparability of a paper-based language test and a
computer-based language test. Language Testing. 2003; 20(3):295-320.
235. Lee GL, Weerakoon P. The role of computer-aided assessment in health
professional education: a comparison of student performance in computer-
based and paper-and-pen multiple-choice tests. Medical teacher. 2001;
23(2):152-157.
236. Clariana R, Wallace P. Paper-based versus computer-based assessment: key
factors associated with the test mode effect. British Journal of Educational
Technology. 2002; 33(5):593-602.
237. Miller ET, Neal DJ, Roberts LJ, et al. Test-retest reliability of alcohol
measures: is there a difference between internet-based assessment and
traditional methods? Psychology of Addictive Behaviors. 2002; 16(1):56.
238. Ackerman TA, Gierl MJ, Walker CM. Using multidimensional item response
theory to evaluate educational and psychological tests. Educational
Measurement: Issues and Practice. 2003; 22(3):37-51.
239. McDonald RP. A basis for multidimensional item response theory. Applied
Psychological Measurement. 2000; 24(2):99-114.
240. Ackerman T. Graphical representation of multidimensional item response
theory analyses. Applied Psychological Measurement. 1996; 20(4):311-329.
241. Reckase M. Multidimensional item response theory. Vol 150: Springer; 2009.
242. Pastor DA. The use of multilevel item response theory modeling in applied
research: An illustration. Applied measurement in education. 2003;
16(3):223-243.
243. Van Nispen RMK, Dirk L, Langelaan M, De Boer MR, Terwee CB, Van
Rens GH. Applying multilevel item response theory to vision-related quality
of life in Dutch visually impaired elderly. Optometry & Vision Science. 2007;
84(8):710-720.
244. Gies P, Glanz K, O'Riordan D, Elliott T, Nehl E. Measured occupational solar
UVR exposures of lifeguards in pool settings. American Journal of Industrial
Medicine. 2009; 52(8):645-653.
245. Thieden E, Philipsen PA, Wulf HC. Compliance and data reliability in sun
exposure studies with diaries and personal, electronic UV dosimeters.
Photodermatology, Photoimmunology & Photomedicine. 2006; 22(2):93-99.
246. Glanz K, Gies P, O'Riordan DL, et al. Validity of Self-reported Solar UVR
Exposure Compared with Objectively Measured UVR Exposure. Cancer
Epidemiology Biomarkers & Prevention. 2010;
19(12):3005-3012.
247. Mullis IV, Martin MO, Gonzalez EJ, Chrostowski SJ. TIMSS 2003
International Mathematics Report: Findings from IEA's Trends in
International Mathematics and Science Study at the Fourth and Eighth
Grades. ERIC; 2004.
248. Neidorf TS, Binkley M, Gattis K, Nohara D. Comparing Mathematics
Content in the National Assessment of Educational Progress (NAEP), Trends
in International Mathematics and Science Study (TIMSS), and Program for
International Student Assessment (PISA) 2003 Assessments. Technical
Report. NCES 2006-029. National Center for Education Statistics. 2006.
249. Reddy V. Cross‐national achievement studies: learning from South Africa's
participation in the Trends in International Mathematics and Science Study
(TIMSS). Compare: A Journal of Comparative and International Education.
2005; 35(1):63-77.
250. Mullis IV, Kennedy AM, Martin MO, Sainsbury M. PIRLS 2006 Assessment
Framework and Specifications: Progress in International Reading Literacy
Study. ERIC; 2004.
251. Martin MO, Mullis IV, Kennedy AM. Progress in International Reading
Literacy Study (PIRLS): PIRLS 2006 Technical Report. ERIC; 2007.
252. Goldstein H. International comparisons of student attainment: some issues
arising from the PISA study. Assessment in Education: principles, policy &
practice. 2004; 11(3):319-330.
253. Bevans M, Ross A, Cella D. Patient-Reported Outcomes Measurement
Information System (PROMIS®): Efficient, Standardized Tools to Measure
Self-Reported Health and Quality of Life. Nursing Outlook.
2014; 62(5):339-345.
254. Schalet BD, Cook KF, Choi SW, Cella D. Establishing a Common Metric for
Self-Reported Anxiety: Linking the MASQ, PANAS, and GAD-7 to
PROMIS Anxiety. Journal of Anxiety Disorders. 2014; 28(1):88-96.
255. Riley WT, Pilkonis P, Cella D. Application of the National Institutes of
Health Patient-Reported Outcomes Measurement Information System
(PROMIS®) to Mental Health Research. The journal of mental health policy
and economics. 2011; 14(4):201-208.
256. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes
Measurement Information System (PROMIS): Progress of an NIH Roadmap
Cooperative Group During its First Two Years. Medical Care. 2007;
45(5):S3-S11.
257. Coups EJ, Stapleton JL, Hudson SV, et al. Linguistic acculturation and skin
cancer-related behaviors among Hispanics in the southern and western United
States. JAMA Dermatology. 2013; 149(6):679-686.
258. Coups EJ, Stapleton JL, Hudson SV, et al. Skin cancer surveillance behaviors
among US Hispanic adults. Journal of the American Academy of
Dermatology. 2013; 68(4):576-584.
259. Korta DZ, Saggar V, Wu TP, Sanchez M. Racial differences in skin cancer
awareness and surveillance practices at a public hospital dermatology clinic.
Journal of the American Academy of Dermatology. 2014; 70(2):312-317.
260. Harvey VM, Patel H, Sandhu S, Wallington SF, Hinds G. Social determinants
of racial and ethnic disparities in cutaneous melanoma outcomes. Cancer
Control. 2014; 21(4):343-349.
Appendices
Appendix – SunAUS Scale