
International multiphase mixed methods study protocol to develop a cross-cultural patient-reported outcome instrument for children and young adults with cleft lip and/or palate (CLEFT-Q)

    Karen W Y Wong Riff,1 Elena Tsangaris,2 Tim Goodacre,3 Christopher R Forrest,1

    Andrea L Pusic,4 Stefan J Cano,5 Anne F Klassen6

To cite: Wong Riff KWY, Tsangaris E, Goodacre T, et al. International multiphase mixed methods study protocol to develop a cross-cultural patient-reported outcome instrument for children and young adults with cleft lip and/or palate (CLEFT-Q). BMJ Open 2017;7:e015467. doi:10.1136/bmjopen-2016-015467

▸ Prepublication history for this paper is available online. To view these files please visit the journal online (http://dx.doi.org/10.1136/bmjopen-2016-015467).

Received 7 December 2016
Accepted 13 December 2016

For numbered affiliations see end of article.

Correspondence to Professor Anne F Klassen; [email protected]

ABSTRACT
Introduction: Patient-reported outcome (PRO) instruments should be developed according to rigorous guidelines in order to provide clinically meaningful, scientifically sound measurement. Understanding the methodology behind instrument development informs the selection of the most appropriate tool. This mixed methods protocol describes the development of an internationally applicable PRO instrument, the CLEFT-Q, for evaluating outcomes of treatment for cleft lip and/or palate (CL/P).
Methods and analysis: The study includes three main phases that occur iteratively and interactively. In phase I, we determine what concepts are important to patients regarding their outcome. A conceptual framework for the CLEFT-Q is formed through a systematic review and an extensive international qualitative study. The systematic review ascertains what concepts have previously been measured in patients with CL/P. The qualitative study employs interpretive description and involves in-depth interviews with patients in high-income and lower-middle income countries. Preliminary items are generated from the qualitative data. Preliminary scales are then created for each theme in the framework. Cognitive debriefing interviews and expert clinician input are used to refine the scales in an iterative process. In phase II, the preliminary scales are administered to a large international group of patients with CL/P. The modern psychometric method of Rasch Measurement Theory analysis is employed to define the measurement characteristics. The preliminary scales are shortened based on these results. In phase III, further tests assess reliability, validity and responsiveness of the instrument.
Ethics and dissemination: The study is approved by Research Ethics Boards for each participating site. Findings from this study will be published in open access peer-reviewed journals and presented at national and international conferences. Integrated knowledge translation is employed to engage stakeholders from the outset of the study. Successful execution of the CLEFT-Q will result in an internationally applicable PRO instrument for children and young adults with CL/P.

INTRODUCTION
Patient-reported outcomes (PROs) are increasingly important in the assessment of treatment effectiveness.1 2 If PRO data are to be used to drive quality improvement and treatment decisions, PROs should be evaluated in a scientifically sound manner using a PRO instrument developed according to rigorous guidelines.3 The methodology behind the development or 'validation' of an instrument can be complex. A clear description and understanding of the methods can help to inform researchers selecting an appropriate PRO instrument for their target patient population.

Strengths and limitations of this study
▪ Multicentre, international study that includes patients in high-income and lower-middle income countries will ensure the CLEFT-Q is internationally applicable.
▪ Extensive qualitative component of the study will ensure content validity of the CLEFT-Q.
▪ Adherence to rigorous guidelines of instrument development and use of modern psychometric methods will make the CLEFT-Q as scientifically sound and clinically relevant as possible.
▪ The scope of the study, which includes participants from high-income and lower-middle income countries, necessitates a long time frame to completion.
▪ The CLEFT-Q field-test will not include children with CL/P aged under 8 years.


Cleft lip and/or palate (CL/P) is the most common congenital craniofacial anomaly, with 7.94 cases per 10 000 live births annually.3 The condition affects individuals worldwide and impacts an individual's appearance, dentition, hearing and speech. Treatment protocols vary widely, within and between countries.4 5

Observer-reported or clinician-reported outcomes form the majority of clinical outcome assessments (COAs) to date.6–8 However, the goal of treatment of CL/P is to improve the patient's physical, psychological and social health, all of which are difficult to evaluate accurately with observer-reported or clinician-reported outcomes. Measuring these outcomes requires the patient perspective, but no comprehensive, condition-specific PRO instrument is currently available for patients with CL/P.9

Beyond the scope of CL/P, few scales exist that measure appraisal of appearance from the patient perspective.10 Congenital anomalies, trauma and other benign and malignant conditions can cause facial or other differences that are stigmatising and may lead to social isolation. The treatment of these conditions addresses form and function, yet the outcomes of treatment cannot be measured without appropriate PRO instruments that evaluate these concerns specifically and directly. The current study begins to fill this gap in measurement of appraisal of appearance from the patient perspective.
Many clinical conditions are prevalent around the world in high-income as well as low-income and middle-income countries. Multinational studies are increasingly common, and PROs are frequently used as primary or secondary end points. The Consolidated Standards of Reporting Trials (CONSORT) recommendations for reporting randomised controlled trials have included a PRO extension to guide PRO reporting.2 However, PRO instruments have typically been developed in a single language and often in a single country.11 Few PRO instruments have been designed for use in low-income and middle-income countries.4 While COAs such as clinician-reported or observer-reported outcomes are more easily compared between countries, it is difficult to compare PROs globally in the absence of instruments designed for global use. While guidelines exist for translation and cross-cultural adaptation of PRO instruments,11 the optimal design would be to develop the instrument in a cross-cultural manner from the outset.
Establishing scientifically sound, cross-cultural measurement tools involves a rigorous process. The following protocol describes the methodology for an international study to develop a cross-cultural PRO instrument for children and young adults with CL/P, called the CLEFT-Q. To the best of our knowledge, the CLEFT-Q will be the first international PRO measure that evaluates appraisal of appearance in addition to quality of life and function.

METHODS AND ANALYSIS
Development of the CLEFT-Q follows the guidelines set forth by the Scientific Advisory Committee of the Medical Outcomes Trust,12 the US Food and Drug Administration13 and the International Society for Pharmacoeconomics and Outcomes Research.14 15

The aim is to develop a self-report instrument for patients 8–29 years of age that is internationally applicable, multidimensional (ie, measures a number of different concepts of interest (COIs)) and useful in clinical practice as well as in clinical audits and research.
The study employs a multiphase mixed methods approach, with an iterative combination of qualitative and quantitative inquiries.16 Measurement properties of instruments fall into three categories: (1) reliability, (2) validity and (3) responsiveness. The Consensus-based Standards for the Selection of Health Status Measurement Instruments (COSMIN) checklist was designed to ensure and evaluate validity and reliability in measuring health-related PROs.17 18 Similarly, a minimum standard for PRO instruments was outlined by members of the International Society for Quality of Life Research (ISOQOL).19 There are three main phases to developing a PRO instrument: item generation, item reduction and psychometric evaluation. These phases are carried out in an iterative and interactive manner as opposed to a linear progression (figure 1). Together they ensure that the resulting instrument fulfills the minimum standards outlined by ISOQOL as well as the COSMIN criteria for reliability and validity. The components of each phase are shown in figure 2.

Phase I: what should we measure?
The aims of phase I are to establish content validity of the CLEFT-Q and to generate preliminary scales. First, a systematic review of the literature was performed to ensure that there was indeed no existing instrument available and to define what PRO instruments have been validated and used in patients with CL/P in the past.20 A comprehensive search following PRISMA guidelines yielded 4595 citations, of which 26 studies met inclusion criteria.20 The studies were carried out in nine high-income countries, confirming the lack of PRO measurement in low-income and middle-income countries. Twenty-nine different PRO instruments were used in the 26 studies, and 20 measures were used only once. On the basis of these findings, a need for a comprehensive PRO instrument for CL/P exists, and we proceeded with the current study.

Figure 1 The phases of PRO instrument development. It is important to note that the phases can occur iteratively and interactively rather than in a linear progression. PRO, patient-reported outcome.


Conceptual framework
The first step in phase I is to develop a conceptual framework, or 'a rationale for and description of the concepts and the populations that a measure is intended to assess and the relationship between those concepts'.12 From the systematic review performed at the outset of the study,20 COIs that were previously measured are mapped to create a preliminary conceptual framework.

Qualitative study
Next, a comprehensive qualitative study is carried out with participants with CL/P in high-income and lower-middle income countries. The qualitative methodology employed is Interpretive Description, which seeks to generate relevant knowledge for a clinical context, presuming that there is theoretical and clinical knowledge informing the study.21 22 For this study, the theoretical knowledge is derived from the systematic review, and clinical knowledge is derived from the team members carrying out the study. The philosophical underpinning of the qualitative study is pragmatism, meaning that the individual's understanding of a concept is of greatest importance, regardless of clinical explanations.23

Participants, setting and recruitment
Eliciting knowledge in high-income and lower-middle income countries allows for cultural differences to be identified from the outset, facilitating accurate targeting of the scales in subsequent phases. The participating centres in this phase of the study are in six countries (Canada, Kenya, India, Philippines, UK and USA). Recruitment takes place at cleft care centres.

Figure 2 Flow diagram showing the multiphase mixed methods protocol for developing the CLEFT-Q. It is important to note that the process can be iterative and interactive as opposed to strictly linear. QUAN, quantitative study component; QUAL, qualitative study component.


In the high-income countries (Canada, UK, USA), participants are recruited either through posters in clinics (and contacted by telephone to arrange an interview) or face-to-face in the clinical setting. In the lower-middle income countries (Kenya, India, Philippines), a study team member recruits participants face-to-face in the clinical setting. Participants are eligible for inclusion if they have a diagnosis of CL/P. In the high-income countries, participants between 8 and 29 years of age are included. In the lower-middle income countries, participants of any age are included if they are presenting for clinical care, to maximise the information gathered at these sites. In addition, parents of children with CL/P in the lower-middle income countries are invited to participate if the child prefers. This difference is important since a study team member, foreign to these countries, is present and working with a translator, which may make the child feel less comfortable if they are alone. Exclusion criteria include the inability to speak the language of the interviewer or translator in each country, or a cognitive delay such that the individual cannot participate in a semistructured interview.

Sampling
Participants are purposively sampled to obtain a heterogeneous sample based on age, gender and cleft type. Sampling continues until the point of saturation, when no further new concepts arise in subsequent interviews.24

Data collection
After obtaining written assent and/or consent as appropriate, a study team member trained in qualitative interviewing technique carries out individual, semistructured interviews that are audio-recorded, using a translator in the lower-middle income countries as needed.24 Participant age, gender and cleft type are documented. An interview guide, developed from the preliminary conceptual framework, provides a list of open-ended questions for the interview. The interviewer probes new concepts as they arise. As standard qualitative methods dictate, data from interviews are analysed on an ongoing basis, allowing changes to be made to the interview guide for subsequent interviews to include new concepts that warrant further probing.

Data analysis
Interviews are transcribed verbatim. Interviews performed through a translator, which contain both English and the target language, are translated into English again by a bilingual individual to confirm the accuracy of the translation. The interview data are then analysed in NVivo V.8 software (QSR International Pty, 2012) using a line-by-line approach to coding, with constant comparison used to identify and classify the COIs. These concepts are then categorised into overarching domains, with themes within the domains, to refine the preliminary conceptual framework. Data collection and analysis are concurrent and iterative, allowing changes to be made to the interview guide as new concepts arise. When no further new concepts are elicited from interviews, data collection ends and the conceptual framework is finalised. This conceptual framework represents all the COIs important to patients with CL/P across the six countries.

Rigor
Rigor in the qualitative study is ensured using several strategies. One team member performs data coding, and a second team member then confirms the analysis. By performing interviews in an iterative fashion, member-checking is employed to confirm that the concepts identified are indeed valuable and important to participants with CL/P. Finally, peer debriefing is used to verify data analysis between members of the study team.

Item generation
Coding of the qualitative data creates an exhaustive list of potential items to include in scales. A list of scales to be created is derived from the conceptual framework arising from the qualitative study. Each theme within the domains of the conceptual framework is turned into an individual preliminary scale. In this way, the entire suite of scales should cover all the COIs important to patients with CL/P. Individual scales are populated with items generated from the patients' own language whenever possible, written at the lowest feasible reading grade level (Flesch-Kincaid level). Positive or neutral wording is adopted for the items in the scales as much as possible, to limit any negative effects of filling out the CLEFT-Q in the future.
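For orientation only (this formula is standard and is not quoted from the protocol), the Flesch-Kincaid grade level used to check item reading level is computed from word, sentence and syllable counts as

\[ \text{Grade level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59. \]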

Refining the preliminary scales
The final stage of phase I aims to refine the preliminary scales through an iterative process of returning to the patients to perform cognitive debriefing interviews14 and obtaining expert multidisciplinary clinician input.

Cognitive debriefing interviews
Once the preliminary scales are formed, further semistructured individual interviews are carried out to ensure that patients with CL/P understand the items on the scales and to confirm that no concepts are missing. Recruitment is carried out in a similar fashion to the qualitative study, with participants from multiple countries to ensure cross-cultural input. Participants go through all the items on the preliminary scales with the interviewer using the 'think aloud' technique. The interviewer records items that are problematic and the reasons why. Cognitive debriefing interviews are carried out iteratively, alongside obtaining expert clinician input as described below. Data from both sources are analysed concurrently, again allowing for progressive improvements to the scales. Cognitive debriefing interviews follow a similar strategy to the qualitative study in that interviews continue until no further issues with the items on the scales arise.


Expert clinician input
Expert clinician input is sought to ensure that no further concepts should be included in the scales. Clinicians involved in cleft care from different disciplines (nursing, orthodontics, otolaryngology, paediatrics, psychology, social work, speech-language pathology, surgery) are purposively sampled from multiple countries through the networks of the study team. Focus groups with clinicians are performed in a similar fashion to the cognitive debriefing interviews. In cases where focus groups cannot be performed, individual input is sought. The interviewer goes through all the items on the scales, looking for input on any missing items or on the wording of items. Again, data are analysed concurrently with the cognitive debriefing interviews to refine the scales.

Translation
In the next phase of the study, the scales are field-tested in a large population of patients from multiple countries. The preliminary scales are therefore translated into the necessary target languages according to guidelines set forth by the International Society for Pharmacoeconomics and Outcomes Research25 and the Mapi Research Trust.26 Briefly, each translation is performed by two translators whose mother tongue is the target language and who are fluent in English. The two translators perform independent translations of the CLEFT-Q from English into the target language. The resulting translations are then reconciled to create a single translated version. A third individual, whose mother tongue is English and who is fluent in the target language, then translates this version back into English, and this English version is compared with the original. The group then resolves any discrepancies together. The translated versions are taken back to the patient population in further cognitive debriefing interviews to ensure that the meaning of the items, response options and instructions is preserved and that the wording is appropriate. At the end of this phase, a complete set of CLEFT-Q scales is ready to be tested in a population of patients.

Phase II: what questions are effective in measuring the concepts identified in phase I?
The next phase of developing the CLEFT-Q involves field-testing the scales in a large population of patients with CL/P to determine which items on the scales are the most effective in measuring the COIs. We employ the modern psychometric method of Rasch Measurement Theory (RMT) analysis to identify which items perform well and to determine the measurement properties of the scales.27 In order to provide rigorous measurement, the data must fit the requirements of a mathematical model, that is, the Rasch model. Briefly, RMT creates a scale along which an individual is placed based on the probability that he or she answered the questions, or items, in a certain way. This method contrasts with classical test theory, where scores are designed for group-level analyses. This difference in mathematical modelling allows RMT analysis to provide an accurate individual person estimate. An RMT scale can be conceptualised as a ruler, with an ordered arrangement, or hierarchy, of items from a low to a high 'amount' of the construct. RMT analysis creates interval-level measurement (a scale where the notches are evenly spaced), as opposed to ordinal-level measurement (a scale where the notches are not necessarily evenly spaced). Interval-level measurement allows for accurate tracking of change over time.28 In addition, RMT analysis results in a scale that provides person estimates that are independent of the sampling distribution of the items. In other words, the scale functions the same way regardless of the people it is measuring, meaning that the same scale can be used accurately in different subsets of the target population (eg, participants in different countries, or of different ages).
Through the RMT analysis, the psychometric properties of the scale are defined. Items that are effective in measurement within the preliminary scales are kept, and items that do not function as well in measurement, or that are identified as redundant, can be dropped. The final scales are created through this process of item reduction as described below.

Pilot field-test
A large-scale field-test of the CLEFT-Q is planned to take place in multiple countries. Since a multicentred field-test is a resource-intensive endeavour, a pilot field-test is carried out at two sites in Ontario, Canada, to identify any logistic obstacles and to perform an early preliminary RMT analysis to troubleshoot any issues with scale performance.

Study participants
Patients with CL/P who are 8–29 years of age and who do not have a cognitive delay resulting in an inability to fill out the scales are recruited from two clinical settings in Canada. A minimum of 200 patients is required to perform the preliminary RMT analysis. Since this pilot study is meant to optimise the scales prior to the large-scale field-test, the preliminary RMT analysis may trigger further data collection and recruitment prior to finalising the field-test versions of the scales.

Data collection
Participants are asked to fill out the CLEFT-Q scales on paper and to give qualitative feedback in written format on completion. Demographic characteristics including age, gender, cleft type and stage of treatment are collected. Participants are also asked whether they feel that the length of the entire CLEFT-Q is 'about right', 'too long' or 'too short'. The time to complete the scales is recorded.


Data analysis
Qualitative feedback is analysed in a similar fashion to the cognitive debriefing interviews. Details of the RMT analysis are described below. The results from the qualitative and RMT analyses are used to further refine the scales. This iterative approach to scale development optimises the likelihood that the scales will function well, with minimal logistical obstacles, in the ensuing large-scale field-test.

International field-test and RMT analysis
The goal of the international field-test is to gather CLEFT-Q data from a large international population of patients with CL/P in order to define which items should be included in the final scales and to examine the measurement properties of the scales.

Study participants
The international field-test includes participants from 12 countries (Australia, Canada, Chile, Colombia, England, Ireland, India, Netherlands, Spain, Sweden, Turkey and USA). Centres are included based on interest and the feasibility of recruiting the required sample size in a reasonable time frame. Participants with CL/P between the ages of 8 and 29 years are recruited to fill out the CLEFT-Q scales. Exclusion criteria include a cognitive delay resulting in the inability to complete the scales. Recruitment takes place either face-to-face or by mail, depending on each centre's preference. The goal is to recruit a minimum of 108 participants from each country; a sample size of 108 to 200 results in item calibrations that are stable within 0.5 logits (person location estimates) with a 99% CI.29

Data collection
The demographic characteristics collected are listed in table 1. Participants fill out the CLEFT-Q scales either on paper or on tablets using Research Electronic Data Capture (REDCap), a secure, web-based application for electronic data capture.30

Data analysis
Field-test data from paper questionnaires are entered into REDCap. Completed data files are then downloaded into IBM SPSS Statistics 22.0 (IBM Corp, Armonk, New York, USA; released 2013). The SPSS file is then imported into RUMM2030, the Rasch analysis software.31

Each scale is analysed independently. The psychometric function of each scale is examined using a number of tests and criteria. First, the thresholds for the item response options must be ordered, meaning that a '1' on a 4-point scale must sit lower on the continuum than a '2', and so on. The RMT analysis then defines the hierarchy of items on the scale, from the 'easiest' question for a patient to endorse to the 'hardest'. Second, three item fit statistics are used to evaluate whether the items in a scale work together as a set: (1) log residuals, which represent item-person interaction; (2) χ2 values, which represent item-trait interaction; and (3) item characteristic curves. Items that are not functioning well with respect to these three statistics will be dropped from the scales unless they represent clinically important concepts. Third, the scale must be targeted to the population: the range of the construct measured by the scale is compared with the range of the construct experienced by the population, and maximal overlap is preferable to ensure that the scale can measure the construct in the population of interest.
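As a rough guide (an illustration on our part, not text from the protocol), such fit statistics are typically built from standardised response residuals of the form

\[ z_{ni} = \frac{x_{ni} - E[x_{ni}]}{\sqrt{\operatorname{Var}(x_{ni})}}, \]

where x_ni is person n's observed response to item i and the expected value and variance are taken from the fitted Rasch model. Summarising these residuals over persons for a given item yields an item fit residual, and grouping persons into class intervals along the construct yields the χ2 item-trait statistic.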

The next component of the analysis ensures internal consistency, which refers to the interrelatedness among items on a scale. First, the scale is tested for unidimensionality, that is, whether the items on the scale all measure a single construct.32 Second, the scale is evaluated using the Person Separation Index, a measure of the precision of a person estimate, which is the corollary of reliability (Cronbach's α) in classical test theory.33 At any stage of the analysis, scales that are not functioning appropriately can be reanalysed with poorly functioning items dropped. This process continues until all of the above statistics are within the acceptable range.
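As an aside (a general formula, not quoted from the protocol), the Person Separation Index is commonly computed by comparing the spread of the person estimates with their measurement error:

\[ \mathrm{PSI} = \frac{\hat{\sigma}^2_{\theta} - \bar{\sigma}^2_{e}}{\hat{\sigma}^2_{\theta}}, \]

where the numerator is the variance of the person location estimates minus the mean of their squared standard errors, and the denominator is the variance of the person estimates; values approaching 1 indicate that the scale can reliably distinguish between persons.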

Differential item functioning
Since the Rasch model creates a fixed ruler that is independent of the individual person estimates, differences between subgroups can be identified. Differential item functioning (DIF) occurs when one subset of the target population answers a question differently from another subset.33 In creating an international PRO instrument for children and young adults, differences based on country and age are an important consideration in creating scientifically sound instruments. In the field-test, DIF can be identified in RUMM2030, and items that show DIF can be dropped in the item reduction phase or kept with adjustments made to the scoring to account for the differences.
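Stated formally (for illustration only), an item i displays DIF with respect to a grouping factor g (eg, country or age group) when the expected response differs between groups for persons at the same overall level of the construct:

\[ E[x_{ni} \mid \theta_n, g_n = A] \neq E[x_{ni} \mid \theta_n, g_n = B]. \]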

Item reduction
From the item and threshold locations, the location of each question within the field-test scale on the overall ruler can be determined. Poorly functioning items can be dropped as described above, and extra items that measure in a similar fashion (showing residual correlations in the RMT analysis) can be dropped to develop a scale with the optimal number of items. It should be noted that, at some point, further dropping of items will result in less precise measurement.

Table 1 Demographic characteristics collected for participants in the international field-test
Age; gender; cleft type; country; student status; language spoken at home; adoption status; syndromes; other craniofacial anomalies; developmental disabilities; past treatments; current treatments; future treatments.


The final decision regarding the optimal number of items depends on the distribution of the item locations as well as some clinical indication of a requirement for a certain degree of precision. Once item reduction is complete, the scales are finalised. The RMT analysis then provides a scoring table for each scale, since calculating the score on each scale is more complex than simply summing the responses to the individual items.
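To make the idea of a scoring table concrete, the short sketch below shows the kind of raw-score-to-interval-score lookup that a Rasch analysis produces; the scale, the mapping values and the helper function are hypothetical and are not the actual CLEFT-Q scoring table.

# Hypothetical Rasch scoring table for an invented 4-item scale with items
# scored 0-3 (raw score range 0-12); the interval-score values are made up.
RAW_TO_INTERVAL = {0: 0, 1: 9, 2: 16, 3: 23, 4: 30, 5: 36, 6: 43,
                   7: 50, 8: 57, 9: 64, 10: 72, 11: 83, 12: 100}

def score_scale(item_responses):
    """Convert one participant's item responses into an interval-level scale score."""
    return RAW_TO_INTERVAL[sum(item_responses)]

print(score_scale([2, 3, 1, 3]))  # raw score 9 maps to a hypothetical score of 64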

Normative data and construct validity
Once the scale scoring has been determined, scores are calculated for the field-test participants. Normative data and basic associations between scores and demographic characteristics can then be calculated using analysis of variance (ANOVA) in SPSS.
Construct validity includes structural validity (which assesses internal relationships), hypotheses testing and cross-cultural validity. Structural validity and cross-cultural validity are addressed in the RMT analysis through unidimensionality and DIF, respectively. Hypotheses testing is used to establish whether the responses correlate or differ between patient groups in the way that would be expected.34 In the CLEFT-Q, we test the following hypotheses: (1) that patients with a visible difference, that is, CL and CL/P, will have lower scores on appraisal of appearance compared with those with an invisible difference (ie, CP only); (2) that patients undergoing speech therapy or speech surgery will have lower scores on the speech scales than those not requiring any further intervention; (3) that patients requiring further treatment to the nose, lip or jaw will have lower scores on the appearance scales as well as the quality of life scales compared with those not needing any further treatment; (4) that patients who rate their overall appearance or speech more favourably ('like' their appearance more) on a four-point scale will have higher scores on the appearance or speech and quality of life scales, respectively; and (5) that patients who are receiving psychological counselling or therapy will have lower scores on the quality of life scales. ANOVA in SPSS will be used to test these hypotheses.
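By way of illustration, hypothesis (1) above amounts to a one-way comparison of appearance scores across cleft types; a minimal sketch with invented scores (the protocol itself runs the ANOVA in SPSS) is:

from scipy.stats import f_oneway

cl_scores  = [60, 55, 63, 58]   # hypothetical appearance scores, cleft lip only
clp_scores = [52, 49, 57, 50]   # hypothetical scores, cleft lip and palate
cp_scores  = [70, 66, 72, 68]   # hypothetical scores, cleft palate only (no visible difference)

f_stat, p_value = f_oneway(cl_scores, clp_scores, cp_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")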

Phase III: how does the instrument work?
Several components of the COSMIN checklist are addressed in phases I and II of development. Additional tests to ensure reliability, validity and responsiveness comprise phase III. All tests in this phase employ the finalised CLEFT-Q scales.

Reliability
Reliability includes two measurement concepts: (1) internal consistency, which is evaluated in phase II; and (2) test–retest reliability, which is evaluated in phase III. To establish test–retest reliability, a smaller group of patients complete the CLEFT-Q scales and then complete the scales again 1 week after the first administration. Scales that are reliable will have a minimum test–retest reliability of 0.70 in studies including at least 50 patients.35

Validity
In the COSMIN checklist, the domain of validity includes three measurement properties, that is, content validity, construct validity and criterion validity.17 18 Content validity is addressed in phase I of the study, and construct validity is addressed in phase II.
The final component of validity is criterion validity, or the degree to which the instrument reflects the findings on a 'gold standard' instrument.17 When an instrument is comparable to similar instruments, concurrent validity is established. While we did not identify any single instrument as comprehensive as the CLEFT-Q, the aim of this substudy is to compare the results on the CLEFT-Q with two other instruments previously used in patients with CL/P: (1) the Child Oral Health Impact Profile (COHIP)36 37 and (2) the CHASQ.38 We hypothesise that CLEFT-Q scores for similar constructs will correlate moderately with the scores on these other instruments.

Responsiveness
Responsiveness evaluates the instrument's ability to detect clinically meaningful change over time. The two main methods of evaluating responsiveness are an anchor-based and a distribution-based approach. In the anchor-based approach, patient-rated, clinician-rated or condition-specific variables are used to estimate a minimally important difference (MID) for a scale.39 The distribution-based approach estimates the MID based on the distribution of scores from a target population.39 Techniques to evaluate responsiveness are debated in the literature.17 RMT analysis has been shown to allow for increased detection of responsiveness.40 We employ a variety of methods to best define responsiveness.

Study participants
Participants for the test–retest reliability and criterion validity testing are recruited simultaneously. Again, participants 8–29 years of age are recruited from the clinical setting, with the same exclusion criteria as the field-test. Since this phase requires fewer participants (50), the number of participating centres is lower than for the field-test (Canada, UK, USA). To study responsiveness, participants who are undergoing either (1) orthognathic surgery, (2) rhinoplasty or (3) lip revision are recruited.

Data collection
Participants fill out the CLEFT-Q scales, in addition to the COHIP and the CHASQ, on tablets through REDCap. Contact information is collected, and participants are sent a link to complete the CLEFT-Q scales online 1 week later. Demographic data similar to the field-test are collected. For the responsiveness substudy, participants fill out the CLEFT-Q scales preoperatively. Contact information is collected, and participants are sent a link to complete the CLEFT-Q scales again at least 6 months later.



Data analysis
Test–retest reliability
CLEFT-Q scores are calculated from the two separate administrations of the scales for each participant. Test–retest reliability is then calculated in SPSS.
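The protocol does not name the test–retest coefficient beyond stating that it is calculated in SPSS; purely as an illustration of the kind of computation involved, the sketch below computes a two-way random-effects, absolute-agreement intraclass correlation (ICC(2,1)) on invented scores.

import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.
    `scores` is an (n participants x 2 administrations) array of scale scores."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between participants
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between administrations
    sse = ((x - grand) ** 2).sum() - msr * (n - 1) - msc * (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores for five participants at baseline and one week later
print(icc_2_1([[62, 65], [48, 50], [71, 70], [55, 59], [80, 78]]))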

Criterion validity
CLEFT-Q, COHIP and CHASQ scores are calculated for each participant. Scores on each of the scales are then compared using a Pearson's r correlation in SPSS.
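A minimal sketch of this concurrent validity comparison, using invented scores (Pearson's r is the statistic named in the protocol, run there in SPSS):

from scipy.stats import pearsonr

cleftq_scores = [62, 48, 71, 55, 80, 67]   # hypothetical CLEFT-Q scale scores
cohip_scores  = [58, 45, 69, 60, 77, 64]   # hypothetical COHIP scores, same participants

r, p = pearsonr(cleftq_scores, cohip_scores)
print(f"r = {r:.2f}, p = {p:.3f}")   # a moderate positive correlation is hypothesised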

Responsiveness
Anchor-based techniques are used to calculate the MID from the transformed Rasch scores. To support the anchor-based methods, a distribution-based approach is also used: the transformed Rasch scores are compared using paired t-tests, and an effect size and standardised response means, two indicators of change, are then calculated.39 40 One of the strengths of RMT analysis is the ability to perform individual person-level analyses of responsiveness. The tests listed above provide group-level comparisons. In an individual person-level comparison, the significance of a person's own change can be calculated using the individual person estimates, which are associated with bespoke standard errors.40 Using group-level and individual-level comparisons, the responsiveness of the CLEFT-Q can be defined as clearly as possible.
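The group-level statistics described above can be sketched as follows with invented pre- and postoperative scores; the effect size divides the mean change by the baseline standard deviation, the standardised response mean divides it by the standard deviation of the change scores, and the individual-level comparison is indicated in the final comment.

import numpy as np
from scipy.stats import ttest_rel

pre  = np.array([55.0, 62.0, 48.0, 70.0, 58.0])   # hypothetical preoperative scores
post = np.array([68.0, 71.0, 60.0, 74.0, 69.0])   # hypothetical postoperative scores

t_stat, p_value = ttest_rel(post, pre)            # paired t-test on the transformed scores
change = post - pre
effect_size = change.mean() / pre.std(ddof=1)     # mean change / SD at baseline
srm = change.mean() / change.std(ddof=1)          # standardised response mean

# Individual-level change: a participant's change is taken as significant when
# (theta_post - theta_pre) / sqrt(se_pre**2 + se_post**2) exceeds about 1.96.
print(t_stat, p_value, effect_size, srm)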

ETHICS
Throughout the study, participants may be asked to discuss or answer questions about issues that are sensitive, and they may experience distress as a result. To address this concern, study team members explain during the consent process that, should this occur, an option to follow up with a clinical team member will be provided. Participants are also assured that all information is kept confidential: in the qualitative phase, interviews are transcribed with no identifying data, and in the quantitative phases, identifying data are kept in a separate file at each institution.

DISSEMINATION
The intention of the study is not to directly compare different centres with respect to their outcomes. Any publications or presentations arising from this study will not identify specific centres.
An integrated knowledge translation approach is taken in this study. Collaborations with multiple sites internationally will hopefully result in increased uptake and use of the CLEFT-Q in the future. All phase II and phase III results for participants from each site will be sent back to the individual sites for their own use.
Finally, results of the study will be published in open access journals as required by the granting agency. Study team members will present the results at international and national conferences.

Author affiliations
1Division of Plastic and Reconstructive Surgery, Department of Surgery, Hospital for Sick Children, University of Toronto, Toronto, Canada
2Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
3Spires Cleft Centre, Oxford Radcliffe Children's Hospital, Oxford, UK
4Department of Plastic and Reconstructive Surgery, Memorial Sloan-Kettering Cancer Center, New York, New York, USA
5Modus Outcomes, Letchworth Garden City, UK
6Department of Pediatrics, McMaster University, Hamilton, Ontario, Canada

Twitter Follow Anne Klassen @anneklassen and Karen Wong Riff @KWongRiffMD

Acknowledgements The authors acknowledge our partners in cleft care centres around the world for participating in data collection in various phases of the study.

Contributors KWYWR and AFK conceived of the study. KWYWR, AFK, ET, TG, CRF, ALP and SJC participated in the design of the study. All authors read and approved the final manuscript.

Funding This work is supported by the Canadian Institutes of Health Research Open Operating Grant number 299133 (to AFK, KWYWR, ALP and CRF), the Physicians' Services Incorporated Foundation Resident Research Grant (to KWYWR, AFK and CRF) and the Canadian Society of Plastic Surgeons Educational Foundation Grant (to KWYWR, AFK and CRF).

Competing interests KWYWR, ET, TG, CRF, ALP and AFK declare that they have no competing interests. SJC started Modus Outcomes, a company that designs PRO instruments, after the inception of this study.

    Patient consent Obtained.

Provenance and peer review Not commissioned; peer reviewed for ethical and funding approval prior to submission.

    Data sharing statement No additional data are available.

Open Access This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

REFERENCES
1. Basch E. The missing voice of patients in drug-safety reporting. N Engl J Med 2010;362:865–9.
2. Calvert M, Blazeby J, Altman DG, et al. Reporting of patient-reported outcomes in randomized trials: the CONSORT PRO extension. JAMA 2013;309:814–22.
3. Tanaka SA, Mahabir RC, Jupiter DC, et al. Updating the epidemiology of cleft lip with or without cleft palate. Plast Reconstr Surg 2012;129:511e–18e.
4. Paltzer J, Barker E, Witt WP. Measuring the health-related quality of life (HRQoL) of young children in resource-limited settings: a review of existing measures. Qual Life Res 2013;22:1177–87.
5. Shaw WC, Semb G, Nelson P, et al. The Eurocleft project 1996–2000: overview. J Craniomaxillofac Surg 2001;29:131–40; discussion 141–2.
6. Semb G, Brattström V, Mølsted K, et al. The Eurocleft study: intercenter study of treatment outcome in patients with complete cleft lip and palate. Part 1: introduction and treatment experience. Cleft Palate Craniofac J 2005;42:64–8.
7. Long RE Jr, Hathaway R, Daskalogiannakis J, et al. The Americleft study: an inter-center study of treatment outcomes for patients with unilateral cleft lip and palate part 1. Principles and study design. Cleft Palate Craniofac J 2011;48:239–43.
8. Heliövaara A, Küseler A, Skaare P, et al. SCANDCLEFT randomised trials of primary surgery for unilateral cleft lip and palate: 6. Dental arch relationships at 5 years. J Plast Surg Hand Surg 2016;25:1–6.


9. Eckstein DA, Wu RL, Akinbiyi T, et al. Measuring quality of life in cleft lip and palate patients: currently available patient-reported outcomes measures. Plast Reconstr Surg 2011;128:518e–26e.
10. Wickert NM, Wong Riff KW, Mansour M, et al. Content validity of patient-reported outcome instruments used with pediatric patients with facial differences: a systematic review. Cleft Palate Craniofac J 2016. In press.
11. Wild D, Eremenco S, Mear I, et al. Multinational trials-recommendations on the translations required, approaches to using the same language in different countries, and the approaches to support pooling the data: the ISPOR Patient-Reported Outcomes Translation and Linguistic Validation Good Research Practices Task Force report. Value Health 2009;12:430–40.
12. Aaronson N, Alonso J, Burnam A, et al. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11:193–205.
13. U.S. Department of Health and Human Services Food and Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf
14. Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 2—assessing respondent understanding. Value Health 2011;14:978–88.
15. Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 1—eliciting concepts for a new PRO instrument. Value Health 2011;14:967–77.
16. Creswell JW, Plano Clark VL. Designing and conducting mixed methods research. 2nd edn. Los Angeles (CA): Sage, 2011.
17. Mokkink LB, Terwee CB, Knol DL, et al. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol 2010;10:22.
18. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539–49.
19. Reeve BB, Wyrwich KW, Wu AW, et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res 2013;22:1889–905.
20. Klassen AF, Tsangaris E, Forrest CR, et al. Quality of life of children treated for cleft lip and/or palate: a systematic review. J Plast Reconstr Aesthet Surg 2012;65:547–57.
21. Thorne S, Kirkham SR, MacDonald-Emes J. Interpretive description: a noncategorical qualitative alternative for developing nursing knowledge. Res Nurs Health 1997;20:169–77.
22. Thorne SE. Interpretive description. Developing qualitative inquiry, vol 2. Walnut Creek (CA): Left Coast Press, 2008.
23. Welford C, Murphy K, Casey D. Demystifying nursing research terminology. Part 1. Nurse Res 2011;18:38–43.
24. Sandelowski M. Theoretical saturation. In: Given LM, ed. The Sage encyclopedia of qualitative methods. Thousand Oaks (CA): Sage, 2008:875–6.
25. Wild D, Grove A, Martin M, et al., ISPOR Task Force for Translation and Cultural Adaptation. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health 2005;8:94–104.
26. Acquadro C, Conway K, Giroudet C, et al. Linguistic validation manual for health outcome assessments. Lyon: Mapi Institute, 2012.
27. Rasch G. Probabilistic models for some intelligence and attainment tests. Vol 1. Copenhagen: Studies in Mathematical Psychology, 1960.
28. Cano SJ, Hobart JC. The problem with health measurement. Patient Prefer Adherence 2011;5:279–90.
29. Linacre JM. Sample size and item calibration (or person measure) stability. Rasch Meas Trans 1994;7:328.
30. Harris PA, Taylor R, Thielke R, et al. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 2009;42:377–81.
31. RUMM Lab. http://www.rummlab.com.au/ (accessed 10 Oct 2014).
32. Fayers PM, Hand DJ, Bjordal K, et al. Causal indicators in quality of life research. Qual Life Res 1997;6:393–406.
33. Andrich D. Rasch models for measurement. Sage university papers series: quantitative applications in the social sciences, vol 07-068. Newbury Park (CA): Sage, 1988.
34. Strauss ME, Smith GT. Construct validity: advances in theory and methodology. Annu Rev Clin Psychol 2009;5:1–25.
35. Giraudeau B, Mary JY. Planning a reproducibility study: how many subjects and how many replicates per subject for an expected width of the 95 per cent confidence interval of the intraclass correlation coefficient. Stat Med 2001;20:3205–14.
36. Broder HL, Wilson-Genderson M. Reliability and convergent and discriminant validity of the Child Oral Health Impact Profile (COHIP Child's version). Community Dent Oral Epidemiol 2007;35(Suppl 1):20–31.
37. Broder HL, Wilson-Genderson M, Sischo L. Reliability and validity testing for the Child Oral Health Impact Profile-Reduced (COHIP-SF 19). J Public Health Dent 2012;72:302–12.
38. Emerson M, Spencer-Bowdage S, Bates A. Relationships between self-esteem, social experiences and satisfaction with appearance: standardisation and construct validation of two cleft audit measures. Presented at the Annual Scientific Conference of the Craniofacial Society of Great Britain and Ireland, Bath, UK, 2004.
39. Revicki D, Hays RD, Cella D, et al. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008;61:102–9.
40. Hobart JC, Cano SJ, Thompson AJ. Effect sizes can be misleading: is it time to change the way we measure change? J Neurol Neurosurg Psychiatry 2010;81:1044–8.

