Evidence of Validity for the Hip Outcome Score

RobRoy L. Martin, Ph.D., P.T., C.S.C.S., Bryan T. Kelly, M.D., and Marc J. Philippon, M.D.

Purpose: The purpose of this study was to offer evidence of validity for the Hip Outcome Score (HOS) based on internal structure, test content, and relation to other variables.

Methods: The study population consisted of 507 subjects with a labral tear. Internal structure was evaluated by use of factor analysis and coefficient α. Test content was evaluated by use of item response theory. Pearson correlation coefficients were used to assess relations between the Short Form 36 and the HOS.

Results: The mean subject age was 38 years (range, 13 to 66 years), with 232 male and 273 female subjects. Of the subjects, 263 (52%) underwent arthroscopic surgery. Factor analysis found that 17 of 19 items on the activities-of-daily-living (ADL) subscale loaded on 1 factor. The 2 items that did not fit the 1-factor model were omitted from further testing. All 9 items on the sports subscale loaded on 1 factor. The coefficient α values were .96 and .95 for the ADL and sports subscales, respectively. The errors associated with a single measure were 4.6 and 3.8 points for the ADL and sports subscales, respectively. Item response theory found that all items contributed to their test information curves and were potentially responsive. The correlations between the HOS and Short Form 36 measures of physical function were significantly different than their correlation to measures of mental functioning (P < .005).

Conclusions: The results of this study provide evidence of validity to support the use of the HOS ADL and sports subscales for individuals with labral tears. This includes individuals who underwent arthroscopic surgery, as well as those who did not. Specifically, the results of this study found that the HOS ADL and sports subscales were unidimensional, had adequate internal consistency, were potentially responsive across the spectrum of ability, and contributed information across the spectrum of ability. In addition, scores obtained by the HOS related to measures of function and did not relate to measures of mental health.

Level of Evidence: Level III, development of diagnostic criteria with nonconsecutive patients.

Key Words: Hip Outcome Score, Labral tear, Hip arthroscopy, Outcome instrument, Validity.

From the Department of Physical Therapy, Duquesne University (R.L.M.), Pittsburgh, Pennsylvania; Hospital for Special Surgery, New York-Presbyterian Hospital, Weill Medical College of Cornell University (B.T.K.), New York, New York; Steadman Hawkins Clinic, Steadman Hawkins Research Foundation (M.J.P.), Vail, Colorado; and University of Pittsburgh Medical Center (M.J.P.), Pittsburgh, Pennsylvania, U.S.A.

Supported in part by a grant from the Orthopaedic Section of the American Physical Therapy Association and the Steadman Hawkins Research Foundation. Within the last 12 months, B.T.K. and M.J.P. have received financial support exceeding $500 from Smith & Nephew, Andover, MA.

Address correspondence and reprint requests to RobRoy L. Martin, Ph.D., P.T., C.S.C.S., Department of Physical Therapy, Duquesne University, 114 Rangos School of Health Sciences, Pittsburgh, PA 15282, U.S.A. E-mail: martinr280@duq.edu

© 2006 by the Arthroscopy Association of North America. 0749-8063/06/2212-5197$32.00/0. doi:10.1016/j.arthro.2006.07.027

Note: To access the supplementary Appendix accompanying this report, visit the December issue of Arthroscopy at www.arthroscopyjournal.org.

Arthroscopy: The Journal of Arthroscopic and Related Surgery, Vol 22, No 12 (December), 2006: pp 1304-1311

Musculoskeletal hip disorders and hip arthroscopy are areas of growing interest within the field of orthopaedics. As physicians and other health care practitioners become more involved in these areas, research that defines the expected outcomes for various treatments will be needed. This will include continuing to define the outcomes of both arthroscopic surgical treatment and nonsurgical treatment for individuals with acetabular labral tears. A number of self-report evaluative instruments have been developed for individuals with hip pathology.1-8 All of these instruments have deficiencies that may negatively impact their ability to assess the effect of treatment interventions for individuals with labral tears who may be functioning throughout a wide range of ability.

The usefulness of an instrument can be determined based on concepts associated with contemporary validity theory. Important concepts to consider include evidence for test content, internal structure, and relation to other variables.9,10 To be useful in the realm of sports medicine and hip arthroscopy, an instrument should have adequate representation of items questioning an individual's proficiency with a wide range of ability. This would include activities requiring a high level of ability (i.e., sports participation). If an instrument does not have this adequate representation, a ceiling effect and inadequate sensitivity to change may occur when individuals are only limited at the high end of ability. The instruments that are currently available contain only a limited number of items that assess activity and participation at the higher end of ability. Objectively evaluating the individual items in their ability to contribute information and be potentially responsive, particularly at the high end of ability, can be done with item response theory (IRT).

The purpose of this study was to create an instrument, the Hip Outcome Score (HOS), that could be used to assess the outcome of treatment intervention for individuals with acetabular tears who may be functioning throughout a wide range of ability. It was hypothesized that the newly created HOS would be unidimensional, have adequate internal consistency, be potentially responsive across the spectrum of ability, and contribute information across the spectrum of ability. In addition, scores obtained by the HOS would relate to concurrent measures of function while not unduly relating to concurrent measures of mental health.

METHODS

Creating the Interim HOS

Item content for the HOS was derived from input from physicians and physical therapists who treat individuals with musculoskeletal hip disorders. The purpose of this instrument was to assess self-reported functional status. Therefore, according to the terms defined by the International Classification of Functioning, Disability and Health model, items that related to activity and participation were included, whereas items relating to body structure and function (i.e., symptoms) were not considered.11 An effort was made to include functional activities that cover a full spectrum of ability, including sports-related activities. On the basis of these criteria, a total of 28 items were initially considered and developed. It was believed that all 28 items were appropriate and should be included in the HOS.

A decision was made to create 2 subscales, the activities-of-daily-living (ADL) and sports subscales. The ADL subscale contained 19 items pertaining to basic daily activities, and the sports subscale contained 9 items pertaining to higher-level activities, such as those required in athletics. In addition to the 5 potential responses, ranging from "unable to do" to "no difficulty," a response of "nonapplicable" was also added. This allows subjects to designate that something other than their hip problem limits their activity. This means that both missing responses and nonapplicable responses could not be scored. This model of 2 subscales, as well as the use of a nonapplicable response, was based on the successful results associated with an instrument developed for the foot and ankle.10

Procedure for Data Collection

We used a cross-sectional study design. Potential subjects consisted of patients who were under the care of a single orthopaedic surgeon who specializes in the treatment of musculoskeletal hip-related disorders and, particularly, acetabular labral tears. Subjects were given the HOS and Short Form 36 (SF-36) to complete during a regularly scheduled clinical visit. Patients who could not read English or who did not have a labral tear were excluded. On the basis of the methods used in previous studies,12 a decision was made to exclude subjects who had a high number of items that could not be scored. Subjects were included if they had at least 14 of 19 and 7 of 9 items that could be scored on the ADL and sports subscales, respectively. Demographic information was recorded from the computer database and medical records. This study was approved by the institutional review board, and all subjects gave their consent for participation in this study.
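
The inclusion rule stated above (at least 14 of 19 ADL items and at least 7 of 9 sports items that could be scored) can be illustrated with a short Python sketch. The function names and the use of `None` to mark a missing or nonapplicable response are assumptions made for this illustration, and reading the rule as requiring both thresholds at once is an interpretation of the text. Only the thresholds themselves come from the article.

```python
from typing import Optional, Sequence

# Thresholds taken from the text; the dictionary and function names are hypothetical.
MIN_SCORABLE = {"adl": 14, "sports": 7}


def count_scorable(responses: Sequence[Optional[int]]) -> int:
    """Count items with a scorable response.

    None stands for a missing or nonapplicable response (an assumption of
    this sketch), neither of which can be scored.
    """
    return sum(r is not None for r in responses)


def is_included(adl: Sequence[Optional[int]],
                sports: Sequence[Optional[int]]) -> bool:
    """Return True if a subject meets both scorable-item thresholds."""
    return (count_scorable(adl) >= MIN_SCORABLE["adl"]
            and count_scorable(sports) >= MIN_SCORABLE["sports"])


# Made-up subject: 15 of 19 ADL items and 7 of 9 sports items scorable.
adl_example = [4] * 15 + [None] * 4
sports_example = [2] * 7 + [None] * 2
print(is_included(adl_example, sports_example))
```

A subject with only 13 scorable ADL items would be excluded under this reading, whatever the sports responses were.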

Data Analysis

Evidence for Test Content

Psychometric procedures associated with IRT were used to obtain evidence for test content. This analysis included an assessment of unidimensionality, construction of item characteristic curves, and production of test information functions.13

Assumption of Unidimensionality: The assumption of unidimensionality must be met before IRT can be used.13 Evidence for this would be provided by a factor that accounts for a large amount of the variance, as indicated by the production of only 1 eigenvalue with a value greater than 1. Exploratory factor analysis was completed by use of PRELIS (Scientific Software International, Chicago, IL). Eigenvalues and factor loading patterns were used to identify and extract factors. Items with the lowest factor loading to the principal component were sequentially deleted until only 1 eigenvalue was produced that had a value greater than 1.

Item Characteristic Curves: MULTILOG (Scientific Software International) was used to perform IRT and calibrate the items by use of the 2-parameter graded response model. The results of IRT allow for item characteristic curves to be constructed in an Excel spreadsheet (Microsoft, Redmond, WA) for each item by use of difficulty and discrimination parameters generated by MULTILOG. An appropriate item characteristic curve with 5 potential responses, with each response describing a level of proficiency with the activity in question, should have 5 distinct and separate curves. Each curve should have 1 peak, and together, the 5 curves should span the spectrum of ability (theta).13 Items that did not have appropriate item characteristic curves were considered for elimination.

Test Information Function: The results of IRT provide information values generated by MULTILOG for each item at 9 ability levels, ranging from -2.0 to 2.0. The item information values for each item at the 9 ability levels were summed to produce the test information function. The target test information function for an evaluative instrument should provide information across all ability ranges.13 Items that did not contribute to the test information function were considered for elimination.

TABLE 1. Item Response Pattern for ADL Subscale

Item                                              Unable        Extreme       Moderate      Slight        No            Nonapplicable Missing
                                                  to Do         Difficulty    Difficulty    Difficulty    Difficulty                  Response
Standing for 15 min                               7 (1.4%)      29 (5.7%)     129 (25.4%)   132 (26%)     206 (40%)     2 (0.4%)      3 (0.4%)
Getting into and out of an average car            0             46 (9.1%)     118 (23.3%)   189 (37.3%)   153 (30.2%)   0             1 (0.2%)
Putting on socks and shoes                        5 (1%)        60 (11.8%)    104 (20.5%)   156 (30.8%)   180 (35.5%)   0             2 (0.4%)
Walking up steep hills                            20 (3.9%)     69 (13.6%)    161 (31.8%)   133 (26.2%)   112 (22.1%)   10 (2%)       2 (0.4%)
Walking down steep hills                          13 (2.6%)     48 (9.5%)     147 (29%)     140 (27.6%)   142 (28%)     11 (2.2%)     6 (1.2%)
Going up 1 flight of stairs                       1 (0.2%)      38 (7.5%)     100 (19.7%)   158 (31.2%)   209 (41.2%)   1 (0.2%)      0
Going down 1 flight of stairs                     1 (0.2%)      17 (3.4%)     91 (17.9%)    164 (32.3%)   230 (45.4%)   1 (0.2%)      3 (0.6%)
Going up and down curbs                           1 (0.2%)      9 (1.8%)      66 (13%)      134 (26.4%)   292 (57.6%)   3 (0.6%)      2 (0.4%)
Deep squatting                                    67 (13.2%)    105 (20.7%)   123 (24.3%)   114 (22.5%)   72 (14.2%)    16 (3.2%)     10 (2%)
Getting into and out of a bath                    10 (2%)       21 (4.1%)     83 (16.4%)    136 (26.8%)   184 (36.3%)   64 (12.6%)    9 (1.8%)
Sitting for 15 min                                2 (0.4%)      28 (5.5%)     80 (15.8%)    148 (29.2%)   248 (48.9%)   0             1 (0.2%)
Walking initially                                 8 (1.6%)      25 (4.9%)     100 (19.7%)   170 (33.5%)   199 (39.3%)   0             5 (1%)
Walking for approximately 10 min                  12 (2.4%)     41 (8.1%)     107 (21.1%)   154 (30.4%)   190 (37.5%)   0             3 (0.6%)
Walking for 15 min or greater                     33 (6.5%)     82 (16.2%)    137 (27%)     113 (22.3%)   136 (26.8%)   4 (0.8%)      2 (0.4%)
Twisting/pivoting on involved leg                 49 (9.7%)     107 (21.1%)   139 (27.4%)   136 (26.8%)   65 (12.8%)    5 (1%)        6 (1.2%)
Rolling over in bed                               7 (1.4%)      47 (9.3%)     87 (17.2%)    176 (34.7%)   185 (36.5%)   1 (0.2%)      4 (0.8%)
Light to moderate work (standing and walking)     11 (2.2%)     33 (6.5%)     110 (21.7%)   181 (35.7%)   167 (32.9%)   1 (0.2%)      4 (0.8%)
Heavy work (pushing/pulling, climbing, carrying)  64 (12.6%)    114 (22.5%)   143 (28.2%)   114 (22.5%)   52 (10.3%)    18 (3.6%)     4 (0.8%)
Recreational activities                           98 (19.3%)    98 (19.3%)    139 (27.4%)   106 (20.9%)   39 (7.7%)     18 (3.6%)     9 (1.8%)

Evidence of Internal Structure

The Cronbach coefficient α value was calculated by use of the SPSS program (version 11.5; SPSS, Chicago, IL) to assess internal consistency. The standard error of measure (SEM) was calculated as follows:

$$\mathrm{SEM} = \sigma \sqrt{1 - r}$$

in which σ was the SD of the scores and r was the coefficient α. A 90% confidence interval (CI) was then calculated to determine the error associated with a score at a single point in time.
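
The SEM formula above, and the step from an SEM to a 90% CI, can be sketched in Python as follows. The article does not state how the interval was derived from the SEM, so the normal-theory multiplier (about 1.645 for a two-sided 90% interval) is an assumption of this sketch, as are the function names and the example SD of 10 points.

```python
import math
from statistics import NormalDist


def standard_error_of_measure(sd: float, reliability: float) -> float:
    """SEM = sd * sqrt(1 - r), with r the reliability (coefficient alpha)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def ci_half_width(sem: float, level: float = 0.90) -> float:
    """Half-width of a two-sided normal-theory interval around a single score.

    Assumption of this sketch: the CI is z * SEM, where z is the standard
    normal quantile for the requested confidence level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * sem


# Illustrative values only: an SD of 10 points with alpha = .96.
sem = standard_error_of_measure(sd=10.0, reliability=0.96)
print(round(sem, 2), round(ci_half_width(sem), 2))
```

Under this assumption, an SEM of 2.8 points corresponds to a half-width of about 4.6 points, and an SEM of 2.3 points to about 3.8 points, which agrees with the pairs of values the article reports for the ADL and sports subscales.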

Evidence of Convergent and Divergent Validity

Convergent evidence was examined by assessment of the associations between the HOS and SF-36 physical function subscale and physical component summary score by use of Pearson correlation coefficients. Divergent evidence was examined by assessment of the associations between the HOS and SF-36 mental health subscale and the mental health component summary score. Testing for differences in the correlation coefficients between the HOS and concurrent measures of physical function and mental health was done based on the equation of Meng et al.14 The a priori type I error rate was set at .005 to account for the multiple comparisons.

RESULTS

Subjects

Included in the data analysis from October 2003 to December 2004 were 507 subjects with a labral tear as their primary diagnosis. The mean subject age was 38 years (SD, 13 years; range, 13 to 66 years), with 232 male and 273 female subjects. The mean duration of symptoms was 3.4 years (SD, 5 years; range, 11 days to 29 years). The subjects' reported current level of function was normal in 3%, nearly normal in 26%, abnormal in 51%, and severely abnormal in 20%. Of the subjects, 263 (52%) underwent arthroscopic surgery. The mean length of time between surgery and completion of the questionnaires in these individuals was 6.7 months (range, 2 days to 3.86 years). With respect to comorbidities, all subjects noted that their hip condition was their primary limiting factor.

Item Response Patterns for ADL and Sports Subscales

The response patterns for the individual items are presented in Tables 1 and 2 for the ADL and sports subscales, respectively. For the ADL subscale, item 10 (getting into and out of a bath) had the highest number of nonapplicable and missing responses, because 14.4% of individuals had data that could not be scored. For the remaining items, fewer than 6% of individuals had data that could not be scored for each item. Compared with the ADL subscale, the sports subscale had a larger number of nonapplicable responses. The number of missing responses was only slightly greater for the sports subscale.

TABLE 2. Item Response Pattern for Sports Subscale

Item                                      Unable        Extreme       Moderate      Slight        No            Nonapplicable Missing
                                          to Do         Difficulty    Difficulty    Difficulty    Difficulty                  Response
Running 1 mile                            286 (56.4%)   62 (12.2%)    60 (11.8%)    41 (8.1%)     35 (6.9%)     21 (4.1%)     2 (0.4%)
Jumping                                   171 (33.7%)   96 (18.9%)    84 (16.6%)    81 (16%)      64 (12.6%)    8 (1.6%)      3 (0.6%)
Swinging objects like a golf club         82 (16.2%)    50 (9.9%)     66 (13%)      99 (19.5%)    111 (21.9%)   92 (18.1%)    7 (1.4%)
Landing                                   110 (21.7%)   89 (17.6%)    98 (19.3%)    96 (18.9%)    77 (15.2%)    22 (4.3%)     15 (3%)
Starting and stopping quickly             79 (15.6%)    118 (23.3%)   131 (25.8%)   107 (21.1%)   67 (13.2%)    3 (0.6%)      15 (3%)
Cutting/lateral movements                 105 (20.7%)   133 (26.2%)   113 (22.3%)   104 (20.5%)   33 (6.5%)     9 (1.8%)      10 (2%)
Low-impact activities like fast walking   74 (14.6%)    76 (15%)      115 (22.7%)   132 (26%)     104 (20.5%)   4 (0.8%)      2 (0.4%)
Ability to perform activity with your     116 (22.9%)   95 (18.7%)    116 (22.9%)   104 (20.5%)   61 (12%)      8 (1.6%)      7 (1.4%)
  normal technique
Ability to participate in your desired    260 (51.3%)   96 (18.9%)    64 (12.6%)    42 (8.3%)     29 (5.7%)     14 (2.8%)     2 (0.4%)
  sport as long as you would like

Assumption of Unidimensionality

PRELIS requires the use of complete data. Therefore 430 subjects (85%) and 343 subjects (68%) were used to evaluate the ADL and sports subscales, respectively. For both the ADL and sports subscales, analysis was done to assess for difference in gender, age, duration of symptoms, time between surgery and data collection, and current rating of function between subjects with no missing responses compared with the
other subjects included in the study. The α value was set at .05 but adjusted to .005 because of the 10 planned comparisons.

For the ADL subscale, a significant difference was not found for gender (P = .94), age (P = .009), duration of symptoms (P = .7), time between surgery and data collection (P = .012), and current rating of function (P = .18). For the sports subscale, a significant difference was not found for age (P = .58), duration of symptoms (P = .37), time between surgery and data collection (P = .39), and current rating of function (P = .56). A significant difference was found for gender (P < .0005), because the ratio of female subjects to male subjects was lower in the group with no missing data compared with the group with 1 or 2 missing responses.

Factor analysis of the 19-item ADL subscale indicated that the items loaded on 2 factors with eigenvalues of 12.4 and 1.2. The factor loadings of each item on the 19-item ADL subscale to the first principal component are reported in Table 3. Because 2 factors with an eigenvalue greater than 1 were produced, the factor analysis was repeated sequentially omitting item 11 (sitting) and item 3 (putting on socks and shoes). A 17-item ADL subscale loaded on 1 factor, which accounted for 68% of the variance and had an eigenvalue of 11.6. The factor loading of each item on the 17-item ADL subscale to the first principal component is found in Table 3. Scores from this modified 17-item ADL subscale were then used for constructing the item characteristic curves and test information functions.

TABLE 3. Factor Loadings of Individual Items for ADL Subscale

Item No.  Item Content                                      Factor Loading  Factor Loading
                                                            (19 Item)       (17 Item)
1         Standing for 15 min                               .82             .82
2         Getting into and out of an average car            .78             .76
3         Putting on socks and shoes                        .63             omitted
4         Walking up steep hills                            .84             .85
5         Walking down steep hills                          .84             .85
6         Going up 1 flight of stairs                       .85             .86
7         Going down 1 flight of stairs                     .86             .86
8         Going up and down curbs                           .84             .83
9         Deep squatting                                    .75             .75
10        Getting into and out of a bath                    .81             .80
11        Sitting for 15 min                                .55             omitted
12        Walking initially                                 .76             .75
13        Walking for approximately 10 min                  .86             .87
14        Walking for 15 min or greater                     .84             .85
15        Twisting/pivoting on involved leg                 .75             .74
16        Rolling over in bed                               .76             .74
17        Light to moderate work (standing and walking)     .89             .89
18        Heavy work (pushing/pulling, climbing, carrying)  .81             .82
19        Recreational activities                           .77             .77

The 9-item sports subscale loaded on 1 factor. This 1 factor accounted for 80.3% of the variance and had an eigenvalue of 7.1. The factor loadings of each item to the first principal component are found in Table 4.

TABLE 4. Factor Loadings of Individual Items of Sports Subscale

Item No.  Item Content                              Factor Loading
1         Running 1 mile                            .90
2         Jumping                                   .93
3         Swinging objects like a golf club         .85
4         Landing                                   .94
5         Starting and stopping quickly             .89
6         Cutting/lateral movements                 .88
7         Low-impact activities like fast walking   .86
8         Ability to perform activity with your     .83
            normal technique
9         Ability to participate in your desired    .87
            sport as long as you would like

Item Characteristic Curves

Inspection of the item characteristic curves for the ADL subscale revealed that all but 4 items had well-fitting curves. Those 4 items pertained to the following: (1) getting into a car, (2) going up steps, (3) going down steps, and (4) going up and down curbs. The item characteristic curve for the item pertaining to walking up hills is an example of an item that had a well-fitting item characteristic curve and is presented in Fig 1. The item characteristic curves for the items that did not have well-fitting item characteristic curves resembled that pertaining to going up and down curbs, which is shown in Fig 2. Item characteristic curves were also plotted for the 9 items on the sports subscale. All 9 had well-fitting curves similar to that displayed in Fig 1.

Test Information Function

The test information function for the modified 17-item ADL subscale and the 9-item sports subscale can be found in Fig 3. The 4 items without well-fitting item characteristic curves (getting into a car, going up steps, going down steps, and going up and down
curbs) were considered for elimination. The test information function was recalculated separately with each of these items deleted. In each case a decrease in information was noted throughout the range of ability. Therefore these 4 items were retained to maximize the instrument's precision of measurement across the range of ability.

FIGURE 1. Item characteristic curve for item 4 (walking up steep hills) on ADL subscale. This represents the appropriate item characteristic curve for an item with 5 potential responses. It should be noted that there are 5 separate curves, each with 1 peak, and together, the 5 curves span the spectrum of ability (θ). (RESP0, "unable to do" response; RESP1, "extremely difficult" response; RESP2, "moderate difficulty" response; RESP3, "slight difficulty" response; RESP4, "no difficulty at all" response.) [Plot not reproduced: x-axis, THETA, -3.2 to 3.8; y-axis, Probability of Response, 0 to 1.]

FIGURE 2. Item characteristic curve for item 8 (going up and down curbs) on ADL subscale. This represents an item characteristic curve that may be potentially unacceptable. It should be noted that there are only 4 separate curves for an item that has 5 potential responses. (RESP0, "unable to do" response; RESP1, "extremely difficult" response; RESP2, "moderate difficulty" response; RESP3, "slight difficulty" response; RESP4, "no difficulty at all" response.) [Plot not reproduced: x-axis, THETA, -3.2 to 3.8; y-axis, Probability of Response, 0 to 1.]

FIGURE 3. Test information function (TIF) for ADL and sports subscales showing their potential to provide information across the range of ability. The ADL subscale offers more information regarding function at the lower end of ability, whereas the sports subscale offers more information at the higher range of ability. [Plot not reproduced: x-axis, Ability, -2.0 to 2.0; y-axis, Information, 0 to 35; series, ADL TIF and Sports TIF.]

The 19-item ADL subscale and 9-item sports subscale can be found in Appendix 1 (online only, available at www.arthroscopyjournal.org). The ADL and sports subscales are scored separately. The item related to sitting and the item related to putting on socks and shoes are not scored. The response to each of the other 17 items on the ADL subscale is scored from 4 to 0, with 4 indicating "no difficulty" and 0 indicating "unable to do." Nonapplicable responses are not counted. The scores for each of the items are added together to obtain the item score total. The total number of items with a response is multiplied by 4 to obtain the highest potential score. If the subject answers all 17 items, the highest potential score is 68. If 1 item is not answered, then the highest score is 64; if 2 are not answered, then the highest score is 60; and so on. The item score total is divided by the highest potential score. This value is then multiplied by 100 to obtain a percentage. The sports subscale is scored in a similar manner, with the highest potential score being 36. A higher score represents a higher level of physical function for both the ADL and sports subscales. The mean score for the 17-item ADL subscale and 9-item sports subscale was 67.8 (SD, 20.5; range, 4 to 100) and 41.9 (SD, 27.8; range, 0 to 100), respectively.

Evidence of Internal Structure

The assessment of internal consistency for the 17-item ADL subscale found a coefficient α value of .96, with an SEM of 2.8 and a 90% CI of 4.6 points. For the sports subscale, the coefficient α value was .95, with an SEM of 2.3 and a 90% CI of 3.8.
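
The subscale scoring rule described in this section (item score total divided by 4 times the number of answered items, multiplied by 100) can be sketched in Python as follows. The function name, the use of `None` for a missing or nonapplicable response, and the choice to return `None` when nothing was answered are assumptions of this sketch. The caller is assumed to pass only the scored items, that is, the 17 retained ADL items or the 9 sports items.

```python
from typing import Optional, Sequence


def subscale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Percentage score for one HOS subscale.

    Each response is an integer from 4 ("no difficulty") to 0 ("unable to do").
    None marks a missing or nonapplicable response, which is not counted.
    Returns None when no item was answered (a choice made for this sketch).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    if any(r not in range(5) for r in answered):
        raise ValueError("responses must be integers from 0 to 4")
    highest_potential = 4 * len(answered)  # 68 when all 17 ADL items are answered
    return 100.0 * sum(answered) / highest_potential


# Made-up sports responses: 8 of 9 items answered, so the highest potential score is 32.
sports = [4, 3, 2, 1, 0, 4, 3, 2, None]
print(subscale_score(sports))  # 19 / 32 * 100 = 59.375
```

Because the denominator shrinks with each unanswered item, a subject who skips an item is not penalized for it, which matches the 68, 64, 60 sequence given in the text.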

  • Evidence of Convergent and Divergent Validity

    ADphsu0.7tioSFsuco0.1difADme

    HOtioareorvidbeteaoflidmeme

    thespcubehamiomfacchshhaspcocothafactigtheare

    da

    sponses, whereas the IRT and evidence for convergentansetbethothestaan3.5sigdifwidapo

    anintiodivshothsutenscaiteshtheSitvaassanthemetea

    psintclufuntheIRtioofinsfur

    preToscotha65at

    1310 R. L. MARTIN ET AL.The correlation coefficients between the 17-itemL subscale and SF-36 physical function subscale,

    ysical component summary score, mental healthbscale, and mental component summary score were6, 0.74, 0.27, and 0.18, respectively. The correla-n coefficients between the sports subscale and-36 physical function subscale, physical componentmmary score, mental health subscale, and mentalmponent summary score were 0.72, 0.68, 0.23, and, respectively. The calculated t values assessing forferences in the correlation coefficients between theL and sports subscales to measures of physical andntal functioning were significant with P .0005.

    DISCUSSION

    The results of this study offer evidence that the HOS is a valid measure of self-reported physical function for individuals with acetabular labral tears who are undergoing either arthroscopic surgical treatment or nonsurgical treatment. Specifically, this study provides evidence for internal structure and test content because the HOS represents the influence of labral tears on activity and participation across the spectrum of ability. Evidence for convergent and divergent validity was obtained because scores relate to other measures of the same construct and do not relate to measures of a different construct.

    Missing data from a self-report instrument threaten the potential accuracy and validity of the patient responses. In a clinical situation missing responses occur and, when present to a small degree, are thought to be acceptable. Item 10 (getting into and out of a bath) had a noticeably larger number of nonapplicable and missing responses. It was therefore considered for omission from the HOS. However, on the basis of its factor loading pattern, coefficient α value, and item characteristic curve, it was believed that this item should remain on the instrument. The sports subscale had a noticeably higher number of nonapplicable responses and therefore a lower number of items that could be scored. Although individuals noted their hip condition as their primary limiting factor, we have found that higher-level activities can sometimes be limited by factors other than hip pathology. A more detailed investigation of these other factors is outside of the scope of the data collected in this study. However, future studies are planned to examine this issue.

    The factor analysis performed required complete data sets without any nonapplicable or missing responses, whereas the IRT and evidence for convergent and divergent validity analyses used incomplete data sets. Analyses comparing demographic information between individuals with complete data sets and those with incomplete data sets were completed. For the ADL subscale, even though P values approached statistical significance for age (40 years v 40.7 years) and time between surgery and collection (5.9 months v 3.5 months), there was probably not much clinical significance. For the sports subscale, the significant difference in the gender distribution between those with complete data sets and those with incomplete data sets may require further analysis and, at this point, is difficult to interpret.

    The analysis of internal structure by use of factor analysis found that 2 items needed to be eliminated in order for the ADL subscale to meet the assumption of unidimensionality. Items describing the individual's ability to sit and to put on socks and shoes may represent a different domain than the other 17 items. Because this finding was somewhat surprising, a poststudy analysis of internal consistency was done with the entire 19-item ADL subscale. Internal consistency was higher when the items related to sitting and putting on socks and shoes were deleted, which confirmed that omitting these items from the ADL score was appropriate. Sitting and putting on socks and shoes may offer valuable differential diagnosis information for pain associated with femoral acetabular impingement and stiffness associated with arthritis. However, on the basis of this study, their inclusion on an instrument that assesses the influence of acetabular labral tears on functional status is questioned.

    Objective evidence for content was obtained by the psychometric procedures of IRT and the results of internal consistency. Individuals with labral tears, including those who undergo hip arthroscopy, generally function at a high level and may only be limited in their ability to participate in sports. The results of the IRT analysis offer evidence for adequate representation of items questioning activities in the higher range of ability. This may offer an advantage over other instruments currently available. To substantiate this, further study will be required.

    The coefficient α value was used to estimate the precision of a measurement at a single point in time. To help interpret these values, consider a patient who scores 70 on the ADL subscale. One can be confident that, 90% of the time, this patient will score between 65.4 and 74.6. In addition, one can be confident that, at a single point in time, individuals who score above 74.6 or below 65.4 are performing at a different level than an individual with an observed score of 70. This information can be used when evaluating whether the scores from 2 individuals are different at a single point in time.
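    The arithmetic behind this example can be sketched as follows. The sketch assumes the 90% interval is the SEM multiplied by the two-sided normal quantile 1.645; the text does not state the multiplier, but this choice reproduces the reported ±4.6 and ±3.8 points from the reported SEM values of 2.8 and 2.3. The function names are illustrative only.

```python
Z_90 = 1.645  # assumed multiplier: two-sided 90% quantile of the standard normal


def ci_half_width(sem, z=Z_90):
    """Half-width of the confidence interval around a single observed score."""
    return z * sem


def score_interval(observed, sem, z=Z_90):
    """Lower and upper bounds of the interval around an observed score."""
    half = ci_half_width(sem, z)
    return observed - half, observed + half


def differs_from(observed, other, sem, z=Z_90):
    """True when another score falls outside the interval around the observed score."""
    low, high = score_interval(observed, sem, z)
    return other < low or other > high


SEM_ADL = 2.8     # reported SEM for the ADL subscale
SEM_SPORTS = 2.3  # reported SEM for the sports subscale

low, high = score_interval(70, SEM_ADL)
print(round(ci_half_width(SEM_ADL), 1), round(ci_half_width(SEM_SPORTS), 1))  # 4.6 3.8
print(round(low, 1), round(high, 1))  # 65.4 74.6
```

    Under the same assumption, `differs_from(70, 76, SEM_ADL)` is true and `differs_from(70, 72, SEM_ADL)` is false, which matches the interpretation given above for two individuals measured at a single point in time.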

    In addition to evidence for test content and internal structure, our results provide convergent and divergent evidence of validity. As expected, the HOS was found to have relatively high correlations with concurrent measures of physical function and relatively low correlations with concurrent measures of mental health. This finding provides evidence that the HOS is a measure of physical function as opposed to mental function.

    CONCLUSIONS

    The results of this study provide evidence of validity to support the use of the HOS ADL and sports subscales in individuals with labral tears. This includes individuals who have undergone arthroscopic surgery, as well as those who have not. Specifically, the results of this study found that the HOS ADL and sports subscales were unidimensional, had adequate internal consistency, were potentially responsive across the spectrum of ability, and contributed information across the spectrum of ability. In addition, scores obtained by the HOS related to measures of function and did not relate to measures of mental health.

    REFERENCES

    1. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833-1840.

    2. Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS): Validity and responsiveness in total hip replacement. BMC Musculoskelet Disord 2003;4:10.

    3. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: Treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am 1969;51:737-755.

    4. Tugwell P, Bombardier C, Buchanan WW, Goldsmith CH, Grace E, Hanna B. The MACTAR Patient Preference Disability Questionnaire: An individualized functional priority approach for assessing improvement in physical disability in clinical trials in rheumatoid arthritis. J Rheumatol 1987;14:446-451.

    5. Wright JG, Young NL, Waddell JP. The reliability and validity of the self-reported patient-specific index for total hip arthroplasty. J Bone Joint Surg Am 2000;82:829-837.

    6. Binkley JM, Stratford PW, Lott SA, Riddle DL. The Lower Extremity Functional Scale (LEFS): Scale development, measurement properties, and clinical application. North American Orthopaedic Rehabilitation Research Network. Phys Ther 1999;79:371-383.

    7. Hunsaker FG, Cioffi DA, Amadio PC, Wright JG, Caughlin B. The American Academy of Orthopaedic Surgeons outcomes instruments: Normative values from the general population. J Bone Joint Surg Am 2002;84:208-215.

    8. Christensen CP, Althausen PL, Mittleman MA, Lee JA, McCarthy JC. The nonarthritic hip score: Reliable and validated. Clin Orthop Relat Res 2003:75-83.

    9. Messick S. Meaning and values in test validation: The science and ethics of assessment. Educ Res 1989;18:5-11.

    10. Martin RL, Irrgang JJ, Burdett RG, Conti SF, Van Swearingen JM. Evidence of validity for the Foot and Ankle Ability Measure (FAAM). Foot Ankle Int 2005;26:968-983.

    11. International classification of functioning, disability and health (ICF). Geneva: World Health Organization, 2001.

    12. Irrgang JJ, Anderson AF, Boland AL, et al. Development and validation of the international knee documentation committee subjective knee form. Am J Sports Med 2001;29:600-613.

    13. Hambleton RK, Jones RW. Comparison of classical test theory and item response theory and their applications to test development. Educ Meas Issues Pract 1993;12:38-47.

    14. Meng X, Rosenthal R, Sax G. Comparing correlation coefficients. Psychol Bull 1957;111:172-175.
