
Bond and Fox, Chapter 6 – Likert Scales

The authors spend pages 102-103 of the 2nd edition spreading what might be termed lies about Likert scaling.

They suggest that the response of 5 on a 5-point scale is taken by researchers to mean 5 times the value of the response of 1 on the same 5-point scale. No one who knows anything about measurement would presume that.

There are some assumptions underlying common usage of Likert scales that can be questioned, however.

For example,

Does a 5 response to Item 1 from an Extraversion scale represent the same amount of Extraversion as a 5 response to Item 10 from the same scale?

Does a 5 response to Item 1 represent more Extraversion than a 3 response to Item 3?

Does a 3 response to Item 1 represent more Extraversion than a 2 response to the same item?

Is the difference between a 2 and a 3 response the same as the difference between a 4 and a 5?

These are the kinds of issues that Rasch modeling attempts to address.

The relationship between Likert Scales and Right/Wrong tests

Consider the IPIP Extraversion item, “Am the life of the party.”

A person who is extraverted will agree with this item.

Thus, an agreement response is the “correct” response for Extraverts to this item.

So, if we count the number of agreement responses to the Extraversion items, that’s counting the number of “correct” responses to those items. The person with the higher count has more Extraversion, i.e., greater “ability” on the Extraversion dimension.

So, if we were to score all Likert items as Disagree=0 and Agree=1 and if we were to count the 1s to get the person’s total Extraversion score, then scoring them would be no different from scoring a Right/Wrong test by counting the 1s.

That is, a Likert item can be viewed as simply a Right/Wrong item in which the Agreement response is the correct answer.

Measurement of personality and intelligence is essentially the same??

PSY 5950 BF6 - 1

Example of scoring a personality test as Right/Wrong, i.e., treating responses as a dichotomy

The data here are taken from a study in which the IPIP and the NEO-FFI Big Five instruments were compared.

There were 189 respondents, all UTC undergraduate and graduate students. Half filled out the IPIP first and the NEO second. The order was reversed for the other half. This is the “Bias Study” referred to below.

The focus here will be on the IPIP Extraversion scale.

The IPIP Extraversion items are

1 I am the life of the party.
2 I don't talk a lot.
3 I feel comfortable around people.
4 I keep in the background.
5 I start conversations.
6 I have little to say.
7 I talk to a lot of different people at parties.
8 I don't like to draw attention to myself.
9 I don't mind being the center of attention.
10 I am quiet around strangers.

The standard SPSS stuff on the data is below. Alpha was .890. Negatively worded items were reverse-scored prior to this and all following analyses.
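For readers who want to see what reverse-scoring and alpha amount to, here is a minimal Python sketch. It is not the SPSS procedure used for this handout: the function names and the tiny data set are hypothetical, and the formula is the usual variance-based form of Cronbach's alpha.

```python
from statistics import variance

def reverse_score(response, low=1, high=5):
    """Flip a response on a low..high scale (1<->5, 2<->4, 3 stays 3)."""
    return low + high - response

def cronbach_alpha(rows):
    """rows: one list of item responses per person, all the same length."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical raw responses: 4 people, 3 items; the second item is negatively worded.
raw = [[1, 5, 2], [2, 4, 2], [4, 2, 5], [5, 1, 4]]
scored = [[r[0], reverse_score(r[1]), r[2]] for r in raw]
alpha = cronbach_alpha(scored)
print(round(alpha, 3))
```

The point of the sketch is only the order of operations: negatively worded items are flipped first, and alpha is computed on the flipped responses.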


Item-Total Statistics

        Scale Mean if    Scale Variance if    Corrected Item-Total    Cronbach's Alpha if
        Item Deleted     Item Deleted         Correlation             Item Deleted
ie1     30.63492         45.467               .631                    .879
ie2     29.87831         43.076               .703                    .874
ie3     29.73016         47.411               .577                    .883
ie4     30.15344         44.705               .717                    .874
ie5     29.73545         45.196               .711                    .875
ie6     29.71429         46.822               .581                    .883
ie7     30.14286         42.251               .690                    .875
ie8     30.84127         46.358               .569                    .884
ie9     30.26455         45.940               .524                    .887
ie10    30.71429         43.748               .623                    .881

This is the histogram of Extraversion mean-of-responses scores.

It’s approximately Unimodal and Symmetric.

Scoring as a Right/Wrong test.

To score a test as a right/wrong test, the responses must be dichotomized into Disagreement vs Agreement.

The dichotomization should be made near the middle of the response scale. I’ll actually try all possible dichotomizations here, in order to show you why the middle is better.

First, I dichotomized the item responses as 1=0 and 2,3,4,5=1 and computed the sum of 1s.
Then I dichotomized the item responses as 1,2=0 and 3,4,5=1 and computed the sum of 1s.
Next was 1,2,3=0 and 4,5=1 and computed the sum of 1s.
Finally, it was 1,2,3,4=0 and 5=1 . . . . (Boy, I had so much energy when I was young.)
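The four dichotomizations can be sketched in a few lines of Python. This is an illustration only, not the SPSS syntax actually used: the function names are made up, and the data are just the first 10 cases listed later in this handout, so the correlations it prints will not match the full-sample values.

```python
from math import sqrt

# First 10 cases of the reverse-scored IPIP Extraversion responses listed in this handout
cases = [
    [3, 3, 4, 3, 4, 3, 3, 3, 3, 3],
    [4, 5, 4, 4, 4, 4, 4, 2, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 3, 4, 3],
    [2, 4, 4, 4, 4, 4, 5, 1, 1, 2],
    [2, 2, 4, 3, 3, 3, 2, 2, 3, 2],
    [4, 3, 3, 4, 4, 2, 4, 3, 3, 4],
    [3, 5, 4, 3, 4, 4, 4, 3, 3, 2],
    [1, 2, 2, 2, 2, 2, 2, 1, 1, 2],
    [4, 5, 4, 4, 5, 5, 5, 4, 5, 4],
    [2, 4, 2, 4, 4, 2, 2, 2, 3, 2],
]

def right_wrong_score(responses, cut):
    """Score responses <= cut as 0 and responses > cut as 1, then count the 1s."""
    return sum(1 for r in responses if r > cut)

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

likert_means = [sum(c) / len(c) for c in cases]

# cut=1 is 1 vs 2,3,4,5; cut=2 is 1,2 vs 3,4,5; cut=3 is 1,2,3 vs 4,5; cut=4 is 1,2,3,4 vs 5
for cut in (1, 2, 3, 4):
    counts = [right_wrong_score(c, cut) for c in cases]
    print(cut, counts, round(pearson_r(counts, likert_means), 3))
```

Each pass through the loop produces one "right/wrong" total per person and correlates those totals with the ordinary mean-of-responses score.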

The correlations of the 4 new right/wrong dichotomies with the original gold standard summated mean-of-responses Extraversion scale scores are

Correlations with Extraversion gold standard scale scores

                                 biasid1iext   biasid2iext   biasid3iext   biasid4iext
biasiext   Pearson Correlation   .673          .900          .929          .671
           Sig. (2-tailed)       .000          .000          .000          .000
           N                     189           189           189           189

As you can see, the two “middle” dichotomizations – 1,2=0 vs 3,4,5=1 and 1,2,3=0 vs 4,5=1 – yielded “right/wrong” scale scores that were very highly correlated with the original summated scores. Dichotomizing at 1 vs 2,3,4,5 and at 1,2,3,4 vs 5 yielded scores that were less valid.

I’m not recommending that you do this. But in a pinch, you can get a fairly good estimate of a personality summated scale score from dichotomized variables treated as if they were right/wrong items, as long as the dichotomization is in the “middle” of the Likert response format.

Here’s a screen shot of the first few cases of the original responses (biasiext) and the 1,2,3=0 / 4,5=1 dichotomized responses (biasd3ext) . . .

As you can see, the scale scores based on the dichotomized items, biasd3iext, look different from the scale scores based on actual responses, biasiext. However they’re quite highly correlated.


Rasch analysis of personality items dichotomized so that they’re like right/wrong items.

Here’s the Rasch analysis of the above 1,2,3=0 vs 4,5=1 items as if they were right/wrong items. (File is BondFoxChapter6D.txt)

Item information

TABLE 14.1 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 1.49 REL.: .69 ... Item: REAL SEP.: 6.30 REL.: .98

Item STATISTICS: ENTRY ORDER
+-----------------------------------------------------------------------------------------------------------+
|ENTRY   TOTAL                  MODEL|   INFIT  |  OUTFIT  |PTMEA|EXACT MATCH|                              |
|NUMBER  SCORE  COUNT  MEASURE  S.E. |MNSQ  ZSTD|MNSQ  ZSTD|CORR.| OBS%  EXP%| Item                         |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
|     1     58    189    1.74     .21|1.01    .1| .85   -.4|  .63| 81.3  81.3| 01 Life of party             |
|     2    121    189    -.80     .20| .94   -.6|1.00    .1|  .65| 78.3  78.4| 02R Don't talk a lot         |
|     3    138    189   -1.54     .22|1.06    .6| .78   -.6|  .60| 78.3  81.4| 03 Comfortable around people |
|     4     94    189     .26     .20| .84  -1.7| .74  -1.7|  .71| 81.3  77.6| 04R Keep in background       |
|     5    137    189   -1.49     .22| .80  -1.9| .66  -1.1|  .67| 84.9  81.1| 05 Start conversations       |
|     6    140    189   -1.63     .22|1.05    .5| .81   -.5|  .59| 81.9  81.9| 06R Have little to say       |
|     7    111    189    -.40     .20| .92   -.8|1.16    .9|  .66| 79.5  77.6| 07 Talk to diff people       |
|     8     42    189    2.55     .24|1.11    .9|1.63   1.4|  .54| 84.9  84.8| 08R Don't draw attention     |
|     9     97    189     .14     .20|1.36   3.4|1.28   1.5|  .55| 63.9  77.4| 09 Don't mind being center   |
|    10     71    189    1.17     .20| .97   -.3| .83   -.8|  .66| 79.5  79.3| 10R Quiet around strangers   |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
| MEAN    88.9  166.0     .00     .21|1.01    .0| .97   -.1|     | 79.4  80.1|                              |
| S.D.    33.1     .0    1.38     .01| .15   1.4| .29   1.0|     |  5.6   2.3|                              |
+-----------------------------------------------------------------------------------------------------------+

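There is one item in this analysis that is very poorly fitting – “Don’t mind being the center of attention” (infit ZSTD = 3.4). Recall that this item also had the lowest corrected item-total correlation (.524) in the SPSS analysis, although removing it would not have increased alpha (.887 with the item deleted vs .890 overall).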

Same information as above, but ordered by Extraversion.
Top items – Respondent must have a LOT of E to endorse – lots of E required to get this “right”.
Bottom items – Respondent may endorse with just a little E – little E required to get this “right”.

TOTAL SCORE is the number of 4/5 responses after reverse-scoring of negatively worded items. An item with a small TOTAL SCORE is one that only those high in Extraversion would endorse.

TABLE 13.1 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 1.49 REL.: .69 ... Item: REAL SEP.: 6.30 REL.: .98

Item STATISTICS: MEASURE ORDER
+-----------------------------------------------------------------------------------------------------------+
|ENTRY   TOTAL                  MODEL|   INFIT  |  OUTFIT  |PTMEA|EXACT MATCH|                              |
|NUMBER  SCORE  COUNT  MEASURE  S.E. |MNSQ  ZSTD|MNSQ  ZSTD|CORR.| OBS%  EXP%| Item                         |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
|     8     42    189    2.55     .24|1.11    .9|1.63   1.4|  .54| 84.9  84.8| 08R Don't draw attention     |
|     1     58    189    1.74     .21|1.01    .1| .85   -.4|  .63| 81.3  81.3| 01 Life of party             |
|    10     71    189    1.17     .20| .97   -.3| .83   -.8|  .66| 79.5  79.3| 10R Quiet around strangers   |
|     4     94    189     .26     .20| .84  -1.7| .74  -1.7|  .71| 81.3  77.6| 04R Keep in background       |
|     9     97    189     .14     .20|1.36   3.4|1.28   1.5|  .55| 63.9  77.4| 09 Don't mind being center   |
|     7    111    189    -.40     .20| .92   -.8|1.16    .9|  .66| 79.5  77.6| 07 Talk to diff people       |
|     2    121    189    -.80     .20| .94   -.6|1.00    .1|  .65| 78.3  78.4| 02R Don't talk a lot         |
|     5    137    189   -1.49     .22| .80  -1.9| .66  -1.1|  .67| 84.9  81.1| 05 Start conversations       |
|     3    138    189   -1.54     .22|1.06    .6| .78   -.6|  .60| 78.3  81.4| 03 Comfortable around people |
|     6    140    189   -1.63     .22|1.05    .5| .81   -.5|  .59| 81.9  81.9| 06R Have little to say       |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
| MEAN    88.9  166.0     .00     .21|1.01    .0| .97   -.1|     | 79.4  80.1|                              |
| S.D.    33.1     .0    1.38     .01| .15   1.4| .29   1.0|     |  5.6   2.3|                              |
+-----------------------------------------------------------------------------------------------------------+

The item “Negation of : Don’t draw attention to myself” is the item only the most extraverted “got correct”. Most respondents agreed with this item. Only those respondents who were the most highly extraverted disagreed with it – they DO draw attention to themselves.

The item, “Negation of : Have little to say” is the item most people from the least extraverted to the most extraverted “got correct”. People even just slightly extraverted disagree with this – they DO have something to say. Only an extreme introvert would agree with this item.


Item Map of dichotomized items

TABLE 12.2 Bias Study Dichotomized (1,2,3vs4,5)Ex ZOU872WS.TXT Apr 1 19:38 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 2 CATS 1.0.0
--------------------------------------------------------------------------------

              Persons MAP OF Items
                <more>|<rare>
    3          ###### +
        .############ |
                      |T
                      |
                      |
                      |  08R Don't draw attention
                      |
                      |
                      |
                      |
    2                S+
           ########## |
                      |
                      |  01 Life of party
                      |
                      |
                      |S
                      |
                      |  10R Quiet around strangers
            ######### |
    1                 +
                      |
                      |
                      |
                      |
             ######## |
                      |
                      |  04R Keep in background
                     M|
                      |  09 Don't mind being center
    0                 +M
          ########### |
                      |
                      |
                      |  07 Talk to diff people
                      |
             ######## |
                      |
                      |  02R Don't talk a lot
                      |
   -1                 +
                      |
          .########## |
                      |
                      |S
                     S|  03 Comfortable around people  05 Start conversations
                      |  06R Have little to say
                      |
                      |
            .######## |
   -2                 +
                      |
                      |
                      |
                      |
                      |
                      |
                      |
               .##### |T
                      |
   -3          .##### +
                <less>|<frequ>
EACH '#' IS 2.

The persons on the left extend beyond the items in the high extraversion direction (upwards) so there were people who likely got all items “correct.” People also extend beyond the items in the low extraversion direction (downwards), so there were people who didn’t get many items “correct” in this dichotomization. The fact that the distribution of person “Extraversion” values extends beyond the item values is good for estimation of item Extraversion.


Comparison of person measures from the dichotomized scale vs Likert measures.

I pasted the Rasch person measures into SPSS and created a scatterplot of the Rasch person measures from the analysis of dichotomized items on the vertical axis vs the original Likert summated scores of the original responses on the horizontal axis. Note that the person measures are of the (1,2,3 vs 4,5) dichotomized values treated as wrong (1,2,3) / right (4,5) answers. Here’s the scatterplot . . .

There are multiple Likert scale scores for each Rasch measure value because a given proportion “correct” could be obtained by means of different combinations of the 10 Extraversion responses.
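A small Python sketch makes the point concrete. The two response patterns below are hypothetical (they are not from the Bias Study data), and the function names are made up for illustration. Both patterns have the same number “correct” under the 1,2,3 vs 4,5 dichotomy, so with complete data a dichotomous Rasch analysis would give them the same person measure, yet their Likert means are far apart.

```python
def dichotomized_count(responses, cut=3):
    """Number of responses above the cut (4s and 5s when cut=3)."""
    return sum(1 for r in responses if r > cut)

def likert_mean(responses):
    """Ordinary mean-of-responses scale score."""
    return sum(responses) / len(responses)

# Hypothetical response patterns for two people on the 10 Extraversion items
pattern_a = [1, 1, 1, 1, 1, 1, 1, 1, 4, 4]
pattern_b = [3, 3, 3, 3, 3, 3, 3, 3, 5, 5]

for name, pattern in (("A", pattern_a), ("B", pattern_b)):
    print(name, dichotomized_count(pattern), likert_mean(pattern))
```

Both people get 2 “right”, but one has a Likert mean of 1.6 and the other 3.4.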

The r-squared is .861, which means the r is .928, essentially the same as the correlation of the sum of “correct” responses found above (.929, p 3 of this lecture).

The correlation of Rasch measures of dichotomized items with the traditional Likert summated scale scores is very large. In spite of this, inspection of the scatterplot yields some reason not to treat them as being completely equivalent. For example, the points circled in the scatterplot all have the same Rasch score of -2.2 or so. Yet their Likert means range from 2 to 3.2, a wide range of Likert scores all for persons with the same Rasch score. The same criticism applies to the highest and lowest Rasch measures. Each single Rasch value represents a range of summated score values. This is due to the dichotomization.

So we won’t continue to dichotomize the Likert responses and analyze them as right/wrong answers. This was what is called an intellectual exercise – an exercise in “what if”.


[Scatterplot labels: horizontal axis = mean of responses to E items (regular Likert scale score); vertical axis = Rasch measures of dichotomized items (1,2,3 vs 4,5). So r = .928.]

Using all the response values in analyses of Likert items – Analysis as Dimension Scales

Start here on 3/25/15 or 4/5/17 or 4/12/17.

We hardly ever dichotomize Likert items and score them as right/wrong.

While personality items can be considered to be kind of like right/wrong items, the two types of item are not exactly the same – specifically, strength of agreement with a personality item has no direct counterpart in right/wrong answers. In right/wrong answers, you’re either right or you’re wrong. “Amount of rightness” or “amount of wrongness” is rarely measured.

To assess this strength-of-agreement aspect, most Likert items use response formats with multiple response options, each with a different value, such as 1 thru 5 or 1 thru 7. Appropriate analyses of such items must take all the information in the responses into account.

Dichotomizing as done above ignores differences in responses within the two categories. For example, a response of 1 is not the same as a response of 2 or 3, even though they were all categorized as 0 in the above scheme. On a 7-point format, a response of 7 is more agreement than a response of 5, so treating them both as simply “Agree” jus’ ain’t right.

Rasch (and general IRT) analysis can incorporate multiple responses to each item.

The ideas behind these analyses date to the seminal work of L. L. Thurstone in the 1920s.

It uses a model of respondent behavior that assumes the reading of an item results in an internal “amount” of agreement.

“I am the life of the party.”

-----------------|-----------------------|---------------------|---------------------|-----------------------

This internal “amount” is evaluated against the perceived response alternatives each of which also has an internal representation on the same continuum.

-----------------|-----------------------|---------------------|---------------------|-----------------------
        SD                   D                      N                     A                     SA

The response alternative to which the internal “amount” is closest is picked as the response.

This means that both the perception of the item and the perceptions of the response alternatives are internalized and placed on an internal continuum.

The assumption is made that there is a dividing line between the internalization of adjacent response alternatives.

These dividing lines are called thresholds in much of the literature on scaling of Likert data.


The Thurstone assumption

                T1                      T2                    T3                    T4
-----------------|-----------------------|---------------------|---------------------|-----------------------
        SD                   D                      N                     A                     SA

For example, the internal representation of “I am the life of the party” is closest to the internal representation of “Agree”, so the participant responds A (for Agree) to that item.

So, the program estimates K-1 thresholds for a K-response format. This means that for a 5-response format with SD, D, N, A, SA, it would estimate 4 thresholds.
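The threshold idea can be sketched in Python as follows. This is only an illustration of the Thurstone-style assumption described above, not what the Rasch program does internally; the threshold values and the function name are hypothetical.

```python
from bisect import bisect_right

LABELS = ["SD", "D", "N", "A", "SA"]

# Hypothetical locations of thresholds T1..T4 on the internal continuum
THRESHOLDS = [-2.0, -0.5, 0.5, 2.0]

def choose_response(internal_amount, thresholds=THRESHOLDS, labels=LABELS):
    """Return the response whose interval contains the internal amount of agreement."""
    # A K-response format needs K - 1 dividing lines.
    assert len(thresholds) == len(labels) - 1
    return labels[bisect_right(thresholds, internal_amount)]

for amount in (-3.0, -1.0, 0.0, 1.0, 5.0):
    print(amount, choose_response(amount))
```

With 5 response labels there are 4 thresholds, and the response is decided by which pair of thresholds the internal amount falls between.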

The Bond and Fox program also estimates values for the midpoints of the intervals between thresholds – the points corresponding to SD, D, N, A, and SA in the above illustration.

The program reports the threshold values as well as response values on the same scale as people and items, in case you want to use them.

The program estimates the following for Likert data . . .

1. Person “ability” values. More appropriately, person positions on the dimension.

2. Item “difficulty” values. Position of items on the same personality dimension.

Items with high numeric positions are those only endorsed by persons with the most of whatever it is that the items represent, e.g., Extraversion.

3. Threshold values relative to each item.

We won’t pay much attention to these in the hope of avoiding sensory overload. Trying to deal with 4 thresholds X 50 items, for example, would mean that we would have to try to interpret 200 different values. It’s probably the case that individual item thresholds are about the same for each item, so . . .

4. Average threshold values – we’ll consider these.

5. Average Response “marker” values – locations on the internal continuum corresponding to each response category.


[Diagram callouts: T1 thru T4 are hypothetical internal thresholds; the line is the hypothetical internal continuum; the item shown is “I am the life of the party.”]

Rasch Control file for the analysis – Bond&FoxChapter6B.txt.

The data are Extraversion items from the “Bias” study conducted several years ago. 189 persons responded to both the IPIP-50 and the NEO-FFI Big 5 questionnaires. Responses were on a 5-point scale due to the fact that the NEO questionnaire packets were set up for 5-point scales.

&INST ; initial line (can be omitted)
TITLE = "Bias Study Extraversion Items"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 5 ; column of response to first item in data record
NI = 10 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 4 ; length of person label
XWIDE = 1 ; number of columns per item response
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
STBIAS = Yes ; Adjust for estimation bias
MAXPAGE = 60 ; Maximum lines per page
IREFER = FFFFFFFFFF ; Forward and Reversed items; all items are F here
CODES = 12345 ; valid codes in data file
IVALUEF = 12345 ; Forward items
IVALUER = 54321 ; Reversed items if there are any reversed items
CLFILE = * ; category label file for category naming
1+1 "STD Strongly Disagree" ; Item 1 is a forward item
1+2 "D Disagree"
1+3 "N Neither A nor D"
1+4 "A Agree"
1+5 "STA Strongly Agree"
* ; end of CLFILE=* list
&END
01 Life of party
02R Don't talk a lot
03 Comfortable around people
04R Keep in background
05 Start conversations
06R Have little to say
07 Talk to diff people
08R Don't draw attention
09 Don't mind being center
10R Quiet around strangers
END LABELS
30013343433333     Data lines – one for each respondent.
30034544444244     Values of negatively-worded items are reverse-scored.

N = 189 for this dataset.


Traditional Analysis of these data.

Here are the first 10 cases (We’ve seen this before, so it’s included here to refresh our memories.)

origorder  ie1  ie2  ie3  ie4  ie5  ie6  ie7  ie8  ie9  ie10
    1       3    3    4    3    4    3    3    3    3    3
    2       4    5    4    4    4    4    4    2    4    4
    3       4    4    4    4    4    4    4    3    4    3
    4       2    4    4    4    4    4    5    1    1    2
    5       2    2    4    3    3    3    2    2    3    2
    6       4    3    3    4    4    2    4    3    3    4
    7       3    5    4    3    4    4    4    3    3    2
    8       1    2    2    2    2    2    2    1    1    2
    9       4    5    4    4    5    5    5    4    5    4
   10       2    4    2    4    4    2    2    2    3    2

Here’s the RELIABILITY Output

Reliability Statistics

Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
.890                .892                                            10

Summary Item Statistics

              Mean     Minimum    Maximum    Range    Maximum / Minimum    Variance    N of Items
Item Means    3.353    2.693      3.820      1.127    1.418                .183        10

Item-Total Statistics

        Scale Mean if    Scale Variance if    Corrected Item-Total    Squared Multiple    Cronbach's Alpha if
        Item Deleted     Item Deleted         Correlation             Correlation         Item Deleted
ie1     30.63            45.467               .631                    .476                .879
ie2     29.88            43.076               .703                    .566                .874
ie3     29.73            47.411               .577                    .373                .883
ie4     30.15            44.705               .717                    .542                .874
ie5     29.74            45.196               .711                    .550                .875
ie6     29.71            46.822               .581                    .428                .883
ie7     30.14            42.251               .690                    .590                .875
ie8     30.84            46.358               .569                    .434                .884
ie9     30.26            45.940               .524                    .454                .887
ie10    30.71            43.748               .623                    .458                .881

None of the items “dragged down” alpha.

Here’s a dot plot of the scale scores based on the 10 items.

Mean = 3.35
Median = 3.40
SD = 0.74
Skewness = -0.45


The scale scores were centered slightly above the midpoint (3) of the 1-5 response scale.

The distribution is skewed to the left.


Application of Rasch model to a Dimension scale – Bias study Extraversion responses

(File is Bond&FoxChapter6B.txt; negatively worded items were reverse-scored prior to entry, so all items will be treated by B&F as if they were positively-scored.)

Note that the Person reliability estimate of .85 indicates that if the sample of persons were given an equivalent set of items, we’d expect a correlation of .85 between the person measures for the two sets of items. That is, a correlation of .85 between scores of the same people using equivalent items – so a good test.

The item reliability of .98 indicates that if the same items were used on an equivalent sample of persons, the correlations between item measures for the two samples would be expected to be .98. Correlation of .98 between measures of the same items using equivalent people.

Items – Ordered by measure.

TABLE 13.1 Bias Study Extraversion Items ZOU268WS.TXT Apr 2 12:27 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 5 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 2.42 REL.: .85 ... Item: REAL SEP.: 7.03 REL.: .98

Item STATISTICS: MEASURE ORDER
+-----------------------------------------------------------------------------------------------------------+
|ENTRY   TOTAL                  MODEL|   INFIT  |  OUTFIT  |PTMEA|EXACT MATCH|                              |
|NUMBER  SCORE  COUNT  MEASURE  S.E. |MNSQ  ZSTD|MNSQ  ZSTD|CORR.| OBS%  EXP%| Item                         |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
|     8    509    189    1.14     .09| .98   -.2|1.03    .3|  .65| 52.9  48.3| 08R Don't draw attention     |
|    10    533    189     .93     .09|1.16   1.6|1.16   1.5|  .70| 46.5  47.6| 10R Quiet around strangers   |
|     1    548    189     .80     .09| .86  -1.4| .93   -.7|  .69| 50.8  48.1| 01 Life of party             |
|     9    618    189     .18     .09|1.32   2.9|1.28   2.6|  .62| 48.1  49.0| 09 Don't mind being center   |
|     4    639    189    -.01     .10| .68  -3.6| .75  -2.6|  .75| 64.2  49.8| 04R Keep in background       |
|     7    641    189    -.03     .10|1.20   1.9|1.17   1.6|  .73| 47.6  49.9| 07 Talk to diff people       |
|     2    691    189    -.52     .10|1.13   1.2|1.06    .6|  .74| 46.5  52.9| 02R Don't talk a lot         |
|     5    718    189    -.81     .10| .75  -2.5| .74  -2.6|  .75| 61.0  55.2| 05 Start conversations       |
|     3    719    189    -.82     .10| .96   -.3| .88  -1.1|  .65| 59.4  56.0| 03 Comfortable around people |
|     6    722    189    -.86     .10|1.03    .3|1.01    .2|  .65| 59.9  56.2| 06R Have little to say       |
|------------------------------------+----------+----------+-----+-----------+------------------------------|
| MEAN   627.8  187.0     .00     .10|1.01    .0|1.00    .0|     | 53.7  51.3|                              |
| S.D.    76.7     .0     .72     .00| .19   1.9| .17   1.6|     |  6.4   3.2|                              |
+-----------------------------------------------------------------------------------------------------------+

Items with large positive measures are those that only the most extraverted endorsed. That is, if a person endorsed those items, they were very extraverted.

Items with negative measures are those that moderately or even slightly extraverted respondents endorse. Note that disagreement is endorsement for negatively-worded items. So even low-extraversion participants disagreed with “Have little to say.”

Note that there are 3 items for which there is an indication that agreement to the items was not as consistent as we would like. Two of the Infit values are negative, indicating “very good, perhaps too good consistency”.

One item - #9 – has a large positive infit value. This is an item for which some high E people might disagree and low E people might agree.

Why? Perhaps the item does not measure only Extraversion. For example, #9, “Don’t mind being the center of attention,” may reflect characteristics other than extraversion, e.g., narcissism, in some people.


This information is essentially the same as we got from the analysis of the dichotomized items above. In fact, here is the scatterplot of item measures from this analysis vs. the item measures from the analysis of the 1,2,3 vs 4,5 dichotomy above . . .

Note that the item measure values are not precisely the same for the two analyses. These values are unique up to a linear transformation, as are the person measures. But they’re clearly very highly related: r = .978.


Thresholds – Average (Table 3.2 Rating (partial credit) scale)

This is new stuff.

+--------------------------------------------------------------------------
|CATEGORY    STRUCTURE   |  SCORE-TO-MEASURE   | 50% CUM.| COHERENCE|ESTIM|
| LABEL    MEASURE  S.E. | AT CAT. ----ZONE----|PROBABLTY| M->C C->M|DISCR|
|------------------------+---------------------+---------+----------+-----+
|   1      NONE          |( -3.66) -INF   -2.75|         |  64%  16%|     | 1 STD Strongly Disagree
|   2     -2.47     .12  |  -1.61  -2.75   -.82|  -2.58  |  54%  48%|  .87| 2 D Disagree
|   3      -.43     .07  |   -.16   -.82    .56|   -.68  |  36%  54%|  .97| 3 N Neither A nor D
|   4       .02     .06  |   1.57    .56   3.02|    .34  |  62%  69%| 1.05| 4 A Agree
|   5      2.87     .08  |(  4.02)  3.02  +INF |   2.93  |  76%  29%| 1.06| 5 STA Strongly Agree
+--------------------------------------------------------------------------
M->C = Does Measure imply Category?
C->M = Does Category imply Measure?

     SD                D                N                  A                SA
           -2.75               -0.82         0.56                    3.02
             |                   |             |                       |
-.----.----.----.----.----.----.----.----.----.----.----.----.----.----.----.----.-
-4       -3.0      -2.0      -1.0       0.0       1.0       2.0       3.0       4.0

In English . . .

The leftmost category label, SD, has value -3.66. The interval of “internal perceptions” that would yield a response of SD extends from –infinity to -2.75.
The D label has value -1.61. The interval of “internal perceptions” that would yield a response of D extends from -2.75 to -0.82.
The N label has value -.16. The interval of “internal perceptions” that would yield a response of N extends from -.82 to +.56.
The A label has value 1.57. The interval of “internal perceptions” that would yield a response of A extends from +.56 to +3.02.
The SA label has value 4.02. The interval of “internal perceptions” that would yield a response of SA extends from +3.02 to +infinity.

People ask me, “Mike, how can you prove that the integers 1, 2, 3, 4, 5, 6, 7 are appropriate?”

This kind of analysis is evidence that the successive integer assumption is at least approximately appropriate. Here’s a scatterplot of the estimated values above vs. the successive integers.

The plot is nearly perfectly linear.
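One way to put a number on “nearly perfectly linear” is to correlate the five estimated category values from Table 3.2 (the AT CAT. column) with the integers 1 thru 5. The Python sketch below does that; the helper function is written out by hand and is only an illustration, not part of the Rasch program's output.

```python
from math import sqrt

integers = [1, 2, 3, 4, 5]
# AT CAT. values for SD, D, N, A, SA from Table 3.2
category_values = [-3.66, -1.61, -0.16, 1.57, 4.02]

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(integers, category_values)
print(round(r, 3))  # about .996
```

A correlation this close to 1 is what the nearly straight-line plot is showing. Keep in mind that the two end values (-3.66 and 4.02) are printed in parentheses in the table, so they are the least firmly estimated of the five.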

[Scatterplot: estimated category values (vertical axis, -6 to 6) plotted against the successive integers (horizontal axis, 0.5 to 5.5).]


Items+Thresholds – Probabilities of giving each possible response vs Person measure (Table 3.2)

This is a figure in which the probability of making a given response, such as SD – the 1s in the figure below - is plotted vs. Estimated Person Extraversion values.

[Figure: CATEGORY PROBABILITIES: MODES - Structure measures at intersections. Vertical axis: PROBABILITY OF RESPONSE, .0 to 1.0. Horizontal axis: Person [MINUS] Item MEASURE (the difference between person and item value), -4 to 5. One curve is plotted for each response category, drawn with the digits 1 thru 5.]

The x-axis of this graph represents amount of Extraversion, from -4 to +5.

The probability of making each response, on the Y-axis, is plotted as a function of the amount of person extraversion.
The extreme response curves (1 and 5) should be highest at the extremes and go to zero toward the interior of the graph.
The interior response curves (2,3,4) should start small, increase to a max, then fall off again.

The 1s, 2s, 3s, 4s, and 5s curves should be ordered regularly from left to right. For example, the 2 plot should be to the left of the 3 plot, which should be to the left of the 4 plot.
If they weren’t, that would mean that people with high Extraversion were more likely to disagree than to agree with E items, something that wouldn’t make sense.

These plots are nearly exactly what they should be.
People with low E are more likely to respond 1 or 2 than to respond 4 or 5.
People with high E are more likely to respond 4 or 5 than to respond 1 or 2.
People with medium amounts of E are most likely to respond 3, just as we would expect.
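The curves can be approximated with a short Python sketch. It assumes that the STRUCTURE MEASURE column of Table 3.2 holds the Rasch-Andrich thresholds and that the Andrich rating scale model applies, which is consistent with the plot's heading (“Structure measures at intersections”) but is an assumption of this sketch. The function name is made up for illustration.

```python
from math import exp

# Assumed Rasch-Andrich thresholds: the STRUCTURE MEASURE column of Table 3.2
TAUS = [-2.47, -0.43, 0.02, 2.87]

def category_probabilities(person_minus_item, taus=TAUS):
    """Probabilities of responses 1..K under the Andrich rating scale model."""
    cumulative = [0.0]  # log-numerator for the lowest category
    for tau in taus:
        # Each step up adds (person - item - threshold) to the log-numerator.
        cumulative.append(cumulative[-1] + (person_minus_item - tau))
    numerators = [exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

for x in (-4, -2, 0, 2, 4):
    probs = category_probabilities(x)
    modal = probs.index(max(probs)) + 1
    print(x, [round(p, 2) for p in probs], "most likely response:", modal)
```

Under this model the curves for adjacent categories cross exactly at the thresholds, so, for example, responses 1 and 2 are equally likely when the person-minus-item value is -2.47.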

The program also plots item-specific measures in Table 14.3, with an estimate of “where” each response is for each item. Too much for this presentation.


For example, persons low in E should have low probability of choosing response 5. That probability should increase for persons higher in E, as the “5” curve shows.

Item Map

The item map from our study of dichotomized items is shown on the right.

TABLE 12.2 Bias Study Extraversion Items ZOU268WS.TXT Apr 2 12:27 2012
INPUT: 189 Persons 10 Items MEASURED: 189 Persons 10 Items 5 CATS 1.0.0
--------------------------------------------------------------------------------
[Side panel label: Dichotomized items]

              Persons MAP OF Items
                <more>|<rare>
    5               . +
                      |
                      |
                      |
    4              .# +
                      |
                   .# |
                     T|
    3              ## +
                   .# |
                    # |
                  ### |
    2         #######S+
                 .### |
               ###### |T
                ##### |  08R Don't draw attention
    1            .### +  10R Quiet around strangers
            .######## |S 01 Life of party
              .######M|
              ####### |  09 Don't mind being center
    0           ##### +M 04R Keep in background  07 Talk to diff people
           .######### |
              .###### |  02R Don't talk a lot
                  ###S|S 03 Comfortable around people  05 Start conversations  06R Have little to say
   -1             ### +
                   ## |
                   ## |T
                   .# |
   -2               #T+
                    # |
                    # |
                    . |
   -3                 +
                      |
                      |
                      |
   -4               . +
                <less>|<frequ>
EACH '#' IS 2.

As can be seen from this display, there are people both above and below the items.

This means that there are people so extraverted that they would probably respond 5 to all items and that there are people so unextraverted that they might respond 1 to all items.

As we saw above, the ordering of items is essentially the same for the analysis based on 5 responses per item vs. the dichotomized items.


Person information

Obviously, there are too many respondents to usefully list them all. But it is useful to identify those whose response patterns were not consistent with the assumptions of the Rasch model.

These are persons with high Extraversion on the average who too often disagreed with “Low E” items or persons who had low Extraversion on the average who too often agreed with “High E” items.

Here they are from Table 6.1. They might be persons who could be treated as outliers???!!!

Person STATISTICS: MISFIT ORDER

+-----------------------------------------------------------------------------+
|ENTRY  TOTAL              MODEL|  INFIT  | OUTFIT  |PTMEA|EXACT MATCH|       |
|NUMBER SCORE COUNT MEASURE S.E.|MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS%  EXP%| Person|
|-------------------------------+---------+---------+-----+-----------+-------|
|   102    38    10    1.22  .45|7.06  6.0|7.53  6.1|A .02|   .0  55.0| 3118  |
|    14    31    10     .01  .39|2.90  3.2|2.92  3.2|B-.11| 30.0  45.1| 3016  |
|   171    27    10    -.59  .39|2.91  3.4|2.87  3.2|C .23| 20.0  40.7| 3193  |
|   176    45    10    3.09  .60|2.73  2.8|2.37  2.5|D .32| 60.0  63.4| 3209  |
|   105    30    10    -.14  .39|2.69  3.0|2.70  3.0|E .46| 20.0  46.6| 3121  |
|   116    39    10    1.43  .47|2.68  2.6|2.52  2.4|F .25| 40.0  56.1| 3132  |
|   123    41    10    1.90  .50|2.63  2.5|2.36  2.3|G .75| 30.0  58.6| 3140  |
|    19    38    10    1.22  .45|2.49  2.4|2.60  2.5|H .54| 10.0  55.0| 3021  |
|   112    38    10    1.22  .45|2.57  2.5|2.55  2.4|I .62| 20.0  55.0| 3128  |
|    57    42    10    2.16  .52|2.54  2.4|2.06  1.9|J .77| 50.0  57.5| 3070  |
|   165    21    10   -1.53  .42|2.54  2.7|2.36  2.4|K .39| 20.0  45.3| 3187  |
|    61    33    10     .33  .40|2.49  2.6|2.51  2.6|L .37| 30.0  47.0| 3074  |
|   185    41    10    1.90  .50|2.45  2.3|2.10  1.9|M .61| 50.0  58.6| 3219  |
|   138    23    10   -1.20  .40|2.40  2.6|2.34  2.5|N .14| 40.0  45.6| 3157  |
|    15    40    10    1.65  .48|2.35  2.2|2.19  2.0|O .43| 40.0  57.4| 3017  |
|   101    27    10    -.59  .39|2.21  2.4|2.30  2.5|P-.01| 20.0  40.7| 3117  |
|   170    43    10    2.44  .54|2.18  2.0|2.26  2.2|Q .30| 50.0  59.6| 3192  |
|   156    26    10    -.74  .39|1.98  2.1|2.18  2.3|R .28| 50.0  42.4| 3178  |
|    43    30    10    -.14  .39|2.06  2.1|2.15  2.2|S-.30| 40.0  46.6| 3053  |
|    98    28    10    -.44  .39|2.12  2.3|2.05  2.1|T .18| 30.0  40.4| 3114  |
|    25    39    10    1.43  .47|2.00  1.8|1.83  1.5|U .41| 50.0  56.1| 3027  |
|   177    44    10    2.75  .57|1.94  1.8|1.75  1.6|V .67| 70.0  61.2| 3210  |
|    49    46    10    3.46  .64|1.85  1.7|1.46  1.0|W .67| 70.0  67.5| 3061  |
|   111    23    10   -1.20  .40|1.80  1.7|1.72  1.6|X .31| 50.0  45.6| 3127  |
|   129    40    10    1.65  .48|1.77  1.4|1.80  1.5|Y .53| 20.0  57.4| 3146  |
|    84    25    10    -.89  .39|1.74  1.7|1.68  1.5|Z .59| 30.0  45.5| 3100  |
| BETTER FITTING OMITTED        +---------+---------+-----|           |       |
|     5    26    10    -.74  .39| .38 -1.9| .35 -2.0|  .66| 60.0  42.4| 3006  |
|    71    27    10    -.59  .39| .36 -2.0| .33 -2.1|z .78| 60.0  40.7| 3085  |
|   153    35    10     .66  .42| .35 -1.8| .29 -2.0|y .85| 70.0  51.3| 3175  |
|   180    30    10    -.14  .39| .35 -2.0| .34 -2.0|x .80| 60.0  46.6| 3213  |
|    64    36    10     .84  .43| .34 -1.9| .29 -2.0|w .60| 70.0  54.6| 3077  |
|     1    32    10     .17  .40| .31 -2.2| .34 -1.9|v .57| 70.0  46.4| 3001  |
|    39    19    10   -1.90  .45| .26 -2.2| .34 -1.9|u .43| 90.0  53.2| 3047  |
|    67    29    10    -.29  .39| .31 -2.3| .32 -2.2|t .64| 60.0  44.1| 3081  |
|    76    41    10    1.90  .50| .28 -1.9| .30 -1.9|s .40| 90.0  58.6| 3090  |
|   122    39    10    1.43  .47| .29 -1.9| .24 -2.2|r .37| 70.0  56.1| 3139  |
|   150    36    10     .84  .43| .28 -2.2| .22 -2.3|q .80| 90.0  54.6| 3172  |
|   113    37    10    1.02  .44| .26 -2.2| .22 -2.3|p .64| 80.0  55.2| 3129  |
|   132    37    10    1.02  .44| .26 -2.2| .22 -2.3|o .64| 80.0  55.2| 3149  |
|   155    35    10     .66  .42| .25 -2.3| .21 -2.4|n .83| 90.0  51.3| 3177  |
|   148    35    10     .66  .42| .24 -2.4| .21 -2.4|m .83| 70.0  51.3| 3170  |
|    45    33    10     .33  .40| .24 -2.5| .22 -2.5|l .91| 80.0  47.0| 3055  |
|    27    40    10    1.65  .48| .23 -2.2| .22 -2.3|k .00| 90.0  57.4| 3029  |
|    48    40    10    1.65  .48| .23 -2.2| .22 -2.3|j .00| 90.0  57.4| 3060  |
|   149    40    10    1.65  .48| .23 -2.2| .22 -2.3|i .00| 90.0  57.4| 3171  |
|   183    40    10    1.65  .48| .23 -2.2| .22 -2.3|h .00| 90.0  57.4| 3217  |
|   135    26    10    -.74  .39| .22 -2.9| .20 -2.9|g .81| 80.0  42.4| 3152  |
|    44    39    10    1.43  .47| .19 -2.5| .17 -2.6|f .53| 90.0  56.1| 3054  |
|    77    39    10    1.43  .47| .19 -2.5| .17 -2.6|e .53| 90.0  56.1| 3092  |
|    35    38    10    1.22  .45| .18 -2.6| .15 -2.7|d .67| 90.0  55.0| 3043  |
|   182    23    10   -1.20  .40| .13 -3.4| .14 -3.3|c .75| 90.0  45.6| 3215  |
|     3    38    10    1.22  .45| .14 -2.9| .12 -2.9|b .72| 90.0  55.0| 3004  |
|    68    36    10     .84  .43| .10 -3.4| .10 -3.2|a .87| 90.0  54.6| 3082  |


Those persons with negative Infit ZSTD values (the lower block of the table, with infit mean squares well below 1) were actually persons for whom the model fit “too well”.


Comparison of Person measures with Likert summated scale scores

R-squared = .950, so r = .975. As we have come to expect, the Rasch measures expand the tails, identifying differences between respondents in the tails whose summated scores are closer together.

Overall, the relationship is very very strong. So I would surmise that summated scales would suffice for this scale except in those rare instances where more precise distinctions among persons in the extremes of the distribution were of interest.


5-category Rasch Measures vs Dichotomized Rasch Measures

What about the relationship of the Person measures from the analysis of 5-category responses to Person measures from the analysis of dichotomized (1,2,3 vs 4,5) data? Here’s that scatterplot.

Remember that the dichotomization process resulted in persons having identical scores on the dichotomized measure even though they had different combinations of 1, 2, 3, 4, and 5 responses.

That’s why there are multiple points arrayed vertically at each dichotomized Rasch value.
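To make the tie-producing effect of dichotomization concrete, here is a minimal Python sketch. The two response patterns are hypothetical (they are not taken from the study data), and the `dichotomize` helper is an illustrative name, not part of Winsteps or SPSS.

```python
# Hypothetical response patterns for two respondents on 10 five-point items
# (illustrative values only, not taken from the study data).
def dichotomize(responses, cut=4):
    """Recode 1-5 responses as 0 (for 1, 2, 3) or 1 (for 4, 5)."""
    return [1 if r >= cut else 0 for r in responses]

person_a = [4, 4, 4, 4, 4, 3, 3, 3, 3, 3]
person_b = [5, 5, 5, 5, 5, 1, 1, 1, 1, 1]

dich_a = dichotomize(person_a)
dich_b = dichotomize(person_b)

print(sum(person_a), sum(person_b))  # 5-category raw scores differ
print(sum(dich_a), sum(dich_b))      # dichotomized raw scores are identical
```

Because the raw score determines the Rasch person measure, these two hypothetical respondents would share one dichotomized measure but have different 5-category measures, which is the vertical stacking seen in the scatterplot.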

The r value is “only” .885, and there is noticeable curvature on the left and right. This suggests to me that these two analytic strategies are not equivalent. My money would be on the analysis of the 5-category responses.


A 7-point Likert Extraversion scale.

The study (Balanced Scale Study) was conducted in 2009 or so. Its initial purpose was to develop a questionnaire for the Big Five for which equal numbers of positively-worded and negatively-worded pairs of items could be used for each of the Big Five dimensions. Interestingly, we’ve yet to pursue that aspect of the study, although we’ve presented papers on other analyses that were done with the data. The moral of that story is “Go with the flow.”

The data are in Bond&FoxCh6C.txt.

&INST ; initial line (can be omitted)
TITLE = "Balanced Scale Extraversion Items"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 5 ; column of response to first item in data record
NI = 10 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 4 ; length of person label
XWIDE = 1 ; number of columns per item response
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
STBIAS = Yes ; Adjust for estimation bias
MAXPAGE = 60 ; Maximum lines per page
IREFER = FFFFFFFFFF ; Forward and Reversed items
CODES = 1234567 ; valid codes in data file
IVALUEF = 1234567 ; Forward items
IVALUER = 7654321 ; Reversed items
CLFILE = * ; category label file for category naming
1+1 "Completely Inaccurate"
1+2 "Very Inaccurate"
1+3 "Probably Inaccurate"
1+4 "Sometimes Acc, sometimes Inacc"
1+5 "Probably Accurate"
1+6 "Very Accurate"
1+7 "Completely Accurate"
* ; end of CLFILE=* list
&END
01 Life of party
02R Don't talk a lot
03 Comfortable around people
04R Keep in background
05 Start conversations
06R Have little to say
07 Talk to diff people
08R Don't draw attention
09 Don't mind being center
10R Quiet around strangers
END LABELS
50013336454434

206 cases total.
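As a rough illustration of what the IREFER / IVALUEF / IVALUER lines ask for, here is a Python sketch of position-by-position rescoring. It is my own paraphrase of the idea, not Winsteps code, and the names `rescore` and `score_record` are made up for the example. Note that IREFER as listed assigns every item to group F, so no item is reversed by the control file itself; presumably the R items were already reverse-scored in the data file, but that is an assumption.

```python
CODES = "1234567"
IVALUEF = "1234567"   # scoring applied to items referenced as F
IVALUER = "7654321"   # scoring applied to items referenced as R

def rescore(response, ivalue, codes=CODES):
    """Map a raw response code to its scored value, position by position."""
    return int(ivalue[codes.index(response)])

def score_record(responses, irefer):
    """Score one record; irefer holds one group letter (F or R) per item."""
    table = {"F": IVALUEF, "R": IVALUER}
    return [rescore(r, table[g]) for r, g in zip(responses, irefer)]

record = "3336454434"  # the ten responses in the first data record shown above

as_listed = score_record(record, "FFFFFFFFFF")
# Hypothetical alternative: reverse the items whose labels end in R.
hypothetical = score_record(record, "FRFRFRFRFR")

print(as_listed)
print(hypothetical)
```

On a 7-point scale the R mapping is the same as computing 8 minus the response.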


The SPSS Analysis of the Extraversion scale . . .

Reliability Statistics

Cronbach's Alpha   N of Items
      .885             10

Item-Total Statistics

        Scale Mean if   Scale Variance if   Corrected Item-Total   Cronbach's Alpha if
Item    Item Deleted    Item Deleted        Correlation            Item Deleted
e1         43.15           92.249              .575                   .877
e14        42.36           84.470              .687                   .869
e3         42.16           89.895              .660                   .872
e4         42.76           86.541              .695                   .868
e5         42.19           88.985              .703                   .869
e6         42.34           91.493              .516                   .881
e7         42.54           87.031              .606                   .875
e8         43.42           91.752              .510                   .881
e9         42.76           87.450              .635                   .873
e10        43.46           83.330              .631                   .874

The distribution of summated scores is quite nicely unimodal and symmetric. None of the items has a markedly deleterious effect on reliability.


The Rasch analysis of the original items.

Both the person reliability value of .86 (expected r between same people with different items) and the item reliability value of .97 (expected r between same items with different people) are acceptable. CAN RELIABILITY BE TOO HIGH?? As in Likert scaling????

Item statistics ordered by number

TABLE 14.1 Balanced Scale Extraversion Items ZOU007WS.TXT Apr 4 9:58 2012
INPUT: 206 Persons 10 Items MEASURED: 206 Persons 10 Items 7 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 2.46 REL.: .86 ... Item: REAL SEP.: 5.77 REL.: .97

Item STATISTICS: ENTRY ORDER

+--------------------------------------------------------------------------------------------------------+
|ENTRY    TOTAL                MODEL|  INFIT  | OUTFIT  |PTMEA|EXACT MATCH|                              |
|NUMBER   SCORE   COUNT MEASURE S.E.|MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS%  EXP%| Item                         |
|-----------------------------------+---------+---------+-----+-----------+------------------------------|
|     1     888     206     .38  .06| .82 -1.9| .85 -1.6|  .67| 40.7  35.5| 01 Life of party             |
|     2    1050     206    -.30  .07|1.10  1.0|1.04   .4|  .70| 37.7  37.6| 02R Don't talk a lot         |
|     3    1092     206    -.50  .07| .78 -2.4| .74 -2.7|  .68| 45.1  38.0| 03 Comfortable around people |
|     4     968     206     .06  .06| .79 -2.2| .82 -1.9|  .72| 47.5  36.7| 04R Keep in background       |
|     5    1085     206    -.47  .07| .69 -3.4| .66 -3.7|  .72| 47.1  37.9| 05 Start conversations       |
|     6    1054     206    -.32  .07|1.23  2.2|1.27  2.4|  .59| 40.2  37.7| 06R Have little to say       |
|     7    1013     206    -.14  .07|1.20  2.0|1.24  2.2|  .65| 34.3  37.8| 07 Talk to diff people       |
|     8     833     206     .60  .06|1.11  1.2|1.19  1.9|  .60| 30.4  34.1| 08R Don't draw attention     |
|     9     969     206     .05  .06| .97  -.3| .95  -.4|  .69| 37.3  36.7| 09 Don't mind being center   |
|    10     825     206     .63  .06|1.28  2.7|1.26  2.5|  .70| 33.3  33.5| 10R Quiet around strangers   |
|-----------------------------------+---------+---------+-----+-----------+------------------------------|
|  MEAN   963.7   204.0     .00  .07|1.00  -.1|1.00  -.1|     | 39.4  36.5|                              |
|  S.D.    94.5      .0     .40  .00| .20  2.1| .22  2.2|     |  5.6   1.6|                              |
+--------------------------------------------------------------------------------------------------------+

Wow – many of the items – most of the items – exhibit unusual fit statistics.

The negative Infit ZSTD values (MNSQ below 1) indicate “very good, perhaps too good” fit by the model.

The positive Infit ZSTD values (MNSQ above 1) indicate poor fit by the model.


Item Statistics ordered by Measure

TABLE 13.1 Balanced Scale Extraversion Items ZOU007WS.TXT Apr 4 9:58 2012
INPUT: 206 Persons 10 Items MEASURED: 206 Persons 10 Items 7 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 2.46 REL.: .86 ... Item: REAL SEP.: 5.77 REL.: .97

Item STATISTICS: MEASURE ORDER

+--------------------------------------------------------------------------------------------------------+
|ENTRY    TOTAL                MODEL|  INFIT  | OUTFIT  |PTMEA|EXACT MATCH|                              |
|NUMBER   SCORE   COUNT MEASURE S.E.|MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS%  EXP%| Item                         |
|-----------------------------------+---------+---------+-----+-----------+------------------------------|
|    10     825     206     .63  .06|1.28  2.7|1.26  2.5|  .70| 33.3  33.5| 10R Quiet around strangers   |
|     8     833     206     .60  .06|1.11  1.2|1.19  1.9|  .60| 30.4  34.1| 08R Don't draw attention     |
|     1     888     206     .38  .06| .82 -1.9| .85 -1.6|  .67| 40.7  35.5| 01 Life of party             |
|     4     968     206     .06  .06| .79 -2.2| .82 -1.9|  .72| 47.5  36.7| 04R Keep in background       |
|     9     969     206     .05  .06| .97  -.3| .95  -.4|  .69| 37.3  36.7| 09 Don't mind being center   |
|     7    1013     206    -.14  .07|1.20  2.0|1.24  2.2|  .65| 34.3  37.8| 07 Talk to diff people       |
|     2    1050     206    -.30  .07|1.10  1.0|1.04   .4|  .70| 37.7  37.6| 02R Don't talk a lot         |
|     6    1054     206    -.32  .07|1.23  2.2|1.27  2.4|  .59| 40.2  37.7| 06R Have little to say       |
|     5    1085     206    -.47  .07| .69 -3.4| .66 -3.7|  .72| 47.1  37.9| 05 Start conversations       |
|     3    1092     206    -.50  .07| .78 -2.4| .74 -2.7|  .68| 45.1  38.0| 03 Comfortable around people |
|-----------------------------------+---------+---------+-----+-----------+------------------------------|
|  MEAN   963.7   204.0     .00  .07|1.00  -.1|1.00  -.1|     | 39.4  36.5|                              |
|  S.D.    94.5      .0     .40  .00| .20  2.1| .22  2.2|     |  5.6   1.6|                              |
+--------------------------------------------------------------------------------------------------------+

We’ll see whether the item difficulty measures from this analysis correlate with those from the previous analysis of the same scale using different people.

Consider the item, “Quiet around strangers.” It’s reverse-scored, so persons who are extraverted would disagree with it. Based on its measure of 0.63, only the most highly extraverted respondents disagreed with it.

Consider the item, “Comfortable around people.” It’s positively-scored, so persons who are extraverted will agree with it. Based on its measure of -.50, most respondents agreed with it. Only the least extraverted would disagree with it.


Average Thresholds (Output Tables -> 3.2 Rating (partial credit) scale to get both displays)

+--------------------------------------------------------------------------
|CATEGORY   STRUCTURE  | SCORE-TO-MEASURE      | 50% CUM.| COHERENCE |ESTIM|
| LABEL  MEASURE  S.E. | AT CAT.  ----ZONE---- |PROBABLTY| M->C C->M |DISCR|
|----------------------+-----------------------+---------+-----------+-----+
|   1     NONE         |( -2.79)  -INF   -2.18 |         | 100%   1% |     | 1 Completely Inaccurate
|   2    -1.29    .14  |  -1.58  -2.18   -1.19 |  -1.80  |  39%  24% |  .55| 2 Very Inaccurate
|   3     -.93    .09  |   -.85  -1.19    -.51 |  -1.12  |  23%  29% |  .87| 3 Probably Inaccurate
|   4    -1.17    .07  |   -.13   -.51     .28 |   -.66  |  46%  42% |  .98| 4 Sometimes Acc sometimes Inacc
|   5      .74    .06  |    .70    .28    1.13 |    .39  |  31%  52% | 1.08| 5 Probably Accurate
|   6      .81    .06  |   1.65   1.13    2.43 |   1.04  |  42%  43% | 1.03| 6 Very Accurate
|   7     1.85    .08  |(  3.17)  2.43    +INF |   2.14  |  82%  29% | 1.13| 7 Completely Accurate
+--------------------------------------------------------------------------
M->C = Does Measure imply Category?
C->M = Does Category imply Measure?

Below is a map of the Rasch Response values, with labels to help show how they correspond to the Likert values.

The Likert response values seen by respondents are successive integers, with equal distances between adjacent values.

On the figure below, I’ve attempted to place the labels “CI”, “VI”, etc at locations corresponding to their estimated values.

[Figure: number line running from -4.0 to 4.0 logits, with the response labels CI (1), VI (2), PI (3), S (4), PA (5), VA (6), and CA (7) placed at their estimated Rasch values.]

Note that the ordering of the label estimates: -2.79 < -1.58 < -0.85 < -0.13 < +0.70 < +1.65 < +3.17 is what we would expect.

The estimated differences between labels are 1.21 0.73 0.72 0.83 0.95 1.52

These differences are not exactly equal as is assumed by the labels seen by respondents.

But the sample size is only 206 and the differences between values of adjacent labels are “roughly” equal, close enough to being equal to justify the “equal intervals” assumption of Likert scale scoring.
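The differences between adjacent labels can be reproduced directly from the "AT CAT." column of the category table. A minimal Python sketch follows; the variable names are my own.

```python
# "AT CAT." score-to-measure values for the 7 categories, from the table above.
labels = ["CI", "VI", "PI", "S", "PA", "VA", "CA"]
at_cat = [-2.79, -1.58, -0.85, -0.13, 0.70, 1.65, 3.17]

# Distance between each pair of adjacent categories, rounded to 2 decimals.
gaps = [round(b - a, 2) for a, b in zip(at_cat, at_cat[1:])]

for (lo, hi), gap in zip(zip(labels, labels[1:]), gaps):
    print(f"{lo} -> {hi}: {gap:.2f}")
```

The two end gaps (1.21 and 1.52) are the largest, which is the "stretching" at the extremes; the four interior gaps all fall between 0.72 and 0.95.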

Plot of Rasch response values vs. Likert successive integer “presented to respondent” response values

The linearity of the relationship and the fairly uniform locations of the Rasch response values certainly support the common practice of calling the labels “1”, “2”, “3”, etc. (See 5130 lecture on Scale construction.)


Probability of use of each response . . .

[Figure: CATEGORY PROBABILITIES: MODES - Structure measures at intersections. Winsteps character plot of the probability of each response category (curves labeled 1 through 7; Y-axis PROBABILITY OF RESPONSE from .0 to 1.0) against Person [MINUS] Item MEASURE (X-axis from -3 to 4).]

Note that the X-axis of the graph is Person measure MINUS Item measure.

So if a person’s measure was below the item’s measure – the person was less extraverted than the item, the X-axis value would be negative.

If a person’s measure was equal to an item’s measure, the X-axis value would be 0.

And if a person’s measure was greater than the item’s, the X-axis value would be positive.

If a person was less extraverted than an item, we’d expect that person to disagree with the item - to respond 1 or 2.

If a person was about as extraverted as the item, we’d expect the response to be in the middle of the range – 3 or 4 or 5.

If the person was more extraverted than the item, we’d expect the response to be high – 6 or 7.

Although it’s difficult to follow the curves for each response category on this map, it appears that the probabilities of using the different response categories make sense. The highest probability response for a person whose extraversion was 3 units below the item’s measure was 1. The highest probability response for a person whose E was at the item’s level was 4. The highest probability of response of those 3 units above the item’s measure was 7. Wow!! This is great.
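Since the character plot is hard to read, the same curves can be sketched numerically. The Python below uses the standard rating scale model formula with the six "STRUCTURE MEASURE" values from the category table, treated here as Andrich thresholds. That reading of the column, and the function names, are my assumptions; this is a sketch, not the Winsteps computation.

```python
import math

# Structure measures for categories 2..7 from the 7-category table above,
# treated here as Andrich thresholds (an assumption about that column).
THRESHOLDS = [-1.29, -0.93, -1.17, 0.74, 0.81, 1.85]

def category_probabilities(x, thresholds=THRESHOLDS):
    """Rating scale model probabilities for categories 1..7.

    x is the person measure minus the item measure, in logits.
    """
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (x - tau))
    top = max(logits)                      # subtract the max for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def modal_category(x):
    """Most probable response category (1..7) at person-minus-item value x."""
    probs = category_probabilities(x)
    return probs.index(max(probs)) + 1

for x in (-3, 0, 3):
    print(x, modal_category(x))
```

Under these assumptions the most probable response is 1 at three units below the item, 4 at the item's level, and 7 at three units above, matching what the plot shows.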

Thanks, participants!!


Item map (Output Tables -> 12. Item map)

TABLE 12.2 Balanced Scale Extraversion Items ZOU007WS.TXT Apr 4 9:58 2012
INPUT: 206 Persons 10 Items MEASURED: 206 Persons 10 Items 7 CATS 1.0.0
--------------------------------------------------------------------------------

        Persons MAP OF Items
          <more>|<rare>
  4           #  +
                 |
                 |
             .#  |
                 |
  3           #  +
              #  |
                 |
              # T|
            ###  |
  2           #  +
             ##  |
            ### S|
        #######  |
         .#####  |
  1    .#######  +
     .#########  |T
         ###### M|   08R Don't draw attention   10R Quiet around strangers
     ##########  |S  01 Life of party
       .#######  |
  0  .#########  +M  04R Keep in background   09 Don't mind being center
     ##########  |   07 Talk to diff people
          .#### S|S  02R Don't talk a lot   03 Comfortable around people   05 Start conversations   06R Have little to say
           ####  |
            ###  |T
 -1          .#  +
              #  |
             .# T|
              .  |
                 |
 -2              +
          <less>|<frequ>
 EACH '#' IS 2.

As was the case with the Bias data above, there are persons with less Extraversion than the lowest E item and there are persons with more Extraversion than the item with the largest measure of E.


Person statistics

As above, it’s not possible to display all the person measures. I won’t display the person information in misfit order to save space.

The relationship of Rasch measures on the Y-axis to summated scores on the X-axis.

The “stretching” of the extremely high scores for the Rasch measures is apparent here. There is no such “stretching” of the low scores, perhaps because there was no one in this sample who scored lower than a mean of 2 on the 1 to 7 scale.

So the Rasch measures are telling us that there are a few extraordinarily extraverted people in the sample, something the “compressed-tail” Likert measures didn’t tell us about.


Comparison of item measures

Since we have item measures for the same items from two studies, with different response formats (1 to 5 vs 1 to 7), it is of interest to compare them.

Here are item measures from the 1-7 scale vs item measures from the 1-5 scale (taken from the above tables) . . .

Clearly, as we would hope, the items that represent the most extraversion when a 5-point response scale is used are the same items that represent the most extraversion when a 7-point scale is used. The r-squared is .957, which means the r is .978. Recall that the estimated item reliabilities from the two analyses were .98 and .97.

This is evidence for the equivalence of 5 vs 7-point scales. (We had a negative review from a referee of a paper saying that using a 7-point scale introduced “bias” into the measurements. He/she was wrong.)


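[Figure: scatterplot of item measures from the 1-7 scale against item measures from the 1-5 scale; axes labeled “1-7” and “1-5”.]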

INFIT and inconsistency

Infit

Infit stands for “Inlier-sensitive or Information-weighted fit.” It is reported as a mean square.

It is based on the squared differences between observed responses (0 or 1 in right/wrong tests; response value in Likert scales) and the values expected under the Rasch model (the probability of a correct response in right/wrong tests; the expected response value in Likert scales). So infit is based on squared residuals.

Through computational magic, the mean of the quantity computed for data for which the Rasch model is an appropriate fit is 1.0. (This is based on chi-square divided by degrees of freedom computations.)

Values of Raw Infit about equal to 1 represent goodness of fit that would be expected if the model were true with the exception of random variability.

Values of Raw infit larger than 1 represent differences between actual responses and predicted probabilities that are greater than would be expected by chance.

Values of Raw infit less than 1 represent differences that are smaller than would be expected by chance, perhaps representing goodness of fit that is “too good.”

The Z (sometimes it’s called t) statistic that is printed is essentially the infit value for a person minus 1, divided by an estimate of its standard deviation. So

ZInfit ≈ 0: Goodness of fit for a person that would be expected.

ZInfit >> 0: Goodness of fit that is poor; misfit larger than expected.

ZInfit << 0: Goodness of fit that might be “too good”.


Inconsistency

In our research program here at UTC, we’ve been investigating a concept we call response inconsistency. (James Nauert’s thesis determined that response inconsistency is stable over time.) This represents differences among the responses a participant makes to items from the same personality dimension.

For a single dimension, our measure of inconsistency is computed as the standard deviation of responses of a person to the items from that dimension.

In our research we’ve computed the inconsistency for each of the five dimensions of the Big 5 and then averaged them to get a measure that had acceptable reliability (alpha around .7 or so).
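The inconsistency computation just described can be sketched as follows. The responses are hypothetical, only three items per dimension are shown for brevity, and the use of the sample (n - 1) standard deviation is an assumption, since the notes do not say which form was used.

```python
import statistics

def inconsistency(responses_by_dimension):
    """Mean of the within-dimension standard deviations for one person.

    Assumes the sample (n - 1) standard deviation; reverse-worded items
    are assumed to have been reverse-scored already.
    """
    sds = [statistics.stdev(items) for items in responses_by_dimension.values()]
    return sum(sds) / len(sds)

# Hypothetical respondents, three items per Big Five dimension.
consistent = {"E": [5, 5, 5], "A": [6, 6, 6], "C": [4, 4, 4],
              "S": [3, 3, 3], "O": [5, 5, 5]}
inconsistent = {"E": [1, 7, 4], "A": [2, 6, 7], "C": [7, 1, 3],
                "S": [1, 5, 7], "O": [6, 2, 7]}

print(inconsistency(consistent))
print(round(inconsistency(inconsistent), 2))
```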

We’ve found that inconsistency measured as we compute it is related to reliability. If we split a sample into inconsistent responders vs. consistent responders, the reliabilities of scales computed from the consistent responders are significantly larger than those of the inconsistent responders.

We’ve also found that inconsistency measured as we do is related to validity. Validity of conscientiousness as a predictor of UGPA is larger for consistent responders than it is for inconsistent responders.

Reddock, C. R., Biderman, M. D., & Nguyen, N. T. (2011). The relationship of reliability and validity of personality tests to frame-of-reference instructions and within-person inconsistency. International Journal of Selection and Assessment, 19, 119-131.

Biderman, M. D., & Reddock, C. M. (2012). The relationship of scale reliability and validity to respondent inconsistency. Personality and Individual Differences, 52, 647-651.

Khan, R. (2012). An examination of two methods of measuring inconsistency. MS Thesis. University of Tennessee at Chattanooga.

Khan, R., & Biderman, M. D. (2013). Behavioral variability measured using Likert and frequency based response formats. Paper presented at the 28th annual conference of The Society for Industrial and Organizational Psychology, Houston, TX, April.

Nauert, J. (2017). Stable Inconsistency: A Study of Response Inconsistency over Time. MS Thesis – University of Tennessee at Chattanooga.

Look at inconsistency across dimensions – especially Agreeableness. Thanks, Aaron.


Is person infit computed from Rasch analyses related to inconsistency computed from Likert analyses?

I performed Rasch analysis of the data of each of the Big Five scales from the Balanced Scale study.

For each analysis, I copied the Z infit measures for all 206 respondents and pasted them into SPSS. I then computed the mean of the five Z infit measures – for E, A, C, S, and O.

Next I computed the SD of raw responses to items within each dimension and averaged those 5 standard deviations – E, A, C, S, and O to get a mean Likert measure of Inconsistency.

Finally, I correlated the mean of the within-dimension standard deviations (our Likert measure of inconsistency) with the mean of the Z infit measures. Here’s the result . . .

The r-squared is .751, meaning the correlation between the two measures is .867. N = 206.
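For anyone repeating this outside SPSS, here is a minimal Python sketch of the correlation step. The `pearson_r` helper is my own, and the two short lists are hypothetical stand-ins for the mean Z infit and mean inconsistency columns; only the .751 figure comes from the analysis above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for five respondents (not the study data).
mean_infit = [-1.2, -0.4, 0.1, 0.9, 2.0]
mean_inconsistency = [0.8, 1.1, 1.3, 1.4, 2.1]

r = pearson_r(mean_infit, mean_inconsistency)
print(round(r, 3), round(r ** 2, 3))

# Converting the reported r-squared back to r (the sign comes from the slope,
# which is positive here).
print(round(math.sqrt(0.751), 3))
```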

So the Rasch analyses and the Likert analyses are connected in this way.


[Figure: scatterplot of Mean Infit against Mean Inconsistency.]

Does Rasch Conscientiousness predict GPA better than Likert Conscientiousness?

A combined sample of N=1143.

We have GPA information on all participants. (No names, just GPAs.)

The correlation of Likert Conscientiousness scales with GPA is .180 (p < .001), about as would be expected.

The question is, if we measured Conscientiousness as a Rasch measure, would it be more (or less or equally) valid than the Likert measure?


Here’s the Bond&Fox Steps data file – the first part of it, at least.

This file is: Bond&FoxChapter6GCombo.txt in Bond&Fox data folder

&INST ; initial line (can be omitted)
TITLE = "RosOPQFORBalSc Conscientiousness Items"
PERSON = Person ; persons are ...
ITEM = Item ; items are ...
ITEM1 = 5 ; column of response to first item in data record
NI = 10 ; number of items
NAME1 = 1 ; column of first character of person label
NAMELEN = 4 ; length of person label
XWIDE = 1 ; number of columns per item response
TOTAL = Yes ; show total raw scores
CHART = Yes ; produce across-pathway picture
MNSQ = No ; use Standardized fit statistics
STBIAS = Yes ; Adjust for estimation bias
MAXPAGE = 60 ; Maximum lines per page
IREFER = FFFFFFFFFF ; Forward and Reversed items
CODES = 1234567 ; valid codes in data file
IVALUEF = 1234567 ; Forward items
IVALUER = 7654321 ; Reversed items
CLFILE = * ; category label file for category naming
1+1 "Completely Inaccurate" ; Item 1 is a forward item
1+2 "Very Inaccurate"
1+3 "Probably Inaccurate"
1+4 "Sometimes Acc, sometimes Inacc"
1+5 "Probably Accurate"
1+6 "Very Accurate"
1+7 "Completely Accurate"
* ; end of CLFILE=* list
&END
1 Am always prepared
2R Leave my belongings around
3 Pay attention to details
4R Make a mess of things
5 Get chores done right away
6R Often forget to put things back
7 Like order
8R Shirk my duties
9 Follow a schedule
10 Am exacting in my work
END LABELS
60275544334434
60366746575766


The first thing to do is to do the usual checks of item appropriateness.

So Output Tables -> 13. Item: Measure

TABLE 13.1 RosOPQFORBalSc Conscientiousness Items ZOU507WS.TXT Mar 27 19:31 2014
INPUT: 1143 Persons 10 Items MEASURED: 1143 Persons 10 Items 7 CATS 1.0.0
--------------------------------------------------------------------------------
Person: REAL SEP.: 1.97 REL.: .79 ... Item: REAL SEP.: 9.87 REL.: .99

Item STATISTICS: MEASURE ORDER

+--------------------------------------------------------------------------------------------------------------+
|ENTRY    TOTAL                MODEL|  INFIT  | OUTFIT  |PTMEA|EXACT MATCH|                                    |
|NUMBER   SCORE   COUNT MEASURE S.E.|MNSQ ZSTD|MNSQ ZSTD|CORR.| OBS%  EXP%| Item                               |
|-----------------------------------+---------+---------+-----+-----------+------------------------------------|
|     5    4623    1143     .61  .03| .93 -1.7| .95 -1.2|  .65| 36.3  33.8| 5 Get chores done right away       |
|     6    5108    1143     .28  .03|1.17  4.0|1.17  3.9|  .67| 31.7  34.9| 6R Often forget to put things back |
|     2    5373    1143     .10  .03|1.33  7.3|1.32  6.9|  .62| 29.2  36.0| 2R Leave my belongings around      |
|     8    5412    1142     .07  .03| .96 -1.0|1.09  2.0|  .49| 37.8  35.8| 8R Shirk my duties                 |
|    10    5539    1143    -.02  .03| .87 -3.3| .93 -1.6|  .51| 42.8  36.0| 10 Am exacting in my work          |
|     9    5608    1143    -.08  .03|1.09  2.1|1.08  2.0|  .61| 36.7  35.8| 9 Follow a schedule                |
|     1    5663    1143    -.12  .03| .66 -9.3| .68 -8.6|  .63| 44.8  36.2| 1 Am always prepared               |
|     4    5706    1143    -.15  .03| .96  -.9| .99  -.2|  .59| 40.3  35.9| 4R Make a mess of things           |
|     7    5879    1143    -.29  .03| .94 -1.4| .93 -1.7|  .63| 40.6  36.8| 7 Like order                       |
|     3    6032    1143    -.41  .03| .97  -.8|1.00   .1|  .50| 40.7  36.8| 3 Pay attention to details         |
|-----------------------------------+---------+---------+-----+-----------+------------------------------------|
|  MEAN  5473.3  1139.9     .00  .03| .99  -.5|1.01   .2|     | 38.1  35.8|                                    |
|  S.D.   381.5      .3     .28  .00| .17  4.2| .16  3.9|     |  4.6    .8|                                    |
+--------------------------------------------------------------------------------------------------------------+

As we discovered previously, some of the items have poor fit measures.

Only the most Conscientious respondents endorsed “Get chores done right away.”

Even the least Conscientious respondents endorsed “Pay attention to details.”

Output Tables -> 3.2 Rating (partial credit) scale

The estimates of the 7 response “values” were reasonable . . .

+--------------------------------------------------------------------------
|CATEGORY   STRUCTURE  | SCORE-TO-MEASURE      | 50% CUM.| COHERENCE |ESTIM|
| LABEL  MEASURE  S.E. | AT CAT.  ----ZONE---- |PROBABLTY| M->C C->M |DISCR|
|----------------------+-----------------------+---------+-----------+-----+
|   1     NONE         |( -2.91)  -INF   -2.24 |         | 100%   1% |     | 1 CI: Completely Inaccurate
|   2    -1.52    .08  |  -1.58  -2.24   -1.16 |  -1.90  |  41%   8% |  .92| 2 VI: Very Inaccurate
|   3     -.76    .04  |   -.81  -1.16    -.47 |  -1.06  |  28%  21% |  .91| 3 PI: Probably Inaccurate
|   4    -1.03    .03  |   -.12   -.47     .26 |   -.59  |  43%  45% | 1.02| 4 S: Sometimes Acc sometimes Inacc
|   5      .57    .02  |    .67    .26    1.11 |    .32  |  31%  56% | 1.02| 5 PA: Probably Accurate
|   6      .80    .02  |   1.65   1.11    2.48 |   1.02  |  42%  36% | 1.04| 6 VA: Very Accurate
|   7     1.94    .03  |(  3.24)  2.48    +INF |   2.20  |  75%  17% | 1.00| 7 CA: Completely Accurate

[Figure: number line labeled “Rasch Dimension” running from -4.0 to 4.0 logits, with the response labels CI (“1”), VI (“2”), PI (“3”), S (“4”), PA (“5”), VA (“6”), and CA (“7”) placed at their estimated Rasch values.]

Here’s a plot of the Likert response values vs. the Rasch response values.

So, again, there is a little bit of stretching of the response scale at each end.


Comparison of distribution shapes

Clearly the two distribution shapes are different. The Likert scale score distribution is slightly negatively skewed.

The Rasch measure distribution is positively skewed. Which is more appropriate? “truer”? Rasch adherents would say the Rasch distribution is likely closer to the true distribution, though no one has actually seen that “true” distribution.


Correlation of individual Rasch person vs. Likert Person measures.

The “stretching” of the extremes is quite apparent in the graph, in particular the “high conscientiousness” extreme, as would have been expected from the

So, which of the two ways of measuring Conscientiousness is better for prediction of GPA?

Correlations

GPA (Ros:EOYGPA; OPQ:EOSGPA; FOR:EOSGPA)

                        corig    crasch
Pearson Correlation      .180      .158
Sig. (2-tailed)          .000      .000
N                        1143      1143

The original measure appears to be the winner here.

WHAT’S GOING ON?


Note, however, that the distribution of GPAs is negatively skewed.

So the distribution of the Y variable has the same shape as the distribution of corig, but has a very different shape from the distribution of crasch.


Here are the two scatterplots.

Note that the relationship of gpa with both corig and crasch is curvilinear, particularly with crasch. This may mean that the criterion measure is not representing the actual “performance” of students appropriately – particularly the students with high conscientiousness. What would be needed is a Rasch measure of gpa. Hmm.


Revisiting this issue, including a quadratic component in the prediction of GPA

The SPSS syntax to create quadratic polynomial components.

compute crasch2 = crasch**2.
compute corig2 = corig**2.
regression variables = gpa corig corig2 /dep=gpa /enter /save= pred(rawpred).
regression variables = gpa crasch crasch2 /dep=gpa /enter /save= pred(raschpred).
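For readers without SPSS, the same quadratic regression can be sketched in plain Python. This is an illustration of the technique (ordinary least squares on x and x squared, solved through the normal equations), not a reproduction of the SPSS run; the `fit_quadratic` name and the demonstration data are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Ordinary least squares for y = b0 + b1*x + b2*x**2.

    Builds the normal equations and solves them by Gauss-Jordan
    elimination with partial pivoting. Returns [b0, b1, b2].
    """
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

# Hypothetical demonstration data generated from a known quadratic.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.0 + 0.5 * x - 0.04 * x * x for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

With real data, `xs` would hold corig (or crasch) and `ys` would hold gpa; a negative b2 corresponds to the flattening of the curve at high Conscientiousness.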

The results . . .

Regression of GPA onto summated C and summated C².

Model Summary (b)

Model    R          R Square   Adjusted R Square   Std. Error of the Estimate
1        .190 (a)   .036       .034                .6456

a. Predictors: (Constant), corig2, corig
b. Dependent Variable: GPA

Coefficients (a)

             Unstandardized Coefficients   Standardized Coefficients
Model        B          Std. Error         Beta                        t         Sig.
(Constant)   1.431      .438                                           3.270     .001
corig         .516      .183                .658                       2.814     .005
corig2       -.039      .019               -.482                      -2.059     .040

a. Dependent Variable: GPA

Regression of GPA onto Rasch C & Rasch C².

Model Summary (b)

Model    R          R Square   Adjusted R Square   Std. Error of the Estimate
1        .188 (a)   .035       .034                .6458

a. Predictors: (Constant), crasch2, crasch
b. Dependent Variable: GPA

Coefficients (a)

             Unstandardized Coefficients   Standardized Coefficients
Model        B          Std. Error         Beta                        t         Sig.
(Constant)   2.891      .024                                           120.205   .000
crasch        .230      .038                .287                       6.126     .000
crasch2      -.045      .013               -.165                      -3.520     .000

a. Dependent Variable: GPA

So, when the quadratic nature of the relationship of gpa to C is taken into account, summated C and Rasch C yield virtually identical predictions of gpa.
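To see what the two fitted equations imply, the unstandardized coefficients from the tables can be plugged into a short Python sketch. The function names are my own, and the assumption that corig is a mean item score on the 1 to 7 response scale is mine as well; the notes do not state its metric.

```python
def predict_from_summated(corig):
    """Predicted GPA from the summated-scale quadratic (coefficients from the table)."""
    return 1.431 + 0.516 * corig - 0.039 * corig ** 2

def predict_from_rasch(crasch):
    """Predicted GPA from the Rasch-measure quadratic (coefficients from the table)."""
    return 2.891 + 0.230 * crasch - 0.045 * crasch ** 2

# Assumes corig is a mean item score on the 1-7 response scale.
for c in (3, 4, 5, 6, 7):
    print(c, round(predict_from_summated(c), 3))

# crasch is in logits.
for m in (-1, 0, 1, 2, 3):
    print(m, round(predict_from_rasch(m), 3))
```

In both equations the negative squared term means the predicted GPA rises more slowly as Conscientiousness increases.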
