
COGNITIVE ABILITIES AND INFORMATION SYSTEM USABILITY

BRYCE ALLEN

Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A.

(Received 12 October 1992; accepted in final form 1 March 1993)

Abstract: Two experiments were undertaken to determine how cognitive abilities of users of information systems and specific design features that might be implemented in information technology combine to create system usability. In one case, an interaction was found between logical reasoning and order of presentation of references. The cognitive ability was found to combine with system characteristics in a non-uniform manner. The existence of this interaction between user characteristics and system features is interpreted as an opportunity for incorporating user-selectable options in an information retrieval system. In the other case, there was no interaction between perceptual speed and the way index terms were presented in browsable displays. This suggests that user characteristics and system features combined uniformly to create system usability. This uniformity is interpreted as indicating that information system designers have a relatively simple choice between forms of browsable display.

INTRODUCTION

One of the objectives of information science research has been to explore mechanisms that improve the performance of information retrieval systems. This focus on system performance as an end in itself has been maintained by taking an abstract and simplistic approach to the people for whom the information technology is intended: the users. For example, system performance is frequently assessed by using one or more standard datasets, with standard queries and relevance judgments. These standard queries and relevance judgments provide an abstract representation of system users that may bear little resemblance to the ways in which real people use the technology.

An alternative objective for information science research is to explore mechanisms that improve the usability of information retrieval systems. System usability is an objective that is more difficult to operationalize than system performance, and accordingly more difficult to assess. One of the most thorough approaches to understanding system usability was presented by Eason et al. (1987). In their approach, the usability of information systems is related to both system performance and user performance, and is achieved by designs that take into account the physical and psychological characteristics of users, the tasks they are likely to accomplish, and the environments in which they work. In other words, system usability can be achieved if researchers take into account a variety of user characteristics along with system characteristics.

When information science research moves beyond abstract representations of users and begins to assess usability with real people, a number of relationships between system characteristics and user characteristics are possible. The following theoretical framework has helped define the research presented here, and may help outline the options faced both by researchers who investigate system usability and by engineers who work on designing usable systems. This framework begins with the assumption that there are two information systems, one of which has been found to have a higher level of performance than the other. If these two systems are evaluated with a wide range of users, the possible outcomes of the evaluation can be divided into uniform and non-uniform results.

Correspondence should be addressed to Bryce Allen, Assistant Professor, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 1407 W. Gregory Dr., Urbana, IL 61801.



In some evaluations, user characteristics may influence system usability in a uniform manner. This effect can be divided further into positive and negative uniform influences. A positive effect on usability would occur if users uniformly found the system with higher performance easier to use than the system with lower performance. User characteristics would increase the usability of the system by an amount greater than that predicted by the improved performance of the system. A negative effect on usability would occur if users uniformly found the new system much more difficult to use than the previous system. The negative effect of user characteristics might cancel the increase in usability predicted by the enhancement in system performance.

In other evaluations, user characteristics would influence system usability, but not in a uniform manner. In such cases, system usability might be enhanced for some users, and remain unchanged, or even degraded, for other users. Given the number of possible user characteristics that can influence how people use information systems, it seems likely that there will be many instances in which user characteristics influence system usability in a non-uniform manner.

In the case of evaluations that show a uniform influence of user characteristics, the formula for assessing system usability is a simple additive one. For system designers, decisions about which version of the information system to implement can be reasonably straightforward. The system with the greatest level of usability, taking both system performance and user performance into consideration, should be implemented.

In the case of evaluations that show a non-uniform influence of user characteristics, the formula for assessing system usability is no longer simply additive. There is an interaction between user characteristics and system characteristics. For designers, decisions about system implementation are more complex. It may be preferable to implement both versions of the system, and to give users the choice as to which one they will use.

This type of interaction between system characteristics and user characteristics means that it is possible to customize information systems for some users. Fischer and Lemke (1988), building on Illich's (1973) ideas about convivial tools, introduced an approach they called "convivial computing." Convivial information technology gives users control over the functioning of the system, or enables the user to adapt the system to meet particular needs. Interactions between user characteristics and system characteristics indicate opportunities for designers to build information systems with user-selectable options, thus giving users control over the functioning of the system. This kind of customizable system seems to meet the requirements for convivial information technology.

The objective of usability in information technology can be achieved if system designers understand how system features and user characteristics combine. One set of user characteristics that seems relevant to information technology design is users' cognitive abilities. Cognitive abilities or aptitudes are usually defined as people's capabilities to perform a variety of mental functions such as perception, memory, and reasoning. In addition, these abilities are frequently seen as components of larger constructs such as intelligence. For example, Guilford's "structure of intellect" model (Guilford, 1967) treated cognitive abilities as components of intelligence.

In a pre-test study, Allen (1992) found a number of interesting correlations between the cognitive abilities of users and their performance in searching a standard CD-ROM index. In particular, higher scores on cognitive abilities tests were associated with more thorough searches, and with more acceptable search outcomes. Naturally, lower scores on the tests were associated with less satisfactory results, and these correlations suggested that certain design changes might help some people use information systems more effectively. These correlations seemed to present instances of the kind of interaction between system characteristics and user characteristics that would lead to customizable, convivial information technology. The purpose of the present research was to test the effects of changes in information system design by assessing the usability of systems incorporating alternative design features.

In each of two experiments, two experimental information systems that embodied different approaches to the design of information systems were created. Levels of the relevant cognitive ability of participants were tested, and participants were randomly assigned to one of the two alternative systems. Performance of the participants in using the experimental systems was monitored, and performance measures were analyzed to assess whether the effects of cognitive abilities on retrieval performance varied between the two experimental systems.

EXPERIMENT 1

Background

The cognitive ability tested in this case was logical reasoning, defined as "the ability to reason from premise to conclusion, or to evaluate the correctness of a conclusion" (Ekstrom et al., 1976, p. 141). The pre-test found a correlation between this ability and user selectivity. Participants with higher scores on the logical reasoning tests selected fewer references as being potentially useful (r = -.31, p < .05), and selected fewer non-relevant references as part of their retrieval sets (r = -.41, p < .01). It was suggested that one way to augment selectivity in users with lower levels of logical reasoning ability would be to employ a system that presented references in the order of their anticipated relevance to the search topic.

The test systems

Two programs were written in Z Basic on an Apple Macintosh computer. Both presented a sequence of 60 references drawn from the Reader's Guide to Periodical Literature database. References were presented one at a time, and each reference was followed by the question, "Print this citation?" Participants answered "Y" or "N" to this question. If any other response was entered, the reference was presented again until one of the two acceptable answers was given. The only difference between the two systems was the order of presentation. In the first system, the references were presented in reverse chronological order, which is the standard order of presentation for many existing information systems. In the second system, the references were presented in order of their relevance to the search topic.

In systems that use this order of presentation, the order is usually based on an algorithm that calculates a similarity measure between the vocabulary of the references and the vocabulary of the search expression (Radecki, 1988). In this research, however, relevance judgments from the 50 participants in the pre-test were available. These judgments, made during the course of actual searches of the database, were of a binary nature. Participants simply stated whether they thought a reference was "good" (potentially useful) or not. The relevance score for any reference was the proportion of participants who saw a given reference and also selected it as potentially useful. The 60 references used in this investigation were chosen so that they represented the full range of relevance scores, from 0.00 to 0.98. Thirty of the references achieved relevance scores above .5, and were considered to be relevant for the purposes of analysis, and the remaining 30, achieving relevance scores below .5, were considered non-relevant. Examples of relevant and non-relevant references are presented in Fig. 1.
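The scoring procedure can be summarized in a few lines. The following Python fragment is only an illustrative sketch (the experimental systems themselves were written in Z Basic); the reference identifiers and judgment data are hypothetical.

    # Hypothetical binary relevance judgments from pre-test participants:
    # for each reference, a list of 0/1 selections from the participants who saw it.
    judgments = {
        "ref_001": [1, 1, 0, 1, 1],        # illustrative values only
        "ref_002": [0, 0, 1, 0],
        "ref_003": [0, 0, 0, 0, 1, 0],
    }

    # Relevance score = proportion of participants who saw the reference
    # and selected it as potentially useful.
    scores = {ref: sum(sel) / len(sel) for ref, sel in judgments.items()}

    # References scoring above .5 are treated as relevant, the rest as non-relevant.
    relevant = [ref for ref, s in scores.items() if s > 0.5]
    non_relevant = [ref for ref, s in scores.items() if s <= 0.5]

    # The relevance-ordered interface presents references from highest to lowest score.
    relevance_order = sorted(scores, key=scores.get, reverse=True)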

Materials

The stimulus article was a two-page abridgment of this article: Eron, L.D. (1980). Prescription for reduction of aggression. American Psychologist, 35, 244-252. This article has been used in a number of studies (Allen, 1990, 1992), and its Flesch-Kincaid reading ease score of 12.3 shows that it is suitable for undergraduates. Tests from the Kit of Factor-Referenced Cognitive Tests (Ekstrom et al., 1976) were used to assess the logical reasoning ability of participants in this research. The first test used was the Nonsense Syllogisms Test, in which participants were presented with formal syllogisms that, because of their nonsensical content, could not be solved by reference to existing knowledge. The conclusions of some of the syllogisms followed correctly from the premises, but for the remainder of the syllogisms the conclusions did not follow logically from the premises. The task for the participants was to indicate whether or not the conclusion presented was logically correct.


Fig. 1. Examples of relevant and non-relevant references. [Screens not reproduced: two relevant references (Science News items on children's aggressive behavior, television habits, and television violence) and two non-relevant references (a Christian Century item on criticism of television networks and a Prevention item on television and childhood obesity), each displayed with its citation details and subject headings.]

The second test used to indicate this ability was the Diagramming Relationships Test, in which participants were presented with three objects (e.g., animals, cats, and dogs), and were asked to select the diagram from a set of five diagrams that best illustrated the interrelationships among the three objects. In both of these tests, the number of correct answers achieved during a set time period, less a penalty for incorrect answers, was the score recorded for the participant.

Participants

Eighty volunteer participants, drawn from the general student population of the University of Illinois, took part in this research. The demographics of this sample showed that it represented a broad cross-section of the academic community. Students were paid $4.00 for their participation, which lasted 30-40 minutes. The results for two of the students on the logical reasoning test and the experimental tasks seemed to indicate that they had misunderstood the instructions; these two participants were excluded from the analysis, resulting in a sample of 78.

Procedures


Participants first read the stimulus article. They were instructed in writing, "Please read carefully the following article. In a few minutes you will be asked to identify references to additional articles on the same topic." Then they completed the cognitive tests, which acted as an unrelated interpolated task and ensured that long-term memory and understanding were being used in the experimental task rather than short-term or rote memory. This delay was not sufficient to permit significant deterioration of recall of the stimulus article. Long-term memory for the substance of expository documents has been demonstrated over periods of at least a week in the information retrieval setting (Allen, 1990). The two tests were presented in random order, to ensure that learning effects were nullified. Following the tests, which lasted approximately 20 minutes, participants were randomly assigned to one of the two experimental systems. The general instructions were:

Choosing References. A few minutes ago you read an article on a topic. Now, assume that you are doing a term paper on that topic for one of your courses. To do your term paper, you want to find more articles on the topic. You will be examining a series of references to articles, and deciding if you think they would be useful in preparing your term paper. If you think a reference is potentially useful, answer "Y" when asked whether you want to print the reference. Otherwise, answer "N."

They were informed about the order of presentation of the references they were about to see. In the case of the reverse chronological order presentation, the instructions said, "Note: The references you will be examining are arranged so that you will be seeing more recent references first, followed by older references." For the second interface, the instructions said, "Note: The references you will be examining are arranged so that you will be seeing references that have been judged to be more relevant to the topic first, followed by references that have been judged to be less relevant to the topic."

After completing the assessment of the 60 references, participants completed a short questionnaire that determined their demographic characteristics and asked them to assess the difficulty of the task they had just completed.

Analysis

The hypothesis tested in this experiment was that there would be an interaction between the system design variable (i.e., the two orders of presentation) and the individual differences variable (i.e., logical reasoning) as influences on user selectivity. This hypothesis was tested by general linear modeling, a statistical technique that combines the features of ANOVA and linear regression. A model containing the hypothesized effects was tested after the effects of the control variables had been accounted for. Where appropriate, variables were encoded as dummy variables, or transformed using a rank transformation, to ensure validity of the statistical analysis.
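A model of this general form can be fitted with standard statistical software. The sketch below uses Python with statsmodels rather than the package actually used in the study, and shows how an interaction between a cognitive ability score and a system dummy variable might be specified; the variable names and values are invented for illustration only.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data frame: one row per participant, with the number of
    # non-relevant references selected, the Diagramming Relationships score,
    # the system used (0 = reverse chronological, 1 = relevance order), and
    # a control variable from the questionnaire.
    data = pd.DataFrame({
        "nonrel_selected": [12, 3, 8, 5, 15, 2, 9, 4],
        "diagramming":     [60, 85, 70, 90, 55, 95, 65, 80],
        "system":          [0, 1, 0, 1, 0, 1, 0, 1],
        "age":             [19, 22, 20, 21, 23, 19, 20, 22],
    })

    # Main effects, the hypothesized interaction, and a control variable.
    # The '*' operator expands to diagramming + C(system) + diagramming:C(system).
    model = smf.ols("nonrel_selected ~ diagramming * C(system) + age", data=data).fit()
    print(model.summary())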

Dependent variables. The experimental systems logged the time taken to complete the experimental task and the responses of the participants to each reference, and calculated the number of relevant and non-relevant references selected by each participant. An additional dependent variable was provided by a questionnaire item that asked participants how difficult they found the task of choosing which references were potentially useful.

Independent variables. Because the objective of this research was to investigate the interaction between logical reasoning ability and the order in which the information system presented the references, independent variables were the results of the logical reasoning tests and the order of presentation. Although analysis showed that the two logical reasoning tests achieved moderate reliability (Cronbach's α = .65), it seemed likely that the two tests were assessing somewhat different aspects of logical reasoning. Accordingly, they were included in the analysis as separate independent variables.

Control variables. Demographic and knowledge variables were controlled by including them in the analysis. Academic department, academic level (freshman through graduate student), gender, age, familiarity with the stimulus topic, frequency of use of microcomputers, and frequency of use of libraries were all reported on the questionnaire and were included in the analysis.


Findings


A significant interaction between logical reasoning ability and system type was found for user selectivity, as operationalized by the number of non-relevant references selected as being potentially useful. The measure of logical reasoning found to be the best predictor of selectivity was the Diagramming Relationships Test. The results of the analysis are given in Table 1. These results clearly showed an interaction between the ability measured by the Diagramming Relationships Test and the order of presentation of the references.

Table 1. Predictors of number of non-relevant references selected as potentially useful

Source                              Sum-of-squares   DF   Mean-square   F-ratio     p
Logical reasoning                          3.126      1        3.126      0.094   0.76
System used                              196.959      1      196.959      5.9     0.018
Logical reasoning * system used          159.924      1      159.924      4.79    0.032
Error                                   2470.487     74       33.385

Results of the Nonsense Syllogisms Test had no significant effect on any of the dependent variables, either alone or in interaction with the order of presentation. None of the control variables was found to be a significant predictor of selectivity. The significant effect outlined in Table 1 was limited to selection of non-relevant references. There were no effects on selection of relevant references, perceived difficulty of the search task, or time to complete the task. This pattern of findings indicates a clear and unambiguous locus within the information retrieval process for the interaction effect.

This interaction is illustrated in Fig. 2, which plots the results for both systems (i.e., for both orders of presentation), and shows the linear regression lines for each system. Further analysis showed that data from the participants who used the interface with the usual (reverse chronological) order of presentation exhibited a significant negative correlation between logical reasoning scores and number of non-relevant references selected as potentially useful (r = -.347, p < .04). This finding paralleled that of the pre-test, in which higher logical reasoning skills were associated with lower levels of selection of non-relevant references. However, data from the participants who used the interface that presented references in order of relevance showed no such negative relationship (r = .183, p > .25). This non-significant correlation indicates that participants with all levels of logical reasoning were equally selective when they were using the relevance-ordered interface.

Fig. 2. Interaction between logical reasoning and system type in predicting selectivity. [Scatterplot not reproduced: number of non-relevant references selected plotted against logical reasoning ability, with separate regression lines for the reverse chronological and relevance-ordered systems.]
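The follow-up correlations reported above can be reproduced for each group separately. A minimal Python sketch, with invented data points, follows.

    from scipy.stats import pearsonr

    # Hypothetical (logical reasoning score, non-relevant references selected) pairs,
    # one list per order of presentation.
    chrono_group = [(55, 14), (60, 12), (72, 9), (80, 6), (90, 4)]
    ranked_group = [(55, 5), (60, 6), (72, 4), (80, 6), (90, 5)]

    for label, group in [("reverse chronological", chrono_group),
                         ("relevance order", ranked_group)]:
        ability, selected = zip(*group)
        r, p = pearsonr(ability, selected)
        print(f"{label}: r = {r:.3f}, p = {p:.3f}")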

EXPERIMENT 2

Background

The cognitive ability tested in this case was perceptual speed. This ability is defined as "speed in . . . carrying out very simple tasks involving visual perception" (Ekstrom et al., 1976, p. 123). The pre-test found a correlation between this ability and search precision. Search precision is normally defined as the proportion of relevant references retrieved to total references retrieved. In the pre-test, participants with higher scores on the perceptual speed tests achieved searches with higher precision ratios (r = .37, p < .01). Users with higher levels of perceptual speed may have been able to scan the subject heading list presented by a browse search interface more effectively than users with lower levels of perceptual speed. Such a facility in scanning heading lists would enable users to find more acceptable subject headings, and to complete more precise searches.

The test systems

Two browse interfaces were programmed on the Apple Macintosh, using Z Basic. Each of the interfaces presented a list of over 700 subject headings, from which participants were asked to select the headings they thought best represented a topic. The first interface presented the subject headings in alphabetical order, with 23 headings appearing on each screen. A number of mechanisms permitted browsing through the subject heading list.




For example, participants could enter a search expression, and the list would scroll to the heading most closely matching that search expression. Other options for moving through the list included the "page up," "page down," "up arrow," and "down arrow" keys. The standard Macintosh scroll bar at the right side of the list was also available to allow the user to move through the list a page at a time, or a line at a time.

Headings from the list were selected by highlighting them (either by clicking on the heading or by scrolling to it), then hitting the "return" key. When this was done, a window appeared identifying the selection that had been made. No references were displayed by the interfaces, because this experiment focused specifically on the effects of different mechanisms for scanning subject heading lists.

The second interface had the same mechanisms for scanning the list of subject headings, but presented the list in a different way. The initial display showed 39 top-level headings (i.e., those headings that consisted of a main heading without a sub-heading). Nineteen of these top-level headings were followed by an asterisk (*), indicating that those headings could be expanded. At the top of the display were two buttons labeled "EXPAND" and "CONTRACT." If one of the 19 headings was highlighted and the "expand" button clicked, the list was expanded to include all of the second-level headings under that top-level heading (i.e., all of the headings consisting of the main heading followed by one subheading). Subsequent clicking on the "contract" button while the top-level heading was highlighted contracted the list so that the second-level headings were no longer displayed. Of the 290 second-level headings, 101 were identified with an asterisk, meaning that they could also be expanded to show third-level headings (i.e., those consisting of a main heading followed by two sub-headings). There was a total of 418 of these third-level headings.

For example, it was possible for a participant to scan through the original list of 39 top-level headings and select the heading "AIRLINES." Clicking on the "expand" button would display 60 second-level headings, including "AIRLINES--DEREGULATION." The participant could then scan through those 60 headings, select the "AIRLINES--DEREGULATION" heading, and click the expand button again, thus revealing a third-level heading, "AIRLINES--DEREGULATION--ECONOMIC ASPECTS." Figures 3 and 4 show the appearance of the interface screens at different levels of expansion. In summary, the second browse interface enabled users to find appropriate subject headings by browsing selectively through the hierarchy of headings. It was anticipated that this additional capability would minimize the amount of scanning required, and so enable users with lower levels of perceptual speed to complete more precise searches.
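One simple way to think about the expand/contract display is as a filter over a flat list of subdivided headings. The Python sketch below is a reconstruction for illustration, not the original Z Basic interface code; the use of "--" as a subdivision separator and the handful of headings are assumptions.

    # Headings stored as full strings; hierarchy is implied by the number of
    # subdivisions (separated here by "--").
    headings = [
        "AIRLINES",
        "AIRLINES--DEREGULATION",
        "AIRLINES--DEREGULATION--ECONOMIC ASPECTS",
        "AIRLINES--GOVERNMENT POLICY",
        "DEREGULATION",
        "DEREGULATION--UNITED STATES",
    ]

    def children(heading, all_headings):
        # Headings exactly one level below the given heading.
        depth = heading.count("--")
        return [h for h in all_headings
                if h.startswith(heading + "--") and h.count("--") == depth + 1]

    def visible(expanded, all_headings):
        # Top-level headings plus the children of every expanded heading.
        shown = [h for h in all_headings if "--" not in h]
        for h in expanded:
            for child in children(h, all_headings):
                if child not in shown:
                    shown.append(child)
        return sorted(shown)

    # Initially only top-level headings are displayed; expanding "AIRLINES"
    # adds its second-level headings, and so on down the hierarchy.
    print(visible(set(), headings))
    print(visible({"AIRLINES"}, headings))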

For complete comparability between the two interfaces, it would have been desirable to have "EXPAND" and "CONTRACT" buttons, hopefully with some reasonable function, on the first interface as well as the second. This possibility was considered and rejected, because no meaningful purpose for these buttons could be devised in the first interface.

Materials

The stimulus document for identifying the topic of the search was a title and abstract taken from the book Deregulation and Airline Competition (Paris: OECD, 1988). The abstract contained 103 words, and the Flesch Reading Ease test indicated a score of 25.7, suitable for most undergraduates. The choice of this topic was influenced by Connell's (1991) dissertation, in which it was demonstrated that this topic presented a number of difficulties for searchers. Accordingly, it seemed that this topic would provide a good test-bed on which to investigate the effects of perceptual speed and type of interface.

The list of subject headings, containing 747 different headings, was created by editing the subject index of the online public access catalog of the University of Illinois. In preparing the list, only headings beginning with the words "Aeronautics," "Airlines," or "Deregulation" were selected. This meant that the list was more likely to contain potentially useful subject headings than any operational subject heading list.

Fig. 3. The expand/contract interface before expansion. [Screen not reproduced: an alphabetical display of top-level headings such as AERONAUTICS IN EARTH SCIENCES and AERONAUTICS IN FORESTRY, with expandable headings marked by an asterisk and the prompt "SPACE BAR = ANOTHER SEARCH : * = EXPAND/CONTRACT".]

Fig. 4. The expand/contract interface after one expansion. [Screen not reproduced.]

Tests used to assess the perceptual speed of participants were drawn from the Kit of Factor-Referenced Cognitive Tests (Ekstrom et al., 1976). The first test was the Number Comparison Test, in which participants inspected pairs of multi-digit numbers and indicated whether the two numbers in each pair were the same or different. The second test was the Identical Pictures Test, in which participants checked which one of five numbered geometrical figures or pictures in a row was identical to the figure given at the left end of the row. In each case, the number of correct answers obtained within a limited time, less a penalty for incorrect answers, was the score recorded for the participant.

Participants

Eighty volunteer participants from the student body of the University of Illinois completed the experimental tasks in this study. These volunteers were obtained by advertising on campus, and were paid $3.00 for their participation, which lasted about half an hour. The demographics of this group of participants indicated that they represented a broad cross-section of the student body. A number of participants in this research also participated in Experiment 1. This overlap in the participant body was considered unimportant, because the two experiments used different interfaces, different search topics, and different search tasks. No benefit was derived from participating in the first experiment that would influence performance in the second experiment.

Procedures

Participants first read the stimulus title and abstract. They were instructed in writing, "Please read carefully the following information about a book. In a few minutes you will be using subject headings to find additional books on the same topic." Participants then completed the cognitive tests, which acted as an unrelated interpolated task. This ensured that long-term memory and understanding were being used in the experimental task, rather than short-term or rote memory. As explained above, the time required to complete the tests was far too brief to erode long-term gist recall of the stimulus material. The tests were presented in random order, to ensure that learning effects were nullified. Following the tests, which lasted approximately eight minutes, participants were randomly assigned to one of the two experimental systems. Their instructions were as follows:

Index Search. A few minutes ago you read a brief description of a book. Now, assume that you are doing a term paper on that topic for one of your courses. To do your term paper, you want to find more books on the topic. You will be searching a computerized index to find subject headings that you think would lead you to more materials on the topic. Try to find at least two subject headings that you think would be useful.

Participants were then given instructions on how to use the interface to which they had been assigned. No advice was given regarding how best to search for information, but questions about the working of the interface were answered. Participants were also instructed that once they had found the two subject headings required, they could continue to search the interface for additional subject headings. They were told that they should govern the amount of this additional searching activity by their perceptions of how much they normally would search an index of this sort when preparing a term paper. The instruction to "find at least two subject headings" was probably unnecessary. Only three participants found only this minimum number, and the average number of headings found was 9.7. After completing the search, participants completed a short questionnaire that determined their demographic characteristics and asked them to assess the difficulty of the task they had just completed.

Analysis

The hypothesis tested in this experiment was that there would be an interaction between the system design variable (i.e., the two ways of presenting subject headings) and the individual differences variable (i.e., perceptual speed) as influences on search precision. This hypothesis was tested by general linear modeling, a statistical technique that combines the features of ANOVA and linear regression. A model containing the hypothesized effects was tested after the effects of the control variables had been accounted for. Where appropriate, variables were encoded as dummy variables, or transformed using a rank transformation, to ensure validity of the statistical analysis.

Dependent variables. Built into the interfaces were data collection mechanisms that provided the following variables for each search: total time spent on the search, number of search expressions entered, number of subject headings selected, number of clicks of the scroll bar, number of uses of each of the "up arrow," "down arrow," "page up," and "page down" keys, and (where appropriate) the number of clicks of the "expand" and "contract" buttons. From these data it was possible to calculate the number of lines of the subject heading display scanned by each participant while choosing subject headings. Since each subject heading was presented on a separate line, the number of lines scanned was equal to the number of subject headings scanned.
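As an illustration of how logged events could be converted into lines scanned, consider the following Python sketch. The event counts are invented, and the assumption that every page-level movement exposes a full screen of 23 headings while arrow-key and line-level scroll movements expose one heading each is a simplification, not necessarily the exact accounting used by the experimental programs.

    # Hypothetical event log for one search session.
    events = {
        "page_up": 3, "page_down": 12,         # page-level moves
        "up_arrow": 5, "down_arrow": 40,       # line-level moves
        "scrollbar_page": 2, "scrollbar_line": 10,
    }

    LINES_PER_PAGE = 23   # one screen of headings, per the interface description

    # Assumed accounting: page-level moves expose a full screen of new headings,
    # line-level moves expose one new heading each, plus the initial screen.
    lines_scanned = (
        (events["page_up"] + events["page_down"] + events["scrollbar_page"]) * LINES_PER_PAGE
        + events["up_arrow"] + events["down_arrow"] + events["scrollbar_line"]
        + LINES_PER_PAGE
    )
    print(lines_scanned)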

The interfaces both presented two scanning techniques: the “direct manipulation” of the scroll bar by the mouse, and the keyboard arrow keys. Thirty-one participants used mouse only, 23 used keyboard only, and 26 used a combination of both. The performance of these three groups of participants was compared using ANOVA, and it was found that there was no significant difference among the groups in number of lines scanned, number of headings selected, or search time. It appears that scanning technique was irrelevant to search performance. In addition, cross-tabulation showed that these three groups were evenly distributed across the two interfaces used in the experiment.

The interface programs also recorded which headings were selected by participants as being potentially useful for their term papers. From these recorded data it was possible to ascertain which of the headings were selected most frequently. This frequency-of-selection data was used to identify those headings that might be considered particularly relevant to the topic. The number of high-frequency headings selected in each search, divided by the total number of headings selected, was recorded as a rough measure of precision achieved. A list of the frequently selected headings is included as Fig. 5. It should be noted that most of the frequently selected headings appeared toward the bottom of the list. Only one (AERONAUTICS, COMMERCIAL--DEREGULATION) was from the top half of the list, appearing on line 315. The remainder appeared below line 600. This means that participants scanned through the entire list to find the frequently selected headings. One final dependent variable was the perceived difficulty of finding subject headings reported by participants on the questionnaire.

AERONAUTICS, COMMERCIAL--DEREGULATION
AIRLINES--DEREGULATION
AIRLINES--DEREGULATION--ECONOMIC ASPECTS
AIRLINES--GOVERNMENT POLICY
AIRLINES--GOVERNMENT POLICY--GREAT BRITAIN
AIRLINES--GOVERNMENT POLICY--UNITED STATES
AIRLINES--LAW AND LEGISLATION
AIRLINES--REGULATION
AIRLINES--SAFETY REGULATIONS
AIRLINES--UNITED STATES--COMPETITION
AIRLINES--UNITED STATES--DEREGULATION
DEREGULATION
DEREGULATION--UNITED STATES
DEREGULATION--UNITED STATES--COST EFFECTIVENESS

Fig. 5. Most frequently selected subject headings.
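The rough precision measure described above can be illustrated as follows. This Python sketch uses invented selection logs, and the cutoff used to define a "high-frequency" heading is an assumption for illustration only; the paper does not state the exact threshold.

    from collections import Counter

    # Hypothetical log of headings selected in each search (one list per participant).
    searches = [
        ["AIRLINES--DEREGULATION", "DEREGULATION", "AIRLINES--REGULATION"],
        ["AIRLINES--DEREGULATION", "DEREGULATION--UNITED STATES"],
        ["AIRLINES--DEREGULATION", "DEREGULATION", "AERONAUTICS IN EDUCATION"],
    ]

    # Identify headings selected frequently across all searches; the cutoff of
    # two or more selections is assumed here.
    counts = Counter(h for search in searches for h in search)
    high_frequency = {h for h, n in counts.items() if n >= 2}

    # Rough precision for each search: high-frequency headings selected
    # divided by total headings selected.
    for search in searches:
        precision = sum(h in high_frequency for h in search) / len(search)
        print(f"{precision:.2f}")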

Independent variables. Because the objective of this research was to investigate the interaction between perceptual speed and the amount of scanning required by the two browse interfaces, independent variables were given by the results on the perceptual speed tests and the interface to which the participants were assigned. Although analysis showed that the two perceptual speed tests achieved moderate reliability (Cronbach's α = .69), it seemed likely that the two tests were assessing somewhat different aspects of perceptual speed. Accordingly, they were included in the analysis as separate independent variables.

Control variables. Demographic and knowledge variables were controlled by including them in the analysis. As in the first experiment, academic department, academic level, gender, age, familiarity with the stimulus topic, frequency of use of microcomputers, and frequency of use of libraries were all reported on the questionnaire and were included in the analysis.

Findings

There was no significant interaction between results on either of the tests of perceptual speed and the interface type used in influencing search precision. This result occurred despite the fact that the interfaces functioned as expected. The expand/contract interface substantially reduced the number of lines scanned to find suitable subject headings, without reducing the number of relevant subject headings found by participants. The average number of lines scanned by participants using the regular interface was 419, whereas the average number of lines scanned by participants using the expand/contract interface was 185. A separate-variance t-test showed that this difference was statistically significant, t(46.5) = 3.649, p < .01. As indicated in Table 2, however, there was no significant effect of this interface difference on the precision of the results, nor was there an interaction between the interface type and the perceptual speed of the participants. None of the control variables was found to be a significant predictor of precision.

Table 2. Predictors of precision

Source                                   Sum-of-squares   DF   Mean-square   F-ratio     p
Perceptual speed                               0.163       1       0.163       1.507   0.223
Interface used                                 0.002       1       0.002       0.019   0.891
Perceptual speed * interface interaction       0.000       1       0.000       0.004   0.947
Error                                          8.233      76       0.108

It is clear from this analysis that the hypothesis of an interaction was not supported by the results. Figure 6 outlines the nature of the relationship between perceptual speed and precision for each interface. This experiment was based on a pre-test finding of a positive correlation between perceptual speed and precision. It was hypothesized that this relationship could be attributed to a greater facility in scanning lists of subject headings on the part of participants with higher levels of perceptual speed. This hypothesis must be rejected. The fact that there was no significant difference in precision between participants who used an interface that required many headings to be scanned and participants who used an interface that required fewer headings to be scanned seems to demonstrate conclusively that scanning of subject headings is not the locus for this particular effect.

Fig. 6. Predictors of precision: interface type and perceptual speed. [Plot not reproduced: precision plotted against perceptual speed for the regular and expand/contract interfaces.]


DISCUSSION AND CONCLUSIONS

A note on reliability and validity

The implications of these findings for the design of information technology depend on the appropriateness of the tests of user characteristics employed here. Some researchers have suggested that cognitive tests such as those used in this investigation are of questionable validity in information retrieval research. Saracevic and Kantor (1988) used cognitive tests in their research, but expressed doubts about the validity of these tests. Their approach was to take the test results at their face value, while recognizing that the validity of the tests continues to be investigated by psychologists. Hockey (1990) noted that ability tests such as those employed in this investigation provide an overall prediction of users' capabilities, but fail to assess the higher-level skills, such as planning and problem-solving, that are central to human-computer interaction.

Data collected in the experiments reported here, and in the pre-test, provided the basis for a preliminary assessment of the reliability and validity of these cognitive tests in an information retrieval context. To assess the reliability of individual tests, the results from 50 students in the 1991 pre-test were compared with the results on the same tests of 80 students in the present research. Because one of the tests was scored differently in Experiment 2 than in the pre-test, it was possible to make direct comparisons for only three of these tests. For the three tests that were administered and scored identically, t-tests showed no significant differences between the means achieved by the two groups (all ps > .1; see Table 3 for details), and Kolmogorov-Smirnov tests showed no significant differences between the distributions of scores (all ps > .05). These comparisons suggest that the tests used in this research were consistently and reliably measuring a characteristic of the population.

It was also possible to compare the associations between test scores and information retrieval performance in the pre-test and in the subsequent investigation. Again, three cognitive abilities tests were administered and scored identically, and despite different experimental conditions and different search systems, similar correlations between test scores and search performance were found. Results from the two research projects were entered into a single regression, with the research project identified using an explicit (dummy) variable. No significant interaction was found between the predictor variables (the test scores) and the project variable in predicting the outcome variables. This finding suggests that the user attributes that are consistently measured by the cognitive abilities tests are consistently associated with performance in information retrieval tasks. This is evidence for the construct validity of the tests in the information retrieval context (Carmines & Zeller, 1979).

Table 3. Indications of reliability of cognitive tests

                                  Pre-test (n = 50)      Follow-up (n = 80)     Comparison
Identical Pictures Test           m = 18.8, SD = 6.3     m = 19.3, SD = 7.1     t(113.2) = .438, p > .66
Nonsense Syllogisms Test          m = 9.4, SD = 8.4      m = 11.7, SD = 7.2     t(92.4) = 1.627, p > .1
Diagramming Relationships Test    m = 80.9, SD = 11.4    m = 78.1, SD = 13.2    t(115.5) = 1.308, p > .19

There remains one area of concern about the reliability of the tests. The Kit of Factor-Referenced Cognitive Tests, from which the tests used in this investigation were drawn, provides several marker tests for each cognitive factor or ability. It would be reasonable to expect that these tests would show high levels of consistency, as measured by Cronbach's α. Table 4 summarizes the results achieved by this measure of between-test consistency. These results suggest that in some cases, the tests that purport to be measuring the same cognitive ability may actually be measuring somewhat different abilities. In this circumstance, the most conservative approach is to treat the tests as separate measures unless they achieve reasonably high between-test reliability. This is the approach adopted in these investigations.

Table 4. Measures of between-test consistency (Cronbach's α)

Cognitive ability      Pre-test    Present research
Logical reasoning        .74            .65
Perceptual speed         .54            .69
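For reference, between-test consistency of the kind reported in Table 4 can be computed as Cronbach's α over participants' scores on the marker tests. The Python sketch below uses the standard formula and invented scores for the two logical reasoning tests.

    import numpy as np

    def cronbach_alpha(items):
        # Cronbach's alpha for an (n_participants x n_items) score matrix:
        # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # Illustrative scores on the two logical reasoning tests (one row per participant).
    scores = [[9, 70], [12, 82], [7, 60], [15, 90], [10, 75], [11, 78]]
    print(round(cronbach_alpha(scores), 2))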

In summary, the results to date provide tentative support for the reliability and validity of the individual tests, although between-test consistency is open to question. More results will enable a more detailed assessment of the validity of cognitive abilities tests in information science research, but these initial results are reason for cautious optimism that such investigations will be considered acceptable by the research community. On this basis, it seems reasonable to use cognitive abilities tests in discussions of the design of information technology.

Conclusions

The theoretical model of system usability presented above suggests the following interpretations of the findings of these investigations. The significant interaction between logical reasoning and order of presentation of references is an example of a non-uniform effect of user characteristics. Accordingly, system designers may wish to consider implementing different orders of presentation as user-selected options. An information system that would take advantage of this finding would have two optional forms of output: ranked-order output and standard, non-ranked output. Users with lower levels of logical reasoning ability would enhance their selection of potentially relevant materials if they selected ranked-order output. Users with higher levels of logical reasoning ability would have less preference between the two output formats; either format would work reasonably well for them.

This seems a good example of the way system characteristics and user characteristics combine. On the basis of system performance (for example, response time), we might prefer the unranked order of presentation over the ranked order of presentation. On the basis of user performance, we might prefer the ranked order of presentation over the unranked order, since that order of presentation is preferable for some users, whereas other users are indifferent to order of presentation. It is only by taking into account both system performance and user performance that we have a full picture of system usability, and the suggestion that a user-selected option might work best.

The absence of a significant interaction between perceptual speed and the organization of browsable displays suggests that this approach to improved browsable displays has no impact on search precision. This appears to be a case where user characteristics influence system usability in a uniform manner. Selection of which type of browsable display to implement should be relatively straightforward because of this uniform effect.

Unlike the first experiment, which isolated a precise information retrieval setting within which the effects of logical reasoning on retrieval performance operated, the second experiment failed to locate the mechanism that would explain the pre-test finding of a correlation between perceptual speed and precision. In ongoing research into system usability, other design features will be explored to identify the design feature through which this association functions. When that objective is accomplished, it may be possible to specify another area in which a user-selectable option can be incorporated into information technology.

These experiments represent a first attempt to understand the extent to which cognitive abilities of users affect their retrieval performance, and so to illuminate the factors that contribute to system usability. There are many user characteristics that may influence retrieval performance, and only a few of them have been investigated. These user characteristics interact with each other to influence retrieval in ways that are only vaguely understood. As a result, a great deal of additional research into user characteristics is necessary before it will be possible to predict the usability of any particular information technology or the impact of a specific design feature on the overall performance of a system.

Acknowledgement: This research was funded in part by a grant from the Vice-Chancellor for Academic Affairs, University of Illinois at Urbana-Champaign. The assistance of June Ellis is gratefully acknowledged.

REFERENCES

Allen, B. (1990). Knowledge organization in an information retrieval task. Information Processing & Management, 26, 535-542.

Allen, B. (1992). Cognitive differences in end-user searching of a CD-ROM index. In N. Belkin, P. Ingwersen, & A.M. Pejtersen (Eds.), SIGIR 92: Proceedings of the 15th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298-309). Baltimore, MD: ACM.

Carmines, E.G., & Zeller, R.A. (1979). Reliability and validity assessment. Newbury Park: Sage.

Connell, T.H. (1991). Librarian subject searching in online catalogs: An exploratory study of knowledge used. Doctoral thesis, University of Illinois at Urbana-Champaign.

Eason, K.D., Ophert, C.W., Novara, F., Bertaggia, N., & Allamanno, N. (1987). The design of usable IT products: The ESPRIT/HUFIT approach. In G. Salvendy (Ed.), Cognitive engineering in the design of human-computer interaction and expert systems (pp. 147-154). Amsterdam: Elsevier.

Ekstrom, R.B., French, J.W., Harman, H.H., & Dermen, D. (1976). Manual for kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.

Fischer, G., & Lemke, A.C. (1988). Constrained design processes: Steps towards convivial computing. In R. Guindon (Ed.), Cognitive science and its applications for human-computer interaction (pp. 1-58). Hillsdale, NJ: Lawrence Erlbaum.

Guilford, J.P. (1967). The nature of human intelligence. New York, NY: McGraw-Hill.

Hockey, G.R.J. (1990). Styles, skills and strategies: Cognitive variability and its implications for the role of mental models in HCI. In D. Ackerman & M.J. Tauber (Eds.), Mental models and human-computer interaction I (pp. 113-129). Amsterdam: North-Holland.

Illich, I. (1973). Tools for conviviality. New York: Harper & Row.

Radecki, T. (1988). Probabilistic methods for ranking output documents in conventional Boolean retrieval systems. Information Processing & Management, 24, 281-302.

Saracevic, T., & Kantor, P. (1988). A study of information seeking and retrieving. III. Searchers, searches, and overlap. Journal of the American Society for Information Science, 39, 197-216.