Peter-Paul van Maanen (TNO/VU), Lisette de Koning (TNO), Kees van Dongen (TNO) Effects of Task...

Peter-Paul van Maanen (TNO/VU), Lisette de Koning (TNO), Kees van Dongen (TNO)

Effects of Task Performance and Task Complexity on the Validity of Computational Models of Attention

VU, March 23rd, 2009

Weekly AI

Contents

• Motivation

• Support system based on cognitive model

• Experimental validation

• Results

• Conclusions

• Further research


Motivation

Trends in naval warfare
• More complex tactical situations
• More information
• Reduced manning
• Less experience
• Less training

Possible consequence
• Errors in allocation of attention

Challenge
• Support humans in dividing their attention


Support system based on cognitive model

Cognitive model of attention
• Input: data that is believed to give cues for human attention allocation
  • Environmental data
  • Behavioral data
• Output: estimation of human attention allocation
  • Over objects
  • Over spaces

Advantages
• Support adapted to human needs
• Support comparable to human support
• Support which is appropriately accepted and trusted



[Diagram: Attention of user (descriptive model) and Attention needed (prescriptive model) → Compare → Discrepancy? → (Adapt) support]
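The comparison step in the diagram can be sketched as follows. This is only an illustration under stated assumptions, not the presented support system: the attention values, the threshold `delta`, and all variable names are hypothetical.

```matlab
% Hypothetical attention estimates over 5 contacts (values are made up).
attention_user   = [0.50 0.30 0.10 0.05 0.05];  % descriptive model output
attention_needed = [0.10 0.20 0.10 0.10 0.50];  % prescriptive model output

delta = 0.25;  % assumed discrepancy threshold, not from the slides

% A discrepancy exists when the user attends to a contact much less than needed.
shortfall = attention_needed - attention_user;
needs_support = shortfall > delta;

% Indices of the contacts for which support would be adapted.
support_contacts = find(needs_support);
disp(support_contacts)
```

A one-sided shortfall is used here because too little attention is the error the slides focus on; a symmetric difference would be an equally plausible choice.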



• Such support systems are effective iff:
  • The validity of the used cognitive models is high enough
  • Otherwise support becomes unpredictable and most probably ineffective

• We need to know the effect of different factors on the validity of models, e.g.:
  • Task performance:
    • Can we expect differences in validity with respect to good and poor performers?
    • If so, does this require different models/parameter settings?
  • Task complexity:
    • How about differences in validity with respect to complex and easy instances of a scenario?
    • Different models/parameter settings?
  • Different model types:
    • How do different models/parameter settings themselves affect validity?


Experimental validation: Task

• Goal: (1) Select the 5 most threatening contacts (2) Monitor gauge

• Criteria: (1) Speed, heading, distance, in sea-lane? (2) In red?

Primary: Tactical picture compilation
Secondary: Gauge


Experimental validation: Independent variables

• Task performance (2(3)):
  • Selected good performers (g) (1/2 of participants)
  • Selected poor performers (p) (other 1/2 of participants)
  • (Overall)

• Task complexity (2(3)):
  • Complex scenario (c)
  • Simple scenario (s)
  • (Overall)

• Descriptive model type (3):
  • Gaze-based model (G)
  • Task-based model (T)
  • Combined model (C)

G, T, C      c    s    (overall)
g            ?    ?    ?
p            ?    ?    ?
(overall)    ?    ?    ?

2(3) × 2(3) × 3 mixed design


Experimental validation: Task complexity

• Simple scenario (s) (10 sections), e.g.: [example figure not reproduced]


Experimental validation: Task complexity

• Complex scenario (c) (10 sections), e.g.: [example figure not reproduced]


Experimental validation: Descriptive model type

• Output of all descriptive model types (G, T, C) is as follows: [figure not reproduced]



• Gaze-based model (G):
  • Eye gaze (Just & Carpenter, 1976; Salvucci, 2000)
  • Distance between fixation point and contacts
  • Dwelling time
  • Use of eye tracker
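One way to turn fixation distance and dwelling time into an attention estimate is sketched below. The slides do not give the model's formula, so the Gaussian falloff, the spread `sigma`, the equal sample duration, and all data are assumptions made purely for illustration.

```matlab
% Hypothetical gaze samples: one fixation point per row [x y], equal sample time.
fixations = [0 0; 0 0; 0 0; 10 0];
% Hypothetical contact positions, one per row [x y].
contacts = [0 0; 10 0; 100 100];

sigma = 5;  % assumed spatial spread of attention around a fixation point

n_contacts = size(contacts, 1);
attention = zeros(1, n_contacts);
for c = 1:n_contacts
    % Distance between every fixation point and this contact.
    dx = fixations(:, 1) - contacts(c, 1);
    dy = fixations(:, 2) - contacts(c, 2);
    d = sqrt(dx.^2 + dy.^2);
    % Nearby fixations contribute more; summing over samples rewards dwelling.
    attention(c) = sum(exp(-d.^2 / (2 * sigma^2)));
end

% Normalize so the estimates form a distribution over contacts.
attention = attention / sum(attention);
disp(attention)
```

In this toy data the first contact is fixated for three of the four samples, so it receives the largest share of estimated attention.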


• Task-based model (T):
  • Goal directed search (Treisman & Gelade, 1980)
  • Information of task environment (Speed, heading, distance, in sea-lane?)
  • Calculates a threat value of contacts
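A threat value based on the four criteria could be computed roughly as below. The slides do not specify the scoring, so everything here is hypothetical: the equal weights, the normalization constants, the assumption that contacts outside the sea-lane are more threatening, and the contact data.

```matlab
% Hypothetical contacts, one per row:
% [speed (knots), heading offset towards own ship (deg), distance (nm), in sea-lane (0/1)]
contacts = [30  10  5 0;
            10 170 40 1;
            20  90 20 1];

% Assumed normalization constants and equal weights (not from the slides).
max_speed = 40;
max_distance = 50;
w = [0.25 0.25 0.25 0.25];

speed_score    = contacts(:, 1) / max_speed;         % faster = more threatening
heading_score  = 1 - contacts(:, 2) / 180;           % heading towards us = more threatening
distance_score = 1 - contacts(:, 3) / max_distance;  % closer = more threatening
lane_score     = 1 - contacts(:, 4);                 % assumed: outside sea-lane = more threatening

% Weighted sum of the four scores gives one threat value per contact.
threat = [speed_score heading_score distance_score lane_score] * w';

% Rank contacts from most to least threatening.
[~, ranking] = sort(threat, 'descend');
disp(ranking')
```

The ranking, rather than the raw threat value, is what a goal-directed attention estimate would most plausibly use, since the task asks for the five most threatening contacts.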



• Combined model (C):
  • Both types of information (gaze-based + task-based)


Experimental validation: Dependent variables

[Figure: model estimation versus human estimation; each contact is scored as a hit, a false alarm (FA), a miss, or a correct rejection (CR)]

Confusion matrix
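The four cells of the confusion matrix can be counted as in the sketch below. The data, the threshold, and the variable names are hypothetical; only the hit / FA / miss / CR scheme comes from the slide.

```matlab
% Hypothetical model estimates of attention for 8 contacts (values are made up).
model_value = [0.9 0.8 0.6 0.4 0.3 0.2 0.1 0.05];
% Hypothetical human estimation: true = contact was attended to.
human = logical([1 1 0 1 0 0 0 0]);

threshold = 0.5;                   % assumed decision threshold
model = model_value >= threshold;  % model says "attended"

hits   = sum( model &  human);  % model and human agree: attended
fas    = sum( model & ~human);  % false alarms: only the model says attended
misses = sum(~model &  human);  % only the human says attended
crs    = sum(~model & ~human);  % correct rejections: both say not attended

hit_rate = hits / (hits + misses);
fa_rate  = fas / (fas + crs);
fprintf('hits=%d FA=%d misses=%d CR=%d\n', hits, fas, misses, crs);
```

The hit rate and false-alarm rate derived from these counts are the two coordinates of one point on a ROC curve.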


Experimental validation: Dependent variables

• Receiver Operating Characteristic (ROC) analysis is useful for:
  • evaluation,
  • validation,
  • selection,
  • construction, and
  • improvement

  of:
  • models,
  • classifiers,
  • rankers, etc.


Experimental validation: Procedure

• Construct a confusion matrix for each
  • Participant (40)
  • Scenario type (s, c, overall)
  • Descriptive model type (G, T, C)
  • Decision threshold (1000)

• Plot ROC curves (40 × 3 × 3 = 360, using 360,000 matrices)

• Calculate the Area Under the Curve (AUC) for each ROC curve (1 = good, 0 = poor)

• Performance = average over AUCs per condition (3 × 3 × 3 = 27)

• Calculate statistical significance of differences between conditions based on hypotheses (i.e. ANOVA and (un)paired, one-tailed t-tests)
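The threshold sweep that produces one ROC curve can be sketched as follows, for a single participant, scenario type, and model type. The data are hypothetical, and the direction of the sweep (thresholds decreasing from 1 to 0) is an assumption; only the number of threshold steps (1000) comes from the slide.

```matlab
% Hypothetical model estimates and human estimation for 8 contacts.
model_value = [0.9 0.8 0.6 0.4 0.3 0.2 0.1 0.05];
human = logical([1 1 0 1 0 0 0 0]);

t_steps = 1000;
thresholds = linspace(1, 0, t_steps);  % decreasing, so the FA rate never decreases

HIT = zeros(1, t_steps);
FA  = zeros(1, t_steps);
for t = 1:t_steps
    % One confusion matrix per threshold, reduced to its two rates.
    model = model_value >= thresholds(t);
    HIT(t) = sum(model &  human) / sum(human);
    FA(t)  = sum(model & ~human) / sum(~human);
end

% Area under the ROC curve by trapezoidal integration.
auc = trapz(FA, HIT);
fprintf('AUC = %.4f\n', auc);
```

For this toy data the AUC equals the fraction of (attended, not attended) contact pairs that the model orders correctly, which is 14 out of 15.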


Experimental validation: Procedure

% CALCULATE AUC PER SECTION PER MODEL (ONE PARTICIPANT)
% FA and HIT are indexed as (s, m, t).
AUC = zeros(s_steps, m_steps);      % accumulators must start at zero
for m = 1:m_steps                   % model type (G, T, C)
    for s = 1:s_steps               % scenario type (s, c, overall)
        for t = 2:t_steps           % threshold step (1000)
            A = FA(s,m,t);
            B = FA(s,m,t-1);
            C = HIT(s,m,t);
            D = HIT(s,m,t-1);

            % AUC using Trapezoidal Rule:
            AUC(s,m) = AUC(s,m) + (A - B)*(C + D) / 2;
        end
    end
end
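As a sanity check on the trapezoidal rule, the same accumulation can be compared against a vectorized form and against the built-in `trapz`. The five ROC points below are hypothetical and stand in for one (scenario type, model type) slice of the `FA` and `HIT` arrays.

```matlab
% Hypothetical ROC points, ordered by increasing false-alarm rate.
FA  = [0 0   0.2 0.2 1];
HIT = [0 2/3 2/3 1   1];

% Loop form: sum of trapezoids between consecutive threshold steps.
auc_loop = 0;
for t = 2:numel(FA)
    auc_loop = auc_loop + (FA(t) - FA(t-1)) * (HIT(t) + HIT(t-1)) / 2;
end

% Vectorized form of the same sum.
auc_vec = sum(diff(FA) .* (HIT(1:end-1) + HIT(2:end)) / 2);

% Built-in trapezoidal integration.
auc_trapz = trapz(FA, HIT);

fprintf('%.4f %.4f %.4f\n', auc_loop, auc_vec, auc_trapz);
```

All three agree only when the points are ordered along the curve; if the false-alarm rate decreased with the threshold step, the loop would yield a negative area.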


Experimental validation: Hypotheses

• Task complexity
  • H1: The validity of all three models is higher in a simple than in a complex task.
  • H2: For both complex and simple tasks, the validity of the combined model is higher than both the task- and the gaze-based models.
  • H3: The difference in validity between the combined model and the task- and gaze-based model is higher in a complex than in a simple task.

• Task performance
  • H4: The validity of the combined and the task-based model is higher for good performers than for poor performers.
  • H5: For both good and poor performers, the validity of the combined model is higher than both the task- and the gaze-based models.

• Descriptive model type
  • H6: The validity of the combined model is higher than both the task- and the gaze-based models.


Results: Task complexity

[Figure not reproduced; marked differences: (marginally) significant, significant]


Results: Task performance

[Figure not reproduced; marked difference: (marginally) significant]


Results: Descriptive model type

[Figure not reproduced; marked difference: significant]


Results: Hypotheses revisited

• Task complexity
  • H1: The validity of all three models is higher in a simple than in a complex task.
  • H2: For both complex and simple tasks, the validity of the combined model is higher than both the task- and the gaze-based models.
  • H3: The difference in validity between the combined model and the task- and gaze-based model is higher in a complex than in a simple task.

• Task performance
  • H4: The validity of the combined and the task-based model is higher for good performers than for poor performers.
  • H5: For both good and poor performers, the validity of the combined model is higher than both the task- and the gaze-based models.

• Descriptive model type
  • H6: The validity of the combined model is higher than both the task- and the gaze-based models.


Conclusions

• Combination of gaze- and task-based information as input can increase the predictive power of models of attention independent of the task complexity and task performance

• Less increase of predictive power in simple tasks:
  • More complex tasks need more complex models

• Several expected effects of task performance and task complexity on model validity not found

• Possible explanations:
  • Indeed no effect
  • Difference in complexity too small (albeit a significant difference)
  • Difference in performance too small (albeit a significant difference)
  • A more complex model is needed
  • More participants
  • Task unsuitable
  • …


Further research

• Higher performance of model possible?
  • By adding information
  • By augmenting the model
  • By parameter tuning in combination with ROC analysis
  • By using knowledge on performance of model

• Application in support system
  • Validity of model high enough?
  • Better performance of human-system team?
  • Appropriate trust and acceptance?

• Application in different domains and tasks (e.g. decision support, training)
