Peter-Paul van Maanen (TNO/VU), Lisette de Koning (TNO), Kees van Dongen (TNO)
Effects of Task Performance and Task Complexity on the Validity of Computational Models of Attention
VU, March 23rd, 2009 (Weekly AI)
Contents
• Motivation
• Support system based on cognitive model
• Experimental validation
• Results
• Conclusions
• Further research
Motivation
Trends in naval warfare
• More complex tactical situations
• More information
• Reduced manning
• Less experience
• Less training
Possible consequence
• Errors in allocation of attention
Challenge
• Support humans dividing attention
Support system based on cognitive model
Cognitive model of attention
• Input: data that is believed to give cues for human attention allocation
  • Environmental data
  • Behavioral data
• Output: estimation of human attention allocation
  • Over objects
  • Over spaces
Advantages
• Support adapted to human needs
• Support comparable to human support
• Support which is appropriately accepted and trusted
Support system based on cognitive model
[Diagram] Attention of user (descriptive model) and attention needed (prescriptive model) → Compare → Discrepancy? → (Adapt) support
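The compare step in this diagram can be illustrated with a minimal sketch. The attention values, the variable names and the tolerance `discrepancyThreshold` are all assumptions made for illustration; the slides do not specify how the comparison is computed.

```matlab
% Hypothetical example values: attention estimates over four contacts.
attentionUser   = [0.50 0.30 0.15 0.05];  % descriptive model output (assumed)
attentionNeeded = [0.10 0.20 0.60 0.10];  % prescriptive model output (assumed)
discrepancyThreshold = 0.25;              % assumed tolerance, not from the slides

% Compare: per-contact shortfall of attention relative to what is needed.
shortfall = attentionNeeded - attentionUser;

% Discrepancy? Flag contacts that receive clearly too little attention.
needsSupport = shortfall > discrepancyThreshold;

% (Adapt) support: here we only list the contacts to draw attention to.
contactsToHighlight = find(needsSupport);
```

With these example values only the third contact is flagged. A real support system would presumably also decide how and when to intervene, which this sketch leaves out.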
Support system based on cognitive model
• Such support systems are effective iff:
  • The validity of the used cognitive models is high enough
  • Otherwise support becomes unpredictable and most probably ineffective
• We need to know the effect of different factors on the validity of models, e.g.:
  • Task performance:
    • Can we expect differences in validity with respect to good and poor performers?
    • If so, does this require different models/parameter settings?
  • Task complexity:
    • How about differences in validity with respect to complex and easy instances of a scenario?
    • Different models/parameter settings?
  • Different model types
    • How do different models/parameter settings themselves affect validity?
Experimental validation: Task
• Goal: (1) Select 5 most threatening contacts; (2) Monitor gauge
• Criteria: (1) Speed, heading, distance, in sea-lane? (2) In red?
Primary: Tactical picture compilation
Secondary: Gauge
Experimental validation: Independent variables
• Task performance (2(3)):
  • Selected good performers (g) (1/2 of participants)
  • Selected poor performers (p) (other 1/2 of participants)
  • (Overall)
• Task complexity (2(3)):
  • Complex scenario (c)
  • Simple scenario (s)
  • (Overall)
• Descriptive model type (3):
  • Gaze-based model (G)
  • Task-based model (T)
  • Combined model (C)
2(3) X 2(3) X 3 mixed design:

G, T, C     | c | s | (overall)
g           | ? | ? | ?
p           | ? | ? | ?
(overall)   | ? | ? | ?
Experimental validation: Task complexity
• Simple scenario (s) (10 sections), e.g.: [example screenshot not preserved in transcript]
Experimental validation: Task complexity
• Complex scenario (c) (10 sections), e.g.: [example screenshot not preserved in transcript]
Experimental validation: Descriptive model type
• Output of all descriptive model types (G, T, C) is as follows: [figure not preserved in transcript]
Experimental validation: Descriptive model type
• Gaze-based model (G):
  • Eye gaze (Just & Carpenter, 1976; Salvucci, 2000)
  • Distance between fixation point and contacts
  • Dwelling time
  • Use of eye tracker
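The slide names the inputs of the gaze-based model (distance between fixation point and contacts, dwelling time) but not the formula that combines them. The sketch below is therefore only one possible illustration, not the authors' model: it assumes a Gaussian falloff with distance (width `sigma`) and a dwell-weighted sum, and all positions and durations are made-up example values.

```matlab
% Hypothetical data: contact positions and gaze fixations in screen pixels.
contacts  = [100 100; 400 300; 700 500];  % one row per contact: [x y]
fixations = [110 105; 390 310; 405 295];  % one row per fixation: [x y]
dwell     = [0.2; 0.5; 0.3];              % dwelling time per fixation (s)
sigma     = 50;                           % assumed spatial falloff (pixels)

nContacts = size(contacts, 1);
attention = zeros(1, nContacts);
for c = 1:nContacts
    % Euclidean distance from every fixation point to this contact.
    delta = bsxfun(@minus, fixations, contacts(c, :));
    d = sqrt(sum(delta.^2, 2));
    % Assumed weighting: fixations close to the contact count more,
    % and longer fixations count more.
    attention(c) = sum(dwell .* exp(-d.^2 / (2 * sigma^2)));
end

% Normalise so that the estimates over all contacts sum to one.
attention = attention / sum(attention);
```

In this example two of the three fixations land near the second contact, so it receives the largest share of estimated attention.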
Experimental validation: Descriptive model type
• Task-based model (T):
  • Goal-directed search (Treisman & Gelade, 1980)
  • Information from the task environment (speed, heading, distance, in sea-lane?)
  • Calculates a threat value of contacts
Experimental validation: Descriptive model type
• Combined model (C):
  • Both types of information (gaze-based + task-based)
Experimental validation: Dependent variables
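Confusion matrix of model estimation versus human estimation, with cells: Hit, False alarm (FA), Miss, Correct rejection (CR)
[Figure legend and layout not preserved in transcript]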
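As a minimal sketch of how the four cells of such a confusion matrix can be counted for one decision threshold: the scores, the threshold and the choice to treat the human estimation as the reference are assumptions for illustration, since the figure itself is not preserved.

```matlab
% Hypothetical example: one entry per contact at a single moment in time.
modelScore   = [0.9 0.2 0.7 0.4 0.1 0.8];  % model estimate of attention (assumed)
humanAttends = logical([1 0 0 1 0 1]);     % human estimation, used as reference
threshold    = 0.5;                        % assumed decision threshold

% Binary model estimate at this threshold.
modelAttends = modelScore >= threshold;

% Count the four cells of the confusion matrix.
hits   = sum( modelAttends &  humanAttends);  % both say "attended"
fas    = sum( modelAttends & ~humanAttends);  % false alarms
misses = sum(~modelAttends &  humanAttends);
crs    = sum(~modelAttends & ~humanAttends);  % correct rejections

% Rates that form one point of an ROC curve.
hitRate = hits / (hits + misses);
faRate  = fas  / (fas + crs);
```

For these values the counts are 2 hits, 1 false alarm, 1 miss and 2 correct rejections.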
Experimental validation: Dependent variables
• Receiver Operating Characteristic (ROC) analysis is useful for the evaluation, validation, selection, construction, and improvement of models, classifiers, rankers, etc.
Experimental validation: Procedure
• Construct confusion matrix for each
  • Participant (40)
  • Scenario type (s, c, overall)
  • Descriptive model type (G, T, C)
  • Decision threshold (1000)
• Plot ROC curves (40 X 3 X 3 = 360, using 360,000 matrices)
• Calculate Area Under the Curve (AUC) for each ROC curve (1 = good, 0 = poor)
• Performance = average over AUCs per condition (3 X 3 X 3 = 27)
• Calculate statistical significance of differences between conditions based on hypotheses (i.e. ANOVA and (un)paired, one-tailed t-tests)
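The threshold sweep in this procedure can be sketched for a single participant, scenario type and model type as follows. The scores are made-up example values, and the ordering of thresholds from high to low is an assumption chosen so that both rates increase along the curve.

```matlab
% Hypothetical data for one participant, scenario type and model type.
modelScore   = [0.9 0.2 0.7 0.4 0.1 0.8];  % model estimates (assumed values)
humanAttends = logical([1 0 0 1 0 1]);     % human estimation, used as reference
t_steps      = 1000;                       % number of decision thresholds

% Thresholds run from high to low so that FA and HIT are non-decreasing.
thresholds = linspace(1, 0, t_steps);
HIT = zeros(1, t_steps);
FA  = zeros(1, t_steps);
for t = 1:t_steps
    % One confusion matrix per threshold, reduced to its two rates.
    modelAttends = modelScore >= thresholds(t);
    HIT(t) = sum(modelAttends &  humanAttends) / sum(humanAttends);
    FA(t)  = sum(modelAttends & ~humanAttends) / sum(~humanAttends);
end

% plot(FA, HIT) would draw the ROC curve for this condition.
```

The curve starts at (0, 0) for the strictest threshold and ends at (1, 1) for the most lenient one; repeating this for every participant, scenario type and model type gives the 360 curves mentioned in the procedure.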
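Experimental validation: Procedure
% CALCULATE AUC PER SECTION PER MODEL (ONE PARTICIPANT)
% FA and HIT hold the false-alarm and hit rates, indexed as
% (scenario type, model type, threshold step).
AUC = zeros(s_steps, m_steps);  % initialise the accumulator (if not done earlier)
for m = 1:m_steps               % model type (G, T, C)
    for s = 1:s_steps           % scenario type (s, c, overall)
        for t = 2:t_steps       % threshold step (1000)
            A = FA(s,m,t);  B = FA(s,m,t-1);
            C = HIT(s,m,t); D = HIT(s,m,t-1);
            % AUC using Trapezoidal Rule:
            AUC(s,m) = AUC(s,m) + (A - B)*(C + D) / 2;
        end
    end
end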
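As a small worked check of the trapezoidal accumulation, the sketch below applies it to a hypothetical three-point ROC curve (the values are assumed for illustration) and compares the result with a vectorised form and with MATLAB's built-in `trapz`. Note that the area only comes out positive if the false-alarm rate does not decrease from one threshold step to the next.

```matlab
% Hypothetical three-point ROC curve (values assumed for illustration).
FA  = [0 0.5  1];   % false-alarm rate per threshold step
HIT = [0 0.75 1];   % hit rate per threshold step

% Loop form: sum of trapezoids between neighbouring ROC points.
aucLoop = 0;
for t = 2:numel(FA)
    aucLoop = aucLoop + (FA(t) - FA(t-1)) * (HIT(t) + HIT(t-1)) / 2;
end

% Vectorised form of the same sum.
aucVec = sum(diff(FA) .* (HIT(1:end-1) + HIT(2:end)) / 2);

% Built-in trapezoidal integration.
aucBuiltin = trapz(FA, HIT);
```

All three variants give 0.1875 + 0.4375 = 0.625 for this curve.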
Experimental validation: Hypotheses
• Task complexity
  • H1: The validity of all three models is higher in a simple than in a complex task.
  • H2: For both complex and simple tasks, the validity of the combined model is higher than that of both the task-based and the gaze-based model.
  • H3: The difference in validity between the combined model and the task-based and gaze-based models is larger in a complex than in a simple task.
• Task performance
  • H4: The validity of the combined and the task-based model is higher for good performers than for poor performers.
  • H5: For both good and poor performers, the validity of the combined model is higher than that of both the task-based and the gaze-based model.
• Descriptive model type
  • H6: The validity of the combined model is higher than that of both the task-based and the gaze-based model.
Results: Task complexity
[Figure not preserved in transcript; annotations: "(marginally) significant" and "significant"]
Results: Task performance
[Figure not preserved in transcript; annotation: "(marginally) significant"]
Results: Descriptive model type
[Figure not preserved in transcript; annotation: "significant"]
Results: Hypotheses revisited
• Task complexity
  • H1: The validity of all three models is higher in a simple than in a complex task.
  • H2: For both complex and simple tasks, the validity of the combined model is higher than that of both the task-based and the gaze-based model.
  • H3: The difference in validity between the combined model and the task-based and gaze-based models is larger in a complex than in a simple task.
• Task performance
  • H4: The validity of the combined and the task-based model is higher for good performers than for poor performers.
  • H5: For both good and poor performers, the validity of the combined model is higher than that of both the task-based and the gaze-based model.
• Descriptive model type
  • H6: The validity of the combined model is higher than that of both the task-based and the gaze-based model.
Conclusions
• Combination of gaze- and task-based information as input can increase the predictive power of models of attention independent of the task complexity and task performance
• Less increase of predictive power in simple tasks:
  • More complex tasks need more complex models
• Several expected effects of task performance and task complexity on model validity not found
• Possible explanations:
  • Indeed no effect
  • Difference in complexity too small (albeit significantly different)
  • Difference in performance too small (albeit significantly different)
  • A more complex model is needed
  • More participants needed
  • Task unsuitable
  • …
Further research
• Higher performance of model possible?
  • By adding information
  • By augmenting the model
  • By parameter tuning in combination with ROC analysis
  • By using knowledge on performance of model
• Application in support system
  • Validity of model high enough?
  • Better performance of human-system team?
  • Appropriate trust and acceptance?
• Application in different domains and tasks (e.g. decision support, training)