
Guideline Aggregation: Web Accessibility Evaluation for Older Users

Giorgio Brajnik (1), Yeliz Yesilada (2), Simon Harper (2)

(1) Dip. di Matematica e Informatica, University of Udine, Italy

www.dimi.uniud.it/giorgio

(2) School of Computer Science, University of Manchester, Manchester, UK

W4A 2009

© Brajnik, Yesilada, Harper. Guideline Aggregation: Web Accessibility Evaluation for Older Users



The problem with analytic evaluation methods

- conformance reviews (e.g. w.r.t. WCAG 2.0) are non-contextualized and not specific

- evaluators are not guided into assessing the consequences of violations

- there's no reliable way to rate the severity of violations

Our approach

1. Provide context to evaluators: focus on specific barriers and user categories (e.g. blind, motor impaired, cognitively impaired, low vision, ...)

2. Provide more formalized ways to rate severity



Multiple impairments

How to cope with multiple impairments and the resulting combinatorial explosion?

- e.g. older people

- Dynamic aggregation:

  1. do the evaluation for primitive categories
  2. then aggregate
  3. e.g. barriers for older people = barriers for low vision ∪ those for motor impaired ∪ ...
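The aggregation step can be sketched as a set union. This is a minimal sketch under assumptions the slides do not state: each barrier found is represented as a hypothetical (barrier type, location) pair, and only the presence of barriers is tracked; how the severities of a barrier found for several primitive categories are combined is left open. The function name and the data are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation of a barrier found on a page: (barrier type, location).
Barrier = Tuple[str, str]


def aggregate(found: Dict[str, Set[Barrier]], primitives: Iterable[str]) -> Set[Barrier]:
    """Barriers for an aggregated category = union of the barriers found
    for each of its primitive categories (unknown categories contribute nothing)."""
    result: Set[Barrier] = set()
    for category in primitives:
        result |= found.get(category, set())
    return result


# Illustrative data only, not results from the study.
found = {
    "low vision": {("inflexible layout", "main menu"), ("ambiguous links", "footer")},
    "motor impaired": {("ambiguous links", "footer"), ("missing internal links", "home")},
}
older_adults = aggregate(found, ["low vision", "motor impaired"])
print(sorted(older_adults))
```

Because a union removes duplicates, a barrier found for both low vision and motor impaired users is counted once for older adults, which is what keeps the aggregated evaluation from growing combinatorially.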



Barrier Walkthrough

1. Analytic method, similar to "heuristic walkthrough"
2. Based on barriers (a kind of "vulnerability points")
3. Failure modes are contextualized within usage scenarios
4. This helps evaluators rate severity = F(impact, persistence) ∈ {1, 2, 3}
5. See http://www.dimi.uniud.it/giorgio/projects/bw/bw.html

(Brajnik, ICCHP 2006; ASSETS 2007)
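The slides state only that severity is a function F of impact and persistence with values in {1, 2, 3}; they do not give F itself. The sketch below is therefore purely illustrative: it assumes impact and persistence are each rated on a 1 to 3 scale and uses a hypothetical lookup table (SEVERITY_TABLE is an assumption, not the Barrier Walkthrough's actual rule).

```python
# Hypothetical mapping (impact, persistence) -> severity, each on an assumed 1..3 scale.
# This table is an illustrative assumption, not the F used by the BW method.
SEVERITY_TABLE = {
    (1, 1): 1, (1, 2): 1, (1, 3): 2,
    (2, 1): 1, (2, 2): 2, (2, 3): 3,
    (3, 1): 2, (3, 2): 3, (3, 3): 3,
}


def severity(impact: int, persistence: int) -> int:
    """Return the severity in {1, 2, 3} for an (impact, persistence) pair."""
    key = (impact, persistence)
    if key not in SEVERITY_TABLE:
        raise ValueError("impact and persistence must each be in {1, 2, 3}")
    return SEVERITY_TABLE[key]


print(severity(3, 2))
```

A table rather than a formula keeps the rating rule explicit, so every evaluator applies exactly the same mapping and it is easy to audit.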




Example of a barrier

Rich images lacking equivalent text

- Users: Blind persons using a screen reader

- Cause: The page contains an image that provides information (e.g. a diagram, histogram, picture, drawing, graph) but only in a graphical format; no equivalent textual description appears in the page.

- Failure mode: The user, even if s/he perceives that there is an important image, has no way to get the information it contains. In addition, s/he spends time and effort trying to find out where in the page or site that information is buried.
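Each barrier type pairs a user category with a cause and a failure mode. As a minimal sketch (the class and field names are hypothetical, not part of the BW method), the barrier in this example could be recorded as an immutable record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarrierType:
    """One entry of a barrier catalogue (hypothetical field names)."""
    name: str
    users: str
    cause: str
    failure_mode: str


RICH_IMAGES = BarrierType(
    name="Rich images lacking equivalent text",
    users="Blind persons using a screen reader",
    cause=("The page contains an image that provides information only in a "
           "graphical format; no equivalent textual description appears in the page."),
    failure_mode=("The user has no way to get the information the image contains "
                  "and spends time and effort looking for it elsewhere."),
)
print(RICH_IMAGES.users)
```

Using `frozen=True` makes records immutable and hashable, so a catalogue of barrier types can be stored in sets or used as dictionary keys.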



Experiment

Goal: to explore which conclusions are invariant with respect to aggregation.

- Do certain differences among sites disappear?
- How does reliability change?
- How does the correctness of evaluations change?
- How does the difference between experts and non-experts change?



Plan

Mixed design experiment

- 19 experts + 51 non-experts applying BW; 61 barrier types (within-subjects)

- 2 primitive user categories: low vision, motor impaired (within-subjects)

- 1 aggregated category: older adults = union of the individual barriers found for the primitive categories

- 4 pages (1 page per subject, between-subjects): IMDB.com, Facebook.com, novascotiaquilts.com, Sam's Chop House



Spreadsheet



True Barrier Types

Correct ratings: those where the majority of experts agreed on their severity

Results:

- Experts: 27 out of 61 barrier types ("ambiguous links", "functional images w/o text", "inflexible layout", "missing internal links", ...)

- Non-experts: 24 out of those 27 (missed: "forms w/o labels", "moving content", "no CSS support")

- Certain barriers are specific to particular user categories
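The definition of correct ratings can be sketched as a majority vote over the experts' severities. Assumptions not stated on the slides: "majority" is read as a strict majority (more than half of the experts who rated that barrier type for a given page and user category), and when no severity reaches it there is no reference rating. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def majority_severity(expert_ratings: List[int]) -> Optional[int]:
    """Severity chosen by a strict majority of experts, or None if none exists."""
    if not expert_ratings:
        return None
    value, count = Counter(expert_ratings).most_common(1)[0]
    return value if count > len(expert_ratings) / 2 else None


def is_correct(rating: int, expert_ratings: List[int]) -> bool:
    """A rating is correct if it matches the experts' majority severity."""
    reference = majority_severity(expert_ratings)
    return reference is not None and rating == reference


print(majority_severity([2, 2, 3]), majority_severity([1, 2, 3]))
```

A tie such as [2, 2, 3, 3] yields no majority, so that barrier type would not contribute a reference rating under this reading.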



Reliability

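Reproducibility: given (barrier type, user group, page),

$$ rep = \begin{cases} 1 & \text{if } M = 0 \\ 1 - sd/M & \text{if } M \neq 0 \text{ and } 1 - sd/M > 0 \\ 0 & \text{otherwise} \end{cases} $$

where M and sd are the mean and standard deviation of the weighted severity.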
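The reproducibility formula translates directly into code. This sketch assumes the population standard deviation (`statistics.pstdev`) and non-negative weighted severities; the slides specify neither, and the weighting scheme itself is not shown here. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def reproducibility(severities: Sequence[float]) -> float:
    """rep = 1 if M = 0; 1 - sd/M if that is positive; 0 otherwise.
    `severities` are the weighted severities given by evaluators to one
    (barrier type, user group, page) combination."""
    m = statistics.mean(severities)
    if m == 0:
        return 1.0
    sd = statistics.pstdev(severities)  # population sd: an assumption
    rep = 1 - sd / m
    return rep if rep > 0 else 0.0


print(reproducibility([2, 2, 2, 2]), reproducibility([1, 3]))
```

Identical severities give rep = 1, and the more the ratings spread relative to their mean, the closer rep drops towards 0.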

Agreement: given (user group, page), compute the ICC (intraclass correlation coefficient) over all barrier types, for both relative and absolute consistency.
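Relative and absolute consistency correspond to the consistency and absolute-agreement forms of the ICC. The slides do not say which ICC model was used; this sketch assumes the two-way, single-measure forms ICC(C,1) and ICC(A,1) as described by McGraw and Wong (1996), with barrier types as rows and the evaluators of one page as columns. The function name is hypothetical.

```python
from typing import List, Tuple


def icc_c1_a1(x: List[List[float]]) -> Tuple[float, float]:
    """x[i][j] is the severity evaluator j gave to barrier type i
    (n >= 2 barrier types, k >= 2 evaluators, no missing values).
    Returns (ICC(C,1), ICC(A,1)); raises ZeroDivisionError if all ratings are identical."""
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between barrier types
    msc = ss_cols / (k - 1)             # between evaluators
    mse = ss_err / ((n - 1) * (k - 1))  # residual
    icc_c = (msr - mse) / (msr + (k - 1) * mse)
    icc_a = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_c, icc_a


# Second evaluator rates every barrier type one level higher than the first.
print(icc_c1_a1([[1, 2], [2, 3], [3, 4]]))
```

In the usage line the evaluators rank barrier types identically, so consistency is 1.0, while the constant offset lowers absolute agreement to about 0.67; this is the distinction between relative and absolute consistency.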



Reproducibility



Mean weighted severities



Correctness

- Error rate $E = \frac{I}{C + I}$, where C and I are the numbers of correct and incorrect ratings

- Accuracy A = % of reported barriers that are correct

- Sensitivity S = % of correct barriers that are reported

- F-measure $F = \frac{2 A S}{A + S}$
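The correctness measures can be computed as follows. A minimal sketch, assuming barriers are identified by name, proportions are returned in [0, 1] rather than as percentages, and the counts of correct and incorrect ratings are already known; the function names and sample data are illustrative.

```python
from typing import Set, Tuple


def error_rate(n_correct: int, n_incorrect: int) -> float:
    """E = I / (C + I)."""
    return n_incorrect / (n_correct + n_incorrect)


def accuracy_sensitivity_f(reported: Set[str], correct: Set[str]) -> Tuple[float, float, float]:
    """Return (A, S, F): share of reported barriers that are correct,
    share of correct barriers that are reported, and F = 2AS / (A + S)."""
    hits = len(reported & correct)
    a = hits / len(reported) if reported else 0.0
    s = hits / len(correct) if correct else 0.0
    f = 2 * a * s / (a + s) if a + s > 0 else 0.0
    return a, s, f


reported = {"ambiguous links", "inflexible layout", "moving content"}
correct = {"ambiguous links", "inflexible layout", "missing internal links", "forms w/o labels"}
print(accuracy_sensitivity_f(reported, correct))  # A = 2/3, S = 1/2, F = 4/7
print(error_rate(3, 1))  # 0.25
```

F is the harmonic mean of A and S, so an evaluator cannot score well by reporting everything (high S, low A) or by reporting very little (high A, low S).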



Error rates



F-measure



Invariant properties

1. Aggregation does not worsen the problem of missed barriers

2. Reliability: experts are consistently more reliable; same pattern across pages

3. Severities: experts are more judgmental; the ranks of pages do not change

4. Quality: error rates maintain a similar difference (experts vs non-experts)

5. Quality: F-measure confidence intervals shrink; they keep the same relationship



Conclusions

1. Aggregation seems to work: it enables contextualized evaluations and leads to results that are potentially valid

2. It could be extended to cope with degrees of impairment

Limitations

1. We did not validate our conclusions against an independent assessment

2. We don't know whether the same conclusions would hold for any set of primitive user categories

Questions?



Evaluation framework

- based on reliability (reproducibility + agreement) and correctness (error rate, accuracy, sensitivity and F-measure)

- is viable

- is discriminatory

It can be used to assess the pros and cons of an evaluation method.

