Post on 27-Dec-2015
A collaborative venture (2003 – 2005)
Drexel iSchool: Dr Susan Wiedenbeck, Jim Waters
Rutgers, Camden, Computer Science: Jean-Camille Birget
Polytechnic of Brooklyn, Computer Science: Nasir Memon, Alex Brodskiy (System Developer)
The Solution
- Cued recall, not pure recall
- Use some aspects of an image for the password
Many prior attempts
- Choose from a set of faces
- Low entropy: young males always chose attractive females
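- Point recorded by the system (X, Y) is the center of the square
- Square shows system tolerance: the square is 20 x 20 pixels, excluding the line
- Order is part of the password
[Figure: tolerance square around a recorded click point, with X and Y axes]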
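The tolerance rule described on this slide can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names are hypothetical, and treating a click exactly on the square's edge as a match is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def click_matches(stored: Point, click: Point, square: int = 20) -> bool:
    """True if the click falls in the tolerance square centered on the stored point.

    Assumption: a click exactly on the edge of the square counts as a match.
    """
    half = square / 2
    return abs(click[0] - stored[0]) <= half and abs(click[1] - stored[1]) <= half


def verify_password(stored: List[Point], clicks: List[Point], square: int = 20) -> bool:
    """Order is part of the password: the i-th click must match the i-th stored point."""
    if len(stored) != len(clicks):
        return False
    return all(click_matches(s, c, square) for s, c in zip(stored, clicks))


stored_points = [(100, 100), (200, 50)]
print(verify_password(stored_points, [(105, 92), (195, 55)]))  # right points, right order
print(verify_password(stored_points, [(195, 55), (105, 92)]))  # right points, wrong order
```

The first call succeeds and the second fails, because the same two clicks are entered in the wrong order.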
Was it any good?
- How many unique passwords could be created from a given X by Y canvas?
- Some very clever calculations by the CS folks
- LOTS! 7.2 x 10^12 with our small picture
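A rough way to see where a number of this size comes from is to count the tolerance squares that fit on the canvas and raise that count to the number of ordered click points. The sketch below is not the CS team's calculation. The canvas size of 451 x 331 pixels is an assumption (the slides do not give it), as is counting regions by dividing canvas area by square area.

```python
def password_space(width: int, height: int, square: int = 20, clicks: int = 5) -> int:
    """Approximate number of distinct ordered click sequences.

    Assumption: the number of distinguishable regions is the canvas area
    divided by the area of one tolerance square (rounded down).
    """
    regions = (width * height) // (square * square)
    return regions ** clicks


# Hypothetical canvas size, chosen only for illustration.
space = password_space(451, 331, square=20, clicks=5)
print(f"{space:.2e}")
```

Under these assumptions the result is of the same order as the 7.2 x 10^12 quoted on the slide.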
The iSchool (initial) Focus
- Was the system effective from a user point of view?
- How did it compare against traditional alphanumeric passwords?
- Objective performance data: time, errors, failure rate, Undo, Clear and Show
- The passwords themselves
- User perceptions (Likert scales)
- Parametric and nonparametric statistics
Time
- needed to create the valid password (creation phase)
  - spent creating each bad password (invalid attempt)
- needed to input the password correctly 10 times (practice)
  - spent on each and every attempt (good or bad)
- needed to input the password correctly once after delay
  - spent on each and every attempt (good or bad)
Errors
Number of errors (invalid password entry attempts)
- made when creating a valid password
- made when inputting the password correctly 10 times (practice)
- made when inputting the password correctly once after delay
Magnitude of errors per click
[Figure: required click vs. actual click]
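The magnitude of error per click can be illustrated as the pixel distance between the required point and the actual click. The slides do not say which distance measure was used, so the Euclidean distance below is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def click_error(required: Point, actual: Point) -> float:
    """Distance in pixels between the required click and the actual click.

    Assumption: Euclidean distance; the study may have used another measure.
    """
    return math.hypot(actual[0] - required[0], actual[1] - required[1])


def per_click_errors(required: List[Point], actual: List[Point]) -> List[float]:
    """Error magnitude for each click of one attempt, compared position by position."""
    return [click_error(r, a) for r, a in zip(required, actual)]


print(per_click_errors([(0, 0), (50, 50)], [(3, 4), (50, 62)]))
```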
Failure Rate
- 4 invalid attempts at password input (after delay) = FAIL!
- But kept recording attempts (some subjects had over 30)
- Even after repeatedly viewing their passwords
ALSO
- How many times the subject clicked Undo on a password point
- How many times the subject cleared the password
- How many times the subject viewed the password
The Basic Experiment
- Graphical group vs. alphanumeric group
- 20 subjects in each group
- Randomly assigned to alphanumeric or graphical
- Eight-character alphanumeric password
- 5 pass points
- $10 per subject
- Some subjects waived the fee
- One subject received a “Heidegger was a Nazi” T-shirt instead
The Basic Experiment
What to record?
Creation and practice
- Time, Errors, Failures, Undos, Clears and Shows
Decay of passwords over time (retention phase)
- Time, Errors, Failures, Undos, Clears and Shows
- Short term, after distraction task (questionnaire Q1)
- Medium term, after 1 week
- Long term, after a further 5 weeks
User Perceptions
- Embedded online questionnaire (Q1 and Q2)
- 33 Likert scale questions
- Five questions negative on the left (recoded)
- Q2 after last session = Q1 plus some open questions
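Recoding the negatively worded items usually means reverse-scoring them so that every item points the same way before analysis. The sketch below assumes the 7-point scale used in the questionnaire; the item identifiers and the choice of which items are reversed are hypothetical.

```python
from typing import Dict, Set


def reverse_score(score: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Flip a Likert response: 1 becomes 7, 2 becomes 6, and so on."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}..{scale_max}")
    return scale_max + scale_min - score


def recode_responses(responses: Dict[str, int], reversed_items: Set[str]) -> Dict[str, int]:
    """Reverse-score only the items listed in reversed_items."""
    return {
        item: reverse_score(score) if item in reversed_items else score
        for item, score in responses.items()
    }


# Hypothetical item names; "q07" stands in for one of the five reversed questions.
print(recode_responses({"q01": 2, "q07": 6}, reversed_items={"q07"}))
```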
Subjects
- Experienced computer users
- 40 members of a North East American university community
- Students, staff and faculty
- Convenience sample
The Experiment
Demonstration Phase
- Verbal and visual explanation of the purpose of the system and the experiment protocol (magic lantern show)
- Invitation to participate and earn $10
- Subjects completed IRB-approved (Human Subjects) consent forms
The Experiment
Creation Phase
- Subjects created a password using the randomly assigned system
Practice Phase
- Subjects practiced entering the password correctly
- Practice until the password was entered correctly 10 times (in total)
- On-screen count of how many correct and incorrect entries
- No limit to the number of attempts
- After 4 failures could view the password
The Experiment
Distraction Phase (after completing practice)
- Subjects filled in online questionnaire Q1
Retention Phase
- Enter the password correctly once (fail after 4 errors!)
- R1: immediately after Distraction phase
- R2: one week later
- R3: six weeks after R1, plus complete Q2
Results: Creation

Measure                          Mode          Mean (SD)
Total attempts to create         Alphanumeric  1.70 (0.18)
                                 Graphical     1.10 (0.07)
Total time to create (seconds)   Alphanumeric  81.10 (36.50)
                                 Graphical     64.03 (21.93)

T-tests (SPSS version 10.0): the graphical group took significantly fewer attempts to create a valid password, t(38)=3.13, p<.005.
Statement                                            Mode          Mean (SD)
I did not have much trouble thinking up a password   Alphanumeric  3.30 (1.59)
                                                     Graphical     2.35 (1.57)
It did not take me long to think up a password       Alphanumeric  3.15 (1.63)
                                                     Graphical     2.60 (1.42)

Likert scale questions: 1 is strongly agree, 7 is strongly disagree.
Nonparametric Mann-Whitney U test: for the first question there was a significant difference (U=127.00, p<.043).
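For readers unfamiliar with the Mann-Whitney U statistic, here is a minimal pure-Python sketch of how U is computed from two groups of ratings. The study itself used SPSS; the ratings below are made-up illustrative values, not study data, and the sketch computes only U, not the p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Return the smaller of the two U statistics for the two groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    u_a = rank_sum_a - len(group_a) * (len(group_a) + 1) / 2
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


# Hypothetical 7-point Likert ratings for two small groups.
print(mann_whitney_u([1, 2, 2, 3], [2, 4, 5, 6]))
```

A rank-based test suits Likert data because the responses are ordinal and often heavily tied.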
Learning Phase

Measure                           Mode          Mean (SD)
Number of incorrect submissions   Alphanumeric  0.40 (0.68)
                                  Graphical     4.80 (7.16)
Total practice time (seconds)     Alphanumeric  66.08 (4.92)
                                  Graphical     171.89 (24.46)

The two measures were analyzed using t-tests. There were significant differences in favor of the alphanumeric group in both cases:
- Number of incorrect inputs: t(38)=-2.73, p<.013
- Total practice time: t(38)=-4.24, p<.0001
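As a sanity check on the first t-test, the statistic can be recomputed from the summary figures alone. The sketch below assumes 20 subjects per group (as stated for the basic experiment), that the bracketed values for incorrect submissions are standard deviations, and that an equal-variance (pooled) t-test was used, which is consistent with the reported 38 degrees of freedom. The function name is hypothetical.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Number of incorrect submissions: alphanumeric vs. graphical.
t_stat, df = pooled_t(0.40, 0.68, 20, 4.80, 7.16, 20)
print(f"t({df}) = {t_stat:.2f}")
```

Under these assumptions the statistic comes out near -2.74, in line with the reported t(38)=-2.73.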
Variability in practice phase
[Figure: bar chart of number of participants (y-axis, 0 to 16) by number of incorrect password submissions (x-axis: 0, 1, 2, 3, 6, 9, 17, 18, 20), Alphanumeric vs. Graphical]
Retention Phase

Measure                                 Mode          Mean R1 (SD)  Mean R2 (SD)   Mean R3 (SD)
Number of incorrect submissions         Alphanumeric  0.25 (0.79)   2.20 (2.73)    1.75 (2.47)
                                        Graphical     1.55 (1.57)   2.75 (3.88)    1.50 (2.80)
Time for correct submission (seconds)   Alphanumeric  5.23 (1.66)   9.42 (3.70)    9.24 (3.72)
                                        Graphical     8.78 (4.40)   24.25 (15.21)  19.38 (17.57)

R1: immediately after Distraction phase
R2: one week later
R3: six weeks after R1
Effect of mode not significant (ANOVA)
Concept proved: what next?
What impacts performance?
- Can we change the system design to alter performance?
- Change picture (Expt 2)
- Change tolerance around selected point (Expt 3)
Interference (Expts 4 thru 6)
- Nobody has just one password
- Will 2 passwords interfere with each other?
Experiment 2: More Subjects
- Recruited 5 entire MS and BS classes from the iSchool
- 3 different pictures
- Re-used data from graphical subjects in Experiment 1 as baseline
Experiment 2
- Total new subjects = 71 at $10 a pop!
- Randomly assigned Mural, Tea and Map pictures
- Tolerance as per Experiment 1 (20 x 20 pixel square)
- Similar routine: Demonstration, Creation, Practice, Distraction (questionnaire Q1)
- R1: at end of session
- R2: one week later, plus questionnaire 2
- No later retention tests
Results
Means (SD) in R1, first retention trial

Measure                                Image
                                       Pool          Mural        Tea          Map
Number of incorrect submissions        0.55 (1.57)   0.17 (0.51)  0.14 (0.47)  0.39 (1.31)
Time for incorrect submissions (sec)   7.91 (28.20)  1.18 (3.37)  1.26 (4.40)  3.08 (9.25)
Time for correct submissions (sec)     8.61 (3.27)   7.52 (3.62)  7.43 (2.32)  7.86 (2.03)

No significant differences at all!
Retention 1 week later
A two-way mixed-model ANOVA was used for the analyses, with image as the between-subjects factor and retention trial (R1/R2) as the within-subjects factor.
There was a marginal effect of image, F(3,79)=2.55, p<.062. Tukey’s HSD indicated that performance of the Map group was lower than that of the Tea group.
Experiment 3: Tolerance
Conditions
- Base group (20 x 20 pixels): group from Experiment 1
- Harder (14 x 14 pixels)
- Hardest (10 x 10 pixels)
32 undergraduate iSchool students (another $320)
- Demonstration Phase
- Creation Phase
- Practice Phase
- Distraction Phase: questionnaire Q1
- Retention Phase: R1 after distraction (Q1), R2 one week later
Experiment 3: Tolerance
One-way ANOVA
- No significant differences between groups in the number of attempts or the time to create a valid password
- No significant differences in the number of attempts or the time required during the practice phase (10 passwords input correctly)
- No significant differences between groups during the 1st retention phase (after questionnaire)
Decay!
- The smallest tolerance group (10 x 10) made significantly more errors during the 2nd retention phase (one week later)
- In the 10 x 10 group, 7 of 16 participants (43.75 percent) failed to log in
- In the 14 x 14 group, only 2 of 16 failed (12.5 percent)
- There was a significant difference in failure between the groups (t(30)=2.63, p<.015)
Tolerance Experiment
The smallest tolerance group rated the system significantly worse on three perceptual measures (nonparametric Mann-Whitney U test):
- It did not take me long to input my password correctly 10 times.
- Inputting my password was easy.
- I think that the password system was pleasant to use.
Interference Experiments
- Will having to create, practice and remember 2 different passwords be more difficult than just one?
- Is it harder to use two different pictures or the same picture?
- What about 2 alphanumeric passwords?
- Experiment 4 (2 passwords, 1 picture) ($420)
- Experiment 5 (2 passwords, 2 pictures) ($450)
- Experiment 6 (2 alphanumeric passwords) ($450)
Protocol same for each
- Demonstration and consent form completion
- Create 1st password for HOME system
- Practice entering HOME password 10 times
- Create 2nd password for OFFICE system
- Practice entering OFFICE password 10 times
- Distraction task (questionnaire Q1)
- Retention Phase
  - R1: immediately after questionnaire, enter each password (random order) correctly once
  - R2: one week later, enter each password (random order) correctly once
So...?
- All groups better with practice
- Graphical group benefitted more from practice: sliced 57 seconds off average practice time for 2nd password
- No other significant differences
- No effect of 2 graphical passwords vs. 1 graphical password
- Password 2 retention: same time and number of errors as password 1
- No effect of 2 pictures vs. 1 picture
Issues
Juggernaut approach: recorded all conceivable data for every click for every subject for every trial
- X,Y location for every single password attempt click
- How many pixels away from the correct point each click was
- Up to 50 slots for practice trials for each password
- Up to 20 slots for retention tests for each password
- Online questionnaire stored in database
Issues
- Exceeded capability of SPSS v10.0 (Student)
- Over 450 variables and over 270 subjects
- SPSS output alone in excess of 1000 pages
- Much data of limited value (Q1 vs. Q2)
- Hard to extract meaningful findings from morass of results
- Many findings may be significant by chance
- Ran out of time and money
- Unanswered questions
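The concern about chance findings can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, one independent test per variable at a significance level of .05; the real analyses were neither that numerous nor independent, so the figures are indicative only.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


# Illustrative only: 450 is the approximate variable count from the slide.
print(expected_false_positives(450))
print(prob_at_least_one(450))
print(bonferroni_alpha(450))
```

With 450 tests, roughly 22 spurious "significant" results would be expected by chance alone, which is why an uncorrected morass of output is hard to interpret.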
Unanswered Questions
Analysis of nature of errors
- Order errors and memory failure errors
- Needed to analyze each attempt one by one, manually, using recorded X,Y coordinates
- Proximal points confused this
What makes a good password picture?
Memory strategies (geometry and semantics)
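The distinction between order errors and memory failure errors could in principle be automated from the recorded X,Y coordinates. The sketch below is one possible approach, not something the study implemented: the labels, function names, and the rule "an order error is an attempt whose clicks match every stored point in some other order" are all assumptions.

```python
from itertools import permutations
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def in_tolerance(stored: Point, click: Point, square: int = 20) -> bool:
    """True if the click lies in the tolerance square centered on the stored point."""
    half = square / 2
    return abs(click[0] - stored[0]) <= half and abs(click[1] - stored[1]) <= half


def classify_attempt(stored: List[Point], clicks: List[Point], square: int = 20) -> str:
    """Label an attempt as 'correct', 'order' (right points, wrong order) or 'memory'."""
    if len(clicks) != len(stored):
        return "memory"
    if all(in_tolerance(s, c, square) for s, c in zip(stored, clicks)):
        return "correct"
    # Order error: some reordering of the clicks matches every stored point.
    for reordered in permutations(clicks):
        if all(in_tolerance(s, c, square) for s, c in zip(stored, reordered)):
            return "order"
    return "memory"


password = [(10, 10), (100, 100), (200, 50)]
print(classify_attempt(password, [(12, 8), (98, 104), (203, 49)]))
print(classify_attempt(password, [(98, 104), (12, 8), (203, 49)]))
print(classify_attempt(password, [(12, 8), (98, 104), (400, 300)]))
```

When two stored points lie closer together than the tolerance, a single click can match both, so the label becomes ambiguous. That is the proximal-points problem noted above.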