Jim Waters Cabrini College Dept of IS&T. A Graphical password system – Hurrah !

Experimental Design …a personal perspective Jim Waters Cabrini College Dept of IS&T


A collaborative venture (2003 – 2005)

Drexel iSchool: Dr Susan Wiedenbeck, Jim Waters

Rutgers, Camden, Computer Science: Jean-Camille Birget

Polytechnic of Brooklyn, Computer Science: Nasir Memon, Alex Brodskiy (System Developer)

The Password Problem
Conflicting requirements:
- Easy to remember
- Hard to guess
- Hard to crack

The Solution
- Cued recall, not pure recall
- Use some aspects of an image for the password

Many prior attempts
- Choose from a set of faces
- Low entropy: young males always chose attractive females

Our Solution
- Choose a meaningful (rich) picture
- Select some “points” in the picture as the password
- The point recorded by the system (X,Y) is the center of a square
- The square shows the system tolerance: it is 20 x 20 pixels, excluding the line
- Order is part of the password

[Figure: a selected point (X,Y) at the center of its tolerance square]
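The Our Solution slide implies a simple acceptance rule: a login succeeds only if each click lands inside the tolerance square of the matching stored point, in the stored order. The sketch below is not the study's implementation; it assumes the stored points are kept as raw (x, y) pixel coordinates and that a click on the square's edge counts as inside. The names `in_square` and `verify` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]

# Tolerance from the slides: a 20 x 20 pixel square centred on each stored point.
TOLERANCE = 20


def in_square(click: Point, centre: Point, size: int = TOLERANCE) -> bool:
    """True if the click lies within the size x size square centred on centre (edges count as inside)."""
    half = size / 2
    return abs(click[0] - centre[0]) <= half and abs(click[1] - centre[1]) <= half


def verify(stored: Sequence[Point], attempt: Sequence[Point], size: int = TOLERANCE) -> bool:
    """Accept only if every click hits the square of the stored point at the same position."""
    if len(stored) != len(attempt):
        return False  # wrong number of clicks
    return all(in_square(a, s, size) for s, a in zip(stored, attempt))


stored = [(50, 60), (120, 40), (200, 150), (300, 90), (80, 220)]
good = [(55, 58), (118, 47), (203, 141), (300, 95), (86, 215)]
swapped = [good[1], good[0]] + good[2:]
print(verify(stored, good))     # True
print(verify(stored, swapped))  # False: right points, wrong order
```

Because `verify` compares position by position, the second call fails even though every click is near some stored point; that is what "order is part of the password" means in practice.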

Was it any good?
- How many unique passwords could be created from a given X by Y canvas?
- Some very clever calculations by the CS folks
- LOTS! 7.2 × 10^12 with our small picture
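The slides do not show the CS team's calculation, so here is a back-of-envelope sketch under simplifying assumptions of my own: the canvas is tiled by non-overlapping tolerance squares, each of the ordered clicks may fall in any square, and squares may repeat. The 400 x 300 canvas and the 62-symbol alphabet used for comparison are hypothetical values, not the study's picture or character set.

```python
import math


def password_space(width: int, height: int, tolerance: int, clicks: int) -> int:
    """Count ordered click-point passwords on a width x height canvas.

    Assumes non-overlapping tolerance x tolerance squares, any square for any
    click, and repeats allowed. This is not the project's own derivation.
    """
    cells = (width // tolerance) * (height // tolerance)
    return cells ** clicks


def bits(space: int) -> float:
    """Entropy in bits if every password were equally likely."""
    return math.log2(space)


space = password_space(400, 300, 20, 5)  # hypothetical 400 x 300 canvas
print(space)                   # 2430000000000 (300 squares ** 5 clicks)
print(round(bits(space), 1))   # 41.1
print(62 ** 8)                 # 8 characters, assumed 62-symbol alphabet: about 2.2e14
```

The count grows with the number of squares raised to the number of clicks, so adding a sixth click or shrinking the tolerance square raises the space far more than enlarging the picture slightly.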

The iSchool (Initial) Focus
- Was the system effective from a user point of view?
- How did it compare against traditional alphanumeric passwords?

Objective performance data
- Time
- Errors
- Failure rate
- Undo, Clear and Show

The passwords themselves

User perceptions (Likert scales)

Parametric and nonparametric statistics

Time
- needed to create the valid password (creation phase)
- spent creating each bad password (invalid attempt)
- needed to input the password correctly 10 times (practice)
- spent on each and every practice attempt (good or bad)
- needed to input the password correctly once after a delay
- spent on each and every retention attempt (good or bad)

Errors
Number of errors (invalid password entry attempts)
- made when creating a valid password
- made when inputting the password correctly 10 times (practice)
- made when inputting the password correctly once after a delay
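The Time and Errors measures above can all be derived from a per-attempt log. A minimal sketch, assuming each attempt was stored with its phase, its duration and whether it was accepted; the `Attempt` record, the phase labels and `summarise` are hypothetical, not the study's database schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Attempt:
    phase: str      # hypothetical labels: "creation", "practice", "retention"
    seconds: float  # time spent on this single attempt
    valid: bool     # True if the entry was accepted


def summarise(log: List[Attempt], phase: str) -> Dict[str, object]:
    """Errors and timing for one phase, mirroring the Time and Errors measures."""
    rows = [a for a in log if a.phase == phase]
    bad = [a.seconds for a in rows if not a.valid]
    return {
        "errors": len(bad),                              # invalid entry attempts
        "total_seconds": sum(a.seconds for a in rows),   # time for the whole phase
        "seconds_per_bad_attempt": bad,                  # time on each invalid attempt
    }


log = [Attempt("creation", 30.0, False), Attempt("creation", 25.5, True),
       Attempt("practice", 6.0, True)]
print(summarise(log, "creation"))
```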

Magnitude of errors per click: the distance between the required click and the actual click
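The slides say only how many pixels each click was from the correct point; the sketch below assumes straight-line (Euclidean) distance, which may differ from how the study measured it. `click_error` and `error_magnitudes` are hypothetical names, and `math.dist` needs Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def click_error(required: Point, actual: Point) -> float:
    """Straight-line distance in pixels from the required point to the actual click."""
    return math.dist(required, actual)


def error_magnitudes(required: Sequence[Point], actual: Sequence[Point]) -> List[float]:
    """Per-click error for one attempt, pairing clicks with stored points in order."""
    return [click_error(r, a) for r, a in zip(required, actual)]


print(error_magnitudes([(100, 100), (200, 50)], [(103, 104), (200, 62)]))  # [5.0, 12.0]
```

A per-axis measure (separate x and y offsets) would match the square tolerance region more directly; the Euclidean version is simply easier to summarise as one number per click.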

Failure Rate
- 4 invalid attempts at password input (after a delay) = FAIL!
- But we kept recording attempts (some subjects had over 30), even after they had repeatedly viewed their passwords
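The failure rule separates "failed" from "stopped": a retention trial counts as a failure after 4 invalid attempts, yet every later attempt is still recorded. A minimal sketch, assuming the failure is judged on the invalid attempts that come before the first valid one; `retention_outcome` is a hypothetical name.

```python
from typing import Dict, Sequence

MAX_INVALID = 4  # from the slides: 4 invalid attempts after a delay = FAIL


def retention_outcome(attempts: Sequence[bool]) -> Dict[str, object]:
    """Classify one retention trial from its attempts in order (True = valid entry).

    The trial fails once MAX_INVALID invalid attempts precede the first valid
    one, but every attempt is still counted.
    """
    invalid_before_success = 0
    for ok in attempts:
        if ok:
            break
        invalid_before_success += 1
    return {
        "failed": invalid_before_success >= MAX_INVALID,
        "invalid_before_success": invalid_before_success,
        "total_attempts": len(attempts),
    }


print(retention_outcome([False, False, False, False, True]))
```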

Also recorded
- How many times the subject clicked Undo on a password point
- How many times the subject cleared the password
- How many times the subject viewed the password

The Basic Experiment
- Graphical group vs. alphanumeric group
- 20 subjects in each group, randomly assigned to alphanumeric or graphical
- Eight-character alphanumeric password
- 5 pass points

- $10 per subject
- Some subjects waived the fee
- One subject received a “Heidegger was a Nazi” T-shirt instead
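The slides say only that subjects were "randomly assigned". One way to get random assignment and still end up with exactly 20 per group is to shuffle the subject list and deal it out in turn. This is an assumed procedure; `assign_groups`, the seed and the subject ID format are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def assign_groups(subjects: Sequence[str],
                  groups: Sequence[str] = ("alphanumeric", "graphical"),
                  seed: Optional[int] = None) -> Dict[str, str]:
    """Shuffle the subjects, then deal them into groups in turn so group sizes stay equal."""
    rng = random.Random(seed)
    order = list(subjects)
    rng.shuffle(order)
    return {s: groups[i % len(groups)] for i, s in enumerate(order)}


allocation = assign_groups([f"S{n:02d}" for n in range(1, 41)], seed=2003)
print(sum(g == "graphical" for g in allocation.values()))  # 20
```

Dealing after the shuffle avoids the unequal groups that independent coin flips can produce with only 40 subjects.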

The Basic Experiment: What to record?
Creation and practice
- Time, errors, failures, Undos, Clears and Shows

Decay of passwords over time (retention phase)
- Time, errors, failures, Undos, Clears and Shows
- Short term: after a distraction task (questionnaire Q1)
- Medium term: after 1 week
- Long term: after a further 5 weeks

User Perceptions
- Embedded online questionnaire (Q1 and Q2)
- 33 Likert scale questions
- Five questions had the negative end of the scale on the left (recoded before analysis)
- Q2 after the last session = Q1 plus some open questions
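Recoding the five reversed questions flips them onto the same direction as the rest. On the 7-point scale used in the results (1 = strongly agree, 7 = strongly disagree), a reversed score x becomes 8 − x. A minimal sketch; the item names such as `q7` are hypothetical, and which five items were reversed is not given here.

```python
from typing import Dict, Iterable

SCALE_MAX = 7  # 1 = strongly agree ... 7 = strongly disagree


def recode(responses: Dict[str, int], reversed_items: Iterable[str]) -> Dict[str, int]:
    """Reverse-score the flagged items so a low score always points the same way."""
    flip = set(reversed_items)
    out = {}
    for item, score in responses.items():
        if not 1 <= score <= SCALE_MAX:
            raise ValueError(f"{item}: {score} is outside 1..{SCALE_MAX}")
        out[item] = SCALE_MAX + 1 - score if item in flip else score
    return out


print(recode({"q1": 2, "q7": 2}, reversed_items=["q7"]))  # {'q1': 2, 'q7': 6}
```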

Subjects
- Experienced computer users
- 40 members of a North East American university community
- Students, staff and faculty
- Convenience sample

The Experiment
Demonstration Phase
- Verbal and visual explanation of the purpose of the system and the experiment protocol (magic lantern show)
- Invitation to participate and earn $10
- Subjects completed IRB-approved (human subjects) consent forms

The Experiment
Creation Phase
- Subjects created a password using the randomly assigned system
Practice Phase
- Subjects practiced entering the password correctly
- Practice until the password was entered correctly 10 times (in total)
- On-screen count of correct and incorrect entries
- No limit to the number of attempts
- After 4 failures, subjects could view the password
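The practice rules combine a stopping condition (10 correct entries in total) with a help condition (Show becomes available after 4 failures). A minimal sketch that replays one subject's practice log; it assumes failures are counted cumulatively rather than consecutively, which the slides do not specify. `practice_summary` is a hypothetical name.

```python
from typing import Dict, Optional, Sequence

TARGET_CORRECT = 10  # practise until 10 correct entries in total
SHOW_AFTER = 4       # password may be viewed after 4 failures


def practice_summary(attempts: Sequence[bool]) -> Dict[str, Optional[int]]:
    """Replay a practice log (True = correct entry) and report when the phase ended.

    Assumes failures are counted cumulatively, not consecutively, for unlocking Show.
    """
    correct = failures = 0
    show_unlocked_at: Optional[int] = None
    for n, ok in enumerate(attempts, start=1):
        if ok:
            correct += 1
        else:
            failures += 1
            if failures == SHOW_AFTER:
                show_unlocked_at = n
        if correct == TARGET_CORRECT:
            return {"finished_at": n, "failures": failures, "show_unlocked_at": show_unlocked_at}
    # Log ended before the 10th correct entry.
    return {"finished_at": None, "failures": failures, "show_unlocked_at": show_unlocked_at}


print(practice_summary([True] * 3 + [False] * 4 + [True] * 7))
# {'finished_at': 14, 'failures': 4, 'show_unlocked_at': 7}
```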

The Experiment
Distraction Phase (after completing practice)
- Subjects filled in online questionnaire Q1
Retention Phase
- Enter the password correctly once (fail after 4 errors!)
- R1 – immediately after the distraction phase
- R2 – one week later
- R3 – six weeks after R1, plus complete Q2

Results: Creation

Measure                          Mode           Mean (SD)
Total attempts to create         Alphanumeric   1.70 (0.18)
                                 Graphical      1.10 (0.07)
Total time to create (seconds)   Alphanumeric   81.10 (36.50)
                                 Graphical      64.03 (21.93)

The graphical group took significantly fewer attempts to create a valid password: t(38) = 3.13, p < .005.

T-tests

Used SPSS version 10.0

For the first question there was a significant difference (U = 127.00, p < .043).

Question                                             Mode           Mean (SD)
I did not have much trouble thinking up a password   Alphanumeric   3.30 (1.59)
                                                     Graphical      2.35 (1.57)
It did not take me long to think up a password       Alphanumeric   3.15 (1.63)
                                                     Graphical      2.60 (1.42)

Likert scale questions: 1 = strongly agree, 7 = strongly disagree
Nonparametric Mann-Whitney U test
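The Mann-Whitney U statistic counts, over every cross-group pair of respondents, how often one group's rating exceeds the other's, with ties counted as half. With 20 subjects per group there are 20 × 20 = 400 pairs, so the reported U = 127.00 sits on a 0 to 400 scale. The sketch computes U only; for a p-value a statistics package (for example `scipy.stats.mannwhitneyu`) would be used. `mann_whitney_u` is a hypothetical name.

```python
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller of the two Mann-Whitney U statistics; a tie counts as half a pair."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


print(mann_whitney_u([1, 2], [2, 3]))  # 0.5
```

The pairwise count is O(n·m), which is fine at these group sizes; rank-sum formulas give the same U more efficiently for large samples.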

Learning Phase

Measure                           Mode           Mean (SD)
Number of incorrect submissions   Alphanumeric   0.40 (0.68)
                                  Graphical      4.80 (7.16)
Total practice time (seconds)     Alphanumeric   66.08 (4.92)
                                  Graphical      171.89 (24.46)

The two measures were analyzed using t-tests. There were significant differences in favor of the alphanumeric group in both cases:

Number of incorrect inputs: t(38) = -2.73, p < .013; total practice time: t(38) = -4.24, p < .0001.
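An independent-samples t statistic can be recomputed from a table of means and SDs. The sketch uses the pooled-variance form (with equal group sizes the pooled and unpooled standard errors coincide). Treating the values in parentheses as SDs for 20 subjects per group, the incorrect-submission figures give t(38) ≈ -2.74, close to the reported -2.73; the sign only reflects which group is subtracted. `pooled_t` is a hypothetical name.

```python
import math
from typing import Tuple


def pooled_t(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> Tuple[float, int]:
    """Independent-samples t (pooled variance) from group means, SDs and sizes."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


t, df = pooled_t(0.40, 0.68, 20, 4.80, 7.16, 20)  # incorrect submissions, practice
print(f"t({df}) = {t:.2f}")  # t(38) = -2.74
```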

[Figure: Variability in practice phase. Number of participants by number of incorrect password submissions (0, 1, 2, 3, 6, 9, 17, 18, 20), alphanumeric vs. graphical]

Retention Phase

Measure                                 Mode           Mean R1 (SD)   Mean R2 (SD)    Mean R3 (SD)
Number of incorrect submissions         Alphanumeric   0.25 (0.79)    2.20 (2.73)     1.75 (2.47)
                                        Graphical      1.55 (1.57)    2.75 (3.88)     1.50 (2.80)
Time for correct submission (seconds)   Alphanumeric   5.23 (1.66)    9.42 (3.70)     9.24 (3.72)
                                        Graphical      8.78 (4.40)    24.25 (15.21)   19.38 (17.57)

R1 – immediately after the distraction phase
R2 – one week later
R3 – six weeks after R1
Effect of mode not significant (ANOVA)

Concept proved – what next?
What impacts performance? Can we change the system design to alter performance?
- Change the picture (Expt 2)
- Change the tolerance around the selected point (Expt 3)
- Interference (Expts 4 thru 6): nobody has just one password. Will 2 passwords interfere with each other?

Experiment 2: More Subjects
- Recruited 5 entire MS and BS classes from the iSchool
- 3 different pictures
- Re-used data from the graphical subjects of Experiment 1 as a baseline

Worth a 1000 words?

[Pictures: Mural, Tea, Map, Pool (baseline)]

Experiment 2
- Total new subjects = 71 at $10 a pop!
- Randomly assigned Mural, Tea and Map pictures
- Tolerance as per Experiment 1 (20 x 20 pixel square)
- Similar routine: Demonstration, Creation, Practice, Distraction (questionnaire Q1), R1 at the end of the session, R2 one week later plus questionnaire Q2
- No later retention tests

Results: Means (SD) in R1, the first retention trial

Image                                  Pool           Mural         Tea           Map
Number of incorrect submissions        0.55 (1.57)    0.17 (0.51)   0.14 (0.47)   0.39 (1.31)
Time for incorrect submissions (sec)   7.91 (28.20)   1.18 (3.37)   1.26 (4.40)   3.08 (9.25)
Time for correct submissions (sec)     8.61 (3.27)    7.52 (3.62)   7.43 (2.32)   7.86 (2.03)

No significant differences at all!

Retention 1 week later

A two-way mixed-model ANOVA was used for the analyses, with image as the between-subjects factor and retention trial (R1/R2) as the within-subjects factor.

There was a marginal effect of image, F(3,79) = 2.55, p < .062. Tukey’s HSD indicated that performance of the Map group was lower than that of the Tea group.

Experiment 3: Tolerance
Conditions
- Base group (20 x 20 pixels): group from Experiment 1
- Harder (14 x 14 pixels)
- Hardest (10 x 10 pixels)

32 undergraduate iSchool students (another $320)
- Demonstration Phase
- Creation Phase
- Practice Phase
- Distraction Phase: questionnaire Q1
- Retention Phase: R1 after the distraction questionnaire Q1, R2 one week later

[Figure: click points numbered 1 to 5, repeated in three panels]

Experiment 3: Tolerance
One-way ANOVA
- There were no significant differences between groups in the number of attempts or the time to create a valid password
- No significant differences in the number of attempts or the time required during the practice phase (10 passwords input correctly)
- No significant differences between groups during the 1st retention phase (after the questionnaire)
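A one-way between-subjects ANOVA compares the spread between group means with the spread within groups. If the Experiment 1 base group contributed its 20 graphical subjects and the two new tolerance groups had 16 each, the test would have (2, 49) degrees of freedom; those group sizes are my reading of the slides, not a stated result. The sketch returns F only; `scipy.stats.f_oneway` would also give the p-value. `one_way_anova` is a hypothetical name.

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


print(one_way_anova([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # (3.0, 2, 6)
```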

Decay!
The smallest tolerance group (10 x 10) made significantly more errors during the 2nd retention phase (one week later).

In the 10 x 10 group, 7 of 16 participants (43.75 percent) failed to log in.

In the 14 x 14 group, only 2 of 16 failed (12.5 percent).

There was a significant difference in failure between the groups: t(30) = 2.63, p < .015.

Tolerance Experiment
The smallest tolerance group rated the system significantly worse on three perceptual measures (nonparametric Mann-Whitney U test):
- It did not take me long to input my password correctly 10 times.
- Inputting my password was easy.
- I think that the password system was pleasant to use.

Interference Experiments
- Will having to create, practice and remember 2 different passwords be more difficult than just one?
- Is it harder to use two different pictures or the same picture?
- What about 2 alphanumeric passwords?

Experiment 4 (2 passwords, 1 picture) ($420)
Experiment 5 (2 passwords, 2 pictures) ($450)
Experiment 6 (2 alphanumeric passwords) ($450)

Protocol the same for each
- Demonstration and consent form completion
- Create 1st password for HOME system
- Practice entering HOME password 10 times
- Create 2nd password for OFFICE system
- Practice entering OFFICE password 10 times
- Distraction task (questionnaire Q1)
Retention Phase
- R1: immediately after the questionnaire; enter each password (in random order) correctly once
- R2: one week later; enter each password (in random order) correctly once
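The retention phase presents HOME and OFFICE in random order. A minimal sketch of one way to generate that order, assuming an independent shuffle per subject and per session; the slides do not say how the order was drawn, and a counterbalanced scheme (half the subjects HOME first) would be an equally valid choice. `retention_schedule` and the subject IDs are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def retention_schedule(subjects: Sequence[str],
                       sessions: Sequence[str] = ("R1", "R2"),
                       passwords: Sequence[str] = ("HOME", "OFFICE"),
                       seed: Optional[int] = None) -> Dict[Tuple[str, str], List[str]]:
    """Draw an independent random password order for every subject and session."""
    rng = random.Random(seed)
    schedule = {}
    for subject in subjects:
        for session in sessions:
            order = list(passwords)
            rng.shuffle(order)
            schedule[(subject, session)] = order
    return schedule


print(retention_schedule(["S01", "S02"], seed=2005))
```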

So…?
- All groups got better with practice
- The graphical group benefitted more from practice: it sliced 57 seconds off the average practice time for the 2nd password
- No other significant differences
- No effect of 2 graphical passwords vs. 1 graphical password: password 2 retention took the same time and # errors as password 1
- No effect of 2 pictures vs. 1 picture

Issues
Juggernaut approach: recorded all conceivable data for every click for every subject for every trial
- X,Y location for every single password attempt click
- How many pixels away from the correct point each click was
- Up to 50 slots for practice trials for each password
- Up to 20 slots for retention tests for each password
- Online questionnaire stored in database

Issues
- Exceeded capability of SPSS v10.0 (Student)
- Over 450 variables and over 270 subjects
- SPSS output alone in excess of 1000 pages
- Much data of limited value (Q1 vs. Q2)
- Hard to extract meaningful findings from a morass of results
- Many findings may be significant by chance
- Ran out of time and money
- Unanswered questions
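"Many findings may be significant by chance" is the multiple-comparisons problem: at α = .05, roughly 1 in 20 tests of a true null comes out significant anyway, and with hundreds of variables that adds up quickly. The slides do not report any correction; Holm's step-down procedure, sketched below, is one standard way to control the family-wise error rate. `holm` is a hypothetical name.

```python
from typing import List, Sequence


def holm(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure: True where the null hypothesis is rejected.

    The smallest p-value is compared with alpha/m, the next with alpha/(m-1),
    and so on; testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


print(holm([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Holm is uniformly at least as powerful as a plain Bonferroni cut-off (α/m for every test), so it is a low-cost default when many outcomes are tested.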

Unanswered Questions
Analysis of the nature of errors
- Order errors and memory failure errors
- Needed to analyze each attempt one by one, manually, using the recorded X,Y coordinates
- Proximal points confused this
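The manual analysis could be partly automated from the recorded X,Y coordinates. A minimal sketch, assuming a click counts as an order error when it lands in the square of a different stored point and as a memory error when it lands in none; the labels and `classify_clicks` are hypothetical. The check on the click's own point comes first, which is exactly where proximal points cause trouble: when two tolerance squares overlap, a click can match both and the label depends on that precedence.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def inside(click: Point, centre: Point, size: int = 20) -> bool:
    """True if the click falls in the size x size square centred on centre."""
    half = size / 2
    return abs(click[0] - centre[0]) <= half and abs(click[1] - centre[1]) <= half


def classify_clicks(stored: Sequence[Point], attempt: Sequence[Point],
                    size: int = 20) -> List[str]:
    """Label each click 'correct', 'order' (hits another stored point) or 'memory' (hits none)."""
    labels = []
    for i, click in enumerate(attempt):
        if i < len(stored) and inside(click, stored[i], size):
            labels.append("correct")
        elif any(inside(click, p, size) for j, p in enumerate(stored) if j != i):
            labels.append("order")
        else:
            labels.append("memory")
    return labels


print(classify_clicks([(50, 50), (150, 50), (250, 50)],
                      [(150, 52), (48, 49), (400, 300)]))  # ['order', 'order', 'memory']
```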

What makes a good password picture?
- Memory strategies (geometry and semantics)

[Figure: example pictures labelled Low Entropy (Hotspots), Better and Best]
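Hotspots are regions that many users click, which lowers the effective entropy of a picture below the uniform maximum of log2(number of squares) per click. One rough way to compare pictures is the Shannon entropy of everyone's clicks binned into grid cells. This is an assumed analysis, not one reported here; the 20-pixel cell size and `click_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def click_entropy(clicks: Iterable[Tuple[int, int]], cell: int = 20) -> float:
    """Shannon entropy (bits) of click locations binned into cell x cell squares."""
    counts = Counter((x // cell, y // cell) for x, y in clicks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(click_entropy([(0, 0), (30, 0), (0, 30), (30, 30)]))  # 2.0
```

A picture where all users' clicks spread evenly scores close to the maximum; a "Low Entropy (Hotspots)" picture scores well below it because the same few cells dominate.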