Turing’s Imitation Game: Role of Error-making in Intelligent Thought
Turing100:
Huma Shah (TCAC), Kevin Warwick (TCAC), Ian Bland, Chris Chapman & Marc Allen
Turing in Context II: 10-12 October, 2012
Overview
• Introduction to Turing’s two scenarios for his Imitation Game / Turing test
• University of Reading’s practical imitation game events 2008 & 2012
• Results from 120 (of 276) tests in which 3 error types were observed
• Sample conversation – audience participation
• Why intelligent judges make errors in practical Turing tests
Before we start
• Turing the man:
– If Turing, the genius, were alive today (and he’d be 100!) he’d be helping GCHQ fight cybercrime; we need more Turings (Iain Lobban, Head of GCHQ: Oct 2012)
– Lessons learnt (Cambridge historian, Prof Christopher Andrew: Sept 2012)
– Posthumous apology for mistreatment from a British PM (Gordon Brown: Sept 2009)
Turing’s Imitation Game
• Plurality of views on:
– What is it?
– How useful/damaging it is for AI
• All points of view tolerated
• We focus on results from actual imitation games involving over 50 judges, 40 hidden humans and 5 machines
Turing’s Imitation Game: Scenario 1
Simultaneous test: the judge interrogates two hidden entities in parallel (CMI, Mind, 1950)
Turing’s Imitation Game: Scenario 2
Viva voce test (direct questioning): 1952 & 1950, §6.4 The Argument from Consciousness, p. 446. Note: Turing’s revised prediction: “at least 100 years” (1952)
What Turing felt
• Idea of intelligence emotional rather than mathematical (1948)
• Learning language one of the most accomplished human feats (1948)
• Question/answer method “suitable for introducing almost any one of the fields of human endeavour” (1950: p. 435)
• Five minutes (1950: p. 442): a first impression (Willis & Todorov, 2005) and a thin slice of behaviour (Albrechtsen et al., 2009) are a sufficient duration for the game
• Thinking examined through satisfactory and sustained responses to any questions (1950: p.447)
Practical Turing tests reported here
• Scenario 1: simultaneous tests
• 120 machine-human Turing tests:
– 60 conducted at Reading in 2008
– 60 conducted at Bletchley Park in 2012
• Interrogators/judges made errors in identifying one or both hidden chat partners in a simultaneous test
Participants 2012: 23 June Bletchley Park
Judges
Hidden humans
Human participants
• Human participants (interrogators and hidden humans) recruited via calls in 2012, including social media:
– Local schools
– Local and national newspapers
– UoR internal call
– STEMNET
– Twitter
– Facebook
– Newsgroups
• Human interrogators and hidden humans included members of the public, computer scientists, philosophers and journalists; males/females; adults/teenagers; and native/non-native UK English speakers.
Machines
• 2008: five machines selected and invited on the basis of their performance in one-to-one online testing prior to the main event
• 2012: five machines invited (four of them from the 2008 tests):
– Cleverbot, Jfred (not in 2008), Elbot, Eugene Goostman, Ultra Hal
• Each machine had been successful in previous Turing test contests
What the Interrogators were told
• The task of each interrogator was to uncover all the machines and hidden humans.
• At the end of each 5-minute test, ‘score’ the hidden chatters:
– If machine, score its conversational ability 0-100
– If human, identify male/female; age range (child/teenager/adult); FLE (first-language English) or non-FLE
– An ‘unsure’ score was allowed
• Sample score sheet ….
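The scoring rules above can be sketched as a small record type. This is a hypothetical illustration only; the field names and labels are assumptions, not the wording of the organisers’ actual score sheet:

```python
# Hypothetical sketch of one judge's score-sheet entry for a single
# 5-minute simultaneous test (field names are assumptions).

from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreSheetEntry:
    entity_label: str                          # which hidden terminal, e.g. "left"
    verdict: str                               # "machine", "human", or "unsure"
    conversation_score: Optional[int] = None   # 0-100, only if judged a machine
    sex: Optional[str] = None                  # only if judged human
    age_range: Optional[str] = None            # "child", "teenager", or "adult"
    fle: Optional[bool] = None                 # first-language English speaker?


# A judge who believes the left terminal hides a machine scores it 0-100:
entry = ScoreSheetEntry("left", "machine", conversation_score=65)
```

A ‘human’ verdict would instead fill the sex, age-range and FLE fields, matching the two branches of the scoring instructions.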
What the Hidden-humans were told
• Please remember that it is the machines in the contest competing to show they are the humans; please do not make it easier for the machines by answering in robotic fashion. If you are not sure what this means, here is a machine's response to a question in a recent Turing test: – I can't deal with that syntactic variant yet.
• Also, please be aware that the tests will be open to the public for viewing and the transcripts used for research, i.e. anyone will be able to read your answers to the judges’ questions! Please do not reveal your identity (real name, sex, age range).
Error types in Practical Turing Tests
• Double error in the same test: the human interrogator classifies the machine as human (Eliza effect) and the hidden human (foil for the machine) as a machine (confederate effect)
• Type (a) single error – Eliza effect: the human interrogator classifies the machine as human
• Type (b) single error – confederate effect: the human interrogator classifies the hidden human as a machine
• Gender and age blur also feature in the tests
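The taxonomy above can be expressed as a small classification function. This is an illustrative sketch, not the organisers’ scoring code; the verdict labels are assumed names:

```python
# Sketch of the error taxonomy for one simultaneous test: the judge gives
# one verdict for the hidden machine and one for the hidden human.

def classify_test(machine_verdict: str, human_verdict: str) -> str:
    """Classify a test from the judge's two verdicts.

    machine_verdict: judge's call on the hidden machine ("human"/"machine"/"unsure")
    human_verdict:   judge's call on the hidden human   ("human"/"machine"/"unsure")
    """
    eliza = machine_verdict == "human"         # machine mistaken for a human
    confederate = human_verdict == "machine"   # hidden human mistaken for a machine
    if eliza and confederate:
        return "double error"
    if eliza:
        return "Eliza effect"
    if confederate:
        return "confederate effect"
    if "unsure" in (machine_verdict, human_verdict):
        return "unsure"
    return "correct identification"


# Judge ranks the machine as human AND the human as machine:
print(classify_test("human", "machine"))  # → double error
```

The double error is thus simply the conjunction of the two single-error types within one test.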
120 Simultaneous Tests
Error type              Number of judge errors
Double error            9
Eliza effect            7
Confederate effect      5
Unsure errors           2 (1 machine; 1 hidden human)
Total                   23
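As a quick consistency check of the tallies above (the category names here are just labels for the counts reported in the table):

```python
# Tally check for the judge errors reported across the 120 simultaneous tests.
errors = {
    "double error": 9,
    "Eliza effect": 7,
    "confederate effect": 5,
    "unsure": 2,  # 1 machine; 1 hidden human
}
total_errors = sum(errors.values())
assert total_errors == 23
print(f"{total_errors} judge errors in 120 tests ({total_errors / 120:.0%})")
# → 23 judge errors in 120 tests (19%)
```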
Analysing Transcripts
• English language knowledge features in judges’ decisions:
– First language English (FLE) speaking judges misclassified non-FLE hidden humans as machines, and vice versa
• Lack of mutual knowledge
• Subjective opinion on what constitutes satisfactory responses
Why do intelligent humans err?
• Some humans trust too easily and succumb to deception more than others
• Schulz 2010:
– Humans regard being right as their natural state
– Making errors makes us feel deflated and embarrassed
– The capacity to err is crucial to human cognition
– “wrongness is a window into normal human nature” (p. 5)
Problem with misidentification in cyberspace
• The CyberLover chatbot shows that some humans are susceptible to deception: it was developed to steal identities and conduct financial fraud in Internet chat rooms
Practical Uses of Imitation Game
• Encourage more children to take up computer science (a problem area in the UK)
• Gauge progress in machine conversation – chatbots are widely used in e-commerce
• Neurology: thought translation for locked-in patients (Stins & Laureys, 2009)
• Raise awareness of malfeasant programmes designed to conduct a particular cybercrime
References
– A.M. Turing (1948). Intelligent Machinery. In B.J. Copeland (Ed.), The Essential Turing: The Ideas that Gave Birth to the Computer Age. Clarendon Press: Oxford, 2004
– A.M. Turing (1950). Computing Machinery and Intelligence. Mind, 59(236), October, pp 433-460
– A.M. Turing (1952). Can Automatic Calculating Machines be said to Think? In B.J. Copeland (Ed.), The Essential Turing: The Ideas that Gave Birth to the Computer Age. Clarendon Press: Oxford, 2004
– H. Shah (2010). Deception-detection and Machine Intelligence in Practical Turing Tests. PhD thesis, The University of Reading, October 2010
– J.F. Stins and S. Laureys (2009). Thought Translation, Tennis and Turing Tests in the Vegetative State. Phenomenology and the Cognitive Sciences. DOI 10.1007/s11097-009-9124-8
– J.S. Albrechtsen, C.A. Meissner and K.J. Susa (2009). Can intuition improve deception detection performance? Journal of Experimental Social Psychology, 45, pp 1052-1055
– J. Willis and A. Todorov (2005). First Impressions: Making up your mind after 100-ms exposure to a face. Psychological Science, 17(7), pp 592-598
– K. Schulz (2010). Being Wrong. Portobello Books: London