Andrew Sutherland Presentation

Transcript of Andrew Sutherland Presentation

Page 1: Andrew Sutherland Presentation

Voice-enabled web apps with WAMI

By Andrew Sutherland, Founder, Quizlet.com

Page 2: Andrew Sutherland Presentation

Who am I?

• Founder of Quizlet – online flashcards and study tool

  • Founded in 2005 in high school

  • 500,000 registered users

  • 32,000,000 flashcards uploaded

• Sophomore at MIT

  • I should be in Chemistry lecture right now…

Page 3: Andrew Sutherland Presentation

What is WAMI?

• A research project at MIT

• A free web service API

• Plug and play:

  • voice recognition

  • audio recording

Page 4: Andrew Sutherland Presentation

How WAMI works

• Microphone activated with a Java applet

• Audio streams to WAMI servers

• WAMI processes audio in real time

• JavaScript receives structured data of what the person said

Page 5: Andrew Sutherland Presentation

WAMI is a web service

• Plug-and-play JavaScript one-liner

  • You don’t have to maintain audio processing servers

• reCAPTCHA model

  • More apps -> more utterances -> better quality voice recognition for all

Page 6: Andrew Sutherland Presentation

WAMI lets JavaScript do the work

• JavaScript can activate the microphone

  • myWami.startRecording()

• JavaScript receives the text of what you said

• No clunky extra UI necessary – you build your web app how you like.

Page 7: Andrew Sutherland Presentation

WAMI is fast

• WAMI can send results before you finish your sentence:

1. “Put an X…”

2. JavaScript displays an “X”

3. “…on square five”

4. JavaScript moves that “X” to square five.
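
The four steps above can be sketched as a handler that is called once per partial transcript. The slides do not show how WAMI delivers partial results, so this is only an illustration: `applyPartial`, the `SQUARES` table, and the `{ mark, square }` state shape are hypothetical, and the partial text is passed in as a plain string.

```javascript
// Hypothetical sketch: update a board as partial transcripts arrive.
// None of these names are part of WAMI; they are for illustration only.
const SQUARES = {
  one: 1, two: 2, three: 3, four: 4, five: 5,
  six: 6, seven: 7, eight: 8, nine: 9,
};

function applyPartial(state, partialText) {
  const text = partialText.toLowerCase();
  const next = { ...state };

  // As soon as the mark is heard, show it (not yet placed on a square).
  const mark = text.match(/put an? (x|o)\b/);
  if (mark) {
    next.mark = mark[1].toUpperCase();
  }

  // Once the square is heard, move the mark there.
  const square = text.match(/on square (\w+)/);
  if (square && SQUARES[square[1]] !== undefined) {
    next.square = SQUARES[square[1]];
  }
  return next;
}

// Simulate the two partial results from the example sentence.
const afterFirst = applyPartial({ mark: null, square: null }, "Put an X");
const afterSecond = applyPartial(afterFirst, "Put an X on square five");
```

After the first call the state holds the mark with no square yet; after the second call the same mark has been assigned to square 5.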

Page 8: Andrew Sutherland Presentation

WAMI is grammar-based

• Recognition is restricted to a grammar defined by your app

• Grammar is compiled on page load or recompiled at any time

• Very flexible JSGF format
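
Because a grammar is just text, an app can rebuild it whenever its vocabulary changes and hand the new version to WAMI. The helper below is a minimal sketch of that idea; `buildGrammar` is a hypothetical name, and it assumes every word is a single token that is safe to place in a JSGF rule.

```javascript
// Hypothetical helper (not part of WAMI): build a JSGF grammar string
// whose top-level rule accepts any one word from the given list.
function buildGrammar(name, words) {
  const alternatives = words.join(" | ");
  return [
    "#JSGF V1.0;",
    `grammar ${name};`,
    `public <top> = ${alternatives};`,
  ].join("\n");
}

const grammar = buildGrammar("SampleGrammar", ["turtle", "giraffe", "pony"]);
// The resulting string is the kind of value passed to myWami.setGrammar(...).
```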

Page 9: Andrew Sutherland Presentation

What’s a grammar?

#JSGF V1.0;
grammar SampleGrammar;
public <top> = turtle | giraffe | pony;

Page 10: Andrew Sutherland Presentation

What’s a grammar?

#JSGF V1.0;
grammar SampleGrammar;
public <top> = turtle {[id=1]} | giraffe {[id=2]} | pony {[id=3]};

Page 11: Andrew Sutherland Presentation

What’s a grammar?

#JSGF V1.0;
grammar SampleGrammar;
public <top> = i [really] want (a <animal>)+;
<animal> = turtle {[id=1]} | giraffe {[id=2]} | pony {[id=3]};
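
To make the last grammar concrete: `[really]` is optional and `(a <animal>)+` repeats one or more times, so it accepts sentences such as "i want a turtle" or "i really want a giraffe a pony". The sketch below mimics that with a regular expression. It is an illustration of which sentences the grammar covers, not how WAMI recognizes speech or returns tag values; `animalIds` and `ANIMAL_IDS` are hypothetical names.

```javascript
// Illustration only: a regular expression accepting the same word
// sequences as the JSGF rule  i [really] want (a <animal>)+
const ANIMAL_IDS = { turtle: 1, giraffe: 2, pony: 3 };
const SENTENCE = /^i( really)? want( a (turtle|giraffe|pony))+$/;

// Returns the ids of the animals mentioned, or null if the
// utterance falls outside the grammar.
function animalIds(utterance) {
  const text = utterance.toLowerCase().trim();
  if (!SENTENCE.test(text)) {
    return null;
  }
  return text
    .match(/\b(turtle|giraffe|pony)\b/g)
    .map((animal) => ANIMAL_IDS[animal]);
}

const single = animalIds("I want a turtle");
const several = animalIds("I really want a giraffe a pony");
const rejected = animalIds("I want turtle"); // missing "a", so not in the grammar
```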

Page 12: Andrew Sutherland Presentation

Getting started

<script src="http://wami.csail.mit.edu/portal/wami.js?devKey=a1234"></script>

<script>
// Attach WAMI to the page element with id "wamiDiv" and register callbacks.
var myWami = new WamiApp($('wamiDiv'), {
  onRecognitionResult : receiveWAMIguess,
  onReady : startApp
});

myWami.setGrammar("#JSGF V1.0 …");
</script>

Page 13: Andrew Sutherland Presentation

JavaScript data receiver

function receiveWAMIguess(obj) {
  // e.g. "You want a giraffe"
  alert("You want a " + obj.hyps[0].text);
}
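
The receiver above assumes at least one hypothesis is always present. A slightly more defensive variant might look like the sketch below. Only the `obj.hyps[0].text` shape comes from the slide; the `describeGuess` name and the fallback message are hypothetical, and whether WAMI ever sends an empty `hyps` list is an assumption.

```javascript
// Sketch of a defensive receiver. Only obj.hyps[0].text is taken from
// the slide; the function name and fallback text are illustrative.
function describeGuess(obj) {
  const hyps = (obj && obj.hyps) || [];
  if (hyps.length === 0 || !hyps[0].text) {
    return "Sorry, I didn't catch that.";
  }
  // Use the first hypothesis, as the slide does.
  return "You want a " + hyps[0].text;
}

const message = describeGuess({ hyps: [{ text: "giraffe" }] });
const fallback = describeGuess({ hyps: [] });
```

Returning the message instead of calling `alert` directly keeps the function easy to exercise outside a browser; a page could still call `alert(describeGuess(obj))` from its callback.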

Page 14: Andrew Sutherland Presentation

WAMI saves your audio

• Instantly replay user’s audio.

• You can download audio files to your server for long-term storage.

Page 15: Andrew Sutherland Presentation

Real-world application

• Built WAMI into Quizlet.com studying tool.

• Users control vocabulary games by voice.

• Thousands of students using it now

• Over 1 million utterances recorded

Page 16: Andrew Sutherland Presentation

Live DEMO!

Page 17: Andrew Sutherland Presentation

WAMI To Do:

• Complete real-time improvement system.

• Switch from Java to Flash.

Page 18: Andrew Sutherland Presentation

Please complete an evaluation.

Page 19: Andrew Sutherland Presentation

Contact me: [email protected]

More about WAMI: http://wami.csail.mit.edu

Questions?