rospeex: a cloud-based speech communication toolkit for ROS

rospeex: A cloud-based speech communication toolkit for ROS. Komei Sugiura, National Institute of Information and Communications Technology, Japan. 2013/12/13

Page 1: rospeex: a cloud-based speech communication toolkit for ROS

rospeex: A cloud-based speech communication toolkit for ROS

Komei Sugiura
National Institute of Information and Communications Technology, Japan

2013/12/13

Page 2: rospeex: a cloud-based speech communication toolkit for ROS

ROS (Robot Operating System)

• ROS: middleware for robots
  – Version 1.0 released in 2010
  – Global de facto standard
  – From driver and package management to learning and visualization


Page 3: rospeex: a cloud-based speech communication toolkit for ROS

Speech communication toolkit for ROS

• ROS compatible
• Speech recognition using VoiceTra engine
• Other functionalities
  – Noise reduction, non-monologue speech synthesis

                               Conventional packages       rospeex
Speech recognition/synthesis   Sphinx, Festival, Julius    VoiceTra engine
                               (or commercial tools)       (or third-party engines)
Engine                         Stand-alone                 Cloud-based
Language                       Single language             ja, en, zh, ko

Page 4: rospeex: a cloud-based speech communication toolkit for ROS

Position in Cloud Robotics

• Cloud robotics [James Kuffner@Google, 2011]
  – Manipulation using Google Goggles [Kehoe+ 2013]
  – Knowledge sharing based on RoboEarth [Tenorth+ 2012]
  – Speech communication for robots: rospeex

               Robot middleware compatible              Incompatible
Cloud-based    rospeex                                  Commercial systems
                                                        (Nuance, ToSpeak, AmiVoice Cloud, ...)
Stand-alone    OpenHRI, HARK, PocketSphinx, Festival    Many

Page 5: rospeex: a cloud-based speech communication toolkit for ROS

Quadrilingual communication using rospeex


Page 6: rospeex: a cloud-based speech communication toolkit for ROS

rospeex provides speech recognition and synthesis; the user constructs the dialogue processing

[Figure: system architecture]

Provided by rospeex: speech module (noise reduction, VAD, speech recognition, speech synthesis); speech recognition & synthesis servers

Provided by the user: dialogue processing; task manager

Provided by third parties: speech recognition & synthesis servers

Speech input and speech output pass through the speech module. Dialogue processing also receives input from other modules (sensors, recognized objects, etc.) and sends output to other modules (actuators, learning, etc.).
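The division of labor on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use the rospeex API, and `make_dialogue_processor`, `run_turn`, and the `recognize`/`synthesize` callables are hypothetical names standing in for the parts the toolkit and the user would each provide.

```python
from typing import Callable, Dict


def make_dialogue_processor(rules: Dict[str, str], fallback: str) -> Callable[[str], str]:
    """Build the user-provided part: recognized text in, reply text out.

    `rules` maps a lowercase keyword to a reply (a hypothetical, minimal design).
    """
    def process(recognized_text: str) -> str:
        normalized = recognized_text.strip().lower()
        for keyword, reply in rules.items():
            if keyword in normalized:
                return reply
        return fallback

    return process


def run_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],
    process: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one dialogue turn: speech input -> text -> reply text -> speech output.

    `recognize` and `synthesize` stand in for the speech module; here they are
    plain callables supplied by the caller, not real rospeex functions.
    """
    recognized_text = recognize(audio)
    reply_text = process(recognized_text)
    return synthesize(reply_text)


if __name__ == "__main__":
    processor = make_dialogue_processor(
        {"hello": "Hello, nice to meet you."},
        fallback="Sorry, I did not understand.",
    )
    # Fake speech functions so the sketch runs without any speech server.
    speech_out = run_turn(
        b"fake-audio",
        recognize=lambda audio: "Hello robot",
        process=processor,
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(speech_out)
```

Passing recognition and synthesis in as callables keeps the dialogue logic independent of which speech engine sits behind it, which mirrors the separation drawn on the slide.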

Page 7: rospeex: a cloud-based speech communication toolkit for ROS

Non-monologue speech synthesis for robots

• Reading-style robot voice
  – Monotonous, unnatural and unfriendly
  – Hard to tell that the robot is asking a question
• Conventional text-to-speech (TTS) systems are not optimized for communication

[Figure labels: Voice talent (text reading); XIMERA 3]


Page 8: rospeex: a cloud-based speech communication toolkit for ROS

Demo: http://komeisugiura.jp/software/nm_tts.html


Page 9: rospeex: a cloud-based speech communication toolkit for ROS

Using speech recognition/synthesis without ROS

• Send a JSON request to the server
  – Recognition
  – Synthesis
• Sample code (JavaScript, Python, C++) is available

Synthesis: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS

    {
      "method": "speak",
      "params": ["ja", "こんにちは", "*", "audio/x-wav"]
    }

Recognition: http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR

    {
      "method": "recognize",
      "params": [
        "ja",
        {
          "audio": "base64-encoded wav",
          "audioType": "audio/x-wav",
          "voiceType": "*"
        }
      ]
    }
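As a rough illustration of the two requests on this slide, the sketch below builds the same JSON bodies in Python using only the standard library. Several things here are assumptions rather than facts from the slides: the helper names, the parameter name `voice` for the third synthesis parameter (its meaning is not stated), the use of an HTTP POST with a `Content-Type: application/json` header, and the response format, which is returned as raw text because the slide does not describe it. The endpoints are copied from the slide and may no longer be reachable.

```python
import base64
import json
import urllib.request

# Endpoints as printed on the slide; availability is not guaranteed.
SYNTHESIS_URL = "http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS"
RECOGNITION_URL = "http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR"


def build_speak_request(text, language="ja", voice="*", audio_type="audio/x-wav"):
    """Return the synthesis request body from the slide as a dict.

    The name `voice` for the third parameter is an assumption.
    """
    return {"method": "speak", "params": [language, text, voice, audio_type]}


def build_recognize_request(wav_bytes, language="ja", voice_type="*", audio_type="audio/x-wav"):
    """Return the recognition request body with the WAV data base64-encoded."""
    audio = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "method": "recognize",
        "params": [
            language,
            {"audio": audio, "audioType": audio_type, "voiceType": voice_type},
        ],
    }


def post_json(url, payload, timeout=10.0):
    """POST the payload as JSON and return the raw response body as text.

    Assumption: the server accepts a plain HTTP POST with a JSON body.
    """
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the request body; no network call is made here.
    print(json.dumps(build_speak_request("こんにちは"), ensure_ascii=False, indent=2))
```

A call such as `post_json(SYNTHESIS_URL, build_speak_request("こんにちは"))` would send the synthesis request, provided the server is still online and accepts requests in this form.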

Search: "Non-monologue speech synthesis"