simon listens: Non-profit organization for research and training

Page 1 of 48

simon listens

Non-profit organization for research and training

www.simon-listens.org

Page 2 of 48

Development of simon listens

2005: Identifying the problem

2006/07: Conception and basic programming of the open source software simon by Peter Grasch and a group of students at the HTBLA

2007: Foundation of "simon listens"

2008/09: Programming of the first stable prototype financed by the BMfVIT

Page 3 of 48

Research projects

Page 4 of 48

simon – scientific network

HTBLA – Higher Technical School Kaindorf

Graz University of Technology – Signal Processing & Speech Communication Laboratory – Prof. Kubin

Graz University of Technology – Institute for Software Technology (Robocup) – Prof. Wotawa

University of Graz – Austrian German Research Center – Prof. Muhr

Installation via workshops

Individual commission to the simon listens association or to the company Cyber-Byte EDV Services

Page 5 of 48

Simon expertise

Reasons for the fascination of simon listens

• Open source character

• Synthesis of AI technologies

• Definition of concrete use cases and promotion of research themes

• Professional conceptual work

• Interdisciplinarity

• Preference for pedagogical and social solutions

Page 6 of 48

Simon listens products: Simon

• Open source speech recognition system

• Based on

o Julius, HTK

o KDE4 / C++

• “Use case packages” for download

o Vocabulary, grammar, commands, training texts

• Acoustic model: base models

o Static, adapted, user-generated

Page 7 of 48

Simon listens products: Simon

Who benefits from simon?

1. Elderly people with sensorimotor disabilities

2. Physically disabled people of any age

3. People with quadriplegia following an accident

Minimum requirements

• Conscious articulation of words or phoneme constellations

• Conscious determination of numbers from 0 to 9

• No computer literacy necessary

Page 8 of 48

Simon listens products: SSC

SSC: simon sample collector

ssc is a tool for large-scale sample acquisition. Using ssc, multiple teams can gather training data from potential end users or professional speakers and collect it on the central sscd server.

Page 9 of 48

Simon listens products: SAM

SAM: Simon Acoustic Modeller

SAM is a tool to create and test acoustic models. It can compile new speech models, use models created by simon and produce models that can be used by simon later on.

Page 10 of 48

Use case: Basic Autonomy

Speech control of a media center

You can listen to music or the radio and watch slide shows, TV or videos using just a few freely selectable words such as “right”, “left”, “up”, “down”, “ok” and “stop”.

Page 11 of 48

Use case: Basic Autonomy

Speech control of the Firefox browser

Reading the daily newspapers and surfing the internet become easy and uncomplicated. The number plug-in allows you to click links just by entering numbers.
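The idea behind clicking links by number can be illustrated with a minimal sketch. It is not the plug-in's actual code: the `Link` type, the function names and the digit vocabulary are all hypothetical, and the sketch assumes that every visible link gets a consecutive number and that the user speaks the number digit by digit (0 to 9).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    """A clickable link on the page (hypothetical type for this sketch)."""
    label: str
    url: str


# Assumed vocabulary: only the ten digit words need to be recognised.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def number_links(links: List[Link]) -> Dict[int, Link]:
    """Assign consecutive numbers, starting at 1, to the visible links."""
    return {index: link for index, link in enumerate(links, start=1)}


def resolve_spoken_number(words: List[str],
                          numbered: Dict[int, Link]) -> Optional[Link]:
    """Map spoken digit words such as ["one", "two"] to link number 12.

    Returns None if nothing was said, a word is not a digit,
    or no link carries that number.
    """
    if not words or any(word not in DIGIT_WORDS for word in words):
        return None
    number = int("".join(DIGIT_WORDS[word] for word in words))
    return numbered.get(number)
```

With twelve links on a page, saying "one", "two" would select the twelfth link, so a vocabulary of ten digit words is enough to reach any link.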

Page 12 of 48

Use case: Basic Autonomy

Speech control of email clients

You can write to predefined e-mail addresses by selecting them with numbers, and you can ask basic questions using expandable text modules.

Page 13 of 48

Use case: Basic Autonomy

Speech control of Skype

It is easy to establish connections to relatives and friends using Skype or other Voice-Over-IP solutions.

Page 14 of 48

Use case: Basic Autonomy

Desktop Grid

Move the mouse pointer quickly and easily by voice and perform single clicks, double clicks and similar actions.

Calculator

With the voice-controlled calculator you can carry out everyday arithmetic operations and print either the result or the operation.

Keyboard

The voice-controlled keyboard makes it easy to enter code words, TAN numbers, etc.

Page 15 of 48

Research Projects

• FFG – BENEFIT simon – verbal control of ICT applications for elderly people

http://www.youtube.com/watch?v=35tyZntA9j4

Page 16 of 48

Analysis of dysfunctions

• Movement disorders are dominant

• Visual and auditory disorders are of medium prevalence

• Speech ability remains longest

Page 17 of 48

Current Projects

ibi – I’m informed

Development of dialogues: integration of moving and speaking avatars (persons, comic characters)

Page 18 of 48

Current Projects

App112 – Security Connections using Keyword spotting

Page 19 of 48

Planned Projects

• Voice control via dialogues for clinical rooms (beds, TV, light, etc.)

• Voice control via dialogues for home automation

• Smartphone apps for Android, Windows Mobile and iOS

• Specific Austrian speech model for elderly people

• Voice control via dialogues for set-top boxes and television sets

Page 20 of 48

Project ASTROMOBILE

Page 21 of 48

Astro & one user

Page 22 of 48

ASTROMOBILE: simon tasks

To fulfil our task within the ASTROMOBILE project, we had to work on sub-tasks with differing scientific requirements, not only on technical issues:

• Programming

• Development of scenarios

• Development of dialogues

• Speech modelling

• Signal processing

Page 23 of 48

Programming the D-Bus Interface

The current draft identifies seven dedicated components:

• Navigator: Provides high-level navigation including obstacle avoidance and path planning

• Locator: Locates the robot and the person using the sensor network

• Sensors: Integration of Boolean sensors (bed sensor, smoke sensor, etc.)

• Speech Recognition: Command and control system utilizing simon

• Text-To-Speech: Synthesizes a given text in German, Italian and English

• AstroLogic: Logic layer

Page 24 of 48

Scenarios: User - Robot

General offers once the robot has been called and is standing in front of the user:

• Weather information

• News based on RSS feeds, read aloud via speech synthesis

• Multimedia offers like:

– Photos

– Music

– Videos

• Communication offers like Skype calls, phone calls, SMS, mail

• Organization offers: scheduler

• Calculator

• Keyboard

Page 25 of 48

Scenarios: User - Robot

Control functions in the natural environment: the user orders a check, and the robot gives configured feedback by recording a 10-second video and presenting it to the user when it comes back, for example:

• Control of the water in the bathroom

• Control of the doors in the environment

• Control of the cooker

• Control of the gas and other critical functions

Request functions: With the help of the simon touch platform, the user should be able to initiate requests such as:

• Request for new medicine

• Request for food

• Request for acute help

• Request for general help from the caregiver

• Request for transport by the caregiver to the doctor or to other events

• Predefined SMS service using the list plug-in

Page 26 of 48

Scenarios: Robot - User

Reminder functions with a request for help are prepared for situations such as:

• Alarm in the morning

• Reminder of hygiene and dressing in the morning

• Reminder of hygiene and facing in the evening

• Reminder to take the prescribed drugs

• Reminder to drink periodically

• Reminder to eat in the morning

• Reminder to eat at noon

• Reminder to eat in the evening

• Reminder of coffee time

• Reminder of periodic Skype calls

Page 27 of 48

Scenarios: Robot - User

Simple reminder functions without a request for help are prepared for situations such as:

• Reminder of events (based on the calendar)

• Reminder of birthdays

• Reminder of appointments like

– Meetings with friends

– Consultations with doctors

– Visits to events

– Personal appointments in the calendar

Dialogue actions (Skype and mailing; simple yes/no reaction):

• Incoming Skype calls, with the possibility to accept or refuse the call

• Incoming mails, with the possibility to allow or refuse that the robot reads the message

• Incoming appointment requests, with the possibility to allow or refuse the appointment

Page 28 of 48

Scenarios: Caregiver – Robot - User

Control functions:

• Caregivers have access to the information from the sensors in the environment

• Caregivers can manage the dialogues, appointments and reminder functions for the user in the calendar

• Caregivers can activate the robot to transmit a visual impression of the user in case of emergency

Communication functions:

• Caregivers can call the user using the Skype dialogue

• Caregivers can send an appointment to meet with the user using the robot and the calendar

• Caregivers can send information by e-mail using the mail reading dialogue

Page 29 of 48

Simon touch architecture

Like the rest of the developed solution, Simon touch uses C++, Qt4 and the KDE libraries. In particular, we are using the Akonadi PIM service, the Nepomuk / Strigi search, the KLocale framework for localization and the Phonon multimedia system.

Page 30 of 48

Simon listens – Simon touch

Simon touch – voice controlled touchscreen interface

• Main screen

• Information center with

– Slideshow, Music, Video, News with speech output

• Optional functions

– Touchscreen keyboard

– Touchscreen calculator

– Touchscreen calendar

Page 31 of 48

Simon listens – Simon touch

• Communication center with

– Skype, Phone, SMS, Mail

• Control center with video recording and playback

– Control of water, doors, cooker, gas

– User can activate the control function

– Caregiver can activate the control function from outside and take a look using the integrated video stream

Page 32 of 48

Simon listens – Simon touch

• Request center with direct phone calls or mail order

– Shopping system, transport and support calls

Page 33 of 48

The dialogue system of Simon

Dialogues in the Astromobile project

Page 34 of 48

The dialogue system of Simon

The dialogue system of simon was implemented as a command plugin. It can essentially be described as a finite-state machine.

States

Internally, every state consists of the current dialogue text. Every state can have several texts to give the dialogue a natural flow. Dialogue texts can use bound values and templates (see below).

Page 35 of 48

The dialogue system of Simon

Avatar

A state can be linked with an avatar (e.g. the face of a nurse, an icon, etc.)

Page 36 of 48

The dialogue System of simon

Options

Triggering an option (e.g. by a speech command) moves the dialogue from one state to another or executes commands. Options have a trigger, a name and an optional icon, and can be triggered automatically a set time after the state is entered.

Page 37 of 48

The dialogue System of simon

Bound values

Variables in the dialogue system are represented as bound values. For example, the name of the user could be represented as $name$. At run time, the variable is replaced using the list of configured bound values. There are four types of bound values, including:

Static

Binds the variable to a fixed text, e.g. the name of a patient.

QtScript

This variable takes the result of the given QtScript (ECMAScript, also known as “JavaScript”), evaluated at run time.
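How `$name$` placeholders might be filled in at run time can be sketched as follows. This is an illustration under assumptions, not simon's implementation: the `render` function is hypothetical, static bound values are modelled as plain strings, and a Python callable stands in for a script-type bound value that is evaluated each time the text is rendered.

```python
import re

# A placeholder is a name enclosed in dollar signs, e.g. $name$.
PLACEHOLDER = re.compile(r"\$(\w+)\$")


def render(text, bound_values):
    """Replace every $key$ in text using the configured bound values.

    bound_values maps a key either to a fixed value (static) or to a
    callable that is evaluated at render time (stand-in for a script).
    Unknown placeholders are left untouched.
    """
    def substitute(match):
        key = match.group(1)
        if key not in bound_values:
            return match.group(0)
        value = bound_values[key]
        return str(value() if callable(value) else value)

    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    values = {"name": "Maria", "hour": lambda: 8}
    print(render("Good morning, $name$. It is $hour$ o'clock.", values))
```

Evaluating the callable on every call is what distinguishes the script-like value from the static one: the same dialogue text can produce a different result each time it is shown.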

Output options

A dialogue can be shown graphically on the screen or output through the loudspeaker by the integrated speech synthesis system (TTS).

Page 38 of 48

The dialogue System of simon

Implementation in simon

The dialogue states can be derived from the schematic diagram above, which results in:

• Three states (reminder, already taken, not taken)

• One time-driven trigger (which starts the dialogue at a specific time)

• Two speech transitions (“Yes”, “No”)

• One time-driven transition (renewed reminder after a time lapse)
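The structure listed above (three states, one time-driven trigger, two speech transitions and one time-driven transition) can be sketched as a small state machine. Everything concrete here is an assumption made for illustration: the class names, the dialogue texts, the choice of a medication reminder as the subject, and the decision to attach the renewed reminder to the "not taken" state. Time is passed in as plain seconds so that the sketch stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class State:
    name: str
    text: str
    # Speech transitions: spoken word -> name of the next state.
    speech_options: Dict[str, str] = field(default_factory=dict)
    # Optional time-driven transition out of this state.
    timeout_seconds: Optional[float] = None
    timeout_target: Optional[str] = None


class Dialogue:
    def __init__(self, states, initial, start_time):
        self.states = {state.name: state for state in states}
        self.initial = initial
        self.start_time = start_time  # time-driven trigger
        self.current = None           # None until the trigger fires
        self.entered_at = None

    def _enter(self, name, now):
        self.current = self.states[name]
        self.entered_at = now
        return self.current.text

    def tick(self, now):
        """Advance the clock; return a text if a time-driven event fired."""
        if self.current is None:
            return self._enter(self.initial, now) if now >= self.start_time else None
        state = self.current
        if state.timeout_seconds is not None and now - self.entered_at >= state.timeout_seconds:
            return self._enter(state.timeout_target, now)
        return None

    def hear(self, word, now):
        """Handle a recognised word; return the new state's text, if any."""
        if self.current is None:
            return None
        target = self.current.speech_options.get(word)
        return self._enter(target, now) if target else None


def build_reminder_dialogue(start_time, retry_after=600):
    """Hypothetical reminder dialogue with invented texts."""
    return Dialogue(
        states=[
            State("reminder", "Have you taken your medication?",
                  {"Yes": "taken", "No": "not_taken"}),
            State("taken", "Thank you."),
            State("not_taken", "Please take it now. I will ask again later.",
                  timeout_seconds=retry_after, timeout_target="reminder"),
        ],
        initial="reminder",
        start_time=start_time,
    )
```

Calling `tick` periodically and `hear` on each recognised word is enough to drive the dialogue; the returned text would then be shown on screen or passed to speech synthesis.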

Page 39 of 48

The dialogue System of simon

Page 40 of 48

The dialogue System of simon

Page 41 of 48

The dialogue System of simon

Schedule based appointments or dialogue actions

Page 42 of 48

Simon listens – simon speech modelling

Speech models in English, German and Italian

• English: Adaptation of the open source speech model of Voxforge

• German: Adaptation of a self-produced speech model of elderly people

• Italian:

– Recording speech data of 46 persons from the region of Pontedera

– Modelling of a specific Italian speech model for elderly people

Page 43 of 48

Simon listens – signal processing

Current solution

Calling Astro with a Nokia N9 MeeGo mobile phone from anywhere in the natural environment (a MeeGo client was developed within this project)

Controlling Astro with a gooseneck microphone mounted on the front of the robot

Page 44 of 48

Simon listens – signal processing

Page 45 of 48

Simon listens – signal processing

Natural speech communication from a distance between humans and machines

A very good solution would need:

• A combination of speech recognition, tools of artificial intelligence and speech synthesis

• A combination of sound direction identification, user localization, voice identification, face detection, etc.

• Very good sound segmentation

• Different zones of communication, such as call, communication and comfort zones, with special speech models and intelligent microphone management

Developing this solution would require a very large multidisciplinary project.

Page 46 of 48

Astro Video Presentation

Page 47 of 48

Astro & two users

Page 48 of 48

Synthesis of AI-Approaches

Multisensory natural speech communication between humans and robots in Ambient Assisted Living scenarios would need a high level of interdisciplinarity and a synthesis of AI technologies:

• Indoor localization

• Behaviour analysis

• Microphone installation with logic of call/security, communication and comfort zones

• Specific sound segmentation

• Identification of the direction of the voice

• Voice activity detection

• Voice identification for user identification

• Face detection

• Mouth movement detection

• Context-based speech recognition with specific speech models for call/security, communication and comfort zones

• Tools of artificial intelligence

• Dialogue systems for clearing situations

• Intelligent, context-based, user-oriented dialogue systems

Application areas: Robots, Smart Homes