December 19, 2005

Post on 19-Mar-2016


FPMS

Acapela’s corporate profile

Group Background

Babel Technologies
> Created in 1995 in Mons (Belgium)
> Spin-off of Mons Polytechnical University
> In-house TTS & ASR technologies
> TTS and ASR leader in embedded environments

Infovox
> Created in 1983 in Stockholm (Sweden)
> Spin-off of KTH (Royal Institute of Technology)
> Integrated into Telia Promotor in 1993
> Acquired by Babel Technologies in 2001
> TTS leader in the Nordic countries, Germany and the Netherlands
> Accessibility and Telecom expertise

Elan Speech
> Created in 1980 in Toulouse (France)
> Focused on TTS since 1996
> Launch of in-house high-quality TTS in 2002 (Elan Sayso)
> TTS leader in Telecom and Automotive

Acapela’s locations

France, Toulouse

Belgium, Mons

Sweden, Stockholm

3 sites, 50 people

International team

Local support in each site

Merged organization

Acapela’s multilingual offer

ASR & TTS components in 23 languages

Acapela’s technologies

Technologies (TTS)

Architecture

Text Preprocessor: set of rules

Tagger: dictionary based

Phonetizer: phonetic tree + dictionary

Prosody: prosodic patterns

Synthesizer: database (voice)

Text Preprocessor

> Function
– Generation of standard text

> Examples
– Numbers: 100 → one hundred
– Currencies: $20 → twenty dollars
– Abbreviations: tel. → telephone

> Implementation
– Rules are defined in a standard format (BNF format)

> Size of data
– 20 Kbytes
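The three examples above can be sketched in a few lines of Python. This is only an illustration: the slide says the real rules are written in a BNF format, whereas this sketch swaps in regular expressions, and the names `number_to_words`, `normalize` and `ABBREVIATIONS` are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical abbreviation table; a real system would hold many more entries.
ABBREVIATIONS = {"tel.": "telephone"}


def number_to_words(n: int) -> str:
    """Spell out an integer in English (sketch limited to 0..999)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def _currency(match: re.Match) -> str:
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return f"{number_to_words(amount)} {unit}"


def normalize(text: str) -> str:
    """Rewrite numbers, dollar amounts and abbreviations as plain words."""
    # Currencies first, so "$20" is not consumed by the plain-number rule.
    text = re.sub(r"\$(\d{1,3})\b", _currency, text)
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group(0))), text)
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbreviation), expansion, text)
    return text
```

For instance, `normalize("$20")` yields `twenty dollars`. Rule order matters here: the currency rule has to run before the generic number rule.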

Text Prepro. → Tagger → Phonetizer → Prosody → Speech Synth

Tagger (optional)

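> Function
– Generation of grammatical function of each word
– Optional: not necessary for all languages

> Examples
– English: "to read" vs. "I have read" (same spelling, different pronunciation)
– French: "Les poules du couvent couvent" ("The convent's hens are brooding": the noun "couvent" and the verb "couvent" are pronounced differently)

> Implementation
– Dictionary based + set of rules

> Size of data
– 0 to 20 Kbytes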
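A minimal sketch of the "dictionary + set of rules" idea, using the "read" example. The tag names, the tiny `LEXICON` and the two context rules are assumptions made for illustration; the slide does not describe the actual tag set or rule format.

```python
# Hypothetical lexicon: each word maps to its possible grammatical tags,
# most likely tag first.
LEXICON = {
    "to": ["PARTICLE"],
    "i": ["PRONOUN"],
    "have": ["AUXILIARY"],
    "read": ["VERB_BASE", "VERB_PAST_PARTICIPLE"],
}

# The tag decides the pronunciation of the ambiguous word (SAMPA-like symbols).
PRONUNCIATION_BY_TAG = {
    ("read", "VERB_BASE"): "r i: d",
    ("read", "VERB_PAST_PARTICIPLE"): "r E d",
}


def tag(words):
    """Return (word, tag) pairs: dictionary lookup, then context rules."""
    tags = []
    for index, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), ["UNKNOWN"])
        choice = candidates[0]
        if len(candidates) > 1 and index > 0:
            previous = tags[-1]
            # Rule 1: after an auxiliary, prefer the past participle.
            if previous == "AUXILIARY" and "VERB_PAST_PARTICIPLE" in candidates:
                choice = "VERB_PAST_PARTICIPLE"
            # Rule 2: after the particle "to", prefer the base form.
            elif previous == "PARTICLE" and "VERB_BASE" in candidates:
                choice = "VERB_BASE"
        tags.append(choice)
    return list(zip(words, tags))
```

Calling `tag(["I", "have", "read"])` tags "read" as a past participle, which in turn selects the `r E d` pronunciation for the phonetizer.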


Phonetizer

> Function
– Generation of phonetic transcription for each word

> Examples
– Babel: b a b E l

> Implementation
– Decision tree + exception dictionary

> Size of data (language dependent)
– 5 to 350 Kbytes
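The "decision tree + exception dictionary" combination can be sketched as follows. The exception entry and the tiny hand-written tree for the letter "e" are toy assumptions; real letter-to-sound trees are much larger and language specific.

```python
VOWELS = set("aeiouy")

# Hypothetical exception dictionary, consulted before the decision tree.
EXCEPTIONS = {"mons": "m o~ s"}


def letter_to_phoneme(word, index):
    """Toy decision tree: ask questions about the letter and its right context."""
    letter = word[index]
    following = word[index + 1] if index + 1 < len(word) else ""
    if letter == "e":
        if following == "":          # question 1: is the "e" word-final?
            return None              # treated as silent in this toy tree
        if following not in VOWELS:  # question 2: followed by a consonant?
            return "E"
        return "@"
    return letter                    # default: the letter maps to itself


def phonetize(word):
    """Return a space-separated phonetic transcription of one word."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    phonemes = (letter_to_phoneme(word, i) for i in range(len(word)))
    return " ".join(p for p in phonemes if p is not None)
```

With these toy rules, `phonetize("Babel")` reproduces the slide's example, and words the tree would get wrong are caught by the exception dictionary first.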


Prosodic module

> Function
– Generation of intonation:
• Phoneme duration
• Pitch markers

> Examples
– See MBROLI application

> Implementation
– Prosodic patterns extracted from speech corpus

> Size of data (language dependent)
– 30 to 300 Kbytes
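To make "phoneme duration + pitch markers" concrete, the sketch below attaches a duration and one pitch point to each phoneme and prints them one per line, in the phoneme / duration / (position %, Hz) layout used by MBROLA-style `.pho` files. Everything numeric is invented, and the linear pitch fall is a stand-in assumption for patterns that would really be extracted from a speech corpus.

```python
from dataclasses import dataclass, field


@dataclass
class ProsodicPhoneme:
    symbol: str        # phoneme symbol, e.g. "b"
    duration_ms: int   # phoneme duration in milliseconds
    # Pitch markers as (position in % of the duration, pitch in Hz) pairs.
    pitch_points: list = field(default_factory=list)

    def to_line(self) -> str:
        parts = [self.symbol, str(self.duration_ms)]
        for position, hertz in self.pitch_points:
            parts += [str(position), str(hertz)]
        return " ".join(parts)


def apply_declination(symbols, duration_ms=80, start_hz=130, end_hz=100):
    """Assign a fixed duration and a linearly falling pitch (toy pattern)."""
    steps = max(len(symbols) - 1, 1)
    result = []
    for index, symbol in enumerate(symbols):
        hertz = round(start_hz + (end_hz - start_hz) * index / steps)
        result.append(ProsodicPhoneme(symbol, duration_ms, [(50, hertz)]))
    return result


if __name__ == "__main__":
    for phoneme in apply_declination("b a b E l".split()):
        print(phoneme.to_line())
```

The output of this stage is exactly what the synthesizer needs as input: a phoneme sequence annotated with timing and intonation.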


Synthesizer

> Function
– Generation of speech samples from phoneme sequence + intonation

> Implementation: 3 technologies
– Formant-based = rules
– Diphone concatenation
– Unit Selection

> Size of data: depends on
– Technology
– Sampling frequency
– Compression rate
– From 50 Kbytes to 50 Mb
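A rough way to see how sampling frequency and compression rate drive the size of a concatenative voice database is the arithmetic below. The function name and all figures are illustrative assumptions, not Acapela data, and formant-based synthesis (rules only) is not covered by this estimate.

```python
def voice_footprint_bytes(speech_seconds, sampling_hz,
                          bytes_per_sample=2, compression_ratio=1.0):
    """Estimate the storage needed for a recorded speech database.

    raw size = duration x sampling frequency x bytes per sample,
    then divided by the compression ratio (4.0 means 4:1).
    """
    raw_bytes = speech_seconds * sampling_hz * bytes_per_sample
    return round(raw_bytes / compression_ratio)


if __name__ == "__main__":
    # One minute of 16 kHz, 16-bit speech, compressed 4:1.
    print(voice_footprint_bytes(60, 16_000, 2, 4.0))
```

Halving the sampling frequency or doubling the compression ratio halves the footprint, which is why the same technology can span a wide range of sizes.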


Technologies (ASR)

Speech Recognition

Hybrid Models: Hidden Markov Models / Neural Networks

Acoustic analysis

Neural network

HMM

Discrimination

Dynamic programming (decoder)
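In the usual hybrid HMM/NN scheme, the neural network estimates a posterior probability for each HMM state at every acoustic frame, these posteriors are divided by the state priors to obtain scaled likelihoods, and a dynamic-programming (Viterbi) decoder finds the best state path. The sketch below shows that general technique with made-up numbers; it is not a description of Acapela's decoder, and all names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(value):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(value) if value > 0 else NEG_INF


def viterbi(posteriors, priors, transitions, initial):
    """Best HMM state path given per-frame NN posteriors.

    posteriors[t][s]  : NN output P(state s | frame t)
    priors[s]         : P(state s), used to scale the posteriors
    transitions[p][s] : P(next state s | previous state p)
    initial[s]        : P(first state is s)
    """
    n_states = len(priors)

    def emission(t, s):
        # Scaled likelihood: P(s | x) / P(s) is proportional to p(x | s).
        return safe_log(posteriors[t][s]) - math.log(priors[s])

    score = [safe_log(initial[s]) + emission(0, s) for s in range(n_states)]
    backpointers = []
    for t in range(1, len(posteriors)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda p: score[p] + safe_log(transitions[p][s]),
            )
            new_score.append(score[best_prev]
                             + safe_log(transitions[best_prev][s])
                             + emission(t, s))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

The network supplies the frame-level discrimination between states, while the HMM and the decoder impose the temporal structure.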

Recognition

Vocabulary
– Phonetic transcription
Ex: reconnaissance: R [@] k O n E s a~ s
– Consider all possible transcriptions!
Ex: 10 (French "dix") = dis – diz – di
– Consider synonyms!
Ex: Oui, ouais, ok, c’est cela, … (French for: yes, yeah, OK, that's right, …)
Ex: Télévision, TV, poste de télévision (television, TV, television set)
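Both recommendations can be sketched as a small vocabulary builder: optional phonemes written in brackets are expanded into every pronunciation variant, and synonyms are mapped onto one concept. The function names, the concept labels `YES` and `TV`, and the data layout are assumptions for illustration.

```python
from itertools import product


def expand_optional(transcription):
    """Expand bracketed optional phonemes into all pronunciation variants."""
    options = []
    for token in transcription.split():
        if token.startswith("[") and token.endswith("]"):
            options.append(("", token[1:-1]))   # phoneme absent or present
        else:
            options.append((token,))
    variants = {" ".join(p for p in combo if p) for combo in product(*options)}
    return sorted(variants)


# Several surface forms (synonyms) lead to the same concept.
SYNONYMS = {
    "YES": ["oui", "ouais", "ok", "c'est cela"],
    "TV": ["télévision", "TV", "poste de télévision"],
}

CONCEPT_BY_PHRASE = {
    phrase.lower(): concept
    for concept, phrases in SYNONYMS.items()
    for phrase in phrases
}


def interpret(phrase):
    """Map a recognized phrase to its concept, or None if out of vocabulary."""
    return CONCEPT_BY_PHRASE.get(phrase.lower())
```

Here `expand_optional("R [@] k O n E s a~ s")` produces the transcription with and without the schwa, and `interpret("ouais")` returns `YES`. An out-of-vocabulary phrase returns `None`.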

Recognition (continued): difficulties

• Noise
• Accents
• Hesitations
• Users
• Incorrect syntax
• Out-of-vocabulary words

ASR : advantage of NN

Acapela’s product overview

Acapela’s Technologies Overview

> High-Quality TTS: the pleasant and natural-sounding voice
– Voice enabled by Sayso and BrightSpeech, based on Unit Selection technology

> High-Density TTS: the right choice for high density and small footprints
– Voice enabled by Tempo and Babil, based on Diphone technology

> ASR: the robust speech recognizer
– Voice enabled by Babear, speaker-independent ASR based on Hidden Markov Models and Artificial Neural Networks

3 Technologies

Two TTS technologies

Diphone based concatenative TTS

Advantages:
• Small footprint (2 to 6 Mb)
• Flexibility (pitch, speed adjustment, prosody copying)
• High intelligibility
• 21 languages supported

Disadvantage:
• Less natural sounding

Markets/Applications targeted:
• Automotive & consumer electronics (low footprint)
• High-density, short-ROI server-based TTS (telephony)
• Multimedia software products

High Density TTS: voice enabled by Tempo & Babil
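In diphone concatenation, the stored units run from the middle of one phoneme to the middle of the next, so a phoneme sequence is first turned into a sequence of diphone names to fetch from the voice database. A minimal sketch, assuming `_` as the silence symbol and a `left-right` naming scheme (both hypothetical):

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into the diphone units needed to speak it."""
    # Pad with silence so the first and last phonemes get a transition too.
    padded = [silence, *phonemes, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    print(to_diphones("b a b E l".split()))
```

For the five phonemes of "Babel" this yields six units. A language with N phonemes needs at most about N² diphones, which is what keeps the footprint small.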

High Density TTS language availability

Language | Gender
French | Female, Male
US English | Female, Male
UK English | Female, Male
German | Female, Male
Spanish (Castilian) | Female, Male
Italian | Female, Male
Polish | Male
Russian | Male
Dutch (NL) | Female, Male
Dutch (B) | Female
Continental Portuguese | Female
Danish | Female, Male
Swedish | Female, Male
Norwegian | Male
Finnish | Male
Icelandic | Male
Czech | Female
Turkish | Male
Arabic | Male
South American Spanish | Female
Brazilian Portuguese | Female, Male

Unit selection concatenative TTS

Advantages:
• Very high quality
• Highly natural
• Flexibility (pitch, speed adjustment, timbre alteration, whispering feature)
• Support for custom voices (“SpeechBrand” program)

Disadvantage:
• Larger footprint (16 to 70 Mb)

Markets/Applications targeted:
• High-end telephony applications (voice portal, news)
• New generation of navigation terminals
• Public address

High Quality TTS Voice enabled by Sayso & BrightSpeech
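Unit selection is commonly described as a search: for each target position there are several candidate recordings, and dynamic programming picks the sequence that minimizes a target cost (how well a unit matches the wanted prosody) plus a join cost (how smoothly neighbours connect). The sketch below shows that general formulation with pitch-only costs and invented units; it is not taken from Sayso or BrightSpeech.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per position, minimizing target + join costs."""
    # best[i][j] = (cumulative cost, index of best predecessor) for
    # candidate j at position i.
    best = [[(target_cost(targets[0], unit), None) for unit in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, previous = min(
                (best[i - 1][k][0] + join_cost(prev_unit, unit), k)
                for k, prev_unit in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], unit), previous))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Toy costs: a unit is a (name, pitch in Hz) pair, a target is a pitch in Hz.
def pitch_target_cost(target_hz, unit):
    return abs(target_hz - unit[1])


def pitch_join_cost(left_unit, right_unit):
    return abs(left_unit[1] - right_unit[1])
```

Because many candidates per unit must be stored to make this search worthwhile, the voice database is larger than a diphone inventory, which matches the footprint trade-off listed above.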

High Quality TTS language availability

Language | Gender | Status
French | Female, Male | Available
US English | Female, Male | Available
UK English | Female, Male | Available
German | Female, Q1-2006 | Available
Spanish (Castilian) | Female | Available
Italian | Female | Available
Polish | Female | Available
Swedish | Female, Male | Available
Arabic | Female, Male | Available
Dutch (NL) | Female | Available
Dutch (B) | Female | Available
Norwegian | Female | Available
Continental Portuguese | Female |
Danish | ****
Mexican Spanish | ***
Finnish | **
Canadian French | *

Hybrid technology of Hidden Markov Models and Artificial Neural Networks

Advantages:
• Very high accuracy in difficult contexts
• High dialog flexibility
• Lip-sync and language learning capabilities through phoneme-level discrimination
• Speaker independent
• Accurate voice activation for noisy environments

Markets/Applications targeted:
• Industrial data collection: inventories, picking…
• Automotive
• Name dialing
• Multimedia command & control / language learning

ASR Voice enabled by Babear

ASR language availability

Language | Robustness | Status
US English | +++ | Available
UK English | +++ | Available
Spanish | + | Available
French | +++ | Available
German | +++ | Available
Italian | ++ | Available
Dutch | + | Available
Greek | + | Available
Arabic | ++ | Available

Acapela’s market coverage

Acapela’s Markets

Solutions for Telecom, Automotive, Accessibility

Mobility, Industry, Multimedia, Consumer Electronics.

Leading 3 major and mature markets: Telecom, Automotive, Accessibility

Acapela’s main Markets

Telecom: server-based vocalization of content for multiple users over the phone
• for Companies: Unified messaging, Auto attendant, CRM
• for Telcos: Unified messaging, Voice portal, SMS2Voice, directory and reverse directory
• for Contact centers: call automation, FAQ

Automotive: on-board and off-board speech solutions
• On-board & off-board car navigation systems
• Traffic information
• PDA-based applications
• Telematics

Accessibility: assistive technologies
• Screen readers
• Reading machines
• Voice-controlled mobile phones

Creating new speech market opportunities in:

>> Mobility
• Cell phones
• Navigation on PDAs

>> Industry
• Public address
• Alarm & supervision
• Warehousing, production line

>> Multimedia
• Edutainment
• Education
• Language learning
• E-learning

>> Consumer Electronics, …
• Talking dictionary devices
• Toys

giving you the say