20190517 Laboratory Research Introduction (English Version), FY2019


Page 1


Augmented Human Communication Lab

2019/5/20

Speech Translation, Neural Machine Translation

Brain Analysis

Spoken Dialog, Multi-modal Dialog

I’m looking for a lab.

Why don’t you join our lab!

Information Retrieval

QA System, Multi-modal

Multi-language ASR, Speech Synthesis, Deep Speech Chain

Deep Neural Network

Affective Computing, Emotion and Environment Recognition

Prof. Satoshi Nakamura

Assist. Prof. Koichiro Yoshino

Web Information Processing

Toward enhancement of human communication abilities

Toward the enhancement of human communication abilities, the AHC Lab studies multilingual speech translation, dialog systems, user-adaptive super-human automatic speech recognition and synthesis, and brain analysis related to human communication. We have also been managing the Data Science Center since 2017.

Assoc. Prof. Katsuhito Sudoh

Research Assoc. Prof. Sakriani Sakti

Assist. Prof. Hiroki Tanaka


Goal-oriented Dialog, Non-goal-oriented Dialog

Incongruity measurement, Prediction of feeling

Early Detection of Dementia, Communication Support Dialog

Research Assoc. Prof. Keiji Yasuda

Visiting Assoc. Prof. Yu Suzuki

Page 2

Professor Satoshi Nakamura

World-wide

Visiting Assoc. Professor

Yu Suzuki

Assistant Professor Koichiro Yoshino

Spoken Dialog System, Dialog Control

Semantic Analysis, Language Understanding, Knowledge Extraction, Information Retrieval

Research Associate Professor Sakriani Sakti

Speech Recognition, Multilingual SR

Cognitive Communication

Graphical Models

Machine Translation

Speech Translation, Natural Language Processing, Machine Learning

Assistant Professor Hiroki Tanaka

Communication Aid, Cognitive Information Processing


Associate Professor Katsuhito Sudoh

Speech Translation, Speech Recognition

Dialog Control, Cognitive Communication

Big Data Analysis

Research Associate Professor Keiji Yasuda

Human Resources Tech, Educational Tech

Artificial Intelligence in Healthcare

Page 3

Lab Members: 6 Faculty, 17 PhD Students, 27 Master's Students
(D = doctoral students, M = master's students)
Speech Processing: D: 8; M: 6
Spoken Dialog: D: 5; M: 6
Cognitive Communication: D: 1; M: 6
NLP / STS Translation: D: 3; M: 9
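As a quick consistency check, the per-group counts listed above sum to the lab-wide totals of 17 PhD and 27 master's students. A small Python tally; the dictionary layout and the `totals` helper are illustrative only:

```python
# Per-group head counts as listed on the slide: (doctoral, master's)
groups = {
    "Speech Processing": (8, 6),
    "Spoken Dialog": (5, 6),
    "Cognitive Communication": (1, 6),
    "NLP / STS Translation": (3, 9),
}

def totals(counts):
    """Sum doctoral and master's students across all groups."""
    phd = sum(d for d, _ in counts.values())
    masters = sum(m for _, m in counts.values())
    return phd, masters

phd, masters = totals(groups)
print(f"PhD students: {phd}, Master's students: {masters}")
# prints "PhD students: 17, Master's students: 27"
```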


Page 4


Graduation Ceremony

Lab Research Boot Camp

Lab Ski Camp

Alumni Party

Page 5


Professor Satoshi Nakamura

Background

1981.4-1994.3 Sharp Corp. Central Research Labs

1986-1989 ATR Interpreting Telephony Res. Labs.

1994.4-2000.3 Associate Prof., Nara Institute of Science and Technology

2000.4 Advanced Telecommunications Research Institute International (ATR): Vice President of ATR, Director of Spoken Language Communication Labs.

ATR Fellow

2006.4 National Institute of Information and Communications Technology (NICT): Director, MASTAR Project

Director, KCCC Research Center

Director General, Keihanna Research Laboratories

Dec. 2003 Honorary Professor (Honorarprofessor), University of Karlsruhe, Germany

Apr. 2011 Prof. at Nara Institute of Science and Technology

Spoken Language Communication Research Laboratories


Page 6

History of Speech Translation Research in Japan

Fundamentals

Read Speech

• Syntactically correct
• Clear utterance
• Limited domain

Ex. “Conference Registration”

Daily Conversation

• Standard expression
• Unclear utterance
• Limited domain

Ex. “Hotel Reservation”

Wider and Real Domain

• Wider and real domain: “International Travel”
• Realistic expressions
• Noisy speech
• J-E, J-C speech translation

1986 1992 1999 2006

Rule-based Technology: hand-made

Corpus-based Technology: large-scale corpus + machine learning

2008: ATR, NICT

A-STAR

+ More languages for translation

• Multilateral translation for 8 Asian languages
• Network-based S2ST

2010

• Multilateral text translation for 21 languages

C-STAR

• Multilateral translation for 7 world languages

IWSLT

• Evaluation Campaign of S2S technologies

2011

VoiceTra

NAIST

ATR ATR


Page 7


Page 8


RIKEN AIP Tourism Information Analytics Team

IoT2H (Internet of Things to Human)

Satoshi NAKAMURA@AHC, NAIST

IoT2H is a technology that bridges the Internet of Things and human beings.

What’s happening? IoT and social information are passed to humans as output in language.

[Figure: beacons at tourism spots in Kyoto (shopping, hospital, bus, train, restaurant, temples, hotel) give a congestion factor for each spot, which a chat bot reports in language, e.g. “Kinkakuji Temple is now crowded.”]

Tourism Information in Kyoto: idea development of deep learning, real-time image captioning (image2cap!)
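The slide does not define how the congestion factor is computed, so the following is only a minimal sketch under stated assumptions: the factor is taken to be the current number of beacon detections divided by a typical count for that spot, and a spot is called crowded above an arbitrary threshold of 1.5. `SpotReading`, `congestion_factor`, `describe`, and the counts are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SpotReading:
    spot: str            # tourism spot name
    beacon_count: int    # devices currently detected by the spot's beacons
    typical_count: int   # hypothetical baseline count for the same time of day

def congestion_factor(r: SpotReading) -> float:
    """Assumed definition: current detections relative to the typical level."""
    return r.beacon_count / max(r.typical_count, 1)

def describe(r: SpotReading, threshold: float = 1.5) -> str:
    """Turn one reading into a short sentence a chat bot could send to a tourist."""
    if congestion_factor(r) >= threshold:
        return f"{r.spot} is now crowded"
    return f"{r.spot} is not crowded right now"

print(describe(SpotReading("Kinkakuji Temple", beacon_count=320, typical_count=150)))
# prints "Kinkakuji Temple is now crowded"
```

Keeping the language output as a separate step from the numeric factor mirrors the slide's split between sensing ("What's happening?") and output in language.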

Page 9

Assoc. Professor Katsuhito Sudoh

Background
2000 Bachelor of Engineering, Kyoto University
2002 Master of Informatics, Kyoto University
2015 Ph.D. (Informatics), Kyoto University
2002-2017 NTT Communication Science Laboratories
2017- Associate Professor, Graduate School of Information Science, NAIST

Machine Translation, Spoken Language Processing

I went Nara last night at noon

Information extraction from speech (using recognition “confidence”)
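Using recognition confidence can be sketched as keeping only the recognized words whose confidence reaches a threshold before extracting information. This assumes the recognizer returns one score in [0, 1] per word and that a fixed threshold is used; the scores below are made up for illustration, and `confident_words` is a hypothetical helper:

```python
def confident_words(hypothesis, confidences, threshold=0.5):
    """Keep only recognized words whose confidence reaches the threshold.

    hypothesis: list of recognized words; confidences: per-word scores in [0, 1].
    """
    if len(hypothesis) != len(confidences):
        raise ValueError("need one confidence score per word")
    return [w for w, c in zip(hypothesis, confidences) if c >= threshold]

words = "I went Nara last night at noon".split()
scores = [0.95, 0.90, 0.85, 0.40, 0.35, 0.80, 0.75]  # made-up scores for illustration
print(confident_words(words, scores))
# ['I', 'went', 'Nara', 'at', 'noon']
```

Low-confidence words are dropped rather than corrected, so downstream extraction only works from the parts of the recognition result it can trust.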

長尺矩形のオイルストレーナ74が溝条72aに略鉛直姿勢で嵌合される。

A long rectangular oil strainer 74 is fitted within the grooves 72a in a substantially vertical posture.

Translation with accurate word order (re-ordering), Translation of technical terms


Page 10

Toward a Language Barrier-free Future!

Translation & Language Understanding

Evaluation of Natural Language Generation (Katsuhito Sudoh)

NAISTへようこそ!! NAISTへようこそ!! Welcome to NAIST!!

Machine Translation

For better translation

・Shorter processing time
・Accurate translation
・Multilingual translation, etc.

We are now working on these problems to make the world language barrier-free!

High-quality Chat Response (Ryo Nakamura)

Neural Machine Translation in Real-time (Katsuki Chousa)

Semantic Sentence Encoding (Yoichi Ishibashi)

Reduce time delay to translate
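One widely used way to reduce the delay of simultaneous translation is a wait-k policy: read k source tokens first, then alternate between writing one target token and reading one more source token. The slide does not say which policy this project uses, so this is only an illustration of that general idea; `wait_k_schedule` is a hypothetical helper that returns the READ/WRITE action sequence:

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the READ/WRITE action sequence of a wait-k simultaneous policy.

    Target token t (0-based) is written once min(k + t, src_len) source
    tokens have been read, so the output lags the input by about k tokens.
    """
    actions = []
    read = 0
    for t in range(tgt_len):
        needed = min(k + t, src_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions

print(wait_k_schedule(src_len=5, tgt_len=5, k=2))
# ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```

A smaller k lowers the delay but gives the model less source context per output token; a k at least as long as the source reduces to ordinary full-sentence translation.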

Original: 俄罗斯可以宣布胜利了

Russia could claim a victory of sorts

Semantic Automatic Evaluation of Translation (Kosuke Takahashi)

Reference translation: Russia can declare victory

Evaluate the meanings of translations. Encode the sentences as the same vector.
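Judging a translation by meaning rather than surface words can be sketched as cosine similarity between sentence vectors. In this minimal sketch a plain bag-of-words count vector stands in for a learned sentence encoder (an assumption; the slide does not name the encoder), and `bow_vector` and `cosine` are hypothetical helper names:

```python
import math
from collections import Counter

def bow_vector(sentence):
    """Stand-in encoder: lowercase bag-of-words counts (a learned encoder would go here)."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

hyp = "Russia could claim a victory of sorts"
ref = "Russia can declare victory"
print(round(cosine(bow_vector(hyp), bow_vector(ref)), 3))
# 0.378
```

With surface word counts the output and the reference share only "russia" and "victory", so the score stays low even though the meanings are close; a semantic encoder would aim to map both sentences to nearby vectors.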

端子は互いに接触しないように配置されている

Terminals are placed not to be in contact with each other.

It’s fluent but conveys the wrong meaning...

IT’S POSSIBLY MISUNDERSTOOD

Focusing on the risk of misunderstanding

Style transfer for natural language (Kosuke Futamata)

Apply arbitrary stylistic features.

The chicken was delicious.

The chicken was terrible.

Style (Positive)

Style (Negative)

Past Research:
・Simultaneous optimization of speech recognition and machine translation
・Translating normal style into honorific style
・Small and accurate translation models
・Evaluation of simultaneous speech translation systems
・Machine translation error analysis
・Pivot translation strategies
・Automated programming
・Multilingual machine translation
・Code efficiency prediction based on OJS data

Page 11

[Figure labels: relation between nouns; relation between verbs; domain knowledge; preference; how to say; examples: Toritani, Kinami, hit, swing]

Assistant Professor Koichiro Yoshino

Background
2009 Bachelor of Arts in Environmental Information, Keio University
2014 Ph.D. (Informatics), Graduate School of Informatics, Kyoto University
2014- JSPS Research Fellow (PD)
2015- Assistant Professor, Graduate School of Information Science, NAIST

Did Toritani hit a home run?

Toritani, who got the start in the 1st line-up, hit 2 doubles.

[Figure: the question is analyzed into its subject (Toritani) and predicate (hit, the focus), and matching structures are retrieved from news and web text: “Toritani who got the start in 1st line-up” (subject), “hit”, “2 doubles” (object).]
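The retrieval step in the figure can be sketched by representing the question and a retrieved sentence as subject-predicate-object structures and matching on subject and predicate. This is a minimal sketch under that assumption: the structures are written by hand in place of a real parser, and `PredArg` and `answer_yes_no` are hypothetical names, not the lab's system:

```python
from typing import List, NamedTuple, Optional

class PredArg(NamedTuple):
    subject: str
    predicate: str
    obj: str

def answer_yes_no(question: PredArg, facts: List[PredArg]) -> Optional[str]:
    """Answer a yes/no question by matching subject and predicate against facts.

    Returns None when no fact shares the question's subject and predicate.
    """
    for fact in facts:
        if (fact.subject, fact.predicate) == (question.subject, question.predicate):
            if fact.obj == question.obj:
                return "Yes."
            # Same event, different object: report what the text actually says.
            return f"The text only says: {fact.subject} {fact.predicate} {fact.obj}."
    return None

# Hand-written structures standing in for a parser's output.
question = PredArg("Toritani", "hit", "a home run")
news = [PredArg("Toritani", "hit", "2 doubles")]
print(answer_yes_no(question, news))
# The text only says: Toritani hit 2 doubles.
```

Matching on the subject and the focused predicate first, then comparing the object, keeps the answer grounded in the retrieved sentence instead of guessing a plain "no".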


Page 12

Dialogue system pipeline: Recognition → Understanding → Management → Generation → Action

Recognition: ASR, para-linguistic recognition (SP, CC)

Action: TTS, action & behavior generation (SP, CC)

Generation

What is understanding?
• Materialization of utterances
• Dialogue act tagging
• Knowledge acquisition
• Knowledge extension
• Relations between events

NLP techniques (NL)

PRESTO: incremental processing and knowledge acquisition

affective computing

Spoken Dialogue Group: Toward cooperative systems through interactions

How do we realize systems?
• Decision making with reinforcement learning
• Using a variety of information: arguments, deception, emotion, task completion, entrainment, contradictions, etc.
• Algorithms of reinforcement learning
• Evaluations of dialogue systems
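Decision making with reinforcement learning can be illustrated with tabular Q-learning over the example decisions on this slide (confirm, ask a question, use emotion). This is a toy sketch: the state name, rewards, and hyperparameters are hypothetical, and the group's systems are not claimed to use this exact algorithm:

```python
import random

ACTIONS = ["confirm", "ask_question", "use_emotion"]  # example decisions from the slide

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over the dialogue actions for one state."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Unseen (state, action) pairs default to 0.0; ties go to the first action listed.
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Toy episodes: in a hypothetical "slot_uncertain" state, confirming is rewarded.
q = {}
rng = random.Random(0)
for _ in range(20):
    a = choose_action(q, "slot_uncertain", epsilon=0.3, rng=rng)
    r = 1.0 if a == "confirm" else -0.1
    q_update(q, "slot_uncertain", a, r, next_state="done")
print(choose_action(q, "slot_uncertain", epsilon=0.0, rng=rng))
# confirm
```

Only confirming earns a positive reward in this toy setup, so the greedy policy ends up choosing `confirm`; a real dialogue manager would use a much richer state (dialogue acts, emotion, task progress).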

How do systems have an effect?
• Generate responses according to manager decisions, contexts, personality, etc.
• Image and interaction
• End-to-end systems with distillation

Confirm?

Use emotion?

Ask a question?

There are 3 Italian restaurants at Ikoma. Do you have any preference?

Kiyomizu-temple is very crowded due to the high season.

AIP: Touristic information analysis

AP Yoshino

D 品川 D 杉山

M Mai M1

M 隆辻

D 河野

D Tung M 浅井

M1

D 村瀬 M 池内

M 田中 M1

Italian near by Ikoma

Observation (Obs.): DA: question, Domain: restaurant {type=Italian, …}

Page 13

Assistant Professor Hiroki Tanaka

Background
Bachelor of Engineering,
Ph.D. (Engineering), NAIST


Page 14

Cognitive Communication Group

Assistant Prof. Hiroki Tanaka

Assessment and training of social communication skills (social skills training, SST, based on cognitive behavioral therapy)
・Tasks: speaking, listening, small talk
・Feedback regarding eye gaze, speech, and image
・Now being tested in clinics

Human-human / Human-machine

Communication

Automatic assessment / Feedback (Medical and educational system)

Estimation of Cognitive & Psychological States

Current and Ongoing Research

Automated social skills training

EEG measurement during simultaneous translation

Tourism Information Analysis using Tensor Decomposition

D3 Haruko Yagura

Predicting Objective Speech Quality Score

Various modalities

EEG, Face / Eye, Voice

Previous Research Topics

• Speech recognition using EEG signals

• Detection of dementia from responses

• Prediction of depressive tendency from lifestyle

Anomalous Sentence Detection using EEG

Measuring Empathy from EEG Signals

M1 Ivan Halim P.

Application: Objective quality measurement of synthesized speech

M2 Taiki Kinoshita

Empathy

Inter-Brain Synchronization

EEG

Statistical analysis

Measuring empathy

Application: Evaluation of human-machine empathy

Taro eats an apple

Taro runs an apple

Speech EEG Prediction

Correct

Incorrect

Predicting from EEG, with a machine learning model, whether a spoken sentence is correct or incorrect
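The slide only says that a machine learning model predicts correctness from EEG. As a stand-in technique, here is a nearest-centroid classifier over per-trial feature vectors, e.g. trials for "Taro eats an apple" (correct) versus "Taro runs an apple" (incorrect). The two-dimensional features (imagined as mean amplitudes in two post-stimulus time windows) and all numbers are hypothetical toy values, not recorded data:

```python
import math

def centroid(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train(examples):
    """examples: dict label -> list of feature vectors; returns label -> centroid."""
    return {label: centroid(vecs) for label, vecs in examples.items()}

def predict(model, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical toy features per trial (not measurements).
training = {
    "correct":   [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]],
    "incorrect": [[-1.0, 0.9], [-0.8, 1.1], [-1.2, 1.0]],
}
model = train(training)
print(predict(model, [-0.9, 1.0]))
# incorrect
```

Any classifier with the same train/predict shape could replace the nearest-centroid rule; it was chosen here only because it fits in a few lines.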

Application: Evaluation of machine outputs, adaptive dialogue system

M2 Shunnosuke Motomura

M2 Motoi Kubo

Apply tensor decomposition to varied (high-dimensional) tourism information and analyze trends in tourists' routes and popular spots.

[Figure: tourism data (a high-dimensional tensor, one mode of which is location) → tensor decomposition into a sum of components (+ ・・・ +) → trends of migration pathways and popular spots]
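The "+ ・・・ +" in the figure suggests that the data tensor is approximated by a sum of rank-one components, as in CP (CANDECOMP/PARAFAC) decomposition; the slide does not name the decomposition, so this is an assumption. Under CP with rank R, entry (i, j, k) is the sum over r of A[i][r] · B[j][r] · C[k][r]. The sketch below only rebuilds a tensor from given factor matrices (fitting them, e.g. by alternating least squares, is not shown), and the three modes and their sizes are hypothetical:

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices.

    A: I x R, B: J x R, C: K x R (lists of rows). Entry (i, j, k) is
    sum over r of A[i][r] * B[j][r] * C[k][r], i.e. a sum of R rank-one terms.
    """
    R = len(A[0])
    if any(len(row) != R for M in (A, B, C) for row in M):
        raise ValueError("all factor matrices need the same number of columns R")
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

# Hypothetical modes: 2 visitor groups x 3 locations x 2 time slots, rank R = 1.
A = [[1.0], [2.0]]            # visitor-group loadings
B = [[1.0], [0.0], [0.5]]     # location loadings (a "popular spot" pattern)
C = [[3.0], [1.0]]            # time-slot loadings
X = cp_reconstruct(A, B, C)
print(X[1][2][0])  # 2.0 * 0.5 * 3.0
# 3.0
```

Each column r of B can be read as one location pattern, which is how a CP-style model would surface popular spots and recurring routes.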

Page 15


Education
2005-2008 Doctorate degree (Dr.-Ing.) in Engineering Science, University of Ulm, GERMANY
2000-2002 Master's degree (MSc) in Communication Technology, University of Ulm, GERMANY
1995-1999 Bachelor's degree (BSc) in Informatics, Bandung Institute of Technology, INDONESIA

Work Experience
2018- Research Assoc. Professor, Augmented Human Communication Labs, NAIST, JAPAN
Research Scientist, RIKEN Advanced Intelligence Project AIP, JAPAN
2011-2017 Assistant Professor, Augmented Human Communication Labs, NAIST, JAPAN
2009-2011 Visiting Professor, Faculty of Computer Science, University of Indonesia, INDONESIA
2006-2011 Expert Researcher, Spoken Language Communication Research Groups, NICT, JAPAN
2003-2009 Research Engineer - Researcher, Spoken Language Communication Research Labs, ATR, JAPAN
2001-2002 Master's thesis (Masterarbeit), Speech Understanding Dept., DaimlerChrysler Research Center, GERMANY
1999-2000 Junior Software Consultant, Sumarno Pabotingi Associate, INDONESIA

Research Assoc. Prof. Sakriani Sakti


Page 16

Speech Processing Research and Applications

~ Let’s make a machine that can hear and speak like a human ~

Multi-modal Paralinguistic Recognition & Modelling

Michael Heck (AIP-OB) [Multimodal representation learning]

Speech Recognition and Synthesis

Speech-to-speech Translation: including translation of paralinguistic information such as emphasis, intonation, and pitch

End-to-End Wav-to-Text ASR with Deep Learning

Machine Speech Chain

Emotion Analysis and Deception Detection

Real-time Text-to-speech Synthesis

Have a nice day!

Sahoko Nakayama (DC) [Code-switching Speech Chain]

Kazuki Tsunematsu (MC) [Speech Prediction]

Yanagita Tomoya (DC) [Incremental TTS]

Nurul Fitria Lubis (D-OB) [Social-Affective Dialogues]

Do Quoc Truong (D-OB) [Paralinguistic Speech-to-Speech Translation]

Takamoto Kano (DC) [Direct End-to-end Speech-to-speech Translation]

Johanes Effendi The (DC) [Multi-modal Translation]

Multilingual ASR for Speech Translation

Real-time Speech Recognition using Video

Recognize speech and output a text!

Zero Resource Speech Challenge

ꦱꦸꦒꦌꦚꦁ

Incorporating Human Cognition into ASR

Andros Tjandra (DC) [Machine Speech Chain]

Marco Vetter (DC) [Lexical Discovery]

Wu Bin (DC) [Zero Resource ASR]

Tourism Information Analytics

Fan Yang (DC) [Scene Recognition]

Mayuko Okamoto (MC) [Entrainment TTS]

Sashi Novitasari (MC) [Incremental ASR]