Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5....

111
Speaker: Hung-yi Lee Towards Spoken Knowledge Structuring and Organization When Speech Processing Technology meets MOOCs

Transcript of Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5....

Page 1: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speaker: Hung-yi Lee

Towards Spoken Knowledge

Structuring and Organization

When Speech Processing Technology

meets MOOCs

Page 2: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Introduction

2012 is the year of the massive open online

course (MOOC)

Instructors post recorded video/audio of their lectures on

online lecture platforms.

Learners worldwide can easily access the curricula.

More learning materials

Page 3: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Too much materials ……

Learner

I want to learn “SVM”.

Page 4: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

A course is too much

Learner

I want to learn “SVM”.

Machine Learning(by Andrew Ng)

XII. Support Vector

Machines (Week 7)

Learning From Data

(Yaser S. Abu-Mostafa)

Lecture 14: Support

Vector Machines

Lecture 15: Kernel

Methods

機器學習基石(by 林軒田)

Page 5: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

A course is not enough

Inter-discipline

Learner

I want to learn

“amino acid”.

Introduction to Biology

Amino acid

Amino acid

Organic Chemistry

Page 6: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Vision: Personalized Courses

I want to learn “XXX”.

I am a graduate student of

computer science.

I can spend 10 hours.Learner

I open a course for you.

on-line learning

material

Spoken Language Processing techniques can be very

helpful.

The spoken content in courses plays the most

important role in conveying the knowledge.

Page 7: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Outline

Part I: Overview each block in spoken knowledge

structuring and organization

Speech Recognition

Temporal Structure

Spoken content retrieval

Linking related lectures

Speech summarization

Knowledge graph construction

Inferring prerequisite and advanced concepts

Part II: Spoken Content Retrieval

Part III: Speech Summarization

Part IV: Demo

Page 8: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Part I: Overview

Page 9: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Speech Recognition

Recognition

Output

Multimedia

(Audio/Video)

Page 10: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Recognition

Lectures on Coursera and edX has manual

transcriptions

Most lectures on the Internet do not have

transcriptions

Speech recognition!

Page 11: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Recognition

Speech

Recognition

Speech Recognition is the foundation of the following

speech techniques

Spoken Lectures

DNA is the material

of ……

Hidden Markov Model

N-gram Language Models

Page 12: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Recognition

Speech

Recognition

Speech Recognition is the foundation of the following

speech techniques

Spoken Lectures

DNA is the material

of ……

N-gram Language Models

Deep Neural Network

Page 13: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Recognition

Speech

Recognition

Speech Recognition is the foundation of the following

speech techniques

Spoken Lectures

DNA is the material

of ……

Deep Neural Network Continues Language Model

Page 14: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Multi-layer Temporal Structure

Recognition

Output

Multimedia

(Audio/Video)

Temporal Structure

Page 15: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Multi-layer temporal structure

lectures

course

ch 1 ch 2 ……

Learner

Too long ……

Page 16: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Multi-layer temporal structure

lectures

sections

paragraphs

course

ch 1 ch 2

Paragraph

1

Paragraph

2

Paragraph

3 ……

……

……

audio

Not Directly

Available

several

utterances

several

utterances

several

utterances

Find automatically

Page 17: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Spoken Content Retrieval

Spoken

Content

Retrieval

Recognition

Output

Multimedia

(Audio/Video)

Temporal Structure

Page 18: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Basic goal: return paragraphs or sections containing

keywords

This is called “Spoken Term Detection” (口語詞彙偵測)

Spoken Content Retrieval – Goal

“DNA”

user

sections

Paragraph

1

Paragraph

2

Paragraph

3 ……

……

several

utterances

several

utterances

several

utterances

paragraphs

audio

Page 19: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Basic goal: return paragraphs or sections containing

keywords

This is called “Spoken Term Detection” (口語詞彙偵測)

Advanced goal: Semantic retrieval of spoken content

Spoken Content Retrieval – Goal

user

“Inheritance

Material”

I know that the user is

looking for “DNA”.

Retrieval system

Page 20: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Visualizing Search Results

Spoken

Content

Retrieval

Recognition

Output

Multimedia

(Audio/Video)

Temporal Structure

Easy to browseLinking Similar

Lectures

Page 21: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Learner

Search

With spoken content retrieval, we can use keywords

to search related lectures

Electronic

Textbooks

Course AInheritance

Lecture 8-2 Lecture 8-3 Lecture 8-4

Lecture 5-4 Lecture 5-5 Lecture 5-6

Course B

“DNA”

Page 22: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

DNA

ReplicationDNA

Structure

Learner

Linking

Linking lectures with similar content

Compute similarity between lectures in courses and sections in

textbooks

Merge the materials with high cosine similarity

Course AInheritance

Lecture 8-2 Lecture 8-3 Lecture 8-4

Lecture 5-5 Lecture 5-6

Course BLecture 5-4

“DNA”

Page 23: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Speech Summarization

Syntactic and Semantic

Analysis

Speech

Summarization

Spoken

Content

Retrieval

Recognition

Output

Multimedia

(Audio/Video)

Temporal Structure

Easy to browseLinking Similar

Lectures

Page 24: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Summarization

Audio is hard to browse

Lecture

SummarySelect the most informative segments to

form a compact version

40 minutes

1 minute

Page 25: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Knowledge Graph Construction

Syntactic and Semantic

Analysis

Speech

Summarization

Spoken

Content

Retrieval

Key Term

Extraction

Recognition

Output

Knowledge

Graph

Multimedia

(Audio/Video)

Temporal Structure

Easy to browseLinking Similar

Lectures

Page 26: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Audio transcriptions of video clips

Knowledge Graph Construction

- Keyterm Extraction

Knowledge graph construction

Keyterm extraction

Keyterm

ExtractionDNA

Inheritance

RNA

Adenosine triphosphate

……

Page 27: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Knowledge Graph Construction

- Relation Extraction

Knowledge graph construction

Keyterm extraction

Find relation between keyterms

Co-reference

Resolution

Transcriptions As DNA encodes RNA,

it is the material of inheritance.

Page 28: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Syntactic

Parsing

Knowledge Graph Construction

- Relation Extraction

Knowledge graph construction

Keyterm extraction

Find relation between keyterms

Co-reference

Resolution

Transcriptions As DNA encodes RNA,

it is the material of inheritance.DNA

Syntactic

parsing tree

Page 29: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Relation

Extraction

Syntactic

Parsing

Knowledge Graph Construction

- Relation Extraction

Knowledge graph construction

Keyterm extraction

Find relation between keyterms

Co-reference

Resolution

Transcriptions

relation

Syntactic

parsing tree

[Mausam, EMNLP’12].

Page 30: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Knowledge Graph Construction

- Relation Extraction

Knowledge graph construction

Keyterm extraction

Find relation between keyterms

Knowledge

Graph

DNA

RNA

Protein

GenomeGene

Nucleotide

Inheritance

Knowledge Graph

Page 31: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Inferring Learning Path

Syntactic and Semantic

Analysis

Speech

Summarization

Spoken

Content

Retrieval

Key Term

Extraction

Recognition

Output

Knowledge

Graph

Multimedia

(Audio/Video)

Temporal Structure

Easy to browseInferring

Learning Path

Linking Similar

Lectures

Page 32: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Inferring Learning Path

Inferring prerequisite and advanced concepts

Construct a knowledge graph

DNA

RNA

Protein

GenomeGene

Nucleotide

Inheritance

First mentioned

in Lecture 1

First mentioned

in Lecture 3

Analyze the positions where the concepts are mentioned

the first time in a course

First mentioned

in Lecture 10

Page 33: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Inferring Learning Path

Inferring prerequisite and advanced concepts

Construct a knowledge graph

DNA

RNA

Protein

GenomeGene

Nucleotide

Inheritance

First mentioned

in Lecture 1

First mentioned

in Lecture 3

“Inheritance” is the prerequisite concept of “DNA”

“RNA” is the advanced concept of “DNA”

First mentioned

in Lecture 10

Page 34: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Part II:

Spoken Content Retrieval

Page 35: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

People think ……

Spoken Content Retrieval

Speech Recognition

+

Text Retrieval=

Page 36: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Recognition + Text Retrieval

Speech

Recognition

Text

Retrieval

ResultText

Retrieval Query learner

Spoken Lectures

Models

Spoken Content Retrieval

= Speech Recognition + Text Retrieval

Page 37: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech recognition has uncertainty

Speech Recognition + Text Retrieval

……….. DNA ...

x1

R(x1)=0.9

x2

R(x2)=0.3

x1 0.9

x2 0.3

…user

“DNA”

DNA ………….

Page 38: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Is the problem solved?

Speech

Recognition

Text

Retrieval

ResultText

Retrieval Query learner

Spoken Lectures

Models

The retrieval performance seriously degrades with

inevitable recognition errors.

In real application, speech recognition accuracy can be

low.

Page 39: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Is the problem solved?

Speech

Recognition

Text

Retrieval

ResultText

Retrieval Query learner

Spoken Lectures

Models

To make retrieval performance less limited by recognition

errors

We need new ideas beyond cascading speech recognition

and text retrieval.

Page 40: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

My point

Spoken Content Retrieval

Speech Recognition

+

Text Retrieval=

Page 41: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Beyond Cascading Speech

Recognition and Text Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 42: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Beyond Cascading Speech

Recognition and Text Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 43: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Similarity

Is it truly “DNA”?

Recognition

ModelsAcoustic Models

Language Model

Lots of inaccurate assumption

Speech

Recognition

Page 44: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Similarity

“DNA”

“DNA”

“DNA”

similarity

It is not realistic to find examples for all queries.

Use Pseudo-relevance Feedback (PRF)

Is it truly “DNA”?

Page 45: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Retrieval

System

Pseudo Relevance Feedback (PRF)

Query Q

First-pass Retrieval Resultx1

x2 x3

Recognition

Output

Page 46: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Retrieval

System

Pseudo Relevance Feedback (PRF)

Query Q

Confidence scores

R(x1)

First-pass Retrieval Resultx1

x2 x3

R(x2) R(x3)

Not shown to the user

Recognition

Output

Page 47: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Retrieval

System

Pseudo Relevance Feedback (PRF)

Query Q

R(x1)

First-pass Retrieval Resultx1

x2 x3

R(x2) R(x3)

Assume the result with high confidence scores as correct

Examples of Q

Considered as examples of Q

Recognition

Output

Page 48: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Retrieval

System

Pseudo Relevance Feedback (PRF)

Query Q

R(x1)

First-pass Retrieval Resultx1

x2 x3

R(x2) R(x3)

similar dissimilar

Examples of Q

Recognition

Output

Page 49: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Retrieval

System

Pseudo Relevance Feedback (PRF)

Query Q

R(x1)

First-pass Retrieval Resultx1

x2 x3

R(x2) R(x3)

time 1:01

time 2:05

time 1:45

time 2:16

time 7:22

time 9:01

Rank according to new scores

Examples of Q

Recognition

Output

Page 50: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

How to compute the similarity of two audio

segments?

Similarity between Audio Segments

Use a feature vector to present a short time span.

similarity

… …

……

Page 51: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

How to compute the similarity of two audio

segments?

Similarity between Audio Segments

A audio segment is a sequence of feature vectors.

similarity

… … … … … … … … … … …

Page 52: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Similarity between Audio Segments

Dynamic Time Warping (DTW)

… … … … … … …

Page 53: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(A) (B)

Digital Speech Processing (DSP) of NTU based on lattices

Pseudo Relevance Feedback (PRF)

- Experiments

Evaluation Measure: MAP (Mean Average Precision)

Page 54: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(A) (B)

Pseudo Relevance Feedback (PRF)

- Experiments

(B): speaker independent (50% recognition accuracy)

(A): speaker dependent (84% recognition accuracy)

(A) and (B) use different speech recognition systems

Page 55: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(A) (B)

PRF (red bars) improved the first-pass retrieval

results with lattices (blue bars)

Pseudo Relevance Feedback (PRF)

- Experiments

Page 56: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Graph-based Approach

In PRF, each result considers the similarity to some

examples

Consider the similarity between all results

Formulated as a problem on graph

Page 57: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Graph Construction

The first-pass results is considered as a graph.

Each retrieval result is a node

First-pass Retrieval

Result from lattices

x1

x2

x3

x2

x3

x1

x4

x5

Page 58: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Graph Construction

The first-pass results is considered as a graph.

Nodes are connected if their retrieval results are similar.

x2

x3

x1

x4

x5Dynamic Time Warping

(DTW) Similarity

similar

Page 59: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Changing Confidence Scores by Graph

The score of each node depends on its neighbors.

x2

x3

x1

x4

x5

R(x1)

R(x2)

R(x3)

R(x5)R(x4)

high

high

近朱者赤

近墨者黑

Page 60: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Changing Confidence Scores by Graph

The score of each node depends on its neighbors.

x2

x3

x1

x4

x5

R(x1)

R(x2)

R(x3)

R(x5)R(x4)

low

low

近朱者赤

近墨者黑

Page 61: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

The score of each node depends on its connected

nodes.

x2

x3

x1

x4

x5

Changing Confidence Scores by Graph

Score of x1 depends on

the scores of x2 and x3

Page 62: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

The score of each node depends on its connected

nodes.

x2

x3

x1

x4

x5

Changing Confidence Scores by Graph

Score of x1 depends on

the scores of x2 and x3

Score of x2 depends on

the scores of x1 and x3

Page 63: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

The score of each node depends on its connected

nodes.

x2

x3

x1

x4

x5

……

Changing Confidence Scores by Graph

Score of x1 depends on

the scores of x2 and x3

Score of x2 depends on

the scores of x1 and x3

The scores are found by random walk algorithm.

Page 64: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Digital Speech Processing (DSP) of NTU based on lattices

(A) (B)

Graph-based Approach -

Experiments

(B): speaker independent (low recognition accuracy)

(A): speaker dependent (high recognition accuracy)

Page 65: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(A) (B)

Graph-based re-ranking (green bars) outperformed PRF (red

bars)

Graph-based Approach -

Experiments

Page 66: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Graph-based Approach –

Experiments on Babel Program

Join Babel program (巴別塔計畫) at MIT

Evaluation program of spoken term detection

More than 30 research groups divided into 4

teams

Spoken content to be retrieved are in special

languages

Page 67: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Graph-based Approach –

Experiments on Babel Program

3 out of 4 teams used this approach

Speech recognition system is based on deep neutral networks

Page 68: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

New Directions for

Spoken Content Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 69: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Online search engine optimizes performance by user

relevance feedback

E.g. click-through data [T Joachims, SIGKDD 02]

User Relevance Feedback

time 1:10 T

time 2:01 F

time 3:04 T

time 5:31 T

time 1:10 T

time 2:01 F

time 3:04 T

time 5:31 F

time 1:10 F

time 2:01 F

time 3:04 T

time 5:31 T

Query Q1 Query Q2 Query Qn

……

Page 70: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Update Recognition Models

Speech

Recognition Models

TextText

Retrieval Query learner

Spoken

Content

Retrieval

Result

update

optimize

time 1:10

time 2:01

time 3:04

time 5:31 T

time 1:10 T

time 2:01 F

time 3:04

time 5:31

time 1:10 F

time 2:01

time 3:04

time 5:31

Query Q1 Query Q2 Query Qn

……

Page 71: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

New Directions for

Spoken Content Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 72: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Query Expansion

for Text Retrieval

learner

“Inheritance material”

Retrieval

system

Search “Inheritance material” and “DNA”

To handle the problem of semantic retrieval,

retrieval system will expand the user query.

Page 73: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Query Expansion

for Spoken Content Retrieval

learner

“Inheritance material”

Search “Inheritance material” and “DNA”

Expand the queries by speech signals

Speech signals

of “DNA”

Retrieval

system

Page 74: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Query Expansion

for Spoken Content Retrieval

learner

“Inheritance material”

Search “Inheritance material” and “DNA”

Retrieval

system

Recognition

output (Text)

Speech signals

of “DNA”

Match on speech signal level

Page 75: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

New Directions for

Spoken Content Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 76: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Spoken Content Retrieval without

Speech Recognition

Spoken Queries

Spoken

Lectures

user

“DNA”

Match on speech

signal level

Page 77: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

New Directions for

Spoken Content Retrieval

Incorporating Information Lost in Standard Speech

Recognition

Improving Recognition Models by User Relevance

feedback

Query Expansion with Speech Signals

Spoken Content Retrieval without Speech

Recognition

Interactive Retrieval

Page 78: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Interactive Retrieval

Model the interactive retrieval process as

Markov Decision Process (MDP)

user

Deep Neural

Network

Find something

related to “speech”?

Page 79: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Part III:

Speech Summarization

Page 80: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

MMR approach

Maximum marginal relevance (MMR)

approach

Unsupervised approach: Use heuristic rules to

select utterances

Select utterances whose content are similar to

the whole lectures

Minimize redundancy in summary at the same

time

Page 81: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Supervised Approach

Training dataLecture 1

Lecture 2

Lecture 3

2nd and 4th utterances

form the summary

3rd utterances form the

summary

1st and 2nd utterances

form the summary

Use the training data to learn model

for summarization

Page 82: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Supervised Approach

– Binary Classification

Summarization problem can be formulated as a

binary classification program

Included in the summary or not

utterance 1

utterance 2

utterance 3

utterance 4

Binary

Classifier-1

+1

+1

-1

utterance 2

utterance 3

classification

result

summary

Binary

ClassifierBinary

ClassifierBinary

Classifier

Lecture

Page 83: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Supervised Approach

– Binary Classification

Training data

Lecture 1

Lecture 2

Lecture 3

The utterances in the

summary are positive

examples.

Otherwise, negative

examples

Train a binary classifier

Page 84: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Supervised Approach

– Binary Classification

Binary classifier individually considers each utterance

To generate a good summary, “global information” should be

considered

Example: summary should be concise

大家好 ……

LSA就是 Latent semantic analysis

LSA用來強化 summarization

LSA可以用來強化 summarization

我再說一次

……

…….

LSA可以用來強化 summarization

Spoken DocumentSummary

More advanced machine learning techniques

Page 85: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Globally considering the whole

spoken lectures

Learn a special model by structured learning

techniques

Input: whole lecture

Output: summary

Special

ModelLecture

Summary

Consider the

whole lecture

3 utterances

selected in

summary

Page 86: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation Function

Evaluation function of utterance set F(s)

s: utterance set in a lecture

F(s) 10

評分utterance

set s

how suitable it is to

consider utterance set

s as the summary

Properties:

• Concise?

• Include

keyword?

• Short enough?

……….

How good it is to take this

utterance set as summary?

Lecture

Page 87: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation Function

– How to summary

With F(s), we can do summarization on new

lectures now

Lecture

s1

s2

s3

s4

s5

s6

s7

Compute F(s) for

all utterance sets

If s6

maximizes

F(s)

summary

Enumerate all

the possible

utterance set s

Page 88: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation Function

Evaluation function of utterance set F(s)

s: utterance set in a lecture

F(s) 10

評分utterance

set s

how suitable it is to

consider utterance set

s as the summary

Properties:

• Concise?

• Include

keyword?

• Short enough?

……….

How good it is to take this

utterance set as summary?

Lecture

What properties

should F(s) check?

Page 89: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Learn F(s) from training data

Reference

summary

Reference

summary

Evaluation Function - Training

9

7

-4

highFind F(s) such that

lecture

Training data

F(s)

F(s)

F(s)

Structured SVM: I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support

Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004.

Page 90: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Summarization - Structure

Temporal structure helps summarization

lectures

sections

paragraphs

ch 1

Paragraph

1

Paragraph

2

Paragraph

3 ……

……

audio

several

utterances

several

utterances

several

utterances

Page 91: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Summarization - Structure

Temporal structure helps summarization

Long summary: consecutive utterances in a

paragraph are more likely to be

Short summary: one utterance is selected on behalf

of a paragraph.

…𝑥𝑖+3𝑥𝑖−2 𝑥𝑖−1 𝑥𝑖+6𝑥𝑖+4 𝑥𝑖+5

…𝑥𝑖+3𝑥𝑖−2 𝑥𝑖+6𝑥𝑖+4

Important paragraph

Representative of the paragraph

𝑥𝑖+5𝑥𝑖 𝑥𝑖+1 𝑥𝑖+2𝑥𝑖−1

𝑥𝑖 𝑥𝑖+1 𝑥𝑖+2

Paragraph 1 Paragraph 2 Paragraph 3

Paragraph 1 Paragraph 2

Page 92: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation Function - Structure

Add structure information into evaluation function

of utterance set F(s)

F(s) 100評分

Properties:

• Concise?

• Include

keyword?

• Short enough?

……….

utterances

Given the

information of

structure

Paragraph 1 Paragraph 2

Page 93: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Summarization - Structure

Structure in text are clear

Paragraph boundaries are directly known

For spoken content, there is no obvious

structure

Here the structure are considered as “hidden

variables”

Structured learning with hidden variables

Page 94: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Speech Summarization -

Experiments

Evaluation Measure: ROUGE-1 and ROUGE-2

Larger scores means the machine-generated summaries

is more similar to human-generated summaries.

Page 95: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Part IV:

Demo

Page 96: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

On-line lecture platforms (MIT)

“Cang-Jie (倉頡)”:

Search lecture recording and textbook

Linking video clips or textbook sections with similar

content

Inferring prerequisite and advanced concepts

http://people.csail.mit.edu/tlkagk/Cangjie/

Page 97: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Concluding Remarks

Page 98: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

(Multilingual)

Speech

Recognition

Towards Spoken Knowledge

Structuring and Organization

Syntactic and Semantic

Analysis

Speech

Summarization

Spoken

Content

Retrieval

Key Term

Extraction

Recognition

Output

Knowledge

Graph

Multimedia

(Audio/Video)

Temporal Structure

Visualizing

Search ResultEasy to browse

Inferring

Learning Path

Page 99: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Ultimate Goal

I want to learn “XXX”.

I am a graduate student of

computer science.

I can spend 10 hours.Learner

I open a course for you.

on-line learning

material

Personalized course for each learner

Page 100: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Thank You for Your Attention

Page 101: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Appendix

Page 102: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Video Demonstration

Page 103: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Paragraph Boundaries

With speech recognition, we know the content of

each utterances

Compute their similarities

Find the boundary of paragraph such that

The content of the utterances in a paragraph is

similar

Paragraph

1

Paragraph

2

Paragraph

3 ……paragraphs

audio

Paragraph

4

Page 104: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Slide Boundaries

The slides are modeled as HMMs

Align the slides with paragraphs

Paragraph

1

Paragraph

2

Paragraph

3 ……paragraphs

audio

Paragraph

4

s1 s2

Page 105: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

Utterance set s fulfill the above

requirement should have large F(s)

Page 106: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

sx

i

i

xIsF

I(xi): importance of utterance xi

ii xfwxI f(xi): feature of sentence xi

Page 107: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s,D)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

Di sx

ixIsF

I(xi): importance of utterance xi

ii xfwxI f(xi): feature of sentence xi

Lexical feature: use speech recognition

to transcribe each utterance into text

similarity to the transcriptions of whole

lectures

latent topic distribution

how many keywords

……

Prosodic feature:

Energy, pitch, syllable duration,

pause duration

weights for each feature

Page 108: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

λ is a parameter to be determined.

sx sxx

jii

i ji

xxSimxIsF,

,

Sim(xi, xj): similarity between utterances xi and xj

Page 109: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

sx sxx

jii

i ji

xxSimxIsF,

,

sx

i

i

KxL(constraint)

L(xi): length of utterance xi

K: length constraint of summary

Page 110: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Evaluation function F(s)

A good summary should

1. include the most important utterance

2. but minimize the redundancy at the same time

3. not too long

sx sxx

jii

i ji

xxSimxIsF,

,

sx

i

i

KxL(constraint)

ii xfwxI Jointly learn from

training data

Page 111: Towards Spoken Knowledge Structuring and Organizationtlkagk/slide/MyTalk_NTUCS... · 2016. 5. 11. · Towards Spoken Knowledge Structuring and Organization When Speech Processing

Idea of Training

Training data

D1

D2

Reference

summaryR1

Reference

summaryR2

……

Find w and λ in F(s) such that

F(R1) > F(sD1)

F(R2) > F(sD2)

……

sD1 is all utterance set in D1, except R1

sD2 is all utterance set in D2, except R2

I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for

Interdependent and Structured Output Spaces, ICML, 2004.