Saras Shareable Rich Media Learning Object Repositories and Management for e-Learning

Chitra Dorai, IBM T.J. Watson Research Center, New York, [email protected]

(Saras(wati), a Sanskrit word for flow of knowledge/Goddess of Learning)

Overview of e-Learning Content Management Research

E-learning media semantic analysis for metadata generation

SCORM and MPEG-7 conformant asset metadata model

Search and browse client interfaces

[Figure: system architecture. Components shown: content types (text, images, course catalogs, student assessments, audio, video); LO ingest; Content Manager with asset repositories; E-Learning Media Analyzer producing metadata; SCORM / MPEG-7 data model; search and browse client; learning management system; learning authoring tool.]

[Figure: narrative structure hierarchy of video. Narration sections: on-screen narration (Direct Narration, DN; Assistive Narration, AN) and voice over (Uninterrupted Voice Over, UV; Interrupted Voice Over, IV). Discussion sections (DD): dialog, interviews, ... Linkage sections (LF): raw footage, text, ...]

Multimodal narrative structure analysis for partitioning of instructional media

Manage learning assets of various types

Middleware for shareable learning object repositories

Metadata model creation from XML schema

Project Goals

Develop SCORM support technologies

• Enable generic content repositories (CMv8 and DB2) to support standards-compliant e-learning and transform them into shareable and interoperable learning object repositories

• Analyze instructional media for automated SCORM/MPEG-7 compliant metadata generation

E-Learning and Standards

• The Department of Defense (DoD) established the Advanced Distributed Learning (ADL) initiative in 1997.

• ADL develops strategy for using learning and information technologies to modernize education and training on the Web, and to promote e-learning standardization.

• SCORM (Shareable Content Object Reference Model): ADL reference model for shareable learning content objects that enable interoperability, accessibility and reusability of Web-based learning content.

• Content Aggregation Model: LO Metadata, Content Packaging

• SCORM is built on many e-Learning standardization efforts: AICC, IMS, IEEE LOM (became a standard in 06/02), ARIADNE.

SCORM LOM Overview

• Nine learning object metadata categories from the IEEE LOM specification: General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, and Classification

• IMS’s XML binding specification for metadata representation

• Describes three content model components: Asset, Sharable Content Object (SCO), Content Aggregation
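As a rough illustration of the nine-category layout, the sketch below builds a skeletal metadata record with Python's standard `xml.etree.ElementTree`. The element names are simply lower-cased category names and the sample title is made up; both are assumptions for illustration, not the element names or structure required by the IMS XML binding.

```python
import xml.etree.ElementTree as ET

# The nine IEEE LOM categories listed on the slide.
LOM_CATEGORIES = [
    "General", "Lifecycle", "Meta-metadata", "Technical", "Educational",
    "Rights", "Relation", "Annotation", "Classification",
]


def build_lom_skeleton(title):
    """Return an <lom> element with one empty child per LOM category.

    Element names are illustrative (lower-cased category names), not the
    names mandated by the IMS XML binding.
    """
    root = ET.Element("lom")
    for category in LOM_CATEGORIES:
        ET.SubElement(root, category.lower().replace("-", ""))
    # Hypothetical placement: store the title under the "general" category.
    ET.SubElement(root.find("general"), "title").text = title
    return root


record = build_lom_skeleton("Sample training video segment")
print(ET.tostring(record, encoding="unicode"))
```

A real SCORM record would fill each category with the elements the binding defines; the skeleton only shows where the nine groups sit.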

Enabling Content Repositories for e-Learning

Objective:

Develop middleware tools to enable content management products (IBM CM v8) and databases (DB2) for standards-based e-Learning archival and for supporting SCORM-compliant learning object metadata.

Creation of SCORM compliant learning object meta-data model on a repository

Automated storage of learning objects and their meta-data in the content repository

Search and retrieval of learning objects based on their meta-data
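The three middleware functions above (metadata model creation, storage, and metadata-based search) can be sketched in a few lines. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for DB2 or Content Manager; the table layout, column names, and sample records are hypothetical and far simpler than a full SCORM metadata model.

```python
import sqlite3


def create_repository():
    """Create an in-memory store; SQLite stands in for DB2/Content Manager."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE learning_objects ("
        " id INTEGER PRIMARY KEY, title TEXT, component TEXT,"
        " keyword TEXT, content BLOB)"
    )
    return conn


def ingest(conn, title, component, keyword, content):
    """Store a learning object together with its metadata; return its id."""
    cur = conn.execute(
        "INSERT INTO learning_objects (title, component, keyword, content)"
        " VALUES (?, ?, ?, ?)",
        (title, component, keyword, content),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, keyword):
    """Return titles of learning objects whose keyword metadata matches."""
    rows = conn.execute(
        "SELECT title FROM learning_objects WHERE keyword = ? ORDER BY id",
        (keyword,),
    )
    return [title for (title,) in rows]


repo = create_repository()
ingest(repo, "Intro narration", "Asset", "safety", b"...")
ingest(repo, "Panel discussion", "Asset", "interview", b"...")
ingest(repo, "Closing narration", "SCO", "safety", b"...")
print(search(repo, "safety"))
```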

E-Learning Content Management with Content Manager

Meta-data Generation Pages

Automated Instructional Media Analysis

Objectives:

• Develop technologies for standards-based e-learning content tagging, supporting shareable and searchable learning object repositories with rich media.

• Rich instructional media analysis for automated extraction of learning objects and their metadata from media for content-based search and browse

Problem with the State of the Art

“The user seeks semantic similarity, the [multimedia] database can only provide similarity on data processing”

• Existing content annotation/management systems cannot ensure reliable content location and access

– Fall far short of the expectations of users: semantic gap

– Generic, low-level annotations that deal only with characterizing perceived content, not the meaning of it

– Lack of structure in content organization for non-linear navigation

Our Approach to Media Semantics Analysis

New Research Approach: Computational Media Aesthetics is the algorithmic study of visual and aural elements in media and associated analysis of the principles that underlie their manipulation in the creative art of clarifying and interpreting some event for an audience.

Best semantic grid for media interpretation is that within which its creators work - Derive meaning from the production grammar, aesthetic conventions used

Create tools for understanding high-level semantic constructs in a domain by interpreting the data with its maker’s eye, exploiting media production methods for their perceptual and interpretive guidance.

[Figure: Content Repository, Media Semantic Analyzer, and Metadata, shown with the narrative structure hierarchy of video: discussion sections (DD), narration sections comprising on-screen narration (DN, AN) and voice over (UV, IV), and linkage sections (LF).]

Example 1 - Multimodal analysis for extracting hierarchy of narrative structures in education/training video

Focus Areas:

• Motion picture analysis for affect and story essence using film grammar (recognized with best paper awards)

• E-learning: multimodal algorithms to parse and structure audiovisual content in media for content distillation and nonlinear browsing

• Multigranular media narrative segmentation to generate and annotate reusable assets

Tempo in Titanic: tempo ebb and flow and associated story elements and events automatically deconstructed

Example 2 - Titanic Movie Analysis for Tempo

Example: Narrative Structure Based Segmentation of Education and Training Videos

Problem Statement: Automatically structuralize instructional media through high-level semantics-based video partitioning and content tagging for effective segment search, access, and browse services in e-learning content management systems

Joint Work with Dinh Q. Phung and Svetha Venkatesh, Curtin University of Technology, W. Australia

Narrative Structures Hierarchy

• Narration Sections

– On-screen Narration: Direct Narration, Assistive Narration

– Voice Over: Un-interrupted VO, Interrupted VO

• Discussion Sections: dialog, interviews, …

• Linkage Sections: raw footage, text, …
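The hierarchy can be written down as a small nested structure. The sketch below assumes the two-letter labels used later in the confusion matrices map onto the leaves as DN, AN, UV, IV, DD, and LF; that mapping is inferred from the figure labels rather than stated explicitly on the slides.

```python
# Narrative structure hierarchy; leaf values are the two-letter class labels.
# The label-to-leaf mapping is inferred from the slide figure.
HIERARCHY = {
    "Narration sections": {
        "On-screen narration": {
            "Direct narration": "DN",
            "Assistive narration": "AN",
        },
        "Voice over": {
            "Uninterrupted voice over": "UV",
            "Interrupted voice over": "IV",
        },
    },
    "Discussion sections": "DD",
    "Linkage sections": "LF",
}


def leaf_labels(node):
    """Collect the leaf labels of the hierarchy in depth-first order."""
    if isinstance(node, str):
        return [node]
    labels = []
    for child in node.values():
        labels.extend(leaf_labels(child))
    return labels


print(leaf_labels(HIERARCHY))
```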

Narrative Structures Hierarchy: Discussion Sections

Capture dialog, interviews, meeting sections.

Narrative Structures Hierarchy: On-Screen Narration

Clear view of a narrator speaking in the scene.

Dominated by narrator’s face and captured in a close-up.

Interrupted presence of the narrator.

Narrative Structures Hierarchy: Voice Overs

The audio track is dominated by the voice of the narrator, but without their appearances (no faces)

Voice over may be smooth and continuous (un-interrupted VO) or interrupted (interrupted VO).

Narrative Structures Hierarchy: Linkage Sections

Raw footage, superimposed text, and others.

Visual Processing

• S = {f1, f2, … , fN}: Sequence of frames from shots in a video for face detection

• Detect faces in frames using CMU’s face detector software

Feature 1: How many faces -- “How many frames contain faces as a proportion of the total frames in a shot ?”

Feature 2: Avg. face areas -- “If there is a face, how big is the face?”

• Two frame sequences from a shot are used: Uniformly sampled and key frames sequence
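The two face features can be computed from per-frame detector output roughly as follows. This is a minimal sketch: the input format (a list of face areas per frame), the choice of the largest face in each frame, and the normalization by frame area are assumptions, since the slides do not specify them, and no particular face detector is invoked.

```python
def face_features(face_areas_per_frame, frame_area):
    """Compute the two face features for one shot.

    face_areas_per_frame: one list per frame holding the pixel areas of the
        faces detected in that frame (empty list if no face was found).
    frame_area: pixel area of a full frame, used to normalize face size.
    """
    total_frames = len(face_areas_per_frame)
    if total_frames == 0:
        return 0.0, 0.0
    frames_with_faces = [areas for areas in face_areas_per_frame if areas]
    # Feature 1: proportion of frames that contain at least one face.
    face_ratio = len(frames_with_faces) / total_frames
    # Feature 2: average face area over frames with a face. Using the
    # largest face per frame and dividing by frame_area are assumptions.
    if frames_with_faces:
        largest = sum(max(areas) for areas in frames_with_faces)
        avg_area = largest / (len(frames_with_faces) * frame_area)
    else:
        avg_area = 0.0
    return face_ratio, avg_area


# Four sampled frames; two of them contain faces.
shot = [[1200], [], [1000, 300], []]
print(face_features(shot, frame_area=10000))
```

The same function could be applied to both frame sequences mentioned above (uniformly sampled frames and key frames) to obtain one feature pair per sequence.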

Audio Processing

• Classify shot audio into voice (V), no-voice (N) or mixture of two (M)

“Is the voice consistently delivered?”

New voice connectivity feature: number of contiguous speech-dominant clips normalized by the shot length.

Characterize dominance of speech in audio tracks of shots

• Cluster audio clips into two classes and assume that the larger cluster contains the speech-dominated clips

• N = total # of audio clips within a shot

Nv = # of clips classified as voice-dominated

Va = voice activity = Nv/N
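A minimal sketch of the two audio features, given per-clip voice/non-voice decisions for a shot. Voice activity follows the definition Va = Nv/N above. For voice connectivity, the sketch reads the feature as the longest run of consecutive voice-dominated clips divided by the number of clips; that reading is an assumption, since the slide wording leaves the exact definition open.

```python
def voice_features(clip_is_voice):
    """Return (voice activity, voice connectivity) for one shot.

    clip_is_voice: list of booleans, True when a clip is voice-dominated.
    """
    n = len(clip_is_voice)
    if n == 0:
        return 0.0, 0.0
    nv = sum(clip_is_voice)
    voice_activity = nv / n  # Va = Nv / N
    # Assumed reading of connectivity: longest contiguous run of
    # voice-dominated clips, normalized by the number of clips in the shot.
    longest = current = 0
    for is_voice in clip_is_voice:
        current = current + 1 if is_voice else 0
        longest = max(longest, current)
    return voice_activity, longest / n


clips = [True, True, False, True, True, True, False, True]
print(voice_features(clips))
```

Under this reading, two shots with the same voice activity can still differ in connectivity, which is what separates smooth from interrupted delivery.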

Classification

• Decision Trees as machine learning classifiers for final labeling of narrative structures

• C4.5 algorithm to train and test decision trees

• First learn all six classes at the first children level and test accuracy of labeling

• Propose a two-level decision tree for improved performance
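The two-level idea can be sketched as a routing function: a first classifier separates DD, DN, and AN from a merged group G, and a second classifier is consulted only for shots sent to G. The toy classifiers below are hypothetical stand-ins with invented thresholds, not trained C4.5 trees, and the feature names are assumptions.

```python
def classify_two_level(features, top_level, group_level):
    """Label a shot with a two-level scheme.

    top_level(features) returns one of "DD", "DN", "AN" or "G".
    group_level(features) returns one of "UV", "IV", "LF" and is consulted
    only when the first level answers "G".
    """
    label = top_level(features)
    if label == "G":
        label = group_level(features)
    return label


# Hypothetical stand-ins for trained decision trees; thresholds are invented
# and the first level is drastically simplified (it never returns DD or AN).
def toy_top_level(f):
    return "DN" if f["face_ratio"] > 0.8 else "G"


def toy_group_level(f):
    if f["voice_activity"] > 0.9:
        return "UV"
    if f["voice_activity"] > 0.5:
        return "IV"
    return "LF"


shot = {"face_ratio": 0.1, "voice_activity": 0.6}
print(classify_two_level(shot, toy_top_level, toy_group_level))
```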

Experimental Results: Confusion Matrix for Six Classes

   a    b    c    d    e    f
  10    1    0    1    0    0   a = DD
   0   29    0    3    0    0   b = DN
   0    0   12    2    0    0   c = AN
   0    2    0  480    0    0   d = UV
   2    0    2   22    0    0   e = IV
   0    1    0   13    0    0   f = LF

• Average classification result is high: 91.6%
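The reported figure can be checked directly from the six-class confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, as in the usual C4.5-style output; it computes overall accuracy and per-class recall.

```python
LABELS = ["DD", "DN", "AN", "UV", "IV", "LF"]

# Six-class confusion matrix from the slide.
# Assumption: rows are actual classes, columns are predicted classes.
CONFUSION = [
    [10, 1, 0, 1, 0, 0],
    [0, 29, 0, 3, 0, 0],
    [0, 0, 12, 2, 0, 0],
    [0, 2, 0, 480, 0, 0],
    [2, 0, 2, 22, 0, 0],
    [0, 1, 0, 13, 0, 0],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal of a confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Fraction of each actual class that was labeled correctly."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }


print(f"{100 * accuracy(CONFUSION):.1f}%")  # 91.6%
print(per_class_recall(CONFUSION, LABELS))
```

Under that assumption, the recall for IV and LF is zero: every instance of those classes is assigned to another class, mostly UV, even though overall accuracy is high.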

Experimental Results (cont.)

• Results are very good for classes DD, DN, AN and UV, but poor for classes IV and LF

• Voice-over shots in which many faces are present (meetings, parties, ...) account for most of the misclassifications

• Solution: group IV, LF and UV into a group G and study separately

   a    b    c    G
  10    1    0    1   a = DD
   0   29    0    3   b = DN
   0    0   12    2   c = AN
   2    3    2  515   G

Accuracy with IV, LF and UV merged into G: 97.6%
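Merging UV, IV, and LF into a single group G amounts to summing the corresponding rows and columns of the six-class confusion matrix. The sketch below does that and recomputes accuracy, again assuming rows are actual classes and columns are predicted classes.

```python
LABELS = ["DD", "DN", "AN", "UV", "IV", "LF"]
SIX_CLASS = [
    [10, 1, 0, 1, 0, 0],
    [0, 29, 0, 3, 0, 0],
    [0, 0, 12, 2, 0, 0],
    [0, 2, 0, 480, 0, 0],
    [2, 0, 2, 22, 0, 0],
    [0, 1, 0, 13, 0, 0],
]


def merge_classes(matrix, labels, group, group_name):
    """Collapse the classes in `group` into one class called `group_name`."""
    new_labels = [l for l in labels if l not in group] + [group_name]
    # Map every original label to its index in the merged label list.
    index = {
        l: new_labels.index(group_name if l in group else l) for l in labels
    }
    size = len(new_labels)
    merged = [[0] * size for _ in range(size)]
    for i, actual in enumerate(labels):
        for j, predicted in enumerate(labels):
            merged[index[actual]][index[predicted]] += matrix[i][j]
    return merged, new_labels


merged, merged_labels = merge_classes(SIX_CLASS, LABELS, {"UV", "IV", "LF"}, "G")
correct = sum(merged[i][i] for i in range(len(merged)))
total = sum(sum(row) for row in merged)
print(merged_labels, merged)
print(f"{100 * correct / total:.1f}%")  # 97.6%
```

The confusions among UV, IV, and LF move onto the diagonal of G, which is why first-level accuracy rises; separating those three classes is deferred to the second level.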


• Over-fitting is the problem identified in G due to UV instances outnumbering IV and LF

• To mitigate the problem, reduce the number of UV instances so that IV, UV and LF have approximately the same number of instances, then train with C4.5

   a    b    c
 424   40   18   a = UV
  14   10    2   b = IV
   7    1    6   c = LF

Accuracy: 84.3%
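Reducing the number of UV instances before training can be done by random undersampling. This is a minimal sketch: the class counts are the row totals of the matrix above, while the target size (equal to the IV count), the fixed seed, and the placeholder features are assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(samples, majority_label, target_size, seed=0):
    """Randomly keep at most `target_size` samples of the majority class.

    samples: list of (features, label) pairs.
    """
    rng = random.Random(seed)
    majority = [s for s in samples if s[1] == majority_label]
    others = [s for s in samples if s[1] != majority_label]
    if len(majority) > target_size:
        majority = rng.sample(majority, target_size)
    return others + majority


# Class sizes taken from the row totals above; features are placeholders.
samples = [(None, "UV")] * 482 + [(None, "IV")] * 26 + [(None, "LF")] * 14
balanced = undersample(samples, "UV", target_size=26)
counts = Counter(label for _, label in balanced)
print(dict(counts))
```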

Conclusion

• Novel narrative structure based analysis for segmentation of education and training videos

• Hierarchical DT-classification system achieves an overall accuracy of 84.7%

• Focus on higher level semantics such as segmentation of topics

• Work is underway

– Map media objects to LOs

– Algorithms for support of both SCORM and MPEG-7 compliant XML metadata

Acknowledgements

Team:

• Geetika Tewari (IBM TJW, currently at Harvard U)

• Norman Haas (IBM TJW)

• Austin Schilling (IBM SWG)