Technical Aspects of the CALO Recorder

Technical Aspects of the CALO Recorder, by Satanjeev Banerjee, Thomas Quisel, Jason Cohen, Arthur Chan, Yitao Sun, David Huggins-Daines, Alex Rudnicky


Transcript of Technical Aspects of the CALO Recorder

Page 1: Technical Aspects of the CALO Recorder

Technical Aspects of the CALO Recorder

By Satanjeev Banerjee, Thomas Quisel, Jason Cohen, Arthur Chan, Yitao Sun, David Huggins-Daines, Alex Rudnicky

Page 2: Technical Aspects of the CALO Recorder

Role of the CALO recorder

A centralized mechanism to collect all perceptual events (speech, text).

CMU provides technology for:
  Event recording
  Speech recognition

Page 3: Technical Aspects of the CALO Recorder

Role of the CALO Recorder

One of the four components of CAMPER:
  CALO recorder
  Speechalizer
    End-pointing information
    Prosodic information
    Speech recognition
  CAMSeg
    Speech segmentation
  Understanding

Page 4: Technical Aspects of the CALO Recorder

An Architecture Diagram (Client Side)

[Diagram. Components shown: audio capturing, text capturing through keyboard, ring buffers, end-pointer, VU meter, speech decoder, other events, storage.]

Page 5: Technical Aspects of the CALO Recorder

Persistence of Data

Background Intelligent Transfer Service (BITS) is used to transfer data off-line.

Page 6: Technical Aspects of the CALO Recorder

Technical Challenges in the Recorder

  Threading
  Audio buffering
  Time synchronization
  Real-time processing
    End-pointing
    Speech processing
  Portability
  Maintenance/Distribution

Page 7: Technical Aspects of the CALO Recorder

Threading

Several kinds of processing need to run concurrently:
  VU meter
  Speech processing and higher-level understanding
  Graphical user interface

A long development time was invested in making the communication between threads correct. (By Thomas Quisel)

See the architecture diagram on the next slide.

Example issue: on some platforms, the WX implementation does not allow threads other than the GUI thread to call its drawing functions.
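A common way to live with the GUI restriction above is to let worker threads post results to a queue and have only the GUI thread draw. The following is a minimal sketch of that pattern, not the recorder's actual code; the names vu_meter_worker and run_gui_loop are hypothetical, and a plain list stands in for real drawing calls.

```python
import queue
import threading


def vu_meter_worker(levels, events):
    """Worker thread: never draws; it only posts results for the GUI thread."""
    for level in levels:
        events.put(("vu", level))
    events.put(("done", None))  # tell the GUI loop this worker has finished


def run_gui_loop(events, n_workers):
    """Stand-in for the GUI thread: the only place where 'drawing' happens."""
    drawn = []
    finished = 0
    while finished < n_workers:
        kind, payload = events.get()  # blocks until a worker posts something
        if kind == "done":
            finished += 1
        else:
            drawn.append((kind, payload))  # a real GUI would repaint here
    return drawn


def demo():
    events = queue.Queue()  # thread-safe channel between worker and GUI
    worker = threading.Thread(
        target=vu_meter_worker, args=([0.1, 0.5, 0.9], events)
    )
    worker.start()
    drawn = run_gui_loop(events, n_workers=1)
    worker.join()
    return drawn
```

Because every update passes through the queue, the worker never touches GUI state directly, which sidesteps the platform-specific restriction.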

Page 8: Technical Aspects of the CALO Recorder
Page 9: Technical Aspects of the CALO Recorder

Audio Buffering

Sphinx 2 and 3.X libaudio require the application to:
  Capture audio
  Do processing on the audio buffer

If the processing thread is even slightly slower than 1xRT, audio will be lost.

A ring buffer structure is implemented. (By Jason Cohen)
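To illustrate how a ring buffer decouples audio capture from processing, here is a small sketch. It is an assumption about the general technique, not the recorder's implementation: the class name RingBuffer, the overwrite-oldest policy when full, and the dropped counter are all hypothetical choices.

```python
import threading


class RingBuffer:
    """Fixed-capacity FIFO shared by a capture thread and a processing thread."""

    def __init__(self, capacity):
        self._buf = [0] * capacity
        self._capacity = capacity
        self._start = 0   # index of the oldest unread sample
        self._size = 0    # number of unread samples
        self._cond = threading.Condition()
        self.dropped = 0  # samples overwritten because the reader fell behind

    def write(self, samples):
        """Called by the capture thread; never blocks."""
        with self._cond:
            for s in samples:
                end = (self._start + self._size) % self._capacity
                self._buf[end] = s
                if self._size == self._capacity:
                    # Buffer full: the oldest sample was just overwritten.
                    self._start = (self._start + 1) % self._capacity
                    self.dropped += 1
                else:
                    self._size += 1
            self._cond.notify_all()

    def read(self, max_samples, timeout=None):
        """Called by the processing thread; waits up to timeout if empty."""
        with self._cond:
            if self._size == 0:
                self._cond.wait(timeout)
            n = min(max_samples, self._size)
            out = [
                self._buf[(self._start + i) % self._capacity] for i in range(n)
            ]
            self._start = (self._start + n) % self._capacity
            self._size -= n
            return out
```

With enough capacity, short slowdowns in the processing thread are absorbed by the buffer; the dropped counter makes sustained slower-than-real-time processing visible instead of silent.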

Page 10: Technical Aspects of the CALO Recorder

Time Synchronization (By David Huggins-Daines)

Simple NTP (SNTP) is used to get Coordinated Universal Time (UTC) from an arbitrary NTP server.
  A clone of the standard NTP implementation

Internal synchronization
  Synchronization time between machines: 50-60 ms

The major challenge is the delay imposed by the OS and the audio capturing software.
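The clock arithmetic behind an SNTP exchange can be sketched as follows. This uses the standard four-timestamp offset and delay formulas from the NTP/SNTP specifications; the function names are hypothetical, and no network code is shown.

```python
def sntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one SNTP exchange.

    t0: client time when the request was sent
    t1: server time when the request was received
    t2: server time when the reply was sent
    t3: client time when the reply was received
    All values are in seconds. A positive offset means the client is behind.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def corrected_time(local_time, offset):
    """Map a local timestamp onto the server's (UTC) timeline."""
    return local_time + offset
```

For example, a client that is 50 ms behind the server, with 10 ms of network latency each way and 2 ms of server processing, would see t0 = 100.000, t1 = 100.060, t2 = 100.062, t3 = 100.022, giving an offset of about 0.050 s and a delay of about 0.020 s. The formula assumes symmetric latency, so delays added by the OS or the audio capture path are not corrected by it.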

Page 11: Technical Aspects of the CALO Recorder

Real-time Processing

Role of end-pointing and recognition
  After a long debate, a two-stage end-pointing and recognition architecture was chosen. (By Ziad)

A high-performance end-pointing routine was created:
  Gaussian Mixture Model-based end-pointer, implemented as a frame voter within segments
  The parameters are further manually tuned
  Speed optimized
  Now in s3ep, a customized version of Sphinx
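One way to read "frame voter within segments" is sketched below: each frame is classified as speech or non-speech, and each fixed-length window takes the majority label of its frames. This is an illustrative assumption, not the s3ep algorithm. The per-frame log-likelihoods are taken as given inputs (in a GMM-based end-pointer they would come from a speech model and a non-speech model), and the window size and threshold are hypothetical parameters.

```python
def classify_frames(speech_ll, nonspeech_ll):
    """Per-frame decision: speech if the speech model scores higher."""
    return [s > n for s, n in zip(speech_ll, nonspeech_ll)]


def vote_windows(frame_is_speech, window=5, threshold=0.5):
    """Label each window as speech if more than `threshold` of its frames are."""
    labels = []
    for start in range(0, len(frame_is_speech), window):
        chunk = frame_is_speech[start:start + window]
        labels.append(sum(chunk) / len(chunk) > threshold)
    return labels


def endpoints(window_labels, window, n_frames):
    """Merge consecutive speech windows into (start, end) frame ranges.

    The end index is exclusive and is clamped to n_frames.
    """
    segments = []
    seg_start = None
    for i, is_speech in enumerate(window_labels):
        if is_speech and seg_start is None:
            seg_start = i * window
        elif not is_speech and seg_start is not None:
            segments.append((seg_start, min(i * window, n_frames)))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, n_frames))
    return segments
```

Voting over a window smooths out isolated misclassified frames, so only the resulting speech ranges need to be passed on to the second (recognition) stage.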

Page 12: Technical Aspects of the CALO Recorder
Page 13: Technical Aspects of the CALO Recorder

Speech Recognizer

The resulting output is fed to the recognizer.

Speech recognition in meetings
  Regarded as one of the biggest challenges in the field
  Results vary widely with meeting style, number of attendees, topics, and disfluencies of the speakers

Page 14: Technical Aspects of the CALO Recorder

Accuracy

Performance is still under heavy work. Currently:

In the cleanest meeting (Bdb001):
  With one very dominating male speaker
  With one very dominating female speaker
  Speaker speaking-rate entropy is lowest
  Error rate: 29.4%
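The slide does not say how the error rate is computed. Assuming it is the usual word error rate (substitutions, deletions, and insertions divided by the number of reference words), a minimal sketch of the calculation looks like this; the function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For instance, scoring the hypothesis "the meeting start at at noon" against the reference "the meeting starts at noon" gives one substitution and one insertion over five reference words, a rate of 0.4.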

Page 15: Technical Aspects of the CALO Recorder

Phase IV of Accuracy Improvement (Core)

  Boosting-based training
  Confidence-based N-best re-ranking
  Speaker adaptation based on transformation
  Speaker normalization
  Include BN, SWB material in LM training
  Dictionary refinement

Page 16: Technical Aspects of the CALO Recorder

Phase IV of Accuracy Improvement (Optional)

  STC
  MLLT
  DT
  PLP, TRAP
  LM with disfluencies and back-channeling

Page 17: Technical Aspects of the CALO Recorder

Speed (2.2G machine)

Task          System       Error rate   Speed
Communicator  S2           17.3%        0.34xRT
Communicator  S3.X BL      11.8%        4xRT
Communicator  S3.X Tuned   12.8%        0.87xRT
WSJ 5k        S3.X BL      7.4%         1.61xRT
WSJ 5k        S3.X BL      8.3%         0.5xRT

ICSI: with tuning of SVQ and CIGMMS, 0.7xRT is achieved. We may possibly tune up the results further. Benchmarking results need time to prepare.
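The xRT figures are real-time factors: processing time divided by the duration of the audio processed, so values below 1.0 mean the decoder keeps up with live input. A minimal sketch of the arithmetic follows; the function names and the sample durations are illustrative, not measurements from the recorder.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return the xRT value: time spent decoding per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(xrt):
    """A decoder can run on live audio only if it is no slower than 1xRT."""
    return xrt <= 1.0
```

For example, decoding 100 seconds of audio in 34 seconds corresponds to 0.34xRT, while taking 400 seconds corresponds to 4xRT, which is too slow for live use.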

Page 18: Technical Aspects of the CALO Recorder

Maintenance and Distribution

All code is in a local CVS repository (C, Java).

Will soon move to SRI.

Regular releases are created; use of SRI’s CVS will blur this line.

Page 19: Technical Aspects of the CALO Recorder

Conclusion

Engineering work is mostly done for the recorder

Time to improve individual components.

Everyone is welcome to join the effort.