HUMAN LANGUAGE AND COMMUNICATION:

Joseph Picone, PhD

Intelligent Electronic Systems

Human and Systems Engineering

Department of Electrical and Computer Engineering

Why Is This Research Area Still an Important Challenge?


Perspectives on Human Language and Communication

Abstract and Biography

ABSTRACT: Speech technology has quietly become a pervasive influence in our daily lives despite widespread concerns about research progress over the past 20 years. However, because language is so fundamental to our human existence, the expectations users have for human-computer collaboration have continually outpaced research advances. In this talk, we will review recent research on fundamentally new approaches to speech recognition with an emphasis on machine learning and discrimination. We will then project future research directions in this field based on a historical perspective of progress over the past 50 years. We will conclude with a discussion of why research in this area will have a fundamental impact on computational science far beyond speech or language.

BIOGRAPHY: Joseph Picone is currently a Professor in the Department of Electrical and Computer Engineering at Mississippi State University, where he also directs the Intelligent Electronic Systems program at the Center for Advanced Vehicular Systems. His principal research interests are the development of new statistical modeling techniques for speech recognition. He has previously been employed by Texas Instruments and AT&T Bell Laboratories. Dr. Picone received his Ph.D. in Electrical Engineering from Illinois Institute of Technology in 1983. He is a Senior Member of the IEEE.


• Fundamental challenge: diversity of data that often defies mathematical descriptions or physical constraints.

• Solution: Can we integrate multiple knowledge sources using principles of risk minimization?

Fundamental Challenges: Generalization and Risk

• Why research human language technology?

“Language is the preeminent trait of the human species.”

“I never met someone who wasn’t interested in language.”

“I decided to work on language because it seemed to be the hardest problem to solve.”


• Speech recognition: state of the art; statistical (e.g., HMM); continuous speech; large vocabulary; speaker independent

• Goal: accelerate research. Flexible, extensible, modular; efficient (C++, parallel processing); easy to use (documentation); toolkits, GUIs

• Benefit: technology. Standard benchmarks; conversational speech

Internet-Accessible Speech Recognition (CARE)


[Figure labels: Training, Testing, Optimum]

• Pioneering the use of risk minimization in speech recognition and verification

• First LVCSR systems based on support and relevance vector machines
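The gap between training and testing performance that motivates risk minimization can be sketched on a toy problem. This is an illustration under stated assumptions, not the SVM or RVM systems named above: a k-nearest-neighbor regressor stands in as the model, the data are synthetic, and the names `make_data`, `knn_predict`, `empirical_risk`, and `select_k` are hypothetical. The most flexible setting (k = 1) fits the training set exactly, so the sketch selects the model by held-out risk instead.

```python
import random

def make_data(n, rng):
    """Noisy samples of a smooth 1-D target (synthetic, for illustration only)."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = (2.0 * x - 1.0) ** 2 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def empirical_risk(train, data, k):
    """Mean squared error of the k-NN predictor (fit on `train`) over `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

def select_k(train, held_out, ks):
    """Pick the k with the lowest held-out risk; also return both risk curves."""
    train_risk = {k: empirical_risk(train, train, k) for k in ks}
    test_risk = {k: empirical_risk(train, held_out, k) for k in ks}
    best_k = min(ks, key=lambda k: test_risk[k])
    return best_k, train_risk, test_risk

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    train, held_out = make_data(60, rng), make_data(60, rng)
    ks = [1, 3, 5, 9, 15, 25, 40]  # smaller k means a more flexible model
    best_k, train_risk, test_risk = select_k(train, held_out, ks)
    for k in ks:
        print(f"k={k:2d}  train={train_risk[k]:.4f}  test={test_risk[k]:.4f}")
    print("selected k:", best_k)
```

Training risk alone would always favor k = 1, which is why the selection is made on data the model has not seen.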

Integrating Speech and Natural Language Processing (ITR)

• Integrate speech recognition, prosody and parsing on conversational speech

• Multi-university and multidisciplinary (medium ITR)

• Speech features are highly confusable

• Integration of knowledge (e.g. linguistic context) is crucial
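One simple way to see why linguistic context helps with confusable acoustics is a weighted sum of log scores from two knowledge sources. The sketch below is an assumption-laden illustration, not the project's method: the hypotheses, the log-probability values, and the names `combined_score` and `best_hypothesis` are all made up for the example.

```python
def combined_score(acoustic_logp, lm_logp, lm_weight):
    """Log-linear combination of an acoustic score and a language-model score."""
    return acoustic_logp + lm_weight * lm_logp

def best_hypothesis(hypotheses, lm_weight):
    """Return the text of the hypothesis with the highest combined score."""
    best = max(hypotheses, key=lambda h: combined_score(h[1], h[2], lm_weight))
    return best[0]

# (text, acoustic log-probability, language-model log-probability)
# These numbers are invented: the two phrases are acoustically close,
# but the language model strongly prefers the first.
HYPOTHESES = [
    ("recognize speech", -10.2, -3.0),
    ("wreck a nice beach", -10.0, -9.0),
]

if __name__ == "__main__":
    for weight in (0.0, 1.0):
        print(f"lm_weight={weight}: {best_hypothesis(HYPOTHESES, weight)}")
```

With the language model switched off (weight 0.0) the acoustically closer but implausible phrase wins; with weight 1.0 the contextually likely phrase is chosen.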


Nonlinear Statistical Modeling of Speech (HLC)

Expected outcomes:

• Reduced complexity of statistical models for speech (two orders of magnitude reduction)

• High performance channel-independent text-independent speaker verification/identification

“Though linear statistical models have dominated the literature for the past 100 years, they have yet to explain simple physical phenomena.”

• Motivated by a phase-locked loop analogy

• Application of principles of chaos and strange attractor theory to acoustic modeling in speech

• Baseline comparisons to other nonlinear methods
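A common first step in strange-attractor analysis of a signal is time-delay embedding, which turns a scalar series into vectors that trace out a reconstructed phase space. The sketch below assumes this standard technique for illustration only; the slides do not say which embedding the project uses. The logistic map stands in for a speech signal, and `logistic_map` and `delay_embed` are hypothetical names.

```python
def logistic_map(n, r=3.9, x0=0.5):
    """Generate n samples of the logistic map, a standard toy chaotic signal."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def delay_embed(signal, dim, tau):
    """Return vectors (s[t], s[t+tau], ..., s[t+(dim-1)*tau]) for every valid t."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # samples needed beyond the first coordinate
    return [tuple(signal[t + i * tau] for i in range(dim))
            for t in range(len(signal) - span)]

if __name__ == "__main__":
    signal = logistic_map(200)
    vectors = delay_embed(signal, dim=2, tau=1)
    # Each vector is (s[t], s[t+1]); for this map the points lie on a parabola.
    for v in vectors[:5]:
        print(v)
```

The embedding dimension `dim` and the delay `tau` are free parameters; for real speech they would have to be chosen from the data.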


Applications in Advanced Vehicular Systems (Mississippi)

• Use of dialog to provide on-demand training for workers

• A dialog system must adapt to user stress, confusion, and learning style


An Algorithm Retrospective of Language Technology

[Timeline figure, 1950 to 2020. Labels: Analog Systems; Open Loop Analysis; Discriminative Methods; Expert Systems; Statistical Methods (Generative); Knowledge Integration]

Observations:

• Information theory preceded modern computing.

• Early research focused on basic science.

• Computing capacity has enabled engineering methods.

• We are now “knowledge-challenged.”


[Timeline figure, 1950 to 2020]

• Physical Sciences: Physics, Acoustics, Linguistics

• Cognitive Sciences: Psychology, Neurophysiology

• Engineering Sciences: EE, CPE, Human Factors

• Computing Sciences: Comp. Sci., Comp. Ling.

Observations:

• Field continually accumulating new expertise.

• As obvious mathematical techniques have been exhausted (“low-hanging fruit”), there will be a return to basic science (e.g., fMRI brain activity imaging).

A Historical Perspective of Prominent Disciplines


Evolution of Knowledge and Intelligence in HLT Systems

• The solution will require approaches that use expert knowledge from related, more dense domains (e.g., similar languages) and the ability to learn from small amounts of target data (e.g., autonomic).

[Figure axis labels: Source of Knowledge; Performance]

• A priori expert knowledge created a generation of highly constrained systems (e.g., isolated word recognition, parsing of written text, fixed-font OCR).

• Statistical methods created a generation of data-driven approaches that supplanted expert systems (e.g., conversational speech to text, speech synthesis, machine translation from parallel text).

… but that isn’t the end of the story …

• A number of fundamental problems still remain (e.g., channel and noise robustness, less dense or less common languages).


Historical Synergy Between IIS and HLC

• Speech recognition is now widely acknowledged to be a machine learning problem, but language modeling has not yet embraced advanced statistical models.

• Statistical methods are now dominant in most forms of HLC research where ample amounts of data exist.

• Information extraction (e.g., audio mining) is coming of age, but named entities remain a major challenge.

• General perception that machine translation is at least 5 years behind spoken language in terms of resources, evaluation-driven research, and performance (but catching up quickly).

• Many forms of HLC research remain underfunded (multimodal, multispeaker conferences).


Summary

• Machine learning approaches to human language technology are still in their infancy.

• A mathematical framework for integration of knowledge and metadata will be critical in the next 10 years.

• Information extraction in a multilingual environment will be an emerging market in the next 5 years.

• Mundane problems such as named entity extraction are still major barriers in information extraction.

• It is widely perceived that research progress in machine translation will begin a similar trajectory to speech recognition in the next 10 years.

• This is a time of great opportunity!


Recent relevant peer-reviewed publications:

1. J. Baca and J. Picone, “Effects of Navigational Displayless Interfaces on User Prosodics,” Speech Communication, vol. 45, no. 2, pp. 187-202, Feb. 2005.

2. A. Ganapathiraju, J. Hamaker and J. Picone, “Applications of Support Vector Machines to Speech Recognition,” IEEE Trans. on Signal Proc., vol. 52, no. 8, pp. 2348-2355, August 2004.

3. R. Sundaram and J. Picone, “Effects of Transcription Errors on Supervised Learning in Speech Recognition,” International Conference on Acoustics, Speech, and Signal Processing, pp. 169-172, Montreal, Quebec, Canada, May 2004.

4. I. Alphonso and J. Picone, “Network Training For Continuous Speech Recognition,” to be presented at the 12th European Signal Processing Conference, Vienna, Austria, September 7-10, 2004.

5. J. Baca, F. Zheng, H. Gao, and J. Picone, “Dialog Systems for Automotive Environments,” European Conference on Speech Communication and Technology, pp. 1929-1932, Geneva, Switzerland, September 2003.

6. J. Hamaker, J. Picone, and A. Ganapathiraju, “A Sparse Modeling Approach to Speech Recognition Based on Relevance Vector Machines,” Proceedings of the International Conference of Spoken Language Processing, pp. 1001-1004, Denver, Colorado, USA, September 2002.

Relevant online resources:

1. “Projects,” http://www.isip.msstate.edu/projects/, Intelligent Electronic Systems, Center for Advanced Vehicular Systems, Mississippi State University, Mississippi State, Mississippi, USA, August 2004.

2. “Internet-Accessible Speech Recognition Technology,” http://www.isip.msstate.edu/projects/speech/index.html, August 2004.

3. “About our Software,” http://www.isip.msstate.edu/projects/speech/software/, January 2004.

4. “Nonlinear Statistical Modeling of Speech,” http://www.isip.msstate.edu/projects/nsf_nonlinear/, September 2004.

5. “Cognitive Assessment Using Voice Analysis,” http://www.isip.msstate.edu/projects/voice_analysis/, September 2004.

6. “Fundamentals of Speech Recognition — A Tutorial Based on a Public Domain C++ Toolkit,” http://www.isip.msstate.edu/projects/speech/software/tutorials/production/fundamentals/current/, Aug. 2003.

7. “Speech and Signal Processing Demonstrations,” http://www.isip.msstate.edu/projects/speech/software/demonstrations/index.html, September 2004.

8. “Fundamentals of Speech Recognition,” http://www.isip.msstate.edu/publications/courses/ece_8463/, September 2004.

Recent Publications


• Foundation Classes: generic C++ implementations of many popular statistical modeling approaches

Appendix: Relevant Resources

• Fun Stuff: have you seen our campus bus tracking system? Or our Home Shopping Channel commercial?

• Interactive Software: Java applets, GUIs, dialog systems, code generators, and more

• Speech Recognition Toolkits: compare SVMs and RVMs to standard approaches using a state of the art ASR toolkit
