Speech Recognition: Technology & Patent Landscape

© iRunway 2015 Page 2 of 36

Public

Contents

1 Executive Summary

2 Introduction

2.1 Technology Overview

3 Patent Landscape and Analysis

3.1 Leading Patent Owners

3.2 Technology-wise Patent Distribution of Top Assignees

3.3 Seminal Patent Landscape

4 Application in Mobile Devices & Automobiles

4.1 Speech Recognition Technology in Mobile Devices

4.1.1 Patent Landscape of Mobile Device-Related Applications

4.1.2 Key Players in the Mobile Devices Sector

4.1.3 Seminal Patents in Mobile Device Applications

4.2 Speech Recognition Technology in Automobiles

4.2.1 Patent Landscape of Automobile-Related Applications

4.2.2 Key Players in the Automobile Sector

4.2.3 Seminal Patents in Automobile Applications

5 Litigation Trend in Patent Landscape

6 Conclusion

7 Glossary

7.1 Technology Category

7.2 Mobile Device Application Categories

7.3 Automobile Application Categories

8 Authors

1 Executive Summary

The ability to interface with a machine using natural human language has fascinated the

scientific world for many decades. Recent virtual assistants such as Apple’s Siri have

demonstrated the promise of a comfortable future with speech recognition and voice-enabled

processes penetrating household and industrial applications. While the success rate so far in the

world market has been negligible, this may soon change.

Microsoft’s Cortana, available in Windows 10 for mobile phones, tablets and, importantly,

desktop units, may dramatically advance the virtual assistant experience, providing Microsoft

with a significant edge over competitors. Unlike previous virtual assistants, Cortana is now

tailored to local languages, customs and cultures, and to the corresponding nuances of speech.

Researchers have struggled to build a platform that interprets and responds to voice commands

with accuracy and efficiency. While technology developers such as Nuance Communications

have developed a large speech recognition technology patent portfolio in recent years, Microsoft

and others have focused on linguistics, building massive dictionaries of vocabulary through

neural networks and cloud-based architecture. It appears that linguistics may be the key to

reducing processing time and providing a more seamless user experience.

This research report examines speech recognition technology and its patent landscape in the

U.S. market, providing an overview of key audio signal processing techniques and identifying

the IP strengths and weaknesses of top companies. The report also provides in-depth analysis

of two widely used speech recognition applications – mobile devices and automobiles.

iRunway’s analysis found 21,281 granted patents, with major industry players such as Microsoft,

Nuance Communications, AT&T and IBM leading the list of top patent holders. While Microsoft

owns the largest patent portfolio in linguistics technology with 15% of the seminal, or strong,

patents in this space, Nuance Communications dominates the recognition category with 8.5%

seminal patents. Sony owns 7.4% of seminal patents in storage and transmission technology.

Another force of change will soon arrive as at least 172 seminal patents belonging to the

leading 10 seminal patent owners expire in 2016, bringing them into the public domain. This will

likely prompt a new wave of development in the speech recognition domain with a dramatic

impact on the application of this technology. This is likely to reduce licensing costs and make

speech recognition technology more easily available to the larger market.

Mobile devices are emerging as commercially successful ubiquitous tools to perform multiple

human activities through voice commands. iRunway’s analysis found 3,209 patents related to

application of speech recognition technology for mobile devices. AT&T, Nuance

Communications, Microsoft, IBM and Google are the key patent holders in this space. Apple

does not have a large speech recognition patent portfolio and licenses patents from other

industry leaders, including Nuance, which powers Siri.

A growing market for smart vehicles has also turned the automobile industry into a key

application area for speech recognition technology. The analysis found 648 speech recognition patents

that were applicable to automotive and vehicular systems. Until the turn of the century,

automobile manufacturers were relying heavily on technology companies such as Nuance and

Microsoft to implement speech recognition functionalities in vehicles. In recent times, Apple and

Google have emerged as two major players vying for a large market share with their CarPlay

and Android Auto speech-controlled infotainment systems respectively. However, many auto

manufacturers have begun developing these technologies in-house. Denso, General Motors and

Honda are three leading patent owners in this space. Other leading owners of patents in

speech-enabled communication, navigation, maneuver, data presentation and techniques to

cancel environmental noise include Nuance, Microsoft, Alpine Electronics and AT&T.

2 Introduction

Designing a system that mimics human behavior, particularly the capability of speaking

naturally and responding interactively to spoken commands, has intrigued engineers and

scientists for centuries. Today, speech recognition has stepped into every realm, including

mobile phones, telecommunications, healthcare, banking, speech-controlled automobile

maneuvering, speech-based web browsing, robotics, virtual personal assistants, aviation,

military, education, accessibility aids, security, and media and entertainment, to name a few.

2.1 Technology Overview

Speech is technically defined as a sequence of basic units called phonemes. Automated Speech

Recognition (ASR) systems convert analog speech signals received through microphones to

digital signals that are segmented to retrieve phonemes. Using the phoneme sequence, the ASR

system refers to the vocabulary and grammar rules to decipher words or phrases. Processing

speech signals includes removing noise, reducing errors in recognizing phoneme patterns and

resolving ambiguity arising from variations in speech accent, pitch and speed.
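The dictionary lookup step described above can be illustrated with a minimal sketch. It assumes the signal has already been segmented into phonemes, and it uses a toy pronunciation dictionary with ARPAbet-like symbols; the entries, the `LEXICON` name and the greedy longest-match strategy are illustrative assumptions, not the method of any system discussed in this report.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPAbet-like symbols).
LEXICON = {
    ("K", "AO", "L"): "call",
    ("HH", "OW", "M"): "home",
    ("P", "L", "EY"): "play",
    ("M", "Y", "UW", "Z", "IH", "K"): "music",
}

# Longest pronunciation in the dictionary bounds the lookup window.
MAX_LEN = max(len(pron) for pron in LEXICON)


def decode(phonemes):
    """Greedy longest-match lookup of a phoneme sequence against LEXICON."""
    words, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate chunk first, then shrink it.
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += n
                break
        else:
            # No dictionary entry starts at this phoneme; mark it and move on.
            words.append("<unk>")
            i += 1
    return words


print(decode(["K", "AO", "L", "HH", "OW", "M"]))  # ['call', 'home']
```

A real recognizer would not commit greedily to one segmentation; it would weigh competing word hypotheses with statistical models, as the following paragraphs describe.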

Statistical models and grammatical rules require exemplary training data and a large volume of

vocabulary, words and phrases. With growing vocabulary and advances in semantic analyses,

speech recognition is striving to achieve more accuracy. Today's speech recognition systems

use powerful and complicated statistical modeling techniques to determine the most likely

outcome. Each system implements grammatical rules based on a specific dictionary to deduce

speech from phonemes.

Acoustic modeling and language modeling are important parts of modern statistically-based

speech recognition algorithms. Hidden Markov Models (HMMs) and neural network analyses are

widely used methods to recognize speech. Language modeling is also used in natural language

processing applications such as document classification or statistical machine translation.
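As a rough illustration of how an HMM is used for recognition, the sketch below runs Viterbi decoding over a two-state toy model. The states, the quantized "low"/"high" acoustic observations and all probabilities are invented for the example; production systems use far larger models trained on real speech data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM."""
    # Log probabilities avoid numeric underflow on long sequences.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: a vowel-like state "AH" and a fricative-like state "S".
states = ("AH", "S")
start_p = {"AH": 0.6, "S": 0.4}
trans_p = {"AH": {"AH": 0.7, "S": 0.3}, "S": {"AH": 0.4, "S": 0.6}}
emit_p = {"AH": {"low": 0.8, "high": 0.2}, "S": {"low": 0.1, "high": 0.9}}

print(viterbi(["low", "low", "high"], states, start_p, trans_p, emit_p))
# ['AH', 'AH', 'S']
```

The acoustic model supplies the emission probabilities and the language model constrains which state or word sequences are plausible; Viterbi decoding combines the two to pick the most likely sequence.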

Performance and accuracy of speech recognition systems are crucial in industrial and

commercial applications. With the evolution of distributed computing and cloud-based data

storage, handling huge dictionaries and complex models to improve accuracy and performance

is now a reality.

Advancements in speech recognition techniques and their applications have manifested in

increased patenting activity over the years. iRunway has analyzed patenting activity in this

technology sector for the past 20 years. The speech recognition landscape can be classified into

four major technology domains and multiple sub-categories as shown in Figure 1.

Please refer to the Glossary for a detailed understanding of the taxonomy.

Figure 1: Speech Recognition Technology Taxonomy

3 Patent Landscape and Analysis

Since 1994, the USPTO has granted 21,281 patents across categories of this technology as

shown in Table 1 [1].

Level 1 Categories                  Level 2 Categories                         Number of Patents

Recognition (8,579)                 Word recognition                           2,044
                                    Speech to text conversion                  1,521
                                    Correction and pattern matching            2,082
                                    Models and algorithms                      2,817
                                    Speech recognition processes                 909
                                    Voice recognition                          1,340
                                    Speech recognition application             2,479

Storage and transmission (7,815)    Analysis and prediction                    2,653
                                    Data storage and distributed computing     1,094
                                    Audio signal compression and expansion     2,176
                                    Speech transformation                      2,352
                                    Enhancement and correction                 3,102
                                    Storage and transmission application         633

Linguistics (3,519)                 Translation                                1,721
                                    Natural language                           1,002
                                    Dictionary building                          343
                                    Theoretical interpretation                   595
                                    Multilingual support                         417
                                    Linguistics application                      122

Synthesis (1,368)                   Text to speech conversion                  1,055
                                    Models and mathematical computations         802
                                    Speech synthesis processes                   483
                                    Speech synthesis application                 189

Table 1: Technology-wise Distribution of Speech Recognition Patents (Source: iRunway analysis based on patent data from USPTO)
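The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the table (the variable names are arbitrary) and shows that the four Level 1 totals add up to the 21,281 patents in the landscape, while the Level 2 counts within each Level 1 category add up to more than that category's total, which is consistent with the report's note that one patent may be counted in several Level 2 categories.

```python
# Counts copied from Table 1.
LEVEL1 = {
    "Recognition": 8579,
    "Storage and transmission": 7815,
    "Linguistics": 3519,
    "Synthesis": 1368,
}
LEVEL2 = {
    "Recognition": [2044, 1521, 2082, 2817, 909, 1340, 2479],
    "Storage and transmission": [2653, 1094, 2176, 2352, 3102, 633],
    "Linguistics": [1721, 1002, 343, 595, 417, 122],
    "Synthesis": [1055, 802, 483, 189],
}

print(sum(LEVEL1.values()))  # 21281

for category, total in LEVEL1.items():
    level2_sum = sum(LEVEL2[category])
    # A Level 2 sum above the Level 1 total indicates multi-category counting.
    print(f"{category}: Level 1 total {total}, Level 2 sum {level2_sum}")
```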

The complexity of speech recognition technologies and their implementation, coupled with

innovation needed to evolve efficient speech assisted systems for commercial or research

purposes, has resulted in aggressive patenting activity over the years.

[1] An IP asset may be counted in more than one Level 2 category, depending on its relevance and theme.

The patent filing trend in the evolution tree below shows that the primary areas of focus include

methods to recognize speech from audio signals (recognition) and mechanisms to efficiently

store and transmit data [2]. Figure 2 charts the evolution of the speech recognition patent

portfolio across categories, with each circle representing the number of US patents filed in a

corresponding year in each category.

Figure 2: Evolution of the Speech Recognition Patent Portfolio in the U.S. Region (Source: iRunway analysis based on patent data from USPTO)

There has been intensive research in the last two decades to improve models and algorithms to

recognize speech commands from analog audio signals. The Hidden Markov Model (HMM) and

its variations are the most widely exploited techniques in this domain. In fact, transforming analog speech signals

to digital data and performing pattern matching analysis to distinguish words and voices to

activate applications have emerged as the key motivations for filing patents.

Patenting activity in linguistics technology indicates that translation is a widely researched field.

The need for large vocabularies to develop proficient language translation mechanisms is being

addressed through distributed storage and transmission techniques. This points to the fact that

efficiency and precision of speech recognition is dependent on the advancement of precise

mathematical models and effective ways of data retrieval, storage and transmission.

[2] The number of patent applications for 2013 and 2014 may be higher than indicated in the charts in

this paper, as it takes an average of 18 months for a patent application to be published.

3.1 Leading Patent Owners

Speech recognition technology has garnered much interest from leading corporations including

Nuance Communications, Microsoft, Samsung, Apple and Google. Though Microsoft is the

largest patent holder, Nuance has been a strong player in this field. Nuance has been filing and

acquiring patents since the early stages of research and development in speech recognition

technology. It acquired a portfolio of thousands of speech recognition patents and pending

applications from IBM in 2009 [3], while Microsoft recorded accelerated patenting activity between

2003 and 2007.

According to Wired magazine [4], Apple has made significant investments to create a next-generation

backbone for its virtual assistant, Siri, through a neural network paradigm. A neural

network is a machine learning model that simulates the learning capabilities of neurons in the

human brain to deliver advanced artificial intelligence. The magazine also reported that IBM,

Microsoft and Google have deployed neural network technology in various speech-related

applications. Microsoft is using a neural network to power real-time translation features that are

expected to be introduced in Skype [5]. Similarly, Google is pursuing R&D in neural networks [6]

to offer a more efficient and accurate [7] Google Now application [8].

Nuance Communications is said to power some of the most appreciated commercial products

like Apple’s Siri [9] and Ford’s in-car communication and infotainment system, SYNC [10]. Each

year, millions of cars are fitted with Nuance’s voice, text to speech and natural language

understanding solutions. Leading automobile manufacturers like Audi, BMW, Chrysler, Ford,

General Motors, Hyundai and Toyota have been using Nuance’s Dragon Drive voice solution to

support a range of functionalities like voice dialing, message dictation, navigation, local

business search, music search and climate control in their automobiles [11].

Figure 3 charts the 10 leading companies in this domain. In recent years, Google, IBM and

AT&T have emerged as significant applicants of speech recognition inventions, indicating strong

research focus. IBM is a dominant applicant. In 2006, it entered into a partnership with Nuance

Communications to deploy speech recognition technology across various application areas.

[3] https://gigaom.com/2009/01/15/nuance-takes-on-microsoft-and-google-with-ibm-deal/

[4] http://www.wired.com/2014/06/siri_ai/

[5] http://www.wired.com/2014/05/microsoft-skype-translate

[6] http://www.wired.com/2013/02/android-neural-network/

[7] http://research.google.com/pubs/SpeechProcessing.html

[8] http://www.forbes.com/sites/roberthof/2013/05/01/meet-the-guy-who-helped-google-beat-apples-siri/

[9] http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/

[10] http://www.nuance.com/company/news-room/press-releases/ND_003932

[11] http://www.businesswire.com/news/home/20130530005460/en/Nuance-Accelerate-Successful-Automotive-Business-Acquisition-Tweddle#.VNL_KmiUd1U

Figure 3: Top 10 Speech Recognition Technology Patent Assignees (Source: iRunway analysis based on patent data from USPTO)

Figure 4 charts the patent filing trend of the top five companies in this domain.

Figure 4: Patent Filing Trend of Top 5 Assignees in Speech Recognition Technology (Source: iRunway analysis based on patent data from USPTO)

3.2 Technology-wise Patent Distribution of Top Assignees

iRunway analyzed the distribution of patents across categories for the top 10 assignees to

understand their R&D and IP focus. Nuance Communications owns the maximum number of

patents in models and algorithms. It also owns considerably large portfolios in speech to text

conversion, text to speech conversion, word recognition and voice recognition processes.

Microsoft has a distributed portfolio, dominating patented technologies in translation and

linguistics. It also owns intrinsic patents in models and algorithms, word recognition, correction

and pattern matching, and signal enhancement and correction categories. Since 2009, Google

has emerged as a strong player in recognition technology. Consumer electronics companies

Panasonic, Samsung and Sony own strong portfolios in audio signal storage and transmission

technologies. As represented in Figure 5 below, Nuance is the clear leader in developing

recognition techniques, while Microsoft has a distributed portfolio of patents in recognition,

linguistics and signal transmission categories.

Figure 5: Technology-wise Distribution of Speech Recognition Patents for Top 10 Assignees (Source: iRunway analysis based on patent data from USPTO)

AT&T holds the largest number of patents in synthesis technology, with a majority of them

related to text to speech conversion. Most of these patents focus on various technical

processes, mathematical models and algorithms for text to speech conversion.

Table 2 charts the leaders and competitors in each of the categories in Level 2. Numbers in red

indicate the maximum number of patents owned by an assignee in each Level 2 category. As

mentioned earlier, Nuance owns the highest number of patents across all Level 2 Categories in

Recognition technology.

Table 2: Patent Distribution of Top 10 Assignees across Level 2 Categories (Source: iRunway analysis based on patent data from USPTO)

3.3 Seminal Patent Landscape

iRunway analyzed the 21,281 speech recognition patents granted by the USPTO to identify

seminal, or strong, patents using its proprietary solution COMPASS(SM). Patent strength was

determined based on multiple parameters such as number of independent and dependent

claims, geographical presence, patent classifications, backward and forward references, age of

the patent, etc. The top 10% of patents were considered as seminal and important for further

analysis. Microsoft, Nuance Communications, AT&T, IBM, Apple and Google are the leading

assignees in terms of seminal patents. However, companies such as Digimarc, Lucent, Cisco

and Canon have remarkably strong seminal portfolios in this landscape.
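The general idea of scoring patents on several parameters and keeping the top 10% can be sketched as follows. This is not the COMPASS methodology, which is proprietary and not described in this report; the `Patent` fields, the weights and the linear scoring formula are all hypothetical and chosen only to make the ranking step concrete.

```python
from dataclasses import dataclass


@dataclass
class Patent:
    number: str
    independent_claims: int
    dependent_claims: int
    forward_citations: int
    backward_citations: int
    jurisdictions: int  # hypothetical proxy for geographical presence


# Hypothetical weights for illustration only; not taken from the report.
WEIGHTS = {
    "independent_claims": 2.0,
    "dependent_claims": 0.5,
    "forward_citations": 1.5,
    "backward_citations": 0.2,
    "jurisdictions": 1.0,
}


def score(patent):
    """Weighted sum of the patent's parameters."""
    return sum(weight * getattr(patent, name) for name, weight in WEIGHTS.items())


def seminal(patents, top_fraction=0.10):
    """Return the highest-scoring fraction of patents (at least one)."""
    ranked = sorted(patents, key=score, reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    return ranked[:cutoff]


# Usage with 20 synthetic patents that differ only in forward citations.
portfolio = [Patent(f"P{i:02d}", 2, 10, i, 5, 1) for i in range(20)]
print([p.number for p in seminal(portfolio)])  # ['P19', 'P18']
```

Applied to the full landscape of 21,281 patents, a 10% cutoff corresponds to roughly 2,100 seminal patents.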

Table 3 lists the number and percentage of seminal patents in the portfolios of the top 20

assignees.

Assignees                 Number of Seminal Patents    Percentage of Seminal Patents in their Portfolios

Microsoft                 144                          9.8%
Nuance Communications     80                           6.6%
Sony                      73                           12.4%
Digimarc                  59                           72%
IBM                       59                           8.1%
AT&T                      48                           5.5%
Apple                     41                           15%
Google                    39                           6.8%
Cisco                     31                           17%
Intel                     31                           13.8%
Canon                     29                           14.3%
LG                        26                           8.4%
Ericsson                  26                           14.7%
Qualcomm                  25                           9.3%
Hewlett-Packard           24                           16.7%
Panasonic                 24                           4.8%
Samsung                   22                           4.6%
Philips N.V               22                           12.7%
Lucent                    21                           39.6%
Fujitsu                   18                           8.2%

Table 3: Share of Seminal Patents within the Portfolios of the Top 20 Assignees (Source: iRunway analysis based on patent data from USPTO)
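Table 3 gives both a count and a share of seminal patents, so an approximate portfolio size can be backed out for each assignee. The sketch below does this for a few rows; the results are only estimates, because the published percentages are rounded, and the function name is arbitrary.

```python
# (seminal patents, share of portfolio in percent), copied from Table 3.
TABLE3 = {
    "Microsoft": (144, 9.8),
    "Nuance Communications": (80, 6.6),
    "Digimarc": (59, 72.0),
    "Lucent": (21, 39.6),
}


def implied_portfolio(seminal_count, share_percent):
    """Estimate total portfolio size from the seminal count and its share."""
    return round(seminal_count * 100 / share_percent)


for assignee, (count, share) in TABLE3.items():
    print(assignee, implied_portfolio(count, share))
# Microsoft 1469
# Nuance Communications 1212
# Digimarc 82
# Lucent 53
```

Read this way, Digimarc and Lucent stand out because most or a large part of a small portfolio is seminal, whereas Microsoft and Nuance lead on absolute numbers drawn from much larger portfolios.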

Nuance Communications owns the maximum number of seminal patents in recognition

technology (8.5% of all seminal patents in the category), followed by Microsoft (6.2%) and

AT&T (3.9%). Microsoft dominates in linguistics technology with 15% of seminal patents in this

category, followed by IBM that owns 5.5%. Apple and Nuance Communications own 4.2%

seminal patents each in linguistics. Sony is the predominant leader in storage and transmission

technology, owning around 7.4% of the total seminal patents in this category.

Figure 6 illustrates the seminal patent distribution of the top 10 assignees across Level 1

categories.

Figure 6: Distribution of Seminal Patents in the Portfolios of the Top 10 Assignees (Source: iRunway analysis based on patent data from USPTO)

Interestingly, a number of these seminal patents will expire next year, bringing critical speech

recognition patents into the public domain. Figure 7 illustrates the number of seminal patents

owned by the top 10 assignees vis-à-vis the number of them expiring by 2016:

Figure 7: Number of Seminal Patents Expiring by 2016 in the Portfolios of Top 10 Seminal Patent Owners (Source: iRunway analysis based on patent data from USPTO)

4 Application in Mobile Devices & Automobiles

Speech recognition technology has enabled a wide range of applications for industrial and

domestic purposes. The need to automate and provide user convenience has triggered the

adaptation of speech recognition technology in all facets of everyday applications. These include

intelligent personal agents, customer care wizards, call center automated attendants, voice

access to universal directories and registries, unconstrained dictation and language translation

capabilities, speech assistance for people with disabilities, speech-enabled security systems,

speech-controlled medical equipment, and automobile maneuvering and navigation systems, among others.

The technology has found its most extensive demand, however, in mobile phones, handheld portable

devices and automobiles.

4.1 Speech Recognition Technology in Mobile Devices

The most widespread application of speech recognition is implemented through mobile devices,

smartphones, portable devices and digital assistants. Mobile devices have transformed from

mere voice calling devices to more advanced personal computing devices that allow users to

perform numerous applications like browsing the Internet, capturing photographs, language

translation, navigation guidance, and managing entertainment or multimedia content to name a

few. Speech recognition applications on mobile devices include voice activated initiation of

applications and speech synthesis and/or speech-based guidance for the user.

4.1.1 Patent Landscape of Mobile Device-Related Applications

An analysis of the 21,281 speech recognition patents revealed that 3,209 patents were related

to application of speech recognition technology for mobile devices. Methods and processes that

allow users to instruct and interact with mobile devices to perform various tasks and

applications intended to provide hands-free experiences have been a major area of focus. The design

of voice input and output hardware and software platforms to eliminate noise and increase the

accuracy of interpretation has enabled widespread implementation of speech recognition

capabilities on mobile devices.

Voiceprint is an important source of biometric authentication that is increasingly being

employed for confidential transactions. Also, the use of voice commands to locate data on mobile devices is

a growing area of research. Figure 8 charts the distribution of these patents. Please refer to

Section 7.2 of the Glossary for a detailed understanding of each category.

Figure 8: Distribution of Speech Recognition Patents across Mobile Device Application Categories (Source: iRunway analysis based on patent data from USPTO)

Hands-free computing has been the major area of focus since 1995. Patenting is fairly evenly

distributed across the other categories, with transcription and text-to-speech applications

growing into areas of research interest.

4.1.2 Key Players in the Mobile Devices Sector

Speech recognition capabilities on mobile devices have attracted major technology development

companies, whether mobile phone manufacturers or platform developers. In the race to

provide better services to users, companies have invested heavily in this area.

Patenting activity in the mobile devices application segment has been aggressive. Leading

mobile device manufacturers, telecommunications providers and technology platform providers

are actively patenting in speech recognition, synthesis and audio signal processing. In 2004 and

2008, IBM, AT&T, Microsoft and Nuance engaged in heavy patenting activity, with Nuance

leading the list both years. While AT&T has been a steady applicant, Google picked up pace in

2010, in line with the development and launch of its Android-based Nexus mobile device series.

Figure 9 below shows that AT&T and Nuance are close competitors in this space with

comparable portfolio sizes. Interestingly, in 2006 Nuance mortgaged 34 of these patents to

UBS Stamford Branch. UBS, a banking entity, holds 160

patents in this domain, 54 of which are seminal and make up 33.8% of its portfolio.

Figure 9: Top 10 Assignees in Speech Recognition Technology for Mobile Devices (Source: iRunway analysis based on patent data from USPTO)

It is common knowledge that Nuance powers Apple’s smart personal assistant, Siri [12]. Though

Apple doesn’t have a large speech recognition portfolio, it licenses patents from other players. A

detailed analysis of the top five assignees reveals that while all major players are focusing on

hands-free computing, AT&T has shown special interest in interactive voice response systems

(IVRS), authentication and identity management, and transcription and text-to-speech

application categories. Nuance owns significant patents in language learning and dictionary

application categories. Figure 10 charts the distribution of patents of the top five assignees:

Figure 10: Patent Distribution of the Top 5 Assignees in Mobile Devices Categories (Source: iRunway analysis based on patent data from USPTO)

[12] http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/

Microsoft and Google have emerged as major patent applicants in the last five years, racing

ahead of Nuance Communications. In order to understand the specific research and business

interest of the top five assignees, iRunway analyzed the pending applications as well.

Figure 11 below shows that Google has been focusing on developing hands-free computing,

language learning and dictionary applications, and transcription & text-to-speech applications.

AT&T, on the other hand, is investing in developing authentication & identity management

technology, and transcription & text-to-speech applications.

There is much competition among inventions related to language learning and dictionary

application technologies, with Nuance Communications, Google and Microsoft fighting for a

larger share of the pie.

Figure 11: Pending Patent Applications of Top 5 Assignees across Mobile Device Categories (Source: iRunway analysis based on patent data from USPTO)

4.1.3 Seminal Patents in Mobile Device Applications

An analysis of the seminal patent set in mobile device applications revealed that researchers

are focusing heavily on hands-free computing technology. About 39% of the seminal set

includes patents related to this application field.

A number of seminal patents related to mobile device applications of speech recognition

technology have been securitized to various financial organizations like UBS Stamford Branch,

JP Morgan Chase and Bank of New York. Microsoft, Google, Nuance Communications and IBM

are the key assignees of seminal patents in this application space. Companies such as AT&T,

Cisco, Digimarc, Intervoice, Apple and Accenture also own seminal patents.

Intense research in hands-free computing technology has resulted in the largest number of

seminal patents being registered in this category. With IVRS technology penetrating various customer

interfaces, it is another application category that is encouraging researchers to invent newer

and better methods of speech recognition. Figure 12 presents a distribution of seminal patents

in the mobile device application taxonomy.

Figure 12: Distribution of Seminal Patents in the Mobile Devices Application Categories (Source: iRunway analysis based on patent data from USPTO)

Table 4 charts the distribution of seminal patents across mobile device application

categories for the top 10 seminal patent owners. Hands-free computing is the only

technology category in which all the ten companies own patents, while Nuance Communications

is the only player in this list that owns patents across all categories of speech recognition

technology application in mobile devices.

Table 4: Distribution of Seminal Patents of Top 10 Assignees in Mobile Device Categories (Source: iRunway analysis based on patent data from USPTO)

4.2 Speech Recognition Technology in Automobiles

Implementing voice command recognition in an automotive environment has been a major

challenge in developing speech recognition technology for several decades. The first

vehicle-mounted speech recognition applications included caller dialing and basic navigation

functions through simple voice commands. Today, there is growing demand for the technology

to overcome a variety of noises from outside and inside, including audio reverberations and echo

from mechanical surfaces. There is also a need to develop natural language interpretation for

distraction-free driving.

Communication to and from automobiles such as navigation instructions and remote access of

data has enabled drivers to stay informed and entertained while travelling. Vehicle

manufacturers are keen on providing intelligent and personalized data based on user

preferences, with capabilities to manage applications such as emails, select music from a

playlist, answer or make phone calls, type text messages and navigate through voice

commands.

4.2.1 Patent Landscape of Automobile-Related Applications

Of the 21,281 U.S. patents analyzed, 648 patents were applicable to automotive and vehicular

systems. Notably, patents related to speech recognition in mobile devices can also be extended

to the automobile environment. However, the 648 patents in this section are exclusive to the

use of speech recognition processes in an automobile environment.

Figure 13 represents the distribution of these patents across automobile application categories.

Figure 13: Distribution of Speech Recognition Patents across Automobile Categories (Source: iRunway analysis based on patent data from USPTO)

Please refer to Section 7.3 for a detailed description of these categories.

4.2.2 Key Players in the Automobile Sector

While many leading automobile makers manufacture vehicles with speech recognition

functionalities, only a few own valuable patent portfolios. This suggests that manufacturers are

licensing patents from technology developers. Nuance Communications is reported to have

licensed its patents to several automobile manufacturers such as Ford, General Motors,

Chrysler, Toyota, Volkswagen, Daimler, BMW and Fiat [13].

Nuance Communications owns the largest number of patents in this domain with 42 patents,

followed by Denso with a portfolio of 30 patents. General Motors owns 21 patents and

ranks third in the list of leading assignees in this technology landscape. However,

General Motors has another 21 patents securitized with Wilmington Trust Company.

Until the turn of the century, automobile manufacturers were relying heavily on technology

developing companies such as Nuance Communications and Microsoft to implement speech

recognition functionalities in their vehicles. However, patent filing trends suggest that they are

beginning to actively develop these technologies in-house.

[13] http://www.bizjournals.com/boston/blog/techflash/2014/09/toyota-renews-contract-with-nuance-for-in-car.html

iRunway found that since 2010, General Motors has focused its research activities on the driver

assistance, voice enhancement and noise reduction categories, while Ford has shown an

interest in developing solutions in audio input/output and user interface technologies.

Patenting activities in voice enhancement and noise reduction suggest that assignees are

aiming to solve noise issues from within and outside the vehicle to provide efficient speech

recognition functionalities in automobiles.

Figure 14 charts the leading owners of patents related to the application of speech recognition

technology in automobiles. Nuance is the predominant leader in this space with 42 patents,

while Robert Bosch, Harman International, Honda Motors and Panasonic own 11 patents each.

Google follows with 10 patents in speech recognition technologies applicable to the automobile

industry. Sony, Mitsubishi and Ford own nine patents each, followed by Johnson Controls,

Volkswagen and IBM with six patents each. Qualcomm, System Application Engineering Inc.

and Toyota round out the list of leading patent assignees in this segment, each owning six

patents related to speech recognition technologies that can be applied to automobiles.

Figure 14: Top Assignees of Patents in Speech Recognition Technology for Automobiles (Source: iRunway analysis based on patent data from USPTO)

Denso and Microsoft own a considerable number of patents in audio input/output and user

interface categories, while General Motors has a significant portfolio of driver assistance and

communication patents. Alpine Electronics has patents equally distributed across the six

technology categories in the speech recognition taxonomy in the automobile landscape, while

Microsoft has greater focus on navigation technology.

Figure 15 presents a distribution of patents of the top five assignees. iRunway’s analysis of the

top five assignees indicates that Nuance Communications has been focusing on development of

navigation, and noise reduction and voice enhancement technologies.

Figure 15: Patent Distribution of Top 5 Assignees in Automobile Categories (Source: iRunway analysis based on patent data from USPTO)

4.2.3 Seminal Patents in Automobile Applications

iRunway’s analysis revealed that the largest shares of seminal patents relate to communication (32%

of all seminal patents) and navigation technologies (23%). General Motors owns the most seminal

patents in this space. Figure 16 presents a distribution of seminal patents in the automobile

taxonomy.

A major portion of the seminal patents related to speech recognition in automobiles relates to

communication technology. This points to increasing research interest in bettering

communication interfaces for drivers. About 23% of seminal patents in this patent set describe

technologies that can better navigation. This is a field of growing interest for researchers, with

the field receiving a boost from GPS technology and an increasing usage of navigation services

by users. An analysis of the top 10 seminal patent owners in speech recognition technology

categories for automobile applications shows that only General Motors owns a strong patent portfolio in

this space. A majority of these top 10 players are focusing on developing communication and

navigation technologies.

Figure 16: Distribution of Seminal Patents across Automobile Application Categories (Source: iRunway analysis based on patent data from USPTO)

Table 5 charts the distribution of seminal patents across application categories for the top 10

seminal patent owners.

Table 5: Seminal Patent Distribution of Top 10 Assignees across Automobile Application Categories (Source: iRunway analysis based on patent data from USPTO)

5 Litigation Trend in Patent Landscape

Of the 21,281 patents analyzed in the speech recognition domain, iRunway found 291 patents

involved in litigation and trials. Since 2000, these 291 patents have been involved in 745 patent

litigations, 67 Patent Trial and Appeal Board (PTAB) petitions and 10 International Trade

Commission (ITC) investigations. The litigation and trial filing trend, as seen in Figure 17,

shows a sudden spike in the number of litigations in 2012, progressively increasing up to 2014.

A PricewaterhouseCoopers report [14] states that there was a 29% increase in the number of

litigations filed in 2012 over 2011. This can be attributed to the anti-joinder provision of the America

Invents Act. Not surprisingly, 2012 saw a spike in the number of cases filed in speech

recognition technologies. Approximately 65% of the cases filed in 2012 were by four NPEs,

namely Blue Spike LLC (55 cases), EMG Technology LLC (18 cases), MCRO Inc. (27 cases) and

Ceecolor Industries LLC (13 cases). The technology sub-domains of these patents are

recognition, and storage and transmission. These patents were granted or assigned to the

plaintiffs in 2012, which prompted them to initiate litigation against multiple defendants. Only

5% of these cases are still open.

Figure 17: Patent Litigation Trend in Speech Recognition Domain

(Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

Of the 291 patents involved in litigation, 112 patents have been identified as seminal in Section

3.3. Figure 18 below shows a technology distribution of seminal patents under litigation.

[14] http://www.pwc.com/en_US/us/forensic-services/publications/assets/2013-patent-litigation-study.pdf

Figure 18: Distribution of Seminal Patents under Litigation (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

NPEs have been involved in 580 cases between 2000 and 2014. Operating companies have

been less active in litigation, initiating just 136 lawsuits. Figure 19 shows the distribution of

various entities in terms of percentage of litigations filed.

Figure 19: Patent Litigation Distribution by Type of Company (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

The top 15 plaintiffs and defendants involved in litigation are shown in Figure 20 and Figure 21

respectively. Interestingly, these plaintiffs have not pursued the majority of the cases, as is evident in

Figure 20.

Figure 20: Number of Open and Closed Litigations Filed by Top 15 Plaintiffs (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

It is evident from Figure 21 that major technology developers and mobile phone manufacturers

are the primary defendants in cases filed by NPEs. This trend can be attributed to the growing

application of speech recognition technologies on mobile devices.

Figure 21: Top 15 Defendants involved in the Litigations (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

Recognition technology is the leading area of litigation, followed by storage and transmission

technology. Figure 22 charts the distribution of litigated patents across technology categories.

Figure 22: Percentage of Litigated Patents across Technology Categories

(Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)

6 Conclusion

Over the last twenty years, speech recognition technology has evolved into an everyday

application. Much of the activity has been driven by innovation, competition, user demands and

ease of application. Industry trends suggest that growth in speech engines and native

algorithms may be modest in the near future, while heavy research and development activity

will be witnessed in speech-based applications across everyday gadgets and machines.

The implementation of speech recognition applications in mobile devices, smartphones and

smart assistants has opened up a new arena for competition. Apple’s Siri, which has been

dominating this market for the last few years, now faces strong competition from Microsoft’s

Cortana, Google Now, Nuance Communications’ Dragon Go and many others. An appealing

voice interface and visual emoticon, combined with artificial intelligence, have made these

applications more user-friendly.

The automobile segment is witnessing less patenting activity in comparison to mobile device

applications. However, mobile devices are being increasingly integrated with the automobile

environment, and hence, the dividing line between these two industries is getting finer by the

day. To offer better driving and navigation experiences, use of handheld or pre-installed

interfaces in a vehicle to assist drivers is being researched and deployed by major automobile

manufacturers. Over the last decade, there has been an increase in collaborations between

technology developers and automobile manufacturers to ensure better incorporation of speech-

controlled applications in automobiles.

While Microsoft and Nuance Communications were the predominant patenting leaders in the

speech recognition landscape, Google has recently been making its mark with its aggressive

patenting activity. There is much happening in this segment with players in the automobile and

mobile devices segments vying to create a common platform that can integrate speech

recognition applications across a variety of operating systems and offer seamless speech

command services.

7 Glossary

7.1 Technology Category

Recognition – This is the most important category in the speech recognition technology

landscape and encompasses patents that disclose inventions related to recognition of words

from speech data. Speech is separated into discrete components which are distinguished

from one another. Patents in this category disclose inventions related to word recognition,

voice recognition, speech to text conversion, and models and algorithms to achieve these

conversions. Phonetic, phonemic, lexical and semantic knowledge is used in conjunction

with statistical regression of mathematical models such as the Hidden Markov Model and its

derivatives to recognize human- and computer-readable words.
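
As a rough illustration of how a Hidden Markov Model can be used to pick out the most likely sequence of speech units from acoustic observations, the sketch below implements Viterbi decoding, a standard HMM decoding technique. The report does not say which decoding method any given patent uses; the function name, states, observation labels and probabilities here are all hypothetical toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] holds the probability and path of the best sequence ending in s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Hypothetical toy model: two phoneme-like states and two acoustic labels.
STATES = ("h", "i")
START_P = {"h": 0.8, "i": 0.2}
TRANS_P = {"h": {"h": 0.3, "i": 0.7}, "i": {"h": 0.1, "i": 0.9}}
EMIT_P = {
    "h": {"breathy": 0.9, "voiced": 0.1},
    "i": {"breathy": 0.2, "voiced": 0.8},
}

probability, path = viterbi(
    ["breathy", "voiced", "voiced"], STATES, START_P, TRANS_P, EMIT_P
)
```

Production recognizers typically work with log-probabilities and far larger state spaces; this version multiplies raw probabilities only to keep the example short.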

Storage and Transmission – Speech signals entered by a user are processed in multiple

steps to activate the desired action. Analog speech signals are digitized and computed to

decipher machine-readable commands. The speech may be coded or transformed based on

mathematical functions and converted into computational data. Speech data is stored or

transmitted for further analysis. Correction and enhancement functions are applied on the

speech data to remove disturbances and noise. Challenges in meeting accuracy and

efficiency requirements for speech analysis have been addressed through parallel

distributed computing of speech data using artificial neural networks.

Linguistics – Recognizing speech is impossible without an understanding of the language.

Linguistics includes patents that disclose inventions pertaining to construction of a word,

phrase or sentence in a language. This involves building a dictionary, formulating grammar

rules, language translation and natural language processing.

Synthesis – Synthesis of speech is artificial simulation of human speech. Synthesis

includes patents that disclose the processes of combining speech components to produce a

synthetic speech output. This category includes patents related to text to speech conversion

techniques and various models and methods to do so.

7.2 Mobile Device Application Categories

Hands-Free Computing - Hands-free computing refers to a configuration that allows a user

to interact with or control a device without the use of hands or other equipment such as a touch

screen, keyboard or mouse. Mobile device applications can be controlled by providing a

voice input. A speech recognition system present in the portable device identifies voice

commands and performs the operation.

Authentication & Identity Management - Authentication and identity management refers

to management of a user’s access to a certain application or a feature available in a portable

device. Voice input provided by the user is compared to an existing voice print present in the

portable device. The user is provided access to a portable device only if the voice input

matches a pre-existing voice print. Examples include unlocking the touch

screen by voice command, identifying callers by voice and performing business

transactions.
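
The voice print comparison described above can be sketched as a similarity check between two feature vectors. This is a minimal sketch under stated assumptions: the report does not specify how the comparison is performed, so cosine similarity, the 0.85 threshold and the function names below are illustrative choices, and the vectors stand in for whatever features a real system would extract from audio.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / norm


def grant_access(voice_input, enrolled_print, threshold=0.85):
    """Grant access only if the input is close enough to the stored print.

    The threshold is an assumed value, not one taken from the report.
    """
    return cosine_similarity(voice_input, enrolled_print) >= threshold
```

For example, `grant_access([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])` grants access because the vectors point in nearly the same direction, while `grant_access([1.0, 0.0], [0.0, 1.0])` does not.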

Database Annotation/Searching - Database annotation refers to annotating the data

present in a database. Database searching refers to performing a search for specific data

stored in a database or remote server. A voice command requesting a search for specific

data in a database is obtained and information corresponding to the request is sent to the

portable device based on the voice command. For example, operations such as retrieving

audio content, extracting phone numbers, etc., come under this category.

Language Learning & Dictionary Applications - The category deals with technology that

enables mobile computing devices to learn new words and language models, phonetic

representation of words in different languages and perform translation operations. Language

learning techniques also allow the device to learn words and pronunciations specific to

certain users, and additionally create different language profiles for multiple users. For

example, operations such as pronunciation correction and adding words to a dictionary fall

under this technology category.

Interactive Voice Response System (IVRS) - An Interactive Voice Response System is

an automated telephony system, where a caller can interact with a computer. IVR systems

are used extensively in banking, airline services, pharmacies and many other places to guide

customers through various processes. Callers can interact with the computer through Dual

Tone Multi Frequency (DTMF) signals, which are unique tones associated with keys on a

telephone keypad. Alternatively, speech recognition systems can also be employed where

callers can use voice commands to interact with the IVR system.
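
To make the DTMF mechanism concrete, the sketch below maps each keypad key to its pair of tones and synthesizes the combined signal. The row and column frequencies are the standard DTMF values; the function names, the 0.1-second duration and the 8000 Hz sample rate are assumptions chosen for illustration.

```python
import math

# Standard DTMF layout: each key is the sum of one row tone and one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair in Hz for a single keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Synthesize the key's dual tone as a list of floats in [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(count)
    ]
```

An IVR system works in the opposite direction, detecting which two frequencies are present in the incoming audio to determine the key pressed.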

Transcription & Text to Speech Applications - Transcription in mobile devices allows

accurate conversion of voice to text. Transcription has several applications in mobile

computing devices, such as typing and browsing through menu items using voice commands.

Text to speech functionality, on the other hand, enables the computing device to read aloud

text that is entered by the user or is the output of any computing process. Text to speech

applications include, but are not limited to, reading emails and e-books.

7.3 Automobile Application Categories

Audio Input/Output and User Interface - Audio Input/Output and User Interface refers

to technology used in smart devices in automobiles that receives and processes speech or

audio input through its user interface. It also encompasses technology for outputting audio

and video signals to allow user interaction. Audio input/output devices used in automobiles

for receiving speech inputs, UI devices for processing speech commands, display screens to

guide and assist the driver, and audio/video output media to display or play the

requested data are a few examples.

Communication - This category deals with communication established between the driver

of an automobile and a remote server that provides information and data services. The driver

establishes communication with the server and provides voice inputs requesting information.

The server responds with the information requested. Telecommunication devices like mobile

phones and smartphones used from an automobile can serve as a communication medium.

Driver Assistance - This category deals with technology implemented in automobiles to

provide assistance to drivers. The driver may control entertainment devices, access emails,

select multimedia files etc. by providing voice input to a device. In addition, the driver may

be alerted if the smart device senses him/her to be drowsy while driving.

Maneuver - Maneuver refers to technology used in automobiles to enhance the driving

experience. This may include automatic control of the steering wheel, brakes, window panes,

wipers, etc. through voice commands, and equipping robots for automatic control of brakes and other

driving-related equipment. The driver of the vehicle may be able to control the movement of

the vehicle remotely through voice commands.

Navigation - Navigation refers to location services provided to the driver based on voice

commands. The driver may request directions to a specific location through speech inputs

to the navigation device present in the automobile. This category also includes automatic

suggestions of nearby points of interest such as restaurants, shops, pharmacies, hospitals,

etc. based on the location of the vehicle or pre-defined user preferences.

Voice Enhancement and Noise Reduction - Voice enhancement refers to technology used

in automobiles to enhance the quality of speech command recognition by reducing noises

associated with an automobile environment. Noise in automobiles can be internal due to

various mechanical reverberations or external (for example, traffic or neighboring vehicles).

Identifying user commands from the audio input requires dampening of noise and enhancement

of voice.

8 Authors

The authors acknowledge the many contributions of Purnima Lodha, Sarfraz Shariff and Ambika

Ashirwad Mohanty towards shaping this report.

Aditi Das

Aditi is a Consultant at iRunway. She provides key technical insights that help IP

attorneys profoundly improve licensing and litigation outcomes. She specializes

in data mining, technology and patent landscape, patent infringement and

validity analyses, and helps clients in multiple technology domains.

Ashish Gupta

Ashish is a Consultant at iRunway. He is a strategy analyst who provides new

licensing outlooks for clients. He works on high-profile technology litigation and

licensing programs, helping attorneys and clients find crucial infringement

evidence.

Bhargav Ram

Bhargav is a Senior Associate at iRunway. He specializes in data mining and

provides in-depth analyses to extract meaningful insights that provide lead

indicators to support complex IP monetization matters of clients.

Contact

United States

1114 Lost Creek Blvd, Suite 400

Austin, TX 78746

Tel: +1 512 284 8200

2905 Stender Way, Unit 28

Santa Clara, CA 95054

Tel: +1 650 308 4807

8400 E. Crescent Parkway, Suite 600

Greenwood Village, CO 80111

Tel: +1 720 528 4273

India

1st Floor, AMR Tech Park I Annex

No. 23 and 24, Hongasandra

Hosur Road, Bangalore - 560068

Tel: +91 804 058 4000

www.i-Runway.com

iRunway® is a registered trademark of iRunway India Private Limited.

iRunway has prepared this research independently based on reliable public data and reviewed the results based on its proprietary methodology, with the belief that it is fair and not misleading. The preparers of the information in this report are not engaged in rendering legal or other professional advice, and nothing in this document should be construed as such. iRunway does not practice law and it exists to provide technical research, analysis and reporting capability to its clients.

The patent data in this report is as of the date of preparation and is therefore subject to change as new patents are filed and issued every day. Neither iRunway nor any employee of iRunway accepts any liability for any damages or losses, direct, indirect or consequential, arising from any use or interpretation of this report or its contents.