Speech Recognition: Technology & Patent Landscape
© iRunway 2015 Page 2 of 36
Public
Contents
1 Executive Summary
2 Introduction
2.1 Technology Overview
3 Patent Landscape and Analysis
3.1 Leading Patent Owners
3.2 Technology-wise Patent Distribution of Top Assignees
3.3 Seminal Patent Landscape
4 Application in Mobile Devices & Automobiles
4.1 Speech Recognition Technology in Mobile Devices
4.1.1 Patent Landscape of Mobile Device-Related Applications
4.1.2 Key Players in the Mobile Devices Sector
4.1.3 Seminal Patents in Mobile Device Applications
4.2 Speech Recognition Technology in Automobiles
4.2.1 Patent Landscape of Automobile-Related Applications
4.2.2 Key Players in the Automobile Sector
4.2.3 Seminal Patents in Automobile Applications
5 Litigation Trend in Patent Landscape
6 Conclusion
7 Glossary
7.1 Technology Category
7.2 Mobile Device Application Categories
7.3 Automobile Application Categories
8 Authors
1 Executive Summary
The ability to interface with a machine using natural human language has fascinated the
scientific world for many decades. Recent virtual assistants such as Apple’s Siri have
demonstrated the promise of a comfortable future with speech recognition and voice-enabled
processes penetrating household and industrial applications. While commercial success in the
world market has so far been negligible, this may soon change.
Microsoft’s Cortana, available in Windows 10 for mobile phones, tablets and, importantly,
desktop units, may dramatically advance the virtual assistant experience, providing Microsoft
with a significant edge over competitors. Unlike previous virtual assistants, Cortana is now
tailored to local languages, customs and cultures, and to the corresponding nuances of speech.
Researchers have struggled to build a platform that interprets and responds to voice commands
with accuracy and efficiency. While technology developers such as Nuance Communications
have developed a large speech recognition technology patent portfolio in recent years, Microsoft
and others have focused on linguistics, building massive dictionaries of vocabulary through
neural networks and cloud-based architecture. It appears that linguistics may be the key to
reducing processing time and providing a more seamless user experience.
This research report examines speech recognition technology and its patent landscape in the
U.S. market, providing an overview of key audio signal processing techniques and identifying
the IP strengths and weaknesses of top companies. The report also provides in-depth analysis
of two widely used speech recognition applications – mobile devices and automobiles.
iRunway’s analysis found 21,281 granted patents, with major industry players such as Microsoft,
Nuance Communications, AT&T and IBM leading the list of top patent holders. While Microsoft
owns the largest patent portfolio in linguistics technology with 15% of the seminal, or strong,
patents in this space, Nuance Communications dominates the recognition category with 8.5%
seminal patents. Sony owns 7.4% of seminal patents in storage and transmission technology.
Another force of change will soon arrive as at least 172 seminal patents belonging to the
leading 10 seminal patent owners expire in 2016, bringing them into public domain. This will
likely prompt a new wave of development in the speech recognition domain with a dramatic
impact on the application of this technology. This is likely to reduce licensing costs and make
speech recognition technology more easily available to the larger market.
Mobile devices are emerging as commercially successful ubiquitous tools to perform multiple
human activities through voice commands. iRunway’s analysis found 3,209 patents related to
application of speech recognition technology for mobile devices. AT&T, Nuance
Communications, Microsoft, IBM and Google are the key patent holders in this space. Apple
does not have a large speech recognition patent portfolio and it licenses patents from other
industry leaders including Nuance that powers Siri.
A growing market for smart vehicles has also made the automobile industry a key
adopter of speech recognition technology. The analysis found 648 speech recognition patents
that were applicable to automotive and vehicular systems. Until the turn of the century,
automobile manufacturers were relying heavily on technology companies such as Nuance and
Microsoft to implement speech recognition functionalities in vehicles. In recent times, Apple and
Google have emerged as two major players vying for a large market share with their CarPlay
and Android Auto speech-controlled infotainment systems respectively. However, many auto
manufacturers have begun developing these technologies in-house. Denso, General Motors and
Honda are three leading patent owners in this space. Other leading owners of patents in
speech-enabled communication, navigation, maneuver, data presentation and techniques to
cancel environmental noise include Nuance, Microsoft, Alpine Electronics and AT&T.
2 Introduction
Designing a system that mimics human behavior, particularly the capability of speaking
naturally and responding interactively to spoken commands, has intrigued engineers and
scientists for centuries. Today, speech recognition has stepped into every realm, including
mobile phones, telecommunications, healthcare, banking, speech-controlled automobile
maneuvering, speech-based web browsing, robotics, virtual personal assistants, aviation,
military, education, aids for people with disabilities, security, and media and entertainment, to name a few.
2.1 Technology Overview
Speech is technically defined as a sequence of basic units called phonemes. Automated Speech
Recognition (ASR) systems convert analog speech signals received through microphones to
digital signals that are segmented to retrieve phonemes. Using the phoneme sequence, the ASR
system refers to the vocabulary and grammar rules to decipher words or phrases. Processing
speech signals includes removing noise, reducing errors in recognizing phoneme patterns and
resolving ambiguity arising from variations in speech accent, pitch and speed.
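As a rough illustration of the last step described above (mapping a phoneme sequence to words through a vocabulary), the following Python sketch uses a tiny, hypothetical pronunciation lexicon and a greedy longest-match lookup. The phoneme symbols, the lexicon entries and the `decode` function are assumptions made for illustration only; practical ASR systems score many candidate segmentations statistically rather than matching greedily.

```python
# Toy pronunciation lexicon (hypothetical entries, ARPAbet-style symbols).
LEXICON = {
    ("K", "AO", "L"): "call",
    ("HH", "OW", "M"): "home",
    ("P", "L", "EY"): "play",
}


def decode(phonemes, lexicon=LEXICON):
    """Greedy longest-match segmentation of a phoneme sequence into words."""
    words, i = [], 0
    max_len = max(len(key) for key in lexicon)
    while i < len(phonemes):
        # Try the longest possible pronunciation first, then shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in lexicon:
                words.append(lexicon[key])
                i += n
                break
        else:
            # No lexicon entry starts here: emit a placeholder and move on.
            words.append("<unk>")
            i += 1
    return words


print(decode(["K", "AO", "L", "HH", "OW", "M"]))  # ['call', 'home']
```

The `<unk>` placeholder stands in for the ambiguity and error handling that the paragraph above mentions; a real system would resolve such cases with acoustic and language model scores.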
Statistical models and grammatical rules require exemplary training data and a large volume of
vocabulary, words and phrases. With growing vocabulary and advances in semantic analyses,
speech recognition is striving to achieve more accuracy. Today's speech recognition systems
use powerful and complicated statistical modeling techniques to determine the most likely
outcome. Each system implements grammatical rules based on a specific dictionary to deduce
speech from phonemes.
Acoustic modeling and language modeling are important parts of modern statistically-based
speech recognition algorithms. Hidden Markov Models (HMMs) and neural network analyses are
widely used methods to recognize speech. Language modeling is also used in natural language
processing applications such as document classification or statistical machine translation.
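To make the role of Hidden Markov Models more concrete, the sketch below runs Viterbi decoding, a standard way of finding the most likely hidden state sequence in an HMM, on a toy model. The two phoneme states, the coarse acoustic labels ("low", "high") and all probabilities are invented for illustration and are not taken from any system discussed in this report.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            layer[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(layer)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][state][0]
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1], prob


# Toy model: hidden states are phonemes, observations are coarse acoustic labels.
states = ["AH", "IY"]
start_p = {"AH": 0.6, "IY": 0.4}
trans_p = {"AH": {"AH": 0.7, "IY": 0.3}, "IY": {"AH": 0.4, "IY": 0.6}}
emit_p = {"AH": {"low": 0.8, "high": 0.2}, "IY": {"low": 0.1, "high": 0.9}}

path, prob = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['AH', 'IY', 'IY']
```

In a full recognizer the emission probabilities would come from an acoustic model and the transition structure would be constrained by a pronunciation dictionary and a language model.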
The performance and accuracy of speech recognition systems are crucial in industrial and
commercial applications. With the evolution of distributed computing and cloud-based data
storage, handling huge dictionaries and complex models to improve accuracy and performance
is now a reality.
Advancements in speech recognition techniques and their applications have manifested in
increased patenting activity over the years. iRunway has analyzed patenting activity in this
technology sector for the past 20 years. The speech recognition landscape can be classified into
four major technology domains and multiple sub-categories as shown in Figure 1.
Please refer to the Glossary for a detailed understanding of the taxonomy.
Figure 1: Speech Recognition Technology Taxonomy
3 Patent Landscape and Analysis
Since 1994, the USPTO has granted 21,281 patents across categories of this technology, as
shown in Table 1¹.
Level 1 Categories                  Level 2 Categories                         Number of Patents
Recognition (8,579)                 Word recognition                           2,044
                                    Speech to text conversion                  1,521
                                    Correction and pattern matching            2,082
                                    Models and algorithms                      2,817
                                    Speech recognition processes                 909
                                    Voice recognition                          1,340
                                    Speech recognition application             2,479
Storage and transmission (7,815)    Analysis and prediction                    2,653
                                    Data storage and distributed computing     1,094
                                    Audio signal compression and expansion     2,176
                                    Speech transformation                      2,352
                                    Enhancement and correction                 3,102
                                    Storage and transmission application         633
Linguistics (3,519)                 Translation                                1,721
                                    Natural language                           1,002
                                    Dictionary building                          343
                                    Theoretical interpretation                   595
                                    Multilingual support                         417
                                    Linguistics application                      122
Synthesis (1,368)                   Text to speech conversion                  1,055
                                    Models and mathematical computations         802
                                    Speech synthesis processes                   483
                                    Speech synthesis application                 189
Table 1: Technology-wise Distribution of Speech Recognition Patents (Source: iRunway analysis based on patent data from USPTO)
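The figures in Table 1 can be cross-checked with a few lines of Python. The sketch below, which uses only numbers from the table (the variable names are ours), confirms that the four Level 1 totals add up to the 21,281 granted patents, and shows that the Level 2 counts within a category add up to more than the Level 1 total because a patent may be counted in more than one Level 2 category.

```python
# Level 1 totals as reported in Table 1.
level1_totals = {
    "Recognition": 8579,
    "Storage and transmission": 7815,
    "Linguistics": 3519,
    "Synthesis": 1368,
}

# Level 2 counts under Recognition, in the order listed in Table 1.
recognition_level2 = [2044, 1521, 2082, 2817, 909, 1340, 2479]

total = sum(level1_totals.values())
recognition_level2_sum = sum(recognition_level2)

print(total)                   # 21281
print(recognition_level2_sum)  # 13192, larger than 8579 because of multiple counting

# Share of each Level 1 category in the overall landscape.
for name, count in level1_totals.items():
    print(f"{name}: {count / total:.1%}")
```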
The complexity of speech recognition technologies and their implementation, coupled with
innovation needed to evolve efficient speech assisted systems for commercial or research
purposes, has resulted in aggressive patenting activity over the years.
1 An IP asset may be counted in more than one of the Level 2 categories based on its relevance and theme.
The patent filing trend in the evolution tree below shows that the primary areas of focus include
methods to recognize speech from audio signals (recognition) and mechanisms to efficiently
store and transmit data². Figure 2 charts the evolution of the speech recognition patent
portfolio across categories, with each circle representing the number of US patents filed in a
corresponding year in each category.
Figure 2: Evolution of the Speech Recognition Patent Portfolio in the U.S. Region (Source: iRunway analysis based on patent data from USPTO)
There has been intensive research in the last two decades to improve models and algorithms to
recognize speech commands from analog audio signals. The Hidden Markov Model (HMM) and
its variations are the most exploited in this domain. In fact, transforming analog speech signals
to digital data and performing pattern matching analysis to distinguish words and voices to
activate applications have evolved as the key motivations for filing patents.
Patenting activity in linguistics technology indicates that translation is a widely researched field.
The need for large vocabularies to develop proficient language translation mechanisms is being
addressed through distributed storage and transmission techniques. This points to the fact that
efficiency and precision of speech recognition is dependent on the advancement of precise
mathematical models and effective ways of data retrieval, storage and transmission.
2 The number of patent applications for 2013 and 2014 may be more than what has been indicated in all the charts in
this paper as it takes an average of 18 months for a patent application to be published.
3.1 Leading Patent Owners
Speech recognition technology has garnered much interest from leading corporations including
Nuance Communications, Microsoft, Samsung, Apple and Google. Though Microsoft is the
largest patent holder, Nuance has been a strong player in this field. Nuance has been filing and
acquiring patents since the early stages of research and development in speech recognition
technology. It acquired a portfolio of thousands of speech recognition patents and pending
applications from IBM in 2009³, while Microsoft recorded accelerated patenting activity between
2003 and 2007.
According to Wired magazine⁴, Apple has made significant investments to create a next-generation
backbone for its virtual assistant, Siri, through a neural network paradigm. A neural
network is a machine learning technique that simulates the learning capabilities of neurons in the
human brain to achieve advanced artificial intelligence. The magazine also reported that IBM,
Microsoft and Google have deployed neural network technology in various speech-related
applications. Microsoft is using a neural network to power real-time translation features that are
expected to be introduced in Skype⁵. Similarly, Google is investing in R&D in neural networks⁶
to offer a more efficient and accurate⁷ Google Now application⁸.
Nuance Communications is said to power some of the most appreciated commercial products
like Apple’s Siri⁹ and Ford’s in-car communication and infotainment system (SYNC)¹⁰. Each
year, millions of cars are fitted with Nuance’s voice, text to speech and natural language
understanding solutions. Leading automobile manufacturers like Audi, BMW, Chrysler, Ford,
General Motors, Hyundai and Toyota have been using Nuance’s Dragon Drive voice solution to
support a range of functionalities like voice dialing, message dictation, navigation, local
business search, music search and climate control in their automobiles¹¹.
Figure 3 charts the 10 leading companies in this domain. In recent years, Google, IBM and
AT&T have emerged as significant applicants of speech recognition inventions, indicating strong
research focus. IBM is a dominant applicant. In 2006, it entered into a partnership with Nuance
Communications to deploy speech recognition technology across various application areas.
3 https://gigaom.com/2009/01/15/nuance-takes-on-microsoft-and-google-with-ibm-deal/
4 http://www.wired.com/2014/06/siri_ai/
5 http://www.wired.com/2014/05/microsoft-skype-translate
6 http://www.wired.com/2013/02/android-neural-network/
7 http://research.google.com/pubs/SpeechProcessing.html
8 http://www.forbes.com/sites/roberthof/2013/05/01/meet-the-guy-who-helped-google-beat-apples-siri/
9 http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/
10 http://www.nuance.com/company/news-room/press-releases/ND_003932
11 http://www.businesswire.com/news/home/20130530005460/en/Nuance-Accelerate-Successful-Automotive-Business-Acquisition-Tweddle#.VNL_KmiUd1U
Figure 3: Top 10 Speech Recognition Technology Patent Assignees (Source: iRunway analysis based on patent data from USPTO)
Figure 4 charts the patent filing trend of the top five companies in this domain.
Figure 4: Patent Filing Trend of Top 5 Assignees in Speech Recognition Technology (Source: iRunway analysis based on patent data from USPTO)
3.2 Technology-wise Patent Distribution of Top Assignees
iRunway analyzed the distribution of patents across categories for the top 10 assignees to
understand their R&D and IP focus. Nuance Communications owns the maximum number of
patents in models and algorithms. It also owns considerably large portfolios in speech to text
conversion, text to speech conversion, word recognition and voice recognition processes.
Microsoft has a distributed portfolio, dominating patented technologies in translation and
linguistics. It also owns significant patents in models and algorithms, word recognition, correction
and pattern matching, and signal enhancement and correction categories. Since 2009 Google
has emerged as a strong player in recognition technology. Consumer electronics companies
Panasonic, Samsung and Sony own strong portfolios in audio signal storage and transmission
technologies. As represented in Figure 5 below, Nuance is the clear leader in developing
recognition techniques, while Microsoft has a distributed portfolio of patents in recognition,
linguistics and signal transmission categories.
Figure 5: Technology-wise Distribution of Speech Recognition Patents for Top 10 Assignees (Source: iRunway analysis based on patent data from USPTO)
AT&T holds the largest number of patents in synthesis technology, with a majority of them
related to text to speech conversion. Most of these patents focus on various technical
processes, mathematical models and algorithms for text to speech conversion technology.
Table 2 charts the leaders and competitors in each of the categories in Level 2. Numbers in red
indicate the maximum number of patents owned by an assignee in each Level 2 category. As
mentioned earlier, Nuance owns the highest number of patents across all Level 2 Categories in
Recognition technology.
Table 2: Patent Distribution of Top 10 Assignees across Level 2 Categories
(Source: iRunway analysis based on patent data from USPTO)
3.3 Seminal Patent Landscape
iRunway analyzed the 21,281 speech recognition patents granted by the USPTO to identify
seminal, or strong, patents using its proprietary solution COMPASS℠. Patent strength was
determined based on multiple parameters such as number of independent and dependent
claims, geographical presence, patent classifications, backward and forward references, age of
the patent, etc. The top 10% of patents were considered as seminal and important for further
analysis. Microsoft, Nuance Communications, AT&T, IBM, Apple and Google are the leading
assignees in terms of seminal patents. However, companies such as Digimarc, Lucent, Cisco
and Canon have remarkably strong seminal portfolios in this landscape.
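The general idea of scoring patents on several parameters and keeping the top 10% can be sketched in Python as follows. This is not the COMPASS method, which is proprietary and not described in this report; the `Patent` fields, the weights and the weighted-sum rule are all hypothetical and serve only to show how a top-decile cut might work.

```python
from dataclasses import dataclass


@dataclass
class Patent:
    number: str
    independent_claims: int
    dependent_claims: int
    forward_references: int
    backward_references: int
    jurisdictions: int


# Hypothetical weights, chosen only for illustration; not iRunway's values.
WEIGHTS = {
    "independent_claims": 2.0,
    "dependent_claims": 0.5,
    "forward_references": 1.5,
    "backward_references": 0.25,
    "jurisdictions": 1.0,
}


def strength(patent):
    """Weighted sum of the patent's attributes (an assumed scoring rule)."""
    return sum(w * getattr(patent, name) for name, w in WEIGHTS.items())


def seminal(patents, fraction=0.10):
    """Return the top `fraction` of patents ranked by strength score."""
    ranked = sorted(patents, key=strength, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Synthetic portfolio: 20 patents that differ only in forward references.
portfolio = [Patent(f"P{i}", 1, 5, i, 3, 1) for i in range(20)]
top = seminal(portfolio)
print([p.number for p in top])  # ['P19', 'P18']
```

Applied to the 21,281 patents in this landscape, a 10% cut of this kind would retain roughly 2,128 patents.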
Table 3 lists the number and percentage of seminal patents in the portfolios of the top 20
assignees.
Assignees                 Number of Seminal Patents    Percentage of Seminal Patents in their Portfolios
Microsoft                 144                          9.8%
Nuance Communications     80                           6.6%
Sony                      73                           12.4%
Digimarc                  59                           72%
IBM                       59                           8.1%
AT&T                      48                           5.5%
Apple                     41                           15%
Google                    39                           6.8%
Cisco                     31                           17%
Intel                     31                           13.8%
Canon                     29                           14.3%
LG                        26                           8.4%
Ericsson                  26                           14.7%
Qualcomm                  25                           9.3%
Hewlett-Packard           24                           16.7%
Panasonic                 24                           4.8%
Samsung                   22                           4.6%
Philips N.V               22                           12.7%
Lucent                    21                           39.6%
Fujitsu                   18                           8.2%
Table 3: Share of Seminal Patents within the Portfolios of the Top 20 Assignees (Source: iRunway analysis based on patent data from USPTO)
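Because Table 3 gives both the number of seminal patents and their share of each portfolio, an approximate portfolio size can be backed out by division. The Python sketch below does this for four assignees from the table; the results are estimates only, since the published percentages are rounded, and the function name is ours.

```python
# (seminal patents, percentage of portfolio) taken from Table 3.
table3 = {
    "Microsoft": (144, 9.8),
    "Nuance Communications": (80, 6.6),
    "Digimarc": (59, 72.0),
    "Lucent": (21, 39.6),
}


def implied_portfolio_size(seminal_count, seminal_pct):
    """Estimate total portfolio size from a seminal count and its percentage share."""
    return round(seminal_count / (seminal_pct / 100))


estimates = {
    name: implied_portfolio_size(count, pct) for name, (count, pct) in table3.items()
}
for name, size in estimates.items():
    print(f"{name}: about {size} patents")
```

The estimates make the contrast in the table easier to see: Digimarc and Lucent reach their seminal counts from portfolios of well under a hundred patents, whereas Microsoft and Nuance Communications hold portfolios of more than a thousand.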
Nuance Communications owns the maximum number of seminal patents in recognition
technology (8.5% of all seminal patents in the category), followed by Microsoft (6.2%) and
AT&T (3.9%). Microsoft dominates in linguistics technology with 15% of seminal patents in this
category, followed by IBM, which owns 5.5%. Apple and Nuance Communications each own 4.2%
of the seminal patents in linguistics. Sony is the predominant leader in storage and transmission
technology, owning around 7.4% of the total seminal patents in this category.
Figure 6 illustrates the seminal patent distribution of the top 10 assignees across Level 1
categories.
Figure 6: Distribution of Seminal Patents in the Portfolios of the Top 10 Assignees (Source: iRunway analysis based on patent data from USPTO)
Interestingly, a number of these seminal patents will expire by 2016, bringing critical speech
recognition patents into the public domain. Figure 7 illustrates the number of seminal patents
owned by the top 10 assignees vis-à-vis the number of them expiring by 2016:
Figure 7: Number of Seminal Patents Expiring by 2016 in the Portfolios of Top 10 Seminal Patent Owners (Source: iRunway analysis based on patent data from USPTO)
4 Application in Mobile Devices & Automobiles
Speech recognition technology has enabled a wide range of applications for industrial and
domestic purposes. The need to automate and provide user convenience has triggered the
adoption of speech recognition technology in all facets of everyday applications. These include
intelligent personal agents, customer care wizards, call center automated attendants, voice
access to universal directories and registries, unconstrained dictation and language translation
capabilities, speech assistance for people with disabilities, speech-enabled security systems,
speech-controlled medical equipment, and automobile maneuvering and navigation systems, among others.
However, the technology has found its most extensive demand in mobile phones, handheld portable
devices and automobiles.
4.1 Speech Recognition Technology in Mobile Devices
The most widespread application of speech recognition is implemented through mobile devices,
smartphones, portable devices and digital assistants. Mobile devices have transformed from
mere voice calling devices to more advanced personal computing devices that allow users to
perform numerous tasks such as browsing the Internet, capturing photographs, translating
languages, following navigation guidance, and managing entertainment or multimedia content, to name a
few. Speech recognition applications on mobile devices include voice-activated initiation of
applications and speech synthesis and/or speech-based guidance for the user.
4.1.1 Patent Landscape of Mobile Device-Related Applications
An analysis of the 21,281 speech recognition patents revealed that 3,209 patents were related
to the application of speech recognition technology for mobile devices. Methods and processes that
allow users to instruct and interact with mobile devices to perform various tasks, and
applications intended to provide hands-free experiences, have been a major area of focus. The design
of voice input and output hardware and software platforms to eliminate noise and increase the
accuracy of interpretation has enabled widespread implementation of speech recognition
capabilities on mobile devices.
Voiceprint is an important source of biometric authentication that is increasingly being
employed for confidential transactions. Also, the use of voice commands to locate data on mobile devices is
a growing area of research. Figure 8 charts the distribution of these patents. Please refer to
Section 7.2 of the Glossary for a detailed understanding of each category.
Figure 8: Distribution of Speech Recognition Patents across Mobile Device Application Categories (Source: iRunway analysis based on patent data from USPTO)
Hands-free computing has been the major area of focus since 1995. Patenting in the other categories is
fairly evenly distributed, with transcription and text-to-speech applications
growing into areas of research interest.
4.1.2 Key Players in the Mobile Devices Sector
Speech recognition capabilities on mobile devices have attracted major technology development
companies, whether or not they are mobile phone manufacturers or platform developers. In the race to
provide better services to users, companies have invested heavily in this area.
Patenting activity in the mobile devices application segment has been aggressive. Leading
mobile device manufacturers, telecommunications providers and technology platform providers
are actively patenting in speech recognition, synthesis and audio signal processing. In 2004 and
2008, IBM, AT&T, Microsoft and Nuance engaged in heavy patenting activity, with Nuance
leading the list both years. While AT&T has been a steady applicant, Google picked up pace in
2010, in line with the development and launch of its Android-based Nexus mobile device series.
Figure 9 below shows that AT&T and Nuance are close competitors in this space with
comparable portfolio sizes. In 2006, Nuance mortgaged 34 of these patents to
UBS Stamford Branch. Notably, UBS, which is a banking entity, owns 160
patents in this domain, 54 of which are seminal and make up 33.8% of its portfolio.
Figure 9: Top 10 Assignees in Speech Recognition Technology for Mobile Devices (Source: iRunway analysis based on patent data from USPTO)
It is common knowledge that Nuance powers Apple’s smart personal assistant, Siri¹². Though
Apple doesn’t have a large speech recognition portfolio, it licenses patents from other players. A
detailed analysis of the top five assignees reveals that while all major players are focusing on
hands-free computing, AT&T has shown special interest in interactive voice response systems
(IVRS), authentication and identity management, and transcription and text-to-speech
application categories. Nuance owns significant patents in language learning and dictionary
application categories. Figure 10 charts the distribution of patents of the top five assignees:
Figure 10: Patent Distribution of the Top 5 Assignees in Mobile Devices Categories (Source: iRunway analysis based on patent data from USPTO)
12 http://www.forbes.com/sites/rogerkay/2014/03/24/behind-apples-siri-lies-nuances-speech-recognition/
Microsoft and Google have emerged as major patent applicants in the last five years, racing
ahead of Nuance Communications. In order to understand the specific research and business
interest of the top five assignees, iRunway analyzed the pending applications as well.
Figure 11 below shows that Google has been focusing on developing hands-free computing,
language learning and dictionary applications, and transcription & text-to-speech applications.
AT&T, on the other hand, is investing in developing authentication & identity management
technology, and transcription & text-to-speech applications.
Competition is intense in language learning and dictionary
application technologies, with Nuance Communications, Google and Microsoft fighting for a
larger share of the pie.
Figure 11: Pending Patent Applications of Top 5 Assignees across Mobile Device Categories (Source: iRunway analysis based on patent data from USPTO)
4.1.3 Seminal Patents in Mobile Device Applications
An analysis of the seminal patent set in mobile device applications revealed that researchers
are focusing heavily on hands-free computing technology. About 39% of the seminal set
includes patents related to this application field.
A number of seminal patents related to mobile device applications of speech recognition
technology have been securitized to various financial organizations like UBS Stamford Branch,
JP Morgan Chase and Bank of New York. Microsoft, Google, Nuance Communications and IBM
are the key assignees of seminal patents in this application space. Companies such as AT&T,
Cisco, Digimarc, Intervoice, Apple and Accenture also own seminal patents.
Intense research in hands-free computing technology has resulted in a maximum number of
seminal patents registered in this category. With IVRS technology penetrating various customer
interfaces, it is another application category that is encouraging researchers to invent newer
and better methods of speech recognition. Figure 12 presents a distribution of seminal patents
in the mobile device application taxonomy.
Figure 12: Distribution of Seminal Patents in the Mobile Devices Application Categories (Source: iRunway analysis based on patent data from USPTO)
Table 4 charts the distribution of seminal patents across mobile device application
categories for the top 10 seminal patent owners. Hands-free computing is the only
technology category in which all the ten companies own patents, while Nuance Communications
is the only player in this list that owns patents across all categories of speech recognition
technology application in mobile devices.
Table 4: Distribution of Seminal Patents of Top 10 Assignees in Mobile Device Categories (Source: iRunway analysis based on patent data from USPTO)
4.2 Speech Recognition Technology in Automobiles
Implementing voice command recognition in an automotive environment has been a major
challenge in developing speech recognition technology for several decades. The first
vehicle-mounted speech recognition applications included caller dialing and basic navigation
functions through simple voice commands. Today, there is growing demand for the technology
to overcome a variety of noises from outside and inside the vehicle, including audio reverberations and echo
from mechanical surfaces. There is also a need to develop natural language interpretation for
distraction-free driving.
Communication to and from automobiles such as navigation instructions and remote access of
data has enabled drivers to stay informed and entertained while travelling. Vehicle
manufacturers are keen on providing intelligent and personalized data based on user
preferences, with capabilities to manage applications such as emails, select music from a
playlist, answer or make phone calls, type text messages and navigate through voice
commands.
4.2.1 Patent Landscape of Automobile-Related Applications
Of the 21,281 U.S. patents analyzed, 648 patents were applicable to automotive and vehicular
systems. Notably, patents related to speech recognition in mobile devices can also be extended
to the automobile environment. However, the 648 patents discussed in this section are exclusive to the
use of speech recognition processes in an automobile environment.
Figure 13 represents the distribution of these patents across automobile application categories.
Figure 13: Distribution of Speech Recognition Patents across Automobile Categories (Source: iRunway analysis based on patent data from USPTO)
Please refer to Section 7.3 for detailed definitions of these categories.
4.2.2 Key Players in the Automobile Sector
While many leading automobile makers manufacture vehicles with speech recognition
functionalities, only a few own valuable patent portfolios. This indicates that manufacturers are
licensing patents from technology developers. Nuance Communications is reported to have
licensed its patents to several automobile manufacturers such as Ford, General Motors,
Chrysler, Toyota, Volkswagen, Daimler, BMW and Fiat13.
Nuance Communications owns the largest number of patents in this domain with 42 patents,
followed by Denso with a portfolio of 30 patents. General Motors owns 21 patents and
ranks third in the list of leading assignees in this technology landscape. However,
General Motors has another 21 patents securitized with Wilmington Trust Company.
Until the turn of the century, automobile manufacturers were relying heavily on technology
developing companies such as Nuance Communications and Microsoft to implement speech
recognition functionalities in their vehicles. However, patent filing trends suggest that they are
beginning to actively develop these technologies in-house.
13 http://www.bizjournals.com/boston/blog/techflash/2014/09/toyota-renews-contract-with-nuance-for-in-car.html
iRunway found that since 2010, General Motors has focused its research activities on the driver
assistance, voice enhancement and noise reduction categories, while Ford has shown an
interest in developing solutions in audio input/output and user interface technologies.
Patenting activity in voice enhancement and noise reduction suggests that assignees are
aiming to solve noise issues from within and outside the vehicle to provide efficient speech
recognition functionalities in automobiles.
Figure 14 charts the leading owners of patents related to the application of speech recognition
technology in automobiles. Nuance is the clear leader in this space with 42 patents,
while Robert Bosch, Harman International, Honda Motors and Panasonic own 11 patents each.
Google follows with 10 patents in speech recognition technologies applicable to the automobile
industry. Sony, Mitsubishi and Ford own nine patents each, followed by Johnson Controls,
Volkswagen and IBM with six patents each. Qualcomm, System Application Engineering Inc.
and Toyota complete the list of leading patent assignees in this segment, each owning six
patents related to speech recognition technologies that can be applied to automobiles.
Figure 14: Top Assignees of Patents in Speech Recognition Technology for Automobiles (Source: iRunway analysis based on patent data from USPTO)
Denso and Microsoft own a considerable number of patents in the audio input/output and user
interface categories, while General Motors has a significant portfolio of driver assistance and
communication patents. Alpine Electronics has patents equally distributed across the six
technology categories in the speech recognition taxonomy in the automobile landscape, while
Microsoft has a greater focus on navigation technology.
Figure 15 presents a distribution of patents of the top five assignees. iRunway’s analysis of the
top five assignees indicates that Nuance Communications has been focusing on development of
navigation, and noise reduction and voice enhancement technologies.
Figure 15: Patent Distribution of Top 5 Assignees in Automobile Categories (Source: iRunway analysis based on patent data from USPTO)
4.2.3 Seminal Patents in Automobile Applications
iRunway’s analysis revealed that the largest shares of seminal patents relate to communication (32%
of all seminal patents) and navigation technologies (23%). General Motors owns the largest number of
seminal patents in this space. Figure 16 presents a distribution of seminal patents in the automobile
taxonomy.
A major portion of the seminal patents on speech recognition in automobiles relates to
communication technology. This points to increasing research interest in improving
communication interfaces for drivers. About 23% of seminal patents in this patent set describe
technologies that can improve navigation. This is a field of growing interest for researchers,
boosted by GPS technology and the increasing use of navigation services
by users. An analysis of the top 10 seminal patent owners in speech recognition technology
categories for automobile applications shows that only General Motors owns a strong patent in
this space. A majority of these top 10 players are focusing on developing communication and
navigation technologies.
Figure 16: Distribution of Seminal Patents across Automobile Application Categories (Source: iRunway analysis based on patent data from USPTO)
Table 5 charts the distribution of seminal patents across application categories for the top 10
seminal patent owners.
Table 5: Seminal Patent Distribution of Top 10 Assignees across Automobile Application Categories (Source: iRunway analysis based on patent data from USPTO)
5 Litigation Trend in Patent Landscape
Of the 21,281 patents analyzed in the speech recognition domain, iRunway found 291 patents
involved in litigation and trials. Since 2000, these 291 patents have been involved in 745 patent
litigations, 67 Patent Trial and Appeal Board (PTAB) petitions and 10 International Trade
Commission (ITC) investigations. The litigation and trial filing trend, as seen in Figure 17,
shows a sudden spike in the number of litigations in 2012, with filings increasing progressively up to 2014.
A PricewaterhouseCoopers report14 states that there was a 29% increase in the number of
litigations filed in 2012 over 2011. This can be attributed to the anti-joinder provision of the America
Invents Act, which limits the joining of unrelated defendants in a single suit. Not surprisingly,
2012 saw a spike in the number of cases filed in speech
recognition technologies. Approximately 65% of the cases filed in 2012 were by four NPEs,
namely Blue Spike LLC (55 cases), EMG Technology LLC (18 cases), MCRO Inc. (27 cases) and
Ceecolor Industries LLC (13 cases). The technology sub-domains of these patents are
recognition, and storage and transmission. These patents were granted or assigned to the
plaintiffs in 2012, which propelled them to initiate litigation against multiple defendants. Only
5% of these cases are still open.
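As a rough check on the 2012 figures above, the four NPE case counts can be summed and the total number of 2012 filings backed out from the approximate 65% share. The implied total is an inference from a rounded percentage, not a figure reported in this study, and the variable names are illustrative.

```python
# Cases filed in 2012 by the four NPEs named in the text.
npe_cases_2012 = {
    "Blue Spike LLC": 55,
    "EMG Technology LLC": 18,
    "MCRO Inc.": 27,
    "Ceecolor Industries LLC": 13,
}

# Approximate share of all 2012 cases attributed to these four NPEs (per the text).
approx_share = 0.65

npe_total = sum(npe_cases_2012.values())

# Assumption: the 65% share is rounded, so the implied total is only an estimate.
implied_total_2012 = round(npe_total / approx_share)

print(f"Cases filed by the four NPEs in 2012: {npe_total}")
print(f"Implied total cases filed in 2012 (estimate): {implied_total_2012}")
```

Because the share is stated only approximately, the implied total should be read as an order-of-magnitude check rather than an exact count.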
Figure 17: Patent Litigation Trend in Speech Recognition Domain
(Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
Of the 291 patents involved in litigation, 112 patents have been identified as seminal in Section
3.3. Figure 18 below shows a technology distribution of seminal patents under litigation.
14 http://www.pwc.com/en_US/us/forensic-services/publications/assets/2013-patent-litigation-study.pdf
Figure 18: Distribution of Seminal Patents under Litigation (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
NPEs have been involved in 580 cases between 2000 and 2014. Operating companies have
been less active in litigation, initiating just 136 lawsuits. Figure 19 shows the distribution of
various entities in terms of percentage of litigations filed.
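The shares behind these counts can be approximated with a short calculation. This sketch assumes that the 745 litigations reported in this section cover the same 2000 to 2014 period as the NPE and operating company counts; the remainder is simply labeled "other" here, since the text does not classify it, and all names are illustrative.

```python
# Counts taken from the text of this section.
total_litigations = 745        # patent litigations since 2000
npe_cases = 580                # cases involving NPEs, 2000 to 2014
operating_company_cases = 136  # lawsuits initiated by operating companies

# Assumption: whatever is left over belongs to entity types not itemized in the text.
other_cases = total_litigations - npe_cases - operating_company_cases


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


npe_share = share(npe_cases, total_litigations)
operating_share = share(operating_company_cases, total_litigations)
other_share = share(other_cases, total_litigations)

print(f"NPEs: {npe_share}%")
print(f"Operating companies: {operating_share}%")
print(f"Other / unclassified: {other_share}%")
```

If the 745 figure turns out to cover a different period, the percentages shift accordingly; Figure 19 remains the authoritative breakdown.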
Figure 19: Patent Litigation Distribution by Type of Company (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
The top 15 plaintiffs and defendants involved in litigation are shown in Figure 20 and Figure 21
respectively. Interestingly, these plaintiffs have not pursued the majority of the cases, as is evident in
Figure 20.
Figure 20: Number of Open and Closed Litigations Filed by Top 15 Plaintiffs (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
It is evident from Figure 21 that major technology developers and mobile phone manufacturers
are the primary defendants in cases filed by NPEs. This trend can be attributed to the growing
application of speech recognition technologies on mobile devices.
Figure 21: Top 15 Defendants involved in the Litigations (Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
Recognition technology is the major area of litigation, followed by storage and transmission
technology. Figure 22 charts the distribution of litigated patents across technology categories.
Figure 22: Percentage of Litigated Patents across Technology Categories
(Source: iRunway analysis based on patent data from USPTO and litigation data from RPX)
6 Conclusion
Over the last twenty years, speech recognition technology has evolved into an everyday
application. Much of the activity has been driven by innovation, competition, user demands and
ease of application. Industry trends suggest that growth in speech engines and native
algorithms may be modest in the near future, while heavy research and development activity
will be witnessed in speech-based applications across everyday gadgets and machines.
The implementation of speech recognition applications in mobile devices, smartphones and
smart assistants has opened up a new arena for competition. Apple’s Siri, which has been
dominating this market for the last few years, now faces strong competition from Microsoft’s
Cortana, Google Now, Nuance Communications’ Dragon Go and many others. An appealing
voice interface and visual emoticons, combined with artificial intelligence, have made these
applications more user-friendly.
The automobile segment is witnessing less patenting activity than mobile device
applications. However, mobile devices are being increasingly integrated with the automobile
environment, and hence the dividing line between these two industries is getting finer by the
day. To offer better driving and navigation experiences, major automobile manufacturers are
researching and deploying handheld or pre-installed interfaces that assist drivers in the
vehicle. Over the last decade, there has been an increase in collaborations between
technology developers and automobile manufacturers to ensure better incorporation of speech-
controlled applications in automobiles.
While Microsoft and Nuance Communications have been the predominant patenting leaders in the
speech recognition landscape, Google has recently been making a mark with its aggressive
patenting activity. There is much happening in this segment, with players in the automobile and
mobile devices segments vying to create a common platform that can integrate speech
recognition applications across a variety of operating systems and offer seamless speech
command services.
7 Glossary
7.1 Technology Category
Recognition – This is the most important category in the speech recognition technology
landscape and encompasses patents that disclose inventions related to recognition of words
from speech data. Speech is separated into discrete components which are distinguished
from one another. Patents in this category disclose inventions related to word recognition,
voice recognition, speech to text conversion, and models and algorithms to achieve these
conversions. Phonetic, phonemic, lexical and semantic knowledge is used in conjunction
with statistical regression of mathematical models such as the Hidden Markov Model and its
derivatives to recognize human and computer readable words.
Storage and Transmission – Speech signals entered by a user are processed in multiple
steps to activate the desired action. Analog speech signals are digitized and computed to
decipher machine readable commands. The speech may be coded or transformed based on
mathematical functions and converted into computational data. Speech data is stored or
transmitted for further analysis. Correction and enhancement functions are applied on the
speech data to remove disturbances and noise. Challenges in meeting accuracy and
efficiency requirements for speech analysis have been addressed through parallel
distributed computing of speech data using artificial neural networks.
Linguistics – Recognizing speech is impossible without an understanding of the language.
Linguistics includes patents that disclose inventions pertaining to construction of a word,
phrase or sentence in a language. This involves building a dictionary, formulating grammar
rules, language translation and natural language processing.
Synthesis – Synthesis of speech is artificial simulation of human speech. Synthesis
includes patents that disclose the processes of combining speech components to produce a
synthetic speech output. This category includes patents related to text to speech conversion
techniques and various models and methods to do so.
7.2 Mobile Device Application Categories
Hands-Free Computing - Hands-free computing refers to a configuration that allows a user
to interact with or control a device without the use of hands or other equipment such as a touch
screen, keyboard or mouse. Mobile device applications can be controlled by providing a
voice input. A speech recognition system present in the portable device identifies voice
commands and performs the operation.
Authentication & Identity Management - Authentication and identity management refers
to management of a user’s access to a certain application or a feature available in a portable
device. Voice input provided by the user is compared to an existing voice print present in the
portable device. The user is provided access to a portable device only if the voice input
matches a pre-existing voice print. Examples include unlocking the touch
screen based on a voice command, caller identification based on voice, and performing business
transactions.
Database Annotation/Searching - Database annotation refers to annotating the data
present in a database. Database searching refers to performing a search for specific data
stored in a database or remote server. A voice command requesting a search for specific
data in a database is obtained, and information corresponding to the request is sent to the
portable device based on the voice command. For example, operations such as retrieving
audio content, extracting phone numbers, etc., come under this category.
Language Learning & Dictionary Applications - This category deals with technology that
enables mobile computing devices to learn new words and language models, handle phonetic
representations of words in different languages and perform translation operations. Language
learning techniques also allow the device to learn words and pronunciations specific to
certain users, and additionally create different language profiles for multiple users. For
example, operations such as pronunciation correction, adding words to a dictionary, etc., fall
under this technology category.
Interactive Voice Response System (IVRS) - An Interactive Voice Response System is
an automated telephony system, where a caller can interact with a computer. IVR systems
are used extensively in banking, airline services, pharmacies and many other places to guide
customers through various processes. Callers can interact with the computer through Dual
Tone Multi Frequency (DTMF) signals, which are unique tones associated with keys on a
telephone keypad. Alternatively, speech recognition systems can also be employed where
callers can use voice commands to interact with the IVR system.
Transcription & Text to Speech Applications - Transcription in mobile devices allows
accurate conversion of voice to text. Transcription has several applications in mobile
computing devices, such as typing and browsing through menu items using voice commands.
Text to speech functionality, on the other hand, enables the computing device to read aloud
text that is entered by the user or is the output of any computing process. Text to speech
applications include, but are not limited to, reading emails and e-books.
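The DTMF signalling mentioned under the IVRS entry above can be illustrated with a minimal sketch: each keypad key is encoded as the sum of one low-group (row) tone and one high-group (column) tone. The frequency values are the standard DTMF assignments; the function and variable names, the 8 kHz sample rate and the 50 ms duration are illustrative assumptions, not details taken from this report.

```python
import math

# Standard DTMF frequency groups in Hz: one row tone plus one column tone per key.
ROW_FREQS_HZ = (697, 770, 852, 941)
COL_FREQS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map every key to its (row, column) frequency pair.
DTMF_FREQS = {
    key: (ROW_FREQS_HZ[r], COL_FREQS_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def dtmf_samples(key, duration_s=0.05, sample_rate_hz=8000):
    """Return audio samples in [-1, 1] for one key press (hypothetical helper)."""
    low, high = DTMF_FREQS[key]
    n = round(duration_s * sample_rate_hz)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate_hz)
        for t in range(n)
    ]


print(DTMF_FREQS["5"])
print(len(dtmf_samples("5")))
```

An IVR system that accepts keypad input detects which two frequencies are present in the incoming audio and looks up the key; speech-enabled IVR systems replace or supplement this step with a speech recognizer.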
7.3 Automobile Application Categories
Audio Input/Output and User Interface - Audio Input/Output and User Interface refers
to technology used in smart devices in automobiles that receive and process speech or
audio input through their user interface. It also encompasses technology for outputting audio
and video signals to allow user interaction. Audio input/output devices used in automobiles
for receiving speech inputs, UI devices for processing speech commands, display screens to
guide and facilitate the driver, and audio/video output media to display or play the
requested data are a few examples.
Communication - This category deals with communication established between the driver
of an automobile and a remote server that provides information and data services. The driver
establishes communication with the server and provides voice inputs requesting information.
The server responds with the information requested. Telecommunication devices like mobile
phones and smartphones used from an automobile can serve as a communication medium.
Driver Assistance - This category deals with technology implemented in automobiles to
provide assistance to drivers. The driver may control entertainment devices, access emails,
select multimedia files, etc. by providing voice input to a device. In addition, the driver may
be alerted if the smart device senses that he/she is drowsy while driving.
Maneuver - Maneuver refers to technology used in automobiles to enhance the driving
experience. This may include automatic control of the steering wheel, brakes, window panes,
wipers, etc. through voice commands, and equipping robots for automatic control of brakes and other
driving-related equipment. The driver of the vehicle may be able to control the movement of
the vehicle remotely through voice commands.
Navigation - Navigation refers to location services provided to the driver based on voice
commands. The driver may request directions to a specific location through speech inputs
to the navigation device present in the automobile. This category also includes automatic
suggestions of nearby points of interest such as restaurants, shops, pharmacies, hospitals,
etc. based on the location of the vehicle or pre-defined user preferences.
Voice Enhancement and Noise Reduction - Voice enhancement refers to technology used
in automobiles to enhance the quality of speech command recognition by reducing noises
associated with an automobile environment. Noise in automobiles can be internal, due to
various mechanical reverberations, or external (for example traffic, neighboring vehicles).
Identifying user commands from the audio input requires dampening of noise and enhancement
of voice.
8 Authors
The authors acknowledge the many contributions of Purnima Lodha, Sarfraz Shariff and Ambika
Ashirwad Mohanty towards shaping this report.
Aditi Das
Aditi is a Consultant at iRunway. She provides key technical insights that help IP
attorneys profoundly improve licensing and litigation outcomes. She specializes
in data mining, technology and patent landscape, patent infringement and
validity analyses, and helps clients in multiple technology domains.
Ashish Gupta
Ashish is a Consultant at iRunway. He is a strategy analyst who provides new
licensing outlooks for clients. He works on high-profile technology litigation and
licensing programs, helping attorneys and clients find crucial infringement
evidence.
Bhargav Ram
Bhargav is a Senior Associate at iRunway. He specializes in data mining and
provides in-depth analyses to extract meaningful insights that provide lead
indicators to support complex IP monetization matters of clients.
Contact
United States
1114 Lost Creek Blvd, Suite 400
Austin, TX 78746
Tel: +1 512 284 8200
2905 Stender Way, Unit 28
Santa Clara, CA 95054
Tel: +1 650 308 4807
8400 E. Crescent Parkway, Suite 600
Greenwood Village, CO 80111
Tel: +1 720 528 4273
India
1st Floor, AMR Tech Park I Annex
No. 23 and 24, Hongasandra
Hosur Road, Bangalore - 560068
Tel: +91 804 058 4000
www.i-Runway.com
iRunway® is a registered trademark of iRunway India Private Limited.
iRunway has prepared this research independently based on reliable public data and reviewed the results based on its proprietary methodology, with the belief that it is fair and not misleading. The preparers of the information in this report are not engaged in rendering legal or other professional advice, and nothing in this document should be construed as such. iRunway does not practice law and it exists to provide technical research, analysis and reporting capability to its clients.
The patent data in this report is as of the date of preparation and therefore is subject to change as new patents are filed and issued every day. Neither iRunway nor any employee of iRunway accepts any liability for any damages or losses, direct, indirect or consequential, arising from any use or interpretation of this report or its contents.