IDOL presentation

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP IDOL Speaker’s name Month day, 2014

Transcript of IDOL presentation

Page 1: IDOL presentation


HP IDOL

Speaker’s name

Month day, 2014

Page 2: IDOL presentation


Video surveillance

Wire tapping

Internet of Things

Facebook likes

Tweets

Drones

Online shopping

Search queries

RBMS Social sentiment

CRM

Web logs

User clickstreams

Business data feeds

Mobile

SMS/MMS

User generated content

Apps YouTube

Service logs

The dawn of the information era

Page 3: IDOL presentation


Improve customer relationship

Extend life expectancy

Deliver better, smarter products

Ensure governance & compliance

Protect and save lives

HP IDOL makes data matter

Page 4: IDOL presentation

Understanding meaning is the key to solving information challenges

Risk modeling, Fraud detection, Competitive advantage, Behavior analysis, Knowledge delivery

?

Volume, Velocity, Variety, Veracity

Fin Services, Manufacturing, Life Sciences, Hospitality, Government, Telecom, Retail, Entertainment, Energy, Healthcare, Media

Future challenges

Page 5: IDOL presentation


Understanding human information

• Access and understand virtually any source of information on-premise and in the cloud

• A strategic pillar of HP’s HAVEn Big Data platform

• Non-disruptive, manage-in-place approach complements any organization

Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, IT/OT, Documents, Search Engine, Images

Harnessing the power

Page 6: IDOL presentation


86% of corporations cannot deliver the right information, at the right time, to support enterprise outcomes all of the time³

³Source: Coleman Parkes Survey November 2012

Keyword, metatags, database technologies often fail

Legacy technologies fall short

• Manual process does not scale

• Multiple definitions of the same word

• Not real-time

• Inaccurate and subjective

• Limited definitions, no relativity

• No idea distancing

• Interoperability of tagging

• Retroactive reporting

Page 7: IDOL presentation


How does HP Autonomy approach human information?

Adaptive: Continuous learning based on incoming data and context

Probabilistic: Mathematical, language-independent technology

Concept: Extract main concepts present in information

Modeling: Combination of proprietary technology and proven industry-standard methodologies

Page 8: IDOL presentation


HP IDOL: Key enabling technology

• Mathematically based

• 15 years and over $280M in R&D

• >170 Patents

• Language independent

• Built for infrastructure

• All file types, all media types (voice/video)

• Scalable and with security

• Platform/OS/device agnostic

• Managed in place

Page 9: IDOL presentation


Powered by IDOL, the OS for human information

Social Media, Video, Audio, Email, Texts, Mobile, Transactional Data, Documents, IT/OT, Search Engine, Images

Apps for Exploratory Information Analytics

Apps for Information Governance and Management

Apps for Marketing Optimization

HP Autonomy connectors

Developer/Partner

External/Cloud

HP Autonomy Enterprise

Applications

The OS for human information

Repositories

Information types

OS service layers 500+ functions

DigitalSafe SharePoint Hadoop

CRM

Jive

Exchange

Relational DB

ACA AeD

WorkSite HP Records Mgr MediaBin

Data Protector Connected LiveVault

Driven by advanced analytics to understand data in context from any source

Page 10: IDOL presentation


Over 500 IDOL functions to augment your intelligence

Automatic hyperlinking

Conceptual search

Keyword search

Fieldtext search

Phrase search

Phonetic search

Field modulation

Fuzzy matching

Implicit profiling

Explicit profiling

Community and expertise network

Agents

Intent-based ranking

Alerting

Social feedback

Eduction

Automatic clustering

Clustering 2D/3D

Autoclassification

Auto language detection

Sentiment analysis

Automatic taxonomy generation

Automatic query guidance

Highlighting

Parametric refinement

Summarization

Real-time predictive query

Metadata extraction

Automatic tagging

Faceted navigation

Inquire: Search your data

Investigate: Analyze your data

Interact: Personalize your data

Improve: Enhance your data

Page 11: IDOL presentation


Search your data

Accuracy
• Conceptual, Keyword or Object
• Extensive Field combinations
• Full Meta Search

Robust Architecture
• Linearly Scalable
• Fault Tolerant
• Disaster Recovery Friendly

Reach
• All Information
• Real-Time Data
• Audio and Video

Security
• Mapped Security
• Fully Extendable
• Leverages Existing Security

Inquire: “Search your data”

Investigate: “Analyze your data”

Interact: “Personalize your data”

Improve: “Enhance your data”

Page 12: IDOL presentation

Analyze your data

Quickly evaluate the relevance of information

• Automatic Query Guidance (providing top themes from query results in real time)

• Concept navigation via advanced visualizations (node graphs, theme tracking, topic maps, broadcast analysis)

• Intelligent summarization (simple, concept and context)

• Intelligent highlighting (search terms, phrases, concepts, context, fidelity to query grammar)

• Concept streaming (Real-time summaries from audio that are contextual to queries and intent)

• Intelligent de-duplication, including “near” de-duplication

Use structure to navigate the data

• Structured, semi-structured and XML support

• Parametric search (unlimited nesting and association support)

• Directed navigation (create compelling navigation for users)

Page 13: IDOL presentation

Personalize your data

We are what we…

Page 14: IDOL presentation


Personalize your data

Explicit profiling (agent): user-defined
• Define your interest using:
  - Natural language descriptions
  - Keyword/Boolean rules
  - Refine by example
• Automatically monitor information
• Customizable
• Share interests with knowledge community

Implicit profiling: capturing behavior data
• Fully automatic
• Ongoing monitoring of data consumption and contribution
• Multi-faceted profiles
• Always up-to-date

Dynamic communities of interest
• Expert identification
• Define business rules to guide relationships
• Automatically form and manage community
• Collaboration networks
• Document rating
• Consumer groups

Expertise, Communities, Agents, Profiles

Page 15: IDOL presentation

Exploratory analytics that help you discover the “unknown unknowns”

Enhance your data

Managed classification
• Create categories using business rules or training

Automatic classification and clustering
• Automatically determine categories based on patterns and relationships in information
• Spot analysis of all themes and grouping
• Time-sensitive analysis: What’s hot? What’s new?

Eduction
• Apply structure to unstructured data by extracting key fields and entities
• Hundreds of entities supported, including names, addresses, credit card information, sentiment, intent, etc.

Audio analysis
• Speaker-independent speech to text, speaker identification, audio events, language identification, etc.

Image and video analysis
• Next-generation image classification (is this a car?/find more like “this”)
• On-screen OCR, logo detection, intelligent scene analysis, color and texture analysis, story segmentation, etc.

Page 16: IDOL presentation


HP Autonomy solutions family, powered by IDOL

IDOL

Compliance

Litigation Readiness

Storage Optimization

Database Archiving

eDiscovery

Supervision

Legal Hold

Enterprise Search & Analytics

Voice of the Customer

Voice of the Worker

Media Intelligence

Video Surveillance

Big Data Analytics

Knowledge Mgmt

Content Access & Extraction

Records Mgmt

Legal Content Mgmt

Business Process Mgmt

Document Mgmt

Records Mgmt

Legacy Clean Up

Server Data Protection

Virtual Machine Data Protection

Remote & Branch Office Data Protection

Endpoint Device Data Protection

Cloud Data Protection

Enterprise Content Mgmt

Archiving & eDiscovery

Data Protection

Web Experience Mgmt

Web Optimization

Search Engine Marketing

Marketing Analytics

Contact Center Mgmt

Rich Media Mgmt

Aurasma - Augmented Reality Mobile Experience

Digital Marketing Experience

Information Analytics

Information Management & Governance

Marketing Optimization

Hybrid

OEM

Software

Cloud for human information

Page 17: IDOL presentation


HP technology powered by IDOL

Enterprise Group: HP StoreAll, HP StoreOnce, HP Gen 8 Appliances

Enterprise Services: HP Social Command Center, HP Information Governance

PPS: HP Flow, HP Live Photo, HP Connected Backup

Big Data: IDOL + Hadoop, IDOL + Vertica, HAVEn

Security: IDOL + ArcSight

HP Labs: Compass

Page 18: IDOL presentation

Foundational methodology

Page 19: IDOL presentation

Strong information and weak information

Key words are small amounts of very strong information without context

Larger amounts of weaker information are what humans refer to as “context”

“Mercury”

Is it a planet? Is it an element? Is it a car?

“A heavy element and the only metal that is liquid at standard conditions for temperature and pressure with the symbol Hg and atomic number 80, commonly known as quicksilver”

With high certainty: it’s an element!
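The way many weak context words can outweigh one ambiguous keyword can be sketched with a small naive Bayes calculation. The senses, priors and per-word likelihoods below are invented for illustration; they are not IDOL's model or data.

```python
import math

# Hypothetical senses of the keyword "mercury", each equally likely a priori.
PRIORS = {"planet": 1 / 3, "element": 1 / 3, "car": 1 / 3}

# Invented likelihoods P(context word | sense); unseen words get a small floor.
LIKELIHOODS = {
    "planet": {"orbit": 0.20, "sun": 0.20, "metal": 0.01},
    "element": {"metal": 0.20, "liquid": 0.15, "atomic": 0.15, "symbol": 0.10},
    "car": {"engine": 0.20, "sedan": 0.15, "metal": 0.05},
}
FLOOR = 0.001


def sense_posterior(context_words):
    """Return P(sense | context) under a naive Bayes independence assumption."""
    log_scores = {}
    for sense, prior in PRIORS.items():
        score = math.log(prior)
        for word in context_words:
            score += math.log(LIKELIHOODS[sense].get(word, FLOOR))
        log_scores[sense] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


posterior = sense_posterior(["metal", "liquid", "atomic", "symbol"])
best_sense = max(posterior, key=posterior.get)
print(best_sense, round(posterior[best_sense], 4))
```

With no context words the three senses stay equally likely; each additional context word shifts the posterior a little, and together they settle on one sense.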

Page 20: IDOL presentation


Uses pattern-matching and probabilistic modeling to form an understanding of content

HP IDOL understands the meaning of information

Fundamentally language-independent
• Treats words as symbols

Allows incoming data to dictate the model, not pre-defined rules or dictionaries
• Adapts to changing definitions

Optimized with language packs
• Eduction, sentiment analysis, speech analytics

Information Theory and Bayesian Inference

Page 21: IDOL presentation


Best-in-class combination of approaches

XML and Boolean+

Natural language processing

Probabilistic

Page 22: IDOL presentation

If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st?

Traditional probability says: 50%

Page 23: IDOL presentation

If we toss a coin 100 times and get heads every time, what’s the probability of getting a head on the 101st?

Adaptive intelligence: prior information changes the model of understanding

Bayesian Inference says: 99+%
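One standard way to arrive at a figure like this is Laplace's rule of succession: start from a uniform Beta(1, 1) prior over the coin's bias and update it with the observed tosses. The slide does not say which prior it assumes, so the uniform prior here is an assumption.

```python
from fractions import Fraction


def predictive_heads(heads, tails, prior_heads=1, prior_tails=1):
    """Posterior predictive P(next toss is heads) under a Beta prior.

    prior_heads and prior_tails are the Beta pseudo-counts; (1, 1) is uniform.
    """
    return Fraction(heads + prior_heads,
                    heads + tails + prior_heads + prior_tails)


p_next = predictive_heads(heads=100, tails=0)
print(p_next, float(p_next))  # 101/102, just above 0.99
```

Before any tosses the same function returns 1/2, which matches the "traditional" answer for a coin assumed to be fair.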

Page 24: IDOL presentation


What is in front of this wake?...

With high probability we can say there is a……..

BOAT!

Page 25: IDOL presentation

Page 26: IDOL presentation

Let’s play hangman

_ _ _ e _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ e _ _ _ _ _ _ _ _ _ _ _ _ _ t _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ e _ _ _ i _ _ _ _ i _ i _ _ t i _ _ _ _ i _ _ i _ _ _ i _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _

e t a i n o s r l d h c u m f p y g w v b k x j q z

Supercalifragilisticexpialidocious_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Page 27: IDOL presentation

Platform features

Page 28: IDOL presentation

Language independence

• Free from linguistic restraints and rules

• Automatically adapts to changing definitions

• Over 170 live customer languages

• Single, multibyte and Unicode languages

• Optional language packs for localization

Department of Homeland Security - Requires extremely precise handling of foreign languages, including Chinese and Arabic

Open V – China’s largest online video website

Page 29: IDOL presentation


HP IDOL powers the largest systems in the world

Scalability

Millions of users

• Dept of Defense: 2.5 million users

Billions of documents

• Large bank: Over 1 bn emails

• Pharma: 50 terabytes of data in discovery repository alone

High throughput

• Bloomberg: Alert on 46m emails per day

Page 30: IDOL presentation


Mapped security

• Fully integrated Kerberos authentication together with Secure Sockets Layer (SSL) encryption across all transactions

• Compliance with all major security standards, including US DoD 5015.2, UK TNA 2002, Australia’s VERS, ISO 15489

• Full range of customizable security functionality:

– Discretionary access control (ACL based)

– Mandatory access control (Based on metadata)

– Kerberized access to IDOL

– SSO authentication using Windows Active Directory

Single supplier to US Department of Homeland Security

Page 31: IDOL presentation


Intelligent compaction

• Pause and resume the operation without causing corruption

• Monitor the progress

• Skip large sections of the index when appropriate to expedite the operation

Page 32: IDOL presentation

http://host:ACIPort/action=admin

HP IDOL admin

Answer common questions and ease common actions:

• “Why is this query slow?”

• “What’s using up so much memory in my engine?”

• “Is my engine operating as expected?”

• “I need to perform some light maintenance (DREREPLACEs, etc) but don’t want to bother writing a perl script.”
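The admin page is reached through the same URL form as other ACI requests, which can be sketched as below. The host name and port are placeholders, and the `GetStatus` action and the helper name `aci_url` are assumptions for illustration; check action names against the documentation for your IDOL version.

```python
from urllib.parse import urlencode


def aci_url(host, aci_port, action, **params):
    """Build a request URL in the slide's form: http://host:ACIPort/action=..."""
    query = urlencode({"action": action, **params})
    return f"http://{host}:{aci_port}/{query}"


# Placeholder host and port; substitute the values for your own engine.
admin_url = aci_url("localhost", 9000, "admin")
status_url = aci_url("localhost", 9000, "GetStatus")
print(admin_url)
print(status_url)
```

Opening `admin_url` in a browser corresponds to the URL shown on the slide; extra keyword arguments are URL-encoded and appended as further parameters.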

Page 33: IDOL presentation

Architecture

Page 34: IDOL presentation

HP IDOL connector overview

Connector actions

• Synchronize (fetch)

• View

• Identifiers, Collect, Hold, ReleaseHold

• Insert, Delete, Update

[Diagram: Repository, Connector, Connector framework server, IDOL. Stages shown for the Connector framework server: document format detection, pre-import processing, KeyView filtering, post-import processing, index into IDOL. "Lua w/IDOL extensions" is labelled twice. A second panel shows several Repository/Connector pairs, one Connector framework server, a DIH and multiple IDOL servers.]

Page 35: IDOL presentation


HP IDOL data ingestion pipeline

Lua scripting engine is available within connectors

KeyView file format processing, Eduction and the Lua scripting engine are available within CFS

[Diagram labels: Repository, Connector (three pairs), Connector framework server, DIH, Content, IDOL Proxy, Index tasks, OCR, Audio/Video, Category, APA Agents]

Page 36: IDOL presentation


Provides a flexible way of batching, scheduling, routing, and aggregating information into IDOL servers

Distributed Index Handler (DIH)

Features

• Consistent hashing

• Batch indexing

• Index routing

• Virtual databases

• Categorization-based indexing

• Time-based indexing

Benefits

• Seamless integration with backend modules

• Resilience

• Scalability

• Flexibility
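Consistent hashing, the first feature listed, can be sketched as a hash ring. This is a generic illustration of the technique, not the DIH's actual implementation; the class name, replica count and document references are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps document references to servers; adding a server moves few keys."""

    def __init__(self, servers, replicas=100):
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server, replicas)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, server, replicas=100):
        # Each server owns several points on the ring to even out the load.
        for i in range(replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def server_for(self, doc_reference):
        point = self._hash(doc_reference)
        index = bisect.bisect_left(self._ring, (point, ""))
        if index == len(self._ring):
            index = 0  # wrap around the ring
        return self._ring[index][1]


ring = ConsistentHashRing(["IDOL Server 1", "IDOL Server 2"])
docs = [f"doc-{n}" for n in range(1000)]
before = {d: ring.server_for(d) for d in docs}
ring.add("IDOL Server 3")
after = {d: ring.server_for(d) for d in docs}
moved = sum(1 for d in docs if before[d] != after[d])
print(f"{moved} of {len(docs)} documents moved")
```

Only documents that now belong to the new server change place; everything else stays where it was, which is the property that makes adding capacity cheap.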

IDOL Server 1 IDOL Server 2

Distributed Index Handler

(DIH)

Connector

Mirror/non mirror index

Page 37: IDOL presentation


Intelligent query distribution

Distributed Action Handler (DAH)

Features

• Arbitrary distribution

• Mirrored configuration

• Non-mirror configuration

• Load balancing

• Fail-over

Benefits

• Linear scaling

• Improved performance

• Reduced processing time

• Robustness
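Load balancing and fail-over across mirrored engines can be sketched as a round-robin selector that skips engines marked as down. This is a generic illustration under that assumption, not the DAH's actual algorithm; the class and method names are hypothetical.

```python
import itertools


class MirrorDistributor:
    """Round-robin over identical (mirrored) engines, skipping failed ones."""

    def __init__(self, engines):
        self._engines = list(engines)
        self._down = set()
        self._cycle = itertools.cycle(self._engines)

    def mark_down(self, engine):
        self._down.add(engine)

    def mark_up(self, engine):
        self._down.discard(engine)

    def pick(self):
        # Try each engine at most once per call.
        for _ in range(len(self._engines)):
            engine = next(self._cycle)
            if engine not in self._down:
                return engine
        raise RuntimeError("no engine available")


dah = MirrorDistributor(["IDOL 1", "IDOL 2", "IDOL 3", "IDOL 4"])
first_round = [dah.pick() for _ in range(4)]
dah.mark_down("IDOL 2")
second_round = [dah.pick() for _ in range(3)]
print(first_round)
print(second_round)
```

In a non-mirrored configuration each engine holds different data, so a query would instead be sent to every engine and the results merged.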

[Diagram: DAH1, DAH2, DAH3 and DIH1, DIH2, DIH3 connected to IDOL 1, IDOL 2, IDOL 3 and IDOL 4 through 1-to-N links]

Page 38: IDOL presentation

Globally distributed system

Page 39: IDOL presentation

Advanced functions in depth

Page 40: IDOL presentation

HP IDOL retrieval methods

Conceptual

• Natural language

• Conceptual matching

• Unstructured refinement

Business rules

• Boolean

• Keyword

Parametric

• Structured refinement

Page 41: IDOL presentation

Over 100 operators for Boolean search

AND

OR

NOT

NEAR

NEARn

DNEAR

DNEARn

WNEAR

WNEARn

BEFORE

AFTER

EOR

WHEN

WHENn

vAND

vSUBSTRING

vMATCHES

NEAR

NEAR/n

SENTENCE

PARAGRAPH

BEFORE

AFTER

ORDER

SOUNDEX

MANY

[n] WORD

CASE

PHRASE

. >

. >=

. <

. <=

. !=

. =

LANG/x

TODAY

YESTERDAY

NOW

NOW+n

NOW-n

term

term*

term?

vOR

vNOT

vACCRUE

vANY

vALL

vIN

vWHEN

vCONTAINS

vENDS

vSTARTS

vSUBSTRING

vCONTAINS

vENDS

vSTARTS

FREETEXT

STEM

TYPO

TYPO/n

YES-NO

PRODUCT

SUM

COMPLEMENT

LOGSUM

LOGSUM/n

MULT

MULT/n

FREQ

term~

term[100]

term[*1.5]

"term"

"term phrase"

term:field

"term phrase":field

~term

FUZZY()

FUZZYnn()

SOUNDEX()

APCMMOD[]

term[~]

Page 42: IDOL presentation

Conceptual search

High recall and precision

• Return documents that do not contain query terms but are conceptually related

Input sentences or entire document as query

• Extracts main concepts in the query to deliver the most relevant results

Page 43: IDOL presentation

Automatic hyperlinking

• Automatically retrieves conceptually related content

• Searches automatically done for the user

• Increase productivity and reduce duplicate work

Page 44: IDOL presentation

Add context to short queries by grouping results into concepts

Automatic query guidance

Query “Madonna”

Results: Documents containing “Madonna”

Query search

Documents about:
1. Singer
2. Italian Renaissance
3. Madonna

Further suggestions…

Most likely meaning…

Result documents

Conceptual clustering

Page 45: IDOL presentation


Summarization

Quick summary (N+ lines)

Context summary (What is this doc about with relation to query terms?)

Concept summary (What do I look for with regards to interest rates?)


Information Theory and Bayesian Inference

Page 46: IDOL presentation

Directed navigation

Narrow search with facets

Page 47: IDOL presentation

Visualization of main topics

Page 48: IDOL presentation


Understanding the customer at the level of a dialog

Contextual segmentation

Geo + Demo + Psychographic segments

Behavioral segments

Functions

Performance

Feature Driven

Reviews

News

Adverts

Social media

Buzz driven

18-35 yrs

35-65

Seniors

Have Kids

Male

Female

Semantic segments

Large Screen

Lots of storage

High Res Display

Would give it 5 stars

Great Value for Price

Page 49: IDOL presentation

Intent-based ranking

Search results personalized and targeted based on user and context

Profile developed through complete behavior analysis… implicit or explicit profiling

Gather data from content consumption, content contribution, interaction with colleagues, etc.

Page 50: IDOL presentation

Foster collaboration by automatically matching and connecting employees with similar needs

Connect with your colleagues

Experts

Communities

Files

Social

Page 51: IDOL presentation

Clustering

Automatically partition the data so that similar information is clustered together

Example clusters: Product performance issues, Side letters, Off balance sheet transactions

Page 52: IDOL presentation

Topical sentiment analysis

Decomposition and classification within a sentence to pull out specific topics

“I stayed at the Marriott last week, and though the mattresses were very nice, the service was awful.”

Is this Positive? Negative? Neutral?

How much Positive? How much Negative?
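Splitting a sentence into clauses and scoring each topic separately can be sketched as follows. The topic list, polarity lexicon and intensifier weights are tiny hand-made examples, and the clause splitting is deliberately naive; this is not IDOL's sentiment model.

```python
import re

# Tiny hand-made lexicons, purely illustrative.
TOPICS = {"mattresses", "service", "room", "staff"}
POLARITY = {"nice": 1.0, "great": 1.0, "awful": -1.0, "dirty": -1.0}
INTENSIFIERS = {"very": 1.5}


def topical_sentiment(sentence):
    """Score each clause separately and attach the score to the topic it mentions."""
    scores = {}
    for clause in re.split(r"[,;]", sentence.lower()):
        words = re.findall(r"[a-z']+", clause)
        topics = [w for w in words if w in TOPICS]
        score = 0.0
        for i, word in enumerate(words):
            if word in POLARITY:
                # A preceding intensifier such as "very" scales the polarity.
                boost = INTENSIFIERS.get(words[i - 1], 1.0) if i > 0 else 1.0
                score += POLARITY[word] * boost
        for topic in topics:
            scores[topic] = scores.get(topic, 0.0) + score
    return scores


review = ("I stayed at the Marriott last week, and though the mattresses "
          "were very nice, the service was awful.")
result = topical_sentiment(review)
print(result)
```

The sentence as a whole is neither positive nor negative; the per-topic scores answer "how much positive, how much negative" for each thing the reviewer mentions.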

Page 53: IDOL presentation

Hundreds of conceptual entities

Eduction

Quickly narrow search results with auto-identified facets and conceptual entities such as employee names from documents

Validate or customize entities

• Is this a valid credit card number?

• What are all docs that contain SSNs?

• If area code is 415, output as Home Office
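The first validation question, "is this a valid credit card number?", is commonly answered with the Luhn checksum. The sketch below shows that checksum on its own; how Eduction wires in such a check is not described on the slide, and the function name is hypothetical.

```python
def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A pattern match finds strings that look like card numbers; the checksum then discards most random digit runs that merely have the right length.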

Pinpoint accuracy for multibyte languages such as CJK, Thai and some European languages

Names, Places, IP addresses, Companies, Events, Relationships, Medicines, Airports, Cars, Social Security numbers, Phone numbers, Credit cards, Dates, Holidays, Job titles, Currencies… many more

Page 54: IDOL presentation

Eduction

<Organization>
• National Security Agency

<Names>
• President Obama
• Vladimir Putin
• Edward Snowden

<Places>
• Moscow
• St. Petersburg
• Washington
• Syria
• Russia

Page 55: IDOL presentation


Search video as easily as text

Transform rich media into intelligent assets


Live video or playback from archived footage

On-screen text recognition

Face identification

Automatically generated transcript using speech recognition

Speaker identification

Timecode synchronization

Automatic keyframe generation

Automate: Automatically create metadata, keyframes, transcriptions

Understand: Understand video footage and audio streams in real time

Act: Apply advanced analytics such as clustering and categorization, and link with other file types

Page 56: IDOL presentation


Most advanced speech technology

Convert spoken words to text
• Acoustic + Language Model
• Speech-to-Text and IDOL’s conceptual understanding

Eliminate manually adding metadata to A/V clips

Phonetic approaches have major problems
• No Conceptual or Contextual Language Understanding
• Keyword-Based

Model of language disambiguates similar terms
• U.S. President “Bush”
• “bush” as in a large plant

Page 57: IDOL presentation

Limitations of phonetic search

Phonetic sounds do not have a unique match

Only capable of keyword matching
• “Cambridge University”
• /k ey m b r ih jh y uw n ih v er s ax t iy/
• The University of Cambridge
• Cambridge colleges
• Kings College
• Trinity Hall

/k/ /ae/ /t/

“cat”

“category”

“scatty”

“catalogue”

?
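The ambiguity can be reproduced with a toy phoneme lookup: searching for the sound sequence /k ae t/ hits every word that contains it. The pronunciations below are hand-written approximations for illustration, not entries from a real pronunciation dictionary.

```python
# Hypothetical ARPAbet-style pronunciations, written by hand for illustration.
PRONUNCIATIONS = {
    "cat": ["k", "ae", "t"],
    "category": ["k", "ae", "t", "ax", "g", "ao", "r", "iy"],
    "scatty": ["s", "k", "ae", "t", "iy"],
    "catalogue": ["k", "ae", "t", "ax", "l", "ao", "g"],
    "dog": ["d", "ao", "g"],
}


def contains_phonemes(phonemes, target):
    """True if `target` occurs as a contiguous run inside `phonemes`."""
    n = len(target)
    return any(phonemes[i:i + n] == target
               for i in range(len(phonemes) - n + 1))


def phonetic_matches(target):
    """Return every word whose pronunciation contains the target sounds."""
    return sorted(word for word, phonemes in PRONUNCIATIONS.items()
                  if contains_phonemes(phonemes, target))


matches = phonetic_matches(["k", "ae", "t"])
print(matches)
```

A search for "cat" by sound alone cannot tell these words apart, which is the limitation the slide is pointing at.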

Page 58: IDOL presentation

Accurate speech technology

Language-independent statistical algorithms to recognize speech

+

Language-dependent acoustic and language models for each supported language

How is the voice being recorded? Telephone models, Hz rates, broadcast models

What language + common phrases, product names, etc.

Trained dictionary with vocabulary and conceptual understanding

Recognized hypothesis


Front end processing

Speech

Page 59: IDOL presentation


Statistical models of speech and language

Speech-to-text technology

P(W) = probability of word string W

P(A|W) = probability of an acoustic sequence A given W

Use Bayes’ rule to find the word string W that has the highest probability given the acoustic sequence A


$$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} \frac{P(W)\,P(A \mid W)}{P(A)}$$

P(W): language model

P(A|W): acoustic model
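A minimal Python sketch of this decision rule, assuming just two candidate word strings for one piece of audio. Every probability here is invented for illustration; a real recognizer searches a very large hypothesis space. Because P(A) is the same for every candidate, it can be dropped from the arg max, and log probabilities are summed to avoid underflow.

```python
import math

# Hypothetical scores for one utterance; all numbers are invented for illustration.
LANGUAGE_MODEL = {  # P(W)
    "can I help you": 1e-4,
    "can eye help you": 1e-9,
}
ACOUSTIC_MODEL = {  # P(A|W) for the same fixed audio A
    "can I help you": 0.30,
    "can eye help you": 0.32,
}


def decode(language_model, acoustic_model):
    """Return the W maximizing log P(W) + log P(A|W); P(A) is constant in W."""
    return max(
        language_model,
        key=lambda w: math.log(language_model[w]) + math.log(acoustic_model[w]),
    )


if __name__ == "__main__":
    print(decode(LANGUAGE_MODEL, ACOUSTIC_MODEL))
```

In this toy setup the acoustic score alone slightly prefers the wrong string, and the language model term is what tips the result to “can I help you”.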

Page 60: IDOL presentation


Language model provides probability of word sequence

Forms a conceptual understanding of language

“Can I help you?” vs. “Can eye help you?”

Trained from large text corpora (Hundreds of millions of words)

Defines words that can be recognized

Use training text, e.g., broadcast news

Encompasses topic information, colloquial phrases, etc.

Adaptable for a particular customer

Specialist vocabulary, e.g., product names
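How a language model assigns a probability to a word sequence can be illustrated with a bigram model in Python. This is a sketch under loud assumptions: the four-sentence corpus is a made-up stand-in for the large training text described above, and add-one smoothing over bigrams is a textbook technique chosen for brevity, not a description of IDOL's model.

```python
import math
from collections import Counter

# Tiny stand-in corpus; a real model is trained on hundreds of millions of words.
CORPUS = [
    "can i help you",
    "can i see you",
    "i can help",
    "my eye hurts",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> and </s> as sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    uni, bi = train_bigrams(CORPUS)
    for text in ("can i help you", "can eye help you"):
        print(text, log_prob(text, uni, bi))
```

Both sentences sound alike, but “can i help you” scores higher because the pairs “can i” and “i help” occur in the training text while “can eye” and “eye help” do not. Adding customer text containing product names to the corpus would extend the vocabulary in the same way.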


Page 61: IDOL presentation


Acoustic model analyzes the sounds that comprise a spoken language


Audio analyzed to extract energy at various frequencies

Dependent on audio format

Complex statistical techniques model both the sounds and audio characteristics
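The step of extracting energy at various frequencies can be sketched with a plain discrete Fourier transform in Python. This is a simplified illustration, not the product's front end: the band edges, the 25 ms frame, and the 8 kHz sample rate are assumptions, and practical systems generally use windowed FFTs and perceptually spaced filter banks.

```python
import cmath
import math


def band_energies(frame, sample_rate, bands):
    """Sum squared DFT magnitudes of `frame` inside each (low_hz, high_hz) band."""
    n = len(frame)
    energies = [0.0] * len(bands)
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        for i, (low, high) in enumerate(bands):
            if low <= freq < high:
                energies[i] += power
    return energies


# A 440 Hz tone sampled at 8 kHz; 200 samples is one 25 ms frame.
SAMPLE_RATE = 8000
FRAME = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(200)]
BANDS = [(0, 300), (300, 600), (600, 4001)]

if __name__ == "__main__":
    print(band_energies(FRAME, SAMPLE_RATE, BANDS))
```

Nearly all of the energy lands in the 300 to 600 Hz band. The dependence on audio format shows up in `sample_rate`: the same samples read at a different rate would map to different frequencies.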

Page 62: IDOL presentation


Image technology: Text

Document field extraction


<item><price>$6.23</price><date>10/2/2012</date><purpose>Lunch</purpose>…</item>
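Once fields are extracted into a record like the one above, downstream code can read them with any XML parser. A short Python sketch using the standard library follows; the elided part of the slide's record (“…”) is left out because its content is unknown, and the helper name `fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's record with the elided part left out, since its content is unknown.
RECORD = (
    "<item><price>$6.23</price><date>10/2/2012</date>"
    "<purpose>Lunch</purpose></item>"
)


def fields(xml_text):
    """Map each child tag of the root element to its text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(fields(RECORD))
```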

OCR: Read text from images

1D and 2D barcode reading

• ISBN (“9870140189865”)
• PDF-417 (“LASTNAME, FIRSTNAME,…”)
• Data Matrix (“The Future of Ticketing…”)
• Many more (about 20 barcode types)
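A decoded barcode is normally confirmed with its check digit. The Python sketch below implements the standard ISBN-13 (EAN-13) rule, where digits are weighted 1, 3, 1, 3, ... and the total must be a multiple of 10. The function name is hypothetical. As an aside, the digits as printed on the slide do not pass this check, while the same digits with a 978 prefix do; that may point to a transposition in the slide text, but this is an inference, not something the source states.

```python
def isbn13_is_valid(code):
    """Check the ISBN-13 / EAN-13 check digit: weights alternate 1, 3, 1, 3, ..."""
    if len(code) != 13 or not code.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code))
    return total % 10 == 0


if __name__ == "__main__":
    for candidate in ("9870140189865", "9780140189865"):
        print(candidate, isbn13_is_valid(candidate))
```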

OCR handles:
• Image artifacts such as wrinkled paper
• Avoiding non-text parts of the image
• Column understanding

Page 63: IDOL presentation


Image technology: 2D objects

[Figure: registered image and test image]

Generic logo recognition

[Figure: registered logos and test image]


Page 64: IDOL presentation


Image technology: Human analysis

Primary clothing color = white; not nude

Primary clothing color = white; not nude

Primary clothing color = black; not nude

Face detection

Face analysis

Found “President Obama” face

Page 65: IDOL presentation


Hadoop/HDFS connector

Ingest Hadoop data into IDOL for advanced retrieval

Extract metadata, enrich and conduct advanced analytics for files stored in Hadoop

Push enterprise documents into Hadoop (chat data, ODBC, documents) for MapReduce analysis

Collect documents in Hadoop for legal collection

Page 66: IDOL presentation


What’s new

Page 67: IDOL presentation


HP IDOL 10

Extending leadership in human information analytics

More powerful Easier to operate Reliable

• Analyze sentiment at a granular level

• Automatically extract 100s of entities for improved search

• Enhance your Hadoop investment

• Deliver search results personalized for each user

• Improve audio and image analysis

• Increase query speed by up to 30%

• Quickly answer performance-related questions with our new visual dashboard, IDOL Admin

• Dynamically expand capacity without re-indexing for improved performance and no downtime

• Increase your indexing speed by as much as 47x with improved data transmission

• Recover intelligently from system failures with improved self-diagnosis of indices

• Securely delete content from your index

• Prevent the loss of documents during the indexing process

Page 68: IDOL presentation


Latest innovations in IDOL 10

Core IDOL algorithm enhancements
• Improved compaction
• Improved ability to repair indices
• Improved query speed
• Incremental backup and point-in-time restore

IDOL architecture improvements
• Indexing flow control
• IDOL Admin

Speech
• Multi-CPU support

Eduction
• Improved handling of multi-byte languages
• New grammars
• Degrees of sentiment analysis
• 3x performance improvement in sentiment analysis

Image
• Object detection
• Unified analysis

…and many more!

Page 69: IDOL presentation


Key strategic themes of HP IDOL development

Platform for search-based applications

Enable internal and external partners to more easily leverage IDOL as a platform to build applications

Strengthen core functionality

Improve existing areas (e.g., sentiment), and continue growing in new areas (e.g., image)

Simplified consumption

Easier to install with more robust features

Consumable from private and public cloud for rapid web services

Next-generation enterprise search

Reinvent enterprise search in the era of cloud, mobile, and social computing

Big Data / analytics

Enable IDOL as content analytics platform in the broader Big Data / Information Analytics ecosystem; integrate Hadoop

Page 70: IDOL presentation


Use cases

Page 71: IDOL presentation


Insert slides from relevant decks

Page 72: IDOL presentation


Thank you