Autonomy IDOL Server Technical Brief 1204 Rev1

7/22/2019 Autonomy IDOL Server Technical Brief 1204 Rev1


Technical Brief

Autonomy IDOL server 5

IDOL server

At the heart of Autonomy's software infrastructure lies IDOL

server, a scalable, multithreaded process based on advanced

pattern-matching technology that exploits high-performance

probabilistic modeling techniques.

Selected IDOL server operations

The intelligent operations that IDOL server performs across

structured, semi-structured and unstructured data are highly

customizable, offering a wide range of configuration

combinations that enable you to perform over 250 data operations.

1. Automatic Query Guidance

IDOL server's Automatic Query Guidance feature provides

an easy navigation facility that directs users to the results

they require based on a conceptual and contextual

understanding of their query. Instead of page ranking, an

approach that has proven ineffective in the link-free

enterprise, Automatic Query Guidance uses conceptual

clustering to determine the context of a user's search, and

presents the most appropriate results along with other

suggestions, even for queries of only one or a few words.

2. Dynamic Clustering

Query results are clustered on the fly to avoid information

overload and provide an overview of the different conceptual

aspects that results can be grouped into. The clustered

results are presented in an easily navigable hierarchy,

providing users with speedy access to the right information.

3. Hyperlinking

Hyperlinks can be automatically generated in real time.

These link to contextually similar content and can be used to

recommend related articles, documents, affinity products or

services, or media content that relates to textual content.

Because links are automatically inserted at the time a

document is retrieved, they can include references to

documents and articles written long before. Hyperlinks from

archived material can link to the latest news or material on

that subject.

Autonomy's software infrastructure uses sophisticated pattern-matching techniques to enable computers to understand information in

context. For the first time, a computer can go beyond keywords and metadata to identify concepts within text itself, determine the

concepts' importance and automate the processing of this content, regardless of its format, location, language and source application.

Using Autonomy Connectors, Autonomy's unique Intelligent Data Operating Layer (IDOL) integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real-time environment in which

operations across applications and content are automated, removing all the manual processes involved in getting the right information

to the right people at the right time.

IDOL server provides the following core information operations:

1. Automatic Query Guidance

2. Dynamic Clustering

3. Hyperlinking

4. Summarization

5. Taxonomy Generation

6. Categorization

7. Channels

8. Channel Recommendation

9. Clustering

10. CEN Clustering

11. Eduction

12. Agents

13. Profiling

14. Expertise Location

15. Collaboration

16. Alerting

17. Mailing

18. Spelling Correction

19. Dynamic Thesaurus

20. Retrieval - Lite

21. Retrieval - Concept

22. Retrieval - Parametric

23. Retrieval - Federated



4. Summarization

IDOL server accepts a piece of content and returns a

summary of the information. IDOL server can generate

different types of summary:

Conceptual summaries

Summaries that contain the most salient concepts of the

content

Contextual summaries

Summaries that relate to the context of the original inquiry -

allowing the most applicable dynamic summary to be

provided in the results of a given inquiry.

Quick summaries

Summaries that comprise a few sentences of the result

documents.

5. Taxonomy Generation

IDOL server's automatic Taxonomy Generation feature can

automatically understand and create deep hierarchical

contextual taxonomies of information. Clustering or any other

conceptual operation can be used as a seed for the

process. The resulting taxonomy can be used to provide

insight into specific areas of the information, to provide an

overall information landscape, or as training material for

automatic categorization, which then allows information to be

placed into a formally dictated and controlled category

hierarchy.

Automatic Taxonomy Based on Cluster Result

Based on cluster results, IDOL server can build Taxonomies

automatically and in real time.

Automatic Taxonomy to Category Generation

Once the Automatic Taxonomy Generation process has

taken place, IDOL server contextually understands the type of

data it is dealing with. From this a deep hierarchical contextual

taxonomy is generated, known also as an information

landscape. Much like the Automatic Cluster to Category

Generation, this feature takes the taxonomy results and uses

that data to create categories (in order to perform

categorization of information using the Categorization

operation).

6. Categorization

IDOL server can automatically categorize data with no

requirement for manual input whatsoever. The flexibility of

Autonomy's Categorization feature allows you to precisely

derive categories using concepts found within unstructured

text. This ensures that all data is classified in the correct

context with the utmost accuracy. Autonomy's Categorization

feature is a completely scalable solution capable of handling

high volumes of information with extreme accuracy and total

consistency.

Rather than relying on rigid rule-based category definitions

such as Legacy Keyword and Boolean Operators,

Autonomy's infrastructure relies on an elegant pattern-matching

process based on concepts to categorize

documents and automatically insert tag data sets, route

content or alert users to highly relevant information pertinent

to the user's profile.

This highly efficient process means that Autonomy is able to

categorize upwards of four million documents in 24 hours per

CPU instance. That is approximately one document every 22

milliseconds. Autonomy hooks into virtually all repositories

and data formats respecting all security and access

entitlements, delivering complete reliability.
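The per-document figure follows from dividing the length of a day by the quoted throughput. The short check below is only a back-of-the-envelope sketch; the function names are illustrative and nothing here reflects how IDOL server measures its own performance.

```python
# Back-of-the-envelope check of the quoted categorization throughput.
DOCS_PER_DAY = 4_000_000            # figure quoted in the brief, per CPU instance
MS_PER_DAY = 24 * 60 * 60 * 1000    # milliseconds in 24 hours


def ms_per_document(docs_per_day: float) -> float:
    """Average time budget per document, in milliseconds."""
    return MS_PER_DAY / docs_per_day


def docs_per_day_at(ms_per_doc: float) -> float:
    """Documents processed in 24 hours at a fixed per-document cost."""
    return MS_PER_DAY / ms_per_doc


print(f"{ms_per_document(DOCS_PER_DAY):.1f} ms per document")         # 21.6 ms
print(f"{docs_per_day_at(25):,.0f} documents per day at 25 ms each")   # 3,456,000
```

Four million documents in 24 hours therefore corresponds to about 21.6 milliseconds per document, whereas a flat 25 milliseconds per document would give roughly 3.46 million documents per day.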

Category Matching

IDOL server accepts a category or piece of content and

returns categories ranked by conceptual similarity. This

determines for which categories the piece of content is most

appropriate, so that the piece of content can subsequently be

tagged, routed or filed accordingly.
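The ranking step can be pictured with a deliberately simple stand-in. The sketch below scores categories by cosine similarity over plain term counts; this is an assumption made for illustration and is not IDOL server's probabilistic pattern matching. The function names and the sample categories are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_categories(content, categories):
    """Return (category, score) pairs, most similar first."""
    doc = term_vector(content)
    scored = [
        (name, cosine(doc, term_vector(training)))
        for name, training in categories.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical categories, each described by a little training text.
categories = {
    "finance": "bank interest rates loan market shares",
    "sport": "football match goal team league",
}
ranked = rank_categories("The bank raised interest rates on every loan", categories)
print(ranked[0][0])  # finance
```

The top-ranked category is the one the content would be tagged, routed or filed under; a score threshold could be added so that content matching no category well is left unfiled.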

7. Channels

IDOL server can automatically provide users with a set of

hierarchical channels with highly relevant information

pertinent to the respective channel. Eliminating the

requirement for manual intervention or pre-tagging, real-time

information is dynamically updated into the channels

automatically, minimizing the maintenance effort required.

Moreover, the administrator can add and remove channels

on the fly, without having to re-categorize all of the data.

8. Channel Recommendation

IDOL server's Channel Recommendation feature

automatically recommends conceptually matching channels

when a query is submitted to IDOL server, thus providing

users with instant access to relevant information in the

hierarchical channels.

9. Clustering

IDOL server delivers the ability to automatically cluster

information. Clustering is the process of taking a large

repository of unstructured data, agents or profiles and

automatically partitioning the data so that similar information

is clustered together. Each cluster represents a concept area

within the knowledge base and contains a set of items with

common properties.

Features:

Automatic clustering of information

Configurable sub-headings

Automatic title generation

Configurable results layout

Identify key areas of expertise

Complete overview of knowledge base.
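To make the idea of partitioning a repository concrete, here is a minimal sketch that groups documents greedily by term overlap (Jaccard similarity). The algorithm, the threshold and all names are assumptions chosen for brevity; the brief does not describe IDOL server's clustering method, which is based on conceptual rather than literal term matching.

```python
import re


def terms(text):
    """Set of lower-cased words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Overlap between two term sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.2):
    """Assign each document to the most similar existing cluster, or start a new one."""
    clusters = []  # each entry: {"terms": set of words, "members": [document index]}
    for i, text in enumerate(documents):
        doc_terms = terms(text)
        best = max(
            clusters, key=lambda c: jaccard(doc_terms, c["terms"]), default=None
        )
        if best is not None and jaccard(doc_terms, best["terms"]) >= threshold:
            best["members"].append(i)
            best["terms"] |= doc_terms
        else:
            clusters.append({"terms": set(doc_terms), "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    "interest rates rise at the central bank",
    "central bank holds interest rates",
    "football team wins league match",
    "league match ends in a draw for the football team",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Each returned group plays the role of one concept area; the most frequent terms in a group could serve as a crude automatic title.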



10. CEN Clustering

IDOL server provides Collaboration and Enterprise Network

(CEN) Clustering to automatically match clustered data

against user agents and profiles in order to identify data that

matches people's interests. User interfaces that integrate with

IDOL server (for example, Retina, Portal-in-a-Box or third

party portals) highlight matching data in a spectrograph and

enable on-the-fly display of community users who own

matching agents or profiles, providing an instant overview of

the community users' details and instant email contactability.

Features:

Automatic clustering of information

Automatic cluster / interest matching

Automatic highlighting of popular clusters

Identify key areas of expertise

Display community user details

Email community users

Encourage collaboration.

11. Eduction

Eduction identifies concepts in the document in order to add

tags to the kind of content you specify.

Features:

Tag training

Plain Tagging

ConceptValue Tagging

Negative Name training

Default User definable phrase tags

Case-sensitive user defined phrase tags.

12. Agents

Agents provide the facilities to find and monitor information

from a configurable list of Internet and Intranet sites, News

Feeds, Chat Streams and internal repositories highly relevant

to the explicit interests of a user. Agents are created in a very

user-friendly way using the following options:

Natural language descriptions

Example content (point and click)

Legacy Keyword or Boolean Expressions.

IDOL server provides the conceptual information that is

needed to create agents. The server accepts a piece of

content (training text, a document or a set of documents) or

reference (identifier) and returns an encoded representation

of the concepts, including each concept's specific underlying

patterns of terms and associated probabilistic ratings.

Agent Retraining

The server accepts an agent and a piece of content (training

text, a document or a set of documents) and adapts the

agent using the content.

Agent Alerting

The server accepts a piece of content (a sentence,

paragraph or page of text, the body of an email, a record

containing human readable information, or the derived

contextual information of an audio or speech snippet) and

returns similar agents ranked by conceptual similarity. This is

used to discover users who are interested in the content, or

to find experts in a field.

Agent Matching

The high performance agent matching solution enables

documents to be dynamically matched against any scale of

Boolean Agents. As content is indexed into IDOL server, the

content is matched against all Agent rules simultaneously

allowing targeted information to be delivered to the user in

real time.
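As a rough picture of matching one incoming document against many Boolean agents at index time, consider the sketch below. The nested-tuple rule format, the agent names and the functions are all hypothetical, and only AND, OR and NOT are handled; IDOL server's own agent syntax and matching engine are not described in this brief.

```python
import re


def evaluate(rule, doc_terms):
    """Evaluate a rule: either a term string or a tuple (operator, operand, ...)."""
    if isinstance(rule, str):
        return rule.lower() in doc_terms
    op, *operands = rule
    if op == "AND":
        return all(evaluate(r, doc_terms) for r in operands)
    if op == "OR":
        return any(evaluate(r, doc_terms) for r in operands)
    if op == "NOT":
        return not evaluate(operands[0], doc_terms)
    raise ValueError(f"unsupported operator: {op}")


def matching_agents(document, agents):
    """Return the names of all agents whose rule matches the document."""
    doc_terms = set(re.findall(r"[a-z0-9]+", document.lower()))
    return [name for name, rule in agents.items() if evaluate(rule, doc_terms)]


# Hypothetical agents owned by three users.
agents = {
    "alice-mergers": ("AND", "merger", ("OR", "bank", "insurer")),
    "bob-sport": ("OR", "football", "tennis"),
    "carol-no-rumours": ("AND", "merger", ("NOT", "rumour")),
}
print(matching_agents("Regional bank confirms merger talks", agents))
# ['alice-mergers', 'carol-no-rumours']
```

The document is tokenized once and then tested against every rule, which is the property that lets targeted content be delivered as soon as it is indexed.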

13. Profiling

IDOL server tracks the content with which a user interacts,

extracts a conceptual understanding of the content and uses

this understanding to maintain a profile of the user's

interests.

This profile is typically used to target information at particular

users, to recommend content to users and to alert users to the

existence of content.

14. Expertise Location

IDOL server facilitates the automatic recognition of highly

focused experts and reduces the duplication of effort through

teamwork and the engagement of proactive collaboration

ventures.

15. Collaboration

IDOL server automatically matches users with common

explicit interest agents or similar implicit profile agents. This

information can be used to create virtual expert knowledge

groups.

16. Alerting

IDOL server analyzes data in new documents, and compares

the concepts the documents contain with agents that users

have set up already. It then automatically sends email

notifications to users whose interests are similar to a new

document's content.

17. Mailing

IDOL server regularly emails users to notify them of content

that matches their agents and channels that they are

subscribed to.

Features:

Configurable email format through XSS templates.



18. Spelling Correction

IDOL server can automatically spell check query text that it

receives and suggest correct spellings for terms that it doesn't

contain. If a query contains several words that IDOL server

does not recognize, it offers a spelling suggestion for

each of these words.
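The behaviour described above (one suggestion per unrecognized word) can be sketched with Python's standard `difflib` module. This is only an illustration under the assumption that the server's known terms are available as a simple list; the function name, the cutoff value and the sample terms are hypothetical, and IDOL server's actual suggestion logic is not documented here.

```python
import difflib


def suggest_spelling(query, known_terms, cutoff=0.75):
    """Map each unrecognized query word to its closest known term, if any."""
    suggestions = {}
    for word in query.lower().split():
        if word in known_terms:
            continue  # the index already contains this term
        matches = difflib.get_close_matches(word, known_terms, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


# Hypothetical list of terms that the index contains.
index_terms = ["server", "cluster", "taxonomy", "profile"]
print(suggest_spelling("servr taxonmy cluster", index_terms))
# {'servr': 'server', 'taxonmy': 'taxonomy'}
```

Words that are already known are skipped, and words with no sufficiently close match receive no suggestion.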

19. Dynamic Thesaurus

IDOL server includes a sophisticated conceptual Thesaurus

which uses the most salient terms and phrases in the result

documents that a query produces in order to offer a selection

of alternative query strings. These strings allow a user to

quickly execute alternative queries in order to produce a

variety of relevant result sets.

20. Retrieval - Lite

IDOL server offers the following basic legacy search

methods:

Legacy Keyword

IDOL server accepts keywords and returns a list of

documents containing those terms, ordered by contextual

relevance to the query.

Boolean / bracketed Boolean

IDOL server accepts simple or complex Boolean and

bracketed Boolean expressions and returns a list of matching

documents. Boolean expressions can be formed using a

range of Boolean and proximity operators:

21. Retrieval - Concept

IDOL server provides the following sophisticated conceptual

retrieval operations:

Conceptual Matching

IDOL server accepts a piece of content (a sentence,

paragraph or page of text, the body of an email, a record

containing human-readable information, or the derived

contextual information of an audio or speech snippet) or

reference (identifier) as input, and returns references to

conceptually related documents ranked by relevance or

contextual distance. This is used to generate automatic

hyperlinks between pieces of content.

Proper Names

IDOL server recognizes names and treats them as a unit.

Active Matching

IDOL server accepts textual information describing the

current user task and returns a list of documents ordered by

contextual relevance to the active task.

Native XML Indexing

IDOL server can natively index plain, well-formed

XML directly. This feature involves minimal

configuration: only the document-level and field indexing

specifications are required.

Native XML Output

Users can specify the output format in which they require

information. If they do not specify an XML output format, the

default template is used.

Multiple XML Schema Support

Multiple simultaneous schema support - This feature enables

you to index multiple XML sources with varying XML

schemas (tag names/hierarchies) into IDOL server. IDOL

server's intelligence will perform conceptual analysis across

all the different schemas. Users have the option to specify

the output format of information.

Automatic XML Tagging

IDOL server can automatically XML tag any form of

unstructured information based on the same process used for

tag reconciliation.

22. Retrieval - Parametric

Advanced Parametric Refinement is used to provide an

improved user experience coupled with increased productivity

via an advanced real-time information discovery process.

Real-time navigation across multiple taxonomies is supported

with no additional manual configuration necessary, including

full access to intersections of diverse taxonomy definitions.

Exact Phrase

Provides the ability to search for exact phrases by putting

quotation marks around a string of words. For example,

"world market".

Fuzzy Queries

If a search string is not quite accurate (for example, if it

contains spelling mistakes) a fuzzy query returns results that

contain words that are similar to the entered string. (Note that

you need to enable fuzzy queries before you can use them).

Proximity Search

IDOL server gives a higher weighting to documents in which

specific terms occur within a given proximity of each other.
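One way to picture proximity weighting is a score that is boosted when two query terms appear close together. The sketch below is an assumption-laden illustration: the window of five words, the linear boost and the function names are all invented for this example and do not describe IDOL server's actual weighting.

```python
import re


def tokenize(text):
    """Lower-cased word tokens in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_distance(tokens, first, second):
    """Smallest gap in word positions between the two terms, or None if either is absent."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_score(text, first, second, window=5):
    """0.0 if a term is missing, 1.0 if both occur, more if they fall within the window."""
    distance = min_distance(tokenize(text), first, second)
    if distance is None:
        return 0.0
    if distance > window:
        return 1.0
    # Linear boost: adjacent terms score highest, terms at the window edge lowest.
    return 1.0 + (window - distance + 1) / window


print(proximity_score("world market", "world", "market"))                  # 2.0
print(proximity_score("world a b c d e market", "world", "market"))        # 1.0
print(proximity_score("world peace", "world", "market"))                   # 0.0
```

Documents would then be ordered by this score, so that those with the terms close together rank above those where the terms merely co-occur.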

Soundex Keyword Search

If the spelling of a keyword is not quite accurate but

phonetically correct, a Soundex keyword search returns

results that contain the keyword and phonetically similar

keywords (using a configurable Soundex algorithm).
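For readers unfamiliar with Soundex, the sketch below implements the classic American Soundex code (a letter followed by three digits). The brief says the algorithm is configurable but does not say which variant IDOL server uses by default, so treat this as a generic illustration rather than the server's behaviour.

```python
# Consonant groups of classic American Soundex; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word ('' for no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return "".join(result)[:4].ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```

Two keywords are treated as phonetically similar when their codes are equal, as with "Robert" and "Rupert" above.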

Boolean and proximity operators available for Boolean / bracketed Boolean expressions:

AND

NOT

OR

XOR / EOR

NEAR

DNEAR

WNEAR

BEFORE

AFTER



For parametric retrieval, a subset of the field names present

within the corpus can be defined in the server's

configuration as being of type 'Parametric'. These fields are known

as 'parametric' fields.

Once documents are indexed, IDOL server creates and stores a structure

containing information about all 'tag-value' pairs that occur

within the defined parametric fields (a 'tag-value' pair is defined

where a field contains a textual or numerical value; the

field name is considered paired with that value).

The user may then query IDOL server with the name of a

parametric field or fields. IDOL server returns a list of all

textual values that appear within the given field or fields

within the documents stored in the server.

This underlying operation can be used to power a user

interface that enables a user to gradually refine the scope of

a query from a complete corpus to the subset of documents

that contain information pertinent to the user's current

enquiry.
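The tag-value structure and the refinement loop it enables can be sketched as a small in-memory index. Everything below (the class, its method names and the sample fields) is hypothetical and intended only to illustrate the idea; it does not represent IDOL server's storage format or query interface.

```python
class ParametricIndex:
    """Toy store of tag-value pairs for fields configured as parametric."""

    def __init__(self, parametric_fields):
        self.fields = set(parametric_fields)
        self.pairs = {}       # field name -> value -> set of document ids
        self.doc_ids = set()

    def index(self, doc_id, fields):
        """Record the tag-value pairs of one document; other fields are ignored."""
        self.doc_ids.add(doc_id)
        for name, value in fields.items():
            if name in self.fields:
                self.pairs.setdefault(name, {}).setdefault(value, set()).add(doc_id)

    def values(self, field, within=None):
        """Values seen in a field, with document counts, optionally within a subset."""
        scope = self.doc_ids if within is None else within
        return {
            value: len(ids & scope)
            for value, ids in self.pairs.get(field, {}).items()
            if ids & scope
        }

    def refine(self, field, value, within=None):
        """Narrow the current document set to those carrying the given tag-value pair."""
        scope = self.doc_ids if within is None else within
        return self.pairs.get(field, {}).get(value, set()) & scope


index = ParametricIndex(["colour", "size"])
index.index(1, {"colour": "red", "size": "large", "title": "Red coat"})
index.index(2, {"colour": "red", "size": "small"})
index.index(3, {"colour": "blue", "size": "large"})

print(index.values("colour"))             # {'red': 2, 'blue': 1}
red = index.refine("colour", "red")       # documents 1 and 2
print(index.values("size", within=red))   # {'large': 1, 'small': 1}
```

A user interface would call `values` to populate its drop-down lists and `refine` each time the user picks a value, shrinking the result set step by step.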

23. Retrieval - Federated

Submit queries to a selection of third party search engines in

addition to IDOL server.

Additional functionality

Sentient Architecture

IDOL server's sentient architecture delivers on the concept of

autonomic computing for companies worldwide. Global

predictive self-management abstracts away the need for an

administrator, for example, by dynamically throttling IDOL's

connector layer to available bandwidth and a target site's

responsiveness together with the ability to predict windows of

opportunity for faster collection based on prior usage

patterns. This ability to support distributed architectures,

identify potential problems and prompt a real-time, dynamic

substitution enables companies to keep systems entirely

operational for users at all times. IDOL's sentient architecture

presents a robust solution for large, geographically

dispersed, multinational enterprises who seek to make all

their information assets readily available.

Failover / Distribution

Uninterrupted service is ensured through Failover. If IDOL

server should fail at any point, it is automatically restarted,

ensuring a stable system.

Automatic Language Detection

IDOL server can detect the language and encoding of

documents that it processes automatically. This allows you to

set up processes that are automatically applied to documents

or document metadata if they are in a specific language. For

example, if a document is identified as Chinese, the

appropriate preliminary linguistic tools are automatically

applied to it.

The Autonomy Service Dashboard provides central control.


DiSH / Dashboard

The Autonomy Service Dashboard is an intuitive stand-alone

front-end web interface that allows administrators to manage

all Autonomy modules / services running locally or remotely.

The Dashboard communicates with one or more Autonomy

Distributed Service Handler (DiSH) modules that provide the

back-end process for monitoring and controlling all the

Autonomy child services.

DiSH servers administration

View the DiSH servers in the enterprise

Display DiSH server information (version, ports, status,

start time etc.)

Add and remove DiSH servers to / from the dashboard

Edit the DiSH servers

View DiSH servers' configuration, license information and logs.

Services administration

View child services

Display child service information (version, ports,

status, start time etc.)

Add and remove child services

Edit child services

Configure child services

View child services' logs.

Control of services

Start, stop, pause or restart child services

Set up KeepAlives to ensure continuous service.

Monitoring services

Track service processing of documents

Automatically audit child service

Generate graphs for a child service's audit data.

Alerting

Allows setup of an email alert triggered by any statistic

Configuration of an alert triggered when certain

statistics values move outside a predefined range

Configuration of a periodic email alert containing

status summary reports.



Architecture

User Interfaces

Retina

IDOL server 5 includes Autonomy Retina, a web interface

application that provides a full spectrum of retrieval methods,

from simple keyword search to sophisticated conceptual

matching. Adjusting to the user's experience and proficiency,

Retina not only offers basic legacy search methods but also

leverages them through Autonomy's unique

pattern-recognition technology.

Please refer to the Retina Technical Brief for further details.

Portlets

Autonomy provides a wide range of Portlets that offer

user-friendly platforms from which IDOL server operations can be

intuitively executed. Autonomy Portlets are available as part

of the Autonomy Portal-in-a-Box solution or for integration

with a number of market-leading third party Portals.

Please refer to the Portlets Technical Brief for further details.

Requirements

Platforms Supported:

Microsoft Windows NT4, 2000, XP and 2003

Linux (all versions) kernel 2.2, 2.4 and 2.6

Sun Solaris for SPARC version 5 - 9

Sun Solaris for Intel version 9

AIX version 4.3, 5 and 5.1

HP-UX for PA-RISC version 10, 11 and 11i

HP-UX for Itanium version 11i

Tru64 version 5.1

Other POSIX-compliant UNIX versions are available on request.

Minimum Server Specifications:

Dual Intel Xeon 1.8 GHz

1 GB RAM

30 GB hard disk recommended

For specific sizing requirements, please consult the Autonomy

Sizing Service.

www.autonomy.com

Autonomy Inc.

One Market Plaza,

19th Floor,

Spear Tower,

San Francisco, CA 94105

Tel: 415 243 9955

Fax: 415 243 9984

Email: [email protected]

Autonomy Systems Ltd

Cambridge Business Park

Cowley Road

Cambridge CB4 0WZ

Tel: +44 (0) 1223 448 000

Fax: +44 (0) 1223 448 001

Email: [email protected]

Other Offices

Autonomy has additional offices in Boston, Dallas, Chicago,

Washington and New York, as well as in Amsterdam, Beijing,

Diegem, Hamburg, Madrid, Milan, Munich, Oslo,

Paris, Rome, Singapore, Stockholm and Sydney.

Copyright 2005 Autonomy Corp. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners.
