Autonomy IDOL Server Technical Brief 1204 Rev1
7/22/2019 Autonomy IDOL Server Technical Brief 1204 Rev1
Technical Brief
Autonomy IDOL server 5
IDOL server
At the heart of Autonomy's software infrastructure lies IDOL
server, a scalable, multithreaded process based on advanced
pattern-matching technology that exploits high-performance
probabilistic modeling techniques.
Selected IDOL server operations
The intelligent operations that IDOL server performs across
structured, semi-structured and unstructured data are highly
customizable, offering a wide range of configuration
combinations that enable you to perform over 250 data operations.
1. Automatic Query Guidance
IDOL server's Automatic Query Guidance feature provides
an easy navigation facility which directs users to the results
they require based on a conceptual and contextual
understanding of their query. Instead of page ranking, an
approach that has proven ineffective in the link-free
enterprise, Automatic Query Guidance uses conceptual
clustering to determine the context of a user's search, and
presents the most appropriate results along with other
suggestions, even from few-word or single-word queries.
2. Dynamic Clustering
Query results are clustered on the fly to avoid information
overload and provide an overview of the different conceptual
aspects that results can be grouped into. The clustered
results are presented in an easily navigable hierarchy,
providing users with speedy access to the right information.
3. Hyperlinking
Hyperlinks can be automatically generated in real time.
These link to contextually similar content and can be used to
recommend related articles, documents, affinity products or
services, or media content that relates to textual content.
Because links are automatically inserted at the time a
document is retrieved, they can include references to
documents and articles written long before. Hyperlinks from
archived material can link to the latest news or material on
that subject.
Autonomy's software infrastructure uses sophisticated pattern-matching techniques to enable computers to understand information in
context. For the first time, a computer can go beyond keywords and metadata to identify concepts within text itself, determine the
concepts' importance and automate the processing of this content, regardless of its format, location, language and source application.
Using Autonomy Connectors, Autonomy's unique Intelligent Data Operating Layer (IDOL) integrates unstructured, semi-structured and
structured information from multiple repositories through an understanding of the content, delivering a real-time environment in which
operations across applications and content are automated, removing all the manual processes involved in getting the right information
to the right people at the right time.
IDOL server provides the following core information operations:
1. Automatic Query Guidance
2. Dynamic Clustering
3. Hyperlinking
4. Summarization
5. Taxonomy Generation
6. Categorization
7. Channels
8. Channel Recommendation
9. Clustering
10. CEN Clustering
11. Eduction
12. Agents
13. Profiling
14. Expertise Location
15. Collaboration
16. Alerting
17. Mailing
18. Spelling Correction
19. Dynamic Thesaurus
20. Retrieval - Lite
21. Retrieval - Concept
22. Retrieval - Parametric
23. Retrieval - Federated
4. Summarization
IDOL server accepts a piece of content and returns a
summary of the information. IDOL server can generate
different types of summary:
Conceptual summaries
Summaries that contain the most salient concepts of the
content
Contextual summaries
Summaries that relate to the context of the original inquiry -
allowing the most applicable dynamic summary to be
provided in the results of a given inquiry.
Quick summaries
Summaries that comprise a few sentences of the result
documents.
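A quick summary of this kind can be approximated very simply. The sketch below is illustrative only: the function name quick_summary and the naive sentence splitting are assumptions made for the example, not a description of how IDOL server builds its summaries.

```python
import re


def quick_summary(text: str, max_sentences: int = 2) -> str:
    """Return the first few sentences of a text as a naive 'quick summary'."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


doc = (
    "IDOL server indexes content. It forms a conceptual understanding. "
    "It then answers queries."
)
print(quick_summary(doc))
```

Conceptual and contextual summaries need a model of the content and of the query, which this sketch does not attempt.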
5. Taxonomy Generation
IDOL server's automatic Taxonomy Generation feature can
automatically understand and create deep hierarchical
contextual taxonomies of information. Clustering or any other
conceptual operation can be used as a seed for the
process. The resulting taxonomy can be used to provide
insight into specific areas of the information, to provide an
overall information landscape, or as training material for
automatic categorization, which then allows information to be
placed into a formally dictated and controlled category
hierarchy.
Automatic Taxonomy Based on Cluster Result
Based on cluster results, IDOL server can build Taxonomies
automatically and in real time.
Automatic Taxonomy to Category Generation
Once the Automatic Taxonomy Generation process has
taken place, IDOL server has a contextual understanding of
the type of data it is dealing with. From this a deep hierarchical
contextual taxonomy, also known as an information
landscape, is generated. Much like the Automatic Cluster to
Category Generation, this feature takes the taxonomy results and uses
that data to create categories (in order to perform
categorization of information using the Categorization
operation).
6. Categorization
IDOL server can automatically categorize data with no
requirement for manual input whatsoever. The flexibility of
Autonomy's Categorization feature allows you to precisely
derive categories using concepts found within unstructured
text. This ensures that all data is classified in the correct
context with the utmost accuracy. Autonomy's Categorization
feature is a completely scalable solution capable of handling
high volumes of information with extreme accuracy and total
consistency.
Rather than relying on rigid rule-based category definitions
such as Legacy Keyword and Boolean Operators,
Autonomy's infrastructure relies on an elegant pattern
matching process based on concepts to categorize
documents and automatically insert tag data sets, route
content or alert users to highly relevant information pertinent
to the user's profile.
This highly efficient process means that Autonomy is able to
categorize upwards of four million documents in 24 hours per
CPU instance. That is roughly one document every 22
milliseconds. Autonomy hooks into virtually all repositories
and data formats respecting all security and access
entitlements, delivering complete reliability.
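The quoted throughput can be sanity-checked with simple arithmetic. In the sketch below the helper name ms_per_document is purely illustrative; the only input taken from the brief is the figure of four million documents in 24 hours per CPU instance, which leaves a budget of a little under 22 milliseconds per document.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in 24 hours


def ms_per_document(docs_per_day: int) -> float:
    """Average time budget per document, in milliseconds, for one CPU instance."""
    return MS_PER_DAY / docs_per_day


print(ms_per_document(4_000_000))  # 21.6
```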
Category Matching
IDOL server accepts a category or piece of content and
returns categories ranked by conceptual similarity. This
determines for which categories the piece of content is most
appropriate, so that the piece of content can subsequently be
tagged, routed or filed accordingly.
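The idea of ranking categories by similarity to a piece of content can be sketched with a plain bag-of-words cosine similarity. This is a deliberately simple stand-in, assumed for illustration: the brief does not describe IDOL server's probabilistic model, and the names term_vector, cosine and rank_categories, as well as the sample categories, are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(content: str, categories: dict) -> list:
    """Return (category name, score) pairs, most similar first."""
    doc = term_vector(content)
    scores = [(name, cosine(doc, term_vector(training)))
              for name, training in categories.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical categories, each described by a little training text.
categories = {
    "finance": "bank loan interest market shares",
    "sport": "football match goal league player",
}
ranking = rank_categories("the bank raised its interest rate", categories)
print(ranking[0][0])  # finance
```

The top-ranked category would then drive the tagging, routing or filing step described above.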
7. Channels
IDOL server can automatically provide users with a set of
hierarchical channels with highly relevant information
pertinent to the respective channel. Eliminating the
requirement for manual intervention or pre-tagging, real-time
information is dynamically updated into the channels
automatically, minimizing the maintenance effort required.
Moreover, the administrator can add and remove channels
on the fly, without having to re-categorize all of the data.
8. Channel Recommendation
IDOL server's Channel Recommendation feature
automatically recommends conceptually matching channels
when a query is submitted to IDOL server, thus providing
users with instant access to relevant information in the
hierarchical channels.
9. Clustering
IDOL server delivers the ability to automatically cluster
information. Clustering is the process of taking a large
repository of unstructured data, agents or profiles and
automatically partitioning the data so that similar information
is clustered together. Each cluster represents a concept area
within the knowledge base and contains a set of items with
common properties.
Features:
Automatic clustering of information
Configurable sub-headings
Automatic title generation
Configurable results layout
Identify key areas of expertise
Complete overview of knowledge base.
10. CEN Clustering
IDOL server provides Collaboration and Enterprise Network
(CEN) Clustering to automatically match clustered data
against user agents and profiles in order to identify data that
matches people's interests. User interfaces that integrate with
IDOL server (for example, Retina, Portal-in-a-Box or third
party portals) highlight matching data in a spectrograph and
enable on-the-fly display of community users who own
matching agents or profiles, providing an instant overview of
the community users' details and instant email contactability.
Features:
Automatic clustering of information
Automatic cluster / interests matching
Automatic highlighting of popular clusters
Identify key areas of expertise
Display community user details
Email community users
Encourage collaboration.
11. Eduction
Eduction identifies concepts in the document in order to add
tags to the kind of content you specify.
Features:
Tag training
Plain Tagging
ConceptValue Tagging
Negative Name training
Default User definable phrase tags
Case-sensitive user defined phrase tags.
12. Agents
Agents provide the facilities to find and monitor information
from a configurable list of Internet and Intranet sites, News
Feeds, Chat Streams and internal repositories highly relevant
to the explicit interests of a user. Agents are created in a very
user-friendly way using the following options:
Natural language descriptions
Example content (point and click)
Legacy Keyword or Boolean Expressions.
IDOL server provides the conceptual information that is
needed to create agents. The server accepts a piece of
content (training text, a document or a set of documents) or
reference (identifier) and returns an encoded representation
of the concepts, including each concept's specific underlying
patterns of terms and associated probabilistic ratings.
Agent Retraining
The server accepts an agent and a piece of content (training
text, a document or a set of documents) and adapts the
agent using the content.
Agent Alerting
The server accepts a piece of content (a sentence,
paragraph or page of text, the body of an email, a record
containing human readable information, or the derived
contextual information of an audio or speech snippet) and
returns similar agents ranked by conceptual similarity. This is
used to discover users who are interested in the content, or
to find experts in a field.
Agent Matching
The high performance agent matching solution enables
documents to be dynamically matched against any scale of
Boolean Agents. As content is indexed into IDOL server, the
content is matched against all Agent rules simultaneously
allowing targeted information to be delivered to the user in
real time.
13. Profiling
IDOL server tracks the content with which a user interacts,
extracts a conceptual understanding of the content and uses
this understanding to maintain a profile of the user's
interests.
This profile is typically used to target information on particular
users, recommend content to users and to alert users to the
existence of content.
14. Expertise Location
IDOL server facilitates the automatic recognition of highly
focused experts and reduces the duplication of effort through
teamwork and the engagement of proactive collaboration
ventures.
15. Collaboration
IDOL server automatically matches users with common
explicit interest agents or similar implicit profile agents. This
information can be used to create virtual expert knowledge
groups.
16. Alerting
IDOL server analyzes data in new documents, and compares
the concepts the documents contain with agents that users
have set up already. It then automatically sends email
notification to users whose interests are similar to a new
document's content.
17. Mailing
IDOL server regularly emails users to notify them of content
that matches their agents and channels that they are
subscribed to.
Features:
Configurable email format through XSL templates.
18. Spelling Correction
IDOL server can automatically spell check query text that it
receives and suggest correct spellings for terms that it does not
contain. If a query contains several words that IDOL server
does not recognize, it offers a spelling suggestion for
each of these words.
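The behaviour described above, one suggestion for each unrecognized word, can be sketched with Python's standard difflib module. This is an assumption-laden illustration: the brief does not say how IDOL server chooses its suggestions, and suggest_spellings and the sample term list are hypothetical.

```python
import difflib


def suggest_spellings(query: str, known_terms: list) -> dict:
    """Map each unrecognized query word to its closest known term, if any."""
    suggestions = {}
    for word in query.lower().split():
        if word in known_terms:
            continue  # the word is already known, nothing to suggest
        matches = difflib.get_close_matches(word, known_terms, n=1, cutoff=0.6)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


# Hypothetical list of terms that the index already contains.
index_terms = ["server", "taxonomy", "cluster", "retrieval"]
print(suggest_spellings("taxonmy retreival server", index_terms))
```

Here "taxonmy" and "retreival" each receive a suggestion, while "server" is recognized and left alone.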
19. Dynamic Thesaurus
IDOL server includes a sophisticated conceptual Thesaurus
which uses the most salient terms and phrases in the result
documents that a query produces in order to offer a selection
of alternative query strings. These strings allow a user to
quickly execute alternative queries in order to produce a
variety of relevant result sets.
20. Retrieval - Lite
IDOL server offers the following basic legacy search
methods:
Legacy Keyword
IDOL server accepts a keyword and returns a list of
documents containing the terms ordered by contextual
relevance to the query.
Boolean / bracketed Boolean
IDOL server accepts simple or complex Boolean and
bracketed Boolean expressions and returns a list of matching
documents. Boolean expressions can be formed using a
range of Boolean and proximity operators:
21. Retrieval - Concept
IDOL server provides the following sophisticated conceptual
retrieval operations:
Conceptual Matching
IDOL server accepts a piece of content (a sentence,
paragraph or page of text, the body of an email, a record
containing human-readable information, or the derived
contextual information of an audio or speech snippet) or
reference (identifier) as input, and returns references to
conceptually related documents ranked by relevance or
contextual distance. This is used to generate automatic
hyperlinks between pieces of content.
Proper Names
IDOL server recognizes names and treats them as a unit.
Active Matching
IDOL server accepts textual information describing the
current user task and returns a list of documents ordered by
contextual relevance to the active task.
Native XML Indexing
This allows IDOL server to natively index plain well-formed
XML straight into IDOL server. This feature involves minimal
configuration with document level and field indexing
specification required.
Native XML Output
Users can specify the output format in which they require
information. If they do not specify the XML output, the
default template is used.
Multiple XML Schema Support
This feature enables you to index multiple XML sources
with varying XML schemas (tag names/hierarchies) into
IDOL server simultaneously. IDOL server's intelligence
performs conceptual analysis across all the different
schemas. Users have the option to specify
the output format of information.
Automatic XML Tagging
IDOL server can automatically XML tag any form of
unstructured information based on the same process used for
tag reconciliation.
22. Retrieval - Parametric
Advanced Parametric Refinement is used to provide an
improved user experience coupled with increased productivity
via an advanced real-time information discovery process.
Real-time navigation across multiple taxonomies is supported
with no additional manual configuration necessary, including
full access to intersections of diverse taxonomy definitions.
Exact Phrase
Provides the ability to search for exact phrases by putting
quotation marks around a string of words. For example, "world market".
Fuzzy Queries
If a search string is not quite accurate (for example, if it
contains spelling mistakes) a fuzzy query returns results that
contain words that are similar to the entered string. (Note that
you need to enable fuzzy queries before you can use them).
Proximity Search
IDOL server gives a higher weighting to documents in which
specific terms occur within a given proximity of one another.
Soundex Keyword Search
If the spelling of a keyword is not quite accurate but
phonetically correct, a Soundex keyword search returns
results that contain the keyword and phonetically similar
keywords (using a configurable Soundex algorithm).
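For readers unfamiliar with Soundex, the sketch below implements the classic four-character American variant. The brief only says that the algorithm is configurable, so the choice of variant here is an assumption, and the function is not taken from IDOL server.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Classic four-character American Soundex code (assumed variant)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "hw":  # h and w do not separate repeated codes
            previous = digit
    return (code + "000")[:4]  # pad with zeros, keep four characters


print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Two keywords with the same code, such as "Smith" and "Smyth", would be treated as phonetically similar.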
Boolean and proximity operators available for Boolean expressions:
AND
NOT
OR
XOR / EOR
NEAR
DNEAR
WNEAR
BEFORE
AFTER
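The brief lists these operators without defining them. As a rough illustration of what a proximity operator such as NEAR tests, the sketch below checks whether two terms occur within a given number of words of each other, in either order. The function name near, its arguments and the distance semantics are assumptions for the example; they are not IDOL server's query syntax.

```python
def near(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """True if term_a and term_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


text = "the world financial market fell sharply"
print(near(text, "world", "market", 2))   # True
print(near(text, "world", "sharply", 2))  # False
```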
From among the complete set of field names present within
the corpus, a subset of fields can be defined in the server's
configuration as of type 'Parametric'. These fields are known
as 'parametric' fields.
Once documents are indexed, IDOL server creates and stores a
structure containing information about all 'tag-value' pairs that
occur within the defined parametric fields (a 'tag-value' pair is
defined where a field contains a textual or numerical value and the
field name is considered paired to that value).
The user may then query IDOL server with the name of a
parametric field or fields. IDOL server returns a list of all
textual values that appear within the given field or fields
within the documents stored in the server.
This underlying operation can be used to power a user
interface that enables a user to gradually refine the scope of
query from a complete corpus to the subset of documents
that contain information pertinent to the user's current
enquiry.
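The parametric mechanism described above can be sketched as a small in-memory index of field values. Everything in the example is hypothetical: the function names, the field names "make" and "colour", and the sample documents are assumptions chosen to show the refinement idea, not IDOL server's data structures.

```python
from collections import defaultdict


def build_parametric_index(documents: list, parametric_fields: set) -> dict:
    """Map each parametric field to {value: set of document ids}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(documents):
        for field, value in doc.items():
            if field in parametric_fields:
                index[field][value].add(doc_id)
    return index


def field_values(index: dict, field: str, within=None) -> list:
    """List the values of a field, optionally restricted to a subset of documents."""
    return sorted(
        value for value, ids in index[field].items()
        if within is None or ids & within
    )


docs = [
    {"make": "Ford", "colour": "red", "title": "Used Fiesta"},
    {"make": "Ford", "colour": "blue", "title": "New Focus"},
    {"make": "Fiat", "colour": "red", "title": "Fiat 500"},
]
index = build_parametric_index(docs, {"make", "colour"})
print(field_values(index, "make"))                                  # ['Fiat', 'Ford']
print(field_values(index, "colour", within=index["make"]["Ford"]))  # ['blue', 'red']
```

A user interface can call the second form repeatedly, narrowing the document subset each time the user picks a value.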
23. Retrieval - Federated
Submit queries to a selection of third party search engines in
addition to IDOL server.
Additional functionality
Sentient Architecture
IDOL server's sentient architecture delivers on the concept of
autonomic computing for companies worldwide. Global
predictive self-management removes the need for an
administrator, for example, by dynamically throttling IDOL's
connector layer to available bandwidth and a target site's
responsiveness together with the ability to predict windows of
opportunity for faster collection based on prior usage
patterns. This ability to support distributed architectures,
identify potential problems and prompt a real-time, dynamic
substitution enables companies to keep systems entirely
operational for users at all times. IDOL's sentient architecture
presents a robust solution for large, geographically
dispersed, multinational enterprises who seek to make all
their information assets readily available.
Failover / Distribution
Uninterrupted service is ensured through Failover. If IDOL
server should fail at any point, it is automatically restarted,
ensuring a stable system.
Automatic Language Detection
IDOL server can detect the language and encoding of
documents that it processes automatically. This allows you to
set up processes that are automatically applied to documents
or document metadata if they are in a specific language. For
example, if a document is identified as Chinese, the
appropriate preliminary linguistic tools are automatically
applied to it.
The Autonomy Service Dashboard provides central control.
DiSH / Dashboard
The Autonomy Service Dashboard is an intuitive stand-alone
front-end web interface that allows administrators to manage
all Autonomy modules/ services running locally or remotely.
The Dashboard communicates with one or more Autonomy
Distributed Service Handler (DiSH) modules that provide the
back-end process for monitoring and controlling all the
Autonomy child services.
DiSH servers administration
View the DiSH servers in the enterprise
Display DiSH server information (version, ports, status,
start time etc.)
Add and remove DiSH servers to / from the dashboard
Edit the DiSH servers
View DiSH servers' configuration, license information and logs.
Services administration
View child services
Display child service information (version, ports,
status, start time etc.)
Add and remove child services
Edit child services
Configure child services
View child services' logs.
Control of services
Start, stop, pause or restart child services
Set up KeepAlives to ensure continuous service.
Monitoring services
Track service processing of documents
Automatically audit child service
Generate graphs for a child service's audit data.
Alerting
Allows setup of an email alert triggered by any statistic
Configuration of an alert triggered when certain
statistics values move outside a predefined range
Configuration of a periodic email alert containing
status summary reports.
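An alert that fires when statistics move outside a predefined range amounts to a simple bounds check. The sketch below shows that check only; the statistic names and limits are hypothetical, and the Dashboard's actual configuration format is not described in this brief.

```python
def out_of_range(stats: dict, limits: dict) -> list:
    """Return the names of statistics whose values fall outside their (low, high) limits."""
    alerts = []
    for name, (low, high) in limits.items():
        value = stats.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(name)
    return alerts


# Hypothetical statistic names and ranges.
limits = {"queries_per_minute": (0, 500), "index_queue_length": (0, 1000)}
stats = {"queries_per_minute": 120, "index_queue_length": 2400}
print(out_of_range(stats, limits))  # ['index_queue_length']
```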
Architecture
User Interfaces
Retina
IDOL server 5 includes Autonomy Retina, a web interface
application that provides a full spectrum of retrieval methods,
from simple keyword search to sophisticated conceptual
matching. Adjusting to the user's experience and proficiency,
Retina not only offers basic legacy search methods but also
leverages them through Autonomy's unique pattern-recognition
technology.
Please refer to the Retina Technical Brief for further details.
Portlets
Autonomy provides a wide range of Portlets that offer user-
friendly platforms from which IDOL server operations can be
intuitively executed. Autonomy Portlets are available as part
of the Autonomy Portal-in-a-Box solution or for integration
with a number of market-leading third party Portals.
Please refer to the Portlets Technical Brief for further details.
Requirements
Platforms Supported:
Microsoft Windows NT4, 2000, XP and 2003
Linux (all versions) kernel 2.2, 2.4 and 2.6
Sun Solaris for SPARC version 5 - 9
Sun Solaris for Intel version 9
AIX version 4.3, 5 and 5.1
HP-UX for PA-RISC version 10, 11 and 11i
HP-UX for Itanium version 11i
Tru64 version 5.1
Other POSIX compliant UNIX versions are available on request.
Minimum Server Specifications:
Dual Intel Xeon 1.8 GHz
1 GB RAM
30 GB hard disk recommended
For specific sizing requirements, please consult the Autonomy
Sizing Service.
www.autonomy.com
Autonomy Inc.
One Market Plaza,
19th Floor,
Spear Tower,
San Francisco, CA 94105
Tel: 415 243 9955
Fax: 415 243 9984
Email: [email protected]
Autonomy Systems Ltd
Cambridge Business Park
Cowley Road
Cambridge CB4 0WZ
Tel: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Email: [email protected]
Other Offices
Autonomy has additional offices in Boston, Dallas, Chicago,
Washington and New York, as well as in Amsterdam, Beijing,
Diegem, Hamburg, Madrid, Milan, Munich, Oslo,
Paris, Rome, Singapore, Stockholm and Sydney.
Copyright 2005 Autonomy Corp. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners.