Autonomy IDOL Server Technical Brief 1204 Rev1

  • 7/22/2019 Autonomy IDOL Server Technical Brief 1204 Rev1

    Technical Brief

    Autonomy IDOL server 5

    IDOL server

    At the heart of Autonomy's software infrastructure lies IDOL

    server, a scalable, multithreaded process based on advanced

    pattern-matching technology that exploits high-performance

    probabilistic modeling techniques.

    Selected IDOL server operations

    The intelligent operations that IDOL server performs across

    structured, semi-structured and unstructured data are highly

    customizable, offering a wide range of configuration

    combinations that enable you to perform over 250 data operations.

    1. Automatic Query Guidance

    IDOL server's Automatic Query Guidance feature provides

    an easy navigation facility which directs users to the results

    they require based on a conceptual and contextual

    understanding of their query. Instead of page ranking, an

    approach which has been proven to be ineffective in the

    link-free enterprise, Automatic Query Guidance uses conceptual

    clustering to determine the context of a user's search, and

    presents the most appropriate results along with other

    suggestions, even from few or single word queries.

    2. Dynamic Clustering

    Query results are clustered on the fly to avoid information

    overload and provide an overview of the different conceptual

    aspects that results can be grouped into. The clustered

    results are presented in an easily navigable hierarchy,

    providing users with speedy access to the right information.

    3. Hyperlinking

    Hyperlinks can be automatically generated in real time.

    These link to contextually similar content and can be used to

    recommend related articles, documents, affinity products or

    services, or media content that relates to textual content.

    Because links are automatically inserted at the time a

    document is retrieved, they can include references to

    documents and articles written long before. Hyperlinks from

    archived material can link to the latest news or material on

    that subject.

    Autonomy's software infrastructure uses sophisticated pattern-matching techniques to enable computers to understand information in

    context. For the first time, a computer can go beyond keywords and metadata to identify concepts within text itself, determine the

    concepts' importance and automate the processing of this content, regardless of its format, location, language and source application.

    Using Autonomy Connectors, Autonomy's unique Intelligent Data Operating Layer (IDOL) integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real-time environment in which

    operations across applications and content are automated, removing all the manual processes involved in getting the right information

    to the right people at the right time.

    IDOL server provides the following core information operations:

    1. Automatic Query Guidance

    2. Dynamic Clustering

    3. Hyperlinking

    4. Summarization

    5. Taxonomy Generation

    6. Categorization

    7. Channels

    8. Channel Recommendation

    9. Clustering

    10. CEN Clustering

    11. Eduction

    12. Agents

    13. Profiling

    14. Expertise Location

    15. Collaboration

    16. Alerting

    17. Mailing

    18. Spelling Correction

    19. Dynamic Thesaurus

    20. Retrieval - Lite

    21. Retrieval - Concept

    22. Retrieval - Parametric

    23. Retrieval - Federated

    4. Summarization

    IDOL server accepts a piece of content and returns a

    summary of the information. IDOL server can generate

    different types of summary:

    Conceptual summaries

    Summaries that contain the most salient concepts of the

    content

    Contextual summaries

    Summaries that relate to the context of the original inquiry -

    allowing the most applicable dynamic summary to be

    provided in the results of a given inquiry.

    Quick summaries

    Summaries that comprise a few sentences of the result

    documents.
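
    As an illustration only, a quick summary of the kind described above can be approximated by taking the first few sentences of a document. The function name `quick_summary` and the sentence-splitting rule are assumptions made for this sketch; they are not IDOL server's API or its summarization method.

```python
import re


def quick_summary(text: str, max_sentences: int = 2) -> str:
    """Return the first few sentences of `text` as a naive quick summary."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


doc = (
    "IDOL server indexes content. It forms a conceptual understanding. "
    "It then answers queries."
)
print(quick_summary(doc))
```

    Conceptual and contextual summaries would need a model of the content and of the query respectively, which this sketch does not attempt.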

    5. Taxonomy Generation

    IDOL server's automatic Taxonomy Generation feature can

    automatically understand and create deep hierarchical

    contextual taxonomies of information. Clustering or any other

    conceptual operation can be used as a seed for the

    process. The resulting taxonomy can be used to provide

    insight into specific areas of the information, to provide an

    overall information landscape, or as training material for

    automatic categorization, which then allows information to be

    placed into a formally dictated and controlled category

    hierarchy.

    Automatic Taxonomy Based on Cluster Result

    Based on cluster results, IDOL server can build Taxonomies

    automatically and in real time.

    Automatic Taxonomy to Category Generation

    Once the Automatic Taxonomy Generation process has

    taken place, IDOL server contextually understands the type of data it is

    dealing with. From this a deep hierarchical contextual

    taxonomy is generated, known also as an information

    landscape. Much like the Automatic Cluster to Category

    Generation, this feature takes the taxonomy results and uses that data to create categories (in order to perform

    categorization of information using the Categorization

    operation).

    6. Categorization

    IDOL server can automatically categorize data with no

    requirement for manual input whatsoever. The flexibility of

    Autonomy's Categorization feature allows you to precisely

    derive categories using concepts found within unstructured

    text. This ensures that all data is classified in the correct

    context with the utmost accuracy. Autonomy's Categorization

    feature is a completely scalable solution capable of handling

    high volumes of information with extreme accuracy and total

    consistency.

    Rather than relying on rigid rule-based category definitions

    such as Legacy Keyword and Boolean Operators,

    Autonomy's infrastructure relies on an elegant

    pattern-matching process based on concepts to categorize

    documents and automatically insert tag data sets, route

    content or alert users to highly relevant information pertinent

    to the user's profile.

    This highly efficient process means that Autonomy is able to

    categorize upwards of four million documents in 24 hours per

    CPU instance. That is approximately one document every 22

    milliseconds. Autonomy hooks into virtually all repositories

    and data formats, respecting all security and access

    entitlements, delivering complete reliability.
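
    The throughput figure can be checked with simple arithmetic. The short calculation below converts four million documents per 24 hours into a per-document time, which works out to about 21.6 milliseconds (roughly 46 documents per second) for a single CPU instance. The variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds in 24 hours
docs_per_day = 4_000_000                # figure quoted in the brief

ms_per_doc = SECONDS_PER_DAY * 1000 / docs_per_day
docs_per_second = docs_per_day / SECONDS_PER_DAY

print(f"{ms_per_doc:.1f} ms per document")            # 21.6 ms per document
print(f"{docs_per_second:.1f} documents per second")  # 46.3 documents per second
```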

    Category Matching

    IDOL server accepts a category or piece of content and

    returns categories ranked by conceptual similarity. This

    determines for which categories the piece of content is most

    appropriate, so that the piece of content can subsequently be

    tagged, routed or filed accordingly.
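
    The idea of ranking categories by similarity to a piece of content can be sketched as follows. This is a minimal illustration that substitutes plain term-frequency vectors and cosine similarity for IDOL server's conceptual matching, which the brief does not specify. The category names, training text and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lower-cased alphabetic terms in `text`."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_categories(content: str, categories: dict[str, str]) -> list[tuple[str, float]]:
    """Return (category, score) pairs, most similar category first."""
    vec = term_vector(content)
    scores = [(name, cosine(vec, term_vector(training)))
              for name, training in categories.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical categories, each described by a little training text.
categories = {
    "finance": "bank loan interest market shares trading",
    "sport": "football match goal team league player",
}
ranked = rank_categories("The team scored a late goal to win the match", categories)
print(ranked[0][0])  # sport
```

    The top-ranked category could then drive tagging, routing or filing, as described above.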

    7. Channels

    IDOL server can automatically provide users with a set of

    hierarchical channels with highly relevant information

    pertinent to the respective channel. Eliminating the

    requirement for manual intervention or pre-tagging, real-time

    information is dynamically updated into the channels

    automatically, minimizing the maintenance effort required.

    Moreover, the administrator can add and remove channels on the fly, without having to re-categorize all of the data.

    8. Channel Recommendation

    IDOL server's Channel Recommendation feature

    automatically recommends conceptually matching channels

    when a query is submitted to IDOL server, thus providing

    users with instant access to relevant information in the

    hierarchical channels.

    9. Clustering

    IDOL server delivers the ability to automatically cluster

    information. Clustering is the process of taking a large

    repository of unstructured data, agents or profiles and

    automatically partitioning the data so that similar information

    is clustered together. Each cluster represents a concept area

    within the knowledge base and contains a set of items with

    common properties.

    Features:

    Automatic clustering of information

    Configurable sub-headings

    Automatic title generation

    Configurable results layout

    Identify key areas of expertise

    Complete overview of knowledge base.

    10. CEN Clustering

    IDOL server provides Collaboration and Enterprise Network

    (CEN) Clustering to automatically match clustered data

    against user agents and profiles in order to identify data that

    matches people's interests. User interfaces that integrate with

    IDOL server (for example, Retina, Portal-in-a-Box or third

    party portals) highlight matching data in a spectrograph and

    enable on-the-fly display of community users who own

    matching agents or profiles, providing an instant overview of

    the community users' details and instant email contactability.

    Features:

    Automatic clustering of information

    Automatic cluster / interest matching

    Automatic highlighting of popular clusters

    Identify key areas of expertise

    Display community user details

    Email community users

    Encourage collaboration.

    11. Eduction

    Eduction identifies concepts in the document in order to add

    tags to the kind of content you specify.

    Features:

    Tag training

    Plain Tagging

    ConceptValue Tagging

    Negative Name training

    Default User definable phrase tags

    Case-sensitive user defined phrase tags.

    12. Agents

    Agents provide the facilities to find and monitor information

    from a configurable list of Internet and Intranet sites, News

    Feeds, Chat Streams and internal repositories highly relevant

    to the explicit interests of a user. Agents are created in a very

    user-friendly way using the following options:

    Natural language descriptions

    Example content (point and click)

    Legacy Keyword or Boolean Expressions.

    IDOL server provides the conceptual information that is

    needed to create agents. The server accepts a piece of

    content (training text, a document or a set of documents) or

    reference (identifier) and returns an encoded representation

    of the concepts, including each concept's specific underlying

    patterns of terms and associated probabilistic ratings.

    Agent Retraining

    The server accepts an agent and a piece of content (training

    text, a document or a set of documents) and adapts the

    agent using the content.

    Agent Alerting

    The server accepts a piece of content (a sentence,

    paragraph or page of text, the body of an email, a record

    containing human readable information, or the derived

    contextual information of an audio or speech snippet) and

    returns similar agents ranked by conceptual similarity. This is

    used to discover users who are interested in the content, or

    to find experts in a field.

    Agent Matching

    The high performance agent matching solution enables

    documents to be dynamically matched against any scale of

    Boolean Agents. As content is indexed into IDOL server, the

    content is matched against all Agent rules simultaneously

    allowing targeted information to be delivered to the user in

    real time.
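
    A minimal sketch of matching incoming content against a set of Boolean agents is shown below. It assumes a deliberately simple rule format (every term in `all_of` must be present and no term in `none_of` may be present). The agent names, rule format and function name are hypothetical and do not represent IDOL server's agent syntax or its matching engine.

```python
import re

# Hypothetical agents: each requires all terms in `all_of` and none in `none_of`.
AGENTS = {
    "alice-mergers": {"all_of": {"merger", "bank"}, "none_of": {"rumour"}},
    "bob-football": {"all_of": {"football"}, "none_of": set()},
}


def matching_agents(document: str, agents: dict) -> list[str]:
    """Return the names of all agents whose rule the document satisfies."""
    terms = set(re.findall(r"[a-z]+", document.lower()))
    return sorted(
        name
        for name, rule in agents.items()
        if rule["all_of"] <= terms and not (rule["none_of"] & terms)
    )


print(matching_agents("Bank confirms merger with rival", AGENTS))  # ['alice-mergers']
```

    Calling such a function for each document as it is indexed is one way to deliver matches to the owning users as content arrives.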

    13. Profiling

    IDOL server tracks the content with which a user interacts,

    extracts a conceptual understanding of the content and uses

    this understanding to maintain a profile of the user's

    interests.

    This profile is typically used to target information on particular

    users, recommend content to users and to alert users to the

    existence of content.
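
    One simple way to picture an implicit profile is as a set of weighted terms that is updated each time the user reads something, with older interests fading gradually. The sketch below assumes exactly that; the decay factor, the term-based representation and the function name are assumptions for illustration, not a description of how IDOL server stores profiles.

```python
import re
from collections import Counter


def update_profile(profile: Counter, content: str, decay: float = 0.9) -> Counter:
    """Fade existing term weights by `decay`, then add the terms of new content."""
    updated = Counter({term: weight * decay for term, weight in profile.items()})
    updated.update(re.findall(r"[a-z]+", content.lower()))
    return updated


profile = Counter()
profile = update_profile(profile, "bank merger")
profile = update_profile(profile, "bank loan")
print(profile.most_common(1)[0][0])  # bank
```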

    14. Expertise Location

    IDOL server facilitates the automatic recognition of highly

    focused experts and reduces the duplication of effort through

    teamwork and the engagement of proactive collaboration

    ventures.

    15. Collaboration

    IDOL server automatically matches users with common

    explicit interest agents or similar implicit profile agents. This

    information can be used to create virtual expert knowledge

    groups.

    16. Alerting

    IDOL server analyzes data in new documents, and compares

    the concepts the documents contain with agents that users

    have set up already. It then automatically sends email

    notification to users whose interests are similar to a new

    document's content.

    17. Mailing

    IDOL server regularly emails users to notify them of content

    that matches their agents and channels that they are

    subscribed to.

    Features:

    Configurable email format through XSS templates.

    18. Spelling Correction

    IDOL server can automatically spell check query text that it

    receives and suggest correct spellings for terms that it doesn't

    contain. If a query contains several words that IDOL server

    does not recognize, it offers a spelling suggestion for

    each of these words.
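
    The behaviour described above (one suggestion per unrecognized query word) can be sketched with Python's standard `difflib` module. The list of known terms stands in for the terms held in the index, and the function name and similarity cutoff are assumptions made for this example rather than IDOL server settings.

```python
import difflib


def suggest_spellings(query: str, known_terms: list[str]) -> dict[str, list[str]]:
    """Map each unrecognized query word to its closest known terms."""
    suggestions = {}
    for word in query.lower().split():
        if word not in known_terms:
            suggestions[word] = difflib.get_close_matches(
                word, known_terms, n=3, cutoff=0.7
            )
    return suggestions


index_terms = ["taxonomy", "cluster", "category", "retrieval"]
suggestions = suggest_spellings("taxonmy retreival cluster", index_terms)
for word, candidates in suggestions.items():
    print(word, "->", candidates)
```

    Words that the index already contains ("cluster" here) are left alone, and each misspelled word receives its own list of candidates.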

    19. Dynamic Thesaurus

    IDOL server includes a sophisticated conceptual Thesaurus

    which uses the most salient terms and phrases in the result

    documents that a query produces in order to offer a selection

    of alternative query strings. These strings allow a user to

    quickly execute alternative queries in order to produce a

    variety of relevant result sets.

    20. Retrieval - Lite

    IDOL server offers the following basic legacy search

    methods:

    Legacy Keyword

    IDOL server accepts a keyword and returns a list of

    documents containing the terms ordered by contextual

    relevance to the query.

    Boolean / bracketed Boolean

    IDOL server accepts simple or complex Boolean and

    bracketed Boolean expressions and returns a list of matching

    documents. Boolean expressions can be formed using a

    range of Boolean and proximity operators (listed after the

    Soundex Keyword Search description).

    21. Retrieval - Concept

    IDOL server provides the following sophisticated conceptual

    retrieval operations:

    Conceptual Matching

    IDOL server accepts a piece of content (a sentence,

    paragraph or page of text, the body of an email, a record

    containing human-readable information, or the derived

    contextual information of an audio or speech snippet) or

    reference (identifier) as input, and returns references to

    conceptually related documents ranked by relevance or

    contextual distance. This is used to generate automatic

    hyperlinks between pieces of content.

    Proper Names

    IDOL server recognizes names and treats them as a unit.

    Active Matching

    IDOL server accepts textual information describing the

    current user task and returns a list of documents ordered by

    contextual relevance to the active task.

    Native XML Indexing

    IDOL server can index plain, well-formed XML natively. This

    feature involves minimal configuration: only the document-level

    and field indexing specifications are required.
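
    To make the document-level and field-level specification concrete, the sketch below parses well-formed XML with Python's standard `xml.etree.ElementTree` module and extracts the fields to be indexed. The tag names (`DOCUMENT`, `TITLE`, `BODY`, `REFERENCE`) and the function `index_fields` are hypothetical choices for this example, not IDOL server configuration keys.

```python
import xml.etree.ElementTree as ET

xml_doc = """<DOCUMENT>
  <REFERENCE>doc-001</REFERENCE>
  <TITLE>Quarterly results</TITLE>
  <BODY>Revenue grew in every region.</BODY>
</DOCUMENT>"""


def index_fields(xml_text: str, document_tag: str, index_tags: set[str]) -> list[dict]:
    """Return one {field: text} record per document element found."""
    root = ET.fromstring(xml_text)
    # Element.iter() also yields the root itself when its tag matches.
    return [
        {child.tag: (child.text or "").strip()
         for child in doc if child.tag in index_tags}
        for doc in root.iter(document_tag)
    ]


print(index_fields(xml_doc, "DOCUMENT", {"TITLE", "BODY"}))
```

    Here `document_tag` plays the role of the document-level specification and `index_tags` the role of the field indexing specification.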

    Native XML Output

    Users can specify the output format in which they require information. If they don't specify an XML output format, the

    default template is used.

    Multiple XML Schema Support

    Multiple simultaneous schema support - This feature enables

    you to index multiple XML sources with varying XML

    schemas (tag names/hierarchies) into IDOL server. IDOL

    server's intelligence will perform conceptual analysis across

    all the different schemas. Users have the option to specify

    the output format of information.

    Automatic XML Tagging

    IDOL server can automatically XML tag any form of

    unstructured information based on the same process used for

    tag reconciliation.

    22. Retrieval - Parametric

    Advanced Parametric Refinement is used to provide an

    improved user experience coupled with increased productivity

    via an advanced real-time information discovery process.

    Real-time navigation across multiple taxonomies is supported

    with no additional manual configuration necessary, including

    full access to intersections of diverse taxonomy definitions.

    Exact Phrase

    Provides the ability to search for exact phrases by putting

    quotation marks around a string of words. For example, "world market".

    Fuzzy Queries

    If a search string is not quite accurate (for example, if it

    contains spelling mistakes) a fuzzy query returns results that

    contain words that are similar to the entered string. (Note that

    you need to enable fuzzy queries before you can use them).
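
    A common way to find words that are "similar to the entered string" is edit distance, sketched below. The brief does not say which similarity measure IDOL server uses, so Levenshtein distance is an assumption here, as are the function names and the distance threshold.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query_term: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary words within `max_distance` edits of the query term."""
    return [w for w in vocabulary if edit_distance(query_term, w) <= max_distance]


print(fuzzy_match("markte", ["market", "marker", "matrix"]))  # ['market', 'marker']
```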

    Proximity Search

    IDOL server gives a higher weighting to documents in which

    specific terms occur within a given proximity of each other.

    Soundex Keyword Search

    If the spelling of a keyword is not quite accurate but

    phonetically correct, a Soundex keyword search returns

    results that contain the keyword and phonetically similar

    keywords (using a configurable Soundex algorithm).
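
    For readers unfamiliar with Soundex, the sketch below implements the classic American Soundex coding, in which phonetically similar names map to the same four-character code. The brief describes IDOL server's Soundex algorithm as configurable and does not give its details, so this standard variant is shown purely as an example.

```python
# Digit assigned to each consonant group; vowels, h, w and y have no code.
CODES = {
    c: d
    for d, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for c in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

    A keyword search could compare the Soundex code of the query keyword with the codes of indexed terms to find phonetically similar matches.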

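    Boolean and proximity operators (for the Boolean / bracketed

    Boolean expressions described under Retrieval - Lite):

    AND, NOT, OR, XOR / EOR, NEAR, DNEAR, WNEAR, BEFORE, AFTER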
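
    To show how Boolean and proximity operators of this kind can be evaluated, the sketch below builds a small positional inverted index and implements AND, OR, NOT and a simple NEAR. The exact semantics of IDOL server's operators (including DNEAR, WNEAR, BEFORE and AFTER) are not given in this brief, so the "within N words" meaning of `near` is an assumption, and all function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

docs = {
    1: "global market report for the world economy",
    2: "world market prices rise",
    3: "local farmers market opens",
}

# Positional inverted index: term -> {doc_id: [word positions]}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        index[word].setdefault(doc_id, []).append(pos)


def term(t: str) -> set:
    """Documents containing term t."""
    return set(index.get(t, {}))


def and_(a: set, b: set) -> set:
    return a & b


def or_(a: set, b: set) -> set:
    return a | b


def not_(a: set) -> set:
    return set(docs) - a


def near(t1: str, t2: str, distance: int) -> set:
    """Documents where t1 and t2 occur within `distance` words of each other."""
    return {
        doc_id
        for doc_id in term(t1) & term(t2)
        if any(abs(p - q) <= distance
               for p in index[t1][doc_id] for q in index[t2][doc_id])
    }


print(and_(term("world"), term("market")))        # {1, 2}
print(near("world", "market", 1))                 # {2}
print(and_(term("market"), not_(term("world"))))  # {3}
```

    Bracketed expressions correspond to nesting these calls, for example `and_(term("market"), or_(term("world"), term("local")))`.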

    From among the complete set of field names present within

    the corpus, a subset of fields can be defined in the server's

    configuration as of type 'Parametric'. These fields are known

    as 'parametric' fields.

    Once indexed, IDOL server will create and store a structure

    containing information about all 'tag-value' pairs that occur

    within defined parametric fields (a 'tag-value' pair is defined where a field contains a textual or numerical value, and the

    field name is considered paired with that value).

    The user may then query IDOL server with the name of a

    parametric field or fields. IDOL server returns a list of all

    textual values that appear within the given field or fields

    within the documents stored in the server.

    This underlying operation can be used to power a user

    interface that enables a user to gradually refine the scope of

    query from a complete corpus to the subset of documents

    that contain information pertinent to the user's current

    enquiry.

    23. Retrieval - Federated

    Submit queries to a selection of third party search engines in

    addition to IDOL server.

    Additional functionality

    Sentient Architecture

    IDOL server's sentient architecture delivers on the concept of

    autonomic computing for companies worldwide. Global predictive self-management abstracts away the need for an

    administrator, for example, by dynamically throttling IDOL's

    connector layer to available bandwidth and a target site's

    responsiveness together with the ability to predict windows of

    opportunity for faster collection based on prior usage

    patterns. This ability to support distributed architectures,

    identify potential problems and prompt a real-time, dynamic

    substitution enables companies to keep systems entirely

    operational for users at all times. IDOL's sentient architecture

    presents a robust solution for large, geographically

    dispersed, multinational enterprises who seek to make all

    their information assets readily available.

    Failover / Distribution

    Uninterrupted service is ensured through Failover. If IDOL

    server should fail at any point, it is automatically restarted,

    ensuring a stable system.

    Automatic Language Detection

    IDOL server can detect the language and encoding of

    documents that it processes automatically. This allows you to

    set up processes that are automatically applied to documents

    or document metadata if they are in a specific language. For

    example, if a document is identified as Chinese, the

    appropriate preliminary linguistic tools are automatically

    applied to it.

    The Autonomy Service Dashboard provides central control.

    DiSH / Dashboard

    The Autonomy Service Dashboard is an intuitive stand-alone

    front-end web interface that allows administrators to manage

    all Autonomy modules / services running locally or remotely.

    The Dashboard communicates with one or more Autonomy

    Distributed Service Handler (DiSH) modules that provide the

    back-end process for monitoring and controlling all the

    Autonomy child services.

    DiSH servers administration

    View the DiSH servers in the enterprise

    Display DiSH server information (version, ports, status,

    start time etc.)

    Add and remove DiSH servers to / from the dashboard

    Edit the DiSH servers

    View DiSH servers' configuration, license information and logs.

    Services administration

    View child services

    Display child service information (version, ports,

    status, start time etc.)

    Add and remove child services

    Edit child services

    Configure child services

    View child services' logs.

    Control of services

    Start, stop, pause or restart child services

    Set up KeepAlives to ensure continuous service.

    Monitoring services

    Track service processing of documents

    Automatically audit child services

    Generate graphs for a child service's audit data.

    Alerting

    Allows setup of an email alert triggered by any statistic

    Configuration of an alert triggered when certain

    statistics values move outside a predefined range

    Configuration of a periodic email alert containing

    status summary reports.
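
    The range-based alert described above amounts to comparing each monitored statistic with a configured minimum and maximum. The sketch below shows that check in isolation; the statistic names, ranges and function name are invented for the example and are not Dashboard or DiSH settings, and sending the email itself is left out.

```python
def check_statistics(
    stats: dict[str, float], ranges: dict[str, tuple[float, float]]
) -> list[str]:
    """Return one alert message per statistic outside its predefined range."""
    alerts = []
    for name, (low, high) in ranges.items():
        value = stats.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(f"{name}={value} is outside the range {low}..{high}")
    return alerts


# Hypothetical statistics reported by a child service.
ranges = {"documents_per_minute": (100, 5000), "queue_length": (0, 200)}
stats = {"documents_per_minute": 42, "queue_length": 17}
print(check_statistics(stats, ranges))
```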

    Architecture

    User Interfaces

    Retina

    IDOL server 5 includes Autonomy Retina, a web interface

    application that provides a full spectrum of retrieval methods, from simple keyword search to sophisticated conceptual

    matching. Adjusting to the user's experience and proficiency,

    Retina not only offers basic legacy search methods but also

    leverages them through Autonomy's unique

    pattern-recognition technology.

    Please refer to the Retina Technical Brief for further details.

    Portlets

    Autonomy provides a wide range of Portlets that offer user-

    friendly platforms from which IDOL server operations can be

    intuitively executed. Autonomy Portlets are available as part

    of the Autonomy Portal-in-a-Box solution or for integration

    with a number of market-leading third party Portals.

    Please refer to the Portlets Technical Brief for further details.

    Requirements

    Platforms Supported:

    Microsoft Windows NT4, 2000, XP and 2003

    Linux (all versions) kernel 2.2, 2.4 and 2.6

    Sun Solaris for SPARC version 5 - 9

    Sun Solaris for Intel version 9

    AIX version 4.3, 5 and 5.1

    HP-UX for PA-RISC version 10, 11 and 11i

    HP-UX for Itanium version 11i

    Tru64 version 5.1

    Other POSIX-compliant UNIX versions are available on request.

    Minimum Server Specifications:

    Dual Intel Xeon 1.8 GHz

    1 GB RAM

    30 GB hard disk recommended

    For specific sizing requirements, please consult the Autonomy

    Sizing Service.

    www.autonomy.com

    Autonomy Inc.

    One Market Plaza,

    19th Floor,

    Spear Tower,

    San Francisco, CA 94105

    Tel: 415 243 9955

    Fax: 415 243 9984

    Email: [email protected]

    Autonomy Systems Ltd

    Cambridge Business Park

    Cowley Road

    Cambridge CB4 0WZ

    Tel: +44 (0) 1223 448 000

    Fax: +44 (0) 1223 448 001

    Email: [email protected]

    Other Offices

    Autonomy has additional offices in Boston, Dallas, Chicago,

    Washington and New York, as well as in Amsterdam, Beijing,

    Diegem, Hamburg, Madrid, Milan, Munich, Oslo,

    Paris, Rome, Singapore, Stockholm and Sydney.

    Copyright 2005 Autonomy Corp. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners.