Hidden Costs of Scaling Search Whitepaper English


    EXALEAD WHITEPAPER


    Foreword

    This whitepaper is intended to aid you in evaluating search and information access
    proposals for your organization by detailing a very important, often overlooked, cost
    component: scaling your search solution. Too many customers are surprised to find
    that almost immediately after deploying a search engine, they need to scale their
    platform, and that the cost of scaling can be exorbitant.

    This paper therefore:

    Identifies the reasons why search needs escalate so frequently and dramatically,

    Explains why scaling is often expensive,

    Provides practical advice for anticipating and controlling costs, and

    Furnishes performance benchmarks for more effectively making cost comparisons between

    solutions.

    We hope this information will aid you in developing a complete TCO forecast for your search platform,

    one that effectively incorporates the costs associated with scaling functionality and/or performance

    in addition to more easily identifying direct, indirect and upgrade costs.

    The Authors

    We Welcome Your Feedback

    Whatever your role (IT analyst, system administrator, application end user, business manager,
    security expert, or simply a curious reader), your feedback is important to us. We invite you to
    contact us at the address below with your comments, suggestions or questions.

    Frédéric Catherine, Marketing Supervisor, [email protected]

    +33 1 55 35 26 81

    www.exalead.com


    Table of Contents

    1 Why Search Demands & Costs Escalate
      1.1 Users Demand Wider Access, More Features
      1.2 IT Discovers New Uses, Additional Value
    2 Anticipating and Controlling Costs
    3 Forecasting Demand: Five Rules
      3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth
      3.2 Plan for Additional Data Sources, including the Web
      3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
      3.4 Plan for Increased Compliance Demands
      3.5 Position Yourself for the Unexpected
    4 Understanding Search Types & Limitations
      4.1 Legacy Enterprise Search Products
      4.2 Search Add-Ons from Mainstream Application Providers
      4.3 Web Search Engines Ported to the Enterprise
    5 Establishing Apples-to-Apples Cost Comparisons
    6 About Exalead CloudView™
      6.1 Dual Web/Enterprise DNA
      6.2 High Performance with Minimal Resources
      6.3 Infinite Scaling
      6.4 True Unified Data Access
      6.5 Rapid Time to Market, Agile Development
    7 CloudView Benchmarks
      7.1 Enterprise Search
      7.2 Business Applications - Database Offloading
      7.3 Web Applications - Online Directories
      7.4 Web Applications - Online Classifieds

    Figures

    Fig. 1: Data Volume Managed by Companies Worldwide (IDC)
    Fig. 2: CloudView Scales with Minimal Resources
    Fig. 3: CloudView Scales Infinitely in Five Directions
    Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
    Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource


    1 Why Search Demands & Costs Escalate

    At root, search demands and costs escalate because search works. Users are hungry for better,
    easier information access. In fact, IDC estimates that information workers spend 48% of
    their time searching for and analyzing information, with one-third of that time resulting in
    failed searches (and re-created work), costing organizations $28,000 per worker per year.[1]

    1.1 Users Demand Wider Access, More Features

    Once enterprise search is deployed and users get a taste of unified, universal data access, demands
    to scale the system in functionality and performance appear almost immediately. Often, this is
    because organizations begin with an overly basic search solution: simple keyword searching of a
    finite set of resources, often HTML-centric, delivered via an appliance, hosted service or open source
    solution, and provided to a restricted user base.

    Even when more advanced systems are deployed, and a wider initial user base is served, users still
    quickly demand access to a wider range of data sources, and insist on more sophisticated features
    and functionality, such as automatic clustering and categorization, multilingual indexing, natural
    language querying and Web-style collaboration tools. And, of course, whatever the scope or functionality,
    users expect the sub-second responsiveness they've become accustomed to on the Internet.

    1.2 IT Discovers New Uses, Additional Value

    In addition to this user-driven escalation, IT departments often discover that their search engine can
    provide value beyond simply locating information. They learn that search engines can be used to
    derive new value from existing information assets while adding much-needed IT agility. Specifically,
    these engines can be used to:

    Create new, exploitable assets from unstructured content like email, Office documents,
    chat and Web pages

    Increase the value of existing structured content (i.e., database systems)

    Provide a unified data platform for constructing agile business applications

    Transforming Unstructured Content into an Exploitable Asset

    Search engines automatically classify and categorize unstructured data. Once this data is structured
    and indexed, it can be incorporated into business information systems and processes. Enterprises
    find this can provide a significant competitive advantage given that unstructured data makes up on
    average 80% of corporate information assets, and that it contains highly valuable emotive and
    qualitative data.

    [1] IDC Predictions 2009: An Economic Pressure Cooker Will Accelerate the IT Industry Transformation, IDC, 12/2008


    Exalead Whitepaper: The Hidden Costs of Scaling Search, v 1.0 © 2009 Exalead

    Information workers spend 48% of their time searching for information, with 1/3 of
    that time resulting in failed searches and recreated work


    Increasing the Value of Structured Data

    Because of performance limitations (databases are optimized for storing, not accessing, data), and
    heavy licensing and infrastructure costs, database resources are frequently under-utilized in the
    enterprise. However, when IT managers discover that index-based querying is as rich as
    relational database querying yet ten times faster and cheaper, they begin to use search engines to
    provide alternative access to essential database content.

    Unified Data Access for Agile Applications

    Unified data access is essential for meeting escalating compliance requirements, and for satisfying
    Web-savvy users' appetite for fast, easy information access. However, by decoupling data from
    traditional application layers, enterprises are also learning that search engines can enable a new
    breed of light business applications.

    Known as Search-Based Business Applications (SBAs), these applications can be created on-the-fly
    to satisfy evolving business needs using information drawn from any source (from legacy
    databases to email, blogs, and the Web) while leaving existing systems and structures untouched,
    an approach that preserves existing IS investments and is clearly less complex and costly than
    traditional data and application integration strategies.

    Maximize Benefit; Avoid Sticker Shock

    Given these benefits to both end users and IT managers, it is no wonder that functional and
    performance demands on search escalate so frequently. And it is in attempting to meet these
    escalating demands by scaling hardware, infrastructure and functionality that organizations
    frequently encounter search sticker shock.

    They boost RAM, add servers, increase bandwidth, add or upgrade licenses, and set about the
    difficult (sometimes impossible) task of trying to make simple search tools perform complex
    analytic functions.

    But, given that search is too often a complex, resource-intensive process, with infrastructure
    requirements increasing exponentially with increases in functional requirements, and that scaling
    is often tied to proprietary hardware or to unreasonable user or document counts, it is easy to
    see why costs can quickly mount. Even some solutions that begin at only a few thousand dollars
    can skyrocket to millions of dollars within just a few short years (sometimes even within one
    year) when functional or performance needs escalate.


    Search-Based Business Applications (SBAs) are fast and easy to construct and
    can incorporate data from any source

    Without built-in scalability, even low cost solutions can skyrocket to
    millions of dollars in just a few years


    3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration

    These same Web-savvy users are also demanding that enterprise search, and the business applications
    built upon search platforms, be as easy and intuitive to use as Web applications, if not seamlessly
    integrated with those same tools. Make sure prospective bidders can meet users' demands for:

    Zero-training usage (for search and search-based applications)

    The ability to leverage Web and personal information for business tasks (e.g., using their
    LinkedIn network for sales and recruiting or integrating Facebook data in CRM applications)

    Web 2.0/3.0 interactive capabilities, such as workflow integration and collaborative tools
    like resource tagging, bookmarking and sharing

    Fresh, up-to-date data

    As people spend more time online actively participating in Web 2.0 technologies such as rich user
    interfaces based on Ajax and Flash, social networking and tagging, blogs and wikis, Web mashups, and
    on-demand services in general, information workers will start expecting Enterprise 2.0 applications
    in the workplace that focus on providing easy-to-use and many-to-many personalized online experiences
    for creating, publishing, locating, and sharing information with colleagues, customers, and partners.

    Susan Feldman, IDC, Worldwide Search and Discovery Software 2008-2012 Forecast Update and
    2007 Vendor Shares

    3.4 Plan for Increased Compliance Demands

    While IT has been working to meet increased legal and regulatory compliance demands for several
    years, regulatory pressures are revving up again in response to mismanagement issues underlying
    the recent economic crisis. Expect a trickle-down impact on your own compliance strategy, with
    heightened internal demand for better risk management as well.

    3.5 Position Yourself for the Unexpected

    As the evolution of the Internet and Cloud computing attests, the information landscape is changing
    so fast that many demands simply cannot be anticipated. To make sure you have an enterprise
    search tool that provides you with maximum agility in responding to the unexpected, look for:

    An SOA architecture, with core services that can be easily replicated and distributed

    Open, standards-based APIs for flexibility in managing and interacting with the platform

    Support for Web formats and protocols (SOAP, REST, OWL, XML, RDF, RSS, etc.) as well as

    major programming environments (Java, C#, .Net)

    A single, unified base of unstructured and structured data

    Linear scaling using commodity hardware

    With these platform attributes, you can quickly modify existing applications, rapidly construct new

    applications and easily scale on demand.


    4 Understanding Search Types & Limitations

    Another aid to accurately forecasting costs is understanding the three basic types of enterprise

    search engines and their unique performance capabilities and limitations. These types are:

    Legacy enterprise search products

    Search add-ons from mainstream business application providers

    Web-based search engines ported to an enterprise environment

    4.1 Legacy Enterprise Search Products

    Designed from inception for enterprise search, these engines were constructed for cross-repository
    data access and use statistical and linguistics-based text analytics to automate content processing.
    This enables them to produce the kind of faceted results navigation required for task-based searching
    in the enterprise. Most also provide good support for existing enterprise security infrastructures.

    While this native enterprise focus enables these engines to
    accommodate a wide range of functional requirements, they
    are often complex to use and lag in Web-style features. They
    can also be expensive to scale as they were designed from the
    outset for a relatively small user base and a limited (often
    internal) set of data sources.

    4.2 Search Add-Ons from Mainstream Application Providers

    Another class of engine is that developed by leading business application providers (IBM, SAP, SAS,
    Microsoft, Oracle, etc.) who sought first to improve the search function within their own
    database-centered products, then to extend that search functionality to external repositories.

    As they were originally designed for database querying, these
    products typically offer limited text analytics (i.e., limited ability to
    process unstructured data), are expensive to connect to external
    data sources, and expensive to scale due to restrictive licensing
    policies and resource-intense engineering. Many of these vendors
    have attempted to address these shortcomings by acquiring native
    enterprise search companies, with limited success in product
    integration and support.

    4.3 Web Search Engines Ported to the Enterprise

    These search engines scale well, up to tens of billions of documents and hundreds of queries per
    second; however, they are feature-poor, designed for light keyword searching of mainly HTML
    content. They typically return a laundry list of search results rather than the faceted navigation
    required for task-based enterprise search (popularity-driven Web relevancy is meaningless in an
    enterprise context).


    Originally designed for limited data collections and a small, trained
    user base, traditional enterprise systems are often difficult to scale

    Search add-ons from non-search vendors are typically poor in
    text analytics, limited in source connectors, and expensive to scale


    They have limited text analytic capabilities and a limited capacity to

    ingest, process and integrate structured content. They likewise have

    limited built-in support for the special security constraints of an

    enterprise environment. Therefore, extending the functionality of such

    systems to better meet enterprise needs can be very expensive, when

    it is doable at all.

    Lastly, even when scaling is limited to content well-suited to
    these engines, it can still be surprisingly expensive. These products
    are often sold with licenses tied to unrealistically low document
    counts, or scaling necessitates the purchase of expensive proprietary
    hardware. Consider, for example, the cost of scaling a search
    solution from one popular Web vendor to hundreds of millions of
    documents when for only 30 million documents, the solution
    requires a $500k bi-annual license of proprietary hardware.

    5 Establishing Apples-to-Apples Cost Comparisons

    Finally, you can better anticipate and control costs by conducting a more accurate, more complete

    comparison of vendor cost proposals. To do so, first, detail your now-revised demand forecast,

    specifying:

    The Number of Users and Simultaneous Queries to be Processed

    The Number and Type of Sources and Documents to be Indexed

    The Range of Search and Indexing Features Required

    The Data Refresh Rate

    Next, ask prospective vendors to provide 5-year costs to cover both the initial demand and the scaled
    demand. To realistically forecast TCO, these costs should include the following:

    Direct Costs:

    Software Licensing Fees

    Hardware & Operating System (servers, server clusters, back-up systems) - Initial Purchase and
    Upgrade Costs

    3 year 24*7 Support

    5 Year Maintenance & Support

    Indirect Costs:

    Staffing Costs for Software Implementation, and Software and Hardware Administration

    Hardware Floor Space

    Hardware Power

    Cooling Hardware

    Bandwidth

    Note: Keep in mind that you can not only reduce costs by selecting a resource-efficient solution,
    you can also help your organization meet Green IT objectives.

    Products from Web search vendors are weak in structured data handling,
    faceted navigation, and security

    Though they technically scale well, licensing policies often make scaling
    Web search engines expensive
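    As a rough illustration of how the direct and indirect cost categories listed in this section
    could be rolled up into a five-year TCO figure, the following Python sketch sums yearly cost
    records. The class name, the cost keys and every amount are hypothetical placeholders chosen
    for this example; they are not vendor figures and do not come from this whitepaper.

```python
from dataclasses import dataclass, field


@dataclass
class YearlyCosts:
    """Costs for one year of operation, all in the same currency."""
    direct: dict[str, float] = field(default_factory=dict)
    indirect: dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        # Direct and indirect costs are simply added together.
        return sum(self.direct.values()) + sum(self.indirect.values())


def five_year_tco(years: list[YearlyCosts]) -> float:
    """Sum direct and indirect costs over a five-year horizon."""
    if len(years) != 5:
        raise ValueError("expected exactly five yearly cost records")
    return sum(year.total() for year in years)


# Hypothetical placeholder amounts for the initial demand.
initial = YearlyCosts(
    direct={"software_licenses": 100_000, "hardware_and_os": 40_000,
            "support_and_maintenance": 15_000},
    indirect={"staffing": 60_000, "floor_space": 2_000, "power": 3_000,
              "cooling": 2_000, "bandwidth": 5_000},
)

# Hypothetical placeholder amounts once demand has scaled.
scaled = YearlyCosts(
    direct={"software_licenses": 150_000, "hardware_and_os": 20_000,
            "support_and_maintenance": 15_000},
    indirect={"staffing": 60_000, "floor_space": 4_000, "power": 6_000,
              "cooling": 4_000, "bandwidth": 10_000},
)

# Assumed scenario: two years at the initial demand, three at the scaled demand.
plan = [initial, initial, scaled, scaled, scaled]
print(f"Five-year TCO: {five_year_tco(plan):,.0f}")
```

    Running the same function on each vendor's quoted figures, for both the initial and the scaled
    demand, gives totals that can be compared on a like-for-like basis.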


    6 About Exalead CloudView™

    As a final tool for anticipating and controlling costs, we provide several performance benchmarks

    for CloudView that you can use in comparing vendor solutions. But first, it is helpful to understand

    why CloudView provides an important comparative model for cost-efficient search scaling.

    6.1 Dual Web/Enterprise DNA

    First, and most importantly, CloudView was designed from inception for both the Web, driving an 8
    billion (soon to be 16 billion) page public search engine and serving 100 million unique researchers
    a month, and the enterprise market, with advanced semantic processing of unstructured data,
    superior structured data handling, and full compliance with existing security systems.

    6.2 High Performance with Minimal Resources

    Furthermore, CloudView was designed to achieve this balance of Web scalability
    and enterprise functionality using minimal resources. The end result is a platform
    that uses on average 1/5th the hardware resources of competitors, providing real-time
    indexing of 100 million documents and processing 20 queries per second on
    a single commodity server, all while providing advanced semantic features
    like dynamic categorization and clustering.

    Fig. 2: CloudView Scales with Minimal Resources

    6.3 Infinite Scaling

    Fig. 3: CloudView Scales Infinitely in Five Directions

    CloudView easily and cost-effectively scales in five directions:

    The Total Number of System Users
    Proven capacity to serve 100 million unique monthly visitors

    System Features and Functionality
    Extensive built-in functionality with full administrator control over features activated;
    open APIs for endlessly extending functionality

    Volume of Data Indexed
    Index and index build services can easily be distributed across commodity servers; built-in
    index partitioning and replication services further extend performance and availability

    Number of Simultaneous Queries Processed
    Average throughput of 20 Queries per Second (QPS) per server; easily scales by distributing
    query processing across multiple commodity servers

    Index Refresh Rate
    Supports any data refresh strategy: 1) real-time, 2) interval, and 3) just in time (on query
    reception). Dictionaries, thesauri, etc., are automatically updated as the index is updated.
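    The query-throughput figure above lends itself to a simple sizing estimate. The sketch below
    assumes that throughput adds up linearly across servers, which is a simplification, and uses
    the 20 QPS per server average quoted above as its default. The function name and the
    spare_servers parameter are hypothetical and exist only for this illustration.

```python
import math


def search_servers_needed(target_qps: float,
                          qps_per_server: float = 20.0,
                          spare_servers: int = 0) -> int:
    """Estimate how many search servers a target query load requires.

    Assumes throughput scales linearly with server count; spare_servers
    adds extra machines for availability on top of the computed minimum.
    """
    if target_qps < 0 or qps_per_server <= 0 or spare_servers < 0:
        raise ValueError("loads and server counts must be non-negative, "
                         "and qps_per_server must be positive")
    return math.ceil(target_qps / qps_per_server) + spare_servers


# Example: a target of 90 QPS with one spare server for availability.
print(search_servers_needed(90, spare_servers=1))
```

    Actual throughput depends on index size, query complexity and the features activated, so an
    estimate like this is best treated as a starting point for discussion with a vendor.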

    Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware

    CloudView is designed to maximize performance and availability through process
    distribution, load balancing, index partitioning and index replication.
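    To make the terms index partitioning, index replication and load balancing more concrete, here
    is a minimal generic sketch in Python. It is not CloudView's implementation, which this paper
    does not describe; hash-based partitioning and round-robin replica selection are common
    techniques assumed purely for illustration, and all names and server labels are hypothetical.

```python
import hashlib
from itertools import count


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Assign a document to an index partition using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class ReplicaBalancer:
    """Round-robin load balancing across the replicas of each partition."""

    def __init__(self, replicas_per_partition: dict[int, list[str]]):
        self._replicas = replicas_per_partition
        # One independent counter per partition drives the rotation.
        self._counters = {p: count() for p in replicas_per_partition}

    def pick(self, partition: int) -> str:
        replicas = self._replicas[partition]
        return replicas[next(self._counters[partition]) % len(replicas)]


def servers_for_query(balancer: ReplicaBalancer, num_partitions: int) -> list[str]:
    """A query must reach every partition, but only one replica of each."""
    return [balancer.pick(p) for p in range(num_partitions)]


# Hypothetical layout: two partitions, the first replicated on two servers.
balancer = ReplicaBalancer({0: ["server-a", "server-b"], 1: ["server-c"]})
print(partition_for("document-42", 2))
print(servers_for_query(balancer, 2))
print(servers_for_query(balancer, 2))
```

    In this model, adding partitions spreads a larger index over more servers, while adding
    replicas spreads query load and keeps a partition available if one server fails.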

    Exalead's ability to scale is comparable to Google's. Most enterprise search and content processing
    systems cannot handle billions of documents; Exalead does. Exalead's search and content
    processing solutions give the company a technical advantage over vendors whose systems choke
    when thousands of users simultaneously want access to information.

    Stephen Arnold, ArnoldIT


    6.4 True Unified Data Access

    Because CloudView was developed simultaneously for Web and enterprise search, the platform's
    natural language processing modules (text processing and annotation, automatic document
    classification, named entity extraction, etc.) are especially adept at analyzing, categorizing and
    classifying very high volumes of unstructured data, content like Word documents, Web pages, blog
    entries, email messages, PowerPoint presentations, PDFs, etc.

    This automatic structuration not only makes previously unstructured data directly accessible as a
    new information channel, it also enables CloudView to synthesize it with existing structured data,
    such as that from corporate databases and business applications. This meaningful correlation forms
    the foundation for value-added uses such as database offloading, data migration, and content mash-ups.

    Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource

    6.5 Rapid Time to Market, Agile Development

    Rapid Time to Market

    CloudView is both a fully packaged, off-the-shelf product designed for plug and play use, and a
    white box solution that can be quickly adapted to specific needs using standards-based APIs. As
    a result, CloudView typically deploys in just days for enterprise search, and on average within only
    4-6 weeks for advanced business applications and data mash-ups, with little to no need for
    professional services support.

    Agile Development

    Beyond initial deployment, CloudView provides an agile base for rapidly constructing new business
    applications, and can be quickly scaled to meet evolving demands. Application agility is assured
    by CloudView's fully unified data access platform, SOA architecture and open API framework,
    while the ability to scale quickly is made possible by built-in distribution and replication facilities
    that simply require the addition of commodity hardware.


    7 CloudView Benchmarks

    To help you form a baseline of functional and performance requirements for comparing solutions,

    we provide below benchmarks for actual Exalead CloudView™ installations. These benchmarks

    include statistics such as:

    Number of documents indexed

    Refresh rate for the index

    Queries processed per second

    Servers required

    Time to market

    Data source connectors used

    We invite you to use these specifications when evaluating vendor offerings. Furthermore, we
    encourage you to demand that prospective vendors contractually agree to meet your requirements
    with the resources they have proposed. Exalead can, and does.
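    One way to compare such benchmark statistics across vendors is to normalize them per server.
    The Python sketch below does this for two of the installations reported in this section. The
    Benchmark class and its field names are hypothetical. It also assumes that the quoted QPS is a
    total for the installation and divides by every server listed, including index-build servers,
    which understates the throughput of the search servers themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """A benchmark record holding the statistics listed above."""
    name: str
    documents: int
    queries_per_second: float
    servers: int

    def __post_init__(self) -> None:
        if self.servers <= 0:
            raise ValueError("servers must be a positive number")

    def qps_per_server(self) -> float:
        return self.queries_per_second / self.servers

    def documents_per_server(self) -> float:
        return self.documents / self.servers


# Figures taken from the ViaMichelin and Sanger Institute benchmarks in this section.
viamichelin = Benchmark("ViaMichelin", 15_000_000, 800, 8)
sanger = Benchmark("Sanger Institute", 1_200_000_000, 5, 2)

for bench in (viamichelin, sanger):
    print(f"{bench.name}: {bench.qps_per_server():.1f} QPS per server, "
          f"{bench.documents_per_server():,.0f} documents per server")
```

    The two installations sit at opposite ends of the range: one is dominated by query volume, the
    other by index size, which is why both ratios are worth requesting from vendors.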

    7.1 Enterprise Search

    COFACE EXTRANET

    Coface, a world leader in trade-credit information and protection with offices in 60 countries, selected
    CloudView for this extranet which provides customers with key data on 100 million companies.

    Performance Benchmarks

    Documents Indexed: 100 million (Oracle db records)

    Processing: 2000 documents indexed per second; 1.7 million company profiles added per hour

    Refresh Rate: Less than 1 minute

    Servers Required: 2 for indexing + 2 for searching

    Time to Market: 60 days

    Connectors: Standard PAPI and ODBC Connectors

    Competitors: Sinequa, Fast

    Note: Response rate is five times faster than legacy system
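    An indexing rate can be translated into an approximate full-index build time, which is a useful
    figure to request from every vendor. The sketch below applies this to the document count and
    indexing rate quoted above, under the assumption (not stated in the benchmark) that the rate is
    sustained for the whole build. The function name is hypothetical.

```python
def full_index_hours(documents: int, docs_per_second: float) -> float:
    """Hours needed to index every document at a sustained rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return documents / docs_per_second / 3600


# 100 million documents at 2000 documents indexed per second.
hours = full_index_hours(100_000_000, 2000)
print(f"Approximate full build time: {hours:.1f} hours")
```

    At these figures a complete rebuild would take a little under 14 hours, which is one reason
    incremental refresh rates are reported separately from bulk indexing rates.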

    The indexing capacity and performance of CloudView impressed us, and we quickly realized that

    this solution would enable us to create the kind of research services we wanted for our clients

    while letting us retain control over our costs, software, services, servers and maintenance. What's

    more, the Exalead solution integrated transparently into our infrastructure, and offered essential

    security guarantees.

    Jean-Luc Brizard, ISD, Coface Services


    SANGER INSTITUTE INTRANET

    The Sanger Institute, a world-renowned research center dedicated to the study and analysis of

    genomes, uses CloudView for its knowledgebase of resources including genome data and

    genome-related scientific articles. Features include dynamic categorization and clustering, entity

    extraction (people, places, organizations), faceted results navigation, reverse search, proximity

    search, approximate search, spell checker.

    Documents: 1.2 billion (XML files, database records, scientific documents); growing by

    120 million documents every 2 months; projected to eventually reach 20

    billion documents

    Processing: 5 Queries Per Second (QPS)

    Servers: 1 for indexing + 1 for searching

    Time to Market: 6 weeks; search component ready in 10 days

    Staffing: 1 part-time technician: 2 days per month

    Connectors: Native ODBC Connector; XML API

    Competitors: Lucene; CloudView replaced Altavista

    Our in-house staff and our external researcher community are now instantaneously in touch with

    all the information they need... We have to provide the context behind the search that allows our

    users to navigate to the specific area of interest in a few clicks. It is a unique solution over our size

    of index.

    Tony Cox, Head of Software, The Sanger Institute


    7.2 Business Applications - Database Offloading

    GEFCO EXTRANET/DATABASE OFFLOADING

    GEFCO selected CloudView for its redesigned logistics portal, reducing the load on its Oracle

    databases and allowing staff, partners and customers to locate, track and optimize vehicle transport

    in real-time across 80 countries and 500 international routes. Features include dynamic categorization

    and clustering, entity extraction (people, places, organizations), faceted results navigation,

    geolocalization, reverse search, proximity search, approximate search, spell checker.

    Documents: 1 million (representing 600,000 daily transactions)

    Processing: 2000 documents indexed per second

    Refresh Rate: Quasi real-time (30 seconds)

    Servers: 1 for index build + 1 for search + 1 for high availability

    Time to Market: Prototype 10 days; deployment in 60 days

    Connectors: Native ODBC Connector

    Notes: Improved functionality, performance and data freshness while offloading

    central databases and reducing IS infrastructure. Enforces strong firewalling

    of confidential client data.

    Exalead CloudView has dramatically improved system efficiency across the board. Before we

    installed CloudView it could take a day to get the results of such CPU-intensive queries, by which

    time the information was out of date. Now we get these answers almost instantly.

    Guillaume Rabier, Manager of Studies and Projects, GEFCO


    7.3 Web Applications - Online Directories

    118 218.fr

    This hybrid online yellow and white page directory from France's leading directory service company
    uses CloudView to dynamically enrich database content with Web content (Web/database mash-up).
    Features include geolocalization, faceted results navigation, dynamic categorization and
    clustering, entity extraction (people, places, organizations), reverse search, proximity search,
    approximate search, spell checker.

    Documents: 30 million (database records and Web pages)

    Processing: 40 QPS per server

    Refresh Rate: 15 minutes

    Servers: 1 for build + 2 for search

    Time to Market: 60 days

    Connectors: Built-In HTTP and ODBC Connectors; XML API

    Competitors: FAST

    Notes: Features powerful natural language interpretation capabilities.

    Deploying an online directory is highly complex and usually requires 12 to 24 months. Exalead

    allowed us to launch our site in 2 months while bringing unmatched differentiating innovation.

    Bruno Massiet Dubiest, CEO, 118 218


    VIAMICHELIN

    Travel publishing and services leader Michelin selected CloudView for its high-traffic travel portal,

    ViaMichelin. Features include rich mapping, dynamic categorization and clustering, entity extraction

    (people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity

    search, approximate search, spell checker.

    Documents: 15 million points of interest (hotels, restaurants, attractions, etc.)

    Processing: 800 QPS; 150 milliseconds per query

    Servers: 8

    Time to Market: 4 weeks

    Connectors: Built-In HTTP and ODBC Connectors; XML API


    7.4 Web Applications - Online Classifieds

    YAKAZ

    This classified ad site uses CloudView to aggregate listings from more than 500 public websites.
    Features include dynamic categorization and clustering, entity extraction (people, places, organizations),
    faceted results navigation, reverse search, proximity search, approximate search, spell checker.

    Documents: 1 million announcements from 500 databases in 15 languages

    Processing: 40 QPS; 6 million unique monthly visitors, with traffic growing rapidly

    (18% in most recent quarter)

    Servers: 1 index build + 1 search + 1 high availability

    Staffing: 100% of the work done by Yakaz team; Exalead provided only training

    Connectors: Built-In HTTP Crawler + Extractors

    Notes: The system is very non-intrusive; indexing has no impact on source

    databases.


    RIGHTMOVE

    Rightmove, the UK's top real estate classifieds portal, selected CloudView to enhance the end user

    experience, improve system performance, and reduce IT costs. Features include dynamic categorization

    and clustering, faceted results navigation, and geolocalization.

    Documents: 2 million (real estate ads)

    Processing: 400 QPS; 1.2 million records indexed in 1 hour; 29 million monthly visitors

    Refresh Rate: Less than 2 minutes

    Servers: 3 datacenters for high availability: each has 1 build + 2 search servers

    Deployment: 3 months

    Connectors: Built-In ODBC Connector

    Notes: Cost of search successfully reduced from .06 pence to .01 pence per 1000

    queries (with more powerful and intuitive search and navigation features).

    99.99% reliability achieved. 30 Oracle CPUs replaced by 9 Exalead CPUs.

    Rightmove has already found that Exalead CloudView has allowed the speedy development of

    advanced search functionality whilst reducing search costs by 83%.

    Peter Brooks-Johnson, Product Director, Rightmove
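    The percentage quoted here can be checked against the figures in the Notes above. The short
    Python sketch below computes the relative reduction for the cost per 1000 queries (.06 pence to
    .01 pence) and, by the same formula, for the CPU count (30 to 9). The function name is
    hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from 'before' to 'after', as a percentage."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return 100.0 * (before - after) / before


# Cost per 1000 queries, in pence, from the Rightmove notes.
cost_cut = percent_reduction(0.06, 0.01)
# CPU count, from the same notes.
cpu_cut = percent_reduction(30, 9)

print(f"Search cost reduced by about {round(cost_cut)}%")
print(f"CPU count reduced by about {round(cpu_cut)}%")
```

    The cost figures work out to roughly 83%, matching the quotation, and the CPU replacement
    corresponds to a reduction of about 70%.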
