THE HIDDEN COSTS OF SCALING SEARCH
A Practical Guide to Anticipating and Controlling Search Costs
3DS.COM/EXALEAD


© 2012 Dassault Systèmes WP EXALEAD: The Hidden Costs of Scaling Search

Foreword

This whitepaper is intended to aid you in evaluating search and information access proposals for your organization by detailing a very important, often overlooked, cost component: scaling your search solution. Too many customers are surprised to find that almost immediately after deploying a search engine, they need to scale their platform, and that the cost of scaling can be exorbitant.

This paper therefore:
• Identifies the reasons why search needs escalate so frequently and dramatically,
• Explains why scaling is often expensive,
• Provides practical advice for anticipating and controlling costs, and
• Furnishes performance benchmarks for more effectively making cost comparisons between solutions.

We hope this information will aid you in developing a complete TCO forecast for your search platform, one that effectively incorporates the costs associated with scaling functionality and/or performance in addition to more easily identified direct, indirect and upgrade costs.

We Welcome Your Feedback

Whatever your role—IT analyst, system administrator, application end user, business manager, security expert, or simply a curious reader—your feedback is important to us. We invite you to contact us at www.exalead.com/software with your comments, suggestions or questions.


Table of Contents

1. Why Search Demands & Costs Escalate
   1.1 Users Demand Wider Access, More Features
   1.2 IT Discovers New Uses, Additional Value

2. Anticipating and Controlling Costs

3. Forecasting Demand: Five Rules
   3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth
   3.2 Plan for Additional Data Sources, including the Web
   3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
   3.4 Plan for Increased Compliance Demands
   3.5 Position Yourself for the Unexpected

4. Understanding Search Types & Limitations
   4.1 Legacy Enterprise Search Products
   4.2 Search Add-Ons from Mainstream Application Providers
   4.3 Web Search Engines Ported to the Enterprise

5. Establishing Apples-to-Apples Cost Comparisons

6. About EXALEAD CloudView™
   6.1 Dual Web/Enterprise DNA
   6.2 High Performance with Minimal Resources
   6.3 Infinite Scaling
   6.4 True Unified Data Access
   6.5 Rapid Time to Market, Agile Development

7. CloudView Benchmarks
   7.1 Enterprise Search
   7.2 Business Applications
   7.3 Web Applications - Online Directories
   7.4 Web Applications - Online Classifieds

Figures

Fig. 1: Data Volume Managed by Companies Worldwide (IDC)
Fig. 2: CloudView Scales with Minimal Resources
Fig. 3: CloudView Scales Infinitely in Five Directions
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource


1) Why Search Demands & Costs Escalate

At root, search demands and costs escalate because search works. Users are hungry for better, easier information access. In fact, IDC estimates that information workers spend 48% of their time searching for and analyzing information, with one-third of that time resulting in failed searches (and re-created work), costing organizations $28,000 per worker per year.[1]

[1] IDC Predictions 2009: An Economic Pressure Cooker Will Accelerate the IT Industry Transformation, IDC, 12/2008

Information workers spend 48% of their time searching for information, with 1/3 of that time resulting in failed searches and recreated work

1.1 Users Demand Wider Access, More Features

Once enterprise search is deployed and users get a taste of unified, universal data access, demands to scale the system in functionality and performance appear almost immediately. Often, this is because organizations begin with an overly basic search solution: simple keyword searching of a finite set of resources, often HTML-centric, delivered via an appliance, hosted service or open source solution, and provided to a restricted user base.

Even when more advanced systems are deployed, and a wider initial user base is served, users still quickly demand access to a wider range of data sources, and insist on more sophisticated features and functionality, such as automatic clustering and categorization, multilingual indexing, natural language querying and Web-style collaboration tools. And, of course, whatever the scope or functionality, users expect the sub-second responsiveness they’ve become accustomed to on the Internet.

1.2 IT Discovers New Uses, Additional Value

In addition to this user-driven escalation, IT departments often discover that their search engine can provide value beyond simply locating information. They learn that search engines can be used to derive new value from existing information assets while adding much-needed IT agility. Specifically, these engines can be used to:
• Create new, exploitable assets from unstructured content like email, Office documents, chat and Web pages
• Increase the value of existing structured content (i.e., database systems)
• Provide a unified data platform for constructing agile business applications

Transforming Unstructured Content into an Exploitable Asset
Search engines automatically classify and categorize unstructured data. Once this data is structured and indexed, it can be incorporated into business information systems and processes. Enterprises find this can provide a significant competitive advantage given that unstructured data makes up on average 80% of corporate information assets, and that it contains highly valuable emotive and qualitative data.

Increasing the Value of Structured Data
Because of performance limitations (databases are optimized for storing, not accessing data), and heavy licensing and infrastructure costs, database resources are frequently under-utilized in the enterprise. However, when IT managers discover that index-based querying is as rich as relational database querying yet hundreds of times faster and cheaper, they begin to use search engines to provide alternative access to essential database content.

Unified Data Access for Agile Applications
Unified data access is essential for meeting escalating compliance requirements, and for satisfying Web-savvy users’ appetite for fast, easy information access. However, by decoupling data from traditional application layers, enterprises are also learning that search engines can enable a new breed of ‘light’ business applications.

Known as Search-Based Applications (SBAs), these applications can be created on the fly to satisfy evolving business needs using information drawn from any source (from legacy databases to email, blogs, and the Web) while leaving existing systems and structures untouched. This approach preserves existing IS investments and is clearly less complex and costly than traditional data and application integration strategies.

Search-Based Applications (SBAs) are fast and easy to construct and can incorporate data from any source

Maximize Benefit; Avoid ‘Sticker Shock’
Given these benefits to both end users and IT managers, it is no wonder that functional and infrastructure-level search demands escalate so frequently. And it is in this attempt to meet these escalating search demands by scaling hardware, infrastructure and functionality that organizations frequently encounter search ‘sticker shock’.

They boost RAM, add servers, increase bandwidth, add or upgrade licenses, and set about the difficult (sometimes impossible) task of trying to make simple search tools perform complex analytic functions.

But, given that search is too often a complex, resource-intensive process, with infrastructure requirements increasing exponentially in relation to functional requirements, and that scaling is often tied to proprietary hardware or to unreasonable user or document counts, it is easy to see why costs can quickly mount. Even some solutions that begin at only a few thousand dollars can skyrocket to millions of dollars within just a few short years (sometimes even within one year) when functional or performance needs escalate.

Without built-in scalability, even low cost solutions can skyrocket to millions of dollars in just a few years

2) Anticipating and Controlling Costs

To avoid this unpleasant turn of events, one must better anticipate and control costs by:
• Forecasting search needs as accurately as possible and understanding the drivers for increased demands, while still ‘expecting the unexpected’
• Understanding the differences between the basic types of search solutions and the scaling issues unique to each
• Establishing baseline functional and performance criteria to more easily make direct comparisons across solutions

3) Forecasting Demand: Five Rules

Keeping the user and management-driven demand factors from Section 1 in mind, we recommend following these five practical rules to better forecast demand.

3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth

Enterprise datastores now routinely reach 100 terabytes (100,000 GB) or more, with large organizations accumulating as much as 2 TB of new data daily. Much of this is unstructured User-Generated Content (UGC) enabled by personal productivity tools (like the Office suite) and communication tools (like email and chat). Structured content has likewise increased with the rise of content management systems designed to manage UGC, and with the widespread adoption of enterprise business applications (ERP, SCM, CRM, BI, CI, etc.).

This rapid, often unmeasured growth is the reason enterprises frequently under-estimate the initial volume of content that needs to be indexed. Even for accurately forecast volumes, one must keep in mind that the organic growth in corporate datastores means that the content load for enterprise search typically doubles every 6 months. Therefore, to be safe, develop your best volume estimate, then double it, and plan for double-digit growth post-implementation.
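The volume rule in Section 3.1 (double the best estimate, then allow for continued growth) can be sketched as a small calculation. This is an illustrative sketch only, not part of the original whitepaper: the function name, its parameters and the choice to model growth as a fixed doubling period are assumptions. The 6-month default comes from the doubling figure quoted in Section 3.1 and should be replaced with your own measured growth rate.

```python
def forecast_volume_tb(best_estimate_tb: float,
                       months: float,
                       doubling_period_months: float = 6.0,
                       safety_factor: float = 2.0) -> float:
    """Project the content volume (in TB) to plan for after `months`.

    Hypothetical helper: applies the "double your estimate" rule as a
    safety factor, then compounds growth using a fixed doubling period.
    """
    if best_estimate_tb < 0 or months < 0 or doubling_period_months <= 0:
        raise ValueError("volumes and durations must be non-negative, "
                         "and the doubling period must be positive")
    # Rule 3.1, step 1: double the best estimate before planning.
    planned_start_tb = best_estimate_tb * safety_factor
    # Rule 3.1, step 2: compound the assumed post-implementation growth.
    return planned_start_tb * 2 ** (months / doubling_period_months)


if __name__ == "__main__":
    for horizon in (0, 6, 12):
        print(horizon, "months:", forecast_volume_tb(10, horizon), "TB")
```

Under these assumptions, a 10 TB best estimate becomes a 20 TB planning figure at launch and 80 TB after 12 months. A slower measured growth rate can be modelled by lengthening `doubling_period_months`.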


3.2 Plan for Additional Data Sources, including the Web

New information silos are popping up daily due to increasing IS complexity and the rise of the “Cloud” model, that is to say the real-time delivery of software, information, computing power and other business services via the Web. Expect your users to demand that more and more of these internal and external data sources be included in the search index, or in applications built upon this index, including extensive data from the Web itself. The Web already constitutes an essential daily resource for knowledge-hungry workers, and workers are increasingly demanding that Web data (blogs, competitor sites, feedback forums, industry sites, etc.) be integrated in business applications.

3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration

These same Web-savvy users are also demanding that enterprise search, and the business applications built upon search platforms, be as easy and intuitive to use as Web applications, if not seamlessly integrated with those same tools. Make sure prospective bidders can meet users’ demands for:
• ‘Zero-training’ usage
• The ability to leverage Web and personal information for business tasks (e.g., using their LinkedIn network for sales and recruiting or integrating Facebook data in CRM applications)
• Web 2.0/3.0 interactive capabilities, such as workflow integration and collaborative tools like resource tagging, bookmarking and sharing
• Fresh, up-to-date data

3.4 Plan for Increased Compliance Demands

While IT has been working to meet increased legal and regulatory compliance demands for several years, regulatory pressures are revving up again in response to mismanagement issues underlying the recent economic crisis. Expect a trickle-down impact on your own compliance strategy, with heightened internal demand for better risk management as well.

3.5 Position Yourself for the Unexpected

As the evolution of the Internet and Cloud computing attest, the information landscape is changing so fast that many demands simply cannot be anticipated. To make sure you have an enterprise search tool that provides you with maximum agility in responding to the unexpected, look for:
• An SOA architecture, with core components that can be easily replicated and distributed
• Open, standards-based APIs for flexibility in managing and interacting with the platform
• Support for Web formats and protocols (SOAP, REST, OWL, XML, RDF, RSS, etc.) as well as major programming environments (Java, C#, .NET)
• A single, unified base of unstructured and structured data
• Linear scaling using commodity hardware

With these platform attributes, you can quickly modify existing applications, rapidly construct new applications and easily scale on demand.

Fig. 1: Data Volume Managed by Companies Worldwide (IDC)

“As people spend more time online actively participating in Web 2.0 technologies such as rich user interfaces based on Ajax and Flash, social networking and tagging, blogs and wikis, Web mashups, and on-demand services in general, information workers will start expecting Enterprise 2.0 applications in the workplace that focus on providing easy-to-use and many-to-many personalized online experiences for creating, publishing, locating, and sharing information with colleagues, customers, and partners”

Susan Feldman, IDC, Worldwide Search and Discovery Software 2008-2012 Forecast Update and 2007 Vendor Shares

4) Understanding Search Types & Limitations

Another aid to accurately forecasting costs is understanding the three basic types of enterprise search engines and their unique performance capabilities and limitations. These types are:
• Legacy enterprise search products
• Search add-ons from mainstream business application providers
• Web-based search engines ported to an enterprise environment

4.1 Legacy Enterprise Search Products

Designed from inception for enterprise search, these engines were constructed for cross-repository data access and use statistical and linguistics-based text analytics to automate content processing. This enables them to produce the kind of faceted results navigation required for task-based searching in the enterprise. Most also provide good support for existing enterprise security infrastructures.

While this native enterprise focus enables these engines to accommodate a wide range of functional requirements, they are often complex to use and lag in Web-style features. They can also be expensive to scale as they were designed from the outset for a relatively small user base and a limited (often internal) set of data sources.

Originally designed for limited data collections and a small, trained user base, traditional enterprise systems are often difficult to scale.

4.2 Search Add-Ons from Mainstream Application Providers

Another class of engine is that developed by leading business application providers (IBM, SAP, SAS, Microsoft, Oracle, etc.) who sought first to improve the search function within their own database-centered products, then to extend that search functionality to external repositories.

As they were originally designed for database querying, these products typically offer limited text analytics (i.e., a limited ability to process unstructured data), are expensive to connect to external data sources, and expensive to scale due to restrictive licensing policies and resource-intense engineering. Many of these vendors have attempted to address these shortcomings by acquiring native enterprise search companies, with limited success in product integration and support.

Search add-ons from non-search vendors are typically poor in text analytics, limited in source connectors, and expensive to scale

4.3 Web Search Engines Ported to the Enterprise

These search engines scale well, up to tens of billions of documents and hundreds of queries per second. However, they are feature-poor, designed for light keyword searching of mainly HTML content. They typically return a ‘laundry list’ of search results rather than the faceted navigation required for task-based enterprise search (popularity-driven Web ‘relevancy’ is meaningless in an enterprise context).

They have limited text analytic capabilities and a limited capacity to ingest, process and integrate structured content. They likewise have limited built-in support for the special security constraints of an enterprise environment. Therefore, extending the functionality of such systems to better meet enterprise needs can be very expensive, when it is doable at all.

Products from Web search vendors are weak in structured data handling, faceted navigation, and security

Lastly, even when scaling is limited to content well-suited to these engines, it can still be surprisingly expensive. These products are often sold with licenses tied to unrealistically low document counts, or scaling necessitates the purchase of expensive proprietary hardware. Consider, for example, the cost of scaling a search solution from one popular Web vendor to hundreds of millions of documents when for only 30 million documents, the solution requires a $500k bi-annual license of proprietary hardware.

Though they scale well, licensing policies often make scaling Web search engines surprisingly expensive
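To make the licensing example above concrete, the extrapolation can be sketched as follows. This is a hedged illustration, not a vendor price list: it assumes licenses are sold in whole blocks of 30 million documents at $500k per license term, which is the simplest reading of the example. The function name and parameters are hypothetical, and real pricing may be tiered or negotiated.

```python
import math


def license_cost_per_term(documents: int,
                          docs_per_license: int = 30_000_000,
                          cost_per_license: int = 500_000) -> int:
    """Estimate license cost for one license term (illustrative only).

    Assumption: capacity is bought in whole license blocks, so a partly
    used block is charged in full.
    """
    if documents < 0 or docs_per_license <= 0:
        raise ValueError("document counts must be non-negative "
                         "and the block size must be positive")
    blocks = math.ceil(documents / docs_per_license)
    return blocks * cost_per_license


if __name__ == "__main__":
    for docs in (30_000_000, 100_000_000, 300_000_000):
        print(f"{docs:>11,} documents -> ${license_cost_per_term(docs):,} per term")
```

Under this block-pricing assumption, 300 million documents would require ten license blocks, or $5,000,000 per license term, before any hardware, support or staffing costs are added.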

5) Establishing Apples-to-Apples Cost Comparisons

Finally, you can better anticipate and control costs by conducting a more accurate, more complete comparison of vendor cost proposals. To do so, first detail your now-revised demand forecast, specifying:
• The Number of Users and Simultaneous Queries to be Processed
• The Number and Type of Sources and Documents to be Indexed
• The Range of Search and Indexation Features Required
• The Data Refresh Rate

Next, ask prospective vendors to provide 5-year costs covering both the initial demand and the scaled demand. To realistically forecast TCO, these costs should include the following:

Direct Costs:
• Software Licensing Fees
• Hardware & Operating System (servers, server clusters, back-up systems) - Initial Purchase and Upgrade Costs
• 3 Year 24/7 Support
• 5 Year Maintenance & Support

Indirect Costs:
• Staffing Costs for Software Implementation, and Software and Hardware Administration
• Hardware Floor Space
• Hardware Power
• Cooling Hardware
• Bandwidth

Note: Selecting a resource-efficient solution not only reduces costs, it also helps your organization meet Green IT objectives.
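The direct and indirect cost items listed above can be collected into a simple comparison structure. The sketch below is one possible layout, not a prescribed TCO method: the class name, field names and the split between one-off and recurring costs are assumptions made for illustration, and all figures in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class VendorProposal:
    """Hypothetical container for one vendor's quoted costs (any currency)."""
    name: str
    # Direct costs, quoted as totals for the whole comparison period.
    software_licensing: int
    hardware_initial: int
    hardware_upgrades: int
    support_and_maintenance: int
    # Indirect costs, quoted per year.
    annual_staffing: int
    annual_facilities: int  # floor space, power, cooling, bandwidth


def total_cost_of_ownership(p: VendorProposal, years: int = 5) -> int:
    """Sum direct costs plus recurring indirect costs over `years`."""
    direct = (p.software_licensing + p.hardware_initial
              + p.hardware_upgrades + p.support_and_maintenance)
    indirect = years * (p.annual_staffing + p.annual_facilities)
    return direct + indirect


if __name__ == "__main__":
    # Invented figures, used only to show the comparison mechanics.
    proposals = [
        VendorProposal("Vendor A", 100_000, 50_000, 20_000, 30_000, 40_000, 10_000),
        VendorProposal("Vendor B", 20_000, 150_000, 200_000, 60_000, 80_000, 30_000),
    ]
    for p in sorted(proposals, key=total_cost_of_ownership):
        print(p.name, total_cost_of_ownership(p))
```

Asking each vendor for two such proposals, one for the initial demand and one for the scaled demand, keeps the comparison on the same footing, and the low entry price of "Vendor B" in the invented data no longer hides its higher five-year total.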

6) About EXALEAD CloudView™

As a final tool for anticipating and controlling costs, we provide several performance benchmarks for CloudView that you can use in comparing vendor solutions. But first, it is helpful to understand why CloudView provides an important comparative model for cost-efficient search scaling.

6.1 Dual Web/Enterprise DNA

First, and most importantly, CloudView was designed from inception for both the enterprise and the Web, driving a 16-billion page public search engine and serving 100 million unique users a month through CloudView-powered websites. Because of this dual Web/enterprise DNA, CloudView alone combines Web simplicity, scalability and innovation with features essential for the corporate environment, including advanced semantic processing of unstructured data, superior structured data handling, and full compliance with existing security systems.

6.2 High Performance with Minimal Resources

Furthermore, CloudView was designed to achieve this balance of Web scalability and enterprise functionality using minimal resources. The end result is a platform that uses on average 1/5th the hardware resources of competitors, providing real-time indexing of 100 million documents and processing 20 queries per second on a single commodity server, all while providing advanced semantic features like dynamic categorization and clustering.


6.3 Infinite Scaling

CloudView easily and cost-effectively scales in five directions:
• The Total Number of System Users
  The platform supports an unlimited user base, with a proven capacity to serve millions of users
• System Features and Functionality
  CloudView offers extensive built-in functionality with full administrator control over feature activation and configuration; open APIs can be used to further extend functionality as desired
• Volume of Data Indexed
  Index and index build services can easily be distributed across commodity servers; built-in index partitioning and replication services further extend performance and availability
• Number of Simultaneous Queries Processed
  The platform’s average throughput is 20 queries per second (QPS) per server, and throughput can be easily scaled by distributing query processing across multiple servers
• Index Refresh Rate
  CloudView supports any data refresh strategy desired: 1) real-time, 2) interval, and 3) “just in time” (on query reception). Dictionaries, thesauri, etc., are automatically updated as the index is updated
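The per-server figures quoted above (about 100 million documents and 20 QPS on one commodity server) can be turned into a rough sizing calculation when comparing proposals. The sketch below is a simplified model and not CloudView's actual sizing method: it assumes data volume is handled by index partitions, query load by replicas of each partition, and that every partition is replicated equally. The function name and defaults are illustrative, and the installations in Section 7 show that real server counts depend heavily on document type and features.

```python
import math


def estimate_search_servers(total_documents: int,
                            peak_qps: float,
                            docs_per_server: int = 100_000_000,
                            qps_per_server: float = 20.0) -> tuple[int, int, int]:
    """Return (partitions, replicas, total_servers) under a simple model.

    Assumptions (hypothetical): one partition per `docs_per_server`
    documents, one replica set per `qps_per_server` of peak load.
    """
    if total_documents < 0 or peak_qps < 0:
        raise ValueError("documents and QPS must be non-negative")
    # Volume axis: split the index into enough partitions to hold the data.
    partitions = max(1, math.ceil(total_documents / docs_per_server))
    # Throughput axis: replicate the index enough times to absorb peak load.
    replicas = max(1, math.ceil(peak_qps / qps_per_server))
    return partitions, replicas, partitions * replicas


if __name__ == "__main__":
    print(estimate_search_servers(100_000_000, 20))
    print(estimate_search_servers(250_000_000, 50))
```

Substituting another vendor's quoted per-server capacity for the two defaults gives a like-for-like hardware estimate for the same demand forecast.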

Fig. 2: CloudView Scales with Minimal Resources

Fig. 3: CloudView Scales Infinitely in Five Directions

Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
CloudView is designed to maximize performance and availability through process distribution, load balancing, index partitioning and index replication

“EXALEAD’s ability to scale is comparable to Google’s… Most enterprise search and content processing systems cannot handle billions of documents. EXALEAD does. EXALEAD’s search and content processing solutions give the company a technical advantage over vendors whose systems choke when thousands of users simultaneously want access to information.”

Stephen Arnold, ArnoldIT


6.4 True Unified Data Access

Because CloudView was developed simultaneously for Web and enterprise search, the platform’s natural language processing modules (text processing and annotation, automatic document classification, named entity extraction, etc.) are especially adept at analyzing, categorizing and classifying very high volumes of unstructured data: content like Word documents, Web pages, blog entries, email messages, PowerPoint presentations, and PDFs.

This automatic structuring not only makes previously unstructured data directly accessible as a new information channel, it enables CloudView to synthesize it with existing structured data, such as that from corporate databases and business applications. This meaningful correlation forms the foundation for value-added uses such as extending enterprise applications with emotive and qualitative data from unstructured sources, and creating innovative mashups merging Web, database and enterprise content.

6.5 Rapid Time to Market, Agile Development

Rapid Time to Market
CloudView is both a fully packaged, off-the-shelf product designed for ‘plug and play’ use, and a white box solution that can be quickly adapted to specific needs using standards-based APIs. As a result, CloudView typically deploys in just days for enterprise search, and on average within only 2-8 weeks for advanced business applications and data mashups, with little to no need for professional services support.

Agile Development
Beyond initial deployment, CloudView provides an agile base for rapidly constructing new business applications, and can be quickly scaled to meet evolving demands. Application agility is assured by CloudView’s fully unified data access platform, SOA architecture and open API framework, while the ability to scale quickly is made possible by built-in distribution and replication facilities that simply require the addition of commodity hardware.

Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource

7) CloudView Benchmarks

To help you form a baseline of functional and performance requirements for comparing solutions, we provide benchmarks below for actual EXALEAD CloudView™ installations. These benchmarks include statistics such as:
• Number of documents indexed
• Refresh rate for the index
• Queries processed per second
• Servers required
• Time to deployment
• Data source connectors used

We invite you to use these specifications when evaluating vendor offerings. Furthermore, we encourage you to demand that prospective vendors contractually agree to meet your requirements with the resources they have proposed. EXALEAD can, and does.

7.1 Enterprise Search

Coface Extranet

Coface, a world leader in trade-credit information and protection with offices in 60 countries, selected CloudView for an extranet that provides customers with key data on 100 million companies.

Performance Benchmarks
• Documents Indexed: 100 million (Oracle DB records)
• Processing: 2000 documents indexed per second; 1.7 million company profiles added per hour
• Refresh Rate: Less than 1 minute
• Servers Required: 2 for indexing + 2 for searching
• Deployment: 60 days
• Connectors: Standard PAPI and ODBC Connectors
• Competitors: Sinequa, FAST
• Notes: Response is five times faster than with the legacy system

Sanger Institute Intranet

The Sanger Institute, a world-renowned research center dedicated to the study and analysis of genomes, uses CloudView for its knowledgebase of resources including genome data and genome-related scientific articles. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.

• Documents: 1.2 billion (XML files, database records, scientific documents); growing by 120 million documents every 2 months; projected to eventually reach 20 billion documents
• Processing: 5 queries per second (QPS)
• Servers: 1 for indexing + 1 for searching
• Deployment: 6 weeks; less than 10 days for search component
• Staffing: 1 part-time technician (2 days per month)
• Connectors: Native ODBC Connector; XML API
• Competitors: Lucene; CloudView replaced AltaVista

“Our in-house staff and our external researcher community are now instantaneously in touch with all the information they need... We have to provide the context behind the search that allows our users to navigate to the specific area of interest in a few clicks. It is a unique solution over our size of index.”

Tony Cox, Head of Software, The Sanger Institute

“The indexing capacity and performance of CloudView impressed us, and we quickly realized that this solution would enable us to create the kind of research services we wanted for our clients while letting us retain control over our costs, software, services, servers and maintenance. What’s more, the EXALEAD solution integrated transparently into our infrastructure, and offered essential security guarantees.”

Jean-Luc Brizard, CIO, Coface Services

7.2 Business Applications

GEFCO Extranet/Database Offloading

GEFCO selected CloudView for its redesigned logistics portal. This portal allows staff, partners and customers to locate, track and optimize vehicle transport across 80 countries and 500 international routes. Deploying CloudView as the core information access platform enabled GEFCO to reduce the load on its Oracle databases while improving performance, with data latency cut from 24 hours to 30 seconds. It also enabled IT to open up all data characteristics for research and reporting: they no longer need to pre-determine and hard code SQL queries for all available views, summaries and reports. Features deployed include dynamic categorization and clustering, on-the-fly operational reporting (charts, graphs, statistics, etc.), faceted results navigation, geolocation, reverse search, proximity search, approximate search, spell checker.

• Documents: 1 million (representing 600,000 daily transactions)
• Processing: 2000 documents indexed per second
• Refresh Rate: Quasi real-time
• Servers: 1 for index build + 1 for search + 1 for high availability
• Deployment: Prototype 10 days; deployment in 60 days
• Connectors: Native ODBC Connector
• Notes: Improved functionality, performance and data freshness while offloading central databases and reducing IS infrastructure. Enforces strong firewalling of confidential client data.

“EXALEAD CloudView™ has dramatically improved system efficiency across the board. Before we installed CloudView it could take a day to get the results of such CPU-intensive queries, by which time the information was out of date. Now we get these answers almost instantly.”

Guillaume Rabier, Manager of Studies and Projects, GEFCO

“Deploying an online directory is highly complex and usually requires 12 to 24 months. EXALEAD allowed us to launch our site in 2 months while bringing unmatched differentiating innovation.”

Bruno Massiet du Biest, CEO, 118 218

7.3 Web Applications - Online Directories

118 218.fr

This hybrid online yellow and white page directory from France’s leading directory service provider uses CloudView to dynamically enrich database content with Web content (a Web/database mashup). Features include geolocation, faceted results navigation, dynamic categorization and clustering, entity extraction (people, places, organizations), reverse search, proximity search, approximate search, spell checker.

• Documents: 30 million (database records and Web pages)
• Processing: 40 QPS per server
• Refresh Rate: 15 minutes
• Servers: 1 for build + 2 for search
• Deployment: 60 days
• Connectors: Built-In HTTP and ODBC Connectors; XML API
• Competitors: FAST
• Notes: Features powerful natural language interpretation capabilities


This classified ad site uses CloudView to provide seamless structured search of listings culled from nearly 7000 public websites. Features include dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.

• Documents: 1 million announcements from 500 databases in 15 languages
• Processing: 40 QPS; 500,000 monthly visitors, with traffic growing rapidly (18% in most recent quarter)
• Servers: 1 index build + 1 search + 1 high availability
• Staffing: 100% of the work done by Yakaz team; EXALEAD provided only training
• Connectors: Built-In HTTP Crawler + Extractors
• Notes: The system is very non-intrusive; indexation has no impact on source databases

Rightmove

Rightmove, the UK’s top real estate classifieds portal, wanted to enhance their end user experience, improve system performance, and reduce IT costs. To achieve these goals, they decided to shift from a classic database application with queries run directly against an Oracle database system to an index-based SBA strategy using EXALEAD CloudView™. The new CloudView SBA has met Rightmove’s cost and performance goals while providing search and navigation features that are at once more intuitive and more powerful, with the system automatically incorporating data facets that could only be made accessible through manual programming with the legacy application. Features include dynamic categorization

ViaMichelin

Travel publishing and services leader Michelin selected CloudView for its high-traffic travel portal, ViaMichelin. The portal offers an engaging mashup of database information, Web content and dynamic mapping, with features including geolocation, dynamic categorization and clustering, entity extraction (people, places, organizations), faceted results navigation, reverse search, proximity search, approximate search, spell checker.

• Documents: 15 million points of interest (hotels, restaurants, attractions, etc.)
• Processing: 800 QPS; 150 milliseconds per query
• Servers: 8
• Deployment: 4 weeks
• Connectors: Built-In HTTP and ODBC Connectors; XML API
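As a rough sizing check on the ViaMichelin figures, the short Python sketch below derives per-server throughput and the average number of queries in flight using Little's law (concurrency = throughput × latency). It assumes that all 8 servers handle queries, that 800 QPS is the aggregate sustained rate, and that 150 milliseconds is the mean latency. None of these assumptions is confirmed by the case study, and the variable names are illustrative only.

```python
# Figures quoted in the ViaMichelin case study.
total_qps = 800          # queries per second (assumed to be the aggregate rate)
latency_seconds = 0.150  # 150 ms per query (assumed to be the mean latency)
servers = 8              # assumption: every server answers queries

# Throughput each server must sustain if load is spread evenly.
qps_per_server = total_qps / servers

# Little's law: average queries in flight = arrival rate x time in system.
queries_in_flight = total_qps * latency_seconds
in_flight_per_server = queries_in_flight / servers

print(f"QPS per server: {qps_per_server:.0f}")
print(f"Average queries in flight: {queries_in_flight:.0f}")
print(f"Average in flight per server: {in_flight_per_server:.0f}")
```

Under these assumptions the figures work out to 100 QPS per server and about 120 concurrent queries overall, or roughly 15 per server. If some of the servers are dedicated to index building, the per-server load on the search servers would be correspondingly higher.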

7.4 Web Applications - Online Classifieds

Yakaz


and clustering, faceted results navigation, and geolocation.

• Documents: 2 million (real estate ads)
• Processing: 400 QPS; 1.2 million records indexed in 1 hour; 29 million monthly visitors
• Refresh Rate: Less than 2 minutes
• Servers: 3 datacenters for high availability: each has 1 build + 2 search servers
• Deployment: 3 months
• Connectors: Built-In ODBC Connector
• Notes: Cost of search successfully reduced from £0.06 to £0.01 per 100 queries (with more powerful and intuitive search and navigation features). 99.99% reliability achieved. 30 Oracle CPUs replaced by 9 EXALEAD CPUs

“Rightmove has already found that EXALEAD CloudView™ has allowed the speedy development of advanced search functionality whilst reducing search costs by 83%.”

Peter Brooks-Johnson, Product Director, Rightmove
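The 83% cost reduction quoted by Rightmove can be cross-checked against the per-query costs and CPU counts listed in the case study notes. The Python sketch below is only an arithmetic check on the published figures; the helper name `percent_reduction` is illustrative and does not come from the whitepaper.

```python
# Figures quoted in the Rightmove case study notes.
cost_before = 0.06  # GBP per 100 queries with the legacy Oracle application
cost_after = 0.01   # GBP per 100 queries with CloudView
cpus_before = 30    # Oracle CPUs
cpus_after = 9      # EXALEAD CPUs


def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


cost_reduction = percent_reduction(cost_before, cost_after)
cpu_reduction = percent_reduction(cpus_before, cpus_after)

print(f"Search cost reduction: {cost_reduction:.0f}%")
print(f"CPU count reduction: {cpu_reduction:.0f}%")
```

The cost figures give a reduction of about 83%, which matches the quoted number, while the CPU count falls by 70%.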


Visit us at 3DS.COM/EXALEAD

Dassault Systèmes, the 3D Experience Company, provides business and people with virtual universes to imagine sustainable innovations. Its world-leading solutions transform the way products are designed, produced, and supported. Dassault Systèmes’ collaborative solutions foster social innovation, expanding possibilities for the virtual world to improve the real world. The group brings value to over 150,000 customers of all sizes in all industries in more than 80 countries. For more information, visit 3DS.COM.

CATIA, SOLIDWORKS, SIMULIA, DELMIA, ENOVIA, EXALEAD, NETVIBES, 3DSWYM, 3DVIA are registered trademarks of Dassault Systèmes or its subsidiaries in the US and/or other countries.

Delivering Best-in-Class Products

Europe/Middle East/Africa Dassault Systèmes 10, rue Marcel Dassault CS 40501 78946 Vélizy-Villacoublay Cedex France

Americas Dassault Systèmes 175 Wyman Street Waltham, Massachusetts 02451-1223 USA

Asia-Pacific Dassault Systèmes Pier City Shibaura Bldg 10F 3-18-1 Kaigan, Minato-Ku Tokyo 108-002 Japan

Virtual Product Design

3D for Professionals

Realistic Simulation

Global Collaborative Lifecycle Management

Dashboard Intelligence

Information Intelligence

Social Innovation

Online 3D Lifelike Experiences

Virtual Production