GSA Deployment Guide

Google Search Appliance Deployment Guide September 2009


Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com 21 September 2009

Copyright 2009 Google Inc. All rights reserved. Google, the Google logo, Google Search Appliance, GSA, the Google Mini, Google Site Search, and GSS are trademarks, registered trademarks, or service marks of Google Inc. All other trademarks are the property of their respective owners. Use of any Google solution is governed by the license agreement included in your original contract. Any intellectual property rights relating to the Google services are and shall remain the exclusive property of Google, Inc. and/or its subsidiaries (Google). You may not attempt to decipher, decompile, or develop source code for any Google product or service offering, or knowingly allow others to do so. Google documentation may not be sold, resold, licensed or sublicensed and may not be transferred without the prior written consent of Google. Your right to copy this manual is limited by copyright law. Making copies, adaptations, or compilation works, without prior written authorization of Google is prohibited by law and constitutes a punishable violation of the law. No part of this manual may be reproduced in whole or in part without the express written consent of Google. Copyright by Google Inc. Google provides this publication as is without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Google may revise this publication from time to time without notice. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.


Contents

Chapter 1: Introduction ..... 5
  Welcome to the Google Search Appliance ..... 5
  About this guide ..... 6
  Disclaimer for Third-Party Product Configurations ..... 8
Chapter 2: Understanding Your Deployment ..... 9
  Understanding your users ..... 10
  Understanding your content ..... 11
  Understanding your business processes ..... 12
  Understanding your architecture ..... 12
Chapter 3: Planning for Successful Deployment ..... 15
  Capturing requirements ..... 16
  Identifying phases ..... 22
  Defining success criteria ..... 28
  Transitioning to business as usual ..... 28
Chapter 4: Project Scenarios ..... 31
  Basic search on a public website ..... 32
  Basic internal search ..... 35
  Internal search over intranet, file system, SharePoint, and Notes ..... 38
  Internal search including CMS, database, corporate application assets ..... 40
Chapter 5: Deployment Architecture ..... 45
  Sizing the index ..... 45
  Architecting for scale and performance ..... 48
  Architecting for reliability ..... 50
  Architecting for reach ..... 52
  Architecting for security ..... 56
  Enhancing technologies ..... 60
Chapter 6: Deployment Scenarios ..... 61
  Staging/Development environment ..... 62
  Simple architecture ..... 63
  Search as a web service ..... 64
  High availability architecture ..... 65
  Disaster recovery deployment architecture ..... 69
  Integrated architectures ..... 71
  Security solutions ..... 75
  Federated architecture ..... 79
Chapter 7: Post Deployment ..... 83
  Update planning ..... 83
  Planning for renewal ..... 87
  Optimizing support ..... 89
  Using reports to enhance the search experience ..... 91
Chapter 8: Putting the User First ..... 95
  Presentation methods ..... 95
  Enrichment features ..... 99
  Google Enterprise Labs ..... 100
  User feedback ..... 101
Appendix A: Best Practices ..... 103
  Datacenter and installation ..... 103
  Crawl ..... 104
  Feeds ..... 105
  Index reset ..... 106
  Collections ..... 106
  Serving ..... 106
  Front end stylesheets ..... 108
  Security ..... 108
  Ongoing administration ..... 108
Appendix B: Technical Solutions for Common Challenges ..... 111
  Crawling and indexing content ..... 111
  Security and serving secure content ..... 116
  Document relevancy ..... 117
  Interfaces and front end customization ..... 118
  Other areas ..... 118
Appendix C: Enterprise Search Satisfaction Survey ..... 121
Appendix D: Other Resources ..... 125


Chapter 1: Introduction

Welcome to the Google Search Appliance

The Google Search Appliance is a full-featured enterprise search solution that brings Google's award-winning search technology to the enterprise. The Google Search Appliance provides high levels of search relevancy, scalability, and redundancy to meet the ever-growing, mission-critical information access demands of any organization. Unlike many enterprise applications, the Google Search Appliance is designed to be self-sufficient: hardware, software, networking, storage, and security support are built in, and can be easily supplemented with additional capabilities. This document outlines several considerations for successfully deploying Google Search Appliances to meet the document capacity, scalability, and redundancy needs of an enterprise.

Right for your business

The Google Search Appliance delivers the same powerful search algorithms as Google.com from a self-contained appliance. Your users will get the same great relevance and experience searching your company's information in the office as they get searching the web at home.

Great value

Because the Google Search Appliance is self-contained, it delivers core search capabilities out of the box with no additional hardware required. However, you can supplement the search appliance with off-box capabilities to deliver universal search at a compelling price. Ongoing operating costs are lower because the effort to administer and maintain the search solution is substantially reduced, which delivers powerful, intuitive search at a low Total Cost of Ownership (TCO).


Easy integration

The Google Search Appliance seamlessly integrates with existing information technology (IT) infrastructures through industry standards and best practices. Custom integration can be delivered through open standards, such as Security Assertion Markup Language (SAML) for Single Sign-On (SSO) and heterogeneous security, and well-documented, standard Application Programming Interfaces (APIs).

Constant innovation

Innovation is the hallmark of Google Enterprise. The Google Search Appliance takes advantage of the innovations tested on Google.com and proven by hundreds of millions of users worldwide. In addition to regular software releases, you can add innovations to a search solution from Google Enterprise Labs or by harnessing the power of Google's cloud capabilities to deliver core search capability.

Continuous increase in ROI

The Google Search Appliance delivers immediate return on investment (ROI), increasing rapidly with short deployment cycles. The Google Search Appliance's flexible architecture and open technologies enable you to deploy it rapidly. Once deployed, the search appliance offers increased value by unlocking more of the value in your business's information assets through continuous innovation, incorporation of additional content, and rich user functionality.

About this guide

This guide provides an overview of deploying a search solution using the enterprise-class Google Search Appliance. The focus of this guide is on best practices and proven approaches to architecture and deployment methodologies. This guide assumes basic knowledge of the Google Search Appliance. However, this guide is not a technical how-to document. For in-depth information, visit Google's rich and comprehensive public search appliance documentation at http://code.google.com/apis/searchappliance/documentation/index.html.

A search solution can be deployed as a traditional monolithic project or by using agile, even extreme, project methodologies. Whatever the project methodology, there are guiding principles that have been used in most successful search implementations. This document discusses these guiding principles, giving you the information you need to plan your deployment with the right phases or micro-phases.


What's in this guide

This guide focuses on deployment best practices. There are several components to this, including:

- Approaches for planning and executing a Google Search Appliance deployment
- Architectural best practices
- Techniques to increase adoption and user satisfaction

In this guide, you can also find comprehensive information about the following topics:

- The foundations of a successful deployment
- Ensuring your deployment is optimized for support
- Designing an architecture to meet your technical and business requirements
- How to plan deployment phases to achieve quick wins, while delivering ongoing value
- Supplementing the search appliance with enriching technologies
- Enabling core features to maximize value

Who this guide is for

This guide is primarily for IT administrators and project managers who plan and manage a deployment of the Google Search Appliance, as well as certified Google Enterprise partners who assist customers in deploying their search appliances. This guide also provides useful information for other technical and managerial personnel who are involved in making decisions about IT infrastructure for your company.

How to use this guide

Use this guide as a starting point to help plan and manage your Google Search Appliance deployment. The concepts, instructions, and advice in this guide are intended to provide general information only. Because organizations have a wide variety of IT infrastructures, the methods you ultimately use to set up and manage your search deployment might differ from those described in this guide.

Although Google recommends that you read this entire guide, you don't have to. Depending on your organization's infrastructure, your goals, and your own experience, you can use this guide as a reference and read just the sections that are applicable to you.

Google recommends implementing a search solution with the support of a Google Enterprise partner. Your Google representative will be able to recommend one, or you can find one yourself in the Google Solutions Market Place at http://www.google.com/enterprise/marketplace/. Partners may use their own methodologies or enhance the contents of this guide based on their experiences in the field.


Resources that complement this guide

For a detailed list of the resources that this guide refers to, see "Other Resources" on page 125.

Where to find the latest version of this guide

Google continually enhances its products and services, so the content of this guide will change from time to time. To ensure you have the most up-to-date version of this guide, visit www.learngsa.com.

How to provide comments about this guide

Google values your feedback. If you have comments about this guide or suggestions for its improvement, please send an email message to: [email protected]

In your message, be sure to tell us the specific section to which your comment applies. Thanks!

Disclaimer for Third-Party Product Configurations

Parts of this guide describe how Google products work with diverse customer environments and configurations that Google recommends. These guidelines are designed to work with common environments and deployment scenarios, and should be adapted to your environment. Any changes to your environment, including installation of the Google Search Appliance and related technologies, should be made in conjunction with the oversight and approval of your IT teams.

Google does not provide technical support for configuring servers or other third-party products outside of the Google Search Appliance, nor does Google support solution design activities. In the event of a non-Google Search Appliance issue, you should contact your IT systems administrator. GOOGLE ACCEPTS NO RESPONSIBILITY FOR THIRD-PARTY PRODUCTS. Please consult a product's web site for the latest configuration and support information. You might also contact Google Solutions Providers for consulting services and options.


Chapter 2: Understanding Your Deployment

To make the most of your search deployment, you need to understand how users in your organization will use search. You also need to understand the content and processes that will benefit from search, and the architecture that will support it. This chapter presents issues and questions that will help you in:

- Understanding your users, as described on page 10
- Understanding your content, as described on page 11
- Understanding your business processes, as described on page 12
- Understanding your architecture, as described on page 12

The information that you gather as you address the issues listed in this chapter helps you to define your deployment architecture and project plan. For a simple deployment, you might gather information in a single meeting. For more complex deployments, you might use a series of workshops and surveys.


Understanding your users

The success of your deployment hinges on how much your users use the search solution and how effectively they do so. The Google Search Appliance delivers powerful search capabilities out of the box, including a search experience that the vast majority of your users are already familiar with from Google.com. However, you can substantially enhance the user appeal and overall richness of the search experience by understanding your users and what they will be trying to do with search.

To understand your users and their search needs, consider the following questions.

- How many users do you have and where are they?
  - Are users internal, external, or both?
- What will your users be using the search appliance for?
  - It's not just a search capability: what benefit will your users get?
  - What does the search experience need to provide for users to regard it as successful?
- Are there different groups of users?
  - Do they require specific search capabilities?
  - Do different user communities need different content or a different search experience?
  - How important is search speed to each group?
  - What is the relative sophistication level of users in each group?
- How will the users typically access search?
  - Through a portal?
  - Through a dedicated search page?
  - Integrated with Google Desktop search so that localized content and enterprise content are brought together?
- How can advanced search capabilities facilitate your users' daily tasks?
  - Do you need mobile functionality?
  - Do you need to consider resource logistics?


Understanding your content

You have set a goal of delivering more relevant information to your users' desktops by using the Google Search Appliance. To achieve this goal, you need to identify what content your users will need. As you identify content, consider a variety of sources, and remember that not everything needs to be included on day one. As part of this activity, get an understanding of what your index capacity needs will be. For information about this topic, see "Sizing the index" on page 45.

To understand your content, consider the following questions.

What are your content sources?

Typical content sources that are often incorporated into a search deployment include:

- Intranet sites
- Your company website(s)
- File systems and shared drives
- Content Management Systems (CMS), such as Documentum
- Record/Document Management Systems (RMS/DMS)
- Portals or collaboration sites, such as SharePoint
- Archives
- Databases
- Line Of Business (LOB) applications
- Other structured data

What are the details about each content source?

For each content source, identify:

- How the content can be accessed
- Roughly how many documents it contains
- Whether the content is:
  - Structured, for example, customer records
  - Unstructured, for example, a Word document
  - Both, for example, a customer letter (unstructured) in an RMS (structured)
- Whether the content is secured
- How content is secured
- Who uses it (or who you want to use it)
- How important it is
- How frequently it changes
- What kind of publishing process (if any) governs content revisions
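One way to keep the answers to these questions comparable across sources is to record them in a simple inventory. The following is a minimal sketch only: the class, its field names, and the sample figures are hypothetical and are not part of the Google Search Appliance or its documentation. It shows how a rough document total, useful as a first input to index sizing, and a list of secured sources fall out of such an inventory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentSource:
    """One row of a content inventory; field names are illustrative."""
    name: str
    access_method: str        # how the content can be accessed, e.g. "HTTP" or "SMB"
    approx_documents: int     # rough document count
    structure: str            # "structured", "unstructured", or "both"
    secured: bool             # whether the content is secured
    security_mechanism: str   # how it is secured; empty if unsecured
    audience: str             # who uses it, or who you want to use it
    importance: str           # e.g. "low", "medium", "high"
    change_frequency: str     # e.g. "daily", "weekly"
    publishing_process: str   # process governing revisions; empty if none


def total_documents(sources: List[ContentSource]) -> int:
    """Sum the rough document counts as a first input to index sizing."""
    return sum(source.approx_documents for source in sources)


def secured_sources(sources: List[ContentSource]) -> List[ContentSource]:
    """Return the sources that will need security planning."""
    return [source for source in sources if source.secured]


# Hypothetical example data, not taken from the guide.
inventory = [
    ContentSource("Intranet sites", "HTTP", 200_000, "unstructured",
                  False, "", "all staff", "high", "daily", ""),
    ContentSource("Shared drives", "SMB", 350_000, "unstructured",
                  True, "NTLM", "all staff", "medium", "weekly", ""),
]

print(total_documents(inventory))
print([source.name for source in secured_sources(inventory)])
```

A spreadsheet serves the same purpose; the point is that every source is described by the same attributes before phasing decisions are made.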


Understanding your business processes

Business processes rarely exist in isolation. As you think about all the information that you will be putting at your users' fingertips, identify the processes where being able to find rich sets of information will enhance or streamline the work, and how these processes relate to one another. For example, think how much faster a call center employee could answer questions about a refund policy for a purchased product if she could simply search the policy database and bring up the purchase order in the same rich search window.

In many cases, you might discover that processes also produce information that you want to make available through search. Also, it may be valuable to have visibility over in-flight business processes, such as being able to search currently open cases in a support queue. So you might want to enable the search appliance to crawl this information or otherwise integrate it with the search appliance.

Understanding your architecture

Just as your processes work together to orchestrate the business of getting things done, your IT architecture components work together to deliver all of the various solutions. The success of a search implementation project also requires a thorough understanding of how it will interrelate with the other systems in your IT ecosystem. Think about your physical network design, and where the content is located, both geographically and from a network design perspective. Also, think about your requirements for security. Security architecture is particularly important for internal deployments of the search appliance, and requires planning.


To understand your architecture, consider the following questions.

- What are your physical systems?
  - Are the content systems located on fast Ethernet switches?
  - What are the peak usage times for each content system (daily, weekly, monthly, and/or quarterly)?
  - Will the search appliance be located on a part of the network that requires access through a firewall or proxy to get to the content?
- What is the security infrastructure surrounding your content?
  - Do you have a single security mechanism for all content, or do you have a heterogeneous authentication/authorization environment?
  - Will users require several identities/passwords to access all protected content, or is there a single sign-on solution in place?
  - Do you have Active Directory (AD)? What version?
  - Is Active Directory installed in Native Mode or Mixed Mode?
  - Do you have NTLM v2?


Chapter 3: Planning for Successful Deployment

A successful search solution is conceptually very simple: help users find the information they are looking for. Make search fast, make it easy, and make it relevant. The Google Search Appliance takes care of the speed, ease, and relevance. But you need to plan and execute the project to take full advantage of the power of the search appliance. Key to this approach is remaining focused on short delivery cycles and structuring work around them.

Every deployment of a Google search solution is unique. You might be providing search across SharePoint content and extending core search with purchase orders from SAP. Or you might be providing search of the hundreds of thousands of documents that businesses tend to accumulate over time, bringing them together with policy documents and the contact details of the people who wrote them. Although each deployment has different content sources, security requirements, and user needs, there are core planning activities with fundamental guiding principles that apply to all search deployments.

This chapter focuses on the following core planning activities:

- Capturing requirements, described on page 16
- Identifying phases, described on page 22
- Defining success criteria, described on page 28
- Transitioning to business as usual, described on page 28

For scenario-based example deployment programs, see Chapter 4, Project Scenarios.


Capturing requirements

As you capture requirements, group them into related sets that you can prioritize and align with phases of work. In general, focus on the following areas:

- User requirements, described in the following section
- Content and security requirements, described on page 18
- Performance and scalability requirements, described on page 20
- Administration and reporting requirements, described on page 22

User requirements

Understand what is important to make the deployment successful from the user perspective. In general, user requirements focus on:

- Usability, described in the following section
- Breadth and depth, described on page 16
- Communication and feedback, described on page 17

Usability

For users, search should not be a chore. Defining usability requirements can help ensure that users find your search solution intuitive and effective. As you identify usability requirements, consider the following issues:

- What are the usability features that really make the search solution resonate with users?
  The Google Search Appliance offers many simple-to-implement, on-box usability features, such as Query Suggestions. For more information about this and other usability features, see "Enrichment features" on page 99.
- What speed requirements do users have?
  AJAX-style technologies can dramatically enhance perceived responsiveness and performance, while providing a richer search experience.

In general, meet usability requirements as early in the release cycle as possible, because these are not typically tied to content sources and they can get users excited about the search solution.

Breadth and depth

A search solution needs to meet the demands of your user community. Defining breadth and depth requirements helps ensure that you have covered a wide enough user group while providing the right content for them. As you identify breadth and depth requirements, consider the following issues:


- What do the user groups look like?
  Where possible, the largest groups and the users experiencing the most frustration today should be brought on first. Using search appliance front ends, you can present a different look and feel and different content to various users, based on their needs. For information about front ends, see "Using the search appliance's front ends" on page 97.
- What are they trying to find now, but are frustrated that they can't?
  This is the content that should be in early phases. User onboarding should be aligned with the inclusion of the content they are looking for. That is, try not to give the users a search capability before including the content they will be looking for.
- Do some users have more sophisticated search needs?
  Out-of-the-box advanced search can easily be augmented through rich use of metadata and other core functionality. Where this is not required, keep search simple, but functionally rich.

Communication and feedback

User feedback is an effective tool for identifying usability issues. When you solicit feedback, you let users know that their opinions about the search deployment are important. Defining communication and feedback requirements ensures that you give users the ability to provide input to the implementation team on the search deployment: what is working for them and what is not. As you identify communication and feedback requirements, consider the following issues:

- What do you need to communicate to users?
  When you add new content and exciting new features, it's important to tell your users about them to keep them excited about the product, and to get kudos on your successes. Because most of your users already know how to use Google search technology, training needs typically are minimal, but make sure your users know that they can now search enterprise content with the same ease as they search the internet at home.
- How will you get feedback from users?
  User feedback is one of the best measures of success. Consider conducting periodic surveys with user groups. See the sample search satisfaction survey on page 121. Also consider providing a feedback link for users.


Content and security requirements

For most organizations, the following two aspects of a search deployment typically go hand-in-hand:

- Content, described in the following section
- Security, described on page 19

Scenarios that encompass content and security can range in complexity from completely unsecured public website pages to complex integration with an Enterprise Resource Planning (ERP) system such as SAP or PeopleSoft, and everything in between. Plan your end-state architecture in the early phases, but also phase in both content and security. In other words, don't delay delivering a great search experience to your users because you want to index every last scrap of content or implement a security framework they won't need until later.

Content

In general, analyze all potential repositories of organizational information. Although the Google Search Appliance excels at providing powerful, fast, and relevant search across unstructured content, you should not exclude structured content, such as your data warehouse, transactional systems, and so on. It is important to understand how content sources relate to each other, as this will help you define how to phase deployment of content. For example, content from a case management system may be supplemented effectively with content from a product catalog, enabling users to see not only product information, but also the types of problems and issues that users encounter when using the products.

The following table lists various types of structured and unstructured content sources and considerations that can help you define how to phase their deployment.

Content source                               Structured/Unstructured  Complexity (L/M/H)  Consideration
File systems                                 U                        L                   SMB or HTTP
Public websites                              U                        L                   HTTP
Databases                                    S                        L                   Use database crawl or a feed, or web enable
Intranet websites                            U                        L                   Might need to consider security
Staff portal                                 U                        M-H                 Might need to consider security complexities; might need to account for non-unique URLs (the same URL containing different content, based on user role)
CMS                                          U                        L-M                 Might have metadata to leverage; need to determine if it can be crawled natively; might need to consider security
LOB applications (for example, Lotus Notes)  S                        L-H                 If web enabled, might be able to crawl these content sources; might require a connector; security
Enterprise applications (for example, ERPs)  S                        M-H                 Need to identify core data; might use feeds or a connector; security
Other transactional systems                  S                        L-H                 Security; might be accessed by custom connector or OneBox module
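The complexity ratings in the table above can serve as one input when deciding which sources to bring in first. The sketch below is an illustration under stated assumptions, not a method prescribed by this guide: the ranking function and the rule of ordering by worst-case complexity are hypothetical, and a real phasing decision should also weigh user demand and how content sources relate to each other.

```python
# Ratings copied from the table; the numeric ranks are an assumption.
COMPLEXITY_RANK = {"L": 1, "M": 2, "H": 3}

SOURCES = [
    ("File systems", "L"),
    ("Public websites", "L"),
    ("Databases", "L"),
    ("Intranet websites", "L"),
    ("Staff portal", "M-H"),
    ("CMS", "L-M"),
    ("LOB applications", "L-H"),
    ("Enterprise applications", "M-H"),
    ("Other transactional systems", "L-H"),
]


def best_case(rating: str) -> int:
    """Rank of the lower end of a rating such as "L" or "M-H"."""
    return COMPLEXITY_RANK[rating.split("-")[0]]


def worst_case(rating: str) -> int:
    """Rank of the upper end of a rating such as "L" or "M-H"."""
    return COMPLEXITY_RANK[rating.split("-")[-1]]


def phase_order(sources):
    """Order sources from least to most complex.

    Sorts by worst-case rank first, then best-case rank. Python's sort is
    stable, so sources with equal ratings keep their original order.
    """
    return sorted(sources, key=lambda s: (worst_case(s[1]), best_case(s[1])))


for name, rating in phase_order(SOURCES):
    print(f"{rating:<4} {name}")
```

Ordering by the worst case is a deliberately cautious choice: a source rated L-H is scheduled as if it might turn out to be high complexity until analysis shows otherwise.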

SecuritySecurity can be the area of greatest complexity in a search deployment. As you analyze content, understand if it is secured, and if so, how it is secured (forms-protected, cookies, protected by application-level security, and so on). In a search solution, security has two main areas of impact: Crawling and content acquisition Serving and user authorization

For comprehensive information about the search appliance and security, see Managing Search for Controlled-Access Content at http://code.google.com/apis/searchappliance/documentation/60/secure_search/secure_search_overview.html.

Crawling and content acquisition

The Google Search Appliance can make use of standard security protocols, such as NTLM or forms-based security. Understanding all the security permutations will help you plan for content acquisition. For example, security might have an impact on web and file system crawl that you need to plan for, such as configuring a proxy or ensuring your Windows file systems have CIFS enabled to support SMB crawling. More complex security might require alternative means of content acquisition, such as feeds or connectors.


You might find that you need to make small adjustments to your environment that enable the search appliance to crawl and acquire content without needing to use feeds or connectors. For example, you might find that to extend the crawl to a new subdomain, you need to modify a cookie domain so that cookies conform to Request for Comments (RFC) specifications when the search appliance crawls content. These types of changes are typically small and can be implemented through a variety of methods.

Occasionally, security considerations require making adjustments to the indexing procedure or using an alternate content acquisition approach. These circumstances might affect aspects of the solution design that are not directly related to security architecture, such as feeds and publishing workflows. For more information, see Crawling secure content on page 57.

Serving and user authorization

When serving secured content, the Google Search Appliance first checks that the user is entitled to see relevant results. If the user is not entitled to view a document, it does not appear in the result set. Of course, you can always choose to make results public and apply no security at serve time. In many cases, search can initially be deployed unsecured, with security added as more content is acquired. Public search (such as an externally facing internet site) is typically deployed this way.

In general, deployments with heterogeneous security requirements can be satisfied by using the SAML Service Provider Interface (SPI), described in SAML SPI on page 58. The SAML SPI is responsible for managing authentication and authorization checks across diverse systems and protocols. This capability provides great flexibility in how security will be implemented. You can:

- Purchase pre-built providers (see the Google Enterprise Solution Marketplace for examples)
- Build a custom solution
- Use one of many SSO providers, so long as they support SAML

With the release of version 6.0, the Google Search Appliance also supports definition of policy access control lists (ACL), so that authorization checks can be performed against documents using early binding. Policy ACLs not only enhance performance, but give you more options for managing security. This new capability also gives you options to phase your secure search deployment. For information about policy ACLs, see Access control list caching on page 59. For more information about secure serve, see Serving secure content on page 57.
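The difference between early binding (checking a stored policy ACL) and late binding (asking the content system at serve time) can be illustrated with a minimal sketch. This is not the search appliance's implementation; the `PolicyAcl` structure, both function names, and the `is_authorized` callback are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical policy ACL: document URL -> principals (users or groups) allowed to view it.
PolicyAcl = Dict[str, Set[str]]


def filter_early_binding(urls: Iterable[str], user: str, groups: Set[str],
                         acl: PolicyAcl) -> List[str]:
    """Keep results whose stored ACL names the user or one of the user's groups.

    Documents with no ACL entry are dropped (default deny, an assumption of this sketch).
    """
    principals = {user} | groups
    return [url for url in urls if acl.get(url, set()) & principals]


def filter_late_binding(urls: Iterable[str], user: str,
                        is_authorized: Callable[[str, str], bool]) -> List[str]:
    """Keep results that the content system confirms at serve time, one check per result."""
    return [url for url in urls if is_authorized(user, url)]
```

The early-binding filter needs no call to the content system at serve time, which is why policy ACLs can improve performance; the late-binding filter pays one authorization check per candidate result.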

Performance and scalability requirements

Non-Functional Requirements (NFRs) are typically pure technical requirements. In a search solution, the most common NFRs are:

- Performance, described on page 21
- Scalability, described on page 21


Performance

Performance requirements typically revolve around how fast the solution returns results, though there may also be requirements around speed of content acquisition. Performance is typically dependent on a number of factors, including:

- Security requirements
- Content type
- Corpus size
- The type of queries being executed
- Network architecture and performance
- Additional search functions used (for example, query expansion or metadata filtering)

As a rule, if there are specific performance requirements, you should conduct a performance test early in the deployment to determine changes that may need to be made to the solution architecture. Although the Google Search Appliance itself cannot be modified, changes you can incorporate into your planned deployment include:

- Configuring policy ACLs to improve serve-time security checking.
- Deploying a reverse proxy to cache where possible for common searches. This change is beneficial only for public (non-secured) content searches.
- Minimizing network traffic between the Google Search Appliance and content sources. Although this change mostly has an impact on crawl, reduced latency will improve performance of late-binding authorization.
- Improving perceived performance through responsiveness optimizations (for example, using AJAX to display a progress spinner that creates the perception of responsiveness).
- Deploying additional search appliances to spread the load. This change reduces the demand on any single search appliance and helps ensure that capacity is not a constraining factor.

See Architecting for scale and performance on page 48 for further discussion of performance-driven search architecture.

Performance requirements should also take crawling and indexing into consideration. Search appliance indexing adds load to your content systems. If there are specific times of the day in which the content systems must not be affected, then you need to understand this so that you can configure search appliance host load schedules accordingly. Furthermore, if the content system is sufficiently strained, or is particularly slow, you might consider content feeds as an alternative.
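When an early performance test produces a list of measured response times, a percentile is usually a more useful summary than an average. The following is a minimal sketch, assuming you have already collected latencies in seconds; the function names and the nearest-rank method are choices made for this illustration, not something the guide prescribes.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct percent of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(latencies_s: Sequence[float], pct: float, target_s: float) -> bool:
    """True if the chosen percentile of measured latencies is within the target."""
    return percentile(latencies_s, pct) <= target_s
```

For example, `meets_target(latencies, 90, 1.0)` checks a hypothetical requirement that 90% of searches return within one second.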

Scalability

Scalability requirements typically revolve around number of queries per second (QPS) or queries per minute (QPM). As with performance, the QPS that the solution supports depends on the security requirements, content type, query type, network performance, and a host of other factors.


Google recommends that where scalability requirements exist, you first re-ratify the requirements, ideally with metrics derived from current searches. In many cases, the scalability requirement is not as high as first stated. While search solutions can be designed to support hundreds of queries per second, in practice, this is not usually required. The kind of scalability requirements needed from a search solution are substantially different from those of a transactional system.

For more details about designing a search solution for increased scalability, see Architecting for scale and performance on page 48.

For information about the number of concurrent connections that the Google Search Appliance can accept, see Designing a Search Solution at http://code.google.com/apis/searchappliance/documentation/52/troubleshooting/Designing_Search_Solution.html#Queueing.
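One way to re-ratify a stated QPS requirement is to measure the peak rate in logs from the existing search tool. The sketch below assumes you can extract one timestamp (in seconds) per query from those logs; the function names are hypothetical and the log parsing itself is left out.

```python
import math
from collections import Counter
from typing import Iterable


def peak_qps(timestamps_s: Iterable[float]) -> int:
    """Highest number of queries that fall within any single one-second bucket."""
    buckets = Counter(math.floor(t) for t in timestamps_s)
    return max(buckets.values(), default=0)


def peak_qpm(timestamps_s: Iterable[float]) -> int:
    """Highest number of queries that fall within any single one-minute bucket."""
    buckets = Counter(math.floor(t / 60) for t in timestamps_s)
    return max(buckets.values(), default=0)
```

Comparing the measured peak against the requirement as first stated often shows how much headroom is really needed.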

Administration and reporting requirements

Reporting is an important part of any enterprise application, and search is no exception. In addition to the search reporting requirements described in Using reports to enhance the search experience on page 91, identify other reporting requirements. In particular, pay attention to Non-Functional Requirements (NFRs). Requirements to consider include:

- The analytical technology to be used (for example, Google Analytics, Advanced Search Reporting, or some other third-party tool)
- Reporting frequency and distribution
- Other reporting types that may be required (for example, administration events)

Make sure that you understand the business processes that will use these reports. For example, you should understand the use cases for your reporting requirements and make sure that the reporting strategy will deliver on them.

Identifying phases

Most search deployments fall into one of the following categories, listed from simplest to most complex:

- Specialized deployments focused on delivering a familiar, powerful search experience to customers of an organization's public or secured externally-facing information.
- Stand-alone search deployments focused on providing general productivity gains to enterprises and making better use of information assets.
- Search deployments driven by a compelling event or larger deployment, such as implementing a new portal, delivering an Enterprise Content Management (ECM) system, or launching a new Information Architecture project.

A search deployment typically targets quick wins to deliver a rich search experience to users rapidly, with incremental, iterative delivery of additional value over the life of the search deployment.


Business value is derived from the breadth of content over which the search capability is delivered and the usability and effectiveness of the search experience, as illustrated in the following figure.

Figure: Deployment phases

The key to successful search deployments is to deliver early and deliver often. Don't try to do everything at once. Your users will benefit from getting access to the content they want as early as possible. Delivering early means quick wins that can help drive support with your stakeholders and generate excitement and visibility with your users. Phase scope could be defined in terms of:

- Content sources
- Security
- User groups
- Usability features

Each phase should include an evaluation task, where you explicitly evaluate user satisfaction and feature requests. As always, evaluate feature requests, including the risks associated with implementing, and not implementing, them. In general, since each phase is of relatively short duration, you can use most delivery methodologies, ranging from Agile to Life Cycle.


If you are using a more classical variety of development methodology, keep in mind that the development phases of a Google search project are relatively short. In this case, you need to make adjustments so you can effectively deliver a quality search experience in a flexible manner. Many of the technologies that you will use (Extensible Stylesheet Language Transformations (XSLT) stylesheets, OneBox modules, and so on) can be quickly implemented and rapidly adjusted. You need to have flexibility in your approach to prototype rapidly and iterate on deliverables.

This section discusses how you can structure your deliverables and project plans to broaden the search footprint and increase use of your search solution. Each delivery moves your deployment further along the value curve.

Where to start

The Google Search Appliance is designed to be rapidly deployed over core content sources. Leveraging open standards and protocols allows rapid integration of content from a variety of sources and implementation of rich usability features, such as Search-as-you-Type, user-added results, and dynamic results clusters.

Phases can be as short as a week or two or as long as a month. Google recommends that you structure your program of work to aim for shorter phases, with rapid delivery of iterative functionality, content, or user groups. In many cases, a single rapid delivery phase is all that is required. However, even when your deployment is part of a longer running, comprehensive program of work delivering universal search across all your enterprise assets, you should still structure your phases to deliver quick wins.

Before you commence your search deployment, complete the following core tasks, so that your search deployment specialist can get your search appliance up and running as quickly as possible.

Before you start

- Rack the search appliance.
- Configure network settings.
- Inventory your content sources (including document count).
- Inventory your security systems.
- Configure your network to allow the search appliance access to all content sources, and if required, restrict access to secure areas.
- Create any user IDs needed by the search appliance to crawl content.


Phasing your activities

You might consider phasing your activities as described in the following sections:

- Early development on page 25
- Incremental releases on page 26
- Advanced delivery on page 27

Early development

Delivery items listed in the following table are typically relatively quick and easy to deliver. Consider them as candidates for early development. Many of these could be considered mandatory: a custom front end, for example, no matter how simple, should always be a part of the core delivery.

Candidates for early development

Delivery item | Type | Complexity
Basic HTTP crawl: intranet, extranet, website, wiki, web-enabled knowledge bases (for example, Lotus Notes) | Content sources | Low
File system crawl: SMB crawl of shared drives, HTTP crawl of HTTP-enabled drives | Content sources | Low
SharePoint sites | Content sources | Low to medium
Basic OneBox modules (for example, PeopleFinder) | Content sources | Low to medium
Lightweight Directory Access Protocol (LDAP) authentication | Security | Low
Kerberos integration | Security | Medium
Query suggestions/Search-as-you-Type | Usability | Low to medium*
User-added results | Usability | Low
Custom front end | Usability | Low to medium
Advanced search reporting | Usability | Low
Primary system users | User groups | Low
Business owners | User groups | Low

*Depending on whether you use out-of-the-box features or a customized implementation. Complexity may vary depending on your infrastructure and environmental configuration.

Incremental releases

Delivery items listed in the following table are candidates for incremental release. Consider these items and schedule their deployment according to priority (typically based on volume of content and business criticality) and level of effort. In many cases, you can accelerate delivery by using third-party tools (such as connectors) and certified Google Enterprise partners, who are experienced in Google Search Appliance integration issues. Some of these delivery items (for example, customized advanced search) might require some user feedback before full implementation.

Candidates for incremental releases

Delivery item | Type | Complexity
Portal content | Content source | Medium to high
Non-web-enabled knowledge bases (for example, many Lotus Notes databases) | Content source | Medium
Content Management Systems | Content source | Low to high
Custom OneBox modules (may be secured) | Content source | Low to high
Custom application content | Content source | Low to high
Additional connectors (FileNet, Livelink, Documentum) | Content source | Low to medium
Customized advanced search | Usability | Low
Advanced usability features (for example, AJAX-driven user interface) | Usability | Low to medium
Cross-language translation | Usability | Low to medium
Additional users dependent on new content | User groups | Low to medium


Advanced delivery

Delivery items listed in the following table are candidates for advanced delivery. Advanced delivery candidates might require more time or effort to implement, or they might not be required at all. If these items are part of your search deployment, you can implement them in parallel with other deployment tasks. This way, you can get users up and running with core content immediately. In some cases, items are structured data sources that require analysis before understanding how best to integrate them into the search experience (for example, Business Intelligence platforms).

Candidates for advanced delivery

Delivery item | Type | Complexity
Advanced security (including Policy ACLs) | Security | Medium to high
SAML SPI (single sign-on) provider | Security | Medium to high
Record management systems | Content sources | Medium to high
ERP systems (SAP, Oracle, PeopleSoft) | Content sources | Medium to high
CRM systems (for example, Siebel) | Content sources | Medium to high
Data warehousing/BI platforms | Content sources | Medium to high
Other Line of Business systems | Content sources | Medium to high

How long should phases be?

In general, phases should last anywhere from a few days to a few weeks. Although work efforts vary and require specific estimates, the duration to complete tasks can be derived from complexity, as listed in the following table.

Complexity | Duration
Low | 2-8 hours
Medium | 1-5 days
High | 1-4 weeks

The times in this table are guidelines only and will vary, based on your environment and requirements. Google recommends that you perform an analysis to determine the work effort specific to your deployment. In addition to the work effort, you need to allow enough time to acquire content. Strive for having as much content in the index as possible from targeted content sources. This is not to say that you should wait until you get every possible content source into your search solution, but rather that you should have in the index all the content from the systems you are incorporating in the current release.
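As a rough illustration of deriving a phase estimate from the complexity guidelines, the sketch below sums the low and high ends of each item's range. It assumes 8-hour working days and 5-day working weeks to convert everything to hours; those conversions, the `HOURS_BY_COMPLEXITY` name, and the function name are assumptions of this sketch, and the result is only as good as the guideline figures themselves.

```python
from typing import Dict, Tuple

# Guideline ranges converted to working hours (assumes 8-hour days, 5-day weeks).
HOURS_BY_COMPLEXITY: Dict[str, Tuple[int, int]] = {
    "low": (2, 8),        # 2-8 hours
    "medium": (8, 40),    # 1-5 days
    "high": (40, 160),    # 1-4 weeks
}


def estimate_hours(items: Dict[str, str]) -> Tuple[int, int]:
    """Return (best case, worst case) total hours for delivery items keyed by name."""
    low = sum(HOURS_BY_COMPLEXITY[complexity][0] for complexity in items.values())
    high = sum(HOURS_BY_COMPLEXITY[complexity][1] for complexity in items.values())
    return low, high
```

For a phase containing a basic HTTP crawl (low), LDAP authentication (low), and Kerberos integration (medium), this gives a range of 12 to 56 working hours, before allowing time for content acquisition.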


It is challenging to predict how quickly the Google Search Appliance will acquire content, as the rate of acquisition is dependent on a number of factors, including:

- Network performance
- Server performance
- Host load
- Content type

Google recommends running some tests early in the project life cycle to determine content acquisition speed. Use this information to help you plan accordingly.
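A test crawl can be turned into a planning figure with simple extrapolation. The sketch below assumes the acquisition rate observed in the test holds for the rest of the content, which may not be true if host load settings or content types differ; the function name and parameters are hypothetical.

```python
def estimate_crawl_hours(total_docs: int, sample_docs: int, sample_hours: float) -> float:
    """Extrapolate total acquisition time from a test crawl, assuming a constant rate."""
    if sample_docs <= 0 or sample_hours <= 0:
        raise ValueError("the test crawl must cover at least one document and some time")
    docs_per_hour = sample_docs / sample_hours
    return total_docs / docs_per_hour
```

For example, if a test crawl acquired 25,000 documents in 2 hours, a source holding 500,000 documents would need roughly 40 hours at the same rate.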

Defining success criteria

Before you commence your project delivery, define your success criteria, so that you have a clearly defined set of acceptance criteria. Typical success criteria for search deployments include:

- User-executed assessments of relevance (for example, user ratings)
- Security tests (authentication and authorization are working for all secured systems)
- Breadth of content (volume of content in the index; for example, 95% of content in a system)
- Breadth of roll out (percentage of users activated)
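Measurable criteria such as breadth of content and breadth of roll out can be recorded as targets and checked at the end of each phase. This is a minimal sketch; the `Criterion` class, its field names, and the example targets are hypothetical and should be replaced with the criteria your stakeholders agree on.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Criterion:
    name: str
    measured: float  # measured value, as a fraction between 0 and 1
    target: float    # agreed minimum, as a fraction between 0 and 1

    def passed(self) -> bool:
        return self.measured >= self.target


def failed_criteria(criteria: Iterable[Criterion]) -> List[str]:
    """Names of the criteria whose measured value is below the agreed target."""
    return [c.name for c in criteria if not c.passed()]
```

An empty list from `failed_criteria` means every recorded criterion met its target.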

Transitioning to business as usual

Business As Usual (BAU) tasks are the regular tasks that are carried out to operationalize the delivered solution. Unlike other search solutions, the Google Search Appliance does not require constant adjusting and tuning of the algorithm, nor a dedicated team to do so. However, as with any enterprise solution, there are some tasks that should be carried out regularly. These are discussed in Post Deployment on page 83. You need to plan your resourcing to manage these tasks, as the operational team who will be responsible for BAU may not be the same as the team who deployed.

Perform tasks in preparation for transition to BAU as described in the following sections:

- Document standard operating procedures on page 29
- Document your support arrangements on page 29
- Prepare your analytic solution on page 29
- Configure Monitoring on page 30
- Export and back up your configurations on page 30
- Transition user enrichments on page 30


Document standard operating procedures

When you deploy search to production, you introduce some new operational processes. Document these processes and transition them to the BAU team. Typical processes include:

- Troubleshooting your Google Search Appliance and environment
- Raising a support ticket (see https://support.google.com, password required)
- Executing an emergency failover to a hot standby
- Creating and managing administrator and manager roles on the search appliance
- Managing KeyMatches, related queries, and query expansion synonyms
- Any processes around additional technologies (for example, OneBox modules, SAML providers, and so on)
- Migrating code assets and configurations from your development environment to your production environment

Document your support arrangements

Make sure your BAU team knows the support arrangements for your Google Search Appliance. When contacting Google Enterprise Support, the BAU team should have the following information available:

- Your search appliance IDs
- Your login details to https://support.google.com
- Remote access details for chosen methods (SSH configuration and routing, support call, and so on)
- License information
- Google/Partner support contact information. In many cases your Google Enterprise partner will be a very effective contact for resolving challenges.

This preparation allows for efficient use of Google Enterprise Support, should you need it. For details, see Optimizing support on page 89.

Prepare your analytic solution

Configure advanced search reporting or another analytical engine, so that user searches and behavior can be reported on and analyzed. The analysis can be used to enrich and enhance the search experience. You can also output logs to a syslog server to leverage third-party log processing tools that you might already have in use.


For information about advanced search reporting, see Gathering Information about the Search Experience at http://code.google.com/apis/searchappliance/documentation/60/admin_searchexp/ce_improving_search.html#gather.

Configure Monitoring

Establish a method for monitoring your Google Search Appliance. You can use SNMP, or some of the monitoring tools discussed in Designing a Search Solution, at http://code.google.com/apis/searchappliance/documentation/60/troubleshooting/Designing_Search_Solution.html#Monitoring. You could also monitor your search appliance by using a custom solution. Anything that allows you to monitor your search appliance actively will give you additional confidence and stability in your deployment, and will allow you to identify problems early.
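A custom monitoring solution can be as simple as a scheduled probe that runs a known test query and checks that a response arrives in time. The sketch below is one possible shape for such a probe, not a Google-provided tool; the function names are hypothetical, and the URL you pass in (host name and query parameters) depends on your own deployment.

```python
import time
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def check_search(url: str, max_seconds: float,
                 fetch: Callable[[str], int] = fetch_status) -> Dict[str, object]:
    """Run one probe: healthy means HTTP 200 returned within max_seconds."""
    start = time.monotonic()
    try:
        status = fetch(url)
    except OSError as error:  # covers connection failures and HTTP error responses from urllib
        return {"ok": False, "reason": str(error)}
    elapsed = time.monotonic() - start
    return {"ok": status == 200 and elapsed <= max_seconds,
            "status": status, "seconds": elapsed}
```

Run `check_search` against a test query URL for your search appliance on a schedule and alert when `ok` is false. The `fetch` parameter exists so the probe logic can be exercised without a network connection.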

Export and back up your configurations

Export configurations from all your Google Search Appliances, as well as any code from other assets. Store them in a version control system.

Transition user enrichments

Make sure that the BAU team is familiar with the user enrichments made by the business, such as KeyMatches, related queries, and so on. These user enrichments need to be adjusted over time as the required effects change. For example, when a new policy document or product is launched, KeyMatches relating to the old version may need to be updated. The BAU team needs to be aware of these enrichments and of the appropriate business owners.


Project Scenarios

Chapter 4

The project scenarios in this chapter illustrate how a successful deployment might be executed. These scenarios include:

- Basic search on a public website, described on page 32
- Basic internal search, described on page 35
- Internal search over intranet, file system, SharePoint, and Notes, described on page 38
- Internal search including CMS, database, corporate application assets, described on page 40

Each of these scenarios is based on the following assumptions:

- The deployment team is familiar with the Google Search Appliance. If required, a certified Google partner can help.
- There are no significant problems in the deployment environment. All environments differ, and yours may have unforeseen complexity.

The time lines and project plans used in this document, while examples, should not be taken as reference plans. Your own time lines might reflect greater complexity. When you plan a deployment project, take specific business or technical requirements into consideration. Always include contingencies in your plans.


Basic search on a public website

In the use case for this project scenario, Alpha Inc. is deploying search over a public-facing website containing a massive amount of information about products sold in their retail stores. Most of the content is public, but there is also protected content in a secured members section for customers who have purchased a product and registered it. While all users search for public, product content, members might also search for protected content, such as support information.

Scenario summary

Content sources

- General corporate information
- Product Catalog (published to the site as navigable product pages)
- Support documents (frequently asked questions (FAQs) and .pdf files containing product information)

Key requirements

- Index all core content.
- User interface (UI) must be standardized, and maintained to be consistent with look and feel changes to the site.
- Secure members content must only be accessible to users authorized to see it.
- Understand user activity to increase stickiness and conversion rate.

Key decisions

- Present results directly from the search appliance or by means of a web application presentation layer?
- Manage security by using the search appliance or by means of an application?


Chosen approach

- Present results from a web application, using existing templates.
  - The search appliance will deliver results to the web application in XML form.
  - The web application will process the results and render them seamlessly on the website.
  - Many organizations will render results directly from the search appliance by using XSLT.
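The web application's processing step can be sketched as a small parser that turns the XML results into a structure the site's templates can render. The element names used here (`R` for a result, `U` for its URL, `T` for its title, `S` for its snippet) are assumptions about the result format; confirm them against the XML reference for your software version before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_results(xml_text: str) -> List[Dict[str, str]]:
    """Extract URL, title, and snippet from each result element in the XML."""
    root = ET.fromstring(xml_text)
    results = []
    for result in root.iter("R"):  # assumed result element name
        results.append({
            "url": result.findtext("U", default=""),
            "title": result.findtext("T", default=""),
            "snippet": result.findtext("S", default=""),
        })
    return results
```

The returned list of dictionaries can then be passed to whatever template engine the site already uses, which keeps search results consistent with look and feel changes elsewhere on the site.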

- Crawl content by using a forms authentication rule for secured content.
- Create four searchable collections:
  - All public content
  - Products
  - Members-only content (support material)
  - All content

- Manage security at the application level:
  - Mark secured content as public.
  - The application will search only against collections containing documents that a user is entitled to see.
  - Users who are not logged in will search only against public content.
  - Users who are logged in will search by default against both public and members-only content.
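Application-level security of this kind comes down to the web application choosing which collection a request may search. The sketch below illustrates that decision; the collection names, the `default_frontend` front end name, and the request parameters (`q`, `site`, `client`, `output`) are assumptions made for illustration and should be checked against the search protocol reference for your version.

```python
from urllib.parse import urlencode

# Hypothetical names for the four collections in this scenario.
PUBLIC = "public_content"
PRODUCTS = "products"
MEMBERS = "members_only"
ALL_CONTENT = "all_content"


def choose_collection(logged_in: bool, scope: str = "default") -> str:
    """Pick the collection a request may search, based on login state and chosen scope."""
    if scope == "products":
        return PRODUCTS  # product pages are public in this scenario
    if scope == "support":
        if not logged_in:
            raise PermissionError("support search requires a logged-in member")
        return MEMBERS
    return ALL_CONTENT if logged_in else PUBLIC


def build_search_url(base_url: str, query: str, logged_in: bool,
                     scope: str = "default") -> str:
    """Build a search request URL restricted to the chosen collection."""
    params = {
        "q": query,
        "site": choose_collection(logged_in, scope),  # assumed collection parameter
        "client": "default_frontend",                 # assumed front end name
        "output": "xml_no_dtd",                       # assumed XML output setting
    }
    return base_url.rstrip("/") + "/search?" + urlencode(params)
```

Because secured content is marked as public on the search appliance in this approach, the collection choice in the application is the only access control, so the search appliance should not be directly reachable by site visitors.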

- If desired, users will be able to choose specialized product search or support information search by means of basic user interface (UI) widgets, such as check boxes or drop-down lists.
- Use Google Analytics to track not only search activities, but also broader site usage.

Possible architectures

- High-availability, load-balanced architecture: a potentially large volume of searches requires that the system be able to handle excessive load.
- Implement LDAP security, or provide application-level authorization to search, to limit access to secure information so that site members can access authorized information after authenticating themselves.


Project plan

The following figure shows a generalized Gantt chart for deploying basic search on a public website.

Figure: Basic search on a public website project plan

Enhancements

The initial deployment should also be followed by a set of rapid enhancements with short delivery cycles. Enhancements include:

- Product OneBox module: retrieves product pricing and availability directly from the supply chain system. When a user searches for "gadget", they also get the price and availability of gadgets in real time.
- Store locator OneBox module: for logged-in users, this OneBox module could retrieve information about stores within 10 miles of them and display it by means of Google Maps.
- Related product OneBox module: retrieves supplemental product information based on data mining from the company's business intelligence (BI) platform to help drive additional sales.

The following figure shows a generalized Gantt chart for enhancement phases for deploying basic search on a public website.

Figure: Enhancement phases

Basic internal search

In the use case for this project scenario, CorpCom LLC has a fairly extensive internal web presence that extends out to different parts of the globe. They want to consolidate the searching of all their internal websites and pages to one place so their employees will not have to go to different websites to search for information. Although all the users are employed with CorpCom, not all of them have access to all the information on the various sites in their corporate domain. For example, access to Human Resources (HR) information by means of search is desirable, so securing personal information is important.

Scenario summary

Content sources

- Corporate file shares
- Internal U.S. web pages
- Foreign web pages, with different country content under different folders. For example:
  - http://intranet.corpcom.com/us/content for U.S.
  - http://intranet.corpcom.com/fr/content for France
- HR information
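Because each country's content sits under its own folder, the first path segment of a URL is enough to decide which country a page belongs to, for example when grouping URLs into per-language collections. The sketch below illustrates that mapping; the function names and the `intranet_` collection naming scheme are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def country_code(url: str) -> Optional[str]:
    """Return the first path segment of the URL in lower case, or None if there is none."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return segments[0].lower() if segments else None


def collection_for(url: str) -> str:
    """Map a URL to a hypothetical per-country collection name."""
    code = country_code(url)
    return f"intranet_{code}" if code else "intranet_other"
```

With the example URLs above, `collection_for("http://intranet.corpcom.com/fr/content")` yields `intranet_fr`.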


Key requirements

- Index all pages that are web accessible.
- Index foreign language content.
- Seamless sign-on.
- A standard search page where employees go to search for information.
- Secure content must only be accessible to users authorized to see it.

Key decisions

- Present results directly from the search appliance or by means of a web application presentation layer?
- Present foreign language content directly from the search appliance or by means of a web application presentation layer?
- Manage security by using the search appliance or by means of an application?
- Use Advanced Search Reporting for on-board analytical capabilities?

Chosen approach

- Present results directly from the search appliance by using existing templates.
  - Since the majority of the users will be searching English language content solely, foreign language content will be identified as a later enhancement to be implemented by using different front ends.
  - Some languages, such as Scandinavian, will be deployed with Language Bundles (a release 6.0 feature).

- Crawl content using LDAP authentication for secured HR content.
- Foreign content will be segmented in separate collections, based on language type.

Possible architectures

- Federated architecture: search integration of disparate data stores, with the need to replicate indexes across different departments or groups.
- Implement Kerberos security to limit access to secure information: all users will have network accounts, which complements using integrated authentication and authorization with Kerberos.


Project plan

The following figure shows a generalized Gantt chart for deploying basic internal search.

Figure: Basic internal search project plan

Enhancements

The initial deployment should also be followed by a set of rapid enhancements with short delivery cycles. Enhancements include:

- Collection: crawl foreign language sites and store them in a separate collection.
- XSLT template: develop a foreign language presentation XSLT template.
- HR OneBox module: retrieves directory information for employees.

The following figure shows a generalized Gantt chart for enhancement phases for deploying basic internal search.

Figure: Enhancement phases

Internal search over intranet, file system, SharePoint, and Notes

In the use case for this project scenario, Cybertron Appliance Inc. houses different data corpora that are served from different servers on their corporate network. These data silos are accessed by way of different data management applications, such as SharePoint and Lotus Notes databases, as well as secure file shares. Having to go to different applications to find information has become tedious and very time consuming for their employees. In addition, the loss in productivity has started to show up on their bottom line, because employees repeatedly switch between disjoint systems to search for information and the existing search tools are ineffective.

Scenario summary

Content sources

- Secure file shares
- SharePoint portal data used to host internal sites
- Lotus Notes Domino data in business databases

Key requirements

- Index each individual data silo, keeping content secure.
- Create standard default UI for data access.
- Create custom interfaces for internal and external users.
- Secure content must only be accessible to users authorized to see it.
- Deployment must result in a measurable business benefit.

Key decisions

- Present results directly from the search appliance or by means of a web application presentation layer?
- Provide documentation so that each group can incorporate search functionality into their own custom application/website, or use only the search appliance's default search and results pages?
- Manage security by using the search appliance or by means of an application?
- Integrate a connector to access Lotus Notes or web enable databases?
- Use Google SharePoint connector?
- Crawl SMB file share?

Chosen approach

Conduct a short study to capture time spent on existing platforms, in parallel with deployment.
Conduct a post-deployment evaluation of the new solution to evaluate its effectiveness.
Implement analytics for ongoing evaluation of effectiveness.
Present results directly from the search appliance by using customized front ends for different data stores. For SharePoint users, the Google Search Box for SharePoint will be used.

Implement the SharePoint connector early to access data in the SharePoint portal.
Crawl SharePoint and secure file share content using Kerberos authentication.
Use Google Search Box for SharePoint to deliver an integrated search experience.
Index the Lotus Notes database using a connector.

Possible architectures

Federated architecture: search integration of disparate data stores with the need of replicating indexes across different departments/groups.
Implementation of integration architectures:
SharePoint Connector for SharePoint Portal
Content and Metadata feeds for Domino DB

Implement Kerberos security to limit access to secure information: all users will have network accounts, which complements integrated authentication and authorization with Kerberos.
Implement SAML SPI Architecture for heterogeneous security.

Project plan

The following figure shows a generalized Gantt chart for deploying internal search over intranet, file system, SharePoint, and Notes.

[Figure: Internal search over intranet, file system, SharePoint, and Notes project plan]

Enhancements

The following figure shows a generalized Gantt chart for the enhancement phases of deploying internal search over intranet, file system, SharePoint, and Notes.

[Figure: Enhancement phases]

Internal search including CMS, database, corporate application assets

In the use case for this project scenario, Sisyphus Work Force is a large labor organization that employs people from around the world. Their information is scattered across various kinds of systems that do not communicate readily with each other. They also have a substantial internal corporate web presence with content that is served by a CMS. Their internal HR system is a large database repository, and they have various commercial and custom applications that give users access to data through different access methods. Directory information (for example, contact details, manager, and direct reports), performance reports, and salary information are stored on this system.

Scenario summary

Content sources

Intranet websites
Shared file system
Oracle database
CMS
Custom application

Key requirements

Index each individual data silo, keeping content secure.
Create standard default UI for data access.
Create custom interfaces for different groups in the organization.
Secure content must only be accessible to users authorized to see it.

Key decisions

Manage security by using the search appliance or by means of an application using Active Directory?
Present results by using the search appliance default front end or by means of a secondary application?
Use a phased approach for system deployment?

Chosen approach

Deploy to an initial pilot group prior to full rollout by means of a corporate portal.
Crawl and serve secure content (for example, HR or salary information) using LDAP.
Manage security at the application level.
Initially index a selected cross-section of data holdings, with additional documents to be added later.
Present results directly from the search appliance by using the default XSLT style sheet.
Due to the diversity of content and sources, use a phased approach for deployment:
Intranet sites and file shares will be in the initial deployment.
Database feeds for the Oracle HR system will follow.
CMS systems and related portals will be next.
A survey of corporate applications that house and serve data will be conducted, and a determination will be made on which will be accessed for search.

Possible architectures

Federated high availability deployment architecture with disaster recovery capability: search integration of disparate data stores with the need of replicating indexes across different departments/groups while ensuring virtually 24/7 uptime so that productivity is not lost.
Implementation of integration architectures:
Content and Metadata feeds for CMS
Custom connector for database search

Implement Kerberos security to limit access to secure information: all users will have network accounts, which complements integrated authentication and authorization with Kerberos.
Implement SAML SPI Deployment or policy ACL deployment to handle diverse security or poorly performing systems.

Project plan

The following figure shows a generalized Gantt chart for deploying internal search including CMS, database, corporate application assets.

[Figure: Internal search including CMS, database, corporate application assets project plan]

Enhancements

Multiple short, iterative enhancement phases deliver incremental functionality and new content to your users, and create opportunities to increase visibility and drive uptake with new users. The following figures show Gantt charts for enhancement phases.

[Figure: Enhancement phases project plan A]

You can begin by including unsecured database content by means of a database feed, crawl any additional content that still needs to be acquired, and then release to the primary user groups.

[Figure: Enhancement phases project plan B]

Now that your users are searching across their information, the next phase is to rapidly build a method to feed content from your CMS to the Google Search Appliance.

[Figure: Enhancement phases project plan C]

And finally, you can begin to consume content from your corporate applications, in short, phased migrations. These can be planned and repeated as needed to deliver true Universal Search. Note that phases may have longer durations where security integration is required.

Chapter 5: Deployment Architecture

This chapter discusses the following technical and architectural considerations for planning your deployment:
Sizing the index
Architecting for scale and performance
Architecting for reliability
Architecting for reach
Architecting for security
Enhancing technologies

For examples of architectures that address common deployment scenarios, see Deployment Scenarios on page 61.

Sizing the index

The process of sizing your Google Search Appliance index consists of the following activities:
Scoping index capacity needs
Determining what to index
Determining how to index

Scoping index capacity needs

Google Search Appliance models are:
GB-7007: can index up to 10 million documents
GB-9009: can index up to 30 million documents out of the box. For larger deployments, multiple GB-9009 appliances can be linked together to search hundreds of millions or even billions of documents.

From a sizing perspective, Google recommends that organizations choose a base unit that meets the current document capacity needs, as well as projected document growth needs for two years. Google Search Appliances are designed to be flexible, so if a model upgrade is required at any time, there is a seamless migration plan that allows for the installation and transition to the new unit with no service downtime. However, because upgrading requires a hardware change, if the current document capacity is close to the physical indexing limits of the GB-7007, Google recommends selecting the GB-9009 to simplify management of the solution over time.

The Google Search Appliance is also designed to operate intelligently up to the license limits within each respective model to ensure the best user experience. When the license limit is reached in a given model, the search appliance continues to discover relevant documents outside the license limit in an effort to maintain a servable index of the most relevant documents found in the environment. However, this creates churn while less relevant documents are removed in favor of more relevant ones. If your search appliance is nearing its license limit, consider upgrading to a higher document count.

This process of continual discovery and analysis beyond the license limit provides an automated and intelligent method of managing the search experience when operating in an environment where more documents are available than the license limit allows. However, the search appliance's automated pruning logic could cause certain critical content to be excluded from the index to make room for more relevant content. If mission-critical content exists beyond the license limit, Google recommends expanding the license limit to ensure that all the relevant content can be indexed and served with additional room to grow.

For a discussion of the choice between upgrading a search appliance and deploying additional hardware, see Scale up/scale out on page 49.
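
The two-year sizing recommendation can be sketched as a small calculation. The Python below is an illustration only, not a Google tool: the function names and the annual growth rate are assumptions supplied by the planner, while the per-model capacities (10 million and 30 million documents) are the figures given in this guide.

```python
# Illustrative sizing sketch; names and growth rate are assumptions.
# Capacities are the per-model document limits stated in this guide.
MODEL_CAPACITY = {"GB-7007": 10_000_000, "GB-9009": 30_000_000}


def projected_documents(current_docs, annual_growth_rate, years=2):
    """Project the document count using compound annual growth."""
    return round(current_docs * (1 + annual_growth_rate) ** years)


def recommend_model(current_docs, annual_growth_rate):
    """Return the smallest single model covering the two-year projection.

    Returns None when the projection exceeds every single model, in which
    case multiple linked appliances would have to be considered.
    """
    needed = projected_documents(current_docs, annual_growth_rate)
    for model, capacity in sorted(MODEL_CAPACITY.items(), key=lambda kv: kv[1]):
        if needed <= capacity:
            return model
    return None


if __name__ == "__main__":
    # 9 million documents today with an assumed 20% yearly growth
    print(recommend_model(9_000_000, 0.20))
```

With these assumed inputs the projection passes the GB-7007 limit within two years, which is the situation in which the guide recommends selecting the larger model up front.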

Dynamic scalability

Dynamic scalability is a release 6.0 feature that enables multiple Google Search Appliances to work together in a federated environment to scale up to as many documents as you wish to search in a unified manner. In a dynamic scalability configuration, one search appliance is the primary node and the others are secondary nodes. The primary search appliance aggregates results from all of the search appliances in the configuration and serves them to the search user. The primary search appliance's front end is used for searching all document corpora in the dynamic scalability configuration.

Any model of the Google Search Appliance running software version 6.0 or later can be configured to participate in a dynamic scalability configuration. The configuration may include different search appliance models, provided they are all running the same software version. For information about dynamic scalability, see Configuring Dynamic Scalability at http://code.google.com/apis/searchappliance/documentation/60/dynamic_scale/dynamic_scale.html.
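
The aggregation role of the primary node can be pictured with a conceptual sketch. This guide does not describe how the appliance actually merges or scores results, so the Python below is an assumption-laden illustration: it supposes that each node returns (score, URL) pairs already sorted by descending score and that scores are comparable across nodes. The function name and sample URLs are hypothetical.

```python
import heapq
from itertools import islice


def aggregate_results(node_results, limit=10):
    """Merge per-node result lists into a single ranked list.

    Each input list must already be sorted by descending score
    (an assumption of this sketch, not documented appliance behavior).
    """
    merged = heapq.merge(*node_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    primary = [(0.9, "http://intranet.example.com/a"),
               (0.4, "http://intranet.example.com/b")]
    secondary = [(0.7, "http://files.example.com/c")]
    for score, url in aggregate_results([primary, secondary], limit=3):
        print(score, url)
```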

Determining what to index

Determining what to index is a process of identifying content sources, which typically include:
Corporate websites
Partner extranets
Corporate intranet sites
Portals
Knowledge bases
Content and Document Management Systems
File shares

This process might seem straightforward. However, you might uncover more information within a given content source that needs to be indexed than you originally anticipated. For example:
A file system might be much deeper than expected
A Content or Document Management System might contain more documents than originally anticipated
A website might have more pages than expected

The Google Search Appliance provides capabilities to limit content indexing by implementing simple content acquisition rules (follow and crawl URLs). Limiting the index scope by adjusting these rules can be an ongoing discovery process that needs to be taken into consideration, especially when the content sources targeted for search are not well maintained or are managed in a decentralized fashion. The search appliance provides detailed logs on each document that has been indexed and also provides summary information on document types and sizes. Crawl Diagnostics features allow administrators to fine-tune follow and crawl URLs to ensure that the most relevant content is being indexed and served at any time.
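
The idea of limiting index scope with URL rules can be sketched in a few lines. The Python below is a deliberate simplification: it uses plain URL prefixes, whereas the pattern syntax accepted by the Admin Console is richer and is not reproduced here. The function name, the separate exclusion list, and the sample URLs are assumptions made for illustration.

```python
def should_crawl(url, follow_prefixes, exclude_prefixes=()):
    """Return True if url matches a follow prefix and no exclude prefix.

    Prefix matching stands in for the appliance's real URL patterns.
    """
    if any(url.startswith(prefix) for prefix in exclude_prefixes):
        return False
    return any(url.startswith(prefix) for prefix in follow_prefixes)


if __name__ == "__main__":
    follow = ["http://intranet.example.com/"]
    exclude = ["http://intranet.example.com/archive/"]
    for candidate in (
        "http://intranet.example.com/hr/policy.html",
        "http://intranet.example.com/archive/2001/old.html",
        "http://www.example.org/index.html",
    ):
        print(candidate, should_crawl(candidate, follow, exclude))
```

Tightening or loosening the prefix lists and re-checking what falls inside the scope mirrors the ongoing discovery process described above.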

Determining how to index

Once you have identified content sources to index, you need to consider the method of indexing each one. The Google Search Appliance can use several methods to acquire content for indexing, including:
Web crawl
Database synchronization
Feeds: both content feeds and metadata-and-URL feeds
Connectors to third-party applications

Determining the most effective method of indexing depends on the content sources that need to be indexed. For example, corporate websites, partner extranets, wikis, corporate intranet sites, and informational portals can often be easily indexed by using the search appliance's crawling technology. The crawl process issues HTTP requests or follows links to locate content on a website or file system. To configure crawling, an administrator defines URL rules in the search appliance's web-based Admin Console.

For comprehensive information about crawl, see Administering Crawl for Web and File Share Content at http://code.google.com/apis/searchappliance/documentation/60/admin_crawl/Introduction.html. For information about database synchronization, see Database Crawling and Serving at http://code.google.com/apis/searchappliance/documentation/60/database_crawl_serve.html.

For integration into Content and Document Management Systems, knowledge bases, and collaboration tools, such as Microsoft SharePoint, using feeds or a connector might be the most effective method for indexing content. For more information, see Feeds and Connectors on page 55.

For complex deployments where search spans multiple information sources, consult a Google product specialist or Google Enterprise partner to determine the optimal methods of indexing.
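
The guidance in this section amounts to a lookup from content source type to a likely acquisition method. The Python sketch below records that mapping for planning purposes; the dictionary keys, the function name, and the fallback wording are illustrative assumptions, and the mapping is a starting point rather than a rule.

```python
# Planning aid only: maps source types named in this section to the
# acquisition method the text suggests. Keys are illustrative labels.
INDEXING_METHOD = {
    "corporate website": "web crawl",
    "partner extranet": "web crawl",
    "wiki": "web crawl",
    "corporate intranet site": "web crawl",
    "informational portal": "web crawl",
    "database": "database synchronization",
    "content management system": "feed or connector",
    "document management system": "feed or connector",
    "knowledge base": "feed or connector",
    "collaboration tool": "feed or connector",
}

FALLBACK = "consult a Google product specialist or Google Enterprise partner"


def suggest_method(source_type):
    """Return the suggested indexing method for a content source type."""
    return INDEXING_METHOD.get(source_type.strip().lower(), FALLBACK)


if __name__ == "__main__":
    for source in ("Wiki", "Database", "Collaboration tool", "Mainframe export"):
        print(f"{source}: {suggest_method(source)}")
```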

Architecting for scale and performance

Almost every organization continually produces content. Indexing additional content adds value to your search solution deployment. To accommodate additional content in a search appliance deployment, you need to architect for scale as described in the following section, Scale up/scale out. Similarly, organizations might need to support additional users and search loads over time. To accommodate this type of growth, you need to architect for performance, as described in Load balancing on page 49.

Scale up/scale out

Based upon the amount of content to be indexed, consider whether to scale up or scale out. To scale up is to upgrade to a larger capacity model of the Google Search Appliance. To scale out is to configure two or more search appliances in a federated environment, as described in Dynamic scalability on page 46.

For example, if you require 25 million documents to be indexed, should you use one GB-9009 or three federated GB-7007s? The answer depends on a number of factors, including, but not limited to, the following. These items are in no particular order, and the factors important to your deployment may be completely different from those important to another deployment.

How much rack space is available, and are there power restrictions in the data center? For limited rack space or power restrictions, Google recommends that you choose a more powerful search appliance model instead of multiple, federated search appliances.

Is a hot backup required (increased cost for more servers)? Each hot backup has a fixed cost, so if you require multiple hot backup servers, the cost might be greater than that of one individual, but larger, search appliance. Alternatively, a deployment made up of many lower capacity servers might be more costly than a single larger one. Therefore, investigate this issue before deciding on the type and number of search appliances that will be used in the solution, as total procurement cost, including production and hot backups, may be subject to change.

Are there multiple secure repositories that have to be indexed?

Are there multiple departmental owners who want to control their own search service? In some instances, individual content owners prefer to own their own search appliance. If this is the case, then a dynamic scalability configuration using multiple search appliances would be the solution.

Deployment of additional search appliances may enable you to execute crawling in parallel, increasing acquisition and renewal of content.

Deployment of multiple nodes offers trade-offs to be considered. While multiple appliances may require additional management and support, in many cases a single node may be taken offline without disrupting the rest of the search deployment. This is not typically the case with a single node deployment.

In general, if a single search appliance can provide for the corpus size, use one search appliance.
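
The 25 million document example can be checked with simple arithmetic. The Python below is a sketch that only counts units by document capacity, using the model limits stated in this guide; the function name is an assumption, and cost, rack space, power, and ownership factors are deliberately left out.

```python
# Capacity-only comparison; other factors from this section are not modeled.
MODEL_CAPACITY = {"GB-7007": 10_000_000, "GB-9009": 30_000_000}


def units_needed(documents, model):
    """Return how many appliances of a model are needed to hold the documents."""
    capacity = MODEL_CAPACITY[model]
    return -(-documents // capacity)  # ceiling division on integers


if __name__ == "__main__":
    corpus = 25_000_000
    for model in MODEL_CAPACITY:
        print(model, units_needed(corpus, model))
```

For 25 million documents this yields three GB-7007 units or one GB-9009, which is the choice posed in the example above.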

To read about deployment scenarios that use dynamic scalability, see Federated architecture on page 79.

Load balancing

Load balancing distributes network traffic of a particular type to two or more instances of an application, dividing the workload between the instances. A load balancer is a software or hardware application that distributes the network traffic. When you configure two or more Google Search Appliance systems for load balancing, search queries are distributed among those systems.
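
As a minimal illustration of distributing queries, the Python sketch below rotates requests across appliances in round-robin order. Round robin is only one common policy and is an assumption here: this guide does not prescribe a balancing algorithm, and a real deployment would rely on a dedicated load balancer. The class name and host names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming query to the next appliance in turn."""

    def __init__(self, appliances):
        if not appliances:
            raise ValueError("at least one search appliance is required")
        self._rotation = cycle(list(appliances))

    def route(self, query):
        """Return the appliance that should serve this query."""
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["gsa1.example.com", "gsa2.example.com"])
    for q in ("benefits", "expense policy", "holiday calendar"):
        print(q, "->", balancer.route(q))
```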

Determining whether a load balancer is required depends on a number of considerations, such as:
The peak load of queries that the search appliance will receive
The number of users who are going to be using the system
Where the users are located

A large number of queries per second at peak time, or a very diversely located user base, generally requires multiple search appliances using load balancing to help serve the results at an acceptable rate. Load-balanced search appliances also provide a level of redundancy that is not possible with a single search appliance. You can set up search appliances in the following configurations:
A single search appliance on a network with no other search appliance for failover or fault tolerance. This is not a load-balanced configuration.
A load balancing configuration in which there is a physical connection between the search appliances and the load balancer and each search appliance is on the same network or subnet as the load balancer.
A load balancing configuration in which there is a logical connection to the load balancer and each search appliance is potentially on different networks or subnets from the load balancer.
A failover configuration in which a switch fails over search queries from the search appliance that normally responds to search queries to a search appliance that does not normally respond to search queries and is used only for failover. For more information, see Failover configurations on page 51.

Note: In each of the above configurations, each search appliance could be one or more search appliances in a federated deployment. Load balancers can be used with virtually any architecture, such as Federated high availability deployment architecture, described on page 80.

Google does not recommend specific load balancers to use with the search appliance. The configurations described in this document are expected to work with any equipment that complies with networking RFCs.

To read about deployment scenarios that use load balancing, see High availability architecture on page 65. For information about load balancing, see Configuring Search Appliances for Load Balancing or Failover at http://code.google.com/apis/searchappliance/documentation/60/configuration/Configuration.html.

Architecting for reliability

The Google Search Appliance often provides mission-critical search capabilities for applications, such as:
Public-facing retail websites
Web-based application services
Supplier and partner extranets
Customer service center applications

For any application where Google Search Appliances are providing mission-critical search capabilities, Google recommends a high availability configuration to provide seamless operation in the event of a system failure. High availability is the deployment of multiple search appliances in a configuration where if one appliance fails, a secondary appliance (or appliances) is able to fail over and resume service seamlessly with minimal, if any, disruption.

Failover configurations

Failover configurations typically involve two instances of an application or a particular type of hardware. The first instance, sometimes called the primary instance, responds to search queries. If the first instance fails, the second instance, sometimes called the secondary or standby instance, starts responding to search queries.

One such implementation is a domain name system (DNS) switchover configuration that provides a redundant "hot spare." This configuration involves multiple search appliances, where one is used in production and the second one is used as a hot spare. These search appliances can be located anywhere, physically or logically. Changes are made in DNS to restore search if the primary search appliance becomes inaccessible. The DNS switchover may be executed automatically in the event of a failure. It can also be executed manually, but that typically results in a longer outage, because of the wait for manual execution and, depending on your environment, the time for DNS changes to propagate. This setup is only used for redundancy (or failover) and does not provide a method of load balancing.

To read about deployment scenarios that use failover, see High availability architecture on page 65. For more information about failover, see Configuring Search Appliances for Load Balancing or Failover at http://code.google.com/apis/searchappliance/documentation/60/configuration/Configuration.html.
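
The primary/standby decision can be sketched as follows. The Python below is an illustration under stated assumptions: `is_healthy` is a hypothetical callable supplied by the operator (for example, a probe that issues a test query), and the host names are invented. In a DNS switchover, the returned host would be the one the search host name is pointed at; that DNS update itself is not shown.

```python
def select_target(primary, standby, is_healthy):
    """Return the appliance that should answer search queries.

    The primary serves whenever it is healthy; the standby is used
    only when the primary is not (active/passive behavior).
    """
    if is_healthy(primary):
        return primary
    if is_healthy(standby):
        return standby
    raise RuntimeError("no healthy search appliance available")


if __name__ == "__main__":
    # Simulated health state; a real check is environment-specific.
    health = {"gsa-primary.example.com": False, "gsa-standby.example.com": True}
    target = select_target("gsa-primary.example.com",
                           "gsa-standby.example.com",
                           lambda host: health[host])
    print("serve queries from:", target)
```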

Active/active vs. active/passive failover

Once you determine the appropriate Google Search Appliance model based on index capacity needs, you can begin to scope your query throughput needs. Each model of the Google Search Appliance has been tested to meet a query volume that is suitable for most internal corporate network search requirements, as well as most public-facing, site-search requirements. However, some organizations need to scale beyond one search appliance. For these instances, multiple search appliances can be deployed in parallel to scale in a linear fashion for query volume. By determining the number of queries, you can determine if you require:
An active/active setup, in which two appliances are set up and serving results concurrently
An active/passive failover setup for fault tolerance, in which two search appliances are set up, with one serving results and the other to be used only in the event of a failure on the primary search appliance
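
Because query capacity is described as scaling linearly with the number of appliances, the choice between the two setups can be estimated with a short calculation. The Python below is a sketch: this guide does not publish per-appliance query rates, so `qps_per_appliance` is a hypothetical figure that would have to come from your own testing, and the function names are assumptions.

```python
import math


def appliances_for_query_load(peak_qps, qps_per_appliance):
    """Return the number of serving appliances needed, assuming linear scaling."""
    if qps_per_appliance <= 0:
        raise ValueError("qps_per_appliance must be positive")
    return max(1, math.ceil(peak_qps / qps_per_appliance))


def suggest_setup(peak_qps, qps_per_appliance):
    """Suggest active/passive when one appliance can carry the peak load."""
    serving = appliances_for_query_load(peak_qps, qps_per_appliance)
    return "active/passive" if serving == 1 else "active/active"


if __name__ == "__main__":
    # Both figures below are invented for illustration.
    print(appliances_for_query_load(25, 10), suggest_setup(25, 10))
    print(appliances_for_query_load(8, 10), suggest_setup(8, 10))
```

When a single appliance can carry the peak load, the second appliance serves purely as a standby; otherwise the appliances share the load concurrently.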

Architecting for reach

A great search experience depends on a number of factors, with a powerful relevance algorithm as one of the most important. The Google Search Appliance takes care of relevance, so you can focus on another of the most important factors: making sure your search solution can reach and search over all your organization's most relevant content, wherever it is. Not all content can be accessed and discovered by crawling. To make sure that this content is in your index and searchable, you might need to use the following integration technologies:
OneBox modules, described in the following section
Feeds
Connectors

OneBox modules

The name "OneBox" refers to the search box that provides access to information from many sources. OneBox also refers to the formatted output that appears in response to specific query keywords. OneBox modules are a powerful tool at your disposal for increasing the breadth of content in your search deployment. The following figure shows the OneBox module that appears when a user searches for finance.

[Figure: OneBox module]

OneBox modules enable a Google Search Appliance to integrate with third-party systems in real time. OneBox modules supplement Google's powerful algorithmic search with purposebu