Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
![Page 1: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/1.jpg)
![Page 2: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/2.jpg)
Google Search Appliance
Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
Nitin Mangtani
May 27, 2009
![Page 3: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/3.jpg)
Search is the starting point to the world’s information
![Page 4: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/4.jpg)
Google Enterprise Search
More than 20,000 enterprise search customers
Dedicated team of enterprise engineers focused on solving enterprise search problems.
Backed by Google’s core research and development
Bringing Google.com search experience to businesses
Our Search Products
![Page 5: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/5.jpg)
Universal Search
Employee Directory
Content Management
Wikis
Intranet
File share
SharePoint
![Page 6: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/6.jpg)
Google’s Search Philosophy
User: Intuitive, unified results; highly relevant; user-friendly innovation
Reach: All information; ‘real-time’ data; customizable and extendable
Security: Highly secure architecture; standards-based; leverage existing security
Scale: Large corpus search; cross-enterprise management; flexible infrastructure
![Page 7: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/7.jpg)
Personalized Search Experience
Marketing
Engineering
![Page 8: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/8.jpg)
Advanced Biasing Controls
Administrators can create multiple biasing policies.
Source biasing
Date biasing
Metadata biasing New!
Front-end biasing New!
Simple setup - No complex coding or scripts.
![Page 9: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/9.jpg)
Metadata Biasing New!
Determine the influence of a metadata parameter on a specific metadata name and content
Biasing based on metadata attribute and value
“Boost all documents that have author as Larry Page”
Administrators control influence (positive or negative) on metadata attribute/value pairs
![Page 10: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/10.jpg)
Embedding Search Box in your application
<form method="GET" action="http://search.mycompany.com/search">
  <input type="text" name="q" size="32" maxlength="256" value="query string">
  <input type="submit" name="btnG" value="Google Search">
  <input type="hidden" name="site" value="default_collection">
  <input type="hidden" name="client" value="default_frontend">
  <input type="hidden" name="output" value="xml_no_dtd">
  <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>
Such forms are the most recognizable method of generating GET requests, but there are numerous other ways.
A web application may make an HTTP GET request directly:
GET /search?q=query+string&site=default_collection&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend HTTP/1.0
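As a rough illustration of the direct GET request, the sketch below builds the same URL in Python using only the standard library. The function name `build_search_url` and its defaults are hypothetical; the host and parameter values are the ones shown on the slide.

```python
from urllib.parse import urlencode


def build_search_url(host, query, site="default_collection",
                     frontend="default_frontend"):
    """Return a GET URL equivalent to submitting the search form."""
    params = {
        "q": query,                   # the user's search terms
        "site": site,                 # collection to search
        "client": frontend,           # front end to use
        "output": "xml_no_dtd",       # ask for XML results
        "proxystylesheet": frontend,  # stylesheet that renders the XML as HTML
    }
    # urlencode escapes reserved characters and encodes spaces as "+"
    return "http://" + host + "/search?" + urlencode(params)


url = build_search_url("search.mycompany.com", "query string")
print(url)
```

Encoding the parameters with `urlencode` rather than string concatenation keeps queries containing `&`, `=`, or spaces from corrupting the request.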
![Page 11: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/11.jpg)
Leverage users’ input
![Page 12: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/12.jpg)
Do-It-Yourself KeyMatch
![Page 13: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/13.jpg)
Search-as-you-Type
![Page 14: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/14.jpg)
Universal Search: Powered by Google Search Appliance
File shares: over 200 file formats (MS Office, PDF, HTML, etc.)
Intranets: web servers, portals
Databases: Oracle, SQL Server, MySQL, DB2, Sybase
Enterprise applications: ERP systems, business intelligence systems
Content management: Documentum, SharePoint, FileNet, Livelink, any other system
![Page 15: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/15.jpg)
Architecture
![Page 16: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/16.jpg)
Secure, real-time access to business information
Real-Time Access to Business Applications
“The Google Search Appliance with OneBox is our command line interface to our world … adding more content and additional OneBox interfaces will only increase the value to our organization” – Danny Perri, BOC Gases
Access to real-time business data with OneBox
[Chart: quarterly timeline, Q1 2007 – Q4 2008]
![Page 17: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/17.jpg)
Google OneBox for Enterprise
[Diagram: numbered flow ① to ⑤ between the user, the appliance, and the Provider Server; the appliance calls https://provider… and receives XML]
1. User enters a query.
2. The OneBox “trigger” determines if the query is relevant to a OneBox module.
3. The appliance makes a secure REST call (HTTPS GET request) to the predefined OneBox provider, passing security credentials and other parameters.
4. The provider uses the information to determine appropriate, user-specific, secure results to the query, and passes those results back to the appliance in XML.
5. The XML is transformed into HTML based on the XSL template provided in the OneBox module and presented to the user inline with their search results.
![Page 18: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/18.jpg)
Google OneBox for Enterprise
Real-time, secure access to information from the search box
Triggers - configurable to show OneBox results:
Always On: the module is invoked for every query
Keyword(s): the module is invoked in response to specific keywords
Regular Expression: the module is invoked when the query matches a regular expression
Providers:
Internal: Specialized search content in a separate appliance collection
External: Modules from the OneBox module gallery
External: API enables you to create your own modules
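The three trigger types can be sketched as a small matching function. This is only an illustration of the idea, not the appliance's implementation: the function name `trigger_fires`, the trigger labels, and the keyword rule (any word of the query equals a configured keyword) are all assumptions.

```python
import re


def trigger_fires(trigger_type, query, pattern=None):
    """Decide whether a OneBox module should be invoked for a query.

    trigger_type: "always", "keyword", or "regex" (hypothetical labels).
    pattern: a list of keywords for "keyword", a regex string for "regex".
    """
    if trigger_type == "always":
        # Always On: invoked for every query
        return True
    if trigger_type == "keyword":
        # Assumed rule: fire when any query word matches a keyword
        keywords = {k.lower() for k in pattern}
        return any(word in keywords for word in query.lower().split())
    if trigger_type == "regex":
        # Regular Expression: fire when the query matches the pattern
        return re.search(pattern, query) is not None
    raise ValueError("unknown trigger type: " + trigger_type)


print(trigger_fires("keyword", "directory Larry Page", ["directory"]))
print(trigger_fires("regex", "order 12345", r"^order \d+$"))
```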
![Page 19: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/19.jpg)
OneBox Results Schema
<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
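A provider could assemble a response in this shape with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` and fills in only a subset of the elements from the schema. The function name `build_onebox_results`, the `example.com` URLs, and the sample employee record are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def build_onebox_results(provider, title_text, title_link, results):
    """Build a OneBoxResults document from a list of result dicts.

    Each result dict has "url", "title", and "fields" (name -> value).
    """
    root = ET.Element("OneBoxResults")
    ET.SubElement(root, "provider").text = provider
    title = ET.SubElement(root, "title")
    ET.SubElement(title, "urlText").text = title_text
    ET.SubElement(title, "urlLink").text = title_link
    for item in results:
        module = ET.SubElement(root, "MODULE_RESULT")
        ET.SubElement(module, "U").text = item["url"]
        ET.SubElement(module, "Title").text = item["title"]
        for name, value in item["fields"].items():
            # One <Field name="..."> element per name/value pair
            ET.SubElement(module, "Field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_onebox_results(
    "directory",                      # hypothetical provider name
    "Employee directory",
    "http://directory.example.com/",  # hypothetical URL
    [{"url": "http://directory.example.com/u/1",
      "title": "Jane Doe",
      "fields": {"phone": "555-0100"}}],
)
print(xml_text)
```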
![Page 20: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/20.jpg)
Common Security Protocols
HTTP-Basic
NTLM (v1, v2)
LDAP
Advanced Security
Kerberos New!
SSO - Oracle (Oblix), CA/SiteMinder
X509 Certificates
Custom Authentication & Authorization: support for SAML SPI
Document-Level Security: provide the right users with access to the right documents
Security
“Zero” Sign-on
![Page 21: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/21.jpg)
Access Control (NTLM, HTTP Basic, SSO, etc.)
1. User executes search for public and secure content (access=a)
2. User is prompted for credentials (if NTLM/Basic Auth & SSO, user is prompted for both sets of credentials)
3. User’s credentials are sent securely to the search appliance
4. Google Search Appliance queries index for all possible results
5. Search appliance makes ‘authorization’ requests of the host content servers with user’s credential set
6. Host servers respond with success or failure
7. Secure results that the user is not authorized to see are filtered out of the search results
8. Final search results (filtered) are presented to the user
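Steps 4 through 8 amount to filtering candidate results by asking each host server whether the user may see the document. The sketch below shows that idea under stated assumptions: `filter_results`, the result dictionaries, the `example.com` URLs, and the stubbed `check_access` callback are hypothetical. A real check would issue a request to the content server with the user's credentials and read the HTTP status.

```python
def filter_results(candidates, check_access):
    """Keep public results and secure results the user may access.

    candidates: list of dicts with "url" and "secure" (e.g. "none", "ntlm").
    check_access: callable taking a URL and returning an HTTP status code
    for a request made with the user's credentials.
    """
    visible = []
    for result in candidates:
        if result["secure"] == "none":
            # Public content needs no authorization request
            visible.append(result)
        elif check_access(result["url"]) == 200:
            # The host server accepted the user's credentials
            visible.append(result)
        # Any other status (such as 401) drops the result
    return visible


# Hypothetical index entries and a stub standing in for the host servers
candidates = [
    {"url": "http://corp.example.com/preso.ppt", "secure": "ntlm"},
    {"url": "http://corp.example.com/policy", "secure": "http basic"},
    {"url": "http://corp.example.com/welcome", "secure": "none"},
]
statuses = {
    "http://corp.example.com/preso.ppt": 401,
    "http://corp.example.com/policy": 200,
}
visible = filter_results(candidates, lambda url: statuses[url])
print([r["url"] for r in visible])
```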
[Diagram: index of results with columns #, URL, and Secure. Row 1: http://corp…/preso.ppt, ntlm. Row 2: http://corp…/policyhtml, http basic. http://corp…/welcome/…, none. Row n: http://int…/customer.jsp, sso. Host servers (Database, File shares, Content Mgmt.) respond 401, 200, 200.]
![Page 22: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/22.jpg)
Traditional search technology for millions of docs
+ Disaster Recovery Server
+ Patch Deployment Management Server
+ Volume License Management Server
![Page 23: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/23.jpg)
Google Architecture: 10M documents in a box
![Page 24: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/24.jpg)
![Page 25: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/25.jpg)
Health Vine Simplicity
Patients
Immediate Family
Community
![Page 26: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/26.jpg)
![Page 27: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/27.jpg)
![Page 28: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/28.jpg)
![Page 29: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/29.jpg)
Where’s your GSA??
The State of Missouri’s use of Google GSA
![Page 30: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/30.jpg)
Where was Missouri?
16 Executive Agencies
No common web search
No unified way for citizens or businesses to find information about State Government.
![Page 31: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/31.jpg)
Where is Missouri??
Centrally Managed Google GSA
Front Ends and Collections provided to all State Government entities
Common search across all State Government web content
Reliable information now easily found by citizens and businesses
![Page 32: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall](https://reader034.fdocuments.net/reader034/viewer/2022051411/5466a610b4af9ffd748b4814/html5/thumbnails/32.jpg)