Revolutionizing enterprise web development


Transcript of Revolutionizing enterprise web development

Page 1: Revolutionizing enterprise web development

Revolutionizing enterprise web development

Searching with Solr

Page 2: Revolutionizing enterprise web development

What is Solr?

• Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.

• What’s Lucene?

• Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Page 3: Revolutionizing enterprise web development

What is Solr?

• Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling.

• Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

• See http://lucene.apache.org/ for more info.
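As a rough illustration of the faceted search and hit highlighting features listed above, the sketch below builds the request parameters Solr uses for them (`facet`, `facet.field`, `hl`, `hl.fl`) and unpacks the flat value/count list that Solr's default JSON response uses for facet counts. The field names `type` and `body` and the sample response are hand-written assumptions for illustration, not output from a real server.

```python
from urllib.parse import urlencode

# Hand-written sample shaped like the "facet_counts" part of a Solr JSON
# response (illustrative values only, not real server output).
SAMPLE_RESPONSE = {
    "facet_counts": {
        "facet_fields": {"type": ["page", 12, "story", 7]},
    },
}


def facet_highlight_params(keywords, facet_field, highlight_field):
    """Build a query string that asks for facet counts and highlighting."""
    return urlencode([
        ("q", keywords),
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # field to count values for
        ("hl", "true"),                # turn hit highlighting on
        ("hl.fl", highlight_field),    # field to take snippets from
        ("wt", "json"),                # ask for a JSON response
    ])


def facet_counts(response, facet_field):
    """Turn the flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(facet_highlight_params("solr", "type", "body"))
    print(facet_counts(SAMPLE_RESPONSE, "type"))
```

Each (value, count) pair is what a search page would typically render as one clickable filter link.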

Page 4: Revolutionizing enterprise web development

Why Solr?

• Why Solr, or why Solr with Drupal?

Core Drupal Search | Solr Search
Reasonable performance only for small sites | Quality performance for all installations, including large deployments
Poor scalability: relies on Drupal’s DB to handle all search results | Quality scalability: single-purpose servers independent of Drupal
Few configuration options (better in D7 than D6) | Significant configuration options out of the box, including configurable filters and indexed material
Few search options | Significant search options out of the box (based on filters above)
No multi-site capability | Multi-site (even non-Drupal sites) capabilities

Page 5: Revolutionizing enterprise web development

Where does it fit?

• Sits beside your application servers in the stack

• PHP communicates with the Solr servers (the Apachesolr module handles this for you)

• Retrieve: URL strings

• Push: XML packets
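The two directions of traffic can be sketched outside Drupal as follows. This is a minimal illustration in Python, not what the Apachesolr module does internally: it assumes the example server at `http://localhost:8983/solr`, and the `id` and `title` field names are hypothetical. A search is a GET against the `select` handler with the parameters in the URL, while indexing sends an `<add><doc>…</doc></add>` XML message to the `update` handler.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

SOLR_BASE = "http://localhost:8983/solr"  # assumed local example server


def build_select_url(keywords, rows=10):
    """Retrieve: the whole search request is expressed as a URL string."""
    params = {"q": keywords, "rows": rows, "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)


def build_add_packet(doc):
    """Push: wrap one document's fields in an <add><doc> XML message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_select_url("drupal search"))
    # The packet would be POSTed to SOLR_BASE + "/update", followed by a commit.
    print(build_add_packet({"id": "node/1", "title": "Hello Solr"}))
```

Nothing here talks to a server; the functions only build the strings that would travel over HTTP.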

Page 6: Revolutionizing enterprise web development

Solr Setup

• Options

• Self-Hosted

• http://lucene.apache.org/solr/

• Look for “Download Solr here”

• Service

• Acquia

• http://acquia.com/products-services/acquia-search

Page 7: Revolutionizing enterprise web development

Solr Setup

• Example directory

• start.jar

• java -jar start.jar &> /dev/null &

• Solr directory

• Conf directory

• schema.xml

• solrconfig.xml

Page 8: Revolutionizing enterprise web development

Solr Setup

• Solr admin accessible here: http://localhost:8983/solr/admin

Page 9: Revolutionizing enterprise web development

Solr Setup

• schema.xml

• Primarily handles what is indexed

Page 10: Revolutionizing enterprise web development

Solr Setup

• solrconfig.xml

• Handles general configuration.

• Might need to edit it for replication or if you plan to do file handling on the Solr server.

Page 11: Revolutionizing enterprise web development

Drupal + Solr

• Core Module: Apachesolr

• Optional Modules:

• Apachesolr_multisitesearch

• Self-explanatory

• Apachesolr_attachments

• Requires an additional Solr component (Tika). Allows full-text indexing of docs.

• Apachesolr_views

• Sorta…& maybe someday

Page 12: Revolutionizing enterprise web development

Drupal + Solr

• Basic Drupal settings

Page 13: Revolutionizing enterprise web development

Drupal + Solr

• Examples of filters that can be surfaced

Page 14: Revolutionizing enterprise web development

Example: Drupal.org

Page 15: Revolutionizing enterprise web development

Example: Drupal.org

Page 16: Revolutionizing enterprise web development

Solr hooks

• Add new data to the index

• By default, all data displayed on the node view is indexed. We can also set up additional information to be indexed and/or filtered even if the information is not on the node page.

• It’s worth taking a look at apachesolr_node_to_document (in apachesolr.index.inc)

Page 17: Revolutionizing enterprise web development

Solr hooks

• hook_apachesolr_update_index (&$document, $node, $namespace)

• Allows a module to change the contents of the $document object before it is sent to the Solr Server

Page 18: Revolutionizing enterprise web development

Solr hooks

• Altering the query (3 possible methods)

• hook_apachesolr_prepare_query(&$query, &$params, $caller)

• Occurs before the query is cached

• Modifications you make can be used by others

Page 19: Revolutionizing enterprise web development

Solr hooks

Page 20: Revolutionizing enterprise web development

Solr hooks

• Altering the query (3 possible methods)

• hook_apachesolr_modify_query(&$query, &$params, $caller)

• Occurs after the query is cached

• Modifications that you don’t want other modules to inherit

Page 21: Revolutionizing enterprise web development

Solr hooks

Page 22: Revolutionizing enterprise web development

Solr hooks

• Altering the query (3 possible methods)

• <caller>_finalize_query (&$query, &$params)

• Occurs after the query is cached

• Technically only for use by modules originating Solr queries (aka custom Solr search invocations, not the search page)
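The ordering of the three methods can be modelled in a few lines. The sketch below is a language-neutral model written in Python, not the module's PHP code: the function name `run_query_pipeline` and the dict-based query are hypothetical, and it assumes the cached query is a snapshot taken between the prepare and modify stages, as the slides describe.

```python
import copy


def run_query_pipeline(query, prepare_hooks, modify_hooks, finalize=None):
    """Model the order: prepare hooks, cache snapshot, modify hooks, finalize."""
    # Stage 1: prepare hooks run before caching, so their changes are shared.
    for hook in prepare_hooks:
        hook(query)

    # The cached copy is what other code sees when it asks for the current query.
    cached = copy.deepcopy(query)

    # Stage 2: modify hooks run after caching, so their changes stay private.
    for hook in modify_hooks:
        hook(query)

    # Stage 3: the code that originated the search gets the last word.
    if finalize is not None:
        finalize(query)

    return query, cached


if __name__ == "__main__":
    sent, cached = run_query_pipeline(
        {"q": "solr", "fq": []},
        prepare_hooks=[lambda q: q["fq"].append("type:page")],
        modify_hooks=[lambda q: q["fq"].append("status:1")],
        finalize=lambda q: q.update(rows=5),
    )
    print("sent to Solr:", sent)
    print("seen by other modules:", cached)
```

In this model the `type:page` filter appears in both the sent and the cached query, while `status:1` and `rows` appear only in the query that is sent, which is the practical difference between the prepare stage and the two later stages.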

Page 23: Revolutionizing enterprise web development

Solr hooks

• hook_apachesolr_search_result_alter(&$doc, &$extra)

• Allows for modification of each search result independently

Page 24: Revolutionizing enterprise web development

Solr hooks

• hook_apachesolr_process_results(&$results)

• Allows for modification of all search results

Page 25: Revolutionizing enterprise web development

Solr hooks

• Not technically a hook, but worth noting that search theming is identical to that of the core search module.

• search-result.tpl.php

• search-results.tpl.php

• If you pass the same values from Solr as you had via node_load, the theming templates become interchangeable.

Page 26: Revolutionizing enterprise web development

Summary

• The Apachesolr module provides a replacement for core Drupal search, with better performance, scalability, and configurability than the Drupal default.

• Solr requires a separate service running on Jetty or Tomcat.

• hook_apachesolr_update_index provides a way to change what goes into the index.

• hook_apachesolr_prepare_query, hook_apachesolr_modify_query and <caller>_finalize_query allow the query to be modified.

• hook_apachesolr_search_result_alter & hook_apachesolr_process_results allow for result modification. Theming is the same as core.

Page 27: Revolutionizing enterprise web development

Thank You

Bill O’Connor, CTO

d.o: csevb10

t: csevb10

e: [email protected]