Ferret: A Ruby Search Engine
Brian Sam-Bodden
Agenda
• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?
• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene search engine
• Ported to Ruby by David Balmain
• Initially a 100% pure Ruby port
• Since 0.9 many core functions are implemented in C
• Fast! Now faster than Lucene ;-)
Concepts
• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
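The four definitions above can be sketched with plain Ruby data structures. This is only a toy model for illustration, not Ferret's internal representation; the helper names field_terms and terms are hypothetical, and whitespace splitting stands in for a real analyzer.

```ruby
# Toy model of the concepts above; not how Ferret stores data internally.

# A document is a set of named fields.
doc0 = { title: "Ferret", body: "A Ruby search engine" }
doc1 = { title: "Lucene", body: "A Java search engine" }

# An index is a sequence of documents; a document's position acts as its id.
index = [doc0, doc1]

# A field is a named sequence of terms. Lowercasing and splitting on
# whitespace stands in for a real analyzer (hypothetical helper).
def field_terms(doc, field_name)
  doc.fetch(field_name).downcase.split
end

# A term is a text string keyed by its field name, so "ruby" in :title and
# "ruby" in :body are two different terms (hypothetical helper).
def terms(doc)
  doc.keys.flat_map do |name|
    field_terms(doc, name).map { |text| [name, text] }
  end
end

all_terms = index.map { |doc| terms(doc) }
puts all_terms.first.inspect
```

Keying each term by its field name is what later lets a query target one field (for example, only the title) without matching the same word in another field.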
Fields of a Document in an Index
• Fields are individually searchable units that can be:
• Stored: The original Terms of the field are stored
• Indexed: Inverted to rapidly find all Documents containing any of the Terms
• Tokenized: Individual Terms extracted are indexed
• Vectored: Frequency and location of Terms are stored
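To make the four field properties concrete, here is a minimal sketch for a single text field. The class ToyFieldIndex and its methods are hypothetical names invented for this example; it shows the ideas only and is not how Ferret implements them.

```ruby
# Toy illustration of stored / tokenized / indexed / vectored for one field.
# ToyFieldIndex is a hypothetical class, not part of Ferret.
class ToyFieldIndex
  attr_reader :stored, :postings, :vectors

  def initialize
    @stored   = {}                                   # doc id => original text
    @postings = Hash.new { |h, term| h[term] = [] }  # term => doc ids
    @vectors  = {}                                   # doc id => term => positions
  end

  def add(doc_id, text)
    @stored[doc_id] = text                           # Stored: keep the original
    tokens = text.downcase.scan(/\w+/)               # Tokenized: extract terms
    positions = Hash.new { |h, term| h[term] = [] }
    tokens.each_with_index { |term, pos| positions[term] << pos }
    positions.each_key { |term| @postings[term] << doc_id }  # Indexed: inverted
    @vectors[doc_id] = positions                     # Vectored: where and how often
    self
  end

  # All documents containing the term, answered from the inverted index.
  def docs_containing(term)
    @postings.fetch(term.downcase, [])
  end

  # How often the term occurs in one document, answered from its term vector.
  def frequency(doc_id, term)
    @vectors.fetch(doc_id, {}).fetch(term.downcase, []).size
  end
end

idx = ToyFieldIndex.new
idx.add(0, "the quick brown fox").add(1, "The lazy dog and the fox")
p idx.docs_containing("fox")  # => [0, 1]
p idx.frequency(1, "the")     # => 2
p idx.vectors[1]["the"]       # => [0, 4]
```

The inverted postings answer "which documents contain this term?" in one lookup, while the per-document vector keeps the frequency and location data that features such as phrase matching or highlighting would need.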
It’s all about Indexing
• Indexing is the processing of a source document into plain text tokens that Ferret can manipulate
• For non-plain-text sources such as PDF, Word, or Excel you need to:
• Extract the text
• Analyze it into tokens
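The two steps can be sketched as a small pipeline. This is a rough illustration under stated assumptions: an HTML fragment stands in for the non-plain-text source (PDF, Word, or Excel would each need a format-specific extraction tool), and extract_text, analyze, and the stop-word list are hypothetical names and values chosen for the example, not Ferret's analyzers.

```ruby
# Step 1, Extract: turn the source into plain text.
# Naive stand-in that strips tags from an HTML fragment (hypothetical helper).
def extract_text(html)
  html.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
end

# Step 2, Analyze: turn plain text into normalized tokens.
# Lowercases, splits on non-alphanumerics, and drops a few stop words
# (the stop-word list is an arbitrary example).
def analyze(text, stop_words: %w[a an and the of to is])
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

source = "<h1>Ferret</h1><p>The Ruby <b>search</b> engine</p>"
text   = extract_text(source)
p text           # => "Ferret The Ruby search engine"
p analyze(text)  # => ["ferret", "ruby", "search", "engine"]
```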
Installing Ferret
gem install ferret
• Pick the latest version for your platform
The Recipe
1. Create some Documents
2. Create an Index
3. Adding Documents to the Index
4. Perform some Queries
Example Documents: Create some Documents
"Any String is a Document"
["This", "is also", "a document"]
Ferret::Index::Index: Create an Index
• Indexes are encapsulated by the class
➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• An index can be persistent
➡ index = Ferret::I.new(:path => '/somepath')
• Or held completely in memory
➡ index = Ferret::I.new
Ferret::Index::Index: Adding Documents to the Index
• Index provides the add_document method
• It also provides the << alias
• Adding documents is then as easy as:
➡ index << "This is a document"
➡ index << {:first => "Bob", :last => "Smith"}
Ferret::Index::Index: Perform some Queries
• Index provides the search and search_each methods
• The search method takes a query and an optional set of parameters:
➡ search(query, options = {})
• The search_each method provides an iterator block
➡ search_each(query, options = {}) {|doc, score| ... }
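To show the shape of this interface without requiring the gem, here is a toy stand-in. ToySearchIndex is a hypothetical class written for this example: it only mimics the <<, search, and search_each call shapes described above, scores by simple term counts, and returns plain arrays, all of which are assumptions of the sketch rather than Ferret's behavior.

```ruby
# Toy stand-in that mimics the call shapes above; it is NOT Ferret.
class ToySearchIndex
  def initialize
    @docs = []
  end

  # Adds a document; returning self lets calls be chained.
  def <<(text)
    @docs << text
    self
  end
  alias_method :add_document, :<<

  # Returns [doc_id, score] pairs, best first. The score is just the number
  # of times the query words occur in the document (an assumption of this
  # sketch). The :limit option here belongs to the toy class.
  def search(query, options = {})
    limit = options.fetch(:limit, 10)
    words = query.downcase.split
    hits = @docs.each_with_index.map do |text, doc_id|
      tokens = text.downcase.scan(/\w+/)
      [doc_id, words.sum { |word| tokens.count(word) }]
    end
    hits.select { |_, score| score > 0 }
        .sort_by { |doc_id, score| [-score, doc_id] }
        .first(limit)
  end

  # Yields each hit's document id and score to the block.
  def search_each(query, options = {})
    search(query, options).each { |doc_id, score| yield doc_id, score }
  end
end

index = ToySearchIndex.new
index << "This is a document" << "Ruby search in Ruby" << "Another document about search"
index.search_each("ruby search") do |doc_id, score|
  puts "doc #{doc_id} scored #{score}"
end
```

With the three documents above, the block runs twice: first for document 1 (score 3), then for document 2 (score 1). Document 0 matches nothing and is never yielded.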
Playing with Ferret in irb
Ferret Query Language
• Ferret’s own query language, FQL, is a powerful way to specify search queries
• FQL supports many query types, including:
• Term
• Phrase
• Field
• Boolean
• Range
• Wildcard
• Fuzzy
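For orientation, one example query string per type is listed below. The strings follow the FQL syntax as recalled from Ferret's query parser documentation and are not taken from the slides, so treat them as assumptions and check them against your installed version; the field names (title, date) and the helper fql_examples are made up for the example.

```ruby
# One illustrative FQL string per query type. The syntax is recalled from
# Ferret's query parser documentation and should be verified; the field
# names are hypothetical.
def fql_examples
  {
    term:     'ferret',                    # a single word
    phrase:   '"ruby search engine"',      # words in this exact order
    field:    'title:ferret',              # restrict the term to one field
    boolean:  'ruby AND NOT java',         # combine clauses
    range:    'date:[20060101 20061231]',  # inclusive range on a field
    wildcard: 'ferr*',                     # * matches any run of characters
    fuzzy:    'ferrit~'                    # terms similar to the one given
  }
end

fql_examples.each do |type, query|
  puts "#{type.to_s.ljust(8)} #{query}"
end
```

Each string would be passed as the query argument of search or search_each.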
Index.explain
• The explain method of Index describes how a document scores against a query
• Very useful for debugging
• and for learning how Ferret works
Ferret in your App
[Diagram: the Application gathers data (File System, Database, Web, Manual Input) and indexes documents into the Ferret Index. It gets the User’s query, searches the Index, and presents the search results to the User.]
Ferret in Rails
• Acts As Ferret is an ActiveRecord extension
• Available as a plugin
• Provides a simplified interface to Ferret
• Maintained by Jens Krämer
Ferret in Rails
• Adding an index to an ActiveRecord model takes a single acts_as_ferret declaration in the model class
• The simple model used here has two searchable fields, title and body
• After a quick rake db:migrate we now have some data to play with
• Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
Want more?
• Ferret is improving constantly
• Acts As Ferret seems to catch up quickly
• Real-life usage seems to require some good engineering on your part
• Background indexing
• Hot swap of indexes?
• We only covered the simplest constructs in Ferret
• Ferret’s API provides enough flexibility for the most demanding searching needs
Online Resources
• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!