Ferret: A Ruby Search Engine

Post on 02-Nov-2014

Brian Sam-Bodden

Agenda

• What is Ferret?

• Concepts

• Fields

• Indexing

• Installing Ferret

• The Recipe

• Documents

• Ferret::Index::Index

• FQL

• Ferret in your App

• Ferret in Rails

• Resources

What is Ferret?

• Information Retrieval (IR) Library

• Full-featured Text Search Engine

• Inspired by the Apache Lucene search engine

• Ported to Ruby by David Balmain

• Initially a 100% pure Ruby port

• Since 0.9 many core functions are implemented in C

• Fast! Now Faster than Lucene ;-)

Concepts

• Index : Sequence of documents

• Document : Sequence of fields

• Field : Named sequence of terms

• Term : A text string, keyed by field name
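The four concepts nest inside one another, which a few lines of plain Ruby can make concrete. This is only a toy model under simple assumptions (a Hash per document, whitespace splitting); the Term struct and the terms_of helper are hypothetical names and say nothing about how Ferret stores its data internally.

```ruby
# Toy model of the four concepts; NOT Ferret's internal representation.

# A Term is a text string keyed by the name of the field it came from.
Term = Struct.new(:field, :text)

# A Document is a sequence of named fields (here: field name => text).
doc_a = { title: "Ferret", body: "a ruby search engine" }
doc_b = { title: "Lucene", body: "a java search engine" }

# An Index is a sequence of Documents; the array position acts as a doc id.
index = [doc_a, doc_b]

# Break every field of a document into its terms (naive whitespace split).
def terms_of(document)
  document.flat_map do |field, text|
    text.split.map { |word| Term.new(field, word) }
  end
end

terms = terms_of(index[0])
puts terms.map { |t| "#{t.field}:#{t.text}" }.join(" ")
```

Because a term carries its field name, the word "search" in a title and the same word in a body are two different terms in this model.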

Fields of a Document in an Index

• Fields are individually searchable units that can be:

• Stored: The original text of the field is stored

• Indexed: Inverted to rapidly find all Documents containing any of the Terms

• Tokenized: The field's text is split into individual Terms, which are then indexed

• Vectored: Frequency and location of Terms are stored
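To see what these four properties mean in practice, here is a minimal pure-Ruby sketch for a single field. ToyFieldIndex and its methods are hypothetical names invented for this illustration; Ferret's real implementation is written in C and is far more elaborate.

```ruby
# Toy illustration of field properties; names are hypothetical, not Ferret's API.
class ToyFieldIndex
  attr_reader :stored, :postings

  def initialize
    @stored   = {}                                   # doc id => original text ("stored")
    @postings = Hash.new { |h, term| h[term] = {} }  # term => { doc id => [positions] }
  end

  def add(doc_id, text)
    @stored[doc_id] = text
    # "Tokenized": split the text into individual terms.
    tokens = text.downcase.scan(/\w+/)
    # "Indexed" and "vectored": invert the tokens and record where each occurs.
    tokens.each_with_index do |term, position|
      (@postings[term][doc_id] ||= []) << position
    end
  end

  # Ids of all documents containing the term, found without scanning any text.
  def docs_with(term)
    @postings.fetch(term.downcase, {}).keys
  end

  # Term frequency and positions of a term within one document.
  def vector(term, doc_id)
    positions = @postings.fetch(term.downcase, {}).fetch(doc_id, [])
    { frequency: positions.length, positions: positions }
  end
end

fields = ToyFieldIndex.new
fields.add(0, "Ruby is fun and Ruby is fast")
fields.add(1, "Lucene is written in Java")

p fields.docs_with("is")    # ids of both documents
p fields.vector("ruby", 0)  # frequency 2, at positions 0 and 4
```

Keeping the original text and the inverted postings in separate structures mirrors the idea that storing and indexing are independent choices for a field.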

It’s all about Indexing

• Indexing is the process of turning a source document into plain-text tokens that Ferret can manipulate

• For any non-plaintext sources such as PDF, Word, or Excel you need to:

• Extract the plain text from the source

• Analyze the extracted text into tokens
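The extract and analyze steps can be sketched as a two-stage pipeline. The helpers below (extract_text, analyze, STOP_WORDS) are hypothetical and use HTML-like markup as a stand-in source; real PDF, Word, or Excel files would need a dedicated extraction tool, and Ferret ships its own analyzers that are not shown here.

```ruby
# Hypothetical helpers for illustration; none of these names come from Ferret.
STOP_WORDS = %w[a an and the of to is].freeze

# "Extract": reduce a non-plaintext source to plain text.
# A regular expression is enough for this toy markup only.
def extract_text(markup)
  markup.gsub(/<[^>]+>/, " ").squeeze(" ").strip
end

# "Analyze": turn plain text into normalized tokens
# (lowercase, alphanumeric runs only, stop words dropped).
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

source = "<h1>Ferret</h1><p>The Ruby search engine</p>"
p analyze(extract_text(source))
```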

Installing Ferret

gem install ferret

• Pick the latest version for your platform

The Recipe

1. Create some Documents

2. Create an Index

3. Add Documents to the Index

4. Perform some Queries

Create some Documents

Example Documents

"Any String is a Document"

["This", "is also", "a document"]

Create an Index

Ferret::Index::Index

• Indexes are encapsulated by the class

➡ Ferret::Index::Index

• Use the alias Ferret::I for convenience

• An Index can be persistent

➡ index = Ferret::I.new(:path => '/somepath')

• Or, completely in memory

➡ index = Ferret::I.new()

Add Documents to the Index

Ferret::Index::Index

• Index provides the add_document method

• It also provides the << alias

• Adding documents is then as easy as:

➡ index << "This is a document"

➡ index << {:first => "Bob", :last => "Smith"}

Perform some Queries

Ferret::Index::Index

• Index provides the search and search_each methods

• The search method takes a query and an optional set of parameters:

➡ search(query, options = {})

• The search_each method takes the same arguments and yields each hit to a block

➡ search_each(query, options = {}) {|doc, score| ... }

Playing with Ferret in irb

Ferret Query Language

• Ferret's own query language, FQL, is a powerful way to specify search queries

• FQL supports many query types, including:

• Term

• Phrase

• Field

• Boolean

• Range

• Wildcard

• Fuzzy
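For orientation, one example query string per type might look like the following. The field names (title, created_on) are made up, and the syntax is written from memory of Ferret's query parser, so treat it as an assumption to verify against the Ferret documentation rather than as a reference.

```ruby
# Example FQL strings, one per query type in the list.
# Field names are hypothetical; syntax should be checked against Ferret's docs.
FQL_EXAMPLES = {
  term:     'ruby',
  phrase:   '"search engine"',
  field:    'title:ferret',
  boolean:  'ruby AND rails',
  range:    'created_on:[20060101 20061231]',
  wildcard: 'ferr*',
  fuzzy:    'feret~'
}.freeze

FQL_EXAMPLES.each { |type, query| puts format("%-9s %s", type, query) }
```

Each of these strings is the kind of value that would be passed as the query argument to search or search_each.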

Index.explain

• The explain method of Index describes how a document scores against a query

• Very useful for debugging

• Also useful for learning how Ferret works

Ferret in your App

[Diagram: the Application gathers data (File System, Database, Web, Manual Input), gets the User’s query, and presents search results; Ferret indexes documents into the Index and searches the Index]

Ferret in Rails

• Acts As Ferret is an ActiveRecord extension

• Available as a plugin

• Provides a simplified interface to Ferret

• Maintained by Jens Krämer

• Adding an index to an ActiveRecord model takes little more than an acts_as_ferret declaration in the model class

• The simple example model has two searchable fields, title and body

• After a quick rake db:migrate we now have some data to play with

• Fire up the Rails Console and let’s see what acts_as_ferret can do for our models

Want more?

• Ferret is improving constantly

• Acts As Ferret seems to catch up quickly

• Real-life usage seems to require some good engineering on your part

• Background indexing

• Hot swap of indexes?

• We only covered the simplest constructs in Ferret

• Ferret’s API provides enough flexibility for the most demanding searching needs

Online Resources

• http://ferret.davebalmain.com

• http://lucene.apache.org

• http://lucenebook.com

• http://projects.jkraemer.net/acts_as_ferret

Thanks!