www.SmarteGuru.com

Ferret Integration Tutorial Using acts_as_ferret in Ruby on Rails

Ferret is a high-performance text search engine library for Ruby, based on Apache Lucene. The Ferret full-text engine is fast and flexible, but it takes more programming than a MySQL full-text index.

Installing Ferret is easy:

gem install ferret

If you look under the hood, you'll notice that Ferret is a small amount of Ruby code bound to a large amount of C code. Ferret was designed with Ruby in mind, not Ruby on Rails in particular, and if you're hardcore, the Ferret API documentation includes a pretty good Ferret tutorial.

Acts_as_Ferret

Luckily for us rapid Rails developers, Jens Krämer wrote Acts As Ferret, which gives us a very simple interface for creating complex search indexes in very little time.

Acts As Ferret can be installed as a plugin:

ruby script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret



Setup

The simplest setup is:

class Article < ActiveRecord::Base
  acts_as_ferret
end

This is enough to get the full-text engine working. Now you can test it in the Rails console:

Article.find_by_contents("sybase")

If you have a lot of data to index, be patient with the first run: it is slow because the index has to be built.

Called without arguments, acts_as_ferret automatically indexes all fields of Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set:

# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]

Or you can widen it, for example with a computed field:

class Article < ActiveRecord::Base
  acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]

  # Computed field: true when the body is longer than 40 characters
  def long_article
    self.body.length > 40
  end
end
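Ferret indexes text, so the boolean returned by long_article presumably ends up in the index as the string "true" or "false". That is an assumption about how acts_as_ferret converts non-string values, but it is consistent with the long_article:(false) query in the next section. A plain-Ruby sketch of the value being produced, using a hypothetical Struct stand-in instead of the ActiveRecord model:

```ruby
# Plain-Ruby stand-in for Article (no ActiveRecord); ArticleDoc is hypothetical.
ArticleDoc = Struct.new(:body) do
  # Same rule as Article#long_article: true when the body exceeds 40 characters
  def long_article
    body.length > 40
  end
end

short_one = ArticleDoc.new("Sybase tips")
long_one  = ArticleDoc.new("Sybase replication server setup, step by step guide")
short_one.long_article.to_s  # => "false"
long_one.long_article.to_s   # => "true"
```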

Note 1: see usage of long_article in Query syntax below
Note 2: once you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/<environment>/<model name> folder (for example index/development/Article). It is recreated automatically on the next search request.
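Deleting that folder by hand is easy to script. A minimal sketch, assuming the index lives under the Rails root in the index/<environment>/<model name> layout described in Note 2; remove_ferret_index is a hypothetical helper, not part of acts_as_ferret:

```ruby
require "fileutils"

# Hypothetical helper: deletes the on-disk Ferret index of one model so that
# it is rebuilt on the next search. Stop the application before calling it.
def remove_ferret_index(rails_root, environment, model_name)
  dir = File.join(rails_root, "index", environment, model_name)
  FileUtils.rm_rf(dir)  # no error if the folder is already gone
  dir
end
```

In a Rails 1.x/2.x app this might be called as remove_ferret_index(RAILS_ROOT, RAILS_ENV, "Article"). Other models' indexes in the same environment folder are left alone.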
Query syntax
Since Ferret is a port of the Lucene engine, it uses the same query syntax. Here are just a few of the queries you can use:

# Search for pages with "sybase" keyword
Article.find_by_contents("sybase")

# "sybase" and "replication" keywords
Article.find_by_contents("sybase replication")

# "sybase" or "replication"
Article.find_by_contents("sybase OR replication")

# short articles about sybase
Article.find_by_contents("long_article:(false) *:sybase")

# articles containing similar words like "increase"
# will return e.g. increasing
Article.find_by_contents("increase~")
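To see what the AND and OR queries above do with the matching documents, here is a toy in-memory inverted index in plain Ruby. It only illustrates the set logic (intersection for AND, union for OR); Ferret's real index also scores, analyzes and stores term positions, and ToyIndex is a hypothetical class:

```ruby
require "set"

# Toy inverted index: maps each lowercase word to the set of document ids
# that contain it.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new }
  end

  # Split text into lowercase words and record the document id for each
  def add(id, text)
    text.downcase.scan(/\w+/).each { |term| @postings[term] << id }
  end

  # AND: every term must match, so intersect the posting sets
  def search_and(*terms)
    sets = terms.map { |t| @postings.fetch(t.downcase, Set.new) }
    sets.reduce { |acc, s| acc & s } || Set.new
  end

  # OR: any term may match, so take the union of the posting sets
  def search_or(*terms)
    terms.map { |t| @postings.fetch(t.downcase, Set.new) }.reduce(Set.new, :|)
  end
end
```

For example, after adding "Sybase replication server" as document 1, "Sybase ASE tuning" as 2 and "MySQL replication" as 3, search_and("sybase", "replication") returns only document 1, while search_or returns all three.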

Pagination
Ferret is fast and flexible, but… its results are not ActiveRecord finder results, so you cannot use the built-in Rails pagination. You have to implement it yourself. Here is how we did it in our projects. Your model would have this method:

def self.full_text_search(q, options = {})
  # An empty result set (not nil) so callers can still destructure it
  return [0, []] if q.nil? or q == ""
  default_options = {:limit => 10, :page => 1}
  options = default_options.merge options
  # get the offset based on what page we're on
  options[:offset] = options[:limit] * (options.delete(:page).to_i - 1)
  # now do the query with our options; calling find_by_contents on the
  # current class lets this method work in any model (Member, Book, ...)
  results = find_by_contents(q, options)
  return [results.total_hits, results]
end
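The offset line is what turns a page number into something Ferret understands. Pulled out into a standalone method (page_offset is a hypothetical name, for illustration only), the mapping looks like this:

```ruby
# Hypothetical helper mirroring the offset calculation in full_text_search:
# page 1 starts at offset 0, page 2 at offset `limit`, and so on.
def page_offset(page, limit = 10)
  page = page.to_i      # params arrive as strings (or nil)
  page = 1 if page < 1  # guard against "0", "" or negative page params
  limit * (page - 1)
end

page_offset(1)        # => 0
page_offset("3")      # => 20
page_offset(nil, 25)  # => 0
```

The guard for pages below 1 is an addition: the model method computes a negative offset for page 0.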

Then in your application.rb:

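# Builds a Paginator for `size` hits; :per_page should match the :limit
# used in full_text_search (10 by default in both)
def pages_for(size, options = {})
  default_options = {:per_page => 10}
  options = default_options.merge options
  Paginator.new self, size, options[:per_page], (params[:page]||1)
end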
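The Paginator needs only the total hit count, the page size and the current page. A minimal stand-in showing the arithmetic behind it and the previous/next checks the view relies on; SimplePages is hypothetical and is not the Rails Paginator class:

```ruby
# Hypothetical stand-in showing the arithmetic behind a paginator.
class SimplePages
  attr_reader :current, :page_count

  def initialize(total_hits, per_page, current)
    # Ceiling division, with at least one (possibly empty) page
    @page_count = [(total_hits + per_page - 1) / per_page, 1].max
    # Clamp the requested page (often a string from params) into range
    @current = [[current.to_i, 1].max, @page_count].min
  end

  # nil on the first page, which is what the view's "if" checks rely on
  def previous_page
    current > 1 ? current - 1 : nil
  end

  # nil on the last page
  def next_page
    current < page_count ? current + 1 : nil
  end
end

pages = SimplePages.new(45, 10, "5")
pages.page_count     # => 5
pages.previous_page  # => 4
pages.next_page      # => nil
```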

Then in your controller:

def search
  @query = params[:query]
  @total, @members = Member.full_text_search(@query, :page => (params[:page]||1))
  @pages = pages_for(@total)
end

Then your member view can use the standard pagination helpers:
<%= link_to 'Previous page', { :page => @pages.current.previous, :query => @query} if @pages.current.previous %>
<%= pagination_links(@pages, :params => { :query=> @query }) %>
<%= link_to 'Next page', { :page => @pages.current.next, :query => @query} if @pages.current.next %>

Additional Query Strings
There are a few more things you can do with your query strings. Here are a couple of examples:
id:29889: the record whose id is 29889
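Venkatesh Reddy: topics containing “Venkatesh” or “Reddy” when the query parser's :or_default option is on (Ferret's own default); with acts_as_ferret's default, a space means AND, as in the “sybase replication” example above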
+Venkatesh +Reddy: topics containing “Venkatesh” and “Reddy”
"Venkatesh vs. Reddy": topics containing the phrase “Venkatesh vs. Reddy”
Ve*: words beginning with “Ve”
Venkatesh -Rails: topics containing “Venkatesh”, but not “Rails”
Venkateshinrails~: words similar to “Venkateshinrails”, e.g. “Venkateshonrails”
contact_name:(+Venkatesh +Reddy): topics containing both “Venkatesh” and “Reddy” in the contact name
Some more complex queries:

? matches any single character, so to search for “text” or “test” you can use: te?t

Range queries can include or exclude their upper and lower bounds: square brackets make the range inclusive, curly braces make it exclusive. Values are compared lexicographically, as strings: reg_date:[20020101 TO 20030101]

You can also use range queries on non-date fields: title:{Aida TO Carmen}
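Because range bounds are compared as strings, numbers only behave in a range query if they all have the same width. A quick plain-Ruby check of that string ordering, with zero padding via format (an illustration of the comparison rule, not Ferret code):

```ruby
# Range bounds are compared character by character, as strings.
raw    = ["9", "10", "100"]
padded = raw.map { |n| format("%05d", n.to_i) }

raw.sort     # => ["10", "100", "9"]            string order, not numeric
padded.sort  # => ["00009", "00010", "00100"]   numeric order preserved

# A zero-padded value falls inside a range exactly when its number does
("00009".."00020").cover?("00010")  # => true
("9".."20").cover?("10")            # => false
```

This is why the reg_date example uses fixed-width YYYYMMDD values.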

To search for documents that contain “Venkatesh Reddy” but not “Reddy Venkatesh”, use the query: "Venkatesh Reddy" NOT "Reddy Venkatesh"

To search for documents containing “website” and either “Venkatesh” or “Reddy”, use the query:
(Venkatesh OR Reddy) AND website

To search for a title that contains both the word “return” and the phrase “pink panther”, use the query: title:(+return +"pink panther")
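When queries come from a search form with separate “all words”, “any of these words”, “none of these words” and “exact phrase” boxes, the syntax above can be assembled in code. A minimal sketch; build_query and its option names are hypothetical, and it does not escape special characters such as : ( ) or ", so untrusted input would still need sanitising:

```ruby
# Hypothetical helper that assembles a Lucene/Ferret-style query string.
def build_query(options = {})
  all    = options[:all]  || []  # required words       -> +word
  any    = options[:any]  || []  # at least one of them -> +(a OR b)
  none   = options[:none] || []  # excluded words       -> -word
  phrase = options[:phrase]      # exact phrase         -> +"some phrase"

  parts = all.map { |w| "+#{w}" }
  parts << "+(#{any.join(' OR ')})" unless any.empty?
  parts << %Q{+"#{phrase}"} if phrase && !phrase.empty?
  parts.concat(none.map { |w| "-#{w}" })
  parts.join(" ")
end

build_query(:all => ["website"], :any => ["Venkatesh", "Reddy"])
# => "+website +(Venkatesh OR Reddy)"
```

The first call builds the same query as “(Venkatesh OR Reddy) AND website”; build_query(:all => ["return"], :phrase => "pink panther") gives the inner part of the title example.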

Adding Non-Model or Non-Standard Fields

Let's change our example. Say we have Books, and each Book has many Authors. What if I want my search to cover not only book titles but also the books' authors?
The obvious problem here is that I'm dealing with two tables: my authors' names are in the Author table and my book titles are in the Book table. I don't want to have to search multiple indexes, so how do I do this?
Well, you’d change your /models/book.rb to look like this:

class Book < ActiveRecord::Base
  has_many :authors
  acts_as_ferret :fields => [:title, :author_name]

  # Computed field: every author's full name in one searchable string
  def author_name
    authors.map { |a| "#{a.first_name} #{a.last_name}" }.join(" ")
  end
end
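Outside Rails, a computed field like this is just a method returning a string. A sketch with hypothetical Struct stand-ins for the two models (no database involved) shows what would go into the index for a book with two authors:

```ruby
# Plain-Ruby stand-ins for the Author and Book models (no ActiveRecord).
AuthorDoc = Struct.new(:first_name, :last_name)

BookDoc = Struct.new(:title, :authors) do
  # Same idea as Book#author_name: all author names in one string
  def author_name
    authors.map { |a| "#{a.first_name} #{a.last_name}" }.join(" ")
  end
end

book = BookDoc.new("Example Title",
                   [AuthorDoc.new("Jane", "Doe"), AuthorDoc.new("John", "Smith")])
book.author_name  # => "Jane Doe John Smith"
```

Because the names are joined with spaces, the tokenizer splits them back into separate words, so a search for any single first or last name matches the book.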

That’s it! Now when I search books, I search the author name as well!
You can index anything a model method returns, and you can even reformat your fields along the way.
You would do something similar if you were using acts_as_taggable and wanted to make your tags searchable. If Book were taggable, your model might look like this:

class Book < ActiveRecord::Base
  acts_as_taggable
  acts_as_ferret :fields => [:title, :tags_with_spaces]

  # Computed field: the tag names as one space-separated string
  def tags_with_spaces
    return self.tag_names.join(" ")
  end
end

If you are using the acts_as_taggable plugin, you might not even need the extra method: just use :tag_list in the Ferret field list.
Either way, now your tags get searched when you search the index.

Sorting
So far, all results have been sorted by search score, which is what you'll want most of the time. But what if you want to sort by another field, such as the book title?
The first thing you need to do is make sure the field you sort by is untokenized. Unfortunately, an untokenized field is indexed as a single term, so it no longer matches searches for the individual words in it. This makes for slightly funky code.
So if I want to sort by title but still search by title, I index the title twice, once as a normal field and once as an untokenized copy:

acts_as_ferret :fields => {
:title => {},
:tags_with_spaces => {},
:title_for_sort => {:index => :untokenized}
}
# Untokenized copy of the title, used only for sorting
def title_for_sort
  return self.title
end

Remember, if you change something in this acts_as_ferret line you’ll want to regenerate your index. You can do this by deleting your /index directory and restarting your server.
I can then get the results back in title order with:

s = Ferret::Search::SortField.new(:title_for_sort, :reverse => false)
@total, @books = Book.full_text_search(@query,
  {:page => (params[:page]||1), :sort => s})

Lastly, if you want to sort by date, you may need to convert the date field to an integer.
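One way to do that is a computed field that turns the date into a YYYYMMDD integer, which sorts, and works in range queries like the reg_date example, in chronological order. A plain-Ruby sketch; date_for_sort is a hypothetical helper, and in the model you would wrap it in a method (say, a hypothetical published_on_for_sort) listed as an :untokenized field, like title_for_sort:

```ruby
require "date"

# Hypothetical computed field: 2007-03-15 becomes 20070315, so numeric
# order (and fixed-width string order) matches chronological order.
def date_for_sort(date)
  date.strftime("%Y%m%d").to_i
end

dates = [Date.new(2007, 3, 15), Date.new(2006, 12, 1), Date.new(2007, 1, 9)]
dates.map { |d| date_for_sort(d) }.sort  # => [20061201, 20070109, 20070315]
```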

Highlighting
To highlight matching terms in the search results, as Google does:

<% @books.each do |book| %>
<%= book.highlight("Jason", :field => :author_name, :num_excerpts => 1, :pre_tag => "<strong>", :post_tag => "</strong>") %>
<% end %>
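At its simplest, highlighting means wrapping each matched term in the pre and post tags. A toy plain-Ruby version for a single term; toy_highlight is hypothetical, and Ferret's own highlighter works from the stored field and the full query, and also builds excerpts:

```ruby
# Toy highlighter: wraps every whole-word, case-insensitive match of term.
def toy_highlight(text, term, pre_tag = "<strong>", post_tag = "</strong>")
  text.gsub(/\b#{Regexp.escape(term)}\b/i) { |m| "#{pre_tag}#{m}#{post_tag}" }
end

toy_highlight("Jason Smith and jason jones", "Jason")
# => "<strong>Jason</strong> Smith and <strong>jason</strong> jones"
```

Regexp.escape keeps search terms such as "c++" from being read as regular expression syntax.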

Enjoy !!!

All rights reserved © 2007 SmarteGuru.com.