Sunspot

What is Sunspot?

Sunspot is a Solr client written in Ruby and based on the RSolr project. It provides an interface between an application (usually Ruby on Rails) and a Solr index, so the application can index and query data with Solr as the search engine. A few lines of code are enough to set up a basic search function.

This document assumes Rails is the underlying framework.

Installation

Add the following to the Gemfile:

gem 'sunspot_rails'

Note on Sunspot Solr Binary

There is an optional gem, `sunspot_solr`, which offers a pre-packaged Solr distribution for local development. Some online guides recommend installing it and give examples of using it. This can be very confusing to new users, and we do not recommend it: there are no command line tools or Rake tasks for starting, stopping or restarting a Websolr index. Additionally, the pre-packaged Solr distribution may be a very different version from your Websolr index, which means search may behave differently in your local environment than it does against Websolr.

Once the Gemfile has been updated, install the gems with:

bundle install

Congratulations, Sunspot is now installed and ready to use!

Configuring Sunspot

Next, you will need to inform Sunspot about the location of your Websolr index. If you have an environment variable called WEBSOLR_URL that contains the URL to your index, then Sunspot will automatically read that. Heroku and Manifold users will have environments with this variable pre-configured.

If you are working in an environment other than Heroku or Manifold, then you can set this variable up manually. Log in to Websolr and navigate to the index dashboard, where the URL is displayed prominently.

The easiest way to create the application environment variable is to simply export it. Copy the URL to the clipboard and run:

export WEBSOLR_URL="paste the URL here"

This creates the variable in the current shell session and populates it with the Solr URL, which makes it available to Rails and Sunspot with the default settings.
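Before starting Rails, it can help to confirm that the variable is visible to Ruby and looks like an index URL. The following is a minimal sketch using only Ruby's standard library; the helper name `websolr_url_problem` is hypothetical and is not part of Sunspot.

```ruby
require 'uri'

# Hypothetical helper (not part of Sunspot): returns nil when the value looks
# like a usable index URL, otherwise a short description of the problem.
def websolr_url_problem(value)
  return 'WEBSOLR_URL is not set' if value.nil? || value.strip.empty?

  uri = URI.parse(value.strip)
  return 'URL must start with http:// or https://' unless uri.is_a?(URI::HTTP)
  return 'URL has no hostname' if uri.host.nil? || uri.host.empty?
  return 'URL has no path to the index' if uri.path.nil? || uri.path.empty? || uri.path == '/'

  nil
rescue URI::InvalidURIError
  'URL could not be parsed'
end

# Check the variable in the current environment and report the result.
problem = websolr_url_problem(ENV['WEBSOLR_URL'])
puts(problem ? "Problem: #{problem}" : 'WEBSOLR_URL looks usable')
```

Run this from the same shell session in which the variable was exported; a new terminal window will not see it unless the export is repeated there.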

Advanced Configuration (Optional)

If you want to have a little more control over the process, then you can create the default configuration file with:

rails generate sunspot_rails:install

This will create a file called ./config/sunspot.yml, which will look something like this:

production:
  solr:
    hostname: localhost
    port: 8983
    log_level: INFO

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO

test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING

Suppose your production index URL is https://us-east-1.websolr.com/solr/0a1b2c3d4e5f. You could create an entry like this in the ./config/sunspot.yml file:

production:
  solr:
    hostname: us-east-1.websolr.com
    port: 443
    scheme: https
    path: /solr/0a1b2c3d4e5f

This lets you configure a different index for each environment. For example, if you have a Solr instance running on localhost:8983 in a development or test environment, Sunspot can use that instead of the production index.
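If you would rather derive those four values than split the URL by hand, a short script can do it. This is a minimal sketch using Ruby's standard library; `solr_settings_from_url` is a hypothetical helper name, not something Sunspot provides, and the URL is the example one from above.

```ruby
require 'uri'
require 'yaml'

# Hypothetical helper (not part of Sunspot): split an index URL into the
# hostname, port, scheme and path keys used in config/sunspot.yml.
def solr_settings_from_url(url)
  uri = URI.parse(url)
  {
    'hostname' => uri.host,
    'port'     => uri.port, # URI fills in 443 for https and 80 for http
    'scheme'   => uri.scheme,
    'path'     => uri.path
  }
end

settings = solr_settings_from_url('https://us-east-1.websolr.com/solr/0a1b2c3d4e5f')

# Print a YAML fragment that can be reviewed and pasted into sunspot.yml.
puts({ 'production' => { 'solr' => settings } }.to_yaml)
```

Review the printed YAML before pasting it into the file, since the serializer may quote some values differently from the hand-written entry.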

Set up the objects

Any model that you want to have indexed and searchable by Solr needs to be configured with a searchable block. Here is a contrived example:

class Post < ActiveRecord::Base
  searchable do
    text    :title
    text    :body
    string  :permalink
    integer :category_id
    time    :published_at
  end
end

In this example, the Post model has a number of attributes the developer wants to index into Solr: title, body, a permalink, etc. Within the searchable block, the first token indicates the data type Solr should use to index the data, followed by the attribute, list or block.

For example:

text :title

This line instructs Sunspot that the Post#title data should be indexed into Solr using a text data type.

From the official documentation: "text" fields will be full-text searchable. Other fields (e.g., integer and string) can be used to scope queries.
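When you later inspect documents in Solr directly, the field names will not match the attribute names exactly, because Sunspot appends a type suffix to each one. The sketch below illustrates that naming for the Post example. The suffixes shown are what we understand Sunspot's default schema to use for single-valued, unstored fields; treat them as an assumption and confirm against your own index. The `solr_field_name` helper is hypothetical.

```ruby
# Hypothetical helper: the Solr field name a searchable declaration maps to.
# The suffix table is an assumption based on Sunspot's default schema.
def solr_field_name(type, attribute)
  suffixes = {
    text:    'text',
    string:  's',
    integer: 'i',
    time:    'd'
  }
  suffix = suffixes.fetch(type) { raise ArgumentError, "unknown type: #{type}" }
  "#{attribute}_#{suffix}"
end

# The fields declared in the Post searchable block.
post_fields = {
  title:        :text,
  body:         :text,
  permalink:    :string,
  category_id:  :integer,
  published_at: :time
}

post_fields.each do |attribute, type|
  puts "#{type} :#{attribute} -> #{solr_field_name(type, attribute)}"
end
```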

Indexing the Data

At this point, Sunspot has been configured to use your Websolr index and the models have been prepared with searchable blocks. Before the data can be searched, it must be indexed into Solr. Sunspot provides a rake task for this, along with methods you can call from a Rails console:

# From the shell: push all searchable data into Solr
bundle exec rake sunspot:reindex

# From a Rails console: push all searchable data for the Post model into Solr
Post.reindex

# From a Rails console: push specific objects into Solr
Sunspot.index! [post1, item2, person3]

Refer to the official documentation for more examples and details.

Indexing can take anywhere from a few seconds to many hours, depending on a variety of factors. For most users, it will take a few minutes or less.

Once indexing has completed, assuming no exceptions were raised, the data should be available in Solr. You can inspect the index in a couple of ways. One is to check the Solr index dashboard.

Note that the dashboard is not a real-time display. It can take 10-15 minutes for the counters to update, so if you index a number of documents and check the dashboard right away, they may not have been counted yet.

A quicker way to check is to query Solr directly, either with a tool like curl or via a browser. Simply append /select to the index URL, and you will get back some metadata about the index, as well as some documents:

# The wt=json, indent=true, rows=0 parameters are not required, but enhance readability:
curl -s "https://us-east-1.websolr.com/solr/a1b2c3d4e5/select?wt=json&indent=true&rows=0"
{
   "responseHeader":{
     "status":0,
     "QTime":0},
   "response":{"numFound":1234,"start":0,"docs":[]
   }}

The "numFound":1234 value indicates that the index contains 1,234 documents.
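The same check can be scripted in Ruby. The following is a minimal sketch using only the standard library; `select_url` and `document_count` are hypothetical helper names, and the sample body is the response shown above rather than a live request.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build the /select URL used in the curl example.
def select_url(index_url, rows: 0)
  uri = URI.parse(index_url.chomp('/') + '/select')
  uri.query = URI.encode_www_form(wt: 'json', indent: 'true', rows: rows)
  uri.to_s
end

# Hypothetical helper: pull the document count out of a Solr JSON response.
def document_count(response_body)
  JSON.parse(response_body).fetch('response').fetch('numFound')
end

# The response from the curl example, used here in place of a live request.
sample = '{"responseHeader":{"status":0,"QTime":0},' \
         '"response":{"numFound":1234,"start":0,"docs":[]}}'

puts select_url('https://us-east-1.websolr.com/solr/a1b2c3d4e5')
puts document_count(sample)
```

To run this against a real index, fetch the body with `Net::HTTP.get(URI(select_url(ENV['WEBSOLR_URL'])))` (after `require 'net/http'`) and pass it to `document_count`.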

Set up Search

At this point, Sunspot is aware of the Websolr index, the models have been configured, and the data is sitting in Solr. The last step is to query the data.

Sunspot provides a #search method that builds a Solr query from a Ruby DSL and passes the request to Websolr, where it is processed by your index. Solr returns a collection of IDs corresponding to matching records, and Sunspot uses this list to load the matched objects.

Here is a basic example:

Post.search { fulltext 'pizza' }

Based on the searchable block defined earlier, this will search both the title and body fields for any mention of "pizza". A slightly more complicated version of this passes a block to the fulltext method:

Post.search do
  fulltext 'pizza' do
    boost_fields :title => 2.0
  end
end

This performs the same search, but boosts documents with "pizza" in the title to a higher position in the search results.
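To build intuition for what a field boost does, here is a toy model in plain Ruby. It is not Solr's scoring algorithm, which weighs many more factors; it only counts occurrences of the term in each field and multiplies each count by that field's boost. The `toy_score` helper and the sample posts are made up for illustration.

```ruby
# Toy model only: count occurrences of the term in each field and weight
# the count by the field's boost (fields without a boost are weighted 1.0).
def toy_score(doc, term, boosts = {})
  doc.sum do |field, text|
    text.downcase.scan(term.downcase).size * boosts.fetch(field, 1.0)
  end
end

posts = [
  { title: 'Weeknight dinners',
    body:  'Pizza on Monday, pizza on Tuesday, pizza on Friday.' },
  { title: 'Pizza dough for pizza night',
    body:  'Flour, water, salt and yeast.' }
]

# Sort by descending score, first without a boost, then with title boosted.
unboosted = posts.sort_by { |post| -toy_score(post, 'pizza') }
boosted   = posts.sort_by { |post| -toy_score(post, 'pizza', title: 2.0) }

puts "Top result without boost: #{unboosted.first[:title]}"
puts "Top result with title boost: #{boosted.first[:title]}"
```

In this toy data, the post that mentions pizza most often in its body ranks first without the boost, while the post with pizza in its title ranks first once title matches count double.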

There is so much more that Sunspot can do. Please read the official documentation for more details and features.