
Improve indexing speed

Learn how to optimize the rate at which your application indexes documents to Solr.

Typically the biggest bottlenecks to indexing speed are at the application's end. The largest is simply fetching data from the database, followed by serializing that data into XML. Posting that data over the network to Solr is fairly negligible if you are located in the same EC2 region as our servers, but can be another factor if you are not.

For the database bottlenecks, the worst offenders are typically indexing associated objects in other tables. Consider writing a reindexing task that uses joins for eager loading, so you avoid making multiple trips to the database per record. You should also make sure that you have the correct database indices set up for the relevant foreign key joins.

For Ruby on Rails applications, something like this is a good approximation of the Sunspot `rake sunspot:reindex` task, and can be optimized a bit for your application:

```ruby
# find_in_batches yields arrays of records, so associations are eager-loaded
# once per batch rather than once per record.
Post.includes(:author).find_in_batches(batch_size: 10) do |posts|
  posts.each { |post| post.solr_index }
end
```

If you're using a background job processor like DelayedJob or Resque, it can help a great deal to queue up individual indexing jobs and use multiple workers to index in parallel. This also speeds up your application, because your users don't have to wait for Solr updates during normal usage.

With DelayedJob, a simple approach could look like this:

```ruby
class Post
  searchable do
    # …
  end
  handle_asynchronously :solr_index
end
```

Anecdotally, we know of setups using dozens of background job processors to reindex many millions of records in an hour or two.
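To see why batching matters, here is a minimal, self-contained sketch of the round-trip savings. `FakeIndexer` is purely illustrative (it is not Sunspot's API); it just counts how many network-style requests would be made when indexing one record at a time versus in batches:

```ruby
# Illustrative stand-in for an indexing client: each call to #index
# represents one request to Solr, regardless of how many docs it carries.
class FakeIndexer
  attr_reader :requests

  def initialize
    @requests = 0
  end

  def index(docs)
    @requests += 1
  end
end

records = (1..1000).to_a

# One request per record: 1000 round trips.
per_record = FakeIndexer.new
records.each { |r| per_record.index([r]) }

# Batches of 100: 10 round trips for the same data.
batched = FakeIndexer.new
records.each_slice(100) { |batch| batched.index(batch) }

puts per_record.requests  # => 1000
puts batched.requests     # => 10
```

The same reasoning applies to database fetches: eager loading with `includes` turns N+1 queries into a handful of batched ones.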
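The parallel-worker pattern described above can be sketched with plain Ruby threads. This is a toy model, not DelayedJob or Resque: the shared `Queue` stands in for the job backend, each thread stands in for a worker process, and pushing to `indexed` stands in for calling `post.solr_index`:

```ruby
# Toy model of N workers draining a queue of indexing jobs in parallel.
jobs = Queue.new
(1..100).each { |post_id| jobs << post_id }

indexed = Queue.new  # thread-safe collector standing in for Solr updates

workers = 4.times.map do
  Thread.new do
    # Non-blocking pop raises ThreadError when the queue is empty,
    # which ends this worker's loop.
    while (post_id = (jobs.pop(true) rescue nil))
      indexed << post_id  # stand-in for post.solr_index
    end
  end
end

workers.each(&:join)
puts indexed.size  # => 100
```

In a real deployment each "worker" is a separate DelayedJob or Resque process, so the parallelism also spans machines, which is how setups with dozens of workers reach millions of records in an hour or two.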