HTTP 429: Too Many Requests

We limit the number of simultaneous requests that can be made to a server (in the case of Websolr, an index); these are called concurrent connections. The limit exists to mitigate DoS attacks, whether intentional or unintentional. It scales with plan level on our shared tier plans. Single-tenant plans don’t have concurrency limits beyond the limitations of the underlying hardware.

How do concurrent connections affect throughput?

Assume a typical request takes 100ms. With a limit of 5 concurrent connections, each connection can complete 10 requests per second, so you could serve 50 requests per second in total. Reducing latency to, say, 50ms (perhaps with field caching or by reducing query complexity) would double the available throughput to 100 requests per second.
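The arithmetic above can be sketched as a small helper. This is only an illustration: the function name `max_throughput` and its parameters are hypothetical, and the result is an upper bound that assumes every connection slot is kept busy and every request takes the same time.

```python
def max_throughput(concurrency_limit: int, latency_ms: float) -> float:
    """Upper bound on requests per second for a given connection limit."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # Each connection slot completes 1000 / latency_ms requests per second,
    # and there are concurrency_limit slots working in parallel.
    return concurrency_limit * 1000 / latency_ms


print(max_throughput(5, 100))  # 5 connections, 100ms per request
print(max_throughput(5, 50))   # same limit, latency halved
```

Real traffic is rarely this uniform, so treat the number as a ceiling rather than a forecast.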

Besides upgrading your plan level, one way to improve throughput is to batch your updates, that is, to send several documents per request rather than one at a time. If you have many workers handling document updates, you should also reduce that number so that not too many of them try to send documents at once. This frees up connections to handle search traffic. Keep in mind that both searches and updates count as concurrent connections, so if your limit is 5, you have 5 workers sending updates, and a user submits a search, one of those requests will fail with an HTTP 429.
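The batching described above might look something like the following sketch. The helper name `in_batches` and the sample documents are hypothetical, and the actual HTTP call is left out because it depends on your Solr client.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(docs: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size documents, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Hypothetical documents; a real application would load these from its database.
docs = [{"id": i} for i in range(2500)]
for batch in in_batches(docs, 1000):
    # One update request per batch instead of one per document.
    print(f"would send {len(batch)} documents in one request")
```

With 2,500 documents and a batch size of 1,000, this makes three update requests rather than 2,500, so each worker holds a connection far less often.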

Another strategy for avoiding concurrency limits is to use routing headers. These allow you to send searches to your replica and updates to your primary, which can help minimize latency by distributing load over both Solr cores. The caveat is that this approach only applies to indices that support replication, and only where near real-time results aren’t a requirement, since a replica may lag briefly behind the primary.

Managing Traffic

Our general recommendations for managing high volumes of traffic are:

  • When performing a bulk reindex, update operations should send documents in batches of 100 to 1,000 documents per request, from one or two indexing processes.
  • Ongoing incremental updates (e.g., generated by user activity) should be processed through a queue, so they can be paused, throttled or retried as needed.
  • High volumes of search traffic can also benefit from a layer of caching on the application side.
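As one illustration of the "throttled or retried" recommendation above, a queue worker could retry an update that was rejected with HTTP 429, waiting longer after each attempt. This is a minimal sketch, not a prescribed implementation: `send_with_backoff` and its parameters are hypothetical, and `send` stands in for whatever function in your application performs the request and returns the HTTP status code.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    Returns the last HTTP status code received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff plus random jitter, so that several
            # workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return status


# Stand-in for a real request: rejected twice, then accepted.
responses = iter([429, 429, 200])
status = send_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
print(status)  # 200
```

If the final status is still 429, the job can be put back on the queue and processed later rather than dropped.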

Beyond that, we can also raise the concurrent request limit for dedicated cluster customers (Titanium and custom plans) whose very high-traffic sites may require hundreds of concurrent requests.