Solr Disaster Recovery

Whether you’re hosting Solr on your own or choosing a hosted provider like Websolr, every search cluster should have a disaster recovery plan.

Websolr was built from the ground up to be a highly available system, following established industry best practices. Websolr deploys all Solr resources in a multi-node, multi-data-center configuration to keep your data safe and secure.

Websolr uses the concept of a slice rather than an index. A slice is the collection of cores belonging to an index: the cores holding primary data and their corresponding replicas. Each Solr core is provisioned on a node in a separate AWS Availability Zone, which gives us data center isolation. A Websolr slice can lose its primary AWS data center entirely and continue to operate, which makes slices extremely fault-tolerant.
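The placement rule described above can be illustrated with a small toy model. This is only a sketch for reasoning about the idea, not Websolr's actual implementation: the `Core` and `Slice` classes, the core names, and the zone names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    name: str  # hypothetical core name
    zone: str  # Availability Zone of the node hosting this core
    role: str  # "primary" or "replica"


@dataclass
class Slice:
    cores: List[Core]

    def surviving(self, lost_zone: str) -> List[Core]:
        """Cores still reachable after every node in lost_zone is gone."""
        return [c for c in self.cores if c.zone != lost_zone]

    def survives_loss_of(self, lost_zone: str) -> bool:
        """True if at least one copy of the data remains."""
        return len(self.surviving(lost_zone)) > 0


# Illustrative slice: one primary and one replica, each in its own zone.
demo = Slice([
    Core("core-a", "us-east-1a", "primary"),
    Core("core-b", "us-east-1b", "replica"),
])

for zone in ("us-east-1a", "us-east-1b"):
    left = [c.name for c in demo.surviving(zone)]
    print(f"lose {zone}: survives={demo.survives_loss_of(zone)}, remaining={left}")
```

Because no two cores share a zone, removing any single zone always leaves one core behind. Placing both cores in the same zone would break that property.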

To further improve the availability of your index, Websolr deploys all Solr resources behind an AWS Application Load Balancer. This lets you connect to a single URL while a smart proxy routes requests to the best available core.
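Since the application only ever sees one URL, a brief failover appears to the client as a short run of failed requests, so it can help to retry with backoff on the application side. The following is a minimal sketch under stated assumptions: `SOLR_URL` is a placeholder rather than a real endpoint, `call_with_retries` and `ping` are hypothetical helpers, and it assumes Solr's standard `/admin/ping` handler is reachable on your index.

```python
import time
import urllib.request

# Placeholder only; substitute the URL provided for your own index.
SOLR_URL = "https://example.invalid/solr/my-index"


def call_with_retries(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a network-level error, retry with exponential backoff.

    urllib's URLError and HTTPError, socket timeouts, and ConnectionError
    are all subclasses of OSError, so one except clause covers them.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...


def ping(url=SOLR_URL, timeout=5):
    """Return True if the Solr ping handler answers with HTTP 200.

    Assumption: the ping handler is exposed at <index URL>/admin/ping.
    """
    with urllib.request.urlopen(url + "/admin/ping?wt=json", timeout=timeout) as resp:
        return resp.status == 200
```

With a real index URL in place, a health check would be `call_with_retries(ping)`. The same wrapper can surround any query or update call that should ride out a short interruption.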

When a Websolr slice does lose a node, AWS Auto Scaling Groups immediately begin spinning up a replacement instance, which auto-bootstraps into your configured Solr version and settings. Once the node has been successfully provisioned, Websolr creates a new shadow replica core on it to replace the one that was lost.

On the off chance that Websolr (and much of the internet with it) loses an entire AWS EC2 region, all of your Solr data is backed up to encrypted buckets in AWS’s S3 system, which is designed for 99.99% availability and 99.999999999% durability. If such a failure happens, Websolr’s staff will work with your team to understand where you are relocating your application, and can then initiate a restore into an index in the same AWS Region as the relocated application while maintaining your existing DNS connections.
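To put the two S3 percentages in perspective, the arithmetic below converts them into more tangible numbers. It is a rough sketch that assumes both figures are annual targets. The function names and the count of 10 million stored objects are illustrative, not Websolr figures.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_pct: str) -> Fraction:
    """Minutes per year outside the stated availability percentage."""
    return MINUTES_PER_YEAR * (1 - Fraction(availability_pct) / 100)


def expected_objects_lost_per_year(object_count: int, durability_pct: str) -> Fraction:
    """Average number of objects lost per year at the stated durability."""
    return object_count * (1 - Fraction(durability_pct) / 100)


# Percentages are passed as strings so Fraction keeps them exact.
downtime = downtime_minutes_per_year("99.99")
lost = expected_objects_lost_per_year(10_000_000, "99.999999999")

print(float(downtime))  # 52.56 minutes per year
print(float(lost))      # 0.0001 objects per year
```

In other words, 99.99% availability allows for under an hour of unavailability per year, and at eleven nines of durability a store of 10 million objects would lose on average one object every 10,000 years.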