Solr Cloud Architecture

4 min readOct 28, 2020

Introduction:

Apache Solr is a very powerful search platform, It is quick and easy to have it up and running on your local machine, on a server or in a container. But once the development/ POC phase is completed and when you are ready to have your enterprise applications start using the core there are few questions that would pop up in your mind, Is Solr scalable? Is it fault-tolerant? Is it highly available? The answer is Yes, Yes, and Yes. Solr Cloud is the correct choice of architecture that would answer all the 3 architectural concerns.

Before we start, the name Solr Cloud has nothing to do with Cloud computing. Solr Cloud is a group of solr servers clustered together running the same instance of Solr and each host hosting a portion of a collection data.

Solr Cloud provides Scalability, High availability, and Fault-tolerant features. It supports distributed search, distributed indexing, scaling up by adding more nodes as replicas when needed, and remove nodes when its time to scale down.

Architecture of Solr Cloud:

Solr cloud is a Leader-replica architecture and is not a master-slave architecture. The leader election is handled by Zookeeper which means that there is no manual intervention for leader election.

Zookeeper is the main component in Solr Cloud architecture. Zookeeper ensemble comes prepackaged with Solr Cloud, If you do not have the capacity to host external zookeeper’s you can always use the internal ensemble. Apart from handling the leader election, Zookeeper also provides the following 2 main features.

It works as a load balancer and routes the incoming service requests to the least busy available solr nodes.
It also works as a centralized location for storing solr configuration and Configuration files (schema.xml,solrconfig.xml etc…) for all the Collections.

In Solr Cloud, each collection can have multiple shards and each shard can have multiple replicas this configuration is handled when we create a collection. As the data grows and as the shards become bigger you can break the shards further into smaller shards. There is a little bit of operational work when you do this but you have the option available.

How are the search requests handled?

Apart from the external load balancer provided by Zookeeper, Solr Cloud has its own internal load balancer to route the requests.

When a search request comes in Zookeeper would send the request to the solr node that is least busy, But it doesn’t have any idea if the node it’s routing the request to hosts the shard that can fulfill the search request. If the solr node that gets the request has the data it would send the data back to the client and fulfills the search request. In the event, where the solr node that gets the request doesn’t have the data then it acts as a co-ordinator and sends the request to the node that hosts the shard with data. Once the coordinator node gets the data it is then sent back to the client.

How are the update/insert requests handled in Solr Cloud?

Update and Inserts are handled differently when compared to a search request in Solr Cloud architecture. While the search requests can be handled by any node in the cluster the Inserts, updates, and deletes can only be handled by the leader of that shard in the cluster.

When an update/Insert request comes into the cluster and if the node that gets the request is not a leader the request is internally re-routed to the leader of the shard and the leader commits the data. Once the data is committed the leader forwards the committed data to the replicas of that shard. These are also logged in the transactional logs which are used in the event of recovery and also when new replicas are created.

Pros & Cons:

By now you must be thinking about some pro’s and con’s that this architecture presents to us. I will layout my version

Pro:

This architecture is highly scalable, Highly available, and Fault-tolerant.
Since this is a leader-replica architecture the complexity is low when compared to a master-slave architecture. And the leader election is transparent.
Mutations in a transaction can be sent to any node in the cluster.

Con:

The cost of implementing this solution could increase based on the number of servers you choose. So review your application needs before choosing this solution.
Since Zookeeper is a core component in the architecture the complexity increases. But the good news is you don’t need to be a pro in managing zookeeper basic operational knowledge would suffice.

Conclusion:

Solr Cloud is a great solution if you are planning on using solr to build an enterprise search solution. You might have to spend some time understanding the architecture and testing all your Resiliency test cases such as removing a replica, adding a replica, taking down leaders on both Solr and Zookeeper, etc..