Installation
Hardware
As CLAIMS Direct Solr is a pre-configured, bundled distribution of Apache Solr, it can be deployed on any number of nodes (individual instances). A group of nodes functions together to expose a collection. Further, multiple collections can be searched across the distribution.
There are many scenarios for a CLAIMS Direct deployment, ranging from indexing the entire content of CLAIMS Direct XML to sparse indexing of selected fields and publication-date ranges for application-specific usage. There may also be specific QoS requirements: minimum supported queries per second, average response time, etc. All of these factors play a role in planning a CLAIMS Direct Solr deployment. Generally speaking, a full index of the entire content of CLAIMS Direct XML requires, at a minimum:
Number | Type | Specs |
---|---|---|
8 | Solr search servers (nodes 1-3 also house the ZooKeeper quorum) | minimum: |
1 | processing server | minimum: |
The ZooKeeper quorum can be co-located on the Solr search servers or, optionally, broken out onto 3 additional separate servers.
Number | Type | Specs |
---|---|---|
3 | ZooKeeper configuration server | minimum: |
The following diagram represents the minimum architecture required to support a full CLAIMS Direct index.
Unlike the PostgreSQL data warehouse, IFI CLAIMS does not provide a back file for Solr. All indexing is self-contained on-site.
Installation
All CLAIMS Direct software is to be installed from the applicable repository.
Distribution | Install Command |
---|---|
CentOS 7 | yum -y install https://repo.ificlaims.com/ifi-claims-direct/centos/7/x86_64/ifi-claims-direct-1.0-1.el7.x86_64.rpm |
Rocky 8 | yum -y install https://repo.ificlaims.com/ifi-claims-direct/rocky/8/x86_64/ifi-claims-direct-1.0-1.el8.x86_64.rpm |
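To verify that the meta-package installed correctly, you can query the RPM database (the package name is taken from the RPM file name above):

# show package information for the installed release
rpm -qi ifi-claims-direct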
Installing ZooKeeper
sudo yum install alexandria-solr-zookeeper
The alexandria-solr-zookeeper package comes pre-configured to support up to a 5-node ZooKeeper quorum. Install alexandria-solr-zookeeper on at least 3 nodes.
Configuring ZooKeeper
The main ZooKeeper configuration file is /opt/alexandria/zookeeper/conf/zoo.cfg. For each node, the dataDir and server entries need to reflect the node number and the quorum members, respectively.
Your IP addresses may be different.
Node | dataDir | server |
---|---|---|
1 | dataDir=/var/lib/alexandria/zookeeper-data/zk1 | server.1=10.234.1.91:3181:4180 |
2 | dataDir=/var/lib/alexandria/zookeeper-data/zk2 | server.2=10.234.1.92:3181:4180 |
3 | dataDir=/var/lib/alexandria/zookeeper-data/zk3 | server.3=10.234.1.93:3181:4180 |
The format of the server entry is server.x=<IP address>:<port 1>:<port 2>, where:
- server.x: x is the node number. It corresponds to the myid file located under dataDir on each node.
- IP address: the IP address of the node.
- Port 1: the port followers use to connect to the leader.
- Port 2: the port used for leader election.
The ports should not be changed.
A typical configuration for a 3-node quorum follows.
Node 1
dataDir=/var/lib/alexandria/zookeeper-data/zk1
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180
Node 2
dataDir=/var/lib/alexandria/zookeeper-data/zk2
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180
Node 3
dataDir=/var/lib/alexandria/zookeeper-data/zk3
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180
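The number in each server.x entry must match the contents of the myid file under that node's dataDir. If the package does not create these files automatically, they can be written by hand; for example, on node 1 (adjust the id and the zkN directory on the other nodes):

# write this node's id (use 2 on node 2, 3 on node 3)
echo 1 | sudo tee /var/lib/alexandria/zookeeper-data/zk1/myid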
Enable and Start ZooKeeper
The following should be done on all ZooKeeper nodes.
# enable automatic startup
sudo systemctl enable alexandria-solr-zookeeper
# start service
sudo systemctl start alexandria-solr-zookeeper
# check status
sudo systemctl status alexandria-solr-zookeeper
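Once all three nodes are running, you can verify that the quorum has formed and see which node was elected leader. This assumes ZooKeeper's default client port of 2181 (clientPort in zoo.cfg) and the srvr four-letter command, which is whitelisted by default:

# each node should report Mode: leader or Mode: follower
echo srvr | nc 10.234.1.91 2181 | grep Mode
echo srvr | nc 10.234.1.92 2181 | grep Mode
echo srvr | nc 10.234.1.93 2181 | grep Mode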
Installing SolrCloud
On all Solr nodes:
sudo yum install alexandria-solr-cloud
Data Directory
alexandria-solr-cloud collections reside under /var/lib/alexandria/solr-data. Collection sizes vary depending on the subscription level, number of nodes, shards, etc., as shown below. A typical Premium Plus index, for example, will initially be 8TB. The following table provides minimum and recommended disk requirements:
Number of Nodes | Minimum Disk per Node | Recommended Disk per Node |
---|---|---|
4 | 4000GB | 8000GB |
8 | 1000GB | 2000GB |
16 | 500GB | 1000GB |
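Before indexing, it is worth confirming that each node actually has the required space available under the data directory (assuming /var/lib/alexandria/solr-data sits on the intended volume):

# show available space on the filesystem backing the Solr data directory
df -h /var/lib/alexandria/solr-data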
Configuring SolrCloud
The following table lists the applicable nodes and their functions in a typical 8-node environment:
IP | Function | Description |
---|---|---|
10.234.1.91 | node-1: Solr/ZooKeeper | basic Solr node with ZooKeeper |
10.234.1.92 | node-2: Solr/ZooKeeper | basic Solr node with ZooKeeper |
10.234.1.93 | node-3: Solr/ZooKeeper | basic Solr node with ZooKeeper |
10.234.1.94 | node-4: Solr | basic Solr node |
10.234.1.95 | node-5: Solr | basic Solr node |
10.234.1.96 | node-6: Solr | basic Solr node |
10.234.1.97 | node-7: Solr | basic Solr node |
10.234.1.98 | node-8: Solr | basic Solr node |
Initial Template Node
On node-1, edit /opt/alexandria/solr/etc/solr-alexandria-vars.
Variable | Value | Description |
---|---|---|
ALEXANDRIA_SOLR_CLOUD | 1 | Run Solr in cloud mode. A value of 0 will run Solr in standalone mode. |
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS | 4 | The number of shards for a complete index. |
| 10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94, 10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98 | The IP addresses of each node. Note: this should be one line in the configuration. |
| 2 | The number of replicas per shard. |
| 1 | This limits the number of shards on a single node to 1. |
| 10.234.1.91 | The designated host to receive administrative commands: collection creation, status etc. This can be any node in the cluster. |
ALEXANDRIA_SOLR_JVM_MEM | 8g | This is the Java heap setting. This should be at least 8G. Generally, 50-60% of total RAM should be allocated. |
ALEXANDRIA_SOLR_ZK_NODES | 10.234.1.91,10.234.1.92,10.234.1.93 | The IP addresses of each member of the ZooKeeper quorum. |
ALEXANDRIA_SOLR_ZK_HOST | 10.234.1.91 | The designated host to receive ZooKeeper administrative commands. This could be any member of the ZooKeeper quorum. |
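As an illustration, the corresponding excerpt of /opt/alexandria/solr/etc/solr-alexandria-vars for this 8-node cluster might look as follows, assuming simple VARIABLE=value assignments; only the variables whose names appear in the table above are shown:

# /opt/alexandria/solr/etc/solr-alexandria-vars (excerpt)
ALEXANDRIA_SOLR_CLOUD=1
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS=4
ALEXANDRIA_SOLR_JVM_MEM=8g
ALEXANDRIA_SOLR_ZK_NODES=10.234.1.91,10.234.1.92,10.234.1.93
ALEXANDRIA_SOLR_ZK_HOST=10.234.1.91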
A note about nodes, shards and replication
In a typical SolrCloud cluster, redundancy and performance are achieved by distributing, or sharding, an index. How the shards are distributed across the available live nodes is critical. The above configuration creates 4 shards with 2 replicas each, spread over 8 nodes, and Solr is smart enough to distribute them correctly. This 4x2 cluster is a minimal configuration. A more performant cluster could use 8x2 (16 nodes) or, as with the CLAIMS Direct Search API, 16x2 (32 nodes).
Enable and Start Solr
The following should be done on all Solr nodes.
# enable automatic startup
sudo systemctl enable alexandria-solr-cloud
# start service
sudo systemctl start alexandria-solr-cloud
# check status
sudo systemctl status alexandria-solr-cloud
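After the service starts, every node should answer on port 8983 (the same port used by the admin UI below). A quick check from any machine that can reach the nodes, using the example IP addresses above:

# confirm each Solr node responds on its HTTP port (expect 200)
for ip in 10.234.1.91 10.234.1.92 10.234.1.93 10.234.1.94 \
          10.234.1.95 10.234.1.96 10.234.1.97 10.234.1.98; do
  curl -s -o /dev/null -w "$ip %{http_code}\n" "http://$ip:8983/solr/admin/info/system?wt=json"
done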
Viewing Solr and ZooKeeper Status
After starting ZooKeeper and Solr, you can visit the internal Solr web page at http://10.234.1.91:8983. Under Cloud → Nodes, all available nodes should be visible, and Cloud → ZK Status should show all ZooKeeper quorum members.
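The same information is available from the command line via the standard SolrCloud Collections API, for example against the administrative host configured above:

# list live nodes and cluster state as JSON
curl 'http://10.234.1.91:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'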
Bootstrapping the ZooKeeper Quorum
On the template node (node-1):
cd /opt/alexandria/solr/
# upload Solr configuration files
./bin/alexandria-solr-zookeeper-init
Bootstrapping 10.234.1.91 ...
-> name: acdidx-v3.0.0
-> directory: /var/lib/alexandria/solr-data/conf
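To confirm that the configuration set was uploaded, the bundled Solr distribution's own script can list what is stored in ZooKeeper. This assumes the standard bin/solr script is present under /opt/alexandria/solr and that ZooKeeper listens on its default client port 2181:

# list configuration sets stored in ZooKeeper; acdidx-v3.0.0 should appear
./bin/solr zk ls /configs -z 10.234.1.91:2181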
Creating the Solr alexandria Collection
cd /opt/alexandria/solr/
# Create the alexandria collection
./bin/alexandria-solr-cloud-admin alexandria
Creating (action=CREATE) on http://10.234.1.91:8983/solr/admin/collections ...
-> name=alexandria
-> numShards=8
-> replicationFactor=2
-> maxShardsPerNode=1
-> collection.config=acdidx-v3.0.0
-> property.config=solrconfig-alexandria.xml
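Once the collection has been created, a quick query confirms that all shards are up and responding; the result count will be 0 until documents are indexed:

# query the new (still empty) collection
curl 'http://10.234.1.91:8983/solr/alexandria/select?q=*:*&rows=0'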
Once Solr has been installed and the alexandria collection created, set up the indexing daemon, aidxd.