Installation

Hardware

As CLAIMS Direct Solr is a pre-configured, bundled distribution of Apache Solr, it can be deployed on any number of nodes (individual instances). A group of nodes functions together to expose a collection, and multiple collections can be searched across the distribution.

There are many scenarios for a CLAIMS Direct deployment, ranging from indexing the entire content of CLAIMS Direct XML to sparse indexing of certain fields and publication-date ranges for application-specific usage. There may also be specific QoS requirements: minimum supported queries per second, average response time, and so on. All of these factors play a role in planning a CLAIMS Direct Solr deployment. Generally speaking, a full index of the entire content of CLAIMS Direct XML requires, at a minimum:

Number | Type | Specs
8 | Solr search server (nodes 1-3 housing the ZooKeeper quorum) | minimum: 2 CPU cores, 16GB RAM, 1TB disk
1 | processing server | minimum: 4 CPU cores, 16GB RAM, 250GB disk

The ZooKeeper quorum can be co-located on the Solr search servers or, optionally, broken out onto 3 additional dedicated servers.

Number | Type | Specs
3 | ZooKeeper configuration server | minimum: 1 CPU core, 2GB RAM, 50GB disk

The following diagram represents the minimum architecture required to support a full CLAIMS Direct index.

Primary Architecture


Unlike the PostgreSQL data warehouse, IFI CLAIMS does not provide a back file for Solr. All indexing is self-contained on-site.

Installation

All CLAIMS Direct software is to be installed from the applicable repository.

Installing ZooKeeper

sudo yum install alexandria-solr-zookeeper

alexandria-solr-zookeeper comes pre-configured to support up to a 5-node ZooKeeper quorum. Install the alexandria-solr-zookeeper package on at least 3 nodes.

Configuring ZooKeeper

The main ZooKeeper configuration file is located at /opt/alexandria/zookeeper/conf/zoo.cfg. On each node, the dataDir and server entries need to reflect the node number and the quorum members, respectively.

Your IP addresses may be different.

Node | dataDir | server
1 | dataDir=/var/lib/alexandria/zookeeper-data/zk1 | server.1=10.234.1.91:3181:4180
2 | dataDir=/var/lib/alexandria/zookeeper-data/zk2 | server.2=10.234.1.92:3181:4180
3 | dataDir=/var/lib/alexandria/zookeeper-data/zk3 | server.3=10.234.1.93:3181:4180

The format of the server entry is as follows:

  • server.x
    x is the node number. It corresponds to a myid file located under dataDir on each node (see the sketch after the node configurations below).
  • IP address
    The IP address of the node.
  • Port 1
    The port followers use to connect to the leader.
  • Port 2
    The port used for leader election.

The ports should not be changed.

A typical configuration for a 3-node quorum follows.

Node 1

dataDir=/var/lib/alexandria/zookeeper-data/zk1
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180

Node 2

dataDir=/var/lib/alexandria/zookeeper-data/zk2
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180

Node 3

dataDir=/var/lib/alexandria/zookeeper-data/zk3
# ...
server.1=10.234.1.91:3181:4180
server.2=10.234.1.92:3181:4180
server.3=10.234.1.93:3181:4180
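
Each node also needs a myid file under its dataDir containing the node number from its server.x entry. If the package does not create this file for you, a minimal sketch for node 1, assuming the dataDir path shown above:

# node 1: write the node number into the myid file under dataDir
sudo mkdir -p /var/lib/alexandria/zookeeper-data/zk1
echo 1 | sudo tee /var/lib/alexandria/zookeeper-data/zk1/myid

Repeat on nodes 2 and 3 with the values 2 and 3 and their respective dataDir paths.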

Enable and Start ZooKeeper

The following should be done on all ZooKeeper nodes.

# enable automatic startup
sudo systemctl enable alexandria-solr-zookeeper

# start service
sudo systemctl start alexandria-solr-zookeeper

# check status
sudo systemctl status alexandria-solr-zookeeper
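
Optionally, you can query ZooKeeper directly to confirm each node has joined the quorum. The sketch below assumes the default client port of 2181, that the srvr four-letter-word command is whitelisted, and that nc is installed:

# print server status; the Mode line shows leader or follower
echo srvr | nc localhost 2181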

Installing SolrCloud

On all Solr nodes:

sudo yum install alexandria-solr-cloud

Data Directory

alexandria-solr-cloud collections reside under /var/lib/alexandria/solr-data. Collection sizes vary depending on the subscription level, number of nodes, shards, etc., as described below. For example, a typical Premium Plus index will initially be 8TB. The following table provides minimum and recommended disk requirements:

Number of Nodes | Minimum Disk (per node) | Recommended Disk (per node)
4 | 4000GB | 8000GB
8 | 1000GB | 2000GB
16 | 500GB | 1000GB
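
Before creating the collection, it is worth confirming that the volume backing the data directory on each Solr node meets the figures above, for example:

# check free space on the Solr data volume (run on each Solr node)
df -h /var/lib/alexandria/solr-data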

Configuring SolrCloud

The following table lists the applicable nodes and their functions in a typical 8-node environment:

IP | Function | Description
10.234.1.91 | node-1: Solr/ZooKeeper | basic Solr node with ZooKeeper
10.234.1.92 | node-2: Solr/ZooKeeper | basic Solr node with ZooKeeper
10.234.1.93 | node-3: Solr/ZooKeeper | basic Solr node with ZooKeeper
10.234.1.94 | node-4: Solr | basic Solr node
10.234.1.95 | node-5: Solr | basic Solr node
10.234.1.96 | node-6: Solr | basic Solr node
10.234.1.97 | node-7: Solr | basic Solr node
10.234.1.98 | node-8: Solr | basic Solr node

Initial Template Node

On node-1, edit /opt/alexandria/solr/etc/solr-alexandria-vars. 

Variable | Value | Description
ALEXANDRIA_SOLR_CLOUD | 1 | Run Solr in cloud mode. A value of 0 runs Solr in standalone mode.
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS | 4 | The number of shards for a complete alexandria collection. This should correspond to the number of nodes in the cluster divided by the replication factor.
ALEXANDRIA_SOLR_CLOUD_NODES | 10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98 | The IP addresses of each node. Note: this must be a single line in the configuration.
ALEXANDRIA_SOLR_CLOUD_REPLICATION_FACTOR | 2 | The number of replicas per shard.
ALEXANDRIA_SOLR_CLOUD_MAX_SHARDS_PER_NODE | 1 | Limits the number of shards on a single node to 1.
ALEXANDRIA_SOLR_HOST | 10.234.1.91 | The designated host to receive administrative commands (collection creation, status, etc.). This can be any node in the cluster.
ALEXANDRIA_SOLR_JVM_MEM | 8g | The Java heap setting. This should be at least 8g; generally, 50-60% of total RAM should be allocated.
ALEXANDRIA_SOLR_ZK_NODES | 10.234.1.91,10.234.1.92,10.234.1.93 | The IP addresses of each member of the ZooKeeper quorum.
ALEXANDRIA_SOLR_ZK_HOST | 10.234.1.91 | The designated host to receive ZooKeeper administrative commands. This can be any member of the ZooKeeper quorum.
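
Taken together, the cloud-related portion of solr-alexandria-vars on the template node might look like the following sketch (shell-style KEY=value assignments are assumed; the IP addresses are the example addresses from the table above):

# /opt/alexandria/solr/etc/solr-alexandria-vars (excerpt, illustrative)
ALEXANDRIA_SOLR_CLOUD=1
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS=4
ALEXANDRIA_SOLR_CLOUD_NODES=10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98
ALEXANDRIA_SOLR_CLOUD_REPLICATION_FACTOR=2
ALEXANDRIA_SOLR_CLOUD_MAX_SHARDS_PER_NODE=1
ALEXANDRIA_SOLR_HOST=10.234.1.91
ALEXANDRIA_SOLR_JVM_MEM=8g
ALEXANDRIA_SOLR_ZK_NODES=10.234.1.91,10.234.1.92,10.234.1.93
ALEXANDRIA_SOLR_ZK_HOST=10.234.1.91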

A note about nodes, shards and replication

In a typical SolrCloud cluster, redundancy and performance are achieved by distributing, or sharding, an index. How the shards are distributed across the available live nodes is critical. The above configuration creates 4 shards, each with 2 replicas (8 cores in total), spread over 8 nodes; Solr is smart enough to distribute these correctly. This 4x2 cluster is a minimal configuration. A more performant cluster could use 8x2 (16 nodes) or, as with the CLAIMS Direct Search API, 16x2 (32 nodes).
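
As a quick sanity check, the shard count follows directly from the node count and the replication factor; a minimal sketch using the 8-node example above:

# numShards = number of nodes / replication factor
NUM_NODES=8
REPLICATION_FACTOR=2
echo $(( NUM_NODES / REPLICATION_FACTOR ))   # -> 4 shards, each stored on 2 of the 8 nodes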

Enable and Start Solr

The following should be done on all Solr nodes.

# enable automatic startup
systemctl enable alexandria-solr-cloud

# start service
systemctl start alexandria-solr-cloud

# check status
systemctl status alexandria-solr-cloud

Viewing Solr and ZooKeeper Status

After starting ZooKeeper and Solr, you can visit the Solr Admin UI at http://10.234.1.91:8983. Under Cloud → Nodes, all available nodes should be visible, and Cloud → ZK Status should show all members of the ZooKeeper quorum.
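
The same information is available from the command line via the Solr Collections API; for example, against the administrative host used above:

# query the cluster state (any node in the cluster will answer)
curl 'http://10.234.1.91:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'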

Bootstrapping the ZooKeeper Quorum

On the template node (node-1):

cd /opt/alexandria/solr/

# upload Solr configuration files
./bin/alexandria-solr-zookeeper-init
 Bootstrapping 10.234.1.91 ...
  -> name: acdidx-v3.0.0
  -> directory: /var/lib/alexandria/solr-data/conf

Creating the Solr alexandria Collection

cd /opt/alexandria/solr/

# Create the alexandria collection
./bin/alexandria-solr-cloud-admin alexandria
 
Creating (action=CREATE) on http://10.234.1.91:8983/solr/admin/collections ...
  -> name=alexandria
  -> numShards=4
  -> replicationFactor=2
  -> maxShardsPerNode=1
  -> collection.config=acdidx-v3.0.0
  -> property.config=solrconfig-alexandria.xml
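
Once the command returns, a quick way to confirm the new collection is responding is an empty match-all query against it, for example:

# match-all query; numFound will be 0 until indexing begins
curl 'http://10.234.1.91:8983/solr/alexandria/select?q=*:*&rows=0'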

Next Steps

Once Solr has been installed, set up the indexing daemon aidxd.
