Overview
Indexing into SOLR is controlled by an indexing daemon, aidxd. This daemon probes PostgreSQL for available load-id(s) to index. This "queue" is represented by the table reporting.t_client_index_process. See Data Warehouse Design for more information on the structure of this table. When processing into PostgreSQL completes successfully, apgupd registers a new, index-ready load-id. The indexing daemon aidxd recognizes this as an available load-id and begins the indexing process for that particular load-id.
Usage
    aidxd [ Options ... ]
      --nodaemon        don't put process into background
      --once            only process one load-id
      --pidfile=s       specify location of PIDFILE (default=/var/log/alexandria/aidxd.pid)
      --interval=i      n-seconds between probing for new loads
      --tmp=dir         specify temporary file storage (default=/tmp)
      --clean           remove temporary processing directory
      --batchsize=i     maximum number of documents to parallelize
      --nthreads=i      maximum number of processes to parallelize
      --facility=s      logging facility (default=aidxd)
      --help            print this usage and exit
      --------
      --idxversion=i    20 | 21 (default=20)
      --idxcls=s        Alexandria::DWH::Index::Document   (20)
                        Alexandria::DWH::Index::DocumentEx (21)
      --idxexe=s        specify indexing script (default aidx)
      --quiet           suppress output from sub-process
                        NOTE: suppressing this output will make it difficult
                        to track down errors originating in --idxexe
      --solrdbname=s    base url for index (default=alexandria)
      --core=s          index core (default=alexandria)
      --autooptimize    issue an 'optimize' call to SOLR after optinterval
                        continuous load-id(s)
      --optinterval     # of load-id(s) after which an optimize is issued (default=100)
      --optsegments=n   optimize down to n-segments (default=16)
      --nostatistics    do not gather indexing statistics
Options
Argument | Description | Default Value |
---|---|---|
--nodaemon --once | When specified, aidxd will run in the foreground. If --once is given, --nodaemon is implied and only one load-id will be processed. | N/A |
--interval | Time (in seconds) between successive indexing queue probes | 10 |
--tmp | Temporary processing area which holds the transformed XML before it is POSTed to SOLR | /tmp |
--batchsize | Number of documents to extract for indexing | 250 |
--nthreads | Number of parallel extraction processes. This value should be adjusted depending on available processing power on the PostgreSQL data warehouse server. A rule of thumb would be to set this to the number of cores. | 8 |
--idxversion | The version of the index. If you are running v2.1, please set this to 21 ( --idxversion=21 ) | 20 |
--idxcls | Each version has a specific indexing class used in XML transformation. If you are running v2.1, please set this to Alexandria::DWH::Index::DocumentEx | Alexandria::DWH::Index::Document |
--solrdbname --core | Base URL for indexing. If different from the default, it should have an index entry in /etc/alexandria.xml | alexandria |
--autooptimize | DO NOT USE | N/A |
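
As a rough illustration of how the options in the table combine, the sketch below assembles a command line for a v2.1 index. The function name `aidxd_cmdline` and the variables `NTHREADS`, `INTERVAL`, and `BATCHSIZE` are hypothetical helpers, not part of aidxd. The thread count falls back to the local core count, which only matches the rule of thumb above if aidxd runs on the same host as the PostgreSQL data warehouse (an assumption).

```shell
#!/bin/sh
# Sketch: assemble an aidxd command line for a v2.1 index.
# NTHREADS should reflect the core count of the PostgreSQL data warehouse
# server. The local count from getconf is only a stand-in (assumption:
# aidxd runs on that same host); 8 is the documented default.
NTHREADS="${NTHREADS:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8)}"

# Print the command line instead of running it, so it can be reviewed first.
# INTERVAL and BATCHSIZE fall back to the documented defaults (10 and 250).
aidxd_cmdline() {
    echo "$INSTALL_BASE/bin/aidxd" \
        --idxversion=21 \
        --idxcls=Alexandria::DWH::Index::DocumentEx \
        --interval="${INTERVAL:-10}" \
        --batchsize="${BATCHSIZE:-250}" \
        --nthreads="$NTHREADS"
}
```

Printing the command rather than executing it keeps the sketch safe to try; once the output looks right it can be run with `eval "$(aidxd_cmdline)"` or pasted into a shell.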
Daemon Execution
Starting
    # all defaults (for v2.0 index)
    $INSTALL_BASE/bin/aidxd

    # v2.1: all defaults
    $INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx

    # v2.1: only process one load-id
    $INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx --once
Pausing/Resuming/Stopping
    # pause the daemon
    kill -s USR1 <pid>

    # resume processing
    kill -s USR2 <pid>

    # stop daemon entirely
    kill -s INT <pid>
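
The signals above can be wrapped in a small helper so the PID does not have to be looked up by hand. This is a minimal sketch, not something shipped with aidxd: the function names `aidxd_signal_for` and `aidxd_ctl` are hypothetical, and it assumes the pidfile contains only the daemon's PID. The default path is the one listed for --pidfile in the usage text.

```shell
#!/bin/sh
# Sketch: pause, resume, or stop aidxd using the PID recorded in its pidfile.
# PIDFILE defaults to the location given in the usage text; override it if
# the daemon was started with a different --pidfile.
PIDFILE="${PIDFILE:-/var/log/alexandria/aidxd.pid}"

# Map an action name to the signal documented above.
aidxd_signal_for() {
    case "$1" in
        pause)  echo USR1 ;;
        resume) echo USR2 ;;
        stop)   echo INT ;;
        *)      return 1 ;;
    esac
}

# Send the signal for the given action to the PID stored in $PIDFILE.
# Assumption: the pidfile holds nothing but the PID.
aidxd_ctl() {
    sig=$(aidxd_signal_for "$1") || {
        echo "usage: aidxd_ctl pause|resume|stop" >&2
        return 2
    }
    [ -r "$PIDFILE" ] || { echo "cannot read $PIDFILE" >&2; return 1; }
    pid=$(cat "$PIDFILE")
    kill -s "$sig" "$pid"
}
```

After sourcing the file, `aidxd_ctl pause` and `aidxd_ctl resume` bracket a maintenance window, and `aidxd_ctl stop` shuts the daemon down.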