Indexing into Solr is controlled by the indexing daemon aidxd. This daemon probes PostgreSQL for available load-id(s) to index. This "queue" is represented by the table reporting.t_client_index_process. See Data Warehouse Design for more information on the structure of this table. When a load has been successfully processed into PostgreSQL, apgupd registers a new, index-ready load-id. aidxd recognizes this as an available load-id and begins the indexing process for that particular load-id.

aidxd is installed as part of the CLAIMS Direct Client Tools. Please see the Client Tools Installation Instructions for more information about how to install this tool.
If you have chosen to deploy Solr as Type 3, --core must be specified corresponding to your subscription level.
- Basic: --core=alexandria-standard
- Premium: --core=alexandria-premium
- Premium-Plus: --core=alexandria-premium-plus
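The mapping above can be captured in a small wrapper so the core name is not typed by hand. This is only a sketch: core_for_level and the LEVEL variable are hypothetical names introduced for illustration and are not part of the Client Tools. The script prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: map a subscription level to the --core value
# listed in the documentation above.
core_for_level() {
    case "$1" in
        basic)        echo "alexandria-standard" ;;
        premium)      echo "alexandria-premium" ;;
        premium-plus) echo "alexandria-premium-plus" ;;
        *)            echo "unknown subscription level: $1" >&2; return 1 ;;
    esac
}

# Assumption: the subscription level is supplied through a LEVEL variable.
LEVEL="${LEVEL:-premium}"

# Print the command for review instead of executing it.
if CORE="$(core_for_level "$LEVEL")"; then
    echo "aidxd --core=$CORE"
fi
```

An unknown level produces an error message on stderr and no command, which avoids starting the daemon against the default core by accident.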
Usage
    aidxd [ Options ... ]
      --nodaemon       don't put process into background
      --once           only process one load-id
      --pidfile=s      specify location of PIDFILE (default=/var/log/alexandria/aidxd.pid)
      --interval=i     n-seconds between probing for new loads
      --tmp=dir        specify temporary file storage (default=/tmp)
      --clean          remove temporary processing directory
      --batchsize=i    maximum number of documents to parallelize
      --nthreads=i     maximum number of processes to parallelize
      --facility=s     logging facility (default=aidxd)
      --help           print this usage and exit
      --------
      --idxversion=    21
      --idxcls=s       Alexandria::DWH::Index::DocumentEx
      --dbfunc=s       specify an alternative extraction function (default=xml.f_patent_document_s)
      --idxexe=s       specify indexing script (default aidx)
      --quiet          suppress output from sub-process
                       NOTE: suppressing this output will make it difficult to track down errors originating in --idxexe
      --pgdbname=s     source postgresql instance as defined in /etc/alexandria.xml
      --solrdbname=s   base url for index (default=alexandria)
      --core=s         index core (default=alexandria)
      --tolerate       tolerate indexing errors and attempt again
      --autooptimize   issue an 'optimize' call to Solr after optinterval continuous load-id(s)
      --optinterval    # of load-id(s) after which an optimize is issued (default=100)
      --optsegments=n  optimize down to n-segments (default=16)
      --nostatistics   do not gather indexing statistics
Options
Argument | Description | Default Value |
---|---|---|
--nodaemon --once | When specified, aidxd will run in the foreground. If --once is given, --nodaemon is implied and only one load-id will be processed. | N/A |
--interval | Time (in seconds) between successive indexing queue probes | 10 |
--tmp | Temporary processing area which holds the transformed XML before it is POSTed to Solr | /tmp |
--batchsize | Number of documents to extract for indexing | 250 |
--nthreads | Number of parallel extraction processes. This value should be adjusted depending on available processing power on the PostgreSQL data warehouse server. A rule of thumb would be to set this to the number of cores. | 8 |
--idxversion | The version of the index | 21 |
--idxcls | The indexing class used in XML transformation | Alexandria::DWH::Index::DocumentEx |
--dbfunc | Specify an alternative extraction function | xml.f_patent_document_s |
--pgdbname | Source PostgreSQL instance as defined in /etc/alexandria.xml | alexandria |
--solrdbname --core | Base URL for indexing. If different from the default, it should have an index entry in /etc/alexandria.xml. | alexandria |
--tolerate | (v2.6-1) Tolerate a wide variety of errors and retry failed indexing | N/A |
--autooptimize | DO NOT USE | N/A |
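
The rule of thumb for --nthreads (one extraction process per core) can be turned into a value with a short script. This is a sketch under two assumptions: it is run on the PostgreSQL data warehouse server, since that is the machine whose cores matter, and getconf is available there. core_count is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report the number of online CPU cores.
# Run this ON the PostgreSQL data warehouse server, not the indexing host.
core_count() {
    # Fall back to 8, the documented default for --nthreads,
    # if getconf cannot report the core count.
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 8
}

NTHREADS="$(core_count)"

# Print the suggested option for review instead of starting the daemon.
echo "aidxd --nthreads=$NTHREADS"
```

Treat the printed value as a starting point and lower it if the warehouse server is shared with other workloads.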
Daemon Execution
Starting
    # v2.1: all defaults
    $INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx

    # v2.1: only process one load-id
    $INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx --once
Pausing/Resuming/Stopping
    # pause the daemon
    kill -s USR1 <pid>

    # resume processing
    kill -s USR2 <pid>

    # stop daemon entirely
    kill -s INT <pid>
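
The <pid> in the commands above can be read from the PID file instead of being looked up by hand. The following is a minimal sketch, assuming the daemon was started with the default --pidfile location from the usage text and that the file contains only the process ID. signal_for and aidxd_ctl are hypothetical helper names, not part of the Client Tools.

```shell
#!/bin/sh
# Default PIDFILE location from the usage text; override PIDFILE
# if the daemon was started with a different --pidfile value.
PIDFILE="${PIDFILE:-/var/log/alexandria/aidxd.pid}"

# Hypothetical helper: map an action to the documented signal.
signal_for() {
    case "$1" in
        pause)  echo "USR1" ;;
        resume) echo "USR2" ;;
        stop)   echo "INT" ;;
        *)      echo "usage: pause|resume|stop" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: read the PID and send the matching signal.
aidxd_ctl() {
    sig="$(signal_for "$1")" || return 1
    [ -r "$PIDFILE" ] || { echo "no readable PID file at $PIDFILE" >&2; return 1; }
    # Assumption: the PID file holds nothing but the process ID.
    pid="$(cat "$PIDFILE")"
    kill -s "$sig" "$pid"
}
```

After sourcing the script, aidxd_ctl pause, aidxd_ctl resume and aidxd_ctl stop correspond to the three kill commands above.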