
Overview

Indexing into SOLR is controlled by an indexing daemon, aidxd. The daemon probes PostgreSQL for load-id(s) that are available for indexing. This "queue" is represented by the table reporting.t_client_index_process; see Data Warehouse Design for more information on the structure of this table. When a load has been successfully processed into PostgreSQL, apgupd registers a new, index-ready load-id. aidxd recognizes it as an available load-id and begins the indexing process for that load-id.

Usage

	aidxd [ Options ... ]
  --nodaemon    don't put process into background
    --once      only process one load-id
  --pidfile=s   specify location of PIDFILE
                  (default=/var/log/alexandria/aidxd.pid)
  --interval=i  n-seconds between probing for new loads
  --tmp=dir     specify temporary file storage (default=/tmp)
  --clean       remove temporary processing directory
  --batchsize=i maximum number of documents to parallelize
  --nthreads=i  maximum number of processes to parallelize
  --facility=s  logging facility (default=aidxd)
  --help        print this usage and exit
  --------
  --idxversion=i 20 | 21 (default=20)
  --idxcls=s    Alexandria::DWH::Index::Document (20) Alexandria::DWH::Index::DocumentEx (21)
  --idxexe=s    specify indexing script (default aidx)
    --quiet     suppress output from sub-process
                NOTE: suppressing this output will make it difficult
                      to track down errors originating in --idxexe
  --solrdbname=s base url for index (default=alexandria)
    --core=s     index core (default=alexandria)
  --autooptimize issue an 'optimize' call to SOLR after optinterval
                 continuous load-id(s)
    --optinterval  # of load-id(s) after which an optimize is issued
                   (default=100)
    --optsegments=n optimize down to n-segments (default=16)
  --nostatistics do not gather indexing statistics
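The usage text ties each --idxversion value to an indexing class (20 with Alexandria::DWH::Index::Document, 21 with Alexandria::DWH::Index::DocumentEx), so for a v2.1 index both options have to change together. A small shell helper can keep the pair consistent. This is a minimal sketch; `aidxd_index_opts` is a hypothetical name and is not shipped with aidxd.

```shell
# Hypothetical helper (not part of aidxd): print the --idxversion and
# --idxcls options that belong together for a given index version.
# Mapping taken from the usage text: 20 -> Document, 21 -> DocumentEx.
aidxd_index_opts() {
    case "$1" in
        20) printf '%s\n' "--idxversion=20 --idxcls=Alexandria::DWH::Index::Document" ;;
        21) printf '%s\n' "--idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx" ;;
        *)  printf 'aidxd_index_opts: unsupported index version: %s\n' "$1" >&2
            return 1 ;;
    esac
}

# Usage: the substitution is left unquoted on purpose so that it splits
# into two separate arguments.
#   "$INSTALL_BASE/bin/aidxd" $(aidxd_index_opts 21) --once
```

Any version other than 20 or 21 is rejected rather than silently falling back to the v2.0 class.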

Options

| Argument | Description | Default value |
|---|---|---|
| --nodaemon, --once | When specified, aidxd runs in the foreground. If --once is given, --nodaemon is implied and only one load-id is processed. | N/A |
| --interval | Time (in seconds) between successive indexing queue probes. | 10 |
| --tmp | Temporary processing area that holds the transformed XML before it is POSTed to SOLR. | /tmp |
| --batchsize | Number of documents to extract for indexing. | 250 |
| --nthreads | Number of parallel extraction processes. Adjust this value to the processing power available on the PostgreSQL data warehouse server; as a rule of thumb, set it to the number of cores. | 8 |
| --idxversion | The version of the index. If you are running v2.1, set this to 21 (--idxversion=21). | 20 |
| --idxcls | Each index version has a specific indexing class used in XML transformation. If you are running v2.1, set this to Alexandria::DWH::Index::DocumentEx (--idxcls=Alexandria::DWH::Index::DocumentEx). | Alexandria::DWH::Index::Document |
| --solrdbname, --core | Base URL for indexing. If different from the default, it should have a corresponding index entry in /etc/alexandria.xml. | alexandria |
| --autooptimize | DO NOT USE. | N/A |

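The --nthreads rule of thumb (one extraction process per core of the PostgreSQL data warehouse server) can be turned into a value with a short helper. This sketch assumes you supply the warehouse server's core count; with no argument it falls back to `nproc` (GNU coreutils) on the local host, which is only appropriate when aidxd runs on the warehouse server itself. `aidxd_nthreads` is a hypothetical name.

```shell
# Hypothetical helper: choose an --nthreads value from a core count,
# following the one-process-per-core rule of thumb.
# $1 = core count of the PostgreSQL data warehouse server (optional).
# Without $1, uses this host's core count via nproc.
aidxd_nthreads() {
    cores="${1:-$(nproc 2>/dev/null)}"
    case "$cores" in
        # empty, non-numeric or zero: fall back to the documented default of 8
        ''|*[!0-9]*|0) cores=8 ;;
    esac
    printf '%s\n' "$cores"
}

# Usage, for a warehouse server with 16 cores:
#   "$INSTALL_BASE/bin/aidxd" --nthreads="$(aidxd_nthreads 16)"
```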
Daemon Execution

Starting

# all defaults (for v2.0 index)
$INSTALL_BASE/bin/aidxd
 
# v2.1: all defaults
$INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx
 
# v2.1: Only process one load-id
$INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx --once
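By default aidxd puts itself into the background and uses a pidfile (default /var/log/alexandria/aidxd.pid), so it can help to check for a live daemon before starting another copy. This is a sketch under a few assumptions: the pidfile holds only the PID, and you run the check as the daemon's user or root (otherwise `kill -0` fails with a permission error). Whether aidxd removes its pidfile on exit is not documented here, so a stale file whose PID no longer exists is treated as "not running". `aidxd_is_running` is a hypothetical name.

```shell
# Hypothetical check: succeed only if the PID recorded in the pidfile
# belongs to a live process. Pass the --pidfile value as $1 if you
# changed it from the documented default.
aidxd_is_running() {
    pidfile="${1:-/var/log/alexandria/aidxd.pid}"
    [ -r "$pidfile" ] || return 1
    pid="$(cat "$pidfile")"
    # kill -0 sends no signal; it only tests that the process exists
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage:
#   if aidxd_is_running; then
#       echo "aidxd is already running" >&2
#   else
#       "$INSTALL_BASE/bin/aidxd" --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx
#   fi
```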

Pausing/Resuming/Stopping

# pause the daemon
kill -s USR1 <pid>
 
# resume processing
kill -s USR2 <pid>
 
# stop daemon entirely
kill -s INT <pid>
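Rather than looking up <pid> by hand, the three signals above (USR1 to pause, USR2 to resume, INT to stop) can be wrapped in a helper that reads the PID from aidxd's pidfile. This is a minimal sketch assuming the pidfile contains only the PID and that the default location /var/log/alexandria/aidxd.pid is in use unless another path is given; `aidxd_signal` is a hypothetical name, not a command shipped with aidxd.

```shell
# Hypothetical control wrapper around the documented signals:
#   pause -> USR1, resume -> USR2, stop -> INT
# $1 = action, $2 = pidfile (optional; match the --pidfile you used)
aidxd_signal() {
    case "$1" in
        pause)  sig=USR1 ;;
        resume) sig=USR2 ;;
        stop)   sig=INT ;;
        *)  echo "usage: aidxd_signal pause|resume|stop [pidfile]" >&2
            return 2 ;;
    esac
    pidfile="${2:-/var/log/alexandria/aidxd.pid}"
    # fail if the pidfile is missing, unreadable or empty
    pid="$(cat "$pidfile" 2>/dev/null)" || return 1
    [ -n "$pid" ] || return 1
    kill -s "$sig" "$pid"
}

# Usage:
#   aidxd_signal pause
#   aidxd_signal resume
#   aidxd_signal stop /path/to/custom/aidxd.pid
```

As with a plain `kill`, this must run as the daemon's user or as root for the signal to be delivered.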