Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The indexing queue is a table inside the reporting schema: t_client_index_process. When apgupd completes the load into the PostgreSQL, a row is inserted into reporting.t_client_index_process with running_statuspending. The indexing daemon, aidxd, probes the table looking for the highest priority load_id to index. Upon finding an applicable load-idaidxd proceeds to extract all publications associated with that load_id and index the documents into SOLRSolr.

Info

The column priority influences which load_id is to be processed next. Higher priority load-ids are processed before lower priority ones. This can cause out-of-order processing as we will discuss below.

...

Unfortunately, resetting all the rows in reporting.t_client_index_process is not sufficient to re-index the entire contents of the data warehouse. This has to do with the fact that the table is empty upon initial load to the on-site instance as both SOLR and PostgreSQL back file deliveries are in sync. Each the population of reporting.t_client_index_process begins with the first update to the data warehouse. Each new update that is processed is queued. To that end, the most efficient way to queue the entire data warehouse for re-indexing is to select the entire xml.t_patent_document_values table and group by modified_load_id.

...

Changing Field Definitions

Modifying the CLAIMS Direct

...

Solr Schema

The CLAIMS Direct SOLR Solr schema is a compromise between having every piece of the patent document searchable, having enough metadata stored to be retrievable in the search result set as well as indexing certain fields to be efficiently faceted. This compromise keeps the size of the index manageable while allowing efficient return of data from a search. Of course, the concept of manageable is subjective. There could very well be the need to have more fields returned during the search (stored=true) and other fields removed from storage but left searchable. In this use case, we will enable patent citations to be returned during the search. Regardless of CLAIMS Direct SOLR Solr package, we'll start editing the schema under <package>/conf-2.1.2. We are making a very simple change but one that requires a complete re-index of all documents having citations.

...

To make this change effective, SOLR Solr needs to be restarted or the collection reloaded. For the standalone package, you would just need to restart SOLRSolr:

Code Block
<package>/solrctl stop ; <package>/solrctl start

...