asrch is a command-line tool used to search an optional on-site installation of SOLR and extract data either in SOLR response format or complete CLAIMS Direct XML. It is installed as part of the package Alexandria::Client::Tools.

Code Block
asrch [Options ...] query
  --url=s       search URL (excluding /select)
                  (default=http://solr.alexandria.com:8080/alexandria-index/alexandria)
  --raw         output raw SOLR XML
  --count       output total documents found
  --maxrows=i   maximum documents to output
                  this argument is ignored when using --table
  --output=file specify output file
  --dtdpublic=pi  Public Identifier for DTD
  --dtdsystem=si  System Identifier for DTD
  Output Options
  --------
  --archive     archive result set documents into predictable path
                directory structure (Alexandria XML only)
  --archiveroot=dir
                root directory to place result set (default=.)
  --wrapper=s   wrap multiple documents in wrapper-named element
                default=patent-documents
  --pretty      indent output
  SOLR Options
  --------
  --solropt=s@  Solr options.
    e.g., --solropt=sort=f1,f2,f3 --solropt=rows=30
    See: http://wiki.apache.org/solr/CommonQueryParameters
  DWH Options
  --------
  --pgdbname     as defined in /etc/alexandria.xml (default=alexandria)
  --dbfunc       extract UDF (default=xml.f_patent_document_s)
  --table=s      If specified, a table of UCIDs/publication_ids is
                 created -- could later be used for indexing
    --truncate  truncate --table if it currently exists
  --help         print this usage and exit

...

Parameter   Description
pgdbname    As configured in /etc/alexandria.xml, the database entry pointing to the on-site CLAIMS Direct PostgreSQL instance. The default value is alexandria, as this value is pre-configured in /etc/alexandria.xml.
url         The URL of the CLAIMS Direct SOLR instance, excluding /select.
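
As a minimal sketch of how these two parameters might be passed on the command line, the following names both explicitly. The host SOLR-INSTANCE-URL is a placeholder, the query loadid:261358 is only illustrative, and passing --pgdbname with an = sign is an assumption. The fallback function is also an assumption, added so the snippet can be dry-run on a machine where asrch is not installed.

```shell
#!/bin/sh
# Dry-run fallback (an assumption for illustration): if asrch is not
# installed, define a stand-in that only echoes the command line.
if ! command -v asrch >/dev/null 2>&1; then
  asrch() { echo "asrch $*"; }
fi

# Placeholder URL of the on-site SOLR instance, without the /select suffix.
SOLR_URL='http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria'

# Count matching documents, naming the SOLR instance and the database
# entry from /etc/alexandria.xml explicitly.
asrch --url="$SOLR_URL" --pgdbname=alexandria --count 'loadid:261358'
```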

...

The following parameters specify output possibilities.

Parameter     Description
output        Write results to the named file. By default, output goes to stdout.
archive       Archive results in a predictable path structure. See aext.
archiveroot   The root directory of the archive. See aext.
wrapper       Top-level XML element used to wrap multiple documents. The default is patent-documents.
pretty        Indent the output XML.
count         Output only the count of documents found.
maxrows       Maximum number of documents to output. This parameter is ignored when using the --table option.
table         If specified, a table of UCIDs/publication_ids is created.
raw           Output results in SOLR response XML format.
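
The output parameters above can be combined in several ways; the following is a rough sketch rather than a definitive recipe. The query, the file name results.xml, the directory /tmp/asrch-archive, and the table name loadid_261358_hits are all hypothetical. The fallback function is an assumption that lets the snippet run as a dry run where asrch is not installed.

```shell
#!/bin/sh
# Dry-run fallback (an assumption for illustration): if asrch is not
# installed, define a stand-in that only echoes the command line.
if ! command -v asrch >/dev/null 2>&1; then
  asrch() { echo "asrch $*"; }
fi

QUERY='loadid:261358'   # illustrative query

# Write at most 10 documents, indented, to a file instead of stdout.
asrch --maxrows=10 --pretty --output=results.xml "$QUERY"

# Archive each result document under a hypothetical root directory.
asrch --archive --archiveroot=/tmp/asrch-archive "$QUERY"

# Store matching UCIDs/publication_ids in a hypothetical table,
# truncating it first if it already exists (--maxrows is ignored here).
asrch --table=loadid_261358_hits --truncate "$QUERY"
```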

...

Info

You can return SOLR results in a variety of formats using the query parameter wt. For a detailed list of output format options, see https://cwiki.apache.org/confluence/display/solr/Response+Writers.

Code Block
asrch  --raw \
       --url=http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria \
       --solropt='wt=xml' \
       --solropt='fl=ucid,score' \
       --solropt='rows=1' \
       --solropt='shards.info=false' \
'loadid:261358'
-> executing search ... 200 OK
 
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">14</int>
  <lst name="params">
    <str name="q">loadid:261358</str>
    <str name="qt">premium</str>
    <str name="echoParams">all</str>
    <str name="indent">true</str>
    <str name="fl">ucid,score</str>
    <str name="shards.info">false</str>
    <str name="sort">pd desc</str>
    <str name="rows">1</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="4613" start="0" maxScore="9.676081">
  <doc>
    <str name="ucid">JP-2013257331-A</str>
    <float name="score">9.617687</float></doc>
</result>
</response>
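
A response like the one above can be post-processed with standard shell tools. The sketch below pulls out the numFound attribute and the UCIDs with sed. It assumes the line layout shown in the example (one element per line) and uses a trimmed copy of that response in a temporary file; in practice the file would hold saved asrch --raw output. For anything beyond a quick check, a real XML parser is the safer choice.

```shell
#!/bin/sh
# Trimmed copy of the example response, saved to a temporary file.
resp=$(mktemp)
cat > "$resp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<result name="response" numFound="4613" start="0" maxScore="9.676081">
  <doc>
    <str name="ucid">JP-2013257331-A</str>
    <float name="score">9.617687</float></doc>
</result>
</response>
EOF

# Total number of matching documents, from the numFound attribute.
numfound=$(sed -n 's/.*numFound="\([0-9]*\)".*/\1/p' "$resp")

# One UCID per line, from the <str name="ucid"> elements.
ucids=$(sed -n 's/.*<str name="ucid">\([^<]*\)<\/str>.*/\1/p' "$resp")

echo "numFound: $numfound"
echo "$ucids"
rm -f "$resp"
```

Note that numFound reports all matches for the query (4613 here), while the number of doc elements is limited by the rows option.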