...

aext is a tool for extracting full XML documents from CLAIMS Direct™. It is installed as part of the Alexandria::Client::Tools package.

...

Parameter | Description
root | The output location of either the batches or, if --archive is specified, the root directory for files in the predictable path structure. The default is the current working directory.
prefix | The standard extract is run in batches. This parameter specifies the prefix for each output file. The default is batch.
archive | Archive the XML into a predictable path structure. The structure is as follows:

<root>/<country>/<kind>/nnnnnn/nn/nn/nn/ucid.xml

Where:
root: The destination specified with the --root parameter
country: The country of publication
kind: The kind code of the publication
nnnnnnnnnnnn: The 12-digit, zero-padded publication number, split into path segments of 6, 2, 2, and 2 digits
ucid.xml: The full XML file of the publication

For example:
./DE/A1/102008/03/79/61/DE-102008037961-A1.xml
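
The path layout above can be sketched in a few lines of Python. This is an illustration only, not aext's own code: the function name archive_path is hypothetical, and it assumes that a ucid has the form country-number-kind and that the number is left-padded with zeros to 12 characters before being split into segments of 6, 2, 2, and 2.

```python
import posixpath


def archive_path(root: str, ucid: str) -> str:
    """Build the archive path for a ucid such as 'DE-102008037961-A1'.

    Hypothetical helper: the ucid format and padding rule are assumptions
    inferred from the documented example, not taken from aext itself.
    """
    # Assumption: the ucid is <country>-<number>-<kind>.
    country, rest = ucid.split("-", 1)
    number, kind = rest.rsplit("-", 1)
    if len(number) > 12:
        raise ValueError(f"publication number longer than 12 characters: {number}")
    # Left-pad the publication number with zeros to 12 characters.
    padded = number.rjust(12, "0")
    # Split the padded number into segments of 6, 2, 2, and 2 characters.
    segments = [padded[0:6], padded[6:8], padded[8:10], padded[10:12]]
    return posixpath.join(root, country, kind, *segments, f"{ucid}.xml")


print(archive_path(".", "DE-102008037961-A1"))
# prints ./DE/A1/102008/03/79/61/DE-102008037961-A1.xml
```

A helper like this can be useful for locating a single publication in an existing archive without walking the directory tree.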

...

Parameter | Description
dbfunc | By default, aext uses the internal PostgreSQL function xml.f_patent_document_s to extract full XML documents. This parameter allows you to specify a custom extract function.

 

Examples

Extracting Using a Specific load-id

The following example uses modified_load_id 261358. The resulting XML batches will be in /tmp and will be prefixed with TEST. The logging output may be different depending on your logging configuration.

Code Block
aext --loadid=261358 --root=/tmp --prefix=TEST
 
##
## the results in /tmp
##
ls -l /tmp/TEST*.xml
-rw-r--r-- 1 root root 56626271 Apr  6 03:52 /tmp/TEST.00000001-00000001.00000500.001491465129.xml
-rw-r--r-- 1 root root 68733642 Apr  6 03:52 /tmp/TEST.00000002-00000501.00001000.001491465129.xml
-rw-r--r-- 1 root root 91214345 Apr  6 03:52 /tmp/TEST.00000003-00001001.00001500.001491465129.xml
-rw-r--r-- 1 root root 91201427 Apr  6 03:52 /tmp/TEST.00000004-00001501.00002000.001491465129.xml
-rw-r--r-- 1 root root 79966094 Apr  6 03:52 /tmp/TEST.00000005-00002001.00002500.001491465129.xml
-rw-r--r-- 1 root root 86552704 Apr  6 03:52 /tmp/TEST.00000006-00002501.00003000.001491465129.xml
-rw-r--r-- 1 root root 35221625 Apr  6 03:52 /tmp/TEST.00000007-00003001.00003500.001491465129.xml
-rw-r--r-- 1 root root 68582397 Apr  6 03:52 /tmp/TEST.00000008-00003501.00004000.001491465129.xml
-rw-r--r-- 1 root root 80311992 Apr  6 03:52 /tmp/TEST.00000009-00004001.00004500.001491465129.xml
-rw-r--r-- 1 root root 17395649 Apr  6 03:52 /tmp/TEST.00000010-00004501.00004613.001491465129.xml
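
The batch file names in the listing appear to follow a fixed pattern. The sketch below parses them under an assumed reading of the fields: prefix, batch sequence number, ordinal of the first and last document in the batch, and a trailing 12-digit number that looks like a zero-padded Unix timestamp. That interpretation is inferred from the listing, not from aext documentation, and the names Batch and parse_batch_name are hypothetical.

```python
import posixpath
import re
from typing import NamedTuple


class Batch(NamedTuple):
    prefix: str
    sequence: int   # assumed: running batch number
    first: int      # assumed: ordinal of the first document in the batch
    last: int       # assumed: ordinal of the last document in the batch
    stamp: int      # assumed: Unix timestamp of the extract run

    @property
    def count(self) -> int:
        """Number of documents implied by the first/last ordinals."""
        return self.last - self.first + 1


# Pattern inferred from names such as
# TEST.00000001-00000001.00000500.001491465129.xml
_BATCH_NAME = re.compile(
    r"^(?P<prefix>.+)\.(?P<seq>\d{8})-(?P<first>\d{8})"
    r"\.(?P<last>\d{8})\.(?P<stamp>\d{12})\.xml$"
)


def parse_batch_name(path: str) -> Batch:
    """Parse a batch file name (with or without a directory part)."""
    match = _BATCH_NAME.match(posixpath.basename(path))
    if match is None:
        raise ValueError(f"not a recognized batch file name: {path}")
    return Batch(
        prefix=match["prefix"],
        sequence=int(match["seq"]),
        first=int(match["first"]),
        last=int(match["last"]),
        stamp=int(match["stamp"]),
    )


batch = parse_batch_name("/tmp/TEST.00000010-00004501.00004613.001491465129.xml")
print(batch.sequence, batch.first, batch.last, batch.count)
```

Under this reading, the last batch in the listing holds documents 4501 through 4613, which suggests a batch size of 500 with a shorter final batch.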

Extracting Using a Table

The following example uses the table parameter. A user-defined table is created with a subset of documents which are then extracted using aext.

...

Code Block
aext --table=mySchema.t_load_261358 --archive
 
##
## abbreviated listing
##
./JP
./JP/B2
./JP/B2/000H07
./JP/B2/000H07/11
./JP/B2/000H07/11/02
./JP/B2/000H07/11/02/83
./JP/B2/000H07/11/02/83/JP-H07110283-B2.xml
./JP/B2/000H07/11/56
./JP/B2/000H07/11/56/83
./JP/B2/000H07/11/56/83/JP-H07115683-B2.xml
etc ...

Extracting Using SQL

This example takes the raw SQL used to populate the private table in the example above and passes it directly to aext as a parameter.

Code Block
aext --sqlq="SELECT t.publication_id FROM xml.t_patent_document_values AS t WHERE t.modified_load_id=261358" \
     --archive \
     --root=/tmp
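
When the same extract is driven from a script, building the argument list programmatically avoids shell-quoting problems in the SQL string. The following is a minimal sketch that assumes only the --sqlq, --archive, and --root parameters shown above; the function name build_sqlq_command is hypothetical, and the script does not run aext itself.

```python
import shlex
from typing import List


def build_sqlq_command(load_id: int, root: str) -> List[str]:
    """Return an argv list for the aext invocation shown above.

    Hypothetical helper: only flags that appear in this document are used.
    """
    # Coerce to int so that nothing but a number ends up in the SQL text.
    query = (
        "SELECT t.publication_id FROM xml.t_patent_document_values AS t "
        f"WHERE t.modified_load_id={int(load_id)}"
    )
    return ["aext", f"--sqlq={query}", "--archive", f"--root={root}"]


argv = build_sqlq_command(261358, "/tmp")
# shlex.join renders the list as a single, safely quoted shell command line.
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where aext is installed.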

Extracting Using SOLR

If the optional CLAIMS Direct SOLR instance is installed, SOLR can be used to search, filter, and extract documents. This example pulls the same set of documents as above using SOLR query syntax.

Code Block
aext --solrurl=http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria --archive --solrq='loadid:261358'

[aindex01] [2017/04/06 04:17:11] [DEBUG     ] [preparing extract ...]
[aindex01] [2017/04/06 04:17:11] [DEBUG     ] [creating t_tmp_000000000000_001491466631 ... ]
[aindex01] [2017/04/06 04:17:11] [DEBUG     ] [querying SOLR (http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria { loadid:261358 })]
[aindex01] [2017/04/06 04:17:12] [DEBUG     ] [running extract ...]
[aindex01] [2017/04/06 04:17:27] [DEBUG     ] [finalizing extract ...]
[aindex01] [2017/04/06 04:17:27] [INFO      ] [extract complete: { 4613 documents across 10 batches in 15.643s (294.894/s) }]
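
For monitoring or scripted runs, the final summary line can be parsed to pull out the totals. The sketch below assumes the exact log format shown in this example; as noted earlier, logging output depends on your logging configuration, so the pattern may need adjusting. The function name parse_summary is hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from the "extract complete" line in the sample log above.
_SUMMARY = re.compile(
    r"extract complete: \{ (?P<docs>\d+) documents across "
    r"(?P<batches>\d+) batches in (?P<seconds>[\d.]+)s "
    r"\((?P<rate>[\d.]+)/s\) \}"
)


def parse_summary(line: str) -> Optional[Dict[str, Union[int, float]]]:
    """Return the totals from a summary log line, or None if it does not match."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    return {
        "documents": int(match["docs"]),
        "batches": int(match["batches"]),
        "seconds": float(match["seconds"]),
        "rate": float(match["rate"]),
    }


line = (
    "[aindex01] [2017/04/06 04:17:27] [INFO      ] "
    "[extract complete: { 4613 documents across 10 batches "
    "in 15.643s (294.894/s) }]"
)
print(parse_summary(line))
```

The reported rate is consistent with the document count divided by the elapsed time (4613 / 15.643 ≈ 294.89 documents per second).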

Extracting Using a Custom Database Function

The following example describes a use case in which only CPC classifications are of interest. It uses a custom extract function created in a private schema.

...