Paging through results is cumbersome and inefficient. In this next segment I'd like to talk about simple and complex sorting. Sorting, used effectively with the
rows parameter, can push relevant documents onto the first page of results. Generally, you can sort on any indexed field, but you can also use query boosting and functions to influence sort order.
Missing Value Behavior
CLAIMS Direct is configured to return empty fields at the top when asc is the direction and at the bottom when desc is the direction.
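To make the ordering concrete, here is a small client-side imitation in plain Python. It is not how Solr implements this, only a sketch of the visible effect; the helper name sort_with_missing and the sample documents are invented for illustration.

```python
def sort_with_missing(docs, field, direction="asc"):
    """Order docs by field; docs lacking the field come first for asc, last for desc."""
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    ordered = sorted(present, key=lambda d: d[field], reverse=(direction == "desc"))
    return missing + ordered if direction == "asc" else ordered + missing


# Invented sample documents; "B" has no value for nclms.
docs = [
    {"ucid": "A", "nclms": 12},
    {"ucid": "B"},
    {"ucid": "C", "nclms": 3},
]

print([d["ucid"] for d in sort_with_missing(docs, "nclms", "asc")])   # ['B', 'C', 'A']
print([d["ucid"] for d in sort_with_missing(docs, "nclms", "desc")])  # ['A', 'C', 'B']
```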
Static Field Sorts
The most common field to sort on will undoubtedly be the date of publication, pd desc. This pushes newly published documents to the top of the result set. CLAIMS Direct uses pd desc as the default sort order instead of score. If two documents share the same publication date (or any sort field value), the tie is broken by the internal Lucene document IDs, ascending.
Multiple sort criteria are also possible and are used mostly to stabilize the initial sort results. The syntax for the sort parameter is a comma-separated list of field/direction pairs, e.g., pd desc,ad asc. Field and direction are separated by a space.
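If you assemble requests in code, the sort string can be built from pairs rather than typed by hand. The following is a minimal sketch in Python; build_sort is a hypothetical helper, and the q value is only the standard Solr match-all placeholder, not a query from this article.

```python
from urllib.parse import urlencode


def build_sort(criteria):
    """Join (field, direction) pairs into a comma-separated Solr sort string."""
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"bad direction for {field}: {direction}")
    return ",".join(f"{field} {direction}" for field, direction in criteria)


sort = build_sort([("pd", "desc"), ("ad", "asc")])
print(sort)  # pd desc,ad asc

# urlencode takes care of escaping the space and the comma for the URL.
params = {"q": "*:*", "rows": 10, "sort": sort}
print(urlencode(params))
```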
Dynamic Field Sort
As mentioned, CLAIMS Direct uses a default sort of pd desc. You are, of course, free to sort on the dynamic score field to mimic the default Solr distribution behavior.
Random Sort (for Random Results)
CLAIMS Direct uses a utility field to allow random sorting. The field is input as rnd_n, where n is any random integer, e.g., rnd_1234. Specifying the same value for n will yield the same sort results, so the randomness is only as random as the input integer. Returning random results (sorting on rnd_n) is an efficient way to sample a wide variety of data.
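A small sketch of how a client might generate the field name. The helper random_sort_field is hypothetical, the range for n is an arbitrary choice, and the asc direction is only there because a sort clause needs one.

```python
import random


def random_sort_field(n=None):
    """Return a sort clause on the rnd_n utility field; pick n at random if not given."""
    if n is None:
        n = random.randint(0, 999_999)  # arbitrary range, any integer works
    return f"rnd_{n} asc"


print(random_sort_field(1234))  # rnd_1234 asc
print(random_sort_field())      # a different field name on most runs
```

Keeping the chosen n around and reusing it is what lets you page through the same random sample more than once.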
Sorting on Functions
A previous segment discussed using functions as pseudo fields to return in the result set. You can also use the result of a function as a sort criterion. An interesting use case for sorting on a calculated value is measuring the latency between the application filing date and the grant date. You can calculate this value by subtracting the filing date from the publication date (using the tdate field types of the filing and publication dates). The following examples use field aliasing with the fl parameter as well as sorting by function to return documents with low latency between filing and grant (asc) and high latency (desc), displayed in days (there are 86400000 milliseconds in a day).
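One way such a request could be put together is sketched below in Python. It relies on the Solr function queries ms (difference between two dates in milliseconds) and div. The field names are assumptions: the text above does not name the tdate fields, so pd and ad are used as stand-ins and may need to be replaced by the actual tdate field names in the schema. The alias days and the helper latency_params are likewise invented.

```python
from urllib.parse import urlencode

MS_PER_DAY = 86400000  # milliseconds in a day


def latency_params(direction, publication_field="pd", filing_field="ad", rows=5):
    """Build query parameters that return and sort by filing-to-grant latency in days.

    publication_field and filing_field are assumed names for the tdate fields.
    """
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    latency = f"div(ms({publication_field},{filing_field}),{MS_PER_DAY})"
    return {
        "q": "*:*",                      # placeholder query
        "rows": rows,
        "fl": f"ucid,days:{latency}",    # alias the function result as "days"
        "sort": f"{latency} {direction}",
    }


low = latency_params("asc")    # shortest latency first
high = latency_params("desc")  # longest latency first
print(low["sort"])   # div(ms(pd,ad),86400000) asc
print(urlencode(high))
```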
In this first of a series of blogs about Solr result sets I'd like to talk about returned fields, both static and dynamic.
Any field that is stored in Solr can be returned in the result set. The following table shows all available stored fields:
| ab_* | all abstracts | mixed: alexandria_text, alexandria_asian_text | true |
| an | application number (tokenized) | alexandria_token | true |
| anorig | original, patent office format application number | alexandria_token | true |
| anseries | US application series code | alexandria_string | true |
| anucid | standardized filing identifier | string | true |
| icl,cpcl,uscl,ficl | classifications suitable for display | string | false |
| ifi_name_current_id | current assignee identifier | string | true |
| ifi_name_original_id | original assignee identifier | string | true |
| loadid | meta load identifier | tint | true |
| nclms | number of claims | tint | true |
| nindepclms | number of independent claims | tint | true |
| pa | applicants/assignees (all formats) | alexandria_token | true |
| pn | patent number (tokenized) | alexandria_token | true |
| pri | priority filing number (tokenized) | alexandria_string | true |
| prid | earliest priority filing date | tint | true |
| priorig | original, patent office format priority number | alexandria_string | true |
| timestamp | last modification stamp | tdate | true |
| ttl_* | all titles | mixed: alexandria_text, alexandria_asian_text | true |
| ucid | unique character identifier | string | true |
The shorthand to return all static fields is the asterisk, fl=*.
As with all examples, we discuss results in JSON format.
Some things to notice: multi-valued fields are returned as JSON arrays. Also, because some fields are copyFields containing multiple values, duplicate entries may appear.
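If the duplicates get in the way on the client side, they are easy to drop after parsing. The sketch below uses an invented response fragment in the usual Solr JSON layout (response, then docs); the identifier and names in it are made up, and unique_values is a hypothetical helper.

```python
import json

# Invented response fragment; values are for illustration only.
raw = """
{"response": {"docs": [
  {"ucid": "XX-0000001-A1", "pa": ["ACME CORP", "Acme Corp", "ACME CORP"]}
]}}
"""


def unique_values(values):
    """Drop repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(values))


doc = json.loads(raw)["response"]["docs"][0]
print(isinstance(doc["pa"], list))   # True: multi-valued fields arrive as arrays
print(unique_values(doc["pa"]))      # ['ACME CORP', 'Acme Corp']
```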
Dynamic fields are fields that are generated either internally, like the relevance score, or by computing values with function queries. Dynamic fields must be listed explicitly in the fl parameter, e.g., fl=score. Below, we request the static ucid field together with the relevance score for 5 results.
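As a rough sketch, the parameters for such a request and the handling of the result might look like this in Python. The q value is a placeholder, and the response fragment is invented to show the shape of the data, not actual output.

```python
import json
from urllib.parse import urlencode

# q is a placeholder query; replace it with a real search expression.
params = {"q": "pa:example", "fl": "ucid,score", "rows": 5, "wt": "json"}
query_string = urlencode(params)
print(query_string)

# Invented response fragment in the usual Solr JSON layout (response -> docs).
sample = '{"response": {"docs": [{"ucid": "XX-0000001-A1", "score": 1.5}]}}'
for doc in json.loads(sample)["response"]["docs"]:
    print(doc["ucid"], doc["score"])
```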
Another type of dynamic field makes use of function queries. A simple, yet contrived, example would be to return the number of dependent claims. CLAIMS Direct stores the total number of claims in nclms as well as the number of independent claims in nindepclms, both as tint. Simple subtraction yields the desired result.
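A sketch of what that could look like, using the Solr sub function in the fl parameter. The client-side helper dependent_claims is hypothetical and only mirrors the arithmetic on an invented document.

```python
# Ask Solr to compute the difference for each document.
fl = ",".join(["ucid", "sub(nclms,nindepclms)"])
print(fl)  # ucid,sub(nclms,nindepclms)


def dependent_claims(doc):
    """Same subtraction done on the client, for comparison."""
    return doc["nclms"] - doc["nindepclms"]


# Invented document: 20 claims in total, 3 of them independent.
print(dependent_claims({"nclms": 20, "nindepclms": 3}))  # 17
```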
Field aliasing allows individual fields in the result set to be renamed. This functionality is particularly useful for giving meaningful names to pseudo fields created by function queries, as in the example above. Any field can be renamed using aliases. The syntax is: alias-field-name:original-field-name. The only exception is ucid: trying to alias ucid returns an HTTP 500.
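A minimal sketch of building aliased fl entries in Python, with a guard for the ucid exception so the mistake is caught before the request is sent. The helper alias and the alias name depclms are invented for illustration.

```python
def alias(name, expression):
    """Return an fl entry of the form alias-field-name:original-field-name."""
    if expression.strip() == "ucid":
        # Aliasing ucid would produce an HTTP 500, so refuse it up front.
        raise ValueError("ucid cannot be aliased")
    return f"{name}:{expression}"


fl = ",".join(["ucid", alias("depclms", "sub(nclms,nindepclms)")])
print(fl)  # ucid,depclms:sub(nclms,nindepclms)
```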