Paging results is cumbersome and inefficient. In this next segment I'd like to talk about simple and complex sorting. Sorting, used effectively with the rows parameter, can push relevant documents into the first page of results. Generally, you can sort on any indexed field, but you can also use query boosting and functions to influence sort order.
Missing Value Behavior
CLAIMS Direct is configured to return empty fields at the top when asc is the direction and at the bottom when desc is the direction.
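The placement of empty fields described above can be pictured with a small client-side sketch. This is only an illustration of the observable ordering, not how the index implements it; the function name sort_with_missing and the sample documents are hypothetical.

```python
def sort_with_missing(docs: list[dict], field: str, direction: str) -> list[dict]:
    """Sort docs on `field`, treating a missing value as lower than any real value."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")

    def key(doc: dict) -> tuple:
        value = doc.get(field)
        # (False, 0) for a missing value sorts below every (True, value) pair.
        return (value is not None, value if value is not None else 0)

    # Reversing for desc moves the missing values from the top to the bottom.
    return sorted(docs, key=key, reverse=(direction == "desc"))


# Hypothetical documents; "B" has no publication date.
sample = [
    {"ucid": "A", "pd": 20160225},
    {"ucid": "B"},
    {"ucid": "C", "pd": 20160101},
]
print([d["ucid"] for d in sort_with_missing(sample, "pd", "asc")])
print([d["ucid"] for d in sort_with_missing(sample, "pd", "desc")])
```

With asc the document lacking a pd value comes first, and with desc it comes last, which matches the configured behavior described above.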
Static Field Sorts
The most common field to sort on will undoubtedly be the date of publication, pd desc. This pushes the most recently published documents to the top of the result set. CLAIMS Direct uses pd desc as its default sort order instead of score. If two documents share the same publication date (or any other sort field value), the tie is broken by the internal Lucene document ID, ascending.
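The default ordering and its tie-break can be emulated client-side as a sanity check. The sketch below is an assumption-laden illustration rather than the server's implementation: the function name default_order is hypothetical, and the sample documents are copied from the sonar example response in this section.

```python
# Documents from the ab_en:sonar example response, deliberately out of order.
sample_docs = [
    {"[docid]": 5088703, "pd": 20160218, "ucid": "US-20160047906-A1"},
    {"[docid]": 4335577, "pd": 20160223, "ucid": "US-9268020-B2"},
    {"[docid]": 5536680, "pd": 20160225, "ucid": "US-20160054444-A1"},
    {"[docid]": 4986794, "pd": 20160218, "ucid": "US-20160049143-A1"},
    {"[docid]": 1274637, "pd": 20160224, "ucid": "EP-2986998-A1"},
]


def default_order(docs: list[dict]) -> list[dict]:
    """Emulate pd desc with the internal document ID (ascending) as tie-breaker."""
    # Negating pd gives descending dates; [docid] stays ascending within a tie.
    return sorted(docs, key=lambda d: (-d["pd"], d["[docid]"]))


for doc in default_order(sample_docs):
    print(doc["pd"], doc["[docid]"], doc["ucid"])
```

The two documents published on 20160218 come out with the lower [docid] first, as in the example response.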
/search/query?q=ab_en:sonar&rows=5&fl=ucid,[docid],pd
"docs" : [
  { "[docid]" : 5536680, "pd" : 20160225, "ucid" : "US-20160054444-A1" },
  { "[docid]" : 1274637, "pd" : 20160224, "ucid" : "EP-2986998-A1" },
  { "[docid]" : 4335577, "pd" : 20160223, "ucid" : "US-9268020-B2" },
  { "[docid]" : 4986794, "pd" : 20160218, "ucid" : "US-20160049143-A1" },
  { "[docid]" : 5088703, "pd" : 20160218, "ucid" : "US-20160047906-A1" }
]
Multiple sort criteria are also possible and are used mostly to stabilize the initial sort results. The syntax for the sort parameter is a comma-separated list of field/direction pairs, e.g., pd desc,ad asc. Field and direction are separated by a space.
/search/query?q=ab_en:sonar&rows=5&fl=ucid,pd&sort=pd desc,ucid desc
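When requests are assembled programmatically, the sort value can be built from a list of field/direction pairs and then URL-encoded along with the other parameters. The following is a minimal sketch using only the Python standard library; the helper names build_sort and build_query_path are hypothetical, and only the /search/query path and parameter names come from the examples above.

```python
from urllib.parse import urlencode


def build_sort(criteria: list[tuple[str, str]]) -> str:
    """Join (field, direction) pairs into a sort value such as 'pd desc,ucid desc'."""
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort direction: {direction!r}")
        # Field and direction are separated by a single space.
        parts.append(f"{field} {direction}")
    # Pairs are separated by commas, most significant criterion first.
    return ",".join(parts)


def build_query_path(q: str, rows: int, fl: list[str],
                     criteria: list[tuple[str, str]]) -> str:
    """Assemble a percent-encoded request path for the search endpoint."""
    params = {"q": q, "rows": rows, "fl": ",".join(fl), "sort": build_sort(criteria)}
    return "/search/query?" + urlencode(params)


print(build_sort([("pd", "desc"), ("ucid", "desc")]))
print(build_query_path("ab_en:sonar", 5, ["ucid", "pd"],
                       [("pd", "desc"), ("ucid", "desc")]))
```

Encoding the parameters matters here because the sort value contains spaces and commas, which are not safe to send unescaped from every HTTP client.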
Dynamic Field Sort
As mentioned, CLAIMS Direct uses a default sort of pd desc. You are, of course, free to sort on the dynamic score field to mimic the default behavior of a stock Solr distribution.
/search/query?q=ab_en:sonar&rows=5&fl=ucid,score,pd&sort=score desc
"docs" : [
  { "pd" : 19960322, "ucid" : "FR-2724734-A1", "score" : 3.781385 },
  { "pd" : 20120523, "ucid" : "EP-2454606-A1", "score" : 3.3196893 },
  { "pd" : 20151229, "ucid" : "US-9223022-B2", "score" : 3.3196893 },
  { "pd" : 20150910, "ucid" : "US-20150253425-A1", "score" : 3.3196893 },
  { "pd" : 20150521, "ucid" : "AU-2010273841-B2", "score" : 3.2852356 }
]
Random Sort (for Random Results)
CLAIMS Direct uses a utility field to allow random sorting. The field is input as rnd_n, where n is any random integer, e.g., rnd_1234. Specifying the same value for n will yield the same sort results, so the randomness is only as random as the input integer. Returning random results (sorting on rnd_n) is an efficient way to sample a wide variety of data.
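Because the same n reproduces the same ordering, it can be useful to generate n on the client and keep it alongside the sample. The sketch below shows one way to do that; the helper name random_sort_clause and the upper bound of 1,000,000 are arbitrary assumptions, not limits documented for the rnd_n field.

```python
import random
from typing import Optional


def random_sort_clause(seed: Optional[int] = None) -> tuple[int, str]:
    """Return (seed, sort value) for a random sort; pass a seed to repeat a sample."""
    if seed is None:
        # Assumed range; any integer should serve as n.
        seed = random.randrange(1, 1_000_000)
    return seed, f"rnd_{seed} desc"


# A fresh sample: record the seed so the same ordering can be requested again.
seed, clause = random_sort_clause()
print(seed, clause)

# Repeating an earlier sample by reusing its seed.
print(random_sort_clause(1234)[1])
```

Logging the seed with each sampled result set makes it possible to re-request exactly the same ordering later.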
/search/query?q=ab_en:sonar&rows=3&fl=ucid,score,pd&sort=rnd_1234 desc
"docs" : [
  { "pd" : 20150902, "ucid" : "CN-204605667-U", "score" : 1.6596402 },
  { "pd" : 20110224, "ucid" : "WO-2010125029-A4", "score" : 1.1842703 },
  { "pd" : 19910719, "ucid" : "FR-2657063-A1", "score" : 1.760531 }
]
/search/query?q=ab_en:sonar&rows=3&fl=ucid,score,pd&sort=rnd_12345 desc
"docs" : [
  { "pd" : 19720119, "ucid" : "GB-1260387-A", "score" : 0.4986458 },
  { "pd" : 20160114, "ucid" : "US-20160011310-A1", "score" : 2.3957791 },
  { "pd" : 20130425, "ucid" : "WO-2013056893-A1", "score" : 0.9474162 }
]
Sorting on Functions
A previous segment discussed using functions as pseudo fields to return in the result set. You can also use the results of a function as sort criteria. An interesting use case for sorting on a calculated value is measuring the latency between the application filing date and the grant date. You can calculate this value by subtracting ad_d from pd_d (the tdate field types of the filing and publication dates). The following examples use field aliasing with the fl parameter as well as sorting by function to return documents with low latency between filing and grant (asc) and high latency (desc), displayed in days (there are 86400000 milliseconds in a day).
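The days_to_grant arithmetic can be cross-checked on the client from the integer ad and pd values returned in the result set. This is a minimal sketch with hypothetical helper names; it works on whole calendar dates, so it yields integers, whereas the server-side division returns floating-point values that are presumably subject to rounding (e.g., 321.0005).

```python
from datetime import date

# 24 hours * 60 minutes * 60 seconds * 1000 milliseconds
MS_PER_DAY = 24 * 60 * 60 * 1000


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer date such as 20130118 into a date object."""
    year, rest = divmod(value, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def days_to_grant(ad: int, pd: int) -> int:
    """Whole days between filing date (ad) and publication date (pd)."""
    return (parse_yyyymmdd(pd) - parse_yyyymmdd(ad)).days


print(MS_PER_DAY)                         # 86400000
print(days_to_grant(20130118, 20131205))  # 321 (DE-102013000802-B3)
print(days_to_grant(20150304, 20160126))  # 328 (US-9244168-B2)
```

The two results agree, after rounding, with the days_to_grant values reported for the same documents in the ascending example that follows.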
/search/query?rows=5&q=ab_en:sonar +ifi_publication_type:G&fl=ucid,ad,pd,days_to_grant:div(sub(pd_d,ad_d),86400000)&sort=sub(pd_d,ad_d) asc
"docs" : [
  { "pd" : 20131205, "ad" : 20130118, "ucid" : "DE-102013000802-B3", "days_to_grant" : 321.0005 },
  { "pd" : 20160126, "ad" : 20150304, "ucid" : "US-9244168-B2", "days_to_grant" : 328.0001 },
  { "pd" : 20140904, "ad" : 20130904, "ucid" : "DE-102013014640-B3", "days_to_grant" : 365.00064 },
  { "pd" : 20160216, "ad" : 20141208, "ucid" : "US-9261391-B2", "days_to_grant" : 434.99976 },
  { "pd" : 19931007, "ad" : 19920421, "ucid" : "DE-4213121-C1", "days_to_grant" : 534.00006 }
]
/search/query?rows=5&q=ab_en:sonar +ifi_publication_type:G&fl=ucid,ad,pd,days_to_grant:div(sub(pd_d,ad_d),86400000)&sort=sub(pd_d,ad_d) desc
"docs" : [
  { "pd" : 19890719, "ad" : 19730214, "ucid" : "GB-1605319-A", "days_to_grant" : 5999 },
  { "pd" : 19781220, "ad" : 19640511, "ucid" : "GB-1536653-A", "days_to_grant" : 5336.0005 },
  { "pd" : 20160209, "ad" : 20100426, "ucid" : "US-9255982-B2", "days_to_grant" : 2115 },
  { "pd" : 20160223, "ad" : 20120210, "ucid" : "US-9268020-B2", "days_to_grant" : 1473.9987 },
  { "pd" : 20150217, "ad" : 20110322, "ucid" : "US-RE45379-E1", "days_to_grant" : 1428.0006 }
]