Blog from February, 2016

Understanding the Solr Result Set - Sort Parameter

Paging results is cumbersome and inefficient. In this next segment, I'd like to talk about simple and complex sorting. Sorting, used effectively with the rows parameter, can push relevant documents onto the first page of results. Generally, you can sort on any indexed field, but you can also use query boosting and functions to influence sort order.

Missing Value Behavior

CLAIMS Direct is configured to return documents with empty values in the sort field at the top when the direction is asc and at the bottom when the direction is desc.
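
The behavior described above can be imitated client-side. The following is a minimal sketch, not the way Solr implements it; the function name sort_with_missing and the three sample documents are hypothetical and exist only for illustration.

```python
def sort_with_missing(docs, field, direction="asc"):
    """Order docs by field; docs lacking the field go first for asc, last for desc."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    present.sort(key=lambda d: d[field], reverse=(direction == "desc"))
    return missing + present if direction == "asc" else present + missing


# Hypothetical documents; "C" has no pd value.
docs = [{"id": "A", "pd": 20160225}, {"id": "B", "pd": 20160218}, {"id": "C"}]
asc_ids = [d["id"] for d in sort_with_missing(docs, "pd", "asc")]
desc_ids = [d["id"] for d in sort_with_missing(docs, "pd", "desc")]
print(asc_ids)   # ['C', 'B', 'A']
print(desc_ids)  # ['A', 'B', 'C']
```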

Static Field Sorts

The most common field to sort on will undoubtedly be the date of publication: pd desc. This pushes newly published documents to the top of the result set. CLAIMS Direct uses pd desc as the default sort order instead of score. If two documents share the same publication date (or any other sort field value), the tie is broken by the internal Lucene document ID in ascending order.

/search/query?q=ab_en:sonar&rows=5&fl=ucid,[docid],pd
         "docs" : [
            {
               "[docid]" : 5536680,
               "pd" : 20160225,
               "ucid" : "US-20160054444-A1"
            },
            {
               "[docid]" : 1274637,
               "pd" : 20160224,
               "ucid" : "EP-2986998-A1"
            },
            {
               "[docid]" : 4335577,
               "pd" : 20160223,
               "ucid" : "US-9268020-B2"
            },
            {
               "[docid]" : 4986794,
               "pd" : 20160218,
               "ucid" : "US-20160049143-A1"
            },
            {
               "[docid]" : 5088703,
               "pd" : 20160218,
               "ucid" : "US-20160047906-A1"
            }
         ]
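
To make the tie-breaking rule concrete, here is a small sketch that reproduces the ordering of three of the documents above using a composite sort key. It only mirrors the documented ordering (pd descending, then document ID ascending) in plain Python; the key name docid stands in for the [docid] pseudo field.

```python
# Three documents taken from the result above, deliberately shuffled.
docs = [
    {"docid": 5088703, "pd": 20160218, "ucid": "US-20160047906-A1"},
    {"docid": 4335577, "pd": 20160223, "ucid": "US-9268020-B2"},
    {"docid": 4986794, "pd": 20160218, "ucid": "US-20160049143-A1"},
]

# Negating pd sorts it descending; docid then breaks ties in ascending order.
ordered = sorted(docs, key=lambda d: (-d["pd"], d["docid"]))
ordered_ucids = [d["ucid"] for d in ordered]
print(ordered_ucids)
# ['US-9268020-B2', 'US-20160049143-A1', 'US-20160047906-A1']
```

The two documents published on 20160218 come out in the same relative order as in the Solr response because 4986794 is smaller than 5088703.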

Multiple sort criteria are also possible and are mostly used to stabilize the initial sort results. The syntax for the sort parameter is a comma-separated list of field/direction pairs, e.g., pd desc,ad asc. Field and direction are separated by a space.

/search/query?q=ab_en:sonar&rows=5&fl=ucid,pd&sort=pd desc,ucid desc
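
When requests are assembled in code, the sort value can be built from field/direction pairs. This is a minimal sketch; build_sort is a hypothetical helper, and only the /search/query path and the parameter names come from the examples in this post. The space between field and direction is encoded as + in the resulting query string.

```python
from urllib.parse import urlencode


def build_sort(criteria):
    """Join (field, direction) pairs into a Solr sort value such as 'pd desc,ucid desc'."""
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)


sort = build_sort([("pd", "desc"), ("ucid", "desc")])
params = {"q": "ab_en:sonar", "rows": 5, "fl": "ucid,pd", "sort": sort}
# Keep ':' and ',' readable; spaces are encoded as '+'.
url = "/search/query?" + urlencode(params, safe=":,")
print(url)
# /search/query?q=ab_en:sonar&rows=5&fl=ucid,pd&sort=pd+desc,ucid+desc
```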

Dynamic Field Sort

As mentioned, CLAIMS Direct uses a default sort of pd desc. You are, of course, free to use the dynamic score field to mimic the default Solr distribution behavior.

/search/query?q=ab_en:sonar&rows=5&fl=ucid,score,pd&sort=score desc
         "docs" : [
            {
               "pd" : 19960322,
               "ucid" : "FR-2724734-A1",
               "score" : 3.781385
            },
            {
               "pd" : 20120523,
               "ucid" : "EP-2454606-A1",
               "score" : 3.3196893
            },
            {
               "pd" : 20151229,
               "ucid" : "US-9223022-B2",
               "score" : 3.3196893
            },
            {
               "pd" : 20150910,
               "ucid" : "US-20150253425-A1",
               "score" : 3.3196893
            },
            {
               "pd" : 20150521,
               "ucid" : "AU-2010273841-B2",
               "score" : 3.2852356
            }
         ]

Random Sort (for Random Results)

CLAIMS Direct uses a utility field to allow random sorting. The field is specified as rnd_n, where n is any integer, e.g., rnd_1234. Specifying the same value for n will yield the same sort results, so the ordering is only as random as the input integer. Returning random results (sorting on rnd_n) is an efficient way to sample a wide variety of data.

/search/query?q=ab_en:sonar&rows=3&fl=ucid,score,pd&sort=rnd_1234 desc
/search/query?q=ab_en:sonar&rows=3&fl=ucid,score,pd&sort=rnd_12345 desc
         "docs" : [
            {
               "pd" : 20150902,
               "ucid" : "CN-204605667-U",
               "score" : 1.6596402
            },
            {
               "pd" : 20110224,
               "ucid" : "WO-2010125029-A4",
               "score" : 1.1842703
            },
            {
               "pd" : 19910719,
               "ucid" : "FR-2657063-A1",
               "score" : 1.760531
            }
         ]
//
         "docs" : [
            {
               "pd" : 19720119,
               "ucid" : "GB-1260387-A",
               "score" : 0.4986458
            },
            {
               "pd" : 20160114,
               "ucid" : "US-20160011310-A1",
               "score" : 2.3957791
            },
            {
               "pd" : 20130425,
               "ucid" : "WO-2013056893-A1",
               "score" : 0.9474162
            }
         ],
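
The "same n, same order" property is the familiar behavior of a seeded pseudo-random generator. The sketch below illustrates that idea with Python's random module; it is an analogy only and does not reproduce how Solr orders documents for rnd_n. The function name random_order is hypothetical.

```python
import random


def random_order(ucids, n):
    """Return ucids in a pseudo-random order that is fully determined by the seed n."""
    rng = random.Random(n)  # private generator, so the global random state is untouched
    shuffled = list(ucids)
    rng.shuffle(shuffled)
    return shuffled


ucids = [
    "CN-204605667-U", "WO-2010125029-A4", "FR-2657063-A1",
    "GB-1260387-A", "US-20160011310-A1", "WO-2013056893-A1",
]
first = random_order(ucids, 1234)
second = random_order(ucids, 1234)
other = random_order(ucids, 12345)
print(first == second)  # True: the same seed always gives the same order
```

To get a fresh sample on every request, the caller would need to vary n, for example by drawing it from a random source of its own.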

Sorting on Functions

A previous segment discussed using functions as pseudo fields to return in the result set. You can also use the results of a function as sort criteria. An interesting use case for sorting on a calculated value is measuring the latency between application filing date and grant date. You can calculate this value by subtracting ad_d from pd_d (the tdate versions of the filing and publication dates). The following examples use field aliasing with the fl parameter as well as sorting by function to return documents with low latency between filing and grant (asc) and high latency (desc). The difference between two tdate values is in milliseconds, so dividing by 86400000 (the number of milliseconds in a day) displays it in days.

/search/query?rows=5&q=ab_en:sonar +ifi_publication_type:G&fl=ucid,ad,pd,days_to_grant:div(sub(pd_d,ad_d),86400000)&sort=sub(pd_d,ad_d) asc
/search/query?rows=5&q=ab_en:sonar +ifi_publication_type:G&fl=ucid,ad,pd,days_to_grant:div(sub(pd_d,ad_d),86400000)&sort=sub(pd_d,ad_d) desc
         "docs" : [
            {
               "pd" : 20131205,
               "ad" : 20130118,
               "ucid" : "DE-102013000802-B3",
               "days_to_grant" : 321.0005
            },
            {
               "pd" : 20160126,
               "ad" : 20150304,
               "ucid" : "US-9244168-B2",
               "days_to_grant" : 328.0001
            },
            {
               "pd" : 20140904,
               "ad" : 20130904,
               "ucid" : "DE-102013014640-B3",
               "days_to_grant" : 365.00064
            },
            {
               "pd" : 20160216,
               "ad" : 20141208,
               "ucid" : "US-9261391-B2",
               "days_to_grant" : 434.99976
            },
            {
               "pd" : 19931007,
               "ad" : 19920421,
               "ucid" : "DE-4213121-C1",
               "days_to_grant" : 534.00006
            }
         ]
//
         "docs" : [
            {
               "pd" : 19890719,
               "ad" : 19730214,
               "ucid" : "GB-1605319-A",
               "days_to_grant" : 5999
            },
            {
               "pd" : 19781220,
               "ad" : 19640511,
               "ucid" : "GB-1536653-A",
               "days_to_grant" : 5336.0005
            },
            {
               "pd" : 20160209,
               "ad" : 20100426,
               "ucid" : "US-9255982-B2",
               "days_to_grant" : 2115
            },
            {
               "pd" : 20160223,
               "ad" : 20120210,
               "ucid" : "US-9268020-B2",
               "days_to_grant" : 1473.9987
            },
            {
               "pd" : 20150217,
               "ad" : 20110322,
               "ucid" : "US-RE45379-E1",
               "days_to_grant" : 1428.0006
            }
         ]
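
The days_to_grant values can be cross-checked outside Solr. This is a minimal sketch that repeats the same arithmetic (difference in milliseconds divided by 86400000) on the integer ad and pd values from the results above; the function name is hypothetical, and the dates are assumed to carry no time-of-day component.

```python
from datetime import datetime

MS_PER_DAY = 86_400_000  # milliseconds in a day, as used in the div() call above


def days_to_grant(ad, pd):
    """Days between filing (ad) and publication (pd), both given as YYYYMMDD integers."""
    filed = datetime.strptime(str(ad), "%Y%m%d")
    granted = datetime.strptime(str(pd), "%Y%m%d")
    milliseconds = (granted - filed).total_seconds() * 1000
    return milliseconds / MS_PER_DAY


print(days_to_grant(20100426, 20160209))  # 2115.0 (US-9255982-B2)
print(days_to_grant(19730214, 19890719))  # 5999.0 (GB-1605319-A)
```

Both values agree with the Solr output. The small fractional parts that appear in some Solr results (e.g., 321.0005) are presumably floating-point artifacts of the function query rather than real partial days.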

Understanding the Solr Result Set - fl parameter

In this first of a series of blogs about Solr result sets, I'd like to talk about returned fields, both static and dynamic.

Stored Fields

Any field that is stored in Solr can be returned in the result set. The following table shows all available stored fields:

| Name | Description | Type | Indexed |
|---|---|---|---|
| ab_* | all abstracts | mixed: alexandria_text, alexandria_asian_text | true |
| ad | filing date | tint | true |
| an | application number (tokenized) | alexandria_token | true |
| anorig | original, patent office format application number | alexandria_token | true |
| anseries | US application series code | alexandria_string | true |
| anucid | standardized filing identifier | string | true |
| ecla | ECLA classification | alexandria_string | true |
| fam | family identifier | string | true |
| fterm | F-Terms | alexandria_string | true |
| icl,cpcl,uscl,ficl | classifications suitable for display | string | false |
| ifi_name_current_id | current assignee identifier | string | true |
| ifi_name_original_id | original assignee identifier | string | true |
| ifi_pa | IFI assignees | alexandria_token | true |
| inv | inventors | alexandria_token | true |
| loadid | meta load identifier | tint | true |
| nclms | number of claims | tint | true |
| nindepclms | number of independent claims | tint | true |
| pa | applicants/assignees (all formats) | alexandria_token | true |
| pd | publication date | tint | true |
| pn | patent number (tokenized) | alexandria_token | true |
| pri | priority filing number (tokenized) | alexandria_string | true |
| prid | earliest priority filing date | tint | true |
| priorig | original, patent office format priority number | alexandria_string | true |
| timestamp | last modification stamp | tdate | true |
| ttl_* | all titles | mixed: alexandria_text, alexandria_asian_text | true |
| ucid | unique character identifier | string | true |


The shorthand to return all static fields is the asterisk: fl=*

/search/query?q=ucid:US-20160054444-A1&fl=*&rows=1

As with all examples, we discuss results in JSON format.

         "docs" : [
            {
               "timestamp" : "2016-02-27T08:29:31.936Z",
               "ucid" : "US-20160054444-A1",
               "loadid" : 229946,
               "nclms" : 19,
               "nindepclms" : 2,
               "fam" : "-1",
               "pn" : "US-20160054444-A1 US20160054444A1",
               "pd" : 20160225,
               "an" : "US-201514833357-A",
               "anseries" : "14",
               "anorig" : "US-14833357",
               "ad" : 20150824,
               "pri" : [
                  "FR-1457951"
               ],
               "prid" : 20140825,
               "priorig" : [
                  "FR-1457951"
               ],
               "cpcl" : [
                  "G01S  15/588       20130101 LI20160225BHUS        ",
                  "G01S  15/60        20130101 FI20160225BHUS        "
               ],
               "icl" : [
                  "G01S  15/58        20060101ALI20160225BHUS        ",
                  "G01S  15/60        20060101AFI20160225BHUS        "
               ],
               "ifi_pa" : [
                  "ECA ROBOTICS",
                  "ECA ROBOTICS"
               ],
               "inv" : [
                  "Pinto, Marc"
               ],
               "pa" : [
                  "ECA ROBOTICS",
                  "ECA ROBOTICS"
               ],
               "ttl_en" : [
                  "METHOD AND SONAR DEVICE FOR DETERMINING THE SPEED OF MOVEMENT OF A NAVAL VEHICLE IN RELATION TO THE SEA BED"
               ],
               "ab_en" : [
                  "<abstract mxw-id=\"PA168904151\" lang=\"EN\" load-source=\"patent-office\">\n    <p id=\"p-0001\" num=\"0000\">Sonar intended to be carried by a naval vehicle including at least one element for transmitting an acoustic signal, at least one element for receiving the acoustic signal transmitted and reflected on the sea bed and at least two phase centres (PC<sub>1</sub>, PC<sub>2</sub>) that are disposed along a first and a second axis (v<sub>1</sub>, v<sub>2</sub>), respectively, forming an interferometric antenna. The sonar includes elements for determining the speed of movement of the vehicle as a function of the computed value of the relative trim angle (β) formed between a straight line (d<sub>1</sub>) that is perpendicular to the axes (v<sub>1</sub>, v<sub>2</sub>) of the phase centres and a straight line (d<sub>2</sub>) that is perpendicular to the sea bed (F) and of the value determined for the angle of sight.</p>\n  </abstract>"
               ]
            }
         ],

Some things to notice: multi-valued fields are returned as JSON arrays. Also, as inv, pa, and ifi_pa are copyFields containing multiple values, duplicate entries may appear.
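
If the duplicates are unwanted, they can be removed on the client after parsing the response. The following is a minimal sketch; dedupe is a hypothetical helper, and the document shown is a trimmed copy of the result above.

```python
def dedupe(values):
    """Drop repeated entries while keeping the first occurrence and the original order."""
    return list(dict.fromkeys(values))


# Trimmed copy of the document above: lists are multi-valued fields.
doc = {
    "ucid": "US-20160054444-A1",
    "ifi_pa": ["ECA ROBOTICS", "ECA ROBOTICS"],
    "inv": ["Pinto, Marc"],
    "pa": ["ECA ROBOTICS", "ECA ROBOTICS"],
}

# Only list values are de-duplicated; single-valued fields pass through unchanged.
cleaned = {k: dedupe(v) if isinstance(v, list) else v for k, v in doc.items()}
print(cleaned["ifi_pa"])  # ['ECA ROBOTICS']
```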

Dynamic Fields

Dynamic fields are fields that are generated either internally, like the relevance score, or by computing values with function queries. Dynamic fields must be listed explicitly in the fl parameter, e.g., fl=score. Below, we request the static ucid field together with the relevance score for 5 results.

/search/query?q=ab_en:sonar&fl=ucid,score&rows=5
         "docs" : [
            {
               "ucid" : "US-20160054444-A1",
               "score" : 1.1746656
            },
            {
               "ucid" : "US-9268020-B2",
               "score" : 2.8773313
            },
            {
               "ucid" : "US-20160047906-A1",
               "score" : 1.1735018
            },
            {
               "ucid" : "US-20160049143-A1",
               "score" : 0.99574935
            },
            {
               "ucid" : "US-20160047891-A1",
               "score" : 1.1675186
            }
         ],

Another type of dynamic field makes use of function queries. A simple, yet contrived, example would be to return the number of dependent claims. CLAIMS Direct stores the total number of claims in nclms as well as the number of independent claims in nindepclms, both as tint. Simple subtraction yields the desired result.

/search/query?q=ab_en:sonar&fl=ucid,nclms,nindepclms,sub(nclms,nindepclms)&rows=2
         "docs" : [
            {
               "ucid" : "US-20160054444-A1",
               "nclms" : 19,
               "nindepclms" : 2,
               "sub(nclms,nindepclms)" : 17
            },
            {
               "ucid" : "US-9268020-B2",
               "nclms" : 74,
               "nindepclms" : 3,
               "sub(nclms,nindepclms)" : 71
            }
         ],

Field Aliases

Field aliasing allows individual fields in the result set to be renamed. This functionality is particularly useful to give meaningful names to pseudo fields created by function queries as in the example above. Any field can be renamed using aliases. The syntax is: alias-field-name:original-field-name

Note

ucid is the only exception. When trying to alias the ucid, an HTTP 500 will be returned.

/search/query?q=ab_en:sonar&fl=ucid,total:nclms,independent:nindepclms,dependent:sub(nclms,nindepclms)&rows=1
         "docs" : [
            {
               "ucid" : "US-20160054444-A1",
               "total" : 19,
               "independent" : 2,
               "dependent" : 17
            }
         ],
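
The alias syntax lends itself to a small helper when fl values are generated in code. This is a minimal sketch; build_fl is a hypothetical function, and refusing to alias ucid up front simply avoids the HTTP 500 mentioned in the note above.

```python
def build_fl(fields):
    """Build an fl value from plain field names and (alias, source) pairs."""
    parts = []
    for item in fields:
        if isinstance(item, str):
            parts.append(item)  # plain field, returned under its own name
            continue
        alias, source = item
        if source == "ucid":
            raise ValueError("ucid cannot be aliased")
        parts.append(f"{alias}:{source}")
    return ",".join(parts)


fl = build_fl([
    "ucid",
    ("total", "nclms"),
    ("independent", "nindepclms"),
    ("dependent", "sub(nclms,nindepclms)"),
])
print(fl)
# ucid,total:nclms,independent:nindepclms,dependent:sub(nclms,nindepclms)
```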