...

Introduction to CLAIMS SOLR Search Basics

CLAIMS Direct™ is a web service that provides access to the IFI CLAIMS Global Patent Database, a Data Warehouse that contains patent records from over 90 patenting authorities stored in a common XML format. Each publication, including all published applications and granted patents, is represented by a separate record in the data.


The Data Warehouse is indexed in SOLR, the fast open-source enterprise search platform from the Apache Lucene project. The search interface is a single search box, into which you can type simple or complex queries. The Data Warehouse is searchable by field; field names and sample searches are provided below. For more information about SOLR searching, see the SOLR Reference Guide (https://cwiki.apache.org/confluence/display/solr/Searching).

Boolean Operators

The SOLR index supports AND, OR, and NOT as Boolean operators. Boolean operators must be entered in capital letters. If you enter them in lowercase, the system treats them as ordinary search terms.

AND
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. For example, to search for documents that contain "solar energy" and "heating," use the following query: 

"solar energy" AND heating


NOT

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol "!" can be used in place of the word NOT. For example, to search for documents that contain "solar energy" but not "heating," use the following query:

"solar energy" NOT heating


OR

The OR operator links two terms and finds a matching document if either of the terms exists in the document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
For example, to search for documents that contain either "solar energy" or "wind power," use the following query:

 
"solar energy" OR "wind power
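The set analogy used for AND, NOT, and OR can be made concrete with a minimal Python sketch. The document IDs assigned to each term are hypothetical; the sketch only illustrates how the three operators combine result sets, not how SOLR evaluates queries internally.

```python
# Hypothetical IDs of documents matching each query term (illustration only).
solar_energy = {1, 2, 3, 5}
heating = {2, 3, 4}
wind_power = {5, 6}

# "solar energy" AND heating -> intersection: documents in both sets
and_result = solar_energy & heating

# "solar energy" NOT heating -> difference: documents in the first set only
not_result = solar_energy - heating

# "solar energy" OR "wind power" -> union: documents in either set
or_result = solar_energy | wind_power

print(sorted(and_result))  # [2, 3]
print(sorted(not_result))  # [1, 5]
print(sorted(or_result))   # [1, 2, 3, 5, 6]
```

Because the default operator in CLAIMS Direct is AND, a query with no operator between its terms would correspond to the intersection case.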


Default Operator

In CLAIMS Direct, the default operator is AND. This means that if no operator is specified between terms, the system assumes AND. In the examples above, we included the operator explicitly for clarity.

...

A phrase is a group of words surrounded by double quotes, such as "fuel cell". A phrase search retrieves only documents containing the phrase exactly as searched, as shown in the example below:

 

"fuel cell"
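To contrast a phrase query with a plain AND of the same words, here is a simplified Python sketch. It assumes documents are tokenized by lowercasing and splitting on whitespace, which is far simpler than SOLR's real text analysis; the function names are hypothetical.

```python
def tokenize(text: str) -> list:
    # Simplified analysis (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def matches_all_terms(text: str, query: str) -> bool:
    # Like fuel AND cell: every word must appear somewhere in the document.
    tokens = set(tokenize(text))
    return all(word in tokens for word in tokenize(query))


def matches_phrase(text: str, phrase: str) -> bool:
    # Like "fuel cell": the words must appear next to each other, in order.
    tokens = tokenize(text)
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


doc_a = "A fuel cell stack for vehicles"
doc_b = "A cell that stores fuel"

print(matches_phrase(doc_a, "fuel cell"))     # True
print(matches_phrase(doc_b, "fuel cell"))     # False
print(matches_all_terms(doc_b, "fuel cell"))  # True
```

The second document contains both words, so it satisfies the AND form, but not in adjacent order, so the phrase form rejects it.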

 

CLAIMS SOLR supports finding words that are within a specific proximity to one another. To execute a proximity search, add a tilde ("~") followed by the maximum distance in words to the end of a phrase. For example, to search for "solar" and "generation" within 5 words of each other in a document, use the following search query:

...

This type of search must be prefixed as a complex phrase query, using the following syntax:


{!complexphrase}[field name]:[query]
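When queries are generated programmatically, a small helper can assemble this syntax and URL-encode it. This Python sketch assumes the query is sent in a standard SOLR `q` request parameter; the helper name `complex_phrase_query` is hypothetical, and the actual CLAIMS Direct request format may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def complex_phrase_query(field: str, phrase: str, slop: Optional[int] = None) -> str:
    # Hypothetical helper: builds {!complexphrase}field:"phrase", optionally
    # followed by a ~N proximity suffix.
    query = "{!complexphrase}" + field + ':"' + phrase + '"'
    if slop is not None:
        query += "~" + str(slop)
    return query


q = complex_phrase_query("ttl_en", "(thermal~ NOT dermal) barrier", slop=3)
print(q)  # {!complexphrase}ttl_en:"(thermal~ NOT dermal) barrier"~3

# Braces, "!", quotes, parentheses, and spaces must be percent-encoded
# before the query can be placed in a URL (assumed parameter name: q).
encoded = urlencode({"q": q})
print(encoded)
```

Decoding `encoded` with `parse_qs` returns the original query string unchanged, which is a quick way to confirm that nothing was lost in encoding.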


To search abstracts for documents related to solar energy storage modules, use wildcards in your phrase query, as shown in the example below:

...

This upgrade to SOLR 4.8.1 also allows for fuzzy searches based on the Levenshtein distance. A fuzzy search matches terms that are similar to the queried term. (For more information about fuzzy searches, see the SOLR Reference Guide or the Lucene query parser syntax documentation: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy%20Searches)

For example, the following query returns 'thermic barrier' and 'thermo . . . barrier' as well as 'thermal barrier', but it also returns 'dermal barrier'.


ttl_en:"thermal~ barrier"
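The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. The following Python sketch computes it with the standard dynamic-programming method to show why 'dermal' is close enough to 'thermal' to match. It is an illustration only: Lucene's fuzzy matching is implemented differently and may also count a swap of two adjacent characters as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b (row i-1 of the DP table).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("thermal", "thermic"))  # 2
print(levenshtein("thermal", "dermal"))   # 2
print(levenshtein("color", "colour"))     # 1
```

Both 'thermic' and 'dermal' are two edits away from 'thermal', so distance alone cannot separate them; the (thermal~ NOT dermal) example later in this section shows one way to exclude the unwanted variant.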


To fine-tune the required similarity, append a parameter directly after the tilde: a number from 0 to 1, with 1 being the highest similarity. When this parameter is not specified, the system defaults to 0.5.

You can use fuzzy searches to ferret out spelling variations and errors. For example:


pa:mitsubishi~0.9

or

ttl_en:color~


Fuzzy searches can be embedded in complex phrase queries, as shown in the examples below:


{!complexphrase}ttl_en:"thermal~ barrier"~

or

{!complexphrase}ttl_en:"(thermal~ NOT dermal) barrier"~3

 

Case Sensitivity

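Searches in CLAIMS® SOLR are not case-sensitive. Search terms may be entered in uppercase or lowercase, regardless of case in the documents. (Boolean operators are the exception: as noted above, they must be entered in capital letters.)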

...

Searches in CLAIMS SOLR can include multiple search fields and multiple criteria per field. A few examples are provided to illustrate these more complex queries. Please consult the SOLR Search Fields section for descriptions and additional search examples on a field-by-field basis.

...

To search for US granted patents published in the second quarter of 2010 and issued to Chevron, Exxon, or Total, the search syntax would look like this:

...

To search for EP publications with publication dates since 1 December 2010 in Cooperative Patent Classification H01G 9, the search syntax would look like this:

...