CLAIMS Direct

CLAIMS Direct is a web service that provides access to the IFI CLAIMS Global Patent Database, a Data Warehouse that contains patent records from over 90 patenting authorities stored in a common XML format. Each publication, including all published applications and granted patents, is represented by a separate record in the data.


The Data Warehouse is indexed in SOLR, the fast open-source enterprise search platform from the Apache Lucene project. The search interface is a single search box, into which you can type simple or complex queries. The data warehouse is searchable by field; field names and sample searches are provided in the table below.

Introduction to CLAIMS SOLR Search Basics

Boolean Operators

The SOLR index supports AND, OR, NOT as Boolean operators. Boolean operators must be ALL CAPS. If you enter these operators in lower case letters, the system will search them as terms.

AND
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. For example, to search for documents that contain "solar energy" and "heating," use the following query: "solar energy" AND heating

NOT
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol "!" can be used in place of the word NOT. For example, to search for documents that contain "solar energy" but not "heating" use the following query: "solar energy" NOT heating

OR
The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
For example, to search for documents that contain either "solar energy" or "wind power," use the following query: "solar energy" OR "wind power"
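
The set analogy used for the three operators can be sketched in Python. The three-document collection and its IDs are invented purely for illustration; this is not a description of how CLAIMS Direct or SOLR evaluates queries internally.

```python
# Hypothetical mini-collection: document ID -> terms/phrases it contains.
docs = {
    1: {"solar energy", "heating"},
    2: {"solar energy"},
    3: {"wind power", "heating"},
}

def matching(term):
    """Return the set of document IDs that contain the given term."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}

solar = matching("solar energy")
heating = matching("heating")
wind = matching("wind power")

and_result = solar & heating  # "solar energy" AND heating      -> intersection
not_result = solar - heating  # "solar energy" NOT heating      -> difference
or_result = solar | wind      # "solar energy" OR "wind power"  -> union

print(and_result, not_result, or_result)
```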

Default Operator

In CLAIMS Direct, the default operator is AND. This means that if no operator is specified, the system assumes AND. In the above examples, we explicitly included the operator in all cases for purposes of clarity.

 Default Fields

When no field is specified in the query, the search is directed to the title, abstract, description and claims fields.

 Wildcards

? -- Use the question mark to represent a single character (one and only one) at the end or within a word. To search for British or American spellings, use a query like this: sterili?e

* -- Use the asterisk to represent 0 to many characters. For example, to search for test, tests, testing, tester, etc., use the search: test* To retrieve sulphur or sulfur, use the search: sul*ur

Note: You cannot use a * or ? symbol as the first character of a search.
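
The character-matching rules for ? and * can be illustrated with a small Python sketch that translates a wildcard term into a regular expression. The function names are hypothetical, and the sketch only mimics the matching rules described above against whole words; it does not reproduce how SOLR matches wildcards against its index.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a ?/* wildcard term into a compiled, case-insensitive regex."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """Return True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("sterili?e", "sterilise"), matches("sterili?e", "sterilize"))
print(matches("sul*ur", "sulphur"), matches("sul*ur", "sulfur"))
```

Rejecting a leading wildcard up front mirrors the note above and makes the restriction visible before a query is submitted.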

Range Searching

Range Queries allow you to match documents whose field(s) values are between the lower and upper bound specified by the Range Query.

Note: In a range query, the operator 'TO' must be ALL CAPS.

For example, the following search will find documents whose publication dates have values between 20020101 and 20030101, inclusive.

 

pd:[20020101 TO 20030101]
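
If you assemble range queries in a script, a small helper can check the bounds before the query is submitted. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes dates are written as YYYYMMDD, as in the example above, with * allowed as an open bound.

```python
from datetime import datetime

def date_range_query(field, start, end="*"):
    """Build an inclusive range clause such as pd:[20020101 TO 20030101]."""
    for bound in (start, end):
        if bound == "*":
            continue  # open-ended bound
        if len(bound) != 8 or not bound.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {bound!r}")
        datetime.strptime(bound, "%Y%m%d")  # raises ValueError for impossible dates
    if "*" not in (start, end) and start > end:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{field}:[{start} TO {end}]"  # TO stays in ALL CAPS

print(date_range_query("pd", "20020101", "20030101"))
```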

Phrases and Proximity

A phrase is a group of words surrounded by double quotes, such as "fuel cell." To retrieve only documents containing the phrase exactly as searched, place the phrase within quotes, as shown in the example below:

 

"fuel cell"

 

CLAIMS® SOLR supports finding words that are within a specific proximity to one another. To execute a proximity search, use the tilde symbol, "~", at the end of a phrase. For example, to search for "solar" and "generation" within 5 words of each other in a document, use the following search query:

 

"solar generation"~5

 

With the upgrade to SOLR version 4.8.1, it is now possible to search phrases that include wildcards and OR'd terms. (For more information about the SOLR Complex Phrase Query Parser, see http://wiki.apache.org/solr/ComplexPhraseQueryParser.)

This type of search must be prefixed as a complex phrase. The specific syntax is as follows: {!complexphrase}[field name]:[query]


To search abstracts for documents related to solar energy storage modules, use wildcards in your phrase query, as shown in the example below:

{!complexphrase}ab:"solar energy stor* modul*"~6

 

By default, the words must appear in the order specified. To match the same words in any order, add inOrder=false to the prefix, as shown in the example below:

{!complexphrase inOrder=false}ab:"solar energy stor* modul*"~6

To search abstracts for documents containing information about thermal barriers, you might use a complex phrase search such as the one provided in the example below:

{!complexphrase}ab:"(thermal OR thermic OR thermo) barrier"~8
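
The complex phrase syntax shown in these examples can also be assembled programmatically. The Python helper below is a minimal sketch with a hypothetical function name; it only concatenates the pieces described above and does not validate the query against SOLR.

```python
def complex_phrase(field, phrase, slop=None, in_order=True):
    """Build a {!complexphrase} query string for one field."""
    if '"' in phrase:
        raise ValueError("phrase must not contain double quotes")
    prefix = "{!complexphrase}" if in_order else "{!complexphrase inOrder=false}"
    slop_suffix = "" if slop is None else f"~{slop}"  # proximity distance, if any
    return f'{prefix}{field}:"{phrase}"{slop_suffix}'

print(complex_phrase("ab", "solar energy stor* modul*", slop=6))
print(complex_phrase("ab", "solar energy stor* modul*", slop=6, in_order=False))
print(complex_phrase("ab", "(thermal OR thermic OR thermo) barrier", slop=8))
```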

Fuzzy Search 

This upgrade to SOLR 4.8.1 also allows for fuzzy search based on the Levenshtein Distance. A fuzzy search query returns terms similar to the queried term. (See http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy%20Searches for more information about fuzzy searches.)

For example, the following query returns 'thermic barrier' and 'thermo . . . barrier' as well as 'thermal barrier,' but it also returns 'dermal barrier.'

ttl_en:"thermal~ barrier"
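
The unexpected 'dermal' match follows from how Levenshtein distance is counted: it is the number of single-character insertions, deletions, and substitutions needed to turn one term into another. The Python sketch below is a textbook implementation for illustration only; it is not the code SOLR uses, and it says nothing about how SOLR converts a distance into its similarity score.

```python
def levenshtein(a, b):
    """Return the number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

for word in ("thermic", "thermo", "dermal"):
    print(word, levenshtein("thermal", word))
```

All three terms are two edits away from 'thermal', which is why a fuzzy search cannot tell the wanted variants from 'dermal' on distance alone.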

To fine-tune the similarity of the results, attach a parameter after the tilde: a number from 0 to 1, with 1 being the highest similarity. When this parameter is not specified, the system defaults to 0.5.

You can use fuzzy searches to ferret out spelling variations and errors. For example:

pa:mitsubishi~.9

ttl_en:color~

Fuzzy searches can be embedded in complex phrase queries, as shown in the example below:

{!complexphrase}ttl_en:"thermal~ barrier"~

{!complexphrase}ttl_en:"(thermal~ NOT dermal) barrier"~3


 Case Sensitivity

Searches in CLAIMS® SOLR are not case sensitive. Search terms may be entered in caps or lower case, regardless of case in the documents.

Note: You must enter operators in ALL CAPS and field names in lower case.

 

 Complex Queries (For Example: Queries Including Criteria in Multiple Fields)

Searches in CLAIMS SOLR can include multiple search fields and multiple criteria per field. A few examples are provided to illustrate these more complex queries. Please consult the table at the end of this document for descriptions and additional search examples on a field-by-field basis.

Examples:

To search for European or PCT applications published in 2010 that have title, abstract and claims in English that concern intraocular lenses, the search syntax would look like this:

 

pnctry:(EP OR WO) AND
pd:[20100101 TO 20101231] AND
flags:(has_ttl_en AND has_ab_en AND has_clm_en) AND
ab:"intraocular lens"

 

The query can also be written on a single line:

 

pnctry:(EP OR WO) AND pd:[20100101 TO 20101231] AND flags:(has_ttl_en AND has_ab_en AND has_clm_en) AND ab:"intraocular lens"
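
When a query is built from several criteria, it can help to keep each clause separate and join them at the end. The following Python sketch uses two hypothetical helper functions to reproduce the single-line query above; it is one possible way to assemble the string, not a CLAIMS Direct API.

```python
def field_any(field, *values):
    """Build field:(A OR B ...) for several values, or field:A for one."""
    if len(values) == 1:
        return f"{field}:{values[0]}"
    return f"{field}:({' OR '.join(values)})"

def and_join(*clauses):
    """Join non-empty clauses with the AND operator on a single line."""
    return " AND ".join(c.strip() for c in clauses if c.strip())

query = and_join(
    field_any("pnctry", "EP", "WO"),
    "pd:[20100101 TO 20101231]",
    "flags:(has_ttl_en AND has_ab_en AND has_clm_en)",
    'ab:"intraocular lens"',
)
print(query)
```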

 

To search for US granted patents, published in the second quarter of 2010, issued to BP, Exxon, or Total, the search syntax would look like this:

 

pnctry:US AND
pnkind:(B1 OR B2) AND
pd:[20100401 TO 20100630] AND
pa:(BP OR Exxon OR total)

 

To search for EP publications with pub dates since 1 December 2010 in Cooperative Patent Classification H01G 9, the search syntax would look like this:

 

pnctry:EP AND
pd:[20101201 TO *] AND
cpc:H01G0009

 

To search for any publications in December 2010 in US Class 208/415 or Cooperative Patent Classification C10G 1/00, the search syntax would look like this:

 

(uc:208 OR cpc:c10g000100) AND
pd:[20101201 TO 20101231]

 

Important Information:
  • Since the field names are case sensitive, always use lower case letters
  • Capitalize operators (AND, OR, NOT, TO)
  • Use straight quotes to enclose phrases. (Note: This is an issue only if you are cutting/pasting from another source where "smart quotes" may have been used.)


The Filter Query Box

You can use a filter query to restrict the superset of documents that are returned by the standard query. There are two advantages to using the filter query rather than including the restriction with your standard query: (1) the filter query does not affect score, and (2) it can increase the speed of complex queries, since the filter query results are cached.
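
In a standard SOLR request, the main query and the filter query travel as separate parameters, conventionally named q and fq. The Python sketch below shows how such a parameter string might be encoded. It assumes the filter query box corresponds to fq, uses a hypothetical function name, and deliberately omits any endpoint URL, since none is given here.

```python
from urllib.parse import urlencode, parse_qs

def build_params(query, filter_queries=()):
    """Return a URL-encoded parameter string with q and one fq per filter."""
    params = [("q", query)]                          # scored main query
    params += [("fq", fq) for fq in filter_queries]  # unscored restrictions
    return urlencode(params)

encoded = build_params(
    'ab:"intraocular lens"',
    ["pnctry:(EP OR WO)", "pd:[20100101 TO 20101231]"],
)
print(encoded)
print(parse_qs(encoded))
```

Keeping stable restrictions such as country and date range in the filter, and the subject terms in the main query, lets relevance scoring depend only on the subject terms.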


Options for Viewing and Analyzing Results

Options are shown in tabs on the upper right segment of the screen and include the following: General, Group, Facet, and More Like This (MLT). Note: You must enable the Group, Facet, and More Like This (MLT) features using the Enable checkbox before you execute the search. The same tabs are present in the display section of the screen.

General

The General option allows selection of the fields you want to display in the result list, the order in which the results should be sorted, the number of rows to display per page, and the timeout (in seconds).

Group

The Result Grouping option groups documents that share a common field value into categories and returns the top documents for each category. Fields available for grouping must be single-value fields. When Group is enabled, the default result list is grouped. Note: When the Group option is enabled, the More Like This (MLT) option is disabled. Within the Group option, you can enable the following options:

  • Enable: Toggles the feature on/off.
  • Field: Allows selection of a grouping parameter from the fields listed in the dropdown.
  • Limit:  Controls the number of documents to display in each group.
  • Sort: Determines the order of the group members. Any fields can be used in ascending (ASC) or descending (DSC) order.
  • Query: Performs like a sub-search in the groups. The input format is the same as for the standard search.
  • Func--: Function queries use functions and allow you to generate a relevancy score using the actual value of one or more numeric fields.
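
The effect of the Field, Limit, and Sort options can be illustrated with a short Python sketch. The result list, document IDs, and function name are invented for illustration; the sketch mimics the grouping behavior described above rather than the SOLR implementation.

```python
from collections import defaultdict

# Hypothetical result list; field names follow the examples in this document.
results = [
    {"id": "A", "pnctry": "EP", "pd": 20101215},
    {"id": "B", "pnctry": "US", "pd": 20101201},
    {"id": "C", "pnctry": "EP", "pd": 20101220},
    {"id": "D", "pnctry": "EP", "pd": 20101203},
]

def group_results(docs, field, limit=1, sort_field=None, descending=False):
    """Group docs by a single-value field and keep the top `limit` per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    if sort_field is not None:
        for members in groups.values():
            members.sort(key=lambda d: d[sort_field], reverse=descending)
    return {value: members[:limit] for value, members in groups.items()}

grouped = group_results(results, "pnctry", limit=2, sort_field="pd", descending=True)
for country, members in grouped.items():
    print(country, [doc["id"] for doc in members])
```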


Facet

Faceting is the arrangement of search results into categories based on indexed terms. Facets are expressed in tabular and graphic formats. Within the Facet option, you can enable the following options:

  • Enable: Toggles the feature on/off.
  • Field: Allows selection of a faceting parameter from the fields listed in the dropdown. Multiple fields can be selected.
  • Limit:  Specifies the maximum number of facets to be returned for each field.
  • Sort: The parameter can be count (highest count first) or index (lexicographic by indexed term).
  • Offset: Specifies an offset into the facet results at which to begin displaying facets.
  • Missing: Controls whether SOLR should include a count of all matching results that have no value for the field.
  • Min. Count: Specifies the minimum counts required for a faceted field to be included in the response.
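
The following Python sketch shows how the Limit, Sort, Offset, Missing, and Min. Count options interact when facet counts are computed. The sample documents, function name, and default values are assumptions made for illustration; the sketch handles single-value fields only and is not the SOLR implementation.

```python
from collections import Counter

# Hypothetical result list; document "E" has no value for the field.
sample_docs = [
    {"id": "A", "pnctry": "EP"},
    {"id": "B", "pnctry": "US"},
    {"id": "C", "pnctry": "EP"},
    {"id": "D", "pnctry": "EP"},
    {"id": "E"},
    {"id": "F", "pnctry": "WO"},
]

def facet_counts(docs, field, limit=10, offset=0, min_count=1,
                 sort="count", missing=False):
    """Return (term, count) pairs for one field, shaped by the facet options."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    items = [(v, n) for v, n in Counter(values).items() if n >= min_count]
    if sort == "count":
        items.sort(key=lambda item: (-item[1], item[0]))  # highest count first
    else:
        items.sort(key=lambda item: item[0])              # lexicographic by term
    page = items[offset:offset + limit]
    if missing:
        page.append((None, len(docs) - len(values)))      # docs with no value
    return page

top = facet_counts(sample_docs, "pnctry", limit=2, missing=True)
print(top)
```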

To set how Facet results are displayed, do the following:

On the Facet tab in the display area, one or more facet-field names are listed.

  1. Click the small > to the left of the facet-field name to expand the list.
  2. Double-click on the facet-field name on this screen to open the graphic display on the right side of the screen.
  3. Under the Settings button, select the style of the graphic and Export the graphic as PNG or JPEG.

 


More Like This (MLT)

 

More Like This constructs a query using terms from the selected document(s) to find similar documents. When More Like This is enabled, each document in the result set is processed to find similar documents, which are listed for each document in the result set. MLT contains the following features:

  • Enable: Toggles the feature on/off
  • Field: Allows selection of a More Like This parameter from the fields listed in the dropdown, either in the title or abstract search fields.
  • Count: Specifies the number of similar documents to be returned for each result.
  • Terms: Controls how the MLT component presents the "interesting" terms (the top TF/IDF terms) for the query. List shows the terms only; Details also shows the boost value of each term.
  • Min. tf: Specifies the Minimum Term Frequency, below which words will be ignored in the source document.
  • Min. df: Specifies the Minimum Document Frequency. (Words that do not occur in at least "this many" documents will be ignored)
  • Max. df: Specifies the Maximum Document Frequency. (Words that occur in more than "this many" documents will be ignored.)
  • Boost: Specifies if the query will be boosted by the "interesting" term relevance.
  • Max. qt: Sets the maximum number of query terms to be included in the generated query.
  • Max. wl: Sets the maximum word length.
  • Min. wl: Sets the minimum word length.
  • Qf: Query fields and their boosts.

Displaying More Like This Results

The display of the More Like This results is similar to that of grouped results. Each document in the result set has an option to display the similar documents. To do so, click the small > to the left of each document in the list.
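
For readers who work with SOLR request parameters directly, the MLT options listed above resemble the standard SOLR MoreLikeThis parameters. The Python sketch below collects them into a parameter dictionary. The mapping from the on-screen labels to these parameter names is an assumption, the function name is hypothetical, and the values shown are placeholders rather than CLAIMS Direct defaults.

```python
def mlt_params(fields, count=5, interesting_terms="list", min_tf=None,
               min_df=None, max_df=None, boost=False, max_qt=None,
               min_wl=None, max_wl=None, qf=None):
    """Map the More Like This options onto SOLR-style request parameters."""
    params = {
        "mlt": "true",                              # Enable
        "mlt.fl": ",".join(fields),                 # Field
        "mlt.count": str(count),                    # Count
        "mlt.interestingTerms": interesting_terms,  # Terms: list or details
        "mlt.boost": "true" if boost else "false",  # Boost
    }
    optional = {
        "mlt.mintf": min_tf,  # Min. tf
        "mlt.mindf": min_df,  # Min. df
        "mlt.maxdf": max_df,  # Max. df
        "mlt.maxqt": max_qt,  # Max. qt
        "mlt.minwl": min_wl,  # Min. wl
        "mlt.maxwl": max_wl,  # Max. wl
        "mlt.qf": qf,         # Qf
    }
    # Only send the options that were actually set.
    params.update({k: str(v) for k, v in optional.items() if v is not None})
    return params

print(mlt_params(["ttl_en", "ab"], count=3, min_tf=2, boost=True))
```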



