
...

Data in the CLAIMS Global Patent Database is delivered as a separate record for each patent publication, with content merged from multiple sources. The sources include the following:

  • DOCDB and legal status from the EPO
  • Bibliographic and full text files from national patent offices
  • Assignment information for US
  • Translated bibliographic data for Japan from Patent Abstracts of Japan
  • Data from national registers for AU and EP
  • Extensions and associated drug names for AU and US

New raw data is loaded as soon as it is published and includes both new records and changes to existing ones.

PDFs and other attachments are generally loaded after the XML because they are delivered to us separately. To make the XML available as quickly as possible, we do not hold back delivery of the text until the images arrive. Attachments and images are usually available within 24 hours of the XML.

Because of the unpredictably large volume of updated records delivered by DOCDB (which can cause delays in processing national office data), we are more selective in processing DOCDB data, effectively spreading the weekly loads over a full week instead of processing them as they are received. Doing so allows us to make sure that the files covering newly published documents (the US, EP, and PCT files in particular) are processed in a timely manner.

The data we receive from patent offices may differ from what is available through their portals, as these are separate products with different maintenance procedures and schedules. As a result, it may be helpful to consult the /legalstatus service if a more current source is needed.
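As an illustration, such a lookup might look like the sketch below. This is only a sketch: the host name and the ucid request parameter are assumptions for illustration, not the documented interface, so consult the CDWS documentation for the actual endpoint and parameters.

Code Block
# Hypothetical sketch: fetch the current legal status for one publication.
# The host and the "ucid" parameter name are assumptions; see the CDWS
# documentation for the real request format.
curl -u "user:password" \
  "https://cdws.example.com/legalstatus?ucid=US-5551212-A"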

In the following sections, we include information about data volumes and frequency of updates for the main raw data sources.

Weekly Updates

Source | Day of Source Data Availability | Delay From Patent Publication Date | Availability in CLAIMS Global DB (Original Language) | Translation Availability
AP | static collection | | |
AT | monthly on day 15 | 1-2 weeks | 1-2 weeks after publication | 1 day after availability of the original language
AU | | same day | 1-2 days after publication |
BE | | | 1-2 weeks after publication | 1 day after availability of the original language
BG | | | big delays |
BR | | 2 weeks | 2 weeks after publication | 1 day after availability of the original language
CA | Wednesday | same day | 1-6 days after publication | 1 day after availability of the original language
CH | | | 2-3 weeks after publication | 1 day after availability of the original language
CN | | 1-2 days (bib data); 3 weeks (full text) | 1-2 days after publication (bib data); 3 weeks after publication (full text) | 1-2 days after publication (bib data and full text)
CS | static collection | | |
CZ | | | 2-3 weeks after publication | 1 day after availability of the original language
DD | static collection | | |
DE | Thursday | same day | 2 days after publication | 1 day after availability of the original language
DK | | | 2-3 weeks after publication | 1 day after availability of the original language
EA | | | 3 weeks after publication | 1 day after availability of the original language
EP | Wednesday | same day | day of publication | 1 day after availability of the original language
EPO DOCDB Create/Delete | Thursday | depending on the country | 1 day after source data availability |
ES | daily | 1-2 weeks | 1-2 weeks after publication | 1 day after availability of the original language
FI | | | 3-4 weeks after publication | 1 day after availability of the original language
FR | | | 1 week after publication | 1 day after availability of the original language
GB | | | 1-2 weeks after publication |
IN | | | 2-3 weeks after publication |
JP Grants | Wednesday | 2-3 days | 2-3 days after publication | 1 day after availability of the original language
JP Applications | Thursday | 2-3 days | 2-3 days after publication | 1 day after availability of the original language
KR | daily | same day | 2-3 days after publication | 1 day after availability of the original language
LT | | | 2-3 weeks after publication | 1 day after availability of the original language
LU | | | 3-4 weeks after publication | 1 day after availability of the original language
LV | | | 3-4 weeks after publication | 1 day after availability of the original language
NL | | | 4 weeks after publication | 1 day after availability of the original language
OA | | | delays of some months | 1 day after availability of the original language
PT | | | delays of some months | 1 day after availability of the original language
RO | | | 3-4 weeks after publication | 1 day after availability of the original language
RU | | | 1 week after publication | 1 day after availability of the original language
SI | | | 3-4 weeks after publication | 1 day after availability of the original language
SK | | | 3-4 weeks after publication | 1 day after availability of the original language
SU | static collection | | |
TW | | 8 days | 3-4 weeks after publication | 1 day after availability of the original language
US Grants | Tuesday | same day | day of publication |
US Grants - Attachments | Tuesday | same day | day of publication |
US Applications | Thursday | same day | day of publication |
US Apps - Attachments | Thursday | same day | day of publication |
WO | Thursday | same day | day of publication | 1 day after availability of the original language

Data Volume and Type Per Delivery

The following table shows the average volume of records processed in each data delivery from the main raw data sources. It also indicates some of the elements that are commonly processed.

A: all first publications contain this element (excludes corrections, search reports, WO equivalents/republications, etc.)

S: some publications contain this element; data is loaded when available

Source                  | Average Volume | classes | priorities | titles | abstracts | descriptions | claims
CA                      | 3,200          | A       | S          | A      | S         |              | A
CN                      | 24,000         | A       | S          | A      | A         | S            | S
DE                      | 1,500          | A       | S          | A      | S         | A            | A
EP                      | 4,400          | A       |            | A      | S         | A            | A
EPO DOCDB Create/Delete | 125,000        | S       | A          | S      | S         |              |
ES                      | 120            | A       | S          | A      | S         | A            | A
JP Grants               | 4,800          | A       | S          | A      | S         | A            | A
JP Applications         | 6,600          | A       | S          | S      | S         | S            | S
KR                      | 1,200          | A       | S          | A      | A         | A            | A
US Grants               | 4,800          | A       | S          | A      | S         | A            | A
US Grants - Attachments | 120,000        |         |            |        |           |              |
US Applications         | 6,300          | A       | S          | A      | A         | A            | A
US Apps - Attachments   | 150,000        |         |            |        |           |              |
WO                      | 4,000          |         |            | A      | A         | S            | S


...

DOCDB Amended Records

DOCDB Amended Records (percentage of documents having updated elements for the given fields)

Source          | Day of Publication | Average Volume
EPO DOCDB Amend | Tuesday            | 365,000

Source          | classes | citations | titles | applicants | inventors | abstracts
EPO DOCDB Amend | 48%     | 26%       | 8%     | 6%         | 5%        | 5%

Non-Weekly Updates

Source                           | Frequency      | Average Volume | Element(s) Affected
Patent Abstracts of Japan (PAJ)  | monthly        | 25,000         | classifications-ipcr, titles, parties, abstracts
USPTO Master Classification File | every 2 months | 150,000        | classification-national
USPTO Reassignments              | daily          | 60,000         | assignee-history
EPO IPCR Incremental Update      | quarterly      | 1,200,000      | classifications-ipcr

Architecture

 

Two important scripts are used to manage the ongoing data update process.

  • apgupd: The main update daemon. Together with the sub-script apgup, it is used to check for and download new or updated data from the CLAIMS Direct primary data warehouse.
  • aidxd: The indexing daemon. Together with the sub-script aidx (alexandria_index), it is responsible for keeping the optional SOLR index updated with data synced from the individual client PostgreSQL database.

Update Process In Detail

The main components involved in content updates for CLAIMS Direct client instances are the client PostgreSQL database proper, the remote web client, and the server-side web service endpoints (CDWS), as shown in the diagram below:

 

...

 

Content is processed on the IFI CLAIMS primary instance based on the concept of a load source. Load sources are particular data feeds from issuing authorities and can include new complete documents, updated complete documents, or partial document updates. As these load sources are processed into the primary data warehouse, they are stamped with a load-id (an identifier used to group sets of documents together) and are immediately made available for client download.

...

Info: What is a load-id?

Every document within the data warehouse was loaded as part of a group of documents. This group is identified by a load-id (an integer value). There are 3 types of load-ids in the data warehouse: (1) created-load-id, (2) modified-load-id, and (3) deleted-load-id. The created-load-id represents the load-id in which a document was added to the database, the modified-load-id represents the load-id that last modified the document, and the deleted-load-id represents the load-id in which the document was marked as deleted.
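As a rough illustration, documents touched by a given load could be inspected directly in PostgreSQL. The sketch below assumes a document table named xml.t_patent_document_values with ucid, created_load_id, modified_load_id, and deleted_load_id columns; these names are assumptions for illustration, so verify them against the design documentation.

Code Block
# Hypothetical sketch: list documents created, modified, or deleted by
# load-id 12345. Table and column names are assumptions; check the
# design documentation for the actual schema.
psql alexandria -c "
  SELECT ucid, created_load_id, modified_load_id, deleted_load_id
    FROM xml.t_patent_document_values
   WHERE 12345 IN (created_load_id, modified_load_id, deleted_load_id);"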

All metadata pertaining to updates to both PostgreSQL and SOLR is contained in the PostgreSQL schema reporting. For further details, see the design documentation.

PostgreSQL Updates

apgup (alexandria_update) is the mechanism through which client instances communicate with the IFI primary instance and is the only method of obtaining new or updated content for the client PostgreSQL database. You will need authentication credentials from IFI CLAIMS in order to communicate with the IFI primary server.

Action | Example Command
Check for available content updates | apgup --check --user=xxx --password=yyy
Automatically download and process the next available content update | apgup --update --user=xxx --password=yyy
Continually process available content updates | nohup apgup --update --continue --user=xxx --password=yyy &
Stop apgup when in --continue mode (run in a separate terminal window) | apgup --stop
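These commands can be combined into a simple unattended update job. The following is one possible wrapper, a sketch that uses only the documented --update and --continue flags; the credentials and log path are placeholders.

Code Block
#!/bin/sh
# Sketch of an unattended updater: run apgup in continuous mode and
# append its output to a log. Credentials and log location are placeholders.
nohup apgup --update --continue \
      --user=xxx --password=yyy \
      >> /var/log/apgup.log 2>&1 &
echo "apgup started with PID $!"

When the loop needs to be shut down, running apgup --stop in another terminal triggers a graceful exit, as shown in the table above.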

Detailed Usage (apgup)

Code Block
apgup [ Options ... ]

  --help           print this usage and exit
  --update         update local database
    --continue     continue processing available load-ids
    --interval=i   number of seconds to sleep before continuing (default=10)
    --die_on_error Break out of continue loop if any error conditions arise
                     (default=false, i.e., continue trying to update)
  --check          check for available updates but don't proceed to actually update
    --limit=n      limit display of available updates to n (default=10)
  --stop           trigger a graceful exit while in continuous mode
  --maxloadid=i    force a specific max-client-load-id value
  --noindex        don't trigger index of new load-id(s)

  Required Authorization Arguments
  --------
  --user=s         basic authorization name
  --password=s     basic authorization password

  Optional Update Arguments
  --------
  --url            Alexandria services url
  --update_method  method name for requesting server updates
  --status_method  method name for getting/setting session status
  --check_method   method name for checking available updates
  --schema         schema name for temporary work tables
  --force          force the update even if it is redundant
  --tmp            temporary directory for update processing
  --facility=s     logging facility (default=apgup)

  Optional Database Argument
  --------
  --pgdbname=conf  default database connection name (alexandria)

SOLR Indexing

Indexing into SOLR is controlled by an indexing daemon: aidxd. This daemon probes PostgreSQL for available load-id(s) to index. This "queue" is represented by the table reporting.t_client_index_process. When processing is successfully completed into PostgreSQL, apgup registers a new, index-ready load-id. The indexing daemon recognizes this as an available load-id and begins the indexing process for that particular load-id.
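To see what is currently queued, the reporting.t_client_index_process table can be queried directly. Below is a minimal sketch, assuming the default alexandria database connection; the load_id column used for ordering is an assumption, so adjust it to the actual schema.

Code Block
# Sketch: peek at the most recent entries in the SOLR indexing queue.
# The table name comes from the documentation above; the load_id column
# used for ordering is an assumption.
psql alexandria -c "
  SELECT *
    FROM reporting.t_client_index_process
   ORDER BY load_id DESC
   LIMIT 10;"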

Action | Example Command
Starting the indexing daemon | aidxd --tmp=/scratch/solr-load-tmp
Pausing the indexing daemon | kill -s USR1 <pid>
Resuming a paused daemon | kill -s USR2 <pid>
Stopping the indexing daemon | kill -s USR1 <pid> && kill -s INT <pid>
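The signals above can also be scripted, for example to pause indexing around a maintenance window. A minimal sketch, assuming a single aidxd process that pgrep can find by name:

Code Block
#!/bin/sh
# Sketch: pause aidxd around a maintenance window, then resume it.
# Assumes exactly one running aidxd process.
PID=$(pgrep -x aidxd) || exit 1

kill -s USR1 "$PID"   # pause the indexing daemon
# ... run maintenance tasks here ...
kill -s USR2 "$PID"   # resume indexing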

 

Detailed Usage (aidxd)

...