Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Reverted from v. 18

The data warehouse design is coupled tightly with the structure of the ST36-based XML document. The relational database is the open source PostgreSQL system and makes use of schemas to organize content and organizational structure inside the database. See http://www.postgresql.org for further information on PostgreSQL-specific functionality. The following table lists the active schemas and their roles in the data warehouse.

Schema

Description

work

Tables used for loading and merging raw data – these will not be described below

reporting

Statistics pertaining to loading, updating, and indexing

xml

The complete XML broken into tables based on the ST36 XML structure

cdwsTables used for on-site citation and family functionality. See CDWS Schema and User-Defined Functions.
reportsAllows extraction of XML content from the tables in the xml schema. See Reports Schema Functions.
litigation (optional)Tables used for global litigation data provided by MaxVal. See Litigation Schema and User-Defined Functions.
Note

Current Demos may not contain all of the schemas listed below. In addition, tables and schemas present in the current demo with the exception of reporting, work, and xml (described below), may be removed from the database without notice.


The following graphic is a tree view of the structure of a complete XML document. Each node represents a container whereby each leaf of the tree represents a table in the data warehouse (xml and work schemas). The naming pattern of tables follows the following rules:

  • dashes are replaced with underlines and
  • each leaf-node name is prefixed with t_, i.e., publication-reference =t_publication_reference.

The two exceptions to this are the non-XML table t_patent_document_values and the copyright node. The copyright container is stored separately in t_sys_data.

Note

Note: The xml.t_designated-states table is deprecated and will be removed in a future schema release. designated-states data is found within the xml.t_international-convention-data table.

Schemas

Reporting

The reporting schema houses tables critical to data loading, update processes, and queues for optional indexing. Although most tables are for internal use, the table reporting.t_client_load_process contains interesting information for clients around which web status interfaces or triggers can easily be built.

Column

Type

Modifiers

Comment

client_load_process_id

serial

primary key

Table primary key

load_id

integer


The load identifer

load_source

text


Source data for load

url

text


URL of package to load

ndocs

integer


Number of documents contained in the load

entered_stamp

timestamp


Time entered for processing

completed_stamp

timestamp


Time loading completed

running_status

text


One of: downloading, unpacking, loading, merging, citations, priorities, complete

completed_statustext
One of: success, failure

If you have installed the optional SOLR Solr index, there is a companion table reporting.t_client_index_process that manages the indexing queue and has the exact structure of the reporting.t_client_load_process.

XML

The XML schema is the heart of the data warehouse and contains tables representing individual containers (sections) of the complete document stored in pure XML.

Table Structure–General

All tables, with the exception of xml.t_patent_document_values, follow the same structure.

Column

Type

Modifiers

Comment

<tbl>_id

serial

primary key

Table primary key

publication_id

integer

not null

Integer representation of the publication

modified_load_id

integer

not null

Internal load id used to manage data updates

status

char(1)


Validity of the XML fragment (v=valid, i=invalid)*

content

XML


The XML fragment

Info

All XML stored in content columns is without exception well-formed but occasionally during processing, invalid XML is created. This is due to a variety of reasons, most predominantly, corrupt or invalid patent office data. In an effort to maintain the completeness of the publication, even invalid fragments are stored. If you wish to validate a complete document, simply filter out invalid fragments. 


The table xml.t_patent_document_values is the master content table containing meta information of the document in standard relational database columns. This table is the basis for the overall container element patent-document.

Column

Type

Modifiers

Comment

patent_document_value_id

serial

primary key

Table primary key

publication_id

integer

not null

Integer representation of the publication

created_load_idinteger
Load id of document creation

modified_load_id

integer


Load id of last modification

deleted_load_idinteger
Load id when document was deleted

status

varchar(16)


Status of complete document (new|update)

country

char(2)

not null

Country of publication

doc_number

varchar(32)

not null

Publication document number

kind

varchar(4)


Document kind code

lang

char(2)


Document publication language

ucid

varchar(32)


Concatenation of country-doc_number-kind

published

date


Date of publication

produced

date


Date of most recent Alexandria XML production, which may indicate original production or updates

withdraw

boolean


Withdrawn status

family_id

integer


DocDBDOCDB-provided simple family identifier

CDWS

The cdws schema is used for on-site citation and family functionality. It houses tables and functions that interact with the data-store, index, and attachment repositories through the web services. Several of the tables in the cdws schema are described below. Note that if your initial load occurred before May 13, 2020, these tables will need to be added as described in Leveraging On-Site Citation and Family Functionality.

cdws.t_applications

Column

Type

Modifiers

Comment

application_idserialprimary keyTable primary keyapplication_ucidvarchar(32)not null

Concatenation of country-doc_number-kind based on application record

fileddatenot nullDate of filing

cdws.t_cited_documents 

Column

Type

Modifiers

Comment

idserialprimary keyTable primary key
publication_idintegernot null

Integer representation of the publication

cited_document_idintegernot null
deleted_load_idintegerLoad id when case was deleted

litigation.t_publication_cases

ColumnTypeModifiersCommentpublication_case_idserialprimary keyTable primary keycase_idintegernot nullCase ID referencing litigation.t_cases (case_id )publication_idintegernot nullPublication ID referencing xml.t_patent_document_values (publication_id )

litigation.t_parties

ColumnTypeModifiersCommentparties_idserialprimary keyTable primary keycase_idintegernot nullCase ID referencing litigation.t_cases (case_id )modified_load_idintegernot nullLoad id of last modificationstatuschar(1)XML validity flag (v=valid;i=invalid)contentxmlParties XML container

litigation.t_related_publications

ColumnTypeModifiersCommentrelated_publications_idserialprimary keyTable primary keycase_idintegernot nullCase ID referencing litigation.t_cases (case_id )modified_load_idintegernot nullLoad id of last modificationstatuschar(1)XML validity flag (v=valid;i=invalid)contentxmlRelated publications XML container

litigation.t_related_cases

ColumnTypeModifiersCommentrelated_cases_idserialprimary keyTable primary keycase_idintegernot nullCase ID referencing litigation.t_cases (case_id )modified_load_idintegernot nullLoad id of last modificationstatuschar(1)XML validity flag (v=valid;i=invalid)contentxmlRelated cases XML container

litigation.t_litigation_case_extract

char(1)
ColumnTypeModifiersComment
parties_idserialprimary keyTable primary key
case_idintegernot nullCase ID referencing litigation.t_cases (case_id )
modified_load_idintegernot nullLoad id of last modification
statusDocument

identifier

of the cited record
sourcevarchar(8)not nullCitation origin
is_appbooleannot nullIndicates that record is an application (true) or publication (false)
load_idintegernot nullLoad_id of last modification

cdws.t_priority_documents​

Column

Type

Modifiers

Comment

priority_idserialprimary keyTable primary keypublication_idintegernot null

Integer representation of the publication

application_idintegernot nullConcatenation of country-doc_number-kind based on application recordapp_typechar(1)not nullXML origin of record: 'A' indicates application, 'P' indicates prioritylinkchar(1)Value of priority-claim/linkage-type attribute in XMLload_idintegernot nullLoad_id of last modification

Litigation (Optional)

The litigation schema is used for global litigation data provided by MaxVal. This data is available through the CLAIMS Direct platform as an optional data subscription. For information about the Litigation API, see Litigation.

litigation.t_cases

ColumnTypeModifiersCommentcase_idserialprimary keyTable primary keycase_numbervarchar(32)Identifying case numberfileddateFiling date of casecloseddateDate case closedcreated_load_idintegernot nullLoad id of case creationmodified_load_idintegernot nullLoad id of last modification

XML

validity flag (v=valid;i=invalid)contentxmlMain XML case container

User-Defined

Functions

XML User-Defined Functions

Functions

In addition to content columns from which XML content can easily be selected, there are also the following supplemental functions available for common data extraction tasks:

xml.f_patent_document(integer, text)

This function returns an entire XML document given a publication_id as first argument. The second argument is currently unused.

Code Block
languagesql
SELECT xml.f_patent_document(1, null);


xml.f_patent_document_s(integer, text)

Similar to the function above, this function returns a SET OF XML documents, one document per row. Arguments are identical to function above.

Code Block
languagesql
SELECT xml.f_patent_document_s(t.publication_id, null)
FROM xml.t_patent_document_values AS t
WHERE t.published='20100407';


xml.f_ucid2id(text)

This function takes a ucid (unique character ID) and converts it to an integer publication_id.

Code Block
languagesql
SELECT xml.f_patent_document(xml.f_ucid2id('US-5551212-A'), null);

CDWS User-Defined Functions

The following examples illustrate the active functions in the cdws schema. Any functions not documented here are for internal use by IFI CLAIMS only. See Leveraging On-Site Citation and Family Functionality for more information.

Note

Note: Initial loads which occurred prior to May 13, 2020 need to have loaded the citations and family tables for the cdws functions to work (see instructions here). If your initial load occurred after May 13, those tables were included in the initial load.

cdws.f_citations_backward(publication_id integer)

This function returns cited ucids and their citation sources.

Code Block
languagesql
SELECT * 
FROM cdws.f_citations_backward( xml.f_ucid2id('US-5551212-A') )

cdws.f_citations_forward(publication_id integer)

This function returns ucids and citation sources of documents citing the given ucid.

Code Block
languagesql
SELECT *
FROM cdws.f_citations_forward( xml.f_ucid2id('US-5551212-A') )

cdws.f_family_citations_backward(family_id integer)

This function returns the unique backward citations of a given family.

Code Block
languagesql
SELECT *
FROM cdws.f_family_citations_backward(6413406);

cdws.f_family_citations_forward(family_id integer)

This function returns the unique forward citations of a given family.

Code Block
languagesql
SELECT *
FROM cdws.f_family_citations_forward(6413406)

cdws.f_family_extended(publication_id integer)

This function returns all extended family members. This family is comparable to the EPO INPADOC family.

Code Block
languagesql
SELECT *
FROM cdws.f_family_extended( xml.f_ucid2id('US-5551212-A') )

cdws.f_family_meta(family_id integer)

This function returns meta data for an entire family, including:

  • family_id
  • published
  • application ucid
  • filing date
  • title (in EN if available, language of publication otherwise)
  • IFI status as well as anticipated and adjusted expiration dates
  • EPO legal status as XML
Code Block
languagesql
SELECT *
FROM cdws.f_family_meta( 6413406 )

cdws.f_family_simple(publication_id integer )

This function returns all simple family members.

Code Block
languagesql
SELECT *
FROM cdws.f_family_simple( xml.f_ucid2id('US-5551212-A') )

cdws.f_ifi_names(publication_id integer)

This function returns both the original and current IFI standardized name and country.

Code Block
languagesql
SELECT *
FROM cdws.f_ifi_names( xml.f_ucid2id('US-5551212-A') )

cdws.f_text_bibliographic_data( publication_id integer, text)This function returns XML bibliographic data. This is a subset of information returned from xml.t_patent_document_values.

Code Block
languagesql
SELECT *
FROM cdws.f_text_bibliographic_data( xml.f_ucid2id('US-5551212-A'), ‘’ )

cdws.f_text_tac(publication_id integer, text)

This function returns the XML for titles, abstracts, and claims. This is a subset of information returned from xml.t_
patent_document
_values.
Code Block
languagesql
SELECT *
FROM cdws.f_text_tac( xml.f_ucid2id('US-5551212-A'), ‘’ )

Litigation User-Defined Functions

litigation.f_build_case(integer, text)

This function returns an entire XML case given a case_id as first argument. The second argument is currently unused.

Code Block
languagesql
SELECT litigation.f_build_case(534987, null);