Table of Contents |
---|
The data warehouse design is coupled tightly with the structure of the ST36-based XML document. The relational database is the open source PostgreSQL system and makes use of schemas to organize content and organizational structure inside the database. See http://www.postgresql.org for further information on PostgreSQL-specific functionality. The following table lists the active schemas and their roles in the data warehouse.
Schema | Description |
---|---|
work | Tables used for loading and merging raw data – these will not be described below |
reporting | Statistics pertaining to loading, updating, and indexing |
xml | The complete XML broken into tables based on the ST36 XML structure |
cdws | Tables used for on-site citation and family functionality. See CDWS Schema and User-Defined Functions. |
reports | Allows extraction of XML content from the tables in the xml schema. See Reports Schema Functions. |
litigation (optional) | Tables used for global litigation data provided by MaxVal. See Litigation Schema and User-Defined Functions. |
Note |
---|
Current Demos may not contain all of the schemas listed below. In addition, tables and schemas present in the current demo with the exception of reporting, work, and xml (described below), may be removed from the database without notice. |
The following graphic is a tree view of the structure of a complete XML document. Each node represents a container whereby each leaf of the tree represents a table in the data warehouse (xml
and work
schemas). The naming pattern of tables follows the following rules:
- dashes are replaced with underlines and
- each leaf-node name is prefixed with t_, i.e., publication-reference =t_publication_reference.
The two exceptions to this are the non-XML table t_patent_document_values and the copyright node. The copyright container is stored separately in t_sys_data.
Note |
---|
Note: The |
Schemas
Reporting
The reporting
schema houses tables critical to data loading, update processes, and queues for optional indexing. Although most tables are for internal use, the table reporting.t_client_load_process
contains interesting information for clients around which web status interfaces or triggers can easily be built.
Column | Type | Modifiers | Comment |
---|---|---|---|
client_load_process_id | serial | primary key | Table primary key |
load_id | integer | The load identifer | |
load_source | text | Source data for load | |
url | text | URL of package to load | |
ndocs | integer | Number of documents contained in the load | |
entered_stamp | timestamp | Time entered for processing | |
completed_stamp | timestamp | Time loading completed | |
running_status | text | One of: downloading, unpacking, loading, merging, citations, priorities, complete | |
completed_status | text | One of: success, failure |
If you have installed the optional SOLR Solr index, there is a companion table reporting.t_client_index_process
that manages the indexing queue and has the exact structure of the reporting.t_client_load_process
.
XML
The XML schema is the heart of the data warehouse and contains tables representing individual containers (sections) of the complete document stored in pure XML.
Table Structure–General
All tables, with the exception of xml.t_patent_document_values
, follow the same structure.
Column | Type | Modifiers | Comment |
---|---|---|---|
<tbl>_id | serial | primary key | Table primary key |
publication_id | integer | not null | Integer representation of the publication |
modified_load_id | integer | not null | Internal load id used to manage data updates |
status | char(1) | Validity of the XML fragment (v=valid, i=invalid)* | |
content | XML | The XML fragment |
Info |
---|
All XML stored in |
The table xml.t_patent_document_values
is the master content table containing meta information of the document in standard relational database columns. This table is the basis for the overall container element patent-document
.
Column | Type | Modifiers | Comment |
---|---|---|---|
patent_document_value_id | serial | primary key | Table primary key |
publication_id | integer | not null | Integer representation of the publication |
created_load_id | integer | Load id of document creation | |
modified_load_id | integer | Load id of last modification | |
deleted_load_id | integer | Load id when document was deleted | |
status | varchar(16) | Status of complete document (new|update) | |
country | char(2) | not null | Country of publication |
doc_number | varchar(32) | not null | Publication document number |
kind | varchar(4) | Document kind code | |
lang | char(2) | Document publication language | |
ucid | varchar(32) | Concatenation of country-doc_number-kind | |
published | date | Date of publication | |
produced | date | Date of most recent Alexandria XML production, which may indicate original production or updates | |
withdraw | boolean | Withdrawn status | |
family_id | integer | DocDBDOCDB-provided simple family identifier |
CDWS
The cdws
schema is used for on-site citation and family functionality. It houses tables and functions that interact with the data-store, index, and attachment repositories through the web services. Several of the tables in the cdws
schema are described below. Note that if your initial load occurred before May 13, 2020, these tables will need to be added as described in Leveraging On-Site Citation and Family Functionality.
cdws.t_applications
Column
Type
Modifiers
Comment
Concatenation of country-doc_number-kind based on application record
cdws.t_cited_documents
Column | Type | Modifiers | Comment |
---|---|---|---|
id | serial | primary key | Table primary key |
publication_id | integer | not null | Integer representation of the publication |
cited_document_id | integer | not null |
litigation.t_publication_cases
litigation.t_parties
Column | Type | Modifiers | Comment |
---|---|---|---|
parties_id | serial | primary key | Table primary key |
case_id | integer | not null | Case ID referencing litigation.t_cases (case_id ) |
modified_load_id | integer | not null | Load id of last modification |
status | char(1)Document identifier of the cited record | ||
source | varchar(8) | not null | Citation origin |
is_app | boolean | not null | Indicates that record is an application (true) or publication (false) |
load_id | integer | not null | Load_id of last modification |
cdws.t_priority_documents
Column
Type
Modifiers
Comment
Integer representation of the publication
Litigation (Optional)
The litigation
schema is used for global litigation data provided by MaxVal. This data is available through the CLAIMS Direct platform as an optional data subscription. For information about the Litigation API, see Litigation.
litigation.t_cases
XML validity flag (v=valid;i=invalid) | content | xml | Parties
XML
containerlitigation.t_related_publications
litigation.t_related_cases
litigation.t_litigation_case_extract
User-Defined Functions
XML User-Defined Functions
User-Defined Functions
In addition to content columns from which XML content can easily be selected, there are also the following supplemental functions available for common data extraction tasks:
xml.f_patent_document(integer, text)
This function returns an entire XML document given a publication_id as first argument. The second argument is currently unused.
Code Block | ||
---|---|---|
| ||
SELECT xml.f_patent_document(1, null); |
xml.f_patent_document_s(integer, text)
Similar to the function above, this function returns a SET OF XML documents, one document per row. Arguments are identical to function above.
Code Block | ||
---|---|---|
| ||
SELECT xml.f_patent_document_s(t.publication_id, null) FROM xml.t_patent_document_values AS t WHERE t.published='20100407'; |
xml.f_ucid2id(text)
This function takes a ucid (unique character ID) and converts it to an integer publication_id.
Code Block | ||
---|---|---|
| ||
SELECT xml.f_patent_document(xml.f_ucid2id('US-5551212-A'), null); |
CDWS User-Defined Functions
The following examples illustrate the active functions in the cdws
schema. Any functions not documented here are for internal use by IFI CLAIMS only. See Leveraging On-Site Citation and Family Functionality for more information.
Note |
---|
Note: Initial loads which occurred prior to May 13, 2020 need to have loaded the citations and family tables for the |
cdws.f_citations_backward(publication_id integer)
This function returns cited ucids and their citation sources.
Code Block | ||
---|---|---|
| ||
SELECT * FROM cdws.f_citations_backward( xml.f_ucid2id('US-5551212-A') ) |
) |
This function returns ucids and citation sources of documents citing the given ucid.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_citations_forward( xml.f_ucid2id('US-5551212-A') )
|
cdws.f_family_citations_backward(family_id integer)
This function returns the unique backward citations of a given family.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_family_citations_backward(6413406); |
cdws.f_family_citations_forward(family_id integer)
This function returns the unique forward citations of a given family.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_family_citations_forward(6413406)
|
cdws.f_family_extended(publication_id integer)
This function returns all extended family members. This family is comparable to the EPO INPADOC family.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_family_extended( xml.f_ucid2id('US-5551212-A') )
|
cdws.f_family_meta(family_id integer)
This function returns meta data for an entire family, including:
- family_id
- published
- application ucid
- filing date
- title (in EN if available, language of publication otherwise)
- IFI status as well as anticipated and adjusted expiration dates
- EPO legal status as XML
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_family_meta( 6413406 )
|
cdws.f_family_simple(publication_id integer)
This function returns all simple family members.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_family_simple( xml.f_ucid2id('US-5551212-A') )
|
cdws.f_ifi_names(publication_id integer)
This function returns both the original and current IFI standardized name and country.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_ifi_names( xml.f_ucid2id('US-5551212-A') ) |
cdws.f_text_bibliographic_data(publication_id integer, text)
This function returns XML bibliographic data. This is a subset of information returned from xml.t_patent_document_values.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_text_bibliographic_data( xml.f_ucid2id('US-5551212-A'), ‘’ )
|
cdws.f_text_tac(publication_id integer, text)
This function returns the XML for titles, abstracts, and claims. This is a subset of information returned from xml.t_patent_document_values.
Code Block | ||
---|---|---|
| ||
SELECT *
FROM cdws.f_text_tac( xml.f_ucid2id('US-5551212-A'), ‘’ )
|
Litigation User-Defined Functions
litigation.f_build_case(integer, text)
This function returns an entire XML case given a case_id as first argument. The second argument is currently unused.
Code Block | ||
---|---|---|
| ||
SELECT litigation.f_build_case(534987, null); |