The data warehouse design is coupled tightly with the structure of the ST36-based XML document. The relational database is the open source PostgreSQL system and makes use of schemas to organize content and organizational structure inside the database. See http://www.postgresql.org for further information on PostgreSQL-specific functionality. The following table lists the active schemas and their roles in the data warehouse.
Schema | Description |
---|---|
work | Tables used for loading and merging raw data – these will not be described below |
reporting | Statistics pertaining to loading, updating, and indexing |
xml | The complete XML broken into tables based on the ST36 XML structure |
Note |
---|
Current Demos may not contain all of the schemas listed below. In addition, tables and schemas present in the current demo with the exception of reporting, work, and xml (described below), may be removed from the database without notice. |
The following graphic is a tree view of the structure of a complete XML document. Each node represents a container whereby each leaf of the tree represents a table in the data warehouse (xml
and work
schemas). The naming pattern of tables follows the following rules:
...
The two exceptions to this are the non-XML table t_patent_document_values and the copyright node. The copyright container is stored separately in t_sys_data.
Schemas
Reporting
The reporting
schema houses tables critical to data loading, update processes, and queues for optional indexing. Although most tables are for internal use, the table reporting.t_client_load_process
contains interesting information for clients around which web status interfaces or triggers can easily be built.
Column | Type | Modifiers | Comment | |
---|---|---|---|---|
client_load_process_id | serial | primary key | Table primary key | |
load_id | integer |
| The load identifer | |
load_source | text |
| Source data for load | |
url | text | URL of package to load | ||
ndocs | integer | Number of documents contained in the load | ||
entered_stamp | timestamp |
| Time entered for processing | |
completed_stamp | timestamp | Time loading completed | ||
running_status | text | One of: downloading, unpacking, loading, merging, complete | ||
completed_status | text | One of: success, failure |
If you have installed the optional SOLR index, there is a companion table reporting.t_client_index_process
that manages the indexing queue and has the exact structure of the reporting.t_client_load_process
.
...
Column | Type | Modifiers | Comment |
---|---|---|---|
<tbl>_id | serial | primary key | Table primary key |
publication_id | integer | not null | Integer representation of the publication |
modified_load_id | integer | not null | Internal load id used to manage data updates |
status | char(1) | Validity of the XML fragment (v=valid, i=invalid)* | |
content | XML | The XML fragment |
Info |
---|
All XML stored in |
...
Column | Type | Modifiers | Comment | |
---|---|---|---|---|
patent_document_value_id | serial | primary key | Table primary key | |
publication_id | integer | not null | Integer representation of the publication | |
created_load_id | integer | Load id of document creation | ||
modified_load_id | integer | Load id of last modification | ||
deleted_load_id | integer | Load id when document was deleted | ||
status | varchar(16) |
| Status of complete document (new|update) | |
country | char(2) | not null | Country of publication | |
doc_number | varchar(32) | not null | Publication document number | |
kind | varchar(4) |
| Document kind code | |
lang | char(2) | Document publication language | ||
ucid | varchar(32) |
| Concatenation of country-doc_number-kind | |
published | date |
| Date of publication | |
produced | date | Date of Alexandria XML production | ||
withdraw | boolean | Withdrawn status | ||
family_id | integer | DocDB-provided simple family identifier |
...
Code Block | ||
---|---|---|
| ||
SELECT xml.f_patent_document(1, null); |
...
xml.f_patent_document_s(integer, text)
...
Code Block | ||
---|---|---|
| ||
SELECT xml.f_patent_document_s(t.publication_id, null) FROM xml.t_patent_document_values AS t WHERE t.published='20100407'; |
...
xml.f_ucid2id(text)
This function takes a ucid (unique character ID) and converts it to an integer publication_id.
...