The data warehouse design is tightly coupled to the structure of the ST36-based XML document. The relational database is the open source PostgreSQL system, and it uses schemas to organize content inside the database. See http://www.postgresql.org for further information on PostgreSQL-specific functionality. The following table lists the active schemas and their roles in the data warehouse.

| Schema | Description |
|---|---|
| work | Tables used for loading and merging raw data; these will not be described below |
| reporting | Statistics pertaining to loading, updating, and indexing |
| xml | The complete XML broken into tables based on the ST36 XML structure |

Note

Current demos may not contain all of the schemas listed below. In addition, tables and schemas present in the current demo, with the exception of reporting, work, and xml (described below), may be removed from the database without notice.

...

The reporting schema houses tables critical to data loading, update processes, and queues for optional indexing. Although most tables are for internal use, the table reporting.t_client_load_process contains load status information that clients can use to build web status interfaces or triggers.

| Column | Type | Modifiers | Comment |
|---|---|---|---|
| client_load_process_id | serial | primary key | Table primary key |
| load_id | integer | | The load identifier |
| load_source | text | | Source data for load |
| url | text | | URL of package to load |
| ndocs | integer | | Number of documents contained in the load |
| entered_stamp | timestamp | | Time entered for processing |
| completed_stamp | timestamp | | Time loading completed |
| running_status | text | | One of: downloading, unpacking, loading, merging, complete |
| completed_status | text | | One of: success, failure |

If you have installed the optional SOLR index, a companion table, reporting.t_client_index_process, manages the indexing queue and has exactly the same structure as reporting.t_client_load_process.
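As an illustration of the kind of status interface these tables allow, the following is a minimal Python sketch, not part of the product. It assumes a DB-API 2.0 connection (for example one opened with a PostgreSQL driver) and assumes that completed_status stays NULL until a load has finished; neither point is confirmed by this page. The function names fetch_process_rows and summarize_loads are hypothetical.

```python
from typing import Any, Dict, List

# The two tables named in the text; both are documented as sharing one structure.
ALLOWED_TABLES = {
    "reporting.t_client_load_process",
    "reporting.t_client_index_process",
}


def fetch_process_rows(
    conn: Any, table: str = "reporting.t_client_load_process"
) -> List[Dict[str, Any]]:
    """Read the status columns of a process table, oldest entry first."""
    # The table name is interpolated into the SQL, so accept known names only.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table}")
    cur = conn.cursor()
    cur.execute(
        "SELECT load_id, ndocs, running_status, completed_status "
        f"FROM {table} ORDER BY entered_stamp"
    )
    columns = [desc[0] for desc in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def summarize_loads(rows: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Group load ids by outcome.

    Assumption: completed_status is NULL (None) while a load is still
    running, and is 'success' or 'failure' once it has finished.
    """
    summary: Dict[str, List[int]] = {"in_progress": [], "succeeded": [], "failed": []}
    for row in rows:
        if row["completed_status"] == "success":
            summary["succeeded"].append(row["load_id"])
        elif row["completed_status"] == "failure":
            summary["failed"].append(row["load_id"])
        else:
            summary["in_progress"].append(row["load_id"])
    return summary
```

A status page could call summarize_loads(fetch_process_rows(conn)) on a timer and display the three lists. Passing "reporting.t_client_index_process" as the table argument would give the same view of the indexing queue, if the optional index is installed.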

...

| Column | Type | Modifiers | Comment |
|---|---|---|---|
| <tbl>_id | serial | primary key | Table primary key |
| publication_id | integer | not null | Integer representation of the publication |
| modified_load_id | integer | not null | Internal load id used to manage data updates |
| status | char(1) | | Validity of the XML fragment (v=valid, i=invalid)* |
| content | XML | | The XML fragment |
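The status and content columns above suggest a simple client-side pattern: skip fragments that are not marked valid and parse the rest. The following Python sketch shows one way this might look. It assumes the rows have already been fetched as dictionaries keyed by column name and that the database driver returns the XML column as a string; the function name valid_fragments is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Iterable, Iterator, Tuple


def valid_fragments(
    rows: Iterable[Dict[str, Any]]
) -> Iterator[Tuple[int, ET.Element]]:
    """Yield (publication_id, parsed root element) for valid fragments only."""
    for row in rows:
        # 'v' marks a valid fragment; 'i' (invalid) and NULL rows are skipped.
        if row["status"] != "v":
            continue
        # Assumption: content arrives as a str holding one well-formed fragment.
        yield row["publication_id"], ET.fromstring(row["content"])
```

Filtering on status before parsing avoids raising a parse error on fragments the warehouse has already flagged as invalid.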

...


The table xml.t_patent_document_values is the master content table containing meta information about the document in standard relational database columns. This table is the basis for the overall container element patent-document.

...