...
The basic structure of the PostgreSQL database housing the XML content is best described as a collection of tables containing the raw XML of every patent document available in CLAIMS Direct. Each table is a section of the XML document, referred to as a container, and each table is named according to the XML container data it contains, e.g.,
| | |
---|
xml.t_publication_reference | publication-reference | Document publication information |
xml.t_invention_title | invention-title | Document title(s) |
xml.t_claims | claims | Document claims |
The structure of each table is identical with the exception of xml.t_patent_document_values
which functions as a meta table.
| |
---|
<container>_id | serial primary key |
publication_id | integer not null |
modified_load_id | integer not null |
status | character(1) (v=valid XML, i=invalid XML) |
content | XML |
The XML is always in the content
(PostgreSQL type: XML) column.
...
from xml.t_publication_reference |
...
where publication_id=xml.f_ucid2id('US-5551212-A'); |
...
...
<publication-reference fvid="71329085" |
...
...
...
...
...
number>5551212</doc-number> |
...
...
<date>19960903</date>
<lang>EN</lang> |
...
...
Info |
---|
The invaluable utility function xml.f_ucid2id takes a ucid and returns a publication_id . |
...
As mentioned above, all content
columns are of type XML and therefore the internal PostgreSQL xpath functionality can be used. There are a variety of utility functions provided by CLAIMS Direct that mimic DOM functionality, e.g.
| |
---|
string
xml.f_textContent(xpath::text,xml)
| Extract text of node: Code Block |
---|
select xml.f_textContent('./document-id/lang', content) |
|
from xml.t_publication_reference |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
|
string
xml_f_getAttribute(att::text,xml)
| Extract single attribute value: Code Block |
---|
| select xml.f_getAttribute('ucid', content) |
|
from xml.t_publication_reference |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
|
boolean
xml.f_nodeExists(xpath::text,xml)
| Test presence of a particular node: Code Block |
---|
| select xml.f_nodeExists('//date', content) |
|
from xml.t_publication_reference |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
|
XML
xml.f_findNode(xpath::text,xml)
| Select a node (type XML) satisfying the given xpath expression: Code Block |
---|
| select xml.f_findNode('./main-classification', content) |
|
from xml.t_classification_national |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
<main-classification mxw-id="PCL893513162" |
|
>
053397
<>053397</main-classification> |
|
XML[]
xml.f_findNodes(xpath::text,xml)
| Select an array of nodes satisfying the given xpath expression: Code Block |
---|
| select xml.f_findNodes('./classification-ipcr', content) |
|
from xml.t_classifications_ipcr |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
{
{
"<classification-ipcr ...", |
|
"<classification-ipcr ...", |
|
"<classification-ipcr ..." |
|
|
XML[]
xml.f_getElementsByTagName(name::text,xml))
| Select an array of nodes based on element name: Code Block |
---|
| select xml.f_getElementsByTagName('classification-ipcr', content) |
|
from xml.t_classifications_ipcr |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
{
{
"<classification-ipcr ...", |
|
"<classification-ipcr ...", |
|
"<classification-ipcr ..." |
|
string
xml.f_nodeName(xml)
| Select the name of the content node: Code Block |
---|
| select xml.f_nodeName( content ) |
|
from xml.t_invention_title |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
|
XML[]
xml.f_childNodes(xml)
| Select all child nodes: Code Block |
---|
| select xml.f_childNodes( content ) |
|
from xml.t_application_reference |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
----------------------------------------------------------------------------------- |
|
<document-id mxw-id="PAPP60364944" |
|
number>2582893
<number>2582893</doc-number> |
|
<date>
19930303
</date>
<date>19930303</date>
<lang>EN</lang> |
|
<document-id mxw-id="PAPP84571955" |
|
load-source="patent-office" |
|
number>08025828
</number>08025828</doc-number> |
|
<date>
19930303
</date>
<date>19930303</date>
<lang>EN</lang> |
|
|
string
xml.f_node2string(xml,indent)
| Concatenate all child text nodes: Code Block |
---|
| select xml.f_node2string( content::text, |
|
where publication_id=xml.f_ucid2id('US-5551212-A'); |
|
----------------------------------------------------------------------------------- |
|
A
package
is formed by holding a nonextensible strip taut and wrapping it around the upper and/or lower periphery of the bundle and joining the ends of the strip segments e.g. by adhesive bonding. A projecting margin of the strip is folded to overlie the bundle or underlie the bundle so that an L-configuration is imparted to the frame formed around the bundle after the strip is applied
A package is formed by holding a nonextensible strip taut and wrapping it around the upper and/or lower periphery of the bundle and joining the ends of the strip segments e.g. by adhesive bonding. A projecting margin of the strip is folded to overlie the bundle or underlie the bundle so that an L-configuration is imparted to the frame formed around the bundle after the strip is applied thereto. |
|
Use Case: Creating relational view
of XML content
...
Now we can create a function that returns a TABLE
of these properties. We'll use publication_id
as input to the function and create it in a private schema
(mySchema).
Code Block |
---|
|
create or replace function mySchema.f_rdb_citations( integer ) |
...
...
(
publication_id integer, |
...
ref_publication_id integer, |
...
...
published date,
country varchar(2), |
...
...
...
...
...
...
...
...
...
...
...
-- select * from mySchema.f_rdb_citations( xml.f_ucid2id('US-9240001-B2') ); |
...
select x.publication_id, x.content into publication_id, v_content |
...
from xml.t_citations as x where x.publication_id= |
...
$1;
if not found then
return;
end if;
-- loop through all patent citations
for each v_node in array
( select xml.f_findNodes('//patcit', v_content) ) |
...
loop
select xml.f_getAttribute( |
...
'ucid', v_node ) into ref_ucid; |
...
select x.publication_id into ref_publication_id |
...
from xml.t_patent_document_values as |
...
x
where ucid=xml.f_getAttribute( |
...
...
select xml.f_getAttribute( |
...
'load-source', v_node ) into load_source; |
...
select xml.f_getAttribute('name', xml.f_findNode('./sources/source', v_node ) ) into source; |
...
-- loop through all forms of the document-id(s) |
...
...
array
( select xml.f_findNodes('./document-id', v_node) ) |
...
loop
select xml.f_getAttribute( |
...
'format', v_doc_node ) into format; |
...
select xml.f_textContent( |
...
'./country', v_doc_node ) into country; |
...
select xml.f_textContent( |
...
'./kind', v_doc_node ) into kind; |
...
select xml.f_textContent( |
...
'./date', v_doc_node ) into published; |
...
return next;
end loop;
end loop;
end;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000; |
Now, we test the results:
Code Block |
---|
|
select * from mySchema.f_rdb_citations( xml.f_ucid2id('US-9240001-B2') ) |
...
...
publication_id | ref_publication_id | |
...
...
ucid | published | country | kind | load_source | format | source |
...
----------------+--------------------+-------------------+------------+---------+------+-------------+--------+-------- |
...
160817649 | 5778519 | CA-2580978- |
...
...
| CA | A1 | docdb | epo | APP
160817649 | 127402642 | CN-201918032-U | 2011-08-03 |
...
| CN | U | docdb | epo | APP
160817649 | 75847915 | US-20020150866-A1 | |
...
...
| US | A1 | docdb | epo | APP
160817649 | 75947470 | US-20030060942-A1 | |
...
...
| US | A1 | docdb | epo | APP
160817649 | 76046244 | US-20030154010-A1 | |
...
...
| US | A1 | docdb | epo | APP |
Of course, GROUP BY
and ORDER BY
are available on the returned columns. In addition to SELECTing citations for a single document, we can also JOIN
to table expressions, e.g., assume we are interested in the top 5 cited countries from US grant publication date 2016-05-03:
Code Block |
---|
|
select count(*) as n, t2.country as ref_country |
from xml.t_patent_document_values t1 |
inner join mySchema.f_rdb_citations (t1.publication_id) as |
t2
on t2
on (t1.publication_id=t2.publication_id) |
and t1.published='2016-05-03' |
limit
5
;
n |
14285
| WO
13313
| JP
7115
| EP
2554
|
14285 | WO
13313 | JP
7115 | EP
2554 | CN |