...
Like images, other non-textual content, such as chemical structures, gene sequences, or mathematical formulae, can also be provided in separate files stored in special formats.
...
Images and other files containing non-textual content (collectively, "the attachments") are stored on the Attachment Server. Files containing embedded images, as well as mixed-mode files, are pre-processed before being loaded into the CLAIMS Direct Patent Database. Links to the attachments are generally included in the CLAIMS Direct XML files.
IMG element
The most common way to find embedded or referenced images in the full text of Alexandria XML files is through the img element.
Depending on the data source, this element carries different attributes. The most relevant ones include the following:
...
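As a minimal sketch, every img element and its attributes can be collected with Python's standard xml.etree.ElementTree module. The sample document, its wrapper elements, and the helper name collect_images are hypothetical; the img attributes are borrowed from the drawings example on this page. The sketch also assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the wrapper elements are illustrative only; the
# img attributes are copied from the drawings example on this page.
SAMPLE = (
    '<patent-document>'
    '<description><p>As shown in '
    '<img file="00380001.tif" id="img-00380001" he="228" wi="180" '
    'img-format="tif" img-content="dr"/>'
    '</p></description>'
    '</patent-document>'
)


def collect_images(xml_text):
    """Return one dict of attributes per img element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so img elements are found at any depth.
    return [dict(img.attrib) for img in root.iter("img")]


for image in collect_images(SAMPLE):
    # Attributes that a data source omits come back as None from .get().
    print(image.get("file"), image.get("img-content"))
# prints: 00380001.tif dr
```

Because the set of attributes varies by data source, reading them with `.get()` rather than indexing keeps the sketch from failing on documents that lack a given attribute.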
Drawings section
Attachment references can be embedded in the img element as described above, but they can also appear in the drawings section of the XML document. Pages that are published at the end of the original patent document and contain drawings are also referenced in this section.
```xml
<drawings mxw-id="PDW3055834" load-source="patent-office">
  <figure num="1">
    <img file="00380001.tif" id="img-00380001" he="228" wi="180" img-format="tif" img-content="dr"/>
  </figure>
</drawings>
```
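The drawings example above can be read into a figure-to-file mapping with a few lines of ElementTree. This is a sketch, not part of CLAIMS Direct; the helper name figure_files is hypothetical, and it assumes each figure lists its images as img children, as in the example.

```python
import xml.etree.ElementTree as ET

# The drawings fragment from the example, joined into one string.
DRAWINGS = (
    '<drawings mxw-id="PDW3055834" load-source="patent-office">'
    '<figure num="1">'
    '<img file="00380001.tif" id="img-00380001" he="228" wi="180" '
    'img-format="tif" img-content="dr"/>'
    '</figure>'
    '</drawings>'
)


def figure_files(xml_text):
    """Map each figure's num attribute to the image files it contains."""
    root = ET.fromstring(xml_text)
    figures = {}
    # iter() includes the root itself, so this accepts either a bare
    # drawings fragment or a whole document that contains one.
    for drawings in root.iter("drawings"):
        for figure in drawings.iter("figure"):
            files = [img.get("file") for img in figure.iter("img")]
            figures.setdefault(figure.get("num"), []).extend(files)
    return figures


print(figure_files(DRAWINGS))  # {'1': ['00380001.tif']}
```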
A figure id can be used to reference a drawing in the full text as follows:
...
USPTO provides chemical structures in ChemDraw (CDX) and MDL (MOL) formats. References to these files can be found in the attachment elements of the chemistry section.
```xml
<chemistry id="CHEM-US-00001" num="00001">
  <img id="EMI-C00001" he="19.64mm" wi="54.36mm" file="US07307149-20071211-C00001.TIF" alt="embedded image" img-content="table" img-format="tif"/>
  <attachments>
    <attachment idref="CHEM-US-00001" attachment-type="cdx" file="US07307149-20071211-C00001.CDX"/>
    <attachment idref="CHEM-US-00001" attachment-type="mol" file="US07307149-20071211-C00001.MOL"/>
  </attachments>
</chemistry>
```
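A minimal sketch of pulling the rendered image and the CDX/MOL files out of a chemistry section, keyed by the attachment-type attribute shown in the example. The helper name chemistry_files and the "image" key are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# The chemistry fragment from the example, joined into one string.
CHEMISTRY = (
    '<chemistry id="CHEM-US-00001" num="00001">'
    '<img id="EMI-C00001" he="19.64mm" wi="54.36mm" '
    'file="US07307149-20071211-C00001.TIF" alt="embedded image" '
    'img-content="table" img-format="tif"/>'
    '<attachments>'
    '<attachment idref="CHEM-US-00001" attachment-type="cdx" '
    'file="US07307149-20071211-C00001.CDX"/>'
    '<attachment idref="CHEM-US-00001" attachment-type="mol" '
    'file="US07307149-20071211-C00001.MOL"/>'
    '</attachments>'
    '</chemistry>'
)


def chemistry_files(xml_text):
    """Return {chemistry id: {"image" or attachment type: file name}}."""
    root = ET.fromstring(xml_text)
    result = {}
    # iter() includes the root, so a bare chemistry fragment works too.
    for chem in root.iter("chemistry"):
        files = {}
        img = chem.find("img")  # direct img child: the rendered structure
        if img is not None:
            files["image"] = img.get("file")
        # attachment-type tells the ChemDraw (cdx) and MDL (mol) files apart.
        for att in chem.iter("attachment"):
            files[att.get("attachment-type")] = att.get("file")
        result[chem.get("id")] = files
    return result


print(chemistry_files(CHEMISTRY)["CHEM-US-00001"]["mol"])
# prints: US07307149-20071211-C00001.MOL
```

The sketch groups files by nesting. In the example, each attachment's idref also matches the chemistry id, so it could serve as an alternative join key.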
Other sections
Just as chemical structures are referenced in the dedicated chemistry section, other attachment types have their own elements in the XML files: sequence-list, math, megatable-doc, and table-external-doc.
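The same traversal can be pointed at the other attachment-bearing elements. This is a hedged sketch: it assumes, by analogy with the chemistry example, that file names appear in a file attribute on the section element or its descendants, which has not been checked against real data. The sample document, the file name, and the helper section_files are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names listed in the "Other sections" text.
SECTIONS = ("sequence-list", "math", "megatable-doc", "table-external-doc")

# Hypothetical document: the math content and file name are invented for
# illustration and do not come from real CLAIMS Direct data.
SAMPLE = (
    '<patent-document><description>'
    '<math id="MATH-00001"><img file="hypothetical-math-00001.tif"/></math>'
    '</description></patent-document>'
)


def section_files(xml_text):
    """Return {section name: [file names]} for the sections in SECTIONS.

    ASSUMPTION: file names sit in a 'file' attribute on the section element
    or one of its descendants, as in the chemistry example.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for name in SECTIONS:
        for section in root.iter(name):
            # iter() with no tag visits the section and all its descendants.
            files = [el.get("file") for el in section.iter() if el.get("file")]
            found.setdefault(name, []).extend(files)
    return found


print(section_files(SAMPLE))  # {'math': ['hypothetical-math-00001.tif']}
```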
...
| Country | Type | Format | Years |
|---|---|---|---|
| EP apps | Referenced images | TIFF | 1978 to 1995, |
| | Full document | multi-page PDF | 1978 to present |
| EP grants | Referenced images | TIFF | 1998 to present |
| | Full document | multi-page PDF | 1980 to present |
| WO | Referenced images | TIFF, JPEG | 1978 to present |
| US apps and grants | Referenced images | TIFF | 2001 to present |
| | Complex work units | CDX, MOL, NB, XML | 2001 to present |
| JP apps | Front-page drawings | TIFF | 1980 to present |
| JP apps, grants, | Referenced images | multi-page TIFF + POS file | 2004 to present |
| | Full document | multi-page PDF | 2004 to present |
| KR apps, grants, | Referenced images | TIFF, JPEG, APP | 2006 to 2009 |
| | Full document | multi-page PDF | 2006 to 2009 |
| ES apps and grants | Full document | multi-page PDF | 2007 to present |
...