...

Like images, other non-textual content, such as chemical structures, gene sequences, or mathematical formulae, can also be provided in separate files and stored in special formats.

...

Images and other files containing non-textual content (collectively "the attachments") are stored in the Attachment Server. Files containing embedded images, as well as mixed-mode files, are pre-processed before being loaded into the CLAIMS Direct Patent Database. Links to the attachments are generally referenced in the CLAIMS Direct XML files.

IMG element

The most common way to find embedded or referenced images in the full text of Alexandria XML files is through the img element.
Depending on the data source, this element can carry different parameters. The most relevant are:

...

Drawings section

Attachment references can be embedded in the img element, as described above, but they can also appear in a drawings section of the XML document. Pages published at the end of the original patent document and containing drawings are also referenced in this section.

Code Block
languagexml
<drawings mxw-id="PDW3055834" load-source="patent-office">
  <figure num="1">
    <img file="00380001.tif" id="img-00380001" he="228" wi="180" img-format="tif" img-content="dr"/>
  </figure>
</drawings>
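
As a minimal sketch of how the references in a drawings section might be collected, the following Python snippet parses the sample above with the standard library and lists each img element with its figure number. The function name list_drawing_images and the returned dictionary keys are illustrative assumptions, not part of CLAIMS Direct; the snippet also assumes the XML uses no namespaces, as in the sample.

```python
import xml.etree.ElementTree as ET

# Sample copied from the drawings example in this section.
SAMPLE = """<drawings mxw-id="PDW3055834" load-source="patent-office">
  <figure num="1">
    <img file="00380001.tif" id="img-00380001" he="228" wi="180" img-format="tif" img-content="dr"/>
  </figure>
</drawings>"""


def list_drawing_images(xml_text):
    """Return one dict per img element found inside a figure element.

    Hypothetical helper: the name and the dict keys are chosen for
    illustration only.
    """
    root = ET.fromstring(xml_text)
    images = []
    for figure in root.iter("figure"):
        for img in figure.iter("img"):
            images.append({
                "figure": figure.get("num"),
                "file": img.get("file"),
                "format": img.get("img-format"),
                "content": img.get("img-content"),
            })
    return images


if __name__ == "__main__":
    for entry in list_drawing_images(SAMPLE):
        print(entry)
```

The file attribute is the value that would then be used to request the image from the Attachment Server; how that request is made is outside the scope of this sketch.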

A figure id can be used to reference a drawing in the full text as follows:

...

The USPTO provides chemical structures in ChemDraw (CDX) and MDL (MOL) formats. References to these files are given in attachment elements within the chemistry section.

Code Block
languagexml
<chemistry id="CHEM-US-00001" num="00001">
  <img id="EMI-C00001" he="19.64mm" wi="54.36mm" file="US07307149-20071211-C00001.TIF" alt="embedded image" img-content="table" img-format="tif"/>
  <attachments>
    <attachment idref="CHEM-US-00001" attachment-type="cdx" file="US07307149-20071211-C00001.CDX"/>
    <attachment idref="CHEM-US-00001" attachment-type="mol" file="US07307149-20071211-C00001.MOL"/>
  </attachments>
</chemistry>
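
The attachment references in a chemistry section could be gathered along these lines. This is a sketch only: the helper name chemistry_attachments and the shape of its return value are assumptions made for illustration, and the XML is assumed to be namespace-free, as in the sample above.

```python
import xml.etree.ElementTree as ET

# Sample copied from the chemistry example in this section.
SAMPLE = """<chemistry id="CHEM-US-00001" num="00001">
  <img id="EMI-C00001" he="19.64mm" wi="54.36mm" file="US07307149-20071211-C00001.TIF" alt="embedded image" img-content="table" img-format="tif"/>
  <attachments>
    <attachment idref="CHEM-US-00001" attachment-type="cdx" file="US07307149-20071211-C00001.CDX"/>
    <attachment idref="CHEM-US-00001" attachment-type="mol" file="US07307149-20071211-C00001.MOL"/>
  </attachments>
</chemistry>"""


def chemistry_attachments(xml_text):
    """Map each chemistry id to a dict of {attachment-type: file}.

    Hypothetical helper: name and return shape are illustrative.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for chem in root.iter("chemistry"):
        files = {}
        for attachment in chem.iter("attachment"):
            files[attachment.get("attachment-type")] = attachment.get("file")
        result[chem.get("id")] = files
    return result


if __name__ == "__main__":
    print(chemistry_attachments(SAMPLE))
```

Keying the result by attachment-type makes it easy to pick the CDX or the MOL file for a given structure, whichever the downstream chemistry tool prefers.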


Other sections

Just as chemical structures are referenced in the dedicated chemistry section, other attachment types have their own elements in the XML files: sequence-list, math, megatable-doc, and table-external-doc.
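
To check which of these special elements a given document contains, one option is to count them by tag name, as sketched below. The function name count_special_sections and the HYPOTHETICAL fragment are invented for illustration; the fragment is only a placeholder and does not show the real structure of these elements.

```python
import xml.etree.ElementTree as ET

# Element names taken from this section of the documentation.
SPECIAL_ELEMENTS = (
    "chemistry",
    "sequence-list",
    "math",
    "megatable-doc",
    "table-external-doc",
)

# Hypothetical fragment: the wrapper element and the empty children are
# placeholders, not the actual layout of a CLAIMS Direct document.
HYPOTHETICAL = "<description><math/><chemistry/><math/></description>"


def count_special_sections(xml_text):
    """Count how often each special attachment element occurs."""
    root = ET.fromstring(xml_text)
    counts = {name: 0 for name in SPECIAL_ELEMENTS}
    for element in root.iter():  # walks the root and all descendants
        if element.tag in counts:
            counts[element.tag] += 1
    return counts


if __name__ == "__main__":
    print(count_special_sections(HYPOTHETICAL))
```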

...

| Country | Type | Format | Years |
|---|---|---|---|
| EP apps | Referenced images | TIFF | 1978 to 1995, 1998 to present |
| EP apps | Full document | multi-page PDF | 1978 to present |
| EP grants | Referenced images | TIFF | 1998 to present |
| EP grants | Full document | multi-page PDF | 1980 to present |
| WO | Referenced images | TIF, JPEG | 1978 to present |
| US apps and grants | Referenced images | TIFF | 2001 to present |
| US apps and grants | Complex work units | CDX, MOL, NB, XML | 2001 to present |
| JP apps | Front-page drawings | TIFF | 1980 to present |
| JP apps, grants, and utility models | Referenced images | multi-page TIFF + POS file | 2004 to present |
| JP apps, grants, and utility models | Full document | multi-page PDF | 2004 to present |
| KR apps, grants, and utility models | Referenced images | TIFF, JPEG, APP | 2006 to 2009 |
| KR apps, grants, and utility models | Full document | multi-page PDF | 2006 to 2009 |
| ES apps and grants | Full document | multi-page PDF | 2007 to present |

...