The Attachment Server contains clipped images, drawings, chemistry files, full page images, and other non-textual files delivered by patent offices. Most of the links to the files in the Attachment Server are embedded in CLAIMS Direct XML files. There is a simple web service available to access this content (see Attachments for more information). This document describes the content of the Attachment Server, explains how image files are embedded in the XML files, and discusses how to interface with the server.

Raw Data

Image content from patents is available from data sources in different formats, which include the following:

  • Full page images
    These images are facsimiles of the original documents delivered as TIFF files or PDF documents, which can be a single page (one file per page) or multi-page (one file per document).
  • Embedded images
    Embedded images are images that are part of the document. Unlike full page images, they are not the entire document.
  • ST35, mixed mode
    Mixed-mode format is an old format delivered by some patent offices. This format includes both text and binary data. The text blocks and binary blocks are bytes of data lined up in a file. There is no separate image file; the image data is just another "in-line" block in the file.
  • Referenced images
    Images are delivered as separate files in TIFF or JPEG format. They are named systematically, and correspond to references within the XML or SGML documents. This is the most common current practice for delivering images. Multi-page TIFFs are indicated when multiple id attributes point to the same file. These are usually only found in JP records. A special tool or viewer is required for proper handling of multi-page TIFFs. A list of viewers can be found here.

Like images, other non-textual content, such as chemical structures, gene sequences, or mathematical formulae, can also be provided in separate files stored in special formats.
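A multi-page TIFF shows up in the XML as several img id attributes that point to the same file (the img element is described under IMG Element below), so a quick scan can flag candidate files. The following is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the XML is available as a string, and the sample fragment and its ids are hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def multipage_candidates(xml_text):
    """Return {file: [ids]} for every file that more than one img id points to."""
    root = ET.fromstring(xml_text)
    ids_by_file = defaultdict(list)
    for img in root.iter("img"):
        file_name = img.get("file")
        if file_name is not None:
            ids_by_file[file_name].append(img.get("id"))
    # A file shared by several ids is a likely multi-page TIFF.
    return {name: ids for name, ids in ids_by_file.items() if len(ids) > 1}


# Hypothetical fragment: the ids are made up for illustration.
sample = """<doc>
  <img id="i0001" file="00000001.TIF" img-format="tif"/>
  <img id="i0002" file="00000001.TIF" img-format="tif"/>
  <img id="i0003" file="imgf0001.tif" img-format="tif"/>
</doc>"""

print(multipage_candidates(sample))  # {'00000001.TIF': ['i0001', 'i0002']}
```

A file flagged this way still needs a viewer or library that handles multi-page TIFFs to display the individual pages.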

Attachments in the CLAIMS Direct Patent Database

Images and other files containing non-textual content (collectively "the attachments") are stored in the Attachment Server. Files containing embedded images, as well as mixed mode files, are pre-processed before being loaded to the CLAIMS Direct Patent Database. Links to the attachments are generally referenced in the CLAIMS Direct XML files.

IMG Element

The most common way to find embedded or referenced images in the full text of Alexandria XML files is through the img element.
The attributes present in this element depend on the data source. The most relevant ones are:

Attribute | Description
id | Unique image identifier. Note: For Japanese multi-page TIFFs, this identifier is a reference to the corresponding ImageDescription in the TIFF.
file | Name of the attachment file
he | Height of the image, in pixels or as a percentage (some sources use other units, e.g. he="19.64mm" in the USPTO example below)
wi | Width of the image, in pixels or as a percentage (some sources use other units, e.g. wi="54.36mm" in the USPTO example below)
img-format | Format of the image file (values listed below)
img-content | Kind of content in the image file (values listed below)

Values of img-format:

Value | Description
app | Sequence Listing
bmp | Bitmap Image File
cdx | Chemical Structure (ChemDraw format)
gif | Graphics Interchange Format
ini | Initialization File
jpg | JPEG Image File
mol | Chemical Structure (MDL format)
nb | Mathematical Expression
pdf | Full Document
pos | PostScript Image File
seq | Sequence File
st33 | ST.33 File
st35 | ST.35 File
tif | Tagged Image File
tiff | Tagged Image File
txt | Text File
xml | Large Table or Sequence Listing
zip | ZIP File

Values of img-content: through the years, different values have been allowed in this attribute. The most frequently used values include:

Value | Description
ad | Abstract Drawing
cf, chem, chemistry | Chemical Formula
ci | Clip
cp, programlisting | Computer Program Listing
dn, dna | DNA Sequence
dr, drawing | Drawing
ff | Undefined Character
fg, figure | Figure
gr, graph | Graph
mf, math | Mathematical Formula
pa | Page
ph | Photograph
sr | Search Report
tb | Table
tx | Character
ui | Undefined Image

Note: The current default value is drawing.
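Because several img-content values are synonyms, it can help to normalize them when reading img elements. The following is a sketch, not an official parser. The canonical names in CONTENT_ALIASES and the ImgRef record are hypothetical, and treating a missing img-content as "drawing" is an assumption based on the default noted above:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Hypothetical canonical names for the synonym groups in the img-content table.
CONTENT_ALIASES = {
    "cf": "chemistry", "chem": "chemistry", "chemistry": "chemistry",
    "cp": "programlisting", "programlisting": "programlisting",
    "dn": "dna", "dna": "dna",
    "dr": "drawing", "drawing": "drawing",
    "fg": "figure", "figure": "figure",
    "gr": "graph", "graph": "graph",
    "mf": "math", "math": "math",
}


@dataclass
class ImgRef:
    id: Optional[str]
    file: Optional[str]
    height: Optional[str]  # kept as text: pixels, a percentage, or e.g. "19.64mm"
    width: Optional[str]
    img_format: Optional[str]
    img_content: str


def read_img(elem, default_content="drawing"):
    """Read the attributes of one img element into an ImgRef."""
    # Assumption: a missing img-content is treated as the documented default.
    raw = elem.get("img-content", default_content)
    return ImgRef(
        id=elem.get("id"),
        file=elem.get("file"),
        height=elem.get("he"),
        width=elem.get("wi"),
        img_format=elem.get("img-format"),
        # Values outside the alias map (e.g. "ad", "tb") pass through unchanged.
        img_content=CONTENT_ALIASES.get(raw, raw),
    )


claim_img = ET.fromstring(
    '<img file="00270001.tif" id="img-00270001" he="32" wi="91" '
    'img-format="tif" img-content="cf"/>'
)
print(read_img(claim_img).img_content)  # chemistry
```

Keeping he and wi as strings avoids guessing the unit, since sources mix pixels, percentages, and millimetres.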


The img element may be found in the abstract, description, or claims sections of the XML document:

Abstract Section

<abstract mxw-id="PA87583406" lang="EN" load-source="patent-office">
  <p num="pa01">This invention relates to conduits (12) that allow communication of fluids
                from one portion of a patient's body to another; and, more particularly, to a
                blood flow conduit to allow communication from a heart chamber to a vessel or
                vice versa, and/or vessel to vessel
    <img id="iaf01" file="imgaf001.tif" wi="78" he="89" img-content="drawing" img-format="tif"/>
  </p>
</abstract> 


Description Section

<p num="p0078">For screening purposes a GST and 6 x His fusion of the LBD (from amino acids 155
                of hLXRalpha to 447) of human LXRalpha is constructed by first cloning a Gateway
                cassette (Invitrogen) in frame into the Sma I site of the pAGGHLT Polylinker (Pharmingen) [...]
                Primers used for Amplification are:
  <img id="ib0004" file="imgb0004.tif" wi="165" he="15" img-content="dna" img-format="tif"/>
  <!-- [...] -->
</p>


Claims Section

<claim-text>A phosphoramidite reagent of the formula I
  <img file="00270001.tif" id="img-00270001" he="32" wi="91" img-format="tif" img-content="cf"/>
    wherein Y and Y' each independently are selected from optionally substituted C<sub>1-6</sub>-alkyl
    or Y and Y' together with the nitrogen to which they are bonded form a non-aromatic [...]
</claim-text>
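To see where images occur across these sections, a script can walk each section and collect the img files inside it. This is a sketch under stated assumptions. Only abstract and claim-text appear in the examples above, so the description and claims section tag names, the patent-document root, and the claim wrapper in the sample are hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumption: sections are elements with these tag names.
SECTIONS = ("abstract", "description", "claims")


def images_by_section(xml_text):
    """Map each section name to the img file names found anywhere inside it."""
    root = ET.fromstring(xml_text)
    found = {}
    for name in SECTIONS:
        files = []
        for section in root.iter(name):
            files.extend(img.get("file") for img in section.iter("img") if img.get("file"))
        found[name] = files
    return found


# Hypothetical, heavily trimmed document built from the three examples above.
sample = """<patent-document>
  <abstract lang="EN"><p num="pa01">... <img id="iaf01" file="imgaf001.tif"/></p></abstract>
  <description><p num="p0078">... <img id="ib0004" file="imgb0004.tif"/></p></description>
  <claims><claim><claim-text>... <img id="img-00270001" file="00270001.tif"/></claim-text></claim></claims>
</patent-document>"""

print(images_by_section(sample))
# {'abstract': ['imgaf001.tif'], 'description': ['imgb0004.tif'], 'claims': ['00270001.tif']}
```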

  

Drawings Section

Attachment references can be embedded in the img element as described above, but they can also appear in the drawings section of the XML document. Pages that are published at the end of the original patent document and contain drawings are also referenced in this section.

<drawings mxw-id="PDW3055834" load-source="patent-office">
  <figure num="1">
    <img file="00380001.tif" id="img-00380001" he="228" wi="180" img-format="tif" img-content="dr"/>
  </figure>
</drawings>
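The drawing pages can be listed by walking the figure elements inside drawings. The following is a minimal sketch using the example above. A figure may lack an id attribute, as in this example, so the id slot can be None:

```python
import xml.etree.ElementTree as ET


def drawing_files(xml_text):
    """List (figure num, figure id, img file) for every figure in a drawings section."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() includes the root itself, so this also works when <drawings> is the root.
    for drawings in root.iter("drawings"):
        for figure in drawings.iter("figure"):
            for img in figure.iter("img"):
                rows.append((figure.get("num"), figure.get("id"), img.get("file")))
    return rows


sample = """<drawings mxw-id="PDW3055834" load-source="patent-office">
  <figure num="1">
    <img file="00380001.tif" id="img-00380001" he="228" wi="180" img-format="tif" img-content="dr"/>
  </figure>
</drawings>"""

print(drawing_files(sample))  # [('1', None, '00380001.tif')]
```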


The id attribute of a figure element can be referenced from the full text through the idrefs attribute of a figref element, as follows:

<p num="p0031">Referring now to
  <figref idrefs="f0001">FIGURES 1A and 1B</figref>, a coronary artery bypass is accomplished by disposing a
    conduit 12 (<figref idrefs="f0001">Fig. 1B</figref>) in a heart wall or myocardium MYO
    of a patient's heart PH (<figref idrefs="f0001">Fig. 1A</figref>). [...]
</p>
 
<drawings mxw-id="PDW10967064" load-source="patent-office">
  <figure id="f0001" num="1">
    <img id="if0001" file="imgf0001.tif" wi="140" he="230" img-content="drawing" img-format="tif"/>
  </figure>
</drawings>
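Resolving a figref to image files takes two steps: index the figures by id, then look up each id listed in idrefs. The following is a sketch that assumes idrefs may hold several whitespace-separated ids (the usual convention for IDREFS attributes; the example above shows only one). The doc root in the sample is hypothetical:

```python
import xml.etree.ElementTree as ET


def resolve_figrefs(xml_text):
    """Pair each figref's text with the image files of the figures it references."""
    root = ET.fromstring(xml_text)
    # Index every figure that carries an id attribute.
    figures = {fig.get("id"): fig for fig in root.iter("figure") if fig.get("id")}
    resolved = []
    for ref in root.iter("figref"):
        files = []
        # Assumption: idrefs may list several ids separated by whitespace.
        for target in (ref.get("idrefs") or "").split():
            figure = figures.get(target)
            if figure is not None:
                files.extend(img.get("file") for img in figure.iter("img"))
        resolved.append((ref.text, files))
    return resolved


# Trimmed from the example above, wrapped in a hypothetical <doc> root.
sample = """<doc>
  <p num="p0031">Referring now to <figref idrefs="f0001">FIGURES 1A and 1B</figref>,
    a coronary artery bypass ... (<figref idrefs="f0001">Fig. 1B</figref>)</p>
  <drawings>
    <figure id="f0001" num="1">
      <img id="if0001" file="imgf0001.tif" img-content="drawing" img-format="tif"/>
    </figure>
  </drawings>
</doc>"""

for text, files in resolve_figrefs(sample):
    print(text, "->", files)
# FIGURES 1A and 1B -> ['imgf0001.tif']
# Fig. 1B -> ['imgf0001.tif']
```

A figref whose id has no matching figure resolves to an empty list rather than raising an error.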


Chemistry Section

Chemical formulas can be found in img elements whose img-content attribute is "chem" or "cf", as well as "drawing". Another way to encode a chemistry-specific embedded image is through the chemistry element:

<claim-text>Use of a compound according to formula (I), or pharmaceutically acceptable salts or solvates thereof,
  <chemistry id="chem0011" num="0011">
    <img id="ib0071" file="imgb0071.tif" wi="74" he="50" img-content="chem" img-format="tif"/>
  </chemistry>
 wherein [...]
</claim-text>

The USPTO provides chemical structures in ChemDraw (CDX) and MDL (MOL) formats. References to these files can be found in the attachment elements inside the chemistry element:

<chemistry id="CHEM-US-00001" num="00001">
  <img id="EMI-C00001" he="19.64mm" wi="54.36mm" file="US07307149-20071211-C00001.TIF" alt="embedded image" img-content="table" img-format="tif"/>
  <attachments>
    <attachment idref="CHEM-US-00001" attachment-type="cdx" file="US07307149-20071211-C00001.CDX"/>
    <attachment idref="CHEM-US-00001" attachment-type="mol" file="US07307149-20071211-C00001.MOL"/>
  </attachments>
</chemistry>
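To pair each chemistry image with its CDX and MOL files, a script can read the attachment elements inside each chemistry element. This is a sketch that assumes at most one attachment per attachment-type per chemistry element. Chemistry elements without attachments, like the EP example above, yield an empty attachments dictionary:

```python
import xml.etree.ElementTree as ET


def chemistry_files(xml_text):
    """For each chemistry element, return its image file and attachment files by type."""
    root = ET.fromstring(xml_text)
    result = {}
    for chem in root.iter("chemistry"):
        img = chem.find("img")  # direct child, as in both examples above
        # Assumption: at most one attachment per attachment-type per chemistry element.
        attachments = {att.get("attachment-type"): att.get("file") for att in chem.iter("attachment")}
        result[chem.get("id")] = {
            "image": img.get("file") if img is not None else None,
            "attachments": attachments,
        }
    return result


sample = """<chemistry id="CHEM-US-00001" num="00001">
  <img id="EMI-C00001" he="19.64mm" wi="54.36mm" file="US07307149-20071211-C00001.TIF" img-format="tif"/>
  <attachments>
    <attachment idref="CHEM-US-00001" attachment-type="cdx" file="US07307149-20071211-C00001.CDX"/>
    <attachment idref="CHEM-US-00001" attachment-type="mol" file="US07307149-20071211-C00001.MOL"/>
  </attachments>
</chemistry>"""

print(chemistry_files(sample)["CHEM-US-00001"]["attachments"])
# {'cdx': 'US07307149-20071211-C00001.CDX', 'mol': 'US07307149-20071211-C00001.MOL'}
```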


Other Sections

In the same way that chemical structures are referenced in the special chemistry element, other attachments also have specific elements in the XML files: sequence-list, math, megatable-doc, and table-external-doc.

Search Report Pages

Search report pages published by some patent authorities are frequently distributed as full page images. References to the image files can be found in the search-report-data container under the doc-page element:

<search-report-data>
  <doc-page id="srep0001" file="srep0001.tif" wi="154" he="233" type="tif"/>
</search-report-data>
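When checking a document's XML against its attachment list (see File Naming below), it can be useful to gather every file name the XML references. The following is a sketch that considers only the three elements shown in this document to carry a file attribute (img, attachment, and doc-page). Other elements, such as those under Other Sections, may also reference files. Full-document files such as DOCUMENT.PDF may not be referenced in the XML at all. The sample document is hypothetical:

```python
import xml.etree.ElementTree as ET

# Elements shown in this document to carry a file attribute.
FILE_BEARING_TAGS = {"img", "attachment", "doc-page"}


def referenced_files(xml_text):
    """Return the sorted, de-duplicated file names referenced in the XML."""
    root = ET.fromstring(xml_text)
    return sorted(
        {elem.get("file") for elem in root.iter()
         if elem.tag in FILE_BEARING_TAGS and elem.get("file")}
    )


# Hypothetical document combining the drawings and search report examples.
sample = """<doc>
  <drawings><figure id="f0001" num="1"><img id="if0001" file="imgf0001.tif"/></figure></drawings>
  <search-report-data>
    <doc-page id="srep0001" file="srep0001.tif" wi="154" he="233" type="tif"/>
  </search-report-data>
</doc>"""

print(referenced_files(sample))  # ['imgf0001.tif', 'srep0001.tif']
```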


File Naming

There is no consensus on naming referenced files. Every publishing authority follows its own rules, which also change over the years. The following tables show representative attachment listings for patents from different publishing authorities.

Attachment list for: EP-2207108-A1-20100714

File | Size (bytes)
DOCUMENT.PDF | 698793
imgaf001.tif | 6725
imgb0001.tif | 321
imgb0002.tif | 315
imgf0001.tif | 4219
imgf0002.tif | 6892
srep0001.tif | 33861

Attachment list for: WO-2010072727-A1-20100701

File | Size (bytes)
imgf000004_0001.tif | 1464
imgf000010_0001.tif | 435
imgf000010_0002.tif | 435
imgf000037_0001.tif | 7738
imgf000040_0001.tif | 9871
imgf000041_0001.tif | 6357
imgf000042_0001.tif | 1935
imgf000043_0001.tif | 1464
imgf000046_0001.tif | 1372

Attachment list for: WO-2009081462-A1-20090702

File | Size (bytes)
JPOXMLDOC01-appb-D000003.tif | 3974
JPOXMLDOC01-appb-D000004.tif | 7764
JPOXMLDOC01-appb-D000005.tif | 6896
JPOXMLDOC01-appb-D000006.tif | 4642
JPOXMLDOC01-appb-D000007.tif | 4252
JPOXMLDOC01-appb-D000008.tif | 4270
JPOXMLDOC01-appb-D000009.tif | 8998
JPOXMLDOC01-appb-T000001.jpg | 92907
JPOXMLDOC01-appb-T000002.jpg | 65901

Attachment list for: US-7307149-B2-20071211

File | Size (bytes)
US07307149-20071211-C00001.CDX | 3431
US07307149-20071211-C00001.MOL | 681
US07307149-20071211-C00001.TIF | 3056
US07307149-20071211-D00001.TIF | 10470
US07307149-20071211-D00002.TIF | 21196
US07307149-20071211-D00003.TIF | 13830
US07307149-20071211-D00004.TIF | 14215
US07307149-20071211-D00005.TIF | 93857
US07307149-20071211-D00006.TIF | 12986
US07307149-20071211-D00007.TIF | 16709
US07307149-20071211-D00008.TIF | 4793
US07307149-20071211-D00009.TIF | 4896
US07307149-20071211-S00001.XML | 159831
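In the US listing above, files that belong to the same item share a prefix. For example, the C00001 TIF, CDX, and MOL files are the chemistry item referenced in the Chemistry Section. The following is a sketch that splits such names and groups them by item. The regular expression is inferred only from this one listing. Because naming rules differ by authority and change over time, it should not be assumed to hold for other years. The meaning of letters other than C is not documented here, so the code treats them as opaque:

```python
import re
from collections import defaultdict

# Pattern inferred only from the US-7307149-B2 listing above:
# <US + number>-<yyyymmdd>-<letter><5 digits>.<extension>
US_ATTACHMENT = re.compile(
    r"^(?P<doc>US\d+)-(?P<date>\d{8})-(?P<kind>[A-Z])(?P<seq>\d{5})\.(?P<ext>\w+)$"
)


def parse_us_attachment(name):
    """Split a USPTO-style attachment name into its parts, or return None."""
    m = US_ATTACHMENT.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    parts["seq"] = int(parts["seq"])
    parts["ext"] = parts["ext"].lower()
    return parts


def group_by_item(names):
    """Group extensions by (letter, sequence number), e.g. a TIF/CDX/MOL triple."""
    groups = defaultdict(list)
    for name in names:
        parts = parse_us_attachment(name)
        if parts is not None:
            groups[(parts["kind"], parts["seq"])].append(parts["ext"])
    return dict(groups)


listing = [
    "US07307149-20071211-C00001.CDX",
    "US07307149-20071211-C00001.MOL",
    "US07307149-20071211-C00001.TIF",
    "US07307149-20071211-D00001.TIF",
    "imgf0001.tif",  # EP-style name: does not match, so it is skipped
]
print(group_by_item(listing))
# {('C', 1): ['cdx', 'mol', 'tif'], ('D', 1): ['tif']}
```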

Attachment list for: JP-2005100000-A-20050414

File | Size (bytes)
00000001.TIF | 3759
2005100000.pdf | 84862
2005100000.pos | 1496
2005100000.tif | 81180

Attachment list for: KR-100920729-B1-20091007

File | Size (bytes)
1020070098852.pdf | 2406235
112007070719375-sdosl.app | 2032
112007081317753-pat00001.tif | 1950
112007081317753-pat00002.tif | 1950
112007081317753-pat00003.tif | 1950
112007081317753-pat00005.jpg | 204870
112007081317753-pat00006.jpg | 180538
112007081317753-pat00007.jpg | 279496
112007081317753-pat00008.jpg | 166127
112007081317753-pat00009.jpg | 170515
112007081317753-pat00010.jpg | 244507
112007081317753-pat00011.jpg | 156090
112007081317753-pat00012.jpg | 156553
112008080483320-pat00013.jpg | 37395
R1020070098852.jpg | 252603

Data Coverage

Note that the availability of attachments will differ depending on your subscription level. The attachment collection can be queried through our shared service API to get exact file counts for your level.

Country | Type | Format | Years
AP grants | Full document | multi-page PDF | 1985-2005
AT apps | Full document | multi-page PDF | 2005 to present
AT grants | Full document | multi-page PDF | 1990 to present
AT utility models | Full document | multi-page PDF | 1994 to present
AU apps and grants | Full document | multi-page PDF | 1990 to present
BE apps | Referenced images | TIFF | 1980 to present
BE apps | Full document | multi-page PDF | 1980 to present
BE grants | Full document | multi-page PDF | 2015 to present
BG apps, grants, and utility models | Full document | multi-page PDF | 1994 to present
BR apps | Full document | multi-page PDF | 2010 to present
BR grants | Full document | multi-page PDF | 2014 to present
BR utility models | Full document | multi-page PDF | 2009 to present
CA apps and grants | Full document | multi-page PDF | 2000 to present
CH apps and grants | Full document | multi-page PDF | 1980 to present
CN apps | Referenced images | TIFF | 2011 to present
CN apps | Full document | multi-page PDF | 1985 to present
CN grants | Referenced images | TIFF | 2011 to present
CN grants | Full document | multi-page PDF | 1990 to present
CN utility models | Referenced images | TIFF | 2011 to present
CN utility models | Full document | multi-page PDF | 1985 to present
CS grants | Full document | multi-page PDF | 1980-1993
CZ apps, grants, and utility models | Full document | multi-page PDF | 1993 to present
DD grants | Full document | multi-page PDF | 1980-2003
DE apps, grants, and utility models | Referenced images | TIFF | 2004 to present
DK apps and grants | Referenced images | TIFF | 1980 to present
DK apps and grants | Full document | multi-page PDF | 1980 to present
EA grants | Full document | multi-page PDF | 2000 to present
EP apps | Referenced images | TIFF | 1978 to present
EP apps | Full document | multi-page PDF | 1978 to present
EP grants | Referenced images | TIFF | 1980 to present
EP grants | Full document | multi-page PDF | 1980 to present
ES apps, grants, and utility models | Referenced images | TIFF | 2019 to present
ES apps, grants, and utility models | Full document | multi-page PDF | 2007 to present
FI apps and grants | Referenced images | TIFF | 1980 to present
FI apps and grants | Full document | multi-page PDF | 1980 to present
FR apps | Referenced images | TIFF | 1981 to present
FR apps | Full document | multi-page PDF | 1981 to present
GB apps | Full document | multi-page PDF | 1980 to present
HU apps and grants | Full document | multi-page PDF | 1980 to present
JP apps | Front-page drawings | TIFF | 1980 to present
JP apps, grants, and utility models | Referenced images | multi-page TIFF + POS file | 2004 to present
JP apps, grants, and utility models | Full document | multi-page PDF | 2004 to present
KR apps | Full document | multi-page PDF | 1983 to present
KR grants and utility models | Full document | multi-page PDF | 1979 to present
KR apps, grants, and utility models | Referenced images | TIFF, JPEG, APP | 2006 to present
KR apps, grants, and utility models | Full document | multi-page PDF | 2006 to present
LT grants | Full document | multi-page PDF | 1994 to present
LU apps | Referenced images | TIFF | 1980 to present
LU apps | Full document | multi-page PDF | 1980 to present
LV grants | Full document | multi-page PDF | 1994 to present
NL apps | Referenced images | TIFF | 1990 to present
NL apps | Full document | multi-page PDF | 1990 to present
NL grants | Referenced images | TIFF | 1997 to present
NL grants | Full document | multi-page PDF | 1997 to present
PT apps and OA grants | Full document | multi-page PDF | 1986 to present
RU apps | | | 1980-2007
PT apps and grants | Full document | multi-page PDF | 1986 to present
RO apps | Full document | multi-page PDF | 2011 to present
RO grants | Full document | multi-page PDF | 1993 to present
RU apps, grants, and utility models | Full document | multi-page PDF | 2005 to present
RU grants and utility models | Referenced images | TIFF and JPEG | 1994 to present
SI grants | Full document | multi-page PDF | 1992 to present
SK apps and grants | Full document | multi-page PDF | 1993 to present
SK utility models | Full document | multi-page PDF | 2008 to present
TW apps | Full document | multi-page PDF | 2003 to present
TW grants | Full document | multi-page PDF | 2000 to present
TW utility models | Full document | multi-page PDF | 2004 to present
US apps | Full document | multi-page PDF | 2001 to present
US grants | Full document | multi-page PDF | 1920 to present
US apps and grants | Referenced images | TIFF | 2001 to present
US apps and grants | Complex work units | CDX, MOL, NB, XML | 2001 to present
WO | Full document | multi-page PDF | 1978 to present
WO | Referenced images | TIFF, JPEG | 1978 to present
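The Years column uses two forms, "1980 to present" and "1985-2005". The following is a sketch of turning them into numeric ranges so that a publication year can be checked against a row. It resolves "present" to the current calendar year unless a year is passed in, which is an assumption. It checks only the year span and says nothing about what your subscription level includes, which must still be queried through the API mentioned above:

```python
import re
from datetime import date
from typing import Optional, Tuple


def parse_years(text: str, today: Optional[int] = None) -> Tuple[int, int]:
    """Turn '1980 to present' or '1985-2005' into an inclusive (start, end) pair."""
    # Assumption: "present" means the current calendar year unless `today` is given.
    open_end = today if today is not None else date.today().year
    m = re.fullmatch(r"\s*(\d{4})\s+to\s+present\s*", text)
    if m:
        return int(m.group(1)), open_end
    m = re.fullmatch(r"\s*(\d{4})\s*-\s*(\d{4})\s*", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    raise ValueError(f"unrecognized year range: {text!r}")


def covers(years: str, year: int, today: Optional[int] = None) -> bool:
    """True if `year` falls inside the range given in the Years column."""
    start, end = parse_years(years, today)
    return start <= year <= end


# A few rows copied from the table above: (country, type, years).
ROWS = [
    ("AP grants", "Full document", "1985-2005"),
    ("JP apps, grants, and utility models", "Referenced images", "2004 to present"),
    ("US grants", "Full document", "1920 to present"),
]

for country, doc_type, years in ROWS:
    print(country, doc_type, covers(years, 2003, today=2024))
# AP grants Full document True
# JP apps, grants, and utility models Referenced images False
# US grants Full document True
```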