DOCDB - putting a square peg in a round hole

Part I – exotic kind codes and number suffixes, an overview

To identify the state of the art in a technical field, our users need quick and easy access to as much patent information as possible, as soon as it is available. The EPO's DOCDB is one of the largest bibliographic databases in the world. It contains information on applications and granted patents, on utility models and industrial designs, as well as on other kinds of industrial property rights. With almost 140 million documents from over 90 countries worldwide, dating as far back as the early 1800s, DOCDB coverage continues to grow.

The biggest challenge facing EPO data experts is dealing with all the different types of documents and data formats - from old EPO documents to those received from IP offices - especially when they are incomplete or do not meet DOCDB requirements. There are many factors to consider: the age of the document, what documents offices are able and allowed to supply, office-specific codes and formats, the different IP systems and the frequent revisions to national patent laws.  

To integrate as much data as possible and reduce gaps in the bibliographic data, our experts had to find a way to put "a square peg in a round hole". Using INID codes as inspiration, they came up with the idea of applying the same principle to so-called non-publications (such as application and priority records). As these documents were not created with document kind codes, they were not publicly available in the bibliographic data. Assigning internal dummy codes and suffixes to these documents meant that they could be uploaded immediately, and this helped to fill in data gaps, temporarily and sometimes permanently.   

This solution works well for most data records, but there are some that still require manual intervention. Our experts must then carefully inspect the data to decide what can be used or must be discarded, or whether the data needs to be redelivered.   

Dummy internal codes and suffixes, or "exotic" kind codes, are rare, but you might still come across them in your daily work (and perhaps mistake them for errors in the data). The overview below will help you identify them. You can also look them up in the kind code concordance list.

You will find detailed descriptions of these kind codes in part II of this article in one of the next editions of Patent Knowledge News.   

Dummy internal kind codes (overview)  

  • D and Q identify incomplete applications or applications in a non-compliant format.  
  • K, L, M, N and O are used for applications from certain jurisdictions.  
  • E, F and M mark French applications between 1900 and the 1960s.  

Extract from GPI displaying a US application with the internal kind code D.
Figure 1: example of kind code "D" applied to an application number.   
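
The overview above can be read as a small lookup table. The following Python sketch is for illustration only: the names `DUMMY_KIND_CODE_GROUPS` and `describe_dummy_kind_code` are hypothetical, and the descriptions come solely from the bullets above. It does not reproduce DOCDB's actual rules, which also depend on the jurisdiction and the date of the record. The same letter may mean something else as a regular publication kind code, so the kind code concordance list remains the reference.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical grouping, copied from the overview bullets (not an official table).
DUMMY_KIND_CODE_GROUPS = [
    ("DQ", "incomplete application or application in a non-compliant format"),
    ("KLMNO", "application from certain jurisdictions"),
    ("EFM", "French application between 1900 and the 1960s"),
]

# Map each letter to every description it appears under.
# "M" is listed in two groups, so it ends up with two descriptions.
DUMMY_KIND_CODES: Dict[str, List[str]] = defaultdict(list)
for letters, meaning in DUMMY_KIND_CODE_GROUPS:
    for letter in letters:
        DUMMY_KIND_CODES[letter].append(meaning)


def describe_dummy_kind_code(kind_code: str) -> List[str]:
    """Return the overview descriptions for a dummy kind code, or [] if none."""
    return list(DUMMY_KIND_CODES.get(kind_code.strip().upper(), []))


if __name__ == "__main__":
    for code in ("D", "M", "A"):
        print(code, describe_dummy_kind_code(code))
```

Because "M" appears in two of the bullets, the lookup returns both descriptions rather than picking one. Deciding which applies would need information that the overview does not give.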

Priority and application number suffixes (overview)  

  • D (provisional application) and T (priority) are applied provisionally until the complete dataset is received.  
  • T is used to create dummy priorities for very early publications to group them in technical families.  
  • X is used for priorities claimed in publications from the early 1900s.   

An extract from GPI: the suffix T is applied provisionally until the complete dataset is received.
Figure 2: Example of suffix "T" applied to a priority number.  
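
As a rough illustration of how the suffixes above might be recognised in data, the sketch below separates a trailing suffix letter from a number and looks up its meaning. Everything about the number layout is an assumption made for this example: it supposes the suffix is a single letter D, T or X placed directly after the last digit, which may not match how DOCDB or GPI actually store or display these numbers. The names `NUMBER_SUFFIXES` and `split_suffix` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Descriptions taken from the overview bullets; "T" has two listed uses.
NUMBER_SUFFIXES = {
    "D": ["provisional application, applied until the complete dataset is received"],
    "T": [
        "priority, applied provisionally until the complete dataset is received",
        "dummy priority grouping very early publications in technical families",
    ],
    "X": ["priority claimed in publications from the early 1900s"],
}

# ASSUMPTION: the suffix is one letter immediately following the last digit.
_SUFFIX_PATTERN = re.compile(r"(.*\d)([DTX])")


def split_suffix(number: str) -> Tuple[str, Optional[str]]:
    """Split a number into (number without suffix, suffix or None)."""
    cleaned = number.strip().upper()
    match = _SUFFIX_PATTERN.fullmatch(cleaned)
    if match is None:
        return cleaned, None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Made-up sample numbers, not real DOCDB records.
    for sample in ("123456T", "123456"):
        base, suffix = split_suffix(sample)
        print(base, suffix, NUMBER_SUFFIXES.get(suffix, []))
```

The sample numbers are invented. A real check would first need to confirm the actual number format in the data being inspected.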

In the second and final part of this article, we'll be taking a closer look at these exotic codes and number suffixes, including how they're implemented and how they came about.

Sign up to the EPO Newsletter and don't miss this or any other article.  

Keywords: DOCDB, document kind codes, number suffixes  
