
EP full-text data for text analytics


A bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports of EP publications, designed to facilitate natural language processing work.


This product is specifically tailored to the needs of users who process the natural language content of patent publications. It is best used in combination with PATSTAT bulk data sets.
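Because the text fields carry XML tags, a common first step in natural language processing is to strip the markup. The Python sketch below does this with the standard library's `xml.etree.ElementTree`. It assumes that each text field is a well-formed XML fragment, possibly made up of several sibling elements. The function name `xml_fragment_to_text` and the sample markup are hypothetical. Named HTML entities such as `&nbsp;` are not defined in plain XML and would make parsing fail, so check the User Guide for how the fields are actually encoded.

```python
import xml.etree.ElementTree as ET


def xml_fragment_to_text(fragment: str) -> str:
    """Return the plain text of an XML-tagged text field (hypothetical helper).

    The fragment is wrapped in a dummy root element because a field may
    contain several sibling elements (for example one element per paragraph).
    """
    root = ET.fromstring(f"<root>{fragment}</root>")
    parts = []
    if root.text and root.text.strip():
        parts.append(root.text)  # text before the first element, if any
    for child in root:
        # Join inline markup (e.g. <sub>, <b>) without inserting spaces...
        parts.append("".join(child.itertext()))
        if child.tail and child.tail.strip():
            parts.append(child.tail)  # text between top-level elements
    # ...but separate top-level elements, then normalise whitespace.
    return " ".join(" ".join(parts).split())


if __name__ == "__main__":
    # Hypothetical sample resembling a two-paragraph description field.
    sample = '<p num="0001">Water (H<sub>2</sub>O) is used.</p><p num="0002">The shaft is hollow.</p>'
    print(xml_fragment_to_text(sample))  # Water (H2O) is used. The shaft is hollow.
```

Inline markup is joined without spaces, so "H<sub>2</sub>O" becomes "H2O", while top-level elements such as paragraphs are separated by a single space.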


With EP full-text data for text analytics you can:

  • work with highly structured texts of EP publications
  • flexibly access text components like claims, abstracts, descriptions
  • freely reuse the data on the basis of an open data licence
To get started:
  1. Check the sample data and other available information on this web page.
  2. For information on accessing the data set, see the User Guide, downloadable from the tab “Getting Started”.
  3. Start downloading the data from Google Cloud Platform.
Content
  • EP-A documents from 1978 onwards
  • EP-B documents from 1980 onwards
European patent applications (EP-A documents) filed via the PCT route and published by WIPO in one of the three official languages of the EPO are not republished by the EPO. For these cases, only the titles are available.
Formats
The data set consists of about 35 files in a simple CSV structure. Each file contains the publications associated with 100 000 publication numbers.
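The Python sketch below illustrates two points from the description above: finding the block of 100 000 publication numbers that a given publication falls into, and grouping the rows of one file by publication. The column order (`FIELDS`), the text-type labels (`TITLE`, `CLAIM`), the comma delimiter and all function names are assumptions made for illustration only. The real column layout and file naming are documented in the User Guide. The hypothetical `title_only` helper lists publications for which only a title is present, such as the PCT-route EP-A documents mentioned above.

```python
import csv
import io
from collections import defaultdict

# Each file covers 100 000 publication numbers (per the product description).
BLOCK_SIZE = 100_000

# Hypothetical column order; check the User Guide for the real layout.
FIELDS = ["pub_number", "kind", "language", "text_type", "text"]


def block_for(publication_number: int) -> tuple[int, int]:
    """Return the first and last publication number of the containing block.

    Assumes blocks are contiguous and start at multiples of BLOCK_SIZE.
    """
    start = (publication_number // BLOCK_SIZE) * BLOCK_SIZE
    return start, start + BLOCK_SIZE - 1


def group_by_publication(stream, delimiter=","):
    """Map (pub_number, kind) -> {(text_type, language): text}."""
    docs = defaultdict(dict)
    reader = csv.DictReader(stream, fieldnames=FIELDS, delimiter=delimiter)
    for row in reader:
        key = (row["pub_number"], row["kind"])
        docs[key][(row["text_type"], row["language"])] = row["text"]
    return dict(docs)


def title_only(docs):
    """List publications whose only text component is a title."""
    return sorted(
        key for key, parts in docs.items()
        if {text_type for text_type, _ in parts} == {"TITLE"}
    )


if __name__ == "__main__":
    # Tiny hypothetical sample; quoting protects commas inside the text field.
    sample = (
        "1000000,B1,en,TITLE,Hollow shaft\n"
        '1000000,B1,en,CLAIM,"<claim-text>A shaft, which is hollow.</claim-text>"\n'
        "1000001,A1,en,TITLE,Widget\n"
    )
    docs = group_by_publication(io.StringIO(sample))
    print(block_for(1234567))  # (1200000, 1299999)
    print(title_only(docs))    # [('1000001', 'A1')]
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample. Long description fields may exceed the `csv` module's default field size limit, which `csv.field_size_limit` can raise.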
Volume
The total size of the data set is about 210 GB (unzipped).
Delivery
There are two options:

  • The data set is available for download from Google Cloud Platform.
    For detailed instructions see the User Guide, downloadable from the tab “Getting Started”.
    Download fees charged by Google are payable by the user.
  • The data can also be delivered on HDD as part of “EP full-text data” (EUR 800).
The data set is updated annually.
Price

The data set is free of charge.

See Delivery above for download and delivery charges.

License
The EPO grants usage of "EP full-text data for text analytics" under the "Creative Commons Attribution 4.0 International Public License".
