EP full-text data for text analytics

A bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports of EP publications, designed to facilitate natural language processing work.

This product is specifically tailored to the needs of users who process the natural language content of patent publications. It is best used in combination with PATSTAT bulk data sets.

With EP full-text data for text analytics, you can:

  • work with highly structured texts of EP publications
  • flexibly access text components like claims, abstracts, descriptions
  • freely reuse the data on the basis of an open data licence
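Because the texts are XML-tagged, a common first step in natural language processing work is to reduce a text field to plain text. The following is a minimal sketch of that step, not an official EPO tool. The element names in the sample (`claim`, `claim-text`, `b`) are illustrative assumptions; consult the user guide and the sample data for the tags actually used.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(fragment: str) -> str:
    """Return the text content of an XML-tagged fragment with all tags removed."""
    # Wrap the fragment in a synthetic root element so that fragments with
    # several top-level elements (e.g. a sequence of paragraphs) still parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # itertext() yields every piece of text in document order.
    pieces = root.itertext()
    # Join with spaces, then collapse runs of whitespace into single spaces.
    return " ".join(" ".join(pieces).split())


# Hypothetical sample; real records may use different tags and attributes.
sample = (
    '<claim id="c1" num="0001"><claim-text>A device comprising '
    "<b>a sensor</b> and a housing.</claim-text></claim>"
)
print(xml_to_plain_text(sample))
```

Joining the pieces with a space keeps neighbouring paragraphs apart, at the cost of also splitting inline markup such as subscripts (`H<sub>2</sub>O` becomes `H 2 O`). If that matters for your analysis, handle such inline elements separately. Text containing entities that the XML parser does not know would also need pre-processing before this function is applied.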
Getting started
  1. Check the sample data and other available information on this web page.
  2. For information on accessing the data set, see the user guide below.
  3. Start downloading the data from Google Cloud Platform.

User guide

Conditions
 Coverage  
  • EP-A documents from 1978 onwards
  • EP-B documents from 1980 onwards
European patent applications (EP-A documents) filed via the PCT route and published by WIPO in one of the three official languages of the EPO are not republished by the EPO. For these cases, only the titles are available.
 Format
The data set consists of about 35 files in a simple CSV structure. Each file contains the publications associated with 100 000 publication numbers.
 Volume
The total size of the data set is about 210 GB (unzipped).
 Delivery
There are two options:
  • The data set is available for download from Google Cloud Platform.
    For detailed instructions see the User Guide, downloadable from the tab “Getting Started”.
    Download fees charged by Google are payable by the user.
The data set is updated annually.
 Price

The data set is free of charge.

See above for download or delivery charges.

 License
The EPO grants usage of "EP full-text data for text analytics" under the Creative Commons Attribution 4.0 International Public License.
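With about 210 GB spread over about 35 files, a single file averages roughly 6 GB, so reading the data row by row is usually more practical than loading a whole file into memory. The sketch below streams one file and keeps only one text component. The column names (`publication_number`, `text_type`, `text`), the component label `CLAIM` and the comma delimiter are assumptions made for illustration; check the user guide and the sample data for the actual layout.

```python
import csv
import io
from typing import Iterator, TextIO

# Descriptions can be very long, so raise the csv module's default
# per-field size limit (131072 characters) to avoid parsing errors.
csv.field_size_limit(2**31 - 1)


def iter_component(
    handle: TextIO, component: str, delimiter: str = ","
) -> Iterator[dict]:
    """Yield rows whose text component matches, one row at a time.

    ASSUMPTION: the file has a header row containing a 'text_type' column.
    This column name is hypothetical and not taken from the EPO documentation.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        if row["text_type"] == component:
            yield row


# Small in-memory stand-in for a real file. For a downloaded file, use
# open(path, newline="", encoding="utf-8") instead of io.StringIO.
demo = io.StringIO(
    "publication_number,text_type,text\n"
    "0000001,TITLE,Example title\n"
    "0000001,CLAIM,Example claim text\n"
    "0000002,CLAIM,Another claim\n"
)
claims = list(iter_component(demo, "CLAIM"))
print(len(claims), "claim rows found")
```

Because `iter_component` is a generator, memory use stays close to the size of a single row regardless of file size. The `publication_number` field could then serve as a join key when combining the texts with PATSTAT bulk data, provided the two data sets format the number in the same way.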
Related products

EP full-text data

European Publication Server

Talk to EPO experts or get help from other users

Visit the discussion forum

To choose the best-suited data set or web service for your needs, email patentdata@epo.org and request an online consultation via Microsoft Teams, Zoom or Skype.