EP full-text data for text analytics


A bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports of EP publications, designed to facilitate natural language processing work.

Download a sample file

This product is specifically tailored to the needs of users who process the natural language content of patent publications. It is best used in combination with PATSTAT bulk data sets.

With EP full-text data for text analytics you can:

work with highly structured texts of EP publications
flexibly access text components like claims, abstracts, descriptions
freely reuse the data on the basis of an open data licence
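
Because the text components are XML-tagged, a common first step in natural language processing work is to reduce a field to plain text. The following is a minimal sketch, assuming a field holds a well-formed XML fragment. The sample string and its tag names (`claim`, `claim-text`, `b`) are illustrative only and are not taken from the data set documentation.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(fragment: str) -> str:
    """Return the text content of an XML fragment with all tags removed."""
    # Wrap in a dummy root so fragments with several top-level elements parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # itertext() yields every text node in document order. Joining with a
    # space keeps adjacent paragraphs apart, at the cost of also splitting
    # text around inline tags (e.g. subscripts).
    return " ".join(" ".join(root.itertext()).split())


# Illustrative fragment; real tag names may differ.
sample = (
    '<claim id="c1"><claim-text>A device comprising '
    "<b>a sensor</b> and a housing.</claim-text></claim>"
)
print(xml_to_plain_text(sample))
```

Fragments that are not well-formed XML will raise `xml.etree.ElementTree.ParseError` and would need separate handling.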

Getting started

Check the sample data and the other information available on this web page.
For information on accessing the data set, see the user guide below.
Start downloading the data from Google Cloud Platform

User guide

Conditions

Coverage	EP-A documents from 1978 onwards; EP-B documents from 1980 onwards. European patent applications (EP-A documents) filed via the PCT route and published by WIPO in one of the three official languages of the EPO are not republished by the EPO. For these cases, only the titles are available.
Format	The data set consists of about 35 files in a simple CSV structure. Each file contains the publications associated with 100 000 publication numbers.
Volume	The total size of the data set is about 210 GB (unzipped).
Delivery	The data set is available for download from Google Cloud Platform. For detailed instructions, see the user guide, which can be downloaded from the tab “Getting Started”. Download fees charged by Google are payable by the user. The data set is updated annually.
Price	The data set is free of charge. See above for download or delivery charges.
License	The EPO licenses "EP full-text data for text analytics" under the Creative Commons Attribution 4.0 International Public License.
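
By the figures above, about 210 GB spread over about 35 files comes to roughly 6 GB per file on average, so it is usually safer to stream a file row by row than to load it into memory. The sketch below counts rows per value of one column using Python's standard `csv` module. The column names in the demo (`publication_number`, `text_type`, `text`) and the comma delimiter are assumptions for illustration; check the user guide and the sample file for the actual header and separator.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str, delimiter: str = ",") -> Counter:
    """Stream a delimited file with a header row and count rows per value of `column`."""
    # Long text fields such as descriptions can exceed the csv module's
    # default field size limit of 131072 characters, so raise it first.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(handle, delimiter=delimiter)
    counts: Counter = Counter()
    for row in reader:  # only one row is held in memory at a time
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in for a real file; the header is hypothetical.
demo = (
    "publication_number,text_type,text\n"
    "1000000,TITLE,Example title\n"
    "1000000,ABSTR,Example abstract\n"
    "1000001,TITLE,Another title\n"
)
counts = count_by_column(io.StringIO(demo), "text_type")
print(counts)
```

For a downloaded file, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is likewise an assumption to verify against the sample file.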

Related products

EP full-text data

European Publication Server

Support

Talk to EPO experts or get help from other users

Visit the discussion forum

To choose the best-suited data set or web service for your needs, email patentdata@epo.org and request an online consultation via Microsoft Teams, Zoom or Skype.
