Which bulk data sets best suit your needs? (XLS, 23 KB)
A bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports of EP publications, designed to facilitate natural language processing work.
This product is specifically tailored to the needs of users who process the natural language content of patent publications. It is best used in combination with PATSTAT bulk data sets.
|Format||The data set consists of about 35 files in a simple CSV structure. Each file contains the publications associated with 100 000 publication numbers.|
|Volume||The total size of the data set is about 210 GB (unzipped).|
|Delivery||There are two options:
The data set is free of charge.
See above for download or delivery charges.
|License||The EPO grants usage of "EP full-text data for text analytics" under the "Creative Commons Attribution 4.0 International Public License".|
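Because each file is large (about 210 GB spread over roughly 35 files, so around 6 GB per file on average) and the text fields carry XML tags, it is usually more practical to stream a file row by row than to load it whole. The following is a minimal sketch of that approach, not an official reader. The column names `publication_number` and `text`, the comma delimiter and the sample row are assumptions made for illustration; check the header and delimiter of the delivered files and adjust them.

```python
import csv
import io
import re
from typing import Iterator, TextIO, Tuple

# Description fields can be far longer than the csv module's default
# field limit (131072 characters), so raise the limit before reading.
csv.field_size_limit(2**31 - 1)

# Simplistic tag matcher: good enough for a sketch, not a full XML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(xml_text: str) -> str:
    """Replace XML tags with spaces and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", xml_text).split())


def iter_texts(
    handle: TextIO,
    id_column: str = "publication_number",  # assumed column name
    text_column: str = "text",              # assumed column name
    delimiter: str = ",",                   # assumed delimiter
) -> Iterator[Tuple[str, str]]:
    """Yield (publication id, plain text) pairs one row at a time."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        yield row[id_column], strip_tags(row[text_column])


# Hypothetical two-line sample standing in for one of the real files.
sample = io.StringIO(
    "publication_number,text\n"
    'EP0000001,"<p>Heat <b>pump</b></p>"\n'
)
for pub_id, text in iter_texts(sample):
    print(pub_id, text)
```

For a real file, replace the in-memory sample with `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption. The regular expression only removes tags and does not decode XML entities, so a proper XML parser is the safer choice when the markup structure matters.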