EP full-text data for text analytics
A bulk data set consisting of XML-tagged titles, abstracts, descriptions, claims and search reports of EP publications, designed to facilitate natural language processing work.
This product is specifically tailored to the needs of users who process the natural language content of patent publications. It is best used in combination with PATSTAT bulk data sets.
With EP full-text data for text analytics you can:
- work with highly structured texts of EP publications
- flexibly access text components such as claims, abstracts and descriptions
- freely reuse the data on the basis of an open data licence
Getting started
- Check the sample data and other available information on this web page.
- For information on accessing the data set, see the user guide below.
- Start downloading the data from Google Cloud Platform.
Conditions
Coverage:
- EP-A documents from 1978 onwards
- EP-B documents from 1980 onwards
Format: The data set consists of about 35 files in a simple CSV structure. Each file contains the publications associated with 100 000 publication numbers.
Volume: The total size of the data set is about 210 GB (unzipped).
Delivery: The data set is available for download from Google Cloud Platform. For detailed instructions, see the user guide, downloadable from the "Getting started" tab. Download fees charged by Google are payable by the user.
Price: The data set is free of charge. See above for download or delivery charges.
Licence: The EPO grants usage of "EP full-text data for text analytics" under the "Creative Commons Attribution 4.0 International Public License".
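As an illustration of how XML-tagged text components stored in CSV files might be processed, the following minimal Python sketch reads rows, selects one component type and strips the XML tags. The column names (`publication_number`, `text_type`, `text`), the component codes (`ABSTR`, `CLAIM`), the comma delimiter and the sample rows are all hypothetical and chosen for illustration only; the actual file layout is described in the user guide.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Descriptions can be long; raise the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def xml_to_plain_text(fragment):
    """Return the text content of an XML-tagged fragment with tags removed."""
    # Wrap in a dummy root so fragments with several top-level elements parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # Join all text nodes and normalise whitespace.
    return " ".join("".join(root.itertext()).split())


def iter_components(handle, wanted_type, delimiter=","):
    """Yield (publication_number, plain_text) for rows of one component type.

    The column names used here are assumptions, not the documented schema.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        if row["text_type"] == wanted_type:
            yield row["publication_number"], xml_to_plain_text(row["text"])


# Hypothetical two-row sample standing in for one of the data files.
SAMPLE = (
    "publication_number,text_type,text\n"
    'EP0000001,ABSTR,"<abstract><p>A heat <b>pump</b> system.</p></abstract>"\n'
    'EP0000001,CLAIM,"<claims><claim><claim-text>1. A heat pump.</claim-text></claim></claims>"\n'
)

claims = list(iter_components(io.StringIO(SAMPLE), "CLAIM"))
print(claims)
```

Because rows are read one at a time, the same loop can be pointed at an opened data file instead of the in-memory sample without loading a whole file into memory. Real patent text may contain entities or markup that a strict XML parser rejects, so a production pipeline would likely need more tolerant parsing.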