"Dictionaries" - a new raw data product


Newsflash 5/2010

The EPO, in close cooperation with the EPC member states, has developed electronic machine translation dictionaries. Under this project, the EPO has so far completed the following dictionaries:

  • German - English; English - German
  • Spanish - English; English - Spanish
  • English - French; French - English
  • English - Italian; Italian - English
  • English - Portuguese; Portuguese - English

Future updates will be announced.

The electronic machine translation dictionaries are based on the Open Lexicon Interchange Format (OLIF).

The dictionaries are used by translation engines, and the product has been custom-made for patent documents: IPC classification symbols identify the technical context, and the subfolders are organised by IPC subclass.
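The organisation by IPC subclass can be illustrated with a minimal sketch. An IPC symbol such as "H04L 29/06" begins with a section letter (A to H), a two-digit class and a subclass letter, so the subclass is "H04L". The function names and the sample entries below are hypothetical and are not taken from the product itself; the pairing of terms with IPC symbols is for illustration only.

```python
import re
from collections import defaultdict

# An IPC symbol starts with a section letter (A-H), a two-digit class
# and a subclass letter, e.g. "H04L 29/06" -> subclass "H04L".
_IPC_SUBCLASS = re.compile(r"^\s*([A-H]\d{2}[A-Z])")


def ipc_subclass(symbol: str) -> str:
    """Return the four-character IPC subclass of an IPC symbol."""
    match = _IPC_SUBCLASS.match(symbol.upper())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    return match.group(1)


def group_by_subclass(entries):
    """Group (term, ipc_symbol) pairs by IPC subclass."""
    groups = defaultdict(list)
    for term, symbol in entries:
        groups[ipc_subclass(symbol)].append(term)
    return dict(groups)


# Hypothetical sample entries, for illustration only.
SAMPLE = [
    ("Schifffahrt", "B63B 35/00"),
    ("Thunfisch", "A23L 17/00"),
    ("Katarrh", "A61K 31/00"),
    ("Schiff", "B63B 1/00"),
]
GROUPS = group_by_subclass(SAMPLE)
```

Each key of `GROUPS` would correspond to one subfolder under this assumed layout; the actual folder structure of the product may differ.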

The dictionaries were created, and the translation engines trained, on the large collections of European patents that have been translated into the languages of the national offices. Each of these corpora contains between 40,000 and 600,000 documents, depending on how many translated documents exist.

Below are some examples, extracted from the new OLIF structure and content manual, relating to different spellings of German words.

Attribute  Description                                                  Example
german-1   Match vowels to stem                                         Schänke/Schenke
german-2   "selbstständig" instead of "selbständig"                     unselbstständig/unselbständig
german-3   German spelling of non-German words                          Soße/Sauce
german-4   Write "f" instead of "ph"                                    Fantasie/Phantasie
german-5   Write "r" instead of "rh"                                    Katarr/Katarrh
german-6   Write "t" instead of "th"                                    Tunfisch/Thunfisch
german-7   Write "zi" instead of "ti"                                   differenziell/differentiell
german-8   Plural "ices" instead of "izes"                              Indices/Indizes
german-9   New spelling of non-German words                             Campagne/Kampagne
german-10  Repeat three letters without a hyphen                        Schifffahrt/Schiff-Fahrt
german-11  Write preposition and "weak" noun as two words               im Stande/imstande
german-12  Write "nicht" in compound adjectives as a separate word      nicht öffentlich/nichtöffentlich
german-13  Write "rein" in compound adjectives as a separate word       rein seiden/reinseiden
german-14  Write "wohl" in compound adjectives as a separate word       wohl tuend/wohltuend
german-15  Write non-German words with multiple parts as a single word  Bluejeans/Blue Jeans
german-16  Write non-German words with multiple parts with a hyphen     Fall-out/Fallout
un         unspecified
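
As a rough illustration of how such spelling attributes might be used, the sketch below stores a few of the example pairs from the table and looks up the alternative spelling of a word. The attribute codes and word pairs come from the table; the function names and the lookup logic are assumptions and do not reflect the OLIF manual or the EPO's tooling. The sketch does not decide which spelling of a pair is preferred.

```python
# Example pairs copied from the table; the attribute codes are the table's own.
SPELLING_EXAMPLES = {
    "german-1": ("Schänke", "Schenke"),
    "german-3": ("Soße", "Sauce"),
    "german-4": ("Fantasie", "Phantasie"),
    "german-5": ("Katarr", "Katarrh"),
    "german-6": ("Tunfisch", "Thunfisch"),
    "german-7": ("differenziell", "differentiell"),
    "german-10": ("Schifffahrt", "Schiff-Fahrt"),
    "german-11": ("im Stande", "imstande"),
}


def build_variant_index(examples):
    """Map each spelling (case-folded) to (attribute, other spelling)."""
    index = {}
    for attribute, (first, second) in examples.items():
        index[first.casefold()] = (attribute, second)
        index[second.casefold()] = (attribute, first)
    return index


VARIANT_INDEX = build_variant_index(SPELLING_EXAMPLES)


def lookup_variant(word):
    """Return (attribute, other spelling), or None if the word is not indexed."""
    return VARIANT_INDEX.get(word.casefold())
```

For example, `lookup_variant("Thunfisch")` yields the attribute `german-6` together with the spelling "Tunfisch", while a word that is absent from the index yields `None`.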

The European Patent Office Machine Translation Program (EMTP) uses a hybrid approach. In addition to the statistical method, which makes use of the classifications allocated to each document, linguistic rules are applied, for example a rule for distinguishing the various parts of speech (nouns, adjectives and verbs).
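
The idea of combining a linguistic rule with classification-based statistics can be sketched as follows. This is a toy illustration under stated assumptions, not the EMTP implementation: the counts, the `COUNTS` table and the `translate` function are all hypothetical. The part of speech acts as the rule-based filter, and the IPC subclass selects which frequency counts decide the translation.

```python
from collections import Counter

# Hypothetical counts, for illustration only:
# (source term, part of speech, IPC subclass) -> translation frequencies.
COUNTS = {
    ("Leiter", "noun", "H01B"): Counter({"conductor": 9, "ladder": 1}),
    ("Leiter", "noun", "E06C"): Counter({"ladder": 8, "conductor": 1}),
}


def translate(term, part_of_speech, subclass):
    """Pick the most frequent translation for a term in a given context.

    Rule step: only entries with the given part of speech are considered.
    Statistical step: within the IPC subclass, the most frequent
    translation wins. Returns None if no counts are available.
    """
    counts = COUNTS.get((term, part_of_speech, subclass))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

With these made-up counts, the German noun "Leiter" comes out as "conductor" in subclass H01B (cables, conductors) and as "ladder" in subclass E06C (ladders).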
