EPO user consultation on the Recommended WIPO Standard ST.26 for the Presentation of Nucleotide and Amino Acid Sequence Listings Using XML (eXtensible Markup Language)

I. Summary

Patent applications that disclose nucleotide and/or amino acid sequences must present the sequence information in a separate part of the disclosure and in a specified manner (currently defined by WIPO Standard ST.25). A revised standard for the presentation of nucleotide and amino acid sequence listings (WIPO Standard ST.26) was adopted in March 2016 by the Committee on WIPO Standards (CWS); its implementation is foreseen for next year (2017). Since the adoption of WIPO Standard ST.26, further work has been carried out to improve its effectiveness once implemented. The European Patent Office (EPO) invites users to provide their feedback on the documents published together with this announcement.

II. Background information

Applicants are currently required to submit biological sequence information in a standardised electronic format in accordance with WIPO Standard ST.25, both within the framework of the Patent Cooperation Treaty (PCT) (Annex C of the Administrative Instructions) and under the EPC. WIPO Standard ST.25 was implemented at the EPO as of 1 January 1999.

WIPO Standard ST.25, which was adopted in 1998 and has not been revised since that time, requires a flat file structure of numeric identifiers using a limited set of character codes. In October 2010, the CWS established a Task Force, designating the EPO as the Task Force Leader, to draft a revised standard (WIPO Standard ST.26) for the filing of nucleotide and/or amino acid sequence listings in XML format. A first user consultation was conducted on a draft version of WIPO Standard ST.26 in 2012, following which the draft was further amended by the Task Force in response to the comments received. WIPO Standard ST.26 was formally adopted by the CWS in March 2016; its implementation was, however, postponed until the recommendations for the transition from WIPO Standard ST.25 to WIPO Standard ST.26 have been agreed on by the CWS at its next session, to be held in 2017. Meanwhile, WIPO Standard ST.25 should continue to be used.

WIPO Standard ST.26, as adopted in March 2016, is composed of six documents: the main body of the standard; a first annex setting forth the controlled vocabulary for use with the sequence part of the standard; a second annex setting forth the Document Type Definition (DTD) for the standard; a third annex containing a sequence listing specimen; a fourth annex setting forth the character subset from the Unicode Basic Latin Code Table; and a fifth annex setting forth additional data exchange requirements for patent offices.

Since the adoption of the new Standard, the main body, the controlled vocabulary, and the DTD have been further amended and updated. In addition, a guidance document has been included as Annex VI.

III. Request for comments

The EPO invites users to provide their feedback on the documents published together with this announcement, as revised subsequent to the adoption of the Standard in March 2016.

Users' comments would be particularly appreciated on the following:

a) WIPO Standard ST.26 Main Body

Since the first consultation, the main body of WIPO Standard ST.26 has been revised, inter alia, to define a "nucleotide" to include nucleotide analogues and to provide further guidance on the representation of nucleotide analogue sequences and of variant sequences that have been disclosed as a single sequence with enumerated alternative variant residues at one or more positions. The EPO invites users to comment on the revised text of the main body and, in particular, welcomes any suggestions to add details or clarify the language as appropriate.

b) Guidance Document

One goal of the development of a WIPO Standard for sequence listings is to allow patent applicants to draw up a single sequence listing in a patent application that would be acceptable for the purposes of both international and national or regional prosecution worldwide. Any new standard should represent the maximum requirements for any sequence listing submission.

The purpose of the guidance document is to facilitate applicants' and patent offices' understanding of the requirements for inclusion and representation of sequence disclosures.

The guidance document is composed of an introduction, examples, and a sequence listing in XML demonstrating representation of the exemplified sequences. The introduction defines the terminology used in the document and discusses the questions raised for each example, namely, whether inclusion is required for a particular disclosed sequence, whether inclusion of the sequence is permitted when it is not required, and the appropriate means of representing sequences included in a sequence listing. The examples were chosen to illustrate various paragraphs of the main body and include 22 involving nucleotide sequences and 19 involving amino acid sequences. It is envisioned that the guidance document would be updated as necessary to include further examples to keep pace with technological advances.

Users are invited to comment on whether the guidance document is sufficiently comprehensive and clear, and in particular to provide suggestions to add details or further examples as appropriate.

c) Authoring and Validation Tool

Availability of an authoring tool before the implementation of WIPO Standard ST.26 is key to a successful transition from WIPO Standard ST.25. As envisioned, the authoring tool should be capable of taking in a sequence listing in WIPO Standard ST.25 format and, with additional input from the applicant, creating a sequence listing in WIPO Standard ST.26 format. Unfortunately, direct conversion from one standard to the other is not possible, owing to numerous differences between the two standards, including, inter alia, the types of required sequences, the representation and annotation of the sequences, and the sequence data structure.

The authoring tool should also prompt entry of all required data, prevent entry of sequences having fewer than ten specifically defined nucleotides or fewer than four specifically defined amino acids, inform as to the possibility of optional annotations, and allow use of only acceptable values or formats where applicable, thereby enhancing submission quality. A sequence listing in WIPO Standard ST.26 XML format is not as easily human-readable as its ST.25 counterpart; therefore, the tool should also provide a means for easily viewing both the in-progress and completed sequence listing.
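By way of illustration only, the minimum-length check described above could be sketched as follows. The sketch assumes that a "specifically defined" residue is any nucleotide other than the symbol "n" and any amino acid other than the symbol "X"; the function names and the molecule type labels ("AA" for amino acid sequences, anything else for nucleotide sequences) are hypothetical and are not taken from the Standard or from any existing tool.

```python
# Thresholds taken from the text: sequences with fewer than ten specifically
# defined nucleotides or fewer than four specifically defined amino acids
# should not be entered.
MIN_NUCLEOTIDES = 10
MIN_AMINO_ACIDS = 4


def count_specifically_defined(residues: str, undefined_symbol: str) -> int:
    """Count residues that are not the 'unknown' placeholder symbol.

    Assumption: 'n' (nucleotides) and 'X' (amino acids) are the only
    symbols that do not count as specifically defined.
    """
    placeholder = undefined_symbol.lower()
    return sum(
        1 for r in residues
        if not r.isspace() and r.lower() != placeholder
    )


def may_be_entered(residues: str, molecule_type: str) -> bool:
    """Return True if the sequence meets the minimum-length threshold.

    molecule_type is a hypothetical label: "AA" for amino acid sequences,
    any other value is treated as a nucleotide sequence.
    """
    if molecule_type == "AA":
        return count_specifically_defined(residues, "X") >= MIN_AMINO_ACIDS
    return count_specifically_defined(residues, "n") >= MIN_NUCLEOTIDES
```

Under these assumptions, `may_be_entered("acgtnnnnacgt", "DNA")` returns `False`, because only eight of the twelve residues are specifically defined, whereas `may_be_entered("acgtacgtac", "DNA")` returns `True`.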

Because the authoring tool is expected to prompt entry of all required data and to allow use of only acceptable values or formats where applicable, a certain level of validation occurs as data is entered. The tool is further expected to include a separate validation function for use by both applicants and patent offices.

WIPO Standard ST.25 provides for a single numeric identifier <223> per sequence to contain "free text" to describe sequence characteristics using non-language-neutral vocabulary. Such "free text" is required to be repeated in the main part of the application description, in the language thereof, in a specific recommended section entitled "Sequence Listing Free Text". Such repetition ensures that any "free text" will be translated together with the application description, precluding the need for separate translation of the sequence listing itself. In contrast, WIPO Standard ST.26 allows use of "free text" as the value for multiple different annotation qualifiers per sequence and, due to the absence of procedural requirements, repetition in the application is not required, although such a requirement under the PCT and by various patent offices is possible. In WIPO Standard ST.26, "free text" is limited to a few short terms indispensable for understanding a characteristic of a sequence, is preferably in the English language and, as part of the sequence data part of the sequence listing, must not exceed 1000 characters composed of printable characters from the Unicode Basic Latin code table. It is expected that most inventors providing sequence information are capable of providing "free text" in the English language.
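The limits on "free text" stated above lend themselves to a simple check, sketched below for illustration. The sketch assumes that "printable characters from the Unicode Basic Latin code table" means the code points U+0020 to U+007E; the exact permitted subset is set out in the relevant annex of the Standard and may be narrower. The function name is hypothetical.

```python
# Limit taken from the text: free text must not exceed 1000 characters.
MAX_FREE_TEXT_LENGTH = 1000


def free_text_problems(value: str) -> list[str]:
    """Return a list of human-readable problems; empty if none were found.

    Assumption: the printable Basic Latin range is U+0020..U+007E.
    """
    problems = []
    if len(value) > MAX_FREE_TEXT_LENGTH:
        problems.append(
            f"length {len(value)} exceeds {MAX_FREE_TEXT_LENGTH} characters"
        )
    # Collect each offending character once, in code point order.
    bad = sorted({ch for ch in value if not (0x20 <= ord(ch) <= 0x7E)})
    if bad:
        problems.append(
            "characters outside printable Basic Latin: "
            + ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        )
    return problems
```

For example, `free_text_problems("synthetic linker peptide")` returns an empty list, while a value containing the prime symbol (U+2032) is reported as falling outside the assumed range.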

Users are invited to comment on any aspect of the authoring tool, and in particular on whether it is deemed necessary for the authoring tool to include a mechanism for automatic identification and extraction of any "free text" from sequence annotations to facilitate inclusion in the application description.
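To make the question more concrete, a possible extraction mechanism is sketched below. It is an illustration under stated assumptions, not a description of the planned tool: the element names (`INSDQualifier`, `INSDQualifier_name`, `INSDQualifier_value`) are assumed to follow the INSDC-style naming of the DTD and should be checked against Annex II, the set of qualifiers treated as "free text" is hypothetical, and the sample listing is a simplified fragment rather than a valid sequence listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of qualifier names whose values are treated as free text.
FREE_TEXT_QUALIFIERS = {"note"}


def extract_free_text(xml_text: str) -> list[tuple[str, str]]:
    """Return (qualifier name, value) pairs for free-text qualifiers.

    Element names are assumptions modelled on INSDC-style naming.
    """
    root = ET.fromstring(xml_text)
    found = []
    for qualifier in root.iter("INSDQualifier"):
        name = (qualifier.findtext("INSDQualifier_name") or "").strip()
        value = (qualifier.findtext("INSDQualifier_value") or "").strip()
        if name in FREE_TEXT_QUALIFIERS and value:
            found.append((name, value))
    return found


# Simplified, illustrative fragment; not a complete or valid listing.
SAMPLE = """<ST26SequenceListing>
  <SequenceData sequenceIDNumber="1">
    <INSDSeq>
      <INSDSeq_feature-table>
        <INSDFeature>
          <INSDFeature_quals>
            <INSDQualifier>
              <INSDQualifier_name>mol_type</INSDQualifier_name>
              <INSDQualifier_value>other DNA</INSDQualifier_value>
            </INSDQualifier>
            <INSDQualifier>
              <INSDQualifier_name>note</INSDQualifier_name>
              <INSDQualifier_value>synthetic promoter fragment</INSDQualifier_value>
            </INSDQualifier>
          </INSDFeature_quals>
        </INSDFeature>
      </INSDSeq_feature-table>
    </INSDSeq>
  </SequenceData>
</ST26SequenceListing>"""
```

With this sample, `extract_free_text(SAMPLE)` returns `[("note", "synthetic promoter fragment")]`; the extracted values could then be offered to the applicant for inclusion in the application description.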

Closing date for comments: 22.1.2017

