T 1245/20 (Identifying an application type of unknown data/MAGNET FORENSICS) 03-03-2023
METHODS, SYSTEMS, AND DEVICES FOR IDENTIFYING AN APPLICATION TYPE OF UNKNOWN DATA
Amendment after summons - sole request
Amendment after summons - admissible
Inventive step - sole request
Inventive step - no technical effect over the whole scope of claim
The board is not convinced that there is a credible increase in the speed of the mapping of the retrieved data to a particular application type over the whole scope of the claim since a plurality of associations/catalogues are implicitly needed in the method in claim 1 compared to the single catalogue used in the acknowledged prior art, and the claim does not specify any implementation for which a speed can be determined over the whole scope of the claim.
In the current case, the results of the acknowledged prior art appear to be quite different from those of the invention. The method of claim 1 attempts to identify applications for a database, whereas the prior art attempts to identify data formats for all kinds of applications, i.e. not limited to applications using databases comprising tables. Therefore a speed comparison with the speed of the mapping achieved by the prior art is not meaningful.
One question that arises is whether the method of claim 1 has the potential to cause technical effects. But the mapping and display of the data record in a respective column of the user interface resulting from the claimed method is not specifically adapted for any technical use (see G 1/19, point 94). Since the board does not see any technical effect from the implementation of the claimed method in a computer system derivable over the whole scope of the claim, the claimed subject-matter does not achieve a technical effect over the prior art acknowledged in the application.
I. The appellant (applicant) appealed against the examining division's decision to refuse European patent application No. 14848057.7 (published as WO 2015/042719 A1).
II. The documents cited in the contested decision included:
D1: US 2008/0140692 A1, published on 12 June 2008
D5: WO 2005/057891 A1, published on 23 June 2005
III. The examining division decided that the subject-matter of the independent claims of the main request and the first to fifth auxiliary requests lacked an inventive step over a general purpose computer system or the disclosure of documents D1 or D5, contrary to the requirements of Article 56 EPC.
IV. In its statement of grounds of appeal, the appellant requested that the contested decision be set aside and that a patent be granted on the basis of a main request or a first auxiliary request, both requests corresponding to the main request in the contested decision, or second to sixth auxiliary requests corresponding to the first to fifth auxiliary requests in the contested decision. All requests were resubmitted with the statement of grounds of appeal. Amended pages 2 and 3 of the description of the application were also submitted with the statement of grounds of appeal.
V. In a communication accompanying the summons to oral proceedings, the board expressed its preliminary view that claim 1 of the main request and the first and second auxiliary requests was not clear under Article 84 EPC.
Concerning claim 1 of the first auxiliary request (the wording of which was closer to claim 1 of the main request considered in the contested decision than the wording of claim 1 of the main request submitted with the statement of grounds), the board stated that while the method used by the examining division to assess an invention comprising a mixture of technical and non-technical features appeared to be correct, the assessment of which features were technical appeared to be arbitrary.
Since it was questionable that any technical effect over the whole scope of the claim, as required by the decision of the Enlarged Board of Appeal G 1/19, was achieved by the method of claim 1 over the prior art disclosed in document D1, the board was of the preliminary opinion that the method of claim 1 might not be inventive over the disclosure of document D1.
The board additionally considered that when starting from the acknowledged prior art cited in paragraphs [0035] to [0036] of the description of the application, there might also be no technical effect derivable over the whole scope of claim 1.
The board concurred in essence with the analysis of document D5 performed by the examining division and was of the preliminary opinion that the subject-matter of claim 1 of the main request and the first to sixth auxiliary requests was not inventive over the disclosure of document D5.
VI. With a letter dated 3 February 2023, the appellant filed a new main request to replace all previous requests on file and "be the only request on file should its submission be allowed by the Board of Appeal". It also submitted further arguments on clarity and inventive step in view of decision G 1/19.
VII. During the oral proceedings, the board admitted the new main request into the proceedings. The new main request became the sole request on file.
VIII. The appellant requested that the decision under appeal be set aside and that a patent be granted on the basis of the new main request filed with the letter of 3 February 2023.
IX. Claim 1 of the sole request on file reads as follows (feature labelling (A) to (F) added by the board):
"A computer-implemented data analysis method (200) of displaying data with unknown application type recovered from one or more addresses marked as unallocated on a storage device, the method comprising:
(A) recovering data with unknown application type from one or more addresses marked as unallocated on a storage device;
(B) determining (225) that the recovered data corresponds to database information by analyzing column identifiers of an unknown database of the recovered data to determine that the column identifiers of the unknown database correspond to a keyword of a particular application-type, the database information of the recovered data comprising at least one table with at least one column;
(C) for a column of a table in the database information, determining (250) if a column identifier of the column comprises a keyword associated with a data field that is commonly used by an application of the particular application type;
(D) if the column identifier comprises the keyword, identifying (265) data stored in the database as belonging to an application that is of the particular application type;
(E) storing a mapping (270) between the data field and the column, the mapping being accessible for recovery of data in the database to indicate that data for the column in the table is associated with the data field; and
(F) displaying a data record of the database information in a preview section (560) of a user interface (500), wherein the data record is displayed in a respective column of the user interface based on the stored mapping of the stored data field of the data record to the stored column."
X. The appellant's arguments relevant to the decision are discussed in detail below.
The application
1. The application relates to methods of identifying an application type of unknown data that may be encountered during a data recovery process (paragraph [0035]).
2. Certain keywords are commonly used by applications of a particular application type as identifiers for columns of a table stored in a database: for example, the keywords "author" or "from" are commonly used in chat or instant messaging (IM) applications to identify a "sender" data field for chat messages stored in the application (paragraphs [0012] to [0017], [0038], [0039] and [0068]).
3. A given keyword used to identify data stored in a database as belonging to an application may be associated with a data field commonly used by an application of a particular application type. For example, the keywords "message", "subject", "text", "msg", "body" and "content" may all be commonly used to identify a data field for the substance or "content" of the message in an application of the chat / IM application type (paragraph [0072]).
4. If it is determined that a column identifier includes a keyword associated with an application type, a mapping is stored between the column and a data field that is commonly used by applications of the application type.
5. The mapping may be subsequently updated through a user interface. The mapping user interface may display a list of the located database tables and mappings of columns of such tables to the commonly used data fields for a given application type to allow user input for final verification or remapping if necessary (paragraphs [0043], [0057] and [0085]).
The board notes, however, that this updating step is not present in claim 1 of the sole request.
Sole request - admissibility
Claim 1 of the sole request comprises an amendment for clarifying step B following the clarity objection raised by the board in point 12 of its communication, an amendment for clarifying step C following the clarity objection raised by the board in point 13 of its communication, an amendment for clarifying step E following the clarity objection raised by the board in point 14 of its communication and an amendment for clarifying step F following the clarity objection raised by the board in point 15 of its communication.
The board decided during the oral proceedings to admit this sole request (Article 13(2) RPBA 2020) since this request was filed at the earliest possible opportunity as a direct response to the fresh clarity objections raised by the board.
Sole request - lack of inventive step over the acknowledged prior art disclosed in the application
6. Acknowledged prior art disclosed in the application
6.1 In the traditional data recovery processes disclosed in paragraphs [0035] to [0036] of the description of the application, a catalogue of application data indicates the data format of data stored by a given application on a storage device. When unknown data is encountered, this catalogue is referenced to determine if the unknown data matches the data formats indicative of a particular application. If so, the unknown data is processed according to the identified application.
6.1.1 To illustrate this, the board provides an example of the catalogue, albeit in a schematic view:
"application 1" is associated with "data format 1"
"application 2" is associated with "data format 2"
"application 3" is associated with "data format 3"
"application N" is associated with "data format N"
The board notes that it is not excluded that the same data format is associated with a plurality of applications or that an application is associated with a plurality of data formats.
6.2 The above cited passage of the application also discloses that these traditional processes may be inefficient because they require analysis of the data format ("structure") stored by an application before data associated with the application can be recovered. However, the ever-growing number of applications makes analysing the data format of each application difficult. This results in data formats for many applications not being analysed. If data stored by these applications is encountered during data recovery, the traditional data recovery processes may not be able to adequately recover the data.
6.2.1 In other words, a new application X having a new data format X might not yet have been listed in the catalogue. In this case, recovered data having the data format X cannot be identified as having the type of, or having to be processed by, this new application X.
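By way of illustration only, the schematic catalogue of point 6.1.1 and the gap described in point 6.2.1 can be sketched as follows. The names `CATALOGUE` and `identify_applications` and all format labels are hypothetical placeholders and are not taken from the application.

```python
# Hypothetical catalogue in the sense of point 6.1.1: each application is
# associated with one or more data formats (placeholder labels only).
CATALOGUE = {
    "application 1": {"data format 1"},
    "application 2": {"data format 2"},
    "application 3": {"data format 2", "data format 3"},
}


def identify_applications(data_format):
    """Return, sorted, every catalogued application associated with the format."""
    return sorted(app for app, formats in CATALOGUE.items() if data_format in formats)


# A format shared by several applications yields several candidates.
print(identify_applications("data format 2"))
# A format not yet catalogued (point 6.2.1) yields no candidate at all.
print(identify_applications("data format X"))
```

In this sketch, data in an uncatalogued format simply produces an empty result, which corresponds to the situation in which the traditional process cannot attribute the recovered data to any application.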
7. Claim 1 of the sole request
7.1 Introductory part of claim 1
7.1.1 Claim 1 defines a computer-implemented data analysis method (200) of displaying data with an unknown application type recovered from one or more addresses marked as unallocated on a storage device. The recovering step is done by step (A) (see point 7.2 below). The displaying step is performed at step (F) (see point 7.6 below).
7.2 Step (A)
7.2.1 Claim 1 defines a first step (A) of recovering data with an unknown application type from one or more addresses marked as "unallocated" on a storage device.
7.2.2 This step is usually performed in a forensics context in an attempt to analyse data intended to have been deleted from the storage device. A user may have used the "delete file" function of an operating system to delete a file, but the deleted file may nevertheless still be recoverable because many operating systems and device driver software do not physically delete the data from the storage device immediately. Instead, the addresses on the device that stores the data may simply be marked as "unallocated" or "available". However, since this data flagged to be deleted may not yet be overwritten, it is possible that it remains physically undeleted from the storage device for an extended period of time (paragraph [0046]).
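The behaviour described in this point can be pictured with a toy model. It is a deliberately simplified sketch, not a model of any real file system; the class `ToyDevice` and its methods are hypothetical.

```python
# Toy model only: a "storage device" as a list of blocks plus an allocation
# flag per block. Real file systems are considerably more involved.
class ToyDevice:
    def __init__(self, n_blocks):
        self.blocks = [b""] * n_blocks
        self.allocated = [False] * n_blocks

    def write(self, address, data):
        self.blocks[address] = data
        self.allocated[address] = True

    def delete(self, address):
        # Only the flag changes; the bytes stay until they are overwritten.
        self.allocated[address] = False

    def recover_unallocated(self):
        """Return non-empty data found at addresses marked as unallocated."""
        return {a: d for a, d in enumerate(self.blocks) if d and not self.allocated[a]}


device = ToyDevice(4)
device.write(0, b"kept file")
device.write(1, b"deleted chat database")
device.delete(1)
print(device.recover_unallocated())
```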
7.2.3 During the oral proceedings, the board stated that the fact that addresses flagged as "unallocated" or "available" might still contain recoverable data (intended to be deleted) was well-known to the skilled person. The appellant did not contest this statement.
Rather, in its statement of grounds of appeal, on page 8, the appellant admitted that features corresponding to the introductory part of claim 1 and step A were known from the prior art disclosed in the application as filed.
7.2.4 Thus, step (A) cannot provide an inventive step.
7.3 Step (B)
7.3.1 In step (B) of claim 1, it is stipulated that the "database information" of the recovered data comprises "at least one table with at least one column". Column identifiers of the unknown database of the recovered data are analysed "to determine that the recovered data corresponds to database information". The board notes that claim 1 does not define how the column identifiers are recognised or found if the database from which the recovered data originates is "unknown".
7.3.2 The board further notes that in claim 1, the reference sign "225" is associated with this step. Paragraph [0060] stipulates that, at step 225, a determination is made as to whether the file corresponds to database information by, for example, reading the header information of the file to determine if it contains information indicating that it is of a "known" database type. For example, this step checks for the string "SQLite format 3" in the header information of the file as SQLite database files typically include the text "SQLite format 3" in their headers. This paragraph also states that, in some cases, the information indicating that a file is a database may not necessarily be within the header portion of a file but instead could be located in other parts of the data associated with the file, such as the file extension (e.g., the file extensions ".mdb", ".mda", or ".accdb", indicating that the given file is a Microsoft™ Access™ database).
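The check described in paragraph [0060] might look roughly like the following sketch. It illustrates the description only, not the wording of step (B), which refers to column identifiers. The function name `looks_like_database` and the fallback order are assumptions; the header string is the 16-byte sequence with which SQLite database files begin.

```python
# The 16-byte header string with which SQLite database files begin.
SQLITE_MAGIC = b"SQLite format 3\x00"
# File extensions mentioned in paragraph [0060] for Microsoft Access databases.
ACCESS_EXTENSIONS = (".mdb", ".mda", ".accdb")


def looks_like_database(data, filename=""):
    """Hypothetical check: header string first, file extension as a fallback."""
    if data.startswith(SQLITE_MAGIC):
        return True
    return filename.lower().endswith(ACCESS_EXTENSIONS)


print(looks_like_database(b"SQLite format 3\x00" + b"\x10\x00"))
print(looks_like_database(b"", "contacts.accdb"))
```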
7.3.3 Thus, after having found, in a manner undefined in claim 1, column identifiers ("col_1", "col_2", "col_3", ..., "col_N") of an unknown database DBj, these column identifiers are analysed, and it is determined whether they correspond to database information.
Thus, some column identifiers correspond to some database information, and this correspondence appears to be pre-defined, this being similar to the associations, in the catalogue of the prior art disclosed in the application, of applications to respective data formats illustrated in point 6.1.1. Analysing the column identifiers by reading them and looking at a pre-defined correspondence between the column identifiers and an associated database to determine the associated database cannot be considered to involve an inventive step. It is, rather, based on an analysis of data formats by a software engineer or programmer, which is in essence a mental act and thus a non-technical activity not contributing to a technical effect.
Thus, step (B) cannot provide an inventive step either.
7.4 Steps (C) and (D)
7.4.1 Step (C) is a further step of determining if a column identifier of the column comprises a keyword associated with a data field commonly used by an application of the particular application type.
7.4.2 There is a first catalogue of correspondences between a keyword of a particular application type and the (plurality of) column identifiers (see previous step (B)).
7.4.3 Application types include, for example, a chat or IM application type, a web browser application type, a navigation/geo-location application type, a file sharing application type, a social networking application type, a cloud application type, and an email application type (paragraph [0067]).
7.4.4 The associations of keywords to respective data field(s) commonly used by an application of the application types appear to be pre-determined and are similar to the associations, in the catalogue of the prior art disclosed in the application, of applications to respective data formats illustrated in point 6.1.1.
7.4.5 Thus, a plurality of keywords are associated with a plurality of data fields used by a corresponding application (see paragraphs [0012], [0013] and [0014]; see also paragraphs [0072] and [0073]), and this also corresponds to a second catalogue (see point 6.1.1 above).
7.4.6 The data fields "sender field", "recipient field", "message field" and "timestamp field" are, for example, used by or associated with a "messaging application" (paragraph [0012]).
And a messaging application is, for example, associated with the keywords "message", "subject", "text", "msg", "body", "content", "date", "time", "timestamp", "from", "sender", "author", "uid", "member", "to", "receiver", "conversation", "recipient", "partner", "participant", and "party" (paragraph [0015]).
7.4.7 The data fields "address field", "date field", "bookmark field" and "title field" are, for example, used by or associated with a "web-browser application" (paragraph [0013]).
And a web browser application is, for example, associated with the keywords "address", "location", "loc", "URL", "visited", "date", "bookmark", "favorite" and "title" (paragraph [0016]).
7.4.8 The data fields "longitude field", "latitude field", "destination field", "direction field" and "route field" are, for example, used by or associated with a "geographic location-enabled application" (paragraph [0014]).
And a geographic location-enabled application is, for example, associated with the following keywords: "coordinate", "longitude", "latitude", "location", "loc", "home", "destination", "direction", and "route" (paragraph [0017]; see also paragraph [0068]).
7.4.9 To schematise this step (C), it is determined whether a column identifier "X", for example, comprises a keyword "W" associated with a data field used by an application of the "application type Z".
7.4.10 In step (D), if the column identifier "X", for example, comprises the keyword "W", the data stored in the database is identified as belonging to an application of "application type Z".
7.4.11 Steps (C) and (D) alone cannot provide an inventive step since they consist in checking whether a keyword from a table of associations or catalogue is present in the column identifier, implicitly looking up in that table of associations or catalogue the data field associated with this keyword, and implicitly looking up in another table of associations or catalogue the application associated with this data field. These steps essentially relate to mere programming based on non-technical considerations of software engineering and do not contribute to a technical effect over the prior art disclosed in the application.
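The two look-ups underlying steps (C) and (D) can be sketched as follows. The keyword table is limited to the chat / IM examples given in points 2 and 3 above; treating "comprises" as a case-insensitive substring test, and the names `CHAT_KEYWORDS`, `KEYWORDS_BY_TYPE`, `match_column` and `identify_application_type`, are assumptions of this sketch rather than features of the claim.

```python
# Keyword -> data field associations for the chat / IM application type,
# limited to the examples given in points 2 and 3 of the decision.
CHAT_KEYWORDS = {
    "author": "sender", "from": "sender",
    "message": "content", "subject": "content", "text": "content",
    "msg": "content", "body": "content", "content": "content",
}
# Hypothetical second catalogue: application type -> its keyword table.
KEYWORDS_BY_TYPE = {"chat / IM": CHAT_KEYWORDS}


def match_column(column_identifier):
    """Step (C): return (application type, data field) for the first keyword
    comprised in the identifier, or None if no catalogued keyword is found."""
    name = column_identifier.lower()
    for app_type, keywords in KEYWORDS_BY_TYPE.items():
        for keyword, data_field in keywords.items():
            if keyword in name:
                return app_type, data_field
    return None


def identify_application_type(column_identifiers):
    """Step (D): attribute the data to the type of the first matching column."""
    for identifier in column_identifiers:
        match = match_column(identifier)
        if match is not None:
            return match[0]
    return None


print(match_column("msg_body"))
print(identify_application_type(["id", "zip_code"]))
```

A column identifier that comprises none of the catalogued keywords yields `None`, so in this sketch the outcome depends entirely on the content of the pre-defined tables.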
7.5 Step (E)
7.5.1 In step (E), a mapping between the data field and the column is stored. It is accessible for recovery of data in the database.
7.5.2 For example, column identifier "X" is mapped to data field "Y".
7.5.3 This mapping is accessible during recovery and indicates that data for the column having the identifier "X" in the table is associated with the data field "Y".
7.5.4 Step (E) alone cannot provide an inventive step since storing the mapping determined in steps (C) and (D) is obvious and would be performed by the skilled person without exercising an inventive step and depending on the circumstances. Moreover, the mapping itself relates just to software engineering and the analysis of data formats in applications, this being non-technical.
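One conceivable way of storing such a mapping so that it remains accessible later is sketched below, using the example of point 7.5.2 (column identifier "X", data field "Y"). The JSON layout and the function names `serialise_mapping` and `field_for_column` are assumptions; claim 1 does not prescribe any storage format.

```python
import json


def serialise_mapping(table, column_to_field):
    """Serialise, for one table, which column is associated with which data field."""
    return json.dumps({"table": table, "columns": column_to_field}, sort_keys=True)


def field_for_column(serialised, column):
    """Return the data field stored for a column, or None if none was stored."""
    return json.loads(serialised)["columns"].get(column)


stored = serialise_mapping("messages", {"X": "Y"})
print(field_for_column(stored, "X"))
```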
7.6 Step (F)
Step (F) is a step of displaying a data record (of the database information) having a data field DFy in a preview section (reference sign 560) of a user interface (reference sign 500). The data record is displayed in a column of the user interface "based on the stored mapping of the stored data field of the data record to the stored column".
The board considers that step (F) concerns a mere presentation of information which is non-technical and cannot render claim 1 inventive.
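For illustration, a display of this kind could be reduced to the following plain-text sketch, in which each value of a record is placed under the data field to which its column was mapped. The function `render_preview`, the sample record and the choice to omit unmapped columns are assumptions of the sketch.

```python
def render_preview(record, column_to_field):
    """Lay out one data record under data-field headings using a stored mapping.
    Columns without a mapping are omitted (an assumption of this sketch)."""
    cells = [(column_to_field[col], str(value))
             for col, value in record.items() if col in column_to_field]
    widths = [max(len(field), len(value)) for field, value in cells]
    header = " | ".join(field.ljust(w) for (field, _), w in zip(cells, widths))
    row = " | ".join(value.ljust(w) for (_, value), w in zip(cells, widths))
    return header + "\n" + row


mapping = {"from_user": "sender", "msg_body": "content"}
print(render_preview({"from_user": "alice", "msg_body": "hi", "rowid": 7}, mapping))
```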
8. The appellant argued that the increased efficiency provided by claim 1 was synergistic in that a greater amount of unallocated data could be processed and displayed more quickly, with quickness corresponding to computer efficiency. It further argued that the features of claim 1 should not be considered in isolation.
9. The appellant argued that the method of claim 1 only needed to analyse the column headers or identifiers of the recovered data; not all the recovered data.
By contrast, in the traditional data recovery processes disclosed in the application, when unknown data was encountered, a catalogue was referenced to determine if the recovered data matched the data formats indicative of a particular application. For this to be achieved, the data format of the entire recovered data had to be matched to a data format of the catalogue.
First, matching the data formats as done in the acknowledged prior art was a slower process than looking for keywords in column identifiers and matching them to a data field associated with a predetermined application type.
Second, where the recovered data did not comprise all the information necessary for determining its data format, the method of the acknowledged prior art would fail. The method of claim 1 would not fail if the column identifiers were recovered and, for the majority of cases, would provide a "correct" mapping of the columns to a corresponding application type.
The technical effects were a "best-case" mapping and a presentation of this mapping in a user interface enabling a user to interact with this mapping.
10. The board does not consider the construction of this mapping, which is performed based on non-technical rather than technical considerations, or the mapping itself, to be technical. For example, the columns of the recovered data are not further processed by launching an application of the mapped application type since, in the method of claim 1, no further step is performed on the displayed data.
11. The board is also not convinced that there is a credible increase in the speed of the mapping of the retrieved data to a particular application type over the whole scope of the claim since a plurality of associations/catalogues are implicitly needed in the method in claim 1 compared to the single catalogue used in the acknowledged prior art, and the claim does not specify any implementation for which a speed can be determined over the whole scope of the claim.
In the current case, the results of the acknowledged prior art appear to be quite different from those of the invention. The method of claim 1 attempts to identify applications for a database, whereas the prior art attempts to identify data formats for all kinds of applications, i.e. not limited to applications using databases comprising tables. Therefore a speed comparison with the speed of the mapping achieved by the prior art is not meaningful.
12. The board is furthermore not convinced that the retrieved association between the recovered data and an application type is always possible in the method of claim 1. For example, where no keywords from the catalogue of keywords and the associated data field (and the catalogue of data fields and the associated application type) are found in the possible column identifiers, the column cannot be associated with any application type. This may arise in the case of recovered data of new application types not yet present in the catalogue. The board is also not convinced that the method of claim 1 yields a more correct or accurate association between the recovered data and an application type. On the contrary, since the method of the acknowledged prior art considers the (whole) data format, it may yield a more correct or accurate association between the recovered data and a particular application/application type.
An incorrect mapping would prevent the launched application, with which the recovered data is incorrectly associated, from reading or processing the data.
Even if the processing of the data might be considered technical, depending on the circumstances, no processing step is present in claim 1.
13. One question that arises is whether the method of claim 1 has the potential to cause technical effects. But the mapping and display of the data record in a respective column of the user interface resulting from the claimed method is not specifically adapted for any technical use (see G 1/19, point 94). Since the board does not see any technical effect from the implementation of the claimed method in a computer system derivable over the whole scope of the claim, the claimed subject-matter does not achieve a technical effect over the prior art acknowledged in the application.
14. Therefore, the subject-matter of claim 1 of the sole request lacks an inventive step (Article 56 EPC).
For these reasons it is decided that:
The appeal is dismissed.