T 2303/11 (Maintaining data integrity despite track squeeze/DELL) 10-10-2017
Method for maintaining track data integrity in magnetic disk storage devices
Inventive step - main request and six auxiliary requests (no)
Inventive step - problem invention (no)
I. The appeal lies from the decision of the Examining Division to refuse European patent application No. 05252837.9 for lack of inventive step, Articles 52(1) and 56 EPC, of the subject-matter of all claims of the main request and of the first to fifth auxiliary requests over the following prior-art document D1 in combination with the common general knowledge of the skilled person:
D1: "Prevention of Hard Errors in Magnetic Files Due to Long Term Degradation", IBM Technical Disclosure Bulletin, vol. 29, no. 10, pages 4577 to 4578, New York, published in March 1987.
In the first-instance proceedings, a different printed version of document D1 was used.
The Examining Division cited further documents, including the following:
D5: "Get S.M.A.R.T. for Reliability", Seagate Technology, 1999, www.seagate.com/docs/pdf/whitepaper/enhanced_smart.pdf.
Document D5 was cited to illustrate standard practice in the technical field of the invention.
II. In the statement of grounds of appeal, the appellant requested that the decision be set aside and that a patent be granted on the basis of the main request or of one of the first to fifth auxiliary requests considered in the appealed decision. The appellant filed the version of original document D1 as cited above.
III. In a communication accompanying a summons to oral proceedings, the Board decided to adopt that version of document D1 submitted by the appellant because it corresponded to the original version and had a better text formatting than the one used by the Examining Division. It observed that document D5 had been retrieved from a reliable website and corresponded to the copy archived on 23 April 2003 at http://web.archive.org/web/20030423232421/http://www.seagate.com/docs/pdf/whitepaper/enhanced_smart.pdf.
The Board expressed its preliminary view that claim 1 of the main request did not fulfil the requirements of Articles 84 and 123(2) EPC due to lack of support and added subject-matter. The subject-matter of claim 1 did not seem to be inventive over the disclosure of document D1 in combination with the common general knowledge of the skilled person. Claim 1 of the auxiliary requests also appeared to raise issues with regard to clarity, lack of support and added subject-matter. The subject-matter of those claims did not seem to be inventive over document D1 in combination with the common general knowledge of the skilled person illustrated by document D5.
IV. With a letter of reply the appellant maintained the main request and first to fifth auxiliary requests and filed a new sixth auxiliary request. The appellant informed the Board that it would not be represented at the oral proceedings and requested a decision on the appeal on the basis of its written case. It further requested that, if the Board was of the view that any of the seven requests was allowable, save for the correction of minor deficiencies, the case be remitted to enable those deficiencies to be dealt with before the application was refused.
V. In reaction to the appellant's letter, the Board cancelled the oral proceedings.
VI. The appellant's final requests are that the contested decision be set aside and that
- a patent be granted on the basis of the main request or of one of the first to sixth auxiliary requests; or,
- if only minor deficiencies need to be corrected, the case be remitted to the department of first instance for further prosecution on the basis of one of the main request or first to sixth auxiliary requests to enable those deficiencies to be dealt with.
VII. Claim 1 of the main request reads as follows:
"A method for processing an error condition in a computer system (100) including a mass data storage device (140) which records data in concentric adjacent tracks of an electromagnetic storage media, the method comprising the steps of:
receiving (210, 610, 900) multiple I/O requests over time corresponding to a particular track of the storage media during normal execution of user applications of the computer system over time;
executing (230) the multiple I/O requests corresponding to the particular track over time;
calculating (240, 310, 410) a performance metric regarding the execution of the multiple I/O requests corresponding to the particular track;
comparing (260, 320, 420, 510) the performance metric to a threshold value;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in data loss based on the results of the comparison of the performance metric to the threshold value; and
in response to detecting that a track squeeze error condition is likely to start occurring, rewriting (670) at least the particular track."
Claim 1 of the first auxiliary request differs from that of the main request in that the following features are defined at the end:
"wherein the performance metric comprises at least one of:
a variance in input/output completion time;
a raw read error rate;
one or more of Self-Monitoring, Analysis and Reporting Technology (SMART) counter data values;
sequential read throughput rate;
timing of individual I/O requests; and
indication of timeout of an I/O request."
Claim 1 of the second auxiliary request differs from that of the main request in that the features "calculating ...", "comparing ..." and "detecting ..." are defined as follows:
"calculating (240) an elapsed time to perform an I/O request for the particular track;
comparing (510) the elapsed time to a threshold value;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in data loss based on the results of the comparison of the elapsed time to the threshold value determining that the timing of the I/O request exceeds the threshold value or if a timeout of the I/O request has occurred".
Claim 1 of the third auxiliary request differs from that of the main request in that the features "calculating ...", "comparing ..." and "detecting ..." read as follows:
"computing (240, 410) elapsed time values from a number of consecutive I/O requests for the particular track to obtain a sequential read throughput rate;
comparing (420) the sequential read throughput rate to a threshold setting;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in data loss based on the results of the comparison of the sequential read throughput rate to the threshold setting determining that the sequential read throughput rate is less than an expected throughput by a difference exceeding a predetermined threshold amount".
Claim 1 of the fourth auxiliary request differs from that of the main request in that the features "calculating ...", "comparing ..." and "detecting ..." read as follows:
"calculating (240, 250) a variance in elapsed time required to perform the multiple I/O requests corresponding to the particular track;
comparing (260) the calculated variance to a threshold value;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in data loss based on the results of the comparison of the calculated variance to the threshold value".
Claim 1 of the fifth auxiliary request differs from that of the main request in that the steps "calculating ...", "comparing ..." and "detecting ..." read as follows:
"obtaining (310) counter values including a raw read error rate from a disk drive Self-Monitoring, Analysis and Reporting Technology (SMART) data error detection mechanism;
comparing (320) the raw read error rate to a threshold value;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in data loss based on the results of the comparison of the raw read error rate to the threshold value determining that the raw read error rate exceeds the threshold value".
Claim 1 of the sixth auxiliary request reads as follows:
"A method for processing an error condition in a computer system (100) including a mass data storage device (140) which records data in concentric adjacent tracks of an electromagnetic storage media, the method comprising the steps of:
receiving (210, 610, 900) at least one I/O request corresponding to a particular track of the storage media during normal execution of user applications of the computer system over time;
executing (230) the or each I/O request corresponding to the particular track over time;
calculating (240, 310, 410) a drive performance metric regarding the execution of the or each I/O request corresponding to the particular track;
comparing (260, 320, 420, 510) the drive performance metric to a threshold value;
detecting (270, 620) when a track squeeze error condition is likely to start occurring before the error condition has resulted in unrecoverable data loss based on the results of the comparison of the drive performance metric to the threshold value; and
in response to detecting that a track squeeze error condition is likely to start occurring, rewriting (670) at least the particular track,
wherein the drive performance metric comprises at least one of:
a variance in input/output completion time computed from a number of consecutive I/O requests;
one or more of Self-Monitoring, Analysis and Reporting Technology (SMART) counter data values;
an average I/O throughput computed from a number of consecutive I/O requests; and
timing of individual I/O requests."
VIII. The appellant's arguments relevant to this decision are discussed in detail below.
1. The appeal complies with the provisions referred to in Rule 101 EPC and is therefore admissible.
2. Documents D1 and D5
2.1 In this decision the Board uses the version of prior-art document D1 filed by the appellant and not that cited in the decision under appeal (see also sections I and III above).
2.2 In the light of the facts mentioned in its preliminary opinion (see section III above), the Board is satisfied that the disclosure of document D5 constitutes state of the art within the meaning of Article 54(1) and (2) EPC for the present application. This was not contested by the appellant.
Invention
3. The invention concerns a mechanism that can detect gradual onset errors such as track squeeze in a disk drive and then take corrective action to eliminate the errors, in order to ensure continued service of the disk drive at a good performance level (see paragraphs [0007] and [0008] of the A2 publication).
3.1 Track squeeze occurs especially in very high data density devices used under high loads in server applications, as explained in the following passage of the application (paragraph [0003]):
"It appears when a track on the disk drive is written only rarely, while one or both of the adjacent tracks are written much more frequently. Due to the finite positioning tolerance of the head actuator mechanism, the electromagnetic forces used to effect adjacent track writes intrude to some extent into the rarely written track, causing reduced signal strength of the affected track. This in turn causes data errors during read operations."
Even if error recovery mechanisms can recover the data, track squeeze problems cause performance loss due to the time required by those corrective mechanisms (paragraph [0004]).
3.2 In order to solve those problems, the method of the invention detects when a track squeeze error condition is likely to start occurring during access to a particular identified area and corrects the condition by rewriting at least one track proximate to the particular identified area (original claim 1). In many cases the problem is detected when the original data can still be read. As a result, that original data is used directly to do the repair (paragraph [0008]). The application describes different ways of detecting an impending data error on the basis of different performance metrics (paragraphs [0023] to [0026], Figures 2 to 5, original claims 2 to 9).
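For illustration only, the detect-and-repair flow summarised above might be sketched as follows. This is a minimal sketch and not the implementation described in the application: the class `TrackMonitor`, its method names, the use of a mean elapsed time as the metric, and the numeric values of `threshold_ms` and `min_samples` are all assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean


class TrackMonitor:
    """Hypothetical per-track monitor; names and values are assumptions."""

    def __init__(self, threshold_ms, min_samples=3):
        self.threshold_ms = threshold_ms    # assumed threshold on mean elapsed time
        self.min_samples = min_samples      # assumed number of I/O requests per metric
        self.elapsed = defaultdict(list)    # track number -> elapsed times in ms
        self.rewritten = []                 # tracks rewritten so far

    def record_io(self, track, elapsed_ms):
        """Record one completed I/O request and repair the track if needed."""
        self.elapsed[track].append(elapsed_ms)
        if self.squeeze_likely(track):
            self.rewrite(track)

    def squeeze_likely(self, track):
        """Compare the metric for the most recent requests with the threshold."""
        samples = self.elapsed[track]
        if len(samples) < self.min_samples:
            return False
        return mean(samples[-self.min_samples:]) > self.threshold_ms

    def rewrite(self, track):
        # Placeholder: a real system would read the still-readable data
        # and write it back to the track. Here the event is only logged.
        self.rewritten.append(track)
        self.elapsed[track].clear()


monitor = TrackMonitor(threshold_ms=20.0)
for t in (5.0, 6.0, 5.5, 30.0, 35.0, 40.0):
    monitor.record_io(track=7, elapsed_ms=t)
```

In this toy run the rewrite is triggered on the fifth request, when the mean of the three most recent elapsed times first exceeds the assumed threshold, that is, while the data can still be read.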
Interpretation of the claims
4. The sixth auxiliary request is based on the first auxiliary request, most of the amendments being directed to overcoming preliminary objections raised by the Board regarding lack of clarity, lack of support by the description and added subject-matter.
The Board agrees that the amendments to "data loss" and "performance metric" clarify those terms in the light of the description. In the inventive-step assessment below, the feature "data loss" of claim 1 of the main request and first to fifth auxiliary requests is therefore interpreted in the light of the respective amended feature of the sixth auxiliary request as "unrecoverable data loss". Similarly, the feature "performance metric" in claim 1 of the main and first auxiliary requests is interpreted as "drive performance metric".
Main request
5. Inventive step - claim 1
5.1 Document D1 discloses a solution to prevent hard errors in magnetic files due to long-term degradation (see title), and explains in particular that if a sector "is never rewritten over the years, the amount of squeeze from each side statistically increases with time" (page 4577, lines 11 to 13 of the last paragraph). Since the magnetic files are stored on a disk including tracks and sectors (page 4578, fifth line, page 4577 first paragraph), it is clear that document D1 refers to an electromagnetic storage medium which records data in concentric adjacent tracks as defined in claim 1.
Document D1 teaches a way of preventing a sector from getting progressively worse, until eventually a "Hard Error" occurs, by rewriting the sector when "some given level of difficulty is encountered in reading" it (page 4577, penultimate line to page 4578, sixth line). It therefore discloses "a method for processing an error condition in a computer system including a mass data storage device which records data" as in claim 1.
It is clear from document D1 that the system processes I/O requests during normal execution of user applications. The disclosed method thus comprises steps of receiving and executing over time multiple I/O requests corresponding to a particular track during normal execution of user applications.
As acknowledged by the appellant, prior-art document D1 teaches that a hard error should be avoided by taking action in advance (see e.g. page 4577, penultimate line to page 4578, first paragraph). Document D1 explains that the squeeze errors cause problems, e.g. decreasing signal-to-noise ratio, level of difficulty in reading a sector (page 4577, last 8 lines to page 4578, third line) or low quality of the data (page 4578, first full paragraph), and that by reading the file on a regular basis the degradation can be found at an early stage (page 4578, first paragraph, last sentence).
In the first and second full paragraphs of page 4578, document D1 discloses that "the gradually increasing damage to data can be detected by the requiring of higher levels of recovery procedures" and that "[w]hen a recovery exceeds the selected moderate level in the data recovery procedure, it is determined that the data is of too low a quality to be left in the original condition". The data should then be recovered and rewritten. The method of document D1 thus comprises steps of detecting when a track squeeze error condition is likely to start occurring before the error condition results in unrecoverable data loss, as in the claimed method. Establishing that "a recovery exceeds the selected moderate level" implies the use of a threshold.
Document D1 also refers to the "selection of the appropriate point after which a rewrite should be done" in order to "identify poor sectors while they can still have a high certainty of recovery" (page 4578, first full paragraph). It is therefore clear that in response to detecting that a track squeeze error is likely to occur, the particular track is rewritten.
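By way of illustration, the approach attributed to document D1 above (rewriting data once a read has required more than a selected moderate level of recovery) might be sketched as follows. Document D1 does not describe an implementation in detail, so everything here is an assumption: the function `scan_and_refresh`, the callbacks `read_sector` and `rewrite_sector`, the integer encoding of recovery levels and the default `moderate_level` of 2 are hypothetical.

```python
def scan_and_refresh(sectors, read_sector, rewrite_sector, moderate_level=2):
    """Read each sector and rewrite those whose recovery exceeded the level.

    read_sector(sector) is assumed to return (data, recovery_level), where
    recovery_level is a hypothetical integer: 0 means no recovery was needed.
    """
    refreshed = []
    for sector in sectors:
        data, recovery_level = read_sector(sector)
        if recovery_level > moderate_level:
            # Data was still recoverable, so it can be written back directly.
            rewrite_sector(sector, data)
            refreshed.append(sector)
    return refreshed


# Toy stand-in for a drive: sector -> (data, recovery level needed to read it).
disk = {0: (b"a", 0), 1: (b"b", 3), 2: (b"c", 1)}
written = {}
refreshed = scan_and_refresh(
    sectors=sorted(disk),
    read_sector=lambda s: disk[s],
    rewrite_sector=lambda s, d: written.__setitem__(s, d),
)
```

The comparison `recovery_level > moderate_level` is the point at which a threshold is implied; only sector 1 is rewritten in this toy run.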
5.2 Even though some of the performance metrics covered by claim 1 rely on a single I/O request (e.g. timing of individual I/O requests of claim 2), for the sake of argument the Board considers the calculation of the performance metric on the basis of multiple I/O requests to be a distinguishing feature.
Taking that into account, the method of claim 1 differs from that of document D1 in that:
(a) a performance metric is used (instead of a level of recovery);
(b) the performance metric is calculated in terms of the execution of multiple I/O requests and
(c) during normal execution of user applications of the computer system.
5.3 In its reply to the Board's preliminary opinion, the appellant argued that the invention according to each of the requests had the advantage of being independent of any unusual or customised error reporting capabilities in the disk drives and was therefore applicable to all disk drives, as was explained on page 7, lines 10 to 12, of the original description (paragraph [0022] of the publication). The appellant was of the view that the procedure of document D1 should be carried out as part of an existing data recovery procedure, which was implemented as an integral part of the disk drive hardware. Such data recovery procedures were the preserve of manufacturers of electromagnetic storage media, i.e. disk-drive manufacturers, as was exemplified by the disclosure of document D1 originating from such a manufacturer, IBM.
The Board notes however that claim 1 covers embodiments described in the application in which the method is implemented in hardware in a storage controller, as defined in method claim 29 which corresponds to original claim 29. The Board is further of the opinion that document D1 does not disclose that the method has to be implemented in hardware or as part of the data recovery procedure. The skilled person understands that in spite of using information from the data recovery procedure, the steps of detecting the squeeze error condition and rewriting can be implemented separately.
The Board nevertheless concedes that using a performance metric makes the method less dependent on the data recovery procedure of the particular disk. The distinguishing features are therefore considered to solve the problem of implementing the method of document D1 so as to be less dependent on "unusual or customised error reporting capabilities in disk drives".
5.4 Feature (a) is equivalent to, and a minor obvious modification of, the corresponding feature of document D1. In particular, that document describes in several passages the "effects of squeeze" as being those of "degrading performance", "sector degradation", decreased "signal-to-noise ratio", "requiring of higher levels of recovery" and "some level of difficulty" in reading a sector. It hence clearly establishes that the squeeze errors lead to decreased performance in the form of low quality data, increased error rate and longer I/O completion time. In the light of that disclosure, it would be obvious for the skilled person to use a measure of performance instead of the recovery level in order to detect a track squeeze error condition as in feature (a). The skilled person would also recognise that such an option would be less dependent on the error reporting of the disk.
With regard to the question of which performance metric to use, before the priority date of the present application it was standard practice to use time series of performance metrics and their evolution for error prediction. The skilled person would for example be aware of the Self-Monitoring, Analysis and Reporting Technology (SMART) disclosed in document D5, which provides a series of attributes or diagnostics to signal various types of disk-drive failures (page 2, section "S.M.A.R.T Features", page 3, first text paragraph). As acknowledged in the present application, SMART counters were "typically maintained by most modern commodity disk drives" before the priority date (see column 5, line 56 to column 6, line 1 of the A publication). The skilled person would therefore consider using one or more of such measurements to implement the method of document D1.
As shown in the non-exhaustive list of SMART attributes of document D5, typical attributes include data throughput performance, seek error rate, and seek time performance (see page 3, section "How Attributes Are Determined"). Some of those attributes (e.g. seek error rate) are statistical measurements based on multiple I/O requests. The skilled person understands that those measurements are also taken during normal execution of user applications. It would be obvious for the skilled person to choose one of those performance metrics, thereby arriving at features (b) and (c).
The appellant did not contest that document D1 was the closest prior art and disclosed features in common with the claimed invention, but argued that it taught away from the claimed invention: it did not disclose or suggest doing anything in response to I/O requests during normal operation in order to determine and correct for track-squeeze errors, but instead proposed a regular scan to identify and correct them.
The Board does not agree with the appellant's argument. Document D1 does not describe the implementation in detail and does not include any passage teaching away from detecting track-squeeze errors during normal operation or from using performance metrics. It is common general knowledge that detection of such errors can be performed by regular scans, during normal operation or both. For that reason, the skilled person does not interpret document D1 as teaching away from error detection during normal operation. The idea of performing regular scans does not contradict using performance metrics either, since regular scans simply guarantee that all scanned tracks are read (or tested) regularly and that the performance metric is regularly calculated for each of the scanned tracks, independently of how often those tracks are accessed during normal operation.
5.5 Citing decision T 2/83 (OJ EPO 1984, 265), the appellant argued that the skilled person could but would not have applied measures to solve the problem by modifying the teaching of document D1, because that document was oblivious to the benefit of a manufacturer-independent implementation to track squeeze and document D5 did not identify such a problem.
The Board does not find this argument persuasive, because searching for solutions which are broadly applicable is a standard design principle, and to improve existing solutions to meet that principle is standard practice. At the date of priority of the present application the skilled reader recognised that the approach of document D1 depended on the error reporting of the disk. The problem of finding an implementation independent of the manufacturer, or of customised error reporting, was therefore not an as yet unrecognised problem which would itself give rise to patentable subject-matter. Besides, document D1 does not teach away from or contradict the idea of improving the method to depend less on the error reporting of the disk (see point 5.3 above). The present case is therefore different from that of T 2/83 (see reasons 6 to 9).
5.6 From the above reasoning, the Board concludes that the subject-matter of claim 1 does not fulfil the requirements of Article 52(1) EPC, because it lacks inventive step within the meaning of Article 56 EPC.
First to sixth auxiliary requests
6. Claim 1 of each of the first to sixth auxiliary requests differs from that of the main request essentially in that it further specifies the performance metric, and with respect to the second to fifth auxiliary requests the detecting step, as follows:
(AR1) the performance metric comprises at least one of
- a variance in I/O completion time;
- a raw read error rate;
- one or more SMART counter data values;
- sequential read throughput rate;
- timing of individual I/O requests; and
- indication of timeout of an I/O request;
(AR2) the elapsed time to perform an I/O request for the particular track is calculated, and it is determined that "the timing of the I/O request exceeds the threshold value or if a timeout of the I/O request has occurred";
(AR3) "the elapsed time values from a number of consecutive I/O requests for the particular track" are computed to obtain "a sequential read throughput rate", and it is determined "that the sequential read throughput rate is less than an expected throughput by a difference exceeding a predetermined threshold amount";
(AR4) "a variance in elapsed time required to perform the multiple I/O requests" is calculated and, in the detecting step, compared with a threshold value;
(AR5) "counter values including a raw read error rate from a disk drive [...] SMART data error detection mechanism" are obtained and it is determined that "the raw read error rate exceeds the threshold value";
(AR6) the performance metric comprises at least one of
- a variance in I/O completion time;
- one or more SMART counter data values;
- an average I/O throughput from consecutive I/O requests;
- timing of individual I/O requests.
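Purely as an illustration of what some of the metrics listed under (AR2) to (AR4) involve, they might be computed as in the following sketch. The function names, the choice of population variance, the units (milliseconds and bytes) and all numeric values are assumptions made here; the claims do not prescribe them.

```python
from statistics import pvariance


def completion_time_variance(elapsed_ms):
    """Variance in I/O completion time (population variance is an assumption)."""
    return pvariance(elapsed_ms)


def sequential_throughput(bytes_per_request, elapsed_ms):
    """Throughput in bytes per ms over a number of consecutive I/O requests."""
    return sum(bytes_per_request) / sum(elapsed_ms)


def throughput_shortfall_exceeds(actual, expected, threshold):
    """True if throughput is below the expected value by more than threshold."""
    return (expected - actual) > threshold


def timing_exceeds(elapsed_ms, threshold_ms, timed_out=False):
    """True if a single request took too long or timed out."""
    return timed_out or elapsed_ms > threshold_ms


# Hypothetical measurements for four consecutive reads of one track.
elapsed = [10.0, 10.0, 30.0, 30.0]
sizes = [4096, 4096, 4096, 4096]

variance = completion_time_variance(elapsed)
throughput = sequential_throughput(sizes, elapsed)
slow = throughput_shortfall_exceeds(throughput, expected=400.0, threshold=100.0)
late = timing_exceeds(elapsed[-1], threshold_ms=25.0)
```

Each function reduces to a comparison of a measured quantity with a threshold, which is the common structure of the detecting step across the requests.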
7. Inventive step - claim 1 of first to sixth auxiliary requests
7.1 The additional features of the auxiliary requests are not disclosed in document D1.
In its letter of reply to the Board's preliminary opinion, the appellant argued that its reasoning with regard to claim 1 of the sixth auxiliary request equally applied to each of the other requests. The same objective technical problem of manufacturer-independence was solved by claim 1 of each of the requests.
Similarly, the Board is of the opinion that claim 1 of each of the auxiliary requests solves the problem given above with regard to the main request of implementing the method of document D1 so as to be less dependent on customised error reporting capabilities in disk drives.
7.2 As explained with regard to the main request, SMART technology was widely used in disk drives at the priority date of the present application (see also column 5, line 56 to column 6, line 1 of the A publication of the present application). That technology, which is described in document D5, offers the possibility of choosing a series of attributes and thresholds to detect disk-drive failures (page 2, section titled "S.M.A.R.T Features"). It is also clear from document D5 that reliability-prediction technology based on attributes and thresholds was known before SMART (see page 1, section "The Evolution of S.M.A.R.T.").
The Board therefore agrees with the Examining Division that, as illustrated by document D5, at the date of priority of the present application it was common to monitor combinations of operational parameters and use them for failure prediction in hard-disk drives. It was also standard practice to use time series of performance metrics and their evolution for error prediction. In the Board's opinion, it was therefore standard practice to use the performance metrics listed under (AR1) to (AR6) above.
Furthermore, those performance metrics are equivalent or correspond to attributes listed on page 3 of document D5 or to the level of recovery of document D1:
- A variance in I/O completion time (AR1, AR4 and AR6) is related to a seek time performance.
- A raw read error rate (AR1 and AR5) is equivalent to a seek error rate and directly related to the occurrence of some level of recovery of data as used in document D1.
- One or more SMART counter data values (AR1, AR5 and AR6) are clearly disclosed on that page of document D5.
- A sequential read throughput rate or average I/O throughput (AR1, AR3 and AR6) is equivalent to a data throughput performance measure.
- Each of the performance metrics of (AR1), (AR2) and (AR6) - timing of individual I/O requests, elapsed time of I/O request and indication of timeout of an I/O request - is closely related to the occurrence of a high level of recovery as used in document D1.
Faced with the above-mentioned problem, the skilled person would hence consider using one or more of those attributes and respective thresholds commonly used for failure prediction in disk drives.
Furthermore, the skilled person would recognise that even though the SMART attributes can be customised for specific drive models (see e.g. D5, page 2, second full text paragraph), they are widely used by most modern disks (as the application explains) and less dependent on the error reporting scheme of the disk drive than the solution of document D1. The skilled person would therefore also consider directly using SMART counters.
7.3 With the grounds of appeal, the appellant argued that since document D1 was concerned with performing a regular scan, while the attributes disclosed in document D5 were disclosed as being part of a process to indicate when to start a backup procedure, it would not be obvious for the skilled person to combine any of the particular performance metrics as defined in document D5 with the process disclosed in document D1 to arrive at the invention claimed in any of the first to fifth auxiliary requests.
The Board does not find that argument convincing, because the adoption of SMART is not restricted to failures for which the corrective action is a backup procedure. In particular, it is clear from document D5 that SMART technology can be used for different types of predictable or unpredictable failures requiring different types of corrective actions (page 2, last five paragraphs). Furthermore, as explained for the main request, it would not be contrary to the principles of document D1 to modify the method to use performance metrics and detection during normal operation. The skilled person would be aware of the possibilities of performing error detection during normal execution in place of, or in addition to, regular scans. The advantages and disadvantages of both approaches were well known at the priority date.
7.4 From the above, the Board concludes that the person skilled in the art would, without the exercise of inventive skills, arrive at the invention of claim 1 according to each of the auxiliary requests. Consequently, none of the first to sixth auxiliary requests complies with Articles 52(1) and 56 EPC.
Conclusion
8. Since the subject-matter of claim 1 of each of the main request and first to sixth auxiliary requests is not inventive, the appellant's requests that the contested decision be set aside and that the case be remitted to the department of first instance with the order to grant, or for further prosecution on the basis of one of the requests, have to be refused. Rather, the Board concludes that none of the requests can serve as a basis for the grant of a patent and consequently that the appeal is to be dismissed.
For these reasons it is decided that:
The appeal is dismissed.