|European Case Law Identifier:||ECLI:EP:BA:2005:T108303.20051019|
|Date of decision:||19 October 2005|
|Case number:||T 1083/03|
|IPC class:||G10L 3/00|
|Language of proceedings:||EN|
|Title of application:||Voice activity detector|
|Applicant name:||LG Electronics Inc.|
|Opponent name:||01. Telefonaktiebolaget LM Ericsson (publ); 02. Siemens AG|
|Relevant legal provisions:||
|Keywords:||Inventive step (yes)|
Summary of Facts and Submissions
I. The appeal was lodged by the opponent 01 (appellant) against the decision of the opposition division, dispatched on 8 September 2003, rejecting the oppositions of opponents 01 and 02 against European patent No. 0 719 439. The notice of appeal was received on 16 September 2003, the appeal fee being paid on the same day, and the statement setting out the grounds of appeal was received on 7 November 2003. In its response, the respondent contested the admissibility of the appeal (and of the opposition) since the appellant/opponent 01 had not been identified as required by Rule 64(a) EPC.
II. The opposition of opponent 01 had been filed against the patent as a whole based on Article 100(a) EPC for lack of inventive step (Articles 52(1) and 56 EPC).
III. Oral proceedings were held on 19 October 2005, attended by the appellant and the patentee (respondent).
IV. The appellant requested that the decision under appeal be set aside and the patent revoked.
Furthermore, apportionment of costs incurred due to the respondent's objection as to the admissibility of the appeal and the opposition, withdrawn only in the course of the oral proceedings, was requested.
V. The respondent requested that the appeal be dismissed and the patent maintained as granted (main request), or that the patent be maintained in amended form in accordance with the first to fifth auxiliary requests filed with letter of 14 May 2004.
Moreover, apportionment of costs incurred due to the appellant's late submission of evidence as to the identity of the appellant and opponent 01 only in the oral proceedings, as well as due to the late filing of an expert opinion, was requested.
VI. Reference was made to the following prior art documents:
E1: EP-A-0 548 054
E2: EP-A-0 392 412
S5: US-A-4 689 760
VII. Independent claim 1 of the patent as granted reads as follows:
"1. A voice activity detector for detecting the presence of speech in an input signal, comprising
(a) means for storing an estimate of the noise component of an input signal;
(b) means for recognising the spectral similarity of the input signal and the stored estimate to produce an output decision signal;
(c) means for updating the stored estimate;
(d) an auxiliary detector arranged to control the updating means so that updating occurs only when speech is indicated by the auxiliary detector to be absent from the input signal;
characterised by means operable to calculate a prediction gain parameter for the input signal and modifying means arranged to suppress updating in the event that the prediction gain exceeds a threshold value".
The independent claims of the auxiliary requests contain further limitations.
VIII. The appellant argued that the subject-matter of claim 1 as granted lacked an inventive step having regard to document E1, providing the closest prior art, and any one of documents E2 and S5, both suggesting the claimed solution. In particular, document E2 already suggested using the prediction gain itself for discriminating between noise and speech. Accordingly, it would have been obvious to the skilled person, in order to improve the noise model update accuracy in the voice activity detector of document E1, to add a further branch to the noise model update circuitry of document E1. Similarly, document S5 already taught that the prediction gain was suitable for discriminating between noise, tones and speech. Accordingly, the skilled person looking for a detection criterion to discriminate between noise and tones would add a further detector based on the prediction gain to the detector of document E1.
IX. The respondent submitted that, although it was agreed that document E1 disclosed a voice activity detector according to the preamble of claim 1 as granted, neither document E2 nor document S5 rendered the claimed solution obvious. Document E2 only considered the prediction gain as a parameter in combination with the prediction gain deviation, the signal power and the zero crossing number. There was no hint that the prediction gain alone would be suitable as a parameter. Document S5 was irrelevant for the problem of noise model update accuracy in a voice activity detector as it was concerned with another problem, namely the reliable detection of DTMF tones. Since the problem of signalling tones interfering with the noise model update in voice activity detectors was unknown so far, the idea of detecting tones in a voice activity detector would only be arrived at with hindsight.
Reasons for the Decision
1. Admissibility of the appeal
Concerning the identification of the appellant required by Rule 64(a) EPC the following has to be considered. The opposition of the opponent 01 was filed in the name of LM Ericsson with an address at 126 25 Stockholm, Sweden. The professional representative appointed was from the firm Hoffmann Eitle in Munich, Mr V. Frank. In a subsequent submission of the representative within the time limit for filing opposition, the opponent 01 was identified under the name Telefonaktiebolaget LM Ericsson (publ). In the board's opinion the opponent 01 was therefore identified in a sufficiently clear manner under the latter name. Indeed, the identity of the opponent 01 had not been questioned in the first instance opposition procedure.
The appeal was filed in the name of LM Ericsson, identified as opponent 01, by the same professional representative, Mr Frank. The appellant's address was not given.
The requirements of Rule 64(a) EPC are met if the notice of appeal provides sufficient information to identify the appellant and his address. It is established case law (see Case Law of the Boards of Appeal, 4th edition, VII.D.7.4.1) that an appellant and its address are sufficiently identified if, in the notice of appeal, the number of the contested patent and the name and address of the professional representative were the same as those cited in previous proceedings and the appellant was referred to as opponent in those proceedings. These conditions are fulfilled in the present case.
According to the appellant's submission of 4 August 2004, its address had changed to Telefonaktiebolaget LM Ericsson (publ), SE-16483, Stockholm, Sweden. At the oral proceedings the appellant filed evidence confirming this change. Thereupon, the respondent no longer maintained the objection concerning the admissibility of the opposition and the appeal.
In view of the above, the appeal is admissible.
2. Expert opinion
With letter of 30 September 2005 the appellant submitted an expert opinion by Prof. A. Kondoz dealing with some outstanding technical questions in the case in suit and providing an opinion on the issue of inventive step. The opinion was filed at a late stage of the proceedings and was not considered by the board to be essential to the decision to be taken; in any event, the appellant's representative still had sufficient opportunity to present its view on these technical questions and on the issue of inventive step in the oral proceedings. Accordingly, as requested by the respondent, the written opinion was not admitted into the proceedings in accordance with Article 114(2) EPC.
3. Main request
As far as the patentee's main request is concerned, the only contentious issue between the parties is whether the subject-matter of claim 1 as granted involves an inventive step (Articles 52(1), 56, 100(a) EPC).
3.1 It is undisputed that the subject-matter of the preamble of claim 1 of the patent as granted is known from document E1, providing the closest prior art.
In particular, document E1 (see figure 3 and corresponding description) discloses in the terms of claim 1 in suit:
a voice activity detector for detecting the presence of speech in an input signal, comprising
(a) means (15) for storing an estimate of the noise component of an input signal;
(b) means (7) for recognising the spectral similarity of the input signal and the stored estimate to produce an output decision signal;
(c) means for updating the stored estimate; and
(d) an auxiliary detector (20) arranged to control the updating means so that updating occurs only when speech is indicated by the auxiliary detector to be absent from the input signal.
3.2 The features of the characterising part of claim 1, providing means operable to calculate a prediction gain parameter for the input signal and modifying means arranged to suppress updating in the event that the prediction gain exceeds a threshold value, are not known from document E1. Accordingly, novelty of the subject-matter of claim 1 over document E1 is indeed provided.
These features, providing the difference over document E1, have the effect of avoiding the update of the noise model in case the prediction gain of the input signal exceeds a given threshold value. Accordingly, the objective problem to be solved in the present case having regard to E1 may be defined as (further) improving the accuracy of the noise model update.
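The distinguishing features can be illustrated with a minimal sketch. It is not taken from the patent or from document E1: it assumes that the prediction gain is computed as the ratio of frame energy to LPC residual energy via the Levinson-Durbin recursion, and the function names, the predictor order of 8 and the 10 dB threshold are hypothetical values chosen only for illustration.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def prediction_gain_db(frame, order=8):
    """Frame energy over LPC residual energy, in dB (Levinson-Durbin)."""
    r = autocorrelation(frame, order)
    if r[0] <= 0.0:
        return 0.0  # all-zero frame: nothing to predict
    a = [1.0] + [0.0] * order   # inverse filter coefficients, a[0] = 1
    err = r[0]                  # residual energy of the order-0 predictor
    floor = r[0] * 1e-12        # guard against numerical underflow
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err        # reflection coefficient of stage m
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m  # residual energy after stage m
        if err <= floor:
            err = floor
            break
    return 10.0 * math.log10(r[0] / err)


def may_update_noise_estimate(frame, speech_absent, threshold_db=10.0):
    """Feature (d) plus the characterising part of claim 1.

    speech_absent is the auxiliary detector's indication; threshold_db is
    an assumed value, not one given in the decision.
    """
    if not speech_absent:
        return False  # auxiliary detector indicates speech: no update
    # Modifying means: suppress updating when the gain exceeds the threshold.
    return prediction_gain_db(frame) <= threshold_db
```

Under these assumptions a strongly predictable frame, such as a pure tone, yields a high prediction gain and blocks the update even when the auxiliary detector indicates that speech is absent, whereas a frame of broadband noise yields a gain close to 0 dB and lets the update proceed.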
The formulation of the objective problem to be solved as identified above must be considered obvious, since increasing the accuracy of the noise model update as such is already addressed in document E1 (see page 8, lines 38 to 43), further improvements in this respect being obviously desirable. In particular, the device of document E1 relies on the noise model for comparison with the input signal being a fair representation of just noise for which eg the transmission in a mobile phone system is unwanted. Although noise model updating is necessary in order for the device to adapt to different noise environments, inaccuracies in the updating process, resulting in the noise model containing not just noise, lead to the device going "out of lock" and wrongly identifying following input signal frames.
According to the appellant, the objective problem to be solved rather resides in providing a measure for discriminating between tone-based signals and noise. However, in view of the above this problem is considered to be overly specific. In fact, claim 1 is not limited to the use of prediction gain as a discriminating parameter for tones.
3.3 Document E2, like the patent in suit, generally relates to apparatuses for increasing the efficiency of the transmission of speech data by suppressing the transmission during silent intervals only containing noise (see page 3, lines 3 to 11). The object of document E2 is to accurately discriminate between noise and speech intervals, so that the transmission is only suppressed for frames containing just noise. Previously used discrimination between noise and speech based on signal power and zero crossing number only is found to be inadequate to accurately detect the beginning and end of speech (see page 3, lines 36 to 39). Accordingly, it is suggested to base the discrimination between noise and speech on the prediction gain as an additional parameter and possibly on the absolute value of the prediction gain itself. In particular, in a fourth embodiment of document E2 (see page 8, line 22 to page 9, line 42 and figures 13, 14A, 14B) four parameters, input signal power, zero crossing number, prediction gain and prediction gain deviation, are used for discriminating between noise ("silent" frames) and speech ("voiced" frames). For frames with low signal power (P<Pth) and low zero crossing number (Z<Zth), and which show a small prediction gain deviation with respect to the previous frame (D<Dth), the absolute value of the prediction gain G is considered. Where the prediction gain is low (0<=|G|<=Gth) the frame is identified as containing noise. Where it exceeds the threshold value Gth, however, it is rated as either noise or speech depending on the state of the previous frame and the signal power. As an alternative, it is possible to first discriminate the speech/noise state from the prediction gain and then discriminate the speech/noise state from the prediction gain deviation when the speech state is discriminated by the first discrimination (see page 9, lines 36 to 38).
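The part of the fourth embodiment summarised above can be sketched as a sequence of tests. This is only an illustration of the branch described in this decision, not a reproduction of the flow charts of figures 14A and 14B of E2: the function name and return strings are hypothetical, and the branches for which no rule is given here are left explicitly undecided.

```python
def e2_fourth_embodiment_branch(P, Z, D, G, Pth, Zth, Dth, Gth):
    """Classify one frame following the branch summarised in point 3.3.

    P: signal power, Z: zero crossing number, D: prediction gain
    deviation from the previous frame, G: prediction gain.
    """
    if P < Pth and Z < Zth and D < Dth:
        # Only after the three preceding tests is |G| itself considered.
        if abs(G) <= Gth:
            return "noise"
        # A high |G| is not conclusive on its own.
        return "noise or speech (previous frame state and power decide)"
    return "not covered by this summary"
```

The sketch shows that the absolute value of the prediction gain is consulted only at the end of a chain of other tests, and that a value above Gth does not by itself settle the classification.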
In addition, it is not considered essential to use the four parameters (input voice signal power, zero crossing number, prediction gain and prediction gain deviation) for making the voice detection in the fourth embodiment. In particular, it is stated that, for example, only one of the input voice signal power and the zero crossing number may be used in a modification of the fourth embodiment.
Although document E2 states that "the absolute value of the prediction gain itself has a large value for the voiced signal and a small value for the noise" (see page 9, lines 24 to 25), it is apparent from the above embodiments where the prediction gain is used as a parameter, that a high prediction gain is not considered conclusive for discriminating between noise and speech. In particular, as follows eg from figures 14 A and B of E2 showing a discriminating operation as a whole, the absolute value of the prediction gain appears as one of a sequence of parameters to be tested in a certain chronological order as defined by the flow charts of these figures. Thus, the statement referred to above has to be read in this context and cannot be transferred in isolation into the context of document E1 without hindsight. Accordingly, in the board's opinion the skilled person would not be taught by this document to use a high prediction gain as criterion for suppressing the noise model update in the device of document E1.
This is not altered by the further statement in document E2 that generally the prediction gain has a large value when the input voice frame is voiced (ie speech) and a small value when the input voice frame is silent such as in the case of noise (see page 7, lines 53 to 55), since this statement is merely provided to explain the behaviour of the prediction gain deviation at the transition between noise and speech in the context of a third embodiment (see page 7, line 13 to page 8, line 21 and figures 11, 12) in which the prediction gain itself is not used as a discrimination parameter at all.
3.4 Document S5 discloses an apparatus for decoding dialling tones such as DTMF (Touchtone) tones from an incoming signal and, in particular, for reliably discriminating the dialling tones from speech and noise. The incoming signal is subjected to an LPC analysis in which PARCOR reflection coefficients RC(i), inverse LPC filter coefficients ai and residual energy coefficients IEG(i) are determined. The residual energy coefficients IEG(i) for i=1 to P (calculated by iteration from the normalised residual energy EN(1)) (see column 8, lines 7 to 64 and figure 5) represent the energy of the (normalised) residual/error signal at the output of the inverse filter, where P is the order of the LPC filter. In a preliminary test the coherency of the input signal is checked by considering the ratio IEG(0)/IEG(P), which corresponds to the ratio of the power of the input signal to the power of the output signal of the inverse LPC filter of order P and thus to the LPC prediction gain (see also column 6, line 62 to column 7, line 15 and figure 4). For silence (ie noise) the ratio IEG(0)/IEG(P=6) is small (<10 dB), for speech it is larger (10-15 dB) and pure tones produce a very high ratio (>= 25 dB) (see column 10, lines 4 to 10). If the ratio is below 15 dB the presence of dialling tones in the input signal frame is ruled out and no further analysis is performed (see figure 2 and corresponding description). Accordingly, although in document S5 the LPC prediction gain is used as a parameter, it is merely used to discriminate tones from noise and speech and not for discriminating noise from speech for the purpose of noise model update. Accordingly, in the opinion of the board the teaching of document S5 cannot have rendered the claimed solution obvious.
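The preliminary test of document S5 as summarised above can be sketched as follows. The dB figures are those quoted in this decision; the function names are hypothetical, and the range between 15 dB and 25 dB, which the summary does not characterise, is deliberately left open.

```python
import math


def coherency_ratio_db(ieg_0, ieg_p):
    """IEG(0)/IEG(P) in dB: input energy over order-P residual energy."""
    return 10.0 * math.log10(ieg_0 / ieg_p)


def typical_signal_class(ratio_db):
    """Map the ratio to the typical ranges quoted from S5, column 10."""
    if ratio_db < 10.0:
        return "silence (noise)"
    if ratio_db <= 15.0:
        return "speech"
    if ratio_db >= 25.0:
        return "pure tone"
    return "not characterised in this summary"


def passes_preliminary_tone_test(ratio_db):
    """Below 15 dB dialling tones are ruled out and analysis stops."""
    return ratio_db >= 15.0
```

Read this way, the 15 dB test separates possible tones from everything else; it makes no distinction between noise and speech, both of which fall below the cut-off.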
3.5 In view of the above, the subject-matter of claim 1 as granted is considered to involve an inventive step (Articles 52(1), 56, 100(a) EPC).
Claims 2 to 8 as granted are dependent on claim 1 and define additional features of the voice activity detector. The subject-matter of these claims, thus, involves an inventive step as well.
4. Apportionment of costs
The appellant requested that the costs incurred in respect of the respondent's objection as to the admissibility of the appeal be apportioned to the respondent (see point IV, supra).
The respondent requested that the costs incurred due to the belated submission by the appellant in the oral proceedings only of the decisive evidence and arguments as to the identity of the appellant and opponent in respect of the issue of the admissibility of the appeal and opposition (see point 1, supra) be borne by the appellant. Furthermore, apportionment of the costs incurred due to the late filing by the appellant of the expert opinion was requested.
In accordance with Article 104(1) EPC, each party to the proceedings shall meet the costs it has incurred unless, for reasons of equity, a different apportionment of costs would be appropriate. In the board's opinion, however, no undue burden was placed on either party in the present case, so that there is no reason to order a different apportionment.
For these reasons it is decided that:
1. The appeal is dismissed.
2. The requests for apportionment of costs are rejected.