T 1629/18 (Face size / Samsung) 15-09-2022
Apparatus and method for detecting face
Inventive step - main request (no)
Amendment after summons - taken into account (no)
Oral proceedings: request for video conference
I. The appeal is against the decision of the Examining Division to refuse the application.
II. With the grounds of appeal the appellant requested that the decision of the Examining Division be set aside and that a patent be granted on the basis of a single (main) request, which was an amended version of the first auxiliary request underlying the impugned decision.
III. That request was refused for non-compliance with Articles 123(2), 83, 84 and 56 EPC. The decision cited the following documents:
D1: US 6658136 B1 and
D2: US 2006/126941 A1
IV. In a communication dated 11 November 2021, accompanying a summons to oral proceedings scheduled for 8 June 2022, the Board informed the appellant of its provisional opinion that the grounds for refusal under Article 123(2) EPC had been overcome by amendment and that the application complied with Article 83 EPC, but that the claims lacked clarity (Article 84 EPC) and that the claimed subject-matter lacked an inventive step (Article 56 EPC) starting from D1 in combination with common knowledge.
V. On 15 December 2021, the appellant requested "that the oral proceedings be held by video conference due to the ongoing covid pandemic and consequential travel restrictions". On 17 December 2021, the Board informed the appellant that a decision in this respect would be taken in early May 2022. On 6 May 2022, the Board informed the appellant that, to its knowledge, there were no longer any travel restrictions related to the pandemic and that there appeared to be no reason to change the initially chosen format of in-person oral proceedings. By letter of 1 June 2022, the appellant announced that it would not attend the oral proceedings, which were subsequently cancelled.
VI. With its submission of 12 May 2022, the appellant filed a new main request, replacing the one filed with the grounds of appeal, and two auxiliary requests.
VII. Claim 1 of the main request defines:
An apparatus for detecting a face (510) from left and right images (503, 504) acquired by a stereo camera (301), comprising:
a first distance image acquiring unit (302) to acquire a first distance image (505) of a scene with only a background, from a left background image (501) and a right background image (502) obtained from the camera;
a second distance image acquiring unit (303) to acquire a second distance image (506) of the same scene but with an object as well as the background, from a left background-and-object image (503) and a right background-and-object image (504) obtained from the camera, the second distance image differing from the first distance image due to the presence of the object;
and an object area detector (306) to detect an object area (509) of the object in one of the background-and-object images;
characterised by a distance difference image acquiring unit (304) to acquire a distance difference image (507) being the difference between the first distance image and the second distance image;
an object mask creator (305) to create an object mask (508) from the distance difference image (507), for use by the object area detector (306) to detect the object area (509) of the object by applying the mask to one of the background-and-object images;
a facial area detector (307) to detect the face (510) in the detected object area (509);
and a search window size determiner (401) to calculate the distance from the camera to the object from the distance difference image (507) and to determine a size for a search window for face detection scanning of the detected object area (509) based on the calculated distance to the object.
VIII. Claim 1 of the first auxiliary request reformulates that of the main request as follows:
An apparatus for detecting a face (510) from left and right images (503, 504) acquired by a stereo camera (301), comprising:
a first distance image acquiring unit (302) to acquire a first distance image (505) of a scene with only a background, from a left background image (501) and a right background image (502) obtained from the camera;
a second distance image acquiring unit (303) to acquire a second distance image (506) of the same scene but with an object as well as the background, from a left background-and-object image (503) and a right background-and-object image (504) obtained from the camera, the second distance image differing from the first distance image due to the presence of the object;
[deleted: and an object area detector (306) to detect an object area (509) of the object in one of the background-and-object images;]
characterised by a distance difference image acquiring unit (304) to acquire a distance difference image (507) being the difference between the first distance image and the second distance image;
an object mask creator (305) to create an object mask (508) from the distance difference image (507), [deleted: for use by the object area detector (306) to detect the object area (509) of the object by applying the mask to one of the background-and-object images];
and an object area detector (306) to detect an object area (509) of the object in one of the background-and-object images;
a facial area detector (307) to detect the face (510) in the detected object area (509);
and a search window size determiner (401) to calculate the distance from the camera to the object from the distance difference image (507) and to determine a size for a search window for face detection scanning of the detected object area (509) based on the calculated distance to the object.
IX. Claim 1 of the second auxiliary request adds to that of the first auxiliary request the following feature:
wherein the facial area detector (307) is configured
to scan the search window of the determined size over the detected object area to detect the facial area (510).
Format of oral proceedings
1. The Board summoned the appellant to oral proceedings in person, which is the default format for oral proceedings (G 1/21 r.45). The appellant requested a change to oral proceedings by video conferencing "due to the ongoing covid pandemic and consequential travel restrictions".
2. Around one month before the scheduled date, the Board informed the appellant that, to its knowledge, there were no longer any travel restrictions related to the pandemic. The appellant neither disputed this assertion nor advanced any other reason for changing the format of the oral proceedings to a video conference. The Board therefore had no reason to deviate from the initially chosen format.
The application
3. The application relates to a system and method for face detection (page 1), using stereo imaging to obtain distance images (see page 4, from line 9).
3.1 As a first step, the background (the scene without persons) is acquired (page 5, lines 10-14). During face detection, this background is subtracted to obtain a distance difference image, from which an object (person) mask is derived (page 5, line 15 to page 6, line 16). The mask is applied to one of the stereo images to set up the detection area (page 6, lines 17-25). A face detector is then applied in the detection area, with window sizes corresponding to the distance to the person and to the body ratio (page 6, line 26 to the end of page 7).
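For readers less familiar with distance-image processing, the first steps of this pipeline (background subtraction, mask creation and application of the mask to one of the stereo images) can be sketched as follows. This is a minimal illustration only, not the application's implementation: distance images are assumed to be 2D lists of depths in metres, and the threshold `THRESHOLD_M` and all function names are hypothetical.

```python
# Hypothetical sketch: distance images are 2D lists of depths in metres.
THRESHOLD_M = 0.3  # assumed minimum depth change for a pixel to belong to the object


def distance_difference(background, current):
    """Per-pixel difference between the background-only and the current distance image."""
    return [[b - c for b, c in zip(b_row, c_row)]
            for b_row, c_row in zip(background, current)]


def object_mask(diff, threshold=THRESHOLD_M):
    """True where something now lies closer to the camera than the background did."""
    return [[d > threshold for d in row] for row in diff]


def apply_mask(image, mask, fill=None):
    """Keep pixels of a (colour) image inside the mask; replace the others with `fill`."""
    return [[p if m else fill for p, m in zip(i_row, m_row)]
            for i_row, m_row in zip(image, mask)]


background = [[5.0, 5.0, 5.0], [5.0, 5.0, 5.0]]
current = [[5.0, 2.0, 5.0], [5.0, 2.1, 4.9]]  # a person stands about 2 m away
mask = object_mask(distance_difference(background, current))
print(mask)  # [[False, True, False], [False, True, False]]
print(apply_mask([["a", "b", "c"], ["d", "e", "f"]], mask))  # [[None, 'b', None], [None, 'e', None]]
```

The small depth change of 0.1 m at the bottom-right pixel stays below the assumed threshold and is therefore treated as background noise.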
3.2 This method reduces the computational time for detection in comparison with a standard image face detector that would need to consider several possible face sizes at every position in the image (see page 1 of the application).
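The saving described at point 3.2 can be illustrated by counting the candidate windows that a sliding-window detector has to evaluate. The sketch below assumes square windows, a fixed stride and hypothetical window sizes; none of these parameters is taken from the application.

```python
def window_count(width, height, sizes, stride=1):
    """Number of square sub-windows evaluated when every size in `sizes`
    is slid over a width x height area with the given stride."""
    total = 0
    for s in sizes:
        if s <= width and s <= height:  # windows larger than the area are skipped
            total += ((width - s) // stride + 1) * ((height - s) // stride + 1)
    return total


print(window_count(10, 10, [2, 4]))  # 130: several sizes at every position
print(window_count(10, 10, [4]))     # 49: one size derived from the distance
```

For a 10 x 10 pixel area, two window sizes require 81 + 49 = 130 evaluations, whereas a single window size requires 49; restricting the scan to the masked detection area reduces the count further.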
Main request: Article 13(2) RPBA 2020
4. This request is based on the previous main request, amended to clarify the feature of the determination of the window size. In the Board's view, this amendment adequately responds to the clarity objection first raised by the Board at point 5 of its communication accompanying the summons to oral proceedings. The Board considers that this constitutes an "exceptional circumstance" in the sense of Article 13(2) RPBA 2020 and decides to take this request into account.
Main request: inventive step
5. The Examining Division arrived at a conclusion of a lack of inventive step starting from document D1. The Board understands the disclosure of D1 (in the pertinent parts) as follows.
6. D1 teaches a method of detecting and tracking persons in an area using range images obtained by stereo imaging (summary, first paragraph; section 2.1). Like the current application, D1 uses background subtraction (section 2.3) to identify areas of interest in the distance images (foreground segmentation, section 2.5).
6.1.1 D1 uses an average background, i.e. the mean value of a "prescribed number of sequential range images" (section 2.2).
6.1.2 The foreground segmentation also imposes validity conditions on the areas of interest, i.e. they must satisfy an expected relationship between the number of pixels and the depth, so as to correspond with the "physical area" of the object of interest, e.g. a person.
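The validity condition can be expressed with the pinhole camera model, under which a length L at depth Z spans about f·L/Z pixels (f being the focal length in pixels), so that the pixel area of an object falls off with the square of its depth. The following is a minimal sketch under that assumption; the focal length, physical area and tolerance are hypothetical parameters, and D1 may formulate the condition differently.

```python
def expected_pixel_count(physical_area_m2, depth_m, focal_px):
    """Pinhole model: a length L at depth Z spans about focal_px * L / Z pixels,
    so an area scales with (focal_px / Z) ** 2."""
    return physical_area_m2 * (focal_px / depth_m) ** 2


def is_plausible_region(pixel_count, depth_m, focal_px, physical_area_m2, tolerance=0.5):
    """Accept a segmented region only if its pixel count is within a relative
    `tolerance` of the count expected for the assumed physical area at that depth."""
    expected = expected_pixel_count(physical_area_m2, depth_m, focal_px)
    return abs(pixel_count - expected) <= tolerance * expected


# Hypothetical numbers: 0.5 m^2 target, focal length 500 px, region at 2 m.
print(expected_pixel_count(0.5, 2.0, 500))        # 31250.0
print(is_plausible_region(30000, 2.0, 500, 0.5))  # True
print(is_plausible_region(2000, 2.0, 500, 0.5))   # False: far too small for that depth
```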
6.2 These areas of interest are used as a mask to identify the corresponding areas in the color images in which image-based person identification is carried out (section 2.6), preferably by color histogram identification.
7. Claim 1 thus differs from D1, firstly, in the use of a single background image instead of an averaged background model. This difference cannot support an inventive step, as a single-image background is a (simplified) version of the averaged model in which the "prescribed number" is 1. This simplification is obvious to the skilled person in the light of the usual trade-off between complexity and accuracy.
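That a single background image is a special case of the averaged model of D1 can be seen directly from the averaging step. The sketch assumes that the background is the per-pixel arithmetic mean of the prescribed number of range images, each given as a 2D list of depths; the function name is hypothetical.

```python
def average_background(range_images):
    """Per-pixel mean of a prescribed number of range images (2D lists of depths)."""
    if not range_images:
        raise ValueError("at least one range image is required")
    n = len(range_images)
    rows, cols = len(range_images[0]), len(range_images[0][0])
    return [[sum(img[r][c] for img in range_images) / n for c in range(cols)]
            for r in range(rows)]


frames = [[[4.0, 6.0]], [[6.0, 6.0]]]
print(average_background(frames))      # [[5.0, 6.0]]
print(average_background(frames[:1]))  # [[4.0, 6.0]]: with one image the model is that image
```

Increasing the number of frames smooths sensor noise at the cost of more acquisition and computation, which is the trade-off referred to at point 7.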
8. The other difference from D1 relates to the last claimed feature, i.e. using a face detector with a window size determined on the basis of the distance to the object.
8.1 Starting from D1, the Board agrees with the Examining Division that, given its ubiquity for such purposes, it would be obvious for the skilled person at least to attempt face recognition as a further, or alternative, method of person identification, so as to improve on the color histogram scheme proposed by D1. That the skilled person might think that not enough useful information is present to perform face recognition, as the appellant submits (statement of grounds, section 1.4.1, last paragraph), is a presumption which is not clearly derivable from D1; even if it were, the advances in imaging technology between the publication date of D1 (2003) and the priority date of the current application (2012) would at least alleviate such concerns.
8.2 When performing face recognition, the skilled person needs to detect the person's face and thus to set up detection windows in the image. This would be done in one of the color images, within the detection areas, in accordance with the teaching of D1 on the identification procedure ("color histogram").
8.2.1 Given the other teaching already present in D1 (section 2.5), linking the physical size of an object with the distance where it was detected and the corresponding number of pixels in the image, whereby detection areas are discarded if they are not of the expected size, the skilled person would also analogously restrict the possible sizes of the detection windows, for reasons of computational efficiency, to discard areas with sizes that cannot correspond to a human face.
8.2.2 The Board further notes that, even if D1 did not teach using the relationship between the object's physical size, its size in the image and its distance to discard potential window sizes, the skilled person is well aware of this relationship (see the discussion of the disclosure of D1 above) and would use it for that purpose, given that, unlike in standard imaging systems, distance information is available in D1.
8.2.3 Thus this feature is also obvious starting from D1.
9. The above analysis (points 6 to 8) was communicated to the appellant in the communication accompanying the summons to oral proceedings.
10. In response, the appellant argued in its letter of 12 May 2022, first, that D1 does not disclose an object mask creator as claimed. In particular:
"Section 2.5 of D1 discloses the processing of an image to identify smoothly varying areas. There is no suggestion within section 2.5 (or elsewhere) that the generation of a mask (cf a geographic map) can be applied to other images. Section 2.5 of D1 simply discloses the processing of an individual image",
and section 2.6 "solely relates to the identification of individuals within a (previously identified) segmented region: it does not relate to the identification of a region of interest".
11. The Board is of the opinion that the appellant misreads document D1 in this respect. The foreground segmentation (section 2.5 of D1) identifies foreground regions in the distance image ("identify regions having smoothly varying depth values" - column 10, lines 33-34). These foreground regions correspond to certain pixels both in the depth image (distance image) and the color image of D1 (background and object image). The person identification method of D1 is color based, hence uses the color image, and is only performed in the foreground regions ("identify persons or objects represented by the segmented regions" - column 11, lines 12-13). Thus the system of D1 identifies a set of pixels in the distance image, i.e. creates a mask - the foreground set(s) of pixels, and constrains the execution of the identification method to those sets of pixels in the color image, i.e. it applies the mask to the color image.
12. The appellant also argues in that letter that section 2.5 of D1 does not suggest using the distance to the object to determine the size of the search window, because
[s]etting up a search window size involves proactively specifying an expected size, rather than simply dismissing objects of an unsuitable size, and
proactively specifying a size for a search window considerably reduces the computational load.
Indeed the relationship between distance to an object and the size of the object is well known. However, what is not known is using this to set a search window size. As discussed above, proactively setting a search window size significantly reduces the computational load.
13. Both these arguments fail to convince because they do not consider the context of D1 and the objective technical problem posed to the skilled person when attempting face recognition in D1. The problem posed is that of setting windows for face detection and for subsequent recognition (see point 8.2 above). The classical paradigm (e.g. Viola-Jones face detection), as also explained in the current application (page 1, lines 15-16) is that "a sub-window having several sizes is moved over all areas of an image".
13.1 In D1, all areas mean all segmented regions, as explained above. The skilled person needs to decide which "several sizes" to consider based on the information available to him.
13.2 In D1, unlike in standard image processing, the skilled person knows the distance to the person, because the segmented regions are associated with certain depths, and knows, from common knowledge but also from section 2.5 of D1, that a relationship between distance and image size exists. The skilled person would therefore use the distance information already available in order to set the window sizes for face detection.
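The reasoning at point 13.2 can be made concrete: once the masked region yields the distance Z to the person, the pinhole relation fixes the expected face size in pixels at roughly f·S/Z for a physical face height S and a focal length f in pixels. The sketch below is illustrative only; estimating Z as the median depth of the masked pixels, the focal length of 500 pixels and the face height `FACE_HEIGHT_M` of 0.25 m are assumptions, not features of D1 or of the application.

```python
from statistics import median

FACE_HEIGHT_M = 0.25  # assumed physical face height, for illustration only


def object_distance(depth_image, mask):
    """Estimate the distance to the object as the median depth of the masked pixels."""
    depths = [d for d_row, m_row in zip(depth_image, mask)
              for d, m in zip(d_row, m_row) if m]
    if not depths:
        raise ValueError("mask selects no pixels")
    return median(depths)


def face_window_size(distance_m, focal_px, face_height_m=FACE_HEIGHT_M):
    """Expected face size in pixels from the pinhole relation size = f * S / Z."""
    return max(1, round(focal_px * face_height_m / distance_m))


depth = [[5.0, 2.5], [2.5, 5.0]]
mask = [[False, True], [True, False]]
z = object_distance(depth, mask)
print(z)                         # 2.5
print(face_window_size(z, 500))  # 50: a single window size instead of a multi-scale search
```

A practical detector might still scan a narrow band of sizes around this value to absorb depth noise; that refinement is likewise an assumption.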
14. The Board concludes therefore that the subject matter of claim 1 lacks inventive step starting from D1 in combination with common knowledge.
Auxiliary requests: Article 13(2) RPBA 2020
15. These requests were filed after the summons to oral proceedings and their admittance is therefore regulated by Article 13(2) RPBA 2020. The appellant did not point to any exceptional circumstances justifying the filing of these two new requests. The apparent reason for filing these requests is to more clearly differentiate the claimed invention from D1. However, in its preliminary opinion the Board merely confirmed the decision of the Examining Division as to a lack of inventive step starting from D1, so no exceptional circumstances can be derived therefrom.
16. Notwithstanding, the Board may find of its own motion the circumstances exceptional and admit these requests, for instance if they define at least prima facie allowable matter (see also T 1294/16 r18 and T 339/19 r1.3). This is not the case here: the newly introduced features appear to only explicitly define features implicitly taken into account for the discussion above (mask setting; scanning detector - see points 11 and 13 above).
17. Thus the Board decides not to take these requests into account (Article 13(2) RPBA 2020).
For these reasons it is decided that:
The appeal is dismissed.