T 1911/10 of 3.6.2015

European Case Law Identifier: ECLI:EP:BA:2015:T191110.20150603
Date of decision: 03 June 2015
Case number: T 1911/10
Application number: 07252717.9
IPC class: G06K 9/00, G06F 3/01
Language of proceedings: EN
Distribution: D
Versions: Unpublished
Title of application: Gesture recognition simulation system and method
Applicant name: Northrop Grumman Systems Corporation
Opponent name: -
Board: 3.4.01
Headnote: -
Relevant legal provisions:
European Patent Convention Art 52(1)
European Patent Convention 1973 Art 56
Keywords: Inventive step - after amendment
Catchwords:

-

Cited decisions:
-
Citing decisions:
-

Summary of Facts and Submissions

I. The appeal lies from the decision of the examining division refusing European patent application number 07 252 717.9.

II. The examining division refused the application because claim 1 of each of the requests on file at that time was held to lack inventive step. The following documents were cited inter alia:

D1: KORIDA, K. et al.: "An interactive 3D interface for a virtual ceramic art work environment"; Virtual Systems and Multimedia, 1997; VSMM '97 International Conference Proceedings; pages 227-234; XP010245649;

D9: SUTCLIFFE, A. et al.: "Presence, memory and interaction in virtual environments"; Int. Journal of Human-Computer Studies; vol. 62, pages 307-327; 17 January 2005;

D10: EP-A-1 223 537;

D11: LEIBE, B. et al.: "Towards Spontaneous Interaction with the Perceptive Workbench"; Virtual Reality; November/December 2000; pages 54-65.

III. With the statement setting out the grounds of appeal, the appellant requested that the decision under appeal be set aside and that a patent be granted on the basis of one of three sets of claims filed with the statement of grounds and forming the basis of a main request and first and second auxiliary requests.

As a precautionary measure, oral proceedings were requested.

IV. In a communication of the Board issued in preparation for oral proceedings, objections were raised against the independent claims of the requests on file under Article 123(2) EPC and Articles 83 and 84 EPC 1973. In addition, the question of inventive step was addressed for those independent claims which could be sufficiently understood.

The Board referred in its communication to D9 and to

D3: SATO, Y. et al.: "Real-Time Input of 3D Pose and Gestures of a User's Hand and Its Applications for HCI"; Proceedings IEEE Virtual Reality 2001; Yokohama, Japan; 13 March 2001; pages 79-86.

V. In response to the Board's communication, by letter of 6 May 2015, the appellant filed four new sets of claims forming the basis of a new main request and first, third and fourth auxiliary requests. The second auxiliary request filed with the statement setting out the grounds of appeal was maintained.

VI. During the oral proceedings the appellant withdrew all previous requests and filed a single set of claims 1 to 7, requesting that a patent be granted on the basis of this single claim set.

VII. Claim 1 reads as follows:

"A method of interacting with a simulated object (360, 410), the method comprising:

generating a three-dimensional image of the simulated object (360, 410) having a functional component (412), wherein the functional component (412) is a portion of the simulated object (360, 410) with which simulated interaction can be performed;

illuminating a retroreflective background surface (210, 356, 406) with a first light source (206, 354, 404) and a second light source (208, 354, 404), wherein the first light source is coupled to a first camera (202, 352) and the second light source is coupled to a second camera (204, 352) and wherein light from the first light source is reflected directly back to the first camera and light from the second light source is reflected directly back to the second camera;

performing an input gesture at or near a location of the functional component (412), wherein the input gesture is formed from a six-degree of freedom motion based on changes in three-dimensional location and orientation of the sensorless input object (212), the sensorless input object being a hand of a user;

generating, by the first camera, a first plurality of images associated with the sensorless input object (212) based on a reflected light contrast between the sensorless input object (212) and the illuminated retroreflective background surface (210, 356, 406) caused by the first light source (206, 354, 404);

generating, by the second camera, a second plurality of images associated with the sensorless input object (212) based on a reflected light contrast between the sensorless input object (212) and the illuminated retroreflective background surface (210, 356, 406) caused by the second light source (208, 354, 404);

forming pairs of images, each pair of images comprising an image of the first plurality of images taken by the first camera and an image of the second plurality of images taken by the second camera at the same time;

determining, for each of the pairs of images, a three-dimensional shape and a physical location of the sensorless input object at a given time based on a relative parallax separation of the respective pair of images;

determining changes in a three-dimensional shape and/or a physical location of the sensorless input object (212, 364) based on the determined three-dimensional shapes and the determined physical locations of the sensorless input object at the given times;

determining the input gesture based on the determined changes in the three-dimensional shape and/or the physical location of the sensorless input object (212);

accessing an object library (20) that is configured to store data associated with a plurality of simulated objects (360, 410) including three-dimensional image information, information associated with the functional component (412) of the respective simulated object (360, 410), a predefined action associated with the functional component (412), and a predefined gesture associated with the predefined action for the functional component;

accessing a gesture library (22) that is configured to store a plurality of universal gestures applicable to any of the stored plurality of simulated objects (360, 410);

determining if the input gesture matches one of the universal gestures stored in the gesture library (22) or the predefined gesture associated with the predefined action for the functional component (412) stored in the object library (20); and

displaying a simulated action associated with the matched universal gesture or the predefined gesture on a portion of the simulated object (360, 410) associated with the functional component (412) so that an automatic rendering of the simulated action on the portion of the simulated object (360, 410) is caused."

Claims 2 to 7 are dependent claims.

Reasons for the Decision

1. The appeal is admissible.

2. The Board is satisfied that the current claims meet the requirements of Article 123(2) EPC and Article 84 EPC 1973. The invention is also sufficiently disclosed (Article 83 EPC 1973).

3. Inventive step

3.1 In the contested decision, the examining division presented arguments using either document D1 or D9 or D11 as the closest prior art for the independent claims of the various requests on file at that time. The current independent claim has been modified extensively compared to the independent claims on which the contested decision was based. The appellant considered that D1 represented the closest prior art for this new independent claim. The Board agrees with this finding.

D1 relates to the design and implementation of an interactive interface to support 3D object creation in a virtual environment. Virtual objects are presented stereoscopically through LCD shutter glasses just in front of the user and the user can manipulate the virtual objects directly by using his/her own hands which are fitted with two-handed instrumented gloves and 6-degree of freedom electromagnetic (EM) sensors. Data sampled from the gloves enables static postures and temporally varied hand shapes to be recognised. Data sampled from EM sensors enables movement patterns and current hand position to be determined. The thus-equipped hands can be used to provide functions such as dynamic gesture expressions, position and size indication of the virtual objects and spatial manipulations such as translations, rotations and deformations.

3.2 Having regard to the other prior art cited in the contested decision, the Board notes that D9 concerns the use of pinch gloves to provide input commands to a virtual environment. In this respect the system of D9 is further removed from the subject-matter of current claim 1 since the shape of the hand is not employed to identify control gestures: instead it is the contact between the forefinger and thumb of the pinch glove which is registered.

D11 involves the use of shadow architecture to track the movement of a hand in front of a surface. Pointing gestures are recognised by complementing the shadow monitoring with a side-viewing camera enabling the spatial position of the hand as well as the pointing direction to be determined. Although D11 discloses a camera-based gesture recognition system, the image processing is so different from that of current claim 1 that it does not constitute a suitable starting point.

3.3 Starting from the disclosure of D1, the skilled person is faced with the inconvenience of a cumbersome glove-based tracking system which may restrict the user's motion due to the physical connection of the sensors to their controllers. The technical problem to be solved with respect to D1 is therefore to provide an alternative mechanism by which hand gestures may be recognised which is less cumbersome to use.

3.4 This problem is solved by providing first and second cameras to generate a plurality of pairs of images, the images of each pair having been taken by the respective first and second cameras at the same time. The cameras are each associated with a respective first and second light source which illuminates the user's hand against a retroreflective background such that light is reflected back to the respective camera. The cameras are arranged such that the 3D shape and location of the user's hand at a given time can be determined on the basis of the parallax separation of the respective pair of images. An input gesture, which is formed from a six degree of freedom motion, is determined from changes in the shape and location of the hand at the given times. Once the input gesture is established, it is determined whether the input gesture matches a "universal gesture" stored in a gesture library or a "predefined gesture" stored in an object library. The object library stores data associated with a plurality of simulated objects. This information includes 3D image information, information associated with the functional component of the simulated object and a predefined action associated therewith. Moreover, the object library includes correspondence information which enables a "predefined gesture" to be mapped to a predefined action to be performed on the functional component (e.g. unscrewing a screw of a housing, pulling the trigger of a gun). The gesture library stores a plurality of "universal gestures" which are applicable to any of the stored plurality of simulated objects and can be mapped to an action which is to be performed on the entire simulated object (e.g. rotation or translation of the entire object). The simulated action corresponding to the matched "universal gesture" or "predefined gesture" is then displayed causing an automatic rendering of the simulated action on the virtual object.
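By way of illustration only, the relationship between parallax separation and physical location can be sketched as follows. The decision does not specify the calculation used; the sketch assumes two rectified pinhole cameras with parallel optical axes, for which depth is inversely proportional to the horizontal offset (disparity) of the same point in the two simultaneous images. The function name and all parameters are hypothetical.

```python
def depth_from_parallax(x_first_px: float, x_second_px: float,
                        focal_length_px: float, baseline_m: float) -> float:
    """Estimate the distance of a point from a rectified stereo camera pair.

    Assumption (not taken from the decision): both cameras are ideal pinhole
    cameras with the same focal length, separated horizontally by baseline_m.
    """
    # Parallax separation: horizontal offset of the same point in the image
    # pair taken at the same time by the first and second cameras.
    disparity_px = x_first_px - x_second_px
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Similar triangles: depth = focal length * baseline / disparity.
    return focal_length_px * baseline_m / disparity_px


# A fingertip seen at column 420 by the first camera and column 380 by the second.
print(depth_from_parallax(420.0, 380.0, focal_length_px=800.0, baseline_m=0.1))
```

Repeating the estimate for many points on the silhouette of the hand would give a three-dimensional shape as well as a location, which is the information the subsequent gesture determination relies on.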

3.5 This arrangement obviates the need to use any type of sensor attached to the user's hand. Furthermore, the provision of two libraries, one including information relating to the gestures associated with the simulated object as a whole (the "universal gestures") and one including information relating to the gestures associated with a specific functional component of the simulated object (the "predefined gestures"), means that storage space may be reduced and the speed at which the gestures can be determined may be improved.
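The two-library arrangement can be pictured with the following minimal sketch. It is an assumption-laden illustration, not the applicant's implementation: the data layout, the gesture names ("rotate", "twist") and the order in which the libraries are consulted are all hypothetical. Only the division into a gesture library of universal gestures and an object library of component-specific predefined gestures is taken from claim 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FunctionalComponent:
    predefined_gesture: str  # gesture tied to this component (hypothetical name)
    predefined_action: str   # action simulated when that gesture is matched


@dataclass
class SimulatedObject:
    components: Dict[str, FunctionalComponent] = field(default_factory=dict)


# Gesture library: universal gestures, stored once and valid for every object.
GESTURE_LIBRARY: Dict[str, str] = {
    "rotate": "rotate entire object",
    "translate": "translate entire object",
}

# Object library: per-object data, including component-specific gestures.
OBJECT_LIBRARY: Dict[str, SimulatedObject] = {
    "housing": SimulatedObject({"screw": FunctionalComponent("twist", "unscrew screw")}),
}


def match_gesture(input_gesture: str, object_id: str, component_id: str) -> Optional[str]:
    """Return the simulated action for the input gesture, or None if nothing matches."""
    # Assumed order: universal gestures first, then the component's own gesture.
    if input_gesture in GESTURE_LIBRARY:
        return GESTURE_LIBRARY[input_gesture]
    component = OBJECT_LIBRARY[object_id].components.get(component_id)
    if component is not None and component.predefined_gesture == input_gesture:
        return component.predefined_action
    return None


print(match_gesture("twist", "housing", "screw"))
print(match_gesture("rotate", "housing", "screw"))
```

Because the universal gestures are held once rather than repeated in every object entry, each object entry needs to carry only the gestures specific to its functional components.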

3.6 The use of camera-based systems to identify input gestures in a human-computer interface is known from D3. In particular D3 discloses a method for tracking a user's hand in three dimensions and recognising the hand's gesture without the use of any invasive devices attached to the hand. The user's hand is observed by two cameras which are placed facing the centre of the workspace in which the user's hand is to be moved. The cameras are located such that the images are taken from different locations and the 3D position of the hand may be determined by triangulation. By identifying characteristic points in the images of the hand, the orientation thereof may be determined. Various static hand shapes, or "gestures", are recognised by the pattern recognition software and mapped to specific control actions which are used to manipulate a computer generated object.

The Board notes that the gesture recognition performed in D3 is therefore more limited than the gesture recognition performed in D1. Since the camera-based system of D3 cannot provide the full functionality of the glove-based system of D1, it would not be obvious to the skilled person to replace the gloves and EM sensors of D1 by the camera-based system of D3.

Moreover, the gesture recognition performed by the camera-based system of D3 lacks a number of details which are now defined in claim 1 of the present application. Specifically, in claim 1 the shape and location of the hand are determined from the parallax between two images taken by different cameras at the same time. The changes in the shape and location of the hand at given times are used to identify the input gesture. Thus, gestures associated with motion of the hand can be identified, as opposed to the merely static gestures identified in D3.
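The distinction between a static hand shape and a gesture derived from changes over time can be illustrated by the following sketch. It assumes, purely for illustration, that the hand is summarised at each given time by a six-value pose (three location coordinates and three orientation angles) and that a fixed threshold separates motion from measurement noise; neither the pose representation nor the threshold comes from the application. Angle wrap-around is ignored for brevity.

```python
from typing import List, Sequence, Tuple

# Hypothetical six-degree-of-freedom pose: (x, y, z, roll, pitch, yaw).
Pose = Tuple[float, ...]


def pose_changes(poses: Sequence[Pose]) -> List[Pose]:
    """Differences between each pose and the one determined at the previous time."""
    return [
        tuple(curr - prev for prev, curr in zip(earlier, later))
        for earlier, later in zip(poses, poses[1:])
    ]


def is_motion_gesture(poses: Sequence[Pose], threshold: float = 1e-3) -> bool:
    """True if any location or orientation value changes by more than the threshold."""
    return any(
        abs(delta) > threshold
        for change in pose_changes(poses)
        for delta in change
    )


held_still = [(0.0, 0.0, 0.5, 0.0, 0.0, 0.0)] * 3
twisting = [(0.0, 0.0, 0.5, 0.0, 0.0, 0.0),
            (0.0, 0.0, 0.5, 0.0, 0.0, 0.3),
            (0.0, 0.0, 0.5, 0.0, 0.0, 0.6)]
print(is_motion_gesture(held_still), is_motion_gesture(twisting))
```

A recogniser limited to static hand shapes would treat both sequences alike if the hand shape were the same in every frame, whereas the sequence of changes separates them.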

Furthermore, the input gestures in D3 are used to control only "universal" actions. There is no suggestion in D3 that actions associated with a specific functional component of the simulated object may be simulated. Thus there is no disclosure in D3 of the step of accessing an object library which includes information associated with a functional component and a predefined action associated with that functional component.

These features are also not disclosed in any of the remaining prior art documents cited during the examination procedure.

3.7 It is therefore not obvious to modify the method of D1 in a manner which would lead to the method currently claimed. The subject-matter of claim 1 therefore involves an inventive step (Article 52(1) EPC, Article 56 EPC 1973).

Order

For these reasons it is decided that:

1. The decision under appeal is set aside.

2. The case is remitted to the examining division with the order to grant a patent on the basis of claims 1 to 7 filed during the oral proceedings of 3 June 2015 and a description and figures to be adapted, where appropriate.
