T 0799/19 (Cinematographic image interface/BLUE CINEMA) 11-10-2021
PROCESS FOR MAKING AN AUDIO-VISUAL INTERFACE
I. This appeal is against the decision of the examining division of 15 October 2018 to refuse European patent application No. 14723500.6. The application was refused for not fulfilling the requirements of Article 56 EPC having regard to the disclosure of:
D1: US 7 844 467
and the common general knowledge as illustrated by
D2: US 2011/007079,
D3: WO 2009/155926, and
D5: US 2011/304632.
The following document was also cited in the examination proceedings:
D4: WO 98/10584.
II. Notice of appeal was received on 30 October 2018 and the appeal fee was paid on 26 October 2018. The statement setting out the grounds of appeal was received on 14 February 2019. The appellant requested that the decision be set aside and that a patent be granted based on claims 1 to 4 on which the decision was based. Furthermore, the appellant requested oral proceedings as an auxiliary measure.
III. A summons to oral proceedings was issued on 29 April 2021. In a communication pursuant to Article 15(1) RPBA, sent on 23 July 2021, the board gave its preliminary opinion that claims 1 to 4 did not meet the requirements of Article 56 EPC in light of the disclosure of D1 and the common general knowledge as illustrated by D2 to D5.
IV. With a letter of response dated 8 September 2021, the appellant submitted a new set of claims 1 to 4 replacing the previous claims. The appellant also provided further arguments with respect to the requirements of Article 56 EPC.
V. Oral proceedings were held on 11 October 2021. The appellant submitted claims 1 to 4 of an amended sole request. It requested that the decision under appeal be set aside and that a patent be granted based on the claims of the amended sole request submitted during the oral proceedings. The decision of the board was announced at the end of the oral proceedings.
VI. Claim 1 according to the sole request reads as follows:
"Process for making an audio-visual interface which reproduces a complete interactive human being as a chosen subject (7), the process, comprising the steps of:
- interactivity analysis;
- image acquisition; and
- reproduction, through an optical or television system with voice, gesture, logic interaction capabilities,
wherein the step of interactivity analysis comprises the sub-steps of: using an optical apparatus which reproduces with three-dimensional effect the human being, equipped with a sensor/controller (3) capable of performing a voice recognition, able to interpret the human language, the interpretation of the human language allowing to identify the information semantics; triggering the appropriate event depending on the received voice command, communicating with a computer through an interface to connect external devices; perceiving the presence, absence and related status change, and a series of movements by a viewer/user (1), and
wherein the chosen subject (7) recites all provided segments, namely ACTION, and a series of non-verbal actions, namely IDLE, PRE-IDLE and BRIDGE, which provide the interface with a presence effect due to the simulation of the human behavior during the dialogue, the visual sequences to shoot being segmented according to an interactivity logic to obtain a sensation of talking with a human being, the logic sequences being divided into four main categories: 1) IDLE sequence (indifference status in which the interface "lives" waiting for a status change command); 2) PRE-IDLE sequence (a behavior which occurs before the action); 3) BRIDGE sequence (an image transition which helps giving visual continuity to the interface between the IDLE/PRE-IDLE status and the ACTION status); and 4) ACTION sequence (in which the interface performs the action or makes the action performed), wherein:
- the step of image acquisition provides for the use of a cinematographic technique comprising the sub-steps of: shooting the subject (7) with a high-definition digital shooting machine (5) with scanning interlaced in a vertical, instead of horizontal, position, on the digital shooting machine (5) oscillating tilt and shift optics being mounted to obtain field depth according to the Scheimpflug rule or condition, which states that, for an optical system, the focal plane generated by the objective and the focal plane of the subject (7) meet on the same straight line; obtaining images of the subject (7) with widespread and uniform illumination on the front to obtain a general opacity of the subject (7) and an illumination of well accurate side cuts and backlights to draw the subject (7) on its whole perimeter to increase the three-dimensionality effect and have well defined edges detached in an empty space, the subject (7) being under total absence of lighting in the areas surrounding his edge, thereby obtaining the "edge cutting" effect necessary for reproducing the images;
- the process further comprises the sub-steps of projecting the image from a LED monitor matrix (26) in a reverse vertical position on a plane made of transparent polycarbonate (28) placed at 45° with respect to the matrix (26);
- the step of reproduction comprises the sub-steps of: providing an optical reproduction system adapted, through the reflection of images generated by a LED monitor matrix (26) or other FULL HD source, to have the optical illusion of the total suspension of an image in an empty space; reflecting (28) the reproduced images on a slab made of transparent polycarbonate; providing a final image perceived by the viewer (1) on a vertical plane (30) behind the polycarbonate slab, completely detached from a background (32), with which a parallax effect (34) is formed; and
- the process comprises the step of activating the interface for controlling external peripherals outside the optical reproduction system namely domotic peripherals, DMX controlled lights systems for shows or events, a videoprojection of an audio-visual contribution on an external large screen or any other electric or electronic apparatus equipped with standard communication protocols, comprising the steps of:
providing a reproduction (IDLE) of the human being in a standby status;
detecting the presence of one or more users through an input of data provided by the volumetric sensor;
in case of negative input, namely no users are present, going back in a loop to the reproduction of the IDLE status;
in case of positive input, namely users are present, reproducing the PRE-IDLE sequence and the ACTION sequence with a request to make a control request of the peripheral, possibly showing a list of possible choices;
performing a verbal input by the user;
performing a semantic analysis of the sentence and a logic comparison, possibly executing updated commands by the peripheral;
if the execution is not possible, reproducing the ACTION sequence with a request of inputting again a command on the peripheral;
if the execution is possible, visually executing the BRIDGE sequence and the ACTION sequence in which the interface describes and confirms the execution of the current command and going back to the initial IDLE status;
simultaneously, providing an output through a suitable communication protocol with the peripheral, in which the peripheral is actuated and updated in its status."
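For ease of reading, the control loop defined by the final steps of claim 1 (IDLE standby, presence detection, PRE-IDLE/ACTION prompt, semantic analysis of the verbal input, execution or re-prompt, BRIDGE/ACTION confirmation and return to IDLE) may be summarised by the following schematic sketch in Python. All class, function and command names used below are illustrative assumptions only; they do not appear in the application and form no part of the claim or of this decision.

    # Schematic, illustrative sketch of the peripheral-control loop of claim 1.
    # All names and stub implementations are assumptions made for illustration.

    class VolumetricSensor:
        """Stub volumetric sensor detecting the presence of one or more users."""
        def users_present(self):
            return True  # positive input: users are present

    class Peripheral:
        """Stub external peripheral reached via a standard communication protocol."""
        def can_execute(self, command):
            return command is not None
        def send(self, command):
            print(f"peripheral <- {command}")  # peripheral is actuated and its status updated

    def play(sequence):
        """Stub reproduction of a visual sequence (IDLE, PRE-IDLE, BRIDGE, ACTION)."""
        print(f"[reproducing {sequence}]")

    def listen_to_user():
        """Stub for the user's verbal input."""
        return input("user> ")

    def semantic_analysis(sentence):
        """Stub semantic analysis: map a sentence to a peripheral command, or None."""
        known_commands = {"lights on": "DMX:ON", "lights off": "DMX:OFF"}
        return known_commands.get(sentence.strip().lower())

    def run_interface(sensor, peripheral):
        while True:
            play("IDLE")                               # standby reproduction of the subject
            if not sensor.users_present():             # negative input: loop back to IDLE
                continue
            play("PRE-IDLE")
            play("ACTION: request a control command for the peripheral")
            while True:
                sentence = listen_to_user()            # verbal input by the user
                command = semantic_analysis(sentence)  # semantic analysis and logic comparison
                if peripheral.can_execute(command):
                    break
                play("ACTION: ask the user to input the command again")  # execution not possible
            play("BRIDGE")                             # visual continuity towards the ACTION status
            play("ACTION: describe and confirm execution of the command")
            peripheral.send(command)                   # simultaneous output to the peripheral
            # loop back to the initial IDLE status

    if __name__ == "__main__":
        run_interface(VolumetricSensor(), Peripheral())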
Reasons for the Decision

1. Prior art
D1 discloses a user interface for a hand-held device in which a virtual agent based on computer-generated animated images and voice is presented to the user on the interface's display (see Figure 4A). The virtual agent is adapted to answer the user's vocal requests, allowing the user to interact with the device (see column 1, lines 22 to 24). D1 focuses in particular on the design of the virtual agent, which should appear as "human" as possible to the user both when talking and when listening (see column 2, lines 53 to 61). To this end, the prosody of the user's speech data is analysed by a server in order to control the movements of the virtual agent while it is listening and talking (see abstract; column 2, lines 62 to 65; column 3, lines 1 to 6; column 3, lines 15 to 18; column 4, lines 46 to 49; column 6, lines 36 to 44; column 8, lines 48 to 61). The virtual agent's facial movements and expressions are controlled by a server connected to the device on the basis of stored video recordings of human speakers reading texts (see column 11, lines 4 to 8 and 23 to 29).
D2 relates to a user interface displaying an animated avatar of the user on the interface's display. The animations applied to the avatar are updated based on captured data of the user when the user is in front of the screen, such as their gestures and voice patterns (see paragraphs [0124] and [0125]). A depth camera is used to capture the user's image (see paragraph [0034] and Figure 2).
D3 discloses a camera with oscillating tilt and shift optics for obtaining a field depth according to the Scheimpflug principle (see page 1, lines 25 to 29; page 4, lines 23 to 29; page 5, lines 13 to 19; page 8, line 2 to page 9, line 2; figures 2 and 4).
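By way of illustration only (this formalisation appears neither in D3 nor in the claim): the Scheimpflug condition requires that the image plane, the lens plane and the plane of sharp focus intersect in a single straight line. A commonly used complementary relation, the so-called hinge rule, states that the plane of sharp focus passes through a hinge line located at a distance

    J = f / sin(θ)

from the centre of the lens, where f is the focal length and θ is the tilt angle between the lens plane and the image plane.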
D4 relates to the generation and display of a 3D or holographic image (see abstract; page 10, lines 22 to 24; page 16, line 23 to page 18, line 18).
D5 discloses a user interface displaying an avatar which interacts with a user. It comprises a depth sensing camera used to detect the presence of the user in front of the interface (see paragraph [0015]).
2. It was common ground in the oral proceedings that, as in the impugned decision, D1 is to be considered the closest prior art for the subject-matter of claim 1.
The board agrees with point 9.1 of the decision that at least the following features A to D of claim 1 are not disclosed in D1:
A: - shooting the subject with a shooting machine with scanning interlaced in a vertical (instead of horizontal) position, on the digital shooting machine oscillating tilt and shift optics being mounted to obtain field depth according to the Scheimpflug rule or condition, which states that, for an optical system, the focal plane generated by the objective and the focal plane of the subject meet on the same straight line;
B: - obtaining images of the subject with widespread and uniform illumination on the front to obtain a general opacity of the subject and an illumination of well accurate side cuts and backlights to draw the subject on its whole perimeter to increase the three-dimensionality effect and have well defined edges detached in an empty space, the subject being under total absence of lighting in the areas surrounding its edge, thereby obtaining the "edge cutting" effect necessary for reproducing the images,
- projecting the image from an LED monitor matrix in a reverse vertical position on a plane made of transparent polycarbonate placed at 45° with respect to the matrix, the step of reproduction comprising the sub-steps of:
- providing an optical reproduction system adapted, through the reflection of images generated by an LED monitor matrix or other FULL HD source, to have the optical illusion of the total suspension of an image in an empty space,
- reflecting the reproduced images on a slab made of transparent polycarbonate,
- providing a final image perceived by the viewer on a vertical plane behind the polycarbonate slab, completely detached from a background, with which a parallax effect is formed;
C: - detecting the presence of one or more users through an input of data provided by the volumetric sensor;
D: - if the execution is not possible, reproducing the ACTION sequence with a request of inputting again a command on the peripheral.
Moreover, the board agrees with the appellant that the following feature is not disclosed in D1:
E: - activating the interface for controlling external peripherals outside the optical reproduction system, namely domotic peripherals, DMX controlled light systems for shows or events, a videoprojection of an audio-visual contribution on an external large screen or any other electric or electronic apparatus equipped with standard communication protocols.
In this respect, D1 discloses that the device provided with the interface may be connected to a server through a network (see column 8, lines 28 to 37). The server processes speech data received from the device in order to assist the device in generating the voice and content of the virtual agent (see figure 4A and column 8, line 38 to column 9, line 18; figure 4B and column 9, lines 32 to 52). Therefore, the server in D1 has to be considered as a part of the optical reproduction system generating the interface of D1 and not as an external peripheral controlled by the device supporting this interface.
3. As regards feature A, the decision states in point 9.2 that it improves the quality of the captured image by applying the common general knowledge of the skilled person, as illustrated inter alia by D3. However, the image of the virtual agent displayed on the interface of D1 is not a captured image of a human actor but rather a computer-generated image based on facial features extracted from video recordings of a human actor. Thus, even if it were assumed that feature A was in itself a common measure, the board holds that its use in the system of D1 would not be obvious to the skilled person, in view of the above-mentioned technical difference between the virtual agent images in D1 and those in claim 1.
4. As regards feature B, the decision asserts in point 9.2 that it only defines how information is presented to the user and thus has no technical effect other than its mere implementation, which is in itself straightforward for the skilled person. However, even if it were accepted that obtaining and projecting an image as defined by feature B is an obvious measure, and that the image provided by feature B is only a different presentation of a virtual agent to the user with no technical effect, the board holds that implementing the image-obtaining and image-projecting method of feature B in D1's system for displaying a virtual agent would necessitate structural modifications of the device supporting the interface of D1 which exceed the general design capabilities of the skilled person.
5. As regards feature E, the board notes that D1 contains no hint that the virtual agent may be used as an interface for controlling peripherals external to the device supporting the interface, such as those listed in claim 1. The virtual agent is described throughout D1 as a computer-generated image which simulates a person listening and responding to a user of the interface. The only connection of the device supporting the interface with an external entity is the connection with a server (144A in Figure 4A and 144B in Figure 4B) which assists the device in generating the virtual agent's behaviour. None of the other cited documents D2 to D5 relates to the control of an external peripheral by an interface simulating a human agent.
6. For these reasons, the board holds that the combination of at least features A, B and E with the prior art of D1 would not be considered by the skilled person without the use of hindsight. Thus, the subject-matter of claim 1 involves an inventive step, having regard to the cited prior art. Claims 2 to 4 are dependent claims and, as such, also meet the requirements of Article 56 EPC.
Order
For these reasons it is decided that:
1. The decision under appeal is set aside.
2. The case is remitted to the examining division with the order to grant a patent based on claims 1 to 4 of the amended sole request submitted during the oral proceedings before the board, with the description and figures to be adapted.