Real-time Fine-Grained 3D Hand Gesture Recognition @CVPRW23

We are proud to share that our work “OO-dMVMT: A Deep Multi-view Multi-task Classification Framework for Real-time 3D Hand Gesture Classification and Segmentation” will be presented at the Computer Vision for Mixed Reality workshop at CVPR 2023! Our approach, OO-dMVMT, achieves state-of-the-art performance on fine-grained online gesture recognition and segmentation, and introduces a novel learning approach that we call “on-off learning”.

Fine-grained online gesture recognition is a fundamental topic for Augmented Reality and Virtual Reality applications, since it enables a “natural” way of interaction between humans and computers. Here, “online” means that gestures occur within a continuous stream of hand motion, surrounded by non-meaningful movements called “non-gestures”.

The current generation of AR and VR headsets (e.g. Meta Quest 2, HoloLens 2) features accurate hand tracking that captures finger poses, providing an effective means of interaction. Because these devices track hands in real time, they make it possible to recognize mid-air gestures.

To achieve natural and reliable AR interaction, two metrics are particularly important in gesture recognition: accuracy and the number of false positives. False positives in particular can trigger unwanted actions, making the interaction frustrating for the user.

With our approach, OO-dMVMT, we adopt the Multi-View Multi-Task (MVMT) paradigm, in which multiple feature sets are processed in parallel and multiple tasks are trained on those features. To improve gesture segmentation, and with it the fine-grained classification, we introduced a regression branch that predicts whether a gesture is starting or ending. However, this branch cannot be applied to all data points: it has nothing to predict for those that neither start nor end a gesture (e.g. non-gestures or the middle of a gesture).
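To make the limitation concrete, here is a minimal sketch of how start/end targets could be derived for a sliding window of frames. It is not the code from our repository: the function name `boundary_targets`, the convention that label `0` marks a non-gesture frame, and the encoding of a boundary as a fraction of the window length are all assumptions made for illustration. The sketch also assumes that two gestures are always separated by at least one non-gesture frame.

```python
from typing import List, Optional, Tuple


def boundary_targets(
    labels: List[int], window_start: int, window_size: int
) -> Tuple[Optional[float], Optional[float]]:
    """Return (start_target, end_target) for one window of per-frame labels.

    Assumption: label 0 means "non-gesture", any other value is a gesture class.
    A target is the relative position (0..1) of the first boundary found in the
    window, or None when the window contains no such boundary.
    """
    window = labels[window_start:window_start + window_size]
    # Label of the frame just before the window (non-gesture if there is none).
    prev = labels[window_start - 1] if window_start > 0 else 0

    start: Optional[float] = None
    end: Optional[float] = None
    for i, lab in enumerate(window):
        if lab != 0 and prev == 0 and start is None:
            start = i / window_size  # a gesture begins at frame i
        if lab == 0 and prev != 0 and end is None:
            end = i / window_size    # a gesture ended just before frame i
        prev = lab
    return start, end


# Hypothetical annotation: one gesture of class 3 spanning frames 2..4.
frames = [0, 0, 3, 3, 3, 0, 0, 0]
print(boundary_targets(frames, 0, 4))  # window contains the gesture start
print(boundary_targets(frames, 4, 4))  # window contains the gesture end
print(boundary_targets(frames, 3, 2))  # middle of the gesture: no target
```

Windows that fall entirely inside a gesture or inside a non-gesture yield `None` for both targets, which is exactly the case a start/end regression branch cannot be trained on.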

To cope with this problem, we introduced a novel multi-task learning paradigm named “On-Off multi-task learning”. During training, given a multi-task network whose heads address different tasks, we activate (turn on) the heads that can work with a given data point and deactivate (turn off) those that cannot. This results in improved fine-grained classification and segmentation, achieving state-of-the-art results on both accuracy and false positives while maintaining real-time latency.
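The on-off idea can be sketched as a masked multi-task loss. The snippet below is only an illustration in plain Python, not the implementation from the paper: the function name `on_off_multitask_loss`, the use of `None` to mark a head that is switched off, and the `reg_weight` parameter are assumptions, and the per-sample loss values are made up.

```python
from typing import Optional, Sequence


def on_off_multitask_loss(
    cls_losses: Sequence[float],
    reg_losses: Sequence[Optional[float]],
    reg_weight: float = 1.0,
) -> float:
    """Combine per-sample losses of two heads, skipping "off" samples.

    cls_losses[i]: classification loss of sample i (this head is always on).
    reg_losses[i]: regression loss of sample i, or None when the sample is
                   neither starting nor ending a gesture (head turned off).
    """
    if len(cls_losses) != len(reg_losses):
        raise ValueError("both heads need one entry per sample")
    if not cls_losses:
        raise ValueError("empty batch")

    cls_term = sum(cls_losses) / len(cls_losses)

    # Keep only the samples for which the regression head is "on".
    active = [loss for loss in reg_losses if loss is not None]
    # Average over the active samples only; "off" samples contribute nothing.
    reg_term = sum(active) / len(active) if active else 0.0

    return cls_term + reg_weight * reg_term


# Hypothetical batch of three samples; the second one is a non-gesture,
# so its regression head is turned off.
batch_cls = [0.5, 1.5, 1.0]
batch_reg = [0.25, None, 0.75]
total = on_off_multitask_loss(batch_cls, batch_reg, reg_weight=2.0)
print(total)  # 2.0
```

In a deep learning framework the same effect would typically be obtained with a per-sample boolean mask applied to the head's loss, so that no gradient flows through a head for samples where it is off.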

We have also developed a real-time HoloLens 2 demo, which will be presented at CVPR23 during the Computer Vision for Mixed Reality workshop and will shortly be integrated into the ICE Laboratory at the University of Verona!

Code: https://github.com/intelligolabs/OO-dMVMT

Paper: https://arxiv.org/abs/2304.05956
