We are pleased to announce that "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation" has been accepted for oral presentation at IROS 2024! This work is a collaboration with Istituto Italiano di Tecnologia (IIT) and Fondazione Bruno Kessler.
If you want to learn more, the paper is on arXiv, and the dataset and code are available on the project page.
Idea and results:
Interacting with agents through natural language is a long-term goal of embodied AI as it is potentially the most intuitive mode for human-robot communication. The emerging research on Vision-and-Language Navigation (VLN) is along this path, aiming to develop embodied agents that, following a given instruction in the format of natural language, can reach a target destination in a 3D environment, e.g., “Exit the bedroom and turn left. Walk straight past the grey couch and stop near the rug.”
In real-world scenarios, however, instructions given by humans may contain errors when describing a spatial environment, due to inaccurate memory or confusion.
To this end:
– We categorize errors in the VLN-CE task and establish the first benchmark, R2RI-CE
– We show that state-of-the-art VLN-CE methods are not robust to instruction errors
– We formalize the task of Detection and Localization of Instruction Errors
– We propose a method, Instruction Error Detection & Localizer (IEDL)
– We use IEDL to discover errors in the ground truth annotations of the R2R-CE and RxR-CE datasets
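To give a flavor of how a benchmark with instruction errors can be constructed, here is a minimal, hypothetical sketch: it injects a direction error into a correct instruction and records the token span of the corrupted word, so that both detection (is there an error?) and localization (where is it?) can be evaluated. The error category and function names are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Illustrative direction-word swaps (one possible error category; the actual
# benchmark's error taxonomy and injection procedure may differ).
DIRECTION_SWAPS = {"left": "right", "right": "left"}

def inject_direction_error(instruction: str):
    """Swap the first direction word and return (perturbed, error_span).

    error_span is the (start, end) token index of the corrupted word,
    or None if no direction word was found (instruction left intact).
    """
    tokens = instruction.split()
    for i, tok in enumerate(tokens):
        word = re.sub(r"\W", "", tok.lower())  # strip punctuation for matching
        if word in DIRECTION_SWAPS:
            # Replace the direction word but keep trailing punctuation.
            tokens[i] = tok.lower().replace(word, DIRECTION_SWAPS[word])
            return " ".join(tokens), (i, i + 1)
    return instruction, None

perturbed, span = inject_direction_error(
    "Exit the bedroom and turn left. Walk straight past the grey couch."
)
print(perturbed)  # the instruction with "left." swapped to "right."
print(span)       # (5, 6): token index of the injected error
```

A detector would then be trained or evaluated to flag `perturbed` as erroneous and to localize the span `(5, 6)`, while unmodified instructions (where the function returns `None`) serve as error-free negatives.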
Method Overview