We are extremely happy to share that our joint work with the IIT PAVIS group, titled “Leveraging commonsense for object localization in partial scenes”, has been accepted for publication in the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Object localization in partial (not fully seen) scenes is a challenging problem in computer vision, with significant practical applications in fields such as robotics and autonomous driving. To tackle this problem, the paper proposes a novel end-to-end approach that embeds commonsense knowledge into a scene graph, in the form of what we call the Directed Spatial Commonsense Graph (D-SCG).
Our approach uses the D-SCG, a heterogeneous graph that represents the scene objects and their relative positions, enriched with additional concept nodes drawn from a commonsense knowledge base. In this way, the D-SCG augments geometric reasoning over the scene with commonsense reasoning, which is particularly beneficial for localizing objects in the unseen part of a scene.
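To give a flavour of what such a graph contains, here is a minimal, purely illustrative sketch of a D-SCG-like structure in Python. The class name, fields, and the toy scene are hypothetical and not taken from the paper's code: the point is only that object nodes (some with unknown positions), concept nodes from a commonsense source, and spatial and conceptual edges live in one heterogeneous graph.

```python
from dataclasses import dataclass, field

@dataclass
class DSCG:
    """Toy container for a Directed Spatial Commonsense Graph (illustrative only)."""
    object_nodes: dict = field(default_factory=dict)   # object name -> 3D position, or None if unknown
    concept_nodes: set = field(default_factory=set)    # commonsense concepts (e.g. from a knowledge base)
    edges: list = field(default_factory=list)          # (source, target, relation)

    def add_object(self, name, position=None):
        self.object_nodes[name] = position

    def add_spatial_edge(self, obj_a, obj_b, offset):
        # Relative position between two objects observed in the partial scene.
        self.edges.append((obj_a, obj_b, ("relative_position", offset)))

    def add_concept_edge(self, obj, concept, relation):
        # Link an object to a commonsense concept node.
        self.concept_nodes.add(concept)
        self.edges.append((obj, concept, relation))

# Hypothetical partial scene: a visible table and chair, plus the target "monitor".
g = DSCG()
g.add_object("table", position=(1.0, 0.5, 0.0))
g.add_object("chair", position=(1.4, 0.5, -0.6))
g.add_object("monitor", position=None)                 # target object, position unknown
g.add_spatial_edge("table", "chair", (0.4, 0.0, -0.6))
g.add_concept_edge("monitor", "desk", "AtLocation")    # commonsense relation
g.add_concept_edge("table", "desk", "RelatedTo")
```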
To estimate the position of an object in the unseen area, we use a Graph Neural Network that implements a sparse attentional message-passing mechanism. The network first predicts the relative position between the target object and each visible object, learning a rich representation of the objects via message passing over both the object and concept nodes of the D-SCG. These relative positions are then aggregated to estimate the final position of the target. Evaluation on the Partial ScanNet dataset shows that our approach outperforms state-of-the-art methods by 5.9 percent in localization accuracy while training 8 times faster.
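The aggregation step can be pictured as each visible object casting a "vote" for where the target should be, with the votes combined using confidence weights. The sketch below is a simplified, hypothetical illustration of that idea, not the network's actual output head; the function name, inputs, and numbers are made up for the example.

```python
import numpy as np

def aggregate_target_position(visible_positions, predicted_offsets, attention_logits):
    """Combine per-object estimates of the target position.

    visible_positions: (N, 3) absolute positions of the visible objects
    predicted_offsets: (N, 3) predicted target-minus-object offsets (e.g. GNN outputs)
    attention_logits:  (N,)   unnormalized confidence for each visible object
    """
    votes = visible_positions + predicted_offsets            # each object's estimate of the target
    weights = np.exp(attention_logits - attention_logits.max())
    weights /= weights.sum()                                  # softmax over visible objects
    return (weights[:, None] * votes).sum(axis=0)             # weighted average of the votes

# Toy usage with made-up values.
positions = np.array([[1.0, 0.5, 0.0], [1.4, 0.5, -0.6]])
offsets   = np.array([[0.1, 0.3, 0.0], [-0.3, 0.3, 0.6]])
logits    = np.array([2.0, 0.5])
print(aggregate_target_position(positions, offsets, logits))  # estimated target position
```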

Our work demonstrates the importance of incorporating commonsense knowledge into computer vision methods, especially in scenarios where only partial information about the scene is available. This is a promising step towards solving the object localization problem in partial scenes, and it opens up possibilities for future research on incorporating knowledge-based reasoning in visual perception tasks.
You can download the paper here.