Alessio Del Bue is a tenured senior researcher leading the PAVIS (Pattern Analysis and computer VISion) research line of the Italian Institute of Technology (IIT) in Genova, Italy. Previously, he was a researcher at the Institute for Systems and Robotics of the Instituto Superior Técnico (IST) in Lisbon, Portugal. Before that, he obtained his Ph.D. under the supervision of Dr. Lourdes Agapito in the Department of Computer Science at Queen Mary University of London. His current research focuses on 3D scene understanding from multi-modal input (images, depth, audio) to support the development of assistive Artificial Intelligence systems. He has co-authored more than 100 scientific publications in refereed journals and international conferences, is a member of the technical committees of major computer vision conferences (CVPR, ICCV, ECCV, BMVC, etc.), and serves as an associate editor of the journals Pattern Recognition and Computer Vision and Image Understanding. Finally, Dr. Del Bue is an IEEE and ELLIS member.
He has held senior professional and academic positions in four countries.
3D Scene Understanding: Bridging Geometry and Deep Learning
Autonomous systems have to understand the 3D spatial layout of the world they navigate and interact with. To operate fully in the wild, a fundamental step is to build representations of the 3D world that are reliable and that generalise to any scenario. In this lecture we will provide a walkthrough of recent advances in generating 3D models of the world that are semantically meaningful and can be used to solve high-level tasks. We will first cover the fundamentals of 3D geometry and show how objects can be localised across multiple views using structure-from-motion principles. This information can then be used to build 3D scene graphs grounded in the physical world, using Graph Neural Networks that encode both the geometric structure and the visual appearance of the objects in the scene.
Finally, we will demonstrate how these models can be applied effectively to several tasks, such as camera re-localisation, active visual search and Visual Question Answering.