Oscar Mendez: Research Videos

Research Videos

Research Videos about my work.

ICRA 2022: Translating Images into Maps (Best Paper Award)

Translating Images into Maps from ICRA 2022. PDF. In this award-winning work, we approach instantaneous mapping (converting images to a top-down view of the world) as a translation problem. We show how to map from images and video directly to an overhead map or bird’s-eye-view (BEV) of the world, in a single end-to-end network. Posing the problem as translation allows the robot to use the context of the image when interpreting the role of each pixel. We obtain state-of-the-art results for instantaneous mapping of three large-scale datasets.

IROS 2022: SKILL-IL: Splitting Skill and Knowledge in Reinforcement Learning.

SKILL-IL, Disentangling Skill and Knowledge in Multitask Imitation Learning from IROS 2022. PDF. Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. In this work, we give robots the same ability by introducing a new perspective for learning transferable content in multi-task imitation learning. We hypothesize the latent memory of a policy network can be disentangled into two partitions (Skill and Knowledge). These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task. Giving robots this ability allows us to outperform the state-of-the-art by 30%!

IROS 2022: BEV-SLAM, SLAM in learned BEV.

BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision from IROS 2022. PDF. We introduce BEV-SLAM, a novel type of graph-based SLAM that aligns semantically-segmented Bird’s Eye View (BEV) predictions from monocular cameras.

IROS 2022: AFT-VO, Asynchronous Transformer Sensor Fusion

AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation from IROS 2022. PDF. We propose a novel transformer-based fusion module, AFT-VO, which combines asynchronous pose estimates, along with their confidences, into a unified pose estimate. We introduce a Discretiser, which allows our sensor fusion to merge asynchronous signals, and a Source Encoding technique, which enables the fusion of multiple signals without the need for extrinsic calibration.
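To make the idea of merging asynchronous, confidence-tagged estimates concrete, here is a minimal sketch. It is not the AFT-VO architecture: a hand-written confidence-weighted average stands in for the learned transformer, poses are reduced to a single scalar (for example, forward translation), and the names `discretise`, `fuse_slot` and `slot_duration`, as well as the camera labels and numbers, are assumptions made up for illustration.

```python
from collections import defaultdict


def discretise(estimates, slot_duration):
    """Group asynchronous estimates into fixed-width time slots.

    Each estimate is a (timestamp, source, value, confidence) tuple.
    This is an illustrative stand-in, not the Discretiser from the paper.
    """
    slots = defaultdict(list)
    for timestamp, source, value, confidence in estimates:
        slot_index = int(timestamp // slot_duration)
        slots[slot_index].append((source, value, confidence))
    return dict(slots)


def fuse_slot(entries):
    """Confidence-weighted average of the scalar estimates in one slot."""
    total_confidence = sum(confidence for _, _, confidence in entries)
    if total_confidence <= 0:
        raise ValueError("slot has no usable confidence")
    weighted_sum = sum(value * confidence for _, value, confidence in entries)
    return weighted_sum / total_confidence


# Hypothetical estimates from three cameras arriving at different times.
estimates = [
    (0.02, "front", 1.0, 0.9),
    (0.07, "left", 1.4, 0.1),
    (0.11, "front", 2.0, 0.5),
    (0.18, "rear", 1.0, 0.5),
]

slots = discretise(estimates, slot_duration=0.1)
fused = {index: fuse_slot(entries) for index, entries in sorted(slots.items())}

for index, value in fused.items():
    print(f"slot {index}: fused estimate {value:.2f}")
```

Note that nothing here needs to know where each camera is mounted; the sketch simply ignores the `source` label, whereas a learned model could use it as an input feature. Real 6-DoF poses would also need proper handling of rotations, which cannot be averaged this way.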

IROS 2021: Neural Blindness for Localisation

Improving Robot Localisation by Ignoring Visual Distraction from IROS 2021. PDF. Full Presentation. We introduce the idea of Neural Blindness, which gives an agent the ability to completely ignore objects or classes that are deemed distractors. We render a neural network completely incapable of representing specific chosen classes in its latent space, allowing an agent to focus on what is important for a given task, and demonstrate how this can be used to improve localisation.

IROS 2021: MDN-VO, Odometry with Uncertainty

MDN-VO: Estimating Visual Odometry with Confidence from IROS 2021. PDF. Full Presentation. We propose a deep learning-based VO model to efficiently estimate 6-DoF poses, as well as a confidence model for these estimates. We employ a Mixture Density Network (MDN) which estimates camera motion as a mixture of Gaussians, based on the extracted spatio-temporal representations.
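As a rough illustration of what a mixture-of-Gaussians output looks like, the sketch below evaluates a one-dimensional mixture: its negative log-likelihood for an observed value (the usual training loss for a Mixture Density Network) and its overall mean and variance, where the variance can be read as an uncertainty estimate. The function names and the example parameters are assumptions for illustration only; the actual model predicts 6-DoF motion from learned features, which is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_nll(x, weights, means, sigmas):
    """Negative log-likelihood of x under a mixture of 1D Gaussians."""
    density = sum(
        w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(density)


def mixture_moments(weights, means, sigmas):
    """Mean and variance of the whole mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(
        w * (s * s + m * m) for w, m, s in zip(weights, means, sigmas)
    )
    return mean, second_moment - mean * mean


# Hypothetical network output for one motion component: two equally
# weighted modes, e.g. "barely moved" and "moved about two units".
weights = [0.5, 0.5]
means = [0.0, 2.0]
sigmas = [1.0, 1.0]

mean, variance = mixture_moments(weights, means, sigmas)
nll_near = mixture_nll(1.0, weights, means, sigmas)
nll_far = mixture_nll(5.0, weights, means, sigmas)

print(f"mixture mean {mean:.2f}, variance {variance:.2f}")
print(f"NLL at 1.0: {nll_near:.3f}, NLL at 5.0: {nll_far:.3f}")
```

A wide or multi-modal mixture yields a large variance, signalling low confidence in the estimate. For values very far from every mode the density can underflow to zero, so a practical implementation would compute the loss in log space.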

ICRA 2021: Deep BEV Estimation

Birds-Eye-View (BEV) from Monocular Images from ICRA 2021. PDF. We show how monocular images can be used to learn instantaneous Birds-Eye-View (BEV) estimation of a scene. We also show how a better state estimation of the world can be achieved by incorporating temporal information. Our model learns a representation from monocular video through factorised 3D convolutions and uses this to estimate a BEV occupancy grid of the final frame.

ICRA 2021: Robot in a China Shop

Reinforcement Learning for Navigation from ICRA 2021. PDF. We propose a new approach to navigation, where it is treated as a multi-task learning problem. This enables the robot to learn to behave differently in visual navigation tasks for different environments while also learning shared expertise across environments.

ICRA 2021: Markov Localisation with CNNs

CNN-Based Markov Localisation and Odometry Propagation from ICRA 2021. PDF. We present a novel CNN-based localisation approach that can leverage modern deep learning hardware by implementing a grid-based Markov localisation approach directly on the GPU. We create a hybrid Convolutional Neural Network (CNN) that can perform image-based localisation and odometry-based likelihood propagation within a single neural network.
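For readers unfamiliar with grid-based Markov localisation, the following is a minimal textbook-style sketch on a one-dimensional cyclic grid, written in plain Python rather than as a network on the GPU. The odometry step is a convolution of the belief with a motion kernel, which hints at why it fits naturally inside a CNN. The world layout, the sensor likelihoods and the names `propagate` and `update` are assumptions for illustration, not the paper's implementation.

```python
def normalise(belief):
    """Scale a belief so that it sums to one."""
    total = sum(belief)
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return [b / total for b in belief]


def propagate(belief, motion_kernel):
    """Odometry step: circular convolution of the belief with a motion kernel.

    motion_kernel maps a cell offset to the probability of moving by it.
    """
    n = len(belief)
    moved = [0.0] * n
    for cell, mass in enumerate(belief):
        for offset, probability in motion_kernel.items():
            moved[(cell + offset) % n] += mass * probability
    return moved


def update(belief, likelihood):
    """Measurement step: multiply by the per-cell likelihood, then normalise."""
    return normalise([b * l for b, l in zip(belief, likelihood)])


# Hypothetical 5-cell circular corridor with landmarks at cells 0 and 3.
landmark_cells = {0, 3}
sees_landmark = [0.9 if c in landmark_cells else 0.1 for c in range(5)]
sees_nothing = [1.0 - p for p in sees_landmark]

# Noisy odometry: usually one cell forward, sometimes zero or two.
motion_kernel = {0: 0.1, 1: 0.8, 2: 0.1}

belief = [0.2] * 5                          # uniform prior
belief = update(belief, sees_landmark)      # robot observes a landmark
belief = propagate(belief, motion_kernel)   # robot drives forward
belief = update(belief, sees_nothing)       # robot observes no landmark

print([round(b, 3) for b in belief])
```

After these three steps most of the probability mass sits on cells 1 and 4, the cells directly after each landmark. The two remain equally likely because the corridor is symmetric, which is the kind of ambiguity further observations would resolve.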

ICRA 2018: Semantic Scanmatching

SeDAR - Localisation without LiDAR from ICRA 2018. PDF. How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Humans do the exact opposite. Instead of depth, we use high-level semantic cues. In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images.

ICCV 2017: Taking the Scenic Route

Taking the Scenic Route to 3D from ICCV 2017. PDF. We use live robots and simulated drones to demonstrate our Scenic Route Planner, which selects paths that maximise information gain, both in terms of total map coverage and reconstruction accuracy.

Baxter VR Control

A project made by one of my students (Celyn Walters) allows control of a Baxter Robot using a VR headset, ROS and Unity.

Dr. Oscar Mendez