Translating Images into Maps from ICRA 2022. PDF. In this award-winning work, we approach instantaneous mapping (converting images to a top-down view of the world) as a translation problem. We show how to map from images and video directly to an overhead map, or bird's-eye view (BEV), of the world in a single end-to-end network. Posing the problem as translation allows the robot to use the context of the image when interpreting the role of each pixel. We obtain state-of-the-art results for instantaneous mapping on three large-scale datasets.
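A geometric fact helps explain why a translation framing fits this problem. For an upright pinhole camera, every ground-plane point on a polar ray leaving the camera projects into the same image column, so image columns and BEV rays are natural units to translate between. The sketch below only illustrates that geometry. It assumes hypothetical intrinsics `fx` and `cx` and hypothetical helper names, and it is not the network described above.

```python
import math


def bev_to_column(x, z, fx, cx):
    """Project a ground-plane point (x lateral, z forward, metres) into the
    image column u of an upright pinhole camera (hypothetical helper)."""
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx


def ray_angle(x, z):
    """Bearing of the point from the camera, measured from the optical axis."""
    return math.atan2(x, z)


fx, cx = 500.0, 320.0  # hypothetical camera intrinsics
# Three points on the same polar ray, at increasing depth
points = [(1.0, 5.0), (2.0, 10.0), (4.0, 20.0)]
cols = [bev_to_column(x, z, fx, cx) for x, z in points]
print(cols)  # [420.0, 420.0, 420.0]
```

Because the column depends only on the ratio x/z, i.e. on the ray's bearing, a whole image column maps onto a single BEV ray. Recovering depth along that ray is the part that requires learning.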
SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning from IROS 2022. PDF. Humans are able to transfer skills and knowledge: if we can cycle to work and drive to the store, we can also cycle to the store and drive to work. In this work, we give robots the same ability by introducing a new perspective on learning transferable content in multi-task imitation learning. We hypothesize that the latent memory of a policy network can be disentangled into two partitions: Knowledge, which holds the environmental context for the task, and Skill, which holds the generalizable skill needed to solve it. Giving robots this ability allows us to outperform the state of the art by 30%!
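The cycling/driving example can be pictured as recombining latent partitions. In the sketch below, the latent is simply sliced at a fixed index, so a skill from one task can be paired with knowledge from another. The partition size, the latent values and the helper names are all hypothetical. In the actual work the partitions are learned by the policy network rather than assigned by hand.

```python
import numpy as np

SKILL_DIM = 4  # hypothetical size of the Skill partition


def split_latent(z):
    """Split a latent vector into (skill, knowledge) partitions."""
    return z[:SKILL_DIM], z[SKILL_DIM:]


def recombine(skill, knowledge):
    """Join a Skill partition and a Knowledge partition into one latent."""
    return np.concatenate([skill, knowledge])


# Hypothetical latents for two tasks seen during training
z_cycle_to_work = np.array([1., 1., 1., 1., 0., 0., 0.])   # skill=cycle, knowledge=work
z_drive_to_store = np.array([2., 2., 2., 2., 5., 5., 5.])  # skill=drive, knowledge=store

cycle_skill, _ = split_latent(z_cycle_to_work)
_, store_knowledge = split_latent(z_drive_to_store)
z_cycle_to_store = recombine(cycle_skill, store_knowledge)  # unseen combination
print(z_cycle_to_store)  # [1. 1. 1. 1. 5. 5. 5.]
```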
BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision from IROS 2022. PDF. We introduce BEV-SLAM, a novel type of graph-based SLAM that aligns semantically-segmented Bird’s Eye View (BEV) predictions from monocular cameras.
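The summary does not say how two BEV predictions are aligned. As a hypothetical stand-in, the sketch below uses a brute-force search over integer translations and keeps the shift that maximises agreement between semantic labels. In a graph-based SLAM system, a relative offset of this kind would typically become a constraint between two pose nodes. The class ids, grid size and `align_bev` helper are assumptions, and a real system would also need to search over rotation.

```python
import numpy as np


def align_bev(ref, query, max_shift=3):
    """Find the integer (dy, dx) shift of `query` that best matches `ref`.

    Both grids hold semantic class ids; 0 means 'unknown' and is ignored.
    np.roll wraps at the borders, which is acceptable only for a toy example.
    """
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(query, (dy, dx), axis=(0, 1))
            valid = (shifted != 0) & (ref != 0)
            score = np.count_nonzero(valid & (shifted == ref))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


ref = np.zeros((10, 10), dtype=int)
ref[2:5, 3:7] = 1  # hypothetical 'road' class
ref[6, 1:4] = 2    # hypothetical 'pavement' class
query = np.roll(ref, (-1, 2), axis=(0, 1))  # same scene seen from a shifted pose
shift, score = align_bev(ref, query)
print(shift, score)  # (1, -2) 15
```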
AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation from IROS 2022. PDF. We propose a novel transformer-based fusion module, AFT-VO, which combines asynchronous pose estimates, along with their confidences, into a unified pose estimate. We introduce a Discretiser, which allows our sensor fusion to merge asynchronous signals, and a Source Encoding technique, which enables the fusion of multiple signals without the need for extrinsic calibration.
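The internals of the Discretiser are not given here. The following minimal sketch assumes the general idea is to snap asynchronous timestamps onto a fixed time grid, so that a downstream model receives a fixed-length sequence with empty slots it can mask. The sensor names, slot width and `discretise` helper are hypothetical.

```python
def discretise(estimates, t0, slot_width, num_slots):
    """Bucket asynchronous (timestamp, source_id, value) estimates into
    fixed-width time slots.

    Returns one dict per slot mapping source_id -> value; an empty dict marks
    a slot with no measurement (to be masked downstream). If a source reports
    twice in one slot, the later entry in `estimates` overwrites the earlier.
    """
    slots = [{} for _ in range(num_slots)]
    for t, source, value in estimates:
        idx = int((t - t0) // slot_width)
        if 0 <= idx < num_slots:
            slots[idx][source] = value
    return slots


# Hypothetical per-camera estimates arriving at irregular times (seconds)
estimates = [
    (0.00, "front", 0.10),
    (0.03, "left", 0.12),
    (0.12, "front", 0.20),
    (0.35, "rear", 0.05),
]
slots = discretise(estimates, t0=0.0, slot_width=0.1, num_slots=4)
print(slots)  # [{'front': 0.1, 'left': 0.12}, {'front': 0.2}, {}, {'rear': 0.05}]
```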
Improving Robot Localisation by Ignoring Visual Distraction from IROS 2021. PDF. Full Presentation. We introduce the idea of Neural Blindness, which gives an agent the ability to completely ignore objects or classes that are deemed distractors. We render a neural network completely incapable of representing specific chosen classes in its latent space, allowing an agent to focus on what is important for a given task, and demonstrate how this can be used to improve localisation.
MDN-VO: Estimating Visual Odometry with Confidence from IROS 2021. PDF. Full Presentation. We propose a deep learning-based VO model to efficiently estimate 6-DoF poses, as well as a confidence model for these estimates. We employ a Mixture Density Network (MDN) which estimates camera motion as a mixture of Gaussians, based on the extracted spatio-temporal representations.
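A mixture of Gaussians naturally provides a confidence measure alongside the pose. The sketch below computes the overall mean and variance of a 1D mixture for a single pose component, using the law of total variance. The weights, means and variances are hypothetical, and the exact confidence measure used in MDN-VO may differ.

```python
import numpy as np


def mixture_mean_var(weights, means, variances):
    """Mean and variance of a 1D Gaussian mixture.

    mean = sum_k w_k * mu_k
    var  = sum_k w_k * (sigma_k^2 + mu_k^2) - mean^2
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # make sure the mixture weights sum to 1
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    mean = np.sum(w * mu)
    total_var = np.sum(w * (var + mu ** 2)) - mean ** 2
    return mean, total_var


# Hypothetical two-component estimate of forward translation (metres)
mean, var = mixture_mean_var([0.5, 0.5], [1.0, 3.0], [0.5, 0.5])
print(mean, var)  # 2.0 1.5
```

When the components disagree, the total variance grows beyond the per-component variance. That extra spread is what makes the mixture useful as a confidence signal.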
Bird's-Eye-View (BEV) from Monocular Images from ICRA 2021. PDF. We show how monocular images can be used to learn instantaneous bird's-eye-view (BEV) estimation of a scene. We also show how incorporating temporal information yields a better estimate of the state of the world. Our model learns a representation from monocular video through factorised 3D convolutions and uses it to estimate a BEV occupancy grid of the final frame.
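One common way to factorise a 3D convolution is to apply a 1×k×k spatial convolution followed by a k×1×1 temporal one. Whether the model above uses exactly this split is an assumption. The arithmetic below compares weight counts for a hypothetical layer with 64 channels throughout and k = 3.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a full k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def factorised_params(c_in, c_mid, c_out, k):
    """Weights in a 1 x k x k spatial conv followed by a k x 1 x 1 temporal conv."""
    return c_in * c_mid * k ** 2 + c_mid * c_out * k


full = conv3d_params(64, 64, 3)             # 64 * 64 * 27
fact = factorised_params(64, 64, 64, 3)     # 64 * 64 * 9 + 64 * 64 * 3
print(full, fact, round(fact / full, 3))    # 110592 49152 0.444
```

Beyond using fewer weights, the split adds an extra point between the spatial and temporal steps where a nonlinearity can be inserted.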
Reinforcement Learning for Navigation from ICRA 2021. PDF. We propose a new approach to navigation that treats it as a multi-task learning problem. This enables the robot to learn different behaviours for visual navigation in different environments, while also learning expertise shared across environments.
CNN-Based Markov Localisation and Odometry Propagation from ICRA 2021. PDF. We present a novel CNN-based localisation approach that can leverage modern deep learning hardware by implementing a grid-based Markov localisation approach directly on the GPU. We create a hybrid Convolutional Neural Network (CNN) that can perform image-based localisation and odometry-based likelihood propagation within a single neural network.
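Grid-based Markov localisation alternates two steps. The prediction step can be written as a convolution of the belief grid with an odometry kernel, and the measurement update multiplies the belief by an observation likelihood and renormalises. This convolutional form is part of what makes the method a good fit for CNN hardware. Below is a minimal NumPy sketch on a toy 2D grid (translation only, no heading). The kernel, the likelihood and the helper names are hypothetical.

```python
import numpy as np


def predict(belief, kernel):
    """Motion update: spread belief according to a centred odometry kernel.

    kernel[i, j] is the probability of moving by (i - kh//2, j - kw//2) cells.
    np.roll wraps at the borders, which is fine for a toy grid only.
    """
    kh, kw = kernel.shape
    out = np.zeros_like(belief)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * np.roll(belief, (i - kh // 2, j - kw // 2), axis=(0, 1))
    return out / out.sum()


def update(belief, likelihood):
    """Measurement update: multiply by p(z | x) and renormalise."""
    post = belief * likelihood
    return post / post.sum()


belief = np.zeros((5, 5))
belief[2, 2] = 1.0                 # start fully certain at cell (2, 2)
kernel = np.zeros((3, 3))
kernel[1, 2] = 0.8                 # odometry says: probably moved one cell right
kernel[1, 1] = 0.2                 # ...but may have stayed put
belief = predict(belief, kernel)   # mass: 0.2 at (2, 2), 0.8 at (2, 3)
likelihood = np.full((5, 5), 0.1)
likelihood[2, 2] = 0.9             # the image strongly matches cell (2, 2)
belief = update(belief, likelihood)
print(np.unravel_index(np.argmax(belief), belief.shape))
```

In this example the camera observation overrides the odometry, which is exactly the kind of correction that combining the two sources is meant to allow.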
SeDAR: Localisation without LiDAR from ICRA 2018. PDF. How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Humans do the exact opposite: instead of depth, we use high-level semantic cues. In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images.
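To illustrate localising from labels rather than depths, the sketch below scores candidate poses by how well the labels observed along a fan of rays agree with the labels the floorplan predicts from each pose. The label set, the pose names, the independent-ray likelihood and `match_p` are hypothetical assumptions, not the sensor model used in SeDAR.

```python
import numpy as np


def score_poses(observed, expected_by_pose, match_p=0.9):
    """Weight candidate poses by agreement between observed and expected labels.

    observed: semantic labels seen along a fan of rays from the camera.
    expected_by_pose: pose -> labels the floorplan predicts along the same rays.
    Each ray contributes match_p if the labels agree, (1 - match_p) otherwise.
    """
    obs = np.asarray(observed)
    raw = {}
    for pose, expected in expected_by_pose.items():
        agree = int(np.count_nonzero(obs == np.asarray(expected)))
        disagree = len(obs) - agree
        raw[pose] = match_p ** agree * (1 - match_p) ** disagree
    total = sum(raw.values())
    return {pose: w / total for pose, w in raw.items()}


observed = ["wall", "door", "wall", "window"]
expected = {  # hypothetical floorplan predictions for three candidate poses
    "A": ["wall", "door", "wall", "window"],
    "B": ["wall", "wall", "wall", "window"],
    "C": ["door", "wall", "window", "wall"],
}
weights = score_poses(observed, expected)
print(max(weights, key=weights.get))  # A
```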
Taking the Scenic Route to 3D from ICCV 2017. PDF. We use live robots and simulated drones to demonstrate our Scenic Route Planner, which selects paths that maximise information gain in terms of both total map coverage and reconstruction accuracy.
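The coverage half of this objective can be sketched as a greedy choice between candidate paths: count the map cells each path would observe for the first time and pick the largest. The reconstruction-accuracy term is omitted. Representing views as sets of grid cells, and the path names, are hypothetical simplifications.

```python
def coverage_gain(path_views, seen):
    """Number of map cells a path would observe for the first time."""
    newly_seen = set()
    for view in path_views:
        newly_seen |= view
    return len(newly_seen - seen)


def pick_route(candidates, seen):
    """Return the name of the candidate path with the largest coverage gain."""
    return max(candidates, key=lambda name: coverage_gain(candidates[name], seen))


seen = {(0, 0), (0, 1), (1, 0)}  # cells already reconstructed
candidates = {  # hypothetical paths, each a list of per-viewpoint visible cells
    "direct": [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}],
    "scenic": [{(0, 1), (0, 2)}, {(1, 2), (2, 2)}, {(2, 1)}],
}
print(pick_route(candidates, seen))  # scenic
```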
A project made by one of my students (Celyn Walters) allows control of a Baxter Robot using a VR headset, ROS and Unity.