Introduction to 3D SLAM with RTAB-Map

Shiva Chandrachary
8 min read · Jan 13, 2021


RTAB-Mapping, short for Real-Time Appearance-Based Mapping, is a graph-based SLAM approach. Appearance-based SLAM means that the algorithm uses data collected from vision sensors to localize the robot and map the environment. A process called loop closure is used to determine whether the robot has seen a location before. As the robot travels to new areas in its environment, the map is expanded and the number of images that each new image must be compared against grows, so loop closure detection takes longer, with complexity increasing linearly in the number of stored images. RTAB-Map is optimized for large-scale and long-term SLAM by using multiple strategies that allow loop closure to be done in real-time: loop closure happens fast enough that the result is available before the next camera images are acquired.

Recap

Before diving deep into RTAB-Mapping, it is important to understand the basics of GraphSLAM: what a graph is, how one is constructed, how poses and features are represented in 1-D and n-D, how constraints are stored and processed, and how to work with nonlinear constraints. A brief recap of these ideas will give you the necessary tools before proceeding further.
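To make the constraint bookkeeping concrete, here is a minimal 1-D GraphSLAM sketch in Python. The odometry values are made up for illustration: each constraint is folded into an information matrix Omega and vector xi, and solving the resulting linear system yields the most likely poses.

```python
import numpy as np

# Toy 1-D GraphSLAM: the robot starts at x0 = 0, then odometry reports
# moves of +5 and +4. Every constraint adds entries to the information
# matrix Omega and vector xi; solving Omega * mu = xi estimates all poses.
omega = np.zeros((3, 3))
xi = np.zeros(3)

# Initial position constraint: x0 = 0
omega[0, 0] += 1.0

# Motion constraint: x1 - x0 = 5
omega[np.ix_([0, 1], [0, 1])] += [[1, -1], [-1, 1]]
xi[[0, 1]] += [-5, 5]

# Motion constraint: x2 - x1 = 4
omega[np.ix_([1, 2], [1, 2])] += [[1, -1], [-1, 1]]
xi[[1, 2]] += [-4, 4]

mu = np.linalg.solve(omega, xi)
print(mu)  # [0. 5. 9.]
```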

Front End and Back End of RTAB-Map

Front End

The front end of RTAB-Map focuses on the sensor data used to obtain the constraints that feed the graph optimization. Landmark constraints are not used in RTAB-Map; only odometry constraints and loop closure constraints are considered. The odometry constraints can come from wheel encoders, an IMU, LiDAR, or visual odometry. Visual odometry is accomplished using 2D features such as Speeded-Up Robust Features (SURF).
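To give a feel for what frame-to-frame visual odometry involves, here is a rough Python sketch using OpenCV. It is not RTAB-Map's implementation: it uses ORB features (SURF requires the opencv-contrib build) and assumes a calibrated camera whose intrinsic matrix K is known; with a single camera the translation is only recovered up to an unknown scale.

```python
import cv2
import numpy as np

def visual_odometry_step(prev_img, curr_img, K):
    """Estimate the relative camera motion between two grayscale frames."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(prev_img, None)
    kp2, des2 = orb.detectAndCompute(curr_img, None)

    # Brute-force matching of binary ORB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Recover the relative rotation R and (scale-free) translation t.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```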

Because RTAB-Map is appearance-based, it can detect loop closures with a single monocular camera, without any metric distance information. For metric GraphSLAM, RTAB-Map requires an RGB-D camera or a stereo camera to compute the geometric constraint between the images of a loop closure. A laser rangefinder can also be used to refine this geometric constraint. The front end also handles graph management, which includes node creation and loop closure detection using a bag-of-words approach.

Back End

The back end of RTAB-Map includes graph optimization and the assembly of an occupancy grid from the data in the graph.

Loop Closures

Loop closure is the process of finding a match between the current and previously visited locations in SLAM. There are two types of loop closure detections: local and global.

In local loop closures, the matches are found between a new observation and a limited map region. The size and location of this limited map region are determined by the uncertainty associated with the robot’s position. This type of approach fails if the estimated position is incorrect.

In the global loop closure approach, a new location is compared with all previously viewed locations. If no match is found, the new location is added to memory. As the robot moves around and the map grows, the amount of time needed to check a new location against those previously seen increases linearly. If the time it takes to search and compare new images to those stored in memory becomes larger than the image acquisition time, the map becomes ineffective.

RTAB-Map uses global loop closures along with other techniques to ensure that the loop closure process happens in real-time.

The importance of loop closure is best understood by seeing a map result without it!

When loop closure is disabled, you can see that parts of the map are repeated and the resulting map looks a lot choppier; it is not an accurate representation of the environment. This happens because, without loop closure, the robot does not compare new images and locations to previously viewed ones and instead registers them all as new locations. When loop closure is enabled, the map is significantly smoother and is an accurate representation of the room.

For example, on the left, where loop closure is disabled, the highlighted door is represented as multiple corners and partial door segments, whereas on the right you see a single, clearly defined door.

Bag-of-Words

In RTAB-Mapping, loop closure is detected using a bag-of-words approach. A feature is a very specific characteristic of an image, like a patch with a complex texture or a well-defined edge or corner. In RTAB-Mapping, the default method used to extract features from an image is Speeded-Up Robust Features (SURF). Each feature has a descriptor associated with it. A feature descriptor is a unique and robust representation of the pixels that make up a feature. In SURF, the region around the point of interest where the feature is located is split into smaller square sub-regions. Within these sub-regions, pixel intensities at regularly spaced sample points are computed and compared, and the differences between the sample points are used to characterize the sub-regions of the image.
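As a small illustration, the sketch below extracts SURF keypoints and their 64-dimensional descriptors from one image. The image path is hypothetical, and SURF lives in the opencv-contrib xfeatures2d module, which is disabled in some OpenCV builds for patent reasons.

```python
import cv2

# Hypothetical input image.
img = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)

# SURF comes from opencv-contrib (cv2.xfeatures2d) and may be unavailable
# in builds compiled without the non-free modules.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=False)
keypoints, descriptors = surf.detectAndCompute(img, None)

# Each row of `descriptors` is one 64-dimensional feature descriptor built
# from responses at sample points in the square sub-regions around a keypoint.
print(len(keypoints), descriptors.shape)
```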

Comparing feature descriptors directly is time-consuming, so a vocabulary is used for faster comparison. Similar features, or synonyms, are clustered together, and the collection of these clusters represents the vocabulary. When a feature descriptor is mapped to one in the vocabulary, it is called quantization. At this point, the feature is linked to a word and can be referred to as a visual word. When all features in an image are quantized, the image becomes a bag-of-words. Each word keeps a link to the images it is associated with, making image retrieval more efficient over a large dataset.
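A minimal sketch of how such a vocabulary could be built and used for quantization is shown below, here with batch k-means clustering over descriptors from many images. The vocabulary size of 500 is an arbitrary example value, and this is not RTAB-Map's incremental vocabulary.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_vocabulary(all_descriptors, k=500):
    """Cluster descriptors from many images into k visual words."""
    data = np.vstack(all_descriptors).astype(np.float64)
    vocabulary, _ = kmeans2(data, k, minit='++')
    return vocabulary  # k x descriptor_dim array of cluster centres

def quantize(descriptors, vocabulary):
    """Map each descriptor to its nearest visual word.
    The resulting array of word ids is the image's bag-of-words."""
    words, _ = vq(descriptors.astype(np.float64), vocabulary)
    return words
```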

To compare an image with all previous images, a matching score is given to all images containing the same words. Each word keeps track of which images it has been seen in, so that similar images can be found. This is called an inverted index.
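A toy version of such an inverted index and the word-voting score might look like this, continuing the sketch above:

```python
def build_inverted_index(bags_of_words):
    """Map each visual word id to the set of image ids it was seen in."""
    index = {}
    for image_id, words in enumerate(bags_of_words):
        for word in set(words):
            index.setdefault(word, set()).add(image_id)
    return index

def score_against_index(query_words, index):
    """Each word shared with the query votes for the images it appears in."""
    scores = {}
    for word in set(query_words):
        for image_id in index.get(word, ()):
            scores[image_id] = scores.get(image_id, 0) + 1
    return scores  # higher score means more shared visual words
```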

If a word is seen in an image, the score of that image increases, so an image that shares many visual words with the query image scores higher. A Bayesian filter is then used to evaluate these scores and maintain the hypothesis that the current image has been seen before. When the hypothesis reaches a predefined threshold H, a loop closure is detected.
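A heavily simplified sketch of that idea follows. RTAB-Map's actual filter also models transitions between neighbouring locations and a "new location" hypothesis, and the threshold value used here is just an example.

```python
import numpy as np

def loop_closure_hypothesis(scores, prior, threshold_H=0.11):
    """Toy discrete Bayes update: turn word-match scores into a likelihood,
    combine it with the previous belief, and flag a loop closure when any
    location's posterior probability exceeds the threshold H (example value)."""
    likelihood = np.asarray(scores, dtype=float) + 1e-6   # avoid all-zero rows
    posterior = likelihood * np.asarray(prior, dtype=float)
    posterior /= posterior.sum()
    best = int(np.argmax(posterior))
    if posterior[best] >= threshold_H:
        return best, posterior      # loop closure with location `best`
    return None, posterior          # no loop closure this frame
```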

Memory Management

RTAB-Map uses a memory management technique to limit the number of locations considered as candidates during loop closure detection. This technique is a key feature of RTAB-Map and allows for loop closure to be done in real-time.

The overall strategy is to keep the most recent and frequently observed locations in the robot's Working Memory (WM) and transfer the others into Long-Term Memory (LTM). The bullets below summarize the process, and a simplified sketch of this bookkeeping follows the list.

  • When a new image is acquired, a new node is created in the Short Term Memory (STM).
  • When creating a node, recall that features are extracted and compared to the vocabulary to find all of the words in the image, creating a bag-of-words for this node.
  • Nodes are assigned a weight in the STM based on how long the robot spent in the location — where a longer time means a higher weighting. If two consecutive images are similar, the weight of the first node is increased by one and no new node is created for the second image.
  • The STM has a fixed size of S. When STM reaches S nodes, the oldest node is moved to WM to be considered for loop closure detection.
  • Loop closure happens in the WM.
  • WM size depends on a fixed time limit T. When the time required to process new data reaches T, some nodes of the graph are transferred from WM to LTM — as a result, WM size is kept nearly constant.
  • The oldest and least-weighted nodes in WM are transferred to LTM before others, so WM is made up of nodes seen for longer periods of time.
  • LTM is not used for loop closure detection and graph optimization.
  • If loop closure is detected, neighbors in LTM of an old node can be transferred back to the WM (a process called retrieval).
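The toy Python class below captures this bookkeeping. The names, the similarity test, and the time-budget handling are all simplifications for illustration, not RTAB-Map's actual implementation.

```python
from collections import deque

class MemoryManager:
    """Toy sketch of RTAB-Map-style memory management (not the real API).
    S is the fixed STM size; T is the time budget per map update."""

    def __init__(self, S=10, T=0.7, similarity=0.9):
        self.S, self.T, self.similarity = S, T, similarity
        self.stm = deque()   # newest nodes, not yet loop-closure candidates
        self.wm = {}         # node_id -> node, searched for loop closures
        self.ltm = {}        # node_id -> node, ignored during loop closure

    def add_image(self, node_id, words):
        """Node ids are assumed to increase over time (smaller id = older node)."""
        words = set(words)
        if self.stm:
            _, last = self.stm[-1]
            overlap = len(words & last["words"]) / max(len(words | last["words"]), 1)
            if overlap >= self.similarity:
                last["weight"] += 1        # similar consecutive image: no new node
                return
        self.stm.append((node_id, {"words": words, "weight": 0}))
        if len(self.stm) > self.S:
            old_id, old = self.stm.popleft()
            self.wm[old_id] = old          # oldest STM node becomes a WM candidate

    def enforce_time_budget(self, update_time, batch=1):
        """If the last update took longer than T, move the oldest, least-weighted
        WM nodes to LTM so that WM size (and update time) stays roughly constant."""
        if update_time > self.T:
            for _ in range(min(batch, len(self.wm))):
                victim = min(self.wm, key=lambda nid: (self.wm[nid]["weight"], nid))
                self.ltm[victim] = self.wm.pop(victim)
```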

Graph Optimization

When a loop closure hypothesis is accepted, a new constraint is added to the map's graph, and a graph optimizer then minimizes the errors in the map. RTAB-Map supports three different graph optimization libraries: Tree-based network optimizer (TORO), General Graph Optimization (g2o), and GTSAM (Smoothing and Mapping).

All of these optimizations use node poses and link transformations as constraints. When a loop closure is detected, errors introduced by the odometry can be propagated to all links, correcting the map.

Recall that landmarks are used in the graph optimization process by other methods, whereas RTAB-Map does not use them. Only odometry constraints and loop closure constraints are optimized.

You can see the impact of graph optimization in the comparison below.
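As a deliberately simple illustration of how one loop closure constraint corrects the whole trajectory, here is a toy 2D pose-graph example (positions only, no orientation, made-up measurements) solved with SciPy rather than TORO, g2o, or GTSAM:

```python
import numpy as np
from scipy.optimize import least_squares

# Each constraint says: pose_j - pose_i should equal `measurement` (a 2D offset).
# Odometry edges chain consecutive nodes; the loop closure edge ties the last
# node back to the first. All numbers below are made up for illustration.
constraints = [
    (0, 1, np.array([1.0, 0.0])),    # odometry
    (1, 2, np.array([1.0, 0.0])),    # odometry
    (2, 3, np.array([0.0, 1.0])),    # odometry
    (3, 0, np.array([-2.1, -0.9])),  # loop closure back to the start
]
num_nodes = 4

def residuals(flat_poses):
    poses = flat_poses.reshape(num_nodes, 2)
    res = [poses[0]]  # anchor the first pose at the origin
    for i, j, meas in constraints:
        res.append((poses[j] - poses[i]) - meas)
    return np.concatenate(res)

# Initial guess: dead-reckoned poses from odometry alone (drift included).
initial = np.array([[0, 0], [1, 0], [2, 0], [2, 1]], dtype=float).ravel()
result = least_squares(residuals, initial)
print(result.x.reshape(num_nodes, 2))  # loop closure error spread over all links
```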

Map Assembly and Output

The possible outputs of RTAB-Map are a 2D occupancy grid map, a 3D occupancy grid map (3D OctoMap), or a 3D point cloud.
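As a rough illustration of what a 2D occupancy grid is (not RTAB-Map's assembly pipeline), the following sketch rasterizes a set of 2D obstacle points into a grid; the resolution and map size are arbitrary example values.

```python
import numpy as np

def points_to_occupancy_grid(points_xy, resolution=0.05, size_m=10.0):
    """Mark grid cells containing at least one obstacle point as occupied (100),
    leaving the rest free (0), following the ROS occupancy-grid value convention."""
    cells = int(size_m / resolution)
    grid = np.zeros((cells, cells), dtype=np.int8)
    # Shift points so the grid is centred on the origin, then bin them.
    idx = np.floor((np.asarray(points_xy) + size_m / 2) / resolution).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < cells), axis=1)]
    grid[idx[:, 1], idx[:, 0]] = 100
    return grid
```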

Graph SLAM Complexity and the Complexity of RTAB-Map

GraphSLAM complexity is linear in the number of nodes, which grows with the size of the map.

By using memory management to limit the number of nodes processed for loop closure, RTAB-Map keeps the time complexity roughly constant.

Conclusion

I used ROS’ RTAB-Map package to create a 2D occupancy grid and 3D octomap from the simulated environment in Gazebo. Below is a video showing the map being generated in real-time as the robot traverses its environment.

If you are interested in taking a look at the inner workings of this algorithm, or even implementing and running it yourself, follow the instructions in the README below.

Source: Udacity's Self-Driving Nanodegree program

