Traditional volumetric fusion algorithms preserve the spatial structure of 3D scenes, which is beneficial for many tasks in computer vision and robotics. However, they often lack realism in terms of visualization. Emerging 3D Gaussian splatting bridges this gap, but existing Gaussian-based reconstruction methods often suffer from artifacts and inconsistencies with the underlying 3D structure, and they struggle to optimize in real time, so they cannot provide users with immediate, high-quality feedback. One of the bottlenecks is the massive number of Gaussian parameters that must be updated during optimization. Instead of using 3D Gaussians as a standalone map representation, we incorporate them into a volumetric mapping system to exploit its geometric information, and we propose a quadtree data structure on images to drastically reduce the number of splats initialized. In this way, we simultaneously generate a compact 3D Gaussian map with fewer artifacts and a volumetric map on the fly. Our method, GSFusion, significantly improves computational efficiency without sacrificing rendering quality, as demonstrated on both synthetic and real datasets.
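To make the quadtree idea concrete, here is a minimal, self-contained sketch (not the released implementation) of contrast-driven quadtree subdivision of an image: a quadrant is split only while its intensity contrast stays above a threshold, so uniform regions are covered by a few large quadrants and therefore spawn only a few splats. The function name, the contrast measure, and the `contrast_thr`/`min_size` parameters are illustrative assumptions.

```python
# Sketch only: contrast-driven quadtree subdivision of a grayscale image.
import numpy as np

def quadtree_cells(gray, x, y, w, h, contrast_thr=0.05, min_size=8):
    """Return (x, y, w, h) leaf quadrants of a contrast-driven quadtree."""
    patch = gray[y:y + h, x:x + w]
    contrast = float(patch.max() - patch.min())   # simple contrast proxy
    if contrast <= contrast_thr or min(w, h) <= min_size:
        return [(x, y, w, h)]                     # homogeneous or tiny: stop
    hw, hh = w // 2, h // 2                       # otherwise split into four
    cells = []
    for qx, qy, qw, qh in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                           (x, y + hh, hw, h - hh),
                           (x + hw, y + hh, w - hw, h - hh)]:
        cells += quadtree_cells(gray, qx, qy, qw, qh, contrast_thr, min_size)
    return cells

# Usage: one candidate Gaussian per leaf quadrant instead of one per pixel.
rgb = np.random.rand(480, 640, 3).astype(np.float32)   # stand-in input frame
gray = rgb.mean(axis=2)
leaves = quadtree_cells(gray, 0, 0, gray.shape[1], gray.shape[0])
print(f"{len(leaves)} leaf quadrants -> at most {len(leaves)} new Gaussians")
```

In textured regions the recursion bottoms out at small quadrants, while flat areas such as walls and floors are covered by a handful of large leaves, which is where most of the reduction in splat count comes from.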
At each time step, our GSFusion takes a pair of RGB and depth images as input. The depth data is fused into an octree-based TSDF grid to capture the geometric structure, while the RGB image is segmented with a quadtree based on image contrast. A new 3D Gaussian is then initialized at the back-projected center of a quadrant only if no Gaussian is already anchored nearby, which is determined by checking its nearest voxel. Gaussian parameters are optimized on the fly by minimizing the photometric loss between the rendered image and the input RGB image. Additionally, we maintain a keyframe set to counteract the forgetting problem. After scanning, the system provides both a volumetric map and a 3D Gaussian map for subsequent tasks.
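The following sketch illustrates the seeding step described above under assumed data layouts rather than GSFusion's actual API: each quadtree leaf is back-projected through the depth map, and a new Gaussian is created only if the voxel containing that point does not already reference a splat. The intrinsics, the `voxel_size`, the dictionary-based voxel lookup, and the per-splat parameter dictionary are all assumptions for illustration; in the real system the adjacency check is performed against the octree-based TSDF map, and the point would be transformed into the world frame using the camera pose.

```python
# Sketch only: seed Gaussians at back-projected quadrant centres with a voxel check.
import numpy as np

fx = fy = 600.0                        # assumed pinhole intrinsics
cx, cy = 320.0, 240.0
voxel_size = 0.05                      # metres, matching a coarse map resolution
voxel_to_gaussian = {}                 # voxel key -> index of an existing splat
gaussians = []                         # list of per-splat parameter dicts

def backproject(u, v, z):
    """Back-project pixel (u, v) with depth z into the camera frame."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def maybe_add_gaussian(leaf, depth, rgb):
    """Seed one Gaussian at the centre of a quadtree leaf, unless one exists nearby."""
    x, y, w, h = leaf
    u, v = x + w // 2, y + h // 2               # quadrant centre in pixels
    z = float(depth[v, u])
    if z <= 0.0:                                # invalid depth reading: skip
        return
    p = backproject(u, v, z)                    # in practice: transform to world frame
    key = tuple((p // voxel_size).astype(int))  # voxel containing the point
    if key in voxel_to_gaussian:                # a splat is already anchored here
        return
    gaussians.append({
        "mean": p,
        "color": rgb[v, u].copy(),
        "scale": np.full(3, w * z / fx),        # footprint of the quadrant at depth z
        "opacity": 0.5,
    })
    voxel_to_gaussian[key] = len(gaussians) - 1

# Usage with synthetic data: one 32x32 quadrant centred in a 640x480 frame.
depth = np.full((480, 640), 2.0, dtype=np.float32)
rgb = np.zeros((480, 640, 3), dtype=np.float32)
maybe_add_gaussian((304, 224, 32, 32), depth, rgb)
print(len(gaussians), "Gaussian(s) initialized")
```

In the full system, the splats seeded this way are then optimized online against the input RGB with the photometric loss mentioned above, and keyframes are periodically revisited during optimization so that earlier parts of the scene are not forgotten.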
We compare our method with SplaTAM and RTG-SLAM, showing rendered images from both training and novel views on the ScanNet++ dataset.
Here we show the reconstructed 3D Gaussian models of the room0 scene from the synthetic Replica dataset and the 8b5caf3398 scene from the real-world ScanNet++ dataset.
room0 (Replica):
  SplaTAM: Mapping 0.13 fps, Model size 333.9 MB, Training view PSNR 33.90
  RTG-SLAM: Mapping 7.64 fps, Model size 66.1 MB, Training view PSNR 29.01
  GSFusion (Ours): Mapping 7.90 fps, Model size 58.1 MB, Training view PSNR 33.95

8b5caf3398 (ScanNet++):
  SplaTAM: Mapping 0.17 fps, Model size 244.6 MB, Training view PSNR 25.75, Novel view PSNR 22.76
  RTG-SLAM: Mapping 1.18 fps, Model size 118.4 MB, Training view PSNR 18.16, Novel view PSNR 18.52
  GSFusion (Ours): Mapping 6.66 fps, Model size 27.6 MB, Training view PSNR 29.63, Novel view PSNR 25.35
We collected real-world data using our drone equipped with a RealSense D455 camera.
All poses are computed by OKVIS2, a real-time visual-inertial SLAM system with loop closure.
Despite noisy depth maps, GSFusion still provides realistic renderings from novel viewpoints, as well as an accurate geometric map for downstream robotics tasks such as navigation and path planning.
Reconstructed Scene
Extracted Mesh
This work was supported by the EU project AUTOASSESS. The authors would like to thank Simon Boche and Sebastián Barbas Laina for their assistance in collecting and processing drone data. We also extend our gratitude to Sotiris Papatheodorou for his valuable discussions and support with the Supereight2 software.
@misc{wei2024gsfusiononlinergbdmapping,
title={GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion},
author={Jiaxin Wei and Stefan Leutenegger},
year={2024},
eprint={2408.12677},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12677},
}