Traditional volumetric fusion algorithms preserve the spatial structure of 3D scenes, which is beneficial for many tasks in computer vision and robotics. However, they often lack realism in terms of visualization. Emerging 3D Gaussian splatting bridges this gap, but existing Gaussian-based reconstruction methods often suffer from artifacts and inconsistencies with the underlying 3D structure, and they struggle to optimize in real time, so they cannot provide users with immediate, high-quality feedback. One of the bottlenecks is the massive number of Gaussian parameters that must be updated during optimization. Instead of using 3D Gaussians as a standalone map representation, we incorporate them into a volumetric mapping system to exploit its geometric information, and we propose a quadtree data structure on images to drastically reduce the number of splats initialized. In this way, we simultaneously generate a compact 3D Gaussian map with fewer artifacts and a volumetric map on the fly. Our method, GSFusion, significantly improves computational efficiency without sacrificing rendering quality, as demonstrated on both synthetic and real datasets.
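To make the quadtree idea concrete, here is a minimal, self-contained sketch (not the released implementation) of contrast-driven quadtree subdivision of an image: a quadrant is split only while its intensity contrast stays above a threshold, so uniform regions are covered by a few large quadrants and therefore spawn only a few splats. The function name, the contrast measure, and the `contrast_thr`/`min_size` parameters are illustrative assumptions.

```python
# Sketch only: contrast-driven quadtree subdivision of a grayscale image.
import numpy as np

def quadtree_cells(gray, x, y, w, h, contrast_thr=0.05, min_size=8):
    """Return (x, y, w, h) leaf quadrants of a contrast-driven quadtree."""
    patch = gray[y:y + h, x:x + w]
    contrast = float(patch.max() - patch.min())   # simple contrast proxy
    if contrast <= contrast_thr or min(w, h) <= min_size:
        return [(x, y, w, h)]                     # homogeneous or tiny: stop
    hw, hh = w // 2, h // 2                       # otherwise split into four
    cells = []
    for qx, qy, qw, qh in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                           (x, y + hh, hw, h - hh),
                           (x + hw, y + hh, w - hw, h - hh)]:
        cells += quadtree_cells(gray, qx, qy, qw, qh, contrast_thr, min_size)
    return cells

# Usage: one candidate Gaussian per leaf quadrant instead of one per pixel.
rgb = np.random.rand(480, 640, 3).astype(np.float32)   # stand-in input frame
gray = rgb.mean(axis=2)
leaves = quadtree_cells(gray, 0, 0, gray.shape[1], gray.shape[0])
print(f"{len(leaves)} leaf quadrants -> at most {len(leaves)} new Gaussians")
```

In textured regions the recursion bottoms out at small quadrants, while flat areas such as walls and floors are covered by a handful of large leaves, which is where most of the reduction in splat count comes from.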
At each time step, our GSFusion takes a pair of RGB and depth images as input. The depth data is fused into an octree-based TSDF grid to capture the geometric structure, while the RGB image is segmented with a quadtree based on image contrast. A new 3D Gaussian is then initialized at the back-projected center of a quadrant only if no Gaussian is already anchored nearby, which is determined by checking its nearest voxel. Gaussian parameters are optimized on the fly by minimizing the photometric loss between the rendered image and the input RGB image. Additionally, we maintain a keyframe set to counteract the forgetting problem. After scanning, the system provides both a volumetric map and a 3D Gaussian map for subsequent tasks.
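The following sketch illustrates the seeding step described above under assumed data layouts rather than GSFusion's actual API: each quadtree leaf is back-projected through the depth map, and a new Gaussian is created only if the voxel containing that point does not already reference a splat. The intrinsics, the `voxel_size`, the dictionary-based voxel lookup, and the per-splat parameter dictionary are all assumptions for illustration; in the real system the adjacency check is performed against the octree-based TSDF map, and the point would be transformed into the world frame using the camera pose.

```python
# Sketch only: seed Gaussians at back-projected quadrant centres with a voxel check.
import numpy as np

fx = fy = 600.0                        # assumed pinhole intrinsics
cx, cy = 320.0, 240.0
voxel_size = 0.05                      # metres, matching a coarse map resolution
voxel_to_gaussian = {}                 # voxel key -> index of an existing splat
gaussians = []                         # list of per-splat parameter dicts

def backproject(u, v, z):
    """Back-project pixel (u, v) with depth z into the camera frame."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def maybe_add_gaussian(leaf, depth, rgb):
    """Seed one Gaussian at the centre of a quadtree leaf, unless one exists nearby."""
    x, y, w, h = leaf
    u, v = x + w // 2, y + h // 2               # quadrant centre in pixels
    z = float(depth[v, u])
    if z <= 0.0:                                # invalid depth reading: skip
        return
    p = backproject(u, v, z)                    # in practice: transform to world frame
    key = tuple((p // voxel_size).astype(int))  # voxel containing the point
    if key in voxel_to_gaussian:                # a splat is already anchored here
        return
    gaussians.append({
        "mean": p,
        "color": rgb[v, u].copy(),
        "scale": np.full(3, w * z / fx),        # footprint of the quadrant at depth z
        "opacity": 0.5,
    })
    voxel_to_gaussian[key] = len(gaussians) - 1

# Usage with synthetic data: one 32x32 quadrant centred in a 640x480 frame.
depth = np.full((480, 640), 2.0, dtype=np.float32)
rgb = np.zeros((480, 640, 3), dtype=np.float32)
maybe_add_gaussian((304, 224, 32, 32), depth, rgb)
print(len(gaussians), "Gaussian(s) initialized")
```

In the full system, the splats seeded this way are then optimized online against the input RGB with the photometric loss mentioned above, and keyframes are periodically revisited during optimization so that earlier parts of the scene are not forgotten.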
We compare our method with SplaTAM and RTG-SLAM, showing rendered images from both training and novel views on the ScanNet++ dataset.
Here we show the reconstructed 3D Gaussian models of the room0 scene from the synthetic Replica dataset and the 8b5caf3398 scene from the real-world ScanNet++ dataset.
room0 (Replica):
  SplaTAM: Mapping 0.13 fps, Model size 333.9 MB, Training view PSNR 33.90
  RTG-SLAM: Mapping 7.64 fps, Model size 66.1 MB, Training view PSNR 29.01
  GSFusion (Ours): Mapping 7.90 fps, Model size 58.1 MB, Training view PSNR 33.95

8b5caf3398 (ScanNet++):
  SplaTAM: Mapping 0.17 fps, Model size 244.6 MB, Training view PSNR 25.75, Novel view PSNR 22.76
  RTG-SLAM: Mapping 1.18 fps, Model size 118.4 MB, Training view PSNR 18.16, Novel view PSNR 18.52
  GSFusion (Ours): Mapping 6.66 fps, Model size 27.6 MB, Training view PSNR 29.63, Novel view PSNR 25.35
We collected real-world data using our drone equipped with a RealSense D455 camera.
All poses are computed by OKVIS2, a real-time visual-inertial SLAM system with loop closure.
Despite noisy depth maps, GSFusion still provides realistic renderings from novel viewpoints, as well as an accurate geometric map for downstream robotics tasks such as navigation and path planning.
Reconstructed Scene
Extracted Mesh
This work was supported by the EU project AUTOASSESS. The authors would like to thank Simon Boche and Sebastián Barbas Laina for their assistance in collecting and processing drone data. We also extend our gratitude to Sotiris Papatheodorou for his valuable discussions and support with the Supereight2 software.
@misc{wei2024gsfusiononlinergbdmapping,
title={GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion},
author={Jiaxin Wei and Stefan Leutenegger},
year={2024},
eprint={2408.12677},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12677},
}