Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In class-agnostic video segmentation, poor temporal consistency and ineffective retention of previously seen object instances remain critical challenges. To address these issues, this paper introduces an explicit memory mechanism based on online 3D Gaussian Splatting (3DGS). The 3DGS representation serves as a dynamically updatable and queryable structured memory module and is integrated into both the FastSAM and SAM2 frameworks, yielding two models: FastSAM-Splat and SAM2-Splat. A geometry-aware point-cloud fusion strategy explicitly injects spatial and appearance cues from historical object segments into the current frame’s segmentation process. Evaluated on both real-world and synthetic video datasets, the approach improves segmentation accuracy and temporal consistency over memory-free baselines and over methods relying solely on implicit memory. These results support the effectiveness and generalizability of explicit 3D geometric memory for class-agnostic video understanding.

📝 Abstract

Remembering where object segments were predicted in the past is useful for improving the accuracy and consistency of class-agnostic video segmentation algorithms. Existing video segmentation algorithms typically either use no object-level memory (e.g., FastSAM) or use implicit memories in the form of recurrent neural network features (e.g., SAM2). In this paper, we augment both types of segmentation models using an explicit 3D memory and show that the resulting models have more accurate and consistent predictions. For this, we develop an online 3D Gaussian Splatting (3DGS) technique to store predicted object-level segments generated throughout the duration of a video. Based on this 3DGS representation, we develop a set of fusion techniques, named FastSAM-Splat and SAM2-Splat, that use the explicit 3DGS memory to improve their respective foundation models' predictions. Ablation experiments are used to validate the proposed techniques' design and hyperparameter settings. Results from both real-world and simulated benchmarking experiments show that models that use explicit 3D memories produce more accurate and consistent predictions than those that use no memory or only implicit neural network memories. Project Page: https://topipari.com/projects/FastSAM-Splat/

Problem

Research questions and friction points this paper is trying to address.

Improving class-agnostic video segmentation accuracy through explicit 3D memory

Storing predicted object segments using an online 3D Gaussian Splatting technique

Enhancing segmentation consistency by augmenting models that have no memory or only implicit neural network memories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online 3D Gaussian Splatting stores object segments

Explicit 3D memory improves segmentation accuracy

Fusion techniques enhance foundation models' predictions
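
The explicit-memory idea summarized above can be illustrated with a deliberately simplified sketch. It is not the authors' FastSAM-Splat or SAM2-Splat fusion: labeled 3D points stand in for Gaussians, the camera pose and pinhole intrinsics are assumed to be known, and the function names (render_memory_labels, relabel_with_memory) and parameters (next_id, min_overlap) are hypothetical. The sketch renders remembered segment IDs into the current view, then reuses those IDs for current-frame segments that overlap them, which is one plausible way an explicit 3D memory can keep labels consistent across frames.

```python
import numpy as np

def render_memory_labels(points_world, labels, K, T_cam_world, hw):
    """Project labeled 3D points (a stand-in for Gaussians) into an ID map.

    Assumes a pinhole camera with intrinsics K (3x3) and a known
    world-to-camera transform T_cam_world (4x4). Label 0 means "no memory".
    """
    h, w = hw
    # World -> camera frame using homogeneous coordinates.
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 1e-6
    pts_cam, labels = pts_cam[in_front], labels[in_front]
    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    flat = v[inside] * w + u[inside]
    z, labels = pts_cam[inside, 2], labels[inside]
    # Z-buffer: sort near-to-far, keep the first (nearest) point per pixel.
    order = np.argsort(z, kind="stable")
    pixels, first = np.unique(flat[order], return_index=True)
    id_flat = np.zeros(h * w, dtype=int)
    id_flat[pixels] = labels[order][first]
    return id_flat.reshape(h, w)

def relabel_with_memory(current_segments, memory_ids, next_id, min_overlap=0.5):
    """Give each current-frame segment the memory ID that covers most of it.

    Segments without enough memory support receive fresh IDs starting at
    next_id (hypothetical bookkeeping, chosen by the caller).
    """
    fused = np.zeros_like(current_segments)
    for seg in np.unique(current_segments):
        if seg == 0:  # 0 = background / unsegmented
            continue
        mask = current_segments == seg
        ids, counts = np.unique(memory_ids[mask], return_counts=True)
        keep = ids != 0
        if keep.any() and counts[keep].max() / mask.sum() >= min_overlap:
            fused[mask] = ids[keep][counts[keep].argmax()]
        else:
            fused[mask] = next_id
            next_id += 1
    return fused
```

In use, render_memory_labels would be called once per frame with that frame's pose, and its output passed to relabel_with_memory together with the per-frame segment map from the base model. A real 3DGS memory would additionally splat Gaussians with opacity and appearance rather than single points.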

🔎 Similar Papers

Segment Any 3D Gaussians