
Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

Anonymous

TL;DR: We present GaussianObject, a framework that represents and renders 3D objects with Gaussian splatting and achieves high rendering quality from only 4 input images.

Abstract

Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views contain only very limited 3D information, leading to two significant challenges: 1) multi-view consistency is difficult to build because too few images are available for matching; 2) object information is partially omitted or highly compressed because view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework that represents and renders 3D objects with Gaussian splatting and achieves high rendering quality from only 4 input images. We first introduce visual hull and floater elimination techniques, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. We then construct a Gaussian repair model based on diffusion models to supplement the omitted object information, with which the Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.
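As a rough illustration of the visual hull prior mentioned above, the following minimal sketch keeps only those candidate 3D points whose projections land inside the object mask in every input view. It is not the GaussianObject implementation: the 3x4 projection matrix format, the nested-list masks, and the function names `project`, `inside_mask`, and `carve_visual_hull` are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # assumed 3x4 projection matrix (K [R | t])
Mask = Sequence[Sequence[bool]]     # assumed layout: mask[row][col], True = object pixel


def project(P: Matrix, point: Point) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v); None if behind the camera."""
    x, y, z = point
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def inside_mask(mask: Mask, uv: Optional[Tuple[float, float]]) -> bool:
    """True if the projected pixel lies inside the image and on the object mask."""
    if uv is None:
        return False
    col, row = int(round(uv[0])), int(round(uv[1]))
    in_bounds = 0 <= row < len(mask) and 0 <= col < len(mask[0])
    return in_bounds and bool(mask[row][col])


def carve_visual_hull(
    points: Sequence[Point],
    projections: Sequence[Matrix],
    masks: Sequence[Mask],
) -> List[Point]:
    """Keep the points that fall inside the object mask in every view."""
    return [
        p
        for p in points
        if all(inside_mask(m, project(P, p)) for P, m in zip(projections, masks))
    ]
```

The surviving points could then serve as initial positions for the 3D Gaussians. How densely candidates are sampled and how the remaining Gaussian attributes are initialized are left open here.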

Video

Pipeline

(a) We initialize 3D Gaussians by constructing a visual hull from camera parameters and masked images, optimizing them with \(\mathcal{L}_{\text{gs}}\) and refining them through floater elimination. (b) We use a novel "leave-one-out" strategy and add 3D noise to the Gaussians to generate corrupted Gaussian renderings. These renderings, paired with their corresponding reference images, are used to train the Gaussian repair model with \(\mathcal{L}_{\text{tune}}\). (c) Once trained, the Gaussian repair model is frozen and used to correct views that need to be rectified. These views are identified through distance-aware sampling. The repaired images and reference images are used to further optimize the 3D Gaussians with \(\mathcal{L}_{\text{rep}}\) and \(\mathcal{L}_{\text{gs}}\).
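The data-generation idea in step (b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `leave_one_out_splits` and `add_position_noise` are hypothetical, and perturbing only the Gaussian centers with isotropic Gaussian noise of scale `sigma` is an assumption, since the caption does not say which attributes are perturbed or how.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Center = Tuple[float, float, float]


def leave_one_out_splits(views: Sequence[T]) -> Iterator[Tuple[List[T], T]]:
    """Yield (training_views, held_out_view), holding out each view once.

    Gaussians fitted on the training views can be rendered at the held-out
    pose; that degraded rendering and the held-out reference image then form
    one training pair for the repair model.
    """
    for i, held_out in enumerate(views):
        yield [v for j, v in enumerate(views) if j != i], held_out


def add_position_noise(
    centers: Sequence[Center], sigma: float, rng: random.Random
) -> List[Center]:
    """Perturb Gaussian centers with zero-mean noise of scale sigma (assumed form)."""
    return [
        (
            x + rng.gauss(0.0, sigma),
            y + rng.gauss(0.0, sigma),
            z + rng.gauss(0.0, sigma),
        )
        for x, y, z in centers
    ]
```

With 4 input views, `leave_one_out_splits` yields 4 splits, each training on 3 views and holding out the fourth. Rendering the noisy Gaussians at the reference poses would supply further corrupted images for the same reference targets.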

More Results

Generated with only four input images.


Ablation

Evolution of the reconstructed 3D objects: with Gaussian splatting only, with structure priors added, and with the Gaussian repair model added.