ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

FAIR, Meta
*Equal contribution
To Appear at CVPR 2024

Abstract

Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires an accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on decent initial poses, which they then refine. Here we aim to remove the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON assumes only smooth camera motion to estimate an initial guess for the poses. Further, ICON introduces "confidence": an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn NeRF, and on high-confidence 3D structure (as encoded by NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance on both CO3D and HO3D compared with methods that use SfM poses.
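To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of a smooth-motion pose guess and confidence-based gradient reweighting. It is an illustration only, not the ICON implementation: the function names, the copy-the-previous-pose rule, the exponential mapping from photometric error to confidence, and the `temperature` parameter are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def init_next_pose(poses):
    """Smooth-motion guess (assumed rule): start the new frame at the previous pose."""
    return poses[-1].copy()

def confidence_from_error(errors, temperature=0.1):
    """Map per-frame photometric error to a confidence in (0, 1].

    Hypothetical form: low error gives confidence near 1, high error near 0.
    """
    errors = np.asarray(errors, dtype=float)
    return np.exp(-errors / temperature)

def reweight_gradients(per_frame_grads, confidences):
    """Combine per-frame gradients, weighting each frame by its normalized confidence."""
    grads = np.asarray(per_frame_grads, dtype=float)   # shape (num_frames, num_params)
    conf = np.asarray(confidences, dtype=float)        # shape (num_frames,)
    weights = conf / conf.sum()
    return (weights[:, None] * grads).sum(axis=0)

# Usage: two registered frames, the second one poorly explained by the current model.
poses = [np.eye(4), np.eye(4)]
new_pose = init_next_pose(poses)                 # initial guess for the third frame
conf = confidence_from_error([0.01, 0.5])        # second frame gets low confidence
grad = reweight_gradients([[1.0, 0.0], [0.0, 1.0]], conf)
```

In this sketch a frame with high photometric error contributes little to the combined gradient, which is one simple way to let well-registered views dominate the update.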

Applications in-the-wild

Input recordings

The input is a single-view monocular GoPro video of a tennis swing demo.

Racket pose

ICON predicts accurate racket poses

View synthesis

ICON completes the out-of-view parts of the racket.

Mesh Extraction

ICON turns the racket into a 3D mesh asset.

Common failure modes of existing methods

Bas Relief

Pose Estimation vs. GT

The trajectory is flipped by 180 degrees about the z-axis.

Reconstruction

A concave apple embedded inside the table.

Fragmentation

Pose Estimation vs. GT

Pose and NeRF break apart, producing separate, mutually invisible radiance fields.

Reconstruction

A tube of stacked toy trucks that the camera flies through like a flipbook.

Overlapping Registration

Pose Estimation vs. GT

Two subsets of the pose trajectory are trapped in a local minimum, incorrectly observing the same part of the radiance field, leading to blurry rendering and empty voxels.

Reconstruction

One side of the toaster is blurry due to overlapping views, while the other has no views and is vacant.

Other Pose-free NeRFs

BibTeX

@article{wang2024icon,
      title={ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization},
      author={Wang, Weiyao and Gleize, Pierre and Tang, Hao and Chen, Xingyu and Liang, Kevin J and Feiszli, Matt},
      journal={CVPR},
      year={2024}
}

ICON 3D-fies (pose + shape + texture) objects from any video without depending on depth or pose.
