ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

FAIR, Meta
*Equal contribution
To Appear at CVPR 2024

ICON 3D-fies objects (pose + shape + texture) from ANY video without depending on depth or pose.

Shape and pose evolution over time.

Abstract

Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires an accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they often still rely on decent initial poses that they can refine. Here we aim to remove the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON assumes only smooth camera motion to estimate an initial guess for poses. Further, ICON introduces "confidence": an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn NeRF, and on high-confidence 3D structure (as encoded by NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance on both CO3D and HO3D compared with methods that use SfM poses.
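The two ideas in the abstract, a smooth-motion initial guess and confidence-based gradient reweighting, can be illustrated with a minimal sketch. This is not the paper's implementation: the `Frame` container, the 6-parameter pose vector, the copy-the-previous-pose initialization, and the error-to-confidence mapping are all assumptions chosen for illustration, since this page does not specify how ICON computes or applies confidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pose: List[float]        # hypothetical 6-DoF pose parameters
    confidence: float = 0.0  # hypothetical per-frame confidence in (0, 1]


def init_next_pose(frames: List[Frame]) -> List[float]:
    """Smooth-motion assumption: start a new frame at the previous frame's pose."""
    if not frames:
        return [0.0] * 6  # first frame fixes the reference coordinate system
    return list(frames[-1].pose)  # copy, so later updates do not alias


def confidence_from_error(error: float, scale: float = 1.0) -> float:
    """Assumed mapping: lower photometric error gives higher confidence."""
    return 1.0 / (1.0 + error / scale)


def reweight(grads: List[List[float]], confidences: List[float]) -> List[List[float]]:
    """Scale each frame's gradient by that frame's confidence weight."""
    return [[c * g for g in grad] for grad, c in zip(grads, confidences)]


# Usage: two registered frames, then a third one arrives.
frames = [Frame([0.0] * 6, confidence=1.0), Frame([0.1] * 6, confidence=0.5)]
new_pose = init_next_pose(frames)  # initialized from the second frame's pose
scaled = reweight([[1.0, 2.0], [4.0, 4.0]], [f.confidence for f in frames])
```

In this sketch a low-confidence frame contributes a proportionally smaller gradient, so poorly registered frames have less influence on the shared model while well-registered frames drive the update.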

Applications in-the-wild

Input recordings

Take a single-view, monocular GoPro video of a tennis swing demo

Racket pose

ICON predicts accurate racket poses

View synthesis

Completes the out-of-view part of the racket

Mesh Extraction

Turns the racket into a 3D mesh asset

Common failure modes of existing methods

Bas-Relief

Pose Estimation vs. GT

An inverted trajectory

The trajectory is inverted by 180 degrees about the z-axis.

Reconstruction

A concave apple inside the table.

Fragmentation

Pose Estimation vs. GT

A fragmented trajectory

Pose and NeRF break apart, producing separate, mutually invisible radiance fields.

Reconstruction

A tube of stacked toy trucks that the camera flies through like a flipbook.

Overlapping Registration

Pose Estimation vs. GT

An overlapping trajectory

Two subsets of the pose trajectory are trapped in a local minimum, incorrectly observing the same part of the radiance field, leading to blurry rendering and empty voxels.

Reconstruction

One side of the toaster is blurry due to overlapping views, while the other has no views and is vacant.

Other Pose-free NeRFs

Failures of other methods

BibTeX

@article{wang2024icon,
  title={ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization},
  author={Wang, Weiyao and Gleize, Pierre and Tang, Hao and Chen, Xingyu and Liang, Kevin J and Feiszli, Matt},
  journal={CVPR},
  year={2024}
}