Google and Waymo Introduce Block-NeRF to Enable Large-Scale Scene Reconstruction

Researchers from UC Berkeley, Waymo, and Google Research have proposed Block-NeRF, a grid-based variant of Neural Radiance Fields (NeRF) designed to represent large-scale environments. In the paper Block-NeRF: Scalable Large Scene Neural View Synthesis, the researchers demonstrated that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is critical to decompose the scene into individually trained NeRFs.

Block-NeRF builds on NeRF and the recently introduced mip-NeRF extension, a multiscale representation for anti-aliasing neural radiance fields that reduces the aliasing issues hurting NeRF's performance in scenes where the input images observe the scene from many different distances. The team also incorporates techniques from NeRF in the Wild (NeRF-W), which adds a latent code per training image to handle inconsistent scene appearance when applying NeRF to landmarks from the Photo Tourism dataset. Block-NeRF can thus combine multiple NeRFs to reconstruct a coherent large-scale environment from millions of images.
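
NeRF-W's core idea is to attach a learned appearance latent to each training image and feed it into the color branch of the network, so per-capture variations in lighting, weather, and exposure can be absorbed without corrupting the shared geometry. Below is a minimal PyTorch sketch of that conditioning; the class, layer sizes, and names such as `appearance_embedding` are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class AppearanceConditionedColorHead(nn.Module):
    """Toy NeRF color head conditioned on a per-image appearance embedding,
    in the spirit of NeRF-W. Dimensions are illustrative, not the paper's."""

    def __init__(self, num_images: int, embed_dim: int = 32, feat_dim: int = 256):
        super().__init__()
        # One learned latent per training image to absorb appearance variation.
        self.appearance_embedding = nn.Embedding(num_images, embed_dim)
        self.color_head = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, features: torch.Tensor, image_ids: torch.Tensor) -> torch.Tensor:
        # features: (N, feat_dim) per-sample features from the NeRF trunk
        # image_ids: (N,) index of the training image each ray came from
        emb = self.appearance_embedding(image_ids)
        return self.color_head(torch.cat([features, emb], dim=-1))
```

At render time, the embedding input can be fixed or interpolated between learned codes, which is what lets Block-NeRF control the appearance of novel views.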

Block-NeRF is a variant of Neural Radiance Fields that can represent large-scale environments. Decomposing a city-scale scene into individually trained NeRFs decouples rendering time from scene size, allows rendering to scale to arbitrarily large environments, and enables per-block updates to the environment. The team adopted several architectural changes to make NeRF robust to data captured over many months under varying environmental conditions. They added appearance embeddings, learned pose refinement, and controllable exposure to each individual NeRF, and introduced a procedure for aligning appearance between adjacent NeRFs so they can be combined seamlessly, as sketched below.
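
To combine blocks, Block-NeRF merges the outputs of neighboring NeRFs in image space using inverse distance weighting between the camera position and the block centers, so that blocks closer to the camera dominate the final render. The following is a minimal NumPy sketch of that blending step; the function name, array shapes, and the interpolation power are illustrative assumptions.

```python
import numpy as np


def composite_block_renders(cam_pos, block_centers, block_rgbs, power=4.0):
    """Blend per-block renders with inverse-distance weighting.

    cam_pos:       (3,) camera position
    block_centers: (B, 3) centers of the NeRF blocks that were rendered
    block_rgbs:    (B, H, W, 3) one rendered image per visible block
    """
    dists = np.linalg.norm(block_centers - cam_pos, axis=-1)  # (B,)
    dists = np.maximum(dists, 1e-6)        # avoid division by zero
    weights = dists ** -power              # closer blocks get larger weights
    weights = weights / weights.sum()
    # Weighted average in image space, one scalar weight per block.
    return np.tensordot(weights, block_rgbs, axes=1)  # (H, W, 3)
```

In the full system, only blocks within a set radius of the camera are rendered at all, and a learned visibility prediction further filters out blocks that would contribute little to the final image.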

The researchers used San Francisco's Alamo Square neighborhood as the target area and the city's Mission Bay district as a baseline. The training dataset was derived from 13.4 hours of driving time across 1,330 separate data collection runs, for a total of 2,818,745 training images.