TR2026-055

LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction


    •  Ding, T., Xie, Y., Liang, Y., Chatterjee, M., Miraldo, P., Jiang, H., "LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2026.
      BibTeX TR2026-055 PDF
      • @inproceedings{Ding2026may,
      • author = {Ding, Tianye and Xie, Yiming and Liang, Yiqing and Chatterjee, Moitreya and Miraldo, Pedro and Jiang, Huaizu},
      • title = {{LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction}},
      • booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      • year = 2026,
      • month = may,
      • url = {https://www.merl.com/publications/TR2026-055}
      • }
  • MERL Contacts:
  • Research Areas:

    Artificial Intelligence, Computer Vision, Machine Learning

Abstract:

Recent feed-forward reconstruction models like VGGT and p3 achieve impressive reconstruction quality but cannot process streaming videos due to quadratic memory complexity, limiting their practical deployment. While existing streaming methods address this through learned memory mechanisms or causal attention, they require extensive retraining and may not fully leverage the strong geometric priors of state-of-the-art offline models. We pro- pose LASER, a training-free framework that converts an offline reconstruction model into a streaming system by aligning predictions across consecutive temporal windows. We observe that simple similarity transformation (Sim(3)) alignment fails due to layer depth misalignment: monocular scale ambiguity causes relative depth scales of different scene layers to vary inconsistently between windows. To address this, we introduce layer-wise scale alignment, which segments depth predictions into discrete layers, computes per-layer scale factors, and propagates them across both adjacent windows and timestamps. Extensive experiments show that LASER achieves state-of-the-art performance on camera pose estimation and point map reconstruction while operating at 14 FPS with 6 GB peak memory on a RTX A6000 GPU, enabling practical deployment for kilometer-scale streaming videos

 

  • Related Publication

  •  Ding, T., Xie, Y., Liang, Y., Chatterjee, M., Miraldo, P., Jiang, H., "LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction", arXiv, December 2025.
    BibTeX arXiv
    • @article{Ding2025dec,
    • author = {Ding, Tianye and Xie, Yiming and Liang, Yiqing and Chatterjee, Moitreya and Miraldo, Pedro and Jiang, Huaizu},
    • title = {{LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction}},
    • journal = {arXiv},
    • year = 2025,
    • month = dec,
    • url = {https://arxiv.org/abs/2512.13680}
    • }