SHARP: Short-Window Streaming for Accurate and Robust Prediction in Motion Forecasting

Institute of Visual Computing, Graz University of Technology
CVPR 2026

Abstract

In dynamic traffic environments, motion forecasting models must continuously produce accurate estimates of future trajectories. Streaming-based methods are a promising solution, but despite recent advances, their performance often degrades when exposed to heterogeneous observation lengths. To address this, we propose a novel streaming-based motion forecasting framework that explicitly focuses on evolving scenes. Our method incrementally processes incoming observation windows and uses instance-aware context streaming to maintain and update latent agent representations across inference steps. A dual training objective further enables consistent forecasting accuracy across diverse observation horizons. Extensive experiments on Argoverse 2, nuScenes, and Argoverse 1 demonstrate the robustness of our approach under evolving scene conditions as well as on single-agent benchmarks. Our model achieves state-of-the-art performance in streaming inference on the Argoverse 2 multi-agent benchmark while maintaining minimal latency, highlighting its suitability for real-world deployment.

Motivation and Contributions

  • Trajectory prediction is a core component of the autonomous vehicle (AV) control stack, providing hypotheses on the future motions of surrounding agents based on perception outputs.
  • Accurate and robust predictions are critical for safe and efficient motion planning, which enables the AV to anticipate and respond to dynamic behaviors in complex traffic scenarios.
  • In real-world driving, traffic scenes are constantly changing. Agents that have just entered the AV's field of view have been observed only briefly, whereas a more comprehensive motion history is available for other agents.
  • Motion forecasting models must therefore be able to leverage heterogeneous historical observations effectively while operating under real-time constraints in continuously evolving scenes.
  • Challenges
    • Existing benchmarks consider only fixed-size historical and future windows, while in practice, the historical context can range from a few frames up to several seconds.
    • Methods that rely on extensive context typically achieve the most accurate results, but must delay predictions for newly detected agents until enough observations have accumulated.
    • Streaming-based methods are a promising solution, but despite recent advances, their performance often degrades when exposed to heterogeneous observation lengths.
    • To handle the constantly evolving context of real-world traffic scenes, models require the ability to efficiently propagate motion features as long as the agents are visible, a challenge that has not yet been sufficiently addressed.
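The propagation requirement in the last bullet can be pictured with a minimal sketch. This is not the SHARP implementation: the class name `AgentStateCache`, the mean-pooling stand-in encoder, and the blending factor `momentum` are all assumptions chosen for illustration. The sketch only shows the bookkeeping involved: one latent state per agent instance, updated with each short observation window, available immediately for newly detected agents, and dropped once the agent is no longer visible.

```python
from typing import Dict, List

Window = List[List[float]]  # one short observation window: T steps x D features


class AgentStateCache:
    """Hypothetical per-agent latent store; not the paper's streaming module."""

    def __init__(self, momentum: float = 0.5) -> None:
        self.momentum = momentum  # assumed blending factor between old and new context
        self.states: Dict[int, List[float]] = {}

    @staticmethod
    def encode(window: Window) -> List[float]:
        # Stand-in encoder: average each feature over the window's time steps.
        steps = len(window)
        return [sum(column) / steps for column in zip(*window)]

    def update(self, windows: Dict[int, Window]) -> Dict[int, List[float]]:
        """Consume one window per currently visible agent, keyed by instance ID."""
        new_states: Dict[int, List[float]] = {}
        for agent_id, window in windows.items():
            feat = self.encode(window)
            prev = self.states.get(agent_id)
            if prev is None:
                # Newly detected agent: a state exists after a single window.
                new_states[agent_id] = feat
            else:
                # Known agent: blend the carried context with the new window.
                m = self.momentum
                new_states[agent_id] = [m * p + (1 - m) * f for p, f in zip(prev, feat)]
        # Agents absent from this step are dropped from the cache.
        self.states = new_states
        return new_states


cache = AgentStateCache(momentum=0.5)
step1 = cache.update({7: [[0.0, 0.0], [2.0, 4.0]]})
step2 = cache.update({7: [[2.0, 4.0], [4.0, 8.0]], 9: [[1.0, 1.0], [1.0, 1.0]]})
step3 = cache.update({9: [[1.0, 1.0], [3.0, 3.0]]})
```

In this toy run, agent 7 is tracked over two windows, agent 9 appears in the second step and receives a state from a single window, and agent 7 is removed in the third step because it is no longer observed.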

Key Contributions

Novel Streaming Framework: We propose a trajectory forecasting approach that accurately predicts motions across varying observation lengths in continuously evolving scenes.
Instance-Aware Context Streaming: We introduce a new streaming module and a dual training objective to jointly optimize for both long-context and single-chunk predictions.
State-of-the-Art Performance: Our model achieves SOTA results in streaming inference on the Argoverse 2 multi-agent benchmark while maintaining minimal latency for real-world deployment.
Extensive evaluations on AV2, nuScenes, and AV1 show that our method performs well across datasets, providing a lightweight yet effective motion forecasting model.
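One way to picture a dual objective of the kind named above is a weighted sum of two regression terms: one for predictions made with the long streamed context and one for predictions made from a single chunk. The sketch below is an assumption for illustration only. The source does not specify the loss terms, so the displacement error, the weight `alpha`, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def average_displacement(pred: List[Point], target: List[Point]) -> float:
    """Mean Euclidean distance between predicted and ground-truth waypoints."""
    dists = [
        ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
        for (px, py), (tx, ty) in zip(pred, target)
    ]
    return sum(dists) / len(dists)


def dual_objective(
    pred_long: List[Point],
    pred_single: List[Point],
    target: List[Point],
    alpha: float = 0.5,  # assumed trade-off weight, not taken from the paper
) -> float:
    """Hypothetical weighted sum of a long-context and a single-chunk loss term."""
    long_term = average_displacement(pred_long, target)
    single_term = average_displacement(pred_single, target)
    return alpha * long_term + (1 - alpha) * single_term


target = [(0.0, 0.0), (1.0, 0.0)]
pred_long = [(0.0, 0.0), (1.0, 0.0)]    # long-context prediction, exact here
pred_single = [(0.0, 3.0), (1.0, 4.0)]  # single-chunk prediction, off by 3 and 4
loss = dual_objective(pred_long, pred_single, target, alpha=0.5)
```

Optimizing both terms together penalizes a model that is accurate only when a long history is available, which is the behavior the dual objective is meant to discourage.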

Results

AV2 Visualizations

BibTeX

@inproceedings{prutsch2026sharp,
  title={{SHARP: Short-Window Streaming for Accurate and Robust Prediction in Motion Forecasting}},
  author={Prutsch, Alexander and Fruhwirth-Reisinger, Christian and Schinagl, David and Possegger, Horst},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}