Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
Abstract
Autoregressive (AR) architectures have achieved significant success in large language models (LLMs), inspiring their exploration for video generation. In LLMs, top-
Motivation
We study why static top-$k$/top-$p$ decoding breaks in autoregressive video generation and propose Entropy-Guided k-Guard (ENkG), a training-free, model-agnostic sampler that adapts token candidate sets to predictive entropy to mitigate error accumulation and entropy collapse.
F1. Flat token distributions
Video tokens have low semantic density and high spatio-temporal redundancy, producing flatter predictive distributions than text tokens. A fixed candidate size (static top-$k$/top-$p$) is brittle under such distributions and amplifies early mistakes over long rollouts.
F2. Entropy aligns with structure
High-entropy regions correspond to repeating textures (sky/foliage/road), while low-entropy regions cluster around structured content (boundaries, edges, lane markings). Static truncation ignores this structure.
F3. Entropy collapse in long horizons
As generation proceeds, low-entropy regions expand and frame-average entropy drops, causing texture wash-out, over-smoothing, and degenerate dynamics.
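Since the page does not give the paper's exact entropy-to-$k$ schedule, the following is a minimal NumPy sketch of the idea behind ENkG: widen the top-$k$ candidate set when predictive entropy is high (F1/F2), and let a k-guard floor preserve minimal exploration when entropy collapses (F3). The linear schedule and the `k_min`/`k_max` values are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def entropy_guarded_topk(logits, k_min=4, k_max=64, rng=None):
    """Illustrative entropy-adaptive top-k sampling with a k-guard floor.

    Returns (sampled_token_id, k_used). The entropy->k mapping and the
    default hyperparameters are assumptions for this sketch.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax over the vocabulary.
    z = logits - logits.max()
    p = np.exp(z)
    p /= p.sum()
    # Predictive entropy, normalised to [0, 1] by log|V|.
    h = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
    # Flatter (high-entropy) distributions get a larger candidate set;
    # the k-guard clamps k to at least k_min even when entropy collapses.
    k = int(round(k_min + (k_max - k_min) * h))
    k = max(k_min, min(k, len(p)))
    # Sample from the renormalised top-k candidates.
    top = np.argpartition(p, -k)[-k:]
    q = p[top] / p[top].sum()
    return int(rng.choice(top, p=q)), k
```

On a flat distribution this selects `k_max` candidates, while a near-deterministic distribution is clamped to the `k_min` floor rather than collapsing to greedy decoding.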
Ablation Study
Impact of Entropy-Adaptive Guidance
Removing entropy guidance leads to texture decay and color shift, consistent with the entropy-collapse dynamics described above.
Without Entropy Guidance
With Entropy Guidance (Ours)
Impact of k-Guard Design
The k-guard enforces minimal exploration even in low-entropy regimes, mitigating degenerate static rollouts and frame-freezing.
Without k-Guard
With k-Guard (Ours)
BibTeX
@misc{han2026entropyguidedkguardsamplinglonghorizon,
  title={Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation},
  author={Yizhao Han and Tianxing Shi and Zhao Wang and Zifan Xu and Zhiyuan Pu and Mingxiao Li and Qian Zhang and Wei Yin and Xiao-Xiao Long},
  year={2026},
  eprint={2601.19488},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.19488},
}