Parallel Prompting

Scalable Parallel Prompting for Complex AV Video Captioning

[CVPRW 2026] We propose pSVLMs, a scalable video captioning framework for autonomous driving datasets. Built on small VLMs, it produces diverse intermediate captions and consolidates them into a single comprehensive description.

April Yang, Roberto Amoroso, Nikita Durasov, Devansh Bisla, Sandipan Kundu, Elmar Haussmann, Ruchi Bhargava, Maying Shen, Nadine Chang, Jose M. Alvarez
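
As a rough illustration of the caption-then-consolidate idea in the summary above, here is a minimal Python sketch. It is not the authors' implementation. The prompts, the `stub_small_vlm` and `stub_consolidate` functions, and the use of a thread pool are all hypothetical placeholders chosen for illustration; the source does not specify which models, prompts, or consolidation method pSVLMs uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical prompts: the source does not list the prompts pSVLMs uses.
PROMPTS = [
    "Describe the road layout.",
    "Describe the other road users.",
    "Describe the ego vehicle's maneuver.",
]


def stub_small_vlm(video_id: str, prompt: str) -> str:
    """Placeholder for a small-VLM call; returns a canned string."""
    return f"[{video_id}] answer to: {prompt}"


def stub_consolidate(captions: List[str]) -> str:
    """Placeholder consolidation: drop exact duplicates, keep order, join."""
    unique = list(dict.fromkeys(captions))
    return " ".join(unique)


def caption_video(
    video_id: str,
    prompts: List[str],
    vlm: Callable[[str, str], str] = stub_small_vlm,
    consolidate: Callable[[List[str]], str] = stub_consolidate,
    max_workers: int = 4,
) -> str:
    """Run every prompt concurrently, then merge the intermediate captions."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the prompts.
        intermediate = list(pool.map(lambda p: vlm(video_id, p), prompts))
    return consolidate(intermediate)


if __name__ == "__main__":
    print(caption_video("clip0", PROMPTS))
```

In this sketch the prompts are independent of one another, so they can be issued concurrently. Swapping the two stub functions for real model calls would leave `caption_video` unchanged.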