Roberto Amoroso
Parallel Prompting
Scalable Parallel Prompting for Complex AV Video Captioning
[ CVPRW 2026 ]
We propose pSVLMs, a scalable video captioning framework based on small VLMs that produces diverse intermediate captions, which are consolidated into a comprehensive unified description for autonomous driving datasets.
April Yang, Roberto Amoroso, Nikita Durasov, Devansh Bisla, Sandipan Kundu, Elmar Haussmann, Ruchi Bhargava, Maying Shen, Nadine Chang, Jose M. Alvarez