
Multimodal Attentive Deep Learning Architectures for Visual-Semantic Understanding

PhD thesis investigating key challenges in multimodal attentive architectures: improving semantic segmentation accuracy, enabling open-vocabulary segmentation, and advancing video question answering.

Roberto Amoroso
