Ciao! I am Roberto Amoroso, a Research Engineer at NVIDIA in Munich, Germany 🇩🇪, where I build Multimodal Video Understanding and Vision-Language Model (VLM) systems for large-scale text–image and text–video retrieval, with a focus on Autonomous Vehicle applications.
My work focuses on retrieval architectures that align language with visual signals at scale, enabling users and systems to find relevant moments, objects, and scenes in large image and video collections.
I genuinely enjoy turning research ideas into practical, high-impact solutions.
I completed my PhD through the ELLIS program and the International Doctorate in ICT at the AImageLab research group of the University of Modena and Reggio Emilia (UNIMORE) 🇮🇹, under the supervision of Prof. Rita Cucchiara and Prof. Lorenzo Baraldi.
During my PhD, I also completed an internship at LMU (Ludwig-Maximilians-Universität) in Munich, Germany 🇩🇪, focusing on Multimodal LLMs for Video Question Answering and Open-vocabulary Segmentation, under the co-supervision of Prof. Volker Tresp.
I was also a Research Scholar at the Networking Research Group in Saint Louis, USA 🇺🇸, working on Super-resolution techniques applied to Internet traffic matrices.
My primary research areas are Multimodal Video Understanding and Vision-Language Models for information retrieval, with a focus on MLLM-based text-to-visual retrieval architectures for both images and videos. I have also conducted research on the pre-training and optimization of Transformer-based architectures for image classification, as well as on open-vocabulary segmentation, self-supervised learning, deepfake detection of synthetic images, and the development of image watermarking systems.
Feel free to reach out to me if you have any questions or curiosities! :)
ELLIS PhD in AI and Computer Vision, 2024
UNIMORE, Italy 🇮🇹 | LMU, Germany 🇩🇪 | NVIDIA, Germany 🇩🇪
MS in Artificial Intelligence, 2020
UNIMORE, Italy 🇮🇹 | AGH, Poland 🇵🇱 | Saint Louis University, USA 🇺🇸
BS in Computer Engineering, 2018
UNIMORE, Italy 🇮🇹
HumanE-AI-NET project, funded by the EU Framework Programme for Research and Innovation Horizon 2020.