Ciao! I am Roberto Amoroso, a Senior Research Engineer at NVIDIA in Munich, Germany 🇩🇪, where I build Multimodal Video Understanding and Vision-Language Model (VLM) systems for large-scale retrieval, with a focus on Autonomous Vehicle applications.
My work centers on VLM-based retrieval architectures that align language with visual signals at scale, enabling users and AI agents to surface relevant moments, objects, and scenes across massive image and video collections.
Currently, I am training VLMs on fleet-scale video data to produce rich scene descriptions and retrieval-quality embeddings. These support both search over large video collections and downstream agentic systems that reason about dynamic scenes, primarily in autonomous driving.
I completed my PhD through the ELLIS program and the International Doctorate in ICT at the AImageLab research group of the University of Modena and Reggio Emilia (UNIMORE) 🇮🇹, under the supervision of Prof. Rita Cucchiara and Prof. Lorenzo Baraldi.
During my PhD, I also completed an internship at LMU (Ludwig-Maximilians-Universität) in Munich, Germany 🇩🇪, working on Multimodal LLMs for Video Question Answering and Open-vocabulary Segmentation under the co-supervision of Prof. Volker Tresp.
Earlier, I was a Research Scholar at the Networking Research Group in Saint Louis, USA 🇺🇸, working on super-resolution techniques applied to Internet traffic matrices.
Beyond my current focus, my past research has also covered the pre-training and optimization of Transformer architectures, image classification, self-supervised learning, detection of synthetic (deepfake) images, and image watermarking.
Feel free to reach out if you have any questions or are simply curious! :)
ELLIS PhD in AI and Computer Vision, 2024
UNIMORE, Italy 🇮🇹 | LMU, Germany 🇩🇪 | NVIDIA, Germany 🇩🇪
MS in Artificial Intelligence, 2020
UNIMORE, Italy 🇮🇹 | AGH, Poland 🇵🇱 | Saint Louis University, USA 🇺🇸
BS in Computer Engineering, 2018
UNIMORE, Italy 🇮🇹
HumanE-AI-NET project, funded by the EU Framework Programme for Research and Innovation Horizon 2020.