Matteo Spanio

CSC lab, building DEI/S

Via Gradenigo 6/A

Padua, Italy

Hi there, I’m Matteo Spanio, a Ph.D. student at the University of Padua. I’m interested in:

🤖 machine learning,
📡 signal processing,
🎹 music.

My Ph.D. project focuses on deep learning methods for music generation based on cross-modal perception.

I’m a member of MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence), the international, unaffiliated, non-profit organisation developing standards for AI-based data coding.

I’m also a musician: I regularly play the clarinet with many orchestras and ensembles, including Orchestra di Padova e del Veneto, Orchestra del Friuli Venezia Giulia, Orchestra San Marco, Concordia Chamber Orchestra and Rossini Ensemble.

To see my work, check out my projects and publications. Time permitting, I also write blog posts.

news

Mar 05, 2025	I am happy to announce the release of our new deep learning model for synesthetic music generation. It is available for download on Hugging Face; learn more about it in our preprint. Enjoy!
Nov 03, 2024	My two new papers, “Towards Emotionally Aware AI: Challenges and Opportunities in the Evolution of Multimodal Generative Models” and “Filming the sound: Anomaly Detection on Audio Tape Recordings using Computer Vision Algorithms”, have been accepted for publication at the 23rd International Conference of the Italian Association for Artificial Intelligence, which will take place in Bozen from 25 to 28 November. See you there!
Aug 18, 2024	Our new paper “A novel derivative-based approach for the automatic detection of time-reversed audio in the MPAI/IEEE-CAE ARP international standard”, by Marina Bosi, Fabio Zanini, Alessandro Russo, Sergio Canazza and me, has been accepted at the AES Show 2024, which will take place in New York from 8 to 10 October 2024. See you in New York!
Jun 01, 2024	I will attend the 7th Advanced Course on Data Science & Machine Learning (ACDL) from 10 to 14 June. See you there!
Sep 11, 2023	Today Dr. Alessandro Russo is presenting our new article “Enhancing Preservation and Restoration of Open Reel Audio Tapes Through Computer Vision” at the International Conference on Image Analysis and Processing (ICIAP 2023) in Udine. See you there!

latest posts

Jun 21, 2024	Python's virtual environments
Nov 10, 2023	Python's Static Typing Safari: In Search of Code Clarity
Aug 30, 2022	Principles of statistics

selected publications

ICIAP
Enhancing Preservation and Restoration of Open Reel Audio Tapes Through Computer Vision

Alessandro Russo, Matteo Spanio, and Sergio Canazza

In Image Analysis and Processing - ICIAP 2023 Workshops, 2024

Analog audio documents inevitably degrade over time, which makes it challenging to preserve their audio content and ensure the integrity of the recordings. Analog document preservation is one of the main research topics of the Centro di Sonologia Computazionale (CSC) of the Department of Information Engineering of the University of Padua, which over the years has developed and implemented a preservation methodology that includes, among other things, video recording the digitization of open-reel tapes to document irregularities on their surface. Together with the corpus of digitized high-quality audio recordings, this led to the creation of an internal archive of video documents. This paper presents a software application that leverages computer vision techniques to automatically detect irregularities on open-reel audio tapes by analyzing the video documents produced during digitization. The software employs frame-by-frame analysis to automatically identify and highlight points of interest that may indicate tape damage, splices, and other irregularities, using the Generalized Hough Transform and SURF algorithms to locate regions of interest within the tape. The proposed software is also part of the MPAI/IEEE-CAE ARP standard developed by Audio Innova s.r.l., a spin-off of the CSC, and may offer a robust and efficient solution for analyzing open-reel audio tapes, supporting archivists and musicologists in their activities.
@inproceedings{10.1007/978-3-031-51026-7_26,
  language  = {en},
  doi       = {10.1007/978-3-031-51026-7_26},
  url       = {https://doi.org/10.1007/978-3-031-51026-7_26},
  author    = {Russo, Alessandro and Spanio, Matteo and Canazza, Sergio},
  editor    = {Foresti, Gian Luca and Fusiello, Andrea and Hancock, Edwin},
  title     = {Enhancing Preservation and Restoration of Open Reel Audio Tapes Through Computer Vision},
  booktitle = {Image Analysis and Processing - ICIAP 2023 Workshops},
  year      = {2024},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {297--308},
  isbn      = {978-3-031-51026-7},
}
AES
A novel derivative-based approach for the automatic detection of time-reversed audio in the MPAI/IEEE-CAE ARP international standard

Marina Bosi, Fabio Zanini, Matteo Spanio, Alessandro Russo, and Sergio Canazza

Journal of the Audio Engineering Society, 2024

The Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Context-based Audio Enhancement (CAE) Audio Recording Preservation (ARP) standard provides the technical specifications of a comprehensive framework for digitizing and preserving analog audio, with a specific focus on documents recorded on open-reel tapes. This paper presents a novel envelope-derivative-based method, designed for integration into the ARP standard, for detecting reversed audio sections during the preservation process. The primary objective of this method is to automatically identify segments of audio recorded in reverse. By leveraging derivative-based signal processing algorithms, the system can detect and re-reverse such sections, thereby reducing errors during the preservation process. This feature not only aids in identifying and correcting errors but also improves the efficiency of large-scale audio archiving projects. The system’s performance was evaluated on a diverse dataset that includes various musical genres and digitized tapes, demonstrating its strong potential and effectiveness across different types of audio content.
@article{bosi2024a,
  language = {en},
  author   = {Bosi, Marina and Zanini, Fabio and Spanio, Matteo and Russo, Alessandro and Canazza, Sergio},
  journal  = {Journal of the Audio Engineering Society},
  title    = {A novel derivative-based approach for the automatic detection of time-reversed audio in the MPAI/IEEE-CAE ARP international standard},
  year     = {2024},
  number   = {10190},
  url      = {https://aes2.org/publications/elibrary-page/?id=22693}
}
arXiv
A Multimodal Symphony: Integrating Taste and Sound through Generative AI

Matteo Spanio, Massimiliano Zampini, Antonio Rodà, and Franco Pierucci

2025

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perception. Building on this foundational research, this article explores multimodal generative models capable of converting taste information into music. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGen) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants’ (n = 111) evaluations, the fine-tuned model produces music that reflects the input taste descriptions more coherently than the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code and pre-trained model at: https://osf.io/xs5jy/.
@misc{spanio2025arxiv,
  language      = {en},
  title         = {A Multimodal Symphony: Integrating Taste and Sound through Generative AI},
  author        = {Spanio, Matteo and Zampini, Massimiliano and Rodà, Antonio and Pierucci, Franco},
  year          = {2025},
  eprint        = {2503.02823},
  archiveprefix = {arXiv},
  primaryclass  = {cs.SD},
  doi           = {10.48550/arXiv.2503.02823},
  url           = {https://arxiv.org/abs/2503.02823},
}