Analog audio documents inevitably degrade over time, which makes it challenging to preserve their audio content and ensure the integrity of the recordings. The preservation of analog documents is one of the main research topics of the Centro di Sonologia Computazionale (CSC) of the Department of Information Engineering of the University of Padua, which over the years has developed and implemented a preservation methodology that includes, among other things, video recording the digitization of open-reel tapes in order to document irregularities on their surface. Together with the corpus of high-quality digitized audio recordings, this led to the creation of an internal archive of video documents. This paper presents a software application that uses computer vision techniques to automatically detect Irregularities on open-reel audio tapes by analyzing the video documents produced during digitization. The software analyzes the video frame by frame to identify and highlight points of interest that may indicate tape damage, splices, and other Irregularities, and it uses the Generalized Hough Transform and SURF algorithms to locate regions of interest on the tape. The proposed software is also part of the MPAI/IEEE-CAE ARP standard developed by Audio Innova s.r.l., a spin-off of the CSC, and it may offer a robust and efficient solution for analyzing open-reel audio tapes, supporting archivists and musicologists in their activities.
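As a rough illustration of what a frame-by-frame analysis can look like, the sketch below flags frames that differ strongly from the previous one. It is not the Generalized Hough Transform and SURF pipeline described in the abstract: plain frame differencing is swapped in for brevity, and the function name `flag_changed_frames`, the threshold value, and the synthetic clip are all assumptions made for this example.

```python
import numpy as np

def flag_changed_frames(frames, threshold=10.0):
    """Return indices of frames whose mean absolute pixel difference
    from the previous frame exceeds `threshold` (hypothetical value)."""
    frames = np.asarray(frames, dtype=float)
    # Mean absolute difference between each frame and its predecessor.
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=(1, 2))
    # diffs[i] compares frame i+1 with frame i, hence the +1 offset.
    return [int(i) + 1 for i in np.flatnonzero(diffs > threshold)]

# Synthetic grayscale clip (assumption): six uniform 32x32 frames,
# with a bright patch on frame 3 standing in for a splice passing by.
clip = np.full((6, 32, 32), 100.0)
clip[3, 8:24, 8:24] = 255.0

flagged = flag_changed_frames(clip)
print(flagged)
```

Both the frame where the patch appears and the frame where it disappears are flagged, since each differs from its predecessor. A real detector would then need to localize and classify the region, which is where template or feature matching would come in.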
The Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Context-based Audio Enhancement (CAE) Audio Recording Preservation (ARP) standard provides the technical specifications for a comprehensive framework for digitizing and preserving analog audio, specifically focusing on documents recorded on open-reel tapes. This paper presents a novel envelope derivative-based method designed to be integrated into the ARP standard, for detecting reverse audio sections during the preservation process. The primary objective of this method is to automatically identify segments of audio recorded in reverse. Leveraging derivative-based signal processing algorithms, the system enhances its capability to detect and reverse such sections, thereby reducing errors during the preservation process. This feature not only aids in identifying and correcting errors but also enhances the efficiency of large-scale audio document archiving projects. The system’s performance was evaluated using a diverse dataset that includes various musical genres and digitized tapes, demonstrating its strong potential and effectiveness across different types of audio content.
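One way to picture an envelope derivative-based check is sketched below. It rests on an assumption not stated in the abstract: many natural sounds have a fast attack and a slow decay, so the derivative of their amplitude envelope is positively skewed, while playing the same audio backwards flips the sign of that skew. The function names, the frame size, and the synthetic test note are hypothetical and are not taken from the paper.

```python
import numpy as np

def rms_envelope(signal, frame=256):
    """Coarse amplitude envelope: RMS over non-overlapping frames."""
    x = np.asarray(signal, dtype=float)
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def derivative_skewness(signal, frame=256):
    """Skewness of the frame-to-frame envelope difference."""
    d = np.diff(rms_envelope(signal, frame))
    d = d - d.mean()
    std = d.std()
    if std == 0.0:
        return 0.0
    return float(np.mean(d ** 3) / std ** 3)

def looks_reversed(signal, frame=256):
    """Heuristic (assumption): negative skew suggests reversed audio."""
    return derivative_skewness(signal, frame) < 0.0

# Synthetic plucked-style note (assumption): sharp attack, exponential decay.
sr = 8192
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 440 * t) * np.exp(-6 * t)
silence = np.zeros(1024)
forward = np.concatenate([silence, note, silence])
backward = forward[::-1]

print(looks_reversed(forward), looks_reversed(backward))
```

A single skewness threshold is only a toy decision rule. Sustained or slowly swelling material would not show this asymmetry, which is presumably why an evaluation across genres matters.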
IEEE2024
From Tape to Code: An International AI-Based Standard for Audio Cultural Heritage Preservation - Don’t Play That Song for me (If it’s Not Preserved With ARP!)
Marina Bosi, Sergio Canazza, Niccolò Pretto, and 2 more authors
This article describes a novel technology for preserving audio documents archived on open-reel magnetic tapes, which forms the core of the Audio Recording Preservation (ARP) international standard. ARP is part of the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) Context-based Audio Enhancement (CAE) standard, adopted by the IEEE Standards Association as IEEE 3302-2022 in December 2022. Leveraging automated Artificial Intelligence (AI) tools, ARP analyzes and extracts relevant information from the digitized audio and video files of the tape’s corresponding digital Preservation Copy. This process includes identifying speed variations and surface irregularities on the tape and automatically rectifying errors to generate a restored Access Copy. With the ARP standard, archives gain a potent tool for expediting and optimizing the description of the preservation conditions of the tape, as well as for automatically correcting errors that may have occurred during the digitization process. This technology offers an efficient solution for managing both small and large collections of digitized analog items, marking a substantial advancement in the preservation of audio documents.
IAI4CH2024
Filming the sound: Anomaly Detection on Audio Tape Recordings using Computer Vision Algorithms
Zafer Çınar, Alessandro Russo, Matteo Spanio, and 2 more authors
In Proceedings of the 3rd Workshop on Artificial Intelligence for Cultural Heritage (IAI4CH 2024) co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), 2024
The preservation of open-reel audio tapes is critical for maintaining valuable cultural and historical audio archives, yet current digitisation and analysis operations are often error-prone due to tape degradation and the long duration of the recordings. Considering the analog nature of this kind of recording, anomaly detection algorithms, applied to the video of the tape flowing on the playback head, can be used to detect errors and details with musicological value. This paper presents a new dataset of high-quality videos and a new algorithm for anomaly detection on audio tapes. Experimental results show notable improvements in detection performance, though false positives remain a challenge at higher speeds. Additionally, the new algorithm supports a wider range of playback speeds, improving its flexibility. This improvement is an important step towards a reliable implementation of the IEEE/MPAI CAE ARP standard (3302-2022).
AIxIA2024
Towards Emotionally Aware AI: Challenges and Opportunities in the Evolution of Multimodal Generative Models
Matteo Spanio
In Proceedings of the AIxIA Doctoral Consortium 2024 co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), 2024
The evolution of generative models in artificial intelligence (AI) has significantly expanded the capacity of machines to process and generate complex multimodal data such as text, images, audio, and video. Despite these advancements, the integration of emotional awareness remains an underexplored dimension. This paper examines the state of the art in multimodal generative AI, with a focus on existing models developed by major technology companies. It then proposes an approach to incorporate emotional awareness into AI models, which would enhance human-machine interaction by improving the interpretability and explainability of AI-generated decisions. The paper also addresses the challenges associated with building emotion-aware models, including the need for comprehensive multimodal datasets and the computational complexity of incorporating less-explored sensory modalities like olfaction and gustation. Finally, potential solutions are discussed, including the normalization of existing research data and the application of transfer learning to reduce resource demands. These steps are essential for advancing the field and unlocking the potential of emotion-aware multimodal AI in applications such as healthcare, robotics, and virtual assistants.
2023
A study on Equalization Curve Detection in Audio Tape Digitization process using Artificial Intelligence
In recent decades, archives have seen a rapid change in the media used to store sound information, and many of these media are rich in obsolete material that risks becoming unusable due to aging. Therefore, it is necessary to digitize sound documents in order to make them durable over time. However, during the digitization process, errors such as applying an incorrect equalization curve or playing back the tape at the wrong speed can lead to the acquisition of inauthentic material. This work focuses on studying the detection of possible errors due to incorrect equalization curve settings and tape playback speed during the transfer of material from analog to digital, verifying if and how it is possible to detect them using methods specific to Artificial Intelligence (clustering and classification). The results of this research demonstrate that these algorithms may offer good precision in detecting errors and have the potential to automate the verification process, ensuring the preservation of valid information for a longer period of time, but before they can be used in a real-world scenario, they must be further improved.
2021
TUTTI QUANTI VOGLION FARE JAZZ - Contaminazioni Jazz nel repertorio clarinettistico del ’900 (Everybody Wants to Play Jazz: Jazz Influences in the Twentieth-Century Clarinet Repertoire)
The research reported in this thesis explores, in its various facets, the role of the clarinet and of clarinettists in the composition of Fantasias on opera themes, a genre that was widespread in the early nineteenth century. Three works are examined in detail: the Potpourri No. 2 for clarinet and orchestra on Là ci darem la mano by Franz Danzi, the Variations on Euer Liebreiz, eure Schönheit in B♭ major from the opera Alruna by Louis Spohr, and the Fantasia da Concerto su motivi del Rigoletto by Luigi Bassi. For each piece, the historical origins are analyzed, with close attention to the contact, and at times the collaboration, between composer and performer. The aim of this work is not simply to report historical facts, but to give the reader an idea of a correct philological interpretation by reconstructing the lives of the people involved in the history of these pieces, bearing in mind that the soundscape we live in differs from that of two hundred years ago and that the acoustic characteristics of the clarinet have undergone numerous changes over time. The thesis is organized into four chapters and an appendix containing a brief history of the clarinet. Each chapter is structured independently and can be read separately from the rest of the thesis. Each chapter provides an introduction to the relevant historical and geographical context, a history of the composer and the performer of the piece, and a brief analysis of the piece under consideration.