This study analyzes and identifies the vocal cues that influence whether speech is perceived as authentic or artificial. Our primary aim is to pinpoint the key parameters that distinguish genuine voices from artificially synthesized ones. To this end, we combine spectral analysis of speech signals, analysis of prosodic patterns, and machine learning techniques such as HiFi-GAN and generative AI. By examining voice data from diverse sources, we seek to uncover the markers that shape the perception of authenticity or artificiality in voices, and ultimately to build a predictive model capable of reliably distinguishing authentic from artificially generated voices in both pathological and non-pathological conditions. The research has implications not only for the scientific and technological understanding of human voice distinctiveness but also for practical applications in healthcare and security. In healthcare, the ability to discern authentic voices from artificial ones could facilitate more accurate diagnosis and monitoring of conditions affecting speech, such as neurodegenerative diseases or vocal cord disorders. In security contexts, it could enhance the reliability of voice-based authentication systems, thereby strengthening the integrity of sensitive communications and data transmission. Together, these outcomes could pave the way for advances in security protocols, healthcare applications, and the reliability of voice-based technologies.
Regondi, S., Pugliese, R., Mahroo, A. (2024). Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models. In 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE) (pp. 782-786). Institute of Electrical and Electronics Engineers Inc. [10.1109/metroxraine62247.2024.10795959].
Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models
Mahroo A.
2024


