This study analyzes and identifies vocal cues that influence whether speech is perceived as authentic or artificial. Our primary aim is to pinpoint the parameters that distinguish genuine voices from artificially synthesized ones. We combine spectral analysis of speech signals, analysis of prosodic patterns, and machine learning techniques such as HiFi-GAN and other generative AI models. By examining voice data from diverse sources, we seek markers that shape the perception of authenticity or artificiality in voices, with the ultimate goal of building a predictive model that reliably distinguishes authentic from artificially generated voices in both pathological and non-pathological conditions. The research has implications for the scientific and technological understanding of human voice distinctiveness and for practical applications in healthcare and security. In healthcare, the ability to discern authentic voices from artificial ones could support more accurate diagnosis and monitoring of conditions affecting speech, such as neurodegenerative diseases or vocal cord disorders. In security contexts, it could improve the reliability of voice-based authentication systems and thereby help protect the integrity of sensitive communications and data transmission.
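
As a rough illustration of the kind of spectral and prosodic measurements the abstract refers to, the following sketch computes one spectral feature (the spectral centroid) and one prosodic feature (a fundamental-frequency estimate from the autocorrelation peak) on a synthetic frame. It is not the authors' pipeline: the function names, the 60-400 Hz pitch search range, and the synthetic test signal are all assumptions chosen for illustration only.

```python
import numpy as np


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    windowed = signal * np.hanning(len(signal))  # taper edges to limit leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))


def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The search range f_min..f_max is an assumed range for speech, not a
    value taken from the paper.
    """
    centered = signal - np.mean(signal)
    # Keep only non-negative lags of the full autocorrelation.
    autocorr = np.correlate(centered, centered, mode="full")[len(centered) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Hypothetical stand-in for a voiced speech frame: a 200 Hz tone plus one harmonic.
sample_rate = 16000
t = np.arange(1600) / sample_rate  # 100 ms frame
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

features = {
    "spectral_centroid_hz": spectral_centroid(frame, sample_rate),
    "f0_hz": estimate_f0(frame, sample_rate),
}
print(features)
```

In a real study, features like these would be extracted per frame from recorded and synthesized voices and passed to a classifier; which features and which classifier the authors used is not stated in this record.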

Regondi, S., Pugliese, R., Mahroo, A. (2024). Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models. In 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE) (pp.782-786). Institute of Electrical and Electronics Engineers Inc. [10.1109/metroxraine62247.2024.10795959].

Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models

Mahroo A.
2024

Type: paper
Keywords: Generative AI; Healthcare applications; HiFi-GAN; Machine learning; Prosodic patterns; Speech analysis; Voice-based authentication
Language: English
Conference: 3rd IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering, MetroXRAINE 2024, 21-23 October 2024
Conference year: 2024
Published in: 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE)
ISBN: 9798350378009
Publication year: 2024
Pages: 782-786
none
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/573625