Bicocca Open Archive

Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models

Regondi S.; Pugliese R.; Mahroo A.

2024

Abstract

This study analyzes and identifies vocal cues that influence whether speech is perceived as authentic or artificial. Our primary aim is to pinpoint the parameters that distinguish genuine voices from synthesized ones. We combine spectral analysis of speech signals, analysis of prosodic patterns, and machine learning techniques such as HiFi-GAN and generative AI. By examining voice data from diverse sources, we seek markers that affect the perception of authenticity or artificiality in voices, with the ultimate goal of constructing a predictive model that reliably distinguishes authentic from artificially generated voices in both pathological and non-pathological conditions. The research has implications for the scientific and technological understanding of human voice distinctiveness and for practical applications in healthcare and security. In healthcare, the ability to tell authentic voices from artificial ones could support more accurate diagnosis and monitoring of conditions affecting speech, such as neurodegenerative diseases or vocal cord disorders. In security, it could improve the reliability of voice-based authentication systems and so help protect the integrity of sensitive communications and data transmission.

Contribution type: paper
Keywords: Generative AI; Healthcare applications; HiFi-GAN; Machine learning; Prosodic patterns; Speech analysis; Voice-based authentication
Content language: English
Conference name: 3rd IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering, MetroXRAINE 2024 - 21-23 October 2024
Conference year: 2024
Proceedings title: 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE)
Proceedings ISBN: 9798350378009
Publication date: 2024
First page: 782
Last page: 786
DOI: https://dx.doi.org/10.1109/metroxraine62247.2024.10795959
Fulltext: none
Citation: Regondi, S., Pugliese, R., Mahroo, A. (2024). Towards a Predictive Model of Speech Signatures: Insights from Spectral Analysis and Generative AI Models. In 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE) (pp. 782-786). Institute of Electrical and Electronics Engineers Inc. [10.1109/metroxraine62247.2024.10795959].
Appears in types: 02 - Conference contribution

Files in this item: there are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/573625
