Automatic Speech Recognition: Do Emotions Matter?

Garzotto, F
2019

Abstract

Building on advances in Natural Language Processing and Cognitive Computing, conversational artificial intelligence (AI) has matured in recent years. It serves people in a broad range of applications in business, government, health care, and entertainment, and it is becoming increasingly embedded in people's lives. Despite these improvements, we are still far from a robust or general AI comparable to human intelligence, especially when it comes to adaptive intelligence able to cope with non-standard and noisy environments. In this paper, we show that emotion in speech negatively affects both automatic speech recognition and the automatic understanding of human input. Emotion should therefore be treated as noise that compromises the understanding of what the user says and, consequently, disrupts the whole interaction with conversational technologies. Google Cloud Speech-to-Text and IBM Watson Speech-to-Text were used for this study.
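
Word Error Rate, listed among this record's keywords, is the usual measure of how far a recognizer's transcript departs from what was actually said. The following is a minimal sketch of the standard word-level edit-distance definition (substitutions + deletions + insertions, divided by the number of reference words). It is an illustration only and not the authors' evaluation code: the function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration.
spoken = "turn on the light"
recognized = "turn of the night"
print(word_error_rate(spoken, recognized))
```

Comparing the WER obtained on neutral utterances with the WER obtained on emotional utterances of the same sentences is one simple way to quantify the kind of degradation the abstract describes.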
Type: paper
Keywords: Automatic Speech Recognition; Conversational Technology; Emotional speech; Word Error Rate
Language: English
Conference: 1st IEEE International Conference on Conversational Data and Knowledge Engineering, CDKE 2019, 09-11 December 2019
Proceedings: 2019 IEEE International Conference on Conversational Data & Knowledge Engineering (CDKE)
ISBN: 9781728160887
Pages: 9-16
8949383
none
Catania, F., Crovari, P., Spitale, M., Garzotto, F. (2019). Automatic Speech Recognition: Do Emotions Matter? In 2019 IEEE International Conference on Conversational Data & Knowledge Engineering (CDKE) (pp. 9-16). IEEE [10.1109/CDKE46621.2019.00009].
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/556595
Citations
  • Scopus 6
  • Web of Science (ISI) 4