Catania, F., Crovari, P., Spitale, M., & Garzotto, F. (2019). Automatic Speech Recognition: Do Emotions Matter? In 2019 IEEE International Conference on Conversational Data & Knowledge Engineering (CDKE) (pp. 9-16). IEEE. doi: 10.1109/CDKE46621.2019.00009
Automatic Speech Recognition: Do Emotions Matter?
Catania, F.; Crovari, P.; Spitale, M.; Garzotto, F.
2019
Abstract
Leveraging advances in Natural Language Processing and Cognitive Computing, conversational artificial intelligence (AI) has matured considerably in recent years. It serves humans in a broad range of applications in business, government, health care, and entertainment, and it is becoming ever more embedded in people's lives. However, despite these improvements, we are still far from robust or general AI comparable to human intelligence, especially when it comes to adaptive intelligence capable of operating in non-standard and noisy environments. In this paper, we show that emotion in speech negatively affects automatic speech recognition and the automatic understanding of human input. Emotion should therefore be treated as a form of noise that compromises the system's understanding of what the user says and, consequently, disrupts the whole interaction with conversational technologies. For this study, Google Cloud Speech-to-Text and IBM Watson Speech-to-Text were used.
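The abstract does not state which metric was used to quantify how emotion degrades recognition, but ASR evaluations of this kind are conventionally reported in terms of word error rate (WER): the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. As a minimal sketch (the metric choice here is an assumption, not taken from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference transcript and the ASR hypothesis, normalized by the
    number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub_cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

Comparing WER on emotionally neutral versus emotional renditions of the same sentences would expose the degradation the paper describes.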


