Grossi, A., Fratti, G., Gasparini, F. (2023). A computational framework for speech emotion recognition in case of multisource data. In Proceedings of the 4th Italian Workshop on Artificial Intelligence for an Ageing Society co-located with 22nd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2023) (pp.113-126). CEUR-WS.

A computational framework for speech emotion recognition in case of multisource data

Grossi A.; Fratti G.; Gasparini F.
2023

Abstract

Although several studies have been carried out in the field of Speech Emotion Recognition (SER), only a few of them consider people of different ages or languages. In particular, most of the SER datasets reported in the literature are collected from young adults or cover a single language, such as English or Chinese. These datasets tend to be poorly heterogeneous and dependent on the context in which they are collected. In general, they are composed of acted utterances or are recorded in situations specifically designed to evoke certain emotions. This paper proposes a framework that makes it possible to benefit from complementary information coming from multisource data to train a general SER model. To merge different sources, we describe preprocessing steps that normalize for the data source, the type of recorded speech, and the subjects considered. Furthermore, we present a domain adaptation strategy that makes it possible to benefit from the general model by adapting it to a certain language and/or a certain age group. In particular, we are interested in developing SER models for Italian older adults. Preliminary results, which use several sources for training and a different language as the test set, confirm the validity of the proposal.
Document type: paper
Keywords: domain adaptation; speech emotion recognition; multisource; older adults; XGBoost
Language: English
Conference: 4th Italian Workshop on Artificial Intelligence for an Ageing Society, AIxAS 2023, 9 November 2023
Published in: Proceedings of the 4th Italian Workshop on Artificial Intelligence for an Ageing Society co-located with 22nd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2023)
Year: 2023
Volume: 3623
Pages: 113-126
URL: https://ceur-ws.org/Vol-3623/
Access: open
Files in this item:

Grossi-2023-AIxAS-VoR.pdf (open access)

Description: Conference contribution - AIxAS 2023 paper 11
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.09 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/523800