The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people. Several datasets for SER are available in the literature. However most of them are in English or Chinese, have been recorded while actors and actresses pronounce short phrases and thus are not related to natural conversation. Moreover only few speeches among all the databases are related to elderly people. Therefore, in this work, a multi-language and multi-age corpus is considered merging a dataset in English, that includes also elderly people, with a dataset in Italian. A general model, trained on young and adult English actors and actresses is proposed, based on XGBoost. Then two strategies of domain adaptation are proposed to adapt the model either to elderly people and to Italian speakers. The results suggest that this approach increases the classification performance, underlining also that new datasets should be collected.
Gasparini, F., Grossi, A. (2022). Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset. In Proceedings of the Italian Workshop on Artificial Intelligence for an Ageing Society 2022 co-located with 21st International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022) (pp.12-28). CEUR-WS.
Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset
Gasparini F.;Grossi A.
2022
Abstract
The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people. Several datasets for SER are available in the literature. However most of them are in English or Chinese, have been recorded while actors and actresses pronounce short phrases and thus are not related to natural conversation. Moreover only few speeches among all the databases are related to elderly people. Therefore, in this work, a multi-language and multi-age corpus is considered merging a dataset in English, that includes also elderly people, with a dataset in Italian. A general model, trained on young and adult English actors and actresses is proposed, based on XGBoost. Then two strategies of domain adaptation are proposed to adapt the model either to elderly people and to Italian speakers. The results suggest that this approach increases the classification performance, underlining also that new datasets should be collected.File | Dimensione | Formato | |
---|---|---|---|
Gasparini-2022-AIxAS-VoR.pdf
accesso aperto
Tipologia di allegato:
Publisher’s Version (Version of Record, VoR)
Licenza:
Creative Commons
Dimensione
328.75 kB
Formato
Adobe PDF
|
328.75 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.