Agarla, M., Bianco, S., Celona, L., Napoletano, P., Petrovsky, A., Piccoli, F., et al. (2024). Semi-supervised cross-lingual speech emotion recognition. EXPERT SYSTEMS WITH APPLICATIONS, 237(Part A (1 March 2024)) [10.1016/j.eswa.2023.121368].

Semi-supervised cross-lingual speech emotion recognition

Agarla, Mirko; Bianco, Simone; Celona, Luigi; Napoletano, Paolo; Piccoli, Flavio; Schettini, Raimondo
2024

Abstract

Performance in Speech Emotion Recognition (SER) on a single language has increased greatly in the last few years thanks to deep learning techniques. However, cross-lingual SER remains a challenge in real-world applications for two main reasons: first, the large gap between the source and target domain distributions; second, the far greater availability of unlabeled utterances than labeled ones in the new language. Taking these aspects into account, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when only a few labeled examples are available in the target domain (i.e., the new language). Our method is based on a Transformer and adapts to the new domain by applying a pseudo-labeling strategy to the unlabeled utterances. In particular, both hard and soft pseudo-labeling approaches are investigated. We thoroughly evaluate the proposed method in a speaker-independent setup on both the source and the new language, and show its robustness across five languages belonging to different language families. The experimental findings indicate that unweighted accuracy increases by 40% on average compared to state-of-the-art methods.
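To make the hard/soft distinction mentioned in the abstract concrete, the following is a minimal Python sketch of how the two kinds of pseudo-label are commonly derived from a classifier's output logits, together with the unweighted accuracy (UA) metric used to report results. It is not the authors' implementation: the function names, the 0.9 confidence threshold, and the 0.5 sharpening temperature are hypothetical choices, and the Transformer that would produce the logits is omitted. UA is computed here as the mean per-class recall, the usual definition in SER.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a temperature below 1 sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_pseudo_label(logits: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Hard pseudo-label: the argmax class, or None if the model is not confident enough.

    The 0.9 threshold is a hypothetical value; utterances returning None are skipped.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best if probs[best] >= threshold else None


def soft_pseudo_label(logits: Sequence[float], temperature: float = 0.5) -> List[float]:
    """Soft pseudo-label: the full, temperature-sharpened class distribution.

    The 0.5 temperature is a hypothetical value.
    """
    return softmax(logits, temperature)


def cross_entropy(target: Sequence[float], logits: Sequence[float]) -> float:
    """Cross-entropy of the model's prediction against a one-hot or soft target."""
    probs = softmax(logits)
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """UA: mean of per-class recalls, so rare emotions weigh as much as frequent ones."""
    recalls = []
    for c in range(num_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:  # classes absent from y_true are skipped
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    confident = [4.0, 0.5, 0.2, 0.1]  # hypothetical logits for 4 emotion classes
    ambiguous = [1.0, 0.9, 0.2, 0.1]
    print(hard_pseudo_label(confident))  # -> 0
    print(hard_pseudo_label(ambiguous))  # -> None (utterance not pseudo-labeled)
    soft = soft_pseudo_label(ambiguous)  # the utterance still gets a distribution target
    print(cross_entropy(soft, ambiguous))  # training loss against the soft target
    # Imbalanced toy set: plain accuracy is 0.75, but UA is lower
    print(unweighted_accuracy([0, 0, 0, 1], [0, 0, 0, 0], num_classes=2))  # -> 0.5
```

In this sketch, the hard variant discards low-confidence utterances and trains on a one-hot target, while the soft variant keeps every utterance and uses the whole class distribution as the target, so the model's uncertainty is preserved rather than thresholded away.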
Type: Journal article - Scientific article
Keywords: Cross-lingual; Semi-supervised domain adaptation; Semi-supervised learning; Speech emotion recognition
Language: English
Date: 3 September 2023
Year: 2024
Volume: 237
Issue: Part A (1 March 2024)
Article number: 121368
Access: open
Files in this item:
Agarla-2024-Expert Systems with Applications-VoR.pdf (open access)
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 2.8 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/436678
Citations
  • Scopus 9
  • Web of Science (ISI) 6