Bicocca Open Archive

In this work, we explore the effectiveness of multimodal models for estimating the emotional state expressed continuously in the Valence/Arousal space. We consider four modalities typically adopted for emotion recognition, namely audio (voice), video (facial expression), electrocardiogram (ECG), and electrodermal activity (EDA), and investigate different combinations of them. To this end, a CNN-based feature extraction module is adopted for each of the considered modalities, together with an RNN-based module that models the dynamics of the affective behaviour. Fusion is performed in three different ways: at feature level (after the CNN feature extraction), at model level (combining the RNN layer’s outputs), and at prediction level (late fusion). Results obtained on the publicly available RECOLA dataset demonstrate that the use of multiple modalities improves prediction performance. The best results are achieved by exploiting all the considered modalities with late fusion, but even combinations of two modalities (especially audio and video) bring significant benefits.
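The three fusion points named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' architecture: the per-modality "CNN" is replaced by a frame-wise linear projection with ReLU, the RNN is a plain Elman recurrence, the weights are random and untrained, and late fusion is assumed to be a simple average of per-modality predictions. All sizes, names, and functions below are hypothetical and chosen only to show where the streams are merged in each strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame input sizes for each modality (not from the paper).
MODALITIES = {"audio": 40, "video": 64, "ecg": 8, "eda": 4}
FEAT, HID, OUT = 16, 12, 2  # OUT = (valence, arousal)


def dense(n_in, n_out):
    """Random, untrained weight matrix used as a placeholder for learned weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out))


def extract(x, w):
    """Stand-in for a per-modality CNN: frame-wise projection followed by ReLU."""
    return np.maximum(x @ w, 0.0)


def rnn(seq, w_in, w_rec):
    """Plain Elman recurrence over time; returns hidden states of shape (T, HID)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in seq:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)


def init_params():
    n = len(MODALITIES)
    return {
        "cnn": {m: dense(d, FEAT) for m, d in MODALITIES.items()},
        "rnn_in": {m: dense(FEAT, HID) for m in MODALITIES},
        "rnn_rec": {m: dense(HID, HID) for m in MODALITIES},
        "head": {m: dense(HID, OUT) for m in MODALITIES},
        "rnn_joint_in": dense(n * FEAT, HID),
        "rnn_joint_rec": dense(HID, HID),
        "head_joint": dense(HID, OUT),
        "head_concat": dense(n * HID, OUT),
    }


def feature_level(inputs, p):
    """Concatenate extracted features, then run one shared RNN and one head."""
    feats = [extract(inputs[m], p["cnn"][m]) for m in MODALITIES]
    joint = np.concatenate(feats, axis=1)
    return rnn(joint, p["rnn_joint_in"], p["rnn_joint_rec"]) @ p["head_joint"]


def model_level(inputs, p):
    """Run one RNN per modality, concatenate their outputs, then one head."""
    hidden = [
        rnn(extract(inputs[m], p["cnn"][m]), p["rnn_in"][m], p["rnn_rec"][m])
        for m in MODALITIES
    ]
    return np.concatenate(hidden, axis=1) @ p["head_concat"]


def prediction_level(inputs, p):
    """Late fusion: independent per-modality predictions, averaged (assumed rule)."""
    preds = [
        rnn(extract(inputs[m], p["cnn"][m]), p["rnn_in"][m], p["rnn_rec"][m])
        @ p["head"][m]
        for m in MODALITIES
    ]
    return np.mean(preds, axis=0)


T = 25  # number of time steps in the toy sequence
params = init_params()
inputs = {m: rng.normal(size=(T, d)) for m, d in MODALITIES.items()}
outputs = {
    f.__name__: f(inputs, params)
    for f in (feature_level, model_level, prediction_level)
}
for name, y in outputs.items():
    print(name, y.shape)  # each strategy yields one (valence, arousal) pair per step
```

The only difference between the three functions is the point at which the modality streams meet: before the recurrent module, after it, or after the per-modality predictions. Dropping entries from `MODALITIES` gives the two- and three-modality combinations the abstract mentions.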

Patania, S., D’Amelio, A., Lanzarotti, R. (2022). Exploring Fusion Strategies in Deep Multimodal Affect Prediction. In Image Analysis and Processing – ICIAP 2022: 21st International Conference, Lecce, Italy, May 23–27, 2022, Proceedings, Part II (pp. 730–741). Springer Verlag [10.1007/978-3-031-06430-2_61].

Exploring Fusion Strategies in Deep Multimodal Affect Prediction

Patania, S (first author); D’Amelio, A; Lanzarotti, R

2022



	Contribution type
	
				paper
			
	Keywords
	
				Deep learning; Multimodal emotion recognition; Multimodal fusion
			
	Content language
	
				English
			
	Conference name
	
				Image Analysis and Processing – ICIAP 2022 21st International Conference - May 23–27, 2022
			
	Conference year
	
				2022
			
	Volume editors
	
				Sclaroff, S; Distante, C; Leo, M; Farinella, GM; Tombari, F
			
	Proceedings title
	
				Image Analysis and Processing – ICIAP 2022: 21st International Conference, Lecce, Italy, May 23–27, 2022, Proceedings, Part II
			
	ISBN of the proceedings volume
	
				9783031064296
			
	Series
	
				Lecture Notes in Computer Science
			
	Publication date
	
				2022
			
	Volume number
	
				13232 LNCS
			
	First page
	
				730
			
	Last page
	
				741
			
	DOI of the contribution
	
				https://dx.doi.org/10.1007/978-3-031-06430-2_61
			
	Full text
	
				reserved
			
	Appears in types:
	
				02 - Conference contribution

Files in this item:

File	Size	Format
Patania-2022-ICIAP 2022-VoR.pdf (archive managers only; attachment type: Publisher’s Version (Version of Record, VoR); license: All rights reserved)	1.15 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/553715
