Fersini, E., Gasparini, F., Rizzi, G., Saibene, A. (2025). MAMITA: Benchmarking Misogyny in Italian Memes. In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025). CEUR-WS.
MAMITA: Benchmarking Misogyny in Italian Memes
Fersini, E.; Gasparini, F.; Rizzi, G.; Saibene, A.
2025
Abstract
This paper introduces MAMITA, a novel Italian multimodal benchmark dataset developed for the automatic detection of misogynistic content in online media, with a specific focus on memes. The dataset comprises 1,880 memes sourced from popular social platforms (Facebook, Twitter, Instagram, Reddit) and meme-centric websites, selected using misogyny-related keywords covering a wide range of manifestations, including body shaming, stereotyping, objectification, and violence. A key feature of this benchmark is its dual annotation strategy: all memes were independently labeled by both domain experts and a pool of 232 crowd annotators. This approach produced two parallel sets of annotations that reflect differing labeling perspectives. For each meme, labels include a binary classification (misogynistic or not), the type of misogyny, and its intensity. Beyond categorical labels, the dataset incorporates perspectivist metadata, capturing individual annotators’ perceptions of misogyny along with their demographic and socio-cultural background, including age, level of education, and social status. Each meme’s textual content was also automatically transcribed to enable multimodal analysis. This enriched benchmark enables nuanced research on the automatic detection of misogynistic content in online social media and supports investigations into how perceived misogyny varies across annotator profiles, helping to address the urgent challenge posed by the spread of hateful content targeting women. Warning: this paper includes examples that may be offensive or harmful.
File: Fersini-2025-CLiC-it 2025-VoR.pdf (open access)
Description: Article published in CEUR-WS proceedings
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.76 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.