Bicocca Open Archive

Hate speech identification is a challenging task because of the world knowledge it requires, and it is even more complex in the social media context due to language and media specificities. Despite these challenges, advances in this task may help improve collective well-being on social media. In this context, the biCourage team participated in the English version of Task 1 of HASOC 2021, a shared task for “Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages”. Our participation in this campaign aimed to examine the suitability of Graph Convolutional Neural Networks (GCNs), which can integrate flexible contextual priors, as a computationally efficient alternative to more expensive and relatively data-hungry methods such as fine-tuning. Specifically, we explored and combined two text-to-graph strategies based on different language modelling objectives and compared them with fine-tuned BERT. We submitted the results of several deep learning architectures composed of different arrangements of GCNs and transformer architectures. Our best results in both subtasks came from the GCN-based architectures combining the two text-to-graph strategies, which ranked 21st and 20th in Subtasks 1A and 1B, respectively. Assessing the models’ predictions, we identify complementary capabilities in the text-to-graph strategies that further research on their combination can explore. Moreover, the proposed GCN model trains 3.85 times faster than fine-tuned BERT and still outperforms it by 2.3% and 5.41% on the F1 score of Subtasks 1A and 1B, respectively.
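To make the text-to-graph idea in the abstract concrete, the following is a minimal, generic sketch and not the authors' implementation. It assumes a simple n-gram rule (tokens adjacent within a sliding window become connected nodes) and applies one standard GCN propagation step, ReLU(Â H W), with the usual symmetric normalization Â = D^(-1/2) (A + I) D^(-1/2). The function names, the example sentence, and the toy weights are all hypothetical; a syntax-based graph would instead take its edges from a dependency parse.

```python
import math

def ngram_graph(tokens, window=2):
    """Connect tokens that co-occur inside a sliding window (assumed n-gram rule)."""
    nodes = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i, tok in enumerate(tokens):
        # With window=2 this links each token to the next one (bigrams).
        for other in tokens[i + 1 : i + window]:
            a, b = index[tok], index[other]
            if a != b:
                adj[a][b] = adj[b][a] = 1.0
    return nodes, adj

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def gcn_layer(adj_norm, features, weights):
    """One GCN propagation step: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

# Hypothetical example sentence; one-hot node features and fixed toy weights.
tokens = "you are not welcome here".split()
nodes, adj = ngram_graph(tokens, window=2)
n = len(nodes)
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
weights = [[0.1 * ((i + j) % 3 - 1) for j in range(2)] for i in range(n)]
out = gcn_layer(normalize(adj), features, weights)
```

Each row of `out` is a two-dimensional representation of one word node after mixing in its neighbours' features. In a trained model the weights would be learned, and a pooling or readout step (not shown) would turn node representations into a label for the whole post.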

Wilkens, R., Ognibene, D. (2021). biCourage: ngram and syntax GCNs for Hate Speech detection. In Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation (pp.357-366). CEUR-WS.

biCourage: ngram and syntax GCNs for Hate Speech detection

Wilkens, R; Ognibene, D

2021

Contribution type: paper
Keywords: Bert fine-tuning; biCourage; graph convolutional network; hate speech; text-to-graph
Content language: English
Conference name: 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021 - 13 December 2021 through 17 December 2021
Conference year: 2021
Volume editors: Mehta, P; Mandl, T; Majumder, P; Mitra, M
Proceedings title: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation
Series: CEUR WORKSHOP PROCEEDINGS
Publication date: 2021
Volume number: 3159
First page: 357
Last page: 366
Alternative URL: https://ceur-ws.org/Vol-3159/
Fulltext: open
Citation: Wilkens, R., Ognibene, D. (2021). biCourage: ngram and syntax GCNs for Hate Speech detection. In Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation (pp.357-366). CEUR-WS.
Appears in types: 02 - Conference contribution

Files in this item:

File	Size	Format
Wilkens-2021-FIRE-WN-VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	666.47 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/472243
