Bicocca Open Archive

Lomonaco, F., Siino, M., Tesconi, M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023) (pp.2708-2716). CEUR-WS.

Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

Lomonaco F.; Siino M.; Tesconi M.

2023

Abstract

From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation via backtranslation to and from Japanese. We translate each sample in the original training dataset into a target language (i.e., Japanese) and then translate it back into English. The original sample and the backtranslated one are then merged. We then fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset. We evaluate the performance of the two fine-tuned models using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
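As a rough illustration of the pipeline the abstract describes, the sketch below enriches each sample with its backtranslation and computes Macro and Micro F1 for single-label predictions. It is not the authors' implementation. The names `enrich`, `f1_scores`, `to_japanese`, and `to_english` are hypothetical, the translators are passed in as plain functions because the record does not name a translation system, and "merged" is assumed here to mean simple concatenation.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: any function that maps a text to its translation.
Translator = Callable[[str], str]


def enrich(samples: Sequence[str],
           to_japanese: Translator,
           to_english: Translator,
           sep: str = " ") -> List[str]:
    """Merge each sample with its English -> Japanese -> English backtranslation.

    Assumption: "merged" is read as concatenation with a separator.
    """
    enriched = []
    for text in samples:
        back = to_english(to_japanese(text))  # round trip through Japanese
        enriched.append(f"{text}{sep}{back}")
    return enriched


def f1_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1. Micro: F1 over pooled counts.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

In practice the two translator arguments would wrap whatever machine translation service or model is available, and the enriched texts would replace the original training samples before fine-tuning.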

Contribution type: paper
Keywords: author profiling; cryptocurrency influencers; data augmentation; Japanese; text classification; text enrichment; Twitter
Content language: English
Conference name: 24th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF-WN 2023
Conference year: 2023
Volume editors: Aliannejadi, M; Faggioli, G; Ferro, N; Vlachos, M
Proceedings title: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)
Series: CEUR Workshop Proceedings
Publication date: 2023
Volume number: 3497
First page: 2708
Last page: 2716
Alternative URL: https://ceur-ws.org/Vol-3497/
Full text: open
Citation: Lomonaco, F., Siino, M., Tesconi, M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023) (pp.2708-2716). CEUR-WS.
Appears in categories: 02 - Conference contribution

Files in this item:

File	Size	Format
Lomonaco-2023-CLEF-WN-VoR.pdf (open access)	1.04 MB	Adobe PDF

Description: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/524726
