Bicocca Open Archive

Automatically assessing the quality of Wikipedia contents

Bassani, E.; Viviani, M.

2019

Abstract

With the development of Web 2.0 technologies, people have gone from being mere content users to content generators. In this context, evaluating the quality of (potential) information available online has become a crucial issue. Nowadays, one of the largest online resources that users rely on as a knowledge base is Wikipedia. The collaborative process underlying Wikipedia can lead to the creation of low-quality articles, or even to misinformation, if the generation and revision of articles are not monitored precisely and promptly. For this reason, this paper addresses the problem of automatically evaluating the quality of Wikipedia contents by proposing a supervised Machine Learning approach that classifies articles according to their quality. Compared to prior literature, a wider set of features of Wikipedia articles has been taken into account, together with previously unconsidered aspects of how the labeled dataset used to train the model is generated, and Gradient Boosting has been used, producing encouraging results.
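The record does not describe the paper's features or training setup, so the following is only a minimal sketch of the general technique the abstract names: gradient boosting for binary quality classification, written from scratch with least-squares decision stumps fitted to the negative gradient of the log-loss. The three features (article length in KB, number of references, number of distinct editors), the toy data, and the names `fit_gradient_boosting` and `predict_proba` are hypothetical illustrations, not the authors' feature set or implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """One-split regression tree: returns `left` if x[feature] <= threshold, else `right`."""
    feature: int
    threshold: float
    left: float
    right: float

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Least-squares stump: each leaf predicts the mean residual of its rows."""
    best, best_err = None, math.inf
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lm = sum(left) / len(left)  # never empty: t is a value of column f
            rm = sum(right) / len(right) if right else 0.0
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(f, t, lm, rm), err
    return best


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Binary gradient boosting with log-loss (hypothetical helper).

    The negative gradient of the log-loss w.r.t. the raw score is y - p,
    so each round fits a stump to those pseudo-residuals.
    """
    p0 = sum(y) / len(y)
    p0 = min(max(p0, 1e-6), 1 - 1e-6)  # guard against a single-class y
    base = math.log(p0 / (1 - p0))     # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump.predict(row) for s, row in zip(scores, X)]
    return base, stumps, learning_rate


def predict_proba(model, x):
    """Probability that article features `x` belong to the high-quality class."""
    base, stumps, learning_rate = model
    return sigmoid(base + learning_rate * sum(s.predict(x) for s in stumps))


# Hypothetical toy data: [length_kb, num_references, num_distinct_editors];
# label 1 = high quality, 0 = low quality. Not data from the paper.
X = [
    [120, 250, 400], [95, 180, 310], [80, 150, 220], [110, 200, 350],
    [5, 2, 3], [12, 5, 8], [3, 0, 2], [20, 10, 15],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_gradient_boosting(X, y)
print(round(predict_proba(model, [100, 190, 300]), 3))
print(round(predict_proba(model, [4, 1, 2]), 3))
```

In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used; the hand-written version only makes the additive, residual-fitting structure of gradient boosting explicit.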

	Type of contribution
	
				poster + paper
			
	Keywords
	
				Information Quality; Machine Learning; Social Media; Wikipedia; Software
			
	Content language
	
				English
			
	Conference name
	
				34th Annual ACM Symposium on Applied Computing, SAC 2019
			
	Conference year
	
				2019
			
	Proceedings title
	
				Proceedings of the ACM Symposium on Applied Computing
			
	ISBN of the proceedings volume
	
				9781450359337
			
	Publication date
	
				2019
			
	Volume number
	
				147772
			
	First page
	
				804
			
	Last page
	
				807
			
	DOI of the contribution
	
				https://dx.doi.org/10.1145/3297280.3297357
			
	Fulltext
	
				none
			
	Citation
	
				Bassani, E., Viviani, M. (2019). Automatically assessing the quality of Wikipedia contents. In Proceedings of the ACM Symposium on Applied Computing (pp. 804-807). Association for Computing Machinery [10.1145/3297280.3297357].
			
	Appears in the categories:
	
				02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/230328
