Bassani, E., Viviani, M. (2019). Automatically assessing the quality of Wikipedia contents. In Proceedings of the ACM Symposium on Applied Computing (pp. 804-807). Association for Computing Machinery [10.1145/3297280.3297357].

Automatically assessing the quality of Wikipedia contents

Bassani, E.; Viviani, M.
2019

Abstract

With the development of Web 2.0 technologies, people have gone from being mere content users to content generators. In this context, evaluating the quality of the (potential) information available online has become a crucial issue. Nowadays, one of the largest online resources that users rely on as a knowledge base is Wikipedia. The collaborative nature of Wikipedia can lead to low-quality articles or even misinformation if the creation and revision of articles are not monitored precisely and promptly. For this reason, this paper addresses the problem of automatically evaluating the quality of Wikipedia content by proposing a supervised Machine Learning approach that classifies articles according to their quality. Compared with prior literature, the approach considers a wider set of features of Wikipedia articles, addresses previously unconsidered aspects of generating a labeled dataset to train the model, and employs Gradient Boosting, which produced encouraging results.
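The abstract names a supervised classifier trained with Gradient Boosting but does not list the features, quality classes, or hyperparameters. The following is therefore only a minimal, standard-library sketch of gradient boosting with decision stumps for a binary "higher vs. lower quality" decision, assuming three hypothetical article features (word count, reference count, section count) and toy labels. It is not the authors' implementation, feature set, or dataset; real quality scales are usually multi-class (e.g. Wikipedia's FA/GA/B/C/Start/Stub assessment grades), which this sketch reduces to two classes for brevity.

```python
import math

# Toy data for illustration only. The three features are HYPOTHETICAL
# (not the paper's feature set): [word count, reference count, section count].
# Label 1 = higher-quality article, 0 = lower-quality article.
X = [
    [5200, 120, 14], [4800, 95, 12], [6100, 150, 18], [3900, 60, 10],
    [300, 2, 1], [450, 0, 2], [800, 5, 3], [600, 1, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, targets):
    """Fit a depth-1 regression tree (one feature, one threshold) by least squares."""
    mean = sum(targets) / len(targets)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, targets) if row[j] <= t]
            right = [r for row, r in zip(X, targets) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # no valid split: fall back to a constant predictor
        return lambda row: mean
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv


def train_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with stumps on the binary log loss (y must contain both classes)."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial prediction: log-odds of the positive class
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Pseudo-residuals: negative gradient of the log loss with respect to F.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        h = fit_stump(X, residuals)  # leaf values = mean residual (no Newton step)
        stumps.append(h)
        F = [fi + learning_rate * h(row) for fi, row in zip(F, X)]
    return f0, stumps, learning_rate


def predict_proba(model, row):
    """Probability that an article with features `row` belongs to the higher-quality class."""
    f0, stumps, learning_rate = model
    return sigmoid(f0 + learning_rate * sum(h(row) for h in stumps))


model = train_gbm(X, y)
print(round(predict_proba(model, [5000, 100, 13]), 3))
print(round(predict_proba(model, [400, 1, 1]), 3))
```

Each round fits a small tree to the current errors of the ensemble and adds it with a shrinkage factor, which is the core idea behind Gradient Boosting. In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used; the paper's actual features and labeling procedure are not given in this record.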
poster + paper
Information Quality; Machine Learning; Social Media; Wikipedia
Information Quality; Machine Learning; Social Media; Wikipedia; Software
Language: English
Conference: 34th Annual ACM Symposium on Applied Computing, SAC 2019 (2019)
Book title: Proceedings of the ACM Symposium on Applied Computing
ISBN: 9781450359337
2019
147772
Pages: 804-807
none
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/230328
Citations
  • Scopus 8
  • ISI Web of Science 5