Bottai, C., Trentini, F., Velyka, A. (2024). Augmenting the Italian Third Sector registry using non-profit organisations’ websites. In Proceedings CARMA 2024 - 6th International Conference on Advanced Research Methods and Analytics (pp.140-147). Editorial Universitat Politècnica de València [10.4995/carma2024.2024.17830].

Augmenting the Italian Third Sector registry using non-profit organisations’ websites

Bottai, Carlo; Trentini, Francesco; Velyka, Anna
2024

Abstract

This paper presents a framework for enriching and complementing administrative data from the Italian Third Sector Single National Register (RUNTS) with textual content extracted from the websites of the non-profit organisations listed in it. Through an automated web-scraping process, we associate a website with each organisation and extract from its text information describing the areas of the entity's actual economic activity. We develop a machine learning classifier to assign each organisation to the standardised categories of the International Classification of Non-profit Organisations. We further explore the collected web data to identify other dimensions of non-profit operations. Enriching administrative registers with web data can yield trustworthy and detailed insights into the landscape of non-profit economic activities. The results open up opportunities for further research on the labour market and economic development generated by the Third Sector, as well as for comparative analysis with the for-profit sector.
Type: paper
Keywords: Third Sector; Administrative data enrichment; Big Data; Web scraping; NLP
Language: English
Conference: CARMA 2024 - 6th International Conference on Advanced Research Methods and Analytics
Conference year: 2024
Editors: Domenech, J; Vicente, M R; de Pedraza, P
Book title: Proceedings CARMA 2024 - 6th International Conference on Advanced Research Methods and Analytics
Publication year: 2024
Pages: 140-147
Access: open
Files in this item:
File: Bottai-2024-CARMA-VoR.pdf
Access: open access
Description: This work is licensed under a Creative Commons License CC BY-NC-SA 4.0
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 286.81 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/494979