Bottai, C., Trentini, F., Velyka, A. (2024). Augmenting the Italian Third Sector registry using non-profit organisations’ websites. In Proceedings CARMA 2024 - 6th International Conference on Advanced Research Methods and Analytics (pp. 140-147). Editorial Universitat Politècnica de València [10.4995/carma2024.2024.17830].
Augmenting the Italian Third Sector registry using non-profit organisations’ websites
Bottai, Carlo; Trentini, Francesco; Velyka, Anna
2024
Abstract
This paper presents a framework for enriching and complementing administrative data from the Italian Third Sector Single National Register (RUNTS) with textual content extracted from the websites of the non-profit organisations listed in it. Through an automated web-scraping process, we associate a website with each organisation and extract from its text information describing the areas of the entity's actual economic activity. We develop a machine learning classifier to allocate each organisation to the standardised categories of the International Classification of Non-profit Organisations. We further explore the collected web data to identify other dimensions of non-profit operations. Enriching administrative registers with web data can yield trustworthy and detailed insights into the landscape of non-profit economic activities. The results open up opportunities for further research on the labour market and economic development generated by the Third Sector, as well as for comparative analysis with the for-profit enterprise sector.

File | Description | Attachment type | Licence | Size | Format
---|---|---|---|---|---
Bottai-2024-CARMA-VoR.pdf (open access) | This work is licensed under a Creative Commons License CC BY-NC-SA 4.0 | Publisher’s Version (Version of Record, VoR) | Creative Commons | 286.81 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.