Cadeddu, A., Chessa, A., De Leo, V., Fenu, G., Motta, E., Osborne, F., et al. (2025). A Comparative Study of Task Adaptation Techniques of Large Language Models for Identifying Sustainable Development Goals. IEEE ACCESS, 13, 175271-175291 [10.1109/ACCESS.2025.3618017].
A Comparative Study of Task Adaptation Techniques of Large Language Models for Identifying Sustainable Development Goals
Osborne F.;
2025
Abstract
In 2012, the United Nations introduced 17 Sustainable Development Goals (SDGs) aimed at creating a more sustainable and improved future by 2030. However, tracking progress toward these goals is difficult because of the extensive scale and complexity of the data involved. Text classification models have become vital tools in this area, automating the analysis of vast amounts of text from a variety of sources. Additionally, large language models (LLMs) have recently proven indispensable for many natural language processing tasks, including text classification, thanks to their ability to recognize complex linguistic patterns and semantics. This study analyzes various proprietary and open-source LLMs on a single-label, multi-class text classification task focused on the SDGs. It also evaluates the effectiveness of task adaptation techniques in this domain, namely in-context learning approaches (Zero-Shot and Few-Shot Learning) and Fine-Tuning. The proposed method leverages LLMs to automatically assign relevant SDG labels to input texts, enabling scalable, consistent, and efficient monitoring of SDG-related content across different sources. Through in-context learning and prompt engineering, the study investigates how smaller, more accessible models can achieve high performance with minimal labeled data. Quantitative experiments demonstrate that, on the SDG text classification task, smaller models (such as flan-t5-large) with prompt optimization can achieve macro F1-scores of up to 0.75, closely matching much larger models such as gpt-3.5, which attained a macro F1-score of 0.77. Few-Shot Learning further improved results on challenging classes, reducing the performance gap between open-source and proprietary LLMs. The results reveal that smaller models, when optimized through prompt engineering, can perform on par with larger models such as OpenAI's GPT (Generative Pre-trained Transformer).
These findings suggest that, with proper prompt and task adaptation, open-source LLMs can offer a competitive and more accessible alternative for SDG classification, paving the way for broader and cost-effective adoption of automated SDG monitoring tools.

| File | Size | Format |
|---|---|---|
| Cadeddu et al-2025-IEEE Access-VoR.pdf (open access; Publisher's Version, Version of Record, VoR; Creative Commons license) | 2.7 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


