Use me wisely: AI-driven assessment for LLM prompting skills development

Ognibene, D. (first author); Telari, A.; Testa, A.
2025

Abstract

Prompting with large language model (LLM) powered chatbots, such as ChatGPT, has been adopted in a variety of tasks and processes across different domains. Given the intrinsic complexity of LLMs, effective prompting is not as straightforward as anticipated, which highlights the need for novel educational and support methods that are both widely accessible and seamlessly integrated into task workflows. However, LLM prompting is strongly dependent on the specific task and domain, reducing the usefulness of generic methods. We investigate whether LLM-based methods can support learning assessments using ad-hoc guidelines and an extremely limited number of annotated prompt samples. In our framework, guidelines are transformed into features to be detected in the learners’ prompts. The descriptions of these features, together with annotated sample prompts, are used to create few-shot learning detectors. We compare various configurations of these few-shot detectors, testing three state-of-the-art LLMs and derived ensemble models. Our experiments are performed using cross-validation on the original sample prompts and a specifically collected test set of prompts from task-naive learners. We find that the choice of LLM has a strong impact on detection performance across our feature list. One of the most recent models, GPT-4, shows promising performance on most of the features. However, closely related models (GPT-3, GPT-3.5 Turbo (Instruct)) show different behaviors when classifying features. We highlight the need for further research in light of the possible impact of design choices on the selection of features and detection prompts. Our findings are relevant for researchers and practitioners in generative AI literacy, as well as for researchers in computer-supported learning assessment.
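The detector construction described in the abstract can be illustrated with a minimal sketch, shown below under stated assumptions: a guideline-derived feature description and a handful of annotated sample prompts are composed into a single few-shot YES/NO classification prompt, and the boolean outputs of several per-model detectors can be combined by majority vote. The call_llm interface, the prompt wording, the example labels, and the majority-vote ensemble are hypothetical placeholders, not the authors' actual implementation.

# Minimal sketch of one few-shot prompt-feature detector, assuming a generic
# completion interface call_llm(prompt: str) -> str. The feature description,
# prompt wording, and call_llm name are illustrative placeholders, not taken
# from the paper.

def build_detector_prompt(feature_description, annotated_samples, learner_prompt):
    # Compose the feature description and the annotated sample prompts into a
    # single YES/NO classification prompt for the target learner prompt.
    lines = [
        "You are assessing prompts written by learners for an LLM chatbot.",
        "Feature to detect: " + feature_description,
        "Answer YES if the feature is present in the prompt, NO otherwise.",
        "",
    ]
    for sample_text, label in annotated_samples:
        lines.append("Prompt: " + sample_text)
        lines.append("Feature present: " + ("YES" if label else "NO"))
        lines.append("")
    lines.append("Prompt: " + learner_prompt)
    lines.append("Feature present:")
    return "\n".join(lines)

def detect_feature(feature_description, annotated_samples, learner_prompt, call_llm):
    # Query the LLM and map its free-text answer to a boolean decision.
    answer = call_llm(build_detector_prompt(
        feature_description, annotated_samples, learner_prompt))
    return answer.strip().upper().startswith("YES")

def ensemble_detect(votes):
    # Simple majority vote over the boolean outputs of several per-model
    # detectors; one possible way to derive an ensemble model.
    return sum(votes) > len(votes) / 2

For example, detect_feature("The prompt specifies the desired output format.", samples, learner_prompt, call_llm) would return True when the model answers YES; running the same call against several LLMs and passing the results to ensemble_detect yields one possible ensemble decision.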
Type: Journal article - Scientific article
Keywords: Artificial intelligence in education; Computational thinking; Data science applications in education; Natural language processing
Language: English
Publication date: 6 May 2025
Year: 2025
Volume: 28
Issue: 3 (July 2025)
Pages: 184-201
Access: open
Citation: Ognibene, D., Donabauer, G., Theophilou, E., Koyuturk, C., Yavari, M., Bursic, S., et al. (2025). Use me wisely: AI-driven assessment for LLM prompting skills development. EDUCATIONAL TECHNOLOGY & SOCIETY, 28(3 (July 2025)), 184-201 [10.30191/ETS.202507_28(3).SP12].
Files in this item:
Ognibene-2025-Educational Technology & Society-VoR.pdf
Access: open access
Description: available under Creative Commons CC BY-NC-ND
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 951.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10281/553625
Citations
  • Scopus: not available
  • Web of Science: 0