
Marconi, L., Brasca, F., Ferri, P. (2026). Evaluating Large Language Models as Academic Tutors: A Human-AI Collaboration Approach for Structured Literature Review. In HCI International 2025 – Late Breaking Papers 27th International Conference on Human-Computer Interaction, HCII 2025, Gothenburg, Sweden, June 22–27, 2025, Proceedings, Part XVI (pp.197-213). Springer [10.1007/978-3-032-13187-4_14].

Evaluating Large Language Models as Academic Tutors: A Human-AI Collaboration Approach for Structured Literature Review

Marconi, L. (first author); Brasca, F.; Ferri, P.
2026

Abstract

The rise of Large Language Models (LLMs) such as ChatGPT, Microsoft Copilot, and Claude has generated growing interest in their application as educational tools. Recent studies have explored their use not only for content generation but also as instructional partners, particularly in cognitively demanding academic tasks such as literature reviews. However, limited empirical work has assessed their effectiveness as pedagogical agents within structured, dialogic educational settings. This study presents a qualitative, comparative analysis of three advanced LLMs—ChatGPT 5.3, Microsoft Copilot, and Claude 3.5 Haiku—each engaged as a tutor in accordance with Mollick’s AI role taxonomy. Using a 14-step interaction protocol grounded in the RAM framework, each model facilitated a dialogically structured reflection in which students collaboratively constructed a SWOT analysis evaluating the model’s own instructional effectiveness. Claude 3.5 Haiku demonstrated the highest levels of protocol compliance, adaptability, and critical depth, consistently providing personalized and reflective feedback. ChatGPT offered accurate and efficient technical support but showed limitations in individualization and procedural fidelity. Microsoft Copilot struggled with dialogic continuity and adherence to the protocol, resulting in weaker overall tutoring performance. These findings underscore the importance of structured prompting and clearly defined pedagogical roles in LLM-based tutoring and highlight the potential of next-generation models like Claude to scaffold high-level academic tasks.
slide + paper
Large Language Models; Human–AI Interaction; Education; AI role; Human-AI Collaboration Protocols
English
27th International Conference on Human-Computer Interaction, HCII 2025 - June 22–27, 2025
2025
Wei, J; Margetis, G; Degen, H; Ntoa, S
HCI International 2025 – Late Breaking Papers 27th International Conference on Human-Computer Interaction, HCII 2025, Gothenburg, Sweden, June 22–27, 2025, Proceedings, Part XVI
9783032131867
3 Jan 2026
2026
16346
197
213
https://link.springer.com/chapter/10.1007/978-3-032-13187-4_14
reserved
Files in this product:

File: Marconi-2026-Lecture Notes in Computer Science-VoR.pdf (archive administrators only)
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 1.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/582521
Citations
  • Scopus: not available
  • Web of Science: not available
Social impact