Bicocca Open Archive

Evaluating Large Language Models as Academic Tutors: A Human-AI Collaboration Approach for Structured Literature Review

Marconi, L.; Brasca, F.; Ferri, P.

2026

Abstract

The rise of Large Language Models (LLMs) such as ChatGPT, Microsoft Copilot, and Claude has generated growing interest in their application as educational tools. Recent studies have explored their use not only for content generation but also as instructional partners, particularly in cognitively demanding academic tasks such as literature reviews. However, limited empirical work has assessed their effectiveness as pedagogical agents within structured, dialogic educational settings. This study presents a qualitative, comparative analysis of three advanced LLMs (ChatGPT 5.3, Microsoft Copilot, and Claude 3.5 Haiku), each engaged as a tutor in accordance with Mollick’s AI role taxonomy. Using a 14-step interaction protocol grounded in the ram framework, each model facilitated a dialogically structured reflection in which students collaboratively constructed a SWOT analysis evaluating the model’s own instructional effectiveness. Claude 3.5 Haiku demonstrated the highest levels of protocol compliance, adaptability, and critical depth, consistently providing personalized and reflective feedback. ChatGPT offered accurate and efficient technical support but showed limitations in individualization and procedural fidelity. Microsoft Copilot struggled with dialogic continuity and adherence to the protocol, resulting in weaker overall tutoring performance. These findings underscore the importance of structured prompting and clearly defined pedagogical roles in LLM-based tutoring and highlight the potential of next-generation models like Claude to scaffold high-level academic tasks.

Contribution type: slide + paper
Keywords: Large Language Models; Human–AI Interaction; Education; AI role; Human-AI Collaboration Protocols
Content language: English
Conference name: 27th International Conference on Human-Computer Interaction, HCII 2025 - June 22–27, 2025
Conference year: 2025
Volume editors: Wei, J; Margetis, G; Degen, H; Ntoa, S
Proceedings title: HCI International 2025 – Late Breaking Papers 27th International Conference on Human-Computer Interaction, HCII 2025, Gothenburg, Sweden, June 22–27, 2025, Proceedings, Part XVI
Proceedings volume ISBN: 9783032131867
Series: Lecture Notes in Computer Science
Ahead-of-print or first online publication date: 3 January 2026
Publication date: 2026
Volume number: 16346
First page: 197
Last page: 213
Contribution DOI: https://dx.doi.org/10.1007/978-3-032-13187-4_14
Alternative URL: https://link.springer.com/chapter/10.1007/978-3-032-13187-4_14
Full text: reserved
Citation: Marconi, L., Brasca, F., Ferri, P. (2026). Evaluating Large Language Models as Academic Tutors: A Human-AI Collaboration Approach for Structured Literature Review. In HCI International 2025 – Late Breaking Papers 27th International Conference on Human-Computer Interaction, HCII 2025, Gothenburg, Sweden, June 22–27, 2025, Proceedings, Part XVI (pp. 197-213). Springer [10.1007/978-3-032-13187-4_14].
Appears in categories: 02 - Conference contribution

Files in this item:

File	Access	Attachment type	License	Size	Format
Marconi-2026-Lecture Notes in Computer Science-VoR.pdf	Archive managers only	Publisher’s Version (Version of Record, VoR)	All rights reserved	1.24 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/582521
