Bicocca Open Archive

Evaluating the performance of large language models in haematopoietic stem cell transplantation decision-making

Civettini, Ivan; Zappaterra, Arianna; Granelli, Bianca Maria; Rindone, Giovanni; Aroldi, Andrea; Bonfanti, Stefano; Colombo, Federica; Fedele, Marilena; Grillo, Giovanni; Parma, Matteo; Perfetti, Paola; Terruzzi, Elisabetta; Gambacorti-Passerini, Carlo; Ramazzotti, Daniele (co-last author); Cavalca, Fabrizio (co-last author)

2024

Abstract

In a first-of-its-kind study, we assessed the capabilities of large language models (LLMs) in making complex decisions in haematopoietic stem cell transplantation. The evaluation covered not only Generative Pre-trained Transformer 4 (GPT-4) but also other artificial intelligence models: PaLM 2 and Llama-2. Using detailed haematological histories that included clinical, molecular and donor data, we conducted a triple-blind survey to compare LLMs with haematology residents. We found that residents significantly outperformed LLMs (p = 0.02), particularly in transplant eligibility assessment (p = 0.01). Our triple-blind methodology aimed to mitigate potential biases in evaluating LLMs and revealed both their promise and limitations in deciphering complex haematological clinical scenarios.

Subtype: Journal article - Scientific article
Keywords: artificial intelligence; GPT; HSC transplantation; interrater agreement; transplant
Content language: English
Ahead-of-print / first online publication date: 9 December 2023
Publication date: 2024
Journal: British Journal of Haematology
Volume: 204
Issue: 4 (April 2024)
First page: 1523
Last page: 1528
Article DOI: https://dx.doi.org/10.1111/bjh.19200
Full text: open
Citation: Civettini, I., Zappaterra, A., Granelli, B., Rindone, G., Aroldi, A., Bonfanti, S., et al. (2024). Evaluating the performance of large language models in haematopoietic stem cell transplantation decision-making. British Journal of Haematology, 204(4), 1523-1528 [10.1111/bjh.19200].
Appears in categories: 01 - Journal article

Files in this record:

File	Access	Description	Attachment type	License	Size	Format
Civettini-2023-Br J Haematol-VoR.pdf	Open access	Short Report	Publisher’s Version (Version of Record, VoR)	Creative Commons	1.6 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/453078
