Background: Large language models like ChatGPT have demonstrated potential in medical image interpretation, but their efficacy in liver histopathological analysis remains largely unexplored. This study aims to assess ChatGPT-4-vision's diagnostic accuracy, compared to liver pathologists' performance, in evaluating liver fibrosis (stage) in metabolic dysfunction-associated steatohepatitis.Methods: Digitized Sirius Red-stained images for 59 metabolic dysfunction-associated steatohepatitis tissue biopsy specimens were evaluated by ChatGPT-4 and 4 pathologists using the NASH-CRN staging system. Fields of view at increasing magnification levels, extracted by a senior pathologist or randomly selected, were shown to ChatGPT-4, asking for fibrosis staging. The diagnostic accuracy of ChatGPT-4 was compared with pathologists' evaluations and correlated to the collagen proportionate area for additional insights. All cases were further analyzed by an in-context learning approach, where the model learns from exemplary images provided during prompting.Results: ChatGPT-4's diagnostic accuracy was 81% when using images selected by a pathologist, while it decreased to 54% with randomly cropped fields of view. By employing an in-context learning approach, the accuracy increased to 88% and 77% for selected and random fields of view, respectively. This method enabled the model to fully and correctly identify the tissue structures characteristic of F4 stages, previously misclassified. The study also highlighted a moderate to strong correlation between ChatGPT-4's fibrosis staging and collagen proportionate area.Conclusions: ChatGPT-4 showed remarkable results with a diagnostic accuracy overlapping those of expert liver pathologists. The in-context learning analysis, applied here for the first time to assess fibrosis deposition in metabolic dysfunction-associated steatohepatitis samples, was crucial in accurately identifying the key features of F4 cases, critical for early therapeutic decision-making. These findings suggest the potential for integrating large language models as supportive tools in diagnostic pathology.

Panzeri, D., Laohawetwanit, T., Akpinar, R., De Carlo, C., Belsito, V., Terracciano, L., et al. (2025). Assessing the diagnostic accuracy of ChatGPT-4 in the histopathological evaluation of liver fibrosis in MASH. HEPATOLOGY COMMUNICATIONS, 9(5) [10.1097/HC9.0000000000000695].

Assessing the diagnostic accuracy of ChatGPT-4 in the histopathological evaluation of liver fibrosis in MASH

Panzeri D.;Chirico G.;Sironi L.;
2025

Abstract

Background: Large language models like ChatGPT have demonstrated potential in medical image interpretation, but their efficacy in liver histopathological analysis remains largely unexplored. This study aims to assess ChatGPT-4-vision's diagnostic accuracy, compared to liver pathologists' performance, in evaluating liver fibrosis (stage) in metabolic dysfunction-associated steatohepatitis.Methods: Digitized Sirius Red-stained images for 59 metabolic dysfunction-associated steatohepatitis tissue biopsy specimens were evaluated by ChatGPT-4 and 4 pathologists using the NASH-CRN staging system. Fields of view at increasing magnification levels, extracted by a senior pathologist or randomly selected, were shown to ChatGPT-4, asking for fibrosis staging. The diagnostic accuracy of ChatGPT-4 was compared with pathologists' evaluations and correlated to the collagen proportionate area for additional insights. All cases were further analyzed by an in-context learning approach, where the model learns from exemplary images provided during prompting.Results: ChatGPT-4's diagnostic accuracy was 81% when using images selected by a pathologist, while it decreased to 54% with randomly cropped fields of view. By employing an in-context learning approach, the accuracy increased to 88% and 77% for selected and random fields of view, respectively. This method enabled the model to fully and correctly identify the tissue structures characteristic of F4 stages, previously misclassified. The study also highlighted a moderate to strong correlation between ChatGPT-4's fibrosis staging and collagen proportionate area.Conclusions: ChatGPT-4 showed remarkable results with a diagnostic accuracy overlapping those of expert liver pathologists. The in-context learning analysis, applied here for the first time to assess fibrosis deposition in metabolic dysfunction-associated steatohepatitis samples, was crucial in accurately identifying the key features of F4 cases, critical for early therapeutic decision-making. These findings suggest the potential for integrating large language models as supportive tools in diagnostic pathology.
Articolo in rivista - Articolo scientifico
artificial intelligence; collagen; hepatology; large language models; stage;
English
1-mag-2025
2025
9
5
e0695
open
Panzeri, D., Laohawetwanit, T., Akpinar, R., De Carlo, C., Belsito, V., Terracciano, L., et al. (2025). Assessing the diagnostic accuracy of ChatGPT-4 in the histopathological evaluation of liver fibrosis in MASH. HEPATOLOGY COMMUNICATIONS, 9(5) [10.1097/HC9.0000000000000695].
File in questo prodotto:
File Dimensione Formato  
Panzeri et al-2025-Hepatology Communications-VoR.pdf

accesso aperto

Descrizione: This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CCBY-NC-ND)
Tipologia di allegato: Publisher’s Version (Version of Record, VoR)
Licenza: Creative Commons
Dimensione 1.02 MB
Formato Adobe PDF
1.02 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/559442
Citazioni
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 5
Social impact