Bicocca Open Archive

Objective. Standardized structured reports (SSR) and suitable terminologies such as SNOMED-CT can enhance data retrieval and analysis, fostering large-scale studies and collaboration. However, the persistently high prevalence of narrative reports in our laboratories warrants alternative, automated labeling approaches. In this project, natural language processing (NLP) methods were used to associate SNOMED-CT codes with structured and unstructured reports from an Italian Digital Pathology Department. Methods. Two NLP-based automatic coding systems (a support vector machine, SVM, and a long short-term memory network, LSTM) were trained and applied to a series of narrative reports. Results. The 1163 cases were tested with both algorithms, which showed good performance in terms of accuracy, precision, recall, and F1 score; SVM performed slightly better than LSTM (accuracy 0.84, precision 0.87, recall 0.83, F1 score 0.82 vs 0.83, 0.85, 0.83, 0.82, respectively). Integrating explainability allowed the identification of important terms and groups of words, enabling fine-tuning that balances semantic meaning and model performance. Conclusions. AI tools allow automatic SNOMED-CT labeling of pathology archives, providing a retrospective fix for the widespread lack of organization of narrative reports.
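As an illustration of the four metrics reported in the abstract, the following minimal Python sketch computes accuracy, precision, recall, and F1 score for a multi-class coding task. It is not the authors' evaluation code: the function name `macro_scores`, the placeholder labels (`code_A`, `code_B`, `code_C`), and the choice of macro-averaging are all assumptions made for this example, since the abstract does not state how per-class scores were aggregated.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Hypothetical helper: macro-averaging is an assumption, not
    something stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct code assigned
        else:
            fp[pred] += 1    # predicted code was wrong
            fn[truth] += 1   # true code was missed

    def safe_div(num, den):
        # Avoid ZeroDivisionError for classes that are never predicted or never present.
        return num / den if den else 0.0

    precisions = [safe_div(tp[l], tp[l] + fp[l]) for l in labels]
    recalls = [safe_div(tp[l], tp[l] + fn[l]) for l in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    accuracy = safe_div(sum(tp.values()), len(y_true))
    return (
        accuracy,
        safe_div(sum(precisions), n),
        safe_div(sum(recalls), n),
        safe_div(sum(f1s), n),
    )


# Placeholder labels standing in for SNOMED-CT codes (not real codes).
y_true = ["code_A", "code_A", "code_B", "code_B", "code_C", "code_C"]
y_pred = ["code_A", "code_B", "code_B", "code_B", "code_C", "code_A"]

accuracy, precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Macro-averaging weights every code equally regardless of how often it occurs; a weighted or micro average would give different figures on an imbalanced set of reports.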

Cazzaniga, G., Eccher, A., Munari, E., Marletta, S., Bonoldi, E., Della Mea, V., et al. (2023). Natural Language Processing to extract SNOMED-CT codes from pathological reports. PATHOLOGICA, 115(6), 318-324 [10.32074/1591-951X-952].

Natural Language Processing to extract SNOMED-CT codes from pathological reports

Cazzaniga, Giorgio; Eccher, Albino; Munari, Enrico; Marletta, Stefano; Bonoldi, Emanuela; Della Mea, Vincenzo; Cadei, Moris; Sbaraglia, Marta; Guerriero, Angela; Dei Tos, Angelo Paolo; Pagni, Fabio; L'Imperio, Vincenzo

2023



Subtype: Journal article - Scientific article
Keywords: digital pathology; laboratory information system; natural language processing; SNOMED-CT
Content language: English
Publication date: 2023
Journal: PATHOLOGICA
Volume number: 115
Issue: 6
First page: 318
Last page: 324
Article DOI: https://dx.doi.org/10.32074/1591-951X-952
Full text: open
Appears in types: 01 - Journal article

Files in this item:

File	Access	Attachment type	License	Size	Format
10281-455302_VoR.pdf	Open access	Publisher’s Version (Version of Record, VoR)	Creative Commons	644.17 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/455302
