
Ferretti, C., Saletta, M. (2023). Naturalness in Source Code Summarization. How Significant is it?. In 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC) (pp.125-134). IEEE Computer Society [10.1109/ICPC58990.2023.00027].

Naturalness in Source Code Summarization. How Significant is it?

Ferretti C.;Saletta M.
2023

Abstract

Research in source code summarization, that is, the description of a program's functionality in short natural-language sentences, is a topic of great interest in the software engineering community, since it can help in automatically generating software documentation and, more generally, can ease the effort developers spend understanding the code they work on. In this work, which is conceived as a negative results paper, we study the existing neural models designed for this purpose, pointing out their high sensitivity to the natural elements present in the source code (i.e. comments and identifiers) and the corresponding drop in performance when such elements are ablated or masked. We then propose a novel source code summarization approach based on an intermediate pseudo-language, through which we are able to fine-tune the BRIO model for natural language on source code summarization and to achieve results comparable to those obtained by the state-of-the-art source code competitors (e.g. PLBART and CodeBERT). We finally discuss the limitations of these NLP-based approaches when transferred to the domain of source code processing, and we provide some insights for further research directions.
paper
neural transformers; pre-trained models; program comprehension; source code summarization;
English
31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023 - 15 May 2023 through 16 May 2023
2023
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)
9798350337501
2023
2023-May
125
134
none
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/439120
Citations
  • Scopus 3
  • Web of Science 2