
Ferretti, C., Saletta, M. (2023). Naturalness in Source Code Summarization. How Significant is it?. In 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC) (pp.125-134). IEEE Computer Society [10.1109/ICPC58990.2023.00027].

Naturalness in Source Code Summarization. How Significant is it?

Ferretti C.;Saletta M.
2023

Abstract

Research in source code summarization, that is, the description of a program's functionality in short natural-language sentences, is a topic of great interest in the software engineering community, since it can help in automatically generating software documentation and, more generally, can ease the effort developers spend understanding the code they work on. In this work, which is conceived as a negative results paper, we study the existing neural models designed for this purpose, pointing out their high sensitivity to the natural elements present in the source code (i.e. comments and identifiers) and the corresponding drop in performance when such elements are ablated or masked. We then propose a novel source code summarization approach based on an intermediate pseudo-language, through which we are able to fine-tune the BRIO model for natural language on source code summarization and to achieve results comparable to those obtained by the state-of-the-art source code competitors (e.g. PLBART and CodeBERT). We finally discuss the limitations of these NLP-based approaches when transferred to the domain of source code processing, and we provide some insights for further research directions.
paper
neural transformers; pre-trained models; program comprehension; source code summarization;
English
31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023 - 15 May 2023 through 16 May 2023
2023
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)
9798350337501
2023
2023-May
125
134
none
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/439120
Citations
  • Scopus 3
  • Web of Science 2