Donato, B., Mariani, L., Micucci, D., Riganelli, O. (2025). Studying How Configurations Impact Code Generation in LLMs: The Case of ChatGPT. In 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC) (pp.442-453). IEEE Computer Society [10.1109/ICPC66645.2025.00055].

Studying How Configurations Impact Code Generation in LLMs: The Case of ChatGPT

Donato B.;Mariani L.;Micucci D.;Riganelli O.
2025

Abstract

Leveraging LLMs for code generation is becoming increasingly common, as tools like ChatGPT can suggest method implementations with minimal input, such as a method signature and a brief description. Empirical studies further highlight the effectiveness of LLMs in handling such tasks, demonstrating notable performance in code generation scenarios. However, LLMs are inherently non-deterministic, with their output influenced by parameters such as temperature, which regulates the model's level of creativity, and top-p, which restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p. Despite their significance, the role of these parameters is often overlooked. This paper systematically studies the impact of these parameters, as well as the number of prompt repetitions required to account for non-determinism, in the context of 548 Java methods. We observe significantly different performance across configurations of ChatGPT, with temperature having a marginal impact compared to the more prominent influence of the top-p parameter. Additionally, we show how creativity can enhance code generation tasks. Finally, we provide concrete recommendations for addressing the non-determinism of the model.
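The abstract's description of temperature and top-p can be illustrated with a minimal, self-contained sampling sketch. This is an illustration of the general mechanism, not code from the paper; the function name and details are our own:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample a token index from raw logits using temperature scaling
    followed by nucleus (top-p) filtering."""
    # Temperature scaling: lower values sharpen the distribution
    # (less "creative"), higher values flatten it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus (top-p) filtering: keep the smallest set of most likely
    # tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    norm = sum(probs[i] for i in kept)
    r = rng.random() * norm
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With a small top-p the nucleus often collapses to the single most likely token, making the output effectively deterministic, which is one way such a parameter can dominate the behavior studied in the paper.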
paper
ChatGPT; code generation; LLMs; repetitions; temperature; top-p
English
Conference: 33rd IEEE/ACM International Conference on Program Comprehension, ICPC 2025 (27-28 April 2025)
Conference year: 2025
Proceedings: 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC)
ISBN: 9798331502232
Publication year: 2025
Pages: 442-453
none
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10281/561183
Citations
  • Scopus 2
  • ISI (Web of Science) 0