Saarimaki, N., Moreschini, S., Lomio, F., Penaloza, R., Lenarduzzi, V. (2022). Towards a Robust Approach to Analyze Time-Dependent Data in Software Engineering. In Proceedings - 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022 (pp.36-40). IEEE COMPUTER SOC [10.1109/SANER53432.2022.00015].

Towards a Robust Approach to Analyze Time-Dependent Data in Software Engineering

Penaloza R.
2022

Abstract

Background. Several recent software engineering studies use data mined from the version control systems of software projects. However, inspecting the data and statistical methods used in those studies reveals several problems with the current approach, mainly related to the dependent nature of the data. Objective. We analyse time-dependent software engineering data at the commit level and propose an alternative approach based on time series analysis. Method. We identified statistical tests designed for time series analysis and propose a technique to model time-dependent data, similar to the approaches used in finance and weather forecasting. We applied our approach to a small set of projects of different sizes, investigating the behaviour of the SQALE Index, to highlight the time dependence and interdependency of commits. Results. Using these techniques, we analysed and modelled the data, showing that this type of commit data can be investigated with methods from time series analysis. Conclusion. Based on these promising results, we plan to validate the robustness of the approach by replicating previous studies.
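As an illustration of the kind of analysis the abstract describes, the following minimal Python sketch checks a commit-ordered series for serial dependence and fits a simple autoregressive model. It assumes a hypothetical, made-up series of SQALE index values (`sqale`) and uses the Ljung-Box test and an AR(1) model as stand-ins; the abstract does not state which specific tests or models the authors used. The helper names (`acf`, `ljung_box`, `fit_ar1`) are hypothetical, and the data are not from the paper.

```python
"""Minimal sketch: checking commit-level data for serial dependence.

Assumption: `sqale` is a hypothetical, made-up series of SQALE index values
recorded after each commit, in commit order. It is NOT data from the paper.
"""
from typing import List, Tuple


def acf(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def ljung_box(x: List[float], h: int) -> float:
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, h + 1))


def fit_ar1(x: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    m = len(prev)
    mp, mc = sum(prev) / m, sum(curr) / m
    cov = sum((p - mp) * (q - mc) for p, q in zip(prev, curr))
    var = sum((p - mp) ** 2 for p in prev)
    phi = cov / var
    return mc - phi * mp, phi


# Hypothetical SQALE index after each of 20 commits (illustrative only).
sqale = [120, 125, 131, 130, 138, 145, 149, 150, 158, 163,
         162, 170, 176, 181, 180, 188, 195, 199, 204, 210]

H = 5
CHI2_95_DF5 = 11.0705  # 95th percentile of chi-square with 5 degrees of freedom

q = ljung_box(sqale, H)
print(f"Ljung-Box Q({H}) = {q:.2f}; serially dependent: {q > CHI2_95_DF5}")

# Differencing turns the cumulative index into per-commit changes, a common
# first step before fitting an ARIMA-style model to a trending series.
diffs = [b - a for a, b in zip(sqale, sqale[1:])]
c, phi = fit_ar1(diffs)
print(f"AR(1) on differences: c = {c:.2f}, phi = {phi:.2f}")
print(f"One-step forecast of the next change: {c + phi * diffs[-1]:.2f}")
```

A Q statistic above the critical value suggests that consecutive commits cannot be treated as independent observations, which is the kind of dependence that tests assuming independent samples ignore. In practice a statistics library would supply the test and model; the hand-written versions here only keep the sketch dependency-free.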
Type: paper
Keywords: Data Analysis; Empirical Methods; Mining Software Repository; Time Dependent Variables; Time Series Analysis
Language: English
Conference: 29th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022, 15 March 2022 through 18 March 2022
Conference year: 2022
Proceedings: Proceedings - 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2022
ISBN: 9781665437868
Year: 2022
Pages: 36-40
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/394729
Citations
  • Scopus 1
  • ISI Web of Science 1