Evidence Based Software Testability Measurement

Guglielmo, L

Software testing is a key activity of the software life cycle that requires time and resources to be effective. One of the key aspects that influences the cost of testing activities, and ultimately their effectiveness at revealing possible faults, is software testability. Software testability expresses how difficult or easy it is to test a software artifact. Estimates of the testability of the software under test and of its components can support test analysts in anticipating the cost of testing, tuning test plans, or pinpointing components that should be refactored before testing. Several studies on the topic have been performed since the 1990s; the first ones focused on giving an appropriate definition of software testability, while later ones focused on making software more testable and on finding ways to measure software testability reliably. This research work focuses on the latter aspect. Research on measuring software testability has the main objective of evaluating the testability of software components, with the final goal of improving their testability and better estimating the effort needed in the testing phase. The approaches proposed so far for estimating testability mostly fall into two categories: those that analyze the fault sensitivity of a program, and those that analyze the structure of its code. Fault-sensitivity analysis was popular in the 1990s, but research on it has progressively dwindled in favor of approaches that use static software metrics, as proxies for the design characteristics of a program, to estimate its testability. These approaches have intrinsic limitations, acknowledged in the studies themselves, such as being costly or focusing on design characteristics that estimate testability only indirectly.
In this research work we introduce a new technique for estimating software testability, whose novelty is to exploit automated test case generation to investigate to what extent a program may or may not suffer from testability issues. In a nutshell, our technique consists of executing (possibly multiple times) a test generator of choice against a program under test, and then automatically analyzing the outcomes of the test generation activity to extract evidence that the generated test cases foster effective (or ineffective) testing, in particular for reasons that can be traced back to the design choices that characterize the program. We regard testability issues as design choices that make effective testing harder to achieve. The more evidence our technique collects for a given program in favor of the absence of testability issues, the higher the testability estimate it reports for that program; the more evidence it collects in favor of their presence, the lower the estimate. To validate our proposal, we developed a tool that computes the testability values of software artifacts, and we performed a series of empirical experiments to determine whether our technique can highlight testability issues reliably. Moreover, we compared our results against some of the most popular metrics currently suggested as potential estimators of testability. The results show the potential of our technique for measuring software testability, even when compared against other proposed metrics.
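The abstract does not say how the collected evidence is turned into a number, so the following is only a minimal sketch of one possible aggregation, not the thesis's actual formula. It assumes that each run of the test generator has already been reduced to two counts (evidence of effective testing and evidence of ineffective testing attributable to design choices) and reports the share of favorable evidence. The names `RunOutcome` and `estimate_testability` are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunOutcome:
    """Evidence extracted from one run of a test generator (hypothetical shape)."""
    evidence_effective: int    # observations suggesting testing was effective
    evidence_ineffective: int  # observations suggesting design choices hampered testing


def estimate_testability(runs: List[RunOutcome]) -> Optional[float]:
    """Return the share of evidence favoring effective testing, in [0, 1].

    This ratio is an assumption made for illustration; the abstract does not
    specify the aggregation. Returns None when no evidence was collected.
    """
    effective = sum(r.evidence_effective for r in runs)
    ineffective = sum(r.evidence_ineffective for r in runs)
    total = effective + ineffective
    if total == 0:
        return None
    return effective / total


# Three hypothetical runs of the same generator against the same program.
runs = [RunOutcome(8, 2), RunOutcome(6, 4), RunOutcome(7, 3)]
print(estimate_testability(runs))
```

Under this assumed ratio, more evidence of ineffective testing pushes the estimate toward 0, and repeating the generator simply adds more observations to both counts.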

Software testing is a key activity of the software life cycle that requires time and resources to be effective. One of the fundamental aspects that influences the cost of testing activities, and consequently their effectiveness at revealing possible faults, is software testability. Testability expresses how easy or difficult it is to test a software artifact. The availability of testability estimates for a program and its artifacts can help analysts anticipate the cost of testing. Many studies on the subject have been carried out from 1990 onward. The first studies focused mainly on providing an appropriate definition of software testability, while later ones paid more attention to identifying ways of making software more testable and of measuring testability reliably. Our research work focuses mainly on the latter aspect. Research on methods for measuring software testability has the main objective of evaluating the testability of software components, in order to improve their testability and to assess the effort needed during testing. The state-of-the-art approaches for measuring testability can be categorized into: testability estimates obtained by analyzing fault sensitivity, and testability estimates obtained by analyzing the structure of the code. Fault-sensitivity analysis was very popular in the 1990s, but research on it has progressively dwindled in favor of approaches that use static software metrics (which represent design characteristics) to estimate testability. These approaches have some intrinsic limitations, mentioned in the studies themselves, such as being costly or deriving testability estimates indirectly by focusing on the design characteristics of the software.
This research work introduces a new technique for estimating software testability. Its main novelty is the use of automated test case generation to investigate to what extent a program may suffer from testability issues. In short, our technique runs (potentially more than once) a test case generator on the program under test, and then automatically analyzes the outcomes of the automated test generation activity in order to extract evidence that the generated test cases are effective (or ineffective), in particular because of the design characteristics of the program. The more evidence our technique collects for a given program in favor of the absence of testability issues, the higher the testability estimate it reports for that program; the more evidence in favor of their presence, the lower the estimate. To validate the proposed technique, we implemented a tool that computes testability values for Java software, and with this tool we ran a series of experiments to verify whether the technique identifies testability issues reliably. In addition, we compared our results with those of the most popular state-of-the-art software metrics. The results show the potential of our technique for measuring software testability, even when compared with the metrics currently proposed.

Guglielmo, L. (2023). Evidence Based Software Testability Measurement. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023).