Virtual execution environments and middleware are required to be extremely reliable because applications running on top of them are developed assuming their correctness, and platform-level failures can result in serious and unexpected application-level problems. Since software platforms and middleware are often executed for a long time without any interruption, a large part of the testing process is devoted to investigating their behavior during long and stressful executions (these test cases are called workloads). When a problem is identified, software engineers examine log files to find its root cause. Unfortunately, because of the length of the workloads, log files can contain a huge amount of information, and manual analysis is often prohibitive. Thus, de facto, the identification of the root cause is mostly left to the intuition of the software engineer. In this paper, we propose a technique to automatically analyze logs obtained from workloads to retrieve important information that can relate the failure to its cause. The technique works in three steps: (1) during workload executions, the system under test is monitored; (2) logs extracted from workloads that completed successfully are used to derive compact and general models of the expected behavior of the target system; (3) logs corresponding to workloads that terminated unsuccessfully are compared with the inferred models to identify anomalous event sequences. Anomalies help software engineers identify failure causes. The technique can also be used during the operational phase, to discover possible causes of unexpected failures by comparing logs corresponding to failing executions with models derived at testing time. Preliminary experimental results obtained on the Java Virtual Machine indicate that several bugs can be rapidly identified thanks to the feedback provided by our technique.
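The three steps can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify what kind of model is inferred, so the sketch assumes the simplest possible one, the set of event n-grams observed in passing logs. The function names (`ngrams`, `learn_model`, `find_anomalies`), the event names, and the representation of a log as a list of event strings are all hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Event = str
NGram = Tuple[Event, ...]


def ngrams(events: Sequence[Event], n: int) -> List[NGram]:
    """Return all contiguous subsequences of length n, in log order."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def learn_model(passing_logs: Sequence[Sequence[Event]], n: int = 2) -> Set[NGram]:
    """Step 2 (assumed model): collect every n-gram seen in successful runs."""
    model: Set[NGram] = set()
    for log in passing_logs:
        model.update(ngrams(log, n))
    return model


def find_anomalies(
    failing_log: Sequence[Event], model: Set[NGram], n: int = 2
) -> List[Tuple[int, NGram]]:
    """Step 3: report (position, n-gram) pairs never seen in successful runs."""
    return [
        (i, gram)
        for i, gram in enumerate(ngrams(failing_log, n))
        if gram not in model
    ]


if __name__ == "__main__":
    # Step 1 is assumed to have produced these logs by monitoring the system.
    passing = [
        ["open", "read", "close"],
        ["open", "read", "read", "close"],
    ]
    failing = ["open", "read", "write", "close"]

    model = learn_model(passing)
    for position, gram in find_anomalies(failing, model):
        print(position, gram)
```

An n-gram set is far less general than a model inferred by a dedicated learning technique, but it shows the core idea: behavior that never occurs in successful workloads is flagged as a candidate lead toward the failure cause.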

Cotroneo, D., Pietrantuono, R., Mariani, L., Pastore, F. (2007). Investigation of failure causes in workload-driven reliability testing. In Proceedings of the Fourth International Workshop on Software Quality Assurance, in conjunction with the 6th ESEC/FSE joint meeting (pp. 78-85). New York, NY: ACM [10.1145/1295074.1295089].

Investigation of failure causes in workload-driven reliability testing

MARIANI, LEONARDO;PASTORE, FABRIZIO
2007

paper
Log-file analysis, workload-driven testing, automated fault localization
English
Fourth International Workshop on Software Quality Assurance, in conjunction with the 6th ESEC/FSE joint meeting
2007
Proceedings of the Fourth International Workshop on Software Quality Assurance, in conjunction with the 6th ESEC/FSE joint meeting
978-1-59593-724-7
2007
78
85
none

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/2932