Fotia, L., Gaeta, R., Messina, F., Rosaci, D., Sarné, G. (2026). Artificial Intelligence for High-Availability Systems: A Comprehensive Review. COMPUTERS, 15(4) [10.3390/computers15040231].

Artificial Intelligence for High-Availability Systems: A Comprehensive Review

Sarné G. M. L.
2026

Abstract

High-availability (HA) systems, which are essential in many contemporary contexts, are designed to guarantee the availability of processes and data for more than 99% of their operational time. These systems are typically implemented as Cloud/Edge infrastructures that human operators and intelligent agents maintain to ensure the required level of availability. At the same time, AI-based automation is being widely adopted across many industries. AI-based software agents are increasingly used to automate HA systems, particularly for monitoring, fault detection, fault prediction, recovery, and optimization. In this review paper, we discuss the state of the art of AI-based solutions for HA systems, focusing on the core operational mechanisms of monitoring, failure detection, and recovery. We begin by reviewing key background concepts of HA architectures and then survey recent work on AI-based solutions for monitoring, fault detection, and recovery in HA systems.
Type: Journal article - Review Essay
Keywords: artificial intelligence; Cloud/Edge infrastructures; high-availability systems
Language: English
Publication date: 8 April 2026
Year: 2026
Volume: 15
Issue: 4
Article number: 231
Access: open
Files in this item:
Fotia et al-2026-Computers-VoR.pdf
Open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 875.22 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/607402