Bicocca Open Archive

Semantic Data Augmentation for Deep Learning Testing Using Generative AI

Missaoui S.; Gerasimou S.; Matragkas N.

2023

Abstract

The performance of state-of-the-art Deep Learning models depends heavily on the availability of well-curated training and testing datasets that sufficiently capture the operational domain. Data augmentation is an effective technique for alleviating data scarcity, reducing the need for time-consuming and expensive data collection and labelling. Despite their potential, existing data augmentation techniques focus primarily on simple geometric and colour space transformations, such as noise, flipping and resizing, and so produce datasets with limited diversity. When such an augmented dataset is used to test Deep Learning models, the results are typically uninformative about the models' robustness. We address this gap by introducing GENFUZZER, a novel coverage-guided data augmentation fuzzing technique for Deep Learning models underpinned by generative AI. We demonstrate our approach using widely adopted datasets and models for image classification, illustrating its effectiveness in generating informative datasets that lead to an increase of up to 26% in widely used coverage criteria.
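To make the idea of coverage guidance concrete, the following is a minimal sketch and not the GENFUZZER implementation, which this record does not describe. It assumes a hypothetical four-neuron toy model (`WEIGHTS`), a simple neuron-coverage criterion with an assumed activation threshold of 0.5, and a random perturbation (`mutate`) standing in for the generative model. A candidate input is kept only if it activates a neuron that no earlier input activated.

```python
import random

# Hypothetical toy "model": one hidden ReLU layer with fixed weights (an assumption).
WEIGHTS = [
    [1.0, -1.0],
    [-1.0, 1.0],
    [1.0, 1.0],
    [-1.0, -1.0],
]


def activations(x):
    """ReLU activations of the toy hidden layer for input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in WEIGHTS]


def covered_neurons(inputs, threshold=0.5):
    """Indices of neurons whose activation exceeds the threshold for some input."""
    covered = set()
    for x in inputs:
        for i, a in enumerate(activations(x)):
            if a > threshold:
                covered.add(i)
    return covered


def neuron_coverage(inputs, threshold=0.5):
    """Fraction of neurons activated above the threshold by at least one input."""
    return len(covered_neurons(inputs, threshold)) / len(WEIGHTS)


def mutate(x, rng, scale=1.0):
    """Placeholder augmentation: a random perturbation, NOT a generative model."""
    return [xi + rng.uniform(-scale, scale) for xi in x]


def coverage_guided_augment(seeds, iterations=200, rng_seed=0):
    """Grow the corpus, keeping only candidates that cover a new neuron."""
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    covered = covered_neurons(corpus)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        candidate = mutate(parent, rng)
        gained = covered_neurons([candidate]) - covered
        if gained:  # the candidate is informative: it raises coverage
            corpus.append(candidate)
            covered |= gained
    return corpus


seeds = [[1.0, 0.0]]
augmented = coverage_guided_augment(seeds)
print(neuron_coverage(seeds), neuron_coverage(augmented))
```

Because a candidate is retained only when it covers a previously uncovered neuron, coverage of the augmented set can never fall below that of the seeds, and the corpus grows by at most one input per newly covered neuron.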

Contribution type: paper
Keywords: Coverage Guided Fuzzing; Data Augmentation; Deep Learning Testing; Generative AI; Safe AI
Content language: English
Conference name: 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023 - 11 September 2023 through 15 September 2023
Conference year: 2023
Proceedings title: Proceedings - 2023 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023
Proceedings volume ISBN: 9798350329964
Publication date: 2023
First page: 1694
Last page: 1698
DOI: https://dx.doi.org/10.1109/ASE56229.2023.00194
Full text: none
Citation: Missaoui, S., Gerasimou, S., Matragkas, N. (2023). Semantic Data Augmentation for Deep Learning Testing Using Generative AI. In Proceedings - 2023 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023 (pp. 1694-1698). Institute of Electrical and Electronics Engineers Inc. [10.1109/ASE56229.2023.00194].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/527816
