Bicocca Open Archive

Optimizing on-demand GPUs in the Cloud for Deep Learning Applications Training

Jahani A.; Lattuada M.; Ciavotta M.; Ardagna D.; Amaldi E.; Zhang L.

2019

Abstract

Deep learning (DL) methods have recently gained popularity and are now used in commonplace applications such as voice and face recognition, among others. Despite the growing popularity of DL and of the associated hardware acceleration techniques, GPU-based systems remain very expensive. Moreover, while the cloud is a cost-effective and flexible solution, in large settings operating costs can be reduced further by carefully managing and fostering resource sharing. This work addresses the online joint problem of capacity planning of virtual machines (VMs) and scheduling of DL training jobs, and proposes a Mixed Integer Linear Programming (MILP) formulation. In particular, DL jobs are assumed to have a deadline, multiple VM types are available from a cloud provider catalog, and each VM may have multiple GPUs. Our solutions optimize operating costs by (i) right-sizing the VM capacities; (ii) partitioning the set of GPUs among multiple concurrent jobs running on the same VM; and (iii) determining a deadline-aware job schedule. Our approach is evaluated using an ad hoc simulator and a prototype environment and compared against first-principle approaches, resulting in a cost reduction of 45-80%.
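
To make the three decisions in the abstract concrete (which VM type to rent, how to split its GPUs among concurrent jobs, and whether every deadline is met), the following is a toy sketch only. It is not the paper's MILP formulation: it enumerates all options by brute force instead of calling a solver. It also assumes a single VM, linear speedup with the number of GPUs, and billing until the last job finishes. The catalog, prices, job sizes, and the function name `cheapest_plan` are all hypothetical.

```python
from itertools import product

# Hypothetical catalog: VM type -> (number of GPUs, price per hour). Not from the paper.
CATALOG = {"small": (1, 1.0), "medium": (2, 1.9), "large": (4, 3.6)}

# Hypothetical jobs: name -> (work in GPU-hours, deadline in hours).
JOBS = {"A": (4.0, 4.0), "B": (2.0, 2.0)}


def cheapest_plan(catalog, jobs):
    """Return (cost, vm_type, gpu_split) for the cheapest feasible plan, or None."""
    names = list(jobs)
    best = None
    for vm, (gpus, price) in catalog.items():
        # Every job gets at least one GPU; a split may leave some GPUs idle.
        for split in product(range(1, gpus + 1), repeat=len(names)):
            if sum(split) > gpus:
                continue  # more GPUs requested than the VM offers
            # Assumed linear speedup: runtime = work / assigned GPUs.
            runtimes = [jobs[n][0] / g for n, g in zip(names, split)]
            if any(t > jobs[n][1] for n, t in zip(names, runtimes)):
                continue  # at least one deadline would be missed
            cost = price * max(runtimes)  # VM is billed until the last job ends
            if best is None or cost < best[0]:
                best = (cost, vm, dict(zip(names, split)))
    return best


if __name__ == "__main__":
    print(cheapest_plan(CATALOG, JOBS))
```

With these made-up numbers the four-GPU VM is the cheapest choice (cost 7.2, giving two GPUs to job A and one to job B), because it finishes in half the time of the two-GPU VM (cost 7.6). The smallest VM that meets the deadlines is therefore not necessarily the cheapest. Exhaustive enumeration is only viable for toy instances, which is why a MILP formulation is the natural tool at realistic scale.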

Contribution type: paper
Keywords: Cloud; on-demand GPUs; Optimization models; Scheduling
Content language: English
Conference name: 4th International Conference on Computing, Communications and Security, ICCCS 2019
Conference year: 2019
Proceedings title: 2019 4th International Conference on Computing, Communications and Security, ICCCS 2019
Proceedings ISBN: 978-172810875-9
Publication date: 2019
First page: 1
Last page: 8
Article number: 8888151
DOI: https://dx.doi.org/10.1109/CCCS.2019.8888151
Full text: none
Citation: Jahani, A., Lattuada, M., Ciavotta, M., Ardagna, D., Amaldi, E., Zhang, L. (2019). Optimizing on-demand GPUs in the Cloud for Deep Learning Applications Training. In 2019 4th International Conference on Computing, Communications and Security, ICCCS 2019 (pp.1-8). Institute of Electrical and Electronics Engineers Inc. [10.1109/CCCS.2019.8888151].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/327734
