Deep learning (DL) methods have recently gained popularity. Training this class of models is, however, computationally intensive, and GPUs are frequently used to boost performance. Although the costs of GPU-based systems are gradually decreasing due to high demand, they are still prohibitive: in public clouds, the per-time-unit price of GPU-powered virtual machines (VMs) is 5-8x higher than that of CPU-only VMs. While the cloud remains the most cost-effective and flexible deployment option, operation costs can be reduced, in large settings, by rightsizing and sharing resources among multiple processes. This work addresses the online problem of joint capacity planning and job scheduling with due dates, and proposes alternative matheuristic solution methods. Our objective is to optimize operation costs by: i) rightsizing the VM capacities at each node, ii) partitioning the set of GPUs among multiple concurrent jobs on the same VM, and iii) determining a due-date-aware job schedule. The effectiveness of the proposed hierarchical approach, coupled with an appropriate Mixed Integer Linear Programming formulation, is validated against first-principle methods through simulation. The experiments show that the cost efficiency of GPU-based systems can be improved by 50-70%. Finally, scalability analyses show that the proposed approach can solve problem instances with up to 100 nodes in less than one minute on average, making it suitable for practical scenarios.

Filippini, F., Lattuada, M., Ciavotta, M., Jahani, A., Ardagna, D., Amaldi, E. (2020). Hierarchical Scheduling in on-demand GPU-as-a-Service Systems. In Proceedings - 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020 (pp. 125-132). Institute of Electrical and Electronics Engineers Inc. [10.1109/SYNASC51798.2020.00030].

Hierarchical Scheduling in on-demand GPU-as-a-Service Systems

Ciavotta, Michele
2020

Type: paper
Keywords: Cloud; On-demand GPUs; Optimization; Scheduling
Language: English
Conference: 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing - September 2020 through 4 September 2020
Year: 2020
Published in: Proceedings - 2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2020
ISBN: 9781728176284
Pages: 125-132
9357094
reserved
Files in this item:
SYNASC2020-4.pdf
Access: archive managers only
Attachment type: Submitted Version (Pre-print)
Size: 766.77 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/304577
Citations
  • Scopus 3
  • Web of Science (ISI) 3