Bicocca Open Archive

Pang, B., Zhang, M., Hu, X., Pham, D., Alam, S., Lulli, G. (2026). Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning. In AAMAS 2026 Conference Proceedings (pp.1928-1937).

Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning

Pang, B; Zhang, M; Hu, X; Pham, DT; Alam, S; Lulli, G

2026

Abstract

Multi-agent trajectory planning in safety-critical systems must guarantee safety while scaling to many agents. Sampling and optimization methods often adapt slowly and scale poorly. Reinforcement learning can improve adaptability, but it often violates safety constraints and suffers from sample inefficiency. This work proposes IDDPG-MAF, which integrates Independent Deep Deterministic Policy Gradient (IDDPG) with a pre-trained Multi-head Action Filter Network (MAF-Net). We first cast the problem as a constrained mixed-integer nonlinear program and then reformulate it as a constrained decentralized Markov decision process to support real-time adaptability and coordination. IDDPG enables scalable learning, while MAF-Net acts as a differentiable safety filter that masks unsafe actions and penalizes suboptimal behaviors. The IDDPG-MAF method is applied to a complex multi-aircraft trajectory planning task under dynamic thunderstorm cells. Experimental results show that IDDPG-MAF achieves over 99% safe separation (vs. 82% for the state-of-the-art baseline) and 95.5% task success even under moderate uncertainty, and that it scales safely to 45 aircraft in a compact spatiotemporal window, effectively doubling the maximum capacity of current operations.
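The abstract describes MAF-Net as a filter that masks unsafe actions. The record does not describe MAF-Net's architecture, so the sketch below only illustrates generic action masking under stated assumptions and is not the authors' method. It assumes a hypothetical discrete set of candidate heading changes, a hypothetical per-candidate `unsafe_prob` score (standing in for a filter network's output), and a hypothetical threshold of 0.5. Candidates above the threshold are masked, and the policy's continuous proposal is projected onto the nearest remaining safe candidate.

```python
from typing import Sequence


def mask_actions(unsafe_prob: Sequence[float], threshold: float = 0.5) -> list[bool]:
    """Return True for each candidate considered safe (hypothetical hard-threshold rule)."""
    return [p <= threshold for p in unsafe_prob]


def filter_action(proposed: float, candidates: Sequence[float],
                  unsafe_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Project a continuous proposed action onto the nearest unmasked candidate.

    If every candidate is masked, fall back to the candidate with the lowest
    predicted risk. All names and the thresholding rule are illustrative assumptions.
    """
    if not candidates or len(candidates) != len(unsafe_prob):
        raise ValueError("candidates and unsafe_prob must be non-empty and of equal length")
    safe_idx = [i for i, ok in enumerate(mask_actions(unsafe_prob, threshold)) if ok]
    if not safe_idx:
        # No candidate passes the filter: pick the least risky one.
        best = min(range(len(candidates)), key=lambda i: unsafe_prob[i])
        return candidates[best]
    # Among safe candidates, stay as close as possible to the policy's intent.
    best = min(safe_idx, key=lambda i: abs(candidates[i] - proposed))
    return candidates[best]


# Hypothetical heading changes in degrees and their predicted risk scores.
candidates = [-30.0, -15.0, 0.0, 15.0, 30.0]
risk = [0.1, 0.2, 0.9, 0.7, 0.05]
print(filter_action(5.0, candidates, risk))  # -15.0: nearest candidate not masked
```

A hard threshold like this is not differentiable; a filter that is trained jointly with the policy, as the abstract's "differentiable safety filter" suggests, would need a soft mask or penalty term instead.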

Type of contribution: paper
Keywords: Multi-agent system; planning under uncertainty; decentralized decision making; deep reinforcement learning; action masking
Content language: English
Conference name: 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), 25-29 May 2026
Conference year: 2026
Proceedings title: AAMAS 2026 Conference Proceedings
Publication date: 2026
First page: 1928
Last page: 1937
Alternative URL: https://ifaamas.org/Proceedings/aamas2026/forms/contents.htm
Full text: open
Appears in item types: 02 - Conference contribution

Files in this item:

File	Size	Format
Pang et al-2026-AAMAS-VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	1.53 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/611324
