Pang, B., Zhang, M., Hu, X., Pham, D., Alam, S., Lulli, G. (2026). Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning. In AAMAS 2026 Conference Proceedings (pp. 1928-1937).
Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning
Lulli, G
2026
Abstract
Multi-agent trajectory planning in safety-critical systems needs to ensure safety while scaling to many agents. Sampling and optimization methods often adapt slowly and scale poorly. Reinforcement learning can improve adaptability, but it often violates safety constraints and suffers from sample inefficiency. This work proposes IDDPG-MAF, which integrates Independent Deep Deterministic Policy Gradient (IDDPG) with a pre-trained Multi-head Action Filter Network (MAF-Net). We first cast the problem as a constrained mixed-integer nonlinear program and then reformulate it as a constrained decentralized Markov decision process for real-time adaptability and coordination. IDDPG enables scalable learning, while MAF-Net acts as a differentiable safety filter that masks unsafe actions and penalizes suboptimal behaviors. The IDDPG-MAF method is adapted to a complex multi-aircraft trajectory planning task under dynamic thunderstorm cells. Experimental results show that IDDPG-MAF achieves over 99% safe separation (vs. 82% for the state-of-the-art baseline), 95.5% task success even under moderate uncertainty, and scales safely to 45 aircraft in a compact spatiotemporal window, effectively doubling the maximum capacity of current operations.

| File | Size | Format |
|---|---|---|
| Pang et al-2026-AAMAS-VoR.pdf (open access; attachment type: Publisher's Version (Version of Record, VoR); license: Creative Commons) | 1.53 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


