ISFORS-MIX: Multi-agent reinforcement learning with Importance-Sampling-Free Off-policy learning and Regularized-Softmax Mixing network
[ 1 ] Instytut Robotyki i Inteligencji Maszynowej, Wydział Automatyki, Robotyki i Elektrotechniki, Politechnika Poznańska | [ P ] employee
[2.2] Automation, electronics, electrical engineering and space technologies
2025
scientific article
english
- Multi-agent reinforcement learning
- Importance-Sampling-Free Off-policy
- Regularized-Softmax
- StarCraft Multi-Agent Challenge
- WarGame Challenge
EN In multi-agent reinforcement learning (MARL), the low quality of the value function and the estimation bias and variance introduced by value function decomposition (VFD) are critical challenges that can significantly impact performance in cooperative and competitive tasks. These issues often lead to suboptimal policies and unstable learning, which hinders the practical application of MARL in complex environments. This paper proposes a novel method called Importance-Sampling-Free Off-policy learning and Regularized-Softmax Mixing network (ISFORS-MIX) to address these problems. By enhancing value function quality and modifying the loss function, ISFORS-MIX integrates Importance-Sampling-Free Off-policy (ISFO) learning and Regularized-Softmax (RS) techniques to improve the performance of QMIX. ISFORS-MIX is verified on the StarCraft Multi-Agent Challenge (SMAC) and WarGame Challenge (WGC) benchmarks. Results show that ISFORS-MIX outperforms five baseline algorithms, including QMIX and QTRAN, and improves the quality of multi-agent cooperation and confrontation. Moreover, agents trained with ISFORS-MIX make faster and more stable decisions when completing the given tasks.
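As a rough illustration only (not the paper's actual implementation), the sketch below shows the general idea of a softmax-weighted operator replacing the max in a QMIX-style TD target, which is one common way a "softmax mixing" modification of the loss can reduce overestimation bias. The function names, the temperature parameter beta, and the placement of any regularization are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def softmax_value(q_tot_next: torch.Tensor, beta: float = 5.0) -> torch.Tensor:
    """Softmax-weighted value estimate over candidate joint actions.

    q_tot_next: (batch, n_joint_actions) mixed Q-values for the next state.
    beta: inverse temperature; as beta grows, this approaches the max operator.
    """
    weights = torch.softmax(beta * q_tot_next, dim=-1)
    return (weights * q_tot_next).sum(dim=-1)

def td_loss(q_tot, reward, q_tot_next, gamma: float = 0.99, beta: float = 5.0):
    # Replace the hard max in the Bellman target with the softmax estimate;
    # a regularizer on the weights or temperature (the "Regularized" part)
    # would be added here in a full implementation -- assumed, not shown.
    target = reward + gamma * softmax_value(q_tot_next, beta)
    return F.mse_loss(q_tot, target.detach())
```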
112881-1 - 112881-14
Article number: 112881
200
7.2 [List 2023]