Article

Title

ISFORS-MIX: Multi-agent reinforcement learning with Importance-Sampling-Free Off-policy learning and Regularized-Softmax Mixing network

Authors

[ 1 ] Institute of Robotics and Machine Intelligence, Faculty of Automatic Control, Robotics and Electrical Engineering, Poznań University of Technology | [ P ] employee

Scientific discipline (Law 2.0)

[2.2] Automation, electronics, electrical engineering and space technologies

Year of publication

2025

Published in

Knowledge-Based Systems

Journal year: 2025 | Journal volume: vol. 309

Article type

scientific article

Publication language

English

Keywords
EN
  • Multi-agent reinforcement learning
  • Importance-Sampling-Free Off-policy
  • Regularized-Softmax
  • StarCraft Multi-Agent Challenge
  • WarGame Challenge
Abstract

EN In multi-agent reinforcement learning (MARL), the low quality of the value function and the estimation bias and variance introduced by value function decomposition (VFD) are critical challenges that can significantly degrade performance on cooperative and competitive tasks. These issues often lead to suboptimal policies and unstable learning, which hinders the practical application of MARL in complex environments. This paper proposes a novel method, the Importance-Sampling-Free Off-policy learning and Regularized-Softmax Mixing network (ISFORS-MIX), to address these problems. By enhancing the value function quality and modifying the loss function, ISFORS-MIX integrates Importance-Sampling-Free Off-policy (ISFO) learning and Regularized-Softmax (RS) techniques to improve the performance of QMIX. ISFORS-MIX is verified on the StarCraft Multi-Agent Challenge (SMAC) and WarGame Challenge (WGC) benchmarks. Results show that ISFORS-MIX outperforms five baseline algorithms, including QMIX and QTRAN, and improves the quality of multi-agent cooperation and confrontation. Moreover, agents trained with ISFORS-MIX make decisions faster and more stably when completing the given tasks.
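
The abstract does not spell out how the Regularized-Softmax component is formulated, but the general idea of softmax-based regularization of value targets can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration only: the function names, the inverse-temperature parameter beta, and the use of a Boltzmann-weighted TD target in place of the hard max are chosen for exposition and are not taken from the paper; the actual ISFO and RS formulations are given in the linked article.

# Minimal sketch (assumptions, not the paper's implementation): a softmax
# (Boltzmann) operator used instead of the hard max when forming a one-step
# TD target, which is one common way to regularize value estimates in
# QMIX-style value-decomposition methods.
import numpy as np

def softmax_operator(q_values: np.ndarray, beta: float = 5.0) -> float:
    """Softmax-weighted value of a vector of joint-action Q-values.

    As beta -> infinity this approaches max(q_values) (the standard target);
    a finite beta smooths the target, which can reduce overestimation bias.
    """
    z = beta * (q_values - q_values.max())   # shift for numerical stability
    w = np.exp(z) / np.exp(z).sum()          # Boltzmann weights over joint actions
    return float(np.dot(w, q_values))

def td_target(reward: float, next_q_values: np.ndarray,
              gamma: float = 0.99, beta: float = 5.0) -> float:
    """One-step TD target using the softmax operator instead of the max."""
    return reward + gamma * softmax_operator(next_q_values, beta)

if __name__ == "__main__":
    q_next = np.array([1.0, 2.5, 2.4, 0.3])  # toy joint-action values at s'
    print("hard-max target:", 0.1 + 0.99 * q_next.max())
    print("softmax target :", td_target(0.1, q_next))

In a QMIX-style pipeline such a smoothed target would be fed to the mixing-network loss; how ISFORS-MIX combines this with importance-sampling-free off-policy corrections is described in the full text.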

Pages (from - to)

112881-1 - 112881-14

DOI

10.1016/j.knosys.2024.112881

URL

https://www.sciencedirect.com/science/article/pii/S0950705124015156

Comments

Article number: 112881

Ministry points / journal

200

Impact Factor

7.2 [2023 list]
