A Multi-party Asymmetric Self-play Algorithm and Its Application in Multi-USV Adversarial Game Simulations
[ 1 ] Instytut Robotyki i Inteligencji Maszynowej, Wydział Automatyki, Robotyki i Elektrotechniki, Politechnika Poznańska | [ P ] employee
[2.2] Automation, electronics, electrical engineering and space technologies
2024
chapter in a scientific monograph / conference paper
English
- unmanned surface vehicle
- deep reinforcement learning
- multiparty asymmetric self-play algorithm
EN To address the limitation that existing combinations of self-play (SP) and deep reinforcement learning (DRL) involve only two-party games and restrict the policy learning of each party, a multi-party asymmetric self-play algorithm (MASP) is proposed. First, the ELO scoring system is improved to account for every party of unmanned surface vehicle (USV) clusters and for the imbalance in the number of USV clusters, so that all USV clusters exchange strategies with equal frequency during the confrontation. Second, the algorithm keeps the combat abilities of the USV clusters balanced: parties may differ in strength, but the gap between them is kept from growing too wide. In addition, parameters are set dynamically to reduce the policy update frequency of the stronger party. Experimental results show that, in both a simple 2v2 adversarial game scenario and a three-party warship escort scenario, MASP enables the USV clusters to learn more effective policies, finish games in less time, and obtain higher rewards and ELO scores.
142 - 146
20
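
The abstract does not give the modified ELO formula or the rule for slowing the stronger party's updates, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard two-player Elo expected-score formula applied pairwise across all parties and averaged over opponents, and it does not model the imbalance in cluster numbers that the paper accounts for. The names `multiparty_elo_update`, `update_interval`, `k`, and `gap_per_step` are hypothetical.

```python
from itertools import combinations


def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score of a party rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def multiparty_elo_update(ratings, outcomes, k=32.0):
    """Pairwise Elo update for three or more parties (illustrative assumption).

    ratings:  dict mapping party name -> current rating
    outcomes: dict mapping (a, b) -> score of a against b in [0, 1],
              with one entry per unordered pair, keyed in the dict's order
    """
    n = len(ratings)
    deltas = {party: 0.0 for party in ratings}
    for a, b in combinations(ratings, 2):
        score_a = outcomes[(a, b)]
        expect_a = expected_score(ratings[a], ratings[b])
        deltas[a] += score_a - expect_a
        deltas[b] += (1.0 - score_a) - (1.0 - expect_a)
    # Average over the n - 1 opponents so the step size does not grow with
    # the number of parties; the total rating is conserved.
    return {p: ratings[p] + k * deltas[p] / (n - 1) for p in ratings}


def update_interval(rating, ratings, base_interval=1, gap_per_step=100.0):
    """Hypothetical rule: a party waits one extra round between policy
    updates for every gap_per_step rating points it leads the weakest party."""
    lead = rating - min(ratings.values())
    return base_interval + int(max(lead, 0.0) // gap_per_step)


ratings = {"red": 1500.0, "blue": 1500.0, "green": 1500.0}
outcomes = {("red", "blue"): 1.0, ("red", "green"): 0.5, ("blue", "green"): 0.0}
new_ratings = multiparty_elo_update(ratings, outcomes)
```

In this sketch a party that pulls ahead in rating gets a longer `update_interval`, which is one simple way to let weaker parties catch up; the paper's actual parameter schedule may differ.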