A Multi-party Asymmetric Self-play Algorithm and Its Application in Multi-USV Adversarial Game Simulations

Jinjun Rao; Cong Wang; Mei Liu; Jingtao Lei; Wojciech Giernacki

doi:10.1145/3696687.3696712

Scientific Information System of Poznan University of Technology


Chapter


Title

A Multi-party Asymmetric Self-play Algorithm and Its Application in Multi-USV Adversarial Game Simulations

Authors

Jinjun Rao
Cong Wang
Mei Liu
Jingtao Lei
Wojciech Giernacki (WARiE) [1][2.2][P]

[1] Institute of Robotics and Machine Intelligence, Faculty of Control, Robotics and Electrical Engineering, Poznan University of Technology | [P] employee

Scientific discipline (Law 2.0)

[2.2] Automation, electronics, electrical engineering and space technologies

Year of publication

2024

Chapter type

chapter in a scientific monograph / conference paper

Publication language

English

Keywords

EN

unmanned surface vehicle
deep reinforcement learning
multiparty asymmetric self-play algorithm

Abstract

EN To address the problem that existing combinations of self-play (SP) and deep reinforcement learning (DRL) involve only two-party games and limit the policy learning of each party, a multi-party asymmetric self-play algorithm (MASP) is proposed. First, the ELO scoring system is improved so that it accounts for every party of unmanned surface vehicle (USV) clusters and for the imbalance in the number of USV clusters, so that all USV clusters exchange strategies with equal frequency during the confrontation. Second, the algorithm keeps the combat ability of the USV clusters balanced: the parties may differ in strength, but the gap between them does not become too wide. In addition, parameters are set dynamically to reduce the policy update frequency of the stronger party. Experimental results show that, in both a simple 2v2 adversarial game scenario and a three-party warship escort scenario, MASP enables the USV clusters to learn more effective policies, finish games in less time, and obtain higher rewards and ELO scores.
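For orientation, the classic two-player Elo update that an "improved ELO scoring" scheme would start from can be sketched as below. This is the standard textbook rule only; it is not the multi-party variant proposed in the paper, whose details are not given in this record. The function names and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the updated (rating_a, rating_b) after one two-player game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k is the usual K-factor; 32 is a common default, not a value from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated parties; A wins and gains k/2 = 16 points.
    print(elo_update(1500.0, 1500.0, score_a=1.0))  # (1516.0, 1484.0)
```

A multi-party extension would have to decide how to pair or aggregate more than two ratings and how to handle unequal team sizes; the sketch deliberately leaves those choices out.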

Pages (from-to)

142 - 146

DOI

10.1145/3696687.3696712

URL

https://dl.acm.org/doi/10.1145/3696687.3696712

Book

MLPRAE '24: Proceedings of the International Conference on Machine Learning, Pattern Recognition and Automation Engineering

Presented at

The International Conference on Machine Learning, Pattern Recognition and Automation Engineering, MLPRAE 2024, 7-9 August 2024, Singapore, Singapore

Ministry score / chapter

20

