A modified random network distillation algorithm and its application in USVs naval battle simulation

Jinjun Rao; Xiaoqiang Xu; Haoran Bian; Jinbo Chen; Yaxing Wang; Jingtao Lei; Wojciech Giernacki; Liu Mei

doi:10.1016/j.oceaneng.2022.112147

Scientific Information System of Poznan University of Technology

Article

Title

A modified random network distillation algorithm and its application in USVs naval battle simulation

Authors

Jinjun Rao
Xiaoqiang Xu
Haoran Bian
Jinbo Chen
Yaxing Wang
Jingtao Lei
Wojciech Giernacki (WARiE) ^{[ 1 ][ 2.2 ][ P ]}
Liu Mei

^{[ 1 ]} Institute of Robotics and Machine Intelligence, Faculty of Control, Robotics and Electrical Engineering, Poznan University of Technology | ^{[ P ]} employee

Scientific discipline (Act 2.0)

[2.2] Automation, electronics, electrical engineering and space technologies

Year of publication

2022

Published in

Ocean Engineering

Year: 2022 | Volume: 261

Article type

scientific article

Publication language

English

Keywords

EN

unmanned surface vessel
reinforcement learning
sparse reward
naval battle simulation

Abstract

EN Unmanned surface vessel (USV) operations will change the future form of maritime wars profoundly, and one of the critical factors for victory is the cluster intelligence of USVs. Training USVs for combat using reinforcement learning (RL) is an important research direction. Sparse reward as one of the complex problems in reinforcement learning causes sluggish and inefficient USV training. Therefore, a modified random network distillation (MRND) algorithm is proposed for the sparse reward problem. This algorithm measures the weight of internal rewards by calculating the variance of the number of training steps in each training episode to adjust internal and external rewards dynamically. Through the self-play iterative training method, our algorithm, in conjunction with the classical proximal policy optimization (PPO) algorithm, can improve USV cluster intelligence rapidly. Based on USV cluster combat training environments constructed on Unity3D and ML-Agent Toolkits platform, three types of USV cluster combat simulations are conducted to validate the algorithm, including a target pursuit combat simulation, a USV cluster maritime combat simulation, and a USV cluster base offense and defense combat simulation. Simulation experiments have shown that USV clusters trained with the MRND algorithm converge quicker, acquire more rewards in fewer steps, and exhibit a higher level of intelligence than the USV cluster trained by the comparison algorithms.
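The abstract describes the reward weighting only in words, so the following is a minimal sketch of one way such a scheme could look, not the authors' MRND implementation. The class name `VarianceWeightedMixer`, the parameters `window`, `scale`, and `max_weight`, and the specific mapping from variance to weight (higher variance of episode lengths gives a larger intrinsic weight) are all assumptions made for illustration. The intrinsic term itself follows the usual random network distillation idea of a prediction error between a fixed random target and a trained predictor.

```python
import statistics
from collections import deque


def rnd_intrinsic_reward(target_features, predictor_features):
    """Usual RND novelty signal: mean squared error between the outputs of a
    fixed random target network and a trained predictor network."""
    errors = [(t - p) ** 2 for t, p in zip(target_features, predictor_features)]
    return sum(errors) / len(errors)


class VarianceWeightedMixer:
    """Hypothetical mixer (not from the paper): weights the intrinsic reward
    by the variance of the number of steps in recent training episodes."""

    def __init__(self, window=20, scale=100.0, max_weight=1.0):
        # Sliding window of recent episode lengths, measured in steps.
        self.episode_lengths = deque(maxlen=window)
        self.scale = scale            # assumed softness constant
        self.max_weight = max_weight  # assumed upper bound on the weight

    def end_episode(self, num_steps):
        """Record how many steps the finished episode took."""
        self.episode_lengths.append(float(num_steps))

    def intrinsic_weight(self):
        """Assumed mapping: var / (var + scale), scaled by max_weight.
        With fewer than two episodes recorded, explore at full weight."""
        if len(self.episode_lengths) < 2:
            return self.max_weight
        var = statistics.pvariance(self.episode_lengths)
        return self.max_weight * var / (var + self.scale)

    def combine(self, extrinsic, intrinsic):
        """Total reward that would be passed to the policy optimizer (e.g. PPO)."""
        return extrinsic + self.intrinsic_weight() * intrinsic
```

Under these assumptions, stable episode lengths drive the intrinsic weight toward zero so the external reward dominates, while erratic episode lengths keep the exploration bonus active. Whether the paper uses this direction or this functional form is not stated in the abstract.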

Date made available online

7 August 2022

Pages (from-to)

112147-1 - 112147-15

DOI

10.1016/j.oceaneng.2022.112147

URL

https://doi.org/10.1016/j.oceaneng.2022.112147

Notes

Article number: 112147

Ministry score / journal

140