Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation

Wojciech Jaśkowski; Marcin Szubert

doi:10.1109/TCIAIG.2015.2464711

System Informacji Naukowej Politechniki Poznańskiej

Title

Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation

Authors

Wojciech Jaśkowski (WI) [1][P]
Marcin Szubert (WI) [1][P]

[1] Institute of Computing Science, Faculty of Computing, Poznan University of Technology | [P] employee

Year of publication

2016

Published in

IEEE Transactions on Computational Intelligence and AI in Games

Year: 2016 | Volume: 8 | Issue: 4

Article type

research article

Publication language

English

Keywords

competitive coevolution
CMA-ES
n-tuple system
reinforcement learning
large parameter optimization
continuous optimization
numerical optimization
reversi

Abstract

One weakness of coevolutionary algorithms observed in knowledge-free learning of strategies for adversarial games has been their poor scalability with respect to the number of parameters to learn. In this paper, we investigate to what extent this problem can be mitigated by using Covariance Matrix Adaptation Evolution Strategy, a powerful continuous optimization algorithm. In particular, we employ this algorithm in a competitive coevolutionary setup, denoting this setting as Co-CMA-ES. We apply it to learn position evaluation functions for the game of Othello and find that, in contrast to plain (co)evolution strategies, Co-CMA-ES learns faster, finds superior game-playing strategies and scales better. Its advantages become especially apparent for large parameter spaces of tens of hundreds of dimensions. For Othello, combining Co-CMA-ES with experimentally tuned derandomized systematic n-tuple networks significantly improved the current state of the art. Our best strategy outperforms all the other Othello 1-ply players published to date by a large margin regardless of whether the round-robin tournament among them involves a fixed set of initial positions or the standard initial position but randomized opponents. These results show a large potential of CMA-ES-driven coevolution, which could presumably also be exploited in other games.

Pages (from-to)

389-401

DOI

10.1109/TCIAIG.2015.2464711

URL

https://ieeexplore.ieee.org/document/7180338

Ministry score / journal

30

Impact Factor

1.113

System developed by Poznan University of Technology and Poznan Supercomputing and Networking Center
