Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation

Wojciech Jaśkowski; Marcin Szubert

doi:10.1109/TCIAIG.2015.2464711

Scientific Information System of the Poznań University of Technology

Article

Title

Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation

Authors

Wojciech Jaśkowski (WI) ^{[ 1 ][ P ]}
Marcin Szubert (WI) ^{[ 1 ][ P ]}

^{[ 1 ]} Institute of Computing Science, Faculty of Computing, Poznań University of Technology | ^{[ P ]} employee

Year of publication

2016

Published in

IEEE Transactions on Computational Intelligence and AI in Games

Journal year: 2016 | Journal volume: vol. 8 | Journal number: no. 4

Article type

scientific article

Publication language

English

Keywords

competitive coevolution
CMA-ES
n-tuple system
reinforcement learning
large parameter optimization
continuous optimization
numerical optimization
reversi

Abstract

One weakness of coevolutionary algorithms observed in knowledge-free learning of strategies for adversarial games has been their poor scalability with respect to the number of parameters to learn. In this paper, we investigate to what extent this problem can be mitigated by using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a powerful continuous optimization algorithm. In particular, we employ this algorithm in a competitive coevolutionary setup, which we denote Co-CMA-ES. We apply it to learn position evaluation functions for the game of Othello and find that, in contrast to plain (co)evolution strategies, Co-CMA-ES learns faster, finds superior game-playing strategies, and scales better. Its advantages are most apparent for large parameter spaces of tens of hundreds of dimensions. For Othello, combining Co-CMA-ES with experimentally tuned derandomized systematic n-tuple networks significantly improved the current state of the art. Our best strategy outperforms all the other Othello 1-ply players published to date by a large margin, regardless of whether the round-robin tournament among them involves a fixed set of initial positions or the standard initial position but randomized opponents. These results show the large potential of CMA-ES-driven coevolution, which could presumably be exploited in other games as well.
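
For readers unfamiliar with n-tuple networks, the following is a minimal sketch of how such a network can score a board position. It is an illustration only and does not attempt to reproduce the paper's tuple layout, board size, or training procedure. The class name NTupleNetwork, the cell encoding (0 = empty, 1 = black, 2 = white), and the flat weight layout are all assumptions made for this example. The point it shows is that the evaluation is a sum of lookup-table entries, so the whole evaluator reduces to one real-valued weight vector, which is the kind of parameter space a continuous optimizer such as CMA-ES searches.

```python
from typing import List, Sequence

# Cell states; this particular encoding is an assumption of the sketch.
EMPTY, BLACK, WHITE = 0, 1, 2


class NTupleNetwork:
    """Hypothetical minimal n-tuple network: one lookup table per tuple."""

    def __init__(self, tuples: Sequence[Sequence[int]]) -> None:
        # Each tuple is a sequence of board cell indices.
        self.tuples = [tuple(t) for t in tuples]
        # A tuple of n cells with 3 states each needs 3**n weights.
        self.tables: List[List[float]] = [
            [0.0] * (3 ** len(t)) for t in self.tuples
        ]

    def num_weights(self) -> int:
        """Total number of real-valued parameters in the network."""
        return sum(len(table) for table in self.tables)

    def set_weights(self, flat: Sequence[float]) -> None:
        """Load all lookup tables from one flat weight vector."""
        if len(flat) != self.num_weights():
            raise ValueError("weight vector has the wrong length")
        pos = 0
        for table in self.tables:
            table[:] = flat[pos:pos + len(table)]
            pos += len(table)

    @staticmethod
    def _index(board: Sequence[int], cells: Sequence[int]) -> int:
        """Read the cells as digits of a base-3 number."""
        idx = 0
        for cell in cells:
            idx = idx * 3 + board[cell]
        return idx

    def evaluate(self, board: Sequence[int]) -> float:
        """Position value: sum of the table entries selected by each tuple."""
        return sum(
            table[self._index(board, cells)]
            for cells, table in zip(self.tuples, self.tables)
        )
```

As a usage example, a network with two 2-tuples over a three-cell toy board has 2 * 3**2 = 18 weights; after calling set_weights with a vector of that length, evaluate(board) returns the sum of one entry from each table.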

Pages (from - to)

389 - 401

DOI

10.1109/TCIAIG.2015.2464711

URL

https://ieeexplore.ieee.org/document/7180338

Ministry points / journal

30

Impact Factor

1.113
