ORIGINAL RESEARCH article

Front. Robot. AI
Sec. Multi-Robot Systems
Volume 11 - 2024 | doi: 10.3389/frobt.2024.1394209

MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Provisionally accepted
  • Aalto University, Otakaari, Finland

The final, formatted version of the article will be published soon.

    This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO. First, we use a recurrent layer in the critic's network architecture and propose a new framework that trains this recurrent layer on a meta-trajectory assembled from the agents' individual trajectories. This allows the network to learn the cooperation and dynamics of interactions between agents, and also to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions, controlling the level of cooperation between agents through a single parameter. This control parameter makes the method suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, as well as single-agent methods with parameters shared between agents, such as IMPALA and Ape-X. The results show that MACRPO outperforms the compared algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
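    The following Python sketch illustrates the two mechanisms described above under stated assumptions; it is not the authors' implementation (see the linked repository for that). It assumes a meta-trajectory built by interleaving the agents' transitions in time order, and a GAE-style advantage estimator in which a hypothetical parameter psi weights the other agents' rewards and value estimates.

    import numpy as np

    def build_meta_trajectory(agent_trajectories):
        """Interleave per-agent transitions in time order into a single
        meta-trajectory (an assumed construction, for illustration only)."""
        merged = [step for traj in agent_trajectories for step in traj]
        return sorted(merged, key=lambda step: step["t"])

    def cooperative_advantages(rewards, values, psi=0.5, gamma=0.99, lam=0.95):
        """GAE-style advantages in which each agent's TD error also folds in
        the other agents' rewards and value estimates, weighted by psi.

        rewards: array of shape (T, n_agents)
        values:  array of shape (T + 1, n_agents); the last row is the bootstrap
        psi:     cooperation level (hypothetical name); psi = 0 recovers
                 independent per-agent advantages
        """
        T, n = rewards.shape
        # own signal + psi * sum of the other agents' signals
        mix = lambda x: x + psi * (x.sum(axis=1, keepdims=True) - x)
        r_mix, v_mix = mix(rewards), mix(values)
        deltas = r_mix + gamma * v_mix[1:] - v_mix[:-1]
        adv = np.zeros_like(deltas)
        running = np.zeros(n)
        for t in reversed(range(T)):
            running = deltas[t] + gamma * lam * running
            adv[t] = running
        return adv

    Setting psi between 0 and 1 interpolates between fully independent and fully shared credit assignment, which matches the abstract's description of a parameter controlling the level of cooperation between agents.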

    Keywords: cooperative, policy, multi-agent, information sharing, interaction, reinforcement learning

    Received: 01 Jul 2024; Accepted: 25 Nov 2024.

    Copyright: © 2024 Kargar and Kyrki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Eshagh Kargar, Aalto University, Otakaari, Finland

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.