ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. Machine Learning and Artificial Intelligence

Volume 8 - 2025 | doi: 10.3389/frai.2025.1511712

This article is part of the Research Topic Embodied Artificial Intelligence: Advancing the Frontiers of Robot Sensing and Interaction View all articles

AMaze: An intuitive benchmark generator for fast prototyping of generalizable agents

Provisionally accepted
Kevin Godin-Dubois 1*, Karine Miras 1, Anna V. Kononova 2
  • 1 VU Amsterdam, Amsterdam, Netherlands
  • 2 Leiden University, Leiden, Netherlands

The final, formatted version of the article will be published soon.

    Traditional approaches to training agents have generally involved a single, deterministic environment of minimal complexity to solve various tasks such as robot locomotion or computer vision. However, agents trained in static environments lack generalization capabilities, limiting their potential in broader scenarios. Thus, recent benchmarks frequently rely on multiple environments, for instance, by providing stochastic noise, simple permutations, or altogether different settings. In practice, such collections result mainly from costly human-designed processes or the liberal use of random number generators. In this work, we introduce AMaze, a novel benchmark generator in which embodied agents must navigate a maze by interpreting visual signs of arbitrary complexity and deceptiveness. This generator promotes human interaction through the easy generation of feature-specific mazes and an intuitive understanding of the resulting agents' strategies. As a proof of concept, we demonstrate the capabilities of the generator in a simple, fully discrete case with limited deceptiveness. Agents were trained under three different regimes (one-shot, scaffolding, interactive), and the results showed that the latter two outperform direct training in terms of generalization capabilities. Indeed, depending on the combination of generalization metric, training regime, and algorithm, the median gain ranged from 50% to 100%, and maximal performance was achieved through interactive training, thereby demonstrating the benefits of a controllable human-in-the-loop benchmark generator.

    Keywords: benchmark, human-in-the-loop, generalization, mazes, reinforcement learning

    Received: 15 Oct 2024; Accepted: 26 Feb 2025.

    Copyright: © 2025 Godin-Dubois, Miras and Kononova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Kevin Godin-Dubois, VU Amsterdam, Amsterdam, Netherlands

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
