Autonomous robotic exploration with simultaneous environment and traversability models learning

Prágr, Miloš; Bayer, Jan; Faigl, Jan

doi:10.3389/frobt.2022.910113

ORIGINAL RESEARCH article

Front. Robot. AI, 05 October 2022

Sec. Field Robotics

Volume 9 - 2022 | https://doi.org/10.3389/frobt.2022.910113

This article is part of the Research TopicBio-Inspired Legged Robotics: Design, Sensing and ControlView all 6 articles

Autonomous robotic exploration with simultaneous environment and traversability models learning

Miloš Prágr*

Jan Bayer

Jan Faigl

Computational Robotics Laboratory, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czechia

In this study, we address generalized autonomous mobile robot exploration of unknown environments where a robotic agent learns a traversability model and builds a spatial model of the environment. The agent can benefit from the model learned online in distinguishing what terrains are easy to traverse and which should be avoided. The proposed solution enables the learning of multiple traversability models, each associated with a particular locomotion gait, a walking pattern of a multi-legged walking robot. We propose to address the simultaneous learning of the environment and traversability models by a decoupled approach. Thus, navigation waypoints are generated using the current spatial and traversability models to gain the information necessary to improve the particular model during the robot’s motion in the environment. From the set of possible waypoints, the decision on where to navigate next is made based on the solution of the generalized traveling salesman problem that allows taking into account a planning horizon longer than a single myopic decision. The proposed approach has been verified in simulated scenarios and experimental deployments with a real hexapod walking robot with two locomotion gaits, suitable for different terrains. Based on the achieved results, the proposed method exploits the online learned traversability models and further supports the selection of the most appropriate locomotion gait for the particular terrain types.

1 Introduction

The presented online terrain learning approach is motivated by long-term missions where autonomous robots would improve their operational performance in navigating a priori unknown environments. Some difficult to traverse terrains, such as large rocks, can be identified as obstacles using an observed geometric model of the environment. However, areas which appear flat and thus easy to traverse may, in practice, be hard to traverse due to their terra-mechanical properties, as experienced by NASA’s Mars Rover Spirit stuck in soft sand (Brown and Webster, 2010). In the presented approach, individual terra-mechanical properties are assumed to be partially unknown, and we learn a black box model to assess the traversability in a particular environment from the terrain appearance (Prágr et al., 2018). Since the scope of the functional relation between the terrain appearance and traversability might be limited to a particular environment, we advocate that on long-term deployments and exploration missions, the terrain models are learned online incrementally (Prágr et al., 2019b) as a part of the mission (Prágr et al., 2019a). Hence, we focus on the exploration of the environment and its terra-mechanical properties represented as the traversal costs that characterize the difficulty of traversing the individual terrains, as visualized in Figure 1. In particular, we consider multi-legged walking robots that can traverse various terrains with different traversal costs (also depending on the particular locomotion gait used), which provide a representative case for demonstrating the benefits of traversability assessment learned online. Compared to the previous work, the presented approach addresses the different locomotion gaits of the robot and distinguishes individual terrain-gait traversal cost models. In addition, the proposed exploration strategy provides a non-myopic (Zlot and Stentz, 2006) solution that takes into account both the spatial exploration and learning of the traversal cost models.

FIGURE 1

FIGURE 1. (A) Hexapod walking robot (courtesy of Forouhar et al. (2021)) (B) and its deployment using the proposed approach. The visualized planned path is to visit determined exploration goals for the spatial (in blue) and traversal cost models (in red). The spatial exploration goals are located close to the boundary of the already explored part of the environment. The traversal cost exploration goals correspond to sites where the terrain traversal cost model can be improved. Since the cost model is already partially learned, the red-tinted turf is known to be hard to traverse, and thus the robot prefers the green-tinted pavement, which is relatively easy to traverse. The yellow-tinted terrain is yet to be experienced by the robot and thus carries the terrain learning goal indicated by the red waypoint. The not-yet-observed area is gray.

In the proposed approach, the impassable parts of the explored environment are determined by the geometric models using a grid-based elevation map (Bayer and Faigl, 2019). The individual terrain-gait traversal cost models are near-to-far predictors that infer the time to traverse over the traversable areas from their appearance and are learned using the robot’s previous experience accrued when traversing similar-appearing terrains using a particular gait. The traversal cost models comprise Gaussian process (GP) regressors (Rasmussen and Williams, 2006), which predict the traversal costs from the terrain appearance, and growing neural gas (GNG) (Fritzke, 1994) terrain type clustering schemes used to identify similar-appearing terrains. The geometric and traversal cost models are incrementally constructed while exploring the mission environment. The geometric model is continually built from the robot’s exteroception, whereas each traversal cost model accumulates the costs experienced by the robot when moving using the respective locomotion gait. During the deployment, each model continually provides a set of exploration goals to be visited to learn (improve) the model. For several possible goal locations, the exploration strategy is to determine a sequence of the navigational goals to be visited that is addressed as a solution of the Generalized Traveling Salesman Problem (GTSP) (Noon, 1988) to provide a non-myopic solution considering the so-called TSP distance cost (Faigl and Kulich, 2013).

The remainder of the article is organized as follows. In Section 2, we present an overview of the related approaches in mobile robot exploration and traversability assessment. Section 3 formally defines the studied problem of mobile robot exploration with a priori unknown terrain traversal cost assessment. The proposed exploration with online traversal cost learning is presented in Section 4. Section 5 reports on the performed experimental results in simulations and real-world experimental deployments with a multi-legged robot controlled by two motion gaits. In Section 6, we discuss the strong points and limitations of the proposed approach. Section 7 concludes the study.

2 State of the art

This section presents an overview of works related to the proposed approach. First, we focus on the traversability assessment approaches. Then we survey mobile robot exploration and environment modeling.

2.1 Mobile robot traversability

Two main questions emerge when reasoning about robot traversability over terrains. First, can the terrain be safely traversed, or should it be avoided? Second, if the terrain is passable, how does it compare to other terrains, i.e., is it easier and safer to traverse? Note that for the sake of clarity, we further denote the binary (true/false) traversability, which determines whether an area is an impassable obstacle or passable terrain, as terrain passability. In contrast, the relative comparison of the traversal difficulty over passable terrains is denoted as assessing the traversal cost. The term traversability is used to describe the notion in general, including both the passability and traversal cost. A review of mobile robot traversability assessment methods can be found in Papadakis (2013), and an overview of learning-based methods for ground robot navigation is in Guastella and Muscato (2021). Hence, we focus on works relevant to how traversability is approached in this study.

The passability discrimination can be directly incorporated in mapping in the form of occupancy cell grids (Moravec and Elfes, 1985), Gaussian mixtures (O’Meadhra et al., 2019), GP models (O’Callaghan et al., 2009), or Hilbert maps (Ramos and Ott, 2016). The distinction of terrain passability can be understood as an instance of terrain classification, where terrains are assigned individual classes, and each class carries presumed terra-mechanical properties. For example, some classes can be considered hard-to-traverse vegetation or obstacles (Bradley et al., 2015). In addition to terrain classification, terrains can be assigned continuous values describing some observed terrain property such as roughness (Krüsi et al., 2016; Belter et al., 2019), slope (Stelzer et al., 2012), or step height (Homberger et al., 2016; Wermelinger et al., 2016). For continuous measures, passability can be based on thresholding the value, as in Stelzer et al. (2012), where the passability is determined by individually thresholding terrain slope, roughness, and step height. Moreover, classes may correspond to a particular robot configuration, such as in Haddeler et al. (2020), where the authors classify terrains into modes of wheeled-legged locomotion.

In instances where the terra-mechanical properties are unknown and thus terrains’ appearance and geometry features are not sufficient to determine their traversability, the traversability can be based on the robot’s prior experience with similar terrains. The experience-based measures can be derived from the robot proprioception and described using stability (McGhee and Frank, 1968; Lin and Song, 1993), slippage (Gonzalez and Iagnemma, 2018), vibrations (Bekhti and Kobayashi, 2016), velocity, or energy consumption (Kottege et al., 2015). The experience-based approaches describe the traversal cost only over passable terrains since the traversal is needed to acquire the robot experience. An exception worth mentioning is haptic sensing to determine obstacle passability (Baleia et al., 2015), which, however, still relies on the direct interaction of the robot with the terrain.

Since the experience-based approaches use on-location robot experience, they are difficult to use directly in path planning where it is necessary to evaluate terrain traversability from a distance using only exteroceptive measurements. Near-to-far approaches pair traversability indicators that can be observed only near the robot (such as proprioception or dense short-range measurements) with terrain appearance and geometry that can be observed from farther distances and thus learn to predict traversability from the long-range measurements. Sofman et al. (2006) incrementally learned the relation between dense laser-based features characterizing ground unit traversability and overhead features that can be used to assess traversability from aerial images, whereas Bekhti and Kobayashi (2016) learned to predict vibration-based traversability from terrain texture. Quann et al. (2020) proposed an energy traversal cost regressor considering both terrain position and appearance. In addition, Mayuku et al. (2021) proposed a self-supervised labeling approach for a near-to-far scenario, where vibration-based traversal cost is inferred from image data, and the self-supervised data gathering is based on identified terrain classes.

Following the approaches in the literature, we assume that terrain is rigid, and it is possible to distinguish passable terrain and non-traversable obstacles from the terrain geometry using a step height similar to Stelzer et al. (2012), or Wermelinger et al. (2016). Hence, this study focuses on modeling the traversal cost over the determined passable terrains. Moreover, we are motivated by the online cost assessment in mobile robot exploration, where the computational requirements are crucial. Therefore, we avoid high fidelity models, which besides being costly to compute also rely on plan execution with high precision (such as deterministic foothold placement), which might not be available in practice. The traversal cost is thus learned as a black box near-to-far model that uses terrain appearance to predict the time to traverse over terrains. Since the scope of the relation between the terrain appearance and traversability might be limited to a particular environment, we incrementally learn the cost predictor by sampling the robot’s experience with traversing individual terrains. Similar to the classification in Belter et al. (2019), a color histogram is selected as the terrain appearance descriptor because it is simple to compute and the histograms are sufficiently descriptive to capture multi-colored terrains. Furthermore, we consider locomotion gaits of the employed hexapod walking robot that are suitable for different terrains. Thus, the passable terrain is a terrain traversable by at least one gait, and obstacles are terrain parts that none of the gaits can traverse. We propose a decoupled approach that predicts the traversal cost for each gait independently, and the robot then selects the most cost-efficient gait for each terrain.

Regarding the existing methods, the proposed approach is closest to Haddeler et al. (2020), where modes of the wheeled-legged robot are switched. In addition, the proposed approach is also close to the self-supervised, near-to-far traversability-learning approach proposed by Mayuku et al. (2021). In that regard, the primary contribution of the proposed approach is the integration of active traversability learning in mobile robot exploration, where the robot plans a non-myopic path to improve both the spatial and traversal cost models learned online during the deployment.

2.2 Mobile robot exploration and environment modeling

Mobile robot exploration is an active perception problem that concerns behaviors where the robot seeks to build a model of a priori unknown environment. The exploration entails the robot seeking areas that are in some capacity unknown to construct a map of the environment. The exploration thus inherently combines localization, navigation, and planning (Schultz et al., 1999) to decide where the robot should go next. Steering the robot navigation to not-yet-observed areas yields frontier-based exploration (Yamauchi, 1997), where the frontiers represent boundaries between the observed traversable area and the unknown space represented on an occupancy grid (Moravec and Elfes, 1985). Recently, in the octree-based environment model, frontiers are represented as mesh faces with few neighbors (Azpúrna et al., 2021).

Bourgault et al. (2002) and Makarenko et al. (2002) exploit the probabilistic representation on such an occupancy evidence grid and navigate to maximize the approximated occupancy information gain. Charrow et al. (2015) proposed to use Cauchy–Schwarz quadratic mutual information to speed up the information gain computation. In addition, approaches that rely on non-grid-based representation for navigation, such as meshes and topological maps, may retain cell or voxel grids to quantify the information gain (Dang et al., 2020).

In addition to mapping, robots also build models of environment-underlying phenomena that can be temperature models (Luo and Sycara, 2018) or spread of gas (Rhodes et al., 2020). The environment phenomenon can be considered spatial, and the goal is thus to learn the mapping from the position in the environment to the value of the phenomenon. Furthermore, a spatiotemporal model can be considered (Ma et al., 2018) that would require repeatedly visiting particular areas to build the temporal model, which might be needed for changing environments (Krajník et al., 2017).

Spatial-based modeling can be considered as informative path planning (Singh et al., 2007), where the goal is to find the most informative path through the environment (Hollinger and Sukhatme, 2014) subject to a particular constraint such as the robot energy budget (Binney and Sukhatme, 2012). Informative path planning approaches can be broadly divided into myopic and non-myopic methods. The myopic methods are greedy and plan only with regard to the next goal, whereas non-myopic methods plan with a longer horizon. For example, in the context of frontier-based mobile robot exploration, seeking the closest frontier is myopic, contrary to path planning to visit all the representatives of the frontiers that is non-myopic (Faigl et al., 2012).

Like seeking frontiers in spatial exploration, the explorer learning an underlying model must actively locate sites to sample novel information. Hence, GP regressors (Rasmussen and Williams, 2006) are particularly suited for active learning because it is relatively straightforward to identify uncertain regions where the model should be improved. GP prediction uncertainty is characterized by the differential entropy of the predicted normal distribution, leading to the characterization of information gained by observing individual areas. However, in practice, directly computing the information gained by possible observations is not feasible due to the number of possible actions, especially for a long planning horizon. Hence, various approximations and sampling strategies have been proposed.

Pasolli and Melgani (2011) proposed to either directly seek the most uncertain samples signified by the highest prediction variance or to select areas that are the most remote in the feature space given the GP hyper-parameters. In Viseras et al. (2019), the robot selects paths with high average entropy per sampling to tradeoff informativeness and the number of samplings. Martin and Corke (2014) proposed to set the mean function of a GP traversal cost regressors to zero, thus motivating a robot to traverse unknown areas where the predictions are close to the zero mean. The GP Upper Confidence Bound (GP-UCB) (Srinivas et al., 2010) is an exploration–exploitation method that combines seeking the most uncertain areas with improving the model around the highest value. It can be used when the learner is interested in finding extreme values of the modeled phenomenon, such as temperature (Luo and Sycara, 2018; Shi et al., 2020). In addition, a depth-first variant of the Monte Carlo Tree Search (MCTS) to select anytime informative paths can be employed to consider both differential entropy and upper confidence bound to model sampling informativeness (Guerrero et al., 2021).

Karolj et al. (2020) computed a path to the closest spatial frontier that visits all local sampling locations for a magnetism model by solving the Traveling Salesman Problem (TSP) over the respective goal locations. In localization in mapping, Ossenkopf et al. (2019) note that occupancy information gained at an unknown location holds little value and thus weight the occupancy gains by a pose uncertainty (Vallvé and Andrade-Cetto, 2015). Hence, the explorer must address how to combine the occupancy and pose uncertainties. In Bourgault et al. (2002) and Stachniss et al. (2005), the total exploration utility is a linear combination of the occupancy uncertainty and the robot localization uncertainty represented using the differential entropy based on its position distribution. In Carrillo et al. (2018), it is argued that combining Shannon’s discrete and differential entropies is neither practical nor sound because the differential entropy is neither invariant under a change of variable nor dimensionally correct. Therefore, both quantities may differ significantly in value. Consequently, Carrillo et al. (2018) proposed to use the localization uncertainty to weigh the Rényi entropy (Rényi, 1961) of the occupancy grid.

Based on the literature review on exploration approaches, we propose to generalize the previous work (Prágr et al., 2019a) toward a non-myopic approach. The therein proposed method combines active learning of traversal cost over terrains with spatial exploration using a greedy approach. The approximated spatial information gains and cost models are derived from Shannon’s discrete and differential entropies, respectively. Considering the reasoning of Carrillo et al. (2018), we avoid a direct combination of these two values in this study. In addition, we aim to build a modular system that supports the learning of models that range from the spatial map and cost predictors used in this study to temperature and pollution models. Hence, instead of creating a combined information gain utility function using the Rényi entropy, which is suitable for the combination of a map and robot’s localization model used by Carrillo et al. (2018), we elect to use a policy that combines the spatial exploration and cost learning goals (and goals reported by any additional model), similarly to the approach proposed by Karolj et al. (2020).

However, unlike the therein-built magnetism model, a spatial GP, we assume that the terrain traversal cost correlates with the terrain appearance. Therefore, the GP regressor infers the cost from the terrain feature descriptors instead of the terrain location. Consequently, rather than terrains nearby, sampling the cost to traverse an unknown terrain primarily affects the predictions over similarly appearing terrains close in the feature space. The affected terrains are determined using a terrain clustering scheme. Incremental growing neural gas (IGNG) (Prudent and Ennaji, 2005) is used to continually construct the terrain class structure, in which each class is assigned traversal cost and sampling reward (information gain) based on the GP’s predictions. As a result, we model the computation of the goal visit sequence as an instance of the Generalized TSP (GTSP) (Noon, 1988) (also called the Set TSP), which is a variant of the TSP where nodes are grouped into mutually exclusive and exhaustive sets. The problem is then to visit each set instead of visiting each node. In the context of the proposed exploration approach, the individual nodes correspond to possible sampling locations, and the sets are either terrain classes extracted from the cost prediction model or places where the robot can observe areas unknown to the spatial model.

The problem of mobile robot exploration with traversal cost learning is defined in the next section, whereas the strengths and weak points of the proposed approach are further discussed in Section 6.

3 Problem specification

The addressed exploration using an autonomous hexapod walking robot combines spatial exploration with active learning of terrain traversal cost models. The environment is modeled as a 2D grid $W \subset R^{2}$ with cells $ν \in W$ with size d_ν corresponding to the size of the robot foothold. The position of the robot p^robot is discretized as ν^robot within the grid that is at the center of the robot’s circular footprint with radius r_robot covering all the potential robot’s footholds, as shown in Figure 2. Any path ψ can be decomposed to a sequence of neighboring cells as follows:

\begin{aligned} ψ = (ν_{1}, ν_{2}, \dots, ν_{n}), \\ s.t. \\ \forall i \in 1, \dots, n : π (ν_{i}) = 1, \\ \forall i \in 1, \dots, n - 1 : ν_{i + 1} \in 8 n b (ν_{i}), \end{aligned} (1)

where n is the number of cells in the respective sequence, the function 8 nb(ν) lists the cells in the 8-neighborhood of ν, and π(ν) = 1 indicates that the cell ν is passable. In addition, the robot can use a discrete set of walking gaits $G$ , and it is assumed that the gait changes occur instantaneously at the particular grid cells $ν \in W$ .

FIGURE 2

FIGURE 2. Footprint around the robot position covers the cells with potential multi-legged walking robot footholds.

The robot desires to move through the environment as efficiently as possible with respect to (w.r.t.) the cost C. Therefore, it moves along the cheapest path between ν and ν′.

ψ^{*} (ν, ν^{'}) = {argmin}_{ψ \in Ψ (ν, ν^{'})} C (ψ), (2)

where Ψ(ν, ν′) is the space of all paths from ν to ν′. The cost C(ψ) of traversing ψ represents a generic path cost such as time to traverse or expected consumed energy; without the loss of generality, the time to traverse is the cost of choice in this study. It is assumed that the cost is additive, thus permitting to combine the costs of two consecutive path segments ψ_a and ψ_b into the cost of the combined path ψ_a ⊕ ψ_b as follows:

C (ψ_{a} \oplus ψ_{b}) = C (ψ_{a}) + C (ψ_{b}), (3)

where ⊕ denotes the concatenation of the paths. The cost of a path is decomposed to the sequence of costs to traverse from passable cell ν_a to its neighbor ν_b.

C (ψ) = \sum_{i = 1}^{n - 1} ‖ (ν_{i}, ν_{i + 1}) ‖ c (ν_{i}, ν_{i + 1}), (4)

where ‖(ν_a, ν_b)‖ is the Euclidean distance between the cells (i.e., either d_ν or $\sqrt{2} d_{ν}$ ), and c (ν_a, ν_b) is the per-meter cost of traversing from ν_a to ν_b.

In the spatial exploration, the robot builds the geometry model $P$ , which provides the cell passability assessment π(ν). It is assumed that the geometry is sufficient to distinguish the passable areas; hence, the passability model $P$ is constructed directly from the continually streamed exteroceptive measurements (observed point clouds z^pcd).

3.1 Traversal cost modeling

The traversal cost is assumed to be too complex to be assessed only from the terrain geometry. In this study, the task is to learn a traversal cost predictor $C$ that models the cost as a function of terrain appearance. The cost assessments are used in path planning w.r.t. (4). In addition, the cost model is also responsible for selecting the gaits suitable for the particular terrains traversed by the robot. Since the robot position is abstracted as the center of its circular footprint, the predictor’s per-meter-cost predictions are conservative estimates that take into account all the cells on the footprint. The cost predictor is learned online during the exploration from the robot experience, which comprises the cost z^c experienced by the robot when traversing terrain described by the terrain appearance descriptor ta using gait g.

The learned model is compared to the uninformed baseline that represents a robot that only explores the spatial map and does not learn the cost models and thus uses the optimistic flat cost model.

\hat{c} (ν_{a}, ν_{b}) = \frac{1}{v_{max}}, (5)

where v_max is the maximum robot velocity over all $g \in G$ . Notice that, in planning, the particular value of v_max is not relevant as long as it is positive because it only scales the total cost, thus not affecting the planning decisions. The baseline selects the gaits reactively, using the fast gait capable of reaching v_max by default and switching to slower yet rough-terrain-capable gaits when the robot gets stuck on the traversed terrain.

The proposed approach is evaluated in model scenarios as follows. First, the robot is set to explore the environment $W$ , and it incrementally learns the model $C$ . Then the learned and baseline models are used in navigating the robot between a set of benchmark coordinates in $W$ and the total cost C experienced by the robot (i.e., the time needed to move between the coordinates) using the particular model is considered to be the benchmark value.

4 Proposed system for active terrain learning in exploration

In this section, we describe the proposed system for active terrain learning and exploration, which is overviewed in Figure 3. During the exploration, which yields the spatial geometric passability model $P$ , the goal of the robot is also to learn the traversal cost model $C$ . The geometric passability model $P$ describes the shape of the environment and thus areas passable by the robot. The traversal cost model is decomposed into the set of models $C = C^{G} = {C^{g}}_{g \in G}$ , where each traversal cost model $C^{g}$ predicts the costs associated with traversing the passable terrain using the gait $g \in G$ . The respective cost predictors are Gaussian process (GP) regressors (Rasmussen and Williams, 2006), which use terrain appearance to infer the robot-experienced traversal cost accrued during the deployment. Each GP is coupled with the incremental growing neural gas (IGNG) (Prudent and Ennaji, 2005) that clusters similarly appearing terrains and hence identifies terrain types not yet visited by the robot. The exploration problem is modeled as an open-ended instance of the generalized traveling salesman problem (GTSP) (Noon, 1988), a variant of the TSP where the vertices are organized in disjoint sets, and each set is visited once. In this study, each set corresponds to an exploration or learning goal (a set of sampling sites) yielded by the spatial or cost model.

FIGURE 3

FIGURE 3. An overview of the proposed exploration system. The robot uses the RGB-D data to build the color elevation model of the environment in which it identifies the passable areas (Algorithm 2). The terrain appearance stored in the model is paired with the costs experienced by the robot to learn the traversal cost models for the individual locomotion gaits (Algorithm 5 and Algorithm 6). The cost predictions for the individual gaits and the terrain passability are used to plan the exploration path in a TSP sequence (Algorithm 1) over every goal reported by the geometric and cost models. The robot navigates to the first goal in the sequence (Algorithm 4).

In the rest of the section, we describe the exploration process. The symbols used in the description are listed in Table 1. First, we show how the GTSP is used to find the exploration path. Then we show the geometric environment model in detail and the related passability model $P$ , the traversal cost models $C^{g}$ , and their use to find the exploration goals.

TABLE 1

TABLE 1. Used symbols.

4.1 Exploration

The robot explores the passability model $P$ and learns the traversal cost models $C^{G}$ by visiting the exploration $Γ_{P}$ and cost learning ${Γ_{C}}^{G}$ goals, which are continually yielded by the respective models. Each goal $γ \in Γ_{P} \cup {Γ_{C}}^{G}$ is associated with a set of sites (cells) $γ = {ν_{i}}_{i = 0}^{| γ |}$ where the robot can improve its models by sampling the respective goal. The robot needs to visit one of the corresponding locations to sample the goal. Geometric model goals $γ \in Γ_{P}$ are located at singular sites γ = {ν}, where the robot can improve the spatial model by observing new areas. Each traversal cost model goal $γ \in {Γ_{C}}^{G}$ , where ${Γ_{C}}^{G} = \cup_{g \in G} {Γ_{C}}^{g}$ , is associated with a set of sites $γ = {ν_{i}}_{i = 0}^{| γ |}$ at which the robot can improve the model by experiencing novel gait-terrain costs. The areas covered by the individual goals in a given cost model are designed to be disjoint. Thus, sampling the traversal cost model at a site corresponding to the ${g o a l}^{2} {γ_{C}}^{g} \in {Γ_{C}}^{g}$ provides no, or severely limited, information regarding the traversal cost model at a site corresponding to a different ${g o a l}^{1} {γ_{C}}^{g} \neq^{1} {γ_{C}}^{g}$ . On the other hand, the passability and traversal cost models are considered independent. Sampling at one particular site might improve both models since the robot can observe previously unseen areas while experiencing untraversed terrain. However, two cost models cannot be improved at once since the robot can only experience the cost for the currently used gait.

Given the current robot position $ν_{t}^{robot}$ and models $P_{t}$ and $C_{t}^{G}$ at any time t during the exploration, the robot selects a shortest exploration path $ψ_{E} (p_{t}^{robot}, P_{t}, C_{t}^{G})$ that visits at least one site corresponding to each goal. The path planning is modeled as an instance of the GTSP, where vertices (sites) are organized in disjoint sets (goals), and each set is visited exactly once. The distance matrix D describes the costs of paths between the individual sites, including the distances between the current robot position and the goal sites.

D (ν, ν^{'}) = \hat{C} (ψ^{*} (ν, ν^{'})) . (6)

A total of two transforms are applied to the distance matrix D to create an open instance of the GTSP. First, the robot does not need to return to its current position after exploring the environment. Hence, the problem is transformed by setting the cost to reach the current robot position from any goal as zero $\forall γ \in Γ_{P} \cup Γ_{C}^{G}, \forall_{ν} \in γ : D (ν_{γ}, ν^{robot}) = 0$ . Second, we apply the Noon–Bean transformation (Noon and Bean, 1993) to transform an instance of the GTSP into an instance of the TSP. The open instances of the transformed TSP are solved by the LKH solver (Helsgaun, 2000), a heuristic solver with asymptotic time complexity bounded by $O (m^{2.2})$ , where m is the number of vertices, which has been found sufficient for updates with tens of goal sites. The solver returns the sequence of sites (ν^robot, ν₀, ν₁, … , ν_n) to be visited through the environment, see Figure 4A, where n is the number of goals and each site ν_i corresponds to a different goal. The robot navigates toward the first site of the sequence and its current exploration goal $ν_{E}^{*}$ becomes $ν_{E}^{*} = ν_{0}$ , see an example of the path in Figure 4B.

FIGURE 4

FIGURE 4. An example of a planned exploration path. (A) Global path over the sequence of goals determined by the TSP solver; (B) the local path to the first goal.

The plan is recomputed on-demand either when there is a change in the goal set or as a result of reaching the current goal. Moreover, upon reaching a cost model goal, the robot switches to the model’s respective gait g^enforced and is forced to move forward for Δt_sample (or until an obstacle is reached) to sample the traversal cost over the terrain. The exploration ends when every model reports zero goals. The exploration process is summarized in Algorithm 1.

Algorithm 1. Exploration.

4.2 Environment geometry & passability model

The grid environment $W$ is represented by the colored elevation grid map $M_{2.5 D}$ with the cell size d_ν. The grid map is built online during the exploration according to Algorithm 2 using the robot’s range measurements and RGB camera images. The elevation at each cell $ν \in M_{2.5 D}$ is obtained by fusing the localized range measurements $z_{i}^{pcd}$ into the grid map using an one dimensional Kalman filter, as described in Fankhauser et al. (2014) or Bayer and Faigl (2020). The localization of the robot, and also the localization of the range measurements, is considered to be solved by the Intel RealSense T265 tracking camera, which estimates the robot’s full six DOF pose based on visual simultaneous localization and mapping supported by an inbuilt inertial measurement unit¹. The grid map is used as a model of the terrain geometry to identify passable places. It also captures the color of the terrain texture that is processed to compute the terrain appearance descriptors.

Algorithm 2. Spatial exploration.

We define the passability of the cell $ν \in M_{2.5 D}$ as the probability π(ν) that the cell ν can be traversed by the robot. The probability itself is based on the observed roughness of the terrain computed based on Bayer and Faigl (2021) as follows:

ρ (ν) = \max_{ν^{'} \in 8 n b (ν)} Δ (ν, ν^{'}), (7)

where 8 nb(ν) is the 8-neighborhood of the cell ν, and the step height Δ(ν_a, ν_b) is as follows:

Δ (ν_{a}, ν_{b}) = | e l e v a t i o n (ν_{a}) - e l e v a t i o n (ν_{b}) |, (8)

where elevation(ν) denotes the estimated height of the terrain at ν. The probability that the robot can pass a cell ν is as follows:

π (ν) \{\begin{cases} 0 & if ρ (ν) > ρ_{obstacle} \\ 1 & otherwise \end{cases}, (9)

where the threshold ρ_obstacle represents the lowest obstacle to be detected. An example of the grid map is shown in Figure 5A.In active perception scenarios, the information about the terrain model $M_{2.5 D}$ gained by observing the cell ν′ is evaluated by entropy based on the known passability. Since the distribution of the passability is binary and depends on the 8-neighborhood of the cell, information gained by observing ν′ with unknown height is approximated as follows:

{I_{P}}^{cell} (ν^{'}) \approx \frac{k (ν^{'}) + 1}{9}, (10)

where k(ν) is the number of the unknown cells in the neighborhood of ν. Thus, the expected information gained by perceiving the terrain from the position of the cell ν can be expressed as follows:

{I_{P}}^{model} (ν) = \sum_{ν^{'} \in δ (r_{sensor}, ν)} \{\begin{cases} {I_{P}}^{cell} (ν^{'}) & if o b s e r v a b l e (ν, ν^{'}) \\ 0 & otherwise \end{cases}, (11)

where δ(r_sensor, ν) is the sensor range r_sensor-large neighborhood of ν; the function observable (ν, ν′) returns $t r u e$ if and only if the cell ν′ is observable from ν, which is determined by casting a ray from ν to ν′ in the current elevation map $M_{2.5 D}$ . Using all the cells with non-zero entropy in the TSP formulation is computationally intensive. Thus, we propose to spatially cluster the entropy to generate a limited number of spatial entropy representants by Algorithm 3.

FIGURE 5

FIGURE 5. Illustration of the color-geometric and cost models. (A) Visualization of the online built geometrical model with marked passability and clusters based on the cells with non-zero information according to the shown color legend; (B) terrain appearance descriptor calculated as a histogram of cell colors. The costs used in path planning; (C) minimal cost over gaits after the distance transform; (D) respective cheapest gait (gaits in red and purple). (E) Colors used to build the color histogram terrain appearance descriptor; (F) measured costs used for learning the GP (adjusted by hyperbolic tangent), visualized over the terrain appearance; (G) raw GP cost prediction; (H) GP prediction uncertainty. (I) Terrain clusters (arbitrary colors used to distinguish clusters); (J) information gained with terrain learning goals (goal colors corresponding to clusters); (K) cluster costs used in planning.

Algorithm 3. Cluster entropy representatives

In addition to the terrain geometry, the grid map $M_{2.5 D}$ also carries the terrain texture calculated by the following approach. Each cell is provided a 10-bit color by projecting the camera image to the map $M_{2.5 D}$ . Then the color space is shrunk to nine different colors, defined by color prototypes listed in Figure 5B. The relative amount of the cell colors within the radius r_hist matched to the selected color prototypes are used to build a 9-dimensional terrain appearance descriptor ta(ν) for each cell $ν \in M_{2.5 D}$ , which is visualized as a color histogram in Figure 5B.

4.3 Traversal cost model

The cost model $C$ predicts the per-meter traversal cost c over observed areas deemed passable by the geometric passability model $P$ . The traversal cost model predicts the traversal cost from terrain appearance. Since the robot position is abstracted as the center of its circular footprint, the $C$ ’s per-meter-cost predictions are conservative estimates that take into account all the cells on the footprint.

\hat{c} (ν_{a}, ν_{b}) = {m a x}_{ν^{'} \in δ (r_{robot}, ν_{a})} \hat{c} (ν^{'}), (12)

where δ(r, ν) lists all cells within the r-radius of cell ν, and $\hat{c} (ν)$ is the $C$ cost estimate over cell ν. An example of the traversal cost assessment is depicted in Figure 5C.

The cost $\hat{c}$ is reported for the whole model set $C = C^{G} = {C^{g}}_{g \in G}$ , since it is the best gait-terrain cost.

\hat{c} (ν) = {m i n}_{g \in G} {\hat{c}}_{C^{g}} (ν), (13)

where each gait-terrain cost ${\hat{c}}_{C^{g}}$ is the prediction of the particular model $C^{g}$ . In addition, when navigating through the environment, the robot selects its gait w.r.t. the minimization in Eq. 13, as depicted in Algorithm 4. An example of gait selection is visualized in Figure 5D. A distance transform with c_loss per-meter-loss is used over the cell grid with the best-gait costs $\hat{c} (ν)$ to dissuade the robot from navigating areas near terrain boundaries where frequent gait changes are likely.

Algorithm 4. Navigate

$C^{g}$ comprises the cost regressor $R$ and the terrain type clustering $T$ . In $R$ , we use GP regression to predict the traversal costs because it provides the predicted values and models the prediction uncertainty. Each traversal cost regressor $R$ is learned from the learning set $L$ of the paired terrain descriptors and the respective traversal costs observed when using the particular gait g that are depicted in Figure 5E and Figure 5F, respectively. The particular learned cost regressor $R$ is used to predict the normal distribution of the traversal cost at queried terrain descriptor ta as follows:

N ({\hat{μ}}_{c}, {\hat{σ}}_{c}^{2}) (t a, R) = p r e d i c t (t a, R) . (14)

The cost prediction (visualized in Figure 5G) is the expected value.

\hat{c} (t a, R) = E (N ({\hat{μ}}_{c}, {\hat{σ}}_{c}^{2}) (t a, R)) = {\hat{μ}}_{c} (t a, R), (15)

and the uncertainty of the prediction (shown in Figure 5H) is characterized by the differential entropy.

H (N ({\hat{μ}}_{c}, {\hat{σ}}_{c}^{2}) (t a, R)) = \frac{1}{2} \log (2 π e {\hat{σ}}_{c}^{2} (t a, R)) . (16)

The prediction uncertainty is used to approximate the information gain $I_{C}$ associated with sampling the individual observed terrains, thus identifying areas the robot needs to visit to improve the traversal cost model.The terrain type clustering $T$ identifies the distinct terrain types (terrain descriptor clusters) in the environment. The terrain class set $T$ is designed to be disjoint regarding the prediction model. Thus, sampling the traversal cost model at a cell corresponding to one terrain class provides no, or severely limited, information regarding the traversal cost model at a location corresponding to a different class. In particular, following Pasolli and Melgani (2011), the classes are selected to be mutually distant in the terrain descriptor space. Each observed cell is assigned the closest terrain class as the closest class in the descriptor space.

T^{*} (ν) = {argmin}_{T \in T} ‖ t a (ν), t a (T) ‖, (17)

where ta(T) is the appearance assigned to the terrain class $T \in T$ . Since, on small terrain classes, it might not be possible to acquire enough samples to learn the traversal cost with sufficient certainty, we apply class erosion as described in Supplementary Appendix S1. The erosion output is the learning class assignment T and the planning class assignment $\hat{T}$ . We avoid computing the cost prediction for every cell independently² and report the $C^{g}$ prediction over a particular area as the cost to traverse over its respective terrain type.

{\hat{c}}_{C^{g}} (ν) = \{\begin{cases} \hat{c} (t a (\hat{T} (ν)), R) & if \hat{T} (ν) \neq \emptyset, \\ c_{max} & otherwise, \end{cases} (18)

where the maximum cost c_max is reported for cells with no class (i.e., eroded) ∅.The rest of this section describes how the traversal cost experience used to learn the models is measured, how the GP regressor is learned, and how the terrain type clustering is used to identify the locations where to improve the cost model.

4.3.1 Traversal cost measurement

The measured traversal cost describes the time needed to traverse between cells as z^c (ν, ν′). Since the distance between 2 cells is significantly lower than the robot stride length, the cost is smoothed over path segments (cell sequences) with a fixed duration. In particular, the per-meter cost c is continually measured as the inverted robot velocity v⁻¹ over the path segment traversed by the robot in the last Δ t s.

v^{- 1} (ψ_{s}) = \frac{T (ψ_{s})}{‖ ψ_{s} ‖}, (19)

where ‖ψ_s‖ is the length of the segment in meters and T(ψ) is the measurement duration that is fixed to Δt. If the robot had not changed its gait on the segment, the cost is reported to the particular model $C^{g}$ as the cost to traverse the midpoint of the segment as $z^{c} (ν_{⌊ | ψ_{s} | / 2 ⌋}, ν_{⌊ | ψ_{s} | / 2 ⌋ + 1})$ . In addition, to remove potential cost spikes, the cost is further smoothed using a moving average window of the same (Δt) duration. Since the inverse velocity is unbounded and has both high values and high variance for a stuck robot, the cost to be used by the predictor is transformed as follows:

c = c_{high} \tanh (\frac{1}{c_{high}} \frac{v^{- 1}}{{v_{max}}^{- 1}}), (20)

where the maximum robot velocity v_max (maximum from all $g \in G$ ) scales the cost of the robot moving over an ideal terrain to 1, and the high cost c_high, which should only be experienced by a stuck robot, is used in the transform to bound the cost values.

4.3.2 Gaussian process traversal cost regressor

The employed GP regressor predicts both the prediction mean and variance making it suitable to model the prediction distribution as in (Eq. 14). Its description is dedicated to Supplementary Appendix S2 to make the study self-contained. GP regressor is learned only if there are at least $n_{L}^{min}$ learning pairs in $L$ to avoid learning overconfident predictors at the beginning of the exploration. The learning is summarized in Algorithm 5.

Algorithm 5. Traversal cost model learning.

The covariance function used in this work is the squared exponential kernel.

K (x, x^{'}) = {σ_{s}}^{2} \exp (- \frac{1}{2} \frac{{(x - x^{'})}^{2}}{l^{2}}), (21)

where ${σ_{s}}^{2}$ is the output variance, and l is the length scale. We consider that the robot’s cost and feature models have known ranges based on (Eq. 20) and the histogram descriptor, respectively. Therefore, similar to Karolj et al. (2020), the kernel hyperparameters l and ${σ_{s}}^{2}$ and GP’s ${σ_{ϵ}}^{2}$ have fixed values that we consider to be dependent on the system parameters.The GP is continually relearned when new observations using the particular gait g are experienced. The learning complexity can be bounded by $O (n^{4})$ , where n is the number of training points. The size of the learning set $L$ is limited by using at most one training point corresponding to each cell in $M_{2.5 D}$ and by storing measurements only when they are novel and thus likely to improve the model. Hence, the relative traversal cost c(ν) experienced at cell ν is paired with the appearance descriptor ta(ν) of the respective traversed terrain, and when building the learning set $L$ , the model reports the pair (ta(ν), c(ν)) for each cell where both values are available.Since the robot keeps only one measurement for each cell, each novel cost measurement z^c (ν, ν′) experienced when using the gait g is allocated to the grid map cell ν and its neighbors in 8 nb(ν), and the traversal cost c(ν) at the cell ν is modeled using the Kalman filter with the estimated value and covariance as follows:

c_{k} = \frac{σ_{sense}^{2} c_{k - 1} + σ_{k - 1}^{2} {z^{c}}_{k}}{σ_{sense}^{2} + σ_{k - 1}^{2}}, σ_{k}^{2} = \frac{σ_{sense}^{2} σ_{k - 1}^{2}}{σ_{sense}^{2} + σ_{k - 1}^{2}}, (22)

where ${z^{c}}_{k}$ is the kth cost measurement at ν and $σ_{sense}^{2}$ is its variance. The filter is initialized by the first cost observation ${z^{c}}_{0}$ at the respective cell, and the initial filter variance is $σ_{0}^{2}$ .In total, two cases are considered as situations when the cost is novel, and thus the model should be improved by storing the cost w.r.t. (Eq. 22): 1) when the prediction is erroneous and 2) when the prediction is uncertain. For the former, the cost experienced at the cell ν is accumulated if the measured cost z^c is out of the approximate 95% confidence interval $| {\hat{μ}}_{c} (t a (ν)) - z^{c} | > 2 {\hat{σ}}_{c} (t a (ν))$ of the prediction at ν. For the latter, the approximated information gain of the prediction is considered, and the robot accrues measurements when there is a potential of information gain $I_{C} (T (ν)) > 0$ , which computation is described in the following paragraphs.

4.3.3 Terrain type clustering and goal identification

The traversal cost exploration goals ${Γ_{C}}^{g}$ are selected by the robot as areas where the model can be improved and thus are the areas where the traversal cost model is uncertain. Each goal represents a terrain class where the robot can sample novel information about the cost model. The overall approach to goal identification is summarized in Algorithm 6.

Algorithm 6. Terrain type clustering, goal identification, and cost identification.

Algorithm 7. Cluster.

The clustering scheme presented in Algorithm 7 is based on the IGNG, described in Supplementary Algorithm S1, to make the study self-contained. In the neural gas, each neuron is a terrain prototype ta(T) in the descriptor space that represents a terrain class T. When separating the classes, the intuition is that for exponential kernels, the length scale describes the range from the data where the model can reliably extrapolate, as used, for example, in Karolj et al. (2020). Hence, new classes are inserted into the neural gas when the distance from all prototypes exceeds σ^IGNG = 2l. The neural gas is constructed incrementally by repeated adaptation using the appearance descriptors in the environment, where the size of each adaptation batch is limited to n^IGNG descriptors that are randomly sampled from all the descriptors, and the yielded terrain classes can be seen in Figure 5I.

Algorithm 8. Compute information gain.

The terrain classes for which the cost model can be improved are identified using the cost regressor $R$ -predicted traversal cost distribution $N ({\hat{μ}}_{c}, {\hat{σ}}_{c}^{2}) (t a (T))$ at the class prototypes ta(T). The traversal cost exploration goals are selected according to Algorithm 8 as the classes where there is potential for information gain; see the visualization in Figure 5J. The gain is approximated from the prediction entropy.

I_{C} (T) \approx m a x (H ({\hat{σ}}_{c}^{2} (t a (T))) - H_{C}^{GT} (L), 0), (23)

where $H_{C}^{GT}$ is a threshold value associated with the uncertainty of the experienced traversal costs. The robot learns when there is potential of information gain $I_{C} > 0$ , and no information can be gained at eroded cells $I_{C} (\emptyset) = 0$ . We set the threshold value based on the highest prediction uncertainty for terrains that are considered certain since they cover cells that are already in the learning set as follows:

H_{C}^{GT} (L) = \max_{T \in T : | \{ν \in M_{2.5 D} : T (ν) = T\} \cap L | > m_{T}} H ({\hat{σ}}_{c}^{2} (t a (T))), (24)

where we avoid overconfident GP-predictions for barely sampled terrains by allowing only terrain classes with at least m_T observed ground truth cost values. The threshold equals the maximum value over such ground truth terrain classes.

Algorithm 9. Identify goals.

The sampling locations (visualized, for example, in Figure 5J) corresponding to the terrain class are sampled along a lattice S with the cell size d_S ≫ d_ν, as depicted in Algorithm 9. For each lattice point p_S, the closest cell ν in $δ (\frac{\sqrt{2} d_{S}}{2}, p_{S})$ radius that is not associated with a traverability measurement and that is informative with $I_{C} (T (ν)) > 0$ is reported as a sampling site; if no such cell exists, no site is reported for the lattice point. Since only cells without measurements are considered, it is possible for small terrain classes to run out of cells before reaching m_T measurements. In such a case, the class is considered too small to learn and is no longer reported as a goal, and it is pruned from the class set. In addition to the goals, the traversal cost ${\hat{c}}_{C^{g}} (ν)$ (visualized in Figure 5K) is also reported for the ν′s prototype $t a (\hat{T} (ν))$ w.r.t. (Eq. 13) according to Algorithm 10.

Algorithm 10. Set planning cost.

5 Experimental evaluation

The proposed exploration with active terrain learning has been examined in simulated trials and real experimental deployments using a hexapod walking robot. The simulated and real scenarios have been set up so that the robot first explores the environment and learns the cost models using the proposed method and, in some tests, a selected baseline method. Then the performance has been evaluated and compared with the baseline approach by navigating the robot over a sequence of benchmark waypoints using the respective traversal cost models of the environment learned during the exploration.

The hexapod walking robot, which can be seen in Figure 1, is used in the real deployment, and the simulation is parameterized to mimic the robot’s motion and sensory capabilities. The robot has six legs, each comprising three Dynamixel XM430-W350 servomotors. The robot is equipped with the Intel RealSense D435 camera used to construct the colored environment model and the Intel RealSense T265 localization camera. The onboard computation is provided by the Intel NUC 10i7FNK with Intel Core i7 10710U accompanied with 64 GB memory, running Ubuntu 18.04 with ROS Melodic (Quigley et al., 2009). The robot locomotion is facilitated by a blind adaptive motion gait (Faigl and Čížek, 2019). The robot uses two particular gait configurations, see Table 2: The fast gait suitable for flat, even surfaces, and the tall gait that performs better than the fast gait over rough terrain but otherwise is slower. The robot is equipped with a reflex that detects that the robot is stuck with costs exceeding c_max and switches over to the tall for Δt_fallback seconds to avoid the robot getting stuck when using the baseline model or at the beginning of the learning process. The parameterization of the proposed method can be found in Table 3, and the operating frequencies of the proposed method’s processes are depicted in Table 4.

TABLE 2

TABLE 2. Gait parameterization.

TABLE 3

TABLE 3. System parameterization.

TABLE 4

TABLE 4. System operation frequencies.

5.1 Simulated scenarios

The simulated scenarios are based on a courtyard environment captured by four 3D scans obtained using Leica BLK 360 3D scanner and visualized in Figure 6A. The scanner has standard deviation of 4 mm at 10 m and 7 mm at 20 m. The scans total approx. 1.4×10⁸ points.

FIGURE 6

FIGURE 6. (A) 3D scan of the university campus at Charles Square in Prague, (B) section of the courtyard, and the respective simulated environment (C) color and (D) relative traversability (light areas easier to traverse). The red bounding box represents the area where the robot should explore. The blue points are the points to be visited by the robot in the first test tour.

In total, two virtual environments are created using the scan: small and large. The small environment represents a small section of the courtyard, where the simulated robot mimics the real robot’s speed and sensory equipment. It is used to test the benefit of the individual components of the proposed approach by comparing them to baseline methods where the particular component is removed or simplified. The large environment comprises terrain segments observed in the scan that are rearranged to create a larger, artificial environment with obstacles where different exploration algorithms are compared using a faster robot with an extended sensor range.

5.1.1 Small environment

The small environment is concerned with a section of the environment that is detailed in Figure 6B. We have created a simulation model of the environment containing several types of pavement (gray and red) and turf (green, brown), shown in Figure 6C. The turf is modeled as hard to traverse and can get the robot stuck for the fast gait, whereas the pavement does not impede the robot, see Figure 6D.

First, to demonstrate the benefits of using a cost model learned from prior experience, the robot is tasked to execute two tours in the environment using the learned cost model and a flat-cost baseline model. Second, the utility of exploring along the proposed GTSP-derived path is demonstrated by comparing its time to explore the environment with a greedy, myopic baseline, which drives the robot to the cheapest goal to reach w.r.t. the so far learned costs.

The first tour comprises four waypoints. The robot starts at the bottom-left point and executes the tour counter-clockwise until reaching the start location again. The two particular areas are designed to demonstrate the utility of the learned model: 1) the segment between the bottom-right and top-right waypoints where the robot can choose either a direct route over the turf or a longer path over the pavement and 2) the area around the top-left waypoint where the turf cannot be avoided and thus the robot needs to switch to the tall gait. The second tour comprises 20 points randomly sampled in the environment, and it serves to demonstrate the performance of the learned model over a tour that was not handcrafted.

In addition to the proposed approach and the baseline, in the simulated tests, we also deploy a hybrid gait selection approach that chooses its gait using the proposed model but does not plan its path w.r.t. the predicted costs and walks directly to the next waypoint. Unlike the baseline approach, which switches to the tall gait when stuck and repeatedly tries to switch back to the fast gait, the hybrid gait selection approach switches gaits only when approaching or leaving the terrain identified as hard to traverse by the model. Hence, it should outperform the baseline over longer sections on difficult terrains, where the baseline is slowed down by trying to switch back to the fast gait.

The simulation environment consists of the Intel i7-9700 3.00 GHz with 32 GB memory running Ubuntu 18.04 with ROS Melodic. Since the captured environment comprises terrains that might slow down the robot because they are somewhat non-rigid, instead of using a geometry-based simulator such as Gazebo, which cannot model such terrains, we elect to build a virtual environment over a simple simulator using real-world data. The simulation is performed using the simple two dimensional robot simulator (STDR)³ within the ROS ecosystem. On top of the simulator, we have implemented an interface that simulates the robot’s RGB-D camera, which assigns each point in the robot’s simulated exteroceptive measurements color based on the point’s position in the environment color map shown in Figure 6C and filters the measurements to contain only points within the 87 deg wide field of view of the simulated RGB-D camera. The terra-mechanical properties are simulated by slowing down the robot over the individual traversed terrains w.r.t. the performance observed over such terrain in a real-world deployment, as shown in Figure 6D.

In the evaluation, the robot first explores and learns the models shown in Figure 7A to Figure 7I. An example exploration path can be seen in Figure 7J. The robot learns that the turf, which appears either green or brown, cannot be traversed by the fast gait and thus selects the tall gait over that terrain type. On the other hand, the pavement does not hinder the fast gait, which is considerably faster and thus preferred.

FIGURE 7

FIGURE 7. Environment assessment after the simulated scenario run with regards to both gaits; (A) dominant color in the histogram feature; (B) merged cost used for planning; (C) selected gait (fast in red, tall in purple); (D) costs used for learning the fast gait model [adjusted by hyperbolic tangent in (Eq. 20)], visualized over the terrain appearance; (E) clusters used in the fast gait model (arbitrary colors used to distinguish clusters); (F) fast gait cost predictions assigned by the dilated clusters; (G) costs used for learning the tall gait model [adjusted by hyperbolic tangent using (Eq. 20)], visualized over the terrain appearance; (H) clusters used in the tall gait model (arbitrary colors used to distinguish clusters); (I) tall gait cost predictions assigned by the dilated clusters; (J) exploration run; (K) test-tour run using the baseline model without the learned traversal costs; (L) test-tour run using the learned traversal costs. The development of the path through the fully discovered simulated environment during the exploration; (M) at the beginning of the exploration, the robot uses flat costs and thus does not avoid difficult terrains; (N) after learning the costs for the fast gait, the robot is too cautious and avoids going near the costly turf; (O) after learning the tall gait costs, the robot is less cautious and is willing to walk near difficult terrain.

Although the two gait models create the terrain clusters independently, the clusters in Figure 7E and Figure 7H differ only in cluster indices used in the internal representation (each index is associated with a different color in the visualization). It can be observed that the robot does not use any clusters associated with the red line on the pavement, either removing the thin cluster outright in the erosion or pruning the small erosion remains after the robot finds out that it cannot get enough samples to learn such a small terrain.

In the particular exploration run shown in Figure 7J, the robot first walks along the left side of the exploration bounds, learning the fast gait costs for both the pavement and turf and the tall gait cost over the turf. Then the robot learns the tall gait cost over the pavement while clearing the spatial exploration goals. During the exploration, it can be seen that the robot avoids walking over the remaining turf, only approaching it at the very end of the exploration. Thus, the robot needs only to enter and not leave the turf (minimizing the time on the costly terrain) to reach the goal that lies on the turf.

The test runs using the baseline, and the learned model over the first tour are shown in Figure 7K and Figure 7L, respectively. In addition, the development of the tours that would be used at different points during the exploration can be seen in Figure 7M through Figure 7O. In the baseline test, the robot walks directly between the waypoints and only switches to the tall gait after getting stuck. On the other hand, when using the learned model, the robot avoids the turf if possible and switches to the tall gait before entering the turf while pursuing the top-left goal.

The performance over 25 simulated trials (five exploration runs, each with five tour tests for the tour tests; 25 runs for the simulated exploration tests) can be observed in Table 5. On the first tour, the hybrid gait selection approach is slower than the reactive baseline. In the authors’ opinion, it is caused by the conservative (large) value of r_robot, which compels the robot to use the slow tall gait on the border between the rough terrain and pavement, whereas the reactive approach only tries to switch back to the fast gait (which is its main disadvantage when compared to the hybrid approach) a few times on the short rough terrain segment. Nonetheless, the proposed learned model knows to avoid such areas and performs better or the same as the other approaches over every tour segment. Hence, the results suggest that robot benefits from using the learned costs in path planning. Over the second tour, the robot performs similarly. The learned model outperforms the baseline when moving around or over the turf. Both approaches exhibit similar travel times when the direct path between the waypoint leads only over the pavement. Unlike over the first tour, the hybrid gait selection performs better than the baseline approach, presumably due to longer sections over hard-to-traverse terrains on the second tour. The proposed approach consistently outperforms the baseline and hybrid gait selection approaches; we conclude that the robot benefits from using the learned model.

TABLE 5

TABLE 5. Performance as the time (total cost) in seconds to traverse.

In addition to the tour tests, the results suggest that the robot benefits from using the non-myopic GTSP planner compared to the myopic greedy approach. Even though the performance of the two approaches appears relatively close, the Mann–Whitney U Test (Mann and Whitney, 1947) rejects the null hypothesis of the same exploration time distribution at 99.5% confidence against both the two-sided and the relevant one-sided alternative. In the authors’ opinion, the high variance in the observed exploration times can be attributed to the effect of random chance in exploration since neither myopic nor non-myopic approaches are informed about the terrains in unexplored areas. However, the myopic explorer is more likely to make a bad decision, such as not clearing some of the goals in a particular area that needs to be visited later. Therefore, the proposed non-myopic approach performs better overall.

5.1.2 Large environment

The large environment is an artificial 20 × 25 m outdoor/indoor scenario. The map comprises patches from the courtyard scan rearranged as shown in Figure 8. Given the size of the environment, the robot is sped up five times. The cell size is increased to 0.1 m, and other parameters are adjusted accordingly, see Table 3. In addition, the robot uses an omnidirectional sensor with the increased range of 10 m, which expands the range of terrains that can be observed without the respective terrain’s traversal. To accommodate the simulation of the increased sensor range, the virtual environment is run on AMD Ryzen Threadripper 3960× 3.8 GHz with 48 GB memory running Ubuntu 18.04 and ROS Melodic, using STDR in the same manner as for the small environment.

FIGURE 8

FIGURE 8. Large simulated environment (A) color and (B) relative traversability, (C) and the test tour through the environment, which starts at the starred node and is counter-clockwise. The built maps of the large simulated environment: (D) geometric map and (E) merged costs used for planning after exploration using the proposed approach; merged costs ofter exploration using the spatial-only model while (F) avoiding and (G) traversing rough terrain, respectively.

Similar to the small environment, the robot is first set to explore the environment and then is tasked to visit the set of waypoints shown in Figure 8C. The proposed algorithm is compared to a spatial-only baseline approach, which learns the cost models only as a result of experiencing cost while pursuing spatial exploration goals. The spatial-only changes the gaits in a reactive fashion when stuck and hence only learns the model for the tall gait if it enters the difficult green or brown turf during the exploration.

The quantitative results for the large environment are shown in Table 5. Since the proposed approach actively tries to sample every terrain type, it is slower to explore the whole environment. However, the proposed approach performs better in the tour evaluation. Closer examination suggests that while the tour times of the proposed approach remain similar in all trials, the spatial-only times vary wildly since the learned models differ based on which terrains the robot has traversed during the exploration. This randomness can be attributed to differences in simulation and plan execution. In addition, Figures 8D–G shows the learned maps for the proposed model, and for the spatial-only model in both the cases when the rough terrain was and was not traversed. For the case when a rough terrain was traversed by the spatial-only model, the costs differ between the individual rough areas. However, the ground truth costs shown in Figure 8B suggest that they should be the same, as is the case for the proposed model. Likely, this is caused by the robot traversing only the brown-green rough terrain located on the left of the environment. The green terrain, located in the center and right of the environment, appears somewhat similar to the brown-green terrain. Hence, the robot considers it to be difficult to traverse to a certain degree. However, since the spatial-only model does not deliberately sample the terrains, the model’s guess differs somewhat from the exact cost to traverse the particular terrain, decreasing the fidelity of the predictions.

Overall, the presented results suggest that the proposed approach presents a tradeoff in terms of exploration and execution time: the longer time spent exploring the environment and learning the cost models provides the robot with better cost maps, which shorten the time to navigate the environment after it is explored. It should be noted that since the behavior of the spatial-only model is affected by random chance (differences in simulation and plan execution), it can provide models as good as the proposed approach. However, there is no guarantee that this would happen regularly, whereas the proposed approach has returned high fidelity maps in every test case.

5.2 Real robot experimental deployment

The viability of the proposed approach is demonstrated in the real experimental deployment, where the robot explores an indoor 2 × 6 m area visualized in Figure 9. The office-like environment comprises flat synthetic terrain that is easy to traverse but appears to the robot differently colored at different locations since it is glossy and carries the color of nearby objects located next to the arena. When building the colored elevation map $M_{2.5 D}$ , we use the first color observed at each location to account for the issue. In addition to the flat terrain, a green artificial turf is placed in a part of the test area to provide a relatively hard terrain to traverse. The robot interacts with the real terrains similarly to the simulation: the fast gait may get stuck on the turf but is faster than the tall gait over the flat parts of the arena. During the experiment, the robot is set to explore the area; even though it can leave the bounds of the 2 m × 6 m large area, it does not pursue goals located outside of the bounds.

FIGURE 9

FIGURE 9. The 2m × 6m large deployment area with a green artificial turf. The area boundary is in red, and the waypoints of the test tour are depicted in blue. The shown robot is at the starting position.

Figure 10 shows the maps learned in the experimental run, which is also presented in the accompanying Supplementary Video S1. A colored map of the environment is depicted in Figure 10A. The overall costs and selected gaits through the environment are shown in Figure 10B and Figure 10C, respectively.

FIGURE 10

FIGURE 10. Environment evaluation and the real robot exploration run; (A) dominant color in the histogram feature; (B) merged cost used for planning; (C) selected gait (fast in red, tall in purple); (D) costs used for learning the fast gait model (adjusted by hyperbolic tangents), visualized over the terrain appearance; (E) clusters used in the fast gait model (arbitrary colors used to distinguish clusters); (F) fast gait cost predictions assigned by the dilated clusters; (G) costs used for learning the tall gait model (adjusted by hyperbolic tangents), visualized over the terrain appearance; (H) clusters used in the tall gait model (arbitrary colors used to distinguish clusters); (I) tall gait cost predictions assigned by the dilated clusters; (J) exploration run; (K) test-tour run using the baseline model without the learned traversal costs; (L) test-tour run using the learned traversal costs.

During the experimental deployment, the robot first learns the largest gray appearing flat terrain using the fast gait. Then it learns on the turf for both gaits and returns to the gray area to learn for the tall gait. Afterward, the robot pursues the yet unvisited spatial goals and smaller off-color terrain clusters that appear near the environment boundary and are caused by the glossy floor that carries the color of the nearby objects.

Compared to the simulation, the robot needs a larger amount of the measurements to learn the terrains (see Figure 10D and Figure 10G), and there are more terrain clusters (see Figure 10E and Figure 10H). It suggests that the real environment is noisier and contains multiple differently colored areas, which is in line with our observations regarding the glossy floor material. Nevertheless, the traversal costs learned by the robot for the individual gaits (see Figure 10F and Figure 10I) are within expectations, as is the overall planning cost depicted in Figure 10B and gait selection visualized in Figure 10C.

The test run scenarios are set up similarly to the tours used in the simulated test; the robot is placed in front of the hard-to-traverse turf and tasked to reach a goal location behind the hard-to-traverse terrain, slightly out of the exploration bounds, see Figure 9. The paths shown in Figure 10K and Figure 10L show that when using the baseline without the learned model, the robot tries to reach the goal directly over the turf, gets stuck, and needs to switch to the slow tall gait. On the other hand, when using the learned model, the robot avoids the hard-to-traverse areas and reaches its goal quickly using the fast gait. The performance in the presented run can be seen in Table 5. Overall, we conclude that the real deployment confirms that the robot can actively learn the traversability as a part of the exploration mission and benefits from using such learned models.

6 Discussion

The presented exploration system is proposed as a combination of spatial geometric modeling and learning terrain-gait traversal cost models. However, the system is designed to support additional models that do not describe the robot’s traversal cost. Moreover, since the models are kept separate, there is no need to use the same feature set for each of them. Therefore, the approach is compatible with spatial models such as magnetism models (Karolj et al., 2020) or GP-based occupancy (Wang and Englot, 2016). The only requirement for a model is that it produces a set of learning goals in the environment that are resolved once particular information is sampled. Hence, the proposed system can be extended by including additional traversability models, such as modeling the passability of potentially non-rigid obstacles.

In addition, we approach the traversal cost prediction so that it supports any cost model that is additive along the traversed path, such as time to traverse or consumed energy. Besides, individual cost predictors describe the gaits of a hexapod walking robot, but they can also describe any discrete set of robot configurations. Hence, the approach is viable for any mobile robot that describes its motion experience using an additive cost and can also be used to model the energy a tracked robot consumes, for example, with adjustable flippers. A particular limitation of the cost modeling used in the presented approach is that we assume that the individual gaits are switched for free w.r.t. the cost (i.e., instantaneously for cost modeled as the time to traverse), whereas in practice, the gait requires some time to exhibit its properties. In this study, we leave the question of how to predict gait-change cost open for future work.

The used cost model goal generation stems from the idea that adding new observations does not increase GP uncertainty if the hyper-parameters are fixed (Rasmussen and Williams, 2006). Therefore, sampling new measurements should not increase uncertainty and thus not spawn new goals in areas containing none. In practice, even though we use fixed GP hyper-parameters, the non-increasing nature of the uncertainty does not strictly hold for the approximated information gain since, in addition to the GP hyper-parameters, the information gain also depends on the terrain clusters and the costs and descriptors in the learning set, all of which might drift during the exploration. However, the robot behavior demonstrated in both evaluation setups shown in Figure 7J and Figure 10J suggests that the assumption holds in general. The robot clears the areas corresponding to the individual terrains (goals) and is not compelled to return to previously visited areas.

The primary limitation of the proposed approach is identified in its inability to compare the utility of the goals originating from the different models. We are motivated to build a modular system that would support different model types; therefore, the proposed decoupled approach considers each goal equally valued, regardless of the source model. This limits how the models are used since the goal utility, such as the information gain, is relegated to be used only inside the particular model to determine which environment features (locations or terrain types) are goals to use in creating an instance of the GTSP. The proposed approach provides a non-myopic solution to visit the goals reported by the individual models, where the models are also non-myopic since each can report multiple goals. Myopic models that would report their respective highest utility goal (potentially with multiple sampling sites) can be used. However, similarly to the myopic planner with the results reported in Table 5, the time to explore would likely increase since the GTSP planner would lack the information on where to go after the current goals are sampled, and thus the exploration path would often change significantly. Integrating goal utility into the decoupled planning and using alternative utility functions such as the GP-UCB remains the subject of future work.

7 Conclusion

In this study, we present a system for autonomous mobile robot exploration that incorporates active learning of traversal cost models in addition to spatial model building. During the exploration, the robot builds the spatial geometric model of the environment and learns the traversal cost models, each comprising a Gaussian process regressor and a growing neural gas terrain clustering scheme. The geometric model is used to determine areas passable by the robot, while the cost models predict the traversal costs over the passable terrains from the terrain’s appearance. Each cost model corresponds to a particular hexapod walking robot locomotion gait. The robot approaches exploration in a decoupled manner, creating a set of goals for the spatial exploration and for each traversal cost model. The exploration path is planned by solving an instance of the generalized traveling salesman problem over the goals that are sets of possible sites of visits to improve the particular model. The proposed system has been evaluated in simulation setup and real experimental deployment with two different walking gaits. The results suggest that the proposed system yields the robot to explore the environment and learn the traversal cost models. The learned models benefit the robot’s operation in the environment. In future work, we plan to model the gait change costs, include additional traversability models such as obstacle rigidity, and extend the proposed approach to support goal utility and exploration–exploitation models.

Data availability statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author contributions

With the support of JF, MP, and JB designed the proposed system. MP and JB performed the experiments and processed the data. MP, JB, and JF wrote the manuscript. All the authors contributed to the manuscript and approved the submitted version.

Funding

The work was supported by the Czech Science Foundation (GAČR) under research project No. 18-18858S and 19-20238S. The support under the OP VVV funded project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics” is also gratefully acknowledged.

Acknowledgments

We would like to thank Petr Čížek and Jiří Kubík for their help with the hexapod walking robot maintenance.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2022.910113/full#supplementary-material

Footnotes

¹In the simulated experiments, the localization is provided by the simulator.

²In practice, for small environments, it is feasible to compute the prediction for every cell, and we do so for visualization as depicted in Figure 5G and Figure 5H.

³http://stdr-simulator-ros-pkg.github.io

References

Azpúrna, H., Campos, M. F. M., and Macharet, D. G. (2021). Three-dimensional terrain aware autonomous exploration for subterranean and confined spaces. IEEE Int. Conf. Robotics Automation (ICRA), 2443. –2449. doi:10.1109/ICRA48506.2021.9561099

CrossRef Full Text | Google Scholar

Baleia, J., Santana, P., and Barata, J. (2015). On exploiting haptic cues for self-supervised learning of depth-based robot navigation affordances. J. Intell. Robot. Syst. 80, 455–474. doi:10.1007/s10846-015-0184-4

CrossRef Full Text | Google Scholar

Bayer, J., and Faigl, J. (2021). “Decentralized topological mapping for multi-robot autonomous exploration under low-bandwidth communication,” in European Conference on Mobile Robots (Bonn, Germany: ECMR), 1–7. doi:10.1109/ECMR50962.2021.9568824

CrossRef Full Text | Google Scholar

Bayer, J., and Faigl, J. (2019). “Speeded up elevation map for exploration of large-scale subterranean environments,” In 2019 Modelling and Simulation for Autonomous Systems (Palermo, Italy: MESAS), 192–202. doi:10.1007/978-3-030-43890-615

CrossRef Full Text | Google Scholar

Bayer, J., and Faigl, J. (2020). “Speeded up elevation map for exploration of large-scale subterranean environments,” in 2020 Modelling and Simulation for autonomous systems (MESAS). 190–202.

CrossRef Full Text | Google Scholar

Bekhti, M. A., and Kobayashi, Y. (2016). “Prediction of vibrations as a measure of terrain traversability in outdoor structured and natural environments,” in Image and video technology, 282–294. doi:10.1007/978-3-319-29451-3_23

CrossRef Full Text | Google Scholar

Belter, D., Wietrzykowski, J., and Skrzypczyński, P. (2019). Employing natural terrain semantics in motion planning for a multi-legged robot. J. Intell. Robot. Syst. 93, 723–743. doi:10.1007/s10846-018-0865-x

CrossRef Full Text | Google Scholar

Binney, J., and Sukhatme, G. S. (2012). Branch and bound for informative path planning. IEEE Int. Conf. Robotics Automation (ICRA), 2147. –2154. doi:10.1109/ICRA.2012.6224902

CrossRef Full Text | Google Scholar

Bourgault, F., Makarenko, A. A., Williams, S. B., Grocholsky, B., and Durrant-Whyte, H. F. (2002). “Information based adaptive robotic exploration,” in IEEE/RSJ international conference on intelligent robots and systems (Lausanne, Switzerland: IROS), 540–545. doi:10.1109/IRDS.2002.1041446

CrossRef Full Text | Google Scholar

Bradley, D. M., Chang, J. K., Silver, D., Powers, M., Herman, H., Rander, P., et al. (2015). “Scene understanding for a high-mobility walking robot,” in IEEE/RSJ international conference on intelligent robots and systems (Hamburg, Germany: IROS), 1144–1151. doi:10.1109/IROS.2015.7353514

CrossRef Full Text | Google Scholar

Brown, D., and Webster, G. (2010). Now a stationary research platform, NASA’s Mars rover Spirit starts a new chapter in red planet scientific studies. Pasadena, CA: NASA Press Release.

Google Scholar

Carrillo, H., Dames, P., Kumar, V., and Castellanos, J. A. (2018). Autonomous robotic exploration using a utility function based on Rényi’s general theory of entropy. Auton. Robots 42, 235–256. doi:10.1007/s10514-017-9662-9

CrossRef Full Text | Google Scholar

Charrow, B., Liu, S., Kumar, V., and Michael, N. (2015). Information-theoretic mapping using cauchy-schwarz quadratic mutual information. IEEE Int. Conf. Robotics Automation (ICRA), 4791–4798. doi:10.1109/ICRA.2015.7139865

CrossRef Full Text | Google Scholar

Dang, T., Tranzatto, M., Khattak, S., Mascarich, F., Alexis, K., and Hutter, M. (2020). Graph-based subterranean exploration path planning using aerial and legged robots. J. Field Robot. 37, 1363–1388. doi:10.1002/rob.21993

CrossRef Full Text | Google Scholar

Faigl, J., and Čížek, P. (2019). Adaptive locomotion control of hexapod walking robot for traversing rough terrains with position feedback only. Robotics Aut. Syst. 116, 136–147. doi:10.1016/j.robot.2019.03.008

CrossRef Full Text | Google Scholar

Faigl, J., and Kulich, M. (2013). “On determination of goal candidates in frontier-based multi-robot exploration,” in European conference on mobile robots (Barcelona, Spain: ECMR), 210–215. doi:10.1109/ECMR.2013.6698844

CrossRef Full Text | Google Scholar

Faigl, J., Kulich, M., and Přeučil, L. (2012). “Goal assignment using distance cost in multi-robot exploration,” in IEEE/RSJ international conference on intelligent robots and systems (Vilamoura-Algarve, Portugal: IROS), 3741–3746. doi:10.1109/IROS.2012.6385660

CrossRef Full Text | Google Scholar

Fankhauser, P., Bloesch, M., Gehring, C., Hutter, M., and Siegwart, R. (2014). World Scientific, 433–440.Robot-centric elevation mapping with uncertainty estimatesMob. Serv. Robot.

CrossRef Full Text | Google Scholar

Forouhar, M., Čížek, P., and Faigl, J. (2021). “Scarab II: A small versatile six-legged walking robot,” in 5th full-day workshop on legged robots at IEEE international conference on robotics and automation (Xi’an, China: ICRA), 1–2.

Google Scholar

Fritzke, B. (1994). “A growing neural gas network learns topologies,” in Conference on neural information processing systems (Denver, CO: NIPS), 625–632.

Google Scholar

Gonzalez, R., and Iagnemma, K. (2018). Slippage estimation and compensation for planetary exploration rovers. State of the art and future challenges. J. Field Robotics 35, 564–577. doi:10.1002/rob.21761

CrossRef Full Text | Google Scholar

Guastella, D. C., and Muscato, G. (2021). Learning-based methods of perception and navigation for ground vehicles in unstructured environments: A review. Sensors 21, 73. doi:10.3390/s21010073

CrossRef Full Text | Google Scholar

Guerrero, E., Bonin-Font, F., and Oliver, G. (2021). Adaptive visual information gathering for autonomous exploration of underwater environments. IEEE Access 9, 136487–136506. doi:10.1109/ACCESS.2021.3117343

CrossRef Full Text | Google Scholar

Haddeler, G., Chan, J., You, Y., Verma, S., Adiwahono, A. H., and Meng Chew, C. (2020). “Explore bravely: Wheeled-legged robots traverse in unknown rough environment,” in IEEE/RSJ international conference on intelligent robots and systems (Las Vegas, NV: IROS), 7521–7526. doi:10.1109/IROS45743.2020.9341610

CrossRef Full Text | Google Scholar

Helsgaun, K. (2000). An effective implementation of the lin-kernighan traveling salesman heuristic. Eur. J. Operational Res. 126, 106–130. doi:10.1016/s0377-2217(99)00284-2

CrossRef Full Text | Google Scholar

Hollinger, G. A., and Sukhatme, G. S. (2014). Sampling-based robotic information gathering algorithms. Int. J. Rob. Res. 33, 1271–1287. doi:10.1177/0278364914533443

CrossRef Full Text | Google Scholar

Homberger, T., Bjelonic, M., Kottege, N., and Borges, P. V. K. (2016). “Terrain-dependant control of hexapod robots using vision,” in International symposium on experimental robotics (Nagasaki, Japan: ISER), 92–102. doi:10.1007/978-3-319-50115-4_9

CrossRef Full Text | Google Scholar

Karolj, V., Viseras, A., Merino, L., and Shutin, D. (2020). An integrated strategy for autonomous exploration of spatial processes in unknown environments. Sensors 20, 3663. doi:10.3390/s20133663

PubMed Abstract | CrossRef Full Text | Google Scholar

Kottege, N., Parkinson, C., Moghadam, P., Elfes, A., and Singh, S. P. N. (2015). Energetics-informed hexapod gait transitions across terrains. IEEE Int. Conf. Robotics Automation (ICRA), 5140–5147. doi:10.1109/ICRA.2015.7139915

CrossRef Full Text | Google Scholar

Krajník, T., Fentanes, J. P., Santos, J. M., and Duckett, T. (2017). Fremen: Frequency map enhancement for long-term mobile robot autonomy in changing environments. IEEE Trans. Robot. 33, 964–977. doi:10.1109/TRO.2017.2665664

CrossRef Full Text | Google Scholar

Krüsi, P., Bosse, M., and Siegwart, R. (2016). Driving on point clouds: Motion planning, trajectory optimization, and terrain assessment in generic nonplanar environments. J. Field Robot. 34, 940–984. doi:10.1002/rob.21700

CrossRef Full Text | Google Scholar

Lin, B., and Song, S. (1993). Dynamic modeling, stability and energy efficiency of a quadrupedal walking machine. J. Robot. Syst. 18, 657–670. doi:10.1002/rob.8104

CrossRef Full Text | Google Scholar

Luo, W., and Sycara, K. (2018). Adaptive sampling and online learning in multi-robot sensor coverage with mixture of Gaussian processes. IEEE Int. Conf. Robotics Automation (ICRA), 6359–6364. doi:10.1109/ICRA.2018.8460473

CrossRef Full Text | Google Scholar

Ma, K.-C., Liu, L., Heidarsson, H. K., and Sukhatme, G. S. (2018). Data-driven learning and planning for environmental sampling. J. Field Robot. 35, 643–661. doi:10.1002/rob.21767

CrossRef Full Text | Google Scholar

Makarenko, A. A., Williams, S. B., Bourgault, F., and Durrant-Whyte, H. F. (2002). in IEEE/RSJ international conference on intelligent robots and systems, 1, An experiment in integrated exploration534–539. doi:10.1109/IRDS.2002.1041445(IROS)

CrossRef Full Text

Mann, H. B., and Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60. doi:10.1214/aoms/1177730491

CrossRef Full Text | Google Scholar

Martin, S., and Corke, P. (2014). Long-term exploration & tours for energy constrained robots with online proprioceptive traversability estimation. IEEE Int. Conf. Robotics Automation (ICRA), 5778–5785. doi:10.1109/ICRA.2014.6907708

CrossRef Full Text | Google Scholar

Mayuku, O., Surgenor, B. W., and Marshall, J. A. (2021). “A self-supervised near-to-far approach for terrain-adaptive off-road autonomous driving,” in IEEE international conference on robotics and automation (Xi’an, China: ICRA), 14054–14060. doi:10.1109/ICRA48506.2021.9562029

CrossRef Full Text | Google Scholar

McGhee, R. B., and Frank, A. A. (1968). On the stability properties of quadruped creeping gaits. Math. Biosci. 3, 331–351. doi:10.1016/0025-5564(68)90090-4

CrossRef Full Text | Google Scholar

Moravec, H., and Elfes, A. (1985). “High resolution maps from wide angle sonar,” in 1985 IEEE international conference on robotics and automation proceedings, 116–121. doi:10.1109/ROBOT.1985.1087316

CrossRef Full Text | Google Scholar

Noon, C. E., and Bean, J. C. (1993). An efficient transformation of the generalized traveling salesman problem. INFOR Inf. Syst. Operational Res. 31, 39–44. doi:10.1080/03155986.1993.11732212

CrossRef Full Text | Google Scholar

Noon, C. E. (1988). The generalized traveling salesman problem. Ann Arbor, MI: Ph.D. thesis, University of Michigan.

Google Scholar

O’Callaghan, S., Ramos, F. T., and Durrant-Whyte, H. (2009). Contextual occupancy maps using Gaussian processes. IEEE Int. Conf. Robotics Automation (ICRA), 1054–1060. doi:10.1109/ROBOT.2009.5152754

CrossRef Full Text | Google Scholar

O’Meadhra, C., Tabib, W., and Michael, N. (2019). Variable resolution occupancy mapping using Gaussian mixture models. IEEE Robot. Autom. Lett. 4, 2015–2022. doi:10.1109/LRA.2018.2889348

CrossRef Full Text | Google Scholar

Ossenkopf, M., Castro, G., Pessacg, F., Geihs, K., and De Cristóforis, P. (2019). “Long-Horizon Active SLAM system for multi-agent coordinated exploration,” in European conference on mobile robots (Prague, Czech Republic: ECMR), 1–6. doi:10.1109/ECMR.2019.8870952

CrossRef Full Text | Google Scholar

Papadakis, P. (2013). Terrain traversability analysis methods for unmanned ground vehicles: A survey. Eng. Appl. Artif. Intell. 26, 1373–1385. doi:10.1016/j.engappai.2013.01.006

CrossRef Full Text | Google Scholar

Pasolli, E., and Melgani, F. (2011). Gaussian process regression within an active learning scheme. IEEE Int. Geoscience Remote Sens. Symposium, 3574–3577. doi:10.1109/IGARSS.2011.6049994

CrossRef Full Text | Google Scholar

Prágr, M., Čížek, P., Bayer, J., and Faigl, J. (2019a). “Online incremental learning of the terrain traversal cost in autonomous exploration,” in Robotics: Science and systems, (RSS) (Freiburg im Breisgau, Germany). 1–10. doi:10.15607/RSS.2019.XV.040

CrossRef Full Text | Google Scholar

Prágr, M., Čížek, P., and Faigl, J. (2018). “Cost of transport estimation for legged robot based on terrain features inference from aerial scan,” in IEEE/RSJ international conference on intelligent robots and systems (IROS) (Prague, Czech Republic: IEEE), 1745–1750. doi:10.1109/IROS.2018.8593374

CrossRef Full Text | Google Scholar

Prágr, M., Čížek, P., and Faigl, J. (2019b). “Incremental learning of traversability cost for aerial reconnaissance support to ground units,” in 2018 modelling and simulation for autonomous systems (Prague, Czech Republic: MESAS), 412–421. doi:10.1007/978-3-030-14984-0_30

CrossRef Full Text | Google Scholar

Prudent, Y., and Ennaji, A. (2005). An incremental growing neural gas learns topologies. Int. Jt. Conf. Neural Netw. (IJCNN) 2, 1211–1216. doi:10.1109/IJCNN.2005.1556026

CrossRef Full Text | Google Scholar

Quann, M., Ojeda, L., Smith, W., Rizzo, D., Castanier, M., and Barton, K. (2020). Off-road ground robot path energy cost prediction through probabilistic spatial mapping. J. Field Robot. 37, 421–439. doi:10.1002/rob.21927

CrossRef Full Text | Google Scholar

Quigley, M., Conley, K., Gerkey, B. P., Faust, J., Foote, T., Leibs, J., et al. (2009). ICRA Workshop on Open Source Software, 1–6.Ros: An open-source robot operating system.

PubMed Abstract | Google Scholar

Ramos, F., and Ott, L. (2016). Hilbert maps: Scalable continuous occupancy mapping with stochastic gradient descent. Int. J. Rob. Res. 35, 1717–1730. doi:10.1177/0278364916684382

CrossRef Full Text | Google Scholar

Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian processes for machine learning. Adaptive computation and machine learning. Cambridge, Mass: MIT Press.

Google Scholar

Rényi, A. (1961). On measures of entropy and information. Berkeley Symposium Math. Statistics Probab., 547–561.

Google Scholar

Rhodes, C., Liu, C., and Chen, W.-H. (2020). “Informative path planning for gas distribution mapping in cluttered environments,” in IEEE/RSJ international conference on intelligent robots and systems (Las Vegas, NV: IROS), 6726–6732. doi:10.1109/IROS45743.2020.9341781

CrossRef Full Text | Google Scholar

Schultz, A. C., Adams, W., and Yamauchi, B. (1999). Integrating exploration, localization, navigation and planning with a common representation. Auton. Robots 6, 293–308. doi:10.1023/A:1008936413435

CrossRef Full Text | Google Scholar

Shi, Y., Wang, N., Zheng, J., Zhang, Y., Yi, S., Luo, W., et al. (2020). “Adaptive informative sampling with environment partitioning for heterogeneous multi-robot systems,” in IEEE/RSJ international conference on intelligent robots and systems (Las Vegas, NV: IROS), 11718–11723. doi:10.1109/IROS45743.2020.9341711

CrossRef Full Text | Google Scholar

Singh, A., Krause, A., Guestrin, C., Kaiser, W., and Batalin, M. (2007). “Efficient planning of informative paths for multiple robots,” in International joint conference on artifical intelligence, 2204–2211.

Google Scholar

Sofman, B., Lin, E., Bagnell, J. A., Cole, J., Vandapel, N., and Stentz, A. (2006). Improving robot navigation through self-supervised online learning. J. Field Robot. 23, 1059–1075. doi:10.1002/rob.20169

CrossRef Full Text | Google Scholar

Srinivas, N., Krause, A., Kakade, S., and Seeger, M. (2010). “Gaussian process optimization in the bandit setting: No regret and experimental design,” in Intl. Conf. International conference on machine learning (ICML) (Haifa, Israel, 1015–1022.

Google Scholar

Stachniss, C., Grisetti, G., and Burgard, W. (2005). “Information gain-based exploration using rao-blackwellized particle filters,” in Robotics: Science and systems, 1–8. doi:10.15607/RSS.2005.I.009

CrossRef Full Text | Google Scholar

Stelzer, A., Hirschmüller, H., and Görner, M. (2012). Stereo-vision-based navigation of a six-legged walking robot in unknown rough terrain. Int. J. Rob. Res. 31, 381–402. doi:10.1177/0278364911435161

CrossRef Full Text | Google Scholar

Vallvé, J., and Andrade-Cetto, J. (2015). Potential information fields for mobile robot exploration. Robotics Aut. Syst. 69, 68–79. doi:10.1016/j.robot.2014.08.009

CrossRef Full Text | Google Scholar

Viseras, A., Shutin, D., and Merino, L. (2019). Robotic active information gathering for spatial field reconstruction with rapidly-exploring random trees and online learning of Gaussian processes. Sensors 19, 1016. doi:10.3390/s19051016

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, J., and Englot, B. (2016). Fast, accurate Gaussian process occupancy maps via test-data octrees and nested Bayesian fusion. IEEE Int. Conf. Robotics Automation (ICRA), 1003–1010. doi:10.1109/ICRA.2016.7487232

CrossRef Full Text | Google Scholar

Wermelinger, M., Fankhauser, P., Diethelm, R., Krüsi, P., Siegwart, R., and Hutter, M. (2016). “Navigation planning for legged robots in challenging terrain,” in IEEE/RSJ international conference on intelligent robots and systems, 1184–1189. doi:10.1109/IROS.2016.7759199

CrossRef Full Text | Google Scholar

Yamauchi, B. (1997). A frontier-based approach for autonomous exploration. CIRA (IEEE), 146–151. doi:10.1109/CIRA.1997.613851

CrossRef Full Text | Google Scholar

Zlot, R., and Stentz, A. (2006). Market-based multirobot coordination for complex tasks. Int. J. Rob. Res. 25, 73–101. doi:10.1177/0278364906061160

CrossRef Full Text | Google Scholar

Keywords: mobile robot exploration, active learning, traversability, multi-legged robot, locomotion gait

Citation: Prágr M, Bayer J and Faigl J (2022) Autonomous robotic exploration with simultaneous environment and traversability models learning. Front. Robot. AI 9:910113. doi: 10.3389/frobt.2022.910113

Received: 31 March 2022; Accepted: 23 August 2022;
Published: 05 October 2022.

Edited by:

Luis Rodolfo Garcia Carrillo, New Mexico State University, United States

Reviewed by:

Arturo Gil Aparicio, Miguel Hernández University of Elche, Spain
Hengameh Mirhajianmoghadam, New Mexico State University, United States

Copyright © 2022 Prágr, Bayer and Faigl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Miloš Prágr, cHJhZ3JtaTFAZmVsLmN2dXQuY3o=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.