
Correction

Front. Future Transp., 11 December 2023
Sec. Connected Mobility and Automation

Corrigendum: Optimizing trajectories for highway driving with offline reinforcement learning

Branka Mirchevska1*, Moritz Werling2 and Joschka Boedecker1,3
  • 1Department of Computer Science, University of Freiburg, Freiburg, Germany
  • 2BMW Group, Munich, Germany
  • 3IMBIT // BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany

A Corrigendum on
Optimizing trajectories for highway driving with offline reinforcement learning

by Mirchevska B, Werling M and Boedecker J (2023). Front. Future Transp. 4:1076439. doi: 10.3389/ffutr.2023.1076439

In the published article, there was an error in Algorithm 2: a_lo should be a_latp.

A correction has been made to 3 Approach, 3.2 Decision making. This sentence previously stated:

“π_θ(s) = (a_tv, a_latd, a_lond, a_lo).”

The corrected sentence appears below:

“π_θ(s) = (a_tv, a_latd, a_lond, a_latp).”

In the published article, there was an error in Algorithm 2: a_lo should be a_latp.

A correction has been made to 3 Approach, 3.2 Decision making. This sentence previously stated:

“t = generate_traj(s, a_tv, a_latd, a_lond, a_lo).”

The corrected sentence appears below:

“t = generate_traj(s, a_tv, a_latd, a_lond, a_latp).”
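
For readers mapping the corrected notation onto an implementation, the Python sketch below bundles the four policy outputs into a named tuple and hands them to a trajectory generator, as in the corrected sentence above. It is illustrative only: the semantic glosses in the comments are assumptions inferred from the symbol names, and select_trajectory is a hypothetical wrapper, not code from the article.

```python
from typing import NamedTuple

class Action(NamedTuple):
    """Corrected policy output pi_theta(s) = (a_tv, a_latd, a_lond, a_latp)."""
    a_tv: float    # assumed: target velocity
    a_latd: float  # assumed: lateral trajectory duration
    a_lond: float  # assumed: longitudinal trajectory duration
    a_latp: float  # assumed: target lateral position (previously mistyped as a_lo)

def select_trajectory(policy, s, generate_traj):
    # Hypothetical glue code: the policy returns the four continuous action
    # components, which are then passed on exactly as in the corrected call
    # t = generate_traj(s, a_tv, a_latd, a_lond, a_latp).
    a = Action(*policy(s))
    return generate_traj(s, a.a_tv, a.a_latd, a.a_lond, a.a_latp)
```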

A correction has been made to 4 MDP Formalization, 4.3 Reward. This sentence previously stated:

“For the first objective, not causing collisions and remaining within the road boundaries, we define an indicator ind_f signaling when the agent has failed in the following way:”

The corrected sentence appears below:

“For the first objective, not causing collisions and remaining within the road boundaries, we define an indicator f signaling when the agent has failed in the following way:”

A correction has been made to 4 MDP Formalization, 4.3 Reward. This equation previously stated:

ind_f = { 1, if the agent has failed; 0, otherwise }   (1)

The corrected equation appears below:

f = { 1, if the agent has failed; 0, otherwise }   (1)

A correction has been made to 4 MDP Formalization, 4.3 Reward. This equation previously stated:

ind_v = { 1, if v_lon < v_des; 0, otherwise }   (3)

The corrected equation appears below:

v_s = { 1, if v_lon < v_des; 0, otherwise }   (3)

A correction has been made to 4 MDP formalization, 4.3 Reward. This equation previously stated:

r(s, a) = ind_f · (−0.5)
        + (1 − ind_f) · [ ind_v · (1 − δv/v_des) + (1 − ind_v)
        + ind_jlon · p_jlon · sq(j_lon_a / j_lon_max) + (1 − ind_jlon) · p_jlon
        + ind_jlat · p_jlat · sq(j_lat_a / j_lat_max) + (1 − ind_jlat) · p_jlat ]   (7)

The corrected equation appears below:

r(s, a) = f · (−0.5)
        + (1 − f) · [ v_s · (1 − δv/v_des) + (1 − v_s)
        + ind_jlon · p_jlon · sq(j_lon_a / j_lon_max) + (1 − ind_jlon) · p_jlon
        + ind_jlat · p_jlat · sq(j_lat_a / j_lat_max) + (1 − ind_jlat) · p_jlat ]   (7)
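
As a reading aid for the corrected Eq. 7, here is a minimal Python sketch. It follows the grouping reconstructed above; the −0.5 failure penalty, the reading of sq(x) as x², and treating the jerk indicators and weights as plain inputs are assumptions, since the corrigendum only defines f (Eq. 1) and v_s (Eq. 3).

```python
def reward_eq7(f: float, v_s: float, ind_jlon: float, ind_jlat: float,
               delta_v: float, v_des: float,
               j_lon_a: float, j_lon_max: float,
               j_lat_a: float, j_lat_max: float,
               p_jlon: float, p_jlat: float) -> float:
    """Sketch of the corrected Eq. 7; grouping and signs are assumptions.

    All indicators (f, v_s, ind_jlon, ind_jlat) are expected as 0.0 or 1.0 and
    are passed in directly, because only f and v_s are defined in this
    corrigendum.
    """
    sq = lambda x: x * x  # assumed interpretation of the sq(.) term
    vel_term = v_s * (1.0 - delta_v / v_des) + (1.0 - v_s)
    jlon_term = ind_jlon * p_jlon * sq(j_lon_a / j_lon_max) + (1.0 - ind_jlon) * p_jlon
    jlat_term = ind_jlat * p_jlat * sq(j_lat_a / j_lat_max) + (1.0 - ind_jlat) * p_jlat
    return f * (-0.5) + (1.0 - f) * (vel_term + jlon_term + jlat_term)
```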

A correction has been made to 6 Experiments and results, 6.3 Smoothness analysis. This equation previously stated:

r(s, a) = f · (−0.5)
        + (1 − f) · [ v_s · (1 − δvel/v_des) + (1 − v_s)
        + j_s · j_rw · (j_cost_a / j_cost_ub) + (1 − j_s) · j_rw ]   (8)

The corrected equation appears below:

r(s, a) = f · (−0.5)
        + (1 − f) · [ v_s · (1 − δv/v_des) + (1 − v_s)
        + j_s · j_rw · (j_cost_a / j_cost_ub) + (1 − j_s) · j_rw ]   (8)

A correction has been made to 6 Experiments and results, 6.3 Smoothness analysis. This sentence previously stated:

“The results indicate that the best performance in terms of jerk is yielded when the reward function from Eq. 8 is used and when j_w is assigned a value around 2. However, is important to note that the performance is not very sensitive to the value chosen for j_w and performs similarly well in a range of values. It is interesting to note that when the value for j_w is too low, e.g., 0.5, the agent deems the jerk-related reward component less significant which results in higher jerk values.”

The corrected sentence appears below:

“The results indicate that the best performance in terms of jerk is yielded when the reward function from Eq. 8 is used and when j_rw is assigned a value around 2. However, is important to note that the performance is not very sensitive to the value chosen for j_rw and performs similarly well in a range of values. It is interesting to note that when the value for j_rw is too low, e.g., 0.5, the agent deems the jerk-related reward component less significant which results in higher jerk values.”
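
Eq. 8 replaces the two per-axis jerk terms of Eq. 7 with a single jerk-cost term weighted by j_rw. The hypothetical sketch below uses the same assumptions as the Eq. 7 sketch (grouping, signs), with j_rw defaulting to the value of about 2 reported in the quoted analysis; the exact definition of the jerk indicator j_s is given in the original article, not here.

```python
def reward_eq8(f: float, v_s: float, j_s: float,
               delta_v: float, v_des: float,
               j_cost_a: float, j_cost_ub: float,
               j_rw: float = 2.0) -> float:
    """Sketch of the corrected Eq. 8 as reconstructed above (assumed grouping).

    f, v_s and j_s are the failure, velocity and jerk indicators (0.0 or 1.0);
    j_rw ~ 2.0 follows the quoted smoothness analysis.
    """
    vel_term = v_s * (1.0 - delta_v / v_des) + (1.0 - v_s)
    jerk_term = j_s * j_rw * (j_cost_a / j_cost_ub) + (1.0 - j_s) * j_rw
    return f * (-0.5) + (1.0 - f) * (vel_term + jerk_term)
```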

A correction has been made to Appendix, Trajectory generation details. This equation previously stated:

traj_lon_p = b_0 + b_1·t + b_2·t^2 + b_3·t^3 + b_4·t^4, where t = 0.0, dt, 2dt, …, a_lonp·dt   (A1)

The corrected equation appears below:

traj_lon_p = b_0 + b_1·t + b_2·t^2 + b_3·t^3 + b_4·t^4, where t = 0.0, dt, 2dt, …, a_lond   (A1)

A correction has been made to Appendix, Trajectory generation details. This equation previously stated:

traj_lat_p = c_0 + c_1·t + c_2·t^2 + c_3·t^3 + c_4·t^4 + c_5·t^5, where t = 0.0, dt, 2dt, …, a_latp·dt   (A2)

The corrected equation appears below:

traj_lat_p = c_0 + c_1·t + c_2·t^2 + c_3·t^3 + c_4·t^4 + c_5·t^5, where t = 0.0, dt, 2dt, …, a_latd   (A2)
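
Eqs. A1 and A2 sample a quartic (longitudinal) and a quintic (lateral) polynomial on the grid t = 0.0, dt, 2dt, …. The NumPy sketch below shows only that evaluation; how the coefficients are fitted to boundary conditions is not covered by the corrigendum, and the coefficients and durations in the usage lines are made up for illustration.

```python
import numpy as np

def sample_polynomial_traj(coeffs, duration, dt=0.1):
    """Evaluate sum_k coeffs[k] * t**k at t = 0.0, dt, 2*dt, ..., duration.

    Sketch of Eqs. A1/A2: pass b0..b4 for the longitudinal (quartic) profile
    and c0..c5 for the lateral (quintic) one.
    """
    t = np.arange(0.0, duration + 1e-9, dt)                 # sample grid up to the trajectory duration
    powers = np.vander(t, N=len(coeffs), increasing=True)   # columns 1, t, t^2, ...
    return powers @ np.asarray(coeffs)

# Hypothetical usage with made-up coefficients and durations:
traj_lon_p = sample_polynomial_traj([0.0, 10.0, 0.5, -0.01, 0.0], duration=3.0)        # Eq. A1
traj_lat_p = sample_polynomial_traj([0.0, 0.0, 0.3, -0.05, 0.001, 0.0], duration=2.0)  # Eq. A2
```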

The authors apologize for these errors and state that this does not change the scientific conclusions of the article in any way. The original article has been updated.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Keywords: reinforcement learning, trajectory optimization, autonomous driving, offline reinforcement learning, continuous control

Citation: Mirchevska B, Werling M and Boedecker J (2023) Corrigendum: Optimizing trajectories for highway driving with offline reinforcement learning. Front. Future Transp. 4:1320940. doi: 10.3389/ffutr.2023.1320940

Received: 13 October 2023; Accepted: 09 November 2023;
Published: 11 December 2023.

Approved by:

Frontiers Editorial Office, Frontiers Media SA, Switzerland

Copyright © 2023 Mirchevska, Werling and Boedecker. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Branka Mirchevska, mirchevb@informatik.uni-freiburg.de
