This paper proposes a Bayesian surprise learning algorithm that internally motivates the cognitive radar to estimate a target's state (i.e., velocity, distance) from noisy measurements and make decisions to reduce the estimation error gradually. The work exhibits how the sensor learns from experiences, anticipates future responses, and adjusts its waveform parameters to achieve informative measurements based on the Bayesian surprise.
For a simple vehicle-following scenario where the radar measurements are generated from linear Gaussian state-space models, the article adopts the Kalman filter to carry out state estimation. According to the information within the filter's estimate, the sensor intrinsically assigns a surprise-based reward value to the immediate past action and updates the value-to-go function. Through a series of hypothetical steps, the cognitive radar considers the impact of future transmissions for a prescribed set of waveforms–available from the sensor profile library–to improve the estimation process.
Numerous experiments investigate the performance of the proposed design for various surprise-based reward expressions. The robustness of the proposed method is compared to the state-of-the-art for practical and risky driving situations. Results show that the reward functions inspired by estimation credibility measures outperform their competitors when one-step planning is considered. Simulation results also indicate that multiple-step planning does not necessarily lead to lower error, particularly when the environment changes abruptly.