- 1Department of Electrical and Computer Engineering, Volgenau School of Engineering, George Mason University, Fairfax, VA, United States
- 2Department of Advanced Manufacturing and Robotics, Peking University, Beijing, China
The rapidly growing field of aquatic bio-inspired soft robotics takes advantage of the bio-mechanisms of underwater animals, with applications foreseen in domains such as underwater exploration, environmental monitoring, search and rescue, and oil-spill detection. Improving the maneuverability and locomotion of such robots calls for designs with a higher level of biomimicry, reduced-order models of the complex continuum elastic dynamics, and robust nonlinear controllers. This paper presents a novel design of a soft robotic fish actively actuated by a newly developed kind of artificial muscle, the super-coiled polymer (SCP), and passively propelled by a caudal fin. Besides the advantages of SCP in terms of flexibility, cost and fabrication duration, this design benefits from the SCP’s significantly quicker recovery due to water-based cooling. The soft robotic fish is approximated by a 3-link representation and mathematically modeled from its geometric and dynamic perspectives to constitute the combined system dynamics of the SCP actuators and hydrodynamics of the fish, thus realizing two-dimensional fish-swimming motion. The nonlinear dynamic model of the SCP-driven soft robotic fish, which ignores uncertainties and unmodeled dynamics, necessitates the development of robust/intelligent control, which serves as the motivation to mimic not only the bio-mechanisms but also the cognitive abilities of a real fish. Therefore, a learning-based control design is proposed to meet the yaw control objective, and its performance in path following via various swimming patterns is studied. The proposed learning-based control design employs the deep deterministic policy gradient (DDPG) reinforcement learning algorithm to train the agent.
To overcome the limitations of sensing the soft robotic fish’s states by designing complex embedded sensors, overhead image-based observations are generated and input to convolutional neural networks (CNNs) to deduce the curvature dynamics of the soft robot. A linear quadratic regulator (LQR) based multi-objective reward is proposed to reinforce the learning feedback of the agent during training. The DDPG-based control design is simulated and the corresponding results are presented.
1 Introduction
The nascent field of bio-inspired robotics has gained considerable popularity over the past two decades, with numerous designs and developments contributed to the community (Pfeifer et al., 2007; Kim et al., 2013; Shi et al., 2015; Laschi et al., 2016; Christianson et al., 2019; Olsen and Kim, 2019), envisioning applications in domains such as environmental monitoring, deep-sea exploration, search and rescue, and disaster response (Morgansen et al., 2007; Zheng Chen et al., 2010; Marchese et al., 2014; Phamduy et al., 2015). Taking advantage of the natural biological structures, functions, and motions of aquatic animals helps us create underwater robots that are efficient in energy and locomotion and possess agile maneuverability, for a diverse range of purposes. Our research focuses on developing a biomimetic underwater soft robotic fish that can self-learn its locomotion to achieve different goals such as regulating its angle of orientation and adapting to variable swimming speeds (Rajendran and Zhang, 2018), which eventually serve as decomposed control tasks for high-level control objectives such as traversing along a planned trajectory and studying fish swarming behavior like schooling and shoaling.
Biological fish that employ the body/caudal fin for propulsion typically adopt one of the following swimming styles: carangiform, sub-carangiform, anguilliform, or thunniform (Videler, 1993). Most traditional robotic fish prototypes designed in the past comprise two or more serially connected structures (Wen et al., 2012; Zhong et al., 2017), whose coordinated discrete movements result in undulations mimicking one of these swimming styles. The bodies of these robots are constructed using rigid materials such as plastic, metal and glass-fiber (Raj and Thakur, 2016), which consequently increases the rigidity and mass of the robot. To overcome this limitation, over the past half-decade, researchers have been exploring the usage of soft materials (Lauder et al., 2011) such as silicone rubber/elastomer (Katzschmann et al., 2018), silicone prepolymer (Aubin et al., 2019) and silk hydrogel (Donatelli et al., 2018) to construct the body of the fish robot (Olsen and Kim, 2019). The adoption of such soft materials in the construction of the robotic fish greatly contributes towards mimicking the flexibility of the biological fish body, thus generating a continuous deformation and streamlined displacement of water.
Traditional actuators such as electrical motors and pneumatic/hydraulic cylinders, which are employed to realize fish undulations in the aforementioned multi-link robotic fish prototypes, offer a high output force/torque but are generally heavy and quite rigid, thus making fish robots less flexible. Hence, the use of soft actuators such as pneumatic artificial muscles (PAM), ionic polymer-metal composites (IPMC) (Chen, 2017; Olsen and Kim, 2019), dielectric elastomer actuators (Christianson et al., 2019), and super-coiled polymers (SCP) (Yip and Niemeyer, 2017; Rajendran and Zhang, 2018; Simeonov et al., 2018) is on the rise. Artificial muscles are not only slender, but also strong, flexible, lightweight, and compliant in a manner analogous to biological muscles. This offers appealing advantages to fish robots in terms of flexibility, maneuverability, propulsive energy efficiency and the ability to precisely mimic the biological fish from its anatomical perspective.
Over the past three decades, researchers from a wide range of disciplines have performed numerous visual experiments and numerical analyses to study and model the various swimming styles in different species of fish (Triantafyllou et al., 2000; Lauder, 2015; Webb and Gerstner, 2021). Most of the traditional models follow Lighthill’s elongated-body theory describing fish locomotion as traveling waves (Lighthill, 1971), or employ a mathematical dynamic model derived via system identification. As contemporary research focuses on mimicking the physical and biological structure and function of aquatic animals using soft materials, the need for a precise dynamic model for motion prediction and controller design is increasing as well. Nevertheless, obtaining such a model is becoming correspondingly difficult due to the continuum dynamics and high dimensionality involved in soft robots.
Although various classical and modern control techniques have been analytically researched and experimentally developed, the nonlinearity of contemporary soft robots keeps rising. Several robotic fish prototypes adopt closed-loop control techniques such as PID control (Yu et al., 2004; Berlinger et al., 2021), PI control (Zhang et al., 2015a), central pattern generator control (Jeong et al., 2011), pre-trained neural networks (Thuruthel et al., 2019), and robust control (Zhang et al., 2015b) to improve the performance of locomotion. Others employ open-loop control techniques, whereby a predefined swimming profile is generated to perform a coded set of actions (lookup table), an approach predominantly used in cases of complex or highly nonlinear robotic fish dynamic models (Yu and Wang, 2005; Korkmaz et al., 2012). However, in order to address the problems of high nonlinearity and intrinsically infinite system dimension, researchers are looking into various present-day techniques in artificial intelligence (Rajendran and Zhang, 2018; Bhagat et al., 2019; Thuruthel et al., 2019), more specifically behavior-based or adaptive machine learning-based control.
Our previous work investigated the performance of SCP actuators while submerged in water and the compatibility of using SCP in a simple robotic fish model (Rajendran and Zhang, 2017). SCP, a recently developed artificial muscle actuator fabricated from silver-plated nylon threads, is lightweight, flexible, and strong, with a high power-to-weight ratio (Yip and Niemeyer, 2017). Our study also showed through simulation that speed control of a one-dimensional robotic fish could be achieved with SCP actuators using reinforcement learning (Rajendran and Zhang, 2018; Sutton and Barto, 2018). Nevertheless, besides employing a sparsely discretized state space, our previous model was limited to one dimension, which is too simplified to mimic the biological fish and study the swimming motion. The discretization necessitated a lookup table comprising all the state-action combinations. However, since physical robots have continuous action and state spaces, the use of the Q-learning algorithm (Watkins and Dayan, 1992) in such a continuous environment would require an enormous lookup table, drastically increasing the number of computations.
In this paper, we propose a novel approach to designing a soft robotic fish using antagonistically arranged SCP artificial muscle actuators. The soft robotic fish is modeled geometrically as a three-link model combined with the antagonistic configuration of the SCP muscles, and modeled dynamically by incorporating the SCP actuator dynamics (Rajendran and Zhang, 2017; Yip and Niemeyer, 2017) with the hydrodynamic forces (Wang et al., 2015) to describe its two-dimensional swimming motion. To overcome the predicament of having a highly nonlinear and multi-dimensional control system, and in consideration of control computation times, this paper proposes a learning-based controller design approach for the dynamically modeled soft robotic fish using an improved, continuous reinforcement learning method, namely the deep deterministic policy gradient (DDPG) algorithm (Lillicrap et al., 2015), which adopts an actor network to select an action given a state, and a critic network to evaluate the chosen action. To exemplify the use of DDPG with the dynamic model, this paper investigates the closed-loop control of the swimming orientation and path following of the soft robotic fish on a 2D plane.
This paper is organized as follows. Section 2 gives a brief overview of the experimental performance of SCP muscles when submerged in water. Section 3 presents the design of a three-link soft robotic fish. Section 4 illustrates and elucidates the geometric and dynamic models of the robotic fish. Section 5 proposes the deep deterministic policy gradient learning-based control design for the soft robotic fish to self-learn its swimming profiles in order to regulate its orientation and achieve path following. Simulation results are presented in Section 6 to validate the proposed controller design. Finally, concluding remarks are provided in Section 7.
2 Preliminary Background
Our previous work presented a two-link flapping prototype driven by an SCP muscle actuator and investigated its performance by submerging and testing the entire two-link prototype in ordinary non-deionized non-conductive tap water at room temperature (Rajendran and Zhang, 2017). As a proof of concept of the SCP actuation, we conducted the experiment using one 2-ply muscle as shown in Figure 1A, which was attached to one side of the two links, connecting both ends at a spacing of 2.5 cm from the links. Initially, only a small deformation (less than 0.5%) was observed in the SCP actuators when immersed in water. We conjecture that this results from the fast heat dissipation in water, which causes the muscle to hardly contract. To overcome this problem, the muscle was coated with silicone conformal spray along with a layer of siliconized acrylic caulk as shown in Figure 1B, and a higher excitation voltage (2 V per centimeter of the muscle) was applied. This resulted in a deformation of around 1%, causing the flap angle to change by approximately 16 degrees. Moreover, the time taken for the flap to return to its original position was around 2 s on average, which is five times faster than when tested in air. From the results, it was evident that the recovery speed of the SCP actuator was significantly improved when tested in water. However, the maximum attainable flap angle became smaller in water. Also, a higher voltage had to be applied to the SCP actuator, thus consuming more power. These observations point to a design trade-off between actuation/recovery speed and energy consumption when using enhanced SCP actuators for underwater robots like robotic fish. With the proposed antagonistic design and muscle contraction in alternating directions, fish-like swimming is achievable with the SCP actuators.
FIGURE 1. SCP artificial muscles (Rajendran and Zhang, 2017). (A) One 2-ply SCP muscle coated with silicone and acrylic caulk; (B) three 2-ply SCP muscles twined together.
Following this, as part of a phased approach to developing reinforcement learning-based control for the soft robotic fish, a foundational Q-learning (Watkins and Dayan, 1992) based controller was designed and simulated to control the speed of a three-link robotic fish with discretized state and action spaces (Rajendran and Zhang, 2018). The robotic fish was restricted to one-dimensional locomotion, and the agent was trained until the Frobenius norm of the difference between the current and previous Q-tables fell below a threshold. We observed from the simulation results that the robotic fish followed the learned swimming profile and regulated the speed to the reference value with a very small speed control error. Eventually, the averaged acceleration became zero, thus maintaining a quasi-steady-state forward swimming velocity. Another interesting observation was that the agent forcefully went to its resting state, i.e., all actuators at rest, in order to lower the speed when it exceeded the desired velocity. Likewise, with different desired velocities, we found a difference in the flapping frequency and amplitude. Considering the coarse scale of discretization, we consider that the learning-based speed control design succeeded in the simulation example, which leaves scope for designing advanced learning-based controllers for robots with continuous action and state spaces.
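The stopping rule described above can be illustrated with a minimal tabular Q-learning sketch. It is not the controller of (Rajendran and Zhang, 2018): the toy environment `toy_speed_step`, the five speed levels, the two actions, and every numeric value are hypothetical and serve only to show the Frobenius-norm check between consecutive Q-tables.

```python
import math
import random


def frobenius_diff(q_new, q_old):
    """Frobenius norm of the difference between two Q-tables (lists of rows)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_new, row_old in zip(q_new, q_old)
                         for a, b in zip(row_new, row_old)))


def toy_speed_step(state, action, n_states=5, target=2):
    """Hypothetical environment: action 1 (flap) raises the speed level,
    action 0 (rest) lowers it; the reward penalizes the speed error."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    return next_state, -abs(next_state - target)


def q_learning(step, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.2,
               tol=1e-4, sweep_len=200, max_sweeps=2000, seed=0):
    """Epsilon-greedy Q-learning that stops once the Q-table changes by less
    than `tol` (in Frobenius norm) over one sweep of `sweep_len` steps."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    sweeps_run = 0
    for _ in range(max_sweeps):
        q_prev = [row[:] for row in q]
        s = rng.randrange(n_states)
        for _ in range(sweep_len):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)          # explore
            else:
                a = max(range(n_actions), key=lambda k: q[s][k])  # exploit
            s_next, r = step(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
        sweeps_run += 1
        if frobenius_diff(q, q_prev) < tol:
            break
    return q, sweeps_run


q_table, sweeps = q_learning(toy_speed_step, n_states=5, n_actions=2)
policy = [max(range(2), key=lambda k: row[k]) for row in q_table]
```

In this toy setting the greedy policy flaps below the target speed level and rests above it, which mirrors the resting behavior noted above.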
3 Design of a 3-Link Soft Robotic Fish
The design of our soft robotic fish, shown in Figure 2, is inspired by the natural and biological structure of the Tilapia cichlid fish species, which is specifically chosen to moderate the amount of volumetric material in the construction of the soft robotic fish body, and to build a lighter robot for greater maneuverability. The entire 3D model of the fish is designed using freeform modeling in Autodesk Inventor, by tracing the front, side and top views of the cichlid fish as shown in Figures 3A–C, to maintain the shape of a streamlined body. Two symmetric molds are designed based on the generated CAD fish model and then 3D printed using PLA filament as shown in Figure 3D. Ecoflex 00–20 silicone rubber by Smooth-On is then cast in these molds with a curing period of 4 h.
FIGURE 2. Soft robotic fish with passive caudal fin, bundled SCP actuator and pole extensions attached.
FIGURE 3. Soft robotic fish design components. (A–C) Illustration of the robotic fish CAD design, from left to right: front, side and top views (Rajendran and Zhang, 2018); (D) 3D-printed fish molds (Rajendran and Zhang, 2018); (E) 3-link hinged attachment.
Once the silicone rubber bodies are cured, three links, which form the skeleton of the fish and provide rigidity to the robot’s soft body during actuation, are designed and 3D printed. The three links are attached in series using the hinges on the links as shown in Figure 3E, with straightened steel paper clips inserted to serve as pivots. To form the electrical connections, steel crimps and copper tapes are attached around the poles on both sides of the links. The poles on the first and third links are connected together to form the common ground terminal. Long flexible wires are connected to the remaining four poles on the second link, and one wire to the ground terminal, resulting in five wires that exit the robot.
To increase the propulsion efficiency of the robot, a truncated flat-type passive caudal fin is attached close to link three using a flexible silicone rubber adhesive. This fin is cast in a 3D designed and printed shallow mold, using the same silicone rubber material. Within 12 min of the material being cast, thin 3D printed semi-flexible rods which mimic the fin rays in a caudal fin are placed in a growing fashion in the mold, so that the fin rays are submerged, thus forming a semi-flexible caudal fin once cured. Two pole extensions are attached on the newer version of our soft robotic fish in order to provide more room for the bundled SCP actuator, consequently allowing more deformation in the actuator and a higher deflection of the tail. The pole extensions can also house multiple actuators in parallel.
4 3-Link Robotic Fish Model
The soft robotic fish is modeled from its geometrical and dynamical perspectives. In this paper, the soft robotic fish is constrained to a planar swimming motion, thus fixing its altitude.
4.1 Geometric Model
The geometry of the 3-link fish robot with the artificial muscle actuators attached, illustrated in Figure 4A, is defined with respect to the soft robotic fish’s body or local reference frame
where di is the deformation ratio between the current and original resting length of a muscle mi with i ∈ {1, 2, 3, 4}, and [[ (⋅) ]] denotes the Iverson bracket such that [[ (condition) ]] = 1 when the condition is true and 0 otherwise (Knuth, 1992). The coordinated actuation of these SCP muscles changes their lengths, consequently causing flapping movements of the links l1 and/or l3 with respect to link l2. The angles formed due to the rotations of links l1 and l3 around joints j1 and j2 are denoted by the flap or deflection angles
where δi2 and δi3 are Kronecker delta functions, and i represents the current muscle which is activated. From past research conducted by fish biologists and roboticists, a maximum oscillatory amplitude by a flap angle of 25° is adequate (Zhong et al., 2017) to achieve a considerable swimming speed of the robotic fish, and is easily achieved in the aforementioned geometric model with a deformation of an SCP muscle reaching as low as 2.5% or di = 0.025 (Rajendran and Zhang, 2017; Rajendran and Zhang, 2018), provided that the muscles are placed close to the links unlike the experimental prototype described in Section 2.
4.2 Dynamic Model
The schematic of the soft robotic fish along with relevant reference frames and variables that describe the motion of the robot is illustrated in Figure 4B. The inertial or stationary frame of reference is denoted by
The entire dynamics of the soft robotic fish driven by artificial muscles is modeled using two subsystems. The first subsystem comprises the thermo-electrical and thermo-mechanical dynamics of the SCP muscle actuators, which takes in the actuating voltage potentials and outputs the deformations in the muscles’ lengths (Yip and Niemeyer, 2017). The system input vector is given by
where Mm is the mass of the SCP muscle actuator, λ is the absolute thermal conductivity, Rm is the electrical resistance of the actuator, Cth is the coefficient of thermal mass,
where bm is the damping coefficient, cm is the thermal constant and km is the mean stiffness constant of the SCP actuator.
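As an illustration of the thermo-electrical part of the first subsystem, the sketch below integrates a first-order thermal model of the form Cth dT/dt = V²/Rm − λ(T − Tamb), which is the lumped form used by Yip and Niemeyer (2017). Whether this paper's equation has exactly this form is an assumption, and all numeric values are hypothetical; they are chosen only so that Cth/λ gives the roughly 0.8 s time constant reported for SCP in water.

```python
import math


def simulate_temperature(voltage, r_m, c_th, lam, t_amb, t_end, dt=1e-3):
    """Forward-Euler integration of the assumed thermal model
    c_th * dT/dt = voltage**2 / r_m - lam * (T - t_amb)."""
    temp = t_amb
    history = [temp]
    for _ in range(int(round(t_end / dt))):
        joule_power = voltage ** 2 / r_m              # electrical heating
        cooling = lam * (temp - t_amb)                # loss to the surrounding water
        temp += dt * (joule_power - cooling) / c_th
        history.append(temp)
    return history


# Hypothetical parameters (not taken from the paper's Table 1).
VOLTAGE, R_M, LAM, C_TH, T_AMB = 5.0, 10.0, 0.5, 0.4, 25.0
TAU = C_TH / LAM                                      # thermal time constant, 0.8 s here

trace = simulate_temperature(VOLTAGE, R_M, C_TH, LAM, T_AMB, t_end=TAU)
steady_state_rise = VOLTAGE ** 2 / (R_M * LAM)        # temperature rise as t -> infinity
rise_fraction = (trace[-1] - T_AMB) / steady_state_rise
```

After one time constant the simulated temperature has covered about 63% of its steady-state rise, which is the usual first-order behavior and explains why a short time constant in water speeds up both heating and recovery.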
The deformed lengths of the muscles are used to derive the soft robotic fish’s profile or discretized curvature in its body frame using the 3-link geometric model as given in Eqs 1–3. Consequently, the joint angles form the input to the second subsystem, which comprises the planar positional dynamics and hydrodynamics of the robotic fish. The states of the second subsystem are collectively given by the vector
where Mf is the mass of the robotic fish, Mx and My are the added masses along the x and y directions respectively, Jz is the mass moment of inertia of the robotic fish about the z axis, Fx and Fy are the forces acting along the x and y directions in the body frame, and τz is the moment or torque about the z axis. These forces and moment are expressed as
where
Here, KD is the drag coefficient of the soft robotic fish body,
The aforementioned soft robotic fish dynamics is approximated as a simplified three-link model, which ignores the fluid-structure interactions but considers the hydrodynamic forces on the robotic fish in its dynamic model. The fish prototype presents its own limitations, such as a bounded tail-flapping range due to the geometric constraints involving the SCPs, which restricts the range of undulations too. Additionally, the actuation frequency of the soft robotic fish is implicitly restricted by taking the SCP dynamics into consideration, whereby the SCP’s time constant is approximately 0.8 s when submerged in water (Rajendran and Zhang, 2017), thus bounding the upper actuation frequency to
5 Motion Planning of Soft Robotic Fish Using Learning-Based Control
This section aims at designing a learning-based controller to meet various motion planning control objectives of the soft robotic fish, which include 1) regulating the yaw angle θ and 2) path following via tracking given waypoints. Nevertheless, the consolidated dynamics of the various subsystems constituting the soft robotic fish model, as given in Eqs 4–18, are fairly complex and nonlinear, exhibit hysteresis, and are subject to the uncertainties usually present in the dynamics of actual systems, thus necessitating a robust nonlinear controller. To alleviate the challenges which mostly arise in designing a traditional nonlinear controller, this paper combines a contemporary reinforcement learning algorithm from the field of artificial intelligence and a customized framework to design a learning-based controller. In contrast to the simple Q-learning based approach employed in our previous work (Rajendran and Zhang, 2018), this paper adopts a much more sophisticated and efficient deep reinforcement learning algorithm called the deep deterministic policy gradient (DDPG) algorithm, which is compatible with continuous action and state spaces (Lillicrap et al., 2015). The following subsections describe the architecture of the learning framework consolidating the aforementioned soft robotic fish model with the learning environment, and give an overview of the DDPG reinforcement learning algorithm, the deployed reward function and the hyper-parameters.
5.1 Learning Framework and Architecture
5.1.1 Agent and Environment
The inherent cognitive realization of the soft robotic fish is characterized as a learning agent that takes in the current system state s obtained from feedback of the robot and outputs the best possible action a. The learning agent primarily consists of an actor deep neural network (DNN), which is iteratively trained using the DDPG learning algorithm. An action performed by the agent at any given time instant comprises the voltage potentials Vi applied to the SCP actuators mi, where i ∈ {1, 2, 3, 4}. The action vector follows the system input vector as defined before in the dynamic model in Section 4.2, which is collectively put as
5.1.2 Image-Based Observations
Foreseeing the experimental validation on the physical soft robotic fish, most of the states in s that are necessary for the agent to envision the robot’s pose can be obtained through feedback via electronic sensing, by embedding sensors such as an inertial measurement unit, accelerometer, and/or gyroscope. Obtaining the curvature of the soft robotic fish is equally indispensable for the agent to envision the robot’s profile; however, employing flex sensors or distributed sensing elements in/around the soft body has its own limitations. Flex sensors require a complex arrangement/construction to maximize the frictional and spatial contact between the sensor strip and the soft body, while distributed sensing elements such as pressure sensors not only limit the measurement to a finite set of discretized samples of the soft body profile, in contrast to its continuum curvature, but also require optimal sensor placement.
In order to overcome the above limitations and obtain the soft robotic fish’s continuous curvature incorporating the SCP actuators’ dynamics, this paper presents a novel state representation of the soft robot’s profile using grayscale images. These grayscale images are computationally generated such that they identically replicate the masked top view of the soft robotic fish, in order to speed up the training of the agent rather than depend on the visual processing/feedback from experiments on the robotic fish. First, as shown in Figure 5A, the three links of the fish are geometrically plotted using the joint angles
where ρ is the ratio between the maximum coordinates and required image size of dimensions p × q,
where
FIGURE 5. Sequential approach towards generating an image-based observation zp,q(t) of a sample soft robotic fish profile with
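The idea of a computationally generated top-view observation can be sketched as follows. This is a simplified illustration, not the paper's image-generation pipeline: the function names, the centering of link 2 at the origin, the sign convention of the joint angles, and the 0/255 pixel values are all assumptions, and no masking or body outline is drawn, only the three link centerlines.

```python
import math


def three_link_points(l1, l2, l3, angle1, angle3):
    """End points of the three links in a body frame centered on link 2.
    angle1 and angle3 are hypothetical names for the two joint angles."""
    j1 = (l2 / 2.0, 0.0)                               # joint between links 1 and 2
    j2 = (-l2 / 2.0, 0.0)                              # joint between links 2 and 3
    head = (j1[0] + l1 * math.cos(angle1), j1[1] + l1 * math.sin(angle1))
    tail = (j2[0] - l3 * math.cos(angle3), j2[1] - l3 * math.sin(angle3))
    return [head, j1, j2, tail]


def render_profile(points, p=33, q=33, samples=64):
    """Rasterize the polyline through `points` into a p x q grayscale image
    (list of rows), with 255 on the links and 0 elsewhere."""
    extent = max(max(abs(x), abs(y)) for x, y in points)
    rho = 2.0 * extent / (min(p, q) - 1)               # body-frame units per pixel
    image = [[0] * q for _ in range(p)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(samples + 1):
            t = k / samples
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = int(round(x / rho + (q - 1) / 2.0))
            row = int(round((p - 1) / 2.0 - y / rho))  # image rows grow downward
            image[row][col] = 255
    return image


straight = render_profile(three_link_points(1.0, 1.0, 1.0, 0.0, 0.0))
bent = render_profile(three_link_points(1.0, 1.0, 1.0, 0.0, 0.5))
```

A straight body lights only the middle row of the image, while a deflected tail spreads pixels away from it, which is the kind of spatial cue a CNN can pick up.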
5.1.3 DDPG Learning-Based Controller Design
The DDPG algorithm (Lillicrap et al., 2015), as illustrated in Figure 6 and elucidated in Algorithm 1, primarily employs a critic C and an actor A neural network. Due to the image-based observational input to the agent, the actor neural network is modeled as a combination of a convolutional neural network (CNN) and a DNN as shown in Figure 6. The algorithm inputs the grayscale image matrix zp,q(t) to the CNN and performs a sequential convolution on the image with a kernel or filter of size kf at a stride of length kl to extract the features from the image. The convolved image goes through a pooling layer, is fully flattened and concatenated with the rest of the state vector f(x, y∗), and is then fed to the actor DNN. Throughout the agent’s life span ttotal, which constitutes one training episode, the actor estimates the best action a at every time step ta that can be carried out in a given state s as per its most recently trained policy πf, i.e., the representation of the state-action mapping. An Ornstein-Uhlenbeck noise process of variance σ2 is added to the selected action to encourage global exploration while training. The agent performs the chosen action by executing the soft robotic fish dynamics as described in Eqs 4–18, stepping through a time interval of ts where ts ≪ ta, after which the environment returns a new state s′ and a reward r. These entities collectively establish a transition tuple ɛ = (s, a, r, s′) that is incrementally stored in a large dataset known as the experience replay buffer E. At every action time ta, a mini-batch Emb of nmb transitions is randomly sampled from E, and its targets are determined from the Bellman equation (Lillicrap et al., 2015). A mean-squared error loss between the target values and their estimates is determined and back-propagated through the critic network C. The propagated gradients of the updated critic network are then used to update the actor network. Recent target replicas of the actor and critic DNNs, A′ and C′, are retained to chase a set of temporarily fixed targets, thus encouraging convergence of the algorithm. The overall training lasts for N episodes, with a terminal condition based on a reward averaged over a set of the latest episodes.
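The supporting pieces of the training loop described above (exploration noise, replay buffer, Bellman targets, and target-network tracking) can be sketched in plain Python. This is a generic illustration following Lillicrap et al. (2015), not a reproduction of Algorithm 1: the class and function names, the `done` flag, the soft-update rate `tau`, and the default noise parameters are assumptions, and the neural networks themselves are left out.

```python
import math
import random
from collections import deque


class OrnsteinUhlenbeckNoise:
    """Zero-mean, temporally correlated exploration noise."""

    def __init__(self, size, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [0.0] * size

    def sample(self):
        # Euler-Maruyama step: pull toward zero, then add a Gaussian increment.
        self.state = [
            x - self.theta * x * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, n_mb):
        return random.sample(list(self.storage), n_mb)

    def __len__(self):
        return len(self.storage)


def bellman_targets(rewards, next_q_values, dones, gamma):
    """y = r + gamma * Q'(s', A'(s')), with no bootstrap on terminal steps."""
    return [r + gamma * (0.0 if d else q)
            for r, q, d in zip(rewards, next_q_values, dones)]


def soft_update(target_weights, online_weights, tau):
    """Move the target weights a fraction tau toward the online weights."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_weights, online_weights)]
```

In use, the noise sample is added to the actor output before the action is applied, each transition is pushed with `add`, and once the buffer holds at least `n_mb` transitions a mini-batch is drawn with `sample` to form the critic targets.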
Algorithm 1. Deep-Deterministic Policy Gradient Learning in Soft Robotic Fish
5.2 Reward Function
The shaping of the reward function plays an important role in training the agent. Given the high nonlinearity of the modeled soft robotic fish, this paper adopts a reward r based on a linear quadratic regulator (LQR) cost function given by
where η is a scaling factor, ye = y∗ − y is the tracking error of the system output, and Q and R are the weight matrices bringing in a trade-off between the system performances and control input efforts respectively.
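A quadratic reward of this kind can be sketched as below, assuming the common form r = −η (yeᵀ Q ye + uᵀ R u), where u denotes the control input (action) vector. The sign convention, the function names, and the numeric weights are assumptions made for illustration and may differ from the paper's exact expression.

```python
def quadratic_form(vector, matrix):
    """Compute v^T M v for a vector and a square matrix given as nested lists."""
    n = len(vector)
    return sum(vector[i] * matrix[i][j] * vector[j]
               for i in range(n) for j in range(n))


def lqr_reward(y_err, u, q_weight, r_weight, eta=1.0):
    """Negative scaled LQR cost: large tracking errors and large control
    efforts both lower the reward (assumed sign convention)."""
    return -eta * (quadratic_form(y_err, q_weight) + quadratic_form(u, r_weight))


# Hypothetical example: one tracked output (yaw error) and two control inputs.
Q = [[3.0]]
R = [[0.5, 0.0],
     [0.0, 0.5]]
reward = lqr_reward(y_err=[2.0], u=[1.0, 0.0], q_weight=Q, r_weight=R, eta=0.1)
```

Raising the entries of Q relative to R makes the agent favor tracking accuracy over energy use, which is the trade-off the weight matrices are meant to expose.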
5.3 Hyper-Parameters
Hyper-parameters play a significant role in the duration of training and in the accuracy of converging to a global optimum. These parameters include the learning rates of the critic αC and actor αA networks such that αC, αA ∈ (0, 1), whereby very small learning rates increase the chance of global exploration, hence decreasing the chances of settling in local optima. Other parameters are the size of the experience buffer |E|, which provides adequate sampling space; the size of the sampled mini-batch nmb, which is generally chosen as a power of 2 to favor computational efficiency; the reward discount factor γ, which denotes the significance of far rewards relative to near rewards; the variance of the noise process σ2, which controls the exploration; the number of episodes over which the reward is averaged; and the terminating criterion of the training pertaining to the averaged reward.
6 Simulation Results
This section presents the simulation results of two control tasks—yaw control and path following, to evaluate the performance of the proposed DDPG-based control of the soft robotic fish. The two control objectives serve as fundamentally decomposed control goals in high level control objectives such as path planning, schooling, shoaling, leader-following, etc. Table 1 shows the parameters applied in the simulations, which pertain to the environment, learning hyper-parameters, SCP muscles and fish dynamics. The thermo-electric and thermo-mechanical SCP muscle parameters follow (Rajendran and Zhang, 2017; Yip and Niemeyer, 2017; Rajendran and Zhang, 2018). While some of the training hyper-parameters adopt (Lillicrap et al., 2015), others are chosen by trial and error to expedite the convergence of the training by weighting the level of global exploration versus local exploitation. The fish dynamics parameters, however, are designed by envisioning the soft robotic fish and its expected planar motion comprising the hydrodynamic coefficients, and approximating the parameters of previously modeled robotic fish which exhibit similar motions (Marchese et al., 2014).
The system design parameters are selected considering the reasonable SCP dynamics in conjunction with the fish tail-flapping frequency, resulting in an action time step of ta = 0.5 s. The image observation parameters are chosen based on the performance of the CNN and in anticipation of the computational processing power of a hardware computer vision/image processor such as OpenMV, Pixy, and Raspberry Pi Cameras to generate image-based observations. Regardless of which camera is used in the experiments, all of them support a minimum capture rate of 60 frames per second (FPS), thus giving a wide window of time to determine the next action a given an observation s, and therefore deeming the proposed visual learning-based control algorithm realizable due to the considerable sampling time to = ta.
6.1 Yaw Control
The yaw control objective of the soft robotic fish aims at orienting the robot at a desired angle such that θ∗ ∈ [−π, π]. As this requires the agent to know both the current angle θ and the desired angle θ∗ as part of its observation s, the learning is subtly modified to reduce the dimension of the observation s for quicker convergence. Consequently, the observation comprises the difference between the current and desired angles such that the agent’s target remains θ∗ = 0 at all times, whereas the agent itself is randomly initialized to
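The reduced observation described in this subsection, the difference between the current and desired yaw angles, can be sketched as follows. Wrapping the difference back into the interval from −π to π is an assumption added here so that the error stays within the same range as θ∗; the function names are hypothetical.

```python
import math


def wrap_to_pi(angle):
    """Map any angle in radians to the equivalent angle in (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def yaw_error_observation(theta, theta_ref):
    """Yaw error fed to the agent, so that its target is always zero."""
    return wrap_to_pi(theta - theta_ref)


# A fish heading 170 degrees with a desired heading of -170 degrees is only
# 20 degrees away from its target, not 340 degrees.
error_deg = math.degrees(
    yaw_error_observation(math.radians(170.0), math.radians(-170.0)))
```

With this reduction a single trained policy that regulates the error to zero can serve any desired heading.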
The trained agent is then simulated to control the soft robotic fish, initialized at (xi, yi, θ) = (0, 0, −178°), to achieve a desired orientation of θ∗ = 0°. The control input u2 generated by the actor network is shown in Figure 7A and the corresponding change in the tail angle
FIGURE 7. Simulated result of yaw control of the robotic fish initialized at the origin with pose (xi, yi, θ) = (0, 0,−178°) and desired orientation θ∗ = 0°. (A) Control input u2 representing the voltages of the SCP muscles m3, m4; (B) the trajectory of the robotic fish turning from −178° to 0°; (C) the tail flap angle
The overall performance of the trained agent is evaluated by simulating the soft robotic fish for 60 s from initial orientations spaced at 10-degree intervals in the range (−180°, 180°), with the desired angle set to zero at all times. Two performance measures of yaw angle regulation are considered: 1) settling time, and 2) steady-state error. The settling time of each run is taken as the time instant at which terminalCondition is satisfied, and the results are plotted in Figure 8. The figure shows that the soft robotic fish takes only 20 s to rotate 180 degrees under the dynamics described in Eqs 4–18, and that the settling time increases with the difference between the current and desired orientation angles. The outcome is also slightly asymmetric, favoring negative values of the desired angle over positive values. This can be attributed to randomness in the algorithm, such as the initialization of the actor and critic network weights before training, the shift in the Q-value during training, and the dependence of convergence on the samples selected from the experience replay buffer. Prolonged training of the agent is recommended to mitigate this asymmetry by refining the convergence with minimal shift in the actor network’s weights.
FIGURE 8. Simulated result of the settling times in yaw control of the soft robotic fish initially oriented at zero degrees and targeted to swim at every angle spaced by 10 degrees in the range (−180°, 180°).
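The settling-time evaluation described above can be sketched as follows. This is an illustrative sketch only: the wrapped angle difference mirrors the error-based observation of Section 6.1, while the tolerance-band test stands in for terminalCondition, whose exact definition is not restated here. The names `wrap_angle`, `yaw_error`, `settling_time` and the parameter `tol` are assumptions.

```python
import math


def wrap_angle(angle_rad: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle_rad + math.pi) % (2.0 * math.pi) - math.pi


def yaw_error(theta: float, theta_des: float) -> float:
    """Wrapped difference between the current and desired yaw angles."""
    return wrap_angle(theta - theta_des)


def settling_time(times, errors, tol):
    """Return the first time after which |error| stays within `tol`.

    Assumption: a run counts as settled once the error enters the band
    [-tol, tol] and never leaves it for the rest of the record.
    Returns None if the run never settles.
    """
    settled_at = None
    for t, e in zip(times, errors):
        if abs(e) <= tol:
            if settled_at is None:
                settled_at = t      # candidate settling instant
        else:
            settled_at = None       # left the band, so reset the candidate
    return settled_at


if __name__ == "__main__":
    # Hypothetical error record sampled once per second.
    times = [0, 1, 2, 3, 4]
    errors = [1.0, 0.05, 0.5, 0.05, 0.02]
    print(settling_time(times, errors, tol=0.1))
```

Collecting `settling_time` over runs started at each initial angle would yield one point per angle, in the manner of Figure 8.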
The steady-state error in the angular orientation is shown in Figure 9, where the steady-state errors of the soft robotic fish at target angles spaced at 10-degree intervals in the range (−180°, 180°) are plotted as red squares. The error bars at each target angle represent the steady-state bounds caused by the flapping oscillations. Minimizing the angular velocity, or swinging motion, is essential to alleviate the hydrodynamic drag force, which reduces propulsive efficiency (Liu et al., 2008; Farideddin Masoomi et al., 2015). Throughout the range of target angles, the agent has learned to maintain a steady-state error within ±5 degrees satisfying
FIGURE 9. Simulated result of the steady state errors in yaw control of the soft robotic fish initially oriented at zero degrees and targeted to swim at every angle spaced by 10 degrees in the range (−180°, 180°), where error bars represent the steady state boundaries caused by the flapping oscillations.
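One way to obtain the quantities plotted in Figure 9 from a simulated error record is sketched below. The averaging window `window_s` and the function name `steady_state_stats` are assumptions for illustration; the text does not state how the steady-state portion of each run was selected.

```python
def steady_state_stats(times, errors_deg, window_s):
    """Mean, minimum and maximum yaw error over the final `window_s` seconds.

    The mean plays the role of the steady-state error (red squares), and the
    minimum and maximum play the role of the bounds caused by the flapping
    oscillations (error bars). `window_s` is a hypothetical choice.
    """
    t_end = times[-1]
    tail = [e for t, e in zip(times, errors_deg) if t >= t_end - window_s]
    mean = sum(tail) / len(tail)
    return mean, min(tail), max(tail)


if __name__ == "__main__":
    # Hypothetical record: a transient followed by a small flapping oscillation.
    times = [0, 1, 2, 3, 4, 5]
    errors_deg = [90.0, 30.0, 4.0, -2.0, 4.0, -2.0]
    mean, low, high = steady_state_stats(times, errors_deg, window_s=3)
    print(mean, low, high)
    # Check against the +/- 5 degree band reported in the text.
    print(low >= -5.0 and high <= 5.0)
```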
6.2 Path Following
Since the trained agent can successfully control the orientation of the soft robotic fish, this section demonstrates its ability to continuously follow a predefined path. The agent is tested rigorously by simulating the robotic fish as it follows a set of planar waypoints that are closely spaced, at distances proportional to its body length (BL), in order to observe the maneuvering range. In the first test, four waypoints are generated such that each is equidistant from the origin and from its preceding and succeeding waypoints. The robotic fish is initialized at the origin with the pose (xi, yi, θ) = (0, 0, 0°), and set to follow the waypoints numbered (w1, w2, w3, w4) in a cyclic manner. The target angle is determined at every action time step ta given by
FIGURE 10. Simulated result of the robotic fish following a path defined by (A) a cyclic set of four waypoints and (B) a line defined by the equation −xi + yi = 5.
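A common way to turn a waypoint list into a target angle at each action step is to take the bearing from the current position to the active waypoint. The sketch below illustrates that idea under stated assumptions: the bearing law via `atan2`, the acceptance radius `accept_radius`, and the function names are hypothetical and are not claimed to be the rule used in the simulations.

```python
import math


def bearing_to_waypoint(x, y, wx, wy):
    """Angle (rad) of the vector from the fish position (x, y) to waypoint (wx, wy)."""
    return math.atan2(wy - y, wx - x)


def next_waypoint_index(x, y, waypoints, idx, accept_radius):
    """Advance cyclically to the next waypoint once the active one is reached.

    `accept_radius` is a hypothetical switching distance, e.g. a fraction of
    the body length (BL).
    """
    wx, wy = waypoints[idx]
    if math.hypot(wx - x, wy - y) <= accept_radius:
        return (idx + 1) % len(waypoints)
    return idx


if __name__ == "__main__":
    # Four hypothetical waypoints placed symmetrically about the origin.
    waypoints = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
    idx = 0
    x, y = 0.0, 0.0
    print(math.degrees(bearing_to_waypoint(x, y, *waypoints[idx])))
    print(next_waypoint_index(1.0, 0.95, waypoints, idx, accept_radius=0.1))
```

The resulting bearing can then be passed to the yaw controller as the desired angle, since the agent regulates the difference between the current and desired orientation.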
Following this, a second test evaluates the agent’s ability to follow a line defined by the linear equation g1xi + g2yi + g3 = 0, with the soft robotic fish initialized at different poses (x, y, θ). At every action time step ta, the cross-track error (CTE), which is defined as the normal distance between the center of the fish and the target line, is computed by
which leads to our design of the target orientation of the fish
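The normal distance from a point to a line in the form g1x + g2y + g3 = 0 is the standard point-to-line distance, which can be sketched as below. The function name `cross_track_error` and the optional sign convention are assumptions for illustration; the example line is the one from Figure 10B, −x + y = 5, written as g1 = −1, g2 = 1, g3 = −5.

```python
import math


def cross_track_error(x, y, g1, g2, g3, signed=False):
    """Normal distance from the point (x, y) to the line g1*x + g2*y + g3 = 0.

    With signed=True the sign indicates on which side of the line the point
    lies (a hypothetical convention, useful when choosing a turning direction).
    """
    norm = math.hypot(g1, g2)
    if norm == 0.0:
        raise ValueError("g1 and g2 must not both be zero")
    value = (g1 * x + g2 * y + g3) / norm
    return value if signed else abs(value)


if __name__ == "__main__":
    # Fish centered at the origin, target line -x + y = 5.
    print(cross_track_error(0.0, 0.0, -1.0, 1.0, -5.0))
    # A point on the line has zero cross-track error.
    print(cross_track_error(0.0, 5.0, -1.0, 1.0, -5.0))
```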
7 Conclusion
This paper proposed a novel design of a soft robotic fish actuated by antagonistically arranged SCP artificial muscles, which takes advantage of the quicker heat dissipation of SCPs when submerged in water and thus achieves faster actuation. The soft robotic fish was modeled from its geometric and dynamic perspectives to realize a two-dimensional swimming motion by incorporating hydrodynamic forces and moments. The paper also presented a learning-based controller design, which perceives the curvature dynamics and soft profile of the fish via image-based state observations. We conjecture that this type of visual learning-based controller design can be generalized and widely used in the training and inference of agents that self-learn locomotion in soft robots, which are limited by volumetric constraints and pose challenges for embedding complex curvature-sensing electronics. Not only does this sensing approach lead to more flexible and less expensive soft robots, but it also contributes to a decrease in production time. Additionally, the derived model and learning-based controller were simulated to evaluate the agent’s performance and validate its effectiveness with respect to two control objectives, i.e., regulating the robot’s yaw angle and following a predefined path.
The future scope of this work branches out in several directions, such as the optimal design of SCP-actuated soft robots and research on online reinforcement learning-based controllers. Notably, the visual learning-based controller design could open a new research direction towards visual imitation learning in soft robots from real biological lifeforms, thus mimicking not only the anatomical functions, but also the cognitive phases in locomotion and social behavior. Nevertheless, our future research primarily involves completing the development of the experimental platform to test the SCP-driven soft robotic fish by addressing current impediments such as buoyancy control and mobile power supply, followed by validating the proposed visual learning-based controller design in real time. Concurrently, we also plan to investigate the design, outcome and performance of a fully image-based state feedback controller, which would simplify the learning approach by reducing the number of required embedded positional sensors, with the aim of expanding its applications to a wider variety of soft robots.
Data Availability Statement
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.
Author Contributions
SR is a graduate student pursuing a PhD in Electrical and Computer Engineering at George Mason University, and this research was carried out primarily towards the PhD dissertation under FZ’s advice.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frobt.2021.809427/full#supplementary-material
References
Aubin, C. A., Choudhury, S., Jerch, R., Archer, L. A., Pikul, J. H., and Shepherd, R. F. (2019). Electrolytic Vascular Systems for Energy-Dense Robots. Nature 571 (7763), 51–57. doi:10.1038/s41586-019-1313-1
Berlinger, F., Saadat, M., Haj-Hariri, H., Lauder, G. V., and Nagpal, R. (2021). Fish-like Three-Dimensional Swimming with an Autonomous, Multi-Fin, and Biomimetic Robot. Bioinspir. Biomim. 16 (2), 026018. doi:10.1088/1748-3190/abd013
Bhagat, S., Banerjee, H., Ho Tse, Z., and Ren, H. (2019). Deep Reinforcement Learning for Soft, Flexible Robots: Brief Review with Impending Challenges. Robotics 8 (1), 4. doi:10.3390/robotics8010004
Chen, Z. (2017). A Review on Robotic Fish Enabled by Ionic Polymer-Metal Composite Artificial Muscles. Robotics Biomim. 4 (1), 24–13. doi:10.1186/s40638-017-0081-3
Christianson, C., Bayag, C., Li, G., Jadhav, S., Giri, A., Agba, C., et al. (2019). Jellyfish-inspired Soft Robot Driven by Fluid Electrode Dielectric Organic Robotic Actuators. Front. Robot. AI 6, 126. doi:10.3389/frobt.2019.00126
Donatelli, C. M., Bradner, S. A., Mathews, J., Sanders, E., Culligan, C., Kaplan, D., et al. (2018). “Prototype of a Fish Inspired Swimming Silk Robot,” in 2018 IEEE International Conference on Soft Robotics (RoboSoft) (IEEE), 60–65.
Farideddin Masoomi, S., Gutschmidt, S., Chen, X., and Sellier, M. (2015). The Kinematics and Dynamics of Undulatory Motion of a Tuna-Mimetic Robot. Int. J. Adv. Robotic Syst. 12 (7), 83. doi:10.5772/60059
Horn, R. A., and Johnson, C. R. (2012). Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press.
Jeong, I.-B., Park, C.-S., Na, K.-I., Han, S., and Kim, J.-H. (2011). “Particle Swarm Optimization-Based central Patter Generator for Robotic Fish Locomotion,” in 2011 IEEE Congress of Evolutionary Computation (CEC) (IEEE), 152–157. doi:10.1109/cec.2011.5949612
Katzschmann, R. K., DelPreto, J., MacCurdy, R., and Rus, D. (2018). Exploration of Underwater Life with an Acoustically Controlled Soft Robotic Fish. Sci. Robot. 3 (16). doi:10.1126/scirobotics.aar3449
Kim, S., Laschi, C., and Trimmer, B. (2013). Soft Robotics: a Bioinspired Evolution in Robotics. Trends Biotechnol. 31 (5), 287–294. doi:10.1016/j.tibtech.2013.03.002
Knuth, D. E. (1992). Two Notes on Notation. The Am. Math. Monthly 99 (5), 403–422. doi:10.1080/00029890.1992.11995869
Korkmaz, D., Budak, U., Bal, C., Koca, G. O., and Akpolat, Z. (2012). “Modeling and Implementation of a Biomimetic Robotic Fish,” in International Symposium on Power Electronics, Electrical Drives, Automation and Motion (IEEE), 1187–1192. doi:10.1109/speedam.2012.6264510
Laschi, C., Mazzolai, B., and Cianchetti, M. (2016). Soft Robotics: Technologies and Systems Pushing the Boundaries of Robot Abilities. Sci. Robot. 1 (1), eaah3690. doi:10.1126/scirobotics.aah3690
Lauder, G. V. (2015). Fish Locomotion: Recent Advances and New Directions. Annu. Rev. Mar. Sci. 7, 521–545. doi:10.1146/annurev-marine-010814-015614
Lauder, G. V., Madden, P. G. A., Tangorra, J. L., Anderson, E., and Baker, T. V. (2011). Bioinspiration from Fish for Smart Material Design and Function. Smart Mater. Struct. 20 (9), 094014. doi:10.1088/0964-1726/20/9/094014
Lighthill, M. J. (1971). Large-amplitude Elongated-Body Theory of Fish Locomotion. Proc. R. Soc. Lond. Ser. B. Biol. Sci. 179 (1055), 125–138.
Liu, Y.-x., Chen, W.-s., and Liu, J.-k. (2008). Research on the Swing of the Body of Two-Joint Robot Fish. J. Bionic Eng. 5 (2), 159–165. doi:10.1016/s1672-6529(08)60020-7
Marchese, A. D., Onal, C. D., and Rus, D. (2014). Autonomous Soft Robotic Fish Capable of Escape Maneuvers Using Fluidic Elastomer Actuators. Soft robotics 1 (1), 75–87. doi:10.1089/soro.2013.0009
Morgansen, K. A., Triplett, B. I., and Klein, D. J. (2007). Geometric Methods for Modeling and Control of Free-Swimming Fin-Actuated Underwater Vehicles. IEEE Trans. Robot. 23 (6), 1184–1199. doi:10.1109/led.2007.911625
Olsen, Z. J., and Kim, K. J. (2019). Design and Modeling of a New Biomimetic Soft Robotic Jellyfish Using Ipmc-Based Electroactive Polymers. Front. Robot. AI 6, 112. doi:10.3389/frobt.2019.00112
Pfeifer, R., Lungarella, M., and Iida, F. (2007). Self-organization, Embodiment, and Biologically Inspired Robotics. Science 318 (5853), 1088–1093. doi:10.1126/science.1145803
Phamduy, P., LeGrand, R., and Porfiri, M. (2015). Robotic Fish: Design and Characterization of an Interactive Idevice-Controlled Robotic Fish for Informal Science Education. IEEE Robot. Automat. Mag. 22 (1), 86–96. doi:10.1109/mra.2014.2381367
Raj, A., and Thakur, A. (2016). Fish-inspired Robots: Design, Sensing, Actuation, and Autonomy-A Review of Research. Bioinspir. Biomim. 11 (3), 031001. doi:10.1088/1748-3190/11/3/031001
Rajendran, S. K., and Zhang, F. (2017). “Developing a Novel Robotic Fish with Antagonistic Artificial Muscle Actuators,” in Dynamic Systems and Control Conference (American Society of Mechanical Engineers ASME), V001T30A011. doi:10.1115/dscc2017-5380
Rajendran, S. K., and Zhang, F. (2018). “Learning Based Speed Control of Soft Robotic Fish,” in Dynamic Systems and Control Conference (American Society of Mechanical Engineers ASME), V001T04A005. doi:10.1115/dscc2018-897751890
Shi, L., Habib, M. K., Xiao, N., and Hu, H. (2015). Biologically Inspired Robotics. J. Robotics 2015 (894394), 1–2. doi:10.1155/2015/894394
Simeonov, A., Henderson, T., Lan, Z., Sundar, G., Factor, A., Zhang, J., et al. (2018). Bundled Super-coiled Polymer Artificial Muscles: Design, Characterization, and Modeling. IEEE Robot. Autom. Lett. 3 (3), 1671–1678. doi:10.1109/lra.2018.2801469
Sutton, R. S., and Barto, A. G. (2018). Reinforcement Learning: An Introduction. Cambridge, Massachusetts: MIT press.
Thuruthel, T. G., Falotico, E., Renda, F., and Laschi, C. (2019). Model-based Reinforcement Learning for Closed-Loop Dynamic Control of Soft Robotic Manipulators. IEEE Trans. Robot. 35 (1), 124–134. doi:10.1109/tro.2018.2878318
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al. (2015). Continuous Control with Deep Reinforcement Learning. arXiv preprint arXiv:1509.02971.
Triantafyllou, M. S., Triantafyllou, G. S., and Yue, D. K. P. (2000). Hydrodynamics of Fishlike Swimming. Annu. Rev. Fluid Mech. 32 (1), 33–53. doi:10.1146/annurev.fluid.32.1.33
Wang, J., McKinley, P. K., and Tan, X. (2015). Dynamic Modeling of Robotic Fish with a Base-Actuated Flexible Tail. J. dynamic Syst. Meas. Control 137 (1). doi:10.1115/1.4028056
Watkins, C. J., and Dayan, P. (1992). Q-learning. Machine Learn. 8 (3-4), 279–292. doi:10.1023/a:1022676722315
Webb, P. W., and Gerstner, C. L. (2021). “Fish Swimming Behaviour: Predictions from Physical Principles,” in Biomechanics in Animal Behaviour (New York, NY: Garland Science), 59–77.
Wen, L., Wang, T., Wu, G., and Liang, J. (2012). Quantitative Thrust Efficiency of a Self-Propulsive Robotic Fish: Experimental Method and Hydrodynamic Investigation. IEEE/Asme Trans. Mechatronics 18 (3), 1027–1038.
Yip, M. C., and Niemeyer, G. (2017). On the Control and Properties of Supercoiled Polymer Artificial Muscles. IEEE Trans. Robot. 33 (3), 689–699. doi:10.1109/tro.2017.2664885
Yu, J., Tan, M., Wang, S., and Chen, E. (2004). Development of a Biomimetic Robotic Fish and its Control Algorithm. IEEE Trans. Syst. Man. Cybern. B 34 (4), 1798–1810. doi:10.1109/tsmcb.2004.831151
Yu, J., and Wang, L. (2005). “Parameter Optimization of Simplified Propulsive Model for Biomimetic Robot Fish,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation (IEEE), 3306–3311.
Zhang, F., Ennasr, O., Litchman, E., and Tan, X. (2015). Autonomous Sampling of Water Columns Using Gliding Robotic Fish: Algorithms and Harmful-Algae-Sampling Experiments. IEEE Syst. J. 10 (3), 1271–1281.
Zhang, F., Lagor, F. D., Yeo, D., Washington, P., and Paley, D. A. (2015). Distributed Flow Sensing for Closed-Loop Speed Control of a Flexible Fish Robot. Bioinspir. Biomim. 10 (6), 065001. doi:10.1088/1748-3190/10/6/065001
Chen, Z., Shatara, S., and Tan, X. (2010). Modeling of Biomimetic Robotic Fish Propelled by an Ionic Polymer-Metal Composite Caudal Fin. IEEE/ASME Trans. Mechatron. 15 (3), 448–459. doi:10.1109/tmech.2009.2027812
Keywords: underwater robots, soft robotics, fish swimming, bio-inspired robotics, artificial muscle, deep reinforcement learning, convolutional neural network (CNN)
Citation: Rajendran SK and Zhang F (2022) Design, Modeling, and Visual Learning-Based Control of Soft Robotic Fish Driven by Super-Coiled Polymers. Front. Robot. AI 8:809427. doi: 10.3389/frobt.2021.809427
Received: 05 November 2021; Accepted: 17 December 2021;
Published: 04 March 2022.
Edited by:
Wenjun Xu, Peng Cheng Laboratory, China
Reviewed by:
Ahmet Fatih Tabak, Kadir Has University, Turkey
Jiang Zou, Shanghai Jiao Tong University, China
Copyright © 2022 Rajendran and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Feitian Zhang, ZmVpdGlhbkBwa3UuZWR1LmNu