The treatment of cerebro- and cardiovascular diseases requires complex and challenging navigation of a catheter. Previous attempts to automate catheter navigation do not generalize. Deep Reinforcement Learning shows promising results and may be the key to automating catheter navigation through the tortuous vascular tree. This work investigates Deep Reinforcement Learning for guidewire manipulation in a complex and rigid 2D vascular model. A neural network trained by Deep Deterministic Policy Gradients with Hindsight Experience Replay performs well on the low-level control task; the high-level path-planning control, however, requires further improvement.
Catheter-based interventions, e.g., for the treatment of cerebro- and cardiovascular diseases, often require complex navigation of a catheter from the groin to the lesion through the vascular tree. Even highly trained specialists regularly struggle with the fact that catheter movements at the proximal end of the catheter translate into unexpected movements at its distal end. Automation of this task would reduce the mental workload of physicians and may improve the average treatment result.
Attempts to automate endovascular catheter navigation perform autonomous movements for a highly specialized task and a specific anatomy [1]–[4]. These approaches use a loop-shaping controller for a bendable shape-memory-alloy tip together with a robot manipulator for translation to follow a given catheter-tip trajectory; automatically orient the bendable catheter tip towards the target vessel while the translation movement is performed manually; extract the vessel centerlines and control a robotic bendable catheter to follow them; or use magnetic motion-capture sensors to control the speed of a robotically actuated catheter along a given trajectory. So far, no approach promises quick adaptability to individual patients’ vessel geometries, vessel characteristics and the various tasks to be performed.
Utilizing neural networks trained by Deep Reinforcement Learning as the control algorithm for the catheter manipulation robot has the potential to generalize the methodology to catheter-based interventions. In Reinforcement Learning, an agent interacts with an environment by performing actions and receiving a state observation and a reward for every such action. In Deep Reinforcement Learning, a neural network chooses the next action, and the agent trains the neural network such that the cumulative reward is maximized. The reward is defined by the developer and should include all relevant optimization factors, e.g., reaching a target, reducing catheter forces and reducing vessel wall contacts while navigating a catheter precisely. Solving numerous Atari games with a single neural network configuration demonstrates the potential of this approach [5].
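The interaction loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `GuidewireEnv` is a hypothetical one-dimensional stand-in for the vessel environment, and the sparse reward mirrors the scheme used later in this paper.

```python
class GuidewireEnv:
    """Toy 1-D stand-in for the simulated vessel environment (illustrative)."""

    def __init__(self, target=10.0):
        self.target = target
        self.position = 0.0

    def step(self, action):
        """Apply a translation command; return observation, reward, done."""
        self.position += action
        reached = abs(self.position - self.target) < 0.5
        reward = 0.0 if reached else -1.0  # sparse reward: -1 until target reached
        return self.position, reward, reached


def run_episode(env, policy, max_steps=50):
    """Run one agent-environment interaction episode, accumulating reward."""
    state, total_reward = env.position, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


env = GuidewireEnv()
ret = run_episode(env, policy=lambda s: 1.0)  # naive "always advance" policy
```

A trained agent would replace the fixed policy with the neural network's action output and update the network to maximize the return.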
This paper extends our previous work, in which initial trials of autonomous catheter guidance through a simple transparent acrylic glass vascular phantom were shown [6]. The simulation framework, testbench manipulator and camera-based guidewire tracking are reused. The vessel geometry of the phantom is modified to resemble natural vessel shapes and to allow investigation of learned navigation behaviors. The simulation scene is adapted to the new vessel geometry and the control algorithm is adjusted to solve the navigation task in the modified vascular phantom.
Materials and methods
The testbench setup is presented in Figure 1. The guidewire is actuated by a manipulator with two degrees of freedom (translation and rotation) through a rigid 2D phantom filled with a mixture of glycerin and water to model blood inside the vessels. The phantom is 3D printed using stereolithography. A camera mounted above and a light source mounted below the phantom emulate the fluoroscopy image usually obtained from X-ray. The catheter position is extracted from the camera image at 5 Hz, matching the control frequency. The neural network calculates the commands for the guidewire manipulator. To enable the principle of human oversight by the human-in-command approach [7] for future research, a gamepad allows the user to manually override the manipulator commands.
The geometry of the vascular phantom is illustrated in Figure 2a. Most branches can be reached via the shortest path, which would be the result of a breadth-first search. However, the phantom is designed such that a conventional guidewire cannot reach branch 3 directly through branch 1 due to its structural rigidity (Figure 2c). Instead, it is necessary to navigate through branch 8 and the loop between branches 1 and 8 (Figure 2d). Furthermore, a bifurcation that is mechanically difficult to navigate is inserted: when inserting the guidewire from branch 1 into branch 8, it needs to be translated and rotated at the same time to avoid kinking. A failed attempt with a kinked guidewire can be seen in Figure 2b.
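The shortest-path notion mentioned above can be sketched with a breadth-first search over a branch-adjacency graph. The adjacency list below is purely illustrative and does not reproduce the exact phantom topology; as the paragraph notes, the BFS result may be mechanically infeasible for the guidewire, which is exactly what the phantom is designed to expose.

```python
from collections import deque


def bfs_path(graph, start, target):
    """Return the shortest branch sequence from start to target, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Illustrative adjacency list; branch numbering here is hypothetical.
branches = {1: [2, 8], 2: [3], 8: [1], 3: []}
```

A geometrically shortest path returned by `bfs_path` may still be untraversable for a stiff guidewire, so a feasibility check would be needed on top of the graph search.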
The neural network is trained using Deep Deterministic Policy Gradients [8] with Hindsight Experience Replay [9]. The state of the guidewire is defined by five points in XZ coordinates, evenly spaced with a distance of 5 mm along the guidewire starting from the tip. The input to the neural network consists of the catheter states of the current and the last three timesteps and the last three actions taken by the neural network. The reward system gives a reward of −1 for every timestep in which the preselected target is not reached and 0 for every timestep in which the target is reached within a threshold of 5 mm. The outputs of the neural network are the continuous translation and rotation commands, which are sent to the guidewire manipulator.
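The state encoding and sparse reward can be sketched as below. The array shapes are assumptions derived from the description (five 2-D points per timestep, a history of four states and three two-component actions); the exact ordering used by the authors is not specified.

```python
import numpy as np

N_POINTS = 5   # tracked points along the guidewire, tip first
HISTORY = 4    # current state plus the last three timesteps
N_ACTIONS = 3  # last three actions (translation, rotation)


def build_observation(state_history, action_history):
    """Flatten guidewire point histories and past actions into one vector."""
    states = np.concatenate([s.ravel() for s in state_history])  # 4 * 5 * 2 values
    actions = np.concatenate(action_history)                     # 3 * 2 values
    return np.concatenate([states, actions])


def sparse_reward(tip_xz, target_xz, threshold=5.0):
    """-1 until the tip is within `threshold` mm of the target, then 0."""
    return 0.0 if np.linalg.norm(tip_xz - target_xz) < threshold else -1.0


states = [np.zeros((N_POINTS, 2)) for _ in range(HISTORY)]
actions = [np.zeros(2) for _ in range(N_ACTIONS)]
obs = build_observation(states, actions)  # length 4*5*2 + 3*2 = 46
```

The sparse reward is what makes Hindsight Experience Replay attractive here: failed episodes are relabeled with the actually reached position as the goal, so they still provide learning signal.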
Training of the neural network is performed in a simulation environment using the SOFA framework [10] with the BeamAdapter plugin [11]. The phantom walls are assumed rigid and the lumen empty. The friction between wall and guidewire and the guidewire stiffness have been iteratively tuned to mimic guidewire failure on the testbench. The main failure mode of the guidewire is entanglement at bifurcations, where increased bending occurs.
During training in the simulation environment, the control algorithm is evaluated every 500 training episodes by performing 100 test episodes, in which the guidewire has to be navigated from a random start point to a random target within the vessel geometry of the phantom. An episode is regarded as successful if the target is reached within 25 seconds. Subsequently, the trained neural network is transferred from the simulation environment to the testbench and tested within the real phantom. The non-successful episodes are analyzed by observing the navigation process on the testbench. To improve the success rate on the testbench, providing the neural network with the coordinates of the next bifurcation along the path towards the target, instead of the final target itself, is also evaluated.
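The evaluation protocol amounts to a periodic success-rate measurement; a minimal sketch follows. The episode counts and the 25 s timeout come from the text, while `run_test_episode` is a hypothetical placeholder for one navigation attempt with a random start and target.

```python
EVAL_INTERVAL = 500   # evaluate every 500 training episodes
N_TEST_EPISODES = 100
TIMEOUT_S = 25.0


def success_rate(run_test_episode):
    """Fraction of test episodes that reach their random target in time."""
    successes = sum(
        1 for _ in range(N_TEST_EPISODES)
        if run_test_episode(timeout=TIMEOUT_S)
    )
    return successes / N_TEST_EPISODES

# In training, this would be called whenever episode_count % EVAL_INTERVAL == 0.
```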
Results
Figure 3 shows the success rate of the catheter navigation during the training process in the simulation environment. The success rate reached a maximum of 96% after 52,500 training episodes. Transferring the trained neural network to the testbench produced the same results as in the simulation environment.
The non-successful episodes can be split into two groups of failures. The first group comprises high-level control failures, where navigation fails because the controller tries to navigate a path to the target that is impossible to follow. The second group comprises low-level control failures, where navigation at a single branching point fails because the controller is not able to maneuver the guidewire into the desired branch. The fully trained controller shows only high-level control failures. These occur either when the target cannot be reached via the shortest connection, e.g., navigating from branch 4 to branch 3, or when the target is in close proximity to the current branch, e.g., navigating from branch 3 to branch 2. Low-level control failures only occur with partially trained controllers and cease once the controller is fully trained.
Providing the coordinates of the next bifurcation along the path towards the target as interim target to the controller improved the success rate to 100% for this phantom.
Discussion and conclusion
A neural network trained by Deep Deterministic Policy Gradients has been shown to learn to navigate a guidewire within a complex and rigid vascular phantom in 2D. The low-level control of the guidewire poses no problem for the neural network controller. The learned high-level control, especially path planning where the direct path is not ideal, poses a challenge. A possible reason is that most training episodes do not require advanced path planning: in most start/target constellations it is sufficient for the controller to navigate the shortest path. Learning more complex motion behavior is presumed to be slow because such constellations appear seldom during training. Training may be improved by biasing the start/target constellations towards paths that are hard to reach for the control algorithm, or by providing human demonstration data for these constellations.
Alternatively, the high-level path planning can be performed by a separate algorithm, such that the target given to the controller is the next bifurcation instead of the final target. This way, the neural network of the controller is not required to learn the high-level control.
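Such a hierarchical split can be sketched as follows. The planner output, branch numbering and bifurcation coordinates below are illustrative assumptions, not the authors' implementation: a separate planner supplies the branch sequence, and the low-level controller is always given the next bifurcation as its goal.

```python
def next_interim_target(planned_path, bifurcation_coords, current_branch, final_target):
    """Return the XZ coordinates of the next bifurcation on the planned path,
    or the final target once the last bifurcation has been passed."""
    idx = planned_path.index(current_branch)
    if idx + 1 < len(planned_path):
        return bifurcation_coords[planned_path[idx + 1]]
    return final_target


# Illustrative data: branch ids mapped to bifurcation XZ coordinates in mm.
coords = {1: (0.0, 10.0), 8: (5.0, 20.0), 3: (12.0, 30.0)}
path = [1, 8, 3]  # hypothetical planner output, e.g. from a graph search
```

Because each interim goal lies only one bifurcation ahead, the learned controller only ever has to solve the low-level task it already masters.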
In conclusion, it is shown that a controller based on a neural network trained by Deep Reinforcement Learning is able to navigate a guidewire through a complex two-dimensional vascular phantom. The low-level control task of adapting to the mechanics of the guidewire is achieved effortlessly. The high-level control task of finding the correct path is difficult to learn if the target cannot be clearly allocated to a branch or the path is not straightforward.
Future work includes improving the neural network for high-level control and navigation through more complex vessel structures, e.g., 3D navigation, changing geometries, and combination of a guidewire with a catheter.
Funding source: Fraunhofer Internal Programs
Research funding: This work was supported by the Fraunhofer Internal Programs under Grant No. WISA 833 967.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Authors state no conflict of interest.
Informed consent: Informed consent is not applicable.
Ethical approval: The conducted research is not related to either human or animals use.
1. Jayender, J, Patel, RV, Nikumb, S. Robot-assisted active catheter insertion: algorithms and experiments. Int J Robot Res 2009;28:1101–17. https://doi.org/10.1177/0278364909103785.
2. Schwein, A, Kramer, B, Chinnadurai, P, Virmani, N, Walker, S, O’Malley, M, et al. Electromagnetic tracking of flexible robotic catheters enables “assisted navigation” and brings automation to endovascular navigation in an in vitro study. J Vasc Surg 2018;67:1274–81. https://doi.org/10.1016/j.jvs.2017.01.072.
3. Smoljkic, G, Poorten, EV, Sette, M, Sloten, JV. Semi-autonomous position control of a catheter inside the vasculature. In: Proceedings of the 2012 SCATh joint workshop on new technologies for computer/robot assisted surgery; 2012:1–5 pp.
4. Tercero, C, Ikeda, S, Uchiyama, T, Fukuda, T, Arai, F, Okada, Y, et al. Autonomous catheter insertion system using magnetic motion capture sensor for endovascular surgery. Int J Med Robot Comput Assist Surg 2007;3:52–8. https://doi.org/10.1002/rcs.116.
5. Mnih, V, Kavukcuoglu, K, Silver, D, Rusu, AA, Veness, J, Bellemare, MG, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529–33. https://doi.org/10.1038/nature14236.
6. Behr, T, Pusch, TP, Siegfarth, M, Hüsener, D, Mörschel, T, Karstensen, L. Deep reinforcement learning for the navigation of neurovascular catheters. Curr Dir Biomed Eng 2019;5:5–8. https://doi.org/10.1515/cdbme-2019-0002.
7. High Level Independent Group on Artificial Intelligence (AI HLEG). Ethics guidelines for trustworthy AI: European Commission; 2019:1–39 pp.
8. Lillicrap, TP, Hunt, JJ, Pritzel, A, Heess, N, Erez, T, Tassa, Y, et al. Continuous control with deep reinforcement learning. In: 4th international conference on learning representations, ICLR 2016 - conference track proceedings; 2016.
9. Andrychowicz, M, Wolski, F, Ray, A, Schneider, J, Fong, R, Welinder, P, et al. Hindsight experience replay. In: Advances in neural information processing systems, vol 30; 2017:5049–59 pp.
10. Faure, F, Duriez, C, Delingette, H, Allard, J, Gilles, B, Marchesseau, S, et al. SOFA: a multi-model framework for interactive physical simulation. In: Payan, Y, editor. Soft tissue biomechanical modeling for computer assisted surgery: Springer; 2012:283–321 pp.
11. Duriez, C, Cotin, S, Lenoir, J, Neumann, P. New approaches to catheter navigation for interventional radiology simulation. Comput Aided Surg 2006;11:300–8. https://doi.org/10.1080/10929080601090623.
© 2020 Lennart Karstensen et al., published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.