1 Introduction

Many clinical applications require the detection and tracking of respiratory motion. Image-guided radiotherapy (IGRT) of the chest and abdomen, for example, relies heavily on this principle: some kind of marker is placed on or attached to the patient’s chest and is monitored using a non-invasive localisation device. The trajectory of the marker is then analysed and used to either dynamically activate the treatment beam (called gating) or to guide the radiation source. Especially in the second scenario, tracking one marker may not be sufficient: the actual target of the treatment beam – the tumour – is typically not observed directly. Although this could be done (either using continuous X-ray localisation or 3D ultrasound tracking), the current method in clinical use relies on a mathematical model linking the motion on the patient’s chest to the motion of the actual target.
It has been shown that the accuracy of these correlation algorithms can be improved by incorporating multiple markers. In this work, we demonstrate how consumer hardware (Microsoft’s Kinect v2 depth sensor) can be used to accurately track the 3D position of multiple markers using a special marker shirt.
2 Methods and materials
To acquire respiratory motion traces, a special marker shirt has been developed. Eleven marker templates were printed onto a Lycra shirt, ensuring a tight fit on the volunteers; the positions of the markers correspond to areas relevant for the measurement. Each marker consists of a black circle surrounded by a black ring. Details and numbering of the markers are shown in Figure 1.
Tracking the position of the markers is done using Microsoft’s Kinect v2 camera (see Figure 2) and the corresponding software development kit (SDK). The camera is able to simultaneously capture three different types of images at a frame rate of up to 30 Hz: a color image (1920 × 1080 pixels), an infrared-illuminated grayscale image (512 × 424 pixels), and a depth image (512 × 424 pixels, depth resolution of 1 mm). Typical images are shown in Figure 3. Details about the technology behind the sensor are given by Lau.
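For orientation, the sketch below shows how such synchronized streams can be acquired through the SDK’s multi-source frame reader. This is a minimal, illustrative setup, not the actual acquisition code of our application, and error handling is omitted:

```csharp
using System;
using Microsoft.Kinect;

class Acquisition
{
    static void Main()
    {
        // Open the default sensor and subscribe to synchronized
        // color + infrared + depth frames.
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        MultiSourceFrameReader reader = sensor.OpenMultiSourceFrameReader(
            FrameSourceTypes.Color | FrameSourceTypes.Infrared | FrameSourceTypes.Depth);

        reader.MultiSourceFrameArrived += (s, e) =>
        {
            MultiSourceFrame multiFrame = e.FrameReference.AcquireFrame();
            if (multiFrame == null) return; // frames may be dropped under load

            using (DepthFrame depthFrame = multiFrame.DepthFrameReference.AcquireFrame())
            {
                if (depthFrame == null) return;
                ushort[] depthData = new ushort[512 * 424];
                depthFrame.CopyFrameDataToArray(depthData); // depth in millimetres
                // ... hand the color and depth data to the tracking pipeline ...
            }
        };

        Console.ReadLine(); // keep the process alive while frames arrive
    }
}
```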
Using these images, and the known intrinsics and extrinsics of the color and IR cameras inside the Kinect sensor, it is possible to determine the 3D position of each pixel in the depth image (a sketch of this mapping is given after the list below). We have developed an application that allows selecting and tracking up to 15 markers in real time. The general process is as follows:
1. During setup, the user is shown a camera image of the subject and is asked to select the initial positions of the markers and the template to use for tracking.
2. For each frame, the position of the template within the given region of interest (ROI) is determined using template matching.
3. The distance of the center point of the matched template is determined from the depth image.
4. The matching ROIs are re-centered around the position of the last match.
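The 3D lookup mentioned before the list can be performed with the SDK’s CoordinateMapper, which evaluates the calibration data stored in the sensor’s firmware. A minimal sketch (the helper name PixelTo3D is ours for illustration):

```csharp
using Microsoft.Kinect;

static class DepthMapping
{
    // Map one pixel of the 512 x 424 depth image to a 3D point (in metres)
    // in the Kinect's camera coordinate system, using the intrinsics and
    // extrinsics stored in the sensor's firmware.
    public static CameraSpacePoint PixelTo3D(KinectSensor sensor, int x, int y, ushort depthInMm)
    {
        DepthSpacePoint depthPixel = new DepthSpacePoint { X = x, Y = y };
        return sensor.CoordinateMapper.MapDepthPointToCameraSpace(depthPixel, depthInMm);
    }
}
```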
Template matching is done using cross-correlation. It is implemented in C# using a wrapper library (EmguCV) around the OpenCV computer vision library.
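A minimal sketch of one such matching step in Emgu CV is given below; the helper MatchInRoi and its surrounding structure are illustrative assumptions, not the actual code of our application:

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

static class MarkerTracker
{
    // One tracking step: find the best match of 'template' inside 'roi' of
    // the current camera frame via normalized cross-correlation and return
    // the centre of the match in full-frame pixel coordinates.
    public static Point MatchInRoi(Image<Gray, byte> frame,
                                   Image<Gray, byte> template,
                                   Rectangle roi)
    {
        using (Image<Gray, byte> search = frame.GetSubRect(roi))
        using (Image<Gray, float> scores = search.MatchTemplate(
                   template, TemplateMatchingType.CcorrNormed))
        {
            double[] minVals, maxVals;
            Point[] minLocs, maxLocs;
            scores.MinMax(out minVals, out maxVals, out minLocs, out maxLocs);

            // maxLocs[0] is the top-left corner of the best match, local to
            // the ROI; convert it to the template centre in frame coordinates.
            return new Point(roi.X + maxLocs[0].X + template.Width / 2,
                             roi.Y + maxLocs[0].Y + template.Height / 2);
        }
    }
}
```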
2.1 Volunteer study
The Kinect sensor was attached to an industrial robot (Adept Viper s850) to allow accurate and stable placement. The setup is shown schematically in Figure 5. In a small volunteer study (four participants, one female, three male), we evaluated the feasibility of marker tracking. Our volunteers were asked to lie down in a supine position and breathe normally for three to four minutes.
2.2 Accuracy measurements
Finally, the stability and accuracy of the Kinect sensor were evaluated in another experiment. First, the robot shown in Figure 5 was programmed to follow a sinusoidal motion along the z-axis (thus mimicking respiratory motion) while the distance to the patient couch was computed for each camera frame. Second, the distance to the patient couch was measured repeatedly for about twelve minutes to determine the amount of noise and possible drift.
3 Results

Using our multi-threaded implementation in C#, tracking eleven markers in the color camera image – using ROIs of twice the size of the marker template – was possible in real time on a MacBook Pro Retina (2.3 GHz Core i7, four cores, 16 GiB RAM, SSD). In general, one template matching iteration took around 80 ms.
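Since the eleven matching operations are mutually independent, they map naturally onto the four cores. A sketch of this parallelization, reusing the hypothetical MatchInRoi helper from Section 2 (the Marker class is likewise an illustrative assumption):

```csharp
using System.Drawing;
using System.Threading.Tasks;
using Emgu.CV;
using Emgu.CV.Structure;

class Marker
{
    public Image<Gray, byte> Template;
    public Rectangle Roi;       // twice the size of the template
    public Point Position;      // latest match, in frame coordinates

    // Re-center the search ROI around the latest match.
    public void RecenterRoi()
    {
        Roi = new Rectangle(Position.X - Roi.Width / 2,
                            Position.Y - Roi.Height / 2,
                            Roi.Width, Roi.Height);
    }
}

static class FrameProcessor
{
    public static void Track(Image<Gray, byte> frame, Marker[] markers)
    {
        // Each marker only reads from the shared frame, so the per-marker
        // work can safely be spread across all cores.
        Parallel.For(0, markers.Length, i =>
        {
            markers[i].Position = MarkerTracker.MatchInRoi(
                frame, markers[i].Template, markers[i].Roi);
            markers[i].RecenterRoi();
        });
    }
}
```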
3.1 Volunteer study
Recording motion traces of the markers worked for all four volunteers (three male, one female), although markers one and three were difficult to track due to stretching of the fabric. Figure 6 shows the distances measured for all eleven templates. Note the large differences in amplitude between the individual markers.
The depth motion trace of a second volunteer (subject four) is shown in Figure 7. Note the much larger amplitude for markers 5–8 and 11 and the sudden motion around t = 95 s due to the volunteer sneezing. Additionally, the values from markers one and three (red and blue, respectively) show that tracking them is difficult due to deformation.
Additionally, the in-image motion of the template was evaluated; an example (marker eight of subject one) is shown in Figure 8. As would be expected, there is very little motion in the left/right direction. In the superior/inferior direction, however, some motion is present (one pixel corresponds to approximately 1–1.5 mm in our setup, depending on the exact distance from the sensor), albeit not as strong as in the anterior/posterior direction.
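This pixel-to-millimetre scale can be sanity-checked with the pinhole model. Assuming the nominal horizontal field of view of the Kinect v2 color camera (about 84.1° over 1920 pixels; an assumption on our part, not a calibrated value), the in-plane footprint of one pixel at distance $z$ is approximately

$$ s(z) = \frac{z}{f}, \qquad f \approx \frac{1920\ \mathrm{px}}{2\tan(84.1^\circ/2)} \approx 1060\ \mathrm{px}, $$

so distances of roughly 1.1 m to 1.6 m yield the quoted 1–1.5 mm per pixel; the exact values depend on the calibration and the subject’s distance.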
3.2 Accuracy measurements
Using the same setup as described before, we determined the absolute accuracy of the depth measurements. The trajectory of the robot – overlaid with the measured distance to the template – is given in Figure 9. Clearly, the distance measured by the Kinect sensor deviates substantially from the true motion of the robot: the maximum deviation is 3.7 mm and the root mean square error (RMSE) is 2.0 mm at a working distance on the order of 50 cm.
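Here, the RMSE is computed in the usual way over all $N$ camera frames, with $d_i$ the distance measured by the Kinect and $z_i$ the corresponding position commanded to the robot (our notation):

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(d_i - z_i\right)^2} $$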
The results of the static measurement evaluation are shown in Figure 10. The measurement was taken directly after turning on the Kinect sensor, and a time-dependent drift is visible. We believe that this is due to the changing temperature of the sensor PCB. The depth value is determined – as outlined above – by averaging all pixels inside the template, resulting in sub-millimeter resolution. The noise level, however, is still considerable: we observe a standard deviation of 0.4 mm.
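A sketch of this averaging step is given below, assuming the matched template rectangle has already been mapped into depth-image coordinates; the Kinect v2 reports a depth of 0 for pixels without a valid measurement, and those are skipped (the exact handling in our application may differ):

```csharp
using System.Drawing;

static class DepthAveraging
{
    // Average the depth values (in mm) inside the matched template rectangle
    // of a 512 x 424 depth frame; invalid pixels (depth 0) are skipped.
    public static double AverageDepth(ushort[] depthData, int frameWidth, Rectangle templateRect)
    {
        long sum = 0;
        int count = 0;
        for (int y = templateRect.Top; y < templateRect.Bottom; ++y)
            for (int x = templateRect.Left; x < templateRect.Right; ++x)
            {
                ushort d = depthData[y * frameWidth + x];
                if (d > 0) { sum += d; ++count; }
            }
        return count > 0 ? (double)sum / count : double.NaN;
    }
}
```

For uncorrelated per-pixel noise, averaging N valid pixels reduces the standard deviation by a factor of about sqrt(N), which is why sub-millimeter resolution is possible despite the 1 mm quantization of the raw depth values.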
4 Discussion

We have demonstrated that the Kinect v2 sensor’s data streams – color image and depth image – can be used to track multiple markers on the human chest in 3D and in real time using standard hardware. Additionally, by averaging the depth values inside the marker template, it is possible to substantially reduce the measurement noise, to a standard deviation of 0.4 mm. However, we also observed that the depth values measured using the robotic setup and the sinusoidal motion pattern deviate strongly from the ground truth: the motion amplitude of the sine was 20 mm, whereas the amplitude recovered by template matching was more than 25 mm – at least 25 % more. We believe that this is caused by multiple factors:
Inaccurate alignment of the depth axis of the Kinect sensor with the robot’s z-axis and the template center
Errors in the sensor’s calibration (the Kinect sensor stores its intrinsics and extrinsics in firmware and we did not perform camera calibration)
As next steps, we plan to perform sub-pixel template matching to increase the resolution along the L/R and S/I axes, and to further analyze the accuracy of the setup by tracking the marker with a dedicated tracking device (such as NDI’s Polaris Spectra system). The operating speed of the system (currently about 15 fps) could also be increased through more extensive code parallelization, so that every frame from the Kinect v2 is used. We need to make sure, however, that the light emitted by the Kinect v2 sensor does not interfere with the IR light used by the Spectra system: both operate in the near-infrared range around 850 to 860 nm.
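One common way to obtain such sub-pixel estimates (a standard technique, not a method prescribed by our current implementation) is parabolic interpolation of the correlation scores around the discrete peak: with scores $s_{-1}$, $s_0$, $s_{+1}$ at the peak and its two neighbours along one axis, the refined offset is

$$ \delta = \frac{s_{-1} - s_{+1}}{2\,(s_{-1} - 2 s_0 + s_{+1})}, \qquad \delta \in [-0.5,\, 0.5], $$

and the sub-pixel position is the integer peak index plus $\delta$, applied separately per axis.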
References

O. Blanck, P. Jauer, F. Ernst, R. Bruder, and A. Schweikard. Pilot-Phantomtest zur ultraschall-geführten robotergestützten Radiochirurgie. In H. Treuer, editor, 44. Jahrestagung der DGMP, Cologne, Germany, 2013. DGMP, pages 122–123.

R. Dürichen, M. A. F. Pimentel, L. Clifton, A. Schweikard, and D. A. Clifton. Multi-task Gaussian processes for multivariate physiological time-series analysis. IEEE Transactions on Biomedical Engineering, 62(1):314–322, 2014.

J. Hanley, M. M. Debois, D. Mah, G. S. Mageras, A. Raben, K. Rosenzweig, B. Mychalczak, L. H. Schwartz, P. J. Gloeggler, W. Lutz, C. C. Ling, S. A. Leibel, Z. Fuks, and G. J. Kutcher. Deep inspiration breath-hold technique for lung tumors: the potential value of target immobilization and reduced lung density in dose escalation. International Journal of Radiation Oncology, Biology, Physics, 45(3):603–611, 1999.

D. Lau. The science behind Kinects or Kinect 1.0 versus 2.0. http://www.gamasutra.com/blogs/DanielLau/20131127/205820/The_Science_Behind_Kinects_or_Kinect_10_versus_20.php, November 2013. Online, last visited 2015-03-24.

Microsoft Corporation. Kinect for Windows SDK 2.0. http://www.microsoft.com/en-us/download/details.aspx?id=44561, October 2014. Online, last visited 2015-03-24.

H. Shirato, S. Shimizu, K. Kitamura, T. Nishioka, K. Kagei, S. Hashimoto, H. Aoyama, T. Kunieda, N. Shinohara, H. Dosaka-Akita, and K. Miyasaka. Four-dimensional treatment planning and fluoroscopic real-time tumor tracking radiotherapy for moving tumor. International Journal of Radiation Oncology, Biology, Physics, 48(2):435–442, 2000.
Published Online: 2015-09-12
Published in Print: 2015-09-01
Conflict of interest: The authors state no conflict of interest.

Informed consent: Informed consent has been obtained from all individuals included in this study.

Ethical approval: The research related to human use complies with all relevant national regulations and institutional policies, is in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.