In image-guided radiotherapy, monitoring and compensating for respiratory motion is of high importance. We have analysed the feasibility of using Microsoft’s Kinect v2 sensor as a low-cost tracking camera. In our experiment, eleven circular markers were printed onto a Lycra shirt and tracked in the camera’s color image using cross-correlation-based template matching. The 3D position of each marker was determined from this information together with the mean distance of all template pixels from the sensor. In an experiment with four volunteers (male and female) we demonstrated that real-time 3D position tracking is possible. By averaging over the depth values inside the template, it was possible to increase the Kinect’s depth resolution from 1 mm to 0.1 mm. The noise level was reduced to a standard deviation of 0.4 mm. Temperature sensitivity of the measured depth values was observed for about 10-15 minutes after system start.
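The per-marker processing can be sketched in plain NumPy: normalized cross-correlation locates a template in the color image, and averaging the depth values under the template yields a sub-millimeter depth estimate. Function names and sizes here are illustrative, not the paper's implementation:

```python
import numpy as np

def match_template(image, template):
    """Locate `template` in `image` via normalized cross-correlation.

    Returns the (row, col) of the best-matching top-left corner.
    """
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

def marker_depth(depth_map, pos, template_shape):
    """Average the depth over all template pixels; averaging many
    1 mm-quantized samples is what boosts the effective resolution."""
    r, c = pos
    th, tw = template_shape
    return float(depth_map[r:r + th, c:c + tw].mean())
```

A production version would use an FFT-based or library correlation (e.g. OpenCV's `matchTemplate`) rather than this explicit double loop.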
Pre-operative planning of valve-sparing aortic root reconstruction relies on the automatic discrimination of healthy and pathologically dilated aortic roots. This classification is based on features extracted from 3D ultrasound images. In previously published approaches, handcrafted features showed limited classification accuracy, yet learning features from scratch is infeasible given the small data sets available for this specific problem. In this work, we propose transfer learning to make deep learning applicable to these small data sets. For this purpose, we used the convolutional layers of the pretrained deep neural network VGG16 as a feature extractor. To simplify the problem, we took only two prominent horizontal slices through the aortic root into account, the coaptation plane and the commissure plane, stitching the features of both images together and training a Random Forest classifier on the resulting feature vectors. We evaluated this method on a data set of 48 images (24 healthy, 24 dilated) using 10-fold cross validation. Using the deep-learned features, we reached a classification accuracy of 84 %, which clearly outperformed the handcrafted features (71 % accuracy). Even though the VGG16 network was trained on RGB photos and on different classification tasks, the learned features are still relevant for ultrasound image analysis in aortic root pathology identification. Hence, transfer learning makes deep learning possible even on very small ultrasound data sets.
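The feature-stitching and cross-validation scheme can be illustrated as follows. To keep the sketch self-contained, random stand-in feature vectors and a nearest-centroid classifier replace the VGG16 extractor and the Random Forest; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image_slice):
    # Stand-in for the VGG16 convolutional feature extractor.
    return image_slice.ravel()

def stitch(feat_coaptation, feat_commissure):
    # Concatenate the per-slice feature vectors into one descriptor.
    return np.concatenate([feat_coaptation, feat_commissure])

def cross_validate(X, y, folds=10):
    """k-fold CV with a nearest-centroid stand-in classifier."""
    idx = rng.permutation(len(X))
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        for i in test:
            pred = min(centroids,
                       key=lambda c: np.linalg.norm(X[i] - centroids[c]))
            correct += (pred == y[i])
    return correct / len(X)
```

With real data, `X` would hold one stitched VGG16 feature vector per patient, and the centroid classifier would be replaced by a Random Forest.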
In the recent past, 3D ultrasound has been gaining relevance in many biomedical applications. One main limitation, however, is that typical ultrasound volumes either are very poorly resolved or cover only small areas. We have developed a GPU-accelerated method for live fusion of freehand 3D ultrasound sweeps to create one large volume. The method has been implemented in CUDA and is capable of generating an output volume with 0.5 mm resolution in real time while processing more than 45 volumes per second, with more than 300,000 voxels per volume. First experiments indicate that large structures like a whole forearm or high-resolution volumes of smaller structures like the hand can be combined efficiently. It is anticipated that this technology will be helpful in pediatric surgery where X-ray or CT imaging is not always possible.
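The compounding step can be sketched as a running average per output voxel; the actual CUDA kernel would perform this update in parallel, and the class interface here is illustrative:

```python
import numpy as np

class VolumeFuser:
    """Incrementally fuse ultrasound sweeps into one output grid.

    Running-average compounding: each output voxel stores the mean of
    all sweep samples that mapped into it (a 0.5 mm grid spacing is
    assumed when converting world coordinates to indices).
    """
    def __init__(self, shape):
        self.accum = np.zeros(shape)   # sum of samples per voxel
        self.count = np.zeros(shape)   # number of samples per voxel

    def add_sweep(self, voxel_indices, values):
        # voxel_indices: (N, 3) integer output-grid coordinates
        # values: (N,) sample intensities from one sweep
        for (i, j, k), v in zip(voxel_indices, values):
            self.accum[i, j, k] += v
            self.count[i, j, k] += 1

    def volume(self):
        # Mean intensity where data exists, zero elsewhere.
        out = np.zeros_like(self.accum)
        hit = self.count > 0
        out[hit] = self.accum[hit] / self.count[hit]
        return out
```

On the GPU, the per-sample update maps naturally to atomic adds into the accumulator and counter grids.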
In radiation therapy of abdominal targets, optimal tumor irradiation can be challenging due to intrafractional motion. Current target localization methods are mainly indirect and surrogate-based, and the patient is exposed to additional radiation from X-ray imaging. In contrast, 4D ultrasound (4DUS) imaging provides volumetric images of soft-tissue tumors in real time without ionizing radiation, facilitating a non-invasive, direct tracking method. In this study, the target was defined by features located in its local neighborhood. Features were extracted using the FAST detector and the BRISK descriptor, both extended to 3D. To account for anatomical variability, a feature library was generated that contains manually annotated target information and the relative locations of the features. During tracking, features were extracted from the current 4DUS volume and compared to the feature library. Recognized features were then used to estimate the target's position and shape. The developed method was evaluated on 4DUS sequences of the liver of three healthy subjects. For each dataset, a target was defined and manually contoured in a training and a test sequence. The training sequence was used for library creation, the test sequence for target tracking. The target estimates were compared to the annotations to quantify the tracking error. The results show that binary feature libraries can be used for robust target localization in 4DUS data of the liver and could potentially serve as a tracking method less sensitive to target deformation.
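The library-matching idea can be sketched with Hamming-distance matching of binary descriptors and offset voting. This is a simplified stand-in: descriptor length, the distance threshold, and the mean-vote position estimate are assumptions, not the paper's exact scheme:

```python
import numpy as np

def hamming(a, b):
    # Bitwise Hamming distance between two uint8 descriptor arrays
    # (binary descriptors such as BRISK are compared this way).
    return int(np.unpackbits(a ^ b).sum())

def estimate_target(library, frame_feats, max_dist=10):
    """Estimate the target position from recognized features.

    library: list of (descriptor, offset_to_target) pairs
    frame_feats: list of (descriptor, position) from the current volume
    Each matched feature votes with position + stored offset; the
    estimate is the mean of the votes.
    """
    votes = []
    for desc, pos in frame_feats:
        dists = [hamming(desc, d) for d, _ in library]
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            votes.append(pos + library[best][1])
    return np.mean(votes, axis=0) if votes else None
```

Returning `None` when no feature matches mimics a tracking-loss signal, after which a system could fall back to the last known position.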
Inference from medical image data using machine learning still suffers from the disregard of label uncertainty. Usually, medical images are labeled by multiple experts. However, the uncertainty of this training data, assessable as the unity of opinions among observers, is neglected, as training is commonly performed on binary decision labels. In this work, we present a novel method to incorporate this label uncertainty into the learning problem using weighted Support Vector Machines (wSVM). The idea is to assign an uncertainty score to each data point. The score is between 0 and 1 and is calculated based on the unity of opinions of all observers, where u = 1 if all observers have the same opinion and u = 0 if the observers’ opinions are split exactly 50/50, with linear interpolation in between. This score is integrated into the Support Vector Machine (SVM) optimization as a weighting of errors made for the corresponding data point. For evaluation, we asked 15 observers to label 48 2D ultrasound images of aortic roots, addressing whether the images show a healthy or a pathologically dilated anatomy, where the ground truth was known. As the observers were not trained experts, a high diversity of opinions was present in the data set. We performed image classification with both approaches, i.e. the classical SVM and the wSVM with integrated uncertainty weighting, each using 10-fold cross validation (linear kernel, C = 7). By incorporating the observer uncertainty, the classification accuracy could be improved by 3.1 percentage points (SVM: 83.5%, wSVM: 86.6%). This indicates that integrating information on the observers’ unity of opinions increases the generalization performance of the classifier and that uncertainty-weighted wSVM could present a promising method for machine learning in the medical domain.
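The unity-of-opinions score described above can be written down directly:

```python
def unity_score(votes):
    """Unity-of-opinions score u in [0, 1].

    votes: list of binary observer labels (0/1) for one image.
    u = 1 when all observers agree, u = 0 at an exact 50/50 split,
    linear in between: u = |2p - 1|, with p the fraction of
    positive votes.
    """
    p = sum(votes) / len(votes)
    return abs(2 * p - 1)
```

In a wSVM implementation, these scores would then be supplied as per-sample error weights, e.g. via the `sample_weight` argument of scikit-learn's `SVC.fit`.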
4D ultrasound (4D US) is gaining relevance as a tracking method in radiation therapy (RT), with modern matrix array probes offering new possibilities for real-time target detection. However, for clinical implementation of US-guided RT, image quality, volumetric framerate and artifacts caused by the probe’s presence during planning and/or setup computed tomography (CT) must be quantified. We compared three diagnostic 4D US systems with matrix array probes, using a commercial wire phantom to measure spatial resolution as well as a calibration and a torso phantom to assess different image quality metrics. CT artifacts were quantified in the torso phantom by calculating the total variation and the percentage of affected voxels between a reference CT scan and CT scans with probes in place. We found that state-of-the-art 4D US systems with small probes can fit inside the CT bore and cause fewer metal artifacts than larger probes. US image quality varies between systems and is task-dependent. Volume sizes and framerates are much higher than those of the commercial guidance solution for US-guided RT, warranting further investigation regarding clinical performance for image guidance.
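The two artifact metrics can be sketched as follows; the deviation threshold for an "affected" voxel is an assumption for illustration, not a value from the study:

```python
import numpy as np

def total_variation(vol):
    # Sum of absolute finite differences along each volume axis.
    return sum(np.abs(np.diff(vol, axis=a)).sum() for a in range(vol.ndim))

def affected_voxel_percentage(reference, scan, threshold=50):
    """Percentage of voxels whose value deviates from the reference
    CT scan by more than `threshold` HU (threshold is illustrative)."""
    return 100.0 * np.mean(np.abs(scan - reference) > threshold)
```

Comparing the total variation of a probe-in-place scan against the reference scan gives a single scalar measure of how much high-frequency artifact content the probe introduced.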
Ultrasound (US) imaging, in contrast to other image guidance techniques, offers the distinct advantage of providing volumetric image data in real time (4D) without using ionizing radiation. The goal of this study was to perform the first quantitative comparison of three different 4D US systems with fast matrix array probes and real-time data streaming regarding their target tracking accuracy and system latency. Sinusoidal motion of varying amplitudes and frequencies was used to simulate breathing motion with a robotic arm and a static US phantom. US volumes and robot positions were acquired online and stored for retrospective analysis. A template matching approach was used for target localization in the US data. Target motion measured in US was compared to the reference trajectory performed by the robot to determine localization accuracy and system latency. Using the robotic setup, all investigated 4D US systems could detect a moving target with sub-millimeter accuracy. However, high system latency in particular increased tracking errors substantially and should be compensated with prediction algorithms for respiratory motion compensation.
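Estimating the system latency from the two trajectories can be sketched via cross-correlation; this is an illustrative analysis, as the study's exact method is not detailed here. A positive result means the US measurement lags the robot reference:

```python
import numpy as np

def estimate_latency(reference, measured, dt):
    """Estimate system latency as the lag maximizing the
    cross-correlation between the reference (robot) trajectory and
    the measured (US) trajectory, both sampled at interval dt."""
    ref = reference - reference.mean()
    mes = measured - measured.mean()
    corr = np.correlate(mes, ref, mode="full")
    # Convert the argmax index of the 'full' correlation to a lag.
    lag = np.argmax(corr) - (len(ref) - 1)
    return lag * dt
```

Sub-sample latency could be recovered by interpolating around the correlation peak; the sketch returns whole-sample lags only.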
Fluoroscopy and digital subtraction angiography provide guidance in endovascular aortic repair (EVAR) but introduce radiation exposure and require the administration of contrast agent. To overcome these disadvantages, previous studies proposed to display the pose of an electromagnetically (EM) tracked catheter tip within a three-dimensional virtual aorta on augmented reality (AR) glasses. For further guidance, we propose to create virtual angioscopy images based on the catheter tip pose within the aorta and to display them on HoloLens. The aorta was segmented from the computed tomography (CT) data using MeVisLab software. A landmark-based registration allowed the calculation of the pose of the EM sensor in the CT coordinate system. The sensor pose was sent to MeVisLab running on a computer, and a virtual angioscopy image was created at runtime based on the segmented aorta. When requested by the HoloLens, the last encoded image was sent from MeVisLab to the AR glasses via Wi-Fi using a remote procedure call (gRPC), and then decoded and displayed on the HoloLens. For evaluation purposes, the latency of transmitting and displaying the images was measured using two different lossy compression formats (namely JPEG and DXT1). A mean latency of 82 ms was measured for the JPEG format. With the DXT1 format, the mean latency was reduced by 87 %. This study demonstrated the feasibility of creating pose-dependent virtual angioscopy images and displaying them on HoloLens. Additionally, the results showed that the DXT1 format outperformed the JPEG format regarding latency. The virtual angioscopy may add valuable additional information for guidance in radiation-sparing EVAR approaches.
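The landmark-based registration step corresponds to a least-squares rigid alignment of corresponding point sets, which can be sketched with the Kabsch/Umeyama algorithm (illustrative; the abstract does not specify which solver was used):

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t.

    src, dst: (N, 3) corresponding landmark coordinates, e.g. points
    touched with the EM-tracked tip and the same anatomical landmarks
    in the CT volume (Kabsch/Umeyama algorithm).
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Guard against a reflection in the least-squares solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

Once (R, t) is known, every EM sensor pose can be mapped into CT coordinates to place the virtual camera inside the segmented aorta.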
Real-time target localization with ultrasound holds high potential for image guidance and motion compensation in radiosurgery due to its non-invasive image acquisition free from ionizing radiation. However, a two-step localization has to be performed when integrating ultrasound into the existing radiosurgery workflow. In addition to target localization inside the ultrasound volume, the probe itself has to be localized in order to transform the target position into treatment room coordinates. By adapting existing camera calibration tools, we have developed a method to extend the stereoscopic X-ray tracking system of a radiosurgery platform in order to locate objects such as marker geometries with six degrees of freedom. The calibration was performed with a reprojection error of 0.1 mm. By using the full area of the flat-panel detectors without pre-processing, the extended software increased the tracking volume and resolution by up to 80%, substantially improving patient localization and marker detectability. Furthermore, marker tracking showed sub-millimeter accuracy and rotational errors below 0.1°. This demonstrates that the developed extension framework can accurately localize marker geometries using an integrated X-ray system, establishing the link for the integration of real-time ultrasound image guidance into the existing system.
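Recovering a 3D marker position from its detections on the two flat-panel detectors amounts to stereo triangulation, sketched here as a linear direct linear transform (DLT); this is illustrative, not the platform's actual algorithm:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate a 3D point from its 2D detections on two detectors.

    P1, P2: 3x4 projection matrices (from the calibration step)
    x1, x2: 2D detector coordinates of the same marker
    Solves the homogeneous DLT system by SVD.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null-space vector = homogeneous point
    return X[:3] / X[3]       # dehomogenize
```

Triangulating several markers of a known rigid geometry and fitting a rigid transform to the result then yields the full six-degree-of-freedom pose.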
In this paper, we present a deep convolutional neural network (CNN) approach for forehead tissue thickness estimation. We use downsampled NIR laser backscattering images acquired from a novel marker-less near-infrared laser-based head tracking system, combined with the beam’s incident angle parameter. These two-channel augmented images were constructed as the CNN input, while a single-node output layer represents the estimated forehead tissue thickness. The models were trained and tested separately for each subject on datasets acquired from 30 subjects (high-resolution MRI data served as ground truth). To speed up training, we used a pre-trained network from the first subject to bootstrap training for each of the other subjects. We could show a clear improvement in tissue thickness estimation (mean RMSE of 0.096 mm). The proposed CNN model outperformed previous approaches based on support vector regression (mean RMSE of 0.155 mm) and Gaussian process learning (mean RMSE of 0.114 mm) and eliminated their restrictions for future research.
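Constructing the two-channel augmented input can be sketched as follows; the min-max intensity scaling and the normalization of the angle by 90° are assumptions for illustration, not details from the paper:

```python
import numpy as np

def make_cnn_input(backscatter_img, incident_angle_deg):
    """Build the two-channel CNN input.

    Channel 0: the downsampled NIR backscattering image, min-max
               scaled to [0, 1] (scaling scheme is an assumption).
    Channel 1: a constant plane encoding the beam's incident angle
               (normalization by 90 degrees is an assumption).
    """
    img = backscatter_img.astype(np.float32)
    img = (img - img.min()) / (np.ptp(img) + 1e-8)
    angle = np.full_like(img, incident_angle_deg / 90.0)
    return np.stack([img, angle], axis=-1)
```

Broadcasting the scalar angle into a full image plane lets the convolutional layers combine intensity and geometry information at every spatial location.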