
Current Directions in Biomedical Engineering

Joint Journal of the German Society for Biomedical Engineering in VDE and the Austrian and Swiss Societies for Biomedical Engineering

Editor-in-Chief: Dössel, Olaf

Editorial Board: Augat, Peter / Buzug, Thorsten M. / Haueisen, Jens / Jockenhoevel, Stefan / Knaup-Gregori, Petra / Kraft, Marc / Lenarz, Thomas / Leonhardt, Steffen / Malberg, Hagen / Penzel, Thomas / Plank, Gernot / Radermacher, Klaus M. / Schkommodau, Erik / Stieglitz, Thomas / Urban, Gerald A.

CiteScore 2018: 0.47

Source Normalized Impact per Paper (SNIP) 2018: 0.377

Open Access

Image based reconstruction for cystoscopy

Matthias Brischwein / Thomas Wittenberg / Tobias Bergen
Published Online: 2015-09-12 | DOI: https://doi.org/10.1515/cdbme-2015-0113


This paper summarizes our initial efforts to reconstruct the urinary bladder from endoscopic images acquired in the clinical routine. We found that up to now, only very few attempts have been reported which achieve a true 3D reconstruction of the human bladder. One promising approach which yields a geometric reconstruction up to scale from a monocular stream of images is highlighted and our initial results obtained from adapting the method for its use in clinical cystoscopy are presented.

Keywords: Cystoscopy; Image Based Reconstruction; Multiple View Geometry; Structure-from-Motion

1 Introduction

Despite steady progress in image quality, navigating and working with cystoscopes remains difficult. Not only do they challenge hand-eye coordination, but size constraints also strongly limit the field of view. Visual access to the surroundings of the observed bladder tissue is therefore of clear benefit. With digital cystoscopes at hand, technologies are being developed that stitch digital images from endoscopic video to expand the effective field of view [2, 4, 15, 22]. For this, geometric surface reconstruction is an essential task [6, 9]. Traditional computer vision mostly focuses on man-made environments, and algorithms successfully applied in this domain often build implicitly or explicitly on the assumption of distinctive rigid geometric structures and/or textures. In medical endoscopy, by contrast, the scene is mostly wet, glossy and of seemingly undefined shape. The physician needs to interact with the tissue, and blood or other particles may enter the scene and complicate the reconstruction task.

2 Image based reconstruction

Image based reconstruction of the geometry of a scene is the inverse problem of finding the 3D-to-2D mapping of points $X \in \mathbb{R}^3$ to corresponding points $x \in \mathbb{R}^2$ in an image $I$, with $X$ unknown at the same time, i.e.

$$\lambda\, x = K (R X + t) \qquad (1)$$

with $K \in \mathbb{R}^{2 \times 3}$ the camera calibration, $R \in \mathbb{R}^{3 \times 3}$ the rotation and $t = -RC$ the translation of the camera coordinate system with origin $C \in \mathbb{R}^3$ in world coordinates. The factor $\lambda$ is due to the ambiguity in scale and arbitrary for unitless reconstructions. When treated in homogeneous coordinates, Equation 1 becomes

$$\tilde{x} = P \tilde{X} \qquad (2)$$

with $P \in \mathbb{R}^{3 \times 4}$ [9]. Euclidean transformations in 3D are linked to their image by a linear model with a total of 15 degrees of freedom (DoF) in projective space [9]. To find a reconstruction of $P$ and $\tilde{X}$ which matches our Euclidean perception of the world up to scale, at least eight out of these 15 DoF need to be fixed with prior knowledge from the scene, the camera setup, or both. One can distinguish between dense 3D reconstruction and feature based 3D reconstruction. For dense methods we refer to [18].
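As a minimal sketch of the projection model described above, the following Python snippet maps a single world point to pixel coordinates via homogeneous coordinates. It assumes the common textbook convention of a 3×3 intrinsic matrix with P = K [R | t] and t = −RC, which may differ from the exact parameterization used in this paper. The focal length, principal point and camera center are hypothetical values chosen only for illustration.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def projection_matrix(K, R, C):
    """Build the 3x4 matrix P = K [R | t] with t = -R C (assumed convention)."""
    t = [-x for x in mat_vec(R, C)]
    Rt = [row + [ti] for row, ti in zip(R, t)]  # 3x4 matrix [R | t]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]


def project(P, X):
    """Project a 3D point X to pixel coordinates."""
    u, v, w = mat_vec(P, list(X) + [1.0])
    # Dividing by w removes the arbitrary scale of the homogeneous vector.
    return (u / w, v / w)


# Hypothetical camera: focal length 500 px, principal point (320, 240),
# no rotation, camera center two units behind the world origin.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
C = [0.0, 0.0, -2.0]

P = projection_matrix(K, R, C)
pixel = project(P, (0.1, -0.05, 0.0))
print(pixel)
```

Scaling P by any non-zero factor leaves the resulting pixel unchanged, which is the scale ambiguity mentioned in the text.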

Sparse multiple view reconstruction is achieved in a series of steps: pre-processing, 2D feature detection and pairwise matching, 2D image registration, consistent tracking of features over multiple frames, iterative optimization of camera locations and scene geometry, and a post-processing step that interpolates the continuous surface between sparsely reconstructed 3D points and texture maps and blends image pixels onto that surface [10, 19]. For a comparison of popular feature detectors we refer to [13].

Although feature matching is a combinatorial search problem, efficient approximate solutions exist, based on hierarchical decomposition of feature space [16]. Images $I$ and $I'$ are registered to each other by computing the planar homography $\tilde{x}' = H\tilde{x}$, $H \in \mathbb{R}^{3 \times 3}$, between feature correspondences $(\tilde{x}, \tilde{x}')$, for which RANSAC [8] is often the robust method of choice. For 3D reconstruction, bundle adjustment (BA) is the preferred method [21]. The latter is a modified Levenberg-Marquardt (LM) global minimization in the parameters $\hat{P}_i$ and $\hat{X}_j$ over all views $\{P_0, \ldots, P_i, \ldots, P_N\}$ and 3D points $\{\tilde{X}_0, \ldots, \tilde{X}_j, \ldots, \tilde{X}_M\}$, optimizing the projection error $\lVert \tilde{x}_{ij} - \hat{P}_i \hat{\tilde{X}}_j \rVert_2^2$ [21]. In contrast to plain LM, BA exploits the sparse structure of the problem to significantly reduce computational costs [21].
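The quantity minimized by bundle adjustment can be sketched as the sum of squared distances between observed image points and the projections of the current point estimates. The snippet below only evaluates this cost. It does not perform the Levenberg-Marquardt update or exploit sparsity, and the dictionary layout `observations[(i, j)]` as well as all numbers are hypothetical.

```python
def project(P, X):
    """Project 3D point X with a 3x4 camera matrix P (list of rows)."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)


def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors over all observations (i, j).

    observations maps (camera index i, point index j) to the observed
    image point (u, v) of point j in view i.
    """
    total = 0.0
    for (i, j), (u_obs, v_obs) in observations.items():
        u, v = project(cameras[i], points[j])
        total += (u_obs - u) ** 2 + (v_obs - v) ** 2
    return total


# Two hypothetical normalized cameras, the second shifted along x.
cameras = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
points = [(0.0, 0.0, 2.0), (1.0, 1.0, 4.0)]

# Exact observations except for the last one, which is off by 0.1 in u.
observations = {
    (0, 0): (0.0, 0.0),
    (0, 1): (0.25, 0.25),
    (1, 0): (-0.5, 0.0),
    (1, 1): (0.1, 0.25),
}

cost = reprojection_error(cameras, points, observations)
print(cost)
```

Each term depends on exactly one camera and one point, which is the sparse structure that bundle adjustment exploits.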

2.1 Related work

Aside from cystoscopy, image based reconstruction is actively investigated in other environments subject to endoscopic inspection, such as the abdomen, esophagus, colon or respiratory system. For a comprehensive, up-to-date review, we refer to Bergen et al. [6] and Maier-Hein et al. [12]. For cystoscopic environments, though, little is found in the literature on true geometric reconstruction of the urinary bladder from monocular image streams. Solutions for planar stitching, i.e. the mapping of the scene onto a planar 2D surface, have been presented by several authors [2, 4, 5, 7, 22]. Although the planar projection model provides a good overview of the captured scene, it also produces significant distortion when the scene is in fact spherical. Consequently, estimating the size and thus the severity of certain lesions is problematic.

In [4] and [7], Daul et al. propose a method for 3D reconstruction of the bladder surface using active stereo vision. A laser light pattern is projected onto the tissue surface and tracked by the camera. The 3D motion of the camera is deduced by minimizing the perspectivity which relates the known light pattern in successive views via a 2D homography. The advantage of this method is that it does not rely on salient texture features which in endoscopic scenes are often hard to find.

In [2], Behrens et al. have investigated image stitching algorithms for fluorescence cystoscopy. To reduce the amount of distortion, they suggest to not project the whole scene onto a plane but to project multiple smaller surface mosaics on a hemicube which mimics the common depictions known by most urologists from anatomical textbooks. The authors further propose an approximate 3D reconstruction by defining a virtual spherical mesh and mapping stripes of image mosaics onto that mesh [3].

The approach presented by Soper et al. [20] is closest to our goal of generating a 3D surface reconstruction of the inner bladder wall based on cystoscopic video. The authors assume manual bladder inspection to be substituted with a robot that controls a custom ultra-thin, highly flexible endoscope to scan the bladder in spiral turns of 360 degrees at constant speed and distance to the surface.

Our approach detailed below is closely related to that in [20] but aims at transferring the proposed processing pipeline to endoscopic data acquired in manual inspections using a rigid or flexible cystoscope.

Figure 1: The processing pipeline as proposed by Soper et al.

2.2 Cystoscopic reconstruction

Soper et al. suggest reconstructing the urinary bladder by iteratively optimizing the angular components of 3D features on the unit sphere, followed by a final iteration with the spherical constraint removed to let scene points converge to the true shape (Figure 1).

SIFT features [11] are extracted and subsequent frames are registered to each other using the Nearest to Second Nearest Neighbor Distance Ratio (NNDR) [19] on feature level and RANSAC [8] on frame level. To reduce the data set, Soper et al. propose to maximize the baseline between overlapping frames using a greedy adaptive step size heuristic. In our experiments we achieve a compression rate in the range of 70% to 90% on real images, which is in the same range as reported in [20].

Given that N images were selected, the search space for loops comprises $N(N-1)/2$ frame pairs. To accelerate the search, every frame is compared with only every third other frame unless overlap is detected. In that case, the current frame is also compared against all frames neighboring the successful match, until the next mismatch. In an associative matching step, frame i is finally compared with frame k if i and k both match with frame j.

With frame matching complete, 3D feature extraction follows, for which pairwise feature correspondences need to be consistently assigned to disjoint tracks to register multiple corresponding views. The solution in [20] is to check, for each feature correspondence, the history of already mapped features to see whether either feature of the pair is already assigned to some track. If neither is, a new track is started. If one feature of the pair is assigned, the other is assigned to the same track. If both features were already assigned but to different tracks, both features, including the associated tracks, are discarded. In our experiments we observed a rather high drop rate, effectively discarding close to 75% of possible tracks (Figure 4).

After 3D point extraction, reconstruction is performed by adding frames and initializing new cameras in alternating iterations, each time followed by bundle adjustment (BA). For BA to converge to a global optimum, careful initialization is essential [21]. Therefore, new cameras are initialized by projecting visible points onto the sphere and solving a Perspective-n-Point (PnP) problem according to [17]. With scene points reconstructed, Soper et al. go on to fit a spline to the point cloud to reconstruct the surface, which is textured using weighted average blending.
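The track assignment rules described above can be sketched in a few lines of Python. This is an illustrative reading of the rules, not the implementation of [20] or of this paper: features are represented as hypothetical `(frame, feature_index)` tuples, and two behaviors not specified in the text are assumptions, namely that a correspondence within the same track is ignored and that features of a discarded track remain invalid for later correspondences.

```python
def build_tracks(correspondences):
    """Group pairwise feature correspondences into disjoint tracks."""
    track_of = {}    # feature -> track id
    tracks = {}      # track id -> set of features (live tracks only)
    dropped = set()  # ids of discarded tracks
    next_id = 0

    for a, b in correspondences:
        ta, tb = track_of.get(a), track_of.get(b)
        if ta in dropped or tb in dropped:
            # Assumption: features of discarded tracks stay invalid.
            continue
        if ta is None and tb is None:
            # Neither feature is known yet: start a new track.
            tracks[next_id] = {a, b}
            track_of[a] = track_of[b] = next_id
            next_id += 1
        elif ta is None:
            # Only b is assigned: a joins the track of b.
            tracks[tb].add(a)
            track_of[a] = tb
        elif tb is None:
            # Only a is assigned: b joins the track of a.
            tracks[ta].add(b)
            track_of[b] = ta
        elif ta != tb:
            # Conflict: both features belong to different tracks,
            # so both tracks are discarded.
            for t in (ta, tb):
                dropped.add(t)
                del tracks[t]
        # ta == tb: already consistent, nothing to do (assumption).

    return list(tracks.values())


# Hypothetical correspondences between (frame, feature_index) pairs.
example = [
    ((0, 1), (1, 4)),
    ((1, 4), (2, 7)),
    ((0, 2), (1, 5)),
    ((2, 8), (3, 3)),
    ((1, 5), (2, 8)),  # links two existing tracks, so both are dropped
]
result = build_tracks(example)
print(result)
```

In this toy input, three tracks are started but only one survives, which illustrates how a single conflicting correspondence removes two tracks at once.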

Figure 2: (a) Frame matching on the virtual bladder phantom over a sequence of 400 frames. Black pixels denote a match. 98 frames were selected for reconstruction (Compression: 75.5%). The off-diagonals are characteristic for spiral-like camera motion; (b) Sparse reconstruction with the spherical constraint in place. Scene points are magnified for visual impression.

Up to this point, our algorithm closely follows Soper et al. However, we do not assume a robot-controlled motion of the cystoscope but one that is freely maneuvered. We chose SURF [1] for feature detection, and we initialize camera poses using the EPnP algorithm by Moreno-Noguer et al. [14]. Both SURF and EPnP are reported to yield nearly equal results at lower computational cost than the alternatives chosen in [20]. The frame selection described in [20] is prone to tainting the subset of frames with uninformative ones, as observed in periods of overexposure or rapid movement. This is because the algorithm reduces the baseline in response to mismatched frames. At minimal baseline, it fails to drop neighboring frames if still no match is found. Instead, both frames are selected into the subset, ultimately keeping all of the most uninformative frames. We simply chose to reinitialize in those cases, i.e. we wait until the next successful sequential match and then start adding frames again as proposed in [20].

2.3 Discussion

To verify our algorithm, ideal data is taken from a synthetic, computer generated phantom which mimics the non-spherical shape of the human bladder (Fig. 3a). The texture is chosen to be feature rich. To evaluate the frame analysis stage, image sequences from real cystoscopic procedures serve as input.

In Figure 2a, the result of frame matching for the virtual phantom is visualized. A virtual camera ride is performed at constant speed, following the spiral scan pattern described in [20]. As expected, the matrix of matched frames shows a sparse block-diagonal structure with the subset of previously selected frames on the main diagonal and non-sequential matches in the upper triangular half. Depending on spiral overlap, loops are detected with a certain periodicity resulting in further diagonals in the match table. Figure 2b shows a first result after BA.

Figure 3: (a-b) In manually controlled examinations, urologists tend to scan the bladder in a star-like trajectory joining in pivot points with high discriminative power (indicated by artificial black lines); (c) Match matrix for frame matching on real data over a sequence of 3000 frames. Black pixels denote a match. 258 frames were selected for reconstruction (Compression: 91.4%).

For a real bladder inspection, the match table is depicted in Figure 3c. The results are obtained over a sequence of 3,000 frames from which 258 views were automatically selected for further matching. The pattern of a manually controlled bladder examination is quite different from the spiral pattern observed in Fig. 2, and much sparser. Instead of showing off-diagonals, additional matches are found on diagonals orthogonal to the main diagonal. This pattern is characteristic for urologists who scan the bladder on a star-like trajectory towards the ureter, with some salient fix point chosen on the bladder wall to which they periodically return before rotating the cystoscope and starting a new scan track (Figure 3a-b). Due to re-initialization, the matrix is not contiguous on the main diagonal. This is no problem as long as the overall graph of matched frames is connected, i.e. there are no unconnected subgraphs. For this particular scene, no unconnected subgraphs are found, since the subgraphs are connected, albeit by very few pairwise matches, through frames showing the fix point chosen by the urologist.
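The connectivity requirement on the graph of matched frames can be checked with a breadth-first search over the match list. The sketch below is illustrative only: the function name `connected_components` and the toy match list are hypothetical and not taken from the implementation evaluated here.

```python
from collections import deque


def connected_components(n_frames, matches):
    """Return the connected components of the frame match graph.

    n_frames: number of selected frames, indexed 0 .. n_frames - 1
    matches:  iterable of (i, j) index pairs of successfully matched frames
    """
    adjacency = {i: set() for i in range(n_frames)}
    for i, j in matches:
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen, components = set(), []
    for start in range(n_frames):
        if start in seen:
            continue
        queue, component = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Toy example: two sequential runs separated by a re-initialization gap
# between frames 2 and 3, joined by one loop match at a shared fix point.
sequential = [(0, 1), (1, 2), (3, 4), (4, 5)]
loop = [(2, 5)]

without_loop = connected_components(6, sequential)
with_loop = connected_components(6, sequential + loop)
print(without_loop, with_loop)
```

A reconstruction over all selected frames is only possible if a single component is returned. In the toy example, one loop match is enough to join the two runs.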

Evaluating scene point extraction revealed a high loss of possible 3D features. For the synthetic scene, 14,060 tracks were established over 98 frames. Although the scene shows a distinctive texture, only 5,681 (40%) of these tracks were found consistent over at least three frames (Fig. 4). The average length of robust tracks is 3.60 frames. In the above experiment on real data over 3,000 frames, 258 views were selected. From these, 15,074 tracks were initialized during scene point extraction. After invalidating inconsistent tracks and those declared too short (L ≤ 2), the number of retained tracks amounts to only 3,145 (21%). The average length is 3.52 frames.
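The reported percentages can be reproduced with a few lines of arithmetic. The sketch assumes that the compression rate is defined as the fraction of frames that were not selected, which is consistent with the values given in the captions of Figures 2 and 3. The function names are hypothetical.

```python
def compression_rate(n_total, n_selected):
    """Fraction of frames dropped by frame selection (assumed definition)."""
    return 1.0 - n_selected / n_total


def retention_rate(n_tracks, n_retained):
    """Fraction of initialized tracks that survive track validation."""
    return n_retained / n_tracks


# Frame selection: synthetic phantom (400 frames) and real data (3000 frames).
synthetic_compression = compression_rate(400, 98)
real_compression = compression_rate(3000, 258)

# Track validation: retained tracks out of all initialized tracks.
synthetic_retention = retention_rate(14060, 5681)
real_retention = retention_rate(15074, 3145)

print(f"compression: {synthetic_compression:.1%} / {real_compression:.1%}")
print(f"retention:   {synthetic_retention:.0%} / {real_retention:.0%}")
```

The complements of the retention rates, roughly 60% for the synthetic scene and 79% for the real data, are the shares of tracks lost to inconsistencies or insufficient length.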

Figure 4: Distribution of track lengths for the synthetic bladder scene. 98 frames were tracked. 60% of the tracks created in the course of scene point extraction were rejected, either due to inconsistencies or because they were too short (L ≤ 2).

With respect to runtime, the system is currently far from real-time. Frame matching is in the range of hours on commodity hardware. However, room for improvement is expected in a timely decoupling of frame analysis and reconstruction into separate processing threads.

2.4 Conclusion

Our interest is in geometric reconstruction of the urinary bladder from cystoscopic video, for which we adapt and evaluate a method recently described in [20]. In our experiments we observed a high drop rate of features in the 3D feature extraction stage, effectively causing a high loss in geometric detail. As next steps, we will continue validating our algorithm on video material from cystoscopic procedures while adding surface interpolation, texture mapping and blending. We further plan to improve multiframe tracking to reduce the loss of geometric detail.


[1] Bay H, Tuytelaars T, Van Gool L. SURF: Speeded Up Robust Features. Europ. Conf. on Computer Vision 2006; 3951: 404–417.

[2] Behrens A, Stehle T, Gross S, Aach T. Local and global panoramic imaging for fluorescence bladder. Engineering in Medicine and Biology Society 2009; 6990–6993.

[3] Behrens A, Heisterklaus I, Müller Y, et al. 2-D and 3-D Visualization Methods of Endoscopic Panoramic Bladder Images. Medical Imaging 2011: Visualization, Image-Guided Procedures, and Modeling 2011; 7964.

[4] Ben-Hamadou A, Daul C, Soussen C, Rekik A, Blondel W. A novel 3D surface construction approach: Application to three-dimensional endoscopic data. 17th IEEE Int. Conf. on Image Processing 2010; 4425–4428.

[5] Bergen T, Wittenberg T, Münzenmayer C, Chen CCG, Hager G. A graph-based approach for local and global panorama imaging in cystoscopy. Proc. of SPIE 2013; 8671.

[6] Bergen T, Wittenberg T. Stitching and Surface Reconstruction from Endoscopic Image Sequences: A Review of Applications and Methods. IEEE Journal of Biomedical and Health Informatics 2014; 99: 2168–2194.

[7] Daul C, Blondel W, Ben-Hamadou A, et al. From 2D towards 3D cartography of hollow organs. Proc. 7th Int. Conf. on Electrical Engineering Computing Science and Automatic Control 2010: 285–293.

[8] Fischler M, Bolles R. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 1981; 6(24): 381–395.

[9] Hartley R, Zisserman A. Multiple View Geometry in Computer Vision. Cambridge UK: Cambridge Univ. Press 2003.

[10] Koppel D, Chen C-I, Wang Y-F, et al. Toward automated model building from video in computer-assisted diagnoses in colonoscopy. Medical Imaging 2007: Visualization and Image-Guided Procedures 2007; 65091–65091L-9.

[11] Lowe D. Distinctive Image Features from Scale-Invariant Keypoints. Int. Journal of Comp. Vision 2004; 2(60): 91–110.

[12] Maier-Hein L, Mountney P, et al. Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med. Img. Analysis 2013; 8(17): 974–996.

[13] Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. IEEE Trans. on Pattern Analysis and Machine Intelligence 2005; 10(27): 1615–1630.

[14] Moreno-Noguer F, Lepetit V, Fua P. Accurate Non-Iterative O(n) Solution to the PnP Problem. IEEE Int. Conf. on Computer Vision 2007: 1–8.

[15] Mountney P, Yang G. Dynamic view expansion for minimally invasive surgery using simultaneous localization and mapping. IEEE Conf. on Engineering in Medicine and Biology 2009: 1184–1187.

[16] Muja M, Lowe D. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. International Conference on Computer Vision Theory and Application 2009: 331–340.

[17] Quan L, Lan Z. Linear N-point camera pose determination. IEEE Trans. on Pattern Analysis and Machine Intelligence 1999; 8(21): 774–780.

[18] Seitz S, Curless B, Diebel J, Scharstein D, Szeliski R. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms. IEEE Conf. on Computer Vision and Pattern Recognition 2006; 1: 519–528.

[19] Szeliski R. Computer Vision: Algorithms and Applications. Springer 2011.

[20] Soper T, Porter M, Seibel E. Surface Mosaics of the Bladder Reconstructed From Endoscopic Video for Automated Surveillance. IEEE Trans. on Biomed. Eng. 2012; 6(59): 1670–1680.

[21] Triggs B, McLauchlan P, Hartley R, Fitzgibbon A. Bundle Adjustment – A Modern Synthesis. In: Vision Algorithms: Theory and Practice. Springer 2000: 298–372.

[22] Weibel T, Daul C, Wolf D, Rösch R, Guillemin F. Graph based construction of textured large field of view mosaics for bladder cancer diagnosis. Pattern Recognition 2012; 12(45): 4138–4150.

About the article

Published Online: 2015-09-12

Published in Print: 2015-09-01

Author’s Statement

Conflict of interest: Authors state no conflict of interest. Material and Methods: Informed consent: Informed consent has been obtained from all individuals included in this study. Ethical approval: The research related to human use has complied with all the relevant national regulations and institutional policies, is in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.

Citation Information: Current Directions in Biomedical Engineering, Volume 1, Issue 1, Pages 470–474, ISSN (Online) 2364-5504, DOI: https://doi.org/10.1515/cdbme-2015-0113.

© 2015 by Walter de Gruyter GmbH, Berlin/Boston.