Computer vision is one of the most intensively developed areas of information technology. Many high-tech and commercially successful products have been built on computer vision theory and the available hardware, which fuels growing interest in the field among both developers and consumers.
The term “computer vision” covers a set of technologies that use digital image processing to solve pattern recognition tasks and to determine the shape, spatial orientation, and other characteristics of objects. This information about the objects can in turn be used to solve practical tasks, for example: motion control of various systems based on visual analysis of their environment, production quality control, person identification by facial shape, 3D scanning of object surfaces, etc.
Computer vision systems fall into two main categories: two-dimensional (2D) and three-dimensional (3D). 2D systems process “flat” images in X and Y coordinates, while 3D systems process 3D images in which a depth coordinate Z is present in addition to X and Y. The coordinates can be expressed either in pixel form or in global form. The global coordinate system is the main system for defining the positions of all scene objects.
One of the most important computer vision tasks is determining the 3D coordinates of an arbitrarily chosen point in the scene observed by a stereoscopic system. The authors had to solve this problem while building a robotic system with computer vision capability, and a wide literature review was conducted for this purpose. However, none of the reviewed articles gives a clear, consistent, and logically structured presentation of how to solve this problem, including schemes, formulas, algorithms, and experimental verification.
None of the considered works [1, 2, 3, 4, 5, 6, 7, 8, 9] studies the optical scheme of the stereoscopic system, which is the basis for constructing the geometric scheme. For example, works [1, 2, 3, 4, 5] present the geometric model of a simple stereoscopic system together with formulas for calculating space coordinates. However, these formulas are given only in geometric form without reference to the optical scheme, their pixel interpretation is not provided, and the construction of the geometric scheme is not substantiated. Below we give a rigorous proof that the formula for the depth coordinate Z differs from the similar formula presented in the work . The formulas considered in works [1, 2, 3, 4, 5] have a geometric representation and cannot be applied directly to real stereoscopic systems, since the real data of the obtained images is presented in pixel form. In the work  it is indicated that the center of global coordinates coincides with the image center of the left camera. In these works it is not pointed out where this image is located: on the sensor or in the lens plane. In the work  the formula for calculating the depth coordinate Z is presented for the global coordinate system, and the source pixel coordinates in the image stereopair are used for the calculation. However, the derivation of this formula is not disclosed, and there are no schemes of its optical and geometric interpretation. In the work  some solutions of stereo vision tasks are also considered, in particular the method of determining point coordinates as the intersection point of two rays perpendicular to the planes of the cameras. This work also gives no exact formulas for calculating the space coordinates of an arbitrary scene point.
Works [6, 8, 9] consider the practical application of stereoscopic 3D computer vision systems. In particular, the work  describes a geoinformation system that integrates virtual reality and a stereoscopic system. The work  studies the use of a stereoscopic computer vision system for controlling the movement of a robot manipulator. It proposes using visual feedback to achieve high positioning accuracy of the servo system relative to the observed target despite calibration errors. The work  provides information on using a stereoscopic 3D computer vision system to detect obstacles on a road for traffic safety.
The work  analyses the errors of object distance calculation using a stereoscopic system. It finds that the percentage error of distance calculation is inversely proportional to the number of pixels in the shift between the two images, and directly proportional to the object distance.
On the basis of the conducted review, the following conclusions can be made:
Among the numerous publications available both on the Internet and in journals, we could not find an exact and consistent description of the question considered in this article;
In none of the sources could we find an optical scheme of the stereoscopic system suitable for calculating the 3D coordinates of scene points. This optical scheme is the basis for constructing the geometric scheme that makes it possible to calculate 3D coordinates;
In many publications this question is studied superficially, and there are no links to sources where a detailed description and derivation of the proposed formulas can be found;
Most of the publications do not address how to express the global coordinates of a scene point through the pixel representations of this point on the stereo images. As a rule, the formulas have a geometric interpretation, which makes their practical use difficult;
Almost none of the considered articles provides a software implementation of the studied problem;
In all the works we studied, the information on the given subject is presented so briefly that it is difficult to understand it and to check its validity.
In this work we attempt to eliminate all these shortcomings. First, the formulas for calculating the global coordinates of a material point are obtained in geometric form and are then transformed into pixel form. The present work certainly does not study all the problems of calculating 3D point coordinates using a stereoscopic system; it only supplements and generalizes the earlier works. In this article we have tried to present the solution of the considered task in an easy and understandable form.
As there is a considerable need for 3D machine vision systems that are relatively simple to implement but effective, this article can be used for the development of such systems. It presents a detailed description of a relatively simple 3D machine vision system which, despite its simplicity, can be applied to a number of real practical tasks. For example, it can be used in home robotics, where the accuracy requirements for the applied algorithms are not so strict.
2 The Task Formulation
The task of this article is a detailed consideration of the method for calculating the global 3D coordinates of a point (point object) from an image stereopair that is obtained from two cameras with parallel optical axes and contains the image of the given point. In this case, any visible object whose size in the image is smaller than the whole image can be regarded as the point. To solve this task, the formulas for calculating 3D point coordinates are constructed on the basis of a geometric approach. For this purpose the optical scheme of the ray paths in the stereoscopic system is used, and triangulation methods are applied. A stepwise description is given, as an example, of the algorithmic and hardware implementation of a simple computer vision system for calculating the 3D coordinates of a small shining lamp, and practical recommendations are suggested for determining the unknown parameters of the optical system. In conclusion, the results of a comparative evaluation of coordinate calculation accuracy are given for the considered method, and recommendations are provided for solving a number of practical tasks with this method.
3 Solution of the 3D Coordinate Calculation Task for a Material Point
Let an arbitrary point P(x, y, z) in 3D space lie in the field of view of two video cameras, where x, y, z are the global coordinates to be determined. The optical scheme of the considered stereoscopic system is presented in Figures 1a and 2a. A 3D orthogonal coordinate system XYZ is used for this scheme; its origin O coincides with the center of the left camera matrix, and the axis Z is perpendicular to the plane YOX, which contains the CCD matrices of the left and the right video cameras.
The left and the right video cameras are shown in the scheme (Figure 1a). Their optical axes are parallel, pass through points O1 and O2 respectively, and are perpendicular to the plane in which the CCD matrices lie. Let the distance O1O2 between the axes of the two cameras be denoted by b; it is usually called the “baseline”.
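According to the similarity of the right triangles $ABC_L$ and $AGP$, it follows that

$$\frac{GP}{BC_L} = \frac{AG}{AB}, \tag{1}$$

and according to the similarity of the right triangles $KEC_R$ and $KSP$, it follows that

$$\frac{PS}{C_RE} = \frac{KS}{KE}. \tag{2}$$

Since $AB = KE = h$, $WP = x$, $GW = BC_L$ and $AG = KS = z$, we have $GP = GW + WP = BC_L + x$, and from Equation (1) it is possible to derive the following:

$$\frac{x + BC_L}{BC_L} = \frac{z}{h}. \tag{3}$$

Since $PS = O_1O_2 - x + O_2K$ and $US = C_RE$, then $PS = b - x + C_RE$, and from Equation (2) the following relation is obtained:

$$\frac{b - x + C_RE}{C_RE} = \frac{z}{h}. \tag{4}$$

Since the right parts of the latter relations are equal, then

$$\frac{x + BC_L}{BC_L} = \frac{b - x + C_RE}{C_RE}.$$

It results in

$$x = \frac{b \cdot BC_L}{BC_L + C_RE}. \tag{5}$$

Then from Equation (3) the following relation is obtained:

$$z = h\left(1 + \frac{x}{BC_L}\right) = h\left(1 + \frac{b}{BC_L + C_RE}\right). \tag{6}$$

According to the similarity of the right triangles $PTM$ and $C_LVM$, it follows that

$$\frac{PT}{C_LV} = \frac{TM}{VM}. \tag{7}$$

Since $VM = h$, $PT = PJ + JT$, $PJ = y$, $JT = C_LV$ and $TM = z$, Equation (7) is transformed as follows:

$$\frac{y + C_LV}{C_LV} = \frac{z}{h}, \quad \text{i.e.} \quad y = C_LV\left(\frac{z}{h} - 1\right). \tag{8}$$

From Equation (6), the following relation is derived:

$$\frac{z}{h} - 1 = \frac{b}{BC_L + C_RE}.$$

Substituting it into the latter expression, we obtain

$$y = \frac{b \cdot C_LV}{BC_L + C_RE}. \tag{9}$$

Thus, on the basis of the above calculations, the space coordinates of the arbitrary point are calculated by the following formulas:

$$\begin{cases} x = \dfrac{b \cdot BC_L}{BC_L + C_RE}, \\[2mm] y = \dfrac{b \cdot C_LV}{BC_L + C_RE}, \\[2mm] z = h\left(1 + \dfrac{b}{BC_L + C_RE}\right). \end{cases} \tag{10}$$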
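The similar-triangle relations of this section can be checked numerically. The sketch below is only an illustration under stated assumptions: it solves the three similarity relations for x, y, z (giving x = b·BC_L/(BC_L + C_RE), y = b·C_LV/(BC_L + C_RE), z = h·(1 + b/(BC_L + C_RE))), and it uses a hypothetical forward model `project` (an ideal pinhole lens located at distance h in front of the sensor) to generate consistent image offsets. The function names and the sample values are not taken from the article.

```python
def triangulate_geometric(bc_l, c_r_e, c_l_v, b, h):
    """Global (x, y, z) of point P from metric image offsets.

    bc_l  : horizontal offset BC_L of the left image point from the left optical axis
    c_r_e : horizontal offset C_RE of the right image point from the right optical axis
    c_l_v : vertical offset C_LV of the left image point
    b, h  : baseline and lens-to-sensor distance, in the same unit as the offsets
    """
    s = bc_l + c_r_e  # sum of the horizontal offsets (metric disparity)
    if s <= 0:
        raise ValueError("non-positive disparity: point too far or mismatched")
    x = b * bc_l / s
    y = b * c_l_v / s
    z = h * (1.0 + b / s)  # z is measured from the sensor plane
    return x, y, z


def project(x, y, z, b, h):
    """Hypothetical forward model (assumption): ideal pinhole at distance h from the sensor."""
    depth = z - h  # distance from the lens plane to P
    bc_l = h * x / depth
    c_r_e = h * (b - x) / depth
    c_l_v = h * y / depth
    return bc_l, c_r_e, c_l_v


if __name__ == "__main__":
    b, h = 97.0, 4.0  # millimeters; h is an assumed value
    offsets = project(30.0, 12.0, 1500.0, b, h)
    print(triangulate_geometric(*offsets, b, h))
```

Projecting a known point and triangulating it back should return the original coordinates up to floating-point error, which is a quick way to catch a sign or index mistake in the formulas.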
Since these formulas were obtained as relations between geometric segments, while the image on the CCD matrix is represented in the pixel coordinate system, the formulas have to be matched with the pixel representation. The CCD matrices of the left and the right cameras have their pixel coordinate origins OL and OR respectively, located in the upper left corner of each matrix. The directions of the pixel axes OLXL and ORXR along the width coincide with the direction of the axis O1X, and the directions of the pixel axes OLYL and ORYR along the height are opposite to the direction of the axis O1Y. This is because in the images the coordinate origin is the upper left corner of the CCD matrix, so the values along the axis YL increase from top to bottom (relative to the axis O1Y these values decrease), and the values along the axis XL increase from left to right (i.e. in the same way as along the axis O1X). It should be noted that the CCD matrices of both video cameras have the same fixed sizes: w is the width in pixels along the axis XL, and d is the height in pixels along the axis YL. In turn, each pixel of the CCD matrices has the same size m both along the width and along the height.
Suppose that an arbitrary point P(x, y, z) exists in the scene space of the stereoscopic system, and that its stereopair is depicted on the flat images of the CCD matrices as two pixel points PL(xL, yL) and PR(xR, yR) of the left and the right video cameras respectively (Figure 1b). The following task arises: how to calculate the 3D coordinates of the material point P(x, y, z) from its pixel images PL(xL, yL) and PR(xR, yR) in the stereopair.
To solve this problem, it is necessary to pass from the geometrically obtained Equations (10) for the global coordinates to their pixel interpretation. The centers O1 and O2 of the CCD matrices have the pixel coordinates (w/2, d/2) in their respective images. From the geometric scheme (Figure 1b) it is seen that when point P lies within the stereoscopic system base, the x coordinate of its image is negative relative to O1Z on the left camera and positive on the right camera. It can also be seen from the geometric scheme that if point P lies strictly above the plane ZO1X, then the global y coordinate of its image on the left camera CCD matrix is negative. It is also evident from the optical scheme in Figure 1a that O1YL = O2YR, which means that the images on the CCD matrices have the same height in global coordinates for the left and the right cameras, i.e. yL = yR. Then the pixel points PL(xL, yL) and PR(xR, yR) have the following global coordinates:
To simplify the notation, a shorthand is introduced, and Equation (14) can then be written in the following form:
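As a rough illustration of the pixel form, the sketch below expresses the image offsets of the similar-triangle construction through pixel coordinates. It rests on assumptions that are not confirmed by the text here: the frames are upright (not mirrored relative to the scene), so that the horizontal offsets are BC_L = m·(xL − w/2) and C_RE = m·(w/2 − xR), the vertical offset is C_LV = m·(d/2 − yL), and their sum reduces to m·(xL − xR). The function name `triangulate_pixels` is hypothetical, and a camera that delivers mirrored frames would need the signs adjusted.

```python
def triangulate_pixels(xl, yl, xr, b, h, m, w, d):
    """Global (x, y, z) of a point from its pixel coordinates in a stereopair.

    xl, yl : pixel coordinates of the point in the left image
    xr     : pixel x coordinate of the point in the right image
    b      : baseline; h : lens-to-sensor distance; m : pixel size (same length unit)
    w, d   : image width and height in pixels
    Assumes upright images with the pixel origin in the upper left corner.
    """
    disparity = xl - xr  # in pixels; positive for a point in front of the cameras
    if disparity <= 0:
        raise ValueError("non-positive disparity: point too far or mismatched")
    x = b * (xl - w / 2.0) / disparity
    y = b * (d / 2.0 - yl) / disparity  # pixel y axis points down, global Y points up
    z = h * (1.0 + b / (m * disparity))  # measured from the sensor plane
    return x, y, z
```

Note that x and y depend only on b and on pixel quantities, while z additionally requires the pixel size m and the distance h, which is why those two parameters have to be measured or taken from documentation.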
4 Determination of Physical Sizes of an Optical Sensor Pixel
The simplest way to determine the physical pixel size is to take it from the technical documentation of the camera producer or of the optical sensor. Sensor documentation usually contains this information. However, in most cases camera producers do not state the physical pixel size in their documentation for commercial reasons, and there is usually no information about the producer and model number of the optical sensor either. In such cases a manual measurement can be carried out. For this purpose it is necessary to disassemble the camera, put a ruler along the sensor, and take a high-resolution digital snapshot (Figure 4). Using a simple graphics editor, the following parameters are measured in the snapshot in pixels: the length of one centimeter (C), and the width (W) and the height (H) of the photosensitive region of the sensor (the nacreous area in Figure 4). In the course of the conducted measurements the following values were obtained: C = 712, W = 312, H = 238. In addition, it is known that the considered camera has an actual maximal resolution of 1600x1200 pixels (RW = 1600, RH = 1200). The physical width and height of one pixel in millimeters were calculated for this camera in the following way:

$$m_W = \frac{10\,W}{C \cdot R_W} = \frac{10 \cdot 312}{712 \cdot 1600} \approx 0.00274\ \text{mm}, \qquad m_H = \frac{10\,H}{C \cdot R_H} = \frac{10 \cdot 238}{712 \cdot 1200} \approx 0.00279\ \text{mm}.$$

Thus, after averaging and rounding, the physical pixel size for the considered camera can be taken equal to 0.0028 mm.
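The measurement above amounts to a short calculation, sketched here with the values quoted in the text (C = 712, W = 312, H = 238, RW = 1600, RH = 1200). The function name and the intermediate variable names are illustrative only; the factor 10 converts centimeters to millimeters.

```python
def pixel_size_mm(c_px_per_cm, w_px, h_px, res_w, res_h):
    """Estimate the physical pixel size from a snapshot of the sensor next to a ruler.

    c_px_per_cm : snapshot pixels per one centimeter of the ruler (C)
    w_px, h_px  : width and height of the photosensitive region in snapshot pixels (W, H)
    res_w, res_h: maximal camera resolution in pixels (RW, RH)
    """
    sensor_w_mm = 10.0 * w_px / c_px_per_cm  # physical sensor width in mm
    sensor_h_mm = 10.0 * h_px / c_px_per_cm  # physical sensor height in mm
    m_w = sensor_w_mm / res_w                # pixel width in mm
    m_h = sensor_h_mm / res_h                # pixel height in mm
    return m_w, m_h, (m_w + m_h) / 2.0


m_w, m_h, m = pixel_size_mm(712, 312, 238, 1600, 1200)
print(round(m, 4))  # 0.0028
```

The two estimates differ slightly (about 0.00274 mm and 0.00279 mm), which reflects the accuracy of reading lengths off the snapshot; averaging them gives the value used in the text.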
5 The Hardware and Software Realization
Figure 5 shows the external view of the stereoscopic system, which consists of two cameras with parallel optical axes mounted on a tripod. A video monitor is used to display the results of measuring the coordinates of the small shining lamp of a simple pocket flashlight. In this case the flashlight lamp serves as the “material point”. An incandescent lamp was used, since a considerable part of its energy is emitted in the infrared (thermal) range. The technical specification of the cameras is presented in Table 1. The cameras were modified by mounting a light filter that blocks the visible spectrum and transmits the infrared spectrum. Developed photographic film that had previously been exposed to daylight was used for this purpose. The cameras were connected to the computer through a USB port. On the basis of Equation (15), software was developed to verify the accuracy of the given method. This software calculates the 3D coordinates of the lamp. The calculation is performed according to the following principles:
Two images are received by the program (one from each camera).
The images are additionally filtered by the program according to a given brightness threshold and are transformed into black-and-white format. The brightness threshold is chosen experimentally so that only the one brightest spot, which corresponds to the shining lamp, is identified in each image. Thus, only one pixel set is identified in each image.
The centers of mass of each identified pixel set are calculated along the X and Y coordinates. They define the central pixels of the light spots in the images of the left and the right cameras respectively: PL(xL, yL) and PR(xR, yR).
Using Equation (15), the 3D coordinates of the lamp are calculated and then displayed on the screen.
The parameters b, h, m, w, d are set in the program beforehand.
All the above operations are performed each time the images received from the cameras are refreshed, according to the fixed frame rate. This makes it possible to calculate the global coordinates of the shining lamp automatically while it moves in front of the stereopair. If the frame rate is set sufficiently high, the coordinates are calculated in real time.
The program also has a mode in which the center of the object whose coordinates are to be calculated is identified manually in the images. This mode can be used when the object is located relatively far away and, correspondingly, its image is not identified well enough by the automatic method described above.
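The thresholding and center-of-mass steps of the automatic mode can be sketched as follows. This is a minimal illustration with NumPy rather than the authors' program: the function name `bright_spot_centroid` is hypothetical, the image is assumed to be a 2D grayscale array, and, as in the text, the threshold is assumed to be chosen so that the above-threshold pixels form a single spot.

```python
import numpy as np


def bright_spot_centroid(gray, threshold):
    """Return the (x, y) center of mass of all pixels at or above the threshold.

    gray      : 2D array of brightness values (rows correspond to y, columns to x)
    threshold : brightness level chosen experimentally so that only the lamp remains
    Returns None when no pixel reaches the threshold.
    """
    mask = np.asarray(gray) >= threshold  # black-and-white (binary) image
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)             # row and column indices of spot pixels
    return float(xs.mean()), float(ys.mean())


if __name__ == "__main__":
    frame = np.zeros((10, 10), dtype=np.uint8)
    frame[2:5, 6:9] = 255                 # synthetic 3x3 bright spot
    print(bright_spot_centroid(frame, 200))  # (7.0, 3.0)
```

Applying the same function to the left and the right frame yields the pixel points PL(xL, yL) and PR(xR, yR) that feed the coordinate calculation. Because the centroid is an average, it can have a fractional value, which gives sub-pixel resolution for a spot covering several pixels.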
6 The Calibration Algorithm of Mutual Parallelism of Optical Axes of Cameras in the Stereoscopic System
From the geometric scheme of the stereoscopic system (Figure 1b) it is seen that even a small deviation of the optical axes from mutual parallelism can result in a considerable coordinate measurement inaccuracy, which grows as the measured distance increases. To facilitate the adjustment of the stereoscopic system, a simple procedure for calibrating the mutual parallelism of the camera optical axes can be used.
The calibration process is based on the fact that, according to Equation (14), increasing the distance to the observed point object decreases the difference xL − xR. For discrete digital optical systems, if the distance to the object is many times greater than the base distance b between the camera optical centers, the difference xL − xR becomes equal to zero for a sufficiently distant point object. If the optical axes of the stereoscopic system shown in Figures 1 and 2 are perfectly parallel, then a sufficiently distant point object should have the same coordinates in both images (xL = xR, yL = yR).
Thus, the following algorithm can be used to calibrate the mutual parallelism of the camera optical axes:
Direct the stereoscopic system being calibrated at a very distant point object that is well identified in both images.
Adjust the mutual orientation of the cameras sequentially in different planes until the coordinates of the chosen point object are equal in both images (xL = xR, yL = yR).
Rigidly fix the mutual position of the cameras.
It is enough to perform this procedure only once before using the stereoscopic system.
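The equality test in the second step of the calibration algorithm can be expressed as a small helper that reports how far the two images of the distant point are from coinciding. The sketch below is an illustration only; the function names and the tolerance `tol_px` are assumptions and are not part of the described procedure.

```python
def parallelism_offset(p_left, p_right):
    """Pixel offsets (dx, dy) between the images of one distant point.

    p_left, p_right : (x, y) pixel coordinates of the point in the left and right image
    For perfectly parallel axes and a sufficiently distant point both offsets are zero.
    """
    dx = p_left[0] - p_right[0]
    dy = p_left[1] - p_right[1]
    return dx, dy


def is_aligned(p_left, p_right, tol_px=1.0):
    """True when both offsets are within the assumed tolerance tol_px (in pixels)."""
    dx, dy = parallelism_offset(p_left, p_right)
    return abs(dx) <= tol_px and abs(dy) <= tol_px


if __name__ == "__main__":
    print(parallelism_offset((812.0, 590.0), (805.0, 593.0)))  # (7.0, -3.0)
    print(is_aligned((812.0, 590.0), (811.5, 590.0)))           # True
```

During adjustment, the sign of dx indicates in which direction the cameras should be turned toward or away from each other, and dy indicates the required tilt correction.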
7 The Experimental Measurement of the Object Distance Estimation Accuracy
Using the hardware and software system described above, a set of experiments was conducted that allows certain conclusions to be drawn about the accuracy of the given approach and about the conditions for achieving the best accuracy. The accuracy of calculating the depth distance to the object (coordinate Z) was determined. The experiment was conducted in two stages: for close distances with a relatively small baseline b = 9.7 cm (Figure 6a) and for far distances with an enlarged baseline b = 51.15 cm (Figure 6b). The measurement inaccuracy was calculated, and the inaccuracy-distance diagram was built (Figure 7). The following equipment was additionally used in the experiment: millimeter graph paper, a low-power laser, two triangular rulers, a sheet of A5 paper, a tripod with a plumb, and a small colored triangle fixed on this plumb.
At the first stage of the experiment, the baseline between the cameras was fixed at 9.7 cm. This choice was motivated by the distance between human eyes. A long band of millimeter graph paper was stretched on a flat surface, and the stereoscopic system was placed on it in such a way that its left camera center O1 coincided exactly with the coordinate origin marked on the paper (Figure 6a). A plumb fixed on the left camera was also used for this purpose. The plumb carrying the object for distance measurement was mounted on the tripod and was placed sequentially at distances from the stereoscopic system with an interval of 10 cm. After each repositioning of the object, the distance to the object was measured both physically (using the graph paper) and algorithmically. In this experiment, to obtain the highest accuracy, the colored paper triangle fixed on the plumb was used as the object for distance measurement instead of the incandescent lamp. The coordinates were calculated in a semiautomatic way: first the operator manually pointed out the upper vertex of the triangle with the mouse in both images of the stereopair, and then the object coordinates were calculated algorithmically. In this case, the cameras worked in standard mode without the special light filter. The use of the plumb allowed precise vertical alignment of the object and the cameras relative to the horizontal plane. The results obtained by physical measurements were taken as the reference, and the inaccuracy of the algorithmic results was calculated relative to them. The results of the conducted experiment are presented in Figure 7a.
It should be noted that the accuracy of object coordinate calculation depends both on the accuracy with which b, h, m, w, d are determined and on the degree of mutual parallelism of the optical axes of the left and the right cameras. At the first stage of the experiment their parallelism was adjusted manually and visually (“by sight”). While determining the values of the parameters listed above is not so difficult, adjusting the parallelism of the optical axes manually is not a trivial task. The calibration algorithm considered above serves this purpose.
At the second stage of the experiment, b = 9.7 cm was used for close distances (up to 400 cm) and b = 51.15 cm for far distances (400 cm and more). In both cases the mutual parallelism of the camera optical axes was calibrated beforehand, which substantially decreased the absolute inaccuracy for relatively far distances in comparison with the results obtained after the calibration “by sight” (Figure 7b). For far distances, the spot of a laser beam falling on a vertically placed sheet of white paper was used as the object to which the distance was measured. To keep the sheet vertical, it was attached to a stand consisting of two triangular rulers fastened together.
As is evident from the above experimental results, the inaccuracy increases with the measured distance according to an exponential law (Figure 7). Preliminary calibration of the mutual parallelism of the camera optical axes makes it possible to decrease this inaccuracy considerably for far distances. To provide acceptable accuracy, the base distance b between the cameras should be chosen according to the following rule: the farther the object is from the stereoscopic system, the longer the base distance should be, and conversely, the nearer the object, the smaller b should be.
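One contribution to this behavior, pixel quantization, can be illustrated numerically. The sketch below assumes the depth relation z = h·(1 + b/(m·Δ)), where Δ is the pixel shift between the two images, which is what the similar-triangle construction of Section 3 gives when the sum of the image offsets is written as m·Δ. It estimates how much the computed depth changes when Δ changes by one pixel. The baselines (97 mm and 511.5 mm) and the pixel size (0.0028 mm) are taken from the text; the value h = 4 mm and the function names are assumptions made only for the illustration, and other error sources such as axis misalignment are not modeled.

```python
def depth_from_disparity(disp_px, b, h, m):
    """Depth z (from the sensor plane) for a pixel shift disp_px between the images."""
    return h * (1.0 + b / (m * disp_px))


def depth_step(z, b, h, m):
    """Change of the computed depth when the pixel shift at depth z drops by one pixel."""
    disp = h * b / (m * (z - h))  # exact (non-integer) pixel shift at depth z
    if disp <= 1.0:
        return float("inf")       # the point can no longer be resolved
    return depth_from_disparity(disp - 1.0, b, h, m) - z


if __name__ == "__main__":
    h, m = 4.0, 0.0028            # mm; h is an assumed value
    for b in (97.0, 511.5):       # the two baselines used in the experiment, in mm
        for z in (1000.0, 4000.0):
            print(f"b={b} mm, z={z} mm: step={depth_step(z, b, h, m):.1f} mm")
```

Under these assumptions the depth step per pixel grows faster than linearly with distance and shrinks when the baseline is enlarged, which is in line with the rule for choosing b stated above.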
In developing effective computer vision systems, it is possible to use the existing open source libraries [12, 13]. However, some developers face certain difficulties: an insufficient understanding of the essence of the algorithms does not allow them to fine-tune these algorithms for a specific machine vision hardware system. This in turn results in a considerable degradation of accuracy and makes the algorithms unusable in real systems. Since these libraries are often developed according to the principle of universality, their functions contain a certain algorithmic redundancy. This redundancy can be undesirable in the development of high-performance or embedded systems.
As there is a considerable need for relatively simple but effective 3D machine vision systems, the information given in this article can be used for the development of such systems. This study presented the solution of the task of calculating 3D point coordinates using a stereoscopic system with parallel optical axes. Some problematic issues were considered in a sequential and understandable form, and their solutions are sufficient for creating relatively simple 3D machine vision systems that are also quite suitable for real applications. In particular, the described algorithms can potentially be used in robotic 3D machine vision systems. For example, on the basis of the algorithms described above, a relatively simple machine vision system can be developed that identifies a single-colour object in the image stereopair by a preset colour and calculates its space coordinates. To calculate the space coordinates of the object, the centers of mass of its constituent pixels are used as the 2D object coordinates in the image stereopair. The robotic system can then use these coordinates to grasp the object or to move towards it. The algorithms described above were used to implement a movable robotic platform equipped with a stereoscopic 3D machine vision system, whose external view is presented in Figure 8.
Algorithms that build a map of the mutual disparity of pixels in the two images of a stereopair can be used to extend the functionality of the 3D machine vision system [10, 11]. With the help of these algorithms it is possible to find, for each pixel of one image, the corresponding pixel in the other image (provided that the given pixel is observable in both images). Its space coordinates can then be calculated using Equation (15). Thus, a 3D mesh of the observed scene can be formed on the basis of the image stereopair, i.e. 3D scanning can be performed.
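Once a disparity map is available, converting it to depth is an element-wise operation. The sketch below is an illustration with NumPy and assumes the depth relation z = h·(1 + b/(m·Δ)) that follows from the similar-triangle construction of Section 3, with Δ the per-pixel shift in pixels. The function name `depth_map` is hypothetical, the disparity map is assumed to come from some external matching algorithm, and pixels without a valid match (Δ ≤ 0) are marked as NaN.

```python
import numpy as np


def depth_map(disparity_px, b, h, m):
    """Convert a per-pixel disparity map (in pixels) to depth z from the sensor plane.

    disparity_px : 2D array of pixel shifts between the left and right images
    b, h, m      : baseline, lens-to-sensor distance, pixel size (same length unit)
    Pixels with non-positive disparity have no valid match and become NaN.
    """
    disp = np.asarray(disparity_px, dtype=float)
    z = np.full(disp.shape, np.nan)
    valid = disp > 0
    z[valid] = h * (1.0 + b / (m * disp[valid]))
    return z


if __name__ == "__main__":
    d = np.array([[0.0, 10.0], [20.0, 40.0]])
    print(depth_map(d, b=97.0, h=4.0, m=0.0028))  # h = 4.0 mm is an assumed value
```

The x and y coordinates of each pixel can be obtained in the same vectorized manner, after which neighbouring pixels can be connected to form the mesh mentioned above.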
The authors would like to thank Dr. Dzhangerey Ashigaliyev for his contributions to this work. This work is supported by the Science Committee of RK under grant num. BR05236839.
Kristian Ambrosch, Martin Humenberger, Wilfried Kubinger, and Andreas Steininger. 2008. Flexible Hardware-Based Stereo Matching. EURASIP Journal on Embedded Systems 2008 (2008). http://dx.doi.org/10.1155/2008/386059
Myron Z. Brown, Darius Burschka, and Gregory D. Hager. 2003. Advances in Computational Stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 8 (Aug. 2003), 993–1008.
Umesh R. Dhond and Jake K. Aggarwal. 1989. Structure from Stereo - A Review. IEEE Transactions on Systems, Man, and Cybernetics 19, 6 (Nov. 1989), 1489–1510.
David Forsyth. 2003. Computer vision: A modern approach. Prentice Hall, Upper Saddle River, NJ.
Gregory D. Hager, Wen-Chung Chang, and Stephen A. Morse. 1995. Robot hand-eye coordination based on stereo vision. IEEE Control Systems Magazine 15, 1 (April 1995), 30–39.
Ta-Te Lin, Yuan-Kai Hsiung, Guo-Long Hong, Hung-Kuo Chang, and Fu-Ming Lu. 2008. Development of a virtual reality GIS using stereo vision. Computers and Electronics in Agriculture 63, 1 (Aug. 2008), 38–48.
Mauricio Marengoni and Denise Stringhini. 2011. High level computer vision using OpenCV. In Graphics, Patterns and Images Tutorials (24th SIBGRAPI-T). IEEE, 11–24. http://dx.doi.org/10.1109/SIBGRAPI-T.2011.11
Eugene S. McVey and Jong W. Lee. 1982. Some accuracy and resolution aspects of computer vision distance measurements. IEEE Trans. Pattern Anal. Mach. Intell. 4, 6 (Nov. 1982), 646–649.
Sergiu Nedevschi, Radu Danescu, Dan Frentiu, Tiberiu Marita, Florin Oniga, Ciprian Pocol, Rolf Schmidt, and Thorsten Graf. 2004. High Accuracy Stereo Vision System for Far Distance Obstacle Detection. In IEEE Intelligent Vehicles Symposium. IEEE, 292–297. http://dx.doi.org/10.1109/IVS.2004.1336397
Chang S. Park and Hyun W. Park. 2000. A robust stereo disparity estimation using adaptive window search and dynamic programming search. Pattern Recognition 34, 12 (April 2000), 2573–2576. http://dx.doi.org/10.1016/S0031-3203(01)00016-4
Shishir Shah and Jake K. Aggarwal. 1997. Mobile robot navigation and scene modeling using stereo fish-eye lens system. Machine Vision and Applications 10, 4 (Nov. 1997), 159–173.
Linda G. Shapiro and George C. Stockman. 2001. Computer Vision. Prentice Hall, Upper Saddle River, NJ.
Shane Tuohy, Diarmaid O’Cualain, Edward Jones, and Martin Glavin. 2010. Distance determination for an automobile environment using Inverse Perspective Mapping in OpenCV. In Signals and Systems Conference. IET Irish, 100–105.
About the article
Published Online: 2018-05-08
Citation Information: Open Engineering, Volume 8, Issue 1, Pages 109–117, ISSN (Online) 2391-5439, DOI: https://doi.org/10.1515/eng-2018-0016.
© 2018 R.R. Mussabayev et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (BY-NC-ND 4.0).