Research on online calibration of lidar and camera for intelligent connected vehicles based on depth-edge matching



Introduction
Autonomous driving technology has been pursued throughout the invention, mass production and popularization of the automobile, and it is the core technology of intelligent connected vehicles. The world's first autonomous navigation vehicle was developed by a foreign team and could travel automatically along a preset track [1]. In recent years, with the development and gradual maturation of computer science and artificial intelligence, autonomous driving technology has once again entered a period of rapid development [1,2]. A complete autonomous driving system can be divided into six parts: building high-precision maps; localizing the vehicle in its initial and driving states; perceiving the surrounding environment; predicting how the surrounding environment may evolve; planning a global path from the starting point to the destination; and converting the result of local path planning into control signals that drive the vehicle. All six parts depend on receiving and processing information from various sensors.
Perception quality in complex environments is improved through sensor fusion, which enables comprehensive environmental awareness. Fusion can be performed at multiple levels, including data fusion, feature fusion and model fusion. Through multilevel fusion, different types of high-precision information can be obtained: data fusion can reduce measurement errors, feature fusion can improve recognition accuracy, and model fusion can deepen the understanding of complex scenes. In the perception layer of autonomous driving systems, the most frequently used sensors are RGB cameras (monocular, binocular) and multi-line lidar [2,3]. The lidar emits laser pulses into the surrounding environment and receives the reflected light; the distance between the detected point and the lidar is calculated from the time difference between emission and reception and the speed of light. Based on this ranging principle, lidar can obtain high-precision depth information over 360 degrees horizontally and a limited angular range vertically [4,5]. However, the number of lidar beams is limited, so only sparse points with limited resolution can be obtained.
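The time-of-flight ranging principle described above can be sketched in a few lines. This is an illustrative computation only; the timestamps and the example return time are made-up values.

```python
# Minimal sketch of the lidar time-of-flight ranging principle: the laser pulse
# travels to the target and back, so the one-way distance is half the
# round-trip time multiplied by the speed of light.

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(t_emit: float, t_receive: float) -> float:
    """Distance to the detected point from emission/reception timestamps (seconds)."""
    round_trip = t_receive - t_emit
    return 0.5 * round_trip * SPEED_OF_LIGHT

# A pulse returning after about 667 ns corresponds to roughly 100 m.
print(tof_distance(0.0, 667e-9))
```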
Monocular RGB cameras can obtain high-resolution color and texture information from the environment, but unlike lidar they do not provide high-precision depth information [6][7][8]. The fusion of the two is therefore a hot research topic in autonomous driving sensor fusion. To achieve high-quality sensor fusion, the external parameters must first be calibrated so that the lidar point cloud data can be matched with the camera images and the data fused at all levels [9].
This article enhances the practicality of online calibration algorithms in real autonomous driving scenarios. The research focuses on proposing an online calibration method for intelligent connected vehicle lidar and camera based on depth-edge matching. Hand-eye calibration is used to estimate the initial values of the external parameters; the hand-eye solution is then refined, and accurate external parameters are obtained through data conversion. This study uses the CMA-ES algorithm for parameter optimization and compares the result with the conventional method based on edge matching. The proposed optimization approach yields a significant improvement, and the external parameters can be appropriately refined.
The rest of this article is arranged as follows: Section 2 presents the literature review, followed by the description of the self-calibration method based on depth-edge matching in Section 3. Section 4 describes the verification of the outcomes obtained for the self-calibration method based on depth-edge matching. Concluding remarks of this research work are given in Section 5.

Literature review
At present, researchers around the world have conducted many studies on the calibration of cameras and lidars. Lyu et al. proposed the use of a glass calibration board that is harder than ordinary wooden boards. The RANSAC algorithm uses the point cloud on the edges of the calibration board to fit the line equations of the board edges and calculates the vertices of the calibration plane from the intersections of multiple lines, in order to obtain the plane equation of the calibration board. Although a harder glass board is less easily deformed than a wooden board, the sparseness of the point cloud makes the data at the board edges very noisy, so the fitting result has a larger error [10]. Automated vehicles depend on accurate and reliable detection of drivable areas, which are normally classified into free space, road area and lane information. Most current methods use monocular or stereo cameras to detect these, although lidar sensors are becoming increasingly common and provide useful attributes for road area detection. Wulff et al. proposed a pixel-level semantic binary segmentation method for road areas based on an improved U-Net fully convolutional network (FCN) architecture. The fusion of camera and lidar allows the characteristics of each sensor to be exploited effectively within a single FCN. UView-Cam was trained on several openly available street-environment datasets, and UGrid-Fused was trained on the KITTI dataset. Both methods achieve real-time performance with a detection rate of about 10 Hz [11].
To support identification and positioning of the work object when an excavator operates autonomously, it is necessary to calibrate the binocular camera to determine its internal and external parameters. Zhou et al. determined the transformation relationships between the world coordinate system, the camera coordinate system and the image coordinate system based on the principle of binocular stereo vision. A chessboard image meeting the calibration requirements was selected as the test case using MATLAB's vision toolbox, and the camera calibration procedure was performed to determine the camera parameters. The calibration parameters were then analyzed and compared to initially verify their accuracy. Finally, the sources of error were analyzed and the method was implemented with OpenCV 3.4.10 in the QT 5.14.2 programming environment, showing that the calibration method meets the application requirements [12].
Although some automatic external calibration methods between camera and lidar exist, they still face various limitations when applied in automated systems [13][14][15]. External parameter calibration for an autonomous driving system requires a more robust, online optimization solution. Therefore, this paper proposes an online calibration method for intelligent connected vehicle lidar and camera based on depth-edge matching, providing a more robust, online-optimized calibration method and technical support for intelligent connected vehicles.

Self-calibration method based on depth-edge matching
There are various self-calibration methods based on the depth-edge matching approach. The steps involved in self-calibration are external parameter initialization, data conversion, parameter optimization and implementation of the self-calibration algorithm based on depth-edge matching.

Initialization of external parameters
In this paper, the hand-eye calibration method is used to obtain the initial values of the camera-lidar external parameters. Hand-eye calibration requires that the relative pose of the sensor and the actuator remain constant during the calibration process; this pose is denoted X. At a given moment, the pose change of the actuator in its own coordinate system is denoted A, and the pose change of the sensor in the world coordinate system is denoted B. The three then satisfy

AX = XB.

By solving this equation, the relative pose X can be recovered. To obtain a correct result even when the motion amplitude is small, hand-eye calibration requires a suitable solution method.
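The hand-eye constraint can be checked numerically. The following sketch (not the paper's solver) builds a ground-truth relative pose X, generates a consistent pair of actuator and sensor motions, and verifies that AX = XB holds; the specific rotation and translation values are arbitrary.

```python
import numpy as np

# Illustrative check of the hand-eye equation A X = X B: X is the fixed
# sensor-to-actuator pose, A the actuator motion in its own frame, and B the
# corresponding sensor motion induced through X.

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_transform(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

X = make_transform(rot_z(0.3), np.array([0.1, -0.2, 0.5]))  # fixed relative pose
A = make_transform(rot_z(0.8), np.array([1.0, 0.0, 0.0]))   # actuator motion
B = np.linalg.inv(X) @ A @ X                                # induced sensor motion

assert np.allclose(A @ X, X @ B)  # the hand-eye equation holds
```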
Consider the vehicle's motion from time t_i to t_{i+1}. The cameras and lidar sensors are fixed on the vehicle. The reading of a camera (monocular, binocular or infrared) is a two-dimensional image, recorded as I_i and I_{i+1}. The camera's motion from t_i to t_{i+1} is recorded as T_C^i and can be calculated by visual odometry: features are first extracted from I_i and I_{i+1}, then matched to estimate the camera motion. For the lidar, the sensor readings are point clouds P_i and P_{i+1}, and its motion from t_i to t_{i+1}, denoted T_L^i, can be estimated by the ICP algorithm. To improve the accuracy of the T_L^i estimate, increase the calculation speed and eliminate the interference of outliers, the point cloud can be down-sampled and filtered. Denoting the external parameter between the camera and the lidar as X, by analogy with hand-eye calibration, T_C^i, T_L^i and X satisfy the relationship

T_C^i X = X T_L^i,

where T_C^i, T_L^i and X are homogeneous transformation matrices.
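The down-sampling step mentioned above can be illustrated with a simple voxel-grid filter that keeps one representative point (the centroid) per voxel. This is a generic sketch, not the paper's exact preprocessing; the voxel size and random cloud are placeholder choices.

```python
import numpy as np

# Voxel-grid down-sampling sketch: quantize each point to a voxel index,
# then replace all points in a voxel by their centroid.

def voxel_downsample(points: np.ndarray, voxel: float) -> np.ndarray:
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across numpy versions
    counts = np.bincount(inverse)
    out = np.zeros((counts.size, 3))
    for dim in range(3):  # average each coordinate per voxel
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out

rng = np.random.default_rng(0)
cloud = rng.uniform(-5.0, 5.0, size=(10_000, 3))
sparse = voxel_downsample(cloud, voxel=1.0)
print(len(cloud), "->", len(sparse))  # far fewer points after filtering
```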

Data conversion
The result of hand-eye calibration has a large error, and more precise optimization algorithms are needed to refine the final result [14][15][16]. However, camera images and 3D lidar point clouds differ in their properties, and it is difficult to directly find correspondences between the two kinds of features that can be used for calibration. Therefore, data conversion is needed to help establish the correspondence between them. For any point in three-dimensional space, Figure 1 shows the process by which camera imaging and lidar scanning acquire data, and intuitively shows the external parameter transformation between the two sensors. Assuming the external parameters consist of a rotation matrix R and a translation vector t, data can be converted both "3D to 2D" and "2D to 3D" between the sensors. Given the internal parameter matrix C of the camera and the external parameters between the camera and the lidar, the j-th point P_j in the lidar point cloud can be projected onto the two-dimensional image plane of the camera:

s_j p_j = C (R P_j + t),

where p_j is the pixel coordinate in homogeneous form and s_j is the projective depth. Similarly, given the internal parameters of camera C1 and the external parameters between cameras C1 and C2, a three-dimensional point in the coordinate system of camera C2 can be projected onto the two-dimensional image plane of camera C1. Conversely, given the j-th point p_j in the pixel coordinate system of camera C and its depth s_j, the corresponding 3D point is

P_j = s_j C^{-1} p_j.

Through this data conversion, the camera image and the lidar point cloud can be expressed in the same form, and an optimization algorithm can be used to find accurate external parameter values. Although the rotation matrix is a common way to describe spatial rotation, it is over-parameterized and subject to orthogonality constraints, making it unsuitable as an optimization parameter. Therefore, for the optimization of the external parameters, this paper uses a 6-degree-of-freedom vector, recorded as Θ = (r, t) ∈ se(3). Θ contains a 3-degree-of-freedom rotation vector r = (r1, r2, r3), from which the rotation matrix can be recovered by the exponential map, together with a 3-degree-of-freedom translation vector t = (x, y, z).
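The two conversions can be sketched with a standard pinhole model. The intrinsic matrix and extrinsic values below are illustrative stand-ins, not calibrated parameters from the paper.

```python
import numpy as np

# "3D to 2D" and "2D to 3D" conversions with a pinhole model. C is the camera
# intrinsic matrix; (R, t) the lidar-to-camera extrinsics (illustrative values).

C = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])   # camera intrinsics
R = np.eye(3)                           # lidar-to-camera rotation
t = np.array([0.0, 0.0, 0.1])           # lidar-to-camera translation (m)

def project(P_lidar):
    """Lidar point -> pixel (u, v) and depth."""
    P_cam = R @ P_lidar + t
    uvw = C @ P_cam
    return uvw[:2] / uvw[2], P_cam[2]

def backproject(uv, depth):
    """Pixel + depth -> 3D point in the camera frame."""
    uv1 = np.array([uv[0], uv[1], 1.0])
    return depth * (np.linalg.inv(C) @ uv1)

P = np.array([1.0, -0.5, 10.0])
uv, d = project(P)
P_back = backproject(uv, d)
assert np.allclose(P_back, R @ P + t)   # round trip recovers the camera-frame point
```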

Parameter optimization
In the calibration method proposed in this paper, the objective function of external parameter optimization is a non-convex function of the 6-degree-of-freedom vector, so this paper chooses CMA-ES as the optimization algorithm [17][18][19][20]. The CMA-ES algorithm has been shown to perform well on complex optimization problems of medium dimensionality. In practice, the objective function is more sensitive to the rotation components of the external parameters, so an optimization scale factor λs is introduced to balance the rotation and translation updates, and the optimization parameters are modified to Θ = (r, λs t) = (r1, r2, r3, λs x, λs y, λs z).
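As a rough illustration of how such a scaled parameterization is optimized, the sketch below uses a plain (mu, lambda) evolution strategy in place of full CMA-ES (no covariance adaptation) on a toy objective; the target values, population settings and scale factor are all made-up, and the true objective in the paper is the depth/edge matching score, which needs real sensor data.

```python
import numpy as np

# Simplified stand-in for CMA-ES: a (mu, lambda) evolution strategy minimizing
# a toy cost over Theta = (r1, r2, r3, x, y, z). The scale factor LAMBDA_S
# multiplies the translation part so rotation and translation steps are
# comparable, as in the modified parameterization above.

rng = np.random.default_rng(42)
LAMBDA_S = 10.0                        # illustrative rotation/translation scale

def objective(theta_scaled):
    r = theta_scaled[:3]
    t = theta_scaled[3:] / LAMBDA_S    # undo the scaling before evaluating
    return float(np.sum(r**2) + np.sum((t - 0.3)**2))  # toy optimum: r=0, t=0.3

def evolution_strategy(f, x0, sigma=0.5, pop=16, elite=4, iters=300):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        samples = x + sigma * rng.standard_normal((pop, x.size))
        order = np.argsort([f(s) for s in samples])
        x = samples[order[:elite]].mean(axis=0)  # recombine the elite samples
        sigma *= 0.98                            # simple step-size decay
    return x

theta = evolution_strategy(objective, np.zeros(6))
print(np.round(theta[:3], 2), np.round(theta[3:] / LAMBDA_S, 2))
```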

Initialization based on hand-eye calibration
The motions of the camera and lidar can be calculated by visual odometry and ICP registration respectively. To resolve the scale ambiguity of monocular visual odometry, the method in this paper introduces monocular depth estimation, which converts the visual odometry problem into a PnP problem. Let the set of feature points in the camera image at time t_i be s_i = {p_i^1, p_i^2, ..., p_i^n}, with the corresponding feature point set s_{i+1} at time t_{i+1}. Through Eq. (5) and monocular depth estimation, s_i can be transformed into a three-dimensional point set in the camera coordinate system, S_i = {P_i^1, P_i^2, ..., P_i^n}. Given the point sets S_i and S_{i+1}, the camera motion (R_C^i, t_C^i) can be estimated by the Levenberg-Marquardt method. Given m camera motions T_C^1, T_C^2, ..., T_C^m and the corresponding lidar motions T_L^1, T_L^2, ..., T_L^m, the rotation and translation can be decoupled from Eq. (2): the rotation R_X is solved by constructing a least squares problem, after which the translation between the camera and the lidar is obtained from R_X and the translation components of the motions.
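One standard way to solve the decoupled rotation, sketched below on synthetic data: under the hand-eye relation, the rotation axis of each camera motion is the extrinsic rotation applied to the axis of the matching lidar motion, so stacking several axis pairs gives an orthogonal Procrustes problem with a closed-form SVD solution. This is a common construction offered as an illustration, not the paper's exact derivation.

```python
import numpy as np

# Solve b_i = R a_i in the least-squares sense, where a_i / b_i are the
# rotation axes of matched lidar / camera motions (synthetic data here).

rng = np.random.default_rng(1)

def random_rotation():
    # QR-based random rotation with positive determinant.
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q if np.linalg.det(Q) > 0 else -Q

def rotation_axis(R):
    a = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return a / np.linalg.norm(a)

R_X = random_rotation()                       # ground-truth extrinsic rotation
lidar_R = [random_rotation() for _ in range(8)]
cam_R = [R_X @ R @ R_X.T for R in lidar_R]    # consistent camera motions

A = np.stack([rotation_axis(R) for R in lidar_R])  # lidar axes (rows)
B = np.stack([rotation_axis(R) for R in cam_R])    # camera axes (rows)

# Orthogonal Procrustes / Kabsch solution via SVD.
U, _, Vt = np.linalg.svd(B.T @ A)
R_est = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt

assert np.allclose(R_est, R_X, atol=1e-6)  # recovers the extrinsic rotation
```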

Calibration based on depth matching
Optimization based on depth matching is used to reduce the error of the first-stage result. First, using the first-stage result Θ, the lidar point cloud is projected to obtain the lidar depth map D_L; at the same time, the camera depth map D_C is obtained by monocular depth estimation. If the given Θ were accurate, any point in D_L would correspond to the point in D_C with the same coordinates and the same depth value. Because Θ contains an error, the coordinates of corresponding points in D_L and D_C are shifted. Therefore, the average depth difference between D_L and D_C is used as the objective function to optimize Θ:

J(Θ) = (1/N) Σ_{(i,j)} |D_L(i,j) − D_C(i,j)|,

where the sum runs over the N valid points of D_L.
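A minimal sketch of this objective, assuming the average is taken over the valid (non-empty) pixels of the sparse lidar depth map; the random maps below stand in for real sensor data.

```python
import numpy as np

# Mean absolute depth difference between the projected lidar depth map D_L
# and the estimated camera depth map D_C, over pixels where D_L has a value.

def depth_objective(D_L: np.ndarray, D_C: np.ndarray) -> float:
    valid = D_L > 0                     # lidar depth maps are sparse
    return float(np.abs(D_L[valid] - D_C[valid]).mean())

rng = np.random.default_rng(0)
D_C = rng.uniform(1.0, 50.0, size=(48, 64))
# Sparse lidar map: ~10% of pixels populated, with small depth noise.
D_L = np.where(rng.random((48, 64)) < 0.1,
               D_C + rng.normal(0.0, 0.2, (48, 64)), 0.0)
print(depth_objective(D_L, D_C))        # small value when the maps agree
```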

Calibration based on depth edge
The accuracy of monocular depth estimation is limited, so the accuracy of the calibration results obtained by the aforementioned method is also limited. Therefore, it is necessary to use fully accurate information for the final precise calibration. This paper introduces a method based on edge matching to perform the final precise calibration. Using the calibration result Θ from depth matching as the initial value, the edge feature map E_C is obtained from the camera image with the Laplacian operator. From the assumption that depth discontinuities in the lidar point cloud correspond to edge features, the lidar edge feature map E_L can be obtained, and the external parameters can be optimized using the matching degree of E_C and E_L as the objective function:

J(Θ) = Σ_{(i,j)} E_C ⊙ E_L,

where ⊙ represents multiplication at the same coordinate value (i,j). However, the conventional edge-based method requires a series of images and point clouds to optimize the external parameters and has the following defects: first, because a large amount of data is used, the objective function is too complicated and optimization is slow; second, the algorithm is based on matching image edges with depth discontinuities in the point cloud, yet many image edges are not caused by depth discontinuities. For example, buildings and trees are objects common in autonomous driving that contain rich edge features; these edges lead to wrong matches and hence optimization failure. Therefore, we improved the conventional method by introducing depth information and optimization parameter scaling to improve the speed, accuracy and robustness of the optimization.
In order to provide more effective constraints, the objective function is modified by adding an exponential term in the depth difference of corresponding points, where λ_d is a constant coefficient. When the optimization deviates greatly, this exponential term increases sharply and penalizes the error.
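A hedged sketch of such a combined objective: edge agreement computed from a Laplacian response, plus an exponential penalty on the depth difference of corresponding points. The exact functional form of the penalty, exp(λ_d · mean depth error), and its sign are assumptions for illustration, not the paper's stated formula.

```python
import numpy as np

# Edge-matching score E_C * E_L (element-wise, same coordinates) combined with
# an exponential depth-deviation penalty. Random inputs stand in for real data.

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def edge_map(img: np.ndarray) -> np.ndarray:
    """Laplacian edge response via direct 3x3 convolution (zero-padded)."""
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for di in range(3):
        for dj in range(3):
            out += LAPLACIAN[di, dj] * padded[di:di + img.shape[0],
                                              dj:dj + img.shape[1]]
    return np.abs(out)

def objective(img, D_L, D_C, lam_d=0.5):
    E_C, E_L = edge_map(img), edge_map(D_L)
    match = np.sum(E_C * E_L)                 # edge agreement (to maximize)
    valid = D_L > 0
    depth_err = np.abs(D_L[valid] - D_C[valid]).mean()
    return float(match - np.exp(lam_d * depth_err))  # penalize depth deviation

rng = np.random.default_rng(0)
img = rng.random((32, 32))
D_C = rng.uniform(1.0, 20.0, (32, 32))
D_L = D_C.copy()
print(objective(img, D_L, D_C))
```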

Verification of outcomes for self-calibration method based on depth-edge matching
To verify the self-calibration method based on depth-edge matching proposed in this paper, we conducted a series of experiments on the well-known KITTI dataset in the field of autonomous driving and on our own dataset. The KITTI dataset is a complete autonomous driving dataset and provides ground-truth values of the external parameters. To verify the effect of the proposed method in a real environment, we used our self-built autonomous driving system to collect additional data. To verify the speed and robustness of the proposed depth-edge matching method against the conventional edge matching method, the optimization time was limited to 1200 seconds and the optimization processes of the two methods were compared. In the experiment, the conventional method requires at least 6 frames of data to be optimized correctly and takes several hours to converge, while the improved method proposed in this paper requires only 2 frames of data to achieve correct optimization. The experiment was repeated 40 times, and the average error of the external parameters during optimization was calculated against the provided ground truth. The result is shown in Figure 2. It can be seen that, given two frames of data, the external parameters can be appropriately improved by the proposed method, and the algorithm converges in about 1000 seconds. In contrast, the conventional method cannot optimize the parameters correctly with only 2 frames of data.
To verify the accuracy of the proposed depth-edge matching method, we repeated the experiment 50 times. As shown in Figure 3, the rotation error of most results is between 0.1∘ and 0.8∘, and the translation error is between 0.02 m and 0.06 m. We noticed that a change in rotation has a greater impact on the objective function and the intuitive projection than a change in translation, making translation more difficult to optimize than rotation. On the other hand, a rough manual measurement of rotation, which could provide a priori information, is much harder to obtain than a rough measurement of translation, because there is no suitable tool or method to directly measure angles in space.
To demonstrate the effect of the proposed method relative to other methods, we selected representative algorithms of each type, as indicated in Table 1. In addition to optimization based on edge matching (denoted Levinson), these include direct calculation of the motion through hand-eye calibration (denoted Taylor), a method using current self-supervised deep learning technology (denoted CalibNet), and a manual calibration method in which corresponding points are selected by hand (its best result is taken as a reference and denoted Manual). The experiment was repeated 20 times, and the average of the best 5 results is shown in Table 1, where the rotation components Pitch, Yaw and Roll are in degrees and each translation component is in meters. It can be seen that, compared with the representative algorithms of the other methods, the errors of the proposed method are more balanced in all aspects.

Conclusion
Although there are some automatic external calibration methods between the camera and the lidar, these methods face various limitations when applied in automated systems. The external parameter calibration of an autonomous driving system requires a more robust, online optimization solution. Therefore, this paper proposes an online calibration method for intelligent connected vehicle lidar and camera based on depth-edge matching. The main work is as follows: 1. The initial values of the external parameters are estimated through hand-eye calibration, and the hand-eye solution is optimized. 2. Accurate external parameters are obtained through data conversion, and the CMA-ES algorithm is used to optimize them. 3. Compared with the conventional method based on edge matching, it is found that given two frames of data, the external parameters can be appropriately improved by the proposed method, and the algorithm converges in about 1000 seconds.
Moreover, the proposed approach provides a more general solution, since the conventional method cannot optimize the parameters correctly with only 2 frames of data. 4. The rotation error of most results of this method is between 0.1∘ and 0.8∘, and the translation error is between 0.02 m and 0.06 m. Compared with representative algorithms of other methods, the errors are more balanced in all aspects and there is no outstanding error value. 5. In future work, this research can be extended to provide a more optimized and generalized solution than the conventional method while using a minimum number of data frames.