Research on target feature extraction and location positioning with machine learning algorithm

Abstract: The accurate positioning of targets is an important link in robot technology. Based on machine learning algorithms, this study first analyzed the location positioning principle of the binocular vision of a robot, then extracted features of the target using the speeded-up robust features (SURF) method, positioned the location using the back propagation neural network (BPNN) method, and tested the method through experiments. The experimental results showed that feature extraction with the SURF method was fast, about 0.2 s, and was little affected by noise. The positioning results showed that the output position of the BPNN method was basically consistent with the actual position, and the errors in the X, Y and Z directions were very small, which could meet the positioning needs of the robot. The experimental results verify the effectiveness of the machine learning method and provide some theoretical support for its further promotion and application in practice.


Introduction
With the development of technology, intelligent robots have been more and more widely used in people's work and life, including civil [1], industrial [2], agricultural [3], medical [4] and military [5] applications, especially in environments such as outer space and the ocean that human beings cannot reach. Intelligent robots are an important embodiment of a country's level of science, technology and industry. In order to make robot services more intelligent, it is necessary to improve robot control technologies such as target positioning and path planning [6]. Taking industrial robots as an example, in the process of completing a task, a robot needs to locate the target accurately in order to carry out subsequent work such as recognition, detection, grasping and classification. Robot positioning has therefore become a problem of wide concern to researchers [7]. Zhang et al. [8] designed a piecewise-fitting monocular vision ranging method for the positioning of humanoid robots and found through experiments that the average error of the method was 1.7 mm, suggesting relatively accurate positioning. Luo et al. [9] studied the positioning of the picking points of a grape-picking robot based on an improved clustering image segmentation algorithm. Experiments carried out in the environment of OpenCV 2.3.1 and Visual C++ showed that the ratio between the picking points obtained by the proposed method and the manually set picking points reached 88.33%, which could meet the positioning requirements of the robot. Yan et al. [10] designed a positioning method based on passive radio frequency identification (RFID), carried out simulation experiments, and found that the absolute error of the method was smaller than 10.16 cm and the calculation time was short. Wu et al. [11] preprocessed images with pixel projection, then located and recognized the target through a deep convolutional neural network (DCNN), and found through experiments that the method was effective and could recognize the type of workpiece quickly. Binocular vision robots have a wide range of applications in the industrial field and obvious advantages in modern automatic production, and their positioning is an important and difficult problem; research on their positioning therefore has great practical value. At present, the accuracy of many methods cannot meet the requirements of robot work. To better realize target feature extraction and positioning for a binocular vision robot, this study analyzed machine learning algorithms, designed a method that extracts features using speeded-up robust features (SURF) and positions the target using a back propagation neural network (BPNN), and carried out simulation experiments. The present study provides some theoretical support for the application of the method in practice and also contributes to the better application of robots.

Positioning principle of binocular vision
Vision is a key technology of robots, enabling them to identify and locate targets. According to the number of cameras, one or two, vision can be divided into monocular and binocular. Binocular vision has higher positioning accuracy and can obtain three-dimensional information about the target, so it is more widely applied [12]. The principle of binocular vision is similar to that of human eyes. As there is a distance between the left and right eyes, an observed object projects onto different positions on the left and right retinas; the deviation between the two positions is the parallax. Binocular vision observes the target through two cameras [14]. According to the parallax of the target, the position of the target can be calculated. When the two cameras observe target A at the same time, two image coordinates, (x_l, y_l) in the left image and (x_r, y_r) in the right image, are obtained. For a parallel binocular configuration with focal length f, the trigonometric (similar-triangle) relations give

Z = B·f / (x_l − x_r),

where B refers to the baseline, i.e., the distance between the optical centers of the two cameras. If the binocular vision parallax is D = x_l − x_r, then the camera coordinates of A can be expressed as

X = B·x_l / D, Y = B·y_l / D, Z = B·f / D.

If the parallax D, baseline B and focal length f are determined, the coordinates of the target can be obtained.
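As a minimal sketch of the parallax relations above (assuming an ideal parallel stereo rig; the function and variable names are illustrative, not from the original system):

```python
def triangulate(xl, yl, xr, baseline, focal):
    """Recover camera coordinates (X, Y, Z) of a point from an ideal
    parallel binocular rig, given its image coordinates in the left
    view (xl, yl) and its x-coordinate xr in the right view.
    baseline and focal must share consistent units with the image plane."""
    d = xl - xr                       # parallax D
    if d <= 0:
        raise ValueError("parallax must be positive for a point in front of the rig")
    Z = baseline * focal / d          # depth from similar triangles
    X = baseline * xl / d
    Y = baseline * yl / d
    return X, Y, Z
```

For example, with a baseline of 120 mm, a focal length of 700 px and a parallax of 35 px, the recovered depth is 120 × 700 / 35 = 2400 mm.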

Target feature extraction
In the two images collected by binocular vision, the target area needs to be extracted before positioning. Commonly used methods include the scale-invariant feature transform (SIFT) algorithm and the SURF algorithm. SIFT extracts features accurately, including from scaled and rotated images, but its poor real-time performance makes it difficult to apply in practice. SURF, an improvement on SIFT, overcomes this defect: it extracts features considerably faster while still detecting stable feature points, and therefore performs well in feature extraction [15]. For these reasons, this study extracted target features using the SURF algorithm based on the texture features of the image.
For an image I and a point X = (x, y) in it, the Hessian matrix H(X, σ) at scale σ can be expressed as

H(X, σ) = [ L_xx(X, σ)  L_xy(X, σ) ; L_xy(X, σ)  L_yy(X, σ) ],

where L_xx(X, σ) stands for the convolution of the Gaussian second-order partial derivative ∂²g(σ)/∂x² with image I at X, and L_xy and L_yy are defined analogously. Box filters are used to approximate the Gaussian derivatives, replacing L_xx, L_xy and L_yy with D_xx, D_xy and D_yy, which gives the determinant of the approximate matrix H_approx:

det(H_approx) = D_xx·D_yy − (ω·D_xy)²,

where ω stands for a regulation parameter (typically about 0.9). The SURF algorithm builds the image pyramid using the indirect method, divides the scale space into groups, calculates the image extreme points in each layer of the pyramid, sets a threshold value, and performs non-maximum suppression in a 3×3×3 neighbourhood on the candidate points. Only a point larger than all points in its neighbourhood is kept as a feature point.
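A rough single-scale sketch of the determinant-of-Hessian response and the in-plane part of the non-maximum suppression (finite differences stand in for SURF's box filters; names and the threshold are illustrative):

```python
import numpy as np

def hessian_det_response(img, w=0.9):
    """Approximate determinant-of-Hessian response map for a grayscale
    image: det(H_approx) = Dxx*Dyy - (w*Dxy)**2, with finite differences
    replacing the box-filter approximations of the Gaussian derivatives."""
    gy, gx = np.gradient(img.astype(float))
    Dxx = np.gradient(gx, axis=1)       # second derivative in x
    Dyy = np.gradient(gy, axis=0)       # second derivative in y
    Dxy = np.gradient(gx, axis=0)       # mixed derivative
    return Dxx * Dyy - (w * Dxy) ** 2

def local_maxima(resp, threshold):
    """Keep points above the threshold that dominate their 3x3 neighbourhood
    (the spatial slice of SURF's 3x3x3 suppression)."""
    pts = []
    h, w = resp.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            v = resp[i, j]
            if v > threshold and v >= resp[i - 1:i + 2, j - 1:j + 2].max():
                pts.append((i, j))
    return pts
```

On a synthetic Gaussian blob, the strongest response appears at the blob centre, as expected for a blob detector.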
In order to ensure the rotation invariance of the feature vector, a circular region is constructed, taking the feature point as the center and 6s (s is the scale of the feature point) as the radius. The Haar wavelet responses are then computed within the region, and the direction corresponding to the maximum accumulated Haar response is taken as the main direction of the feature point. Next, taking the feature point as the center, a square area with a side length of 20s is selected and divided into 16 subregions. Each subregion is sampled at 5s × 5s points, the Haar wavelet responses dx and dy are calculated, and a four-dimensional vector is obtained for each subregion:

v = (Σdx, Σdy, Σ|dx|, Σ|dy|).

Combining the vectors of the 16 subregions yields a 64-dimensional descriptor. Finally, the steps of the SURF-based feature extraction method are as follows. First, the sets of feature points of the template and of the left and right images, I_left and I_right, are obtained through SURF. Then the feature points matching the template are searched for and put into sets S_left and S_right. For the extracted points, the Euclidean distances between descriptors are calculated; when the ratio of the smallest Euclidean distance to the second smallest is below a threshold value, the feature points are matched successfully.
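The nearest/second-nearest distance-ratio test described above can be sketched as follows (the descriptor arrays and the 0.7 threshold are illustrative assumptions, not values from the original study):

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.7):
    """Match 64-D SURF-style descriptors in desc_a against desc_b.
    A match (i, j) is accepted only when the nearest neighbour's Euclidean
    distance is less than `ratio` times the second nearest's."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # Euclidean distances
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:                 # distance-ratio test
            matches.append((i, int(order[0])))
    return matches
```

The ratio test rejects ambiguous matches: a descriptor that is almost equally close to two candidates fails the test even if its nearest distance is small.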

Machine learning location algorithm
Machine learning algorithms include the decision tree algorithm, Bayesian algorithms, the support vector machine (SVM), etc. In this study, the target was positioned using a neural network, one of the machine learning methods. A neural network, which simulates the human brain, performs better than other methods on complex nonlinear problems and has good adaptivity and fault tolerance, so it can be used for positioning. The BPNN [16] is the most widely used neural network, and its structure is shown in Figure 1.
When the target is positioned by the BPNN, the images are collected by binocular vision, and the feature points of the target are obtained by feature extraction. The pixel coordinates of the target feature points are taken as the input of the BPNN.

(1) The input vector of the BPNN is set as X = (x_1, x_2, …, x_l), where l is the number of input nodes, and the weight between input node j and hidden neuron i is set as w_ij. The output vector of the hidden layer is set as Y = (y_1, y_2, …, y_m), where m stands for the number of hidden neurons. Then:

y_i = f( Σ_j w_ij x_j ), i = 1, 2, …, m, j = 1, 2, …, l,

where f is the activation function.

(2) The output vector of the output layer is set as Z = (z_1, z_2, …, z_n), where n stands for the number of output nodes, and the weight between hidden neuron j and output node k is set as v_jk. Then:

z_k = f( Σ_j v_jk y_j ), k = 1, 2, …, n.

(3) If the expected output vector of the BPNN is T = (t_1, t_2, …, t_n), then the error between the expected output vector T and the actual output vector Z is the error signal E:

E = (1/2) Σ_k (t_k − z_k)².

Propagating E back through the network, the corrections Δw_ij and Δv_jk of the weights are:

Δv_jk = −η ∂E/∂v_jk, Δw_ij = −η ∂E/∂w_ij,

where η stands for the learning coefficient.

(4) When the error signal meets the requirement, the calculation stops and the result is output, i.e., the world coordinates of the target feature points.
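The forward pass and weight corrections above can be sketched as a minimal one-hidden-layer network in NumPy. This is a generic illustration, not the paper's implementation: the sigmoid activation, layer sizes and learning rate are assumptions, and real pixel/world coordinates would need to be scaled into the sigmoid's range first.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BPNN:
    """Minimal back propagation network: l inputs -> m hidden -> n outputs."""

    def __init__(self, l, m, n, eta=0.5):
        self.W = rng.normal(0, 0.5, (m, l))   # input -> hidden weights w_ij
        self.V = rng.normal(0, 0.5, (n, m))   # hidden -> output weights v_jk
        self.eta = eta                        # learning coefficient eta

    def forward(self, x):
        self.y = sigmoid(self.W @ x)          # hidden output y_i
        self.z = sigmoid(self.V @ self.y)     # network output z_k
        return self.z

    def train_step(self, x, t):
        """One gradient-descent update; returns the error signal E."""
        z = self.forward(x)
        d_out = (t - z) * z * (1 - z)                        # output delta
        d_hid = (self.V.T @ d_out) * self.y * (1 - self.y)   # hidden delta
        self.V += self.eta * np.outer(d_out, self.y)         # Δv_jk
        self.W += self.eta * np.outer(d_hid, x)              # Δw_ij
        return 0.5 * np.sum((t - z) ** 2)                    # E
```

In the paper's setting the network would take the four pixel coordinates (p_l, q_l, p_r, q_r) as input and produce the three world coordinates (x_w, y_w, z_w) as output; repeated calls to `train_step` drive the error signal E down until it meets the stopping requirement.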

Simulation experiment
The target positioning of the binocular vision robot was tested using the SURF and BPNN methods proposed in this study. The target was photographed by two charge-coupled device (CCD) cameras, and programming was performed in C++ in the Visual Studio 2008 environment. First, the feature extraction method was analyzed. Taking an image collected by the camera as an example (Figure 4), the time required for feature extraction was measured under no noise and under noise multiples of 0.01, 0.03, 0.05 and 0.07; the results are shown in Figure 2. The calculation time increased with the noise, but the increase amplitude was very small; when the noise reached 0.07 times, the calculation time was 0.271 s, only 0.026 s longer than in the case of no noise, indicating that the feature extraction algorithm designed in this study could meet the real-time requirement.
Two hundred groups of sample images were collected, and the pixel coordinates of the corresponding feature points were extracted, giving 200 groups of sample data. One hundred and fifty groups were used for BPNN training, and the remaining 50 groups were used for the positioning test. The pixel coordinates p_l, q_l, p_r and q_r of the feature points in the left and right images obtained by binocular vision were the input of the BPNN, and the world coordinates x_w, y_w and z_w of the feature points were the output. The BPNN method used in this study was compared with the SVM method [17]. The testing results are shown in Table 1. It was seen from Table 1 that the output of the SVM differed slightly from the expected output, while the actual output of the BPNN was very close to the expected output; the error in z_w in particular was almost 0. The error of each positioning run of the two methods was counted, and the results are shown in Figures 4 and 5. After calculation, it was found that the maximum errors of x_w, y_w and z_w for the SVM were 3.5889, 2.7437 and 0.0546 respectively, and the minimum errors were 0.8732, 0.7694 and 0.411 respectively; for the BPNN, the maximum errors of x_w, y_w and z_w were 1.9954, 1.9812 and 0.0001 respectively, and the minimum errors were 0.0280, 0.0069 and 0.0000 respectively. These results showed that the error of the SVM was significantly larger than that of the BPNN, indicating higher positioning precision for the BPNN. The error of the BPNN was slightly larger in the X and Y directions and almost 0 in the Z direction; it was always smaller than 2, which could meet the positioning requirements of robots.
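The per-axis error statistics quoted above (maximum and minimum absolute error in each direction) can be computed from predicted and ground-truth coordinates as follows; the arrays in the usage note are illustrative placeholders, not the paper's data:

```python
import numpy as np

def axis_error_stats(pred, truth):
    """Per-axis maximum and minimum absolute positioning error.
    pred, truth: arrays of shape (n_samples, 3) holding (xw, yw, zw)."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float))
    return err.max(axis=0), err.min(axis=0)
```

For example, `axis_error_stats(bpnn_out, world_coords)` over the 50 test groups would yield the triples of maximum and minimum errors reported for x_w, y_w and z_w.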

Discussion
Machine learning draws on many fields of knowledge, such as probability theory and statistics, and is a very important research direction in artificial intelligence [18]. It has been widely used in solving complex engineering and scientific problems, such as natural language processing [19], pattern recognition [20], biological information processing [21] and machine vision [22]. Machine learning algorithms include the decision tree [23], random forest [24], Bayesian methods [25], etc. In this study, the target positioning method was designed using the BPNN method.
Before a robot positions a target, feature extraction of the target is needed. In this study, feature extraction was achieved by the SURF method, and the pixel coordinates of the extracted feature points were then used as the input of the BPNN to locate the target. The experimental results showed that the SURF method performed well in target feature extraction: it completed feature extraction in a short time and was little affected by noise. It was seen from Figure 3 that the calculation time of the algorithm remained around 0.2 s, increasing only slightly with the noise. The positioning experiment showed that the positioning precision of the BPNN was significantly superior to that of the SVM; the output position coordinates of the BPNN were very close to the actual coordinates, and the errors in the three directions were very small, which could meet the needs of robots in actual work.
This study found that the SURF method and the BPNN model performed well and had strong applicability in solving the positioning problem of robots; however, some limitations need to be addressed in future work: (1) a comparative study was not carried out on more machine learning methods; (2) the BPNN method was not further optimized to enhance the positioning precision.

Conclusion
In order to solve the problem of robot target location, the target features were extracted using the SURF method, and the location of the target was then determined by the BPNN model. The experiments found that: (1) the SURF method was little affected by noise and had a high extraction speed, performing well in target feature extraction; (2) the positioning result of the BPNN method was superior to that of the SVM, with very small errors; the errors in the X and Y directions were slightly larger but not more than 2.
The experimental results verify the effectiveness of the proposed method for target feature extraction and location, which can be promoted and applied in practice.