Open Physics

formerly Central European Journal of Physics

Editor-in-Chief: Seidel, Sally

Managing Editor: Lesna-Szreter, Paulina

IMPACT FACTOR 2018: 1.005

CiteScore 2018: 1.01

SCImago Journal Rank (SJR) 2018: 0.237
Source Normalized Impact per Paper (SNIP) 2018: 0.541

ICV 2017: 162.45

Open Access
Volume 16, Issue 1



Fast recognition method of moving video images based on BP neural networks

Yu Shao
  • Corresponding author
  • School of Electronic and Information Engineering, SIAS International University, Xinzheng, 451150, China
Deden Witarsyah
Published Online: 2018-12-31 | DOI: https://doi.org/10.1515/phys-2018-0123


At present, real-time recognition methods for moving video images have poor accuracy, high energy consumption, and limited fault tolerance. This paper therefore proposes a recognition method for moving video images based on BP neural networks. The moving video image is first binarized as a gray image and divided into two parts, the key content and the background. Training cubes are then collected, and the D-SFA algorithm is used to extract the features of the moving video image and to construct a feature representation. A BP neural network is constructed and its error function is obtained. The error signal is returned continuously along the original path; by modifying the weights of the neurons in each layer, it propagates back to the input layer step by step, after which the signal is propagated forward again. The two processes are repeated to minimize the error signal. The result of image feature extraction is taken as the input of the BP neural network, and the recognition result for the moving video image is the output. Compared with current methods, the proposed method has better real-time fault tolerance and lower recognition energy consumption, and it is more practical.

Keywords: BP neural network; moving video image; recognition; binaryzation

PACS: 07.05.Mh; 07.05.Kf; 07.05.Pj

1 Introduction

Moving video image recognition was once an unfamiliar term, but it has increasingly become a part of people’s lives. A digital moving video image represents a two-dimensional image with a finite number of digital pixels, and its recognition and detection are of great significance to progress in this field [1, 2]. Image recognition technology is maturing, and new techniques and results appear every year at a rapid pace. Artificial intelligence is one of the most active technologies of the 21st century, and image recognition lies at its core: it serves as the eye of future intelligent systems, and its application will inevitably drive the rapid development of artificial intelligence [3, 4]. Research on methods and technology for moving video image recognition is therefore ongoing.

To automatically recognize and track moving droplets in welding video images, Zhang et al. proposed an image recognition method that combines the frame difference method with the Mean-shift algorithm, aimed at gray images with a single background. Because the Mean-shift algorithm requires the target to be selected manually in the starting frame, the frame difference method was applied to the first two frames of the video to obtain and calibrate the target window and its center position. The Mean-shift algorithm based on gray histograms was then used to determine the target template position in the next frame, so that moving droplets were recognized and tracked automatically. The results showed that the method had good real-time performance, but its recognition accuracy was low [5].

Fan Zheyi et al. studied static-video face recognition, an identity recognition technology in which the training set consists of high-quality static images and the test set consists of low-quality video sequences. To address the difficulty of image alignment and motion blur, they proposed an improved static-video face recognition algorithm based on sparse representation. The geometric features of facial images under video conditions were obtained from gradient variance information. Motion blur was handled by constructing a dictionary through multi-scale filtering of the images, and key frames in the video sequences were extracted using cross-correlation coefficients between images. Experimental results showed that the algorithm ran stably, but its real-time recognition performance was poor [6].

Xu H.N. et al. proposed a motion recognition method based on three-dimensional depth image sequences to address the high cost of traditional motion recognition algorithms on color video and the inadequacy of two-dimensional information. The algorithm used a time depth model (TDM) to describe actions in the time dimension. The depth image sequence was divided into several sub-actions in the three orthogonal Cartesian planes, and inter-frame differences and energy accumulation were computed for all sub-actions to form a depth motion map describing the dynamic characteristics of the action. In the spatial dimension, a spatial pyramid histogram of oriented gradients (SPHOG) was used to encode the time depth model and obtain the final descriptor. Finally, a support vector machine (SVM) was used to classify actions. Experiments on the two authoritative databases MSR Action 3D and MSR Gesture 3D showed that the method had high recognition accuracy, but its recognition energy consumption was not ideal [7].

Liu Mingzhu et al. proposed an image recognition algorithm based on deep learning. A Gabor filter was used to extract textural features of video images in four directions: horizontal, vertical, and the two diagonals. A deep belief network was then built layer by layer with an RBM-based incremental deep learning algorithm to locate the text region in the extracted texture feature image. That work also studied the feasibility of morphological processing and an OCR character library for recognizing text in video images, and analyzed the recognition results. The tests showed that the algorithm had good real-time performance [8].

In view of the problems in the current research, a fast recognition method for moving video images based on a BP neural network is proposed. The detailed process is as follows:

  1. A binarization method is used to process the moving video image in order to improve the accuracy of image recognition, enhance its real-time performance, and to a certain extent reduce the energy consumption of recognition.

  2. The D-SFA algorithm is used to extract the features of moving video images, laying a foundation for further improving the accuracy of image recognition.

  3. The result of image feature extraction is input into the BP neural network algorithm to recognize the moving video image.

The effectiveness of the proposed method is verified experimentally.

2 Method

2.1 Moving video image processing

Image threshold segmentation is a widely used image processing technique. It exploits the difference in gray characteristics between the object to be extracted and its background, and treats the image as a combination of two types of regions (target and background) with different gray levels [9]. The most important step is the selection of the threshold: an inappropriate threshold degrades the quality of the binary image and the recognition accuracy. Uneven illumination, camera distortion, insufficient exposure and a narrow dynamic range cause serious artifacts in moving video images. Because of the uneven gray distribution and insufficient contrast, the edges of the moving video image are blurred and details are not clearly distinguished, which seriously affects the binarization of the moving video image.

For this reason, a global threshold algorithm based on the spatial distribution of moving video images and the maximum inter-class variance classification criterion is used to binarize moving video images. It not only eliminates artifacts but also preserves the edge integrity of moving video images [10].

Under ideal conditions of uniform illumination with no noise or interference, the overall gray level of the moving video image changes gently. Suppose that the gray level of the key content of the image is g1 and the gray level of the background is g2, with 0 ≤ g1, g2 ≤ 255. Suppose further that the proportion of key-content pixels in a moving video image is r1 and the proportion of background pixels is r2, with 0 < r1, r2 < 1 and r1 + r2 = 1. The gray mean of the moving video image is expressed as Eq. (1):


The variance calculation is shown in Eq. (2)


Combining Eq. (1) and Eq. (2) gives:


From Eq. (3) it follows that:


Substituting Eq. (4) into Eq. (2) gives:


In summary, the gray level of the key content in the image is:


Similarly, the gray level of the image background is:


The rough threshold value can be expressed as:


Based on the rough threshold, the fine threshold for binarizing the image is then determined. The binarization of moving video images reduces to a two-class classification problem (target and background), so that the image is finally divided into two categories: key content and background [11].
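As a worked sketch of the two-level model above, the gray mean and variance can be written out and solved for the two gray levels. The symbols μ and σ² for the gray mean and variance, and the ordering g1 > g2, are assumptions introduced here for illustration; the result is the standard one for a two-valued distribution and is not claimed to match the authors' intermediate equations term by term.

```latex
\begin{aligned}
\mu &= r_1 g_1 + r_2 g_2, \\
\sigma^2 &= r_1 (g_1 - \mu)^2 + r_2 (g_2 - \mu)^2 . \\[4pt]
% Using r_1 + r_2 = 1:
g_1 - \mu &= r_2 (g_1 - g_2), \qquad g_2 - \mu = -r_1 (g_1 - g_2), \\
\sigma^2 &= r_1 r_2 (r_1 + r_2)(g_1 - g_2)^2 = r_1 r_2 (g_1 - g_2)^2 . \\[4pt]
% Assuming g_1 > g_2:
g_1 &= \mu + \sigma \sqrt{\frac{r_2}{r_1}}, \qquad
g_2 = \mu - \sigma \sqrt{\frac{r_1}{r_2}} .
\end{aligned}
```

If instead g1 < g2, the signs of the two square-root terms are swapped.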

Assume that a given moving video image has gray levels 1, 2, 3, ..., L (L levels in total) and a threshold t. The pixels with gray levels greater than t and those with gray levels less than t are divided into two categories, class 1 and class 2. The total number of pixels in class 1 is ω1(t), their average gray value is μ1(t), and their variance is σ1(t). The total number of pixels in class 2 is ω2(t), their average gray value is μ2(t), and their variance is σ2(t). The average gray value of all image pixels is μ(t). The inter-class variance σB²(t) and the intra-class variance σA²(t) can be defined as




In pattern classification theory, there are three criteria for measuring the separability of different classes: the scattering matrix, the divergence and the Bhattacharyya distance. The ratio of inter-class variance to intra-class variance corresponds to the scattering matrix, which reflects the distribution of patterns in pattern space. The greater the similarity among the pixels within each class, the better the classification result [12, 13]. Therefore, the maximum inter-class variance criterion function S(t) is used to fine-tune the rough threshold.


Following Eq. (11), the binarization method based on spatial distribution is combined with the maximum inter-class variance classification criterion to binarize moving video images. This enhances the contrast between the background and the target, improves the accuracy and real-time performance of image recognition, and reduces the energy consumption of recognition to a certain extent.
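The threshold search described above can be illustrated with a minimal sketch of maximum inter-class variance (Otsu-style) thresholding. This is not the authors' implementation: the function names `otsu_threshold` and `binarize`, the flat list of integer gray levels, and the optional search window `lo`/`hi` (standing in for a neighborhood of the rough threshold) are all assumptions made for illustration. Because the total variance is fixed, maximizing the inter-class variance also maximizes the ratio of inter-class to intra-class variance.

```python
def otsu_threshold(pixels, levels=256, lo=0, hi=None):
    """Return the gray level t that maximizes the inter-class variance.

    pixels: flat list of integer gray levels in [0, levels - 1] (assumed layout).
    lo, hi: optional search window, e.g. around a rough threshold (hypothetical).
    """
    if hi is None:
        hi = levels - 1
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(g * n for g, n in enumerate(hist))

    best_t, best_var = lo, -1.0
    count_low, sum_low = 0, 0
    for t in range(levels):
        count_low += hist[t]          # pixels with gray level <= t
        sum_low += t * hist[t]
        count_high = total - count_low
        if t < lo or t > hi or count_low == 0 or count_high == 0:
            continue
        mean_low = sum_low / count_low
        mean_high = (total_sum - sum_low) / count_high
        # Inter-class variance: w1 * w2 * (mu1 - mu2)^2
        between = (count_low / total) * (count_high / total) * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Map pixels above the threshold to 1 (one class) and the rest to 0."""
    return [1 if p > t else 0 for p in pixels]


# Toy bimodal "image": ten dark pixels and ten bright pixels.
demo_pixels = [50] * 10 + [200] * 10
demo_t = otsu_threshold(demo_pixels)
demo_binary = binarize(demo_pixels, demo_t)
```

Restricting `lo` and `hi` to a band around the rough threshold would mimic the coarse-to-fine search in the text while scanning fewer candidate levels.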

2.2 Feature extraction of moving video image

Based on the results of moving video image processing, the D-SFA algorithm is used to extract the features in the image. In this paper, image feature extraction is divided into three parts: collecting training cubes, extracting the features of the moving video image with the D-SFA algorithm, and constructing the feature representation.

Collecting training cubes is a method of constructing the original input signal xt from video sequences. First, the original video is processed to obtain the frame difference image sequence. A selected frame is used as the initial frame to detect feature points, and the optical flow method is then used to track these feature points to obtain the set of trajectories of all feature points in the video. For each trajectory in the set, the pixel values in the w × w neighborhood of each trajectory point are extracted to form a series of pixel blocks. To take temporal information into account, the sequence of pixel blocks of each point is integrated over Δt successive frames, with Δt = 3 here. After all the feature points have been integrated in this way, the training cube is obtained, that is, the input vector xt is constructed. Figure 1 shows the process.

Figure 1: Process of training cube
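The cube construction described above can be sketched as follows. This is an illustration only: the helper names `extract_patch` and `build_cube`, the nested-list frame layout (`frames[t][y][x]`), and the assumption that the trajectory has already been obtained by tracking and stays at least `w // 2` pixels from the image border are all hypothetical. Frame differencing and optical-flow tracking are not shown.

```python
def extract_patch(frame, cx, cy, w):
    """Return the w x w neighborhood centered on (cx, cy), flattened row by row.

    Assumes w is odd and the patch lies fully inside the frame.
    """
    half = w // 2
    return [frame[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def build_cube(frames, trajectory, w=3, dt=3):
    """Build the sequence of input vectors for one tracked feature point.

    frames: list of 2-D gray images, frames[t][y][x] (assumed layout).
    trajectory: list of (x, y) positions, one per frame (assumed given).
    Each output vector concatenates the patches of dt successive frames.
    """
    patches = [extract_patch(frames[t], x, y, w)
               for t, (x, y) in enumerate(trajectory)]
    return [sum(patches[t:t + dt], [])          # concatenate dt patches
            for t in range(len(patches) - dt + 1)]


# Toy data: five 5x5 frames whose pixels all equal the frame index,
# and a feature point that stays at the image center.
demo_frames = [[[t] * 5 for _ in range(5)] for t in range(5)]
demo_trajectory = [(2, 2)] * 5
demo_cube = build_cube(demo_frames, demo_trajectory, w=3, dt=3)
```

With five tracked frames and Δt = 3, the sketch yields three input vectors, each of length 3 × 3 × 3 = 27.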

Based on the collected training cubes, the D-SFA algorithm is used to extract the features of the moving video images. Conventional SFA is an unsupervised learning algorithm: the training cubes collected from different kinds of behavioral video are mixed together to learn the feature functions, and the features are then extracted with the learned feature functions. Because supervised information is not encoded, the extracted features do not discriminate well between behaviors. The D-SFA algorithm introduces supervised information into the learning process: the training cubes collected for each type of behavior are used to learn feature functions separately, so the learned feature functions can distinguish between classes of behavior, that is, they are selective to intra-class behavior.

Since the feature analysis minimizes the mean squared derivative, how well a cube fits a given feature function can be measured by the squared derivative of the transformed cube [14]. If the value is small, the cube and the feature function fit well. For the ith cube Ci and the jth feature function, the squared derivative is defined as:


where L represents the number of tracked frames and the operator in Eq. (12) represents the transformation operation. Here, L is set to 15 and Δt is set to 3.

Based on the calculation of Eq. (12) the square derivatives are accumulated on all cubes to form the feature of moving video image:


where N represents the number of cubes collected in a moving video, Vi = (vi,1, vi,2, · · · vi,K), and K represents the number of feature functions of a moving video image. Image feature extraction further improves the accuracy of moving video image recognition.
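The accumulation step can be sketched as below, under clearly stated assumptions: each feature function is taken to be a linear projection given by a weight vector, a cube is a time-ordered list of input vectors, and the squared derivative is approximated by the mean squared difference of successive outputs. The names `squared_derivative` and `video_feature` are hypothetical, and the exact normalization in Eq. (12) may differ.

```python
def squared_derivative(cube, weights):
    """Mean squared temporal difference of one feature function's output.

    cube: list of input vectors ordered in time (assumed layout).
    weights: coefficients of a linear feature function (an assumption).
    """
    outputs = [sum(w * x for w, x in zip(weights, vec)) for vec in cube]
    diffs = [(b - a) ** 2 for a, b in zip(outputs, outputs[1:])]
    return sum(diffs) / len(diffs)


def video_feature(cubes, feature_functions):
    """Accumulate the squared derivatives of all N cubes into a K-dimensional feature."""
    feature = [0.0] * len(feature_functions)
    for cube in cubes:
        for j, weights in enumerate(feature_functions):
            feature[j] += squared_derivative(cube, weights)
    return feature


# Toy example: the first coordinate changes steadily, the second is constant.
demo_cube = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
demo_functions = [[1.0, 0.0], [0.0, 1.0]]
demo_feature = video_feature([demo_cube, demo_cube], demo_functions)
```

In this toy case the first feature function responds to the varying coordinate and the second to the constant one, so a small entry in the accumulated vector marks a feature function that the cubes fit well.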

2.3 Fast recognition of moving video images based on BP neural network

A BP neural network is a multi-layer feedforward network composed of an input layer, one or more hidden layers, and an output layer, as shown in Figure 2.

Figure 2: Three-layer BP neural network model

The main idea of the back propagation algorithm in a BP neural network is to divide the learning process into two stages. In the first stage (forward propagation), the input information is passed from the input layer through each layer of units to compute the actual output; the state of the neurons in each layer affects only the state of the neurons in the next layer. In the second stage (back propagation), if the desired output is not obtained at the output layer, the difference between the actual and desired outputs is calculated recursively layer by layer, and the weights of the preceding layers are modified according to this error so that the error signal tends to a minimum. The network gradually approximates the target by repeatedly adjusting the weights and biases in the direction of steepest descent of the error function. Each change in the weights and biases is proportional to its effect on the network error [15-21].

Figure 3: Part of experimental samples

Based on the above analysis, it is assumed that the numbers of units in the input layer, the middle layer and the output layer are K, H and G respectively. The feature vector f is taken as the input vector of the BP neural network; the middle layer produces the intermediate output vector, and the output layer produces the actual output vector of the network, that is, the recognition result for the moving video image.

Assume that the weight from input unit i′ to hidden unit j′ is Wij and that the weight from hidden unit j′ to output unit l is Wjl; θl and ϕj represent the thresholds of the output unit and the hidden unit, respectively. Controlling θl in the range [1.3, 1.4] and ϕj in the range [0.5, 0.6] can effectively improve the fault tolerance of moving image recognition.

A transfer function reflects the intensity of the stimulus pulse passed from a lower-layer input to an upper-layer node. It is also called a stimulus function. Generally, the sigmoid function, which is continuous and takes values in (0, 1), is selected. That is,

f(x) = 1 / (1 + e^(−x))    (14)

The error function E (x) is:


The output hj of each unit in the middle layer and output layer can be expressed as:


In the BP neural network algorithm, the gradient descent method is used to adjust weights.


where xi represents the output or external input of node i′, η represents the learning rate, and ξj represents the error.

Equation (18) is used to modify the weights and thresholds, and the error signals are returned continuously along the original path. By modifying the weights of the neurons in each layer, the error signals propagate back to the input layer one layer at a time, after which forward propagation is carried out again. The two processes are repeated until the error signal is minimized. When all the errors meet the requirements, the moving video image feature f is input into the BP neural network, and the recognition result of the moving video image is obtained.


The result of Equation (19) is the result of moving video recognition based on BP neural network.
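The forward and backward passes described in this section can be sketched with a small three-layer network. This is a generic textbook back-propagation sketch, not the authors' code: the class name `BPNetwork`, the squared-error function E = ½ Σ (d − y)², the random initialization, the learning rate and the toy data are all assumptions, and the threshold ranges for θl and ϕj given above are not enforced here.

```python
import math
import random


def sigmoid(x):
    """Sigmoid transfer function with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # w_ih[i][j]: input i -> hidden j; w_ho[j][l]: hidden j -> output l
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
        self.phi = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]   # hidden thresholds
        self.theta = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]    # output thresholds

    def forward(self, x):
        """Forward propagation: return hidden outputs h and network outputs y."""
        h = [sigmoid(sum(x[i] * self.w_ih[i][j] for i in range(len(x))) - self.phi[j])
             for j in range(len(self.phi))]
        y = [sigmoid(sum(h[j] * self.w_ho[j][l] for j in range(len(h))) - self.theta[l])
             for l in range(len(self.theta))]
        return h, y

    def train_step(self, x, target):
        """One back-propagation step (gradient descent on the squared error)."""
        h, y = self.forward(x)
        # Error signals at the output layer, then propagated back to the hidden layer.
        d_out = [(target[l] - y[l]) * y[l] * (1 - y[l]) for l in range(len(y))]
        d_hid = [h[j] * (1 - h[j]) * sum(self.w_ho[j][l] * d_out[l] for l in range(len(y)))
                 for j in range(len(h))]
        for j in range(len(h)):
            for l in range(len(y)):
                self.w_ho[j][l] += self.lr * d_out[l] * h[j]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_ih[i][j] += self.lr * d_hid[j] * x[i]
        # Thresholds enter with a minus sign, so they move the opposite way.
        for l in range(len(y)):
            self.theta[l] -= self.lr * d_out[l]
        for j in range(len(h)):
            self.phi[j] -= self.lr * d_hid[j]


def total_error(net, samples):
    """Sum of E = 0.5 * sum((d - y)^2) over all samples."""
    return sum(0.5 * sum((d - y) ** 2 for d, y in zip(target, net.forward(x)[1]))
               for x, target in samples)


# Toy task: the target equals the first input component.
samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [0.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
net = BPNetwork(n_in=2, n_hidden=3, n_out=1)
error_before = total_error(net, samples)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target)
error_after = total_error(net, samples)
```

In the setting of this paper, the input would be the K-dimensional feature vector f and the output layer would have G units; the toy task here only illustrates how repeating the forward and backward passes reduces the error.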

3 Results

To validate the recognition method for moving video images based on the BP neural network, a recognition experiment is carried out on a behavior recognition database. The database contains 50 kinds of single behaviors, including bending, running, single-foot jumping, double-foot jumping, in-situ jumping, waving, sidetracking, walking, single-arm waving and double-arm waving. Each behavior is completed by 9 different individuals. Figure 3 shows part of the experimental samples. The experimental platform is built on Matlab. The following indices are evaluated:

  • Accuracy of image recognition

  • Real-time performance of image recognition

  • Energy consumption of image recognition

  • Fault tolerance of image recognition

The results are as follows:

Figure 4 shows that the combination of the frame difference method and the Mean-shift algorithm has the worst recognition accuracy. The other current methods also lack high recognition accuracy, and their reliability is poor. The recognition accuracy of the method based on the BP neural network does not change with the number of moving video images to be recognized, and its highest recognition accuracy is 99%.

Figure 4: Comparison of accuracy of different image recognition methods

In the real-time experiment of image recognition, the number of images to be recognized is set to 100.

Figure 5 shows that different methods exhibit different real-time performance for a given number of images to be recognized. Current image recognition algorithms and methods lack stability and persistence in real-time performance, and their recognition process has a high delay. The moving video image recognition method based on the BP neural network keeps the recognition delay below 6 μs, which is feasible.

Figure 5: Real-time performance comparison of different image recognition methods

The average energy consumption of action recognition based on 3D depth image sequences is 121 nJ/bit, that of image recognition based on deep learning is 104 nJ/bit, that of the improved sparse representation method for static-video face recognition is 116 nJ/bit, and that of moving video recognition based on the BP neural network is 88.6 nJ/bit. These experimental data show that the proposed method has lower energy consumption.

The results of the experiments in Figures 4 to 6 show that the proposed method performs better. This is mainly because the image is binarized and its features are extracted before recognition, which enhances the contrast between the background and the content of the image and supports lower energy consumption, higher recognition accuracy and better real-time recognition.

Figure 6: Comparison of different image recognition methods for identifying energy consumption

As can be seen from Figure 7, the fault tolerance of moving video image recognition based on BP neural networks is better than that of the current methods. The proposed method sets the thresholds θl of the output unit and ϕj of the hidden unit, which effectively enhances the fault tolerance of moving video image recognition.

Figure 7: Comparison of fault-tolerance of different image recognition methods

4 Discussion

In this discussion, the fault tolerance of the moving video image recognition method based on the BP neural network is observed for different ranges of the hidden unit threshold ϕj. ϕj is defined in the two ranges [0.5, 0.6] and [0.7, 0.8], so as to observe the influence of the value of ϕj on the fault tolerance of image recognition. The results are as follows:

As shown in Figure 8, the fault-tolerance coefficient of moving video image recognition based on the BP neural network is larger when ϕj ∈ [0.5, 0.6]. When ϕj ∈ [0.7, 0.8], the coefficient fluctuates continuously and remains below 0.9 overall. The method therefore defines ϕj in the interval [0.5, 0.6], which maximizes the image fault-tolerance coefficient.

Figure 8: Influence of different values of ϕj on fault tolerance of image recognition

5 Conclusions

As a focus of current research, moving video image recognition has attracted wide attention, and many scholars have taken up the topic. At present, the related methods and the algorithms they employ still have defects. A moving video image recognition method based on a BP neural network is therefore proposed. The detection and recognition of moving video images are completed through image processing, image feature extraction and image recognition. Experimental results show that the proposed method is robust. The following suggestions are put forward for future research.

BP neural network algorithms have certain advantages in image recognition. They can be combined with the constantly updated new algorithms or methods to further improve the accuracy of image recognition.

Image denoising or enhancement algorithms should be added to further improve the performance of the recognition method.


Science and Technology Breakthrough Project of Henan Provincial Science and Technology Department.

Project name: Study on complex image retrieval method based on content diversity (182102210547)


[1] Li F.P., Liang J.G., Du X.F., et al. Research on Intelligent Patrol Robot Based on Image Processing Technology, Autom. Instrum., 2017, (6), 10-12.

[2] Liu C.Q., Chen B., Pan Z.H., et al. Research of Target Recognition Technique via Simulation SAR and SVM Classifier, J. China Acad. Electron Inform. Techn., 2016, 11(3), 257-262.

[3] Liu L.L. Research on Image Segmentation Technology for a License Plate Recognition, Bull. Sci. Techn., 2017, 33(4), 125-129.

[4] Wang M., Ju H.Z., Yao G.Q. Plane Target Recognition in the High Resolution Remote Sensing Image, Sci. Techn. Eng., 2017, 17(18), 265-270.

[5] Zhang S.Y., Zhu X.L., Wang Y.G., et al. Recognition and Tracking Algorithms of Moving Droplet Based on Inter-Frame Difference Method Combined with Mean-Shift, J. Shanghai Jiaotong Univ., 2016, 50(10), 1605-1608.

[6] Fan Z.Y., Zeng Y.J., Jiang J., et al. Improved Still-to-Video Face Recognition Algorithm Based on Sparse Representation, J. Signal Process., 2016, 32(5), 567-574.

[7] Xu H.N., Chen E.Q., Liang C.W. Three-dimensional spatiotemporal feature extraction method for action recognition, J. Comput. Appl., 2016, 36(2), 568-573.

[8] Liu M.Z., Zheng Y.F., Fan J.F., et al. Area Location and Recognition of Video Text Based on Depth Learning Method, J. Harbin Univ. Sci. Techn., 2016, 21(6), 61-66.

[9] Cao Y.Q., Cheng W., Huang X.S. Simulation Research on Tracking and Recognition of Moving Objects in Video Images, Comput. Simulation, 2017, 34(1), 191-196.

[10] Tang W., Wang X.T., Wang M.X. Study of FPGA and DSP-based Vehicle License Plate Recognition System, Comput. Meas. and Control, 2016, 24(2), 297-299.

[11] Zhang Q., Xia S.B., Guo P., et al. The Application of Image Recognition Technology in the Measurement Error Detection of Smart Meter, Electron. Des. Eng., 2017, 25(19), 187-189.

[12] Das R., Thepade S., Ghosh S. Framework for Content-Based Image Identification with Standardized Multiview Features, Etri J., 2016, 38(1), 174-184.

[13] Zeng H., Kang X. Fast Source Camera Identification Using Content Adaptive Guided Image Filter, J. Forensic Sci., 2016, 61(2), 520-526.

[14] Zhang F., Hao G., Shao M., et al. An Adipose Tissue Atlas, An Image-Guided Identification of Human-like BAT and Beige Depots in Rodents, Cell Metab., 2018, 27(1), 252-262.

[15] Kara I. Investigation of Ballistic Evidence through an Automatic Image Analysis and Identification System, J. Forensic Sci., 2016, 61(3), 775-781.

[16] Klinshov V., Maslennikov O., Nekorkin V. Jittering Regimes of Two Spiking Oscillators with Delayed Coupling, Appl. Math. Nonlinear Sci., 2016, 1(1), 197-206.

[17] Oyekale A.S. Cocoa Farmers’ Safety Perception and Compliance with Precautions in the Use of Pesticides in Centre and Western Cameroon, Appl. Ecol. Env. Res., 2017, 15(3), 205-219.

[18] Gao W., Wang Y., Basavanagoud B., Jamil M.K. Characteristics Studies of Molecular Structures in Drugs, Saudi Pharm. J., 2017, 25(4), 580-586.

[19] Liu Z. What is the Future of Solar Energy? Economic and Policy Barriers, Energy Sources Part B-Econ. Plan. Pol, 2018, 13(3), 169-172.

[20] Hosamani S.M., Kulkarni B.B., Boli R.G., Gadag V.M. Qspr Analysis of Certain Graph Theocratical Matrices and their Corresponding Energy, Appl. Math. Nonlinear Sci., 2017, 2(1), 131-150.

[21] Torres-Martinez A., Sanchez A.J., Alvarez-Pliego N., Amalia Hernandez-Franyutti A., Carlos Lopez-Hernandez J., Bautista-Regil J. Gonadal Histopathology of Fish From La Polvora Urban Lagoon in the Grijalva Basin, Mexico, Rev. Int. De Contaminacion Ambiental, 2017, 33(4), 713-717.

About the article

Received: 2018-10-07

Accepted: 2018-11-14

Published Online: 2018-12-31

Citation Information: Open Physics, Volume 16, Issue 1, Pages 1024–1032, ISSN (Online) 2391-5471, DOI: https://doi.org/10.1515/phys-2018-0123.


© 2018 Y. Shao and D. Witarsyah, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. BY-NC-ND 4.0
