CC BY 4.0 license. Open Access. Published by De Gruyter Open Access, October 17, 2022.

Cross-modal biometric fusion intelligent traffic recognition system combined with real-time data operation

Wei Xu and Yujin Zhai
From the journal Open Computer Science

Abstract

The intelligent traffic recognition system is the direction in which future traffic systems are developing. It integrates advanced information technology, data communication and transmission technology, electronic sensing technology, control technology, and computer technology into the entire ground traffic management system, establishing a real-time, accurate, and efficient integrated transportation management system that operates over a wide area and in all respects. The aim of this article is to integrate cross-modal biometrics into an intelligent traffic recognition system combined with real-time data operations. By building a model based on a cross-modal recognition algorithm, the system can better re-identify vehicles across modalities. This article first gives a general introduction to cross-modal recognition methods. It then presents experimental analyses of vehicle image classification in the intelligent transportation system, the complexity of vehicle logo recognition, and the recognition of vehicle images under different lighting conditions. Finally, the cross-modal recognition algorithm is introduced into the dynamic analysis of the intelligent traffic recognition system, and a cross-modal traffic recognition experiment is carried out. The experimental results show that the intraclass distribution loss function improves the Rank-1 recognition rate and mAP by 6–7 percentage points over the baseline method. This indicates that strengthening modality-invariant features, by reducing the distribution difference between images of the same vehicle in different modalities, can effectively address the imbalance of feature information caused by modality changes.

1 Introduction

Cross-modal biometric recognition is now widely applied in many fields: as long as biometric data of different modalities have been recorded in advance, they can be used for cross-modal recognition. Intelligent transportation systems can maximize the efficiency of transportation facilities and improve the quality of transportation services. By combining cross-modal biometric technology with real-time data operations, an intelligent traffic recognition system can make traffic recognition faster, more accurate, and more convenient.

With the rapid development of artificial intelligence, traffic recognition systems now use a wider range of technologies to achieve fast and accurate recognition. Research on intelligent traffic recognition systems that fuse cross-modal biometrics with real-time data operations is also significant for extending the application scope of cross-modal biometric fusion. Intelligent traffic identification is currently one of the most urgent technical problems facing urban management. In recent years, scholars have applied cross-modal biometrics to traffic recognition systems, but relatively few studies combine them with real-time data operations. Applying cross-modal biometrics combined with real-time data operations to intelligent traffic recognition therefore has both theoretical and practical significance.

2 Related works

With the rapid development of science and technology, a growing number of researchers have studied intelligent recognition systems based on cross-modal biometric fusion. Abed et al. proposed an intelligent multimodal biometric verification model based on artificial intelligence methods to identify and verify a person's identity. The model identifies the unique patterns of each individual's iris and finger veins, and it overcomes challenges such as identity fraud, poor image quality, noise, and unstable surrounding environments [1]. However, their model is not suited to the problem studied in this article. Punyani et al. proposed an improved recognition method that combines matching-score-level and decision-level fusion (a hybrid technique). Their experiments used the palmprint database of the Chinese Academy of Sciences and the ear database of the West Pomeranian University of Technology [2]. However, they did not provide a strong theoretical basis for their findings. Aleem et al. proposed a multimodal biometric system based on face and fingerprint. They employed an alignment-based elastic algorithm for fingerprint matching and adopted the extended local binary pattern for improved facial feature extraction [3]. However, the applicability of their work is relatively narrow. Vasavi and Latha found that unimodal biometric systems in business information systems suffer from problems such as nonuniversality, noisy data, unacceptable error rates, and spoofing attacks, and that multimodal biometric systems address these limitations [4]. However, the data they collected were not sufficiently recent, which reduced the accuracy of their findings.

Building on this work, Rahman et al. proposed a multimodal biometric recognition system that uses grade-level and rating-level fusion. The system combines a Kinect gait modality with a Kinect face modality; for gait, they proposed a pattern based on skeletal information processing [5]. However, the gait patterns they describe seem somewhat unrealistic for current technology. Hamidi and Kamankesh proposed a multi-agent framework to simulate traffic control tools and their interaction with road traffic. They adopted a constrained Markov decision process model to represent each agent's decisions in a multi-objective decision-making environment: the policy objective with the highest priority is treated as the single optimization objective, and the other objectives are converted into constraints [6]. However, they did not bridge the gap between the experimental goals and real conditions. Mohamed and Alshalfan proposed a novel traffic management system for future traffic systems and smart cities based on existing vehicular ad hoc networks (VANETs) and the Internet of Vehicles (IoV) [7]. However, their research only describes how to reduce vehicle delay time and does not consider the transportation system as a whole.

The contributions of this article are as follows: (1) Image classification experiments, ablation experiments, and acquisition frame rate tests are carried out, and the conclusions drawn are consistent with practical conditions. (2) The cross-modal recognition algorithm is applied to intelligent transportation systems. In other applications, cross-modal recognition is mostly used for face and voice recognition. This article instead examines the internal characteristics and advantages of cross-modal recognition, identifies what it has in common with intelligent traffic recognition, and uses the characteristics of the algorithm itself to obtain experimental results that agree with real conditions.

3 Method and design of cross-modal biometric applications in intelligent transportation systems

The purpose of the intelligent traffic re-identification task is to retrieve, from a database, other images of a vehicle and driver captured by surveillance video. In recent years, many supervised and unsupervised re-identification methods have achieved remarkable results in retrieving vehicles and drivers across cameras. To better integrate cross-modal biometrics into intelligent traffic recognition systems, this section proposes a cross-modal re-identification method based on hyperspherical manifold embedding. It maps vehicle image features onto a hyperspherical manifold in a high-dimensional feature space and uses the cosine of the angle between high-dimensional vectors as the basis for both classification and metric measurement. This keeps the two tasks in the same space, so that the objective functions of feature learning and metric learning can reach their optimal solutions simultaneously [8].

3.1 Cross-modal re-identification method for hyperspherical manifold embedding

For cross-modal vehicle re-identification, the primary problem is learning a cross-modal feature representation of vehicle images in which identity information is embedded. In traditional vehicle re-identification, many methods use identity-based embedded representations as the deep features of vehicle images. In cross-modal vehicle re-identification, the identity-based embedded representation is often used as an important constraint for extracting cross-modal features [9]. For example, in the deep zero-padding method, the objective function of the deep neural network is the identity-based embedded representation. The overall framework of the identity-based embedded representation is shown in Figure 1.

Figure 1: Embedding representation based on identity information.

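During network training, a set of samples is usually used as a batch to optimize the model. For a batch of $M$ samples with features $f_i$ and identity labels $y_i$, and a classification layer with weight vectors $W_k$ for $C$ identity classes, the softmax cross-entropy loss is:

$$L_{\mathrm{CE}} = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{W_{y_i}^{\top} f_i}}{\sum_{k=1}^{C} e^{W_k^{\top} f_i}} \tag{1}$$

When a fully connected layer without a bias term is used as the classification layer, each logit can be written as $W_k^{\top} f_i = \|W_k\|\,\|f_i\|\cos\theta_k$, where $\theta_k$ is the angle between $W_k$ and $f_i$, and the loss becomes:

$$L_{\mathrm{CE}} = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{\|W_{y_i}\|\,\|f_i\|\cos\theta_{y_i}}}{\sum_{k=1}^{C} e^{\|W_k\|\,\|f_i\|\cos\theta_k}} \tag{2}$$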
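The following is a minimal plain-Python sketch of Eqs. (1) and (2), assuming the classification layer is a bias-free fully connected layer represented as a list of class weight vectors. The function names are hypothetical illustrations, not the authors' implementation. It computes the batch-averaged softmax cross-entropy and checks that a logit equals the product of the two moduli and the cosine of their angle.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def softmax_cross_entropy(features, labels, weights):
    """Batch-averaged softmax cross-entropy of Eq. (1), bias-free classifier.

    features: M feature vectors f_i; labels: M class indices y_i;
    weights: C class weight vectors W_k.
    """
    total = 0.0
    for f, y in zip(features, labels):
        logits = [dot(w, f) for w in weights]        # W_k^T f_i, no bias term
        m = max(logits)                               # shift for numerical stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[y]                  # -log softmax of the true class
    return total / len(features)


def logit_from_angle(w, f):
    """Eq. (2): W_k^T f_i written as ||W_k|| * ||f_i|| * cos(theta_k)."""
    cos_theta = dot(w, f) / (norm(w) * norm(f))
    return norm(w) * norm(f) * cos_theta


W = [[1.0, 0.0], [0.0, 1.0]]
loss = softmax_cross_entropy([[2.0, 1.0]], [0], W)   # equals log(1 + e^-1)
```

With two orthogonal unit weight vectors and the feature (2, 1), the logits are 2 and 1, so the loss reduces to log(1 + e^-1).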

Figure 2 illustrates feature vectors with different moduli and their angles to the class weight vectors. It shows intuitively that the class a feature is assigned to depends on both the modulus of that class's weight vector and the size of the angle between them.

Figure 2: Spatial relationship between two cross-entropy losses. (a) Angle 1 between the vehicle feature and the weight vector; (b) Angle 2 between the vehicle feature and the weight vector.

To remove the influence of the moduli, both the class weight vectors $W_k$ and the features $f_i$ are L2-normalized:

$$\hat{W}_k = \frac{W_k}{\|W_k\|},\qquad \hat{f}_i = \frac{f_i}{\|f_i\|} \tag{3}$$

After this normalization, the logit for class $k$ is recomputed as:

$$z_{ik} = \|\hat{W}_k\|\,\|\hat{f}_i\|\cos\theta_k = \cos\theta_k \tag{4}$$

Introducing a scale factor $s$ finally gives the hyperspherical loss:

$$L_{\mathrm{sphere}} = -\frac{1}{M}\sum_{i=1}^{M}\log\frac{e^{s\cos\theta_{y_i}}}{\sum_{k=1}^{C} e^{s\cos\theta_k}} \tag{5}$$

These formulas show that the classification result for any vehicle feature depends only on the angle between the feature and each class weight vector. The spatial relationship is shown in Figure 3.

Figure 3: Schematic diagram of hypersphere manifold embedding.
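A minimal sketch of the hyperspherical loss of Eq. (5) follows, assuming plain Python lists for vectors. The scale value s=16.0 is an assumption chosen for illustration only, and the function names are hypothetical. Because both weights and features are normalized, rescaling a feature leaves the loss unchanged, which is the property illustrated in Figure 3.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sphere_loss(features, labels, weights, s=16.0):
    """Normalized-softmax loss in the form of Eq. (5).

    Weights and features are L2-normalized (Eq. (3)), so every logit is
    s * cos(theta_k) (Eq. (4) times the scale s). s=16.0 is an assumed value.
    """
    w_hat = [l2_normalize(w) for w in weights]
    total = 0.0
    for f, y in zip(features, labels):
        f_hat = l2_normalize(f)
        logits = [s * sum(a * b for a, b in zip(w, f_hat)) for w in w_hat]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[y]
    return total / len(features)
```

For example, a feature lying exactly between two orthogonal class directions gets equal logits for both classes, so its loss is log 2 regardless of s.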

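Assume that a batch contains $M$ samples from each of the two modalities, and let $P_1$ and $P_2$ denote the probability distributions predicted for the first and second modality, with components $p_1(j)$ and $p_2(j)$. The Kullback–Leibler (KL) divergence of $P_1$ from $P_2$ is:

$$D_{\mathrm{KL}}(P_2\,\|\,P_1) = \sum_{j=1}^{M} p_2(j)\log\frac{p_2(j)}{p_1(j)} \tag{6}$$

and, in the opposite direction:

$$D_{\mathrm{KL}}(P_1\,\|\,P_2) = \sum_{j=1}^{M} p_1(j)\log\frac{p_1(j)}{p_2(j)} \tag{7}$$

Combining the hyperspherical manifold loss with the divergence term gives the identity loss for each modality:

$$L_{\mathrm{identity}_1} = L_{\mathrm{sphere}_1} + D_{\mathrm{KL}}(P_2\,\|\,P_1) \tag{8}$$

$$L_{\mathrm{identity}_2} = L_{\mathrm{sphere}_2} + D_{\mathrm{KL}}(P_2\,\|\,P_1) \tag{9}$$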
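Eqs. (6) to (9) add a KL-divergence term between the distributions predicted for the two modalities to each modality's sphere loss. The sketch below shows that computation for discrete distributions; the helper `identity_loss` and the example probability vectors are hypothetical, and the direction of the KL term is passed in explicitly because KL divergence is not symmetric.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_j p_j * log(p_j / q_j) for discrete distributions.

    Terms with p_j == 0 contribute nothing; q_j must be > 0 wherever p_j > 0.
    """
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def identity_loss(l_sphere, p_from, p_to):
    """Hypothetical helper: a modality's sphere loss plus D_KL(p_from || p_to)."""
    return l_sphere + kl_divergence(p_from, p_to)


p_visible = [0.7, 0.2, 0.1]      # assumed class probabilities, modality 1
p_infrared = [0.5, 0.3, 0.2]     # assumed class probabilities, modality 2
```

Swapping the arguments of `kl_divergence(p_visible, p_infrared)` gives a different value, which is why the direction written in each equation matters.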

3.2 Cross-entropy loss function

The cross-entropy loss function is the most commonly used loss in image classification, and it is also used in current cross-modal traffic re-identification methods. When cross-entropy loss is used for image classification, a fully connected layer is usually attached after image feature extraction. Its input dimension equals the dimension of the image feature vector, and its output dimension equals the number of image categories, denoted $C$ here. As shown in formula (10), the softmax of the output gives the probability that the image belongs to each category [10]:

$$q_l = \frac{e^{z_l}}{\sum_{j=1}^{C} e^{z_j}} \tag{10}$$

$$\sum_{l=1}^{C} q_l = 1 \tag{11}$$

where $z_l$ is the $l$-th output of the fully connected layer. The cross-entropy loss measures the deviation between the probability distribution $q$ output by the network and the expected (label) distribution $p$:

$$H(p, q) = -\sum_{l=1}^{C} p(l)\log q(l) \tag{12}$$
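A short sketch of the softmax probabilities and the cross-entropy of Eqs. (10) to (12) follows, in plain Python with hypothetical function names and example logits. With a one-hot label, the cross-entropy reduces to the negative log-probability of the true class.

```python
import math


def softmax(z):
    """Eq. (10): q_l = exp(z_l) / sum_j exp(z_j), computed stably."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """Eq. (12): H(p, q) = -sum_l p(l) * log q(l)."""
    return -sum(pl * math.log(ql) for pl, ql in zip(p, q) if pl > 0)


q = softmax([2.0, 1.0, 0.1])
p = [1.0, 0.0, 0.0]              # one-hot label: the image belongs to class 0
loss = cross_entropy(p, q)       # reduces to -log q[0]
```

Subtracting the maximum logit before exponentiating does not change the result of Eq. (10) but avoids overflow for large logits.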

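Beyond classification, the learned features should satisfy ranking constraints. Let $f_i^{v}$ and $f_i^{r}$ be the features of vehicle $i$ in the visible and infrared modalities, and let $d(\cdot,\cdot)$ be a distance. The intra-modality constraints require a sample to be closer to the same vehicle in the other modality than to a different vehicle $k$ in its own modality:

$$d(f_i^{v}, f_i^{r}) < d(f_i^{v}, f_k^{v}),\qquad d(f_i^{r}, f_i^{v}) < d(f_i^{r}, f_k^{r}) \tag{13}$$

Converting these ordering constraints into a hinge loss with margin $\rho$ gives the intra-modality ranking loss:

$$L_{\mathrm{intra}} = \sum_{i=1}^{M}\Big[\max\big(\rho + d(f_i^{v}, f_i^{r}) - d(f_i^{v}, f_k^{v}),\,0\big) + \max\big(\rho + d(f_i^{r}, f_i^{v}) - d(f_i^{r}, f_k^{r}),\,0\big)\Big] \tag{14}$$

Similarly, the cross-modality constraints require a sample to be closer to the same vehicle than to a different vehicle $l$ in the other modality:

$$d(f_i^{v}, f_i^{r}) < d(f_i^{v}, f_l^{r}),\qquad d(f_i^{r}, f_i^{v}) < d(f_i^{r}, f_l^{v}) \tag{15}$$

Following the same idea, the bidirectional cross-modality loss and the combined ranking loss are:

$$L_{\mathrm{cross}} = \sum_{i=1}^{M}\Big[\max\big(\rho + d(f_i^{v}, f_i^{r}) - d(f_i^{v}, f_l^{r}),\,0\big) + \max\big(\rho + d(f_i^{r}, f_i^{v}) - d(f_i^{r}, f_l^{v}),\,0\big)\Big] \tag{16}$$

$$L_{\mathrm{rank}} = L_{\mathrm{cross}} + L_{\mathrm{intra}} \tag{17}$$

For the entire network, the objective function is:

$$L_{\mathrm{total}} = L_{\mathrm{identity}_1} + L_{\mathrm{identity}_2} + L_{\mathrm{intra}} + L_{\mathrm{cross}} \tag{18}$$

If the distance is computed directly from the cosine of the angle between two features, it is:

$$d(f_i^{v}, f_j^{r}) = 1 - \cos(f_i^{v}, f_j^{r}) \tag{19}$$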

Finally, the formula can be obtained:

(20) E = IDM Y .

3.3 Neural network model to build cross-modal traffic re-identification model

Traffic re-identification, the problem of finding images with the same ID as a query image among vehicle images, is often abstracted as an image classification problem in which the vehicle ID is used as the class label for training [11]. A key issue in image classification is how to convert the two-dimensional information of an image into a one-dimensional feature vector whose components are relatively independent. Before convolutional neural networks (CNNs) were widely used, the similarity of two images was usually computed with algorithms such as feature point matching, translation, rotation, histograms of oriented gradients (HOG), and local binary patterns (LBP) [12]. Traditional handcrafted feature extraction algorithms suffer from complex manual design, poor robustness, and slow image matching. This article therefore uses a CNN-based deep neural network for feature extraction. Common image feature extraction networks include AlexNet, VGG, GoogLeNet, and ResNet; their structural parameters are listed in Table 1.

Table 1

Network structure parameters of common image feature extraction networks

Parameter | AlexNet | VGG | GoogLeNet | ResNet
Number of layers | 8 | 19 | 22 | 152
Number of convolution layers | 5 | 16 | 21 | 151
Size of convolution kernels | 11, 5, 3 | 3 | 7, 1, 3, 5 | 7, 1, 3, 5
Number of fully connected layers | 3 | 3 | 1 | 1
Size of fully connected layers | 4,096, 4,096, 1,000 | 4,096, 4,096, 1,000 | 1,000 | 1,000
Top-5 error (ImageNet) | 16.4% | 7.3% | 6.7% | 3.57%

In principle, deeper networks extract image features better, and ResNet has far more convolutional layers than the other three networks, so this article uses ResNet for feature extraction. Simply stacking more layers, however, does not guarantee better results: as Figure 4 shows, the 56-layer network has higher training and test errors than the 20-layer network. Because the training error rises together with the test error, this degradation is not caused by overfitting.

Figure 4: Training and test errors for 20-layer and 56-layer networks.

Normalization methods such as batch normalization and group normalization alleviate vanishing gradients and gradient nonconvergence to some extent, but deep networks still converge relatively poorly [13]. It has been proposed that, for a deep neural network, the worst case for any additional layer should be that it does nothing, that is, it passes its input directly to its output. In that case the deep network degenerates into a shallower one and suffers neither network degradation nor gradient nonconvergence. Based on this idea, identity mappings (shortcut connections) are used to construct residual blocks, as shown in Figure 5.

Figure 5: Schematic diagram of the ResNet residual block structure.
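The identity-mapping idea can be illustrated with a simplified residual block. In this sketch, which is an assumption for illustration only, plain matrix-vector products stand in for the convolutions and batch normalization is omitted; the function names are hypothetical. When the residual branch's weights are zero, the block reduces to the shortcut, so non-negative inputs pass through unchanged.

```python
def relu(v):
    return [max(x, 0.0) for x in v]


def matvec(weight, v):
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), with F(x) = W2 * ReLU(W1 * x).

    A simplified block: plain matrix products stand in for the convolutions,
    and batch normalization is omitted.
    """
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([a + b for a, b in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
```

With `zeros` as both weight matrices, `residual_block([1.0, 2.0], zeros, zeros)` returns the input; the final ReLU still clips negative components, as in the standard block.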

Figure 6 shows the standard residual blocks of ResNet34 and ResNet50. The ResNet50 block improves on the ResNet34 block mainly by using 1 × 1 convolutions to reduce and then restore the channel dimension, which reduces the number of parameters in the convolution layers.

Figure 6: Basic residual block and improved residual block structure diagram.
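The parameter saving from 1 × 1 convolutions can be checked with a little arithmetic. The sketch below counts convolution weights only (biases are omitted, as is common when a convolution is followed by batch normalization) and uses 256 block channels with a 64-channel bottleneck as an assumed illustrative configuration.

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution; biases omitted."""
    return k * k * c_in * c_out


def basic_block_params(c):
    """Two 3x3 convolutions, c -> c -> c (ResNet34-style basic block)."""
    return 2 * conv_params(3, c, c)


def bottleneck_params(c_out, c_mid):
    """1x1 reduce, 3x3, 1x1 restore (ResNet50-style bottleneck block)."""
    return (conv_params(1, c_out, c_mid)
            + conv_params(3, c_mid, c_mid)
            + conv_params(1, c_mid, c_out))


print(basic_block_params(256))      # 1179648
print(bottleneck_params(256, 64))   # 69632
```

At 256 channels the bottleneck needs roughly 6% of the weights of two stacked 3 × 3 convolutions, because the expensive 3 × 3 convolution operates on only 64 channels.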

4 Experiment and intelligent traffic recognition system with cross-modal feature fusion

With the rapid development of network and information technology in China, we are now in an era of real-time data sharing. Big data involves the coexistence of many kinds of data [14] and is characterized by rich data types, high processing efficiency, and large volumes of stored information. Compared with earlier traffic data, traffic big data have the following advantages. First, storage capacity is large, data come from a wide range of sources, and data can be stored for a long time. Second, data are processed faster than before, so traffic flow information is available in real time. Third, the data are multimodal: they come from many sources and in many different types. Fourth, the data are highly valuable, because they carry spatial, temporal, and many other attributes. Fifth, the data can be visualized according to traffic operations. This article therefore integrates cross-modal features into the intelligent traffic recognition system in combination with real-time data operations and presents experiments and analysis.

4.1 Traffic identification combined with real-time data operation

4.1.1 Collection and processing of traffic identification data information

Real-time data operations allow traffic-related data to be acquired at a fine-grained (microscopic) and dynamic level. Hadoop technology is used to compute, store, and retrieve the traffic-related data.

4.1.2 The optimization of public transportation can meet the requirements of intelligent services

Real-time data operations make it possible to monitor bus stops dynamically. Travelers can use related application software to check bus operating status and vehicle information in real time. This avoids crowding when boarding and waiting without information, and it allows traffic resources to be allocated more reasonably.

4.1.3 Improve traffic safety

Traffic accidents can be caused by weather, driver-related factors, daily road conditions, and many other reasons. Real-time data can be used to predict accidents and resolve information problems quickly and efficiently. The system can issue warnings automatically before accidents occur, so that accident risk is predicted in advance and handled effectively.

4.1.4 Provide traffic guidance solutions

During urban traffic operation, data such as bus data, video surveillance data, and network traffic data are generated continuously. Real-time data operations allow this information to be extracted effectively and then quickly analyzed and processed. The traffic situation is estimated and checked with a comparative prediction model, and the relevant transportation management department publishes the resulting forecasts. These forecasts alert travelers, shorten the distance and time that vehicles actually travel, and make better use of the capacity of the road network.

4.2 Cross-modal traffic re-identification image classification experiment

Table 2 presents the classification results of the different algorithms. The final classification result of the JointBoost I2C algorithm is better than those of the logic data center (LDC) and JointBoost algorithms [15]. This suggests that, besides designing appropriate weak classifiers, the way the weak classifiers are combined should also be considered, because it helps improve classification performance.

Table 2

Comparison of the average classification results of JointBoost, LDC, and JointBoost I2C (%)

Method | Average detection rate | Average false alarm rate
JointBoost | 79.8 | 11.8
LDC | 88.6 | 8.7
JointBoost I2C | 94.7 | 6.5

Using the Scene-15 and Caltech101 image databases, we compare the average classification performance of the proposed method with that of naive Bayes nearest neighbor (NBNN) and LI2C. The average classification accuracy is compared for each of the 15 categories of Scene-15 and each of the 6 selected categories of Caltech101, as shown in Figure 7.

Figure 7: Comparison of the classification results of three methods for each type of image in the Scene-15 and Caltech101 image libraries: (a) Scene-15 and (b) Caltech101.

As Figure 7 shows, for easily distinguishable categories, such as Suburb and Office in Scene-15 and Carside and Face in Caltech101, the proposed method performs similarly to the other two methods. For categories that are difficult to distinguish, such as Living room and Store in Scene-15 and Butterfly and Bear in Caltech101, the proposed method achieves higher classification performance than the other two methods [16].

Table 3 presents the average classification performance of the three methods. Overall, the proposed method improves considerably on the other two. For the JointBoost I2C classifier, training and classification take 381 s and 4 s, respectively, for the 15 categories of Scene-15, and 74 s and 1.5 s for the 6 categories of Caltech101.

Table 3

Comparison of the average classification results of NBNN, LI2C, and JointBoost I2C (%)

Method | Scene-15 | Caltech101
NBNN | 72.5 ± 0.92 | 70.4 ± 2.62
LI2C | 81.2 ± 0.54 | 77.9 ± 1.31
The method of this article | 89.7 ± 0.49 | 84.8 ± 0.83

Table 4 presents the recognition results of the proposed method on the Caltech101 dataset. Entry (i, j) is the number of images of category i classified as category j, so the diagonal entries are the numbers of correctly classified images; larger diagonal values indicate better classification [17]. The table shows that the method can effectively re-identify multiple types of vehicles, with an average recognition rate above 80% (about 95.6% when averaged over the four categories).

Table 4

Recognition results on Caltech 101 dataset

True \ Predicted | Category 1 | Category 2 | Category 3 | Category 4
Category 1 | 173 | 1 | 7 | 1
Category 2 | 3 | 183 | 3 | 4
Category 3 | 5 | 1 | 164 | 2
Category 4 | 0 | 0 | 5 | 175
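The per-category recognition rates behind the "above 80%" claim can be computed directly from Table 4. The sketch below treats rows as true categories and columns as predicted categories, as the text describes; the variable and function names are illustrative.

```python
# Rows: true category i; columns: predicted category j (values from Table 4).
confusion = [
    [173, 1, 7, 1],
    [3, 183, 3, 4],
    [5, 1, 164, 2],
    [0, 0, 5, 175],
]


def per_class_rates(cm):
    """Recognition rate of each category: diagonal entry / row total."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


rates = per_class_rates(confusion)
mean_rate = sum(rates) / len(rates)                      # about 0.956
overall = sum(row[i] for i, row in enumerate(confusion)) / sum(map(sum, confusion))
```

The per-category rates range from about 94.8% (Category 2) to about 97.2% (Category 4); the overall accuracy, 695 correct out of 727 images, is also about 95.6%.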

The cross-modal vehicle feature registration method based on dual-aligned feature embedding uses a fused loss function as the constraint of the network. This article therefore first conducts two ablation experiments on the RegDB database to evaluate the three loss functions, demonstrating the validity of each and the necessity of combining them [18]. Figure 8 presents the quantitative results. The ablation results show that the intraclass distribution loss function improves the Rank-1 recognition rate and mAP by 6–7 percentage points over the baseline method. This indicates that reducing the distribution difference between images of the same vehicle in different modalities strengthens modality-invariant features and effectively addresses the imbalance of feature information caused by modality changes.

Figure 8: Ablation experiment results. (a) First ablation experiment; (b) second ablation experiment.
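Rank-1 and mAP, the two metrics reported in the ablation experiments, can be computed from a query-by-gallery distance matrix as sketched below. This is a simplified version under stated assumptions: the function name is hypothetical, queries with no correct gallery match contribute an AP of 0, and dataset-specific protocol rules (such as excluding same-camera matches) are not modeled.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mean average precision (mAP).

    dist[q][g] is the distance between query q and gallery image g.
    Assumption: a query with no correct gallery match contributes AP = 0.
    """
    rank1_hits = 0
    ap_total = 0.0
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda g: row[g])   # nearest first
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if matches and matches[0]:
            rank1_hits += 1
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)      # precision at each correct hit
        ap_total += sum(precisions) / len(precisions) if precisions else 0.0
    return rank1_hits / len(dist), ap_total / len(dist)
```

Rank-1 only checks the nearest gallery image, while mAP also rewards ranking every correct match early, so the two metrics can move differently between ablation settings.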

Figure 9 shows the results on the RegDB dataset when infrared images are used as queries. Compared with retrieving infrared images using visible-light queries, all models perform worse in this direction. The drop is most obvious for the ZeroPadding method, because it uses a single deep network trained with a cross-entropy loss to extract features from both modalities at once. It addresses neither the spatial misalignment nor the modality misalignment of cross-modal vehicle images, so its features are less robust and its performance falls sharply in the infrared-to-visible retrieval task [19].

Figure 9: Experimental results under different retrieval directions. (a) The first mode; (b) the second mode.

4.3 Acquisition frame rate test of cross-modal traffic recognition acquisition module

The acquisition frame rate of this module is tested along two dimensions: (1) lighting conditions and (2) the length of the time window.

  1. Lighting: on the same day, tests were run at noon, when lighting is good, and at dusk, when lighting is poor.

  2. Time window: to reduce error, each test was run over windows of 10, 30, 60, 120, and 600 s. The results are presented in Table 5.

Table 5

Acquisition frame rate test

Lighting | Measure | 10 s | 30 s | 60 s | 120 s | 600 s | Average frame rate
Good | Frames | 273 | 816 | 1,668 | 3,324 | 16,440 |
Good | Frame rate (fps) | 27.3 | 27.2 | 27.8 | 27.7 | 27.4 | 27.48 fps
Poor | Frames | 255 | 768 | 1,548 | 3,060 | 15,240 |
Poor | Frame rate (fps) | 25.5 | 25.6 | 25.8 | 25.5 | 25.4 | 25.56 fps

Table 5 shows that, after averaging over the different time windows, the average frame rate under good lighting (27.48 fps) is about 2 fps higher than under poor lighting (25.56 fps). Lighting therefore has a measurable influence on the acquisition frame rate for image recognition, and good lighting is more favorable to system operation [20]. Light intensity is also an important factor affecting recognition in the traffic system, so it should be treated as a confounding factor in the research process.
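The averages in Table 5 can be reproduced from the frame counts. The sketch below shows that the reported averages are the unweighted means of the five per-window frame rates; pooling all frames over the total 820 s gives a slightly different figure because the 600 s window then dominates.

```python
durations = [10, 30, 60, 120, 600]                 # seconds, from Table 5
frames_good = [273, 816, 1668, 3324, 16440]
frames_poor = [255, 768, 1548, 3060, 15240]


def frame_rates(frames, secs):
    return [f / s for f, s in zip(frames, secs)]


def mean(values):
    return sum(values) / len(values)


good = frame_rates(frames_good, durations)
poor = frame_rates(frames_poor, durations)
print(round(mean(good), 2), round(mean(poor), 2))             # 27.48 25.56
print(round(sum(frames_good) / sum(durations), 2))             # 27.46
```

Either way of averaging gives a gap of about 2 fps between good and poor lighting.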

Using a self-built car logo image library, this article adopts a rotation- and scale-invariant feature extraction method for car logo recognition, namely an improved scale-invariant feature transform (SIFT) [21]. Because images captured for vehicle recognition are affected by the natural environment, such as rain and snow, interference must be removed from the images. In addition, images are captured on a wireless client, so the features must tolerate rotation and scaling of the image; the improved SIFT features meet these requirements [22]. To make the experiment more rigorous, it is divided into two parts, simple car logos and complex car logos; the results are shown in Figure 10.

Figure 10: Different feature vector extraction for simple and complex car logos. (a) Extraction of different feature vectors for simple car logos; (b) extraction of different feature vectors for complex car logos.

These results show that, on simple car logo images, the correct recognition rate of all the feature extraction methods is generally above 85%. For logos with relatively simple structures, such as Mazda, Honda, Lexus, Citroen, Chevrolet, and Mercedes-Benz, the advantage of SIFT is not pronounced, although its overall performance is still better than that of the other feature extraction methods [23]. For logos with complex structures, however, SIFT clearly outperforms the other methods. Relatively complex logos such as Rolls-Royce, Maserati, Koenigsegg, Saab, Aston Martin, and Jaguar were selected for comparison. Although all methods still reach recognition rates above 80%, their accuracy on complex logos is somewhat lower than on simple logos. The SIFT method shows the smallest drop and therefore the best recognition performance [24].
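The article does not specify how its improved SIFT descriptors are matched. As an illustration only, the sketch below shows the standard nearest-neighbour matching step with Lowe's ratio test applied to SIFT-like descriptor vectors (in practice such descriptors could come from OpenCV's `cv2.SIFT_create`, available in OpenCV 4.4 and later). The `logo_score` helper, the 0.75 ratio, and the toy two-dimensional descriptors are assumptions, not the article's method.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query_desc, train_desc, ratio=0.75):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    Keeps (query_index, train_index) pairs whose best match is clearly closer
    than the second-best match. Needs at least two train descriptors.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train_desc))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


def logo_score(query_desc, template_desc, ratio=0.75):
    """Hypothetical score: fraction of query descriptors with a good match."""
    return len(ratio_test_matches(query_desc, template_desc, ratio)) / len(query_desc)
```

A descriptor that is equally close to several template descriptors is ambiguous and is rejected by the ratio test, which is what makes the matching robust to repeated structure in complex logos.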

5 Discussion

This article studies and designs a cross-modal biometric fusion intelligent traffic recognition system combined with real-time data operations. It applies a computational model of cross-modal recognition to the analysis and processing of an intelligent traffic recognition system, which extends the application scope of cross-modal recognition and offers a new approach to studying the complexity of such systems. Experiments with a traffic recognition system that fuses cross-modal recognition suggest that cross-modal recognition has potential as a tool for studying the complexity of intelligent traffic recognition systems. Building on prior research, the model is improved and combined with a cross-modal biometric fusion traffic recognition system that uses real-time data operations, which makes it suitable for the research object. For cross-modal biometrics, this article starts from the cross-entropy loss function of the most basic cross-modal recognition algorithm, analyzes how the function is applied, and combines the cross-modal recognition algorithm with a traffic recognition system. In the empirical stage, image classification experiments, ablation experiments, and acquisition frame rate experiments were carried out to evaluate cross-modal recognition in traffic recognition systems. Analysis of the data from these three experiments shows that the results are consistent with practical conditions.

This case analysis shows that an intelligent traffic recognition system based on cross-modal biometric fusion is more effective than a single-modality traffic recognition system. In traffic recognition, the cross-modal recognition database can locate vehicles, and past images of those vehicles, more accurately and quickly. In practical operation, the traffic recognition system can retrieve vehicles based on its own judgment, reducing traffic risks in a reasonable and flexible way. Overall, the intelligent traffic recognition system using cross-modal recognition is more accurate, faster, and more comprehensive.

This article presents a case study together with an analysis of the experiments conducted. First, the method used in the study is determined through algorithmic and qualitative analysis of cross-modal identification. A spherical model is then used to analyze the data from the image classification, ablation, and acquisition frame rate experiments. The results show that the intraclass distribution loss function improves the Rank-1 recognition rate and mAP by 6–7 percentage points over the baseline method. This indicates that the loss strengthens modality-invariant features by reducing the distribution difference between images of the same vehicle in different modalities. Doing so effectively addresses the feature information imbalance caused by modality changes.
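The reported gains are measured with Rank-1 and mAP, the standard re-identification retrieval metrics. The sketch below is a simplified evaluation and assumes a precomputed query-to-gallery distance matrix. It omits the same-camera and junk-image filtering that some benchmark protocols apply, and the function name is hypothetical. It does not reproduce the paper's numbers.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(dist: Sequence[Sequence[float]],
                  query_ids: Sequence[int],
                  gallery_ids: Sequence[int]) -> Tuple[float, float]:
    """Compute Rank-1 accuracy and mean average precision (mAP).

    dist[i][j] is the distance between query i and gallery item j
    (smaller means more similar). Queries whose identity never appears
    in the gallery are skipped.
    """
    hits_at_1 = 0
    ap_sum = 0.0
    valid = 0
    for i, qid in enumerate(query_ids):
        # Rank gallery items from most to least similar to query i.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        matches = [gallery_ids[j] == qid for j in order]
        n_rel = sum(matches)
        if n_rel == 0:
            continue
        valid += 1
        hits_at_1 += int(matches[0])
        # Average precision: mean of precision@k over ranks k holding a true match.
        found = 0
        precisions: List[float] = []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / k)
        ap_sum += sum(precisions) / n_rel
    if valid == 0:
        raise ValueError("no query has a matching gallery identity")
    return hits_at_1 / valid, ap_sum / valid
```

Take `dist = [[0.1, 0.5, 0.9], [0.1, 0.2, 0.3]]`, `query_ids = [1, 2]`, and `gallery_ids = [1, 2, 1]`. The first query is correct at rank 1 (AP = 5/6) and the second is not (AP = 1/2), which gives Rank-1 = 0.5 and mAP = 2/3. A gain of "6–7 percentage points" means each of these fractions, expressed as a percentage, rises by that amount.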

6 Conclusion

The case studies lead to the following conclusions. In general, intelligent traffic recognition systems operating on real-time data offer many advantages. Integrating cross-modal biometrics into such a system improves both the vehicle recognition rate and the vehicle re-identification rate. The experimental analysis shows that cross-modal recognition increases the traffic recognition rate in vehicle classification, in recognizing vehicle logos of differing complexity, and in recognition under different lighting conditions. Identifying a more effective method will require more detailed study and quantitative analysis of the experimental objectives. This article discusses intelligent traffic recognition based on combining real-time data operation with cross-modal biometrics, but the range of cases studied is relatively limited, and a real cross-modal traffic recognition system will often face many additional problems. A real traffic recognition system should therefore be re-analyzed in light of multiple factors. Such an analysis would be more valuable and, of course, more difficult. We expect studies of this kind to become increasingly common, the application range of cross-modal recognition to keep widening, and intelligent traffic recognition systems to become increasingly effective. The integration of cross-modal biometrics into real-world traffic recognition systems is a promising direction for future work.

Acknowledgments

This work was supported by The Foundation for the Key Research and Promotion of Henan Province (Science and Technology) [222102210211]: Cross-modal biometric fusion intelligent traffic identification system.

  1. Funding information: This work is supported by the Key Science and Technology Program of Henan Province, China (222102210211).

  2. Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

  3. Data availability statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

[1] E. A. Abed, R. J. Mohammed, and T. Shihab, “Intelligent multimodal identification system based on local feature fusion between iris and finger vein,” Indonesian J. Electr. Eng. Comput. Sci., vol. 21, no. 1, pp. 224–232, 2021, doi: 10.11591/ijeecs.v21.i1.pp224-232.

[2] P. Punyani, R. Gupta, and A. Kumar, “A multimodal biometric system using match score and decision level fusion,” Int. J. Inf. Technol., vol. 14, no. 2, pp. 725–730, 2022, doi: 10.1007/s41870-021-00843-3.

[3] S. Aleem, P. Yang, S. Masood, P. Li, and B. Sheng, “An accurate multi-modal biometric identification system for person identification via fusion of face and finger print,” World Wide Web, vol. 23, no. 2, pp. 1299–1317, 2020, doi: 10.1007/s11280-019-00698-6.

[4] K. Vasavi and Y. Latha, “RSA cryptography based multi-modal biometric identification system for high-security application,” Int. J. Intell. Eng. Syst., vol. 12, no. 1, pp. 10–21, 2019, doi: 10.22266/ijies2019.0228.02.

[5] M. W. Rahman, F. T. Zohra, and M. L. Gavrilova, “Score level and rank level fusion for kinect-based multi-modal biometric system,” J. Artif. Intell. Soft Comput. Res., vol. 9, no. 3, pp. 167–176, 2019, doi: 10.2478/jaiscr-2019-0001.

[6] H. Hamidi and A. Kamankesh, “An approach to intelligent traffic management system using a multi-agent system,” Int. J. Intell. Transp. Syst. Res., vol. 16, no. 2, pp. 1–13, 2018, doi: 10.1007/s13177-017-0142-6.

[7] S. Mohamed and K. A. Alshalfan, “Intelligent traffic management system based on the internet of vehicles (IoV),” J. Adv. Transp., vol. 2021, no. 4, pp. 1–23, 2021.

[8] M. Merouane, “An approach for detecting anonymized traffic: Orbot as case study,” Autom. Control. Comput. Sci., vol. 56, no. 1, pp. 45–57, 2022, doi: 10.3103/S0146411622010072.

[9] Z. Liu, R. Wang, and D. Tang, “Extending labeled mobile network traffic data by three levels traffic identification fusion,” Future Gener. Comput. Syst., vol. 88, pp. 453–466, Nov. 2018, doi: 10.1016/j.future.2018.05.079.

[10] R. Gayathri, M. A. Bhairavi, and D. Aravind, “An intelligent and real time system for automatic driven toll gate system under complex scenes,” Int. J. Comput. Intell. Res., vol. 14, no. 1, pp. 1–13, 2018.

[11] R. Sathiyaraj and A. Bharathi, “An efficient intelligent traffic light control and deviation system for traffic congestion avoidance using multi-agent system,” Transport, vol. 35, no. 3, pp. 1–9, 2019, doi: 10.3846/transport.2019.11115.

[12] Z. Wang and Y. Ma, “Detection and recognition of stationary vehicles and seat belts in intelligent Internet of Things traffic management system,” Neural Comput. Appl., vol. 9, pp. 1–10, 2021, doi: 10.1007/s00521-021-05870-6.

[13] J. Wang, B. He, J. Wang, and T. Li, “Intelligent VNFs selection based on traffic identification in vehicular cloud networks,” IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 4140–4147, 2019, doi: 10.1109/TVT.2018.2880754.

[14] D. Sivabalaselvamani, “Real time traffic flow prediction and intelligent traffic control from remote location for large-scale heterogeneous networking using tensorflow,” Int. J. Future Gener. Commun. Netw., vol. 13, no. 1, pp. 1006–1012, 2020.

[15] Z. Liu and C. Wang, “Design of traffic emergency response system based on internet of things and data mining in emergencies,” IEEE Access, vol. 7, no. 99, pp. 113950–113962, 2019, doi: 10.1109/ACCESS.2019.2934979.

[16] S. M. Rajalaksh, A. Deborah, R. S. Thiru, K. Priya, and M. Rajendram, “RFID-based traffic violation detection and traffic flow analysis system,” Int. J. Pure Appl. Math., vol. 118, no. 20, pp. 319–328, 2018.

[17] H. V. Chand and J. Karthikeyan, “Survey on the role of IoT in intelligent transportation system,” Indonesian J. Electr. Eng. Comput. Sci., vol. 11, no. 3, pp. 936–941, 2018, doi: 10.11591/ijeecs.v11.i3.pp936-941.

[18] D. L. Dinh, H. N. Nguyen, H. T. Thai, and K. H. Le, “Towards AI-based traffic counting system with edge computing,” J. Adv. Transp., vol. 2021, no. 2, pp. 1–15, 2021, doi: 10.1155/2021/5551976.

[19] K. Bhagavan, S. S. Saketh, G. Mounika, M. Vishal, and M. Hemanth, “IOT based intelligent street lighting system for smart city,” Int. J. Eng. Technol., vol. 7, no. 2, pp. 345–347, 2018, doi: 10.14419/ijet.v7i2.32.15709.

[20] M. M. Ahmed, M. Abdel-Aty, and R. Yu, “Bayesian updating approach for real-time safety evaluation with automatic vehicle identification data,” Transp. Res. Rec., vol. 2280, no. 1, pp. 60–67, 2018, doi: 10.3141/2280-07.

[21] A. Jenefa and B. S. Moses, “A multi-phased statistical learning based classification for network traffic,” J. Intell. Fuzzy Syst., vol. 40, no. 14, pp. 1–19, 2021, doi: 10.3233/JIFS-201895.

[22] R. Thiagarajan and D. S. Prakashkumar, “Identification of passenger demand in public transport using machine learning,” Webology, vol. 18, no. Special Issue 02, pp. 223–236, 2021, doi: 10.14704/WEB/V18SI02/WEB18068.

[23] B. H. Sun, W. W. Deng, B. Zhu, J. Wu, and S. S. Wang, “Identification of vehicle motion intention based on reaction behavior model,” Jilin Daxue Xuebao (Gongxueban)/Journal Jilin Univ. (Eng. Technol. Ed.), vol. 48, no. 1, pp. 36–43, 2018.

[24] G. Lee, R. Mallipeddi, and M. Lee, “Trajectory-based vehicle tracking at low frame rates,” Expert Syst. Appl., vol. 80, pp. 46–57, Sep. 2017, doi: 10.1016/j.eswa.2017.03.023.

Received: 2022-04-26
Revised: 2022-07-11
Accepted: 2022-07-22
Published Online: 2022-10-17

© 2022 Wei Xu and Yujin Zhai, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
