Seismic vulnerability signal analysis method for low-tower cable-stayed bridges based on a convolutional attention network

Abstract: Owing to the particularity and complexity of sedimentary environments, the wave impedance differences between different reflection interfaces in underground media can vary greatly. An encoder–decoder neural network is therefore proposed to enhance weak seismic reflection signals. Convolutional neural networks (CNNs) are difficult to parallelize, which slows network training and lowers computational efficiency. Because the attention mechanism is innately global, compensates for long-range dependency deficiencies, and supports parallel computation, it largely offsets the shortcomings of CNNs and recurrent neural networks; a seismic impedance inversion method based on a convolutional attention network is therefore proposed. To improve the ability to extract noise, a residual structure and a convolutional block attention module (CBAM) are introduced. The residual structure uses skip connections to mitigate network degradation and reduce the difficulty of feature mapping. The CBAM applies mixed channel and spatial attention weights, enhancing highly correlated features and suppressing weakly correlated ones. In the decoder, bilinear interpolation is used for upsampling to improve the dimension-recovery ability of feature fusion. Results on model and field data indicate that this method can effectively enhance the weak reflection signals caused by the formation itself and improve the reservoir identification ability of seismic data.


Introduction
At present, there are few reports in the existing research on how to solve the problem of incomplete measurement of vibration data in earthquake simulation experiments [1-3]. Neural networks have been widely used in fields such as earthquake prediction and active control of structural vibration because of their large-scale parallel processing and distributed information storage capabilities, strong self-learning, association, and fault-tolerance capabilities, good adaptability and self-organization, and ability to fit nonlinear systems with multiple inputs and outputs [4]. Neural networks also have the advantages of not requiring an explicit mathematical model and of being able to identify and predict unknown systems, and they are often used as tools for data prediction. Guo et al. [5] used a BP neural network to predict the results of different experimental conditions from a small number of experiments. It follows that neural networks have the potential to predict incomplete response data in earthquake simulation experiments.
Seismic weak reflection signal enhancement methods mainly rely on convolutional neural networks (CNNs) from image processing, recurrent neural networks (RNNs) from natural language processing, and new networks formed by combining them [6]. A CNN is translation invariant and attends only to local information, which makes it difficult for convolutional networks to capture and store long-distance dependency information in sequence processing. Enlarging the convolutional kernel and the number of channels to capture long-term dependencies, however, can cause gradients to vanish, so that training of the whole network fails to converge [7]. CNNs therefore have certain limitations. An RNN stores effective information across many scattered neurons and cannot effectively preserve very long data dependencies, so its long-term dependency capture is insufficient for longer sequences. Moreover, RNNs are difficult to parallelize effectively, which slows computation and wastes time and cost during training. Remote early warning can compute the three elements of an earthquake [8] through location and magnitude calculation methods and then predict the intensity in areas the seismic waves have not yet reached through the attenuation relationship of ground-motion parameters [9]. This approach fully considers the source parameters and predicts intensity through a statistically fitted attenuation relationship, so the intensity distribution of the entire intensity field can be predicted without using waveform information from all stations. However, its timeliness is inferior to that of local early warning, it has a large warning blind zone, and the predicted intensity distribution generally follows a fixed law [10]. The isointensity lines are generally circular, whereas the actual intensity distribution is often uneven, with some areas of anomalous intensity. Prediction through attenuation relationships therefore also has certain limitations.
With the development of artificial intelligence, deep learning has been applied in many fields and has brought new ideas to exploration geophysics. Within seismic data processing, deep learning has produced a number of research results, and the attenuation of seismic noise has become one of the more active topics. Since the introduction of CNNs, various noise attenuation methods based on CNN architectures have been proposed. Jiang et al. [11] used denoising convolutional networks to attenuate random noise in seismic data, with better denoising results than traditional methods. Luo and Zhao [12] applied data augmentation and used a U-shaped CNN (Unet) to effectively suppress random noise. Qiao et al. [13] added residual blocks to Unet to address generalization, achieving a denoising effect while protecting effective signals. Zhang et al. [14] learned seismic signal features adaptively and used convolutional denoising autoencoders to eliminate strong noise. Margheriti et al. [15] built a network containing global context and attention mechanisms, which suppressed random noise while retaining more local detail. Guo et al. [16] constructed an unsupervised residual convolutional network that uses double-layer convolutional blocks to improve computational efficiency and achieve adaptive noise attenuation. This article proposes a CNN (RAUnet) that integrates a residual attention mechanism [17], introducing a convolutional attention mechanism and a residual structure to improve the denoising effect and enhance model generalization. The denoising performance of the algorithm is demonstrated through experiments on synthetic and real seismic signals.
This article proposes a seismic weak reflection signal enhancement method based on a convolutional attention network. By applying a fractional power operation to the reflection coefficients, a sequence of quasi-reflection coefficients is obtained. Synthetic and pseudo-synthetic seismograms are computed from the original and pseudo-reflection coefficients, respectively, generating the training sample set. The residual [18] network is then trained to establish a mapping between the synthetic seismic records and the quasi-synthetic seismic records. Finally, the network is applied to seismic data to enhance weak seismic reflection signals. The aim of the method is not to eliminate strong seismic signals but to narrow the relative difference between weak and strong reflection coefficients by constructing a power reflection coefficient model, thereby highlighting the weak reflection signals.

RAUnet denoising method
To improve the denoising effect and enhance model generalization, this article constructs a CNN (RAUnet) that integrates a residual attention mechanism [19,20]. The network applies batch normalization and a leaky rectified linear activation after each convolutional layer, adds a residual structure and convolutional block attention modules in the encoder, and uses bilinear interpolation for upsampling in the decoder [21].
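The bilinear upsampling used in the decoder can be sketched as follows. This is an illustrative NumPy implementation; the function name and the half-pixel coordinate convention are our assumptions, not taken from the paper.

```python
import numpy as np

def bilinear_upsample(x, scale=2):
    """Upsample a 2-D feature map by bilinear interpolation
    (half-pixel-centered sampling, edges clamped)."""
    h, w = x.shape
    new_h, new_w = h * scale, w * scale
    # Source coordinates for each target pixel
    ys = np.clip((np.arange(new_h) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(new_w) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Interpolate horizontally on the two neighboring rows, then vertically
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

Unlike transposed convolution, this upsampling has no learnable parameters, which is one reason it is a common choice for dimension recovery in decoders.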
Batch normalization (BN) [22] standardizes each batch of data at a network layer when training a neural network with gradient descent. BN sits between the convolutional layer and the nonlinear layer and is regarded as an inter-layer mechanism of the network [23]. It provides a commonly used way to re-parameterize deep networks, markedly reducing the need to coordinate updates across layers and thereby reducing internal covariate shift [24]. The batch-normalized result can be expressed as

$$\tilde{a}_k = \frac{a_k - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad b_k = \gamma\,\tilde{a}_k + \beta,$$

where $a_k$ and $b_k$ are the input sample [25] and output of the BN [26] layer, respectively; $\mu_B$ and $\sigma_B$ are the mean and standard deviation of the batch [27]; $\varepsilon$ is a small constant; $\tilde{a}_k$ is the normalized $a_k$; and $\gamma$ and $\beta$ are learnable parameters [28]. The activation function follows the batch normalization layer and controls the neurons, handling the nonlinear problems encountered in seismic signal processing. The most common choice is the rectified linear unit (ReLU) [29], but when the input is negative the neuron is not updated, which loses effective information. To avoid vanishing gradients, the leaky rectified linear unit [30] (leaky ReLU, Figure 1) is used instead:

$$f(h) = \begin{cases} h, & h > 0, \\ c\,h, & h \le 0, \end{cases}$$

where $h$ is the input of the activation function [31] and $c$ is a small constant, set to 0.2 in this article.
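The batch normalization and leaky ReLU operations described above can be sketched in NumPy. These are illustrative versions of the standard formulas, not the paper's code; the function names are our own.

```python
import numpy as np

def batch_norm(a, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch (rows = samples), then scale and shift
    with the learnable parameters gamma and beta."""
    mu = a.mean(axis=0)
    var = a.var(axis=0)
    a_tilde = (a - mu) / np.sqrt(var + eps)
    return gamma * a_tilde + beta

def leaky_relu(h, c=0.2):
    """Leaky ReLU with negative slope c (the paper sets c = 0.2),
    so negative inputs still pass a small gradient."""
    return np.where(h > 0, h, c * h)
```

With gamma = 1 and beta = 0 the output of `batch_norm` has (approximately) zero mean and unit variance per feature, which is what stabilizes the layer-to-layer update coordination.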
Attention mechanisms [32] can be divided into two categories: channel attention and spatial attention. Channel attention enhances or weakens individual channels depending on the task, determining the importance of each feature channel. Spatial attention assumes that not all regions contribute equally to the task and mainly emphasizes task-relevant areas. Both mechanisms strengthen the network, and their effective combination brings an even greater improvement. This article therefore adopts a mixed-domain attention mechanism combining the channel and spatial [33] approaches, adding learned weights to high-frequency data in the channel domain and to local correlations in the spatial domain [34] to construct the convolutional block attention module shown in Figure 2.
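A minimal NumPy sketch of the channel-then-spatial attention combination is given below. The shared-MLP weights (`w1`, `w2`) and the fixed equal-weight stand-in for CBAM's 7x7 spatial convolution are our simplifying assumptions, not the paper's exact module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). A shared MLP (w1: C->C/r, w2: C/r->C) is applied to
    the average- and max-pooled channel descriptors, then summed."""
    avg = x.mean(axis=(1, 2))                     # (C,)
    mx = x.max(axis=(1, 2))                       # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # squeeze-and-excite MLP
    weights = sigmoid(mlp(avg) + mlp(mx))         # per-channel weights
    return x * weights[:, None, None]

def spatial_attention(x):
    """Channel-wise average and max maps combined into a per-pixel weight.
    (CBAM learns a 7x7 conv here; a fixed 0.5/0.5 mix is a stand-in.)"""
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    weights = sigmoid(0.5 * (avg + mx))
    return x * weights[None, :, :]

def cbam(x, w1, w2):
    """Channel attention followed by spatial attention, in CBAM order."""
    return spatial_attention(channel_attention(x, w1, w2))
```

The sigmoid keeps every attention weight in (0, 1), so highly correlated features are preserved and weakly correlated ones are attenuated rather than removed outright.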
It is generally believed that the deeper the network [35], the more likely it is to obtain a better-performing model, but this is not always the case. Practice has shown that for certain large datasets [36], continuing to deepen the network does not achieve the desired effect and can backfire, leading to network degradation and other problems. In response, Wu et al. [37] introduced skip connections, namely the residual learning framework (ResNet), which combines shallow and deep features to form the input for subsequent operations, effectively alleviating the loss of input features and improving feature recovery [38]. This article adopts the residual structure shown in Figure 3. The skip connection adds feature maps element-wise: when the dimensions are the same, they can be added directly; when they differ, the shortcut must first be projected to matching dimensions and then added [39].
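The skip-addition rule described here (direct addition when dimensions match, a projection first when they do not) can be sketched as follows; the 1x1-convolution projection and the function names are illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: pure channel mixing.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def residual_block(x, f, w_proj=None):
    """y = f(x) + shortcut(x). If the branch f changes the channel count,
    a 1x1 projection (w_proj) maps the shortcut to matching dimensions."""
    y = f(x)
    shortcut = x if w_proj is None else conv1x1(x, w_proj)
    if y.shape != shortcut.shape:
        raise ValueError("shortcut and branch dimensions must match")
    return y + shortcut
```

Because the shortcut carries the input forward unchanged (or through a cheap projection), gradients can flow past the branch `f`, which is what weakens network degradation as depth grows.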

Data selection
The training results of an artificial neural network largely depend on the quality of the training samples. When selecting data it is therefore necessary to cover as many situations as possible, including a sufficient data volume and a uniform data distribution. This article first selected 629 earthquake events recorded by the KiK-net strong-motion network in Japan from 1999 to 2016, with magnitudes ranging from MS 3.7 to MS 9.0. Seismic events below magnitude 7 are more numerous and uniformly distributed, while events above magnitude 7 are relatively rare. This article therefore focuses on two earthquakes of magnitude above 8: the magnitude 8.0 earthquake of September 26, 2003, and the magnitude 9.0 earthquake of March 11, 2011. Both occurred in the waters off Japan, and the nearest triggered stations were 103 and 145 km away, respectively. Analysis and testing showed that data from these two offshore events could not support accurate intensity prediction for major earthquakes occurring on land, such as the Wenchuan earthquake, and that there may be significant errors in predicting the intensity at nearby instruments. It was therefore decided to add strong-motion data from the Wenchuan earthquake of May 12, 2008, the Chi-Chi (Jiji) earthquake of September 21, 1999, and the Ludian earthquake of August 3, 2014, to the sample, for a total of 632 seismic events used for testing.
If the data are too numerous and too varied, both network training and prediction become more difficult; if the data are too few and the sample types insufficient, the training samples may not be convincing. In station selection, stations with significant predictive value should be chosen, and meaningless or abnormal stations excluded. The following points were considered: (1) Each station of the KiK-net strong-motion network in Japan comprises a surface station and an underground bedrock station; here, all three directional components from the underground bedrock stations are used, while the three later-added earthquakes use surface-station data because of limited data sources. (2) Following the "China Earthquake Intensity Scale (GB/T 17742-2020)" [38], stations with a computed intensity of less than 1 are excluded. (3) The propagation path of deep-focus earthquakes is complex, and their impact is smaller than that of shallow earthquakes; this article focuses on predicting shallow earthquakes and excludes triggered stations with focal depths greater than 50 km. (4) Small earthquakes, and large earthquakes at large focal distances, are not the objects of early-warning research; in record screening, a relationship between magnitude and focal distance was fitted from the influence range of the earthquake, in which R is the focal distance and M is the magnitude. During data selection it is also necessary to exclude stations with clipped record heads, data anomalies, and similar problems. By these rules, a total of 2,231 stations were selected. The distributions of focal distance, magnitude, and intensity for these stations are shown in Figures 6 and 7. The distribution of the selected stations should be as uniform as possible and represent the triggering situation of the vast majority of seismic stations.
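The screening rules above can be sketched as a simple filter. The record layout and the `mr_ok` predicate, which stands in for the magnitude-distance equation (not reproduced in this excerpt), are our assumptions.

```python
def select_stations(stations, mr_ok):
    """Apply the screening rules from the text:
    intensity >= 1, focal depth <= 50 km, and the magnitude-distance
    cutoff (the caller supplies mr_ok, standing in for the fitted equation)."""
    kept = []
    for s in stations:
        if s["intensity"] < 1:          # rule (2): intensity below 1
            continue
        if s["depth_km"] > 50:          # rule (3): deep-focus events
            continue
        if not mr_ok(s["magnitude"], s["distance_km"]):  # rule (4)
            continue
        kept.append(s)
    return kept
```

Data anomalies and clipped records would be removed in a prior cleaning pass; only the three rule-based criteria are shown here.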

Overview of vibration table testing and construction of neural networks
To study the dynamic coupling between valves and pipelines in nuclear-power valve-pipeline systems and to examine the dynamic behavior of such systems during earthquakes, a "valve pipeline system seismic simulation test" was designed for pipeline systems both without and with valves installed [39]. The nodes of the tube unit in the model correspond one-to-one to the positions of the experimental measurement points, and the measurement-point numbers are used to refer to the nodes. Because the pipeline structure is completely symmetric in the Y and Z directions, only the Y-direction test was designed. Condition 1 in Table 1 investigates the dynamic characteristics of the pipeline structure. To study the response of the pipeline structure under different seismic excitations and provide experimental data for its seismic margin analysis, two artificial seismic wave tests with different amplitudes were designed, namely conditions 2 and 3. The maximum amplitude of the artificial seismic wave used in condition 2 is 13.2 m/s², as shown in Figure 4(a); that used in condition 3 is 8.6 m/s², as shown in Figure 4(b).
The acceleration time-history curve measured at measurement point 7, the maximum-response position in condition 2, is shown in Figure 5. From Figure 5, it can be seen that the response time-history curve of this measuring point contains a total of 14 sampling points in the time periods 17.226-17.228 s, 17.338-17.344 s, and 17.399-17.402 s. Identical acceleration values at adjacent sampling points clearly do not conform to objective laws, indicating that under artificial seismic wave 1 the response at measurement point 7 may contain errors or incomplete measurements during these periods. Because of the numerous working conditions involved and the fact that the test structure was replaced several times during testing, this issue was not discovered during the test process, and the test bench was removed after completion, making it impossible to repeat the test. To make the experimental data usable and achieve the experimental objectives, a neural network is used to predict the erroneous data and supplement the time-history curve of measurement point 7 in condition 2.
To avoid large differences between local and global minima as much as possible, the calculations for each network are carried out multiple times, starting from multiple randomly selected initial points. For each model, fitting is performed for time windows from 1 to 20 s, giving a total of 60 network models. Statistics are computed for each model: each network predicts the absolute value of the intensity difference at every station in the training and confirmation samples, and the average value is used as the discriminant criterion. The specific results are shown in Figure 8.
From Figure 8, the following conclusions can be drawn: (i) The predicted results improve with time, and the improvement is faster in the first 15 s. (ii) Model 3 predicts better than model 2, and model 2 better than model 1; model 3 performs well from the first second, showing that an accurate focal distance and magnitude greatly improve intensity prediction. (iii) The results for the training samples are slightly better than those for the confirmation samples, indicating that the model also predicts well for data of the same type not used in training. (iv) By 20 s the predicted results of all three models are below 0.6, and in practical application most intensity differences after rounding are 0 or 1. In actual prediction, if the absolute intensity difference is within 1 and the rounded value is also within 1, the result is acceptable for earthquake warning. Figure 8 analyzes the change in the average intensity difference; the distribution of the intensity difference is analyzed as well. First, the three models were used to analyze in detail the absolute intensity differences of the training samples at the representative first and 20th seconds (Figure 9); the intermediate moments, which change gradually, are not listed one by one. From Figure 9, at the first second the proportion of predicted intensity differences less than 1 for model 3 is significantly higher than for the other two models, while model 1 predicts poorly, with 7.92% of predicted intensity differences greater than 3. At the 20th second all three models perform relatively well, with more than 90% of predicted intensity differences less than 1. Second, this article analyzes the proportion of predicted intensity differences less than 1 for the training samples from 1 to 20 s (Figure 10), which to some extent reflects the intermediate moments not shown in Figure 9. From Figure 10, the proportion of intensity differences less than 1 rises gradually, increasing faster in the first 10 s, with model 1 showing the most obvious rate of increase. The prediction performance of all three models thus improves significantly over time, reaching over 90% at 20 s and indicating good prediction results.

Overview of vibration table testing and construction of neural networks
Overall, only one intermediate layer is selected for the prediction network. Selection of input-layer nodes: in the vibration table experiment there are 13 acceleration measurement points, each containing 63,877 acceleration values. Once the number of samples reaches a certain level, the speed of the network is difficult to improve further [40] and the training error no longer decreases. The correlation coefficient method is therefore used to select the training set, with the correlation coefficient computed as

$$r_{xy} = \frac{\sum_{i=1}^{I}\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{I}\left(x_i - \bar{X}\right)^2 \sum_{i=1}^{I}\left(y_i - \bar{Y}\right)^2}},$$

where $r_{xy}$ is the correlation coefficient between x and y, I is a positive integer (the number of samples), $\bar{X}$ is the average value of x, and $\bar{Y}$ is the average value of y. The correlation coefficient represents both the direction and the degree of linear correlation, with values in [−1, 1]: a value of 1 means the two variables are perfectly positively correlated, −1 perfectly negatively correlated, and the closer its absolute value is to 1, the more obvious the linear relationship [40]. The empirical formula for the number of hidden-layer nodes is

$$l = \sqrt{m + n} + a,$$

where l is the number of hidden-layer nodes, n is the number of output-layer nodes, m is the number of input-layer nodes, and a is a constant in [1, 10]. After determining the value range of l according to Eq. (8), the trial-and-error method is used to determine the number of hidden-layer nodes: first set a larger number of hidden-layer nodes, then gradually reduce the number and train with the same sample to find the node count with the smallest network error. The neural-network prediction procedure is as follows: first, train the neural network on randomly selected, completely measured acceleration data, determine its topology by trial and error, and, starting from that topology, gradually reduce the network structure to obtain the best prediction structure based on the mean absolute error (MAE) and root-mean-square error (RMSE); second, select another group of fully measured acceleration data as a validation group to verify the applicability of the network structure; finally, use the network structure with the highest prediction accuracy to predict the unmeasured data in the experiment. The correlation coefficients between the acceleration data of point 5 in condition 2 and the data of the other measuring points are listed in Table 2.
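The correlation coefficient and the hidden-node range from Eq. (8) can be sketched as below; these are illustrative NumPy helpers with names of our choosing.

```python
import numpy as np

def corrcoef(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

def hidden_node_range(m, n):
    """Empirical range l = sqrt(m + n) + a with a in [1, 10]:
    returns the smallest and largest integer node counts to try."""
    base = np.sqrt(m + n)
    return int(np.ceil(base + 1)), int(np.floor(base + 10))
```

For m = 3 inputs and n = 1 output this gives the range [3, 12], matching the selection range stated later in the text for the three-input network.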
From Table 2, it can be seen that in condition 2 there is a significant correlation between measurement point 5 and measurement points 4 and 6 and the shaking-table acceleration, all with coefficients greater than 0.98. Measuring points 4 and 6 and the shaking-table acceleration were therefore selected as inputs for the neural network. The whole-process acceleration data (63,876 sampling points) were divided into a training set and a testing set in a 3:2 ratio, and the neural network was trained. From Eq. (8), when there are 3 input nodes the selection range for the number of hidden-layer nodes is [3, 12]. During training it was found that with seven hidden-layer nodes the target accuracy is reached with the fewest training iterations, so seven hidden-layer nodes were used. The RMSE and MAE are used to evaluate the prediction accuracy:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i - \hat{Y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2},$$

where $Y_i$ is the expected output and $\hat{Y}_i$ is the predicted value. The acceleration data were predicted, and the RMSE and MAE between the predicted and true values are compared in Table 3.
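The MAE and RMSE criteria can be sketched directly; this is an illustrative NumPy version of the standard definitions.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between expected and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root-mean-square error; penalizes large residuals more than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because RMSE squares the residuals, a few large prediction errors inflate it more than the MAE, which is why both are reported when comparing network structures.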
From Table 3, when the number of input-layer nodes is 3, the MAE between the acceleration values predicted by the proposed neural network and the true values is 36.1566, which is 63.82% of the MAE between the predicted and true values of the BP neural network; the RMSE is 158.8076, 67.78% of that of the BP neural network, whose error is significantly larger. When the number of input-layer nodes was reduced to 2 and to 1, the prediction accuracy with 1 input node was found to be lower than with 3, so only the case of 2 input nodes is analyzed further. Measurement points 4 and 6, which have the highest correlation coefficients with measurement point 5, were selected as the input set; the trial-and-error method gave 6 hidden-layer nodes, and the network was trained.

Topological construction of neural networks
From Tables 2 and 3, when the number of input nodes is 2, the prediction accuracy of the proposed neural network is better than with three inputs, and its predictions of the acceleration data are basically consistent with the actual values. Therefore, a three-layer BP neural network optimized by MAE is used to predict the vibration data, with two input nodes, six hidden-layer nodes, and one output-layer node.
The construction process of a neural network for predicting acceleration data is shown in Figure 11.
After the network training is completed, its applicability needs to be verified. Acceleration measurement point 9 under the same working condition (condition 2) was selected as the validation group, with the 40 data points exceeding ±80 m/s² taken as the data to predict, using the trained neural network topology described above. The correlation coefficients between measurement point 9 in condition 2 and the other measurement points are listed in Table 4.
From Table 4, it can be seen that in condition 2 the correlation coefficients between measurement point 9 and measurement points 8 and 10 are the highest, both greater than 0.99. In the training set, measurement points 8 and 10 are therefore selected as the inputs for the network topology obtained above. The residuals between the predicted and true values at measurement point 9 are shown in Figure 12. From Figure 12, of the 40 predicted data points at measurement point 9, the maximum residual between the predicted and true values is 1.70817 m/s², occurring at 17.226 s. The MAE between the predicted and true values is 0.438 and the RMSE is 3.765, within an acceptable error range. The trained network topology is therefore applicable to predicting acceleration data and can be used to predict the 29 data points beyond the range of the accelerometer in the experiment (Figure 13).

Predicted results of unmeasured data
In the acceleration response time history of point 7 in condition 2 there are 15 data values of 90 m/s² and 14 of −90 m/s². Using the previously trained and validated neural network topology, the training set is again chosen by the correlation coefficient method to predict the unmeasured data at this measurement point.
The correlation coefficients between the other measurement points in condition 2 and measurement point 7 are listed in Table 5. Measurement points 6 and 8, whose correlation coefficients are greater than 0.99, are selected as the input set.

Visualization denoising of synthetic seismic signals
To compare and analyze the denoising effect of the algorithms both qualitatively and quantitatively, the trained model is applied to the test set. Gaussian white noise with noise levels of 5, 8, and 10% is added to the simple model data, and the denoising effect of the proposed method and Unet is evaluated at each noise level. Figure 14 shows the denoising results at the 8% noise level: compared with Unet, the proposed method removes most of the noise, and the denoised profile is closer to the original clean signal.
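Adding percentage-level Gaussian white noise can be sketched as below. Defining the noise standard deviation relative to the signal's maximum absolute amplitude is our assumption, since the text does not define "noise level" precisely.

```python
import numpy as np

def add_noise(signal, level, rng=None):
    """Add white Gaussian noise whose standard deviation is `level`
    (e.g. 0.08 for 8%) times the signal's max absolute amplitude --
    one common reading of 'noise level'."""
    rng = np.random.default_rng(rng)
    sigma = level * np.max(np.abs(signal))
    return signal + rng.normal(0.0, sigma, size=signal.shape)
```

Other conventions (e.g. scaling by the signal's RMS, or specifying an SNR in dB) are equally common; the evaluation protocol only requires that the same convention be used for every method compared.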

Conclusion
Compared with a traditional CNN, the model proposed in this article has unique advantages. It can handle long-sequence dependence and supports parallel computation; it can focus on global information and has a broader receptive field, giving the inversion network a greater advantage in the seismic inversion of long sequences. Tests on model data and field data also indicate that this method achieves good results in seismic impedance inversion, providing a new method for intelligent seismic reservoir prediction. The method can effectively enhance weak reflections, improve the ability of seismic data to identify weak reflection interfaces, and provide a feasible solution for thin-layer division, with broad application prospects.

Figure 4: Time history curve of the artificial seismic wave. (a) Artificial seismic wave 1; (b) artificial seismic wave 2.

Figure 5: Acceleration curve of measurement point 7 under working condition 2.

Figure 6: Relationship between focal distance and magnitude of the selected seismic event stations in this article.

Figure 7: Relationship between focal distance and intensity of the selected seismic event stations in this article.

Figure 8: Change in the absolute value of the predicted average intensity difference of the three models from 1 to 20 s.

Figure 9: Distribution of the absolute value of the predicted intensity difference of the three models at the first and 20th seconds. (a) Model 1; (b) Model 2; (c) Model 3.

Figure 10: Change in the proportion of predicted intensity differences within 1 for the three models.

Figure 11: Neural network acceleration data prediction process.

Figure 12: Residual value between the predicted value and the real value of test point 9.

Figure 13: Acceleration time-history curve and partially enlarged views of measurement point 7 under condition 2 after completion. (a) Acceleration process curve; (b) locally enlarged image A; (c) locally enlarged image B.

Figure 14: Comparison of synthetic seismic signal denoising results. (a) Clean signal; (b) noisy signal; (c) Unet denoising result; (d) residual of the proposed method's enhancement.

Table 1: Pipeline structure test conditions.

Table 2: Correlation coefficients between test point 5 and the other test points under condition 2.

Table 3: Error comparison between the predicted and actual values at test point 5.

Table 4: Correlation coefficients between test point 9 and each test point under condition 2.

Table 5: Correlation coefficients between test point 7 and the other test points under condition 2.