On a Stochastic Regularization Technique for Ill-Conditioned Linear Systems

Abstract Knowledge about the input–output relations of a system can be very important in many practical engineering situations. Linear systems theory, rooted in applied mathematics, provides an efficient and simple technique for modeling input–output relations. Many identification problems are formulated as a set of linear equations in which only the outputs are known. These are inverse problems, since the system inputs are sought from its outputs alone. This work presents a regularization method, called the random matrix method, which reduces the error in the solution of ill-conditioned inverse problems by introducing modifications into the matrix operator that governs the problem. The main advantage of this approach is that the condition number of the matrix can be reduced using the probability density function that models the noise in the measurements, leading to better regularization performance. The method was applied to a force identification problem, and the results were compared quantitatively and qualitatively with the classical Tikhonov regularization method. The results show that the presented technique outperforms the Tikhonov method when dealing with highly ill-conditioned inverse problems.


Introduction
It is very common in applied mechanics to come across identification problems when estimating certain physical parameters of interest. Unfortunately, most of these problems lie in the area of inverse problems, either because of the lack of appropriate measurement instruments or because of the difficulty of accessing the measurement locations to obtain the desired quantities directly [1][2][3][4]. Inverse problems are characterized by determining unknown causes based on observations of their effects [5]. A particular type of inverse problem, found frequently in engineering, can be formulated in terms of a linear system [6] as x = A⁺y, where y is the output vector that represents the observed effect, A⁺ = (A*A)⁻¹A* is the pseudo-inverse matrix operator and x is the input vector related to the desired quantity. The matrix A is known as the matrix operator of the linear system y = Ax, and A* is the Hermitian transpose of A. In the absence of noise in the vectors x and y, the matrix inversion process is computationally stable if A has linearly independent columns [7]. However, if the data are corrupted with noise, the process may become unstable: the noise can be largely amplified in the solution, to the point that the solution becomes completely meaningless [8].
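The noise amplification described above can be illustrated with a minimal NumPy sketch (the matrix and noise level below are made-up toy values, not the paper's data): an almost-singular operator recovers x perfectly from exact data, yet a tiny perturbation of y produces a large error in the pseudo-inverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately ill-conditioned 2x2 operator (nearly dependent columns).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
y = A @ x_true

A_pinv = np.linalg.pinv(A)      # pseudo-inverse A+ = (A* A)^-1 A*
x_clean = A_pinv @ y            # stable recovery from exact data

y_noisy = y + 1e-4 * rng.standard_normal(2)   # very small additive noise
x_noisy = A_pinv @ y_noisy

print(np.linalg.cond(A))                  # ~4e4: ill-conditioned
print(np.linalg.norm(x_noisy - x_true))   # error far larger than the noise itself
```

Even though the perturbation of y is on the order of 1e-4, the error in x is amplified by roughly the condition number of A, which is the instability the paper sets out to regularize.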
When dealing with noisy inverse problems, regularization techniques are necessary to obtain better and more stable solutions. In ill-posed and/or ill-conditioned systems, regularization introduces additional information into the system in order to improve its solution. Many classical regularization methods, such as Tikhonov regularization and mollifier methods, are discussed in detail in books and papers in the literature [8,9]. In particular, the use of adapted matrices in several kinds of regularization problems has also been studied recently [10,11]. Moreover, other regularization techniques involving matrix learning and large covariance problems are among the objects of interest of several researchers [12][13][14].
Force identification problems are examples of such noisy and unstable inverse problems that call for regularization methods. These problems have gained considerable interest in industry, especially in the fields of civil engineering and structural mechanics, since they can be used to forecast the remaining lifetime of a given structure [15]. Force identification aims to estimate forces based on the measured structural response and a dynamic model of the structure's behavior, expressed through frequency response functions (FRF) in matrix form [16]. Several advances and methods related to the solution of force identification inverse problems can be found in the literature, such as the use of mixed penalty functions for regularization [15], the Kalman filter [17], the Taylor formula [18] and sparse representations [19].
In this context, this paper presents a novel strategy to regularize ill-conditioned systems of linear equations using random matrices based on Monte Carlo methods. Here we are interested in a regularization method that introduces modifications into the matrix operator A in order to stabilize the desired inversion process. Compared to other well-known and widely used regularization methods, the method proposed in this paper has the advantage of modifying and regularizing the ill-posed linear system using a case-specific model of the noise present in the signal. This is done through the use of probability density functions related to the noise present in the sampled signal. The mathematical construction of the Random MAtrix method (RMA) is shown step by step in the following sections. Then, a force identification inverse problem based on noise-corrupted data is used to validate the proposed method and the efficiency of the algorithm. Finally, the results are compared quantitatively and qualitatively with the classical Tikhonov regularization method, and it is shown that RMA provides better results than Tikhonov regularization in all the cases tested.

Ill-Posed and Ill-Conditioned Systems
A problem is said to be well-posed if the following requirements are fulfilled: (1) there exists a unique solution to the problem; (2) the solution is smooth, i.e., it depends continuously on the input data. Otherwise the problem is called ill-posed [8].
The first condition requires that the solution is not ambiguous. The second assures the continuous dependence of the solution on the data. In simple words, a well-posed problem is one whose formulation yields a consistent solution that describes the physical system continuously.
In practice, the measured data form a finite set, known to approximate the actual phenomenon only to a certain degree, and one wants to find the solution that best fits the problem. Depending on the quality of the measured data, the search for the best solution may run into instabilities and undesirable errors.
Assume that a matrix operator A is singular; this implies that at least one singular value of A is zero. On the other hand, when some singular values are very small, close to zero but not null, A is called ill-conditioned. This situation makes the solution very sensitive to small changes in the data. A way to measure this sensitivity is through the condition number of the matrix operator A [20], defined as

cond(A) = ‖A‖₂ ‖A⁺‖₂ = σmax(A) / σmin(A),

where σmax(A) and σmin(A) are the maximal and minimal singular values of A, respectively, and ‖·‖₂ is the norm defined in the square-summable sequence space ℓ² [5]. Note that σmin(A) must be nonzero, otherwise the condition number cannot be calculated; that is, A must be a full-rank matrix. The system is called well-conditioned if the condition number is close to its minimal value (unity), while a problem with a high condition number is said to be ill-conditioned. For the particular case of square matrix operators, ill-conditioned systems can be easily identified through the determinant of A, which is close to zero if the matrix operator is quasi-singular.
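The definition cond(A) = σmax(A)/σmin(A) can be checked directly from the SVD; the diagonal matrix below is an illustrative toy example, not drawn from the paper.

```python
import numpy as np

# Toy operator whose singular values we know by construction.
A = np.array([[2.0, 0.0],
              [0.0, 1e-6]])

sigma = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
cond = sigma[0] / sigma[-1]                  # sigma_max / sigma_min

# NumPy's built-in 2-norm condition number agrees with the ratio.
assert np.isclose(cond, np.linalg.cond(A))
print(cond)   # 2e6: a severely ill-conditioned system
```

A condition number of 2×10⁶ means relative errors in y can be amplified by up to six orders of magnitude in the solution.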

Regularized Linear Systems
Consider a system of linear equations written in matrix form as in Eq. (1). If noise is added to the response vector y, then the solution vector x must be compensated by an error x̂ such that

A(x + x̂) = y + ŷ = ỹ. (2)

The error amplification factor ‖x̂‖/‖ŷ‖ will be very large if the matrix operator A is ill-conditioned [22]. Writing x̃ = x + x̂, Eq. (2) becomes

Ax̃ = ỹ. (3)

Now, consider that a before-state error ε₀ is used to reduce the effects of the uncertainties in ỹ, i.e.,

A(x̃ + ε₀) = y, (4)

where x̃ = x + x̂.
On the other hand, if a regularized matrix operator Ã = A + Â is applied to the exact response vector y, then a compensation error εa over the input of the system will arise. It can be written as

(A + Â)(x + εa) = y. (5)

The term εa will be called the regularization error, and Â is the matrix responsible for stabilizing the inversion process. Since the exact response y is not available, Eq. (5) must be modified to consider the noisy response ỹ, i.e.,

(A + Â)(x̃ + εa) = ỹ. (6)

Now, if in Eq. (6) we use the before-state error to compensate the uncertainties in ỹ, we get

(A + Â)(x̃ + ε₀ + εa) = y. (7)

In this case we write the after-state error vector ε₁ = ε₀ + εa as the sum of the uncertainty and regularization effects. The error vectors ε₀ and ε₁ show how the solution differs from the original and regularized systems, respectively. The regularization result is considered satisfactory when the simple rule

‖ε₁‖ ≤ ‖ε₀‖ (8)

holds, and this rule can be applied for any x̃. However, there is no analytical expression available for that, so the errors should be numerically checked for each response of the system using

‖ε₀ + εa‖₂ ≤ ‖ε₀‖₂. (9)

The presented regularization strategy shows how to find a satisfactory regularized linear matrix operator Ã using a Monte Carlo simulation [23]. The regularized operator Ã applied to a noisy output ỹ generates a reasonable noisy input estimate x̃.

Random Matrix Method (RMA)
The proposed regularization aims to find a stochastic linear matrix operator Â that improves the condition number of the system when added to the original matrix A. The new system should then have a reduced error amplification factor, which improves the inverse problem solution.
The method consists in generating a finite set of matrices {Âₖ}, indexed by an integer k, such that their condition numbers are smaller than the condition number of the actual matrix operator A, i.e., each matrix of the set must satisfy the rule

cond(Ãₖ) < cond(A),

where Ãₖ = A + Âₖ. The set {Âₖ} can be determined by a Monte Carlo simulation based on the relation between the before- and after-state errors given by Eq. (8). For that, it is important to derive analytical expressions to estimate the error vectors before (ε₀) and after (ε₁) regularization.
From Eq. (4) we have that the before-state error vector is related to the exact and noisy response vectors as ỹ + Aε₀ = y, so Aε₀ = −(ỹ − y). Moreover, from Eq. (6), neglecting the second-order term Âεa, we have Aεa + Âx̃ = 0. Then, since x̃ = A⁺ỹ, the regularization error is given by εa = −A⁺ÂA⁺ỹ. Finally, the before- and after-state error vectors can be estimated by the following expressions:

ε₀ = −A⁺ŷ, (10)
εa = −A⁺ÂA⁺ỹ, (11)

where ŷ = ỹ − y, A⁺ is the pseudo-inverse matrix operator and Â is the desired matrix regularization.
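The error estimates of Eqs. (10)-(11) translate directly into code. The sketch below uses synthetic data (the matrix size, noise scale and modification scale are illustrative assumptions, not the paper's values) to evaluate ε₀, εa and the acceptance check of Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((5, 5))            # toy operator
y = rng.standard_normal(5)                 # exact response
yhat = 0.05 * rng.standard_normal(5)       # simulated uncertainty
ytilde = y + yhat                          # noisy response
Ahat = 0.01 * rng.standard_normal((5, 5))  # candidate matrix modification

A_pinv = np.linalg.pinv(A)
eps0 = -A_pinv @ yhat                      # before-state error, Eq. (10)
eps_a = -A_pinv @ Ahat @ A_pinv @ ytilde   # regularization error, Eq. (11)
eps1 = eps0 + eps_a                        # after-state error

# Accept the modification only if it does not enlarge the error norm, Eq. (9).
accept = np.linalg.norm(eps1) <= np.linalg.norm(eps0)
```

In the full method this check is repeated over many simulated uncertainty directions ŷᵢ, as described next.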
As shown in Eqs. (10)-(11), the uncertainties ŷ are needed to estimate the error vectors above. In order to faithfully represent the meaning of Eq. (8) and properly check the regularization efficiency using Eq. (9), the uncertainties ŷ should approximate the uncertainties of the measured data as closely as possible.
The problem of finding the uncertainty ŷ leads to a Monte Carlo simulation using a reasonable probability density function (PDF). Thus, a set of uncertainties {ŷᵢ} can be found to check the random matrix modifications along many sensitivity input directions, which represent the variations in the contaminated input data that can reduce noise amplification factors in the system inversion. At first, a finite set of before-state error vectors {ε₀ᵢ}, indexed by an integer i, is obtained from Eq. (10). According to the Central Limit Theorem, for large computations the simulated uncertainties may be modeled using the standard normal distribution [23]. Other probability density functions can be applied if specific information is known about the stochastic responses of interest. On the other hand, if no information about the noise is available, we suggest that the random matrices be generated preferably from a uniform or Gaussian distribution, so that an unbiased solution can be sought.
As seen above, RMA has two main regularization parameters: εmax and Wₖ. The first controls the amplitude of the numerical uncertainties, which must be modeled according to the expected experimental error. The second controls the amplitude of the matrix modifications, that is, the regularization level. The regularization error εa is acceptable only at small scales, which means that the regularization must be just sufficient to repair the ill-conditioned system.
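The RMA search can be sketched as a Monte Carlo loop over candidate modifications. This is a loose reading of the procedure, not the paper's implementation: the parameter values (eps_max, Wk, P, Q) are illustrative, and the acceptance of the "best overall result" over the sensitivity directions is interpreted here as minimizing the mean after-state error norm.

```python
import numpy as np

def rma_regularize(A, ytilde, eps_max=0.05, Wk=0.01, P=50, Q=100, seed=0):
    """Search for a random modification Ahat that lowers the condition
    number of A without worsening the estimated error vectors."""
    rng = np.random.default_rng(seed)
    A_pinv = np.linalg.pinv(A)
    n = A.shape[0]
    cond_A = np.linalg.cond(A)

    # P simulated uncertainty directions and their before-state errors (Eq. 10).
    yhats = eps_max * rng.standard_normal((P, n))
    eps0 = -(yhats @ A_pinv.T)               # row i holds eps0_i = -A+ yhat_i
    ref = np.mean(np.linalg.norm(eps0, axis=1))

    best, best_score = None, np.inf
    for _ in range(Q):
        Ahat = Wk * rng.standard_normal((n, n))     # candidate modification
        if np.linalg.cond(A + Ahat) >= cond_A:      # must improve conditioning
            continue
        eps_a = -(A_pinv @ Ahat @ A_pinv @ ytilde)  # regularization error (Eq. 11)
        score = np.mean(np.linalg.norm(eps0 + eps_a, axis=1))
        if score <= ref and score < best_score:     # error must not grow (Eq. 9)
            best, best_score = A + Ahat, score
    return best   # None if no candidate satisfied both rules

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])    # ill-conditioned toy operator
ytilde = np.array([2.0, 2.001])
A_reg = rma_regularize(A, ytilde)
```

When the search succeeds, the returned operator has a strictly smaller condition number than A; when no candidate passes both rules, the function returns None and the regularization should be avoided, as the paper prescribes.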

Tikhonov Regularization
The Tikhonov regularization method is one of the oldest techniques for the solution of ill-conditioned and ill-posed identification problems [20]. The technique was first developed by the Russian mathematician Andrey Nikolayevich Tikhonov, and it can be simply stated as the constrained least-squares problem [21]

x_{λ,C} = arg minₓ { ‖Ax − y‖₂² + λ² ‖Cx‖₂² }. (12)

For the discrete case, the Euclidean norm ‖Cx‖₂² controls the amplitude of the input x through the regularization parameter λ applied to the minimization problem. The Tikhonov solution x_{λ,C} is unique, and is formally given by

x_{λ,C} = A#y, (13)

where

A# = (A*A + λ²C*C)⁻¹A*. (14)

If the matrix C is chosen to be the identity matrix Iₙ, then we drop the subscript C in the Tikhonov solution of Eq. (13), and the norm of the solution itself is also minimized. The bottom line here is how to optimize the regularization parameter λ. Using the SVD technique [24,25], the matrix operator A ∈ Cⁿˣᵐ, with rank m, can be written as

A = Σᵢ₌₁ᵐ σᵢ uᵢ vᵢ*, (15)

where σᵢ is the i-th singular value, uᵢ the left singular vector and vᵢ the right singular vector. Then, for C = Iₙ and any y ∈ R(A), each term of the minimization problem stated by Eq. (12) can be written as

‖Ax − y‖₂² = Σᵢ (σᵢ vᵢ*x − uᵢ*y)², ‖x‖₂² = Σᵢ (vᵢ*x)². (16)

Combining Eq. (15) and Eq. (16) with Eq. (13) one obtains the solution

x_λ = Σᵢ₌₁ᵐ fᵢ (uᵢ*y / σᵢ) vᵢ, (17)

where fᵢ = σᵢ² / (σᵢ² + λ²) is the solution filter factor. For the case C = Iₙ, the regularization parameter acts on the smallest singular values, σᵢ² ≪ λ², which produces fᵢ ≈ σᵢ²/λ². The criterion to choose the regularization parameter λ can be developed by substituting Eq. (17) into Eq. (15) and Eq. (16). The result is

ε² = ‖Ax_λ − y‖₂² = Σᵢ ((1 − fᵢ) uᵢ*y)², (18)
η² = ‖x_λ‖₂² = Σᵢ (fᵢ uᵢ*y / σᵢ)². (19)

These equations provide a balance between two sources of error: the perturbation error, presented by Eq. (19), due to the ill-conditioning, and the regularization error, presented by Eq. (18), due to the system modification imposed by λ. Note that the term ε² is a strictly increasing function of λ, while the term η² is strictly decreasing, and thus an optimum value λ_opt of the regularization parameter exists.
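The equivalence between the normal-equations form of the Tikhonov solution (Eq. (13) with C = I) and its SVD filter-factor form can be verified numerically; the problem below is a randomly generated toy example, not the paper's FRF system.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((6, 4))   # overdetermined toy operator
y = rng.standard_normal(6)
lam = 0.5                          # regularization parameter lambda

# Direct form: x = (A* A + lambda^2 I)^-1 A* y
n = A.shape[1]
x_direct = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ y)

# SVD filter form: x = sum_i f_i (u_i* y / s_i) v_i,  f_i = s_i^2/(s_i^2 + lam^2)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
f = s**2 / (s**2 + lam**2)                 # filter factors
x_svd = Vt.T @ (f * (U.T @ y) / s)

assert np.allclose(x_direct, x_svd)
```

The filter factors make the damping explicit: components with σᵢ ≫ λ pass almost unchanged (fᵢ ≈ 1), while components with σᵢ ≪ λ are suppressed (fᵢ ≈ σᵢ²/λ²), which is exactly where the noise amplification lives.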
The so-called L-curve [2] is a plot of ‖Cx_{λ,C}‖₂ versus ‖Ax_{λ,C} − y‖₂, and it represents the balance between these two error quantities, which are the main issue of any regularization technique. The generic form of the L-curve is presented in Figure 1, on the left side. The L-curve clearly distinguishes two important regions for the regularization parameter λ. From the central corner to the left, the regularization parameter λ decreases and the perturbation error η² increases rapidly. From the central corner to the right, the regularization parameter λ increases, decreasing the perturbation error but increasing the regularization error ε². Thus, the optimal regularization parameter lies near the central corner of the curve.
An alternative choice of the regularization parameter λ is provided by the decreasing-singular-values curve, sketched in Figure 1, on the right side. This curve is a plot of the singular values σᵢ versus their indices. It is known that the singular value decomposition (SVD) can be used to find the best rank-r approximation Aᵣ to a matrix A ∈ Cⁿˣᵐ [5]. Therefore, when dealing with noise-corrupted matrices, it is possible to find a reasonable matrix Aᵣ that replaces the original noisy matrix A for the purpose of inverse identification problems.
In this context, the decreasing-singular-values curve can be used to establish limits for the smallest singular values in the Tikhonov solution. An optimized Tikhonov solution can be obtained by setting the regularization parameter λ to the smallest acceptable singular value, chosen so that the area W under the curve, from the greatest to the smallest acceptable singular value, corresponds to an empirical percentage of the total area. In the experimental identifications presented here, the Tikhonov solution was applied using the decreasing-singular-values curve, which returned better results than the L-curve for the choice of the regularization parameter λ. A percentage of 99% was used in the choice of the optimized regularization parameter λ_opt.
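One plausible reading of this area criterion, sketched below, approximates the area under the decreasing-singular-values curve by the cumulative sum of the singular values (the paper does not give the exact discretization, so this is an assumption): λ is set to the smallest singular value needed to reach the chosen percentage of the total.

```python
import numpy as np

def lambda_from_sv_area(A, percent=0.99):
    """Set lambda to the smallest singular value whose cumulative 'area'
    (approximated by a cumulative sum) reaches `percent` of the total."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    cum = np.cumsum(s) / np.sum(s)
    k = int(np.searchsorted(cum, percent))   # first index reaching `percent`
    return s[min(k, len(s) - 1)]

# Toy operator: the tiny trailing singular value contributes almost no area,
# so the 99% criterion sets lambda just above it.
A = np.diag([10.0, 5.0, 1.0, 1e-4])
lam = lambda_from_sv_area(A)
print(lam)   # 1.0
```

With this choice, the filter factors fᵢ = σᵢ²/(σᵢ² + λ²) strongly damp exactly those components whose singular values fall below the 99% area threshold.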

Experiments and Results
A force identification experiment was carried out for the system presented in Figure 2, consisting of a rectangular cantilever structure with one side clamped and the other free. Accelerometers, force transducers and electrodynamic shakers were used in the experiment. White noise signals of the same order of magnitude, but produced by two different noise generators, were supplied to the shakers.
Through the presented experimental setup it was possible to obtain a set of structural frequency response functions (FRF) A_ij(ω), which correlate the exciting forces (inputs) x_i(ω) with the measured accelerations y_j(ω) (outputs) [26,27]. The experiment was carried out over an inertial mass, in order to better control the reference force values measured by the force transducers that connect the electrodynamic shakers to the structure. The measured structural FRF was conveniently expressed in matrix form, A ∈ C⁵ˣ⁵, according to Eq. (1). The number of spectral lines for data acquisition was set to 4096, and the frequency sample rate was 6400 Hz. The two electrodynamic shakers excited the structure simultaneously at positions #3 and #5 (with positions equally distributed, starting from the clamped right side, as seen in Figure 2). The identification processes were executed in the range from zero to 4686 Hz (3000 spectral lines). The measured response data, structural frequency response functions A(ω) and accelerations y(ω), were then corrupted with 25%, 50% and 75% of additive and multiplicative white noise (see Figure 3 for the 75% case), in order to make the force identifications more difficult and to simulate a worst-case scenario.
Finally, the force identification systems corresponding to each of the three noise-level cases were inverted using the RMA, Tikhonov regularization and least squares. The solution to the non-regularized force identification was obtained by the least-squares solution, expressed by Eq. (13) in matrix form for λ = 0. This naive identification procedure is expected to show very discrepant results, highlighting the importance of regularization in the presence of noisy data.

The RMA parameters (Table 1) were set as follows:
- Q = 100 random matrices per direction;
- εmax: set approximately to the maximum order of magnitude of the noise in the response data (~25% of the order of the response signal data);
- Wₖ: the amplitude order was set to be constant for each level of added noise, with the parameter values drawn from a normal distribution.
The parameter Wₖ, described in Section 2.3 generally as a control matrix, was chosen based on the dispersion of the noise found in the FRF. It was estimated, for each level of contamination, from the noise amplitude observed in the contaminated FRF matrices. In this work, this parameter was set to be constant and chosen as Wₖ = 0.005 for 25% contamination, Wₖ = 0.01 for 50% contamination and Wₖ = 0.02 for 75% contamination of the measured data.
Figure 4 shows, on the top, the matrix condition number before and after regularization, obtained from both regularization methods. The results for the random matrix regularization method and for the Tikhonov solution are quite similar, with small differences in the interval close to 2000 Hz. Figure 4 also shows, on the bottom, the error amplification factor for A(ω), before and after regularization, for both methods. The random matrix method and the Tikhonov regularized solutions return equivalently good results over the whole frequency range. In this case we can say that the RMA regularization keeps both the condition number and the error amplification factor approximately at the same level as the Tikhonov regularization, which is desirable.

[Figure 3: FRF A[5,5](ω) and its noisy counterpart corrupted with 75% of additive and multiplicative white noise.]
The results of the force identifications are shown in Figure 5. First, it is immediately apparent that the forces identified at nodes #3 and #5 by the least-squares method differ significantly from the reference curves, as expected. This fact reinforces the basic necessity of a regularization technique in noisy matrix inversion processes, as stated before.
As far as the other identifications are concerned, Figure 5 also shows that the RMA is noticeably more satisfactory than the Tikhonov regularization method, since the points on the RMA curve are closer to their respective reference values. For a better visualization and understanding of the results, a quantitative analysis is presented below.
The forces identified at nodes #3 and #5 by the different methods were quantitatively compared with the reference force curve by means of the signal-to-error ratio (SER). The SER measures the ratio between the energy of the signal and the energy of the estimation error. It is calculated (in decibels) as

SER = 10 log₁₀( ‖F_ref‖² / ‖F_ref − F_id‖² ),

where F_ref is the reference force and F_id is the force identified by one of the three methods considered. The results presented in Table 2 are consistent with the qualitative results observed in Figure 5 for the signal contaminated with 75% of noise.

[Figure 5: identified forces at nodes #3 and #5. The RMA regularization results are closer to the reference than the curve obtained using Tikhonov regularization. Both plots refer to data corrupted with 75% of additive and multiplicative white noise.]

Note that an improvement in SER was obtained for the force identified at node #5 in relation to Tikhonov regularization. In addition, the other results in Table 2 show that, for the smaller contamination levels of 25% and 50%, the RMA still presents a higher SER than the Tikhonov regularization method. In these cases, RMA offers gains of 0.65 dB and 2.34 dB for 25% of noise contamination at nodes #3 and #5, respectively, and gains of 1.21 dB and 0.72 dB for 50% of noise contamination at nodes #3 and #5, respectively. Thus, considering all six identified forces, the minimum SER gain obtained by RMA in the experiment was 0.65 dB and the maximum was 2.34 dB when compared to Tikhonov regularization. This indicates that the quadratic error ‖F_ref − F_id‖² of the Tikhonov method is about 16% higher than the RMA error in the minimum case and about 70% higher in the maximum case. It is important to point out that low condition numbers or error amplification factors do not guarantee a good solution. These quantities might be used just to provide a reasonable modification of the system, in order to restrain the solution in the presence of noise and ill-conditioning, or to compare the identification conditions of different configurations. A good solution depends, first, on the quality of the data and, second, on a successfully regularized identification process, which must act mainly on the ill-conditioning, keeping the solution as close as possible to the one that would be obtained from hypothetically noiseless data.
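The SER computation is a one-liner; the sketch below assumes the usual energy-ratio definition in decibels, with made-up force samples rather than the experiment's data.

```python
import numpy as np

def ser_db(f_ref, f_id):
    """Signal-to-error ratio in dB: 10*log10(||F_ref||^2 / ||F_ref - F_id||^2)."""
    err = f_ref - f_id
    return 10.0 * np.log10(np.sum(np.abs(f_ref) ** 2) / np.sum(np.abs(err) ** 2))

f_ref = np.array([1.0, 2.0, 3.0])   # hypothetical reference force samples
f_id = np.array([1.1, 1.9, 3.2])    # hypothetical identified force samples
print(round(ser_db(f_ref, f_id), 2))   # 23.68
```

Note that a SER gain of g dB corresponds to a quadratic-error ratio of 10^(g/10); for example, a 0.65 dB gain means the competing error energy is about 16% higher, matching the figures quoted above.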
Turning now to the internal results of the random matrix regularizations: as shown in Table 1, 100 random matrices Âₖ were generated for each sensitivity direction ŷᵢ. This means that the random matrix method searches among 100 regularization candidates at each frequency line (matrix inversion) by checking Eq. (9). The matrix that produces the best overall result over the 50 sensitivity directions, and of course meets the rule described by Eq. (8), is accepted. If none of the 100 matrices meets Eqs. (8) and (9), then the regularization is not reasonable and must be avoided. It is important to note that RMA achieved 100% efficiency in the regularizations performed along the frequency axis, in all cases considered in this paper.
Due to its stochastic nature, the RMA method is highly sensitive to the choice of the regularization parameters P and Q (Table 1), which define the order of magnitude of the Monte Carlo simulations to be performed. The appropriate choice of the probability density function responsible for modeling the noise that corrupts the dataset is also of great importance. In the cases presented here, the normal distribution was selected precisely to fit the white noise inserted in the data, as well as the experimental noise, which, by the Central Limit Theorem, is usually approximated by a Gaussian (normal) distribution. It is worth mentioning that datasets corrupted with noise modeled by probability density functions other than the Gaussian distribution must be treated using the same computational procedure described in Section 2.3; however, the function N(n) must return the coefficients given by the probability density function related to the noise of the problem at hand.

Conclusion
The main purpose of the random matrix regularization method (RMA) is to reduce matrix condition numbers and amplification factors in order to provide a better solution in the presence of ill-conditioned and noisy linear systems. The RMA method works by adding information to the original noisy system, not by truncation as in other regularization techniques. This method uses the probability density function that models the noise present in the data to add information to the ill-posed system, achieving better accuracy in the system inversion compared to other analytical methods.
As seen in the presented results, the random matrix regularization method operates safely and satisfactorily on the uncertainties in the data, because the method discards modifications that, according to the rule described in Eq. (8), would increase the error due to the regularization. Moreover, in all the situations tested, the presented method produced results that, according to the signal-to-error ratio, are closer to the reference curve than those of the other methods, showing the efficiency of the proposed approach.
In this paper, a force identification problem was used to demonstrate the efficiency and feasibility of the random matrix method. However, the proposed approach can be applied to any kind of noisy inverse problem arising in different fields of study, such as medical imaging, signal processing, remote sensing and oceanography, among others.
Finally, it is important to note that the choice of the regularization parameters is in fact empirical, but it can be aided by any knowledge about the noise in the data. It is expected that a certain set of regularization parameters could optimize the force identification results. The problem of finding the best regularization parameters is currently under study.