Innovative techniques for the improvement of industrial noise source identification by beamforming


An innovative technique based on beamforming is implemented, with the aim of detecting the distances of the noise sources from the observer and the relative positions among the sources themselves in multisource noise scenarios. By means of preliminary activities to assess the focal length of the optical camera, followed by stereoscopic measurements and image processing, the geometric information in the source-microphone direction is retrieved, a parameter generally missing in classic beamforming applications. A corollary of the method is the possibility of also obtaining the distance between the different noise sources that may be present in a multisource environment. A loss of precision is found when reflections from highly acoustically reflective ground interfere with the noise source.


Introduction
Industrial plants play a fundamental role in the economic life of a territory, generating wealth and employment; on the other hand, their environmental impact can be detrimental to the quality of life of the local population. Obligations become more stringent when the plants are located in urban or peri-urban areas, close to residential or noise-sensitive buildings such as schools or hospitals [1][2][3]. Limiting the discussion to environmental noise, large industrial plants are usually characterized by many sound sources whose emissions vary in space and time, with different frequency spectra and characteristics (e.g., the presence of impulsive or tonal components). The set of these sources contributes to the global noise level emitted by the plant and received by the surrounding buildings and workplaces, which must comply with national legal limits. Therefore, knowledge of the contribution of the individual sources becomes essential when corrective actions must be performed on the plant, for instance limiting the emissions of the most disturbing sources or verifying the absence of malfunctioning components with emissions higher than standard. Common sound level meters are not suitable for this purpose, and other techniques, such as sound intensity, are of limited use for large-scale plants.
In the last couple of decades, however, a new tool has become available to practitioners: the acoustic camera, which is essentially a planar array of microphones plus a video camera. Acoustic cameras have become cheaper over the years and can host an increasing number of microphones thanks to the development of low-cost MEMS transducers. Using acoustic cameras, graphical maps of sound pressure level or other acoustic descriptors can be obtained, allowing a rough localization of the different sound sources [4][5][6].
If no geometric information on the scenario under investigation is available, acoustic cameras do not allow the retrieved noise sources to be spatially positioned. Many efforts have been spent in this direction, using different techniques [7][8][9]; the method presented in this work proposes a simple approach, based on a limited number of measurements and a post-processing analysis, to assess the distance among noise sources and their distance from the microphone array.
The spatial placement of the most relevant sound emitters assumes particular relevance in the noise-reduction phase for industrial sources; an easy way to define their spatial location adds fundamental information to the standard two-dimensional picture obtained from the beamforming software.

Beamforming background
Commercially available acoustic cameras mainly use two techniques: acoustic holography and beamforming. Whereas acoustic holography uses sound field data measured over a planar surface close to the source (NAH stands for Near-field Acoustic Holography) to estimate a 3D mathematical model of the sound field, beamforming works at medium to long distances from the source and at higher frequencies than acoustic holography [10].
Since beamforming is the technique used in this study, some applications and a brief theoretical background are reported in the following.
Acoustic beamforming is a spatial filtering technique that permits isolating specific sound sources from a variety of other, irrelevant sources, based on the direction of arrival.
The method uses signal processing algorithms that spatially isolate sound coming from a precise direction chosen by the user, by creating a virtual superdirectional microphone that can be steered starting from a microphone array of known characteristics.
This technique has a wide range of applications [11]: it is the basis of speech recognition, and beamforming algorithms are used in audio-video conference rooms, where speaker and camera localization is difficult to implement manually. They are used in smartphones and telephones, especially in speakerphone mode, so that the caller does not hear their own voice echoed back when it is transmitted through the loudspeaker. These algorithms also have applications in seismology, where they are used to study wave reflections at the surface of the Earth.
Many other fields of interest involve beamforming [12]: it is used as part of active noise cancellation systems in cars, where the vehicle picks up voice instructions regardless of background noise and music from the infotainment system. Beamforming also finds application in fault detection systems, where engineers can locate issues in a component of an automobile or aircraft without completely opening it. It is used to monitor component and engine health before the overhaul process, thereby greatly reducing maintenance, repair, and overhaul times.
This technique makes use of flat or linear arrays, placed exclusively in the far-field region, composed of omnidirectional microphones, which do not attenuate the signal depending on the angle of incidence of the sound.
Beamforming allows obtaining, in a short time, an acoustic map of the area under examination, superimposed on a photographic image of the area itself. Thanks to this type of representation, it is possible to see the directions from which the noise originates and then proceed quickly and effectively to the identification of the sound sources present in the environment under study.
Two main beamforming algorithms are used: "delay and sum in the time domain" and "delay and sum in the frequency domain" [12].

Delay and sum beamforming in the time domain
The delay and sum beamforming in the time domain algorithm is the most frequently used, and the apparatus that allows its application is described as follows. A linear array of width D, composed of m omnidirectional microphones arranged at regular intervals d (measured between the central axes of adjacent microphones), is considered (Figure 1).
A block performing a temporal delay is connected to each microphone (the specific type of block is irrelevant, as it is sufficient that it performs a translation in time of the input signal); the outputs of all the blocks are then connected to a signal adder, which outputs the processed signal.
Considering an extended source of sound energy located at a distance L from the microphone array, and placing ourselves in the far-field region, the source can be schematized as if it were made up of an infinite number of elementary sources arranged in an orderly manner over its entire length. Each microphone therefore detects the superposition of the signals coming from each angle θ at which the points constituting the extended sound source are located. Initially, only a signal coming from a single fixed angle θ is considered; the treatment is then extended to every angle. We assume that the wavefronts seen by the array are flat and that the difference in attenuation of the same signal picked up by two different microphones is negligible. The key consequence of the far-field assumption is that the wavefronts emitted by a single source reach each microphone of the array at the same angle θ; therefore, depending on the position of the source with respect to the central axis of the array, the same signal is picked up by each microphone with a time delay dependent on the angle θ.
In fact, knowing the position of the source at a certain time t, the algorithm allows reconstructing the emitted signal, knowing the time delay and performing an appropriate amplitude correction. The signals of all the m microphones positioned in the acoustic field perturbed by the source are acquired simultaneously; each signal is then shifted in time back to the emission time, knowing the path presumably traveled by the sound wave, and finally all the m signals thus obtained are averaged.
The delay ∆ can be calculated trigonometrically, knowing the wave propagation speed (the speed of sound c) and the distance d between adjacent microphones, using the following formula:

∆n(θ) = n · d · sin(θ) / c     (1)

where ∆n(θ) is the delay of the signal at the n-th microphone with respect to the signal at microphone n = 0; note that the delay depends on the angle of incidence of the sound front.
Another variable, the unit delay between adjacent microphones, is introduced:

τ(θ) = d · sin(θ) / c, so that ∆n(θ) = n · τ(θ)     (2)

Now, considering a signal coming from a source at angle θ0 and position x, the same signal is picked up by the n-th microphone as:

sn(x, t) = s(x, t − ∆n(θ0))     (3)

The signal coming out of each transducer travels along its line to the corresponding delay block, set to the value ∆n(θ0) given by Equation 1. As each signal encounters its own delay block, it is temporally translated so that, when the signals exit the blocks, they are aligned at the same time instant:

sn(x, t + ∆n(θ0)) = s(x, t)     (4)

Each signal then goes through the summing block, which joins them; being all temporally aligned, a considerably strengthened signal is produced:

y(x, t) = (1/m) Σn=0..m−1 sn(x, t + ∆n(θ0)) = s(x, t)     (5)

Focusing now the attention on another source at an angle θ1 ≠ θ0, it produces on the microphones new signals that undergo the same process described above. The only difference is that, when they meet the delay blocks, the latter, being set in relation to the angle θ0, translate them in such a way that, when they reach the adder, they are no longer aligned:

sn(x, t + ∆n(θ0)) = s(x, t − ∆n(θ1) + ∆n(θ0))     (6)

Therefore, once the adder combines them, the generated signal possesses limited intensity:

y(x, t) = (1/m) Σn=0..m−1 s(x, t − ∆n(θ1) + ∆n(θ0))     (7)

Knowing the temporal translation produced by the delay blocks, the angle θ0 from which the amplified signal arrives is known. Considering now an infinity of sources, each placed at an angle θ between θmin and θmax, the n-th microphone directly picks up a signal sn which is already the sum of the signals coming from all the angles θ, with θmin < θ < θmax. In this case the received signal is:

sn(t) = ∫θmin..θmax s(θ, t − ∆n(θ)) dθ     (8)

and Equation 7 is generalized as follows:

y(x, t) = (1/m) Σn=0..m−1 ∫θmin..θmax s(θ, t − ∆n(θ) + ∆n(θ0)) dθ     (9)

Therefore, each microphone of the array picks up the signals coming from every direction but, at the output, the function y(x, t) is, to a good approximation, only the signal coming from θ0.
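As a numerical illustration of the delay-and-sum scheme just described, the following sketch (a simplified example, not the software of the acoustic camera used in this work) steers a uniform linear array by applying the delays ∆n(θ0) of Equation 1, approximated to the nearest sample, and averaging the channels:

```python
import numpy as np

def delay_and_sum_time(signals, fs, d, theta0, c=343.0):
    """Steer a uniform linear array towards theta0 (radians): advance
    channel n by Delta_n = n*d*sin(theta0)/c (Equation 1) to undo the
    propagation delay, then average the m channels.
    `signals` has shape (m, n_samples), sampled at fs."""
    m, n_samples = signals.shape
    out = np.zeros(n_samples)
    for n in range(m):
        delay = n * d * np.sin(theta0) / c   # Delta_n(theta0), seconds
        shift = int(round(delay * fs))       # nearest-sample approximation
        out += np.roll(signals[n], -shift)   # circular advance of channel n
    return out / m
```

Steering towards the true source angle re-aligns the channels and the output approaches the emitted signal; steering towards any other angle leaves the channels misaligned, and the summed amplitude drops accordingly.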

Delay and sum beamforming in the frequency domain
One of the improvements that can be implemented to increase the effectiveness of the algorithm is to work no longer in the time domain but in the frequency domain, which is why the method is called frequency-domain beamforming [13]. This is done because, in real cases, broadband signals have to be managed and a discrete window covering a finite bandwidth must be used.
The working principle is similar to the one previously seen, but in this case the time delay is applied to the signals in the frequency domain, and the signals acquired by the microphones are processed through the DFT (Discrete Fourier Transform) and IDFT (Inverse Discrete Fourier Transform) [14].
Considering that the DFT works only with discrete signals, the data received by each microphone must first be sampled in order to obtain a discrete signal:

sn[k] = sn(k · Ts), k = 0, …, N − 1     (10)

The m signals pass through m DFT blocks, and every block outputs N components at the discrete frequencies ω, i.e. a frequency spectrum:

Rn,x,ω = Σk=0..N−1 sn[k] e^(−iωk)     (11)

Then, every ω-th component coming from the n-th DFT block meets a specific block that adds a phase equal to ω∆n(θ0); in fact, in the frequency domain a time delay can be applied by means of a circular translation T:

Tn,x,ω = Rn,x,ω e^(iω∆n(θ0))     (12)

The m components with the same frequency ω are then summed by the adder in order to obtain:

Yx,ω = Σn=0..m−1 Rn,x,ω e^(iω∆n(θ0))     (13)

Finally, the IDFT is applied in order to pass from the frequency domain back to the time domain:

yx[k] = (1/N) Σω Yx,ω e^(iωk)     (14)

obtaining, after normalization by the number of microphones m, the desired signal y(x, t)     (15)
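The same steering operation can be sketched in the frequency domain (again a simplified illustration assuming a uniform linear array and block-periodic signals, not the actual AC software): the DFT of each channel is multiplied by the compensating phase of Equation 12 and the channels are summed before inverse transforming:

```python
import numpy as np

def delay_and_sum_freq(signals, fs, d, theta0, c=343.0):
    """Frequency-domain delay and sum: DFT each channel, apply the
    phase shift exp(+i*omega*Delta_n(theta0)) that compensates the
    propagation delay (Equation 12), sum over the channels and
    inverse-transform.  `signals` has shape (m, n_samples)."""
    m, n_samples = signals.shape
    R = np.fft.rfft(signals, axis=1)                       # R_{n,omega}
    omega = 2 * np.pi * np.fft.rfftfreq(n_samples, 1 / fs) # rad/s
    delays = np.arange(m) * d * np.sin(theta0) / c         # Delta_n(theta0)
    T = R * np.exp(1j * np.outer(delays, omega))           # phase compensation
    return np.fft.irfft(T.sum(axis=0), n=n_samples) / m
```

Unlike the nearest-sample time-domain version, the phase shift realizes exact fractional (circular) delays, so for a tonal source the steered output reproduces the emitted sinusoid almost exactly.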

Instruments
The instrumentation used to perform the beamforming measurements consists of a microphone array that allows retrieving all the parameters needed to implement the algorithms previously described. The first array concept was developed during World War II as a radar antenna, but the father of the microphone array can be considered John Billingsley [14,15], who proposed an acoustic telescope based on a microphone array to localize, in real time, the sound sources on full-size jet engines. Nowadays, thanks to technological progress, more powerful acquisition and computing systems allow higher sampling frequencies, longer acquisition times, larger numbers of microphones, and real-time analysis, all of which have strongly improved the instrument's performance.
Near-field Acoustic Holography (NAH) and beamforming are the two main techniques that use the microphone array setup. In principle, the microphone array for each technique should have specific features: NAH can locate sources with good resolution using a microphone spacing (grid) of less than half a wavelength, with measurements taken very close to the source; the beamforming technique, thanks to an irregular array layout, offers good resolution at longer distances from the sound source.
Arrays that make it possible to combine NAH and beamforming in a single device use a so-called pseudo-random microphone distribution, with a number of microphones typically ranging from 40 to 100. The hardware implemented to perform the measurements for this work [16] allows performing both NAH and beamforming measurements with a pseudo-random array (Figure 2). Its characteristics are reported in Table 1. The new generation of low-cost sensors (MEMS) allows a large number of microphones with low power consumption, a high signal-to-noise ratio (61 dBA), a sensitivity around −26 dBFS, and a flat frequency response from 60 Hz to 15 kHz. A webcam to retrieve the digital image is placed at the centre of the microphone array.

Distance calculation method
There are many approaches used to calculate the distance of a reference point relative to the position of a camera. These methods can be active, sending a signal (such as radio waves, microwaves, or infrared) towards the object, or passive, only receiving information about the target position. Among the passive methodologies, the most popular are those relying on stereoscopic measurements, based on the use of two cameras displaced from each other by a known distance. Given the basis of stereo imaging, as an alternative, two or more images taken from a moving camera can also be used to compute distance information [17][18][19] (Figure 3). In any case, a characteristic constructive parameter of the optical camera rigidly connected to the microphone array of the AC has to be known: the focal length f. It represents the distance between the lens and the image sensor when the subject is in focus, i.e. the optical distance from the point where the light rays converge to form a sharp image of an object to the digital sensor at the focal plane of the camera. The optical camera included in the AC used here has a fixed f.
If f is known, applying basic triangulation, the distance Z can be obtained through the following equation:

Z = f · b / d     (16)

where b is the baseline, i.e. the known displacement between the two camera positions, and d represents the disparity between the two images expressed in pixels, i.e., for the purposes of this work, the distance in pixels between the points of maximum sound pressure level where the noise source is localized. The disparity d can assume different values according to the resolution of the acquired acoustic data; in particular, d changes if it is calculated starting from raw or processed data, depending on the type of interpolation method used to improve the resolution. Consequently, different estimates of Z can be found.
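The triangulation formula Z = f · b / d is a one-liner; as an order-of-magnitude check, using the focal length calibrated later in this work (f = 2173.6 px) and the baseline of the experimental campaign (b = 0.23 m), a source at the true distance of 12.25 m corresponds to a disparity of roughly 40.8 px:

```python
def depth_from_disparity(f_px, b_m, d_px):
    """Stereo triangulation Z = f * b / d, with the focal length f and
    the disparity d in pixels and the baseline b in metres."""
    return f_px * b_m / d_px

# With the parameters of this work, a disparity of ~40.8 px maps back
# to the true source distance of 12.25 m.
Z = depth_from_disparity(2173.6, 0.23, 40.81)
```

Note that at this range the sensitivity dZ/dd = −Z/d is about 0.3 m per pixel, so a one-pixel disparity error already shifts the estimated depth by roughly 30 cm, which is why the interpolation of the acoustic map matters.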
The knowledge of Z also allows calculating the size of a single pixel: if the dimension of some reference object inside the image is known, the pixel size at distance Z, Ps(Z), is given by the following linear equation:

Ps(Z) = Ps(D) · Z / D     (17)

where D is a depth for which the size of a single pixel, Ps(D), is known. Consequently, knowing Ps(Z) and simply multiplying it by the number of pixels between two (or more) noise sources, it is possible to obtain an estimate of their real distance in the plane of the acoustic image.
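The linear scaling Ps(Z) = Ps(D) · Z / D and the subsequent multiplication by the pixel count can be sketched as follows (the numeric values in the usage note are illustrative, not taken from the measurements):

```python
def pixel_size(Z, D_ref, ps_ref):
    """Size of a single pixel at depth Z, scaled linearly from a
    reference depth D_ref where the pixel size ps_ref is known:
    Ps(Z) = Ps(D) * Z / D."""
    return ps_ref * Z / D_ref

def inter_source_distance(n_pixels, Z, D_ref, ps_ref):
    """Distance between two sources lying in the image plane at depth Z:
    pixel count between the two level maxima times the pixel size."""
    return n_pixels * pixel_size(Z, D_ref, ps_ref)
```

For instance, if a pixel is known to span 0.01 m at a reference depth of 5 m, then at 10 m it spans 0.02 m, and 300 pixels between the two level maxima correspond to 6 m.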

Experimental activities
A first step of the experimental activities was focused on the estimation of the focal length f of the optical camera included in the AC, since it was missing from the available datasheets and it was not possible to retrieve it from a web search. In order to determine f on the basis of other characteristic parameters of the camera, the procedure introduced by Zhang [20] was followed. This method requires the camera to observe a planar pattern shown at a certain distance with a few different orientations. A specific camera calibrator based on this method is available in MATLAB [21]. According to the workflow and the instructions given, we used an A3-format checkerboard pattern, with one side containing an even number of black and white squares and the other side containing an odd number of them, attached to a planar surface. A total of 15 images were captured at a distance of 1 m, thus ensuring that the checkerboard filled more than 20% of each image, placing the checkerboard at an angle of less than 45 degrees relative to the camera plane (Figure 4). The captured images (JPEG format) were added to the calibrator, and it was verified through a specific built-in analysis that they met the requested specifications before running the calibration procedure. The described methodology, together with the available data, allowed calculating an f equal to 2173.6 pixels. Once f was determined, we considered an experimental setup (Figure 7) in an acoustic open field, in order to verify the accuracy of the calculation of Z and of the inter-source distance X using both raw and processed acoustic data.
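As a rough sanity check on the calibrated value (independent of Zhang's full procedure, which also estimates distortion and principal point), the pinhole model relates f in pixels to the apparent size of a known object: f ≈ w_px · Z / W. The pixel width used below is purely illustrative, chosen to be consistent with the calibrated focal length:

```python
def focal_length_px(obj_width_px, distance_m, obj_width_m):
    """Pinhole-model estimate of the focal length in pixels:
    f = apparent width in pixels * distance / real width of the object."""
    return obj_width_px * distance_m / obj_width_m

# Illustrative (hypothetical pixel span): an A3 sheet, long side 0.42 m,
# imaged at 1 m and spanning ~913 px would imply a focal length close
# to the calibrated 2173.6 px.
f_est = focal_length_px(912.9, 1.0, 0.42)
```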
The experimental setup included two noise sources (S1 and S2) consisting of two speakers, positioned respectively on the left and on the right side of the camera optical axis, each at a distance of 3.00 m from it, resulting in a total horizontal distance X between S1 and S2 equal to 6.00 m. S1 was set at a height of 1.50 m, while the height of S2 was 0.70 m. S1 and S2 were driven with sinusoidal signals of different frequencies (f1 and f2), not integer multiples of each other, in order to avoid any summation effects that could result from the superposition of spurious harmonics. In particular, f1 was 1,733 Hz and f2 was 1,000 Hz. The optical camera of the AC was positioned at a height of 1.35 m, and the distance between the AC plane and the plane passing through S1 and S2 was 12.25 m. The ground of the experimental setup was asphalt, with an acoustic reflection coefficient around 0.95 at all frequencies [22].
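From the geometry just described, the true Euclidean distance between the source centres follows directly from Pythagoras: 6.00 m of horizontal separation combined with the 0.80 m height difference.

```python
import math

x_sep = 3.00 + 3.00  # horizontal separation between S1 and S2, metres
dz = 1.50 - 0.70     # height difference between the two sources, metres

# True Euclidean distance between the source centres, ~6.05 m,
# the reference value used when evaluating the X estimates.
X_true = math.hypot(x_sep, dz)
```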
In order to determine Z according to the method described in the previous section, two measurements (M1 and M2) were carried out by moving the AC within its plane, so that the baseline b was equal to 0.23 m (Figure 5), acquiring both optical and acoustic data.
The acoustic data, expressed as sound pressure levels in dB, were stored in TDMS (Technical Data Management Solution, by National Instruments) files consisting of 16 × 21 matrices; they were also post-processed so as to separate the data deriving respectively from S1 and S2. This separation was made using specific filters of the AC software, obtaining further matrices including the spectral components contained in a band of 100 Hz centered on f1 and on f2.
In the raw acoustic data there is a correspondence of 1 acoustic pixel to 48 × 48 optical pixels: as shown in Figure 6, the images produced by the AC software without data interpolation are grayscale images of 1024 × 780 pixels, where an "acoustic pixel" corresponds to a square of 48 × 48 optical pixels. Using these raw acoustic data, a first value of Z (Zraw) was calculated.
Subsequently, the resolution of the acoustic data was improved with the aim of assigning a single value of sound pressure to each pixel of the optical acquisition and, therefore, improving the accuracy of the Z estimation. Two standard bivariate interpolation methods were used, bilinear interpolation and bicubic interpolation [23,24], and thus two further values of Z (Zbilinear and Zbicubic) were calculated. These interpolation methods are also implemented in the AC post-processing software, and through their use it is possible to obtain images like the one in Figure 7.
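A minimal sketch of this resolution-improvement step is given below: bilinear upsampling of the coarse acoustic matrix by an integer factor (48 here, matching the acoustic-to-optical pixel correspondence). This is a generic implementation, not the one of the AC post-processing software, and the bicubic variant is not reproduced:

```python
import numpy as np

def upsample_bilinear(a, factor):
    """Bilinear upsampling of a coarse acoustic map `a` (e.g. the 16 x 21
    matrix of sound pressure levels) by an integer `factor`, assigning an
    interpolated level to each fine (optical) pixel."""
    h, w = a.shape
    ys = np.linspace(0, h - 1, h * factor)   # fine-grid row coordinates
    xs = np.linspace(0, w - 1, w * factor)   # fine-grid column coordinates
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]  # fractional weights
    return ((1 - wy) * (1 - wx) * a[np.ix_(y0, x0)]
            + (1 - wy) * wx * a[np.ix_(y0, x1)]
            + wy * (1 - wx) * a[np.ix_(y1, x0)]
            + wy * wx * a[np.ix_(y1, x1)])
```

On a map that varies linearly across the array, bilinear upsampling reproduces the underlying plane exactly; on real acoustic maps it smooths the 48-pixel blocks, which is what sharpens the localization of the level maxima and hence the disparity estimate.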
The three calculated values were then compared to the real, known distance between the AC and the sources (12.25 m), in order to evaluate the accuracy of the estimates obtained by the method using both raw and processed acoustic data. Subsequently, on the basis of the three Z values, three values of X were estimated: Xraw, Xbilinear, and Xbicubic. In carrying out the calculation, S1 and S2 were considered as two point sources with the sound-emitting elements located at the center of each speaker.

Results
The disparity d is the main output of the experimental measurements, and it can be calculated for both sources S1 and S2. Table 2 reports the values of disparity, in pixels, for the three levels of image refinement: raw, bilinear, and bicubic. Considering the calculated disparity d, the algorithms described above to retrieve the distance Z between the source plane and the measurement plane were applied to the three different matrices obtained from raw data and interpolated data.
The values of the distance Z are reported in Table 3. According to the values obtained for d, even though both sources lie in the same plane parallel to the AC plane, the results obtained for S1 and S2 are strongly different. In particular, looking in detail at the S2 results, it clearly emerges that the estimation of Z is scarcely accurate, as a consequence of the low directivity of S2, the high reflection coefficient of the asphalt, and the height of S2, which is half that of S1. Given these characteristics, the contribution of the reflections occurring on the ground was not negligible and influenced the post-processing of the data.
Considering the S1 results, instead, it is worth noting that the raw data give an incorrect estimation of Z, while the bilinear and bicubic interpolation techniques both give a fairly accurate estimation. With respect to the known value of the distance Z between the sources and the acoustic camera (12.25 m), the data from S1 allow calculating values with percentage differences on the order of 7-13%. In detail, starting from data interpolated with the bilinear technique, the difference is −13.2% (−1.62 m), while using data interpolated with the bicubic technique the difference is +7.3% (+0.90 m).
Regarding the estimation of the distance X between the acoustic sources S1 and S2, the analysis of the image provides the number of pixels between them, which is the datum necessary to calculate the Euclidean distance according to Equation 17. Using a reference object, in fact, it is possible to define the size of a pixel in the image at a certain distance from the AC and, consequently, to determine the Euclidean distance X by simply multiplying the pixel size by the obtained number of pixels. Table 4 reports the number of pixels, the size of each single pixel, and the estimates of the Euclidean distance between the sources. The best estimation of the distance between the sources is given by the bicubic interpolation technique. The percentage differences calculated with respect to the measured value of 6.05 m (the real Euclidean distance) are −14.6% for the bilinear interpolation and +4.2% for the bicubic interpolation technique, while the estimation obtained from raw data represents the worst case (−16%).

Conclusions
The innovative technique proposed gives potentially interesting results in defining both the distance between the noise sources and the observer and the distances among the noise sources themselves, within the beamforming methodology for noise source identification. The results show that the third dimension in the beamforming analysis can be retrieved, with a certain accuracy, by moving the microphone array between two positions while the sound emission remains constant.
The method also allows obtaining the real relative distances among different noise sources present in a multisource environment.
A loss of precision is highlighted when the source is positioned close to the ground; a possible reason may lie in the acoustic reflection of the ground itself, an effect that could be investigated in more detail in future analyses. If this effect were confirmed, the data elaboration should take into account the acoustic properties of the ground (or of possible vertical obstacles).
The approach could be improved by means of more refined mathematical interpolation algorithms, with the aim of increasing the accuracy of the post-processing analysis of the data derived from the beamforming technique.

Funding information:
The authors state no funding involved.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest:
The authors state no conflict of interest.