Detecting surface defects of heritage buildings based on deep learning

Abstract: The present study examined the use of deep convolutional neural networks (DCNNs) for the classification, segmentation, and detection of surface defects in images of heritage buildings. A survey was conducted of building surface defects on Gulang Island (a UNESCO World Cultural Heritage Site), which were subsequently classified into six categories according to relevant standards. A Swin Transformer- and YOLOv5-based model was built for the automated detection of surface defects. Experimental results suggested that the proposed model was 99.2% accurate at classifying plant penetration and achieved a mean intersection-over-union (mIoU) of over 92% for moss, cracking, alkalization, staining, and deterioration, outperforming CNN-based semantic segmentation networks such as FCN, PSPNet, and DeepLabv3plus. The Swin Transformer-based approach for the segmentation of building surface defect images achieved the highest accuracy on every evaluation metric (an mIoU of 90.96% and an mAcc of 95.78%) when compared with mainstream DCNNs such as SegFormer, PSPNet, and DANet.


Introduction
Heritage buildings are valuable historic structures that play an increasingly important role in reflecting history, promoting cultural inheritance, and displaying values, and their conservation is attracting increasing attention from the international community. Nevertheless, the situation of heritage building conservation in China looks grim. Statistics from the National Cultural Heritage Administration of the People's Republic of China (PRC) reveal that, over the past 5 years, hundreds of projects related to the retrofit of heritage buildings have been submitted for approval nationwide. The Ministry of Finance of the PRC invested a total of RMB six billion in the conservation and restoration of heritage buildings during the "12th Five-Year Plan" period alone. To date, heritage buildings of the first six batches have all undergone a round of restoration. This demonstrates that the conservation and restoration of heritage buildings remain a recurring issue for which heavy investment is indispensable.
The building surface, the interface between the interior and exterior of a structure, retains the maximum possible degree of authenticity that each heritage conservation charter requires to be preserved. It not only highlights the history of the architecture itself but, more importantly, allows people to intuitively recognize the category and degree of a defect in a heritage masonry building and determine the corresponding restoration required. Thus, the detection of building surface defects is a priority of architectural heritage survey and defect diagnosis. Traditional detection methods are too lengthy and laborious to analyze building surface defects systematically and effectively. Moreover, some spaces (e.g., roofs) are difficult to access because of site conditions, and inspecting them can endanger the safety of surveyors [1].
Over the past few years, with the development of both digital image processing algorithms and traditional machine learning algorithms, computer vision (CV) analysis and deep learning (DL), among other technologies, have come into use in assessing the structural health of buildings. Relevant research has covered the following areas: detecting defects on the interior surfaces of buildings through CNNs [2]; monitoring structural health through DL [3][4][5]; recognizing and evaluating cracks and spalls by applying Gaussian regression and support vector machines [6][7][8]; and detecting masonry wall defects through logistic regression and point clouds [9,10]. A number of papers have also explored the classification of architectural heritage images [11][12][13][14][15][16][17]. However, very few of the aforementioned studies specifically examined defect image recognition on heritage building surfaces, and existing research methods could not meet the practical need to detect multiple defects on building surfaces. The main contributions of this study are summarized as follows:
1. This study classifies the defects on heritage building surfaces into six categories according to their features: plant penetration, moss, cracking, alkalization, staining, and deterioration.
2. This study proposes a model that recognizes images of surface defects in heritage buildings using the YOLOv5 and Swin Transformer DL models, taking into account the environmental differences that affect such defects.
3. Application tests are carried out to validate the feasibility and accuracy of the proposed model for detecting surface defects in heritage buildings.
The remainder of this paper is organized as follows: Section 2 reviews related studies on the research topic. Section 3 presents the materials and methods of this study. Section 4 focuses on the comparative tests and their results. Section 5 illustrates practical applications of the proposed approach, and Section 6 concludes with the results and discussion.

Related work
The use of DL in the context of heritage building conservation has emerged as an essential area of research and innovation. This section reviews conventional and state-of-the-art DL-based approaches for classifying, segmenting, and detecting surface defects in heritage buildings. The literature review is summarized in Table 1.

Conventional approaches
IR thermography has been widely used to monitor historical buildings for two decades. Hidden wall structures, moisture status, and finishing status have been investigated using IR thermography [18]. The same technique has also been applied to measure porous materials. IR thermography is particularly suitable for investigating conserved, repaired, and restored structures [19]. Heritage, as a latent structure of buildings, requires conservation and preservation. Several social and natural factors seriously threaten to deteriorate and damage the original fabric of these buildings. To ensure that heritage buildings are conserved and preserved, their visual inspection is of great importance. Conventional practices rely on manual inspection, which takes a lot of time and resources [20]. An innovative technique could replace manual inspection, using fewer human resources and working much faster than conventional techniques. For tangible heritage conservation, planning practices were recommended for several digital technology companies [21]. Lerones et al. [22] presented an innovative method for detecting moisture in heritage buildings using 3D laser scanner surveying data. Moisture can lead to structural deterioration and aesthetic damage in historic buildings, making its detection crucial. This non-intrusive method analyzes laser reflectivity levels offline, covering large areas quickly without interacting with materials. This approach provides conservation professionals with objective and comprehensive information on moisture damage, aiding decision-making. The effectiveness of this method is demonstrated through its application in the Cathedral of Ciudad Rodrigo, Spain.
Tavukçuoğlu [23] discussed the significance of non-destructive testing (NDT) techniques for in situ building inspections. The work highlights the value of quantitative infrared thermography (QIRT) and ultrasonic pulse velocity (UPV) measurements for assessing moisture, thermal, material, and structural issues in historical and contemporary structures. The joint use of QIRT and ultrasonic testing permits damage detection, assessment of materials, and thermal performance evaluation. The research highlights the need for a multi-disciplinary methodology to advance materials technology and conservation practices. Błaszczak-Bąk et al. [24] explored the use of LiDAR technology for wall defect detection in unlit environments. In this work, Terrestrial Laser Scanning (TLS) measurements are processed using the Optimum Dataset (OptD) method. This preserves more points of interest on imperfect surfaces (e.g., cracks) while removing redundant information in homogeneous areas. The improved OptD algorithm is effective in detecting and segmenting defects, aiding the estimation of repair costs. Wong [25] proposed an NDT method that includes infrared thermography, ground penetrating radar (GPR), microwave moisture tomography, and ultrasonic pulse echo tomography to evaluate the condition of historic buildings. The case studies presented in this work demonstrate the identification of hidden details, defects, deterioration, and moisture in these structures.
Radnić et al. [26] discussed the restoration and structural strengthening of historical masonry buildings using a case study of the Minceta fortress in Dubrovnik. This work emphasized the significance of non-destructive tests, including static and dynamic analysis, in evaluating structural safety and deterioration. Błaszczak-Bąk et al. [27] presented an approach for automatic wall defect detection in unlit environments using LiDAR and the modified OptD method. TLS measurements are processed to identify and segment defects, facilitating cost estimation for the repair and renovation of historic buildings.
Wu et al. [28] proposed RBGNet, a new system for rail surface defect detection. RBGNet uses a novel architecture that combines rail surface and edge information to accurately identify rail surface defects. This work employs a hybrid loss function for network training and integrates edge features with rail surface features to improve detection precision. The system is verified on complex unmanned aerial vehicle (UAV) rail datasets, demonstrating high detection rates in challenging environments. Wood and Mohammadi [29] studied the detection of surface damage and cracks in historic fresco walls using geometric features in point cloud data. This work delivers a non-destructive and color-independent approach for identifying damage based on geometric descriptors. The method has been investigated on a diverse dataset of historic buildings, showcasing its potential for damage detection in heritage structures.
Al-Sakkaf et al. [30] reviewed the use of GPR technology for defect detection in heritage buildings. GPR offers a non-invasive method for detecting internal features in structures, particularly stone masonry. This study identifies conventional methodologies and highlights the effectiveness of GPR in assessing the condition of heritage structures.

DL approaches
Deep convolutional neural networks (DCNNs) were used to detect damage caused by various pathologies, whose effects on cultural heritage can be severe [31]. The main advantage of the proposed approach was its ability to quantify the structural defects of buildings by using water infiltration, concrete carbonation, and efflorescence. To assess the condition of heritage buildings, Sharma et al. [32] determined the amount of dust deposited on buildings. The dust level indicated the degree of damage to a building, and a higher level of deposited dust generated a maintenance warning.
To estimate the missing components of historical places, a recent work [33] proposed using the Faster R-CNN model in the Forbidden City. The proposed model can detect missing components in 2D images and locate them. Although the technique laid a foundation for the intelligent inspection of heritage buildings, much research is still needed for the comprehensive detection of missing building parts. Extended research was presented by Zou et al. [34] to inspect the distinctive patterns on the surfaces of ancient architecture. A DL approach based on inpainting, segmentation, and classification was proposed. Segmentation aims to obtain a mask of the defective parts. Afterwards, an inpainting algorithm is applied to rebuild the damaged parts. Finally, Residual Neural Networks are applied to classify the rebuilt images. Overall, the proposed approach improved the classification accuracy of reconstructed images. However, inpainting and segmentation were not presented for inspection in advance.
Wang et al. [35] facilitated the identification of damage in historic masonry structures by introducing an automatic detection technique that uses the Faster R-CNN model based on ResNet101. This method detects efflorescence and spalling damage with high precision. The research also presented IP webcam and smartphone-based real-time damage detection systems, contributing to the safeguarding and management of historic buildings. Wenlong et al. [36] concentrated on bridge surface damage detection using DL and addressed the challenges related to dataset size and class imbalance. Their work presents an atrous spatial pyramid pooling (ASPP) module and a weight-balanced Intersection over Union (IoU) loss function to enhance accuracy in detecting delamination and rebar exposure on bridges.
Ye and Sun [37] reviewed machine vision-based methods for detecting surface defects in ceramic tableware. This work summarized imaging methods, defect types, and mathematical modeling approaches. The study identifies areas for improvement in feature extraction algorithms based on deep neural networks. Stephen et al. [38] presented a CNN model for classifying surface defects in tile surface images. The CNN learns discriminative feature representations and performs binary classification, distinguishing between cracked and non-cracked surfaces. This approach shows potential for automating visual inspections and achieving efficient classification of surface defects.
Teng et al. [39] introduced an improved YOLOv3 model for real-time bridge surface defect detection. This work addresses issues such as blurry edges and noise, achieving high detection accuracy. The method can optimize bridge inspection processes and enhance defect detection in various bridge types. Shao et al. [40] presented a two-stage method for detecting surface defects in concrete buildings using point clouds and a 3D neural network. This method divides buildings into 3D grids and employs PointNet++ for damage classification. It attains acceptable detection performance on aging concrete surfaces, yielding a non-destructive approach for damage assessment.
Bolourian [41] developed a point cloud-based DL method, SNEPointNet++, for the semantic segmentation of concrete bridge surface defects. This work uses a publicly available dataset for defect detection and achieves high recall and precision rates for different types of defects. Furthermore, it proposed an efficient path planning procedure for LiDAR-equipped UAVs during bridge inspections. Meklati et al. [42] introduced a crowd-sensing solution based on DL for automatically detecting common surface damage on heritage walls. This work uses a CNN integrated into a mobile application, enabling users to capture and diagnose wall damage instantly. The approach proves effective, providing rapid and objective damage assessments.
Chen et al. [43] employed a transfer learning-based approach for image classification to detect cracks in building facades. Transfer learning enhances accuracy even with limited data. The work highlights the potential of DL for efficient image classification in the context of building facade inspections. Bruno et al. [44] proposed a Mask R-CNN model for detecting decay morphologies on built heritage, especially historic buildings. This method uses CV and artificial intelligence to remotely assess the conservation status of heritage structures. The experimental results demonstrate its effectiveness in identifying specific types of alterations, providing valuable support for heritage conservation efforts.
Yang [45] provided an overview of surface defect detection methods based on CNNs. This work summarizes modern methods and their application scenarios in industrial defect detection. The focus is on using DL models for efficient and automated defect classification in various domains. Apart from the DL-based approaches proposed in the literature, Kwon et al. [46] introduced a genetic algorithm-based technique to predict the maintenance cost of aging buildings in urban areas. The applicability of the proposed technique was tested in a number of experiments. This technique provided a systematic method to forecast maintenance costs and supported management in making long-term maintenance decisions.
The functional deterioration of buildings is aggravated by climate change. Several works have sought the reasons behind this negative relationship between climate change and building deterioration. One such study was conducted to determine the impacts of climate change on buildings in Chile, where it was revealed that an increase in temperature could reduce the average annual precipitation [47]. The results of that study were surprising to researchers. However, the applicability of the proposed technique to other building issues remains to be examined in future works. For example, the prioritization of maintenance costs could be linked for buildings located in various parts of the world.

Data collection for classification
Building defects normally arise from the constant influence of external factors, which primarily comprise five types: mechanical properties, electromagnetic radiation, climatic conditions, oxidation, and biological agents [48][49][50][51][52]. As shown in Figure 1, this study has classified surface defects into six categories: plant penetration, moss, cracking, alkalization, staining, and deterioration. A total of 900 defect images were captured with a digital camera from the exterior walls of different heritage buildings, with each category accounting for 150 images. Of the captured images, 720 (120 from each category) were used for training the model, while the remaining 180 images were used for testing.
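The per-category split described above (150 images per category, 120 for training and 30 for testing) can be sketched as follows. This is a minimal illustration only: the file names, the random shuffling, and the seed are assumptions, since the paper does not state how the training images were chosen.

```python
import random

CATEGORIES = ["plant_penetration", "moss", "cracking",
              "alkalization", "staining", "deterioration"]


def split_per_category(images_by_category, n_train=120, seed=0):
    """Split each category into n_train training images and the rest for testing."""
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    train, test = [], []
    for category, paths in images_by_category.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        train += [(p, category) for p in shuffled[:n_train]]
        test += [(p, category) for p in shuffled[n_train:]]
    return train, test


# Hypothetical file names standing in for the 900 captured images.
images = {c: [f"{c}_{i:03d}.jpg" for i in range(150)] for c in CATEGORIES}
train_set, test_set = split_per_category(images)
print(len(train_set), len(test_set))  # 720 180
```

Splitting within each category keeps the six classes balanced in both subsets, which matches the 120/30 figures reported in the text.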

Image cropping and database creation
Due to their complex backgrounds, the sample images were randomly cropped and screened to highlight the information specific to a given defect category and to minimize the influence of irrelevant factors. We selected the images whose defect features were clearly visible and occupied over 15% of the entire cropped image.
Afterwards, the selected images were cut into sub-images of 512 × 512 pixels to generate the dataset. The training dataset contained 2,400 sub-images in total, which were labeled into six categories: plant penetration (400 sub-images), moss (400), cracking (400), alkalization (400), staining (400), and deterioration (400). Plant penetration images were classified and detected through a YOLOv5 model, whereas the remaining five categories (cracking, alkalization, staining, deterioration, and moss) adopted Swin Transformer for the segmentation and detection of their images. A transformer-based model relies entirely on self-attention to compute its input and output representations, without using convolutional or recurrent layers. Swin Transformer addresses the computational issue, with a cost that is linear in the image size [53]. In addition, the Swin Transformer model improves efficiency by operating on local regions and enhancing the receptive fields that show a high correlation with the visual signals.
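The cropping criteria above can be illustrated with a short sketch. It assumes, hypothetically, that a binary defect mask is available for each crop; the paper describes the screening as a manual step, so the mask-based filter, the non-overlapping tiling, and the dropping of partial border tiles are all illustrative assumptions.

```python
import numpy as np


def tile_image(image, tile=512):
    """Cut an H x W x C array into non-overlapping tile x tile sub-images.

    Partial tiles at the right and bottom borders are dropped (an assumption).
    """
    h, w = image.shape[:2]
    return [image[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, tile)
            for x in range(0, w - tile + 1, tile)]


def keep_crop(defect_mask, min_fraction=0.15):
    """Keep a crop only if defect pixels cover more than min_fraction of it."""
    fraction = np.count_nonzero(defect_mask) / defect_mask.size
    return fraction > min_fraction


# Synthetic example: a 1024 x 1536 image yields 2 x 3 = 6 sub-images.
image = np.zeros((1024, 1536, 3), dtype=np.uint8)
tiles = tile_image(image)

# Hypothetical mask in which the defect covers the top quarter of the crop.
mask = np.zeros((512, 512), dtype=bool)
mask[:128, :] = True
print(len(tiles), keep_crop(mask))  # 6 True
```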

Image pre-processing
Data pre-processing matters in various DL algorithms. In practice, data normalization and whitening are essential to many algorithms in order to yield optimal results. Since the ambient light conditions exerted a considerable influence on the sample images when they were collected, this study subtracted the mean from each image to minimize the effects on their overall luminance.
Given that this study adopted a model pre-trained on the ImageNet dataset for image segmentation, the mean value and standard deviation were calculated based on the ImageNet dataset and were set to [123.675, 116.28, 103.53] and [58.395, 57.12, 57.375], respectively. Mean subtraction is given in equation (1):

$$x_i^{+} = x_i - \bar{X} \quad (1)$$

where $\bar{X}$ denotes the mean value of sample $X$, $x_i$ represents one of the sample data ($x_i \in X$), and $x_i^{+}$ refers to the sample obtained after mean subtraction.
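A minimal sketch of this pre-processing step is given below, using the per-channel ImageNet statistics quoted in the text. Dividing by the standard deviation after mean subtraction is an assumption based on the fact that both values are listed; the function name and the RGB channel order are also illustrative.

```python
import numpy as np

# Per-channel ImageNet statistics quoted in the text (assumed RGB order, 0-255 scale).
IMAGENET_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
IMAGENET_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def normalize(image_rgb):
    """Subtract the channel mean, then divide by the channel standard deviation.

    image_rgb: H x W x 3 array with values in the 0-255 range.
    """
    return (image_rgb.astype(np.float32) - IMAGENET_MEAN) / IMAGENET_STD


# A pixel equal to the channel means maps to zero in every channel.
pixel = IMAGENET_MEAN.reshape(1, 1, 3)
print(normalize(pixel))
```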

Model training and comparative tests
This study utilized the YOLOv5 model for defect image classification and detection (plant penetration) while adopting Swin Transformer for defect image segmentation and detection (cracking, alkalization, staining, deterioration, and moss). The results produced by the model when applied to detecting some building surface defects are illustrated in Figures 2 and 3. YOLOv5 was used for surface defect classification and detection; it performs well in terms of both speed and accuracy. Its accuracy depends mainly on three parts: the backbone, encoder, and decoder [54]. The strength of this network lies in extracting backbone features and using those features for prediction.

YOLOv5- and Swin Transformer-based network models
YOLOv5 is a typical one-stage object detection algorithm that combines image classification with localization. It reframes object detection as a regression problem, simultaneously generating the class probabilities and bounding box coordinates. At its core is CNN-based feature extraction. It can identify the target class in an input image and output information about its position [55,56]. YOLOv5 is available in four models, namely, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, which differ from each other in the width and depth of their backbone networks. Ultralytics released this fifth version of YOLO in 2020, and it outperforms all the previous versions in speed and accuracy. Figure 4 shows the YOLO network architecture, which consists of different modules.
One of the major tasks in CV is semantic segmentation, which is related to image classification but aims to produce a pixel-level class prediction instead of an image-level one. Fully convolutional networks (FCNs), which perform semantic segmentation through CNNs, have inspired a variety of subsequent works since their appearance and have become a major design option for dense prediction tasks [57]. Segmentation Transformer uses Vision Transformer (ViT) as a backbone while incorporating several CNN decoders to increase feature resolution [58,59]. Despite its high performance, ViT also has limitations: (1) it can only produce single-scale, low-resolution representations instead of multi-scale ones and (2) it incurs high computational costs when applied to large images. To address these limitations, Swin Transformer (Figure 5) splits an image into patches, groups them into local windows, and computes self-attention efficiently within each window, while a UPerNet decoder produces the segmentation results. Therefore, it is suitable for dense prediction [60].
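The window-based computation can be made concrete with a small sketch. It partitions a grid of patch tokens into non-overlapping windows and compares the number of token pairs that self-attention must score globally versus within windows. The grid size and window size are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) token grid into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    H and W must be divisible by window_size.
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, c)


# Illustrative 8 x 8 grid of tokens with a window size of 4 (assumed values).
tokens = np.arange(8 * 8).reshape(8, 8, 1)
windows = window_partition(tokens, 4)

n_tokens = tokens.shape[0] * tokens.shape[1]
global_pairs = n_tokens ** 2                         # every token attends to every token
window_pairs = windows.shape[0] * (4 * 4) ** 2       # attention restricted to each window
print(windows.shape, global_pairs, window_pairs)     # (4, 4, 4, 1) 4096 1024
```

For a fixed window size, the windowed pair count grows linearly with the number of tokens, whereas the global count grows quadratically, which is the source of the efficiency gain described above.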
The architecture of Swin Transformer is illustrated in Figure 5. The input image size is defined as H × W × 3. First, the model splits an input RGB image into a number of non-overlapping patches. Each patch is treated as a token whose feature is the concatenation of its raw pixel RGB values. As seen in Figure 5, a linear embedding layer then projects each token to an arbitrary dimension, denoted by C. Finally, several Swin Transformer blocks with self-attention computation are applied to the patch tokens.
The image recognition model for surface defects of heritage buildings in this research was trained on the PyTorch DL framework. Adaptive moment estimation (Adam), specifically its variant, the AdamW optimization algorithm, was applied for model training, with the initial learning rate set to $10^{-5}$, betas = (0.9, 0.999), and weight_decay = 0.01. Weight_decay [62] is computed as given in equation (2).
where η expresses the learning rate, λ is the parameter performing the parameter scaling and weight decaying, and θ is the parameter that is being optimized.These are subtracted from parameters during the update step.
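As an illustration of how the weight decay term enters the update, the following NumPy sketch performs one AdamW step with the hyperparameters quoted above. It follows the decoupled form used by PyTorch's `torch.optim.AdamW`, in which the parameters are first shrunk by `lr * weight_decay * theta` and then updated with the Adam step; the `eps` value is the PyTorch default and is an assumption, as the paper does not report it.

```python
import numpy as np


def adamw_step(theta, grad, m, v, t, lr=1e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for parameters theta at step t (t starts at 1)."""
    beta1, beta2 = betas
    # Decoupled weight decay: subtract lr * weight_decay * theta from the parameters.
    theta = theta - lr * weight_decay * theta
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adam step applied to the decayed parameters.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# With a zero gradient, only the weight decay term changes the parameter.
theta0 = np.array([1.0])
zeros = np.zeros(1)
theta1, m1, v1 = adamw_step(theta0, zeros, zeros, zeros, t=1)
print(theta1)
```

In PyTorch, the equivalent optimizer would be created with `torch.optim.AdamW(model.parameters(), lr=1e-5, betas=(0.9, 0.999), weight_decay=0.01)`, where `model` stands for the network being trained.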
The model was trained on the defect sample images using Nvidia V100 32GB graphics cards, with a batch size of 4. Training for 400 epochs took around 15 h.

Comparative tests and results
The Swin Transformer-based method for the segmentation of building surface defect images achieved the highest accuracy on the evaluation metrics, with an mIoU of 90.96% and a mean accuracy (mAcc) of 95.78%, outperforming mainstream DL networks such as SegFormer, PSPNet, and DANet. The results of different segmentation algorithms on some defect images (alkalization and deterioration) are compared in Figure 6, whereas the comparative test results (mIoU/mAcc metrics) of each model are shown in Table 2.
Tables 2 and 3 compare the Swin-Base network with state-of-the-art techniques from the literature for detecting defects in buildings. The Swin-Base algorithm achieved higher mIoU results for all five defects, as shown in Table 2. Among the defects, the detection accuracy of the Swin-Base algorithm was lowest for cracking. Nevertheless, its accuracy for cracking and for the other defects remained higher than that of the other algorithms. As illustrated in Table 3, the Swin-Base algorithm also achieved higher mAcc results.
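For reference, the two metrics reported in this section can be computed from a confusion matrix as sketched below. This uses the usual definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes); the function names and the tiny example labels are illustrative and are not taken from the study's evaluation code.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    idx = target.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def miou_macc(cm):
    """Return (mIoU, mAcc) averaged over classes that occur in pred or target."""
    tp = np.diag(cm).astype(np.float64)
    gt = cm.sum(axis=1)          # ground-truth pixels per class
    pr = cm.sum(axis=0)          # predicted pixels per class
    union = gt + pr - tp
    valid = union > 0
    iou = tp[valid] / union[valid]
    acc = tp[gt > 0] / gt[gt > 0]
    return iou.mean(), acc.mean()


# Tiny two-class example: one background pixel is mislabeled as defect.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou, macc = miou_macc(confusion_matrix(pred, target, num_classes=2))
print(round(miou, 4), round(macc, 4))  # 0.5833 0.75
```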

Test on the remaining samples after the manual screening
The image datasets that were previously not used for model training (180 images in total, 30 from each category of defect) were sifted through and processed for the purpose of generating the test samples, which comprised a total of 550 sub-images: plant penetration (50), moss (100), cracking (100), alkalization (100), staining (100), and deterioration (100).Test results confirmed the good performance of the trained model, specific data of which are shown in Tables 4 and 5.

Recognition of the defects in the real test images
Since field-collected images were normally large, with a high pixel density and complex content, it was necessary to use sliding windows to scan those images in search of building surface defects. As shown in Figure 7, each sliding window was moved over the image with a certain stride to identify any defect within the window. To ensure thorough coverage of the defect zones, overlapping scan areas were created by setting the window stride to half of the window size. The field-collected images differed in their scale factor and thus required sliding windows of different sizes; the windowed crops were resized by bilinear interpolation to fit the input size required by the model (512 × 512). The specific recognition process is shown in Figure 8, while image recognition of some surface defects is demonstrated in Figure 9.
Application test results indicated that the trained model performed well in recognizing and localizing surface defects. Nevertheless, a number of detection errors also occurred, particularly in the following three cases: (1) the surroundings in the images were complex, which easily affected the recognition; (2) the defects in the same image were numerous, varied, and strongly resembled each other; and (3) great disparities existed between the training samples and the defect images to be identified.
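The scanning scheme described above, with the stride set to half of the window size, can be sketched as follows. The function name and the handling of image borders (adding a final window flush with the edge) are assumptions; the paper does not describe how borders were treated.

```python
def sliding_windows(height, width, window):
    """Return (top, left, size) tuples for windows with a stride of window // 2.

    A final window flush with the bottom or right border is added when the
    regular grid would leave part of the image uncovered (an assumption).
    Images smaller than the window yield a single window at the origin.
    """
    stride = window // 2
    tops = list(range(0, max(height - window, 0) + 1, stride))
    lefts = list(range(0, max(width - window, 0) + 1, stride))
    if tops[-1] + window < height:
        tops.append(height - window)
    if lefts[-1] + window < width:
        lefts.append(width - window)
    return [(top, left, window) for top in tops for left in lefts]


# A 1024 x 1536 image scanned with 512-pixel windows: 3 rows x 5 columns.
windows_512 = sliding_windows(1024, 1536, 512)
print(len(windows_512))  # 15

# Each crop image[top:top + size, left:left + size] would then be resized to
# 512 x 512 with bilinear interpolation before being passed to the model, for
# example with cv2.resize(crop, (512, 512), interpolation=cv2.INTER_LINEAR).
```

Using larger windows on images with a larger scale factor, followed by the resize step, keeps the apparent defect size close to that of the 512 × 512 training sub-images.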

Conclusion
This study examined the application of a YOLOv5- and Swin Transformer-based DL model in recognizing images of surface defects in heritage buildings. A comparative study was conducted against mainstream models such as SegFormer, PSPNet, and DANet. Surface defects were classified into six categories according to their features: plant penetration, moss, cracking, alkalization, staining, and deterioration. The proposed model for defect detection was tested in practice for its reliability. The test results suggested that the YOLOv5-based image classification method and the Swin Transformer-based segmentation method could contribute to the rapid identification of surface defects in heritage buildings owing to their high recognition accuracy.
Funding information: This research was supported by the National Natural Science Foundation of China (Grant number 52078154).

Figure 1: Samples of the dataset that were used to train our model showing diverse plant penetration images (a-c), cracking (d-f), staining (g-i), deterioration (j-l), alkalization (m-o), and moss (p-r).

Figure 2: Effects of the model in plant penetration detection. (a and c) Original images and (b and d) classification results.

Figure 4: Overview of YOLO network architecture.

Figure 5: Overview architecture of Swin Transformer. (a) Architecture and (b) two successive Swin Transformer blocks.

Figure 7: Segmentation results of the remaining samples of building surface defects (Sliding window).

Figure 8: Flow chart of image recognition through sliding windows.

Table 1: Summary of the literature review

Table 2: Comparative test results of different segmentation algorithms (mIoU metric), with Swin Transformer achieving the highest accuracy in detecting all five defects. The bold value represents Swin Transformer achieving the highest accuracy of pixel and mAcc in detecting all five defects.

To meet practical needs, it is necessary to evaluate the generalization ability of the previously trained model on out-of-sample images. Practical testing consisted of two parts in this study: (1) a validation test of the model on the remaining samples after the manual screening and (2) a test of the model in recognizing the defects in the real test images.

Table 3: Comparative test results of different segmentation algorithms (mAcc metric), with Swin Transformer achieving the highest accuracy in detecting all five defects. The bold value represents Swin Transformer achieving the highest accuracy of pixel and mAcc in detecting all five defects.

Table 4: Classification results of the remaining sample images of plant penetration

Table 5: Segmentation results of the remaining sample images of building surface defects