Multivariate analysis for the classification of copper–lead and copper–zinc glasses

Abstract The similarity patterns in the physicochemical properties of copper–lead and copper–zinc borate glasses were identified using multivariate statistical analysis. Cluster analysis, principal components analysis, and two-way clustering were applied as exploratory methods to a set of copper–lead and copper–zinc borate glasses. Specific correlations among the physicochemical properties of the copper glasses were interpreted. In particular, the effect of the Pb and Zn doping metal ions on the structural and mechanical properties of the copper glasses is identified. Interestingly, the lead content separates the Pb-doped glasses into two kinds with specific physicochemical properties.


Introduction
In the design of glass compositions, transition metal oxides that form a constituent part of the glass system can play a dual role related to their physicochemical and structural properties (coordination number, different redox states at specific conditions, etc.) [1][2][3][4][5][6][7][8].
Oxide glasses exhibit interesting properties such as density, elastic moduli, hardness, glass transition temperature, and liquid fragility, and changes in chemical composition cause significant alterations in all measured physical properties. It was also found that a change of the copper content in copper-lead borate glasses results in a slow disproportionation of B-containing groups due to the change in the coordination number of Cu+ ions compared with Cu2+ ions. This causes changes in physicochemical properties such as hardness, thermal stability, density, thermal expansion coefficient, and chemical durability. Zinc metaborate glasses are subject to an enhanced disproportionation of metaborate into other structural units, whereas the lead metaborate series exhibits different structural alterations [9][10][11][12][13][14][15][16].
Specific changes in the physicochemical parameters of the borate glasses due to changes in their chemical composition can also be examined with another approach to the experimental data, namely, intelligent data analysis (exploratory data mining) [17,18]. This option is rarely used in the analysis of the properties of noncrystalline materials. Multivariate statistical methods such as cluster analysis (CA) or principal components analysis (PCA) offer a new opportunity for data modeling, classification, and interpretation. With respect to information content, they are superior to traditional correlation analysis when relationships between physicochemical properties and chemical composition are sought.
The objective of the current work is to validate the efficiency of exploratory data analysis for studying and interpreting a set of experimental data taken from the literature, which includes values of different physicochemical parameters for several copper-zinc and copper-lead metaborate glass series. In addition, relationships between chemical composition and physicochemical parameters are evaluated by regression analysis.

Multivariate statistical methods
In the present study, two multivariate statistical methods were used: CA and PCA.
CA is a well-known approach for revealing groups of similarity (called clusters) within large data sets. The identification of the similarity groups makes it possible to better interpret specific relations between the objects of the study (glass types) or among the parameters characterizing them (physicochemical variables).
Three different modes of clustering were applied: hierarchical, nonhierarchical, and two-way CA.
Hierarchical CA is an unsupervised method for linkage of objects in clusters that follows a simple algorithm:
• normalization of the raw experimental data (to avoid differences in the scales and units of the variables, which could distort the clustering; usually the z-transform procedure is applied);
• introduction of a similarity measure (Euclidean distances or squared Euclidean distances are often calculated as a measure of object similarity);
• choice of linkage procedure (among many options, we have chosen Ward's method of linkage);
• selection of a criterion for determination of the cluster significance, such as Sneath's criterion based on the maximum distance Dmax in the similarity matrix.
The output plot of the analysis is called a dendrogram, which allows the visual determination of significant clusters for later interpretation. No preliminary conditions are introduced for hierarchical clustering [19].
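The steps listed above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function names, the naive Ward implementation, the cut-off fraction of 1/3 of Dmax, and the small synthetic data matrix are all assumptions made for illustration only.

```python
import numpy as np

def z_transform(X):
    """Autoscale each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def ward_linkage(Z):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the merges in order as (members_a, members_b, distance), where
    distance is the increase in the within-cluster sum of squared Euclidean
    distances caused by the merge.
    """
    clusters = {i: [i] for i in range(len(Z))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                ia, ib = clusters[a], clusters[b]
                diff = Z[ia].mean(axis=0) - Z[ib].mean(axis=0)
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merges.append((clusters[a], clusters[b], float(cost)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

def cut_by_dmax(merges, n_objects, fraction=1 / 3):
    """Apply only merges below fraction * Dmax and return cluster labels.

    The fraction is an assumed value, not one stated in the text.
    """
    d_max = max(d for _, _, d in merges)
    labels = list(range(n_objects))
    for members_a, members_b, d in merges:
        if d <= fraction * d_max:
            for i in members_b:
                labels[i] = labels[members_a[0]]
    return labels

# Synthetic placeholder data (6 objects x 2 variables), NOT the glass data set.
X = np.array([[1.0, 10.0], [1.2, 10.5], [0.9, 9.8],
              [5.0, 2.0], [5.3, 2.2], [4.8, 1.9]])
Z = z_transform(X)
merges = ward_linkage(Z)
labels = cut_by_dmax(merges, len(Z))
print(labels)
```

Passing the transposed matrix, ward_linkage(Z.T), would cluster the variables instead of the objects. For larger data sets, an optimized library routine would be preferable to this quadratic search.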
Nonhierarchical clustering belongs to the supervised pattern recognition methods in the sense that it requires a preliminary determination of the number of clusters to which the objects (or variables) should belong. It therefore demands an a priori hypothesis for the number of clusters, based on expert opinion or preliminary information. K-means clustering was conducted, which can be considered both a supervised and an unsupervised clustering approach. It aims to separate the objects into a previously chosen number of clusters so that each object belongs to the cluster whose prototype (the cluster centroid) is nearest. This procedure is iterative.
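The iterative procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name k_means, the random initialization with a fixed seed, the convergence test, and the synthetic data are all hypothetical.

```python
import numpy as np

def k_means(Z, k, n_iter=100, seed=0):
    """Basic K-means: alternate assignment and centroid update steps."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct objects chosen at random as initial prototypes.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every object to every centroid.
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic placeholder data (6 objects x 2 variables), NOT the glass data set.
X = np.array([[1.0, 10.0], [1.2, 10.5], [0.9, 9.8],
              [5.0, 2.0], [5.3, 2.2], [4.8, 1.9]])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-transform
labels, centroids = k_means(Z, k=2)
print(labels)
```

The number of clusters k is the a priori hypothesis mentioned above. The result can depend on the initial prototypes, so in practice several starts are usually compared.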
Two-way clustering is also used in the present study. It is useful in cases where both objects and variables are expected to contribute simultaneously to the discovery of meaningful patterns. The resulting structure (clusters) is by nature not homogeneous. However, some researchers recognize this method as a powerful tool for exploratory data analysis because, like other multivariate statistical approaches such as correspondence analysis, it makes it possible to reveal relationships between the clusters of objects and the clusters of variables.
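One way to read a two-way clustering result is to summarize the standardized data matrix block by block. The sketch below covers only this summary step and assumes that cluster labels for the objects and for the variables have already been obtained by some clustering method. The function name block_means, the labels, and the small matrix are hypothetical.

```python
import numpy as np

def block_means(Z, row_labels, col_labels):
    """Mean standardized value in each (object cluster, variable cluster) block."""
    Z = np.asarray(Z, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    rows = np.unique(row_labels)
    cols = np.unique(col_labels)
    table = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            block = Z[np.ix_(row_labels == r, col_labels == c)]
            table[i, j] = block.mean()
    return rows, cols, table

# Synthetic standardized matrix (4 objects x 3 variables), NOT the glass data.
Z = np.array([[ 1.0,  1.0, -1.0],
              [ 1.0,  1.0, -1.0],
              [-1.0, -1.0,  1.0],
              [-1.0, -1.0,  1.0]])
row_labels = [0, 0, 1, 1]   # assumed object clusters
col_labels = [0, 0, 1]      # assumed variable clusters

# Reorder rows and columns so that members of the same cluster are adjacent.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
Z_sorted = Z[np.ix_(row_order, col_order)]

rows, cols, table = block_means(Z, row_labels, col_labels)
print(table)
```

A block with a high mean indicates that the variables of that variable cluster take high values for the objects of that object cluster, which is how discriminant parameters for each group of objects can be read off.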
PCA (very similar to factor analysis) is a linear dimensionality reduction strategy that introduces new directions in the original data space, making it possible to project the data onto a lower-dimensional space formed by new linearly uncorrelated variables called principal components or latent factors [20]. The main application of this analysis is to reduce the number of variables and to detect structure in the data set, which allows objects or variables to be classified. Therefore, PCA is applied as a data reduction method [19,20].
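The projection described above can be sketched with a singular value decomposition of the z-transformed data. This is an illustrative sketch, not the procedure of the cited software: the function name pca and the random placeholder matrix (given the same shape as the data set, 16 objects by 11 variables, but with synthetic values) are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """PCA of autoscaled data via singular value decomposition."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-transform
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # object coordinates
    loadings = Vt[:n_components].T                     # variable directions
    explained = s ** 2 / np.sum(s ** 2)                # variance fractions
    return scores, loadings, explained[:n_components]

# Random placeholder matrix, NOT the experimental glass data.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 11))
scores, loadings, explained = pca(X, n_components=2)
print(scores.shape, loadings.shape, explained)
```

Plotting the two columns of scores gives a projection of the objects, and plotting the rows of loadings gives a projection of the variables on the same pair of components.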
Ethical approval: The conducted research is not related to either human or animal use.

Hierarchical CA
The input data set consists of 16 borate glass samples (eight of them with Pb doping and eight with Zn doping) described by 11 parameters (physicochemical variables) [21].
The goal is to reveal patterns of similarity among the different glass samples or among their physicochemical descriptors.
In Figure 1, the hierarchical dendrogram for linkage of 11 variables is presented (z-transformed raw data; squared Euclidean distances as similarity measures; Ward's method of linkage, and Sneath's test of cluster significance).
Three clusters could be interpreted. C1: atomic packing density (Cg) and CuO (mol%). The atomic packing density plays an essential role in glass properties. It is defined as the ratio between the minimum theoretical volume occupied by the constituent ions and the effective volume of the glass. The correlation of Cg with the copper content is readily observed on the dendrogram. One could conclude that this tendency is general for the Cu-containing glass network. For CuO levels above 20 mol%, Cg tends to level off, and no further increase was detected (Supporting Information).
For the glass system with lead, the packing density decreases above 20 mol% PbO. This observation is a consequence of the structural properties and of the increasing significance of the packed structure pattern for the packing density.
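The definition of the atomic packing density given above is commonly written as follows. The symbols are assumptions, since the text does not define them: ρ is the glass density, f_i and M_i are the molar fraction and molar mass of the i-th constituent oxide A_xB_y, r_A and r_B are the ionic radii, and N_A is the Avogadro constant.

```latex
C_g = \rho \, \frac{\sum_i f_i V_i}{\sum_i f_i M_i},
\qquad
V_i = \frac{4}{3} \pi N_A \left( x \, r_A^{3} + y \, r_B^{3} \right)
```

Here V_i is the minimum theoretical volume occupied by the ions of one mole of the i-th oxide, so the numerator corresponds to the ionic volume and the ratio of the summed molar mass to ρ corresponds to the effective molar volume of the glass.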
In the second cluster, C2, the similarity between the glass transition temperature Tg and the elastic modulus K is depicted on the dendrogram (Figure 1). K is directly related to the external force required to compress or extend interatomic distances in opposition to the internal forces that seek to establish the equilibrium interatomic distance.
The elastic properties of glasses define many of their mechanical properties. Besides, the change of the bulk (K), shear (G), and Young's (E) moduli with temperature or pressure offers information on variations in the atomic glass structure. This cluster reveals a strong correlation among these physical properties. The values of the constants K, G, and E, as well as of Poisson's ratio, can be measured experimentally by various methods for solids, including mechanical deformation or sound wave propagation techniques. For liquids, the elastic constants can be obtained from velocity measurements of high-frequency sound waves. These explanations are directly supported by the results of the hierarchical CA.
In the last cluster, C3, the similarity pattern is formed between density and coefficient of thermal expansion (CTE). This linkage could be used to model new Cu glasses with particular thermal characteristics. C3 is a very homogeneous cluster, whereas C1 and C2 could be considered as one bigger cluster divided into two subclusters.
Figure 2 represents the hierarchical clustering of the glass samples (objects of the study). There is a well-expressed separation between the Pb- and Zn-glass types. It could be stated that the samples with higher concentrations of Pb (PCB 20 and PCB 30) form an intermediate cluster between both classes of samples and are closer to the Zn type of glass than to the Pb type. This should be explained by specific relationships among the physicochemical parameters.

PCA
The z-transformed data set was also subjected to PCA. It was found that three principal components explain over 95% of the total variance of the system. Figure 3 shows the projection of the variables on the plane of factors 1 and 2, and Figure 4 shows the corresponding projection of the objects.
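The number of principal components needed to reach a given share of the total variance can be obtained from the eigenvalues of the correlation matrix, which is equivalent to PCA on z-transformed data. The following sketch is illustrative only: the function name components_needed and the small constructed matrix (two independent patterns, each repeated in two columns) are assumptions and are unrelated to the glass data.

```python
import numpy as np

def components_needed(X, threshold=0.95):
    """Smallest number of PCs whose cumulative variance share reaches threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]     # eigenvalues, descending
    eigvals = np.clip(eigvals, 0.0, None)        # remove tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(cumulative, threshold) + 1)
    return n, cumulative

# Constructed placeholder data: columns 1-2 follow pattern a, columns 3-4 follow b.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
X = np.column_stack([a, 2 * a, b, -b])

n, cumulative = components_needed(X)
print(n, cumulative)
```

Because the four columns carry only two independent patterns, two components account for all of the variance in this constructed example.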
In general, most of the groupings observed by hierarchical CA are confirmed by PCA. It is readily seen that the CTE and density variables form a well-expressed group, whereas all other variables belong to another class.
The same conclusion is valid for the projection of the objects, that is, a well-expressed separation of the Pb and Zn cases with an intermediate position of PCB 20 and PCB 30.
The relationship between glass samples and physicochemical characteristics is demonstrated by a two-way clustering approach. It reveals how close the clusters of cases (glass systems) are connected with the clusters of variables (physicochemical parameters). Thus, discriminant parameters for each glass cluster could be extracted.
The main basis for the separation of the glass samples into two classes is the variation in the physicochemical parameters. Pb-glasses are characterized by the highest values of density and CTE, whereas Zn-glasses show the lowest levels of these two characteristics. Again, PCB 20 and PCB 30 differ from the Pb-glass pattern, having low values of CTE and density.
All other physicochemical parameters show higher levels for the Zn-glass class and lower ones for the Pb-glass class.

Conclusions
The present study has used intelligent data analysis to elucidate the relationships among copper-lead and copper-zinc borate glasses with different copper contents. Several correlations among the glasses and their physicochemical properties were obtained with multivariate analysis techniques such as CA, PCA, and two-way clustering. This allows the effects of the Pb and Zn doping metal ions on the structural and mechanical properties of copper glasses to be identified. One cluster is identified for Zn-doped glasses, whereas two clusters with differentiated properties are identified for copper glasses doped with Pb.