Interlaboratory comparison on the determination of the molecular composition of humic substances (HS) was undertaken in the framework of IUPAC project 2016-015-2-600. The analysis was conducted using high resolution mass spectrometry, nominally, Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS) with electrospray ionization. Six samples of HS from freshwater, soil, and leonardite were used for this study, including one sample of humic acids (HA) from coal (leonardite), two samples of soil HA (the sod-podzolic soil and chernozem), two samples of soil fulvic acids (FA) (the sod-podzolic soil and chernozem), and one sample of freshwater humic acids (the Suwannee River). The samples were analyzed on five different FTICR MS instruments using the routine conditions applied in each participating laboratory. The results were collected as mass lists, which were further assigned formulae for the determination of molecular composition. The similarity of the obtained data was evaluated using appropriate statistical metrics. The results have shown that direct comparison of discrete stoichiometries assigned to the mass lists obtained by the different laboratories yielded poor results with low values of the Jaccard similarity score – not exceeding 0.56 (not more than 56 % of the similar peaks). The least similarity was observed for the aromatics-rich HA samples from leonardite (coal) and the chernozem soil, which might be connected to difficulties in their ionization. The reliable similarity among the data obtained in this intercomparison study was achieved only by transforming a singular point (stoichiometry) in van Krevelen diagram into a sizeable pixel (a number of closely located stoichiometries), which can be calculated from the population density distribution. The conclusion was made that, so far, these are descriptors of occupation density distribution, which provide the metrics compliant with the data quality requirements, such as the reproducibility of the data measurements on different instruments.
atmospheric pressure photochemical ionization
- FTICR MS
Fourier transform ion cyclotron resonance mass spectrometry
International Humic Substances Society
International Union of Pure and Applied Chemistry
nuclear magnetic resonance spectroscopy
natural organic matter
The chemistry of complex matter is a new challenge for modern chemistry. This refers in full to natural organic matter (NOM) and humic substances (HS), which represent the complex systems of non-living organic matter cycling throughout the environment , . The abiotic synthesis of HS has no genetic code: it is governed by the biothermodynamic stability of the products . The fundamental properties of NOM and HS are non-stoichiometry of elemental composition and extreme heterogeneity of structural and molecular compositions , . As a result, a notion of the molecule for these compounds is transferred into the molecular ensemble, whose every parameter is described by a distribution of stoichiometries, structures, and molecular masses . Moreover, interactions between individual components, which are related to the spatial organization of the NOM or HS ensemble, can be crucial for the formation of physical and chemical properties of the entire system , . This renders the common deterministic approach to the representation of chemical structure inapplicable to NOM and HS, while it implies a use of individual structural formulae with fixed chemical topography, including the number and type of atoms, as well as the lengths and angles of chemical linkages. Hence, for these systems, alternative approaches are needed, based on the notion of chemical space and the probabilistic distributions of stoichiometries and structures in the complex chemical system .
The method of choice for unfolding the molecular complexity of humic systems is high resolution mass spectrometry and, in particular, Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS) with soft ionization techniques, such as electrospray (ESI) or atmospheric pressure photoionization (APPI) . This is because, unlike many other analytical techniques, FTICR MS resolves individual molecular stoichiometries, even within complex humic systems . The application of FTICR MS for the molecular analysis of HS and NOM was started in 1997 by Alan Marshall’s group at the National High-Magnetic Field Laboratory at Florida State University, USA . 20 years later, vast data sets are accumulated by different laboratories, in particular on the compositional space of NOM samples from various sources , , , as well as on HS of terrigenic origin , . It is very tempting to collate this ever-growing data set generated by different laboratories into a “chemical library”, or database, of individual components of NOM and HS, which could lay the grounds for molecular systematics of non-living organic matter. It was this intention that ignited the idea of IUPAC project 2016-015-2-600 (https://iupac.org/project/2016-015-2-600) and gathered together interested research groups from Germany, USA, Korea, and Russia. However, initial discussions by members of the group revealed that the lack of intercomparison studies on the determination of the compositional space of HS using ultra-high resolution mass spectrometry is a major obstacle for the collation of the existing data on the composition of individual components of HS into a coherent database. The necessary information to fill a “chemical library” includes data on the reproducibility of the determination of individual HS components and the relative intensity distribution acquired on different FTICR MS instruments. Thus, the aim of this preliminary study was to elaborate the appropriate metrics to compare NOM and HS molecular content as determined by FTICR MS and to quantify the similarity between sample composition as determined by different groups under similar sample-preparation and data-treatment conditions.
The developed strategy included, firstly, assembling a set of well-described HS samples of diverse terrestrial origin to account for possible matrix effects when performing FTICR MS measurements. The sample set included HS isolated from natural water, soil, and brown coal – six samples in total. Also, the crucial task was to elaborate guidelines for sample preparation and the basic conditions of ESI FTICR MS measurements. In addition, guidelines for FTICR MS data collecting and sharing format were compiled. The obtained data set was used for multivariate statistical treatment. In total, the data set included 30 FTICR mass spectra, which were acquired on six samples, used in this study on five different FTICR MS instruments.
2 Materials and methods
2.1 Intercomparison set of humic materials and FTICR MS facilities
Six HS samples were included in the experimental set: one sample of humic acids (HA) from the Suwannee River (SRHA, the IHSS standard), one sample of humic acids from the sod-podzolic soil collected near Moscow (SHA-PwM, the state standard sample, Russia), one sample of humic acids from the Mollisol collected near Kursk (SHA-CtK, the state standard sample, Russia), one sample of fulvic acids from the sod-podzolic soil collected near Moscow (SFA-PwM), one sample of fulvic acids from the Mollisol collected near Kursk (SFA-CtK), and one sample of coal (leonardite) humic acids (CHA) isolated from commercial humate. The samples were provided by the laboratory of Dr. Irina Perminova (Lomonosov MSU, Moscow, Russia). The selected samples were delivered to five research groups who had access to FTICR mass spectrometers. These five groups are listed in Table S1.
2.2 The guidelines for intercomparison study
2.2.1 The guidelines for sample preparation
The HS samples used in this study were delivered to the participating groups as dry powders. The following guidelines were elaborated for preparation of the samples for FTICR MS analysis:
For the SRHA sample and two SFA samples:
dry HS sample (1 mg) was dissolved in 2 mL of methanol (HPLC grade or higher). Prior to analysis, the obtained stock solutions were diluted by methanol to a mass concentration of 20 to 100 mg/L. These concentrations are typical for aquatic samples and FA .
For the CHA and two SHA samples:
dry HS sample (1 mg) was dispersed in 2 mL of methanol-chloroform (1:2) mixture, followed by centrifugation at 10 000 rpm. This procedure (lipids removal) was repeated five times. Then, a wet precipitate was dissolved in 2 mL of 0.1 % ammonium hydroxide solution. Prior to analysis, the solution was diluted by water-methanol (1:1) mixture to HS mass concentration of 100 to 200 mg/L. The described workflow provides for a better quality of mass spectra due to the removal of highly ionizable lipids and, partially, fatty acids .
2.2.2 The guidelines for FTICR MS instrumental setup
The goal of this study was to compare results of HS molecular analysis obtained by FTICR MS under conditions routinely applied in participating laboratories. Nevertheless, the following general conditions were recommended for instrumental setup:
Standard ESI in negative mode
External calibration by appropriate standards in the mass range from 200 to 1000 m/z
Spectra acquisition: broadband mode, mass range from 200 to 1000 m/z
Accumulation of 300 scans
Mass-spectrum has to be internally recalibrated using ubiquitously-presented residual fatty acids ions  with identical masses, adjusting the root-mean-square error below 0.1 ppm. The error is calculated as the difference between experimental and theoretical masses (Da) divided by integer molecular mass (MDa) and can be both negative and positive.
2.3 Data collection and formulae assignment
The obtained FTICR MS results were collected in the form of calibrated non-filtered mass lists with signal to noise ratio exceeding four and anonymized by assigning codes to each laboratory in randomized order (a, b, c, d, e) prior to data treatment. The raw mass lists are provided in the Supporting Information. All data were treated for formula assignment using house-made Transhumus software based on the open-source R environment designed by A. Grigoryev. The parameters used for formula assignments were set with sensible chemical constraints according to the literature: O/C ratio below or equal to 1, H/C between 0.3 and 2.2, element counts (C atoms up to 120, H atoms up to 200, oxygen atoms between 1 and 60, N atoms up to 2, and sulfur atoms up to 1), and a mass accuracy window below 0.5 ppm (Da/MDa) . The data analysis was performed by the Lomonosov MSU research group shown in Table S1.
2.4 Statistical treatment
For direct comparison of the mass spectra acquired by the different research groups for the same HS sample, we calculated a Jaccard similarity score (J score) based on the presence/absence of the formulae in the pairs of the corresponding mass spectra . The J score is formally defined as the size of the intersection divided by the size of the union of the sample sets. The mathematical representation of the J score is written as:
where the numerator is the number of common formulae in the mass spectra measured for the same HS sample (area of overlap), and the denominator is the total number of formulae found in the different spectra for the same HS sample (area of union).
For correlation analysis, the FTICR MS assignments were plotted in a van Krevelen diagram, where the x and y axes represent a ratio of the numbers (n) of O to C atoms (nO/nC) and a ratio of the numbers of H to C atoms (nH/nC), respectively, as described by Kim et al.. The obtained diagram was quantitatively treated using a cell-partitioning algorithm by binning the diagram area into 20 cells, as described by Perminova . The partitioning is shown in Fig. 1, using the SRHA sample as an example. The cell-based distribution of experimental points was calculated by quantifying the intensity-weighted population density of each distinguished squared area in a van Krevelen plot cell (Di), as expressed by Equation (2) below:
where Dk is the intensity weighted population density of the cell k and N is the total number of points in the van Krevelen diagram. In the numerator, Nk is the number of points belonging to the particular cell k and Ii is the relative intensity of a peak from the mass-spectrum represented as a point i within the cell k. while in the denominator Ij is the summarized relative intensity of all peaks of the mass-spectrum corresponding to the points populating in the diagram.
The obtained values of population density could be used as descriptors of the chemical space of HS and NOM, per se, as 20 values – this approach was used in this study. They can be also combined based on the classes of chemical compounds, using the mapping of the van Krevelen diagram as described in . Another approach to splitting the van Krevelen diagram is based on a use of the aromaticity index (AI) . Atomic constraints for the AI-based partitioning were proposed by Kellerman et al. , as is shown in Table 1.
|Compound class||Atomic constraints|
|Saturated compounds (saturates)||nO/nC* < 0.3, nH/nC ≥ 1.5, nN < 1|
|N-containing aliphatic compounds (N-saturates)||nH/nC ≥ 1.5, nN ≥ 1|
|Aliphatic and alicyclic compounds (aliphatics)||nO/nC ≥ 0.3, nH/nC ≥ 1.5, nN < 1|
|Low-oxidized unsaturated compounds (unsat_lowOC)||nH/nC < 1.5, AI < 0.5, nO/nC ≤ 0.5|
|Highly oxidized unsaturated compounds (unsat_highOC)||nH/nC < 1.5, AI < 0.5, nO/nC > 0.5|
|Low-oxidized aromatic compounds (aromatic_lowOC)||nO/nC ≤ 0.5, 0.5 < AI, AI ≤ 0.67|
|Highly oxidized aromatic compounds (aromatic_highOC)||nO/nC > 0.5, 0.5 < AI, AI ≤ 0.67|
|Low-oxidized condensed compounds (condensed_lowOC)||nO/nC ≤ 0.5, AI > 0.67|
|Highly oxidized condensed compounds (condensed_highOC)||nO/nC > 0.5, AI > 0.67|
*n stands for a number of atoms and AI is the aromaticity index.
3 Results and discussion
3.1 Intercomparison of the FTICR MS data for the SRHA sample
We started by examining the mass spectra obtained for the IHSS standard sample, SRHA, by five participating laboratories (Figs. 2 and 3). Mass distributions turned out to be substantially different, in spite of the same protocol for sample preparation and the same given conditions for instrumental setup being used by the participants. The spectra of the laboratories a, b, and c contained very intense peaks of impurities. For example, mass-spectrum obtained in laboratory a was dominated by a low-mass even peak, whereas SRHA is composed mostly of odd peaks. This could be indicative of plastic and fatty acid impurities. At the same time, the mass spectra obtained in laboratories d and e did not have peaks of impurities. Nevertheless, due to high sensitivity and the dynamic range of FTICR mass-spectrometers, the number of peaks with a signal to noise (S/N) ratio larger than four was about 20 000 in all cases, except for laboratory c, which showed the lowest number of peaks. The difference in mass spectra was also reflected in peak distribution at nominal mass, as is shown in Fig. 2. For all five spectra, we observed more than 10 monoisotopic peaks at a nominal mass. The maximum intensity for laboratory a was observed for the ion with m/z equal to 301.05651, which corresponds to the unsaturated compound with molecular formula C12H13O9 (H/C equals to 1.16). At the same time, for laboratories a, b, c, and d, the maximum intensity was attributed to the ion with m/z equal to 301.03538, belonging to an aromatic compound with molecular formula C15H9O7 (H/C equals to 0.66). Abundant saturated ions with high mass defect were observed only in the case of laboratory c, while they were absent in the other spectra, as is shown in Fig. 3.
We assigned all molecular formulae for monoisotopic peaks, excluding heavy isotopologues, such as 13C and 34S. The general characteristics of the data set obtained for SRHA are given in Table 2. In the obtained range of data, laboratory a determined the highest number of assigned molecular compositions for SRHA (6403), while laboratory d determined the lowest (1748). Despite substantial variation in the number of assigned formulae, all five spectra were characterized by the dominant contribution of CHO compositions. The intensity of all other stoichiometries did not exceed 12 %. Very substantial variations were observed in the number of CHOS formulae. Where laboratories a and c determined 499 and 213 compositions, respectively, laboratories b, d, and e found significantly fewer of these compositions, lower by a factor of 10. The abundant CHOS compounds could also be originated from impurities, e.g., surfactants .
|CHO, in %||88||93||89||86||92|
|CHON, in %||8||5||6||1||4|
|CHOS, in %||4||1||1||12||1|
|CHONS, in %||1||1||3||1||2|
*Subscript “n” indicates number-averaged values, AI stands for aromaticity index calculated as described by Koch and Dittmar .
Despite the differences in the number and abundance of the assigned formulae, the calculated number-averaged parameters, such as Mn, AI, nO/nC, and nH/nC, revealed similarities between the spectra examined. Four out of five spectra (except for those of laboratory d) were characterized by number-averaged AI (AIn) of about 0.5 and Mn between 450 and 500 Da. Values of AIn equal to 0.3 and Mn equal to 379 Da were observed for laboratory d, which indicates the higher contribution of low molecular mass and relatively saturated compounds, as compared to the other laboratories. The number-averaged values of O/C ratio also indicate higher contributions of reduced compounds in laboratories b and d, with nO/nC below 0.5, compared to laboratories a, c, and e. This could be explained by the different parameters of ion accumulation in ion source and collision cell adjusted in experiments, which are known to affect the proportions of oxidized and reduced components .
The van Krevelen diagrams obtained for the same sample were characterized by similar general features of molecular distribution, but with substantial differences in the exact location of molecular species. The SRHA spectra measured by the b and d laboratories were characterized with the highest abundance of the reduced compounds. These clear outliers affected the number-averaged values given in Table 2. The data obtained by the a, c, and e laboratories were characterized by the dominant contribution of oxidized aromatic species with an O/C atomic ratio between 0.5 and 0.7, which is inherent for NOM from temperate and tropical regions , . The data sets of SRHA-b and SRHA-d did not show any distinct features. At the same time, the SRHA-d was fully depleted of oxidized compounds and its whole molecular profile was shifted towards saturated compounds.
The common and unique stoichiometries were calculated and plotted in van Krevelen diagrams, as shown in Fig. 4a and b, respectively. In total, there were 9193 formulae determined for the SRHA sample by five spectrometers. Of these, 1260 were “common” (determined by all five spectrometers), 4272 were “shared” (detected by two or more spectrometers), and 3661 were “unique” (detected by only one spectrometer). Fig. 4a shows that the stoichiometries with the most robust values, which were determined in the SRHA sample by all instruments, were located in the center of the van Krevelen diagram, close to the O/C value of 0.5 and H/C value of 1. Another observation was that the least robust stoichiometries (Fig. 5b) were positioned in the bottom left corner of the van Krevelen diagram, which is usually prescribed to “black carbon” . As a rule, these components possess the lowest abundance in FTICR mass spectra; they are poorly resolved due to low S/N ratios. This produces a large uncertainty in formulae assignment due to variability in the sensitivity and dynamic range of different instruments.
The example of direct comparison of individual stoichiometries obtained for the same sample (SRHA) by five spectrometers is a good visualization of very small overlap (14 %) in the discrete stoichiometries of HS samples determined by different FTICR MS instruments. Given much better similarity in the intensity-weighted parameters, shown in Table 2 and similar molecular profiles in Fig. 4, we have binned the area of the van Krevelen diagram into nine regions, assigned to different compound classes, as is described in Table 1, and calculated the occupation density of each area. The obtained data are shown in Fig. 5
All spectra showed the prevalence of unsaturated compounds in the molecular composition of SRHA, which provided for 60 % of the total intensity of formula assignments. Still, the spectra obtained by the b, d, and e laboratories were characterized by the prevalence of low-oxidized unsaturated compounds, while the spectra acquired by the a and c laboratories were characterized by the major contribution of highly-oxidized unsaturated compounds and a similar abundance of low- and highly-oxidized species, respectively. The d laboratory determined an extremely high content of saturated compounds, which can be considered the outlier. For the other four laboratories, the ratio of the summarized relative abundance of unsaturated compounds to aromatics was two, whereas for the d laboratory it was equal to 3. Thus, all participating laboratories revealed the major role of substituted aromatic structures indicating the ligninic nature of SRHA. The ratio of oxidized to reduced species varied among the laboratories, which can be explained by different equipment adjustments for analysis. The obtained results highlighted the importance of the development of statistical tools for data treatment, which could explore the unique and common features of HS samples analyzed by different FTICR MS instruments.
3.2 General characteristics of the full data set on intercomparison of FTICR mass spectra
The collected data set of FTICR MS data was composed of 30 spectra: five spectra for six HS samples measured by five participating laboratories. The raw mass-spectrometric data were compared using two signal properties: m/z values and relative abundance, expressed as a logarithm of signal intensity for better visualization. We highlighted the “shared” peaks in blue (“shared” means the peaks observed by two or more laboratories with an uncertainty of less than 0.2 ppm) and the “unique” peaks (i.e., observed by a single laboratory) were highlighted in orange, as is shown in Fig. 6. It would be reasonable to assume that the signals with the highest intensity could be observed in all the spectra of one sample, and the low-intensity ones would be those unique to different spectra. In that case, it would be possible for each mass range to find a threshold separating the common signals from the unique ones. It would also be logical to assume that such a threshold would depend on the mass: it could increase or decrease. Under these circumstances, one could find such a straight line that would separate a set of the common (intense) signals from the set of unique (low-intensity) ones. However, in this study it was shown that such separability was not observed within the data. Hence, our initial assumption was wrong.
This might indicate that there is no simple rule for discrimination between the “common” and the “residual” formulae. Still, abundance (or peak intensity) had substantial discriminating power with regard to the common and residual peaks. For the four laboratories a, b, c, and d, the shared peaks were much more abundant than the unique ones in four samples out of six (this was not the case for those determined by the e laboratory). The highest discrepancy between the measurements was observed for the SHA-CtK and CHA samples, which were characterized with the largest number of abundant unique peaks. It is important to note that these very samples were characterized with the highest contribution of condensed aromatic compounds . Taking into account the low ionization efficiency of these molecular constituents of humic ensemble, we suggest that the number of these species might be critical when comparing results from equipment with different analytical characteristics, as in the case of this work. The particular problems can be seen in ionizing the samples enriched with condensed components for laboratory d: the mass spectra in this case had a few common peaks with those observed by the other laboratories.
The high number of unique peaks detected by different laboratories might be explained by three reasons: discrepancy in a mass measurement, instrument-specific ion accumulation parameters, and different dynamic ranges and sensitivities of the equipment used. Recommendations for mass-spectra acquisition and calibration described in Section 2.2 of this manuscript were aimed at minimizing the first parameter. However, the dynamic range and sensitivity are specific features of the instruments and depend mainly on the ion optics, ICR cell geometry, and the homogeneity of the magnetic field . In addition, the lifetime of ions depends drastically on the ion accumulation parameters . The latter were not standardized in this study due to the different architecture of the instruments used by the participating laboratories. In Fig. 6, it can be seen that even after peaks filtration by S/N equal to 4, the laboratory a possessed the widest dynamic range at 108 arbitrary units of peak intensities, while the narrowest dynamic range, at 105, was observed for laboratory e.
The next step was a comparison of the CHNOS formulae assignments, which were made for the obtained FTICR MS data sets. The data are shown for the total number of the assigned formulae and for the number of unique formulae in Fig. 7a and b, respectively. From Fig. 7, it can be concluded that the initial data acquired by laboratories differed substantially in the total number of assigned formulae for the intercomparison HS sample set. The lowest number of formulae was assigned to the soil HA from Mollisol (SHA-CtK). The number varied from as low as 1743 (laboratory b) to as high as 3380 (laboratory e).
The total number of assigned formulae was followed by three other HA samples: SRHA, CHA, and SHA-PwM. A greater number of assigned formulae were observed for the more oxidized and lower molecular weight soil fulvic acids: in particular, sod-podzolic soil FA (SFA-PwM). With respect to the laboratories, the highest values were provided by laboratory a (the values ranged from 3000 to more than 10 000), followed by laboratory e (the values ranged from 3000 to 9000). Laboratories b and d provided lower values, from 1700 up to 7000, while laboratory c provided the lowest numbers, from 1700 to 3600. It was also of interest to examine the number of unique identifications made by the different laboratories (Fig. 8b). In the cases of laboratories a, b, and c, the number of unique identifications was higher for FA than HA, while an inverse pattern was seen in laboratories d and e. Laboratory a provided the highest number of unique formulae for all samples, while laboratories b and c yielded the lowest number of unique identifications. The dramatic discrepancy between the number of detected peaks and the assigned formulae clearly illustrates the challenge of direct comparison between HS molecular compositions determined using different instruments.
To compare molecular profiles determined by FTICR MS for six HS samples under study, all CcHhNnOoSs assignments were used for the calculation of H/C and O/C ratios and plotted in van Krevelen diagrams as population density maps versus the common representation using discrete stoichiometries (Fig. 8). This type of data presentation allows for robust fingerprinting of integral common features in the formulae distribution over the 2D space .
For all HS samples studied, the substantial similarity of occupation density distribution was observed for the same sample measured by different laboratories. In the case of CHA and SHA-CtK, the aromatic and condensed compounds possessed the highest abundance. At the same time, the SHA-Pw sample was characterized by the significant contribution of saturated compounds. Both SFA samples and the SRHA sample were characterized by the high contribution of oxidized aromatic compounds. These trends are in sync with the data on the elemental and structural group composition analysis of the samples used in this study , , . Despite the clear outliers obtained for CHA and SHA-Ctk by laboratory d, the instruments were capable of discrimination between the HS fractions used in this study.
3.3 Similarity analysis of the FTICR MS data obtained in this study
Statistical analysis was used for assessing the similarities between the molecular compositions determined for the six HS samples used in this study by the five participating FTICR MS laboratories. Similarity heatmaps were calculated for all pairs of analyzed samples based on Euclidian distances between vectors in multidimensional space composed of the abundances of the assigned molecular formulae. The logic behind this approach is that the J score can be calculated for any two spectra (see Equation (1)). Then, the matrix with a size of N × N can be composed out of N spectra, where the positions (i, j) will be taken by the J score values varying from 0 to 1. The value 0 indicates an absence of common signals, whereas the value one indicates full coincidence of the spectra (up to differences in intensity). It is obvious that such a matrix will have one values on the diagonal, and that it will be symmetric. To graphically display such a matrix, the values could be color-coded and designated in the color bar. This type of matrix is called a “heatmap”. The type of heatmap depends on the order of spectra arrangement. If the spectra are grouped according to certain criterion, for example, according to the sample type (six samples measured by five laboratories), then the heatmap will show a 5 × 5 submatrix that characterizes the similarity of each sample across all laboratories. The heatmaps constructed for the data obtained in this study are shown in Figs. 9 and 10.
The grouping of the samples was made by source and by laboratory (Fig. 9a and b, respectively). We observed poor similarity between the data obtained by different laboratories for all HA samples (Fig. 9a). Only in case of SHA-Pw, the certain degree of similarity was observed between the laboratories b and c. At the same time, both SFA samples and the SRHA sample were characterized by significant similarity between the labs. This result could be related to the much better ionization efficiency of the polar components of FA compared to more aromatic and less oxidized HA, yielding much more comparable results.
The second tool that we used for similarity analysis was the comparison of the presence/absence of the formulae in the data set using the Jaccard similarity score calculation. The results are shown as heatmaps in Fig. 10. As can be deduced from Fig. 10, the analysis for the presence of the same peaks in the spectra measured by the different laboratories yielded poor results, with very low J score values, i.e., not exceeding 0.56 (not more than 56 % of the pairwise shared peaks). The obtained values can be interpreted as meaning that, for each peak in the spectrum, there is no confidence of its presence in the spectrum obtained by a different laboratory.
In search of ways to improve the similarity between the data obtained for the same sample by different laboratories, we have generated descriptors for the density distribution of the assigned formulae over the field of the van Krevelen diagram. For this purpose, we binned a van Krevelen diagram into 20 cells within the range of nH/nC values, from 0.2 to 2.2, and nO/nC values, from 0 to 1, as is shown in Fig. 1, followed by a calculation of the intensity-weighted contribution of each cell (Dk) using Equation (2), as proposed by Perminova . Then, we conducted a correlation analysis using the obtained descriptors and calculated the squared value of correlation coefficients (R2) for each pair of mass spectra (i.e., laboratories) for each sample. The results of this correlation analysis are shown in Fig. 11.
This analysis has shown the presence of two outliers in the data set, nominally, the CHA and SHA-CtK spectra measured by laboratory d. These were characterized with very low R2 values, seen as dark crosses in the corresponding matrices in Fig. 11a. This is consistent with a visual inspection of the van Krevelen diagrams of these samples, which differ substantially between the laboratories. Except for these clear outliers, all other correlations were characterized with high R2 values (close to 1), which shows a good applicability of this approach to the similarity analysis. It was also of interest to explore the correlations between the samples or to run the correlation analysis for each pair of samples from each laboratory. The corresponding data are shown in Fig. 11b. Except for laboratory d, we observed very high R2 values for the samples with similar origin, for example for both SFA samples and the SRHA sample, two SHA samples, and the coal HA sample, seen as the bright green and yellow squares in Fig. 11b. The correlation analysis indicates that the different FTICR MS instruments acquired similar results for the density distribution of the assigned formula over the van Krevelen diagram for the HS samples from different sources used in this study.
While conducting the above data analysis, we have established that the presence of the zero values corresponding to the non-populated cells in the correlation data set increased the correlation relationship between the sets on the account of “0” data. These “0” data can be clearly seen in Fig. 1, representing occupation density Dk (Equation (2)) of the least populated cells, e.g., the cells from 1 to 5, or from 15 to 20. This is why we ran the corresponding analysis by excluding all 0 values (i.e., non-populated cells) from the data set. A good countermeasure for the appearance of 0 values is to enlarge the areas of the van Krevelen diagram. This can be done by ascribing a sum of the Dk values to the certain compound class using mapping as described by Perminova . Another approach is to bin the van Krevelen diagram with regard to the AI value, as described by Koch et al. . We conducted this alternative binning of the van Krevelen diagram into the nine areas, as described in Table 1, and used the obtained descriptors for the correlation analysis. The results are shown in Fig. 12.
It was found that a use of the coarser binning of the van Krevelen diagram into nine areas gave slightly worse results than the finer splitting into 20 cells, both for the pairwise correlation (each pair of laboratories for each sample) and when pairing by laboratory (each pair of samples for each laboratory). However, the results were in general consistent and supportive of a high similarity of population density of the compositional space of the six HS samples used in this study measured by five FTICR MS spectrometers.
The intercomparison study conducted on six HS samples of different origin and five FTICR MS instruments revealed substantial inconsistency between the data obtained on different instruments. Significant discrepancy in formulae assignment was obtained for SRHA sample (IHSS standard): out of 9193 formulae determined for the SRHA sample by five spectrometers, only 13 % (1260) belonged to the common pool determined by all five spectrometers, while 45 % (4272) were shared by two or more spectrometers, and 40 % (3661) were unique, detected by only one spectrometer. This finding indicates that particular caution should be exercised when specific properties of the samples are attributed to unique species assigned from the FT-ICR MS data.
This observation was supported by a comparison of the extended data composed of mass spectra for SRHA, for four soil HS samples, and for one coal HS sample, which were acquired by five intercomparison laboratories (30 mass spectra in total). Again, a high discrepancy was observed in the presence/absence statistics as estimated by Jaccard similarity score: the spectra measured by the different laboratories yielded poor results with very low J score values, not exceeding 0.56 (not more than 56 % of the peaks were shared pairwise). The obtained values indicate that, for each peak in the spectrum, there is no confidence of its presence in the spectrum obtained by a different laboratory. However, it should be specifically mentioned here that, in this study, no attempt was made to unify the setup of the instruments: FTICR MS measurements were performed under the conditions routinely applied in each laboratory. Some uncertainty could also be imposed by the inadequately low sample mass (1 mg) recommended for weighting in our guidelines. This might cause some variation in the concentration used due to an error of analytical balance. Our hope is that this impact was minimized during the adjustment of one of the major instrument parameters, the total ion current, which should always be maintained at the same level by each group by the simple tuning of ion accumulation time. Thus, our results are indicative of substantial differences in the ion-transfer and ion-excitation FTICR instrument parameters at the laboratories participating in the present study, as was demonstrated by Soule et al.  and Sleighter and Hatcher .
On the positive side, much stronger data consistency was observed for the population density of van Krevelen diagrams. The density distribution of molecular formulae over the entire field of the van Krevelen diagram was much more reproducible by different spectrometers than single formulae assignments. This was observed for binning the space of a van Krevelen diagram into nine regions as well as into 20 cells. Moreover, the closest correlation was observed for pairs of fulvic acids and for pairs of humic acids. This is indicative of a correct grouping of the samples by origin by different instruments.
The obtained results demonstrate that caution should be exercised when assigning unique formulae observed by FTICR MS instruments responsible for the specific properties of the measured samples: the other spectrometer might never find this same formula in the sample. At the same time, it can find a formula of the similar chemical class in the closest vicinity of the formula observed by the former spectrometer. This might be caused by differences in ionization parameters, variations in instrument architectures, ion collisions in the cell, etc. The presence of these “hidden variables” corrupts the data quality provided by FTICR MS in terms of reproducibility of both peak position (observed mass) and intensity (abundance). That is why the reliable similarity among the data obtained in this intercomparison study was achieved only by transforming a singular point (stoichiometry) in a van Krevelen diagram into a sizeable pixel (a number of closely located stoichiometries). At the moment, it is the minimum size of this pixel that determines the “working” resolution of the method, by enabling the generation of the data compliant with the data quality requirements, such as the reproducibility of the data measurements on different instruments.
5 Suggestions and outlook
Given the data obtained, the straightforward suggestion on the side of the data quality experts could be to determine all “hidden variables” that impact the data acquisition of FTICR mass spectra of NOM and HS and take them into consideration in order to achieve much more reproducible data. This is a reasonable approach, and a much more detailed and strict measurement protocol might be elaborated for future intercalibration studies on FTICR MS analysis of the constitutional space of NOM and HS. The question is whether the operating laboratories will be able to follow such an “overdefined” measurement protocol falling far beyond their routine measurement conditions. For this reason, we see the enhanced scrutiny in data acquisition as only a part of the solution to the identified problem of low reproducibility of FTICR MS data for HS and NOM. A much bigger challenge is the adaptation of the reproducibility concept, which is regarded as a central methodological criterion of the sciences  and stems from the analysis of static systems with invariant constitutions,  to constitutionally dynamic complex systems . This is the dynamic diversity inherent within these systems, which enables their capability for variation and selection. According to the work of Jean-Marie Lehn,  constitutional dynamics introduces a paradigm shift from constitutionally static chemistry with regard to materials design. The same paradigm shift should take place in estimating the reproducibility (data quality assessment) of measurements of the constitutional space of dynamic systems, such as humic substances and natural organic matter.
There is a lot of discussion in the modern literature with regard to the changing meaning of statistical criteria once they are applied to complex systems , , . This is the central issue for the modern philosophy of science , as well as for researchers measuring, designing, and predicting the behavior of complex systems in the fields of biomedicine, polymer chemistry, neuroscience, ecology, and cognitive science, to name a few , , , . The most promising scientific response to the “crisis of reproducibility”  is in constructing multi-scale mathematical and dynamic computational models using machine learning approaches. In the case of the problem under study, the feasible solution may be the construction of a probabilistic model that would link the actual molecular composition of the HS or NOM sample, characteristics of the spectra acquisition, and computing process of formulae assignment to the observed molecular composition of the sample. If successful, such a model could facilitate the recovery of comparable data for laboratory 1 (an unknown) based on the known data provided by laboratories 2, 3, and 4. The unknown data for laboratory 1 could be obtained in the form of probabilities of presence for each molecular formula or of probability distribution functions for intensities. For solving similar tasks, see, e.g., Chapter 6 of , on probabilistic modeling. We believe that the full-scale application of probabilistic modeling, both to the reproducible measurements of molecular compositions of humic systems and to predicting their properties, will lead to the creation of molecular systematics and to the facilitation of structure-activity modeling for the dynamic systems of non-living organic matter.
Supporting Information for this paper includes a table with the description of the participating laboratories and the files with raw mass lists and assigned formulae, and source code (jupyter notebooks), which were used for data analysis. This supporting information can be found at the repository https://github.com/rukhovich/IUPAC-project. The detailed description is available in this repository in the Readme.md file.
Membership of sponsoring bodies
Membership of the IUPAC Division on Chemistry and the Environment (2016–2017) during the preparation of this report was as follows: President (2016–2017): Petr Fedotov (Russia); Vice President: Rai Kookana (Australia); Past President: Laura McConnell (USA); Secretary: Hemda Garelick (UK); Titular Members: Manos Dassenakis (Greece), Philippe Garrigues (France), Irina V. Perminova (Russia), Heinz Rüdel (Germany), John B. Unsworth (UK), Baoshan Xing (USA); Associate Members: Bradley Miller, Associate Member (USA), Gijs A. Kleter (Netherlands), Nadia G. Kandile (Egypt), Roberto Terzano (Italy), Guibin Jiang (China/Beijing), Stefka Tepavitcharova (Bulgaria); National Representatives: Ana Aguiar-Ricardo (Portugal), Luke Chimuka (South Africa), Doo Soo Chung (Korea) Mohammad Din (Pakistan), Annemieke Farenhorst (Canada), Yong-Chien Ling (China/Taipei), Pradeep Kumar (India), Nelly Mañay (Uruguay), Anna-Lea Rantalainen (Finland), Edgar Resto (Puerto Rico)
Funding source: International Union of Pure and Applied Chemistry
Funding source: Russian Foundation for Basic Research
Award Identifier / Grant number: 16-04-01753
Award Identifier / Grant number: 18-29-25065
Award Identifier / Grant number: 18-32-20116 mol_a_ved
Funding source: Russian Science Foundation
Award Identifier / Grant number: 19-14-00306
Award Identifier / Grant number: 20-63-47070
We appreciate valuable contributions in the presented results of the Task Group members of this project who did not appeared in the co-authors' list of this report, nominally: Rob Fatland (USA), Kirk Hatfield (USA), Harold Klammler (Brasil), Ajit Shah (Great Britain), and Aaron Stubbins (USA). We appreciate the deeply thorough reading and thoughtful comments of the anonymous reviewers on the initial text of the manuscript, which enabled its substantial improvement.
The study was conducted in the framework of the IUPAC project 2016-015-2-600 (https://iupac.org/project/2016-015-2-600) and partially supported by the IUPAC funding. The support of RFBR project 16-04-01753 (IP, AZ, DK), 18-29-25065 (IP), and 18-32-20116 mol_a_ved (ES). FTICR MS experiments (AZ) were supported by RSF grant no 19-75-00092. Statistical treatment of FTICR MS data (IP, GR) were partially supported by RSF grant 20-63-47070. A portion of this work was performed at the National High Magnetic Field Laboratory ICR User Facility, which is supported by the National Science Foundation Division of Chemistry through DMR-1644779 and the State of Florida.
 F. Stevenson. Humus Chemistry: Genesis, Composition, Reactions, p. 512, Wiley, New York, 2nd ed. (1994).Search in Google Scholar
 D. S. Orlov. Humic Substances of Soil and General Theory of Humification, p. 266, CRC Press, USA, 1st ed. (1995).Search in Google Scholar
 W. Ziechmann. Huminstoffe, p. 408 S, Verlag Chemie, Weinheim (1980).Search in Google Scholar
 I.V. Perminova, A. Gaspar, Ph. Schmitt-Kopplin, N. A. Kulikova, A. I. Konstantinov, N. Hertkorn, K. Hatfield. in Biophysico-Chemical Processes Involving Natural Nonliving Organic Matter in Environmental Systems, N. Senesi, B. Xing (Eds.), Chapter 13. pp. 487–538, IUPAC Book Series, Wiley (2009).10.1002/9780470494950.ch13Search in Google Scholar
 A. G. Marshall, C. L. Hendrickson, G. S. Jackson. Mass Spectrom. Rev.17, 1 (1998).10.1002/(SICI)1098-2787(1998)17:1<1::AID-MAS1>3.0.CO;2-KSearch in Google Scholar
 N. Hertkorn, C. Ruecker, M. Meringer, R. Gugisch, M. Frommberger, E. M. Perdue, M. Witt, P. Schmitt-Kopplin. Anal. Bioanal. Chem.389, 1311 (2007).10.1007/s00216-007-1577-4Search in Google Scholar
 A. Stubbins, R. G. M. Spencer, H. Chen, P. G. Hatcher, K. Mopper, P. J. Hernes, V. L. Mwamba, A. M. Mangangu, J. N. Wabakanghanzi, J. Six. Limnol. Oceanogr.55, 1467 (2010).10.4319/lo.2010.55.4.1467Search in Google Scholar
 A. Y. Zherebker, I. V. Perminova, A. I. Konstantinov, A. B. Volikov, Y. I. Kostyukevich, A. S. Kononikhin, E. N. Nikolaev. J. Anal. Chem.71, 372 (2016).10.1134/S1061934816040109Search in Google Scholar
 A. Zherebker, E. Shirshin, O. Kharybin, Y. Kostyukevich, A. Kononikhin, A. I. Konstantinov, D. Volkov, V. A. Roznyatovsky, Y. K. Grishin, I. V. Perminova, E. Nikolaev. J. Agric. Food Chem.66, 12179 (2018).10.1021/acs.jafc.8b04079Search in Google Scholar PubMed
 G. Vladimirov, C. L. Hendrickson, G. T. Blakney, A. G. Marshall, R. M. A. Heeren, E. N. Nikolaev. J. Am. Soc. Mass Spectrom. 2012, 23 (2), 375.10.1007/s13361-011-0268-8Search in Google Scholar PubMed
 D. V. Kovalevskii, A. B. Permin, I. V. Perminova, D. V. Konnov, V. S. Petrosyan. Moscow Univ. Chem. Bull.40, 375 (1999).Search in Google Scholar
 R. Sleighter, P. G. Hatcher. In: Fourier Transforms - Approach to Scientific Principles, G. Nikolic (Ed.), pp. 295–320, InTech (2011). ISBN: 978-953-307-231-9, InTech, Available from. http://www.intechopen.com/books/fourier-transforms-approach-to-scientificprinciples/fourier-transform-mass-spectrometry-for-the-molecular-level-characterization-of-natural-organicmatt.Search in Google Scholar
 H. Tetens. in Enzyklopädie Philosophie und Wissenschaftstheorie Band 3, J. Mittelstraß (Ed.), p. 593f, Metzler, Stuttgart, Germany (1995).Search in Google Scholar
 J. M. Lehn. Top. Curr. Chem.322, 1 (2012).Search in Google Scholar
 J. Winn, C. Bishop, T. Diethe J. Guiver, Y. Zaykov. Model-Based Machine Learning. ©2013–19 Winn, MBML v0.7, Microsoft Research Cambridge, Cambridge (2013).Search in Google Scholar
The online version of this article offers supplementary material (https://doi.org/10.1515/pac-2019-0809).
© 2020 IUPAC & De Gruyter. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For more information, please visit: http://creativecommons.org/licenses/by-nc-nd/4.0/