Licensed under CC BY-NC-ND 3.0. Open Access. Published by De Gruyter, April 2, 2013.

Quantitative spectroscopic analysis of heterogeneous systems: chemometric methods for the correction of multiplicative light scattering effects

Juan Zhang, Sheng-Zi Liu, Jing Yang, Mi Song, Jing Song, Hai-Li Du and Zeng-Ping Chen

Abstract

Owing to their high measuring speed and the need for little or no sample preparation, spectroscopic technologies have been increasingly applied in food technology, agriculture, and pharmaceutics, where heterogeneous samples are encountered more frequently than homogeneous ones. For heterogeneous samples, uncontrolled variations in optical path length caused by physical variations such as particle size and size distribution, particle shape, and sample packing may produce dominant multiplicative light scattering perturbations. These perturbations mask the spectral variations related to differences in the content of chemical compounds and hence deteriorate the accuracy and reliability of quantitative spectroscopic analysis. Consequently, with a view to improving the accuracy and reliability of spectroscopic analysis of heterogeneous samples, a number of chemometric techniques have been developed to correct for multiplicative light scattering effects. This paper reviews the currently available correction techniques for absorption spectroscopy, with emphasis on their theories, limitations, and suitability for practical applications.

Introduction

Spectroscopic technologies offer high measuring speed and require little or no sample preparation, which makes them highly suitable for in-line/on-line process monitoring (Siesler et al. 2002). As a result, spectroscopic technologies such as near-infrared (NIR), mid-infrared, UV-Vis, Raman, and X-ray diffraction have been increasingly applied to process monitoring in food technology (Ghosh and Jayas 2009, Lu and Rasco 2012), agriculture (Stevens et al. 2006, Zhang et al. 2012), and particularly pharmaceutics. In the pharmaceutical area, for example, spectroscopic techniques have been used in the development and production of active pharmaceutical ingredients (Virtanen and Maunu 2010, Sun et al. 2010), fermentation (Lopes and Menezes 2003, Jørgensen et al. 2004, Ferreira and Menezes 2006, Nordon et al. 2008), crystallization (Birch et al. 2005), granulation (Jørgensen et al. 2002), blending (Blanco et al. 2002, Popo et al. 2002), drying (Parris et al. 2005, De Beer et al. 2007), and the monitoring of raw materials (Kirdar et al. 2010). During in-line/on-line quantitative monitoring of complex chemical and biochemical processes, calibration models are indispensable for transforming the abundant spectroscopic measurements into the desired chemical and biological information (in most cases, the concentrations of chemical and biological compounds). The calibration methods used in quantitative spectroscopic process monitoring are mostly multivariate linear calibration models, whose underlying assumption is that a linear relationship exists between the spectroscopic measurements and the concentrations of the target chemical components. This assumption generally holds when spectroscopic techniques are applied to homogeneous samples.
However, in industrial on-line and in-line applications, heterogeneous samples are encountered more frequently than homogeneous ones. For heterogeneous samples, the relationship between the spectroscopic measurements and the concentrations of the target chemical components may deviate significantly from a linear model. Many practical problems in the quantitative analysis of heterogeneous samples by spectroscopic techniques remain unresolved; among them, variation in optical path length due to physical differences between samples may cause multiplicative light scattering and hence significantly deteriorate the reliability and predictive accuracy of multivariate calibration models. The development of reliable calibration models for processes subject to variations in physical properties (such as particle size, sample compactness, and surface topology) is therefore an increasing concern. Accurate quantitative spectroscopic analysis of heterogeneous samples is possible only if appropriate techniques are used to compensate for the effects of multiplicative light scattering. Over the past three decades or so, a number of correction methods have been developed to address these effects in the quantitative analysis of heterogeneous samples by absorption spectroscopy (Geladi et al. 1985, Barnes et al. 1989, Helland et al. 1995, Ishimaru 1997, Pedersen et al. 2002, Martens et al. 2003, Chen et al. 2006, 2008, 2009, 2011, 2012, Thennadil et al. 2006, Chen and Morris 2008, 2009, Kessler et al. 2009, Steponavicius and Thennadil 2009, 2011, Ottestad et al. 2010, Wang et al. 2011, Jin et al. 2012), fluorescence spectroscopy (Durkin et al. 1994, Sterenborg et al. 1996, Horowitz et al. 1998, Pogue and Burke 1998, Finlay et al. 2001, Weersink et al. 2001, Chen et al. 2002, Biswal et al. 2003, Bradley and Thorniley 2006), and Raman spectroscopy (Bechtel et al. 2008, Shih et al. 2008, Barman et al. 2009). Bradley and Thorniley (2006) have recently reviewed the correction techniques for fluorescence spectroscopy. This paper focuses only on the chemometric correction techniques available for absorption spectroscopy. For convenience of presentation, these methods are broadly classified into three categories, namely empirical, semi-empirical, and physical model-based techniques, and are reviewed in terms of their theories, limitations, and suitability for practical applications.

Empirical techniques

For I transparent solutions comprising K absorbing chemical components, measured in a cuvette whose width (path length) is kept constant, Beer-Lambert’s law states that the theoretical absorbance spectrum $\mathbf{x}_{i,\mathrm{Chem}}$ (a row vector) of sample i is a linear combination of the absorbance contributions of all K components:

$$\mathbf{x}_{i,\mathrm{Chem}} = \sum_{k=1}^{K} c_{i,k}\,\mathbf{r}_k \tag{1}$$

where the row vector $\mathbf{r}_k$ is the pure absorption spectrum and $c_{i,k}$ is the concentration of the k-th component in sample i. Provided that the $\mathbf{r}_k$ (k = 1, 2, …, K) are linearly independent, a multivariate linear calibration model built between $\mathbf{x}_{i,\mathrm{Chem}}$ and $c_{i,k}$ (i = 1, 2, …, I) can provide satisfactory predictions of the concentration of component k in future solution samples. If the samples to be analyzed are solids (powders, granules), emulsions, or dispersions, it is practically difficult to keep the optical path length constant across samples. Methods for correcting the multiplicative light scattering effects caused by variation in optical path length are therefore indispensable for accurate quantitative spectroscopic analysis of heterogeneous samples involving solids.
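To make the consequence of a varying optical path length concrete, the following NumPy sketch builds noise-free spectra from Eq. (1) using two hypothetical Gaussian bands and then scales each spectrum by an assumed path-length factor $b_i$. The wavelength grid, bands, concentrations, and factors are all made up for illustration. Fitting each scaled spectrum onto the pure spectra by classical least squares returns the products $b_i c_{i,k}$ rather than $c_{i,k}$, so a linear calibration no longer holds unless the factor is removed.

```python
import numpy as np

def gaussian_band(wl, center, width):
    """Hypothetical absorption band with unit peak height."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

wl = np.linspace(850.0, 1050.0, 101)                 # assumed wavelength grid (nm)
R = np.vstack([gaussian_band(wl, 910.0, 20.0),       # pure spectrum r_1 (hypothetical)
               gaussian_band(wl, 990.0, 30.0)])      # pure spectrum r_2 (hypothetical)
C = np.array([[0.2, 0.8],
              [0.5, 0.5],
              [0.9, 0.1]])                           # c_{i,k} as weight fractions
X_chem = C @ R                                       # Eq. (1), one spectrum per row

b = np.array([1.0, 1.4, 0.7])                        # assumed path-length factors b_i
X_obs = b[:, None] * X_chem                          # multiplicative scattering effect

# Classical least squares on the pure spectra: x_i ~ sum_k coef[i, k] * r_k
coef, *_ = np.linalg.lstsq(R.T, X_obs.T, rcond=None)
coef = coef.T                                        # coef[i, k] equals b_i * c_{i,k}

# The simulated fractions sum to one, so normalising each row removes b_i
fractions = coef / coef.sum(axis=1, keepdims=True)
```

The normalisation step works here only because all pure spectra are known and the simulated fractions sum to one; when pure spectra are unknown or incomplete, this shortcut is not available, which is the situation the correction methods reviewed below address.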

Multiplicative scatter correction (MSC)

The underlying assumption of MSC, developed by the Martens group (Geladi et al. 1985), is that the spectrum of each sample is approximately a linear function of the mean spectrum $\bar{\mathbf{x}}$ of a set of samples:

$$\mathbf{x}_i = a_i\mathbf{1} + b_i\bar{\mathbf{x}} + \mathbf{e}_i \tag{2}$$

where $\mathbf{x}_i$ is the spectrum of the i-th sample, $\mathbf{1}$ is a row vector whose elements all equal unity, and $\mathbf{e}_i$ is the residual spectrum, which ideally represents the chemical information in spectrum i. With the model parameters $a_i$ and $b_i$ estimated by least squares, the corrected spectrum $\mathbf{x}_{i,\mathrm{MSC}}$ is calculated according to Eq. (3):

$$\mathbf{x}_{i,\mathrm{MSC}} = \frac{\mathbf{x}_i - a_i\mathbf{1}}{b_i} \tag{3}$$

MSC gives reliable results only when its parameters are estimated from a part of the spectrum that contains no chemical information and is influenced only by multiplicative light scattering. If this condition is not satisfied, the estimated intercept and slope may absorb spectral information relating to the analyte of interest, and that information is lost during the correction.
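A minimal NumPy sketch of MSC as described by Eqs. (2) and (3) is given below. It uses the mean spectrum as the reference by default; the function name and return values are our own choices, not a published implementation. It fits the intercept and slope over all supplied wavelengths, so the limitation just discussed applies unless the caller restricts `X` to a chemically uninformative region.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction (sketch of Eqs. 2-3).

    X: (I, J) array, one spectrum per row.
    reference: optional (J,) reference spectrum; defaults to the mean spectrum.
    Returns the corrected spectra and the fitted intercepts a_i and slopes b_i.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    # Design matrix [1, ref]; every spectrum (column of X.T) is regressed on it
    design = np.column_stack([np.ones_like(ref), ref])
    coefs, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    a, b = coefs                                   # each of shape (I,)
    X_msc = (X - a[:, None]) / b[:, None]          # Eq. (3)
    return X_msc, a, b
```

For spectra that differ from one another only by an offset and a scale factor, every corrected row equals the mean spectrum, which is the ideal case MSC is designed for.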

Inverted scatter correction (ISC) and its extension

ISC, proposed by Helland et al. (1995), is an alternative procedure similar to MSC for correcting multiplicative light scattering. Whereas MSC regresses each spectrum on a reference spectrum (the mean spectrum), ISC takes the reference spectrum as the regressand and the spectrum to be corrected as the regressor, and the fitted relation is then used as the corrected spectrum:

$$\bar{\mathbf{x}} = a_i\mathbf{1} + b_i\mathbf{x}_i + \mathbf{e}_i, \qquad \mathbf{x}_{i,\mathrm{ISC}} = a_i\mathbf{1} + b_i\mathbf{x}_i$$

This so-called “forward” model implies a statistical difference between ISC and MSC: ISC minimizes the noise on the average/reference spectrum instead of the noise on the individual spectra, as MSC does. Despite this subtle statistical difference, ISC has the same application limitations as MSC.
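The ISC regression can be sketched in the same way. In this illustrative NumPy version (the function name is hypothetical), the reference spectrum is regressed on `[1, x_i]` for each sample and the fitted line is applied to `x_i`, which maps every spectrum onto the scale of the reference.

```python
import numpy as np

def isc(X, reference=None):
    """Inverted scatter correction (sketch): regress the reference on each spectrum.

    For each spectrum x_i, fit ref ~ p_i * 1 + q_i * x_i by least squares and
    return p_i + q_i * x_i as the corrected spectrum.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    X_isc = np.empty_like(X)
    for i, x in enumerate(X):
        design = np.column_stack([np.ones_like(x), x])      # regressors: 1 and x_i
        (p, q), *_ = np.linalg.lstsq(design, ref, rcond=None)
        X_isc[i] = p + q * x                                # spectrum on reference scale
    return X_isc
```

For spectra that differ from one another only by an offset and a scale factor, this sketch returns the mean spectrum for every sample, just as MSC would; the two methods differ in which side of the regression is assumed to carry the noise.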

Pedersen et al. (2002) further extended ISC to include linear and quadratic terms of the wavelength and a quadratic term of the spectrum to be corrected (extended inverted signal correction, EISC):

$$\bar{\mathbf{x}} = a_i\mathbf{1} + b_i\mathbf{x}_i + q_i\mathbf{x}_i^2 + d_i\boldsymbol{\lambda} + g_i\boldsymbol{\lambda}^2 + \boldsymbol{\varepsilon}_i$$

where squares of vectors are taken element-wise, and the wavelength row vector $\boldsymbol{\lambda}$ is a linear function of wavelength (in nanometers) scaled so that its entries lie between −1 and +1. As stated by Pedersen et al., the quadratic term in the spectrum to be corrected was included to model complex optical phenomena, other than multiplicative light scattering, that are caused by sample heterogeneity. The linear and quadratic wavelength terms account for smooth wavelength-dependent spectral variations that may be present between samples, and hence improve the estimation of the basic interference effects, i.e., the intercept and slope relating to multiplicative light scattering. Nevertheless, both ISC and EISC have the same application limitations as MSC.
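A sketch of EISC along the lines described above follows. The regression of the reference spectrum on 1, `x_i`, `x_i**2`, λ, and λ² follows the text; how the corrected spectrum is formed from the fitted coefficients is our assumption (we keep only the terms in `x_i` and discard the smooth wavelength terms), so the original paper by Pedersen et al. (2002) should be consulted before relying on it. The function name is hypothetical.

```python
import numpy as np

def eisc(X, reference=None):
    """Sketch of extended inverted signal correction.

    Fits ref ~ p + q*x + h*x**2 + d*lam + e*lam**2 for each spectrum x.
    ASSUMPTION: the corrected spectrum is p + q*x + h*x**2, i.e. the smooth
    wavelength terms d*lam + e*lam**2 are discarded.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    J = X.shape[1]
    lam = np.linspace(-1.0, 1.0, J)                 # wavelength axis scaled to [-1, 1]
    X_eisc = np.empty_like(X)
    for i, x in enumerate(X):
        design = np.column_stack([np.ones(J), x, x ** 2, lam, lam ** 2])
        coef, *_ = np.linalg.lstsq(design, ref, rcond=None)
        X_eisc[i] = design[:, :3] @ coef[:3]        # keep only the terms in x_i
    return X_eisc
```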

Standard normal variate (SNV) transformation

The SNV transformation (Barnes et al. 1989) removes the effects of multiplicative light scattering on each spectrum by subtracting the spectrum mean and dividing by the spectrum standard deviation. In contrast to MSC, the transformation is applied to each spectrum individually, without a reference spectrum:

$$x_{ij,\mathrm{SNV}} = \frac{x_{ij} - \bar{x}_{i\cdot}}{\sqrt{\sum_{j=1}^{J}\left(x_{ij} - \bar{x}_{i\cdot}\right)^2/(J-1)}}$$

where $x_{ij,\mathrm{SNV}}$ is the transformed value of the original spectral element $x_{ij}$ of the i-th sample at variable j, $\bar{x}_{i\cdot}$ denotes the mean of spectrum i, and J is the number of variables (i.e., wavelengths) in the spectrum. SNV does effectively remove the multiplicative light scattering effect on each spectrum. However, because the standard deviation of a spectrum also depends on its chemical composition, SNV introduces another multiplicative factor (related to the variations in constituent concentrations across samples) into the corrected spectrum, and this factor is as detrimental to quantitative spectroscopic analysis as multiplicative light scattering itself.
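SNV needs no reference spectrum. The NumPy sketch below (function name hypothetical) implements the row-wise centring and scaling with J − 1 in the denominator. The check shows that any offset and positive scale applied to a spectrum vanish after the transformation, which is also why SNV cannot tell a scale change caused by path length from one caused by composition.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)     # J - 1 in the denominator
    return (X - mean) / std

x = np.array([[1.0, 2.0, 4.0, 3.0, 5.0]])          # hypothetical spectrum
z = snv(x)
z_scaled = snv(2.0 * x + 3.0)                      # same spectrum, offset and scaled
```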

Semi-empirical techniques

Extended multiplicative scatter correction (EMSC)

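In EMSC (Martens et al. 2003, Ottestad et al. 2010), the effect of light scattering caused by changes in optical path length due to physical variations of samples is approximated by the following model:

$$\mathbf{x}_i = a_i\mathbf{1} + b_i\mathbf{x}_{i,\mathrm{chem}} + d_i\boldsymbol{\lambda} + g_i\boldsymbol{\lambda}^2 + \boldsymbol{\varepsilon}_i \tag{9}$$

where the coefficients $a_i$ and $b_i$ denote the additive and multiplicative effects of light scattering due to the physical variations of sample i relative to a reference sample; the coefficients $d_i$ and $g_i$ account for smooth wavelength-dependent spectral variations that may be present between samples; and $\boldsymbol{\varepsilon}_i$ captures unknown sources of spectral variation. The chemical spectrum $\mathbf{x}_{i,\mathrm{chem}}$ can be expressed in terms of variations around a chosen reference spectrum $\mathbf{m}$ (row vector):

$$\mathbf{x}_{i,\mathrm{chem}} = \mathbf{m} + \sum_{k=1}^{K}\Delta c_{i,k}\,\mathbf{r}_k \tag{10}$$

where $\mathbf{m}$ is some reference spectrum, for example measured on a “typical” sample or computed as the mean of a set of spectra, and $\Delta c_{i,k}$ is the concentration difference of the k-th constituent between the i-th sample and the reference. Substituting Eq. (10) into Eq. (9) gives:

$$\mathbf{x}_i = a_i\mathbf{1} + b_i\mathbf{m} + \sum_{k=1}^{K} h_{i,k}\,\mathbf{r}_k + d_i\boldsymbol{\lambda} + g_i\boldsymbol{\lambda}^2 + \boldsymbol{\varepsilon}_i, \qquad h_{i,k} = b_i\,\Delta c_{i,k} \tag{11}$$

If the pure spectra of all the chemical components in the mixture samples are known a priori and the regressors ($\mathbf{1}$, $\mathbf{m}$, $\mathbf{r}_k$, $\boldsymbol{\lambda}$, and $\boldsymbol{\lambda}^2$) are linearly independent, the coefficients $a_i$, $b_i$, $h_{i,k}$, $d_i$, and $g_i$ can be estimated by least squares regression of the regressand $\mathbf{x}_i$ on the model regressor matrix $\mathbf{M} = [\mathbf{1}; \mathbf{m}; \mathbf{r}_1; \mathbf{r}_2; \ldots; \mathbf{r}_K; \boldsymbol{\lambda}; \boldsymbol{\lambda}^2]$. The corrected spectrum $\mathbf{x}_{i,\mathrm{EMSC}}$ is then calculated according to Eq. (12):

$$\mathbf{x}_{i,\mathrm{EMSC}} = \frac{\mathbf{x}_i - a_i\mathbf{1} - d_i\boldsymbol{\lambda} - g_i\boldsymbol{\lambda}^2}{b_i} \tag{12}$$
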
As Eq. (11) shows, the applicability of EMSC depends on the availability of the pure spectra of all chemical components present in the samples, and on the assumption that each component contributes to the mixture spectra exactly as it does in its isolated pure state. Such stringent requirements can hardly be satisfied in real-world applications.
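The least squares step of EMSC in Eqs. (11) and (12) can be sketched as follows. The sketch assumes that the pure spectra are supplied as rows of `R`, that λ is scaled to [−1, 1] (the scaling is not specified here for EMSC), and that the regressors are linearly independent. In particular, `m` must not be an exact linear combination of the supplied pure spectra; otherwise the design matrix is rank-deficient and $b_i$ is not identifiable. The function name is hypothetical.

```python
import numpy as np

def emsc(X, m, R):
    """EMSC with known pure-component spectra (sketch of Eqs. 11-12).

    X: (I, J) measured spectra; m: (J,) reference spectrum;
    R: (K, J) pure spectra r_k. Returns the corrected spectra and the
    coefficient matrix with columns [a, b, h_1..h_K, d, g] (one row per sample).
    """
    X = np.asarray(X, dtype=float)
    J = X.shape[1]
    lam = np.linspace(-1.0, 1.0, J)                  # ASSUMPTION: scaled wavelength axis
    M = np.column_stack([np.ones(J), m, np.asarray(R, dtype=float).T, lam, lam ** 2])
    coefs, *_ = np.linalg.lstsq(M, X.T, rcond=None)  # shape (K + 4, I)
    coefs = coefs.T
    a, b = coefs[:, 0], coefs[:, 1]
    d, g = coefs[:, -2], coefs[:, -1]
    # Eq. (12): remove additive and smooth terms, then divide by b_i
    X_emsc = (X - a[:, None] - d[:, None] * lam - g[:, None] * lam ** 2) / b[:, None]
    return X_emsc, coefs
```

Because the fitted $h_{i,k}$ equal $b_i\,\Delta c_{i,k}$, the corrected spectrum in a noise-free case is exactly $\mathbf{m} + \sum_k \Delta c_{i,k}\mathbf{r}_k$, i.e., Eq. (10).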

Extended multiplicative scatter correction in logarithm form

Thennadil et al. further modified EMSC by converting spectral measurements into logarithm form (Thennadil et al. 2006).

To estimate the model parameters, it is assumed that $\log(\mathbf{x}_{i,\mathrm{chem}})$ can be approximated by a linear combination of the logarithms of the pure component spectra and of the reference spectrum $\mathbf{m}$.

The coefficients of the logarithms of the pure component spectra (k = 1, 2, …, K) in this combination are model parameters that do not relate directly to the concentrations of the constituents in the samples. With the pure component spectra at hand, the coefficients $a_i$, $b_i$, and $d_i$ can be estimated by a procedure similar to that of EMSC, and the corrected spectra ($\mathbf{x}_{i,\mathrm{corr}}$) are obtained according to Eq. (15).

Clearly, EMSC in logarithm form has the same intrinsic limitations as EMSC, and possesses no obvious advantages over EMSC.

Optical path length estimation and correction (OPLEC)

The parameter estimation and spectral correction of both EMSC and its modified version rely on the availability of the pure spectrum of every chemical component in the mixture samples. This limitation arises because the spectral correction step requires all the model parameters. In fact, when the ultimate goal of correcting multiplicative light scattering effects is to build a robust and reliable calibration model, only the parameter $b_i$, which carries the multiplicative effect of optical path length variation, is needed. Therefore, when no prior information about the pure spectra of all chemical components is available, the key to correcting multiplicative light scattering effects is to estimate $b_i$ for both calibration and test samples from their spectra. OPLEC and its modified version, proposed by the Chen group (Chen et al. 2006, Jin et al. 2012), provide an elegant solution to this problem.

OPLEC and its modified version adopt the following model to approximate the relationship between the spectral measurements and the concentrations of the chemical constituents in mixture samples.

Let us arbitrarily assume that the first chemical component in Eq. (16) is the analyte of interest and that $\sum_{k=1}^{K} c_{i,k} = 1$ (which strictly holds when $c_{i,k}$ is a unit-free concentration such as a weight fraction or mole fraction). Eq. (16) can then also be expressed as:

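Suppose the singular value decomposition of $\mathbf{X}_{\mathrm{cal}}$ ($\mathbf{X}_{\mathrm{cal}} = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_I]$) is expressed as follows:

$$\mathbf{X}_{\mathrm{cal}} = \mathbf{U}_r\mathbf{S}_r\mathbf{V}_r^{\mathrm{T}} + \mathbf{U}_n\mathbf{S}_n\mathbf{V}_n^{\mathrm{T}} \tag{18}$$

where the superscript ‘T’ denotes the transpose, and the subscripts ‘r’ and ‘n’ signify that the corresponding factors represent spectral information and noise, respectively. According to Eq. (17), both the vector $\mathbf{b} = [b_1; b_2; \ldots; b_I]$ and the vector $\mathrm{diag}(\mathbf{c}_1)\mathbf{b} = [b_1 c_{1,1}; b_2 c_{2,1}; \ldots; b_I c_{I,1}]$ lie in the column space of $\mathbf{U}_r$, so the following equations hold:

$$(\mathbf{I} - \mathbf{U}_r\mathbf{U}_r^{\mathrm{T}})\,\mathbf{b} = \mathbf{0}, \qquad (\mathbf{I} - \mathbf{U}_r\mathbf{U}_r^{\mathrm{T}})\,\mathrm{diag}(\mathbf{c}_1)\,\mathbf{b} = \mathbf{0} \tag{19}$$

Because the absolute values of the elements of $\mathbf{b}$ need not be known, it can be assumed that every element of $\mathbf{b}$ is no less than unity (i.e., $\mathbf{b} \geq \mathbf{1}$). A vector $\mathbf{b}$ satisfying Eq. (19) can then be obtained by solving the following constrained optimization problem, which can be solved with the quadprog function in MATLAB (see Appendix 1 for the detailed MATLAB code).

where $\|\cdot\|_2$ denotes the $l_2$ norm and w is a weight that balances the two parts of the objective function; it can simply be set to the maximum element of $\mathbf{c}_1$ ($\mathbf{c}_1 = [c_{1,1}; c_{2,1}; \ldots; c_{I,1}]$).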
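The estimation of b in Eqs. (18)–(20) can be sketched in Python. Instead of MATLAB's quadprog, this sketch uses SciPy's bounded linear least squares solver `scipy.optimize.lsq_linear`, which minimises a sum of squared residuals subject to `b >= 1` and is therefore an equivalent way of posing this quadratic program. The exact objective is given in the original Eq. (20) and Appendix 1; where the weight `w` enters is an assumption here (we weight the first term), and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def estimate_path_factors(X_cal, c1, n_factors, w=1.0):
    """Sketch of the OPLEC estimation of b from calibration spectra.

    X_cal: (I, J) calibration spectra; c1: (I,) analyte fractions;
    n_factors: number of spectral (non-noise) factors kept in U_r.
    ASSUMPTION: minimises
        w * ||(I - Ur Ur^T) b||^2 + ||(I - Ur Ur^T) diag(c1) b||^2
    subject to b >= 1.
    """
    X_cal = np.asarray(X_cal, dtype=float)
    c1 = np.asarray(c1, dtype=float)
    U, s, Vt = np.linalg.svd(X_cal, full_matrices=False)
    Ur = U[:, :n_factors]                               # spectral-information factors
    P_perp = np.eye(X_cal.shape[0]) - Ur @ Ur.T         # projector onto the noise space
    # Stack both residual terms; P_perp * c1 equals P_perp @ np.diag(c1)
    A = np.vstack([np.sqrt(w) * P_perp, P_perp * c1])
    res = lsq_linear(A, np.zeros(A.shape[0]), bounds=(1.0, np.inf))
    return res.x
```

In noise-free simulations b is recovered only up to a common scale factor, consistent with the statement that its absolute values are not needed. With real data, `n_factors` has to be chosen, for example by inspecting the singular values.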

It is evident from Eq. (17) that linear relationships exist between $\mathbf{x}_i$ and $b_i$, and between $\mathbf{x}_i$ and $c_{i,1}b_i$. After the multiplicative parameter vector $\mathbf{b}$ of the I calibration samples has been estimated, the following two calibration models can be built by multivariate linear calibration methods such as partial least squares (PLS).

Here, $\mathrm{diag}(\mathbf{c}_1)$ denotes the diagonal matrix whose diagonal elements are the elements of $\mathbf{c}_1$. Once the model parameters $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ have been estimated by PLS, the concentration $c_{\mathrm{test},1}$ of the target constituent in a test sample can be predicted from its spectrum $\mathbf{x}_{\mathrm{test}}$ through a dual calibration strategy, i.e., by dividing the prediction of the second calibration model by the corresponding prediction of the first.
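The dual calibration strategy can be illustrated with a small self-contained NumPy example. Because the text leaves the choice of PLS implementation open, a minimal NIPALS PLS1 with mean-centring is written out here. The function names are hypothetical, and the `b` passed in would in practice come from the OPLEC estimation step rather than being known.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Minimal NIPALS PLS1 with mean-centring; returns (x_mean, y_mean, beta)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)              # unit weight vector
        t = Xd @ w                          # scores
        tt = t @ t
        p = Xd.T @ t / tt                   # X loadings
        q = (yd @ t) / tt                   # y loading
        Xd = Xd - np.outer(t, p)            # deflate X
        yd = yd - q * t                     # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    beta = W @ np.linalg.solve(P.T @ W, Q)  # regression vector in original X space
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    x_mean, y_mean, beta = model
    return (np.asarray(X, dtype=float) - x_mean) @ beta + y_mean

def dual_calibration(X_cal, b, c1, X_test, n_components):
    """Predict c_test,1 as (model for c1*b) / (model for b)."""
    b = np.asarray(b, dtype=float)
    c1 = np.asarray(c1, dtype=float)
    model_b = pls1_fit(X_cal, b, n_components)          # x_i -> b_i
    model_cb = pls1_fit(X_cal, c1 * b, n_components)    # x_i -> c_{i,1} * b_i
    return pls1_predict(model_cb, X_test) / pls1_predict(model_b, X_test)
```

The ratio cancels the unknown path-length factor of each test spectrum, so no correction is ever applied to the test spectra themselves.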

OPLEC has some unique features. It estimates the parameter $b_i$ of the calibration samples, which accounts for the multiplicative effects of optical path length variation, without requiring the pure spectra of the chemical constituents in the mixture samples. Moreover, instead of preprocessing the spectra with the estimated parameters to correct the multiplicative light scattering effects, as all the other empirical and semi-empirical methods do, OPLEC adopts a unique “dual calibration strategy” to separate the concentration information of the target constituent in test samples from the multiplicative light scattering effects. As a result, OPLEC has a much wider range of potential applications (Chen and Morris 2008, 2009, Chen et al. 2008, 2009, 2011, 2012) than other existing methods designed for correcting multiplicative light scattering effects. Wang et al. (2011) compared the performance of a number of chemometric methods designed for this purpose, and their results suggested that OPLEC is the most promising of the methods investigated.

Physical model-based techniques

Radiative transfer equation based scatter correction and calibration method

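The main source of variation in absorption in turbid media is the variation in the path length traveled by photons. In principle, this variation can be eliminated by using radiative transfer theory to obtain a measure of absorption per unit length, which is independent of path length. The change in the intensity of light of a given wavelength traveling through a sample in a certain direction is described by the radiative transfer equation (RTE) (Ishimaru 1997):

$$\frac{\mathrm{d}I(r,\hat{\mathbf{z}},\lambda)}{\mathrm{d}r} = -\mu_t(\lambda)\,I(r,\hat{\mathbf{z}},\lambda) + \mu_s(\lambda)\int_{4\pi} p(\hat{\mathbf{z}},\hat{\mathbf{z}}',\lambda)\,I(r,\hat{\mathbf{z}}',\lambda)\,\mathrm{d}\omega' \tag{23}$$

where $I(r, \hat{\mathbf{z}}, \lambda)$ is the specific intensity at wavelength λ at a distance r from the source along the directional vector $\hat{\mathbf{z}}$; $\mu_a(\lambda)$ and $\mu_s(\lambda)$ are the bulk absorption and scattering coefficients at wavelength λ, respectively; $\mu_t(\lambda) = \mu_a(\lambda) + \mu_s(\lambda)$ is the total extinction coefficient at wavelength λ; and ω is the solid angle. The bulk absorption and scattering coefficients are proportional to the concentrations of the absorbing and scattering components, respectively. The function $p(\hat{\mathbf{z}}, \hat{\mathbf{z}}', \lambda)$ is the phase function, a measure of the angular distribution of scattered light, which can be expressed as a function of the anisotropy factor $g(\lambda)$ through the Henyey-Greenstein approximation:

$$p(\cos\theta) = \frac{1}{4\pi}\,\frac{1 - g^2(\lambda)}{\left[1 + g^2(\lambda) - 2g(\lambda)\cos\theta\right]^{3/2}}$$

where θ is the angle between $\hat{\mathbf{z}}$ and $\hat{\mathbf{z}}'$.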
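The Henyey-Greenstein phase function is normalised so that it integrates to one over all solid angles, and its mean cosine equals the anisotropy factor g. The NumPy check below verifies both properties numerically for an assumed g = 0.8, using azimuthal symmetry so that dω = 2π d(cos θ); the helper names are our own.

```python
import numpy as np

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function, normalised over the full 4*pi sphere."""
    return (1.0 - g ** 2) / (4.0 * np.pi * (1.0 + g ** 2 - 2.0 * g * cos_theta) ** 1.5)

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a particular NumPy version)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

g = 0.8                                         # assumed anisotropy factor
mu = np.linspace(-1.0, 1.0, 200001)             # grid over cos(theta)
p = henyey_greenstein(mu, g)

# With azimuthal symmetry, d(omega) = 2*pi * d(cos theta)
total = 2.0 * np.pi * trapezoid(p, mu)          # normalisation, close to 1
mean_cos = 2.0 * np.pi * trapezoid(mu * p, mu)  # mean cosine, close to g
```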

For a system with multiple constituents, the bulk absorption and scattering coefficients are the sums of the respective coefficients of the individual constituents:

$$\mu_a(\lambda) = \sum_{k=1}^{n_p} c_{p,k}\,\sigma_{ap,k}(\lambda) + \sum_{l=1}^{n_a} c_l\,\sigma_{a,l}(\lambda) \tag{25}$$

$$\mu_s(\lambda) = \sum_{k=1}^{n_p} c_{p,k}\,\sigma_{sp,k}(\lambda) \tag{26}$$

where $c_{p,k}$, $\sigma_{ap,k}(\lambda)$, and $\sigma_{sp,k}(\lambda)$ are, respectively, the concentration (expressed as the number of particles per unit volume), the absorption cross-section, and the scattering cross-section of particulate species k at wavelength λ; $\sigma_{a,l}(\lambda)$ and $c_l$ are the absorptivity and concentration of the purely absorbing species l; and $n_p$ and $n_a$ are the numbers of different particulate species and purely absorbing species present in the sample, respectively. As Eqs. (23)–(26) show, the RTE at each wavelength is defined by three variables, $\mu_a(\lambda)$, $\mu_s(\lambda)$, and $g(\lambda)$, so at least three measurements at each wavelength are needed to extract them. Note that $\mu_a(\lambda)$ is a measure of absorption per unit length and is independent of the path length traveled by the photons.

Based on the above RTE, Steponavicius and Thennadil proposed a methodology for estimation of concentrations of chemical components in suspensions (Steponavicius and Thennadil 2009, 2011). The RTE-based scatter correction and calibration method is essentially a two-step procedure: (1) acquisition of the bulk absorption coefficient μa(λ) from three spectral measurements (i.e., total diffuse transmittance, total diffuse reflectance, and collimated transmittance) by inverting the RTE; and (2) extraction of pertinent chemical information from μa(λ). As there is no analytical solution to the RTE, the inverse adding-doubling algorithm was used to obtain numerical solutions. After extracting the bulk absorption coefficient, EMSC was used to preprocess the resulting absorption spectra (i.e., the bulk absorption coefficient as a function of wavelength) with a view to further reduce unwanted variations. Finally, a multivariate calibration model was built on the preprocessed absorption spectra for estimating the concentration of the chemical component of interest.

Experimental results for proof-of-concept model systems show that the RTE-based scatter correction and calibration method can correct the multiplicative light scattering effects to some extent. However, the need to measure the total diffuse transmittance, total diffuse reflectance, and collimated transmittance of each sample, and to obtain the bulk absorption coefficient μa(λ) from these three measurements with the inverse adding-doubling algorithm, restricts the method to off-line spectroscopic analysis and greatly hinders its application to on-line/in-line quantitative monitoring of complex chemical and biochemical processes.

Correction method based on multivariate curve resolution with hard model constraints

Recently, Kessler et al. (2009) proposed correcting multiplicative light scattering effects by combining multivariate curve resolution (MCR) (De Juan and Tauler 2003) with the Kubelka-Munk model. The Kubelka-Munk function shown in Eq. (27) (Kubelka and Munk 1931, Kubelka 1948) is a simplified solution of radiative transfer theory:

$$F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{\kappa}{s} \tag{27}$$

where $R_\infty$ is the reflectance of an optically infinitely thick sample, and κ and s stand for the absorption and scattering coefficients, which are correlated with, but not identical to, the absorption and scattering coefficients $\mu_a(\lambda)$ and $\mu_s(\lambda)$ of the RTE (Thennadil 2008). If the reflectance $R_d$ of an optically thin sample of known geometrical thickness d and the reflectance $R_\infty$ of an optically thick sample are both measured, the absorption and scattering coefficients κ and s at each wavelength can be calculated from Eqs. (28) and (29).
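Eq. (27) gives only the ratio κ/s, which can be computed directly from the measured reflectance $R_\infty$ of an optically infinitely thick sample and, conversely, inverted to recover $R_\infty$ from κ/s. The NumPy sketch below (function names hypothetical) does both; separating κ and s themselves requires the thin-sample reflectance and Eqs. (28) and (29), which this sketch does not attempt.

```python
import numpy as np

def km_function(r_inf):
    """Kubelka-Munk function F(R_inf) = (1 - R_inf)**2 / (2 R_inf), equal to kappa / s."""
    r_inf = np.asarray(r_inf, dtype=float)
    return (1.0 - r_inf) ** 2 / (2.0 * r_inf)

def r_inf_from_ratio(k_over_s):
    """Invert Eq. (27): the root of R**2 - 2(1 + F)R + 1 = 0 that lies in (0, 1]."""
    f = np.asarray(k_over_s, dtype=float)
    return 1.0 + f - np.sqrt(f ** 2 + 2.0 * f)
```

For example, a reflectance of 0.5 corresponds to κ/s = 0.25, and the inverse maps 0.25 back to 0.5.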

Once the absorption and scattering spectra κ and s (i.e., the coefficients κ and s as functions of wavelength) have been calculated from the two reflectance spectra as described above, they are used as constraints in the multivariate curve resolution by alternating least squares (MCR-ALS) algorithm.

The measured spectra of the mixtures can be arranged in a data matrix $\mathbf{D}$ (I×J) and decomposed as

$$\mathbf{D} = \mathbf{C}\mathbf{S}^{\mathrm{T}} + \mathbf{E} \tag{30}$$

The spectra form the I rows of $\mathbf{D}$, and the measured responses at each wavelength form its J columns. The matrix $\mathbf{C}$ (I×K) describes the individual contributions (concentration profiles) of the K species involved, and $\mathbf{S}$ (J×K) holds the pure spectra profiles of these K species, so that $\mathbf{S}^{\mathrm{T}}$ (K×J) gives their spectral contributions over the J wavelengths. $\mathbf{E}$ (I×J) is the residual matrix containing the data variance not explained by the product $\mathbf{C}\mathbf{S}^{\mathrm{T}}$. MCR-ALS is an important and frequently used iterative approach for solving Eq. (30) (Azzouz and Tauler 2008): the optimization starts from initial guesses of $\mathbf{C}$ and $\mathbf{S}^{\mathrm{T}}$, which are then refined to yield profiles with chemical meaning.
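The alternating least squares loop for Eq. (30) can be sketched with SciPy's non-negative least squares routine `scipy.optimize.nnls`. This is a plain MCR-ALS with non-negativity on both C and S only; it does not implement the Kubelka-Munk hard-model constraints of Kessler et al., and the initial guess, iteration count, and function name are illustrative assumptions. Here `S` holds the pure spectra as columns, so that D ≈ C Sᵀ.

```python
import numpy as np
from scipy.optimize import nnls

def mcr_als(D, S_init, n_iter=50):
    """Plain MCR-ALS with non-negativity on C and S (no hard-model constraints).

    D: (I, J) data matrix; S_init: (J, K) initial estimate of the pure spectra.
    Alternates non-negative least squares for C (I x K) and S (J x K) so that
    D ~ C @ S.T. Returns C, S and the relative lack of fit.
    """
    D = np.asarray(D, dtype=float)
    S = np.asarray(S_init, dtype=float).copy()
    I, J = D.shape
    C = np.zeros((I, S.shape[1]))
    for _ in range(n_iter):
        for i in range(I):                  # row i of D: d_i ~ S @ c_i
            C[i], _ = nnls(S, D[i])
        for j in range(J):                  # column j of D: d_j ~ C @ s_j
            S[j], _ = nnls(C, D[:, j])
    lack_of_fit = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
    return C, S, lack_of_fit
```

Without further constraints, the resolved profiles are determined only up to rotational and scaling ambiguities, which is the gap the hard-model constraints in the original method are meant to close.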

The advantage of using the absorption and scattering spectra κ and s, calculated from the two reflectance spectra, as hard model constraints in MCR-ALS is the improved predictive performance of the calibration models. However, as with the RTE-based scatter correction and calibration method, the need to measure two reflectance spectra of the same sample at different geometrical thicknesses prevents this method from being applied to on-line/in-line quantitative monitoring of complex chemical and biochemical processes.

Application studies

In the following, some typical applications have been selected to illustrate the applicability of some of the correction methods discussed above.

NIR transmittance spectra of powder mixtures (Martens et al. 2003, Chen et al. 2006)

Five mixtures of gluten and starch powder with different weight ratios (1:0, 0.75:0.25, 0.5:0.5, 0.25:0.75, and 0:1) were prepared. From each of the five powder mixtures, five samples were randomly taken and loosely packed into five different glass cuvettes, and two consecutive transmittance spectra between 850 nm and 1050 nm were recorded for each sample. Each sample was then packed more firmly, and a further two consecutive transmittance spectra were recorded, giving a total of 100 spectra. Each of the 100 transmittance spectra was converted into an absorbance spectrum. The whole data set was divided into a calibration set and a test set: the 60 spectra of the three mixtures with gluten/starch ratios of 1:0, 0.5:0.5, and 0:1 formed the calibration set, and the remaining 40 spectra of the other two mixtures formed the test set. More details on this data set are given in the original paper of Martens et al. (2003).

As shown in Figure 1, the 20 replicate NIR absorbance spectra of each of the five gluten/starch mixtures differ significantly, indicating significant multiplicative light scattering effects. When the optimal PLS model with nine latent variables built on the raw calibration spectra was used to predict the content of the target constituent (gluten) in the test samples, its root mean square error of prediction for the test samples (RMSEPtest) was 0.024 (equivalent to an average relative error of 6.0%), which clearly demonstrates that PLS did not effectively model the multiplicative light scattering effects in the raw spectra (Figure 2). PLS models built on calibration spectra preprocessed by MSC, SNV, or EISC gave unacceptable predictions, with errors even larger than those of the PLS model built on the raw calibration spectra. The failure of MSC, SNV, and EISC on this powder mixture data set suggests that they are not suitable for samples with significant spectral variations resulting from changes in chemical composition. The application of either EMSC or OPLEC greatly improved the predictive accuracy of the PLS models; both attained an RMSEPtest value of 0.005, equivalent to an average relative error of about 1.0%. These results demonstrate the superiority of semi-empirical methods over empirical ones in correcting the detrimental effects of multiplicative light scattering.
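The RMSEP values quoted in this section are root mean square errors of prediction over a set of samples. A minimal NumPy version is sketched below (function name hypothetical); the average relative errors quoted alongside the RMSEP values are not reproduced because their exact definition is not given here.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a set of samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

For instance, predictions of 0.27 and 0.73 for true gluten fractions of 0.25 and 0.75 give an RMSEP of 0.02.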

Figure 1 The raw absorbance spectra of mixtures of gluten and starch powder with different weight ratios (black lines, 1:0; blue lines, 0.75:0.25; red lines, 0.5:0.5; green lines, 0.25:0.75; cyan lines, 0:1). (Reprinted with permission from Z.P. Chen et al. Anal. Chem. 2006, 78, 7674–7681. Copyright 2006 American Chemical Society.)

Figure 2 Predictive performance of PLS models built on spectra of the calibration powder samples preprocessed by different methods (red circle, the raw spectra; black star, MSC; cyan diamond, SNV; blue square, OPLEC; yellow triangle up, EMSC; green triangle down, EISC). (Reprinted with permission from Z.P. Chen et al. Anal. Chem. 2006, 78, 7674–7681. Copyright 2006 American Chemical Society.)

Tecator data (Borggaard and Thodberg 1992, Jin et al. 2012)

Tecator data are available at http://lib.stat.cmu.edu/datasets/tecator. This benchmark data set consists of NIR absorbance spectra of 240 finely chopped pure meat samples with different moisture, fat, and protein contents, recorded on a Tecator Infratec Food and Feed Analyzer in the wavelength range 850–1050 nm at 2 nm intervals (Borggaard and Thodberg 1992). The fat content of each sample was determined by the Soxhlet method, with values ranging from 2% to 59% fat. The 240 spectra were originally divided into a calibration set (129 samples), test set (43 samples), validation set (43 samples), extrapolation set for fat (8 samples), and extrapolation set for protein (17 samples). Because fat is taken as the target constituent in this review, the extrapolation set for protein is excluded.

It can be seen from Figure 3 that there are significant additive baseline effects in the Tecator data. Significant additive baseline effects strongly suggest that multiplicative effects are also present, because changes in the physical properties of samples generally produce both. Owing to these multiplicative effects, the optimal PLS calibration model built on the raw calibration spectra did not give satisfactory predictions for any of the four data sets, with RMSEP values of 1.7%, 2.7%, 2.3%, and 8.5% for the calibration, test, validation, and extrapolation sets, respectively. The application of SNV and MSC brought no significant improvement in the RMSEP values for the four data sets (Figure 4). Surprisingly, EISC did improve the predictions of the PLS calibration model for the Tecator data; nevertheless, its RMSEP value for the extrapolation set was still as high as 3.3%. In contrast, the modified OPLEC (OPLECm) outperformed all the other methods, with RMSEP values of 0.4%, 0.5%, 0.4%, and 1.0% for the calibration, test, validation, and extrapolation sets, respectively, which further confirms the effectiveness of OPLECm in correcting the detrimental influence of multiplicative effects.

Figure 3 The 129 raw calibration spectra of the Tecator data.

Figure 4 The RMSEP values for the Tecator data obtained by different calibration methods. (Reprinted with permission from J.W. Jin et al. Anal. Chem. 2012, 84, 320–326. Copyright 2012 American Chemical Society.)

Raman measurements of powder mixtures (Chen et al. 2011)

A total of 72 powder mixtures of potassium chromate and barium nitrate with different weight ratios (1:0, 0.90:0.10, 0.75:0.25, 0.60:0.40, 0.50:0.50, 0.40:0.60, 0.25:0.75, 0.10:0.90, and 0:1) and different particle sizes (425, 250, 180, 150, 125, 109, 96, and 75 μm) were prepared (Chen et al. 2011). From each of the 72 powder mixtures, a sample was randomly taken and loosely packed into a cylindrical sample cup, and its Raman spectrum was acquired. Each sample was then packed more firmly, and a further Raman spectrum was recorded. After two outliers were removed, the resulting 142 spectra were divided into a calibration set (78 spectra) and a test set (64 spectra). Raman signals between 292.8 and 1136.6 cm⁻¹ were selected for subsequent data analysis. More experimental details can be found in the original paper of Chen et al. (2011).

The particle size and compactness of the powder samples have significant influences on Raman peak intensities. For samples with the same mass ratio of potassium chromate to barium nitrate and the same particle size, a firmly packed sample gives markedly more intense Raman peaks than a loosely packed one (Figure 5A). Variations in particle size also have significant effects on the Raman spectra (Figure 5B). Figure 6 compares the performance of various PLS calibration models with that of the dual calibration strategy (DCS) based on OPLEC in correcting the detrimental multiplicative confounding effects caused by variations in the particle size and compactness of the powder samples. The RMSEP value of the optimal PLS model on the raw Raman spectra (PLS_raw) for the independent test samples was 0.08 (equivalent to an average relative error of 30.8%), which clearly demonstrates that PLS cannot discriminate the Raman intensity contributions due to variations in the mass fractions of the chemical constituents from those caused by changes in the particle size and compactness of a sample. Applying MSC, EISC, or SNV significantly deteriorated, rather than improved, the predictive ability of the PLS calibration models. As expected, the DCS model based on OPLEC attained an RMSEP value of 0.04 for the independent test samples, equivalent to an average relative error of 9.6%, which is less than one-third of the corresponding value of the optimal PLS_raw model. Such a significant reduction in prediction error demonstrates that the dual calibration strategy based on OPLEC can effectively model the confounding effects of the physical properties of a sample and improve the accuracy of quantitative analysis of powder samples by Raman spectrometry. It should therefore be of considerable benefit for the quantitative analysis of particulate samples such as powder blends and pharmaceutical dosage forms.
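
To see why a ratio of two calibration models can cancel a multiplicative effect, consider a toy two-component system in which each spectrum is x_i = b_i(c_i s_1 + (1 − c_i) s_2), with b_i a sample-specific multiplicative factor of the kind produced by packing or particle size. In the OPLEC-based dual calibration strategy, one model predicts b and a second predicts the product b·c, and the concentration is obtained as the ratio of the two predictions. The NumPy sketch below is illustrative only: the pure spectra and b values are invented, ordinary least squares stands in for PLS, and b is treated as known for the calibration samples rather than estimated by OPLEC.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
# Hypothetical pure-component spectra (two Gaussian bands)
s1 = np.exp(-((grid - 0.3) / 0.05) ** 2)
s2 = np.exp(-((grid - 0.7) / 0.08) ** 2)

def simulate(c, b):
    """Two-component spectra x_i = b_i * (c_i * s1 + (1 - c_i) * s2)."""
    return b[:, None] * (np.outer(c, s1) + np.outer(1.0 - c, s2))

# Calibration set; b is taken as known here, whereas OPLEC would estimate it
c_cal = rng.uniform(0.0, 1.0, 20)
b_cal = rng.uniform(0.6, 1.6, 20)
X_cal = simulate(c_cal, b_cal)

# Model 1: X -> b; model 2: X -> b*c (least squares standing in for PLS)
beta1, *_ = np.linalg.lstsq(X_cal, b_cal, rcond=None)
beta2, *_ = np.linalg.lstsq(X_cal, b_cal * c_cal, rcond=None)
# Naive single model X -> c, which the multiplicative effect makes nonlinear
beta_c, *_ = np.linalg.lstsq(X_cal, c_cal, rcond=None)

c_test = np.array([0.2, 0.5, 0.8])
b_test = np.array([0.7, 1.5, 1.1])
X_test = simulate(c_test, b_test)

c_dual = (X_test @ beta2) / (X_test @ beta1)  # ratio of the two predictions cancels b
c_naive = X_test @ beta_c
print(np.round(c_dual, 3))   # matches c_test in this noise-free toy
print(np.round(c_naive, 3))  # biased by the multiplicative factors b_test
```

With real, noisy spectra the two models would be built with PLS and a limited number of latent variables, and b would come from the OPLEC optimization rather than being known.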

Figure 5 (A) Raman spectra of a binary powder mixture sample (potassium chromate:barium nitrate = 0.90:0.10; particle size 425 μm) with different compactness (blue dashed line, firmly packed; red solid line, loosely packed); (B) peak intensity at 1047.5 cm⁻¹ vs. mass fraction of barium nitrate for different particle sizes (black circle, 180 μm; blue triangle, 109 μm; red square, 75 μm). (Reprinted with permission from Z.P. Chen et al. Anal. Chem. 2012, 84, 4088–4094. Copyright 2006 American Chemical Society.)

Figure 6 The RMSEP values of different calibration methods for both calibration and independent test samples. (Reprinted with permission from Z.P. Chen et al. Anal. Chem. 2012, 84, 4088–4094. Copyright 2006 American Chemical Society.)

Concluding remarks

As discussed above, empirical methods such as SNV, ISC, EISC, and MSC all impose stringent requirements on the spectra of the samples under study, and their results are reliable only when these requirements are fully satisfied. Such requirements can hardly be met in real-world applications. Special care should therefore be taken when empirical methods are used to correct multiplicative light scattering effects in spectral measurements.
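
For reference, the two most widely used empirical corrections fit in a few lines. The NumPy sketch below assumes spectra stored row-wise and, for MSC, uses the mean spectrum of the set as the reference, a common but not mandatory choice. Both transforms treat scattering as a per-spectrum offset and scale, so they can only separate it from chemical variation when the spectra meet requirements of the kind noted above.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction: fit x_i ~ a_i + b_i * ref, return (x_i - a_i) / b_i."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    design = np.column_stack([np.ones_like(ref), ref])    # columns: offset, reference spectrum
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)   # shape (2, n_samples)
    a, b = coef[0][:, None], coef[1][:, None]
    return (X - a) / b

# Spectra that differ only by an offset and a scale collapse onto one curve
base = np.array([0.1, 0.4, 0.9, 0.4, 0.1])
X = np.vstack([base, 0.5 + 2.0 * base, -0.2 + 0.7 * base])
print(np.allclose(snv(X)[0], snv(X)[1]), np.allclose(msc(X)[1], msc(X)[2]))  # True True
```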

Although physical model-based techniques impose no special requirements on the features of the sample spectra, they need two or more measurements (e.g., total diffuse transmittance, total diffuse reflectance, and collimated transmittance) of each turbid sample to calculate the so-called “pure” absorption and scattering spectra, from which the concentrations of the analytes of interest can then be determined. Consequently, physical model-based techniques are at present applicable only to off-line spectroscopic analysis. Their application to on-line/in-line quantitative monitoring of complex chemical and biochemical processes is hindered by the difficulty of designing an optical probe capable of acquiring two or more spectral measurements simultaneously.
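
The need for more than one measurement can be illustrated with the Kubelka-Munk remission function (Kubelka and Munk 1931). For an optically infinitely thick sample, a single diffuse reflectance measurement R∞ determines only the ratio of the absorption coefficient K to the scattering coefficient S:

$$F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{K}{S}$$

A change in S, for example from a different particle size, therefore alters F(R∞) in the same way as a change in K would; separating the two requires at least one further independent measurement, such as the diffuse transmittance of a sample of finite, known thickness.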

Semi-empirical methods such as EMSC and OPLEC are built on a semi-empirical model in which the effects of multiplicative light scattering on spectral measurements are approximated by an additive parameter and a multiplicative parameter. Although this model is relatively simple, these methods perform rather well. However, the application of EMSC and its modified versions is limited to relatively simple systems in which the pure spectra of all chemical constituents are available.
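
When the pure spectra are available, the semi-empirical model can be fitted to each spectrum by ordinary least squares. The sketch below is a simplified EMSC-type fit, not the full EMSC of Martens et al. (2003): it assumes x_i ≈ a_i + b_i Σ_j c_ij s_j with fractions that sum to one, so the fitted coefficients b_i c_ij can be normalized to remove b_i, and it omits the polynomial baseline terms that EMSC implementations commonly include. The function name `emsc_fractions` is hypothetical.

```python
import numpy as np

def emsc_fractions(x, pure_spectra):
    """Simplified EMSC-type fit: x ~ a * 1 + sum_j g_j * s_j, where g_j = b * c_j.

    Assumes the fractions c_j sum to one, so b = sum_j g_j and c_j = g_j / b.
    Returns the additive parameter a, the multiplicative parameter b and the fractions c.
    """
    S = np.asarray(pure_spectra, dtype=float)               # shape (n_components, n_points)
    design = np.column_stack([np.ones(S.shape[1]), S.T])    # columns: offset, pure spectra
    coef, *_ = np.linalg.lstsq(design, np.asarray(x, dtype=float), rcond=None)
    a, g = coef[0], coef[1:]
    b = g.sum()
    return a, b, g / b

# Toy example: two hypothetical pure spectra, offset 0.3, multiplicative factor 1.8
grid = np.linspace(0.0, 1.0, 40)
s = np.vstack([np.exp(-((grid - 0.3) / 0.1) ** 2),
               np.exp(-((grid - 0.7) / 0.1) ** 2)])
x = 0.3 + 1.8 * (0.25 * s[0] + 0.75 * s[1])
a, b, c = emsc_fractions(x, s)
print(round(a, 3), round(b, 3), np.round(c, 3))  # 0.3 1.8 [0.25 0.75]
```

The sketch makes the limitation noted above concrete: without every pure spectrum in `pure_spectra`, the least-squares fit cannot separate b from the chemical signal.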

Among all the correction methods discussed in this review, OPLEC is rather unique: it is a novel calibration strategy rather than a preprocessing method. Without requiring any prior information, such as the pure spectra of the chemical constituents in the samples, it can effectively mitigate the detrimental effects of multiplicative light scattering on spectral measurements and achieve fairly accurate concentration predictions for the analytes of interest in complex samples. It therefore seems reasonably safe to recommend OPLEC for the correction of multiplicative effects in quantitative spectroscopic analysis of heterogeneous systems.


Corresponding authors: Sheng-Zi Liu, The Medical College, Hunan Normal University, Changsha, Hunan 410006, PR China and Zeng-Ping Chen, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, Hunan 410082, PR China

The authors acknowledge the financial support of the National Natural Science Foundation of China (Grants 21075034, 21275046, and 21035001), the National Instrumentation Program of China (Grant 2011YQ0301240102), and the Program for New Century Excellent Talents in University (NCET-12-0161).

Appendix 1

The MATLAB code for the modified OPLEC method

% [b, fval] = OPLECm(X, c, CompNumb);
% This m-file estimates the multiplicative effect vector b for the calibration samples.
% X contains x_i in its rows; x_i (i = 1, 2, …, I) are the spectra of the I calibration samples.
% c is the concentration vector of the target chemical component in the calibration samples.
% CompNumb is the number of spectroscopically active chemical components in the mixture samples.
% b is a vector containing the multiplicative scattering parameters for the calibration samples.
% fval is the value of the objective function at b.
function [b, fval] = OPLECm(X, c, CompNumb)
% Orthonormal basis Us of the signal subspace: the first CompNumb left singular vectors of X
[U, ~, ~] = svd(X, 'econ');
Us = U(:, 1:CompNumb);
n = length(c);
w = max(c);                            % scales c so that its largest value is 1
H1 = eye(n, n) - Us*Us';               % projector onto the orthogonal complement of span(Us)
H2 = diag(c./w)*H1*diag(c./w);         % same penalty applied to diag(c/w)*b
H = H1 + H2;                           % matrix H in min(0.5*b'*H*b + f'*b)
f = zeros(n, 1);                       % vector f in min(0.5*b'*H*b + f'*b)
A = -eye(n, n);                        % matrix A in A*b <= p
p = -ones(n, 1);                       % vector p in A*b <= p, i.e. b >= 1
StartingVect = ones(n, 1);
options = optimset('quadprog');
options = optimset(options, 'LargeScale', 'off', 'Display', 'off');
[b, fval] = quadprog(H, f, A, p, [], [], [], [], StartingVect, options);
end
% After the multiplicative parameter vector b for the calibration samples has been obtained, two calibration models are built with a standard PLS toolbox: one between b and the spectral data X, the other between diag(c)*b and X. The concentration of the target component in a test sample is then predicted by dividing the prediction of the second model by that of the first, which cancels the multiplicative effect.
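
For readers without MATLAB, the estimation step of the function above can be sketched in NumPy. This is an assumption-laden port, not the authors' code: the bound-constrained quadratic program min 0.5·b'Hb subject to b ≥ 1 is solved by projected gradient descent with step 1/λmax(H) instead of quadprog, the names `oplecm_b` and `n_iter` are hypothetical, and the synthetic two-component data are invented for illustration. On noise-free data, b is recovered only up to a common scale factor, since any multiple of the true b that satisfies b ≥ 1 gives a zero objective.

```python
import numpy as np

def oplecm_b(X, c, n_comp, n_iter=10000):
    """Estimate multiplicative parameters b >= 1 by minimizing 0.5 * b' H b.

    H is built as in the MATLAB listing; projected gradient replaces quadprog.
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Us = U[:, :n_comp]
    n = c.size
    H1 = np.eye(n) - Us @ Us.T               # projector onto the complement of span(Us)
    D = np.diag(c / c.max())
    H = H1 + D @ H1 @ D
    step = 1.0 / np.linalg.eigvalsh(H).max()  # 1/L, L = largest eigenvalue of H
    b = np.ones(n)                            # same starting vector as the MATLAB code
    for _ in range(n_iter):
        b = np.maximum(1.0, b - step * (H @ b))  # gradient step, then project onto b >= 1
    return b, 0.5 * b @ H @ b

# Synthetic check: spectra x_i = b_i * (c_i * s1 + (1 - c_i) * s2)
grid = np.linspace(0.0, 1.0, 60)
s1 = np.exp(-((grid - 0.35) / 0.07) ** 2)
s2 = np.exp(-((grid - 0.65) / 0.07) ** 2)
rng = np.random.default_rng(1)
c_true = np.linspace(0.05, 1.0, 15)
b_true = rng.uniform(0.8, 1.5, 15)
X = b_true[:, None] * (np.outer(c_true, s1) + np.outer(1.0 - c_true, s2))
b_est, fval = oplecm_b(X, c_true, n_comp=2)
print(np.round(b_est / b_true, 4))  # a constant ratio: b is recovered up to scale
```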

References

Azzouz, T.; Tauler, R. Application of multivariate curve resolution alternating least squares (MCR-ALS) to the quantitative analysis of pharmaceutical and agricultural samples. Talanta 2008, 74, 1201–1210.

Barman, I.; Singh, G. P.; Dasari, R. R.; Feld, M. S. Turbidity-corrected Raman spectroscopy for blood analyte detection. Anal. Chem. 2009, 81, 4233–4240.

Barnes, R. J.; Dhanoa, M. S.; Lister, S. J. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc. 1989, 43, 772–777.

Bechtel, K. L.; Shih, W. C.; Feld, M. S. Intrinsic Raman spectroscopy for quantitative biological spectroscopy part II: experimental applications. Opt. Express 2008, 16, 12737–12745.

Birch, M.; Fussell, S. J.; Higginson, P. D.; McDowall, N.; Marziano, I. Towards a PAT-based strategy for crystallization development. Org. Process Res. Dev. 2005, 9, 360–364.

Biswal, N. C.; Gupta, S.; Ghosh, N.; Pradhan, A. Recovery of turbidity free fluorescence from measured fluorescence: an experimental approach. Opt. Express 2003, 11, 3320–3331.

Blanco, M.; González Bañó, R.; Bertran, E. Monitoring powder blending in pharmaceutical processes by use of near infrared spectroscopy. Talanta 2002, 56, 203–212.

Borggaard, C.; Thodberg, H. H. Optimal minimal neural interpretation of spectra. Anal. Chem. 1992, 64, 545–551.

Bradley, R. S.; Thorniley, M. S. A review of attenuation correction techniques for tissue fluorescence. J. R. Soc. Interf. 2006, 3, 1–13.

Chen, Z. P.; Morris, J. Improving the linearity of spectroscopic data subjected to fluctuations in external variables by the extended loading space standardization. Analyst 2008, 133, 914–922.

Chen, Z. P.; Morris, J. Process analytical technology and compensating for nonlinear effects in process spectroscopic data for improved process monitoring and control. Biotechnol. J. 2009, 4, 610–619.

Chen, X. D.; Xie, H. B.; Xu, Z.; Yu, D. Y. Correction of tissue autofluorescence by reflectance spectrum. Proc. SPIE 2002, 4916, 441–444.

Chen, Z. P.; Morris, J.; Martin, E. Extracting chemical information from spectral data with multiplicative light scattering effects by optical path-length estimation and correction. Anal. Chem. 2006, 78, 7674–7681.

Chen, Z. P.; Fevotte, G.; Caillet, A.; Littlejohn, D.; Morris, J. Advanced calibration strategy for in-situ quantitative monitoring of phase transition processes in suspensions using FT-Raman spectroscopy. Anal. Chem. 2008, 80, 6658–6665.

Chen, Z. P.; Morris, J.; Borissova, A.; Khan, S.; Mahmud, T.; Penchev, R.; Roberts, K. J. On-line monitoring of batch cooling crystallisation of organic compounds using ATR-FTIR spectroscopy coupled with an advanced chemometric calibration method. Chemom. Intell. Lab. Syst. 2009, 96, 49–58.

Chen, Z. P.; Zhong, L. J.; Nordon, A.; Littlejohn, D.; Holden, M.; Fazenda, M.; Harvey, L.; McNeil, B.; O’Kennedy, R. Calibration of multiplexing fibre-optic spectroscopy. Anal. Chem. 2011, 83, 2655–2659.

Chen, Z. P.; Li, L. M.; Jin, J. W.; Nordon, A.; Littlejohn, D.; Yang, J.; Zhang, J.; Yu, R. Q. Quantitative analysis of powder mixtures by Raman spectrometry: the influence of particle size and its correction. Anal. Chem. 2012, 84, 4088–4094.

De Beer, T. R. M.; Allesø, M.; Goethals, F.; Coppens, A.; Vander Heyden, Y.; Lopez De Diego, H.; Rantanen, J.; Verpoort, F.; Vervaet, C.; Remon, J. P.; Baeyens, W. R. G. Implementation of a process analytical technology system in a freeze-drying process using Raman spectroscopy for in-line process monitoring. Anal. Chem. 2007, 79, 7992–8003.

De Juan, A.; Tauler, R. Chemometrics applied to unravel multicomponent processes and mixtures: revisiting latest trends in multivariate resolution. Anal. Chim. Acta 2003, 500, 195–210.

Durkin, A. J.; Jaikumar, S.; Ramanujam, N.; Richards-Kortum, R. Relation between fluorescence spectra of dilute and turbid samples. Appl. Opt. 1994, 33, 414–423.

Ferreira, A. P.; Menezes, J. C. Monitoring a complex medium fermentation with sample-sample two-dimensional FT-NIR correlation spectroscopy. Biotechnol. Prog. 2006, 22, 866–872.

Finlay, J. C.; Conover, D. L.; Hull, E. L.; Foster, T. H. Porphyrin bleaching and PDT-induced spectral changes are irradiance dependent in ALA-sensitized normal rat skin in vivo. Photochem. Photobiol. 2001, 73, 54–63.

Geladi, P.; McDougall, D.; Martens, H. Linearization and scatter-correction for near-infrared reflectance spectra of meat. Appl. Spectrosc. 1985, 39, 491–500.

Ghosh, P. K.; Jayas, D. S. Use of spectroscopic data for automation in food processing industry. Sens. Instrument. Food Qual. Safe. 2009, 3, 3–11.

Helland, I. S.; Næs, T.; Isaksson, T. Related versions of the multiplicative scatter correction method for preprocessing spectroscopic data. Chemom. Intell. Lab. Syst. 1995, 29, 233–241.

Horowitz, B.; Gridin, V. V.; Bulatov, V.; Schechter, I. Laser-induced fluorescence of perylene in a microparticle suspension environment. Anal. Chem. 1998, 70, 3191–3197.

Ishimaru, A. Wave Propagation and Scattering in Random Media; IEEE Press/Oxford University Press: Oxford, 1997.

Jin, J. W.; Chen, Z. P.; Li, L. M.; Steponavicius, R.; Thennadil, S. N.; Yang, J.; Yu, R. Q. Quantitative spectroscopic analysis of heterogeneous mixtures: the correction of multiplicative effects caused by variations in physical properties of samples. Anal. Chem. 2012, 84, 320–326.

Jørgensen, A.; Rantanen, J.; Karjalainen, M.; Khriachtchev, L.; Räsänen, E.; Yliruusi, J. Hydrate formation during wet granulation studied by spectroscopic methods and multivariate analysis. Pharm. Res. 2002, 19, 1285–1291.

Jørgensen, P.; Pedersen, J. G.; Jensen, E. P.; Esbensen, K. H. On-line batch fermentation process monitoring (NIR) – introducing ‘biological process time’. J. Chemom. 2004, 18, 81–91.

Kessler, W.; Oelkrug, D.; Kessler, R. Using scattering and absorption spectra as MCR-hard model constraints for diffuse reflectance measurements of tablets. Anal. Chim. Acta 2009, 642, 127–134.

Kirdar, A. O.; Chen, G.; Weidner, J.; Rathore, A. S. Application of near-infrared (NIR) spectroscopy for screening of raw materials used in the cell culture medium for the production of a recombinant therapeutic protein. Biotechnol. Prog. 2010, 26, 527–531.

Kubelka, P. New contributions to the optics of intensely light-scattering materials, part I. J. Opt. Soc. Am. 1948, 38, 448–457.

Kubelka, P.; Munk, F. Ein Beitrag zur Optik der Farbanstriche. Z. Tech. Phys. 1931, 12, 593–620.

Lopes, J. A.; Menezes, J. C. Industrial fermentation end-product modelling with multilinear PLS. Chemom. Intell. Lab. Syst. 2003, 68, 75–81.

Lu, X.; Rasco, B. A. Determination of antioxidant content and antioxidant activity in foods using infrared spectroscopy and chemometrics: a review. Crit. Rev. Food Sci. Nutr. 2012, 52, 853–875.

Martens, H.; Nielsen, J. P.; Engelsen, S. B. Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures. Anal. Chem. 2003, 75, 394–404.

Nordon, A.; Littlejohn, D.; Dann, A. S.; Jeffkins, P. A.; Richardson, M. D.; Stimpson, S. L. In situ monitoring of the seed stage of a fermentation process using non-invasive NIR spectrometry. Analyst 2008, 133, 660–666.

Ottestad, S.; Isaksson, T.; Saeys, W.; Wold, J. P. Scattering correction by use of a priori information. Appl. Spectrosc. 2010, 64, 795–804.

Parris, J.; Airiau, C.; Escott, R.; Rydzak, J.; Crocombe, R. Monitoring API drying operations with NIR. Spectroscopy 2005, 20, 34–41.

Pedersen, D. K.; Martens, H.; Nielsen, J. P.; Engelsen, S. B. Near-infrared absorption and scattering separated by extended inverted signal correction (EISC): analysis of near-infrared transmittance spectra of single wheat seeds. Appl. Spectrosc. 2002, 56, 1206–1214.

Pogue, B. W.; Burke, G. Fiber-optic bundle design for quantitative fluorescence measurement from tissue. Appl. Opt. 1998, 37, 7429–7436.

Popo, M.; Romero-Torres, S.; Conde, C.; Romañach, R. J. Blend uniformity analysis using stream sampling and near infrared spectroscopy. AAPS PharmSciTech 2002, 3, 61–71.

Shih, W. C.; Bechtel, K. L.; Feld, M. S. Intrinsic Raman spectroscopy for quantitative biological spectroscopy part I: theory and simulations. Opt. Express 2008, 16, 12726–12736.

Siesler, H. W.; Ozaki, Y.; Kawata, S.; Heise, H. M. Near-Infrared Spectroscopy: Principles, Instruments, Applications; Wiley-VCH: Weinheim, 2002.

Steponavicius, R.; Thennadil, S. N. Extraction of chemical information of suspensions using radiative transfer theory to remove multiple scattering effects: application to a model two-component system. Anal. Chem. 2009, 81, 7713–7723.

Steponavicius, R.; Thennadil, S. N. Extraction of chemical information of suspensions using radiative transfer theory to remove multiple scattering effects: application to a model multicomponent system. Anal. Chem. 2011, 83, 1931–1937.

Sterenborg, H. J. C. M.; Saarnak, A. E.; Frank, R.; Motamedi, M. Evaluation of spectral correction techniques for fluorescence measurements on pigmented lesions in vivo. J. Photochem. Photobiol. B: Biol. 1996, 35, 159–165.

Stevens, A.; van Wesemael, B.; Vandenschrick, G.; Touré, S.; Tychon, B. Detection of carbon stock change in agricultural soils using spectroscopic techniques. Soil Sci. Soc. Am. J. 2006, 70, 844–850.

Sun, C. X.; Zang, H. C.; Liu, X. M.; Dong, Q.; Li, L.; Wang, F. S.; Sui, L. Y. Determination of potency of heparin active pharmaceutical ingredient by near infrared reflectance spectroscopy. J. Pharmaceut. Biomed. Anal. 2010, 51, 1060–1063.

Thennadil, S. N. Relationship between the Kubelka-Munk scattering and radiative transfer coefficients. J. Opt. Soc. Am. A Image Sci. Vis. 2008, 25, 1480–1485.

Thennadil, S. N.; Martens, H.; Kohler, A. Physics-based multiplicative scatter correction approaches for improving the performance of calibration models. Appl. Spectrosc. 2006, 60, 315–321.

Virtanen, T.; Maunu, S. L. Quantitation of a polymorphic mixture of an active pharmaceutical ingredient with solid state ¹³C CPMAS NMR spectroscopy. Int. J. Pharm. 2010, 394, 18–25.

Wang, K.; Chi, G.; Lau, R.; Chen, T. Multivariate calibration of near infrared spectroscopy in the presence of light scattering effect: a comparative study. Anal. Lett. 2011, 44, 824–836.

Weersink, R.; Patterson, M. S.; Diamond, K. R.; Silver, S.; Padgett, N. Noninvasive measurement of fluorophore concentration in turbid media with a simple fluorescence/reflectance ratio technique. Appl. Opt. 2001, 40, 6389–6395.

Zhang, J. C.; Yuan, L.; Wang, J. H.; Huang, W. J.; Chen, L. P.; Zhang, D. Y. Spectroscopic leaf level detection of powdery mildew for winter wheat using continuous wavelet analysis. J. Integrat. Agric. 2012, 11, 1474–1484.

Received: 2012-11-25
Accepted: 2013-02-13
Published Online: 2013-04-02
Published in Print: 2013-05-01

©2013 by Walter de Gruyter Berlin Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.