Published by De Gruyter, July 6, 2019

Evaluating the Performance of Newly Integrated Model in Nonlinear Chemical Process Against Missing Measurements

Vivianna Maria Mickel, Wan Sieng Yeo and Agus Saptoro


The application of data-driven soft sensors in manufacturing fields such as the chemical, pharmaceutical, and bioprocess industries has grown rapidly. Missing measurements are a common issue in chemical processing industries that rely on data-driven soft sensors. The locally weighted Kernel partial least squares (LW-KPLS) algorithm has recently been proposed for developing adaptive soft sensors for nonlinear processes. This algorithm generally works well for complete datasets; however, it cannot cope well with datasets containing missing measurements. Despite this issue, few studies have assessed the effects of incomplete data and their treatment on the predictive performance of LW-KPLS. To address this research gap, a trimmed scores regression (TSR) based missing data imputation method was integrated into LW-KPLS to formulate the trimmed scores regression assisted locally weighted Kernel partial least squares (TSR-LW-KPLS) model. In this study, the proposed TSR-LW-KPLS was employed to deal with missing measurements in nonlinear chemical process data. Its performance was evaluated using three case studies with percentages of missing measurements varying from 5 % to 40 %. The results were then compared with those from the singular value decomposition assisted locally weighted Kernel partial least squares (SVD-LW-KPLS) model, which was likewise formulated by incorporating a singular value decomposition (SVD) based missing data treatment method into LW-KPLS. The comparative studies show that the predictive accuracy of TSR-LW-KPLS is superior to that of SVD-LW-KPLS.
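To make the evaluation setup concrete, the following is a minimal sketch of how missing measurements at a chosen percentage might be injected into a complete dataset and then filled in by a low-rank SVD reconstruction. It is a generic iterative SVD imputation and is not necessarily the SVD treatment used in the study; it implements neither TSR nor LW-KPLS. The function names (`mask_at_random`, `svd_impute`, `rmse_on_missing`), the `rank` setting, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def mask_at_random(X, fraction, rng):
    """Return a copy of X with `fraction` of its entries set to NaN (hypothetical helper)."""
    X_missing = X.astype(float).copy()
    n_missing = int(round(fraction * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    X_missing.flat[idx] = np.nan
    return X_missing


def svd_impute(X_missing, rank=2, n_iter=200, tol=1e-8):
    """Fill NaNs by iterating a rank-`rank` SVD reconstruction (generic sketch)."""
    missing = np.isnan(X_missing)
    # Start from column means of the observed values.
    X_filled = np.where(missing, np.nanmean(X_missing, axis=0), X_missing)
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        change = np.linalg.norm(X_hat[missing] - X_filled[missing])
        # Only the missing entries are updated; observed values stay untouched.
        X_filled[missing] = X_hat[missing]
        if change < tol:
            break
    return X_filled


def rmse_on_missing(X_true, X_filled, missing):
    """Root-mean-square error restricted to the entries that were masked."""
    diff = X_true[missing] - X_filled[missing]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, approximately rank-2 data standing in for process measurements.
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
    X += 0.01 * rng.normal(size=X.shape)
    for fraction in (0.05, 0.10, 0.20, 0.40):
        X_missing = mask_at_random(X, fraction, rng)
        X_filled = svd_impute(X_missing, rank=2)
        error = rmse_on_missing(X, X_filled, np.isnan(X_missing))
        print(f"{fraction:.0%} missing: imputation RMSE = {error:.4f}")
```

In a study of this kind, the imputed matrix would then be passed to the soft sensor model, and prediction error would be compared across missing-data percentages and imputation methods.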


The authors would like to acknowledge the Fundamental Research Grant Scheme (FRGS/2/2014/TK05/CURTIN/02/1), Ministry of Education, Malaysia and Curtin University Malaysia for co-funding this research.


Received: 2018-11-28
Revised: 2019-06-01
Accepted: 2019-06-19
Published Online: 2019-07-06

© 2019 Walter de Gruyter GmbH, Berlin/Boston