
Reviews in Chemical Engineering

Editor-in-Chief: Luss, Dan / Brauner, Neima

Editorial Board: Agar, David / Davis, Mark E. / Edgar, Thomas F. / Giorno, Lidietta / Joshi, J. B. / Khinast, Johannes / Kost, Joseph / Leal, L. Gary / Li, Jinghai / Mills, Patrick / Morbidelli, Massimo / Ng, Ka Ming / Schouten, Jaap C. / Schroeder, Avi / Seinfeld, John / Stitt, E. Hugh / Tronconi, Enrico / Vayenas, Constantinos G. / Zagoruiko, Andrey / Zondervan, Edwin

IMPACT FACTOR 2018: 4.200

CiteScore 2018: 4.96

SCImago Journal Rank (SJR) 2018: 1.016
Source Normalized Impact per Paper (SNIP) 2018: 1.572

Volume 31, Issue 5


Data cleaning in the process industries

Shu Xu
  • McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
/ Bo Lu
  • McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
/ Michael Baldea
  • McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
/ Thomas F. Edgar
  • Corresponding author
  • McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
/ Willy Wojsznis / Terrence Blevins / Mark Nixon
Published Online: 2015-09-15 | DOI: https://doi.org/10.1515/revce-2015-0022


In recent decades, process engineers have faced a growing number of data analytics challenges and have had difficulty extracting valuable information from a wealth of process variable data trends. Raw data, stored in databases in different formats, are not useful until they are cleaned and transformed. Generally, data cleaning consists of four steps: missing data imputation, outlier detection, noise removal, and time alignment and delay estimation. This paper discusses available data cleaning methods that can be used in data pre-processing and that help overcome the challenges of “Big Data”.

Keywords: big data; data cleaning; knowledge discovery; missing data imputation; noise removal; outlier detection; time alignment and delay estimation
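
As a rough illustration of the four cleaning steps named in the abstract, the sketch below chains one simple technique per step on a single signal. The specific choices (linear interpolation, a Hampel-style median filter, an exponentially weighted moving average, and a cross-covariance delay search) are assumptions picked for brevity, not necessarily the methods the paper recommends, and all function names and default parameters are hypothetical.

```python
from statistics import median


def impute_linear(x):
    """Step 1: fill None gaps by linear interpolation; edge gaps copy the nearest value."""
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    y = list(x)
    for i in range(len(y)):
        if y[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            y[i] = x[right]
        elif right is None:
            y[i] = x[left]
        else:
            w = (i - left) / (right - left)
            y[i] = x[left] + w * (x[right] - x[left])
    return y


def hampel(x, half_window=3, n_sigmas=3.0):
    """Step 2: replace points far from the local median (in scaled-MAD units) with that median."""
    y = list(x)
    for i in range(len(x)):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = median(window)
        # 1.4826 scales the MAD to match the standard deviation for Gaussian data.
        mad = 1.4826 * median(abs(v - med) for v in window)
        if abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
    return y


def ewma(x, alpha=0.5):
    """Step 3: first-order exponential smoothing; smaller alpha filters more heavily."""
    y = [x[0]]
    for v in x[1:]:
        y.append(alpha * v + (1 - alpha) * y[-1])
    return y


def estimate_delay(u, y, max_lag):
    """Step 4: lag (in samples) that maximizes the cross-covariance of input u and output y.

    Assumes u and y have the same length and share one sampling clock.
    """
    mu_u = sum(u) / len(u)
    mu_y = sum(y) / len(y)

    def xcov(lag):
        return sum((u[t] - mu_u) * (y[t + lag] - mu_y) for t in range(len(u) - lag))

    return max(range(max_lag + 1), key=xcov)


def clean(x):
    """Run steps 1 to 3 in the order listed in the abstract."""
    return ewma(hampel(impute_linear(x)))
```

In this sketch the delay estimate would be used afterwards to shift one cleaned signal against another before modeling. The ordering matters for these particular choices: the median filter needs a gap-free series, and smoothing before outlier removal would smear a spike into its neighbors.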



  • Quinlan J. Induction of decision trees. Mach Learn 1986; 1: 81–106.CrossrefGoogle Scholar

  • Quinlan JR. C4.5: Programs for machine learning. Morgan Kaufmann Series in Machine Learning, 1st ed., San Mateo: Morgan kaufmann, 1993.Google Scholar

  • Rabiner LR, Gold B. Theory and application of digital signal processing. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1975.Google Scholar

  • Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00. New York, NY, USA: ACM 2000; 427–438.Google Scholar

  • Raymond MR, Roberts DM. A comparison of methods for treating incomplete data in selection research. Educ Psychol Meas 1987; 47: 13–26.CrossrefGoogle Scholar

  • Ren X, Rad A, Chan P, Lo W. Online identification of continuous-time systems with unknown time delay. IEEE Trans Automat Contr 2005; 50: 1418–1422.CrossrefGoogle Scholar

  • Reynolds D. Gaussian mixture models. In: Encyclopedia of biometrics. New York, NY: Springer, 2009: 659–663.Google Scholar

  • Richard JP. Time-delay systems: an overview of some recent advances and open problems. Automatica 2003; 39: 1667–1694.CrossrefGoogle Scholar

  • Roberts SJ. Novelty detection using extreme value statistics. Vision, Image Signal Proc 1999; 146: 124–129.CrossrefGoogle Scholar

  • Roberts S, Tarassenko L. A probabilistic resource allocating network for novelty detection. Neural Comput 1994; 6: 270–284.CrossrefGoogle Scholar

  • Rokach L, Maimon O. Data mining with decision trees: theory and applications. Series in Machine Perception and Artificial Intelligence– Vol. 81, 2nd ed., Singapore: World Scientific, 2014.Google Scholar

  • Rosenblatt F. Principles of neurodynamics, perceptrons and the theory of brain mechanisms, 1st ed., Washington, DC: Spartan Books, 1961.Google Scholar

  • Roth PL. Missing data: a conceptual review for applied psychologists. Pers Psychol 1994; 47: 537–560.CrossrefGoogle Scholar

  • Rousseeuw PJ. Least median of squares regression. J Am Stat Assoc 1984; 79: 871–880.CrossrefGoogle Scholar

  • Rousseeuw PJ. Multivariate estimation with high breakdown point. Math Stat Appl 1985; B: 283–297.CrossrefGoogle Scholar

  • Rousseeuw PJ, Leroy AM. Robust regression and outlier detection. Wiley series in probability and statistics, 3rd ed., Hoboken, New Jersey: John Wiley & Sons, Inc., 1996.Google Scholar

  • Rousseeuw PJ, Van Driessen K. A Fast Algorithm for the minimum covariance determinant estimator. Technometrics 1999; 41: 212–223.CrossrefGoogle Scholar

  • Rousseeuw P, Yohai V. Robust regression by means of S-estimators. In: Robust and nonlinear time series analysis. New York: Springer-Verlag, 1984: 256–272.Google Scholar

  • Rubin DB. Multiple imputation for nonresponse in surveys. Wiley series in probability and mathematical statistics, 1st ed., New Jersey: John Wiley & Sons, Ltd., 1987.Google Scholar

  • Russell S, Norvig P. Artificial intelligence: a modern approach, 3rd ed., NJ: Prentice Hall, 2009.Google Scholar

  • Russell EL, Chiang LH, Braatz RD. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometr Intell Lab Syst 2000; 51: 81–93.CrossrefGoogle Scholar

  • Santos TL, Botura PE, Normey-Rico JE. Dealing with noise in unstable dead-time process control. J Process Control 2010; 20: 840–847.CrossrefGoogle Scholar

  • Savitzky A, Golay MJ. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 1964; 36: 1627–1639.CrossrefGoogle Scholar

  • Schafer JL. Analysis of incomplete multivariate data. CRC monographs on statistics & applied probability, 1st ed., Florida: Chapman & Hall/CRC, 1997.Google Scholar

  • Schafer JL, Graham JW. Missing data: our view of the state of the art. Pyschol Methods 2002; 7: 147–177.CrossrefGoogle Scholar

  • Schölkopf B, Smola A, Müller KR. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998; 10: 1299–1319.CrossrefGoogle Scholar

  • Seborg DE, Mellichamp DA, Edgar TF, Doyle III FJ. Process dynamics and control, 3rd ed., New York, NY: John Wiley & Sons, 2010.Google Scholar

  • Segovia VR, Hägglund T, Åström K. Measurement noise filtering for PID controllers. J Process Control 2014; 24: 299–313.CrossrefGoogle Scholar

  • Serneels S, Verdonck T. Principal component analysis for data containing outliers and missing elements. Comput Stat Data Anal 2008; 52: 1712–1727.CrossrefGoogle Scholar

  • Shekhar S, Lu CT, Zhang P. Detecting graph-based spatial outliers: algorithms and applications (a summary of results). In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. San Francisco, CA, USA; New York, NY: ACM, 2001; 371–376.Google Scholar

  • Shen H, Nelson G, Kennedy S, Nelson D, Johnson J, Spiller D, White MR, Kell DB. Automatic tracking of biological cells and compartments using particle filters and active contours. Chemometr Intell Lab Syst 2006; 82: 276–282.CrossrefGoogle Scholar

  • Shubert E, Wojdanowski R, Kriegel HP, Zimek A. On evaluation of outlier rankings and outlier scores. In: Proceedings of 12th SIAM International Conference on Data Mining, Anaheim, CA, USA, 2012.Google Scholar

  • Shum HY, Ikeuchi K, Reddy R. Principal component analysis with missing data and its application to polyhedral object modeling. IEEE Trans Pattern Anal Mach Intell 1995; 17: 854–867.CrossrefGoogle Scholar

  • Silva-Ramírez EL, Pino-Mejas R, López-Coello M, de-la Vega MDC. Missing value imputation on missing completely at random data using multilayer perceptrons. Neural Netw 2011; 24: 121–129.CrossrefGoogle Scholar

  • Singh A. Outliers and robust procedures in some chemometric applications. Chemometr Intell Lab Syst 1996; 33: 75–100.CrossrefGoogle Scholar

  • Soderstrom T. Integration of on-line data reconciliation and bias identification techniques. PhD thesis, The University of Texas at Austin 2001.Google Scholar

  • Soderstrom TA, Himmelblau DM, Edgar TF. A mixed integer optimization approach for simultaneous data reconciliation and identification of measurement bias. Control Eng Pract 2001; 9: 869–876.CrossrefGoogle Scholar

  • Tang J, Chen Z, chee Fu AW, Cheung D. A robust outlier detection scheme for large data sets. In: Cheng, Ming-shan, Yu, Philip S, Liu, Bing, editors. Proceedings of the 6th Pacific-Asia conference on advances in knowledge discovery and data mining, Hong Kong, China 2001; London, UK: Springer-Verlag, 2002, 6–8.Google Scholar

  • Tang J, Chen Z, Fu AWC, Cheung DW. Enhancing effectiveness of outlier detections for low density patterns. In: Advances in knowledge discovery and data mining, Berlin, Heidelberg: Springer, 2002: 535–548.Google Scholar

  • Tax DMJ, Duin RPW. Support vector data description. Mach Learn 2004; 54: 45–66.CrossrefGoogle Scholar

  • Tham MT, Montague GA, Morris AJ, Lant PA. Soft-sensors for process estimation and inferential control. J Process Control 1991; 1: 3–14.CrossrefGoogle Scholar

  • Toprac AJ, Downey DJ, Gupta S. Run-to-run control process for controlling critical dimensions 1999. URL http://www.google.com/patents/US5926690. Accessed on October 20, 2014.

  • Torr PHS, Murray DW. Outlier detection and motion segmentation. 1993; 2059: 432–443.Google Scholar

  • Tsay RS. Outliers, level shifts, and variance changes in time series. J Forecasting 1988; 7: 1–20.CrossrefGoogle Scholar

  • Tsay RS, Peña D, Pankratz AE. Outliers in multivariate time series. Biometrika 2000; 87: 789–804.CrossrefGoogle Scholar

  • Tsikriktsis N. A review of techniques for treating missing data in OM survey research. J Oper Manag 2005; 24: 53–62.CrossrefGoogle Scholar

  • Tukey JW. Exploratory data analysis. Behavior science, 1st ed., London: Pearson, 1977.Google Scholar

  • van Dyk DA, Meng XL. The art of data augmentation. J Comput Graph Stat 2001; 10: 1–50.CrossrefGoogle Scholar

  • Vatanen T. Missing value imputation using subspace methods with applications on survey data. Master’s thesis, Aalto University, Espoo, Finland 2012.Google Scholar

  • Vatanen T, Osmala M, Raiko T, Lagus K, Sysi-Aho M, Orešic M, Honkela T, hdesmä ki HL. Self-organization and missing values in SOM and GTM. Neurocomputing 2015; 147: 60–70.CrossrefGoogle Scholar

  • Venkatasubramanian V. Drowning in data: informatics and modeling challenges in a data-rich networked world. AIChE J 2009; 55: 2–8.CrossrefGoogle Scholar

  • Verboven S, Hubert M. LIBRA: a MATLAB library for robust analysis. Chemometr Intell Lab Syst 2005; 75: 127–136.CrossrefGoogle Scholar

  • Vetterli M, Herley C. Wavelets and filter banks: theory and design. IEEE Trans Signal Proc 1992; 40: 2207–2232.CrossrefGoogle Scholar

  • Walczak B, Massart D. Dealing with missing data: Part I. Chemometr Intell Lab Syst 2001a; 58: 15–27.CrossrefGoogle Scholar

  • Walczak B, Massart D. Dealing with missing data: Part II. Chemometr Intell Lab Syst 2001b; 58: 29–42.CrossrefGoogle Scholar

  • Wang J, He QP. A bayesian approach for disturbance detection and classification and its application to state estimation in run-to-run control. IEEE Trans Semiconduct Manufact 2007; 20: 126–136.CrossrefGoogle Scholar

  • Wang J, He QP. Multivariate statistical process monitoring based on statistics pattern analysis. Ind Eng Chem Res 2010; 49: 7858–7869.CrossrefGoogle Scholar

  • Weber R. Measurement smoothing with a nonlinear exponential filter. AIChE J 1980; 26: 132–134.CrossrefGoogle Scholar

  • Wentzell PD, Andrews DT, Hamilton DC, Faber K, Kowalski BR. Maximum likelihood principal component analysis. J Chemom 1997; 11: 339–366.CrossrefGoogle Scholar

  • Westerhuis JA, Kourti T, MacGregor JF. Analysis of multiblock and hierarchical PCA and PLS models. J Chemom 1998; 12: 301–321.CrossrefGoogle Scholar

  • Wettschereck D. A study of distance-based machine learning algorithms. PhD thesis, Department of Computer Science, Oregon State University, Corvallis 1994.Google Scholar

  • Wiberg T. Computation of principal components when data are missing. In: Symposium of Computational Statistics. Berlin, Germany, 1976; 229–236.Google Scholar

  • Wiegand P, Pell R, Comas E. Simultaneous variable selection and outlier detection using a robust genetic algorithm. Chemometr Intell Lab Syst 2009; 98: 108–114.CrossrefGoogle Scholar

  • Wiener N. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications, 1st ed., Cambridge, MA: MIT Press, 1964.Google Scholar

  • Willems G, Pison G, Rousseeuw P, Van Aelst S. A hotelling test based on MCD. In: Härdle W, Rönz B, editors. Compstat. Heidelberg: Physica-Verlag, 2002: 117–122.Google Scholar

  • Wise BM, Gallagher NB. Multivariate modeling of batch processes using summary variables. Technical report, Eigenvector Research, Inc., Wenatchee, WA, 2011.Google Scholar

  • Xu S, Baldea M, Edgar TF, Wojsznis W, Blevins T, Nixon M. An improved methodology for outlier detection in dynamic datasets. AIChE J 2015; 61: 419–433.CrossrefGoogle Scholar

  • Yan X. Multivariate outlier detection based on self-organizing map and adaptive nonlinear map and its application. Chemometr Intell Lab Syst 2011; 107: 251–257.CrossrefGoogle Scholar

  • Yang ZJ, Hachino T, Tsuji T. On-line identification of continuous time-delay systems combining least-squares techniques with a genetic algorithm. Int J Control 1997; 66: 23–42.CrossrefGoogle Scholar

  • Yu J, Qin SJ. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J 2008; 54: 1811–1829.CrossrefGoogle Scholar

  • Zeng J, Gao C. Improvement of identification of blast furnace ironmaking process by outlier detection and missing value imputation. J Process Control 2009; 19: 1519–1528.CrossrefGoogle Scholar

  • Zhang Y, Abdulla WH. A comparative study of time-delay estimation techniques using microphone arrays. Technical Report 619, Department of Electrical and Computer Engineering, The University of Auckland 2005.Google Scholar

  • Zhang Z, Chen J. Simultaneous data reconciliation and gross error detection for dynamic systems using particle filter and measurement test. Comput Chem Eng 2014; 69: 66–74.CrossrefGoogle Scholar

  • Zhang Z, Dong F. Fault detection and diagnosis for missing data systems with a three time-slice dynamic Bayesian network approach. Chemometr Intell Lab Syst 2014; 138: 30–40.CrossrefGoogle Scholar

  • Zhang S, Qin Z, Ling CX, Sheng S. “Missing is useful”: missing values in cost-sensitive decision trees. IEEE Trans Knowl Data Eng 2005; 17: 1689–1693.CrossrefGoogle Scholar

  • Zhao Z, Huang B, Liu F. Bayesian method for state estimation of batch process with missing data. Comput Chem Eng 2013a; 53: 14–24.CrossrefGoogle Scholar

  • Zhao Z, Huang B, Liu F. Parameter estimation in batch process using EM algorithm with particle filter. Comput Chem Eng 2013b; 57: 159–172.CrossrefGoogle Scholar

  • Zhao Z, Li Q, Huang M, Liu F. Concurrent PLS-based process monitoring with incomplete input and quality measurements. Comput Chem Eng 2014; 67: 69–82.CrossrefGoogle Scholar

  • Zhen D, Zhao HL, Gu F, Ball AD. Phase-compensation-based dynamic time warping for fault diagnosis using the motor current signal. Meas Sci Technol 2012; 23: 55601.CrossrefGoogle Scholar

  • Zhou DH, Frank PM. A real-time estimation approach to time-varying time delay and parameters of NARX processes. Comput Chem Eng 2000; 23: 1763–1772.CrossrefGoogle Scholar

  • Zhou XY, Lim JS. Replace missing values with EM algorithm based on GMM and naive Bayesian. Int J Soft Eng Res Appl 2014; 8: 177–188.Google Scholar

  • Zhou J, Luecke R. Estimation of the covariances of the process noise and measurement noise for a linear discrete dynamic system. Comput Chem Eng 1995; 19: 187–195.CrossrefGoogle Scholar

  • Zhu J, Ge Z, Song Z. Robust modeling of mixture probabilistic principal component analysis and process monitoring application. AIChE J 2014; 60: 2143–2157.CrossrefGoogle Scholar

  • Zikopoulos PC, Eaton C, deRoos D, Deutsch T, Lapis G. Understanding big data: analytics for enterprise class hadoop and streaming data. New York: McGraw-Hill Osborne Media, 2011.Google Scholar

About the article

Shu Xu

Shu Xu is a PhD candidate in the Department of Chemical Engineering at the University of Texas at Austin. His research focuses on data cleaning and data analytics in the process industries. He received his bachelor’s degree from the Department of Chemical Engineering at Tianjin University, China.

Bo Lu

Bo Lu is a PhD candidate in the Department of Chemical Engineering at the University of Texas at Austin. His research focuses on data-driven modeling and process monitoring of batch processes in chemical manufacturing industries, specifically the development of PLS-based inferential sensors for batch processes. He received his bachelor’s degree from the Department of Chemical Engineering at the University of Alberta in Edmonton, Canada.

Michael Baldea

Michael Baldea is Assistant Professor in the McKetta Department of Chemical Engineering at the University of Texas at Austin. He obtained his Diploma (2000) and MSc (2001) from “Babes-Bolyai” University in Cluj-Napoca, Romania, and his PhD (2006) from the University of Minnesota, all in chemical engineering. His research concentrates on the modeling, analysis, optimization and control of process and energy systems, areas in which he has published over 60 refereed papers. He is the recipient of several research and service awards, including the NSF CAREER Award, the Moncrief Grand Challenges Prize, the Model-Based Innovation Prize from Process Systems Enterprise, and the Best Referee Award from the Journal of Process Control.

Thomas F. Edgar

Thomas F. Edgar is the Abell Chair in Engineering Professor of Chemical Engineering at the University of Texas at Austin and Director of the UT Energy Institute. Dr. Edgar received his BS degree in chemical engineering from the University of Kansas and a PhD from Princeton University. For the past 40 years, he has concentrated his academic work in process modeling, control, and optimization, with over 400 articles and book chapters. Dr. Edgar has received major awards from AIChE, ASEE, and AACC and is a member of the National Academy of Engineering.

Willy Wojsznis

Willy Wojsznis earned his Engineering Degree in Electrical Engineering in 1964 and his PhD from the Technical University of Warsaw in 1973. Since 1991, Willy has been with Emerson Process Management, developing advanced control products. Recently he has been involved in Big Data research. His research and development work has resulted in 38 US patents and over 50 technical conference and journal papers. He coauthored the ISA bestselling books Advanced control unleashed, Advanced control foundation, and Wireless control foundation, as well as a chapter of the ISA/CRC instrumentation handbook. Willy has been inducted into Control Magazine’s Process Automation Hall of Fame and is an ISA Fellow and IEEE senior member.

Terrence Blevins

Terrence Blevins received a Master of Science in Electrical Engineering from Purdue University in 1973. He led the development of DeltaV advanced control products and coauthored the book Wireless Control Foundation and the ISA bestselling books Advanced Control Foundation and Control Loop Foundation. Terry is a member of Control Magazine’s Process Automation Hall of Fame and an ISA Fellow. Presently, he is a principal technologist in the applied research team at Emerson Process Management.

Mark Nixon

Mark Nixon was lead architect for DeltaV from its inception through 2005. In 2006 he took a very active role in the design and standardization of WirelessHART. He currently leads the applied research group, where he is pursuing his interests in control, big data analytics, wireless, operator interfaces, and advanced graphics. He holds over 90 patents and has coauthored four books on wireless and control. He is an ISA Fellow and a member of the Automation Hall of Fame. He received his bachelor’s degree from the University of Waterloo in Canada.

Corresponding author: Thomas F. Edgar, McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA

Received: 2015-04-08

Accepted: 2015-08-12

Published Online: 2015-09-15

Published in Print: 2015-10-01

Citation Information: Reviews in Chemical Engineering, Volume 31, Issue 5, Pages 453–490, ISSN (Online) 2191-0235, ISSN (Print) 0167-8299, DOI: https://doi.org/10.1515/revce-2015-0022.

©2015 by De Gruyter.
