Published by Oldenbourg Wissenschaftsverlag April 30, 2020

Data selection for system identification (DS4SID) from logged process records of continuously operated plants

On the selection of data from process data archives of continuously operated production plants for system identification
David Arengas and Andreas Kroll


Historical logged data can be considered for system identification when dedicated experiments are not possible. Continuously operated plants are examples of processes where identification experiments are typically restricted because of their possibly negative impact on production. However, process variables are logged over long periods of time, which results in large databases that are a valuable source of information for model estimation. Automatic selection of informative data intervals can support system identification from logged process data. A new method is presented that differs from current approaches in several aspects. Firstly, interval bounding is performed using the gradient of a norm associated with the resulting information matrix, which decreases interval misdetection. Secondly, process data do not need to be normalized for change detection. Thirdly, an instrumental variables identification method is used, which offers robustness to autocorrelated noise. Lastly, the proposed selection technique can be applied to multivariate processes. The performance of the proposed method is demonstrated in a case study implemented in a lab-scale chemical plant.


Logged data can be used for system identification when dedicated experiments for data acquisition are restricted. This is often the case in continuously operated production plants, where possible negative effects on production are to be avoided. However, process variables are frequently recorded over years, which leads to large data sets. These are a valuable source of information for data-driven modeling. Manually selecting the informative data sequences, which are usually rare, requires a very large effort, which makes automated selection attractive. This contribution presents a new method that differs from known methods in several respects. First, the bounds of informative intervals are determined from the gradient of a norm of the information matrix, which reduces the frequency of erroneous detections. Second, the process data do not need to be normalized. Third, an instrumental variables method that is robust to autocorrelated noise is applied. Fourth, the proposed method can be applied to multivariable processes. The method is demonstrated for signal characteristics close to industrial conditions in a case study in the process cell of a model factory.
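
The interval-bounding idea named in the abstract (tracking the gradient of a norm of the information matrix) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a first-order ARX regressor, a plain exponentially weighted information matrix instead of the instrumental variables estimator, the Frobenius norm, and a fixed threshold. The function names, the forgetting factor, and the threshold value are hypothetical choices made only for illustration.

```python
import numpy as np

def information_norm_gradient(u, y, na=1, nb=1, forgetting=0.95):
    """Gradient over time of the Frobenius norm of an exponentially
    weighted information matrix built from raw (unnormalized) ARX regressors."""
    n = max(na, nb)

    def regressor(k):
        # phi(k) = [-y(k-1) ... -y(k-na), u(k-1) ... u(k-nb)]
        return np.concatenate((-y[k - na:k][::-1], u[k - nb:k][::-1]))

    # Assumption: the record starts in steady state, so the matrix is
    # initialized at the value it would have converged to.
    phi0 = regressor(n)
    R = np.outer(phi0, phi0) / (1.0 - forgetting)
    norms = np.full(len(u), np.linalg.norm(R, "fro"))
    for k in range(n, len(u)):
        phi = regressor(k)
        R = forgetting * R + np.outer(phi, phi)
        norms[k] = np.linalg.norm(R, "fro")
    return np.gradient(norms)

def select_intervals(grad, threshold):
    """Half-open index intervals [start, end) where |grad| exceeds threshold."""
    active = np.abs(grad) > threshold
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

# Synthetic record: steady operation, then one input step at sample 200.
u = np.where(np.arange(400) < 200, 1.0, 2.0)
y = np.ones(400)
for k in range(1, 400):
    y[k] = 0.9 * y[k - 1] + 0.1 * u[k - 1]  # first-order process, unit gain

grad = information_norm_gradient(u, y)
intervals = select_intervals(grad, threshold=0.5)
```

In this synthetic record the norm stays constant during steady operation even though the signals have a nonzero level, so its gradient only departs from zero around the input step. A real record would need a noise-aware threshold and, as in the abstract, an estimator that tolerates autocorrelated noise.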


Received: 2019-05-01
Accepted: 2020-03-11
Published Online: 2020-04-30
Published in Print: 2020-05-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston