Published by De Gruyter, November 14, 2017

Tracking Control of a Continuous Stirred Tank Reactor Using Direct and Tuned Reinforcement Learning Based Controllers

B. Jaganatha Pandian and Mathew M. Noel


Classical controllers are limited by the need for a linear model of the nonlinear system during tuning, and the tuning procedure itself involves complex computations. The difficulty grows when the nonlinear system must operate under different operating constraints. The Continuous Stirred Tank Reactor (CSTR) is one such nonlinear system, studied extensively in control and chemical engineering because of its highly nonlinear characteristics and its diverse operating range. This paper proposes two control schemes based on reinforcement learning (RL) to achieve both servo and regulatory control. The first applies RL directly, with artificial neural network (ANN) approximation; the second uses RL to tune the parameters of a PID controller. The main objective is to handle multiple setpoint control of the CSTR system using RL; specifically, the reactor temperature is controlled through multiple setpoint changes. A comparative study of the two proposed algorithms shows that the direct RL approach with approximation performs better than the RL-tuned PID controller, with less oscillation and overshoot. The learning time of the direct RL controller is also shorter than that of the RL-tuned PID controller.
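As a rough illustration of the second scheme, the sketch below shows the kind of quantity an RL-based tuner could try to maximize: the reward obtained by one closed-loop episode for a given set of PID gains. Everything in it is an assumption made for illustration and is not taken from the paper: the first-order toy plant (not the CSTR), the names `PID` and `episode_reward`, the step size, and the choice of negative integral squared error as the reward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete PID controller; kp, ki, kd are the gains a tuner would adjust."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float) -> float:
        # Accumulate the integral term with a rectangle rule.
        self.integral += error * self.dt
        # No derivative action on the first sample (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def episode_reward(gains, setpoint=1.0, tau=1.0, dt=0.01, steps=500):
    """Run one setpoint-tracking episode on a hypothetical first-order plant
    dx/dt = (-x + u) / tau and return the negative integral squared error."""
    pid = PID(*gains, dt=dt)
    x = 0.0
    ise = 0.0
    for _ in range(steps):
        error = setpoint - x
        u = pid.step(error)
        x += dt * (-x + u) / tau   # explicit Euler step of the toy plant
        ise += error ** 2 * dt
    return -ise


if __name__ == "__main__":
    for gains in [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)]:
        print(gains, episode_reward(gains))
```

A tuner in this spirit would propose gains, observe the episode reward, and shift toward gains that score higher. The direct scheme would instead learn the control action itself from the state, without a PID structure in between.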

A Nomenclature

a, A

Action variable and its constraint set

s, S

State vector and its constraint set


Reward function


Probability of reaching “s” upon execution of “a”


Cumulative discounted reward


Optimal policy


Optimal value


Coolant Flow rate (lpm)


Concentration of A in the reactor (mol/l)


Temperature of reactor fluid (K)


Product Flow rate (lpm)


Input product concentration (mol/l)


Input temperature (K)


Coolant Temperature (K)


Container volume (l)


Activation energy term (K)


Reaction rate constant (1pm)

k1, k2, k3

CSTR Plant constants
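
The reinforcement learning quantities named in this nomenclature are conventionally related as sketched below. The symbols $r$, $P$, $R$, $\gamma$, $\pi^{*}$ and $V^{*}$ are assumed here for illustration and may differ from the paper's own notation; the sums are written for a finite state set, whereas a continuous-state plant such as the CSTR would need an approximation of $V^{*}$.

```latex
% Cumulative discounted reward, with discount factor 0 <= \gamma < 1
R = \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)

% Optimal value (Bellman optimality equation)
V^{*}(s) = \max_{a \in A} \Big[ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big]

% Optimal policy: the action attaining the maximum above
\pi^{*}(s) = \arg\max_{a \in A} \Big[ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big]
```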


Received: 2017-06-10
Revised: 2017-09-25
Accepted: 2017-10-14
Published Online: 2017-11-14

© 2018 Walter de Gruyter GmbH, Berlin/Boston