Hybrid and cognitive digital twins for the process industry

In a Europe that is undergoing digital transformation, the COGNITWIN project is contributing to accelerating the transformation and introducing Industry 4.0 to the European process industries. The opportunities here can be illustrated by the SPIRE 2050 Vision document (https://www.spire2030.eu/sites/default/files/users/user85/Vision_Document_V6_Pages_Online_0.pdf), which states that "Digitalisation of process industries has a tremendous potential to dramatically accelerate change in resource management, process control and in the design and the deployment of disruptive new business models." The process industries are characterized by harsh environments where sensors are either costly, not available, or subject to costly maintenance. The development of digital twins that can exploit combinations of data-based and physics-based models is often found to be a preferred path to robust digital twins that can help cut costs and reduce energy consumption. In this article, we present 5 out of 6 industrial pilots that are developed in the COGNITWIN project. We discuss the commonalities and differences between the selected approaches and give some ideas about how cognition can be incorporated into the digital twins. The aim of this article is to inspire similar projects in related industries.


Motivation: Needs of COGNITWIN
The European Green Deal [1] has the ambition to transform Europe into the first climate neutral continent by 2050. This requires development of a sustainable EU economy, and significant transformation of both production and consumption practices.
One of the key cornerstones of this goal is the successful transformation of the process industries. These industries have a strong impact on energy and material consumption [2] and are an integral part of the value chains of the European economy. These heavy industries realize that digitalization of their processes can offer great advantages. However, their challenges often differ markedly from those in the manufacturing sector, where digitization has advanced fast. The process industry sector often deals with very high temperatures, limited or no access to the core of the process, and chemically harsh environments, and the appropriate sensors are often very expensive to purchase, have limited lifetimes, and require short maintenance intervals. Currently, the sensor needs are defined to meet traditional process control procedures, and the process industry lags behind other industries in digitalization level. Further digitalization of the process industry therefore represents an unexploited potential and may accelerate the transformation to more sustainable future industries while supporting [3] digital plant operations, intelligent material and equipment monitoring, and autonomous integrated supply chain management.
The Horizon 2020 project COGNITWIN¹ aims to answer these challenges and demonstrate how the process industry can be lifted through new sensors and application of the best available technologies in model development, combining data-based and physics-based approaches. In addition to developing hybrid digital twins [4], the project will introduce cognitive concepts into the digital twins.
In Section 2 of this article, we present the COGNITWIN industrial pilots and the approaches applied to develop hybrid and cognitive digital twins. Different approaches to the cognitive plants and digital twins are represented. Each approach is based on the industrial challenges in the different pilots, and the selection of the best methods to support the pilots, within the timeframe of the project, is crucial. Another important element is to develop methods and tools which may be used to support future industrial developments. Section 3 discusses the similarities and differences between the approaches applied in the different COGNITWIN pilots. Finally, the conclusions and next steps are given in Section 4.

COGNITWIN
The primary goal of the COGNITWIN project is to support European process industries in exploiting the advantages of digitalization. Digital twins² (DTs) are key elements in achieving this goal. Referring to the Wikipedia definition, "A digital twin is a virtual representation that serves as the real-time digital counterpart of a physical object or process. Digital twins are the result of continual improvement in the creation of product design and engineering activities." A realized DT has numerous applications, such as better process understanding, better understanding of the sensor data, data collection and processing, and supporting optimization of the overall process as well as subprocesses. In the COGNITWIN project, the ambition is to further add cognition³ to the digital twins. DTs may take many shapes and involve a wide range of different techniques and methods, such as reduced order models [5,6], assimilation methods [7], recurrent CFD [8], and pragmatism in industrial modeling [9]. As discussed in [10] and [4], the COGNITWIN project has introduced three different layers of the DTs. We follow the IIC definition of DTs [11] and assume that a digital twin is a formal digital representation of an asset that captures attributes and behaviors of that asset. In another valuable contribution to defining DTs [12], it is discussed how a model differs from a DT.
DTs may be developed from data (data-driven), from physics-based models or from a combination of the two approaches (one type of hybrid approach). In process control, this type of hybrid DT has been extensively used since the 1960s [13]. Lately, nonlinear model tuning to data has been successfully applied by means of extended Kalman filters [14].
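The extended Kalman filter referenced above [14] is the classical way to tune a nonlinear model to data. As a minimal illustration of the idea (not taken from the project; the process model and all numbers are invented), the following sketch appends an unknown parameter of a hypothetical first-order decay process to the state vector and estimates both jointly from noisy measurements:

```python
import numpy as np

# Hypothetical first-order process dx/dt = -a*x with unknown decay rate a.
# We augment the state with a and estimate both from noisy measurements of x.
dt, true_a = 0.1, 0.8
rng = np.random.default_rng(0)

def f(s):                       # transition of augmented state s = [x, a]
    x, a = s
    return np.array([x - dt * a * x, a])

def F(s):                       # Jacobian of f
    x, a = s
    return np.array([[1 - dt * a, -dt * x],
                     [0.0, 1.0]])

H = np.array([[1.0, 0.0]])      # we measure x only
Q = np.diag([1e-5, 1e-5])       # process noise (allows slow drift of a)
R = np.array([[1e-2]])          # measurement noise variance

s = np.array([5.0, 0.3])        # initial guess: deliberately wrong parameter
P = np.eye(2)
x_true = 5.0

for _ in range(200):
    x_true -= dt * true_a * x_true          # the "plant"
    z = x_true + rng.normal(0.0, 0.1)       # noisy measurement
    Fk = F(s)                               # Jacobian at the prior estimate
    s = f(s)                                # EKF predict
    P = Fk @ P @ Fk.T + Q
    y = z - (H @ s)[0]                      # innovation
    S = (H @ P @ H.T + R)[0, 0]
    K = (P @ H.T / S).ravel()
    s = s + K * y                           # EKF update
    P = (np.eye(2) - np.outer(K, H[0])) @ P

print(f"estimated decay rate: {s[1]:.2f} (true: {true_a})")
```

The same augmented-state pattern is what makes a physics-based model "self-adapting": the filter continuously pulls uncertain model parameters toward values consistent with the plant data.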
A DT may be built to handle a limited universe of data and models, where interaction with generic and previously unknown data and models is not foreseen. In other cases, it is known a priori that the specific DT may be a system of multiple subsystems, and we may want to profit in the future from new models and data becoming available. To deal with such cases, several approaches to DT asset management have been developed [15][16][17][18]. Generic access to data or models may be realized through semantic interoperability methods [19][20][21].
In the COGNITWIN project, a Toolbox [22] is being developed for use in the project but also for use in coming projects with similar types of process industries. Based on the requirements from the pilots, we develop DTs and apply our present Toolbox [22] together with an overall orchestration pipeline (Digital Twin Pipeline, Figure 1).
It must be noted that the majority of COGNITWIN pilots have challenges that deviate significantly from the manufacturing industries, mostly due to high temperatures (T > 1,600°C), fumes, dust, operations with heavy equipment, and lack of affordable and robust sensors. In many cases the measurement challenges are formidable and there are no sensors in the development pipeline. Available measurement data may have inconsistencies due to on-the-fly operational changes. The data were not saved with the thought of being the source for future machine learning and DT building. Therefore, we have already learned that it is crucial to have physics-based models at hand to assess the data and make critical corrections. When data violate basic principles of mass, momentum, and energy conservation, we may use the physics-based model not only to validate the data but also to correct them with model-based virtual data. The latter point may be important when the amount of data is scarce.
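The data-screening idea described above can be sketched as follows (a hedged illustration only; the numbers, the closure tolerance, and the stand-in "physics model" are all invented): records that fail a steady-state mass balance check are flagged and replaced by model-based virtual values rather than fed raw to a data-driven model.

```python
# Hypothetical screening of logged stream data against a steady-state mass
# balance: total mass in should match total mass out within sensor tolerance.
records = [
    {"m_in": 100.0, "m_out": 99.2},   # closes within tolerance
    {"m_in": 100.0, "m_out": 82.0},   # violates the balance -> suspect sensor
    {"m_in": 101.5, "m_out": 100.9},
]

REL_TOL = 0.05  # assumed 5% closure tolerance

def model_prediction(m_in):
    # Stand-in for a physics-based model; here a trivial 1% loss assumption.
    return 0.99 * m_in

cleaned = []
for r in records:
    closure = abs(r["m_in"] - r["m_out"]) / r["m_in"]
    if closure > REL_TOL:
        # Replace the suspect measurement with model-based virtual data.
        cleaned.append({"m_in": r["m_in"],
                        "m_out": model_prediction(r["m_in"]),
                        "virtual": True})
    else:
        cleaned.append({**r, "virtual": False})

print([r["virtual"] for r in cleaned])  # → [False, True, False]
```

In a real pilot the balance check would of course cover momentum and energy as well, and the replacement values would come from the full physics-based twin rather than a fixed loss factor.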
Even with very accurate DTs in operation, human intervention is still required to respond to rare, sudden, unknown events. A cognitive DT extends a hybrid DT by marrying expert knowledge with the power of hybrid analytics (Figure 2). The synergy with expert knowledge makes it possible to find solutions to previously unforeseen situations. In some cases, the outcome from the DT is of such a critical nature that an action will be a synthesis of the DT prediction and an operator's approval.
In a nutshell, the cognition process involves comprehensive modeling of various knowledge elements that can support interpretation of process changes, including detection and interpretation of unexpected process variations.
The proposed DT abstraction layer [4,10] contains knowledge on (a) how to model dynamics in particular models, and (b) how to interpret variations in the models. Assuming that all models created are correct in principle (albeit with some limitations), the output values can therefore be interpreted on the level of the behavior of the process by understanding the output data in the context of the process variations. In simple terms, the cognitive layer may be seen as primarily a qualitative assessment, but it may change the models, methods, and output by exploiting quantitative data and experiences.
Overview of pilots related to cognitive digital twins

In this section, we briefly describe five different pilots for cognitive plants by explaining their approaches to digital twins and cognition. An overview of the pilots is provided in Table 1.

Hydro
The Hydro pilot aims to increase insight into the current status of the Gas Treatment Centre (GTC), with the end goal of evening out the fluoride content of the raw alumina feed proceeding to the core electrolysis process. The GTC supports the core electrolysis process that produces Hydro's main product: aluminum. Hazardous, high HF (hydrogen fluoride)-content fumes are a byproduct of the aluminum production process. The GTC has multiple functions: (1) to remove continuously generated fumes from the electrolysis cells using powerful fans, (2) to clean these fumes of toxic species (mainly HF) such as to avoid releasing hazardous gases into the workplace and the environment, and (3) to return fluoride (emitted HF) to the electrolysis cells, where it has an important role in maintaining optimal conditions for aluminum production.
The COGNITWIN project works with data from Hydro's Karmøy Technology Pilot (KTP) because of the improved digital infrastructure and greater availability of logged data signals. Fresh alumina is shipped to Karmøy and enters the GTC via filter compartments. While traveling through the baghouse filters and adsorption reactors, the alumina is exposed to and adsorbs the HF present in the fumes before being sent to a silo for secondary (HF-rich) alumina. The secondary alumina is distributed to the individual electrolysis cells where it serves as both the raw material for aluminum production and a source of elemental fluoride, which is necessary for the optimal performance of the electrolysis process.
Multiple factors affect the levels of HF present in the fumes and the subsequent HF-content of the secondary alumina returned to the electrolysis cells. This variable HF-content can result in fluctuating amounts of fluoride species in the cell and influence the efficiency of the aluminum production process. Close monitoring as well as compensatory measures to stabilize the alumina composition are therefore needed to optimize cell performance. The fluoride content of alumina is, however, difficult to determine. Relevant measurements are taken occasionally, but not yet frequently enough to be used directly in process control. In addition, no tools for anticipating upcoming changes to HF-content currently exist.
The COGNITWIN project aims to develop a digital twin technology that can predict the current HF-content of secondary alumina, anticipate future disturbances to the HF-content based on forecasted process inputs, and help operators adjust GTC operation to minimize variation in alumina composition.
The model-based digital twin is currently running online in monitoring mode (not controlling the process) with data from many different sources, including the GTC process, electrolysis cells, raw alumina certificates, and the weather. The digital twin produces real-time predictions of the GTC process. The accuracy of the digital twin can be improved by self-adaption using data-driven estimation techniques based on model agreement with process measurements. The resulting hybrid digital twin is shown to follow the dynamics of the process measurement well, as demonstrated in Figure 3. By combining model predictions with process measurements, the hybrid digital twin enables soft-sensing of the otherwise unpredictable composition of alumina.
The digital system can be extended with information from the weather forecast in order to anticipate the future HF-content of alumina. The ability to predict disturbances to the alumina composition in advance allows for the effective optimization of GTC operation. Operator knowledge and experience will be used to define optimization schemes, hence adding a cognitive aspect to the twin.

ELKEM
The goals for the Elkem pilot are to increase product hit rate and maximize the use of recycled material for the ferrosilicon refining process. Ferrosilicon is produced in a submerged arc furnace and tapped into a transportable ladle at regular intervals. Typically, a batch of liquid ferrosilicon at approximately 1,600°C is tapped. The composition of the tapped metal varies with variations in the raw materials used in the furnace. After tapping, the ladle is transported to the refining station where the alloying and refining takes place. The composition of the metal is adjusted during refining and alloying to achieve the target composition of the specific grade. The tapped metal carries excess heat, and recycled ferrosilicon is utilized to cool the metal down to the correct temperature for casting. Currently, the registered amount of each addition is calculated by the control system and corrected manually by the operators based on their experience. Sampling the process is highly challenging due to the high temperature of the metal and is therefore only performed once or twice for each batch. The processing time for the samples is also significant, making it unsuitable for direct use in an automatic control system.
The COGNITWIN project aims to arrive at a cognitive digital twin that will aid the operators in choosing the correct amount of each addition, so as to achieve the grade specification, minimize cost, and maximize the use of recycled ferrosilicon.
The foundation of the cognitive digital twin is a hybrid digital twin. A model based on first principles is currently installed at the plant and running online as a digital twin. The digital twin can predict the evolution of the temperature, the composition, and mass of slag/metal for the refining process with good accuracy (∼1% standard deviation). The prediction accuracy can be further improved by extending the digital twin to a hybrid digital twin, utilizing a data-driven model to calculate the amount of slag that is tapped from the furnace into the ladle. The digital twin will also employ a self-adapting algorithm, e.g., an augmented Kalman filter, to correct the first-principles model in real time. The hybrid digital twin will use new thermal cameras combined with a set of machine vision algorithms to extract additional information from the process. The optimum amount of each addition will be calculated by a nonlinear model predictive control algorithm. A flowchart for the hybrid digital twin is shown in Figure 4.
Once the hybrid digital twin is running online, it will be extended with cognitive elements, including self-adapting algorithms with the possibility to learn from the actions of the operators (shown in green in the figure). When the system has learned from a wide enough range of scenarios, operator intervention should become less frequent.

SIDENOR
The Sidenor pilot is dealing with optimization of the refractory lifetime of ladles in steel production. The steel, produced from scrap smelting in an electric arc furnace, is tapped into a ladle where alloying and refining of the steel takes place. The ladle is a cup-shaped vessel (see Figure 5) that may typically contain 100-140 tons of steel. The ladle wall contacting the steel is made of refractory bricks that are designed to cope with the high temperature and corrosiveness of the steel and slag. The slag is made of liquid and viscous metal oxides and is often designed to support the removal of unwanted impurities from the steel. The inner refractory lining is consumed by each heat, and after a number of heats the lining is so thin that continuing to use the ladle is deemed unsafe: leakages into the plant would risk the lives of people and cause costly damage to buildings and equipment.
Currently, the assessment of whether the ladle can safely support one more use (one more heat cycle) is done by visual inspection by experienced operators. As an outcome of the COGNITWIN project, we aim to create a cognitive digital twin that can assist the operator in deciding whether it is safe to use the ladle for one more heat cycle.
To build the model, several years of operational data are available, as well as information on the actual wear of the refractory at the time the ladle was deemed ready for relining. Direct machine learning on the data has been attempted, but the available data are not sufficient to create a data-based model that alone can reliably predict the remaining lifetime of a ladle. This is due not only to a lack of relevant data, but also to data points with various quality problems that require considerable effort to understand.
A purely physics-based model with proper predictive capabilities might be possible, but given the complexity of the process and the limited time and resources available, this is also infeasible. Therefore, a hybrid approach is sought. The physics-based model is a transient model able to predict the thermal evolution in the refractory walls and the erosion of the refractory lining. For the results to be accurate, they depend on the history of the ladle, which means we need to be able to follow the entire lifetime of the ladle.
The physics-based model will assist the data-driven model in two different ways. First, the process data sometimes have data points that are not valid for various reasons. The physics-based model can be used to find these so they can be disregarded by the machine learning model. Next, the results from the physics-based model will be used to generate additional data that can be used in a machine learning model to ultimately create a cognitive digital twin.
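The second point, generating additional training data from a physics-based model, can be sketched as follows (a hypothetical illustration: the wear model, operating ranges, and measured values are all invented). Physics-model-generated "virtual" points are mixed with the scarce measurements before fitting a simple data-driven surrogate:

```python
import numpy as np

# Hypothetical augmentation of scarce measured refractory-wear data with
# physics-model-generated points before fitting a data-driven model.
rng = np.random.default_rng(1)

def physics_wear(n_heats, avg_temp):
    # Stand-in physics-based erosion model: wear grows with the number of
    # heats and with temperature above a 1,600 degC reference.
    return 0.05 * n_heats * (1 + 0.002 * (avg_temp - 1600))

# A few measured points (scarce, noisy): [number of heats, avg temp in degC]
measured_X = np.array([[20, 1605], [35, 1615], [50, 1600]], dtype=float)
measured_y = np.array([1.05, 1.82, 2.48])

# ... augmented with virtual points sampled over the operating envelope.
virtual_X = np.column_stack([rng.uniform(10, 60, 30),
                             rng.uniform(1590, 1630, 30)])
virtual_y = np.array([physics_wear(n, t) for n, t in virtual_X])

X = np.vstack([measured_X, virtual_X])
y = np.concatenate([measured_y, virtual_y])

# Simple linear surrogate fitted on the combined data set.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"surrogate RMSE on combined data: {rmse:.3f}")
```

In the pilot, the surrogate would of course be a richer ML model and the virtual data would come from the transient thermal/erosion model; the point is only that the physics model lets the data-driven model see operating regions for which no measurements exist.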
The system will be realized in a StreamPipes⁴-based application. The cognitive additions will include self-adapting algorithms but, even more importantly, interaction with the experienced operator. Clearly, the model and the operator must come to some agreement if the model allows one more heat while the operator says no! A long-term goal is a cognitive twin that can make such assessments alone, at least under certain conditions.

NOKSEL
The NOKSEL pilot aims to enable predictive maintenance of Spiral Welded Steel Pipe (SWP) machinery to reduce the energy consumption in steel pipe manufacturing [22,23]. The SWP machine is composed of electromechanical systems. Because of the multi-step, serial, and interdependent nature of the production process followed by the SWP machine, even a single malfunction can stop the whole production, and the cost of unplanned machine breakdown is very high.
To avoid unexpected machine breakdowns, reduce maintenance costs, and lessen the energy consumption of the machine, TEKNOPAR has enhanced TIA Platform elements and provided the TIA AssetHealth solution, a cognitive digital twin, at the NOKSEL facilities in İskenderun (https://tia-platform.com, https://tia-platform.com/product/tiaassethealth.html). Data acquisition is provided by a selected sensor set installed at optimal locations, coupled PLCs, and control applications at the IoT level. TIA AssetHealth uses acquired historical and streaming data to monitor the real-time condition of the machine (TIA MONITORING) and to enable predictive maintenance (TIA PREMA). The online condition monitoring functionality of TIA AssetHealth includes descriptive statistics analysis (TIA STATISTICS and TIA METRICS) and overall equipment effectiveness calculations (TIA OEE). To enable predictive maintenance, remaining useful life (RUL) estimations have been performed (TIA PREMA). Visualization is implemented by TIA UX applications.
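The internals of TIA PREMA are not described here, but a common baseline for RUL estimation is to extrapolate a monitored degradation indicator to an assumed condition limit. A hypothetical sketch (the indicator, trend, and failure threshold are all invented for illustration):

```python
import numpy as np

# Hypothetical RUL estimate: fit a linear trend to a degradation indicator
# (e.g., vibration amplitude) and extrapolate to an assumed failure level.
rng = np.random.default_rng(3)

hours = np.arange(0, 500, 50, dtype=float)          # condition-log timestamps
vibration = 1.0 + 0.004 * hours + rng.normal(0, 0.05, len(hours))

slope, intercept = np.polyfit(hours, vibration, 1)  # linear degradation trend

FAILURE_LEVEL = 4.0                                 # assumed condition limit
t_fail = (FAILURE_LEVEL - intercept) / slope        # predicted crossing time
rul = t_fail - hours[-1]                            # hours remaining from now
print(f"estimated RUL: {rul:.0f} h")
```

Production RUL models typically replace the linear trend with learned degradation models and attach confidence bounds, but the extrapolate-to-threshold structure is the same.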
The TIA CONTROL subsystem realizes the cognitive and proactive dimension of the cognitive twin. Based on big data analytics and developed ontologies, TIA CONTROL is being used to prevent one of the common causes of SWP machine failure. The TIA CONTROL system enables autonomous decision-making to conduct preventive maintenance [23]. Figure 6 presents the TIA AssetHealth applications for the NOKSEL pilot.
During the development of TIA AssetHealth for the NOKSEL pilot, several challenges were met and solved. Table 2 lists these challenges, the associated modules, and the methods/tools used to solve them.

Sumitomo SHI FW
The Sumitomo SHI FW pilot deals with optimizing the operation of circulating fluidized bed (CFB) energy boilers. The work has developed a digital twin-based system for management of fouling (Figure 7) at the flue gas heat exchange surfaces, exploiting physical and data-driven models, online process data, and novel sensors. The aim of the fouling management system is to help the power plant operator optimize the boiler controls so that boiler operation economy is closer to optimal and the emissions and downtime of the boiler are reduced. The focus is on the optimization of soot-blowing sequence timing, via improved monitoring of fouling and better monitoring of incoming fuel properties.
Utilization of biomass and waste fuels in CFB boilers is often connected with an elevated risk of bed agglomeration, fouling, and corrosion. Better knowledge of the incoming fuel characteristics makes it possible to monitor fuel quality changes and study their impact on fouling at heat exchange surfaces. Direct online measurement of fuel characteristics is expensive and laborious. Hence, a physical model-based online state estimation problem has been formulated. The developed approach provides an estimate of the distribution of different fuel fractions in the incoming fuel.

Fouling is a phenomenon where material accumulates on the surfaces of heat exchangers. To ensure efficient heat transfer, the surfaces need to be cleaned regularly. A common practice at industrial plants is to use soot blowing with a predetermined fixed time interval. It would be beneficial, however, to guide the soot blowing based on online monitored information. Slag buildup can be monitored by estimating the heat transfer efficiency using direct process data over the heat exchangers, but this method is error-prone due to the many variables involved. Therefore, several approaches applying advanced model-based state estimation techniques for improved estimation have been developed, making use of both physical and data-driven models and fusing predictions with online data. Ensemble Kalman filtering was applied together with a physical heat exchanger model to monitor the heat transfer coefficient online. A computationally less demanding approach was developed for optimizing the start time of the next soot-blowing sequence, based on adaptive subspace identification to determine the prediction models used in mixed-integer optimization. In addition, the possibilities for direct monitoring of fouling on the heat exchanger tubes have been investigated, based on acoustic sensing and signal analytics in the frequency domain.
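The ensemble Kalman filtering approach mentioned above can be illustrated with a heavily simplified, hypothetical heat exchanger model (all parameters invented): an ensemble of estimates of the heat transfer coefficient U is propagated with a random-walk fouling model and updated against noisy outlet-temperature measurements.

```python
import numpy as np

# Hypothetical EnKF sketch: track a slowly degrading heat transfer
# coefficient U (fouling) from noisy outlet-temperature measurements.
rng = np.random.default_rng(2)

A_HX, m_cp = 10.0, 50.0            # assumed exchanger area, flow heat capacity
T_gas, T_in = 850.0, 200.0         # assumed gas-side and inlet temperatures

def t_out(U):
    # Simple steady-state heat exchanger observation model.
    return T_in + U * A_HX * (T_gas - T_in) / m_cp

N = 100                            # ensemble size
ens = rng.normal(2.0, 0.3, N)      # initial ensemble of U estimates
true_U = 2.0

for _ in range(150):
    true_U *= 0.997                              # slow fouling of the "plant"
    z = t_out(true_U) + rng.normal(0, 1.0)       # noisy measurement

    ens = ens + rng.normal(0, 0.01, N)           # random-walk forecast for U
    Hx = t_out(ens)                              # predicted measurements
    # EnKF update with perturbed observations
    cov_uy = np.cov(ens, Hx)[0, 1]
    var_y = np.var(Hx, ddof=1) + 1.0             # + measurement noise variance
    K = cov_uy / var_y
    ens = ens + K * (z + rng.normal(0, 1.0, N) - Hx)

print(f"estimated U = {ens.mean():.3f}, true U = {true_U:.3f}")
```

The ensemble spread doubles as an uncertainty estimate, which is one reason population-based filters are attractive when the physical model is only approximately known.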
The industrial goal is to help the power plant operator to optimize the boiler controls, such as soot-blowing, and potentially also longer-term maintenance planning. In addition to efficient optimization, the fusion of the human operator in the decision-making loop is essential, to ensure that the improved knowledge is fed back to impact the plant operation.

An analysis of the different approaches
In this section, we discuss the similarities and differences between the approaches applied in the different COGNITWIN pilots.

Hybrid digital twins
The following discussion is based on the Wikipedia definition⁵ of a DT, referred to earlier: "A digital twin is a virtual representation that serves as the real-time digital counterpart of a physical object or process. Digital twins are the result of continual improvement in the creation of product design and engineering activities." In the Hydro pilot, we introduced hybridization by improving DT accuracy through self-adaptation, using data-driven estimation techniques based on model agreement with process measurements. By combining physics-based model predictions with process measurements, the resulting hybrid⁶ digital twin (HT) enables soft-sensing of data that is otherwise unpredictable.
In the Elkem pilot, a first principle online model is installed at the plant. The DT prediction accuracy is improved by utilizing a data-driven model to calculate the amount of slag that is tapped from the furnace into the ladle. The digital twin will also employ a self-adapting algorithm, e.g., an augmented Kalman filter, to correct the first-principles model in real time. The hybrid digital twin will use thermal cameras combined with a set of machine vision algorithms to extract additional information from the process. The optimum amount of each addition will be calculated by a nonlinear model predictive control algorithm.
For the Sidenor pilot, a first-principles model was developed in parallel with ML models. These developments were initially independent. However, as the physics model evolved, some crucial data were questioned because they did not make sense according to the model. When those points were investigated, reasonable explanations were provided by the pilot, and alternative data were obtained and applied. It was observed that an ML model can be developed to reproduce data acceptably even if, in hindsight, the data explaining the results turn out not to be what was assumed. It will be interesting to explore whether the model is still correct (providing new explanations) or whether the model has prediction issues due to the limited amount of data. Here, hybrid refers to a physics-based model that is tuned and improved by learning from process data; hybridization may also be based on exploiting different data-based models. Hybridization is achieved by (i) systematically tuning the model against the data, (ii) "repairing" data in cases where a lack of consistency is found, (iii) using the model to provide new simulated data for the ML model that have not been measured, and (iv) a combined model that exploits the physics-based prediction, ML/AI methods, and plant data.
In the Noksel pilot, a hybrid twin (HT) is developed, where first-order models and ML/AI models are integrated into a model for an HT. An iterative process is used, similar to the approaches applied in the majority of the pilots. As explained above, the predictions from the data-driven model will here, too, be taken as data to be applied by the hybrid algorithms to improve the overall prediction power of the model.
In the Sumitomo SHI FW pilot case, multiple physics-based models are combined with extensive plant data, using AI/ML in adaptive/learning methods for data-driven model construction and tuning, and population-based state estimation techniques. Hybridization is introduced by continuously estimating unmeasured process quantities, exploiting nonlinear physics-based models and process operating data, and applying nonlinear Kalman filtering. The approach to DT hybridization here is strongly related to the approaches in the Hydro and Elkem pilots. The applied methods and tools are applicable to a large range of similar industrial problems.
Several of the pilots are using or exploring StreamPipes to orchestrate the DT. This is an efficient platform for data collection, further expansion of the models and their interaction.

Cognition
The following Wikipedia definition of Cognition⁷ is used for this analysis: Cognition refers to "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses." It encompasses many aspects of intellectual functions and processes such as perception, attention, the formation of knowledge, memory and working memory, judgment and evaluation, reasoning and "computation," problem solving and decision making, comprehension and production of language. Cognitive processes use existing knowledge and discover new knowledge.
Cognitive elements are not yet fully introduced in the COGNITWIN development; this will be addressed in the coming months. Here, however, we have investigated some possibilities.
Involving humans in critical decisions, supported by recommendations from a numerical prediction, is a possibility that will certainly be explored in all the pilots. Similarly, self-adapting algorithms will be explored. In the Hydro case, optimization schemes will be based on operator experience and knowledge. The Noksel pilot has already developed capabilities for sensing, storing, processing, and decision-making. One goal here is to develop the capability to handle events that have not previously been experienced.

Conclusions
In the previous sections, we have presented how the COGNITWIN project has approached the different pilots (industrial applications) in order to arrive at strong digital solutions for the involved industries. The solutions are hybrid and cognitive digital twins built by merging existing and new developments, applying both existing basic tools and adaptations thereof.
The Toolbox [22] that is developed will be helpful for similar future projects with the industry. As the industry experiences the usefulness of these digital solutions, optimization at higher levels may be wanted. This increases the complexity, but also the reward. For optimization at the "mother company" level, we need good digital solutions for all sublevels. To achieve this, it becomes fundamental to realize the developments within asset management shells that enable effective and flexible communication and sharing between levels.
A major learning from COGNITWIN is that heavy process industries have limited data. Sometimes no sensor technology is available that can do the job, while in other cases the sensor is available but cannot be applied due to excessive costs. It is critical that the data are well understood before use. In the majority of cases in this industry, hybrid twins involving physics-based models are a must. The combination of physics-based and data-based models can offer better forecasts than either by itself. Different strategies to arrive at hybrid twins were applied, ranging from constantly tuning the physics-based model with data to running primarily data-based models where virtual sensors⁸ are explored based on physics-based models.