Causality and independence in perfectly adapted dynamical systems

Perfect adaptation in a dynamical system is the phenomenon that one or more variables have an initial transient response to a persistent change in an external stimulus but revert to their original value as the system converges to equilibrium. With the help of the causal ordering algorithm, one can construct graphical representations of dynamical systems that represent the causal relations between the variables and the conditional independences in the equilibrium distribution. We apply these tools to formulate sufficient graphical conditions for identifying perfect adaptation from a set of first-order differential equations. Furthermore, we give sufficient conditions to test for the presence of perfect adaptation in experimental equilibrium data. We apply this method to a simple model for a protein signalling pathway and test its predictions both in simulations and using real-world protein expression data. We demonstrate that perfect adaptation can lead to misleading orientation of edges in the output of causal discovery algorithms.


Introduction
Understanding causal relations is an objective that is central to many scientific endeavors.It is often said that the gold standard for causal discovery is a randomized controlled trial, but practical experiments can be too expensive, unethical, or otherwise infeasible.The promise of causal discovery is that we can, under certain assumptions, learn about causal relations by using a combination of data and background knowledge [1][2][3][4][5][6][7][8][9][10].Roughly speaking, causal discovery algorithms construct a graphical representation that encodes certain aspects of the data, such as conditional independences, given some constraints that are imposed by background knowledge.Under additional assumptions on the underlying causal mechanisms these graphical representations have a causal interpretation as well.In this work, we specifically consider the equilibrium distribution of perfectly adapted dynamical systems.We will show that such systems may have the property that the corresponding graphs that encode the conditional independences in the distribution do not have a straightforward causal interpretation in terms of the changes in distribution induced by interventions.
Perfect adaptation in a dynamical system is the phenomenon that one or more variables in the system temporarily respond to a persistent change of an external stimulus, but ultimately revert to their original values as the system reaches equilibrium again.We study the differences between the causal structure implied by the dynamic equations and the conditional dependence structure of the equilibrium distribution.To do so, we make use of the technique of causal ordering, introduced by Simon [11], which can be used to construct a causal ordering graph that represents causal relations and a Markov ordering graph that encodes conditional independences between variables [12].We introduce the notion of a dynamic causal ordering graph to represent transient causal effects in a dynamical model.We use these graphs to provide a sufficient graphical condition for a dynamical system to achieve perfect adaptation, which does not require simulations or cumbersome calculations.Furthermore, we provide sufficient conditions to test for the presence of perfect adaptation in real-world data with the help of the equilibrium Markov ordering graph and we explain why the usual interpretation of causal discovery algorithms, when applied to perfectly adapted dynamical systems at equilibrium, can be misleading.
We illustrate our ideas on three simple dynamical systems with feedback: a filling bathtub model [13,14], a viral infection model [15,16], and a chemical reaction network [17].We point out how perfect adaptation may also manifest itself in protein signalling networks, and take a closer look at the consequences for causal discovery from a popular protein expression data set [18].We adapt a model for the RAS-RAF-MEK-ERK signalling pathway from [19] and study its properties under certain saturation conditions.We test its predictions both in simulations and on real-world data.We propose that the phenomenon of perfect adaptation might explain why the presence and orientation of edges in the output of causal discovery algorithms does not always agree with the direction of edges in biological consensus networks that are based on a partial representation of the underlying dynamical mechanisms.

Background
In this section, we provide an overview of the relevant background material on which this work is based.We first consider the assumptions underpinning popular constraint-based causal discovery algorithms and give a brief description of a simple local causal discovery algorithm in Section 2.1.In Section 2.2, we proceed with a concise introduction to the causal ordering algorithm of Simon [11] and demonstrate how it can be applied to a set of equations together with a pre-specified set of exogenous variables to deduce the implied causal relations and conditional independences.Finally, in Section 2.3, we discuss the relationship of the present work with existing work.

Causal discovery
The main objective of causal discovery is to infer causal relations from experimental and observational data.The most common causal discovery algorithms can be roughly divided into score-based and constraint-based approaches.In this work, we will focus on the latter approach (examples include PC, FCI and variants thereof, see [1,2,4,5,7,9,10]), which exploits conditional independences in data to infer causal relations.We will first discuss assumptions for constraint-based causal discovery.We then consider the application of causal discovery algorithms to models with feedback.We proceed with a brief but concrete introduction to a simple local causal discovery algorithm.Finally, we discuss how the present work relates to existing literature.
Learning a graphical structure from conditional independence constraints typically relies on Markov and faithfulness assumptions relating conditional independences to properties of a graph.In particular, a d-separation is a relation between three disjoint sets of vertices in a graph that indicates whether all paths between two sets of vertices are d-blocked by the vertices in a third [3,4].If every d-separation in a graph implies a conditional independence in the probability distribution then we say that the distribution satisfies the directed global Markov property (or the d-separation criterion) w.r.t. that graph [20].Conversely, if every conditional independence in the probability distribution corresponds to a d-separation in a graph then we say that the distribution is d-faithful to that graph. 1 When a probability distribution satisfies the d-separation Markov property w.r.t. a graph and is also d-faithful to the graph, then this graph is a compact representation of the conditional independences in the probability distribution and we say that it encodes its independence relations.A lot of work has been done to understand the (combinations of) various assumptions (e.g.linearity, Gaussianity, discreteness, causal sufficiency, acyclicity, no selection bias) under which a graph that encodes all conditional independences and dependences in a probability distribution has a certain causal interpretation [see, e.g., 2, 4-10, 21, 22].Perhaps the simplest assumption is that the data was generated by a causal DAG [3,4]. 2 This graphical object represents both the conditional independence structure (the observational probability distribution is assumed to satisfy the directed global Markov property with respect to the graph) and the causal structure (every directed edge that is absent in the DAG corresponds with the absence of a direct causal relation between the two variables, relative to the set of variables in the DAG).For this acyclic setting, sophisticated constraint-based causal discovery algorithms (such as the PC and FCI algorithms [4,5]) have been developed.The key assumption that these algorithms make is that the same DAG expresses both the conditional independences in the observational distribution and the causal relations between the variables.
However, many systems of interest in various scientific disciplines (e.g.biology, econometrics, physics) include feedback mechanisms.Cyclic Structural Causal Models (SCMs) can be used to model causal features and conditional independence relations of systems that contain cyclic causal relationships [23].For linear SCMs with causal cycles, several causal discovery algorithms have been developed that are based on dseparations [2,6,21,22].The d-separation criterion is applicable to acyclic settings and to cyclic SCMs with either discrete variables or linear relations between continuous variables, but it is too strong in general [24].A weaker Markov property, based on the notion of σ-separation, has been derived for graphs that may contain cycles [23][24][25].If every σ-separation in a graph implies a conditional independence in the probability distribution then we say that it satisfies the generalized directed global Markov property (or the σ-separation criterion) w.r.t. that graph [25].Conversely, if every conditional independence in the probability distribution corresponds to a σ-separation in a graph then we say that it is σ-faithful to that graph.Richardson [26] already proposed a causal discovery algorithm that is sound under the generalized directed Markov property and the d-faithfulness assumption, assuming causal sufficiency.Recently, a causal discovery algorithm was proposed based on the σ-separation criterion and the assumption of σ-faithfulness that is sound and complete [8].It was also shown that, perhaps surprisingly, the PC and FCI algorithms are still sound and complete in that setting, although their output has to be interpreted differently [10].
For the sake of simplicity, we will limit our attention in this paper to one of the simplest causal discovery algorithms, Local Causal Discovery (LCD) [1].This algorithm is a straightforward and efficient search method to detect a specific causal structure from background knowledge and observational or experimental data.The algorithm looks for triples of variables (C, X, Y ) for which (a) C is a variable that is not caused by X, and (b) the following (in)dependences hold: C ⊥ ⊥ X, X ⊥ ⊥ Y , and C ⊥ ⊥ Y | X. Figure 1 shows all acyclic directed mixed graphs3 that correspond to the LCD triple (C, X, Y ), assuming d-faithfulness and no selection bias.They all have in common that there is no bidirected edge between X and Y , while there is a directed edge from X to Y .Hence, we can conclude that X causes Y and the two variables are not confounded.The algorithm was proven to be sound in both the σ-and d-separation settings even when cycles may be present [9].Even though this algorithm is not as sophisticated as many other causal discovery algorithms, it suffices for our exposition of the pitfalls of attempting causal discovery on data generated by a perfectly adaptive dynamical system.
All acyclic directed mixed graphs that form an LCD triple (C, X, Y ), assuming d-faithfulness.In the absence of latent confounders and selection bias, the structure can only be that of the directed acyclic graph on the left.
In this paper, we consider equilibrium distributions that are generated by dynamical models.The causal relations in an equilibrium model are defined through the effects of persistent interventions (i.e.interventions that are constant over time) on the equilibrium distribution, assuming that the system again converges to equilibrium.It has been shown that directed graphs encoding the conditional independences between endogenous variables in the equilibrium distribution of dynamical systems with feedback do not always have a straightforward and intuitive causal interpretation [12][13][14] (see also Appendix F).As a consequence, the output of causal discovery algorithms applied to equilibrium data of dynamical systems with feedback cannot always be interpreted causally in a naïve way.In this work, we will show that this not only happens in isolated cases, but that this is actually a common phenomenon in a large class of models with perfectly adapting feedback mechanisms.In our opinion, a better understanding of how perfectly adapting feedback loops affect the causal interpretation of the conditional independence structure is a prerequisite for successful applications of causal discovery in fields like systems biology, where one often encounters perfect adaptivity.One way to arrive at such understanding is through the application of the causal ordering algorithm, the topic of the next section.

Causal ordering
The causal ordering algorithm of Simon [11] returns an ordering of endogenous variables that occur in a set of equations, given a specification of which variables are exogenous.The algorithm was recently reformulated so that the output is a causal ordering graph that encodes the generic effects of certain interventions and a Markov ordering graph that represents conditional independences (both under certain assumptions regarding the solvability of the equations) [12].We refer readers that are not yet familiar with the causal ordering algorithm to [12] for a more extensive introduction to this approach.Here, we will provide only a brief description of the algorithm and discuss how its output should be interpreted.
First note that the structure of a set of equations and the variables that appear in them can be represented by a bipartite graph B = V, F, E , where vertices F correspond to the equations and vertices V correspond to the endogenous variables that appear in these equations.For each endogenous variable v ∈ V that appears in an equation f ∈ F there is an undirected edge (v − f ) ∈ E. The output of the causal ordering algorithm is a directed cluster graph V, E , consisting of a partition V of the vertices V ∪ F into clusters, and edges (v → S) ∈ E that go from vertices v ∈ V to clusters S ∈ V. Application of the causal ordering algorithm to a bipartite graph B = V, F, E results in the directed cluster graph CO(B) = V, E , which we will call the causal ordering graph [12].For simplicity, we assume here that the bipartite graph has a perfect matching (i.e.there exists a subset M ⊆ E of the edges in the bipartite graph B = V, F, E so that every vertex in V ∪ F is adjacent to exactly one edge in M ). 4 The causal ordering graph is constructed by the following steps: 51.Find a perfect matching M ⊆ E and let M (S) denote the vertices in V ∪ F that are joined to vertices in , where cl(f ) denotes the cluster in V that contains f .4. Exogenous variables appearing in the equations are added as singleton clusters to V, with directed edges towards the clusters of the equations in which they appear in E. 5. Output the directed cluster graph CO(B) = V, E .

Example 1. Consider the following set of equations with index set
where U w1 and U w2 are exogenous (random) variables indexed by W = {w 1 , w 2 }. Figure 2a shows the associated bipartite graph B = V, F, E .This graph has exactly one perfect matching }, which is used in step 2 of the causal ordering algorithm to construct the directed graph G(B, M ) in Figure 2b.The causal ordering graph that is obtained after applying steps 3 and 4 of the causal ordering algorithm is given in Figure 2c.Throughout this work, we will assume that sets of equations are uniquely solvable with respect to the causal ordering graph, which means that for each cluster, the equations in that cluster can be solved uniquely for the endogenous variables in that cluster (see Definition 14 in [12] for details).This implies amongst others that the endogenous variables in the model can be solved uniquely along a topological ordering of the causal ordering graph.Under this assumption, the causal ordering graph represents the effects of soft and certain perfect interventions [12].Soft interventions target equations; they do not change which variables appear in the targeted equation and may only alter the parameters or functional form of the equation.Perfect interventions target clusters in the causal ordering graph and replace the equations in the targeted cluster with equations that set the variables in the cluster equal to constant values.A soft intervention on an equation or a perfect intervention on a cluster has no effect on a variable in the causal ordering graph whenever there is no directed path to that variable from the intervention target (i.e. the targeted equation or an arbitrary vertex in the targeted cluster, respectively), see Theorems 20 and 23 in [12]. 6Since the equations in Example 1 are uniquely solvable w.r.t. the causal ordering graph in Figure 2c, 7 we can for example read off from the causal ordering graph that a soft intervention targeting f 1 may have an effect on X v2 (since there is a directed path from f 1 to v 2 ), and that a perfect intervention targeting the cluster {v 2 , f 2 } has no effect on X v1 (as there is no directed path from the cluster {v 2 , f 2 } to v 1 ).
Given the probability distribution of the exogenous random variables, one obtains a unique probability distribution on all the variables under the assumption of unique solvability w.r.t. the causal ordering graph.The Markov ordering graph is a directed acyclic graph MO(B) such that d-separations in this graph imply corresponding conditional independences between variables in this joint distribution, provided that all exogenous random variables are independent [12].The Markov ordering graph is obtained from a causal ordering graph CO(B) = V, E by constructing a graph V , E with vertices V = V and edges (v → w) ∈ E if and only if (v → cl(w)) ∈ E. The Markov ordering graph for the set of equations in Example 1 is given in Figure 2d.The d-separations in this graph imply conditional independences between the corresponding variables under the assumption that U w1 and U w2 are independent.For instance, since v 1 and w 2 are d-separated we can read off from the Markov ordering graph that X v1 and U w2 must be independent.
Assuming that the probability distribution is d-faithful to the Markov ordering graph and that we have a conditional independence oracle, we know that the output of the PC algorithm represents the Markov equivalence class of the Markov ordering graph [10]. 8However, while for acyclic systems, the directed edges in the Markov ordering graph can also be interpreted as direct causal relations, this is not the case for all systems [12,16].Several examples of this phenomenon are provided in Appendix F. In this work we will show that such discrepancy between the causal and the Markov structure is actually a common feature of perfectly adapted dynamical systems at equilibrium.

Related work
The causal ordering algorithm can be applied, for instance, to the equilibrium equations of a dynamical system to uncover its causal properties and conditional independence relations at equilibrium.The relationship between dynamical models and causal models has received much attention over the years.For instance, the works of [13,[30][31][32][33][34][35] considered causal relations in dynamical systems that are not at equilibrium, while [6,13,14,16,21,34,[36][37][38][39] considered graphical and causal models that arise from studying the stationary behavior of dynamical models.In particular, extensions of the causal ordering algorithm for dynamical systems were proposed and discussed in [13].Also, the causal ordering algorithm was applied in [16] to study the robustness of model predictions when combining two systems.The relationship between the causal semantics of a dynamical system before it reaches equilibrium and at equilibrium has also been studied [14,34,36].
In previous work, researchers have noted various problems when attempting to use a single graphical model to represent both conditional independence properties and causal properties of certain dynamical systems at equilibrium [14,21,37,39,40].Often, restrictive assumptions on the underlying dynamical models are made to avoid these subtleties-the most common one being to exclude the possibility of cycles altogether.In this work, we will not make such restrictive assumptions and instead show that such problems are pervasive in the important class of perfectly adapted dynamical systems.We follow [12,16] in addressing the issues by using the causal ordering algorithm to construct separate graphical representations for the causal properties and conditional independence relations implied by these systems.
It has been shown that the popular SCM framework [3,23] is not flexible enough to fully capture the causal semantics (in terms of perfect interventions targeting variables) of the dynamics of a basic enzyme reaction at equilibrium, and for that purpose [39] proposed to use Causal Constraints Models (CCMs) instead.However, CCMs lack a graphical representation (and consequently, all the benefits that usually come with it, like a Markov property and a graphical approach to causal reasoning).The techniques in [12] can also be used to construct graphical representations of causal relations and conditional independences of these models.In Appendix D, we demonstrate that the basic enzyme reaction is perfectly adapting and we show how the causal ordering technique can be used to obtain graphical presentations and a Markov property for this model.This approach offers several advantages over the CCMs approach to model this system.
An application area where obtaining a causal understanding of complex systems is often non-trivial due to feedback and perfect adaptation is that of systems biology.A research question that has seen considerable interest in that field is which reaction networks can achieve perfect adaptation [17,[41][42][43][44].The present work provides a method that facilitates the analysis of perfectly adapted dynamical systems by providing a principled and computationally efficient procedure to identify perfect adaptation either from model equations or from experimental data and background knowledge.
In Section 5, we apply our methodology in an attempt to arrive at a better understanding of the causal mechanisms present in protein signalling networks.For protein signalling networks, apparent 'causal reversals' have been reported, that is, cases where causal discovery algorithms find the opposite causal relation of what is expected [9,[45][46][47][48].One explanation for these reversed edges in the output of causal discovery algorithm could be the unknown presence of measurement error [49].As we demonstrate in this work, unknown feedback loops that result in perfect adaptation can be another reason why one might find reversed causal relations when applying causal discovery algorithms to biological data.

Perfect adaptation
In this section, we introduce the notion of perfect adaptation by taking a close look at several examples of dynamical systems that can achieve perfect adaptation.We then consider graphical representations of these systems both before and after they have reached equilibrium.This will set the stage for our main theoretical results regarding the identification of perfect adaptation in models or data, which will be presented in Section 4. The goals of this section are: (i) building intuition about mechanisms that result in perfect adaptation, (ii) outlining the relevance of this phenomenon in application domains, and (iii) explaining how the causal ordering algorithm helps to understand perfect adaptation.
The ability of (part of) a system to converge to its original state when a constant and persistent external stimulus is added or changed is referred to as perfect adaptation.If the adaptive behavior does not depend on the precise setting of the system's parameters then we say that the adaptation is robust.For our purposes, the most interesting of the two is robust perfect adaptation.As this is also often simply referred to as 'perfect adaptation' in the literature, we will do so here as well.

Examples
We consider three different dynamical models corresponding to a filling bathtub, a viral infection with an immune response, and a chemical reaction network.We use simulations to demonstrate that all of these systems are capable of achieving perfect adaptation.The details of the simulations presented in this section are provided in Appendix B, and code to reproduce these is provided by the authors under a free and open source license (see Data availability statement for details).

Filling bathtub
We consider the example of a filling bathtub of Iwasaki & Simon [13] (see also [12,14]).Let I K (t) be an input signal 9 that represents the size of the drain in the bathtub.The inflow rate X I (t), water level X D (t), water pressure X P (t), and outflow rate X O (t) are modelled by the following static and dynamic I + = 0.25 I + = 1.00I + = 10.0The input signal suddenly changes from an (constant) value for t < t 0 to a different (constant) value for t ≥ t 0 .The timing of this change is indicated by a vertical dashed line.The three systems started with input signals 6, and I − = 1.5, respectively.After a transient response, X O (t), X I (t), and X C (t) all converge to their original equilibrium value (i.e., they perfectly adapt to the input signal).equations:10 where g is the gravitational acceleration, and U I , U 1 , U 2 , U 3 , U 4 , U 5 are independent exogenous random variables taking value in R >0 .Let X D , X P , and X O denote the respective equilibrium solutions for the water level, water pressure, and outflow rate.The equilibrium equations associated with this model can easily be constructed by setting the time derivatives equal to zero and assuming the input signal I K (t) to have a constant value I K : We call the labelling f D , f P , f O that we choose for the equilibrium equations that are constructed from the time derivatives the natural labelling for this dynamical system. 11A solution (X I , X D , X P , X O ) to the system of equilibrium equations satisfies X I = U I and X O = X I almost surely.From this we conclude that, at equilibrium, the outflow rate is independent of the size of the drain I K , assuming that U I is independent of I K .We recorded the changes in the system after we changed the input signal I K of the bathtub system in equilibrium.The results in Figure 3a show that the outflow rate X O has a transient response to changes in the input signal I K ,12 but it eventually converges to its original value.We say that the outflow rate X O in the bathtub model perfectly adapts to changes in I K .

Viral infection model
We consider the example of a simple dynamical model for a viral infection and immune response of De Boer [15] (also discussed in [16]).The model describes target cells X T (t), infected cells X I (t), and an immune response X E (t).We will treat I σ (t) as an exogenous input signal that represents the production rate of target cells.The system is defined by the following dynamic equations: Here, β = bp c where b is the infection rate, p is the number of virus particles produced per infected cell, and c is the clearance rate of viral particles.Furthermore, d T is the death rate of target cells, a is an activation rate, d E and d I are turnover rates and k is a mass-action killing rate.We assume that a, k are constants and that d T , d I , d E , and β are independent exogenous random variables.We use the natural labelling for the equilibrium equations that are constructed from the differential equations: assuming a constant value I σ of the input signal.We initialized the model in an equilibrium state and simulated the response of the model after changing the input signal I σ to three different values.Figure 3b shows that the amount of infected cells X I (t) has a transient response to a change in the input signal I σ , but then returns to its original value.Hence, the amount of infected cells perfectly adapts to changes in the production rate of target cells.

Reaction networks with a negative feedback loop
The phenomenon of perfect adaptation is a common feature in biochemical reaction networks and there exist many reaction networks that can achieve (near) perfect adaptation [41,43].For networks consisting of only three nodes, Ma et al. [17] found by an exhaustive search that there exist two major classes of reaction networks that produce (robust) adaptive behavior.The reaction diagrams for these networks are given in Figure 4.Here we will only analyze the 'Negative Feedback with a Buffer Node' (NFBN) network.An analysis of the other network, the 'Incoherent Feed-forward Loop with a Proportioner Node' (IFFLP), is provided in Appendix E. The NFBN system can be described by the following first-order differential equations: where X A (t), X B (t), X C (t) are concentrations of three compounds A, B, and C, while I(t) represents an external input into the system.Assume that k IA , k CB , and k AC are independent exogenous random variables, that we will denote as U A , U B , U C respectively, and that the other parameters are constants.Perfect adaptation is achieved under saturation conditions [17], (1 , in which case the following approximation can be made: Under the assumption that I(t) has a constant value, the system converges to an equilibrium.We will denote the equilibrium equations that are associated with the time derivatives ẊA (t) and ẊC (t) using the natural labelling f A and f C .The equilibrium equation f B is obtained by setting the approximation of the time derivative ẊB (t) in ( 20) equal to zero.We initialized this model in an equilibrium state and then simulated its response after changing the input signal I to three different values.Figure 3c shows that X C (t) perfectly adapts to changes in the input signal I.The two reaction networks that can achieve perfect adaptation [17].Figure 4a shows Negative Feedback with a Buffer Node (NFBN), while Figure 4b shows an Incoherent Feed-forward Loop with a Proportioner Node (IFFLP).Orange edges represent saturated reactions, blue edges represent linear reactions, and black edges are unconstrained reactions.Arrowheads represent positive influence and edges ending with a circle represent negative influence.

Graphical representations
We will now provide the different graphical representations of the perfectly adapted dynamical systems that were introduced in the previous section.These representations are based on the graphs that are used in [12,13] to encode the equilibrium structure of equations, causal relations, and conditional independences.The main difference with previous work is that we also explicitly consider similar graphical representations for systems that have not yet reached equilibrium.

Bipartite graph:
The equilibrium bipartite graphs associated with the equilibrium equations of the filling bathtub, the viral infection, and the reaction network with feedback are given in Figures 5a, 5b, and 5c, respectively.We have added also a node representing the input signal.The dynamic bipartite graphs for the dynamics of these models are constructed from first-order differential equations in canonical form 14 in the following way.Both the derivative Ẋi (t) and the corresponding variable X i (t) are associated with the same vertex v i .The natural labelling is used for the differential equations, so that a vertex g i is associated with the differential equation for Ẋi (t).We then construct the dynamic bipartite graph B dyn = V, F, E with variable vertices v i ∈ V and the corresponding dynamical equation vertices g i ∈ F .Additional static equation vertices f i ∈ F are added as well in case the dynamical system consists of a combination of dynamic and static equations.The edge set E has an edge (v i − f j ) whenever X i (t) appears in the static equation f j .Additionally, there are edges (v i −g j ) whenever X i (t) or Ẋi (t) appears in the dynamic equation g j (which includes the cases i = j due to the natural labelling used).The dynamic bipartite graphs for the bathtub model, the viral infection, and the reaction network with feedback are given in Figures 5d, 5e, and 5f, respectively.
Comparing the equilibrium bipartite graphs with the dynamic bipartite graphs we note that there is no edge 5d.This is a direct consequence of the fact that the time derivative ẊD (t) in equation ( 4) does not depend on X D (t) itself.Similarly, the edges (v I − f I ) and (v E − f E ) are not present in Figure 5b whilst the edges (v I − g I ) and (v E − g E ) are present in Figure 5e.In this case, we see that even though the time derivatives ẊI (t) and ẊE (t) depend on X I (t) and X E (t) in differential equations ( 12) and ( 13), these variables do not appear in the associated equilibrium equations ( 15) and ( 16).Finally, there is no edge (v B − f B ) in Figure 5c while the edge (v B − g B ) is present in Figure 5f.Here, the variable X B (t) does not appear in the equilibrium equation under saturation conditions (20) that stems from the dynamic equation (18) for ẊB (t).The equilibrium bipartite graph can be compared to the dynamic bipartite graph to read off structural differences between the equations before and after equilibrium has been reached.
Causal ordering graph: Application of the causal ordering algorithm to the equilibrium bipartite graphs of the filling bathtub, the viral infection, and the reaction network results in the equilibrium causal ordering graphs in Figures 5g, 5h, and 5i, respectively.Henceforth, we will assume that the dynamic bipartite graph has a perfect matching that extends the natural labelling of the dynamic equations, i.e., such that all pairs (v i , g i ) are matched.Application of the causal ordering algorithm to the associated dynamic bipartite graph for the model of a filling bathtub, the viral infection model, and the reaction network results in the dynamic causal ordering graphs in Figures 5j, 5k, and 5l, respectively. 15s shown in [12], the absence (presence) of a directed path from an equation vertex to a variable vertex in the equilibrium causal ordering graph indicates that a soft intervention targeting a parameter in that equation has no (a generic) effect on the value of that variable once the system has reached equilibrium again.Furthermore, the absence (presence) of a directed path from a cluster to a variable vertex in the equilibrium causal ordering graph indicates that a perfect intervention targeting the cluster has no (a generic) effect on the value of that variable once the system has reached equilibrium again.Notice that the variables v i in the equilibrium causal ordering graph do not always end up in the same cluster with the equilibrium equation f i of the natural labelling.For example, we see in Figure 5g that a soft intervention targeting the equilibrium equation f O (e.g. a change in the value of U 5 ) does not affect the value of the outflow rate X O at equilibrium (since there is no directed path from f O to v O ), even though f O was obtained from the dynamic equation for the time derivative of the outflow rate X O (t).Similarly, targeting f I with a soft intervention in the viral infection model has no effect on X I at equilibrium and targeting f C in the reaction network model has no effect on the equilibrium distribution of X C . 16This suggests that the causal structure at equilibrium of perfectly adapted dynamical systems may differ from the transient causal structure.In the next section, we will use this idea to detect perfect adaptation from background knowledge and experimental data.
Reaction network.11), (12), and (13).Finally, for the reaction network with negative feedback, we let w A , w B , and w C represent the independent exogenous random variables k IA , k CB and k AC , respectively.The equilibrium Markov ordering graphs for the filling bathtub model, the viral infection model, and the model of a reaction network with a negative feedback loop are given in Figures 5m, 5n, and 5o respectively. 17These equilibrium Markov ordering graphs can be used to read off conditional independences in the equilibrium distribution that are implied by the equilibrium equations of the model.For example, since v I is d-separated from v D given v P in the equilibrium Markov ordering graph in Figure 5m, X I will be independent of X D given X P once the system has reached equilibrium.These independences can be tested for in equilibrium data by means of statistical conditional independence tests.These implied conditional independences can for instance be used in the process of model selection [16].

Existence and uniqueness of solutions
The causal ordering algorithm is a graphical tool that can be useful when solving a system of equations.It decomposes the question of existence and uniqueness of a 'global' solution into several 'local' existence and uniqueness problems corresponding to a partitioning of the equations.When a unique global solution exists for all possible joint values of the (independent) exogenous variables, this leads to both a causal semantics and to a Markov property [12].We argue here that these ideas can also be extended to include differential equations.We will illustrate this with the filling bathtub model.We start with the (conceptually simpler) equilibrium model, which solely contains static equations, before discussing what to do when dynamic equations are present.The equilibrium equations ( 7)-( 10) can be solved in steps by following the topological ordering of the clusters in the equilibrium causal ordering graph in Figure 5g.First, use f I to solve for X I , resulting in X I = U I .Then, use f D to solve for X O , which results in X O = X I .Subsequently, use f O to solve for X P , yielding X P = X O U5I K .Finally, use f P to solve for X D , resulting in X D = X P gU3 .By substitution, we obtain a global solution of the form Since we obtain a unique solution of each equation for the target variable in terms of the other variables appearing in the equation, this procedure shows that there exists a unique global solution of the system of equations for any value of the exogenous variables U I , U 1 , U 2 , U 3 , U 4 , U 5 and any value of the input signal I K .Because of this, we obtain both a causal interpretation and a Markov property for the filling bathtub model at equilibrium as described in Section 3.2.
For the dynamic filling bathtub model, we can follow a similar procedure, but now the clusters may also contain differential equations.We can make use of the theory for the existence and uniqueness of solutions of ordinary differential equations (ODEs).First note that the dynamic bipartite graph reflects the structure of the static and dynamic equations, once we rewrite the differential equations as integral equations.For example, for the time interval [t 0 , t]: Rewriting the differential equations as integral equations has two advantages: (i) there is no need to introduce the derivatives as if they were (variation) independent processes; (ii) it makes the dependence on the initial conditions X D (t 0 ), X P (t 0 ) and X O (t 0 ) explicit.The equations ( 3)-( 6) describing the dynamical system can be solved in steps by following the topological ordering of the clusters in the dynamic causal ordering graph in Figure 5j.First, solve f I for X I , resulting in X I (t) = U I .The cluster {g D , g P , g O , v D , v P , v O } has to be dealt with as a single unit, which means we have to solve the subsystem of three differential equations {g D , g P , g O } (that is, equations ( 4)-( 6)) for its solution with components (X D (t), X P (t), X O (t)).By applying the Picard-Lindelöf theorem [see, e.g., 50], one obtains that this subsystem has a unique solution on a time interval [t 0 , ∞) for any initial condition (X D (t 0 ), X P (t 0 ), X O (t 0 )), provided that X I (t) and I K (t) are continuous and that the input signal I K (t) is bounded.Thus, the equations ( 3)-( 6) have a unique global solution for any value of the exogenous variables U I , U 1 , U 2 , U 3 , U 4 , U 5 , any initial condition (X D (t 0 ), X P (t 0 ), X O (t 0 )), and any continuous and bounded input signal I K (t).The approach of [12] can in this way be extended to yield both a dynamic causal interpretation and a Markov property (by using the trick of [34] to interpret path-continuous stochastic processes as random variables). 18 Important to note here is that this explicit solution procedure shows that at equilibrium, the value of the input signal I K may affect the value of X D and X P , but cannot affect the values of X O and X I , while there can be transient effects of the input signal I K (t) on X D (t), X P (t) and X O (t), but not on X I (t).Furthermore, under appropriate local solvability conditions for each cluster, these observations can directly be read off from the (equilibrium and dynamic) causal ordering graphs.

Identification of perfect adaptation
In Section 3.1, we identified perfect adaptation in three simple models through simulations.Here, we consider how to identify models that are capable of perfect adaptation without requiring simulations or explicit calculations.In Section 4.1 we will put the graphical representations of Section 3.2 to use for identifying perfect adaptation in dynamical models.We discuss possibilities for the identification of perfect adaptation from equilibrium data in Section 4.2.

Identification of perfect adaptation via causal ordering
The identification of perfect adaptation via causal ordering makes use of the causal semantics of the equilibrium causal ordering graph.The following lemma states that a change in the input signal has no effect on the value of a variable if there is no directed path from the input vertex to that variable in the equilibrium causal ordering graph.

Lemma 1. Consider a model consisting of static equations, a set of first-order differential equations in canonical form, and an input signal. Assume that the equilibrium bipartite graph has a perfect matching and that the static equations and equilibrium equations derived from the first-order differential equations are uniquely solvable w.r.t. the equilibrium causal ordering graph for all relevant values of the input signal.
If there is no directed path from the input vertex to a variable vertex in the equilibrium causal ordering graph then the value of the input signal does not influence the equilibrium distribution of that variable.
Proof.The statement follows directly from Theorem 20 in [12].
To establish perfect adaptation, we assume that the presence of a directed path in the dynamic causal ordering graph implies the presence of a transient causal effect.
Assumption 1.Consider a model consisting of static equations, a set of first-order differential equations in canonical form, and an input signal.Assume that the dynamic bipartite graph has a perfect matching that extends the natural labelling.If there is a directed path from the input vertex to a variable vertex in the dynamic causal ordering graph, then there will be a response of that variable to changes in the input signal some time later.
Intuitively, this assumption may seem plausible, as the presence of the directed path in the dynamic causal ordering graph implies that the input signal enters into the construction of the solution of the variable, as sketched in Section 3.3.Unless a perfect cancellation occurs, one then expects a generic effect on the solution some time after the change in the input signal.Assumption 1 can be seen as a consequence of a certain faithfulness assumption. 19We conjecture that this assumption is generically satisfied for a large class of dynamical systems (for example, it might hold for almost all parameter values w.r.t. the Lebesgue measure on a suitable parameter space). 20y combining Lemma 1 and Assumption 1, we immediately obtain the following result.
Theorem 1.Consider a model that satisfies the conditions of Lemma 1 and assume that the associated dynamic bipartite graph has a perfect matching that extends the natural labelling.Under Assumption 1, the presence of a directed path from the input signal I to a variable X v in the dynamic causal ordering graph and the absence of such a path in the equilibrium causal ordering graph implies that X v perfectly adapts to changes in the input signal I if the system equilibrates.
Theorem 1 can be directly applied to the equilibrium and dynamic causal ordering graphs in Figure 5 to identify perfect adaptation.For example, we see that there is a directed path from the input signal I K to v O in the dynamic causal ordering graph of the filling bathtub in Figure 5j, while no such path exists in the equilibrium causal ordering graph in Figure 5g.It follows from Theorem 1 that X O perfectly adapts to changes in the input signal I K .This is in agreement with the simulation in Figure 3a.Similarly, we can verify that the amount of infected cells X I in the viral infection model perfectly adapts to changes in the input signal I σ and that X C perfectly adapts to I in the reaction network with negative feedback.Hence, perfect adaptation in the bathtub model, the viral infection model, and the reaction network with negative feedback can be identified by applying the graphical criteria in Theorem 1 to the respective causal ordering graphs.It is important to keep in mind, though, that what we can identify in this way is only the possibility of perfectly adaptive behavior, relying on the implicit assumption that the system will actually equilibrate.
In Appendix D we show that the sufficient conditions in Theorem 1 for the identification of perfect adaptation are not necessary.More specifically, we construct graphical representations for a dynamical model of a basic enzymatic reaction that achieves perfect adaptation but does not satisfy the conditions in Theorem 1.Interestingly, though, after rewriting the equations the perfectly adaptive behavior of these systems can be captured via Theorem 1. Further, in Appendix E we show that the biochemical reaction network in Figure 4b, which Ma et al. [17] identified as being capable of achieving perfect adaptation, does not satisfy the conditions in Theorem 1 either.We show that a change of variables enables one to still capture the perfectly adaptive behavior of this system via Theorem 1.

Identification of perfect adaptation from data
So far we have only considered how perfect adaptation can be identified in mathematical models.In this section we focus on methods for identifying perfect adaptation from data that is generated by perfectly adapted dynamical systems under experimental conditions.The most straightforward approach to detect perfect adaptation is to collect time-series data while experimentally changing the input signal to the system.One can then simply observe whether the variables in the system revert to their original values.However, this type of experimentation is not always feasible.Another way to identify feedback loops that achieve perfect adaptation uses a combination of observational equilibrium data, background knowledge, and experimental data.Our second main result, Theorem 2, gives sufficient conditions under which we can identify a system that is capable of perfect adaptation from experimental equilibrium data.

Theorem 2. Consider a set of first-order dynamical equations in canonical form for variables V , satisfying the conditions of Theorem 1, with equilibrium equations F under the natural labelling. Consider a soft intervention targeting an equation f i ∈ F . Assume that the system is uniquely solvable w.r.t. the equilibrium causal ordering graph both before and after the intervention and that the intervention alters the equilibrium distribution of all descendants of f i in the equilibrium causal ordering graph. If either 1. the soft intervention does not change the equilibrium distribution of X i , or 2. the soft intervention alters the equilibrium distribution of a variable corresponding to a non-descendant of v i in the equilibrium Markov ordering graph, (or both), then the system is capable of perfect adaptation.
Proof.The proof is given in Appendix A.
Theorem 2 applies in particular to experiments on the filling bathtub, viral infection, and chemical reaction systems (for the corresponding graphs, see Figure 5).For example, a soft intervention targeting f O in the bathtub example has no effect on the outflow rate at equilibrium X O , because there is no directed path from f O to v O in Figure 5g, and an intervention targeting f C has no effect on the equilibrium concentration X C in the reaction network because there is no directed path from f C to v C in Figure 5i.In both cases the first condition of Theorem 2 is satisfied.For the viral infection model, we see that a soft intervention targeting f E has an effect on the amount of infected cells X I at equilibrium (since there is a directed path from f E to v I in Figure 5h), while there is no directed path from v E to v I in Figure 5n.In this case the second condition of Theorem 2 is satisfied.
We can devise the following scheme to detect perfectly adapted dynamical systems from data and background knowledge.The procedure relies on several assumptions, including d-faithfulness of the equilibrium distribution to the equilibrium Markov ordering graph.We start by collecting observational equilibrium data and use a causal discovery algorithm (such as LCD or FCI) to learn a (partial) representation of the equilibrium Markov ordering graph, assuming the observational distribution at equilibrium to be d-faithful w.r.t. the equilibrium Markov ordering graph.We then consider a soft intervention that changes a known equation in the first-order differential equation model (i.e. it targets a known equilibrium equation).If this intervention does not change the distribution of the variable corresponding to this target using the natural labelling, or if it changes the distribution of identifiable non-descendants of the variable corresponding to the target according to the learned Markov equivalence class, we can apply Theorem 2 to identify the dynamical system as being capable of perfect adaptation.This way, we could identify perfect adaptation in specific cases such as the filling bathtub, viral infection, and reaction network by exploiting a combination of background knowledge and experimental data.Another example of a possible application of Theorem 2 is given in Section 5.3.

Perfect adaptation in protein signalling
In this section we apply the ideas developed in the previous sections to a biological system that has been intensely studied during the past decades to emphasize the practical relevance of perfect adaptation.The so-called RAS-RAF-MEK-ERK signaling cascade is a text-book example of a protein signalling network, which forms an important ingredient of the 'machinery' of cells in living organisms.The molecular pathways in such a network fulfill various important functions, for instance the transmission and processing of information.Systems biologists make use of dynamical systems to model such networks both qualitatively and quantitatively.Because of the high complexity of protein signalling networks, which typically consist of many different interacting components, this has also been considered a promising application domain for causal discovery methods.
In an influential paper, Sachs et al. [18] applied causal discovery to reconstruct a protein signaling network from experimental data.Over the years, the dataset of [18] has become an often used 'benchmark' for assessing the accuracy of causal discovery algorithms, where the 'consensus network' in [18] is usually considered as the perfect ground truth.The apparent successes of causal discovery on this particular dataset may have led to the impression that causal discovery algorithms can in general successfully discover the causal semantics of complex protein signaling networks from real-world data.However, this success has hitherto not been repeated on other, similar datasets, to the best of our knowledge.Indeed, modeling and understanding such systems and inferring their behavior and structure from data still poses many challenges, for instance because of feedback loops and the inherent dynamical nature of such systems [52].
In this section, we focus on understanding the properties of the equilibrium distribution of a simple model of the RAS-RAF-MEK-ERK signalling pathway, and specifically investigate the phenomenon of perfect adaptation.Like many other biological systems, protein signalling networks often show adaptive behavior which helps to ensure a certain robustness of their functionality against various disturbances and perturbations [43].Using the technique of causal ordering to analyze the conditional independences and causal relations that are implied by the model at equilibrium, we elucidate the causal interpretation of the output of constraint-based causal discovery algorithms when they are applied to equilibrium protein expression data if the parameters are such that the system shows perfect adaptation.
We test some of the model's predictions on real-world data and compare with another model that has been proposed.We do not claim that the perfectly adaptive model that we analyze here is a realistic model of the protein signalling pathway.Although we will show in Section 5.4 that the model is able to explain certain observations in real-world data, this is not that surprising for a model with that many parameters. 21nstead, our goal is to demonstrate that in systems with perfect adaptation the standard interpretation of the output of causal discovery algorithms may not apply. 22This could explain why the output of certain causal discovery algorithms applied to the data of [18] appears to be at odds with the biological consensus network presented in [18], see for example [47] and [9].This section is structured as follows.In Section 5.1 we introduce the perfectly adaptive model for the signalling pathway.We proceed with the associated graphical representations in Section 5.2.Then, in I + = 0.9

Graphical representations
We consider graphical representations of the protein signalling pathway.Using the natural labelling, we construct the dynamic bipartite graph in Figure 7a from the first-order differential equations, with the input signal I included.The associated dynamic causal ordering graph is given in Figure 7b.Under saturation conditions, the equilibrium equations f s , f r , f m , and f e obtained by setting equations ( 25), (27), and ( 29) to zero have the bipartite structure in Figure 7c.Note that there is no edge (f e −v e ) in the equilibrium bipartite graph because X E (t) does not appear in the approximation ( 29) of ( 28).The associated equilibrium causal ordering graph is given in Figure 7d, where the cluster {I} is added with an edge towards the cluster {v e , f s } because I appears in equation ( 25) and in no other equations.So far we have treated all symbols in equations ( 25), ( 26), (27), and ( 28) as deterministic parameters.Let w s , w r , w m , and w e represent independent exogenous random variables appearing in the equilibrium equations f s , f r , f m , and f e respectively.After adding them to the causal ordering graph with edges to their respective clusters we construct the equilibrium Markov ordering graph for the equilibrium distribution in Figure 7e.There is a directed path from the input vertex I to v s , v r , and v m in the dynamic causal ordering graph (see Figure 7b), while the equilibrium causal ordering graph has no directed paths from I to either v s , v r , or v m (see Figure 7d).Under Assumption 1, we can apply Theorem 1 to conclude that X s , X r , and X m will perfectly adapt to a persistent change in the input signal I.This is in line with what we observed in the simulations (see Figure 6).
The d-separations in the equilibrium Markov ordering graph (see Figure 7e) imply conditional independences between the corresponding variables at equilibrium.For example, from the graph in Figure 7e we read off the following (implied) conditional independences: We verified that these conditional independences indeed appear in the simulated equilibrium distribution of the model (see Appendix C for details).

Inhibiting the activity of MEK
A common biological experiment that is used to study protein signalling pathways is the use of an inhibitor that decreases the activity of a protein on the pathway.Such an inhibitor slows down the rate at which the active protein is able to activate another protein.Here, we consider inhibition of MEK activity.We can model this as a change of the parameters of the differential equations in which X m (t) appears.We can interpret this experiment as a soft intervention on differential equation g e in the dynamic model and on equation f e at equilibrium, decreasing the rate k me at which ERK is activated.Since there is a directed path from f e to v m , v r , v s , and v e in the equilibrium causal ordering graph in Figure 7d, expect that a change in an input signal I e on f e (e.g. a change in the parameter k me in the case of the MEK inhibition) affects the equilibrium concentrations of active MEK, RAF, RAS, and ERK.
We assessed the effect of decreasing the activity of MEK on the equilibrium concentrations of RAS, RAF, MEK, and ERK.To that end, we simulated the perfectly adapted model (with parameters as described in Appendix B, in particular, k me = 1.0) until it reached equilibrium.We then decreased the parameter that controls the activity of MEK to k me = 0.98.The recorded responses of the concentrations of active RAS, RAF, MEK, and ERK are displayed in Figure 8. From this we confirm our prediction that inhibition of MEK activity affects the equilibrium concentrations of RAS, RAF, MEK, and ERK.A qualitative aspect of this change that one cannot read off from the graph is that the MEK inhibition increases (rather than decreases) the concentrations of active MEK, RAF, RAS according to the model for this choice of the parameters.
Note that RAS, RAF, and MEK are non-descendants of ERK in the equilibrium Markov ordering graph in Figure 7e, so that under the assumptions in Theorem 2 we could actually use this experiment to detect perfect adaptation in the protein pathway.

Testing model predictions on real-world data
In this subsection, we verify some of the predictions of the model we obtained in Sections 5.2 and 5.3 on real-world data.We will compare with predictions of the causal Bayesian network model proposed by [18].
Figure 9 shows scatter-plots for the (logarithms of) the expressions of active RAF, MEK and ERK in the multivariate single-cell protein expression dataset that was used in [18], for three (out of 14) different experimental conditions.The baseline condition (in blue) is the one where the cells were treated with anti-CD3 and anti-CD28, activators of the RAS-RAF-MEK-ERK signalling cascade.In another condition (in red), the cells were additionally exposed to U0126, a known inhibitor of MEK activity.By inspecting the scatter plots, we get a quick visual check of some of the predictions of the model.In particular, these plots clearly show that inhibition of MEK activity by administering U0126 results in an increase in the concentrations of active RAF and active MEK and a reduction in the concentration of active ERK.Furthermore, we clearly see a strong dependence between RAF and MEK (in both experimental conditions), but there is no discernible dependence between RAF and ERK or between MEK and ERK (in either experimental condition).In the light of the supposedly direct effect of MEK on ERK, it is surprising that the data shows no significant dependence between the two. 24This apparent 'faithfulness violation' is problematic for constraint-based causal discovery methods, as they will typically not identify the causal relation between MEK and ERK.Scatter plots of the logarithms of active RAF, MEK, and ERK concentrations for the data in [18].The blue circles correspond to cells treated only with anti-CD3 and anti-CD28, which activate the signaling cascade.The red circles correspond to cells treated with anti-CD3, anti-CD28 and in addition, the MEK-activity inhibitor U0126.The inhibition of MEK results in an increase of MEK and RAF, whereas ERK is reduced.The black circles correspond to cells treated with β2cAMP (but not anti-CD3 and anti-CD28), which seems to affect MEK, but leaves RAF and ERK invariant.
According to [18], the biological consensus (at the time) was that there is a signalling pathway from RAF to MEK to ERK. 25 They propose to model this as a causal Bayesian network RAF → MEK → ERK, Table 1.For various observations in the data of [18], we indicate whether they are predicted by the causal Bayesian network model RAF → MEK → ERK proposed by [18] and by our perfectly adaptive equilibrium model (Section 5.2).24 In addition, ERK levels are considerably lower than those of RAF and MEK.This might be something specific to this experimental setting (perhaps there was too much time between stimulation and measurement to see the ERK response), as in other experiments high ERK levels and strong correlations with MEK have been observed [53].In the setting of the experiment of [18], the G06976 treatment condition shows that ERK levels can actually get as high as those of RAF and MEK. 25 Nowadays, the biological consensus appears to be that RAF activates MEK, MEK activates ERK, and that it is very likely that there is negative feedback from ERK to RAF, although the molecular pathway of this feedback remains unknown [54].

Observation
and this is also the structure identified by their causal discovery algorithm.That model predicts that inhibiting MEK activity can only affect ERK.In Table 1, we summarize some of the observations made using the scatter plots in Figure 9, and whether or not they are in line with the predictions of the models (our perfectly adaptive model on the one hand, and the causal Bayesian network model on the other hand).
We conclude that the predictions of the perfectly adaptive model are more in agreement with the data than those of the causal Bayesian network model.Still, the perfectly adaptive model does not explain all aspects of the data.For example, the effects of the β2cAMP stimulation (black circles in Figure 9) are hard to explain with either model.β2cAMP is an AMP analogue that is supposed to activate the RAS-RAF-MEK-ERK cascade by promoting active RAS.It seems counterintuitive that this would lead to a strong reduction of active MEK in comparison to the activation of the cascade by means of anti-CD3 and anti-CD28, while leading to the same levels of active RAF and ERK.For completeness, in Table 2 we have indicated for all 12 perturbations in [18] the effects on the levels of active RAF, MEK and ERK.It appears questionable whether a simple causal model (such as the perfectly adaptive model) could account for all the observed perturbation effects.
Table 2. Qualitative effects of reagents on the measured abundances of active RAF, MEK and ERK, as read off from the data in [18].Legend: −− strong decrease, − decrease, (−) slight decrease, 0 no change, (+) slight increase, + increase, ++ strong increase.Most researchers only use conditions 1-9 for causal discovery.The data appears to contain an error: RAF and MEK values are identical for the first 848 samples in conditions 3 and 7; therefore, we disregarded those values.

Caveats for causal discovery
Experiments in which the protein signalling network is perturbed in various ways are of crucial importance to obtaining a causal understanding of the system.While very sophisticated causal discovery algorithms are available, we will here illustrate the key concepts by means of applying one of the simplest causal discovery algorithms based on conditional independences to equilibrium data from the model.We simulate the system in two different conditions, a baseline and a condition where the activity of MEK has been inhibited.Consider observational equilibrium data from the protein signalling pathway model and also experimental equilibrium data from a setting where MEK activity is inhibited.We introduce a context variable C that in this case simply indicates for each sample whether or not a MEK inhibition was applied.Because the decision whether to apply the MEK inhibitor to a sample occurs before the measurement, the equilibrium concentrations measured afterwards cannot causally affect this decision.Hence C is an exogenous variable, i.e., it is not caused by other observed variables.This motivates the use of the LCD algorithm, which requires the existence of such a variable.For the MEK inhibition, the context variable represents a (soft) intervention on the equation f e in the equilibrium causal ordering graph in Figure 7d.The equilibrium Markov ordering graph that includes the context variable C (but where the independent exogenous random variables have been marginalized out) is given in Figure 10a.To construct this graph, the context variable C is first added to the equilibrium causal ordering graph in Figure 7d as a singleton cluster with an edge towards the cluster {v m , f e }.The equilibrium Markov ordering graph is then constructed from the resulting directed cluster graph in the usual way.From Figure 10a, we can read off (conditional) independences to find the LCD triples that are implied by the equilibrium equations of the model.We find that our model implies the following LCD triples: and  (C, v s , v e ).We verified this by explicit simulation (details in Appendix C).
In particular, one of the LCD triples we obtained is (C, v m , v r ), whereas the triple (C, v r , v m ) does not qualify as such.According to the standard interpretation of causal discovery algorithms, this would imply that MEK causes RAF rather than the other way around (see also Section 2.1), a surprising finding in the light of the text-book treatments of the RAS-RAF-MEK-ERK signalling pathway, and the 'consensus network' according to [18].The equilibrium Markov ordering graph corresponding to the causal Bayesian network model of [18] is shown in Figure 10b.As one can see from Figure 10b, the causal Bayesian network model implies no LCD triples at all.
Apparently, the causal relationship RAF→MEK appears to be reversed in the LCD triples found in the equilibrium data of the perfectly adaptive model.Similar observations of apparent 'causal reversals' in protein interaction networks have been observed more often, see also [9,[45][46][47][48].We conclude that the mechanism of perfect adaptation provides one possible theoretical explanation of what might seem at first sight to be an incorrect reversal of a causal edge.
We investigated which of these LCD patterns can be found in the real-world data.In Table 3 we list the p-values of conditional independence tests applied to the data of [18]. 26Using a reasonable critical level for the test (say α = 0.01), we do not find any LCD pattern.Yet, the conditional dependence of RAF on the context variable when conditioning on MEK is much weaker than the conditional dependence of MEK on the context variable when conditioning on RAF.Strictly speaking, no conclusions should be drawn from this, but it does seem to suggest that also here the data is more in line with the perfectly adaptive model than with the causal Bayesian network model; indeed, if the adaptation is imperfect, for example because Table 3. Results of conditional independence tests on the (log-transformed) protein expression data of [18].Specifically, we report the p-values of Kendall's test for partial correlation.

Discussion and conclusion
Perfect adaptation is the phenomenon that a dynamical system initially responds a change of input signal but reverts back to its original value as the system converges to equilibrium.We used the technique of causal ordering to obtain sufficient graphical conditions to identify perfect adaptation in a dynamical system described by a combination of equations and first-order differential equations.To represent the structure of the (non-equilibrium) dynamical system, we introduced the notions of the dynamic bipartite graph and the corresponding dynamic causal ordering graph obtained by the causal ordering algorithm.Moreover, we showed how perfect adaptation can be detected in equilibrium observational and experimental data for soft interventions with known targets.We illustrated our ideas on a variety of dynamical models and corresponding equilibrium equations.We believe that the methods presented in this work provide a useful tool for the characterization of a large class of dynamical systems that are able to achieve perfect adaptation and for the automated analysis of the behavior of certain perfectly adapted dynamical systems.
In all examples that we discussed, the technique of causal ordering revealed the structure of a given set of equations.In some cases, more structure can be revealed by first rewriting the equations before applying the causal ordering algorithm.In Appendix D we analyze a dynamical system describing a basic enzyme reaction, for which rewriting of the equilibrium equations reveals more structure (and hence yields a stronger Markov property).In Appendix E, we analyze the 'Incoherent Feed-forward Loop with a Proportioner Node' (IFFLP).We observe that a nonlinear transformation of the variables (and rewriting the equations correspondingly) reveals more structure, and hence yields more conditional independences and less causal relations amongst the transformed variables.The further development of methods to analyze perfectly adapted dynamical systems that do not satisfy the conditions of Theorem 1 remains a challenge for future work.Also, the question of how one can more generally discover causal structure using nonlinear transformations of the variables is an interesting topic for future research that relates to what is nowadays known as 'causal representation learning' in machine learning [55].
We also investigated the consequences of the phenomenon of perfect adaptation for causal discovery.We demonstrated that for perfectly adapted dynamical systems the output of existing constraint-based causal discovery algorithms applied to equilibrium data may appear counterintuitive and at odds with our understanding of the mechanisms that drive the system.As we have illustrated in this work, careful application of the causal ordering algorithm enables a better theoretical understanding of these phenomena.
We applied our approach to a model for a well-known protein signalling pathway, and tested the model's predictions both in simulations and on real-world protein expression data.The challenges for causal discovery that are encountered in non-linear dynamical systems with feedback loops, possibly leading to context-specific perfectly adaptive behavior, are substantial.If the behavior of the model that we analyzed in Section 5 is representative of that of actual systems occurring in vitro and in vivo, then it seems unlikely that existing causal discovery methods based on causal Bayesian networks will lead to reasonable results.These observations further motivate the development of causal discovery algorithms based on bipartite graphical representations that would be more widely applicable than the existing ones based on causal Bayesian networks or simple structural causal models.perfect matching M of the equilibrium bipartite graph let v j = M (f i ).Then v j is in the same cluster as f i in the equilibrium causal ordering graph by construction.Note that j = i would give a contradiction, as then v i would be an ancestor of v h in the equilibrium Markov ordering graph.Suppose that the vertex f j , which is associated with v j through the natural labelling, is matched to a non-ancestor of v j in the equilibrium causal ordering graph.Because of the edge (g j − v j ) in the dynamic bipartite graph, it follows from Theorem 1 that X j perfectly adapts to an input signal I fj on f j .Therefore the system is able to achieve perfect adaptation.Now suppose that f j is matched to an ancestor v k of v j in the equilibrium causal ordering graph, and consider the vertex f k .The previous argument can be repeated to show perfect adaptation for X k is present when f k is matched to a non-ancestor of v k in the equilibrium causal ordering graph.Otherwise, f k must be matched to an ancestor of v k .Note that the ancestors of v k are a subset of the ancestors of v j , which in turn are a subset of the ancestors of v i .In a finite system of equations, v i has a finite set of ancestors and therefore we eventually find, by repeating our argument, a vertex f m that cannot be matched to an ancestor of v m because v m has no ancestors that are not matched to one of the vertices f i , f j , f k , . . .that were considered up to that point.Because f m is matched to a non-ancestor we then find that X m perfectly adapts to an input signal on I fm as before.

B Simulation settings
For the simulations in Figures 3, 6 and 8 of the model of a filling bathtub, the viral infection model, the reaction network with a feedback loop, and the protein pathway we used the settings listed below.Since we only simulated a single response, we used constant values for the exogenous random variables as well.
Filling bathtub First we recorded the behavior of the system for the parameters 0 until it reached equilibrium.We then changed the input parameter I K to 0.8, 1.0, and 1.3 and recorded the response until the system reverted to equilibrium.Viral infection For the parameter settings I σ = 1.6, d T = 0.9, β = 0.9, d I = 0.3, k = 1.5, a = 0.1, d E = 0.25, we simulated the model until it reached equilibrium.We changed the input parameter I σ to 1.1, 1.3, and 2.0 and recorded the response until equilibrium was reached.Reaction network We simulated the model until it reached equilibrium with parameters 5, k BC = 0.7, K BC = 0.6.The settings were chosen in such a way that the saturation conditions (1 − X B (t)) K CB and X B (t) K F B B were satisfied.We then changed the input signal to 0.25, 1.0, and 10.0 and recorded the response.Protein pathway The parameter settings of the simulation were .0001, T e = 1.0,F e = 0.7, k Fee = 1.2,K Fee = 0.0001.This ensured that the saturation conditions (T e − X e (t)) K me and X e (t) K Fee were satisfied.For Figure 6, we changed the input signal I to 0.9, 1.1, 1.2, respectively, after the system had reached equilibrium, and continued the simulation.For Figure 8 we reduced the parameter value of k me to 0.98 at some point in time after the system had equilibrated.
The same qualitative behavior as reported here can be observed for a range of parameter values.The behavior of the protein pathway is rather complex; in particular, X e (t) may show a switch-like behavior (either taking on a value close to 0, or close to T e ), and therefore some parameter tuning is required if one wants to ensure that the assumed saturation conditions ((T e − X e (t)) K me and X e (t) K Fee ) hold.

C Conditional independences and causal discovery
The equilibrium Markov ordering graph in Figure 7e was derived from the equilibrium equations of the protein pathway model under saturation conditions.From this we can read off the following d-separations: It is easy to check that the equilibrium equations and endogenous variables in this model are uniquely solvable w.r.t. the causal ordering graph.Therefore, the d-separations above imply conditional independences between the variables in the model, assuming the exogenous variables to be independent.
To test whether the predicted conditional independences hold when the system is at equilibrium, we ran the simulation n = 500 times until it reached equilibrium and recorded the equilibrium concentrations X s , X r , X m , and X e .Parameters were as described in Appendix B, except that k Is , k sr , k rm , k me were all drawn randomly from a uniform distribution on [1.0, 1.1], and the input signal I was drawn randomly from a uniform distribution on [0.9, 1.1].We tested all (conditional) independences with a maximum of one conditioning variable using Spearman's rank correlation test with a p-value threshold of 0.01. 27Table 4 shows that the conditional independences with a maximum conditioning set of size one that are implied by the equilibrium Markov ordering graph are indeed present in the simulated data.
To verify the predicted LCD triples of Section 5.5, we ran the simulation n = 500 times with k me = C as the context variable.Parameters were as described in Appendix B, except that I, k Is , k Fss , k sr , k rm , k me were all drawn randomly from a uniform distribution on [1.0, 1.1], and the parameter k Fee from a uniform distribution on [1.2, 1.3].Note that some parameter tuning is necessary if one wants to ensure that the equilibrium state satisfies the saturation conditions (T e − X e (t)) K me and X e (t) K Fee .On the other hand, the parameters need to have sufficient random variation, otherwise LCD may fail because of (almost) deterministic relations between variables.We ran the simulations until the system reaches equilibrium and recorded the equilibrium values of the variables.We then applied the LCD algorithm to search for LCD triples in this equilibrium data with context variable C = k me .For the conditional independence tests we used Spearman's rank correlation with a p-value threshold of 0.01.We found the expected LCD triples (C, v m , v r ), (C, v m , v s ), (C, v m , v e ), (C, v r , v s ), (C, v r , v e ), (C, v s , v e ) and no others.

D Rewriting equations may reveal additional structure
Theorem 1 specifies sufficient but not necessary conditions for the presence of perfect adaptation.The equilibrium distribution of some systems is not faithful to the equilibrium Markov ordering graph associated with the equilibrium equations of the model.Here, we will discuss a dynamical model for a basic enzymatic reaction and we will demonstrate that this model is capable of perfect adaptation.However, it does not satisfy the conditions in Theorem 1, and the presence of directed paths in the equilibrium causal ordering graph does not imply the presence of a causal effect at equilibrium.We will also show that this may be addressed by appropriately rewriting the equations.
The basic enzyme reaction models a substrate S that reacts with an enzyme E to form a complex C, which is converted into a product P and the enzyme E. The dynamical equations for the concentrations X S (t), X E (t), X C (t), and X P (t) are given by: where k −1 , k 0 , k 1 , k 2 , k 3 are parameters taking value in R >0 [39,56].We treat the initial conditions X S (0) = S 0 , X C (0) = C 0 , X E (0) = E 0 , and X P (0) = P 0 as independent exogenous random variables, and consider the parameter k 1 as input signal.
Figure 11a shows the dynamic bipartite graph corresponding to the dynamic equations, and Figure 11b the corresponding dynamic causal ordering graph.Since there is a directed path from k 1 to v P in Figure 11b, we would expect that a change in k 1 would generically lead to a response of X P (t).We verified this by simulating this model with k 5 and with initial conditions S 0 = 0.45, E 0 = 0.5, C 0 = 1.25, and P 0 = 0.40 until the system reached equilibrium.We then recorded the response of X P (t) after changing the input signal k 1 into different values k + 1 .Figure 11c shows that X P (t) indeed responds to changes in the input signal k 1 , but eventually returns to its original equilibrium value.31), (32), and (33).Panel (c) shows simulations that suggest that the concentration X P perfectly adapts after an initial transient response to a persistent change in the parameter k 1 .35), ( 36), (37), and (38).Panels (c) and (d) show the equilibrium bipartite graph and the equilibrium causal ordering graph, respectively, for the rewritten equilibrium equations (34), ( 37), (38) and (39).Rewriting the equilibrium equations yields a sparser equilibrium bipartite graph, and therefore more structure is revealed in the equilibrium causal ordering graph.
The equilibrium equations of the model are given by: where the last equation is derived from the constant of motion X C (t)+X E (t) (see [39] for more details).The equilibrium bipartite graph (Figure 12a) does not have a perfect matching (as it contains more equations than endogenous variables), but we can apply the extended causal ordering algorithm [12] to construct the equilibrium causal ordering graph in Figure 12b.There is a directed path from k 1 to v P in the equilibrium Markov ordering graph.Therefore, even though the basic enzyme reaction does achieve perfect adaptation, we see that it does not satisfy the conditions of Theorem 1.The basic enzyme reaction is an example of a system for which directed paths in the equilibrium causal ordering graph do not imply generic causal relations between variables.By rewriting the equilibrium equations we can achieve stronger conclusions for this particular case.For instance, we can consider the equation f C , obtained from summing equations f S and f C : in combination with f S , f P , and f CE .The equilibrium equations f C and f E can be dropped because they are linear combinations of the other equations.The graphs corresponding with these rewritten equations are shown in Figure 12c-d.The equilibrium bipartite graph for the rewritten equations in Figure 12c turns out to be sparser than the one for the original equilibrium equations in Figure 12a.The equilibrium causal ordering graph in Figure 12d for the rewritten equilibrium equations consequently reveals more structure than the one in Figure 12b for the original equilibrium equations.The two equilibrium causal ordering graphs do not model the same set of perfect interventions.For example, the (non)effects of an intervention targeting the cluster {v S , f S } in the equilibrium causal ordering graph in Figure 12d (where f S is replaced by an equation v S = ξ S setting v S equal to a constant ξ S ∈ R >0 ) cannot be read off from the equilibrium causal ordering graph in Figure 12b.Furthermore, there is no  47), (48), and (49) and variables X A , X R , X C .

F Markov ordering graphs have no inherent causal interpretation
The causal interpretation of the equilibrium Markov ordering graph for the bathtub model is discussed at length in [12].The conclusion is that the Markov ordering graph alone does not contain enough information to read off the effects of interventions in an unambiguous way. 28As a result, the Markov ordering graph does not have a straightforward causal interpretation in terms of interventions, contrary to what is sometimes claimed [13,14].For the sake of completeness we will summarize here the discussion of the causal interpretation of the equilibrium Markov ordering graph of the bathtub model.

Example 2.
For the bathtub model (Section 3.1.1),consider an intervention targeting the dynamical equation g D that also changes the associated equilibrium equation f D .For example, consider putting the bathtub outside in the rain.This will not change the inflow through the faucet, but will add to the total amount of in-flowing water into the tub, and can be modeled by modifying equation ( 4) into: where U R (t) is a new exogenous variable that quantifies the amount of in-flowing water due to the rain (in the original model, U R (t) = 0).Through explicit calculations, it can be shown that this soft intervention has an effect on X O , X P , and X D at equilibrium. 29If we were to read off the effects of a soft intervention targeting f D by verifying whether there is a directed path from v D in the equilibrium Markov ordering graph, we would erroneously conclude that this intervention only has an effect on X D .In a similar fashion it can be shown that the graph does not represent the effects of perfect interventions targeting {f D , v D } or {f D , v O } either.From the equilibrium causal ordering graph one can read off that the absence and presence of (generic) effects of the perfect intervention {f P , v D } correspond with the absence and presence of directed paths in the equilibrium Markov ordering graph starting from v D .Because the equilibrium Markov ordering graph alone does not specify to which experimental procedure a perfect intervention on one of its vertices corresponds to, we conclude that the equilibrium Markov ordering graph on its own does not possess a causal interpretation.
A correct interpretation of a directed edge (v i → v j ) in the equilibrium Markov ordering graph would be that an intervention targeting equations in the cluster of v i in the causal ordering graph generically will have an effect on the equilibrium distribution of v j .However, the Markov ordering graph itself does not specify the variables and equations that share a cluster with the variables in the causal ordering graph.In many dynamical systems, though, the equilibrium equations f i derived from differential equations for variables X i (t) end up in the same cluster as the associated variable v i .Under such an assumption, one could give an unambiguous causal interpretation to the Markov ordering graph in terms of perfect interventions. 30 However, there is another obstacle to interpreting a directed edge (v i → v j ) in the Markov ordering graph as a direct causal effect.This can go wrong in case causal cycles are present.Indeed, the Markov ordering graph for a simple SCM is the acyclification of its graph [23].Therefore, if v j is part of a feedback loop together with v k , and the structural equation of v k depends on v i , the Markov ordering graph will contain the directed edge v i → v j even if there is no direct effect of v i on v j (that is, if the structural equation for v j does not depend on v i ).
Therefore, if one cannot rule out the possibility of feedback, one should avoid reading the Markov ordering graph as if the directed edges represent direct causal effects (and as if directed paths represent causal effects).
30 For first-order dynamical systems in which each variable is self-regulating, each variable vi shares a cluster with the equilibrium equation fi associated with the time derivative Ẋi(t) [16].Note that the examples of perfectly adaptive systems that we have considered in this work do contain variables that are not self-regulating.

Figure 2 .
Figure 2. Several graphs occurring in Example 1.The bipartite graph B associated with equations (1) and (2) is given in Figure 2a.The oriented graph G(B, M ) obtained in step 2 of the causal ordering algorithm (with perfect matching M ) is shown in Figure 2b.The causal ordering graph CO(B) is given in Figure 2c.The corresponding equilibrium Markov ordering graph MO(B) is displayed in Figure 2d.

Figure 3 .
Figure 3. Simulations of the outflow rate X O (t) in the bathtub model (a), the amount of infected cells X I (t) in the viral infection model (b), and the concentration X C (t) in the biochemical reaction network with a negative feedback loop (c).The input signal suddenly changes from an (constant) value for t < t 0 to a different (constant) value for t ≥ t 0 .The timing of this change is indicated by a vertical dashed line.The three systems started with input signalsI − K = 1.2,I − σ = 1.6, and I − = 1.5, respectively.After a transient response, X O (t), X I (t), and X C (t) all converge to their original equilibrium value (i.e., they perfectly adapt to the input signal).
Negative feedback with a buffer node.
Incoherent feed-forward loop with a proportioner node.

Figure 4 .
Figure 4.The two reaction networks that can achieve perfect adaptation[17].Figure4ashows Negative Feedback with a Buffer Node (NFBN), while Figure4bshows an Incoherent Feed-forward Loop with a Proportioner Node (IFFLP).Orange edges represent saturated reactions, blue edges represent linear reactions, and black edges are unconstrained reactions.Arrowheads represent positive influence and edges ending with a circle represent negative influence.

Figure 6 .
Figure 6.Perfect adaptation in the model for the RAS-RAF-MEK-ERK signalling pathway.After an initial response to a change of the input signal from value I − = 1.0 to a value I + at time t = t 0 , the equilibrium concentrations of active RAS, RAF, and MEK revert to their original values, while the concentration of active ERK changes.
Equilibrium Markov ordering graph.

Figure 7 .
Figure 7. Five graphs associated with the protein signalling pathway model under saturation conditions where indices s, r, m, e correspond to concentrations of active RAS, RAF, MEK, and ERK respectively.

Figure 8 .
Figure 8. Simulation of the response of the concentrations of active RAS, RAF, MEK, and ERK after inhibition of the activity of MEK.The system starts out in equilibrium with kme = 1.0.The concentrations of RAS, RAF, MEK, and ERK are recorded after the parameter controlling MEK activity is decreased kme = 0.98 from t = t 0 on.
Log-expressions of active MEK and RAF.Log-expressions of active MEK and ERK.Log-expressions of active RAF and ERK.

Figure 9 .
Figure 9. Scatter plots of the logarithms of active RAF, MEK, and ERK concentrations for the data in[18].The blue circles correspond to cells treated only with anti-CD3 and anti-CD28, which activate the signaling cascade.The red circles correspond to cells treated with anti-CD3, anti-CD28 and in addition, the MEK-activity inhibitor U0126.The inhibition of MEK results in an increase of MEK and RAF, whereas ERK is reduced.The black circles correspond to cells treated with β2cAMP (but not anti-CD3 and anti-CD28), which seems to affect MEK, but leaves RAF and ERK invariant.
Causal Bayesian network model Perfectly adaptive model RAF and MEK are dependent in both conditions + + MEK and ERK are independent in both conditions − − Inhibition of MEK activity affects active RAF − + Inhibition of MEK activity affects active MEK − + Inhibition of MEK activity affects active ERK + +

(Figure 10 .
Figure10.Equilibrium Markov ordering graphs of the protein signalling pathway with the context variable C included, corresponding to two hypothetical models.This context variable indicates whether a cell was treated with a MEK inhibitor or not. c) Response of X P to persistent changes of k1.k1(t) = k − 1 for t < t0, k1(t) = k + 1 for t ≥ t0.

Figure 11 .
Figure 11.The dynamic bipartite graph (a) and dynamic causal ordering graph (b) for the basic enzyme reaction modelled by equations (30), (31),(32), and(33).Panel (c) shows simulations that suggest that the concentration X P perfectly adapts after an initial transient response to a persistent change in the parameter k 1 .
Alternative equilibrium causal ordering graph.

Figure 12 .
Figure 12.Different graphical representations of the basic enzyme reaction at equilibrium.Panels (a) and (b) show the equilibrium bipartite graph and equilibrium causal ordering graph, respectively, for the equilibrium equations (34), (35), (36),(37), and(38).Panels (c) and (d) show the equilibrium bipartite graph and the equilibrium causal ordering graph, respectively, for the rewritten equilibrium equations (34), (37),(38) and(39).Rewriting the equilibrium equations yields a sparser equilibrium bipartite graph, and therefore more structure is revealed in the equilibrium causal ordering graph.
Graphical representations of the bathtub model (left column), the viral infection model (center column), and the reaction network with negative feedback (right column).The input vertices I K , Iσ, and I are represented by black dots while exogenous variables are indicated by dashed circles.For each model, the structure of the equilibrium equations can be read off from the equilibrium bipartite graphs in Figures5a, 5b, and 5c.The structure of the first-order differential equations can be represented with the dynamic bipartite graphs in Figures5d, 5e, and 5f.The equilibrium causal ordering graphs corresponding to the equilibrium bipartite graphs are given in Figures5g, 5h, and 5i.Similarly, the dynamic causal ordering graphs corresponding to the dynamic bipartite graphs can be found in Figures5j, 5k, and 5l.The equilibrium Markov ordering graphs for the equilibrium distribution of the models are given in Figures5m, 5n, and 5o.Equilibrium Markov ordering graph: As explained in Section 2.2, the Markov ordering graph is constructed from the causal ordering graph and includes exogenous variables.For the bathtub model, we let vertices w I , w 1 , . . ., w 5 represent the independent exogenous random variables U I , U 1 , . . ., U 5 that appear in the model.For the viral infection model we let w T , w I , w E , w β represent independent exogenous random variables d T , d I , d E , and β in equations ( Equilibrium Markov ordering graph: Reaction network.Figure 5.