Comparison of open-source software for producing directed acyclic graphs

Many software packages have been developed to assist researchers in drawing directed acyclic graphs (DAGs), each with unique functionality and usability. We examine five of the most commonly used software packages for generating DAGs: TikZ, DAGitty, ggdag, dagR, and igraph. For each package, we describe its background, analysis and visualization capabilities, and user-friendliness. Additionally, to compare the packages, we produce two DAGs in each one: the first features a simple confounding structure, while the second includes a more complex structure with three confounders and a mediator. We provide recommendations for when to use each package depending on the user's needs.


Introduction
Research describing how to establish causal relationships has attracted increasing interest in many disciplines [1][2][3][4][5][6], especially in cases where a randomized controlled experiment is not feasible. One key tool for visualizing hypothesized causal relationships, identifying where biases may arise, and informing how to address them is the directed acyclic graph (DAG) [1,2,5]. These graphs display the connections between the exposure, the outcome, and other relevant variables. DAGs are employed across disciplines including epidemiology [7][8][9], sociology [10][11][12], education [13][14][15], and economics [16][17][18]. DAGs consist of nodes and edges: the nodes represent variables, and each edge conveys a direct causal effect as an arrow that leaves the cause and points toward the effect. Importantly, a graph qualifies as a DAG only if every edge points in a single direction and no variable is an ancestor of itself, that is, the graph contains no directed cycles [19]. For a DAG to be considered causal, it must include every variable that is a common cause of any two variables already in the graph [1].
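The acyclicity condition can be checked mechanically. The sketch below is plain Python written for illustration, not code from any of the packages reviewed here, and the function name is our own. It uses Kahn's algorithm: nodes with no incoming edges are removed one at a time, and if every node can eventually be removed, no variable is an ancestor of itself. The example edges follow the confounding structure of Figure 1 (C → A, C → Y, A → Y).

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no directed cycle.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    If every node gets removed, no node can be its own ancestor.
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


# Simple confounding structure: C -> A, C -> Y, A -> Y
print(is_acyclic(["A", "C", "Y"], [("C", "A"), ("C", "Y"), ("A", "Y")]))  # True
# Adding Y -> C creates the cycle C -> Y -> C, so the graph is no longer a DAG
print(is_acyclic(["A", "C", "Y"], [("C", "A"), ("C", "Y"), ("A", "Y"), ("Y", "C")]))  # False
```

Adding a single reversed edge is enough to make the graph cyclic, which is why edge direction matters when a DAG is entered by hand.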
Developers have introduced software to produce DAGs across a variety of platforms. DAGitty, dagR, ggdag, igraph, pcalg, and bnlearn are open-source R packages offering a range of plotting and analysis capabilities [20][21][22][23][24][25]. DAGitty offers both a browser-based platform and an R package for creating, editing, and analyzing causal diagrams [20]. ggdag extends the plotting functionality of DAGitty and is compatible with the tidyverse and ggplot [21]. dagR focuses on analysis and data simulation and provides a framework to draw, manipulate, and evaluate DAGs [23]. The R package igraph is designed for network analysis and is especially capable of handling large graphs [22]. pcalg centers on causal structure learning and causal inference but also has some visualization features [24,26]. The bnlearn package likewise focuses on causal discovery, through Bayesian network structure learning and parameter learning [25]. In Python, three prominent libraries for causal graphs are causal-learn, causal discovery toolbox, and gCastle [27][28][29]; all three focus on causal discovery algorithms and have only limited DAG plotting capabilities. In the document preparation system LaTeX, the graphics library TikZ is commonly used to draw DAGs [30]. Quiver, a web-based application, allows users to quickly draw a DAG with click-and-drag motions and can export the result as LaTeX code [31]. Causal Fusion is a similar web-based application; however, access to it requires an approved account [32]. Tetrad, a free downloadable tool with over 30 years of history, uses DAGs to create, simulate, estimate, test, and predict with causal and statistical models [33]; its functionality is very similar to that of pcalg and bnlearn.
With this wide array of DAG drawing software, knowing which option is the most appropriate and easiest to implement is a challenge. In 2006, Haughton et al. published a comparison article [34] that reviewed three software packages for illustrating DAGs (MIM, Tetrad, and WinMine). The software landscape has changed considerably since then; indeed, Tetrad is the only one of the three that is still maintained.
While numerous methods clearly exist for designing, analyzing, and visualizing a DAG using software, there is no centralized resource that compares modern tools or provides recommendations. DAGs are a commonly used visual tool in publications; indeed, Tennant et al. analyzed 234 articles published between 1999 and 2017 that mentioned concepts related to DAGs and found that two-thirds of them made at least one DAG available [35]. However, Tennant et al. also noted that these DAGs varied drastically in size, quality, and notation [35]. This variability is likely due in part to the lack of a comprehensive description of the software available for drawing DAGs. Here, we provide a guide that highlights, compares, and demonstrates how to use each DAG software package.
In our review, we include open-source software that provides a manual of all features or published documentation and that can be used directly in languages popular among statisticians, such as R or LaTeX. We further restrict our review to packages focused on visualizing and producing DAGs, and thus exclude pcalg, bnlearn, causal-learn, causal discovery toolbox, and gCastle, whose main purpose is causal discovery and analysis. This restricts our scope to five resources: TikZ, DAGitty, ggdag, dagR, and igraph. We evaluate the software across three categories: visual design, analysis capability, and utility. For visual design, we assess whether the software supports curved edges, subscripts, and mathematical notation, as well as its default visual settings, its design customization capabilities, and whether it allows autogenerated and user-specified node placement. To evaluate analysis capability, we check for an exposure, outcome, and covariate framework; the ability to identify ancestor/descendant relationships, conditional independencies, and minimally sufficient adjustment sets; and the capacity to simulate data. Finally, we evaluate the utility of the five packages by comparing the resources available, our experience of the learning curve, and the required base software. In Section 2, we describe each package according to these evaluation criteria and compare performance across methods. In the Discussion, we provide general recommendations tailored to the user's needs and outline future directions for DAG software.

DAG producing methods
To compare the five software packages, we create the same two DAGs using each program. In Figure 1, we implement an identical simple confounding structure in each package and compare the output. In Figure 2, we draw a more complex causal structure that includes one mediator and three confounders. For Figures 1 and 2, the code used to produce each DAG is included below its plot. We create the DAGs using DAGitty (web version 3.0, CRAN version 0.3-1), ggdag (version 0.2.10), dagR (version 1.2.1), and igraph (version 1.5.0.1) with R 4.2.0 [36], and the TikZ graphs in LaTeX 2022 with TikZ (version 3.1.10). In Table 1, we summarize the capabilities of each package in terms of visual design, analysis capability, and utility. We produce the graphs from the four R packages by saving the output as PNG files, while the TikZ graphs are produced by compiling the LaTeX source to PDF. We note, however, that all five methods are compatible with R Markdown and Quarto. In Figures 1 and 2, we attempt to maintain uniform styling across methods, with circles around the nodes, black coloring, and nodes arranged chronologically so that arrows flow from left to right. In Table 1, we also highlight the design customization settings each package offers for further adapting a DAG to the user's preferences.

TikZ
TikZ is a LaTeX library for creating graphical figures, with extensive customization settings that can generate a vast array of images. It relies on the portable graphics format (PGF), a lower-level LaTeX graphics language, as its base layer [30]. Till Tantau developed and maintains both libraries. Because TikZ was not designed specifically for DAGs, it has no built-in exposure, outcome, and covariate framework, no analysis features, and no automatic node placement. Despite these limitations, TikZ is one of the leading tools for creating DAGs, thanks to its flexibility, which allows the user to easily specify the color, size, shape, and arrangement of nodes and edges. In addition, because TikZ is built into the LaTeX environment, one can label variables with mathematical notation using inline math mode (delimited by $ signs), including subscripts, Greek letters, and any other mathematical symbols. Many online tutorials and user-posted question-and-answer boards explain the possible customization settings, and the TikZ and PGF manual documents all of the package's capabilities [30]. For creating causal diagrams, we recommend beginning with DAG-specific resources, as TikZ's extensive language can be overwhelming, especially for LaTeX beginners. To code a DAG in TikZ, we first create the nodes and specify each node's location, shape, color, and label as desired. We then list the edges, again customizing any stylistic preferences. The resulting code is both intuitive and readable.
As shown in Figure 1, we write the labels in plain text, add circles around the nodes, and use the stealth arrow tip. We display the variables chronologically and arrange the exposure, outcome, and confounder so that all edges can be drawn as straight lines. In Figure 2, we use a design similar to that of Figure 1, but write the variable labels in inline math mode. This allows us to add subscripts to the confounder nodes and label them $C_1$, $C_2$, and $C_3$. Here, we also incorporate curved arrows by specifying the angle of the desired curve. For those less familiar with LaTeX, creating figures with TikZ may involve a steeper learning curve. However, we believe that frequent LaTeX users will find it easy to incorporate TikZ into their documents to produce DAGs. Figures 1 and 2 make it clear why TikZ is a popular resource for drawing DAGs: we can easily create a clear and visually appealing DAG, with straightforward changes available to adapt the style to a user's preferences. Table 1 further shows that TikZ's strengths are its customizability and visual appeal, while its main limitation is that it lacks a DAG-specific framework and thus has no analysis capabilities.
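To make the two-step structure of TikZ code (nodes first, then edges) concrete, the following Python sketch assembles TikZ source text from a node list and an edge list. The helper tikz_dag and its argument layout are hypothetical and only illustrate the pattern; the TikZ options it emits (circle, draw, the -stealth arrow tip, and bend left=angle for curved edges) are standard TikZ, and the output must be placed in a LaTeX document that loads \usepackage{tikz}. The coordinates are illustrative, not those used for Figure 1.

```python
def tikz_dag(nodes, edges):
    """Return TikZ source for a DAG.

    nodes: list of (name, x, y, label); a label may use inline math, e.g. "$C_1$".
    edges: list of (source, target, bend), where bend is an angle in degrees
           for a curved edge, or None for a straight one.
    """
    lines = [r"\begin{tikzpicture}"]
    # Step 1: one circled node per variable, at a user-chosen position.
    for name, x, y, label in nodes:
        lines.append(rf"  \node[circle, draw] ({name}) at ({x},{y}) {{{label}}};")
    # Step 2: one arrow per edge with a stealth tip; curved edges use bend left.
    for src, dst, bend in edges:
        path = "--" if bend is None else f"to[bend left={bend}]"
        lines.append(rf"  \draw[-stealth] ({src}) {path} ({dst});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


# Confounder C above, exposure A and outcome Y below, all edges straight.
print(tikz_dag(
    [("C", 1, 1, "C"), ("A", 0, 0, "A"), ("Y", 2, 0, "Y")],
    [("C", "A", None), ("C", "Y", None), ("A", "Y", None)],
))
```

A label such as "$C_1$" passes straight through to TikZ, where inline math mode renders the subscript, and giving an edge a bend angle such as 30 draws a curved arrow instead of a straight one.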

DAGitty
DAGitty is available as a browser-based interface, a downloadable tool, and an R library for creating, editing, and analyzing DAGs. The website interface and downloadable tool are accessible via http://www.dagitty.net/, and the R package is available on CRAN [20]. DAGitty's browser version provides a graphical user interface that allows users to draw and analyze causal diagrams. Its drag-and-click features make the tool very user-friendly and easy to learn. The website allows the user to select and label nodes, connect nodes via directed edges, and identify the exposure, outcome, and covariates. After creating a causal diagram, one can explore its implied conditional independencies, ancestor/descendant relationships, and minimally sufficient adjustment sets [37,38]. The user can also copy the model code into R after installing and loading the DAGitty library. Similarly, the R library can list the conditional independencies, ancestors and descendants, and adjustment sets directly within the statistical program. The DAGitty R package additionally offers functions that simulate data based on the specified DAG structure [20]. However, the simulation functionality is limited; the creators suggest employing it only for validation purposes and using other techniques or software for more complicated simulation studies [20,39]. DAGitty 0.9a, the oldest available version of the software, was released in 2010 and first announced in a letter to Epidemiology [40]. The most recent version of DAGitty is 3.1, updated in 2023 (as of August 2023). DAGitty developers also maintain the browser-based website regularly [41].
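DAGitty's analyses of conditional independencies and adjustment sets rest on d-separation. The standalone Python sketch below is not DAGitty's implementation; it is a minimal illustration, under our own function names, of two standard textbook tools: the moralized ancestral graph test for d-separation, and Pearl's back-door criterion, which accepts a covariate set Z if Z contains no descendant of the exposure and d-separates exposure and outcome once the exposure's outgoing edges are removed. The edge list med is a hypothetical structure with one mediator and three confounders, not a transcription of Figure 2.

```python
from itertools import combinations


def d_separated(edges, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in the DAG `edges`.

    Moralized ancestral graph test: keep xs, ys, zs and their ancestors,
    connect parents that share a child, drop directions, delete zs,
    and check whether any node in xs can still reach a node in ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    keep = xs | ys | zs
    stack = list(keep)
    while stack:  # close the kept set under "parent of"
        for p in parents.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    adj = {n: set() for n in keep}
    for p, c in edges:  # undirected skeleton of the ancestral graph
        if p in keep and c in keep:
            adj[p].add(c)
            adj[c].add(p)
    for c in keep:  # moralize: marry parents of a common child
        for p1, p2 in combinations(parents.get(c, ()), 2):
            adj[p1].add(p2)
            adj[p2].add(p1)
    seen = xs - zs
    stack = list(seen)
    while stack:  # search paths that avoid the conditioning set
        node = stack.pop()
        if node in ys:
            return False
        for nxt in adj[node] - zs:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def satisfies_backdoor(edges, exposure, outcome, zs):
    """Pearl's back-door criterion for one exposure and one outcome."""
    children = {}
    for p, c in edges:
        children.setdefault(p, set()).add(c)
    desc, stack = set(), [exposure]
    while stack:  # (i) collect descendants of the exposure
        for c in children.get(stack.pop(), ()):
            if c not in desc:
                desc.add(c)
                stack.append(c)
    if desc & set(zs):
        return False
    # (ii) remove edges out of the exposure, then test d-separation
    without_out_edges = [(p, c) for p, c in edges if p != exposure]
    return d_separated(without_out_edges, {exposure}, {outcome}, zs)


fig1 = [("C", "A"), ("C", "Y"), ("A", "Y")]
print(satisfies_backdoor(fig1, "A", "Y", {"C"}))  # True
print(satisfies_backdoor(fig1, "A", "Y", set()))  # False: A <- C -> Y is open

# Hypothetical mediation structure with mediator M and confounders C1, C2, C3.
med = [("A", "M"), ("M", "Y"), ("A", "Y"), ("C1", "A"), ("C1", "Y"),
       ("C2", "A"), ("C2", "M"), ("C3", "M"), ("C3", "Y")]
print(satisfies_backdoor(med, "A", "Y", {"C1", "C2"}))       # True
print(satisfies_backdoor(med, "A", "Y", {"C1", "C2", "M"}))  # False: M descends from A
print(d_separated(med, {"A"}, {"C3"}, set()))   # True: implied independence
print(d_separated(med, {"A"}, {"C3"}, {"M"}))   # False: conditioning on collider M
```

The last two calls show an implied conditional independence: in the hypothetical graph, A and C3 are marginally independent, but conditioning on the collider M makes them dependent. Enumerating the smallest sets that pass such a check gives the flavor of a minimally sufficient adjustment set search, although DAGitty's own algorithms are more general and efficient.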
In Figure 1, we see the simple confounding DAG as output by DAGitty in R. Notably, there are no circles around the nodes, the lines are very light and thin, and the arrows run very close to the letters. Figure 2 shows the more complicated mediation DAG, with DAGitty's graph second from the left. DAGitty is not able to render the subscripts on the nodes. It can plot curved edges, but their placement and execution do not look as polished as in TikZ. The top curved arrow from A to Y appears compressed because of space limitations imposed by the RStudio plot output box and the R Markdown display region: if the curve from A to Y had a larger radius, the display region in R would cut off its top. This limitation does not arise on the DAGitty website.
To create both DAGs, we use the website to set up the initial node placement and then copy the code from the website into RStudio, where we use the DAGitty R library. The DAGitty website is user-friendly and intuitive. In summary, DAGitty has extensive analysis capabilities but is less flexible in visual design. Overall, it is a suitable package for users who want to produce a DAG quickly and can accept compromises in visual appearance.

ggdag
The R package ggdag allows users to plot and analyze causal graphs [21]. It builds on DAGitty, using DAGitty's algorithms to analyze DAGs while allowing users to employ ggplot and the tidyverse to create professional, reproducible, and visually appealing DAGs [42]. It also enables the use of DAGitty objects within the tidyverse [21,43]. Its functions are flexible in that users can code their DAG structure in either DAGitty syntax or ggdag-specific syntax. As a result, ggdag has the same analytical capabilities as DAGitty, including identification of conditional independencies, ancestor/descendant relationships, and minimally sufficient adjustment sets [21]. ggdag additionally displays adjustment sets visually via a colored graph [43,44]. Finally, the package has a wrapper function that applies DAGitty's data simulation algorithm to the specified structural equation model [43]. Thus, ggdag and DAGitty can both simulate data, subject to the same limitations. ggdag was initially released in March 2018, and the current version, 0.2.10, was updated in 2023 (as of August 2023) [43].
Figure 1 displays the simple confounding DAG, with ggdag's output shown in the third panel. In ggdag, edges are specified in structural equation model notation with the dagify() function, and the ggplot() plotting function then renders the resulting dagify object. Figure 2 illustrates the more complicated mediation DAG, with ggdag's version in the center column. ggdag is able to incorporate both curved edges and subscripts in the node labels. The curved edges have a clean, bold arc, and we could easily control their radius and direction with the geom_dag_edges_arc() layer. Together, the subscripts and the professional-looking curved edges make this graph visually appealing. Creating this graph took twice as long as with DAGitty, suggesting a longer learning curve, although this may ease with more frequent use. Setting the placement of each node, the location of each curved edge, and the subscripts each required multiple Google searches for examples, vignettes, or user-posted question-and-answer boards [42][43][44][45]. The end product is very appealing, but it did require patience and time. For quicker use of ggdag, one might use DAGitty's web application to specify node and edge locations and then pass the corresponding code and DAGitty object to ggdag's plotting function.
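The structural equation model notation read by dagify() lists each variable on the left of ~ and its direct causes on the right; for the Figure 1 structure one writes dagify(Y ~ A + C, A ~ C). The short Python sketch below, with a hypothetical function name and plain strings standing in for R formulas, shows how each such line expands into parent → child edges. It illustrates the notation only and does not reproduce ggdag's own parsing.

```python
def formulas_to_edges(formulas):
    """Expand formula-style lines such as "Y ~ A + C" into (parent, child) edges.

    The child is on the left of "~"; its direct causes are joined by "+".
    """
    edges = []
    for formula in formulas:
        child, rhs = (part.strip() for part in formula.split("~"))
        for parent in rhs.split("+"):
            edges.append((parent.strip(), child))
    return edges


print(formulas_to_edges(["Y ~ A + C", "A ~ C"]))
# [('A', 'Y'), ('C', 'Y'), ('C', 'A')]
```

One line per child keeps all of a variable's causes together, which is why this notation stays readable as the number of edges grows.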
Table 1 highlights ggdag's performance in terms of analysis capability and visual appeal. Notably, ggdag can incorporate subscripts and curved arrows, customize node placement, and label nodes with Greek symbols by supplying their Unicode values to the ggdag label function [46]. In the utility category, ggdag has many vignettes, is well documented, and has many resources available online [42][43][44][45]; however, ggdag falls short of the other software in ease of use, given the relatively long time the authors needed to create Figure 2. Overall, ggdag is a valuable tool for users who want to create professional-looking DAGs in R.

dagR
dagR is an R package developed by Lutz P. Breitling; it was originally released in 2010 and most recently updated in 2022 (as of August 2023) [23,39]. The package allows users to plot DAGs, identify minimally sufficient adjustment sets, list the ancestors of a given node, and simulate data from the specified causal diagram. Notably, dagR provides additional simulation capabilities compared with DAGitty (and ggdag) by allowing binary and continuous variables to be combined within the same DAG [39]. Unlike DAGitty and ggdag, dagR provides no functions to identify conditional independencies and cannot directly identify descendants [23].
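To illustrate what combining binary and continuous variables in one DAG means, the Python sketch below simulates data from the Figure 1 structure with a continuous confounder C, a binary exposure A drawn from a logistic model, and a continuous outcome Y. It is not dagR's simulation routine; the function name, coefficients, and seed are arbitrary assumptions. The key design point is that each variable is generated after its parents, following the DAG's causal order.

```python
import math
import random


def simulate_mixed(n, effect=0.5, seed=2023):
    """Simulate n rows (c, a, y) from the DAG C -> A, C -> Y, A -> Y.

    C: continuous confounder, standard normal.
    A: binary exposure from a logistic model in C.
    Y: continuous outcome, linear in A and C plus Gaussian noise.
    Coefficients are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)                          # parents first: C
        p_a = 1.0 / (1.0 + math.exp(-0.8 * c))           # P(A = 1 | C = c)
        a = 1 if rng.random() < p_a else 0               # then A, binary
        y = effect * a + 0.7 * c + rng.gauss(0.0, 1.0)   # then Y, continuous
        rows.append((c, a, y))
    return rows


rows = simulate_mixed(5000)
exposed = [c for c, a, _ in rows if a == 1]
unexposed = [c for c, a, _ in rows if a == 0]
# Mean of C among the exposed minus mean of C among the unexposed
print(sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed))
```

Because C causes A, the exposed rows have a noticeably higher mean of C than the unexposed rows, which is the confounding that an analysis of the simulated data would need to adjust for.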
While dagR excels at simulating data and provides some analysis functions, it has limitations compared with other available software in terms of visualization and user-friendliness. The package lacks customization settings for DAGs, as it does not allow curved edges, circles around nodes, subscripts, or math notation. In Figure 1, the visual design is fairly similar to DAGitty's, with small node labels and thin edges. By default, dagR prints a legend below the DAG, allowing the user to provide longer labels or descriptions for the nodes. However, we found that the legend lines tend to overlap, reducing readability, as seen in Figure 1. In Figure 2, we use the automatic node placement feature because, without curved arrows, it achieves the clearest arrangement of all edges in the graph. Unfortunately, the DAG still has visual limitations, such as the overlapping arrowheads leading into variables M and Y. With more complex DAGs, it would be challenging to generate a readable graph without curved lines.
In addition, we note that dagR, despite its simple visual output, is one of the hardest packages to use. Compared with DAGitty and ggdag, few online resources with example code and usage descriptions are available. We also find its code the least intuitive. For example, all edges are specified together in a single vector of number pairs, where each number refers to a different node. With a larger number of variables, it becomes difficult to track which number corresponds to which node, making the system cumbersome. In Table 1, we highlight dagR's strength in analysis, notably data simulation, as well as its limitations in visual design and utility. The package lags in the readability of both its graphs and its code. Finally, we note that the dagR developers provide a function to translate dagR objects into DAGitty objects [39]. Thus, a user who wants a more visually appealing graph but has already written the causal structure in dagR (perhaps to simulate data) can convert the object to DAGitty and plot it with DAGitty or ggdag.
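The bookkeeping burden of numeric edge vectors is easy to see in a small sketch. The Python function below reads a flat vector two entries at a time and maps each ID to a variable name. The function name and the numbering {1: "C", 2: "A", 3: "Y"} are hypothetical, and dagR's actual node-coding conventions may differ.

```python
def pairs_to_edges(flat, names):
    """Read a flat vector of node IDs two at a time as (from, to) edges.

    flat:  e.g. [1, 2, 1, 3, 2, 3] encodes 1 -> 2, 1 -> 3, 2 -> 3
    names: hypothetical mapping from numeric ID to variable name
    """
    if len(flat) % 2:
        raise ValueError("an edge vector needs an even number of IDs")
    return [(names[flat[i]], names[flat[i + 1]]) for i in range(0, len(flat), 2)]


print(pairs_to_edges([1, 2, 1, 3, 2, 3], {1: "C", 2: "A", 3: "Y"}))
# [('C', 'A'), ('C', 'Y'), ('A', 'Y')]
```

Without the names mapping, a reader has to remember that 2 means A and 3 means Y; naming nodes directly, as igraph allows, removes that step.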

igraph
Finally, we include igraph, an open-source network analysis tool that emphasizes efficiency and portability [22]. Csárdi and Nepusz began developing igraph in 2006 [22], and many collaborators have since contributed to its growth. As of August 2023, the most recent R release of igraph is version 1.5.0.1, released in July 2023 [47]. The tool can be used from R, Python, Mathematica, and C/C++, and its core is written in C [22]. Here, we focus on igraph's R implementation. The package excels at automatic placement of nodes and edges, drawing on its vast array of graph layout algorithms, which makes it a valuable tool for visualizing large and complex DAGs [22,48]. While igraph provides many functions to calculate structural properties of graphs and conduct network analysis, its causal analysis capabilities are limited. It can identify ancestors and descendants (via subcomponent()), but it offers no functions to directly compute conditional independencies or adjustment sets [48].
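Ancestor and descendant identification is simple graph reachability: ancestors are the nodes that can reach a given node along edge directions, and descendants are the nodes reachable from it. igraph's subcomponent() exposes this idea through its mode argument; the plain Python sketch below is our own illustration, not igraph code, and it excludes the starting node from the result. The edge list is a hypothetical mediation structure with one mediator (M) and three confounders, not necessarily the edges of Figure 2.

```python
def reachable(edges, node, mode="out"):
    """Return the nodes reachable from `node`, excluding `node` itself.

    mode="out" follows edges forward (descendants);
    mode="in" follows edges backward (ancestors).
    """
    if mode not in ("in", "out"):
        raise ValueError('mode must be "in" or "out"')
    step = {}
    for parent, child in edges:
        src, dst = (parent, child) if mode == "out" else (child, parent)
        step.setdefault(src, set()).add(dst)
    found, stack = set(), [node]
    while stack:
        for nxt in step.get(stack.pop(), ()):
            if nxt not in found and nxt != node:
                found.add(nxt)
                stack.append(nxt)
    return found


med = [("A", "M"), ("M", "Y"), ("A", "Y"), ("C1", "A"), ("C1", "Y"),
       ("C2", "A"), ("C2", "M"), ("C3", "M"), ("C3", "Y")]
print(sorted(reachable(med, "Y", mode="in")))   # ['A', 'C1', 'C2', 'C3', 'M']
print(sorted(reachable(med, "A", mode="out")))  # ['M', 'Y']
```

Reversing the direction of traversal is all that separates the two queries, which is why a single function with a mode switch covers both.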
The igraph notation for specifying edges is similar to dagR's: edges are listed in a vector in which every pair denotes the origin and destination nodes of an edge [22,48]. Unlike in dagR, however, nodes can be named with character strings, making the code more readable. While automatic arrangement of the nodes is the default, the user can specify the x and y coordinates of each node via a layout matrix. In Figure 1, the igraph DAG shows black text surrounded by a circle for each node, with black lines connecting the nodes. Since the default visualization uses navy text, orange circles, and gray arrows, we use several plotting options to achieve the desired styling (namely, edge.color, vertex.size, and vertex.color in the plotting function). In Figure 2, we again specify the color, size, label, and location of all nodes and edges. To draw curved lines, we use the edge.curved option. To handle subscript notation, we use vertex.label together with expression(); Greek letters could easily be specified here using Unicode. A full list of plotting controls is conveniently documented in the R vignette [48]. The resulting DAGs are clean, professional, and highly customizable, although fine-tuning all the plotting features can be time-consuming. In summary, igraph excels in its flexibility for the visual design of a DAG; however, it has limited functions for answering causal inference questions, and its code is less intuitive than that of TikZ, DAGitty, and ggdag.
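igraph computes node positions with its built-in layout algorithms, or takes them from a user-supplied two-column layout matrix. As a minimal sketch of one simple automatic layout, not one of igraph's algorithms, the Python function below places each node in a column given by the length of the longest directed path ending at it, so every edge points left to right (the chronological arrangement used in Figures 1 and 2), and stacks nodes within a column. The function name and spacing rule are assumptions for illustration.

```python
def layered_layout(nodes, edges):
    """Assign (x, y) positions so that every edge points left to right.

    x is the length of the longest directed path ending at the node;
    y counts the nodes already placed in that column. Assumes a DAG.
    """
    parents = {n: [] for n in nodes}
    for p, c in edges:
        parents[c].append(p)
    depth = {}

    def column(n):
        # Memoized longest-path depth: roots get 0, others 1 + deepest parent.
        if n not in depth:
            depth[n] = 1 + max((column(p) for p in parents[n]), default=-1)
        return depth[n]

    coords, filled = {}, {}
    for n in nodes:
        x = column(n)
        coords[n] = (x, filled.get(x, 0))
        filled[x] = filled.get(x, 0) + 1
    return coords


print(layered_layout(["C", "A", "Y"], [("C", "A"), ("C", "Y"), ("A", "Y")]))
# {'C': (0, 0), 'A': (1, 0), 'Y': (2, 0)}
```

For the Figure 1 structure, this rule puts C, A, and Y in a single row, so a straight C → Y arrow would pass through A. This is exactly the situation in which hand-chosen coordinates or a curved edge produce a more readable DAG.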

Discussion
There are several ways to create DAGs using open-source software, each with different strengths and weaknesses. By focusing on two DAG structures (Figures 1 and 2), we are able to compare the software and highlight key features. Table 1 summarizes our findings on each package's visual design, analysis capability, and utility. In this review, we focus on only five software packages, namely TikZ, DAGitty, ggdag, dagR, and igraph, and primarily restrict our scope to DAGs.
In Figures 1 and 2, we aim for circular nodes, all-black coloring, and a chronological arrangement, but we note that best stylistic practices for DAGs are debated. While some recommend arranging variables so that arrows flow in a single direction (e.g., left to right or top to bottom) [1,35], others arrange nodes by causal proximity [49]. Furthermore, some reserve circular nodes for latent or unobserved variables and use square nodes or no shape for observed variables [35,50]. In Table 1, we highlight which software can accommodate these design changes. For acyclic directed mixed graphs [51], all of the reviewed software except dagR can display the required bidirected edges. Meanwhile, TikZ-SWIG [53], a LaTeX library built on TikZ, is the leading software for drawing Single World Intervention Graphs (SWIGs) [52]. More work is needed to survey the software available for generating other types of causal graphs and to evaluate its performance.
We recommend choosing a DAG drawing package according to one's needs. To quickly produce an informal graph, we suggest DAGitty, as its online platform allows one to generate a figure without writing any code. For DAG interpretation, such as identifying minimally sufficient adjustment sets, ancestor/descendant relationships, and conditional independencies, DAGitty offers the widest range of functionality. For simulating data from a DAG, dagR provides the most flexibility. For visualizing large and complex DAGs, we recommend igraph. Finally, for formal, publication-quality graphs, we recommend TikZ when the manuscript is written in LaTeX and rendered to PDF, and ggdag when the final document is coded in R and compiled with R Markdown or Quarto.
While the software discussed here offers a wide range of capabilities, we are optimistic that drawing DAGs will become even easier as new tools arise and existing software improves. We wrote this article using the versions of each package listed above; future updates may address some of the shortcomings we identify. We encourage developers and users to continue contributing to the open-source DAG software community and look forward to future developments. We anticipate that it will soon be possible to photograph a hand-drawn graph and convert it to code that renders the DAG in digital form. Such software already exists for mathematical formulas, matrices, and chemical diagrams [54,55], and it would be a logical next step for DAG drawing tools.

Figure 1: Simple confounding DAG example for each software package. "A" denotes the exposure, "Y" the outcome, and "C" the confounder. The code used to generate each DAG is included below the corresponding plot.

Figure 2: Complex mediation DAG example for each software package. A denotes the exposure, Y the outcome, and M the mediator; "C_1"/$C_1$, "C_2"/$C_2$, and "C_3"/$C_3$ denote the three covariates. The code used to generate each DAG is included below the corresponding plot.

Table 1: Summary of the characteristics and capabilities of each reviewed software package. Where possible, the relevant code or functions are mentioned; "-" denotes a feature that is not available.
[Table body not recoverable from the source. Surviving entries on user-specified node placement: DAGitty, add [pos="x,y"] after the node specification or give a list of x and y pairs in coordinates(); ggdag, coords = list(x=c(), y=c()) in dagify().]