Systems biology graphical notation markup language (SBGNML) version 0.3

Abstract This document defines Version 0.3 Markup Language (ML) support for the Systems Biology Graphical Notation (SBGN), a set of three complementary visual languages developed for biochemists, modelers, and computer scientists. SBGN aims at representing networks of biochemical interactions in a standard, unambiguous way to foster efficient and accurate representation, visualization, storage, exchange, and reuse of information on all kinds of biological knowledge, from gene regulation, to metabolism, to cellular signaling. SBGN is defined neutrally with respect to programming languages and software encoding; however, it is oriented primarily towards allowing models to be encoded using XML, the eXtensible Markup Language. The notable changes from the previous version include the addition of attributes to better specify metadata about maps, as well as support for multiple maps, sub-maps, colors, and annotations. These changes enable a more efficient exchange of data with other commonly used systems biology formats (e.g., BioPAX and SBML) and between tools supporting SBGN (e.g., CellDesigner, Newt, Krayon, SBGN-ED, STON, cd2sbgnml, and MINERVA). More details on SBGN and related software are available at http://sbgn.org. With this effort, we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner.


Introduction
The Systems Biology Graphical Notation (SBGN) aims to standardize the graphical/visual representation of biochemical and cellular processes (Czauderna and Schreiber, 2017; Junker et al., 2012; Novère et al., 2009; Touré et al., 2018). The goal of SBGN is to represent networks of biochemical interactions in a standard, unambiguous way to foster efficient and accurate representation, visualization, storage, exchange, and reuse of various types of biological knowledge (e.g., gene regulation, metabolism, and cellular signaling). SBGN is defined by comprehensive sets of symbols with precise semantics, together with detailed syntactic rules defining their use and interpretation. Overall, SBGN is made up of three complementary visual languages.
■ The SBGN Process Description (PD) language (Rougny et al., 2019) visualizes the temporal courses of the molecular processes and interactions taking place between biochemical entities in a particular system. This type of diagram depicts how entities transition from one form to another as a result of different influences to describe the temporal aspects of a biological system. Nodes describe entity pools (e.g., metabolites, proteins, and complexes) and processes (e.g., associations and influences). The edges describe relationships between the nodes (e.g., consumption and stimulation).
■ The SBGN Entity Relationship (ER) language (Sorokin et al., 2015) visualizes the relationships in which a given entity can participate without regard for the temporal aspects. Relationships can be seen as rules describing the influences of entity pool nodes on relationships. Relationships are independent, and this independence is essential in avoiding the combinatorial explosion inherent to process description diagrams. The nodes describe biological entities such as proteins and complexes, and the edges between them describe interactions, relationships and/or influences (e.g., complex formation, stimulation, and inhibition).
■ The SBGN Activity Flow (AF) language (Mi et al., 2015) visualizes the influences between the activities displayed by molecular entities, rather than the entities themselves. Nodes in SBGN AF diagrams describe the biological activities of the entities such as protein kinase activity or binding activity. The edges describe influences between the activities (e.g., positive influence and negative influence).
SBGN is defined neutrally concerning programming languages and software encoding; however, it is oriented primarily towards allowing models to be encoded using XML, the eXtensible Markup Language (Bray et al., 2004). This document contains specifications of how SBGN maps should be serialized in XML. Note that this specification is related to all three SBGN languages, with classes such as Glyph and Arc having the same definition and attributes across all languages. Unlike SBGN, SBGNML does not deal with biological meaning, but, instead, focuses on the computational representation of SBGN graphics, so it is comparable with graphical exchange standards like
■ The SBGN AF "perturbation" glyph, which was an activity node, has been deprecated and is now a unit of information.
■ The support of colors and other annotations through extensions enables the storage of rendering information and biological annotations (e.g., database identifiers).
The definition of the model description language presented here does not specify how programs should communicate or read/write SBGN. We assume that for diagram editing software to communicate a model encoded in SBGN, the program will have to translate its internal data structures to and from SBGNML, use a suitable transmission medium and protocol, and provide any further necessary infrastructure. However, these issues are outside the scope of this document. The software library libSBGN (van Iersel et al., 2012) was developed for reading, writing, and manipulating SBGN maps stored in SBGNML format. A broad set of software tools support SBGNML, including modeling software CellDesigner (Balaur et al., 2020), SBGN editors Newt (Sari et al., 2015), Krayon for SBGN, and SBGN-ED (Czauderna et al., 2010). STON (Touré et al., 2016) and ySBGN provide conversion between SBGNML and Neo4j/GraphML, respectively. The software EscherConverter provides an SBGN viewer and a bidirectional converter for metabolic maps in JSON format and SBGNML (King et al., 2015). Numerous databases (Reactome (Croft et al., 2011), Panther Pathways (Mi et al., 2017), Pathway Commons (Rodchenkov et al., 2020),
2 Package syntax and semantics
2.1 Document conventions
We use Unified Modeling Language (UML) version 2.0 (Dennis et al., 2015) class diagram notation to define the constructs provided by this package. We first provide an overall view of the various data types and constructs along with their relationships, followed by a more local view of the constructs and their relationships in associated sections.
In this section, we define the syntax and semantics of the Systems Biology Graphical Notation - Markup Language.
We expound on the various data types and constructs defined in this package; then, in Section 3, we provide complete examples of using the constructs in sample SBGN models.
2.2 Namespace URI and other declarations necessary for using this package
SBGNML is identified uniquely by an XML namespace URI. An SBGN document must declare the following namespace URI for this version of the Systems Biology Graphical Notation - Markup Language for SBGNML version 0.
Section 3.1 of the SBML Level 3 specification (Hucka et al., 2019) defines several primitive data types and also uses XML Schema 1.0 data types (Biron and Malhotra, 2000). We assume and use some of them in the rest of this specification, particularly float, ID, IDREF, and string. The Systems Biology Graphical Notation - Markup Language defines other primitive types as described below.
The Language is an enumeration of values used to specify which SBGN Language is encoded on the Map element.

The possible values are process description, entity relationship, and activity flow.

Section 2.3 Primitive data types
The Class is an enumeration of values used to specify what type a Glyph is encoding.
The possible values are unspecified entity, simple chemical, macromolecule, nucleic acid feature, simple chemical multimer, macromolecule multimer, nucleic acid feature multimer, complex, complex multimer, source and sink, perturbation, biological activity, perturbing agent, compartment, submap, tag, terminal, process, omitted process, uncertain process, association, dissociation, phenotype, and, or, not, equivalence, state variable, unit of information, entity, outcome, interaction, influence target, annotation, variable value, implicit xor, delay, existence, location, cardinality, and observable.
The Orientation is an enumeration of values used to express how to draw asymmetric glyphs. The orientation of Process Nodes is either "horizontal" or "vertical". It refers to an (imaginary) line connecting the two in/out sides of the PN.

The possible values are horizontal, vertical, left, right, up, and down. The value refers to the direction in which the arrow side of the glyph is pointing.
The EntityType is an enumeration of values used for Activity Flow maps that specifies the auxiliary unit to display.

The possible values are unspecified entity, simple chemical, macromolecule, nucleic acid feature, and complex.
The ArcGroupType is an enumeration of values used to define the semantics of an ArcGroup.
The only possible value is interaction.

The ArcClass is an enumeration of values used to define the semantics of an Arc.

The possible values are production, consumption, catalysis, modulation, stimulation, inhibition, assignment, absolute inhibition, absolute stimulation, positive influence, negative influence, unknown influence, equivalence arc, necessary stimulation, and logic arc.
The Document object shown in Figure 8 corresponds to the XML element sbgn. The sbgn element is the root of any SBGNML document.
The Document object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. A Document contains one or more Map elements. The following example shows an sbgn element definition.
The map element describes a single SBGN map.

The Map object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. A Map contains exactly one BBox element.

A Map may contain one or more:
■ Glyph elements.
■ Arc elements.
■ ArcGroup elements.
In addition, the Map object has the following attributes.

The id attribute
A Map has an optional attribute id of type ID.
The language attribute has been deprecated as of Version 0.3, in favor of the version attribute. One of the attributes has to be defined on a map element.
The version attribute should be used in favor of the language attribute. One of the attributes has to be defined on a map element.

Example
The following example shows an abbreviated SBGN Map definition within an sbgn element definition. The example shows a Map with a version attribute.
2.6 The Point class
The Point object encodes x and y coordinates.

The origin is located in the top-left corner of the map.

There is no unit: proportions must be preserved, but the maps can be drawn at any scale. In the example test files, to obtain a drawing similar to the reference file, values in the corresponding file should be read as pixels.

Additionally, it may contain zero, one, or two child Point objects, which can be used to encode quadratic or cubic Bézier points.
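As an illustration of how child points could serve as Bézier control points, the following minimal Python sketch evaluates one path segment. It assumes that a single child point denotes a quadratic curve and two child points denote a cubic curve; the function name bezier_point and the tuple representation are hypothetical and not part of SBGNML.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def bezier_point(start: Coord, end: Coord,
                 controls: Sequence[Coord], t: float) -> Coord:
    """Evaluate the segment from start to end at parameter t in [0, 1].

    Zero control points give a straight line, one a quadratic curve,
    and two a cubic curve (assumed mapping of the child Point objects).
    """
    if len(controls) > 2:
        raise ValueError("a Point may carry at most two child points")
    pts = [start, *controls, end]
    # De Casteljau's algorithm: interpolate neighbouring points
    # repeatedly until a single point remains.
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]
```

A renderer could sample t at regular intervals to approximate the curve with short line segments.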

The Point object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. In addition, the Point object has the following attributes.
The x attribute
A Point has a required attribute x of type double. It represents the horizontal Cartesian coordinate, increasing from left to right.

The y attribute
A Point has a required attribute y of type double.
BBox encodes the bounding box of its parent element.

The BBox object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. In addition, the BBox object has the following attributes.
The Label element describes the text accompanying a glyph.

The Label object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. A Label may contain at most one BBox element. In addition, the Label object has the following attributes.

Section 2.9 The Glyph class
The id attribute
A Label has an optional attribute id of type ID.

The text attribute
A Label has a required attribute text of type string. The text attribute is a simple string. Multi-line labels are allowed. Line breaks are encoded as &#xA; as specified by the XML standard.
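The line-break encoding can be checked with a short Python sketch using the standard xml.etree.ElementTree module. The label fragment below is a hypothetical example, and the namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical label fragment (namespace omitted for brevity).
# The character reference &#xA; survives attribute parsing as a
# real line break.
label = ET.fromstring('<label text="ATP&#xA;synthase"/>')
lines = label.get("text").split("\n")

# Serializing and parsing again should preserve the line break,
# because the serializer writes it back as a character reference.
round_trip = ET.fromstring(ET.tostring(label)).get("text")
```

Writing a literal line break inside the attribute would not work the same way, since XML parsers normalize literal whitespace in attribute values to spaces.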

The BBox element of a Label
The bbox element of a label is optional. When no bounding box is defined, the bounding box of the parent glyph is inherited. The label should be drawn centered horizontally and vertically in the bounding box. When the bounding box is inherited, the label may spill outside (just like it can spill outside its parent glyph).

An explicit bbox provides more definite information regarding what surface the label should cover. It defines an upper boundary outside of which the label should (ideally) not spill. It also represents a preferred size: the surface covered by the label can be smaller, but should ideally be as close as possible to the bounding box.
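The inheritance rule for label bounding boxes can be sketched in Python as follows. The BBox fields x, y, w, and h (top-left corner plus width and height) are an assumption made for this illustration, and label_center is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BBox:
    x: float  # left edge (assumed field name)
    y: float  # top edge; the origin is the top-left corner of the map
    w: float  # width (assumed field name)
    h: float  # height (assumed field name)


def label_center(glyph_bbox: BBox,
                 label_bbox: Optional[BBox] = None) -> Tuple[float, float]:
    """Return the point on which the label text should be centered.

    Without an explicit label bbox, the parent glyph's bbox is inherited.
    """
    box = label_bbox if label_bbox is not None else glyph_bbox
    return (box.x + box.w / 2, box.y + box.h / 2)
```

For example, a glyph at (10, 20) with width 100 and height 40 and no explicit label bbox yields a label centered at (60, 40).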

In most glyph classes (EPNs, unit of information, etc.), the label is supposed to be centered, so the bounding box is usually omitted (unless there is a specific hint to be shared concerning the area the label should ideally cover). However, the label of a compartment or a complex can be drawn anywhere inside the glyph, so these should preferably have an explicit bounding box.
2.9 The Glyph class
The glyph element is:

■ or a sub-node (state variable, unit of information, inside of a complex, etc.)
In the first case, it is a child of the map element.

In the second case, it is a child of another glyph element.

The Glyph object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class.
■ exactly one State element that carries the information of a state variable.
■ exactly one Glyph element called "clone", indicating that the Glyph carries a clone marker. The label element of the child glyph can be used to place text in the clone marker. Figure 15 shows an example.
■ exactly one Callout element. The callout element is only used for glyphs of class annotation. It contains the coordinates of the point that the annotation points to, as well as a reference to the element that is pointed to.
■ exactly one Entity element. The entity is only used in Activity Flow maps. It can only be used on a unit of information glyph on a biological activity glyph, where it is compulsory. It is used to indicate the shape of this unit of information.
■ zero or more child Glyph elements. These will be, for example, used by glyphs of class complex and hold the individual components.
■ zero or more child Port elements describing the anchor points for this glyph.

In addition, the Glyph object has the following attributes.

The id attribute
A Glyph has a required attribute id of type ID. The id attribute (xsd:ID) of a glyph can be referred to, e.g., as a source by arc elements, as a target by arc elements or callout elements, or by other glyphs if the glyph is of the class compartment.

The xsd:ID type is an alphanumeric identifier, starting with a letter.

It is recommended to generate meaningless IDs (e.g., "glyph1234") and avoid IDs with meaning (e.g., "epn_ethanol").
The class attribute
A Glyph has a required attribute class of type string. While the type is string, the values should be among those defined in Class.
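Because the class attribute is typed as a plain string, a tool may want to check values against the Class enumeration itself. The following Python sketch shows one possible check; the names GLYPH_CLASSES and is_valid_glyph_class are hypothetical, and the values are copied from the Class enumeration listed in this document.

```python
# Values of the Class enumeration as listed in this specification.
GLYPH_CLASSES = frozenset({
    "unspecified entity", "simple chemical", "macromolecule",
    "nucleic acid feature", "simple chemical multimer",
    "macromolecule multimer", "nucleic acid feature multimer",
    "complex", "complex multimer", "source and sink", "perturbation",
    "biological activity", "perturbing agent", "compartment", "submap",
    "tag", "terminal", "process", "omitted process", "uncertain process",
    "association", "dissociation", "phenotype", "and", "or", "not",
    "equivalence", "state variable", "unit of information", "entity",
    "outcome", "interaction", "influence target", "annotation",
    "variable value", "implicit xor", "delay", "existence", "location",
    "cardinality", "observable",
})


def is_valid_glyph_class(value: str) -> bool:
    """Return True if value is one of the Class enumeration values.

    The comparison is exact (case-sensitive), which is an assumption
    of this sketch.
    """
    return value in GLYPH_CLASSES
```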

The compartmentRef attribute
A Glyph has an optional attribute compartmentRef of type IDREF. The compartmentRef is a reference to the ID of the compartment that this glyph is part of. It should only be used
In case there are no compartments, entities that can have a location, such as EPNs, are implicit members of an invisible compartment that encompasses the whole map. In that case, this attribute must be omitted.

The compartmentOrder attribute
A Glyph has an optional attribute compartmentOrder of type double.

The compartmentOrder attribute can be used to define a drawing order for compartments. It enables tools to draw compartments in the correct order, especially in the case of overlapping compartments. Compartments are only used in PD and AF, and thus so is this attribute.

The attribute is of type float, and the attribute value does not have to be unique.

Compartments with higher compartmentOrder are drawn on top. The attribute is optional and should only be used for compartments.
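A renderer could derive the drawing sequence from compartmentOrder as in the following Python sketch. The compartment ids are hypothetical, and treating a missing compartmentOrder as 0.0 is an assumption of this sketch, not something the specification states.

```python
from typing import List, Optional, Tuple

# (glyph id, compartmentOrder or None) pairs; the ids are hypothetical.
Compartment = Tuple[str, Optional[float]]


def drawing_order(compartments: List[Compartment]) -> List[str]:
    """Return compartment ids from bottom-most to top-most.

    Compartments with a higher compartmentOrder come later and are
    therefore drawn on top. sorted() is stable, so compartments with
    equal values keep their document order. A missing value is treated
    as 0.0 (assumption).
    """
    ordered = sorted(
        compartments,
        key=lambda c: c[1] if c[1] is not None else 0.0,
    )
    return [cid for cid, _ in ordered]
```

The stable sort matters because the values need not be unique: ties fall back to the order in which the compartments appear in the file.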

The orientation attribute
A Glyph has an optional attribute orientation of type string. While the type is string, the values should be among those defined in Orientation. The orientation attribute is used to express how to draw asymmetric glyphs.

The orientation of Process Nodes is either horizontal or vertical. It refers to an (imaginary) line connecting the two in/out sides of the PN.

The orientation of Tags can be left, right, up, or down. It refers to the direction in which the arrow side of the glyph is pointing.
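The pairing of orientation values with glyph kinds can be expressed as a small consistency check. In the Python sketch below, the sets of glyph classes counted as process nodes and as tags are assumptions made for illustration, and orientation_is_plausible is a hypothetical helper name.

```python
# Orientation values per glyph kind, as described in the text above.
PROCESS_NODE_ORIENTATIONS = {"horizontal", "vertical"}
TAG_ORIENTATIONS = {"left", "right", "up", "down"}

# Assumed class groupings; the specification text does not enumerate them here.
PROCESS_NODE_CLASSES = {"process", "omitted process", "uncertain process",
                        "association", "dissociation"}
TAG_CLASSES = {"tag", "terminal"}


def orientation_is_plausible(glyph_class: str, orientation: str) -> bool:
    """Check an orientation value against the kind of glyph it is set on."""
    if glyph_class in PROCESS_NODE_CLASSES:
        return orientation in PROCESS_NODE_ORIENTATIONS
    if glyph_class in TAG_CLASSES:
        return orientation in TAG_ORIENTATIONS
    # For other classes, only require a value from the Orientation enumeration.
    return orientation in PROCESS_NODE_ORIENTATIONS | TAG_ORIENTATIONS
```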

Example
The following example shows a Glyph definition within an abbreviated SBGN map definition. The example shows a Glyph of class macromolecule with an optional attribute compartmentRef. Figure 14 shows the corresponding visual representation.
A port element describes an anchor point, which arc elements can refer to as a source or target. It consists of absolute 2D Cartesian coordinates and a unique id attribute.

Two port elements are required for process nodes and logical operators (and, or, not, and equivalence).

They represent the extremities of the two "arms" that protrude on both sides of the core of the glyph (the square or circle shape).

The Port object derives from the SbgnBase class and, thus, inherits all attributes and elements that are present for this class. In addition, the Port object has the following attributes.
The following example shows a Port definition within an abbreviated SBGN map definition. The example shows two Ports on a Glyph.
The following example shows a State definition within an abbreviated SBGN map definition. The example depicts two States on a Glyph of the class macromolecule, one State with a value attribute and a variable attribute and one State with a variable attribute only. Figure 18 shows the corresponding visual representation.
Callouts are used in the case of glyphs of class annotation. The callout is always optional. It can be used to show which element the callout points to.
The Callout object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. A Callout contains at most one Point element. In addition, the Callout object has the following attributes.

Section 2.13 The Entity class
The target attribute
A Callout has an optional attribute target of type IDREF. If specified, it references either a Glyph or an Arc in the Map.
Example
The following example shows a Callout definition within an abbreviated SBGN map definition. The example depicts a Callout on a Glyph of class annotation, pointing to a Glyph of the class macromolecule. Figure 20 contains the corresponding visual representation.
An entity is only used in Activity Flow maps. It should be placed on a unit of information subglyph of an activity glyph and is used to indicate the entity that performs the activity.

The Entity object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class. In addition, the Entity object has the following attributes.

Section 2.14 The Arc class
The name attribute
An Entity has a required attribute name of type string.
Example
The following example shows an Entity definition within an abbreviated SBGN map definition. The example shows an Entity with the name "macromolecule" placed on a Glyph of class biological activity. Figure 22 shows the corresponding visual representation.

Section 2.15 The ArcGroup class
■ For ER maps: an optional cardinality marker (e.g., "cis" or "trans"), zero or more ports (influence targets), and zero or more outcomes,
■ a mandatory source and target (glyph or port),
■ a geometric description of its whole path from start to end. This path can involve any number of straight lines or quadratic/cubic Bézier curves.

The Arc object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class.

An Arc can contain zero or more child Glyph elements. These can be a stoichiometry marker (PD maps), a cardinality marker (ER maps), or outcome glyphs (ER maps).

An Arc contains at least one Point element named start that represents the start point of the arc, and another Point element named end that represents the endpoint. Additionally, it may contain any number of Point elements named next that represent bend points along the way from start to end.
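The start, next, and end points can be collected into a polyline with a few lines of Python. The arc fragment below is a hypothetical example with made-up ids and coordinates, the namespace declaration is omitted for brevity, and arc_path is an illustrative helper rather than a libSBGN function.

```python
import xml.etree.ElementTree as ET

# Hypothetical arc fragment (ids and coordinates are made up;
# the namespace declaration is omitted for brevity).
ARC_XML = """
<arc id="arc1" class="consumption" source="glyph1" target="pn1.1">
  <start x="10" y="20"/>
  <next x="40" y="20"/>
  <end x="40" y="60"/>
</arc>
"""


def arc_path(arc: ET.Element) -> list:
    """Return the arc's path as (x, y) tuples from start to end."""
    start = arc.find("start")
    end = arc.find("end")
    if start is None or end is None:
        raise ValueError("an arc needs both a start and an end point")
    # Bend points keep their document order between start and end.
    points = [start, *arc.findall("next"), end]
    return [(float(p.get("x")), float(p.get("y"))) for p in points]


path = arc_path(ET.fromstring(ARC_XML))
```

This sketch ignores child points nested inside next or end, so curved segments would need additional handling.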

An Arc may also contain any number of Port elements.

In addition, the Arc object has the following attributes.
The id attribute
An Arc has a required attribute id of type ID.

The class attribute
An Arc has a required attribute class of type string. It describes what kind of Arc this element represents.

While the data type is string, the values ought to be from the ArcClass enumeration.

The source attribute
An Arc has a required attribute source of type IDREF. It specifies the source element for this arc.

The target attribute
An Arc has a required attribute target of type IDREF. It specifies the target element for this arc.

Example
The following example shows an Arc definition within an abbreviated SBGN map definition. The example shows one Arc of class consumption and one Arc of class production. Figure 24 shows the corresponding visual representation.
The arc group describes a set of arcs and glyphs that are related to one another, for example, in ER, arcs of class interaction around a glyph of class interaction.

Note that, despite the name, an arc group contains both arcs and glyphs.
The ArcGroup object derives from the SbgnBase class and thus inherits all attributes and elements that are present for this class.
An ArcGroup can contain:
■ zero or more child Glyph elements,
■ zero or more child Arc elements.

In addition, the ArcGroup object has the following attributes.
Glyph node of class interaction. It is only introduced here to represent the circle mentioned above.

Example of an Entity Relationship Map
The following example of an Entity Relationship map shows the principle of the Polymerase Chain Reaction (PCR) (Mullis et al., 1986). Figure 28 shows the corresponding visual representation of the Entity Relationship map.

Example of an Activity Flow Map
The following example of an Activity Flow map shows a signaling pathway involving the regulation of TGFβ-induced metastasis, as described by Adorno et al. (2009). Figure 29 shows the corresponding visual representation of the Activity Flow map.

B Including color / style information
While SBGNML does not formally define classes and attributes that attach color and rendering information to the Glyph and Arc classes of an SBGN Map, the consensus is to make use of the SBML Level 3 Render Package (Bergmann et al., 2018). This is done by adding an SBML renderInformation element as a child of the extension element of a Map. The render information object then defines a list of colors to be used throughout the document, and its list of styles allows these colors to be attached to the ids of children of the Map.
The example below defines two colors, "color_1" (to be used as the background color for the glyph) and "black" (to be used as the outline). They are attached to the glyph with id "sa5" via the idList attribute of the style element.
Where the previous example used the idList in the Style class to indicate that it applies to a specific Glyph with that id, the following example uses a different mechanism. This time, the roleList indicates that the Style applies to glyphs with a render:objectRole attribute.
<colorDefinition id="color_1" value="#ccffccff"/>
<colorDefinition id="black" value="#000000"/>
</listOfColorDefinitions>

<listOfStyles>
<style id="example" roleList="example_style">
<g stroke="black" stroke-width="2" fill="color_1"/>
Here we acknowledge those people and organizations that assisted in the development of the version 0.3 release of the SBGNML specification. We acknowledge contributors who attended workshops and forum meetings or, in some other way, provided input to previous revisions of this effort. Then, we acknowledge the bodies that provided financial support for the development of the standard.

The following list includes members of the SBGN community that have contributed to the development of SBGNML