Synthetic biology open language visual (SBOL visual) version 3.0

Abstract
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 3.0 of SBOL Visual, a new major revision of the standard. The major difference between SBOL Visual 3 and SBOL Visual 2 is that diagrams and glyphs are defined with respect to the SBOL 3 data model rather than the SBOL 2 data model. A byproduct of this change is that the use of dashed undirected lines for subsystem mappings has been removed, pending future determination on how to represent general SBOL 3 constraints; in the interim, such lines can still be used as an annotation. Finally, deprecated material has been removed from the collection of glyphs: the deprecated “insulator” glyph and “macromolecule” alternative glyphs have been removed, as have the deprecated BioPAX alternatives to SBO terms.


Purpose
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. At the same time, we aim to make this language simple and easy to use, allowing a high degree of flexibility and freedom in how such diagrams are organized, presented, and styled; in particular, it should be readily possible to create diagrams either by hand or using a wide variety of software programs. Finally, means are provided for extending the language with new and custom diagram elements, and for adoption of useful new elements into the language.

In order to ground SBOL Visual with precise definitions, we reference its visual elements to data models with well-defined semantics. In particular, glyphs and diagrams in SBOL Visual are defined in terms of their relation to the SBOL 3 data model (Baig et al., 2020) and terms in the Sequence Ontology (Eilbeck et al., 2005) and the Systems Biology Ontology (Courtot et al., 2011).

SBOL Visual is not intended to represent designs at the same level of detail as these data models. Effective visual diagrams are necessarily more abstract, focusing only on those aspects of a system that are the subject of the communication. Nevertheless, we take as a principle that it should be possible to transform any SBOL Visual diagram into an equivalent (if highly abstract) SBOL 3 data representation. Likewise, we require that SBOL Visual should be able to represent all of the significant structural or functional relationships in any GenBank or SBOL data representation.

Every glyph in SBOL Visual 3.0 corresponds to an element of the SBOL 3.0 data model (Baig et al., 2020). SBOL Visual 3.0 also defines many terms by reference to SBOL 3.0, or by reference to the Sequence Ontology (Eilbeck et al., 2005) or the Systems Biology Ontology (Courtot et al., 2011).

SBOL Visual is intended to be compatible with the Systems Biology Graphical Notation Activity Flow Language (SBGN AF) (Le Novère et al., 2009), and species and interaction glyphs have been imported from that language.

Term Conventions
This document indicates requirement levels using the controlled vocabulary specified in IETF RFC 2119 and reiterated in BBF RFC 0. In particular, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:
■ The words "MUST", "REQUIRED", or "SHALL" mean that the item is an absolute requirement of the specification.
■ The phrases "MUST NOT" or "SHALL NOT" mean that the item is an absolute prohibition of the specification.
■ The word "SHOULD" or the adjective "RECOMMENDED" mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.
■ The phrases "SHOULD NOT" or "NOT RECOMMENDED" mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.
■ The word "MAY" or the adjective "OPTIONAL" mean that an item is truly optional.

The definition of SBOL Visual references several SBOL classes, which are defined as listed here. For full definitions and explanations, see the SBOL 3.0 data model (Baig et al., 2020).
■ Component: Describes the structure of designed entities, such as DNA, RNA, and proteins, as well as other entities they interact with, such as small molecules or environmental properties, and the functional relationships and constraints relating these elements.
■ Feature: Represents a specific occurrence or instance of an entity within the design of a Component, such as a promoter in a genetic construct or an enzyme within a synthesis reaction network.

Interaction with a compatible term within its types, i.e., one that is equal to or a child of at least one term associated with the glyph, and with a compatible Participation on the incoming and outgoing edges of the glyph

More than one glyph may share the same definition: in this case, these glyphs form a family of variants, of which precisely one MUST be designated as the RECOMMENDED glyph, which is to be used unless there are strong reasons to prefer an alternative variant.

It will also frequently be the case that a diagram element could be represented by more than one glyph (e.g., a glyph for a specific term and a glyph for a more general term

Definitions are RECOMMENDED to be from the Sequence Ontology for sequence feature glyphs, from the Systems Biology Ontology for molecular species glyphs, and from the Systems Biology Ontology for interaction glyphs. If no applicable terms are available in the preferred ontology, proposal of a new glyph SHOULD be accompanied by a request to the ontology maintainers to add a term for the undefined entity.
2. A glyph SHOULD be relatively easy to sketch by hand (e.g., no high-complexity images or precise angles required).

4. A glyph specification SHOULD show the glyph in its preferred relative scale with respect to other glyphs.
5. A glyph SHOULD be specified using only solid black lines (leaving color and style to be determined by the user, as noted below).
6. A glyph SHOULD NOT be similar enough to be easily confused with any other glyph when written by hand, or when scaled either vertically, horizontally, or both.
7. A glyph SHOULD NOT include text (note that associated labels are not part of the glyph).

In addition, some requirements apply only to certain classes of glyphs:
11. If a sequence feature glyph can represent components of highly variable size or structural complexity, the glyph SHOULD be able to be scaled horizontally to indicate relative scale.

3. The scale of glyphs is RECOMMENDED to be kept consistent with their specification and throughout a diagram, but can be altered if desired, particularly to convey additional information (e.g., length of a sequence).
4. Minor styling effects MAY be chosen (e.g., shadow, corner styling, other "font-level" customization).
Figure 3 shows some examples of acceptable style variation.

In certain special cases, the style of a glyph may be more constrained, but such cases are expected to be rare and strongly motivated.

The collection of SBOL Visual glyphs is not expected to provide complete coverage of all of the types of element that people will wish to include in genetic diagrams, particularly given the ongoing evolution of synthetic biology as

Section 4.3 Extending the Set of Glyphs
■ Additional glyph variants, accompanied by compelling use cases that cannot be adequately addressed by the existing glyph variants.
■ Additional definitions for a glyph, capturing an alternate meaning that is useful to humans but existing within a disjoint branch of the relevant ontology.

In order to support the coherent extension of SBOL Visual, whenever a diagram creator uses a glyph not found in Appendix A, the creator SHOULD submit it to be considered for inclusion in an updated version of the standard following the processes for adding new glyphs found on the community website at http://sbolstandard.org

Copyright XXXX The Author(s). Published by Journal of Integrative Bioinformatics.

An SBOL Visual diagram represents information about the structure of a biological design. SBOL Visual is particularly concerned with enabling clear communication about the structure of nucleic acid designs, though there is no requirement that a diagram include a nucleic acid design.

If desired, an SBOL Visual diagram may also be associated with a machine-interpretable model (e.g., in SBOL, GenBank, or SBML format). In this document we describe the association with the SBOL 3 data model (Baig et al., 2020), which provides a formal semantic grounding for all elements of an SBOL Visual diagram, but equivalent associations may be made between diagram elements and other models. In terms of the SBOL 3 data model, the description of a nucleic acid design is formally defined as a representation of a Component, the Feature objects describing its design, the Interaction and Participation objects describing their functional relationships, and the Constraint objects describing relationships of relative location, orientation, topology, and identity.

Specifically, an SBOL Visual diagram consists of the classes of objects illustrated in Figure 4. Figure 5 shows an example of such a diagram, in a typical usage. Full details of this specification are provided in the remainder of this section.
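As an informal illustration of the composition just described (a Component holding Feature, Interaction, Participation, and Constraint objects), the following Python sketch models those classes as plain dataclasses. The field names and the example design are simplified stand-ins chosen for illustration; they are assumptions, not the property names of the SBOL 3 specification or the API of any SBOL library.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative stand-ins only: simplified shapes, not the SBOL 3 schema.

@dataclass
class Feature:
    display_id: str
    roles: List[str] = field(default_factory=list)  # e.g., ontology terms

@dataclass
class Participation:
    participant: Feature
    role: str  # the part this feature plays in the interaction

@dataclass
class Interaction:
    type: str
    participations: List[Participation] = field(default_factory=list)

@dataclass
class Constraint:
    restriction: str  # relative location, orientation, topology, or identity
    subject: Feature
    object: Feature

@dataclass
class Component:
    display_id: str
    features: List[Feature] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)

# Hypothetical design: a promoter placed before a coding sequence,
# with a protein that represses the promoter.
promoter = Feature("pTet", roles=["promoter"])
cds = Feature("tetR_cds", roles=["CDS"])
protein = Feature("TetR")
design = Component(
    display_id="example_design",
    features=[promoter, cds, protein],
    interactions=[Interaction("inhibition", [
        Participation(protein, "inhibitor"),
        Participation(promoter, "inhibited"),
    ])],
    constraints=[Constraint("precedes", promoter, cds)],
)
```

In such a model, each glyph drawn in a diagram would correspond to one Feature, each edge to one Interaction, and relative placement along a backbone to Constraint objects.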

A diagram for a nucleic acid construct is based around a single or double line, representing the nucleic acid backbone. Information about features of the construct can then be represented by attaching sequence feature glyphs to the backbone, as defined below in Section 5.2.

In terms of the SBOL 3 data model, the backbone represents any clustering of Feature objects that describe the structure of a single nucleic acid construct. In particular, two Feature objects MAY be placed together on a nucleic

If two Feature objects do not have any such relation or chain of relations through other Feature objects, then they MUST NOT be placed on the same backbone.
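The "relation or chain of relations" condition amounts to a connected-components computation over Feature objects. The sketch below is one way to check it, assuming the set of directly related feature pairs has already been extracted from the model; which relations qualify is not reproduced here, and the function names `backbone_groups` and `may_share_backbone` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def backbone_groups(
    features: Iterable[Hashable],
    related_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Group features into connected sets; only features in the same set
    are linked by a relation or a chain of relations."""
    parent: Dict[Hashable, Hashable] = {f: f for f in features}

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[Hashable, Set[Hashable]] = {}
    for f in parent:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())

def may_share_backbone(a: Hashable, b: Hashable,
                       groups: List[Set[Hashable]]) -> bool:
    """True if both features fall in the same connected group."""
    return any(a in g and b in g for g in groups)

# Hypothetical example: three chained features and one unrelated feature.
groups = backbone_groups(
    ["promoter", "rbs", "cds", "other_promoter"],
    [("promoter", "rbs"), ("rbs", "cds")],
)
```

Here `promoter` and `cds` end up in the same group through `rbs`, while `other_promoter` forms its own group and so would need a separate backbone. Every feature named in `related_pairs` is assumed to also appear in `features`.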

The rules for use of nucleic acid backbones in diagrams are:

A sequence feature glyph in contact with a nucleic acid backbone indicates a feature of the nucleic acid sequence. In terms of the SBOL 3 data model, this is a Feature with a nucleic acid type that is associated with that nucleic

overlap. An example is provided in Figure 12.
6. A nucleic acid feature SHOULD be represented using a glyph defined in Appendix A.1. In this case, the feature MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 3 data model, this means the glyph is equal to or a parent of at least one of the roles for the Feature or its associated Component if it is a SubComponent. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.1 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3. Examples are provided in Figure 14.
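The "equal to or a parent of" test can be sketched as a walk up the ontology from each role of the feature. The example below is a minimal illustration under two assumptions: the ontology is available as a child-to-parents mapping (the `PARENTS` table is a hypothetical excerpt containing only the one relation stated in this document), and "parent" is read transitively, so that any ancestor counts.

```python
from typing import Dict, Iterable, Set

# Hypothetical excerpt of an ontology as child -> parents links.
# Only the relation stated in this document is included.
PARENTS: Dict[str, Set[str]] = {
    "SO:0002224": {"SO:0000839"},  # 2A self-cleaving polypeptide region
}

def ancestors_or_self(term: str) -> Set[str]:
    """Return the term together with every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen

def glyph_applies(glyph_terms: Iterable[str],
                  feature_roles: Iterable[str]) -> bool:
    """True if some glyph term equals, or is an ancestor of, some role."""
    glyph_set = set(glyph_terms)
    return any(glyph_set & ancestors_or_self(role) for role in feature_roles)
```

With this table, a glyph associated with SO:0000839 applies to a feature whose role is SO:0002224, but a glyph associated only with SO:0002224 does not apply to a feature whose role is the more general SO:0000839.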

Molecular Species
A glyph that is not in contact with any backbone represents any class of molecule whose detailed structure is not being shown using sequence feature glyphs. In other words, the molecule is either not a nucleic acid (e.g., proteins, small molecules) or else an "uninteresting" nucleic acid (e.g., showing a transcribed mRNA, but not the features of its sequence). In terms of the SBOL 3 data model, these are also Feature objects contained within the overall Component for the diagram.
1. A molecular species SHOULD be represented using a glyph defined in Appendix A.2. In this case, the species MUST be contained within at least one of the glyph's associated SBO terms. In terms of the SBOL 3 data model, this means the SBO term for the glyph is equal to or a parent of at least one of the types for the associated Feature or its type-defining referent (i.e., the Component linked by instanceOf for a SubComponent or the type associated with the linked Feature for a ComponentReference). Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.2 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.

Figure: Examples of recommended, allowed, and not recommended representations of an interaction between a molecular species and a nucleic acid construct, in this case regulation of a promoter by a transcription factor protein that binds on the 5' side of the promoter: (a) shows the RECOMMENDED representation, (b) shows a more generic alternative, and (c) is recommended against because the location does not correspond with the binding.
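The lookup of a species' types through its "type-defining referent" (rule 1 above) can be sketched as follows. The classes and the attribute names `instance_of` and `refers_to` are simplified, hypothetical stand-ins for the SBOL 3 classes named in the rule, and plain labels are used in place of SBO terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-ins for the SBOL 3 classes named in the rule.

@dataclass
class Component:
    types: List[str]

@dataclass
class Feature:
    types: List[str] = field(default_factory=list)

@dataclass
class SubComponent(Feature):
    instance_of: Optional[Component] = None  # the Component this instantiates

@dataclass
class ComponentReference(Feature):
    refers_to: Optional[Feature] = None  # the linked Feature

def candidate_types(feature: Feature) -> List[str]:
    """Collect the types a glyph's term may be matched against."""
    types = list(feature.types)
    if isinstance(feature, SubComponent) and feature.instance_of is not None:
        types += feature.instance_of.types
    elif isinstance(feature, ComponentReference) and feature.refers_to is not None:
        types += candidate_types(feature.refers_to)
    return types

# Hypothetical usage: a reference to a sub-component that instantiates a protein.
sub = SubComponent(instance_of=Component(types=["protein"]))
ref = ComponentReference(refers_to=sub)
```

The resulting list of types would then be compared against the glyph's associated terms using the same "equal to or a parent of" test that applies to sequence features.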

A directed edge ("arrow") attached to one or more glyphs indicates a functional interaction involving those elements. The roles of the elements are indicated by their position at the head or tail of the edge. In terms of the SBOL 3 data model, this is an Interaction, with either one or two Participation relationships, their role set by position at the head or tail of the edge. An example is provided in Figure 16.

between arrows with a crossover pattern, in which one edge "diverts" at the intersection (see Figure 17). Examples are provided in Figure 18.

2. An interaction SHOULD be represented using a glyph defined in Appendix A.3. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 3 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.3 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.

Figure: Examples of recommended, allowed, and forbidden relationships between two interactions in a mutual repression system: (a) non-crossing is recommended, (b) using a crossover pattern is allowed, but (c) crossing without a crossover pattern is forbidden, since the relationship between the two edges is ambiguous.

3. An edge may have multiple heads or multiple tails. In this case, a split or join in an edge represents either multiple participants with the same role (e.g., a transcription factor repressing two instances of a promoter) or a biochemical process (e.g., association of an inducible protein and a small molecule to form an active complex). An edge with multiple heads MUST use the same glyph for each head. An edge that splits or joins with no glyph at the junction represents multiple participants with the same role. Examples are provided in Figure 19.

Figure: Example of interaction node whose product plays a role in another interaction: dCas9 and gRNA associate to form a CRISPR complex, which then represses a promoter.

6. An edge MAY have another edge at its head or tail, indicating that the interaction has a participant that is another interaction. This does not have a direct representation in the SBOL 2 data model. To avoid ambiguity, an edge SHOULD NOT connect to only one head of a multi-head arrow or one tail of a multi-tail arrow, and MUST connect only to the body of the edge, not its head or tail. Examples are provided in Figure 22.

is not part of the subsystem; only certain diagram elements are allowed to cross a boundary, as defined below. In terms of the SBOL 3 data model, the line represents a SubComponent included within the Component represented by the surrounding diagram, and boundary-crossing elements define ComponentReference relationships. Note that the internals of a subsystem need not be shown: some details can be omitted or a subsystem can even be a "black box" with no internal structure at all being shown.

shows the same subsystems but with rounded rectangles and the second being a "black box" subsystem with no internal structure shown, (c) shows subsystems with non-rectilinear borders, and (d) shows a black-box subsystem that is not visually distinct from a sequence feature glyph.

Glyphs for sequence features and molecular species MUST NOT intersect with the boundary of a subsystem. An example is provided in Figure 24(a).

Component that indicate that the nucleic acid sequence in the SubComponent is adjacent to the connected portions of nucleic acid sequence in the larger Component. Examples are provided in Figure 24(b).
5. Small rectangles MAY be drawn on the outside of the subsystem boundary to represent its intended interface. In terms of the SBOL 3 data model, these represent the Interface for the SubComponent, with each rectangle associated with a Feature designated as an input, output, or nondirectional element of the interface. An interface rectangle may be connected to an interaction edge head or tail to represent interactions with its associated feature. If both an interface rectangle and a glyph for its associated feature are present in a diagram, then any interaction with the feature from outside of the subsystem MUST both pass through the interface rectangle and connect with the glyph. An interaction with a feature that does not pass through an interface rectangle then MAY be used to represent an unintended interaction with a non-interface element of the subsystem. Examples are provided in Figure 25.
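A diagram-checking tool could test the MUST clause of the interface rule along the following lines. This is only a sketch under stated assumptions: the `ExternalEdge` record and its boolean fields are a hypothetical way of describing an edge that enters a subsystem, not part of SBOL Visual or of any existing tool.

```python
from dataclasses import dataclass
from typing import Set

@dataclass(frozen=True)
class ExternalEdge:
    """An interaction edge arriving from outside a subsystem (hypothetical)."""
    target_feature: str           # feature inside the subsystem
    through_interface_rect: bool  # passes through that feature's rectangle
    touches_glyph: bool           # connects with the feature's glyph

def violates_interface_rule(edge: ExternalEdge,
                            features_with_rect: Set[str],
                            features_with_glyph: Set[str]) -> bool:
    """Check the MUST clause: when both the interface rectangle and the
    glyph are drawn, the edge has to pass through one and reach the other."""
    both_drawn = (edge.target_feature in features_with_rect
                  and edge.target_feature in features_with_glyph)
    if not both_drawn:
        return False  # the clause only applies when both are present
    return not (edge.through_interface_rect and edge.touches_glyph)

# Hypothetical diagram: one interface feature and one internal feature.
rects = {"output_protein"}
glyphs = {"output_protein", "internal_promoter"}
good = ExternalEdge("output_protein", True, True)
bad = ExternalEdge("output_protein", False, True)
unintended = ExternalEdge("internal_promoter", False, True)
```

The third edge reaches a feature that has no interface rectangle, so it is not flagged; under the rule above it MAY be read as an unintended interaction with a non-interface element.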

The name of any object in a diagram is RECOMMENDED to be displayed as text within, adjacent to, or otherwise clearly visually connected to the object's associated glyph. In terms of the SBOL 3 data model, this is the name property, and if no name is supplied then the displayId MAY be used instead. Examples are provided in Figure 26.

Other text or graphics may be included as annotations with no constraint on their syntax or semantics.
2. Annotations SHOULD NOT be used to display information that can be displayed using other SBOL Visual elements.
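For software that renders labels, the fallback described in this section (use the name, otherwise the displayId) can be sketched in a few lines. The function name and its arguments are hypothetical; they stand for the SBOL 3 name and displayId properties of whatever object is being drawn.

```python
from typing import Optional

def label_text(name: Optional[str], display_id: Optional[str]) -> Optional[str]:
    """Return the label for a glyph: the name when one is supplied,
    otherwise the displayId (which may itself be absent)."""
    if name:
        return name
    return display_id
```

Since displaying a label is RECOMMENDED rather than required, a renderer may still draw the glyph without text when this returns None.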

Inert DNA Spacer
Associated SO term(s) SO:0002223 (Inert DNA Spacer)

Recommended Glyph and Alternates
The inert DNA spacer glyph is a circle with an X in its middle, suggesting the intent to cancel possible interactions:

Prototypical Example
Inserted 5' sequence intended to reduce effect of upstream genetic context on promoter behavior.

Recommended Glyph and Alternates
A polypeptide region inside a coding sequence is indicated by insertion of triangular boundaries inside of the CDS, parallel to the 3' side of the CDS. This will produce chevron segments on the 3' side and a CDS shape on the 5' side:

Prototypical Example
degradation tag on a protein coding sequence
nuclear localization tag on a protein coding sequence
coding sequence for the membrane-crossing region of a protein

This glyph is intended to be used in composition or superposition with the glyph for the coding sequence of which the polypeptide regions are fragments:

Example of a coding sequence with three designated domains, an N-tag (blue), C-tag (yellow), and internal region (red):

Notes
Polypeptide region can also be used to represent regions that involve cleavage, such as a 2A self-cleaving polypeptide region (SO:0002224, a child term of SO:0000839). It is RECOMMENDED that cleavage-inducing polypeptide regions be visually distinguished from intact regions, e.g., through the use of dashed lines.

Stop Site
Associated SO term(s) SO

Recommended Glyph and Alternates
Transcription/Translation End Point is a "stem-top" glyph for describing small sites. In this system:
■ the top glyph indicates the type of site (e.g., Biopolymer Location)
■ the stem glyph indicates whether the site affects DNA, RNA, or protein (respectively: straight, wavy, or looped)
The Transcription/Translation End Point top is an asterisk in a circle (in order: transcription, translation):

Figure 28: The same functional unit as in the previous figure, with additional assembly-focused information: there is a 5' overhang before the promoter, a 3' overhang after the terminator, and an assembly scar between the promoter and the ribosome entry site left over from a prior step of assembly.

Figure 29: Promoter pTet stored in a circular plasmid. The promoter is prepared for being cut out of the plasmid: it is preceded by a 5' sticky end restriction site and followed by a 3' sticky end restriction site. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

Figure 30: Promoter pTet stored in a circular plasmid, flanked by blunt-ended restriction sites. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

Section B Examples
Figure 31: Promoter pTet stored in a circular plasmid, flanked by restriction sites with unspecified cut structure. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

Figure 32: Circular plasmid containing a pTet promoter followed by a ribonuclease site. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

• Glyphs include information on interior, bounding box, and recommended backbone alignment.
• Sequence feature glyphs are required to have their bounding boxes contact the nucleic acid backbone.
• Nucleic acid diagrams now require the nucleic acid backbone line, and the number of lines allowed in various circumstances is constrained.
• Explicit statement of when a glyph can and cannot be used to represent a particular element of a diagram.
■ Labels that name objects are distinguished from other types of textual annotation.
■ Explicit statement of which aspects of a symbol are not controlled.
■ Symbol variants are now supported.

In addition, the collection of sequence feature glyphs has been expanded and modified in the following ways:

The major difference between SBOL Visual 3 and SBOL Visual 2 is that diagrams and glyphs are defined with respect to the SBOL 3 data model rather than the SBOL 2 data model. SBOL Visual 3 diagrams may still be related to the SBOL 2 data model by following the mapping between SBOL 3 and SBOL 2 data models provided in the SBOL 3 specification.

A byproduct of this change is that the use of dashed undirected lines for subsystem mappings has been removed. In SBOL Visual 2, dashed line mappings were analogous to an SBOL 2 MapsTo, which is a compound mapping relationship indicating both reference into a subsystem and one of several identity relationships. In SBOL 3, these functions have been divided between two classes, Constraint to indicate relationships (including identity) and ComponentReference to access subsystem features. In SBOL Visual 3, interactions crossing a subsystem boundary line indicate access of subsystem features via ComponentReference. As SBOL 3 Constraint objects can express many other relationships besides identity, however, the use of dashed undirected lines to indicate identity relationships is currently reserved as a possible future addition to the SBOL Visual 3 specification and is not yet implemented. Until a decision is made about how to represent these relationships, the specification is silent on both constraints and dashed undirected lines, which means that it is acceptable to use them, if desired, as an annotation indicating identity.

In addition, the collection of glyphs has been modified as follows:
■ The deprecated Insulator glyph and "shmoo" Macromolecule alternative glyph have been removed.
■ Deprecated BioPAX alternatives to SBO terms for molecular species glyphs have been removed.
