Synthetic biology open language visual (SBOL Visual) version 2.3

Abstract People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.3 of SBOL Visual, which builds on the prior SBOL Visual 2.2 in several ways. First, the specification now includes higher-level “interactions with interactions,” such as an inducer molecule stimulating a repression interaction. Second, binding with a nucleic acid backbone can be shown by overlapping glyphs, as with other molecular complexes. Finally, a new “unspecified interaction” glyph is added for visualizing interactions whose nature is unknown, the “insulator” glyph is deprecated in favor of a new “inert DNA spacer” glyph, and the polypeptide region glyph is recommended for showing 2A sequences.


Purpose
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. At the same time, we aim to make this language simple and easy to use, allowing a high degree of flexibility and freedom in how such diagrams are organized, presented, and styled; in particular, it should be readily possible to create diagrams either by hand or using a wide variety of software programs. Finally, means are provided for extending the language with new and custom diagram elements, and for adoption of useful new elements into the language.

In order to ground SBOL Visual with precise definitions, we reference its visual elements to data models with well-defined semantics. In particular, glyphs in SBOL Visual are defined in terms of their relation to the SBOL 2 data model (as defined in BBF RFC 112) and terms in the Sequence Ontology (Eilbeck et al., 2005) and the Systems Biology Ontology (Courtot et al., 2011).

SBOL Visual is not intended to represent designs at the same level of detail as these data models. Effective visual diagrams are necessarily more abstract, focusing only on those aspects of a system that are the subject of the communication. Nevertheless, we take as a principle that it should be possible to transform any SBOL Visual diagram into an equivalent (if highly abstract) SBOL 2 data representation. Likewise, we require that SBOL Visual should be able to represent all of the significant structural or functional relationships in any GenBank or SBOL data representation.

Every glyph in SBOL Visual 2.3 corresponds to an element of the SBOL 2.3 data model. SBOL Visual 2.3 also defines many terms by reference to SBOL 2.3, or by reference to the Sequence Ontology (Eilbeck et al., 2005) or the Systems Biology Ontology (Courtot et al., 2011).

SBOL Visual is intended to be compatible with the Systems Biology Graphical Notation Activity Flow Language (SBGN AF) (Le Novère et al., 2009), and species and interaction glyphs have been imported from that language (see:

This document indicates requirement levels using the controlled vocabulary specified in IETF RFC 2119 and reiterated in BBF RFC 0. In particular, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:

■ The words "MUST", "REQUIRED", or "SHALL" mean that the item is an absolute requirement of the specification.

■ The phrases "MUST NOT" or "SHALL NOT" mean that the item is an absolute prohibition of the specification.

■ The word "SHOULD" or the adjective "RECOMMENDED" mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.

■ The phrases "SHOULD NOT" or "NOT RECOMMENDED" mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.
■ The word "MAY" or the adjective "OPTIONAL" mean that an item is truly optional.

In terms of the SBOL 2 data model, this is formally defined as any Interaction with a compatible term within its types, i.e., one that is equal to or a child of at least one term associated with the glyph, and with a compatible Participation at the head and tail of the arrow.

More than one glyph may share the same definition: in this case, these glyphs form a family of variants, of which precisely one MUST be designated as the RECOMMENDED glyph, which is to be used unless there are strong reasons to prefer an alternative variant.

It will also frequently be the case that a diagram element could be represented by more than one glyph (e.g., a glyph for a specific term and a glyph for a more general term [...] be represented by either a pentagonal glyph or an arrow glyph, but the pentagon is the RECOMMENDED variant, and so it is likewise preferred. Figure 1 illustrates this example.

Figure 1: A biological design element such as a protein coding sequence (CDS) is best represented by the most specific RECOMMENDED glyph (middle), but can be represented by a less specific glyph such as Unspecified (left) or an approved alternative glyph (right).

11. If a sequence feature glyph can represent components of highly variable size or structural complexity, the glyph SHOULD be able to be scaled horizontally to indicate relative property value.

3. The scale of glyphs is RECOMMENDED to be kept consistent with their specification and throughout a diagram, but can be altered if desired, particularly to convey additional information (e.g., length of a sequence).

4. Minor styling effects MAY be chosen (e.g., shadow, corner styling, other "font-level" customization).

Figure 3 shows some examples of acceptable style variation.

In certain special cases, the style of a glyph may be more constrained, but such cases are expected to be rare and strongly motivated.

The collection of SBOL Visual glyphs is not expected to provide complete coverage of all of the types of element that people will wish to include in genetic diagrams, particularly given the ongoing evolution of synthetic biology as an engineering discipline. As the need for new diagram elements or new practices of usage emerges, new glyphs or glyph definitions are expected to be added to SBOL Visual. In particular, the following three classes of changes are expected to occur regularly, and the SBOL development community will maintain clear processes for proposal and adoption of changes of this type:

■ New glyphs, either representing a type of component that previously lacked a glyph or enabling a distinction between types of components previously represented by the same glyph.

■ Additional glyph variants, accompanied by compelling use cases that cannot be adequately addressed by the existing glyph variants.

■ Additional definitions for a glyph, capturing an alternate meaning that is useful to humans but existing within a disjoint branch of the relevant ontology.

In order to support the coherent extension of SBOL Visual, whenever a diagram creator uses a glyph not found in

An SBOL Visual diagram represents information about the structure of a nucleic acid design and its associated molecular species and interactions. If desired, an SBOL Visual diagram may also be associated with a machine-interpretable model (e.g., in SBOL, GenBank, or SBML format). In this document we describe the association for the SBOL 2 data model, which provides a formal semantic grounding for all elements of an SBOL Visual diagram, but equivalent associations may be made between diagram elements and other models. In terms of the SBOL 2 data model, the description of a nucleic acid design is formally defined as a representation of a ComponentDefinition with a nucleic acid type, the Component and SequenceAnnotation objects describing the

A glyph in contact with a nucleic acid backbone indicates a feature of the nucleic acid sequence. In terms of the SBOL 2 data model, this is either a SequenceFeature or a Component with a nucleic acid type that is contained within the ComponentDefinition associated with that nucleic acid backbone. The Component may be contained either directly, as one of the components of the ComponentDefinition, or recursively through a sequence of such containments.

3. Nucleic acid features in a sequential relationship SHOULD be drawn from 5' left to 3' right on the inline strand and from 5' right to 3' left on the reverse complement strand. In terms of the SBOL 2 data model, this indicates a SequenceConstraint on the relative ordering of two features.

overlap. An example is provided in Figure 12.

sequence of promoter, ribosome entry site, CDS, and terminator: (a) is RECOMMENDED because it uses the preferred variant of the most specific defined glyphs, (b) is allowed because it uses some novel custom non-conflicting symbol, not matching any glyph defined in this document, to encode more specific information about the particular CDS, (c) is recommended against because it uses less specific glyphs, and (d) is forbidden because it uses a promoter symbol to represent the terminator.
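The recursive containment rule for sequence features described above can be illustrated with a minimal sketch. It assumes a toy, dictionary-based stand-in for the SBOL 2 data model: the `SUBCOMPONENTS` table, the identifiers in it, and the `contains` helper are all hypothetical and are not part of any SBOL library. The sketch only checks whether a feature is reachable from the backbone's ComponentDefinition through one or more levels of containment.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy model: each ComponentDefinition id maps to the ids of the
# definitions instantiated by its Component children. Not a real SBOL API.
SUBCOMPONENTS: Dict[str, List[str]] = {
    "plasmid": ["cassette", "ori"],
    "cassette": ["pTet", "rbs", "gfp", "terminator"],
}


def contains(definition: str, target: str, seen: Optional[Set[str]] = None) -> bool:
    """Return True if `target` is contained in `definition`, directly or recursively."""
    seen = set() if seen is None else seen
    if definition in seen:  # guard against accidental cycles in the toy data
        return False
    seen.add(definition)
    for child in SUBCOMPONENTS.get(definition, []):
        if child == target or contains(child, target, seen):
            return True
    return False


# "gfp" is a child of "cassette", which is a child of "plasmid".
print(contains("plasmid", "gfp"))
# "ori" sits beside the cassette, not inside it.
print(contains("cassette", "ori"))
```

Under these assumptions, a glyph for `gfp` could be drawn on the backbone of either `cassette` or `plasmid`, while a glyph for `ori` would only be valid on the `plasmid` backbone.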

Molecular Species
A glyph that is not in contact with any backbone represents any class of molecule whose detailed structure is not being shown using sequence feature glyphs. In other words, either not a nucleic acid (e.g., proteins, small molecules) or else an "uninteresting" nucleic acid (e.g., showing a transcribed mRNA, but not the features of its sequence).

In terms of the SBOL 2 data model, this is a FunctionalComponent that is contained within a ModuleDefinition implicit in the diagram.

1. A molecular species SHOULD be represented using a glyph defined in Appendix A.2. In this case, the species MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the associated ComponentDefinition.

Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Note that novel glyphs not defined in Appendix A.2 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.
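The glyph selection rule above (a glyph applies when its term is equal to or a parent of the species type, and the RECOMMENDED variant of the most specific applicable glyph is preferred) can be sketched as a small lookup. This is only an illustration under stated assumptions: the ontology fragment in `PARENT`, the glyph names in `GLYPHS`, and the helper functions are hypothetical and do not reproduce the actual terms or glyph set of Appendix A.2.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology fragment (child -> parent). A real tool would use the
# ontology terms listed with each glyph in the specification.
PARENT: Dict[str, Optional[str]] = {
    "molecule": None,
    "macromolecule": "molecule",
    "protein": "macromolecule",
    "small_molecule": "molecule",
}

# Hypothetical glyph table: glyph name -> (associated term, is RECOMMENDED variant).
GLYPHS: Dict[str, Tuple[str, bool]] = {
    "unspecified": ("molecule", True),
    "macromolecule": ("macromolecule", True),
    "protein": ("protein", True),
    "protein-alternate": ("protein", False),
}


def lineage(term: str) -> List[str]:
    """Return the term followed by its ancestors, most specific first."""
    chain: List[str] = []
    current: Optional[str] = term
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def preferred_glyph(species_type: str) -> str:
    """Pick the RECOMMENDED variant of the most specific applicable glyph."""
    chain = lineage(species_type)
    # A glyph applies when its term equals the species type or is an ancestor of it.
    candidates = [
        (chain.index(term), name)
        for name, (term, recommended) in GLYPHS.items()
        if recommended and term in chain
    ]
    return min(candidates)[1]  # smallest distance up the ontology wins


print(preferred_glyph("protein"))
print(preferred_glyph("small_molecule"))
```

In this toy table a protein gets its own glyph, while a small molecule falls back to the general `unspecified` glyph because no more specific glyph is registered for it.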

A directed edge "arrow" attached to one or more glyphs indicates a functional interaction involving those elements.

The roles of the elements are indicated by their position at the head or tail of the edge. In terms of the SBOL 2 data model, this is an Interaction, with either one or two Participation relationships, their role set by position at the head or tail of the edge. An example is provided in Figure 16.

1. Two interaction edges SHOULD NOT cross one another. When edges cross, they MUST indicate the distinction between arrows with a crossover pattern, in which one edge "diverts" at the intersection (see Figure 17). Examples are provided in Figure 18.

2. An interaction SHOULD be represented using a glyph defined in Appendix A.3. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge.

Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Note that novel glyphs not defined in Appendix A.3 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.

Examples of recommended, allowed, and forbidden relationships between two interactions in a mutual repression system: (a) non-crossing is recommended, (b) using a crossover pattern is allowed, but (c) crossing without a crossover pattern is forbidden, since the relationship between the two edges is ambiguous.

3. An edge may have multiple heads or multiple tails. In this case, a split or join in an edge represents either multiple participants with the same role (e.g., a transcription factor repressing two instances of a promoter) or a biochemical process (e.g., association of an inducible protein and a small molecule to form an active

Example of interaction node whose product plays a role in another interaction: dCas9 and gRNA associate to form a CRISPR complex, which then represses a promoter.
5. An edge with its head at an interaction node MAY use an Interaction arrow head to indicate a role other

Section 5. SBOL Visual Diagram Language
6. An edge MAY have another edge at its head or tail, indicating that the interaction has a participant that is another interaction. This does not have a direct representation in the SBOL 2 data model. To avoid ambiguity, an edge SHOULD NOT connect to only one head of a multi-head arrow or one tail of a multi-tail arrow, and MUST connect only to the body of the edge, not its head or tail. Examples are provided in Figure 22.
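The rule earlier in this list that two interaction edges SHOULD NOT cross could be checked mechanically by a layout tool. The following is a minimal sketch, assuming each edge is drawn as a single straight segment between two points; real edges are often curved or multi-segment, so this is only an approximation. The function names are illustrative and not taken from any SBOL Visual software.

```python
from typing import Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is left of the line a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def edges_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Return True if segments p1-p2 and q1-q2 cross at a point interior to both.

    Edges that merely touch at an endpoint (e.g., two arrows meeting at the
    same glyph) are not reported as crossing.
    """
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


# Two diagonal edges forming an "X" cross; two parallel edges do not.
print(edges_cross((0, 0), (2, 2), (0, 2), (2, 0)))
print(edges_cross((0, 0), (2, 0), (0, 1), (2, 1)))
```

A tool using such a check might reroute one edge when a crossing is found, or fall back to drawing the crossover pattern that the rule requires when a crossing cannot be avoided.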

A module within a system MAY be represented by a visual boundary in the form of a closed polygon or closed curve. Everything inside of the boundary is part of the module, and everything outside of the boundary is not part of the module; only certain diagram elements are allowed to cross a boundary, as defined below. In terms of the SBOL 2 data model, the line represents a Module included within the ModuleDefinition represented by the surrounding diagram, and boundary-crossing elements define MapsTo relationships. Note that the internals of a module need not be shown: some details can be omitted or a module can even be a "black box" with no internal structure at all being shown.

shows the same modules but with rounded rectangles and the second being a "black box" module with no internal structure shown, (c) shows modules with non-rectilinear borders, and (d) shows a black-box module that is not visually distinct from a sequence feature glyph.
2. An undirected edge (i.e., having no "arrow head") that crosses the boundary of a module represents a mapping associating the diagram elements that it links. Glyphs associated by a mapping MUST either be sequence

4. Small rectangles MAY be drawn on the outside of the module boundary to represent input/output ports. In terms of the SBOL 2 data model, each rectangle is associated with a FunctionalComponent with a direction property that is in, out, or inout. A port may be connected to an interaction edge head or tail to represent

The name of any object in a diagram is RECOMMENDED to be displayed as text within, adjacent to, or otherwise clearly visually connected to the object's associated glyph. In terms of the SBOL 2 data model, this is the name property, and if no name is supplied then the displayId MAY be used instead. Examples are provided in Figure 27.

A SBOL Visual Glyphs

The following pages present all current glyphs for SBOL Visual, organized by glyph families. Each entry lists:

Polypeptide Region
Associated SO term(s) SO:0000839 (polypeptide region)

Recommended Glyph and Alternates
A polypeptide region inside a coding sequence is indicated by insertion of triangular boundaries inside of the CDS, parallel to the 3' side of the CDS. This will produce chevron segments on the 3' side and a CDS shape on the 5' side:

Prototypical Example
■ degradation tag on a protein coding sequence
■ nuclear localization tag on a protein coding sequence
■ coding sequence for the membrane-crossing region of a protein

This glyph is intended to be used in composition or superposition with the glyph for the coding sequence of which the polypeptide regions are fragments:

Example of a coding sequence with three designated domains, an N-tag (blue), C-tag (yellow), and internal region (red):

Notes
The polypeptide region glyph can also be used to represent regions that involve cleavage, such as a 2A self-cleaving polypeptide region (SO:0002224, a child term of SO:0000839). It is RECOMMENDED that cleavage-inducing polypeptide regions be visually distinguished from intact regions, e.g., through the use of dashed lines.

Sticky End Restriction Enzyme Cleavage Site
Associated SO term(s) SO

Recommended Glyph and Alternates
The 5' sticky restriction site glyph is an image of the lines along which two strands of DNA will be cut into 5' sticky ends, and the complementary 3' Sticky Restriction Site glyph is a reflection of the 5' Sticky Restriction Site. Vertical position with respect to the backbone is in a break in a single backbone (in order: five-prime, three-prime): and between strands of a double backbone (in order: five-prime double stranded, three-prime double stranded):

Figure 29: The same functional unit as in Figure 28, with additional assembly-focused information: there is a 5' overhang before the promoter, a 3' overhang after the terminator, and an assembly scar between the promoter and the ribosome entry site left over from a prior step of assembly.

Figure 30: Promoter pTet stored in a circular plasmid. The promoter is prepared for being cut out of the plasmid: it is preceded by a 5' sticky end restriction site and followed by a 3' sticky end restriction site. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

Figure 31: Promoter stored in a plasmid as in Figure 30, except that the restriction sites before and after the promoter are blunt-end.

Figure 30, except that the cut structure of the restriction sites before and after the promoter is not specified.

Prototypical Example
Figure 33: Promoter stored in a plasmid as in Figure 30, except that there is a ribonuclease site after the promoter rather than restriction sites flanking it.

• Glyphs include information on interior, bounding box, and recommended backbone alignment.

• Sequence feature glyphs are required to have their bounding boxes contact the nucleic acid backbone.

• Nucleic acid diagrams now require the nucleic acid backbone line, and the number of lines allowed in various circumstances is constrained.

• Explicit statement of when a glyph can and cannot be used to represent a particular element of a diagram.

■ Labels that name objects are distinguished from other types of textual annotation.

■ Explicit statement of which aspects of a symbol are not controlled.

■ Symbol variants are now supported.
In addition, the collection of sequence feature glyphs has been expanded and modified in the following ways:

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (http://creativecommons.org/licenses/by-nc-nd/3.0/).