Synthetic Biology Open Language Visual (SBOL Visual) Version 2.1

Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.1 of SBOL Visual, which builds on the prior SBOL Visual 2.0 standard by expanding diagram syntax to include methods for showing modular structure and mappings between elements of a system, adding interaction arrows that can split or join (with the glyph at the split or join indicating either superposition or a chemical process), and adding new glyphs for indicating genomic context (e.g., integration into a plasmid or genome) and for stop codons.


Purpose
People who engineer biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. At the same time, we aim to make this language simple and easy to use, allowing a high degree of flexibility and freedom in how such diagrams are organized, presented, and styled; in particular, it should be readily possible to create diagrams both by hand and with a wide variety of software programs. Finally, means are provided for extending the language with new and custom diagram elements, and for adoption of useful new elements into the language.

In order to ground SBOL Visual with precise definitions, we reference its visual elements to data models with well-defined semantics. In particular, glyphs in SBOL Visual are defined in terms of their relation to the SBOL 2 data model (as defined in BBF RFC 112) and terms in the Sequence Ontology (Eilbeck et al., 2005), the Systems Biology Ontology (Courtot et al., 2011), and BioPAX (Goldberg et al., 2010).

SBOL Visual is not intended to represent designs at the same level of detail as these data models. Effective visual diagrams are necessarily more abstract, focusing only on those aspects of a system that are the subject of the communication. Nevertheless, we take as a principle that it should be possible to transform any SBOL Visual diagram into an equivalent (if highly abstract) SBOL 2 data representation. Likewise, we require that SBOL Visual should be able to represent all of the significant structural or functional relationships in any GenBank or SBOL data representation.

Every glyph in SBOL Visual 2.1 corresponds to an element of the SBOL 2.1 data model, as defined in BBF RFC 112. SBOL Visual 2.1 also defines many terms by reference to BBF RFC 112, or by reference to the Sequence Ontology (Eilbeck et al., 2005), the Systems Biology Ontology (Courtot et al., 2011), or BioPAX (Goldberg et al., 2010).

SBOL Visual is intended to be compatible with the Systems Biology Graphical Notation Activity Flow Language (SBGN AF) (Le Novère et al., 2009), and species and interaction glyphs have been imported from that language.

This document indicates requirement levels using the controlled vocabulary specified in IETF RFC 2119 and reiterated in BBF RFC 0. In particular, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:
■ The words "MUST", "REQUIRED", or "SHALL" mean that the item is an absolute requirement of the specification.
■ The phrases "MUST NOT" or "SHALL NOT" mean that the item is an absolute prohibition of the specification.
■ The word "SHOULD" or the adjective "RECOMMENDED" mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.
■ The phrases "SHOULD NOT" or "NOT RECOMMENDED" mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.
■ The word "MAY" or the adjective "OPTIONAL" mean that an item is truly optional.
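As an illustration of how a diagram-checking tool might act on these requirement levels, the following minimal sketch maps each key word to a reporting severity. The `Severity` enum, the `KEYWORD_SEVERITY` table, and the `severity_for` function are hypothetical names introduced here; neither RFC 2119 nor SBOL Visual prescribes them.

```python
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # absolute requirement or prohibition was violated
    WARNING = "warning"  # recommendation not followed; may still be justified
    INFO = "info"        # truly optional item


# RFC 2119 key words grouped by the requirement level described above.
# The grouping into three severities is an assumption of this sketch.
KEYWORD_SEVERITY = {
    "MUST": Severity.ERROR,
    "REQUIRED": Severity.ERROR,
    "SHALL": Severity.ERROR,
    "MUST NOT": Severity.ERROR,
    "SHALL NOT": Severity.ERROR,
    "SHOULD": Severity.WARNING,
    "RECOMMENDED": Severity.WARNING,
    "SHOULD NOT": Severity.WARNING,
    "NOT RECOMMENDED": Severity.WARNING,
    "MAY": Severity.INFO,
    "OPTIONAL": Severity.INFO,
}


def severity_for(keyword: str) -> Severity:
    """Look up the severity for a key word, ignoring case and extra spaces."""
    normalized = " ".join(keyword.upper().split())
    return KEYWORD_SEVERITY[normalized]
```

A checker built this way could reject a diagram only on `ERROR` findings while still listing `WARNING` findings for the author to weigh.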

The definition of SBOL Visual references several SBOL classes, which are defined as listed here. For full definitions and explanations, see BBF RFC 112, describing the SBOL 2.1 data model.

■ ComponentDefinition: Describes the structure of designed entities, such as DNA, RNA, and proteins, as well as other entities they interact with, such as small molecules or environmental properties.

■ Component: Pointer class. Incorporates a child ComponentDefinition by reference into exactly one parent ComponentDefinition. Represents a specific occurrence or instance of an entity within the design of a more complex entity. Because the same definition might appear in multiple designs or multiple times in a single design, a single ComponentDefinition can have zero or more parent ComponentDefinitions, and each such parent-child link requires its own, distinct Component.

■ Location: Specifies the base coordinates and orientation of a genetic feature on a DNA or RNA molecule or a residue or site on another sequential macromolecule such as a protein.

■ SequenceAnnotation: Describes the Location of a notable sub-sequence found within the Sequence of a ComponentDefinition. Can also link to and effectively position a child Component.

■ SequenceConstraint: Describes the relative spatial position and orientation of two Component objects that are contained within the same ComponentDefinition.
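The relationships among these classes can be sketched with plain Python dataclasses. This is a simplified illustration only, not the SBOL 2.1 data model itself and not the API of any SBOL library: the field names, the 1-based coordinates, the `"DNA"` type string, and the `"precedes"` restriction value are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    start: int                    # first base (1-based, an assumed convention)
    end: int                      # last base, inclusive
    orientation: str = "inline"   # or "reverseComplement"


@dataclass
class ComponentDefinition:
    display_id: str
    types: List[str]                                  # e.g. ["DNA"]
    roles: List[str] = field(default_factory=list)    # e.g. Sequence Ontology terms
    components: List["Component"] = field(default_factory=list)
    annotations: List["SequenceAnnotation"] = field(default_factory=list)
    constraints: List["SequenceConstraint"] = field(default_factory=list)


@dataclass
class Component:
    display_id: str
    definition: ComponentDefinition   # the child definition, held by reference


@dataclass
class SequenceAnnotation:
    display_id: str
    locations: List[Location]
    component: Optional[Component] = None   # positions a child Component, if set


@dataclass
class SequenceConstraint:
    subject: Component
    object: Component
    restriction: str   # relative position, e.g. "precedes" (assumed value)


# One child definition used twice in one parent needs two distinct Components.
gfp = ComponentDefinition("gfp", ["DNA"], roles=["SO:0000316"])
parent = ComponentDefinition("tandem_gfp", ["DNA"])
first, second = Component("gfp_1", gfp), Component("gfp_2", gfp)
parent.components += [first, second]
parent.annotations.append(SequenceAnnotation("gfp_1_site", [Location(1, 720)], first))
parent.constraints.append(SequenceConstraint(first, second, "precedes"))
```

The example at the bottom mirrors the point made for Component: the two occurrences share one ComponentDefinition, but each parent-child link is its own object.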

A glyph is a visual symbol used to represent an element in an SBOL Visual diagram. All of the currently defined glyphs are collected in Appendix A. This section explains how glyphs are specified and how to add new glyphs.

Each SBOL glyph is defined by association with ontology terms, and can be used to represent any diagram element that is well-described by that term. Currently there are three classes of glyphs, each associated with a different ontology and different class in the SBOL 2 data model:

■ Sequence Feature Glyphs represent features of nucleic acid sequences. They are associated with Sequence Ontology terms. For the SBOL 2 data model, this is formally defined as any Component with a compatible term within its associated roles, i.e., one that is equal to or a child of at least one term associated with the glyph.

■ Molecular Species Glyphs represent any class of molecule whose detailed structure is not being shown using sequence feature glyphs. They are associated with BioPAX terms. For the SBOL 2 data model, this is formally defined as any FunctionalComponent with a compatible term within its associated types, i.e., one that is equal to or a child of at least one term associated with the glyph.

■ Interaction Glyphs are "arrows" indicating functional relationships between sequence features and/or molecular species. They are associated with Systems Biology Ontology terms. For the SBOL 2 data model, this is formally defined as any Interaction with a compatible term within its types, i.e., one that is equal to or a child of at least one term associated with the glyph, and with a compatible Participation at the head and tail of the arrow.

More than one glyph may share the same definition: in this case, these glyphs form a family of variants, of which precisely one MUST be designated as the RECOMMENDED glyph, which is to be used unless there are strong reasons to prefer an alternative variant.

It will also frequently be the case that a diagram element could be represented by more than one glyph (e.g., a glyph for a specific term and a glyph for a more general term). In such cases, it is RECOMMENDED that the most specific applicable glyph be used. However, if upward branching in the relevant ontology means two applicable glyphs do not have an ordered parent/child relation, then either MAY be used.

For example, a protein coding sequence (CDS) is a sequence feature that may be represented either using the CDS glyph (Sequence Ontology term SO:0000316) or the Unspecified glyph (Sequence Ontology term SO:0000001). Since SO:0000316 is contained by SO:0000001, the preferred glyph is CDS, rather than Unspecified. Likewise, a CDS may be represented by either a pentagonal glyph or an arrow glyph, but the pentagon is the RECOMMENDED variant, and so it is likewise preferred. Figure 1 illustrates this example.
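The "equal to or a child of" rule and the preference for the most specific glyph can be sketched in a few lines of Python. The tiny `PARENTS` table below is a hypothetical stand-in for the Sequence Ontology (intermediate terms between CDS and SO:0000001 are omitted), and using ontology distance as a measure of specificity is an assumption of this sketch rather than something the specification defines.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, heavily simplified is-a table: term -> direct parents.
PARENTS: Dict[str, List[str]] = {
    "SO:0000316": ["SO:0000001"],  # CDS; intermediate ancestors omitted
    "SO:0000167": ["SO:0000001"],  # promoter; intermediate ancestors omitted
    "SO:0000001": [],
}

# Terms associated with each glyph (a small illustrative subset).
GLYPH_TERMS: Dict[str, Set[str]] = {
    "CDS": {"SO:0000316"},
    "Promoter": {"SO:0000167"},
    "Unspecified": {"SO:0000001"},
}


def ancestor_distances(term: str) -> Dict[str, int]:
    """Map `term` and each of its ancestors to the fewest is-a steps from `term`."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in PARENTS.get(current, []):
            if parent not in dist or dist[current] + 1 < dist[parent]:
                dist[parent] = dist[current] + 1
                frontier.append(parent)
    return dist


def applicable_glyphs(roles: List[str]) -> Dict[str, int]:
    """Return each glyph compatible with some role, with its smallest distance."""
    result: Dict[str, int] = {}
    for role in roles:
        dist = ancestor_distances(role)
        for glyph, terms in GLYPH_TERMS.items():
            hits = [dist[t] for t in terms if t in dist]
            if hits:
                result[glyph] = min(min(hits), result.get(glyph, min(hits)))
    return result


def most_specific_glyph(roles: List[str]) -> Optional[str]:
    """Pick the applicable glyph closest to the roles, or None if none applies."""
    candidates = applicable_glyphs(roles)
    return min(candidates, key=candidates.get) if candidates else None
```

For a Component with role SO:0000316, both the CDS and Unspecified glyphs are applicable, and `most_specific_glyph` returns `"CDS"`. When two glyphs tie because neither is an ancestor of the other, this sketch simply returns the first one found, which is consistent with the statement that either MAY be used.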

Figure 1: A biological design element such as a protein coding sequence (CDS) is best represented by the most specific RECOMMENDED glyph (middle), but can be represented by a less specific glyph such as Unspecified (left) or an approved alternative glyph (right).

Definitions are RECOMMENDED to be from the Sequence Ontology for nucleic acid components, from BioPAX for other components, and from the Systems Biology Ontology for interactions. If no applicable terms are available in the preferred ontology, proposal of a new glyph SHOULD be accompanied by a request to the ontology maintainers to add a term for the undefined entity.

2. A glyph SHOULD be relatively easy to sketch by hand (e.g., no high-complexity images or precise angles required).
4. A glyph specification SHOULD show the glyph in its preferred relative scale with respect to other glyphs.
5. A glyph SHOULD be specified using only solid black lines (leaving color and style to be determined by the user, as noted below).
6. A glyph SHOULD NOT be similar enough to be easily confused with any other glyph when written by hand, or when scaled either vertically, horizontally, or both.
7. A glyph SHOULD NOT include text (note that associated labels are not part of the glyph).

In addition, some requirements apply only to certain classes of glyphs:
11. If a sequence feature glyph can represent components of highly variable size or structural complexity, the glyph SHOULD be able to be scaled horizontally to indicate relative property value.

3. The scale of glyphs is RECOMMENDED to be kept consistent with their specification and throughout a diagram, but can be altered if desired, particularly to convey additional information (e.g., length of a sequence).
4. Minor styling effects MAY be chosen (e.g., shadow, corner styling, other "font-level" customization).

Figure 3 shows some examples of acceptable style variation.

In certain special cases, the style of a glyph may be more constrained, but such cases are expected to be rare and strongly motivated.

The collection of SBOL Visual glyphs is not expected to provide complete coverage of all of the types of element that people will wish to include in genetic diagrams, particularly given the ongoing evolution of synthetic biology as an engineering discipline. As the need for new diagram elements or new practices of usage emerges, new glyphs or glyph definitions are expected to be added to SBOL Visual. In particular, the following three classes of changes are expected to occur regularly, and the SBOL development community will maintain clear processes for proposal and

An SBOL Visual diagram represents information about the structure of a nucleic acid design and its associated molecular species and interactions. If desired, an SBOL Visual diagram may also be associated with a machine-interpretable model (e.g., in SBOL, GenBank, or SBML format). In this document we describe the association for the SBOL 2 data model, which provides a formal semantic grounding for all elements of an SBOL Visual diagram, but equivalent associations may be made between diagram elements and other models. In terms of the SBOL 2 data model, the description of a nucleic acid design is formally defined as a representation of a ComponentDefinition with a nucleic acid type, the Component and SequenceAnnotation objects describing the

to the backbone, as defined below in Section 6.2. In terms of the SBOL 2 data model, the backbone represents a ComponentDefinition with a nucleic acid type (e.g., DNA, RNA), and the features represent Component and SequenceAnnotation members of the ComponentDefinition.

3. Nucleic acid features in a sequential relationship SHOULD be drawn from 5' left to 3' right on the inline strand and from 5' right to 3' left on the reverse complement strand. In terms of the SBOL 2 data model, this indicates a SequenceConstraint on the relative ordering of two features.

overlap. An example is provided in Figure 12.

6. A nucleic acid feature SHOULD be represented using a glyph defined in Appendix A.1. In this case, the feature MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the roles for the Component or its associated ComponentDefinition. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.1 MAY be used, but SHOULD be proposed for adoption as described in Section 5.3. Examples are provided in Figure 14.

Figure 14: Examples for a sequence of promoter, ribosome entry site, CDS, and terminator: (a) is RECOMMENDED because it uses the preferred variant of the most specific defined glyphs, (b) is allowed because it uses some novel custom non-conflicting symbol, not matching any glyph defined in this document, to encode more specific information about the particular CDS, (c) is recommended against because it uses less specific glyphs, and (d) is forbidden because it uses a promoter symbol to represent the terminator.
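The 5' to 3' drawing convention for sequential features can be illustrated with a short sketch. The `Feature` class, its coordinate convention, and the helper names are hypothetical; the sketch assumes a single backbone whose coordinates run along the inline strand, and it assumes that drawing order can be summarized by "precedes" relations between adjacent features.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    start: int              # coordinate on the inline strand (assumed convention)
    end: int
    reverse: bool = False   # True when the feature lies on the reverse complement strand


def layout(features: List[Feature]) -> List[Tuple[str, str]]:
    """Return (name, side of the glyph where its 5' end is drawn), left to right."""
    ordered = sorted(features, key=lambda f: f.start)
    return [(f.name, "right" if f.reverse else "left") for f in ordered]


def precedes_constraints(features: List[Feature]) -> List[Tuple[str, str, str]]:
    """Derive one 'precedes' relation per adjacent pair in drawing order."""
    ordered = sorted(features, key=lambda f: f.start)
    return [(a.name, "precedes", b.name) for a, b in zip(ordered, ordered[1:])]


design = [
    Feature("gfp", 60, 780),
    Feature("pTet", 1, 50),
    Feature("terminator", 800, 850, reverse=True),
]
```

Here `layout(design)` places pTet, gfp, and terminator from left to right, with the terminator's 5' end on its right side because it sits on the reverse complement strand.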

Molecular Species
A glyph that is not in contact with any backbone represents any class of molecule whose detailed structure is not being shown using sequence feature glyphs. In other words, the molecule is either not a nucleic acid (e.g., proteins, small molecules) or else an "uninteresting" nucleic acid (e.g., showing a transcribed mRNA, but not the features of its sequence).

In terms of the SBOL 2 data model, this is a FunctionalComponent that is contained within a ModuleDefinition implicit in the diagram.
1. A molecular species glyph MUST NOT contact any nucleic acid backbone with any part of its bounding box.
2. A molecular species SHOULD be represented using a glyph defined in Appendix A.2. In this case, the species MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the associated ComponentDefinition.

Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Section 6. SBOL Visual Diagram Language
Note that novel glyphs not defined in Appendix A.2 MAY be used, but SHOULD be proposed for adoption as described in Section 5.3.

The roles of the elements are indicated by their position at the head or tail of the edge. In terms of the SBOL 2 data model, this is an Interaction, with either one or two Participation relationships, their role set by position at the head or tail of the edge. An example is provided in Figure 15.
1. Two interaction edges SHOULD NOT cross one another. When edges cross, they MUST indicate the distinction between arrows with a crossover pattern, in which one edge "diverts" at the intersection (see Figure 16).

Examples are provided in Figure 17.

Figure 17: Examples of recommended, allowed, and forbidden relationships between two interactions in a mutual repression system: (a) non-crossing is recommended, (b) using a crossover pattern is allowed, but (c) crossing without a crossover pattern is forbidden, since the relationship between the two edges is ambiguous.
2. An interaction SHOULD be represented using a glyph defined in Appendix A.3. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Note that novel glyphs not defined in Appendix A.3 MAY be used, but SHOULD be proposed for adoption as

3. An edge may have multiple heads or multiple tails. In this case, a split or join in an edge represents either multiple participants with the same role (e.g., a transcription factor repressing two instances of a promoter) or a biochemical process (e.g., association of an inducible protein and a small molecule to form an active complex). An edge with multiple heads MUST use the same glyph for each head. An edge that splits or joins with no glyph at the junction represents multiple participants with the same role. A glyph at the point where an edge splits or joins represents a biochemical process, i.e., an additional Interaction with type and roles set by the process glyph. Examples are provided in Figure 18.
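The rules for split and join edges can be summarized in a small sketch. The `Edge` class and its fields are hypothetical, as are the glyph name strings; the sketch only classifies what a junction means and checks the requirement that all heads share one glyph.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Edge:
    tails: List[str]                       # names of elements at the tail ends
    heads: List[Tuple[str, str]]           # (element name, arrowhead glyph)
    junction_glyph: Optional[str] = None   # glyph drawn at the split or join, if any


def heads_consistent(edge: Edge) -> bool:
    """An edge with multiple heads MUST use the same glyph for each head."""
    return len({glyph for _, glyph in edge.heads}) <= 1


def junction_meaning(edge: Edge) -> str:
    """Classify what a split or join in this edge represents."""
    if len(edge.tails) <= 1 and len(edge.heads) <= 1:
        return "simple edge"
    if edge.junction_glyph is None:
        return "same-role participants"
    return "biochemical process"


# A repressor acting on two promoter instances: a split with no junction glyph.
repression = Edge(["TetR"], [("pTet_1", "inhibition"), ("pTet_2", "inhibition")])

# Two species joining at an association glyph to form a complex.
association = Edge(["LacI", "IPTG"], [("LacI_IPTG", "process")], "association")
```

In this sketch `junction_meaning(repression)` yields `"same-role participants"` and `junction_meaning(association)` yields `"biochemical process"`, matching the two cases distinguished above.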

A biochemical process represented by a glyph at an edge junction SHOULD be represented using a glyph defined in Appendix A.4. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.4 MAY be used, but SHOULD be proposed for adoption as described in Section 5.3.

Examples of recommended and problematic module boundaries: (a) shows two modules with visually distinct rectangular borders, (b) shows the same modules but with rounded rectangles and the second being a "black box" module with no internal structure shown, (c) shows modules with non-rectilinear borders, and (d) shows a black-box module that is not visually distinct from a sequence feature glyph.
2. An undirected edge (i.e., having no "arrow head") that crosses the boundary of a module represents a mapping associating the diagram elements that it links. Glyphs associated by a mapping MUST either be sequence features, molecular species, or module ports (see below), and must be of compatible types. In terms of

4. Small rectangles MAY be drawn on the outside of the module boundary to represent input/output ports. In terms of the SBOL 2 data model, each rectangle is associated with a FunctionalComponent with a direction property that is in, out, or inout. A port may be connected to an interaction edge head or tail to represent

The name of any object in a diagram is RECOMMENDED to be displayed as text within, adjacent to, or otherwise clearly visually connected to the object's associated glyph. In terms of the SBOL 2 data model, this is the name property, and if no name is supplied then the displayId MAY be used instead. Examples are provided in Figure 23. Other text or graphics may be included as annotations with no constraint on their syntax or semantics.

These glyphs represent features of nucleic acid sequences, and include a bounding box (grey dashed box) and a recommended alignment to the nucleic acid backbone (grey dashed horizontal line).
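The bounding box is what decides whether a glyph counts as touching the backbone: sequence feature glyphs are expected to contact it, while a molecular species glyph MUST NOT. A minimal geometric sketch of that check follows, assuming an axis-aligned bounding box and a backbone drawn as a horizontal line segment; the class, function, and `kind` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    left: float
    bottom: float
    right: float
    top: float


def contacts_backbone(box: BoundingBox, backbone_y: float,
                      x_start: float, x_end: float) -> bool:
    """True if the box touches a horizontal backbone from x_start to x_end."""
    overlaps_x = box.left <= x_end and box.right >= x_start
    spans_y = box.bottom <= backbone_y <= box.top
    return overlaps_x and spans_y


def placement_ok(kind: str, box: BoundingBox, backbone_y: float,
                 x_start: float, x_end: float) -> bool:
    """Check a glyph's placement against the backbone-contact rules."""
    touching = contacts_backbone(box, backbone_y, x_start, x_end)
    if kind == "sequence_feature":
        return touching          # bounding box is expected to contact the backbone
    if kind == "molecular_species":
        return not touching      # bounding box must stay clear of the backbone
    raise ValueError(f"unknown glyph kind: {kind}")


promoter_box = BoundingBox(left=10, bottom=0, right=30, top=20)
protein_box = BoundingBox(left=10, bottom=30, right=30, top=50)
```

With a backbone at y = 0 running from x = 0 to x = 100, the promoter box passes as a sequence feature and the protein box passes as a molecular species.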

Ribosome Entry Site
Associated SO term(s) SO:0000139: Ribosome Entry Site

Recommended Glyph and Alternates
The ribosome entry site glyph is a half-ovoid sitting on the backbone, suggesting an attached ribosome beginning translation:

Recommended Glyph and Alternates
The signature glyph is a box sitting atop the backbone with an X and line inside it, suggesting a signature on a form:

Figure 25: The same functional unit as in Figure 24, with additional assembly-focused information: there is a 5' overhang before the promoter, a 3' overhang after the terminator, and an assembly scar between the promoter and the ribosome entry site left over from a prior step of assembly.

Figure 26: Promoter pTet stored in a circular plasmid. The promoter is prepared for being cut out of the plasmid: it is preceded by a 5' sticky end restriction site and followed by a 3' sticky end restriction site. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.

Figure 27: Promoter stored in a plasmid as in Figure 26, except that the restriction sites before and after the promoter are blunt-end.

Figure 28: Promoter stored in a plasmid as in Figure 26, except that the cut structure of the restriction sites before and after the promoter is not specified.

Figure 29: Promoter stored in a plasmid as in Figure 26, except that there is a ribonuclease site after the promoter rather than restriction sites flanking it.

Section B. Examples
Figure 34: Promoter regulating the expression of GFP, which is also regulated by an aptamer between it and the poly-A tail of the transcript. The promoter can be cut out by a pair of recombinase target sites, which are acted on by the Flp protein. The whole construct is stored in a circular plasmid with an origin of replication and also an origin of transfer.

• Glyphs include information on interior, bounding box, and recommended backbone alignment.
• Sequence feature glyphs are required to have their bounding boxes contact the nucleic acid backbone.
• Nucleic acid diagrams now require the nucleic acid backbone line, and the number of lines allowed in various circumstances is constrained.

• Explicit statement of when a glyph can and cannot be used to represent a particular element of a diagram.
■ Labels that name objects are distinguished from other types of textual annotation.
■ Explicit statement of which aspects of a symbol are not controlled.
■ Symbol variants are now supported.

In addition, the collection of sequence feature glyphs has been expanded and modified in the following ways:

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (http://creativecommons.org/licenses/by-nc-nd/3.0/).