Synthetic Biology Open Language Visual (SBOL Visual) Version 2.2

Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.2 of SBOL Visual, which builds on the prior SBOL Visual 2.1 in several ways. First, the grounding of molecular species glyphs is changed from BioPAX to SBO, aligning with the use of SBO terms for interaction glyphs. Second, new glyphs are added for proteins, introns, and polypeptide regions (e.g., protein domains), the prior recommended macromolecule glyph is deprecated in favor of its alternative, and small polygons are introduced as alternative glyphs for simple chemicals.


Purpose
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. At the same time, we aim to make this language simple and easy to use, allowing a high degree of flexibility and freedom in how such diagrams are organized, presented, and styled; in particular, it should be readily possible to create diagrams either by hand or using a wide variety of software programs. Finally, means are provided for extending the language with new and custom diagram elements, and for adoption of useful new elements into the language.

In order to ground SBOL Visual with precise definitions, we reference its visual elements to data models with well-defined semantics. In particular, glyphs in SBOL Visual are defined in terms of their relation to the SBOL 2 data model (as defined in BBF RFC 112) and terms in the Sequence Ontology (Eilbeck et al., 2005) and the Systems Biology Ontology (Courtot et al., 2011).

SBOL Visual is not intended to represent designs at the same level of detail as these data models. Effective visual diagrams are necessarily more abstract, focusing only on those aspects of a system that are the subject of the communication. Nevertheless, we take as a principle that it should be possible to transform any SBOL Visual diagram into an equivalent (if highly abstract) SBOL 2 data representation. Likewise, we require that SBOL Visual be able to represent all of the significant structural or functional relationships in any GenBank or SBOL data representation.

Every glyph in SBOL Visual 2.2 corresponds to an element of the SBOL 2.3 data model. SBOL Visual 2.2 also defines many terms by reference to SBOL 2.3, or by reference to the Sequence Ontology (Eilbeck et al., 2005) or the Systems Biology Ontology (Courtot et al., 2011). SBOL Visual is intended to be compatible with the Systems Biology Graphical Notation Activity Flow Language (SBGN AF) (Le Novère et al., 2009), and species and interaction glyphs have been imported from that language (see: […]).

This document indicates requirement levels using the controlled vocabulary specified in IETF RFC 2119 and reiterated in BBF RFC 0. In particular, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119:
■ The words "MUST", "REQUIRED", or "SHALL" mean that the item is an absolute requirement of the specification.
■ The phrases "MUST NOT" or "SHALL NOT" mean that the item is an absolute prohibition of the specification.
■ The word "SHOULD" or the adjective "RECOMMENDED" mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.

■ The phrases "SHOULD NOT" or "NOT RECOMMENDED" mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.

■ The word "MAY" or the adjective "OPTIONAL" mean that an item is truly optional.

A glyph is a visual symbol used to represent an element in an SBOL Visual diagram. All of the currently defined glyphs are collected in Appendix A. This section explains how glyphs are specified and how to add new glyphs.

Each SBOL glyph is defined by association with ontology terms, and can be used to represent any diagram element that is well-described by that term. Currently there are three classes of glyphs, each associated with a different ontology and a different class in the SBOL 2 data model:
■ Sequence Feature Glyphs represent features of nucleic acid sequences. They are associated with Sequence Ontology terms. For the SBOL 2 data model, this is formally defined as any Component with a compatible term within its associated roles, i.e., one that is equal to or a child of at least one term associated with the glyph.

■ Molecular Species Glyphs represent any class of molecule whose detailed structure is not being shown using sequence feature glyphs. They are associated with Systems Biology Ontology terms. For the SBOL 2 data model, this is formally defined as any FunctionalComponent with a compatible term within its associated types, i.e., one that is equal to or a child of at least one term associated with the glyph.

■ Interaction Glyphs are "arrows" indicating functional relationships between sequence features and/or molecular species. They are associated with Systems Biology Ontology terms. For the SBOL 2 data model, this is formally defined as any Interaction with a compatible term within its types, i.e., one that is equal to or a child of at least one term associated with the glyph, and with a compatible Participation at the head and tail of the arrow.
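All three glyph classes share one compatibility test: an element's term must equal at least one term associated with the glyph, or be a descendant of one. The following is a minimal Python sketch of that check over a toy is_a hierarchy. The term labels and the `PARENTS` table are hypothetical stand-ins, not the real Sequence Ontology or SBO graphs, and a real tool would query those ontologies directly. Parents are stored as tuples because an ontology term can have more than one parent.

```python
# Hypothetical toy is_a hierarchy: child term -> tuple of parent terms.
# Labels are illustrative only; this is NOT the real Sequence Ontology graph.
PARENTS = {
    "inducible_promoter": ("promoter",),
    "promoter": ("regulatory_region",),
    "regulatory_region": ("sequence_feature",),
    "CDS": ("sequence_feature",),
    "terminator": ("sequence_feature",),
}


def ancestors_or_self(term):
    """Return the set containing term and every term reachable via is_a."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def is_compatible(element_terms, glyph_terms):
    """True if any element term equals, or is a child of, any glyph term.

    element_terms: roles of a Component, types of a FunctionalComponent,
    or types of an Interaction, depending on the glyph class.
    """
    glyph_terms = set(glyph_terms)
    return any(ancestors_or_self(t) & glyph_terms for t in element_terms)


print(is_compatible(["inducible_promoter"], ["promoter"]))  # True
print(is_compatible(["promoter"], ["inducible_promoter"]))  # False
```

The test only runs one way. A glyph for a general term accepts elements described by more specific terms, but a glyph for a specific term does not accept a more general element.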

More than one glyph may share the same definition: in this case, these glyphs form a family of variants, of which precisely one MUST be designated as the RECOMMENDED glyph, which is to be used unless there are strong reasons to prefer an alternative variant.

It will also frequently be the case that a diagram element could be represented by more than one glyph (e.g., a glyph for a specific term and a glyph for a more general term […] be represented by either a pentagonal glyph or an arrow glyph, but the pentagon is the RECOMMENDED variant, and so it is likewise preferred. Figure 1 illustrates this example.
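The preference described above (use the RECOMMENDED variant of the most specific applicable glyph) can be sketched as a small selection routine. This is a minimal Python sketch under stated assumptions. The glyph names, the toy `PARENTS` hierarchy, and the `CATALOG` are hypothetical. Each glyph is given a single defining term. Specificity is approximated by depth in the hierarchy, which is adequate for a single lineage but is only a heuristic in a general ontology DAG.

```python
from dataclasses import dataclass

# Hypothetical toy is_a hierarchy (not the real Sequence Ontology).
PARENTS = {
    "CDS": ("sequence_feature",),
    "promoter": ("sequence_feature",),
}


def ancestors_or_self(term):
    """Return term plus every term reachable from it via is_a links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def depth(term):
    """Length of the longest is_a path from term up to a root (0 at a root)."""
    parents = PARENTS.get(term, ())
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@dataclass(frozen=True)
class Glyph:
    name: str          # hypothetical glyph identifier
    term: str          # single defining term per glyph (a simplification)
    recommended: bool  # True for the RECOMMENDED variant of its family


CATALOG = [
    Glyph("unspecified", "sequence_feature", True),
    Glyph("cds-pentagon", "CDS", True),
    Glyph("cds-arrow", "CDS", False),
    Glyph("promoter", "promoter", True),
]


def choose_glyph(role, catalog=CATALOG):
    """Pick the RECOMMENDED variant of the most specific applicable glyph."""
    lineage = ancestors_or_self(role)
    applicable = [g for g in catalog if g.term in lineage]
    if not applicable:
        return None  # no defined glyph applies; a novel glyph MAY be used
    best = max(depth(g.term) for g in applicable)
    family = [g for g in applicable if depth(g.term) == best]
    # Prefer the RECOMMENDED variant; otherwise fall back to any variant.
    return next((g for g in family if g.recommended), family[0])


print(choose_glyph("CDS").name)  # cds-pentagon
```

For a CDS, both the Unspecified glyph and the two CDS variants apply. The CDS family is more specific, and within that family the pentagon is RECOMMENDED, so the pentagon is chosen.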

Figure 1: A biological design element such as a protein coding sequence (CDS) is best represented by the most specific RECOMMENDED glyph (middle), but can be represented by a less specific glyph such as Unspecified (left) or an approved alternative glyph (right).

[…] SHOULD be accompanied by a request to the ontology maintainers to add a term for the undefined entity.
2. A glyph SHOULD be relatively easy to sketch by hand (e.g., no high-complexity images or precise angles required).
[…]
4. A glyph specification SHOULD show the glyph in its preferred relative scale with respect to other glyphs.
5. A glyph SHOULD be specified using only solid black lines (leaving color and style to be determined by the user, as noted below).
6. A glyph SHOULD NOT be similar enough to be easily confused with any other glyph when written by hand, or when scaled either vertically, horizontally, or both.
7. A glyph SHOULD NOT include text (note that associated labels are not part of the glyph).

In addition, some requirements apply only to certain classes of glyphs: […]
11. If a sequence feature glyph can represent components of highly variable size or structural complexity, the glyph SHOULD be able to be scaled horizontally to indicate relative property value.
[…]
3. The scale of glyphs is RECOMMENDED to be kept consistent with their specification and throughout a diagram, but can be altered if desired, particularly to convey additional information (e.g., length of a sequence).
4. Minor styling effects MAY be chosen (e.g., shadow, corner styling, other "font-level" customization) […]
Figure 3 shows some examples of acceptable style variation.

In certain special cases, the style of a glyph may be more constrained, but such cases are expected to be rare and strongly motivated.
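Requirement 11 and style rule 3 allow a sequence feature glyph to be stretched horizontally, for example to show relative sequence length. This is a minimal Python sketch of one way to compute such widths. It assumes a linear scale in pixels per base pair and a hypothetical minimum width that keeps short features recognizable. The scale factor, the minimum, and the feature lengths are illustrative values, not values from this specification.

```python
def glyph_widths(lengths_bp, px_per_bp=0.25, min_px=12.0):
    """Map feature lengths (bp) to horizontal glyph widths (px).

    Widths scale linearly with length so relative size is preserved,
    but never drop below min_px, so very short features remain legible.
    Only the horizontal dimension is computed; glyph height is untouched.
    """
    return {name: max(min_px, bp * px_per_bp) for name, bp in lengths_bp.items()}


# Illustrative lengths only.
widths = glyph_widths({"pTet": 60, "rbs": 20, "gfp": 720, "term": 80})
print(widths)  # {'pTet': 15.0, 'rbs': 12.0, 'gfp': 180.0, 'term': 20.0}
```

The clamp trades exact proportionality for legibility. When lengths span several orders of magnitude, a logarithmic scale is a reasonable alternative.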

The collection of SBOL Visual glyphs is not expected to provide complete coverage of all of the types of element that people will wish to include in genetic diagrams, particularly given the ongoing evolution of synthetic biology as an engineering discipline. As the need for new diagram elements or new practices of usage emerges, new glyphs or glyph definitions are expected to be added to SBOL Visual. In particular, the following three classes of changes are expected to occur regularly, and the SBOL development community will maintain clear processes for proposal and […]

SBOL Visual Diagram Language

An SBOL Visual diagram represents information about the structure of a nucleic acid design and its associated molecular species and interactions. If desired, an SBOL Visual diagram may also be associated with a machine-interpretable model (e.g., in SBOL, GenBank, or SBML format). In this document we describe the association for the SBOL 2 data model, which provides a formal semantic grounding for all elements of an SBOL Visual diagram, but equivalent associations may be made between diagram elements and other models. In terms of the SBOL 2 data model, the description of a nucleic acid design is formally defined as a representation of a ComponentDefinition with a nucleic acid type, the Component and SequenceAnnotation objects describing the […] shown. Examples are provided in Figure 8.

A glyph in contact with a nucleic acid backbone indicates a feature of the nucleic acid sequence. In terms of the SBOL 2 data model, this is either a SequenceAnnotation or a Component with a nucleic acid type that is contained […]
3. Nucleic acid features in a sequential relationship SHOULD be drawn from 5' left to 3' right on the inline strand and from 5' right to 3' left on the reverse complement strand. In terms of the SBOL 2 data model, this indicates a SequenceConstraint on the relative ordering of two features.
[…] overlap. An example is provided in Figure 12.
6. A nucleic acid feature SHOULD be represented using a glyph defined in Appendix A.1. In this case, the feature MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the roles for the Component or its associated ComponentDefinition. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.1 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3. Examples are provided in Figure 14.
[…] sequence of promoter, ribosome entry site, CDS, and terminator: (a) is RECOMMENDED because it uses the preferred variant of the most specific defined glyphs, (b) is allowed because it uses a novel, custom, non-conflicting symbol, not matching any glyph defined in this document, to encode more specific information about the particular CDS, (c) is recommended against because it uses less specific glyphs, and (d) is forbidden because it uses a promoter symbol to represent the terminator.
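Rule 3 fixes the drawing direction by strand. This is a minimal Python sketch that assumes each feature is given in inline-strand coordinates (1-based, inclusive) together with a strand label. The `Feature` class, the strand labels, and the example design are hypothetical and are not part of any SBOL library. The reverse complement strand runs 5' on the right to 3' on the left, so its 5'-to-3' order is the reverse of its left-to-right order, and its glyphs point left.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int   # 1-based, inclusive, in inline-strand coordinates
    end: int
    strand: str  # "inline" or "reverse" (hypothetical labels)


def layout(features, px_per_bp=1.0):
    """Return (name, x_left, x_right, pointing) tuples sorted left to right."""
    placed = []
    for f in features:
        x_left = (f.start - 1) * px_per_bp
        x_right = f.end * px_per_bp
        # Inline features read 5'->3' left to right; reverse ones right to left.
        pointing = "right" if f.strand == "inline" else "left"
        placed.append((f.name, x_left, x_right, pointing))
    return sorted(placed, key=lambda p: p[1])


def five_to_three_order(features, strand):
    """Names of features on one strand, ordered 5' to 3' along that strand."""
    on_strand = [f for f in features if f.strand == strand]
    return [f.name for f in sorted(on_strand, key=lambda f: f.start,
                                   reverse=(strand == "reverse"))]


design = [
    Feature("pTet", 1, 60, "inline"),
    Feature("gfp", 81, 800, "inline"),
    Feature("rfp", 901, 1620, "reverse"),
    Feature("pLac", 1621, 1680, "reverse"),
]
print(five_to_three_order(design, "reverse"))  # ['pLac', 'rfp']
```

On the reverse strand, pLac precedes rfp in the 5'-to-3' sense even though it is drawn to the right of rfp. A SequenceConstraint between the two would describe exactly this relationship.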

Molecular Species
A glyph that is not in contact with any backbone represents any class of molecule whose detailed structure is not being shown using sequence feature glyphs. In other words, it is either not a nucleic acid (e.g., proteins, small molecules) or else an "uninteresting" nucleic acid (e.g., showing a transcribed mRNA, but not the features of its sequence).

In terms of the SBOL 2 data model, this is a FunctionalComponent that is contained within a ModuleDefinition implicit in the diagram.
1. A molecular species glyph MUST NOT contact any nucleic acid backbone with any part of its bounding box.
2. A molecular species SHOULD be represented using a glyph defined in Appendix A.2. In this case, the species MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the associated ComponentDefinition.

Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Note that novel glyphs not defined in Appendix A.2 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.
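Rule 1 for molecular species is a purely geometric test. This is a minimal Python sketch that assumes axis-aligned bounding boxes in SVG-style coordinates (y grows downward) and backbones drawn as horizontal lines with a given stroke width. It treats boxes that merely share an edge as being in contact. `Box`, `backbone_box`, `species_placement_ok`, and the stroke default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge (y grows downward, as in SVG)
    w: float  # width
    h: float  # height

    def touches(self, other):
        """True if the two boxes overlap or merely share an edge."""
        return (self.x <= other.x + other.w and other.x <= self.x + self.w and
                self.y <= other.y + other.h and other.y <= self.y + self.h)


def backbone_box(x_start, x_end, y, stroke=2.0):
    """Bounding box of a horizontal backbone line of the given stroke width."""
    return Box(x_start, y - stroke / 2, x_end - x_start, stroke)


def species_placement_ok(species_box, backbones):
    """Rule 1: a molecular species glyph must not contact any backbone."""
    return not any(species_box.touches(b) for b in backbones)


backbone = backbone_box(0, 300, y=100)
print(species_placement_ok(Box(50, 60, 20, 20), [backbone]))  # True
print(species_placement_ok(Box(50, 85, 20, 20), [backbone]))  # False
```

Using `<=` rather than `<` counts edge-to-edge contact as a violation, which matches the "any part of its bounding box" wording.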

A directed edge "arrow" attached to one or more glyphs indicates a functional interaction involving those elements.

The roles of the elements are indicated by their position at the head or tail of the edge. In terms of the SBOL 2 data model, this is an Interaction, with either one or two Participation relationships, their roles set by position at the head or tail of the edge. An example is provided in Figure 15.
1. Two interaction edges SHOULD NOT cross one another. When edges cross, they MUST indicate the distinction between arrows with a crossover pattern, in which one edge "diverts" at the intersection (see Figure 16).

Examples are provided in Figure 17.
Figure […]: Examples of recommended, allowed, and forbidden relationships between two interactions in a mutual repression system: (a) non-crossing is recommended, (b) using a crossover pattern is allowed, but (c) crossing without a crossover pattern is forbidden, since the relationship between the two edges is ambiguous.
2. An interaction SHOULD be represented using a glyph defined in Appendix A.3. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge.
Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph.

Note that novel glyphs not defined in Appendix A.3 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.

3. An edge may have multiple heads or multiple tails. In this case, a split or join in an edge represents either multiple participants with the same role (e.g., a transcription factor repressing two instances of a promoter) or a biochemical process (e.g., association of an inducible protein and a small molecule to form an active complex). An edge with multiple heads MUST use the same glyph for each head. An edge that splits or joins with no glyph at the junction represents multiple participants with the same role. A glyph at the point where an edge splits or joins represents a biochemical process, i.e., an additional Interaction with type and roles set by the process glyph. Examples are provided in Figure 18.
4. A biochemical process represented by a glyph at an edge junction SHOULD be represented using a glyph defined in Appendix A.4. In this case, the interaction type MUST be contained within at least one of the glyph's associated terms. In terms of the SBOL 2 data model, this means the glyph is equal to or a parent of at least one of the types for the Interaction, and that each associated Participation object has a role compatible with its position on the head or tail of the edge. Moreover, the glyph used SHOULD be the RECOMMENDED variant of the most specific applicable glyph. Note that novel glyphs not defined in Appendix A.4 MAY be used, but SHOULD be proposed for adoption as described in Section 4.3.
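Items 3 and 4 together amount to a small translation rule from a drawn edge to Interaction objects. This is a minimal Python sketch of that rule, and it covers joins only, where several tails meet at one junction. The role labels such as "reactant" and "product" are plain-text stand-ins for the SBO terms an SBOL 2 document would actually use. The `Interaction` class and the `edge_to_interactions` function are hypothetical and are not part of any SBOL library. Because the edge carries a single `edge_type`, every head necessarily uses the same glyph, as item 3 requires.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    type: str  # plain-text label standing in for an SBO interaction term
    participants: list = field(default_factory=list)  # (element, role) pairs


def edge_to_interactions(edge_type, tails, heads, tail_role, head_role,
                         junction_process=None, junction_product=None):
    """Translate one drawn edge (with an optional join glyph) into Interactions.

    - No junction glyph: all tails share tail_role and all heads share
      head_role within a single Interaction.
    - Junction glyph: the tails are consumed by an extra process Interaction
      whose product then participates in the main Interaction.
    """
    if junction_process is None:
        main = Interaction(edge_type,
                           [(t, tail_role) for t in tails] +
                           [(h, head_role) for h in heads])
        return [main]
    process = Interaction(junction_process,
                          [(t, "reactant") for t in tails] +
                          [(junction_product, "product")])
    main = Interaction(edge_type,
                       [(junction_product, tail_role)] +
                       [(h, head_role) for h in heads])
    return [process, main]


# Plain split: one repressor acting on two copies of a promoter.
split = edge_to_interactions("inhibition", ["TetR"], ["pTet_1", "pTet_2"],
                             "inhibitor", "inhibited")
# Join with a process glyph: protein + small molecule form an active complex.
joined = edge_to_interactions("stimulation", ["AraC", "arabinose"], ["pBAD"],
                              "stimulator", "stimulated",
                              junction_process="association",
                              junction_product="AraC:arabinose")
print(len(split), len(joined))  # 1 2
```

A split that passes through a process glyph, such as dissociation, would mirror the join case by applying the process to the heads. That case is omitted here to keep the sketch short.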

A module within a system MAY be represented by a visual boundary in the form of a closed polygon or closed curve. […]
Figure […]: Examples of recommended and problematic module boundaries: (a) two modules with visually distinct rectangular borders, (b) the same modules but with rounded rectangles, the second being a "black box" module with no internal structure shown, (c) modules with non-rectilinear borders, and (d) a black-box module that is not visually distinct from a sequence feature glyph.
2. An undirected edge (i.e., having no "arrow head") that crosses the boundary of a module represents a mapping associating the diagram elements that it links. Glyphs associated by a mapping MUST either be sequence features, molecular species, or module ports (see below), and must be of compatible types. In terms of […]

The name of any object in a diagram is RECOMMENDED to be displayed as text within, adjacent to, or otherwise clearly visually connected to the object's associated glyph. In terms of the SBOL 2 data model, this is the name property, and if no name is supplied then the displayId MAY be used instead. Examples are provided in Figure 23.
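The labeling rule reduces to a simple fallback from the name property to the displayId. This is a minimal Python sketch, and `label_for` is a hypothetical helper. Treating a blank or whitespace-only string the same as a missing name is an assumption that goes beyond the text above.

```python
from typing import Optional


def label_for(name: Optional[str], display_id: str) -> str:
    """Text label for a glyph: the SBOL name if supplied, else the displayId.

    A name that is None, empty, or only whitespace is treated as "not
    supplied" (an assumption; the specification only says "no name").
    """
    if name is not None and name.strip():
        return name
    return display_id


print(label_for("pTet promoter", "pTet"))  # pTet promoter
print(label_for(None, "pTet"))             # pTet
```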

Other text or graphics may be included as annotations with no constraint on their syntax or semantics.

Recommended Glyph and Alternates
The ribosome entry site glyph is a half-ovoid sitting on the backbone, suggesting an attached ribosome beginning translation: […]
Figure 25: The same functional unit as in Figure 24, with additional assembly-focused information: there is a 5' overhang before the promoter, a 3' overhang after the terminator, and an assembly scar between the promoter and the ribosome entry site left over from a prior step of assembly.
Figure 26: Promoter pTet stored in a circular plasmid. The promoter is prepared for being cut out of the plasmid: it is preceded by a 5' sticky end restriction site and followed by a 3' sticky end restriction site. In addition, the plasmid has been bar-coded with a signature and has its origin of replication marked.
Figure 27: Promoter stored in a plasmid as in Figure 26, except that the restriction sites before and after the promoter are blunt-end.
[…] Figure 26, except that the cut structure of the restriction sites before and after the promoter is not specified.

Prototypical Example
Figure 29: Promoter stored in a plasmid as in Figure 26, except that there is a ribonuclease site after the promoter rather than restriction sites flanking it.

Section B. Examples
Figure 34: Promoter regulating the expression of GFP, which is also regulated by an aptamer between it and the poly-A tail of the transcript. The promoter can be cut out by a pair of recombinase target sites, which are acted on by the Flp protein.
The whole construct is stored in a circular plasmid with an origin of replication and also an origin of transfer.

[…]
■ Glyphs include information on interior, bounding box, and recommended backbone alignment.
■ Sequence feature glyphs are required to have their bounding boxes contact the nucleic acid backbone.
■ Nucleic acid diagrams now require the nucleic acid backbone line, and the number of lines allowed in various circumstances is constrained.

■ Explicit statement of when a glyph can and cannot be used to represent a particular element of a diagram.

■ Labels that name objects are distinguished from other types of textual annotation.

■ Explicit statement of which aspects of a symbol are not controlled.

■ Symbol variants are now supported.
In addition, the collection of sequence feature glyphs has been expanded and modified in the following ways: […]

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (http://creativecommons.org/licenses/by-nc-nd/3.0/).