Synthetic biology open language (SBOL) version 3.0.0

Abstract

Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both wet bench scientists and dry scientific modelers and software developers, across academia, industry, and other institutions. This document describes SBOL 3.0.0, which condenses and simplifies previous versions of SBOL based on experiences in deployment across a variety of scientific and industrial settings. In particular, SBOL 3.0.0 (1) separates sequence features from part/sub-part relationships, (2) renames Component Definition/Component to Component/Sub-Component, (3) merges Component and Module classes, (4) ensures consistency between data model and ontology terms, (5) extends the means to define and reference Sub-Components, (6) refines requirements on object URIs, (7) enables graph-based serialization, (8) moves to the Systems Biology Ontology (SBO) for Component types, (9) makes all sequence associations explicit, (10) makes interfaces explicit, (11) generalizes Sequence Constraints into a general structural Constraint class, and (12) expands the set of allowed constraints.


Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. Furthermore, there are often multiple aspects to a design such as a specified nucleic acid sequence (e.g., a sequence that encodes an enzyme or transcription factor), the molecular interactions that a designer intends to result from the introduction of this sequence (e.g., chemical modification of metabolites or regulation of gene expression), and the experiments and data associated with the system. All these perspectives need to be connected together to facilitate the engineering of biological systems.

The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both "wet" bench scientists and "dry" scientific modelers and software developers, across academia, industry, and other institutions. Previous nucleic acid sequence description formats lack key capabilities relative to SBOL, as shown in Figure 1. Simple sequence encoding formats such as FASTA encode little besides sequence information. More sophisticated formats such as GenBank and Swiss-Prot provide a flat annotation of sequence features that is well suited to describing natural systems but unable to represent the functional relations and multi-layered design structure common to engineered systems. Modeling languages, such as the Systems Biology Markup Language (SBML) (Hucka et al., 2003), can be used to represent biological processes, but are not sufficient to represent the associated nucleotide or amino acid sequences. SBOL covers both of these needs by providing a modular and hierarchical representation of the structure and function of a genetic design, as well as its relationship to and use within experiment plans, data, models, etc.

SBOL uses existing Semantic Web practices and resources, such as Uniform Resource Identifiers (URIs) and ontologies, to unambiguously identify and define biological system elements, and to provide serialization formats for encoding this information in electronic data files. The SBOL standard further describes the rules and best practices on how to use this data model and populate it with relevant design details. The definition of the data model, the rules on the addition of data within the format, and the representation of this in electronic data files are intended to make the SBOL standard a useful means of promoting data exchange between laboratories and between software programs.

The SBOL effort was started in 2006 with the goal of developing a data exchange standard for genetic designs. Herbert Sauro (University of Washington) secured a grant from Microsoft in the field of computational synthetic biology, which was used to fund the initial meeting in Seattle on April 26-27, 2008. This workshop was organized [...] Toolchains, and Zymergen.

Discussions related to SBOL 3 began at the COMBINE meetings and on the mailing list in the summer of 2018. Over the next year and a half, several SBOL Enhancement Proposals (SEPs) were written and discussed. During the early months of 2020, these SEPs were voted on and approved by the SBOL community. The initial version of the SBOL 3 specification was drafted during HARMONY 2020 at the European Bioinformatics Institute (EBI) in Hinxton, United Kingdom in March 2020.

The authors would also like to thank Michael Hucka for developing the LaTeX style file used to develop this document (Hucka, 2017).

Synthetic biology designs can be described using:
■ Structural terms, e.g., a set of annotated sequences or information about the chemical makeup of components.
■ Functional terms, e.g., the way that components might interact with each other.

As an example, consider an expression cassette, such as the one found in the plasmid pUC18 (Norrander et al., 1983). The system is designed to visually indicate whether a gene has been inserted into the plasmid: in the presence of IPTG, it expresses an enzyme that hydrolyses X-gal to form a blue product, but successful insertion disrupts the expression cassette and prevents the formation of this product. Internally, it has a number of parts, including a promoter, the lac repressor binding site, and the lacZ coding sequence. These parts have specific component-level interactions with IPTG and X-gal, as well as native host gene products, transcriptional machinery, and translational machinery that collectively cause the desired system-level behavior.

In SBOL 3, both the structural and functional aspects are described using a class called Component, as depicted in Figure 2. Namely, to represent structural aspects, a Component can include Features, some of which may be at some [...] Participations, such as how IPTG and X-gal interact with the gene products. Finally, a Component object can point to a Model object that provides a reference to a complete computational model expressed in a language such as SBML (Hucka et al., 2003), CellML (Cuellar et al., 2003), or MATLAB (MathWorks, 2015).

Whereas Figure 2 provides an overview of the classes used for describing designs within the SBOL 3 data model, Figure 3 shows the rest of the classes used to describe the usage of a design within design-build-test-learn workflows in general. In particular, designs can be expressed using CombinatorialDerivations, Components, and Sequences. These can describe not only genetic designs, but also designs for strains, multicellular systems, media, samples, [...]

Figure 3: Main classes of information represented by the SBOL 3 standard, and their relationships. Green boxes represent design classes, orange boxes represent build classes, purple boxes represent test classes, yellow boxes represent learn classes, and the gray boxes represent additional utility classes. Each of these classes will be described in more detail below.

This section provides some preliminary information to aid in the understanding of the specification. The SBOL data model is specified using Unified Modeling Language (UML) 2.0 diagrams (OMG 2005). This section reviews terminology conventions, the basics of UML diagrams, and our naming conventions.

This document indicates requirement levels using the controlled vocabulary specified in IETF RFC 2119. In particular, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
■ The words "MUST", "REQUIRED", or "SHALL" mean that the item is an absolute requirement.
■ The phrases "MUST NOT" or "SHALL NOT" mean that the item is an absolute prohibition.
■ The word "SHOULD" or the adjective "RECOMMENDED" mean that there might exist valid reasons in particular circumstances to ignore a particular item, but the full implications need to be understood and carefully weighed before choosing a different course.
■ The phrases "SHOULD NOT" or "NOT RECOMMENDED" mean that there might exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications need to be understood and the case carefully weighed before implementing any behavior described with this label.
■ The word "MAY" or the adjective "OPTIONAL" mean that an item is truly optional.

The types of biological design data modeled by SBOL are commonly referred to as classes, especially when discussing the details of software implementation. Each SBOL class can be instantiated by many SBOL objects. These objects MAY contain data that differ in content, but they MUST agree on the type and form of their data as dictated by their common class. Classes are represented in UML diagrams as rectangles labeled at the top with class names (see Figure 4 for examples). [...] property can possess (see below). The remaining (non-association) properties of a class are listed below its name. Each of the latter properties is labeled with its data type and cardinality.

In the case of an association property, the class from which the arrow originates is the owner of the association property. A diamond at the origin of the arrow indicates the type of association. Open-faced diamonds indicate shared aggregation, also known as a reference, in which the owner of the association property exists independently of its value.

By contrast, filled diamonds indicate composite aggregation, also known as a part-whole relationship, in which the value of the association property MUST NOT exist independently of its owner. In addition, in the SBOL data model, it is REQUIRED that the value of each composite aggregation property is a unique SBOL object (that is, not the value for more than one such property). Note that in all cases, composite aggregation is used in such a way that there SHOULD NOT be duplication of such objects. Such objects are also commonly referred to as "child" objects, and their owning objects as "parent" objects.

All SBOL properties are labeled with one of several restrictions on data cardinality. These are: [...]

SBOL classes are named using upper "camel case," meaning that each word is capitalized and all words are run together without spaces, e.g., Identified, SequenceFeature. Properties, on the other hand, are named using lower camel case, meaning that they begin lowercase (e.g., role) but if they consist of multiple words, all words after the first begin with an uppercase letter (e.g., roleIntegration). SBOL properties are always given singular names irrespective of their cardinality, e.g., role is used rather than roles even though a component can have multiple roles. This is because each relation can potentially stand on its own, irrespective of the existence of others in the set.

[...] The term literal is used to denote an object that can be any of the five types listed above.

Section 5. Identifiers and Primitive Types
In addition to the simple types listed above, SBOL also uses objects with type Uniform Resource Identifier (URI). It is important to realize that in RDF, a URI might or might not be a resolvable URL (web address). A URI is always a globally unique identifier within a structured namespace. In some cases, that name is also a reference to (or within) a document, and in some cases that document can also be retrieved (e.g., using a web browser).

This section describes the SBOL data model in detail. Best practices when using the standard can be found in Section 7.

All SBOL-defined classes are directly or indirectly derived from the Identified abstract class. This inheritance means that all SBOL objects are uniquely identified using URIs that uniquely refer to these objects within an SBOL document or at locations on the World Wide Web.

As shown in Figure 5, the Identified class includes the following properties: displayId, name, description, [...]

The displayId property is an OPTIONAL identifier with a data type of String. This property is intended to be an intermediate between a URI and the name property that is machine-readable, but more human-readable than the full URI of an object.

If the displayId property is used, then its String value MUST be composed of only alphanumeric or underscore characters and MUST NOT begin with a digit.

Note that for objects whose URI is a URL, the requirements on URL structure in Section 5.1 imply that the displayId MUST be set.
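The displayId rule above lends itself to a mechanical check. The following is a minimal sketch, assuming that "alphanumeric" means ASCII letters and digits; the function name is hypothetical and is not part of any SBOL library.

```python
import re

# Rule sketched here: only letters, digits, or underscores, and the
# first character must not be a digit. Restricting "alphanumeric" to
# ASCII is an assumption of this sketch.
_DISPLAY_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_display_id(value: str) -> bool:
    """Return True if value satisfies the displayId character rules."""
    return _DISPLAY_ID.fullmatch(value) is not None
```

For instance, `is_valid_display_id("pLac_1")` is accepted, while `"1abc"` (leading digit) and `"a-b"` (hyphen) are rejected.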

The name property

The name property is OPTIONAL and has a data type of String. This property is intended to be displayed to a human when visualizing an Identified object.

If an Identified object lacks a name, then software tools SHOULD instead display the object's displayId or URI.

It is RECOMMENDED that software tools give users the ability to switch perspectives between name properties that are human-readable and displayId properties that are less human-readable, but are more likely to be unique.

The description property

The description property is OPTIONAL and has a data type of String. This property is intended to contain a more thorough text description of an Identified object.

The prov:wasDerivedFrom property

An Identified object can have zero or more prov:wasDerivedFrom properties, each of type URI. This property is defined by the PROV-O ontology and is located in the https://www.w3.org/TR/prov-o/ namespace (Reference: Section A.1).

An SBOL object with this property refers to one or more SBOL objects or non-SBOL resources from which this object was derived. An SBOL object MUST NOT refer to itself via its own prov:wasDerivedFrom property or form a cyclical chain of references via its prov:wasDerivedFrom property and those of other SBOL objects. For example, the reference chain "A was derived from B and B was derived from A" is cyclical.
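The acyclicity requirement on prov:wasDerivedFrom can be checked with a depth-first search. The sketch below assumes the references have already been collected into a plain dictionary that maps each object URI to the URIs it was derived from; that dictionary and the function name are illustrative and not part of the SBOL data model.

```python
from typing import Dict, List


def has_derivation_cycle(derived_from: Dict[str, List[str]]) -> bool:
    """Return True if the wasDerivedFrom references contain a cycle.

    URIs that appear only as values (for example non-SBOL resources)
    are treated as leaves with no outgoing references.
    """
    IN_PROGRESS, DONE = 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for source in derived_from.get(node, []):
            if state.get(source) == IN_PROGRESS:
                return True  # reached a node that is still on the current path
            if source not in state and visit(source):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in derived_from)
```

With the example from the text, `has_derivation_cycle({"A": ["B"], "B": ["A"]})` reports a cycle, as does a self-reference such as `{"A": ["A"]}`. The recursion is adequate for short derivation chains; very deep chains would call for an iterative traversal.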

The prov:wasGeneratedBy property

An Identified object can have zero or more prov:wasGeneratedBy properties, each of type URI. This property is defined by the PROV-O ontology and is located in the https://www.w3.org/TR/prov-o/ namespace (Reference: Section A.1).

An SBOL object with this property refers to one or more prov:Activity objects that describe how this object was generated. Provenance history formed by prov:wasGeneratedBy properties of Identified objects and entity references in prov:Usage objects MUST NOT form circular reference chains.

The hasMeasure property

An Identified object can have zero or more hasMeasure properties, each of type URI. This property is defined by the OM ontology and is located in the http://www.ontology-of-units-of-measure.org/resource/om-2/ namespace (Reference: Section A.2).

An SBOL object with this property refers to one or more om:Measure objects that describe measured parameters for this object.

TopLevel is an abstract class that is extended by any Identified class that can be found at the top level of an SBOL [...] Component, Model, Collection, CombinatorialDerivation, Implementation, Attachment, ExperimentalData, prov:Activity, prov:Agent, prov:Plan (see Figure 6). Each of these classes is described in more detail below, except for the classes from the provenance ontology (PROV-O), which are described in Section A.1.

The hasAttachment property

A TopLevel object can have zero or more hasAttachment properties, each of type URI specifying an Attachment object. The Attachment class is described in more detail in Section 6.10.

The purpose of the Sequence class is to represent the primary structure of a Component object and the manner in which it is encoded. This representation is accomplished by means of the elements property and encoding property (Figure 7).

The elements property

The elements property is an OPTIONAL String of characters that represents the constituents of a biological or chemical molecule. For example, these characters could represent the nucleotide bases of a molecule of DNA, the amino acid residues of a protein, or the atoms and chemical bonds of a small molecule.

If the elements property is not set, then it means the particulars of this Sequence have not yet been determined.

The encoding property has a data type of URI, and is OPTIONAL unless elements is set, in which case it is REQUIRED. This property MUST indicate how the elements property of a Sequence is formed and interpreted.

For example, the elements property of a Sequence with an IUPAC DNA encoding property MUST contain characters that represent nucleotide bases, such as a, t, c, and g. The elements property of a Sequence with a Simplified Molecular-Input Line-Entry System (SMILES) encoding, on the other hand, MUST contain characters that represent atoms and chemical bonds, such as C, N, O, and =.
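The two rules above (encoding becomes REQUIRED once elements is set, and elements must match its encoding) can be sketched as a small validator. This is an illustration only: the label "iupac_dna" is a hypothetical stand-in for the actual encoding URI, the function name is invented, and accepting the IUPAC ambiguity codes case-insensitively is an assumption of the sketch.

```python
from typing import List, Optional

# IUPAC nucleotide codes for DNA, including ambiguity codes.
IUPAC_DNA = set("acgtrykmswbdhvn")


def check_sequence(elements: Optional[str], encoding: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    if elements is None:
        # Sequence not yet determined, so encoding remains optional.
        return problems
    if encoding is None:
        problems.append("encoding is REQUIRED when elements is set")
    elif encoding == "iupac_dna":  # hypothetical label, not a real URI
        bad = sorted(set(elements.lower()) - IUPAC_DNA)
        if bad:
            problems.append("non-IUPAC DNA characters: " + "".join(bad))
    return problems
```

Other encodings, such as SMILES, would need their own parsers and are deliberately left unchecked here.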

Table 1 provides a list of possible URI values for the encoding property. The terms in Table 1 are organized by the type of Component (see Table 2) that typically refers to a Sequence with such an encoding. It is RECOMMENDED that the encoding property of a Sequence contains a URI from [...]

Table 1: URIs for specifying the encoding property of a Sequence, organized by the type of Component (see Table 2) that typically refers to a Sequence with such an encoding.

The Component class represents the structural and/or functional entities of a biological design. The primary usage of this class is to represent entities with designed sequences, such as DNA, RNA, and proteins, but it can also be used to represent any other entity that is part of a design, such as simple chemicals, molecular complexes, strains, media, light, and abstract functional groupings of other entities.

As shown in Figure 8, the Component class describes a design entity using the following properties: type, role, hasSequence, hasFeature, hasConstraint, hasInteraction, hasInterface, and hasModel. The hasSequence, hasFeature, and hasConstraint properties are used to represent structural information, while the hasInteraction, hasInterface, and hasModel properties are used to represent functional information.

The type property

A Component is REQUIRED to have one or more type properties, each of type URI specifying the category of biochemical or physical entity (for example DNA, protein, or simple chemical) that a Component object abstracts for the purpose of engineering design. For DNA or RNA entities, additional type properties MAY be used to describe nucleic acid topology (circular / linear) and strandedness (double- or single-stranded).

The type properties of every Component MUST include one or more URIs that MUST identify terms from appropriate ontologies, such as the physical entity representation branch of the Systems Biology Ontology (Courtot et al., 2011) or the ontology of Chemical Entities of Biological Interest (ChEBI) (Degtyarenko et al., 2008). In order to maximize the compatibility of designs, the type property of a Component SHOULD contain a URI from the physical entity representation branch of the Systems Biology Ontology (Courtot et al., 2011). Table 2 provides a partial list of ontology terms and their URIs, and any Component that can be well-described by one of the terms in Table 2 [...]

Table 2: Partial list of the most common SBO terms to specify the molecule type using the type property of a Component. Systems of multiple interacting molecules (e.g., a plasmid expressing a protein) should use the functional entity type.

Nucleic Acid Topology types

Any Component classified as DNA (see Table 2) is RECOMMENDED to encode circular/linear topology information in an additional type field. This (topology) type field SHOULD specify a URI from the Topology Attribute branch of the Sequence Ontology (SO): this is currently just 'linear' or 'circular' as given in Table 3 [...] or if topology is genuinely unknown. For any Component classified as RNA (see Table 2), a topology type field is OPTIONAL. The default assumption in this case is linear topology. In any case, conflicting topologies MUST NOT be specified.

Any Component classified as DNA or RNA MAY also have strand information encoded in an additional (third) type field using a URI from the Strand Attribute branch of the SO (currently there are only two possible terms for single or double-stranded nucleic acids, given in Table 3). In absence of this field, the default strand information assumed for DNA is 'double-stranded' and for RNA is 'single-stranded'.

Any other type of Component record (protein, simple chemical, etc.) SHOULD NOT have any type field pointing to SO terms from the topology or strand attribute branches of SO.
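The topology and strand defaults described above can be summarized in code. The following sketch uses plain labels ("DNA", "linear", "double-stranded", and so on) as hypothetical stand-ins for the SBO and SO term URIs, and the function name is invented for illustration. Only the conflicting-topology case is treated as a hard error, since it is the only MUST NOT among these rules; the RECOMMENDED and SHOULD NOT cases are reported as warnings.

```python
from typing import Dict, List, Optional, Set

TOPOLOGIES = {"linear", "circular"}
STRANDS = {"single-stranded", "double-stranded"}
DEFAULT_STRAND = {"DNA": "double-stranded", "RNA": "single-stranded"}


def interpret_types(molecule: str, extra_types: Set[str]) -> Dict[str, object]:
    """Resolve topology and strand for a Component, applying the defaults."""
    topologies = extra_types & TOPOLOGIES
    strands = extra_types & STRANDS
    warnings: List[str] = []
    if len(topologies) > 1:
        raise ValueError("conflicting topologies MUST NOT be specified")
    if molecule not in DEFAULT_STRAND:
        if topologies or strands:
            warnings.append("topology/strand types SHOULD NOT be used for " + molecule)
        return {"topology": None, "strand": None, "warnings": warnings}
    topology: Optional[str] = next(iter(topologies), None)
    if topology is None and molecule == "RNA":
        topology = "linear"  # default assumption for RNA
    if topology is None:
        warnings.append("DNA is RECOMMENDED to state its topology")
    # The text does not address two strand terms given at once; this
    # sketch simply falls back to the default in that case.
    strand = next(iter(strands)) if len(strands) == 1 else DEFAULT_STRAND[molecule]
    return {"topology": topology, "strand": strand, "warnings": warnings}
```

For example, an RNA Component with no extra type fields resolves to linear and single-stranded, while a DNA Component with no extra type fields resolves to double-stranded with an unspecified topology and a warning.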

Note that a circular topology instructs software to interpret the beginning / end position of a given sequence ( [...] Component could contain URIs identifying terms from the Sequence Ontology (SO). As a best practice, a DNA or RNA Component SHOULD contain exactly one URI that refers to a term from the sequence feature branch of the SO. Similarly, the role properties of a protein and simple chemical Component SHOULD respectively contain URIs identifying terms from the MolecularFunction (GO:0003674) branch of the Gene Ontology (GO) and the role (CHEBI:50906) branch of the ChEBI ontology. Table 4 contains a partial list of possible ontology terms for the role properties and their URIs. These terms are organized by the type of Component to which they SHOULD apply (see Table 2). Any Component that can be well-described by one of the terms in Table 4 MUST use the URI for that term as a role.

These URIs might identify descriptive biological roles, such as "metabolic pathway" and "signaling cascade," but they can also identify "logical" roles, such as "inverter" or "AND gate", or other abstract roles for describing the function of a design. Interpretation of the meaning of such roles currently depends on the software tools that read and write them.

Table 4: Partial list of ontology terms to specify the role property of a Component, organized by the type of Component to which they are intended to apply (see Table 2).

The hasSequence property

A Component MAY have any number of hasSequence properties, each of type URI, that MUST reference a Sequence object (see Section 6.3). These objects define the primary structure or structures of the Component.

If a Feature of a Component refers to a Location, and this Location refers to a Sequence, then the Component MUST also include a hasSequence property that refers to this Sequence.

Many Component objects will have exactly one hasSequence property that refers to a Sequence object. In this case, if it has a type from Table 2 and there is an encoding that is cross-listed with this term in [...] object (see Section 6.4.1). The set of relations between Feature and Component objects MUST be strictly acyclic.

Taking the Component class as analogous to a blueprint or specification sheet for a biological part or a system of interacting biological elements, the Feature class represents the specific occurrence of a part, subsystem, or other notable aspect within that design. This mechanism also allows a biological design to include multiple instances of a particular part (defined by reference to the same Component). For example, the Component of a polycistronic gene could contain two SubComponent objects that refer to the same Component of a CDS. As another example, consider the Component for a network of two-input repressor devices in which the particular repressors have not yet been chosen. This Component could contain multiple SubComponent objects that refer to the same Component of an abstract two-input repressor device.

The hasFeature properties of Component objects can be used to construct a hierarchy of SubComponent and Component objects. If a Component in such a hierarchy refers to a Location object, and there exists a Component object lower in the hierarchy that refers to a Location object that refers to the same Sequence with the same encoding, then the elements properties of these Sequence objects SHOULD be consistent with each other, such that well-defined mappings exist from the "lower level" elements to the "higher level" elements in accordance with their shared encoding properties. This mapping is also subject to any restrictions on the positions of the Feature objects in the hierarchy that are imposed by the SubComponent, SequenceFeature, or Constraint objects contained by the Component objects in the hierarchy.

For example, in a plasmid Component with a promoter SubComponent, the sequence at the promoter's Location within the plasmid should be the sequence for the promoter. More concretely, consider a DNA Component that refers to a Sequence with an IUPAC DNA encoding and an elements String of "gattaca". In turn, this Component could contain a SubComponent that refers to a "lower level" Component that also refers to a Sequence with an IUPAC DNA encoding. Consequently, a consistent elements String of this "lower level" Sequence could be "gatta", or perhaps "tgta" if the SubComponent is positioned by a Location with an orientation of "reverse complement" (see Section 6.4.2).
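The "gattaca" example above can be reproduced in a few lines. This is a minimal sketch that assumes unambiguous lowercase IUPAC DNA and 1-based inclusive positions; the specific positions used below (1 to 5 for "gatta", 4 to 7 for "tgta") are assumptions chosen to match the example and are not stated in the text, and the function names are illustrative.

```python
# Complement table for unambiguous lowercase DNA bases only.
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]


def is_consistent(parent: str, child: str, start: int, end: int,
                  reverse: bool = False) -> bool:
    """Check that child equals the parent region at [start, end] (1-based, inclusive)."""
    region = parent[start - 1:end]
    if reverse:
        region = reverse_complement(region)
    return region == child
```

Here `is_consistent("gattaca", "gatta", 1, 5)` holds for the inline orientation, and `is_consistent("gattaca", "tgta", 4, 7, reverse=True)` holds because positions 4 to 7 read "taca", whose reverse complement is "tgta".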

21
The hasConstraint property

22
A Component MAY have any number of hasConstraint properties, each of type URI, that MUST reference a 23 Constraint object (see Section 6.4.3). These objects describe, among other things, any restrictions on the relative, 24 sequence-based positions and/or orientations of the Feature objects contained by the Component, as well as spatial 25 relations such as containment and identity relations. For example, the Component of a gene might specify that 26 the position of its promoter SubComponent precedes that of its CDS SubComponent. This is particularly useful 27 when a Component lacks a Sequence and therefore cannot specify the precise, sequence-based positions of its 28 SubComponent objects using Location objects.

29
The hasInteraction property 30 A Component MAY have any number of hasInteraction properties, each of type URI, that MUST reference an 31 Interaction object (see Section 6.4.4).

32
The Interaction class provides an abstract, machine-readable representation of behavior within a Component

33
(whereas a more detailed model of the system might not be suited to machine reasoning, depending on its im-34 plementation). Each Interaction contains Participation objects that indicate the roles of the Feature objects 35 involved in the Interaction.

36
The hasInterface property 37 A Component MAY have zero or one hasInterface property of type URI that MUST reference an Interface object 38 (see Section 6.4.5).

39
An Interface object indicates the inputs, outputs, and non-directional points of connection to a Component.

40
The hasModel property 41 A Component MAY have any number of hasModel properties, each of type URI, that MUST reference a Model object 42 (see Section 6.8).

43
object can link to more than one Model since each might encode system behavior in a different way or at a different 1 level of detail. Feature in the context of its parent Component. If the role for a SubComponent is left unspecified, then the role is 8 determined by the role property of the Component that it is an instanceOf. If provided, these role property URIs 9 MUST identify terms from appropriate ontologies. Roles are not restricted to describing biological function; they 10 may annotate a Feature's function in any domain for which an ontology exists. A table of recommended ontology 11 terms for role is given in Table 4.

It is RECOMMENDED that these role property URIs identify terms that are compatible with the type properties of the Feature's parent Component. For example, a role of a Feature which belongs to a Component of type DNA might refer to terms from the Sequence Ontology. Likewise, for any feature that is a SubComponent, the role SHOULD be compatible with the type of the Component that it links to through its instanceOf property.

The orientation property

The orientation property is OPTIONAL and has a data type of URI. This can be used to indicate how any associated double-stranded Feature is oriented on the elements of a Sequence from their parent Component. [...]

http://sbols.org/v2#reverseComplement: The region specified by this Feature or Location is on the reverse-complement mapping of the elements of a Sequence. The exact nature of this mapping depends on the encoding of the Sequence.

[...] A roleIntegration specifies the relationship between a SubComponent instance's own set of role properties and the set of role properties on the included Component.

The roleIntegration property has a data type of URI. A SubComponent instance with zero role properties MAY OPTIONALLY specify a roleIntegration. A SubComponent instance with one or more role properties MUST specify a roleIntegration from [...] Use the union of the two sets: both the set of zero or more role properties given for this SubComponent as well as the set of zero or more role properties given for the included Component.

[...] Allowing multiple Location objects on a single SubComponent is intended to enable representation of discontinuous regions (for example, a coding sequence encoded across a set of exons with interspersed introns). As such, the Location objects of a single SubComponent MUST NOT specify overlapping regions, since it is not clear what this would mean. There is no such concern with different objects, however, which can freely overlap in Location (for example, specifying overlapping linkers for sequence assembly).
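The non-overlap rule for the Location objects of a single SubComponent can be checked by sorting. The sketch below assumes every Location is a range given as a (start, end) tuple with 1-based inclusive positions on the same Sequence; the tuple representation and function name are illustrative and not part of the data model.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), both inclusive


def find_overlaps(ranges: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of ranges that share at least one position."""
    ordered = sorted(ranges)
    overlaps: List[Tuple[Span, Span]] = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] > first[1]:
                break  # later ranges start even further right
            overlaps.append((first, second))
    return overlaps
```

Because positions are inclusive, `(1, 10)` and `(10, 15)` overlap at position 10, whereas two exons at `(1, 10)` and `(20, 30)` do not. An empty result means the Locations satisfy the rule.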

8
The sourceLocation property 9 The sourceLocation property allows for only a portion of a Component's Sequence to be included, rather than its 10 entirety. For example, when composing parts with certain assembly methods, some bases on the boundary may be 11 removed or replaced. Another example is describing a deletion or replacement of a portion of a sequence. The type property is REQUIRED and contains one or more URIs. The type property is identical to its use in 10 Component. The ExternallyDefined class has been introduced so that external definitions in databases like ChEBI or UniProt 13 can be referenced.
14 The type property 15 The type property is REQUIRED and contains one or more URIs. The type property is identical to its use in 16 Component.

17
The definition property 18 The definition property is REQUIRED and is of type URI that links to a canonical definition external to SBOL.

19
When possible, such definitions SHOULD use the recommended external resources in Section 7.6. For example, an 20 ExternallyDefined simple chemical might link to ChEBI and a protein might link to UniProt. The SequenceFeature class describes one or more regions of interest on the Sequence objects referred to by its 23 parent Component.

The hasLocation property
A SequenceFeature MAY have any number of hasLocation properties, each of type URI, that MUST refer to Location objects. These follow the same restrictions as for the hasLocation of a SubComponent, notably that the Locations of hasLocation properties attached to the same SequenceFeature MUST NOT overlap.

The orientation property is OPTIONAL and has a data type of URI. All subclasses of Location share this property, which can be used to indicate how any associated double-stranded Feature is oriented on the elements of a Sequence from their parent Component.

The order property is OPTIONAL and has a data type of Integer. If there are multiple Location objects associated with a Feature, the order property is used to specify the order (in increasing value) in which the specified Locations are to be joined to form the sequence of the Feature. Note that order values MAY be non-sequential and non-positive, if desired.
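To illustrate how the order property drives the joining of Locations, here is a minimal Python sketch. It assumes the Locations are ranges with 1-based inclusive start and end positions; the Range dataclass and join_ranges function are hypothetical stand-ins, not classes from an SBOL library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    """Hypothetical stand-in for an SBOL Range location."""
    start: int                   # inclusive, 1-based
    end: int                     # inclusive, 1-based
    order: Optional[int] = None  # OPTIONAL; may be negative or non-sequential


def join_ranges(elements: str, ranges: List[Range]) -> str:
    """Concatenate the regions selected by `ranges` in increasing `order`."""
    # Assumption of this sketch: a missing order value is treated as 0.
    ordered = sorted(ranges, key=lambda r: r.order if r.order is not None else 0)
    # Convert 1-based inclusive [start, end] to Python's 0-based half-open slice.
    return "".join(elements[r.start - 1:r.end] for r in ordered)


# Two exons listed out of order; order values are non-sequential and one is negative.
exons = [Range(start=9, end=12, order=10), Range(start=1, end=4, order=-5)]
print(join_ranges("ATGCnnnnTTAA", exons))  # prints ATGCTTAA
```

Only the relative magnitude of the order values matters here, which is why non-sequential and non-positive values are harmless.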

The hasSequence property
The hasSequence property is REQUIRED and MUST contain the URI of a Sequence object. All subclasses of Location share this property, which indicates which Sequence object referenced by the containing Component is referenced by the Location.

Note that the index of the first location is 1, as is typical practice in biology, rather than 0, as is typical practice in computer science.

The start property
The start property specifies the inclusive starting position of the Range. This property is REQUIRED and MUST contain an Integer value greater than zero.

The end property
The end property specifies the inclusive ending position of the Range. This property is REQUIRED and MUST contain an Integer value greater than zero. In addition, this Integer value MUST be greater than or equal to that of the start property.

The Cut class has been introduced to enable the specification of a region between two discrete positions. This specification is accomplished using the at property, which specifies a discrete position that corresponds to the index of a character in the elements String of a Sequence (except in the case when at is equal to zero; see below).

The at property
The at property is REQUIRED and MUST contain an Integer value greater than or equal to zero. The region specified by the Cut is between the position specified by this property and the position that immediately follows it. When the at property is equal to zero, the specified region is immediately before the first discrete position or character in the elements String of a Sequence.

The Constraint class can be used to assert restrictions on the relationships of pairs of Feature objects contained by the same parent Component. Uses of this class include expressing containment (e.g., a plasmid transformed into a chassis strain), identity mappings (e.g., replacing a placeholder value with a complete definition), and expressing relative, sequence-based positions (e.g., the ordering of features within a template). Each Constraint includes the subject, object, and restriction properties.

The subject property is REQUIRED and MUST contain a URI that refers to a Feature contained by the same parent Component that contains the Constraint.

The object property
The object property is REQUIRED and MUST contain a URI that refers to a Feature contained by the same parent Component that contains the Constraint. This Feature MUST NOT be the same Feature that the Constraint refers to via its subject property.
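The rules on subject and object can be expressed as a small validation routine. The sketch below is an illustration only: the Constraint dataclass, the constraint_errors function, and the example.org URIs are hypothetical, and the parent Component is modelled simply as a set of Feature URIs.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Constraint:
    """Hypothetical stand-in holding the three Constraint properties as URIs."""
    subject: str
    object: str
    restriction: str


def constraint_errors(constraint: Constraint, parent_features: Set[str]) -> List[str]:
    """List violations of the subject/object rules for one Constraint."""
    errors = []
    if constraint.subject not in parent_features:
        errors.append("subject is not a Feature of the parent Component")
    if constraint.object not in parent_features:
        errors.append("object is not a Feature of the parent Component")
    if constraint.subject == constraint.object:
        errors.append("subject and object refer to the same Feature")
    return errors


features = {
    "https://example.org/device/SubComponent1",
    "https://example.org/device/SubComponent2",
}
ok = Constraint(
    subject="https://example.org/device/SubComponent1",
    object="https://example.org/device/SubComponent2",
    restriction="http://sbols.org/v3#meets",
)
print(constraint_errors(ok, features))  # prints []
```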

The restriction property
The restriction property is REQUIRED and has a data type of URI. This property MUST indicate the type of restriction on the locations, orientations, or identities of the subject and object Feature objects in relation to each other. The URI value of this property SHOULD come from the RECOMMENDED URIs in Table 7, Table 8

The subject contains the object and they might or might not share a boundary (i.e., union of strictlyContains, equals, and covers). Example: a cell contains a protein that may or may not bind to its membrane.

http://sbols.org/v3#equals The subject and object occupy the same location in space. Example: a small molecule is distributed throughout an entire sample.

http://sbols.org/v3#meets The subject and object are connected at a shared boundary. Example: two strains of adherent cells meet at their membranes.

http://sbols.org/v3#covers The subject contains the object but also shares a boundary. Example: a cell covers its transmembrane proteins.

The Interaction class (as shown in Figure 12) provides more detailed description of how the Feature objects

org/v3#precedes
The start of the location for subject is less than the start of the location for object (i.e., union of strictlyPrecedes, meets, and overlaps). Example: a promoter precedes a ribosome entry site, but the exact boundary between the two will be determined by sequence optimization and assembly planning.

http://sbols.org/v3#strictlyPrecedes The end of the location for subject is less than the start of the location for object. Example: a promoter strictly precedes a terminator (with a CDS between them).

http://sbols.org/v3#meets The end of the location for subject is equal to the start of the location for object. Note: this is a stronger interpretation of meets from Table 8 in the context of a linear sequence. Example: the 3' region adjacent to a blunt restriction site meets the 5' region adjacent to the site.

http://sbols.org/v3#overlaps The start of the location for subject is before the start of the location for object and the end of the location for subject is before the end of the location for object. Note: this is a stronger interpretation of overlaps from Table 8 in the context of a linear sequence. Example: two adjacent oligos overlap in a Gibson assembly plan.

http://sbols.org/v3#contains The start of the location for subject is less than or equal to the start of the location for object and the end of the location for subject is greater than or equal to the end of the location for object (i.e., union of strictlyContains, equals, finishes, and starts). Note: this is a stronger interpretation of contains from Table 8 in the context of a linear sequence. Example: a composite part contains a promoter.

http://sbols.org/v3#strictlyContains The start of the location for subject is before the start of the location for object and the end of the location for subject is after the end of the location for object. Note: this is a stronger interpretation of strictlyContains from Table 8 in the context of a linear sequence. Example: an RNA transcript strictly contains an intron.

http://sbols.org/v3#equals The start and end of the location for subject are equal to the start and end of the location for object. Note: this is a stronger interpretation of equals from Table 8 in the context of a linear sequence. Example: the transcribed region of a CDS part equals the entire part.

http://sbols.org/v3#finishes The start of the location for subject is after the start of the location for object and the end of the location for subject is equal to the end of the location for object. Example: a terminator finishes an expression cassette.

http://sbols.org/v3#starts The start of the location for subject is equal to the start of the location for object and the end of the location for subject is before the end of the location for object. Example: a promoter starts an expression cassette.

represented by an Interaction.
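The sequence-based restrictions listed above each compare the start and end positions of two locations, so they can be sketched as interval predicates. The Python below is an illustration under stated assumptions: each location is a single (start, end) pair with inclusive positions, the RESTRICTIONS table and satisfied function are hypothetical names, and every predicate transcribes the wording above literally without adding further conditions.

```python
from typing import Callable, Dict, Tuple

Span = Tuple[int, int]  # (start, end), both inclusive

SBOL3 = "http://sbols.org/v3#"

# Each predicate takes (subject, object) and mirrors the textual definition.
RESTRICTIONS: Dict[str, Callable[[Span, Span], bool]] = {
    SBOL3 + "precedes": lambda s, o: s[0] < o[0],
    SBOL3 + "strictlyPrecedes": lambda s, o: s[1] < o[0],
    SBOL3 + "meets": lambda s, o: s[1] == o[0],
    SBOL3 + "overlaps": lambda s, o: s[0] < o[0] and s[1] < o[1],
    SBOL3 + "contains": lambda s, o: s[0] <= o[0] and s[1] >= o[1],
    SBOL3 + "strictlyContains": lambda s, o: s[0] < o[0] and s[1] > o[1],
    SBOL3 + "equals": lambda s, o: s == o,
    SBOL3 + "finishes": lambda s, o: s[0] > o[0] and s[1] == o[1],
    SBOL3 + "starts": lambda s, o: s[0] == o[0] and s[1] < o[1],
}


def satisfied(restriction: str, subject: Span, obj: Span) -> bool:
    """Check one sequential restriction between two located Features."""
    return RESTRICTIONS[restriction](subject, obj)


promoter, cds, terminator = (1, 35), (60, 780), (800, 850)
cassette = (1, 850)
print(satisfied(SBOL3 + "strictlyPrecedes", promoter, terminator))  # prints True
print(satisfied(SBOL3 + "starts", promoter, cassette))              # prints True
print(satisfied(SBOL3 + "finishes", terminator, cassette))          # prints True
```

Keeping the predicates in a table keyed by URI lets a validator look up the check directly from the value of a restriction property.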

Each type property MUST identify terms from appropriate ontologies. It is RECOMMENDED that exactly one URI specified by a type property refer to a term from the occurring entity branch of the Systems Biology Ontology (SBO). Table 10 provides a partial list of possible SBO terms for the type property and their corresponding URIs.

Figure 12: Diagram of the Interaction class and its associated properties. Diagram elements: Identified; Interaction with type [1..*] : URI and hasParticipation 0..*.

If an Interaction is well described by one of the terms from Table 10, then a type property MUST refer to the URI that identifies this term. Lastly, if there are multiple type properties for an Interaction, then they MUST identify non-conflicting terms. For example, the SBO terms "stimulation" and "inhibition" would conflict.

The hasParticipation property
An Interaction MAY have any number of hasParticipation properties, each of type URI, that MUST reference a Participation object, each of which identifies the role that its referenced Feature plays in the Interaction.

Even though an Interaction generally contains at least one Participation, the case of zero Participation objects is allowed because it is plausible that a designer might want to specify that an Interaction will exist, even if its participants have not yet been determined.

Each Participation (see Figure 13) represents how a particular Feature behaves in its parent Interaction.

Section 6. SBOL Data Model
The role property
A Participation is REQUIRED to have one or more role properties, each of type URI, that describe the behavior of a Participation (and by extension its referenced Feature) in the context of its parent Interaction.

Each role property MUST identify terms from appropriate ontologies. It is RECOMMENDED that exactly one URI specified by a role property refer to a term from the participant role branch of the SBO.

If a Participation is well described by one of the terms from Table 11, then a role property MUST refer to the URI that identifies this term. Also, if a Participation belongs to an Interaction that has a type listed in Table 10, then the Participation SHOULD have a role that is cross-listed with this type in Table 11. Lastly, if there are multiple role properties for a Participation, then they MUST identify non-conflicting terms. For example, the SBO terms "stimulator" and "inhibitor" would conflict.

The participant property
The participant property MUST specify precisely one Feature object that plays the designated role in its parent Interaction object.

The Interface class (shown in Figure 14) is a way of explicitly specifying the interface of a Component.

where there are no flows (for instance, a physical interface).

The purpose of the CombinatorialDerivation class is to specify combinatorial biological designs without having to specify every possible design variant. For example, a CombinatorialDerivation can be used to specify a library of reporter gene variants that include different promoters and RBSs without having to specify a Component for every possible combination of promoter, RBS, and CDS in the library. Component objects that realize a CombinatorialDerivation can be derived in accordance with the class properties template, hasVariableComponent, and strategy (see Figure 15).

The strategy property is OPTIONAL and has a data type of URI.

An Implementation represents a realized instance of a Component, such as a sample of DNA resulting from fabricating a genetic design or an aliquot of a specified reagent. Importantly, an Implementation can be associated with a laboratory sample that was already built, or that is planned to be built in the future. An Implementation can also represent virtual and simulated instances. An Implementation may be linked back to its original design using the prov:wasDerivedFrom property inherited from the Identified superclass. An Implementation may also link to a Component that specifies its realized structure and/or function.

Figure 17: Diagram of the Implementation class and its associated properties. Diagram elements: TopLevel, Implementation, Component; built 0..1.

The built property
The built property is OPTIONAL and MAY contain a URI that MUST refer to a Component. This Component is intended to describe the actual physical structure and/or functional behavior of the Implementation. When the built property refers to a Component that is also linked to the Implementation via PROV-O properties such as prov:wasDerivedFrom (see Section A.1), it can be inferred that the actual structure and/or function of the Implementation matches its original design. When the built property refers to a different Component, it can be inferred that the Implementation has deviated from the original design. For example, the latter could be used to document when the DNA sequencing results for an assembled construct do not match the original target sequence.
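The inference described for the built property amounts to comparing two URIs. A minimal Python sketch follows; the function name and the example.org URIs are hypothetical, and only prov:wasDerivedFrom is considered among the PROV-O properties that could link an Implementation to its design.

```python
from typing import Optional, Set


def built_matches_design(built: Optional[str], derived_from: Set[str]) -> Optional[bool]:
    """Infer whether an Implementation matches its original design.

    built: URI of the Component given by the built property, or None if absent.
    derived_from: URIs given by prov:wasDerivedFrom on the same Implementation.
    Returns None when built is absent, since nothing can be inferred.
    """
    if built is None:
        return None
    return built in derived_from


design = "https://example.org/designs/reporter_v1"
as_sequenced = "https://example.org/builds/reporter_v1_sequenced"

print(built_matches_design(design, {design}))        # prints True
print(built_matches_design(as_sequenced, {design}))  # prints False
print(built_matches_design(None, {design}))          # prints None
```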

The purpose of the ExperimentalData class is to aggregate links to experimental data files. An ExperimentalData is typically associated with a single sample, lab instrument, or experimental condition and can be used to describe the output of the test phase of a design-build-test-learn workflow. For an example of the latter, see Figure 28.

As shown in Figure 18, the ExperimentalData class aggregates links to experimental data files using the OPTIONAL hasAttachment property that it inherits from the TopLevel class.

The meta-data provided by the Model class include the following properties: the source or location of the actual content of the model, the language in which the model is implemented, and the model's framework.

The source property
The source property is REQUIRED and MUST contain a URI reference to the source file for a model.

The language property
The language property is REQUIRED and MUST contain a URI that specifies the language in which the model is implemented. It is RECOMMENDED that this URI refer to a term from the EMBRACE Data and Methods (EDAM) ontology. Table 14 provides a list of a few suggested languages from this ontology and their URIs. If the language property of a Model is well-described by one of these terms, then it MUST contain the URI for this term as its value.

The framework property
The framework property is REQUIRED and MUST contain a URI that specifies the framework in which the model is implemented. It is RECOMMENDED this URI refer to a term from the modeling framework branch of the SBO

The Collection class is a class that groups together a set of TopLevel objects that have something in common. Some examples of Collection objects:

■ Results of a query to find all Component objects in a repository that function as promoters.

Sequence, and Model objects used to provide its full specification.

The Namespace class is a subclass of Collection and is used to define member entities that share the same URI prefix. Namely, all linked objects MUST have a URI prefix matching the URI of the Namespace object.

The purpose of the Experiment class is to aggregate ExperimentalData objects for subsequent analysis, usually in accordance with an experimental design. Namely, the member properties of an Experiment MUST refer to ExperimentalData objects.

The purpose of the Attachment class is to serve as a general container for data files, especially experimental data files. It provides a means for linking files and metadata to SBOL designs.

The meta-data provided by the Attachment class include the following properties: the source or location of the actual file of the attachment, the format of the file, the size of the file, and the hash for the file.

The source property
The source property is REQUIRED and MUST contain a URI reference to the source file.

The format property
The format property is OPTIONAL and MAY contain a URI that specifies the format of the attached file. It is RECOMMENDED that this URI refer to a term from the EMBRACE Data and Methods (EDAM) ontology.

The size property
The size property is OPTIONAL and MAY contain a long indicating the file size in bytes.

The hash property
The hash property is OPTIONAL and MAY contain a hash value for the file contents represented as a hexadecimal digest.
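The size and hash values described above can be computed from the file itself. The following Python sketch uses the standard hashlib module; the attachment_metadata function and its returned dictionary keys are hypothetical, and SHA-256 is an arbitrary choice made for illustration rather than an algorithm mandated by the text.

```python
import hashlib
import os
import tempfile


def attachment_metadata(path: str, chunk_size: int = 65536) -> dict:
    """Return the byte size and hexadecimal SHA-256 digest of a file."""
    digest = hashlib.sha256()  # SHA-256 is an assumption of this sketch
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"size": size, "hash": digest.hexdigest()}


# Usage: describe a small temporary file, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT")
print(attachment_metadata(tmp.name))
os.remove(tmp.name)
```

A tool recording a hash this way would also need to record which algorithm produced it, so that a reader can recompute and compare the digest.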

The hashAlgorithm property
The hashAlgorithm property is OPTIONAL and MAY contain the name of the hash algorithm used to generate the value of the hash property. The value of this property SHOULD be a hash name string from the IANA Named

then hashAlgorithm MUST be set as well.

This annotation and extension mechanism is designed to enable new types of data to be easily incorporated into the SBOL standard once there is community consensus on their proper representation.

Several methods are supported for connecting the SBOL data model with other types of application-specific data:

■ Custom data can be added to an SBOL object by annotating that object with non-conflicting properties. These properties could contain literal data types such as Strings or URIs that require a resolution mechanism to obtain external data. An example is annotating a Component with a property that contains a String description and URI for the parts registry from which its source data was originally imported.

■ Custom data in the form of independent objects can participate in the SBOL data model if they are assigned one of the SBOL types Identified or TopLevel. An example is an RDF object that is annotated such that it represents a data sheet that describes the performance of a Component in a particular context.

■ Finally, just as custom objects can be embedded in an SBOL document, external documents can embed or refer to SBOL objects. Support for this last case is not explicitly provided in this specification. Rather, this case depends on the external non-SBOL system managing its relationship to SBOL and data serialized in RDF, and is included here for completeness.

Each Identified object MAY be annotated with application-specific properties, which MUST be labelled using RDF predicates outside of the SBOL namespace. Additionally, application-specific types may be used in conjunction with the SBOL data model. These application-specific types MUST have two rdf:type properties: one type outside of the SBOL namespace AND an additional SBOL type of either:

■ TopLevel, if the object is to be considered an SBOL top level (i.e., not owned by another object)

■ Identified, if the object is not to be considered an SBOL top level (i.e., is owned by another object)

As with SBOL Identified objects, custom Identified objects (and thus also custom TopLevel objects) MAY also include the properties displayId, name, description, etc.

Maintaining unique URIs for all SBOL objects can be challenging. To reduce this burden, users of SBOL 3.x are encouraged to follow a few simple rules when constructing the URIs and related properties for SBOL objects. When these rules are followed in constructing an SBOL object, we say that this object is compliant. These rules are as

object. The 〈child_type_counter〉 of a new object SHOULD be calculated at time of object creation as 1 + the maximum 〈child_type_counter〉 for each 〈child_type〉 object in the parent (e.g., "〈parent_uri〉/SequenceAnnotation37").

Note that numbering is independent for each type, so a Component can have children "SubComponent37" and "Constraint37".
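The counter rule for child URIs can be sketched in a few lines of Python. The next_child_uri function and the example.org URIs below are hypothetical; the sketch also assumes that numbering starts at 1 when the parent has no existing child of the given type, which the text above does not state explicitly.

```python
import re
from typing import Iterable


def next_child_uri(parent_uri: str, child_type: str, existing_child_uris: Iterable[str]) -> str:
    """Build a child URI whose counter is 1 + the maximum counter for that type."""
    pattern = re.compile(re.escape(f"{parent_uri}/{child_type}") + r"(\d+)")
    counters = []
    for uri in existing_child_uris:
        match = pattern.fullmatch(uri)
        if match:  # only children of this type contribute to the counter
            counters.append(int(match.group(1)))
    # Assumption: start at 1 when no child of this type exists yet.
    return f"{parent_uri}/{child_type}{max(counters, default=0) + 1}"


parent = "https://example.org/lab/device"
children = [f"{parent}/SubComponent37", f"{parent}/Constraint3"]
print(next_child_uri(parent, "SubComponent", children))  # .../SubComponent38
print(next_child_uri(parent, "Constraint", children))    # .../Constraint4
```

Because the counters are tracked per type, adding a Constraint never changes the next SubComponent number.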

All examples in this specification use compliant URIs.

tools. This allows version information to be included in the root (e.g., GitHub style: "igem/HEAD/"), collection structure (e.g., "promoters/constitutive/2/"), in tool-specific conventions on displayId (e.g., "BBa_J23101_v2") or in information outside of the URI (e.g., by attaching prov:wasRevisionOf properties).

When annotating an SBOL document with additional information, there are two general methods that can be used:

■ Embed the information in the SBOL document using properties outside of the SBOL namespace.

■ Store the information separately and annotate the SBOL document with URIs that point to it.

In theory, either method can be used in any case. (Note that a third case not discussed here is to annotate external objects with links to SBOL documents, rather than annotating SBOL documents with links to external objects.) In practice, embedding large amounts of non-SBOL data into SBOL documents is likely to cause problems for people and software tools trying to manage and exchange such documents. Therefore, it is RECOMMENDED that small amounts of information (e.g., design notes or preferred graphical layout) be embedded in the SBOL model, while large amounts of information (e.g., the contents of the scientific publication from which a model was derived or flow cytometry data that characterizes performance) be linked with URIs pointing to external resources. The boundary between "small" and "large" is left deliberately vague, recognizing that it will likely depend on the particulars of a given SBOL application.

Entities in an SBOL document can be annotated with creation and modification dates. It is RECOMMENDED that predicates, or properties, from DCMI Metadata Terms SHOULD be used to include date and time information.

The created and modified terms SHOULD respectively be used to annotate SBOL entities with creation and

Authorship information should ideally be added to TopLevel entities where possible. It is RECOMMENDED that the creator DCMI Metadata term SHOULD be used to annotate SBOL entities with authorship information using free text. This property can be repeated for each author.

Each reagent, whether "atomic" (e.g., rainbow bead control) or mixture (e.g., M9 media), SHOULD be represented as a Component.

The roles of reagents may vary in context: for example, Arabinose may serve as an inducer or as a media carbon source. As such, role SHOULD be indicated by an NCI Thesaurus (NCIT) term in a role property of the

In order to deal with parameters associated with the context in general but not specific instances, e.g., temperature, pH, total sample volume, the hasMeasure property of Identified can be used. The hasMeasure of a Component provides context-free information (e.g., the pH of M9 media, the GC-content of a GFP coding sequence), while the hasMeasure of a material entity (SBO:0000240) Feature provides a measurement in context (e.g., the dosage of Arabinose in a sample).

Values of these parameters SHOULD be specified by attaching an om:Measure with a type set to the appropriate SBO

biological designs are split across multiple cells to optimize the system behavior and function. Therefore, there is a need to define a set of best practices so that multicellular systems can be captured using SBOL in a standard way.

To represent multicellular systems using SBOL, it is first necessary to represent cells. When doing so, it is important to be able to capture the following information: (i) taxonomy of the strain used, (ii) interactions occurring within cells of this type, and (iii) components inside the type of cell (e.g., genomes, plasmids). The approach RECOMMENDED in this section is capable of capturing this information, as shown in the example in Figure 22. It uses a Component to represent a system that contains cells of the given type. The cells themselves are represented by a SubComponent inside the Component, which is an instanceOf a Component capturing information about the species and strain of the cell in the design. This Component has a type of "cell" from the Gene Ontology (GO:0005623), and a role of "physical compartment" (SBO:0000290). Taxonomic information is captured by annotating the class instance with a URI for an entry in the NCBI Taxonomy Database.

The same approach can be extended to represent systems with multiple types of cells. The multicellular system can be represented as a Component that includes each strain of cell as a SubComponent that is an instanceOf a Component defining its strain. Interactions and constraints, such as a molecule that both strains interact with, are implemented using ComponentReferences to link to the definitions within each cell system description. An example is shown in Figure 23.

The proportion of cell types present in a multicellular system can be captured using om:Measure on the representations of cells in the design. As a best practice, the value of these measure classes is a percentage less than or equal to 100%, representing the amount of a cell type present in the system compared to all other cell types present. Therefore, the sum of all these values specified in the system will typically be equal to 100%, though this may not be the case if the system is not completely defined. An example is shown in Figure 24.

Figure 22: This is a proposed approach for capturing cell designs in SBOL. A Component annotated with a URI pointing to an entry in the NCBI Taxonomy Database is used to capture information about the cell's strain/species. The Component has a type of "Cell" from the Gene Ontology (GO), and a role of "physical compartment". Another Component is used to represent a system in which the cell is implemented. Entities, including the cell, are instantiated as SubComponents, and processes are captured using the Interaction class. Processes that are contained within the cell are represented by including the cell as a participant with a role of "physical compartment".

Figure 23: Captured here is a design involving two cells which both interact with the small molecule "Molecule A". Designs for the sender and receiver systems are captured using constraint to show that each of these cells interacts with the Molecule A contained within it. The overall multicellular system is represented by a Component with a role of "functional compartment", which is an SBO term. The two systems are included in this multicellular design as SubComponents, and the fact that Molecule A is shared between systems is indicated with a constraint.
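The best practice for cell-type proportions described earlier in this section can be checked with simple arithmetic. The Python sketch below is illustrative only: the check_proportions function is a hypothetical helper, the percentages are passed as plain numbers rather than om:Measure objects, and the lower bound of 0% is an assumption not stated in the text.

```python
from typing import Dict


def check_proportions(percent_by_cell_type: Dict[str, float], tol: float = 1e-9) -> float:
    """Validate cell-type percentages and return the share left unspecified."""
    for name, pct in percent_by_cell_type.items():
        # Each value is a percentage no greater than 100 (and, by assumption, not negative).
        if not 0 <= pct <= 100:
            raise ValueError(f"{name}: {pct}% is outside the range 0 to 100")
    total = sum(percent_by_cell_type.values())
    if total > 100 + tol:
        raise ValueError(f"proportions sum to {total}%, which exceeds 100%")
    # A positive remainder means the system is not completely defined.
    return 100 - total


print(check_proportions({"sender": 40.0, "receiver": 60.0}))  # prints 0.0
print(check_proportions({"sender": 25.0}))                    # prints 75.0
```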

The SBOL namespace, which is http://sbols.org/v3#, is used to indicate which entities and properties in the SBOL document are defined by SBOL. For example, the URI of the type Component is http://sbols.org/v3#Component. The SBOL namespace MUST NOT be used for any entities or properties not defined in this specification.

Where possible, we have re-used predicates from widely-used terminologies (such as Dublin Core DCMI Usage Board (2012)) to expose as much of the data as practical to such standard RDF tooling.

Second, an SBOL-compliant software tool can support import of SBOL, export of SBOL, or both. If it supports both import and export, it can do so in either a lossy or lossless fashion.

In order to test import compliance, developers are encouraged to use the SBOL test files found here:

In order to test export compliance, developers are encouraged to validate SBOL files generated by their software with the SBOL Validator found here:

This validator can also be used to check lossless import/export support, since it can compare the data content of files imported and exported by a software tool.

Finally, developers of SBOL-compliant tools are encouraged to notify the SBOL editors (sbol-editors@googlegroups.com) when they have determined that their tool is SBOL compliant, so their tool can be publicly categorized as such on the SBOL website.

In broad strokes, the SBOL 1 standard focused on conveying physical, structural information, whereas SBOL 2 expanded the scope to include functional aspects as well. The physical information about a designed genetic construct includes the order of its constituents and their descriptions. Specifying the exact locations of these constituents and their sequences allows genetic constructs to be defined unambiguously and reused in other designs. SBOL 2 extended SBOL 1 in several ways: it extended physical descriptions to include entities beyond DNA sequences, and it added support for functional descriptions of designs. SBOL 3 refines the SBOL 2 data model to simplify the representation of common use cases.

The mapping from SBOL 1.1 to SBOL 2.x proceeds as follows:

x ModuleDefinition with a "direction" property that is not "none" is listed in the Interface of its SBOL 3.x Component. The mapping from direction to interface properties is: "in" -> "inputs", "out" -> "outputs", "inout" -> "nondirectional". Finally, every Component with "access"="public" and "direction"="none" is listed as "nondirectional" in the Interface.

■ Every Component in an SBOL 2.x ComponentDefinition with "access"="public" is listed as "nondirectional" in the Interface of its SBOL 3.x Component.

◦ If the refinement is useRemote, then the restriction is replaces, the subject is the ComponentReference and the object is the SubComponent.

◦ If the refinement is useLocal, then the restriction is replaces, the subject is the SubComponent and the object is the ComponentReference.

◦ If the refinement is verifyIdentical, then the restriction is verifyIdentical, the subject is the ComponentReference and the object is the SubComponent.

◦ The merge refinement was never well defined and rarely if ever used, so it has been removed from SBOL 3.x. If a merge is encountered, it SHOULD be handled as a useRemote.

• As an OPTIONAL optimization, if the SubComponent referred to by the local property of the MapsTo is a "placeholder" with no significant content apart from its MapsTo relationships, then it may be eliminated, all objects that pointed to it can point directly to the new ComponentReference instead, and all transitive constraints using it as a bridge can be reduced to link the endpoints directly.

PROV-O is adopted as a best practice. It is advised that SBOL tools should at least understand this subset, defined in Figure 27. Providers of provenance information are free to make use of more of PROV-O than is described here.

It is acceptable for tools that understand more than this subset to use as much as they are able. Tools that only understand this subset must treat any additional data as annotations.

probably not make sense for it to be derived from a Collection.

The most basic and general type of provenance relationship can be represented using the prov:wasDerivedFrom property. This relationship describes derivation of an SBOL entity from another. Any Identified object may be annotated with this property. More specific provenance relationships can also be defined using PROV-O, such as prov:wasGeneratedBy. Generation of a new object is defined by the W3C PROV-O specification as follows:

...the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

These relationships are leveraged in SBOL tooling for describing multi-stage synthetic biology workflows.

Synthetic biology workflows may involve multiple stages, multiple users, multiple organizations, and interdisciplinary collaborations. These workflows can be described using four core PROV-O classes: prov:Entity, prov:Activity, prov:Agent, and prov:Plan. Any SBOL Identified object can implicitly act as an instance of PROV-O's prov:Entity class. Workflow histories (retrospective provenance) and workflow specifications (prospective provenance) can be described in SBOL using prov:Activity objects to link Identified objects into workflows.

A prov:Agent (for example, a piece of software or a person) runs a prov:Activity according to a prov:Plan to generate new entities. Resources representing prov:Agent, prov:Activity and prov:Plan classes should be handled as TopLevel, whilst prov:Usage and prov:Association resources should be treated as child Identified objects within their parent prov:Activity objects.

A design-build-test-learn SBOL ontology has been adopted for use with PROV-O classes (see Table 19). The terms design, build, test, and learn provide a high level workflow abstraction that allows tool-builders to quickly search for and isolate provenance histories relevant to their domain, while keeping track of the flow of data between different users working in different domains of synthetic biology. These terms SHOULD be used on the type property of the prov:Activity class. (Note that this property is a special property added by the SBOL specification, and is not part of the original PROV-O specification.) Additionally, these terms SHOULD be used in the prov:hadRole properties on prov:Usage to qualify how the referenced prov:entity is used by the parent prov:Activity. Logical constraints are placed on the order in which different types of prov:Activity objects are chained into design-build-test-learn workflows. These rules additionally place constraints on the types of objects that may be used as inputs for a particular type of prov:Activity. For example, a design prov:Usage may be used as an input for either a design or build prov:Activity but MUST NOT be used as an input for a test prov:Activity. An example of how these terms are used is provided in Figure 28.

Learn describes the process of analyzing experimental measurements to produce a new entity that represents biological knowledge.

In addition to the design-build-test-learn terms, users may also wish to include more specific terms to specify how SBOL objects are used in-house in their own recipes, protocols, or computational analyses. In fact, it is expected that the SBOL workflow ontology will be expanded over time, as users experiment with and develop their own custom ontologies. For now, however, it is RECOMMENDED that SBOL tools also include the high-level terms in Table 19 to support data exchange across interdisciplinary boundaries.

A.1.1 prov:Activity
A generated prov:Entity is linked through a prov:wasGeneratedBy relationship to a prov:Activity, which is used to describe how different prov:Agents and other entities were used. A prov:Activity is linked through a prov:qualifiedAssociation to prov:Associations, to describe the role of agents, and is linked through prov:qualifiedUsage to prov:Usages to describe the role of other entities used as part of the activity. Moreover, each prov:Activity includes optional prov:startedAtTime and prov:endedAtTime properties. When using prov:Activity to capture how an entity was derived, it is expected that any additional information needed will be attached as annotations. This may include software settings or textual notes. Activities can also be linked together using the prov:wasInformedBy relationship to provide dependency without explicitly specifying start and end times.
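As an illustration of the links just described, the sketch below lists the triples that connect a generated entity, its prov:Activity, one prov:Association, and one prov:Usage. It uses plain Python tuples rather than an RDF library. The `https://example.org/` URIs, the child URI suffixes, and the function name are hypothetical, and the `http://sbols.org/v3#` namespace used for the type property and its value is an assumption made for this example.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SBOL = "http://sbols.org/v3#"  # assumed namespace for the SBOL type property
EX = "https://example.org/"    # hypothetical namespace for illustration


def describe_activity(activity, generated, agent, used, activity_type):
    """Return (subject, predicate, object) triples for one prov:Activity."""
    usage = activity + "/usage1"              # hypothetical child URIs
    association = activity + "/association1"
    return [
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, SBOL + "type", activity_type),
        (generated, PROV + "wasGeneratedBy", activity),
        (activity, PROV + "qualifiedAssociation", association),
        (association, RDF_TYPE, PROV + "Association"),
        (association, PROV + "agent", agent),
        (activity, PROV + "qualifiedUsage", usage),
        (usage, RDF_TYPE, PROV + "Usage"),
        (usage, PROV + "entity", used),
    ]


triples = describe_activity(
    activity=EX + "assembly_run",
    generated=EX + "construct_sample",
    agent=EX + "lab_robot",
    used=EX + "construct_design",
    activity_type=SBOL + "build",  # assumed URI form for a Table 19 term
)
for triple in triples:
    print(triple)
```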

The type property

A prov:Activity MAY have one or more type properties, each of type URI, that explicitly specify the type of the prov:Activity in more detail. If specified, it is RECOMMENDED that at least one type property refers to a URI from Table 19.

How different entities are used in a prov:Activity is specified with the prov:Usage class, which is linked from a prov:Activity through the prov:qualifiedUsage relationship. A prov:Usage is then linked to a prov:Entity through the prov:entity property URI, and the prov:hadRole property specifies how the prov:Entity is used. When the prov:wasDerivedFrom property is used together with the full provenance described here, the entity pointed at by the prov:wasDerivedFrom property MUST be included in a prov:Usage.

The prov:entity property

The prov:entity property is REQUIRED and MUST contain a URI which MAY refer to an Identified object.

The prov:hadRole property

A prov:Usage MAY have one or more prov:hadRole properties, each of type URI, that refer to particular term(s) describing the usage of a prov:Entity referenced by the prov:entity property. Recommended terms that are defined in Table 19 can be used to indicate how the referenced prov:Entity is being used in this prov:Activity.

The prov:agent property

The prov:agent property is REQUIRED and MUST contain a URI that refers to a prov:Agent object.

The prov:hadRole property

A prov:Association MAY have one or more prov:hadRole properties, each of type URI, that refer to particular term(s) describing the role of the prov:Agent in the parent prov:Activity.

The prov:hadPlan property is OPTIONAL and contains a URI that refers to a prov:Plan.

The prov:Plan entity can be used as a placeholder to describe the steps (for example, scripts or lab protocols) taken when a prov:Agent is used in a particular prov:Activity.

Examples of agents are a person, organization, or software tool. These agents should be annotated with additional information, such as software version, needed to be able to run the same prov:Activity again.
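The cardinalities stated for prov:Association (a REQUIRED prov:agent, any number of prov:hadRole values, and an OPTIONAL prov:hadPlan) can be expressed as a small check. This is a sketch under stated assumptions: the dictionary layout, key names, function name, and example URIs are hypothetical and are not part of SBOL or PROV-O.

```python
def check_association(association):
    """Return a list of problems found in a dict describing a prov:Association.

    Expected keys (hypothetical layout): "agent", "hadRole", "hadPlan".
    """
    problems = []
    if not association.get("agent"):
        problems.append("prov:agent is REQUIRED")
    for role in association.get("hadRole", []):
        if not isinstance(role, str):
            problems.append("each prov:hadRole value must be a URI string")
    plan = association.get("hadPlan")
    if plan is not None and not isinstance(plan, str):
        problems.append("prov:hadPlan must be a single URI string when present")
    return problems


complete = {
    "agent": "https://example.org/codon_optimizer",      # hypothetical URIs
    "hadRole": ["https://example.org/roles/optimizer"],
    "hadPlan": "https://example.org/optimization_script",
}
print(check_association(complete))
print(check_association({"hadPlan": "https://example.org/optimization_script"}))
```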

Codon optimization is an example of where provenance properties can be applied. The relationship between an original CDS and the codon-optimized version could simply be represented using the prov:wasDerivedFrom predicate, in a light-weight form. With more comprehensive use of the PROV ontology, the codon optimization can be represented as a prov:Activity. This prov:Activity can then include additional information, such as the prov:Agent responsible (in this case, codon-optimizing software), and additional parameters.
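The two forms of the codon optimization example can be written as triples, which also makes it possible to check the rule that an entity referenced by prov:wasDerivedFrom MUST appear in a prov:Usage once full provenance is given. The sketch below is illustrative only: all `https://example.org/` URIs and the helper name are hypothetical, and the rule is treated as not applicable when no generating prov:Activity is present (the light-weight form).

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "https://example.org/"  # hypothetical namespace for illustration

original = EX + "cds_original"
optimized = EX + "cds_optimized"
activity = EX + "codon_optimization"
usage = activity + "/usage1"
association = activity + "/association1"

# Light-weight form: a single derivation link.
lightweight = [(optimized, PROV + "wasDerivedFrom", original)]

# Full form: the same link plus an activity, its usage, and its agent.
full = lightweight + [
    (optimized, PROV + "wasGeneratedBy", activity),
    (activity, PROV + "qualifiedUsage", usage),
    (usage, PROV + "entity", original),
    (activity, PROV + "qualifiedAssociation", association),
    (association, PROV + "agent", EX + "codon_optimizer_software"),
]


def sources_missing_usage(triples, entity):
    """Return wasDerivedFrom targets of `entity` that no prov:Usage refers to."""

    def objects(subjects, predicate):
        return {o for s, p, o in triples if s in subjects and p == PROV + predicate}

    derived_from = objects({entity}, "wasDerivedFrom")
    activities = objects({entity}, "wasGeneratedBy")
    if not activities:  # light-weight form: nothing to check
        return set()
    usages = objects(activities, "qualifiedUsage")
    return derived_from - objects(usages, "entity")


print(sources_missing_usage(lightweight, optimized))
print(sources_missing_usage(full, optimized))
```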

Example - Deriving strains

Bacterial strains are often derived from other strains through modifications such as gene knockouts or mutations. B. subtilis 168 is a laboratory strain and has several advantages as a model organism in synthetic biology. The relationship between the original strain and the 168 strain can be represented using the prov:wasDerivedFrom predicate or, more comprehensively, with a prov:Activity describing the protocols used.

Model which describes the hypothesized behavior of a biological device. Using a computational tool, a new Design (Component) is composed from biological parts, which links back to its Model. A genetic construct is then produced in the laboratory via an assembly protocol, and this biological sample is represented by a Build (Implementation). Once constructed, the Build is then characterized in the laboratory using an automated measurement protocol on a Tecan plate reader, thus generating Test data (represented by an ExperimentalData). Finally, a new Model is derived from these data using a fitting algorithm implemented in the Python programming language. The final Model may not match the beginning Model, as the observed behavior may not match the prediction.

As specified in the description of CombinatorialDerivation, provenance can be used to link each generated Component (or Collection thereof) back to the source from which it was derived. In particular, each derived design links with prov:wasDerivedFrom to the CombinatorialDerivation that it was derived from. Also, each SubComponent has a prov:wasDerivedFrom linking it to the SubComponent within the template that it is derived from. The advantage of these provenance links is that they provide sufficient information to validate that this derived design has been properly derived from the specified CombinatorialDerivations.

There are at least two well-established use cases for including measures/parameters and their associated units in SBOL design specifications. These use cases are the specification of genetic circuit designs and their associated parameters (such as rates of transcription) and the specification of environmental conditions for biological system designs (such as growth media concentrations and temperatures). In the first use case, parameters are necessary to enable the generation of quantitative models of circuit behavior from circuit design specifications. In the second use case, measures are necessary to define experimental conditions and enable the analysis of system behavior or characterization with respect to environmental context.

The Ontology of Units of Measure (OM) (http://www.ontology-of-units-of-measure.org/resource/om-2) already defines a data model for representing measures and their associated units. Here, a subset of OM is adopted by SBOL to describe these concepts for biological design specifications. As shown in Figure 29, SBOL leverages three of the base classes defined by the OM: om:Measure, om:Unit and om:Prefix. A om:Measure links a numerical value to a om:Unit, which may or may not have a om:Prefix (e.g., centi, milli, micro). As these classes are adopted by SBOL, om:Measure is treated as a subclass of Identified, while om:Unit and om:Prefix are treated as subclasses of TopLevel. In addition, SBOL adopts the following OM om:Unit subclasses: om:SingularUnit, om:CompoundUnit, om:UnitMultiplication, om:UnitDivision, om:UnitExponentiation, and om:PrefixedUnit. Lastly, SBOL adopts the following om:Prefix subclasses from OM: om:SIPrefix and om:BinaryPrefix.

SBOL-compliant tools are allowed to read, write, and modify data belonging to OM classes other than those described here, but this specification does not provide any guidance for the interpretation or use of these data in the context of SBOL.

The purpose of the om:Measure class is to link a numerical value to a om:Unit.

The om:hasNumericalValue property

The om:hasNumericalValue property is REQUIRED and MUST contain a single xsd:float.

The om:hasUnit property

The om:hasUnit property is REQUIRED and MUST contain a URI that refers to a om:Unit. The OM provides URIs for many existing instances of the om:Unit class for reference (for example, http://www.ontology-of-units-of-measure.org/resource/om-2/gramPerLitre).

The type property

A om:Measure MAY have one or more type properties, each of type URI. It is RECOMMENDED that one of these URIs identify a term from the Systems Description Parameter branch of the Systems Biology Ontology (SBO) (http://www.ebi.ac.uk/sbo/main/). This type property of the om:Measure class is not specified in the OM and is added by SBOL to describe different types of parameters (for example, rate of reaction is identified by the SBO term http://identifiers.org/biomodels.sbo/SBO:0000612).

Figure 29: OM classes adopted by SBOL and their subclass relationships to Identified and TopLevel
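The three om:Measure properties described above can be mirrored in a small data class. This is a minimal sketch rather than a library API: the class name, field names, numerical values, and the `https://example.org/unit/perSecond` unit URI are hypothetical, while the gramPerLitre and SBO:0000612 URIs are the ones quoted in the text.

```python
from dataclasses import dataclass, field
from typing import List

OM = "http://www.ontology-of-units-of-measure.org/resource/om-2/"


@dataclass
class Measure:
    """Hypothetical in-memory stand-in for an om:Measure."""

    has_numerical_value: float                      # om:hasNumericalValue, REQUIRED
    has_unit: str                                   # om:hasUnit, REQUIRED URI
    types: List[str] = field(default_factory=list)  # optional type URIs

    def __post_init__(self):
        # Store the value as a float, matching the single xsd:float requirement.
        self.has_numerical_value = float(self.has_numerical_value)
        if not self.has_unit:
            raise ValueError("om:hasUnit is REQUIRED")


# A media concentration, using a unit URI provided by OM.
concentration = Measure(10, OM + "gramPerLitre")

# A rate parameter typed with the SBO term for rate of reaction.
rate = Measure(
    0.25,
    "https://example.org/unit/perSecond",  # hypothetical unit URI
    types=["http://identifiers.org/biomodels.sbo/SBO:0000612"],
)
print(concentration)
print(rate)
```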

A.2.2 om:Unit
As adopted by SBOL, om:Unit is an abstract class that is extended by other classes to describe units of measure using a shared set of properties.

The om:symbol property

The om:symbol property is REQUIRED and MUST contain a String. This String is commonly used to abbreviate the unit of measure's name. For example, the unit of measure named "gram per liter" is commonly abbreviated using the String "g/l".

The om:alternativeSymbols property

The om:alternativeSymbols property is OPTIONAL and MAY contain a set of Strings. This property can be used to specify alternative abbreviations other than that specified using the om:symbol property.

The om:label property

The om:label property is REQUIRED and MUST contain a String. This String is a common name for the unit of measure and SHOULD be identical to any String contained by the name property inherited from Identified.

The om:alternativeLabels property

The om:alternativeLabels property is OPTIONAL and MAY contain a set of Strings. This property can be used to specify alternative common names other than that specified using the om:label property.

The om:longcomment property is OPTIONAL and MAY contain a String. This String is a long description of the unit of measure and SHOULD be longer than any String contained by the om:comment property.

The purpose of the om:SingularUnit class is to describe a unit of measure that is not explicitly represented as a combination of multiple units, but could be equivalent to such a representation. For example, a joule is considered to be a om:SingularUnit, but it is equivalent to the multiplication of a newton and a meter.

The om:hasUnit property

The om:hasUnit property is OPTIONAL and MAY contain a URI. This URI MUST refer to another om:Unit. The om:hasUnit property can be used in conjunction with the om:hasFactor property to specify whether a om:SingularUnit is equivalent to another om:Unit multiplied by a factor. For example, an angstrom is equivalent to $10^{-10}$ meters.
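The angstrom example can be made concrete by following om:hasUnit links and multiplying by each om:hasFactor along the way. The sketch below is one possible reading, not a prescribed algorithm: the class, field, and function names are hypothetical, and treating a missing factor as 1 is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingularUnit:
    """Hypothetical in-memory stand-in for an om:SingularUnit."""

    symbol: str
    label: str
    has_unit: Optional["SingularUnit"] = None  # om:hasUnit, OPTIONAL
    has_factor: Optional[float] = None         # om:hasFactor, OPTIONAL


def to_root_unit(value, unit):
    """Rescale `value` by following hasUnit links until none remains."""
    while unit.has_unit is not None:
        # Assumption: a unit with hasUnit but no hasFactor scales by 1.
        value *= unit.has_factor if unit.has_factor is not None else 1.0
        unit = unit.has_unit
    return value, unit


metre = SingularUnit(symbol="m", label="metre")
angstrom = SingularUnit(symbol="Å", label="angstrom", has_unit=metre, has_factor=1e-10)

value, unit = to_root_unit(5.0, angstrom)
print(value, unit.symbol)
```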

The om:hasFactor property is OPTIONAL and MAY contain an xsd:float. If the om:hasFactor property of a om:SingularUnit is non-empty, then its om:hasUnit property SHOULD also be non-empty.

The om:hasTerm1 property is REQUIRED and MUST contain a URI that refers to another om:Unit. This om:Unit is the first multiplication term.

The om:hasTerm2 property

The om:hasTerm2 property is REQUIRED and MUST contain a URI that refers to another om:Unit. This om:Unit is the second multiplication term. The om:Unit referred to by om:hasTerm1 MAY be the same as that referred to by om:hasTerm2.

The purpose of the om:UnitDivision class is to describe a unit of measure that is the division of one unit of measure by another.

The om:hasNumerator property is REQUIRED and MUST contain a URI that refers to another om:Unit.

The om:hasDenominator property

The om:hasDenominator property is REQUIRED and MUST contain a URI that refers to another om:Unit.

The purpose of the om:UnitExponentiation class is to describe a unit of measure that is raised to an integer power.

The om:hasBase property

The om:hasBase property is REQUIRED and MUST contain a URI that refers to another om:Unit.

The om:hasExponent property

The om:hasExponent property is REQUIRED and MUST contain an xsd:integer.

The purpose of the om:PrefixedUnit class is to describe a unit of measure that is the multiplication of another unit of measure and a factor represented by a standard prefix such as "milli," "centi," "kilo," etc.

The om:hasUnit property is REQUIRED and MUST contain a URI that refers to another om:Unit.

The om:hasPrefix property is REQUIRED and MUST contain a URI that refers to a om:Prefix.
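The compound and prefixed unit classes compose like a small expression tree. The sketch below models that structure and derives a display symbol by walking the tree. It is illustrative only: the Python class and field names, the `factor` field on the prefix, and the separator characters are hypothetical, and in SBOL each unit still carries its own REQUIRED om:symbol rather than a computed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    symbol: str


@dataclass(frozen=True)
class Prefix:
    symbol: str
    factor: float  # hypothetical field holding the prefix factor


@dataclass(frozen=True)
class PrefixedUnit:
    has_prefix: Prefix
    has_unit: object


@dataclass(frozen=True)
class UnitMultiplication:
    has_term1: object
    has_term2: object


@dataclass(frozen=True)
class UnitDivision:
    has_numerator: object
    has_denominator: object


@dataclass(frozen=True)
class UnitExponentiation:
    has_base: object
    has_exponent: int


def symbol_of(unit):
    """Build a display symbol by walking the unit tree recursively."""
    if isinstance(unit, Unit):
        return unit.symbol
    if isinstance(unit, PrefixedUnit):
        return unit.has_prefix.symbol + symbol_of(unit.has_unit)
    if isinstance(unit, UnitMultiplication):
        return symbol_of(unit.has_term1) + "." + symbol_of(unit.has_term2)
    if isinstance(unit, UnitDivision):
        return symbol_of(unit.has_numerator) + "/" + symbol_of(unit.has_denominator)
    if isinstance(unit, UnitExponentiation):
        return symbol_of(unit.has_base) + "^" + str(unit.has_exponent)
    raise TypeError("unsupported unit type: " + type(unit).__name__)


gram, litre, metre, newton = Unit("g"), Unit("l"), Unit("m"), Unit("N")
milli = Prefix("m", 1e-3)

milligram_per_litre = UnitDivision(PrefixedUnit(milli, gram), litre)
print(symbol_of(milligram_per_litre))
print(symbol_of(UnitExponentiation(metre, 2)))
print(symbol_of(UnitMultiplication(newton, metre)))
```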

A.2.9 om:Prefix
As adopted by SBOL, om:Prefix is an abstract class that is extended by other classes to describe factors that are commonly represented by standard unit prefixes. For example, the factor $10^{-3}$ is represented by the standard unit prefix "milli."

The om:symbol property

The om:symbol property is REQUIRED and MUST contain a String. This String is commonly used to abbreviate the name of the unit prefix. For example, the String "m" is commonly used to abbreviate the name "milli."

The om:alternativeSymbols property is OPTIONAL and MAY contain a set of Strings. This property can be used to specify alternative abbreviations other than that specified using the om:symbol property.

The om:label property

The om:label property is REQUIRED and MUST contain a String. This String is a common name for the unit prefix and SHOULD be identical to any String contained by the name property inherited from Identified.

The om:alternativeLabels property is OPTIONAL and MAY contain a set of Strings. This property can be used to specify alternative common names other than that specified using the om:label property.

The om:comment property

The om:comment property is OPTIONAL and MAY contain a String. This String is a description of the unit prefix and SHOULD be identical to any String contained by the description property inherited from Identified.

The om:longcomment property

The om:longcomment property is OPTIONAL and MAY contain a String. This String is a long description of the unit prefix.

The purpose of the om:SIPrefix class is to describe standard SI prefixes such as "milli," "centi," "kilo," etc.

The purpose of the om:BinaryPrefix class is to describe standard binary prefixes such as "kibi," "mebi," "gibi," etc. These prefixes commonly precede units of information such as "bit" and "byte."
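The difference between the two prefix families is that SI prefixes are powers of ten while binary prefixes are powers of two. The sketch below lists the factors for the prefixes named above and applies them to a value. The dictionary and function names are hypothetical, and the lookup by label is an illustration rather than how SBOL or OM resolves prefixes.

```python
# Factors for the SI prefixes named in the text (powers of ten).
SI_PREFIX_FACTORS = {"milli": 1e-3, "centi": 1e-2, "kilo": 1e3}

# Factors for the binary prefixes named in the text (powers of two).
BINARY_PREFIX_FACTORS = {"kibi": 2**10, "mebi": 2**20, "gibi": 2**30}


def scale(value, prefix_label):
    """Multiply `value` by the factor of the named prefix."""
    for table in (SI_PREFIX_FACTORS, BINARY_PREFIX_FACTORS):
        if prefix_label in table:
            return value * table[prefix_label]
    raise KeyError("unknown prefix: " + prefix_label)


print(scale(2, "kilo"))  # 2 kilo-units expressed in base units
print(scale(1, "kibi"))  # 1 kibibyte expressed in bytes
```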