Specifications of standards in systems and synthetic biology: status and developments in 2020

Abstract This special issue of the Journal of Integrative Bioinformatics presents papers related to the 10th COMBINE meeting together with the annual update of COMBINE standards in systems and synthetic biology.

One new associated standard has been added, the Open Modelling EXchange format (OMEX) Metadata Specification 1.0, to harmonise the descriptions of metadata.
Further information on all standards and activities, as well as links to the community websites, is available from the COMBINE website at https://co.mbine.org/. Detailed overviews of COMBINE, its history and its organisation have been provided, for example, by Hucka et al. [1], Myers et al. [2] and Waltemath et al. [5]. The annual special issue on COMBINE standards has become a tradition since its launch in 2016. Earlier editions provide summaries of updates for the years 2015-2018 [6][7][8][9].
We hope that this editorial is helpful in identifying the relevant specification documents for standards in systems and synthetic biology in the year 2020.

Current versions of COMBINE standards
Please refer to the following specifications when using COMBINE standards. New specifications or updates of existing specifications are highlighted with NEW.

Core standards

BioPAX (Biological PAthway eXchange)
Biological PAthway eXchange (BioPAX) is a standard language for integration, exchange and analysis of biological pathway data. It is expressed in OWL. The current specification is listed in Table 1.

CellML
The CellML language is an XML markup language to store and exchange computer-based mathematical models. The current specifications are listed in Table 2.
NEW CellML 2.0 [12]: The development of CellML 2.0 was guided by observing the use of CellML 1.1 in the community for over 10 years. The syntax of CellML 1.1 has been clarified in areas where discrepancies in model interpretation are often seen, and simplified to remove features that are never used. These enhancements are primarily aimed at improving the model reuse capabilities of CellML. The single substantial addition introduced in CellML 2.0 is the concept of resets: rules that define a change in model state dependent on specified conditions being met during a simulation experiment.

NeuroML (Neural Open Markup Language)
NeuroML is an XML-based description language that provides a common data format for defining and exchanging descriptions of neuronal cell and network models. The current specification is listed in Table 3.

SBGN (Systems Biology Graphical Notation)
SBGN is a set of standard graphical languages for describing biological knowledge visually. It is currently made up of three languages describing Process Descriptions, Entity Relationships and Activity Flows. In addition, SBGN-ML is an XML-based file format describing the geometry of SBGN maps while preserving their underlying biological meaning. The current specifications are listed in Table 4.
NEW SBGN-ML Milestone 3 [18] includes new developments, such as support for multiple SBGN maps within a single file, complete support for the submap glyph, and the possibility to store colours and annotations through extensions. In addition, the language attribute has been deprecated in favour of a more detailed version attribute, and the SBGN AF perturbation glyph has been deprecated to align with the SBGN AF specification.

SBML (Systems Biology Markup Language)
SBML is a computer-readable XML format for representing models of biological processes. SBML is suitable for, but not limited to, models using a process description approach. SBML development is coordinated by an elected editorial board and central developer team. The current specifications are listed in Table 5.
NEW SBML Level 3 Package: Distributions, Version 1, Release 1 [25] introduces distributions and uncertainties to SBML. Biological models often contain elements that have inexact numerical values, since they are based on values that are stochastic in nature or on data that contain uncertainty. The SBML Level 3 Core specification does not include an explicit mechanism to include inexact or stochastic values in a model, but it does provide a mechanism for SBML packages to extend the Core specification and add syntactic constructs. The SBML Distributions package for SBML Level 3 adds the necessary features to allow models to encode information about the distribution and uncertainty of values underlying a quantity.
NEW SBML Level 3 Package: Multistate, Multicomponent and Multicompartment Species, Version 1, Release 2 [30] addresses some issues raised by users about unclear aspects of Release 1 of the specification; it also clarifies the use of XML namespaces and updates the example models.

SBOL (Synthetic Biology Open Language)
SBOL is a language for the description and the exchange of synthetic biological parts, devices and systems. The current specifications are listed in Table 6.
NEW SBOL Version 3.0.0 [33] condenses and simplifies previous versions of SBOL based on experiences in deployment across a variety of scientific and industrial settings. In particular, SBOL 3.0.0 (1) separates sequence features from part/sub-part relationships, (2) renames ComponentDefinition/Component to Component/Sub-Component, (3) merges Component and Module classes, (4) ensures consistency between data model and ontology terms, (5) extends the means to define and reference SubComponents, (6) refines requirements on object URIs, (7) enables graph-based serialisation, (8) moves to the Systems Biology Ontology (SBO) for Component types, (9) ...
NEW SBOL Visual 2.2 [34] is a refinement to the standard that harmonises the ontology used and extends the glyph library to capture new biological parts. Specifically, the changes in SBOL Visual 2.2 include: (1) the grounding of the molecular species glyphs is changed from BioPAX to SBO to better align with the use of SBO terms for interaction glyphs, (2) new glyphs are added for proteins, introns, and polypeptide regions (e.g. protein domains), (3) the prior recommended macromolecule glyph is deprecated in favour of its alternative, and (4) small polygons are proposed as alternative glyphs for simple chemicals.

SED-ML (Simulation Experiment Description Markup Language)
The Simulation Experiment Description Markup Language (SED-ML) is an XML-based format for encoding simulation experiments. SED-ML allows users to define the models to use, the experimental tasks to run and the results to produce. SED-ML can be used with models encoded in several languages, as long as they are in XML. The current specification is listed in Table 7.
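To make the "models, tasks, results" structure concrete, the following minimal sketch parses a hand-written SED-ML fragment with Python's standard library. The namespace URI (assumed here to be that of SED-ML Level 1 Version 3), the file name model.xml, all ids and the function name summarise_sedml are illustrative assumptions rather than content taken from the specification in Table 7; the output section of a real SED-ML document is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of SED-ML Level 1 Version 3.
SEDML_NS = "http://sed-ml.org/sed-ml/level1/version3"

# Hand-written, illustrative fragment: one model, one simulation, one task.
SEDML_DOC = f"""<sedML xmlns="{SEDML_NS}" level="1" version="3">
  <listOfModels>
    <model id="model1" language="urn:sedml:language:sbml" source="model.xml"/>
  </listOfModels>
  <listOfSimulations>
    <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                       outputEndTime="10" numberOfPoints="100">
      <algorithm kisaoID="KISAO:0000019"/>
    </uniformTimeCourse>
  </listOfSimulations>
  <listOfTasks>
    <task id="task1" modelReference="model1" simulationReference="sim1"/>
  </listOfTasks>
</sedML>"""


def summarise_sedml(text):
    """Return (models, simulations, tasks) found in a SED-ML string."""
    ns = {"s": SEDML_NS}
    root = ET.fromstring(text)

    # Model id -> location of the model file.
    models = {m.get("id"): m.get("source")
              for m in root.findall("s:listOfModels/s:model", ns)}

    # Simulation id -> KiSAO identifier of the requested algorithm (or None).
    simulations = {}
    for sim in root.findall("s:listOfSimulations/*", ns):
        algorithm = sim.find("s:algorithm", ns)
        simulations[sim.get("id")] = (
            algorithm.get("kisaoID") if algorithm is not None else None
        )

    # Each task links one model to one simulation.
    tasks = [(t.get("id"), t.get("modelReference"), t.get("simulationReference"))
             for t in root.findall("s:listOfTasks/s:task", ns)]
    return models, simulations, tasks


models, simulations, tasks = summarise_sedml(SEDML_DOC)
```

In this sketch a task is simply a pairing of a model id with a simulation id, which is why the same model file can be reused across several experiments.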

Associated standards
Associated standards provide an additional layer of semantics to COMBINE representation formats. The current specifications are listed in Table 8.
A COMBINE archive is a single file bundling the various documents necessary for a modelling and simulation project, together with all relevant information. The archive is encoded using the Open Modelling EXchange format (OMEX). COMBINE archive metadata provides a harmonised, community-driven approach for annotating a variety of standardised model and data representation formats within a COMBINE archive.
BioModels.net qualifiers are standardised relationships (predicates) that specify the relation between an object represented in a description language and the external resource used to annotate it. The relationship is rarely one-to-one, and the information content of an annotation is greatly increased if one knows what it represents, rather than only knowing that it is "related to" the model component.
MIRIAM Unique Resource Identifiers (URIs) allow one to uniquely and unambiguously identify an entity in a stable and perennial manner. The MIRIAM Registry is a set of services and resources that provide support for generating, interpreting and resolving MIRIAM URIs. Through the Identifiers.org technology, MIRIAM URIs can be dereferenced in a flexible and robust way.
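As an illustration of what such a resolvable identifier can look like, the sketch below builds and splits Identifiers.org URLs. It assumes the compact `prefix:accession` URL pattern served at https://identifiers.org/; the function names are hypothetical, the accession is an arbitrary example, and collections whose accessions already embed the prefix are not handled.

```python
IDENTIFIERS_ORG = "https://identifiers.org/"  # assumed resolver base URL


def identifiers_org_uri(prefix, accession):
    """Build a resolvable URI from a collection prefix and an accession."""
    prefix = prefix.strip().lower()
    accession = accession.strip()
    if not prefix or not accession:
        raise ValueError("both prefix and accession are required")
    return f"{IDENTIFIERS_ORG}{prefix}:{accession}"


def split_identifiers_org_uri(uri):
    """Recover (prefix, accession) from a URI built by identifiers_org_uri."""
    if not uri.startswith(IDENTIFIERS_ORG):
        raise ValueError(f"not an Identifiers.org URI: {uri}")
    compact = uri[len(IDENTIFIERS_ORG):]
    prefix, separator, accession = compact.partition(":")
    if not separator or not prefix or not accession:
        raise ValueError(f"no 'prefix:accession' part in: {uri}")
    return prefix, accession


# Illustrative accession only; any collection/accession pair works the same way.
uri = identifiers_org_uri(" UniProt ", "P12345")
```

Keeping the collection prefix and the accession as separate inputs mirrors the idea that the annotated entity is identified independently of any one database's web address.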
MIRIAM URIs are used by the controlled annotation schemes of SBML, SED-ML, CellML and BioPAX.
The SBO is a set of controlled, relational vocabularies of terms commonly used in systems biology, and in particular in computational modelling. Each element of an SBML file carries an optional attribute sboTerm, whose value must be a term from SBO. Each symbol of SBGN is associated with an SBO term.
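The optional sboTerm attribute can be read with any XML parser. The following sketch collects the attribute from a hand-written SBML Level 3 fragment and checks only the syntactic form of each value (the string "SBO:" followed by seven digits); it does not verify that a term exists in SBO. The model, its ids, the chosen term values and the function name collect_sbo_terms are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Syntactic form of an SBO identifier; membership in SBO is NOT checked here.
SBO_PATTERN = re.compile(r"^SBO:\d{7}$")

# Hand-written, illustrative fragment: species "B" carries no sboTerm.
SBML_SNIPPET = f"""<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell" constant="true" sboTerm="SBO:0000290"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"
               sboTerm="SBO:0000252"/>
      <species id="B" compartment="cell" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"/>
    </listOfSpecies>
  </model>
</sbml>"""


def collect_sbo_terms(text):
    """Map element id -> sboTerm for every element that carries the attribute."""
    terms = {}
    for element in ET.fromstring(text).iter():
        term = element.get("sboTerm")
        if term is None:
            continue  # the attribute is optional
        if not SBO_PATTERN.match(term):
            raise ValueError(f"malformed sboTerm: {term!r}")
        terms[element.get("id")] = term
    return terms


sbo_terms = collect_sbo_terms(SBML_SNIPPET)
```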
The Kinetic Simulation Algorithm Ontology (KiSAO) describes existing algorithms and their interrelationships through their characteristics and parameters. KiSAO is used in SED-ML, where it allows simulation software to refer to an algorithm unambiguously and to automatically choose the best available algorithm to perform a simulation.
NEW The OMEX Metadata Specification Version 1.0 [38] has been developed to harmonise the way in which computational models and other types of modelling files are annotated. Guided by consensus across the COMBINE community, the specification provides technical guidelines for encoding metadata that describes the contents of modelling projects within OMEX-formatted COMBINE archives. By helping to ensure that computable knowledge about modelling projects is encoded and shared in a consistent manner across the broader modelling community, the specification can help promote model reuse, reproducibility, discovery and semantics-based analyses.
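For orientation, the sketch below assembles and reads back a small container of the kind this specification targets. It assumes that a COMBINE archive is a ZIP file whose manifest.xml lists each entry with a location and a format URI; the manifest namespace, the format URIs, the file names and the helper functions are assumptions made for illustration and should be checked against the specifications listed in Table 8.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Assumptions: manifest namespace and format URIs as commonly used for OMEX.
MANIFEST_NS = "http://identifiers.org/combine.specifications/omex-manifest"
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"
METADATA_FORMAT = "http://identifiers.org/combine.specifications/omex-metadata"


def manifest_xml(entries):
    """Render a manifest listing the archive itself plus (location, format) pairs."""
    lines = [f'<omexManifest xmlns="{MANIFEST_NS}">',
             f'  <content location="." format="{OMEX_FORMAT}"/>']
    for location, fmt in entries:
        lines.append(
            f"  <content location={quoteattr(location)} format={quoteattr(fmt)}/>"
        )
    lines.append("</omexManifest>")
    return "\n".join(lines)


def build_archive(files):
    """files maps location -> (format URI, bytes); returns the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        entries = [(location, fmt) for location, (fmt, _) in files.items()]
        archive.writestr("manifest.xml", manifest_xml(entries))
        for location, (_, data) in files.items():
            archive.writestr(location, data)
    return buffer.getvalue()


def list_contents(archive_bytes):
    """Read the manifest back and return location -> format URI."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read("manifest.xml"))
    return {c.get("location"): c.get("format")
            for c in root.findall(f"{{{MANIFEST_NS}}}content")}


# Placeholder payloads; real archives would hold full model and RDF documents.
archive_bytes = build_archive({
    "model.xml": (SBML_FORMAT, b"<sbml/>"),
    "metadata.rdf": (METADATA_FORMAT, b"<rdf/>"),
})
contents = list_contents(archive_bytes)
```

Because the manifest records a format for every entry, a tool can locate the metadata file in an archive without guessing from file extensions.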