The CellML Metadata Framework 2.0 Specification

Summary
The CellML Metadata Framework 2.0 is a modular framework that describes how semantic annotations should be made about mathematical models encoded in the CellML (www.cellml.org) format, and their elements. In addition to the Core specification, there are several satellite specifications, each designed to cater for model annotation in a different context. Basic Model Information, Citation, License and Biological Annotation specifications are presented.


Introduction
The CellML Metadata Framework 2.0 describes how annotations should be connected to CellML 1.1 model document elements. The framework is designed to be modular. It comprises a Core specification, accompanied by one or more satellite specifications. The satellite specifications are each designed to cater for annotation of models for a specific domain or purpose. Examples include the Citation Specification and the Licensing Specification, which cater for metadata about citable works, and licenses pertaining to the model, respectively. The modular specification framework allows great flexibility through the addition of satellite specifications for dealing with new domains of interest, and incremental development of annotation pertaining to specific domains. The Core and satellite specifications will be discussed in turn.

Scope of the Metadata Framework
The CellML Metadata Framework 2.0 is designed specifically for the CellML Language 1.1 (http://www.cellml.org/specifications/cellml_1.1). This framework deals with annotations that relate solely to a CellML model document or its elements. Annotations that pertain to the model itself (as in the abstract entity held in people's heads), or that depend on the CellML model document in some context (for example, the curation status of the model in some repository, or the use of the model document in some virtual experiment), are not within the scope of this framework.
When annotating or attempting to use the annotation provided by this framework, the following points should be observed:
- This framework holds an 'open-world' assumption. That is, not all real-world relationships are necessarily documented as annotation. It should not be assumed that every possible relationship is annotated for a given model document.
- Annotation is not, in general, guaranteed to be correct or up to date. Even assuming an annotation was true when it was made, there is no guarantee that the annotation will remain true over any period of time. Nor is a given annotation about some element of a model document guaranteed to be consistent with any other annotation about any other elements in that document, or in any other document. The CellML Metadata Framework 2.0 details how annotations are to be made but gives no information as to the annotations' validity or truthfulness at a particular point in time. Care should be taken not to allow multiple contradictory annotations about a particular model element to become relevant, including possible contradictory annotations between parent and child elements.

Realisation Strategy
Annotations will be made using RDF (http://www.w3.org/RDF/) triples. For a conceptual primer on RDF, please see the W3C RDF Primer (http://www.w3.org/TR/rdf-primer/). Briefly, an RDF statement has three parts: a Subject, a Predicate and an Object. The Subject of an annotation will generally be a CellML model element, and the Predicate will generally be some kind of relationship type. The Object will generally be an external entity such as an identifier for a publication, or perhaps a record in a database of known genes. An RDF statement typically links a model element to an external entity via some relationship, thereby providing an annotation that the model element has that relationship.
There are several common formats for RDF statements. In keeping with CellML, which is serialized as an XML document, it is recommended that framework annotations be serialized using RDF/XML (http://www.w3.org/TR/rdf-syntax-grammar/). Even within RDF/XML, there are multiple valid ways to express the same relationship. This Metadata Framework documentation typically demonstrates a single simple method for each example, but often there are several alternatives that are equally valid.
In contrast to the CellML Metadata Specification 1.0, in 2.0 annotations should never be made within the CellML model document to which they pertain. Perhaps the simplest (but not the only) way to do this is to place the annotation in a separate XML document, housed in its own file, which in turn has the ".xml" extension, assuming the format is RDF/XML.
Model elements can be referenced by RDF statements if they have an appropriate ID. An attribute "cmeta:id" can be created on any CellML element inside its CellML model document. The "id" is required to be unique across all attributes of type ID inside a CellML model document. Making annotations to MathML inside a CellML model document is similarly catered for using MathML's existing "math:id" optional attribute. The creation of "id"s is covered in more detail in the CellML 1.1 language specification, section 8 (http://www.cellml.org/specifications/cellml_1.1/#sec_metadata).
RDF annotation linking to CellML model elements should be enclosed within RDF tags. A simple example is shown below. The example presupposes that there exists, in the same folder as the annotation file, a CellML model document "model.cellml" that has, on its <model> tag, a cmeta:id "model_example" defined. That provides a hook for the RDF Subject to "point to". In this case the model "model_example" is annotated with a simple description, following the CellML Metadata Framework Basic Model Information Specification 2.0.
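The original listing for this example is not reproduced in this text, so the following is only a minimal sketch of what such an annotation file might contain. It assumes the widely used Dublin Core terms namespace (http://purl.org/dc/terms/) under the prefix "dcterms", and the description text is a placeholder. The RDF/XML is embedded in a short Python script that parses it with the standard library and lists the resulting Subject, Predicate and Object.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical annotation file stored next to "model.cellml".
# The description text is a placeholder, not taken from the specification.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="model.cellml#model_example">
    <dcterms:description>A placeholder description of the model.</dcterms:description>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)

# Each child of rdf:Description is one Predicate; its text is the Object.
triples = []
for node in root.findall(f"{{{RDF_NS}}}Description"):
    subject = node.get(f"{{{RDF_NS}}}about")
    for prop in node:
        triples.append((subject, prop.tag, prop.text))

for triple in triples:
    print(triple)
```

The Subject is the relative URI "model.cellml#model_example", whose fragment matches the cmeta:id on the <model> tag.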
- Ontologies used for defining relationships in satellite specifications should be derived from existing 'standard' ontologies, as ratified by appropriate communities, where possible.
- Relationship types to be encoded as part of a new specification, or an update to an existing specification, should be chosen so as to be either direct subsets of, or orthogonal to, the relationship types advocated in existing satellite specifications. Other satellite specifications may need to be updated to ensure that this is the case.
- Relationship types should be chosen so as to avoid 'use-mention confusion'. An example of this confusion would be a relationship type 'is a', defined literally, when used with a CellML component and a database record representing a biological entity. This would be incorrect, because a CellML component is not a database record. It is not even a biological entity. In fact, the component represents a biological entity that is represented by the aforementioned database record. While, in some limited domains, assumptions of this type are intrinsic and untangling is taken 'as read', errors of this type make working with annotations from multiple ontologies problematic and should be avoided.
- Where possible, the collection of advocated predicates should be defined. A specific versioned ontology would be preferable to an evolving ontology whose future members may be undefined.
- Predicates should be in the form of nouns, where possible. This is for maximum compatibility with the Realisation Strategy (see the Realisation Strategy section). For example, 'part' is preferable to 'isPart', since resolving the RDF to English yields 'Subject A has an isPart whose value is Object B' in the latter case, which is not as natural as 'Subject A has a part whose value is Object B'.
Additional namespaces (such as "dcterms" above) for constructs such as particular relationship types (predicates) or objects, where required, must be detailed in the appropriate Metadata Framework satellite specification.

Basic Model Information Specification 2.0

Introduction
Regardless of the type of model encoded in a CellML model document, there are three attributes that apply:
- Every model document has a creator (whether a human or some other entity, organisation or artefact).
- Every model document is created at a point in time.
- Every model document will be created for some reason or purpose, of varying significance (for example: as part of a scientific mode of enquiry, or as a side-effect of some automated process).
The same could be said about any CellML model element. The creator may be part of an organization and recording that relationship may also be desirable. In addition, it is useful to be able to add descriptive, free-form text annotation to model elements for the general purpose of adding comments about them.
This ability to add author, timestamp information and a comment or description is also extended to annotation elements. One could add annotation forming a 'comment' about a piece of CellML code, then annotate that comment to form a comment about that comment, and so on.
Together, these annotations are considered 'Basic Model Information'. This specification describes how these should be annotated for a given CellML model document.

Realisation Strategy
People, organizations and other entities (such as software packages or physical artefacts capable of performing some action) will be described using a subset of an external emerging standard (currently at namespace version 0.1) known as Friend-Of-A-Friend (FOAF). At the time of writing, the current FOAF specification (Vocabulary Specification 0.98 "Marco Polo Edition") can be found here: http://xmlns.com/foaf/spec/20100809.html.
In FOAF, the aforementioned entities are collectively known as Agents. CellML model document annotations will describe Agents using RDF-formatted FOAF objects. Most often, a FOAF object will become the RDF Object of some RDF Subject-Predicate-Object triple, for example the Agent which creates a model or model element.
In order to describe Agents, the following subset of the FOAF Core is considered part of this specification:

FOAF objects
- Person: describing a person
- Agent: superclass of Person and Group. May also describe other types of Agents not given their own explicit subclass, such as software packages
- Group: a collective of Agents

FOAF properties
- name: a name for one of the FOAF objects above
- familyName: a family name, used for Person objects
- givenName: a given name, used for Person objects
- member: used to declare that some Agent is a member of a Group
The relationship to specify the creator of a model document can be made by forming an RDF triple where the Subject is the model tag (via a cmeta:id), the Object is the Agent who is the creator, and the Predicate is described using the 'maker' property from FOAF.
FOAF properties and objects not listed above are not considered part of the CellML Metadata Framework Basic Model Information Specification 2.0.
The point in time at which a model element was created should be encoded using the Dublin Core (http://dublincore.org/documents/2010/10/11/dcmi-terms/) term 'created', used similarly to 'maker' above. The time point itself should be encoded according to the W3C's Date and Time Formats specification (http://www.w3.org/TR/NOTE-datetime). This allows the specification of the time point at an encoder-chosen level of precision.
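To illustrate what an encoder-chosen level of precision means in practice, the sketch below checks strings against the six granularities listed in the W3C note (year, year-month, full date, and date with time to minutes, seconds or fractional seconds). The function name is hypothetical and the check is purely syntactic: it does not reject impossible values such as month 13.

```python
import re

# Syntactic pattern for the W3C Date and Time Formats profile.
W3C_DTF = re.compile(
    r"\d{4}"                    # YYYY
    r"(-\d{2}"                  # -MM
    r"(-\d{2}"                  # -DD
    r"(T\d{2}:\d{2}"            # Thh:mm
    r"(:\d{2}(\.\d+)?)?"        # optional :ss and decimal fraction
    r"(Z|[+-]\d{2}:\d{2})"      # time zone designator, required with a time
    r")?)?)?"
)


def is_w3c_dtf(value: str) -> bool:
    """Return True if value has the shape of a W3C-DTF timestamp."""
    return W3C_DTF.fullmatch(value) is not None


for candidate in ["2011", "2011-02", "2011-02-14", "2011-02-14T09:30Z", "14/02/2011"]:
    print(candidate, is_w3c_dtf(candidate))
```

A value such as "2011-02" is therefore acceptable when only the month of creation is known.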
A general comment on a model element, including, potentially, the purpose of a model (which would be encoded with the cmeta:id of the CellML <model> tag as the RDF Subject), should be encoded using the Dublin Core 'description' term, as text.
The recommended namespaces for using FOAF and the Dublin Core in the context of this specification are as follows:
In order to make comments about annotation elements, including other comments, RDF 'reification' is used. Conceptually, this involves making the RDF statement that links some model element with an annotation into a statement with an identifier that can itself be referenced in further RDF statements. In pure RDF, this involves making a set of 4 RDF statements (known as an 'RDF reification quad') that together describe the original annotation statement and provide it with an identifier (more details on reification can be found in http://www.w3.org/TR/rdf-primer/). This is in addition to the RDF that actually makes the original annotation statement. In pure RDF this can become verbose, particularly if the original annotation statement has a lengthy Object. Fortunately, in RDF/XML there is a shorthand version, where predicates are assigned an rdf:ID, which can then be used as the RDF Subject of the comment annotation. This specification recommends that approach.
The above example shows the construction of a FOAF Person object, which becomes the RDF Object of a 'maker' relationship for the CellML model document. The 'created' predicate is used to specify that this particular model was created during February 2011, and the 'description' predicate describes the purpose of the model's creation. In the above example all three 'Basic Model Information' statements are made together, which is recommended, but there is no reason why one or more cannot be absent, or specified as separate statements in the annotation document.
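The listing that this paragraph describes is not reproduced in this text. The sketch below is an illustrative reconstruction of its general shape, not the original: the person's name and the description text are placeholders, and the namespace URIs are assumed to be the standard FOAF (http://xmlns.com/foaf/0.1/) and Dublin Core terms (http://purl.org/dc/terms/) namespaces.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",  # assumed namespace URI
    "foaf": "http://xmlns.com/foaf/0.1/",    # assumed namespace URI
}

# Placeholder creator and purpose; only the 2011-02 date follows the text.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="model.cellml#model_example">
    <foaf:maker>
      <foaf:Person>
        <foaf:givenName>Jane</foaf:givenName>
        <foaf:familyName>Doe</foaf:familyName>
      </foaf:Person>
    </foaf:maker>
    <dcterms:created>2011-02</dcterms:created>
    <dcterms:description>Placeholder statement of the model's purpose.</dcterms:description>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)
model = root.find("rdf:Description", NS)
person = model.find("foaf:maker/foaf:Person", NS)

given = person.findtext("foaf:givenName", namespaces=NS)
family = person.findtext("foaf:familyName", namespaces=NS)

info = {
    "subject": model.get("{%s}about" % NS["rdf"]),
    "creator": f"{given} {family}",
    "created": model.findtext("dcterms:created", namespaces=NS),
    "description": model.findtext("dcterms:description", namespaces=NS),
}
print(info)
```

Any of the three statements could be dropped or moved into its own rdf:Description block without affecting the others.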

Adding creator and timestamp elements to a model element
Here we assume that the model element we wish to annotate is defined in the "model.cellml" file as <component name="model_parameters" cmeta:id="parameters"> ...other elements...
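As a hedged sketch of how such an annotation might look, the script below pairs a cut-down CellML document with an annotation that gives the component a creator and a timestamp, then checks that the RDF Subject's fragment matches a cmeta:id that actually exists. The CellML and cmeta namespace URIs, the creator's name and the date are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
CMETA = "http://www.cellml.org/metadata/1.0#"  # assumed cmeta namespace

# Cut-down stand-in for "model.cellml" (contents are hypothetical).
CELLML = """
<model xmlns="http://www.cellml.org/cellml/1.1#"
       xmlns:cmeta="http://www.cellml.org/metadata/1.0#"
       name="example" cmeta:id="model_example">
  <component name="model_parameters" cmeta:id="parameters"/>
</model>
"""

# Separate annotation document; creator and date are placeholders.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="model.cellml#parameters">
    <foaf:maker>
      <foaf:Person><foaf:name>Jane Doe</foaf:name></foaf:Person>
    </foaf:maker>
    <dcterms:created>2011-02-14</dcterms:created>
  </rdf:Description>
</rdf:RDF>
"""

# Collect every cmeta:id declared in the model document.
cmeta_id = f"{{{CMETA}}}id"
known_ids = {el.get(cmeta_id) for el in ET.fromstring(CELLML).iter() if el.get(cmeta_id)}

# Split the RDF Subject into document name and fragment identifier.
annotation = ET.fromstring(ANNOTATION).find("rdf:Description", NS)
about = annotation.get("{%s}about" % NS["rdf"])
document, _, fragment = about.partition("#")

creator = annotation.findtext("foaf:maker/foaf:Person/foaf:name", namespaces=NS)
created = annotation.findtext("dcterms:created", namespaces=NS)
print(document, fragment, fragment in known_ids, creator, created)
```

A check of this kind is one way to catch annotations left dangling after a cmeta:id is renamed in the model document.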

Adding a text comment to a model element
The variable for this example, and for examples 6 and 7, would be defined in a CellML model (in a "model.cellml" file) as follows:

Annotating a comment
Here the comment annotation "vi_comment" is itself annotated with a creator, timestamp and text comment of its own. Note that in this example the timestamp relates to the first comment (with an rdf:ID of "vi_comment") only, and gives no information as to when the second ("Original author confirms...") was made. If that second comment was itself given a nodeID, it could be further annotated with that information if desired.
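The original listing is not reproduced in this text; the sketch below shows one way the rdf:ID shorthand might be used for this case. It assumes that the variable carries the cmeta:id "vi_variable", and the comment texts, creator name and date are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF = NS["rdf"]

ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- First comment on the variable; rdf:ID names the statement itself. -->
  <rdf:Description rdf:about="model.cellml#vi_variable">
    <dcterms:description rdf:ID="vi_comment">Placeholder first comment.</dcterms:description>
  </rdf:Description>
  <!-- Creator, timestamp and second comment, all about the first comment. -->
  <rdf:Description rdf:about="#vi_comment">
    <foaf:maker>
      <foaf:Person><foaf:name>Jane Doe</foaf:name></foaf:Person>
    </foaf:maker>
    <dcterms:created>2011-02-20</dcterms:created>
    <dcterms:description>Placeholder second comment.</dcterms:description>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)
descriptions = root.findall("rdf:Description", NS)

# Index every statement that was given an rdf:ID.
statements = {}
for node in descriptions:
    for prop in node:
        statement_id = prop.get(f"{{{RDF}}}ID")
        if statement_id is not None:
            statements["#" + statement_id] = (node.get(f"{{{RDF}}}about"), prop.text)

# Find annotations whose Subject is one of those statements.
annotated = {}
for node in descriptions:
    target = node.get(f"{{{RDF}}}about")
    if target in statements:
        annotated[target] = {
            "about_statement": statements[target],
            "created": node.findtext("dcterms:created", namespaces=NS),
            "comment": node.findtext("dcterms:description", namespaces=NS),
        }
print(annotated)
```

In this sketch the second comment has no identifier of its own, so nothing further can be said about it.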

A variable with a commented timestamp
Here the "vi_variable" variable is annotated with a timestamp, which in turn is commented with the timestamper and a text comment.
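The sketch below gives a hypothetical version of this annotation and also spells out the four statements of the reification quad that the rdf:ID shorthand stands for. The statement identifier "vi_timestamp", the annotation file name "annotation.xml", the person's name and the comment text are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "annotation.xml"  # hypothetical name of the annotation document

ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="model.cellml#vi_variable">
    <dcterms:created rdf:ID="vi_timestamp">2011-02-14</dcterms:created>
  </rdf:Description>
  <rdf:Description rdf:about="#vi_timestamp">
    <foaf:maker>
      <foaf:Person><foaf:name>Jane Doe</foaf:name></foaf:Person>
    </foaf:maker>
    <dcterms:description>Placeholder comment on the timestamp.</dcterms:description>
  </rdf:Description>
</rdf:RDF>
"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


root = ET.fromstring(ANNOTATION)
quads = []
for node in root.findall(f"{{{RDF}}}Description"):
    for prop in node:
        statement_id = prop.get(f"{{{RDF}}}ID")
        if statement_id is None:
            continue  # only statements with an rdf:ID are reified
        statement = f"{BASE}#{statement_id}"
        quads.append([
            (statement, RDF + "type", RDF + "Statement"),
            (statement, RDF + "subject", node.get(f"{{{RDF}}}about")),
            (statement, RDF + "predicate", expand(prop.tag)),
            (statement, RDF + "object", prop.text),
        ])

for triple in quads[0]:
    print(triple)
```

The Subject is left here as the relative URI written in the file; a full RDF parser would resolve it against the document's base URI.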

Introduction
Citable works (the RDF Objects) should be specified as URIs wherever possible. If the work has a record in a persistent database such as PubMed (http://www.ncbi.nlm.nih.gov/pubmed/), then the recommended method is to use an Identifiers.org URI (see http://identifiers.org/ for more on Identifiers.org). Where this is not applicable (if, for example, the work does not have a record, or perhaps is not published at all) another URI may be used (in some examples below, ISSN URNs are used). Care should be taken that the URI be as discoverable and long-lived as possible in order for someone to gain meaningful information from it.
If an appropriate URI is not available, an alternative method is to provide a BIBO (http://bibliontology.com/specification) Object in the RDF to represent the bibliographic resource. This object can then be linked to the model element of interest with 'description'.
Any BIBO object may be encoded, and it is recommended that as many BIBO properties be added as necessary so that someone can clearly identify the cited work. At the time of writing, the recommended BIBO version is that labelled as 'Revision 1.3'. A browsable list of BIBO classes and properties can be found at http://bibotools.googlecode.com/svn/biboontology/trunk/doc/index.html.
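For the recommended URI route, a citation might be sketched as below. Several things here are assumptions rather than statements of the specification: that 'description' is taken from the Biomodels model qualifiers, that the qualifier namespace is written with a trailing slash so that prefixed names expand cleanly, that Identifiers.org PubMed URIs follow the pattern http://identifiers.org/pubmed/<id>, and the PubMed identifier itself, which is a deliberately non-existent placeholder.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: trailing slash added so "bqmodel:description" expands cleanly.
BQMODEL = "http://biomodels.net/model-qualifiers/"

# "00000000" is a placeholder, not a real PubMed record.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
  <rdf:Description rdf:about="model.cellml#model_example">
    <bqmodel:description rdf:resource="http://identifiers.org/pubmed/00000000"/>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)

# Collect (model element, cited work URI) pairs.
citations = []
for node in root.findall(f"{{{RDF}}}Description"):
    for prop in node.findall(f"{{{BQMODEL}}}description"):
        citations.append((node.get(f"{{{RDF}}}about"), prop.get(f"{{{RDF}}}resource")))

print(citations)
```

Because the Object is a URI rather than literal text, it is written with rdf:resource instead of as element content.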
The recommended BIBO and Biomodels qualifier namespaces are as follows:

Suggested prefix    Namespace URI
bibo                "http://purl.org/ontology/bibo/"
bqmodel             "http://biomodels.net/model-qualifiers"

Several of the BIBO properties are 'borrowed' from other collections of properties, such as the Dublin Core and FOAF, which are also used elsewhere in the CellML Metadata Framework 2.0 (see the CellML Metadata Framework Basic Model Information Specification 2.0 for details). Hence, using those properties in a BIBO object may also require the specification of additional namespaces, if they are not already defined.

Biological Annotation Specification 2.0

Introduction
This specification describes how annotations can be added to elements of a CellML model document that declare what biological entities or processes those elements represent.

Realisation Strategy
To link CellML elements to biological concepts one should use RDF statements where the CellML element is the RDF Subject, and a URI to the concept is the RDF Object. The RDF Predicate should be chosen from the Biomodels Biological Qualifiers (at the time of writing, a list can be found here: http://www.ebi.ac.uk/miriam/main/qualifiers/, under the heading 'biology-qualifiers'). In keeping with the principles in the CellML Core Metadata Specification 2.0, the 'noun' forms of the predicates must be used, e.g. 'identity' rather than 'is', or 'part' rather than 'hasPart'.
The recommended Biological qualifier namespace is shown in the table below.

Suggested prefix    Namespace URI
bqbiol              "http://biomodels.net/biology-qualifiers/"

In keeping with the Biomodels framework, the RDF Objects of the annotation statements should be URIs to biological concepts (not to instances of the biology themselves). Where possible these should link to a publicly accessible ontology (or database, if no suitable ontology can be found) of such concepts. The recommended (sub-)ontologies are the following:
- Protein Ontology
- UniProt
- ChEBI
- Gene Ontology: cell component
- Cell Ontology
- Mouse Adult Gross Anatomy
- Foundational Model of Anatomy
- Ontology of Physics for Biology
Where possible, Identifiers.org URIs (http://identifiers.org) should be used as the RDF Object URIs, as these are persistent and easily dereferenced.
It is important to try to be as precise as possible. For example, it might seem useful to annotate a CellML component as representing a particular protein. However, if the component also represents other things, then better alternatives might include using 'part' instead, or annotating a specific CellML variable rather than the entire CellML component as being that particular protein. Similarly, it is important to try to be as specific as possible. Annotating a CellML component to the effect that it represents 'a protein' (which one?) is often not as useful as relating it to a particular protein.

Examples
1. A component represents a particular protein
The protein is listed in the Protein Ontology as ID:000005120.
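The original listing is not reproduced in this text. As a rough sketch, the annotation could take the following form, using the 'identity' qualifier and the bqbiol namespace given above. The component's cmeta:id ("protein_component") is hypothetical, and the exact Identifiers.org path used for the Protein Ontology record is an assumption that should be checked against Identifiers.org before use.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# The cmeta:id and the Identifiers.org path are illustrative assumptions.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
  <rdf:Description rdf:about="model.cellml#protein_component">
    <bqbiol:identity rdf:resource="http://identifiers.org/pr/PR:000005120"/>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)

# Report each biological qualifier by its local (noun-form) name.
annotations = []
for node in root.findall(f"{{{RDF}}}Description"):
    subject = node.get(f"{{{RDF}}}about")
    for prop in node:
        if prop.tag.startswith(f"{{{BQBIOL}}}"):
            qualifier = prop.tag.split("}", 1)[1]
            annotations.append((subject, qualifier, prop.get(f"{{{RDF}}}resource")))

print(annotations)
```

If the component represented more than this one protein, 'part' would be the more precise qualifier, as discussed above.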

An equation contains two terms that deal with different biological processes
The biological processes are represented by Gene Ontology records 0051603 and 0042398.

1. Representing a person, a group and a software package
Or, where an agent might be involved in several annotations within the CellML model document, it is recommended to define the Agent separately and use an rdf:nodeID as follows:
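The original listing is not reproduced in this text, so the following is a sketch of the pattern under stated assumptions: the names, node identifiers and cmeta:ids are placeholders. A Person, a Group and a software Agent are each defined once with an rdf:nodeID, and 'maker' statements then refer to them by that identifier.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
ABOUT = "{%s}about" % NS["rdf"]
NODE_ID = "{%s}nodeID" % NS["rdf"]

# All names and identifiers below are placeholders.
ANNOTATION = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:nodeID="author_jd">
    <foaf:name>Jane Doe</foaf:name>
  </foaf:Person>
  <foaf:Group rdf:nodeID="example_lab">
    <foaf:name>Example Lab</foaf:name>
    <foaf:member rdf:nodeID="author_jd"/>
  </foaf:Group>
  <foaf:Agent rdf:nodeID="export_tool">
    <foaf:name>Example Export Tool</foaf:name>
  </foaf:Agent>
  <rdf:Description rdf:about="model.cellml#model_example">
    <foaf:maker rdf:nodeID="author_jd"/>
  </rdf:Description>
  <rdf:Description rdf:about="model.cellml#parameters">
    <foaf:maker rdf:nodeID="export_tool"/>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(ANNOTATION)

# Agents are the top-level nodes that carry an rdf:nodeID.
agents = {
    el.get(NODE_ID): el.findtext("foaf:name", namespaces=NS)
    for el in root
    if el.get(NODE_ID) is not None
}

# Resolve each 'maker' reference to the agent it points at.
makers = {}
for node in root.findall("rdf:Description", NS):
    maker = node.find("foaf:maker", NS)
    makers[node.get(ABOUT)] = agents[maker.get(NODE_ID)]

print(makers)
```

Defining each Agent once keeps its details in a single place, however many annotations refer to it.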