Journal of Integrative Bioinformatics

We present here CellML 2.0, an XML-based language for describing and exchanging mathematical models of physiological systems. MathML embedded in CellML documents is used to define the underlying mathematics of models. Models consist of a network of reusable components, each with variables and equations giving relationships between those variables. Models may import other models to create systems of increasing complexity. CellML 2.0 is defined by the normative specification presented here, prescribing the CellML syntax and the rules by which it should be used. The normative specification is intended primarily for the developers of software tools which directly consume CellML syntax. Users of CellML models may prefer to browse the informative rendering of the specification (https://cellml.org/specifications/cellml_2.0/) which extends the normative specification with explanations of the rules combined with examples of their usage.


Introduction
anaerobically respiring microorganisms degrade organic compounds and donate electrons to an external circuit, thereby coupling removal of organics with electrical power production [1]. These systems have proved efficient in laboratory-scale settings, and research into scaled-up designs for treatment of industrial and municipal wastewaters is ongoing [2,3,4]. Once integrated into waste treatment processes, MFC technology is expected to have an advantage over aerobic wastewater treatment systems due to low biomass production and reduced energy consumption. Also, unlike most biogas-producing anaerobic digesters, MFC bioreactors are able to operate in ambient temperatures and do not require the treatment of the end products generated in the cell [1,5].
Thus far, MFC scaling up and development has mostly focused on optimizing reactor design and identifying inexpensive efficient materials, with the aim of reducing cost and improving efficiency to levels required for large-scale wastewater treatment [3,4,6,7,8,9]. However, better knowledge of microbial components will be necessary to achieve optimal performance [9]. Maximizing the range and rate of complex substrate subject to degradation is a critical requirement and is highly determined by metabolic capabilities of inhabiting microorganisms. Therefore, it is essential to identify active populations active within mixed microbial communities with the intent of designing microbial inoculums for scaled-up MFCs. The most efficient community would be expected to contain a significant pool of genes involved in biodegradation and extracellular electron transfer.
Bacteria within MFCs can transfer electrons to anodic materials either through direct contact or via exuded soluble electron shuttling compounds [10,11]. External electron transfer mechanisms are particularly well-characterized for the genera Geobacter and Shewanella. Electron transfer capabilities can vary substantial even between closely related Geobacter species [12]. Fine-scale transcriptional analysis of anode biofilms of Geobacter sulfurreducens found differences in gene expression across the biofilm depths [13] whereas mixed and pure culture biofilms of Geobacter anodireducens were found to be active only on the outer layer of the biofilm, with the dead layer in contact with the anode surface serving as an electrically conductive matrix [14]. Metatranscriptomic analysis of mixed community anode biofilms demonstrated significant changes in gene expression by associated sulfatereducing Desulfobulbaceae in response to changes in electron transfer activity [15].
Previous metagenomic analysis of MFCs has focused on low volume capacity (<0.5L) laboratory-scale MFCs [16,17]. Here we report whole-genome metagenomic analysis of anodic communities from two large 60-64 L capacity distillery wastewater-fed MFCs designed for use in practical-scale treatment as part of a modular MFC system. Examination of the bio-electroremediation potential of the communities and their source inoculums shows that although the MFC bioreactors differed both in design, inoculum and feed composition they developed a common core community. Both MFC communities revealed shifts in population structure from their respective inoculums that occurred during operation of the bioreactors under real-world conditions.

MFC operation and sampling
Two pilot-scale MFC bioreactors were produced by MPower World Ltd (Scotland, UK), and tested for electricity generation and COD removal with real wastewater under ambient conditions. The construction of the MFCs was based on a horizontal flow bioreactor [18]. The first reactor (UK), located at Elvingston Science Park, Scotland, had a 64 L active volume capacity and was fed whisky pot ale from a local distillery. The second reactor (JP) with 60 L active volume capacity was operated at OIST Graduate University (Japan) (Figure 1), and fed with wastewater from an awamori distillery (Mizuho, Naha). Feeds were diluted to achieve an average COD of 10 g L -1 . Both reactors had four pairs of electrodes, air-breathing cathodes and were inoculated with sludge obtained from anaerobic digesters at the local wastewater treatment plants. The UK reactor operated continuously in closed circuit while the JP reactor was maintained on a 12 h closed, 12 h open circuit cycle. COD removal efficiency and power generation were measured to assess the MFC performance. pH of the UK bioreactor was adjusted by using recycled effluent for dilution. This increased the distillery influent pH from 4.0 to 5.5 as verified by a pH meter (Thermo-scientific Orion TM 5-Star Plus, USA) at the inflow sample port. The pH of the JP bioreactor was adjusted to 6.2 using NaHCO 3 powder dissolved in influent solution and verified with a pH meter (Horiba D-51, Japan).

Sample collection, DNA extraction and sequencing
The UK bioreactor was inoculated with AD granular sludge from a whisky wastewatertreatment plant (Elgin, UK), and had been operated for 90 d at the time of sample collection. The second bioreactor (JP) was inoculated with AD granular sludge from a wastewater treatment facility located within Orion brewery (Okinawa, Japan) and operated 70 d before sampling. The anode biofilms for metagenomic analysis were collected in different manners depending on location due to dissimilar MFC configurations. The layer of biofilm was scraped off the anodic surface of the UK bioreactor; whereas a piece of carbon felt with the attached granules was cut off the anode of the JP bioreactor. Also, 10 mL of the corresponding inoculum sludge was collected as a reference sample for each MFC. All samples were immediately preserved in LifeGuard Soil Preservation Solution, following the manufacturer's instructions (MO BIO Laboratories, Carlsbad, CA, USA). Total genomic DNA was extracted from each MFC reference initial inoculum sludge, the anodic biofilm of the UK MFC, and separately from carbon felt and carbon granules of the JP MFC using PowerMax soil DNA isolation kit (Mo Bio Laboratories, Inc., Carlsbad, CA, USA). DNA quality was assessed by spectrophotometric analysis and agarose-gel electrophoresis. Approximately 5 µg of quality-checked DNA was used for library construction and sequencing with a Roche 454 GS FLX Titanium (454 Life Science, Branford, CT, U SA) according to the manufacturer's protocols [19]. Low quality bases and primers were endtrimmed. Reads were assembled de novo by GS DeNovo Assembler version 2.8. All contigs longer than 500 bp were selected for further analysis.

Metagenomic DNA sequence analysis
The resultant contigs and singletons were uploaded to MG-RAST (MetaGenome Rapid Annotation with Subsystem Technology) [20] , where additional removal of artificial replicates and filtering for H. sapiens sequences was performed. Unassembled JP carbon cloth and carbon granule samples were also uploaded to MG-RAST. While MG-RAST analysis is suited well for reads of the same length, it does not provide a way to calculate the phylogenetic distribution in metagenomic samples having mixed sequences of different lengths and coverage. Therefore, after MG-RAST BLAT search, all resulting files were downloaded from the server and analyzed independently.

Phylogenetic annotation
Phylogenetic annotation of each metagenome was performed based on 16S rRNA gene fragments and sequence reads. Ribosomal RNA sequences for each metagenome after clustering (97% identity) were downloaded from MG-RAST server and classified by SILVA Aligner (with the minimum identity with query sequence 0.85), and RDP classifier (with the minimum confidence score of 0.8) [21]. SILVA Aligner was utilized as the primary source of classification, while RDP classification was assigned to the sequence only when not classifiable by SILVA.
To perform phylogenetic annotation on all metagenomic sequences and calculate the phylogenetic ranks abundances, results of BLAT search against RefSeq database was utilized.
To assign taxonomy to the individual sequences the lowest common ancestor (LCA) approach was used. Nearest neighbors of the particular sequence were determined as hits with alignment length of at least 50 bp, and with maximum permitted difference of 10% from the maximum identity, but with identity not less than 60%, and BLAT E-value of at least 10 -5 . These selected hits were used for LCA, and the common part of taxonomy for nearest neighbors was assigned to the metagenomic sequence. Abundance of each phylogenetic rank was calculated as the number of shotgun reads assigned to the particular phylogenetic rank, where abundance for a contig is the number of reads included in this contig. All taxonomic ranges were assigned according to NCBI Taxonomy [22,23].

Functional annotation
All found proteins were assigned to both Clusters of Orthologous Genes (COG) [24] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology database [25] by MG-RAST. Results of BLAT search against COG and KEGG databases were downloaded from MG-RAST API. COG or KO IDs were assigned to the particular protein, if it had sequence identity more than 60% and alignment length of at least 80 bp.
Abundance for individual COG or KO was calculated as the number of reads, where the particular COG is present, multiplied by the average coverage of the sequence. The abundance of each category or level is the sum of abundances for each COG or KO in the group. To study the most abundant functions on anodic biofilms we chose all the COGs in JP and UK anode metagenomic samples with abundance of more than 0.25%, and compared their abundances with the ones in the JP and UK inoculums.

Functions important for MFC productivity
Proteins involved in phenazine-1-carboxylate biosynthesis pathway were extracted from MetaCyc database [26] while proteins involved in transduction of electrons to anode were extracted from the total 35 found in the literature [27,28,29,30,31,32]. Protein sequences mentioned above were used for BLAST homology searches against both anode metagenome sample protein databases with 45% sequence similarity, alignment length of 60 aa, and expectation of 10 -5 .
Genes involved in acetate utilization were extracted from KEGG, COG and CDD: K00625phosphate acetyltransferase; K01895 -acetyl-CoA synthetase; K00925 -acetate kinase; K14393, and PRK09395 (CDD ID) -acetate permease actP; K01637 and COG2224isocitrate lyase; K01638 -malate synthase. In the case of CDD, alignment matches were revealed by HMMer search against metagenome protein databases with a bitscore cutoff of 120. Taxonomy for each found hit was determined as phylogenetic rank of the sequence, where hits were found. Found genes were plotted on NCBI taxonomy tree and visualized with iTOL [23].

MFC operation and microbial sample collection
Two multi-electrode pilot MFC bioreactors of similar design [18] successfully treated wastewater from local distilleries and generated electrical power; one in Edinburgh, Scotland (UK), the other in Okinawa, Japan (JP). Each MFC was independently inoculated with a local AD sludge and maintained under ambient conditions with average temperatures of 10°C and 27°C for the UK and JP reactor, respectively. The UK bioreactor was fed whisky pot ale wastewater while the JP bioreactor was fed rice wash and spentwash mixed awamori distillery wastewater. The pH level in UK bioreactor was maintained at 5.5 in order to reduce possible methanogenic activities that could potentially decrease electrical power output, while it was adjusted to pH 6.2 in the JP MFC in order to increase digestion of organics. During the course over two months of continuous operation both MFCs demonstrated efficient organics removal coupled with electrical power generation. However, higher maximal values of power output and COD removal were achieved by the JP bioreactor (Table  1). Also, bubbling on the top of anodic chambers of the JP bioreactor implied production of biogas. In order to observe community development and compare metabolic characteristics of microbial consortia from the bioreactors, whole community genomic DNA was isolated from the initial inoculums (UK-ref and JP-ref) and from anodic biofilms of MFCs 90 d and 70 d after bioreactor initiation (UK-mfc and JP-mfc, respectively). Shotgun sequence data were generated for each metagenomic library and further used for taxonomic and functional analysis (Table S1).

MFC microbial community taxonomic analysis
Metagenomic reads were pre-processed (Materials and Methods, Figure S1), and then assembled in contigs within each sample. Initially, biodiversity of the sample was estimated by 16S rRNA gene analysis (Table S2). Also, taxonomy was determined for each metagenomic sequence using the lowest common ancestor (LCA) approach ( Both initial communities underwent structural changes within 70-90 d of MFC operation. In the UK anode biofilm (UK-mfc), the proportions of Proteobacteria, Firmicutes and Actinobacteria increased compared to their abundance in the inoculum sludge. Bacteroidetes, although decreased in proportion to other bacterial groups, remained the most abundant phyla in the community. At the genus level, the most significant increase was shown for Pseudomonas and Lactobacillus (Figure 2b).
In the JP anode biofilm (JP-mfc), Bacteroidetes, Firmicutes and Archaea increased in abundance throughout MFC operation. Proteobacteria remained the most represented in the JP-mfc, although its fraction decreased compared to the corresponding initial community (Figure 2d,e). At the genus level, Clostridium, Bacteroides, Prevotella, Pelobacter, Geobacter, Paludibacter and Methanothermobacter were enriched in JP-mfc. Additional analysis of microbial DNA from different anodic components showed that the latter three genera and species related to Roseiflexus tend to accumulate on carbon cloth rather than on carbon granules (Table S6).
Thus, taxonomic analysis showed that Bacteroidetes, Firmicutes and Proteobacteria are present in distinct anaerobic sludge communities and dominant in MFCs fed by real industrial wastewater maintained under different environmental and operational conditions. The dominance and stability of these bacterial groups has previously been demonstrated in biogas reactors [33,34] and in municipal wastewater-fed small-scale MFC anodic biofilms [16] whereas only Bacteriodetes and Proteobacteria but not Firmicutes were the dominant members of the metagenome of small-scale MFCs provided with acetate-rich feed [17]. Methanogenic Archaea were also well represented in JP-mfc (Figure 2d), consistent with the notable production of gas by this bioreactor. The identified phylotypes along with well-known electricity-generating bacteria Geobacter spp. on anodic cloth are likely to contribute into direct electron transfer from the cell to anode surface. Among those identified, Methanothermobacter (Archaea) was earlier shown to increase power production in the mixed biofilms [35]. Paludibacter species were reported to be abundant in hydrogen producing microbial electrolysis cells [36], however, their impact on current generation has not yet been demonstrated. Roseiflexus species were previously found in the filamentous biofilm of cellulose-fed MFC, and their role in the community was associated with sugar utilization [37].

Most abundant molecular functions and genes responsible for bioremediation in MFC bioreactors
In microbial systems, taxonomic diversity is not necessarily related to functional diversity. Metagenome-based functional analysis of MFC communities was performed in order to examine their metabolic capacities. Besides general functional categories, we looked particularly into distribution of genes and pathways associated with bioremediation and extracellular electron transfer, two major functions of MFC communities.
Identified proteins were distributed among all existing Clusters of Orthologous Groups (COGs) functional categories [24] with "Carbohydrate transport and metabolism" and "Energy production and conversion" pathways overrepresented in the UK and JP bioreactor communities, accordingly (Figure 3a,b; Table S7). Linking identified functions to taxonomy showed that enriched pathways were introduced mostly by highly abundant phyla. In the UK bioreactor, the dominant group of Bacteroidetes was comprised of saccharolytic bacteria that derive energy primarily from carbohydrates and proteins through fermentation, and therefore hold a number of genes for carbon utilization. The enrichment of "Energy production and conversion" functional category in the JP-mfc was caused mostly by Archaea that added to methane metabolism pathways, as well as by Proteobacteria and Firmicutes.
Increased number of genes associated with motility, chemotaxis, signal transduction and membrane transport were found in the anodic biofilms compared to initial inoculums ( Table  S8). The most enriched proteins were critical for biofilm formation or important for energy utilization (Table S9). Functional comparison between inoculum and anode communities showed gene enrichment in "Coenzyme transport and metabolism", "General function prediction only" and "Function unknown" categories on anode biofilms ( Table S7) implying roles of yet unknown proteins in anodic biofilm formation and functioning.
The distribution of genes and pathways involved in biodegradation of the main components of distillery wastewaters, such as acetate, cellulose, starch and proteins was identified for anode biofilm communities. All examined pathways were introduced by higher number of microbial species in the more diverse JP-mfc populations (Figure 4, Figures S2, S3).
The main acetate-catabolizing enzymes (acetyl-CoA synthetase, phosphate acetyltransferase, acetate kinase, acetate permease, glyoxalate cycle enzymes) were mostly distributed between Proteobacteria, Firmicutes and Archaea (Figure 4). Distribution of key enzymes involved in anaerobic cellulose degradation (endogluconase, beta-glucosidase and exoglucanase) showed that Firmicutes were possibly the major cellulose degraders in both bioreactors ( Figure S2). Starch was utilized mostly by abundant Firmicutes, Bacteroidetes and Proteobacteria in both MFCs, and also Archaea in JP-mfc ( Figure S3).
Extracellular protein degradation can be performed by various proteases produced by different groups of microorganisms. An indirect way to determine the main contributors of proteolysis consists of searching for amino acid transporter genes in the community. Functional analysis showed that Clostridia might be the major bacterial family carrying out this function in both MFCs, with the help of Lactobacillus in the UK and Delta-and Gammaproteobacteria and Archaea in JP-mfc (data not shown).  Table S10. A key to the COG letter codes for functional categories can be found in Table S7, a detailed description for particular COG can be found in Table S8.

Diversity of genes related to extracellular electron transfer in the anodic biofilm communities
Our knowledge on extracellular electron transfer mechanisms is mostly built on recent studies of two genera of dissimilatory metal-reducing bacteria, Geobacter and Shewanella. These bacteria use an assembly of membrane and periplasmic c-type cytochromes to move catabolic electrons across the cell envelope directly to extracellular solid conductive surfaces [32]. In addition, extracellular electron transfer via pili-like microbial nanowires was clearly demonstrated for Geobacter species [38,39] which use type IV pili composed of PilA for conduction of electrons through multilayer biofilms [40]. In Shewanella direct electron transfer can occur through outer membrane protein extensions [10] or via soluble electron shuttles produced by cells [41,42].
Metagenome-wide screening of anode biofilm sequences showed the presence of the key genes for both direct and indirect extracellular electron transfer types in both bioreactors. However, the distribution of these genes and their taxonomic assignment differed between the two MFCs (Figure 3c,d; Table S10).
Sequences similar to the genes encoding c-type cytochromes of the "metal respiration" system were found in both biofilms and mostly belonged to different families of Deltaproteobacteria or Gammaproteobacteria in the JP-mfc and UK-mfc, respectively. Notably, outer membrane (OmcABESZ, MtrF) and periplasmic (PpcA, MtrA/D) c-type cytochromes required for electron exchange between the cell surface and the anode were found only in the JP-mfc among Geobacter, Deferribacter, Reinekea and some unidentified species. The sequences similar to the inner membrane c-type cytochromes (MacA, CymA, SirC) were found in both bioreactors, however with significantly higher and taxonomically wider representation in the JP-mfc. The difference in the distribution of electrode respiration related c-type cytochromes between the MFCs can be explained by a higher amount of metal-reducing Deltaproteobacteria in the JP-mfc.
Among taxonomically identified sequences most of pili formation genes were possessed by Proteobacteria, especially by Delta-and Gamma-classes in JP-mfc and UK-mfc, accordingly. The JP-mfc but not the UK-mfc possessed Geobacter PilA homologs. The mannose-sensitive hemagglutinin (Msh) pili, have been shown to be necessary for optimal current generation in Shewanella [43]. The Msh pili sequences showed broader variation in taxonomy, particularly in the JP-mfc, where msh genes were assigned to Firmicutes, Planctomycetes, Verrucomicrobia, Thermotogae, Acidobacteria, different classes of Proteobacteria and Archaea. In the UK-mfc, genes similar to msh were found in Firmicutes and all classes of Proteobacteria. Interestingly, all msh fragments identified in Epsilonproteobacteria were most similar to Arcobacter butzlery, an electrogenic bacterium previously isolated from similarly configured MFC bioreactor [44].
Undirected extracellular electron transfer is carried out via electron shuttles produced by some bacteria. It has been suggested that bacterial shuttle secretion could be stimulated by current generation in MFCs [45]. The endogenously secreted flavins of Shewanella species, mainly riboflavin, are the most well documented electron shuttles in MFCs [46,47,48]. Also, it is known that phenazine-based metabolites produced by Pseudomonas species can function as electron shuttles for Pseudomonas cells themselves and also, in a syntrophic association, for Gram-positive bacteria [49].
The ribB gene, which encodes a riboflavin synthesis enzyme, and most genes involved in the phenazine-1-carboxamide synthesis pathway (phzDEFG) were detected in both biofilms and were related mainly to Bacteroidetes, Firmicutes and Proteobacteria. The electron shuttle gene sequences had broader taxonomic distribution in the JP-mfc, where fragments belonging to Deltaproteobacteria, Verrucomicrobia and Archaea were found. Alphaproteobacteria shuttle sequences were found only in the UK-mfc.
Based on the abundance of extracellular transfer genes in the two MFC anode biofilms, electrochemical activity involved both direct and mediated electron transfer in the JP MFC, and was mainly due to excreted redox components in the UK bioreactor.

Concluding remarks
Identification of key microorganisms that drive bio-electroremediation is the first step for determining the most effective biocatalyst for wastewater-treating MFC systems. Our study whole-genome metagenomic study was conducted on large volume (60-64 L) MFCs operating under conditions that would be expected for real-world treatment in a modular MFC system. The present analysis showed that three bacterial phyla, Proteobacteria, Firmicutes and Bacteroidetes constituted the core of electrochemically active biofilms in independentlyinoculated MFCs fed with different industrial wastewaters in distinct climatic zones.
The microbial community of the MFC operated in the subtropical climate (Okinawa, Japan) had higher biodiversity and held greater gene repertoire for biodegradation and extracellular electron transfer compared to temperate climate population (Edinburgh, UK). The JP MFC was dismantled following the 70 d experimental period reported here but the UK MFC continued to maintain stable electrical output until its decommissioning approximately one year later (unpublished results), demonstrating the resilience of its electrogenic community. The observation of biogas production from the JP MFC implies a possibility for integrated organics removal, electrical power generation and biogas production from one functionally diverse microbial community.
Based on the results of the present study, the MFC bioreactor operating in Edinburgh could potentially have benefited from enrichment with exoelectrogenic microorganisms that use direct electron transfer mechanisms, such as Geobacter spp. Also, providing conditions favorable for methanogenic microbes could enhance organics removal without a significant loss in electrical power production.
Availability of the metagenomic data of mixed anaerobic sludge and electricity generating biofilms will further allow for targeted studies of the enzymes involved in bioelectroremediation and should eventually lead to breakthroughs in metabolic modeling and our ability to predict, develop and maintain functionally stable and robust electrogenic microbial communities optimized for specific waste treatment in a particular geographical location.