VANESA - A Software Application for the Visualization and Analysis of Networks in Systems Biology Applications

Summary
VANESA is a modeling software for the automatic reconstruction and analysis of biological networks based on information from life-science databases. Using VANESA, scientists can model any kind of biological process or system as a biological network. Scientists can now automatically reconstruct important molecular systems with information from the databases KEGG, MINT, IntAct, HPRD, and BRENDA. Additionally, experimental results can be expanded with database information to better analyze the investigated elements and processes in an overall context. Users can also apply graph theoretical approaches in VANESA to identify regulatory structures and significant actors within the modeled systems. These structures can then be investigated further in the Petri net environment of VANESA. VANESA is platform-independent, free of charge, and available at http://vanesa.sf.net.


Background
Over the last decades of biomedical research, it has become apparent that a biological element can never be investigated in isolation, since regulation covers almost all omic levels. Cellular life is mostly a network of interacting elements [1], in which biological elements such as DNA, RNA, proteins, and metabolites interact with each other. Cellular life is complex, its investigation is very time-consuming, and experimental analysis is quite complicated. Therefore, scientists usually have detailed information and broad knowledge only about the main interaction partners. But a biological element or process is always part of a larger machinery or regulatory process. Thus, natural scientists need reliable information about the involved elements and/or processes, as well as standards for data storage and representation [2]. Furthermore, they need manageable biological networks presenting the whole context of regulation in order to produce good theoretical models that can be used for hypothesis testing.
One way to gain additional knowledge and link different datasets is to access knowledge from biological databases. Biological databases are large repositories storing relevant information. However, this information is distributed over different autonomous and heterogeneous biological databases, and it needs to be collected, filtered, cleaned, normalized, and linked in complex and time-consuming processes. Currently, more than 1,500 biological databases covering various areas of biology can be found [3]. Although data integration [...] The Systems Biology Markup Language (SBML [5,6]) website¹, for instance, lists more than 250 software tools that support biological modeling. Some are specialized in biological modeling, others in the analysis of high-throughput experiments, others in the visual exploration of biological data, and still others in network reconstruction. Only a few solutions cover several research fields at once, such as information fusion, modeling, simulation, and network analysis and visualization. A software application that offers a platform to automatically reconstruct and systematically explore the molecular functionality of a particular biological process, leading to the identification of regulatory modules and networks, is still not available. Thus, a strong need has emerged for a platform that can model and simulate changes in cell organization and, consequently, help address fundamental questions about metabolic or genetic diseases. This has motivated the realization of VANESA, which assists molecular scientists in the semi-automatic reconstruction, analysis, and simulation of biological systems for hypothesis generation and testing (see Figure 1).

Related work
In general, more than 1,500 bioinformatics databases and more than 250 software tools exist. Using these resources, scientists can model biological systems in many different ways and, furthermore, enrich any kind of biological system with relevant biomedical knowledge. Moreover, the number of available software tools is constantly increasing. To narrow the choice, only the applications best suited for the modeling, visualization, analysis, and simulation of biological networks, and which are also actively supported and state-of-the-art, are considered. More precisely, only tools that can model, reconstruct, visualize, and simulate biological systems within one comprehensive framework are taken into account. To compare them, each selected tool was examined with respect to graphical modeling usability, automatic reconstruction of biological networks from database information, network analysis (graph theory, mathematical analysis, Petri net analysis, etc.), network visualization and interaction, and simulation of biological systems (see Table 1). Cytoscape offers a strong visualization platform [8] as well as a well-designed plug-in structure for integrating new program modules. A third-party plug-in exists for practically every aspect of system modeling and simulation. However, many of these plug-ins are limited in functionality or are no longer available for the new version of Cytoscape. BioNet-Builder, one of the strongest plug-ins for automatic network reconstruction [9], can access several important biological databases. For simulation, several ODE solvers are available, for example FERN. However, the results are not always comprehensible, and it is difficult to use them to identify network motifs, regulatory switches, and so on. So far, no plug-in for Petri net analysis is provided.
CellDesigner [10] is strong in drawing and modeling biological systems. The application can access some important databases, but it only allows users to enrich model elements with the given database information. Based on this information, users can extend their networks manually, step by step. Simulations can be performed using external tools such as Copasi [11] and other ODE solvers [12].
By contrast, Cell Illustrator [13] offers an easy-to-use interface for drawing, modeling, analyzing, and simulating complex biological processes and systems based on hybrid functional Petri nets [14]. However, the weakness of Cell Illustrator is the simulation itself. There is no information about how the Petri nets and the corresponding processes are defined and simulated: it is not known how conflicts in Petri nets are resolved, how the hybrid simulation is performed, or which integrators are used. Due to its evolutionary design and many changes, the core has become opaque over the last few years. Furthermore, solver settings cannot be adapted to achieve reliable simulation results.
In summary, no existing application is able to model, visualize, analyze, and simulate a biological model with sophisticated methods. Users have to combine many different approaches and tools to cover all important aspects of dynamic cell modeling. Furthermore, users need prior knowledge in mathematics and a good background in computer science. None of the tools was convincing or even able to produce biological networks suitable for biological analysis. This is due either to missing links to important databases or to results that were not specific enough, which mostly yielded unclear or incomprehensible networks. The knowledge from existing databases could not be employed in a usable way. Simulation is another drawback of some of the tools. Where simulation techniques are provided, they are mainly based on mathematical approaches, which require mathematical knowledge and, moreover, a set of biological data and parameters for the simulations. Petri nets are therefore more suitable, as they can simulate biological networks both qualitatively and quantitatively.
Consequently, there is a strong need for a tool such as VANESA that provides strong modeling features, in which models can be reconstructed or enriched with biological database information, analyzed in different ways, and finally simulated both qualitatively and quantitatively.

Implementation
The requirements for VANESA are vast and complex but provide a well-established guide for a powerful framework. Based on these guidelines, a system architecture able to offer all required features was elaborated for VANESA [7]. To reach the main goals of VANESA, namely the reconstruction, analysis, visualization, and simulation of biological networks, all modules are interconnected (see Figure 1). The modules and the overall framework are implemented in Java™, a platform-independent language.
For database access, VANESA offers a form in which scientists can query each database for relevant information. Database information is obtained from the data warehouse DAWIS-M.D. [15], which contains the selected databases KEGG [16], HPRD [17], IntAct [18], BRENDA [19], and MINT [20] (see Figure 1, element 2). Overall, the data warehouse contains eleven different databases covering almost all omic levels, so that most biological processes within a cell, such as enzymatic reactions, protein-protein interactions, and metabolic and signaling pathways, can be modeled and analyzed. DAWIS-M.D. is consulted for biological and medical information via the implemented web service, which is realized with asynchronous Axis2 web service technology [21]. Queries can be sent simultaneously without loss of performance or connection dropouts. Before the reconstructed model is visualized, an appropriate layout algorithm is applied. In the graphical user interface, the user can examine, edit, extend, and reduce the model.
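Sending several queries at the same time can be sketched with Java's `CompletableFuture`. This is a minimal illustration, not VANESA's Axis2 client: the `queryWarehouse` method is a hypothetical stand-in for a web service call to DAWIS-M.D. and simply returns a placeholder string.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class ParallelQueries {

    // Hypothetical stand-in for an asynchronous web service request;
    // a real client would contact the data warehouse here.
    static String queryWarehouse(String database, String term) {
        return database + ":" + term;
    }

    // Submits one query per database concurrently and waits for all results,
    // returning them in the same order as the input list.
    static List<String> queryAll(List<String> databases, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, databases.size()));
        try {
            List<CompletableFuture<String>> futures = databases.stream()
                    .map(db -> CompletableFuture.supplyAsync(() -> queryWarehouse(db, term), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)   // blocks until each query has completed
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [KEGG:S100A8, BRENDA:S100A8, IntAct:S100A8]
        System.out.println(queryAll(List.of("KEGG", "BRENDA", "IntAct"), "S100A8"));
    }
}
```

Because all futures are created before any `join` is called, the requests overlap in time while the result order stays deterministic.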
For further analysis, users can map results from laboratory experiments (e.g. microarrays) onto an existing network. These results are linked to the network model and made visually accessible. Furthermore, different models can be compared across separate tabs to analyze similarities and differences in network structure, system regulation, and dynamics. Additionally, graph theoretical analyses can be applied to the networks to ease the recognition of relevant elements and formations. To make calculated and predicted results more intuitive and understandable, they are applied directly to the networks and visualized dynamically.
For simulation, users can transform a biological model into a Petri net [22] (see Figure 1, element 3). VANESA can automatically translate the network structure into the xHPN [23,24] formalism. Furthermore, users can directly edit and model a system in the Petri net language without first reconstructing a biological model. If kinetic data are available, they can be incorporated into the Petri net by placing ODEs or system parameters such as capacities and thresholds on the places and transitions. Finally, simulations are performed using OpenModelica and PNlib (see Figure 1, element 4) [25]. Once the simulation has finished, the results are automatically transferred back to VANESA and made visually accessible with charts and network animation.
Biological standards should ensure that all model concepts are well-defined and can be exported from and imported into VANESA (see Figure 1, element 7). For the exchange of models between different software tools, algorithms have been implemented in VANESA that convert the internal data structure into the formats SBML [5,6] (Systems Biology Markup Language), CSML (a format for simulating models within the software application Cell Illustrator [13]), an easy-to-use .txt file export, and the .mo data exchange format of the Modelica language [26]. In the case of SBML, the models are passed to and checked by the online web service². This ensures maximal compatibility and error checking, so that only valid models are imported and exported.
To import biological data that can be mapped onto an existing network, an easy-to-use import wizard has been realized. Using this wizard, experimental data values can be mapped onto a network created from database information, as shown in the cholesteatoma application case. Users only need an Excel sheet listing keys (e.g. gene names) and values (e.g. fold changes), which is then processed within VANESA. If users map experimental values onto protein-protein interaction networks, the mapping from key to protein is performed automatically with information from BioMart's RESTful web service [27]. The following sections describe the most important implementation details and the technical realization of the design concepts and requirements of VANESA.
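The key-value mapping step can be sketched as follows, assuming the spreadsheet has been exported as comma-separated `key,value` lines (e.g. gene name and fold change). The class and method names are hypothetical, keys are matched to node names case-insensitively (an assumption), and the automatic key-to-protein translation via BioMart is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ValueMapper {

    // Values attached to network nodes, plus keys that matched no node.
    record Mapping(Map<String, Double> nodeValues, List<String> unmatched) {}

    // Parses "key,value" lines and attaches each value to the node with the same
    // name (case-insensitive). Blank lines and lines starting with '#' are skipped.
    static Mapping map(List<String> lines, Set<String> nodeNames) {
        Map<String, String> byUpper = new HashMap<>();
        for (String name : nodeNames) {
            byUpper.put(name.toUpperCase(Locale.ROOT), name);
        }
        Map<String, Double> values = new HashMap<>();
        List<String> unmatched = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split(",", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected key,value but got: " + line);
            }
            String key = parts[0].trim();
            double value = Double.parseDouble(parts[1].trim());
            String node = byUpper.get(key.toUpperCase(Locale.ROOT));
            if (node == null) {
                unmatched.add(key);   // report instead of silently dropping
            } else {
                values.put(node, value);
            }
        }
        return new Mapping(values, unmatched);
    }
}
```

For example, mapping `S100A8,4.2` and `XYZ1,0.5` onto a network containing `S100A8` attaches 4.2 to that node and reports `XYZ1` as unmatched.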

Network modeling
The back-end structure is a computational representation of a mathematical graph in which concepts are the nodes and relations are the edges. Each concept represents a real-world entity with specific properties and characteristics; relations represent how the concepts are related to each other. The graph visualization is based on the open-source library JUNG³ (Java Universal Network/Graph Framework).
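A minimal in-memory version of such a concept-relation graph is sketched below. It uses plain Java collections rather than JUNG so that it is self-contained, and the `Concept` and `Relation` types and their fields are hypothetical illustrations, not VANESA's actual classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BioGraph {

    // A real-world entity (hypothetical fields), e.g. id "P1", label "S100A8", type "protein".
    record Concept(String id, String label, String type) {}

    // A directed relation between two concepts, e.g. of type "activation" or "binding".
    record Relation(String from, String to, String type) {}

    private final Map<String, Concept> concepts = new LinkedHashMap<>();
    private final List<Relation> relations = new ArrayList<>();

    void addConcept(Concept c) {
        concepts.put(c.id(), c);
    }

    // Adds a relation; both endpoints must already be known concepts.
    void addRelation(Relation r) {
        if (!concepts.containsKey(r.from()) || !concepts.containsKey(r.to())) {
            throw new IllegalArgumentException("Unknown endpoint in " + r);
        }
        relations.add(r);
    }

    // Ids of all concepts connected to the given one, ignoring edge direction.
    Set<String> neighbors(String id) {
        Set<String> result = new LinkedHashSet<>();
        for (Relation r : relations) {
            if (r.from().equals(id)) {
                result.add(r.to());
            }
            if (r.to().equals(id)) {
                result.add(r.from());
            }
        }
        return result;
    }

    int conceptCount() {
        return concepts.size();
    }

    int relationCount() {
        return relations.size();
    }
}
```

Keeping concepts in a map keyed by id makes it cheap to validate relation endpoints; a graph library such as JUNG additionally provides indexed adjacency and layout support.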
For simulation, the biological network has to be transformed into a Petri net. For this purpose, the extended Petri net formalism xHPN is used [23,24], a powerful mathematical modeling concept adapted to the demands of biological processes. According to [23,24], each node in a biological network is replaced by a place, and each edge connecting two elements is replaced by a transition.
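The node-to-place and edge-to-transition rule can be sketched as follows. This is a simplified illustration assuming directed edges and unit arc weights; it does not reproduce the full xHPN mapping in VANESA, and the transition naming scheme `t_<source>_<target>` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PetriNetTranslator {

    record Edge(String source, String target) {}

    // An arc from a place to a transition or from a transition to a place.
    record Arc(String from, String to, double weight) {}

    record PetriNet(Set<String> places, Set<String> transitions, List<Arc> arcs) {}

    // Every node becomes a place; every edge a -> b becomes a transition t
    // with an input arc a -> t and an output arc t -> b (unit weights assumed).
    static PetriNet translate(Set<String> nodes, List<Edge> edges) {
        Set<String> places = new LinkedHashSet<>(nodes);
        Set<String> transitions = new LinkedHashSet<>();
        List<Arc> arcs = new ArrayList<>();
        for (Edge e : edges) {
            if (!places.contains(e.source()) || !places.contains(e.target())) {
                throw new IllegalArgumentException("Edge refers to unknown node: " + e);
            }
            String t = "t_" + e.source() + "_" + e.target();
            if (transitions.add(t)) {   // duplicate edges yield no extra transition
                arcs.add(new Arc(e.source(), t, 1.0));
                arcs.add(new Arc(t, e.target(), 1.0));
            }
        }
        return new PetriNet(places, transitions, arcs);
    }
}
```

A chain A → B → C thus becomes three places, two transitions, and four arcs; stoichiometric weights or other arc types would be set afterwards.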
Places represent biological compounds such as metabolites, enzymes, and genes. Transitions represent biological processes such as biochemical reactions, metabolic reactions, and interactions. The marking of a Petri net describes abstract biological concentrations such as the amount of molecules or cells. Every place can be assigned minimum and maximum capacities. Regular arcs connect biological compounds and processes, test arcs describe activation processes, and read arcs describe catalytic processes. Each arc can be weighted with coefficients, such as stoichiometric coefficients. Moreover, each biological process can have a delay. To model and simulate biological processes of random duration, hazard functions can be assigned, yielding stochastic kinetics. Maximum speeds of biological processes, such as those given by kinetic laws, can also be assigned.
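How arc types, arc weights, and capacities constrain a transition can be sketched for the discrete case. The semantics here are a simplification assumed for illustration: a regular input arc requires and consumes `weight` tokens, a test arc only requires the marking to reach `weight` without consuming tokens, and a firing is rejected if any place would leave its `[min, max]` capacity. The exact xHPN rules (strict versus non-strict thresholds, read arcs, conflict resolution) may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class DiscreteTransition {

    enum ArcType { REGULAR, TEST }

    record InArc(String place, double weight, ArcType type) {}

    record OutArc(String place, double weight) {}

    // Minimum and maximum token capacity of a place.
    record Capacity(double min, double max) {}

    // Fires the transition once if it is enabled and returns the new marking;
    // returns Optional.empty() if an input condition or a capacity bound is violated.
    static Optional<Map<String, Double>> tryFire(Map<String, Double> marking,
                                                 Map<String, Capacity> capacities,
                                                 List<InArc> inputs, List<OutArc> outputs) {
        Map<String, Double> next = new HashMap<>(marking);
        for (InArc a : inputs) {
            // Both arc types require enough tokens; only regular arcs consume them (assumption).
            if (marking.getOrDefault(a.place(), 0.0) < a.weight()) {
                return Optional.empty();
            }
            if (a.type() == ArcType.REGULAR) {
                next.merge(a.place(), -a.weight(), Double::sum);
            }
        }
        for (OutArc a : outputs) {
            next.merge(a.place(), a.weight(), Double::sum);
        }
        // Reject firings that push any place outside its [min, max] capacity.
        for (Map.Entry<String, Double> e : next.entrySet()) {
            Capacity c = capacities.get(e.getKey());
            if (c != null && (e.getValue() < c.min() || e.getValue() > c.max())) {
                return Optional.empty();
            }
        }
        return Optional.of(next);
    }
}
```

In a small activation example with substrate `S`, activator `Act`, and product `P` (capacity at most 1), the first firing moves one token from `S` to `P` and leaves `Act` untouched; a second firing is blocked by the capacity of `P`.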

Network reconstruction
Biological databases are important resources for scientists, as they provide data and knowledge from literature, experiments, and several analysis techniques. This knowledge can be used to explain biological systems and cell behavior, from the genetic level up to the entire metabolism. Based on the web service described in section 3, a variety of networks can be loaded. Using the database search panel, users can perform an adjustable depth search on the integrated databases for biological elements that fully or partially match a given biological definition, name, or identifier, and can restrict the search to a specific organism.
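An adjustable depth search of this kind can be sketched as a breadth-first expansion that stops at a user-defined depth. The `Map`-based partner lookup is an assumption standing in for the database queries that VANESA sends through the web service.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DepthSearch {

    // Returns every element reachable from start within maxDepth steps,
    // mapped to the depth at which it was first found.
    static Map<String, Integer> expand(Map<String, List<String>> partners, String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = depth.get(current);
            if (d == maxDepth) {
                continue;   // do not expand beyond the requested search depth
            }
            // Hypothetical lookup of interaction partners (a database query in practice).
            for (String next : partners.getOrDefault(current, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }
}
```

Breadth-first order guarantees that each element is recorded with its smallest distance from the query element, which is what a depth limit is meant to bound.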
However, information stored in databases is distributed over many separate tables. To reconstruct a biological network, links from one biological compound to another have to be established. This is done piece by piece until a certain pathway map is complete or a given search depth is reached. In VANESA, the KEGG database is primarily used to reconstruct metabolic pathways, and the BRENDA database can also be used to reconstruct metabolic networks. For each queried enzyme, substrate, or product, a reaction list containing all involved biological elements is created. Information on reversibility and the type of connection is also considered. Currency metabolites, such as ATP, H₂O, and CO₂, mainly act as carriers for transferring electrons and other functional groups. They can optionally be hidden or disregarded in metabolic pathways, since structural analysis with connections through currency metabolites may produce meaningless results.
To address this problem, top-ranked metabolites can be excluded by the user: VANESA ranks metabolites by their connection degree and lets users disregard all or only selected metabolites identified as currency metabolites. Furthermore, the databases MINT, IntAct, and HPRD can be used to reconstruct protein-protein interaction networks. Finally, users can choose whether binary and complex interactions should be included in or excluded from these models.
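Ranking metabolites by connection degree and removing a chosen subset could be sketched as follows. The undirected edge-list input and the alphabetical tie-breaking are illustrative assumptions; in VANESA the user decides which of the ranked metabolites to disregard.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CurrencyMetabolites {

    record Edge(String a, String b) {}

    // Number of incident edges per metabolite (undirected).
    static Map<String, Integer> degrees(List<Edge> edges) {
        Map<String, Integer> deg = new HashMap<>();
        for (Edge e : edges) {
            deg.merge(e.a(), 1, Integer::sum);
            deg.merge(e.b(), 1, Integer::sum);
        }
        return deg;
    }

    // Metabolites sorted by descending degree; ties broken alphabetically for a stable ranking.
    static List<String> rank(List<Edge> edges) {
        Map<String, Integer> deg = degrees(edges);
        List<String> nodes = new ArrayList<>(deg.keySet());
        nodes.sort(Comparator.comparing((String n) -> deg.get(n), Comparator.reverseOrder())
                .thenComparing(Comparator.naturalOrder()));
        return nodes;
    }

    // Drops every edge that touches one of the excluded metabolites.
    static List<Edge> exclude(List<Edge> edges, Set<String> excluded) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (!excluded.contains(e.a()) && !excluded.contains(e.b())) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

In a small glycolysis fragment where ATP takes part in most reactions, ATP ends up at the top of the ranking, and excluding it leaves only the direct G6P-F6P connection.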

Network analysis
Using graph theory, several important network structures can be identified. Centrality measures, for example, can point out important actors or paths within a network. Several approaches have used degree measurement to identify essential elements within a biological network. A study on Saccharomyces cerevisiae revealed that proteins with high degree centrality are more essential than others [28]. Other studies, such as that of Hahn et al. [29], reported similar findings using degree centrality.
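Degree centrality, as used in these studies, counts the edges incident to each node. The sketch below normalizes it by `n - 1` (a common convention assumed here) and returns the nodes with maximal centrality, which are the ones a tool would highlight; it assumes an undirected simple graph whose edge endpoints all appear in the node set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DegreeCentrality {

    record Edge(String a, String b) {}

    // Normalized degree centrality deg(v) / (n - 1) for an undirected simple graph.
    static Map<String, Double> compute(Set<String> nodes, List<Edge> edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (String v : nodes) {
            degree.put(v, 0);   // isolated nodes get centrality 0
        }
        for (Edge e : edges) {
            degree.merge(e.a(), 1, Integer::sum);
            degree.merge(e.b(), 1, Integer::sum);
        }
        int n = nodes.size();
        Map<String, Double> centrality = new HashMap<>();
        degree.forEach((v, d) -> centrality.put(v, n > 1 ? d / (double) (n - 1) : 0.0));
        return centrality;
    }

    // Nodes sharing the maximal centrality value (the "hubs").
    static Set<String> hubs(Map<String, Double> centrality) {
        double max = centrality.values().stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
        Set<String> result = new TreeSet<>();
        centrality.forEach((v, c) -> {
            if (c == max) {
                result.add(v);
            }
        });
        return result;
    }
}
```

Normalizing by `n - 1` keeps values between 0 and 1, so networks of different sizes can be compared.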
A set of algorithms was implemented to let users apply centrality measures in VANESA. Users can compute different types of local and global network properties [30]. Local network properties focus on node-specific characteristics, such as a node's degree, so that each node can be highlighted individually. Global network properties consider the complete structure of the network: each of them reduces the network to a single number, so that it can be surveyed and compared easily with other networks. The properties focus on different structural aspects of the network. Besides the number of different node degrees, the largest, smallest, and average node degree can be computed. For a deeper view into the interactions, the average neighbor degree [31] can be considered, since it measures the average degree of a node's neighbors rather than that of the node alone. To classify a network's overall topology, the density [30], centralization, and global matching index [31] can be determined. The density describes the network's edge-to-node ratio, whereas the centralization measures the homogeneity of node degrees. The global matching index describes the average number of common neighbors over all node pairs. Finally, the network's path structure can be characterized by the average shortest path length and by highlighting shortest paths between a pair of nodes.
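Several of these global properties can be computed in a few lines for an undirected, unweighted graph. The formulas below are common textbook definitions assumed for illustration: density as 2m/(n(n-1)), Freeman degree centralization, the mean degree of a node's neighbors, and the mean shortest-path length over all mutually reachable node pairs. VANESA's exact definitions and normalizations may differ, and the global matching index is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class GlobalProperties {

    private final Map<String, Set<String>> adj = new TreeMap<>();

    void addEdge(String u, String v) {
        adj.computeIfAbsent(u, k -> new TreeSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new TreeSet<>()).add(u);
    }

    int nodeCount() {
        return adj.size();
    }

    int edgeCount() {
        int sum = 0;
        for (Set<String> nbrs : adj.values()) {
            sum += nbrs.size();
        }
        return sum / 2;   // each undirected edge is stored twice
    }

    // Fraction of possible undirected edges present: 2m / (n(n-1)).
    double density() {
        int n = nodeCount();
        return n < 2 ? 0.0 : 2.0 * edgeCount() / ((double) n * (n - 1));
    }

    // Freeman degree centralization: sum(maxDeg - deg) / ((n-1)(n-2)); 1 for a star, 0 if all degrees are equal.
    double centralization() {
        int n = nodeCount();
        if (n < 3) {
            return 0.0;
        }
        int max = adj.values().stream().mapToInt(Set::size).max().orElse(0);
        int sum = 0;
        for (Set<String> nbrs : adj.values()) {
            sum += max - nbrs.size();
        }
        return sum / ((double) (n - 1) * (n - 2));
    }

    // Mean degree of the neighbors of the given node.
    double averageNeighborDegree(String node) {
        Set<String> nbrs = adj.getOrDefault(node, Set.of());
        if (nbrs.isEmpty()) {
            return 0.0;
        }
        double total = 0;
        for (String v : nbrs) {
            total += adj.get(v).size();
        }
        return total / nbrs.size();
    }

    // Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes.
    double averageShortestPath() {
        long total = 0;
        long pairs = 0;
        for (String s : adj.keySet()) {
            Map<String, Integer> dist = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>();
            dist.put(s, 0);
            queue.add(s);
            while (!queue.isEmpty()) {
                String u = queue.poll();
                for (String v : adj.get(u)) {
                    if (dist.putIfAbsent(v, dist.get(u) + 1) == null) {
                        queue.add(v);
                    }
                }
            }
            for (int d : dist.values()) {
                if (d > 0) {
                    total += d;
                    pairs++;
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) total / pairs;
    }
}
```

For a star with one center and three leaves, these definitions give a density of 0.5, a centralization of 1.0, and an average shortest path length of 1.5.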
doi:10.2390/biecoll-jib-2014-239
With a parallel coordinate plot [32], the implemented network properties can be visualized and examined in the overall context, as presented in [33]. This allows a numerical comparison between sets of different networks. VANESA additionally offers the possibility to generate random, regular, bipartite, connected, and Hamiltonian graphs with a given number of nodes and edges. The graphs can be directed or undirected, and optionally weighted.
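Generating a random graph with a given number of nodes and edges can be done with the classic G(n, m) model: distinct node pairs are drawn uniformly until m edges exist. The sketch below assumes an undirected simple graph without self-loops and takes a seed for reproducibility; it is not necessarily the generator VANESA uses.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

final class RandomGraphs {

    // Undirected edge stored with u < v so that (u, v) and (v, u) coincide.
    record Edge(int u, int v) {}

    // Uniformly random undirected simple graph with n nodes and m edges (G(n, m) model).
    static Set<Edge> gnm(int n, int m, long seed) {
        long maxEdges = (long) n * (n - 1) / 2;
        if (m < 0 || m > maxEdges) {
            throw new IllegalArgumentException("m must be between 0 and " + maxEdges);
        }
        Random rnd = new Random(seed);
        Set<Edge> edges = new LinkedHashSet<>();
        while (edges.size() < m) {
            int u = rnd.nextInt(n);
            int v = rnd.nextInt(n);
            if (u == v) {
                continue;   // reject self-loops
            }
            // Normalized order; duplicates are rejected by the set.
            edges.add(new Edge(Math.min(u, v), Math.max(u, v)));
        }
        return edges;
    }
}
```

Rejection sampling keeps the code short; for graphs close to complete, shuffling the list of all possible pairs and taking the first m would be faster.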
Based on the implemented algorithms, biomedical networks can be analyzed in various ways and dimensions. The approaches enable structural as well as individual node analysis. Especially in dense graphs, which are hard to navigate visually, important elements can be made visually accessible (see Figure 2). Depending on the user's needs, different implemented graph layouts can be used to realign the network for a more convenient visualization.

Simulation
For simulation, the Petri net needs to be translated into the object-oriented modeling language Modelica. A Modelica compiler (OpenModelica [34]) performs the Petri net simulation using PNlib, which is also written in Modelica. When the simulation is finished, the results are mapped onto the network and made visible (see Figure 3). This communication bridge between VANESA and the Modelica compiler runs invisibly in the background.
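VANESA delegates the numerical work to OpenModelica and PNlib. To illustrate what a continuous Petri net simulation computes, the following stand-in integrates a single continuous transition A → B with explicit Euler steps, assuming a mass-action speed `k * A`. It is not PNlib's semantics or solver, merely a sketch of how token amounts evolve over time.

```java
final class ContinuousPetriNet {

    // Simulates the continuous transition A --(speed k*A)--> B with explicit Euler steps
    // (a stand-in for the OpenModelica/PNlib solver); returns {A, B} at time tEnd.
    static double[] simulate(double a0, double b0, double k, double dt, double tEnd) {
        double a = a0;
        double b = b0;
        int steps = (int) Math.round(tEnd / dt);
        for (int i = 0; i < steps; i++) {
            double flow = k * a * dt;       // tokens moved by the transition in this step
            flow = Math.min(flow, a);       // the input place cannot become negative
            a -= flow;
            b += flow;
        }
        return new double[] {a, b};
    }
}
```

With a small step size the amount in A follows the analytical solution A(t) = A0·e^(-kt), and the total number of tokens is conserved because every token leaving A arrives in B.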
Simulation results are available as tables or charts and can be exported as JPEG files. If desired, they can be animated within the graphical user interface. The animation is interactive and can be performed for any given time interval. During the animation, the nodes change their size and color depending on the number of tokens: if the number of tokens increases, the node grows and is colored red; if it decreases, the node shrinks and is colored blue. Thus, users can intuitively recognize system state changes and information flow within the reconstructed models.
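The mapping from token changes to node appearance can be sketched as a small function. The scaling rule (size factor equal to the ratio of current to initial tokens, clamped to [0.5, 2]) and the neutral gray for unchanged places are assumptions chosen for illustration; only the red-for-increase and blue-for-decrease convention comes from the description above.

```java
final class TokenAnimation {

    record NodeStyle(double scale, String color) {}

    // Maps the current token amount of a place, relative to its initial amount,
    // to a size factor and a color: red if tokens increased, blue if they decreased.
    static NodeStyle style(double initialTokens, double currentTokens) {
        double ratio = initialTokens > 0
                ? currentTokens / initialTokens
                : (currentTokens > 0 ? 2.0 : 1.0);   // empty start: grow to the maximum or stay unchanged
        double scale = Math.max(0.5, Math.min(2.0, ratio));   // clamp so nodes stay readable
        String color;
        if (currentTokens > initialTokens) {
            color = "red";
        } else if (currentTokens < initialTokens) {
            color = "blue";
        } else {
            color = "gray";   // assumed neutral color for unchanged places
        }
        return new NodeStyle(scale, color);
    }
}
```

Clamping the size factor prevents a single strongly accumulating place from dominating the network view during the animation.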

Application case
VANESA, with its various functionalities, has proved very useful in a wide range of application cases. Using VANESA, scientists have been able to model and simulate intracellular molecular mechanisms in a variety of research areas, such as cell-to-cell communication in quorum sensing [35], cardiovascular diseases [36,37], and cholesteatoma profiles [38]. With this tool, they were able to extend and deepen their knowledge and, moreover, were motivated to perform further experiments based on the molecular insights gained with VANESA. To give an impression of such an application case, this section briefly presents the cholesteatoma investigations, in which the software application served as a valuable tool for medicine.
In a clinical trial, the middle-ear disease cholesteatoma was examined in a series of experiments on human samples. Cholesteatoma is a potentially life-threatening middle-ear disease [39]. To identify novel cholesteatoma-related genes, scientists from medicine, molecular biology, neurobiology, and bioinformatics investigated the disease in detailed experimental molecular studies, in which VANESA was used as the major bioinformatics tool for biological system modeling and analysis. The aim was to identify and reconstruct protein-protein interaction networks that might describe the transition from a healthy system to an altered one that causes the development of cholesteatoma.
At the onset of this study, genes differentially expressed in human cholesteatoma compared with healthy external auditory canal skin were investigated in microarray experiments. The most significant differentially expressed genes within the selected human samples were identified with an in-house analysis pipeline based on the R statistical software, which includes some of the most important Bioconductor⁴ software packages; its output served as the basis for the reconstruction of the biological networks in VANESA. The starting point for the network reconstruction was a list of about 20 hand-selected genes showing a strong differential expression pattern in cholesteatoma. For each selected gene, protein-protein interaction and signaling networks were reconstructed automatically with information from the databases IntAct [18], HPRD [17], and MINT [20]. This resulted in a set of biological networks containing the direct interaction partners and other nearby biological elements. In general, each network contains between 15 and 200 biological elements. Based on these networks, medical scientists compared and investigated the different reconstructed networks in VANESA to identify significant regulatory motifs and structures. For this purpose, the network comparison function of VANESA was used, which highlighted similarities and differences between the analyzed networks. Once the relevant elements and structures had been identified, the networks were reduced to the relevant parts and merged into a global network containing the significant structures and elements of the initial networks.
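Comparing two reconstructed networks and merging them can be sketched with plain set operations on their edge sets. This assumes nodes are identified by gene or protein name and edges are undirected; the class and method names are hypothetical, and VANESA's comparison view is interactive rather than a single function call.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class NetworkComparison {

    record Edge(String a, String b) {
        // Normalize so that (x, y) and (y, x) denote the same undirected edge.
        Edge {
            if (a.compareTo(b) > 0) {
                String tmp = a;
                a = b;
                b = tmp;
            }
        }
    }

    record Comparison(Set<Edge> shared, Set<Edge> onlyFirst, Set<Edge> onlySecond) {}

    // Similarities (shared edges) and differences (edges unique to either network).
    static Comparison compare(Set<Edge> first, Set<Edge> second) {
        Set<Edge> shared = new LinkedHashSet<>(first);
        shared.retainAll(second);
        Set<Edge> onlyFirst = new LinkedHashSet<>(first);
        onlyFirst.removeAll(second);
        Set<Edge> onlySecond = new LinkedHashSet<>(second);
        onlySecond.removeAll(first);
        return new Comparison(shared, onlyFirst, onlySecond);
    }

    // Union of several (already reduced) networks into one global network.
    @SafeVarargs
    static Set<Edge> merge(Set<Edge>... networks) {
        Set<Edge> merged = new LinkedHashSet<>();
        for (Set<Edge> network : networks) {
            merged.addAll(network);
        }
        return merged;
    }
}
```

Normalizing edge direction in the record's constructor ensures that an interaction reported as S100A9-S100A8 by one database and S100A8-S100A9 by another is recognized as the same edge.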
In the next step, the microarray results were mapped onto the analyzed and filtered networks using the microarray fold-change import function of VANESA. Each node was colored according to its fold change, showing its regulatory effect in the system. The graph theoretical environment, with its degree centrality measurements, highlighted the most significant elements. One example of such a network is presented in Figure 2. This signaling network represents the S100 interaction network: proteins are the nodes, and the edges are protein activations and inactivations, such as phosphorylation and dephosphorylation. The microarray expression level is shown by the color of each node. The network shows the correlation of the up-regulated S100A7, S100A8, and S100A9 genes, as well as the ILK, IκB, USF2, and ARRB2 genes.
Using these reconstructed and analyzed models, scientists were able to identify genes potentially involved in cholesteatoma development, as well as previously unknown regulatory motifs and structures. They investigated all elements within the network that showed a high fold change and were closely connected in the reconstructed signaling pathways. This step was performed in the network visualization pane of VANESA, where the scientists could examine the reconstructed systems visually and interactively. After the most promising motifs had been selected, genes within these regulatory structures were further analyzed with real-time PCR to verify that the reconstructed regulatory networks were correct.
In summary, the study revealed a set of previously unknown genes potentially involved in cholesteatoma development. After analyzing and relating all results, it was possible to demonstrate that the expression profile of cholesteatoma is similar to that of a metastatic tumor and of chronically inflamed tissue. Furthermore, the reconstructed biological networks, which include cholesteatoma-regulated transcripts, are a valuable new framework for drug targeting and therapy development. To the best of our knowledge, these regulatory networks enriched with experimental findings are the first that could help scientists from medicine and biology identify molecular switches turning a healthy system into an unhealthy one. Further biological analyses of these networks are ongoing and have already yielded new findings, which are under experimental investigation.

Conclusions
To deepen knowledge of a biological system, it is necessary to generate hypotheses and test them. For this purpose, the biological system has to be modeled, the model has to be analyzed and simulated, and the simulation results have to be analyzed.
Existing applications cover only some of these processes. We therefore developed VANESA, which is unique in its ability to reconstruct, visualize, analyze, and simulate biological systems in one workflow. Users can semi-automatically reconstruct entire molecular interaction systems in the form of biological networks. Instead of collecting, transforming, normalizing, and linking information from distributed and heterogeneous data sources, research objects can be reconstructed automatically using data from the data warehouse DAWIS-M.D. The resulting biological models enable scientists to focus on complex interactions and/or to investigate the role of individual components and processes within entire biological systems, both through graph theoretical analysis and through simulation of the network.
For the simulation of biological processes, users need neither kinetic data nor knowledge of differential equations or programming. VANESA provides an easy-to-use graphical user interface in which biological networks can be modeled and simulated using PNlib. Simulations can be performed using qualitative, stochastic, continuous, hybrid, and functional Petri nets of the xHPN formalism.
Furthermore, using the export function (SBML format in particular) of VANESA, it is possible to share results and models with other applications.Finally, VANESA has already been proven very useful in a wide range of application cases [35,36,37,23,24,38].

Future perspectives
So far, OpenModelica does not support all features defined in the xHPN formalism. Thus, only continuous functional Petri nets including inhibitory arcs are considered for simulation. In general, we are working on complete support of the xHPN formalism, with a focus on discrete elements.
Moreover, we intend to apply and parallelize mathematical models and computational techniques to find new functionally relevant building blocks in complex networks. Since these approaches are computationally intensive, one goal is to run them as parallel algorithms on an existing cluster.

Figure 1: An overview of VANESA's aims and objectives [7]. The numbers 1 to 7 represent the functionalities of VANESA to model, simulate, analyze, and share biological models. Each number represents a different bioinformatics approach.

Figure 2: Degree centrality measurement in a protein-protein interaction network (see Section 4) in VANESA. Nodes with the most incident edges are highlighted; nodes with the same vertex degree share the same color.

Figure 3: Simulation results of the transcriptionally regulated lac operon system of the bacterium Escherichia coli in VANESA. The example simulates the behavior of the bacterium in response to decreasing glucose and increasing lactose in the cell environment. The charts show the dynamics of involved biological elements such as lactose, glucose, and the lacZ gene.

Table 1: Comparison of existing software applications concerning necessary features for the modeling and analysis of biological systems. Legend: ++ (strong), + (good), o (sufficient), - (weak), p (only available as plug-in/extension).