This paper surveys biological data integration and data warehousing, which has become a major focus of data integration research in the last few decades. The challenges in biological data integration stem from several factors, such as the variety and amount of available data, the heterogeneity of the data in different sources, and the autonomy and differing capabilities of the sources. This paper gives insight into a small selection of important biological databases and the problems in biological data integration. We focus on data warehouses, which have become a popular approach in bioinformatics and the life sciences. We also introduce major existing integration systems such as SRS, DiscoveryLink, BioWarehouse and ONDEX. Finally, this paper presents an in-house data warehouse approach for biological data.
Petri net methodology has been widely used for modeling and
simulation of biological systems due to its intuitive graphical
representation and mathematical power for concurrent systems. In this
article, some of the major problems arising in this field are addressed,
ongoing progress is discussed, and possible solutions are suggested.
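As an illustration of the Petri net formalism mentioned above, the following is a minimal sketch of the token game for a place/transition net. The enzyme reaction (places `E`, `S`, `ES`, `P` and transitions `bind`, `convert`) is a hypothetical toy example chosen for this sketch and is not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """A transition with weighted input and output arcs (hypothetical helper type)."""
    name: str
    inputs: dict   # place name -> number of tokens consumed
    outputs: dict  # place name -> number of tokens produced


def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking, t):
    """Return the marking reached by firing t; the given marking is not modified."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


# Toy net (assumption): enzyme E binds substrate S, then releases product P.
bind = Transition("bind", {"E": 1, "S": 1}, {"ES": 1})
convert = Transition("convert", {"ES": 1}, {"E": 1, "P": 1})

m0 = {"E": 1, "S": 2, "ES": 0, "P": 0}
m1 = fire(m0, bind)      # E and one S are consumed, ES is produced
m2 = fire(m1, convert)   # ES is consumed, E is released and P is produced
```

Returning a new marking instead of mutating the old one keeps each simulation step easy to inspect, and while the single enzyme token sits in `ES`, `bind` is not enabled and cannot fire again.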
Control of cell proliferation, differentiation, activation and cell removal is crucial for the development and existence of multi-cellular organisms. Apoptosis, or programmed cell death, is a major control mechanism by which cells die and is also important in controlling cell number and proliferation as part of normal development. Molecular networks that regulate these processes are critical targets for drug development, gene therapy, and metabolic engineering. The molecular interactions involved in this and other processes are analyzed and annotated by experts and stored as data in different databases. The key task is to integrate, manage and visualize these data available from different sources and present them in a user-comprehensible manner.
Here we present VINEdb, a data warehouse developed to interact with and explore integrated life science data. Its extensible open-source data warehouse architecture enables platform-independent use of the web application and the underlying infrastructure. A monitor component that controls and updates the data from the sources ensures a high degree of transparency and up-to-dateness. Furthermore, a visualization component allows interactive graphical exploration of the integrated data. We use the apoptotic pathway and caspase-3 as a case study to show the capability and usability of our approach. VINEdb is available at http://tunicata.techfak.unibielefeld.de/VINEdb/.
A significant fraction of cellular proteins undergoes reversible thiol-dependent redox transitions, which often control or switch protein functions. Thioredoxins and glutaredoxins are two key players in this redox regulatory protein network. Both interact with various categories of proteins containing reversibly oxidized cysteinyl residues. Identifying thioredoxin/glutaredoxin target proteins is a critical step in constructing the redox regulatory network of cells or subcellular compartments. Because thioredoxin/glutaredoxin target protein records are scarce in public databases, a tool called Reversibly Oxidized Cysteine Detector (ROCD) is implemented here to identify potential thioredoxin/glutaredoxin target proteins computationally, so that the in silico construction of a redox regulatory network may become feasible. ROCD was tested on 46 thioredoxin target proteins in plant mitochondrion, and the recall rate was 66.7% when 50% sequence identity was chosen for structural model selection. ROCD will be used to predict the thioredoxin/glutaredoxin target proteins in human liver mitochondrion for our redox regulatory network construction project. ROCD will be developed further to provide more reliable predictions and will be incorporated into biological network visualization tools as a node prediction component. This work will advance the capability of traditional database- or text mining-based methods for network construction.
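For readers unfamiliar with the recall rate quoted above, here is a minimal sketch of how recall is conventionally computed: the fraction of known target proteins that a predictor recovers. The function name `recall` and the protein identifiers are hypothetical toy values; they are not ROCD's code or evaluation data.

```python
def recall(known_targets, predicted_targets):
    """Fraction of known targets that also appear among the predictions."""
    known = set(known_targets)
    if not known:
        raise ValueError("recall is undefined without known targets")
    true_positives = known & set(predicted_targets)
    return len(true_positives) / len(known)


# Toy identifiers (assumption), not real evaluation data.
known = {"protA", "protB", "protC", "protD"}
predicted = {"protA", "protB", "protD", "protX"}

toy_recall = recall(known, predicted)  # 3 of the 4 known targets are recovered
```

Recall ignores false positives such as `protX` in this toy set, so it is usually reported alongside a measure such as precision.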
For the implementation of the virtual cell, the fundamental question is how to model and simulate complex biological networks. Biological data integration, based on relevant molecular databases and information systems, is therefore an essential step in constructing biological networks. In this paper, we motivate three applications: BioDWH, an integration toolkit for building life science data warehouses; CardioVINEdb, an information system for biological data on cardiovascular disease; and VANESA, a network editor for modeling and simulation of biological networks. Based on this integration process, the system supports the generation of biological network models. A case study of a gene-regulated biological network related to cardiovascular disease is also presented.
One of the major challenges in bioinformatics is to integrate and manage data from different sources, including experimental microarray data, and to present them in a user-friendly format. Therefore, we present CardioVINEdb, a data warehouse approach developed to interact with and explore life science data. The data warehouse architecture provides a platform-independent web interface that can be used with any common web browser. A monitor component controls and updates the data from the different sources to guarantee up-to-dateness. In addition, the system provides a “static” and a “dynamic” visualization component for interactive graphical exploration of the data.
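The abstract does not describe how the monitor component detects changes in a source. One common approach, shown here purely as an assumption, is to compare a content digest of each source file with the digest recorded at the last import. The helper names `digest` and `sources_needing_update` and the source names are hypothetical.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of a source file's raw content."""
    return hashlib.sha256(payload).hexdigest()


def sources_needing_update(current: dict, recorded: dict) -> list:
    """Names of sources whose content differs from the digest stored at last import.

    current:  source name -> raw bytes fetched now
    recorded: source name -> hex digest stored at the last import
    """
    return sorted(
        name for name, payload in current.items()
        if recorded.get(name) != digest(payload)
    )


# Toy data (assumption): one unchanged source, one changed, one never imported.
recorded = {
    "source_a": digest(b"release 1"),
    "source_b": digest(b"release 1"),
}
current = {
    "source_a": b"release 1",
    "source_b": b"release 2",
    "source_c": b"release 1",
}

stale = sources_needing_update(current, recorded)
```

A source with no recorded digest is treated as stale, so newly added sources are imported on the next run.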
This paper presents a novel bioinformatics data warehouse software kit that integrates biological information from multiple public life science data sources into a local database management system. It stands out from other approaches by providing up-to-date integrated knowledge, platform and database independence as well as high usability and customization. This open source software can be used as a general infrastructure for integrative bioinformatics research and development. The advantages of the approach are realized by using a Java-based system architecture and object-relational mapping (ORM) technology. Finally, a practical application of the system is presented within the emerging area of medical bioinformatics to show the usefulness of the approach.
The BioDWH data warehouse software is available for the scientific community at http://sourceforge.net/projects/biodwh/.