In biological research, Saccharomyces cerevisiae yeast cells are used to study the behaviour of proteins. This analysis is time-consuming and not completely objective. Hence, image analysis platforms have been developed to address these problems and to offer analysis per cell as well. The robust segmentation algorithms implemented in such platforms enable us to apply a machine learning approach to the measured cells. Such an approach is based on a set of relevant individual cell features extracted from the microscope images of the yeast cells. In this paper, we compose a set of features that represent the intensity and morphology characteristics in a more sophisticated way. These features are based on first- and second-order histograms and wavelet-based texture measurements. To show the discriminative power of these features, we built a classification model to discriminate between different groups. The building process involved the evaluation of a set of classification systems, data sampling techniques, data normalization schemes and attribute selection algorithms. The results show a significant ability to discriminate between different cell strains and conditions, which in turn reveals the benefits of a classification model based on the introduced features. This model is promising for revealing subtle patterns in future high-throughput yeast studies.
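As an illustration of the first-order histogram features mentioned above, the following minimal sketch computes a few common statistics from the grey values of one segmented cell. The function name, the choice of statistics and the input format (a flat list of integer intensities) are assumptions made for this example; the abstract does not specify the exact feature definitions used in the study.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Compute first-order histogram statistics for one segmented cell.

    `pixels` is a flat list of integer grey values belonging to the cell.
    The feature set below is illustrative, not the one used in the paper.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("cell region contains no pixels")

    # Normalised histogram: probability of each grey level that occurs.
    probs = {grey: count / n for grey, count in Counter(pixels).items()}

    mean = sum(g * p for g, p in probs.items())
    variance = sum((g - mean) ** 2 * p for g, p in probs.items())
    std = math.sqrt(variance)

    # Skewness is undefined for a perfectly flat region; report 0.0 then.
    if std > 0:
        skewness = sum((g - mean) ** 3 * p for g, p in probs.items()) / std ** 3
    else:
        skewness = 0.0

    entropy = -sum(p * math.log2(p) for p in probs.values())
    energy = sum(p * p for p in probs.values())

    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "entropy": entropy,
        "energy": energy,
    }


if __name__ == "__main__":
    # Toy cell with two equally frequent grey levels.
    print(first_order_features([0, 0, 2, 2]))
```

A feature vector of this kind, computed per cell, is the sort of input that would then be normalized and passed to the attribute selection and classification steps.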
Integration of information is essential to make use of the wealth of bioinformatics resources. One aspect of integration is to make databases interoperable through well-annotated information. New databases typically aim to store complementary information, which results in collections of heterogeneous information systems. Concepts in these databases need to be connected, and ontologies typically provide a common terminology to share information among different resources.
Our research focuses on the zebrafish, and we have developed several information systems in which ontologies are crucial. Pivotal is an ontology describing the developmental anatomy, referred to as the Developmental Anatomy Ontology of Zebrafish (DAOZ). The anatomical and temporal concepts are provided by the Zebrafish Information Network (ZFIN) and are proven within the research community. We have constructed a 3D digital atlas of zebrafish development based on histology; the atlas is a series of volumetric models in which every volume element is assigned an anatomical term. Complementing the atlas, we developed an information system with 3D patterns of gene expression in zebrafish development based on marker genes. The spatial and temporal annotations of these 3D images are drawn from the ontology that we have designed. The DAOZ is structured as a Directed Acyclic Graph (DAG). This structure is required to find unique concept paths and to prevent self-referencing.
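The role of the DAG constraint can be sketched in a few lines: a check that the relations contain no cycle (which also rules out self-reference), followed by the enumeration of concept paths from a term up to the root. The concept names and the child-to-parents dictionary below are hypothetical and chosen only for illustration; they are not taken from the DAOZ itself.

```python
def has_cycle(parents):
    """Return True if the child -> parents relation contains a cycle.

    A concept that lists itself as a parent counts as a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    state = {node: WHITE for node in nodes}

    def visit(node):
        state[node] = GREY
        for parent in parents.get(node, ()):
            if state[parent] == GREY:
                return True  # reached a concept already on the current path
            if state[parent] == WHITE and visit(parent):
                return True
        state[node] = BLACK
        return False

    return any(state[node] == WHITE and visit(node) for node in nodes)


def paths_to_root(concept, parents):
    """List every path from `concept` up to a root (assumes no cycles)."""
    direct = parents.get(concept, ())
    if not direct:
        return [[concept]]
    return [
        [concept] + path
        for parent in direct
        for path in paths_to_root(parent, parents)
    ]


if __name__ == "__main__":
    # Hypothetical part-of relations, for illustration only.
    relations = {
        "hindbrain": ["brain"],
        "brain": ["central nervous system"],
        "central nervous system": ["zebrafish"],
    }
    print(has_cycle(relations))
    print(paths_to_root("hindbrain", relations))
```

In this toy hierarchy each concept has exactly one path to the root; the acyclicity check is what guarantees that the path enumeration terminates.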
As we need to address the ontology in a direct manner, the DAG structure is transferred to a database. This database is used in the integration of our databases, which share concepts at different levels of aggregation. To make sure that sufficient levels of aggregation are present for the applications we have in mind, the original vocabulary was enriched with more relations and concepts. Both databases can now be addressed with the same unique terms, and co-occurrence and co-expression of genes can be readily extracted from the databases. Integration can be further extended to the ZFIN resource and to ontologies that relate to genes and gene expression (e.g. the Gene Ontology). In this manner, interoperable information retrieval from heterogeneous databases can be realized. This greatly facilitates processing complex information and retrieving relations in the data through machine learning approaches.
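One way to picture retrieval at different levels of aggregation is a relational table of ontology relations queried with a recursive common table expression. The sketch below uses an in-memory SQLite database; the table layout, the term names and the gene names (geneA, geneB, geneC) are all hypothetical and do not reflect the actual schema or content of the databases described here.

```python
import sqlite3


def build_demo_db():
    """Create a toy database holding ontology relations and annotations."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE relation (child TEXT NOT NULL, parent TEXT NOT NULL);
        CREATE TABLE expression (gene TEXT NOT NULL, term TEXT NOT NULL);
        """
    )
    # Hypothetical part-of relations between anatomical terms.
    con.executemany(
        "INSERT INTO relation VALUES (?, ?)",
        [("hindbrain", "brain"), ("midbrain", "brain")],
    )
    # Hypothetical gene expression annotations.
    con.executemany(
        "INSERT INTO expression VALUES (?, ?)",
        [("geneA", "hindbrain"), ("geneB", "midbrain"), ("geneC", "heart")],
    )
    return con


def genes_expressed_in(con, term):
    """Return genes annotated with `term` or with any of its sub-terms."""
    rows = con.execute(
        """
        WITH RECURSIVE sub(term) AS (
            SELECT ?
            UNION
            SELECT r.child FROM relation r JOIN sub s ON r.parent = s.term
        )
        SELECT DISTINCT e.gene
        FROM expression e JOIN sub s ON e.term = s.term
        ORDER BY e.gene
        """,
        (term,),
    ).fetchall()
    return [gene for (gene,) in rows]


if __name__ == "__main__":
    con = build_demo_db()
    print(genes_expressed_in(con, "brain"))      # aggregated level
    print(genes_expressed_in(con, "hindbrain"))  # fine-grained level
```

Querying at the coarser term collects annotations made at any finer level beneath it, which is the kind of aggregation that lets two databases annotated at different granularities be compared through shared terms.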
The Gene Expression Management System (GEMS) is a database system for patterns of gene expression. These patterns result from systematic whole-mount fluorescent in situ hybridization studies on zebrafish embryos. GEMS is an integrative platform that addresses one of the important challenges of developmental biology: how to integrate the genetic data that underpin morphological changes during embryogenesis. Our motivation to build this system was the need to organize and compare multiple patterns of gene expression at tissue level. Integration with other developmental and biomolecular databases will further support our understanding of development. GEMS operates in concert with a database containing a digital atlas of the zebrafish embryo; this digital atlas of zebrafish development was conceived prior to the expansion of GEMS. The atlas contains 3D volume models of canonical stages of zebrafish development, in which each volume element is annotated with an anatomical term. These terms are extracted from a formal anatomical ontology, i.e. the Developmental Anatomy Ontology of Zebrafish (DAOZ). In GEMS, anatomical terms from this ontology, together with terms from the Gene Ontology (GO), are also used to annotate patterns of gene expression, thereby providing mechanisms for integration and retrieval. The annotations are the glue for the integration of patterns of gene expression in GEMS as well as in other biomolecular databases. On the one hand, zebrafish anatomy terminology allows gene expression data within GEMS to be integrated with phenotypical data in the 3D atlas of zebrafish development. On the other hand, GO terms extend the integration of GEMS expression patterns to a wide range of bioinformatics resources.
We present a novel approach to modelling biological information using ontologies. The system interlinks three ontologies, comprising anatomical, developmental and taxonomical information, and includes instances of structures for different species. The framework is constructed for comparative analyses in the field of evolutionary development. We have applied the approach to the vertebrate heart and present four case studies of the functionality of the system, focusing on cross-species comparisons, developmental studies, physiological studies and 3D visualisation.
microRNAs (miRNAs) are short RNA fragments that can regulate the expression of hundreds of target genes. Currently, due to the lack of high-throughput experimental methods for miRNA target identification, a collection of computational target prediction approaches has been developed. However, these approaches consider different features, or weight the same factors differently, resulting in a diverse range of predictions. The prediction accuracy therefore remains uncertain. In this paper, three commonly used target prediction algorithms are evaluated and further integrated using algorithm combination, ranking aggregation and Bayesian Network classification. Our results reveal that each individual prediction algorithm has its own advantages, as shown on different test data sets. Among the different integration strategies, applying a Bayesian Network classifier to the features calculated from multiple prediction methods significantly improved target prediction accuracy.
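To make the idea of ranking aggregation concrete, the sketch below merges several ranked target lists with a Borda count. The Borda count is an assumption chosen for illustration; the abstract does not state which aggregation scheme was used, and the target identifiers are placeholders rather than real predictions.

```python
def borda_aggregate(rankings):
    """Merge ranked prediction lists (best first) with a Borda count.

    Each list awards n points to its top target, n - 1 to the next, and so
    on, where n is the number of distinct targets over all lists. A target
    missing from a list receives no points from that list. Ties are broken
    alphabetically so that the result is deterministic.
    """
    candidates = set().union(*rankings)
    n = len(candidates)
    scores = {target: 0 for target in candidates}
    for ranking in rankings:
        for position, target in enumerate(ranking):
            scores[target] += n - position
    return sorted(candidates, key=lambda target: (-scores[target], target))


if __name__ == "__main__":
    # Placeholder rankings from three hypothetical prediction methods.
    method_1 = ["t1", "t2", "t3"]
    method_2 = ["t2", "t1", "t4"]
    method_3 = ["t2", "t3", "t1"]
    print(borda_aggregate([method_1, method_2, method_3]))
    # ['t2', 't1', 't3', 't4']
```

Here t2 is ranked first because two of the three lists place it on top, even though the first list prefers t1. A classifier-based integration, as in the Bayesian Network strategy, would instead use the per-method scores as input features.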
Mining patterns of gene expression is a crucial approach to discovering knowledge, such as finding the genetic networks that underpin embryonic development. Analysis of mining results and evaluation of their relevance in the domain remain a major concern. In this paper we describe our explorative studies in support of solutions that facilitate the analysis and interpretation of mining results. In our particular case, the solution is found in extending the Gene Expression Management System (GEMS), an integrative framework for the spatio-temporal organization of gene expression patterns of zebrafish, to a framework that supports data mining, data analysis and pattern interpretation. As a proof of principle, GEMS has been equipped with data mining functionality suitable for spatio-temporal tracking, thereby generating added value for the submission of data for data mining and analysis. The analysis of the genetic networks is based on the availability of domain ontologies, which dynamically provide meaning to the discovered patterns of gene expression data. Combining data mining with the capabilities already available in GEMS will significantly augment current data processing and functional analysis strategies.
Fungi have crucial roles in ecosystems and are important associates of many organisms. They are adapted to a wide variety of habitats; however, their global distribution and diversity remain poorly documented. The exponential growth of DNA barcode information retrieved from the environment considerably assists the traditional ways of unraveling fungal diversity and detecting fungi. The raw DNA data of metabarcoding studies, together with their environmental descriptors, are made available in public sequence read archives. While this is potentially a valuable source of information for the investigation of Fungi across diverse environmental conditions, the annotation used to describe the environment is heterogeneous. Moreover, a uniform processing pipeline still needs to be applied to the available raw DNA data. Hence, a comprehensive framework to analyse these data in a large context is still lacking. We introduce the MycoDiversity DataBase, a database that includes public fungal metabarcoding data of environmental samples for the study of biodiversity patterns of Fungi. The framework we propose will contribute to our understanding of fungal biodiversity and aims to become a valuable source for large-scale analyses of patterns in space and time, in addition to assisting evolutionary and ecological research on Fungi.