Amino acid repeats play important roles in both the structure and the function of proteins. They are found in all kingdoms of life, especially in eukaryotes, and a relatively large fraction of human proteins contain repeats. Furthermore, abnormal expansions of short repeats cause various human diseases. Therefore, an analysis of the repeats of the entire human proteome, together with functional, mutational and disease information, would help to better understand their roles in proteins. To meet this need, we developed HPREP (http://bioinfo.bdu.ac.in/hprep), a web database of human proteome repeats built with Perl and HTML. Using in-house Perl programs, we identified different categories of well-characterized repeats and domain repeats present in the human proteome of UniProtKB/Swiss-Prot; novel repeats were detected with the T-REKS repeat detection tool and the XSTREAM web server. These proteins were further annotated with functional, mutational and disease information and grouped according to specific repeat types. The database allows users to search by specific repeat type in order to understand the involvement of repeats in proteins. Thus, HPREP is expected to be a useful resource for gaining better insight into the different repeats in the human proteome and their biological roles.
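As an illustration of the simplest repeat class, homopolymeric single amino acid repeats such as polyglutamine tracts, the following minimal Python sketch scans a protein sequence with a regular expression. It is not HPREP's in-house Perl code; the function name `find_homorepeats`, the default minimum run length of 5 residues and the example sequence are illustrative assumptions.

```python
import re
from typing import List, Tuple

def find_homorepeats(sequence: str, min_length: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) for every run of a single amino acid of at least min_length.

    Positions are 1-based and inclusive, as is customary for protein sequences.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # ([A-Z]) captures one residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start() + 1, m.end()) for m in pattern.finditer(sequence.upper())]

# Hypothetical sequence: a 17-residue prefix, a 10-residue polyglutamine tract, then short runs.
seq = "MATLEKLMKAFESLKSF" + "Q" * 10 + "PPP" + "AAAA" + "PQ"
print(find_homorepeats(seq))     # → [('Q', 18, 27)]
print(find_homorepeats(seq, 4))  # → [('Q', 18, 27), ('A', 31, 34)]
```

Lowering `min_length` trades specificity for sensitivity: the four-residue alanine run only appears at the lower threshold. Detecting degenerate or longer-period repeats requires dedicated tools such as the T-REKS and XSTREAM programs named above.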
This article presents the main provisions of the concept of, and the solutions for, creating a digital platform in the field of bioinformatics and forming thematically oriented and industrial digital ecosystems on its basis. The composition and structure of the digital platform are discussed: information repositories, data and knowledge bases, a thematically oriented software repository, and task-oriented services for various target groups of users. Within the framework of the platform, a system providing high-quality access to specialized data centres and high-performance computing infrastructure is also planned. Particular attention is devoted to one of the platform's components: the project office for bioresource collections management. The project office has registered animal collections (wild and laboratory animals, live breeding, museum zoological collections, farm animals) and plant collections (herbarium funds of plant biological diversity, living collections of natural flora, agricultural plants). Other collection types, such as collections of human biomaterials, cell cultures and microorganisms, are important for medical research.
According to a World Health Organization (WHO) report, 246 million people worldwide suffer from diabetes, and this number is expected to exceed 380 million by 2025. Accurate and rapid diagnosis of the disease has therefore become a significant challenge for machine learning researchers. This paper aims to design a robust model for diabetes diagnosis using a hybrid of the Chaotic-Jaya (CJaya) algorithm and the Extreme Learning Machine (ELM), named CJaya-ELM, in which the Jaya algorithm with a chaotic learning approach optimizes the random parameters of the ELM classifier. The efficacy of the model is assessed on the Pima Indian diabetes dataset. CJaya-ELM is compared with basic ELM, Teaching Learning Based Optimization (TLBO) optimized ELM (TLBO-ELM), Multi-Layer Perceptron (MLP), Jaya-optimized MLP (Jaya-MLP), TLBO-optimized MLP (TLBO-MLP) and CJaya-optimized MLP models. CJaya-ELM achieved the highest testing accuracy of 0.9687, a sensitivity of 1 and a specificity of 0.9688, with an area under the curve (AUC) of 0.9782. The results show that CJaya-ELM effectively classifies both positive and negative samples of the Pima dataset and outperforms the competing models.
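The core of the optimizer is the Jaya update, which moves each candidate toward the current best solution and away from the current worst: x' = x + r1(x_best - |x|) - r2(x_worst - |x|), with a candidate kept only if it improves the objective. In a chaotic variant the random factors r1 and r2 are drawn from a chaotic map. The Python sketch below minimizes a toy sphere function. The logistic map z <- 4z(1 - z) as the chaos source, the bounds, population size and iteration count are assumptions rather than the paper's settings. In CJaya-ELM the candidate vector would presumably hold the ELM's random input weights and biases, and the objective would be some error measure of the resulting classifier.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

def logistic_chaos(z0: float = 0.7) -> Iterator[float]:
    """Yield an endless logistic-map sequence z <- 4z(1 - z) with values in (0, 1)."""
    z = z0
    while True:
        z = 4.0 * z * (1.0 - z)
        # Round-off can land on the degenerate points 0, 1 or the fixed point 0.75; restart if so.
        if z <= 0.0 or z >= 1.0 or z == 0.75:
            z = 0.3141592
        yield z

def chaotic_jaya(objective: Callable[[Sequence[float]], float], dim: int,
                 lower: float, upper: float, pop_size: int = 20,
                 iterations: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` over [lower, upper]^dim with Jaya moves driven by a chaotic map."""
    rng = random.Random(seed)  # used only for the initial population
    chaos = logistic_chaos()
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(iterations):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            cand = []
            for j in range(dim):
                r1, r2 = next(chaos), next(chaos)  # chaotic values replace uniform randoms
                x = pop[i][j]
                # Jaya move: toward the best solution, away from the worst one.
                v = x + r1 * (best[j] - abs(x)) - r2 * (worst[j] - abs(x))
                cand.append(min(max(v, lower), upper))  # clip to the search bounds
            f = objective(cand)
            if f <= fit[i]:  # greedy acceptance: keep the candidate only if not worse
                pop[i], fit[i] = cand, f
    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]

def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

x_best, f_best = chaotic_jaya(sphere, dim=3, lower=-5.0, upper=5.0)
print(x_best, f_best)
```

Because acceptance is greedy, the best objective value never increases from one iteration to the next; Jaya has no algorithm-specific tuning parameters beyond population size and iteration count, which is part of its appeal for tuning ELM parameters.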
We present here CellML 2.0, an XML-based language for describing and exchanging mathematical models of physiological systems. MathML embedded in CellML documents is used to define the underlying mathematics of models. Models consist of a network of reusable components, each with variables and equations giving relationships between those variables. Models may import other models to create systems of increasing complexity. CellML 2.0 is defined by the normative specification presented here, prescribing the CellML syntax and the rules by which it should be used. The normative specification is intended primarily for the developers of software tools which directly consume CellML syntax. Users of CellML models may prefer to browse the informative rendering of the specification (https://cellml.org/specifications/cellml_2.0/) which extends the normative specification with explanations of the rules combined with examples of their usage.
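To show what "a network of reusable components, each with variables and equations" looks like in practice, the Python sketch below parses a tiny hand-written model with the standard-library `xml.etree.ElementTree` and lists each component's variables and the connections between components. The toy model, its names and the `summarize` helper are illustrative assumptions; the document has not been checked against the normative rules, so consult the specification before relying on any of its details.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Namespace URIs for CellML 2.0 elements and for the embedded MathML.
NS = {
    "cellml": "http://www.cellml.org/cellml/2.0#",
    "mathml": "http://www.w3.org/1998/Math/MathML",
}

# Hand-written, unvalidated toy model: "source" exposes a; "adder" computes c = a + b.
TOY_MODEL = """
<model xmlns="http://www.cellml.org/cellml/2.0#" name="toy_network">
  <component name="source">
    <variable name="a" units="dimensionless" initial_value="1" interface="public"/>
  </component>
  <component name="adder">
    <variable name="a" units="dimensionless" interface="public"/>
    <variable name="b" units="dimensionless" initial_value="2"/>
    <variable name="c" units="dimensionless"/>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><eq/><ci>c</ci><apply><plus/><ci>a</ci><ci>b</ci></apply></apply>
    </math>
  </component>
  <connection component_1="source" component_2="adder">
    <map_variables variable_1="a" variable_2="a"/>
  </connection>
</model>
"""

def summarize(model_xml: str) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Return {component: [variable names]} and a list of (component_1, component_2) links."""
    root = ET.fromstring(model_xml.strip())
    components = {
        comp.get("name"): [v.get("name") for v in comp.findall("cellml:variable", NS)]
        for comp in root.findall("cellml:component", NS)
    }
    connections = [(c.get("component_1"), c.get("component_2"))
                   for c in root.findall("cellml:connection", NS)]
    return components, connections

print(summarize(TOY_MODEL))
# → ({'source': ['a'], 'adder': ['a', 'b', 'c']}, [('source', 'adder')])
```

The namespace map is what lets `findall` match elements that live in the CellML default namespace; without it, a plain `findall("component")` would find nothing.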
Despite ever-progressing technological advances in producing data in health and clinical research, the generation of new knowledge for medical benefit through advanced analytics still lags behind its full potential. The reasons for this gap are the inherent heterogeneity of data sources and the lack of broadly accepted standards. Further hurdles are associated with legal and ethical issues surrounding the use of personal/patient data across disciplines and borders. Consequently, there is a need for broadly applicable standards, compliant with legal and ethical regulations, that allow heterogeneous health data to be interpreted through in silico methodologies to advance personalized medicine. To tackle these standardization challenges, the Horizon2020 Coordinating and Support Action EU-STANDS4PM initiated an EU-wide mapping process to evaluate strategies for data integration and data-driven in silico modelling approaches, with the aim of developing standards, recommendations and guidelines for personalized medicine. A first step towards this goal is a broad stakeholder consultation process initiated by an EU-STANDS4PM workshop at the annual COMBINE meeting (COMBINE 2019 workshop report in the same issue). This forum analysed the status quo of data and model standards and reflected on the possibilities and challenges of cross-domain data integration to facilitate in silico modelling approaches for personalized medicine.
Biological models often contain elements with inexact numerical values, since they are based on values that are stochastic in nature or on data that contain uncertainty. The Systems Biology Markup Language (SBML) Level 3 Core specification does not include an explicit mechanism for including inexact or stochastic values in a model, but it does provide a mechanism for SBML packages to extend the Core specification with new syntactic constructs. The SBML Distributions package for SBML Level 3 adds the features necessary for models to encode information about the distribution and uncertainty of the values underlying a quantity.
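To make the idea concrete without reproducing the package's XML syntax, the following Python sketch represents a parameter that carries a mean and a standard deviation, the kind of information the Distributions package lets a model record, and draws values for repeated stochastic simulation runs. The class `UncertainParameter`, the parameter name `k_deg` and the assumption of a normal distribution are illustrative only; this is neither libSBML API nor SBML syntax.

```python
import random
import statistics
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class UncertainParameter:
    """Hypothetical container for a quantity with a mean and a standard deviation."""
    name: str
    mean: float
    std_dev: float

    def sample(self, rng: random.Random, n: int) -> List[float]:
        # Assumes normally distributed uncertainty; other distributions are equally possible.
        return [rng.gauss(self.mean, self.std_dev) for _ in range(n)]

# A hypothetical degradation rate whose value is known only up to +/- 10 % (one std dev).
k_deg = UncertainParameter("k_deg", mean=0.5, std_dev=0.05)
draws = k_deg.sample(random.Random(42), 10_000)
print(round(statistics.mean(draws), 3), round(statistics.stdev(draws), 3))
```

A simulator that understands such annotations could run the model once per draw and report the spread of the outputs, instead of a single trajectory computed from one fixed value.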
Rule-based modeling is an approach that constructs reaction networks from the specification of rules for molecular interactions and transformations. These rules can encompass details such as the interacting sub-molecular domains and the states and binding status of the involved components. Conceptually, fine-grained spatial information such as locations can also be provided. Through “wildcards” representing component states, entire families of molecule complexes sharing certain properties can be specified as patterns. This can significantly simplify the definition of models involving species with multiple components, multiple states and multiple compartments. The Systems Biology Markup Language (SBML) Level 3 Multi package Version 1 extends the SBML Level 3 Version 1 Core with the “type” concept in the Species and Compartment classes. As a result, reaction rules may contain species that act as patterns and can exist in multiple locations. Several software tools, such as Simmune and BioNetGen, support this standard, which thus also serves as a medium for exchanging rule-based models. This document provides the specification for Release 2 of Version 1 of the SBML Level 3 Multi package. No design changes have been made to the description of models between Release 1 and Release 2; changes are restricted to the correction of errata and the addition of clarifications.
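The combinatorial saving that wildcards provide can be illustrated with a small Python sketch. A hypothetical receptor with three sites, each in one of two states, yields eight concrete species, but a single pattern that fixes only one site stands for four of them. The site names, states and dictionary representation are assumptions for illustration and do not reflect the XML encoding defined by the Multi package.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical receptor: three sites, each 'u' (unphosphorylated) or 'p' (phosphorylated).
SITES = ("Y1", "Y2", "Y3")
STATES = ("u", "p")

def enumerate_species() -> List[Dict[str, str]]:
    """Every concrete species, i.e. every combination of site states (2**3 = 8)."""
    return [dict(zip(SITES, combo)) for combo in product(STATES, repeat=len(SITES))]

def matches(pattern: Dict[str, Optional[str]], species: Dict[str, str]) -> bool:
    """A pattern fixes some site states; omitted sites, or sites set to None, are wildcards."""
    return all(state is None or species[site] == state for site, state in pattern.items())

# One rule pattern: "any receptor whose Y1 is unphosphorylated", whatever Y2 and Y3 are.
pattern = {"Y1": "u"}
matched = [s for s in enumerate_species() if matches(pattern, s)]
print(len(enumerate_species()), len(matched))  # → 8 4
```

With n two-state sites the number of concrete species grows as 2**n, whereas a rule written against a pattern stays a single rule; this is why rule-based formats exchange patterns rather than expanded networks.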
This paper reports on the outcomes of the 10th Computational Modeling in Biology Network (COMBINE) meeting, held in Heidelberg, Germany, in July 2019. The annual event brings together researchers, biocurators and software engineers to present recent results and discuss future work in the area of standards for systems and synthetic biology. The COMBINE initiative coordinates the development of various community standards and formats for computational models in the life sciences. Over the past 10 years, COMBINE has brought together standards communities that have further developed and harmonized their standards for better interoperability of models and data. COMBINE 2019 was co-located with a stakeholder workshop of the European EU-STANDS4PM initiative, which aims at harmonized data and model standardization for in silico models in the field of personalized medicine, and with the FAIRDOM PALs meeting, which discussed findable, accessible, interoperable and reusable (FAIR) data sharing. This report briefly describes the work discussed in invited and contributed talks as well as during breakout sessions. It also highlights recent advancements in data, model and annotation standardization efforts. Finally, the report concludes with challenges and opportunities that this community will face during the next 10 years.
This special issue of the Journal of Integrative Bioinformatics presents papers related to the 10th COMBINE meeting together with the annual update of COMBINE standards in systems and synthetic biology.
To date, many proteins identified by large-scale genome sequencing projects are still uncharacterized and are the subject of intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for elucidating protein function. However, SCL prediction remains a challenging task, especially for multi-site proteins, which are known to shuttle between cell compartments to perform their biological functions, and for proteins without significant homology to proteins of known subcellular location. Owing to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, helped by the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features related to subcellular localization; however, the overwhelming majority of them focus on single localization and cover very few cellular locations. Predictions have traditionally been based on sorting signals, amino acid composition and homology. To improve prediction quality, current efforts focus on knowledge extracted from annotation databases, such as protein–protein interactions and Gene Ontology (GO) functional annotations, which have recently become widely adopted and essential information for learning systems. To address this problem, in the present study we treated SCL prediction as a multi-label learning problem and labeled both single-site and multi-site unannotated bacterial protein sequences by mining protein homology relationships, using both the GO terms of protein homologs and PSI-BLAST profiles. Experiments using 5-fold cross-validation on benchmark datasets showed a significant improvement in the results obtained by the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
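The general idea of transferring localization labels from homologs in a multi-label setting can be sketched as a weighted vote in Python. This is not the authors' consensus model: the per-homolog weights (for example, derived from alignment similarity), the threshold, the fallback rule and the location names are illustrative assumptions, and the GO-term and PSI-BLAST profile features used in the study are not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def predict_locations(homologs: List[Tuple[float, Set[str]]],
                      threshold: float = 0.5) -> Set[str]:
    """Weighted multi-label vote over homologs given as (weight, known locations) pairs.

    A location is predicted when its share of the total weight reaches `threshold`;
    if no location does, the single best-supported location is returned instead.
    """
    total = sum(w for w, _ in homologs)
    if total <= 0:
        return set()
    support: Dict[str, float] = defaultdict(float)
    for weight, locations in homologs:
        for loc in locations:
            support[loc] += weight
    if not support:
        return set()
    predicted = {loc for loc, s in support.items() if s / total >= threshold}
    return predicted or {max(support, key=support.get)}

# Hypothetical homologs of a query protein: one multi-site, two single-site.
homologs = [(0.9, {"cytoplasm", "inner membrane"}), (0.8, {"cytoplasm"}), (0.3, {"periplasm"})]
print(sorted(predict_locations(homologs)))                 # → ['cytoplasm']
print(sorted(predict_locations(homologs, threshold=0.4)))  # → ['cytoplasm', 'inner membrane']
```

Because each location is thresholded independently, the vote can return several compartments at once, which is what distinguishes this multi-label formulation from picking the single most frequent label.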