Prochlorococcus marinus MIT 9303 is a marine cyanobacterium. It was first isolated in 1992 from a depth of 100 m in the Sargasso Sea. It serves as a good model system for scientific research because of several desirable characteristics: its small size, its ability to perform photosynthesis, and the ease of maintaining it in culture. The genome of this cyanobacterium encodes about 3022 proteins, some of which are annotated as hypothetical proteins. We performed a computational study to characterize one of these hypothetical proteins, “P9303_05031”, and to deduce its functional role in the cell using various bioinformatics techniques. In-depth analysis showed that this hypothetical protein contains a conserved Hsp10 domain characteristic of the GroES family of molecular chaperonins. In this work, we predicted the bidirectional best hits for P9303_05031 and then predicted its properties at the level of primary, secondary and tertiary structure. The presence of the Hsp10 domain suggests that the protein plays a role in protein folding during heat shock. This work represents the first structural and physicochemical study of the hypothetical protein P9303_05031 in Prochlorococcus marinus MIT 9303.
We present preliminary studies that may be of significance for research against coronaviruses, including SARS-CoV-2, which has caused an epidemic in China. Publicly available data containing information about important metabolites that neutralize coronaviruses were analysed. The preliminary results indicate that Ficus, barley, thistle and sundew in particular should be tested further with the aim of producing medicines against coronaviruses.
There is a growing need to analyze data sets characterized by several sets of variables observed on the same set of individuals. Such complex data structures are known as multiblock (or multiple-set) data sets. Multiblock data sets are encountered in diverse fields, including bioinformatics, chemometrics and food analysis. Generalized Canonical Correlation Analysis (GCCA) is a powerful method for studying the relationships between blocks. It can also be viewed as a method for integrating information from K > 2 distinct sources (Takane and Oshima-Takane 2002). GCCA allows the joint analysis of several sets of data through dimensionality reduction; its central problem is to construct a series of components that maximize the association among the multiple variable sets. In this paper, GCCA is considered in the context of multivariate functional data, which are treated as realizations of multivariate random processes. The method is presented for such data, and a practical example is discussed.
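The central problem described above can be illustrated with a minimal sketch for ordinary (non-functional) multivariate blocks. It uses the MAXVAR-type formulation, in which the common components are the leading eigenvectors of the sum of the projectors onto each block's column space. This is one classical variant of GCCA and is not necessarily the one used in the paper. The function name `maxvar_gcca` is hypothetical, and the sketch assumes every block has full column rank and more individuals than variables. For functional data, one would presumably first replace each curve by its coefficients in a finite basis; that step is an assumption and is not shown.

```python
import numpy as np

def maxvar_gcca(blocks, n_components=1):
    """MAXVAR-style GCCA sketch.

    blocks: list of (n, p_k) arrays observed on the same n individuals.
    Returns the common components (n, n_components) and, for each one,
    the sum over blocks of its squared multiple correlation with the block.
    """
    bases = []
    for x in blocks:
        xc = x - x.mean(axis=0)      # centre every variable
        q, _ = np.linalg.qr(xc)      # orthonormal basis of the block's column space
        bases.append(q)
    stacked = np.hstack(bases)       # n x (p_1 + ... + p_K)
    # Left singular vectors of the stacked bases are the eigenvectors of the
    # sum of projectors; squared singular values are the eigenvalues.
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :n_components], s[:n_components] ** 2

# Illustrative data: three blocks sharing one common variable z.
rng = np.random.default_rng(0)
z = rng.normal(size=(50, 1))
blocks = [np.hstack([z, rng.normal(size=(50, 2))]) for _ in range(3)]
components, association = maxvar_gcca(blocks)
```

Because each eigenvalue is a sum of K squared correlations, it cannot exceed the number of blocks; a value close to K suggests a component shared by all blocks.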
JIB.tools 2.0 is a new approach to embedding the curation process more closely in the publication process. The website hosts the tools, software applications, databases and workflow systems published in the Journal of Integrative Bioinformatics (JIB). As soon as a new tool-related publication appears in JIB, the tool is posted to JIB.tools and can afterwards be easily transferred to bio.tools, a large information repository of software tools, databases and services for bioinformatics and the life sciences. In this way, JIB.tools provides an easily accessible list of the tools published in JIB as well as status information on the underlying services. With newer registries such as bio.tools providing this information on a larger scale, JIB.tools 2.0 closes the gap between journal publication and registry publication. (Reference: https://jib.tools).
The infection mechanism and pathogenicity of Human T-lymphotropic virus 1 (HTLV-1) have remained poorly understood for hundreds of years, and knowledge about this virus has only recently begun to emerge. The purpose of this study is to design a vaccine targeting the envelope glycoprotein GP62, an outer membrane protein of HTLV-1 that has an increased number of epitope binding sites. To predict an effective vaccine, we performed data collection, clustering and multiple sequence alignment of HTLV-1 glycoprotein B, variability analysis of the envelope glycoprotein GP62 of HTLV-1, population protection coverage analysis, HLA-epitope binding prediction, and B-cell epitope prediction. Among all the predicted peptides, the epitopes ALQTGITLV and VPSSSTPL interact with three MHC alleles. The combined worldwide population protection coverage of these epitopes as vaccine candidates was found to be nearly 70%. Docking analysis revealed that ALQTGITLV and VPSSSTPL interact strongly with the epitope-binding grooves of HLA-A*02:03 and HLA-B*35:01, respectively; these HLA molecules were the ones found to interact in common with the predicted epitopes. Molecular dynamics simulations show that the docked complexes are stable. These epitopes might therefore pave the way for vaccine development against HTLV-1.
Mapping the genetic variance within the genotypes of a population to phenotypic variance in a systematic, high-throughput manner is of interest for biodiversity and breeding research. Besides the established and efficient high-throughput genotyping technologies, phenotyping capabilities have received increased attention in the last decade. This has resulted in a growing amount of phenotype data from well-scaling, automated sensor platforms. Data stewardship is therefore a central component in making experimental data from multiple domains interoperable and reusable. To ensure standardized and comprehensive sharing of scientific and experimental data among domain experts, the FAIR data principles are applied for machine readability and scalability. In this context, the BrAPI consortium provides comprehensive, commonly agreed FAIR guidelines for offering scientific data through a BrAPI layer in a RESTful manner. This paper presents the concepts, best practices and implementations that meet these challenges. As one of the world's leading plant research institutes, the IPK-Gatersleben has a vital interest in transforming legacy data infrastructures into a bio-digital resource center for plant genetic resources (PGR). This paper also demonstrates the benefits of integrated database back-ends, established data stewardship processes, and FAIR data exposition through machine-readable, highly scalable programmatic interfaces.
In this article, we propose a semi-automated method to reconstruct the ancestral genomes of chloroplasts while taking gene duplication into account. Two methods were used: a manual (naked-eye) investigation supported by in-house scripts, whose results serve as a knowledge base, and a dynamic-programming approach similar to Needleman-Wunsch. The latter relies on the Gestalt pattern matching method of a sequence matcher to evaluate the occurrence probability of each gene in the last common ancestor of two given genomes. Both approaches were applied to chloroplast genomes from the Apiales, Asterales, and Fabids orders, the latter belonging to the Pentapetalae group. We found that Apiales species do not undergo indels, whereas indels do occur in the Asterales and Fabids orders. A series of experiments was then carried out to verify our findings extensively by comparing the ancestral reconstructions obtained with those of the most recently released approach, MLGO (Maximum Likelihood for Gene-Order analysis).
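As an illustration of Gestalt pattern matching on gene orders, the sketch below uses `SequenceMatcher` from Python's standard `difflib` module, which implements a Ratcliff/Obershelp-style matcher. It only compares two gene lists and reports their shared blocks; it is not the authors' probability model. The function name `compare_gene_orders` and the two short gene lists are hypothetical examples.

```python
from difflib import SequenceMatcher

def compare_gene_orders(genome_a, genome_b):
    """Return a similarity ratio and the genes found in matching blocks.

    genome_a, genome_b: lists of gene names in genomic order.
    """
    # autojunk=False disables the heuristic that ignores very frequent items.
    matcher = SequenceMatcher(None, genome_a, genome_b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel and contributes nothing.
        shared.extend(genome_a[block.a:block.a + block.size])
    return matcher.ratio(), shared

# Hypothetical gene orders: genome_b lacks rps16 (an indel relative to genome_a).
genome_a = ["psbA", "matK", "rps16", "psbK", "psbI"]
genome_b = ["psbA", "matK", "psbK", "psbI"]
score, shared = compare_gene_orders(genome_a, genome_b)
```

The ratio is twice the number of matched genes divided by the total number of genes in both lists, so a gene present in only one genome lowers the score. Genes that appear in the matching blocks are natural candidates for the common ancestor.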
The literature offers a wide collection of correlation and association coefficients for different data structures. Some correlation coefficients are conventionally used for continuous data, and others for categorical or ordinal observations. The aim of this paper is to assess the performance of various approaches to estimating the correlation coefficient for several types of observations. Both simulated and real data were analysed. For continuous variables, Pearson’s r² and MIC were determined, whereas for categorized data three approaches were compared: Cramér’s V, Joe’s estimator, and the regression-based estimator. Two methods of discretizing continuous data were used. The following conclusions were drawn: the regression-based approach yielded the best results for data with the highest assumed r² coefficient, whereas Joe’s estimator better approximated the true correlation when the assumed r² was small; and the MIC estimator detected the maximal level of dependency for data with a quadratic relation. Moreover, applying discretization to data with a non-linear dependency can cause a loss of dependency information. The calculations were carried out with the R packages arules and minerva.
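The contrast between a coefficient for continuous data and one for categorized data can be sketched as follows. The paper's calculations were done in R; this Python sketch is only an illustration and covers Pearson's r² and Cramér's V, not Joe's estimator, the regression-based estimator, or MIC. The helper names `cramers_v` and `equal_width_bins`, the equal-width discretization, and the choice of four bins are assumptions made for the example.

```python
import numpy as np

def cramers_v(x, y):
    """Cramér's V for two categorical variables given as label arrays.

    Assumes each variable takes at least two distinct values.
    """
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)                 # contingency table of counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

def equal_width_bins(values, n_bins):
    """Discretize a continuous variable into n_bins equal-width categories."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    return np.digitize(values, edges[1:-1])       # labels 0 .. n_bins - 1

# Quadratic relation: strong dependency, but no linear correlation.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
r_squared = np.corrcoef(x, y)[0, 1] ** 2
v = cramers_v(equal_width_bins(x, 4), equal_width_bins(y, 4))
```

For this symmetric quadratic relation r² is essentially zero, while Cramér's V on the discretized data is clearly positive. The value of V depends on the number and placement of the bins, which is one way that discretization can change the amount of dependency that is detected.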
Modern chromatography largely uses the technique of gas chromatography coupled with mass spectrometry (GC–MS). For a set of data concerning the drought resistance of barley, the problem of characterizing the covariance structure is investigated using two methods. The first is based on the Frobenius norm and the second on the entropy loss function. Four covariance structures are considered: compound symmetry, tridiagonal Toeplitz, pentadiagonal Toeplitz, and autoregression of order one. The Frobenius norm indicates compound symmetry and autoregression of order one as the most relevant structures, whilst the entropy loss function gives a slight indication in favor of the compound symmetry structure.
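The two discrepancy measures can be sketched for two of the candidate structures. The sketch assumes the entropy loss takes the common Stein form tr(Σ⁻¹S) − log det(Σ⁻¹S) − p, where S is the sample covariance matrix and Σ the structured candidate; the paper may define or normalize it differently. The function names and the fixed parameter value 0.6 are hypothetical, and the search for the best-fitting parameters within each structure is not shown.

```python
import numpy as np

def compound_symmetry(p, rho):
    """Unit-variance compound symmetry: 1 on the diagonal, rho elsewhere."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def ar1(p, rho):
    """Unit-variance first-order autoregressive structure: rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def frobenius_discrepancy(s, sigma):
    """Frobenius norm of the difference between S and the candidate."""
    return np.linalg.norm(s - sigma, ord="fro")

def entropy_loss(s, sigma):
    """Stein-type entropy loss (assumed form); zero only when sigma equals s."""
    m = np.linalg.solve(sigma, s)          # sigma^{-1} s without explicit inversion
    _, logdet = np.linalg.slogdet(m)
    return np.trace(m) - logdet - s.shape[0]

# Stand-in for a sample covariance matrix (illustrative, not the barley data).
s = ar1(4, 0.6)
candidates = {"compound symmetry": compound_symmetry(4, 0.6), "AR(1)": ar1(4, 0.6)}
scores = {
    name: (frobenius_discrepancy(s, sigma), entropy_loss(s, sigma))
    for name, sigma in candidates.items()
}
```

Smaller values indicate a closer structure under either measure. The two measures weight deviations differently, so they need not rank the candidate structures in the same order.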
Triticale (Triticosecale Wittmack) is obtained by crossing wheat (Triticum ssp.) and rye (Secale cereale L.) and is characterized by high yield potential, good health and grain value, and high tolerance to biotic and abiotic stress. Poland is a very important region for progress in triticale breeding, since it is home to most cultivars and numerous genetic studies on triticale have been carried out there. Despite the tremendous interest in triticale among both breeders and researchers, there are no studies assessing the adaptation of cultivars to environmental conditions across growing seasons. This study was conducted to investigate the influence of cultivar, management, location and growing season on grain yield. The approach also provides a new way to determine whether there is any dependency between the eight seasons, and to find the cause of the yield response to environmental conditions in a given growing season.