On the creation of the digital platform "Bioinformatics" and its system-forming solutions

Abstract The article presents the main provisions of the concept and the solutions for creating a digital platform in the field of bioinformatics and for forming thematically oriented and industrial digital ecosystems on its basis. The composition and structure of the digital platform are discussed: information repositories, data and knowledge bases, a thematically oriented software repository, and task-oriented services for various target groups of users. Within the framework of the platform, it is also planned to organize a system of high-quality access to specialized data centres and high-performance computing infrastructure. Particular attention is devoted to one component of the platform: the project office for bioresource collections management. The project office has registered the following types of collections: animal collections (wild and laboratory animals, live breeding; museum zoological collections; farm animals) and plant collections (herbarium funds of plant biological diversity; living collections of natural flora; agricultural plants). Collections of human biomaterials, cell cultures and microorganisms are important for medical research.


Introduction
Solutions to the problems of ensuring food safety and human health and of developing biotechnology depend largely on the scientific and technological potential of systems and molecular biology and genetics. One of the strategic tools for solving the numerous tasks in this area is bioinformatics. The results of research in bioinformatics are mathematical and computer (software and hardware) tools for the collection, storage, processing, analysis and visualization of biological data, and for solving applied problems based on the analysis of biological data and the modelling of biological systems.
For example, the identification of genes that control the productivity traits of cultivated cereals requires the collection and analysis of morphometric data from numerous plants. Manual estimation of parameters such as the number of grains per ear and the grain size is laborious. One option for solving this food safety problem is automated computer analysis. An example of such automation is the SeedCounter mobile application developed for grain analysis [1,2], together with its adaptation for assessing the morphological characteristics of wild potato tubers using digital image processing [3].
Another practical example in the field of genetic research is the FunGeneNet software. This online tool can be used to reconstruct gene networks from experimental gene sets and to estimate their difference from random networks. Applying the tool to thyroid cancer and apoptosis networks made a biological interpretation of the original gene/protein sets possible [4]. The EloE (Elongation Efficiency) web application, in turn, can be useful for the preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Its results can be used, for instance, in other programs that model artificial genetic structures in genetic engineering experiments [5].
Considerable experience has been accumulated in the field of bioinformatics and its applications: mathematical and software solutions have been developed, and specialized information resources have been created (e.g. [6][7][8]). Advances in systems biology and bioinformatics have allowed the reconstruction of several human genome-scale metabolic models, providing a background for the development of computer pipelines. Accordingly, database management systems for gene network reconstruction, data retrieval and information storage were developed. However, these resources and solutions are usually fragmented. The problem of inventorying them and integrating them into the national digital platform "Bioinformatics", focused on solving scientific and applied problems of systems and molecular biology and genetics, is therefore of particular urgency. Conceptually, such a platform should be a database for data integration at the national level, providing information management tools both for non-experts and for subsequent specialized analysis. These issues have already been covered systematically at conferences on integrative bioinformatics [9,10].
This article describes the approach to creating the digital platform "Bioinformatics" and implementing its system-forming solutions that was developed and implemented at the Federal Research Centre Institute of Cytology and Genetics SB RAS.
The concept of the "Bioinformatics" digital platform involves the creation of an integrated set of solutions, including:
- Technologies and tools for "information storages", databases and knowledge bases that provide reliable storage of information resources and guaranteed access to them (e.g. [11,12]).
- A repository of thematically oriented, science-based software for processing and analysing information and for modelling complex systems of the problem area under consideration (e.g. [13][14][15][16]).
- Tools for creating intelligent systems that support users in developing and executing scenarios for solving domain-specific tasks (including technologies based on knowledge processing).
- Solutions for implementing a thematically oriented environment (info-communication infrastructure) based on "cloud technologies", specialized data centres and high-performance computing infrastructure (e.g. [17,18]), including a system of high-quality access to the national high-performance computing infrastructure at the petaflops level.
- A set of task-oriented services aimed at various target groups of users, with access organized through a system of "hubs" (portals) (e.g. [19][20][21][22]).
- A set of solutions to support the system of training, retraining and advanced training of specialists.
The digital platform "Bioinformatics" is considered as the basis for the formation of thematically oriented and industrial digital ecosystems, including those for various sub-sectors of "smart" agriculture and medicine, and for information and computer support of managing the development of the bioresource collections of the Russian Ministry of Science and Higher Education. The databases described below are examples of such collections. The effectiveness of wheat phenotyping can be improved by introducing automated image processing technology, storing the information in databases, and using machine learning algorithms to analyse it. The computer-aided information system SpikeDroidDB allows the collection, storage and analysis of information about the morphometric characteristics of wheat ears [23]. The TRY database of plant traits provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide [24]. Plant trait data form the basis for a vast area of research spanning from biodiversity conservation to evolutionary biology, landscape and ecosystem management, and restoration.
Supporting the foundations of biological research infrastructure and making it more sustainable is being discussed not only in Russia but all over the world. For example, the United States Culture Collection Network brings together representatives of living collections to collaborate within living collection communities and to improve information sharing, communication and collective action [25]. Another example is H3ABioNet, a pan-African bioinformatics network created to enable the African bioinformatics community to analyse its data. Although there had already been advances in bioinformatics in Africa at the inception of H3ABioNet, large-scale analysis of genomic data was limited overall by the availability of expertise and infrastructure [26].
There is also a great need to consider aspects such as data integration and the development of interaction standards in medicine. The current coronavirus pandemic calls for the integration of available data on biodiversity and for the study of the molecular mechanisms of host-pathogen interaction, which in turn requires the integration of data from microbiology, medicine and systems biology [27]. A concerted effort is being made in Europe to achieve consensus in this area on standards, security and privacy, and ethical and legal issues [28]. An example of this trend in Asia is the expansion of electronic health records at the University of Malaya Medical Center (UMMC). The integration of multiple sources of clinical visit data provides a more complete and accurate real-time update of the clinical data used for epidemiological research. This interdisciplinary collaboration has improved the quality of data collection in clinical services, hospital data monitoring, quality assurance, and research data management [29].
The prevalence of comorbid diseases poses a major health issue for millions of people worldwide and an enormous socio-economic burden for society. To date, more than 24 million scientific papers can be found in PubMed, many of them containing descriptions of a wide range of biological processes. For research in this field, the workflow system GenCoNet was developed to aggregate data on biomedical entities from heterogeneous data sources [30]. The rapid development of high-performance genomic technologies has led to an information explosion in biology and medicine. Effective access to knowledge distributed over a multitude of non-formalized textual sources requires special computer-assisted intelligent methods of data mining. The ANDSystem package was developed for the reconstruction and analysis of molecular genetic networks based on an automated text-mining technique [31]. SOLANUM TUBEROSUM is a computer platform for the complex intellectual processing of large bodies of data, including automatic analysis of scientific publications and databases to extract information on potato, formalized representation of the extracted information in a knowledge base, user access to these data, and analysis and visualization of query results [32].
The formation of a thematically oriented digital ecosystem creates opportunities for its participants that are often not feasible outside this ecosystem. According to [33], the main characteristics of a digital ecosystem are: an information technology infrastructure and a single information environment for participants' interaction; openness and the ability to connect new members; algorithmization of participants' interaction; mutually beneficial interactions of participants (the win-win principle); the significance of the number of participants in the activity (scale); and cost reduction for ecosystem participants. In this article, the term "bioresource collection" is understood as a set of biological objects of natural and/or artificial origin that share a common set of specific characteristics and are used for scientific and applied research and in the educational process. Organizations of the Russian Ministry of Science and Higher Education support bioresource collections of nine main types (as shown in Figure 2): collections of human biomaterials (e.g. [34]); animal collections: wild and laboratory animals, live breeding (e.g. [35][36][37][38]); animal collections: museum zoological collections; animal collections: farm animals; cell culture collections; microorganism collections (e.g. [39,40]); plant collections: herbarium funds of plant biological diversity; plant collections: living collections of natural flora; plant collections: agricultural plants (e.g. [41,42]).

Implementation of the project office for bioresource collections management
To provide information and computer support for the project activities of teams from research organizations working with bioresource collections, a web-based information system, the project office, was developed. The project office supports electronic document management at all stages of the formation and implementation of government tasks relating to the operation of the bioresource collections (as shown in Figure 3), including such procedures as:
To solve all of the above problems, it is advisable to use a single development tool for web-based information systems [43,44]. Based on previous experience, the development team chose the Drupal web framework as this tool [45]. The framework is open and free (distributed under the GNU GPL license). It consists of a core whose functionality can be extended by connecting additional modules. Using a single development tool significantly reduced the time and financial costs of creating information systems for supporting collections, eliminated the need to create communication interfaces between different systems, and provided a single user entry point for all collections. After the specifications were discussed and the collection databases were created on the basis of the framework, the corresponding types of information materials were created, interactive forms for working with them were developed, and role-based control of user access to the project office content was organized.

The workflow of registration of new users and collections
The editorial staff of the project office register users. New users can apply for registration using the form on the main page of the project office. After registration, an e-mail containing the data for the first login is sent to the address the user specified. By following the link in this e-mail, the user can activate their profile and, if necessary, change their profile data. After that, users can work in the project office with any browser installed on their PC. The editorial staff recommend using the latest versions of modern browsers such as Google Chrome or Mozilla Firefox. The e-mail address specified in the registration application serves as the username. A "forgot password" service is also available, which allows a user to recover a forgotten password automatically. Each added user is assigned a role that determines the set of privileges they receive in the project office. The set of rights is established in accordance with the purpose of each user role. Changing an individual right affects all users with that role.
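The role model described above can be illustrated with a minimal sketch. It is not the project office's actual implementation (which relies on the Drupal framework); the permission names and e-mail addresses are hypothetical, and only the "anonymous" and "head of collection" roles are taken from the text. The point is that rights are attached to roles, so a change to a role reaches every user who holds it.

```python
# Hypothetical role and permission names; the real ones are not given in the article.
ROLE_PERMISSIONS = {
    "anonymous": {"submit_collection_application"},
    "head_of_collection": {"view_collection", "edit_own_passport"},
}

# The e-mail address serves as the username (addresses are invented).
USER_ROLES = {
    "ivanov@example.org": "head_of_collection",
    "petrova@example.org": "head_of_collection",
}


def can(user_email, permission):
    """Return True if the user's role grants the permission."""
    role = USER_ROLES.get(user_email, "anonymous")
    return permission in ROLE_PERMISSIONS.get(role, set())


def grant(role, permission):
    """Add a permission to a role; every user holding the role gains it."""
    ROLE_PERMISSIONS.setdefault(role, set()).add(permission)
```

Calling `grant("head_of_collection", "create_report")` once makes `can(...)` return `True` for both registered users, which mirrors the statement that changing a right affects all users with that role.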
The following user roles currently exist in the project office: -The anonymous user (files applications for registering collections). The application form for registering a new collection is available to new users in the project office. The submitted form is sent to the project office e-mail address and displayed in the summary table of all applications (as shown in Figure 4).
To register a new collection in the project office, the following materials must be created: a new user with the "head of collection" role; a taxonomy term with the corresponding name and position in the "Collection Tree"; and the three initial materials needed to start working with the collection, in this sequence: organization, person, passport of collection.
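The required sequence of the three initial materials can be expressed as a small dependency check. This is only a sketch under the assumption that the order exists because a person card refers to an organization and a passport refers to both; the dictionary keys and the function name are illustrative.

```python
# Assumed dependencies: each material lists what must already exist.
DEPENDS_ON = {
    "organization": set(),
    "person": {"organization"},
    "passport": {"organization", "person"},
}


def check_creation_order(sequence):
    """Return the first material created before its prerequisites, or None."""
    created = set()
    for material in sequence:
        missing = DEPENDS_ON[material] - created
        if missing:
            return material
        created.add(material)
    return None
```

For the sequence given in the text, `check_creation_order(["organization", "person", "passport"])` returns `None`, while any sequence that creates the passport before the person card is reported.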
The list of fields collected in the registration application form is determined by the set of required fields in the information materials related to the collection (organization, individual, passport, report/application): the organization of the Russian Ministry of Science and Higher Education holding the collection (required for passport, individual); the individual's name (required for passport); the bioresource collection category; the place of work; the collection name; the collection page title (the taxonomy term); the collection type (selected from a list). All content of the project office consists of the following set of materials (as shown in Figure 5): passports of collections, reports on work over the past year, applications for funding for the next year, institutions holding collections, persons (heads of collections, employees of the Ministry), and documents.
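A check of the registration application against the required fields listed above might look like the following sketch. The field keys are hypothetical identifiers chosen for illustration; the article names the fields but not how they are stored.

```python
# Hypothetical keys for the form fields named in the text.
REQUIRED_FIELDS = (
    "organization",
    "individual_name",
    "collection_category",
    "place_of_work",
    "collection_name",
    "collection_page_title",
    "collection_type",
)


def missing_fields(application):
    """Return required fields that are absent or blank, in form order."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = application.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An application would be accepted for review only when `missing_fields(application)` returns an empty list; otherwise the returned names can be shown to the applicant.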
These types of materials are tied to collections, which makes it possible to gather all metadata of the collections in one place for research, assessment and expert review (as shown in Figure 6 and Figure 7).

The content of the project office
The following main information sections are available to registered users of the project office: "Bioresource Collections", "Working Group", "Organizations", "Individuals", "Useful Information". In the "Bioresource Collections" section, users have access to the list of categories of the collections registered at the project office, the list of collections in each category, the consolidated list of information materials for each collection, and the individual materials (depending on the user's access rights).
In the "Working Group" section, users can access documents related to the working group's activities, divided into the following categories: "Regulatory and Organizational Documents", "Working Plans", "Records of the Working Group's Meetings and Decisions", "Expert Groups' Reports", "Correspondence with the Russian Ministry of Science and Higher Education", "Correspondence with Organizations Holding Collections", "Reference Information".
In the "Useful Information" section, users can access documents related to the work of the project office itself as well as general documents that are not specific to the working group's work. The section is divided into the following categories: "Regulatory", "Methods and Instructions", "Examples of Documents", "Analytics", "Materials of Meetings and Conferences".
Each document is presented either as a full-text page in the project office itself or as a file that users can download to their computers for further use.
In the "Organizations" section, users can access the cards of organizations holding collections. The list supports sorting by individual columns, as well as filtering by the following fields: "Abbreviation", "Department affiliation", "Types of supported collection", "Phone number", "Email", "Head of organization".
In the "Individuals" section, users can access the cards of individuals involved in the activities of bioresource collections (heads of collections, working group members, experts, etc.). The list supports sorting by individual columns, as well as filtering by the following fields: "Role in the project office", "Academic degree", "Academic title", "Position", "Organization (place of work)", "Phone number", "Email".
After finding the right organization or individual, the user can open the complete card with all the information needed for working with bioresource collections.
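The sorting and selection behaviour of the "Organizations" and "Individuals" lists can be sketched as follows. This is an illustration only, not the project office's code: the function name, the dictionary keys and the sample cards are invented, and matching is assumed to be by exact field value.

```python
def select_cards(cards, filters=None, sort_by=None, descending=False):
    """Filter cards by exact field values, then sort by one column."""
    filters = filters or {}
    selected = [
        card for card in cards
        if all(card.get(name) == value for name, value in filters.items())
    ]
    if sort_by is not None:
        selected.sort(
            key=lambda card: str(card.get(sort_by, "")).lower(),
            reverse=descending,
        )
    return selected


# Invented sample cards; keys loosely mirror the columns named in the text.
cards = [
    {"name": "Petrova", "role": "expert", "position": "Senior researcher"},
    {"name": "Ivanov", "role": "head of collection", "position": "Head of laboratory"},
    {"name": "Sidorov", "role": "head of collection", "position": "Researcher"},
]
heads = select_cards(cards, filters={"role": "head of collection"}, sort_by="name")
```

Here `heads` contains the two cards with the "head of collection" role, ordered by name.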
Users' work with the information materials of the project office falls into two main operations: creating a material and editing it. Depending on the user's role in the project office, only part of the materials may be available. Creating a material amounts to filling out a form consisting of a set of different fields (as shown in Figure 8); some of the fields are required. The possible field types are described in more detail below. After saving, the information material appears in the project office section that the author selected when creating it.
To edit an information material in the project office, the user goes to the material's section and then selects the material itself. If the user has sufficient permissions to edit it, an "Edit" button appears below the heading of the material. Clicking it opens the editing form. The individual field types work as follows:
- The autocomplete field allows the user to select one item from a list already entered into the project office. To do this, the user begins typing the name of the item and then selects one of the options proposed by the project office. Autocomplete makes selection from very large lists easier.
- The group field combines several different fields whose values should be logically related to each other.
- The address field (for a URL on the Internet) is divided into two parts: the address itself and its description, which is displayed as meaningful text instead of the URL when the material is viewed.
- The date field lets the user select a date with a pop-up calendar.
- The multi-line text field is used to enter large blocks of text that exceed a single line. This field is equipped with a visual text editor whose interface is very similar to that of standard office suites.
- Some fields accept multiple values. After filling in a line, the user can click the "Add more" button to get a new line for entering data.
Most fields of information materials, regardless of their type, are equipped with prompts that explain exactly what data should be entered, which makes the fields easier to work with.
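Two of the field types described above, the address field and the multi-value field, can be modelled with a short sketch. The class and method names are hypothetical and do not correspond to the framework's own field API; the sketch only shows the behaviour stated in the text (the description replaces the URL when displayed, and "Add more" appends a new value).

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class LinkValue:
    """An address field value: the URL plus an optional description."""
    url: str
    description: str = ""

    def render(self):
        """Show the description as link text, falling back to the URL."""
        text = self.description.strip() or self.url
        return f'<a href="{escape(self.url, quote=True)}">{escape(text)}</a>'


@dataclass
class MultiValueField:
    """A field holding several values, as with the "Add more" button."""
    values: list = field(default_factory=list)

    def add_more(self, value):
        self.values.append(value)
```

Escaping the URL and the description before rendering is a design choice of this sketch, made so that user-entered text cannot break the page markup.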

Discussion
The methodological approaches described in the article to organizing a digital platform in the field of bioinformatics and creating thematically oriented digital ecosystems on its basis find practical application. Technical and consulting support is provided to users while applications are being prepared in the system and reviewed by experts [46]. The functionality of the project office was tested by potential users during the initial operation and the mass registration of collections. Comments and suggestions from the heads of the bioresource collections and from experts were collected. The main promising areas for the further development of the project office are listed below.
Analysis of the content of the project office: collecting statistics on visits in general, identifying the most popular information materials, file download statistics, visitor sources (search engines, social networks, other sites, etc.) and other parameters; conducting regular analysis of visit statistics for individual information materials and for the project office as a whole.
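As an illustration of this planned analysis, the following sketch counts views per material and visits per referring site. The record layout (a "material" key and a "referrer" URL) is an assumption; the article does not describe how visits would be logged.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_visits(visits, top_n=3):
    """Count views per material and visits per referrer host."""
    by_material = Counter(v["material"] for v in visits)
    by_source = Counter(
        urlparse(v.get("referrer", "")).netloc or "direct" for v in visits
    )
    return by_material.most_common(top_n), by_source.most_common(top_n)


# Invented sample log records.
visits = [
    {"material": "passport/1", "referrer": "https://www.google.com/search?q=x"},
    {"material": "passport/1", "referrer": ""},
    {"material": "report/7", "referrer": "https://www.google.com/"},
]
top_materials, top_sources = summarize_visits(visits)
```

Visits without a referrer are grouped under "direct"; grouping hosts into broader classes such as search engines or social networks would need an additional mapping that is not shown here.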
Development of user interfaces of the project office for access from various mobile devices (differentiated by screen resolution and processing power) and the operating systems running on them (iOS, Android, etc.).
Organization of regular, publicly accessible announcements of materials exported into thematic RSS (Really Simple Syndication) feeds, as well as reverse integration of news from scientific sites on similar subjects.
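A thematic feed of this kind could be produced as in the sketch below, which builds a minimal RSS 2.0 document with Python's standard library. The function name, feed title and URLs are placeholders, and the sketch says nothing about how the project office would actually export its materials.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build an RSS 2.0 document; each item is a dict with title and link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    return ET.tostring(rss, encoding="unicode")


# Placeholder feed for one thematic group of collections.
feed_xml = build_rss(
    "Plant collections",
    "https://example.org/plants",
    "New materials on plant collections",
    [{"title": "Passport updated", "link": "https://example.org/plants/1"}],
)
```

One feed per theme (for example, per collection type) would let subscribers follow only the announcements relevant to them.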
Organization of electronic mailing lists (news, announcements, notifications, etc.) for established and continuously updated groups of users (by role, type of collection, etc.).
Technical support of the project office: updating and developing the server software, ensuring that the project office can perform its functions in accordance with its tasks, testing and debugging it, and maintaining the developed software and other software tools.