Classification and geovisualization process of soil data using a web-based spatial information system

Abstract: Because of the increased demand for information, the scientific community has published its results on the Internet over the last 20 years, adapting to society's requirements. Due to the development of web-based spatial soil information systems, access to data on various themes and of varying quality has become substantially easier. The focus of our paper is to demonstrate a freely accessible and usable web-based soil database and soil information system (Soil Information and Soil Classifier System), which is suitable for the geovisualization of uploaded soil data and for determining reference soil groups (RSGs) in accordance with the World Reference Base for Soil Resources (WRB). In order to achieve this, we algorithmized the diagnostic soil classification process of the WRB, then we created decision trees to match input soil data to the WRB system. In order to facilitate geovisualization of the spatial data, the Keyhole Markup Language file format supported by the Google API was applied.


Introduction
Soil classification concerns the grouping of soils with a similar range of properties (chemical, physical and biological) into genetic or diagnostic units (FAO 2006). It is therefore an important element of soil research and data processing, serving mainly as an organizational framework for the description of soil properties (Shi et  . At the Congress, researchers emphasized that the goal of creating the WRB is not the development of a new soil classification system, but rather the harmonization of the different national systems (Michéli et al. 2006). The system is based on the diagnostic approach to soil classification with unambiguous definitions and quantitative parameters (IUSS Working Group (WG) WRB 2007-2015). National and international soil classification systems, however, are based on different principles. While national level (German, Russian, Hungarian, French and Chinese) classifications are mainly based on the Dokuchaev soil categorization, international systems (USDA, FAO and WRB) mostly utilize the diagnostic classification. As a result, the integration or harmonization of certain national systems with the international systems is often difficult to achieve. Over the past two decades, however, numerous attempts have been made to harmonize the soil databases of different origins with the WRB ( Eberhardt and Waltner (2010) suggested an automated methodology which was different from the previous, mainly correlation- and harmonization-based national level projects. The underlying principle behind the suggested methodology is the usage of the original soil recording data in order to identify the various WRB units, instead of the harmonization of each national soil classification unit. This approach requires the development of the necessary algorithms for each database with different methods.
The drawback of this method is that the development of algorithms is time-consuming and the errors resulting from the different methodologies can be corrected only to a limited degree.
The advantage, however, is that after setting up the system, a practically unlimited number of soil units can be classified automatically, thereby making possible the interpretation of larger databases (Waltner 2013).
Our aim was to plan and develop a freely accessible and usable web-based soil database and soil information system (SISCS, Soil Information and Soil Classifier System), which is suitable for the geovisualization of uploaded soil data and for determining reference soil groups (RSGs) in accordance with the WRB. In order to achieve this, we algorithmized the diagnostic soil classification process of the WRB, then we created decision trees to match input soil data to the WRB system.

System design and development
The main purpose of this research is to store soil data in a soil database, as well as to be able to display and classify these data through a service catalogue. This can be achieved in a reasonable way by establishing user groups that have different authorizations. Since our data have geospatial components, it is a natural requirement to geovisualize them with, for example, the help of a Google API. The application was created using the PHP (PHP Hypertext Preprocessor) and JavaScript languages, as well as HyperText Markup Language (HTML). The user interface was designed with the free web template of Medialoot. The database is hosted by the MySQL database system. The visual representation of the soil profiles is achieved via Keyhole Markup Language (KML). The applied technologies support free access to soil profiles, which can be shared among users and geovisualized on Google Maps. To satisfy this modern need for information, our goals also include the development of a classification application that uses chosen soil parameter lists. The classification is based on the correlation system of the International Union of Soil Sciences, the WRB (Balla et al. 2016).

Use case
The use case model demonstrates the user view of the system; the purpose of the use case is to describe the relationships between the system to be modeled and its environment. This is a functional diagram, that is, the functions to be implemented by the system are placed in the center (Jacobson et al. 1995). We used the use case diagram to determine who will use the system and for what purposes. During the planning phase, the creation of three types of actors was considered: Guest, SISCS user and Administrator. Due to the standardized interface, Guests, SISCS users and Administrators have common functions, one of which has three sub-functions and another has one sub-function. SISCS users and Administrators also share functions. Furthermore, three more sub-functions were added for the Administrators because of the administrative tasks of user management (Figures 1 and 2).

Database scheme
To build the soil database, the variables and identification keys of the FAO have been used (FAO 2006), which contain all necessary information for WRB classification (IUSS WRB 2014). In addition, to describe a soil profile in a complex manner, qualitative and quantitative information about the soil is required, some of which comes from field analysis while the rest comes from laboratory testing (Figure 3 and Table 1). The databases created in this way store the environmental descriptions of soil profiles, the genetic and diagnostic information, and the physical and chemical attributes of the soil separately (Balla et al. 2015a-2016).

Classification according to the WRB
The WRB is the modern diagnostic soil classification system used in Europe. The WRB (2015) is based on the previous classifications of the FAO while including many features of the USDA Soil Taxonomy, and can sort soils from all over the world into 32 distinct reference soil groups. The classification is diagnostics based; therefore, it does not sort on the basis of the processes which have resulted in the creation of the soil, but rather on the layers and measurable attributes of the soil itself. These are the diagnostic horizons, diagnostic properties and diagnostic materials, which are clearly defined properties that can be measured or described; therefore, the classification can be transformed entirely into algorithms that are automatic and objective.
To define the RSG, the system uses a three-step hierarchy. At the first level of the hierarchy, the RSG is chosen by a (usually complex) system of conditions. The system analyzes the soil profile for the conditions of the RSGs starting from the first one (Histosol) and keeps going through them until the conditions of an RSG are met. If the soil profile does not meet any of the criteria for the RSGs, it gets sorted into the last one (Regosol). After the proper reference group is found, the classification continues to the next hierarchy level, where it looks for the qualifiers that can be attributed to the soil. The qualifiers that can be applied to certain soil groups are limited, as some of these qualities can be ruled out simply by the nature of some groups, while other qualifiers are unnecessary because they are a common trait in every soil of the RSG. The qualifiers that can be used in certain RSGs are located and listed in the charts that contain the definition of the reference groups. Every group has both principal and supplementary qualifiers. The former can be displayed in front of the name of the RSG with a hyphen, while the latter is displayed after the name of the RSG in brackets and separated by a comma.
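The sequential key described above can be illustrated with a minimal Python sketch. It is not the SISCS implementation (which was written in PHP), and the field names and threshold values below are hypothetical placeholders rather than the actual WRB criteria; only the control flow (test the RSGs in key order, return the first match, fall back to Regosol) follows the text.

```python
from typing import Any, Callable, Dict, List, Tuple

Profile = Dict[str, Any]
Rule = Tuple[str, Callable[[Profile], bool]]

# Ordered key: each entry is (RSG name, condition).
# ASSUMPTION: field names and thresholds are illustrative placeholders,
# not the real WRB definitions, and only three of the RSGs are listed.
RSG_KEY: List[Rule] = [
    ("Histosol", lambda p: p.get("organic_thickness_cm", 0) >= 40),
    ("Leptosol", lambda p: p.get("depth_to_rock_cm", 999) <= 25),
    ("Chernozem", lambda p: "chernic" in p.get("diagnostic_horizons", ())),
]


def classify_rsg(profile: Profile,
                 key: List[Rule] = RSG_KEY,
                 fallback: str = "Regosol") -> str:
    """Walk the key from the first RSG onwards and return the first match."""
    for name, condition in key:
        if condition(profile):
            return name
    # No RSG criteria were met: the profile falls into the last group.
    return fallback


if __name__ == "__main__":
    print(classify_rsg({"diagnostic_horizons": ["chernic"]}))
    print(classify_rsg({}))
```

Because the first satisfied condition wins, the order of the list carries meaning: a profile that satisfies several conditions is assigned to the group that comes earliest in the key.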

Geovisualization of soil profiles
The soil profiles to be displayed on the map are represented by creating a KML file (Figure 4). The KML layer contains the data of the soil profiles stored in the database and is expanded when a new soil profile is added. For each position indicator, an information bubble displays further information recorded during soil profiling.
As the last phase of the geovisualization of the soil profiles, a KML template created on the basis of the Google API is displayed in the Google Maps interface integrated into the system (Table 2).
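As an illustration of how profile records could be turned into a KML layer, the following Python sketch writes one Placemark per profile using only the standard library. The record fields (profile_id, rsg, lat, lon) are assumptions made for this example; the actual SISCS KML template is the one given in Table 2 and may contain more elements.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def profiles_to_kml(profiles):
    """Build a KML document with one Placemark per soil profile.

    ASSUMPTION: each profile is a dict with the hypothetical keys
    'profile_id', 'rsg', 'lat' and 'lon'.
    """
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for p in profiles:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(p["profile_id"])
        # The description is what appears in the information bubble.
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = f"RSG: {p['rsg']}"
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{p['lon']},{p['lat']}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(profiles_to_kml([
        {"profile_id": "P1", "rsg": "Chernozem", "lat": 47.5, "lon": 21.6},
    ]))
```

Regenerating the whole document from the database each time a profile is added is the simplest way to keep the layer in sync; appending to an existing file would be an alternative design.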

Results
The web-based spatial soil information system (SISCS) has separate interfaces associated with the three different user accounts (Figure 5). The first level is the SISCS Guest account, which is assigned to users accessing the system through the web. SISCS Guest users can list the soil profile data stored in the database and can also download soil data. The Google Maps APIs and their integrated KML functions allow stored soil data and spatial soil databases to be visualized in KML format. With the available KML template, the information uploaded from the soil database automatically creates a KML file associated with the soil profile, which is automatically geovisualized together with the other soil profiles when the Google interactive web map is loaded.
The second level of the hierarchy is the SISCS User. The user is a member of the user group who can access the system with a username and a password. This level enables them to upload soil profile data with the use of an information form or to import soil data. For sharing, imported or uploaded soil data can be saved or updated, and the existing database can be expanded. The most valuable function is the automated determination of the RSG on the basis of the data entered. The highest account privilege belongs to the Administrator group. SISCS Administrators can manage the entire soil database and create user profiles.
The SISCS is accessed via a web browser through the URL http://siscs.exitdebrecen.hu/SISCSv2/. First, a so-called index page is displayed on the interface, which includes a brief description of the system and the contact information. Authorized users can log into their own interface by clicking on Login (Figure 6).
When developing the user interface, user friendliness was a high priority, so we divided the interface into two separate parts. On the left side, we can see the menu items created on the basis of functional requirements, while on the right side we can see the content, depending on the function selected. The functions of the system and accompanying menu system will be described by authorization types.

Functional interface of Guest
On this user level personal authentication is not required; therefore, anyone can access the entire database and download the soil data. Here users can browse soil profiles and data uploaded by SISCS users as well as geovisualized databases.

Profiles menu
In functional terms, the Profiles menu item can be divided into three parts as follows: Show Profiles, Results and Query functions.

Show Profiles menu
By accessing Show Profiles, the soil profiles of the database will be listed dynamically (Figure 7). Users can gain information on profile identifiers, their geographical locations, the depth of the exposed soil profile, its horizon sequence and reference group. By clicking on the Details link, the digital report regarding the given soil profile can be accessed, which enables users to review the properties of complex profile description (field and laboratory data and recorded diagnostics). Here data cannot be edited.

Results menu
The Results item offers a visual statistical overview on the percentage distribution of the currently stored soil profiles by RSGs, and their occurrence listed in numerical tables.
If the contents of the tables change, for example, due to a deletion or an update, the interface refreshes automatically. The third part of the interface is a dynamically listed table, which includes the Profile ID, reference group, description status and the name of the person who uploaded the soil profile data stored in the database. The soil data regarding soil profiles can be downloaded individually in PDF and KML formats (Figure 8).
Via the Profile Location link users can access the web-based visualization of soil data (Figure 9). Based on the selected data of the uploaded soil records, a KML management function responsible for the geovisualization of the soil profiles automatically displays a KML file on the web map, the visibility of which can be modified by the user. By clicking on the symbol on the map indicating the location of the soil profile, the information interface of the selected soil profile will appear.
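The statistical overview offered by the Results item (the number and percentage of stored profiles per RSG) can be sketched in a few lines of Python. This is only an illustration of the calculation; the function name and the input format (a plain list of RSG names, one per stored profile) are assumptions, and in SISCS the values would be read from the MySQL database.

```python
from collections import Counter


def rsg_distribution(rsgs):
    """Return {RSG: (count, percentage)} ordered by descending count.

    ASSUMPTION: 'rsgs' is a list with one RSG name per stored profile.
    """
    counts = Counter(rsgs)
    total = sum(counts.values())
    return {
        name: (n, round(100 * n / total, 1))
        for name, n in counts.most_common()
    }


if __name__ == "__main__":
    stored = ["Chernozem", "Chernozem", "Regosol", "Histosol"]
    for name, (n, pct) in rsg_distribution(stored).items():
        print(f"{name}: {n} profiles ({pct} %)")
```

Recomputing the distribution on every page load is one simple way to obtain the automatic refresh behavior after deletions or updates.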

Query menu
The Query item is a query interface on which the soil profiles stored in the database can be listed by setting the search/screening parameters (Figure 10). This is the most interactive part of the Guest interface, since here users themselves can decide which parameters should be applied for screening and displaying data (Table 3).

Links menu
In the Links item any user can open the following links: SISCS index page, IUSS (International Union of Soil Sciences) website, the website of the University of Debrecen, References and the Soil Geography seminar e-learning education material for students enrolled in the Geography BSc program.
Via the Gallery link users can open the profile photos uploaded by registered users (Figure 11).

Functional interface of SISCS User
The SISCS user is a member of the user group who can access the system with a username and a password. Their most important functions are to upload, save and edit, on their own interfaces, the soil profile data recorded during field sampling, and to have the RSG determined automatically on the basis of the data entered. Following login, three further subitems appear in the Profiles menu item (Add New Profiles, Import Profiles and Edit Profiles), which also indicate the functions associated with this user level.

Add new Profiles menu
On this interface, the SISCS user can upload the data recorded during the soil profile exposure in a structured format. The interface created can be divided into four larger units. In the Data of Profiles block users should enter data from the soil profile environment description and upload a photo (Figure 12).
The Information Table block assists users in filling out the forms; users can access a drop-down list which provides the information required for the interpretation of the selected soil parameters, helping to screen out incorrect parameters during data entry (Figure 13).
The Report and Diagnostic (Diagnostic Horizons, Diagnostic Properties and Diagnostic Materials) blocks are suitable for entering and describing genetic and diagnostic layers determined during field work. With these blocks, the identifier and the depth corresponding to the given profile are critical because the analysis of the recorded samples is also associated with the record with the given ID. The Add/Remove Layer/Horizons/Material/Properties buttons can be used to create or add new records or delete records from the horizon sequences of the profile (Figure 14).
In the last block of the data entry interface, users can add their own notes regarding profile exposure, which can be useful during later follow-up tasks. After filling out the record, users can save the soil profile by clicking on the Save button and classify the soil profile according to the WRB, based on the data entered, by clicking on the Classifier button. The RSG of the soil profile can be determined manually, or it can be classified manually according to the Hungarian classification system if needed (Figure 15). It should be noted that the quality management phase of the implemented classification algorithm checks that the input data are correct, and in the case of incorrect or inappropriately entered data it notifies the user about the incorrect parameters.

Edit Profiles menu
The Edit Profiles item shows the list of profiles uploaded by a logged-in user (Figure 16). On this interface users can delete or update profiles by clicking on the action buttons on the left side. By clicking on the Delete icon the selected soil profiles can be deleted from the database and consequently the dynamically presented interfaces will also be updated (e.g. Results, Profile Location). By clicking on the Edit icon users can begin to correct, update and save the data of the selected soil profile on the data entry form described above.

Import Profiles menu
The Import Profiles item can be used to import CSV files exported from the GeoMobilApp application. Here the environmental and diagnostic data of the profile should be uploaded together with the profile photo (Figure 17).

Operation of GeoMobilApp
After the initialization of the application, users can decide whether they want to review previous soil profiles on their mobile phones or want to define a new sampling profile. In the first case, the database management algorithm will list the previously stored profiles during the startup phase. Here these data can be edited as well (modification or deletion). In the case of a new sampling, the soil data corresponding to the profile can be determined through a process involving multiple screens (steps). At the end of the process, a picture can be taken of the soil profile using the camera of the device. At the end of every such process, the position of the profile is defined automatically and the most important information is saved in a KML file which can be reviewed later.

Functional interface of the Administrator
In addition to the previously described functions, on the Administrator interface one more function can be accessed. Users with Administrator authorization can manage the entire soil database and can create user profiles and assign to them either SISCS user or Administrator authorization. Furthermore, Administrators are able to modify, update or delete the data of the soil profile uploaded by any SISCS user.

Add New User menu
The form provided for the creation of user profiles is shown in Figure 18. Following the entry of administrative data, by clicking on the Save button an account is created through which the user can access his/her own interface (depending on their authorization).

Edit User menu
On this interface, users can delete or update user accounts by clicking on the action buttons on the left side. By clicking on the delete icon, the user will be deleted from the database and therefore will not be able to log in to the system. By clicking on the Edit icon, Administrators can begin to correct, update the authorization level and save the data of the selected account on the data entry form shown in Figure 19.

Process of automatic soil classification
The basic idea behind the classification algorithm was to divide it into separate parts. Since the larger tasks (such as handling several RSGs at the same time or separating the diagnostics) cannot be solved all at once, they had to be divided into smaller tasks. Therefore, the classification mechanism works on a step-by-step basis, starting from the top and refining downwards so that the smaller tasks (such as the diagnostic properties) are left for last. The specification of the task is therefore to create an algorithm that is capable of classification using the WRB system. The classification process as a whole can be divided into two separate parts. One is to develop a proper list of conditions for every reference group; the other is to assign the qualifiers to these groups, likewise using a list of conditions for each one.
The first step in the classification process is to describe the soil profiles and to store the soil data in the registry. The proper structure of the registry is essential for the accurate recording of the data, as the soil profile can only be classified by using this information. Data collection is most commonly the result of field work; therefore, obtaining the most accurate survey of these data is a primary concern. These data are supplemented by laboratory measurements that are either too difficult or outright impossible to perform in the field. On the second level of the algorithm, the diagnostic horizons, materials and properties must be established by using the lists of criteria associated with each of them (Figures 20 and 21).
In line with Krol et al. (2007), we used Limit check and Internal consistency check to filter extreme or incorrect values and complementary soil data, respectively (Table 4).
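The two kinds of checks can be illustrated with a short Python sketch. The parameter names, the permitted ranges and the specific consistency rules below are assumptions chosen for illustration; the rules actually applied in SISCS are those listed in Table 4.

```python
# ASSUMPTION: hypothetical parameter names and permitted ranges.
LIMITS = {
    "ph": (0.0, 14.0),
    "sand_pct": (0.0, 100.0),
    "silt_pct": (0.0, 100.0),
    "clay_pct": (0.0, 100.0),
}
TEXTURE_FIELDS = ("sand_pct", "silt_pct", "clay_pct")


def limit_check(horizon):
    """Flag values that fall outside their permitted range."""
    errors = []
    for field, (low, high) in LIMITS.items():
        value = horizon.get(field)
        if value is not None and not (low <= value <= high):
            errors.append(f"{field}={value} outside [{low}, {high}]")
    return errors


def consistency_check(horizon, tolerance=1.0):
    """Flag values that contradict each other within one horizon record."""
    errors = []
    # Particle size fractions should add up to roughly 100 %.
    if all(f in horizon for f in TEXTURE_FIELDS):
        total = sum(horizon[f] for f in TEXTURE_FIELDS)
        if abs(total - 100.0) > tolerance:
            errors.append(f"texture fractions sum to {total}, expected 100")
    # The upper boundary of a horizon must lie above its lower boundary.
    if "upper_cm" in horizon and "lower_cm" in horizon:
        if horizon["upper_cm"] >= horizon["lower_cm"]:
            errors.append("upper_cm must be smaller than lower_cm")
    return errors


if __name__ == "__main__":
    record = {"ph": 15.0, "sand_pct": 60, "silt_pct": 40, "clay_pct": 20,
              "upper_cm": 30, "lower_cm": 10}
    for message in limit_check(record) + consistency_check(record):
        print(message)
```

Returning a list of messages rather than raising on the first problem matches the behavior described earlier, where the user is notified about all incorrect parameters before classification.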

Conclusions
The main goal of the open-access system is to store data of soil profiles in a soil database and, at the same time, to geovisualize and classify them in an online environment. The purpose of organizing soil data into a database and importing it into the information system was to assess whether, following the establishment of decision rules, the data stored in the soil profiles can be used for international correspondence/harmonization. The implementation of automated classification was performed based on the methodology of Eberhardt and Waltner (2010), since the former, mostly correlation- and harmonization-based activities were usually not performed on the basis of the soil data recorded according to the WRB. Instead of the recommended harmonization of certain national soil classification units, we used the original soil recording data for the identification of WRB RSGs. The classifying algorithm was based on the principle of deconstruction into parts. In developing the classification mechanism, we used a top-down algorithm design method, in which the algorithm first tests whether the requirements of RSGs are met and then evaluates the criterion system of reference groups in accordance with the WRB. With respect to the evaluation of the decision-making rules created, it should be noted that these only represent a "best approximation" for the investigated soil profiles and do not replace the process of field data recording, description and classification conducted in accordance with the detailed WRB methodology. However, until such data are available in a sufficient amount and with appropriate spatial density, the system provides a good opportunity for preliminary estimations.