Data mining applications in university information management system development

Modern management is being promoted to resolve the problem of unreliable information transmission and to improve work efficiency. Its basic aim is to make the school more effective in its role of training talent and serving society. This article focuses on the application of data mining (DM) in the development of information management systems (IMS) in universities and colleges. DM provides powerful approaches for a variety of educational areas. Because the large amount of student information can be used to derive valuable patterns relevant to student learning behavior, research in the field of education is continuously expanding. Educational institutions can use educational data mining to assess student performance, which helps the institution recognize students' accomplishments. In DM, classification is a well-known technique that has been regularly used to determine student achievement. In this study, the DM process and the application of association rules are introduced in the development of IMS in universities and colleges. In the experiment, the course set covers all subject areas, the minimum transaction support count is 2, and the minimum confidence (min conf) is 70%. The results suggest that students who choose one course also tend to choose a related course. The application of DM theory to university information will greatly increase the data analysis capability of administrators and improve the management level.


Introduction
With the advancement of modern management theory and decision-making science, and their application in university and college management, universities and colleges will shift from experience-based management to scientific, information-based management built on modern management theories and methodologies [1]. In light of the current situation, the information management method has been consistently supported in universities and colleges, and teaching information management systems (IMS) have been established one after another [2]. However, as the number of students managed and the time spent using the teaching management information system increase, a large amount of teaching-related management data is collected. Education has become more flexible and broad as college enrollment continues to expand [3]. Most universities and colleges face a conflict between rising student numbers and tightening teaching resources, which poses unprecedented difficulties for education management. As a result, the IMS is growing in popularity and recognition among teachers and students because it allows them to work together effectively [4]. Previously, school education management (SEM) focused only on the particularity of the educational field, emphasized an experience-based management mode, and to some extent ignored the commonalities between education management and general management [5]. For example, most universities and colleges have not established the IMS. 
It is an important modern management tool; yet, in sharp contrast, many universities and colleges are themselves developing various management information systems for government and enterprises [6]. Many universities still do not realize the importance of management information systems in university management [7]. Faced with a "mountain" of collected data, traditional data analysis approaches have drawbacks in both time and space: they cannot cope with the volume, and they do not produce results that administrators can understand. These data need to be used effectively; otherwise an increasingly serious "data disaster" forces school administrators into "decision disaster" responses. In fact, whatever countermeasures are used, they are motivated by a sense of helplessness. Data mining (DM) technology advances the use of data from basic querying to more advanced applications such as prediction, decision assistance, and analysis [8]. As science and information technology have continued to develop, the IMS has gradually been applied to university education. DM is a process of cooperation among various experts and also a process of high investment in capital and technology. The process is iterative: with each repetition, the essence of the problem is approached more closely and solutions are progressively refined [9]. Subdividing and reorganizing the data adds to and splits the selected records. These records are chosen during data exploration using clustering, neural network (NN), decision tree (DT), mathematical statistics, time series, and visualization analyses. Further, a comprehensive interpretation is found to be effective in evaluating data knowledge, data sampling, data exploration, and data adjustment modeling [10]. 
Most universities and colleges now have an IMS in place, which has essentially overcome the problems and drawbacks of old-fashioned teaching management [11].

Motivation
In universities and colleges, the purpose of teaching information management (TIM) is to maximize the utilization of teaching resources and to leverage the benefits of diverse resources to attract highly skilled, high-quality individuals for the country and society [12]. DM technology is a novel research area with a wide range of applications, produced by the integration and intersection of multiple fields. It integrates database methods, artificial intelligence (AI) approaches, mathematical statistics, and visualization. With the expansion of educational organizations and the continuing development of information technology, higher education is receiving an influx of teaching resources, and more rigorous standards for teaching management are being imposed [13]. How well a school uses information technology to serve TIM and decision making has become a key indicator of its strength and of its teaching management level. Popularizing and using DM technology is a way of studying the informatization of teaching management at universities and colleges under new circumstances, and it has significant practical implications [14]. Implementing scientific DM methods and practices will undoubtedly become the aim and primary task of the next stage of education management informatization and virtual campus development in Chinese institutions [15].
In view of this research problem, Mago and Giabbanelli describe DM as a way to extract hidden and potentially usable information and knowledge from a vast amount of incomplete, noisy, fuzzy, and random data [16]. Li et al. conducted an extensive study of the use of fuzzy approaches in knowledge discovery [17]. Yanhao et al. studied data cube algebra [18]. On the basis of current research, this article introduces DM and its process, studies the application of association rules in the development of a university information management system (UIMS), and shows that applying DM theory to university information will greatly increase the data analysis ability of managers and improve the management level [19].
The remainder of this article is structured as follows. Section 2 reviews related studies on educational DM. Section 3 describes the methods of DM in UIMS. Section 4 presents the technological process of UIMS based on DM technology. Section 5 presents the results and experimental test, and Section 6 concludes.

Related studies
DM technique is the most accurate method for evaluating usable data in the data warehouse [21]. DM is utilized to forecast hidden information using an extraction process in terms of improving decision making [22]. The use of DM for educational activities has been increased based on staff decisions, student performance, and administration decisions [23]. Data-driven knowledge discovery is a paradigm that can be applied to DM [24][25][26]. DM is a multifaceted area that encompasses a variety of topics such as statistics, AI, information technology, learning, data visualization, and retrieval [27].
Because of the enhanced mining application, the educational system has become more balanced [28]. In the sphere of education, the concept of educational data mining (EDM) has rapidly evolved in relation to many types of educational organizations [29]. Further, an academic analyst has been linked to institutional efficiency and student performance issues [30][31][32][33][34][35][36][37][38][39]. EDM [40] encompasses all aspects that have a direct impact on students at the school. Table 1 reviews ten earlier studies, published from 2016 to 2021, that used DM techniques in educational settings.

DM in educational informational management system
EDM is a branch of DM whose main focus is on constructing systems that obtain hidden information from student records, which may then be used to improve students' academic performance. In the EDM process, raw data collected from educational organizations are transformed into useful information for students, their parents, instructors, educational software developers, and educational researchers. EDM can also be thought of as a system that is part of the current education system and is capable of generating beneficial interactions with its various elements, which allows it to eventually achieve its goal of improving education [59]. EDM is described as the application of standard DM techniques to educational data to solve problems in the field of education [60]. Examples of EDM applications include the construction of systems based on e-learning technology [60,61], the clustering of educational data [62], and the prediction of student performance [63]. The techniques used in educational DM fall into categories such as association rule analysis (ARA), sequential patterns, prediction, clustering, classification, and machine learning.

Educational settings based on DM techniques
Clustering [23,41,64–70], classification [44,64,71–73], sequential pattern [67,74,75], prediction [42,67], ARA [44,64], machine learning [76], and ANN [77] are the most well-known DM techniques. From 1995 to 2005, the bulk of studies on educational DM used the ARA technique [78] because it required less knowledge than other techniques [68]. Nonetheless, by the beginning of 2005, the tendency had shifted, and academics were increasingly using clustering and classification approaches for analysis [79]. An association rule analysis usually generates a set of outputs, the bulk of which are uninteresting and difficult to interpret for non-DM users [80]. Researchers must first create the data and check that it is consistent with the desired output before selecting the optimal algorithms [79]. Because data splitting is not required in this procedure, they can use the clustering approach instead of the classification approach when their inquiry is of a modest scale [79]. Furthermore, using the same database as in an earlier study [67], the researchers can always evaluate with alternative algorithms. This would make it easier to see if the same outcomes might be achieved with a different approach.

Methods of DM in UIMS
The number of students has grown dramatically as a result of the continual expansion of enrollment in universities and colleges, considerably increasing the workload of management staff in all parts of the school. DM technology is now widely used in a variety of industries, particularly in college and university teaching management systems, where it helps staff grasp students' basic information, understand students' learning characteristics, and set up courses efficiently. The traditional management approach based on manual practice can no longer meet the demands of today's work. It has a number of flaws, including inefficiency and lack of confidentiality. Furthermore, a great amount of data and files is produced over time, posing significant challenges for searching, updating, and managing them. The UIMS is a critical component of every educational institution, and its content is critical to school managers and decision makers. As a result, the UIMS model should give users sufficient information and query options.

DM technique
DM technology has a wide variety of applications in modern education management, and its use is now the most important task and goal in the development of teaching management in Chinese institutions. Instructors should pay special attention to students' physiological and psychological characteristics and then fully engage students in their dominant role before fully engaging teachers in their auxiliary and leading roles. Information that is processed by hand can become haphazard, which makes workers' jobs more difficult and the data more jumbled. Users should extract data so that the data can be used more effectively in people's lives. DM technology extracts the data information needed for teaching from a large volume of geographic data based on a geographical database using identification technology and statistical methodologies. By constantly evaluating and identifying this information, the most practical data processing method is eventually discovered, providing scientific data support for university and college leaders and for the functional departments of teaching management. As DM and knowledge discovery research has progressed, it has come to rest on three powerful technical pillars: mathematical statistics, databases, and AI. The main current research topics in DM and knowledge discovery are basic theory, reuse and maintenance of discovered knowledge, data warehouses, discovery algorithms, visualization technology, quantitative and qualitative exchange models, knowledge representation methods, knowledge discovery in unstructured and semistructured data, and online DM [20]. The most common types of knowledge discovered by DM are as follows:

Generalization knowledge
Generalization knowledge is generalized descriptive knowledge of category features. Starting from the microscopic characteristics of the data, DM discovers universal, higher-level knowledge at the meso and macro levels, which reflects the common nature of similar things and is the generalization, refinement, and abstraction of the data.

Association
Association knowledge expresses interdependencies or links between events. When two or more attributes are associated, the value of one can be predicted from the values of the others. The best-known approach to discovering association rules has two phases. The first phase iteratively identifies all frequent item sets, whose support must be no lower than the user-specified minimum support. The second phase builds rules from the frequent item sets whose confidence is no lower than the user-specified minimum confidence. Identifying all frequent item sets is the core of the association rule discovery algorithm and also its most computationally expensive part [81].
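For reference, the two thresholds used in these phases are usually defined as follows. These are the standard textbook definitions rather than formulas quoted from the source; D denotes the transaction set, and A and B are disjoint item sets.

```latex
\mathrm{support}(A \Rightarrow B) = \frac{\lvert \{\, T \in D : A \cup B \subseteq T \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
```

For instance, with |D| = 9 transactions, a minimum support count of 2 corresponds to a minimum support rate of 2/9, or about 22%.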

Introduction to DM process
The general contents of each step in the DM process are as follows:

Determine business objects
The first stage in DM is to clearly define the business problem and to understand the goal of mining. Although the eventual result of mining is unpredictable, the issues to be addressed should be anticipated. DM carried out without a defined purpose is blind.

Data preparation
• Data selection: By searching all external and internal data information linked to business items, data suitable for DM applications were selected.
• Data preprocessing: The data quality was examined to prepare for progressive analysis. Also, the type of mining activity to be carried out was decided.
• Data transformation: Data were transformed into a model for analysis. The mining algorithm is built up in this analytical model. It is the key to the success of DM to build an analysis model that is really appropriate for mining algorithm.

DM
The transformed data are mined. Apart from selecting and refining the appropriate mining algorithm, all of the work can be done automatically.

Technological process of UIMS based on DM technology
The technical process of UIMS based on DM technology is described below:

Search for frequent item set algorithm
The algorithm that searches for frequent predicate sets on a data cube, based on the Apriori algorithm, is called the AprioriCube algorithm. It differs from the Apriori algorithm in how the support of a predicate set is calculated. In multidimensional association rule mining based on a data cube, a predicate set is a combination of members from different dimensions of the data cube, (d_1, …, d_n | count), and the support count of a predicate set is the frequency measure value stored in the corresponding cube cells.
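The support calculation described above can be sketched as follows. This is a minimal illustration, not the AprioriCube implementation: the cube contents, the two dimensions (`title`, `degree`), and the function name `predicate_support` are all assumptions made for the example. The support count of a predicate set is obtained by summing the count measure over the cells that match it.

```python
# Hypothetical data cube: each cell maps a tuple of dimension members
# (title, degree) to its stored count measure. The values are made up.
DIMENSIONS = ("title", "degree")
cube = {
    ("professor", "doctorate"): 4,
    ("professor", "master"): 1,
    ("lecturer", "doctorate"): 2,
    ("lecturer", "master"): 3,
}


def predicate_support(cube, predicates):
    """Sum the count measure over all cells matching the given predicates.

    `predicates` maps a dimension name to the required member, for example
    {"title": "professor"}; dimensions not mentioned are aggregated over.
    """
    total = 0
    for members, count in cube.items():
        cell = dict(zip(DIMENSIONS, members))
        if all(cell[dim] == value for dim, value in predicates.items()):
            total += count
    return total


one_dim = predicate_support(cube, {"title": "professor"})
two_dim = predicate_support(cube, {"title": "lecturer", "degree": "master"})
```

Because the counts are read from the cube rather than recomputed from raw records, no scan of the original transactions is needed in this sketch.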

Find frequent item sets
A one-dimensional table is created, where y_1 represents the attribute of title or degree and sup represents support. A table is created where y_1 and y_2 represent the attributes of title or degree and sup represents support. A three-dimensional table is created where y_1 and y_2 represent the attributes of title or degree and sup represents support. One-dimensional members belonging to infrequent item sets are deleted. Here, minsup is the minimum support frequency and RECC is the number of records [83].

Apriori algorithm is used to determine the realization process of the correlation between the courses selected by students

Objectives
According to the data on students' course selection and preselection in the student course selection management database, the association rules between courses are mined to determine the association relationships in students' course selection, which provides the basis for publicity planning and course classification.

Determine the type of DM
Assume that the universe is the set of courses available from distance education providers, and that each course has a Boolean variable indicating its presence or absence. Each course selection record can then be represented by a Boolean vector. These Boolean vectors can be analyzed to obtain course selection patterns that reflect which courses are frequently associated, and these patterns can be expressed as association rules. Therefore, to find the association relationships between the courses selected by students, we can mine association rules for the selected courses in the student elective management database. Because only the one-dimensional data of the courses selected by students need to be considered, the method adopted in this article is as follows: first, the Apriori algorithm is used to find the frequent item sets, and then association rules are generated from the frequent item sets [84].
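The Boolean-vector representation of a course selection record can be sketched as below. The course list and the helper names `to_boolean_vector` and `to_item_set` are illustrative assumptions, not part of the described system.

```python
# Hypothetical course universe, using the I1, I2, ... naming style of this article.
COURSES = ["I1", "I2", "I3", "I4", "I5"]


def to_boolean_vector(selection, courses=COURSES):
    """Return a 0/1 list marking which courses appear in one selection record."""
    chosen = set(selection)
    return [1 if course in chosen else 0 for course in courses]


def to_item_set(vector, courses=COURSES):
    """Inverse mapping: recover the set of selected courses from a 0/1 vector."""
    return {course for course, flag in zip(courses, vector) if flag}


vector = to_boolean_vector(["I3", "I4"])
```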

Process
The process of UIMS is explained in Algorithm 1.

Algorithm 1
Step 1: Determine the target data of DM: the data on students' elective courses and preselected courses in the student elective management database, including course names and learning seasons.
Step 2: Collect task-related data sets based on the following relational query.
Step 3: Determine the minimum support threshold minsup.
Step 4: Use Apriori algorithm to discover the frequent itemsets.
Step 5: Make association rules from the frequent itemsets.
Assume that the number of tuples selected is 9, that is, |D| = 9. The tuple identifier is represented by TID and stored in lexicographical order, as shown in Figure 1.
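Steps 3 and 4 can be sketched in a few lines of Python. The nine transactions below are invented for illustration (the actual contents of D are given in Figure 1 and are not reproduced here), and the function name `apriori` is an assumption; the join and prune steps follow the usual textbook formulation of the Apriori algorithm.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(item set): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the transactions and keep candidates that reach min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    current = count({frozenset([item]) for item in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Made-up example with |D| = 9; not the data shown in Figure 1.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
frequent = apriori(transactions, min_count=2)
```

The returned dictionary keeps the support counts, which Step 5 needs in order to compute rule confidence.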

Result and experimental test
The courses cover all areas: {I1 = "computer graphics," I2 = "image processing," I3 = "computer-aided design," I4 = "IC Computer-Aided Design (ICCAD) software tools," I5 = "semiconductor theory," I6 = "large-scale analog integrated circuits," I7 = "accounting," I8 = "computer fundamentals," I9 = "architectural design"}. D is obtained through a relational query from the student course selection management database, as shown in Figure 2. With a minimum transaction support count of 2 and min conf = 70%, the output rules are I3 ⇒ I4 and I9 ⇒ I3; that is, students who choose the course "computer-aided design" also tend to choose the course "ICCAD software tools," and students who choose the course "architectural design" also tend to choose the course "computer-aided design."
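The way a confidence threshold such as min conf = 70% filters candidate rules can be sketched as follows. The support counts are invented for illustration and chosen so that the same two rule shapes emerge; they are not the values behind Figure 2, and `generate_rules` is a hypothetical helper name.

```python
from itertools import combinations


def generate_rules(support, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf.

    `support` maps frozenset item sets to support counts and must contain
    every nonempty subset of each item set with two or more items.
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count / support[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical support counts (assumed for the example only).
support = {
    frozenset(["I3"]): 4,
    frozenset(["I4"]): 5,
    frozenset(["I9"]): 2,
    frozenset(["I3", "I4"]): 3,
    frozenset(["I3", "I9"]): 2,
}
rules = generate_rules(support, min_conf=0.7)
```

With these counts, I3 ⇒ I4 has confidence 3/4 and I9 ⇒ I3 has confidence 2/2, so both pass the threshold, whereas I4 ⇒ I3 (3/5) and I3 ⇒ I9 (2/4) are discarded.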

The design and improvement of Apriori algorithm based on array is applied in the analysis of test scores
Because the algorithm operates on Boolean variables, the multivalued attributes must first be converted to Boolean variables. Taking the early score as an example, this study divides it into three levels, good, medium, and poor, represented by I1, I2, and I3, respectively. Gender is represented by I4 (male) and I5 (female). Daily homework is also divided into three levels, good, medium, and poor, represented by I6, I7, and I8. Class performance is likewise divided into three grades, good, medium, and poor, represented by I9, I10, and I11. Rewards and punishments are classified into three levels, represented by I12, I13, and I14. The final score is divided into excellent (above 85 points), good (70–84), passing (60–69), and fail (below 60 points), represented by I15, I16, I17, and I18, respectively. Here, 1 represents "yes" for a Boolean variable and 0 represents "no," which has the advantage of facilitating array operations. Thus, a student whose early score is medium, gender is male, daily homework is good, class performance is good, rewards and punishments are not good, and final score is good is represented as 010101001000100100. The early score is the average of the previous comprehensive scores, and 80 points or more counts as good. The program preprocesses the data and reads the processed data into an array. Table 2 shows a view of the data converted to Boolean values [85] (Table 3).
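The encoding of one student record into the 18 Boolean items can be sketched as follows. The attribute keys and the function name `encode` are illustrative assumptions. In particular, the level names for rewards and punishments are placeholders, since the text gives only their item identifiers; `level_2` is used because the example string in the text sets I13.

```python
# Attribute groups in the order I1..I18. Keys and level names are assumed
# for illustration; the reward/punishment level names are placeholders.
GROUPS = [
    ("early_score", ["good", "medium", "poor"]),                # I1-I3
    ("gender", ["male", "female"]),                             # I4-I5
    ("homework", ["good", "medium", "poor"]),                   # I6-I8
    ("class_performance", ["good", "medium", "poor"]),          # I9-I11
    ("rewards", ["level_1", "level_2", "level_3"]),             # I12-I14
    ("final_score", ["excellent", "good", "passing", "fail"]),  # I15-I18
]


def encode(record):
    """Return the 18-character 0/1 string for one student record."""
    bits = []
    for name, levels in GROUPS:
        value = record[name]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for {name}")
        bits.extend("1" if level == value else "0" for level in levels)
    return "".join(bits)


student = {
    "early_score": "medium",
    "gender": "male",
    "homework": "good",
    "class_performance": "good",
    "rewards": "level_2",
    "final_score": "good",
}
code = encode(student)
```

Each attribute contributes exactly one 1, so every encoded record contains six 1s across the 18 positions.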
Here are the specific steps for DM: 1) The occurrences of each item are counted and its support is calculated. Items that meet the minimum support are stored in the array P1, that is, the frequent 1-item set is generated, and items that do not meet the minimum support are deleted. In this way, the two-dimensional array is compressed.
2) Each subsequent frequent item set is generated in two steps: candidate item sets are generated first, and frequent item sets are then derived from them. Specifically, first, the frequent (n − 1)-item sets (n ≥ 2) are joined with one another to generate the candidate n-item sets, which are stored in the array Pn in ascending order. Second, the support of each candidate n-item set is obtained by scanning the two-dimensional array and is recorded in the array Pn. Finally, rows and columns whose support count is less than the minimum support min sup are deleted. This is repeated until all frequent item sets are found; when the number of candidate item sets is zero, the operation stops. The frequent item sets of all sizes are then printed. The number of times the program scans the transaction array depends on the maximum length of the frequent item sets.
3) The confidence of the rule derived from each nonempty subset of the final frequent item sets is calculated, the rules below the minimum confidence threshold are deleted, and the remaining rules are output.
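Steps 1) and 2) operate directly on the Boolean array, which can be sketched as below for the first two passes. This is an illustrative reading of the description under stated assumptions, not the program used in the study: the rows are made up, plain Python lists stand in for the arrays P1 and Pn, and the function names are hypothetical.

```python
from itertools import combinations


def frequent_columns(rows, min_count):
    """Step 1: sum each column and keep the indices that reach min_count."""
    totals = [sum(column) for column in zip(*rows)]
    return [i for i, total in enumerate(totals) if total >= min_count]


def support_count(rows, columns):
    """Count the rows in which every listed column holds 1."""
    return sum(1 for row in rows if all(row[c] for c in columns))


def frequent_pairs(rows, min_count):
    """Step 2 for n = 2: join the frequent columns and keep the frequent pairs."""
    keep = frequent_columns(rows, min_count)
    result = {}
    for pair in combinations(keep, 2):  # pairs come out in ascending order
        n = support_count(rows, pair)
        if n >= min_count:
            result[pair] = n
    return result


# Made-up Boolean rows (one row per record, one column per item).
rows = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
p1 = frequent_columns(rows, min_count=2)
p2 = frequent_pairs(rows, min_count=2)
```

Dropping infrequent columns before the join is what keeps the candidate set small: column 1 never appears in any pair because it was removed in the first pass.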
University managers or teachers can use management information systems to identify students who need special attention and to lay the foundation for providing the psychological education they need to succeed. DM is an unavoidable stage in the evolution of management informatization in universities and colleges. Universities must use electronic information technology to increase management efficiency and quality, establish a sound education IMS, and perform in-depth analyses of campus resources and data using modern DM approaches to enhance the informatization of management. To properly promote the development of management informatization in universities, the educational or research management system should be combined with the actual state of the school, the bottleneck issues in DM should be fully understood, the precision and quality of data analysis should be improved, and the true role of DM should be fully realized [86].
To increase the degree of intelligence in management, it is critical to build and improve a UIMS with a higher functional level, to apply advanced and scientific statistical analysis tools, and to use advanced and scientific mining technology to analyze management data in depth [87][88][89].

Conclusion
This study presents research on the application of DM in the development of university IMS. First, it introduces DM and its process and studies the application of association rules in UIMS. In the course selection experiment, the courses cover all areas: {I1 = "computer graphics," I2 = "image processing," I3 = "computer-aided design," I4 = "ICCAD software tools," I5 = "semiconductor theory," I6 = "large-scale analog integrated circuits," I7 = "accounting," I8 = "computer fundamentals," I9 = "architectural design"}. D is obtained through a relational query from the student course selection management database, as shown in Figure 2. With a minimum transaction support count of 2, the output rules show that students who choose the course "computer-aided design" also tend to choose the course "ICCAD software tools," and students who choose the course "architectural design" also tend to choose the course "computer-aided design." The soundness of the conclusions drawn by this method depends heavily on the data. As the amount of historical data grows and curriculum planning improves, a virtuous circle forms that makes the combination of curriculum modules more rational. The application of DM theory to university information databases can be further deepened, and the relevant DM algorithms can be optimized according to the characteristics of these databases. To play a greater role in talent evaluation, discipline echelon construction, post management, and the formulation of management information policy, DM should be carried out from different angles, such as multiple levels and multiple dimensions. In future work, different machine learning techniques will be used to develop UIMS.

Table 2: View of the data converted to Boolean values
Xh  I1  I2  I3  I4  I5  I6  I7  I8  I9  I10  I11  I12  I13  I14  I15  I16  I17  I18
1   1   0   0   1   0   1   0   0   1   0    0    1    0    0    1    0    0    0
2   0   1   0   1   0   0   1   0   0   1    0    0    1    0    0    1    0    0
3   …

Conflict of interest:
Authors state no conflict of interest.