Technical review of supervised machine learning studies and potential implementation to identify herbal plant dataset

The use of technology in everyday life is unavoidable, considering that technological advancement occurs very quickly. The current era is also known as industry 4.0. In the industry 4.0 era, there is a convergence between the industrial world and information technology. The use of modern machines in the industry makes it possible for business actors to digitize their production facilities and open up new business opportunities. One of the developments in information technology that is being widely used in its implementation is machine learning (ML) technology and its branches such as computer vision and image recognition. In this work, we propose a customized convolutional neural network-based ML model to perform image classification technique for Indonesian herb image dataset, along with the detailed review and discussion of the methods and results. In this work, we use the transfer learning method to adopt the open source pre-trained model, namely, Xception, developed by Google.


Introduction
The development of information technology in the world is happening very fast. This is indicated by the increasing global market demand for computer technology products of various types (e.g., smartphones, personal computers (PCs), and laptops) and the number of new research works on the topic of information technology [1]. One of the main reasons for this phenomenon is that the effective use of information technology can improve performance and enable various activities to be carried out quickly, precisely, and accurately, which will ultimately increase productivity. The new industrial era that the world is currently facing is closely related to many types of technology in various fields. This new era is called the era of the industrial revolution 4.0. The fourth industrial revolution is usually identified with widespread digitalization, which allows the integration of various processes and increases the efficiency of the production process [2][3][4][5]. The industrial era 4.0 is also identified by the growth of computer technology and internet access across various segments of society worldwide, as shown by the research of Kaya and Aydin in Figure 1 [6].
The sustainability of the industrial revolution 4.0 era is also closely related to the potential use of artificial intelligence (AI) technology, which is the driving force of each of the pillars of the 4.0 industrial revolution. The nine pillars of industry 4.0 cannot be separated from AI technology. And within the AI circle, there is a branch that has taken many steps of advancement so far, i.e., machine learning (ML). ML itself is a branch of AI technology designed to imitate human intelligence by studying patterns from a particular dataset. The machine will be able to predict an outcome based on the dataset pattern. The difference between the programming flow in ML and traditional programming is described in Figure 2. The relationship between the industrial revolution 4.0 and AI is illustrated in Figure 3.
In the recent times of industrial era 4.0, one of the branches of ML technology that is being widely used is computer vision and image recognition. AI features developed by technology companies, such as Tesla's autonomous driving and facial recognition in Google Photos, are both applications of computer vision and image recognition. The potential use of computer vision in industry and engineering is still considered very broad, ranging from quality control and sorting goods to recognizing the shape or elements of a material through images.
Our work in this study is building a model that adopts image recognition and classification algorithms to perform classification using convolutional neural network (CNN) principles and a supervised learning method. We have built a model that is suitable for performing image classification on a particular Indonesian herb image dataset, which will be listed in Section 3. The reason for choosing an Indonesian herb image dataset to be trained by our model is that this commodity is currently one of the main focuses of the development of micro, small, and medium enterprises (Indonesian: UMKM) for making traditional medicine, especially in the time of the global pandemic [7]. We propose this work to address that concern from the technological side by performing image classification of Indonesian herbs, which can potentially help and inspire other image classification-based research on Indonesian herbs in the industry 4.0 era.

Review of literature
ML is becoming a superior technology that can be applied in various other fields today. Inventions and new ways of using ML continue to emerge at a fast pace. Jordan and Mitchell mentioned that ML is one of the fastest growing technical subjects today [8]. ML has a place at the intersection of computer science and statistics and is at the core of AI and data science. Along with AI technology, ML is the main method for developing new technologies. Jordan and Mitchell also mentioned some examples of newly developed technologies such as speech recognition, face or image recognition, object detection, natural language processing, autonomous robots, stock market price analysis, and social experimentation. These inventions and technologies have been discovered due to growing ML research works. All of the abovementioned examples of ML applications (which have only been discovered in the last few decades) offer an insight into the potential of ML in the future. Mjolsness and DeCoste stated that ML continues to develop, as shown by the deeper clarity of the mathematical approaches behind ML processes and the increase in successful uses (software or hardware) over the last 20 years [9].
ML approaches such as data clustering, neural network classifiers, and nonlinear regression have provided very wide applications in engineering, business, and science practice. And especially from a science and research perspective, ML has the potential to strengthen the understanding and clarity in research. ML will also create an intelligent computing system with general analytic capabilities from scientific thinking. A good observation method and inferring hypotheses cannot be carried out solely by relying on human abilities when a study will involve very large volumes of data and high data acquisition rates. In this case, ML is a very powerful tool to solve those kinds of problems.
ML, according to Mahesh, is the scientific study of algorithms and statistical models that computer systems use to perform certain tasks without having to be programmed explicitly. ML is used to teach machines how to process data more efficiently [10]. And in today's digital era, various kinds of data are available in abundance, and with the abundant availability of these datasets, the demand for using ML methods is increasing rapidly. The purpose of ML itself is to study the data, so that a program or conclusion is generated that represents the dataset. This clearly provides many breakthroughs in various activities that involve a lot of data. According to El Naqa and Murphy, ML is also a way for computers to imitate the way humans (and other living things) learn to process sensory signals/data given (input) to achieve a certain goal [11]. ML methods are seen as part of AI technology. Koza et al. also stated that ML algorithms generally build a model based on a sample of data so that the model can make predictions or decisions without having to be programmed directly and repeatedly [12].
ML relies on different algorithms to solve different data problems. ML has several types of algorithms, which, according to Ayodele, are organized into a taxonomy based on the desired results from the algorithm itself [13]. Examples of commonly known types of algorithms include supervised learning, unsupervised learning, semisupervised learning, reinforcement learning, transduction, and learning to learn. This technology is also experiencing rapid development and becoming one of the features most widely used by modern technology-based companies, such as in Tesla cars, where the technology is used in various parts of the car in order to achieve a reasonable construction of a 3D model of the environment [14], and in Apple's Siri speech recognition system, which uses neural networks that convert the acoustic pattern of the user's voice at each instant into a probability distribution over speech sounds [15]. In this article, we will discuss the development of image classification-based applications with supervised learning algorithms in more detail. Supervised learning is a type of ML algorithm that is set to produce a function that maps the given input to the desired output. Supervised learning is the most common technique for training neural networks and decision trees. Likewise, according to Mohammed et al., the target of supervised learning is to infer a function or mapping from training data that have been labeled [16]. Training data consist of an input vector X and an output vector Y in the form of labels or tags. Each label or tag in the Y vector describes the corresponding input example in the X vector. This is why it is called supervised learning: each training example in the training data already has a name or label in the Y output vector. Each name or label has been assigned in advance by the supervisor (the supervisor here can be a human or a robot).
This supervised learning algorithm underlies the development of image classification-based applications for herbal plants, which will be discussed in this article. Description of a supervised learning algorithm in a flow chart is generally shown in Figure 4.
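To make the mapping from labeled inputs X to outputs Y more concrete, the following is a minimal sketch in plain Python of a nearest-centroid classifier. It is not the model developed in this work: the feature vectors, the two labels, and the function names fit_centroids and predict are illustrative assumptions chosen only to show the supervised learning flow (labeled training data in, a predictive mapping out).

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, Y):
    """Learn one mean vector (centroid) per label from labeled examples."""
    grouped = defaultdict(list)
    for x, y in zip(X, Y):
        grouped[y].append(x)
    # Average each feature column within a label group.
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Map an unseen input to the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical two-feature inputs (X) with labels (Y) assigned in advance
# by a supervisor; real inputs in this work would be images.
X = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.2, 3.9]]
Y = ["ginger", "ginger", "turmeric", "turmeric"]

centroids = fit_centroids(X, Y)
print(predict(centroids, [0.9, 1.1]))
print(predict(centroids, [4.1, 4.0]))
```

The learned centroids play the role of the function that maps an input to a label; a neural network replaces this simple rule with a far more expressive mapping, but the training data still take the same (X, Y) form.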
One branch of ML applications with supervised learning algorithms that is also currently being researched and used in various fields is computer vision technology. According to Szeliski, computer vision is widely used in very broad and diverse real-world applications, such as optical character recognition, where machines can read handwritten letters and the numbers on vehicle license plates [17]. It can also be used for the inspection of components or parts for quality assurance, and in automotive safety systems, which are used by vehicles to detect unexpected obstacles on the road. There are many more applications of computer vision that are being intensively used in various sectors. Computer vision is a field of science that discusses how computers can understand the various features of an object in an image or video. Huang states, from the point of view of engineering science, that computer vision is aimed at building automated systems capable of performing tasks that the human visual system can perform [18]. A flow diagram of the computer vision process is depicted in Figure 5.
Like ML in general, computer vision requires a collection of data from which the machine/computer can learn the patterns to be analyzed. The dataset needed for computer vision techniques is in the form of digital images of the object to be analyzed. There are several official websites available on the internet that provide various digital image datasets to train computer vision machines, such as Kaggle, ImageNet, and opensource.google. The tasks performed by machines in computer vision, according to Klette, include methods for acquiring, processing, analyzing, and understanding a digital image, as well as extracting dimensional data from the real world to produce numerical or symbolic information that the computer itself can understand [19]. Applications of AI can also be found in road transport, especially when planning routes in autonomous driving [20][21][22][23], logistic flows [24], industrial engineering and processing [25][26][27], and robotization in manufacturing [28,29].

Methodology
The methodology used for making the machine vision features of the Herbify application consists of three parts: (1) searching for digital image datasets, (2) creating programming scripts, and (3) training and validation processes. In training the computer to understand the dataset, the computer performs the algorithm specified in the programming script to be created. These three activities are carried out in online cloud environments, namely, Google Colab and Google Cloud Platform (GCP). The step-by-step process is explained generally in

Dataset creation (data harvest)
The dataset required is a collection of digital images of various kinds of herbal plants. Searches were carried out on open-source sites that provide digital image datasets (Kaggle, ImageNet, and opensource.google), but the desired results were not obtained. Thus, the digital image dataset was collected by downloading pictures of herbal plants, such as ginger, turmeric, and garlic, directly from the internet. The image links are collected using a JavaScript script, which is run in an internet browser console. The script downloads a text file (.txt) containing the URLs of the digital images that are available in the browser. Then, each image listed by URL in the .txt file is downloaded by running a Python programming script at the command prompt or in another Python interpreter application. The snippet of the JavaScript programming script is shown as follows [30]
The images are then categorized into folders on the cloud storage. This grouping also plays a role in ML methods that categorize a dataset directly based on its storage folder. By managing datasets in structured folders, it is possible for the computer to load sample images directly according to the needs of the learning process. This slows down the learning process significantly, but the size of the dataset that can be managed is limited only by the available storage capacity [31]. Under these conditions, ML can be carried out for various types of datasets. Before carrying out the ML process, the dataset must first be sorted, because the image search results on Google Images also contain images that do not match the desired image. The dataset obtained comprised 8,706 digital images, with the number of images for each plant presented in Table 1.
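The download and folder-sorting steps described above can be sketched as follows. This is not the script used by the team, which is not reproduced here; it is a minimal sketch that assumes the text file holds one image URL per line and that every image is stored with a .jpg extension. The function names read_urls, target_path, and download_all, as well as the file naming pattern, are hypothetical.

```python
import urllib.request
from pathlib import Path


def read_urls(text):
    """Return the non-empty, de-duplicated URLs from the text file content."""
    seen, urls = set(), []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def target_path(root, label, index):
    """Build a path such as dataset/train/ginger/ginger_0003.jpg."""
    return Path(root) / label / f"{label}_{index:04d}.jpg"


def download_all(url_file, root, label, timeout=10):
    """Download every URL in url_file into the folder of one class label."""
    urls = read_urls(Path(url_file).read_text(encoding="utf-8"))
    saved = []
    for index, url in enumerate(urls):
        path = target_path(root, label, index)
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                path.write_bytes(response.read())
            saved.append(path)
        except (OSError, ValueError):
            # Skip links that are dead, time out, or are malformed.
            continue
    return saved
```

A call such as download_all("urls.txt", "dataset/train", "ginger") would then place the images of one plant in its own class folder, where "urls.txt" is a hypothetical file name. Images that do not match the desired plant still have to be removed by hand afterwards, as noted above.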

ML engineering
Computer programming for ML is done using the Python programming language, version 3.0 and above. Environments that can be used to write the programming scripts include Google Colab, Jupyter Notebook, and Microsoft Visual Studio Code. Meanwhile, the dataset storage directory is placed in cloud storage on the GCP. The first programming code written imports the libraries that will be used. The libraries used are available for Python version 3.0.0 and above and include NumPy, Pandas, TensorFlow, OS, Matplotlib, and Keras. The Python programming script to import the libraries is written as follows:

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    import os
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    from keras import Sequential
    from keras.preprocessing.image import ImageDataGenerator
    from keras.optimizers import SGD, Adam
    from keras.layers import Flatten, Dense, BatchNormalization, Activation, Dropout
    from keras.utils import to_categorical

The next step is to augment the image data using the image preprocessing method. Performing this method begins by specifying the directory path of the dataset in cloud storage. In this case, the dataset is stored in the "dataset/train" directory in the cloud storage. The next step is to create a variable for the directory of the dataset. The directory variables are divided into two: (1) the training directory, which contains the datasets intended for training the machine, and (2) the validation directory, which contains the datasets intended to validate the results of training. The datasets in each directory are augmented using the following programming script.

    TRAINING_DIR = 'dataset/train'
    training_datagen = ImageDataGenerator(
        rescale = 1/255,

The next step is to create a generator variable on the image data that has been augmented in the previous programming script.
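To illustrate what the rescale = 1/255 argument does to an image, the sketch below applies the same scaling to a tiny image in plain Python, followed by one example of an augmentation. Only the rescaling factor comes from the script above; the horizontal flip is an assumed example of an augmentation, since the remaining arguments of the generator are not shown, and the function names and pixel values are hypothetical.

```python
def rescale(image, factor=1 / 255):
    """Multiply every pixel value by factor, mapping 0..255 to 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right (an assumed example augmentation)."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 grayscale image with 8-bit pixel values.
image = [[0, 51, 255],
         [102, 204, 153]]

scaled = rescale(image)               # values now lie between 0 and 1
augmented = horizontal_flip(scaled)   # a new training variant of the same image
print(augmented)
```

Rescaling keeps the network inputs in a small, uniform range, while augmentation produces extra variants of each image so that the model sees more diverse examples than the folder actually contains.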
A data generator is needed to generate image data from the dataset folder. The generator variable is also divided into two, one for training and one for validation. The flow_from_directory method is useful when the digital image datasets have been sorted and placed in their respective class/label folders. This method identifies the class automatically from the folder name. This method is also called separately for training and validation. The next step is to apply transfer learning to an existing neural network model. Transfer learning is a method in ML programming that adopts a model that has already been created, commonly called a pre-trained model, which is open-source or free to use by anyone. In this work, the team uses a pre-trained model called Xception, a pre-trained neural network model available in the Keras library. Xception is a CNN with 36 convolutional layers [32]. Xception has a total of seven configurable arguments: include_top, weights, input_tensor, input_shape, pooling, classes, and classifier_activation. The programming script for doing transfer learning on the Xception model is written as follows:

    from tensorflow.keras.applications.xception import Xception

At the end of the Xception neural network, it is necessary to add a fully connected layer with the "ReLU" activation function that connects to the preceding Xception network, and a final output layer whose number of nodes matches the number of classes, which is 20. The script is written as follows:

    layer = base_model.output
    # adding fully-connected layer
    layer = Dense(256, activation = 'relu')(layer)
    # adding output layer with 20 classes
    predictions = Dense(20, activation = 'softmax')(layer)

Next, create variables for the model and optimizer. The model is the entire neural network that will be trained using the existing dataset. The optimizer used here is the "Adam" optimizer, which stands for adaptive moment estimation.
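As a rough illustration of what adaptive moment estimation means, the sketch below implements a single Adam update for one scalar parameter in plain Python and applies it to a toy objective. It is not the Keras implementation: the function name adam_step, the epsilon value, the toy function f(x) = (x - 3)^2, and the step size of 0.05 used in the loop are assumptions made for demonstration only.

```python
from math import sqrt


def adam_step(theta, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (theta, m, v)."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of the gradient
    v = beta_2 * v + (1 - beta_2) * grad * grad   # running mean of the squared gradient
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for step t >= 1
    v_hat = v / (1 - beta_2 ** t)
    theta = theta - lr * m_hat / (sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(round(x, 2))
```

Only two running averages per parameter are stored (m and v), and only first-order gradients are needed, which is what keeps the memory requirement of the method small.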
The team chose the Adam optimizer for the model because Adam is a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement [33]. After various and repeated trials of the settings of Model and Adam, the model and optimizer variables are written with the following programming script.

    from keras.models import Model  # import needed for the Model class used below

    model = Model(inputs = base_model.input, outputs = predictions)
    # newer Keras releases name the lr argument learning_rate
    opt = Adam(lr = 0.001, decay = 1e-6, beta_1 = 0.9, beta_2 = 0.999, amsgrad = False)

Another point to note is that the layers trained first are the newly added top layers, while the layers of the Xception base, which perform the convolution, are kept frozen. Therefore, the programming script for setting it up is as follows:

    for layer in base_model.layers:
        layer.trainable = False

The model that has been created is then compiled.
    model.compile(optimizer = opt, loss = 'categorical_crossentropy', metrics = ['accuracy'])

From here, the neural network model can be trained using the fit method, which takes arguments such as the training generator, the number of epochs, the validation data (the validation generator), and verbose, as written in the following script.
    model.fit(train_generator, epochs = 4, validation_data = validation_generator, verbose = 1)

At this point, the top layers are well trained, and fine-tuning of the convolutional layers of Xception can begin [34]. The team chose to fine-tune by freezing the first 125 layers and unfreezing the rest for training. Freezing layers in the context of a neural network is all about controlling layer weights. When a layer is frozen, its weights cannot be modified further during the training process. This freezing technique also aims to reduce computational time during the training process without

    layer.trainable = True

After fine-tuning with frozen layers is set up, the neural network model is recompiled so that the modifications can be applied to the model, with the following script:

    model.compile(optimizer = opt, loss = 'categorical_crossentropy', metrics = ['accuracy'])

The training is then run one final time:

    history = model.fit(train_generator, epochs = 10, validation_data = validation_generator, verbose = 1)
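The freezing scheme described above (the first 125 layers frozen, the remaining layers trainable) can be sketched with a small helper. The helper name freeze_first_n is hypothetical, and the example runs on stand-in objects rather than on a real Keras model; the total of 130 layers is an arbitrary number for illustration and not the layer count of the model in this work.

```python
from types import SimpleNamespace


def freeze_first_n(layers, n):
    """Freeze the first n layers and leave the remaining ones trainable."""
    for layer in layers[:n]:
        layer.trainable = False   # weights stay fixed during training
    for layer in layers[n:]:
        layer.trainable = True    # weights are updated during fine-tuning
    return sum(1 for layer in layers if layer.trainable)


# Stand-in objects instead of real Keras layers; the count of 130 is hypothetical.
layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(130)]
trainable_count = freeze_first_n(layers, 125)
print(trainable_count)
```

Because the helper only sets the trainable attribute, the same pattern should apply to the layer list of a Keras model, under the assumption that the model is recompiled afterwards, as the text above requires.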

Results and discussion
After the programming code in the methodology section is executed, the machine/computer enters the training process for the specified dataset. This learning process involves two measures of accuracy that are used as parameters for whether the machine/computer is well trained or not, namely, training accuracy and validation accuracy. After all the epochs had been completed (an epoch being one full cycle of the dataset through the neural network), the team plotted the training and validation accuracy for each epoch. As shown in Figure 8, the training and validation results obtained using the Xception model show good accuracy. The results obtained are a training accuracy of 0.97 (97%) and a validation accuracy of 0.93 (93%).
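One simple way to inspect the relation between the two accuracy measures is to compute their gap per epoch, sketched below in plain Python. The dictionary layout with the keys "accuracy" and "val_accuracy" is an assumption modeled on the history recorded by Keras when the accuracy metric is used. Only the final pair of values (0.97 and 0.93) is taken from the results above; the earlier epochs are invented purely for illustration, and the function names are hypothetical.

```python
def accuracy_gaps(history):
    """Return the per-epoch gap between training and validation accuracy."""
    return [
        round(train - val, 4)
        for train, val in zip(history["accuracy"], history["val_accuracy"])
    ]


def best_validation_epoch(history):
    """Return the 1-based epoch with the highest validation accuracy."""
    val = history["val_accuracy"]
    return val.index(max(val)) + 1


# Only the last pair (0.97, 0.93) is reported in the text; earlier epochs are invented.
history = {
    "accuracy": [0.80, 0.90, 0.95, 0.97],
    "val_accuracy": [0.78, 0.88, 0.92, 0.93],
}
print(accuracy_gaps(history))
print(best_validation_epoch(history))
```

A gap that keeps widening while the validation accuracy stalls or falls would be one possible sign of overfitting; a small and stable gap, such as the 0.04 between the reported final values, is less of a concern but does not rule it out.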
After training, the model is saved in the .h5 file format, which will then be installed on the developed Android program. The classification results of the herbal plants with the Xception model show good performance quality. The team tested the ML model by entering pictures of herbal plants at random, and the model then classified the images as one of the 20 predefined classes. How the model classifies an image is expressed by the level of confidence, i.e., the confidence figure that the model assigns to each class for the analyzed image. Figure 9 shows the level of confidence of the model in classifying pictures of herbal plants, including pictures of gotu kola (Pegagan) leaves, aloe vera, and two pictures of garlic, with the spread of the model confidence figures shown in Figure 10. It can be seen that the model has a fairly good level of confidence in classifying Pegagan leaves, although there are also confidence figures in other classes of herbal plants, namely, Adas at 0.09%, Katuk Leaf at 9.90%, Kumis Kucing at 0.03%, Meniran at 15.8%, and Pare at 0.01%. The emergence of confidence figures in other classes of herbal plants is thought to be caused by similarities in the colors, shapes, and background images of the leaves. This serves as material for the team's evaluation that the digital image dataset needs to be trimmed, which in this case is done by removing the background from the digital images that will be used to train the model. Another test was carried out using an image of an aloe vera leaf on a white background (without any other color or shape behind the object of analysis). The model confidence turned out to be a perfect 100%, which means that the model confidently classifies the input digital image as an aloe vera leaf, as shown in Figure 10.
Then, two different images of garlic with the same background were also tested, and accurate results were obtained, with confidence figures of 95.72% and 100% for the two garlic images. Based on these figures, the suspicion that the background of the image affects the classification of the object is strengthened. This can be a subject for further discussion, especially regarding the quality of the dataset currently owned by the team.
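The confidence figures discussed above are the kind of values produced by a softmax output layer, which turns raw class scores into probabilities that sum to 1. The sketch below shows this conversion and a ranking of the most confident classes in plain Python. The raw scores are hypothetical and are not outputs of the trained model; the class names are taken from the text, and the function names softmax and top_k are illustrative.

```python
from math import exp


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(class_names, probabilities, k=3):
    """Return the k most confident (class, percent) pairs."""
    ranked = sorted(zip(class_names, probabilities),
                    key=lambda item: item[1], reverse=True)
    return [(name, round(100 * p, 2)) for name, p in ranked[:k]]


# Hypothetical raw scores for four of the classes named in the text.
class_names = ["Pegagan", "Meniran", "Katuk Leaf", "Adas"]
logits = [4.0, 2.5, 2.0, -2.0]

probabilities = softmax(logits)
for name, percent in top_k(class_names, probabilities):
    print(f"{name}: {percent}%")
```

Because the probabilities always sum to 1, any confidence assigned to visually similar classes is necessarily taken away from the correct class, which is consistent with the spread of confidence figures observed for the Pegagan image.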
The current research can be stated to have more variation in its herbal plant samples, i.e., in color, shape, etc., compared to pioneering works that focused on the identification of leaf type and geometry. Such work was introduced by Jackulin and Murugavalli [35], who studied the detection performance for plant disease using ML and deep learning methodologies. The samples were used to make the machine understand the difference between healthy and diseased leaf images. It was found that the performance of Inception v4 (a 48-layer deep CNN that extends the ImageNet model) reached almost 100% training accuracy and approximately 98% validation accuracy. On the other hand, another considered model, VGG 16, named after the Visual Geometry Group, produced around 83% training accuracy and 81-82% validation accuracy. Similar research works related to plant identification by ML include Roopshree et al., who designed an IoT-based authentication system for therapeutic herbs using ML [36], Chen et al., who described an easy method for identifying commonly used Chinese herbal medicines using AutoML platforms [37], and Xu et al., who developed multiple attention pyramid networks for herbal recognition, which considered the species and classes in the plant kingdom of the plant samples [38]. More details based on the concept of ref. [38] can be regarded as a good development opportunity for the current ML platform in this research.

Conclusion
The results of the analysis that have been carried out and the model's achievements in learning the datasets are summarized in the following conclusions:
1. The Python programming code in the methodology can be run completely without any error.
2. The Xception model is able to learn Herbify's herbal plant dataset with fairly high training and validation accuracy, at about 97% for training accuracy and 93% for validation accuracy.
3. Although the obtained accuracy appears fairly high, further evaluation is still needed to determine whether overfitting occurs.
4. The model is able to recognize input digital images according to the predefined classes, with a fairly good confidence figure in each test.
With this achievement, it is hoped that it can convince the reader that the training carried out using the existing methods can provide the final result of a model that can classify herbal plants according to their respective classes.
The author also wants to give recommendations to readers that there are still opportunities for further research, especially in the aspect of evaluation and improvement. Given the analytical capacity possessed by the pre-trained Xception model with some additional programming, it is not impossible to create other models that are capable of analyzing more classes of plants, or perhaps also analyzing other types of datasets, such as animals, automotive components, and so on.
Several of the challenges that can be concluded from this research are the fact that there is a need for devices with proficient capabilities to perform fast analysis and sufficient repetition (epochs), because the more the datasets to be analyzed, the greater the weight or burden of the data that the computer must analyze. Another challenge was also found in the stage of image dataset preparation, where an image dataset can be said to be good enough if the image has been trimmed from other objects surrounding the image itself. This preparation obviously will take a lot of time and effort to be done. And last but not least, another challenge that can be concluded is in terms of parameter tuning, both at the image augmentation and modeling stages, which is expected to obtain a good result of accuracy without the occurrence of overfitting.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest: The authors state no conflict of interest.
Data availability statement: The authors declare that all data supporting the findings of this study are available within the article.