Google Play Content Scraping and Knowledge Engineering using Natural Language Processing Techniques with the Analysis of User Reviews

Abstract: Maintaining a competitive edge in the mobile application market requires evaluating what makes a quality app, and user feedback on applications plays an essential role in the mobile application development industry. The rapid growth of web technology has given people the opportunity to interact, to review and rate applications, and to share their feedback. In this paper, we scraped 506,259 user reviews and application ratings from 14 different categories of the Google Play Store. Statistical results were measured using several common machine learning algorithms: Logistic Regression, Random Forest Classifier, and Multinomial Naïve Bayes. Accuracy, precision, recall, and F1 score were used to evaluate Bigram, Trigram, and N-gram features, and the statistical results of the algorithms were compared. Each algorithm was analyzed in turn and its results evaluated. We conclude that logistic regression is the best algorithm for review analysis of Google Play Store applications. The results were checked, and the accuracy of the logistic regression algorithm was examined for analyzing reviews in three classes, i.e., positive, negative, and neutral.


Introduction
In natural language processing, classifying documents and strings into different categories is considered a vital task. Text classification now plays an important role in organizing online information. In the literature, text classification has been used for tasks such as flagging email as spam and detecting the sentiment of users' comments or tweets [1]. Text classification remains difficult for the automatic tagging of customer queries, the classification of blogs into different categories, and small training datasets. More specifically, learners find it extremely challenging to generalize in text classification. In this research, different machine learning algorithms were applied across Google Play categories, and text mining classification was applied to Android application reviews [2].
The Google Play Store is an application distribution platform that allows users of mobile devices to search for, buy, and deploy software applications within a few clicks. Such platforms also allow users to share their ratings and opinions about an application in text reviews [3]. For example, users may express their satisfaction with a specific application or request a new feature. Reviews contain information that is useful for analysts and application designers, such as documentation of user experiences with specific features, feature requests, and bug reports. These reviews can be regarded as the "Voice of the Users", which can guide the development effort and improve future releases of an application [4].
Several limitations prevent development teams and analysts from using the information in the reviews. First, considerable effort is required to analyze a large number of reviews. A recent analysis by the authors of [5] found that iPhone users post 22 reviews per day on average, and a very popular application such as Facebook receives more than 4000 reviews per day. Second, the quality of reviews varies widely, from useful ideas and insightful thoughts to insulting comments. Third, a review typically contains a mix of sentiments about different app features, which makes it hard to, e.g., filter positive and negative reviews or retrieve the reviews for specific features. The usefulness of the star rating in a review is limited for development teams, as the score reflects an average for the entire app and can combine both positive and negative evaluations of its individual features [6].
Text mining, also known as text data mining, is the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning [7]. Text mining normally structures the input text by parsing, adding certain derived linguistic features, and inserting the result into a database, before the output is finally evaluated and interpreted [8]. "High quality" in text mining usually refers to a combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept or entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling, i.e., learning the relationships between named entities [9].
To analyze the semantics of Google Play Store reviews, we compare different parameters across different machine learning algorithms in order to find the best algorithm for the semantic analysis of these reviews. We calculate accuracy, precision, recall, and F1 score with Bigram, Trigram, and N-gram features [10]. A bigram, or digram, is a sequence of two adjacent elements from a string of tokens, which are typically words, syllables, or letters. A bigram is an n-gram for n = 2. The frequency distribution of the bigrams in a string is commonly used for simple statistical analysis of text in many applications, including cryptography, speech recognition, and computational linguistics [11]. Trigrams, the case of the n-gram where n = 3, are often used in natural language processing to perform statistical analysis of texts, and in cryptography to control and use ciphers and codes. In computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. Depending on the application, the items can be words, base pairs, syllables, letters, or phonemes. N-grams, which may also be called shingles when the items are words, are typically collected from a text or speech corpus [12].
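
The definition of an n-gram given above can be illustrated with a minimal Python sketch. This is not the implementation used in this study; the helper name `ngrams` and the sample review are assumptions introduced here for illustration, and tokens are obtained by simple whitespace splitting.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n adjacent tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence of k tokens contains k - n + 1 n-grams (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical review text, split on whitespace into word tokens.
review = "this app is very easy to use"
tokens = review.split()

bigrams = ngrams(tokens, 2)   # n = 2
trigrams = ngrams(tokens, 3)  # n = 3

print(bigrams[:2])
print(trigrams[:1])
```

A review of seven words therefore yields six bigrams and five trigrams, and a review shorter than n words yields no n-grams at all.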

Literature Review
Billions of people around the world now download and use mobile applications, driven by the recent spread of easy-to-use stores such as those of Apple, Google Play, and Windows Phone. Fragmentation across mobile platforms such as Apple iOS, Windows Phone, and Android represents a real challenge in the development of mobile apps. Recently, businesses such as Adobe and IBM, together with a growing network of programmers, have advocated hybrid apps as a potential remedy for this problem. Hybrid mobile apps are developed consistently across platforms and built on web standards [13]. The authors of that work empirically evaluate hybrid mobile apps in order to highlight and investigate the distinguishing qualities of publicly available hybrid apps as perceived by users in their reviews. The analysis was conducted by mining 11,917 free applications and 3,041,315 reviews obtained from the Google Play Store, assessed from the perspective of end-users. The analysis thus builds an objective and reproducible picture of how hybrid mobile development performs "in the wild", as reflected in genuine reviews, thereby laying a foundation for future procedures and methods for developing hybrid apps [14,15].
User reviews are an essential part of open mobile application markets such as the Google Play Store. How can countless user reviews be combined automatically to make concrete sense of them? Unfortunately, few analytic tools provide insight into user reviews beyond simple summaries such as histograms of user ratings [16]. One study proposes the Wiscoma system, which can analyze hundreds of thousands of user reviews and opinions from mobile app markets at three different levels of detail. The authors suggest that their system can (a) find inconsistencies in reviews; (b) identify the reasons why users like or dislike a specific app and supply a zoomable, interactive view of user reviews; and (c) present important insights into the app market as a whole. The approach applies to different types of apps and identifies users' significant preferences and concerns [17]. Results for the proposed system are reported on a 32 GB dataset composed of over 13 million user reviews of 171,493 Android applications in the Google Play Store. The authors discuss how the proposed system can help stakeholders in the mobile application market, such as end-users, individual app developers, and Google [18,19].
Unlike products and services on Amazon.com, mobile apps are always evolving, with new versions rapidly replacing previous ones. Many app stores nevertheless use an Amazon-style rating scheme, which aggregates every rating ever assigned to an app into a single store rating. The authors mined 10,000 mobile application store ratings from the Google Play Store to examine users' satisfaction levels. The ratings of many applications proved hard to change once the applications had gathered a considerable number of raters. The study concludes that the rating systems currently in use cannot capture user satisfaction levels, which can discourage developers from improving the quality of their applications [20]. App usage has grown along with the popularity of mobile phones, and end-users rely on their phones to obtain several types of apps for different purposes. Users often decide whether to download an app by checking its number of downloads [21], along with its reviews, ratings, and comments. In the mobile application market, ranking fraud is an illegal activity used to push an application up the popularity list. Application developers use such fake mechanisms periodically during the application development process [22].
Research on mining user reviews in mobile application stores has progressed considerably in recent years. Most of the proposed methods rely on classifying the text of user reviews into different kinds of informative user requirements and uninformative feedback. However, classification based on the raw features of reviews often produces high-dimensional models, which raises the complexity of the classifier and may cause overfitting. The authors propose a novel approach for app review classification [23].

Methodology for Google Play Content Scraping and Knowledge Engineering
The classification process starts with scraping application reviews. Using the App ID of an application on the Google Play Store, a scraping request retrieves several pages of reviews and ratings for that application. In the next step, Bigram, Trigram, and N-gram extraction, often used in information retrieval and text mining, is applied to the reviews using the Python language, and different features are extracted for each application. Three classification algorithms are then applied in Python: Multinomial Naïve Bayes, Random Forest, and Logistic Regression. For each algorithm, Precision, Recall, Accuracy, and F1 score are computed. After analysis and testing, these statistics are compared to determine which algorithm achieves the highest Precision, Recall, Accuracy, and F1 score and is therefore best suited to review classification, as shown in Figure 1.

Data Collection Process
Mobile applications are part of our lives. According to a report, half a million applications were introduced in 2011, and by October 2012, 0.675 million applications were available on the Google Play Store. People now use Android apps in their daily lives for purposes such as messaging, social media, gaming, and browsing. This online marketplace provides mobile users with both free and paid access to over a million mobile applications, also referred to as "mobile apps". On the Google Play Store website, users can choose from over a million mobile apps in various predefined categories. Data collection plays a vital role in every research study, and the validity and accuracy of the dataset is a significant part of any data collection process. In this research, we scraped hundreds of thousands of user reviews and ratings of applications from different categories, as shown in Figure 1. We first selected 14 categories of the Google Play Store and then scraped different applications from each category, as shown in Table 1. The categories are Action, Arcade, Card, Communication, Finance, Health and Fitness, Photography, Shopping, Sports, Video Player Editor, Weather, Casual, Medical, and Racing. In total, we scraped 506,259 reviews from the 14 categories of Google Play Store applications, as indicated in Table 1.

Results and Discussion
This section addresses the evaluation of the scraped dataset using different machine learning algorithms: the Logistic Regression Algorithm, Naïve Bayes Multinomial, and the Random Forest Algorithm. Bigram, Trigram, and N-gram features were evaluated to find the best algorithm on the basis of precision, recall, accuracy, and F1 score.
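
For reference, the four evaluation measures named above can be computed as in the following minimal sketch. It is illustrative only and is not the code used to produce the reported tables. The function name `evaluate` and the sample labels are assumptions, and macro-averaging over classes is also an assumption, since the averaging scheme is not stated here.

```python
from typing import Dict, List


def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (zero denominator) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n_labels = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_labels,  # macro average (assumed)
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1_scores) / n_labels,
    }


# Hypothetical gold labels and predictions for four reviews.
scores = evaluate(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
print(scores)
```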

Logistic Regression Algorithm for Bigram, Trigram, N-gram
Logistic regression is a statistical model used to model a binary dependent variable. The model estimates the population parameters of the logistic model (a parameter being a quantity that enters into the probability distribution of a statistic). The logistic regression algorithm has been applied here in the form of binomial regression. We scraped 506,259 reviews from 14 different categories of Google Play Store applications and applied the logistic regression algorithm with different parameters for Bigram, Trigram, and N-gram. We computed the classification accuracy for the applications of each category, together with precision, recall, and F1 score; the values of these parameters are shown in Table 2. Figure 2 shows a bar chart of the precision, recall, F1 score, and accuracy of the Logistic Regression Algorithm using Bigram, Trigram, and N-gram.
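
To make the binary logistic model concrete, the sketch below shows how a fitted model would turn n-gram counts into a probability of the positive class by passing a weighted sum through the logistic (sigmoid) function. The weights, bias, and feature names are hypothetical values chosen for illustration; they are not parameters estimated in this study.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def positive_probability(weights: Dict[str, float], bias: float,
                         counts: Dict[str, int]) -> float:
    """P(review is positive) for a bag of n-gram counts."""
    # Features without a learned weight contribute nothing to the score.
    z = bias + sum(weights.get(feature, 0.0) * count
                   for feature, count in counts.items())
    return sigmoid(z)


# Hypothetical bigram weights, as a trained model might hold them.
example_weights = {"easy to": 1.2, "crashes often": -2.0}

p_good = positive_probability(example_weights, 0.0, {"easy to": 1})
p_bad = positive_probability(example_weights, 0.0, {"crashes often": 1})
print(round(p_good, 3), round(p_bad, 3))
```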

Naïve Bayes Multinomial for Bigram, Trigram, N-gram
Naïve Bayes Multinomial is used for classification with high-dimensional datasets. This algorithm assumes that the features are conditionally independent of one another given the class. The model is fast to make predictions. We scraped 506,259 reviews from 14 different categories of Google Play Store applications and applied the Naïve Bayes Multinomial algorithm with different parameters for Bigram, Trigram, and N-gram. We computed the classification accuracy for the applications of each category, together with precision, recall, and F1 score; the values of these parameters are shown in Table 3. Figure 3 shows a bar chart of the precision, recall, F1 score, and accuracy of Naïve Bayes Multinomial using Bigram, Trigram, and N-gram.
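
The following is a small, self-contained sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, included only to illustrate the technique. It is not the implementation used in this study; the class name `TinyMultinomialNB`, the smoothing value, and the toy training data are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Set


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token (or n-gram) counts."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing constant (assumed value)
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = {}
        self.vocabulary: Set[str] = set()

    def fit(self, docs: List[List[str]], labels: List[str]) -> "TinyMultinomialNB":
        for features, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, features: List[str]) -> str:
        if not self.class_counts:
            raise ValueError("the model has not been fitted")
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus a sum of per-feature log likelihoods: the sum is
            # where the conditional independence assumption enters.
            score = math.log(n_docs / total_docs)
            for feature in features:
                if feature in self.vocabulary:  # ignore unseen features
                    score += math.log((counts[feature] + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical reviews, already tokenized).
model = TinyMultinomialNB().fit(
    [["great", "app"], ["love", "it"], ["bad", "app"], ["crashes", "often"]],
    ["positive", "positive", "negative", "negative"],
)
print(model.predict(["great"]), model.predict(["crashes", "app"]))
```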

Random Forest Algorithm for Bigram, Trigram, N-gram
The random forest classifier is built from decision trees. Many decision trees are developed on the basis of random selections of data samples and variables. We scraped 506,259 reviews from 14 different categories of Google Play Store applications and applied the random forest algorithm with different parameters for Bigram, Trigram, and N-gram. We computed the classification accuracy for the applications of each category, together with precision, recall, and F1 score; the values of these parameters are shown in Table 4. Figure 4 shows a bar chart of the precision, recall, F1 score, and accuracy of the Random Forest Algorithm using Bigram, Trigram, and N-gram.

Comparison of Different Machine Learning Algorithms using Bigram
This online marketplace provides users with both free and paid access. On the Google Play Store, users can choose from over a million apps in various predefined categories. In this research, we scraped 506,259 reviews from 14 different categories of Google Play Store applications. We evaluated the results using different machine learning algorithms, namely Naïve Bayes Multinomial, Random Forest, and Logistic Regression, with different parameters for Bigram, Trigram, and N-gram. These algorithms can check the semantics of users' reviews of an application, i.e., whether the reviews are good, bad, normal, and so on. We calculated accuracy, precision, recall, and F1 score for Bigram, Trigram, and N-gram and compared the statistical results of the algorithms. These results are visualized as bar charts in Figure 5 to Figure 7. After comparison, we found that the logistic regression algorithm is the best algorithm for the semantic analysis of users' reviews of any Google Play application, as shown in Table 5 to Table 7.

Semantic Analysis of Google Play Store Applications Reviews using Logistic Regression Algorithm
After checking the different parameters, we found that the logistic regression algorithm is the best algorithm, with the highest accuracy. In this section, we analyze all reviews and classify them into three classes: positive, negative, and neutral. The target value is set to 1 if the review is positive and to 0 if the review is negative. The neutral class is determined from the confidence rate: if the confidence rate lies between 0 and 1, the review is assigned to the neutral class. Our dataset contains different parameters, such as the application category, application name, application ID, reviews, and ratings, as shown in Figure 8. For checking the semantics of each review, these parameters are more than enough.

Uniform Resource Locator (URL) links
In this step, URLs are removed from the reviews.
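
One possible way to remove URLs is a regular expression, as in the sketch below. The pattern and the function name `remove_urls` are assumptions made for illustration; the pattern only covers links that start with `http://`, `https://`, or `www.`.

```python
import re

# Assumed pattern: a scheme or "www." followed by non-whitespace characters.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def remove_urls(text: str) -> str:
    """Replace each URL with a space, then collapse repeated whitespace."""
    without_urls = URL_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_urls).strip()


print(remove_urls("Great app see https://example.com/help now"))
```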

UTF-8 BOM (Byte Order Mark)
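Character patterns such as "\xef\xbf\xbd" are encoding debris and should be removed. Strictly speaking, the byte sequence EF BF BD is the UTF-8 encoding of the Unicode replacement character (U+FFFD), which appears where the original bytes could not be decoded, whereas the UTF-8 BOM is the byte sequence EF BB BF, which helps a reader identify a file as encoded in UTF-8.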
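
The sketch below shows one way to handle both kinds of debris in Python. Note that the bytes EF BB BF decode to the BOM character U+FEFF, while the bytes EF BF BD decode to the replacement character U+FFFD. The function names are assumptions, and replacing U+FFFD with a space is one possible choice rather than the procedure used in this study.

```python
def decode_review(raw: bytes) -> str:
    """Decode UTF-8 bytes; the "utf-8-sig" codec drops a leading BOM."""
    return raw.decode("utf-8-sig", errors="replace")


def clean_encoding_debris(text: str) -> str:
    """Remove stray BOM characters and replace U+FFFD with a space."""
    return text.replace("\ufeff", "").replace("\ufffd", " ")


print(clean_encoding_debris(decode_review(b"\xef\xbb\xbfnice\xef\xbf\xbdapp")))
```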

Hashtag / Numbers
The text of a hashtag can carry useful information about the comment, so it can be risky to remove the whole text together with the "#". Instead, the "#" symbol, numbers, and any other special characters need to be handled.
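
A minimal sketch of this step is given below, assuming English-language reviews: the word attached to a hashtag is kept, while the "#" symbol, digits, and other non-letter characters are replaced by spaces. The function name and the letters-only rule are assumptions; this rule also drops apostrophes and non-ASCII letters, which may not suit every dataset.

```python
import re


def strip_symbols_and_numbers(text: str) -> str:
    """Keep ASCII letters only; '#', digits, and punctuation become spaces."""
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(letters_only.split())


print(strip_symbols_and_numbers("Love it #BestApp 100%!!"))
```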

Negation Handling
Negation is another factor to handle: negation forms that are not suitable in the review are removed.

Tokenizing and Joining
Each comment is parsed into small pieces (tokens), which are then merged again. Applying the cleaning rules above yields the cleaned form of the reviews.
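
As an illustration, the sketch below tokenizes a comment and joins the tokens back with single spaces, which also normalizes case and irregular whitespace. The token pattern (lowercase letters and apostrophes) and the function name are assumptions.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split a comment into lowercase word tokens (assumed token pattern)."""
    return re.findall(r"[a-z']+", text.lower())


def tokenize_and_join(text: str) -> str:
    """Parse the comment into tokens and merge them with single spaces."""
    return " ".join(tokenize(text))


print(tokenize_and_join("  Great   APP,\n works  well "))
```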

Find Null Entries from the Reviews
In order to remove noise and inconsistencies from the data, null values need to be removed.
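
A simple way to drop null entries is sketched below. It assumes each scraped record is a dictionary with a `review` field; that field name is hypothetical, and treating whitespace-only text as null is an additional assumption.

```python
from typing import Dict, List, Optional


def drop_null_reviews(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep only the rows whose review text is present and non-blank."""
    kept = []
    for row in rows:
        text = row.get("review")  # "review" is a hypothetical field name
        if text is not None and text.strip():
            kept.append(row)
    return kept


rows = [{"review": "good"}, {"review": None}, {"review": "   "}, {"review": "bad"}, {}]
print(drop_null_reviews(rows))
```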

Negative and Positive Words Dictionary
Using a word cloud of the corpus, we created a dictionary containing positive and negative words on the basis of word occurrence in the text, which gives an idea of what kinds of words are frequent in the corpus, as shown in Figure 9.
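
The idea of counting word occurrences per class can be sketched as follows. This is an illustration rather than the procedure used to produce Figure 9; the function name, the label names, and the sample reviews are assumptions.

```python
from collections import Counter
from typing import Dict, List


def build_word_dictionary(reviews: List[str], labels: List[str]) -> Dict[str, Counter]:
    """Count how often each word occurs in the reviews of each class."""
    if len(reviews) != len(labels):
        raise ValueError("reviews and labels must have the same length")
    dictionary: Dict[str, Counter] = {}
    for text, label in zip(reviews, labels):
        dictionary.setdefault(label, Counter()).update(text.lower().split())
    return dictionary


# Hypothetical cleaned reviews with their classes.
frequencies = build_word_dictionary(
    ["good app", "good game", "bad app"],
    ["positive", "positive", "negative"],
)
print(frequencies["positive"].most_common(1))
```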

The Semantic Analysis of Reviews using Logistic Regression Algorithm
As a result, we classified all reviews into three different classes and checked the confidence rate of each review, i.e., the degree to which the comment is positive, negative, or neutral. The target value ranges from 0 to 1; the confidence value is checked within that range, and the class of each review is determined using the logistic regression algorithm, as shown in Figure 10.
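
One way to map a confidence value in the range 0 to 1 onto the three classes is sketched below. The cut-off values 0.4 and 0.6 and the function name are hypothetical, chosen only for illustration; the thresholds used in this study are not specified here.

```python
def classify_by_confidence(p_positive: float,
                           lower: float = 0.4,
                           upper: float = 0.6) -> str:
    """Map a confidence in [0, 1] to negative, neutral, or positive."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    if p_positive >= upper:   # close to the positive target value 1
        return "positive"
    if p_positive <= lower:   # close to the negative target value 0
        return "negative"
    return "neutral"          # in between the two target values


for confidence in (0.9, 0.5, 0.1):
    print(confidence, classify_by_confidence(confidence))
```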

Conclusion and Future Work
Hundreds of thousands of apps uploaded by developers and downloaded by users are available on the Google Play Store. Users use these applications for their specific purposes and have their own personal experiences with them. They express these experiences in the form of comments or reviews and give the applications a rating on a five-star scale. In this research work, we scraped 506,259 reviews from 14 different categories of Google Play Store applications. We analyzed the class of each review, which may be positive, negative, or neutral, and checked the semantics of the application reviews with three machine learning algorithms: Logistic Regression, Random Forest, and Multinomial Naïve Bayes. We evaluated Bigram, Trigram, and N-gram with parameters such as precision, accuracy, recall, and F1 score, and compared the statistical results of these algorithms. After comparison, we found that logistic regression is the most effective algorithm, with a high precision score, and that this machine learning algorithm can be used to analyze user reviews. In the future, we will increase the number of application categories and the number of reviews. We will compare the accuracy of the logistic regression algorithm with that of other algorithms. We will also generate clusters and examine the relationship between application reviews and ratings, which can help to analyze each application more accurately.