A study on predicting crime rates through machine learning and data mining using text

Abstract: Crime is a threat to any nation's security administration and jurisdiction. Therefore, crime analysis becomes increasingly important because it assigns the time and place based on the collected spatial and temporal data. However, old techniques, such as paperwork, investigative judges, and statistical analysis, are not efficient enough to predict the accurate time and location where the crime had taken place. When machine learning and data mining methods were deployed in crime analysis, crime analysis and prediction accuracy increased dramatically. In this study, various types of criminal analysis and prediction using several machine learning and data mining techniques, based on the percentage of an accuracy measure of the previous work, are surveyed and introduced, with the aim of producing a concise review of using these algorithms in crime prediction. It is expected that this review study will be helpful for presenting such techniques to crime researchers, in addition to supporting future research to develop these techniques for crime analysis, by presenting some crime definitions, prediction system challenges, and classifications with a comparative study. The literature shows that supervised learning approaches were used in more studies for crime prediction than other approaches, and that Logistic Regression is the most powerful method in predicting crime.


Introduction
Violations of the law pose a danger to the administration of justice and should be curtailed. Computational crime prediction and forecasting can help improve the safety of metropolitan areas. Because humans cannot process large volumes of complex data, it is difficult to make early and accurate predictions about criminal activity. Computational problems and opportunities arise from accurately predicting crime rates, types, and hot locations based on historical patterns. Still, despite extensive research efforts, there is a need for stronger prediction algorithms that direct police patrols toward criminal events [1].
Crime analysis is a methodological approach used to identify crime spots, and it is not an easy one. As noted in 2020, Geographical Information Systems (GIS) were the non-machine-learning tools used earlier for temporal and spatial data. GIS used the crime spots technique, which mainly depends on crime type, to help reduce crime rates [2].
Crime rate prediction can be defined as a method of building a system that finds future crime patterns and helps law enforcers solve crimes, which leads to a reduction of the crime rate in the real world. Crime forecasting, meanwhile, refers to the ability to predict crimes far in the future, up to years ahead, in order to improve crime prevention. This can be achieved by using time series approaches to find future crime trends from time series data.
In general, crime analysis in data mining can be carried out using different methods, such as statistical methods [3][4][5], visualization methods [6][7][8], unsupervised learning, and supervised learning techniques [9][10][11]. Visualization methods provide a visual explanation of the connection between the geographical view and other crime data, and include geographic profiling [12], GIS-based crime mapping [13], crime prediction [14][15][16], and asymmetric mapping [17]. To relate crime data to statistical methods and unsupervised learning techniques, clustering methods were very popular. They were applied to criminal behavior analysis [18,19], crime pattern recognition, criminal association analysis, and incident pattern recognition to extract groups or patterns that share the same features in crime data [20].
Later, the development of machine learning algorithms helped crime data analysis researchers to investigate crime by using preprocessing and clustering techniques to extract crime locations from raw data [21], and by using supervised and unsupervised machine learning models to analyze these data and discover their patterns based on the time and location of crime in order to produce precise predictions [22]. In addition, this development helped to investigate the reasons why crime occurs in certain areas by applying machine learning algorithms to historical data collected over past years in the same area [23].
Nowadays, the development of classification algorithms, especially machine learning algorithms, helps to enhance crime prediction [24]. Researchers have therefore tried to connect crime with time, depending on various factors, to help resolve crimes and prevent them and their recurrence. In 2018, a Fourier series was proposed as an analytic technique for building a flexible mathematical model of time-periodic effects. This work demonstrated the accuracy and usefulness of analytical techniques for connecting the time factor with crime prediction. The analytical techniques thus captured the relation between crimes and time effectively, but not for all types of crime [25].
In short, machine learning algorithms are widely used in the crime prediction discipline, though not more widely than data mining, and each approach has its own performance characteristics and can give good results.
Our work has been set up so that interested parties become familiar with the previous studies and the accuracy they achieved, presented in tabular format. The main contribution of this study is to present machine learning and data mining applications in predicting crimes by classifying the studies according to the type of technique, providing a brief overview of each methodology that has been used to mine crime data, and listing some challenges faced by the developers of such systems.
The limitations of the state-of-the-art works are as follows: they do not handle large geographic areas; they lack generality, because using the same system on two different crime datasets leads to widely different accuracy percentages; few works predict criminal actions; and, last but not least, researchers in the crime prediction field face the difficulty that online crime datasets may contain missing or repeated data.
The rest of the review paper is organized as follows. Section 2 explains the research methodology of the survey. Section 3 discusses crime definitions and descriptions in detail. Section 4 discusses the challenges of prediction systems. Section 5 describes the public datasets. The related work is included in Section 6. Section 7 introduces the classification of prediction systems. A comparison study of previous works is given in Sections 8.1 and 8.2. Finally, the discussion and conclusion are presented in Section 9.
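To make the idea of a Fourier series model of time-periodic effects concrete, the following is a minimal sketch and not the model of ref. [25]. It assumes evenly spaced daily counts that cover a whole number of weekly periods, so that each coefficient can be estimated by projecting the series onto the corresponding cosine and sine term. The function names `fit_fourier` and `predict` and the synthetic counts are hypothetical.

```python
import math


def fit_fourier(counts, period, harmonics):
    """Estimate the mean and (a_k, b_k), k = 1..harmonics, of a truncated Fourier series."""
    n = len(counts)
    if n == 0 or n % period != 0:
        raise ValueError("series must cover a whole number of periods")
    if 2 * harmonics >= period:
        raise ValueError("too many harmonics for this period")
    mean = sum(counts) / n
    coeffs = []
    for k in range(1, harmonics + 1):
        # Projection onto cos/sin; valid because the samples span full periods.
        a = 2.0 / n * sum(y * math.cos(2 * math.pi * k * t / period)
                          for t, y in enumerate(counts))
        b = 2.0 / n * sum(y * math.sin(2 * math.pi * k * t / period)
                          for t, y in enumerate(counts))
        coeffs.append((a, b))
    return mean, coeffs


def predict(model, t, period):
    """Evaluate the fitted series at time index t."""
    mean, coeffs = model
    return mean + sum(
        a * math.cos(2 * math.pi * k * t / period)
        + b * math.sin(2 * math.pi * k * t / period)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


# Hypothetical daily crime counts with a weekly cycle (synthetic, not real data).
PERIOD = 7
daily_counts = [
    10 + 3 * math.cos(2 * math.pi * t / PERIOD) + 2 * math.sin(2 * math.pi * t / PERIOD)
    for t in range(28)
]
model = fit_fourier(daily_counts, PERIOD, harmonics=1)
```

On this synthetic series the fit recovers the generating coefficients, and `predict(model, t, PERIOD)` extrapolates the weekly pattern to later days. Real crime counts are noisy and may contain several periodicities, so more harmonics or a least-squares fit would likely be needed.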

Research methodology
The methodology of this review study has three stages: first, collecting and analyzing the relevant research works on crime prediction with machine learning and data mining; second, building the classification tables in Section 8; and finally, presenting a study of the performance of various algorithms, the accuracy achieved, and a comparison between them.
When choosing relevant research works, Master's and Doctoral dissertations and unpublished papers were excluded. The search keywords were "crime prediction with machine learning and data mining" and "violent crime prediction", and the publication period was 2001 to 2022. The abstract of every article was read to determine whether it was relevant.

Crime definition and description
Generally, crimes are classified into three groups, namely infractions, felonies, and misdemeanors, based on their severity, punishment, and seriousness. Infractions are minor crimes such as tailgating, overtime parking, and speeding. Felonies are considered the most severe crimes, followed by misdemeanors, which are considered less severe [26]. In addition, crimes are classified into types based on when they occurred, such as the day, week, month, and season, in order to find the connection between these types of crime and then predict them in the future using machine learning and data mining algorithms. This can be done by using a dataset of earlier crimes collected in a certain area to forecast future ones.
There are many types of crimes depending on their severity. Crimes are therefore classified into three types, which are felonies, misdemeanors, and infractions (or wobblers) [22], as listed and defined in Table 1.
A crime can also be categorized in other ways, such as victim, victimless, and violent crimes, and further categorizations exist, but in this study only the classification given in Table 1 is considered.

Prediction systems challenges
Researchers and governmental security agents face problems when predicting a crime's location and time, and when choosing an effective method to do so. In addition, there are problems faced by the computer science researchers who use machine learning, data mining, and spatial-temporal data. In 2012 and 2016, the near-repeat-victimization and repeat-victimization methods were implemented to predict crimes in houses, streets, and regions. These methods state that if a crime happens in a block, then the probability of other crimes in the same area increases significantly [27,28].
The challenges faced by developers of crime prediction systems are:
a. The huge amount of data requires a large amount of storage.
b. Crime-related data usually come in different formats, such as text, images, graphs, audio, relational data, unstructured data, and semi-structured data [29], so transforming these data into an understandable format is also a challenge.
c. In machine learning, giving the correct label (e.g., prediction or output) to an instance (e.g., context or input) is a challenge.
d. An appropriate data mining algorithm must be chosen that gives better results than the algorithms already in use.
e. The environment and surrounding factors, such as weak law enforcement and the weather, have an impact on the likelihood of crime, which can ultimately cause crime prediction algorithms to make grave errors. Any crime forecast must take surrounding and environmental changes into consideration to avoid such errors and to achieve high prediction accuracy.
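The near-repeat idea can be illustrated with a small sketch that finds pairs of incidents that are close in both space and time. This is only an illustration of the concept, not the procedure of refs [27,28]. The function name `near_repeat_pairs`, the coordinate units (kilometres), the thresholds, and the sample incidents are all hypothetical.

```python
import math


def near_repeat_pairs(incidents, max_distance, max_days):
    """Return index pairs (i, j), i < j, of incidents close in both space and time.

    Each incident is a tuple (x, y, day); x and y share the unit of max_distance.
    """
    pairs = []
    for i in range(len(incidents)):
        xi, yi, day_i = incidents[i]
        for j in range(i + 1, len(incidents)):
            xj, yj, day_j = incidents[j]
            close_in_space = math.hypot(xi - xj, yi - yj) <= max_distance
            close_in_time = abs(day_i - day_j) <= max_days
            if close_in_space and close_in_time:
                pairs.append((i, j))
    return pairs


# Hypothetical burglaries as (x_km, y_km, day_number); not real data.
incidents = [(0.0, 0.0, 0), (0.1, 0.0, 3), (5.0, 5.0, 4), (0.0, 0.05, 40)]
pairs = near_repeat_pairs(incidents, max_distance=0.2, max_days=14)
```

Here only the first two incidents form a near-repeat pair: the third is too far away, and the fourth is nearby but occurs too late. A large share of such pairs, relative to what random timing would produce, is the kind of evidence that near-repeat analyses look for.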

Crime datasets
Crime-related data are gathered from a variety of sources, including police reports, social media, news, and criminal records. Gathering this amount of data is difficult [30]. The datasets are available online in many countries around the world or are gathered from police departments. During our survey, we noticed that the Chicago crime dataset is more frequently used in crime prediction systems, which can be attributed to the large population and high crime rate in this area [31].

Table 1: Crime types and definitions

Felony:
a. Burglary: The illegal entry into a structure to commit theft or a felony. Attempted forcible entry is also included.
b. Forcible rape: Forcing a female, regardless of her age, to carnal assault that happens forcibly and against her will. This includes assaults to rape and rape by force.
c. Illegal drug selling: This includes drug trafficking and drug distribution, that is, selling, transporting, and distributing drugs. It is considered a federal crime by law and a felony that involves serious penalties.
d. Robbery: The attempt to take anything of value from the custody, control, or care of a person by force, by threat of force, or by putting the victim in fear.
e. Aggravated assault, battery: An illegal attack by one person upon another in which a weapon is used or the victim suffers aggravated bodily injury or obvious severe injuries.
f. Arson: Any willful or malicious burning or attempt to burn, with or without intent to defraud, a motor vehicle, dwelling house, public building, aircraft, or personal property of another.
g. Forgery: The copying, imitating, or altering of something, without authority or right, with the intent to defraud or deceive by passing the altered thing as genuine or original, or buying or selling it with the intent to defraud or deceive.

Misdemeanor:
a. Larceny-theft: The illegal carrying, leading, riding away, or taking of property from the possession or constructive possession of another. Examples are thefts of motor vehicle parts, thefts of bicycles, shoplifting, and pocket-picking.
b. Fraud: The willful deviation from the truth for the sake of persuading another person or entity, in reliance upon it, to part with something of value or to surrender a legal right.
c. Embezzlement: The illegal misapplication or misappropriation by an offender, for his/her own purpose, of property, money, or some other thing of value entrusted to his/her care, custody, or control.
d. Stolen property: Receiving, selling, buying, concealing, possessing, or transporting any property with the knowledge that it has been illegally taken, as by fraud, larceny, robbery, burglary, or embezzlement. Attempts are included.
e. Vandalism: To willfully or maliciously destroy, disfigure, deface, or injure any private or public property, personal or real, without the consent of the owner or person having control or custody, by tearing, marking, painting, cutting, breaking, drawing, covering with filth, or any other means. Attempts are included.
f. Gambling: To illegally wager or bet money or something else of value; promote, assist, or operate a game of chance for some stake; transmit wagering information; or purchase, manufacture, sell, or transport gambling equipment, devices, or goods.
g. Drunkenness: To drink alcohol to the extent that one's mental faculties and physical coordination are substantially impaired.

Infraction and wobblers:
a. Overtime parking: Parking in an area for longer than the posted time limit.
b. Speeding ticket: A written notice issued by a police officer to a person who was driving too fast, indicating that the driver should pay a fine.
c. Tailgating: The dangerous and illegal habit of driving too close to the vehicle in front. If the driver of the vehicle in front stepped on the brakes suddenly, [...]

Related work
With the huge size of data nowadays, advances in machine learning and data mining techniques allow us to deal with raw data and extract results in better ways. Techniques for criminal activity detection and, more generally, machine learning and data mining have recently been applied to policing in order to reduce crime. Correct choices of the parameters for these techniques can help law enforcers to analyze and find associations between crimes as well as patterns and trends in criminal activities, which in turn allows those activities to be characterized more efficiently [5].
In this section, the previous related works are discussed and analyzed. These research works vary widely: some approach prediction from the field of crime analysis, while others apply Artificial Intelligence, or its subfields machine learning and data mining, to crime data in order to predict and forecast violent crimes, in some cases based on spatial and temporal data.
During our survey, we noticed five surveys or overviews related to crime prediction and machine learning or data mining.
The earliest, in 2011, surveyed different methods used to extract patterns from spatial information, which the authors called spatial data mining (SDM) algorithms, such as co-location mining, spatial clustering, spatial hot spots, spatial outliers, spatial auto-regression, conditional auto-regression, and geographically weighted regression. The survey concluded that these SDM algorithms are effective and can be used in the real world, and it identified the need for more methods to validate the hypotheses produced by these algorithms [32].
In 2015, several research works in the field of crime prediction with data mining and machine learning were discussed. This study considered a variety of crime-related variables and found that information thought to influence the crime rate, such as age, alcohol, hot spots, media, and some policies, does not have an effect on crime rate prediction [33]. The discussion was successful, but the conclusion was lacking.
In 2016, another survey was published. It reviewed over 100 applications of data mining in crime. The authors made a concise review by preparing a brief table containing the technique used with its specific software and the relevant study area with its expected use and function. They suggested enhancing the benefits, improvements, and usability of data mining techniques in crime data mining by introducing more training and education in these techniques [34].
In 2019, a systematic review of crime prediction and data mining studies published between 2004 and 2018 classified the research works based on the data mining techniques used. Covering 40 papers, and based on the challenges addressed and the number of research papers per technique, the review identified a gap common to all of them: when datasets grow, there is a noticeable decrease in the system's overall performance [35]. Finally, in 2020, another systematic review analyzed 32 papers on spatial crime forecasting published from 2000 to 2018. In addition to a survey table containing information about the space and time of the research, the crime data, and the forecasting details, this study gave more than one summary, namely the top four proposed methods and the best proposed and baseline methods applied in the 32 selected papers. It discussed the strengths, weaknesses, threats, and opportunities of the selected papers, and concluded that the contiguity of algorithms should not be ignored in the future [2].

Classification of prediction systems
In machine learning, classification in a prediction system means defining a model that describes the concepts or classes of information. The purpose of this model is to predict the class of objects whose class label is unknown. In the real world, police departments are not able to control and limit the large number of criminal activities alone. Meanwhile, crimes are rising rapidly, so there is a need to combine data mining with police detection efforts to predict and then reduce these crime cases. As mentioned, technology, and computer science technologies in particular, is needed to solve this problem as fast as possible. Prediction systems can be classified according to many factors [36]: a. According to approach, machine learning and data mining. b. According to prediction type, spatial and temporal. c. According to dataset, image prediction and data prediction.

Comparison study: Crime prediction vs classification approaches
In this section, Tables 2 and 3 list the surveyed literature on machine learning and data mining algorithms applied to different datasets for different cities around the world. In addition, a comparison is made between machine learning and data mining methods for crime within a broader crime prediction system. In these tables, we list each selected paper with the important information that will assist other researchers in determining which categories of crime prediction techniques are most powerful. These two tables thus present the machine learning and data mining algorithms used for crime prediction in order to achieve the purpose of this survey. The tables contain the references, the machine learning or data mining algorithms, the source of the dataset used, and the accuracy of each algorithm on the dataset used for a particular city. The following sections discuss, separately, the crime prediction research works that followed the machine learning and data mining approaches.

Machine learning and crime prediction
Crime prediction has been studied widely because of its relevance to society, and these studies employ machine learning algorithms to address crime prediction and forecasting problems. Machine learning algorithms have been used successfully to predict spatial crime information. In 2006, the Support Vector Machine (SVM) algorithm was applied to predict the location of crimes in Columbus, Ohio, US. The SVM used both random and clustering approaches to train and test on the dataset, and then predicted the hot spot areas and improved its effectiveness [37]. These algorithms are also used to study the correlation between crime occurrence and crime motives. In 2013, a Logistic Regression (LR) algorithm was implemented to forecast the relationship between burglary and several other factors, namely time of day, day of the week, barriers, connectors, and repeated victimization, but this model failed for large geographic areas [38].
In 2015, crime was predicted in southern US states using the Random Forest (RF) method after applying the SmoteR algorithm to detect the more dangerous crimes. In addition, the work was optimized using R software after density and population were selected as real values [39]. The auto-regressive approach was also implemented to forecast the number of crimes happening at the same time and to predict them in urban areas [40]. In 2017, the Naive Bayes (NB) algorithm was proposed to predict crime incidents based on historical data showing the same crime happening in the same place. Moreover, the NB model was compared with the Decision Tree (DT) algorithm in order to test the performance of the proposed method, and NB was found to outperform DT even given the computational complexity of DT [41].
In 2020, many research works were presented. One of them fused three methods, Long short-term memory (LSTM), Residual neural network, and Graph convolutional network, into a mechanism that was able to extract spatial-temporal features to predict crimes in Chicago. Root mean square error and Mean absolute error were used as criteria to test the performance of the applied method [42]. In another study, a crime network for spatiotemporal data was proposed using a Convolutional neural network (CNN) in order to automatically predict the time and place of crimes [43]. In a further study [44], a Recurrent neural network (RNN) was integrated with LSTM to design a time series crime prediction system for Addis Ababa. In one more study [45], the severity level of crime in Boston was studied and predicted using machine learning algorithms such as SVM, NB, LR, and DT.
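As an illustration of how a logistic regression model relates a binary outcome such as burglary occurrence to explanatory factors, the sketch below trains a model by batch gradient descent on two binary features. It is a minimal sketch under stated assumptions, not the model of ref. [38]: the features (night-time, repeat victimization), the tiny dataset, the learning rate, and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(X, y, learning_rate=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical observations: [is_night, repeat_victimization] -> burglary (1) or not (0).
X = [[0, 0]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 3 + [[1, 1]] * 4
y = [0, 0, 0, 1] + [0, 1] + [1, 1, 0] + [1, 1, 1, 0]

weights, bias = train_logistic(X, y)
risk_night_repeat = predict_proba(weights, bias, [1, 1])
risk_day_first_time = predict_proba(weights, bias, [0, 0])
```

With these made-up observations the fitted model assigns a higher burglary probability to night-time repeat locations than to daytime first-time locations. In practice a library implementation with regularization and proper train/test evaluation would normally be used.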
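The Naive Bayes idea of predicting a crime type from historical records can be sketched as follows. This is a minimal categorical Naive Bayes with add-one (Laplace) smoothing, written as an illustration and not as the implementation of ref. [41]. The feature columns (district, time of day), the records, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, column) -> Counter of values
    seen_values = defaultdict(set)         # column -> set of observed values
    for row, label in zip(rows, labels):
        for column, value in enumerate(row):
            feature_counts[(label, column)][value] += 1
            seen_values[column].add(value)
    return class_counts, feature_counts, seen_values


def predict_nb(model, row):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, feature_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for column, value in enumerate(row):
            numerator = feature_counts[(label, column)][value] + 1  # add-one smoothing
            denominator = count + len(seen_values[column])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical historical records: (district, time_of_day) -> crime type.
rows = [
    ("downtown", "night"), ("downtown", "night"), ("downtown", "day"),
    ("suburb", "day"), ("suburb", "day"), ("suburb", "night"),
]
labels = ["robbery", "robbery", "theft", "theft", "theft", "robbery"]
model = train_nb(rows, labels)
```

For these made-up records, `predict_nb(model, ("downtown", "night"))` selects "robbery" and `predict_nb(model, ("suburb", "day"))` selects "theft", reflecting where and when each type occurred most often in the history.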
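Root mean square error and mean absolute error, the two criteria mentioned above, can be computed as in the following short sketch. The weekly counts are hypothetical and serve only to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: average size of the error, in the unit of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly crime counts and model forecasts (not real data).
actual_counts = [12, 8, 15, 10]
forecast_counts = [10, 9, 15, 14]

error_rmse = rmse(actual_counts, forecast_counts)  # sqrt(21 / 4), about 2.29
error_mae = mae(actual_counts, forecast_counts)    # 7 / 4 = 1.75
```

Because the errors are squared before averaging, the single miss of four crimes dominates the RMSE, while the MAE weights all misses equally.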
We can say that the machine learning field has successfully analyzed how crime behavior develops over time. Several conclusions were reached from the survey shown in Table 2:
a. According to ref. [31], the Deep neural network (DNN) outperformed the SVM, but according to ref. [46] the opposite occurred and the SVM outperformed the DNN. This can be explained by the fact that the first study worked on an image dataset and the second on a text dataset. It is therefore recommended to use a DNN for image crime datasets.
b. According to refs [1,47], using the same system on two different crime datasets leads to widely different accuracy percentages, which shows that the dataset used severely affects the results. This presents a challenge for these algorithms to prove their efficiency and accuracy in predicting crime.
c. The highest crime prediction accuracies obtained by the surveyed machine learning approaches are shown in Table 2.
d. According to ref. [48], the LR algorithm achieves the highest accuracy among the different machine learning algorithms.
e. Among the research works adopting the RF method, the highest accuracy achieved is 59.8%, which is poor compared with other methods.
f. The standard deviations of the crime prediction accuracies for each algorithm show that the SVM algorithm outperforms the LR algorithm and achieves 71.9% accuracy. In fact, it outperforms the standard deviation results of all other machine learning algorithms.
According to the previous studies, the highest crime prediction accuracy was obtained with the logistic regression method, which reached 95% for Baltimore city in ref. [48]. Furthermore, XGBoost and Logistic Regression achieved high accuracies of 94 and 90%, respectively [1].
However, it can be noticed that the same algorithm can perform differently with two different datasets, and this proves that the dataset has a large influence on the crime prediction results.
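The comparison by mean and standard deviation of accuracies can be sketched as below. The accuracy values are hypothetical placeholders and are not taken from Table 2. The sketch only shows how the spread of an algorithm's accuracy across datasets could be summarized.

```python
from statistics import mean, stdev


def summarize_accuracies(results):
    """Map each algorithm to (mean accuracy, sample standard deviation)."""
    summary = {}
    for algorithm, accuracies in results.items():
        spread = stdev(accuracies) if len(accuracies) > 1 else 0.0
        summary[algorithm] = (mean(accuracies), spread)
    return summary


# Hypothetical accuracies of two algorithms on three datasets each (not from Table 2).
example_results = {
    "algorithm_A": [0.95, 0.90, 0.62],
    "algorithm_B": [0.80, 0.78, 0.79],
}
summary = summarize_accuracies(example_results)
```

In this made-up example, algorithm_A reaches the higher peak accuracy but varies strongly between datasets, whereas algorithm_B is lower on average yet far more stable. A large standard deviation is one way to quantify the dataset dependence noted above.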

Data mining and crime prediction
In 2011, spatial data mining techniques were proposed to extract patterns from spatial and temporal data, and the data were mined geospatially using spatial knowledge. However, the usefulness of pattern extraction was limited by the complexity of the relationships between spatial data [32]. Also in 2011, data mining methods were used to forecast crimes from a spatial and temporal dataset collected in Portland and to predict whether a residential burglary would happen. The NB, SVM, DT, and K-Nearest Neighbor algorithms were applied and their results compared, which showed the power of neural networks in complex systems [57]. In 2016, high accuracy was achieved using various DT algorithms to extract knowledge from a dataset of 1994 instances with 128 attributes, and a comparison was made between them. In addition, the data were trained and tested, and scatter plots were used to illustrate the crime areas with the severity of each area based on previous data [58]. Also in 2016, data mining algorithms were developed and used to classify crimes based on their types. A crime was characterized according to time, based on factors such as vacations that started with the academic year for colleges and schools. In addition, the classifier was used to predict the severity risk of the crime areas in Denver city between 2010 and 2015 [59]. In 2020, the Autoregressive Integrated Moving Average (ARIMA) technique was implemented to predict time series data, which were then visualized with a data mining platform. This technique showed that a regressive model can work on historical newsfeed data to predict future crimes [60].
Several conclusions were reached from this survey:
a. Table 3 compares many algorithms applied to the crime prediction challenge, such as DT, NB, and RF, either individually or in groups, for a certain type of dataset and city. This presents a challenge for these algorithms to confirm their effectiveness and accuracy in predicting crime.
b. The highest crime prediction accuracies obtained by the surveyed data mining methods are shown in Table 3.
c. According to ref. [61], the K-means algorithm achieves the highest accuracy among the different data mining algorithms.
d. When the standard deviations of the crime prediction accuracies are taken for each algorithm, the DT algorithm outperforms the NB algorithm and achieves 18.9% accuracy.
According to the previous studies, DT and Neural Network recorded 94% accuracy for different datasets in refs [48,51] for machine learning algorithms. The K-means data mining algorithm achieved 93.62% (cluster one) and 93.99% (cluster two) for crimes in India [61].
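The autoregressive and differencing parts of an ARIMA model can be illustrated with a deliberately reduced sketch. It differences the series once and fits a first-order autoregression to the differences by least squares, which corresponds roughly to an ARIMA(1,1,0) model with no moving-average term. This is an assumption-laden simplification and not the system of ref. [60]; the function names and the monthly counts are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of series[t+1] = c + phi * series[t]."""
    x, y = series[:-1], series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    variance = sum((a - mean_x) ** 2 for a in x)
    if variance == 0:
        raise ValueError("series is constant; phi is not identifiable")
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / variance
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(counts):
    """One-step forecast: difference once, apply AR(1), then undo the differencing."""
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return counts[-1] + next_diff


# Hypothetical monthly crime counts whose increments follow d[t+1] = 1 + 0.5 * d[t].
monthly_counts = [10.0, 14.0, 17.0, 19.5, 21.75, 23.875]
next_month = forecast_next(monthly_counts)
```

For this synthetic series the fit recovers the generating rule exactly, so the forecast continues the pattern. Real crime series are noisy, and a full ARIMA implementation from a statistics library, with order selection and diagnostics, would be the usual choice.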
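For readers unfamiliar with K-means, the sketch below clusters incident coordinates into k groups. It is an illustration only and not the setup of ref. [61]: the coordinates are hypothetical, and the centroids are initialized with the first k points for reproducibility, which is an assumption made here (random or k-means++ initialization is more common).

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) and return (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds

    def nearest(point):
        return min(range(k), key=lambda c: math.dist(point, centroids[c]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point)].append(point)
        updated = []
        for index, members in enumerate(clusters):
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[index])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(point) for point in points]
    return centroids, labels


# Hypothetical incident coordinates forming two spatial groups (not real data).
incident_points = [
    (1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
    (8.2, 7.9), (0.9, 1.1), (7.8, 8.1),
]
centroids, labels = kmeans(incident_points, k=2)
```

The two resulting clusters correspond to the two groups of nearby incidents, and each centroid can be read as the centre of a candidate hot spot. Because K-means is unsupervised, reporting an "accuracy" for it requires comparing the clusters with known labels, which depends on the study design.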

Conclusion
Crime prediction has become a hot research area because of its benefits to the security of any society or nation. It was found that more studies adopted supervised learning approaches for crime prediction than other approaches.
It can be concluded that data mining methods achieved the highest crime prediction accuracies, surpassing machine learning methods. Nevertheless, on average, machine learning outperforms data mining in crime prediction. Likewise, when the standard deviations of the crime prediction accuracies of machine learning and data mining are considered, the machine learning algorithms perform better than the data mining algorithms.
Finally, the comparison of machine learning and data mining algorithms for crime prediction systems gives certain indications: the selection of an algorithm may depend on the dataset type (such as image, text, video, or voice), and certain algorithms perform well on average but can fail on other datasets. Crime prediction methods adopting deep learning algorithms were not covered in this survey because of time limitations.