
Journal of Artificial Intelligence and Soft Computing Research

The Journal of the Polish Neural Network Society, the University of Social Sciences in Lodz, and Czestochowa University of Technology

4 Issues per year

Open Access
Online ISSN: 2083-2567

Classifiers Accuracy Improvement Based on Missing Data Imputation

Ivan Jordanov / Nedyalko Petrov / Alessio Petrozziello
Published Online: 2017-11-01 | DOI: https://doi.org/10.1515/jaiscr-2018-0002

Abstract

In this paper, we further investigate and extend our previous work on radar signal identification and classification, based on a dataset comprising continuous, discrete, and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. As with most real-world datasets, it also contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI), K-Nearest Neighbour Imputation (KNNI), and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The performance of the imputation models is assessed with Wilcoxon's test for statistical significance and Cohen's effect size metric. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN), Support Vector Machines (SVM), and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers' performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses ('military' and 'civil'), each containing several subclasses, and introduce two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complements to the OCA when choosing the best classifier for the problem at hand.

Keywords: machine learning; missing data; model-based imputation; neural networks; random forests; support vector machines; radar signal classification
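
To make the evaluation pipeline described in the abstract concrete, the following Python sketch (not the authors' implementation) shows one of the three imputation techniques, K-Nearest Neighbour Imputation, followed by a Random Forest classifier scored with an AUC-based multiclass accuracy. The file name radar_pulses.csv, the label column "class", and all parameter values are illustrative assumptions.

    # Illustrative sketch only: KNN imputation of a radar pulse-train dataset
    # with missing values (NaN), followed by a Random Forest classifier
    # evaluated with an area-under-ROC multiclass accuracy.
    import pandas as pd
    from sklearn.impute import KNNImputer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("radar_pulses.csv")           # hypothetical file name
    X = df.drop(columns=["class"]).to_numpy()      # hypothetical label column
    y = df["class"].to_numpy()
    # Note: categorical features would need encoding (e.g. one-hot) first;
    # that preprocessing step is omitted here.

    # K-Nearest Neighbour Imputation (KNNI): each missing entry is replaced
    # by a distance-weighted average over the k nearest complete rows.
    X_imputed = KNNImputer(n_neighbors=5, weights="distance").fit_transform(X)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X_imputed, y, test_size=0.3, stratify=y, random_state=0)

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_tr, y_tr)

    # Multiclass accuracy based on the area under the ROC curves,
    # averaged one-vs-rest over the subclasses.
    proba = clf.predict_proba(X_te)                # columns follow clf.classes_
    auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
    print(f"Multiclass AUC (one-vs-rest, macro): {auc:.3f}")

Under the same assumptions, Multiple Imputation could be approximated with sklearn.impute.IterativeImputer, and the pairwise statistical comparison of imputation methods could be carried out with scipy.stats.wilcoxon on paired performance scores.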

About the article

Received: 2017-02-14

Accepted: 2017-03-28

Published Online: 2017-11-01

Published in Print: 2018-01-01


Citation Information: Journal of Artificial Intelligence and Soft Computing Research, Volume 8, Issue 1, Pages 31–48, ISSN (Online) 2083-2567, DOI: https://doi.org/10.1515/jaiscr-2018-0002.

© 2018 Ivan Jordanov et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
