Abstract
This paper proposes a new supervised classification method dedicated to binary predictors. Its originality lies in combining a model-based classification rule with similarity measures through the introduction of a new family of exponential kernels. Links are established between existing similarity measures when they are applied to binary predictors, and a new family of measures is introduced to unify part of the existing literature. The performance of the new classification method is illustrated on two real datasets (verbal autopsy data and handwritten digit data) using 76 similarity measures.
© 2015 Seydou N. Sylla et al.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.