We analyze market baskets of individual households in two consumer durables categories (music, computer-related products) by the multivariate logit (MVL) model, its finite mixture extension (FM-MVL) and the conditional restricted Boltzmann machine (CRBM). The CRBM attains a vastly better out-of-sample performance than the MVL and FM-MVL models. Based on simulation-based likelihood ratio tests we prefer the CRBM to the FM-MVL model. To interpret the hidden variables of conditional Boltzmann machines we look at their average probability differences between purchases and non-purchases of any sub-category across all baskets. To measure interdependences we compute cross effects between sub-categories for the best performing FM-MVL model and CRBM. In both product categories the CRBM indicates more or higher positive cross effects than the FM-MVL model. Finally, we suggest appropriate future research based on larger and more detailed data sets.
A Conditional Probabilities of the Investigated Models
For the MVL model the conditional probability of a purchase of product j can be written as:
For the FM-MVL model with S segments we obtain the following expression for the conditional probability of a purchase of product j:
Parameters of this model are segment-specific. π_s denotes the posterior probability of belonging to segment s.
For the CRBM we obtain the following expressions for the conditional probabilities of purchases given hidden variables and for hidden variables given purchases (Li et al. 2015):
y_j denotes the binary purchase indicator for product j, h_k the kth binary hidden variable.
B Estimation of the MVL Model
Maximum likelihood estimation of the MVL model requires computing the so-called normalization constant in every iteration, which is obtained by summing over all 2^J possible market baskets. Only when expression (1) is divided by the normalization constant does a proper probability result. For 30 products we would have to deal with more than 1.0 × 10^9 possible market baskets (2^30 = 1,073,741,824). Because this approach is impractical, we resort to maximum pseudo-likelihood (MPL) estimation. In a simulation study, Bel et al. (2018) compare MPL to maximum likelihood estimation for up to 12 alternatives. These authors conclude that MPL estimation leads to only negligible efficiency losses.
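To make the size of the problem concrete, the following Python sketch counts the baskets for 30 products and computes the log normalization constant by brute-force enumeration, which is only workable for small J. It assumes the standard MVL form without covariates, in which the unnormalized log probability of a basket is the sum of a_j y_j over products plus the sum of V_jl y_j y_l over product pairs. The names a, V and log_normalization_constant are illustrative and not taken from the paper.

```python
import itertools
import math

def log_normalization_constant(a, V):
    """Brute-force log normalization constant of an MVL model (assumed form).

    a: list of J constants; V: J x J symmetric list of lists with zero diagonal.
    The loop visits all 2**J baskets, so this is only practical for small J.
    """
    J = len(a)
    log_terms = []
    for y in itertools.product((0, 1), repeat=J):
        u = sum(a[j] * y[j] for j in range(J))
        u += sum(V[j][l] * y[j] * y[l]
                 for j in range(J) for l in range(j + 1, J))
        log_terms.append(u)
    # log-sum-exp for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Number of baskets that would have to be enumerated for 30 products
n_baskets_30 = 2 ** 30
```

With all coefficients set to zero every basket is equally likely, so for J = 3 the function returns log(8). Each additional product doubles the number of terms, which is what makes exact maximum likelihood impractical here.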
The pseudo-probability for product j is defined as the probability of y_j conditional on y_−j, i.e., on basket y without product j:
The pseudo-probability compares the observed basket y with a second basket that is identical to y except for product j, whose purchase indicator is flipped, i.e., set to 1 − y_j.
MPL estimation is feasible because the normalization constant drops out in expression (8). Moreover, it is straightforward, as the pseudo-likelihood function has only one local maximum, which is therefore also the global maximum. For the MVL model the pseudo-probability P_j for product j in basket y is given by (Besag 1972, 1974):
The log pseudo-likelihood LPL of basket y is obtained by summing the logs of the pseudo-probabilities across all products, i.e., LPL = Σ_j log P_j.
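The computation can be illustrated with a short Python sketch. It assumes the standard MVL form without covariates, in which the pseudo-probability of y_j given the rest of the basket is a binary logit with index a_j plus the sum of V_jl y_l over all other products l. The names a, V and log_pseudo_likelihood are hypothetical.

```python
import math

def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_pseudo_likelihood(y, a, V):
    """Log pseudo-likelihood of one basket under an MVL model (assumed form).

    y: list of 0/1 purchase indicators; a: list of constants;
    V: symmetric J x J list of lists of pairwise coefficients.
    """
    J = len(y)
    lpl = 0.0
    for j in range(J):
        # logit index of product j given all other products in the basket
        eta = a[j] + sum(V[j][l] * y[l] for l in range(J) if l != j)
        # log P_j = y_j * eta - log(1 + exp(eta))
        lpl += y[j] * eta - log1pexp(eta)
    return lpl
```

No sum over baskets appears anywhere: each term only needs the observed basket, so the cost is of order J squared per basket instead of 2^J.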
C Estimation of the FM-MVL Model
We assign households to mixture components (i.e., segments) using the Gibbs sampling approach of Shi et al. (2005). As in Dippold and Hruschka (2013a), the estimation process replaces the intractable log likelihood value of a basket by its log pseudo-likelihood value. In each iteration, an MVL model is estimated by MPL for each segment, using the households currently assigned to that segment. Because the FM-MVL model may be subject to local optima, we start from 100 random initial allocations of households to segments. We choose the solution with the best log likelihood value for the estimation sample, as determined by the Gibbs sampling procedure explained in section 2.3.
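A single allocation step of this kind can be sketched in Python as follows. The sketch assumes that a household is drawn into segment s with probability proportional to the segment share times the exponentiated log pseudo-likelihood of its baskets under that segment's MVL model; this is a generic mixture allocation step, not necessarily the exact scheme of Shi et al. (2005). All function and argument names are hypothetical.

```python
import math
import random

def allocation_probabilities(log_pl, shares):
    """Segment probabilities for one household.

    log_pl[s]: log pseudo-likelihood of the household's baskets (summed over
               baskets) under the MVL model of segment s.
    shares[s]: current share of segment s (positive, sums to one).
    """
    log_w = [math.log(p) + l for p, l in zip(shares, log_pl)]
    m = max(log_w)                      # subtract the max to avoid underflow
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def sample_segment(log_pl, shares, rng):
    """Draw a segment index according to the allocation probabilities."""
    probs = allocation_probabilities(log_pl, shares)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Example: two equally large segments, the first fits the household better
rng = random.Random(1)
segment = sample_segment([-10.0, -12.0], [0.5, 0.5], rng)
```

In a full sampler this step would alternate with re-estimating each segment's MVL model by MPL on the households currently assigned to it.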
D Estimation of the RBM and CRBM
We estimate the RBM and the CRBM by the contrastive divergence (CD) algorithm of Hinton (2002), which approximates the gradient of the log likelihood. For the CRBM we extend the CD algorithm by adding gradients for the coefficients in β and δ_k. Because of the existence of local optima we start the CD algorithm 100 times with random initial coefficient values. Just as for the FM-MVL model, we choose the RBM or CRBM solution that attains the best log likelihood value for the estimation data using the Gibbs sampling procedure explained in section 2.3.
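The following Python sketch shows one CD update with a single Gibbs step (CD-1) for one basket. It rests on assumptions that are not confirmed by the text: covariates x enter the purchase probabilities through coefficients beta and the hidden variables through coefficients delta (one row per hidden variable k), and both conditionals are binary logits. The parameter names b, c, W, beta, delta and the function cd1_update are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(y, x, p, lr, rng):
    """One CD-1 step for basket y (0/1 list) with covariates x (assumed CRBM form).

    p holds b (J), c (K), W (J x K), beta (J x D), delta (K x D); updated in place.
    """
    b, c, W, beta, delta = p["b"], p["c"], p["W"], p["beta"], p["delta"]
    J, K, D = len(b), len(c), len(x)

    def hidden_probs(v):
        return [sigmoid(c[k] + sum(W[j][k] * v[j] for j in range(J))
                        + sum(delta[k][d] * x[d] for d in range(D)))
                for k in range(K)]

    def visible_probs(h):
        return [sigmoid(b[j] + sum(W[j][k] * h[k] for k in range(K))
                        + sum(beta[j][d] * x[d] for d in range(D)))
                for j in range(J)]

    ph0 = hidden_probs(y)                                # positive phase
    h0 = [1 if rng.random() < q else 0 for q in ph0]
    pv1 = visible_probs(h0)                              # one Gibbs step
    v1 = [1 if rng.random() < q else 0 for q in pv1]
    ph1 = hidden_probs(v1)                               # negative phase

    # Gradient approximation: data statistics minus reconstruction statistics
    for j in range(J):
        dv = y[j] - v1[j]
        b[j] += lr * dv
        for d in range(D):
            beta[j][d] += lr * dv * x[d]
        for k in range(K):
            W[j][k] += lr * (y[j] * ph0[k] - v1[j] * ph1[k])
    for k in range(K):
        dh = ph0[k] - ph1[k]
        c[k] += lr * dh
        for d in range(D):
            delta[k][d] += lr * dh * x[d]
    return p
```

Dropping x, beta and delta gives the corresponding update for the plain RBM. Repeating the procedure from many random starting values, as described above, guards against poor local optima.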
E Simulation-Based Computation of the Likelihood Ratio Test
The likelihood ratio statistic LRT with LL_1 and LL_0 as log likelihood values for the alternative and the null model can be written as LRT = 2 (LL_1 − LL_0).
The simulation-based approach for the LRT (Lewis et al. 2011) consists of three steps:
1. Generate S bootstrap samples from the null model.
2. For each bootstrap sample, fit both the null model and the alternative model, determine the log likelihood values of these models by the Gibbs sampling procedure explained in section 2.3 and compute the LRT statistic.
3. Reject the null model if the proportion of bootstrap test statistics that are greater than the test statistic for the estimation data falls below a prespecified significance level.
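The three steps can be sketched generically in Python. The function bootstrap_lrt below takes fitting and simulation routines as arguments; all names are hypothetical, and the Bernoulli models at the bottom are toy stand-ins (not the FM-MVL model or the CRBM) that keep the example runnable. The statistic is computed as twice the difference between the log likelihood values of the alternative and the null model.

```python
import math
import random

def bootstrap_lrt(data, fit_null, fit_alt, simulate_null, n_boot, rng):
    """Simulation-based likelihood ratio test.

    fit_null / fit_alt: data -> (model, log_likelihood)
    simulate_null: (model, n, rng) -> synthetic data set of size n
    Returns the observed statistic and the bootstrap p-value.
    """
    null_model, ll0 = fit_null(data)
    _, ll1 = fit_alt(data)
    observed = 2.0 * (ll1 - ll0)
    exceed = 0
    for _ in range(n_boot):
        sample = simulate_null(null_model, len(data), rng)   # step 1
        _, b0 = fit_null(sample)                             # step 2
        _, b1 = fit_alt(sample)
        if 2.0 * (b1 - b0) > observed:
            exceed += 1
    return observed, exceed / n_boot                         # step 3 uses this proportion

# Toy stand-ins: null model fixes the purchase probability at 0.5,
# the alternative estimates it from the data.
def fit_null(d):
    return 0.5, len(d) * math.log(0.5)

def fit_alt(d):
    n, k = len(d), sum(d)
    ll = 0.0 if k in (0, n) else k * math.log(k / n) + (n - k) * math.log(1 - k / n)
    return k / n, ll

def simulate_null(model, n, rng):
    return [1 if rng.random() < model else 0 for _ in range(n)]

data = [1] * 40 + [0] * 10
observed, p_value = bootstrap_lrt(data, fit_null, fit_alt, simulate_null,
                                  200, random.Random(42))
```

Because the reference distribution is simulated from the fitted null model, the test does not rely on the asymptotic chi-square distribution, which is useful when the compared models are not nested in the usual sense.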
References
Beentjes, S. V., and A. Khamseh. 2020. Higher-Order Interactions in Statistical Physics and Machine Learning: A Non-parametric Solution to the Inverse Problem. Working Paper. arXiv:2006.06010. https://doi.org/10.1103/PhysRevE.102.053314.
Bel, K., D. Fok, and R. Paap. 2018. “Parameter Estimation in Multivariate Logit Models with Many Binary Choices.” Econometric Reviews 37: 534–50. https://doi.org/10.1080/07474938.2015.1093780.
Besag, J. 1972. “Nearest-Neighbour Systems and the Auto-Logistic Model for Binary Data.” Journal of the Royal Statistical Society: Series B 34: 75–83. https://doi.org/10.1111/j.2517-6161.1972.tb00889.x.
Besag, J. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society: Series B 36: 192–236. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x.
Betancourt, R., and D. Gautschi. 1990. “Demand Complementarities, Household Production, and Retail Assortments.” Marketing Science 9: 146–61. https://doi.org/10.1287/mksc.9.2.146.
Boztuğ, Y., and T. Reutterer. 2008b. “A Combined Approach for Segment-Specific Market Basket Analysis.” European Journal of Operational Research 187: 294–312. https://doi.org/10.1016/j.ejor.2007.03.001.
Chib, S., P. B. Seetharaman, and A. Strijnev. 2002. “Analysis of Multi-Category Purchase Incidence Decisions Using IRI Market Basket Data.” In Econometric Models in Marketing, edited by P. H. Franses and A. L. Montgomery, pp. 57–92. Amsterdam: JAI. https://doi.org/10.1016/S0731-9053(02)16004-X.
Dippold, K., and H. Hruschka. 2013a. “A Model of Heterogeneous Multicategory Choice for Market Basket Analysis.” Review of Marketing Science 11: 1–31. https://doi.org/10.1515/roms-2012-0001.
Duvvuri, S. D., A. Ansari, and S. Gupta. 2007. “Consumers’ Price Sensitivities across Complementary Categories.” Management Science 53: 1933–45. https://doi.org/10.1287/mnsc.1070.0744.
Hruschka, H. 2017a. “Analyzing the Dependences of Multicategory Purchases on Interactions of Marketing Variables.” Journal of Business Economics 87: 295–313. https://doi.org/10.1007/s11573-016-0820-x.
Hruschka, H. 2019. “Comparing Unsupervised Probabilistic Machine Learning Methods for Market Basket Analysis.” Review of Managerial Science. https://doi.org/10.1007/s11846-019-00349-0.
Hyvärinen, A. 2006. “Consistency of Pseudolikelihood Estimation of Fully Visible Boltzmann Machines.” Neural Computation 18: 2283–92. https://doi.org/10.1162/neco.2006.18.10.2283.
Lewis, F., A. Butler, and L. Gilbert. 2011. “A Unified Approach to Model Selection Using the Likelihood Ratio Test.” Methods in Ecology and Evolution 2: 155–62. https://doi.org/10.1111/j.2041-210x.2010.00063.x.
Le Roux, N., and Y. Bengio. 2007. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Technical Report 1294, Département d’informatique et recherche opérationnelle, Université de Montréal.
Li, X., F. Zhao, and Y. Guo. 2015. “Conditional Restricted Boltzmann Machines for Multi-Label Learning with Incomplete Labels.” In Proceedings of the 18th AISTATS Conference. San Diego, CA.
Manchanda, P., A. Ansari, and S. Gupta. 1999. “The “Shopping Basket”: A Model for Multi-Category Purchase Incidence Decisions.” Marketing Science 18: 95–114. https://doi.org/10.1287/mksc.18.2.95.
Mnih, V., H. Larochelle, and G. E. Hinton. 2011. “Conditional Restricted Boltzmann Machines for Structured Output Prediction.” In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain.
Montúfar, G. 2018. “Restricted Boltzmann Machines: Introduction and Review.” In Information Geometry and Its Applications: On the Occasion of Shun-Ichi Amari’s 80th Birthday, edited by N. Ay, P. Gibilisco, and F. Matúš, pp. 75–115. Basel, Switzerland: Springer Nature. https://doi.org/10.1007/978-3-319-97798-0_4.
Russell, G. J., and A. Petersen. 2000. “Analysis of Cross Category Dependence in Market Basket Selection.” Journal of Retailing 76: 369–92. https://doi.org/10.1016/s0022-4359(00)00030-0.
Seetharaman, P. B., S. Chib, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev. 2005. “Models of Multi-Category Choice Behavior.” Marketing Letters 16: 239–54. https://doi.org/10.1007/s11002-005-5888-y.
Shi, J. Q., R. Murray-Smith, and D. M. Titterington. 2005. “Hierarchical Gaussian Process Mixtures for Regression.” Statistics and Computing 15: 31–41. https://doi.org/10.1007/s11222-005-4787-7.
Smolensky, P. 1986. “Information Processing in Dynamical Systems: Foundations of Harmony Theory.” In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, edited by D. E. Rumelhart and J. L. McClelland, pp. 194–281. Cambridge, MA: MIT Press.
Williams, D. A. 1970. “Discrimination between Regression Models to Determine the Pattern of Enzyme Synthesis in Synchronous Cell Cultures.” Biometrics 26: 23–32. https://doi.org/10.2307/2529041.
Xia, F., R. Chatterjee, and J. H. May. 2019. “Using Conditional Restricted Boltzmann Machines to Model Complex Consumer Shopping Patterns.” Marketing Science 38: 711–27. https://doi.org/10.1287/mksc.2019.1162.
© 2020 Walter de Gruyter GmbH, Berlin/Boston