Published by De Gruyter, November 23, 2020

Interdependences of Products in Market Baskets: Comparing the Conditional Restricted Boltzmann Machine to the Multivariate Logit Model

Harald Hruschka


We analyze market baskets of individual households in two consumer durables categories (music, computer-related products) by the multivariate logit (MVL) model, its finite mixture extension (FM-MVL) and the conditional restricted Boltzmann machine (CRBM). The CRBM attains a vastly better out-of-sample performance than the MVL and FM-MVL models. Based on simulation-based likelihood ratio tests we prefer the CRBM to the FM-MVL model. To interpret the hidden variables of the CRBM we look at their average probability differences between purchases and non-purchases of any sub-category across all baskets. To measure interdependences we compute cross effects between sub-categories for the best performing FM-MVL model and CRBM. In both product categories the CRBM indicates more or higher positive cross effects than the FM-MVL model. Finally, we suggest appropriate future research based on larger and more detailed data sets.

Corresponding author: Harald Hruschka, University of Regensburg, Faculty of Business Economics and Management Information Systems, Regensburg, 93053, Germany


A Conditional Probabilities of the Investigated Models

For the MVL model the conditional probability of a purchase of product j can be written as:

(4) $P(y_j = 1) = \dfrac{1}{1 + \exp\left(-\left(\alpha_j + x^T \beta_{\cdot j} + y^T V_{\cdot j}\right)\right)}$

For the FM-MVL model with S segments we obtain the following expression for the conditional probability of a purchase of product j:

(5) $P(y_j = 1) = \sum_{s=1}^{S} \pi_s \dfrac{1}{1 + \exp\left(-\left(\alpha_{js} + x^T \beta_{s \cdot j} + y^T V_{s \cdot j}\right)\right)}$

Parameters of this model are segment-specific. $\pi_s$ denotes the posterior probability of belonging to segment s.
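Expressions (4) and (5) can be evaluated directly once coefficients are available. The following is a minimal sketch, not the implementation used in the paper: the function names, the nested-list layout of the coefficients and the toy values are assumptions made for illustration, and the diagonal of V is assumed to be zero.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def mvl_cond_prob(j, y, x, alpha, beta, V):
    """Expression (4): conditional purchase probability of product j.

    beta[m][j] is the effect of covariate m on product j and V[i][j] the
    pairwise coefficient of products i and j (own term excluded).
    """
    z = alpha[j]
    z += sum(x[m] * beta[m][j] for m in range(len(x)))
    z += sum(y[i] * V[i][j] for i in range(len(y)) if i != j)
    return sigmoid(z)


def fm_mvl_cond_prob(j, y, x, pi, alphas, betas, Vs):
    """Expression (5): average of (4) over segments, weighted by pi_s."""
    return sum(
        pi[s] * mvl_cond_prob(j, y, x, alphas[s], betas[s], Vs[s])
        for s in range(len(pi))
    )


# Toy example with three products and one covariate (hypothetical values).
y = [1, 0, 1]
x = [1.0]
alpha = [-0.5, 0.2, 0.0]
beta = [[0.3, -0.1, 0.0]]
V = [[0.0, 0.4, -0.2], [0.4, 0.0, 0.6], [-0.2, 0.6, 0.0]]

p_mvl = mvl_cond_prob(1, y, x, alpha, beta, V)
# Two segments with identical coefficients reproduce the MVL probability.
p_fm = fm_mvl_cond_prob(1, y, x, [0.3, 0.7], [alpha, alpha], [beta, beta], [V, V])
```

With segment-specific coefficients the FM-MVL probability is a weighted average of the segment-level logistic probabilities, so it always lies between the smallest and the largest of them.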

For the CRBM we obtain the following expressions for the conditional probabilities of purchases given hidden variables and for hidden variables given purchases (Li et al. 2015):

(6) $P(y_j = 1 \mid h) = \dfrac{1}{1 + \exp\left(-\left(\alpha_j + \beta_{\cdot j} x + \sum_{k=1}^{K} W_{jk} h_k\right)\right)}$

(7) $P(h_k = 1 \mid y) = \dfrac{1}{1 + \exp\left(-\left(\gamma_k + \sum_{j=1}^{J} W_{jk} y_j\right)\right)}$

$y_j$ denotes the binary purchase indicator for product j, $h_k$ the binary kth hidden variable.
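The two conditional probabilities (6) and (7) are what a Gibbs sampler alternates between. The sketch below evaluates them for given coefficients; the function names, the list-of-lists layout (W[j][k], beta[m][j]) and the toy values are illustrative assumptions rather than part of the original model code.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def p_purchase_given_hidden(j, h, x, alpha, beta, W):
    """Expression (6): P(y_j = 1 | h) with covariates x."""
    z = alpha[j]
    z += sum(beta[m][j] * x[m] for m in range(len(x)))
    z += sum(W[j][k] * h[k] for k in range(len(h)))
    return sigmoid(z)


def p_hidden_given_purchases(k, y, gamma, W):
    """Expression (7): P(h_k = 1 | y)."""
    z = gamma[k] + sum(W[j][k] * y[j] for j in range(len(y)))
    return sigmoid(z)


# Toy example: two products, two hidden variables, one covariate (hypothetical).
W = [[1.0, 0.0], [0.0, -1.0]]
gamma = [0.0, 0.0]
alpha = [0.0, 0.0]
beta = [[0.5, 0.0]]

p_h0 = p_hidden_given_purchases(0, [1, 1], gamma, W)   # sigmoid(1.0)
p_h1 = p_hidden_given_purchases(1, [1, 1], gamma, W)   # sigmoid(-1.0)
p_y0 = p_purchase_given_hidden(0, [1, 0], [1.0], alpha, beta, W)  # sigmoid(1.5)
```

A positive weight $W_{jk}$ raises the probability that hidden variable k is active when product j is purchased, and vice versa, which is how the hidden variables carry interdependences between products.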

B Estimation of the MVL Model

Maximum likelihood estimation of the MVL model requires computing the so-called normalization constant in every iteration, which is obtained by summing over all $2^J$ possible market baskets. Only when expression (1) is divided by the normalization constant does a proper probability result. For 30 products we would have to deal with more than $1.0 \times 10^9$ possible market baskets. Because this approach is impractical, we resort to maximum pseudo-likelihood (MPL) estimation. In a simulation study, Bel et al. (2018) compare MPL to maximum likelihood estimation for up to 12 alternatives. These authors conclude that MPL estimation leads to only negligible efficiency losses.
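To make the cost of the normalization constant concrete, the sketch below enumerates all baskets for a toy problem with three products. Expression (1) is not reproduced in this appendix, so the unnormalized log probability used here (product constants plus pairwise terms $V_{ij} y_i y_j$ for i < j, with covariate effects folded into the constants a) is an assumption; the function names and numbers are hypothetical.

```python
import itertools
import math


def log_unnormalized(y, a, V):
    """Assumed unnormalized log probability of basket y.

    a[j] stands for the constant of product j (covariate effects folded in),
    V[i][j] for the symmetric pairwise coefficient of products i and j.
    """
    J = len(y)
    value = sum(a[j] * y[j] for j in range(J))
    value += sum(V[i][j] * y[i] * y[j] for i in range(J) for j in range(i + 1, J))
    return value


def normalization_constant(a, V):
    """Sum exp(log_unnormalized) over all 2**J baskets (feasible for small J only)."""
    J = len(a)
    return sum(
        math.exp(log_unnormalized(y, a, V))
        for y in itertools.product((0, 1), repeat=J)
    )


a = [-1.0, 0.5, 0.2]
V = [[0.0, 0.8, -0.3], [0.8, 0.0, 0.4], [-0.3, 0.4, 0.0]]
Z = normalization_constant(a, V)
probs = {
    y: math.exp(log_unnormalized(y, a, V)) / Z
    for y in itertools.product((0, 1), repeat=len(a))
}

# Conditional probability of product 0 given the rest of basket (1, 1, 0),
# computed from the joint distribution; Z cancels in this ratio.
cond_from_joint = probs[(1, 1, 0)] / (probs[(1, 1, 0)] + probs[(0, 1, 0)])

n_baskets_30 = 2 ** 30  # number of baskets for 30 products
```

Under this assumed form, the conditional probability obtained from the joint distribution coincides with the logistic expression (4), while the number of terms in Z doubles with every additional product.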

The pseudo-probability $\tilde{P}_j$ for product j is defined as the probability of $y_j$ conditional on $y_{-j}$, i.e., the observed basket y without product j:

(8) $\tilde{P}_j = \dfrac{P(y \mid y_{-j})}{P(y \mid y_{-j}) + P(\tilde{y} \mid y_{-j})}$

Basket $\tilde{y}$ corresponds to the observed basket y except for product j, whose purchase indicator is flipped, i.e., $\tilde{y}_j = 1 - y_j$.

MPL estimation is feasible, because the normalization constant drops out in expression (8). Moreover, it is straightforward as the pseudo-likelihood function has only one local maximum. For the MVL model the pseudo-probability $\tilde{P}_j$ for product j in basket y is given by (Besag 1972, 1974):

(9) $\tilde{P}_j = \dfrac{\exp\left(y_j \left(\alpha_j + x^T \beta_{\cdot j} + y^T V_{\cdot j}\right)\right)}{1 + \exp\left(\alpha_j + x^T \beta_{\cdot j} + y^T V_{\cdot j}\right)}$

The log pseudo-likelihood LPL of basket y is obtained by summing the logs of pseudo-probabilities across all products

(10) $LPL = \sum_{j=1}^{J} \log(\tilde{P}_j)$
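Expressions (9) and (10) translate into a few lines of code. This is a minimal sketch under the assumptions that coefficients are stored as nested lists (beta[m][j], V[i][j]) and that the own-product term is excluded from $y^T V_{\cdot j}$; the function names and the toy data are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_pseudo_likelihood(y, x, alpha, beta, V):
    """Expression (10): sum over products of the log of expression (9)."""
    J = len(y)
    lpl = 0.0
    for j in range(J):
        z = alpha[j]
        z += sum(x[m] * beta[m][j] for m in range(len(x)))
        z += sum(y[i] * V[i][j] for i in range(J) if i != j)
        # log of (9): y_j * z - log(1 + exp(z))
        lpl += y[j] * z - softplus(z)
    return lpl


# Toy check: with all coefficients at zero every pseudo-probability is 0.5.
zero_V = [[0.0] * 3 for _ in range(3)]
lpl_zero = log_pseudo_likelihood(
    [1, 0, 1], [0.0], [0.0, 0.0, 0.0], [[0.0, 0.0, 0.0]], zero_V
)

# Hypothetical coefficients for a basket containing products 0 and 2.
alpha = [-0.5, 0.2, 0.0]
beta = [[0.3, -0.1, 0.0]]
V = [[0.0, 0.4, -0.2], [0.4, 0.0, 0.6], [-0.2, 0.6, 0.0]]
lpl_toy = log_pseudo_likelihood([1, 0, 1], [1.0], alpha, beta, V)
```

Summing this quantity over all baskets in the estimation sample gives the objective that MPL estimation maximizes; no enumeration of baskets is needed.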

C Estimation of the FM-MVL Model

We assign households to mixture components (i.e., segments) by the Gibbs sampling approach of Shi et al. (2005). As part of the estimation process, we replace the intractable log likelihood value of a basket by its log pseudo-likelihood value, as in Dippold and Hruschka (2013a). In each iteration, one MVL model specific to the households currently assigned to a segment is estimated by MPL. Because the FM-MVL model may be subject to local optima, we start from 100 initial random allocations of households to segments. We choose the solution leading to the best log likelihood value for the estimation sample, determined by the Gibbs sampling procedure explained in section 2.3.
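One allocation step of such a sampler might look as follows. This is only a sketch under a stated assumption: a household is drawn into segment s with probability proportional to $\pi_s \exp(LPL_s)$, where $LPL_s$ is the household's log pseudo-likelihood under the MVL model of segment s. The exact sampler of Shi et al. (2005) is not reproduced here, and all names and numbers are hypothetical.

```python
import math
import random


def segment_probabilities(lpl_by_segment, pi):
    """Normalize pi_s * exp(LPL_s) over segments using the log-sum-exp trick."""
    logs = [math.log(p) + lpl for p, lpl in zip(pi, lpl_by_segment)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def draw_segment(lpl_by_segment, pi, rng):
    """Draw one segment index according to the allocation probabilities."""
    probs = segment_probabilities(lpl_by_segment, pi)
    u = rng.random()
    cumulative = 0.0
    for s, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return s
    return len(probs) - 1  # guard against rounding in the cumulative sum


rng = random.Random(42)
# A household whose baskets fit segment 0 better by 2 log units (hypothetical).
probs = segment_probabilities([-1000.0, -1002.0], [0.5, 0.5])
draws = [draw_segment([-1000.0, -1002.0], [0.5, 0.5], rng) for _ in range(1000)]
```

Working on the log scale matters here because log pseudo-likelihood values of whole purchase histories are large negative numbers whose exponentials would underflow.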

D Estimation of the RBM and CRBM

We estimate the RBM and the CRBM by the contrastive divergence (CD) algorithm of Hinton (2002), which approximates the log likelihood. For the CRBM we extend the CD algorithm by adding gradients for the coefficients in β and $\delta_k$. Because of the existence of local optima we start the CD algorithm 100 times with random initial coefficient values. Just as for the FM-MVL model, we choose the solution of an RBM or CRBM attaining the best log likelihood value for the estimation data using the Gibbs sampling procedure explained in section 2.3.
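For orientation, a single contrastive divergence step with one Gibbs cycle (CD-1) for a plain RBM without covariates can be sketched as below. This is the textbook form of the update, not the estimation code of the paper: the covariate gradients of the CRBM are omitted, and the function names, learning rate and toy baskets are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_bernoulli(p, rng):
    return 1 if rng.random() < p else 0


def cd1_update(y, alpha, gamma, W, lr, rng):
    """One CD-1 step for basket y; updates alpha, gamma, W in place.

    Returns the sampled reconstruction of the basket.
    """
    J, K = len(alpha), len(gamma)
    # Positive phase: hidden probabilities given the observed basket.
    ph0 = [sigmoid(gamma[k] + sum(W[j][k] * y[j] for j in range(J))) for k in range(K)]
    h0 = [sample_bernoulli(p, rng) for p in ph0]
    # Negative phase: reconstruct the basket, then recompute hidden probabilities.
    py1 = [sigmoid(alpha[j] + sum(W[j][k] * h0[k] for k in range(K))) for j in range(J)]
    y1 = [sample_bernoulli(p, rng) for p in py1]
    ph1 = [sigmoid(gamma[k] + sum(W[j][k] * y1[j] for j in range(J))) for k in range(K)]
    # Approximate gradient: data statistics minus reconstruction statistics.
    for j in range(J):
        alpha[j] += lr * (y[j] - y1[j])
        for k in range(K):
            W[j][k] += lr * (y[j] * ph0[k] - y1[j] * ph1[k])
    for k in range(K):
        gamma[k] += lr * (ph0[k] - ph1[k])
    return y1


rng = random.Random(0)
alpha = [0.0, 0.0, 0.0]
gamma = [0.0, 0.0]
W = [[0.0, 0.0] for _ in range(3)]
baskets = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 0]]  # hypothetical data
for _ in range(50):
    for basket in baskets:
        cd1_update(basket, alpha, gamma, W, 0.1, rng)
```

Because the result depends on the random starting values and on the sampled states, repeating the procedure from many random starts and keeping the best solution, as described above, is a natural safeguard.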

E Simulation-Based Computation of the Likelihood Ratio Test

The likelihood ratio statistic LRT with $LL_1$ and $LL_0$ as log likelihood values for the alternative and the null model can be written as:

(11) $LRT = 2\,(LL_1 - LL_0)$

The simulation-based approach for the LRT (Lewis et al. 2011) consists of three steps:

  1. Generate S bootstrap samples from the null model.

  2. For each bootstrap sample fit both the null model and the alternative model, determine the likelihood values of these models by the Gibbs sampling procedure explained in section 2.3 and compute the LRT statistic.

  3. The null model is rejected if the proportion of the test statistics for the bootstrap samples which are greater than the test statistic for the estimation data is smaller than a prespecified significance level.
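The three steps can be illustrated with a deliberately simple pair of models. In the sketch below the null model is a Bernoulli distribution with one purchase probability shared by two groups and the alternative has one probability per group; these stand-in models, the function names and the data are assumptions chosen so that the example runs quickly. The bootstrap p-value is the proportion of simulated statistics at least as large as the observed one, and in the usual convention a small p-value speaks against the null model.

```python
import math
import random


def bernoulli_loglik(data, p):
    """Log likelihood of binary data under purchase probability p."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    k, n = sum(data), len(data)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def lrt_statistic(group_a, group_b):
    """Expression (11) for null (common p) versus alternative (p per group)."""
    pooled = group_a + group_b
    ll0 = bernoulli_loglik(pooled, sum(pooled) / len(pooled))
    ll1 = bernoulli_loglik(group_a, sum(group_a) / len(group_a))
    ll1 += bernoulli_loglik(group_b, sum(group_b) / len(group_b))
    return 2.0 * (ll1 - ll0)


def bootstrap_lrt(group_a, group_b, n_boot, rng):
    """Return the observed LRT and its simulation-based p-value."""
    observed = lrt_statistic(group_a, group_b)
    pooled = group_a + group_b
    p0 = sum(pooled) / len(pooled)  # fitted null model
    exceed = 0
    for _ in range(n_boot):
        # Step 1: simulate a sample from the fitted null model.
        sim_a = [1 if rng.random() < p0 else 0 for _ in group_a]
        sim_b = [1 if rng.random() < p0 else 0 for _ in group_b]
        # Step 2: refit both models on the simulated sample.
        if lrt_statistic(sim_a, sim_b) >= observed:
            exceed += 1
    # Step 3: proportion of simulated statistics reaching the observed one.
    return observed, exceed / n_boot


rng = random.Random(123)
group_a = [1] * 45 + [0] * 5
group_b = [1] * 5 + [0] * 45
observed, p_value = bootstrap_lrt(group_a, group_b, 200, rng)
```

For the models compared in the paper the log likelihood values in step 2 are themselves estimated by the Gibbs sampling procedure of section 2.3, which makes each bootstrap replication far more expensive than in this toy setting.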


References

Beentjes, S. V., and A. Khamseh. 2020. Higher-Order Interactions in Statistical Physics and Machine Learning: A Non-parametric Solution to the Inverse Problem. Working Paper. arXiv:2006.06010. doi:10.1103/PhysRevE.102.053314.

Bel, K., D. Fok, and R. Paap. 2018. “Parameter Estimation in Multivariate Logit Models with Many Binary Choices.” Econometric Reviews 37: 534–50.

Bengio, Y. 2009. “Learning Deep Architectures for AI.” Foundations and Trends in Machine Learning 2: 1–27.

Besag, J. 1972. “Nearest-Neighbour Systems and the Auto-Logistic Model for Binary Data.” Journal of the Royal Statistical Society: Series B 34: 75–83.

Besag, J. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society: Series B 36: 192–236.

Betancourt, R., and D. Gautschi. 1990. “Demand Complementarities, Household Production, and Retail Assortments.” Marketing Science 9: 146–61.

Boztuğ, Y., and L. Hildebrandt. 2008a. “Modeling Joint Purchases with a Multivariate MNL Approach.” Schmalenbach Business Review 60: 400–22. doi:10.1007/BF03396777.

Boztuğ, Y., and T. Reutterer. 2008b. “A Combined Approach for Segment-Specific Market Basket Analysis.” European Journal of Operational Research 187: 294–312. doi:10.1016/j.ejor.2007.03.001.

Boztuğ, Y., and N. Silberhorn. 2006. “Modellierungsansätze in der Warenkorbanalyse im Überblick.” Journal für Betriebswirtschaft 56: 105–28. doi:10.1007/s11301-006-0008-5.

Chib, S., P. B. Seetharaman, and A. Strijnev. 2002. “Analysis of Multi-Category Purchase Incidence Decisions Using IRI Market Basket Data.” In Econometric Models in Marketing, edited by P. H. Franses and A. L. Montgomery, pp. 57–92. Amsterdam: JAI. doi:10.1016/S0731-9053(02)16004-X.

Cox, D. 1972. “The Analysis of Multivariate Binary Data.” Journal of the Royal Statistical Society: Series C 21: 113–20.

Dippold, K., and H. Hruschka. 2013a. “A Model of Heterogeneous Multicategory Choice for Market Basket Analysis.” Review of Marketing Science 11: 1–31.

Dippold, K., and H. Hruschka. 2013b. “Variable Selection for Market Basket Analysis.” Computational Statistics 28: 519–29.

Duvvuri, S. D., A. Ansari, and S. Gupta. 2007. “Consumers’ Price Sensitivities across Complementary Categories.” Management Science 53: 1933–45.

Elliot, G. C. 1988. “Interpreting Higher Order Interactions in Log-Linear Analysis.” Psychological Bulletin 103: 121–30. doi:10.1037/0033-2909.103.1.121.

Hinton, G. E. 2002. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14: 1771–800.

Hinton, G. E., and R. R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313: 504–7.

Hruschka, H. 2014. “Analyzing Market Baskets by Restricted Boltzmann Machines.” OR Spectrum 36: 209–28.

Hruschka, H. 2017a. “Analyzing the Dependences of Multicategory Purchases on Interactions of Marketing Variables.” Journal of Business Economics 87: 295–313.

Hruschka, H. 2017b. “Multi-Category Purchase Incidences with Marketing Cross Effects.” Review of Managerial Science 11: 443–69.

Hruschka, H. 2019. “Comparing Unsupervised Probabilistic Machine Learning Methods for Market Basket Analysis.” Review of Managerial Science.

Hyvärinen, A. 2006. “Consistency of Pseudolikelihood Estimation of Fully Visible Boltzmann Machines.” Neural Computation 18: 2283–92.

Lewis, F., B. Adam, and L. Gilbert. 2011. “A Unified Approach to Model Selection Using the Likelihood Ratio Test.” Methods in Ecology and Evolution 2: 155–62.

Le Roux, N., and Y. Bengio. 2007. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Technical Report 1294, Département d’informatique et recherche opérationnelle, Université de Montréal.

Li, X., F. Zhao, and Y. Guo. 2015. “Conditional Restricted Boltzmann Machines for Multi-Label Learning with Incomplete Labels.” In Proceedings of the 18th AISTATS Conference. San Diego, CA.

Manchanda, P., A. Ansari, and S. Gupta. 1999. “The “Shopping Basket”: A Model for Multi-Category Purchase Incidence Decisions.” Marketing Science 18: 95–114.

Mnih, V., H. LaRochelle, and G. E. Hinton. 2011. “Conditional Restricted Boltzmann Machines for Structured Output Prediction.” In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain.

Montúfar, G. 2018. “Restricted Boltzmann Machines: Introduction and Review.” In Information Geometry and Its Applications: On the Occasion of Shun-Ichi Amari’s 80th Birthday, edited by N. Ay, P. Gibilisco, and F. Matúš, pp. 75–115. Basel, Switzerland: Springer Nature. doi:10.1007/978-3-319-97798-0_4.

Ni, J., S. A. Neslin, and B. Sun. 2012. “Database Submission: The ISMS Durable Goods Data Sets.” Marketing Science 31: 1008–13.

Russell, G. J., and A. Petersen. 2000. “Analysis of Cross Category Dependence in Market Basket Selection.” Journal of Retailing 76: 369–92.

Seetharaman, P. B., S. Chib, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev. 2005. “Models of Multi-Category Choice Behavior.” Marketing Letters 16: 239–54.

Shi, J. Q., R. Murray-Smith, and D. M. Titterington. 2005. “Hierarchical Gaussian Process Mixtures for Regression.” Statistics and Computing 15: 31–41.

Smolensky, P. 1986. “Information Processing in Dynamical Systems: Foundations of Harmony Theory.” In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, edited by D. E. Rumelhart and J. L. McClelland, pp. 194–281. Cambridge, MA: MIT Press.

Williams, D. A. 1970. “Discrimination between Regression Models to Determine the Pattern of Enzyme Synthesis in Synchronous Cell Cultures.” Biometrics 26: 23–32.

Xia, F., R. Chatterjee, and J. H. May. 2019. “Using Conditional Restricted Boltzmann Machines to Model Complex Consumer Shopping Patterns.” Marketing Science 38: 711–27.

Received: 2020-10-08
Accepted: 2020-11-07
Published Online: 2020-11-23

© 2020 Walter de Gruyter GmbH, Berlin/Boston
