Review of Marketing Science

A Model of Heterogeneous Multicategory Choice for Market Basket Analysis

Katrin Dippold / Harald Hruschka
Published Online: 2013-09-28 | DOI: https://doi.org/10.1515/roms-2012-0001


Based on market basket data, using multicategory purchase incidence models, we analyze demand interdependencies between product categories. We propose a finite mixture multivariate logit model to derive segment-specific intercategory effects of market basket purchase. Under the assumption that only a fraction of intercategory effects are significant, we exclude irrelevant effects by variable selection. This leads to a detailed description of consumers’ shopping behavior that varies over segments not only with respect to (w.r.t.) parameters’ values but also w.r.t. included interaction effects. As the high number of product categories in the model prohibits exact maximum likelihood estimation, we adopt pseudo-likelihood estimation. We apply our model to a data set with 31 product categories and 1,794 households purchasing 17,280 baskets in one store. The best fitting model is determined by predictive model selection. We find that a homogeneous model would overestimate the intensity of interaction between product categories.

Keywords: finite mixture model; market basket analysis; variable selection; multivariate logistic regression; pseudo-likelihood estimation


  • Abramson, C., R. L. Andrews, I. S. Currim, and M. Jones. 2000. “Parameter Bias from Unobserved Effects in the Multinomial Logit Model of Consumer Choice.” Journal of Marketing Research 37(4):410–26.

  • Ainslie, A., and P. E. Rossi. 1998. “Similarities in Choice Behavior across Product Categories.” Marketing Science 17(2):91–106.

  • Akaike, H. 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19(6):716–23.

  • Allenby, G. M., N. Arora, and J. Ginter. 1998. “On the Heterogeneity of Demand.” Journal of Marketing Research 35(3):384–9.

  • Allenby, G. M., and P. E. Rossi. 1999. “Marketing Models of Consumer Heterogeneity.” Journal of Econometrics 89(1–2):57–78.

  • Andrews, R. L., A. Ainslie, and I. S. Currim. 2002. “An Empirical Comparison of Logit Choice Models with Discrete Versus Continuous Representations of Heterogeneity.” Journal of Marketing Research 39(4):479–87.

  • Andrews, R. L., and I. S. Currim. 2002. “Identifying Segments with Identical Choice Behaviors across Product Categories: An Intercategory Logit Mixture Model.” International Journal of Research in Marketing 19(1):65–79.

  • Andrews, R. L., and I. S. Currim. 2003. “A Comparison of Segment Retention Criteria for Finite Mixture Logit Models.” Journal of Marketing Research 40(2):235–43.

  • Besag, J. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society Series B 36(2):192–236.

  • Besag, J. 1975. “Statistical Analysis of Non-Lattice Data.” The Statistician 24(3):179–95.

  • Besag, J. 1977. “Efficiency of Pseudolikelihood Estimation for Simple Gaussian Fields.” Biometrika 64(3):616–18.

  • Betancourt, R., and D. Gautschi. 1990. “Demand Complementarities, Household Production, and Retail Assortments.” Marketing Science 9(2):146–61.

  • Boatwright, P., S. Dhar, and P. E. Rossi. 2004. “The Role of Retail Competition, Demographics and Account Retail Strategy as Drivers of Promotional Sensitivity.” Quantitative Marketing and Economics 2(2):169–90.

  • Boztuğ, Y., and L. Hildebrandt. 2008. “Modeling Joint Purchases with a Multivariate MNL Approach.” Schmalenbach Business Review 60(October):400–22.

  • Boztuğ, Y., and T. Reutterer. 2008. “A Combined Approach for Segment-Specific Market Basket Analysis.” European Journal of Operational Research 187(1):294–312.

  • Boztuğ, Y., and N. Silberhorn. 2006. “Modellierungsansätze in Der Warenkorbanalyse Im Überblick.” Journal Für Betriebswirtschaft 56(2):105–28.

  • Bronnenberg, B. J., M. W. Kruger, and C. F. Mela. 2008. “The IRI Marketing Data Set.” Marketing Science 27(4):745–8.

  • Carlin, B. P., and T. A. Louis. 2000. Bayes and Empirical Bayes Methods for Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.

  • Carreira-Perpiñán, M. A., and G. E. Hinton. 2005. “On Contrastive Divergence Learning.” In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, edited by R. G. Cowell and Z. Ghahramani. Barbados: Savannah Hotel.

  • Celeux, G. 1998. “Bayesian Inference for Mixtures: The Label Switching Problem.” In COMPSTAT 98, edited by R. Payne and P. Green, 227–32. Heidelberg: Physica.

  • Chiang, J. 1991. “A Simultaneous Approach to the Whether, What and How Much to Buy Question.” Marketing Science 10(4):297–315.

  • Chib, S., P. B. Seetharaman, and A. Strijnev. 2002. “Analysis of Multi-Category Purchase Incidence Using IRI Market Basket Data.” In Econometric Models in Marketing, edited by P. H. Franses and A. L. Montgomery, vol. 16, 57–92. Amsterdam: JAI.

  • Cox, D. R. 1972. “The Analysis of Multivariate Binary Data.” Journal of the Royal Statistical Society Series C (Applied Statistics) 21(2):113–20.

  • Cressie, N. C. 1993. Statistics for Spatial Data. Revised Edition. New York: John Wiley & Sons.

  • Dippold, K., and H. Hruschka. 2013. “Variable Selection for Market Basket Analysis.” Computational Statistics and Data Analysis 28(2):519–39.

  • Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195(2):216–22.

  • Duvvuri, S. D., A. Ansari, and S. Gupta. 2007. “Consumers’ Price Sensitivities across Complementary Categories.” Management Science 53(12):1933–45.

  • Efron, B., and R. J. Tibshirani. 1998. An Introduction to the Bootstrap: Monographs on Statistics and Applied Probability, vol. 57. Boca Raton, FL: Chapman & Hall/CRC.

  • Frühwirth-Schnatter, S. 2006. Finite Mixture and Markov Switching Models. New York: Springer.

  • Geweke, J. 2005. Contemporary Bayesian Econometrics and Statistics. Hoboken, NJ: John Wiley & Sons.

  • Guadagni, P. M., and J. D. C. Little. 1983. “A Logit Model of Brand Choice Calibrated on Scanner Data.” Marketing Science 2(3):203–38.

  • Gupta, S. 1988. “Impact of Sales Promotions on When, What, and How Much to Buy.” Journal of Marketing Research 25(4):342–55.

  • Gupta, S., and P. K. Chintagunta. 1994. “On Using Demographic Variables to Determine Segment Membership in Logit Mixture Models.” Journal of Marketing Research 31(1):28–36.

  • Hansen, K. T., V. Singh, and P. K. Chintagunta. 2006. “Understanding Store-Brand Purchase Behavior across Categories.” Marketing Science 25(1):75–90.

  • Hoch, S. J., B.-D. Kim, A. L. Montgomery, and P. E. Rossi. 1995. “Determinants of Store-Level Price Elasticity.” Journal of Marketing Research 32(1):17–29.

  • Horowitz, A. M. 1991. “A Generalized Guided Monte Carlo Algorithm.” Physics Letters B 268(2):247–52.

  • Hruschka, H. 1991. “Bestimmung Der Kaufverbundenheit Mit Hilfe Eines Probabilistischen Meßmodells.” Zfbf Zeitschrift Für Betriebswirtschaftliche Forschung 43:418–34.

  • Hruschka, H. 2013. “Comparing Small and Large Scale Models of Multicategory Buying Behavior.” International Journal of Forecasting, 423–34.

  • Hruschka, H., M. Lukanowicz, and C. Buchta. 1999. “Cross-Category Sales Promotion Effects.” Journal of Retailing and Consumer Services 6(2):99–105.

  • Huang, F., and Y. Ogata. 2002. “Generalized Pseudo-Likelihood Estimates for Markov Random Fields on Lattice.” Annals of the Institute of Statistical Mathematics 54(1):1–18.

  • Hyvärinen, A. 2006. “Consistency of Pseudolikelihood Estimation of Fully Visible Boltzmann Machines.” Neural Computation 18(10):2283–92.

  • Kim, B.-D., K. Srinivasan, and R. T. Wilcox. 1999. “Identifying Price Sensitive Consumers: The Relative Merits of Demographic vs. Purchase Pattern Information.” Journal of Retailing 75(2):173–93.

  • Kruger, M. W., and D. Pagni. 2008. IRI Academic Data Set Description, Version 1.31. Chicago: Information Resources Incorporated.

  • Laud, P. W., and J. G. Ibrahim. 1995. “Predictive Model Selection.” Journal of the Royal Statistical Society Series B 57(1):247–62.

  • Magnussen, S., and R. Reeves. 2007. “Sample-Based Maximum Likelihood Estimation of the Autologistic Model.” Journal of Applied Statistics 34(5):547–61.

  • Manchanda, P., A. Ansari, and S. Gupta. 1999. “The ‘Shopping Basket’: A Model for Multi-Category Purchase Incidence Decisions.” Marketing Science 18(2):95–114.

  • McFadden, D. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics, edited by P. Zarembka, 105–42. New York: Academic Press.

  • McLachlan, G. J., and D. Peel. 2000. Finite Mixture Models. New York: John Wiley & Sons.

  • Mild, A., and T. Reutterer. 2003. “An Improved Collaborative Filtering Approach for Predicting Cross-Category Purchases Based on Binary Market Basket Data.” Journal of Retailing and Consumer Services 10(3):123–33.

  • Moon, S., and G. J. Russell. 2004. A Spatial Choice Model for Product Recommendations. Working Paper, Tippie School of Business, University of Iowa.

  • Neal, R. M. 1996. Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118. New York: Springer.

  • Niraj, R., V. Padmanabhan, and P. B. Seetharaman. 2008. “Research Note: A Cross-Category Model of Households’ Incidence and Quantity Decisions.” Marketing Science 27(2):225–35.

  • Nylund, K. L., T. Asparouhov, and B. O. Muthen. 2007. “Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study.” Structural Equation Modeling 14(4):535–69.

  • Osindero, S., M. Welling, and G. E. Hinton. 2006. “Topographic Product Models Applied to Natural Scene Statistics.” Neural Computation 18(2):381–414.

  • Rasmussen, C. E. 1996. “Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression.” PhD Thesis, University of Toronto.

  • Rossi, P. E., R. E. McCulloch, and G. E. Allenby. 1996. “The Value of Purchase History Data in Target Marketing.” Marketing Science 15(4):321–40.

  • Russell, G. J., and A. Petersen. 2000. “Analysis of Cross Category Dependence in Market Basket Selection.” Journal of Retailing 76(3):319–32.

  • Russell, G. J., S. Ratneshwar, A. D. Shocker, D. Bell, A. Bodapati, A. Degeratu, L. Hildebrandt, N. Kim, S. Ramaswami, and V. H. Shankar. 1999. “Multi-Category Decision-Making: Review and Synthesis.” Marketing Letters 10(3):319–32.

  • Schwarz, G. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6(2):461–4.

  • Seetharaman, P. B., A. Ainslie, and P. K. Chintagunta. 1999. “Investigating Household State Dependence Effects across Categories.” Journal of Marketing Research 36(4):488–500.

  • Seetharaman, P. B., C. Siddhartha, A. Ainslie, P. Boatwright, T. Chan, S. Gupta, N. Mehta, V. Rao, and A. Strijnev. 2005. “Models of Multi-Category Choice Behavior.” Marketing Letters 16(3–4):239–54.

  • Sherman, M., T. V. Apanasovich, and R. J. Carroll. 2006. “On Estimation in Binary Autologistic Spatial Models.” Journal of Statistical Computation and Simulation 76(2):167–79.

  • Shi, J. Q., R. Murray-Smith, and D. M. Titterington. 2005. “Hierarchical Gaussian Process Mixtures for Regression.” Statistics and Computing 15(1):31–41.

  • Singh, V. P., K. T. Hansen, and S. Gupta. 2005. “Modeling Preferences for Common Attributes in Multi-Category Brand Choice.” Journal of Marketing Research 42(2):195–209.

  • Song, I., and P. K. Chintagunta. 2007. “A Discrete-Continuous Model for Multicategory Purchase Behavior of Households.” Journal of Marketing Research 44(4):595–612.

  • Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. 2002. “Bayesian Measures of Model Complexity and Fit.” Journal of the Royal Statistical Society Series B 64(4):583–639.

  • Tüchler, R. 2008. “Bayesian Variable Selection for Logistic Models Using Auxiliary Mixture Sampling.” Journal of Computational and Graphical Statistics 17(1):76–94.

  • Varki, S., and P. K. Chintagunta. 2004. “The Augmented Latent Class Model: Incorporating Additional Heterogeneity in the Latent Class Model for Panel Data.” Journal of Marketing Research 41(2):226–33.

  • Wang, H. D., M. U. Kalwani, and T. Akcura. 2007. “A Bayesian Multivariate Poisson Regression Model of Cross-Category Store Brand Purchasing Behavior.” Journal of Retailing and Consumer Services 14(6):369–82.

  • Wang, L., J. Liu, and S. Z. Li. 2000. “MRF Parameter Estimation by MCMC Method.” Pattern Recognition 33(11):1919–25.

  • Wedel, M., and W. A. Kamakura. 1998. Market Segmentation: Conceptual and Methodological Foundations. International Series in Quantitative Marketing. Boston, MA: Kluwer Academic Publishers.

  • Wedel, M., W. A. Kamakura, N. Arora, A. Bemmaor, J. Chiang, T. Elrod, R. Johnson, P. Lenk, S. Neslin, and C. S. Poulsen. 1999. “Discrete and Continuous Representations of Unobserved Heterogeneity in Choice Modeling.” Marketing Letters 10(3):219–32.

Examples of this approach, which address a different multicategory problem by focusing on the covariance of coefficients across categories, are the probit models of Ainslie and Rossi (1998) and Seetharaman, Ainslie, and Chintagunta (1999) and the logit models of Hansen, Singh, and Chintagunta (2006) and Singh, Hansen, and Gupta (2005). In the latter two studies, a factor analytic structure is imposed on the covariance matrix of coefficients.

There has been a long discussion on the performance of finite vs. continuous mixture models (e.g. Wedel et al. 1999; for a short summary, see Varki and Chintagunta 2004). Allenby, Arora, and Ginter (1998) and Allenby and Rossi (1999) argue that finite mixture (FM) models do not sufficiently represent consumer heterogeneity, especially when the number of segments is small and complete homogeneity within a segment is an unrealistic assumption. Andrews, Ainslie, and Currim (2002), by contrast, do not find any performance superiority of continuous over FM models: even for a very limited number of segments (one to three), the continuous and the discrete model recover parameters and forecast holdout data equally well. Additionally, Wedel and Kamakura (1998) and Wedel et al. (1999) stress the consistency of the FM model with the way management thinks about consumers in segments.

Segment is the marketing interpretation of a component in an FM model. Therefore, the terms segment and component are used interchangeably in the text.

In contrast to probit models, there is no biasing effect of joint non-purchase, which would be the most frequently occurring event. We also note that we follow the definition of cross-category effects by Hruschka, Lukanowicz, and Buchta (1999). In contrast to Russell and Petersen (2000), cross-category effects do not depend on a household’s typical basket size. This modeling decision is justified because (1) the inclusion of basket size resulted in only a weak improvement of the LL value for holdout data in the Russell and Petersen (RP) model and (2) our model already accounts for interaction effect variability by estimating different effects for different segments.

We thank one anonymous reviewer who suggested discussing this issue and drew our attention to the latent variable interpretation of co-incidence. We also thank the editor for suggesting relevant references.

With 31 categories, Z has 2^31 elements, that is, all possible market baskets. Huang and Ogata (2002) observe exponents between 9 and 15 to be the limit of computation.
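
As a rough illustration of why the normalization constant is the bottleneck, the following Python sketch contrasts an exact log-likelihood, which enumerates every possible basket, with a pseudo-log-likelihood built from category-wise conditional logits. It assumes a multivariate logit specification without marketing-mix covariates or segments; the names `a` (category constants), `V` (symmetric interaction matrix with zero diagonal), and all function names are illustrative assumptions, not the authors' notation or code.

```python
import itertools
import numpy as np

def log_potential(y, a, V):
    """Unnormalized log-probability of basket y: a'y + sum_{j<k} V_jk y_j y_k."""
    return y @ a + 0.5 * y @ V @ y  # V is symmetric with a zero diagonal

def exact_log_likelihood(Y, a, V):
    """Exact log-likelihood; the constant Z requires all 2^J possible baskets."""
    J = len(a)
    all_baskets = np.array(list(itertools.product([0, 1], repeat=J)), dtype=float)
    log_z = np.logaddexp.reduce([log_potential(b, a, V) for b in all_baskets])
    return sum(log_potential(y, a, V) - log_z for y in Y)

def pseudo_log_likelihood(Y, a, V):
    """Sum over baskets and categories of log P(y_j | all other categories)."""
    eta = a + Y @ V  # conditional logit of each category given the rest
    # log P(y_j | y_-j) = y_j * eta_j - log(1 + exp(eta_j))
    return float(np.sum(Y * eta - np.logaddexp(0.0, eta)))
```

The exact version loops over 2^J baskets and is only usable for small J, whereas the pseudo-log-likelihood costs a single matrix product per evaluation. When all interaction effects are zero, the two functions return the same value.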

The general disadvantage of incorrect standard errors can easily be remedied, as correct standard errors can be computed by bootstrapping (e.g. Efron and Tibshirani 1998).
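
A minimal sketch of nonparametric bootstrap standard errors, assuming the data are rows of a NumPy array and that `estimator` is any function mapping a resampled data set to a parameter vector. Both the function name `bootstrap_standard_errors` and the row-wise resampling scheme are assumptions for illustration; in a household panel one would presumably resample households rather than single baskets, which the source does not specify.

```python
import numpy as np

def bootstrap_standard_errors(data, estimator, n_boot=500, seed=0):
    """Standard deviation of an estimator across bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # draw n row indices with replacement
        estimates.append(estimator(data[idx]))  # re-estimate on the resample
    return np.std(np.asarray(estimates), axis=0, ddof=1)
```

For example, passing `lambda d: d.mean(axis=0)` as the estimator yields standard errors close to the familiar sample standard deviation divided by the square root of n.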

This two-step approach is conventionally used in FM models for multicategory choice (e.g. Song and Chintagunta 2007).

See, for example, Andrews and Currim (2002) for a complete tabulation including formulas for calculations.

We argue that the homogeneous model smooths interaction effects. The lower number of interaction effects included in the heterogeneous model might also contribute to such results. Boztuğ and Reutterer (2008) formulate a similar hypothesis, though they motivate it differently. Chib, Seetharaman, and Strijnev (2002) present the opposite effect.

Our model specification does not include RP’s category-specific household variables, time since last category purchase (TIME) and loyalty (LOYAL). As this model is estimated only on purchases within one store and neglects purchases in other stores, we do not have complete information on a consumer’s shopping history. Therefore, the values of TIME and LOYAL would not be meaningful. Besides, we already account for heterogeneity with the FM model and do not need auxiliary measures of consumer diversity.

Test runs showed that the model parameters and especially the household assignment to segments stabilize quickly.

For reasons of comparability, PLL is also calculated for the independence model whose parameters are estimated by ML.
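
To make the comparison concrete, here is a small sketch assuming an independence model with only category constants (no marketing-mix covariates, an assumption made for brevity). In that case the ML estimate of each constant is the logit of the category's purchase frequency, and because no category depends on the others, the conditional probabilities equal the marginal ones, so PLL and LL coincide. The function names are hypothetical.

```python
import numpy as np

def fit_independence_model(Y, eps=1e-9):
    """ML category constants: logit of each category's purchase frequency."""
    p = np.clip(Y.mean(axis=0), eps, 1 - eps)  # guard against 0 or 1 frequencies
    return np.log(p / (1 - p))

def independence_log_likelihood(Y, a):
    """Log-likelihood under independence; identical to the PLL in this case."""
    return float(np.sum(Y * a - np.logaddexp(0.0, a)))
```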

We only consider interaction effects larger than 0.001 in absolute size.

We thank one anonymous reviewer who recommended comparing the results of our model to those obtained by a correlational analysis.

Household income in thousand US$.

HS: high school; C: college; TS: technical school; PG: postgraduate work; SC: some college, which means that the person left college without a degree.

For model stability, variable selection is not applied to category constants or marketing-mix coefficients.

Citation Information: Review of Marketing Science, Volume 11, Issue 1, Pages 1–31, ISSN (Online) 1546-5616, ISSN (Print) 2194-5985, DOI: https://doi.org/10.1515/roms-2012-0001.

©2013 by Walter de Gruyter Berlin / Boston.
