In this paper, we provide theoretical predictions on the long-run behavior of an adaptive decision maker who has access to foregone payoff information. In the model, the decision maker assigns a subjective payoff assessment to each action based on his past experience and chooses the action with the highest assessment. After receiving a payoff, he updates his assessments adaptively, using not only the objective payoff information but also the foregone payoff information, which may be distorted. The distortion may arise from a “the grass is always greener on the other side” effect, from pessimism/optimism, or from envy/gloating; it depends on how the decision maker views the source of the information. We first provide conditions under which the assessment of each action converges, and show that the limit assessment is expressed as an average of the expected objective payoff and the expected distorted payoff of the action. Then, we show that the decision maker chooses the optimal action most frequently in the long run if the expected distorted payoff of that action is greater than those of the other actions. We also provide conditions under which this model coincides with the experience-weighted attraction learning, stochastic fictitious play, and quantal response equilibrium models, so that it provides theoretical predictions for those models in decision problems.
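The update rule described above can be sketched in a few lines. Everything in this sketch (two actions, Gaussian payoff noise, an additive optimistic distortion, and the weighting parameter `lam`) is an illustrative assumption rather than the paper's exact specification:

```python
import random

def distort(x, bias=0.5):
    """Hypothetical optimistic distortion: foregone payoffs look
    better than they actually are ("the grass is always greener")."""
    return x + bias

def simulate(n_periods=20000, lam=0.01, seed=0):
    rng = random.Random(seed)
    means = {"A": 1.0, "B": 0.8}          # true expected payoffs (assumed)
    Q = {a: 0.0 for a in means}           # subjective assessments
    for _ in range(n_periods):
        chosen = max(Q, key=Q.get)        # pick the highest assessment
        for a in means:
            realized = means[a] + rng.gauss(0, 0.1)
            # Chosen action: objective payoff; others: distorted foregone payoff.
            obs = realized if a == chosen else distort(realized)
            Q[a] = (1 - lam) * Q[a] + lam * obs   # weighted-average update
    return Q

Q = simulate()
# Each assessment settles between the action's expected objective payoff
# and its expected distorted payoff, as in the convergence result above.
```

In this sketch the two assessments hover around a common value, since the distortion makes the unchosen action look attractive and induces persistent switching; the long-run assessment of each action is an average of its objective and distorted expected payoffs.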
Since we know that if , in the following proofs, we compare the limit assessments of two actions.
Proof of Lemma 4
For each action, the following equation holds:
Hence if holds, then
Proof of Lemma 5 (i)
Here, we prove by contradiction. First, we consider the case in which the condition holds. We now assume that . Since , we have
Note that since we have
Now, since we have
And by the hypothesis that , we have
However, the inequalities  and  contradict each other.
Next, we consider the case in which the condition holds. Since the limit assessment of each action takes a value between the expected objective payoff and expected distorted payoff of the action, we should have that .
Last, we consider the case in which the condition holds. Again, we assume that . Since we have
However, this contradicts that and .
Proof of Lemma 5 (ii)
We assume that one of the following inequalities,
holds strictly. We also assume that . Now consider some , such that for any . Then
We show here that the trajectories of the ODEs starting from the points with never enter the area of Q with , so that at the unique rest point , which is globally asymptotically stable, we should have that .
First, consider the initial point , such that
Note that (i) and , (ii) if , then , and (iii) if and , then
Therefore, the trajectories starting from do not enter the area with .
Next, we assume that
Then and , and it is obvious that the trajectories of the ODEs do not enter the area with .
Finally, we assume that
Then , and
And again, the trajectories of the ODEs do not enter the area with .
In sum, the trajectories that start from the points on the line with never enter the area of Q with , and thus . We can apply this argument to the other cases where . □
Proof of Proposition 4
Assume that . By the property of choice rules, we have that and hence . Since we have
Since , we have . However, this condition contradicts the original hypothesis. □
Proof of Proposition 5
We show that if then and thus . Now we assume that . Then
Since , we have that and thus . □
Beggs, A. W. 2005. “On the Convergence of Reinforcement Learning.” Journal of Economic Theory 122:1–36.
Benaïm, M. 1999. “Dynamics of Stochastic Approximation Algorithms.” In Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics, vol. 1709, edited by J. Azéma, M. Émery, M. Ledoux, and M. Yor, 1–68. Berlin: Springer.
Benaïm, M., and M. W. Hirsch. 1999. “Mixed Equilibria and Dynamical Systems Arising from Fictitious Play in Perturbed Games.” Games and Economic Behavior 29:36–72.
Borkar, V. S. 2008. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, UK: Cambridge University Press.
Brown, G. W. 1951. “Iterative Solution of Games by Fictitious Play.” In Activity Analysis of Production and Allocation, edited by T. C. Koopmans. New York: Wiley.
Camerer, C., and T. H. Ho. 1999. “Experience-Weighted Attraction Learning in Normal Form Games.” Econometrica 67:827–74.
Cominetti, R., E. Melo, and S. Sorin. 2010. “A Payoff-Based Learning Procedure and Its Application to Traffic Games.” Games and Economic Behavior 70:71–83.
Conley, T. G., and C. R. Udry. 2010. “Learning About a New Technology: Pineapple in Ghana.” American Economic Review 100:35–69.
Duffy, J., and N. Feltovich. 1999. “Does Observation of Others Affect Learning in Strategic Environments? An Experimental Study.” International Journal of Game Theory 28:131–52.
Erev, I., and A. E. Roth. 1998. “Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria.” American Economic Review 88:848–81.
Fudenberg, D., and D. M. Kreps. 1993. “Learning Mixed Equilibria.” Games and Economic Behavior 5:320–67.
Grosskopf, B., I. Erev, and E. Yechiam. 2006. “Foregone with the Wind: Indirect Payoff Information and Its Implications for Choice.” International Journal of Game Theory 34:285–302.
Grygolec, J., G. Coricelli, and A. Rustichini. 2012. “Positive Interaction of Social Comparison and Personal Responsibility for Outcomes.” Frontiers in Psychology 3:25.
Hall, P., and C. C. Heyde. 1980. Martingale Limit Theory and Its Application. New York: Academic Press.
Heller, D., and R. Sarin. 2001. “Adaptive Learning with Indirect Payoff Information.” Working Paper.
Hofbauer, J., and W. H. Sandholm. 2002. “On the Global Convergence of Stochastic Fictitious Play.” Econometrica 70:2265–94.
Hopkins, E. 2002. “Two Competing Models of How People Learn in Games.” Econometrica 70:2141–66.
Laslier, J.-F., R. Topol, and B. Walliser. 2001. “A Behavioral Learning Process in Games.” Games and Economic Behavior 37:340–66.
Leslie, D. S., and E. J. Collins. 2005. “Individual Q-Learning in Normal Form Games.” SIAM Journal on Control and Optimization 44:495–514.
McKelvey, R. D., and T. R. Palfrey. 1995. “Quantal Response Equilibria for Normal Form Games.” Games and Economic Behavior 10:6–38.
Roth, A. E., and I. Erev. 1995. “Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term.” Games and Economic Behavior 8:164–212.
Rustichini, A. 1999. “Optimal Properties of Stimulus-Response Learning Models.” Games and Economic Behavior 29:244–73.
Sarin, R., and F. Vahid. 1999. “Payoff Assessments without Probabilities: A Simple Dynamic Model of Choice.” Games and Economic Behavior 28:294–309.
Tsitsiklis, J. N. 1994. “Asynchronous Stochastic Approximation and Q-Learning.” Machine Learning 16:185–202.
Watkins, C. J. C. H., and P. Dayan. 1992. “Q-Learning.” Machine Learning 8:279–92.
Grygolec, Coricelli, and Rustichini (2012) investigate the effect of envy and gloating on the evaluations of unchosen actions. However, they do not provide a theoretical investigation of this case.
It is continuous by the dominated convergence theorem, and it is a function – not a correspondence – because the probability that the shock-affected assessments of two actions coincide is zero.
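As an illustration (a concrete assumption for this note, not a specification given in the text): if the additive shocks to the assessments are i.i.d. extreme-value distributed with scale \(\eta\), the induced choice function is the smooth logit rule familiar from stochastic fictitious play and quantal response models,

```latex
\Pr(\text{choose } i) \;=\; \frac{\exp(Q_i/\eta)}{\sum_{j}\exp(Q_j/\eta)},
```

which is continuous in the assessments \(Q_j\) and assigns each action a strictly positive probability, consistent with the observation above.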
By the result of Tsitsiklis (1994), we can relax the condition and allow the sequence of weighting parameters to be stochastic. In addition, we can allow the weighting parameters to differ across actions in each period. That is, the sequence of weighting parameters is , where is the weighting parameter of action i in period n.
For example, see Hall and Heyde (1980, 36).
It is worth noting that there exist mean-preserving distortion functions such that the distribution of becomes a mean-preserving spread (or contraction) of that of for each .
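One concrete (purely illustrative) family of such functions is a linear stretch of the payoff around its mean; writing \(\mu_i\) for the expected payoff of action \(i\) (a symbol introduced here for the example),

```latex
g(x) \;=\; \mu_i + \alpha\,(x - \mu_i),
\qquad
\mathbb{E}\!\left[g(\pi_i)\right] = \mu_i,
\quad
\operatorname{Var}\!\left(g(\pi_i)\right) = \alpha^2 \operatorname{Var}(\pi_i),
```

so that for \(\alpha > 1\) the distribution of \(g(\pi_i)\) is a mean-preserving spread of that of \(\pi_i\), while for \(0 < \alpha < 1\) it is a mean-preserving contraction.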
They use for the discount factor.
See Grygolec, Coricelli, and Rustichini (2012) for this form of distortion function.
Boundedness is also satisfied, since we assume that payoffs are bounded.
If there are more than two actions, then we will have different dynamics for assessments. The case is left for future work.
Alternatively, each population is large enough that the probability of a decision maker being picked again is almost zero.
©2014 by De Gruyter