Published by De Gruyter, March 10, 2022

MSE-optimal K-factor of the Elo rating system for round-robin tournament

  • Victor Chan

Abstract

The Elo rating system contains a coefficient called the K-factor which governs the amount of change to the updated ratings and is often determined by empirical or heuristic means. Theoretical studies on the K-factor have been sparse and not much is known about the pertinent factors that impact its appropriate values in applications. This paper has two main goals: to present a new formulation of the K-factor that is optimal with respect to the mean-squared-error (MSE) criterion in a round-robin tournament setting and to investigate the effects of the relevant variables, including the number of tournament participants n, on the optimal K-factor (based on the model-averaged MSE). It is found that n and the variability of the deviation between the true rating and the pre-tournament rating have a strong influence on the optimal K-factor. Comparisons between the MSE-optimal K-factor and the K-factors from Elo and from the US Chess Federation as a function of n are also provided. Although the results are applicable to other sports in similar settings, the study focuses on chess and makes use of the rating data and the K-factor values from the chess world.


Corresponding author: Victor Chan, Department of Mathematics, Western Washington University, Bellingham 98225, WA, USA, E-mail:

Acknowledgments

We are grateful to the Associate Editor and two anonymous referees for providing valuable comments and suggestions, which helped improve this paper considerably.

  1. Author contribution: The author has accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The author declares no conflicts of interest regarding this article.

Appendix

Result 1

(Approximate Asymptotic Limit of the Updated Rating Vector $r^*$ under the MSE-Optimal K-factor as $m \to \infty$). Let $r^*$, $r$, and $r^t$ be the n-dimensional vectors of post-tournament ratings, pre-tournament ratings, and true ratings during the tournament, respectively, as defined in Section 3.2. Define $\bar{r}^t$ and $\bar{r}$ as the arithmetic means of the elements of $r^t$ and $r$, respectively. Assume that the true ratings obey the conservation of Elo ratings, so that the sum of the elements of $r^t$ equals the sum of the elements of $r$, i.e., $n\bar{r}^t = n\bar{r}$.

Consider an m-replicate round-robin tournament, i.e., every player meets every other player m times, in which the MSE-optimal K-factor $k_o$, given in Eq. (11), is used in the Elo Rating System to update the ratings. Then $r^*$ converges to $r^t$ approximately (in the linear sense) as $m \to \infty$.

Note:

  1. The convergence is in the sense of ‘almost surely’ and the approximation (in the linear sense) is based on a linear approximation of the paired-comparison model used in the ERS.

  2. An important property of the ERS is the conservation of ratings, i.e., with the K-factor fixed, the sum of all post-tournament ratings $\sum_{i=1}^{n} R_i^*$ always equals the sum of all pre-tournament ratings $\sum_{i=1}^{n} R_i$. For more details, see Chapter 5 of Langville and Meyer (2012).
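The conservation property in Note 2 can be checked numerically. The following is a minimal sketch, assuming the logistic Elo expectancy with a 400-point scale and a fixed K-factor of 32; the ratings, the results, and the function names are made up for illustration and are not taken from the paper.

```python
import numpy as np

def elo_expected(r_i, r_j, scale=400.0):
    """Logistic Elo win expectancy (assumed model; 400 is the usual chess scale)."""
    return 1.0 / (1.0 + 10.0 ** (-(r_i - r_j) / scale))

def round_robin_update(ratings, scores, k):
    """Apply one Elo update after a single round robin with a fixed K-factor.

    scores[i, j] is player i's score against player j (1, 0.5, or 0),
    with scores[i, j] + scores[j, i] == 1 for i != j.
    """
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    new = ratings.copy()
    for i in range(n):
        for j in range(n):
            if i != j:
                new[i] += k * (scores[i, j] - elo_expected(ratings[i], ratings[j]))
    return new

ratings = np.array([1500.0, 1620.0, 1710.0, 1850.0])  # hypothetical pre-tournament ratings

# Hypothetical results: the strict upper triangle holds i's score against j.
upper = np.array([[0, 1.0, 0.5, 0.0],
                  [0, 0,   0.0, 0.5],
                  [0, 0,   0,   1.0],
                  [0, 0,   0,   0]])
# Fill the lower triangle with the complementary scores 1 - score.
scores = upper + np.triu(1.0 - upper, 1).T

updated = round_robin_update(ratings, scores, k=32.0)
print(updated)
print(updated.sum() - ratings.sum())  # zero up to floating-point rounding
```

The sum is preserved because, for every pair, the two scores add to 1 and the two expectancies add to 1, so the two adjustments cancel.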

Proof

By Eq. (5) and the definition of $z$ given in Section 3.2,

$$r^* - r = mk\,U\left(\frac{X}{m} - p\right),$$

where $X$ is the vector of the scores $X_{ij}$. Both $X_{ij}$ and $p$ are defined in Section 3.2. With $k = k_o$,

$$r^* - r = \frac{(r^t - r)'\,U(p^t - p)}{\operatorname{trace}(U'UD^t)/m + (p^t - p)'\,U'U(p^t - p)}\;U\left(\frac{X}{m} - p\right) \;\to\; \frac{(r^t - r)'\,U(p^t - p)}{(p^t - p)'\,U'U(p^t - p)}\;U(p^t - p) \tag{21}$$

as $m \to \infty$, since $X/m \to p^t$ almost surely by the Law of Large Numbers.
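The two ingredients of this limit can be illustrated numerically. The sketch below assumes a logistic win expectancy with a 400-point scale, win/loss outcomes only (no draws), and made-up ratings with equal sums. The helper names are hypothetical. It does not compute the finite-m trace term; it only evaluates the limiting expression.

```python
import numpy as np
from itertools import combinations

def design_matrix(n):
    """n x C(n,2) matrix U: the column for pair (i, j) has +1 in row i and -1 in row j."""
    pairs = list(combinations(range(n), 2))  # (0,1), (0,2), ..., (n-2,n-1)
    U = np.zeros((n, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        U[i, col], U[j, col] = 1.0, -1.0
    return U

def win_prob(diff, scale=400.0):
    """Logistic win expectancy H; the 400-point scale is an assumption."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

r = np.array([1500.0, 1600.0, 1700.0, 1800.0])    # pre-tournament ratings (made up)
r_t = np.array([1530.0, 1580.0, 1720.0, 1770.0])  # true ratings, same sum as r

U = design_matrix(len(r))
p = win_prob(U.T @ r)      # expected scores implied by the pre-tournament ratings
p_t = win_prob(U.T @ r_t)  # true expected scores

# Law of Large Numbers: the average score per game approaches p_t.
m = 200_000
rng = np.random.default_rng(0)
X = rng.binomial(m, p_t)   # total wins of the first-named player in each pair
lln_gap = np.max(np.abs(X / m - p_t))

# Limiting expression: the trace term has vanished and X/m is replaced by p_t.
v = U @ (p_t - p)
coef = (r_t - r) @ v / (v @ v)
r_limit = r + coef * v

print(lln_gap)
print(np.linalg.norm(r_limit - r_t), np.linalg.norm(r - r_t))
```

The second printed line compares the error remaining after the limiting update with the error of the pre-tournament ratings. Because `coef` is the least-squares coefficient of `r_t - r` on `v`, the first number cannot exceed the second.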

Because treating the model for win probability in greater generality facilitates the proof, we shall consider the general paired-comparison model $H$, rather than the specific Bradley–Terry model given in Eq. (2). The function $H$ is continuous and strictly increasing within its domain, and hence its inverse $H^{-1}$ exists over the interval $\{p : 0 < p < 1\}$. This inverse can be well approximated within a large portion of the interval by a line centered at $p = 0.5$. A linear approximation of $R_i - R_j$ is therefore given by

$$R_i - R_j = H^{-1}(p_{ij}) \approx \alpha + \beta p_{ij} \tag{22}$$

for i > j, where α and β are constants that depend on the model H. Likewise, $R_i^t - R_j^t \approx \alpha + \beta \pi_{ij}$.
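To give a feel for the quality of such a linear approximation, the sketch below uses one possible choice of line: the tangent to $H^{-1}$ at p = 0.5 for a logistic model with a 400-point scale. Both the model and the tangent-line choice are assumptions made for illustration; the constants α and β used in the paper may be obtained differently.

```python
import numpy as np

SCALE = 400.0  # assumed Elo scale

def h_inverse(p):
    """Inverse of the logistic H(x) = 1 / (1 + 10 ** (-x / SCALE))."""
    return SCALE * np.log10(p / (1.0 - p))

# Tangent line at p = 0.5:
# d/dp [SCALE * log10(p / (1 - p))] = SCALE / (ln(10) * p * (1 - p)),
# which equals 4 * SCALE / ln(10) at p = 0.5.
beta = 4.0 * SCALE / np.log(10.0)
alpha = -0.5 * beta  # chosen so that alpha + beta * 0.5 == 0

for prob in (0.3, 0.4, 0.5, 0.6, 0.7):
    exact = h_inverse(prob)
    linear = alpha + beta * prob
    print(f"p={prob:.1f}  exact={exact:8.2f}  linear={linear:8.2f}")
```

The gap between the two columns grows as p moves away from 0.5, which is why the convergence in Result 1 is only approximate for a nonlinear H.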

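Note that, for i > j,

$$R_i - R_j = \bigl(0, \ldots, 0, \underset{i\text{th}}{1}, 0, \ldots, 0, \underset{j\text{th}}{-1}, 0, \ldots, 0\bigr) \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{pmatrix} = e_{ij}'\,r \tag{23}$$

where $e_{ij}'$ is a (1 × n) row vector in which the ith element is 1, the jth element is −1, and all other elements are zeroes.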

Define the $\binom{n}{2} \times n$ matrix $W$ as follows:

$$W = \begin{pmatrix} e_{12}' \\ e_{13}' \\ \vdots \\ e_{n-1,n}' \end{pmatrix}, \tag{24}$$

where the indices in the subscript of e′ follow the standard order, as described in Section 3.2. By Eqs. (22) and (23), it follows that $\alpha j + \beta p \approx Wr$ and $\alpha j + \beta p^t \approx Wr^t$, where $j$ is a vector of 1’s, in this instance of dimension $\binom{n}{2}$. Thus,

$$\beta\,(p^t - p) \approx W(r^t - r).$$
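A small numerical sketch of W and of this approximate relation follows. It assumes the pair order (1,2), (1,3), ..., (n−1,n), a logistic H with a 400-point scale, and the tangent-line slope at p = 0.5 for β; the ratings are made up and the helper names are hypothetical.

```python
import numpy as np
from itertools import combinations

def difference_matrix(n):
    """C(n,2) x n matrix W: the row for pair (i, j) has +1 in column i and -1 in column j."""
    pairs = list(combinations(range(n), 2))
    W = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        W[row, i], W[row, j] = 1.0, -1.0
    return W, pairs

def win_prob(diff, scale=400.0):
    """Logistic win expectancy H (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

r = np.array([1500.0, 1540.0, 1580.0, 1620.0])    # pre-tournament ratings (made up)
r_t = np.array([1510.0, 1535.0, 1590.0, 1605.0])  # true ratings, same sum as r

W, pairs = difference_matrix(len(r))

# Each entry of W r is one rating difference R_i - R_j.
pairwise = np.array([r[i] - r[j] for i, j in pairs])

# Compare beta * (p_t - p) with W (r_t - r), using the tangent slope at p = 0.5.
beta = 4.0 * 400.0 / np.log(10.0)
lhs = beta * (win_prob(W @ r_t) - win_prob(W @ r))
rhs = W @ (r_t - r)
for (i, j), a, b in zip(pairs, lhs, rhs):
    print(f"pair ({i + 1},{j + 1}): beta*(p_t - p) = {a:7.2f}   W(r_t - r) = {b:7.2f}")
```

The two columns agree more closely for pairs whose rating difference is small, where the logistic curve is nearly linear.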

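Before the proof is completed, three short results are needed:

  1. $$W'W = \begin{pmatrix} n-1 & -1 & \cdots & -1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & n-1 \end{pmatrix} = nI + (-1)J,$$ where $I$ is the identity matrix of dimension n and $J$ is the (n × n) matrix of 1’s. This result, which can be shown fairly easily, leads to the following one.

  2. $$W'WW' = \bigl(nI + (-1)J\bigr)W' = nW'.$$ The second equality holds because every column of $W'$ contains exactly one 1 and one −1, so that $JW' = 0$.

  3. $U = W'$, where $U$ is the design matrix of the round robin with n players.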
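These matrix identities are easy to confirm numerically for small n. The sketch below builds W with a hypothetical helper, assuming the pair order (1,2), (1,3), ..., (n−1,n), and checks the first two results.

```python
import numpy as np
from itertools import combinations

def difference_matrix(n):
    """C(n,2) x n matrix W: the row for pair (i, j) has +1 in column i and -1 in column j."""
    pairs = list(combinations(range(n), 2))
    W = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        W[row, i], W[row, j] = 1.0, -1.0
    return W

for n in (3, 5, 8):
    W = difference_matrix(n)
    I, J = np.eye(n), np.ones((n, n))
    # Result 1: each player appears in n - 1 pairs; each pair contributes -1 off the diagonal.
    assert np.allclose(W.T @ W, n * I - J)
    # Every column of W' sums to zero, so J W' = 0.
    assert np.allclose(J @ W.T, 0.0)
    # Result 2 follows from the two facts above.
    assert np.allclose(W.T @ W @ W.T, n * W.T)

print("identities verified for n = 3, 5, 8")
```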

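With the aid of all the results above, the limit in Eq. (21) becomes

$$\begin{aligned}
\frac{(r^t - r)'\,U(p^t - p)}{(p^t - p)'\,U'U(p^t - p)}\;U(p^t - p)
&\approx \beta\,\frac{(r^t - r)'\,UW(r^t - r)}{(r^t - r)'\,W'U'UW(r^t - r)}\;U(p^t - p) \\
&= \frac{\beta}{n}\,U(p^t - p) \\
&\approx \frac{1}{n}\,UW(r^t - r) \\
&= \frac{1}{n}\bigl(nI + (-1)J\bigr)(r^t - r) \\
&= (r^t - r) - (\bar{r}^t - \bar{r})\,j \\
&= r^t - r,
\end{aligned}$$

and thus $r^* \to r^t$ approximately, as claimed.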
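The step in which the ratio of quadratic forms collapses to 1/n is stated without detail. One way to fill it in, using only the three short results above (with U = W′) and the conservation assumption, is sketched here; it presumes that $r^t - r$ is not the zero vector, so that the denominator is nonzero.

```latex
\begin{align*}
(r^t - r)'\,UW\,(r^t - r) &= (r^t - r)'\,W'W\,(r^t - r), \\
(r^t - r)'\,W'U'UW\,(r^t - r) &= (r^t - r)'\,(W'WW')\,W\,(r^t - r)
  = n\,(r^t - r)'\,W'W\,(r^t - r), \\
\frac{(r^t - r)'\,UW\,(r^t - r)}{(r^t - r)'\,W'U'UW\,(r^t - r)} &= \frac{1}{n}, \\
J\,(r^t - r) &= n\,(\bar{r}^t - \bar{r})\,j = 0 .
\end{align*}
```

The last line is where the conservation assumption $n\bar{r}^t = n\bar{r}$ enters; here $j$ is the n-dimensional vector of 1's.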

Note that the limit will be exact if a uniform distribution is used for the paired-comparison model, i.e., $H(x) = cx + \frac{1}{2}$ for $-\frac{1}{2c} < x < \frac{1}{2c}$ and $c > 0$.
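The exactness under a linear H can be confirmed with a short computation. The sketch below assumes a slope of c = 1/1000 (so that H is valid for rating differences below 500 in absolute value) and made-up ratings with equal sums; the helper names are hypothetical.

```python
import numpy as np
from itertools import combinations

def design_matrix(n):
    """n x C(n,2) matrix U = W': the column for pair (i, j) has +1 in row i and -1 in row j."""
    pairs = list(combinations(range(n), 2))
    U = np.zeros((n, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        U[i, col], U[j, col] = 1.0, -1.0
    return U

def uniform_win_prob(diff, c):
    """H(x) = c * x + 1/2, valid only for |x| < 1 / (2c)."""
    return c * diff + 0.5

c = 1.0 / 1000.0                                  # hypothetical slope
r = np.array([1500.0, 1600.0, 1700.0, 1800.0])    # pre-tournament ratings (made up)
r_t = np.array([1530.0, 1580.0, 1720.0, 1770.0])  # true ratings, same sum as r

U = design_matrix(len(r))
# All rating differences must lie inside the range where H is a valid probability.
assert np.all(np.abs(U.T @ r) < 1.0 / (2.0 * c))
assert np.all(np.abs(U.T @ r_t) < 1.0 / (2.0 * c))

p = uniform_win_prob(U.T @ r, c)
p_t = uniform_win_prob(U.T @ r_t, c)

# Limiting update: coefficient times U (p_t - p).
v = U @ (p_t - p)
coef = (r_t - r) @ v / (v @ v)
r_limit = r + coef * v

print(np.abs(r_limit - r_t).max())  # zero up to floating-point rounding
```

In this linear case the coefficient reduces to 1/(cn), so the limiting update recovers the true ratings exactly rather than approximately.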

References

Bradley, R. A. 1976. “Science, Statistics, and Paired Comparisons.” Biometrics 32 (2): 213–32. https://doi.org/10.2307/2529494.

Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39 (3): 324–45. https://doi.org/10.1093/biomet/39.3-4.324.

Cattelan, M. 2012. “Models for Paired Comparison Data: A Review with Emphasis on Dependent Data.” Statistical Science 27 (3): 412–33. https://doi.org/10.1214/12-sts396.

David, H. A. 1988. The Method of Paired Comparisons, 2nd ed. London: Griffin.

Elo, A. E. 1978. The Rating of Chess Players, Past and Present. New York: Arco.

FIDE. 2021a. Rating Calculator. https://ratings.fide.com/calc.phtml?page=change (accessed August 09, 2021).

FIDE. 2021b. FIDE Handbook. https://handbook.fide.com/chapter/B022017 (accessed August 09, 2021).

FIFA. 2020a. Women’s Ranking Procedure. https://www.fifa.com/fifa-world-ranking/procedure/women (accessed April 29, 2020).

FIFA. 2020b. Men’s Ranking Procedure. https://www.fifa.com/fifa-world-ranking/procedure/men (accessed April 29, 2020).

Glickman, M. E. 1999. “Parameter Estimation in Large Dynamic Paired Comparison Experiments.” Applied Statistics 48 (3): 377–94. https://doi.org/10.1111/1467-9876.00159.

Glickman, M. E., and T. Doan. 2020. The US Chess Rating System. http://www.glicko.net/ratings/rating.system.pdf (accessed July 06, 2021).

Hvattum, L. M., and H. Arntzen. 2010. “Using Elo Ratings for Match Result Prediction in Association Football.” International Journal of Forecasting 26: 460–70. https://doi.org/10.1016/j.ijforecast.2009.10.002.

Kovalchik, S. 2016. “Searching for the GOAT of Tennis Win Prediction.” Journal of Quantitative Analysis in Sports 12 (3): 127–38. https://doi.org/10.1515/jqas-2015-0059.

Langville, A. N., and C. D. Meyer. 2012. Who’s #1? The Science of Rating and Ranking. New Jersey: Princeton University Press. https://doi.org/10.1515/9781400841677.

Lehmann, R., and K. Wohlrabe. 2017. “Who is the ‘Journal Grand Master’? A New Ranking Based on the Elo Rating System.” Journal of Informetrics 11 (3): 800–9. https://doi.org/10.1016/j.joi.2017.05.004.

Pelanek, R. 2016. “Applications of the Elo Rating System in Adaptive Educational Systems.” Computers & Education 98: 169–79. https://doi.org/10.1016/j.compedu.2016.03.017.

Pirjol, D. 2013. “The Logistic-Normal Integral and its Generalizations.” Journal of Computational and Applied Mathematics 237 (1): 460–9. https://doi.org/10.1016/j.cam.2012.06.016.

USCF. 2016. The US Chess Title System. https://www.glicko.net/ratings/titles.pdf (accessed July 06, 2021).

Received: 2021-08-31
Revised: 2021-12-22
Accepted: 2022-01-27
Published Online: 2022-03-10
Published in Print: 2022-03-26

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 28.9.2023 from https://www.degruyter.com/document/doi/10.1515/jqas-2021-0079/html