Randomization in Online Experiments

Konstantin Golyaev
Microsoft Research, One Microsoft Way, Redmond, USA


Most scientists consider randomized experiments to be the best method available to establish causality. On the Internet, during the past twenty-five years, randomized experiments have become common, often referred to as A/B testing. For practical reasons, much A/B testing does not use pseudo-random number generators to implement randomization. Instead, hash functions are used to transform the distribution of identifiers of experimental units into a uniform distribution. Using two large, industry data sets, I demonstrate that the success of hash-based quasi-randomization strategies depends greatly on the hash function used: MD5 yielded good results, while SHA512 yielded less impressive ones.
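To make the hash-based approach concrete, the following is a minimal sketch of how an experimental unit's identifier might be mapped to a treatment arm. It is an illustration under stated assumptions, not the procedure used in the article: the function name `assign_variant`, the `experiment` salt, and the choice to read the first eight bytes of the digest are all hypothetical. The hash algorithm is left as a parameter so that MD5 and SHA-512 can be swapped and their resulting splits compared on one's own data.

```python
import hashlib


def assign_variant(unit_id: str, experiment: str,
                   n_variants: int = 2, algorithm: str = "md5") -> int:
    """Deterministically map an identifier to a variant index in [0, n_variants).

    Hypothetical helper for illustration only. The experiment name acts as a
    salt so that the same unit can land in different arms across experiments.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    # hashlib.new accepts algorithm names such as "md5" or "sha512".
    digest = hashlib.new(algorithm, key).digest()
    # Interpret the first 8 bytes as an unsigned 64-bit integer. Reducing it
    # modulo n_variants introduces a bias that is negligible for small n_variants.
    value = int.from_bytes(digest[:8], "big")
    return value % n_variants


if __name__ == "__main__":
    # Tally assignments for synthetic identifiers under each hash function.
    for algo in ("md5", "sha512"):
        counts = [0, 0]
        for i in range(10_000):
            counts[assign_variant(f"user-{i}", "demo-experiment", 2, algo)] += 1
        print(algo, counts)
```

Because the assignment depends only on the identifier and the salt, a returning unit always sees the same variant without any stored state, which is one practical reason such schemes are used in place of a pseudo-random number generator. Synthetic sequential identifiers like the ones above say nothing about how a given hash behaves on real identifier distributions, which is the question the article examines.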

