Most scientists consider randomized experiments to be the best available method for establishing causality. Over the past twenty-five years, randomized experiments have become common on the Internet, where they are often referred to as A/B testing. For practical reasons, much A/B testing does not use pseudo-random number generators to implement randomization. Instead, hash functions are used to transform the distribution of the identifiers of experimental units into a uniform distribution. Using two large industry data sets, I demonstrate that the success of hash-based quasi-randomization strategies depends greatly on the hash function used: MD5 yielded good results, while SHA512 yielded less impressive ones.
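To make the idea of hash-based assignment concrete, the following is a minimal sketch of how an identifier might be mapped to a bucket and then to an experimental arm. It is not the procedure evaluated in this paper: the function names (`bucket`, `assign_arm`, `chi_square_statistic`), the `experiment:unit_id` key format, the use of the first 8 digest bytes, and the 100-bucket default are all assumptions chosen for illustration. The chi-square statistic is included only as one simple way to inspect how evenly a given hash function spreads a set of identifiers.

```python
import hashlib
from collections import Counter


def bucket(unit_id, experiment, n_buckets=100, algorithm="md5"):
    """Map an identifier to an integer bucket in [0, n_buckets).

    The key format and the choice of the first 8 digest bytes are
    illustrative assumptions, not a documented standard.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.new(algorithm, key).digest()
    # Interpret the first 8 bytes as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big") % n_buckets


def assign_arm(unit_id, experiment, treatment_pct=50, algorithm="md5"):
    """Deterministically assign a unit to 'treatment' or 'control'."""
    b = bucket(unit_id, experiment, 100, algorithm)
    return "treatment" if b < treatment_pct else "control"


def chi_square_statistic(unit_ids, experiment, n_buckets=100, algorithm="md5"):
    """Pearson chi-square statistic of bucket counts against a uniform
    expectation; larger values indicate a less even spread."""
    unit_ids = list(unit_ids)
    counts = Counter(bucket(u, experiment, n_buckets, algorithm) for u in unit_ids)
    expected = len(unit_ids) / n_buckets
    return sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(n_buckets)
    )


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(10_000)]  # hypothetical identifiers
    for algo in ("md5", "sha512"):
        stat = chi_square_statistic(ids, "demo-experiment", algorithm=algo)
        print(algo, round(stat, 2))
```

Because the assignment depends only on the identifier and the experiment name, the same unit always lands in the same arm without any stored state, which is one of the practical reasons hashing is used in place of a pseudo-random number generator. Running the script on synthetic identifiers such as these says nothing about the results reported here, which are based on real industry data sets.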