Abstract
There is growing interest in approaches based on latent-variable models for detecting fraudulent behavior on educational tests. Wollack and Schoenig (2018) identified five types of statistical/psychometric approaches for detecting the three broad types of test fraud that occur on educational tests. This paper briefly reviews those five types of approaches and then reviews in greater detail the recent approaches, all based on latent-variable models, that detect test fraud using both item scores and response times. A real data example demonstrates the use of two of the approaches.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Marcel Dekker. https://doi.org/10.1201/9781482276725
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis. New York, NY: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119970583
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. https://doi.org/10.1007/BF02291411
Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71, 13–38. https://doi.org/10.1111/bmsp.12104
Boughton, K., Smith, J., & Ren, H. (2017). Using response time data to detect compromised items and/or people. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 177–190). Washington, DC: Routledge.
Buss, W. G., & Novick, M. R. (1980). The detection of cheating on standardized tests: Statistical and legal analysis. Journal of Law and Education, 9, 1–64.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Cizek, G. J., & Wollack, J. A. (2017). Handbook of detecting cheating on tests. Washington, DC: Routledge.
Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35, 54. https://doi.org/10.2307/2683591
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9, 47–64. https://doi.org/10.1207/s15324818ame0901_5
Dykstra, R. (1991). Asymptotic normality for chi-bar-square distributions. Canadian Journal of Statistics, 19, 297–306. https://doi.org/10.2307/3315395
Eckerly, C. (2020). Answer similarity analysis at the group level. Manuscript under review.
Eckerly, C., Smith, R., & Lee, Y. (2018, October). An introduction to item preknowledge detection with real data applications. Paper presented at the Conference on Test Security, Park City, UT.
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. Chichester, UK: John Wiley & Sons, Ltd.
Ferrara, S. (2017). A framework for policies and practices to improve test security programs: Prevention, detection, investigation, and resolution (PDIR). Educational Measurement: Issues and Practice, 36(3), 5–24. https://doi.org/10.1111/emip.12151
Finger, M. S., & Chee, C. S. (2009, April). Response-time model estimation via confirmatory factor analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Fox, J.-P., Klein Entink, R. H., & Klotzke, K. (2017). LNIRT: Lognormal response time item response theory models (R package version 0.2.0).
Fox, J.-P., & Marianti, S. (2017). Person-fit statistics for joint models for accuracy and speed. Journal of Educational Measurement, 54, 243–262. https://doi.org/10.1111/jedm.12143
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis. New York, NY: Chapman and Hall.
Glas, C. A. W., & van der Linden, W. J. (2010). Marginal likelihood inference for a model for item responses and response times. British Journal of Mathematical and Statistical Psychology, 63, 603–626. https://doi.org/10.1348/000711009X481360
Government Accountability Office. (2013). K-12 education: States’ test security policies and procedures varied (GAO-13-495R) (Tech. Rep.). Washington, DC: Author.
Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses (ETS Research Report No. RR-17-23). Princeton, NJ: ETS.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In Educational measurement (pp. 143–200). New York, NY: Macmillan.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer Academic Publishers. https://doi.org/10.1007/978-94-017-1988-9
Hanson, B. A., Harris, D. J., & Brennan, R. L. (1987). A comparison of several statistical methods for examining allegations of copying (ACT Research Report Series No. 87-15). Iowa City, IA: American College Testing.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer. https://doi.org/10.1007/978-0-387-84858-7
Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (ETS Research Report No. RR-94-4). Princeton, NJ: ETS.
Kasli, M., & Zopluoglu, C. (2018, October). Do people with item pre-knowledge really respond faster to items they had prior access? An empirical investigation. Paper presented at the Conference on Test Security, Park City, UT.
Kingston, N., & Clark, A. (2014). Test fraud: Statistical detection and methodology. New York, NY: Routledge. https://doi.org/10.4324/9781315884677
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21–48. https://doi.org/10.1007/s11336-008-9075-y
Lee, S. Y., & Wollack, J. (2020). Concurrent use of response time and response accuracy for detecting examinees with item preknowledge. In R. Feinberg & M. Margolis (Eds.), Integrating timing considerations to improve testing practices. New York, NY: Routledge.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30, 358–392. https://doi.org/10.1177/0741088313491692
Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying (ETS Research Report No. RR-98-49). Princeton, NJ: ETS.
Luce, R. D. (1986). Response times. New York, NY: Oxford University Press.
Man, K., & Harring, J. R. (2019). Negative binomial models for visual fixation counts on test items. Educational and Psychological Measurement, 79, 617–635. https://doi.org/10.1177/0013164418824148
Man, K., Harring, J. R., & Sinharay, S. (2019). Use of data mining methods to detect test fraud. Journal of Educational Measurement, 56, 251–279. https://doi.org/10.1111/jedm.12208
Marianti, S., Fox, J.-P., Avetisyan, M., Veldkamp, B. P., & Tijmstra, J. (2014). Testing for aberrant behavior in response time modeling. Journal of Educational and Behavioral Statistics, 39, 426–451. https://doi.org/10.3102/1076998614559412
Maris, G., & van der Maas, H. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615–633. https://doi.org/10.1007/s11336-012-9288-y
Maynes, D. (2013). Educator cheating and the statistical detection of group-based test security threats. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 173–199). New York, NY: Routledge.
Maynes, D. (2014). Detection of non-independent test-taking by similarity analysis. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 53–82). New York, NY: Routledge.
McLeod, L. D., Lewis, C., & Thissen, D. (2003). A Bayesian method for the detection of item preknowledge in computerized adaptive testing. Applied Psychological Measurement, 27, 121–137. https://doi.org/10.1177/0146621602250534
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107–135. https://doi.org/10.1177/01466210122031957
Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. J. (2015). A bivariate generalized linear item response theory modeling framework to the analysis of responses and response times. Multivariate Behavioral Research, 50(1), 56–74. https://doi.org/10.1080/00273171.2014.962684
National Center for Education Statistics. (2012). Transcript of proceedings of the testing integrity symposium (Tech. Rep.). Washington, DC: Institute of Education Sciences.
National Council on Measurement in Education. (2012). Testing and data integrity in the administration of statewide student assessment programs (Tech. Rep.). Madison, WI: Author.
Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Washington, DC: Council of Chief State School Officers.
Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer-based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38–47. https://doi.org/10.1111/emip.12102
R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Ranger, J., Kuhn, J.-T., & Gaviria, J.-L. (2014). A race model for responses and response times in tests. Psychometrika, 80, 791–810. https://doi.org/10.1007/s11336-014-9427-8
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. https://doi.org/10.1037/0033-295X.85.2.59
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Sijtsma, K. (1986). A coefficient of deviant response patterns. Kwantitatieve Methoden, 7, 131–145.
Silvapulle, M. J., & Sen, P. K. (2001). Constrained statistical inference: Order, inequality, and shape constraints. New York, NY: John Wiley & Sons, Inc. https://doi.org/10.1002/9781118165614
Sinharay, S. (2016). Asymptotically correct standardization of person-fit statistics beyond dichotomous items. Psychometrika, 81, 992–1013. https://doi.org/10.1007/s11336-015-9465-x
Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42, 46–68. https://doi.org/10.3102/1076998616673872
Sinharay, S. (2018). A new person-fit statistic for the lognormal model for response times. Journal of Educational Measurement, 55, 457–476. https://doi.org/10.1111/jedm.12188
Sinharay, S. (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44, 376–392. https://doi.org/10.1177/0146621620909893
Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54, 200–217. https://doi.org/10.1111/jedm.12141
Sinharay, S., & Johnson, M. S. (2020). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology, 73, 397–419. https://doi.org/10.1111/bmsp.12187
Sinharay, S., & van Rijn, P. W. (2020). Assessing fit of the lognormal model for response times. Journal of Educational and Behavioral Statistics, 51, 419–440. https://doi.org/10.3102/1076998620911935
Smith, R. W., & Davis-Becker, S. L. (2011, April). Detecting suspect examinees: An application of differential person functioning analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Snijders, T. (2001). Asymptotic distribution of person-fit statistics with estimated person parameter. Psychometrika, 66, 331–342. https://doi.org/10.1007/BF02294437
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49, 95–110. https://doi.org/10.1007/BF02294208
Thisted, R. A. (1988). Elements of statistical computing: Numerical computation. London: Chapman and Hall.
Townsend, J., & Ashby, F. (1978). Methods of modeling capacity in simple processing systems. In J. Castellan & F. Restle (Eds.), Cognitive theory (Vol. 3, pp. 199–239). Hillsdale, NJ: Erlbaum.
van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181–204. https://doi.org/10.3102/10769986031002181
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287–308. https://doi.org/10.1007/s11336-006-1478-z
van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46, 247–272. https://doi.org/10.1111/j.1745-3984.2009.00080.x
van der Linden, W. J. (2016). Lognormal response-time model. In W. J. van der Linden (Ed.), Handbook of item response theory, Volume 1: Models. Boca Raton, FL: Chapman and Hall/CRC.
van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73, 365–384. https://doi.org/10.1007/s11336-007-9046-8
van der Linden, W. J., & Lewis, C. (2015). Bayesian checks on cheating on tests. Psychometrika, 80, 689–706. https://doi.org/10.1007/s11336-014-9409-x
van der Linden, W. J., & Sotaridona, L. (2006). Detecting answer copying when the regular response process follows a known response model. Journal of Educational and Behavioral Statistics, 31, 283–304. https://doi.org/10.3102/10769986031003283
van der Maas, H. L. J., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118, 339–356. https://doi.org/10.1037/a0022749
van Rijn, P. W., & Ali, U. S. (2017). A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 317–345. https://doi.org/10.1111/bmsp.12101
Wang, C., Xu, G., Shang, Z., & Kuncel, N. (2018). Detecting aberrant behavior and item preknowledge: A comparison of mixture modeling method and residual method. Journal of Educational and Behavioral Statistics, 43, 469–501. https://doi.org/10.3102/1076998618767123
Wang, X., Liu, Y., & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41, 243–263. https://doi.org/10.1177/0146621616687285
Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21, 307–320. https://doi.org/10.1177/01466216970214002
Wollack, J. A., & Cizek, G. J. (2017). The future of quantitative methods for detecting cheating. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 390–399). Washington, DC: Routledge.
Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75, 931–953. https://doi.org/10.1177/0013164414568716
Wollack, J. A., & Eckerly, C. (2017). Detecting test tampering at the group level. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.
Wollack, J. A., & Fremer, J. J. (2013). Handbook of test security. New York, NY: Routledge. https://doi.org/10.4324/9780203664803
Wollack, J. A., & Maynes, D. (2017). Detection of test collusion using cluster analysis. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 124–150). Washington, DC: Routledge. https://doi.org/10.4324/9781315743097-6
Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Thousand Oaks, CA: Sage.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In Educational measurement (pp. 111–153). Westport, CT: American Council on Education and Praeger Publishers.
© 2021 Sandip Sinharay, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.