
Latent-variable Approaches Utilizing Both Item Scores and Response Times To Detect Test Fraud

Sandip Sinharay
From the journal Open Education Studies

Abstract

There is growing interest in approaches based on latent-variable models for detecting fraudulent behavior on educational tests. Wollack and Schoenig (2018) identified five types of statistical/psychometric approaches for detecting the three broad types of test fraud that occur on educational tests. This paper briefly reviews those five types of approaches and then reviews in more detail the recent approaches, all based on latent-variable models, that use both item scores and response times to detect test fraud. A real-data example demonstrates the use of two of the approaches.
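To make the shared idea behind these approaches concrete, the following minimal sketch in R (the language of the software packages cited below) flags unusually fast responses under van der Linden's (2006) lognormal model for response times, one building block of several of the reviewed approaches. All parameter values are hypothetical, and the sketch illustrates the general idea rather than the exact procedure of any cited paper.

```r
## A minimal sketch (not the paper's code): flagging unusually fast
## responses under van der Linden's (2006) lognormal response-time model.
## Under the model, log T_ij ~ N(beta_j - tau_i, 1/alpha_j^2), so the
## standardized residual z_ij = alpha_j * (log t_ij - (beta_j - tau_i))
## is approximately standard normal for a regular examinee.
## All parameter values below are hypothetical.

t_obs <- c(12.3, 8.1, 1.4, 15.6, 1.1)  # observed response times (seconds)
beta  <- c(2.5, 2.2, 2.4, 2.8, 2.3)    # item time intensities
alpha <- c(1.8, 2.0, 1.9, 1.7, 2.1)    # item time discriminations
tau   <- 0.1                           # examinee speed parameter

z <- alpha * (log(t_obs) - (beta - tau))  # standardized log-time residuals
flagged <- which(z < qnorm(0.01))         # unusually fast at the 1% level
round(z, 2)   # items 3 and 5 yield large negative residuals
flagged
```

The approaches reviewed in the paper combine such response-time information with item-score information within a joint latent-variable model (e.g., the hierarchical framework of van der Linden, 2007).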

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Marcel Dekker. https://doi.org/10.1201/9781482276725

Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis. New York, NY: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119970583

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51. https://doi.org/10.1007/BF02291411

Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71, 13–38. https://doi.org/10.1111/bmsp.12104

Boughton, K., Smith, J., & Ren, H. (2017). Using response time data to detect compromised items and/or people. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 177–190). Washington, DC: Routledge.

Buss, W. G., & Novick, M. R. (1980). The detection of cheating on standardized tests: Statistical and legal analysis. Journal of Law and Education, 9, 1–64.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Cizek, G. J., & Wollack, J. A. (2017). Handbook of detecting cheating on tests. Washington, DC: Routledge.

Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35, 54. https://doi.org/10.2307/2683591

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.

Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x

Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9, 47–64. https://doi.org/10.1207/s15324818ame0901_5

Dykstra, R. (1991). Asymptotic normality for chi-bar-square distributions. Canadian Journal of Statistics, 19, 297–306. https://doi.org/10.2307/3315395

Eckerly, C. (2020). Answer similarity analysis at the group level. Manuscript under review.

Eckerly, C., Smith, R., & Lee, Y. (2018, October). An introduction to item preknowledge detection with real data applications. Paper presented at the Conference on Test Security, Park City, UT.

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis. John Wiley & Sons, Ltd.

Ferrara, S. (2017). A framework for policies and practices to improve test security programs: Prevention, detection, investigation, and resolution (PDIR). Educational Measurement: Issues and Practice, 36(3), 5–24. https://doi.org/10.1111/emip.12151

Finger, M. S., & Chee, C. S. (2009, April). Response-time model estimation via confirmatory factor analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Fox, J.-P., Klein Entink, R. H., & Klotzke, K. (2017). LNIRT: Lognormal response time item response theory models (R package version 0.2.0).

Fox, J.-P., & Marianti, S. (2017). Person-fit statistics for joint models for accuracy and speed. Journal of Educational Measurement, 54, 243–262. https://doi.org/10.1111/jedm.12143

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis. New York, NY: Chapman and Hall.

Glas, C. A. W., & van der Linden, W. J. (2010). Marginal likelihood inference for a model for item responses and response times. British Journal of Mathematical and Statistical Psychology, 63, 603–626. https://doi.org/10.1348/000711009X481360

Government Accountability Office. (2013). K-12 education: States’ test security policies and procedures varied (GAO-13-495R) (Tech. Rep.). Washington, DC: Author.

Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses (ETS Research Report No. RR-17-23). Princeton, NJ: ETS.

Hambleton, R. K. (1989). Principles and selected applications of item response theory. In Educational measurement (pp. 143–200). New York, NY: Macmillan.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer Academic Publishers. https://doi.org/10.1007/978-94-017-1988-9

Hanson, B. A., Harris, D. J., & Brennan, R. L. (1987). A comparison of several statistical methods for examining allegations of copying (ACT Research Report Series No. 87-15). Iowa City, IA: American College Testing.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer. https://doi.org/10.1007/978-0-387-84858-7

Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (ETS Research Report No. RR-94-4). Princeton, NJ: ETS.

Kasli, M., & Zopluoglu, C. (2018, October). Do people with item pre-knowledge really respond faster to items they had prior access? An empirical investigation. Paper presented at the Conference on Test Security, Park City, UT.

Kingston, N., & Clark, A. (2014). Test fraud: Statistical detection and methodology. New York, NY: Routledge. https://doi.org/10.4324/9781315884677

Klein Entink, R. H., Fox, J. P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21–48. https://doi.org/10.1007/s11336-008-9075-y

Lee, S. Y., & Wollack, J. (2020). Concurrent use of response time and response accuracy for detecting examinees with item preknowledge. In R. Feinberg & M. Margolis (Eds.), Integrating timing considerations to improve testing practices. New York, NY: Routledge.

Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30, 358–392. https://doi.org/10.1177/0741088313491692

Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying (ETS Research Report No. RR-98-49). Princeton, NJ: ETS.

Luce, R. D. (1986). Response times. New York, NY: Oxford University Press.

Man, K., & Harring, J. R. (2019). Negative binomial models for visual fixation counts on test items. Educational and Psychological Measurement, 79, 617–635. https://doi.org/10.1177/0013164418824148

Man, K., Harring, J. R., & Sinharay, S. (2019). Use of data mining methods to detect test fraud. Journal of Educational Measurement, 56, 251–279. https://doi.org/10.1111/jedm.12208

Marianti, S., Fox, J.-P., Avetisyan, M., Veldkamp, B. P., & Tijmstra, J. (2014). Testing for aberrant behavior in response time modeling. Journal of Educational and Behavioral Statistics, 39, 426–451. https://doi.org/10.3102/1076998614559412

Maris, G., & van der Maas, H. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615–633. https://doi.org/10.1007/s11336-012-9288-y

Maynes, D. (2013). Educator cheating and the statistical detection of group-based test security threats. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 173–199). New York, NY: Routledge.

Maynes, D. (2014). Detection of non-independent test-taking by similarity analysis. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 53–82). New York, NY: Routledge.

McLeod, L. D., Lewis, C., & Thissen, D. (2003). A Bayesian method for the detection of item preknowledge in computerized adaptive testing. Applied Psychological Measurement, 27, 121–137. https://doi.org/10.1177/0146621602250534

Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107–135. https://doi.org/10.1177/01466210122031957

Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. J. (2015). A bivariate generalized linear item response theory modeling framework to the analysis of responses and response times. Multivariate Behavioral Research, 50(1), 56–74. https://doi.org/10.1080/00273171.2014.962684

National Center for Education Statistics. (2012). Transcript of proceedings of the testing integrity symposium (Tech. Rep.). Washington, DC: Institute of Education Sciences.

National Council on Measurement in Education. (2012). Testing and data integrity in the administration of statewide student assessment programs (Tech. Rep.). Madison, WI: Author.

Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Washington, DC: Council of Chief State School Officers.

Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer-based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38–47. https://doi.org/10.1111/emip.12102

R Core Team. (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Ranger, J., Kuhn, J.-T., & Gaviria, J.-L. (2014). A race model for responses and response times in tests. Psychometrika, 80, 791–810. https://doi.org/10.1007/s11336-014-9427-8

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. https://doi.org/10.1037/0033-295X.85.2.59

Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.

Sijtsma, K. (1986). A coefficient of deviant response patterns. Kwantitatieve Methoden, 7, 131–145.

Silvapulle, M. J., & Sen, P. K. (2001). Constrained statistical inference: Order, inequality, and shape constraints. New York, NY: John Wiley & Sons, Inc. https://doi.org/10.1002/9781118165614

Sinharay, S. (2016). Asymptotically correct standardization of person-fit statistics beyond dichotomous items. Psychometrika, 81, 992–1013. https://doi.org/10.1007/s11336-015-9465-x

Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42, 46–68. https://doi.org/10.3102/1076998616673872

Sinharay, S. (2018). A new person-fit statistic for the lognormal model for response times. Journal of Educational Measurement, 55, 457–476. https://doi.org/10.1111/jedm.12188

Sinharay, S. (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44, 376–392. https://doi.org/10.1177/0146621620909893

Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54, 200–217. https://doi.org/10.1111/jedm.12141

Sinharay, S., & Johnson, M. S. (2020). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology, 73, 397–419. https://doi.org/10.1111/bmsp.12187

Sinharay, S., & van Rijn, P. W. (2020). Assessing fit of the lognormal model for response times. Journal of Educational and Behavioral Statistics, 51, 419–440. https://doi.org/10.3102/1076998620911935

Smith, R. W., & Davis-Becker, S. L. (2011, April). Detecting suspect examinees: An application of differential person functioning analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Snijders, T. (2001). Asymptotic distribution of person-fit statistics with estimated person parameter. Psychometrika, 66, 331–342. https://doi.org/10.1007/BF02294437

Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49, 95–110. https://doi.org/10.1007/BF02294208

Thisted, R. A. (1988). Elements of statistical computing: Numerical computation. London: Chapman and Hall.

Townsend, J., & Ashby, F. (1978). Methods of modeling capacity in simple processing systems. In J. Castellan & F. Restle (Eds.), Cognitive theory (Vol. 3, pp. 199–239). Hillsdale, NJ: Erlbaum.

van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181–204. https://doi.org/10.3102/10769986031002181

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287–308. https://doi.org/10.1007/s11336-006-1478-z

van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46, 247–272. https://doi.org/10.1111/j.1745-3984.2009.00080.x

van der Linden, W. J. (2016). Lognormal response-time model. In W. van der Linden (Ed.), Handbook of item response theory, Volume 1: Models. Boca Raton, FL: Chapman and Hall/CRC.

van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73, 365–384. https://doi.org/10.1007/s11336-007-9046-8

van der Linden, W. J., & Lewis, C. (2015). Bayesian checks on cheating on tests. Psychometrika, 80, 689–706. https://doi.org/10.1007/s11336-014-9409-x

van der Linden, W. J., & Sotaridona, L. (2006). Detecting answer copying when the regular response process follows a known response model. Journal of Educational and Behavioral Statistics, 31, 283–304. https://doi.org/10.3102/10769986031003283

van der Maas, H. L. J., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118, 339–356. https://doi.org/10.1037/a0022749

van Rijn, P. W., & Ali, U. S. (2017). A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 317–345. https://doi.org/10.1111/bmsp.12101

Wang, C., Xu, G., Shang, Z., & Kuncel, N. (2018). Detecting aberrant behavior and item preknowledge: A comparison of mixture modeling method and residual method. Journal of Educational and Behavioral Statistics, 43, 469–501. https://doi.org/10.3102/1076998618767123

Wang, X., Liu, Y., & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41, 243–263. https://doi.org/10.1177/0146621616687285

Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21, 307–320. https://doi.org/10.1177/01466216970214002

Wollack, J. A., & Cizek, G. J. (2017). The future of quantitative methods for detecting cheating. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 390–399). Washington, DC: Routledge.

Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75, 931–953. https://doi.org/10.1177/0013164414568716

Wollack, J. A., & Eckerly, C. (2017). Detecting test tampering at the group level. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.

Wollack, J. A., & Fremer, J. J. (2013). Handbook of test security. New York, NY: Routledge. https://doi.org/10.4324/9780203664803

Wollack, J. A., & Maynes, D. (2017). Detection of test collusion using cluster analysis. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 124–150). Washington, DC: Routledge. https://doi.org/10.4324/9781315743097-6

Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Thousand Oaks, CA: Sage.

Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In Educational measurement (pp. 111–153). Westport, CT: American Council on Education and Praeger Publishers.

Received: 2020-08-13
Accepted: 2020-12-30
Published Online: 2021-01-29

© 2021 Sandip Sinharay, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
