Published online by De Gruyter May 10, 2022

A comparison of joint dichotomization and single dichotomization of interacting variables to discriminate a disease outcome

Sybil Prince Nelson, Viswanathan Ramakrishnan, Paul Nietert, Diane Kamen, Paula Ramos and Bethany Wolf

Abstract

Dichotomization is often used in clinical and diagnostic settings to simplify interpretation. For example, a person with systolic and diastolic blood pressure above 140 over 90 may be prescribed medication. Blood pressure, other factors such as age and cholesterol, and the interactions among them may lead to increased risk of certain diseases. When a dichotomized variable is used to determine a diagnosis, an incorrect threshold for the continuous variable may be selected if its interactions with other variables are not considered. In this paper, we compare single dichotomization with joint dichotomization, the process of simultaneously optimizing cutpoints for multiple variables. A simulation study shows that simultaneous dichotomization of continuous variables is more accurate than single dichotomization in recovering both ‘true’ thresholds, given that they exist.
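
The difference between the two procedures can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their simulation study. The criterion (Youden's index), the grid search, the decision rule that flags disease only when both variables exceed their cutpoints, and the simulated data with hypothetical thresholds 0.6 and 0.4 are all assumptions made for illustration.

```python
import numpy as np


def youden(pred, y):
    """Youden's J = sensitivity + specificity - 1 for a boolean prediction."""
    sens = np.mean(pred[y])      # fraction of diseased subjects flagged
    spec = np.mean(~pred[~y])    # fraction of healthy subjects not flagged
    return sens + spec - 1.0


def single_cutpoint(x, y, grid):
    """Single dichotomization: best cutpoint for one variable on its own."""
    scores = [youden(x > c, y) for c in grid]
    return grid[int(np.argmax(scores))]


def joint_cutpoints(x1, x2, y, grid):
    """Joint dichotomization: search both cutpoints at once for the
    assumed rule (x1 > c1) and (x2 > c2)."""
    best, best_j = (grid[0], grid[0]), -np.inf
    for c1 in grid:
        above1 = x1 > c1
        for c2 in grid:
            j = youden(above1 & (x2 > c2), y)
            if j > best_j:
                best, best_j = (c1, c2), j
    return best


# Simulated data (hypothetical): disease is likely only when both
# variables exceed their 'true' thresholds.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.uniform(0, 1, n)
x2 = rng.uniform(0, 1, n)
true_t1, true_t2 = 0.6, 0.4
p = np.where((x1 > true_t1) & (x2 > true_t2), 0.9, 0.1)
y = rng.uniform(0, 1, n) < p

grid = np.linspace(0.05, 0.95, 19)  # candidate cutpoints in steps of 0.05
single = (single_cutpoint(x1, y, grid), single_cutpoint(x2, y, grid))
joint = joint_cutpoints(x1, x2, y, grid)

print("single cutpoints:", single)
print("joint cutpoints: ", joint)
```

Because the pair of separately chosen cutpoints is itself one point on the joint grid, the joint search can never score lower than that pair on the combined rule under this criterion. How much the recovered thresholds differ depends on the form of the interaction in the data, which this toy example does not explore.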


Corresponding author: Sybil Prince Nelson, Department of Mathematics, Washington and Lee University, 204 W Washington St, Lexington, VA 24450, USA, E-mail:

Funding source: South Carolina Clinical and Translational Research Institute, Medical University of South Carolina’s CTSA, NIH/NCATS

Award Identifier / Grant number: UL1TR000062

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This project was supported in part by the South Carolina Clinical and Translational Research Institute, Medical University of South Carolina’s CTSA, NIH/NCATS Grant Number UL1TR000062.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2021-0071).


Received: 2021-07-23
Revised: 2022-02-15
Accepted: 2022-02-16
Published Online: 2022-05-10

© 2022 Walter de Gruyter GmbH, Berlin/Boston