Clinical laboratory testing generates information to support patient management decisions and, ultimately, patient health, while inaccurate or inappropriate testing may contribute to patient harm. Moreover, measures of diagnostic test accuracy (DTA) provide insight into a test’s (or test combination’s) ability to contribute to quality and safety within diagnostic pathways by estimating a test’s clinical validity [4]. Based on rates of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), these measures can inform the appropriate role of a test in a diagnostic pathway, and can assist in interpretation of test results for individual patients, as generally depicted in Figure 1.
Measures of DTA can be determined through diagnostic cross-sectional studies and diagnostic case-control studies, which assess performance of one or more index tests in relation to a gold standard or reference method test. Through these study designs, rates of TP, FP, TN and FN are derived for assembly into various summary measures, as illustrated in Figure 2 [14].
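The summary measures referenced above follow directly from the four cell counts of a 2×2 table. As an illustration only (the counts and function name below are hypothetical, not drawn from any study in the review), the core calculations can be sketched in a few lines of Python:

```python
def dta_measures(tp, fp, tn, fn):
    """Derive common summary measures of diagnostic test accuracy
    from the four cell counts of a 2x2 table."""
    sens = tp / (tp + fn)              # sensitivity: P(test+ | condition present)
    spec = tn / (tn + fp)              # specificity: P(test- | condition absent)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),         # positive predictive value
        "npv": tn / (tn + fn),         # negative predictive value
        "pos_lr": sens / (1 - spec),   # +LR: how much a positive result raises the odds
        "neg_lr": (1 - sens) / spec,   # -LR: how much a negative result lowers the odds
    }

# Illustrative counts only: 90 TP, 10 FP, 190 TN, 10 FN
m = dta_measures(tp=90, fp=10, tn=190, fn=10)
print(round(m["sensitivity"], 2), round(m["specificity"], 2), round(m["pos_lr"], 1))
```

Note that the likelihood ratios are composed entirely from sensitivity and specificity, which is why they inherit any variation in those measures across populations or positivity thresholds.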
DTA systematic reviews (SRs) are a method for developing recommendations on the use of a test or a combination of tests. In DTA SRs, studies are synthesized to obtain pooled, and potentially more reliable, DTA measures. Further, DTA SRs may help investigators determine how DTA may vary by population, setting/clinical context, or positivity threshold [15]. However, such findings are not the only arbiter of decisions on test implementation and interpretation in support of diagnostic quality and safety. For example, a DTA evidence base, of itself, cannot directly indicate downstream consequences without additional linkage to separate bodies of evidence (e.g. on treatment efficacy for the target condition), or through logical inference [23]. Adding to the interpretative challenges, published DTA SRs often provide a scattershot of DTA measures, without guidance on which are more informative for those making test implementation decisions. This situation is especially problematic when one considers, as noted by Schiff 2012, it is possible “the average clinician could care less about…a new study to increase their positive predictive value” [27]. While more patient-centered research strategies have been described through controlled trials assessing outcomes of test-and-treat interventions, and through hierarchical assessments of test efficacy, such studies are less often found in the available evidence base.
Some of these limitations of DTA SRs may partially relate to infrequent use of an analytic framework, which provides scope and context for DTA measures, and is a recommended standard for SRs in general. Interpretive challenges may also arise when primary DTA studies are poorly reported or demonstrate risk of bias. Reporting standards for DTA studies are found in the Standards for Reporting of Diagnostic Accuracy (STARD), and risk of bias is identifiable through the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) risk of bias tool.
In addition to these challenges, some SR methods [e.g. the Centers for Disease Control and Prevention, Division of Laboratory Systems (CDC DLS) Laboratory Medicine Best Practices (LMBP) method] require determination of qualitative effect size ratings (e.g. “substantial”, “moderate”, “minimal”) as a partial determinant of the strength of a body-of-evidence. Table 1 details the CDC DLS LMBP method’s criteria for rating the strength of a body-of-evidence, in general taking into account the number of studies (within an intervention group) with particular effect size ratings and study quality ratings.
To address these DTA SR challenges, a clinically meaningful approach was needed in order to derive a single qualitative effect size rating for each DTA study and for a body of evidence as a whole. This paper describes the approach developed, which is based on:
Location of a diagnostic accuracy study within a four-quadrant likelihood ratio scatter matrix and
Matrix quadrant demarcation derived from established likelihood ratio thresholds signifying high clinical validity.
Materials and methods
Likelihood ratios are depicted in Table 2, and multiple resources are available to further aid in understanding and interpretation. In general, clinical interpretation of likelihood ratios includes use of probability thresholds, Fagan’s nomogram, and Bayesian reasoning.
Figure 3 further characterizes likelihood ratios by illustrating their relationship to other DTA measures, and their relationship to ruling-in or ruling-out a target condition.
Use of likelihood ratios to determine effect size rating
The American Society for Microbiology (ASM) recently completed a SR using the CDC DLS’s LMBP method to derive practice recommendations for Clostridium difficile testing practices (pending publication at this time). The evidence base for this SR consisted of DTA studies, and the SR was conducted in collaboration with the CDC DLS.
DTA studies present analysis challenges not encountered when assessing other types of evidence such as randomized controlled trials and before-and-after studies, and the CDC DLS LMBP SR method was not developed to optimally address some of these challenges. An important challenge is interpretation of the tradeoff between diagnostic sensitivity and specificity, particularly in the context of the LMBP method of evidence grading. Given that DTA studies report two related effects (diagnostic sensitivity and specificity), the review team determined that an approach was needed to capture (1) the trade-off between these two measures of effect, and (2) the clinical importance of this tradeoff. Lastly, an approach for deriving a single qualitative effect size rating from these measures was needed, expressible as “substantial”, “moderate”, or “minimal” (see Table 1 evidence rating criteria).
Approach step 1
The solution (developed by authors MLR and JSP) was based on two diagnostic accuracy effect measures: the positive likelihood ratio (+LR) and the negative likelihood ratio (−LR). Further, the solution adopts cutoff points described in the literature as providing strong evidence of a test’s ability to rule-in or rule-out a disease, and extends them into the following +LR and −LR effect pairings:
“Substantial” effect rating, if: +LR>10 and –LR<0.1
“Moderate” effect rating, if: (+LR>10 and –LR>0.1) or (+LR<10 and –LR<0.1)
“Minimal” effect rating, if: +LR<10 and –LR>0.1
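These three pairings amount to a simple rule-based classification. A minimal sketch (the function name and example values are illustrative, not part of the published method) of how the rule might be encoded:

```python
def effect_rating(pos_lr, neg_lr):
    """Map a (+LR, -LR) pairing to a qualitative effect size rating using
    the +LR > 10 and -LR < 0.1 high-clinical-validity cutoffs."""
    rule_in = pos_lr > 10     # strong evidence for ruling in the target condition
    rule_out = neg_lr < 0.1   # strong evidence for ruling out the target condition
    if rule_in and rule_out:
        return "Substantial"
    if rule_in or rule_out:
        return "Moderate"
    return "Minimal"

print(effect_rating(18.0, 0.05))   # both cutoffs met
print(effect_rating(18.0, 0.25))   # rules in only
print(effect_rating(4.0, 0.3))     # neither cutoff met
```

Encoding the rule this way makes the binary, transparent character of the judgment explicit: each study's pairing lands in exactly one of the three categories.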
It is necessary to express some caveats for these likelihood ratio cutoffs. First, these cutoffs, and the post-test probabilities of disease derived by using them, are not of themselves diagnostic. Accurate diagnosis depends on integration of information arising from diagnostic processes, including history, physical findings, and results of other testing, and it depends on multi-professional efforts to overcome diagnosis “pitfalls and challenges”. Second, there is an arbitrary nature to setting cutoffs/thresholds in support of qualitative effect size judgments. Cutoffs in support of effect size interpretation, therefore, are not ironclad rules of thumb – effect size interpretation should occur in context of the practical, clinical importance of whatever is being researched. Further, while these cutoffs provide strong evidence of a test’s ability to rule-in (+LR>10) or rule-out (–LR<0.1) a target condition, in practice this ability is dependent on a patient’s pre-test probability of disease in order for a “large and…conclusive [change] from pretest to posttest probability” to be observed, as is readily demonstrated by Fagan’s nomogram [17]. Finally, as mentioned previously, DTA values (including likelihood ratios) are not a fixed attribute of a test, but may vary according to population, setting/clinical context, or positivity threshold.
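The dependence on pre-test probability noted above is the arithmetic that Fagan’s nomogram performs graphically: convert pre-test probability to odds, multiply by the likelihood ratio, and convert back. A small sketch (the probabilities are chosen purely for illustration) shows how the same +LR of 10 yields very different post-test probabilities:

```python
def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio via Bayes' rule:
    probability -> odds, multiply by the LR, then odds -> probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# The same +LR = 10 applied at two different pre-test probabilities
print(round(post_test_probability(0.02, 10), 2))   # low pre-test probability
print(round(post_test_probability(0.30, 10), 2))   # higher pre-test probability
```

At a 2% pre-test probability the post-test probability remains modest, whereas at 30% the same test result approaches diagnostic certainty, which is why a "conclusive" change from pre-test to post-test probability cannot be read off the likelihood ratio alone.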
Nevertheless, the approach is rooted in established cutoffs representing thresholds for “high” clinical validity in service of (1) straightforward, binary handling of data, and (2) meaningful handling of FP/FN tradeoffs often observed in DTA measures. Given broad acceptance in the literature of these likelihood ratio cutoffs for “high” test information value, a defensible approach was established to meet a specific challenge: derive qualitative effect size measures for a DTA evidence base in a way that is amenable to the CDC DLS LMBP SR method of evidence rating. In general, approaches for simplifying information when making judgments have demonstrated advantages, including accuracy, transparency and accessibility, when the approach is rule-based and framed to a specific context. In sum, this approach allowed for meaningful derivation of effect size ratings for each DTA study, using a DTA measure that is multi-use in nature (applicable to “test performance, clinical utility, and decision making”), and which may overcome interpretability shortcomings associated with other DTA measures. Finally, effect ratings derived from +LR/−LR pairings advance the notion that “test results can be valuable both when positive and negative” [61] by preserving the discrimination potential of FPs and FNs.
A last note: while the mathematical ratio of +LR to −LR is commonly referred to as the diagnostic odds ratio (DOR), basing effect ratings on the DOR was determined by the review team to be an unacceptable approach. Values for the DOR are repetitive across various pairings of +LR and −LR, as illustrated in Deeks 2001, obscuring FP and FN tradeoffs and further challenging defensible effect rating judgments. For example, a DOR of 500 could indicate either a “substantial” or a “moderate” effect if linked to this approach. It is for this reason that pairings of +LR to −LR are assessed, rather than their mathematical ratio expressed as the DOR.
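The DOR’s ambiguity is easy to demonstrate numerically. In the sketch below (the specific pairings are hypothetical), two +LR/−LR pairings share a DOR of 500 yet would receive different ratings under the cutoffs above:

```python
def dor(pos_lr, neg_lr):
    """Diagnostic odds ratio: the ratio of the positive to the negative LR."""
    return pos_lr / neg_lr

# Hypothetical pairings: identical DOR, different clinical meaning under the
# +LR > 10 and -LR < 0.1 cutoffs
a = (20.0, 0.04)    # +LR > 10 and -LR < 0.1 -> would rate "Substantial"
b = (100.0, 0.2)    # +LR > 10 only          -> would rate "Moderate"
print(round(dor(*a)), round(dor(*b)))
```

Collapsing the pairing into a single ratio discards exactly the FP/FN tradeoff information the rating approach is designed to preserve.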
Approach step 2
The second step in deriving a single qualitative effect size rating for each study was integrating these cutoffs into a four-quadrant likelihood ratio scatterplot of +LR and –LR pairings, as is further described in the next section.
Generalized approaches for determining effect size ratings
Before applying this approach, a review team in collaboration with an expert panel should identify the relative clinical importance of FPs and FNs, contextualizing the test to the relevant population and clinical setting.
The general approach of Figure 4 may be taken when the expert panel determines the clinical importance of FPs and FNs is approximately equal, or that the test (in its intended role) should have the ability to both accurately rule-in and rule-out a target condition. From this perspective, use of point estimates [vs. use of confidence interval (CI) limits] is illustrated when judging effect size rating. However, this approach is flexible if a review team determines it is more appropriate to upgrade or downgrade an effect size rating based on whether the CI for a point estimate overlaps quadrants. For example, the effect size rating could be based on the lower end of a CI if a review team determines that aspect of an estimate is more important to communicate through effect ratings.
While the approach in Figure 4 is based on an assumption of equal weight for the clinical importance of FPs and FNs, an expert panel may determine the clinical importance of one outweighs the other. There may be scenarios, then, where what might be considered a “Moderate” effect could either be upgraded to “Substantial” or downgraded to “Minimal.”
For example, when the effects of the disease are serious, but the disease is treatable and the treatment neither causes patient harm nor incurs high costs, a paired effect in the upper right quadrant of Figure 4 might be considered “Substantial” rather than “Moderate”.
Readers may also refer to Hsu et al. 2011 for additional scenarios weighing benefits of correct classification (i.e. TPs and TNs) against the harms of incorrect classification (i.e. FPs and FNs), as may further inform tailored use of these likelihood ratio scatter matrices. Lastly, readers may also consider the literature on “misdiagnosis” (i.e. wrong diagnosis), “missed diagnosis”, or “delayed diagnosis”.
Therefore, as an alternative to the approach illustrated in Figure 4, one of the following perspectives may be emphasized:
Rule-in a target condition (or when the clinical importance of FP results outweighs that of FN results) or
Rule-out a target condition (or when the clinical importance of FN results outweighs that of FP results).
Figure 5 depicts this alternative perspective when using the +LR/−LR scatter matrix to derive effect size ratings. In this case, the figure also depicts how interpretation of effect size may be affected by whether a point’s CIs cross quadrants, with “moderate” effects occurring when the CI crosses the horizontal line (left-hand Figure) or the vertical line (right-hand Figure).
Likelihood ratio scatter matrix in the ASM-CDC DLS SR
There are two considerations when rating effect sizes for DTA statistics: (1) identifying an overall index of effect that captures the tradeoff between sensitivity and specificity; and (2) weighing their relative clinical importance. To create an overall index of effect, the likelihood ratio scatter matrix was created using the midas command in Stata 15 (Stata Corp., College Station, TX, USA). These scatterplot matrices can be created in any standard statistical package, though the midas procedure in Stata provides the benefit of computing these via a subroutine of the more general diagnostic meta-analysis procedure.
Figure 6 illustrates the likelihood ratio scatter matrix used as a practical tool to rate effect sizes for the SR on C. difficile testing approaches. When paired likelihood ratios were within areas indicating high clinical validity (+LR>10 and –LR<0.1), the review team in collaboration with the project’s expert panel described this as a “Substantial” effect, especially if the CIs of the estimate (as represented by the crosshairs on the summary diamond) did not cross into other quadrants. When only one of the likelihood ratios was within the areas used to indicate high clinical validity, it was considered a “Moderate” effect.
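Outside Stata, the quadrant logic underlying such a scatter matrix can be reproduced in any environment. A sketch (the study points below are hypothetical, not results from the C. difficile SR) tallying per-study point estimates by quadrant for a body of evidence:

```python
from collections import Counter

def rate(pos_lr, neg_lr):
    """Assign one study's (+LR, -LR) point estimate to a scatter-matrix quadrant."""
    if pos_lr > 10 and neg_lr < 0.1:
        return "Substantial"   # high clinical validity for both rule-in and rule-out
    if pos_lr > 10 or neg_lr < 0.1:
        return "Moderate"      # high clinical validity in one direction only
    return "Minimal"

# Hypothetical point estimates for studies of one testing practice
studies = [(25.0, 0.03), (14.0, 0.08), (12.0, 0.2), (6.0, 0.4)]
tally = Counter(rate(p, n) for p, n in studies)
print(dict(tally))
```

Per-study ratings counted this way, combined with study quality ratings, are the inputs the CDC DLS LMBP body-of-evidence criteria (Table 1) operate on.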
Table 3 illustrates how qualitative effect size ratings subsequently informed the final level of qualitative synthesis in the ASM-CDC DLS SR: rating of overall strength of body-of-evidence using CDC DLS LMBP criteria (Table 1). Strength of body-of-evidence ratings were then used to inform practice recommendations, with three C. difficile test practices achieving a “recommended” categorization as shown in Table 3. This table provides counts for only the highest rated pairings of quality-to-effect for each test practice category. A list of quality ratings and effect size ratings for all studies in the SR is available from the authors upon request, as are details on specific testing practices assessed.
Additional implication of this effect rating approach
Additional methods of grading evidence
Other methods for grading the strength of evidence, such as the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method, may benefit from this effect size rating approach. As when applying the CDC DLS LMBP method to a DTA evidence base, challenges in applying GRADE have been described. Some of these challenges in GRADE have been (in part) addressed by considering DTA a surrogate (intermediate) patient outcome, to the extent that rates of TP, FP, TN, and FN can be inferably linked to patient management or patient health consequences [23]. However, expressing “magnitude of effect” – one of the GRADE criteria for assessing the strength of evidence – for a DTA evidence base appears to remain a challenge.
In GRADE, “magnitude of effect” is a criterion that can upgrade the strength of evidence. While Gopalakrishna et al. 2014 described important challenges in applying three of the GRADE strength of evidence criteria (inconsistency, imprecision, and publication bias), no clear solution is provided for assessing “magnitude of effect” for DTA. Yet, several GRADE papers, including Gopalakrishna et al. 2014, recommend that (1) the differential patient consequences of TPs, FPs, TNs, and FNs be considered when making recommendations from a DTA evidence base, and that (2) these differential consequences should inform emphasis of particular DTA measures. On this last point, however, little detailed guidance is provided.
We suggest these a priori considerations can be expressed through an analytic framework for DTA SRs, which should depict inferable (in the absence of direct evidence) clinical outcome types. Clinical outcomes that can be linked to laboratory testing have been described in the literature. Further, by appropriating the effect rating approach described here, patient-important consequences of TPs, FPs, TNs, and FNs can be preserved through pairings of +LR/−LR in a way that (1) is readily visualized for “magnitude of effect” assessment, and that (2) promotes transparent, defensible, and reproducible judgments on effect rating toward grading the strength of a body of evidence. In this way, DTA SR “judgments on which would be the more critical accuracy measures to focus” [69] could be addressed in a straightforward, intuitive way that is comparable across DTA SRs.
In sum, this +LR/−LR effect rating approach provides a defensible means of deriving effect ratings, which can then inform potential upgrading of the strength of evidence when using the GRADE method.
For diagnostic quality and safety measures
Diagnostic error has been defined as the “failure to (a) establish an accurate and timely explanation of the patient’s health problem(s), or (b) communicate that explanation to the patient”. Identifying meaningful measures of diagnostic quality and safety, however, is a noted challenge. While “diagnostic accuracy” in this context signifies more than simply DTA (or test clinical validity) [4], +LR/−LR pairings represent an aspect of diagnostic information quality and can aid test interpretation, although (of themselves) they are not necessarily a suitable direct measure of diagnostic quality and safety.
In this way, +LR/−LR scatterplot matrix pairings may inform the “Diagnostic Process” domain of quality and safety measures described in the 2017 National Quality Forum (NQF) report Improving Diagnostic Quality and Safety. DTA can be equated with a component of diagnostic accuracy identified in the NQF report as “measurement of initial diagnostic accuracy” or “accuracy of initial diagnosis”. In this context, +LR/−LR scatterplot matrix pairings signal a test’s (or test combination’s) ability to correctly or incorrectly classify patients in relation to a diagnosis, in a way that is straightforward, visual, and clinically meaningful. Further, +LR/−LR pairings may provide an additional means to express whether “diagnostic tests have adequate analytical and clinical validity [as is] critical to preventing diagnostic errors”.
While benefiting transparent effect rating judgments, any approach that simplifies findings risks information loss. For example, this approach does not contain information on resource utilization (e.g. costs), patient preferences, or the indirectness of evidence to patient outcomes. Readers are further cautioned that a diagnostic test is not an isolated element of the patient diagnostic process; a test nevertheless provides information whose quality can be assessed in relation to test utility and patient-related outcomes. Additionally, the strength of this approach may be diminished if DTA for an index test is established in relation to an imperfect reference standard, although this concern was not formally assessed.
Finally, use of probabilistic tools (e.g. likelihood ratios, Bayesian reasoning) and “statistical numeracy” has been shown to challenge health care professionals when interpreting diagnostic information. Yet, this approach to interpreting DTA measures may be relevant to interventions to improve clinical insights from diagnostic reasoning [84], especially in cases where laboratories implement recommendations to provide likelihood ratios in results reporting.
Findings of DTA SRs should be interpreted in relation to intended clinical use in support of diagnostic quality and safety. The approach described in this paper facilitates meaningful interpretation of results, as well as determination of qualitative effect size ratings. In this way, +LR/−LR scatterplot matrix pairings are answerable to the call to “move beyond summary measures and ask how a new diagnostic test reclassifies patients” by facilitating ratings of effect linked to clinical practice.
Plebani M. Quality in laboratory medicine: an unfinished journey. J Lab Precis Med 2017;2:1–4.
Hallworth MJ, Epner PL, Ebert C, Fantz CR, Faye SA, Higgins TN, et al. Current evidence and future perspectives on the effective practice of patient-centered laboratory medicine. Clin Chem 2015;61:589–99.
Hayen A, Macaskill P, Irwig L, Bossuyt P. Appropriate statistical methods are required to assess diagnostic tests for replacement, add-on, and triage. J Clin Epidemiol 2010;63:883–91.
Aakre KM, Langlois MR, Watine J, Barth JH, Baum H, Collinson P, et al. Critical review of laboratory investigations in clinical practice guidelines: proposals for the description of investigation. Clin Chem Lab Med 2013;51:1217–26.
Christenson RH, Committee on Evidence Based Laboratory Medicine of the International Federation for Clinical Chemistry and Laboratory Medicine. Evidence-based laboratory medicine – a guide for critical evaluation of in vitro laboratory testing. Ann Clin Biochem 2007;44:111–30.
Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Chapter 10: Analyzing and Presenting Results. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. The Cochrane Collaboration, 2010. Available at: https://methods.cochrane.org/sdt/handbook-dta-reviews.
Jones CM, Ashrafian H, Skapinakis P, Arora S, Darzi A, Dimopoulos K, et al. Diagnostic accuracy meta-analysis: a review of the basic principles of interpretation and application. Int J Cardiol 2010;140:138–44.
Van den Bruel A, Cleemput I, Aertgeerts B, Ramaekers D, Buntinx F. The evaluation of diagnostic tests: evidence on technical and diagnostic accuracy, impact on patient outcome and cost-effectiveness is needed. J Clin Epidemiol 2007;60:1116–22.
Staub LP, Lord SJ, Simes RJ, Dyer S, Houssami N, Chen RY, et al. Using patient management as a surrogate for patient health outcomes in diagnostic test evaluation. BMC Med Res Methodol 2012;12:12.
Brozek JL, Akl EA, Jaeschke R, Lang DM, Bossuyt P, Glasziou P, et al. Grading quality of evidence and strength of recommendations in clinical practice guidelines: part 2 of 3. The GRADE approach to grading quality of evidence about diagnostic tests and strategies. Allergy 2009;64:1109–16.
Hsu J, Brozek JL, Terracciano L, Kreis J, Compalati E, Stein AT, et al. Application of GRADE: making evidence-based recommendations about diagnostic tests in clinical practice guidelines. Implement Sci 2011;6:62.
Singh S, Chang SM, Matchar DB, Bass EB. Grading a body of evidence on diagnostic tests. In: Chang SM, Matchar DB, Smetana GW, Umscheid CA, editors. Methods guide for medical test reviews. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), 2012.
Lord SJ, Irwig L, Simes RJ. When is measuring sensitivity and specificity sufficient to evaluate a diagnostic test, and when do we need randomized trials? Ann Intern Med 2006;144:850–5.
Christenson RH, Snyder SR, Shaw CS, Derzon JH, Black RS, Mass D, et al. Laboratory medicine best practices: systematic evidence review and evaluation methods for quality improvement. Clin Chem 2011;57:816–25.
Woolf S, Schunemann HJ, Eccles MP, Grimshaw JM, Shekelle P. Developing clinical practice guidelines: types of evidence and outcomes; values and economics, synthesis, grading, and presentation and deriving recommendations. Implement Sci 2012;7:61.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–36.
Cohen JF, Korevaar DA, Altman DG, Bruns DE, Gatsonis CA, Hooft L, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 2016;6:e012799.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
Korevaar DA, van Enst WA, Spijker R, Bossuyt PM, Hooft L. Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Evid Based Med 2014;19:47–54.
Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA 1994;271:703–7.
Moreira J, Bisoffi Z, Narvaez A, Van den Ende J. Bayesian clinical reasoning: does intuitive estimation of likelihood ratios on an ordinal scale outperform estimation of sensitivities and specificities? J Eval Clin Pract 2008;14:934–40.
Bossuyt P, Davenport C, Deeks J, Hyde C, Leeflang M, Scholten R. Chapter 11: Interpreting Results and Drawing Conclusions. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. The Cochrane Collaboration, 2013. Available at: https://methods.cochrane.org/sdt/handbook-dta-reviews.
Price CP, Christenson RH, American Association for Clinical Chemistry. Evidence-based laboratory medicine: principles, practice, and outcomes, 2nd ed. Washington, DC: AACC Press, 2007:17–8.
National Quality Forum. Improving Diagnostic Quality and Safety: Final Report. Washington, DC: NQF, 2017. Available at: http://www.qualityforum.org/ProjectDescription.aspx?projectID=83357.
Cooper HM, Hedges LV, Valentine JC. The handbook of research synthesis and meta-analysis, 2nd ed. New York: Russell Sage Foundation, 2009:632.
Schiff GD, Kim S, Abrams R, Cosby K, Lambert B, Elstein AS, et al. Diagnosing diagnosis errors: lessons from a multi-institutional collaborative project. In: Henriksen K, Battles JB, Marks ES, Lewin DI, editors. Advances in patient safety: from research to implementation (Volume 2: Concepts and Methodology). Rockville, MD: Agency for Healthcare Research and Quality (AHRQ), 2005.
Graber ML, Trowbridge R, Myers JS, Umscheid CA, Strull W, Kanter MH. The next organizational challenge: finding and addressing diagnostic error. Jt Comm J Qual Patient Saf 2014;40:102–10.
Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
Gopalakrishna G, Mustafa RA, Davenport C, Scholten RJ, Hyde C, Brozek J, et al. Applying grading of recommendations assessment, development and evaluation (GRADE) to diagnostic tests was challenging but doable. J Clin Epidemiol 2014;67:760–8.
Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008;336:1106–10.
Ferrante di Ruffano L, Hyde CJ, McCaffery KJ, Bossuyt PM, Deeks JJ. Assessing the value of diagnostic tests: a framework for designing and evaluating trials. BMJ 2012;344:e686.
Andrews JC, Schunemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, et al. GRADE guidelines: 15. Going from evidence to recommendation – determinants of a recommendation’s direction and strength. J Clin Epidemiol 2013;66:726–35.
Brozek JL, Akl EA, Compalati E, Kreis J, Terracciano L, Fiocchi A, et al. Grading quality of evidence and strength of recommendations in clinical practice guidelines part 3 of 3. The GRADE approach to developing recommendations. Allergy 2011;66:588–95.
Balogh E, Miller BT, Ball J; Institute of Medicine (U.S.) Committee on Diagnostic Error in Health Care. Improving diagnosis in health care. Washington, DC: The National Academies Press, 2015.
Reitsma JB, Rutjes AW, Khan KS, Coomarasamy A, Bossuyt PM. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009;62:797–806.
Whiting PF, Davenport C, Jameson C, Burke M, Sterne JA, Hyde C, et al. How well do health professionals interpret diagnostic information? A systematic review. BMJ Open 2015;5:e008155.
Graber ML, Kissam S, Payne VL, Meyer AN, Sorensen A, Lenfestey N, et al. Cognitive interventions to reduce diagnostic error: a narrative review. BMJ Qual Saf 2012;21:535–57.
About the article
Published Online: 2018-09-22
Published in Print: 2018-11-27
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Disclaimer: The findings and views expressed in this paper do not necessarily represent those of the CDC, nor of the CDC DLS LMBP initiative.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.