Diagnostic error is increasingly recognized as a leading cause of patient morbidity and mortality. Understanding and teaching diagnostic reasoning is one strategy to mitigate this problem, and multiple approaches have been suggested. Bayesian reasoning, defined here as the process of using base rate (pre-test) probabilities and new clinical information (history, exam findings, or tests) to calculate a revised (post-test) probability, is one approach to reducing diagnostic error by improving deliberate, analytical (System 2) reasoning.
Physicians often have a poor understanding of concepts germane to Bayesian reasoning, such as likelihood ratios, which can lead to inaccurate post-test probability estimation. Using natural frequencies (rather than probabilities) and visual aids has shown some promise in improving Bayesian reasoning. Still, most physicians do not use formal Bayesian methods because of a perceived lack of practicality and discomfort with performing the calculations. This suggests that physicians often lack the understanding and the tools (e.g. technology for real-time calculation) to integrate Bayesian reasoning into clinical practice and bedside teaching. Our aim was to create a workshop that equips clinician educators with the tools to effectively teach and use Bayesian reasoning in clinical settings.
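As an illustration of the natural-frequency framing mentioned above, the following minimal Python sketch re-expresses a test's performance as whole-patient counts in an imagined cohort. The function name, the cohort size of 1,000, and all input values are hypothetical and chosen only for illustration; they are not taken from the workshop or from any real test.

```python
def natural_frequencies(prevalence, sensitivity, specificity, cohort=1000):
    """Express test performance as whole-patient counts in an imagined cohort."""
    diseased = round(cohort * prevalence)
    not_diseased = cohort - diseased
    true_pos = round(diseased * sensitivity)
    true_neg = round(not_diseased * specificity)
    return {
        "diseased": diseased,
        "not_diseased": not_diseased,
        "true_pos": true_pos,
        "false_neg": diseased - true_pos,
        "true_neg": true_neg,
        "false_pos": not_diseased - true_neg,
    }


# Hypothetical inputs for illustration only (not from the workshop cases).
counts = natural_frequencies(prevalence=0.10, sensitivity=0.80, specificity=0.90)

# Among everyone who tests positive, how many actually have the disease?
all_positive = counts["true_pos"] + counts["false_pos"]
ppv = counts["true_pos"] / all_positive

print(f"Of {all_positive} patients who test positive, {counts['true_pos']} have the disease.")
print(f"Probability of disease after a positive test: {ppv:.0%}")
```

With these made-up inputs, 80 of the 170 patients who test positive have the disease (about 47%). Counting patients in this way reaches the same answer as the probability calculation while avoiding conditional-probability notation.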
We developed a workshop that incorporated smartphone apps, visual models, and case-based learning. Smartphone apps offer multiple advantages for teaching Bayesian reasoning. They quickly calculate post-test probabilities once pre-test probabilities and test characteristics are entered, which addresses a principal source of discomfort: performing the calculations. Variables can be easily manipulated in the apps to fit different clinical situations or to alter scenarios, using “what ifs” to compare and contrast how post-test probabilities change with different clinical information. Many apps contain easily accessible repositories of clinical prediction rules (e.g. Wells’ criteria) and describe test characteristics. Several apps also provide references to primary literature, allowing physicians to review the source material and determine whether the reported test characteristics apply to their patient population. Finally, smartphones are nearly ubiquitous among medical trainees, making apps readily available, portable, and relatively affordable teaching tools. Visual aids have been suggested as a method for understanding Bayesian reasoning, so we created a visual model that helps demonstrate the concepts of probabilities and thresholds in clinical contexts (Figure 1). We employed case-based learning followed by reflective discussion to encourage practice with the concepts and apps, and to facilitate experiential learning. The authors iteratively developed clinical cases that involve uncertainty in diagnosis to promote discussion amongst learners.
Our 90-minute workshop begins with a case presentation of a patient with a febrile respiratory illness. Learners anonymously enter how probable they think it is that the patient has influenza into Poll Everywhere (Poll Everywhere, San Francisco, CA, USA), an online audience response system. Next, they are given a negative rapid influenza test result with its sensitivity and specificity values, and are asked to enter a revised probability into Poll Everywhere. The broad range of submitted answers for both pre- and post-test probabilities is then reviewed, and participants are asked to reflect on how they arrived at those numbers. The variation in answers leads to robust conversations about how clinicians set probabilities and interpret test results. We then review test characteristics such as sensitivity, specificity, and likelihood ratios, emphasizing the understanding of underlying concepts. Learners are introduced to the Fagan nomogram, a tool that allows a user to draw a line from a pre-test probability (determined by the user) through a likelihood ratio value (determined by the “test”) to arrive at a post-test probability. The visual representation of these concepts (Figure 1) is displayed and manipulated through various scenarios to illustrate how changes in different variables affect each scenario, and how clinicians with different thresholds or probabilities can come to disagree, which can create confusion for learners.
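The arithmetic that the Fagan nomogram performs graphically can be sketched in a few lines of Python: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. This is a minimal sketch, not the method of any particular app. The function names are hypothetical, and the sensitivity, specificity, and pre-test probability below are invented for illustration rather than taken from the workshop's influenza case.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test; assumes 0 < sensitivity, specificity < 1."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability by way of odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical test characteristics and pre-test probability (illustration only).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.60, specificity=0.98)
after_negative = post_test_probability(0.50, lr_neg)
after_positive = post_test_probability(0.50, lr_pos)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a negative result: {after_negative:.0%}")
print(f"Post-test probability after a positive result: {after_positive:.0%}")
```

With these invented numbers, a negative result lowers a 50% pre-test probability only to about 29%, whereas a positive result raises it to about 97%. The example shows why a negative result from a test with modest sensitivity may not rule out disease.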
We then introduce multiple smartphone apps that allow for integration of pre-test probabilities, likelihood ratios, and clinical prediction rules at the bedside. This list of apps is not an exhaustive list for clinical practice, but rather a list of apps that we have found to be useful in teaching. The smartphone apps cost between $0 and $9.99; a few require subscription fees (Table 1). Participants work through the cases using the introduced smartphone apps, sharing with the group how they envision utilizing the concepts and apps for clinical practice and teaching learners.
For example, one case encourages participants to imagine themselves as supervisors on a ward team admitting a young adult patient with severe iron deficiency anemia and a 3-year history of intermittent, severe abdominal pain and diarrhea. After getting more case details, workshop participants discuss their pre-test probabilities of inflammatory bowel disease (IBD) and commit to a specific probability between 0% and 100%. They are given the sensitivity and specificity for fecal calprotectin in IBD, and told that one of their imagined learners wants to send the test while another wants to refer the patient for endoscopy. Attendees role-play navigating this clinical decision with the use of a smartphone app and accompanying visual aids (Figure 1) to turn this disagreement between learners into a Bayesian learning opportunity. Using cases allows learners to practice applying the concepts and smartphone apps in a clinical scenario. Other cases integrate other concepts such as clinical prediction rules and risk/benefit considerations when ordering tests.
The Alliance for Academic Internal Medicine (AAIM) gave permission to share anonymous evaluation data. The University of Cincinnati Institutional Review Board deemed this analysis of aggregate evaluation data without individual identifiers as not human subject research.
Our workshop has been delivered three times at the national AAIM chief resident meeting with an estimated 200 total learners (actual attendance was not recorded). Forty-nine evaluations were completed by attendees. The average weighted rating on a 5-point Likert scale over the 3 years for the prompt “Overall satisfaction with the session” was 4.32, with other workshops at the same conference averaging 3.96. Narrative comments from attendee free responses suggested that the visual model and smartphone apps would be potentially useful for teaching the concepts to their own learners:
“I feel much more confident in my understanding of EBM diagnostic reasoning, and I expect that I will be using much of what I gained from this lecture to teach residents in the coming academic year”.
“I had heard this information in the past and never really bought into it because it was poorly delivered and provided no way to actually use it. This solved both of those problems by having an engaging lecture and also immediately providing tools that allow for this to be clinically incorporated”.
Our experiences suggest teaching Bayesian reasoning with smartphone apps, visual models, and case-based learning is well received by chief resident participants. In addition to the aforementioned conference, we have used these tools to teach Bayesian reasoning at local student and resident conferences, faculty development sessions, interdisciplinary regional meetings, and two grand rounds. In conversations with learners across levels and disciplines, a common theme that arises is difficulty assigning specific numbers to pre-test probabilities due to a feeling of subjectivity. Our case-based discussions emphasize that pre-test probabilities are often subjective but are not arbitrary, as clinicians can use epidemiology, clinical experience, clinical prediction rules, and illness scripts to determine them. Our workshop provides opportunities for practice and feedback with clinical scenarios to develop comfort with setting probabilities.
Our description of this innovation has several limitations. First, the evaluation was based on a small sample size with few narrative comments. Second, the quantitative outcomes presented are low on Kirkpatrick’s hierarchy and do not assess higher-level learning outcomes such as knowledge, attitudes, or behaviors. Third, evaluation data from other venues where this workshop has been given were not included in the study. Fourth, due to the niche market that these apps occupy, they are often built and maintained by small groups or individual developers, and can be subject to delayed updates, delayed bug fixes, and operating system incompatibilities that limit their long-term viability.
A workshop using visual models, clinical cases, and smartphone apps was well received by chief residents as a way to learn and teach Bayesian reasoning. We anticipate that our approach can be adapted to others’ local education needs and can inform the development of Bayesian reasoning curricula. Knowledge and skills decay over time, so a single workshop is unlikely to be sufficient to teach Bayesian reasoning. Next steps include integrating Bayesian reasoning into our local residency and medical school curricula in a more longitudinal fashion by developing additional cases and giving learners more opportunities to practice with Bayesian concepts at regularly scheduled small-group learning sessions. Future outcome measures could include learner comfort with Bayesian reasoning and diagnostic accuracy with designed clinical cases, as measured by Vermeersch et al. Outcomes higher on Kirkpatrick’s hierarchy that could also be measured include clinician behaviors such as rate of app usage in true clinical care, test ordering rates, and resident teaching behaviors.
Graber ML, Kissam S, Payne VL, Meyer AN, Sorensen A, Lenfestey N, et al. Cognitive interventions to reduce diagnostic error: a narrative review. BMJ Qual Saf 2012;21:535–57.
Whiting PF, Davenport C, Jameson C, Burke M, Sterne JA, Hyde C, et al. How well do health professionals interpret diagnostic information? A systematic review. BMJ Open 2015;5:e008155.
Reid MC, Lane DA, Feinstein AR. Academic calculations versus clinical judgments: practicing physicians’ use of quantitative measures of test accuracy. Am J Med 1998;104:374–80.
Kolb D. Experiential education: experience as the source of learning and development. Englewood Cliffs, NJ: Prentice Hall, 1984.
Kirkpatrick DL. Evaluating training programs: the four levels. San Francisco, CA: Berrett-Koehler, 1994.
About the article
Published Online: 2019-02-28
Published in Print: 2019-06-26
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.