The dual process model of reasoning has gained popularity within medical education. This model distinguishes between rapid and slow cognitive processes, referred to as System 1 (rapid) and System 2 (slow). The model is associated with the heuristics and biases literature, where System 1 is often characterized as error prone. Although research provides evidence against this characterization, the interpretation remains popular. This proposition has sparked research into diagnostic aids to support the integration of both non-analytic (System 1) and analytic (System 2) processes. Accordingly, two classes of strategies have arisen with the explicit goal of engaging both systems to examine available data.
One strategy focuses on debiasing through the recruitment of metacognition to enhance cognitive processing (i.e. being more cautious or attentive) and to consider sources of bias. This approach implies that System 2 can act as a rational corrective agent against diagnostic errors. Attempts to evaluate the effectiveness of metacognitive debiasing strategies have been disappointing; although participants in these studies are able to learn the terminology and the related concepts well, this new knowledge does not improve their diagnostic accuracy.
The second set of strategies aids in mobilizing existing knowledge. These strategies link specific aspects of the clinical problem to prior medical knowledge using prompts in a checklist format. Experimentally, checklists have led to small improvements in diagnostic accuracy. The benefit seems to come from guiding the clinician through a detailed search of the patient case for clinical features that might support their provisional diagnosis or suggest alternatives. These cognitive checklists do not focus on actions or behaviors, as in a surgical context, but rather on search strategies and consideration of specific attributes. Key features of effective reflective protocols include an initial or provisional diagnosis followed by identification of important case features, alternative diagnoses, and evidence to confirm or contradict the provisional diagnosis.
To date, research has demonstrated not only that diagnostic accuracy may be improved through enforced (i.e. not self-guided) analysis of available data, but also that these effects are fragile and may depend on having the requisite knowledge to decide when revisions to a provisional diagnosis are necessary. One aspect of error identification and correction that remains unexplored is the mechanism that leads to the decision to change a diagnosis. What factors within the exploration of a patient case may influence whether a clinician decides to take another look or change their diagnosis?
Graber and colleagues synthesized the essential aspects of an analytic approach into four distinct elements: seeking alternative explanations, exploring the consequences of missing these alternative diagnoses, being open to tests that would help differentiate among the candidate diagnoses, and being open to uncertainty. We translated these elements of reflective practice into a simple diagnostic checklist (the ACT diagnostic checklist) that asks three questions of the clinician:
Are there Alternative diagnoses that have not been considered?
What are the Consequences of missing these diagnoses?
Is there contradictory evidence from Traits (features of the case) that do not support my provisional diagnosis?
We studied whether clinicians’ decisions to revise a diagnosis are related to their ability to identify case features using the ACT diagnostic checklist.
Materials and methods
Eight written medical cases were adapted from previous studies. These cases represented real-life scenarios taken from the experiences of emergency physicians in Hamilton health care centers. Based on previous studies, these cases were deemed appropriately challenging for residents in their first year of training. All case vignettes presented information about patient history, primary complaint, physical exam results, and test results (if relevant) in a standard sequence. A sample case that was used in the study is presented in Supplementary material, Appendix A.
Emergency medicine residents in their first and second year of training at the University of Toronto (UofT) were approached by the principal investigator (MK) and asked to volunteer for the study. Initial recruitment occurred through email linking volunteers to an online survey. However, due to lack of success, additional recruitment was conducted in person with a paper-based survey.
In accordance with the World Medical Association Declaration of Helsinki regarding ethical conduct of research involving human subjects, the study was approved by the Ethics Board at St. Michael’s Hospital (#15-228).
Participants were asked to read a patient case, select an initial diagnosis, use the ACT diagnostic checklist, and select a final diagnosis for each case; participants could diagnose up to eight cases. All participants were first oriented to the ACT diagnostic checklist and then reviewed each case in the same order. While completing each task, the participant was able to see the entire case vignette.
Task one: Participants reviewed the written case vignette and made a provisional diagnosis using five key words or less in an open field text box.
Task two: Participants then completed the ACT tool. This appeared to them as a series of three questions with free text boxes in which the participant could identify up to two alternative diagnoses, identify one consequence of missing the provisional and each alternative diagnosis (e.g. a consequence of missing a diagnosis of aortic dissection would be death), and identify any missing evidence or inconsistent information (traits) related to the provisional diagnosis.
Task three: Participants were asked to state a final diagnosis, which could be the initial (provisional) diagnosis, any of the alternative diagnoses, or a diagnosis that was not previously listed.
Data collection and scoring
Subject matter experts led by JS developed a scoring rubric, which was applied in four previous studies using written cases. Using this same rubric, the provisional and final diagnoses were scored as correct or incorrect (1 or 0) by SM and MK. All responses using the ACT diagnostic checklist were scored by MK (senior emergency medicine resident) based on clinical appropriateness (see Supplementary material, Appendix B).
Alternatives were scored 0 for no alternatives offered, and 1 for each plausible alternative.
Consequences were scored overall between 0 and 3. A score of 3 could be achieved by providing three reasonable consequences related to missing the provisional diagnosis and missing the two alternative diagnoses. Consequences and alternatives were scored independently.
Traits were scored 0 for no evidence provided and 2 for providing two plausible pieces of evidence that were missing or that contradicted the provisional diagnosis.
An additional score was assigned for the severity of each listed consequence ranging from 0 (not severe) to 3 (very severe). A score of 3 was given to any condition that is acutely life or limb threatening, requires emergent medical or surgical intervention, or results in a permanent disability. A score of 1 was given to any condition that could be managed non-urgently and would not require immediate attention by a physician. A score of 2 was assigned to anything that fell in between these two definitions.
To summarize, this study produced the following sets of responses for analysis: (a) accuracy scores for provisional and final diagnoses (out of 1), (b) accuracy scores for ACT responses: alternative diagnoses (out of 2), consequences (out of 3) and traits (out of 2), and (c) severity rating per consequence (out of 3).
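The results below report ACT scores as "percent correct". A minimal sketch of that conversion follows, assuming each raw section score is simply divided by its maximum; the dictionary layout, the function name `percent_correct`, and the example response are hypothetical and not taken from the study materials.

```python
# Maximum raw score per ACT section, as described in the scoring rubric.
ACT_MAX = {"alternatives": 2, "consequences": 3, "traits": 2}


def percent_correct(scores):
    """Convert raw ACT section scores to fractions of each section's maximum.

    Assumption: "percent correct" means raw score / maximum score.
    """
    fractions = {}
    for section, max_score in ACT_MAX.items():
        raw = scores[section]
        if not 0 <= raw <= max_score:
            raise ValueError(f"{section} score {raw} outside 0..{max_score}")
        fractions[section] = raw / max_score
    return fractions


# Hypothetical response: two plausible alternatives, three reasonable
# consequences, and no contradictory or missing evidence identified.
example_response = {"alternatives": 2, "consequences": 3, "traits": 0}
example_fractions = percent_correct(example_response)
print(example_fractions)
```

Averaging these fractions over all completed cases would give section-level figures of the kind reported in the results.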
To determine if the decision to revise a diagnosis was related to responses to the ACT protocol questions, the average ACT scores and the severity scores of the consequences were submitted to a binary logistic regression analysis with the binary outcome of a decision to revise or retain a provisional diagnosis. Additionally, we explored whether average ACT scores varied with the decision to revise or with the accuracy of the provisional diagnosis. We conducted two separate multivariate analyses of variance (MANOVA). In the first analysis, the accuracy of the provisional diagnosis was a between-subjects factor and the three ACT scores were dependent variables. In the second analysis, the decision to revise or retain was a between-subjects factor and the three ACT scores were dependent variables.
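To illustrate the kind of model described above, the following sketch fits a binary logistic regression by Newton-Raphson. It is deliberately simplified: it uses a single predictor rather than the six entered in the study, and the data are invented toy values, not study data. The names `fit_logistic`, `trait_score`, and `revised` are hypothetical.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns the intercept b0 and slope b1 (log-odds per unit of x).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (information matrix) entries.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data only: fraction of traits identified (0, 0.5, 1) for 30 invented
# cases, and whether the provisional diagnosis was revised (1) or retained (0).
trait_score = [0.0] * 10 + [0.5] * 10 + [1.0] * 10
revised = [1] + [0] * 9 + [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

b0, b1 = fit_logistic(trait_score, revised)
odds_ratio = math.exp(b1)  # odds ratio per one-unit increase in trait score
print(f"b0={b0:.2f}, b1={b1:.2f}, OR={odds_ratio:.2f}")
```

In this toy example a positive slope means that higher trait scores go with higher odds of revising; a statistics package would normally be used in practice, and would also report standard errors and model fit statistics.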
Recruitment was stopped after 17 participants; not all participants completed all eight cases. Eight participants were recruited online and nine were recruited in person. Only scores for completed cases were included in the analysis. A total of 105 cases were completed (Table 1).
Relationship between ACT protocol and the decision to revise
The binary logistic regression analysis revealed that three scores derived from the ACT protocol were associated with the decision to revise: severity rating of the consequence for missing the provisional diagnosis, the overall percent correct for identifying consequences, and the overall percent correct for identifying missing evidence or traits. The other three factors were not significant predictors. The overall model was significant (χ2=23.5, df=6, p<0.001). The Hosmer-Lemeshow test revealed that the model fit the data (χ2=11, p=0.2).
The odds ratio (OR) for severity was 0.27 [95% confidence interval (CI)=0.09–0.83], with a β of −1.3 suggesting an inverse relationship between perceived severity and decision to revise (Wald=5.17, p<0.05). The decision to revise the provisional diagnosis was associated with a lower perceived severity of consequence for missing the provisional [1.85/3, standard deviation (SD)=0.8] than the decision to retain the provisional (2.5/3, SD=0.7) [F(1, 102)=4.78, p<0.01].
The average score for consequences was also a significant factor (OR=0.04, 95% CI=0.0–0.56), with a β=−3.15 suggesting that the overall ability to identify correct consequences was more strongly related to the decision to retain a diagnosis (Wald=5.76, p<0.05).
Finally, the average score for traits was a significant factor (OR=14.51, 95% CI=1.95–107.98), with a β=2.68 suggesting that the identification of correct missing or inconsistent evidence in the case was associated with decisions to revise (Wald=6.82, p<0.01).
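The odds ratios and confidence intervals above can be related to the reported β and Wald values through two standard identities: OR = exp(β), and Wald = (β/SE)², so that SE = |β|/√Wald. The sketch below applies them to the reported figures, assuming a normal-approximation 95% interval (z = 1.96); the function name `odds_ratio_ci` is hypothetical, and because the inputs are rounded, the output only approximately reproduces the published intervals.

```python
import math


def odds_ratio_ci(beta, wald, z=1.96):
    """Return (OR, lower, upper) from a coefficient and its Wald chi-square.

    Uses OR = exp(beta) and SE = |beta| / sqrt(Wald), with a
    normal-approximation interval exp(beta +/- z * SE).
    """
    se = abs(beta) / math.sqrt(wald)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Coefficients and Wald statistics as reported in the text (rounded values).
reported = {
    "severity": (-1.3, 5.17),
    "consequences": (-3.15, 5.76),
    "traits": (2.68, 6.82),
}

results = {name: odds_ratio_ci(beta, wald) for name, (beta, wald) in reported.items()}
for name, (odds, lower, upper) in results.items():
    print(f"{name}: OR={odds:.2f}, 95% CI={lower:.2f} to {upper:.2f}")
```

An OR below 1 (severity, consequences) corresponds to lower odds of revising as the score increases, and an OR above 1 (traits) to higher odds.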
Analysis of individual ACT tool sections
Two alternative diagnoses were provided in 90% of cases, but they were not always appropriate. Only 47 diagnoses were accompanied by two correct alternatives. This translates into an overall percent correct score for alternatives of 66%, SD=34%. There was no relationship detected between the score for alternatives and the decision to revise (p>0.8). The low score for alternatives indicated a limited ability to consider multiple unrelated diagnoses or consequences of missing them. When provisional diagnoses were revised, the percent correct for alternatives was about the same (67.9%) as when the provisional was retained (65.9%). When provisional diagnoses were correct, the alternatives identified were slightly more accurate (70.5%) compared to when they were incorrect (61.2%), but this was not significant.
The percent correct for all three consequences was 84%, SD=28%. The percent of accurate consequences was higher (87.2%) when provisional diagnoses were retained than when revised [66.7%, F(1, 103)=6.8, p<0.01]. We explored this further using descriptive statistics. On 75 out of 105 responses, the score for consequences was 100% (that is, participants indicated three correct consequences, one for the provisional and one for each of the two alternatives). When participants had a perfect score for consequences, 68 out of 75 diagnoses were retained.
The percent correct for contradictory evidence (T) was 32%, SD=36%. Percent correct for traits was low (29%) when diagnoses were retained, compared to 57% when diagnoses were revised [F(1, 103)=8.2, p<0.01], suggesting this evidence may have served to cast doubt on the provisional diagnosis. Similarly, accuracy for detecting contradictory evidence was higher when the provisional diagnosis was incorrect (38%) than when it was correct (28%), but this was not significant (p=0.2).
Relationship between ACT protocol and accuracy of the final diagnosis
A binary logistic regression examining the ACT scores and the final diagnostic accuracy did not identify a predictive factor as none of the variables in the equation were significant. However, there was a marginal relationship between correctly detecting contradictory evidence (i.e. traits) and providing a correct diagnosis (p=0.07).
The three-point ACT checklist was ineffective as an error reduction tool. The accuracy of the diagnosis was identical for both provisional and final diagnoses (0.53, SD=0.50). Overall, 49 out of 105 provisional diagnoses were incorrect, with only 14 attempted corrections and no reduction of errors. The number of revisions that resulted in increased accuracy was equal to the number that resulted in decreased accuracy. Table 1 describes the count of changes made and the direction of the change (i.e. correct or incorrect).
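The reported accuracy of 0.53 (SD=0.50) follows from the counts given above: 49 of 105 provisional diagnoses were incorrect, so 56 were correct. A short sketch of that arithmetic, assuming each case is coded 1 for correct and 0 for incorrect and that the SD is the sample standard deviation:

```python
import statistics

n_cases = 105
n_incorrect = 49
n_correct = n_cases - n_incorrect  # 56 correct provisional diagnoses

# Each completed case coded 1 (correct) or 0 (incorrect).
accuracy_codes = [1] * n_correct + [0] * n_incorrect

mean_accuracy = statistics.mean(accuracy_codes)
sd_accuracy = statistics.stdev(accuracy_codes)  # sample SD (n - 1 denominator)
print(f"mean={mean_accuracy:.2f}, SD={sd_accuracy:.2f}")
```

For a binary variable with a mean near 0.5, the SD is necessarily close to 0.5, so the SD here reflects the coding rather than any additional spread in performance.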
It has been suggested that taking the time to reflect on a diagnosis in a structured manner can improve self-correction of diagnostic error. To date, there has been little evidence in support of this proposal. One reason for the low success is that studies also demonstrate a reluctance to revise a provisional diagnosis. Specifically, Monteiro and colleagues demonstrated that the decision to revise occurs on an individual case basis, is not determined by an individual physician’s experience or knowledge, and does not typically lead to improved diagnostic accuracy.
The goal of the current study was to evaluate whether an abbreviated diagnostic checklist (ACT) can illuminate the relationship between features of a patient case and the decision to revise a provisional diagnosis. Critically, the goal of the study was not to evaluate individual physicians’ diagnostic accuracy or tendencies to revise a diagnosis when applying analytic reasoning; these have already been explored in other studies. The goal was to evaluate the impact of specific items in the diagnostic checklist on the decision to revise as well as on overall accuracy.
The scores derived from the ACT tool ranged from 32% on the detection of traits (indicating junior residents may need training on detecting contradictory clinical features) to 84% on the identification of consequences (suggesting participants had much stronger knowledge about the clinical attributes of competing conditions). Notably, there were relationships detected between the participants’ accuracy for identifying contradictory information and the decision to revise a provisional diagnosis. This finding highlights the potential beneficial characteristic of reflective protocols that specifically instruct participants to identify inconsistent information within patient cases.
When a provisional diagnosis was incorrect, participants tended to detect contradictory evidence more accurately, although this difference was not significant, and accurate detection of contradictory evidence was associated with attempting a revision. However, the lack of improvement in diagnostic accuracy would suggest that participants were not able to self-correct even in the face of this evidence. An important aspect of diagnostic reasoning that still needs exploration is the ability to re-examine a case or seek additional input once doubt has been identified. The somewhat low scores for identifying alternatives (66%) offer a possible explanation: participants were unable to identify a reasonable alternative about one third of the time. These missed opportunities may be a reasonable estimate of the source of diagnostic error; a limited ability to generate accurate diagnostic hypotheses restricts the potential for self-correction.
This study is best understood in the context of process-based strategies to reduce diagnostic error. Providing diagnosticians with a structured process, whether simple as in this study, or more complex as used by Mamede and colleagues, may provide little to no advantage at point of care. What is most promising about the results of this study, however, is the potential to use this simplified diagnostic checklist as an assessment or training protocol. Trainees may be encouraged to consider using the ACT protocol with practice cases to help activate and structure their knowledge as it pertains to medical diagnosis. Additionally, trainees may benefit from using the protocol in the workplace to identify potential sources of doubt, prompting them to seek a second opinion or additional advice.
There was also an inverse relationship between the perceived severity of consequences for missing the provisional diagnosis and the decision to revise the provisional diagnosis; as perceived severity increased, fewer diagnoses were revised. This finding may be consistent with medical training that emphasizes attention to “can’t miss” diagnoses; if participants felt that the more severe diagnosis had to be pursued in order to rule it out, this may have also affected the decision to retain it. It is possible that the perception that serious consequences are more likely for missing one diagnosis than others may encourage a physician to commit more strongly to that diagnosis.
There are several limitations to this study. The major limitation, which is true for any point-of-care diagnostic checklist, is that there is no reasonable way to force each individual to reason in a standard manner. We cannot be assured that our participants answered the ACT questions in the same order every time, or know whether the order affected their answers. However, this would be a limitation of real practice as well; thinking cannot be standardized. A second major limitation is that the ACT responses were scored by a single rater using broadly specified criteria (“based on clinical appropriateness”); these factors limit the replicability of the results. In addition, there was no control group, and it is possible that some factor other than the ACT tool led to participants’ decisions. Finally, it was not possible to force all participants to spend an equal amount of time on each case or to complete all the cases.
Simplistic or brief point-of-care error reduction strategies do not appear to be effective. Future work should focus on improving the acquisition and integration of knowledge and experience. This is particularly important during the transition toward more independent practice for junior emergency medicine residents. Although the ACT tool did not affect diagnostic accuracy, it may be a useful teaching tool at the bedside, particularly when staff are reviewing with junior residents and developing an assessment and plan for their patients.
We would like to acknowledge the assistance of Melissa McGowan in the development and organization of this study.
Eva KW, Hatala RM, LeBlanc VR, Brooks LR. Teaching from the clinical reasoning literature: combined reasoning strategies help novice diagnosticians overcome misleading information. Med Educ 2007;41:1152–8.
Norman GR, Monteiro SD, Sherbino J, Ilgen JS, Schmidt HG, Mamede S. The causes of errors in clinical reasoning: cognitive biases, knowledge deficits, and dual process thinking. Acad Med 2017;92:23–30.
Shimizu T, Matsumoto K, Tokuda Y. Effects of the use of differential diagnosis checklist and general de-biasing checklist on diagnostic performance in comparison to intuitive diagnosis. Med Teach 2013;35:e1218–29.
Mamede S, van Gog T, van den Berge K, Rikers RM, van Saase JL, van Guldener C, et al. Effect of availability bias and reflective reasoning on diagnostic accuracy among internal medicine residents. J Am Med Assoc 2010;304:1198–203.
Sibbald M, de Bruin AB, Cavalcanti RB, van Merrienboer JJ. Do you have to re-examine to reconsider your diagnosis? Checklists and cardiac exam. BMJ Qual Saf 2013;22:333–8.
Sibbald M, de Bruin AB, van Merrienboer JJ. Finding and fixing mistakes: do checklists work for clinicians with different levels of experience? Adv Health Sci Educ Theory Pract 2014;19:43–51.
Sibbald M, de Bruin AB, Yu E, van Merrienboer JJ. Why verifying diagnostic decisions with a checklist can help: insights from eye tracking. Adv Health Sci Educ Theory Pract 2015;20:1053–60.
Van Klei WA, Hoff RG, Van Aarnhem EE, Simmermacher RK, Regli LP, Kappen TH, et al. Effects of the introduction of the WHO “Surgical Safety Checklist” on in-hospital mortality: a cohort study. Ann Surg 2012;255:44–9.
Ilgen JS, Bowen JL, McIntyre LA, Banh KV, Barnes D, Coates WC, et al. Comparing diagnostic performance and the utility of clinical vignette-based assessment under testing conditions designed to encourage either automatic or analytic thought. Acad Med 2013;88:1545–51.
Monteiro SD, Sherbino J, Patel A, Mazzetti I, Norman GR, Howey E. Reflecting on diagnostic errors: taking a second look is not enough. J Gen Intern Med 2015;30:1270–4.
Sherbino J, Dore KL, Wood TJ, Young ME, Gaissmaier W, Kreuger S, et al. The relationship between response time and diagnostic accuracy. Acad Med 2012;87:785–91.
Norman G, Sherbino J, Dore K, Wood T, Young M, Gaissmaier W, et al. The etiology of diagnostic errors: a controlled trial of system 1 versus system 2 reasoning. Acad Med 2014;89:277–84.
Monteiro SD, Sherbino JD, Ilgen JS, Dore KL, Wood TJ, Young ME, et al. Disrupting diagnostic reasoning: do interruptions, instructions, and experience affect the diagnostic accuracy and response time of residents and emergency physicians? Acad Med 2015;90:511–7.
Zwaan L, de Bruijne M, Wagner C, Thijs A, Smits M, van der Wal G, et al. Patient record review of the incidence, consequences, and causes of diagnostic adverse events. Arch Intern Med 2010;170:1015–21.
The online version of this article offers supplementary material (https://doi.org/10.1515/dx-2018-0073).
About the article
Published Online: 2019-04-16
Published in Print: 2019-06-26
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.