The main aim of diagnostic laboratories is to produce reliable results that correctly represent the clinical status of the patient. Over recent years much attention has been paid to optimising the analytical quality (precision and bias) of tests, partly on the basis of methods for setting analytical performance specifications (APS). The criteria for APS can be defined using different approaches and can be based on clinical outcomes, biological variation (BV) or state-of-the-art analytical performance. Ideally, the APS criteria would be based on requirements relevant to clinical outcome, but high-quality data are sparse. Using state-of-the-art analytical performance has the disadvantage that it is unrelated to clinically desirable performance or to what is needed to minimise the “analytical noise” relative to the biological signal. Therefore, in most cases, BV is currently the best available and most straightforward basis for defining APS criteria.
Nowadays, diagnostic laboratories are beginning to use the six-sigma concept to assess the performance of laboratory tests and to select appropriate quality control (QC) rules and procedures. Briefly, the sigma score is a marker of the analytical performance of a test, calculated from the allowable total error (TEA) and the actual analytical bias and imprecision. By convention, TEA is based on the APS used for monitoring, as the APS for monitoring are stricter than the APS for diagnosis. It is questionable whether, for tests needed only to confirm or rule out a particular diagnosis, the TEA should instead be based on a population-based reference range rather than on the individual range, as discussed recently.
In general, the higher the sigma score, the better the analytical performance of the test in relation to the TEA and the lower the frequency of QC that is required. The “six” in six-sigma refers to the ideal goal, at which the measurement is expected to generate fewer than four erroneous results per million test results. A sigma score of 3 is generally considered the minimum acceptable quality. Sigma scores below 3 result in more than 66 errors per thousand measurements, and applying QC rules to such measurements leads to unacceptable rates of false acceptance and false rejection of test results. So far, only a limited number of publications have addressed the practical implications of applying the six-sigma method in clinical laboratories with the TEA based on BV data.
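The error rates quoted above can be reproduced with a minimal sketch of the conventional six-sigma defect model. It assumes normally distributed results, a one-sided tail and the customary 1.5 SD long-term shift; these are assumptions of the general convention rather than something specified in this paper, and the function name is illustrative only.

```python
import math


def defects_per_million(sigma, shift=1.5):
    """Expected erroneous results per million for a given sigma score.

    Assumes normally distributed results and the conventional 1.5 SD
    long-term shift of the six-sigma model (one-sided tail).
    """
    z = sigma - shift
    # Upper-tail probability of the standard normal distribution.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return tail * 1_000_000


for s in (3, 4, 5, 6):
    print(f"sigma {s}: {defects_per_million(s):.1f} errors per million")
```

Under these assumptions a sigma score of 6 corresponds to about 3.4 errors per million and a sigma score of 3 to about 66.8 errors per thousand, in line with the figures given in the text.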
The TEA needed to calculate the sigma score is derived from within-subject (CVI) and between-subject (CVG) BV data . One difficulty in using published BV data in general, and also in haemostasis, is the large heterogeneity in BV data. This large variation may be explained by several factors, including differences in the duration of the study, the demographics of the study group, the number of subjects and sampling points, pre-analytical conditions, the assay method and statistical methodology. Consequently, depending on the BV data that are used, large differences can be obtained for the sigma score and thus also for the resulting QC-rules, though it is not known which sigma score is closest to reality. A helpful tool for assessing the quality of BV data may be the recently published checklist of Aarsand et al. . This tool scores the presence of essential elements that affect the veracity and utility of the data.
Other components necessary for the calculation of sigma scores are the analytical bias and imprecision of the diagnostic test. The imprecision can be determined using data from internal or external quality assessment programmes, whereas bias can often only be determined from external control data. Previously, Thelen et al. demonstrated that the imprecisions derived from internal and external control data were not identical and suggested that the difference may be due to multiple factors, for example the more robust statistics afforded by the larger number of data points for internal controls. In addition, internal and external control data describe different parts of the analytical performance, so comparisons between the two sets of data are difficult to make.
The aim of this study was to evaluate the sigma scores of routine haemostasis tests (prothrombin time [PT], activated partial thromboplastin time [APTT], fibrinogen and antithrombin [AT]) for three different instruments and reagent combinations. The TEA was derived from the most reliable data available by selecting studies containing essential elements for BV estimation by applying the Aarsand et al. checklist . As the published data of BV, even after selection using the BV checklist, differ significantly and also determination of analytical performance using internal and external control data is subject to many influences, we discuss the effect of these variables on the sigma score outcomes and the consequences for daily practice.
Materials and methods
In this study the focus was placed on the internal quality sample in the normal range, where the highest sigma score was expected. Internal QC data in the normal range of the coagulation tests PT, APTT, fibrinogen and AT were obtained with a CS5100 analyser (Siemens, The Hague, The Netherlands) using Siemens reagents (Thromborel S, Actin FS, Thromboclottin and Innovance AT, respectively), with a TOP 500 (Werfen, Bedford, MA, USA) using Werfen reagents (ReadiPlasTin, SynthASil, Q.F.A. Thrombin [Bovine] and Liquid AT, respectively) and with a STA-R MAX (Stago, Asnières-sur-Seine, France) using Stago reagents (Neoplastine-R, Cephascreen, Liquid Fibrinogen and STAchrom ATIII, respectively). QC material from the analyser/reagent suppliers was used: Siemens VisuN, HemosIL Normal Control 1 (for PT, APTT and fibrinogen), HemosIL Normal Control Assayed (for AT) and Stago STA-Quali-CLOT I. Data were collected over three consecutive months from routine QC measurements (two control measurements per day), performed according to the manufacturer’s instructions.
Calculations were based on six external quality surveys with two to three data points per survey organised by ECAT or SKML (SKML, Organization for Quality Assurance of Medical Laboratory Diagnostics, The Netherlands). For the APTT, PT and fibrinogen data six surveys from 2017 were used and for AT the data was derived from four surveys from 2017 and two surveys from 2016.
The six-sigma score was calculated by using the formula: sigma=(TEA–bias)/CVA with all values expressed as percentage (%). CVA=analytical coefficient of variation.
As a sigma score of 3.0 is commonly set as the minimum acceptable quality, this was used as the criterion for adequate quality .
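The calculation above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name is hypothetical, the absolute value of the bias is taken as an assumption, and the CVA and bias in the example are invented for illustration (only the TEA of 14.5% corresponds to the median fibrinogen value in Table 1).

```python
MIN_ACCEPTABLE_SIGMA = 3.0  # criterion for adequate quality used in this study


def sigma_score(tea, cva, bias=0.0):
    """Sigma score = (TEA - |bias|) / CVA, with all inputs in percent.

    Taking the absolute value of the bias is an assumption made here so
    that negative and positive bias are penalised equally.
    """
    if cva <= 0:
        raise ValueError("CVA must be positive")
    return (tea - abs(bias)) / cva


# Hypothetical CVA and bias; TEA is the median fibrinogen value from Table 1.
score = sigma_score(tea=14.5, cva=4.0, bias=1.0)
print(f"sigma = {score:.2f}, adequate = {score >= MIN_ACCEPTABLE_SIGMA}")
```

For internal QC data, where bias was not taken into account, the same function can be called with the default `bias=0.0`.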
Allowable total error
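Desirable TEA (%) was determined for each variable using the following formula:

$$\mathrm{TEA} = 1.65 \times 0.5 \times \mathrm{CV_I} + 0.25 \times \sqrt{\mathrm{CV_I}^2 + \mathrm{CV_G}^2}$$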
Modified TEA[d], which is used for diagnostic testing
A literature search was performed (July 2018, PubMed) using the search terms BV, coagulation, haemostasis and the specific tests. In addition, literature identified from the references of recent studies was included. TEA was calculated from published data selected using the Biological Variation Data Critical Appraisal Checklist (BIVAC). In the BIVAC, D is the score assigned when an essential item is missing in a BV study. Studies with a D-score and studies with a study period shorter than 1 week were excluded. The minimum, median and maximum BV values were used for calculating the sigma scores using internal QC data, and the median BV value was used for calculating the sigma score of external QC data. A comparison was made between sigma scores derived from internal and external QC data.
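As a worked check, the desirable TEA can be recomputed from the CVI and CVG values. The sketch below assumes the widely used desirable specification, TEA = 1.65 × 0.5 × CVI + 0.25 × √(CVI² + CVG²), which reproduces the TEA values listed for PT in Table 1; the function name is illustrative.

```python
import math


def desirable_tea(cvi, cvg, z=1.65):
    """Desirable allowable total error (%) from biological variation.

    Assumed form: z * (0.5 * CVI) + 0.25 * sqrt(CVI^2 + CVG^2).
    """
    imprecision_goal = 0.5 * cvi
    bias_goal = 0.25 * math.sqrt(cvi ** 2 + cvg ** 2)
    return z * imprecision_goal + bias_goal


# PT values from Table 1: (CVI, CVG) in percent.
pt_bv = {"minimum": (2.3, 4.0), "median": (2.6, 4.9), "maximum": (5.8, 6.8)}

for label, (cvi, cvg) in pt_bv.items():
    print(f"PT {label}: TEA = {desirable_tea(cvi, cvg):.1f}%")
```

The three results round to 3.1%, 3.5% and 7.0%, which are the minimum, median and maximum TEA reported for PT.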
Analytical coefficient of variation and analytical bias
CVA of internal control material was calculated by using data derived from internal quality performance during daily practice. For internal quality assessment the bias was not taken into account.
The long-term within-laboratory analytical CV (LCVA) was calculated from external quality assessment data. The model to establish the LCVA is described in detail elsewhere. Briefly, linear regression was applied using the consensus values (X) of each survey as the independent variable and the corresponding laboratory values (Y) as the dependent variable. The slope (b) and the variability (sy|x) of each regression line were calculated. The LCVA was based on the variability of the regression line (sy|x) and the mean value of all consensus values ($\bar{X}$), expressed as a percentage:

$$\mathrm{LCV_A} = \frac{s_{y|x}}{\bar{X}} \times 100\%$$
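The regression step can be sketched as follows. This is a hedged illustration only: it assumes that the LCVA is the residual standard deviation of the regression (sy|x) divided by the mean consensus value and multiplied by 100, the function name is hypothetical, and the survey values are invented rather than taken from ECAT or SKML data.

```python
import math


def lcva_from_eqa(consensus, lab):
    """Return (slope, sy|x, LCVA %) from paired EQA results.

    consensus: consensus values X of the survey samples
    lab:       corresponding results Y reported by one laboratory
    Assumes LCVA = 100 * sy|x / mean(X).
    """
    n = len(consensus)
    if n != len(lab) or n < 3:
        raise ValueError("need at least three paired results")
    mean_x = sum(consensus) / n
    mean_y = sum(lab) / n
    sxx = sum((x - mean_x) ** 2 for x in consensus)
    if sxx == 0:
        raise ValueError("consensus values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(consensus, lab))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line, n - 2 degrees of freedom.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(consensus, lab))
    sy_x = math.sqrt(ss_res / (n - 2))
    return slope, sy_x, 100.0 * sy_x / mean_x


# Hypothetical AT results (%) from six surveys with two samples each.
x = [45, 98, 60, 102, 38, 95, 72, 100, 55, 97, 41, 99]
y = [47, 96, 63, 105, 36, 97, 70, 103, 57, 94, 43, 101]
b, s, lcva = lcva_from_eqa(x, y)
print(f"slope = {b:.3f}, sy|x = {s:.2f}, LCVA = {lcva:.1f}%")
```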
Results of the external quality assessment programmes also allow calculation of the long-term bias (B) . B was calculated by the formula:
and included in the sigma score. This formula takes into account both systematic and proportional bias. sx is the standard error of X, the number of laboratory results included is expressed by n and
Results
On the basis of the literature search, 15 studies were found with BV data for PT, APTT, fibrinogen and/or AT. Three studies were excluded because of a short study period (<1 week), two studies were excluded completely and one study partly because of missing information about the health status of the subjects and/or an outdated measurement method. Although studies with a D-score according to the quality criteria of Aarsand et al. were excluded, a large variation in the CVI and CVG was still observed (Table 1). The sigma score was determined by calculating the TEA using the minimum, median and maximum BV values from the selected papers in combination with internal QC data (Table 2). When the maximum coefficient of variation was used for both CVI and CVG, this resulted in a sigma score higher than 3 for some of the analyser platforms for PT, APTT and fibrinogen. Using the median coefficient of variation derived from the BV data, only two of the three laboratories reached a sigma score above 3.0 for the fibrinogen analysis and one laboratory for the PT. All other tests were below a sigma score of 3.0. With the minimum BV values (CVI and CVG), only one laboratory reached a sigma score above 3.0 for the fibrinogen analysis and no sigma score above 3.0 was seen for any other test. These results show that the choice of CVI and CVG values dramatically influences the sigma score and thereby the QC strategy that should be applied in daily practice. Based on external QC data, almost all results show a sigma score below 3.0, except for one laboratory that reached a sigma score above 3.0 for fibrinogen (Table 3).
Table 1: Biological variation derived from literature data (minimum, median and maximum values).
|Test|CVI, % (min)|CVI, % (median)|CVI, % (max)|CVG, % (min)|CVG, % (median)|CVG, % (max)|TEA, % (min)|TEA, % (median)|TEA, % (max)|
|---|---|---|---|---|---|---|---|---|---|
|PT|2.3|2.6|5.8|4.0|4.9|6.8|3.1|3.5|7.0|
|APTT|1.7|3.3|6.8|7.1|7.8|8.9|3.2|4.8|8.4|
|Fibrinogen|6.8|11.5|18.6|14.7|16.4|20.2|9.7|14.5|22.2|
|AT|1.1|3.1|5.7|2.6|7.8|10.4|1.6|4.7|7.7|
Table 2: Comparison of sigma scores in three laboratories based on CVA of internal QC data and applying the minimum, median and maximum values of biological variation for a desirable TEA.
Table 3: Comparison of sigma scores calculated from external QC data and by applying the median values of biological variation for a desirable TEA.
As many haemostatic tests are used for diagnostic purposes, a comparison was made between the sigma scores based on criteria for monitoring versus criteria for diagnostic testing. The results are shown in a Sigma Method Decision Chart (Figure 1). A clear shift towards more favourable sigma scores is visible for all methods.
Discussion
As there are many choices available, both intellectually and practically, a straightforward implementation of APS has not yet been possible. In our study, we focussed on APS based on BV data and demonstrated that applying the six-sigma concept to QC data frequently results in sigma values below the minimum acceptable value of 3.0, and that the outcome varies depending on the BV data used. This raises the question of whether the six-sigma concept is useful for quantifying the analytical performance of haemostatic tests and whether six-sigma could be used for setting up the optimal QC rules in the haemostatic field.
In our inventory of published BV studies that fulfilled the criteria for use in determining the APS, we noted that there is massive variation in the quality of these studies. This makes it necessary to apply a structured way of selecting studies that are reliable, as was also discussed previously by Carobene. The BIVAC list, which was developed recently, provides such a structured way of selecting reliable studies. Despite this structured selection, a large variation was still observed for the CVI and CVG (Table 1). This is caused by heterogeneity in the study setup, for example in the duration of the study, the number of subjects studied, the number of sampling points and the composition of the group of subjects. In addition, differences in methods and analysers may also have an impact on the BV data.
When the minimum, median and maximum TEA values were combined with the internal QC data, a wide range of sigma values was found. With the maximum value of the BV data, more than 50% of the sigma scores were above 3.0, whereas with the minimum value only one laboratory had a sigma score above 3.0, for fibrinogen. These results show that sigma scores fluctuate dramatically depending on which BV data are used. Overall, when using the median values, only 25% of the sigma scores were above 3.0. So far, only few data on sigma scores for haemostasis parameters based on internal QC data have been published. One study presented a summary of the performance of PT and APTT in multiple clinical laboratories with CVA comparable to our study, but with different sigma values because the TEA was not based on BV data. Another study, focussing on clinical chemistry parameters, showed that sigma scores based on BV data indeed varied extensively when internal QC data were used and that about one third of the clinical chemistry tests had a sigma score below 3.0.
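The dependence of the sigma score on the chosen BV data can be illustrated with the TEA values from Table 1. The sketch below is an assumption-laden illustration: the single CVA of 2.0% applied to every test is hypothetical (it is not a value measured in this study), bias is ignored as for the internal QC data, and the names are illustrative.

```python
# TEA values (%) from Table 1: (minimum, median, maximum).
TEA_BY_TEST = {
    "PT": (3.1, 3.5, 7.0),
    "APTT": (3.2, 4.8, 8.4),
    "Fibrinogen": (9.7, 14.5, 22.2),
    "AT": (1.6, 4.7, 7.7),
}

HYPOTHETICAL_CVA = 2.0  # percent; invented for illustration only


def sigma_range(tea_values, cva):
    """Sigma scores for a series of TEA values, ignoring bias
    (as was done for the internal QC data)."""
    return tuple(round(tea / cva, 1) for tea in tea_values)


for test, tea_values in TEA_BY_TEST.items():
    low, mid, high = sigma_range(tea_values, HYPOTHETICAL_CVA)
    print(f"{test}: sigma {low} (min BV), {mid} (median BV), {high} (max BV)")
```

With an identical analytical imprecision, the same test can thus fall below or above the three-sigma criterion purely as a result of which BV estimate is selected.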
Furthermore, in daily routine a pathological control is also taken into account, and we expect that the results for this control will be even more variable, as the CVA of samples in the clinically relevant range is often higher than that of a normal control. As a consequence, a lower sigma score is expected for the pathological control than the sigma scores presented in this paper.
When using external QC data for our calculations, the sigma scores were even less favourable. The scores for PT, APTT and AT were all below 3.0; only for fibrinogen was a score above 3.0 observed. Similar results were obtained in a recent study by Molina et al., who demonstrated that the majority of laboratories barely reached the desirable TEA for PT, APTT and AT based on BV data and external QC results. On the other hand, for fibrinogen a large number of laboratories were able to achieve the desirable APS, as in our study. Although their approach is different from ours, it confirms that these tests have difficulty in achieving the APS based on BV data.
Comparing the coefficient of variation based on internal QC data with that based on external QC data showed that the results are similar in about 60% of cases. In the remaining cases, the LCVA calculated from external QC data was higher than the CVA from internal QC data. Others have also reported that the within-laboratory imprecision based on internal QC data is comparable, but not identical, to the imprecision based on external QC data. In addition, for external QC data the bias was included in the sigma score, which explains part of the lower sigma score for some tests. Another important difference between the two approaches is that for internal QC data a control sample with a normal level for the routine haemostasis tests was used in our study, whereas for the external QC data control samples covering a wide concentration range were used. Furthermore, internal QC data describe how a test performs over time within a given laboratory and show the within-laboratory imprecision relative to the laboratory’s own mean value. Different laboratories may not use the same internal QC materials, whereas laboratories participating in the same EQA programme measure the same materials. External QC data provide information on how the analytical process performs over time compared with other laboratories and demonstrate the accuracy of the test. Both the imprecision and the bias can be extracted from external QC data. This gives a laboratory guidance on how to improve and optimise the analytical process, whereas internal QC data show only the stability of performance with the laboratory’s own analytical process as a reference. Recently, it was suggested that internal and external QC could be combined in order to monitor the analytical bias in “real time”. In this way, harmonisation of test results against other laboratories could be performed continuously, which could result in lower imprecision and bias owing to a quicker response to fluctuations over time.
Our study demonstrates that the majority of basic haemostasis tests do not meet the APS when the median value of BV data is used. One problem is that no Westgard QC rules are available for that situation. If the sigma score is below 3.0, it is very difficult to ensure that analytical quality is satisfactory. Even the strictest Westgard rules will give a decreased probability of rejecting an analytical run because of decreased error detection. Schoenmakers et al. suggest introducing extra control measurements and repeating the analysis on a second analyser. Furthermore, they suggest improving the imprecision by performing the analysis in duplicate, triplicate or even more replicates. On the other hand, Thelen et al. suggest using state-of-the-art criteria as an alternative if the parameters do not meet the APS based on BV criteria. The Milan criteria also mention that state-of-the-art criteria could be used as an alternative. Our study demonstrates that APS based on BV data are still difficult to use in the field of haemostasis. Further standardisation of BV studies would yield a more reliable TEA and a more precise estimate of the sigma score, with a smaller range. It is important to take the uncertainty of the sigma value into account when defining accurate APS criteria, because our study shows that the range in sigma values is very wide (Table 2). As BV data will vary even after standardisation, we suggest using the median TEA together with a range for calculating the sigma score. Aarsand et al. suggested performing a meta-analysis with a weighted median based on the quality score of the BV studies. For these calculations, scoring by independent experts is needed to prevent different interpretations of the quality items. A limitation of our study is that we used median values.
As an example, in our current study we observed that the LCVA for AT ranged from 3.9% to 7.7%. Previously, a median LCVA for AT of 7% was demonstrated using external QC data from more than 80 laboratories. The “Sysmex” laboratory in our study showed a very low LCVA compared with those previous studies, which indicates high performance. However, although this result is close to the best achievable, it still did not reach a sigma score above 3.0. It is therefore questionable whether the six-sigma concept is useful for quantifying the quality of this test. Many haemostatic tests, such as fibrinogen and AT in our study, are needed only for diagnostic purposes, whereas the current application of the six-sigma concept is based on clinical use for monitoring. In such cases, performance goals for diagnostic testing could be applied, as has recently been discussed in a review. If the imprecision for monitoring is replaced by the imprecision for diagnostic testing in the equation for TEA, the sigma score will increase, and many more tests will fulfil the APS. For fibrinogen, all methods then show a sigma score above 3.0, with a median value of 5.6 (range: 3.7–9.1) (Figure 1). Similarly, when the criteria for diagnostic testing rather than those for monitoring are applied to the AT QC data, the sigma scores move closer to three-sigma, with a range of 0.8–3.1 instead of 0.0–1.4 (Figure 1). Taken together, the calculation of TEA may be adapted to the purpose of a test, that is, diagnostic testing or monitoring.
In conclusion, the use of six-sigma in the haemostasis field is difficult and fraught with obstacles. On account of the large variation in published BV data an updated database is needed, in which only studies are included which fulfil standardised criteria. The consequence of the low sigma values of most haemostasis tests is that performance goals based on BV data are difficult to reach in daily practice. Restriction of the six-sigma model to monitoring hampers application of this model for diagnosis only. A modification of the APS for diagnostic purposes may be an improvement for certain tests. Taken together, laboratories in the field of haemostasis need suitable APS, which are fit for purpose.
The authors thank the SKML for providing external QC data.
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Sandberg S, Fraser CG, Horvath AR, Jansen R, Jones G, Oosterhuis W, et al. Defining analytical performance specifications: consensus statement from the 1st strategic conference of the European Federation of Clinical Chemistry and Laboratory Medicine. Clin Chem Lab Med 2015;53:833–5.
Oosterhuis WP, Bayat H, Armbruster D, Coskun A, Freeman KP, Kallner A, et al. The use of error and uncertainty methods in the medical laboratory. Clin Chem Lab Med 2018;56:209–19.
Fraser CG, Harris EK. Generation and application of data on biological variation in clinical chemistry. Crit Rev Clin Lab Sci 1989;27:409–37.
Fraser CG, Hyltoft Petersen P, Libeer JC, Ricos C. Proposals for setting generally applicable quality goals solely based on biology. Ann Clin Biochem 1997;34:8–12.
Harris EK. Proposed goals for analytical precision and accuracy in single-point diagnostic testing. Theoretical basis and comparison with data from College of American Pathologists proficiency surveys. Arch Pathol Lab Med 1988;112:416–20.
Oosterhuis WP. Analytical performance specification in clinical chemistry: the holy grail? J Lab Precis Med 2017;02:78.
Gras JM, Philippe M. Application of the Six Sigma concept in clinical laboratories: a review. Clin Chem Lab Med 2007;45:789–96.
Westgard JO. Useful measures and models for analytical quality management in medical laboratories. Clin Chem Lab Med 2016;54:223–33.
Westgard S, Bayat H, Westgard JO. Analytical Sigma metrics: a review of Six Sigma implementation tools for medical laboratories. Biochem Med (Zagreb) 2018;28:020502.
Westgard JO, Westgard SA. Assessing quality on the Sigma scale from proficiency testing and external quality assessment surveys. Clin Chem Lab Med 2015;53:1531–5.
Schoenmakers CH, Naus AJ, Vermeer HJ, van Loon D, Steen G. Practical application of Sigma Metrics QC procedures in clinical chemistry. Clin Chem Lab Med 2011;49:1837–43.
Tran MT, Hoang K, Greaves RF. Practical application of biological variation and Sigma metrics quality models to evaluate 20 chemistry analytes on the Beckman Coulter AU680. Clin Biochem 2016;49:1259–66.
El Sharkawy R, Westgard S, Awad AM, Ahmed AO, Iman EH, Gaballah A, et al. Comparison between Sigma metrics in four accredited Egyptian medical laboratories in some biochemical tests: an initiative towards sigma calculation harmonization. Biochem Med (Zagreb) 2018;28:020711.
Aarsand AK, Roraas T, Fernandez-Calle P, Ricos C, Diaz-Garzon J, Jonker N, et al. The biological variation data critical appraisal checklist: a standard for evaluating studies on biological variation. Clin Chem 2018;64:501–14.
Thelen MH, Jansen RT, Weykamp CW, Steigstra H, Meijer R, Cobbaert CM. Expressing analytical performance from multi-sample evaluation in laboratory EQA. Clin Chem Lab Med 2017;55:1509–16.
Libeer JC, Baadenhuijsen H, Fraser CG, Petersen PH, Ricos C, Stockl D, et al. Characterization and classification of external quality assessment schemes (EQA) according to objectives such as evaluation of method and participant bias and standard deviation. External quality assessment (EQA) Working group A on analytical goals in laboratory medicine. Eur J Clin Chem Clin Biochem 1996;34:665–78.
Gowans EM, Hyltoft Petersen P, Blaabjerg O, Horder M. Analytical goals for the acceptance of common reference intervals for laboratories throughout a geographical area. Scand J Clin Lab Invest 1988;48:757–64.
Meijer P, de Maat MP, Kluft C, Haverkate F, van Houwelingen HC. Long-term analytical performance of hemostasis field methods as assessed by evaluation of the results of an external quality assessment program for antithrombin. Clin Chem 2002;48:1011–5.
Chen Q, Shou W, Wu W, Guo Y, Zhang Y, Huang C, et al. Biological and analytical variations of 16 parameters related to coagulation screening tests and the activity of coagulation factors. Semin Thromb Hemost 2015;41:336–41.
Shou W, Chen Q, Wu W, Cui W. Biological variations of lupus anticoagulant, antithrombin, protein C, protein S, and von Willebrand factor assays. Semin Thromb Hemost 2016;42:87–92.
Riese H, Vrijkotte TG, Meijer P, Kluft C, De Geus EJ. Covariance of metabolic and haemostatic risk indicators in men and women. Fibrinol Proteol 2001;14:1–12.
Thompson SG, Martin JC, Meade TW. Sources of variability in coagulation factor assays. Thromb Haemost 1987;58:1073–7.
Salomaa V, Rasi V, Stengard J, Vahtera E, Pekkanen J, Vartiainen E, et al. Intra- and interindividual variability of hemostatic factors and traditional cardiovascular risk factors in a 3-year follow-up. Thromb Haemost 1998;79:969–74.
Blomback M, Eneroth P, Landgren BM, Lagerstrom M, Anderson O. On the intraindividual and gender variability of haemostatic components. Thromb Haemost 1992;67:70–5.
de Maat MP, van Schie M, Kluft C, Leebeek FW, Meijer P. Biological variation of hemostasis variables in thrombosis and bleeding: consequences for performance specifications. Clin Chem 2016;62:1639–46.
Dot D, Miro J, Fuentes-Arderiu X. Within-subject and between-subject biological variation of prothrombin time and activated partial thromboplastin time. Ann Clin Biochem 1992;29(Pt 4):422–5.
Wada Y, Kurihara M, Toyofuku M, Kawamura M, Iida H, Kayamori Y, et al. Analytical goals for coagulation tests based on biological variation. Clin Chem Lab Med 2004;42:79–83.
Costongs GM, Bas BM, Janson PC, Hermans J, Brombacher PJ, van Wersch JW. Short-term and long-term intra-individual variations and critical differences of coagulation parameters. J Clin Chem Clin Biochem 1985;23:405–10.
Rudez G, Meijer P, Spronk HM, Leebeek FW, ten Cate H, Kluft C, et al. Biological variation in inflammatory and hemostatic markers. J Thromb Haemost 2009;7:1247–55.
Chambless LE, McMahon R, Wu K, Folsom A, Finch A, Shen YL. Short-term intraindividual variability in hemostasis factors. The ARIC Study. Atherosclerosis Risk in Communities Intraindividual Variability Study. Ann Epidemiol 1992;2:723–33.
Sakkinen PA, Macy EM, Callas PW, Cornell ES, Hayes TE, Kuller LH, et al. Analytical and biologic variability in measures of hemostasis, fibrinolysis, and inflammation: assessment and implications for epidemiology. Am J Epidemiol 1999;149:261–7.
Marckmann P, Sandstrom B, Jespersen J. The variability of and associations between measures of blood coagulation, fibrinolysis and blood lipids. Atherosclerosis 1992;96:235–44.
de Maat MP, de Bart AC, Hennis BC, Meijer P, Havelaar AC, Mulder PG, et al. Interindividual and intraindividual variability in plasma fibrinogen, TPA antigen, PAI activity, and CRP in healthy, young volunteers and patients with angina pectoris. Arterioscler Thromb Vasc Biol 1996;16:1156–62.
Carobene A. Reliability of biological variation data available in an online database: need for improvement. Clin Chem Lab Med 2015;53:871–7.
Harrison HH, Jones JB. Using Sigma quality control to verify and monitor performance in a multi-instrument, multisite integrated health care network. Clin Lab Med 2017;37:207–41.
Molina A, Guinon L, Perez A, Segurana A, Bedini JL, Reverter JC, et al. State of the art vs. biological variability: comparison on hematology parameters using Spanish EQAS data. Int J Lab Hematol 2018;40:284–91.
Badrick T, Graham P. Can a combination of average of normals and “real time” external quality assurance replace internal quality control? Clin Chem Lab Med 2018;56:549–53.
Westgard JO, Westgard SA. The quality of laboratory testing today: an assessment of sigma metrics for analytic quality using performance data from proficiency testing surveys and the CLIA criteria for acceptable performance. Am J Clin Pathol 2006;125:343–54.
Meijer P, Kluft C, Haverkate F, De Maat MP. The long-term within- and between-laboratory variability for assay of antithrombin, and proteins C and S: results derived from the external quality assessment program for thrombophilia screening of the ECAT Foundation. J Thromb Haemost 2003;1:748–53.