
Clinical Chemistry and Laboratory Medicine (CCLM)

Published in Association with the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM)

Editor-in-Chief: Plebani, Mario

Ed. by Gillery, Philippe / Lackner, Karl J. / Lippi, Giuseppe / Melichar, Bohuslav / Payne, Deborah A. / Schlattmann, Peter / Tate, Jillian R.

Volume 54, Issue 2 (Feb 2016)


Useful measures and models for analytical quality management in medical laboratories

James O. Westgard
  • Corresponding author
  • Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, WI, USA
  • Westgard QC Inc., Madison, WI, USA
Published Online: 2015-09-30 | DOI: https://doi.org/10.1515/cclm-2015-0710


The 2014 Milan Conference “Defining analytical performance goals 15 years after the Stockholm Conference” initiated a new discussion of issues concerning goals for precision, trueness or bias, total analytical error (TAE), and measurement uncertainty (MU). Goal-setting models are critical for analytical quality management, along with error models, quality-assessment models, quality-planning models, as well as comprehensive models for quality management systems. There are also critical underlying issues, such as an emphasis on MU to the possible exclusion of TAE and a corresponding preference for separate precision and bias goals instead of a combined total error goal. This opinion recommends careful consideration of the differences in the concepts of accuracy and traceability and the appropriateness of different measures, particularly TAE as a measure of accuracy and MU as a measure of traceability. TAE is essential to manage quality within a medical laboratory and MU and trueness are essential to achieve comparability of results across laboratories. With this perspective, laboratory scientists can better understand the many measures and models needed for analytical quality management and assess their usefulness for practical applications in medical laboratories.

Keywords: accuracy; allowable total error (ATE); analytical quality specifications; measurement uncertainty (MU); sigma-metric; total analytical error (TAE); traceability


Analytical performance goals are the driving force for analytical quality management and have widespread impact on quality practices in all medical laboratories. As discussed in a recent review by Plebani [1]:

A better analytical quality should be achieved by setting and implementing evidence based analytical quality specifications in everyday practice; if this will be done, rules for internal quality control and external quality assessment procedures would be more appropriate. Moreover, there is a compelling need for standardization programs improving metrological traceability and correcting biases and systematic errors. Finally, more stringent metrics, such as Six Sigma, should be largely introduced in clinical laboratories to further improve current analytical quality.

In this context, an important conference on “Defining analytical performance goals 15 years after the Stockholm Conference” was held recently in Milan by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) (see the proceedings published in the May 2015 issue of CCLM [2]). The earlier Stockholm conference established a hierarchy of quality goals and goal-setting models that have been widely accepted globally and now guide current laboratory practices [3]. The Milan conference may establish new global guidelines and a new metrological order with an emphasis on measurement uncertainty (MU) and the possible exclusion of total analytical error (TAE). Given the importance of these issues, a broad audience needs to examine the outcomes and recommendations from the Milan conference [4]:

In this revision, the hierarchy is simplified and represented by three different models to set analytical performance specifications. There is general agreement that some of these are better suited for certain measurands than for others.

Model 1. Based on the effect of analytical performance on clinical outcomes.

  1. Direct outcome studies – investigating the impact of analytical performance of the test on clinical outcomes;

  2. Indirect outcome studies – investigating the impact of analytical performance of the test on clinical classifications or decisions and thereby the probability of patient outcomes, e.g. by simulation or decision analysis.

Model 2. Based on components of biologic variation of the measurand.

Model 3. Based on state-of-the-art.

The purpose of both the original Stockholm hierarchy and the revised Milan hierarchy is to give priority to analytical specifications based on clinical outcomes, then to those based on biologic variation, and finally to those based on state-of-the-art performance. The Milan simplification from 5 to 3 levels of models is a modest adjustment of the Stockholm guidance. However, two underlying issues, (1) an emphasis on MU to the possible exclusion of TAE and (2) a corresponding preference for separate specifications for precision and bias rather than a combined total error specification, would have a serious impact on current quality management practices.

The medical laboratory community must consider the usefulness of TAE as a measure of both precision and bias [5], the different purposes of TAE and MU and the medical laboratory’s priority for managing quality over measuring uncertainty [6], the practical needs for goals for allowable total analytical error (ATE), and the integration of TAE and ATE with Six Sigma concepts and metrics to provide a quality management system compliant with the ISO 15189 technical requirements [7]. The fundamental underlying issue of TAE vs. MU must be resolved to implement a cohesive quality system that makes appropriate use of both TAE and MU to achieve quality examination results within a medical laboratory and comparability of results across laboratories.

Measures and models

The major analytical concepts that drive analytical quality management are accuracy, trueness, precision, and traceability. As shown in Figure 1, these concepts can be related to measures of performance such as TAE, bias, SD or CV, and MU. To use those measures for managing analytical quality, specifications must be defined using goal-setting models that establish limits for the amounts of error that are allowable. These error goals, called analytical performance goals or analytical quality specifications, are used to validate the performance of examination procedures via experimental studies and statistical data analysis, to assess performance relative to the desired quality using sigma-metrics, and to plan statistical quality control (SQC) procedures that ensure the desired quality is attained in routine production. SQC planning takes into account the precision and bias observed for the measurement procedure and the rejection characteristics of different control rules and different numbers of control measurements. Operating specifications for precision, bias, and SQC describe the characteristics needed at the bench level to ensure that the desired quality is achieved during routine operation. Finally, quality must be monitored over the long term to characterize performance, identify problems, and prioritize improvements.

Figure 1:

Measurement concepts, performance characteristics, and models for Analytical Quality Management.

Different measures and models have different purposes and applications. Goal-setting models are intended for establishing allowable limits for precision, bias, and total error. Setting a goal or target for MU is a new undertaking that is complicated by the need either to eliminate or correct biases, or to assume that these biases show up as long-term random error and are included in estimates of standard uncertainty (MU expressed as a standard deviation). However, that assumption ignores the immediate reality that biases exist for individual examination procedures in individual laboratories.

Analytic performance characteristics

To better understand the different measures and models, Table 1 provides the definitions of critical performance characteristics. Most of these definitions are the official ISO definitions except for total analytical error, allowable total error, and total error, which are defined in the context of the CLSI guideline for estimating total analytical error [8].

Table 1

Definitions of important performance characteristics.

Table 2 summarizes important analytical performance characteristics, their measures, how they are estimated, and standard experimental protocols available from the Clinical and Laboratory Standards Institute (CLSI). Note that ISO does not identify a measure of accuracy, even though TAE has been commonly used in medical laboratories for many years. ISO instead emphasizes MU, which is a measure of traceability, not accuracy. Both are essential for quality, but traceability is a different performance characteristic from accuracy. Therein lies a major conflict between the perspectives of metrologists and medical laboratory scientists.

Table 2

Relationship between performance characteristics, measures of performance, estimation approach, and standard experimental protocols from Clinical and Laboratory Standards Institute.

In medical laboratories, a test result usually involves a single measurement; therefore, the quality of a test result inherently involves the effects of both the precision and the bias (trueness) of the examination procedure. That was the original rationale for TAE, i.e. the total impact of the random and systematic errors [5]:

To the analyst, precision means random analytical error. Accuracy, on the other hand, is commonly thought to mean systematic analytic error… None of this terminology is familiar to the physician who uses the test values, therefore, he [or she] is seldom able to communicate with the analyst in these terms. The physician thinks rather in terms of total analytical error, which includes both random and systematic components. From his [or her] point of view, all types of analytic error are acceptable as long as the total analytical error is less than a specified amount…

Thus, the intent of TAE is to provide the measure that fulfills the ISO definition of accuracy as a combination of random and systematic errors. Acceptable accuracy implies that the combined precision and bias of most measurements (say 95%) should be small compared to the requirement for intended medical use.

Total analytical error model

Manufacturers seldom make a claim for accuracy, except for examination procedures classified as “waived methods” in the US where the FDA requires the determination of TAE. The CLSI protocol [8] for direct determination of TAE by a candidate method versus a traceable comparative method requires analysis of 120 patient samples by the new method and up to 10 replicates of each sample by a traceable comparative method (to minimize its random error contribution to the difference between methods). That protocol may be feasible for manufacturers, but direct determination of TAE is not practical in most medical laboratories. Instead, laboratories typically determine precision and bias separately, then combine those estimates using a model that adds bias and a multiple of the SD:

$$\mathrm{TAE} = |\mathrm{bias}| + z \cdot \mathrm{SD}$$
where |bias| is the absolute value of the bias and z is a multiplier for the standard deviation (SD) that may be set to 1.96 for a two-sided 95% limit but is more commonly set to 1.65 for a one-sided 95% limit. TAE is an expected limit of error, just like expanded measurement uncertainty. If bias were zero, a 95% limit of TAE would be ±1.96 SD, which has the same form as MU with a coverage factor of 1.96. A critical difference between TAE and MU is the treatment of bias: whether bias can be eliminated or corrected, or whether the estimate should include it.
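
Because the TAE model is simple arithmetic, it can be sketched in a few lines of Python. The function name `total_analytical_error` and the example bias and SD values are illustrative assumptions, not part of any published tool; bias and SD must be expressed in the same units (both percentages or both concentration units).

```python
def total_analytical_error(bias: float, sd: float, z: float = 1.65) -> float:
    """Estimate TAE = |bias| + z * SD.

    bias and sd must share units (both % or both concentration units).
    z = 1.65 gives a one-sided 95% limit; z = 1.96 a two-sided 95% limit.
    """
    if sd < 0:
        raise ValueError("SD must be non-negative")
    return abs(bias) + z * sd


# Hypothetical procedure with a -2.0% bias and a 1.0% CV
print(total_analytical_error(-2.0, 1.0))          # one-sided 95% limit
print(total_analytical_error(-2.0, 1.0, z=1.96))  # two-sided 95% limit
```

Taking the absolute value means that a negative bias consumes the error budget exactly as a positive bias of the same size does.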

Sigma-metric quality-assessment model

Six Sigma concepts can be readily adapted for assessing the quality of examination procedures. A sigma-metric can be calculated as follows [9]:

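$$\text{Sigma-metric} = \frac{\mathrm{ATE} - |\mathrm{Bias}_{\mathrm{obs}}|}{\mathrm{SD}_{\mathrm{obs}}}$$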
where ATE describes the “tolerance limits” and $\mathrm{Bias}_{\mathrm{obs}}$ and $\mathrm{SD}_{\mathrm{obs}}$ represent the observed performance of the examination procedure. Concentration units are preferred when a single reference material is analyzed following a protocol to estimate precision by the SD and bias by comparison of the observed mean with an assigned reference value. Alternatively, all terms may be expressed as percentages, i.e.:

$$\text{Sigma-metric} = \frac{\%\mathrm{ATE} - |\%\mathrm{Bias}_{\mathrm{obs}}|}{\%\mathrm{CV}_{\mathrm{obs}}}$$

Percentage figures are more reliable over a range of concentrations when the estimate of precision comes from a replication experiment or SQC data and the estimate of bias comes from a comparison of methods experiment or PT/EQA samples.

As an example, the College of American Pathologists sets the ATE for HbA1c at 6.0%. For an examination procedure that has a 2.0% bias and a 1.0% CV, the sigma-metric would be (6.0% − 2.0%)/1.0% = 4.0σ, which is acceptable (>3.0σ) though not world-class quality (6σ). A graphical tool called the Method Decision Chart [10] can be used to display quality visually on the sigma scale.
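
The HbA1c calculation can be reproduced with a short Python sketch. The function names and the coarse labels are assumptions for illustration; the 3σ and 6σ thresholds are the ones quoted in the text, and the label for 3σ or less is a simplification rather than a published classification.

```python
def sigma_metric(ate: float, bias: float, imprecision: float) -> float:
    """Sigma-metric = (ATE - |bias|) / imprecision.

    Use ATE, bias and SD in concentration units, or %ATE, %bias and %CV.
    """
    if imprecision <= 0:
        raise ValueError("SD or CV must be positive")
    return (ate - abs(bias)) / imprecision


def describe(sigma: float) -> str:
    """Coarse labels; the 3-sigma and 6-sigma thresholds come from the text."""
    if sigma >= 6.0:
        return "world class"
    if sigma > 3.0:
        return "acceptable"
    return "3 sigma or less"


# HbA1c example from the text: CAP ATE 6.0%, bias 2.0%, CV 1.0%
s = sigma_metric(6.0, 2.0, 1.0)
print(f"{s:.1f} sigma: {describe(s)}")  # 4.0 sigma: acceptable
```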

SQC quality-planning model

To select appropriate SQC procedures, the rejection characteristics of different control rules and different numbers of control measurements can be described by power curves, or power function graphs [11]. Given a specification for ATE, the size of the medically important systematic error ($\Delta\mathrm{SE}_{\mathrm{crit}}$) can be calculated as follows [12]:

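$$\Delta\mathrm{SE}_{\mathrm{crit}} = \frac{\mathrm{ATE} - |\mathrm{Bias}_{\mathrm{obs}}|}{\mathrm{SD}_{\mathrm{obs}}} - 1.65$$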
where 1.65 is the z-value for a one-sided confidence limit that allows a 5% risk of a medically important error.

To select SQC procedures, the calculated critical errors can be drawn over power curves [13]. With the introduction of Six Sigma concepts, the expression $(\mathrm{ATE} - |\mathrm{Bias}_{\mathrm{obs}}|)/\mathrm{SD}_{\mathrm{obs}}$ was replaced by the sigma-metric, thereby simplifying the calculation:

$$\Delta\mathrm{SE}_{\mathrm{crit}} = \text{Sigma-metric} - 1.65$$

This relationship allows power function graphs to be rescaled in terms of the sigma-metric and makes it quicker and easier to select appropriate control rules and numbers of control measurements. This adaptation is found in the CLSI guideline for SQC procedures, where it is called a Sigma-metric SQC Selection Tool [14]. Figure 2 shows an HbA1c examination procedure that has a sigma-metric of 4.0 and should be controlled by a $1_{3s}/2_{2s}/R_{4s}/4_{1s}$ multi-rule procedure or a $1_{2.5s}$ single-rule procedure, each having a total of four control measurements per run.
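
The two forms of the critical-error calculation can be checked against each other in a minimal Python sketch (the function names are hypothetical). For the HbA1c example, a 4.0σ procedure gives a critical systematic error of 2.35 SD, the shift that the chosen SQC procedure should be able to detect. Selecting the actual control rules still requires power curves such as those in Figure 2; this sketch only computes the error size.

```python
Z_ONE_SIDED_95 = 1.65  # allows a 5% risk of a medically important error


def critical_systematic_error(ate: float, bias: float, sd: float) -> float:
    """Delta-SE_crit = (ATE - |bias|) / SD - 1.65, in multiples of the SD."""
    return (ate - abs(bias)) / sd - Z_ONE_SIDED_95


def critical_systematic_error_from_sigma(sigma: float) -> float:
    """The same quantity expressed through the sigma-metric."""
    return sigma - Z_ONE_SIDED_95


# HbA1c example: ATE 6.0%, bias 2.0%, CV 1.0% -> sigma-metric 4.0
print(f"{critical_systematic_error(6.0, 2.0, 1.0):.2f}")   # 2.35
print(f"{critical_systematic_error_from_sigma(4.0):.2f}")  # 2.35
```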

Figure 2:

Quality planning tool for selection/design of SQC procedures having two levels of controls.

The probability for rejection is plotted on the y-axis versus the size of the systematic error on the bottom x-axis and the sigma-metric on the top x-axis. In the key at the right, the different power curves correspond, top to bottom, to the list of control rules, the probability for false rejection ($P_{fr}$), the total number of control measurements (N), and the number of runs (R) over which the rules are applied. The vertical line represents an examination procedure with an observed sigma-metric of 4.0. Chart produced by the EZ Rules3 computer program with permission of Westgard QC.

Six Sigma quality management system model

With the integration of Six Sigma concepts and metrics, the traditional error framework for managing analytical quality has evolved into a Six Sigma Quality Management System (6σQMS) [7, 15] that is compliant with ISO 15189 requirements [16]. As shown in Figure 3, the objective and quantitative management of analytical quality is achieved through several steps in the process:

  • Step 1 – definition of quality for intended use in the form of ATE or a clinical decision interval ($D_{\mathrm{int}}$);

  • Step 3 – validation of examination procedures with use of a method decision chart [10];

  • Step 5 – formulation of a total quality control strategy based on the observed sigma-metric [17];

  • Step 6 – selection/design of statistical QC (SQC) procedures using a Sigma-metric SQC Selection Tool [14], Charts of Operating Specifications [18, 19], or Westgard sigma rules [7];

  • Step 7 – development of a Total QC Plan that optimizes controls for risk-based QC plans based on a Sigma control prioritization matrix [20];

  • Step 10 – monitoring performance by determination of MU from SQC data and quality on the sigma-scale through PT/EQA programs with the aid of a Sigma Proficiency Assessment Chart [21].

Figure 3:

Model for implementing a Six Sigma Quality Management System. From reference 7 with permission of Westgard QC.

While the focus of this discussion is TAE, the analytical quality-planning model can be expanded to include pre-examination variables, such as sampling variation, sampling bias, and within-subject biologic variation [22]. This expansion creates a clinical decision interval model that is useful for assessing the effects of multiple replicates on analytic variation, multiple samples on within-subject variation, and appropriate SQC designs.

Goal setting models

On the basis of the above discussion, TAE should be recognized as a useful measure of the ISO concept of accuracy; likewise, specifications for ATE should be recognized as useful for managing analytical quality in medical laboratories. This also reveals serious limitations in relying on separate goals for precision and bias, which appear to be preferred in the Milan discussions [23]. Goals for ATE are essential because they are the most widely used and most widely useful analytical quality specifications.

Level 3 models

State-of-the-art models are employed by most, if not all, PT/EQA programs and are expressed in the form of ATE. In some cases, such as the US CLIA criteria for acceptable performance in PT, those criteria are written into law with no documentation of their origin [24]. The German RiliBAEK criteria are likewise legally prescribed [25], but more information is available about their derivation. At the Milan Conference, Orth described the RiliBAEK goals as based on the 90th to 95th percentiles of EQA participants [26]. Jones has described in detail the state-of-the-art practices for setting Allowable Limits of Performance (ALP) for the Royal College of Pathologists of Australasia (RCPA) program [27], where consideration is given to clinical criteria, biologic variation, and limits achievable by 80% of laboratories, as well as the need to drive improvement by setting optimal rather than desirable or minimal goals. According to Jones, PT/EQA programs need to be more transparent about how they establish their ATE criteria if those criteria are to be harmonized across programs.
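
Several of these state-of-the-art approaches rest on percentiles of observed EQA performance (the 90th to 95th percentiles for RiliBAEK, limits achievable by 80% of laboratories for RCPA). The Python sketch below shows only the general idea: it takes participants' percent deviations from the assigned value and returns the nearest-rank percentile of their absolute values. The deviation data, the nearest-rank method, and the function names are assumptions for illustration; the actual RiliBAEK and RCPA derivations involve further judgement, as described above.

```python
import math


def nearest_rank_percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values or not 1 <= pct <= 100:
        raise ValueError("need data and 1 <= pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def state_of_the_art_limit(deviations_pct: list[float], pct: int = 90) -> float:
    """Limit (in %) that at least pct% of participants met or bettered."""
    return nearest_rank_percentile([abs(d) for d in deviations_pct], pct)


# Hypothetical % deviations of 10 EQA participants from the assigned value
deviations = [0.5, -1.0, 1.5, -2.0, 2.5, 0.8, -3.0, 1.2, -0.4, 4.0]
for p in (80, 90, 95):
    print(p, state_of_the_art_limit(deviations, p))
```

Real EQA rounds involve far more participants, and the choice of percentile directly sets how many current laboratories would fail the resulting criterion.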

Level 2 models

Goals based on biologic variation are readily available from a database that was initially presented at the Stockholm conference [28] and has been updated several times through the dedicated efforts of Dr. Carmen Ricos and her colleagues from Spain [29]. The work on biologic goals was begun in the US by Cotlove, Harris, and Williams in 1970 [30], adopted in the recommendations from the College of American Pathologists’ 1976 Aspen Conference on Analytical Goals in Clinical Chemistry [31], greatly expanded by the many studies of Fraser [32], and adopted again by the Stockholm consensus in 1999 [3]. The Milan proceedings include recommendations to standardize experimental protocols to ensure the reliability of studies on biologic variation [33, 34].

The biologic goal-setting models described by Fraser and Petersen [35] are widely used today:

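$$\mathrm{CV}_a \le 0.5\,\mathrm{CV}_i$$
$$\mathrm{Bias}_a \le 0.25\sqrt{\mathrm{CV}_i^2 + \mathrm{CV}_g^2}$$
$$\mathrm{ATE}_b = 1.65\,\mathrm{CV}_a + \mathrm{Bias}_a$$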
where $\mathrm{CV}_i$ is the intra-individual (within-subject) biologic variation, $\mathrm{CV}_g$ is the between-individual biologic variation, $\mathrm{CV}_a$ is the allowable analytical CV, $\mathrm{Bias}_a$ is the allowable analytical bias, and $\mathrm{ATE}_b$ is the biologic allowable total analytical error. The rationale for using $\mathrm{CV}_i$ to set $\mathrm{CV}_a$ is the demanding medical application of monitoring individual patients, whereas the rationale for using $\mathrm{CV}_i$ and $\mathrm{CV}_g$ to set $\mathrm{Bias}_a$ is the impact of bias on diagnostic classifications against reference intervals. Combining the two makes use of the TAE model and provides a way of setting goals for PT/EQA surveys [35].
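
The Fraser-Petersen goals can be computed directly from the two biologic CVs. In this minimal Python sketch the dataclass, the function name, and the measurand with $\mathrm{CV}_i$ = 4.0% and $\mathrm{CV}_g$ = 3.0% are hypothetical choices made so the arithmetic is easy to follow: they give $\mathrm{CV}_a$ = 2.0%, $\mathrm{Bias}_a$ = 1.25%, and $\mathrm{ATE}_b$ ≈ 4.55%.

```python
import math
from dataclasses import dataclass


@dataclass
class BiologicGoals:
    cv_a: float    # allowable analytical CV (%)
    bias_a: float  # allowable analytical bias (%)
    ate_b: float   # biologic allowable total analytical error (%)


def fraser_petersen(cv_i: float, cv_g: float, z: float = 1.65) -> BiologicGoals:
    """Goals from within-subject (cv_i) and between-subject (cv_g) CVs, in %."""
    cv_a = 0.5 * cv_i
    bias_a = 0.25 * math.sqrt(cv_i ** 2 + cv_g ** 2)
    return BiologicGoals(cv_a, bias_a, z * cv_a + bias_a)


# Hypothetical measurand with CVi = 4.0% and CVg = 3.0%
goals = fraser_petersen(4.0, 3.0)
print(goals)
```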

Setting $\mathrm{CV}_a$ based on $\mathrm{CV}_i$ is widely accepted, dating back to the recommendations of the Aspen Conference as well as the Stockholm consensus. Setting $\mathrm{Bias}_a$ based on $\mathrm{CV}_i$ and $\mathrm{CV}_g$ makes use of the Gowans model of the effects of analytical performance on maintaining common reference intervals [36] and minimizing misclassification of patients. Combining these two maximum specifications to calculate a biologic ATE has been criticized [37], and one of the new proposals from the Milan conference is a model that combines state-of-the-art performance and biologic variation [38]. The proposed new model would overcome situations where biologic goals are too demanding, which is a limitation for measurands such as sodium and calcium that are very tightly controlled physiologically. However, the new model is complex, difficult to apply for the many practical needs in the laboratory (e.g. selection/design of SQC procedures), and defaults to state-of-the-art performance in those cases where biologic goals are very demanding. Thus, it represents a lower level 3 model rather than a level 2 biologic model. A better alternative is to maintain the current Fraser-Petersen model and develop clinical outcome models that account for the actual medical use of those measurands for which biologic goals are too demanding, i.e. to utilize a model that is higher rather than lower in the hierarchy.

Level 1 models

Unfortunately, clinical outcome models for setting analytical quality specifications are difficult, complicated, and require more research and development. Petersen and Klee [39] have recently discussed the effects of precision and bias on guideline-driven medical decision limits and the difficulties with sharp decision limits, concluding that probability function curves are preferable to sharp cutoffs. Such an approach is both difficult to develop and difficult to implement; thus, clinical outcome models remain the ideal but are not yet practical for hundreds of current measurands. A simpler alternative may be to define a “gray zone” that separates the positive and negative classifications of results.

HbA1c provides an example where clinical diagnostic and treatment guidelines are well-defined and widely utilized. For example, the American Diabetes Association (ADA) [40] recommends classifying patients as follows:

  • ≤5.6 %Hb (37.7 mmol/mol) is considered normal;

  • 5.7 to 6.4 %Hb (38.8 to 46.4 mmol/mol) represents pre-diabetes and provides a gray zone between normal and diabetic classifications;

  • 6.5 %Hb (47.5 mmol/mol) is the cutoff for diagnosis of diabetes;

  • 7.0 %Hb (53.0 mmol/mol) is the target for treatment.
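
The ADA categories above can be expressed as a small classification function. In this Python sketch the function names are hypothetical, results are assumed to be reported to one decimal place in NGSP units (%Hb), and the unit conversion uses the NGSP–IFCC master equation (IFCC = 10.93 × NGSP − 23.50), which is not stated in the source but reproduces the mmol/mol values listed (for example 47.5 mmol/mol at 6.5 %Hb and 53.0 mmol/mol at 7.0 %Hb).

```python
def ngsp_to_ifcc(hba1c_pct: float) -> float:
    """NGSP %Hb -> IFCC mmol/mol via the master equation IFCC = 10.93*NGSP - 23.50."""
    return 10.93 * hba1c_pct - 23.50


def ada_classification(hba1c_pct: float) -> str:
    """Classify a result already reported to one decimal place (%Hb)."""
    if hba1c_pct >= 6.5:
        return "diabetes"
    if hba1c_pct >= 5.7:
        return "pre-diabetes (gray zone)"
    return "normal"


for value in (5.6, 6.0, 6.5, 7.0):
    print(f"{value} %Hb = {ngsp_to_ifcc(value):.1f} mmol/mol: {ada_classification(value)}")
```

Sharp cutoffs like these are exactly where analytical error matters most: a result of 6.4 or 6.5 %Hb falls on different sides of the diagnostic limit even though the difference is well within a 6.0% ATE.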

In certifying the “equivalence” of HbA1c measurement procedures, the US National Glycohemoglobin Standardization Program (NGSP) specifies an ATE of 6.0% for agreement between a manufacturer’s method and the NGSP secondary reference procedures. Likewise, the College of American Pathologists (CAP) sets an ATE of 6.0% for acceptable performance in proficiency testing against a true value assigned by a reference method. Even for level 1 models and guideline-driven outcomes, goals for ATE are needed to monitor performance nationally and internationally through EQA programs that assess the quality of the many different examination procedures available today.


Historically, both TAE and MU were intended to be measures of quality. TAE was introduced in the 1970s to provide a measure of quality of test results produced by only a single measurement in a medical laboratory [5]. According to Panteghini [41], the concept of uncertainty was introduced in the 1990s due to the lack of consensus on how to express the quality of measurement results. MU was useful in metrology laboratories for assigning values to reference materials where multiple measurements were performed. Given that quality is related to the totality of features and characteristics that are necessary to satisfy requirements for intended use, it is essential to recognize that both TAE and MU are valid measures for their intended uses, but they do have different purposes.

TAE is a measure of accuracy that is practical and useful for managing quality in a medical laboratory, as discussed in the collective opinion of a convocation of quality experts in 2009 [42]. MU is the measure of the quality of the traceability chain and is critical for minimizing the total variation from an examination procedure. MU and trueness are critical for achieving comparability of results across examination procedures and across laboratories. If we can agree that accuracy, trueness, precision, and traceability are different yet compatible concepts, then we can establish a more cohesive quality system with appropriate measures and goals for each.

Goals for ATE are necessary and useful for validating the performance of examination procedures, selecting/designing SQC procedures, developing QC plans, assessing quality on the sigma-scale, and satisfying regulatory and accreditation requirements for successful participation in PT/EQA programs. It is important to distinguish these application models from goal-setting models. A criticism of the model for calculating ATE from components of biologic variation [37, 38] doesn’t negate the usefulness of ATE goals for analytical quality management.

All models are based on assumptions that simplify the complexity of the real world. For example, the specification that allowable precision should be 0.5*CVi means that analytic variation will add approximately 12% noise to the biologic signal or test variability observed for an individual patient. There is no scientific theory that says 12% noise is optimal, but this is an accepted rule of thumb that goes back to the earliest days of setting analytical goals. Setting a precision goal based on intra-individual variation is logical because patient monitoring is the most demanding medical use. Setting a bias goal based on group variation is logical because diagnostic classification is the most demanding medical use. Combining these goals for precision and bias to set a goal for ATE is also logical given that medical laboratories make only a single measurement to produce a test result whose accuracy depends on both precision and bias. But these are all assumptions that help us describe and express our perspectives on reality and aid us in our thinking and planning.
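
The "approximately 12% noise" figure follows from adding independent variances. This Python sketch assumes that analytical and within-subject variation are independent, so their CVs combine as the square root of the sum of squares; the function name is hypothetical. With $\mathrm{CV}_a = 0.5\,\mathrm{CV}_i$ the total variation is $\sqrt{1.25} \approx 1.118$ times $\mathrm{CV}_i$, i.e. about 11.8% more than the biologic variation alone. Two other ratios are shown for comparison.

```python
import math


def added_variability(ratio: float) -> float:
    """Fractional increase in total variability when CVa = ratio * CVi.

    Assumes analytical and within-subject variation are independent,
    so CVs combine as the square root of the sum of squares.
    """
    return math.sqrt(1.0 + ratio ** 2) - 1.0


for ratio in (0.25, 0.5, 0.75):
    print(f"CVa = {ratio} x CVi adds {100 * added_variability(ratio):.1f}%")
```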

All models are inherently limited by their assumptions, but those simplifications of reality are what make models useful. For practical applications, simplicity is often an advantage over complexity. For example, a simple calculation of a sigma-metric is advantageous over more complicated calculations, such as a patient-weighted sigma-metric [43] or the z-transformation calculation [44], because it is easier to perform and can be readily implemented. The effects of such new models should be evaluated for their impact on the QMS. For example, for the nine measurands studied by Coskun [44], the average difference between sigma-metrics calculated by z-transformation and by the simple model (z-transformation minus simple model) was 0.16σ, with a maximum of 0.35σ. Such small differences would not generally affect the decisions being made about the acceptability of performance and the selection of SQC procedures.

New models must be critically evaluated to assess their practical value. For example, a new SQC model from the Milan conference employs an acceptance chart having ATE limits that are narrowed by subtracting the observed MU [45]. Such a chart will have rejection properties that are determined by a single statistical control rule (identifiable by dividing the control limit by the observed SD of the method). Because this new model is limited to a single statistical rule, it lacks the flexibility to consider multi-rule SQC procedures that are generally needed whenever the sigma-metric of an examination procedure is 4 or less.
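
The single-rule limitation can be made concrete with simple arithmetic. The Python sketch below assumes, for illustration only, that the acceptance limit equals ATE minus the observed MU (all in %) and that control results are judged against that limit; dividing the limit by the observed SD gives the multiple k of the equivalent single rule $1_{ks}$. The function name and the numbers are hypothetical and are not taken from reference [45].

```python
def equivalent_single_rule(ate: float, mu: float, sd: float) -> str:
    """Return the single control rule implied by an acceptance limit of ATE - MU.

    All inputs share units (e.g. %). The rule multiple is (ATE - MU) / SD.
    """
    limit = ate - mu
    if limit <= 0:
        raise ValueError("MU consumes the whole ATE; no acceptance region remains")
    k = limit / sd
    return f"1_{k:.1f}s"


# Hypothetical HbA1c procedure: ATE 6.0%, observed MU 2.5%, SD (as CV) 1.0%
print(equivalent_single_rule(6.0, 2.5, 1.0))  # 1_3.5s
```

Whatever the inputs, the result is one rule with one limit; a multi-rule procedure such as $1_{3s}/2_{2s}/R_{4s}/4_{1s}$ cannot be expressed this way.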

Not all simplifications are useful. The assumption that bias can be eliminated or completely corrected is not valid for clinically complex measurands, as has been demonstrated for HbA1c, which is perhaps our most standardized examination procedure today. Results from CAP surveys continue to show significant biases between examination subgroups, in spite of the US NGSP certification program and the IFCC global standardization program. For example, for 3 survey samples in 2014 with concentrations of 6.49 %Hb (47.4 mmol/mol), 6.97 %Hb (52.7 mmol/mol), and 9.65 %Hb (82.0 mmol/mol), the average absolute biases for the 26 examination subgroups were 2.31%, 2.29%, and 1.55%, respectively, with observed maximum biases of 6.0%, 6.2%, and 4.3% [21]. On average, bias consumed about a third of the 6.0% ATE budget, and for some examination subgroups it consumed the entire error budget. Similar biases were observed by Weykamp et al. [46] in their recent investigation and evaluation of targets for HbA1c based on sigma-metrics and biologic variation.
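
The share of the error budget consumed by bias can be checked from the survey figures quoted above. This Python sketch uses only those figures; the variable and function names are hypothetical. Average biases take about 26% to 39% of the 6.0% ATE (about 34% when averaged over the three samples), while the largest subgroup biases take about 72% to more than 100%.

```python
ATE = 6.0  # CAP allowable total error for HbA1c, %

# 2014 CAP survey figures quoted in the text: (average |bias| %, maximum bias %)
surveys = {
    "6.49 %Hb": (2.31, 6.0),
    "6.97 %Hb": (2.29, 6.2),
    "9.65 %Hb": (1.55, 4.3),
}


def budget_used(bias: float, ate: float = ATE) -> float:
    """Percentage of the ATE budget consumed by a given bias."""
    return 100.0 * abs(bias) / ate


for sample, (avg_bias, max_bias) in surveys.items():
    print(f"{sample}: average {budget_used(avg_bias):.1f}%, "
          f"maximum {budget_used(max_bias):.1f}% of ATE")

mean_share = sum(budget_used(avg) for avg, _ in surveys.values()) / len(surveys)
print(f"mean share across samples: {mean_share:.1f}%")
```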

Biases still exist, in spite of extensive standardization and corrections intended to provide equivalent results. Correction is not as simple as recalibration, because different measurement principles actually produce different results for complex biologic measurands due to their selectivity and specificity. Given today’s many proprietary examination procedures (e.g. 26 subgroups for HbA1c), it is difficult to standardize the measurement principle; thus, corrections are necessary even though their effectiveness is inherently limited. Corrections within a laboratory are further limited by the availability and expense of certified reference materials, may be illegal under some regulations, or may invalidate a manufacturer’s approval for the marketplace (e.g. CE mark [47]).

Biases cannot be ignored, even though short term biases due to reagent lots, calibrators, etc., will show up in long term estimates of the standard uncertainty. A medical laboratory must deal with a given set of reagents, calibrators, etc., where real biases exist. Daily quality management requires the use of TAE, whereas MU is necessary for long-term monitoring of the entire traceability chain and, together with trueness, is critical for prioritizing improvements to achieve comparability of results across laboratories.

Comparability of results can be measured by EQA/PT programs that assess bias (trueness) by comparison to a reference value and determine the standard uncertainty as a top-down estimate for a method subgroup, thereby including performance of different analyzers, analysts, reagent lots, calibrator lots, etc. Commutable samples are needed, such as the whole blood preparations provided by CAP for HbA1c surveys. ATE criteria can be applied to determine quality on the sigma-scale, as shown for HbA1c in Figure 4 [21], providing a readily understandable measure of the success of efforts to achieve traceability and comparability.

Figure 4:

Sigma Proficiency Assessment Chart for 2014 College of American Pathologists (CAP) survey results for HbA1c GH2-01 sample with concentration of 6.49 %Hb.

TEa (allowable total error, ATE) = 6.0%. Each point represents the observed trueness (%Bias, y-axis) and the observed Standard Uncertainty (%CV, x-axis) for one of 26 examination subgroups. Results represent a total of 3187 laboratories. From reference 21 with permission of CCLM.

Manufacturers should use EQA/PT results to assess the need to reduce bias and/or reduce standard uncertainty. Braga et al. [47] recommend that 50% of the total uncertainty budget should be available for the uncertainty of reference materials and methods and the manufacturer’s calibration and transfer procedure. The other 50% should be budgeted for the commercial system imprecision and individual laboratory performance, including a safety margin for internal QC. That means most of the uncertainty budget is the responsibility of the manufacturer’s production process and the manufacturer’s product performance in the laboratory. The laboratory’s responsibility is to use EQA/PT survey results to select traceable and comparable measurement procedures, then follow the manufacturers’ directions for use and establish effective SQC procedures to verify attainment of the intended quality of test results.

A remaining issue is whether MU must be determined by the laboratory to inform physicians of the known variability of test results. Laboratories can certainly estimate MU from SQC data collected under intermediate precision conditions, as recommended by ISO 15189, and can make those estimates available to physicians. However, even advocates of MU acknowledge that physicians are already overloaded with laboratory data and have their own difficulties incorporating statistical concepts. A better approach would be to inform the experts who develop diagnostic and treatment guidelines of the need to incorporate the known MU into the interpretive guidelines and to avoid the use of sharp cutoffs by defining an “uncertainty-zone” between different clinical classifications.
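This paragraph contains two ideas: a top-down MU estimate from SQC data, and an uncertainty zone around a clinical cutoff. The sketch below combines them under these assumptions:

* The QC values are hypothetical.
* The coverage factor k = 2 is an assumed convention.
* Bias is ignored.
* The HbA1c cutoff of 6.5 %Hb (the American Diabetes Association diagnostic criterion) is used only as an example.
* The function `classify` and its labels are illustrative, not part of any established guideline.

```python
"""Sketch: estimate MU from SQC data and flag results in an "uncertainty zone"
around a decision cutoff. QC values and coverage factor are assumptions."""
from statistics import stdev

COVERAGE_K = 2.0  # assumed coverage factor (about 95% for a normal distribution)
CUTOFF = 6.5      # HbA1c diagnostic cutoff for diabetes (%Hb), for illustration only

# Hypothetical long-term QC results (%Hb) near the cutoff,
# collected under intermediate precision conditions
qc_results = [6.40, 6.60, 6.50, 6.45, 6.55]

# Top-down estimate: standard uncertainty from the long-term QC SD (bias ignored)
u = stdev(qc_results)
U = COVERAGE_K * u  # expanded uncertainty


def classify(result: float, cutoff: float = CUTOFF, expanded_u: float = U) -> str:
    """Return 'below', 'uncertainty zone', or 'above' relative to cutoff +/- U."""
    if result < cutoff - expanded_u:
        return "below"
    if result > cutoff + expanded_u:
        return "above"
    return "uncertainty zone"


for value in (6.2, 6.4, 6.8):
    print(value, classify(value))
```

With these inputs U is about 0.16 %Hb. Results between roughly 6.34 and 6.66 %Hb would therefore be reported as falling in the uncertainty zone, rather than being classified by the sharp cutoff alone.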

The success of the Stockholm conference lay in recognizing that it was better to identify different approaches and prioritize their usefulness than to dwell on a 30-year debate about which single approach was best for setting analytical goals. The success of the Milan conference likewise depends on recognizing the variety of measures and models that are useful in analytical quality management and encouraging their appropriate applications, rather than dwelling on what is now a 15-year argument about TAE vs. MU. TAE and MU are both useful for their intended purposes, i.e., TAE is essential for quality management in the medical laboratory and MU is essential for quality management by manufacturers. A comprehensive perspective that recognizes the differences between TAE and MU, as well as their complementary nature and complementary purposes, would be a major contribution to unifying the measures and models for analytical quality management.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

Research funding: None declared.

Employment or leadership: None declared.

Honorarium: None declared.

Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.


1. Plebani M. The CCLM contribution to improvements in quality and patient safety. Clin Chem Lab Med 2013;51:39–46.

2. Special issue: 1st EFLM Strategic Conference “Defining analytical performance goals – 15 years after the Stockholm conference”. Clin Chem Lab Med 2015;53:829–953.

3. Hyltoft Petersen P, Fraser CG, Kallner A, Kenny D. Strategies to Set Global Analytical Quality Specifications in Laboratory Medicine. Scand J Clin Lab Invest 1999;59:475–585.

4. Sandberg S, Fraser CG, Horvath AR, Jansen R, Jones G, Oosterhuis W, et al. Defining analytical performance specifications: Consensus statement from the 1st strategic conference of the European Federation of Clinical Chemistry and Laboratory Medicine. Clin Chem Lab Med 2015;53:833–5.

5. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1973;19:49–57.

6. Westgard JO. Managing quality vs. measuring uncertainty. Clin Chem Lab Med 2010;48:31–40.

7. Westgard JO, Westgard SA. Basic quality management systems: essentials for quality management in the medical laboratory. Madison, WI: Westgard QC Inc., 2014.

8. EP21A. Estimation of Total Analytical Error for Clinical Laboratory Methods. Wayne, PA: Clinical and Laboratory Standards Institute, 2003.

9. Westgard JO. Six Sigma quality design & control, 2nd ed. Madison, WI: Westgard QC Inc., 2006.

10. Westgard JO. Basic method validation, 3rd ed. Madison, WI: Westgard QC Inc., 2008.

11. Westgard JO, Groth T. Power functions for statistical control rules. Clin Chem 1979;25:394–400.

12. Westgard JO, Barry PL. Cost-effective quality control: managing the quality and productivity of analytical processes. Washington, DC: AACC Press, 1986.

13. Koch DD, Oryall JJ, Quam EF, Feldbruegge DH, Dowd DE, Barry PL, et al. Selection of medically useful Quality-Control procedures for individual tests done in a multitest analytical system. Clin Chem 1990;36:230–3.

14. C24A3. Statistical Quality Control for Quantitative Measurement Procedures. Wayne, PA: Clinical and Laboratory Standards Institute, 2006.

15. Westgard JO, Westgard SA. Quality control review: implementing a scientifically based quality control system. Ann Clin Biochem. Epub ahead of print July 5, 2015. DOI: 10.1177/0004563215597248.

16. ISO 15189. Medical laboratories – Requirements for quality and competence. Geneva: ISO, 2012.

17. Westgard SA. Prioritizing risk analysis quality control plans based on sigma-metrics. Clin Lab Med 2013;33:41–53.

18. Westgard JO. Charts of operational process specifications (‘OPSpecs Charts’) for assessing the precision, accuracy, and quality control needed to satisfy proficiency testing performance criteria. Clin Chem 1992;38:1226–33.

19. Westgard JO, Hyltoft Petersen P, Wiebe D. Laboratory process specifications for assuring quality in the US National Cholesterol Education Program. Clin Chem 1991;37:656–61.

20. Westgard JO. Six Sigma risk analysis: developing analytic QC plans for the medical laboratory. Madison, WI: Westgard QC Inc., 2011.

21. Westgard JO, Westgard SA. A graphical tool for assessing quality on the sigma-scale from proficiency testing and external quality assessment surveys. Clin Chem Lab Med 2015;53:1531–6.

22. Westgard JO, Seehafer JJ, Barry PL. Allowable imprecision for laboratory tests based on clinical and analytical outcome criteria. Clin Chem 1993;40:1909–14.

23. Petersen PH. Performance criteria based on true and false classification and clinical outcomes. Influence of analytical performance on diagnostic outcome using a single clinical component. Clin Chem Lab Med 2015;53:849–55.

24. US Centers for Medicare & Medicaid Services (CMS). Medicare, Medicaid, and CLIA Programs: Laboratory Requirements Relating to Quality Systems and Certain Personnel Qualifications. Final Rule. Fed Regist Jan 24, 2003;16:3650–3714.

25. RiliBÄK Regulation. Deutsches Ärzteblatt 2014;111(38):A1683–A1618. English version: Revision of the Guidelines of the German Medical Association on Quality Assurance in Medical Laboratory Examinations – Rili-BAEK. J Lab Med 2015;39:26–69.

26. Orth M. Are regulation-driven performance criteria still acceptable? – the German point of view. Clin Chem Lab Med 2015;53:893–8.

27. Jones GRD. Analytical performance specifications for EQA schemes – need for harmonization. Clin Chem Lab Med 2015;53:919–24.

28. Ricos C, Alvarez V, Cava F, García-Lario JV, Hernández A, Jiménez CV, et al. Current databases on biological variation: pros, cons, and progress. Scand J Clin Lab Invest 1999;59:491–500.

29. Ricos C, Alvarez V, Perich C, Fernández-Calle P, Minchinela J, Cava F, et al. Rationale for using data on biological variation. Clin Chem Lab Med 2015;53:863–70.

30. Cotlove E, Harris EK, Williams GZ. Biological and analytic components of variation in long-term studies of serum constituents in normal subjects. III. Physiological and medical implications. Clin Chem 1970;16:1028–32.

31. Elevitch FR, ed. Proceedings of the 1976 Aspen Conference on Analytical Goals in Clinical Chemistry. Chicago, IL: College of American Pathologists, 1977.

32. Fraser CG. Biological variation: from principles to practice. Washington, DC: AACC Press, 2001.

33. Carobene A. Reliability of biological variation data available in an online database: need for improvement. Clin Chem Lab Med 2015;53:871–8.

34. Bartlett WA, Braga F, Carobene A, Coşkun A, Prusa R, Fernandez-Calle P, et al. A checklist for critical appraisal of studies of biological variation. Clin Chem Lab Med 2015;53:879–86.

35. Fraser CG, Hyltoft Petersen P. Quality goals in external quality assessment are best based on biology. Scand J Clin Lab Invest 1993;53(Suppl 212):8–9.

36. Gowans EM, Hyltoft Petersen P, Blaabjerg O, Horder M. Analytical goals for the acceptance of common reference intervals for laboratories throughout a geographical area. Scand J Clin Lab Invest 1988;48:757–64.

37. Oosterhuis WP. Gross overestimation of total allowable error based on biological variation. Clin Chem 2011;57:1334–6.

38. Oosterhuis WP, Sandberg S. Proposal for the modification of the conventional model for establishing performance specifications. Clin Chem Lab Med 2015;53:925–37.

39. Petersen PH, Klee GG. Influence of analytical bias and imprecision on the number of false positive results using Guideline-Driven Medical Decision Limits. Clin Chim Acta 2014;430:1–8.

40. American Diabetes Association. Standards of Medical Care in Diabetes – 2013. Diabetes Care 2013;36:S11–S66.

41. Panteghini M. Application of traceability concepts to analytical quality control may reconcile total error with uncertainty of measurement. Clin Chem Lab Med 2010;48:7–10.

42. Burnett D, Ceriotti F, Cooper G, Parvin C, Plebani M, Westgard J. Collective opinion paper on findings of the 2009 convocation of experts on quality control. Clin Chem Lab Med 2010;48:41–52.

43. Woolworth A, Korpi-Steiner N, Miller JJ, Rao LV, Yundt-Pacheco J, Kuchipudi L, et al. Utilization of assay performance characteristics to estimate hemoglobin A1c result reliability. Clin Chem 2014;60:1073–9.

44. Coskun A, Serteser J, Kilercik M, Aksungar F, Unsal I. A new approach for calculating the Sigma Metric in clinical laboratories. Accred Qual Assur 2015;20:147–52.

45. Ceriotti F, Brugnoni D, Mattioli S. How to define a significant deviation from the expected internal quality control result. Clin Chem Lab Med 2015;53:913–8.

46. Weykamp C, John G, Gillery P, English E, Ji L, Lenters-Westra E, et al. Investigation of 2 models to set and evaluate quality targets for HbA1c: Biologic variation and sigma metrics. Clin Chem 2015;61:752–9.

47. Braga F, Infusino I, Panteghini M. Performance criteria for combined uncertainty budget in the implementation of metrological traceability. Clin Chem Lab Med 2015;53:905–12.

About the article

Corresponding author: James O. Westgard, Department of Pathology and Laboratory Medicine, University of Wisconsin Madison, WI, USA; and Westgard QC Inc., Madison, WI, USA, E-mail:

Received: 2015-07-24

Accepted: 2015-08-28

Published Online: 2015-09-30

Published in Print: 2016-02-01

Citation Information: Clinical Chemistry and Laboratory Medicine (CCLM), ISSN (Online) 1437-4331, ISSN (Print) 1434-6621, DOI: https://doi.org/10.1515/cclm-2015-0710.

