Understanding the limitations of your assay using EQA data with serum creatinine as an example

Objectives: Laboratories need to take into consideration the specificity and imprecision of assays not only in verification, but also in quality assessment. This study investigates the composition of serum used in EQA materials by comparing material from a single and multiple donors (pooled material), across multiple methods, using creatinine as an example.
Methods: Sixteen different serum matrices were distributed as 36 specimens through the UK NEQAS for Acute and Chronic Kidney Disease Scheme from March 2022 to March 2023. Male-only and female-only serum was used as single donations, pooled donations, unmanipulated or with added exogenous creatinine. Specimens were distributed to primarily UK participants (approximately n=500) for creatinine analysis. Data were reviewed by method and compared to the enzymatic creatinine method principle mean.
Results: From the 16 different matrices, only the enzymatic creatinine assay systems from Roche Cobas and Siemens Atellica met the minimum acceptable bias goal, from biological variation data, of 5.6 %, in all specimens. Pooled material showed less variation in bias across all methods.
Conclusions: Since laboratories invest a lot of time and money in quality management, they need to know the limitations of their assays so that they are not investigating 'apparent' EQA/IQC problems which are purely due to a non-specific, imprecise assay, rather than an analytical issue in their laboratory. When large numbers of individual donations are combined, interferents are essentially diluted out. Therefore, if EQA material is of this type it will be very difficult to determine the actual assay's bias and variability.


Introduction
Modern laboratories produce a large number of test results, from a wide range of assays, that are used every day in decision making in healthcare. Therefore, in an economically and operationally constrained healthcare environment, it is ever more important that resources are used appropriately to deliver the best possible outcomes for patients. This applies not only to the selection of assays before implementation, day-to-day test requesting and operational protocols, but also to maintenance of the quality of service that is being provided over time. Laboratories are increasingly restricted in their choice of assay, for any analyte, as they are more often than not tied to long contracts with a single provider.
As part of ISO 15189 accreditation, laboratories are required to verify all assays before implementation [1]. This includes all parts of the manufacturers' validation claims. Manufacturers are usually clear about their assay performance in their 'kit inserts' (Instructions for Use [IFU]). However, just because it is documented and the assay is CE/UKCA marked and used by other laboratories, this does not mean that it is analytically fit for the purpose that the laboratory intends to use it for. Ongoing verification of an assay's fitness for purpose is usually maintained by regular Internal Quality Control (IQC) and External Quality Assessment (EQA). There is no single standard that sets minimum requirements for a laboratory's IQC or EQA performance, nor is there a single standard for production of IQC or EQA material. ISO/IEC 17043:2010 (and now ISO/IEC 17043:2023) sets out standards for proficiency testing, but this does not provide definitive guidance for EQA scheme design and as such EQA schemes operate with varying numbers of specimens, frequency, specimen matrix and composition. This will all impact on the usefulness of the data, whether it is a snapshot of the laboratory's performance at that point in time, or whether it reflects the overall performance as part of post market surveillance. An opinion paper was published in 2017 on the analytical performance specifications for EQA, but once again only guidance was provided [2].
EQA material can be prepared using large volumes of serum that have been pooled and manipulated to achieve different concentrations of all analytes in question covered by the EQA scheme, or alternatively single donations of serum can be dispatched with little or no manipulation. Either method of preparation can and will impact the quality of the data collected by the EQA provider which may in turn impact its usefulness and effectiveness for the laboratory.
IQC and EQA data are good sources of information about the performance of assays. Both have limitations which the laboratory needs to understand and appreciate. IQC is aimed at real time decisions, for example whether a batch of results can be released, but it also looks at imprecision and repeatability over a long period of time. As such, by necessity, IQC material is prepared in very large volumes to allow this consistency and so consequently running in new lots is time consuming and expensive. Just because an IQC is of a serum matrix this does not necessarily mean that it is directly comparable to clinical specimens due to the pooling of large volumes and any other manipulation that it may have undergone to ensure its stability.
There is a drive to ensure that EQA material is commutable with patient specimens, i.e. the EQA material 'behaves' the same way as a patient specimen would on an assay system. Formal commutability studies require the pairwise analysis of both patient specimens and EQA material on two or more methods; if the same method related biases are seen on all materials, the EQA material is commutable for these methods [3-5]. However, if a method is not specific or sensitive for the analyte in question, testing for the 'commutability' of the material in question is irrelevant as the method does not have sufficient selectivity.
Assays for creatinine have been part of the foundation of laboratory medicine for the last century [6,7]. However, the clinical utility of the creatinine result has evolved from a simple marker of renal disease severity [8,9] to being included in derived equations, such as estimated glomerular filtration rate (eGFR) [10,11], and used in algorithms, such as that for acute kidney injury (AKI) where ratios are calculated with the patient's lowest creatinine result being used as the denominator [12,13]. Clinical trials routinely require the measurement of a subject's renal function throughout the trial. All these situations require creatinine assays to be accurate at both high and low concentrations, have low imprecision and, crucially, be analytically stable over time.
Most manufacturers offer both a compensated kinetic Jaffe and an enzymatic creatinine assay. The limitations of the kinetic Jaffe creatinine assay, whether compensated or not, are well known and mainly relate to the poor specificity of the assay for creatinine. This means that the assay is often subject to interference not only from haemolysis, icterus and lipaemia, but also from other 'unknown chromogens' with similar wavelengths of absorbance and from assay architecture co-reactants such as glucose, ketones, protein etc. [14-17]. However, whilst laboratories continue to purchase and routinely use the compensated kinetic Jaffe creatinine assays, manufacturers will continue to supply them. There have been many publications which have examined the differences between the compensated kinetic Jaffe and enzymatic creatinine assays. The general consensus is that the enzymatic creatinine assays are preferable to their compensated kinetic Jaffe counterparts [15-18] and UK NEQAS has been demonstrating and saying this since 2005.
In 2022, data from the UK NEQAS for Acute and Chronic Kidney Disease Scheme indicated that approximately 35 % of UK laboratories were still using a compensated kinetic Jaffe assay despite multiple recommendations that an assay more specific for creatinine should be used. This is even with recommendations for serum creatinine assays to be improved [19] and for laboratories to use creatinine assays that are specific for creatinine (for example, enzymatic assays) and zero-biased compared with isotope dilution mass spectrometry (IDMS) [20]. It should be pointed out that enzymatic creatinine assays are not without their own limitations and there have been a number of publications highlighting these, for example interference from high concentrations of lithium heparin (under-filled tubes) [21], high concentrations of N-acetylcysteine [22], IgM paraproteinaemia [23] and haemolysis [24].
Laboratories can, and do, spend a considerable amount of time and money performing IQC and EQA and they should be taking their choice of assay and any specificity issues into consideration when evaluating assay bias and imprecision. An assay can only ever be as good as what it was designed for. This is true for all assays and laboratories should be risk assessing these limitations for patients in terms of the population that they serve.
The purpose of this study was to investigate the composition of serum used in EQA materials by comparing material from a single donor, multiple donors (pooled material) and the spiking of a single analyte into pooled material, across multiple methods. Creatinine has been used as an example due to the clinical importance of this assay and the wide application of creatinine results. The UK NEQAS for Acute and Chronic Kidney Disease Scheme was utilised for the study as it allowed a large number of laboratories and methodologies to be included over a number of matrices and samples.

Specimen preparation
From March 2022 (Distribution 185) to March 2023 (Distribution 196) a total of 36 liquid, serum specimens were distributed for creatinine analysis as part of the UK NEQAS for Acute and Chronic Kidney Disease EQA Scheme for the determination of eGFR.
All serum was sourced either as off-the-clot serum from NHS Blood and Transplant, Filton, UK or commercially procured from TCS Biosciences Ltd., Buckingham, UK. Serum was received frozen and either used as an 'individual donation' (after being filtered to 1 μm, using a glass fibre filter paper, Merck, Gillingham, UK), or used as 'pooled material' with serum donations from donors with the same blood group being mixed together, and then filtered as above. [Using serum from the same blood group is one of Birmingham Quality's protocols to minimise the potential of micro clots which could impact a laboratory's analytical result.] Higher concentrations of creatinine were achieved by the addition of anhydrous creatinine, Merck, Gillingham, UK (up to a maximum concentration of ∼300 μmol/L) and during this time period one specimen was diluted using 5 % w/v human serum albumin (HSA) (Pharmacy Grade).
Six individual donations (three male and three female) were distributed, un-manipulated, as six of the 36 specimens. Ten different serum 'pools' (four male and six female) were distributed as the remaining 30 specimens. These 'pools' were either un-manipulated (n=4) or had added creatinine (n=25); one specimen was diluted with 5 % w/v human serum albumin. Pool volumes ranged from ∼1.25 L (eight individual donations) to ∼4.5 L (30 individual donations). In total there were 16 different serum matrices distributed throughout the year. This frequency of EQA specimens distributed is at the high end of what is typically used in Europe and the USA.
Three specimens are distributed, notionally monthly, to all participants enrolled in the EQA Scheme. Specimens are stored frozen until dispatch and are then transported at room temperature (the majority arrive within 24 h). Participants are advised to analyse them immediately, as if they were from a patient. For each specimen, participants report the creatinine concentration determined back online (manually or automatically) to the Scheme at Birmingham Quality. Participants are also able to check the method that they are using and can change it if required to ensure that they are registered for the correct method.

Data analysis
In the UK NEQAS for Acute and Chronic Kidney Disease Scheme, outliers are removed by Healy trimming, which is applied to all data [25]. This essentially leads to the highest and lowest 5 % of specimen level data being removed from statistical calculations of means and CVs.
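As a rough illustration of the trimming described above, the sketch below drops the lowest and highest 5 % of results for one specimen before computing the mean, SD and CV. It is a simplified symmetric trim and not necessarily the exact Healy procedure cited in [25]; the function name `trimmed_stats`, the `trim_percent` parameter and the example results are hypothetical.

```python
import statistics


def trimmed_stats(results, trim_percent=5):
    """Mean, SD and CV (%) after a simple symmetric trim.

    Assumption: a plain 'drop the lowest and highest trim_percent of
    results' rule stands in for Healy trimming.
    """
    ordered = sorted(results)
    # Number of results dropped from each tail (integer arithmetic).
    k = len(ordered) * trim_percent // 100
    kept = ordered[k:len(ordered) - k]
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    cv = 100.0 * sd / mean
    return mean, sd, cv


# Hypothetical results (umol/L) for one specimen, with two gross outliers.
example_results = [10] + [99, 101] * 9 + [500]
example_mean, example_sd, example_cv = trimmed_stats(example_results)
print(f"mean={example_mean:.1f} umol/L, SD={example_sd:.2f}, CV={example_cv:.2f} %")
```

With 20 results, one value is removed from each tail, so the two outliers no longer influence the mean or the CV.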
Creatinine results are evaluated at both method principle (for example compensated kinetic Jaffe or enzymatic) and at the manufacturer analyser level (Abbott Alinity, Roche Cobas etc.). A mean, standard deviation and coefficient of variation are calculated for each method. However, the target value for every laboratory, irrespective of the method that they are using for creatinine, is the enzymatic creatinine method principle mean, which is calculated from results from specific methods. The UK NEQAS for Acute and Chronic Kidney Disease Scheme uses the enzymatic creatinine method principle mean as the target for two reasons: (1) the results are available in real time and (2) the number of specimens that are distributed (three, monthly) would make it a very expensive EQA service if every specimen had a reference method value. The enzymatic creatinine method principle means are periodically checked by the analysis of specimens using a Reference Method accredited to ISO/IEC 17025:2017 to give further confidence in the targets, and there is good agreement; therefore there is no justification for undertaking the additional work of obtaining a reference method value for every specimen. This is continually reviewed and if the enzymatic creatinine assays were to change the Scheme would also change to reflect this.

Accuracy
The target value for creatinine in the UK NEQAS for Acute and Chronic Kidney Disease Scheme is the enzymatic creatinine method principle mean and approximately 250 results have contributed to it for each of the 36 specimens. The enzymatic creatinine method principle mean has been validated, during this time period, by analysis of 16 specimens by an isotope-dilution mass spectrometry reference method. The correlation between the enzymatic creatinine method mean and the reference method has an R² of 0.9989, and at a creatinine concentration of 100 μmol/L the difference between the two is only 0.68 % (100.7 μmol/L).
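Agreement statistics of this kind can be reproduced in outline with an ordinary least-squares fit of the method mean on the reference value. The sketch below assumes simple linear regression (the text does not state which regression model was used), and the paired values and function names are invented for illustration.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def percent_difference_at(concentration, slope, intercept):
    """Predicted method mean vs reference at one concentration, in %."""
    predicted = slope * concentration + intercept
    return 100.0 * (predicted - concentration) / concentration


# Hypothetical paired values (umol/L): reference method vs enzymatic mean.
reference = [50.0, 75.0, 100.0, 150.0, 200.0, 300.0]
enzymatic_mean = [50.6, 75.4, 100.8, 150.9, 201.5, 301.2]

slope, intercept, r2 = linear_fit(reference, enzymatic_mean)
print(f"R^2 = {r2:.4f}")
print(f"difference at 100 umol/L = {percent_difference_at(100.0, slope, intercept):.2f} %")
```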
Figure 1 shows, by method, the % bias for each specimen from Distribution 185 to Distribution 196. These data and a summary are also tabulated in Table 1. Table 1 shows details for each specimen as well as the method bias for each specimen compared to the enzymatic creatinine target value. The summary at the bottom of the table gives the minimum, maximum and median number of data points that have contributed to the method mean from which the bias has been calculated, as well as the minimum, maximum and median biases. The number of individual specimens and individual matrices assessed against the minimum acceptable bias for creatinine of 5.6 % is also given. The minimum acceptable bias for creatinine using analytical performance specification data from biological variation is 5.6 %. This has been chosen as this is the performance specification that has been agreed and is used in the UK NEQAS for Acute and Chronic Kidney Disease Scheme. Though analytically it may be better to use desirable analytical performance specifications from biological variation data or even outcome study data, pragmatically we have found that participant laboratories do not engage with us if their targets and goals are perceived to be unobtainable or have to be met in a single step [26]. The enzymatic creatinine assays have a more consistent bias across the concentration range (the bias varies between manufacturers), whereas the compensated kinetic Jaffe creatinine assays show more variation in bias at lower concentrations.
All compensated kinetic Jaffe creatinine assays had at least one specimen that exceeded the 5.6 % minimum acceptable bias, with 6/16 serum matrices being impacted for both the Roche Cobas and Siemens ADVIA compensated kinetic Jaffe assays. The enzymatic creatinine assays showed less bias, with both the Roche Cobas and Siemens Atellica enzymatic assays having method means for all 16 serum matrices within the 5.6 % minimum acceptable bias. There were some individual specimens where the bias was within 5.6 % for all method/manufacturer combinations, but this was only ever seen in pooled material, for example Specimens 185A, 187B and 196C.
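The bias assessment used throughout this section amounts to a percentage difference from the target, compared with the 5.6 % limit. A minimal sketch follows, assuming the limit is applied to the absolute % bias; the matrix labels, concentrations and function names are hypothetical.

```python
MIN_ACCEPTABLE_BIAS = 5.6  # %, minimum acceptable bias quoted in the text


def percent_bias(method_mean, target):
    """Bias of a method mean relative to the target value, in %."""
    return 100.0 * (method_mean - target) / target


def matrices_outside_limit(method_means, targets, limit=MIN_ACCEPTABLE_BIAS):
    """Return the matrix labels whose absolute % bias exceeds the limit."""
    return [
        label
        for label, mean in method_means.items()
        if abs(percent_bias(mean, targets[label])) > limit
    ]


# Hypothetical targets (enzymatic method principle means) and one
# hypothetical method's means, both in umol/L, keyed by matrix label.
targets = {"M1": 60.0, "M2": 75.0, "M3": 150.0}
method_means = {"M1": 64.5, "M2": 78.0, "M3": 152.0}

for label in targets:
    print(label, f"{percent_bias(method_means[label], targets[label]):+.1f} %")
print("outside limit:", matrices_outside_limit(method_means, targets))
```

In this invented example only the lowest-concentration matrix falls outside the limit, because the same absolute difference is a larger percentage of a lower target.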

Imprecision
Imprecision profiles for all methods from Distribution 185 to Distribution 196 are shown in Figure 2. The minimum acceptable analytical imprecision for creatinine using analytical performance specification data from biological variation is 3.4 % [26] and this is highlighted on each graph. The imprecision is numerically lower for the enzymatic assays across the concentration range (0-300 μmol/L). A minimum number of data points of seven were used to calculate analytical imprecision; the maximum number of data points was over 170.

Single donations
Six of the specimens were derived from single donations of serum (Attribute S: 186A, 186B, 186C, 195A, 195B and 195C). Though formal commutability studies have not been performed, these are considered the most representative material compared to routine clinical specimens. Reference Values were obtained on all six of these donations (creatinine concentration 51.0-93.4 μmol/L) and in all cases there was good agreement with the enzymatic creatinine method mean (R² of 0.9972). None of the six donations showed every manufacturer method mean to be within the minimum acceptable bias of 5.6 % (Table 1). Four out of the six donations did meet the minimum acceptable bias for all manufacturers' enzymatic creatinine methods. One limitation of the study is that creatinine was not added to single donations. This was because we wanted material to be used in its most native form, but this has limited evaluation of data with regard to added creatinine.
In Table 1, any bias values outside the 5.6 % limits have been printed in bold italics.

Pooled donations
Pooled material was distributed with or without added creatinine. In total, four different pools of serum without added creatinine (Attribute P) and seven different pools with added creatinine (Attribute PX; the base for one of these pools was distributed as one of the endogenous pools) were distributed. Review of the data from the four different endogenous creatinine only pools (188A, 190A, 190B and 190C) showed that the Roche Cobas compensated kinetic Jaffe creatinine assay failed to meet the minimum acceptable bias of 5.6 % in three out of four of the specimens, the Beckman AU compensated kinetic Jaffe creatinine assay failed on one specimen and the Siemens ADVIA enzymatic creatinine assay failed on one specimen.
From the seven different pooled materials that had different amounts of creatinine added, no single material achieved the minimum acceptable bias criteria for all assays. Overall there is less of an impact on bias in the pooled material with added creatinine, but this is likely to be because the creatinine concentration is higher and assays have better imprecision at higher concentrations. It can be seen in Figure 3 that there is a higher degree of variation between methods, both for compensated kinetic Jaffe and enzymatic method principles, in the three single donations of 'normal creatinine concentration' relative to pooled material also of 'normal creatinine concentration'. When creatinine is added into serum to achieve higher concentrations (Figure 3(G-I)) the relative method bias is mainly related to the base material, not the added creatinine. We know that the imprecision of creatinine assays decreases at higher concentrations, therefore this may explain why there is little impact on relative method bias as creatinine concentration increases.

Reagent variability
Though we did not collect reagent/calibrator lot number information from participants for their creatinine assay, it can be assumed that there would have been changes to both over the time period of the study. There may have been multiple lots of reagent in use at any one time for any/all manufacturers. Figure 4 shows for each manufacturer the box and whisker spread of B Score (Bias Score: the average of 18 individual specimen % biases over a rolling time window of six months) against time, which is represented as the distribution number on the x-axis. Notionally the distributions are one month apart, therefore the x-axis covers a five year period. Plots such as those shown in Figure 4 give an indication of the spread and variation in biases for individual methods at any one time. It is likely that it is the differences in reagent/calibrator lots that contribute to this. For some manufacturer/assay combinations, such as the Roche Cobas enzymatic creatinine assay, the assay bias and spread of participant results have remained relatively stable; whereas the Siemens ADVIA enzymatic creatinine assay has shown a change in B score, from >−9 % to approximately −3 % at Distribution 196 (March 2023). The spread of data for the Siemens ADVIA compensated kinetic Jaffe creatinine assay is much tighter at Distributions 192-196 than those previous.
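The B score described above can be sketched as a rolling mean. The example assumes three specimens per distribution and a window of six distributions (giving the 18 specimen biases mentioned in the text), and applies no further trimming or weighting; the Scheme's actual calculation may differ. The function name and the bias history are hypothetical.

```python
def rolling_b_scores(biases_by_distribution, window=6):
    """Rolling mean of specimen % biases over `window` distributions.

    biases_by_distribution: list (oldest first) of lists of % biases,
    one inner list per distribution. Returns one score per distribution
    once a full window is available.
    """
    scores = []
    for end in range(window, len(biases_by_distribution) + 1):
        recent = biases_by_distribution[end - window:end]
        flat = [bias for distribution in recent for bias in distribution]
        scores.append(sum(flat) / len(flat))
    return scores


# Hypothetical % biases for one laboratory: 8 distributions x 3 specimens.
history = [
    [-9.0, -8.5, -9.5],
    [-9.0, -9.0, -9.0],
    [-8.0, -8.5, -7.5],
    [-6.0, -6.5, -5.5],
    [-5.0, -4.5, -5.5],
    [-4.0, -3.5, -4.5],
    [-3.0, -3.0, -3.0],
    [-3.0, -2.5, -3.5],
]
for score in rolling_b_scores(history):
    print(f"B score = {score:+.2f} %")
```

Because each score averages six distributions, a step change in bias (for example after a reagent lot change) appears in the B score as a gradual drift over the following months rather than as an abrupt shift.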

Discussion
Through the UK NEQAS for Acute and Chronic Kidney Disease Scheme, over the 12 month period March 2022 to March 2023, 36 specimens have been distributed. These comprised 16 different serum matrices, distributed as single donations, pooled donations or pooled donations with added creatinine. Variation in relative method biases has been observed which is not consistent between methods on different serum base material. This is evident on both single and pooled donations; however, there is some consistency when the same base material is used with different amounts of creatinine added.
As expected, the compensated kinetic Jaffe assays are those mainly impacted and though this is not a new phenomenon, it is disappointing that laboratories are continuing to purchase and use these assays. Those laboratories need to be aware that the specificity deficiencies of those assays will impact IQC and EQA as well as their patients. Laboratories should always be taking into consideration their service requirements and focus on the concentrations that are clinically relevant for them. In the case of creatinine, good accuracy and reproducibility at low concentrations is as important as, or potentially more important than, that at elevated concentrations, as these results are included as a denominator in algorithmic calculations such as that for AKI.
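The point about low concentrations can be illustrated numerically. In the sketch below, a concentration-dependent bias (larger at the low baseline than at the higher current result) does not cancel in the ratio. All values are invented, and the function is a bare ratio, not an implementation of any specific AKI algorithm.

```python
def measured(true_value, percent_bias):
    """Apply a % bias to a true concentration."""
    return true_value * (1.0 + percent_bias / 100.0)


def creatinine_ratio(current, baseline):
    """Current result divided by the patient's lowest (baseline) result."""
    return current / baseline


# Hypothetical true concentrations (umol/L).
true_baseline, true_current = 50.0, 80.0

# Hypothetical assay: +8 % bias at the low baseline, +2 % at the higher result.
ratio_true = creatinine_ratio(true_current, true_baseline)
ratio_measured = creatinine_ratio(
    measured(true_current, 2.0), measured(true_baseline, 8.0)
)

print(f"true ratio     = {ratio_true:.2f}")
print(f"measured ratio = {ratio_measured:.2f}")
```

A bias that is the same percentage at both concentrations cancels in the ratio; it is the difference in bias between low and high concentrations that shifts the result.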
It is impossible to routinely identify inaccurate results in patient specimens; however, IQC and falsely ascribed EQA 'errors' will be identified and could lead to unnecessary investigation which is costly both financially and in laboratory time. The laboratory needs to be aware of the limitations of all assays that they are using and take these into account in EQA review, as they would for their risk assessment in patient management.
In this study, lack of commutability of EQA material cannot be excluded as no formal commutability studies have been undertaken; however, minimally manipulated fresh frozen human serum is more likely to behave like clinical specimens than, say, delipidated, defibrinated plasma. The use of frozen material could still be considered a limitation of the study as patient specimens are likely to be analysed fresh.
Use of pooled serum which has been highly manipulated may give false reassurance about the performance of the assay which would not be replicated in patient specimens, due to the dilution of any unknown chromogens with similar wavelengths of absorbance or other interfering compounds. The laboratory needs to be aware of assay specificity and imprecision issues and should consider the type of base material(s) that are provided in EQA schemes as well as the frequency and concentration range covered. Interfering substances have not been discussed in depth as part of this work, but the laboratory should appreciate that they are not theoretical possibilities; they are a reality in the clinical specimens in laboratories. A single base material which has been highly manipulated, or a low frequency scheme, will not adequately assess the quality and performance of their assays. This is not only crucial for an individual laboratory, but also forms the basis of post market surveillance which is essential for all laboratories to ensure that only the best and most suitable assays are available on the market.
Though this paper uses creatinine as an example, the generic principles that have been discussed with regard to EQA Scheme design hold true for all analytes and methods. ISO 15189:2022 is written as a risk based approach for obtaining the best for patient care. EQA as one of the tools of quality management should be fully utilised, where appropriate with well designed schemes, to assist with risk management and not seen as an additional parallel process.

Conclusions
It is only the intensive scheme design that allows these types of observation to be made. It would be impossible for a low frequency scheme (because of insufficient data points) or a large participant scheme (where sample volumes mean that single donation or oligo donation pools cannot be used) to tease out these subtleties. Similarly, any scheme that has a large number of analytes for evaluation within a single

Figure 1: Bias plot for serum creatinine by method for specimens distributed from March 2022 (Distribution 185) to March 2023 (Distribution 196). The minimum acceptable bias for creatinine using analytical performance specification data from biological variation is 5.6 %.

Figure 2: Imprecision profiles for serum creatinine for all methods from March 2022 (Distribution 185) to March 2023 (Distribution 196). The minimum acceptable analytical imprecision for creatinine using analytical performance specification data from biological variation is 3.4 %.

Figure 3: Relative method bias for (A-C) three individual single donations, (D-F) three different pooled donations and (G-I) three pools from the same pooled base with added creatinine. Each individual method bias has been calculated against the enzymatic creatinine method mean.

Figure 4: Box and Whisker B score trend plots for serum creatinine for all methods from October 2017 (Distribution 137) to March 2023 (Distribution 196). The B score is the bias score calculated from the average of 18 individual specimen biases over a six month time period.

Table 1: Summary table of method mean % biases for