Functional reference limits: a case study of serum ferritin

Abstract
Reference intervals depend on the distribution of results within a reference population and can be influenced by subclinical disease. Functional reference limits present an opportunity to derive clinically relevant reference limits from routinely collected data sources, which consist of mixed populations of unhealthy and healthy groups. Serum ferritin is a good example of the utility of functional reference limits. Several studies have identified clinically relevant reference limits through examining the relationship between serum ferritin and erythrocyte parameters. These ferritin functional limits often represent the inflection point at which erythrocyte parameters change significantly. Comparison of ferritin functional reference limits with those based on population distributional reference limits reveals that the lower reference limit may fall below the point at which patients become clinically unwell. Functional reference limits may be considered for any biomarker that exhibits a correlated relationship with other biomarkers.


Introduction
Reference intervals are often a representation of the 'non-diseased' population and are reported as the 2.5th and 97.5th percentiles [1,2]. Sampling a non-diseased population can be challenging, as any sampling approach to derive reference intervals will inherently include unhealthy subjects within the population [3]. Although various study designs and statistical approaches can be utilized to prevent such bias from being reflected in reference intervals, populations with a high incidence of subclinical disease are particularly challenging. Additionally, clinical conditions that exist across a wide spectrum of severity, from physiological to pathological, make it hard to clearly delineate the subpopulations to be excluded.
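As a minimal sketch of the percentile definition above, the following Python snippet computes a nonparametric 2.5th to 97.5th percentile interval and shows how mixing low results into the sample pulls the lower limit down. The values are synthetic placeholders rather than data from any cited study, and `reference_interval` is a hypothetical helper name.

```python
import statistics

def reference_interval(values):
    """Return the (2.5th, 97.5th) percentiles of a list of results."""
    # n=40 yields 39 cut points spaced 2.5% apart: the first is the
    # 2.5th percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

# Synthetic, evenly spread "healthy" results (arbitrary units).
healthy = list(range(20, 220))
# The same population with a group of low, subclinical results mixed in.
mixed = healthy + list(range(5, 20))

lower_healthy, upper_healthy = reference_interval(healthy)
lower_mixed, upper_mixed = reference_interval(mixed)
print("healthy only:", lower_healthy, upper_healthy)
print("with subclinical results:", lower_mixed, upper_mixed)
```

In this toy sample the lower limit drops once the low results are included, which is the kind of bias a distribution-based interval cannot detect on its own.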
Serum ferritin is often reported as having lower reference intervals in females compared to males [4]. However, harmonization studies reporting such differences suggest the possibility of subclinical diseases influencing the reference intervals [5]. As shown in Table 1, serum ferritin reference intervals from various studies are persistently lower for females. This notion was the focus of a review by Rushton and Barth [4], who suggested that the difference between genders for the serum ferritin reference intervals is due to the high incidence of subclinical iron deficiency among the general adult female population. The extent of any subclinical disease in a population is generally unknown and its impact therefore difficult to estimate. Iron deficiency is a global health concern and is highly prevalent among the general female population [6] and difficult to identify [7].
Often, differences between sexes are conditional on age: certain age groups may show larger differences than others. Serum ferritin reference intervals often differ by both sex and age, with lower levels among females during adolescence and early adulthood, compared to childhood and older ages [8]. One of the foremost theories is that the onset of menses produces the differences observed. Opinion varies regarding levels of serum ferritin observed during the stages of the menstrual cycle. A study by Kim et al. [9] showed different levels of serum ferritin during different stages of the menstrual cycle. In contrast, while the study by Belza et al. [10] had fewer subjects, their study design included longitudinal sampling of subjects and suggested that a single blood sample at any time in the cycle would identify iron deficiency. How hormonal changes with menstruation influence serum ferritin reference intervals is not clear. It is possible that the effect is negligible, and that the reported reference intervals, being lower for adolescent and adult women of reproductive age, are an accurate reflection of physiological changes. As shown in Figure 1, percentiles among females begin to decline at ages corresponding to the onset of menses. It is also possible that women have a greater prevalence of subclinical iron deficiency due to not replenishing iron lost during the menstrual cycle or previous childbirth, which will inevitably be reflected in the percentiles calculated as reference intervals [4]. Another potential reason for the age and gender difference in serum ferritin levels could be differences in iron absorption, although more studies comparing iron absorption differences among adults are needed [11].

Functional reference limits
Statistical analysis is limited to the available information, and although a small number of outliers can be addressed, the inclusion of a high number of pathological results is difficult to address reliably. Statistical analysis based on population distributions could therefore be vulnerable to biases within the population distribution and could result in reference intervals that are not clinically "normal", even though they may be statistically normally distributed. Additionally, transformations to achieve normally distributed data need to be carefully performed to avoid further inadvertent inclusion of pathological results. An emerging approach to deriving reference intervals utilizes both diseased and non-diseased populations to determine "functional" reference limits. Here, the word "functional" can describe both biological and statistical relationships. Physiological processes occur as functional relationships, with changes in one component affecting other related component(s). Homeostasis is a physiological state where the interactions between biomarkers occur within normal operating parameters; with disease, these physiological interactions become pathological and the components of physiological processes show correlated disturbances. A functional reference limit can be defined as the numerical threshold at which the relationship between physiologically related biomarkers changes significantly. Physiologically, it may represent a functional change in physiologic state (e.g., failure of homeostasis) or pathology. Statistically, this is often represented by a change in curvature of the function curve. Often, the functional relationship is specific for a particular clinical condition/physiology.
Functional reference limits are calculated by correlating two interrelated biomarkers, displaying the relationship visually in a graph, and mathematically modeling the correlation to locate the point where the relationship becomes non-linear, as displayed in Figure 2.
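One simple way to illustrate this idea is a "linear-plateau" model, in which the dependent biomarker rises linearly up to a breakpoint and is flat afterwards, with the breakpoint chosen by grid search on the residual sum of squares. This is an illustrative technique chosen for brevity, not the model used in any of the studies discussed here. The data are noise-free and synthetic, and the function names are hypothetical.

```python
def fit_line(z, y):
    """Ordinary least squares for y = a + b*z; returns (a, b, sse)."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    sxx = sum((zi - mean_z) ** 2 for zi in z)
    sxy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    b = sxy / sxx
    a = mean_y - b * mean_z
    sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
    return a, b, sse

def plateau_breakpoint(x, y, candidates):
    """Grid-search the breakpoint c of the model y = a + b*min(x, c)."""
    best = None
    for c in candidates:
        z = [min(xi, c) for xi in x]  # rises up to c, flat afterwards
        a, b, sse = fit_line(z, y)
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    return best[1], best[2], best[3]

# Noise-free synthetic data: the second biomarker rises with the first
# up to a value of 15 and then plateaus.
ferritin = list(range(1, 61))
hemoglobin = [100 + 2 * min(f, 15) for f in ferritin]

limit, intercept, slope = plateau_breakpoint(ferritin, hemoglobin, range(2, 60))
print("estimated breakpoint:", limit)
```

With real laboratory data the curve is rarely this sharp, which is why smoother models such as splines or fractional polynomials are usually preferred.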
Functional reference limits are particularly suitable for serum ferritin, with clinical relevance to iron deficiency anemia. The earliest ferritin reference intervals were derived from ferritin levels at which an absence of stainable iron stores in bone marrow aspirate (considered the gold standard) is observed, which could be regarded as the earliest example of a "functional reference limit" [17]. Bone marrow sample collection is difficult and the assessment of stainable iron stores is subjective. In contrast, a functional ferritin reference limit can be derived from modern ferritin assays against well-characterized, automated measurements of erythrocyte parameters that are easily extracted from laboratory databases, and can account for the confounding influences described previously. Such a limit would provide a sound evidence base for a functional ferritin threshold for iron deficiency.
A decrease in serum ferritin is followed by similar decreases in erythrocyte biomarkers, as iron deficiency progresses into anemia [7]. In treatment of iron deficiency anemia, hemoglobin increases with increasing serum ferritin, up to a serum ferritin threshold above which the hemoglobin concentration "plateaus". This threshold or pseudo-plateau point can be described as the functional reference limit, as shown in Figure 3. Åsberg et al. [18] described the functional correlation between hemoglobin and serum ferritin for ambulant patients, finding an abrupt downward shift in hemoglobin concentrations when serum ferritin fell below 20 μg/L. Abdullah et al. [19] assessed the functional correlation between serum ferritin and hemoglobin for children aged between 12 and 36 months, reporting that functional iron deficiency may occur at serum ferritin values <17.9 μg/L, based on the deteriorating levels of hemoglobin. Markus et al. [20] correlated serum ferritin with several erythrocyte parameters: hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin concentration, mean corpuscular hemoglobin, and red cell distribution width. From their observations, they proposed 26 μg/L for children between 4 months and <13 years, and at least 39 μg/L for adolescents as lower functional limits. Foy et al. [21] used adult serum ferritin and erythrocyte results from hospitals, reporting an approximately five percent decline in erythrocyte indices between 10 and 25 ng/mL serum ferritin. Similarly, Sezgin et al. [22] utilized pediatric serum ferritin results and correlated erythrocyte results to calculate a functional reference limit for serum ferritin, finding a "hematological plateau" occurring near 10 μg/L (Figure 3).
Before computational resources were as advanced as they are today, Agha et al. [23] illustrated the correlation between transferrin saturation and serum ferritin as a scatterplot. Although they did not define it as a functional reference limit, the authors found that serum ferritin levels below 10 ng/mL are associated with a transferrin saturation of less than 15%. Åsberg et al. [18] had a much larger dataset and used quantile regression modeling to calculate the 10th, 50th, and 90th percentiles for the relationship between hemoglobin and serum ferritin, in addition to multivariable regression splines to determine the simplest non-linear transformation for serum ferritin. These more advanced statistical processes are possible through the improved computational power available today.
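Quantile regression rests on the "check" (pinball) loss, which penalizes positive and negative residuals asymmetrically so that its minimizer is a chosen percentile rather than the mean. The sketch below shows only that core idea, minimizing the loss over a single constant instead of over a regression function of serum ferritin. It is not the implementation used by Åsberg et al., and the hemoglobin values are synthetic.

```python
def pinball_loss(residual, q):
    """Check (pinball) loss for one residual at quantile level q."""
    return q * residual if residual >= 0 else (q - 1) * residual

def best_constant(values, q):
    """Return the observed value that minimizes the total pinball loss."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, q) for v in values))

# Synthetic hemoglobin results (g/L), for illustration only.
hemoglobin = [98, 105, 110, 112, 118, 121, 125, 129, 133, 140, 151]

for q in (0.10, 0.50, 0.90):
    print(f"q={q:.2f}: {best_constant(hemoglobin, q)}")
```

In a full quantile regression the constant is replaced by a model of hemoglobin as a function of serum ferritin, and the same loss is minimized over the model coefficients.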
Abdullah et al. [19] used restricted cubic spline regression to determine the association between serum ferritin and hemoglobin, in their case, also adjusting for potential confounders. Utilizing the spline model, they defined the point at which the relationship plateaus as: the serum ferritin level which predicts the maximum increase in hemoglobin as a function of serum ferritin. Additionally, they correlated the hemoglobin cut-off value corresponding to anemia with serum ferritin.
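The second step described above, reading off the serum ferritin value that corresponds to a hemoglobin cut-off, amounts to inverting the fitted curve. The following sketch does this by bisection, assuming the fitted curve is monotonically increasing over the search range. The curve `fitted_hemoglobin` is a made-up placeholder rather than the spline model of Abdullah et al., and the 110 g/L cut-off is used purely as an example value.

```python
import math

def fitted_hemoglobin(ferritin):
    """Placeholder for a fitted model: a saturating curve in g/L (made up)."""
    return 125.0 - 30.0 * math.exp(-0.12 * ferritin)

def ferritin_at_cutoff(curve, cutoff, low=0.0, high=200.0, tol=1e-6):
    """Bisection search for the ferritin where an increasing curve equals cutoff."""
    if not (curve(low) <= cutoff <= curve(high)):
        raise ValueError("cutoff is outside the range of the curve")
    while high - low > tol:
        mid = (low + high) / 2.0
        if curve(mid) < cutoff:
            low = mid   # the crossing lies to the right of mid
        else:
            high = mid  # the crossing lies at or to the left of mid
    return (low + high) / 2.0

threshold = ferritin_at_cutoff(fitted_hemoglobin, cutoff=110.0)
print(round(threshold, 2))
```

For a non-monotonic fit, the search range would need to be restricted to the rising part of the curve before applying this approach.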
Markus et al. [20] fitted a non-linear quantile regression to the median, using three-parameter exponential equations, for the erythrocyte parameters as a function of ferritin. They derived their functional limits by determining the point at which the modeled hematological values reached a near asymptote (the point at which the pseudo-plateau occurred, in their case). Foy et al. [21] utilized cubic smoothing spline modeling, with their analysis stratified by age and gender. Their limits were defined as the serum ferritin level at which a greater than 5 or 10% decline occurred for erythrocyte parameters, with the exception of hemoglobin distribution width and red cell volume, for which an increase in these parameters relative to the population mean in the stable ferritin region (defined as 50-150 ng/mL) was designated as the limit. Sezgin et al. [22] fitted a fractional polynomial regression model to the median. Their functional reference limit was defined as a plateau occurring when the erythrocyte value difference between subsequent correlated ferritin values becomes less than one.
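Two of the limit definitions above can be sketched on an already fitted curve. The example below uses a three-parameter exponential form with made-up parameter values and evaluates it on integer ferritin values. It assumes that the percentage decline is measured against the mean of the modeled curve over the 50-150 stable region, and that "subsequent" ferritin values are one unit apart. Both are simplifying assumptions, and neither function reproduces the cited authors' code.

```python
import math

def modeled_value(ferritin, asymptote=130.0, span=40.0, rate=0.15):
    """Three-parameter exponential curve; the parameter values are made up."""
    return asymptote - span * math.exp(-rate * ferritin)

def percent_decline_limit(curve, stable_range=(50, 150), decline=0.05, grid_max=150):
    """Highest integer ferritin at which the curve sits more than `decline`
    below its mean over the stable region."""
    stable = [curve(f) for f in range(stable_range[0], stable_range[1] + 1)]
    reference = sum(stable) / len(stable)
    for f in range(grid_max, 0, -1):  # walk down from high ferritin
        if curve(f) < reference * (1.0 - decline):
            return f
    return None

def successive_difference_limit(curve, step=1, min_change=1.0, grid_max=150):
    """First ferritin at which the gain over the next step drops below min_change."""
    for f in range(1, grid_max):
        if curve(f + step) - curve(f) < min_change:
            return f
    return None

print("5% decline rule:", percent_decline_limit(modeled_value))
print("difference < 1 rule:", successive_difference_limit(modeled_value))
```

The `decline` argument can be set to 0.10 to apply the 10% variant of the first rule.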
Although the general principle remains the same, these studies use different approaches to determine the functional limits (Table 2). The first difference arises in the modeling of the results, and the second arises from the definition of the "limit" at which the functional decline or increase occurs. In determining functional reference limits for any biomarker, the first aim is to determine the shape of the association between the biomarkers, ideally through exploratory data analysis. In Figure 3, the scatterplot suggests a single inflection point. Further exploratory modeling through a median spline suggests that after this inflection point, a plateau-like shape occurs. Based on this information, a suitable model to describe this association is selected; in this case, a fractional polynomial regression model was deemed appropriate. In contrast, Abdullah et al. observed four to five knots rather than a single inflection point, with varying levels of change occurring within the pseudo-plateau [19]. In such a case, a restricted cubic spline model was deemed appropriate to model the association between serum ferritin and hemoglobin. Once the model is formed, it is used to determine the "inflection point", representing a point or region where a significant change occurs, which may be of statistical significance (e.g., plateau) and/or clinical significance (e.g., significant decline in hemoglobin in relation to changing serum ferritin values). The inflection point may be defined through various approaches, as listed in Table 2, including defining the inflection point as the serum ferritin value corresponding to a Hb value of 110 g/L [18,19,21]. Currently, there is no gold standard on the best approach to defining an inflection point, although the different approaches seem to produce comparable results.
Figure 3: Correlation between serum ferritin (μg/L) and hemoglobin (g/L) results for a female population between 13 and 18 years, used to calculate functional reference limits, from Sezgin et al. [22]. The scatterplot presents the raw test result values, while the solid line going through the scatterplot is the fitted quantile regression model. The lighter line closely following the quantile regression model is a median spline fitted to show the distribution of the scatterplot. The threshold at which the pseudo-plateau begins to occur is defined as the functional reference limit, which was designated as 10 μg/L serum ferritin. Vertical lines at 10, 15, and 20 μg/L serum ferritin are provided for reference. Reprinted with permission.
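The exploratory step discussed in this section, inspecting the shape of the association before choosing a model, can be approximated very simply with binned medians. This is a crude stand-in for a median spline rather than an equivalent of it. The paired results below are synthetic, and `binned_medians` is a hypothetical helper.

```python
import statistics
from collections import defaultdict

def binned_medians(x, y, width):
    """Median of y within consecutive x-bins of the given width."""
    bins = defaultdict(list)
    for xi, yi in zip(x, y):
        bins[int(xi // width) * width].append(yi)
    return {start: statistics.median(vals) for start, vals in sorted(bins.items())}

# Synthetic paired results: hemoglobin rises with ferritin, then levels off.
ferritin, hemoglobin = [], []
for f in range(1, 61):
    for offset in (-4, 0, 4):  # crude stand-in for biological scatter
        ferritin.append(f)
        hemoglobin.append(100 + 2 * min(f, 15) + offset)

width = 5
medians = binned_medians(ferritin, hemoglobin, width)
for start, med in medians.items():
    print(f"{start:>3}-{start + width - 1:<3} {med}")
```

A run of bins with near-identical medians suggests a plateau, which can then guide the choice between a single-inflection model and a more flexible spline.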
Conceivably, functional reference limits can be determined for biomarkers beyond serum ferritin, including calcium homeostasis and other micronutrient deficiencies/excesses. Vitamin D deficiency is also common among the general population. There is no clear consensus on the cut-offs to define vitamin D deficiency, and deficiencies present at varying levels by age and gender [24-26]. Functional reference limits can provide clinically relevant values through modeling the association between vitamin D, parathyroid hormone, and calcium [25,26], all of which play an important role in calcium and phosphate homeostasis, with implications for the development of calcium disturbances and osteoporosis [27]. Folate levels can vary considerably between different populations as a result of different dietary intakes [28,29]. Through the association between folate and homocysteine, a robust functional reference limit that is not influenced by dietary factors can be estimated [30].

Clinical use of functional reference limits
The main utility of functional reference limits lies in providing supporting evidence for reference intervals, and in serving as potential clinical decision limits for specific clinical conditions. In the case of serum ferritin, functional limits provide valuable information on the effect of inclusion of subclinical iron deficiency on female reference intervals, as well as providing a clinically relevant limit for iron deficiency anemia. Sezgin et al. correlated serum ferritin with various erythrocyte biomarkers, and determined a functional reference limit for each [22]. At the higher end point, a serum ferritin level near 10-11 μg/L was associated with a sharp decline in hemoglobin levels, which is suggestive of incipient iron deficiency anemia. At the lower end point, a serum ferritin level near 6-8 μg/L was correlated with declining mean corpuscular volume, which is indicative of microcytic anemia. Clinically, such information may provide insight into physiological/homeostatic limits and the severity of suspected iron deficiency. However, functional reference limits do not indicate when a patient becomes iron deficient, which would require either reference intervals or clinical decision limits.
An interesting observation when comparing serum ferritin functional reference limits to the distribution-based reference intervals (calculated by laboratories and harmonization studies) is that the functional limits are higher than the lower reference limit for serum ferritin (Tables 1 and 2), and this is more evident for females. Functional limits calculated for serum ferritin are determined by associations with erythrocyte parameters: the values derived are statistically relevant thresholds that represent the iron stores at which erythropoiesis starts to be impacted. As such, these functional limits are expected to be higher than population distribution-derived reference intervals.
It has been thought previously that reference intervals for females were too low [4], which may be an artifact resulting from the high prevalence of (subclinical) iron deficiency among the general female population [31]. When lower reference intervals for serum ferritin are used for assessing whether a patient may have iron deficiency, patients may be misclassified as being iron replete when in fact they may already have iron deficiency. Although cut-off values for serum ferritin coincide better with the functional limits [8] and are claimed to be used instead of reference intervals for diagnosing iron deficiency [32], the question of the accuracy of the lower serum ferritin reference interval remains. Differential diagnosis of iron deficiency based on clinical symptoms is difficult, and serum ferritin is the more useful indicator of this condition [7]. Insufficient iron levels are particularly debilitating for young children even before the onset of anemia, with deficiencies thought to be associated with cognitive developmental impairment [33]. If distribution-based percentiles are too low, as is suspected among young females, then iron deficiency may be underdiagnosed in the clinic. Sezgin et al. [22] compared distribution-based reference intervals to functional reference limits using the same pediatric population. Their results show that percentiles calculated from the same population can fall below the functional limit indicative of iron-deficient erythropoiesis. Similarly, comparing their reference interval study to their functional reference limit study, Parkin et al. concluded that the proportion of patients misclassified for iron deficiency may be higher when the lower reference interval is used instead of the functional reference limit [5].
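The potential for misclassification described above can be quantified as the share of results that fall at or above the lower reference limit but below the functional limit. The sketch below shows that arithmetic. The results and the lower reference limit of 5 μg/L are synthetic placeholders, and the functional limit of 10 μg/L is used only as an example value.

```python
def proportion_between(values, lower_reference_limit, functional_limit):
    """Share of results at or above the lower reference limit
    but below the functional limit."""
    if not values:
        raise ValueError("values must not be empty")
    flagged = [v for v in values if lower_reference_limit <= v < functional_limit]
    return len(flagged) / len(values)

# Synthetic serum ferritin results (μg/L), for illustration only.
ferritin_results = [4, 6, 7, 9, 11, 12, 14, 18, 25, 40]

share = proportion_between(ferritin_results, lower_reference_limit=5, functional_limit=10)
print(f"{share:.0%} of results lie between the two limits")
```

These are the patients who would be reported as within the reference interval while sitting below the threshold at which erythrocyte parameters begin to change.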

Functional reference limits and clinical decision limits
Functional reference limits and clinical decision limits share many similarities in their utility: they can both be used to determine whether a laboratory test result is indicative of a disease or deteriorating condition. Likewise, upper limits and lower limits for a biomarker are determined independently, unlike reference intervals, in which the two limits are related to the distribution of results within the population from which they are derived.
The main difference between functional and clinical limits lies in their approach to deriving the limit. Clinical decision limits distinguish the diseased population from the non-diseased population and arbitrarily determine an 'acceptable' point or threshold for disease (often based on sensitivity and specificity) [34]. Functional reference limits, on the other hand, use a pooled population of diseased and non-diseased subjects to statistically determine the point at which homeostasis is disturbed. Functional reference limits therefore do not necessarily have to have an association with disease: they can be used as an indicator of a deteriorating condition.
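For contrast with the functional approach, the sketch below derives a decision limit from sensitivity and specificity using Youden's index (sensitivity + specificity - 1). This is one common criterion, chosen here as an assumption; it is not the method of any study cited above. It requires each subject to be labeled as diseased or non-diseased in advance, which functional reference limits do not. All values are synthetic.

```python
def sensitivity_specificity(diseased, healthy, cutoff):
    """Classify results below `cutoff` as positive (low values indicate disease)."""
    true_pos = sum(1 for v in diseased if v < cutoff)
    true_neg = sum(1 for v in healthy if v >= cutoff)
    return true_pos / len(diseased), true_neg / len(healthy)

def youden_cutoff(diseased, healthy, candidates):
    """Return the first candidate cut-off with the largest Youden index."""
    def youden(c):
        sens, spec = sensitivity_specificity(diseased, healthy, c)
        return sens + spec - 1.0
    return max(candidates, key=youden)

# Synthetic biomarker results for labeled groups, for illustration only.
diseased = [3, 5, 6, 8, 9, 11, 13, 14]
healthy = [12, 16, 18, 22, 27, 35, 48, 60]

best = youden_cutoff(diseased, healthy, candidates=range(5, 31))
sens, spec = sensitivity_specificity(diseased, healthy, best)
print(best, sens, spec)
```

When several candidates give the same classification, `max` returns the first of them, so the reported cut-off is the lowest of the tied values.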
In the case of serum ferritin, a clinical decision limit may indicate when iron deficiency should be suspected, or when it is moderate or severe. Functional reference limits, as correlated with erythrocyte parameters, indicate at what level of serum ferritin the patient may become anemic as a result of iron deficiency.

Limitations of functional reference limits
Functional reference limits require biomarkers to have biological associations with others such that a change in one results in a change, or a cascade of changes, in others. Many hematological biomarkers exist in a homeostatic state with one another, often showing correlated relationships. An increase in cellular iron levels, for example, is associated with increased ferritin concentrations, and vice versa [35]. However, not all biomarkers have such associations and correlations with others, and some act largely without affecting other biomarkers. For example, prostate-specific antigen is currently not known to have any associations with other biomarkers that are indicative of a specific condition or disease. Functional reference limits therefore cannot be determined in such cases.
Unlike reference intervals, functional reference limits are independent of one another, such that a lower functional reference limit and an upper functional limit are calculated separately. In some cases, the correlated biomarkers used for calculating lower functional reference limits cannot be used to calculate upper functional limits and vice versa. For example, a lower functional reference limit for serum ferritin can be calculated through its association with hemoglobin during iron deficiency anemia. On the upper end, it is not clear if hemoglobin can be reliably correlated with serum ferritin. Initial investigations suggest a relationship may be present due to anemia of chronic disease, with an inverted-U shaped association. Researchers should therefore ensure that selected biomarkers have a homeostatic relationship with one another.
Sample size considerations also need to be taken into account when calculating functional reference limits. The number of samples required for accurate modeling is often higher than would be required for reference intervals calculated from non-diseased populations, as a mixed sample of healthy and unhealthy populations is required. A further challenge (also faced by reference interval calculation) is the requirement for partitioning, often for age and sex, which demands a larger overall number of samples. Not accounting for such differences could potentially lead to misclassification and poorer patient outcomes [36]. Often, different ages are grouped together based on the similarity of values between the ages [37,38]. However, age grouping can potentially mask normal physiological trends and result in inaccurate reference intervals, particularly if the age groups are too broad [39]. Functional reference limits can be calculated from information readily available in many laboratory information systems, which often contain a large sample of results. Researchers calculating functional reference limits should, however, ensure the sample sizes are appropriate for an accurate model to be fitted, particularly if age- and/or sex-partitioning is required.

Special consideration: pregnancy
Pregnancy is a time of numerous physiological changes, and daily iron requirements can increase from 1 to 6 mg/day with advancing gestation [40]. Iron deficiency with or without anemia is particularly prevalent in pregnant women, with a recent Australian study finding that, at first trimester presentations to the antenatal clinic, 31.3% and 5.7% of women had iron deficiency or iron deficiency anemia, respectively [41].
A recent Australian study by Randall et al. [42] examined the relationship between maternal hemoglobin levels at ≤20 weeks of gestation and pregnancy outcomes. Their study confirmed earlier work on a U-shaped relationship, with women who had low or high hemoglobin having a greater risk of adverse outcomes at delivery than those with a normal hemoglobin [43]. Recommendations for routine iron supplementation in pregnancy vary and can depend on the local prevalence of iron deficiency [40,44]. Crispin et al. [45] found that serum ferritin measured in the first trimester was predictive of anemia in the pre-delivery period, can identify women for iron therapy, and may be better than relying on hemoglobin alone.
Daru et al. [46] have thoroughly reviewed ferritin thresholds for the definition of iron deficiency in pregnancy, and found that cut-offs ranged from <6 to <60 μg/L. Daru et al. also reiterate that the two commonly used cut-offs, <12 and <15 μg/L, were introduced into international guidelines by consensus agreements back in 1998 and 2001, respectively [46-48]. Daru et al. aimed criticism at the ferritin cut-off of <30 μg/L, which has been incorporated into national guidelines and was determined from the 1998 study of van den Broek et al. [49], in which 47% of recruited participants had HIV and the small sample size limited the estimation of the effect of HIV status on serum ferritin levels [46]. Serum ferritin may be predictive of pre-delivery anemia [42] and disturbances in hemoglobin levels are associated with poorer pregnancy outcomes [45]. A strong evidence-based ferritin threshold for iron deficiency in pregnancy is urgently required: current ferritin cut-offs are derived from historical studies measured by assays likely no longer in use [46], against the backdrop of varying recommendations, both for and against, the use of routine iron supplementation [40,44].

Summary and outlook
Functional reference limits provide a conceptual framework and flexible modeling approach by utilizing both healthy and unhealthy samples whilst providing clinically relevant reference limits. For serum ferritin, lower functional limits determined through its correlation with erythrocyte biomarkers represent changes which occur during iron deficiency anemia. Functional reference limits can be calculated for any biomarker which presents with a statistical relationship with other correlated biomarker(s).
Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.