Relationship between fitness performance and a newly developed continuous body composition score in U.S. adolescent boys

Objectives: Body composition (BC) assessment typically relies on a single test and can yield different evaluation outcomes depending on the test selected and the population studied. The purpose of this study was twofold: first, to develop and validate a novel continuous body composition (CBC) score using the continuous response model (CRM); second, to examine the relationship between CBC scores and fitness performance. Methods: Data from the 2012 NHANES National Youth Fitness Survey (NNYFS) were used and consisted of n=212 adolescent boys 12–15 years of age. CBC scale variables included body mass (BM), body mass index (BMI), arm circumference (AC), waist circumference (WC), calf circumference (CC), calf skinfold (CSF), triceps skinfold (TSF), and subscapular skinfold (SSF). Fitness performance variables included cardiorespiratory fitness (CRF, mL/kg/min), leg strength (LS, lb), modified pull-ups (MPU, #), grip strength (GS, kg), and plank (PL, sec). Samejima’s CRM, factor analysis, convergent validity coefficients, and score reliability were used to validate the CBC scale. Multinomial logistic regression and multiple linear regression were used to examine the relationship between CBC scores and fitness performance variables. Results: Factor analysis of the CBC scale variables retained a single factor (loadings > 0.81, 88% explained variance) with strong internal consistency (α=0.96). The CRM analysis indicated that all CBC scale variables fit a unidimensional construct with adequate discrimination (a: 0.71–2.16) and difficulty (b: −0.04 to 1.44) parameters. CBC scores (Mean=0, SD=1.00) displayed strong reliability (SEE(θ)=0.22, r(θ)=0.95), with lower values representing smaller, leaner individuals and higher values representing larger, less lean individuals. All fully adjusted regression models showed significant (ps<0.05) negative relationships between CBC scores and CRF, MPU, and PL and positive relationships between CBC scores and LS and GS.
Conclusion: The CRM-derived CBC score is a novel measure of BC and was found to be positively associated with strength performance and negatively associated with endurance performance in U.S. adolescent boys.


Introduction
Body composition (BC) concerns the chemical components of the body and is most commonly assessed under the two-compartment model of fat and fat-free mass [1]. Several laboratory-based techniques provide valid measures of percent body fat (PBF), such as hydrostatic weighing, air displacement plethysmography, and dual-energy X-ray absorptiometry (DXA) [2]. However, these techniques are costly, require extensive clinician training, and can be burdensome to participants. Field-based techniques can address these shortcomings and include a range of measures such as body mass index (BMI) and body girths, as well as PBF estimated via skinfolds and bioelectrical impedance analysis (BIA) [3]. Although field-based techniques add convenience to BC assessment, especially when large numbers of participants are involved, their measurement consistency is limited by considerable error [4]. Therefore, an alternative BC assessment that can be conveniently administered to a relatively large number of individuals while providing adequate reliability would be valuable to both researchers and health practitioners.
One way to design such a BC assessment is to treat BC as a discernible latent construct that can be measured using several observed tests or items. There are numerous psychometric advantages to measuring a construct with multiple items rather than a single item. First, multi-item scales allow for the empirical assessment of the scale's internal consistency [5]. Second, multi-item scales allow random measurement error to cancel out (average out), providing increased reliability [6]. Third, multi-item scales are better able to separate individuals across the construct spectrum, providing greater discrimination and measured information [7]. Fourth, multi-item scales can include items that target a broader range of a trait, providing greater construct validity [8,9]. Much of multi-item measurement theory has been investigated in the context of written aptitude tests and/or self-reported attitudinal scales with dichotomous and polytomous responses. However, physical traits such as BC can also be evaluated using multiple items, each measured on a continuous scale. The added advantage of multiple items in this context is the ability to create a BC score with stronger psychometric properties (e.g., reliability, discrimination, and trait coverage) than any single BC test.
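The claim that averaging several items cancels random error and raises reliability can be illustrated with the classical Spearman-Brown prophecy formula, ρ_k = kρ / (1 + (k − 1)ρ). The sketch below is only illustrative: it assumes strictly parallel items (equal true-score loadings and error variances), which real BC measures such as girths and skinfolds are not, and the single-item reliability of 0.60 is a hypothetical value, not an NNYFS estimate. A small Monte Carlo check confirms the formula by correlating the averaged composite with the simulated true score.

```python
import math
import random


def spearman_brown(item_reliability: float, n_items: int) -> float:
    """Spearman-Brown prophecy: reliability of a composite of n parallel items."""
    if not 0.0 < item_reliability < 1.0:
        raise ValueError("item_reliability must lie strictly between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    k, rho = n_items, item_reliability
    return k * rho / (1 + (k - 1) * rho)


def simulated_reliability(item_reliability: float, n_items: int,
                          n_people: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: squared correlation between the mean of n noisy items and the true score."""
    rng = random.Random(seed)
    # True-score variance is fixed at 1, so error variance = (1 - rho) / rho.
    error_sd = math.sqrt((1 - item_reliability) / item_reliability)
    true_scores, composites = [], []
    for _ in range(n_people):
        t = rng.gauss(0.0, 1.0)
        items = [t + rng.gauss(0.0, error_sd) for _ in range(n_items)]
        true_scores.append(t)
        composites.append(sum(items) / n_items)  # independent errors partly cancel when averaged
    mt = sum(true_scores) / n_people
    mc = sum(composites) / n_people
    cov = sum((t - mt) * (c - mc) for t, c in zip(true_scores, composites))
    var_t = sum((t - mt) ** 2 for t in true_scores)
    var_c = sum((c - mc) ** 2 for c in composites)
    return cov * cov / (var_t * var_c)  # squared Pearson correlation


if __name__ == "__main__":
    # Hypothetical single-item reliability of 0.60, chosen only for illustration.
    for k in (1, 2, 4, 8):
        print(k, round(spearman_brown(0.60, k), 3), round(simulated_reliability(0.60, k), 3))
```

Under these assumptions, eight items with a single-item reliability of 0.60 would yield a composite reliability of about 0.92, which is the intuition behind preferring a multi-item BC score over any one measure.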
The primary purpose of this study was to develop and validate a novel continuous body composition (CBC) score using item response theory (IRT) for continuous response items, specifically, the continuous response model (CRM). The secondary purpose of this research was to examine the relationship between CRM-derived CBC scores and fitness performance in U.S. adolescent boys.

Study procedures
Data for this research came from the 2012 National Health and Nutrition Examination Survey's (NHANES) National Youth Fitness Survey (NNYFS). The purpose of the 2012 NNYFS was to assess both physical activity (PA) and physical fitness levels in U.S. youth aged 3–15 years [10]. The NNYFS design employed a four-stage probability sample of noninstitutionalized civilian U.S. residents and included 1,640 youth who were interviewed and 1,576 who were examined. NNYFS data are available to the public and organized into categories labeled: Demographics, Dietary, Examination, Questionnaire, and Limited Access. For the current study, only Demographics and Examination data were used. Because many BC variables differ by sex, this study was delimited to adolescent boys aged 12–15 years.

Body composition (BC) variables
Eight BC variables were used in this study, each assessed by trained medical personnel using standardized methods [11]. Detailed protocols are reported elsewhere and are only briefly described here. BC variables included body mass (BM), BMI, arm circumference (AC), waist circumference (WC), calf circumference (CC), calf skinfold (CSF), triceps skinfold (TSF), and subscapular skinfold (SSF). BM was measured in kilograms (kg) using a portable floor scale. BMI was computed as kg/m² from BM and standing height (measured by wall stadiometer). AC was measured in centimeters (cm) at the midpoint of the relaxed right upper arm. WC was measured in cm in a horizontal plane (checked using a mirror) just above the iliac crest. CC was measured in cm at the maximal circumference with the participant seated. CSF was measured in millimeters (mm) on the medial (inside) surface of the lower right leg at the level of the CC measurement. TSF was measured in mm at the midpoint mark on the posterior surface of the upper arm. SSF was measured in mm at the inferior angle of the right scapula. All skinfold protocols required grasping a double thickness of skin and underlying adipose tissue.

Fitness performance variables
Five fitness performance variables were used in this study and were similarly assessed by trained health professionals using standardized protocols [12]. Briefly, fitness performance variables included cardiorespiratory fitness (CRF), leg strength (LS), modified pull-ups (MPU), grip strength (GS), and plank (PL). CRF was measured using one of five submaximal exercise treadmill protocols varying in speed and grade [13]. Participants were assigned to a specific four-stage protocol based on their age, sex, BMI, and self-reported physical activity readiness (PAR) score. Submaximal heart rate and predicted submaximal oxygen consumption (VO₂) during each of the middle two stages were used to estimate participant maximal oxygen consumption (VO₂max) in mL/kg/min. LS was measured using an isometric knee extension procedure with a hand-held dynamometer held by the practitioner against the participant's shin [14]. Participants were tested while seated and strapped to a chair to isolate the action of the quadriceps. Maximal knee extension force (in pounds) was measured three times on each leg, alternating legs, with a 1 min rest period between trials. Peak force across all six trials was used as the measure of LS in this study. MPU was assessed using an adjustable horizontal bar system positioned 2 inches above the participant's extended reach while lying on their back [15]. During the MPU test, after grasping the bar with an overhand grip, the participant lifted their body upward off the surface while maintaining floor contact with their heels only. The participant flexed at the elbows until their chest reached a predetermined height (8 inches below the bar) and then extended the elbows to the starting position, while maintaining a straight-bodied form throughout the movement. The maximum number of correctly completed pull-ups was used as the measure of MPU in this study. GS was measured while the participant was standing and holding a handgrip dynamometer [16]. After completing a submaximal practice trial, the participant was instructed to squeeze the handgrip device as hard as they could. Each hand was tested three times, alternating hands with a 1 min rest between trials. The sum of each hand's maximum score (in kg) was used as the measure of GS in this study. Finally, PL was assessed in a front plank position, with the participant face down and only their forearms and toes touching the floor while maintaining a straight back [17]. The front plank position was held isometrically for as long as possible, and the maximum number of seconds held was used as the measure of PL in this study.
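The trial-level scoring rules described above (peak LS across six trials, GS as the sum of each hand's best trial) can be sketched as plain functions. The VO₂max step is less fully specified in the text: it states only that heart rate and predicted VO₂ from the two middle stages were used. The sketch therefore assumes a common approach, linear extrapolation of the heart rate–VO₂ line to an age-predicted maximal heart rate of 220 − age; the survey's actual estimation equation may differ. All function names and example values are hypothetical.

```python
from typing import Optional, Sequence


def leg_strength(right_lb: Sequence[float], left_lb: Sequence[float]) -> float:
    """LS: peak knee-extension force (lb) across all trials of both legs."""
    return max(max(right_lb), max(left_lb))


def grip_strength(right_kg: Sequence[float], left_kg: Sequence[float]) -> float:
    """GS: sum of the best trial (kg) from each hand."""
    return max(right_kg) + max(left_kg)


def estimate_vo2max(hr_a: float, vo2_a: float, hr_b: float, vo2_b: float,
                    age_years: float, hr_max: Optional[float] = None) -> float:
    """Extrapolate a straight heart-rate/VO2 line through two submaximal stages to maximal HR.

    ASSUMPTION: maximal HR defaults to 220 - age; this is an illustrative choice,
    not necessarily the equation used by the NNYFS.
    """
    if hr_a == hr_b:
        raise ValueError("the two stages need different heart rates to define a slope")
    if hr_max is None:
        hr_max = 220.0 - age_years
    slope = (vo2_b - vo2_a) / (hr_b - hr_a)  # mL/kg/min per beat/min
    return vo2_b + slope * (hr_max - hr_b)


if __name__ == "__main__":
    # Hypothetical trial values for one 14-year-old participant.
    print(leg_strength([50, 55, 53], [60, 58, 57]))      # peak of six trials
    print(grip_strength([30, 32, 31], [28, 29, 27]))     # best right + best left
    print(estimate_vo2max(140, 25.0, 160, 30.0, age_years=14))
```

Keeping each scoring rule in its own function mirrors the protocol text, so a different maximal heart rate assumption can be swapped in through the `hr_max` argument without touching the strength scores.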

Demographic variables
To control for possible demographic confounding, age, race, and income were included as covariates. Age was treated as a numeric covariate ranging from 12 to 15 years. Race was treated as a categorical covariate with four groups: (1) Non-Hispanic White, (2) Non-Hispanic Black, (3) Mexican/Hispanic, and (4) Other Races and Multi-racial. Finally, income was collected as family income and treated as a numeric covariate comprising 12 income brackets, ranging from 1=$0–$4,999 to 12=$100,000 and over.

Continuous response model (CRM)
Item response theory (IRT) is a modern approach to scale validation that models the relationship between observed responses to items and the latent trait purportedly measured by those items [18]. The many benefits of IRT over traditional methods are described elsewhere [19]. Briefly, with IRT: (1) models can be tested for appropriate fit to data, (2) item parameters are considered invariant to changes in persons (i.e., regardless of sample characteristics), and (3) person parameters are considered invariant to changes in the test (i.e., easy vs. difficult tests). Additionally, an IRT model can be depicted as a monotonic plot of the probability of endorsing an item in relation to a person's ability [20]. Most IRT models used by researchers are for either dichotomous [21] or polytomous [22] response items [23]. Although rarely used by researchers, IRT may also be applied to continuous response items. Specifically, Samejima introduced the continuous response model (CRM) as a limiting form of the graded response model (GRM) [24,25]. The three-parameter CRM used in this study [26] is specified for a single person and a single item as
$$P(X \ge x \mid \theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{v} e^{-t^{2}/2}\,dt, \qquad v = a\left(\theta - b - \frac{1}{\alpha}\ln\frac{x}{k - x}\right),$$
where a denotes the discrimination parameter, b the difficulty parameter, and α the scaling parameter linking the original response scale to the latent trait scale (θ). Similar to the two-parameter logistic (2PL) IRT model [27], the a parameter represents the steepness of the item's probability curves. (In the context of polytomous response items, these probability curves are called category characteristic curves (CCC), or operating characteristic curves (OCC) if category boundaries are modeled in cumulative fashion.) Likewise, the b parameter represents the location of the OCC curves on the θ scale. Unique to the CRM, however, is the α parameter, which represents the distance between an item's OCC curves.
Also unique to the CRM is that, regardless of the item's scale range (0 to k), only three item parameters are estimated (i.e., a, b, and α), whereas the GRM estimates a step parameter for each response category of a polytomous item. In summary, the CRM gives the probability that a respondent with a specific θ obtains a score of x or higher on a particular item measured on a continuous scale ranging from 0 to k, given the parameters a, b, and α.
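A minimal sketch of the CRM's item-level behavior follows, assuming the three-parameter form P(X ≥ x | θ) = Φ(a(θ − b − ln(x/(k − x))/α)), where Φ is the standard normal CDF and 0 < x < k. Under this form, the transformed response z = ln(x/(k − x)) is normally distributed given θ, with mean α(θ − b) and standard deviation α/a, so each item contributes a constant a² of Fisher information regardless of θ. The functions below are hypothetical helpers written for illustration, not the routines of any particular CRM package.

```python
import math
from typing import Iterable, Tuple


def normal_cdf(v: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def crm_prob_at_least(x: float, theta: float, a: float, b: float,
                      alpha: float, k: float) -> float:
    """P(X >= x | theta) under the assumed three-parameter CRM, for 0 < x < k."""
    if not 0.0 < x < k:
        raise ValueError("x must lie strictly between 0 and k")
    v = a * (theta - b - math.log(x / (k - x)) / alpha)
    return normal_cdf(v)


def crm_test_information(a_params: Iterable[float]) -> float:
    """Test information: each item adds a constant a**2, so it does not vary with theta."""
    return sum(a * a for a in a_params)


def see_and_reliability(a_params: Iterable[float],
                        theta_var: float = 1.0) -> Tuple[float, float]:
    """Standard error of estimate and the approximation 1 - SEE**2 / var(theta)."""
    see = 1.0 / math.sqrt(crm_test_information(a_params))
    return see, 1.0 - see ** 2 / theta_var


if __name__ == "__main__":
    # Hypothetical item: a=1.2, b=0.3, alpha=0.8 on a 0-10 response scale.
    for theta in (-1.0, 0.0, 1.0):
        print(theta, round(crm_prob_at_least(5.0, theta, 1.2, 0.3, 0.8, 10.0), 3))
```

As a rough arithmetic check against the abstract, an SEE of 0.22 on a θ scale with unit variance corresponds to 1 − 0.22² ≈ 0.95, which matches the reported r(θ) if score reliability was computed in this way.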

Statistical analyses
The statistical analysis plan was separated into two phases. Phase I concerned the development and validation of the continuous body composition (CBC) score. Phase II concerned examining the relationship between CBC scores and fitness performance variables. For phase I, exploratory data analysis was conducted, and descriptive statistics and bivariate correlation coefficients were computed for the eight BC scale variables. During this step, three cases were identified as multivariate outliers, with unusually large Mahalanobis distance values (chi-square ps<0.001) [28], and were removed from the analysis. Factor analysis was also performed to confirm that the BC variables measured a unidimensional trait. The eigenvalue-greater-than-1.00 criterion was used to retain factors [29,30]. Additionally, item-test correlations, Cronbach's alpha, and alpha-if-item-deleted were used to examine the validity of scale items [31]. The final step in phase I involved fitting the CRM to the data, where item parameters, test information, standard error of estimate, and score reliability were examined along with three-dimensional CCCs [32,33]. Scores were obtained from the descriptive statistics (standardized sum scores), factor analysis (factor scores), and CRM (CBC scores) steps. For phase II, exploratory data analysis was performed and descriptive statistics were computed for the five physical fitness variables. Multinomial logistic regression was run to estimate the fitness-related odds of being in the lowest CBC tertile relative to the middle and then the highest CBC tertiles [34]. Finally, multiple linear regression was run to examine the relationship between fitness performance variables and continuous CBC scores [35]. Analyses were weighted to produce estimates representative of noninstitutionalized U.S. boys aged 12–15 years [36,37]. SAS version 9.4, SPSS version 26, and R version 3.5 were used for all analyses [38][39][40].
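The phase I screening steps (Mahalanobis-distance outlier removal at chi-square p<0.001, the eigenvalue-greater-than-1 retention rule, and Cronbach's alpha with alpha-if-item-deleted) can be sketched with NumPy as below. This is an illustration under stated assumptions, not the authors' SAS/SPSS/R code: function names are hypothetical, the chi-square tail probability uses a closed form that is exact only for even degrees of freedom (eight items give df=8) to avoid a SciPy dependency, eigenvalues are taken from the item correlation matrix, and items are z-scored first because the text does not say whether alpha was computed on raw or standardized items. The demo data are simulated, not NNYFS data.

```python
import math

import numpy as np


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def mahalanobis_outliers(X: np.ndarray, p_cut: float = 0.001):
    """Squared Mahalanobis distances and a flag for cases with chi-square p < p_cut."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    p = np.array([chi2_sf_even_df(v, X.shape[1]) for v in d2])
    return d2, p < p_cut


def kaiser_factors(X: np.ndarray):
    """Number of correlation-matrix eigenvalues > 1 and each component's share of variance."""
    eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    return int((eig > 1.0).sum()), eig / eig.sum()


def cronbach_alpha(X: np.ndarray) -> float:
    """Cronbach's alpha from item variances and the variance of the total score."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def alpha_if_deleted(X: np.ndarray):
    """Alpha recomputed with each item dropped in turn."""
    return [cronbach_alpha(np.delete(X, j, axis=1)) for j in range(X.shape[1])]


if __name__ == "__main__":
    # Hypothetical one-factor data standing in for the eight BC items.
    rng = np.random.default_rng(0)
    factor = rng.standard_normal((300, 1))
    X = 0.9 * factor + 0.4 * rng.standard_normal((300, 8))
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score: items are on different units
    _, flags = mahalanobis_outliers(X)
    X_clean = X[~flags]
    n_factors, share = kaiser_factors(X_clean)
    print(int(flags.sum()), n_factors, round(float(share[0]), 2),
          round(cronbach_alpha(X_clean), 3))
```

Note that the first-component variance share printed here comes from the correlation-matrix eigenvalues, so it need not equal the explained variance reported from a factor analysis.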
Additionally, an item's difficulty (b) and discrimination (a) can be inspected in the CCCs, with the center of the curves representing the item's b and the spread of the curves relative to the response scale representing its a. Each CCC in the CBC scale indicated proper item functioning. Table 4 contains additional validity evidence for the CBC scale in the form of bivariate correlations between the new CRM-derived CBC scores (CBC), CBC scale item factor scores (SFS), and standardized scale sum scores (SSS) [40]. All correlation coefficients were strong (rs>0.96) and significant (ps<0.05). In particular, the strong correlation between CBC and SSS provides additional evidence supporting the use of simple SSS in BC assessment.
Table 6 contains descriptive statistics for the fitness-related predictor variables. MPU (Mean=10.7, SE=0.89, CV=8.3%) displayed the greatest amount of variability in adolescent boys. Table 7 contains results from the multiple regression models using fitness variables to predict the continuous form of CBC scores. Models 1 through 5 are single-fitness-predictor models, adjusted for age, race, and income. In these initial models, each fitness variable significantly (ps<0.05) predicted CBC scores, with GS (β=0.54, p<0.0001, R²=0.30) explaining the most variance. CRF, MPU, and PL each displayed a negative relationship with CBC, whereas LS and GS displayed a positive relationship with CBC. Model 6 is a fully adjusted model that included all five fitness predictors; in this model, two fitness predictors (CRF and PL) were no longer significant (ps>0.05). In model 7, after PL was removed from the previous analysis, the remaining four fitness predictors significantly (ps<0.053) predicted CBC scores, with large explained variance (p<0.0001, R²=0.45).
Table 8 contains results from the multinomial logistic regression analyses. Tabled values represent the fitness-related odds of being in the lowest CBC tertile relative to the middle tertile and then relative to the highest CBC tertile. Models 1 through 5 are again single-fitness-predictor models, adjusted for age, race, and income. These initial models for both sets of logistic regressions each saw fitness values significantly (ps<0.
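The standardized sum score (SSS) that Table 4 compares against the CRM-derived CBC scores, and the CBC tertile grouping used as the multinomial outcome, can be sketched as follows. The exact SSS recipe is not spelled out in the text, so this sketch assumes each item is z-scored, the z-scores are summed, and the sum is re-standardized to mean 0 and SD 1. It also ignores the NNYFS sample weights that the study applied, so its tertile cut points are unweighted sample quantiles. Function names and demo data are hypothetical.

```python
import numpy as np


def standardized_sum_scores(X: np.ndarray) -> np.ndarray:
    """z-score each item, sum across items, then re-standardize the total (mean 0, SD 1)."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    total = z.sum(axis=1)
    return (total - total.mean()) / total.std(ddof=1)


def tertile_groups(scores: np.ndarray) -> np.ndarray:
    """0 = lowest, 1 = middle, 2 = highest tertile, using unweighted sample quantiles."""
    cuts = np.quantile(scores, [1 / 3, 2 / 3])
    return np.digitize(scores, cuts)


if __name__ == "__main__":
    # Hypothetical item matrix on mixed units (not NNYFS data).
    rng = np.random.default_rng(1)
    latent = rng.standard_normal((90, 1))
    X = latent * [8.0, 4.0, 3.0, 10.0, 2.5, 5.0, 6.0, 7.0] + rng.normal(size=(90, 8))
    sss = standardized_sum_scores(X)
    print(np.bincount(tertile_groups(sss)))  # roughly 30 cases per tertile
```

Z-scoring each item first keeps the kilogram, centimeter, and millimeter measures from being weighted by their raw units, which is the main design choice in a simple sum-score composite.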

Discussion
The primary purpose of this study was to develop and validate a CBC score using the IRT-based CRM. Results clearly showed that eight individual anthropometric items represent a unidimensional BC trait, with the eight-item BC scale displaying adequate construct validity and internal consistency under classical test theory methods. Furthermore, scale items fit the CRM, indicating adequate item functioning and reliable BC measurement. These results are the first to present the development and validation of a multi-item BC assessment in adolescent boys. As previously mentioned, there are many advantages to using a multi-item assessment to measure a relatively complex trait. In the context of this study's results, these advantages include (1) an improved association between the CBC scores and the latent BC trait, (2) an increased ability to measure individuals across a wider BC trait spectrum, (3) reduced measurement error in BC assessment, and (4) an increased amount of measured information that allows for finer categorization of individuals [41]. The need for these improved measurement properties is underscored by the inconsistency among commonly used BC assessments reported in the literature. For example, a recent study of over 700 adolescents compared BMI and percent body fat (criterion measure) for classifying individuals into overweight and obesity categories [42]. Results of this study showed that the BMI-derived classification underestimated the prevalence of both overweight and obesity compared with the PBF-derived estimates. These results highlight that two of the more common forms of BC assessment do not agree when evaluating adolescents. Several other studies support the same divergence between BC assessments in adolescents [43][44][45][46][47]. Consequently, the development and validation of the CBC scale provides a more psychometrically sound assessment of overall BC in adolescent boys.
The secondary purpose of this research was to examine the relationship between CRM-derived CBC scores and fitness performance in U.S. adolescent boys. Results showed that CBC was independently related to cardiorespiratory, muscular endurance, and muscular strength fitness components. Specifically, negative relationships were found between CBC and both cardiorespiratory and muscular endurance performance. Conversely, positive relationships were seen between CBC and muscular strength performance. These findings are consistent with findings using single-item BC measures. For example, a large population-based fitness study of Latin-American adolescents examined the relationship between cardiorespiratory performance scores and BC [48]. This research found negative associations between peak oxygen uptake and BMI, WC, and waist-to-height ratio in adolescent boys. Several other studies support this inverse relationship between CRF and BC in adolescents [49][50][51][52][53]. Additionally, a recent study examined the associations between BMI, as a measure of BC, and muscular strength in adolescents with obesity [54]. Results from this research indicated that BMI in adolescents was positively associated with muscular strength, as measured by eight-repetition maximum bench press and leg press tests. Other studies report similar positive associations between muscular strength and BC [55][56][57]. Several studies also support the negative association between muscular endurance and BC [58][59][60]. Altogether, the findings from the second part of this study serve as additional validity evidence for the CBC scale, given its ability to detect known fitness performance relationships in adolescent boys.
Several directions for future research on the CBC scale are worth noting. Although internal consistency and score reliability were both established in this study, future studies should examine the stability of the CBC scale over time. Also, a more in-depth construct validity study is recommended to determine the extent to which CBC scores can detect differences among athletes with known BC profiles (e.g., wrestlers, cross-country runners, gymnasts). Finally, future studies should evaluate a similar CBC scale in adolescent girls.
One strength of the current study is its use of IRT to validate a novel CBC scale. IRT is increasingly being applied in the health sciences to assess the measurement properties of latent behavioral, attitudinal, and patient-reported outcome scales [61,62]. The use of this modern psychometric analysis to validate the CBC scale, and specifically the use of the CRM, sets a precedent for the application of multi-item scales to assess a unidimensional BC trait. Another strength of this study was its use of a nationally representative sample of U.S. boys aged 12–15 years, increasing the study's external validity. A final strength worth mentioning is the use of objectively measured BC and physical fitness variables assessed by trained medical professionals, distinguishing these findings from those of studies using self-reported measures and non-standardized methods. Despite these strengths, some limitations should be noted. The NNYFS is cross-sectional rather than longitudinal, so the results of this study should be considered correlational only. Another limitation was the inability to control for pubertal status. It is possible that some adolescents with the same BC profile (response pattern) differed in fitness characteristics because of differences in hormonal maturation. Owing to these limitations, study results should be interpreted with caution.

Conclusions
This study presents development and validation evidence for a multi-item BC scale, resulting in a novel CBC score for adolescent boys. The psychometric evidence supports the use of simple standardized sum scores, across individual BC assessments, as a sufficient measure of CBC. Moreover, CBC scores were found to be positively associated with strength performance and negatively associated with endurance performance in U.S. adolescent boys. Health promotion specialists should be aware of the advantages of using multi-item scales when evaluating BC.
Research funding: None declared. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.