Self-assessed puberty is reliable in a low-income setting in rural Pakistan

Objectives: Staging sexual maturation is an integral component of adolescent research. The Pubertal Development Scale (PDS) is commonly used as a puberty self-assessment tool because it avoids the use of images. Among the youth living in rural Pakistan, we determined the accuracy of self-reported pubertal assessments using a modified PDS compared to the ‘gold standard’ of physically assessed Tanner stages by a physician. Methods: The strength of agreement between self-assessed puberty using a modified PDS and the ‘gold standard’ of physician-assessed Tanner stages was reported using weighted kappa (κw) for girls (n = 723) of 9.0–14.9 years of age and boys (n = 662) of 10.0–15.9 years of age living in the rural District of Matiari. Results: Agreement between the gold standard and self-assessment for puberty was substantial, with a κw of 0.73 (95% confidence interval [CI]: 0.67; 0.79) for girls and a κw of 0.61 (95% CI: 0.55; 0.66) for boys. Substantial agreement was observed for both boys and girls classified as thin but only for girls with a normal body mass index. Those who were classified as severely thin had moderate agreement. The prevalence of overestimation was 18.5% (95% CI: 15.9–21.5) for girls and 2.7% (95% CI: 1.7–4.3) for boys, while the prevalence of underestimation was 8.0% (95% CI: 6.2–10.2) for girls and 29.0% (95% CI: 25.8–32.6) for boys. Conclusions: Most girls and boys assessed their pubertal development with substantial agreement with physician assessment. Girls were better able to assess their puberty, but they were more likely to overestimate. Agreement for boys was also substantial, but they were more likely to underestimate their pubertal development. In this rural Pakistan population, the PDS seems to be a promising tool for self-assessed puberty.

The transition period between childhood and adulthood encompasses adolescence and the biological process of sexual maturation called puberty. The timing of puberty is widely used to assess the impact of exposures like nutrition and environment on the health of youth [1]. Although pubertal assessment can be highly sensitive, it is generally accepted to be an important factor in the correct diagnosis of certain health conditions during adolescence. Additionally, the disruption of typical pubertal timing is associated with a higher risk of many disease outcomes later in life, including cardiovascular and metabolic disorders, as well as compromised adult height and bone density in both men and women [2][3][4]. These adverse outcomes underscore the need to determine variations in pubertal milestone timing through accurate puberty measurements.
Pubertal timing refers to the age of attainment of secondary sexual characteristics. Sexual maturity ratings, or Tanner stages, classify pubertal development on a five-point graded scale for breast and pubic hair development for girls and genital and pubic hair development for boys [5,6]. The 'gold standard' for determining Tanner stages is the physical examination by a health-care professional. However, physical assessments are resource dependent and may be impractical for large cohort studies. Furthermore, owing to the sensitive nature of the physical assessment, children and parents may be less likely to consent to this method of examination than to self-assessment. Alternatively, self-assessment tools have been used in the assessment of pubertal development. Standardized tools for self-assessment do not exist, but the most common tool includes a series of photographs or line drawings representing the five Tanner stages. When it is not appropriate to show images to participants, the Pubertal Development Scale (PDS) [7] has been used extensively. It is a questionnaire-based measure of pubertal events that avoids the sensitive aspects of a physical exam or visually explicit sexual maturation diagrams.
This study aims to determine the accuracy of self-reported pubertal assessments using a modified PDS compared to the 'gold standard' of physically assessed Tanner stages by a physician among the youth living in rural Pakistan.

Methods
A cross-sectional study among girls (n = 723) of 9.0-14.9 years of age and boys (n = 662) of 10.0-15.9 years of age living in the rural District of Matiari, Pakistan, was undertaken between January 2019 and February 2020. Matiari has a population of about 770,000, which is representative of rural conditions in Pakistan [8]. Anthropometric measures, sexual maturation, and other participant characteristics were assessed. Details regarding the Nash-wo-Numa (growth and development) study protocol are published in the study by Campisi et al. [9]. Ethics approval has been granted by the Ethics Review Committee at the Aga Khan University, Karachi, Pakistan, 5251-WCH-ERC-18, and Research Ethics Board at SickKids Hospital, Toronto, Canada: 1000060684 (trial registration number: NCT03647553).
The puberty assessment module was composed of two components: (a) self-assessed puberty staging based on a modified PDS and (b) physically assessed Tanner staging by same-sex study physicians [5][6][7]. Owing to the sensitive nature of puberty assessment, additional parental consent and participant assent were obtained specifically for this module of the Nash-wo-Numa study.

Self-assessment of pubertal milestones
The PDS is a self-reported questionnaire, without images, in which children and adolescents report events related to the development of secondary sexual characteristics [7,10]. It encompasses a set of questions about acne, growth, and axillary hair. Additionally, facial hair and voice changes are included for males and breast development and menstruation for females. The PDS is not intended to provide a one-to-one translation to the five Tanner stages [11], as it measures pubertal events that are not considered by Tanner ratings, like voice breaking in males and menarche. In the current study, modification to the PDS included removal of the questions about the pubertal growth spurt and acne. Supporting these modifications are a systematic review that observed a weak association of acne with timing of puberty, especially among different ethnic groups [12], and two studies that reported poor correlations between growth spurt and PDS scores [13,14]. All responses were dichotomous (yes/no) and mapped onto a three-point scale outlined by the Royal College of Paediatrics and Child Health in the UK [15].
Self-assessment was obtained through an interview by the same-sex physician because literacy is typically low in this population, as indicated by the low levels of education completion in this study. The interviewer-administered PDS has been implemented previously in similar research settings [14,16]. Additionally, before administering the PDS, physicians explained puberty and the related body changes using a standard script.

Physician assessment of pubertal milestones
Physical assessments using the Tanner criteria have been deemed the 'gold standard' for assessment of sexual development [17]. Considering local cultural norms, Tanner stage assessment was carried out by a same-sex study physician while each participant was lightly clothed in a private clinical examination room. This methodology was adopted from a previous study on school-age children in an urban slum setting in Karachi, Pakistan [18]. Breast staging was originally limited to a visual assessment, but the current consensus is to add palpation to distinguish breast tissue from adipose tissue and so avoid overestimating breast stage in overweight girls [19]. Accordingly, in such cases, the current study employed palpation in breast assessment.
Tanner stage assessment training for study physicians was conducted prior to the start of the study and at regular intervals for the duration of the study. It included hands-on training conducted by a pediatric endocrinologist (K.N.H.) and an online module with an evaluation component, designed at the Hospital for Sick Children and shown to improve accuracy, confidence, and comfort in pubertal examinations [20,21]. Owing to the limited capacity of this study, the physical Tanner stage assessment was carried out by the same physician who administered the PDS. One male physician conducted all male assessments, and one female physician conducted all female assessments.

Statistical analysis
Responses to the PDS were converted into three phases of puberty based on the criteria outlined by the Royal College of Paediatrics and Child Health in the UK [15]. Physician-assessed Tanner stages were converted to the three puberty phases as follows: prepuberty, Tanner stage 1; in puberty, Tanner stage 2 + 3; and completing puberty, Tanner stage 4 + 5. Body mass index (BMI) and height were converted to BMI z-scores (BMIZ) and height-for-age z-scores (HAZ) using World Health Organization Growth References for Children and Adolescents [22].
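The conversion of physician-assessed Tanner stages to the three puberty phases can be sketched in a few lines of Python. This is only an illustration of the mapping stated above, not the study's Stata code; the names `PHASES` and `tanner_to_phase` are hypothetical.

```python
# Three puberty phases, ordered from least to most mature.
PHASES = ("prepuberty", "in puberty", "completing puberty")


def tanner_to_phase(tanner_stage: int) -> str:
    """Collapse a five-point Tanner stage (1-5) into one of three puberty phases."""
    if tanner_stage not in (1, 2, 3, 4, 5):
        raise ValueError(f"Tanner stage must be 1-5, got {tanner_stage!r}")
    if tanner_stage == 1:
        return PHASES[0]  # Tanner stage 1
    if tanner_stage <= 3:
        return PHASES[1]  # Tanner stages 2 and 3
    return PHASES[2]      # Tanner stages 4 and 5
```

Collapsing to an ordered three-level variable puts the physician rating on the same scale as the categorized PDS responses, which is what allows the two to be cross-tabulated in a 3 × 3 table.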
To determine the strength of agreement between self-assessed puberty and physician-assessed puberty, we report weighted kappa (κw) along with percent agreement as per the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) by Kottner et al. [24]. The linear weighting scheme by Warrens [25] was applied in the calculation of κw. Agreement analyses were conducted for all participants and for subgroups defined by BMI category and HAZ category. As there were very few (boys, n = 8; girls, n = 3) obese or overweight participants, the agreement analysis was not conducted on this BMI category. The strength of agreement for the κw statistic was classified using the Landis and Koch [26] benchmarks, wherein κw < 0.01 is considered poor; κw = 0.01-0.20 is considered slight; κw = 0.21-0.40 is considered fair; κw = 0.41-0.60 is considered moderate; κw = 0.61-0.80 is considered substantial; and κw > 0.80 is considered almost perfect. The proportions of underestimation and overestimation of the self-assessed PDS relative to physician-assessed puberty are also reported for the overall sample and by BMI category and stunting status. Participant characteristics (mean, 95% confidence interval [CI], or n, %) stratified by agreement level (correctly assessed, overestimated, or underestimated) for girls and boys were calculated. We determined mean values to be significantly different if the 95% confidence intervals did not overlap. All analyses were performed in Stata version 14.2.
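As a minimal sketch of these agreement statistics, the Python code below computes a linearly weighted kappa, percent agreement, and the proportions of over- and underestimation from a 3 × 3 table, and then applies the benchmark labels listed above. It is not the study's Stata code. The function names and the example counts are hypothetical, and the table orientation (rows for the physician-assessed phase, columns for the self-assessed phase) is an assumption made for illustration.

```python
def linear_weighted_kappa(table):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    return 1.0 - observed / expected


def agreement_summary(table):
    """Proportions of exact agreement, overestimation, and underestimation.

    Assumes rows are the physician-assessed phase and columns the
    self-assessed phase, both ordered from least to most mature.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(k))
    over = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    under = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    return {"agreement": agree / n, "overestimation": over / n,
            "underestimation": under / n}


def benchmark_label(kappa):
    """Strength-of-agreement label using the cut-offs stated in the text."""
    if kappa < 0.01:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts for illustration only; these are NOT study data.
example = [
    [40, 8, 2],   # physician: prepuberty
    [5, 30, 5],   # physician: in puberty
    [1, 4, 5],    # physician: completing puberty
]
kappa_w = linear_weighted_kappa(example)
print(round(kappa_w, 2), benchmark_label(kappa_w), agreement_summary(example))
```

With linear weights, a self-assessment that is one phase away from the physician rating is penalized half as much as one that is two phases away, so near misses reduce κw less than gross misclassifications do.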

Results
This analysis represents 1,385 children and adolescents (52.2% females). Participant characteristics are presented in Table 1. The overall κw for girls was substantial at 0.73 (95% CI: 0.67; 0.79) and for boys was substantial at 0.61 (95% CI: 0.55; 0.66). Agreement was substantial among participants classified as thin (both sexes) and among girls with a normal BMI. Conversely, those who were classified as severely thin had moderate agreement (Figure 1). As there were too few participants classified as overweight or obese, agreement was not calculated for this bodyweight classification. Overall 3 × 3 puberty phase tables, as well as 3 × 3 tables for each BMI category and stunting status by sex, are presented in Supplementary Tables 1-5.
Girls were more likely to overestimate, while boys were more likely to underestimate their pubertal development. The prevalence of overestimation was 18.5% (95% CI: 15.9-21.5) for girls and 2.7% (95% CI: 1.7-4.3) for boys. Conversely, the prevalence of underestimation was 8.0% (95% CI: 6.2-10.2) for girls and 29.0% (95% CI: 25.8-32.6) for boys. The only participant characteristics that differed by the level of agreement (overestimation or underestimation) were age, weight, and education level. The mean age of girls was significantly greater at 12.53 years (95% CI: 12.35; 12.81) among those who overestimated their pubertal development than among those who correctly assessed their pubertal development at 11.96 years (95% CI: 11.8; 12.11), whereas the mean age of 13.76 years (95% CI: 13.59; 13.93) for boys who underestimated their pubertal development was significantly greater than that of boys who correctly assessed their pubertal development, with a mean age of 12.44 years (95% CI: 12.28; 12.59) (Supplementary Table 6). For girls, the mean weight (kg) was significantly lower at 26

Discussion
The results of the current study indicate that the majority of girls and boys living in Matiari can assess their pubertal development with substantial agreement to clinician assessment when using a questionnaire tool without images like the PDS. Although the difference was not significant owing to the small number of participants in the severely thin category, self-assessment among such participants had the lowest agreement with physician-assessed puberty regardless of sex (Figure 1). While this may reflect the inability to detect breast tissue in girls with low body fat, these results cannot be similarly explained in boys. Not surprisingly, stunting did not impact the ability to self-assess puberty. Overall, female self-assessment was more accurate among Matiari participants.
Few studies have reported the agreement between youth-assessed puberty using the PDS and clinician-assessed puberty [13,14,27,28]. Furthermore, comparing the current study to those in the literature is problematic owing to the use of different agreement statistics and to the use of five-point rather than three-point puberty development scales. Two recent studies that reported agreement between youth-assessed puberty using the PDS and clinician-assessed puberty used a three-point scale for puberty similar to that of the current study. One study conducted in the USA reported a moderate strength of agreement (κ = 0.57; 95% CI: 0.49-0.66), but this kappa coefficient was unweighted and not stratified by sex, thereby limiting the sex-wise comparison to the current study [27]. Conversely, a Brazilian study reported high levels of agreement but used intraclass correlation coefficients of 0.82 for boys (95% CI: 0.70-0.90) and of 0.81 for girls (95% CI: 0.70-0.89) [13]. Older PDS agreement studies map the PDS onto the five-point Tanner stages. These studies report a lower strength of agreement, which may be related to the use of a five-point rather than a three-point scale. A study from China reported moderate strength of agreement for girls (κw = 0.57; 95% CI: 0.44, 0.71) and for boys (κw = 0.58; 95% CI: 0.47, 0.69) [28] despite providing the youth with Tanner stages (using images) before they completed their PDS assessment. A study from Soweto, South Africa, also mapped the PDS onto the five Tanner stages and reported a fair agreement (κ = 0.34), but this kappa coefficient was unweighted and only for girls [14]. The authors suggested that possible reasons for the low agreement included a lack of awareness of pubertal development and low literacy in Soweto.
Possible reasons for the greater strength of agreement in the current study than in the aforementioned studies include mapping the PDS onto a three-category puberty scale, the use of dichotomous responses, and the explanation of puberty by study physicians before the PDS.
A recent systematic review reporting the agreement between self-reported and clinician-based physical assessment of Tanner stages observed greater agreement when more details were provided to the youth [29]. In the current study, participants were provided additional pubertal information through the physician's short educational script, which may have contributed to better agreement. Also consistent with this systematic review was the lower strength of agreement among boys. Irrespective of the puberty self-assessment tool, boys seem to be less reliable than girls in accurately assessing pubertal development. One possible explanation is that changes in boys, like voice change and testicular development, are more subtle, whereas for girls, breast development and menarche are easier to assess.
Inaccurate self-assessments seemed to be impacted by age and by weight among girls and boys. It is possible that these observations result from a bias toward reporting socially desirable norms, but this needs further validation in this population. The bias to report 'socially desirable' norms has been reported in adolescent literature for the misreporting of self-assessed body weight and height [30].
One of the most important strengths of the current study pertains to its large sample size from an under-represented rural low- and middle-income country (LMIC) population. Previous studies examining the agreement between the PDS and clinician-assessed Tanner stages contained sample sizes of <300 participants. The current study also used one physician per sex, thereby avoiding inter-rater variability and the bias that differences between raters can introduce. Additionally, physicians underwent periodic retraining and evaluation using the online assessment module [20,21].
It is possible that low school attendance and low literacy rates among participants in the current study could have contributed to a lack of knowledge around sexual development and negatively impacted the ability of the participants to self-assess pubertal development, although this seems to have been largely mitigated by the study design, which included puberty education before the PDS puberty assessments. Because the PDS does not translate directly into Tanner stages, this limitation should be considered in light of various clinical situations and research questions. A further limitation of the current study is the low number of overweight and obese participants, for whom the accuracy of the modified PDS could not be calculated. Validation of the modified PDS in similar settings with participants whose BMI is higher is necessary.
With more than 2,000 citations, the PDS is an extremely popular tool used to assess puberty. Surprisingly, there is limited evidence in the literature assessing the agreement between youth-assessed puberty using the PDS and clinician-assessed puberty using Tanner stages mapped onto a five-point or three-point scale. In the current study, we found the PDS to be a promising tool when exact correspondence to Tanner stages is not necessary and when accompanied by a short puberty education script. In LMIC settings, administering a modified PDS could inform general clinical practice and research, where the invasive physical exam is difficult to administer owing to cultural sensitivity, lack of same-sex physicians, or prohibitive costs.