Prostate health index (PHI) as a reliable biomarker for prostate cancer: a systematic review and meta-analysis

Objectives: Prostate cancer (PCa) represents the second most common solid cancer in men worldwide. In the last decades, the prostate health index (PHI) has emerged as a reliable biomarker for detecting PCa and differentiating between non-aggressive and aggressive forms. However, more evidence is required before introducing it into clinical practice. Thus, we performed a systematic review and meta-analysis to assess the diagnostic performance of PHI for detecting PCa and clinically significant PCa (csPCa). Methods: Relevant publications were identified by a systematic literature search on PubMed and Web of Science from inception to January 11, 2022. Results: Sixty studies, including 14,255 individuals, met the inclusion criteria for our meta-analysis. The pooled sensitivity and specificity of PHI for PCa detection were 0.791 (95%CI 0.739 – 0.834) and 0.625 (95%CI 0.560 – 0.686), respectively. The pooled sensitivity and specificity of PHI for csPCa detection were 0.874 (95%CI 0.803 – 0.923) and 0.569 (95%CI 0.458 – 0.674), respectively. Additionally, the diagnostic odds ratio was 6.302 and 9.206, respectively, for PCa and csPCa detection, suggesting moderate to good effectiveness of PHI as a diagnostic test. Conclusions: PHI has a high accuracy for detecting PCa and discriminating between aggressive and non-aggressive PCa. Thus, it could be useful as a biomarker for predicting which patients harbour more aggressive cancer and for guiding biopsy decisions.


Introduction
Prostate cancer (PCa) represents the most common solid tumour in men over 60 years and the second leading cause of cancer death in men, after lung cancer [1].
PCa is a very heterogeneous disease characterised by a wide spectrum of clinical manifestations, ranging from clinically insignificant forms to lethal castration-resistant ones. It has been estimated that more than 50% of patients have a low risk of progression [2]. In these patients, active surveillance is recommended instead of radical surgery. Notably, the over-diagnosis and overtreatment of indolent tumours is a major problem associated with PCa. Thus, early identification and appropriate management of patients are fundamental. In this scenario, laboratory medicine has a key role. Worldwide, PCa screening is based on prostate-specific antigen (PSA), a serine protease that physiologically dissolves seminal clots. Circulating PSA consists of 80-95% complexed forms, with the small remaining proportion in the free form. The test for measuring total PSA (tPSA) levels, including both complexed and free PSA (fPSA), was developed and approved by the Food and Drug Administration (FDA) for PCa over 30 years ago [3]. However, PSA-based screening has several drawbacks. First, PSA is organ-specific and not cancer-specific. Although it has high sensitivity, it has poor specificity and a low positive predictive value (PPV), resulting in unnecessary biopsies. Additionally, PSA cannot accurately identify aggressive PCa [4], leading to over-diagnosis and overtreatment in patients with low-risk disease that may not require active clinical intervention. Indeed, up to 42% of PCa cases detected based on PSA are clinically insignificant. Consequently, the identification of patients with clinically significant PCa (csPCa), which requires treatment, is one of the main concerns in daily practice. Finally, PSA levels are influenced by several factors, such as benign prostatic hyperplasia, infection, age, and drugs [5,6].
Thus, there is active research aimed at identifying reliable biomarkers to guide clinicians in detecting PCa and its aggressive forms, so that patients can be treated appropriately.
In the last decades, a role for the different forms of PSA has emerged. In the early 1990s, literature evidence showed that increased levels of fPSA are commonly associated with benign conditions [7,8]. Notably, fPSA consists of three different forms: benign PSA, intact inactive PSA, and proPSA. Among these, proPSA is the form associated with PCa. proPSA has several molecular isoforms, including [−2], [−4], and [−5, −7] [9]. The [−2] proPSA (p2PSA) is the most stable in serum. In 2010, Beckman Coulter introduced an automated immunoassay for its detection and developed an index, namely the prostate health index (PHI), which is calculated by a mathematical combination of the values of tPSA, fPSA, and p2PSA, according to the following formula: PHI = (p2PSA/fPSA) × √tPSA. In 2012, the FDA approved PHI for PCa detection in men with the following characteristics: (i) older than 50 years; (ii) PSA between 4 and 10 µg/L; and (iii) a non-suspicious digital rectal examination (DRE) [10]. Additionally, some authors showed that PHI outperforms tPSA and fPSA in the detection of csPCa [11,12].
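As a worked illustration of the formula above, the following Python sketch computes PHI from the three assay values. The function name and the example concentrations are hypothetical. The sketch assumes that p2PSA is expressed in pg/mL and that fPSA and tPSA are expressed in ng/mL (equivalent to µg/L), the unit convention under which the index is usually reported.

```python
import math


def prostate_health_index(p2psa_pg_ml: float, fpsa_ng_ml: float, tpsa_ng_ml: float) -> float:
    """Return PHI = (p2PSA / fPSA) * sqrt(tPSA).

    Assumed units (not stated in the text): p2PSA in pg/mL, fPSA and tPSA in ng/mL.
    """
    if fpsa_ng_ml <= 0:
        raise ValueError("fPSA must be positive")
    if p2psa_pg_ml < 0 or tpsa_ng_ml < 0:
        raise ValueError("p2PSA and tPSA must be non-negative")
    return (p2psa_pg_ml / fpsa_ng_ml) * math.sqrt(tpsa_ng_ml)


# Hypothetical patient: p2PSA 15 pg/mL, fPSA 1 ng/mL, tPSA 4 ng/mL.
example_phi = prostate_health_index(15.0, 1.0, 4.0)
print(f"PHI = {example_phi:.1f}")
```

Because tPSA enters through its square root, doubling tPSA raises PHI by a factor of about 1.41, whereas doubling the p2PSA/fPSA ratio doubles the index.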
Although several authors showed that PHI performs well in detecting PCa, the European Association of Urology stated that the evidence is too limited to implement such tests in routine screening programs [13]. Likewise, the American Urological Association has declared that more evidence is required to confirm that PHI can reliably decrease the number of unnecessary biopsies while retaining the capacity to detect csPCa [14].
The aim of this study was to assess the accuracy of PHI for detecting PCa and identifying csPCa.

Materials and methods
We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 guidelines [15]. All studies investigating the diagnostic efficacy of PHI for PCa detection were considered for inclusion.

Literature search strategy
Two reviewers (LA and MV) independently performed a comprehensive, systematic electronic search of PubMed and Web of Science. The following Medical Subject Heading (MeSH) terms were used to search for articles: "Prostate Health Index", "PHI", "cancer prostate" and "tumor prostate". No publication date restriction was applied, and the search covered records up to January 11, 2022.

Study selection
The inclusion criteria were: (i) retrospective and prospective study design; (ii) English language; (iii) sufficient data provided to calculate the outcome; (iv) PCa diagnosis confirmed on biopsy.
Exclusion criteria were: (i) evaluation of only the prognostic role of PHI; (ii) lack of evaluation of PHI accuracy; (iii) letters, case reports, animal studies, reviews, and meta-analyses; (iv) languages other than English; (v) full text not found.

Data collection
Two authors (LA and MV) independently collected data on study and patient characteristics. The information extracted from each study included the first author's name and year of publication, study design, inclusion criteria, study population, number of positive biopsies, number of csPCa cases, the calibration system used (WHO vs. Hybritech), PHI cut-off value, and outcome data [sensitivity, specificity, true positive (TP), false negative (FN), true negative (TN), false positive (FP)].

Statistical analysis
Meta-analytical summaries of PHI performance were calculated following the bivariate binomial approach by fitting a generalized linear mixed model (GLMM) [16][17][18]. Summary pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio and diagnostic odds ratio (DOR) were calculated using R v. 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) and RStudio IDE v. 1.3.1093 (RStudio, PBC, Boston, MA) with the lme4, mada and meta packages [19]. Pooled results were confirmed by importing the data into the interactive application MetaDTA (Diagnostic Test Accuracy Meta-Analysis v. 2.01), hosted on the shinyapps server and available at https://crsu.shinyapps.io/dta_ma/ [20,21]. The hierarchical summary receiver operating characteristic (HSROC) model parameters estimated by MetaDTA (lambda or accuracy parameter, theta or cut-point parameter, beta or shape parameter, the variance of the accuracy parameter and the variance of the threshold parameter) were imported into Review Manager (RevMan) v. 5.4.1 (The Cochrane Collaboration, 2020) to obtain the HSROC plots [22]. Heterogeneity across studies was evaluated by plotting sensitivities and specificities, together with their 95%CI, in forest and crosshair plots [23] and by the inconsistency index (I²), calculated as 100% × (Q − df)/Q, where Q is Cochran's heterogeneity statistic and df the degrees of freedom. Publication bias was evaluated by funnel plot and Deeks' formal test.
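The per-study quantities and the inconsistency index described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the bivariate GLMM actually fitted in R. The class and function names and all numbers are hypothetical, and truncating negative I² values at zero is a common convention that the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts extracted from one diagnostic accuracy study."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def inconsistency_index(q: float, df: int) -> float:
    """I² = 100% * (Q - df) / Q, truncated at 0 (assumed convention)."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical study counts and a hypothetical Cochran's Q.
study = TwoByTwo(tp=80, fn=20, tn=60, fp=40)
i2 = inconsistency_index(q=50.0, df=10)
print(study.sensitivity(), study.specificity(), i2)
```

In this made-up example, Q = 50 on 10 degrees of freedom gives I² = 80%, meaning that most of the observed variability would be attributed to between-study heterogeneity rather than chance.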

Results

Study selection
The process of study selection is schematically presented in the PRISMA flow diagram (Figure 1). After the removal of duplicates, a total of 371 articles were obtained. After screening the titles and abstracts, 273 studies were excluded because they were literature reviews, case reports, letters, or meta-analyses, did not measure PHI, or did not evaluate the diagnostic accuracy of PHI for PCa detection. The full text of 92 studies was further evaluated. Finally, a total of 60 studies were included, 42 for PCa and 18 for csPCa analysis.

Study characteristics and quality assessment
The main characteristics of all the studies included in the meta-analysis are reported in Table 1.
The diagnostic performances of the studies for PCa and csPCa analysis are described in Tables 2 and 3, respectively. For PCa studies (n=42), the sample size ranged between 50 and 1,538, with cut-off, sensitivity and specificity ranging, respectively, from 21.3 to 63.9, from 0.380 to 0.945 and from 0.213 to 0.963 (Table 2). For csPCa studies (n=18), the sample size ranged between 43 and 1,538, with cut-off, sensitivity and specificity ranging, respectively, from 17.8 to 67.6, from 0.533 to 1.000 and from 0.211 […]
The HSROC plots in Figures 7 and 8 report the points representing the sensitivity-specificity pairs of the single PCa and csPCa studies, the summary operating point (summary values for sensitivity and specificity) and the summary ROC curve, together with the 95% confidence region around the summary operating point and the 95% prediction region.

Discussion
In this systematic review and meta-analysis, we evaluated the accuracy of PHI as a biomarker of PCa and csPCa by analysing results from 60 studies, including a total of 14,255 individuals. The main findings of our meta-analysis can be summarised as follows: (i) the pooled sensitivity and specificity of PHI for PCa detection were 0.791 (95%CI 0.739-0.834) and 0.625 (95%CI 0.560-0.686), respectively; (ii) the pooled sensitivity and specificity of PHI for csPCa detection were 0.874 (95%CI 0.803-0.923) and 0.569 (95% CI 0.458-0.674), respectively; (iii) DOR was 6.302 and 9.206, respectively for PCa and csPCa detection, suggesting moderate to good effectiveness of PHI as a diagnostic test. Overall, our findings suggest that PHI has a high accuracy for detecting PCa and discriminating between aggressive and non-aggressive PCa. Thus, it could be useful as a biomarker in predicting patients harbouring more aggressive cancer and guiding biopsy decisions.
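To show how the pooled estimates relate to one another, the following Python sketch derives the likelihood ratios and a diagnostic odds ratio directly from the pooled sensitivity and specificity quoted above. The function name is hypothetical, and the calculation is a naive one from rounded point estimates, not the model-based estimation used in the meta-analysis.

```python
from typing import NamedTuple


class AccuracySummary(NamedTuple):
    lr_positive: float
    lr_negative: float
    dor: float


def summarise_accuracy(sensitivity: float, specificity: float) -> AccuracySummary:
    """Derive LR+, LR- and DOR from a single sensitivity/specificity pair."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return AccuracySummary(lr_positive, lr_negative, lr_positive / lr_negative)


# Pooled point estimates reported in this meta-analysis.
pca = summarise_accuracy(0.791, 0.625)
cspca = summarise_accuracy(0.874, 0.569)
print(f"PCa:   LR+ {pca.lr_positive:.2f}, LR- {pca.lr_negative:.2f}, DOR {pca.dor:.2f}")
print(f"csPCa: LR+ {cspca.lr_positive:.2f}, LR- {cspca.lr_negative:.2f}, DOR {cspca.dor:.2f}")
```

The naive DOR values (about 6.31 and 9.16) are close to, but not identical with, the reported 6.302 and 9.206, presumably because the reported figures are estimated within the model rather than from the rounded summary points.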
The early detection of PCa and the discrimination between benign and malignant forms are fundamental for appropriate intervention. The gold standard for PCa diagnosis remains the biopsy. However, the laboratory has a key role in the early identification of patients at high risk of PCa who are eligible for biopsy. The most widely used screening biomarker worldwide is PSA. In the past, a one-size-fits-all approach based on PSA was used for the early identification of PCa and, consequently, for determining the need for prostate biopsy in all men. However, PSA is characterised by a low specificity for PCa and is not associated with the aggressiveness of cancer. In the last decades, multiparametric magnetic resonance imaging (mpMRI) of the prostate has emerged as the gold standard for predicting positive biopsy [73]. The Prostate Imaging Reporting and Data System (PI-RADS) score, released in 2015 by an international collaboration of the American College of Radiology (ACR) and the European Society of Uroradiology (ESUR), is a structured reporting scheme that helps to determine the risk of csPCa on prostate mpMRI. The PI-RADS score ranges from 1 to 5 and should be interpreted as follows: 1-2=low risk of PCa; 3=intermediate risk of PCa; and 4-5=high risk of PCa. PI-RADS 3 represents a "gray zone", with only 15% of patients having PCa. Additionally, PPV has been reported to be 0.49 for csPCa, and a few patients with a negative mpMRI have high-grade PCa [74]. Thus, mpMRI presents some limitations in selecting patients to undergo biopsy [75][76][77]. It should also be considered that mpMRI is an expensive tool and requires an experienced radiologist. The drawbacks of PSA and mpMRI could be overcome by the more recently developed PHI.
PHI should be used in clinical practice as a complementary test to PSA and mpMRI. Indeed, PHI should be evaluated when PSA falls within the "gray zone", between 2 and 10 µg/L, which would spare unnecessary biopsies and help select patients for active surveillance. Similarly, it could be used when a PI-RADS 3 score is obtained. Some authors also tested whether PHI could be used as an alternative to mpMRI, but only limited evidence is available to date [25,27,61].
Interestingly, our data show that PHI could reliably detect patients with more aggressive PCa. The association between PHI and PCa aggressiveness is supported by literature evidence. Several authors reported a significant correlation between PHI levels and histological features of tumour malignancy, such as grade, stage, and volume, evaluated after radical prostatectomy [78,79]. Additionally, some authors showed that PHI could predict the biochemical recurrence (BCR) of PCa [80,81]. The performance of PHI for predicting csPCa has been evaluated alone or in combination with other tools. Hsieh et al. showed that the combination of mpMRI and PHI has a better predictive power for csPCa than either PHI or mpMRI alone and would have avoided up to 50% of biopsies while missing only one csPCa patient [35]. Kim et al. proposed a strategy based on the use of PHI as a triage test for identifying patients eligible for mpMRI and/or biopsy [28]. Such a strategy could be effective, efficient, and cheap, allowing the selection of only high-risk patients for more laborious and expensive investigations. Foj et al. recently developed a nomogram incorporating PHI to estimate the individual probability of aggressive PCa at biopsy [82]. Similarly, Loeb et al. developed a nomogram including PHI [56], showing that adding PHI to currently available risk prediction tools significantly improved the prediction of aggressive prostate cancer.
However, several issues hamper the introduction of PHI into clinical practice.
First, there is no consensus on the optimal decisional cut-off for detecting either PCa or csPCa, with a high variability of proposed PHI values, ranging from 21.33 to 63.9 for PCa and from 26.7 to 67.6 for csPCa. This could be related to the high heterogeneity among studies in terms of sample size, inclusion criteria adopted, and the use of different calibrations (Table 1). Specifically, the Beckman Coulter assay allows PSA to be calibrated according to either the Hybritech method or the WHO standard. However, there is a discrepancy of 16-20% between the PHI values obtained using the two calibrations, with WHO calibration yielding lower PHI values than Hybritech calibration [83]. Thus, different cut-offs should be adopted according to the calibration method chosen. Additionally, some authors established the best PHI cut-off according to the Youden Index, others according to the best sensitivity and others according to the best specificity. When selecting a test cut-off that maximises sensitivity, specificity, or a trade-off between them, several elements should be taken into consideration, among them the prevalence of the disease in the population or in a particular subgroup, combination with the results of other biomarkers or procedures (e.g. DRE, PSA), the risk of unnecessary further procedures (e.g. biopsy) and potential post-procedure complications, missed diagnoses and economic impact. Although some cost-consequence analyses have been performed to assess the impact of different PHI cut-offs, it is not entirely clear whether these results are applicable to different populations, at what stage of the diagnostic process or with which other biomarkers PHI should be used, or whether missed diagnoses are truly missed or merely delayed [84]. It is reasonable to argue that different cut-offs could be applied to different subgroups of patients based on disease prevalence or a specific diagnostic strategy (rule-in vs. rule-out, single vs. multiple biomarkers, population vs. high-risk patients). Further studies are needed to evaluate and define specific PHI cut-offs.
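The different cut-off selection rules mentioned above can lead to different thresholds from the same data. The following Python sketch contrasts selection by the Youden Index (J = sensitivity + specificity − 1) with a sensitivity-driven, rule-out choice. All names and the candidate cut-offs with their sensitivity and specificity values are invented for illustration and are not taken from the included studies.

```python
from typing import NamedTuple, Optional, Sequence


class CutoffPerformance(NamedTuple):
    cutoff: float
    sensitivity: float
    specificity: float


def youden_j(p: CutoffPerformance) -> float:
    """Youden Index: J = sensitivity + specificity - 1."""
    return p.sensitivity + p.specificity - 1.0


def best_by_youden(candidates: Sequence[CutoffPerformance]) -> CutoffPerformance:
    """Cut-off that maximises the Youden Index."""
    return max(candidates, key=youden_j)


def rule_out_cutoff(candidates: Sequence[CutoffPerformance],
                    min_sensitivity: float) -> Optional[CutoffPerformance]:
    """Highest cut-off that still reaches the required sensitivity, if any."""
    eligible = [c for c in candidates if c.sensitivity >= min_sensitivity]
    return max(eligible, key=lambda c: c.cutoff) if eligible else None


# Hypothetical performance of four candidate PHI cut-offs.
candidates = [
    CutoffPerformance(25.0, 0.95, 0.30),
    CutoffPerformance(35.0, 0.80, 0.60),
    CutoffPerformance(45.0, 0.60, 0.78),
    CutoffPerformance(55.0, 0.40, 0.90),
]

youden_choice = best_by_youden(candidates)
rule_out_choice = rule_out_cutoff(candidates, min_sensitivity=0.90)
print(youden_choice.cutoff, rule_out_choice.cutoff if rule_out_choice else None)
```

With these invented numbers the Youden Index selects a cut-off of 35, whereas requiring at least 90% sensitivity selects 25, which illustrates why studies using different selection rules report different thresholds.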
Finally, prostate volume (PV) could influence the heterogeneity of PHI results among studies. Interestingly, Filella et al. showed that the diagnostic performance of PHI changes according to PV, with the highest accuracy in patients with small prostate volume [85]. Moreover, several authors described an association between PV and PCa as well as tPSA. Accordingly, the PHI density (PHID), calculated as PHI/PV, has been introduced. Mearini […] prospective large cohort study showed that PHID had better accuracy than PHI for detecting PCa but not csPCa [88]. Overall, the contrasting literature evidence available to date does not allow conclusions to be drawn on whether PV could improve the predictive ability of PHI. Thus, more studies are required to evaluate the usefulness of PHID for PCa and csPCa detection.
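For clarity, the PHID definition given above (PHI/PV) can be written as a one-line calculation. The Python sketch below uses a hypothetical function name and hypothetical values, and it assumes that PV is expressed in mL, which the text does not specify.

```python
def phi_density(phi: float, prostate_volume_ml: float) -> float:
    """PHI density: PHID = PHI / PV (PV assumed to be in mL)."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return phi / prostate_volume_ml


# Hypothetical patients with the same PHI but different prostate volumes.
small_gland = phi_density(40.0, 25.0)
large_gland = phi_density(40.0, 50.0)
print(small_gland, large_gland)
```

The same PHI value therefore yields a higher density in a smaller gland, which is the rationale for normalising PHI by PV.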
A cost-effective strategy based on the best combination of PSA, PHI and mpMRI for identifying patients at high risk of PCa who are eligible for biopsy, and those with more aggressive forms, should be developed, validated, and integrated into the guidelines. For this purpose, large multicentre randomized controlled trials are needed.
In conclusion, our data show that PHI is a reliable biomarker of PCa and csPCa. Clinicians now have valuable tools for triaging patients at risk of PCa. Thus, the clinical paradigm should shift toward a more personalized approach to prostate biopsy decisions, based on a multiparameter strategy that integrates biomarkers, including PSA and PHI, with clinical findings from mpMRI.
Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Figure 7: HSROC plot for PCa detection. Open circles represent sensitivity-specificity pairs of the 42 included studies. The black circle indicates the summary operating point (summary values for sensitivity and specificity). The solid curve represents the summary ROC curve, whereas the solid and dashed closed curves indicate, respectively, the 95% confidence region around the summary operating point and the 95% prediction region. The range of the summary ROC curve was limited from the minimum to the maximum specificity of the included studies.
Figure 8: HSROC plot for csPCa detection. Open circles represent sensitivity-specificity pairs of the 18 included studies. The black circle indicates the summary operating point (summary values for sensitivity and specificity). The solid curve represents the summary ROC curve, whereas the solid and dashed closed curves indicate, respectively, the 95% confidence region around the summary operating point and the 95% prediction region. The range of the summary ROC curve was limited from the minimum to the maximum specificity of the included studies.