The heterogeneity of the two misclassifications (FPF and FNF), which correspond to the type I and type II errors of the Neyman–Pearson lemma, has gained increasing attention in evaluating the accuracy of medical diagnosis and disease screening based on a test characterized by a continuous biomarker. Assuming that such heterogeneity reflects the utility, or individual preference, for the trade-off between the two errors, we used fuzzy set theory to capture the associated fuzziness by deriving two membership functions that incorporate significant factors accounting for the heterogeneity. The fuzzy set regression method was then built to address the heterogeneity associated with the two misclassifications. By linking fuzziness to sensitivity expressed as a function of FPF, the proposed unified fuzzy set regression minimizes the degree of fuzziness so as to determine the optimal sensitivity and specificity. A novel estimation method based on two-stage least squares was also proposed to assess the inter-dependence of covariates affecting the fuzziness of the two misclassifications, relaxing the independence assumption of traditional ROC analysis.

Although the trade-off between type I and type II errors is optimized by the Neyman–Pearson lemma and the corresponding optimal TPF and FPF can be obtained by ROC analysis, the heterogeneity of the two misclassifications, which is related to the utility or individual preference for minimizing the two errors, has remained unresolved. To the best of our knowledge, this is the first study to use the fuzzy set concept, together with the utility function related to FPF and FNF, to assess the heterogeneity of the misclassification of disease status with interval-scaled variables and to identify the optimal individual-based cutoff given different risk profiles. Our fuzzy set regression method has several unique characteristics. First, we introduced two membership functions underpinning the fuzzy set to represent the heterogeneity of the two misclassification rates, and we proposed a unified regression model that simultaneously incorporates two logit regression models, each including a series of significant risk factors, to capture the degree of fuzziness of misclassification contributed by different individual characteristics. This fuzzy set regression method is very flexible for handling interval-scaled biological markers affected by a mixture of other interval-scaled and categorical variables. Second, because the fuzzy index does not convey the relative weight of the utility associated with false-negative and false-positive cases, the fuzzy utility ratio approach was further developed to identify the optimal subgroup cutoff based on the trade-off between sensitivity and specificity given different risk profiles. Third, we proposed a novel statistical method for estimating parameters in a system of simultaneous equations, which provides a very powerful approach for testing the statistical significance of putative factors affecting the degree of misclassification.
Allowing for inter-dependence between the two simultaneous equations renders the minimization of the two errors more powerful and flexible than the optimal test based on the Neyman–Pearson lemma. Finally, our proposed fuzzy set regression method can readily handle the circumstance in which the fuzzy utility ratio (*F*_{2}/*F*_{1}) is provided by empirical survey data measuring the utility values associated with FPF and FNF, as in the field of game theory.
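
To make the two-stage idea concrete, the following is a minimal textbook sketch of two-stage least squares with one endogenous regressor and one instrument. It is not the estimator developed in this paper for the simultaneous logit system; the function names, the toy data, and the variables `z`, `x`, `y`, and `u` are all assumptions introduced for illustration.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Generic 2SLS: instrument z, endogenous regressor x, outcome y."""
    # Stage 1: project the endogenous regressor on the instrument.
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols_simple(x_hat, y)


# Hypothetical toy data: u is an unobserved factor that drives both x and y
# but is uncorrelated with the instrument z.
z = [-2.0, -1.0, 0.0, 1.0, 2.0]
u = [1.0, -2.0, 2.0, -2.0, 1.0]
x = [1.0 + 2.0 * zi + ui for zi, ui in zip(z, u)]
y = [2.0 + 3.0 * xi + 5.0 * ui for xi, ui in zip(x, u)]

naive_intercept, naive_slope = ols_simple(x, y)           # biased by u
iv_intercept, iv_slope = two_stage_least_squares(z, x, y)  # close to (2.0, 3.0)
```

In this toy setting the naive regression overstates the slope because `u` enters both equations, whereas the two-stage estimate recovers the slope used to generate the data.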

The proposed fuzzy set regression method was illustrated with an example of osteoporosis screening with BMD. The fuzzy index score, based on the percentile of the risk score derived from the two logit membership functions, gives a clear profile of the heterogeneity associated with the two misclassifications (Figure 2). The degree of fuzziness is determined not only by the percentile of the risk score but also by the relative contribution of the relevant covariates to the fuzziness of FNF and FPF. In the osteoporosis example, the relevant covariates made equally significant contributions to the fuzziness in both error rates. Although the fuzzy index score can show the heterogeneity of the two misclassification rates, it is the utility ratio, calculated as the FPF divided by the FNF, that indicates whether sensitivity or specificity is more important. The fuzzy utility ratio was therefore used not only to show the heterogeneity of the two misclassification rates but also to identify the optimal sensitivity and specificity, and hence the optimal subgroup cutoff, given different individual risk profiles.
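
As a rough illustration of the quantities discussed above, the sketch below evaluates two logit membership functions for one covariate profile and forms the ratio *F*_{2}/*F*_{1}. The intercepts, coefficients, covariate coding, and the assignment of each function to a particular misclassification are hypothetical placeholders, not the estimates reported for the osteoporosis example.

```python
import math


def logistic(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def membership(intercept: float, coefs: list[float], covariates: list[float]) -> float:
    """Logit membership function: a degree of membership in (0, 1)."""
    linear = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return logistic(linear)


def fuzzy_utility_ratio(f1: float, f2: float) -> float:
    """Ratio F2/F1 of the two membership values."""
    return f2 / f1


# Hypothetical coefficients for (age in decades, obesity 0/1, menopause 0/1).
COEFS_F1 = [0.30, -0.40, 0.50]
COEFS_F2 = [0.10, 0.60, -0.20]

profile = [6.5, 1.0, 1.0]  # assumed profile: 65 years old, obese, postmenopausal
f1 = membership(-2.0, COEFS_F1, profile)
f2 = membership(-1.5, COEFS_F2, profile)
ratio = fuzzy_utility_ratio(f1, f2)
```

Because both membership values lie strictly between 0 and 1, the ratio is always positive, and covariates that shift one membership function more than the other move the ratio away from 1.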

It could be argued that conventional multiple logistic regression analysis, without information on the fuzzy utility ratio, could be applied to identify the optimal subgroup cutoff. This can be done by first estimating clinical weights (regression coefficients) and then running 500 simulations based on the underlying model to generate the sensitivity and specificity at different cutoffs. The optimal sensitivity and specificity for each subgroup defined by the three covariates (age, obesity, and menopause) can then be ascertained from the 500 simulated estimates. The results show that the conventional logistic regression model is less discriminative than our proposed fuzzy regression model with the fuzzy utility ratio in identifying the optimal subgroup cutoff, for example across the obesity and menopause subgroups at 55 and 65 years of age (see the final column). The explanation for this disparity is that our proposed fuzzy regression model considers both the factors affecting the joint distribution of sensitivity and specificity and, through the fuzzy utility ratio, the relative contribution of sensitivity versus specificity, whereas conventional multiple logistic regression considers neither of these two features.

Our fuzzy set regression method is also helpful for identifying individual cutoff values, as in the example of population-based osteoporosis screening with BMD. Using the predetermined fuzzy utility ratio, as shown in Figure 2, the optimal cutoff for each risk group can be selected given the trade-off between sensitivity and specificity. The same predetermined ratio can be used to identify the optimal cutoff for different individual characteristics. The strategy developed here has significant clinical implications for individualized disease screening.
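
The cutoff selection described above can be sketched, under simplifying assumptions, as a search over candidate cutoffs that minimizes a weighted sum of the two error rates. The loss `FNF + ratio * FPF`, the function names, and the toy marker values are assumptions made for illustration. They stand in for, and are not, the ROC optimization used in this study. Low marker values are taken to indicate disease, as with BMD.

```python
def sens_spec(diseased, healthy, cutoff):
    """Sensitivity and specificity when a value <= cutoff counts as test-positive."""
    true_pos = sum(1 for v in diseased if v <= cutoff)
    true_neg = sum(1 for v in healthy if v > cutoff)
    return true_pos / len(diseased), true_neg / len(healthy)


def best_cutoff(diseased, healthy, ratio):
    """Cutoff minimizing FNF + ratio * FPF over the observed marker values."""
    candidates = sorted(set(diseased) | set(healthy))

    def loss(cutoff):
        sens, spec = sens_spec(diseased, healthy, cutoff)
        return (1.0 - sens) + ratio * (1.0 - spec)

    return min(candidates, key=loss)


# Hypothetical marker values for one risk group (not data from the study).
diseased = [-3.0, -2.5, -2.0, -1.0]
healthy = [-2.2, -1.5, -0.5, 0.0]

lenient = best_cutoff(diseased, healthy, ratio=0.25)  # false positives cheap
strict = best_cutoff(diseased, healthy, ratio=4.0)    # false positives costly
```

With these toy values, a small ratio favors sensitivity and pushes the cutoff up, while a large ratio favors specificity and pushes it down, which is the qualitative behavior a subgroup-specific utility ratio is meant to produce.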

Although our method can accommodate the characteristics of screening with interval-scaled biological markers, this study has two limitations. First, we demonstrated the proposed fuzzy set regression method with a single marker only, rather than with multiple markers, although the proposed two-stage least-squares method can be extended to multiple markers, as in eq. (8). This extension should be tested with empirical data in the future. Second, we captured the heterogeneity of false-negative and false-positive cases using two logit membership functions. The method should be extended to other link functions within the generalized linear model framework in the future.

In conclusion, the fuzzy set regression method incorporating membership functions encoded by a constellation of covariates was proposed to assess the heterogeneity of the two misclassifications (FPF and FNF) in disease screening with interval-scaled markers, which is related to the utility (measured by the fuzziness of being classified as disease or non-disease) for the trade-off between the two errors. To minimize the degree of fuzziness, the fuzzy set regression method, in combination with the ROC optimization method given a fuzzy utility ratio, was then applied to find the optimal sensitivity and specificity, which in turn determine the optimal subgroup-specific or subject-specific cutoff for different risk groups. Our proposed fuzzy set regression method was demonstrated successfully with an example of screening for osteoporosis with bone mineral density (BMD).
