
Current Directions in Biomedical Engineering

Joint Journal of the German Society for Biomedical Engineering in VDE and the Austrian and Swiss Societies for Biomedical Engineering

Editor-in-Chief: Dössel, Olaf

Editorial Board: Augat, Peter / Buzug, Thorsten M. / Haueisen, Jens / Jockenhoevel, Stefan / Knaup-Gregori, Petra / Kraft, Marc / Lenarz, Thomas / Leonhardt, Steffen / Malberg, Hagen / Penzel, Thomas / Plank, Gernot / Radermacher, Klaus M. / Schkommodau, Erik / Stieglitz, Thomas / Urban, Gerald A.


Open Access
ISSN (Online): 2364-5504

Learning discriminative classification models for grading anal intraepithelial neoplasia

Philipp Kainz (corresponding author)
  • Institute of Biophysics, Center for Physiological Medicine, Medical University of Graz, 8010 Graz, Austria
  • KML vision, 8010 Graz, Austria
Michael Mayrhofer-Reinhartshuber
  • Institute of Biophysics, Center for Physiological Medicine, Medical University of Graz, 8010 Graz, Austria
  • KML vision, 8010 Graz, Austria
Roland Sedivy
  • Center of Pathology, Danube Private University Krems, 3500 Krems-Stein, Austria; and Pathologie Länggasse, 3001 Bern, Switzerland
Helmut Ahammer
  • Institute of Biophysics, Center for Physiological Medicine, Medical University of Graz, 8010 Graz, Austria
Published Online: 2016-09-30 | DOI: https://doi.org/10.1515/cdbme-2016-0093

Abstract

Grading intraepithelial neoplasia is crucial to derive an accurate estimate of pre-cancerous stages and is currently performed by pathologists assessing histopathological images. Inter- and intra-observer variability can be significantly reduced when reliable, quantitative image analysis is introduced into diagnostic processes. On a challenging dataset, we evaluated the potential of learning a classifier to grade anal intraepithelial neoplasia. Support vector machines were trained on images represented by fractal and statistical features. We show that pursuing a learning-based grading strategy yields highly reliable results: the proposed method outperformed existing methods by a significant margin.

Keywords: digital pathology; fractal analysis; histopathological image analysis; machine learning; support vector machines; tissue grading

1 Introduction

Anal intraepithelial neoplasia (AIN) can be present in various forms and usually precedes anal carcinoma [1]. Assessing alterations of epithelium is crucial to derive an accurate estimate of pre-cancerous stages and carcinoma in-situ. The inspection of biopsy tissue is done qualitatively by physicians to diagnose anorectal disease, either using glass slides and light microscopy, or digital whole slide images. AIN is typically classified into four classes that correspond to non-neoplastic tissue (AIN0), and increasing grades of dysplasia (AIN1-AIN3). Important cues for grading are the density, shape, and texture of cell nuclei, as well as the distribution of normal and abnormal cells within the tissue [2]. However, qualitative analysis strongly depends on the observer’s experience and frequently leads to irreproducible results. Since defining clear transitions between grades is not always possible, the inter-observer variability can be considerable. Using quantitative image analysis, drawbacks of subjective assessments can be tackled by introducing reproducible, validated methods in biomedical diagnostics.

Previous work used fractal image analysis [3], [4] to grade AIN, or cervical intraepithelial neoplasia [5]. Fractal dimensions were computed globally for each image, and the value range was segmented by rigidly searching for statistically significant thresholds to separate the classes. In other work [6], a variety of data mining methods was explored to grade AIN images based on statistical texture features. In order to add significant value to a diagnostic process in terms of reducing time-consuming tasks, a classifier must be able to deal with a large input variance [7], generalize well, and reliably predict the grading of new images. Two crucial aspects have not been addressed in [3], [4], [6]: firstly, none of the previous methods involved building a classifier on training data and evaluating it on novel test data; secondly, since the authors reported errors on training sets, we cannot assess the ability of their methods to generalize to unseen data.

The objective of this work is to learn a discriminative classification model for predicting the grade of AIN. We show that a support vector machine (SVM [8]) learns a robust classifier on a set of global image features. Our learning-based approach outperforms the previously proposed methods by a large margin on a challenging dataset [6]. Furthermore, we assess the influence of common strategies for data augmentation and class label balancing on the generalization performance.

2 Material and methods

2.1 Histopathology dataset

We used a set of hematoxylin and eosin stained images (n = 136, 749 × 580 pixels) of different AIN grades [3], [6], cf. Figure 1. An expert pathologist labeled images containing healthy tissue (n0 = 17) as AIN0, and low-grade neoplasia (n1 = 36) as AIN1. High-grade neoplasia was labeled as AIN2 (n2 = 55), and AIN3 (n3 = 28). We assigned 80% of the images in each class to the training, and 20% to a hold-out test set. Further, 20% of the training samples were used as validation set for parameter tuning, before the test set was predicted. This resulted in 84 images in the original training set (10 AIN0 / 22 AIN1 / 35 AIN2 / 17 AIN3), 23 in the validation set (3/6/9/5), and 29 in the test set (4/8/11/6).
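The per-class (stratified) split described above can be sketched as follows. The rounding rules (floor for the 80% training fraction, ceiling for the 20% validation fraction) are our assumption, chosen so that the toy example reproduces the per-class counts reported in the text:

```python
import math
import random

def stratified_split(samples, train_frac=0.8, val_frac=0.2, seed=42):
    """Split a {class_label: [image_ids]} dict into train/val/test lists.

    Per class: 80% go to training, 20% to the hold-out test set; 20% of
    the training part is then carved out as a validation set.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in samples.items():
        items = list(items)
        rng.shuffle(items)
        n_train = math.floor(train_frac * len(items))   # floor matches the paper's counts
        tr, te = items[:n_train], items[n_train:]
        n_val = math.ceil(val_frac * len(tr))           # ceiling matches the paper's counts
        val += [(i, label) for i in tr[:n_val]]
        train += [(i, label) for i in tr[n_val:]]
        test += [(i, label) for i in te]
    return train, val, test

# Class sizes from the dataset: AIN0=17, AIN1=36, AIN2=55, AIN3=28
data = {g: [f"{g}_{k}" for k in range(n)]
        for g, n in [("AIN0", 17), ("AIN1", 36), ("AIN2", 55), ("AIN3", 28)]}
train, val, test = stratified_split(data)
```

With these rules the split yields 84 training, 23 validation, and 29 test images, matching the numbers above.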

Figure 1: Expert gradings of anal intraepithelial neoplasia.

Image pre-processing and augmentation: We hypothesized that augmentation and balancing of a training set improves the ability of our classifier to generalize. Hence, two additional training sets were created by applying label-preserving elastic deformations and parameterized random intensity variations of the individual channels in HSV color space. The training sets are referred to as (a) for the original, (b) for the augmented, and (c) for the augmented and balanced training set. For (b), we added 30 versions of each image in (a), such that it comprised 2604 images. However, the class label distribution remained unbalanced. For (c), we oversampled the underrepresented classes in (b) to achieve a uniform distribution over all four grades of AIN (4304 images). All images were resized to 512 × 512 pixels by bilinear interpolation, prior to feature extraction.
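The per-channel intensity variation in HSV space can be sketched with the standard library's colorsys; the jitter ranges below are illustrative assumptions, not the parameters used in the paper, and the elastic deformations are omitted:

```python
import colorsys
import random

def jitter_hsv(pixels, dh=0.02, ds=0.1, dv=0.1, seed=None):
    """Randomly perturb the H, S, V channels of an RGB image.

    `pixels` is a list of (r, g, b) tuples with values in [0, 1].
    One random offset per channel is drawn for the whole image, so the
    perturbation is label-preserving and spatially uniform.
    """
    rng = random.Random(seed)
    off_h = rng.uniform(-dh, dh)
    off_s = rng.uniform(-ds, ds)
    off_v = rng.uniform(-dv, dv)
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + off_h) % 1.0                       # hue wraps around
        s = min(1.0, max(0.0, s + off_s))
        v = min(1.0, max(0.0, v + off_v))
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out

# Toy 2-pixel "image"; 30 jittered versions per image, as in training set (b)
img = [(0.8, 0.2, 0.3), (0.1, 0.9, 0.5)]
augmented = [jitter_hsv(img, seed=k) for k in range(30)]
```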

2.2 Learning AIN grading classifiers

Feature extraction: Since AIN images show randomly oriented tissue, we considered the rotational invariance of global texture features advantageous. Inspired by [3], [4], [5], [6], we extracted 22 statistical and 304 fractal features to represent the image content. Statistical features consisted of the summed pixel values of the individual channels in three color models (RGB, L*a*b*, HSV) and the gray value image (mean RGB), as well as first- and second-order statistical parameters [9] of the NTSC luminance gray value image (variance, energy (1st, 2nd), entropy (1st, 2nd), skewness, kurtosis, third moment, fourth moment, contrast, homogeneity, correlation). Fractal features included estimates of the fractal dimension based on the Fourier method (DF) applied to the gray value image [3] (first 216 distance values in frequency space) and the box-counting method applied to the nuclei-segmented, binary image [5] (three scale-ranges: 2^0–2^4, 2^5–2^9, 2^0–2^9). In addition, the recently developed pyramidal gradient and pyramidal differences methods [10] applied to gray value images (mean RGB, R, G, B, L*, a*, b*, H, S, V; combinations of the scale-ranges 2^0–2^7 with at least four consecutive scales) were used.
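As an illustration of one fractal feature, a minimal box-counting estimate on a binary (nuclei-segmented) image might look as follows; the exact scale handling and implementation in IQM [11] may differ:

```python
import numpy as np

def box_counting_dimension(binary, exponents=range(0, 5)):
    """Estimate the box-counting fractal dimension of a 2-D binary image.

    Counts boxes of side 2**e containing at least one foreground pixel,
    then fits log(count) against log(1/size); the slope is the estimate.
    The default scale-range (2**0..2**4) mirrors one range in the text.
    """
    sizes, counts = [], []
    h, w = binary.shape
    for e in exponents:
        s = 2 ** e
        # pad so the image tiles evenly, then count non-empty s x s boxes
        ph, pw = -h % s, -w % s
        padded = np.pad(binary, ((0, ph), (0, pw)))
        blocks = padded.reshape(padded.shape[0] // s, s, padded.shape[1] // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
        sizes.append(s)
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

d = box_counting_dimension(np.ones((64, 64), dtype=bool))  # filled square, close to 2
```

A filled plane yields a dimension near 2 and a one-pixel line near 1, which is the expected sanity check for such an estimator.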

Classifier training and inference: Given a set of labeled training samples, we employed a linear SVM to learn a maximum-margin classifier, characterized by an optimal hyper-plane that separates two classes. To solve our multi-class classification problem (AIN0-AIN3), multiple SVMs were trained in a one-versus-one scheme. The regularization parameter C, determining the trade-off between margin size and training error, was optimized to maximize the F1-score on a validation set. To infer the grading of a test sample, each SVM cast a vote for a grade. Then, majority voting determined the final grade.
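The one-versus-one inference step can be sketched independently of the SVM implementation: with four grades there are six pairwise classifiers, each casting one vote, and the majority wins. The pairwise decision functions below are hypothetical stand-ins, not trained SVMs:

```python
from collections import Counter
from itertools import combinations

GRADES = ["AIN0", "AIN1", "AIN2", "AIN3"]

def ovo_predict(x, pairwise):
    """Majority vote over one-vs-one classifiers.

    `pairwise` maps each class pair (a, b) to a decision function that
    returns a or b for feature vector x; 4 classes give 6 classifiers.
    Ties are broken deterministically by grade order.
    """
    votes = Counter(clf(x) for clf in pairwise.values())
    return max(GRADES, key=lambda g: (votes[g], -GRADES.index(g)))

# Hypothetical stand-ins for trained SVMs: each pairwise "classifier"
# thresholds the first feature halfway between the two grade indices.
pairwise = {
    (a, b): (lambda x, a=a, b=b:
             b if x[0] > (GRADES.index(a) + GRADES.index(b)) / 2 else a)
    for a, b in combinations(GRADES, 2)
}
pred = ovo_predict([2.7], pairwise)  # 3 of the 6 votes go to AIN3
```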

Experimental setup: Three different sets of features were examined on each training set (cf. Section 2.1): (I) all features, (II) only the statistical features, (III) only DF (DF showed statistically significant differences for AIN1-AIN3 in [3]). Hence, nine classifiers were evaluated, which are identified as Ia-IIIc.

Features in each training set were standardized across all samples to zero mean and unit variance. The validation and test sets were standardized using the values computed from the corresponding training set. IQM [11], LIBSVM [12], and WEKA [13] were used in our experiments.
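A minimal sketch of this standardization scheme, with statistics fitted on the training set only and reused unchanged for validation and test data:

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation from the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0      # guard against constant features
    return mu, sigma

def standardize(X, mu, sigma):
    return (X - mu) / sigma

# Toy feature matrices (rows = samples, columns = features)
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[3.0, 20.0]])
mu, sigma = fit_standardizer(X_train)
Z_train = standardize(X_train, mu, sigma)
Z_test = standardize(X_test, mu, sigma)   # test set reuses train statistics
```

Fitting the scaler on training data alone prevents information from the test set leaking into the model.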

Performance metrics: Let TPc be the number of true positives, FPc the number of false positives, and FNc the number of false negatives per class. Precision, recall, and F1-score are computed class-wise as PRCc = TPc/(TPc + FPc), RECc = TPc/(TPc + FNc), and F1c = 2 ⋅ PRCc ⋅ RECc/(PRCc + RECc). Overall performance was measured in terms of weighted average precision (PRC = ∑c ωc PRCc), recall (REC = ∑c ωc RECc), and F1-score (F1 = ∑c ωc F1c), where ωc = nc/n is the weight of each class.
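These definitions translate directly into code; a compact implementation computing the class-wise and weighted scores from a confusion matrix:

```python
def weighted_scores(confusion):
    """Weighted-average precision, recall, and F1 from a confusion matrix.

    `confusion[true][pred]` holds counts (rows = true class, columns =
    predicted class); class weights are n_c / n as defined in the text.
    """
    n_classes = len(confusion)
    n = sum(sum(row) for row in confusion)
    prc = rec = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp
        fn = sum(confusion[c]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = sum(confusion[c]) / n          # class weight n_c / n
        prc += w * p
        rec += w * r
        f1 += w * f
    return prc, rec, f1

prc, rec, f1 = weighted_scores([[4, 0], [0, 8]])  # diagonal matrix: all scores 1.0
```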

3 Results

In Table 1, overall PRC, REC, and F1 are reported for the validation and test set. Detailed results on the test set are presented as confusion matrices for all nine classifiers (Ia–IIIc), cf. Figure 2. Best validation results were obtained when all extracted features were used jointly (I). PRC, REC and F1 measures ranged from 0.81 to 0.91 on the validation, and 0.79–0.90 on the test set. Using statistical features only (II), performance metrics dropped to 0.65–0.76, and 0.45–0.66, respectively. Classification based on DF yielded 0.38–0.69 for the validation, and 0.27–0.45 for the test set (III).

Table 1

Quantitative results of grading methods (linear SVMs with parameters optimized on validation sets) for all four grades of AIN.

Figure 2: Confusion matrices for the test set obtained with the nine different SVM classifiers Ia–IIIc. Rows: statistical and fractal features (I), statistical features (II), DF (III). Columns: original (a), augmented (b), augmented and balanced (c) training set. For each true class, colors towards magenta encode a higher tendency of a classifier to predict a particular class. Higher values along the main diagonal are desired.

Generally, results obtained by models that used all available features for classification slightly improved, when an augmented training set was used. This behavior was not observed for classifiers trained on statistical features or DF. The results for all classifiers are comparable for unbalanced and balanced training sets.

Figure 3 illustrates qualitative results for four images by presenting their ground truth labels and the grades that were predicted by our trained classifiers. For this illustration, one image per class was chosen randomly from the test set.

Figure 3: Qualitative comparison of evaluated AIN grading methods. Text in red denotes classification errors; the top right corner shows the ground truth label.

4 Discussion and conclusion

We examined different strategies to learn an SVM classification model for grading histopathological images of AIN. Our results indicate that combining multiple fractal and statistical features greatly improved the outcome. For models that used all available features, we obtained highly similar performance on the validation and test sets, which emphasizes our system’s ability to generalize to unseen samples without over-fitting. Balancing the training set did not generally increase performance. Nevertheless, we could confirm our hypothesis that augmentation aids classifiers during the learning phase.

The authors of [3] claimed that DF reflected AIN grades well, but the mean recall was actually <0.5, excluding AIN0. Here, we included AIN0 and could not confirm that DF properly represents AIN grades. A much larger feature set was required to achieve generalization rates acceptable for use in biomedical diagnostics. In practice, physicians frequently discriminate only low- and high-grade AIN. Our best performing system (Ib) can predict these two classes with an accuracy of 96.55%.

This much higher performance can be explained by the fact that, in contrast to previous work [3], [4], not a single value for the fractal dimension but a multitude of values derived from measurements over varying scale-ranges was used. Features from different scale-ranges reflect measures at different scales, which are important when grading AIN (density, shape, and texture of cell nuclei; distribution of cells).

Nevertheless, manually extracting image features remains subject to experience and is application-specific. End-to-end machine learning approaches, e.g. convolutional neural networks [14], can tackle this problem automatically, but they usually require much larger training datasets, and large sets of labeled training data are typically scarce in biomedical imaging. Hence, an evaluation of our approach on more extensive AIN datasets is required as they become available. Depending on the dataset size, a semi-supervised learning setting, i.e. making use of unlabeled instances, should also be considered.

Acknowledgement

PK and MMR equally contributed to this publication.

Author’s Statement

Research funding: The authors state no funding involved. Conflict of interest: The authors state no conflict of interest. Informed consent: Informed consent has been obtained from all individuals included in this study. Ethical approval: The research related to human use complies with all relevant national regulations and institutional policies, was performed in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.

References

[1] Simpson JAD, Scholefield JH. Diagnosis and management of anal intraepithelial neoplasia and anal cancer. Br Med J. 2011;343:d6818.
[2] Bejarano PA, Boutros M, Berho M. Anal squamous intraepithelial neoplasia. Gastroenterol Clin North Am. 2013;42:893–912.
[3] Ahammer H, Kroepfl JM, Hackl C, Sedivy R. Fractal dimension and image statistics of anal intraepithelial neoplasia. Chaos Soliton Fract. 2011;44:86–92.
[4] Klonowski W, Pierzchalski M, Stepien P, Stepien R, Sedivy R, Ahammer H. Application of Higuchi’s fractal dimension in analysis of images of Anal Intraepithelial Neoplasia. Chaos Soliton Fract. 2013;48:54–60.
[5] Fabrizii M, Moinfar F, Jelinek HF, Karperien A, Ahammer H. Fractal analysis of cervical intraepithelial neoplasia. PLoS One. 2014;9:1–9.
[6] Ahammer H, Kroepfl JM, Hackl C, Sedivy R. Image statistics and data mining of anal intraepithelial neoplasia. Pattern Recogn Lett. 2008;29:2189–96.
[7] Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med. 2001;23:89–109.
[8] Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.
[9] Gonzalez RC, Woods RE. Digital image processing. Upper Saddle River, NJ: Prentice Hall International; 2008.
[10] Mayrhofer-Reinhartshuber M, Ahammer H. Pyramidal fractal dimension for high resolution images. Chaos. 2016;26:073109.
[11] Kainz P, Mayrhofer-Reinhartshuber M, Ahammer H. IQM: an extensible and portable open source application for image and signal analysis in Java. PLoS One. 2015;10:e0116329.
[12] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2:27:1–27:27.
[13] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explor. 2009;11:10–8.
[14] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86:2278–324.

About the article

Published in Print: 2016-09-01


Citation Information: Current Directions in Biomedical Engineering, Volume 2, Issue 1, Pages 419–422, ISSN (Online) 2364-5504, DOI: https://doi.org/10.1515/cdbme-2016-0093.

©2016 Philipp Kainz et al., licensee De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
