Automated “Disease/No Disease” Grading of Age-Related Macular Degeneration by an Image Mining Approach
Author Affiliations & Notes
  • Yalin Zheng
    Department of Eye and Vision Science, University of Liverpool, and St. Paul's Eye Unit, Royal Liverpool University Hospital, Liverpool, United Kingdom
  • Mohd Hanafi Ahmad Hijazi
    Department of Computer Science, University of Liverpool, Liverpool, United Kingdom, and School of Engineering and Information Technology, Universiti Malaysia Sabah, Sabah, Malaysia
  • Frans Coenen
    Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
  • Corresponding author: Yalin Zheng, Department of Eye and Vision Science, Institute of Ageing and Chronic Disease, University of Liverpool, 3rd Floor, UCD Building, Daulby Street, Liverpool, L69 3GA; yalin.zheng@liv.ac.uk
Investigative Ophthalmology & Visual Science December 2012, Vol.53, 8310-8318. doi:10.1167/iovs.12-9576
Abstract

Purpose: To describe and evaluate an automated grading system for age-related macular degeneration (AMD) by color fundus photography.

Methods: An automated “disease/no disease” grading system for AMD was developed based on image-mining techniques. First, image preprocessing was performed to normalize the color and nonuniform illumination of the fundus images, to define a region of interest, and to identify and remove pixels belonging to retinal vessels. A graph-based image representation using quadtrees was then adopted to represent the images for the prediction task. Next, a graph-mining technique was applied to the generated graphs to extract relevant features (in the form of frequent subgraphs) from images of both AMD patients and healthy volunteers. Features of the training data were then fed into a classifier generator for training before the trained classifiers were employed to classify new, unseen images.

Results: The algorithm was evaluated on two publicly available fundus image datasets comprising a total of 258 images (160 AMD and 98 normal), using ten-fold cross validation. The experiments produced a best specificity of 100% and a best sensitivity of 99.4%, with an overall accuracy of 99.6%. Our approach outperformed previous approaches reported in the literature.

Conclusions: This study has demonstrated a proof-of-concept image-mining technique for automated AMD grading. The technique has the potential to be developed further into an automated grading tool for future large-scale AMD screening programs.

Introduction
Age-related macular degeneration (AMD) is the leading cause of irreversible blindness in the developed world.1 AMD significantly impairs patients' activities of daily living and quality of life, and consequently poses a substantial socioeconomic burden on society. The prevalence of AMD, and of the visual impairment and blindness it causes, is expected to increase significantly given the world's ageing population.2 There is mounting evidence highlighting the importance of early diagnosis and treatment in preventing progression to advanced AMD and eventual loss of vision.2,3
The diagnosis of AMD is usually based on detecting its characteristic color fundus photographic features, such as drusen and pigment abnormalities in the macula, using the Age-Related Eye Disease Study (AREDS) classification system and severity scale.4,5 Given the importance of feature detection for the diagnosis of AMD, substantial work has been directed at applying image processing and content-based image retrieval techniques to support diagnosis, for example the automated segmentation of drusen.6–10 However, the performance of these segmentation-based techniques is still not sufficient for wide-scale clinical application, largely because the underlying segmentation techniques are not robust enough to handle the variations found in fundus images, such as quality, color, and illumination. In fact, detection of lesions is merely a stepping stone for most medical applications; the objective is to extract useful clinical information for the follow-on decision-making process. The study described here was directed at systems for the automated diagnosis of AMD. A lesion-detection-based strategy would certainly be a natural one to pursue; unfortunately, this strategy has proved challenging and has yet to provide useful results, as noted in previous work on this aspect (Barriga ES, et al. IOVS 2010;51:ARVO E-Abstract 1793).11–13
We advocate an alternative strategy, founded on the concept of image mining, to achieve an automated AMD classification system with a minimal need for segmentation. Image mining does not require a representation that is interpretable by human observers, as long as image-salient features are captured. An image-mining-based approach has previously been successful in categorizing magnetic resonance brain scan images,14 given an appropriate selection of image features, and the approach was conjectured to perform well in classifying images based on their color information. In this paper we promote the use of spatial context information within images. Our previous work has highlighted the challenges of this strategy, including how to represent images so as to preserve spatial relationships and how to select appropriate features.15 Here we propose, describe, and evaluate a proof-of-concept image-mining technique for disease/no-disease grading of AMD by color fundus photography.
Methods
Image Dataset
The proposed automated AMD grading system was evaluated using two publicly available fundus image datasets: ARIA (http://www.eyecharity.com/aria_online) and STARE (http://www.ces.clemson.edu/∼ahoover/stare). The ARIA dataset comprises 161 images (101 AMD and 60 normal) acquired using a fundus camera (FF450+; Carl Zeiss Meditec, Inc., Dublin, CA) at a 50° field with a resolution of 576 × 768 pixels. The STARE dataset comprises 97 images (59 AMD and 38 normal) taken using a fundus camera (TRV-50; Topcon Corp., Tokyo, Japan) at a 35° field with a resolution of 605 × 700 pixels. These two datasets were merged into a single dataset comprising 258 images (160 AMD and 98 normal). An experienced, accredited grader at the Liverpool Ophthalmic Reading Center reviewed all the AMD images and split them into three categories according to the AMD severity scale set out by the AREDS4: early (14 images), intermediate (29 images), and advanced (117 images). More specifically, early AMD (AREDS category 2) is characterized by many small drusen, a few intermediate-size (63–124 µm) drusen, or retinal pigmentary abnormalities. Intermediate AMD (AREDS category 3) is characterized by at least one large (>125 µm) druse, numerous medium-size drusen, or geographic atrophy that does not extend to the center of the macula. Advanced AMD (AREDS category 4) can be either non-neovascular or neovascular, and is characterized by drusen and geographic atrophy extending to the center of the macula.
Image Mining Framework
The proposed framework comprises five stages: (1) preprocessing; (2) image decomposition and graph representation; (3) weighted frequent subgraph (WFSG) mining; (4) feature selection; and (5) classification.
Preprocessing
The objective of the preprocessing stage was to improve the effectiveness of the classification system by enhancing the input images. The following steps were applied (a code sketch is given after Figure 1):
  1. A “mask image” I_background was first generated, as proposed by ter Haar,16 by applying intensity thresholding and morphologic operators to the original image I (Fig. 1A): pixels within the circular fundus region of interest were marked as “1” and the rest as “0,” as shown in Figure 1B.
  2. A new image, I_color (Fig. 1C), was generated by color normalization of the original image I using a histogram specification approach.17
  3. The method proposed by Foracchia et al.18 was then applied to I_color to eliminate illumination variation, producing I_illumination (Fig. 1D).
  4. A new image, I_processed (Fig. 1E), was generated by applying a contrast enhancement technique, Contrast Limited Adaptive Histogram Equalization (CLAHE),19 to I_illumination. CLAHE was adopted because of its demonstrated superiority over comparable techniques.20
  5. Blood vessels in I_processed were detected by an approach that uses wavelet features and supervised classification.21 The vessel pixels in I_vessel (Fig. 1F) and the pixels marked “0” (black) in I_background (Fig. 1B) were excluded from the subsequent analysis. Localization and removal of the optic disc was deliberately omitted, as our previous experience showed that this step does not improve classification performance.22
Figure 1. 
 
Illustration of preprocessing steps: (A) original image; (B) image mask; (C) image after color normalization; (D) image after illumination normalization; (E) image after contrast enhancement; and (F) the identified blood vessels.
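The sketch below illustrates steps 1, 2, and 4 using common Python image libraries. It is a minimal illustration, not the authors' implementation: the file names are hypothetical, the reference image for histogram specification is assumed to be chosen in advance, and steps 3 and 5 are omitted because they require the specific methods of Foracchia et al.18 and Soares et al.21

```python
# Minimal sketch of preprocessing steps 1, 2, and 4 (illustrative only;
# file names and parameter values are assumptions, not the authors' code).
import cv2
import numpy as np
from skimage.exposure import match_histograms

img = cv2.imread("fundus.png")        # original image I (hypothetical file)
ref = cv2.imread("reference.png")     # reference image for color normalization

# Step 1: mask of the circular fundus region via thresholding + morphology.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, i_background = cv2.threshold(gray, 20, 1, cv2.THRESH_BINARY)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
i_background = cv2.morphologyEx(i_background, cv2.MORPH_OPEN, kernel)

# Step 2: color normalization by histogram specification against the reference.
i_color = match_histograms(img, ref, channel_axis=-1).astype(np.uint8)

# Step 4: CLAHE contrast enhancement, applied to the lightness channel.
lab = cv2.cvtColor(i_color, cv2.COLOR_BGR2LAB)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
lab[..., 0] = clahe.apply(lab[..., 0])
i_processed = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```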
Image Partition and Graph Representation
One challenge of image mining is how to represent an image so as to maintain its structural information. Hierarchical trees are often used to represent images because of their ability to focus on the interesting parts of the input data, thus permitting an efficient representation of the problem and consequently improving execution time.23 In this work we therefore used a quadtree representation, the most common hierarchical data structure used for image decomposition. The decomposition commenced by splitting an image into four equal-sized quadrants, with the root of the quadtree representing the entire image. The splitting process continued by further decomposing each quadrant into subquadrants, and terminated when a desired maximum level of decomposition, D_max, was reached or all subquadrants were homogeneous. A quadrant is homogeneous if it contains similar pixel values; in this study, homogeneity was defined in terms of the similarity between the average intensity of a quadrant and those of its subquadrants. If the difference between the average intensity of a quadrant and that of each of its subquadrants, divided by the quadrant's average intensity, is less than a predefined threshold, the quadrant is considered homogeneous. A threshold value of 10% was empirically chosen as the default setting in this study. Figure 2 illustrates the decomposition process for a retinal image.
Figure 2. 
 
Illustration of image decomposition using the quadtree technique.
Throughout the decomposition process the tree data structure was built dynamically: each identified subregion was represented as a node, and the relationship between each subregion and its parent was represented by an edge. The red, green, and blue (RGB) color model was used to extract pixel intensity values; hence three trees were generated initially (one per channel) and merged on completion.
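The following sketch shows the decomposition for a single color channel. The Node class, parameter names, and the demo array are our own illustrative assumptions, not the authors' code.

```python
# Quadtree decomposition sketch for one channel, following the homogeneity
# rule above: a quadrant is homogeneous if every subquadrant's mean differs
# from the parent's mean by less than 10% of the parent's mean.
import numpy as np

class Node:
    def __init__(self, mean, depth):
        self.mean = mean          # node weight: average intensity of the quadrant
        self.depth = depth
        self.children = []        # edges to the four subquadrants, if split

def decompose(region, depth, d_max=6, threshold=0.10):
    node = Node(float(region.mean()), depth)
    h, w = region.shape
    if depth >= d_max or h < 2 or w < 2:          # granularity limit reached
        return node
    quads = [region[:h//2, :w//2], region[:h//2, w//2:],
             region[h//2:, :w//2], region[h//2:, w//2:]]
    means = [q.mean() for q in quads]
    if node.mean > 0 and all(abs(m - node.mean) / node.mean < threshold
                             for m in means):     # homogeneous: stop splitting
        return node
    node.children = [decompose(q, depth + 1, d_max, threshold) for q in quads]
    return node

channel = np.random.randint(0, 256, (512, 512)).astype(float)  # demo channel
root = decompose(channel, depth=0)
```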
WFSG Mining
On completion of image decomposition, the input image set was represented as a collection of trees (Fig. 3). Each tree was defined as T = (V, E, L_V, L_E, u), where V and E are the sets of vertices and edges, respectively; L_V and L_E are the sets of labels for vertices and edges, respectively; and u is a label mapping function. To extract frequent subtrees (image features) for classification, a WFSG mining algorithm was used.24 Further details of WFSG mining are presented in Appendix A.
Figure 3. 
 
Illustration of the quadtree data structure.
The number of features discovered by the WFSG mining algorithm was determined by two thresholds, σ and λ, where σ denotes the minimum node support threshold and λ denotes the minimum edge weight threshold. Relatively low σ and λ values are required in order to extract a sufficient number of features. However, setting the thresholds too low may produce very large numbers of features, many of which may be redundant and/or ineffective for the desired classification task, while also adding to the computational cost. Thus, a feature selection process was applied to the discovered features.
Feature Selection
Feature selection is often desirable in classification applications, as it improves both computational efficiency and classification performance by reducing the data dimensions to only the most appropriate features. For this study, a feature ranking mechanism was employed that uses linear support vector machine (SVM) weights to rank features.25 To generate the weights, the L2-regularized SVM with the L2-loss function provided in the LIBLINEAR library26 (http://www.csie.ntu.edu.tw/~cjlin/liblinear) was applied to the set of features identified in the previous stage. The resulting list of features was sorted in descending order of weight (discriminative power). This allowed us to select the top K features for the subsequent classification task, reducing the size of the feature space from h to K. K is a free parameter whose value was tuned for the best classification performance.
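The ranking step can be sketched as follows, with scikit-learn's LinearSVC standing in for the LIBLINEAR solver; the data here are random placeholders, not the study's feature tables.

```python
# Feature ranking by linear SVM weights (sketch): train an L2-regularized,
# L2-loss (squared hinge) linear SVM and rank features by |weight|.
import numpy as np
from sklearn.svm import LinearSVC

# X: n_images x n_features binary subgraph-occurrence matrix; y: class labels.
X = np.random.randint(0, 2, (258, 5000)).astype(float)   # placeholder data
y = np.random.randint(0, 2, 258)

svm = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0).fit(X, y)
ranking = np.argsort(-np.abs(svm.coef_[0]))   # descending discriminative power
K = 1000                                      # free parameter, tuned in the paper
X_reduced = X[:, ranking[:K]]                 # keep only the top-K features
```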
Classification
Two different classification techniques were used: Naïve Bayes28,29 and SVM.27 Naïve Bayes was selected because it has been shown to work well, is comparable to other techniques,28 and requires no user-defined parameters. SVM was selected because it is recognized as one of the most effective classification methods in machine learning. For the SVM, the LibSVM library27 was used. A C-SVC formulation of SVM with a radial basis function (RBF) kernel, k(x_i, x_j) = exp(−γ‖x_i − x_j‖²), was employed to generate the SVM classifier. The optimal parameters, namely the soft margin C of the C-SVC and the γ parameter of the RBF kernel, were determined using the associated grid search strategy.27
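A sketch of the classifier setup follows, with scikit-learn's SVC standing in for LibSVM and an assumed grid of exponentially spaced (C, γ) values; it reuses X_reduced and y from the ranking sketch above.

```python
# C-SVC with an RBF kernel and grid search over (C, gamma) -- a sketch of
# the setup described above, not the authors' LibSVM configuration.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {"C": [2.0**k for k in range(-5, 16, 2)],
              "gamma": [2.0**k for k in range(-15, 4, 2)]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
grid.fit(X_reduced, y)            # X_reduced, y from the ranking sketch
clf = grid.best_estimator_        # classifier with the selected C and gamma
```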
Evaluation
The proposed system was evaluated by varying four parameter values: (1) the maximum depth of decomposition (D_max); (2) the minimum node support threshold (σ); (3) the minimum edge weight threshold (λ); and (4) the number of features selected (K). All experiments were conducted using ten-fold cross validation: on each iteration, one tenth of the data was used as the test set while the remainder was used as the training set. Comparisons were also made with related work reported in the literature. The authors have identified only four instances of previous work on retinal image AMD classification by other research groups: (1) Chaum et al.11; (2) Barriga et al. (IOVS 2010;51:ARVO E-Abstract 1793); (3) Brandon and Hoover12; and (4) Agurto et al.13
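The ten-fold protocol can be sketched as below, reusing clf from the previous sketch; stratified folds are an assumption here, as the paper does not state whether the folds were stratified by class.

```python
# Ten-fold cross validation sketch: each fold holds out one tenth of the
# images for testing and trains on the remaining nine tenths.
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X_reduced, y, cv=cv, scoring="accuracy")
print(scores.mean())   # average accuracy over the ten folds
```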
Metrics
Three commonly used metrics were used to evaluate performance: sensitivity, specificity, and accuracy. Their corresponding 95% confidence intervals (CIs) were calculated according to the Wilson score method.30 Sensitivity and specificity measure the effectiveness in identifying positive and negative cases, respectively, while accuracy indicates the overall classification performance. Writing TP, TN, FP, and FN for the numbers of true positives, true negatives, false positives, and false negatives, these metrics are defined as: Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP); Accuracy = (TP + TN) / (TP + TN + FP + FN).
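The Wilson score interval used for the reported CIs can be computed with a small helper such as the following; this is the standard formula, not code from the paper.

```python
# Wilson score 95% CI for a proportion (e.g., sensitivity = TP / (TP + FN)).
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# e.g., 159 of 160 AMD images correct (~99.4% sensitivity) -> approx. (0.965, 0.999)
print(wilson_ci(159, 160))
```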
Results
For the experiments on the effect of different parameter combinations (D_max, σ, λ, and K), the results are shown in Tables 1 through 3 for D_max values of 5, 6, and 7, respectively. For each D_max, σ was varied from 10% to 90% in steps of 10 percentage points, and λ from 20% to 80% in steps of 20 percentage points; only results for σ values from 10% to 50% are shown. Tables 1 through 3 show that the SVM classifier produced better results than the Naïve Bayes classifier for all three D_max values. For D_max = 5, the best accuracy using the SVM classifier was 89.3% (sensitivity 92.8%; specificity 83.5%), while for Naïve Bayes it was 76.1% (sensitivity 80.7%; specificity 68.1%). Note that as σ and λ increased, the number of features decreased and, consequently, the accuracy declined for both classifiers. The same trends can be observed for D_max = 6 and 7.
Table 1. Classification Results Using SVM and Naïve Bayes Classifiers with D_max = 5

| minFreq σ, % | minRatio λ, % | K (SVM) | Se (SVM), % | Sp (SVM), % | Acc (SVM), % | K (NB) | Se (NB), % | Sp (NB), % | Acc (NB), % |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 20 | 1000 | 97.0 | 33.7 | 73.4 | 50 | 80.7 | 68.1 | 76.1 |
| 10 | 40 | 200 | 92.8 | 83.5 | 89.3 | 50 | 78.3 | 72.1 | 76.1 |
| 10 | 60 | 200 | 95.8 | 25.6 | 69.7 | 50 | 75.2 | 60.1 | 69.6 |
| 10 | 80 | 50 | 89.9 | 40.7 | 71.6 | 50 | 82.5 | 43.6 | 68.1 |
| 20 | 20 | 200 | 92.8 | 50.8 | 77.2 | 50 | 75.3 | 66.1 | 71.9 |
| 20 | 40 | 200 | 92.8 | 50.8 | 77.2 | 50 | 75.3 | 66.1 | 71.9 |
| 20 | 60 | 200 | 95.8 | 24.6 | 69.3 | 50 | 74.7 | 60.1 | 69.3 |
| 20 | 80 | 50 | 89.9 | 40.7 | 71.6 | 50 | 82.5 | 43.6 | 68.1 |
| 30 | 20 | 200 | 95.8 | 28.6 | 70.8 | 100 | 72.2 | 62.2 | 68.5 |
| 30 | 40 | 200 | 95.8 | 28.6 | 70.8 | 100 | 72.2 | 62.2 | 68.5 |
| 30 | 60 | 200 | 95.8 | 25.6 | 69.7 | 50 | 75.2 | 60.1 | 69.6 |
| 30 | 80 | 50 | 89.9 | 40.7 | 71.6 | 50 | 82.5 | 43.6 | 68.1 |
| 40 | 20 | 50 | 87.8 | 38.8 | 69.5 | 50 | 76.5 | 51.9 | 67.4 |
| 40 | 40 | 50 | 87.8 | 38.8 | 69.5 | 50 | 76.5 | 51.9 | 67.4 |
| 40 | 60 | 50 | 87.8 | 38.8 | 69.5 | 50 | 76.5 | 51.9 | 67.4 |
| 40 | 80 | 50 | 89.9 | 40.7 | 71.6 | 50 | 82.5 | 43.6 | 68.1 |
| 50 | 20 | 100 | 94.0 | 29.7 | 70.0 | 50 | 79.4 | 42.7 | 65.8 |
| 50 | 40 | 100 | 94.0 | 29.7 | 70.0 | 50 | 79.4 | 42.7 | 65.8 |
| 50 | 60 | 100 | 94.0 | 29.7 | 70.0 | 50 | 79.4 | 42.7 | 65.8 |
| 50 | 80 | 50 | 89.9 | 40.7 | 71.6 | 50 | 82.5 | 43.6 | 68.1 |
Table 2. Classification Results Using SVM and Naïve Bayes Classifiers with D_max = 6

| minFreq σ, % | minRatio λ, % | K (SVM) | Se (SVM), % | Sp (SVM), % | Acc (SVM), % | K (NB) | Se (NB), % | Sp (NB), % | Acc (NB), % |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 20 | 1000 | 99.4 | 100.0 | 99.6 | 50 | 80.2 | 73.6 | 77.6 |
| 10 | 40 | 1000 | 98.3 | 96.0 | 97.4 | 50 | 80.1 | 77.2 | 79.0 |
| 10 | 60 | 1000 | 93.4 | 42.7 | 74.6 | 50 | 76.5 | 67.0 | 73.0 |
| 10 | 80 | 200 | 93.3 | 39.9 | 73.5 | 50 | 78.3 | 47.9 | 66.9 |
| 20 | 20 | 1000 | 99.4 | 100.0 | 99.6 | 50 | 79.5 | 72.5 | 76.8 |
| 20 | 40 | 1000 | 99.4 | 100.0 | 99.6 | 50 | 79.5 | 72.5 | 76.8 |
| 20 | 60 | 1000 | 93.4 | 42.7 | 74.6 | 50 | 76.5 | 67.0 | 73.0 |
| 20 | 80 | 200 | 93.3 | 39.9 | 73.5 | 50 | 78.3 | 47.9 | 66.9 |
| 30 | 20 | 1000 | 92.8 | 49.9 | 76.9 | 100 | 74.1 | 66.0 | 71.1 |
| 30 | 40 | 1000 | 92.8 | 49.9 | 76.9 | 100 | 74.1 | 66.0 | 71.1 |
| 30 | 60 | 1000 | 93.4 | 42.7 | 74.6 | 50 | 76.5 | 67.0 | 73.0 |
| 30 | 80 | 200 | 93.3 | 39.9 | 73.5 | 50 | 78.3 | 47.9 | 66.9 |
| 40 | 20 | 400 | 95.3 | 55.2 | 80.3 | 50 | 75.5 | 57.7 | 68.9 |
| 40 | 40 | 400 | 95.3 | 55.2 | 80.3 | 50 | 75.5 | 57.7 | 68.9 |
| 40 | 60 | 400 | 95.3 | 55.2 | 80.3 | 50 | 75.5 | 57.7 | 68.9 |
| 40 | 80 | 200 | 93.3 | 39.9 | 73.5 | 50 | 78.3 | 47.9 | 66.9 |
| 50 | 20 | 200 | 92.3 | 54.9 | 78.4 | 100 | 78.4 | 53.9 | 69.3 |
| 50 | 40 | 200 | 92.3 | 54.9 | 78.4 | 100 | 78.4 | 53.9 | 69.3 |
| 50 | 60 | 200 | 92.3 | 54.9 | 78.4 | 100 | 78.4 | 53.9 | 69.3 |
| 50 | 80 | 200 | 93.3 | 39.9 | 73.5 | 50 | 78.3 | 47.9 | 66.9 |
Table 3. Classification Results Using SVM and Naïve Bayes Classifiers with D_max = 7

| minFreq σ, % | minRatio λ, % | K (SVM) | Se (SVM), % | Sp (SVM), % | Acc (SVM), % | K (NB) | Se (NB), % | Sp (NB), % | Acc (NB), % |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 20 | 4000 | 99.4 | 100.0 | 99.6 | 1000 | 79.5 | 77.5 | 78.7 |
| 10 | 40 | 4000 | 99.4 | 100.0 | 99.6 | 1000 | 78.3 | 77.3 | 77.9 |
| 10 | 60 | 1000 | 99.4 | 100.0 | 99.6 | 100 | 75.8 | 74.5 | 75.4 |
| 10 | 80 | 1000 | 95.8 | 21.5 | 68.1 | 50 | 81.4 | 58.2 | 72.8 |
| 20 | 20 | 1000 | 99.4 | 100.0 | 99.6 | 1000 | 77.7 | 77.5 | 77.6 |
| 20 | 40 | 1000 | 99.4 | 100.0 | 99.6 | 1000 | 77.7 | 77.5 | 77.6 |
| 20 | 60 | 1000 | 99.4 | 100.0 | 99.6 | 100 | 75.8 | 74.5 | 75.4 |
| 20 | 80 | 1000 | 95.8 | 21.5 | 68.1 | 50 | 81.4 | 58.2 | 72.8 |
| 30 | 20 | 4000 | 95.3 | 75.6 | 87.9 | 50 | 78.2 | 73.3 | 76.4 |
| 30 | 40 | 4000 | 95.3 | 75.6 | 87.9 | 50 | 78.2 | 73.3 | 76.4 |
| 30 | 60 | 1000 | 99.4 | 100.0 | 99.6 | 100 | 75.8 | 74.5 | 75.4 |
| 30 | 80 | 1000 | 95.8 | 21.5 | 68.1 | 50 | 81.4 | 58.2 | 72.8 |
| 40 | 20 | 1000 | 97.6 | 97.8 | 97.7 | 50 | 75.9 | 70.2 | 73.8 |
| 40 | 40 | 1000 | 97.6 | 97.8 | 97.7 | 50 | 75.9 | 70.2 | 73.8 |
| 40 | 60 | 1000 | 97.6 | 97.8 | 97.7 | 50 | 75.9 | 70.2 | 73.8 |
| 40 | 80 | 1000 | 95.8 | 21.5 | 68.1 | 50 | 81.4 | 58.2 | 72.8 |
| 50 | 20 | 1000 | 95.3 | 51.0 | 78.7 | 100 | 81.4 | 61.2 | 73.9 |
| 50 | 40 | 1000 | 95.3 | 51.0 | 78.7 | 100 | 81.4 | 61.2 | 73.9 |
| 50 | 60 | 1000 | 95.3 | 51.0 | 78.7 | 100 | 81.4 | 61.2 | 73.9 |
| 50 | 80 | 1000 | 95.8 | 21.5 | 68.1 | 50 | 81.4 | 58.2 | 72.8 |
Overall, for the Naïve Bayes classifiers a best accuracy of 79.0% was achieved with D_max = 6, σ = 10%, and λ = 40%. For the SVM classifiers, a best accuracy of 99.6% was observed with D_max = 6 and 7. These results all occurred when σ = 10% or 20% while λ varied from 20% to 60%. The associated sensitivity was 99.4% (95% CI, 96.6%–99.9%) and the specificity was 100% (95% CI, 96.2%–100%). A subanalysis showed that, with the SVM approach, the sensitivity in detecting early, intermediate, and advanced AMD was 100% (95% CI, 78.5%–100%), 96.6% (95% CI, 82.8%–99.4%), and 100% (95% CI, 96.8%–100%), respectively.
Discussion
This study was a proof of concept demonstrating the feasibility of image-mining-based classification for automated AMD disease/no-disease grading. We employed two classifier generation techniques, SVM and Naïve Bayes. Our experiments, using two public retinal image databases, produced highly accurate results using SVM classification, with a best accuracy of 99.6% (sensitivity 99.4%; specificity 100%). Our SVM approach also showed promising results in detecting early, intermediate, and advanced AMD. The only misclassification, of an intermediate AMD image, was due to its very poor quality: almost half of the image was black. This implies that our technique would not have missed any patients needing urgent care.
Our comparative study demonstrated clearly that the proposed framework outperforms previous work (Barriga ES, et al. IOVS 2010;51:ARVO E-Abstract 1793).11–13 Our results, and those previously reported in the literature, are presented in Table 4. In this comparison, both the SVM and the Naïve Bayes classifiers were tested with σ = 10%, λ = 20%, and D_max = 7. The SVM approach yielded better results than the Naïve Bayes classifier and the previous approaches in terms of sensitivity, specificity, and accuracy. Brandon and Hoover12 used the STARE dataset, while the others used datasets that are not publicly available. Table 4 contains some missing values because these were not reported in the literature and could not be derived by the authors. The results recorded by Barriga et al. (IOVS 2010;51:ARVO E-Abstract 1793) included only sensitivity (75%) and specificity (50%). The method of Chaum et al.11 was applied in a multiclass setting, and hence only accuracy (88%) was reported; in their evaluation, 12 AMD images were classified as unknown and excluded from the accuracy calculation, and if these were counted as misclassifications the accuracy would drop to 75%. Brandon and Hoover12 reported only the accuracy (90%) and specificity (89%); however, we were able to calculate the sensitivity (90%). Their evaluation addressed not only AMD screening (AMD versus non-AMD) but also the grade (severity) of the detected AMD; to obtain an overall sensitivity value we summed the total number of AMD images (irrespective of grade) and counted how many were correctly classified. The most recent work, by Agurto et al.,13 reported detection of AMD with sensitivities of 94% and 90%, and specificities of 50% and 50%, for two databases, respectively; the accuracies can be derived from the reported sensitivities and specificities, and were lower than 80%. In contrast to these approaches, our new system achieves similar sensitivity but substantially higher specificity. In clinical practice this improvement would avoid many unnecessary referrals due to false alarms. Evidence from research on automated disease/no-disease grading of diabetic retinopathy (DR) has shown that the introduction of such systems, even with specificity as low as approximately 50%, can still be cost effective and reduce the overall workload.31 Our technique provides comparable sensitivity and much higher specificity; as such, it represents a considerable advance. Moreover, our technique has the potential to provide patients with results at the point of service. It would operate without intraobserver and interobserver grading variability, grader fatigue, or the need for the regular training and certification required of the human graders employed in manual grading programs.
Table 4. Comparison of Results of Our Proposed Approaches with Those from Previous Methods

| Approach | Dataset Size | Sensitivity, % | Specificity, % | Accuracy, % |
|---|---|---|---|---|
| Brandon and Hoover12 | 97 | 90 | 89 | 90 |
| Chaum et al.11 | 395 | N/A | N/A | 88 |
| Barriga et al.* | 100 | 75 | 50 | N/A |
| Agurto et al.13 | 392 (Rist database) | 90 | 60 | 79 |
| Agurto et al.13 | 392 (Rist database) | 94 | 50 | 78 |
| Agurto et al.13 | 395 (UTHSCSA database) | 90 | 60 | 76 |
| Agurto et al.13 | 395 (UTHSCSA database) | 90 | 50 | 76 |
| Proposed Bayes approach | 258 | 79.5 | 77.5 | 78.7 |
| Proposed SVM approach | 258 | 99.4 | 100 | 99.6 |

*Barriga ES, et al. IOVS 2010;51:ARVO E-Abstract 1793.
Due to the nature of medical imaging research, all the studies discussed here, including ours, used relatively small numbers of images (<500), which may not be representative of the population to be screened for such a challenging problem. As presented above, the widths of the 95% CIs for the detection of early and intermediate AMD are larger than 10%, which implies that a larger sample is needed to narrow them. This limitation suggests that the proposed technique should be further validated in large-scale studies before it can be introduced into clinical practice. The sample size of such studies must be carefully considered in order to establish the scalability and generality of the proposed technique and to estimate the expected sensitivity precisely. According to Buderer,32 the required sample size depends on disease prevalence, expected sensitivity and specificity, and the desired width of the CI. For instance, if the prevalence of any AMD is approximately 10% in the screened population, and the expected sensitivity and specificity are ≥90% and ≥95%, respectively, a minimum sample size of approximately 350 is required to confirm sensitivity greater than 90% when the width of the 95% CI is 5%. If the prevalence is 1% and all other requirements are unchanged, the required sample size becomes approximately 3500. The latter case reflects the substantially larger samples needed to validate the program for subgroups of AMD (e.g., advanced AMD). However, the above figures are not conclusive: the actual sample size required in any future study will be determined by the specific application and its performance requirements (e.g., for the same level of performance, a validation study for screening people aged over 65 years would need a smaller sample than one for screening people aged over 50 years). Another important factor for any future validation study is how to establish the reference standard for grading, which is crucial for training and validating the automated grading system. To this end, we believe the strategies developed for automated DR grading can be readily adapted. In addition, some components require further development before the current system can become a stand-alone automated grading tool. For example, image quality is an important factor in the detection of lesions and subsequent diagnosis; an automated image quality assessment mechanism is therefore desirable. It is also expected that further development will make it possible to automatically assess disease severity scales.
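For reference, the sample-size calculation referred to above is commonly stated as follows (our summary of Buderer,32 with $\widehat{Se}$ the expected sensitivity, $L$ the desired half-width of the 95% CI, and $P$ the prevalence):

$$n_{\text{sens}} = \frac{z_{1-\alpha/2}^{2}\,\widehat{Se}\,(1 - \widehat{Se})}{L^{2}}, \qquad N = \frac{n_{\text{sens}}}{P}.$$

Because $N$ scales inversely with $P$, reducing the prevalence from 10% to 1% multiplies the required sample size by ten, which accounts for the increase from approximately 350 to 3500 above.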
In our research we also noted that, owing to the nature of image-mining techniques, the image representation used for classification is no longer interpretable by human observers. With respect to acceptance and practical use, it would be desirable for the model to also be clinically interpretable; this might give clinicians a better way to interpret fundus photographs and allow them to focus on spatial patterns, and it has become a research topic in itself. On the other hand, we would argue that the most important property of a prediction system such as ours is its ability to make correct predictions: no system will be clinically useful if it is easy to interpret but performs badly. As described above, our technique involves graph-mining and feature-selection processes in the classifier training phase, which may require substantial computing and storage resources when dealing with large datasets. This is a potential weakness of our technique; however, given current advances in computing, we do not expect it to be a key issue for scalability or performance.
Over the past two decades, newly emerging imaging techniques, such as fundus autofluorescence (FAF) and optical coherence tomography (OCT), have become available and show potential for AMD screening. FAF imaging is a noninvasive technique that allows assessment of the integrity of the retinal pigment epithelium.33 Although it has demonstrated potential for analyzing the distribution patterns of drusen, quantifying geographic atrophy, and predicting the development of AMD, extensive work is needed to investigate its use for AMD screening. The advent of OCT has revolutionized the diagnosis and treatment of retinal disease.34 OCT is a noncontact, noninvasive, high-resolution imaging technique that produces cross-sectional images of the retina in almost real time and, more importantly, permits further quantitative analysis of retinal features.35–38 It is extensively used in guidelines for the follow-up and retreatment of patients with AMD,39 and it appears to be a very promising technique to support AMD screening. However, OCT imaging may not show hemorrhage, and it may miss some abnormalities because of the large gap (undetected region) between adjacent B-scans. Cost effectiveness may also be an issue, as OCT devices are much more expensive than standard color fundus cameras. It should also be noted that the current AMD severity scale was developed and validated as part of a large-scale study (AREDS) using color fundus photography4; effort would be needed to investigate how this scale maps between the emerging techniques and color fundus photography. We therefore believe that FAF and OCT will help further establish the clinical validation of AMD screening, but may not be feasible for AMD screening on their own. A combination of diagnostic imaging techniques such as OCT, FAF, and color fundus photography may be the optimal solution for future automated screening. Whatever the case, our technique is a generic approach that can be extended to any of the above.
Although the proposed approach has confirmed the technical feasibility of an automated AMD grading system, to the best of our knowledge no such programs currently exist. As suggested by Karnon et al.,40 the major concern for AMD screening is the significant uncertainty about its cost effectiveness, although annual screening from age 60 years onward appeared beneficial at the time of their study. We note that this conclusion was reached without considering the potential benefit of automated grading systems: at that time, automated grading was at an early proof-of-concept stage and insufficient detail was available for evaluation. Lessons and experience accumulated in DR screening, and in particular the recent development of automated grading, can provide more insight into best practice. As an example, although specificity as high as ours has not been achieved in automated disease/no-disease DR grading, models that combine automated and manual grading have demonstrated cost effectiveness and a reduced overall workload.31 Had an automated AMD grading system with similar performance been available, the cost effectiveness of AMD screening would have been much improved compared with that observed in 2008.40 Together with other advances in therapeutic treatment, this would lend more weight to the case for AMD screening. Certainly, introducing a new screening system is a complicated process, not only because of the need to satisfy well-established screening criteria41 but also because of various political, economic, and ethical hurdles.42 An alternative use for an automated AMD screening system of the form proposed here is as a “second opinion” generator.31 We envisage that our approach has great potential for these activities and lays a foundation for future research and the implementation of an automated screening system. In addition, the principles and methodology proposed here may be adapted for the accurate analysis of disease progression, which is important for monitoring disease development and timely treatment.
In conclusion, this study demonstrated a powerful image-mining-based technique for automated AMD grading whose superior performance warrants further development to translate it into clinical practice as an automated AMD grading tool.
Acknowledgments
The authors thank the Ministry of Higher Education Malaysia for their financial support, and the Foundation for the Prevention of Blindness for their support. 
Appendix A
Let G = {g_1, g_2, ..., g_n} be the n graphs representing an image dataset of n images (one graph per image). In the context of the WFSG mining algorithm used,24 each node has a weight defined by the average color intensity of the region (quadrant) it represents. A weight is also assigned to each edge; edge weights are defined as the difference between the average intensity of the child node and that of its parent. The algorithm extracts frequent subtrees (image features) for classification purposes. More specifically, a subgraph sg is considered frequent (important) if it satisfies two conditions: (1) N_wr(sg) × sup(sg) ≥ σ, and (2) E_wr(sg) ≥ λ, where sup(sg) is the support (relative frequency) of sg, σ is the minimum node-support threshold, and λ is the minimum edge-weight threshold. Writing Δ_sg for the set of graphs in which sg occurs, |Δ_sg| for the number of such graphs, |G| for the number of graphs in the dataset G, and w_node(g) and w_edge(g) for the average node and edge weights of a graph g, the support and the node and edge weightings are: sup(sg) = |Δ_sg| / |G|; N_wr(sg) = Σ_{g ∈ Δ_sg} w_node(g) / Σ_{g ∈ G} w_node(g); E_wr(sg) = Σ_{g ∈ Δ_sg} w_edge(g) / Σ_{g ∈ G} w_edge(g). For full details, interested readers should refer to Jiang and Coenen.24
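Under the reconstruction above, the frequency test can be sketched as follows; this is our illustration of the thresholding, not the published WFSG mining implementation.

```python
# Weighted-support test for a candidate subgraph sg (sketch).
def is_frequent(sg_occurrences, graphs, w_node, w_edge, sigma, lam):
    """sg_occurrences: ids of graphs containing sg; graphs: all graph ids;
    w_node/w_edge: maps from graph id to average node/edge weight."""
    sup = len(sg_occurrences) / len(graphs)                       # sup(sg)
    n_wr = (sum(w_node[g] for g in sg_occurrences)
            / sum(w_node[g] for g in graphs))                     # N_wr(sg)
    e_wr = (sum(w_edge[g] for g in sg_occurrences)
            / sum(w_edge[g] for g in graphs))                     # E_wr(sg)
    return n_wr * sup >= sigma and e_wr >= lam                    # both conditions
```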
The output of the WFSG mining algorithm is a set of weighted frequent subtrees (WFSTs). To allow the application of existing classification algorithms, feature vectors were built from the identified WFSTs. The set of WFSTs was first used to define a feature space; each image was then represented by a single feature vector indicating which of the WFSTs it contains. In this manner the input set was translated into a two-dimensional, binary-valued table of size n × k, where the number of rows, n, is the number of images and k is the number of identified WFSTs. An additional class label column was added for the training data.
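The construction of the binary table can be sketched as follows; the names here are illustrative, not from the authors' code.

```python
# Build the n x k binary feature table from the identified frequent subtrees.
import numpy as np

def build_feature_table(image_subtrees, feature_space):
    """image_subtrees: list of sets of subtree ids, one set per image;
    feature_space: ordered list of all identified WFST ids."""
    index = {f: j for j, f in enumerate(feature_space)}
    table = np.zeros((len(image_subtrees), len(feature_space)), dtype=np.int8)
    for i, subtrees in enumerate(image_subtrees):
        for f in subtrees:
            table[i, index[f]] = 1   # 1 if image i contains subtree f
    return table
```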
References
1. Pascolini D, Mariotti S, Pokharel G. 2002 global update of available data on visual impairment: a compilation of population-based prevalence studies. Ophthalmic Epidemiol. 2004;11:67–115.
2. Rein D, Wittenborn J, Zhang X. Forecasting age-related macular degeneration through the year 2050. Arch Ophthalmol. 2009;127:533–540.
3. Lamoureux EL, Mitchell P, Rees G. Impact of early and late age-related macular degeneration on vision-specific functioning. Br J Ophthalmol. 2011;95:666–670.
4. Age-Related Eye Disease Study Research Group. The age-related eye disease study system for classifying age-related macular degeneration from stereoscopic color fundus photographs: AREDS report no. 6. Am J Ophthalmol. 2001;132:668–681.
5. Davis MD, Gangnon RE, Lee LY. The age-related eye disease study severity scale for age-related macular degeneration: AREDS report no. 17. Arch Ophthalmol. 2005;123:1484–1498.
6. Sbeh ZB, Cohen LD, Mimoun G, Coscas G. A new approach of geodesic reconstruction for drusen segmentation in eye fundus images. IEEE Trans Med Imaging. 2001;20:1321–1333.
7. Rapantzikos K, Zervakis M, Balas K. Detection and segmentation of drusen deposits on human retina: potential in the diagnosis of age-related macular degeneration. Med Image Anal. 2003;7:95–108.
8. Kose C, Sevik U, Gencalioglu O. Automatic segmentation of age-related macular degeneration in retinal fundus images. Comput Biol Med. 2008;38:611–619.
9. Kose C, Sevik U, Gencalioglu O. A statistical segmentation method for measuring age-related macular degeneration in retinal fundus images. J Med Syst. 2008;34:1–13.
10. Barriga ES, Murray V, Agurto C. Multi-scale AM-FM for lesion phenotyping on age-related macular degeneration. In: Proceedings of the Twenty-second IEEE International Symposium on Computer-Based Medical Systems. Albuquerque, NM; August 3–4, 2009:1–5.
11. Chaum E, Karnowski TP, Govindasamy VP, Abdelrahman M, Tobin KW. Automated diagnosis of retinopathy by content-based image retrieval. Retina. 2008;28:1463–1477.
12. Brandon L, Hoover A. Drusen detection in a retinal image using multi-level analysis. In: Ellis RE, Peters TM, eds. Lecture Notes in Computer Science (Proc MICCAI 2003). Montréal, Canada: Springer-Verlag; 2003:618–625.
13. Agurto C, Barriga ES, Murray V. Automatic detection of diabetic retinopathy and age-related macular degeneration in digital fundus images. Invest Ophthalmol Vis Sci. 2011;52:5862–5871.
14. Elsayed A, Coenen F, Jiang C, Garcia-Finana M, Sluming V. Corpus callosum MR image classification. Knowledge-Based Systems. 2010;23:330–336.
15. Hijazi MHA, Coenen F, Zheng Y. Data mining techniques for the screening of age-related macular degeneration. Knowledge-Based Systems. 2011;29:83–92.
16. ter Haar F. Automatic Localization of the Optic Disc in Digital Colour Images of the Human Retina [thesis]. Utrecht, The Netherlands: Utrecht University; 2005.
17. Gonzalez RC, Woods RE. Digital Image Processing. 3rd ed. Harlow, UK: Pearson Prentice Hall; 2008.
18. Foracchia M, Grisan E, Ruggeri A. Luminosity and contrast normalization in retinal images. Med Image Anal. 2005;9:179–190.
19. Zuiderveld K. Contrast limited adaptive histogram equalization. In: Heckbert PS, ed. Graphics Gems IV. San Diego, CA: Academic Press Professional, Inc.; 1994:474–485.
20. Hijazi MHA, Coenen F, Zheng Y. Image classification using histograms and time series analysis: a study of age-related macular degeneration screening in retina image data. In: Proceedings of the 10th Industrial Conference on Data Mining. Berlin, Germany; 2010:197–209.
21. Soares JVB, Leandro JJG, Cesar RM Jr, Jelinek HF, Cree MJ. Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans Med Imaging. 2006;25:1214–1222.
22. Hijazi MHA, Coenen F, Zheng Y. Retinal image classification using a histogram based approach. In: Proceedings of the International Joint Conference on Neural Networks 2010 (World Congress on Computational Intelligence 2010). 2010:3501–3507.
23. Samet H. The quadtree and related hierarchical data structures. ACM Computing Surveys. 1984;16:187–260.
24. Jiang C, Coenen F. Graph-based image classification by weighting scheme. In: Allen T, Ellis R, Petridis M, eds. Applications and Innovations in Intelligent Systems XVI (Proc AI-2008). London, UK: Springer; 2009:63–76.
25. Chang YW, Lin CJ. Feature ranking using linear SVM. JMLR: Workshop and Conference Proceedings. 2008:53–64.
26. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ. LIBLINEAR: a library for large linear classification. J Mach Learn Res. 2008;9:1871–1874.
27. Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2:Article 27. doi:10.1145/1961189.1961199.
28. Domingos P, Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning. 1997;29:103–130.
29. Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. San Francisco, CA: Morgan Kaufmann Publishers; 2005.
30. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med. 1998;17:857–872.
31. Fleming AD, Philip S, Goatman KA, Prescott GJ, Sharp PF, Olson JA. The evidence for automated grading in diabetic retinopathy screening. Curr Diabetes Rev. 2011;7:246–252.
32. Buderer NMF. Statistical methodology: I. Incorporating the prevalence of disease into the sample size calculation for sensitivity and specificity. Acad Emerg Med. 1996;3:895–900.
33. Holz FG, Schmitz-Valckenberg S, Spaide RF, Bird AC, eds. Atlas of Fundus Autofluorescence Imaging. 1st ed. Berlin, Germany: Springer; 2007.
34. Huang D, Swanson EA, Lin CP. Optical coherence tomography. Science. 1991;254:1178–1181.
35. Chiu SJ, Li XT, Nicholas P, Toth CA, Izatt JA, Farsiu S. Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation. Opt Express. 2010;18:19413–19428.
36. Jain N, Farsiu S, Khanifar AA. Quantitative comparison of drusen segmented on SD-OCT versus drusen delineated on color fundus photographs. Invest Ophthalmol Vis Sci. 2010;51:4875–4883.
37. Gregori G, Wang F, Rosenfeld PJ. Spectral domain optical coherence tomography imaging of drusen in nonexudative age-related macular degeneration. Ophthalmology. 2011;118:1373–1379.
38. Chiu SJ, Izatt JA, O'Connell RV, Winter KP, Toth CA, Farsiu S. Validated automatic segmentation of AMD pathology including drusen and geographic atrophy in SD-OCT images. Invest Ophthalmol Vis Sci. 2012;53:53–61.
39. Chakravarthy U, Harding SP, Rogers CA. Ranibizumab versus bevacizumab to treat neovascular age-related macular degeneration: one-year findings from the IVAN randomized trial. Ophthalmology. 2012;119:1399–1411.
40. Karnon J, Czoski-Murray C, Smith K. A preliminary model-based assessment of the cost-utility of a screening programme for early age-related macular degeneration. Health Technol Assess. 2008;12:iii–iv, ix–124.
41. Wilson JMG, Jungner G. Principles and Practice of Screening for Disease. Public Health Paper No. 34. Geneva, Switzerland: World Health Organization; 1968:26–39.
42. Abramoff MD, Niemeijer M, Russell SR. Automated detection of diabetic retinopathy: barriers to translation into clinical practice. Expert Rev Med Devices. 2010;7:287–296.
Footnotes
 Supported by the Ministry of Higher Education Malaysia (MHAH) and Foundation for the Prevention of Blindness (YZ). The authors alone are responsible for the content and writing of the paper.
Footnotes
 Disclosure: Y. Zheng, None; M.H.A. Hijazi, None; F. Coenen, None