Retina  |   October 2011
Automated Assessment of Diabetic Retinopathy Severity Using Content-Based Image Retrieval in Multimodal Fundus Photographs
Author Affiliations & Notes
  • Gwénolé Quellec
    From Telecom Bretagne and
  • Mathieu Lamard
    University of Bretagne Occidentale, LaTIM (Laboratoire Traitement de l'Information Médicale), Brest, France; and
  • Guy Cazuguel
    From Telecom Bretagne and
  • Lynda Bekri
    Service d'Ophtalmologie, Centre Hospitalier Universitaire (CHU), Brest, France.
  • Wissam Daccache
    Service d'Ophtalmologie, Centre Hospitalier Universitaire (CHU), Brest, France.
  • Christian Roux
    From Telecom Bretagne and
  • Béatrice Cochener
    Service d'Ophtalmologie, Centre Hospitalier Universitaire (CHU), Brest, France.
  • Corresponding author: Gwénolé Quellec, Laboratoire de Traitement de l'Information Médicale, Bâtiment 2bis (I3S), CHU Morvan, 5, avenue Foch 29609, Brest CEDEX, France; gwenole.quellec@telecom-bretagne.eu
Investigative Ophthalmology & Visual Science October 2011, Vol.52, 8342-8348. doi:10.1167/iovs.11-7418
Abstract

Purpose: Recent studies on diabetic retinopathy (DR) screening in fundus photographs suggest that disagreements between algorithms and clinicians are now comparable to disagreements among clinicians. The purpose of this study is to (1) determine whether this observation also holds for automated DR severity assessment algorithms and (2) demonstrate the value of such algorithms in clinical practice.

Methods: A dataset of 85 consecutive DR examinations (168 eyes, 1176 multimodal eye fundus photographs) was collected at Brest University Hospital (Brest, France). Two clinicians with different experience levels determined DR severity in each eye, according to the International Clinical Diabetic Retinopathy Disease Severity (ICDRS) scale. Based on Cohen's kappa (κ) measurements, the performance of clinicians at assessing DR severity was compared to the performance of state-of-the-art content-based image retrieval (CBIR) algorithms from our group.

Results: In assessing DR severity at patient level, intraobserver agreement was κ = 0.769 for the most experienced clinician. Interobserver agreement between the clinicians was κ = 0.526. Interobserver agreement between the most experienced clinician and the most advanced algorithm was κ = 0.592. Moreover, the most advanced algorithm was often able to predict agreements and disagreements between the clinicians.

Conclusions: Automated DR severity assessment algorithms, trained to imitate experienced clinicians, can be used to predict when young clinicians would agree or disagree with their more experienced colleagues. Such algorithms may thus be used in clinical practice to help validate or invalidate their diagnoses. CBIR algorithms, in particular, may also be used to pool diagnostic knowledge among peers, with applications in training and in coordinating clinicians' prescriptions.

Diabetic retinopathy (DR) is the leading cause of blindness in the working population of the United States and the European Union. 1,2 Detecting and monitoring DR in the at-risk population (diabetics), generally using eye fundus photography, is crucial for providing timely treatment of DR and therefore preventing visual loss. 3 With the likelihood of an increase in the at-risk population, 4 many computer image analysis algorithms have been proposed in the literature to analyze fundus photographs automatically. 5,6 Recently, several image-analysis groups compared the performance of their image analysis algorithms with the performance of clinicians in detecting DR in large-scale screening programs. 7–10 Disagreements between algorithms and clinicians were found to be comparable to disagreements among clinicians. 9
Screening DR mainly involves detecting microaneurysms, usually the first signs of DR to appear, although detecting hemorrhages and exudates has also been proven useful. 11 In comparison, grading (i.e., monitoring) DR is much more complicated: according to the International Clinical Diabetic Retinopathy Disease Severity (ICDRS) scale, it involves detecting additional types of lesions (e.g., neovascularizations and intraretinal microvascular abnormalities) with greater variability in scale and shape. 12 As a consequence, little work has been done so far to automate DR grading, in comparison with DR screening. However, several algorithms of increasing complexity have recently been proposed by our group for automated DR grading using fundus photography, 13,14 together with demographic data in the most advanced algorithms. 15,16 These algorithms are all based on the content-based image retrieval (CBIR) paradigm, which is explained hereafter. 17,18 In CBIR, the content of images is characterized with low-level features (e.g., color, texture, and local shape), and these features are mapped to concepts (e.g., DR severity) using machine learning; in particular, it is not necessary to develop a dedicated algorithm for each lesion type. Once a query image has been automatically characterized, the most similar images, together with their medical interpretations, are searched for in a reference database. These most similar images are then used to infer an automated diagnosis for the query image. Note that a CBIR approach, relying on dedicated algorithms and limited manual inputs, has also been proposed by another image analysis group. 19
The purpose of this study was twofold. First, we sought to discover whether, as in a DR screening context, disagreements between automated DR grading algorithms and clinicians are comparable to disagreements among clinicians. For this purpose, a dataset of 85 consecutive DR examinations (1176 images) was graded, at eye and patient level, by two clinicians with different experience levels. The dataset was also automatically graded by three (one novel and two recently published) algorithms from our group. The first two algorithms analyze images independently to make a decision; the second, which is novel, allows additional flexibility in the way images are characterized. The third algorithm uses both image characterizations and demographic data to make a decision; note that both image characterizations and demographic data are part of the DR monitoring protocol. Second, the possible uses of automated DR-grading algorithms in a clinical context were examined. The place of automated DR screening algorithms in clinical practice has been widely discussed 7,9 : these algorithms may be used for triage, since physicians are unable to screen the entire at-risk population. The place of automated DR grading algorithms, however, has not. The specific advantages of CBIR in this context are emphasized.
Methods
Diabetic Retinopathy Dataset (DRD)
A dataset of 85 consecutive DR follow-up examinations was collected at Brest University Hospital (Brest, France) from June 2003 through September 2007. All patients had known diabetes at the time of their examinations. Demographic data were collected during each examination (e.g., sex of the patient, age of the patient at the time of the examination, and personal and family medical history). Because two patients contributed only one eye to the study, 168 eyes were photographed. Photographs were obtained by retinal photographers, using a retinal digital camera (Topcon TRC-50IA; Topcon, Tokyo, Japan) connected to a computer. Seven photographs were obtained per eye: one red-free photograph, one blue-filtered photograph, and five fluorescein angiographs (Fig. 1). After fluorescein injection, a time sequence of photographs was obtained: one early angiograph, three intermediate angiographs (central, temporal, and nasal fields), and one late angiograph. Overall, 1176 photographs were obtained. These images have a resolution of 1280 × 1008 pixels; they are stored in either TIFF or JPEG format.
Figure 1.
 
Set of multimodal photographs from one eye diagnosed with mild nonproliferative DR. (a) Red-free photograph. (b) Blue-filtered photograph. (c) Early angiograph. (d) Intermediate angiograph (central retina). (e) Intermediate angiograph (temporal retina). (f) Intermediate angiograph (nasal retina). (g) Late angiograph.
The study protocol adhered to the tenets of the Declaration of Helsinki. The Institutional Review Board of Brest University Hospital approved the study protocol, and because only deidentified data were used, a waiver of written informed consent was granted. 
Human Expert Standard
Two clinicians were involved in this study: one with 7 years' (Clinician1) and the other with 2 years' (Clinician2) experience. Each clinician was asked to grade disease severity in all 168 eyes according to a modified ICDRS scale. This scale consists of the five ICDRS levels: 0, no apparent DR; 1, mild nonproliferative DR; 2, moderate nonproliferative DR; 3, severe nonproliferative DR; and 4, proliferative DR, 12 as well as an additional level (5, treated DR). The clinicians interpreted the 168 eyes in randomized order. When interpreting one eye, the clinicians had access to all seven photographs, as well as the available demographic data, but they were masked to all photographs and interpretations from the contralateral eye. 
Two months later, Clinician1 interpreted the dataset a second time. Therefore, three interpretations are available at eye level: Clinician1 contributed EyeGrades1a and EyeGrades1b; Clinician2 contributed EyeGrades2.
Based on each of these three interpretations, disease severity was also determined (automatically) at patient level: when a patient contributed two eyes to the study, disease severity at patient level was defined as the maximum disease severity at eye level between those two eyes. Therefore, three interpretations are also available at patient level: PatientGrades1a, PatientGrades1b, and PatientGrades2.
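As a minimal illustration of this rule (the function name is ours, not from the article), the patient-level grade is simply the maximum of the available eye-level grades:

```python
# Hypothetical helper illustrating the rule above: patient-level DR severity is
# the maximum eye-level severity among the patient's graded eyes (ICDRS 0-5).
def patient_grade(eye_grades):
    """eye_grades: list of eye-level ICDRS grades for one patient (1 or 2 eyes)."""
    return max(eye_grades)
```

A patient graded 2 (moderate nonproliferative DR) in one eye and 3 (severe nonproliferative DR) in the other is thus graded 3 at patient level; a single-eye patient keeps that eye's grade.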
Training and Testing Subsets
The proposed algorithms rely on the machine learning paradigm, so examples are necessary for training. For this purpose, DRD was divided into two subsets (A and B) with equal distributions of sex, diabetes type, and DR severity. All eyes from the same patient were assigned to the same subset. Apart from these conditions, DRD was divided randomly between the two subsets.
EyeGrades1a and PatientGrades1a were used as the reference standard for algorithm supervision at eye level and patient level, respectively. Performance was assessed by two-fold cross-validation. At first, subset A was used for training (i.e., tuning the algorithms), and subset B was used for testing (i.e., comparing the outputs of the algorithms with the reference standard). Then, subset B was used for training, and subset A was used for testing. 
Automated DR Severity Assessment Using CBIR
To automatically grade disease severity in a query eye (or patient) Q, the following procedure was applied:
  1. The digital content of each image I_q in Q and of each image I_ts in the training subset was automatically characterized by a feature vector.
  2. The distance between the characterization of I_q and that of each image I_ts in the training subset was computed.
  3. The k nearest neighbors of I_q within the training subset were sought, with respect to the distance measure in step 2.
  4. The most frequent diagnosis among the nearest neighbors of every image I_q in Q (according to the reference standard) was assigned to Q.
The first three steps are the usual steps of a CBIR system. 17 Should a clinician using the system disagree with the proposed automated diagnosis (step 4), the nearest neighbors can be displayed and used by the clinician to revise the diagnosis. 
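The four steps above can be sketched as follows; this is a minimal illustration in which a plain Euclidean distance stands in for the adapted wavelet-based distance of the Appendix, and all names are ours, not from the article:

```python
import numpy as np
from collections import Counter

def cbir_diagnose(query_features, train_features, train_grades, k):
    """Steps 1-4 above. query_features: feature vectors of the images in the
    query eye/patient Q; train_features, train_grades: characterizations and
    reference-standard grades of the training-subset images."""
    votes = []
    for fq in query_features:
        # Step 2: distance from this query image to every training image.
        dists = np.linalg.norm(np.asarray(train_features) - fq, axis=1)
        # Step 3: indices of the k nearest neighbors.
        neighbors = np.argsort(dists)[:k]
        votes.extend(train_grades[i] for i in neighbors)
    # Step 4: most frequent diagnosis among all neighbors of all images in Q.
    return Counter(votes).most_common(1)[0][0]
```

Displaying the images behind `neighbors`, together with their interpretations, is what would let a clinician review the automated diagnosis, as described above.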
The use of the wavelet transform 20 has been proposed in previous works 13,14 to characterize the digital content of images, and the superiority of this methodology over several alternatives has been shown. 13,14 This approach was improved further in the present study, in that a novel set of wavelet filters was introduced. Two wavelet adaptation algorithms of increasing complexity, referred to as the Global and Local algorithms, are presented in the Appendix. 
A third algorithm, referred to as the Fusion algorithm, was evaluated: steps 1 and 2 were based on local wavelet adaptation and steps 3 and 4 were improved, as explained hereafter. A recently published information fusion algorithm from our group, based on Bayesian networks and the Dezert-Smarandache theory, 15 was used to combine the characterizations of all images from a query eye (or patient), as well as demographic data, to find the k most similar eyes (or patients) in the training subset. The most frequent diagnosis among these k nearest neighbors (with respect to the reference standard) was used as the automated diagnosis for the query eye (or patient). 
A fourth algorithm, referred to as the NoAngiography algorithm, was evaluated: it is similar to the Fusion algorithm, except that it is masked to all angiographs. Should the proposed algorithm perform equally well without angiographs, we might recommend that fewer (or no) fluorescein injections be performed, which would make the imaging session faster and less invasive.
Interobserver Agreement
Agreements between (1) the two clinicians, (2) the same clinician at two different times, or (3) a clinician and an algorithm were assessed by Cohen's κ 21 and weighted κ (κw). 22 The equation for κ is

κ = (p_a − p_ca) / (1 − p_ca),

where p_a is the observed probability of agreement and p_ca is the probability of chance agreement. The simplest and most standard weighting scheme was used for κw: a disagreement of n_sl severity levels was weighted by n_sl.
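Both agreement measures can be computed from a confusion matrix such as those in Tables 1 and 2. The sketch below (ours, for illustration only) uses the equivalent formulation κ = 1 − Σ(w·p_o)/Σ(w·p_e), which reduces to plain Cohen's κ when the disagreement weight w is 0 for agreement and 1 otherwise:

```python
import numpy as np

def cohen_kappa(confusion, weighted=False):
    """Cohen's kappa (or linearly weighted kappa) from a square confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_obs = c / n                                          # observed joint probabilities
    p_exp = np.outer(c.sum(axis=1), c.sum(axis=0)) / n**2  # chance-agreement probabilities
    levels = np.arange(c.shape[0])
    # Disagreement weights: |i - j| severity levels if weighted, else 0/1.
    if weighted:
        w = np.abs(levels[:, None] - levels[None, :])
    else:
        w = (levels[:, None] != levels[None, :]).astype(float)
    return 1.0 - (w * p_obs).sum() / (w * p_exp).sum()
```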
Training the Algorithms
Each algorithm (Global, Local, Fusion, and NoAngiography) was tuned to maximize Cohen's κ between its outputs and the reference standard in the training subset (see Training and Testing Subsets). In particular, k, the number of nearest neighbors (see Automated DR Severity Assessment Using CBIR), was trained by leave-one-out cross-validation in the training subset. Agreement between each algorithm and the clinicians was then assessed, in the testing subset, using the optimal value for k.
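The leave-one-out tuning of k can be sketched as follows; `predict_loo` (diagnose one training item from the remaining ones) and `kappa` (agreement between predictions and the reference standard) are placeholders for the components described above, and all names are ours:

```python
def tune_k(train_items, candidate_ks, predict_loo, kappa):
    """Return the candidate k maximizing agreement with the reference standard,
    estimated by leave-one-out cross-validation on the training subset."""
    best_k, best_score = None, float("-inf")
    for k in candidate_ks:
        # Leave-one-out: predict each training item from all the others.
        preds = [predict_loo(item, [t for t in train_items if t is not item], k)
                 for item in train_items]
        score = kappa(preds, [item["grade"] for item in train_items])
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```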
Use of a CBIR Algorithm as a Second Opinion
Finally, we tested the effectiveness of a CBIR algorithm at providing a second opinion. The proposed CBIR algorithms, trained against the reference standard provided by the most experienced clinician (Clinician1), were used to predict automatically when the less experienced clinician (Clinician2) would disagree with his more experienced colleague. We compared the probability of agreement (or disagreement) between Clinician1 and Clinician2, given that Clinician2 agrees (or disagrees) with a given algorithm, to the overall probability of agreement (or disagreement) between Clinician1 and Clinician2. A two-tailed test for a difference between proportions was performed.
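Such a comparison can be carried out as a standard two-tailed z-test for the difference between two proportions; the sketch below assumes the pooled-variance form (the article does not specify the exact variant used), and the function name is ours:

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed p-value for H0: the two underlying proportions are equal.
    x1/n1 and x2/n2 are the observed agreement counts and totals."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)               # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))
```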
Results
Patients were 61 years old on average (SD 13 years). There were 41 women and 44 men. Of the patients, 32 had type I diabetes and 46 had type II; diabetes type was unknown for 7 patients. 
Confusion matrices of clinician interpretations, in terms of DR severity at eye and patient level, are given in Tables 1 and 2, respectively. 
Table 1.
 
Confusion Matrices of Clinician Interpretations, in Terms of DR Severity, at Eye Level
EyeGrades1a
0 1 2 3 4 5 Total
EyeGrades1b
0 23 4 0 0 1 0 28
1 0 36 11 0 0 0 47
2 0 1 23 3 2 1 30
3 0 0 1 34 0 0 35
4 0 0 0 1 10 0 11
5 0 0 0 0 1 16 17
Total 23 41 35 38 14 17 168
EyeGrades2
0 15 0 0 0 0 0 15
1 5 19 2 0 0 0 26
2 2 20 22 4 1 0 49
3 1 1 8 18 1 2 31
4 0 1 0 9 10 1 21
5 0 0 3 7 2 14 26
Total 23 41 35 38 14 17 168
Table 2.
 
Confusion Matrices of Clinician Interpretations, in Terms of DR Severity, at Patient Level
PatientGrades1a
0 1 2 3 4 5 Total
PatientGrades1b
0 9 2 0 0 0 0 11
1 0 15 7 0 0 0 22
2 0 1 11 2 2 1 17
3 0 0 0 18 0 0 18
4 0 0 0 1 7 0 8
5 0 0 0 0 0 9 9
Total 9 18 18 21 9 10 85
PatientGrades2
0 6 0 0 0 0 0 6
1 3 10 1 0 0 0 14
2 0 8 13 3 1 0 25
3 0 0 3 10 1 1 15
4 0 0 0 4 6 2 12
5 0 0 1 4 1 7 13
Total 9 18 18 21 9 10 85
In comparison, confusion matrices of the best performing algorithm (Fusion) versus the reference standard, in terms of DR severity at eye and patient level, are given in Table 3. The optimal value for k was k = 5 on subset A and k = 7 on subset B. Agreement between all algorithms and clinicians, at eye and patient levels, is reported in Tables 4 and 5, respectively.
Table 3.
 
Confusion Matrices of the Best Performing Algorithm (Fusion) versus the Reference Standard, in Terms of DR Severity
Reference Standard
0 1 2 3 4 5 Total
Fusion at eye level
0 12 2 0 0 0 0 14
1 6 27 2 1 0 0 36
2 4 4 25 7 2 1 43
3 1 7 5 25 2 1 41
4 0 1 3 3 7 1 15
5 0 0 0 2 3 14 19
Total 23 41 35 38 14 17 168
Fusion at patient level
0 6 1 0 0 0 0 7
1 2 13 3 1 0 0 19
2 1 0 13 5 1 1 21
3 0 4 2 13 2 1 22
4 0 0 0 1 5 1 7
5 0 0 0 1 1 7 9
Total 9 18 18 21 9 10 85
Table 4.
 
Agreement between Observers at Eye Level
EyeGrades1a EyeGrades1b EyeGrades2 Global Local Fusion NoAngiography
Cohen's κ
    EyeGrades1a 1 0.809 0.493 0.387 0.456 0.573 0.466
    EyeGrades1b 1 0.410 0.346 0.422 0.509 0.446
    EyeGrades2 1 0.411 0.380 0.391 0.340
    Global 1 0.683 0.516 0.658
    Local 1 0.689 0.730
    Fusion 1 0.738
    NoAngiography 1
Weighted κ
    EyeGrades1a 1 0.884 0.678 0.538 0.626 0.696 0.614
    EyeGrades1b 1 0.636 0.507 0.591 0.656 0.604
    EyeGrades2 1 0.595 0.598 0.586 0.536
    Global 1 0.784 0.660 0.755
    Local 1 0.774 0.786
    Fusion 1 0.826
    NoAngiography 1
Table 5.
 
Agreement between Observers at Patient Level
PatientGrades1a PatientGrades1b PatientGrades2 Global Local Fusion NoAngiography
Cohen's κ
    PatientGrades1a 1 0.769 0.526 0.384 0.450 0.592 0.457
    PatientGrades1b 1 0.485 0.375 0.410 0.578 0.431
    PatientGrades2 1 0.397 0.382 0.391 0.436
    Global 1 0.778 0.556 0.757
    Local 1 0.723 0.734
    Fusion 1 0.673
    NoAngiography 1
Weighted κ
    PatientGrades1a 1 0.861 0.714 0.549 0.633 0.717 0.628
    PatientGrades1b 1 0.692 0.520 0.600 0.720 0.607
    PatientGrades2 1 0.583 0.623 0.597 0.591
    Global 1 0.851 0.677 0.794
    Local 1 0.789 0.819
    Fusion 1 0.792
    NoAngiography 1
The effectiveness of CBIR at predicting when Clinician2 agrees or disagrees with Clinician1, at the eye and patient levels, is presented in Tables 6 and 7, respectively. 
Table 6.
 
Prediction of Agreement and Disagreement between Experts at Eye Level
P(EyeGrades2=EyeGrades1a) P(EyeGrades2=EyeGrades1a| EyeGrades2=Algorithm) P
Algorithm: test of agreement
    Global 74.71% 0.0098
    Local 80.72% 0.0004
    Fusion 58.33% 90.59% <0.0001
    NoAngiography 84.62% <0.0001
P(EyeGrades2≠EyeGrades1a) P(EyeGrades2≠EyeGrades1a| EyeGrades2≠Algorithm) P
Algorithm: test of disagreement
    Global 59.26% 0.0092
    Local 63.53% 0.0010
    Fusion 41.67% 74.70% <0.0001
    NoAngiography 64.44% 0.0005
Table 7.
 
Prediction of Agreement and Disagreement between Experts at Patient Level
P(PatientGrades2=PatientGrades1a) P(PatientGrades2=PatientGrades1a| PatientGrades2=Algorithm) P
Algorithm: test of agreement
    Global 79.07% 0.0417
    Local 88.10% 0.0018
    Fusion 61.18% 95.35% <0.0001
    NoAngiography 82.61% 0.0116
P(PatientGrades2≠PatientGrades1a) P(PatientGrades2≠PatientGrades1a| PatientGrades2≠Algorithm) P
Algorithm: test of disagreement
    Global 57.14% 0.0508
    Local 65.12% 0.0049
    Fusion 38.82% 73.81% 0.0002
    NoAngiography 64.10% 0.0088
Discussion
In this article, the diabetic retinopathy (DR) grading performance of CBIR algorithms from our group was compared to the performance of two clinicians with different experience levels. 
First, interclinician agreement (κ = 0.493 at eye level; κ = 0.526 at patient level) was much lower than intraclinician agreement (κ = 0.809 at eye level; κ = 0.769 at patient level), at least for clinicians with different experience levels. Note, however, that the diagnoses of Clinician2 (the least experienced clinician) seldom differed from those of Clinician1 (the most experienced clinician) by more than one severity level (Tables 1, 2); wider divergences were observed more often between algorithms and Clinician1 (Table 3). 
Second, the simplest CBIR algorithms (Global and Local), which combine image characterizations in a basic way, performed less well than Clinician2 in terms of Cohen's κ and weighted κ (Tables 4, 5). On the other hand, the performance of the most advanced algorithm (Fusion), which combines image characterizations and demographic data, compared favorably with the performance of Clinician2 (κ = 0.573 at eye level; κ = 0.592 at patient level).
Third, we found that masking the Fusion algorithm to all angiographs noticeably decreased diagnosis performance (κ = 0.466 at eye level; κ = 0.457 at patient level). This performance decrease may be due to the higher discrimination power of angiography over other image modes. However, it may also be because nasal and temporal fields were photographed only after fluorescein injection. Further analyses are therefore needed to draw conclusions about the usefulness of angiography for automated DR severity assessment. 
Fourth, the potential usefulness of CBIR algorithms as a second opinion, to assist the least experienced clinicians, has been shown (Tables 6, 7). In particular, whenever Clinician2 disagreed with the algorithm at patient level, there was a 73.81% probability that he also disagreed with Clinician1, as opposed to 38.82% without prior knowledge (P = 0.0002). This result could serve as a warning that he should revise his diagnosis. Similarly, whenever Clinician2 agreed with the algorithm, there was a 95.35% probability that he also agreed with Clinician1, as opposed to 61.18% without prior knowledge (P < 0.0001). This should increase his confidence. 
Note that each of the above observations was made both at eye level and at patient level. 
We believe the fourth observation is of great practical value. We propose that an algorithm be used in the context of DR severity assessment, as a second opinion, to help validate or invalidate the diagnoses of young clinicians. Because such algorithms are able to provide a diagnosis in seconds, 15 they may be embedded in clinicians' workstations and display the proposed diagnosis on a screen. One advantage of the CBIR approach, over traditional computer-assisted diagnosis (CADx), is its interactivity: Should the clinician disagree with the proposed second opinion, he or she may visualize (also from a workstation) the k nearest neighbors from the reference dataset that were used to infer the second opinion, together with their medical interpretations from more experienced clinicians. This feature would help the clinician (1) to see whether the algorithm obviously made an error and, if not, (2) to compare his or her interpretation with that of his or her peers on similar cases. 
More generally, the proposed CBIR-based approach may be used to pool diagnostic knowledge among peers (either hospitalwide or nationwide), which has several possible applications. First, it may be used for training: Interns may now be able to compare their interpretations of real-life cases with those of renowned experts. Second, it may help in reducing interclinician variability and therefore help clinicians to coordinate their clinical decisions and prescriptions in a DR-grading program (e.g., to determine which patients should undergo a particular treatment), as is already done in screening programs. 7,8  
In conclusion, this preliminary study paves the way to the use of CBIR algorithms in clinical practice as a second opinion, to help validate or invalidate the diagnoses of young clinicians. 
Appendix: A Novel Wavelet-Based Image Characterization for CBIR
Let I be an input image of size M × N, and let w be a wavelet filter of size (2K + 1) × (2L + 1). Filter w is used to extract information from I at a given analysis scale, in a given direction. Convolving I with dilated and translated versions of w leads to the following set of coefficients (referred to as a subband):

x_{s;K,L}(i,j) = Σ_{k=−K..K} Σ_{l=−L..L} w_{k,l} · I(i − s·k, j − s·l),

where s is the analysis scale. By varying the filter's aspect ratio (K/L), we can obtain subbands associated with different directions (horizontal, vertical, and nondirectional).
The coefficients of filter w were tuned (as described in §a or §b) to increase the performance of DR severity assessment in the training subset. 
To characterize the digital content of image I, the distribution of the x_{s;K,L}(i,j) coefficients is modeled in several subbands. Because the x_{s;K,L}(i,j) coefficients in each subband have a zero-mean generalized Gaussian distribution (for small values of s), 13,23 their distribution can be efficiently modeled by their standard deviation σ_{s;K,L}(I) and kurtosis κ_{s;K,L}(I):

σ_{s;K,L}(I) = √(m_{s;K,L,2}(I)),   κ_{s;K,L}(I) = m_{s;K,L,4}(I) / m_{s;K,L,2}(I)²,

where m_{s;K,L,d}(I) is the dth-order moment of the distribution.
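Taking the kurtosis as the ratio of the fourth-order moment to the squared second-order moment of the zero-mean distribution, the per-subband features can be sketched as follows; the function name and the synthetic test data are ours:

```python
import numpy as np

def subband_features(x):
    """Standard deviation and kurtosis of a (zero-mean) subband of wavelet
    coefficients, computed from its second- and fourth-order moments."""
    x = np.asarray(x, dtype=float).ravel()
    m2 = np.mean(x ** 2)   # second-order moment
    m4 = np.mean(x ** 4)   # fourth-order moment
    return np.sqrt(m2), m4 / m2 ** 2

# For Gaussian coefficients the kurtosis is close to 3; heavier-tailed
# generalized Gaussian subbands yield larger values.
```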
The proposed image characterization is a feature vector consisting of the [σ_{s;K,L}(I), κ_{s;K,L}(I)] couples extracted in several subbands. To characterize the lowest frequencies (corresponding to s → ∞), an intensity histogram of I was also included in the proposed image characterization.
The distance D(I, J) between the characterization of image I and that of image J was defined as follows 13 :

D(I, J) = D_H(I, J) + Σ_{s;K,L} [ α_{s;K,L} · |σ_{s;K,L}(I) − σ_{s;K,L}(J)| + β_{s;K,L} · |κ_{s;K,L}(I) − κ_{s;K,L}(J)| ],

where D_H(I, J) denotes the Euclidean distance between the intensity histograms of I and J, and α_{s;K,L} and β_{s;K,L} are subband weights.
a. Global Adaptation of Image Characterizations and Distance Measures
In this first scenario, one wavelet filter was tuned for each filter size (i.e., each couple K, L). Small filter sizes proved sufficient in previous works, 13,14 so the following couples were used:
  • K = 2, L = 0 ⇒ filter size: 5 × 1 (vertical filter V_0);
  • K = 0, L = 2 ⇒ filter size: 1 × 5 (horizontal filter H_0); and
  • K = 1, L = 1 ⇒ filter size: 3 × 3 (nondirectional filter ND_0).
The sum of all coefficients in a wavelet filter must be 0 20 ; consequently, there were four undetermined coefficients in each of the first two filters and eight in the last one. As in previous work, three analysis scales were used in this study: s = 1, s = 2, and s = 3; overall, there were 18 undetermined weights in equation 4.
The undetermined coefficients (16 wavelet coefficients and 18 weights) were tuned to maximize Cohen's κ between the outputs of the algorithm and the reference standard, in the training subset. A genetic algorithm was used to find the optimal set of coefficients. 13,24  
b. Local Adaptation of Image Characterizations and Distance Measures
Since several types of lesions play a role in the definition of DR severity, 12 the boundaries between severity levels in image characterization space are likely to be complex. Therefore, in this second scenario, the wavelet filters and distance measures were allowed to vary continuously in image characterization space. A continuous function was used to map d_0(I), the initial characterization of an image I, obtained with the initial set of filters {V_0, H_0, ND_0} (see §a), to a new set of filters {V(I), H(I), ND(I)} and a new set of weights W(I). Continuity means that two images with similar initial characterizations are mapped to similar sets of filters and weights; this constraint is enforced for V(I) and, similarly, for H(I), ND(I), and W(I).
The mapping functions above were tuned on the training subset: for each image I_ts in the training subset, {V(I_ts), H(I_ts), ND(I_ts)} and W(I_ts) were tuned to maximize severity assessment performance for I_ts, while respecting the continuity constraint (see §c). The continuity constraint prevented the algorithm from overfitting the training data. Moreover, it was necessary to define {V(I_q), H(I_q), ND(I_q)} and W(I_q), where I_q is an image in the testing subset. These coefficients were obtained by multivariate interpolation 25 from the initial characterizations and the tuned coefficients of the training images. The new filters and weights were then used to find the k most similar neighbors of I_q.
c. Training the Mapping Functions for Local Filter and Weight Adaptation
Let I_ts be an image from the training subset. A gradient descent was used to find {V(I_ts), H(I_ts), ND(I_ts), W(I_ts)}, using {V_0, H_0, ND_0, W_0} as the initial point.
At each step (j) of the descent, let J_ts^(j) be the nearest neighbor of I_ts with a severity level different from that of I_ts, and let K_ts^(j) be the nearest neighbor of I_ts with the same severity level as I_ts, such that D^(j)(I_ts, K_ts^(j)) ≥ D^(j)(I_ts, J_ts^(j)). One way to increase severity assessment performance for I_ts is to minimize the ratio between S^(j) = D^(j)(I_ts, K_ts^(j)) and O^(j) = D^(j)(I_ts, J_ts^(j)). To respect the continuity constraint, each coefficient x in {V(I_ts), H(I_ts), ND(I_ts), W(I_ts)} was increased by Δx^(j)(I_ts) at each step (j), where λ > 0, which controls the regularity of the mapping functions, was obtained by cross-validation on the training subset; the same value of λ was used in equation 6. If x is a weight (respectively, a wavelet coefficient), then equation 7 can be computed using equation 4 (respectively, equation 4 and §d).
d. Derivatives of the Proposed Image Characterizations with Respect to Wavelet Filter Coefficients
Let (k, l) be the coordinates of one coefficient in w [k ∈ {−K, …, K}, l ∈ {−L, …, L}] with respect to which we would like to compute the derivative of I's characterization (see equation 3).
Let us first compute the derivative of the dth-order moment m_{s;K,L,d}(I) of the distribution (see equation 3) with respect to w_{k,l} (see equation 2):

∂m_{s;K,L,d}(I)/∂w_{k,l} = d · mean_{(i,j)} [ x_{s;K,L}(i,j)^{d−1} · I(i − s·k, j − s·l) ].

The derivative of I's characterization with respect to w_{k,l} then follows by the chain rule:

∂σ_{s;K,L}(I)/∂w_{k,l} = (1 / (2 σ_{s;K,L}(I))) · ∂m_{s;K,L,2}(I)/∂w_{k,l},
∂κ_{s;K,L}(I)/∂w_{k,l} = (1 / m_{s;K,L,2}(I)²) · ∂m_{s;K,L,4}(I)/∂w_{k,l} − (2 m_{s;K,L,4}(I) / m_{s;K,L,2}(I)³) · ∂m_{s;K,L,2}(I)/∂w_{k,l}.
Footnotes
 Disclosure: G. Quellec, None; M. Lamard, None; G. Cazuguel, None; L. Bekri, None; W. Daccache, None; C. Roux, None; B. Cochener, None
References
Klonoff DC Schwartz DM . An economic analysis of interventions for diabetes. Diabetes Care. 2000;23:390–404. [CrossRef] [PubMed]
Sjølie AK Stephenson J Aldington S . Retinopathy and vision loss in insulin-dependent diabetes in Europe. The EURODIAB IDDM Complications Study. Ophthalmology. 1997;104:252–260. [CrossRef] [PubMed]
Kinyoun JL Martin DC Fujimoto WY Leonetti DL . Ophthalmoscopy versus fundus photographs for detecting and grading diabetic retinopathy. Invest Ophthalmol Vis Sci. 1992;33:1888–1893. [PubMed]
Mokdad AH Bowman BA Ford ES Vinicor F Marks JS Koplan JP . The continuing epidemics of obesity and diabetes in the United States. JAMA. 2001;286:1195–1200. [CrossRef] [PubMed]
Niemeijer M van Ginneken B Cree MJ . Retinopathy online challenge: automatic detection of microaneurysms in digital color fundus photographs. IEEE Trans Med Imaging. 2010;29:185–195. [CrossRef] [PubMed]
Winder RJ Morrow PJ McRitchie IN Bailie JR Hart PM . Algorithms for digital image processing in diabetic retinopathy. Comput Med Imaging Graph. 2009;33:608–622. [CrossRef] [PubMed]
Philip S Fleming AD Goatman KA . The efficacy of automated “disease/no disease” grading for diabetic retinopathy in a systematic screening programme. Br J Ophthalmol. 2007;91:1512–1517. [CrossRef] [PubMed]
Abràmoff MD Niemeijer M Suttorp-Schulten MSA Viergever MA Russell SR van Ginneken B . Evaluation of a system for automatic detection of diabetic retinopathy from color fundus photographs in a large population of patients with diabetes. Diabetes Care. 2008:31;193–198. [CrossRef] [PubMed]
Abràmoff MD Reinhardt JM Russell SR . Automated early detection of diabetic retinopathy. Ophthalmology. 2010;117:1147–1154. [CrossRef] [PubMed]
Dupas B Walter T Erginay A . Evaluation of automated fundus photograph analysis algorithms for detecting microaneurysms, haemorrhages and exudates, and of a computer-assisted diagnostic system for grading diabetic retinopathy. Diabetes Metab. 2010;36:213–220. [CrossRef] [PubMed]
Niemeijer M Abràmoff MD van Ginneken B . Information fusion for diabetic retinopathy CAD in digital color fundus photographs. IEEE Trans Med Imaging. 2009;28:775–785. [CrossRef] [PubMed]
Wilkinson CP Ferris FL Klein RE . Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology. 2003;110:1677–1682. [CrossRef] [PubMed]
Quellec G Lamard M Cazuguel G Cochener B Roux C . Wavelet optimization for content-based image retrieval in medical databases. Med Image Anal. 2010;14:227–241. [CrossRef] [PubMed]
Quellec G Lamard M Cazuguel G Cochener B Roux C . Adaptive nonseparable wavelet transform via lifting and its application to content-based image retrieval. IEEE Trans Image Process. 2010;19:25–35. [CrossRef] [PubMed]
Quellec G Lamard M Cazuguel G Roux C Cochener B . Case retrieval in medical databases by fusing heterogeneous information. IEEE Trans Med Imaging. 2010;14:227–241.
Quellec G Lamard M Bekri L Cazuguel G Roux C Cochener B . Medical Case retrieval from a committee of decision trees. IEEE Trans Inf Technol Biomed. 2010;14:1227–1235. [CrossRef] [PubMed]
Smeulders AWM Worring M Santini S Gupta A Jain R . Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell. 2000;22:1349–1380. [CrossRef]
Müller H Michoux N Bandon D Geissbuhler A . A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. Int J Medical Inform. 2004;73:1–23. [CrossRef]
Chaum E Karnowski TP Govindasamy VP Abdelrahman M Tobin KW . Automated diagnosis of retinopathy by content-based image retrieval. Retina. 2008;28:1463–1477. [CrossRef] [PubMed]
Mallat S . A Wavelet Tour of Signal Processing. 2nd ed. San Diego, CA: Academic Press; 1999.
Cohen J . A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:27–46. [CrossRef]
Cohen J . Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull. 1968;70:213–220. [CrossRef] [PubMed]
Van de Wouwer G Scheunders P Van Dyck D . Statistical texture characterization from discrete wavelet representations. IEEE Trans Image Process. 1999;8:592–598. [CrossRef] [PubMed]
Goldberg DE . Genetic Algorithms in Search, Optimization and Machine Learning. Boston: Kluwer Academic Publishers; 1989.
Shepard D . A two-dimensional interpolation function for irregularly-spaced data. Proceedings of the 23rd ACM National Conference. New York: Association for Computing Machinery: 1968;517–524.
Figure 1.
 
Set of multimodal photographs from one eye diagnosed with mild nonproliferative DR. (a) Red-free photograph. (b) Blue-filtered photograph. (c) Early angiograph. (d) Intermediate angiograph (central retina). (e) Intermediate angiograph (temporal retina). (f) Intermediate angiograph (nasal retina). (g) Late angiograph.
Table 1.
 
Confusion Matrices of Clinician Interpretations, in Terms of DR Severity, at Eye Level
EyeGrades1b \ EyeGrades1a     0    1    2    3    4    5   Total
0                            23    4    0    0    1    0     28
1                             0   36   11    0    0    0     47
2                             0    1   23    3    2    1     30
3                             0    0    1   34    0    0     35
4                             0    0    0    1   10    0     11
5                             0    0    0    0    1   16     17
Total                        23   41   35   38   14   17    168

EyeGrades2 \ EyeGrades1a      0    1    2    3    4    5   Total
0                            15    0    0    0    0    0     15
1                             5   19    2    0    0    0     26
2                             2   20   22    4    1    0     49
3                             1    1    8   18    1    2     31
4                             0    1    0    9   10    1     21
5                             0    0    3    7    2   14     26
Total                        23   41   35   38   14   17    168
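The κ values reported in Table 4 can be reproduced directly from these counts. The sketch below computes Cohen's κ and weighted κ, assuming linear disagreement weights (which matches the reported values), for the EyeGrades1b-versus-EyeGrades1a matrix of Table 1; it is an independent reimplementation, not the authors' code.

```python
def kappa(matrix, weighted=False):
    # Cohen's kappa (Cohen 1960); with weighted=True, weighted kappa
    # (Cohen 1968) assuming linear disagreement weights 1 - |i - j|/(k - 1)
    n = sum(sum(row) for row in matrix)                      # total eyes
    k = len(matrix)                                          # severity levels
    rows = [sum(row) for row in matrix]                      # row marginals
    cols = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    def w(i, j):
        return 1.0 - abs(i - j) / (k - 1) if weighted else float(i == j)
    # observed and chance-expected (weighted) agreement
    po = sum(w(i, j) * matrix[i][j] for i in range(k) for j in range(k)) / n
    pe = sum(w(i, j) * rows[i] * cols[j]
             for i in range(k) for j in range(k)) / n ** 2
    return (po - pe) / (1.0 - pe)

# Table 1: rows = EyeGrades1b, columns = EyeGrades1a (grades 0-5)
table1_1b_vs_1a = [
    [23,  4,  0,  0,  1,  0],
    [ 0, 36, 11,  0,  0,  0],
    [ 0,  1, 23,  3,  2,  1],
    [ 0,  0,  1, 34,  0,  0],
    [ 0,  0,  0,  1, 10,  0],
    [ 0,  0,  0,  0,  1, 16],
]

print(round(kappa(table1_1b_vs_1a), 3))                 # 0.809, as in Table 4
print(round(kappa(table1_1b_vs_1a, weighted=True), 3))  # 0.884, as in Table 4
```

The weighted variant penalizes a grade-4-versus-grade-0 disagreement more than a grade-1-versus-grade-0 one, which is why it runs higher than the unweighted κ on these near-diagonal matrices.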
Table 2.
 
Confusion Matrices of Clinician Interpretations, in Terms of DR Severity, at Patient Level
PatientGrades1b \ PatientGrades1a     0    1    2    3    4    5   Total
0                                     9    2    0    0    0    0     11
1                                     0   15    7    0    0    0     22
2                                     0    1   11    2    2    1     17
3                                     0    0    0   18    0    0     18
4                                     0    0    0    1    7    0      8
5                                     0    0    0    0    0    9      9
Total                                 9   18   18   21    9   10     85

PatientGrades2 \ PatientGrades1a      0    1    2    3    4    5   Total
0                                     6    0    0    0    0    0      6
1                                     3   10    1    0    0    0     14
2                                     0    8   13    3    1    0     25
3                                     0    0    3   10    1    1     15
4                                     0    0    0    4    6    2     12
5                                     0    0    1    4    1    7     13
Total                                 9   18   18   21    9   10     85
Table 3.
 
Confusion Matrices of the Best Performing Algorithm (Fusion) versus the Reference Standard, in Terms of DR Severity
Fusion (eye level) \ Reference Standard       0    1    2    3    4    5   Total
0                                            12    2    0    0    0    0     14
1                                             6   27    2    1    0    0     36
2                                             4    4   25    7    2    1     43
3                                             1    7    5   25    2    1     41
4                                             0    1    3    3    7    1     15
5                                             0    0    0    2    3   14     19
Total                                        23   41   35   38   14   17    168

Fusion (patient level) \ Reference Standard   0    1    2    3    4    5   Total
0                                             6    1    0    0    0    0      7
1                                             2   13    3    1    0    0     19
2                                             1    0   13    5    1    1     21
3                                             0    4    2   13    2    1     22
4                                             0    0    0    1    5    1      7
5                                             0    0    0    1    1    7      9
Total                                         9   18   18   21    9   10     85
Table 4.
 
Agreement between Observers at Eye Level
Cohen's κ        EyeGrades1a  EyeGrades1b  EyeGrades2  Global  Local  Fusion  NoAngiography
EyeGrades1a          1          0.809        0.493     0.387   0.456  0.573       0.466
EyeGrades1b                     1            0.410     0.346   0.422  0.509       0.446
EyeGrades2                                   1         0.411   0.380  0.391       0.340
Global                                                 1       0.683  0.516       0.658
Local                                                          1      0.689       0.730
Fusion                                                                1           0.738
NoAngiography                                                                     1

Weighted κ       EyeGrades1a  EyeGrades1b  EyeGrades2  Global  Local  Fusion  NoAngiography
EyeGrades1a          1          0.884        0.678     0.538   0.626  0.696       0.614
EyeGrades1b                     1            0.636     0.507   0.591  0.656       0.604
EyeGrades2                                   1         0.595   0.598  0.586       0.536
Global                                                 1       0.784  0.660       0.755
Local                                                          1      0.774       0.786
Fusion                                                                1           0.826
NoAngiography                                                                     1
Table 5.
 
Agreement between Observers at Patient Level
Cohen's κ          PatientGrades1a  PatientGrades1b  PatientGrades2  Global  Local  Fusion  NoAngiography
PatientGrades1a         1             0.769            0.526         0.384   0.450  0.592       0.457
PatientGrades1b                       1                0.485         0.375   0.410  0.578       0.431
PatientGrades2                                         1             0.397   0.382  0.391       0.436
Global                                                               1       0.778  0.556       0.757
Local                                                                        1      0.723       0.734
Fusion                                                                              1           0.673
NoAngiography                                                                                   1

Weighted κ         PatientGrades1a  PatientGrades1b  PatientGrades2  Global  Local  Fusion  NoAngiography
PatientGrades1a         1             0.861            0.714         0.549   0.633  0.717       0.628
PatientGrades1b                       1                0.692         0.520   0.600  0.720       0.607
PatientGrades2                                         1             0.583   0.623  0.597       0.591
Global                                                               1       0.851  0.677       0.794
Local                                                                        1      0.789       0.819
Fusion                                                                              1           0.792
NoAngiography                                                                                   1
Table 6.
 
Prediction of Agreement and Disagreement between Experts at Eye Level
Test of agreement — unconditionally, P(EyeGrades2 = EyeGrades1a) = 58.33%
    Algorithm        P(EyeGrades2 = EyeGrades1a | EyeGrades2 = Algorithm)        P
    Global           74.71%                                                 0.0098
    Local            80.72%                                                 0.0004
    Fusion           90.59%                                                <0.0001
    NoAngiography    84.62%                                                <0.0001

Test of disagreement — unconditionally, P(EyeGrades2 ≠ EyeGrades1a) = 41.67%
    Algorithm        P(EyeGrades2 ≠ EyeGrades1a | EyeGrades2 ≠ Algorithm)        P
    Global           59.26%                                                 0.0092
    Local            63.53%                                                 0.0010
    Fusion           74.70%                                                <0.0001
    NoAngiography    64.44%                                                 0.0005
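The statistic behind Tables 6 and 7 is simple to state in code: how often the two graders agree overall, versus how often they agree on the subset of cases where the second grader matches the algorithm. The sketch below uses hypothetical severity labels for eight eyes and does not reproduce the significance test behind the P column.

```python
def agreement_rates(grader1, grader2, algorithm):
    # overall agreement between the two graders, and agreement restricted to
    # the cases where grader2 matches the algorithm's severity level
    n = len(grader1)
    overall = sum(a == b for a, b in zip(grader1, grader2)) / n
    subset = [(a, b) for a, b, c in zip(grader1, grader2, algorithm) if b == c]
    conditional = sum(a == b for a, b in subset) / len(subset)
    return overall, conditional

# Hypothetical severity labels (0-5) for eight eyes
g1  = [0, 1, 2, 3, 4, 5, 2, 1]   # stands in for EyeGrades1a
g2  = [0, 1, 2, 2, 4, 5, 3, 1]   # stands in for EyeGrades2
alg = [0, 1, 2, 2, 4, 5, 2, 0]   # stands in for an algorithm's output

overall, conditional = agreement_rates(g1, g2, alg)
```

When `conditional` exceeds `overall`, agreement between the second grader and the algorithm predicts agreement between the two clinicians, which is the pattern all four algorithms show in Tables 6 and 7.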
Table 7.
 
Prediction of Agreement and Disagreement between Experts at Patient Level
Test of agreement — unconditionally, P(PatientGrades2 = PatientGrades1a) = 61.18%
    Algorithm        P(PatientGrades2 = PatientGrades1a | PatientGrades2 = Algorithm)        P
    Global           79.07%                                                             0.0417
    Local            88.10%                                                             0.0018
    Fusion           95.35%                                                            <0.0001
    NoAngiography    82.61%                                                             0.0116

Test of disagreement — unconditionally, P(PatientGrades2 ≠ PatientGrades1a) = 38.82%
    Algorithm        P(PatientGrades2 ≠ PatientGrades1a | PatientGrades2 ≠ Algorithm)        P
    Global           57.14%                                                             0.0508
    Local            65.12%                                                             0.0049
    Fusion           73.81%                                                             0.0002
    NoAngiography    64.10%                                                             0.0088