Figures 5 and 6 show the raw data points of the redness and edgeness features,
respectively, versus the median human grade. There is a clear
relationship between the features and the human data, although the
relationship is not necessarily linear (especially for
$f_{\mathrm{r}}$), and may have varying degrees of
consistency (for example, the three or four outliers in the edgeness
data in Fig. 6).
Our goal is to predict, in some fashion, the grade from the extracted
image data. We denote by $\hat{g}(f)$ the estimated grade $g$ based on
feature value $f$. Clearly, we want to constrain the grade to lie
within the scale
\[0 \leq \hat{g}(f) \leq 100.\]
The solid lines in Figures 5 and 6 represent the chosen
regressions. Because of the wide range of $f_{\mathrm{r}}$ (up to
1.0), a linear fit is inappropriate, and a hyperbolic regression was
therefore chosen for $f_{\mathrm{r}}$, having an asymptote at
$g = 100$ and a slope at the origin of 45/0.05.
Although unenlightening, for completeness the temporal redness
regression follows
\[\hat{g}^{2}(f_{\mathrm{r}}) - 900\,\hat{g}(f_{\mathrm{r}}) \cdot f_{\mathrm{r}} - 109\,\hat{g}(f_{\mathrm{r}}) + 90{,}000\,f_{\mathrm{r}} - 270 = 0.\]
Although this may appear overfit, the equation was fit by
adjusting only one free parameter, once the slope and asymptote were
specified.
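As a concrete reading of this regression, the implicit equation can be solved for $\hat{g}(f_{\mathrm{r}})$ with the quadratic formula. The following sketch takes the smaller root, which is the branch with the stated slope at the origin and the asymptote at 100; the root selection and the clamping to [0, 100] are our interpretation rather than something spelled out in the text:

```python
import math

def redness_grade(f_r: float) -> float:
    """Temporal redness regression: solve
    g^2 - (900*f_r + 109)*g + (90000*f_r - 270) = 0
    for the smaller root, then clamp to the 0-100 grading scale.
    (Branch choice and clamping are our interpretation.)"""
    b = 900.0 * f_r + 109.0          # negated linear coefficient
    c = 90000.0 * f_r - 270.0        # constant term
    disc = b * b - 4.0 * c
    g = (b - math.sqrt(disc)) / 2.0  # smaller root: the regressed branch
    return min(100.0, max(0.0, g))
```

On this branch the grade rises steeply for small $f_{\mathrm{r}}$ and approaches 100 as $f_{\mathrm{r}}$ nears 1.0, matching the stated asymptote.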
A more straightforward linear regression was chosen for
$f_{\mathrm{e}}$, where the three misfitting data
points were eliminated from the coefficient learning process. The
resultant expression for the nasal edgeness regression is
\[\hat{g}(f_{\mathrm{e}}) = 5 + 60 \cdot f_{\mathrm{e}}/0.16.\]
With two estimators $\hat{g}(f_{\mathrm{r}})$ and
$\hat{g}(f_{\mathrm{e}})$ defined,
there is clearly an ambiguity regarding which estimator to use or
whether the estimators can somehow be combined automatically. If
$\hat{g}(f_{\mathrm{r}})$ and $\hat{g}(f_{\mathrm{e}})$ are
viewed as approximate “measurements” of the true grade
$g$, then under certain conditions the optimal linear
Bayesian estimate of the grade is
\[\hat{g}(f_{\mathrm{r}}, f_{\mathrm{e}}) = \frac{\hat{g}(f_{\mathrm{r}})\,\sigma_{\mathrm{e}}^{2} + \hat{g}(f_{\mathrm{e}})\,\sigma_{\mathrm{r}}^{2}}{\sigma_{\mathrm{e}}^{2} + \sigma_{\mathrm{r}}^{2}}\]
and the associated estimation error variance is
\[\mathrm{var}[\hat{g}(f_{\mathrm{r}}, f_{\mathrm{e}})] = \frac{1}{\sigma_{\mathrm{e}}^{-2} + \sigma_{\mathrm{r}}^{-2}}\]
where $\sigma_{\mathrm{e}}^{2}$ and $\sigma_{\mathrm{r}}^{2}$ are the error
variances of the single-feature estimators
$\hat{g}(f_{\mathrm{e}})$ and $\hat{g}(f_{\mathrm{r}})$,
respectively. (Ideally,
$\hat{g}(f_{\mathrm{r}})$ and $\hat{g}(f_{\mathrm{e}})$ should
be unbiased estimates of grade $g$, and the errors in the two
estimates are assumed to be independent.) These error variances cannot
be deduced theoretically, but have to be inferred from the data. We
computed them as the smoothed local sample variance of the human grades
around the regressed curves. The resultant 1-SD curves are shown in
Figures 5 and 6. Clearly the Bayesian estimator (equation 7) biases in
favor of estimator $\hat{g}(f_{\mathrm{e}})$ for eyes
having only mild redness, and toward $\hat{g}(f_{\mathrm{r}})$ for
severe redness.
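The edgeness regression and the inverse-variance combination of equations 7 and 8 reduce to a few lines. The sketch below uses our own function names, and it takes the error variances $\sigma_{\mathrm{r}}^{2}$, $\sigma_{\mathrm{e}}^{2}$ as given inputs rather than inferring them from the data as described above:

```python
def edgeness_grade(f_e: float) -> float:
    """Nasal edgeness linear regression, clamped to the 0-100 grading scale."""
    return min(100.0, max(0.0, 5.0 + 60.0 * f_e / 0.16))

def combine_grades(g_r: float, g_e: float, var_r: float, var_e: float):
    """Optimal linear Bayesian combination (equations 7 and 8):
    inverse-variance weighting of the two single-feature estimates.
    Returns the combined estimate and its error variance."""
    precision = 1.0 / var_r + 1.0 / var_e
    g_hat = (g_r / var_r + g_e / var_e) / precision
    return g_hat, 1.0 / precision
```

The combined estimate automatically leans toward whichever single-feature estimator has the smaller local error variance, which is exactly the mild-redness versus severe-redness behavior described above.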
For compactness, these developments were discussed and illustrated
using only one half of the data, ignoring the nasal redness and
temporal edgeness cases. In the following results, all the data are
used.
Figure 7 shows the estimation results, using the Bayesian
estimator (equation 7) for both the temporal and nasal data. The
estimation results lie very close to the dashed-line ideal, with a
correlation coefficient of 0.976 between the estimates and the human
medians. For comparison purposes, an equivalent plot is shown in
Figure 8, where a statistical sample of the human grades is plotted
against the median, for a corresponding correlation coefficient of
only 0.841.
The error bars in Figure 7 are unit SD in length, based on the
Bayesian error variance (equation 8). If the error variances are
accurate, they should meaningfully reflect the distribution of the
estimates $\hat{g}$ around the true value $g$; that is,
\[\frac{\hat{g}(f_{\mathrm{r}}, f_{\mathrm{e}}) - g}{\sqrt{\mathrm{var}[\hat{g}(f_{\mathrm{r}}, f_{\mathrm{e}})]}}\]
should be zero-mean, unit-variance Gaussian. Experimentally, the
distribution in
equation 9 was found to be approximately Gaussian, with
a mean of −0.04 and a variance of 1.01, clearly validating the
estimated error variances.
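A calibration check of this kind can be scripted directly. The sketch below (function name is ours) standardizes each estimate by its predicted SD, as in equation 9; on real data one would then compare the sample mean and variance of the result against 0 and 1:

```python
import statistics

def standardized_residuals(estimates, variances, truths):
    """Per-sample standardized errors (equation 9): if the error
    variances are well calibrated, these should be approximately
    zero-mean and unit-variance."""
    return [(g_hat - g) / v ** 0.5
            for g_hat, v, g in zip(estimates, variances, truths)]

# Example of the calibration check on real data:
#   z = standardized_residuals(est, var, medians)
#   statistics.mean(z), statistics.variance(z)  # compare to 0 and 1
```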
Figure 9 compares the error SDs associated with the grading estimates of
individuals, the 50% most consistent individuals, and our proposed
automated system. Our system achieves a substantial reduction in error
over the individuals, and, except for cases of severe redness, where our
regression and learning have a paucity of data, our errors are
competitive with the 50% set. Finally,
Figure 10 shows the performance of each individual compared with our
proposed system. Of the 72 clinicians who took part in the experiment,
only one was able to match the consistency (measured as the correlation
coefficient) of our proposed method. Clearly, our errors are
competitive with, or better than, even the most consistent graders.