purpose. Previous publications showed that clustering by unsupervised learning with machine-learning classifiers can segment standard automated perimetry (SAP) results for glaucoma into clusters of patterns. In this study, unsupervised learning by independent component analysis decomposed SAP field patterns into axes, and the information represented by these axes was evaluated.

methods. SAP fields were obtained with the Humphrey Visual Field Analyzer (Carl Zeiss Meditec, Dublin, CA) from 189 normal eyes and 156 eyes with glaucomatous optic neuropathy (GON), determined by masked review of stereoscopic optic disc photographs. The variational Bayesian independent component analysis mixture model (vB-ICA-mm) partitioned the SAP fields into the most informative number of clusters. Simultaneously, the model learned an optimal number of maximally independent axes for each cluster.

results. The most informative number of clusters in the SAP set was two. vB-ICA-mm placed 68.6% of the eyes with GON in a cluster labeled G and 98.4% of the eyes with normal optic discs in a cluster labeled N. Cluster G optimally contained six axes. Post hoc analysis of patterns generated at −1 SD and +2 SD from the cluster G mean on the six axes revealed defects similar to those identified by experts as indicative of glaucoma. SAP fields associated with an axis showed increasing severity as they were located farther in the positive direction from the cluster G mean.

conclusions. vB-ICA-mm represented the SAP fields with patterns that were meaningful for glaucoma experts. This process also captured severity in the patterns uncovered. These findings should validate vB-ICA-mm as a data-mining technique for new and unfamiliar complex tests.

In 1889, Bjerrum began to uncover patterns of visual field defects, such as a comet-shaped arcuate scotoma or a nasal step scotoma, with quantitative perimetry.^{1}

Consequently, for more than 110 years, generations of glaucoma experts have accumulated the knowledge to recognize patterns of visual field defects that indicate glaucoma.^{2} With the advent of standard automated perimetry (SAP) and the development of statistical field-analysis packages such as Statpac (Carl Zeiss Meditec, Inc., Dublin, CA), the depth of defect within the patterns of loss and the relationship of adjacent test locations to each other could be quantified. We applied machine-learning data-mining techniques to uncover visual field patterns associated with glaucoma and to compare these patterns with the subjective qualitative and semiquantitative patterns evolved from experience by human experts.^{3–7}

Another way to gain such knowledge is by finding useful patterns in large groups of patients.^{8} In the present study, we applied a variational Bayesian independent component analysis mixture model (vB-ICA-mm) to subdivide the results of SAP field tests performed in a cohort of normal subjects and patients with glaucomatous optic neuropathy (GON), to find patterns of interest in the SAP fields, and to propose how this knowledge can improve medical care.

Machine learning may proceed by *learning with a teacher*, or *supervised learning*, or by *learning without a teacher*, or *unsupervised learning*. For our purposes, the goal can be to organize data, such as the set of visual fields, into differing clusters of patterns with similar members. Hence, another name given to the process is *clustering*, and the investigation of the clusters segmented from the data is *cluster analysis*.

*Component analysis* projects the data within each cluster, located in multidimensional space, onto axes that meaningfully represent the data. Although the axes do not produce clusters, representation of the data with axes still yields useful information about the patterns in the data. A further refinement, *principal component analysis* (PCA), projects *d*-dimensional data onto a lower-dimensional subspace of *s* orthogonal axes. Although the expectation is that the orthogonal axes are independent, in reality they may not be. We chose *independent component analysis* (ICA) because it incorporates a measure of independence to produce axes that are maximally independent.^{9,10}

There may be data distributions in which components are nonlinearly related or clustered, so that they are difficult to describe by one ICA model. The *ICA mixture model*^{9} is a nonlinear ICA technique that extends the linear ICA method by learning multiple ICA models and weighting them in a probabilistic manner. The ICA mixture model settles on the optimal number of axis sets simultaneously with the generation of the axes (Figs. 1A, 1B).^{11} The variational Bayesian framework helps to capture the number of axes in the local axis set and reduces the computational complexity by bounding intractable integrals. The amalgamation of all these processes is the *variational Bayesian ICA mixture model* (vB-ICA-mm),^{6} which we applied in this study to fields of healthy and glaucomatous eyes to identify patterns of field loss associated with this diagnosis.
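To make the probabilistic-weighting idea concrete, the sketch below assigns a field vector to whichever cluster maximizes log *P*(*c*) + log *p*(**x**|*c*). This is a simplification, not the actual vB-ICA-mm: it models each cluster with an independent (diagonal) Gaussian instead of an ICA source model, and the cluster means, variances, and weights are invented for illustration.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of x under an independent (diagonal-covariance) Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def assign_cluster(x, clusters):
    """Pick the cluster c maximizing log P(c) + log p(x | c)."""
    scores = {
        name: math.log(c["weight"]) + gaussian_loglik(x, c["mean"], c["var"])
        for name, c in clusters.items()
    }
    return max(scores, key=scores.get)

# Two toy clusters standing in for "N" (normal) and "G" (glaucoma);
# coordinates mimic two field sensitivities in dB.
clusters = {
    "N": {"weight": 0.55, "mean": [30.0, 30.0], "var": [4.0, 4.0]},
    "G": {"weight": 0.45, "mean": [20.0, 15.0], "var": [25.0, 25.0]},
}

print(assign_cluster([29.0, 31.0], clusters))  # near the normal mean -> N
print(assign_cluster([18.0, 12.0], clusters))  # depressed field -> G
```

In the real model the per-cluster density is the ICA likelihood rather than a single Gaussian, but the assignment rule is the same: the field goes to the cluster under which it is most probable.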

Exclusion criteria included angle abnormalities on gonioscopy, diseases other than glaucoma that could affect the visual fields, and medications known to affect visual field sensitivity.^{12} Subjects with a best-corrected visual acuity worse than 20/40, a spherical equivalent outside ±5.0 D, or a cylinder correction greater than 3.0 D were also excluded. Poor-quality stereoscopic photographs of the optic nerve head also served as an exclusion for the glaucoma category. A family history of glaucoma was not an exclusion criterion.

Each SAP field was represented as the vector **x** = (L1, … , L54, age), excluding locations L18 and L31 because they fall within the normal blind spot. Age was included because both normal and glaucomatous SAP fields are affected by age, and age had been used in some studies that incorporated supervised learning.^{3–5,7,8}
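A minimal sketch of building this feature vector might look like the following; the function name and the dictionary-based input are our own, and only the indexing (locations 1–54, with L18 and L31 dropped and age appended) comes from the text.

```python
BLIND_SPOT = {18, 31}  # test locations covered by the physiologic blind spot

def make_feature_vector(sensitivities, age):
    """Build x = (L1, ..., L54, age) minus the blind-spot points L18 and L31.

    `sensitivities` maps location number (1-54) to threshold sensitivity in dB.
    The result has 52 field values plus age = 53 dimensions.
    """
    if set(sensitivities) != set(range(1, 55)):
        raise ValueError("expected sensitivities for locations L1-L54")
    x = [sensitivities[i] for i in range(1, 55) if i not in BLIND_SPOT]
    x.append(age)
    return x

# A flat 30-dB field from a 62-year-old yields a 53-dimensional vector.
field = {i: 30.0 for i in range(1, 55)}
x = make_feature_vector(field, age=62)
print(len(x))  # 53
```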

A detailed description is given in the Appendix; a general description follows. Starting with 345 subjects evenly distributed in *c* clusters and with random initialization of axes, the axes and the probability of each cluster were learned. The number of clusters, *c*, was increased from *c* = 1 to *c* = 5, seeking the value beyond which no further gain in information would be obtained. Subjects belonging to the same cluster defined the mean and templates for that cluster. For each SAP field, its cluster assignment was then recomputed according to its likelihood value given by ICA. ICA alone would have sought one set of axes; the mixture model of ICA created clusters and sought an optimal set of axes for each cluster. Bayesian learning was applied to overcome the overfitting of maximum-likelihood estimation. The cluster assignments and axis adjustments were iterated until no further change occurred in the cluster assignment.

The possibility of ending in a high local minimum of the error surface was minimized by selecting the model with the best marginal likelihood value from 100 different random initializations, thus ending as close to the global minimum as possible.^{6,9}
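The restart-and-select strategy can be sketched generically. The toy below substitutes a one-dimensional two-means fit for the real vB-ICA-mm learning step (all data and names are invented), but the selection logic is the same: run the learner from many random initializations and keep the model with the best score, here a stand-in for the marginal likelihood.

```python
import random

def fit_once(data, k, rng):
    """One k-means run from a random initialization (stand-in for one vB-ICA-mm fit)."""
    centers = rng.sample(data, k)
    for _ in range(25):
        groups = [[] for _ in range(k)]
        for x in data:
            groups[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # Higher is better, mimicking "larger marginal likelihood is better".
    score = -sum(min((x - c) ** 2 for c in centers) for x in data)
    return score, sorted(centers)

def best_of_restarts(data, k, n_restarts=100, seed=0):
    """Keep the best-scoring model over many random initializations."""
    rng = random.Random(seed)
    return max(fit_once(data, k, rng) for _ in range(n_restarts))

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
score, centers = best_of_restarts(data, k=2)
print(centers)  # close to [1.0, 9.0]
```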

Instead of creating subclusters, as was reported with the variational Bayesian mixture of factor analysis,^{13} vB-ICA-mm creates axes within the clusters. The axes were analyzed by generating patterns in SAP field input space at specific points along the axes created by the vB-ICA-mm. To avoid ignoring potential information in the SAP fields, the fields themselves were also examined, in addition to the patterns generated at points on the axes. The SAP fields in a cluster generated by the vB-ICA-mm were therefore organized into clouds around the axes created by the vB-ICA-mm, and these clouds were analyzed.

Within each cluster, *c*, each field was assigned to an axis. In 53-dimensional space, the angle was calculated at the cluster *c* centroid between the vector for an individual SAP field and the vector of each of the axes. The individual field was assigned to the axis with which its vector had the smallest angle. This created a cloud of points in space around each axis, with the axis running up the center of the cloud (Fig. 2). One can imagine the shape of each cloud as approximating a "hypercone" (the multidimensional analogue of a cone in three-dimensional space) expanding away from the glaucoma mean. Each axis had a positive cloud and a negative cloud. Each field was assigned to one axis in one direction, and there was no overlap between the clouds. This allowed us to examine the fields falling within each cloud and, we hoped, to determine what was similar about the fields in one cloud and how they differed from the fields in the other clouds.

The SAP fields within each cloud of cluster *c* were reviewed to discern whether a consistent pattern of field loss was shown in each cloud and, if so, to describe the pattern within each of the resultant groups of fields. Disagreements among reviewers were resolved by consensus.
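A sketch of this smallest-angle assignment rule follows; the function names and the toy three-dimensional data are ours (the study worked in 53 dimensions), and each field is compared against both directions of every axis, matching the positive and negative clouds described above.

```python
import math

def angle(u, v):
    """Angle in radians between nonzero vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def assign_to_axis(field, centroid, axes):
    """Assign a field to the signed axis making the smallest angle with
    the field's offset from the cluster centroid."""
    offset = [f - c for f, c in zip(field, centroid)]
    best = None
    for i, axis in enumerate(axes):
        for sign in (1, -1):
            a = angle(offset, [sign * v for v in axis])
            if best is None or a < best[0]:
                best = (a, i, sign)
    return best[1], best[2]  # (axis index, direction)

centroid = [0.0, 0.0, 0.0]
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(assign_to_axis([0.9, 0.1, 0.0], centroid, axes))   # positive cloud of axis 0
print(assign_to_axis([0.1, -2.0, 0.0], centroid, axes))  # negative cloud of axis 1
```

Because each field is given exactly one (axis, direction) pair, the clouds cannot overlap, which is what makes the per-cloud expert review well defined.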

The mean glaucoma field was not quite uniformly depressed, with greater-than-average depression at the nasal step zone, slightly greater-than-average depression in the superior hemifield, and the least depression just inferior to fixation (Fig. 4).^{4,13}

Each axis passes through the *glaucoma* mean. The axis direction was considered to be positive if the distance from the *normal* mean field always increased as the shift from the glaucoma mean increased along the axis. As the distance from the glaucoma mean increased in the negative direction, the distance from a point on the axis to the normal mean initially decreased, until the minimum distance of the axis to the normal mean was reached (Fig. 1B). Thereafter, the distance from a point on the axis to the normal mean increased as the distance from the glaucoma mean increased.
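This geometry can be illustrated numerically. With a toy two-dimensional glaucoma mean, normal mean, and unit axis (all values invented), the distance from points along the axis to the normal mean first falls and then rises, exactly the decrease-then-increase behavior described above.

```python
import math

def dist_to_normal_mean(t, glaucoma_mean, axis, normal_mean):
    """Distance from the axis point (glaucoma_mean + t * axis) to the normal mean."""
    point = [g + t * a for g, a in zip(glaucoma_mean, axis)]
    return math.dist(point, normal_mean)

g_mean, n_mean = [10.0, 0.0], [0.0, 2.0]
axis = [-1.0, 0.0]  # unit axis; this direction heads toward the normal mean

# Stepping away from the glaucoma mean along the axis:
d = [dist_to_normal_mean(t, g_mean, axis, n_mean) for t in (0, 5, 10, 15, 20)]
print([round(x, 2) for x in d])  # [10.2, 5.39, 2.0, 5.39, 10.2]
```

The minimum (here at t = 10) is the point where the axis passes closest to the normal mean; beyond it, distance to both means grows together.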

Mixture of Gaussian,^{4} a comparable classifier that had knowledge of the correct diagnosis during the learning phase (supervised learning), had a sensitivity of 67% when specificity was restricted to 100%. Consequently, the cluster structure obtained with unsupervised learning by vB-ICA-mm correlated well with that obtained by supervised learning.

The generated field patterns and the analyzed SAP fields each had 52 dimensions; with the addition of age, they were in 53-dimensional space.^{11,14} We will discuss the shapes in three-dimensional space, because three dimensions are easier to grasp cognitively. Cluster G can itself be segmented into clouds. The elongated clouds created around the axes within cluster G are likely to show a different kind of within-cluster similarity than clusters that are close to spherical in shape. Clusters of glaucoma fields created by unsupervised learning with the variational Bayesian mixture of factor analysis were somewhat spherical.^{13} With that type of clustering, it is possible for one spherical cluster to contain mostly mild defects and another cluster mostly advanced defects.

One report^{15} describes increasing severity as worsening of the same pattern rather than as the addition of new patterns. Another question is the effect of the age difference between the normal and glaucoma participants. In cluster analysis performed on the same participants, the age difference was found not to influence the patterns.^{13} Because severity, which is incorporated in each axis, is not a factor that distinguishes the axes, the age difference between the normal and glaucoma participants also does not affect the representation of data created with vB-ICA-mm. The representation of patterns is based on glaucoma, not age.^{15}

vB-ICA-mm^{6} automatically determines the number of clusters and the dimensions (axes) of each cluster. The input data (52 field locations plus age for each eye) are denoted by **x** in 53-dimensional space, and vB-ICA-mm models the data density as a mixture over clusters, \(p(\mathbf{x}) = {{\sum}_{c}}P(c)p(\mathbf{x}{\vert}c)\), where \(P(c)\) is the probability mass of cluster *c* and \(p(\mathbf{x}{\vert}c)\) is the probability density of **x** within cluster *c*. Within each cluster \(c\), the data are modeled by a linear combination of independent sources, \(\mathbf{x}^{c} = \mathbf{A}^{c}\mathbf{s}^{c} + \mathbf{{\upsilon}}^{c} + \mathbf{{\varsigma}}^{c}\), where \(\mathbf{A}^{c} = (\mathbf{A}_{1}^{c}, \mathbf{A}_{2}^{c}, {\ldots}, \mathbf{A}_{n}^{c})\) is the mixing matrix of the independent axes, \(\mathbf{s}^{c} = (s_{1}^{c}, {\ldots}, s_{n}^{c})^{T}\) contains the activation coefficients along the axes, \(\mathbf{{\upsilon}}^{c}\) is the centroid, \(\mathbf{{\varsigma}}^{c}\) is the noise, and *n* is the dimensionality (number of axes) of cluster *c*.

The noise is modeled by a Gaussian distribution with zero mean and covariance \(\mathbf{{\psi}}^{c}\), so the distribution of **x** in any cluster can be written as \(p(\mathbf{x}{\vert}c) = p(\mathbf{x}{\vert}\mathbf{A}^{c}, \mathbf{{\upsilon}}^{c}, \mathbf{{\psi}}^{c}) = {\int}N(\mathbf{x}{\vert}\mathbf{A}^{c}\mathbf{s}^{c} + \mathbf{{\upsilon}}^{c}, \mathbf{{\psi}}^{c})p(\mathbf{s}^{c})d\mathbf{s}^{c}\), where \(N\) denotes a Gaussian distribution and the activation coefficients \(\mathbf{s}^{c}\) are assumed to be independent. The density of each source, \(s_{m}^{c}\), is modeled by *k* mixtures of Gaussians, \(p(s_{m}^{c}) = {{\sum}_{k}}{\pi}_{mk}^{c}N(s_{m}^{c}{\vert}{\phi}_{mk}^{c},\ {\beta}_{mk}^{c})\), where each \({\pi}_{mk}^{c}\) is a mixture weight and \(N\) denotes a Gaussian whose mean is \({\phi}_{mk}^{c}\) and variance is \({\beta}_{mk}^{c}\). The prior for the mixing matrix \(\mathbf{A}^{c}\) is also Gaussian, with zero mean and covariance \(\mathbf{{\alpha}}\): \(p(\mathbf{A}_{nm}^{c}{\vert}{\alpha}_{m}^{c}) = N(\mathbf{A}_{nm}^{c}{\vert}0, {\alpha}_{m}^{c})\). Maximum-likelihood estimation often overfits the data; Bayesian learning overcomes overfitting by introducing priors. The priors introduced for the parameters **π**, **φ**, **β**, **α**, **υ**, **ψ**, and *P*(*c*) are D, N, Γ, Γ, N, Γ, and D, respectively, where N, Γ, and D denote the Gaussian, Gamma, and Dirichlet distributions.

Collectively, the parameters are \(\mathbf{{\theta}} = {\{}\mathbf{{\pi}}, \mathbf{{\phi}}, \mathbf{{\beta}}, \mathbf{{\alpha}}, \mathbf{{\upsilon}}, \mathbf{{\psi}}, P(c){\}}\). Given all priors for these parameters, denoted by \(p(\mathbf{{\theta}})\), the marginal likelihood is given by \(p(\mathbf{x}) = {\int}p(\mathbf{x}{\vert}\mathbf{{\theta}})p(\mathbf{{\theta}})d\mathbf{{\theta}}\). This Bayesian approach automatically performs model selection, but exact Bayesian learning is rarely computationally tractable, because it is hard to obtain the posterior distribution of the parameters, \(p(\mathbf{{\theta}}{\vert}\mathbf{x})\). We need a simpler function, \(q(\mathbf{{\theta}})\), to approximate the true posterior of the parameters, and the log marginal likelihood is lower bounded by \(\mathrm{ln}\ p(\mathbf{x}) {\geq} {\int}q(\mathbf{{\theta}})\ \mathrm{ln}\ \frac{p(\mathbf{x}, \mathbf{{\theta}})}{q(\mathbf{{\theta}})}d\mathbf{{\theta}}\).

For each number of clusters, *c* (*c* = 1, 2, … , 5), and each number of axes, *m* (*m* = up to 20), vB-ICA-mm learned the model from 100 different random initializations; see Reference 6 for further details of the functional form of the priors, the closed functional form of \(q(\mathbf{{\theta}})\), and the learning rules. For each *c* and *m*, we therefore had 100 models. All the steps were repeated while simultaneously varying the number of clusters and the number of axes. We chose the final model by comparing all the models based on their marginal likelihood values (the larger the value, the better) and their classification accuracy.
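As a concrete instance of the source prior \(p(s_{m}^{c})\), the sketch below evaluates a two-component mixture-of-Gaussians density; the weights, means, and variances are illustrative values only, chosen to give a sparse, heavy-tailed source like those ICA favors.

```python
import math

def gauss(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def source_density(s, weights, means, variances):
    """p(s) = sum_k pi_k * N(s | phi_k, beta_k): the mixture-of-Gaussians
    prior placed on each activation coefficient s_m^c."""
    return sum(w * gauss(s, m, v) for w, m, v in zip(weights, means, variances))

# A narrow component plus a broad component, both centered at zero.
weights, means, variances = [0.7, 0.3], [0.0, 0.0], [0.5, 8.0]
print(round(source_density(0.0, weights, means, variances), 4))  # 0.4372
```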

**Figure 1.**

**Figure 2.**

**Figure 3.**

**Figure 4.**

| | Axis 1 | Axis 2 | Axis 3 | Axis 4 | Axis 5 | Axis 6 | Total |
|---|---|---|---|---|---|---|---|
| **Negative side of glaucoma mean** | | | | | | | |
| n | 18 | 7 | 8 | 15 | 4 | 0 | 52 |
| Average | −0.93 | −0.75 | −0.72 | −0.79 | −0.56 | | |
| Min; max | −1.5; −0.5 | −1.0; −0.5 | −1.0; −0.4 | −1.3; −0.4 | −0.6; −0.4 | | |
| **Positive side of glaucoma mean** | | | | | | | |
| n | 7 | 9 | 10 | 16 | 5 | 11 | 58 |
| Average | +1.76 | +2.63 | +2.50 | +1.78 | +3.86 | +2.49 | |
| Min; max | +0.8; +6.4 | +1.0; +4.5 | +0.6; +4.9 | +1.2; +2.9 | +1.4; +6.1 | +1.4; +4.1 | |
| Total | 25 | 16 | 18 | 31 | 9 | 11 | 110 |

**Figure 5.**

*Invest Ophthalmol Vis Sci.* 2003;44:1813. *Invest Ophthalmol Vis Sci.* 2002;43:2660–2665. [PubMed]