The application of the variational Bayesian ICA mixture model in this project is adapted from the formal presentation in Chan.6 vB-ICA-mm automatically determines the number of clusters and the dimensions (axes) of each cluster. The input data (52 field locations plus age for each eye) are denoted by \(\mathbf{x}\), in 53-dimensional space, and vB-ICA-mm models the data density by clusters, \(p(\mathbf{x}) = {\sum}_{c} P(c)\,p(\mathbf{x}{\vert}c)\), where \(P(c)\) is the probability mass of cluster \(c\) and \(p(\mathbf{x}{\vert}c)\) is the probability density of \(\mathbf{x}\) within cluster \(c\).
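The cluster decomposition above can be illustrated with a minimal sketch (not the authors' implementation): the helper below evaluates \(p(\mathbf{x})\) as the weighted sum of per-cluster densities, and its function and argument names are assumptions introduced here for illustration.

def mixture_density(x, cluster_probs, cluster_densities):
    # p(x) = sum_c P(c) * p(x | c)
    # cluster_probs:     sequence of P(c), non-negative and summing to 1
    # cluster_densities: sequence of callables, each returning p(x | c)
    return sum(P_c * p_c(x) for P_c, p_c in zip(cluster_probs, cluster_densities))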
Within each cluster \(c\), the data are modeled by a linear combination of independent sources, \(\mathbf{x}^{c} = \mathbf{A}^{c}\mathbf{s}^{c} + \mathbf{\upsilon}^{c} + \mathbf{\varsigma}^{c}\), where \(\mathbf{A}^{c} = (\mathbf{A}_{1}^{c}, \mathbf{A}_{2}^{c}, {\ldots}, \mathbf{A}_{n}^{c})\) is the mixing matrix of the independent axes \((\mathbf{A}_{1}^{c}, {\ldots}, \mathbf{A}_{n}^{c})\), \(\mathbf{s}^{c} = (s_{1}^{c}, {\ldots}, s_{n}^{c})^{T}\) are the activation coefficients along the axes, \(\mathbf{\upsilon}^{c}\) is the centroid, \(\mathbf{\varsigma}^{c}\) is the noise, and \(n\) is the dimensionality (number of axes) of cluster \(c\). The noise is modeled by a Gaussian distribution with zero mean and covariance \(\mathbf{\psi}^{c}\), so the distribution of \(\mathbf{x}\) within any cluster can be written as \(p(\mathbf{x}{\vert}c) = p(\mathbf{x}{\vert}\mathbf{A}^{c}, \mathbf{\upsilon}^{c}, \mathbf{\psi}^{c}) = {\int} N(\mathbf{x}{\vert}\mathbf{A}^{c}\mathbf{s}^{c} + \mathbf{\upsilon}^{c}, \mathbf{\psi}^{c})\,p(\mathbf{s}^{c})\,d\mathbf{s}^{c}\), where \(N\) denotes the Gaussian distribution.
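One way to read this integral is as an average of the Gaussian likelihood over draws from the source density. The Monte Carlo sketch below makes that reading concrete under assumed array shapes; it is only an illustration of the marginalization, not the variational approximation actually used by vB-ICA-mm, and the names (cluster_density_mc, sample_sources) are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal

def cluster_density_mc(x, A, upsilon, psi, sample_sources, n_samples=10_000):
    # Monte Carlo estimate of p(x | c) = integral N(x | A s + upsilon, psi) p(s) ds.
    # A: (d, n) mixing matrix, upsilon: (d,) centroid, psi: (d, d) noise covariance.
    # sample_sources(n_samples) must return an (n_samples, n) array of draws s ~ p(s).
    draws = sample_sources(n_samples)            # s ~ p(s)
    means = draws @ A.T + upsilon                # A s + upsilon for each draw
    vals = [multivariate_normal.pdf(x, mean=mu, cov=psi) for mu in means]
    return float(np.mean(vals))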
The activation coefficients \(\mathbf{s}^{c}\) are assumed to be independent, and the density of each source, \(s_{m}^{c}\), is modeled by a mixture of \(k\) Gaussians, \(p(s_{m}^{c}) = {\sum}_{k} {\pi}_{mk}^{c} N(s_{m}^{c}{\vert}{\phi}_{mk}^{c},\ {\beta}_{mk}^{c})\), where each \({\pi}_{mk}^{c}\) is a mixture weight and \(N\) denotes a Gaussian distribution with mean \({\phi}_{mk}^{c}\) and variance \({\beta}_{mk}^{c}\).
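For concreteness, a one-dimensional source density of this mixture-of-Gaussians form can be evaluated as in the sketch below; the argument names mirror \({\pi}_{mk}^{c}\), \({\phi}_{mk}^{c}\), and \({\beta}_{mk}^{c}\), and \({\beta}\) is treated as a variance, as stated in the text (the square root converts it to the standard deviation that norm.pdf expects).

from scipy.stats import norm

def source_density(s, weights, means, variances):
    # p(s_m) = sum_k pi_k * N(s_m | phi_k, beta_k), with beta_k a variance
    return sum(w * norm.pdf(s, loc=mu, scale=var ** 0.5)
               for w, mu, var in zip(weights, means, variances))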
The prior for the mixing matrix \(\mathbf{A}^{c}\) is also Gaussian, with zero mean and covariance \(\mathbf{\alpha}\): \(p(A_{nm}^{c}{\vert}{\alpha}_{m}^{c}) = N(A_{nm}^{c}{\vert}0, {\alpha}_{m}^{c})\). Maximum likelihood estimation often overfits the data; Bayesian learning overcomes overfitting by introducing priors. The priors introduced for the parameters \({\pi}\), \({\phi}\), \({\beta}\), \({\alpha}\), \(\mathbf{\upsilon}\), \(\mathbf{\psi}\), and \(P(c)\) are D, N, \({\Gamma}\), \({\Gamma}\), N, \({\Gamma}\), and D, respectively, where N, \({\Gamma}\), and D denote the Gaussian, Gamma, and Dirichlet distributions.
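To make those prior families concrete, the sketch below draws one set of parameters for a single cluster from distributions of the stated forms. The sizes and hyperparameter values are placeholders chosen for illustration (not the values used in the study), and treating \(\mathbf{\psi}\) as a diagonal, per-dimension noise variance is an additional simplifying assumption.

import numpy as np

rng = np.random.default_rng(0)
n_clusters, n_axes, k, dim = 3, 4, 3, 53                 # illustrative sizes only

P_c   = rng.dirichlet(np.ones(n_clusters))               # P(c)  ~ Dirichlet
pi    = rng.dirichlet(np.ones(k), size=n_axes)           # pi    ~ Dirichlet (per axis)
phi   = rng.normal(0.0, 1.0, size=(n_axes, k))           # phi   ~ Gaussian
beta  = rng.gamma(1.0, 1.0, size=(n_axes, k))            # beta  ~ Gamma
alpha = rng.gamma(1.0, 1.0, size=n_axes)                 # alpha ~ Gamma
ups   = rng.normal(0.0, 1.0, size=dim)                   # upsilon ~ Gaussian
psi   = rng.gamma(1.0, 1.0, size=dim)                    # psi   ~ Gamma (diagonal noise, assumed)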