Demographic and clinical data of the participants were analyzed (SPSS ver. 17 for Windows; SPSS Inc., Chicago, IL). Rasch analysis
19,20 was performed (Winsteps, ver. 3.67, Chicago, IL)
21 to determine the validity and reliability of the NEI VFQ-25. As there are four different types of rating scales, we used four Andrich rating-scale models
19 (one for each item group) to obtain estimates of the required visual ability of each item, the perceived visual ability of each person, and the category thresholds for each response category. For each of the item types with different rating scales, we first looked for evidence of disordered thresholds, which indicate that participants could not reliably discriminate between the response categories. Combining adjacent categories is often the solution to disordered thresholds.
22
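As an illustration of the rating-scale model and the threshold check described above, the following Python sketch computes category probabilities for one person-item pair and flags disordered thresholds. The function names and the example threshold values are hypothetical and chosen only for illustration; they are not Winsteps output or study data.

```python
import math


def rsm_category_probs(theta, delta, thresholds):
    """Category probabilities under the Andrich rating-scale model.

    theta: person measure (logits); delta: item measure (logits);
    thresholds: Andrich thresholds tau_1..tau_m (logits).
    Returns the probabilities of categories 0..m.
    """
    # Running sum of (theta - delta - tau_k); category 0 has an empty sum.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_disordered(thresholds):
    """True if any threshold fails to exceed the one before it."""
    return any(b <= a for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical four-category item with ordered thresholds.
example_thresholds = [-1.5, 0.0, 1.5]
probs = rsm_category_probs(theta=0.0, delta=0.0, thresholds=example_thresholds)
print(probs, thresholds_disordered(example_thresholds))
```

When a check like this reports disordered thresholds, collapsing the two adjacent categories and re-estimating is the usual next step.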
Once response category performance was satisfactory, person and item measures were examined for fit to the Rasch model, using an unconditional maximum-likelihood estimation routine. Rasch analysis locates item difficulty and person ability on a logit scale (log of odds). How well the data fit the model was evaluated by the item fit statistics infit and outfit. The information-weighted (infit) statistic is more sensitive to the pattern of responses to person-targeted items and less sensitive to the presence of outliers; therefore, it is the main fit statistic reported herein. The outlier-sensitive (outfit) statistic is sensitive to unexpected behavior by persons or items far from the subject's ability level. In the mean square (MNSQ) form, fit statistics show variance in the data with an expected value of 1.0. MNSQ values less than 1.0 indicate that the items are too predictable, thereby suggesting redundancy. Values of more than 1.0 suggest unpredictability due to noise in the data. Values between 0.7 and 1.3 are considered acceptable, and values outside this range are considered to be misfitting.
23,24 These values represent 30% less or more variance than expected for the item.
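The two mean-square statistics can be sketched in Python as follows. This is a minimal illustration that assumes the model-expected score and model variance of each response have already been obtained from a fitted Rasch model; the function names and the toy numbers are hypothetical and are not study data.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: responses of each person to the item;
    expected: model-expected responses;
    variance: model variance of each response.
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def fit_is_acceptable(mnsq, low=0.7, high=1.3):
    """Apply the 0.7 to 1.3 acceptance range quoted in the text."""
    return low <= mnsq <= high


# Toy dichotomous example: the last response is an unexpected failure by a
# person for whom the item should have been easy (expected score 0.95).
observed = [1, 0, 1, 0]
expected = [0.5, 0.5, 0.5, 0.95]
variance = [e * (1 - e) for e in expected]
infit, outfit = fit_mean_squares(observed, expected, variance)
print(infit, outfit, fit_is_acceptable(infit))
```

In this toy case the single off-target surprise inflates outfit more than infit, which is the behavior that motivates reporting infit as the main statistic.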
The person separation index is the ratio of the variance in the person measures for the sample to the average error in estimating these measures. It is a measure of how broadly the persons could be distinguished into statistically distinct levels. The person separation reliability coefficient describes the reliability of the scale to discriminate between the persons of different abilities. A person separation index of ≥2.0 or a reliability value of ≥0.8 represents the minimum acceptable level of separation.
20,23 A value of 0.8 is equivalent to a person separation ratio (G) of 2, which means that there are three strata [strata = (4G + 1)/3], or significantly different levels, of person ability that can be distinguished by the items.
20,25 Targeting is a method of assessing how well the difficulty of the items in the scale suits the ability of the sample. Suitability can be assessed by inspecting the person-item maps or numerically using the mean scores for person and item measures. Effective targeting is evident when the person and item means are close to each other.
23,26
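The arithmetic linking separation, reliability, and strata can be checked with a short Python sketch. The strata formula is the one quoted above; the relation R = G²/(1 + G²) is the standard conversion between separation and reliability and is consistent with the equivalence of 0.8 and G = 2 stated in the text. The function names and the sample measures are hypothetical.

```python
import math


def reliability_from_separation(g):
    """Separation reliability R = G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)


def separation_from_reliability(r):
    """Inverse relation: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1 - r))


def strata(g):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4 * g + 1) / 3


def targeting_offset(person_measures, item_measures):
    """Mean person measure minus mean item measure, in logits."""
    mean_person = sum(person_measures) / len(person_measures)
    mean_item = sum(item_measures) / len(item_measures)
    return mean_person - mean_item


print(reliability_from_separation(2.0), strata(2.0))
# Hypothetical measures: an offset near 0 indicates good targeting.
print(targeting_offset([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))
```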
To test the hypothesis that the NEI VFQ-25 measures a single underlying construct (unidimensionality) we conducted a principal components analysis (PCA) of the residuals (difference between the observed and expected responses).
27,28 Data are considered unidimensional if most of the variance is explained by the principal component and there is no significant explanation of the residual variance by the contrasts to the principal component. The variance explained by the principal component for the empiric calculation should be comparable to that of the model and should be >60%.
28 Furthermore, the unexplained variance by the contrasts should be <2.0 Eigenvalue units, which is close to that seen with random data.
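The eigenvalue criterion can be illustrated with a small, dependency-free Python sketch. It assumes a matrix of standardized residuals is already available (one list per item, one value per person), correlates the items, and estimates the largest eigenvalue by power iteration. This is a simplified stand-in for the residual PCA reported by Rasch software, not a reproduction of it; all names and the simulated residuals are hypothetical.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlations between columns of residuals.

    Each column must vary; a constant column would divide by zero.
    """
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    k = len(columns)
    return [[sum(a * b for a, b in zip(scaled[i], scaled[j]))
             for j in range(k)] for i in range(k)]


def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix."""
    k = len(matrix)
    vec = [1.0 / (i + 1) for i in range(k)]  # arbitrary nonzero start
    value = 0.0
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        value = math.sqrt(sum(p * p for p in prod))
        if value == 0.0:
            break
        vec = [p / value for p in prod]
    return value


# Simulated pure-noise residuals: 6 items, 200 persons.
rng = random.Random(0)
residuals = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(6)]
first_contrast = largest_eigenvalue(correlation_matrix(residuals))
print(first_contrast, first_contrast < 2.0)
```

With pure noise the first contrast stays well under 2.0 eigenvalue units, which is why that value serves as the benchmark for the absence of a second dimension.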
For further validation, we tested differential item functioning (DIF), which assesses whether the items have different meanings for different groups within the sample. DIF was tested for a range of variables: age, sex, ocular comorbidity, and level of vision loss. The raw differences in item calibration between groups were examined to identify DIF. DIF was considered absent if it was less than 0.50 logits, minimal but probably inconsequential if it ranged between 0.50 and 1.0 logits, and notable if it was >1.0 logit.
29,30
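The DIF cutoffs above amount to a simple classification rule, sketched here in Python. The function name and the example calibrations are hypothetical; the thresholds are those stated in the text.

```python
def classify_dif(calibration_group_a, calibration_group_b):
    """Classify DIF from the raw difference in item calibration (logits)."""
    contrast = abs(calibration_group_a - calibration_group_b)
    if contrast < 0.50:
        return "absent"
    if contrast <= 1.0:
        return "minimal"
    return "notable"


# Hypothetical item calibrations for two groups, in logits.
print(classify_dif(0.20, 0.50))
print(classify_dif(0.00, 0.75))
print(classify_dif(1.50, 0.00))
```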
The 12 subscales of the NEI VFQ-25 were then analyzed separately by using the same procedures and criteria as those used to analyze the overall scale. However, six subscales did not fit the Rasch model because they contained too few items. Hence, new factor models were hypothesized based on common underlying themes that classify most of the remaining items. These new models were then assessed by using confirmatory factor analysis (CFA) and Rasch analysis.
Using the Rasch calibrated person measures, hypothesized factor models were evaluated by CFA (performed with AMOS, ver 16; SPSS Science, Chicago, IL). This analysis allows the assessment of the overall model fit, testing the relationship between the observed variables and their underlying latent constructs in the model. To determine the adequacy of model fit with the data, we used the following fit indices: (1) χ², (2) the goodness-of-fit index (GFI), (3) the adjusted goodness-of-fit index (AGFI), (4) the comparative fit index (CFI), (5) the Tucker-Lewis index (TLI), and (6) the root mean square error of approximation (RMSEA). A nonsignificant χ² probability value indicates a good model fit. However, χ² is sensitive to sample size. To address this concern, a relative χ² is used (ratio of χ² to degrees of freedom, χ²/df) with a recommended range of 1.0 to 2.0.
31 For GFI, AGFI, CFI, and TLI values, <0.90 indicates lack of fit, between 0.90 and 0.95 indicates reasonable fit, and between 0.95 and 1.00 indicates good fit.
32–34 The RMSEA values must be ≤0.05 to indicate good fit. Values between 0.05 and 0.08 indicate reasonable fit.
33,34 The scale structure (with the best fit characteristics) was then examined for validity and unidimensionality with Rasch analysis (Winsteps, ver. 3.67).
21
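The CFA fit criteria listed above can be collected into a short Python sketch. The cutoffs are those given in the text. The RMSEA formula shown, sqrt(max(χ² − df, 0) / (df (N − 1))), is a commonly used definition and is an assumption here, since the text does not state how the software computes it. Treating a value of exactly 0.95 as good fit and labeling RMSEA above 0.08 as "outside stated ranges" are also assumptions, and the numeric inputs are hypothetical.

```python
import math


def relative_chi_square(chi2, df):
    """Ratio of chi-square to degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA from chi-square, degrees of freedom, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def grade_index(value):
    """Grade GFI, AGFI, CFI, or TLI using the cutoffs in the text."""
    if value < 0.90:
        return "lack of fit"
    if value < 0.95:
        return "reasonable fit"
    return "good fit"


def grade_rmsea(value):
    """Grade RMSEA using the cutoffs in the text."""
    if value <= 0.05:
        return "good fit"
    if value <= 0.08:
        return "reasonable fit"
    return "outside stated ranges"


# Hypothetical model: chi-square 36 on 24 degrees of freedom, 101 persons.
ratio = relative_chi_square(36.0, 24)
error = rmsea(36.0, 24, 101)
print(ratio, 1.0 <= ratio <= 2.0)
print(error, grade_rmsea(error))
print(grade_index(0.93))
```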