The data were analyzed with Winsteps software (ver. 3.68)20 using the Andrich rating scale model for polytomous data.21
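For reference, a standard formulation of the Andrich rating scale model (conventional notation, not reproduced from the cited reports) gives the probability that person $n$ responds in category $x$ of item $i$ as

$$
P(X_{ni} = x) \;=\; \frac{\exp \sum_{k=0}^{x} (\beta_n - \delta_i - \tau_k)}{\sum_{j=0}^{m} \exp \sum_{k=0}^{j} (\beta_n - \delta_i - \tau_k)}, \qquad x = 0, 1, \ldots, m,
$$

where $\beta_n$ is the person's ability, $\delta_i$ is the item's difficulty, $\tau_k$ is the $k$th threshold shared by all items (with $\tau_0 \equiv 0$), and $m + 1$ is the number of response categories ($m = 4$ for the five-category ADVS items).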
In the first step, we assessed the response categories and the thresholds.8,21 The threshold represents the intersection between any two adjacent categories (i.e., between 1 and 2, 2 and 3, and so on), where the probability of either category being chosen is equal. In the ADVS, there are four thresholds for the five categories of each item. We used category probability curves (CPCs) to examine the ordering of the thresholds graphically. Thresholds should demonstrate an order from the most to the least difficult category, but disordering can occur. Disordered thresholds suggest that the response categories do not discriminate efficiently between two ability levels; that is, participants with more ability may respond in the same category as participants with lower ability. Disordering occurs because participants have difficulty discriminating between the response categories. We reorganized categories that showed disordered thresholds by combining certain categories. Once the response categories were found to perform as intended, we proceeded with further Rasch analyses.
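As an illustration of this repair step, the sketch below shows one way to collapse response categories outside the Rasch software; the data, column names, and the specific 5-to-4 recoding are hypothetical, and in practice the collapsing was specified within Winsteps.

```python
import pandas as pd

# Hypothetical ADVS-style responses scored 1-5 for two illustrative items.
responses = pd.DataFrame({
    "night_driving": [1, 3, 5, 2, 4],
    "reading_small_print": [2, 2, 5, 3, 4],
})

# Example recode collapsing the two middle categories (2 and 3) into one,
# leaving four categories; the actual collapsing depends on which thresholds
# are disordered in the CPCs.
collapse_map = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}
recoded = responses.replace(collapse_map)
print(recoded)
```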
Measurement precision was assessed in terms of person separation, which gives an estimate of the spread or separation of persons by strata or groups along the measurement construct.8,22 The minimum acceptable separation is 2.0, and this enables the distinction of three strata (for example, mild, moderate, and severe visual disability).
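As a brief worked note (using the conventional separation-to-strata relationship rather than a formula reported in the cited articles), the person separation index $G$ is the ratio of the adjusted (true) standard deviation of the person measures to their root mean square measurement error, and the number of statistically distinct strata is

$$
G = \frac{\mathrm{SD}_{\text{true}}}{\mathrm{RMSE}}, \qquad H = \frac{4G + 1}{3},
$$

so the minimum acceptable separation of $G = 2.0$ corresponds to $H = (4 \times 2.0 + 1)/3 = 3$ strata.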
Rasch fit statistics in combination with PCA of residuals were used to test the dimensionality of the ADVS and each subscale.23 As the Rasch model is probabilistic, some deviation in scores is expected. This deviation of observed from expected scores is captured by fit statistics (i.e., the infit mean square, MnSq).22 The ideal value of the infit MnSq is 1.0, indicating no deviation. In accordance with the literature, an infit MnSq between 0.7 and 1.3 was taken as an indicator of acceptable fit, and items outside this range were considered misfits.24 In essence, this range permits observations to contain up to 30% less or 30% more variation than predicted by the model. Misfitting items were removed iteratively (i.e., one at a time), starting with the most misfitting, until all remaining items fit the model.25
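For completeness, the infit MnSq is conventionally defined as an information-weighted mean of squared standardized residuals (standard definition, not specific to this study):

$$
\text{infit MnSq}_i = \frac{\sum_{n} W_{ni}\, z_{ni}^{2}}{\sum_{n} W_{ni}},
$$

where $z_{ni}$ is the standardized residual of person $n$'s response to item $i$ and $W_{ni}$ is the model variance of that response; the bounds 0.7 and 1.3 thus correspond to roughly 30% less and 30% more variation than the model predicts.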
Furthermore, when items fit the model's expectations, the residuals26 (observed minus expected scores) should be randomly distributed, with all meaningful variance in the data accounted for by the Rasch dimension of item difficulty–person ability. In practice, however, some interitem correlations typically remain; PCA describes the additional factors that may be extracted from the data.9–11 If 60% or more of the variance is accounted for by the principal component, there is a low likelihood of additional components being present.27 The first contrast in the residuals reveals whether any patterns within the variance unexplained by the principal component suggest that a second trait is being measured. We used an eigenvalue of >2.0 for the first contrast as the criterion for sufficient evidence of a second construct; this indicates that the contrast has the strength of at least two items, which is greater than the magnitude seen with random data.27
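The sketch below illustrates, under simplifying assumptions, how a PCA of standardized Rasch residuals can be inspected for a first contrast exceeding the 2.0-eigenvalue criterion; Winsteps reports this directly, so the residual matrix here is only a hypothetical placeholder.

```python
import numpy as np

# Hypothetical matrix of standardized Rasch residuals (observed minus expected,
# divided by the model standard deviation): rows = persons, columns = items.
# In practice this matrix is exported from the Rasch software.
rng = np.random.default_rng(0)
residuals = rng.standard_normal((300, 20))  # illustrative dimensions only

# Eigenvalues of the inter-item correlations of the residuals are expressed in
# "item" units (they sum to the number of items), so a first contrast > 2.0
# suggests that at least two items share a secondary dimension.
corr = np.corrcoef(residuals, rowvar=False)
first_contrast = np.linalg.eigvalsh(corr)[-1]  # largest eigenvalue

print(f"First contrast eigenvalue: {first_contrast:.2f}")
print("Possible second construct" if first_contrast > 2.0
      else "No evidence of a second construct")
```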
In the present study, we performed PCA before the assessment for misfitting items. Thus, our iterative method of removing items that did not fit the model differs from that used in the earlier Rasch analysis of the ADVS. We used this approach because items can misfit for several reasons, including poorly constructed wording. Fit statistics identify only items that misfit, not misfitting items that group together to form additional constructs, and so fit statistics alone are less informative about multidimensionality than PCA. Performing PCA first helps to identify additional construct(s) more clearly, if they are present in the overall scale.
An ideal scale should function in the same way regardless of which group is assessed. DIF occurs when, at the same level of the latent trait, the difficulty of an item varies systematically with sample characteristics such as age and sex. The variables for DIF analysis, selected a priori, were age (<76 vs. ≥76 years; median age, 76 years), sex, cataract status (first-eye vs. second-eye surgery), systemic comorbidity (present vs. absent), and ocular comorbidity (present vs. absent). Testing for DIF can be based on either significance or magnitude. Because significance testing is highly dependent on sample size, we preferred testing for DIF magnitude.28 Therefore, in the present study, we defined DIF by magnitude: insignificant DIF as <0.50 logit, mild (but probably inconsequential) DIF as 0.50 to 1.00 logit, and notable DIF as >1.00 logit.29
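As a small illustration of these magnitude criteria, the snippet below classifies the difference in item difficulty (in logits) between two groups; the function name and example values are hypothetical.

```python
def classify_dif(difficulty_group_a: float, difficulty_group_b: float) -> str:
    """Classify DIF by the absolute between-group difference in item difficulty (logits)."""
    dif = abs(difficulty_group_a - difficulty_group_b)
    if dif < 0.50:
        return "insignificant DIF"
    if dif <= 1.00:
        return "mild (probably inconsequential) DIF"
    return "notable DIF"

# Hypothetical example: an item calibrated at -0.20 logits for participants
# aged <76 years and at 0.55 logits for those aged >=76 years.
print(classify_dif(-0.20, 0.55))  # mild (probably inconsequential) DIF
```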
For a well-targeted instrument (i.e., item difficulty matched with participant ability), there should be no ceiling or floor effects in the person-item map.30,31 Conversely, mistargeting implies lower person separation, leading to an inability to differentiate between participants along the latent trait.30 The person-item map illustrates targeting and further helps to identify gaps and redundancies in the item distribution. Appropriate items can then be added to fill the gaps, and redundant items can perhaps be deleted.
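A minimal sketch of how such a map might be drawn from exported person and item measures (hypothetical values and variable names; Winsteps produces this map directly):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical person abilities and item difficulties on the common logit scale.
person_measures = np.random.default_rng(1).normal(loc=1.0, scale=1.5, size=300)
item_difficulties = np.array([-2.1, -1.4, -0.8, -0.3, 0.2, 0.9, 1.5])

# Persons and items share the vertical (logit) axis; gaps in the item column
# relative to the spread of persons indicate where additional items could be
# targeted, and clusters of items at one level suggest redundancy.
fig, (ax_persons, ax_items) = plt.subplots(1, 2, sharey=True, figsize=(6, 5))
ax_persons.hist(person_measures, bins=20, orientation="horizontal")
ax_persons.set_ylabel("Logits")
ax_persons.set_title("Persons")
ax_items.plot(np.zeros_like(item_difficulties), item_difficulties, "o")
ax_items.set_title("Items")
plt.tight_layout()
plt.show()
```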
Adequate person separation constituted the minimum acceptable measurement property for the Rasch model of a subscale, or of the entire ADVS, to be termed a measure. If a subscale could not be repaired to meet this criterion, full analysis of its dimensionality using PCA was not performed.
Rasch analysis was conducted in two phases: assessment of performance of the subscales in phase I, and investigation of dimensionality of the entire ADVS to determine whether more appropriate subscales could be developed in phase II. Descriptive statistics were analyzed with commercial software (SPSS software ver. 15.0; SPSS, Chicago, IL).