Rasch analysis34 was conducted using the Andrich rating scale model35 with Winsteps software (version 3.68, available in the public domain at http://www.winsteps.com/index.htm).36 Given that the 32-item IVI uses two different rating scales, we used two Andrich rating scale models (one for items 1–19 and another for items 20–32). This approach has been described previously.
37,38 Rasch analysis is an iterative procedure that estimates interval-level measures from ordinal data; the unit of measurement is the logit (log-odds unit).38–40 For our study, a negative item logit indicates a more difficult item, and a negative person logit indicates that the participant possesses a higher level of the assessed latent construct (VRQoL); that is, better VRQoL. The Rasch procedures have been described in detail previously in this journal.
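Winsteps performs this estimation internally. Purely as an illustration of the form of the Andrich rating scale model (not the software's implementation; the function name is ours), the probability of each response category for a given person and item can be sketched as follows:

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Category probabilities for one item under the Andrich rating scale model.

    theta : person measure (logits)
    delta : item difficulty (logits)
    taus  : thresholds tau_1..tau_m, shared by all items using the scale
    Returns P(x) for response categories x = 0..m.
    """
    # Cumulative sums of (theta - delta - tau_k); category 0 contributes 0.
    sums = [0.0]
    for tau in taus:
        sums.append(sums[-1] + (theta - delta - tau))
    exps = [math.exp(s) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]

# When theta - delta equals the first threshold, categories 0 and 1 are
# equally likely -- the "midpoint" property of thresholds described below.
p = rsm_category_probs(-1.0, 0.0, [-1.0, 1.0])
```

Fitting two separate models, as described above, amounts to estimating one shared set of thresholds for items 1–19 and another for items 20–32.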
41,42 We used six fundamental indicators to assess the validity of the IVI43: (1) Behavior of the rating scale was examined through category thresholds (a threshold is the midpoint between adjacent response categories and indicates the point where the likelihood of choosing either category is equal). (2) Item fit (the extent to which responses to a particular item are consistent with the way participants have responded to the other items) to the Rasch model was assessed using the weighted mean square (MnSq), or infit, statistic. The infit MnSq is less sensitive to distortion from outliers and so is considered the more informative fit statistic,44 and is the ratio of the observed variance of the residuals to the variance explained by the Rasch model. It has an expected value of 1.0 (range, 0 to infinity); deviations in excess of the expected value may be interpreted as “noise,” or lack of fit between the items and the model. We reported fit statistics as mean square standardized residuals (MnSq) and used a criterion of 0.7 to 1.3 for infit MnSq to diagnose misfitting items.
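As a minimal sketch of the ratio just described (function names are ours; Winsteps computes this for every item), the infit MnSq for one item and the 0.7–1.3 criterion can be expressed as:

```python
def infit_mnsq(observed, expected, variances):
    """Information-weighted (infit) mean-square fit statistic for one item.

    observed  : responses x_ni across persons n
    expected  : model-expected scores E_ni
    variances : model variances W_ni of each response
    Infit = sum((x - E)^2) / sum(W); expected value 1.0.
    """
    residual_sq = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return residual_sq / sum(variances)

def is_misfitting(mnsq, lo=0.7, hi=1.3):
    """Apply the 0.7-1.3 criterion used in the text."""
    return mnsq < lo or mnsq > hi
```

Because each squared residual is weighted by the model variance, responses from persons far from the item (the outliers) contribute little, which is why the infit statistic is the less distortion-prone of the fit statistics.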
45 Any misfitting item (infit MnSq > 1.30 indicates 30% more variance than expected and thus suggests that the item measures a construct different from the overall scale) was removed and the Rasch analysis rerun; this iterative process was continued until no further misfit was observed. (3) Measurement precision was represented by the person separation index (PSI; minimum acceptable value, 2.0) and its associated reliability, the person separation reliability (PSR; minimum acceptable value, 0.8). (4) Targeting, the extent to which the items match participants' VRQoL, was inspected using the person-item map; a difference of >1.0 logits between person and item means indicates notable mistargeting.
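The two precision cut-offs quoted above are two views of the same quantity: under the standard conversion R = G²/(1 + G²), a separation index of 2.0 corresponds exactly to a reliability of 0.8. A minimal sketch (function names are ours):

```python
import math

def psr_from_psi(g):
    """Person separation reliability from the separation index: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def psi_from_psr(r):
    """Inverse relation: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1.0 - r))
```

So reporting PSI ≥ 2.0 and PSR ≥ 0.8 imposes a single precision requirement, stated on two scales.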
37 (5) Unidimensionality, the extent to which all the items measure a single underlying construct (VRQoL in the case of the IVI), was assessed by principal components analysis (PCA) of the residuals. The rationale is that after the “Rasch factor” has been extracted (in an attempt to account for all variation in the data), only standardized residuals equivalent to random noise should remain. A high level of variance accounted for by the principal component suggests a lower chance of finding additional components; a variance of ≥60% is considered good. Also, if the variance explained by the principal component for the empirical data (variance components for the observed data) and for the Rasch model (variance that would be explained if the data complied with the Rasch definition of unidimensionality) are comparable, the chance of finding additional constructs is low.44 The first contrast in the residuals reports whether there are any patterns within the variance unexplained by the principal component that suggest a second construct is being measured. We used the criterion that a contrast must have the strength of at least three items (eigenvalue > 3.0; the eigenvalue indicates the proportion of total variance explained by an individual factor) to be considered evidence of a second construct; a smaller eigenvalue indicates that any potential second dimension has only marginal explanatory power and allows further components to be ignored.
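The eigenvalue criterion above can be sketched outside Winsteps (function name is ours; the residual matrix would come from the fitted Rasch model) by taking the largest eigenvalue of the inter-item correlations of the standardized residuals:

```python
import numpy as np

def first_contrast_eigenvalue(std_residuals):
    """Largest eigenvalue of the inter-item correlation matrix of
    standardized Rasch residuals (persons x items). By the criterion
    in the text, a value > 3.0 (the strength of at least three items)
    is read as evidence of a possible second construct.
    """
    corr = np.corrcoef(np.asarray(std_residuals), rowvar=False)
    return float(np.linalg.eigvalsh(corr)[-1])  # eigvalsh sorts ascending
```

For residuals that are pure random noise, this eigenvalue stays near 1, well below the 3.0 cut-off; correlated residual patterns across several items are what push it above it.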
42 The loading of items onto the contrasts allows identification of which items tap different constructs; we used a minimum loading of 0.4 to identify contrasting items. (6) Lastly, we assessed differential item functioning (DIF; a form of bias in which one subgroup, e.g., women, with given levels of VRQoL responds differently to an item compared with another subgroup, e.g., men, with similar levels of VRQoL) for the age (<23/>23 years), sex, and keratoconus subgroups defined in the study. We considered DIF insignificant if <0.50 logits, mild (but probably inconsequential) if 0.50 to 1.00 logits, and notable if >1.00 logits.
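The DIF cut-offs above classify the absolute between-subgroup difference in an item's difficulty estimate. A minimal sketch of that classification (function name is ours):

```python
def classify_dif(dif_logits):
    """Classify DIF from the between-subgroup difference in item
    difficulty (logits), using the cut-offs stated in the text:
    <0.50 insignificant, 0.50-1.00 mild, >1.00 notable."""
    size = abs(dif_logits)
    if size < 0.50:
        return "insignificant"
    if size <= 1.00:
        return "mild"
    return "notable"
```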
46,47 The interval-level IVI scores generated by the Winsteps software once the data fit the Rasch model were used for all analyses.