Abstract
Purpose. To examine the psychometric properties of the Ocular Comfort Index (OCI), a new instrument that measures ocular surface irritation and was designed with Rasch analysis to produce estimates on a linear interval scale.
Methods. The OCI was self-completed by 452 subjects. Some of them repeated the questionnaire so that its reliability and test–retest repeatability could be determined. Ten versions were produced to evaluate question order effects. In addition, three construct hypotheses were tested to verify that the OCI was measuring what was intended: concordance with the Ocular Surface Disease Index (OSDI), the relationship with tear break-up time (TBUT), and the change in TBUT after the use of ocular lubricants in individuals with moderate dry eye.
Results. A 12-item OCI was developed with well-functioning items and categories: 95% confidence interval for the intraclass correlation coefficient = 0.81 to 0.91; person separation = 2.66; item separation = 11.12; and 95% repeatability coefficient = 13.1 units (0–100 scale). The ordering of items had no effect on OCI measures (P = 0.41). The OCI measure exhibited a positive correlation with the OSDI score (P < 0.0001) and a negative correlation with TBUT (P < 0.0001) and was able to detect improvement in symptoms of dry eye in individuals before and after treatment (P < 0.0001).
Conclusions. The OCI was shown to have favorable psychometric properties that make it suitable for assessing the impact of ocular surface disease on patient well-being and changes in severity brought about by disease progression or therapeutic strategies.
Measurement of the level of discomfort caused by ocular surface disease is currently limited by the shortcomings of available questionnaires. For example, the McMonnies survey was developed to assist the diagnosis of dry eye syndrome (DES) by considering epidemiologic risk factors, the frequency of symptoms of ocular irritation, and sensitivity to environmental triggers [1, 2]. Notwithstanding its usefulness in diagnosis [3], the McMonnies score cannot be relied on as an indicator of symptom severity, defined in this work as an aggregate function of frequency and intensity, because this component of its tally is combined with unrelated others. The McMonnies survey is also unsuitable for appraising temporal changes, because the responses to its epidemiologic questions are likely to be identical at different time points, and its questions relating to environmental triggers introduce noise if they have not been experienced between replicate testing.
The Ocular Surface Disease Index (OSDI) was developed more recently to grade the severity of DES and is notable among other questionnaires for ocular surface disease for having undergone psychometric testing and having been accepted by the U.S. Food and Drug Administration (FDA) for use in clinical trials [4, 5, 6]. This instrument has a 12-item, five-category Likert design with three subscales that sequentially ask about symptoms of ocular irritation and the impact on vision-related functioning and environmental triggers of DES. It cannot be assumed that the difficulty step between each category is constant or that the difficulty of all questions is comparable, from which it follows that its scale may not be additive or linearly related to symptom severity. Instead, the score of the OSDI should be interpreted as a relative ordering of person afflictions [7, 8]. This limits what can be inferred from the OSDI, because ordinal rankings are less suited to the estimation of any effect size than are interval data [9]. Also, its gains in applicability made by investigating several symptom domains are offset by reductions in its interpretative potential [10]. Furthermore, its handling of missing data, caused by omitted or “not applicable” responses, by expressing the score as a percentage of the maximum possible value of the questions answered, is inadequate; higher percentages will be achieved if difficult items, those that are more likely to score lower, are not answered.
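The inflation produced by this kind of percentage scoring can be illustrated with a short sketch. It assumes five-category items scored 0 to 4, and the function name and the response pattern are hypothetical; it illustrates the arithmetic only and is not the OSDI scoring procedure itself.

```python
from typing import Optional, Sequence


def percent_of_answered_maximum(
    responses: Sequence[Optional[int]], max_per_item: int = 4
) -> float:
    """Score as a percentage of the maximum attainable on the answered items.

    None marks an omitted or "not applicable" response. The 0-4 item range is
    an assumption based on the five-category Likert design.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items were answered")
    return 100.0 * sum(answered) / (max_per_item * len(answered))


# Hypothetical respondent: three items scored 3 and one harder item scored 1.
complete = [3, 3, 3, 1]
# Same respondent, but the harder (low-scoring) item is left unanswered.
with_omission = [3, 3, 3, None]

print(percent_of_answered_maximum(complete))       # 62.5
print(percent_of_answered_maximum(with_omission))  # 75.0
```

The respondent's symptoms are unchanged between the two patterns, yet the percentage rises from 62.5 to 75.0 simply because the low-scoring item was dropped from both the numerator and the denominator.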
The Ocular Comfort Index (OCI) was conceived in response to deficiencies in existing instruments for use in clinical trials. The OCI was designed and tested with Rasch analysis, which calibrates the “difficulty” of items ($\rho$) along the latent variable of interest that act as marks on a ruler against which person “ability” ($\alpha$) can be compared [11]. These terms stem from the technique’s origins in aptitude testing. In this context, more difficult items are those that tend to receive lower scores, and more able persons experience a greater degree of discomfort. In Rasch models the probability of an observed response by a person to an item is related to their functional ability, which is defined as the difference between their ability and the item’s difficulty ($\alpha - \rho$). For dichotomous 0 or 1 responses the ability of a subject is equated to the difficulty of items with which they have a 50% chance of success: $P(1 \mid \alpha, \rho) = 0.5$ when $\alpha = \rho$. Polytomous items with $m$ categories can be considered to represent $m - 1$ marks on a ruler to which persons taking the item are compared, rather than just the one with binary response structures. Here, a person has a 50% chance of responding with category $x$ rather than category $x - 1$ when his or her ability matches the difficulty of the item summed with the step calibration of the category ($\tau_x$): $P(x \mid \alpha, \rho) = 0.5$ when $\alpha = \rho + \tau_x$ [12]. The theory of Rasch analysis was expounded in a recent review article [13]. The foremost merit of these methods is that derived values meet the requirements of a noninteractive conjoint structure, so item difficulty and person ability can be estimated from observed responses without ambiguity [14].
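The two 50% conditions described above can be checked numerically. The sketch below assumes the rating scale formulation of the polytomous Rasch model, in which the log-odds of category x over category x − 1 equal α − ρ − τ_x; the function names and the numeric values are illustrative and are not taken from the analysis software.

```python
import math
from typing import List, Sequence


def dichotomous_probability(alpha: float, rho: float) -> float:
    """P(1 | alpha, rho) for a 0/1 item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(alpha - rho)))


def category_probabilities(
    alpha: float, rho: float, taus: Sequence[float]
) -> List[float]:
    """Probabilities of categories 0..m-1 for an item with m-1 step calibrations.

    The log-odds of category x over category x-1 is alpha - rho - tau_x
    (rating scale formulation, assumed here).
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (alpha - rho - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only: one item difficulty and three step calibrations.
rho = 0.5
taus = [-1.0, 0.0, 1.0]

# Dichotomous case: ability equal to difficulty gives a 50% chance of success.
print(dichotomous_probability(alpha=rho, rho=rho))  # 0.5

# Polytomous case: ability equal to rho + tau_x.
x = 2
probs = category_probabilities(alpha=rho + taus[x - 1], rho=rho, taus=taus)
# Chance of responding x rather than x-1, given that one of the two is chosen.
print(probs[x] / (probs[x] + probs[x - 1]))  # 0.5
```

Note that in the polytomous case the 50% figure is the chance of category x relative to category x − 1 only; the unconditional probability of category x is smaller whenever other categories have non-negligible probability.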
The purpose of this study was to develop and test the validity of a new instrument capable of measuring the severity of discomfort caused by ocular surface disease for use in clinical trials.
Symptoms may be present most of the time but mild and vice versa, and so items asking about the frequency and intensity of symptoms could have probed different latent traits in violation of the requirement of the Rasch model for unidimensionality. It was therefore reassuring that the fit statistics of all questions were within suggested guidelines [32, 33]. Indeed, the perfect pairing of the difficulties of items that asked about the same symptom suggests that most individuals did not differentiate between frequency and intensity, which is consistent with the reports of others [34].
The range of average item difficulties of the 12-item OCI was relatively narrow (−1.14 to 0.74 logits). This result threatens to limit the range of persons for whom the instrument performs well, because the information yield of each item is inversely proportional to the disparity between its difficulty and the ability of the person taking that item. However, the instrument’s range of applicability is broadened by its polytomous responses that differ more in their difficulties (−2.72 to 3.60 logits) than the items that, with perhaps the exception of tiredness and pain, were essentially synonymous. The similarity of item difficulties may account for the variation in relative frequency of various symptoms in dry eye populations reported in the literature. Toda et al. [35] found, similar to this study, that ocular fatigue is the most common complaint of these patients in Japan, above dryness and pain, whereas Begley et al. [34] and Nichols et al. [36] independently reported that dryness was more common than tiredness in North America. Alternative explanations for these discrepancies include geographic variability in the interpretation of adjectives used to describe symptoms or differences in the wording of questions between studies.
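The way polytomous categories broaden the useful range of an item can be sketched numerically. The example below assumes the rating scale formulation of the Rasch model and uses the standard result that an item's information equals the model variance of its score; the step calibrations are those of Table 3, while the function names and the comparison with a single-step (dichotomous) item are illustrative.

```python
import math
from typing import List, Sequence


def category_probabilities(offset: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities given offset = person ability - item difficulty."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (offset - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(offset: float, taus: Sequence[float]) -> float:
    """Model variance of the item score, taken here as the item information."""
    probs = category_probabilities(offset, taus)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


DICHOTOMOUS = [0.0]  # a single step located at the item difficulty
OCI_STEPS = [-1.19, -1.05, -0.67, -0.39, 0.95, 2.35]  # tau_x from Table 3

# Columns: offset in logits, dichotomous information, seven-category information.
for offset in (0.0, 1.0, 2.0, 3.0):
    print(
        offset,
        round(item_information(offset, DICHOTOMOUS), 3),
        round(item_information(offset, OCI_STEPS), 3),
    )
```

Under these assumptions the single-step item loses most of its information once the person is a few logits from the item, whereas the seven-category item retains appreciably more at the same disparity.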
Item order did not influence response patterns. This result was anticipated, because the influence of contextual factors is generally limited to when questions ask about attitudes or are emotionally weighted [37].
The repeatability and reliability of the OCI were acceptable, particularly considering that the symptoms of ocular surface disease are known to exhibit considerable variability and so observed differences between replicate testing would have embodied both measurement error and real person variation [34].
The OCI exhibited a moderate positive correlation with the OSDI and a moderate negative correlation with TBUT, as predicted. That the strength of these correlations was not greater is not necessarily a cause for concern, because the OSDI differs from the OCI in that it probes several, albeit related, dimensions; and a low TBUT is just one of many causes of ocular surface irritation. Further authentication of the premise that the OCI evaluates ocular discomfort was that its score improved in subjects with DES after treatment.
Floor response patterns may result from the complete absence of symptoms or poor instrument sensitivity. The OCI elicited such responses less often than did the OSDI; indeed, the developers of the OSDI reported that an even greater proportion of their sample (12.2%) responded this way [4]. The discrepancy suggests that the OCI is better able to measure milder degrees of discomfort than is the OSDI. Another concern for the OSDI was the high proportion of subjects who responded “not applicable” to one or more of its environmental trigger items, reducing the precision of its estimate of ocular discomfort in these cases and, because it is based on raw data counts, altering test difficulty in unknown ways.
A drawback of Rasch analysis is that it cannot estimate person measures for extreme floor/ceiling raw scores. The complete absence of symptoms or the notion that symptoms could not be worse is at odds with its philosophy, yet the rejection of any data is undesirable in clinical trials. Several methods have been proposed to generate definite measures for such response patterns that assume that an extreme score implies a measure only slightly out of the range of the test [38]. The software used in this work assigned 0 scores a value of 0.3 score points and subtracted 0.3 score points from maximum scores to allow the estimation of person measures [39]. This was considered when the OCI was linearly rescaled so that extreme raw scores correspond to its measurement scale bounding values of 0 and 100.
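The adjustment and rescaling described above can be sketched as follows. The maximum raw score of 72 assumes 12 items scored 0 to 6, and the logit bounds passed to the rescaling function are hypothetical placeholders for the measures that correspond to the adjusted extreme scores; neither function is taken from the software used in this work.

```python
def adjust_extreme_score(raw: float, max_score: float, correction: float = 0.3) -> float:
    """Move a floor or ceiling raw score inward so that a finite measure exists."""
    if raw <= 0:
        return correction
    if raw >= max_score:
        return max_score - correction
    return float(raw)


def rescale_to_0_100(measure: float, floor_measure: float, ceiling_measure: float) -> float:
    """Map a logit measure linearly so that the two bounds land on 0 and 100."""
    return 100.0 * (measure - floor_measure) / (ceiling_measure - floor_measure)


MAX_RAW_SCORE = 12 * 6  # assumption: 12 items, each scored 0 to 6

print(adjust_extreme_score(0, MAX_RAW_SCORE))   # 0.3
print(adjust_extreme_score(72, MAX_RAW_SCORE))  # 0.3 score points below the maximum
print(adjust_extreme_score(30, MAX_RAW_SCORE))  # 30.0

# Hypothetical logit measures for the adjusted floor and ceiling scores.
FLOOR_LOGITS, CEILING_LOGITS = -6.0, 6.0
print(rescale_to_0_100(FLOOR_LOGITS, FLOOR_LOGITS, CEILING_LOGITS))    # 0.0
print(rescale_to_0_100(0.0, FLOOR_LOGITS, CEILING_LOGITS))             # 50.0
print(rescale_to_0_100(CEILING_LOGITS, FLOOR_LOGITS, CEILING_LOGITS))  # 100.0
```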
In the calibration of the OCI, approximately 5% of subjects were excluded because, based on statistical considerations, their response patterns were deemed incompatible with the Rasch model for the whole data set. These subjects were excluded to ensure that the instrument’s measures were valid in terms of measurement theory for most of the respondents. However, it is likely that if the OCI is used in clinical trials some subjects will respond in abnormal ways, as identified by their fit statistics. The OCI does generate measures of discomfort for these persons, although of relatively low precision, and so they can be included in any analysis. It would, however, be prudent to check such data for transcription errors and to investigate whether these subjects are unusual in any other regard. Also, subsequent analysis can be repeated with and without these persons. The inclusion of misfitting persons is unlikely to have a significant effect on results unless they constitute a relatively large proportion of the study sample.
Another issue for those using the OCI is what level of significance to ascribe to the units of its scale. As an interval measure, it can be surmised that an increase from 5 to 10 units denotes the same increase as from 15 to 20 units, although it cannot be assumed that this represents a doubling of symptom severity as it would if it were a ratio scale. However, these changes have no intrinsic clinical significance. Clinical significance must be ascertained by future work that compares changes in instrument scores with minimally important changes defined on external criteria [40]. Of note in this regard, in this study the use of ocular lubricants in subjects with DES was moderately well appreciated and typically reduced the OCI score by more than six units. Based on these data, it seems reasonable to suggest that changes of three or more units are likely to be noticed by patients and therefore that this step can be regarded as an estimate of a minimally important treatment difference.
The OCI produces valid measures of ocular surface irritation when scored with maximum-likelihood iterative procedures. Good results can be achieved by using Rasch software with item difficulties and category structure anchored to the values reported in this paper, or with a computer program written in commercial software (Excel; Microsoft; Redmond, WA) available freely from the corresponding author (OCI Calculator), as are copies of the questionnaire (OCI Questionnaire).
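A minimal sketch of anchored maximum-likelihood scoring is given below. It is not the OCI Calculator: it assumes the rating scale formulation of the Rasch model, complete responses to all 12 items, and anchor values transcribed from Tables 3 and 4, and it uses bisection in place of whatever iterative scheme a Rasch package applies. The function names are hypothetical, and its output should not be expected to reproduce published OCI measures exactly.

```python
import math
from typing import Sequence

# Anchor values transcribed from Table 4 (item difficulties) and Table 3
# (step calibrations), in logits.
ITEM_DIFFICULTIES = [
    0.74, 0.66, 0.36, 0.26, 0.25, 0.12,
    0.09, -0.04, -0.14, -0.33, -0.82, -1.14,
]
STEP_CALIBRATIONS = [-1.19, -1.05, -0.67, -0.39, 0.95, 2.35]


def expected_item_score(alpha: float, rho: float, taus: Sequence[float]) -> float:
    """Model-expected score on one item for a person of ability alpha."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (alpha - rho - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)


def expected_raw_score(alpha: float) -> float:
    """Model-expected total score over all anchored items."""
    return sum(
        expected_item_score(alpha, rho, STEP_CALIBRATIONS)
        for rho in ITEM_DIFFICULTIES
    )


def person_measure(raw_score: float, lower: float = -15.0, upper: float = 15.0,
                   tolerance: float = 1e-9) -> float:
    """Ability (logits) at which the expected raw score equals the observed one.

    For complete data this is the maximum-likelihood estimate. Extreme scores
    must be moved inward (for example by 0.3 score points) before calling.
    """
    max_score = len(ITEM_DIFFICULTIES) * len(STEP_CALIBRATIONS)
    if not 0.0 < raw_score < max_score:
        raise ValueError("extreme raw scores have no finite estimate; adjust them first")
    # The expected raw score rises monotonically with alpha, so bisection converges.
    while upper - lower > tolerance:
        middle = (lower + upper) / 2.0
        if expected_raw_score(middle) < raw_score:
            lower = middle
        else:
            upper = middle
    return (lower + upper) / 2.0


for raw in (0.3, 18, 36, 54, 71.7):
    print(raw, round(person_measure(raw), 2))
```

The logit measures for the adjusted extreme scores (0.3 and 71.7 here) would then serve as the bounds for a linear rescaling to the 0–100 range. Handling of omitted items, which requires summing the expected scores over the answered items only, is left out of this sketch.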
The OCI is suitable for use in clinical trials to assess the impact of ocular surface disease on patients’ well-being and the effectiveness of therapeutic strategies. Its major benefits over existing instruments are that, through Rasch analysis, it produces estimates on a linear interval scale rather than ordinal ranks and so is better able to quantify change and, through statistical methods, to account more satisfactorily for missing data. However, the clinical significance of its units requires empiric determination and, as with all questionnaires that employ Rasch methods, it struggles to deal with extreme raw scores.
MEJ is supported by a research scholarship from Ultralase Ltd.
Submitted for publication October 18, 2006; revised December 8, 2006, and March 6, 2007; accepted August 20, 2007.
Disclosure: M.E. Johnson, Ultralase, Ltd. (F); P.J. Murphy, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked “advertisement” in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Michael E. Johnson, Cardiff University, School of Optometry and Vision Sciences, King Edward VII Avenue, Cardiff, CF10 3NB, Wales, UK; JohnsonM2@cardiff.ac.uk.
Table 1. Fit Statistics for All 15 Items in the Preliminary Analysis
Item | Infit MNSQ | Infit ZSTD | Outfit MNSQ | Outfit ZSTD |
1. In the last week, did your eyes feel comfortable? | 0.76 | (−4.0) | 0.82 | (−2.5) |
2. In the last week, how often did your eyes feel dry? | 1.07 | (1.1) | 0.98 | (−0.3) |
3. When your eyes felt dry, typically, how intense was the dryness? | 0.90 | (−1.6) | 0.82 | (−2.4) |
4. In the last week, how often did your eyes feel gritty? | 1.13 | (1.9) | 1.02 | (0.3) |
5. When your eyes felt gritty, typically, how intense was the grittiness? | 0.92 | (−1.1) | 0.83 | (−2.1) |
6. In the last week, how often did your eyes feel stingy? | 1.02 | (0.3) | 0.93 | (−0.8) |
7. When your eyes stung, typically, how intense was the stinging? | 0.88 | (−1.8) | 0.79 | (−2.4) |
8. In the last week, how often did your eyes feel tired? | 1.13 | (1.8) | 1.19 | (2.5) |
9. When your eyes felt tired, typically, how intense was the tiredness? | 0.97 | (−0.5) | 0.99 | (−0.1) |
10. In the last week, how often did your eyes feel painful? | 0.93 | (−0.9) | 0.83 | (−1.8) |
11. When your eyes felt painful, typically, how intense was the pain? | 0.95 | (−0.6) | 0.90 | (−1.0) |
12. In the last week, how often did your eyes itch? | 1.12 | (1.8) | 1.15 | (1.9) |
13. When your eyes itched, typically, how intense was the itching? | 1.05 | (0.8) | 1.10 | (1.2) |
14. In the last week, how often did your vision change between clear and blurred? | 1.37 | (5.0) | 1.34 | (3.8) |
15. When your vision was changeable, how bothersome was it? | 1.15 | (2.2) | 1.05 | (0.6) |
Table 2. Subject Information for the Various Arms of the Study
Study Arm | Number | Median Age (y) | Proportion Female | Median OCI Score (0–100 Scale) | Comparison of Median OCI Score with Total |
Repeated OCI | 95 | 33.5 | (70/95) 74% | 35 | P = 0.92 |
OCI versus OSDI | 337 | 29 | (223/337) 66% | 33 | P = 0.22 |
OCI versus TBUT | 102 | 29 | (42/102) 59% | 44 | P < 0.001 |
Ocular lubricants | 65 | 38 | (39/65) 60% | 49 | P < 0.001 |
Total | 452 | 34 | (154/452) 66% | 35 | — |
Table 3. Category Diagnostics for the Secondary Analysis after the Removal of Grossly Misfitting Items and Persons
Response | Frequency | Infit MNSQ | Outfit MNSQ | μ_x | τ_x | ω_min | ω_max |
0 | 1625 | 0.95 | 0.97 | −2.10 | — | −∞ | −2.12 |
1 | 911 | 1.00 | 0.89 | −1.45 | −1.19 | −2.12 | −1.17 |
2 | 743 | 1.05 | 0.89 | −0.97 | −1.05 | −1.17 | −0.60 |
3 | 658 | 0.95 | 0.90 | −0.52 | −0.67 | −0.60 | 0.03 |
4 | 682 | 1.01 | 1.03 | −0.14 | −0.39 | 0.03 | 1.04 |
5 | 297 | 1.18 | 1.21 | 0.30 | 0.95 | 1.04 | 2.77 |
6 | 53 | 1.38 | 1.20 | 0.52 | 2.35 | 2.77 | +∞ |
Table 4. Estimates of Item Difficulties and Fit Statistics from the Secondary Rasch Analysis after the Removal of Grossly Misfitting Items and Persons
Item | Difficulty (logits) | SE | Infit MNSQ | Infit ZSTD | Outfit MNSQ | Outfit ZSTD |
11. Pain (int.) | 0.74 | 0.05 | 0.91 | (−1.1) | 0.85 | (0.61) |
10. Pain (freq.) | 0.66 | 0.05 | 0.96 | (−0.5) | 0.87 | (0.62) |
7. Sting (int.) | 0.36 | 0.05 | 0.86 | (−2.0) | 0.75 | (0.68) |
6. Sting (freq.) | 0.26 | 0.05 | 0.99 | (−0.2) | 0.89 | (0.67) |
5. Gritty (int.) | 0.25 | 0.05 | 0.98 | (−0.2) | 0.90 | (0.66) |
4. Gritty (freq.) | 0.12 | 0.05 | 1.16 | (2.2) | 1.04 | (0.66) |
13. Itch (int.) | 0.09 | 0.05 | 1.09 | (1.2) | 1.15 | (0.63) |
12. Itch (freq.) | −0.04 | 0.04 | 1.17 | (2.4) | 1.19 | (0.64) |
3. Dryness (int.) | −0.14 | 0.04 | 0.89 | (−1.7) | 0.81 | (0.72) |
2. Dryness (freq.) | −0.33 | 0.04 | 1.08 | (1.2) | 1.00 | (0.73) |
9. Tiredness (int.) | −0.82 | 0.04 | 1.00 | (0.0) | 1.03 | (0.69) |
8. Tiredness (freq.) | −1.14 | 0.04 | 1.12 | (1.6) | 1.20 | (0.69) |