Abstract
Purpose:
To develop a questionnaire (in Spanish) to measure computer-related visual and ocular symptoms (CRVOS).
Methods:
A pilot questionnaire was created by consulting the literature, clinicians, and video display terminal (VDT) workers. The replies of 636 subjects completing the questionnaire were assessed using the Rasch model and conventional statistics to generate a new scale, designated the Computer-Vision Symptom Scale (CVSS17). Validity and reliability were determined by Rasch fit statistics, principal components analysis (PCA), person separation, differential item functioning (DIF), and item–person targeting. To assess construct validity, the CVSS17 was correlated with a Rasch-based visual discomfort scale (VDS) in 163 VDT workers; this group completed the CVSS17 twice to assess test–retest reliability (two-way single-measure intraclass correlation coefficient [ICC] with its 95% confidence interval, and the coefficient of repeatability [COR]).
Results:
The CVSS17 contains 17 items exploring 15 different symptoms. These items showed good reliability and internal consistency (mean square infit and outfit 0.88–1.17, eigenvalue for the first residual PCA component 1.37, person separation 2.85, and no DIF). Pearson's correlation with VDS scores was 0.60 (P < 0.001). Intraclass correlation coefficient for test–retest reliability was 0.849 (95% confidence interval [CI], 0.800–0.887), and COR was 8.14.
Conclusions:
The Rasch-based linear-scale CVSS17 emerged as a useful tool to quantify CRVOS in computer workers.
Items judged appropriate for a CRVOS questionnaire were identified in different ways:
- Through a search of the different databases (MEDLINE, EMBASE, and PROQOLID) focusing on studies conducted to date on CRVOS9,22–28;
- By asking 14 optometrists (with 9 ± 6 years of clinical experience) to detail the words used by their patients to describe these symptoms and list the most common VDT-related complaints;
- Following the recommendations of others,13,20,21,29 by conducting semistructured interviews with 59 VDT workers (mean age 38.6 ± 9.2 years, 52.5% female) fulfilling the definition of “VDT worker” established by the Instituto Nacional de Seguridad e Higiene en el Trabajo (INSHT, Spanish Institute of Health and Safety at Work)30;
- Through incorporation of five items of the VFQ2531 and one item of the VF1432 questionnaires.
In this first stage, we obtained a pool of 277 items. Two optometrists then used an item assessment guide based on the recommendations of Streiner and Norman21 (see Supplementary Table S1 for details) to reduce the item bank to 138. These 138 items were evaluated by a group of 16 volunteer users who were instructed to choose the items that best described each symptom. In addition, for each proposed item they chose the response category group, among the groups used in similar questionnaires cited in the literature, that best described the severity of the symptoms they experienced at work.
This process served to generate 77 items for a pilot questionnaire fulfilling the following inclusion criteria: there had to be at least one item for each symptom described in the prior item-generation stages, and if the users' preferred item for a symptom differed from the item best rated by the experts, both were included. The response category group for each item was also chosen by the users; as a result, one item initially had a seven-category response scale, 34 had a six-category scale, 26 had a five-category scale, and 16 had a four-category scale.
The pilot questionnaire (CVSS77) consisted of the 77 items selected as described above plus 11 items designed to obtain information on age (18–65 years), sex, and whether the respondent fulfilled the criteria for a “VDT worker” as defined by the Spanish INSHT.30 Subjects were required to provide replies for at least 66% of all items.
The pilot CVSS77 was distributed among the members of a trade union (Unión General de Trabajadores) and a health and safety at work organization (Grupo OTP-Prevención de Riesgos Laborales) from May 7 to October 19, 2012, via their Web sites. Each time the Web site was accessed, one of six versions of the questionnaire, with the items in a different order, was presented to avoid order effects.
The questionnaire was completed online by 636 subjects. Forty-eight questionnaires were eliminated because they were incorrectly completed, leaving 588 completed questionnaires for validation.
The Rasch model is an item response theory (IRT) model that transforms raw scores so that the distance between the locations of two persons is preserved regardless of the particular items administered. The central IRT concept is that a mathematical model predicts the probability that a person will successfully reply to an item according to the person's ability and the item's difficulty.33
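As background for readers unfamiliar with the model (this sketch is ours, not part of the authors' analysis), the dichotomous Rasch model expresses this probability as a logistic function of the difference between person ability and item difficulty, both measured in logits:

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of endorsing/passing an item under the dichotomous
    Rasch model: P = exp(theta - b) / (1 + exp(theta - b)),
    with person ability theta and item difficulty b in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item difficulty has a 50% chance:
print(rasch_probability(0.5, 0.5))               # 0.5
# A person 1 logit above the item difficulty:
print(round(rasch_probability(1.5, 0.5), 3))     # 0.731
```

The polytomous models actually used in the study (see below) generalize this single-step probability to multiple response categories.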
Since the selected items were polytomous, for Rasch analysis we had to choose between the partial credit model (PCM, which considers a different rating scale for each item) and the Andrich rating scale model (RSM, which assumes equal category thresholds across items). The PCM is less restrictive than the RSM because it allows for different response categories in different items, yet it may complicate communication to the audience and requires a larger dataset.34 The PCM was finally selected for two reasons: (1) the RSM would mean making a priori assumptions about the similarity of scale points across items, and we had no evidence of this in our item set; and (2) several items (e.g., A30–A22 and B7–B8) initially showed different response patterns despite sharing the same rating scale structure, so the PCM was likely to offer more scoring precision than the RSM.
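To illustrate the distinction (a minimal sketch with made-up step difficulties, not the study's estimates), the PCM gives each item its own step difficulties, from which the category probabilities follow:

```python
import math

def pcm_category_probs(theta, thresholds):
    """Partial credit model: category probabilities for one item.
    `thresholds` are that item's step difficulties (in logits); under
    the PCM they may differ from item to item, whereas the RSM forces
    a shared threshold structure across all items."""
    # Cumulative sums of (theta - tau_k); category 0 contributes 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    expo = [math.exp(c) for c in cumulative]
    total = sum(expo)
    return [e / total for e in expo]

# Two items sharing a 4-category format but with different (hypothetical)
# step difficulties, as the PCM allows:
probs_a = pcm_category_probs(0.0, [-1.0, 0.0, 1.0])
probs_b = pcm_category_probs(0.0, [-0.5, 0.2, 0.8])
```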
The PCM implemented in BIGSTEPS software (version 2.82, MESA measurement; http://www.winsteps.com/bigsteps.htm, provided in the public domain by WINSTEPS, Chicago, IL, USA) was used to identify unusual response patterns. Infit and outfit mean square values, which compare predicted and observed responses, were obtained for each subject and, according to established criteria,35 four questionnaires were revised because their outfit was >2.5; two of these were discarded because the responses lacked coherence. This left 586 valid completed questionnaires. A further 10 questionnaires were excluded by BIGSTEPS because their scores were under the minimum estimated measure, leaving 576 valid responses.
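BIGSTEPS computes these fit statistics internally; as a sketch of what they measure (illustrative numbers, not the study's data), both mean squares derive from observed responses, model-expected scores, and model variances:

```python
def fit_mean_squares(observed, expected, variances):
    """Infit/outfit mean squares for one response string.
    Outfit is the unweighted mean of squared standardized residuals;
    infit weights each squared residual by its model variance, making
    it less sensitive to lucky/unlucky responses on off-target items."""
    z2 = [(o - e) ** 2 / v for o, e, v in zip(observed, expected, variances)]
    outfit = sum(z2) / len(z2)
    infit = sum((o - e) ** 2 for o, e in zip(observed, expected)) / sum(variances)
    return infit, outfit

# A respondent whose answers sit at model expectation yields mean squares
# near 1; values > 2.5 flag an erratic response string for review.
infit, outfit = fit_mean_squares([1, 0, 2], [0.4, 0.5, 1.2], [0.36, 0.25, 0.64])
```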
This paper describes a new tool to quantify vision-related symptoms associated with VDT use at work, developed using conventional techniques and Rasch analysis to provide reliable and valid measures.
The final number of items included (17) is similar to that of other available validated vision-related questionnaires.12,15,16,39 This number of items means that subjects can complete the questionnaire quickly, especially in electronic format.
The 17 items of the scale were designed to obtain information about 15 different symptoms. These symptoms have been included with different frequencies in other questionnaires used in research on CRVOS.6,9–11 The behavior of the symptoms defined in our questionnaire resembles that of the two main contributing factors in the factor analysis described by Sheedy et al.40 for experimentally induced asthenopia. However, the CVSS17 includes a broader range of symptoms, such as photophobia (A33 and C23) and “blinking a lot” (A20), which were noticeably influenced by these two factors. The detection of two main factors, one related to the external symptom factor of Sheedy et al.40 and the other to the internal symptom factor, along with the presence of photophobia, suggests that the symptom model assessed by the CVSS17 is similar to previously described models.9–11,40
The item identification and reduction methods used in developing the CVSS17 were systematic and rigorous in order to ensure content validity.13,14,41 The PCM was used to reorder the response categories. This enabled the selection of items with good discrimination capacity and provided a statistically justified scale, without significant missing data, showing ordered thresholds on Rasch analysis.
Because the selected items had different question formats (i.e., symptom severity, symptom frequency, subject opinion), we decided to include several rating scales, chosen by a set of study subjects according to their suitability. The aim was, as far as possible, to use the most appropriate rating scale for each item. However, based on recently published data,42,43 we consider the use of multiple rating scales the major limitation of the CVSS17: they can increase respondent burden, and there is some evidence that differences in rating scale format affect an item's calibration beyond its content.43 Although the measurement properties of the CVSS17 may not be compromised per se, this should be taken into account when interpreting its item difficulty estimates, investigating improvements to the instrument, and comparing CVSS17 scores with those of similar scales.
The Rasch statistics used revealed that all items fit the model and, together with the residual PCA, confirmed its unidimensionality. Moreover, the point-biserial correlation calculated for each item of the CVSS17 was in the range 0.43 to 0.67, indicating significant yet nonredundant correlation.
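For reference (a sketch with hypothetical scores, not the study's data), this item-total check is an ordinary Pearson correlation between one item's scores and the total scale scores:

```python
from statistics import mean

def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between one item's scores and the total
    scale scores; values that are moderate but well below 1 suggest
    the item contributes to, without duplicating, the construct."""
    mi, mt = mean(item_scores), mean(total_scores)
    cov = sum((i - mi) * (t - mt) for i, t in zip(item_scores, total_scores))
    var_i = sum((i - mi) ** 2 for i in item_scores)
    var_t = sum((t - mt) ** 2 for t in total_scores)
    return cov / (var_i * var_t) ** 0.5

# Hypothetical scores for one item against four respondents' totals:
r = item_total_correlation([1, 2, 2, 4], [12, 25, 31, 40])
```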
Although Cronbach's α coefficient is not considered a useful measure of the reliability of a scale,13 we decided to include it in our analysis to facilitate comparisons with other scales. For clinical applications, a coefficient between 0.9 and 0.95 is recommended.44 Thus, we consider that the internal consistency of the CVSS17 (Cronbach's α = 0.92) makes it useful for comparisons between groups and for clinical applications.
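For readers wishing to reproduce this statistic with their own data (the scores below are hypothetical), Cronbach's α follows directly from the item variances and the total-score variance:

```python
from statistics import pvariance

def cronbach_alpha(item_columns):
    """Cronbach's alpha from item-score columns (each column holds one
    item's scores across all respondents):
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(item_columns)
    item_vars = sum(pvariance(col) for col in item_columns)
    totals = [sum(scores) for scores in zip(*item_columns)]
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Three hypothetical items answered by four respondents:
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 3, 3, 4], [2, 2, 3, 4]])
```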
The person separation value obtained (2.85) indicates that the tool is sufficiently sensitive to distinguish between high and low performers, and the person reliability index (0.89) indicates a capacity of the CVSS17 to distinguish three or four levels of symptoms.45 Also, the item separation value calculated (8.61) indicates that the person sample was large enough to confirm the item difficulty hierarchy (i.e., construct validity) of the tool.45
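The conversions behind these figures are standard; as a minimal sketch, the usual formulas applied to the reported person separation of 2.85 recover both the reliability index and the number of distinguishable symptom levels:

```python
def separation_stats(separation):
    """Convert a Rasch separation index G into the reliability index
    R = G^2 / (1 + G^2) and the number of statistically distinct
    strata H = (4G + 1) / 3."""
    reliability = separation ** 2 / (1 + separation ** 2)
    strata = (4 * separation + 1) / 3
    return reliability, strata

# The paper's person separation of 2.85:
r, h = separation_stats(2.85)
print(round(r, 2), round(h, 1))  # 0.89 4.1
```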
The final questionnaire showed no DIF for the defined groups (male–female, presbyopes–nonpresbyopes). This means there was no difference in the way in which these subgroups responded to the test, indicating the validity of the CVSS17 for all these subgroups.33
The mean difference between person capacity and item difficulty was −0.89 logits, a little over the 0.5-logit difference recommended by Pesudovs et al.,13 indicating that the items targeted the more symptomatic end of the CVSS17. This is common for a symptom scale, owing to the presence in the sample of many subjects with few or no symptoms15 and/or a tendency of subjects to underreport their discomfort.
The summary statistics of the Rasch model confirmed that all the selected items contribute significantly to the overall score and that they all measure a related concept. Based on these observations, we propose that this concept is the set of visual and ocular symptoms associated with work-time VDT use.
Given the lack of a gold standard with which to compare our CVSS17 data, we used another validated instrument that measures a closely related concept, the VDS,46 which has been used to measure reading-related visual discomfort.47 Significant moderate to high correlation was detected between this scale and the CVSS17, and VDS scores also correlated significantly with the two main factors of our scale. For the factor–VDS correlations, Pearson's correlation coefficient was larger for factor 2. These correlations can be considered the first evidence of the validity of the CVSS17.
According to the ICC, test–retest reliability for the CVSS17 was good. The COR was somewhat higher than expected, probably owing to the influence of eight subjects whose scores varied by 10 points or more between the two administrations of the questionnaire: both the ICC and the COR improved significantly when the analysis was repeated with these subjects excluded.
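As a sketch of how these statistics are computed (illustrative data; the study's 95% CI calculation is omitted here), ICC(2,1) follows from a two-way ANOVA decomposition, and the COR is 1.96 times the SD of the within-subject test–retest differences:

```python
from statistics import mean

def icc_2_1_and_cor(test, retest):
    """Two-way random-effects, single-measure ICC (ICC(2,1)) for
    test-retest data, plus the coefficient of repeatability
    COR = 1.96 * SD of the within-subject differences."""
    n, k = len(test), 2
    grand = mean(test + retest)
    subj_means = [(a + b) / 2 for a, b in zip(test, retest)]
    occ_means = [mean(test), mean(retest)]
    # Sum-of-squares decomposition: subjects, occasions, error.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_occ = n * sum((m - grand) ** 2 for m in occ_means)
    ss_tot = sum((x - grand) ** 2 for x in test + retest)
    ss_err = ss_tot - ss_subj - ss_occ
    msr = ss_subj / (n - 1)
    msc = ss_occ / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    diffs = [a - b for a, b in zip(test, retest)]
    md = mean(diffs)
    sd_diff = (sum((d - md) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return icc, 1.96 * sd_diff

# Hypothetical scores from five respondents completing the scale twice:
icc, cor = icc_2_1_and_cor([12, 25, 31, 40, 18], [14, 24, 33, 38, 19])
```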
The printed Spanish version of the CVSS17 and its Rasch-based scoring chart are provided as Supplementary Material. We also provide an English version for potential international use. However, more clinical research is needed to obtain further evidence of the validity of the scale (discriminant validity, divergent validity, and further evidence of construct validity) and to determine normal values of the CVSS17 for population subgroups varying in socioeconomic status, race, and so on. Future studies will also need to determine the extent to which the CVSS17 can detect clinically important changes over time (minimum clinically important difference, MID).
In conclusion, the CVSS17 questionnaire was developed using conventional techniques and Rasch analysis, ensuring construct validity and providing measures on a linear interval scale rather than ordinal measures. The CVSS17 is therefore able to assess CRVOS without the main limitations of previously developed instruments.12,13,15
The authors thank Unión General de Trabajadores, Grupo OTP-Prevención de Riesgos Laborales, Fraternidad-Muprespa, and Siemens España for the cooperation of their video display terminal workers.
Disclosure: M. González-Pérez, None; R. Susi, None; B. Antona, None; A. Barrio, None; E. González, None