Abstract
Purpose.
To investigate, using Rasch analysis, whether the 15-item Glaucoma Quality of Life-15 (GQL-15) forms a valid scale and to optimize its psychometric properties.
Methods.
One hundred eighteen glaucoma patients (mean age, 65.7 years) completed the German version of the GQL-15. Rasch analysis was performed to assess category function (how respondents differentiated between the response options), measurement precision (discriminative ability), unidimensionality (whether items measure a single construct), targeting (whether items are of appropriate difficulty for the sample), and differential item functioning (whether comparable subgroups respond differently to an individual item). Where any of these attributes were outside acceptable ranges, steps were taken to improve the instrument.
Results.
The five-response categories of the GQL-15 were well differentiated by respondents, as demonstrated by ordered and well-spaced category thresholds. The GQL-15 had an excellent measurement precision but demonstrated poor targeting of item difficulty to person ability and multidimensionality, indicating that it was measuring more than one construct. Removal of six misfitting items created a nine-item unidimensional instrument with good measurement precision and no differential item functioning but poor targeting. A new name, the Glaucoma Activity Limitation (GAL-9) questionnaire, is proposed for the short version, which better reflects the construct under measurement.
Conclusions.
The GAL-9 has superior psychometric properties over the GQL-15. Its only limitation is poor targeting of item difficulty to person ability, which is an inevitable attribute of a vision-related activity limitation instrument for glaucoma patients, most of whom have only peripheral visual field defects and little difficulty with daily activities.
Glaucoma is the second leading cause of blindness after cataracts; it affects approximately 68 million people worldwide, 10% of whom are blind.
1,2 Glaucoma is often asymptomatic and therefore recognized as the silent cause of blindness. Nevertheless, the diagnosis, which requires lifelong follow-up and frequent ocular antihypertensive medication or surgery, can have a huge impact on a patient's life.
3 Hence, a comprehensive assessment of the impact of the disease and its treatment on patients from their perspective has become important for the measurement of glaucoma impact and treatment outcomes. The patient's point of view is measured using various types of questionnaires known as patient-reported outcomes (PROs). Highlighting their importance, the US Food and Drug Administration has also endorsed that PRO measures be included in all clinical trial end points for disease impact and outcome assessment in glaucoma.
4
A number of glaucoma-specific questionnaires or instruments have been developed in the past two decades.
5 –10 The Glaucoma Quality of Life-15 (GQL-15) questionnaire is concise and easy to administer.
11,12 Independent reviews have described it as one of the better glaucoma-specific instruments, with good acceptability among clinicians and patients.
13,14 Initially derived from a 62-item pilot instrument, the 15-item GQL-15 was first described in 2003.
12 These 15 items were selected on the basis of their strong relationship with visual field loss in glaucoma patients.
11 Several studies have used the GQL-15.
15 –17 The name of the instrument suggests that the trait under measurement is vision-related quality of life; however, all the items refer to activity limitation (near vision, peripheral vision, mobility, and dark adaptation).
11
Although there is no universally accepted definition of vision-related quality of life, there is growing consensus that it should include multidimensional assessment of the impact of vision on everyday activities, emotional well-being, social relationships, and independence.
18 The World Health Organization's International Classification of Functioning (WHO-ICF) also provides a unifying framework for the health-related consequences of a disease based on three components: impairment, activity limitations, and participation restriction.
19 When the WHO-ICF framework is conceptualized in vision, impairment refers to the diseases and disorders of the eye (e.g., glaucoma). Activity limitations are the difficulties in executing vision-related tasks as a result of impairment, such as inability to cross the road because of glaucomatous visual field loss. Participation restrictions are barriers to involvement in life situations caused by activity limitation, such as inability to go shopping because of inability to cross the road. When following the WHO-ICF framework, the construct being measured by the GQL-15 is vision-related activity limitations, not quality of life. Indeed, the papers describing its development explain that the purpose is to measure self-reported visual disability despite the name containing the term quality of life.
11
Similar to other glaucoma-specific instruments, the GQL-15 was developed and validated using the traditional method (Classical Test Theory [CTT]).
11 CTT provides a limited assessment of the psychometric properties of an instrument and produces scoring by the sum of raw ordinal values assigned to each item, which is not a true interval-level measurement.
20,21 This limits the interpretability of the instrument as a measure.
22 This problem can be resolved with Rasch analysis, which transforms raw ordinal questionnaire data into interval-level estimates.
23 It also provides greater insight into the psychometric properties of an instrument, including the assessment of response categories, measurement precision, item fit to the construct, item targeting, and unidimensionality. Interval-level data not only provide valid measurement but also enable the use of robust parametric statistics.
22 These benefits of improved psychometric assessment and interval-level scoring have led to the use of Rasch analysis in the development of new questionnaires
24 –26 and the reengineering of existing questionnaires.
27 –30 To the authors' knowledge, the Glaucoma Symptom Scale (GSS) is the only glaucoma-specific instrument that has been Rasch analyzed, but it was not shown to have satisfactory psychometric properties.
31
The primary aim of the present study was to explore the psychometric properties of the GQL-15 using Rasch analysis and to assess whether it forms a valid scale. If the GQL-15 was found to form a valid scale but to have suboptimal psychometric properties, the secondary aim was to optimize its psychometric properties.
The data were analyzed in two phases: assessment of the psychometric properties of the original GQL-15 and reengineering of the GQL-15 to optimize its psychometric properties.
Rasch analysis is a probabilistic mathematical model that estimates item difficulty, person ability, and the threshold for each response category on a single continuous logit scale. A logit (log-odds unit) is the unit of an interval scale, defined as the natural logarithm of the odds of a person endorsing a particular response category in an item, that is, the probability p of endorsement divided by 1 − p: ln[p/(1 − p)]. For this analysis, persons with higher ability and items of greater difficulty were located on the negative side of the logit scale and vice versa. Rasch analysis was used for the following assessments: response scale analysis, measurement precision, unidimensionality, targeting, and differential item functioning.
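The logit definition above can be illustrated with a minimal Python sketch. The function names `logit` and `inv_logit` are chosen here for illustration and do not come from the study or from any Rasch software.

```python
import math

def logit(p):
    """Return the log-odds ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a logit value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# A probability of 0.5 corresponds to even odds, i.e. 0 logits.
print(logit(0.5))
# Odds of 3:1 (p = 0.75) correspond to ln(3) logits.
print(logit(0.75))
```

Because the scale is logarithmic in the odds, equal distances in logits correspond to equal ratios of odds anywhere along the continuum, which is what makes the scale interval-level.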
Response Scale Analysis.
Measurement Precision.
Unidimensionality.
Targeting.
Rasch analysis generates a person-item map that displays the position of item difficulty relative to person ability. By default, the item mean is placed at 0 logits. For a perfectly targeted instrument, both item and person means lie at the same point on the map (i.e., mean difference = 0 logits). However, a difference between person and item means of up to 1 logit is acceptable. A difference between means of >1 logit indicates notable mistargeting. Poor targeting occurs when items cluster at a certain point along the map, when there are large gaps between items, or when the ability of the study population is higher or lower than the level required to endorse the items.
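The targeting criterion described above amounts to comparing two means. The following Python sketch shows that comparison; the function name `targeting` and the person measures in the usage example are hypothetical and are not data from this study.

```python
from statistics import mean

def targeting(person_measures, item_measures, tolerance=1.0):
    """Return (difference in logits, mistargeted flag).

    The difference is person mean minus item mean. A magnitude above
    `tolerance` (1 logit in the text) is treated as notable mistargeting.
    """
    diff = mean(person_measures) - mean(item_measures)
    return diff, abs(diff) > tolerance

# Hypothetical person and item measures in logits (illustration only).
persons = [-2.0, -1.0, -1.5]
items = [0.5, -0.5, 0.0]  # item mean is 0 logits, as is the default
diff, mistargeted = targeting(persons, items)
print(diff, mistargeted)
```

The sign of the difference indicates the direction of mistargeting; how that sign maps to "more able" depends on the orientation of the logit scale used in the analysis.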
Differential Item Functioning.
Validity.
Response Scale Analysis.
Measurement Precision and Targeting.
Item Fit and Unidimensionality.
All 15 items fit the Rasch model within liberal infit (0.66–1.41) and outfit (0.59–1.39) ranges and, hence, could be considered productive for measurement. However, those items outside the range of 0.7 to 1.3 were a potential source of noise and could be considered for removal to optimize the psychometric properties of the instrument. Two items (1 and 15) were underfitting, and three items (8, 12, and 10) were overfitting the Rasch model (Table 3).
Table 3. Rasch Fit Statistics and Item Measure for the GQL-15
Item No. | Items | Infit (MNSQ) | Outfit (MNSQ) | Item Measure (logit) |
1 | Reading newspapers | 1.41 | 1.39 | 0.28 |
2 | Walking after dark | 1.05 | 0.80 | 0.33 |
3 | Seeing at night | 1.13 | 0.08 | −1.35 |
4 | Walking on uneven ground | 0.91 | 0.83 | −0.43 |
5 | Adjusting to bright lights | 1.25 | 1.25 | −0.97 |
6 | Adjusting to dim lights | 1.10 | 1.08 | −0.34 |
7 | Going from light to dark room or vice versa | 1.03 | 0.99 | −1.31 |
8 | Tripping over objects | 0.66 | 0.59 | 0.49 |
9 | Seeing objects coming from the side | 1.02 | 1.07 | −0.45 |
10 | Crossing the road | 0.70 | 0.61 | 0.78 |
11 | Walking on steps/stairs | 0.75 | 0.91 | 0.75 |
12 | Bumping into objects | 0.68 | 0.55 | 0.63 |
13 | Judging distance of foot to step/curb | 0.83 | 0.98 | 0.28 |
14 | Finding dropped objects | 0.88 | 0.80 | 0.49 |
15 | Recognizing faces | 1.36 | 1.22 | 0.82 |
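The 0.7 to 1.3 screening rule can be sketched in Python as follows. The function `classify_fit` is a hypothetical helper, and the rule that either statistic may trigger a flag is an assumption made for this sketch; the text does not state how infit and outfit were combined. Only four rows of Table 3 are used as examples.

```python
def classify_fit(infit, outfit, low=0.7, high=1.3):
    """Classify an item from its infit and outfit mean squares (MNSQ).

    Assumption for this sketch: an item is flagged when EITHER statistic
    falls outside the low-high range.
    """
    if infit > high or outfit > high:
        return "underfit"   # more variation than the model predicts
    if infit < low or outfit < low:
        return "overfit"    # less variation than the model predicts
    return "acceptable"

# (infit, outfit) pairs copied from a subset of Table 3 rows.
items = {
    1: (1.41, 1.39),   # Reading newspapers
    4: (0.91, 0.83),   # Walking on uneven ground
    8: (0.66, 0.59),   # Tripping over objects
    10: (0.70, 0.61),  # Crossing the road
}
for number, (infit, outfit) in items.items():
    print(number, classify_fit(infit, outfit))
```

Under this assumed rule, item 10 is flagged through its outfit value, since its infit sits exactly on the lower bound.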
The PCA of the residuals showed that the variance explained by the principal component was 65.6%. However, the first contrast in the residuals had an eigenvalue of 2.2, which exceeds the 2.0 eigenvalue unit criterion and suggests that the instrument was not unidimensional. Three items representing mobility (10, Crossing the road; 12, Bumping into objects; and 11, Walking on steps/stairs) loaded positively (>0.4) onto the first contrast. Similarly, two items representing dark adaptation (5, Adjusting to bright lights; and 7, Going from light to dark room or vice versa) loaded negatively (< −0.4) onto the first contrast. No further contrast exceeded 2.0 eigenvalue units.
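The decision rule applied to the first contrast can be sketched in Python. The function `review_first_contrast` is a hypothetical helper, and the loading magnitudes in the example are invented for illustration; the text reports only which items loaded beyond ±0.4, not their values. Item 4 is included as an arbitrary example of an item within the cutoffs.

```python
def review_first_contrast(eigenvalue, loadings, eigen_cutoff=2.0, loading_cutoff=0.4):
    """Apply the cutoffs described in the text to a residual PCA contrast.

    `loadings` maps item number to its loading on the contrast.
    """
    positive = sorted(i for i, value in loadings.items() if value > loading_cutoff)
    negative = sorted(i for i, value in loadings.items() if value < -loading_cutoff)
    return {
        "multidimensional": eigenvalue > eigen_cutoff,
        "positive": positive,
        "negative": negative,
    }

# Hypothetical loadings: signs follow the text, magnitudes are invented.
example_loadings = {10: 0.55, 12: 0.48, 11: 0.45, 5: -0.52, 7: -0.47, 4: 0.10}
print(review_first_contrast(2.2, example_loadings))
```

Items with opposite signs on the same contrast form the two clusters that the residual analysis sets against each other, here mobility versus dark adaptation.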
Differential Item Functioning.
Reengineering the GQL to Optimize Its Psychometric Properties.
Validity Assessment of the Nine-Item Questionnaire.
This study shows that the GQL-15 functions within the Rasch model in terms of measurement precision, response category functioning, and DIF. However, it was not a unidimensional scale. This is a fundamental problem because it becomes unclear what the instrument measures. To draw a clinical analogy, imagine a device that measures both intraocular pressure (IOP) and central corneal thickness but produces only one score. What would 600 mean, a thick cornea and a low IOP or a thin cornea and a high IOP? For glaucoma management, it is imperative that the two constructs under measurement be segregated into identifiable components. The same is true in questionnaires. Our analysis suggests that the GQL-15 largely measures one construct (vision-related activity limitation) but is contaminated by a mobility construct that appears to be different. Creation of a nine-item version enabled unidimensional measurement with excellent psychometric attributes. Hence, the shorter version is a better measure than the GQL-15 of vision-related activity limitation in patients with glaucoma.
To establish unidimensionality in the GQL-15, six items (1, 5, 8, 10, 12, and 15) were considered for removal on the basis of the PCA of the residuals and fit statistics. Among the three items related to mobility, two items (10, crossing the road; 12, bumping into objects) loaded >0.4 in the first PCA contrast, and one item (8, tripping over objects) grossly misfitted the Rasch model. These activities may be important in glaucoma patients,
41 but Rasch analysis identified that these items tap a different construct and so act as a source of noise and multidimensionality in this activity limitations scale. Similarly, two near vision items (1, reading the newspaper; 15, recognizing faces) misfitted the model, possibly for a number of reasons. For example, they may be related to near vision correction rather than to glaucoma; hence, these items do not behave predictably across the whole population. A glare disability item (5, adjusting to bright lights) also misfitted the Rasch model. This was perhaps influenced by the presence of cataract or innate photophobia in a subset of patients; thus, the item did not behave as expected and was also removed from the scale.
Interestingly, though three mobility items misfitted and were removed, the shorter version still contains four mobility items (2, 4, 11, and 13). Similarly, three items that are related to light and dark adaptation (3, seeing at night; 6, adjusting to dim lights; 7, going from a light room to a dark room or vice versa) are also retained in the short version. Although it may seem inconsistent to remove some items and retain others within these two conceptual areas, the key issue is that the items retained behave predictably within the whole item set across the entire glaucoma population whereas the removed items did not.
The GQL has a five-category response scale that performs well. The categories are used in the order intended, and each occupies a wide portion of the measurement scale. The only concern is that the two higher-end categories had a low frequency of utilization (5, severe difficulty = 4%; 4, a lot of difficulty = 7%). Categories with low frequency can be problematic because they do not provide stable threshold values, and it may be recommended that they be collapsed into adjacent categories to eliminate noise that may arise from unstable calibrations.
34 We could have collapsed these two categories to improve utilization frequency. However, this would not have improved the psychometric properties of the instrument. As is typical of glaucoma patients, the majority of the study population had low visual disability. It is simply unlikely that many patients with glaucoma would endorse the categories of quite a lot of difficulty or severe difficulty. However, removing these categories may lead to loss of valuable psychometric information when the instrument is used on patients with higher disability.
42 Hence, five response categories were retained.
A significant association between instrument score and visual field loss was found in this study. The impact of visual field loss on the GAL-9 score was stronger in the worse eye than in the better eye. Other studies have also reported that visual field loss in the worse eye has a great influence on self-reported visual disability in patients with glaucoma.
3,17,43 Similarly, visual acuity in the worse eye also demonstrated a stronger association with the GAL-9 score than visual acuity in the better eye. However, the difference between the two eyes was less distinct than it was for visual field loss. Significant correlations between the GAL-9 score and visual parameters demonstrate its validity. It would also have been interesting to investigate the correlation between binocular visual fields and the questionnaire score. However, the binocular visual field test was not recorded in our study population.
The original name of the questionnaire might be misleading because the instrument measures only vision-related activity limitations. Hence, we have proposed a new name for the short-version instrument, the GAL-9 questionnaire. The GAL-9 also demonstrates poor targeting (difference between item mean and person mean ≥1.00 logit).
30 Poor targeting is an inevitable attribute because most people with glaucoma have little difficulty in performing everyday tasks
44,45 until they have poor visual acuity and advanced visual field loss.
12,46 This is consistent with the findings of this study and the visual acuity and visual field status of our population. Poor targeting is also a common problem in other vision-specific questionnaires when used on more able patients, such as in second eye cataract surgery patients who become more able after first eye cataract surgery.
39,47,48
Alternatively, more sensitive items addressing higher visual ability can be added to optimize the targeting of the instrument, but this strategy requires revalidation with each new addition. Such a process is lengthy and does not guarantee avoidance of poor targeting. It might be a good idea to develop a new comprehensive instrument that addresses holistic issues such as treatment effects and the psychosocial impact of glaucoma or to develop a superior strategy in the form of item banking. An item bank consists of a large number of Rasch-calibrated items in a pool, which are presented by computer-adaptive testing (CAT) to patients on the basis of their responses to previous items. This tailoring of item presentation ensures targeting of item difficulty to person ability. Hence, an item bank with the use of CAT provides a rapid, precise, and accurate measurement of the impact of disease on patients.
49
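The item-selection step of CAT can be sketched in Python under a simplifying assumption: each unused item is scored only by how close its calibrated difficulty lies to the current ability estimate, which for dichotomous Rasch items is where an item is most informative. Operational CAT systems for polytomous items use fuller information functions and stopping rules. The function `next_item` and the item bank below are hypothetical.

```python
def next_item(theta, bank, administered):
    """Pick the unused item whose difficulty is closest to the ability estimate.

    `bank` maps an item label to its calibrated difficulty in logits;
    `administered` is the set of labels already presented.
    """
    candidates = {label: d for label, d in bank.items() if label not in administered}
    if not candidates:
        raise ValueError("item bank exhausted")
    return min(candidates, key=lambda label: abs(candidates[label] - theta))

# Hypothetical three-item bank (difficulties in logits).
bank = {"a": -1.0, "b": 0.2, "c": 1.5}
first = next_item(0.0, bank, set())
second = next_item(0.0, bank, {first})
print(first, second)
```

In a full CAT loop, the ability estimate would be updated after each response before the next item is selected, so that item difficulty keeps tracking person ability.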
In conclusion, the revised GAL-9 has superior psychometric properties, including unidimensionality and good measurement precision, with the added advantage of low respondent burden. Indeed, the GAL-9 is the only PRO designed for use in glaucoma patients that has been shown to have satisfactory psychometric properties (the Glaucoma Symptom Scale does not).
31 Therefore, we encourage the use of the GAL-9 because of its psychometric properties and interval scaling, at least until a superior questionnaire is available. To simplify implementation, a spreadsheet (Excel; Microsoft, Redmond, WA) enabling estimation of person ability in logits from category responses is available for download (Supplementary File S2,
http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.11-7423/-/DCSupplemental).
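How person ability in logits can be estimated from category responses may be sketched in Python as follows. This is not the content of Supplementary File S2: it assumes an Andrich rating scale model with maximum likelihood estimation by Newton-Raphson, uses invented item difficulties and thresholds, scores categories from 0 upward, and orients the scale so that higher estimates go with higher category scores. All function names are hypothetical.

```python
import math

def category_probs(theta, delta, taus):
    """Rating scale model probabilities for categories 0..m of one item."""
    logits = [0.0]
    for tau in taus:
        # Each step adds (theta - item difficulty - threshold) to the running sum.
        logits.append(logits[-1] + (theta - delta - tau))
    largest = max(logits)
    weights = [math.exp(v - largest) for v in logits]  # stabilised exponentials
    total = sum(weights)
    return [w / total for w in weights]

def estimate_person(responses, deltas, taus, max_iter=50, tol=1e-6):
    """Maximum likelihood person estimate in logits via Newton-Raphson."""
    raw = sum(responses)
    if raw == 0 or raw == len(taus) * len(responses):
        raise ValueError("extreme score: the ML estimate is not finite")
    theta = 0.0
    for _ in range(max_iter):
        expected = 0.0
        variance = 0.0
        for delta in deltas:
            probs = category_probs(theta, delta, taus)
            e = sum(k * p for k, p in enumerate(probs))
            expected += e
            variance += sum((k - e) ** 2 * p for k, p in enumerate(probs))
        # Move toward the theta whose expected raw score equals the observed one.
        step = max(-1.0, min(1.0, (raw - expected) / variance))
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical calibration: two items, three categories (0, 1, 2).
deltas = [0.0, 0.0]
taus = [-1.0, 1.0]
print(estimate_person([1, 1], deltas, taus))
print(estimate_person([2, 1], deltas, taus))
```

The estimate is the point on the logit scale at which the model's expected raw score matches the observed raw score, so respondents with the same raw score on the same complete item set receive the same estimate.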