Abstract
Purpose:
The purpose of this study was to assess the psychometric properties of diabetic retinopathy (DR) and diabetic macular edema (DME) quality-of-life (QoL) item banks and determine the utility of the final calibrated item banks by simulating a computerized adaptive testing (CAT) application.
Methods:
In this clinical, cross-sectional study, 514 participants with DR/DME (mean age ± SD, 60.4 ± 12.6 years; 64% male) answered 314 items grouped under nine QoL item pools: Visual Symptoms (SY); Ocular Comfort Symptoms (OS); Activity Limitation (AL); Mobility (MB); Emotional (EM); Health Concerns (HC); Social (SC); Convenience (CV); and Economic (EC). The psychometric properties of the item pools were assessed using Rasch analysis, and CAT simulations determined the average number of items administered at high and moderate precision levels.
Results:
The SY, MB, EM, and HC item pools required minor amendments, mainly involving removal of six poorly worded, highly misfitting items. AL and CV required substantial modification to resolve multidimensionality, which resulted in two new item banks: Driving (DV) and Lighting (LT). Due to unresolvable psychometric issues, the OS, SC, and EC item pools were not pursued further. This iterative process resulted in eight operational item banks that underwent CAT simulations. Correlations between CAT and the full item banks were high (range, 0.88–0.99). On average, only 3.6 and 7.2 items were required to gain measurement at moderate and high precision, respectively.
Conclusions:
Our eight psychometrically robust and efficient DR/DME item banks will enable researchers and clinicians to accurately assess the impact and effectiveness of treatment therapies for DR/DME in all areas of QoL.
Diabetic retinopathy (DR) is a common complication of diabetes,
1 which can result in substantial and sometimes irreversible vision loss in its proliferative stages. Diabetic macular edema (DME) can occur at any stage of DR and is responsible for severe loss of central vision.
2 The impact of DR on health-related quality of life (QoL) is substantial, especially at the vision-threatening stages.
3,4 Novel therapies such as anti-VEGF intravitreal injections have shown promising results for improving vision loss and QoL, particularly for DME.
5,6
Patient-reported outcome measures (PROs) are essential to guide service provision and improve care in clinical practice
7 and inform rehabilitation programs. They are required by regulatory authorities in clinical trials to assess the patient-centered effectiveness of novel treatments.
8 To date, however, most available QoL instruments are not specific to DR and DME, so they may lack sensitivity to issues specific to these conditions, such as the impact of laser treatment or intravitreal injections on QoL and the difficulty of managing diabetes (a chronic health condition that requires a high degree of self-management) alongside an eye condition that requires frequent monitoring and is often associated with visual impairment. In addition, the QoL impact of DR/DME and its treatment has only been assessed using paper-pencil PROs,
9,10 which have a finite number of items and often fail to optimally target participants' impairment level across the spectrum of disease severity even though participants must answer every item. This can increase respondent burden,
11,12 lower response rates, and reduce data quality.
13 Moreover, most QoL instruments predominantly focus on visual functioning, whereas QoL also encompasses vision-related symptoms, pain, concerns, inconvenience, social life, and work issues.
14 These limitations are addressed by modern psychometric techniques such as item banking and computerized adaptive testing (CAT).
12,15 The advantages of item response theory-calibrated item banks are well recognized, as exemplified by the National Institutes of Health (NIH) Toolbox vision-targeted health-related quality of life measure.
16
An item bank is a pool of calibrated items (questions) that measure a latent construct such as “health concerns.”
17 CAT is a method for administering items from a calibrated item bank. It selectively chooses the questions asked based on the examinee's impairment level by presenting targeted items (i.e., those that will provide the greatest amount of information) to the respondent.
18 Subsequent items are selected based on the examinee's previous responses and selection proceeds until a predefined stopping criterion is reached. CAT requires fewer items than paper-pencil tests and may enhance measurement validity, precision, and accuracy.
18,19
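The adaptive loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it uses a dichotomous Rasch model (the item banks here use rating scales), a standard normal prior with expected a posteriori (EAP) scoring, and a hypothetical stopping rule based on the standard error (`se_stop`) or a maximum test length (`max_items`). All function names and default values are illustrative.

```python
import math
import random


def p_endorse(theta, b):
    """Dichotomous Rasch model: probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at theta; largest when b is close to theta."""
    p = p_endorse(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, grid):
    """Posterior mean and SD of theta on a grid, under an assumed N(0, 1) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(true_theta, bank, se_stop=0.5, max_items=20, seed=0):
    """Simulate one adaptive test; returns (estimate, standard error, items used)."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.1 * i for i in range(81)]
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and len(responses) < max_items and se > se_stop:
        # Select the unused item that is most informative at the current estimate.
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        # Simulate the respondent's answer from the assumed true impairment level.
        x = 1 if rng.random() < p_endorse(true_theta, b) else 0
        responses.append((b, x))
        theta, se = eap_estimate(responses, grid)
    return theta, se, len(responses)
```

Calling `simulate_cat(1.0, bank)` with a list of item difficulties returns the final estimate, its standard error, and the number of items administered. Tightening `se_stop` corresponds to demanding higher precision and generally lengthens the test.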
We have developed item banks to assess the specific impact of DR/DME on nine relevant aspects of patients' QoL.
20,21 Here, we assess the psychometric properties of these item banks and investigate the utility of the final calibrated item banks by simulating a CAT application.
The item banks resulting from this work will provide, for the first time, measurement of eight areas of QoL specific to people with DR and DME, in addition to the introduction of relatively novel constructs such as Mobility, Health Concerns, Convenience, Driving, and Lighting. Furthermore, with simulation testing indicating that fewer than 10 items are required to gain precise measurement for each QoL item bank, our CAT is likely to be a time-efficient modality for use in clinics and research settings. Overall, this work will enable researchers and clinicians to comprehensively explore the impact of DR/DME from the patient's perspective. With the availability of eight item banks, researchers and clinicians can now choose the constructs relevant to their participants and patients, respectively.
Although the psychometric properties of most item banks were very good following amendments, AL, EM, and HC had higher than satisfactory eigenvalues, even after DV and LT were separated from AL. This may indicate multidimensionality; however, because the residual item loadings did not suggest meaningful secondary dimensions and because all items fit within their respective constructs, we did not split the scales. Although precision was good overall, targeting of item “difficulty” to participant “impairment” was suboptimal for the AL, MB, EM, HS, SC, CV, and DV item banks, likely because of the relatively small number of participants with bilateral vision impairment (∼36%). Targeting may be improved by adding items relevant to those at the less impaired end of the spectrum. This is relatively simple in item banking, where new, uncalibrated items are added to the bank and their calibration, relative to the existing items, is determined using Rasch analysis.
39 However, poor targeting of item difficulty to person impairment is largely overcome by CAT as the test is tailored to the individual's impairment level.
33
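One way to picture how a new item is calibrated relative to an existing bank is anchored estimation: person measures obtained from the already calibrated items are held fixed, and only the new item's difficulty is estimated. The sketch below assumes a dichotomous Rasch model and Newton-Raphson maximum likelihood. The function name, the step cap, and the tolerance are illustrative choices, not details taken from this study.

```python
import math


def calibrate_new_item(person_measures, responses, tol=1e-6, max_iter=50):
    """Estimate the difficulty of a new dichotomous item with person measures anchored.

    person_measures: measures (logits) from the existing calibrated bank.
    responses: 0/1 responses of the same persons to the new item.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("extreme item score: difficulty is not estimable")
    b = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(t - b))) for t in person_measures]
        expected = sum(probs)                       # expected item score at current b
        info = sum(p * (1.0 - p) for p in probs)    # information about b
        # Newton-Raphson step: raise b when the item is endorsed less than expected.
        step = (expected - score) / info
        step = max(-1.0, min(1.0, step))            # cap the step for stability
        b += step
        if abs(step) < tol:
            break
    return b
```

Because the person measures are fixed on the scale of the existing bank, the estimated difficulty is expressed on that same scale, which is what allows the new item to be administered alongside the old ones.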
Three item banks demonstrated suboptimal fit to key Rasch model criteria and were therefore not further considered in the current study. The OS item bank had poor measurement precision and dimensionality indicators, suggesting that ocular comfort symptoms may not be a relevant construct for people with DR/DME. The SC item bank only obtained adequate measurement precision after one third of participants with extreme scores (i.e., those who reported no problem to all items) were removed from the Rasch analysis, suggesting that restrictions in social life may only be relevant for those with substantial vision impairment from DR/DME. Finally, the rating scale of the EC domain displayed highly disordered thresholds. Although order was restored by collapsing categories from 5 to 3, the resulting loss of measurement precision meant that the efficiency of CAT for this item bank was compromised. Future work will involve crafting and pilot testing additional items to improve the psychometric properties of these item banks and ensure their suitability for CAT.
Our study demonstrates the potential for advancing QoL measurement in DR/DME using an item banking and CAT approach, which addresses the shortcomings associated with short-form paper-pencil questionnaires.
12,17,19 For example, our CAT simulation tests indicated that only six to seven items were needed to gain measurement of the emotional impact of DR/DME with a high degree of precision. Such brevity may reduce test takers' burden and increase motivation because items are tailored to their individual situation.
12 Brief questionnaires are also highly valued in clinical settings where clinicians may have very little time to quantify patients' QoL using a PRO. Moreover, as CAT automates scoring, results can be integrated promptly into patient feedback and treatment,
40,41 which aligns well with the recent push to incorporate collection of PRO data in clinical care.
7
As a result of these benefits, item banking and CAT are gaining momentum worldwide in health-related research, and item banks have been developed for cancer-related fatigue,
40,42 arthritis,
43 paediatrics,
44 spinal cord injury,
45 and low vision,
46 among others. Our rigorous methodology for the development and calibration of item banks is similar to that used by the PROMIS (Patient Reported Outcomes Measurement Information System) group, albeit with different IRT models for psychometric analysis and calibration (graded response model versus Rasch analysis). For example, Tulsky et al. recently provided a comprehensive description of the development of 14 unidimensional spinal cord injury QoL (SCI-QOL) item banks across physical, emotional, and social health domains, from qualitative content development to psychometric testing and item bank calibration and finally to CAT evaluation.
47 Item banks for vision-related activity limitation, symptoms and QoL have also recently been developed by Pesudovs et al. in cataract patients
33; however, most of the items relate to activity limitation and the QoL bank requires additional content to become a comprehensive measure.
33 Moreover, because the item banks were formed by pooling items from 19 extant vision-related activity limitation questionnaires rather than developing content anew from qualitative work and were validated only in cataract patients, their applicability to DR/DME patients is likely to be limited.
One strength of our study is the large proportion of participants with severe DR/DME, which is often lacking in related studies. Another is the sophisticated psychometric techniques used to ensure that item banks were calibrated without LID and to address minor item misfit without having to delete numerous items unnecessarily. Similarly, efforts were made to rehome groups of items contributing to multidimensionality into related item banks. However, a few limitations should be noted. For example, nearly two-thirds of the sample was male, which may imply a sex bias in the results, although this may simply reflect the higher prevalence of diabetes and diabetic complications in men. The relatively high correlations between some of our measures may support a multidimensional latent structure underlying DR-specific QoL. Our aim, however, was to produce unidimensional measurement tools that provide users with the ability to administer selected scales for a given purpose. Nevertheless, given that multidimensional IRT and bifactor models are available for use in item banking and CAT,
48–50 the potential to form a multidimensional DR/DME QoL item bank should be considered. For practical reasons, both face-to-face and phone interviews were conducted. Ideally, data collection should be restricted to a single method since mode of administration may affect data quality.
51 However, when we stratified the sample by mode of administration (n = 268 face to face; n = 246 phone), we found very similar psychometric properties between the two groups and no DIF for mode of interview (Supplementary Tables S14–S24). In addition, the long interview duration may have reduced data quality. However, participants were given opportunities to rest and complete the questions over two sessions if desired. Finally, as our cutoff for detecting LID was 0.3 rather than the more commonly accepted value of 0.2, we may have missed noteworthy LID, thus artificially inflating reliability and precision estimates. Similarly, we used a conservative cutoff for detecting DIF (>1.0) and therefore may have missed detecting and accounting for moderate to large DIF for some items.
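To make the effect of the LID cutoff concrete, the following sketch flags item pairs whose standardized Rasch residuals correlate above a chosen threshold. It assumes that LID is screened through residual correlations under a dichotomous Rasch model, which this section does not spell out. The function names and the data layout (one response list per item) are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residuals(item_responses, thetas, b):
    """Standardized residuals (observed minus expected, over model SD) for one item."""
    residuals = []
    for x, theta in zip(item_responses, thetas):
        p = rasch_probability(theta, b)
        residuals.append((x - p) / math.sqrt(p * (1.0 - p)))
    return residuals


def pearson(u, v):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((c - mv) ** 2 for c in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def flag_dependent_pairs(responses_by_item, thetas, difficulties, cutoff=0.3):
    """Return (i, j, r) for item pairs whose residual correlation r exceeds cutoff."""
    residuals = [
        standardized_residuals(r, thetas, b)
        for r, b in zip(responses_by_item, difficulties)
    ]
    flagged = []
    for i in range(len(residuals)):
        for j in range(i + 1, len(residuals)):
            r = pearson(residuals[i], residuals[j])
            if r > cutoff:
                flagged.append((i, j, r))
    return flagged
```

Running the same data with `cutoff=0.2` instead of `cutoff=0.3` can only return the same pairs or more, which is why the higher threshold risks leaving some dependent pairs undetected.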
Our item banks will be validated in a future study using CAT by assessing completion time and average number of items administered; content range coverage and test precision; temporal reliability; and criterion, convergent, and divergent validity. We are currently developing an online testing platform that can be deployed on various devices, such as an iPad, and can provide real-time scoring and recording of data. Given the rapidly increasing prevalence of diabetes and associated complications worldwide, the development of our item banks is timely. As recent advancements in treatments for DR and DME such as anti-VEGF therapy continue to gain momentum, a comprehensive PRO will be invaluable for use in clinical trials to compare the impact of novel treatment therapies from the patient's perspective. Similarly, the item banks will allow researchers and policy planners to effectively design and evaluate rehabilitation programs for DR/DME, and may also assist in identifying patients with specific QoL issues for timely referral for counselling or assistive services.
In summary, our eight item banks enable robust and comprehensive assessment of DR-specific QoL. CAT simulation results indicate that only a small number of items are required to obtain precise measurement of each QoL construct. Once validation using CAT is complete, our item banks offer clinicians and researchers the means to efficiently and accurately assess the impact of DR/DME and novel treatment therapies on eight aspects of QoL. In particular, relatively novel constructs such as Mobility, Health Concerns, Convenience, Driving, and Lighting can be explored in patients with DR/DME for the first time.
The authors thank Mike Linacre, Alan Tennant, and John Barnard for advice and support on the Rasch analysis and computerized adaptive testing simulations conducted in this study. The authors also thank the anonymous reviewers of this manuscript who provided comments and edits that enhanced the quality of this work.
Supported by National Health and Medical Research Council Centre for Clinical Research Excellence (CCRE) Grant 529923 (Translational Clinical Research in Major Eye Diseases); CCRE Diabetes; Novartis Pharmaceuticals Australia Grant CRFB002DAU09T; and the Royal Victorian Eye and Ear Hospital. EKF is funded by Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship 1072987. GR is funded by NHMRC Career Development Award 1061801. The Centre for Eye Research Australia receives Operational Infrastructure Support from the Victorian Government.
The sponsor or funding organization had no role in the design or conduct of this research.
Disclosure: E.K. Fenwick, None; J. Khadka, None; K. Pesudovs, None; G. Rees, None; T.Y. Wong, None; E.L. Lamoureux, None