Abstract
Purpose:
We previously developed a Diplopia Questionnaire that captures the frequency of diplopia on a 5-point Likert scale (always, often, sometimes, rarely, and never) in specific gaze positions (reading, distance straight ahead, right, left, up, down), to incorporate patient-reported symptoms into outcome assessment for strabismus. For some analyses of diplopia, we have proposed defining “success” as “never or rarely” in distance straight ahead and reading positions. We investigated the test-retest reliability of such a classification.
Methods:
Sixty-four adults (18 to 87 years old) with stable strabismus and no intervention within the previous 6 months completed the Diplopia Questionnaire at a clinic visit and again at least 5 days later (range 5 to 154 days), 30 (47%) by mail and 34 (53%) at a return visit. Strabismus types included childhood, neurologic, and mechanical. No change in treatment was allowed between test and retest. For analysis, we categorized patients by defining “success” as “never or rarely” in both the distance straight ahead and reading positions. Agreement was assessed by calculating kappa values and frequencies.
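As an illustration only, the dichotomization described above can be sketched in a few lines of Python. The function name, dictionary keys, and example responses below are hypothetical and are not taken from the questionnaire's scoring materials; the sketch assumes that both gaze positions must be rated “never” or “rarely” for a patient to count as a success.

```python
# Hypothetical sketch of the "success" rule; names and keys are illustrative.
SUCCESS_RESPONSES = {"never", "rarely"}
KEY_POSITIONS = ("distance straight ahead", "reading")


def is_success(responses):
    """Return True if diplopia is 'never' or 'rarely' in both key positions."""
    return all(responses[position] in SUCCESS_RESPONSES for position in KEY_POSITIONS)


# Made-up responses for one patient at test and retest.
test_visit = {"distance straight ahead": "rarely", "reading": "never"}
retest_visit = {"distance straight ahead": "sometimes", "reading": "never"}

print(is_success(test_visit))    # True
print(is_success(retest_visit))  # False
```

In this made-up example, a one-step shift from “rarely” to “sometimes” in a single position is enough to change the classification.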
Results:
Of the 64 patients, 20 (31%) would have been classified as success at the first exam and 19 (30%) at the second exam. Even though agreement would have been designated “substantial” with a kappa of 0.74 (95% CI 0.56 to 0.92), 4 patients (6%, 95% CI 2% to 15%) would have been classified as success on the first exam but not on the second, and 3 (5%, 95% CI 1% to 13%) would have been classified as success on the second but not the first. Of these 7 discrepancies (11%, 95% CI 5% to 21%), the most common was a difference between “sometimes” and “rarely” (5/7, 71%). Discrepancies occurred across the spectrum of strabismus types and ages.
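The reported counts determine the full 2×2 test-retest table (16 success at both exams, 4 at the first only, 3 at the second only, 41 at neither), so the kappa value can be checked directly. The sketch below computes Cohen's kappa from those counts. The confidence interval uses a simple large-sample standard error, which is an assumption: the abstract does not state which interval method was used.

```python
import math

# 2x2 test-retest table reconstructed from the counts reported in Results.
n = 64
first_only = 4     # success at first exam, not at second
second_only = 3    # success at second exam, not at first
both_success = 20 - first_only                                # 16
both_failure = n - both_success - first_only - second_only    # 41

# Observed agreement and agreement expected by chance.
p_observed = (both_success + both_failure) / n
p_first = (both_success + first_only) / n      # success rate, first exam
p_second = (both_success + second_only) / n    # success rate, second exam
p_expected = p_first * p_second + (1 - p_first) * (1 - p_second)

kappa = (p_observed - p_expected) / (1 - p_expected)

# Assumed interval method: simple large-sample standard error of kappa.
se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
ci_low, ci_high = kappa - 1.96 * se, kappa + 1.96 * se

print(f"kappa = {kappa:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
# kappa = 0.74, 95% CI 0.56 to 0.92
```

Observed agreement here is 57/64 (89%), yet 7 of 64 patients still change category, which is why a “substantial” kappa can coexist with clinically relevant misclassification.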
Conclusions:
Despite the importance of incorporating patient report into outcome assessment, test-retest variability may lead to misclassification, particularly when data are dichotomized into “success” and “failure.” Alternative strategies for cohort studies include scoring on a continuous scale (as we previously described for the Diplopia Questionnaire). Other approaches might include using visual analog scales rather than descriptors, or providing standardized definitions of those descriptors.