Lei et al.[1] evaluated the test–retest reliability of a new method to measure the melanopsin-driven postillumination pupil response (PIPR) induced by hemifield, central-field, and full-field light stimulation. The new PIPR method is useful in clinical practice. We would like to share our opinion and expertise with the authors and other readers in order to broaden the discussion.
Table 3 shows a large variation in the measurements. The 95% confidence interval of the intraclass correlation coefficient (ICC) was 0.69 to 0.96 for the lower hemifield and 0.71 to 0.97 for the upper hemifield. These values undermine the clinical utility of this method for assessing the PIPR. Merely reporting the ICC and coefficient of variation is not adequate to demonstrate whether a method is precise enough for clinical diagnostics. The authors should perform more rigorous statistical analyses and report the statistical parameters suggested by the British and International Standards, such as test–retest repeatability (TRT).[2–5] Test–retest repeatability is defined as 2.77 times the within-subject standard deviation (2.77 ≈ 1.96 × √2), which gives the interval within which 95% of the differences between repeated measurements are expected to lie.
An important question is whether a sample size of 20 visually normal subjects is sufficient for such an interesting study. The authors need to provide statistical power calculations for the ICC and coefficient of variation. It is more appropriate to use the sample size calculation method suggested by Bland and Altman.[3,6] For precision studies, the sample size calculation is independent of the instrument or technology. The only variable factors are the number of repeated measurements (m) and the confidence with which precision is estimated, typically 10%, which corresponds to a confidence of 0.10. So the formula is

$$0.10 = \frac{1.96}{\sqrt{2n(m-1)}}$$

where n is the sample size and m is the number of repeated measures. Therefore, if the authors assessed reproducibility with three repeated measurements and wanted the precision estimate to lie within a 10% confidence interval, they would require the following sample size:

$$0.10 = \frac{1.96}{\sqrt{2n(3-1)}}$$

Therefore, n = 96 subjects.

Alternatively, a wider confidence interval of 20% would require

$$0.20 = \frac{1.96}{\sqrt{2n(3-1)}}$$

Therefore, n = 24 subjects.
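The arithmetic above can be checked with a short sketch. It assumes the relation confidence = 1.96 / √(2n(m − 1)), which reproduces the 96 and 24 subjects quoted above when solved for n. The function name and parameter names are illustrative only.

```python
def precision_sample_size(m, confidence, z=1.96):
    """Solve confidence = z / sqrt(2 * n * (m - 1)) for n.

    m          -- number of repeated measurements per subject (must be >= 2)
    confidence -- desired confidence interval half-width as a fraction of the
                  within-subject SD, e.g. 0.10 for 10%
    Returns the unrounded n; the letter rounds it to the nearest integer.
    """
    if m < 2:
        raise ValueError("at least two repeated measurements are required")
    return z ** 2 / (2 * (m - 1) * confidence ** 2)


n_10 = precision_sample_size(m=3, confidence=0.10)  # about 96.04
n_20 = precision_sample_size(m=3, confidence=0.20)  # about 24.01
```

Because n scales with the inverse square of the chosen confidence, halving the interval from 20% to 10% quadruples the required number of subjects.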
We encourage the authors to recruit more subjects to reinforce the quality of their research and improve the clinical application of PIPR testing.
The lower and upper hemifield stimulations were repeated in a second session scheduled within 1 month of the first session. Read and Collins[7] demonstrated that significant physiological diurnal variation occurs in corneal thickness and shape, including the anterior and posterior corneal surfaces. The melanopsin-containing intrinsically photosensitive retinal ganglion cells (ipRGCs) also exhibit circadian rhythm variations.[8] It is unclear whether the authors performed the measurements at the same time of day in both sessions. To investigate the intersession reproducibility, we suggest that measurements be performed at the same time of day as in the first session, by the same examiner, using the same protocol, to minimize ipRGC circadian variations. It is also important to test the reproducibility of the new method in different diseases, such as age-related macular degeneration and glaucoma, where higher variability than in normal subjects may be expected.