Suppose that three HRT images are acquired at each patient visit and that visits occur at regular intervals during clinical follow-up. After registration of the image series, the topographic height at each individual pixel is considered in turn. Visually, this can be done by plotting the topographic height at each pixel as a time series (Fig. 1). Next, a suitable statistic is derived to summarize the change, or stability, of the topographic height at that pixel over time: the slope of the line of best fit derived from ordinary least-squares regression. The standard error (SE) of this slope indicates how well the data fit the linear trend, with relatively high values indicating a poor fit or a noisy series of observations. Our test statistic at each pixel is simply the absolute value of the slope divided by its SE. A relatively large test statistic is evidence of clear linear change in topographic height at that pixel. This process is performed at all pixels, and the patient’s series of data is reduced to a
statistic image: no longer a physiological image, but a 256 × 256-pixel map of statistics summarizing change within the image. The next step is to determine whether the observed test statistic at each pixel is more extreme than would be expected by chance. The significance of the test statistic is not assessed in the conventional manner, by treating the observed test statistic as a random variable from a probability model; instead, we use a permutation test. We shuffle, or relabel, the order of the observed images and recalculate the test statistic for each possible permutation of that order. If we let
$N$ denote the number of all possible labelings and $t_i$ the statistic corresponding to labeling $i$, then the set of $t_i$ for all possible relabelings constitutes the permutation distribution. For example, there would be 369,600 [12!/(3! × 3! × 3! × 3!)] labelings in a series of four clinical visits with three scans at each visit (see Appendix for more details of this calculation). We then assume that all the $t_i$ are equally likely and determine the significance of the observed test statistic by calculating the proportion of the permutation distribution that is as extreme as, or more extreme than, our observed value, giving us our P-value. If
P is, for example, less than 5%, we label the pixel as active, or changing. (We therefore assume that images acquired at the same visit are no more correlated than images acquired at different visits. Previous work on the influence of time separation on interimage topographic variability supports the intuition behind this approach.³²) This
permutation test is performed pixel by pixel, and the statistic image is thresholded at the 5% level, with pixels flagged if they are significant (Fig. 1). In practice, a sample of 1000 randomizations (drawn without replacement from all the possible labelings) is used to generate the permutation distribution.³³,³⁴ This eases the computational burden but still allows for a statistically exact test at standard significance levels. (Larger samples would be needed to evaluate P < 0.1%.)
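
The procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes NumPy, that each image carries the time of its visit as its label, and that the observed labeling is counted as one member of the sampled permutation distribution. The function names (`slope_t_statistic`, `n_labelings`, `permutation_p_values`) are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def slope_t_statistic(times, heights):
    """Return |OLS slope| / SE(slope) for every pixel.

    times   -- shape (n_images,), visit time attached to each image
    heights -- shape (n_images, n_pixels), registered topographic heights
    """
    t = np.asarray(times, dtype=float)
    y = np.asarray(heights, dtype=float).reshape(t.size, -1)
    tc = t - t.mean()
    sxx = np.sum(tc ** 2)
    slope = tc @ (y - y.mean(axis=0)) / sxx
    intercept = y.mean(axis=0) - slope * t.mean()
    residuals = y - (intercept + np.outer(t, slope))
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    se = np.sqrt(np.sum(residuals ** 2, axis=0) / (t.size - 2) / sxx)
    return np.abs(slope) / se


def n_labelings(times):
    """Count distinct labelings, e.g. 12! / (3! * 3! * 3! * 3!)."""
    _, counts = np.unique(times, return_counts=True)
    total = math.factorial(int(counts.sum()))
    for c in counts:
        total //= math.factorial(int(c))
    return total


def permutation_p_values(times, heights, n_perm=1000, seed=0):
    """Per-pixel P-values from distinct random relabelings of the images.

    Assumption: the observed labeling is included as one of the n_perm
    labelings, so the smallest attainable P-value is 1 / n_perm.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    y = np.asarray(heights, dtype=float).reshape(t.size, -1)
    observed = slope_t_statistic(t, y)
    n_perm = min(n_perm, n_labelings(t))
    seen = {tuple(t)}            # labelings already used (no replacement)
    exceed = np.ones(y.shape[1])  # the observed labeling counts once
    while len(seen) < n_perm:
        relabeled = rng.permutation(t)
        key = tuple(relabeled)
        if key in seen:
            continue  # skip repeats so labelings are drawn without replacement
        seen.add(key)
        exceed += slope_t_statistic(relabeled, y) >= observed
    return exceed / n_perm
```

For a series of 256 × 256-pixel images, `heights` would be the registered image stack reshaped to one column per pixel, and pixels with a returned value below 0.05 would be the ones flagged as changing. Rejecting repeated labelings is only one way to sample without replacement; it is adequate here because 1000 draws are a small fraction of the 369,600 possible labelings.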