Abstract
Purpose:
Patient symptom diaries are commonly used to collect efficacy data in clinical trials, such as those for dry eye treatments. Typically, patients are asked to report the severity of several ocular symptoms multiple times per day over the course of a study that may last weeks or even months. Missing data can rarely be avoided when patients are asked to complete many diary items. There are several possible ways to handle missing diary data, each of which will result in a different statistical outcome. The goal of imputing missing diary data is to approximate as closely as possible the statistical outcome that would have been achieved with no missing data.
Methods:
5000 sets of diary data were randomly generated from a multivariate normal distribution for two treatment groups (active and placebo). For each simulation, a complete two weeks of diary data was generated for 50 subjects per treatment, assuming a treatment mean difference of 0.6 on a scale of 0-5, a standard deviation of 1, and a correlation of 0.85 between diary days. Two percent of the data were randomly set as missing, and ten percent of the subjects were randomly selected as early withdrawals. Several imputation methods were used to handle the missing data: (1) last observation carried forward (LOCF); (2) baseline observation carried forward (BOCF); (3) worst observation carried forward (WOCF); (4) subject mean; (5) treatment group mean. A mixed model accounting for repeated measures within each subject was used for statistical analysis. The percentage of simulations in which the results indicated a significant treatment difference under each imputation method was compared with the corresponding percentage for the complete simulated data and for observed data only (ODO).
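The simulation and the five imputation rules described above can be illustrated with a minimal sketch. It is not the authors' code, and it covers a single replicate without the mixed-model analysis. Several details are assumptions that the abstract does not state: an equal correlation of 0.85 between every pair of days, a hypothetical placebo mean of 3.0 with the active arm lower by 0.6, day 1 treated as the baseline and always observed, higher scores meaning worse symptoms, a uniformly random withdrawal day, WOCF using the worst value observed so far, and group-mean imputation using the per-day mean of the subject's own treatment arm. Scores are not clipped to the 0-5 range.

```python
import random
from statistics import mean

N_DAYS, N_PER_ARM, RHO = 14, 50, 0.85  # values taken from the abstract
PLACEBO_MEAN, EFFECT = 3.0, 0.6        # placebo mean is a hypothetical value
METHODS = ["LOCF", "BOCF", "WOCF", "SUBJ_MEAN", "GROUP_MEAN"]


def simulate_arm(rng, mu):
    """Equicorrelated normal diaries: SD 1, correlation RHO between any two days."""
    rows = []
    for _ in range(N_PER_ARM):
        shared = rng.gauss(0, 1)  # subject-level component shared by all days
        rows.append([mu + RHO ** 0.5 * shared + (1 - RHO) ** 0.5 * rng.gauss(0, 1)
                     for _ in range(N_DAYS)])
    return rows


def add_missing(rng, rows, p_missing=0.02, p_withdraw=0.10):
    """Blank ~2% of post-baseline entries and truncate 10% of subjects."""
    out = [[None if day > 0 and rng.random() < p_missing else v
            for day, v in enumerate(row)] for row in rows]
    for i in rng.sample(range(len(out)), round(p_withdraw * len(out))):
        first_lost = rng.randint(1, N_DAYS - 1)  # withdrawal day: an assumption
        out[i][first_lost:] = [None] * (N_DAYS - first_lost)
    return out


def impute(row, method, day_means=None):
    """Fill the None entries of one subject's diary; row[0] must be observed."""
    observed = [v for v in row if v is not None]
    filled, last, worst = [], row[0], row[0]
    for day, v in enumerate(row):
        if v is not None:
            last, worst = v, max(worst, v)  # assumes higher score = worse
        elif method == "LOCF":
            v = last
        elif method == "BOCF":
            v = row[0]
        elif method == "WOCF":
            v = worst
        elif method == "SUBJ_MEAN":
            v = mean(observed)
        elif method == "GROUP_MEAN":
            v = day_means[day]
        else:
            raise ValueError(f"unknown method: {method}")
        filled.append(v)
    return filled


rng = random.Random(2024)
arms = {"placebo": simulate_arm(rng, PLACEBO_MEAN),
        "active": simulate_arm(rng, PLACEBO_MEAN - EFFECT)}

imputed = {}
for arm, complete in arms.items():
    observed = add_missing(rng, complete)
    # Per-day mean of the observed values within this treatment arm.
    day_means = [mean(r[d] for r in observed if r[d] is not None)
                 for d in range(N_DAYS)]
    for m in METHODS:
        imputed[arm, m] = [impute(r, m, day_means) for r in observed]

for m in METHODS:
    diff = (mean(map(mean, imputed["placebo", m]))
            - mean(map(mean, imputed["active", m])))
    print(f"{m:10s} estimated treatment difference: {diff:.2f}")
```

Repeating this over many replicates and fitting a repeated-measures model to each imputed data set would give the percentage of significant results per method; that step is omitted here to keep the sketch dependency-free.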
Results:
Compared with the analysis based on the complete simulated data, treatment group mean imputation (5) yielded an artificially higher percentage of significant results, whereas BOCF (2) yielded an artificially lower percentage. All other methods yielded percentages of significance similar to that of the analysis based on the complete simulated data.
Conclusions:
All of the above imputation methods are valid for missing data handling. Applying more than one imputation method as a sensitivity analysis is recommended in clinical research. Methods that yield percentages of significance similar to that based on the complete simulated data are recommended.
Keywords: 459 clinical (human) or epidemiologic studies: biostatistics/epidemiology methodology • 473 computational modeling