Abstract
Purpose:
There are many instances in which laboratory measurements have limits of detection. The results for such tests will be reported as "< d" where d represents the limit below which measurements cannot be (reliably) obtained. A number of methods can be used to handle such data when calculating numerical summary statistics. Here, we use the example of IL-6 tear cytokine levels to describe a simple, reliable method to handle such instances, and compare this method to other commonly used approaches.
Methods:
Four common methods for calculating summary statistics in the presence of non-detectible data are: (1) set the values to 0; (2) set the values to d (the detection threshold); (3) set the values to d/2, the midpoint between 0 and d; (4) delete or ignore the values. Our method, which is based on order statistics, involves fitting a distribution to the observed data, assuming that the non-detectible data follow the same distribution as the observed data. The mean and standard deviation of the data are then estimated directly from the fitted distribution.
A simulation was performed based on data distributions seen in actual studies of IL-6 tear cytokine data. Thirty data points were simulated from a normal distribution with a mean of 5 and a standard deviation of 1. Five threshold levels were evaluated: 3, 3.5, 4, 4.5, and 5. Based on these thresholds, between 3% and 60% of the data were non-detectible and set to "< d". The means of the resulting data were then calculated using the four simple methods listed above as well as the method based on order statistics.
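The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the distribution is fitted, so the sketch uses one common order-statistics approach, a least-squares line through the observed order statistics plotted against normal quantiles, with Blom plotting positions (an assumption). The function names `substitution_means`, `order_statistic_fit`, and `simulate` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def substitution_means(observed, n_censored, d):
    """Mean under each of the four simple rules for values reported as '< d'."""
    n = len(observed) + n_censored
    total = sum(observed)
    return {
        "zero": total / n,                          # censored values set to 0
        "threshold": (total + n_censored * d) / n,  # censored values set to d
        "midpoint": (total + n_censored * d / 2) / n,  # censored values set to d/2
        "delete": total / len(observed),            # censored values dropped
    }


def order_statistic_fit(observed, n_censored):
    """Estimate (mean, sd) of a normal distribution from the observed values.

    Assumption: the censored values occupy the lowest ranks, so the observed
    values are the order statistics with ranks n_censored+1 .. n. Regressing
    them on the matching standard normal quantiles gives the mean as the
    intercept and the standard deviation as the slope.
    """
    n = len(observed) + n_censored
    ys = sorted(observed)
    std_normal = NormalDist()
    # Blom plotting positions (assumed; other conventions exist).
    zs = [std_normal.inv_cdf((rank - 0.375) / (n + 0.25))
          for rank in range(n_censored + 1, n + 1)]
    z_bar, y_bar = mean(zs), mean(ys)
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    return intercept, slope


def simulate(d, n=30, mu=5.0, sigma=1.0, seed=0):
    """Draw n normal values, censor those below d, and compare the estimates."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    observed = [x for x in data if x >= d]
    n_censored = n - len(observed)
    results = substitution_means(observed, n_censored, d)
    results["order_stat"], _ = order_statistic_fit(observed, n_censored)
    return results


if __name__ == "__main__":
    for d in (3, 3.5, 4, 4.5, 5):
        print(d, {k: round(v, 2) for k, v in simulate(d).items()})
```

A single run of 30 points is noisy, so a fair comparison would average each estimate over many seeds at each threshold. The fit needs at least two detected values to define a line.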
Results:
The common methods either overestimate the mean (setting values to the threshold, or deleting them) or underestimate it (setting values to 0 or to the midpoint). In contrast, the method based on order statistics gave the most accurate estimate of the true mean of the data. It outperformed the other methods at every threshold, and its advantage grew as the threshold, and with it the percentage of non-detectible data, increased.
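The direction of each bias can be checked analytically under the simulation's assumptions. The following is a sketch derived from the standard mean of a truncated normal distribution, not a result stated in the abstract. The symbols are introduced here for illustration: c is the constant substituted for values below d, α = (d − μ)/σ, and φ and Φ are the standard normal density and distribution function.

```latex
% Expected sample mean when every value below d is replaced by a constant c
\mathbb{E}[\bar{X}_c]
  = \Phi(\alpha)\,c + \bigl(1 - \Phi(\alpha)\bigr)\,\mathbb{E}[X \mid X \ge d]
  = \Phi(\alpha)\,c + \bigl(1 - \Phi(\alpha)\bigr)\,\mu + \sigma\,\phi(\alpha)

% Bias relative to the true mean \mu
\operatorname{Bias}(c) = \Phi(\alpha)\,(c - \mu) + \sigma\,\phi(\alpha)

% Substituting the threshold (c = d): always positive
\operatorname{Bias}(d) = \sigma\bigl(\alpha\,\Phi(\alpha) + \phi(\alpha)\bigr) > 0

% Deleting the censored values: always positive
\mathbb{E}[X \mid X \ge d] - \mu = \sigma\,\frac{\phi(\alpha)}{1 - \Phi(\alpha)} > 0

% Substituting 0 or d/2: negative whenever the first term dominates
\operatorname{Bias}(0) = \sigma\,\phi(\alpha) - \mu\,\Phi(\alpha), \qquad
\operatorname{Bias}(d/2) = \sigma\,\phi(\alpha) - \bigl(\mu - \tfrac{d}{2}\bigr)\,\Phi(\alpha)
```

For the simulated values μ = 5 and σ = 1, both Bias(0) and Bias(d/2) are negative at all five thresholds. At d = 3, for example, α = −2 gives Bias(0) ≈ −0.060 and Bias(d/2) ≈ −0.026, while Bias(d) ≈ +0.008 and the deletion bias is ≈ +0.055.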
Conclusions:
When non-detectible data exist, as is often the case for tear cytokine data, common methods of data substitution or deletion introduce bias into the calculated mean. A method based on order statistics greatly improves the accuracy of the estimated mean.
Keywords: clinical (human) or epidemiologic studies: biostatistics/epidemiology methodology • cytokines/chemokines • detection