Purpose. The temporal artery biopsy (TAB) has long been the standard for diagnosing temporal arteritis (TA), but in practice this test is less than 100% sensitive; false-negative biopsy results are known to occur. The true sensitivity of a single TAB cannot be directly observed, because there is no true gold standard for comparison. The authors propose a mathematical method for calculating the true sensitivity of the TAB, using data from published bilateral TAB results.

Methods. Based on Bayesian methodology, this statistical technique can be used to calculate the true sensitivity of a single TAB with data from studies reporting the results of bilateral simultaneous TABs. This technique also allows for calculation of the true prevalence of TA in a study population. Bootstrap techniques are used to provide confidence intervals. This technique is applied to data derived from four studies in the literature.

Results. With this methodology, the sensitivity of a single TAB is calculated to be 87.1% (95% confidence interval, 81.8%–91.7%).

Conclusions. Knowledge of the true sensitivity of any imperfect test is necessary for an accurate decision analysis, because it can affect the optimal diagnostic–therapeutic pathway. Although few studies report results of bilateral simultaneous TABs, such data are important because they permit the calculation of the true TAB sensitivity. The authors believe that this mathematical method is superior to observational methods (e.g., clinical criteria) for estimating the true sensitivity of a TAB.

*gold standard* tests.

^{ 1 }These tests are rarely used in clinical practice, either because they do not exist or because their use is not economically feasible. As a result, diagnostic tests with intrinsic error are frequently used instead.

^{ 2 }Knowing the sensitivity of an imperfect test is important for decision analysis, because it is a critical parameter that can affect the optimal pathway in a diagnostic or therapeutic decision model.

^{ 3 }We recently pooled data from 17 studies reporting the sensitivity of a unilateral TAB in clinical series, revealing a sensitivity of approximately 87% for a unilateral TAB.

^{ 4 }In these studies, the ultimate standard used most often for comparison was clinical criteria (i.e., the patient was judged clinically to have TA despite a negative biopsy). However, clinical diagnosis of TA also has its own false-negative rate and does not detect all patients who truly have TA. Because no true gold standard confirmatory test exists, the sensitivity of the TAB cannot be directly measured, and reported values therefore represent best estimates.

^{ 5 }

^{ 1 }

^{ 2 }

^{ 6 }

^{ 7 }

^{ 8 }

^{ 9 }All such approaches seek to estimate at least three parameters: test sensitivity, test specificity, and prevalence of disease in the study population. Because these parameters are not directly measurable from experimental data, they are called *latent*; such methods are often referred to collectively as *latent class analysis*. Some techniques allow for generalization to multiple test repetitions (e.g., the same test performed *n* times) and multiple cutoff values (e.g., low/moderate/high risk of disease instead of simply absent/present). Although powerful, such methods can be mathematically complex. Many of these methods require data from four or more tests to derive meaningful results,^{ 1 } whereas temporal artery biopsies are almost never performed more than twice.

$N_{ij} = A_{ij} + B_{ij}$, where $A_{ij}$ is the number of patients in that cell who are truly disease positive and $B_{ij}$ is the number of patients who are truly disease negative. In this expression, *i* and *j* represent the results of the first and second TABs, respectively. Figure 1 depicts the contingency tables demonstrating these definitions.
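To make the decomposition concrete, the short simulation below draws each patient's latent disease status and tallies the truly positive ($A_{ij}$) and truly negative ($B_{ij}$) patients in each cell; only their sum $N_{ij}$ would be observable in a real study. It is a hypothetical illustration: the function name, the parameter values, the assumption that the two biopsies are independent given disease status, and the assumption that a biopsy in a disease-free patient is always negative (100% specificity) are choices made for this sketch, not data from the studies analyzed here.

```python
import random
from collections import Counter


def simulate_bilateral_tabs(n, prevalence, sensitivity, seed=0):
    """Hypothetical simulation of n patients undergoing bilateral simultaneous TABs.

    Diseased patients: each biopsy is independently positive with probability
    `sensitivity`. Disease-free patients: both biopsies negative (specificity
    assumed to be 100%). Returns Counters A (truly positive) and B (truly
    negative), keyed by the (first, second) biopsy result pair.
    """
    rng = random.Random(seed)
    A, B = Counter(), Counter()
    for _ in range(n):
        if rng.random() < prevalence:  # latent disease status D
            i = "+" if rng.random() < sensitivity else "-"
            j = "+" if rng.random() < sensitivity else "-"
            A[(i, j)] += 1
        else:
            B[("-", "-")] += 1
    return A, B


A, B = simulate_bilateral_tabs(1000, prevalence=0.25, sensitivity=0.85)
cells = [("-", "-"), ("-", "+"), ("+", "-"), ("+", "+")]
# Only N_ij = A_ij + B_ij is observable in a real study.
N = {cell: A[cell] + B[cell] for cell in cells}
print(N)
```

Under the 100% specificity assumption, every discordant or concordant-positive pair comes from a diseased patient, while the (− −) cell mixes false-negative pairs with true negatives.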

*not* known, sensitivity, specificity, and disease prevalence are all *latent* variables; they are not directly obtainable from experimental values and thus must be derived mathematically.

^{ 1 }and Su et al.^{ 6 } These methods, in turn, represent modifications of the original Hui and Walter^{ 8 } algorithm for estimating such parameters from correlated binary tests.

^{ 6 }

^{ 7 }“Simultaneous” bilateral biopsies are performed when the surgeon plans from the start to sample both sides regardless of the result of the first biopsy. With “sequential” bilateral biopsies, the decision to perform a second biopsy depends on the result of the first. For example, if the first side is positive for disease (on frozen or permanent section), the surgeon may choose not to perform the contralateral biopsy. If the first side is negative, the surgeon will then proceed with the contralateral biopsy if there is a high suspicion of disease. Note that the definitions of these protocols do not depend on the actual timing of the two biopsies; whether the procedures take place on the same or different days is irrelevant. Our method is applicable only to studies reporting results of bilateral simultaneous biopsies.

*N* patients undergo bilateral simultaneous TABs. There are four possible outcomes when two simultaneous biopsies are performed: (− −), (− +), (+ −), and (+ +). Of these four possibilities, (− +) and (+ −) represent *discordant* pairs, while (− −) and (+ +) represent *concordant* pairs.

*D* represents true disease status and *i*, *j* are the results of TAB #1 and TAB #2, respectively. These probabilities are directly related to the sensitivity ($S_n$) and specificity ($S_p$) of a single TAB, as well as to the prevalence of disease (π) in the study population, by the following expressions:
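For orientation, the standard latent-class form of these cell probabilities is shown below, assuming the two biopsies are conditionally independent given *D* and share the same $S_n$ and $S_p$. This is the textbook setup rather than a verbatim copy of the authors' numbered equations.

$$
\begin{aligned}
P(i,j) &= \sum_{D} P(D)\,P(i \mid D)\,P(j \mid D)\\
P(-,-) &= \pi\,(1-S_n)^2 + (1-\pi)\,S_p^2\\
P(-,+) = P(+,-) &= \pi\,S_n(1-S_n) + (1-\pi)\,S_p(1-S_p)\\
P(+,+) &= \pi\,S_n^2 + (1-\pi)\,(1-S_p)^2
\end{aligned}
$$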

Let *X* equal the proportion of biopsy pairs that are both negative for disease (concordant negative); thus, *X* equals the number of (− −) pairs divided by the total number of biopsy pairs. Let *Y* equal the proportion of biopsy pairs that are discordant; thus, *Y* equals the sum of (− +) and (+ −) pairs divided by the total number of biopsy pairs. Let *Z* equal the proportion of biopsy pairs that are both positive for disease (concordant positive); thus, *Z* equals the number of (+ +) pairs divided by the total number of biopsy pairs.

*X*, *Y*, and *Z* are all experimentally derived values that are reported by any study describing the results of bilateral simultaneous biopsies. Note that, by definition, $X + Y + Z = 1$ (that is, the sum of the proportions of all biopsy-pair outcomes must equal 100%).
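Computing *X*, *Y*, and *Z* from a study's reported cell counts is a one-step normalization by the number of biopsy pairs. The sketch below uses a hypothetical helper name and hypothetical counts (70, 3, 2, and 25 pairs), not figures from the studies in Table 1.

```python
def pair_proportions(n_neg_neg, n_neg_pos, n_pos_neg, n_pos_pos):
    """Return (X, Y, Z) from the four observed bilateral-TAB cell counts.

    X: concordant-negative pairs, Y: discordant pairs, Z: concordant-positive
    pairs, each as a proportion of all biopsy pairs (so X + Y + Z == 1).
    """
    total = n_neg_neg + n_neg_pos + n_pos_neg + n_pos_pos
    if total == 0:
        raise ValueError("no biopsy pairs reported")
    x = n_neg_neg / total
    y = (n_neg_pos + n_pos_neg) / total
    z = n_pos_pos / total
    return x, y, z


# Hypothetical counts for illustration only.
x, y, z = pair_proportions(70, 3, 2, 25)
print(x, y, z)
```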

*X*, *Y*, and *Z* can all be defined by the following expressions, which match predicted values with experimental values. Referring back to the equations shown in (4):
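Assuming the two biopsies are conditionally independent given disease status and that a positive TAB is 100% specific ($S_p = 1$, consistent with the later treatment of π and $S_n$ as the only two unknowns), the predicted proportions take the form below; the numbering of the authors' own equations may differ.

$$
\begin{aligned}
X &= \pi\,(1-S_n)^2 + (1-\pi)\\
Y &= 2\pi\,S_n(1-S_n)\\
Z &= \pi\,S_n^2
\end{aligned}
$$

These three expressions sum to 1 for any π and $S_n$, so only two of them carry independent information.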

$S_n$) and at least two equations, the expressions can be solved algebraically for each variable. These equations can be solved in any combination. The simplest combination algebraically is first to solve equation 7 for π, yielding:

$S_n$, the true sensitivity of a single TAB, in terms of *Y* and *Z*:
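Assuming the $S_p = 1$ forms $Y = 2\pi S_n(1-S_n)$ and $Z = \pi S_n^2$, the elimination of π can be sketched as follows (our rendering of the algebra, not a verbatim copy of the authors' numbered equations):

$$
\begin{aligned}
\pi &= \frac{Z}{S_n^2}\\
Y &= 2\,\frac{Z}{S_n^2}\,S_n(1-S_n) = \frac{2Z(1-S_n)}{S_n}\\
Y\,S_n &= 2Z - 2Z\,S_n\\
S_n &= \frac{2Z}{Y + 2Z}
\end{aligned}
$$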

$S_n$ back into equation 8, one can solve for π, the true prevalence of disease in the study population:
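Under the same assumptions (conditionally independent biopsies, $S_p = 1$), substituting $S_n = 2Z/(Y+2Z)$ into $\pi = Z/S_n^2$ gives $\pi = (Y+2Z)^2/(4Z)$. A minimal Python sketch of both closed-form estimates follows; the function name is hypothetical, and the example proportions are illustrative rather than taken from Table 1.

```python
def tab_sensitivity_prevalence(y, z):
    """Closed-form estimates from bilateral simultaneous TAB proportions.

    y: proportion of discordant pairs; z: proportion of concordant-positive pairs.
    Assumes conditionally independent biopsies and 100% specificity.
    Returns (sensitivity, prevalence).
    """
    if z <= 0:
        raise ValueError("need at least one concordant-positive pair")
    sensitivity = 2 * z / (y + 2 * z)
    prevalence = (y + 2 * z) ** 2 / (4 * z)
    return sensitivity, prevalence


# Illustrative proportions: Y = 0.05, Z = 0.25.
sn, prev = tab_sensitivity_prevalence(0.05, 0.25)
print(round(sn, 4), round(prev, 4))  # 0.9091 0.3025
```

Plugging any chosen π and $S_n$ into $Y = 2\pi S_n(1-S_n)$ and $Z = \pi S_n^2$ and then calling the function recovers the same π and $S_n$, which is a quick check that the algebra is internally consistent.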

*R*” (ver. 2.3.1 for Windows [Microsoft Corp., Redmond, WA]).^{ 10 } “*R*” is a free software package for statistical computing and graphics that is available for download at http://www.R-project.org. The procedure is performed by randomly sampling with replacement *n* data points from each study population of *n* subjects. Six thousand bootstrap samples were used for each calculation to ensure the stability of the interval estimates, given the random nature of the sampling.
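The resampling step can be sketched as follows, in Python rather than R. Each replicate resamples the *n* patients' biopsy pairs with replacement and recomputes the closed-form estimates under the conditional-independence, 100%-specificity assumptions. This sketch uses a simple percentile interval, which may not be the exact interval type the authors computed, and the function name and patient data are hypothetical.

```python
import random


def bootstrap_ci(pairs, n_boot=6000, alpha=0.05, seed=1):
    """Percentile bootstrap CIs for TAB sensitivity and prevalence.

    pairs: list of (first, second) biopsy results, each "+" or "-", one per patient.
    """
    rng = random.Random(seed)
    n = len(pairs)
    sens, prev = [], []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=n)  # resample n patients with replacement
        z = sum(1 for p in sample if p == ("+", "+")) / n
        y = sum(1 for p in sample if p[0] != p[1]) / n
        if z == 0:
            continue  # estimator undefined without concordant-positive pairs
        sens.append(2 * z / (y + 2 * z))
        prev.append((y + 2 * z) ** 2 / (4 * z))

    def percentile_interval(values):
        values = sorted(values)
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        return lo, hi

    return percentile_interval(sens), percentile_interval(prev)


# Hypothetical study: 70 (- -), 3 (- +), 2 (+ -), 25 (+ +) pairs.
pairs = [("-", "-")] * 70 + [("-", "+")] * 3 + [("+", "-")] * 2 + [("+", "+")] * 25
(sens_lo, sens_hi), (prev_lo, prev_hi) = bootstrap_ci(pairs)
print(sens_lo, sens_hi, prev_lo, prev_hi)
```

Fixing the seed makes a run reproducible; a large number of replicates (here 6000, matching the count stated above) keeps the interval endpoints stable from run to run.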

^{ 11 }To investigate the effect this would have on our technique, we used the method of Vacek,^{ 12 } which uses the covariance (*e*) to parameterize the amount of conditional dependence. Using this method, we derived the following equations for sensitivity and disease prevalence, which can be used to examine the effect of various levels of dependence:

*e* has a quantifiable maximum value based on the calculated false-negative rate of the TAB.^{ 12 } We performed a sensitivity analysis with the input parameter *e* ranging from 0% to 100% of its maximum value to evaluate the effect of various degrees of dependence on the sensitivity and prevalence results derived by our technique.
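One way to see how dependence shifts the estimates is sketched below. It assumes a Vacek-style covariance *e* acting only among diseased patients, so that $P(+,+ \mid D) = S_n^2 + e$ and $P(+,- \mid D) = P(-,+ \mid D) = S_n(1-S_n) - e$, with disease-free patients still always biopsy-negative. Then $Y + 2Z = 2\pi S_n$, and $S_n$ solves $(Y+2Z)S_n^2 - 2Z S_n + (Y+2Z)e = 0$. This is our own sketch of the algebra, not the authors' equations, and the admissible maximum of *e* used in the article is not assumed here, so numbers from it need not match Fig. 2. The function name and inputs are hypothetical.

```python
import math


def tab_estimates_with_dependence(y, z, e):
    """Sensitivity and prevalence when the two biopsies share covariance e.

    Assumed model: Z = pi*(Sn**2 + e), Y = 2*pi*(Sn*(1 - Sn) - e),
    100% specificity. Returns (sensitivity, prevalence).
    """
    t = y + 2 * z  # equals 2 * pi * Sn under the assumed model
    disc = z * z - t * t * e
    if disc < 0:
        raise ValueError("e too large for these data: no real solution")
    # Larger root of t*Sn**2 - 2z*Sn + t*e = 0; reduces to 2z/t when e == 0.
    sensitivity = (z + math.sqrt(disc)) / t
    prevalence = t / (2 * sensitivity)
    return sensitivity, prevalence


# Illustrative sweep with Y = 0.05, Z = 0.25: sensitivity falls as e grows.
for e in (0.0, 0.01, 0.02):
    print(e, tab_estimates_with_dependence(0.05, 0.25, e))
```

The larger root is chosen because it is the branch that connects continuously to the independence estimate $2Z/(Y+2Z)$ at $e = 0$.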

^{ 13–16 }For two studies,^{ 13,16 } additional data were provided by the authors via personal communications. Using the experimental data provided, we applied the mathematical technique described herein to each study as well as to the pooled data from all four studies. We specifically excluded equivocal cases (i.e., cases in which the biopsy sample was insufficient in size or nondiagnostic) from our analysis. Data from these studies, as well as the values derived for the true sensitivity ($S_n$) of a single TAB and the true prevalence of disease (π) using our technique, are shown in Table 1. Using our method, the true sensitivity of a single TAB in these studies is calculated to be 87.1% (95% confidence interval, 81.8%–91.7%). The true prevalence of TA in patients undergoing TAB in these studies is calculated to be 26.4% (95% confidence interval, 22.2%–30.7%).

^{ 17 }We estimated the between-study heterogeneity by using a χ²-based Q statistic, with heterogeneity considered significant at *P* < 0.10. The Q statistic indicated significant heterogeneity among the four included studies; as a result, a random-effects (DerSimonian and Laird) model with inverse-variance weighting was used. The pooled sensitivity obtained with this meta-analytic method was 88.9% (95% confidence interval, 78.0%–99.8%).
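A minimal sketch of DerSimonian and Laird random-effects pooling with inverse-variance weights is shown below. It assumes each study supplies a sensitivity estimate and its within-study variance (for instance, a bootstrap variance); which variance estimates the authors used is not stated here, and the function name is hypothetical. *Q* is compared against a χ² distribution with $k - 1$ degrees of freedom; for four studies, the critical value at *P* = 0.10 is about 6.25.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate (DerSimonian-Laird), inverse-variance weights.

    estimates: per-study estimates (e.g., sensitivities);
    variances: their within-study variances.
    Returns (pooled, standard_error, q_statistic, tau_squared).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, tau2


# Hypothetical inputs; a 95% CI is pooled +/- 1.96 * se.
pooled, se, q, tau2 = dersimonian_laird([0.80, 0.95], [0.001, 0.001])
print(pooled, pooled - 1.96 * se, pooled + 1.96 * se, q, tau2)
```

When the studies disagree more than their within-study variances predict, τ² becomes positive, the weights flatten, and the pooled interval widens, which is consistent with the wider interval reported for the meta-analytic estimate.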

*e* ranging from 0% to 100% of its maximum possible value, to evaluate the effect of various degrees of dependence on the sensitivity and prevalence results derived by our technique (Fig. 2). At 100% of maximum *e* (representing peak dependence), the calculated sensitivity decreased from 87.1% to 79.2%. From 0% to 70% of maximum *e*, the calculated sensitivity remained within the 95% confidence interval calculated by our method. The prevalence value changed very little and remained within its calculated 95% confidence interval over the entire range of dependence levels. These findings suggest that our model is robust over a wide range of dependence levels.

^{ 9 }have described the application of this general technique to estimating the sensitivity and specificity of visual field testing for detecting optic nerve damage progression in normal-tension glaucoma. A mathematical method is ideal for calculating the true sensitivity in the absence of a gold standard test. As a reference standard, clinical criteria are subjective and subject to interobserver variation, and thus cannot establish the true sensitivity. A mathematical derivation from the results of bilateral biopsies is objective and relies on the results of the test itself to estimate its true sensitivity.

^{ 4 }and further supports this mathematical method as capable of estimating the true sensitivity.

^{ 4 }consistent with the fact that using clinical criteria as a reference standard can produce false-negative results. Our calculated true prevalence of TA in these study populations (26.4%) may appear high, especially when compared to published population-based estimates that report a prevalence of ∼1%, even in the oldest age groups.

^{ 18 }However, the high true prevalence values should not be surprising when one considers that these populations include patients who are undergoing TAB. The patients in these studies probably demonstrated characteristic symptoms or signs that suggested the presence of TA. In this sense, they represent selected populations in which the likelihood of disease is greater than in the general population.

^{ 19 }Regardless of the disease being studied, such models always rely on the use of accurate parameters, such as the sensitivity and specificity of diagnostic tests, to generate meaningful results. For example, we have recently published a decision-making model for the management of patients with suspected TA.

^{ 4 }The results of this model (e.g., when to perform unilateral TAB, when to perform bilateral TAB, if/when to treat with steroids) are critically dependent on an accurate estimate for the sensitivity of the TAB.

*reduced* diagnostic yield by 50%, which is not mathematically consistent. Second, in practice, the selection of which side to sample in a unilateral TAB is not always random. Most surgeons tend to sample the side that is clinically abnormal (e.g., on palpation) or potentially symptomatically involved (e.g., monocular visual loss on that side). As a result, the discordance rate may not accurately estimate the increase in sensitivity with the second biopsy, because the true pretest probability of disease in a purely unilateral biopsy could vary, depending on which side is initially chosen.

**Figure 1.**

**Figure 2.**

*Stat Med*. 2002;21:2653–2669.

*Biometrics*. 2004;60:388–397.

*Mayo Clin Proc*. 1976;51:505–510.

*Ophthalmology*. 2005;112:744–756.

*Arch Ophthalmol*. 1983;101:1251–1254.

*Stat Med*. 2004;23:2237–2255.

*Biometrics*. 1992;48:839–852.

*Biometrics*. 1980;36:167–171.

*J Clin Epidemiol*. 1991;44:1167–1179.

*R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing; 2005. Available at: http://www.R-project.org. Accessed September 10, 2006.

*Biometrics*. 2004;60:427–435.

*Biometrics*. 1985;41:959–968.

*Am J Ophthalmol*. 1999;128:211–215.

*J Neuroophthalmol*. 2000;20:216–218.

*J Rheumatol*. 1988;15:997–1000.

*J Neuroophthalmol*. 2000;20:213–215.

*Reviewers’ Handbook*. Version 4.2.5. May 2005. Available at: http://www.cochrane.org/resources/handbook. Accessed September 10, 2006.

*Arthritis Rheum*. 1998;41:778–799.

*JAMA*. 1995;273:1292–1295.