Perspective  |   August 2011
Application of Advanced Statistics in Ophthalmology
Author Affiliations & Notes
  • Qiao Fan
    From the Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine
  • Yik-Ying Teo
    From the Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine and
    the Department of Statistics and Applied Probability, National University of Singapore, Singapore; and
  • Seang-Mei Saw
    From the Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine and
    the Singapore Eye Research Institute, Singapore National Eye Center, Singapore
  • Corresponding author: Seang-Mei Saw, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, 16 Medical Drive, MD 3, Singapore 117597; ephssm@nus.edu.sg
Investigative Ophthalmology & Visual Science August 2011, Vol.52, 6059-6065. doi:10.1167/iovs.10-7108
Qiao Fan, Yik-Ying Teo, Seang-Mei Saw; Application of Advanced Statistics in Ophthalmology. Invest. Ophthalmol. Vis. Sci. 2011;52(9):6059-6065. doi: 10.1167/iovs.10-7108.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Statistics is an integral part of research in ophthalmology. The application of appropriate statistical strategies allows clinicians to realize the full potential of data from paired ocular measurements, longitudinal designs, and genome-wide association studies (GWAS). The increasing popularity of longitudinal follow-up in both clinical and epidemiologic studies demands advanced statistical methodologies. This article describes robust statistical models that can cope with correlated components for both paired-eye data and repeated measurements over time. Also highlighted are the statistical challenges and corresponding strategies available for testing multiple hypotheses with paired-eye data in GWAS, which has been the subject of intense interest for the past 5 years within the ophthalmology community in investigating the genetic etiology of eye disorders.

There are several statistical challenges in ophthalmic research. The nature of ocular measurements of paired eyes poses a long-standing question to ophthalmologists for developing joint inference on paired-eye data. 1,2 The increasing focus on longitudinal designs in clinical trials and observational studies has also compounded the need for statistical methodologies that handle correlated data from repeated sampling across time. Furthermore, the advent of large-scale genetic studies has permitted an unbiased systematic survey of the entire genomic landscape for variants that contribute to the etiology of eye disorders. However, this approach faces the statistical dilemma of testing multiple hypotheses, since a typical genome-wide study queries up to a million variants simultaneously. The appropriate application of statistics in modern ophthalmology research is thus vital in addressing these challenges. In this article, we review problems common to longitudinal and genome-wide surveys of eye-related traits and provide an exposition of proposed solutions. 
Paired-Eyes Measurements
Whether data from one eye or both eyes are used depends on the study hypothesis and clinical relevance. For example, visual acuity (VA) in the better eye is commonly used to indicate the degree of visual impairment. Similarly, using the ocular data of the worse eye is an appropriate way to characterize the status of eye disease for patients eligible for clinical trials. When correlated measurements from both eyes (such as intraocular pressure or refractive error) are available, using the information from only one eye is a statistically simple approach, but may not reflect the true extent of the disease. 3 The rationale behind using one eye (right, left, or randomly chosen) stems from the notion that most ocular measurements are more similar between the eyes of the same individual than between different individuals. 1 However, even when the measurements of both eyes are highly correlated, it does not necessarily follow that an analysis restricted to right eyes will yield the same results as one restricted to left eyes. Therefore, cautious interpretation of any discordant results is required. This paired-eye problem affects the analysis across different study designs such as case-control, clinical, cross-sectional, and cohort studies. We examined clinical and epidemiologic articles published in IOVS and Ophthalmology from January to June 2009 and documented their analytic approaches. Of the 115 papers covered in our review, clinical studies exhibited a greater preference for considering all eligible eyes or the affected eye(s) than epidemiologic studies (Table 1A), whereas paired-eye designs are more commonly adopted in clinical trials. 4
Table 1. Categories of Analytic Strategies in Clinical and Epidemiological Papers*

A. Analyses at Subject Level versus Ocular Level†

| | Articles, n (%) | No Correction for Correlation on Paired Eyes‡ |
| --- | --- | --- |
| Clinical study | | |
| Subject level | 24 (20.9) | |
| Ocular level | 24 (20.9) | 10 |
| Epidemiology study | | |
| Subject level | 48 (41.8) | |
| Ocular level | 19 (16.4) | 4 |
| Total | 115 (100) | |

B. Statistical Approaches for Longitudinal Follow-up Study§

| | Articles, n (%) |
| --- | --- |
| t-test/paired t2/McNemar test | 18 (26.1) |
| Wilcoxon rank sum test | 11 (15.9) |
| Logistic/linear regression | 10 (14.5) |
| Repeated ANOVA | 6 (8.7) |
| Mixed model/GEE | 11 (15.9) |
| Survival-based analysis | 13 (18.9) |
| Total | 69 (100) |

In the paired-eye design, both eyes of the subject are considered as matched case–control data within every subject. The paired nature of such settings leads to the use of the paired-sample t-test and the McNemar test in ophthalmology to assess the differences between paired eyes for numerical or categorical outcomes. 4 Under the assumption that both eyes experience the same exposure within an individual, failure to account for the intrasubject correlation between both eyes can overstate the precision of the estimated treatment effect, which leads to an increased likelihood of making a type I error. 1,5 Advanced statistical approaches that are used to perform joint modeling of paired-eye data have been covered in previous reviews. 1,2,6,7 The generalized estimating equation (GEE) is an extension of regression modeling to settings, such as a longitudinal framework, in which repeated measurements are made within every individual. 8–10 Mixed-effects regression modeling provides a flexible framework for analyzing clustered data with multilevel structures. 9,11,12 Comparisons between GEE modeling and mixed-effects regression on continuous and binary paired-eye data reveal similar performance under most conditions, 6,7,13 although GEE is recognized to be computationally more efficient in handling large datasets with binary or ordinal outcomes. 3
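The link between intereye correlation and type I error can be made concrete with the standard design-effect formula for clustered data. The sketch below is only an illustration under stated assumptions: the exposure is shared by both eyes of a subject, the intereye correlation is exchangeable, and the function names and the example values (200 eyes, correlation 0.8) are hypothetical rather than taken from the reviewed studies.

```python
import math


def design_effect(eyes_per_subject: int, intereye_corr: float) -> float:
    """Variance inflation of a mean when eyes within a subject are correlated."""
    return 1.0 + (eyes_per_subject - 1) * intereye_corr


def effective_sample_size(n_eyes: int, eyes_per_subject: int, intereye_corr: float) -> float:
    """Number of independent observations that carry the same information."""
    return n_eyes / design_effect(eyes_per_subject, intereye_corr)


def se_inflation(eyes_per_subject: int, intereye_corr: float) -> float:
    """Ratio of the correct standard error to the naive (independence) one."""
    return math.sqrt(design_effect(eyes_per_subject, intereye_corr))


# Hypothetical example: 100 subjects contribute 200 eyes, intereye correlation 0.8.
print(effective_sample_size(200, 2, 0.8))  # far fewer than 200 independent eyes
print(se_inflation(2, 0.8))                # the naive SE is too small by this factor
```

When the correlation is zero the design effect is 1 and the naive analysis is valid; as the correlation approaches 1, the second eye adds almost no independent information.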
Longitudinal Follow-up
Longitudinal data arising from either clinical trials or cohort studies allow the progression or natural evolution of a disease to be studied. This is often not achievable with cross-sectional study designs. Data in longitudinal follow-up studies are generally collected in the form of outcomes measured repeatedly or the time until the onset of the disease. 9 The former mainly focuses on modeling the changes in health outcomes over time and on identifying the factors that are associated with the changes; the latter primarily investigates the factors associated with the risk of event onset. In ophthalmology, inclusion of paired-eye data in a longitudinal study for some or all of the study participants complicates the multilevel structure of these data, and the complexity should be recognized before the initiation of the study.
Our literature review suggests that most studies adopted simple analytical methods (see Table 1B), either in a cross-sectional fashion at discrete time points or to measure the changes in a numerical outcome over time. Both approaches transform data from repeated measurements into a single outcome for each subject, so that traditional statistical strategies (such as the paired t-test, χ2 test, and regression modeling) can be used in the analysis. Aggregating the outcome in this way is appropriate in a variety of situations in ophthalmology, such as measuring the average change of intraocular pressure or quantifying the corneal refractive error before and after surgery for patients with astigmatism. However, it is important to recognize that these strategies do not use all the information that is available for each subject. To obtain a comprehensive understanding of how the outcome changes over time, especially in the presence of a treatment whose efficacy changes with time, it becomes necessary to explicitly model the repeated measurements within each subject. This explicit modeling becomes even more important when an experiment focuses on an eye trait that is liable to exhibit large discordances between the eyes, such as a longitudinal assessment of intraocular pressure progression in individuals with uveitis in the affected eye(s). In such a study, failure to incorporate the correlation between the eyes will bias the statistical inference about the effect of eye-specific risk factors on the longitudinal progression of the outcome.
Appropriate analyses of longitudinal data should consider the correlation structure between the repeated measurements. Established methods include repeated-measures analysis of variance (ANOVA), 14–16 GEE, 8–10 and mixed-effects regression models 11,17,18 (Table 2). Repeated-measures ANOVA primarily focuses on balanced data where the same number of repeated observations has been made for every individual. However, this condition is often not fulfilled in observational studies. In an unbalanced design where the number of measurements for each individual may differ, mixed-effects models and GEE are the preferred methods to analyze longitudinal data. In mixed-effects modeling, the joint consideration of fixed and random effects estimates both a subject-specific baseline for the outcome and a subject-specific trend (over time) for the explanatory variables, 11 which allows the extent of interindividual variations to be measured. Random effects can be assumed on any covariate or any cluster of subjects to capture the correlated characteristics in the data; fixed-effects estimates are interpreted as the conditional effects in the presence of the covariates with random effects. In GEE, the “sandwich covariance” effectively estimates the correlation structure between all pairs of observations from the same cluster, yielding robust estimates of the standard errors for the regression coefficients while allowing the marginal treatment effects to be calculated. 14 In the case of a nested multilevel structure, GEE considers the cluster at the top level in assessing the potentially correlated outcomes. 19 Although both mixed-effects and GEE modeling are commonly used in ophthalmology to handle numerical and discrete outcomes, 11,12 it has been suggested that mixed-effects regression is more efficient in the presence of data with a substantial amount of nonrandom missingness. 9,17
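To show what the sandwich covariance does in the simplest possible case, the following sketch computes naive and cluster-robust standard errors for an intercept-only model with an independence working correlation. It is a teaching example under those assumptions, not a full GEE implementation (there is no iterative estimation of a working correlation and no small-sample correction), and the data values and function name are hypothetical.

```python
import math


def naive_and_robust_se(clusters):
    """Return (mean, naive_se, robust_se) for an intercept-only model.

    clusters: list of lists, one inner list of outcomes per subject
    (for example, the two eyes of that subject).
    """
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n

    # Naive variance: every observation is treated as independent.
    naive_var = sum((y - mean) ** 2 for y in values) / (n * (n - 1))

    # Sandwich variance: residuals are summed within each subject before
    # squaring, so within-subject covariance enters the "meat" term.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    robust_var = meat / n ** 2

    return mean, math.sqrt(naive_var), math.sqrt(robust_var)


# Hypothetical refractive errors (diopters) for the two eyes of five subjects.
pairs = [[-1.0, -1.2], [0.5, 0.3], [-3.0, -2.6], [0.0, 0.2], [-2.0, -2.2]]
mean, naive_se, robust_se = naive_and_robust_se(pairs)
print(mean, naive_se, robust_se)
```

Because the two eyes of each subject are strongly and positively correlated in this toy dataset, the robust standard error is larger than the naive one, which is the pattern the text describes.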
Table 2. Statistical Approaches for Longitudinal Follow-up Study

| Approaches | Outcome | Adjust for Correlation: Paired Eyes | Adjust for Correlation: Repeated Measures | Comments |
| --- | --- | --- | --- | --- |
| Charting Event Progression | | | | |
| t-test/ANOVA, χ2, Wilcoxon rank tests | Continuous/discrete | No | No | Straightforward; perform analysis at each time point or use changes as outcome; less powerful due to discarded information; cannot model the time trend or the predictors associated with outcome |
| Linear/logistic regression | Continuous/binary | No | No | Straightforward; perform analysis at each time point or use changes as outcome; adjust baseline covariates in the model; less powerful if discarding information; cannot model the longitudinal trend |
| Repeated ANOVA 14 | Continuous | Yes | Yes | Analytically complex; requires balanced data design; less robust to missing data; cannot model individual trend |
| Mixed-effects model 11,17 | Continuous/binary/count | Yes | Yes | Statistically powerful; analytically complex; can model both fixed and random effects; flexible framework in specifying parameter distribution; capable of handling unbalanced data |
| GEE 10 | Continuous/binary/count | Yes | Yes | Statistically powerful; analytically complex; capable of handling unbalanced data; models marginal effects; less powerful in handling missing data |
| Charting Event Onset: Time to Event | | | | |
| Kaplan-Meier | Continuous | No | NA | Straightforward; estimates the survival rates |
| Log rank test | Continuous | No | NA | Simple nonparametric approach to compare the rates; unable to adjust for covariates |
| Cox proportional hazards model | Continuous | No | NA | Quantifies effects of covariates on the survival time; compares the rates by groups |
| Frailty model 26 | Continuous | Yes | NA | Analytically complex; capable of modeling correlated time-to-event data; flexible framework for random effects |
| Marginal model 27 | Continuous | Yes | NA | Analytically complex; capable of modeling correlated time-to-event data; robust to time-dependent covariates; estimates marginal effects |

To illustrate the analytic approaches for fitting longitudinal data with repeated paired-eye measurements, we consider a dataset from the Singapore Cohort Study of the Risk Factors for Myopia (SCORM), 20 in which a total of 1979 school children recruited from 1999 to 2001 were followed up longitudinally for myopia development. For illustration purposes only, our primary interest is whether the school that the child attends is associated with the child's refractive error measured annually for four consecutive years. Sphere equivalent (SE) measurements (four per eye) are clustered at eye level, and eyes (two per individual) are clustered at subject level. We perform four sets of analyses. First, we fit the repeated-measurement SE using data from both eyes of each participant by mixed-effects model (model 1) and GEE (model 2). For the mixed-effects model, the subject and eye are modeled as random effects in a nested structure, whereas GEE relies on empiric covariance estimates for the subject clusters. 21 Second, we model SE longitudinally using measurements from both eyes by mixed-effects model (model 3) and GEE (model 4), but ignore the intereye correlation for each individual; that is, we treat data from the right eye and those from the left eye as independent observations. Third, we fit the repeated measurements of the average SE of the paired eyes using a mixed-effects model (model 5) and GEE (model 6). Fourth, we model SE at the last visit (year 4) using right-eye data only from each individual, where observations from previous visits and from the left eye are deliberately excluded from this analysis (model 7). Table 3 compares the results of the various approaches to modeling the longitudinal SCORM dataset.
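The data reductions behind the third and fourth sets of analyses can be sketched on long-format records nested as subject, eye, and visit. The records below are hypothetical values invented for illustration (they are not SCORM data), and the helper names are assumptions; the model fitting itself is not shown.

```python
from collections import defaultdict

# Hypothetical long-format records: (subject_id, eye, year, sphere_equivalent).
records = [
    ("s1", "R", 1, -0.50), ("s1", "L", 1, -0.25),
    ("s1", "R", 4, -2.00), ("s1", "L", 4, -1.50),
    ("s2", "R", 1, 0.25), ("s2", "L", 1, 0.75),
    ("s2", "R", 4, -0.50), ("s2", "L", 4, 0.00),
]


def both_eye_average(rows):
    """Collapse the eye level: one averaged value per subject and visit."""
    grouped = defaultdict(list)
    for subject, _eye, year, sph_eq in rows:
        grouped[(subject, year)].append(sph_eq)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def right_eye_last_visit(rows, last_year=4):
    """Collapse eye and time levels: right eye at the final visit only."""
    return {
        subject: sph_eq
        for subject, eye, year, sph_eq in rows
        if eye == "R" and year == last_year
    }


print(both_eye_average(records))      # input shape for the both-eyes-average models
print(right_eye_last_visit(records))  # input shape for the cross-sectional model
```

The first reduction keeps the time dimension but discards intereye differences; the second discards both, which is why no time trend can be estimated from it.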
Table 3. Results of Analyzing Repeated Sphere Equivalent in a Longitudinal Study Using Different Analytic Approaches

Whole data analysis: models 1 and 2 (longitudinal data, account for intereye correlation) and models 3 and 4 (longitudinal data, ignore intereye correlation). Partial data analysis: models 5 and 6 (longitudinal data, both-eyes average) and model 7 (cross-sectional data). Each cell shows β (SE); P.

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept | −0.29 (0.42); 0.492 | −0.50 (0.48); 0.296 | −0.27 (0.32); 0.399 | −0.48 (0.35); 0.167 | −0.22 (0.49); 0.661 | −0.56 (0.56); 0.316 | −1.93 (0.63); 0.003 |
| School | | | | | | | |
| 1 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| 2 | 0.78 (0.29); 0.009 | 0.87 (0.30); 0.003 | 0.75 (0.22); 0.001 | 0.87 (0.22); <0.0001 | 0.73 (0.34); 0.033 | 0.88 (0.35); 0.013 | 1.26 (0.61); 0.040 |
| Time, y | | | | | | | |
| 1st | Referent | Referent | Referent | Referent | Referent | Referent | |
| 2nd | −0.70 (0.04); <0.0001 | −0.64 (0.06); <0.0001 | −0.70 (0.04); <0.0001 | −0.64 (0.05); <0.0001 | −0.70 (0.06); <0.0001 | −0.64 (0.07); <0.0001 | |
| 3rd | −1.27 (0.05); <0.0001 | −1.27 (0.13); <0.0001 | −1.28 (0.05); <0.0001 | −1.27 (0.10); <0.0001 | −1.29 (0.07); <0.0001 | −1.35 (0.15); <0.0001 | |
| 4th | −1.73 (0.06); <0.0001 | −1.71 (0.18); <0.0001 | −1.74 (0.06); <0.0001 | −1.71 (0.14); <0.0001 | −1.74 (0.10); <0.0001 | −1.80 (0.22); <0.0001 | |
| School* time | 0.18 (0.02); <0.0001 | 0.15 (0.07); 0.030 | 0.18 (0.02); <0.0001 | 0.15 (0.05); 0.005 | 0.17 (0.08); <0.0001 | 0.17 (0.08); 0.037 | |
| Age, y | | | | | | | |
| 7 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| 8 | 0.32 (0.27); 0.235 | 0.38 (0.27); 0.153 | 0.30 (0.20); 0.149 | 0.38 (0.20); 0.054 | 0.27 (0.32); 0.392 | 0.41 (0.32); 0.197 | 0.47 (0.41); 0.250 |
| 9 | −0.90 (0.30); 0.004 | −0.92 (0.36); 0.010 | −0.88 (0.23); <0.0001 | −0.92 (0.27); 0.001 | −0.85 (0.34); 0.014 | −0.93 (0.39); 0.018 | −0.88 (0.45); 0.051 |
| Sex | | | | | | | |
| Male | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Female | 0.25 (0.23); 0.278 | 0.18 (0.24); 0.474 | 0.20 (0.17); 0.253 | 0.18 (0.18); 0.341 | 0.14 (0.27); 0.607 | 0.06 (0.28); 0.821 | 0.12 (0.34); 0.733 |
| Race | | | | | | | |
| Non-Chinese | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Chinese | −0.78 (0.28); 0.005 | −0.60 (0.30); 0.041 | −0.76 (0.21); <0.001 | −0.60 (0.22); 0.006 | −0.77 (0.33); 0.018 | −0.43 (0.35); 0.214 | −0.83 (0.42); 0.049 |
| Books read per week | | | | | | | |
| ≤2 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| >2 | −0.20 (0.24); 0.401 | −0.15 (0.26); 0.563 | −0.14 (0.18); 0.447 | −0.15 (0.19); 0.442 | −0.05 (0.28); 0.849 | −0.11 (0.29); 0.700 | −0.13 (0.36); 0.730 |

The first analytic approach offers a distinct advantage over the others, as it correctly models repeated measures and intereye correlations simultaneously. We observe a significant difference in refractive error between the two schools, where the difference varies linearly with time. The statistical significance of the school effect agrees between models 1 and 2, whereas the estimated effect sizes are moderately different, as GEE calculates the population-averaged effect. In the second scenario (models 3 and 4), longitudinal analysis at the ocular level without allowing for intereye correlation results in artificially narrowed interval estimates of the school effect, even when the point estimate of the effect remains unbiased. Failure to account for intereye correlation thus inflates the level of statistical evidence. In the third situation, averaging the responses from both eyes results in a larger standard error of the school effect. This is particularly relevant when missing responses arise because measurements are available for only one eye or when ocular measurements are weakly correlated between the two eyes. In this setting, the main effect of school remains significant but is less significant than that from the first set of analyses. In the fourth scenario, which considers only right-eye data from the last visit, the effect of school is less significant than in the preceding analyses. This suggests that the use of limited or partial data can compromise the statistical power. It is also important to note that reducing longitudinal data to a cross-sectional form in the fourth scenario does not yield any information on the trend or on how the school effect varies with time, which is often of interest in longitudinal studies.
For the survival-based analysis used in longitudinal follow-up studies, researchers are most interested in the occurrence of the event and the time to event onset. Established statistical approaches include the use of the Kaplan-Meier curve, log rank statistics, and Cox proportional hazards modeling. 22–24 If paired-eye data are of interest in the study, time to event is not independent at the ocular level. To fit correlated survival data, frailty models, multilevel survival models, or marginal models are commonly adopted in the medical community. 25–27 The application of such advanced models in ophthalmic research is worth further exploration, but is beyond the scope of this article.
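For readers unfamiliar with the Kaplan-Meier curve, a minimal product-limit estimator is sketched below. It assumes independent observations, so it suits subject-level data (one time to event per person) rather than correlated paired-eye data; the function name and the example times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= n_removed
    return curve


# Hypothetical follow-up (years) for five subjects; 0 marks censoring.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this toy example the estimated survival drops to about 0.8, 0.6, and 0.3 at years 1, 2, and 3; the censored subjects reduce the risk set without producing a step.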
Genome-Wide Association Study
Genetic studies are conducted to identify the hereditary nature of diseases and traits, primarily relying on the comparison of genetic variation between individuals with differential expression of the trait of interest. A typical genome-wide association study (GWAS) surveys between 500,000 and 1,000,000 single-nucleotide polymorphisms (SNPs) across the entire human genome simultaneously, and such genome-wide designs have replaced candidate gene studies as the preferred strategy to study the genetic etiology of complex human traits, 28,29 including eye disorders. 30–39 The Cochran-Armitage trend test, the χ2 test, and the logistic regression model are widely used in the case–control design to study the overrepresentation of the mutated allele in cases versus controls. 40 In family-based studies, we measure the excess transmission of any allele from heterozygous parents to affected offspring relative to the expectation under Mendel's law. 41 Furthermore, the incorporation of longitudinal information such as modeling time to event and repeated measurements will add merit to GWAS. 42
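As one concrete instance of a case–control χ2 test at a single SNP, the sketch below compares allele counts between cases and controls in a 2 × 2 table. It is a simplified illustration: the genotype counts are hypothetical, the allele-based test assumes Hardy-Weinberg equilibrium, and covariate adjustment (as in logistic regression) is not shown.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """1-df chi-square test on allele counts.

    Each argument is (n_AA, n_Aa, n_aa), the genotype counts in that group.
    Assumes every row and column total of the 2 x 2 allele table is nonzero.
    Returns (chi_square, p_value).
    """
    def allele_counts(genotypes):
        n_aa_major, n_het, n_aa_minor = genotypes
        return 2 * n_aa_major + n_het, 2 * n_aa_minor + n_het

    a, b = allele_counts(case_genotypes)     # case alleles: A, a
    c, d = allele_counts(control_genotypes)  # control alleles: A, a
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) in 100 cases and 100 controls.
chi2, p = allelic_chi_square((30, 50, 20), (20, 50, 30))
print(chi2, p)
```

A single-marker P value of this size is unremarkable on a genome-wide scale, which is where the multiple-testing problem arises.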
Testing multiple hypotheses simultaneously to draw correct statistical inference is the most challenging aspect of GWAS. It is now common to assay a million variants in a GWAS, and this effectively constitutes a million hypothesis tests. A conventional significance threshold of 5% is thus expected to artifactually identify about 50,000 markers that are “correlated” to the trait. To address this problem with multiple testing, geneticists have adopted a stringent statistical significance level of 5 × 10⁻⁸, commonly defined as genome-wide significance, as the benchmark for evaluating the fidelity of the association signal at each marker. 40 Replication is considered the gold standard in GWAS publications. 43 The identification of candidate genetic loci for replication is mainly driven by the level of statistical evidence from single-marker association tests (either the P value or the Bayes factor). 40,44 More advanced approaches, for example, pathway-based analyses and epistasis tests, have also been proposed to prioritize genetic markers for further downstream functional evaluation. These analytic strategies have been covered comprehensively in previous reviews. 45,46
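The arithmetic behind genome-wide significance can be checked directly. With m independent tests at level α, about m × α markers with no true effect are expected to pass by chance, and a Bonferroni correction divides the family-wise level by m. The sketch below assumes one million independent tests; the function names are illustrative.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null markers passing a per-test threshold alpha."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_markers = 1_000_000
print(expected_false_positives(n_markers, 0.05))  # chance findings at the 5% level
print(bonferroni_threshold(n_markers))            # per-marker threshold, about 5e-8
```

Because neighboring SNPs are correlated, the number of effectively independent tests is smaller than the number of markers, so this threshold is conservative for a given array.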
In gene mapping, phenotypes are usually classified into two broad types: qualitative (or binary) and quantitative (or continuous) traits. Dichotomous traits have been featured in GWAS for age-related macular degeneration (AMD), 34,35 primary open-angle glaucoma (POAG), 31,39 cataract, 37 and high myopia. 36,38 The affected individuals are usually classified on the basis of diagnosis from the worse eye or both eyes, whereas controls exhibit no sign of the disease in either eye. Although assessing the binary outcome is more directly relevant to clinical application, quantitative traits (endophenotypes or intermediate traits) underlying diseases are also valuable in the dissection of the genetic architecture, as they take the full spectrum of measurements into account. For instance, central corneal thickness (CCT) and cup-to-disc ratio (CDR) are presented as quantitative endophenotypes of POAG. 47 Mapping genes for CCT 48–50 and CDR 51,52 in GWAS would shed light on the joint genetic etiology of POAG.
Often, the primary interest in ophthalmic genetic studies of quantitative traits is to locate shared genetic loci that exert effects on both eyes, 53–55 as the physiological mechanism underlying intereye differences in phenotypic abnormalities remains elusive and inadequately understood. Therefore, for quantitative traits collected from both eyes, an immediate question is whether the analyses should be performed on data from one eye or both eyes. In seven GWAS papers on eye-related QTL that have been published to date (http://www.genome.gov/gwastudies), the analytic strategies varied from the use of the right eye, 49,50,52 to a randomly chosen eye, 51 to the averaged measurement from both eyes. 32,33,48 Restricting the analysis to one eye is a simple approach that avoids statistical model complexity. However, using partial data from one eye only may be statistically inefficient. Averaging ocular measurements between both eyes has been suggested to yield higher heritability estimates than using information from one eye only and therefore tends to have more power in genetic studies. 56 Using averaged ocular measurements has therefore been the convention in linkage studies of quantitative traits in the myopia genetics research community. 57–60 In the few scenarios in which the traits may be only moderately or weakly correlated between the two eyes, 1 however, neither the use of data from one eye nor an average from both is appropriate, because both approaches neglect the phenotypic dissimilarity between the eyes.
A wide array of statistical approaches has emerged recently for the detection of pleiotropic genetic factors contributing to multiple correlated traits, and these could also be applied to paired-eye data (Table 4). Simultaneous consideration of all correlated phenotypes has been shown to offer greater statistical power than univariate analysis by exploiting the pleiotropic genetic effects. 61–64 The first approach is to combine dependent test statistics or estimators from the univariate analyses for a global assessment of association. 61,65–67 In brief, GWAS tests are conducted for the two eyes separately. The two test statistics from both eyes (for example, z scores) are subsequently combined in a linear form weighted by the covariance matrix estimates. 61,67 Correcting for twice the number of markers is not relevant here, since only one global test is performed for each marker, using the combined statistics. This simple approach does not rely on a complicated model assumption. The second approach is to transform multiple traits to an optimal single phenotype with enhanced heritability; one such example is principal component analysis. 62,68 This dimension-reduction technique involves intensive computation; thus, its application to paired-eye data may not be straightforward. The third approach is model-based joint analysis of bivariate traits, including GEE, 63,69–71 mixed-effects, 64,72 and tree-based regression. 73 Of these, the GEE model is the most statistically efficient in performing bivariate association tests. 63,71 To date, few statistical software programs incorporating model-based joint analyses of bivariate traits are available 74 ; much more effort should be devoted to this area.
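The first approach above can be sketched in its simplest form: two z scores, one per eye, combined with equal weights and standardized by the variance of their sum. This is an illustrative special case, not the weighting scheme of any cited method; it assumes both z scores have unit variance under the null hypothesis and that their correlation (here the hypothetical argument corr) has been estimated separately, for example from the intereye phenotype correlation.

```python
import math


def combine_eye_z_scores(z_right: float, z_left: float, corr: float) -> float:
    """Equal-weight combination of two correlated z scores.

    Under the null, Var(z_right + z_left) = 2 + 2 * corr, so dividing the sum
    by the square root of that variance gives a standard normal statistic.
    """
    return (z_right + z_left) / math.sqrt(2.0 + 2.0 * corr)


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-eye association results for one SNP.
z_combined = combine_eye_z_scores(3.1, 2.8, corr=0.85)
print(z_combined, two_sided_p(z_combined))
```

When the two eyes are perfectly correlated the combined statistic equals the single-eye statistic, so nothing is gained; the lower the correlation, the more the second eye contributes.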
Table 4. Summary of Analytic Approaches for Quantitative Trait of Both-Eyes Data in GWAS

| Approaches | Comments |
| --- | --- |
| Data from One Eye | |
| Either eye or a randomized eye | Simple; less powerful if the correlation between the two traits is low |
| Data from Both Eyes | |
| Transform bivariate traits to a single trait: average measurements | Simple and efficient; statistically less efficient if the correlation between the bivariate traits is low or if data are missing for either eye |
| Principal components analysis 62,68 | Statistically powerful; complex; reduces the phenotypes to a single trait; computationally intensive |
| Combining univariate test statistics 61 | Simple and powerful; capable of handling paired-eye traits that are not highly correlated; robust for partially missing trait values; nonparametric |
| Model-Based Approaches | |
| GEE 63,69–71,74 | Statistically powerful; robust for various correlation structures; efficient on both normal and nonnormal traits; complex |
| Mixed-effects model 69,70 | Statistically powerful; complex; robust for various correlation structures of multiple traits; computationally intensive |
| Tree-based regression 73 | Analytically complex; capable of assessing multilocus association tests; computationally extremely intensive |

Accumulated evidence suggests that most GWASs are underpowered, especially for common variants with small effect sizes, and that the associated SNPs generally explain little of the genetic variation. 75 Meta-analysis provides a robust approach to enhance statistical power and effective sample size by pooling evidence from multiple independent association studies. 76,77 The application of meta-analysis in ophthalmology has become standard practice for identifying genetic polymorphisms that are associated with eye disorders. 32,33,49–52 If the individual GWASs are conducted with different genotyping platforms, the meta-analysis can use only a small subset of overlapping markers. One way to address this problem is imputation-based meta-analysis, which provides a powerful framework for the assessment of the complete array of genetic variants (most of which are untyped). Step-by-step guidelines and techniques for performing imputation-based genome-wide meta-analysis were reviewed by de Bakker et al. 77 In meta-analysis, using homogeneous populations with similar genetic backgrounds, phenotype definitions, and sample ascertainment will increase the likelihood of identifying a genuine genetic association. 78 In the presence of heterogeneity across different studies, carefully examining the potential factors that cause heterogeneity is crucial to enhance the credibility of the combined evidence.
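Pooling per-study estimates for one SNP can be sketched with inverse-variance (fixed-effect) weighting together with Cochran's Q as a simple heterogeneity check. The text does not specify a pooling method, so this choice is an assumption made for illustration, as are the function name and the effect sizes in the example.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance fixed-effect meta-analysis for one marker.

    Returns (pooled_beta, pooled_se, cochran_q). Under homogeneity,
    Cochran's Q follows a chi-square distribution with k - 1 degrees
    of freedom, where k is the number of studies.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, cochran_q


# Hypothetical per-study effect sizes and standard errors for one SNP.
beta, se, q = fixed_effect_meta([0.10, 0.20, 0.12], [0.10, 0.10, 0.05])
print(beta, se, q)
```

The pooled standard error is smaller than that of any single study, which is the gain in power the text refers to; a large Q relative to its degrees of freedom would signal the heterogeneity that calls for closer examination.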
Conclusions
Adopting appropriate statistical methods will permit us to explore the full potential of the data and to make valid statistical inferences. The simple statistical approaches commonly applied to reduced data in longitudinal ophthalmic studies may be useful in some scenarios, but they are insufficient to explicitly model the trend of the treatment effects or the longitudinal change of the outcome. In addition, if paired-eye data are involved in longitudinal studies, lack of adjustment for the correlation between the eyes violates the underlying assumption of independent observations. From a methodological point of view, both GEE and mixed-effects modeling play an increasingly important role in analyzing longitudinal repeated measurements and paired-eye data simultaneously. In GWAS, the statistical challenges raised for ocular traits center on multiple hypothesis testing and on analyzing paired-eye data appropriately. Different approaches have been used for analyzing paired-eye data under various GWAS conditions, and the best strategy should be chosen at study initiation after all relevant factors have been considered. Understanding the strengths and weaknesses of the statistical methods enhances our ability to correctly interpret GWAS results and to differentiate robust findings from spurious ones; this is especially vital, given the oncoming flood of GWAS data in the genomic era.
Supplementary Materials
Text s1, DOC
Footnotes
 Supported by the National Medical Research Council of Singapore (NMRC 1176/2008), the Yong Loo Lin School of Medicine from the National University of Singapore, and the National Research Foundation, NRF-RF-2010-05, Singapore (YYT).
 Disclosure: Q. Fan, None; Y.-Y. Teo, None; S.-M. Saw, None
The authors thank the two anonymous reviewers for their valuable comments, and Xu Haiyan (Centre of Molecular Epidemiology, Singapore) for help with the preparation of the manuscript. 
References
1. Murdoch IE, Morris SS, Cousens SN. People and eyes: statistical approaches in ophthalmology. Br J Ophthalmol. 1998;82:971–973.
2. Newcombe RG, Duff GR. Eyes or patients? Traps for the unwary in the statistical analysis of ophthalmological studies. Br J Ophthalmol. 1987;71:645–646.
3. Burton P, Gurrin L, Sly P. Extending the simple linear regression model to account for correlated responses: an introduction to generalized estimating equations and multi-level mixed modelling. Stat Med. 1998;17:1261–1291.
4. Ray WA, O'Day DM. Statistical analysis of multi-eye data in ophthalmic research. Invest Ophthalmol Vis Sci. 1985;26:1186–1188.
5. Sheu CF. Regression analysis of correlated binary outcomes. Behav Res Methods Instrum Comput. 2000;32:269–273.
6. Glynn RJ, Rosner B. Accounting for the correlation between fellow eyes in regression analysis. Arch Ophthalmol. 1992;110:381–387.
7. Katz J, Zeger S, Liang KY. Appropriate statistical methods to account for similarities in binary outcomes between fellow eyes. Invest Ophthalmol Vis Sci. 1994;35:2461–2465.
8. Zeger SL, Liang KY. An overview of methods for the analysis of longitudinal data. Stat Med. 1992;11:1825–1839.
9. Albert PS. Longitudinal data analysis (repeated measures) in clinical trials. Stat Med. 1999;18:1707–1732.
10. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–1060.
11. West BT. Analyzing longitudinal data with the linear mixed models procedure in SPSS. Eval Health Prof. 2009;32:207–228.
12. Littell RC. SAS System for Mixed Models. Cary, NC: SAS Institute Inc.; 1996:xiv, p 633.
13. Glynn RJ, Rosner B. Comparison of alternative regression models for paired binary data. Stat Med. 1994;13:1023–1036.
14. Keselman HJ, Algina J, Kowalchuk RK. The analysis of repeated measures designs: a review. Br J Math Stat Psychol. 2001;54:1–20.
15. Johnson RA, Wichern DW. Applied Multivariate Statistical Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall; 2007:xviii, p 773.
16. Ludbrook J. Repeated measurements and multiple comparisons in cardiovascular research. Cardiovasc Res. 1994;28:303–311.
17. Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat Med. 1997;16:2349–2380.
18. Willett JB, Singer JD, Martin NC. The design and analysis of longitudinal studies of development and psychopathology in context: statistical models and methodological recommendations. Dev Psychopathol. 1998;10:395–426.
19. Miglioretti DL, Heagerty PJ. Marginal modeling of nonnested multilevel data using standard software. Am J Epidemiol. 2007;165:453–463.
20. Saw SM, Shankar A, Tan SB. A cohort study of incident myopia in Singaporean children. Invest Ophthalmol Vis Sci. 2006;47:1839–1844.
21. Hanley JA, Negassa A, Edwardes MD, Forrester JE. Statistical analysis of correlated data using generalized estimating equations: an orientation. Am J Epidemiol. 2003;157:364–375.
22. Fleming TR, Lin DY. Survival analysis in clinical trials: past developments and future directions. Biometrics. 2000;56:971–983.
23. Lee ET, Go OT. Survival analysis in public health research. Annu Rev Public Health. 1997;18:105–134.
24. Ohno-Machado L. Modeling medical prognosis: survival analysis techniques. J Biomed Inform. 2001;34:428–439.
25. Rosner B, Glynn RJ. Multivariate methods for clustered ordinal data with applications to survival analysis. Stat Med. 1997;16:357–372.
26. Wienke A. Frailty Models in Survival Analysis. Boca Raton, FL: Taylor & Francis; 2011.
27. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by using the marginal distributions. J Am Stat Assoc. 1989;84:1065–1073.
28. Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322:881–888.
29. McCarthy MI, Abecasis GR, Cardon LR. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9:356–369.
30. Klein RJ, Zeiss C, Chew EY. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389.
31. Thorleifsson G, Walters GB, Hewitt AW. Common variants near CAV1 and CAV2 are associated with primary open-angle glaucoma. Nat Genet. 2010;42:906–909.
32. Hysi PG, Young TL, Mackey DA. A genome-wide association study for myopia and refractive error identifies a susceptibility locus at 15q25. Nat Genet. 2010;42:902–905.
33. Solouki AM, Verhoeven VJ, van Duijn CM. A genome-wide association study identifies a susceptibility locus for refractive errors and myopia at 15q14. Nat Genet. 2010;42:897–901.
34. Chen W, Stambolian D, Edwards AO. Genetic variants near TIMP3 and high-density lipoprotein-associated loci influence susceptibility to age-related macular degeneration. Proc Natl Acad Sci U S A. 2010;107:7401–7406.
35. Dewan A, Liu M, Hartman S. HTRA1 promoter polymorphism in wet age-related macular degeneration. Science. 2006;314:989–992.
36. Li YJ, Goh L, Khor CC. Genome-wide association studies reveal genetic variants in CTNND2 for high myopia in Singapore Chinese. Ophthalmology. 2011;118:368–375.
37. Lin HJ, Huang YC, Lin JM. Single-nucleotide polymorphisms in chromosome 3p14.1-3p14.2 are associated with susceptibility of Type 2 diabetes with cataract. Mol Vis. 2010;16:1206–1214.
38. Nakanishi H, Yamada R, Gotoh N. A genome-wide association analysis identified a novel susceptible locus for pathological myopia at 11q24.1. PLoS Genet. 2009;5:e1000660.
39. Nakano M, Ikeda Y, Taniguchi T. Three susceptible loci associated with primary open-angle glaucoma identified by genome-wide association study in a Japanese population. Proc Natl Acad Sci U S A. 2009;106:12838–12842.
40. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7:781–791.
41. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
42. Kerner B, North KE, Fallin MD. Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14. Genet Epidemiol. 2009;33(Suppl 1):S93–S98.
43. Chanock SJ, Manolio T, Boehnke M. Replicating genotype-phenotype associations. Nature. 2007;447:655–660.
44. Stephens M, Balding DJ. Bayesian statistical methods for genetic association studies. Nat Rev Genet. 2009;10:681–690.
45. Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am J Hum Genet. 2010;86:6–22.
46. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10:392–404.
47. Charlesworth J, Kramer PL, Dyer T. The path to open-angle glaucoma gene discovery: endophenotypic status of intraocular pressure, cup-to-disc ratio, and central corneal thickness. Invest Ophthalmol Vis Sci. 2010;51:3509–3514.
48. Lu Y, Dimasi DP, Hysi PG. Common genetic variants near the Brittle Cornea Syndrome locus ZNF469 influence the blinding disease risk factor central corneal thickness. PLoS Genet. 2010;6:e1000947.
49. Vithana EN, Aung T, Khor CC. Collagen-related genes influence the glaucoma risk factor, central corneal thickness. Hum Mol Genet. 2011;20:649–658.
50. Vitart V, Bencic G, Hayward C. New loci associated with central cornea thickness include COL5A1, AKAP13 and AVGR8. Hum Mol Genet. 2010;19:4304–4311.
51. Ramdas WD, van Koolwijk LM, Ikram MK. A genome-wide association study of optic disc parameters. PLoS Genet. 2010;6:e1000978.
52. Khor CC, Ramdas WD, Vithana EN. Genome-wide association studies in Asians confirm the involvement of ATOH7 and TGFBR3, and further identify CARD10 as a novel locus influencing optic disc area. Hum Mol Genet. 2011;20:1864–1872.
53. Evans K, Bird AC. The genetics of complex ophthalmic disorders. Br J Ophthalmol. 1996;80:763–768.
54. Wiggs JL. Genetic etiologies of glaucoma. Arch Ophthalmol. 2007;125:30–37.
55. Young TL, Metlapally R, Shay AE. Complex trait genetics of refractive error. Arch Ophthalmol. 2007;125:38–48.
56. Carbonaro F, Andrew T, Mackey DA, Young TL, Spector TD, Hammond CJ. Repeated measures of intraocular pressure result in higher heritability and greater power in genetic linkage studies. Invest Ophthalmol Vis Sci. 2009;50:5115–5119.
57. Ciner E, Ibay G, Wojciechowski R. Genome-wide scan of African-American and white families for linkage to myopia. Am J Ophthalmol. 2009;147:512–517 e512.
58. Ciner E, Wojciechowski R, Ibay G, Bailey-Wilson JE, Stambolian D. Genomewide scan of ocular refraction in African-American families shows significant linkage to chromosome 7p15. Genet Epidemiol. 2008;32:454–463.
59. Hammond CJ, Andrew T, Mak YT, Spector TD. A susceptibility locus for myopia in the normal population is linked to the PAX6 gene region on chromosome 11: a genomewide scan of dizygotic twins. Am J Hum Genet. 2004;75:294–304.
60. Klein AP, Duggal P, Lee KE, Klein R, Bailey-Wilson JE, Klein BE. Support for polygenic influences on ocular refractive error. Invest Ophthalmol Vis Sci. 2005;46:442–446.
61. Xu X, Tian L, Wei LJ. Combining dependent tests for linkage or association across multiple phenotypic traits. Biostatistics. 2003;4:223–229.
62. Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32:9–19.
63. Yang F, Tang Z, Deng H. Bivariate association analysis for quantitative traits using generalized estimation equation. J Genet Genomics. 2009;36:733–743.
64. Jiang C, Zeng ZB. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995;140:1111–1127.
65. O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
66. Wei LJ, Johnson WE. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985;72:359–364.
67. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol. 2010;34:444–454.
68. Lange C, van Steen K, Andrew T. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol. 2004;3:Article17.
69. Liu J, Pei Y, Papasian CJ, Deng HW. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol. 2009;33:217–227.
70. Liu YZ, Pei YF, Liu JF. Powerful bivariate genome-wide association analyses suggest the SOX6 gene influencing both obesity and osteoporosis phenotypes in males. PLoS One. 2009;4:e6827.
71. Lange C, Silverman EK, Xu X, Weiss ST, Laird NM. A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics. 2003;4:195–206.
72. Hackett CA, Meyer RC, Thomas WT. Multi-trait QTL mapping in barley using multivariate regression. Genet Res. 2001;77:95–106.
73. Yu K, Wheeler W, Li Q. A partially linear tree-based regression model for multivariate outcomes. Biometrics. 2010;66:89–96.
74. Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM. PBAT: tools for family-based association studies. Am J Hum Genet. 2004;74:367–369.
75. Goldstein DB. Common genetic variation and human traits. N Engl J Med. 2009;360:1696–1698.
76. Munafo MR, Flint J. Meta-analysis of genetic association studies. Trends Genet. 2004;20:439–444.
77. de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17:R122–128.
78. Ioannidis JP, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. BMJ. 2007;335:914–916.
Table 1. Categories of Analytic Strategies in Clinical and Epidemiological Papers*

A. Analyses at Subject Level versus Ocular Level†

| Study Type | Level of Analysis | Articles, n (%) | No Correction for Correlation on Paired Eyes‡ |
|---|---|---|---|
| Clinical study | Subject level | 24 (20.9) | |
| Clinical study | Ocular level | 24 (20.9) | 10 |
| Epidemiology study | Subject level | 48 (41.8) | |
| Epidemiology study | Ocular level | 19 (16.4) | 4 |
| Total | | 115 (100) | |

B. Statistical Approaches for Longitudinal Follow-up Study§

| Approach | Articles, n (%) |
|---|---|
| t-test/paired t2/McNemar test | 18 (26.1) |
| Wilcoxon rank sum test | 11 (15.9) |
| Logistic/linear regression | 10 (14.5) |
| Repeated ANOVA | 6 (8.7) |
| Mixed model/GEE | 11 (15.9) |
| Survival-based analysis | 13 (18.9) |
| Total | 69 (100) |
Table 2. Statistical Approaches for Longitudinal Follow-up Study

Charting Event Progression

| Approaches | Outcome | Adjust for Correlation: Paired Eyes | Adjust for Correlation: Repeated Measures | Comments |
|---|---|---|---|---|
| t-test/ANOVA, χ2, Wilcoxon rank tests | Continuous/discrete | No | No | Straightforward; perform analysis at each time point or use changes as outcome; less powerful because information is discarded; cannot model the time trend or the predictors associated with the outcome |
| Linear/logistic regression | Continuous/binary | No | No | Straightforward; perform analysis at each time point or use changes as outcome; adjust for baseline covariates in the model; less powerful if information is discarded; cannot model the longitudinal trend |
| Repeated ANOVA (14) | Continuous | Yes | Yes | Analytically complex; requires a balanced data design; less robust to missing data; cannot model individual trends |
| Mixed-effects model (11,17) | Continuous/binary/count | Yes | Yes | Statistically powerful; analytically complex; can model both fixed and random effects; flexible framework for specifying parameter distributions; capable of handling unbalanced data |
| GEE (10) | Continuous/binary/count | Yes | Yes | Statistically powerful; analytically complex; capable of handling unbalanced data; models marginal effects; less powerful in handling missing data |

Charting Event Onset: Time to Event

| Approaches | Outcome | Adjust for Correlation: Paired Eyes | Adjust for Correlation: Repeated Measures | Comments |
|---|---|---|---|---|
| Kaplan-Meier | Continuous | No | NA | Straightforward; estimates the survival rates |
| Log rank test | Continuous | No | NA | Simple nonparametric approach to compare the rates; unable to adjust for covariates |
| Proportional Cox model | Continuous | No | NA | Quantifies effects of covariates on the survival time; compares the rates by groups |
| Frailty model (26) | Continuous | Yes | NA | Analytically complex; capable of modeling correlated time-to-event data; flexible framework for random effects |
| Marginal model (27) | Continuous | Yes | NA | Analytically complex; capable of modeling correlated time-to-event data; robust to time-dependent covariates; estimates marginal effects |
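As an illustration of the simplest time-to-event entry in the table, the following is a minimal sketch of the Kaplan-Meier product-limit estimate. It assumes one observation per subject (for example, one eye per subject), because the estimator does not adjust for paired eyes. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in years; 1 = event observed, 0 = censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

When both eyes of a subject contribute event times, the frailty and marginal models listed in the table are the appropriate extensions.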
Table 3. Results of Analyzing Repeated Spherical Equivalent in a Longitudinal Study Using Different Analytic Approaches

Whole data analysis: Models 1 and 2 use longitudinal data and account for intereye correlation; Models 3 and 4 use longitudinal data and ignore intereye correlation. Partial data analysis: Models 5 and 6 use longitudinal data with the both-eyes average; Model 7 uses cross-sectional data. Each cell gives β (SE); P.

| Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Intercept | −0.29 (0.42); 0.492 | −0.50 (0.48); 0.296 | −0.27 (0.32); 0.399 | −0.48 (0.35); 0.167 | −0.22 (0.49); 0.661 | −0.56 (0.56); 0.316 | −1.93 (0.63); 0.003 |
| School: 1 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| School: 2 | 0.78 (0.29); 0.009 | 0.87 (0.30); 0.003 | 0.75 (0.22); 0.001 | 0.87 (0.22); <0.0001 | 0.73 (0.34); 0.033 | 0.88 (0.35); 0.013 | 1.26 (0.61); 0.040 |
| Time, y: 1st | Referent | Referent | Referent | Referent | Referent | Referent | |
| Time, y: 2nd | −0.70 (0.04); <0.0001 | −0.64 (0.06); <0.0001 | −0.70 (0.04); <0.0001 | −0.64 (0.05); <0.0001 | −0.70 (0.06); <0.0001 | −0.64 (0.07); <0.0001 | |
| Time, y: 3rd | −1.27 (0.05); <0.0001 | −1.27 (0.13); <0.0001 | −1.28 (0.05); <0.0001 | −1.27 (0.10); <0.0001 | −1.29 (0.07); <0.0001 | −1.35 (0.15); <0.0001 | |
| Time, y: 4th | −1.73 (0.06); <0.0001 | −1.71 (0.18); <0.0001 | −1.74 (0.06); <0.0001 | −1.71 (0.14); <0.0001 | −1.74 (0.10); <0.0001 | −1.80 (0.22); <0.0001 | |
| School*time | 0.18 (0.02); <0.0001 | 0.15 (0.07); 0.030 | 0.18 (0.02); <0.0001 | 0.15 (0.05); 0.005 | 0.17 (0.08); <0.0001 | 0.17 (0.08); 0.037 | |
| Age, y: 7 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Age, y: 8 | 0.32 (0.27); 0.235 | 0.38 (0.27); 0.153 | 0.30 (0.20); 0.149 | 0.38 (0.20); 0.054 | 0.27 (0.32); 0.392 | 0.41 (0.32); 0.197 | 0.47 (0.41); 0.250 |
| Age, y: 9 | −0.90 (0.30); 0.004 | −0.92 (0.36); 0.010 | −0.88 (0.23); <0.0001 | −0.92 (0.27); 0.001 | −0.85 (0.34); 0.014 | −0.93 (0.39); 0.018 | −0.88 (0.45); 0.051 |
| Sex: Male | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Sex: Female | 0.25 (0.23); 0.278 | 0.18 (0.24); 0.474 | 0.20 (0.17); 0.253 | 0.18 (0.18); 0.341 | 0.14 (0.27); 0.607 | 0.06 (0.28); 0.821 | 0.12 (0.34); 0.733 |
| Race: Non-Chinese | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Race: Chinese | −0.78 (0.28); 0.005 | −0.60 (0.30); 0.041 | −0.76 (0.21); <0.001 | −0.60 (0.22); 0.006 | −0.77 (0.33); 0.018 | −0.43 (0.35); 0.214 | −0.83 (0.42); 0.049 |
| Books read per week: ≤2 | Referent | Referent | Referent | Referent | Referent | Referent | Referent |
| Books read per week: >2 | −0.20 (0.24); 0.401 | −0.15 (0.26); 0.563 | −0.14 (0.18); 0.447 | −0.15 (0.19); 0.442 | −0.05 (0.28); 0.849 | −0.11 (0.29); 0.700 | −0.13 (0.36); 0.730 |
Table 4. Summary of Analytic Approaches for Quantitative Trait of Both-Eyes Data in GWAS

Data from One Eye

| Approaches | Comments |
|---|---|
| Either eye or a randomized eye | Simple; less powerful if the correlation between the two traits is low |

Data from Both Eyes

| Approaches | Comments |
|---|---|
| Transform bivariate traits to a single trait (average of measurements) | Simple and efficient; statistically less efficient if the correlation between the bivariate traits is low or if data are missing for either eye |
| Principal components analysis (62,68) | Statistically powerful; complex; reduces the phenotypes to a single trait; computationally intensive |
| Combining univariate test statistics (61) | Simple and powerful; capable of handling paired-eye traits that are not highly correlated; robust to partially missing trait values; nonparametric |

Model-Based Approaches

| Approaches | Comments |
|---|---|
| GEE (63,69–71,74) | Statistically powerful; robust for various correlation structures; efficient on both normal and nonnormal traits; complex |
| Mixed-effect model (69,70) | Statistically powerful; complex; robust for various correlation structures of multiple traits; computationally intensive |
| Tree-based regression (73) | Analytically complex; capable of assessing multilocus association tests; computationally extremely intensive |
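To illustrate the idea behind combining univariate test statistics, the following is a minimal sketch and not the specific procedure of the cited reference. It assumes that a SNP has been tested separately against the right-eye and left-eye trait, giving two z statistics that are each standard normal under the null hypothesis with correlation r between them; r is taken as known here and would in practice have to be estimated. The sum of the two statistics then has variance 2 + 2r, so dividing by its square root gives a single standard normal statistic. All function names and numbers are hypothetical.

```python
import math


def combine_two_z(z_right: float, z_left: float, r: float) -> float:
    """Combine two correlated z statistics into one standard normal statistic."""
    if not -1.0 < r <= 1.0:
        raise ValueError("r must lie in (-1, 1]")
    # Var(z_right + z_left) = 1 + 1 + 2 * r under the null hypothesis.
    return (z_right + z_left) / math.sqrt(2.0 + 2.0 * r)


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-eye association statistics for one SNP.
z_combined = combine_two_z(z_right=2.5, z_left=2.1, r=0.6)
print(f"combined z = {z_combined:.3f}, P = {two_sided_p(z_combined):.4f}")
```

When the two eyes are perfectly correlated (r = 1), the combined statistic equals the single-eye statistic, so nothing is gained; the lower the correlation, the more the second eye adds.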