We inferred principal components of genetic ancestry using the program EIGENSOFT.[16] To make comparisons with reference populations of known ancestry, we included reference panels of unrelated Northern Europeans (CEU, n = 87) and West Africans (YRI, n = 88) from the 1000 Genomes Project,[17] and Native Americans (n = 105).
[18] We retained the first four principal components and included them as covariates in downstream association analyses. To assess control of population stratification, the genomic control inflation factor[19] was calculated and a quantile-quantile (Q-Q) plot was generated to visualize the distribution of the test statistics. Linear regression, adjusting for age, sex, and principal components of genetic ancestry, was conducted to assess the associations between SNPs and VCDR among study participants in the discovery set using PLINK (v1.90).[13] An additive genetic model was assumed. To account for relatedness among individuals in the replication set, we used a linear mixed-effects model (PROC MIXED in SAS v9.4; SAS Institute, Cary, NC, USA) to test the associations between SNPs and VCDR, adjusting for age, sex, and principal components of genetic ancestry. The empirical “sandwich” estimator and a compound symmetry covariance structure were used in the linear mixed-effects models. The software EMMAX (Efficient Mixed-Model Association Expedited; available in the public domain at
http://csg.sph.umich.edu/kang/emmax/download/)[20] was used to analyze genotyped and imputed SNPs for the full study sample (the discovery and replication sets combined), using linear mixed-effects modeling to account for population stratification and relatedness, adjusting for age, sex, kinship, and principal components of genetic ancestry. Allelic dosage was used to account for genotype imputation uncertainty for imputed SNPs in EMMAX. SNPs with P < 1 × 10⁻⁶ in the discovery set were retained and analyzed in the replication set. In the full study sample analysis, SNPs reaching the genome-wide significance threshold (P < 5 × 10⁻⁸) were declared significant, and SNPs reaching P < 1 × 10⁻⁶ were declared suggestive. The program
simpleM[21–23] (available in the public domain at http://simplem.sourceforge.net) was used to estimate the effective number of independent tests, which served as the multiple testing correction for the replication of previously published loci. Conditional association analysis was performed by including the lead SNP as a covariate in the regression model. Plots were generated using R[24] (available in the public domain at https://www.r-project.org) and LocusZoom[25] (available in the public domain at http://csg.sph.umich.edu/locuszoom) (hg19/1KGP 2014 AMR).
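
For readers unfamiliar with the genomic control inflation factor used above to assess population stratification, the following is a minimal sketch of how it is commonly computed from association P values. It is an illustration under stated assumptions, not the code used in this study: the function name `genomic_inflation_factor` and the example P values are hypothetical, and each test is assumed to be two-sided with 1 degree of freedom, so that the chi-square statistic equals the squared standard normal quantile.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Return the ratio of the observed median 1-df chi-square statistic
    to its expected median under the null (hypothetical helper).

    Assumes two-sided P values in the interval (0, 1].
    """
    std_normal = NormalDist()
    chi_square = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
        # For a two-sided 1-df test, chi-square = z**2, where z is the
        # standard normal quantile at p / 2 (the lower tail keeps
        # precision for very small P values).
        z = std_normal.inv_cdf(p / 2.0)
        chi_square.append(z * z)
    # Median of a 1-df chi-square distribution under the null (about 0.455).
    expected_median = std_normal.inv_cdf(0.25) ** 2
    return median(chi_square) / expected_median


# Illustrative input only: evenly spaced P values mimic a null distribution.
n_tests = 1001
null_like = [(i - 0.5) / n_tests for i in range(1, n_tests + 1)]
lambda_null = genomic_inflation_factor(null_like)

# Squaring shifts the P values toward zero, mimicking inflated statistics.
inflated = [p ** 2 for p in null_like]
lambda_inflated = genomic_inflation_factor(inflated)
```

A value close to 1 suggests that the test statistics follow the null distribution and that stratification is adequately controlled, while values well above 1 indicate inflation.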