
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 17, Issue 6

A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)

Huaqing Zhao (ORCID iD: https://orcid.org/0000-0002-0953-4768)
Nandita Mitra
  • Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA
Peter A. Kanetsky
  • Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
Katherine L. Nathanson
  • Department of Medicine, University of Pennsylvania, South Pavilion, Perelman Center for Advanced Medicine, Philadelphia, PA 19104, USA
Timothy R. Rebbeck
  • Division of Population Sciences, Dana Farber Cancer Institute and Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
Published Online: 2018-12-04 | DOI: https://doi.org/10.1515/sagmb-2017-0054


Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct for PS is principal component analysis (PCA), but there is no objective method to guide which principal components (PCs) to include as covariates. Often, the ten PCs with the largest eigenvalues are included to adjust for PS. This choice is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare this principal components and propensity scores (PCAPS) approach with PCA and with EMMAX (efficient mixed-model association expedited) using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, yielding odds ratio (OR) estimates closer to the true OR. We illustrate the PCAPS method using GWAS data from a study of testicular germ cell tumors, in which PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduced bias compared with PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
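The two PCAPS steps described in the abstract (select PCs with the TW statistic, then estimate a genomic propensity score from the selected PCs) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The genotype normalization, the effective-number-of-markers formula and the sequential TW test follow our reading of Patterson et al. (2006); 0.9793 is the approximate upper 5% point of the TW (beta = 1) distribution. Treating a binary carrier indicator of a candidate SNP as the "exposure", the logistic propensity model, the simulated data and all function names (`count_significant_pcs`, `propensity_scores`, etc.) are hypothetical choices made for illustration.

```python
import numpy as np

TW1_95 = 0.9793  # approximate 95th percentile of the Tracy-Widom (beta = 1) law

def normalize_genotypes(C):
    """Center each SNP and scale by sqrt(p(1-p)), with p = (1 + sum) / (2 + 2m) (assumed, after Patterson et al. 2006)."""
    m = C.shape[0]
    p = (1.0 + C.sum(axis=0)) / (2.0 + 2.0 * m)
    return (C - C.mean(axis=0)) / np.sqrt(p * (1.0 - p))

def top_pcs(C):
    """Eigenvalues (descending) and sample-level eigenvectors of M M^T."""
    M = normalize_genotypes(C)
    vals, vecs = np.linalg.eigh(M @ M.T)          # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-8 * vals[0]                  # drop the ~zero eigenvalue left by centering
    return vals[keep], vecs[:, keep]

def count_significant_pcs(eigvals, crit=TW1_95):
    """Sequential TW tests: stop at the first eigenvalue that is not significant."""
    lam = np.sort(eigvals)[::-1]
    k = 0
    while k < len(lam) - 1:
        rest = lam[k:]
        c = len(rest)                             # plays the role of (number of samples - 1)
        s1, s2 = rest.sum(), (rest ** 2).sum()
        denom = c * s2 - s1 ** 2
        if denom <= 0:                            # flat spectrum: no structure left
            break
        n_eff = (c + 2) * s1 ** 2 / denom         # effective number of markers
        if n_eff <= 1:                            # one eigenvalue dominates: count it (heuristic guard)
            k += 1
            continue
        ell = c * rest[0] / s1                    # normalized largest remaining eigenvalue
        a = np.sqrt(n_eff - 1) + np.sqrt(c)
        mu = a ** 2 / n_eff
        sigma = a / n_eff * (1 / np.sqrt(n_eff - 1) + 1 / np.sqrt(c)) ** (1 / 3)
        if (ell - mu) / sigma <= crit:
            break
        k += 1
    return k

def propensity_scores(pcs, z, iters=25):
    """Logistic regression of a binary exposure z on the selected PCs, fit by Newton-Raphson."""
    X = np.column_stack([np.ones(len(z)), pcs])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        ps = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - ps)
        hess = X.T @ (X * (ps * (1 - ps))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))

# Hypothetical example: two subpopulations with differing allele frequencies.
rng = np.random.default_rng(1)
m, n = 200, 500
pop = np.repeat([0, 1], m // 2)
pA = rng.uniform(0.1, 0.9, n)
pB = np.clip(pA + rng.normal(0, 0.2, n), 0.05, 0.95)
C = rng.binomial(2, np.where(pop[:, None] == 0, pA, pB))   # m x n genotype counts

vals, vecs = top_pcs(C)
k = count_significant_pcs(vals)
pcs = vecs[:, :k] * np.sqrt(m)                    # rescale PC scores for a better-conditioned fit

g = rng.binomial(2, np.where(pop == 0, 0.2, 0.5))  # hypothetical candidate SNP
z = (g >= 1).astype(float)                         # carrier indicator used as the "exposure"
ps = propensity_scores(pcs, z)
print(k, ps[pop == 0].mean(), ps[pop == 1].mean())
```

In this sketch the number of PCs is chosen by the TW tests rather than fixed at ten, and the propensity score summarizes all selected PCs in a single covariate. The estimated scores would then enter the disease model for the candidate SNP, for example as a covariate or through stratification; that step, and the exact propensity score specification used in the paper, are not shown here.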

This article offers supplementary material which is provided at the end of the article.

Keywords: bias; principal components analysis; propensity score; testicular germ cell tumors; Tracy-Widom statistic


  • Airy, G. (1838): “On the intensity of light in the neighbourhood of a caustic,” Trans. Cambr. Phil. Soc., 6, 379–402.

  • Allen, A., M. P. Epstein and G. A. Satten (2010): “Score-based adjustment for confounding by population stratification in genetic association studies,” Genet. Epidemiol., 34(5), 383–385.

  • Bouaziz, M., C. Ambroise and M. Guedj (2011): “Accounting for population stratification in practice: a comparison of the main strategies dedicated to genome-wide association studies,” PLoS One, 6, e28845.

  • Cepeda, M. S., R. Boston, J. T. Farrar and B. L. Strom (2003): “Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders,” Am. J. Epidemiol., 158, 280–287.

  • Chen, H., C. Wang, M. P. Conomos, A. M. Stilp, Z. Li, T. Sofer, A. A. Szpiro, W. Chen, J. M. Brehm, J. C. Celedón, S. Redline, G. J. Papanicolaou, T. A. Thornton, C. C. Laurie, K. Rice and X. Lin (2016): “Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models,” Am. J. Hum. Genet., 98, 653–666.

  • de Andrade, M., D. Ray, A. C. Pereira and J. P. Soler (2015): “Global individual ancestry using principal components for family data,” Hum. Hered., 80, 1–11.

  • Devlin, B. and K. Roeder (1999): “Genomic control for association studies,” Biometrics, 55, 997–1004.

  • Dominici, D. and R. S. Maier (2008): Special Functions and Orthogonal Polynomials, American Mathematical Society.

  • Drake, C. (1993): “Effects of misspecification of the propensity score on estimators of treatment effect,” Biometrics, 49, 1231–1236.

  • Epstein, M. P., A. S. Allen and G. A. Satten (2007): “A simple and improved correction for population stratification in case-control studies,” Am. J. Hum. Genet., 80, 921–930.

  • Epstein, M. P., R. Duncan, K. A. Broadaway, M. He, A. S. Allen and G. A. Satten (2012): “Stratification-score matching improves correction for confounding by population stratification in case-control association studies,” Genet. Epidemiol., 36, 195–205.

  • Feng, Q., J. Abraham, T. Feng, Y. Song, R. C. Elston and X. Zhu (2009): “A method to correct for population structure using a segregation model,” BMC Proc., 3(Suppl 7), S104.

  • Hastings, S. P. and J. B. McLeod (1980): “A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation,” Arch. Ration. Mech. An., 73, 31–51.

  • Imbens, G. W. (2004): “Nonparametric estimation of average treatment effects under exogeneity: a review,” Rev. Econ. Stat., 86, 4–29.

  • Johnstone, I. M. (2001): “On the distribution of the largest eigenvalue in principal components analysis,” Ann. Stat., 29, 295–327.

  • Kanetsky, P. A., N. Mitra, S. Vardhanabhuti, M. Li, D. J. Vaughn, R. Letrero, S. L. Ciosek, D. R. Doody, L. M. Smith, J. Weaver, A. Albano, C. Chen, J. R. Starr, D. J. Rader, A. K. Godwin, M. P. Reilly, H. Hakonarson, S. M. Schwartz and K. L. Nathanson (2009): “Common variation in KITLG and at 5q31.3 predisposes to testicular germ cell cancer,” Nat. Genet., 41, 811–815.

  • Kang, H. M., J. H. Sul, S. K. Service, N. A. Zaitlen, S.-Y. Kong, N. B. Freimer, C. Sabatti and E. Eskin (2010): “Variance component model to account for sample structure in genome-wide association studies,” Nat. Genet., 42, 348–354.

  • Kang, S. J., E. K. Larkin, Y. Song, J. Barnholtz-Sloan, D. Baechle, T. Feng and X. Zhu (2009): “Assessing the impact of global versus local ancestry in association studies,” BMC Proc., 3(Suppl 7), S107.

  • Lee, A. B., D. Luca, L. Klei, B. Devlin and K. Roeder (2010): “Discovering genetic ancestry using spectral graph theory,” Genet. Epidemiol., 34, 51–59.

  • Li, C. and M. Li (2008): “GWAsimulator: a rapid whole-genome simulation program,” Bioinformatics, 24, 140–142.

  • Li, Q., S. Wacholder, D. J. Hunter, R. N. Hoover, S. Chanock, G. Thomas and K. Yu (2009): “Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment,” Genet. Epidemiol., 33, 432–441.

  • Li, Q. and K. Yu (2008): “Improved correction for population stratification in genomewide association studies by identifying hidden population structures,” Genet. Epidemiol., 32, 215–226.

  • Lin, D. Y. and D. Zeng (2011): “Correcting for population stratification in genomewide association studies,” J. Am. Stat. Assoc., 106, 997–1008.

  • Liu, L., D. Zhang, H. Liu and C. Arendt (2013): “Robust methods for population stratification in genome wide association studies,” BMC Bioinformatics, 14, 132.

  • Luca, D., S. Ringquist, L. Klei, A. B. Lee, C. Gieger, H. E. Wichmann, S. Schreiber, M. Krawczak, Y. Lu, A. Styche, B. Devlin, K. Roeder and M. Trucco (2008): “On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants,” Am. J. Hum. Genet., 82, 453–463.

  • Lunceford, J. K. and M. Davidian (2004): “Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study,” Stat. Med., 23, 2937–2960.

  • McPeek, M. and M. Abney (2008): “Association testing with principal-components-based correction for population stratification,” The American Society of Human Genetics, November 13, 2008, Philadelphia, PA.

  • Patterson, N., A. L. Price and D. Reich (2006): “Population structure and eigenanalysis,” PLoS Genet., 2, e190.

  • Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick and D. Reich (2006): “Principal components analysis corrects for stratification in genome-wide association studies,” Nat. Genet., 38, 904–909.

  • Price, A. L., N. A. Zaitlen, D. Reich and N. Patterson (2010): “New approaches to population stratification in genome-wide association studies,” Nat. Rev. Genet., 11, 459–463.

  • Pritchard, J. K. and P. Donnelly (2001): “Case-control studies of association in structured or admixed populations,” Theor. Popul. Biol., 60, 227–237.

  • Pritchard, J. K., M. Stephens, N. A. Rosenberg and P. Donnelly (2000): “Association mapping in structured populations,” Am. J. Hum. Genet., 67, 170–181.

  • Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira, D. Bender, J. Maller, P. Sklar, P. I. de Bakker, M. J. Daly and P. C. Sham (2007): “PLINK: a tool set for whole-genome association and population-based linkage analyses,” Am. J. Hum. Genet., 81, 559–575.

  • Ray, D. and S. Basu (2017): “A novel association test for multiple secondary phenotypes from a case-control GWAS,” Genet. Epidemiol., 41, 413–426.

  • Rosenbaum, P. R. and D. B. Rubin (1983): “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.

  • Tracy, C. A. and H. Widom (1993): “Level-spacing distributions and the Airy kernel,” Phys. Lett. B., 305, 115–118.

  • Tracy, C. A. and H. Widom (1994): “Level-spacing distributions and the Airy kernel,” Commun. Math. Phys., 159, 151–174.

  • Tracy, C. A. and H. Widom (1996): “On orthogonal and symplectic matrix ensembles,” Commun. Math. Phys., 177, 727–754.

  • Voight, B. F. and J. K. Pritchard (2005): “Confounding from cryptic relatedness in case-control association studies,” PLoS Genet., 1, e32.

  • Wan, F. and N. Mitra (2016): “An evaluation of bias in propensity score adjusted non-linear regression models,” Stat. Methods Med. Res., 27, 846–862.

  • Wang, D., Y. Sun, P. Stang, J. A. Berlin, M. A. Wilcox and Q. Li (2009): “Comparison of methods for correcting population stratification in a genome-wide association study of rheumatoid arthritis: Principal-component analysis versus multidimensional scaling,” BMC Proc., 3(Suppl 7), S109.

  • Weir, B. S., A. D. Anderson and A. B. Hepler (2006): “Genetic relatedness analysis: modern data and new challenges,” Nat. Rev. Genet., 7, 771–780.

  • Zhang, Y. and W. Pan (2015): “Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements?,” Genet. Epidemiol., 39, 149–155.

  • Zhang, Z., E. Ersoz, C.-Q. Lai, R. J. Todhunter and H. K. Tiwari (2010): “Mixed linear model approach adapted for genome-wide association studies,” Nat. Genet., 42, 355–360.

  • Zhang, Y., W. Guan and W. Pan (2013a): “Adjustment for population stratification via principal components in association analysis of rare variants,” Genet. Epidemiol., 37, 99–109.

  • Zhang, Y., X. Shen and W. Pan (2013b): “Adjusting for population stratification in a fine scale with principal components and sequencing data,” Genet. Epidemiol., 37, 787–801.

  • Zhao, H., T. R. Rebbeck and N. Mitra (2009): “A propensity score approach to correction for bias due to population stratification using genetic and non-genetic factors,” Genet. Epidemiol., 33, 679–690.

  • Zhao, H., T. R. Rebbeck and N. Mitra (2012): “Analyzing genetic association studies with an extended propensity score approach,” Stat. Appl. Genet. Mol. Biol., 11, DOI: https://doi.org/10.1515/1544-6115.1790.

  • Zhu, X., S. Li, R. S. Cooper and R. C. Elston (2008): “A unified association analysis approach for family and unrelated samples correcting for stratification,” Am. J. Hum. Genet., 82, 352–365.

  • Zou, F., S. Lee, R. Knowles and F. A. Wright (2010): “Quantification of population structure using correlated SNPs by shrinkage principal components,” Hum. Hered., 70, 9–22.

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 6, 20170054, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0054.

©2018 Walter de Gruyter GmbH, Berlin/Boston.

Supplementary Article Materials
