2017; Martin et al. We expect that our insights will transfer to some extent to other traits, ancestries, and cohorts, but there may be significant differences. We also calculated PRS using LDpred (Vilhjálmsson et al. We used the UKB_eur imputed genotypes as an LD reference panel and the UKB GWAS summary statistics for height. LD clumping yields higher predictive power but depends on prior knowledge of the population-specific LD structure and has the highest difference in PRS between 1000 Genomes European and African Populations (Table S1). Next, we calculated the partial-R2 for each bin. Cohort Profile: the Health and Retirement Study (HRS). 2009; Tishkoff et al. Funding for CARe genotyping was provided by NHLBI Contract N01-HC-65226. 2015; Torkamani et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. With better-powered GWAS to estimate ancestry-specific effect sizes, the improvement should be more extensive. A third prediction is that, if differences in partial-R2 are driven by differences in ability to tag the causal variant, then PRS constructed from imputed genotypes should see a smaller decrease in predictive power than those constructed from genotype array data. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts. B: Partial-R2 for two clumping strategies (100 and 500Kb windows with either P < 0.005 or P < 0.00005) for imputed and genotyped sets of SNPs. The Women’s Health Initiative (WHI) data were obtained from dbGaP accession phs000200.v12.p3. 2018) (UKB), the Women’s Health Initiative (Hays et al. 
Ideally, we would have enough individuals and reference panels to properly integrate the different components of African ancestry into our analyses, and this is an essential problem for future research. 2019; Marnetto et al. We show that differences in LD structure and SFS affect the transferability of PRS but do not explain the full magnitude of the decrease. We estimate this ratio to be: 0.78 (UKB), 0.92 (HRS), 1.04 (JHS), and 1.07 (WHI), suggesting that at most 8% of the decrease in partial-R2 (in non-UKB samples) can be explained by differences in the site frequency spectrum (SFS). Here, we show that the predictive power of PRS is approximately proportional to ancestry in populations of admixed European and African ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. J. R. Stat. To evaluate predictive power, we fitted a linear model of height as a function of sex, age, age², genome-wide European ancestry proportion, and PRS, and compared it to a model without PRS. We further filtered this set to contain individuals with at least 5% genome-wide African ancestry and sex-corrected height not less than 2 sd below the mean (Figure S3, to remove individuals with anomalously low height values), resulting in 2,270 individuals (referred to as “HRS_afr”). This result is robust to the set of SNPs used in the PRS, with intercepts ranging between -1% and 2.5%, depending on the pruning strategy (Figure S6). We first intersected the ∼13.5 million SNPs from the UK Biobank summary statistics and the genotyped SNPs in each dataset (Table 1). Such factors may include inter-cohort differences in data collection, phenotype or environment, differences in linkage disequilibrium (LD) structure or allele frequencies across populations, differences in causal or marginal effect sizes, and epistatic or gene-environment interactions (Novembre and Barton 2018). 
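The model comparison described above (height regressed on sex, age, age squared, ancestry and PRS, against the same model without PRS) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one common definition of partial-R2, namely the proportional reduction in residual sum of squares when PRS is added, and the function names and the simulated data are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def partial_r2(y, covariates, prs):
    """Assumed definition: (SSE_reduced - SSE_full) / SSE_reduced."""
    n = len(y)
    reduced = np.column_stack([np.ones(n), covariates])  # model without PRS
    full = np.column_stack([reduced, prs])               # model with PRS
    sse_reduced = sse(reduced, y)
    sse_full = sse(full, y)
    return (sse_reduced - sse_full) / sse_reduced

# Simulated toy data (illustrative only, not from any cohort).
rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n)
age = rng.uniform(40, 80, n)
eur_anc = rng.uniform(0, 1, n)
prs = rng.normal(size=n)
height = (165 + 12 * sex - 0.05 * age + 2 * eur_anc + 3 * prs
          + rng.normal(scale=6, size=n))

covs = np.column_stack([sex, age, age ** 2, eur_anc])
r2 = partial_r2(height, covs, prs)
print(f"partial-R2 of PRS: {r2:.3f}")
```

Because the two models are nested, this quantity always lies between 0 and 1.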
2017; Martin et al. For African ancestry segments, effect sizes from admixed Africans are weighted by a constant, α. Finally, we performed a weighted regression of the partial-R2 values on the median proportion of European ancestry in each bin, using the inverse of the bootstrap standard deviation as weights. Second, we explore the roles of different biological and statistical factors in driving this reduction. We clumped SNPs in physical and genetic windows using a range of p-value thresholds. Our approach has several limitations. This ratio is the relative difference we would expect if the effect sizes and LD structure were the same across ancestries, and only allelic frequencies differed. Physical window sizes (in Kb) were: 1,000, 500, 100, 75, 50, 25, 10, 5. Data from genotype arrays. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. 2020). The Jackson Heart Study (JHS) data were accessed through dbGaP accession phs000286.v6.p2. Assuming constant environmental variance and a height heritability of 80%, it would follow that the European phenotypic variance would be about 24% lower (0.8 × 0.7 + 0.2 = 0.76). If this were true for SNPs that causally affect height, then the additive genetic variance of those SNPs would also be 30% lower. Consequently, much of the potential of genomic disease risk profiling is restricted to European ancestry populations. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. 
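The weighted regression of partial-R2 on median European ancestry per bin can be sketched as below. This is a minimal illustration under stated assumptions: the weights enter as multipliers on the squared residuals (as in R's lm), the function name is hypothetical, and the bin values are invented so that the fit can be checked by eye.

```python
import numpy as np

def weighted_linear_fit(x, y, weights):
    """Minimise sum_i w_i * (y_i - a - b * x_i)**2 and return (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into OLS
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]

# Invented example: median ancestry per bin, partial-R2 per bin,
# and a bootstrap standard deviation per bin.
median_ancestry = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
bin_partial_r2 = 0.01 + 0.10 * median_ancestry   # exactly linear on purpose
bootstrap_sd = np.array([0.02, 0.01, 0.01, 0.015, 0.03])

intercept, slope = weighted_linear_fit(
    median_ancestry, bin_partial_r2, 1.0 / bootstrap_sd
)
print(intercept, slope)
```

With exactly linear inputs the fit recovers the intercept and slope regardless of the weights; with real bin estimates, the weights reduce the influence of bins whose partial-R2 is poorly determined.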
We tested 81 approaches to PRS construction, including five different p-value cutoffs and 15 window sizes, pairwise r2 and LD blocks inferred for African and European populations, and the infinitesimal model of LDpred. The UK Biobank Resource was used under Application 33923. Datasets used in this study. Sheets 1-6: Different SNP sets generated by clumping and their PRS values. By incorporating effect sizes from admixed populations in a linear combination of PRS, we are able to improve predictive power, in agreement with previous findings (Márquez-Luna et al. We focused on the clumping and thresholding approach to PRS construction, although we saw consistent results with LDpred’s infinitesimal model (Figure S7). The partial-R2 between the two models gives the proportion of the phenotypic variation explained by the PRS, to which we refer as partial-R2 or predictive power, throughout. We then computed a chi-squared statistic for the difference between the Admixed African effect size (, with standard error ) we obtained and the European effect sizes from the UK Biobank (with standard error ): We calculated PRS for each individual, j, as the weighted sum of effect sizes: $\mathrm{PRS}_j = \sum_{i=1}^{M} \hat{\beta}_i G_{ij}$, where the sum is over all M SNPs used in the PRS, $G_{ij}$ is the effect allele dosage (0, 1 or 2) of individual j at SNP i, and $\hat{\beta}_i$ is the estimated effect size of the effect allele at SNP i. We also performed the same analysis using a recombination map derived for CEU (European) individuals from the 1000 Genomes Project (Spence and Song 2019). On the other hand, if the predictive power were uniformly distributed across the genome, we would expect a quadratic relationship: the partial-R2 of the whole genome (which scales linearly with ancestry) would be multiplied by the proportion of the genome in European ancestry segments (i.e., ancestry). Polygenic risk scores (PRSs) have become the standard for quantifying genetic liability in the prediction of disease risks. 
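Two calculations in this passage lend themselves to a short sketch: the PRS as a dosage-weighted sum of effect sizes, and the chi-squared statistic comparing two effect-size estimates. The PRS function follows the definition given in the text. The form of the chi-squared statistic is an assumption, since the formula itself did not survive in this copy: the sketch uses the standard statistic for the difference of two independent estimates, the squared difference divided by the sum of squared standard errors. All names and numbers are hypothetical.

```python
import numpy as np

def polygenic_score(dosages, betas):
    """PRS per individual: sum over SNPs of effect size times allele dosage.

    dosages: (n_individuals, n_snps) array of effect-allele counts (0, 1, 2)
    betas:   (n_snps,) array of estimated effect sizes
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float)

def effect_difference_chi2(beta_a, se_a, beta_b, se_b):
    """Assumed form: (beta_a - beta_b)^2 / (se_a^2 + se_b^2), 1 d.f."""
    return (beta_a - beta_b) ** 2 / (se_a ** 2 + se_b ** 2)

# Two individuals, three SNPs (invented values).
dosages = np.array([[0, 1, 2],
                    [2, 2, 0]])
betas = np.array([0.1, -0.2, 0.05])
scores = polygenic_score(dosages, betas)

# One SNP with invented effect sizes and standard errors in two cohorts.
chi2 = effect_difference_chi2(0.3, 0.1, 0.1, 0.1)
print(scores, chi2)
```

The statistic treats the two estimates as independent, which is reasonable only when the two cohorts do not share individuals.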
Among the clumping and thresholding (c+t) strategies, increasing the p-value cutoff and window sizes improves prediction (Figure S6 and Table S1). As a result, the clinical utility of PRS has been explored mainly in European ancestry populations, and little is known about the biological and methodological factors influencing prediction in non-Europeans (Martin et al. The Health and Retirement Study genetic data were accessed through dbGaP accession phs000428.v2.p2. RFMix: A discriminative modeling approach for rapid and robust local-ancestry inference. Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level, and provide a potential route to the use of genetic data in personalized medical care. Genetic window sizes (in cM) were: 1, 0.5, 0.3, 0.25, 0.2, 0.15, 0.1. 2011; Mavaddat et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Characterizing the admixed African ancestry of African Americans. Using imputed genotypes for the HRS and UKB cohorts, we find that the relationship between ancestry and partial-R2 is the same for imputed and array data suggesting that this is not the case (Figure 4A). A second prediction is that the difference between effect sizes estimated in European and African ancestry populations should be larger in regions of high recombination. One possibility is that those arrays are more biased toward SNPs that are common across ancestries. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. It is important to note that different datasets use different arrays, and a different pattern could be observed for other datasets. In this case, the PRS could capture the same absolute amount of phenotypic variance, but the proportion of variance explained would be higher in European ancestry populations. 
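Clumping by physical distance with a p-value threshold, as opposed to LD-based clumping, can be sketched as a greedy pass over SNPs sorted by significance. This is an illustration rather than the pipeline used in the study: treating the window as a distance on either side of each retained SNP is an assumption, and the function name, tuple layout and SNP identifiers are hypothetical.

```python
def clump_by_distance(snps, p_threshold, window_bp):
    """Greedy distance-based clumping.

    snps: iterable of (snp_id, chrom, pos_bp, p_value) tuples.
    Keeps SNPs with p_value < p_threshold, most significant first, and
    drops any SNP within window_bp of an already kept SNP on the same
    chromosome.
    """
    candidates = sorted(
        (s for s in snps if s[3] < p_threshold), key=lambda s: s[3]
    )
    kept = []
    for snp_id, chrom, pos, pval in candidates:
        far_from_all = all(
            c != chrom or abs(pos - p) > window_bp for _, c, p, _ in kept
        )
        if far_from_all:
            kept.append((snp_id, chrom, pos, pval))
    return [s[0] for s in kept]

# Invented summary statistics.
toy_snps = [
    ("rs1", 1, 100_000, 1e-8),
    ("rs2", 1, 150_000, 1e-6),   # within 100 Kb of rs1, so dropped
    ("rs3", 1, 400_000, 1e-5),
    ("rs4", 2, 120_000, 1e-4),
    ("rs5", 2, 500_000, 0.01),   # fails the p-value threshold
]
selected = clump_by_distance(toy_snps, p_threshold=0.0005, window_bp=100_000)
print(selected)  # ['rs1', 'rs3', 'rs4']
```

The quadratic scan over retained SNPs keeps the sketch short; for millions of SNPs one would index retained positions per chromosome instead.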
We find no significant correlation , P = 0.97) between and local recombination rate (Figure 3D), and a small positive correlation between and European LD scores (Bulik-Sullivan et al. 2019). C: correlation between PRS SNPs effect sizes from Europeans and Admixed Africans in the WHI_afr dataset. 2020) and gene-by-ancestry interactions may also contribute, and the relative importance of these mechanisms remains to be quantified. Moreover, the ratio of genetic variance explained by PRS SNPs is similar for imputed and genotyped data (Figure 4C). We also used Equation 2 to calculate PRS based only on European ancestry segments of the genome (from the local ancestry analysis) and repeated the analysis of partial-R2 as a function of . The personal and clinical utility of polygenic risk scores. The dashed line represents no difference in performance between the linear combinations and PRSeur. Analysis of protein-coding genetic variation in 60,706 humans. Unweighted PRS and the effect of local allele frequency differences on effect size differences. 2017). PRS are simply sums of the risk alleles carried by an individual weighted by their effect sizes (Purcell et al. We ran unsupervised ADMIXTURE (Alexander et al. Although the inclusion of individual and local ancestry information yielded only a modest increase in predictive power, this is likely due to the low sample size of our African-ancestry GWAS. Data from genotype arrays. The dashed line shows the regression with standard errors shaded in light gray. 2020). Thus, we focused on strategies that are independent of LD and chose a set of SNPs using a p-value threshold of 0.0005 and a physical window of 100 Kb, which includes ∼5,600-7,100 SNPs (Table 1) and obtains partial-R2 values close to the LD clumping strategies while requiring about 10-fold fewer SNPs. 
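Restricting the PRS to European ancestry segments can be illustrated with phased haplotypes and a per-haplotype local ancestry call at each SNP. The data layout below is an assumption made for illustration (the text says only that local ancestry came from a separate analysis), and the function name and values are hypothetical: an effect allele contributes to the score only if the haplotype carrying it is called European at that SNP.

```python
import numpy as np

def prs_european_segments(hap_alleles, hap_is_european, betas):
    """PRS counting only effect alleles that lie on European-ancestry haplotypes.

    hap_alleles:     (n_individuals, n_snps, 2) array, 1 if the haplotype
                     carries the effect allele, else 0
    hap_is_european: same shape, True where local ancestry is European
    betas:           (n_snps,) array of effect sizes
    """
    alleles = np.asarray(hap_alleles, dtype=float)
    mask = np.asarray(hap_is_european, dtype=bool)
    masked_dosage = (alleles * mask).sum(axis=2)   # dosage in 0..2 per SNP
    return masked_dosage @ np.asarray(betas, dtype=float)

# One individual, two SNPs (invented values).
hap_alleles = np.array([[[1, 1], [1, 0]]])
hap_is_european = np.array([[[True, False], [True, True]]])
betas = np.array([0.5, 0.2])

eur_prs = prs_european_segments(hap_alleles, hap_is_european, betas)
print(eur_prs)
```

At the first SNP only one of the two effect alleles sits on a European haplotype, so it contributes a dosage of 1 rather than 2.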
There is little to choose between these estimators in terms of power, correlation or AUC, but the unweighted estimator will perform relatively worse as sample size increases since its … This research was supported by a Research Fellowship from the Alfred P. Sloan foundation [FG-2018-10647], a New Investigator Research Grant from the Charles E. Kaufman Foundation [KA2018-98559], and NIGMS award number [R35GM133708] to I.M. The study is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I/HHSN26800001) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute for Minority Health and Health Disparities (NIMHD).