Meta-analysis of genome-wide association studies in >80,000 subjects identifies multiple loci for C-reactive protein levels
Abstract
Background
C-reactive protein (CRP) is a heritable marker of chronic inflammation that is strongly associated with cardiovascular disease. We aimed to identify genetic variants that are associated with CRP levels.
Methods and Results
We performed a genome-wide association (GWA) analysis of CRP in 66,185 participants from 15 population-based studies. We sought replication for the genome-wide significant and suggestive loci in a replication panel comprising 16,540 individuals from ten independent studies. We found 18 genome-wide significant loci and we provided evidence of replication for eight of them. Our results confirm seven previously known loci and introduce 11 novel loci that are implicated in pathways related to the metabolic syndrome (APOC1, HNF1A, LEPR, GCKR, HNF4A, and PTPN2), immune system (CRP, IL6R, NLRP3, IL1F10, and IRF1), or that reside in regions previously not known to play a role in chronic inflammation (PPP1R3B, SALL1, PABPC4, ASCL1, RORA, and BCL7B). We found significant interaction of body mass index (BMI) with LEPR (p<2.9×10−6). A weighted genetic risk score that was developed to summarize the effect of risk alleles was strongly associated with CRP levels and explained approximately 5% of the trait variance; however, there was no evidence for these genetic variants explaining the association of CRP with coronary heart disease.
Conclusion
We identified 18 loci that were associated with CRP levels. Our study highlights immune response and metabolic regulatory pathways involved in the regulation of chronic inflammation.
C-reactive protein (CRP) is a general marker of systemic inflammation. High CRP levels are associated with increased risks of mortality1 and major diseases including diabetes mellitus2, hypertension3, coronary heart disease4, and stroke5. The heritability of CRP levels is estimated to be 25–40%6–8, suggesting that genetic variation is a major determinant of CRP levels. A genome-wide association (GWA) study in 6,345 women found seven loci associated with CRP levels9. These loci were in or close to genes encoding CRP (CRP), leptin receptor (LEPR), interleukin 6 receptor (IL6R), glucokinase regulator (GCKR), hepatic nuclear factor 1 alpha (HNF1A), apolipoprotein E (APOE), and achaete-scute complex homolog 1 (ASCL1). Findings from other genome-wide association studies did not extend the number of loci related to CRP10,11.
In this study, we set out to discover additional genes related to CRP levels using GWA scans in 66,185 participants from 15 population-based cohort studies and to replicate our findings in 16,540 participants from ten independent studies. To investigate whether the genetic variants identified interact with non-genetic determinants of CRP such as age, sex, smoking, and body mass index (BMI), we examined gene-environment interactions. Finally, it is still unknown to what extent the genes associated with circulating CRP levels, individually or jointly, affect the risk of cardiovascular diseases. To address this question, we examined the association of genetic variants with myocardial infarction (MI) and coronary heart disease (CHD).
Methods
Subjects and Measurements
Participants were of European ancestry. All studies had protocols approved by local institutional review boards. Participants provided written informed consent and gave permission to use their DNA for research purposes. Baseline characteristics for all participating studies are presented in Supplementary Table 1. Baseline measures of clinical and demographic characteristics were obtained at the time of cohort entry except for B58C, FHS, NFBC66, and ARIC, in which measures were obtained at the time of phenotype measurement.
GWA analysis
Genome-wide scans were performed independently in each cohort using various genotyping technologies (Supplementary Table 7). Each study carried out association analysis using the genotype-phenotype data within its cohort. Each study imputed SNPs with reference to HapMap release 22 CEU and provided results for a common set of SNPs for meta-analysis. Except for FHS, all studies conducted a linear regression analysis adjusted for age (except for NFBC66 and B58C), sex (except for WGHS), and site of recruitment (if necessary) for all SNPs based on an additive genetic model. In the ERF study, adjustment for family structure in the GWA analysis was based on the model residuals in the score test, which accounted for pedigree structure as implemented in the “mmscore” function13 of the GenABEL software12. In FHS, a linear mixed effects model was employed using the lmekin function of the kinship package in R with a fixed additive effect for the SNP genotype, fixed covariate effects, and random family-specific additive residual polygenic effects14. In each study, we estimated the genomic inflation rate, stated as lambda (λgc), by comparing each study’s median chi-square value to 0.4549, the median chi-square for the null distribution15 (Supplementary Table 1). P-values for each cohort were adjusted for underlying population structure using the genomic inflation coefficient.
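The genomic inflation estimate described above can be illustrated with a minimal Python sketch. It assumes per-SNP test statistics are chi-square values with one degree of freedom; the function names are hypothetical, and the convention of dividing each statistic by λgc only when λgc exceeds 1 is a common genomic-control choice assumed here, not a detail stated in the text.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 df, as quoted in the text.
NULL_MEDIAN_CHISQ_1DF = 0.4549


def genomic_inflation(chisq_values):
    """Lambda_gc: observed median chi-square divided by the null median."""
    return median(chisq_values) / NULL_MEDIAN_CHISQ_1DF


def gc_adjusted_pvalue(chisq, lam):
    """P-value after genomic-control correction of a 1-df chi-square statistic.

    Assumption: the statistic is divided by lambda only when lambda > 1.
    """
    adjusted = chisq / max(lam, 1.0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(adjusted / 2.0))


if __name__ == "__main__":
    stats = [0.1, 0.9098, 3.0]  # toy chi-square statistics
    lam = genomic_inflation(stats)
    print(lam, gc_adjusted_pvalue(7.68, lam))
```

With real data the statistics would be the squared ratio of each SNP's beta to its standard error from the cohort-level regression.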
Discovery panel and the replication panel
The 15 study discovery panel included five studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium16, four studies from the European Special Population Network (EUROSPAN), and six additional independent studies comprising 66,185 participants. The replication studies included ten independent studies and 16,540 participants.
Meta-analysis
To calculate the combined p-values and beta coefficients we used an inverse-variance weighted fixed-effects meta-analysis. We used METAL, a software package designed to perform meta-analysis on GWA datasets17. We applied an a priori threshold of 5.0×10−8 for genome-wide significance18. When more than one genome-wide significant SNP clustered at a locus, we took the SNP with the smallest p-value as the lead SNP. To investigate the validity of our findings, we sought replication of the lead SNP in genome-wide significant (p<5×10−8) loci and sought additional evidence for suggestive loci (5×10−8<p<10−5) in our replication panel. We ran a fixed-effect meta-analysis to combine the results of the discovery and replication panels. The first GWA study on serum CRP, published by Ridker et al9, was based on part of the WGHS population. In order to confirm that our findings were not entirely influenced by these previously published results, we performed a meta-analysis excluding the WGHS population.
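As an illustration of inverse-variance weighted fixed-effects pooling, the following Python sketch combines per-study betas and standard errors. It is not METAL's implementation; the function name is hypothetical, and the two-sided p-value is taken from a normal approximation without any genomic-control step.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects pooling of study estimates.

    Returns the pooled beta, its standard error, and a two-sided p-value
    from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]      # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal tail
    return beta, se, p


if __name__ == "__main__":
    # Rounded discovery and replication estimates for rs4705952 (Table 3).
    beta, se, p = fixed_effects_meta([0.038, 0.065], [0.008, 0.018])
    print(round(beta, 3), round(se, 3), p)
```

Applied to the rounded discovery and replication estimates for rs4705952 in Table 3, this pooling gives a beta of about 0.042 with a standard error of about 0.007, in line with the combined column; the p-value will differ somewhat from the published one because the inputs are rounded and no inflation adjustment is applied.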
Examination of heterogeneity
We examined between-study heterogeneity using Cochran’s Q test. Based on Bonferroni adjustment for 18 tests, heterogeneity was considered significant at a p-value less than 2.8×10−3. We explored the source of heterogeneity for significant SNPs by fitting a covariate (age, gender, BMI, or smoking) in a meta-regression model.
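Cochran’s Q and the Bonferroni threshold above can be sketched in Python as follows. The function names are hypothetical, and the chi-square tail is computed with the standard closed-form series for integer degrees of freedom so that the example needs only the standard library; this is an illustrative sketch, not the software used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + total


def cochran_q(betas, ses):
    """Cochran's Q, its degrees of freedom, and the heterogeneity p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    return q, df, chi2_sf(q, df)


# Bonferroni-adjusted threshold for 18 lead SNPs, as in the text.
HETEROGENEITY_ALPHA = 0.05 / 18  # about 2.8e-3

if __name__ == "__main__":
    q, df, p = cochran_q([0.1, 0.3], [0.1, 0.1])  # toy two-study example
    print(q, df, p, p < HETEROGENEITY_ALPHA)
```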
Gene-environment interaction
For all genome-wide significant SNPs, we examined gene-by-age, gene-by-sex, gene-by-BMI and gene-by-smoking interactions in each study by introducing an interaction term into a linear model with age, sex, and the covariate of interest as the independent variables and natural log transformed CRP as the outcome. A meta-analysis was performed to combine the reported interaction beta and p-values across studies for each of the top SNPs. Based on Bonferroni adjustment for 72 tests (18 SNPs for four environmental factors), we used a significance threshold at 6.9 × 10−4.
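Written out, the per-study interaction model might take the following form. This is a sketch under assumptions: the symbols are introduced here for illustration, $G_i$ denotes the number of coded alleles, $E_i$ is the environmental factor of interest, and the inclusion of a SNP main effect is assumed rather than stated in the text. When $E$ is age or sex, the separate $E_i$ term coincides with the corresponding covariate.

```latex
\ln(\mathrm{CRP}_i) = \beta_0 + \beta_G G_i
  + \beta_{\mathrm{age}}\,\mathrm{age}_i
  + \beta_{\mathrm{sex}}\,\mathrm{sex}_i
  + \beta_E E_i
  + \beta_{G \times E}\, G_i E_i + \varepsilon_i ,
\qquad
\alpha_{\mathrm{Bonferroni}} = \frac{0.05}{18 \times 4} = \frac{0.05}{72} \approx 6.9 \times 10^{-4}
```

The quantity meta-analyzed across studies is the interaction coefficient $\beta_{G \times E}$.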
Genetic Risk Score
To model the cumulative effect of the identified loci, we created a genetic risk score comprising information from the genome-wide significant SNPs. The risk score was computed for each subject by multiplying the number of alleles associated with higher CRP by the beta coefficient from the combined meta-analysis, and taking the sum over the SNPs. To make the genetic risk score easier to interpret, we rescaled to range from zero (low CRP level) to 100 (high CRP level).
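The weighted score can be sketched in Python as below. The function names are hypothetical, and the min-max rescaling over the observed scores is an assumption: the text states only that the score was rescaled to range from zero to 100, not how.

```python
def weighted_risk_score(allele_counts, betas):
    """Sum over SNPs of (number of CRP-raising alleles) x (meta-analysis beta).

    allele_counts may hold 0, 1, 2 or fractional imputed dosages.
    """
    return sum(count * beta for count, beta in zip(allele_counts, betas))


def rescale_0_100(scores):
    """Min-max rescaling of raw scores to 0-100 (assumed rescaling method)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [100.0 * (s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    # Combined betas for the CRP, APOC1 and HNF1A lead SNPs (Table 2).
    betas = [0.160, 0.236, 0.149]
    # Toy genotypes for three hypothetical subjects.
    subjects = [[0, 0, 0], [2, 1, 0], [2, 2, 2]]
    raw = [weighted_risk_score(g, betas) for g in subjects]
    print(raw, rescale_0_100(raw))
```

Only three of the 18 SNPs are used here to keep the example short; the full score would sum over all genome-wide significant SNPs.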
Association with MI and CHD
The association of the genome-wide significant SNPs and the genetic risk score with clinical events was tested in ARIC, AGES, CHS, FHS, RS, and WGHS using incident cases of MI and CHD (i.e. occurring after CRP concentrations were measured). Incident MI included fatal and non-fatal MI. Incident CHD included incident fatal and non-fatal MI, fatal CHD and sudden death. Each study examined the associations using a Cox proportional hazards model adjusted for age and sex. We subsequently combined these results by performing a meta-analysis.
Results
The basic characteristics of the participating studies are shown in Supplementary Table 1. Supplementary Figure 1 shows the QQ-plot (λ = 1.09) and Supplementary Figure 2 presents the p-values for > 2.5 million SNPs across 22 autosomal chromosomes. A total of 953 SNPs in 17 loci exceeded the genome-wide significance threshold (p<5×10−8) (Table 1). Moreover, we found suggestive signals (p<10−5) in 47 loci. Sixty-four lead SNPs, including 17 SNPs from the genome-wide significant loci and 47 SNPs from the suggestive loci, were chosen for the replication stage (Supplementary Table 2). Six SNPs close to CRP, APOC1, HNF1A, LEPR, IL6R, and IL1F10 exceeded the Bonferroni significance level (0.05/64 = 7.8×10−4) in the replication stage. In a fixed-effects meta-analysis of the discovery and replication panel, 18 loci showed a genome-wide significant association; 15 loci out of the 17 genome-wide significant loci (Table 2) and three loci out of the 47 suggestive loci (Table 3). In addition to confirming seven previously reported associations, the genome-wide significant signals marked 11 novel associations within or close to the NLR family, pyrin domain containing 3 (NLRP3), interleukin 1 family, member 10 (IL1F10), protein phosphatase 1, regulatory (inhibitor) subunit 3B (PPP1R3B), hepatocyte nuclear factor 4, alpha (HNF4A), RAR-related orphan receptor A (RORA), Sal-like 1 (SALL1), poly(A) binding protein, cytoplasmic 4 (inducible form) (PABPC4), B-cell CLL/lymphoma 7B (BCL7B), proteasome assembly chaperone 1 (PSMG1), protein tyrosine phosphatase, non-receptor type 2 (PTPN2), G protein-coupled receptor, family C, group 6, member A (GPRC6A), and interferon regulatory factor 1 (IRF1). Furthermore, our meta-analysis excluding the WGHS population (Supplementary Table 3) confirmed the association of seven previously known genes9, CRP, APOE (APOC1), HNF1A, LEPR, IL6R, GCKR, and ASCL1, with CRP levels (Bonferroni significance level: 0.05/7 = 7.1×10−3).
Table 1
Association of 17 genome-wide significant loci with CRP levels in the discovery panel
SNP | Band | Significant SNPs | Coded allele | Allele frequency | Beta*(SE) | P-value | Gene |
---|---|---|---|---|---|---|---|
rs2794520 | 1q23.2 | 121 | C | 0.66 | 0.193 (0.007) | 9.5×10−189 | CRP |
rs4420638 | 19q13.32 | 16 | A | 0.80 | 0.240 (0.010) | 2.1×10−129 | APOC1 |
rs1183910 | 12q24.31 | 186 | G | 0.67 | 0.152 (0.007) | 3.3×10−113 | HNF1A |
rs4420065 | 1p31.3 | 291 | C | 0.61 | 0.111 (0.007) | 3.2×10−64 | LEPR |
rs4129267 | 1q21.3 | 90 | C | 0.60 | 0.094 (0.007) | 1.1×10−47 | IL6R |
rs1260326 | 2p23.3 | 54 | T | 0.41 | 0.089 (0.007) | 5.4×10−43 | GCKR |
rs12239046 | 1q44 | 13 | C | 0.61 | 0.048 (0.007) | 1.6×10−13 | NLRP3 |
rs6734238 | 2q13 | 92 | G | 0.42 | 0.047 (0.007) | 3.4×10−13 | IL1F10 |
rs9987289 | 8p23.1 | 15 | G | 0.90 | 0.079 (0.011) | 2.3×10−12 | PPP1R3B |
rs10745954 | 12q23.2 | 22 | A | 0.50 | 0.043 (0.006) | 1.6×10−11 | ASCL1 |
rs1800961 | 20q13.12 | 1 | C | 0.95 | 0.120 (0.018) | 2.3×10−11 | HNF4A |
rs340029 | 15q22.2 | 25 | T | 0.62 | 0.044 (0.007) | 2.6×10−11 | RORA |
rs10521222 | 16q12.1 | 6 | C | 0.94 | 0.110 (0.017) | 1.3×10−10 | SALL1 |
rs12037222 | 1p32.4 | 11 | A | 0.24 | 0.047 (0.008) | 4.5×10−10 | PABPC4 |
rs13233571 | 7q11.23 | 7 | C | 0.86 | 0.054 (0.010) | 2.8×10−8 | BCL7B |
rs2836878 | 21q22.2 | 2 | G | 0.72 | 0.040 (0.007) | 4.0×10−8 | PSMG1 |
rs4903031 | 14q24.2 | 1 | G | 0.21 | 0.046 (0.008) | 4.6×10−8 | RGS6 |
Table 2
Association of 17 genome-wide significant loci with CRP levels in the replication panel and combined with the discovery results
SNP | Coded allele | Replication Beta*(SE) | Replication P-value | Discovery + replication Beta*(SE) | Discovery + replication P-value | R-square** | P-value for heterogeneity | Closest Gene |
---|---|---|---|---|---|---|---|---|
rs2794520 | C | 0.086 (0.010) | 9.9×10−19 | 0.160 (0.006) | 2.0×10−186 | 1.38 | 7.4×10−26 | CRP |
rs4420638 | A | 0.200 (0.032) | 3.0×10−10 | 0.236 (0.009) | 8.8×10−139 | 0.93 | 0.03 | APOC1 |
rs1183910 | G | 0.122 (0.021) | 8.3×10−14 | 0.149 (0.006) | 2.1×10−124 | 0.76 | 0.08 | HNF1A |
rs4420065 | C | 0.045 (0.009) | 1.5×10−6 | 0.090 (0.005) | 3.5×10−62 | 0.39 | 1.1×10−9 | LEPR |
rs4129267 | C | 0.045 (0.010) | 7.3×10−6 | 0.079 (0.005) | 2.1×10−48 | 0.31 | 2.4×10−4 | IL6R |
rs1260326 | T | 0.031 (0.010) | 1.9×10−3 | 0.072 (0.005) | 4.6×10−40 | 0.24 | 2.6×10−6 | GCKR |
rs12239046 | C | 0.042 (0.018) | 1.8×10−3 | 0.047 (0.006) | 1.2×10−15 | 0.09 | 0.77 | NLRP3 |
rs6734238 | G | 0.072 (0.017) | 4.9×10−6 | 0.050 (0.006) | 1.8×10−17 | 0.14 | 0.95 | IL1F10 |
rs9987289 | G | 0.003 (0.031) | 3.5×10−2 | 0.069 (0.011) | 3.4×10−13 | 0.08 | 0.04 | PPP1R3B |
rs10745954 | A | 0.018 (0.015) | 1.3×10−1 | 0.039 (0.006) | 1.6×10−11 | 0.06 | 1.1×10−3 | ASCL1 |
rs1800961 | C | 0.023 (0.026) | 3.7×10−1 | 0.088 (0.015) | 2.2×10−9 | 0.06 | 0.07 | HNF4A |
rs340029 | T | 0.004 (0.010) | 5.2×10−1 | 0.032 (0.006) | 4.1×10−9 | 0.08 | 0.05 | RORA |
rs10521222 | C | 0.089 (0.028) | 1.4×10−3 | 0.104 (0.015) | 8.5×10−13 | 0.09 | 0.34 | SALL1 |
rs12037222 | A | 0.035 (0.017) | 3.9×10−2 | 0.045 (0.007) | 6.4×10−11 | 0.06 | 0.40 | PABPC4 |
rs13233571 | C | 0.049 (0.025) | 4.5×10−2 | 0.054 (0.009) | 3.6×10−9 | 0.08 | 0.13 | BCL7B |
rs2836878 | G | 0.013 (0.011) | 2.3×10−1 | 0.032 (0.006) | 1.7×10−7 | 0.05 | 0.18 | PSMG1 |
rs4903031 | G | 0.001 (0.012) | 9.1×10−1 | 0.032 (0.007) | 5.1×10−6 | 0.04 | 0.21 | RGS6 |
Table 3
Association of three suggestive loci with CRP levels that reached genome-wide significance after combining discovery and replication panel
SNP | Coded allele | Discovery Beta*(SE) | Discovery P-value | Replication Beta*(SE) | Replication P-value | Discovery + replication Beta*(SE) | Discovery + replication P-value | R-square** | P-value for heterogeneity | Closest Gene |
---|---|---|---|---|---|---|---|---|---|---|
rs2847281 | A | 0.034 (0.007) | 1.7×10−7 | 0.018 (0.016) | 4.2×10−2 | 0.031 (0.006) | 2.2×10−8 | 0.04 | 0.97 | PTPN2 |
rs6901250 | A | 0.034 (0.007) | 1.2×10−6 | 0.038 (0.015) | 1.2×10−2 | 0.035 (0.006) | 4.8×10−8 | 0.02 | 0.89 | GPRC6A |
rs4705952 | G | 0.038 (0.008) | 4.1×10−6 | 0.065 (0.018) | 3.0×10−4 | 0.042 (0.007) | 1.3×10−8 | 0.05 | 0.47 | IRF1 |
Figure 1 presents the average CRP levels across the genetic risk score in the whole population. Individuals in the highest gene score group had a mean CRP level (4.12 mg/L; 95%CI: 4.96–5.25) that was more than double the level observed for individuals in the lowest gene score group (1.40 mg/L; 95%CI: 1.31–1.49). The percentage of overall variance in CRP which was explained by the genetic risk score ranged from 1.2% to 10.3% across studies in the discovery and replication panel and was more than 5% in half of the studies.
Figure 1. Mean CRP level (right vertical axis) is shown as solid black dots connected by solid lines for categories of the genetic risk score. The shaded bars show the distribution of the genetic risk score in the whole population (left vertical axis). The CARLA Study was not included due to missing values for some of the selected SNPs.
After adjustment for number of tests, significant heterogeneity was found for rs2794520, rs4420065, rs4129267, rs1260326, and rs10745954 (Tables 2 & 3). Meta-regression was used to explore the source of heterogeneity. Sex was associated with heterogeneity for rs10745954 (p < 2.8×10−5) (Supplementary Table 6).
All 18 SNPs that showed genome-wide significant results in the combined meta-analyses were studied for interactions with age, sex, BMI and smoking (Supplementary Table 4). After adjustment for the number of tests we found a significant interaction between BMI and the LEPR SNP, rs4420065 (p<2.9×10−6).
We examined the association of the SNPs related to CRP with risk of MI and CHD. These studies comprised 1845 cases of MI and 2947 cases of CHD. Neither the individual SNPs nor the combined genetic risk score showed consistent or genome-wide significant associations with risk of clinical events (Figure 2).
Discussion
Through a meta-analysis of GWA scans from 15 cohort studies comprising 66,185 subjects and a replication sample of 16,540 subjects, we identified 18 loci associated with circulating CRP levels and provided evidence of replication for eight of them. Our results confirm seven gene-annotated loci reported by Ridker et al9. Furthermore, we introduce 11 novel loci associated with CRP levels, annotating NLRP3, IL1F10, PPP1R3B, HNF4A, RORA, SALL1, PABPC4, BCL7B, PTPN2, GPRC6A, and IRF1.
A number of these genes, including APOC1, HNF1A, LEPR, GCKR, HNF4A, and PTPN2, are directly or indirectly related to metabolic regulatory pathways involved in diabetes. Mutations in HNF1A are associated with impaired insulin secretion and maturity onset diabetes of the young (MODY) type 319. HNF4A is part of a complex regulatory network in the liver and pancreas for glucose homeostasis20. Mutations in the HNF4A gene cause MODY type 121. HNF4A is a transcription factor involved in the expression of several liver-specific genes including HNF1A21. Defects in the expression of GCKR result in deficient insulin secretion22. PTPN2, which modulates interferon gamma signal transduction at the beta cell level23, was recently identified as a novel susceptibility gene for type 1 diabetes24. PTPN2 is also linked to the inflammatory pathway. The nuclear isoform of PTPN2 is a regulator of transcription factor STAT3 downstream of IL-6 signaling and may affect CRP expression in Hep3B cells25.
CRP, IL6R, NLRP3, IL1F10, and IRF1 are associated with CRP levels at least partly through pathways related to innate and adaptive immune response. NLRP3 encodes a member of the NALP3 inflammasome complex26. The NALP3 inflammasome triggers an innate immune response and can be activated by endogenous ‘danger signals’, as well as compounds associated with pathogens27,28. Activated NALP3 inflammasome functions as an activator of NF-kappaB signaling. NF-kappaB is a transcription factor which affects CRP expression in Hep3B cells29.
Our genetic risk score explained approximately 5% of the variation in CRP levels, showing that genetic factors are of importance in determining CRP levels. In comparison, BMI as the main non-genetic determinant of CRP was reported to explain 5–7% of the variation in CRP levels in AGES30 and up to 15% in FHS31. Ridker et al reported that seven SNPs discovered in their study explained 10.1% of the variation in CRP levels after adjustment for age, smoking, BMI, hormone therapy, and menopausal status. However, without adjustment for these covariates, less than 5% of the variation in CRP levels was explained (D. Chasman, personal communication).
Adipose tissue can induce chronic low-grade inflammation by producing proinflammatory cytokines such as interleukin-632. Therefore, we examined whether adiposity modifies the effect of any of the 18 genes on CRP. We found that BMI modifies the strength of the association between LEPR and CRP. This interaction was initially found in WGHS33.
There is ample evidence that chronic inflammation is involved in atherosclerosis and cardiovascular disease. In this study, we found no association between genetically elevated CRP and risk of CHD. In agreement with our results, Elliott et al reported in a recent study that variations in the CRP gene are not associated with risk of MI and CHD, but they found associations of LEPR, IL6R, and APOE-CI-CII with CHD10. However, the lack of association with clinical events in our study could also be due to lack of power.
Our study has the benefit of a large and homogeneous sample of 82,725 subjects of European ancestry. This enabled us to find novel genes with small effects on CRP level. Furthermore, this large sample size enabled us to study gene-environment interaction, which hitherto has been less feasible. In contrast to most other studies, we used only incident cases of cardiovascular events from well-defined population-based studies to examine the relation between the identified SNPs and clinical disease. The study has several limitations. Although we identified 18 loci associated with CRP levels, other genetic loci associated with CRP concentrations may still have been missed by our study. Six of the genome-wide significant loci from the discovery panel were significant after Bonferroni correction in the replication panel. The other identified loci need replication for confirmation in larger samples. We acknowledge that our genetic risk score is based on our own findings and may perform less well when used in another population. Finally, we did not fine-map the identified loci, so we acknowledge that the identified SNPs may be in linkage disequilibrium with non-HapMap variants causally related to CRP levels.
In conclusion, we identified 11 novel loci and confirmed seven known loci to affect CRP levels. The results highlight immune response and metabolic regulatory pathways involved in the regulation of chronic inflammation, as well as several loci previously unknown to be related to inflammation. Furthermore, LEPR was found to affect CRP differently in the presence of low or high BMI, which may lead to new insights in the mechanisms underlying inflammation.
Footnotes
†Prof Peltonen passed away in March, 2010.
Disclosures
Dr Ridker has received research grant support from Roche, AstraZeneca, and Amgen, and is listed as a co-inventor on patents held by the Brigham and Women's Hospital that relate to the use of inflammatory biomarkers in cardiovascular disease and diabetes.