Stat Methods Med Res. 2020 Apr;29(4):1081-1111.
doi: 10.1177/0962280219851817. Epub 2019 May 30.

Inferring the direction of a causal link and estimating its effect via a Bayesian Mendelian randomization approach

Ioan Gabriel Bucur et al. Stat Methods Med Res. 2020 Apr.

Abstract

The use of genetic variants as instrumental variables, an approach known as Mendelian randomization, is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must satisfy strong, often untestable assumptions, which means that finding good genetic instruments among a large list of potential candidates is challenging. This difficulty is compounded by the fact that many genetic variants influence more than one phenotype through different causal pathways, a phenomenon called horizontal pleiotropy. Horizontal pleiotropy leads to errors not only in estimating the magnitude of the causal effect but also in inferring the direction of the putative causal link. In this paper, we propose a Bayesian approach called BayesMR that generalizes the Mendelian randomization technique by allowing for pleiotropic effects and, crucially, for the possibility of reverse causation. The output of the method is a posterior distribution over the target causal effect, which provides an immediate and easily interpretable measure of the uncertainty in the estimation. More importantly, we use Bayesian model averaging to determine how much more likely the inferred direction is relative to the reverse direction.
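The direction comparison rests on standard Bayesian model averaging over two competing models, $M_{X \to Y}$ and $M_{Y \to X}$. Below is a minimal sketch, assuming the log marginal likelihood of each model has already been computed; how BayesMR obtains these quantities is not shown here, and the helper names are hypothetical. Posterior model probabilities follow from Bayes' rule, and a model-averaged posterior for the X-to-Y effect mixes draws obtained under $M_{X \to Y}$ with a point mass at zero for $M_{Y \to X}$.

```python
import math
import random


def direction_posterior(log_ml_xy, log_ml_yx, prior_xy=0.5):
    """Posterior probabilities of M_{X->Y} and M_{Y->X} (hypothetical helper).

    log_ml_xy, log_ml_yx: log marginal likelihoods of the two models (assumed given).
    prior_xy: prior probability of M_{X->Y}, strictly between 0 and 1;
              the remainder goes to M_{Y->X}.
    """
    a = log_ml_xy + math.log(prior_xy)
    b = log_ml_yx + math.log(1.0 - prior_xy)
    m = max(a, b)
    # log-sum-exp keeps the normalization stable for large |log marginal likelihoods|
    log_norm = m + math.log(math.exp(a - m) + math.exp(b - m))
    return math.exp(a - log_norm), math.exp(b - log_norm)


def model_averaged_draws(draws_xy, p_xy, n, seed=0):
    """Draws of beta_{X->Y} averaged over both models (hypothetical helper).

    With probability p_xy a draw is taken from the posterior draws under M_{X->Y};
    otherwise it is 0, because M_{Y->X} has no causal link from X to Y.
    """
    rng = random.Random(seed)
    return [rng.choice(draws_xy) if rng.random() < p_xy else 0.0 for _ in range(n)]


p_xy, p_yx = direction_posterior(-1000.0, -1001.0)
odds = p_xy / p_yx  # how much more likely X->Y is than Y->X
```

With equal prior probabilities, the posterior odds equal the Bayes factor, so a difference of one unit between the two log marginal likelihoods gives odds of about 2.7.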

Keywords: Bayesian model averaging; Causal inference; Mendelian randomization; genetic epidemiology; instrumental variables; robust estimation; sparsity prior.

Figures

Figure 1.
Chen et al. used the ALDH2 genetic variant as a proxy for alcohol intake to determine if there is a causal link between the latter and blood pressure. They found that alcohol intake has a marked positive causal effect on blood pressure.
Figure 2.
Ference et al. used Mendelian randomization for estimating the (positive) causal effect of low-density lipoprotein cholesterol (LDL-C) on the risk of coronary heart disease (CHD). They estimated that for each mmol/L lower LDL-C, the risk of CHD is reduced by 54.5% (95% CI (48.8%, 59.5%)).
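Per-unit effects like this are often rescaled to other exposure differences. Below is a minimal sketch, assuming the 54.5% reduction acts multiplicatively (log-linearly) per mmol/L, so that a difference of $k$ mmol/L multiplies risk by $(1 - 0.545)^k$. This scaling assumption and the helper name are ours, not taken from Ference et al.

```python
def scaled_risk_reduction(reduction_per_unit, units):
    """Relative risk reduction for `units` mmol/L lower LDL-C (hypothetical helper).

    Assumes a constant multiplicative effect per unit, i.e. a log-linear dose-response.
    """
    ratio_per_unit = 1.0 - reduction_per_unit  # 1 - 0.545 = 0.455 per mmol/L
    return 1.0 - ratio_per_unit ** units


half = scaled_risk_reduction(0.545, 0.5)    # 0.5 mmol/L lower LDL-C
double = scaled_risk_reduction(0.545, 2.0)  # 2 mmol/L lower LDL-C
```

Under this assumption, a 2 mmol/L reduction corresponds to roughly a 79% lower risk (1 - 0.455^2 ≈ 0.793), not 109%, which is why per-unit percentages should not simply be added.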
Figure 3.
Vimaleswaran et al. explored the causal direction of the relationship between body mass index (BMI) and 25-hydroxyvitamin D [25(OH)D] via bidirectional Mendelian randomization. They concluded that higher BMI leads to lower vitamin D levels and not the other way around.
Figure 4.
Directed acyclic graph showing the instrumental variable assumptions. Causal effects are unidirectional and denoted by arrows. The association entailed by IV1 is highlighted in red, while the crossed-out dashed lines signify the absence of an association (causal or non-causal). The dotted border of U means that the variable is unobserved.
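When all three IV assumptions hold, the ratio of the gene-outcome covariance to the gene-exposure covariance recovers the causal effect even though U confounds X and Y, whereas an ordinary regression of Y on X does not. Below is a minimal simulation sketch, assuming a linear-Gaussian model with a single biallelic variant; the allele frequency, effect sizes, and sample size are hypothetical choices.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def simulate_valid_iv(n=50_000, beta=1.0, gamma=0.5, kappa=1.0, maf=0.3, seed=1):
    """Simulate G -> X -> Y with an unobserved confounder U of X and Y (no G -> Y edge)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < maf for _ in range(2))  # genotype coded 0/1/2
        u = rng.gauss(0.0, 1.0)                         # unobserved confounder
        xi = gamma * gi + kappa * u + rng.gauss(0.0, 1.0)
        yi = beta * xi + kappa * u + rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate_valid_iv()
ols_slope = cov(x, y) / cov(x, x)   # biased upward by the confounder U
wald_ratio = cov(g, y) / cov(g, x)  # IV (Wald) ratio estimate of beta
```

With these settings the regression slope converges to $1 + \kappa^2/\operatorname{Var}(X) \approx 1.48$, while the ratio converges to $\beta = 1$.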
Figure 5.
The local causal discovery (LCD) pattern can be detected from observational data by testing whether the genetic variant G is independent of the outcome Y when conditioning on the exposure X.
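In a linear-Gaussian setting, the conditional independence test behind the LCD pattern can be carried out with a partial correlation and Fisher's z-transform. Below is a minimal sketch under that assumption, on hypothetical simulated data without confounding between X and Y (with a confounder U, conditioning on X would open a path from G through U to Y). With no direct G-to-Y effect the partial correlation is close to zero, while a direct effect makes it clearly nonzero.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr_test(g, x, y):
    """Partial correlation of G and Y given X, with a two-sided Fisher-z p-value."""
    r_gy, r_gx, r_xy = corr(g, y), corr(g, x), corr(x, y)
    r = (r_gy - r_gx * r_xy) / math.sqrt((1 - r_gx ** 2) * (1 - r_xy ** 2))
    z = math.atanh(r) * math.sqrt(len(g) - 1 - 3)  # n - |S| - 3 with |S| = 1
    p_value = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return r, p_value


def simulate(n, alpha, seed, gamma=0.5, beta=1.0, maf=0.3):
    """Linear model G -> X -> Y plus an optional direct effect alpha of G on Y."""
    rng = random.Random(seed)
    g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    x = [gamma * gi + rng.gauss(0.0, 1.0) for gi in g]
    y = [beta * xi + alpha * gi + rng.gauss(0.0, 1.0) for gi, xi in zip(g, x)]
    return g, x, y


r_lcd, p_lcd = partial_corr_test(*simulate(20_000, alpha=0.0, seed=2))
r_pleio, p_pleio = partial_corr_test(*simulate(20_000, alpha=0.3, seed=3))
```

A large p-value is consistent with the LCD pattern but does not prove it: violations that are weak relative to the sample size can go undetected.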
Figure 6.
The genetic variant G is pleiotropic since it affects both phenotypic measures (X and Y). In this illustration, G exhibits both vertical pleiotropy (X and Y are influenced by G via the same causal pathway $G \to X \to Y$) and horizontal pleiotropy (X and Y are influenced by G via different causal pathways). The former is crucial to the application of MR, while the latter here results in a violation of the IV3 assumption.
Figure 7.
Reverse causation refers to the situation when the outcome Y precedes and causes the exposure X instead of the other way around. The term “reverse” refers to the fact that the direction of the causal effect is opposite to what was expected based on the study design. We emphasize that although we denote the exposure by X and the outcome by Y, these terms do not have any causal meaning, i.e. they do not imply a particular causal ordering.
Figure 8.
Graphical description of our assumed generative model. We denote the exposure variable by X and the outcome variable by Y. We are interested in the causal effect from X to Y, which is denoted by $\beta$. The association between X and Y is obfuscated by the unobserved variable U, which we use to model unmeasured confounding explicitly. The shaded plate indicates replication across the J independent genetic variants $G_j$, $j \in \{1, 2, \ldots, J\}$. Note that the replication also applies to the parameters $\gamma_j$ and $\alpha_j$ and their corresponding edges.
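For concreteness, one linear structural-equation reading of this graph is sketched below. The linear form, the independent noise terms $\varepsilon_X$ and $\varepsilon_Y$, and the confounding coefficients $\kappa_X$ and $\kappa_Y$ (which appear in the simulation settings of Figure 13) are assumptions of this sketch rather than a restatement of the paper's model.

```latex
\begin{aligned}
X &= \sum_{j=1}^{J} \gamma_j G_j + \kappa_X U + \varepsilon_X, \\
Y &= \beta X + \sum_{j=1}^{J} \alpha_j G_j + \kappa_Y U + \varepsilon_Y, \\
\frac{\operatorname{Cov}(G_j, Y)}{\operatorname{Cov}(G_j, X)} &= \beta + \frac{\alpha_j}{\gamma_j}
  \quad \text{if the } G_j \text{ are mutually independent, independent of } U, \text{ and } \gamma_j \neq 0.
\end{aligned}
```

The last line shows why horizontal pleiotropy ($\alpha_j \neq 0$) biases single-variant ratio estimates, while the confounder $U$ does not.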
Figure 9.
Competing models encompassing both possible directions for the causal link between exposure and outcome. (a) ($M_{X \to Y}$) Model where the causal relationship is in the expected direction. $\beta_{X \to Y}$ is the parameter measuring the (linear) causal effect of the exposure X on the outcome Y. The shaded plate indicates replication across the variants $G_j$. (b) ($M_{Y \to X}$) Alternative model, where the causal relationship is in the reverse direction, from outcome to exposure. Note that $\beta_{Y \to X}$ is a different parameter than $\beta_{X \to Y}$. The latter is equal to zero for this model, as there is no causal link from X to Y.
Figure 10.
Compact description of the Bayesian inference process using plate notation. The small rectangles indicate our model parameters, while the circles denote random variables (observed or unobserved). The gray superposed areas signify replication across the J genetic variants and the N data points. This is also suggested by the subscripts used for parameters and variables.
Figure 11.
Generating model where the causal relationship is from X to Y (the expected direction). This model satisfies all three instrumental variable assumptions.
Figure 12.
Alternative generating model, statistically equivalent to the one in Figure 11, where the causal relationship is from Y to X (the reverse direction).
Figure 13.
Generating model where the causal relationship is from X to Y ($\beta_{X \to Y} = 1$) and in which there is weak confounding ($\kappa_X = \kappa_Y = 0.1$) and pleiotropy ($\alpha = 0.1$). There is a weak dependence $G \not\perp Y \mid X$ that results in a mild violation of the IV3 assumption.
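Under a linear reading of the model, the settings above imply that a single-variant ratio estimate is pulled away from $\beta_{X \to Y} = 1$ by $\alpha/\gamma$, while the weak confounding does not bias it. Below is a minimal simulation sketch, assuming one biallelic variant with a hypothetical instrument strength $\gamma = 0.5$ and allele frequency 0.3 (neither value is given in this caption).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def simulate_near_lcd(n=50_000, beta=1.0, kappa_x=0.1, kappa_y=0.1,
                      alpha=0.1, gamma=0.5, maf=0.3, seed=7):
    """Simulate the near-LCD setting: weak confounding plus a weak direct G -> Y effect."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < maf for _ in range(2))  # genotype coded 0/1/2
        u = rng.gauss(0.0, 1.0)                         # unobserved confounder
        xi = gamma * gi + kappa_x * u + rng.gauss(0.0, 1.0)
        yi = beta * xi + alpha * gi + kappa_y * u + rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate_near_lcd()
wald_ratio = cov(g, y) / cov(g, x)
expected = 1.0 + 0.1 / 0.5  # beta + alpha / gamma under the linear model
```

Even this mild IV3 violation shifts the single-variant estimate by about 20% of the true effect, since the pleiotropic effect is one fifth of the assumed instrument strength.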
Figure 14.
Estimated posterior density of the causal effect $\beta_{X \to Y}$ for the data generated from the model in Figure 13. The dashed vertical line at zero indicates the estimated probability of the reverse direction (Figure 9(b)), in which case $\beta_{X \to Y} = 0$ since there is no causal link from X to Y.
Figure 15.
Introducing pleiotropic effects moves the posterior distribution away from the true value $\beta = 1$. The shaded area in each posterior distribution corresponds to the 50% posterior uncertainty (credible) interval, with the posterior median in the center depicted by a vertical line. In the worst-case scenario, where no genetic variant is a valid instrument, a second mode of the distribution appears close to zero. This mode corresponds to the model explanation of the data in which there is a “weak” causal effect from exposure to outcome. We notice, however, that the posterior distribution changes gradually as the proportion of valid instruments decreases, thereby showcasing the robustness of BayesMR to the presence of pleiotropy. When only 40% of the genetic variants were valid instruments, the posterior distribution remained robustly centered around $\beta = 1$. Even when none of the genetic variants satisfied the IV assumptions, a substantial proportion of the probability mass could be found around the true value.
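The posterior median and a 50% credible interval can be read off posterior draws as the 50th and the 25th/75th percentiles, assuming a central (equal-tailed) interval; the same interval is the interquartile range shown in Figures 17, 18, 19, and 21. Below is a minimal sketch using Python's standard library, with a hypothetical helper for the share of posterior mass near a reference value such as the true $\beta = 1$.

```python
import statistics


def summarize_posterior(draws):
    """Posterior median and central 50% credible interval (interquartile range) from draws."""
    q1, median, q3 = statistics.quantiles(draws, n=4, method="inclusive")
    return median, (q1, q3)


def mass_within(draws, center, half_width):
    """Fraction of posterior draws within +/- half_width of center (hypothetical helper)."""
    return sum(abs(d - center) <= half_width for d in draws) / len(draws)
```

In practice `draws` would be the MCMC samples of $\beta$, e.g. `median, (lo, hi) = summarize_posterior(draws)`.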
Figure 16.
Posterior distribution of β for different hyperparameter settings (only λ is varied) in the near-LCD scenario (Figure 13). The true value for β is depicted by a dashed vertical line.
Figure 17.
Estimated posterior distribution for the causal effect of birth weight on adult fasting glucose levels. The light shaded area in the posterior represents the interquartile range, while the dark shaded line indicates the median. For the Gaussian mixture prior in equation (7), we have taken $\tau^2 = 1$ and $\lambda = 10^{-4}$. The IVW estimate reported in Del Greco et al. and its confidence bounds are shown for comparison.
Figure 18.
Estimated posterior distribution for the causal effect of birth weight on adult fasting glucose after adapting the prior knowledge to fit the classic instrumental variable setting. The light shaded area in the posterior represents the interquartile range, while the dark shaded line indicates the median. For the Gaussian mixture prior in equation (7), we have taken $\tau^2 = 1$ and $\lambda = 10^{-4}$. The IVW estimate reported in Del Greco et al. and its confidence bounds are shown for comparison.
Figure 19.
Estimated causal effect of BMI on the risk of PD, expressed as the difference in log-odds of PD per 5 kg/m² increase in BMI. The light shaded area in the posterior represents the interquartile range, while the dark shaded line indicates the median. The IVW estimate $\beta_{\text{IVW}}$ derived from equation (3) is shown for comparison, along with its 95% confidence bounds.
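Equation (3) is not reproduced on this page. As a point of reference, below is a minimal sketch of the standard first-order (fixed-effect) IVW estimator from per-variant summary statistics, assuming equation (3) takes this usual form: each variant's ratio $\hat\beta_{Yj}/\hat\beta_{Xj}$ is weighted by $\hat\beta_{Xj}^2/\sigma_{Yj}^2$, giving $\hat\beta_{\text{IVW}} = \sum_j \hat\beta_{Xj}\hat\beta_{Yj}\sigma_{Yj}^{-2} / \sum_j \hat\beta_{Xj}^2\sigma_{Yj}^{-2}$. The input values in the usage line are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y, z=1.96):
    """First-order fixed-effect IVW estimate with a normal-approximation confidence interval.

    beta_x: per-variant associations with the exposure
    beta_y: per-variant associations with the outcome (e.g. log-odds of disease)
    se_y:   standard errors of beta_y
    """
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]  # per-variant Wald ratios
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    return estimate, se, (estimate - z * se, estimate + z * se)


est, se, (lo, hi) = ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.1])
```

If the exposure associations are expressed per 1 kg/m², multiplying the estimate and both bounds by 5 expresses the effect per 5 kg/m² increase in BMI.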
Figure 20.
Scatter plot of the genetic associations with BMI (horizontal axis) and PD risk (vertical axis) for 77 genetic variants. The two outliers (triangles) show a relatively strong association with the outcome given their association with the exposure. The regression line including the outliers is dashed, while the regression line obtained without the outliers is continuous.
Figure 21.
Estimated pleiotropic effects for the two genetic variants suspected of being pleiotropic outliers (rs17001654 and rs13107325). The light shaded area in the posterior represents the interquartile range, while the dark shaded line indicates the median. Most of the posterior mass is distributed away from zero, thereby supporting the suspicion that these two variants exhibit horizontal pleiotropy.
Figure 22.
Comparison of the causal effect estimates between X (coffee consumption in cups per day) and Y (heaviness of smoking in cigarettes per day) for the two possible causal directions. The estimated evidence for the two models is $p(M \mid X \to Y) = 54.67\%$ and $p(M \mid Y \to X) = 45.33\%$, respectively. In the left figure, we see the estimate of $\beta_{X \to Y}$, which is the causal effect of coffee consumption on heaviness of smoking, under the assumption that the causal link $X \to Y$ exists. In the right figure, we see the estimate of $\beta_{Y \to X}$, which is the causal effect of heaviness of smoking on coffee consumption, under the assumption that the causal link $Y \to X$ exists. (a) Posterior distribution of the putative causal effect of coffee consumption on smoking. In the case of reverse causation (causal link from smoking to coffee consumption), this effect is zero, as indicated by the vertical dashed line. The estimate next to the line (54.67%) is the evidence for the reverse model. (b) Posterior distribution of the putative causal effect of smoking on coffee consumption. In the case of reverse causation (causal link from coffee consumption to smoking), this effect is zero, as indicated by the vertical dashed line. The estimate next to the line (45.33%) is the evidence for the reverse model.
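The two reported model probabilities can be turned into posterior odds directly. Below is a minimal sketch of the arithmetic; if both directions were given equal prior probability (an assumption not stated in this caption), the odds would also equal the Bayes factor.

```python
p_first, p_second = 0.5467, 0.4533  # the two model probabilities reported for Figure 22
odds = max(p_first, p_second) / min(p_first, p_second)  # better- vs worse-supported direction
```

The result, about 1.21, means the better-supported direction is only slightly more probable than its reverse, so these data do not settle the direction of the coffee-smoking link.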
