# 6 Causal Inference/6-5 Instrumental Variables/Instrumental Variables Solutions.Rmd

There is no empirical way to determine whether the "exclusion restriction" requirement holds.
The "first stage" requirement (that $Z$ must have a causal effect on $A$), however, can be empirically tested, and as the name implies, doing so is indeed the first stage in implementing an instrumental variable analysis.
To do so, we simply regress the exposure $A$ on the intended instrument $Z$ (and any measured confounders $W$ that we have determined appropriate to control for) using a simple linear regression:
$$A = \beta_0 + \beta_1Z + \beta_2W + \epsilon$$
If this regression yields a large regression coefficient on $Z$, $Z$ is considered a **strong** instrument and we may proceed. If the coefficient is small, however, $Z$ is considered a **weak** instrument and may be a poor choice of instrument.
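In practice, instrument strength is often judged by the first-stage F-statistic on the instrument, with F > 10 a common rule of thumb (this heuristic is not from the exercise itself). A minimal base-R sketch on simulated data, with all variable values illustrative:

```r
set.seed(1)

# simulate data where Z strongly drives A
n <- 1000
W <- rbinom(n, 1, 0.5)             # binary covariate (e.g., sex assigned at birth)
Z <- rnorm(n)                      # instrument
A <- 0.8 * Z + 0.5 * W + rnorm(n)  # exposure driven by Z and W

# first stage: regress A on Z and W
first_stage <- lm(A ~ Z + W)

# F-test for the instrument: compare the first stage against
# the same model with Z dropped
no_instrument <- lm(A ~ W)
f_stat <- anova(no_instrument, first_stage)$F[2]
f_stat  # values well above 10 suggest a strong instrument
```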
If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the treatment $\hat{A}$ that are a function of $Z$ and the covariance of $Z$ and $A$ ($Cov(Z,A)$) for the next stage.
**\textcolor{blue}{Question 6:}** Consider, what are some potential concerns with using a weak instrument?
**Solution:** There are many possible answers, but the primary concern is that $Z$ may not truly have a causal effect on $A$ (or at least, not a very strong one).
## Second Stage
Now that we have the predicted values of the treatment $\hat{A}$, we regress the outcome $Y$ on these values (and any covariates included in the first stage), like so:
$$Y = \beta_0 + \beta_1\hat{A} + \beta_2W + \epsilon$$
We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model. *Note that this will differ slightly if you control for any $W$.*
$$\hat{\beta}_1 = \frac{Cov(Z,Y)}{Cov(Z,A)}$$
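As a sanity check on this identity (a simulated sketch, not part of the exercise; all names and coefficients here are made up), the covariance ratio matches the coefficient from the manual two-stage regression when there are no covariates, and recovers the true effect even when plain OLS is confounded:

```r
set.seed(42)

# simulate a confounded exposure with a valid instrument
n <- 5000
U <- rnorm(n)                   # unmeasured confounder
Z <- rnorm(n)                   # instrument: affects A, not Y directly
A <- 0.7 * Z + U + rnorm(n)     # exposure
Y <- -2 * A + 3 * U + rnorm(n)  # true causal effect of A on Y is -2

# manual two-stage least squares (no covariates)
A_hat <- fitted(lm(A ~ Z))
beta_2sls <- unname(coef(lm(Y ~ A_hat))["A_hat"])

# covariance ratio
beta_ratio <- cov(Z, Y) / cov(Z, A)

# the two agree (up to floating point), while OLS of Y on A
# is biased away from -2 by the confounder U
c(two_stage = beta_2sls, ratio = beta_ratio,
  ols = unname(coef(lm(Y ~ A))["A"]))
```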
```{r}
head(df)
summary(df)
```
**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress whether the individual took AspiTyleCedrin ($A$) on proximity to a pharmacy that sells AspiTyleCedrin ($Z$) and sex assigned at birth ($W$). Assign the predicted values to the variable name `A_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
```{r}
# 1. first stage
# ----------
lm_out1 <- lm(A ~ Z + W, # regress A (treatment) on Z (instrument) + W (covariates)
data = df) # specify data
# view model summary
summary(lm_out1)
# get fitted values (A-hat)
A_hat <- lm_out1$fitted.values
# get the covariance of Z and A
cov_za <- cov(df$Z, df$A)
```
**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `A_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
```{r}
# 2. second stage
# ----------
lm_out2 <- lm(Y ~ A_hat + W, # regress Y (outcome) on fitted values from first stage
data = df) # specify data
# view model summary
summary(lm_out2)

# get the covariance of Z and Y
cov_zy <- cov(df$Z, df$Y)
```
**\textcolor{blue}{Question 10:}** Use your `cov_za` and `cov_zy` to estimate the coefficient $\beta_1$ in the following equation:
$$Y = \beta_0 + \beta_1 A + \beta_2 W + \epsilon$$
> When controlling for sex assigned at birth, use of AspiTyleCedrin reduces migraines by approximately 3.8 per month. *Note this is slightly different from the estimated coefficient in the OLS above, likely because $W$ was not accountedted for in the covariance calculation.*
The `AER` package also provides us with the `ivreg()` function, which allows us to perform IV regression in one command (*note that the standard errors will be correctly adjusted when using the `ivreg()` function*):
```{r}
# perform both stages in one command
# (the call below reconstructs code elided from this excerpt;
#  ivreg's two-part formula is: outcome ~ exposure + covariates | instrument + covariates)
lm_out3 <- ivreg(Y ~ A + W | Z + W, data = df)

# view model summary
summary(lm_out3)
```
**\textcolor{blue}{Question 11:}** Compare the estimate of the coefficient on $A$ in the output above to your previous answer.
> The results are identical. However, it should be noted that manually plugging in the fitted values won't produce the correct standard errors. Instead, we should use `ivreg()` to get the correct standard errors, which are adjusted because the fitted values are "generated regressors" (generated from the data rather than measured independently of other variables).
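To make the generated-regressor point concrete, here is a hedged base-R sketch (simulated data, illustrative names, no covariates for simplicity) comparing the naive second-stage standard error with the corrected 2SLS standard error, which computes residual variance from the actual exposure $A$ rather than $\hat{A}$:

```r
set.seed(7)

# simulate a confounded exposure with a valid instrument
n <- 2000
U <- rnorm(n)
Z <- rnorm(n)
A <- 0.7 * Z + U + rnorm(n)
Y <- -2 * A + 3 * U + rnorm(n)

# manual two-stage least squares
A_hat <- fitted(lm(A ~ Z))
stage2 <- lm(Y ~ A_hat)

# naive SE: lm() computes residuals against A_hat
se_naive <- summary(stage2)$coefficients["A_hat", "Std. Error"]

# corrected SE: same design matrix, but the residual variance
# comes from the structural residuals Y - b0 - b1 * A
b <- coef(stage2)
resid_struct <- Y - b[1] - b[2] * A
sigma2 <- sum(resid_struct^2) / (n - 2)
X <- cbind(1, A_hat)
se_corrected <- sqrt(sigma2 * solve(crossprod(X))[2, 2])

c(naive = se_naive, corrected = se_corrected)
```

This residual swap is the essence of the adjustment that `ivreg()` performs internally (along with handling covariates), which is why its standard errors differ from a manual two-stage `lm()` fit.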
# 6 Causal Inference/6-5 Instrumental Variables/Instrumental Variables Student.Rmd

There is no empirical way to determine whether the "exclusion restriction" requirement holds.
## First Stage
The "first stage" requirement (that $Z$ must have a causal effect on $A$), however, can be empirically tested, and as the name implies, doing so is indeed the first stage in implementing an instrumental variable analysis.
To do so, we simply regress the exposure $A$ on the intended instrument $Z$ (and any measured confounders $W$ that we have determined appropriate to control for) using a simple linear regression:
$$A = \beta_0 + \beta_1Z + \beta_2W + \epsilon$$
If this regression yields a large regression coefficient on $Z$, $Z$ is considered a **strong** instrument and we may proceed. If the coefficient is small, however, $Z$ is considered a **weak** instrument and may be a poor choice of instrument.
If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the treatment $\hat{A}$ that are a function of $Z$ and the covariance of $Z$ and $A$ ($Cov(Z,A)$) for the next stage.
**\textcolor{blue}{Question 6:}** Consider, what are some potential concerns with using a weak instrument?
**Solution:** ...
## Second Stage
Now that we have the predicted values of the treatment $\hat{A}$, we regress the outcome $Y$ on these values (and any covariates included in the first stage), like so:
$$Y = \beta_0 + \beta_1\hat{A} + \beta_2W + \epsilon$$
We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model. *Note that this will differ slightly if you control for any $W$.*
$$\hat{\beta}_1 = \frac{Cov(Z,Y)}{Cov(Z,A)}$$
```{r}
head(df)
summary(df)
```
**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress whether the individual took AspiTyleCedrin ($A$) on proximity to a pharmacy that sells AspiTyleCedrin ($Z$) and sex assigned at birth ($W$). Assign the predicted values to the variable name `A_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
```{r}
lm_out1 <- lm(..., data = df)
summary(lm_out1)
A_hat <- lm_out1$...
cov_za <- cov(..., ...)
```
**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `A_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
```{r}
lm_out2 <- lm(..., data = df)
cov_zy <- cov(..., ...)
```
> ...
The `AER` package also provides us with the `ivreg()` function, which allows us to perform IV regression in one command (*note that the standard errors will be correctly adjusted when using the `ivreg()` function*):