Commit abba5b0

Merge pull request #132 from dlab-berkeley/kz
update to regressing A on Z instead of Z on A

2 parents c25e01b + 6d071b9

File tree

3 files changed: +28 −27 lines changed


6 Causal Inference/6-5 Instrumental Variables/Instrumental Variables Solutions.Rmd

Lines changed: 16 additions & 16 deletions
@@ -458,23 +458,23 @@ There is no empirical way to determine whether the "exclusion restriction" requi
 
 The "first stage" requirement (that $Z$ must have a causal effect on $A$), however, can be empirically tested, and as the name implies, doing so is indeed the first stage in implementing an instrumental variable analysis.
 
-To do so, we simply run a linear regression of the intended instrument $Z$ on the exposure $A$ (and any measured confounders $W$ that we have determined appropriate to control for):
+To do so, we simply regress the exposure $A$ on the intended instrument $Z$ (and any measured confounders $W$ that we have determined appropriate to control for) using a simple linear regression:
 
-$$Z = \beta_0 + \beta_1A + \epsilon$$
-If this regression results in a high correlation value, $Z$ is considered a **strong** instrument and we may proceed. If correlation is low, however, $Z$ is considered a **weak** instrument and may be a poor choice of instrument.
+$$A = \beta_0 + \beta_1Z + \beta_2W + \epsilon$$
+If this regression yields a large (and statistically significant) coefficient on $Z$, $Z$ is considered a **strong** instrument and we may proceed. If the coefficient is small, however, $Z$ is considered a **weak** instrument and may be a poor choice.
 
-If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the instrument $\hat{Z}$ and the covariance of $Z$ and $A$ ($Cov(Z,A)$) for the next stage.
+If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the treatment, $\hat{A}$, which are a function of $Z$, along with the covariance of $Z$ and $A$ ($Cov(Z,A)$), for the next stage.
 
 **\textcolor{blue}{Question 6:}** Consider, what are some potential concerns with using a weak instrument?
 
 **Solution:** There are many possible answers, but the primary concern is that $Z$ may not truly have a causal effect on $A$ (or at least, not a very strong one).
 
 ## Second Stage
 
-Now that we have the predicted values of the instrument $\hat{Z}$, we regress the outcome $Y$ on these values, like so:
+Now that we have the predicted values of the treatment $\hat{A}$, we regress the outcome $Y$ on these values (and any covariates included in the first stage), like so:
 
-$$Y = \beta_0 + \beta_1\hat{Z} + \epsilon$$
-We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model.
+$$Y = \beta_0 + \beta_1\hat{A} + \beta_2W + \epsilon$$
+We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model. *Note that this will differ slightly if you control for any $W$.*
 
 $$\hat{\beta}_1 = \frac{Cov(Z,Y)}{Cov(Z,A)}$$
 
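The covariance-ratio formula in the hunk above has a one-line justification; the following editorial sketch ignores $W$ for simplicity. Under the structural model $Y = \beta_0 + \beta_1 A + \epsilon$, the exclusion restriction gives $Cov(Z, \epsilon) = 0$, so:

```latex
Cov(Z, Y) = Cov(Z,\ \beta_0 + \beta_1 A + \epsilon)
          = \beta_1\, Cov(Z, A) + Cov(Z, \epsilon)
          = \beta_1\, Cov(Z, A)
\quad\Longrightarrow\quad
\beta_1 = \frac{Cov(Z, Y)}{Cov(Z, A)}
```

Dividing through by $Cov(Z,A)$ is only valid under the first-stage condition $Cov(Z,A) \neq 0$, which is why a weak instrument (small $Cov(Z,A)$) makes the estimate unstable.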
@@ -557,33 +557,33 @@ df <- df %>%
 head(df)
 summary(df)
 ```
-**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress proximity $Z$ on AspiTyleCedrin use $A$ and sex assigned at birth $W$. Assign the predicted values to the variable name `Z_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
+**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress whether the individual took AspiTyleCedrin ($A$) on proximity to a pharmacy that sells AspiTyleCedrin ($Z$) and sex assigned at birth ($W$). Assign the predicted values to the variable name `A_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
 
 ```{r}
 
 # 1. first stage
 # ----------
-lm_out1 <- lm(Z ~ A + W, # regress Z (instrument) on A + W
+lm_out1 <- lm(A ~ Z + W, # regress A (treatment) on Z (instrument) + W (covariates)
               data = df) # specify data
 
 # view model summary
 summary(lm_out1)
 
 
 # get fitted values (Z-hat)
-Z_hat <- lm_out1$fitted.values
+A_hat <- lm_out1$fitted.values
 
 # get the covariance of Z and A
 cov_za <- cov(df$Z, df$A)
 ```
 
-**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `Z_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
+**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `A_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
 
 ```{r}
 
 # 2. reduced form
 # ----------
-lm_out2 <- lm(Y ~ Z_hat, # regress Y (outcome) on fitted values from first stage
+lm_out2 <- lm(Y ~ A_hat + W, # regress Y (outcome) on fitted values from first stage, plus W
               data = df) # specify data
 
 # view model summary
@@ -595,7 +595,7 @@ cov_zy <- cov(df$Z, df$Y)
 
 **\textcolor{blue}{Question 10:}** Use your `cov_za` and `cov_zy` to estimate the coefficient $\beta_1$ in the following equation:
 
-$$Y = \beta_0 + \beta_1 A + \beta_2 W + \epsilon$$
+$$Y = \beta_0 + \beta_1\hat{A} + \beta_2 W + \epsilon$$
 Interpret your result in words.
 
 ```{r}
@@ -606,10 +606,10 @@ beta_hat <- cov_zy/cov_za # divide Cov(Z,Y) / Cov(Z,A)
 beta_hat
 ```
 
-> When controlling for sex assigned at birth, use of AspiTyleCedrin reduces migraines by approximately 3.8 per month.
+> When controlling for sex assigned at birth, use of AspiTyleCedrin reduces migraines by approximately 3.8 per month. *Note this is slightly different from the estimated OLS coefficient above, likely because the covariance ratio does not adjust for $W$.*
 
-The `AER` package also provides us with the `ivreg()` function which allows us to perform IV regression in one command:
+The `AER` package also provides us with the `ivreg()` function, which allows us to perform IV regression in one command (*note that the standard errors will be correctly adjusted when using `ivreg()`*):
 
 
 ```{r}
@@ -625,7 +625,7 @@ summary(lm_out3)
 
 **\textcolor{blue}{Question 11:}** Compare the estimate of the coefficient on $A$ in the output above to your previous answer.
 
-> The results are very similar. In this case the estimate using `ivreg()` is slightly larger, but if you repeat this with a difference seed, it might be smaller. So, they will both report similar estimates, which could be due to a rounding error.
+> The results are identical. However, it should be noted that manually plugging in the first-stage fitted values won't produce the correct standard errors. Instead, we should use `ivreg()`, which adjusts the standard errors because the fitted values are "generated regressors" (generated from the data rather than measured independently of other variables).
 
 \newpage
 
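The two-stage procedure in the diff above can be checked numerically. The sketch below (Python with simulated data; the variable names and coefficient values are illustrative, not the workshop's AspiTyleCedrin dataset) runs both stages without covariates and confirms that the second-stage slope equals the ratio $Cov(Z,Y)/Cov(Z,A)$ and recovers the true effect, while the naive OLS slope of $Y$ on $A$ is biased by the unmeasured confounder:

```python
import numpy as np

# Simulated data: U confounds A and Y; Z shifts A but affects Y only through A.
rng = np.random.default_rng(0)
n = 200_000
U = rng.normal(size=n)                      # unmeasured confounder
Z = rng.normal(size=n)                      # instrument
A = 0.8 * Z + U + rng.normal(size=n)        # first stage: Z has a causal effect on A
Y = 3.0 * A + 2.0 * U + rng.normal(size=n)  # true causal effect of A on Y is 3.0

# Naive OLS slope of Y on A is biased upward by U
naive = np.cov(A, Y)[0, 1] / np.var(A, ddof=1)

# Stage 1: regress A on Z (with intercept), keep the fitted values A_hat
X1 = np.column_stack([np.ones(n), Z])
A_hat = X1 @ np.linalg.lstsq(X1, A, rcond=None)[0]

# Stage 2: regress Y on A_hat; the slope is the 2SLS estimate
X2 = np.column_stack([np.ones(n), A_hat])
beta_2sls = np.linalg.lstsq(X2, Y, rcond=None)[0][1]

# The same estimate as the covariance ratio Cov(Z,Y)/Cov(Z,A)
ratio = np.cov(Z, Y)[0, 1] / np.cov(Z, A)[0, 1]

print(f"naive OLS: {naive:.3f}, 2SLS: {beta_2sls:.3f}, ratio: {ratio:.3f}")
```

With a covariate $W$, both design matrices would include a $W$ column, which is why the diff adds `W` to both `lm()` calls and notes that the plain covariance ratio then differs slightly.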
6 Causal Inference/6-5 Instrumental Variables/Instrumental Variables Student.Rmd

Lines changed: 12 additions & 11 deletions
@@ -446,25 +446,26 @@ There is no empirical way to determine whether the "exclusion restriction" requi
 
 ## First Stage
 
+
 The "first stage" requirement (that $Z$ must have a causal effect on $A$), however, can be empirically tested, and as the name implies, doing so is indeed the first stage in implementing an instrumental variable analysis.
 
-To do so, we simply run a linear regression of the intended instrument $Z$ on the exposure $A$ (and any measured confounders $W$ that we have determined appropriate to control for):
+To do so, we simply regress the exposure $A$ on the intended instrument $Z$ (and any measured confounders $W$ that we have determined appropriate to control for) using a simple linear regression:
 
-$$Z = \beta_0 + \beta_1A + \epsilon$$
-If this regression results in a high correlation value, $Z$ is considered a **strong** instrument and we may proceed. If correlation is low, however, $Z$ is considered a **weak** instrument and may be a poor choice of instrument.
+$$A = \beta_0 + \beta_1Z + \beta_2W + \epsilon$$
+If this regression yields a large (and statistically significant) coefficient on $Z$, $Z$ is considered a **strong** instrument and we may proceed. If the coefficient is small, however, $Z$ is considered a **weak** instrument and may be a poor choice.
 
-If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the instrument $\hat{Z}$ and the covariance of $Z$ and $A$ ($Cov(Z,A)$) for the next stage.
+If we decide to move forward with using $Z$ as an instrument, we save the predicted values of the treatment, $\hat{A}$, which are a function of $Z$, along with the covariance of $Z$ and $A$ ($Cov(Z,A)$), for the next stage.
 
 **\textcolor{blue}{Question 6:}** Consider, what are some potential concerns with using a weak instrument?
 
 **Solution:** ...
 
 ## Second Stage
 
-Now that we have the predicted values of the instrument $\hat{Z}$, we regress the outcome $Y$ on these values, like so:
+Now that we have the predicted values of the treatment $\hat{A}$, we regress the outcome $Y$ on these values (and any covariates included in the first stage), like so:
 
-$$Y = \beta_0 + \beta_1\hat{Z} + \epsilon$$
-We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model.
+$$Y = \beta_0 + \beta_1\hat{A} + \beta_2W + \epsilon$$
+We then retrieve the covariance between $Z$ and $Y$ ($Cov(Z,Y)$). The ratio between this and $Cov(Z,A)$ is then our 2SLS estimate of the coefficient on $A$ in the original model. *Note that this will differ slightly if you control for any $W$.*
 
 $$\hat{\beta}_1 = \frac{Cov(Z,Y)}{Cov(Z,A)}$$
 
@@ -547,17 +548,17 @@ head(df)
 summary(df)
 ```
 
-**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress proximity $Z$ on AspiTyleCedrin use $A$ and sex assigned at birth $W$. Assign the predicted values to the variable name `Z_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
+**\textcolor{blue}{Question 8:}** Use the `lm()` function to regress whether the individual took AspiTyleCedrin ($A$) on proximity to a pharmacy that sells AspiTyleCedrin ($Z$) and sex assigned at birth ($W$). Assign the predicted values to the variable name `A_hat`. Use the `cov()` function to find $Cov(Z,A)$ and assign the result to the variable name `cov_za`.
 
 ```{r}
 lm_out1 <- lm(..., data = df)
 summary(lm_out1)
 
-Z_hat <- lm_out1$...
+A_hat <- lm_out1$...
 cov_za <- cov(..., ...)
 ```
 
-**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `Z_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
+**\textcolor{blue}{Question 9:}** Use the `lm()` function to regress migraines $Y$ on your fitted values `A_hat`. Use the `cov()` function to find $Cov(Z,Y)$ and assign the result to the variable name `cov_zy`.
 
 ```{r}
 lm_out2 <- lm(..., data = df)
@@ -578,7 +579,7 @@ beta_hat
 > ...
 
 
-The `AER` package also provides us with the `ivreg()` function which allows us to perform IV regression in one command:
+The `AER` package also provides us with the `ivreg()` function, which allows us to perform IV regression in one command (*note that the standard errors will be correctly adjusted when using `ivreg()`*):
 
 ```{r}
 lm_out3 <- ivreg(..., data = df)