Open
Description
Python 3.11; pandas 2.0.3; numpy 1.25.1
When running the first cell of Chapter 8 in Simulated-Data.ipynb, I got the following error in a few places: Can only use .dt accessor with datetimelike values
I was able to get it to work by adding pd.to_datetime in various places. I originally tried adding pd.to_datetime to the date column itself, but that caused other issues. Not sure if there is a better way.
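For context, here is a minimal sketch of what appears to trigger the error, assuming the cause is that date.date yields plain datetime.date objects, which pandas stores with object dtype. The names dates, message, converted and days are made up for this illustration and do not come from the notebook.

```python
import pandas as pd

# DatetimeIndex.date returns an array of plain datetime.date objects,
# so the resulting Series has object dtype, not datetime64.
dates = pd.Series(pd.date_range("2021-05-01", periods=3, freq="D").date)

message = None
try:
    dates.dt.day  # the .dt accessor is not available on object dtype
except AttributeError as err:
    message = str(err)

# Converting back to datetime64 makes date arithmetic and .dt work again.
converted = pd.to_datetime(dates)
days = (converted - converted.min()).dt.days

print(dates.dtype)
print(message)
print(days.tolist())
```

This only reproduces the error on a toy Series; it does not show which pandas change made the original notebook code stop working.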
import numpy as np
import pandas as pd

date = pd.date_range("2021-05-01", "2021-07-31", freq="D")
cohorts = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"]).date
poss_regions = ["S", "N", "W", "E"]
reg_ps = dict(zip(poss_regions, [.3, .6, .7, .8]))
reg_fe = dict(zip(poss_regions, [20, 16, 8, 2]))
reg_trend = dict(zip(poss_regions, [0, 0.2, .4, .6]))
units = np.array(range(1, 200+1))
np.random.seed(123)
unit_reg = np.random.choice(poss_regions, len(units))
exp_trend = np.random.exponential(0.01, len(units))
treated_unit = np.random.binomial(1, np.vectorize(reg_ps.__getitem__)(unit_reg))
# staggered adoption dgp
df = pd.DataFrame(dict(
    date = np.tile(date.date, len(units)),
    city = np.repeat(units, len(date)),
    region = np.repeat(unit_reg, len(date)),
    treated_unit = np.repeat(treated_unit, len(date)),
    cohort = np.repeat(np.random.choice(cohorts, len(units)), len(date)),
    eff_heter = np.repeat(np.random.exponential(1, size=len(units)), len(date)),
    unit_fe = np.repeat(np.random.normal(0, 2, size=len(units)), len(date)),
    time_fe = np.tile(np.random.normal(size=len(date)), len(units)),
    week_day = np.tile(date.weekday, len(units)),
    w_seas = np.tile(abs(5-date.weekday) % 7, len(units)),
)).assign(
    reg_fe = lambda d: d["region"].map(reg_fe),
    reg_trend = lambda d: d["region"].map(reg_trend),
    reg_ps = lambda d: d["region"].map(reg_ps),
    trend = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    day = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    cohort = lambda d: np.where(d["treated_unit"] == 1, d["cohort"], pd.to_datetime("2100-01-01")),
).assign(
    treated = lambda d: ((pd.to_datetime(d["date"]) >= d["cohort"]) & d["treated_unit"] == 1).astype(int),
).assign(
    y0 = lambda d: np.round(10
                            + d["treated_unit"]
                            + d["reg_trend"]*d["trend"]/2
                            + d["unit_fe"]
                            + 0.4*d["time_fe"]
                            + 2*d["reg_fe"]
                            + d["w_seas"]/5, 0),
).assign(
    # y0 = lambda d: np.round(d["y0"] + d.groupby("city")["y0"].shift(1).fillna(0)*0.2, 0)
).assign(
    y1 = lambda d: d["y0"] + np.minimum(0.2*(np.maximum(0, (pd.to_datetime(d["date"]) - pd.to_datetime(d["cohort"])).dt.days)), 1)*d["eff_heter"]*2
).assign(
    tau = lambda d: d["y1"] - d["y0"],
    downloads = lambda d: np.where(d["treated"] == 1, d["y1"], d["y0"]) + np.random.normal(0,.7,len(d)),
    # date = lambda d: pd.to_datetime(d["date"]),
).round({"downloads": 0})
# df.head()
Then, in the second cell, I had to change
.assign(post=lambda d: (d["date"] >= d["cohort"]).astype(int))
to
.assign(post=lambda d: (pd.to_datetime(d["date"]) >= d["cohort"]).astype(int))
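One possible alternative, sketched on a toy frame only: convert both date and cohort to datetime64 once, so the later comparison needs no per-call pd.to_datetime. The frame toy and its values are hypothetical, and this has not been tried against the notebook itself, where converting the date column caused other issues as noted above.

```python
import datetime

import pandas as pd

# Toy stand-in for the notebook's frame: both columns hold plain
# datetime.date objects, i.e. object dtype.
toy = pd.DataFrame({
    "date": pd.date_range("2021-06-01", periods=4, freq="D").date,
    "cohort": [datetime.date(2021, 6, 3)] * 4,
})

# Convert both columns up front, then compare them directly.
out = toy.assign(
    date=lambda d: pd.to_datetime(d["date"]),
    cohort=lambda d: pd.to_datetime(d["cohort"]),
).assign(
    post=lambda d: (d["date"] >= d["cohort"]).astype(int),
)

print(out["post"].tolist())
```

Whether this is cleaner in the real notebook depends on what else relies on the date column holding datetime.date objects.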