Variable labels get dropped by mutate #5255
Comments
labelled isn't part of the tidyverse, but this will by and large be fixed in dplyr 1.0.0.
Mr. Hadley, dplyr 1.1.3 still drops a variable's label when using mutate, and I didn't see any related description in mutate's Arguments section.
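For anyone who still sees labels dropped in a particular setup, one possible workaround is to copy the labels back after the transformation. The following is a minimal base R sketch, assuming the variable label is stored in each column's "label" attribute (the attribute that labelled's var_label() reads). The helper name restore_labels() is hypothetical and is not part of dplyr or labelled.

```r
# Hypothetical helper (not part of dplyr or labelled): copy the "label"
# attribute from the columns of `old` onto same-named columns of `new`.
restore_labels <- function(new, old) {
  for (nm in intersect(names(new), names(old))) {
    lbl <- attr(old[[nm]], "label", exact = TRUE)
    if (!is.null(lbl)) {
      attr(new[[nm]], "label") <- lbl
    }
  }
  new
}

# Toy data: column `a` carries a variable label
old <- data.frame(a = 1:3, b = c("x", "y", "z"))
attr(old$a, "label") <- "Label for a"

# as.numeric() strips attributes, standing in for any step that loses the label
new <- data.frame(a = as.numeric(old$a), b = old$b)

new <- restore_labels(new, old)
attr(new$a, "label", exact = TRUE)
```

Only columns present in both data frames are touched, so newly created columns keep whatever label (or lack of one) they already have.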
Dear @QQuantmod, as the original submitter, I can verify that this has indeed been fixed. The original reprex now looks as follows, confirming that all relevant …
library(conflicted)
library(dplyr)
library(labelled)
packageVersion("dplyr")
#> [1] '1.1.4'
# Mutate no longer drops haven_labelled variable labels
f <- factor(c("yes", "yes", "no", "no", "don't know"))
value_labels <- c("yes" = 1, "no" = 2, "don't know" = 9)
dat <- tibble(responses = to_labelled(f, value_labels)) %>%
  set_variable_labels(responses = "Variable labels no longer dropped by mutate")
dat <- dat %>%
  mutate(responses_mut = to_factor(responses))
# Assignment using base R notation does not drop labels
dat$responses_base <- to_factor(dat$responses)
sapply(dat,var_label)
#> responses
#> "Variable labels no longer dropped by mutate"
#> responses_mut
#> "Variable labels no longer dropped by mutate"
#> responses_base
#> "Variable labels no longer dropped by mutate"
# mutate_* functions now also preserve labels
dat %>%
  mutate_if(is.factor, to_labelled) %>%
  sapply(var_label)
#> responses
#> "Variable labels no longer dropped by mutate"
#> responses_mut
#> "Variable labels no longer dropped by mutate"
#> responses_base
#> "Variable labels no longer dropped by mutate"
# As does the (no longer so new) across() notation
dat %>%
  mutate(across(is.factor, to_labelled)) %>%
  sapply(var_label)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(is.factor, to_labelled)`.
#> Caused by warning:
#> ! Use of bare predicate functions was deprecated in tidyselect 1.1.0.
#> ℹ Please use wrap predicates in `where()` instead.
#> # Was:
#> data %>% select(is.factor)
#>
#> # Now:
#> data %>% select(where(is.factor))
#> responses
#> "Variable labels no longer dropped by mutate"
#> responses_mut
#> "Variable labels no longer dropped by mutate"
#> responses_base
#> "Variable labels no longer dropped by mutate"
mutate() does not preserve variable labels specified using the labelled package, which makes it more cumbersome to work with variable labels in a dplyr workflow. Admittedly, this is an interaction between two packages, but they are both part of the tidyverse, so I would imagine they are very often used together.
This seems to apply to the mutate functions in general, which is a bummer, given how convenient in-place processing with mutate_at or the new across() syntax is.
Using base R notation instead of mutate confirms that the issue is not with the intermediate functions but with mutate. In other words, assignment to columns using base R preserves the variable labels as expected. Demonstration below:
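The comparison described above can also be sketched without labelled, assuming only that the variable label lives in the "label" attribute of the column. This is a toy stand-in rather than the original reprex: the data and column names are made up, the intermediate step is a plain copy rather than to_factor(), and what mutate() does with the attribute depends on the installed dplyr version, so no expected output is shown.

```r
library(dplyr)

# Toy data (hypothetical): attach a variable label via the "label" attribute
dat <- tibble(x = c(1, 2, 9))
attr(dat$x, "label") <- "Toy variable label"

# Route 1: create the new column through mutate()
dat <- dat %>%
  mutate(x_mut = x)

# Route 2: create the new column through base R assignment
dat$x_base <- dat$x

# Inspect the "label" attribute of every column to compare the two routes
sapply(dat, attr, which = "label", exact = TRUE)
```

If the two routes disagree, the difference comes from mutate() itself rather than from the function applied to the column.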