Variable labels get dropped by mutate #5255

Closed
torfason opened this issue May 19, 2020 · 3 comments

@torfason

mutate() does not preserve variable labels set with the labelled package, which makes it cumbersome to work with variable labels in a dplyr workflow. Admittedly, this is an interaction between two packages, but both are part of the tidyverse, so I imagine they are very often used together.

This seems to apply to the mutate functions in general, which is a bummer given how convenient in-place processing is with mutate_at or the new across() syntax.

Using base R notation instead of mutate confirms that the issue is not with the intermediate functions but with mutate. In other words, assignment to columns using base R preserves the variable labels as expected. Demonstration below:

library(dplyr)
library(labelled)

# Mutate drops haven_labelled variable labels, but this seems wrong
f <- factor(c("yes", "yes", "no", "no", "don't know"))
value_labels <- c("yes" = 1, "no" = 2, "don't know" = 9)
dat <- tibble(responses = to_labelled(f,value_labels)) %>% 
  set_variable_labels(responses="Variable labels get dropped by mutate")
dat <- dat %>%
  mutate(responses_mut = to_factor(responses))

# Assignment using base R notation does not drop labels
dat$responses_base <- to_factor(dat$responses)
sapply(dat,var_label)
#> $responses
#> [1] "Variable labels get dropped by mutate"
#> 
#> $responses_mut
#> NULL
#> 
#> $responses_base
#> [1] "Variable labels get dropped by mutate"

# This seems to apply to mutate_* functions in general
# (note that the variable label is now dropped from responses_base)
dat %>% 
  mutate_if(is.factor,to_labelled) %>%
  sapply(var_label)
#> $responses
#> [1] "Variable labels get dropped by mutate"
#> 
#> $responses_mut
#> NULL
#> 
#> $responses_base
#> NULL

# It also applies to the new across() notation
# (note that the variable label is now dropped from responses_base)
dat %>% 
  mutate(across(is.factor,to_labelled)) %>%
  sapply(var_label)
#> $responses
#> [1] "Variable labels get dropped by mutate"
#> 
#> $responses_mut
#> NULL
#> 
#> $responses_base
#> NULL
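
One possible stopgap on affected versions, which is an assumption on my part and not something proposed in this thread, is to save the label before calling mutate and re-apply it afterwards with set_variable_labels(). A minimal sketch, where saved_label and restored are illustrative names:

```r
library(dplyr)
library(labelled)

f <- factor(c("yes", "yes", "no", "no", "don't know"))
value_labels <- c("yes" = 1, "no" = 2, "don't know" = 9)
dat <- tibble(responses = to_labelled(f, value_labels)) %>%
  set_variable_labels(responses = "Variable labels get dropped by mutate")

# Remember the variable label before mutate() creates the new column
saved_label <- var_label(dat$responses)

dat <- dat %>%
  mutate(responses_mut = to_factor(responses)) %>%
  # Re-apply the label explicitly; this is harmless if mutate() already kept it
  set_variable_labels(responses_mut = saved_label)

restored <- var_label(dat$responses_mut)
```

Because the label is set explicitly after mutate, the result should not depend on whether the installed dplyr version preserves attributes.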
@hadley
Member

hadley commented May 19, 2020

labelled isn't part of the tidyverse, but this will by and large be fixed in dplyr 1.0.0.

@hadley closed this as completed May 19, 2020
@QQuantmod

> labelled isn't part of the tidyverse, but this will by and large be fixed in dplyr 1.0.0.

Mr. Hadley, dplyr 1.1.3 still drops variable labels when using mutate, and I don't see any related description in the Arguments section of mutate's documentation.

@torfason
Author

Dear @QQuantmod,

As the original submitter, I can verify that this has indeed been fixed. The original reprex now looks as follows, confirming that all relevant dplyr functions preserve variable labels:

library(conflicted)
library(dplyr)
library(labelled)

packageVersion("dplyr")
#> [1] '1.1.4'

# Mutate no longer drops haven_labelled variable labels
f <- factor(c("yes", "yes", "no", "no", "don't know"))
value_labels <- c("yes" = 1, "no" = 2, "don't know" = 9)
dat <- tibble(responses = to_labelled(f,value_labels)) %>% 
  set_variable_labels(responses="Variable labels no longer dropped by mutate")
dat <- dat %>%
  mutate(responses_mut = to_factor(responses))

# Assignment using base R notation does not drop labels
dat$responses_base <- to_factor(dat$responses)
sapply(dat,var_label)
#>                                     responses 
#> "Variable labels no longer dropped by mutate" 
#>                                 responses_mut 
#> "Variable labels no longer dropped by mutate" 
#>                                responses_base 
#> "Variable labels no longer dropped by mutate"

# mutate_* functions now also preserve labels
dat %>% 
  mutate_if(is.factor,to_labelled) %>%
  sapply(var_label)
#>                                     responses 
#> "Variable labels no longer dropped by mutate" 
#>                                 responses_mut 
#> "Variable labels no longer dropped by mutate" 
#>                                responses_base 
#> "Variable labels no longer dropped by mutate"

# As does the (no longer so new) across() notation
dat %>% 
  mutate(across(is.factor,to_labelled)) %>%
  sapply(var_label)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(is.factor, to_labelled)`.
#> Caused by warning:
#> ! Use of bare predicate functions was deprecated in tidyselect 1.1.0.
#> ℹ Please use wrap predicates in `where()` instead.
#>   # Was:
#>   data %>% select(is.factor)
#> 
#>   # Now:
#>   data %>% select(where(is.factor))
#>                                     responses 
#> "Variable labels no longer dropped by mutate" 
#>                                 responses_mut 
#> "Variable labels no longer dropped by mutate" 
#>                                responses_base 
#> "Variable labels no longer dropped by mutate"
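
The deprecation warning above comes from passing the bare predicate is.factor to across(). Here is a minimal sketch of the same check with the predicate wrapped in where(), as the warning suggests. labels_after is just an illustrative name, and whether the labels survive still depends on the installed dplyr and labelled versions:

```r
library(dplyr)
library(labelled)

f <- factor(c("yes", "yes", "no", "no", "don't know"))
value_labels <- c("yes" = 1, "no" = 2, "don't know" = 9)
dat <- tibble(responses = to_labelled(f, value_labels)) %>%
  set_variable_labels(responses = "Variable labels no longer dropped by mutate") %>%
  mutate(responses_mut = to_factor(responses))

# where() wraps the predicate, so tidyselect no longer warns about
# a bare predicate function inside across()
labels_after <- dat %>%
  mutate(across(where(is.factor), to_labelled)) %>%
  sapply(var_label)

labels_after
```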
