Partial fix for #330 #332

Draft
wants to merge 1 commit into
base: master

damonbayer

Closes #330.

This is not yet complete, but I think it is a good start. The remaining issue is that there are extraneous NAs in the .value column.

library(tidybayes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(waldo)

my_tidy_draws <- 
  expand_grid(.chain = 1:2, .iteration = 1:5) |> 
  mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.chain, .iteration)) |> 
  mutate(`aa[1]` = rnorm(n()), `aa[2]` = rnorm(n()), `ab[1]` = rnorm(n())) |> 
  tidy_draws()

gather_draws_explicit <- my_tidy_draws |> gather_draws(aa[i], ab[i])

gather_draws_regex <- my_tidy_draws |> gather_draws(`a.*`[i], regex = T)

compare(gather_draws_explicit, gather_draws_regex)
#> `attr(attr(old, 'groups'), 'row.names')`: 1 2 3  
#> `attr(attr(new, 'groups'), 'row.names')`: 1 2 3 4
#> 
#> `attr(old, 'groups')$i`: 1 1 2  
#> `attr(new, 'groups')$i`: 1 1 2 2
#> 
#> `attr(old, 'groups')$.variable`: "aa" "ab" "aa"     
#> `attr(new, 'groups')$.variable`: "aa" "ab" "aa" "ab"
#> 
#> `attr(old, 'groups')$.rows` is length 3
#> `attr(new, 'groups')$.rows` is length 4
#> 
#> `attr(old, 'groups')$.rows[[4]]` is absent
#> `attr(new, 'groups')$.rows[[4]]` is an integer vector (31, 32, 33, 34, 35, ...)
#> 
#> `attr(old, 'row.names')[28:30]`: 28 29 30                      and 3 more...
#> `attr(new, 'row.names')[28:40]`: 28 29 30 31 32 33 34 35 36 37           ...
#> 
#> old vs new
#>             i .chain .iteration .draw .variable        .value
#>   old[27, ] 1      2          2     7        ab  0.3754198033
#>   old[28, ] 1      2          3     8        ab  0.2663837133
#>   old[29, ] 1      2          4     9        ab  0.7981538291
#>   old[30, ] 1      2          5    10        ab  0.6497445085
#> + new[31, ] 2      1          1     1        ab            NA
#> + new[32, ] 2      1          2     2        ab            NA
#> + new[33, ] 2      1          3     3        ab            NA
#> + new[34, ] 2      1          4     4        ab            NA
#> + new[35, ] 2      1          5     5        ab            NA
#> + new[36, ] 2      2          1     6        ab            NA
#> + new[37, ] 2      2          2     7        ab            NA
#> and 3 more ...
#> 
#> `old$i[28:30]`: 1 1 1               and 3 more...
#> `new$i[28:40]`: 1 1 1 2 2 2 2 2 2 2           ...
#> 
#> `old$.chain[28:30]`: 2 2 2               and 3 more...
#> `new$.chain[28:40]`: 2 2 2 1 1 1 1 1 2 2           ...
#> 
#> `old$.iteration[28:30]`: 3 4 5               and 3 more...
#> `new$.iteration[28:40]`: 3 4 5 1 2 3 4 5 1 2           ...
#> 
#> And 3 more differences ...

Created on 2025-03-19 with reprex v2.1.1
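
To make the remaining problem easier to spot, the extra rows can be located by counting NAs per (.variable, i) group. The following is only a minimal sketch on made-up toy data that mimics the long format shown above; the object names draws_long, na_summary and spurious are hypothetical and not part of tidybayes.

```r
library(dplyr)

# Toy long-format draws mimicking the regex result above (values are made up).
# ab has no i = 2 element, so those rows carry only NA.
set.seed(1)
draws_long <- tibble(
  .variable = rep(c("aa", "aa", "ab", "ab"), each = 3),
  i         = rep(c(1L, 2L, 1L, 2L), each = 3),
  .draw     = rep(1:3, times = 4),
  .value    = c(rnorm(9), rep(NA_real_, 3))
)

# Count draws and NA draws for every (.variable, i) combination.
na_summary <- draws_long |>
  group_by(.variable, i) |>
  summarise(n_draws = n(), n_na = sum(is.na(.value)), .groups = "drop")

# Groups in which every draw is NA correspond to indices that do not
# exist for that variable (here: ab with i = 2).
spurious <- filter(na_summary, n_na == n_draws)
spurious
```

On real output, the same summary could be run on gather_draws_regex directly.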

Lionel-Re pushed a commit to Lionel-Re/tidybayes that referenced this pull request Mar 27, 2025
Lionel-Re added a commit to Lionel-Re/tidybayes that referenced this pull request Mar 31, 2025
Closes mjskay#330 and partially fixes mjskay#332
Notes on our approach:
Purpose: We modified gather_draws() to remove rows with NA in the .value column after the variables are gathered, so the result no longer contains extraneous NA rows.

Implementation: Inside gather_draws(), we use lapply() to apply filter() to each gathered variable, so rows containing NA values are excluded variable by variable.
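
As a rough illustration of the per-variable filtering described above, the sketch below applies filter() to each element of a list with lapply() and then stacks the pieces. The list per_variable is a hypothetical stand-in for whatever gather_draws() builds for each variable specification; the actual internals are not shown in this thread, so this is an assumption about the shape of the change, not the committed code.

```r
library(dplyr)

# Hypothetical per-variable results (made-up values), one tibble per variable.
per_variable <- list(
  aa = tibble(.variable = "aa", i = c(1L, 2L), .value = c(0.1, -0.4)),
  ab = tibble(.variable = "ab", i = c(1L, 2L), .value = c(0.7, NA))
)

# Drop NA rows from each piece, then stack the pieces into one data frame.
cleaned <- lapply(per_variable, function(d) filter(d, !is.na(.value)))
tidied <- bind_rows(cleaned)
tidied
```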

Specific Code Change: We added the following lines of code to filter out NA values in the .value column after gathering the data:

tidied = tidied %>% 
  filter(!is.na(.value))  # Filter out rows where .value is NA
This ensures that only rows with valid (non-NA) .value entries are retained for further analysis.
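
One design point may be worth checking: filter(!is.na(.value)) removes every NA row, including any draw that is genuinely NA for an index that does exist. The sketch below contrasts that filter with a group-wise variant that only drops (.variable, i) groups that are NA throughout. The group-wise variant is an assumption offered for comparison, not something proposed in this pull request, and the toy data is made up.

```r
library(dplyr)

toy <- tibble(
  .variable = c("aa", "aa", "ab", "ab", "ab"),
  i         = c(1L, 2L, 1L, 1L, 2L),
  .value    = c(0.3, 1.2, NA, 0.5, NA)  # ab[1] has one genuinely missing draw
)

# The filter from the change above: removes every row whose .value is NA.
dropped_all <- toy |>
  filter(!is.na(.value))

# Hypothetical alternative: drop only groups in which all values are NA,
# keeping genuine NA draws for indices that exist.
dropped_groups <- toy |>
  group_by(.variable, i) |>
  filter(!all(is.na(.value))) |>
  ungroup()
```

Here dropped_all keeps three rows, while dropped_groups keeps four, including the single NA draw of ab[1].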

Reasoning: Because the filter is applied to each gathered variable, it handles cases where only some variables produce missing values, and users do not need to remove those rows by hand.

Outcome: After this change, gather_draws() no longer returns rows with NA in the .value column.

library(tidybayes)
library(dplyr)
library(tidyr)
library(waldo)
my_tidy_draws <-
  expand_grid(.chain = 1:2, .iteration = 1:5) |>
  mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.chain, .iteration)) |>
  mutate(`aa[1]` = rnorm(n()), `aa[2]` = rnorm(n()), `ab[1]` = rnorm(n())) |>
  tidy_draws()

gather_draws_explicit <- my_tidy_draws |> gather_draws(aa[i], ab[i])

gather_draws_regex <- my_tidy_draws |> gather_draws(`a.*`[i], regex = T)

Result
compare(gather_draws_explicit, gather_draws_regex)
✔ No differences

Successfully merging this pull request may close these issues.

All nested columns must have the same number of elements error only when using regex = T