Partial fix for #330 #332

damonbayer · 2025-03-20T02:57:11Z

Closes #330.

This is not yet complete, but I think it is a good start. The remaining issue is the extraneous NAs in the .value column.

library(tidybayes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(waldo)

my_tidy_draws <- 
  expand_grid(.chain = 1:2, .iteration = 1:5) |> 
  mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.chain, .iteration)) |> 
  mutate(`aa[1]` = rnorm(n()), `aa[2]` = rnorm(n()), `ab[1]` = rnorm(n())) |> 
  tidy_draws()

gather_draws_explicit <- my_tidy_draws |> gather_draws(aa[i], ab[i])

gather_draws_regex <- my_tidy_draws |> gather_draws(`a.*`[i], regex = T)

compare(gather_draws_explicit, gather_draws_regex)
#> `attr(attr(old, 'groups'), 'row.names')`: 1 2 3  
#> `attr(attr(new, 'groups'), 'row.names')`: 1 2 3 4
#> 
#> `attr(old, 'groups')$i`: 1 1 2  
#> `attr(new, 'groups')$i`: 1 1 2 2
#> 
#> `attr(old, 'groups')$.variable`: "aa" "ab" "aa"     
#> `attr(new, 'groups')$.variable`: "aa" "ab" "aa" "ab"
#> 
#> `attr(old, 'groups')$.rows` is length 3
#> `attr(new, 'groups')$.rows` is length 4
#> 
#> `attr(old, 'groups')$.rows[[4]]` is absent
#> `attr(new, 'groups')$.rows[[4]]` is an integer vector (31, 32, 33, 34, 35, ...)
#> 
#> `attr(old, 'row.names')[28:30]`: 28 29 30                      and 3 more...
#> `attr(new, 'row.names')[28:40]`: 28 29 30 31 32 33 34 35 36 37           ...
#> 
#> old vs new
#>             i .chain .iteration .draw .variable        .value
#>   old[27, ] 1      2          2     7        ab  0.3754198033
#>   old[28, ] 1      2          3     8        ab  0.2663837133
#>   old[29, ] 1      2          4     9        ab  0.7981538291
#>   old[30, ] 1      2          5    10        ab  0.6497445085
#> + new[31, ] 2      1          1     1        ab            NA
#> + new[32, ] 2      1          2     2        ab            NA
#> + new[33, ] 2      1          3     3        ab            NA
#> + new[34, ] 2      1          4     4        ab            NA
#> + new[35, ] 2      1          5     5        ab            NA
#> + new[36, ] 2      2          1     6        ab            NA
#> + new[37, ] 2      2          2     7        ab            NA
#> and 3 more ...
#> 
#> `old$i[28:30]`: 1 1 1               and 3 more...
#> `new$i[28:40]`: 1 1 1 2 2 2 2 2 2 2           ...
#> 
#> `old$.chain[28:30]`: 2 2 2               and 3 more...
#> `new$.chain[28:40]`: 2 2 2 1 1 1 1 1 2 2           ...
#> 
#> `old$.iteration[28:30]`: 3 4 5               and 3 more...
#> `new$.iteration[28:40]`: 3 4 5 1 2 3 4 5 1 2           ...
#> 
#> And 3 more differences ...

Created on 2025-03-19 with reprex v2.1.1
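To see which groups the stray rows belong to, one could count the missing values per group. The following is a minimal sketch, not taken from the PR: the tibble `toy` is a hypothetical stand-in that only mimics the shape of the regex result above (one `ab` group at `i = 2` holding NA), and it is not produced by tidybayes.

```r
library(dplyr)

# Hypothetical stand-in for the regex result: `ab` has no index 2,
# so that group holds only NA values.
toy <- tibble(
  i         = c(1, 1, 2, 2),
  .variable = c("aa", "ab", "aa", "ab"),
  .value    = c(0.1, 0.2, 0.3, NA)
)

# Count missing values per (i, .variable) group.
na_counts <- toy |>
  group_by(i, .variable) |>
  summarise(n_na = sum(is.na(.value)), .groups = "drop")

na_counts
```

On real output, the same summary could be run on `gather_draws_regex` after `ungroup()` to check whether the NAs are confined to index/variable combinations that do not exist in the draws.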

Closes mjskay#330 and partially fixes mjskay#332.

Comment on our approach:

Purpose: We modified the gather_draws() function to handle NA values after gathering the variables. Rows with a missing value in the .value column are removed, so the result contains only valid draws.

Implementation: We used lapply() within the gather_draws() function to apply the filter() operation to each of the gathered variables, so rows containing NA values are excluded from each variable automatically.

Specific code change: We added the following line to filter out NA values in the .value column after gathering the data:

tidied = tidied %>% filter(!is.na(.value)) # Filter out rows where .value is NA

Reasoning: Because the filter is applied to each gathered variable, the solution handles datasets where some variables contain missing data, without dealing with missing values manually for each variable.

Outcome: After this change, the data returned by gather_draws() no longer contains rows with missing values in the .value column.
library(tidybayes)
library(dplyr)
library(tidyr)
library(waldo)

my_tidy_draws <-
  expand_grid(.chain = 1:2, .iteration = 1:5) |>
  mutate(.draw = tidybayes:::draw_from_chain_and_iteration_(.chain, .iteration)) |>
  mutate(`aa[1]` = rnorm(n()), `aa[2]` = rnorm(n()), `ab[1]` = rnorm(n())) |>
  tidy_draws()

gather_draws_explicit <- my_tidy_draws |> gather_draws(aa[i], ab[i])

gather_draws_regex <- my_tidy_draws |> gather_draws(`a.*`[i], regex = T)

Result:

compare(gather_draws_explicit, gather_draws_regex)
#> ✔ No differences
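The effect of the added filter can be illustrated in isolation. This is a minimal sketch under stated assumptions, not the actual tidybayes internals: `tidied` here is a hypothetical tibble shaped like gathered draws, with one NA row standing in for the `ab` entry at `i = 2` that does not exist in the draws.

```r
library(dplyr)

# Hypothetical gathered draws; the last row mimics the stray `ab`, i = 2 entry.
tidied <- tibble(
  i         = c(1, 2, 1, 2),
  .variable = c("aa", "aa", "ab", "ab"),
  .value    = c(0.5, -1.2, 0.3, NA)
)

# Same filter as in the comment above: keep only rows with a non-missing .value.
cleaned <- tidied |> filter(!is.na(.value))

cleaned
```

One design consideration: this filter cannot tell padding NAs apart from NAs that are genuinely present in the draws, so any draw whose value is truly NA would be dropped as well.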

partial fix for 330

6d6d65d

damonbayer mentioned this pull request Mar 20, 2025

All nested columns must have the same number of elements error only when using regex = T #330

Open

Lionel-Re pushed a commit to Lionel-Re/tidybayes that referenced this pull request Mar 27, 2025

fix: unnest NA issues (mjskay#330) and (mjskay#332)

1e88616

katossky mentioned this pull request Apr 13, 2025

Fix issues 330 Lionel-Re/tidybayes#1

Merged
