Fix issue(s) preventing conus13km_restart and conus13km_decomp regression tests from running #2647
Comments
@MatthewPyle-NOAA I think I fixed two issues that prevented bit-reproducible results for the conus13km restart and decomp tests. Please take a look at this branch (https://github.com/DusanJovic-NOAA/ufs-weather-model/tree/rrfs_conus13km_tests) and try to run your test with it. Once you confirm that it works for you, I can open a PR. To fix the restart tests, I had to add one variable ('ebu_smoke') to the restart file. The decomp tests were fixed by reducing the number of MPI tasks in both the I and J directions, so that each MPI subdomain is larger than 'nrows_blend'. |
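The decomposition constraint described in the comment above can be sketched as a small check. This is only an illustration under stated assumptions: the even split of grid points across tasks, the function names, and the grid dimensions in the usage note are all hypothetical and are not taken from the model source or the test configuration.

```python
def subdomain_sizes(n_points, n_tasks):
    """Split n_points across n_tasks as evenly as possible.

    Assumption: an even split with the remainder spread over the first
    tasks. The model's actual decomposition may differ.
    """
    base, extra = divmod(n_points, n_tasks)
    return [base + 1 if i < extra else base for i in range(n_tasks)]


def layout_is_safe(npx, npy, layout_x, layout_y, nrows_blend):
    """Return True if every subdomain is larger than nrows_blend in I and J."""
    smallest_x = min(subdomain_sizes(npx, layout_x))
    smallest_y = min(subdomain_sizes(npy, layout_y))
    return smallest_x > nrows_blend and smallest_y > nrows_blend
```

For example, with a hypothetical 400 x 240 grid and `nrows_blend = 10`, `layout_is_safe(400, 240, 40, 30, 10)` is false because the subdomains are only 10 points wide, while `layout_is_safe(400, 240, 20, 12, 10)` is true.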
Thanks @DusanJovic-NOAA for this quick fix. I'll try to take a look soonish and let you know. |
@DusanJovic-NOAA Could you also make these changes on top of the production/RRFS.v1 branch? That would simplify testing it more fully for RRFS. Thanks! |
@MatthewPyle-NOAA When I try to run one of the regression tests from the production/RRFS.v1 branch, compilation fails with this error:
How do you run the tests in this branch? Standard |
@DusanJovic-NOAA We run the tests within rt.conf_rrfs - not sure if that explains the difference, though. |
@MatthewPyle-NOAA Please check this branch https://github.com/DusanJovic-NOAA/ufs-weather-model/tree/rrfs_v1_conus13km_tests This is based on production/RRFS.v1 branch |
The compilation failure is due to the compile option
and discussed here: I wrote these at the time:
|
Thanks @JiliDong-NOAA I figured it was related to the PARALLELRESTART material. |
@MatthewPyle-NOAA would you please confirm if Dusan's branch resolved the issue?
|
@junwang-noaa I'm waiting on somebody to run a full test with the RRFS, but they are occupied with a couple of other higher priority items right now. Hopefully it isn't a huge problem for it to linger for a bit longer? |
@JiliDong-NOAA has finally started trying to run it using a full RRFS configuration case this week. Initial tests comparing a 2 h continuous forecast with a run restarted at 1 h showed differences. Since a major difference between the regression test configuration that works properly and the RRFS configuration that failed is the choice of convection (RRFS uses saSAS), Jili did a sensitivity test with convection disabled. This worked somewhat better, with only a small number of diagnostic fields differing. Summary from Jili: After turning off deep convection, the prognostic and most diagnostic variables are reproducible in the restart run, except for three diagnostic variables: hailcast_dhail, accswe_land and accswe_ice. In summary, the current rrfsv1 deterministic forecast has the following restart reproducibility issues: saSAS deep convection |
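The kind of comparison described in the comment above (a continuous forecast versus a restarted run, field by field) can be sketched as follows. This is a minimal illustration, not the tool used for the RRFS tests: the function name, the in-memory dictionaries, the field name `t2m`, and all values in the usage note are hypothetical placeholders. In practice the fields would be read from the model output files.

```python
import struct


def differing_fields(continuous, restarted):
    """Return the sorted names of fields that are not bit-identical.

    Both arguments map a field name to a flat list of floats. A field
    counts as differing if it is missing from one run, has a different
    length, or differs in any bit of its 64-bit float representation.
    """
    diffs = []
    for name in sorted(set(continuous) | set(restarted)):
        a = continuous.get(name)
        b = restarted.get(name)
        if a is None or b is None or len(a) != len(b):
            diffs.append(name)
            continue
        # Compare raw bytes so that the check is bitwise, not approximate.
        if struct.pack(f"{len(a)}d", *a) != struct.pack(f"{len(b)}d", *b):
            diffs.append(name)
    return diffs
```

For example, with placeholder data `continuous = {"t2m": [280.0, 281.5], "hailcast_dhail": [0.0, 1.25]}` and `restarted = {"t2m": [280.0, 281.5], "hailcast_dhail": [0.0, 1.2500000000000002]}`, the call `differing_fields(continuous, restarted)` reports only `hailcast_dhail`, even though the two values differ in the last bit alone.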
@MatthewPyle-NOAA @DusanJovic-NOAA to fix saSAS related restart reproducibility of rrfsv1, I just submitted a fv3atm PR to this branch: The changes are only in ccpp/physics though. |
@JiliDong-NOAA I merged your PR into my rrfs_v1_conus13km_tests branch. Is there a regression test in the develop branch that uses the saSAS scheme and can be used to test restart functionality? Will this change need to be made in the develop branch as well? |
Thanks @DusanJovic-NOAA! There should be a number of global regression tests using saSAS in the develop branch. But the restart reproducibility issue in rrfsv1 only happens when sigmab_coldstart is turned on, which I don't believe any regression test does. @MatthewPyle-NOAA had suggested adding a regression test with a configuration similar to rrfsv1, which I think is a great idea. As for whether to make the same changes in the develop branch, the logic there looks fine as long as sigmab_coldstart is not enabled:
|
Description
We have reason to believe that the RRFS system isn't bit-reproducible with restarts compared to a continuous integration (an uninterrupted forecast). We wanted to look at the regression tests for confirmation within a simpler framework, and it looks like the restart problem is a known issue, as the conus13km_restart test is purposely not run.
To Reproduce:
Numerous tests within tests/rt.conf in the regional "conus13km" realm are commented out under "# Expected to fail:" headings.
Additional context
Ideally, these conus13km tests would use the RRFS_sas physics being used for RRFS rather than the FV3_HRRR suite.
Output