Skip to content

Bug fix for surface roughness calculation over lakes and sea ice points when wave model is turned on #2739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

ChristianBoyer-NOAA
Copy link

@ChristianBoyer-NOAA ChristianBoyer-NOAA commented May 15, 2025

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR and its related PRs fix a bug so that the wave model only updates surface roughness over open ocean, and not over lakes and sea ice. It also avoids using unrealistic surface roughness values from the wave model, such as values greater than 0.1 m.

Note: A baseline was created and can be reproduced.

Commit Message:

* UFSWM - Updates calculation of surface roughness over lake/sea ice points when wave model is turned on.
  * FV3 - Updates calculation of surface roughness over lake/sea ice points when wave model is turned on.
    * ccpp-physics - Updates calculation of surface roughness over lake/sea ice points when wave model is turned on.

Priority:

  • Critical Bugfix: Corrects wrong surface roughness calculation over lakes and sea ice.

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Documentation:

  • No documentation update is required for this PR (please explain). Bug fix.

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR updates/changes baselines.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

Christian Boyer and others added 3 commits May 13, 2025 08:52
@DeniseWorthen
Copy link
Collaborator

@ChristianBoyer-NOAA I'm still puzzled about whether the issue is actually a field which is mapped from the WAV model to the lake or sea ice or whether it is a logic/coding bug where some sort of background or fill value is applied on lake/sea ice when the WAV model is present but not providing a value.

The land mask sent to the mediator contains a 1/0 land mask as determined by the mapped ocean mask

! use land sea mask: land:1, ocean:0
        lsmask(i,j) = floor(one + epsln - GFS_sfcprop%oceanfrac(im))

When mapping WAV->ATM, the dstMaskValue is 1, meaning that no ATM points where land mask==1 is mapped to. I really can't see how anything from the WAV is getting mapped to lake fractions. Do you have a run directory somewhere which shows the issue? I have no C1152 sandbox available to me.

@JessicaMeixner-NOAA
Copy link
Collaborator

@DeniseWorthen - C384 should also show this issue as that's the resolution @lisa-bengtsson and @egrell found it.

One thought that it could be less about the wave model having values there and more about the if statements in the sfc_diff routine.

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented May 16, 2025

@JessicaMeixner-NOAA Yes, thanks. That is really my question too. Because if the values are actually coming from the WAV model (somehow) then that needs investigation.

@JessicaMeixner-NOAA
Copy link
Collaborator

@JessicaMeixner-NOAA Yes, thanks. That is really my question too. Because if the values are actually coming from the WAV model (somehow) then that needs investigation.

Agreed.

@ChristianBoyer-NOAA
Copy link
Author

@DeniseWorthen The unrealistic values of surface roughness from the wave model (z0rl_wav, first image) mentioned in the PR was only occurring over sea ice at FH000 (see circled area in images). The new structure of the if statements added by @JongilHan66 in the sfc_diff routine updates surface roughness using z0rl_wav over open ocean only, and fixes the issue in the circled area where z0rl_wav was unintentionally updating the surface roughness over the sea ice as well. The 2nd figure is surface roughness over sea ice before the bug fix and 3rd figure is the surface roughness over sea ice after the bug fix at FH000 to show the result of the fix over sea ice.

z0rl_wav

z0_over_seaice_prebugfix

z0_over_seaice_update

@grantfirl
Copy link
Collaborator

@ChristianBoyer-NOAA Please update the PR description to note that this work DOES change RT baselines (as evidenced by the test_changes.list file)

@ChristianBoyer-NOAA
Copy link
Author

@ChristianBoyer-NOAA Please update the PR description to note that this work DOES change RT baselines (as evidenced by the test_changes.list file)

PR description is updated. Thanks.

@DeniseWorthen
Copy link
Collaborator

When the WAV model initializes, it initializes the Sw_z0 field to a uniform value of 9.688E-14 everywhere. It does not update that value and send it to ATM until the first ModelAdvance completes; regardless of whether WAV is in the slow or fast loop, no value from WAV would be in the FH000 file. It seems obvious that ATM has some value at FH000 for Z0, but I really don't think it is from the WAV model. I think you have legitimately found a bug in sfc_diff and that is responsible for the issue you are seeing.

@DeniseWorthen
Copy link
Collaborator

@ChristianBoyer-NOAA Just to confirm, this also fixes the issue where you were seeing incorrect Z0 values over lakes?

@ChristianBoyer-NOAA
Copy link
Author

@ChristianBoyer-NOAA Just to confirm, this also fixes the issue where you were seeing incorrect Z0 values over lakes?

@DeniseWorthen Yes, this PR also fixes the issue for z0 over the lakes.

@junwang-noaa
Copy link
Collaborator

junwang-noaa commented May 19, 2025

When the WAV model initializes, it initializes the Sw_z0 field to a uniform value of 9.688E-14 everywhere. It does not update that value and send it to ATM until the first ModelAdvance completes; regardless of whether WAV is in the slow or fast loop, no value from WAV would be in the FH000 file. It seems obvious that ATM has some value at FH000 for Z0, but I really don't think it is from the WAV model. I think you have legitimately found a bug in sfc_diff and that is responsible for the issue you are seeing.

Is this issue only showing up in cold start runs only? If the restart run works fine, then that can confirm that the large value is not from wave.

@JessicaMeixner-NOAA
Copy link
Collaborator

I started to test some of this in cycled, but never quite finished that testing to look at similar questions you are asking @junwang-noaa . We could probably use the cpld_control_gefs to run one extra restart tests but with no Wave -> ATm coupling and look at those diffs instead of trying to set up a full cycled test like I was trying to do before.

@ChristianBoyer-NOAA - If I showed you what to run/change would you have time to do this test?

@ChristianBoyer-NOAA
Copy link
Author

I started to test some of this in cycled, but never quite finished that testing to look at similar questions you are asking @junwang-noaa . We could probably use the cpld_control_gefs to run one extra restart tests but with no Wave -> ATm coupling and look at those diffs instead of trying to set up a full cycled test like I was trying to do before.

@ChristianBoyer-NOAA - If I showed you what to run/change would you have time to do this test?

@JessicaMeixner-NOAA Yes, I'd be able to run a test to check in cycled runs if you showed me what to run and change.

Sorry for the late response. I was on leave yesterday afternoon.

@ChristianBoyer-NOAA
Copy link
Author

I completed the test Jessica suggested and have figures in the slides here with the output directories on Hercules on slide 1.

These tests ran the cpld_control_gefs RTs using my branch with the fix submitted in this PR and the develop branch. The restart runs used the original cpld_restart_gefs and a second cpld_restart_gefs which turned off the Wave -> ATM feedback by setting CPLWAV2ATM=.false. The file for this additional restart test can be found at /work/noaa/global/cboyer/PRs/fresh_lakefrac_ice_branch_clone/ufs-weather-model/tests/tests/cpld_restart_gefs2.

Below are 2 figures (also on slide 5) showing the difference of surface roughness between the fully coupled restart and the restart without Wave->ATM feedback at FH=10 h, the first output of the restart. The fix branch is on top and the develop branch is on the bottom for the globe and zoomed in on the Great Lakes. The fix branch shows that there is negligible difference between the surface roughness over lake surfaces of the fully coupled and when Wave->ATM feedback is off compared to the develop branch, which shows much larger differences. FH11 and FH12 are on slides 6 and 7, respectively, and show similar results of fixing the issue over the lakes.
download
download

Additional figures included in the slides include the difference of surface roughness between the fix branch and develop of fully coupled (top) and Wave->ATM feedback turned off (bot) on slides 2-4. Slides 8-20 show the differences in surface roughness between the fix branch and develop branch of the fully coupled control RT (cpld_control_gefs) at all forecast hours of the test.

@JessicaMeixner-NOAA
Copy link
Collaborator

@DeniseWorthen @junwang-noaa - Are there other additional questions on this PR that you have or would like to see answered?

@DeniseWorthen
Copy link
Collaborator

@ChristianBoyer-NOAA Thanks for the additional tests. I'm satisfied that this is a bug in FV3/ccpp.

@junwang-noaa
Copy link
Collaborator

I left a comment on the slides last week. I am not sure if the fixed branch has completely resolved the issue (should we expect zero values instead of the negligible differences over lakes?).

@JongilHan66
Copy link
Collaborator

@junwang-noaa At FH=0, we expect zero difference over lakes, but after the initial time, there would be some differences due to air flow differences at the model first layer over lakes.

@junwang-noaa
Copy link
Collaborator

@JongilHan66 Thanks for confirming that the issue is resolved in the fix branch.

@jkbk2004
Copy link
Collaborator

@ChristianBoyer-NOAA @JongilHan66 @DeniseWorthen @JessicaMeixner-NOAA @junwang-noaa It sounds like this pr is ready for the schedule to commit.

@jkbk2004
Copy link
Collaborator

@ChristianBoyer-NOAA can you sync up branch? if everything is ok, we like to start working on this.

@ChristianBoyer-NOAA
Copy link
Author

@ChristianBoyer-NOAA can you sync up branch? if everything is ok, we like to start working on this.

@jkbk2004 working on it now.

@grantfirl
Copy link
Collaborator

@jkbk2004 I've been asked if we can combine ufs-community/ccpp-physics#280 into this. It's documentation only and the HSD release is supposed to include this.

@grantfirl
Copy link
Collaborator

@jkbk2004 Basically, I'd like to combine that documentation update with whatever CCPP PR will go in next.

@ChristianBoyer-NOAA
Copy link
Author

@jkbk2004 I've been asked if we can combine ufs-community/ccpp-physics#280 into this. It's documentation only and the HSD release is supposed to include this.

If you would like me to include this in this PR, let me know.

@jkbk2004
Copy link
Collaborator

@jkbk2004 I've been asked if we can combine ufs-community/ccpp-physics#280 into this. It's documentation only and the HSD release is supposed to include this.

If you would like me to include this in this PR, let me know.

@ChristianBoyer-NOAA sure! go ahead to combine.

@ChristianBoyer-NOAA
Copy link
Author

@jkbk2004 ufs-community/ccpp-physics#280 should be added to my branch now

@jkbk2004 jkbk2004 added Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. jenkins-ort run ORT testing In Testing The PR that is currently in testing stages labels May 28, 2025
@FernandoAndrade-NOAA FernandoAndrade-NOAA removed the jenkins-ort run ORT testing label May 29, 2025
@jkbk2004
Copy link
Collaborator

We need to skip tests on orion and hercules: file systems are not stable after the maintenance on 05-28.

@jkbk2004
Copy link
Collaborator

Actually, jobs on orion slowly runs. Started seeing crashing case with cpld_control_c192_p8_intel.

@jkbk2004
Copy link
Collaborator

and the following cases with timeout issue on orion

cpld_control_gefs_intel
cpld_control_gfsv17_iau_intel
hafs_regional_atm_wav_intel

@FernandoAndrade-NOAA can you double check on hercules if the machine is picking up jobs?

@DeniseWorthen
Copy link
Collaborator

So is cpld_control_c192_p8_intel crashing or not? And where...platform, stderr information....?

@jkbk2004
Copy link
Collaborator

So is cpld_control_c192_p8_intel crashing or not? And where...platform, stderr information....?

crashing with MPI_Abort on orion at /work/noaa/stmp/jongkim/stmp/jongkim/FV3_RT/rt_725883/cpld_control_c192_p8_intel. I am trying to turn on debug to see if more information can be pulled out.

@jkbk2004
Copy link
Collaborator

@FernandoAndrade-NOAA Note that file system wasn't stable on orion and hercules.

@DeniseWorthen
Copy link
Collaborator

I don't know what you mean by "turning on debug" ...just copy the s2swa_debug executable from cpld_debug_p8 test.

And you have locked down your stmp directories, so nobody else can help.

@jkbk2004
Copy link
Collaborator

it looks file system issue continues on orion and hercules.

@FernandoAndrade-NOAA
Copy link
Collaborator

FernandoAndrade-NOAA commented May 30, 2025

@FernandoAndrade-NOAA Note that file system wasn't stable on orion and hercules.

FYI, I could not get a clone to start under work/, work2/ seemed fine on Hercules.

@ChristianBoyer-NOAA
Copy link
Author

Before the other PR was merged into mine and before to the maintenance period/file system issue, I ran the regression tests successfully on Hercules without errors.

@jkbk2004
Copy link
Collaborator

With -DDEBUG=ON, cpld_control_c192_p8_intel keeps running around n_atmsteps =129. Also, develop branch runs ok on orion for the cpld_control_c192_p8_intel case. It looks a bug is popping up with the pr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated. In Testing The PR that is currently in testing stages Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wave model impacting roughness length calculation over lakes and sea ice.
10 participants