Improvement of delayed allocation mechanism #105

Open
wants to merge 50 commits into
base: main
50 commits
ed7f067
Add lbounds and ubounds to wrapper
dareg Jun 6, 2025
e2bd04b
Always use dim to allocate on device
dareg Jun 6, 2025
62b43a3
Don't create host data when calling get_device_data
dareg Jun 10, 2025
50564d1
set status correctly
dareg Jun 10, 2025
9cf8191
update get_host for when the host ptr was not allocated but the gpu o…
dareg Jun 11, 2025
df7a8f5
force delayed allocation for owner
dareg Jun 11, 2025
5968c54
Rework the status system to let the host and dev ptr be allocated ind…
dareg Jun 16, 2025
3b2116b
Fix broken status modification
dareg Jun 16, 2025
82cdb3f
Fix broken get_view
dareg Jun 16, 2025
9c374a3
Fix transfer CPU<->GPU conditions
dareg Jun 16, 2025
bda3c26
Fix get_view* test-cases
dareg Jun 16, 2025
e3dc484
make sure data are allocated on cpu for get_view
dareg Jun 18, 2025
7c7ebfb
delayed default value can now be controlled through a global
dareg Jun 18, 2025
184b500
cleaning
dareg Jun 19, 2025
5f33c3b
cleaning
dareg Jun 19, 2025
216601f
cleaning
dareg Jun 19, 2025
bbc07a2
cleaning
dareg Jun 19, 2025
7301193
Add missing add_status and remove_status to field gang objects
dareg Jun 23, 2025
09753cf
Fix test-cases
dareg Jun 24, 2025
c352310
Add more tests
dareg Jun 24, 2025
4ec4760
Correct initialization when init_value is set
dareg Jun 24, 2025
5dcd5b5
cleaning
dareg Jun 24, 2025
28e69a9
Use correct negation function
dareg Jun 25, 2025
8d8f013
Use correct argument name
dareg Jun 25, 2025
2155321
There must be no return statement in main
dareg Jun 25, 2025
40b82f8
cleaning
dareg Jun 25, 2025
03a0246
Use delayed by default
dareg Jun 25, 2025
3e757b0
Let the default value of delayed be user-configurable
dareg Jun 26, 2025
5633c1d
Fix test-case
dareg Jun 26, 2025
0e6b0c8
Alloc in get view when necessary
dareg Jul 7, 2025
e44f732
Reset GPU context
dareg Jul 28, 2025
250beb2
Cleaning
dareg Jul 29, 2025
a4aa7c6
Test get view when not using GPU number 0
dareg Jul 29, 2025
73b7cc6
Recreate context for the current GPU, not necessary the 0th one
dareg Jul 29, 2025
64131a6
Fix test-case when not using openacc
dareg Jul 29, 2025
afb0b9f
Don't use delayed by default
dareg Jul 29, 2025
4224cfa
Update doc
dareg Jul 29, 2025
613e211
Merge branch 'new_delayed' into new_delayed_prepare_pr
dareg Jul 29, 2025
5df1113
Remove now useless part of test-case
dareg Jul 30, 2025
b85d5bc
Cleaning
dareg Jul 30, 2025
74d5b3e
Merge branch 'new_delayed' into new_delayed_prepare_pr
dareg Jul 30, 2025
2d99770
Make it work with OpenMP GPU offloading
dareg Jul 31, 2025
195b7f2
Update NVHPC compiler version used in CI
dareg Jul 31, 2025
feb48a8
Update doc
dareg Aug 1, 2025
648e7bf
Minor doc fix
dareg Aug 4, 2025
1fb9d2b
Merge branch 'main' into new_delayed_prepare_pr
awnawab Aug 4, 2025
1b0209a
Remove now useless UNALLOCATED constant
dareg Aug 6, 2025
d1e2922
Remove now unused DEV_ALLOCATE_HST subroutine
dareg Aug 6, 2025
b5a14ab
Add an option to change get_view behaviour
dareg Aug 6, 2025
d0f9d01
Add a compile option to switch behaviour of get_view
dareg Aug 6, 2025
6 changes: 3 additions & 3 deletions .github/workflows/build.yml
@@ -62,9 +62,9 @@ jobs:
 - name: linux nvhpc-24.5
 os: ubuntu-22.04
 compiler: nvhpc-24.5
-compiler_cc: mpicc
-compiler_cxx: mpic++
-compiler_fc: mpifort
+compiler_cc: nvc
+compiler_cxx: nvc++
+compiler_fc: nvfortran
 python-version: '3.8'
 caching: true

6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -82,6 +82,12 @@ ecbuild_add_option( FEATURE FIELD_GANG
DEFAULT ON
)

##Get_view abort
ecbuild_add_option( FEATURE GET_VIEW_ABORT
DESCRIPTION "Enable this option to make get_view abort when data are not on CPU"
DEFAULT ON
)


## fypp preprocessor flags
if(HAVE_BUDDY_MALLOC)
11 changes: 3 additions & 8 deletions Readme.md
@@ -50,6 +50,7 @@ Features of FIELD_API can be toggled by passing the following argument to the CM
| DOUBLE_PRECISION | ON | Enable the compilation of field_api in double precision |
| CUDA | OFF | Enable the use of CUDA for GPU offload. Disables the use of the buddy memory allocator, removes the shadow host allocation for `FIELD%DEVPTR` and allocates owned fields (see below) in pinned (page-locked) host memory.|
| FIELD_GANG | ON | Enable packed storage of groups of fields. This feature is not supported for the Cray compiler as it cannot resolve the underlying polymorphism.|
+| GET_VIEW_ABORT | ON | If activated, get_view will abort when the data are not present on CPU. |

## Supported compilers
The library has been tested with the nvhpc toolkit from Nvidia, version 23.9/24.5
@@ -155,7 +156,8 @@ would then happen only if the data would be requested at some point, later in
the program. It can be useful if one doesn't want to waste memory on data that
might be only conditionally used. But please keep in mind, that allocating data
can be slow and will slow down the program if done during a computation heavy
-part of the code.
+part of the code. The default value for the delayed option is false, but it can
+be switched by setting delayed\_default\_value to true.

```
SUBROUTINE SUB(MYTEST)
@@ -280,13 +282,6 @@ write(*,*)"Total/Avg Time spend on transfer CPU->GPU", NUM_CPU_GPU_TR, "/" AVG,
...
```

-## Note on GET\_VIEW
-
-GET\_VIEW must only be called in sections of code running on the host. The
-field's data must be present on the host. It will not work if the data are on
-the device or if the field has not been allocated yet (when using the DELAY
-option).
-
## Cloning fields with FIELD\_CLONE\_ON_

The subroutines FIELD_CLONE_ON_HOST and FIELD_CLONE_ON_DEVICE let a field be
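
The Readme change above says that delayed allocation is off by default and that the default can be switched through `delayed_default_value`. The following is a minimal Python model of that behaviour, not the FIELD_API implementation (which is Fortran). The names `LazyField`, `DELAYED_DEFAULT_VALUE`, and `get_host_data` are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the library-wide default mentioned in the Readme.
DELAYED_DEFAULT_VALUE = False


class LazyField:
    """Toy field whose host buffer may be allocated on first access."""

    def __init__(self, shape: Tuple[int, ...], delayed: Optional[bool] = None):
        self.shape = shape
        # Fall back to the global default when the caller does not choose.
        self.delayed = DELAYED_DEFAULT_VALUE if delayed is None else delayed
        self.host: Optional[List[float]] = None
        if not self.delayed:
            self._allocate_host()

    def _allocate_host(self) -> None:
        size = 1
        for extent in self.shape:
            size *= extent
        self.host = [0.0] * size

    def get_host_data(self) -> List[float]:
        # A delayed field pays the allocation cost here, on first request.
        if self.host is None:
            self._allocate_host()
        return self.host


eager = LazyField((4, 3))
lazy = LazyField((4, 3), delayed=True)
eager_allocated_at_creation = eager.host is not None
lazy_allocated_at_creation = lazy.host is not None
lazy_size_after_access = len(lazy.get_host_data())
```

In this model the cost of allocation moves from construction to the first data request, which is why the Readme warns against triggering it inside a computation-heavy section.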
10 changes: 10 additions & 0 deletions python_utils/offload_backends/nvhpc/openacc.py
@@ -265,3 +265,13 @@ def end_data_deviceptr(cls):


return "!$acc end data"

@classmethod
def reinit_gpu_context(cls):
"""
Used to force reinitialization of GPU device.
Useful when not calling the GPU transfer function from the OpenMP master thread.
To use it you must have called the method *runtime_api_import* before
"""

return "CALL ACC_SET_DEVICE_NUM(ACC_GET_DEVICE_NUM(ACC_DEVICE_NVIDIA), ACC_DEVICE_NVIDIA)"
10 changes: 10 additions & 0 deletions python_utils/offload_backends/nvhpc/openmp.py
@@ -271,3 +271,13 @@ def update_host(cls, data):
"""

return f"!$omp target update from ({','.join(data)})"

@classmethod
def reinit_gpu_context(cls):
"""
Used to force reinitialization of GPU device.
Review comment (Collaborator): Really appreciate the detailed documentation 🙏

Useful when not calling the GPU transfer function from the OpenMP master thread.
To use it you must have called the method *runtime_api_import* before
"""

return "CALL OMP_SET_DEFAULT_DEVICE(OMP_GET_DEFAULT_DEVICE())"
12 changes: 12 additions & 0 deletions python_utils/offload_macros.py
@@ -582,3 +582,15 @@ def memcpy_2D_intf(indent=0):
method = _get_method(backend, 'memcpy_2D_intf')

return _format_lines(method(), indent=indent)

def reinit_gpu_context(indent=0):
"""
Used to force reinitialization of GPU device.
Useful when not calling the GPU transfer function from the OpenMP master thread.
To use it you must have called the method *runtime_api_import* before
"""

backend = _get_offload_backend()
method = _get_method(backend, 'reinit_gpu_context')

return _format_lines(method(), indent=indent)
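
The three `reinit_gpu_context` additions follow the same pattern: a backend class method returns a Fortran statement, and the macro in `offload_macros.py` looks it up and formats it. Below is a self-contained sketch of that dispatch. The backend strings are copied from the diff; `OpenACCBackend`, `OpenMPBackend`, `get_method`, and `format_lines` are hypothetical stand-ins, since the real `_get_offload_backend`, `_get_method`, and `_format_lines` are not shown here and may behave differently.

```python
class OpenACCBackend:
    """Hypothetical stand-in for the NVHPC OpenACC backend class."""

    @classmethod
    def reinit_gpu_context(cls):
        return ("CALL ACC_SET_DEVICE_NUM(ACC_GET_DEVICE_NUM(ACC_DEVICE_NVIDIA), "
                "ACC_DEVICE_NVIDIA)")


class OpenMPBackend:
    """Hypothetical stand-in for the NVHPC OpenMP backend class."""

    @classmethod
    def reinit_gpu_context(cls):
        return "CALL OMP_SET_DEFAULT_DEVICE(OMP_GET_DEFAULT_DEVICE())"


def get_method(backend, name):
    # Assumed behaviour: look the generator up by name on the backend class.
    method = getattr(backend, name, None)
    if method is None:
        raise NotImplementedError(f"{backend.__name__} has no method {name}")
    return method


def format_lines(lines, indent=0):
    # Assumed behaviour: accept one string or a list, and indent every line.
    if isinstance(lines, str):
        lines = [lines]
    return "\n".join(" " * indent + line for line in lines)


def reinit_gpu_context(backend, indent=0):
    """Return the backend-specific statement that re-selects the current GPU."""
    method = get_method(backend, "reinit_gpu_context")
    return format_lines(method(), indent=indent)


acc_line = reinit_gpu_context(OpenACCBackend, indent=2)
omp_line = reinit_gpu_context(OpenMPBackend)
```

Both generated statements re-select the device that is already current, which matches the commit message "Recreate context for the current GPU, not necessary the 0th one".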
30 changes: 30 additions & 0 deletions src/buffer/field_RANKSUFF_gang_module.fypp
@@ -56,6 +56,8 @@ CONTAINS
PROCEDURE :: CREATE_DEVICE_DATA => ${ftn}$_GANG_${type}$_CREATE_DEVICE_DATA
PROCEDURE :: DELETE_DEVICE_DATA => ${ftn}$_GANG_${type}$_DELETE_DEVICE_DATA
PROCEDURE :: SET_STATUS => ${ftn}$_GANG_${type}$_SET_STATUS
PROCEDURE :: ADD_STATUS => ${ftn}$_GANG_${type}$_ADD_STATUS
PROCEDURE :: REMOVE_STATUS => ${ftn}$_GANG_${type}$_REMOVE_STATUS
END TYPE ${ftn}$_GANG_${type}$

PUBLIC :: ${ftn}$_GANG_${type}$
@@ -277,6 +279,34 @@ CONTAINS

END SUBROUTINE

SUBROUTINE ${ftn}$_GANG_${type}$_ADD_STATUS (SELF, KSTATUS)
CLASS(${ftn}$_GANG_${type}$) :: SELF
INTEGER (KIND=JPIM), INTENT (IN) :: KSTATUS

INTEGER (KIND=JPIM) :: JFLD

CALL SELF%${ftn}$_${type}$%ADD_STATUS (KSTATUS)

DO JFLD = 1, SIZE (SELF%CHILDREN)
CALL SELF%CHILDREN(JFLD)%PTR%ADD_STATUS (KSTATUS)
ENDDO

END SUBROUTINE

SUBROUTINE ${ftn}$_GANG_${type}$_REMOVE_STATUS (SELF, KSTATUS)
CLASS(${ftn}$_GANG_${type}$) :: SELF
INTEGER (KIND=JPIM), INTENT (IN) :: KSTATUS

INTEGER (KIND=JPIM) :: JFLD

CALL SELF%${ftn}$_${type}$%REMOVE_STATUS (KSTATUS)

DO JFLD = 1, SIZE (SELF%CHILDREN)
CALL SELF%CHILDREN(JFLD)%PTR%REMOVE_STATUS (KSTATUS)
ENDDO

END SUBROUTINE

#:endfor

#:endfor
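
The new `ADD_STATUS` and `REMOVE_STATUS` overrides first update the gang field itself and then forward the same call to every child. A small Python sketch of that propagation follows. It assumes the status is a bit mask, which the diff does not show; the flag names `HOST_ALLOCATED` and `DEVICE_ALLOCATED` and the classes `Field` and `GangField` are hypothetical.

```python
# Hypothetical bit flags; the real status constants are not part of this diff.
HOST_ALLOCATED = 1
DEVICE_ALLOCATED = 2


class Field:
    def __init__(self):
        self.status = 0

    def add_status(self, flag):
        # Set the bits in `flag`, leaving the others untouched.
        self.status |= flag

    def remove_status(self, flag):
        # Clear the bits in `flag`, leaving the others untouched.
        self.status &= ~flag


class GangField(Field):
    """Parent field that keeps its children's status in sync with its own."""

    def __init__(self, children):
        super().__init__()
        self.children = list(children)

    def add_status(self, flag):
        super().add_status(flag)
        for child in self.children:
            child.add_status(flag)

    def remove_status(self, flag):
        super().remove_status(flag)
        for child in self.children:
            child.remove_status(flag)


gang = GangField([Field(), Field()])
gang.add_status(HOST_ALLOCATED)
gang.add_status(DEVICE_ALLOCATED)
gang.remove_status(HOST_ALLOCATED)
statuses = [gang.status] + [child.status for child in gang.children]
```

Under this assumption, host and device flags can be set and cleared independently, in line with the commit "Rework the status system to let the host and dev ptr be allocated ind…".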
7 changes: 7 additions & 0 deletions src/core/CMakeLists.txt
@@ -52,6 +52,13 @@ field_api_add_object_library(
LIBRARIES field_api_debug
)

if(HAVE_GET_VIEW_ABORT)
target_compile_definitions( field_api_core PRIVATE "-DGET_VIEW_ABORT_DEFAULT_VALUE=.TRUE.")
else()
target_compile_definitions( field_api_core PRIVATE "-DGET_VIEW_ABORT_DEFAULT_VALUE=.FALSE.")
endif()


set_source_files_properties( ${CMAKE_CURRENT_BINARY_DIR}/dev_alloc_module.F90 PROPERTIES COMPILE_OPTIONS $<${HAVE_CUDA}:-cuda>)
set_source_files_properties( ${CMAKE_CURRENT_BINARY_DIR}/host_alloc_module.F90 PROPERTIES COMPILE_OPTIONS $<${HAVE_CUDA}:-cuda>)

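
The `GET_VIEW_ABORT` feature ends up as the compile definition `GET_VIEW_ABORT_DEFAULT_VALUE`. The sketch below models what that switch plausibly controls, based on the Readme row ("get_view will abort when the data are not present on CPU") and the commit "Alloc in get view when necessary". The fallback path when the switch is off is an assumption, and `ToyField` with its attributes is a hypothetical illustration, not the library's type.

```python
# Mirrors the CMake default (ON); in the library this is a compile definition.
GET_VIEW_ABORT_DEFAULT_VALUE = True


class ToyField:
    """Toy blocked field: `host` and `device` are lists of blocks or None."""

    def __init__(self, nblocks, block_size):
        self.nblocks = nblocks
        self.block_size = block_size
        self.host = None
        self.device = None

    def get_view(self, block, abort=GET_VIEW_ABORT_DEFAULT_VALUE):
        if self.host is None:
            if abort:
                raise RuntimeError("get_view: data are not present on CPU")
            if self.device is not None:
                # Assumed fallback: copy the device data back to the host.
                self.host = [list(blk) for blk in self.device]
            else:
                # Assumed fallback: allocate the host data on demand.
                self.host = [[0.0] * self.block_size
                             for _ in range(self.nblocks)]
        return self.host[block]


field = ToyField(nblocks=2, block_size=3)
field.device = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

try:
    field.get_view(1)
    aborted = False
except RuntimeError:
    aborted = True

view = field.get_view(1, abort=False)
```

Aborting by default makes an unexpected transfer or allocation inside a host loop visible immediately, while the permissive mode trades that safety for convenience.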
43 changes: 0 additions & 43 deletions src/core/dev_alloc_module.fypp
@@ -17,12 +17,6 @@ USE, INTRINSIC :: ISO_C_BINDING

IMPLICIT NONE

INTERFACE DEV_ALLOCATE_HST
#:for ft in fieldTypeList
MODULE PROCEDURE ${ft.name}$_DEV_ALLOCATE_HST
#:endfor
END INTERFACE

#:if defined('USE_BUDDY_MALLOC')
INTERFACE DEV_ALLOCATE_DIM
#:for ft in fieldTypeList
@@ -65,22 +59,6 @@ CONTAINS

#:if defined('USE_BUDDY_MALLOC') or defined('WITH_HIC')

SUBROUTINE ${ft.name}$_DEV_ALLOCATE_HST (DEV, HST, MAP_DEVPTR)

${ft.type}$, POINTER :: DEV(${ft.shape}$)
${ft.type}$, POINTER :: HST(${ft.shape}$)
LOGICAL, INTENT(IN) :: MAP_DEVPTR

INTEGER :: ILBOUNDS (${ft.rank}$)
INTEGER :: IUBOUNDS (${ft.rank}$)

ILBOUNDS = LBOUND (HST)
IUBOUNDS = UBOUND (HST)

CALL ${ft.name}$_DEV_ALLOCATE_DIM (DEV, UBOUNDS=IUBOUNDS, LBOUNDS=ILBOUNDS, MAP_DEVPTR=MAP_DEVPTR)

END SUBROUTINE ${ft.name}$_DEV_ALLOCATE_HST

SUBROUTINE ${ft.name}$_DEV_ALLOCATE_DIM (DEV, UBOUNDS, LBOUNDS, MAP_DEVPTR)

USE FIELD_STATISTICS_MODULE
@@ -167,27 +145,6 @@ END SUBROUTINE ${ft.name}$_DEV_DEALLOCATE

#:else

SUBROUTINE ${ft.name}$_DEV_ALLOCATE_HST (DEV, HST, MAP_DEVPTR)

USE FIELD_STATISTICS_MODULE

${ft.type}$, POINTER :: DEV(${ft.shape}$)
${ft.type}$, POINTER :: HST(${ft.shape}$)
LOGICAL, INTENT(IN) :: MAP_DEVPTR

#if __INTEL_COMPILER == 1800 && __INTEL_COMPILER_UPDATE == 5
! Bug with Intel 18.0.5.274
ALLOCATE (DEV (${ ', '.join (map (lambda i: 'LBOUND (HST, ' + str (i) + '):UBOUND (HST,' + str (i) + ')', range (1, ft.rank+1))) }$))
#else
ALLOCATE (DEV, MOLD=HST)
#endif

$:offload_macros.create(symbols=['DEV',])

IF (FIELD_STATISTICS_ENABLE) CALL FIELD_STATISTICS_DEVICE_ALLOCATE (SIZE (DEV, KIND=JPIB) * INT (KIND (DEV), KIND=JPIB))

END SUBROUTINE ${ft.name}$_DEV_ALLOCATE_HST

SUBROUTINE ${ft.name}$_DEV_DEALLOCATE (DEV, MAP_DEVPTR)

USE FIELD_STATISTICS_MODULE
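
The removed `DEV_ALLOCATE_HST` read its bounds from an existing host array (`LBOUND (HST)`, `UBOUND (HST)`) before calling `DEV_ALLOCATE_DIM`. With the commits "Add lbounds and ubounds to wrapper" and "Always use dim to allocate on device", the device side can be sized from stored bounds alone, so no host array has to exist first. A minimal Python sketch of that idea follows; `extents_from_bounds` and `device_allocate_dim` are hypothetical names and the dictionary is only a stand-in for a device buffer.

```python
def extents_from_bounds(lbounds, ubounds):
    """Per-dimension extents, using Fortran's rule max(ubound - lbound + 1, 0)."""
    if len(lbounds) != len(ubounds):
        raise ValueError("lbounds and ubounds must have the same rank")
    return tuple(max(ub - lb + 1, 0) for lb, ub in zip(lbounds, ubounds))


def device_allocate_dim(lbounds, ubounds):
    """Toy device allocation that needs only the bounds, not a host array."""
    shape = extents_from_bounds(lbounds, ubounds)
    size = 1
    for extent in shape:
        size *= extent
    return {
        "lbounds": tuple(lbounds),
        "ubounds": tuple(ubounds),
        "shape": shape,
        "buffer": [0.0] * size,  # stand-in for device memory
    }


dev = device_allocate_dim(lbounds=(1, 0, 1), ubounds=(4, 2, 5))
dev_shape = dev["shape"]
dev_size = len(dev["buffer"])
```

Keeping the bounds in the wrapper is what allows the host and device pointers to be allocated independently of each other.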