CUDA memory requirement scaling with system size #26
Hi All! I have been running strong and weak scaling tests using several simple Allegro potentials I fitted earlier (with lmax=2, only 1 Allegro layer, and very few parameters). Initially my LAMMPS runs complete fine for small system sizes; however, I noticed that the memory requirement of pair-allegro seems to scale linearly with the number of atoms per GPU, and in my case, once I put more than ~20,000 atoms on a GPU, all of my Allegro models fail with the following error message, indicating a CUDA out-of-memory issue:
However, as I have seen from Fig. 5 of this paper: https://www.nature.com/articles/s41467-023-36329-y, it seems that people have managed to put half a million atoms on a single GPU without problems. I was wondering what I am doing wrong here. Is there a trick I can use to reduce the memory requirement of the simulation? For reference, I am using LAMMPS (29 Sep 2021 - Update 2) compiled with KOKKOS acceleration for GPUs with CUDA support and the serial backend (no OpenMP multithreading). Some of the parameters for the Allegro model fitting are attached:
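As a rough back-of-the-envelope check, the OOM threshold implies a per-atom memory footprint you can compare against the paper's numbers. The sketch below assumes a hypothetical 40 GB GPU; substitute your card's actual VRAM:

```python
# Rough per-atom memory budget implied by the OOM threshold.
# VRAM_GB = 40 is a hypothetical value -- use your GPU's real capacity.
VRAM_GB = 40
atoms_at_oom = 20_000

# KB of GPU memory consumed per atom at the point where allocation fails
per_atom_kb = VRAM_GB * 1024**2 / atoms_at_oom
print(f"~{per_atom_kb:.0f} KB of GPU memory per atom")  # ~2097 KB
```

Fitting half a million atoms on one 80 GB GPU, as in the paper's Fig. 5, would correspond to a footprint on the order of only ~170 KB/atom, so the gap is worth investigating.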
Please let me know if additional information is needed for debugging this issue. Best,
Replies: 1 comment
Hi Yifan, Thanks for your interest in our code!
Yes, this is correct (see the scaling section of the original Allegro paper). How many atoms/GPU you can fit is obviously a function of how much memory your particular GPUs have available; the experiments in our papers were run on NVIDIA A100s with 80 GB of VRAM. You can also try to use a smaller model, although your model is already quite small. Another critical parameter for memory use is the number of neighbors (i.e. the system's neighbor density, which is a function of the cutoff).
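The cutoff dependence above is strong because the average neighbor count grows with the cube of the cutoff radius. A minimal sketch of that estimate, assuming a hypothetical number density of 0.1 atoms/Å³ (roughly bulk-water-like; substitute your system's value):

```python
import math

def neighbors_per_atom(cutoff_angstrom: float, density_per_A3: float) -> float:
    """Estimate the average neighbor count as the number of atoms
    inside a sphere of radius `cutoff_angstrom` at the given number
    density (atoms per cubic Angstrom)."""
    return density_per_A3 * (4.0 / 3.0) * math.pi * cutoff_angstrom ** 3

# Hypothetical number density -- replace with your system's actual value.
rho = 0.1  # atoms / A^3

n4 = neighbors_per_atom(4.0, rho)
n8 = neighbors_per_atom(8.0, rho)
# Doubling the cutoff multiplies the per-atom neighbor list (and hence
# pair-allegro's per-atom memory) by roughly 2^3 = 8.
print(f"{n4:.0f} neighbors at 4 A, {n8:.0f} at 8 A -> {n8 / n4:.0f}x")
```

So a modest reduction in the model cutoff can buy a large reduction in memory per atom, at the cost of shorter-ranged interactions.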