refactoring branch such that ML model related stuff is out of the openpmd-streaming-continual-learning.py and put into a separate directory #182
base: dtensor
Conversation
```diff
@@ -5,5 +5,5 @@ module load gcc/12.2.0 cuda/12.1 openmpi/4.1.5-cuda121-gdr ucx/1.14.0-gdr \
 # openpmd/0.15.2-cuda121-blosc2-py3122
 # for (re-)instaling openpmd-api
 export openPMD_USE_MPI=ON
-source /home/kelling/checkout/insitumlNp2Torch26Env/bin/activate
+source /home/pandit52/venvs/Ism/bin/activate
```
No edit wars: do not commit local config changes (at least not into the PR).
```python
from inSituML.ks_models import INNModel

class ModelFinal(nn.Module):
```
Having a more generic concept of the model to be trained is great. Note, though, that this class remains a somewhat specific instance: it has an encoder, a decoder, and an inner model. Please rename it to something more descriptive, to reflect this.
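A minimal sketch of what such a rename could look like; the name `EncoderDecoderINN` and the constructor signature are illustrative assumptions, not part of the PR:

```python
import torch.nn as nn

# Illustrative rename only: the class name and signature are assumptions.
class EncoderDecoderINN(nn.Module):
    """An autoencoder wrapped around an invertible inner model."""

    def __init__(self, encoder, decoder, inner_model):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.inner_model = inner_model

    def forward(self, x):
        z = self.encoder(x)       # encode input into latent space
        y = self.inner_model(z)   # invertible mapping in latent space
        return self.decoder(y)    # decode back to the target space
```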
```python
from inSituML.encoder_decoder import Encoder
from inSituML.encoder_decoder import Conv3DDecoder
```
The Encoder and Decoder classes should be declared in model_config (not defined there in full, just imported from the model library).
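For illustration, model_config could expose the classes like this (a sketch; the attribute names are assumptions):

```python
# model_config.py -- sketch only; the attribute names are assumptions.
from inSituML.encoder_decoder import Encoder, Conv3DDecoder

# Declare (not define) the classes here; the factory reads these
# instead of hard-coding its own imports.
encoder_class = Encoder
decoder_class = Conv3DDecoder
```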
```python
ndim_tot=config["ndim_tot"],
ndim_x=config["ndim_x"],
ndim_y=config["ndim_y"],
ndim_z=config["ndim_z"],
loss_fit=fit,
loss_latent=MMD_multiscale,
loss_backward=MMD_multiscale,
lambd_predict=config["lambd_predict"],
lambd_latent=config["lambd_latent"],
lambd_rev=config["lambd_rev"],
zeros_noise_scale=config["zeros_noise_scale"],
y_noise_scale=config["y_noise_scale"],
hidden_size=config["hidden_size"],
activation=config["activation"],
num_coupling_layers=config["num_coupling_layers"],
```
These params should come from a dictionary in model_config. Most of them already do; maybe that dictionary could be renamed to something more descriptive than "config", e.g. "inner_model_config". The details can then also be left out here, by just passing **model_config.inner_model_config to the inner model ctor.
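A sketch of the suggested call site, assuming the dictionary in model_config is renamed to `inner_model_config`:

```python
# Sketch, assuming model_config exposes a dict named inner_model_config;
# the loss functions stay explicit because they are objects, not plain
# config values.
inner_model = INNModel(
    loss_fit=fit,
    loss_latent=MMD_multiscale,
    loss_backward=MMD_multiscale,
    **model_config.inner_model_config,  # ndim_*, lambd_*, noise scales, ...
)
```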
```python
optimizer = optim.Adam(
    [
        {
            "params": model.base_network.parameters(),
            "lr": lr * config["lrAEmult"],
        },
        {"params": model.inner_model.parameters()},
    ],  # model.parameters()
    lr=lr,
    betas=config["betas"],
    eps=config["eps"],
    weight_decay=config["weight_decay"],
)
if ("lr_annealingRate" not in config) or config[
    "lr_annealingRate"
] is None:
    scheduler = None
else:
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=500, gamma=config["lr_annealingRate"]
    )

return optimizer, scheduler, model
```
This is going beyond what should be in a model_factory module, because these are parameters of the training; it is OK for the load_objects functions, though. Rename the file.
The optimizer and LR scheduler should also be configurable in model_config, but with defaults available.
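One possible shape for this, as a sketch; the model_config attribute names and the default values below are assumptions, not the PR's actual API:

```python
import torch.optim as optim

# Sketch only: the model_config attribute names and defaults are assumptions.
def load_optimizer(model, model_config, lr):
    # Fall back to Adam with conventional defaults when nothing is configured.
    opt_cls = getattr(model_config, "optimizer_class", optim.Adam)
    opt_kwargs = getattr(
        model_config, "optimizer_kwargs",
        {"betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.0},
    )
    optimizer = opt_cls(model.parameters(), lr=lr, **opt_kwargs)

    # No scheduler unless one is configured.
    sched_cls = getattr(model_config, "scheduler_class", None)
    if sched_cls is None:
        scheduler = None
    else:
        sched_kwargs = getattr(
            model_config, "scheduler_kwargs",
            {"step_size": 500, "gamma": 0.95},
        )
        scheduler = sched_cls(optimizer, **sched_kwargs)
    return optimizer, scheduler
```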