Refer to `environments.yml` for the full list of required Python packages.
Key dependencies are based on the TopoModelX repository and include:
- `torch-sparse`
- `torch-scatter`
- `torch-cluster`
- `optuna`
- `optuna-integration`
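If you use conda, the environment can typically be created with `conda env create -f environments.yml` (assuming the file follows the standard conda environment format).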
To run the code, first configure the machine and data type. These settings are easily extendable to accommodate different use cases.
The `machine.py` file serves as the master configuration, governing the entire pipeline from preprocessing to training and testing.
Important configuration options include:
- `MACHINE`, `TYPE`, `SUBGRID`: define the machine setup and data variant.
- `BASE_DIR`, `DATA_DIR`, `RESULT_DIR`: set paths for the base, data, and output directories.
- `LABEL_FILES`, `CATALOG_SIZE`: specify label sources and dataset size.
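For orientation, here is a hypothetical minimal `machine.py`; every name is taken from the list above, but all values are placeholders, not the repository's defaults:

```python
# Hypothetical minimal machine.py; all values are placeholders.
MACHINE = "local"                    # which machine profile to use
TYPE = "halos"                       # data variant
SUBGRID = "fiducial"                 # subgrid/physics variant of the catalog

BASE_DIR = "/path/to/project"        # project root
DATA_DIR = f"{BASE_DIR}/data"        # input catalogs
RESULT_DIR = f"{BASE_DIR}/results"   # model outputs and logs

LABEL_FILES = ["labels.txt"]         # files providing target labels
CATALOG_SIZE = 1000                  # number of catalogs to load
```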
Next, modify `config_param.py` to set up priors for the cosmological and astrophysical parameters of the simulations.
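A hedged sketch of what such priors might look like; the parameter names and ranges below are illustrative, not the repository's actual configuration:

```python
# Illustrative config_param.py; parameter names and ranges are assumptions.
PRIORS = {
    # cosmological parameters: (lower bound, upper bound)
    "Omega_m": (0.1, 0.5),
    "sigma_8": (0.6, 1.0),
    # astrophysical (feedback) parameters
    "A_SN1": (0.25, 4.0),
    "A_AGN1": (0.25, 4.0),
}
```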
All preprocessing-related files are located in the `/preprocessing` directory. Run `generate_cc.py` to construct combinatorial complexes.
The complex includes:
- Rank 0 (Nodes): Individual galaxies/halos
- Rank 1 (Edges): Connections using the linking distance `r_link`
- Rank 2 (Tetrahedra): Created via Delaunay triangulation
- Rank 3 (Clusters): Groups of tetrahedra
- Rank 4 (Hyperedges): Minimum spanning tree (MST) of clusters
Adjust preprocessing settings, such as `r_link` and the cutoff for higher-order cells, in `config_preprocess.py`.
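For illustration, here is a sketch (not the repository's `generate_cc.py`) of how the first three ranks of such a complex could be assembled with SciPy and TopoNetX; `positions` and `r_link` are placeholder inputs:

```python
# Illustrative sketch of ranks 0-2 of the combinatorial complex.
import numpy as np
from scipy.spatial import Delaunay, cKDTree
from toponetx.classes import CombinatorialComplex

positions = np.random.rand(100, 3)  # placeholder galaxy/halo coordinates
r_link = 0.1                        # assumed linking distance

cc = CombinatorialComplex()

# Rank 0: one cell per galaxy/halo
for i in range(len(positions)):
    cc.add_cell([i], rank=0)

# Rank 1: connect pairs closer than r_link
for i, j in cKDTree(positions).query_pairs(r=r_link):
    cc.add_cell([i, j], rank=1)

# Rank 2: tetrahedra from a 3D Delaunay triangulation
for simplex in Delaunay(positions).simplices:
    cc.add_cell(simplex.tolist(), rank=2)
```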
The training setup supports PyTorch DDP (DistributedDataParallel), multi-node GPU execution, and is designed to be scalable. If you're using a SLURM-based queue system, submit training jobs with:
```bash
sbatch run.slurm
```
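For reference, DDP training as launched from such a script usually begins with an initialization step like the following; this is generic PyTorch usage (the environment variables are set by launchers such as `torchrun` or `srun`), not the repository's training code:

```python
# Minimal DDP initialization sketch; the model is a placeholder.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # reads rank/world size from env
local_rank = int(os.environ["LOCAL_RANK"])   # set by the launcher
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(8, 1).to(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])   # wrap for multi-GPU training
```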
There are two main training options:
- Fixed hyperparameters: set them in `config/config.py`, then run `main.py`.
- Hyperparameter tuning: set the search ranges in `config/hyperparameters.py`, then run `tune.py`.
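Since `optuna` and `optuna-integration` are among the dependencies, `tune.py` presumably drives an Optuna study. A minimal, self-contained sketch of such a loop; the objective, parameter names, and ranges are placeholders, not the repository's code:

```python
# Hypothetical Optuna tuning loop; all names and ranges are assumptions.
import optuna

def train_and_validate(lr, n_layers):
    # Stand-in for the real training loop; returns a fake validation loss.
    return (lr - 1e-3) ** 2 + 0.01 * n_layers

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)  # assumed range
    n_layers = trial.suggest_int("n_layers", 1, 4)        # stacked layers
    return train_and_validate(lr, n_layers)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```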
Our models are invariant under E(3) transformations. We support four model variants:
- GNN
- TetraTNN
- ClusterTNN
- TNN
These models differ in their selection of cells (e.g., nodes, edges, tetrahedra, clusters). They are stackable, and the number of stacked layers is treated as a tunable hyperparameter. Cell-cell invariant features (e.g., Euclidean or Hausdorff distances between arbitrary-rank cells) can be incorporated into the computations, as illustrated below.
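For intuition, here is an illustrative example of one such invariant feature; the function name is ours, not the repository's API. Because it is built purely from pairwise distances, it is unchanged under rotations, translations, and reflections, i.e., E(3)-invariant:

```python
# Illustrative E(3)-invariant cell-cell feature: the symmetric Hausdorff
# distance between two cells, each given as an (n, 3) array of positions.
import numpy as np
from scipy.spatial.distance import cdist

def hausdorff_distance(cell_a, cell_b):
    d = cdist(cell_a, cell_b)  # pairwise Euclidean distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

tetra = np.random.rand(4, 3)     # e.g., a rank-2 cell (tetrahedron vertices)
cluster = np.random.rand(20, 3)  # e.g., a rank-3 cell (group of tetrahedra)
feature = hausdorff_distance(tetra, cluster)
```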
All computations are implemented using `SparseTensor` operations for efficiency and scalability. The image below briefly demonstrates how higher-order message passing is conducted.
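As a complement to the figure, here is an illustrative single message-passing step with `torch_sparse`; the incidence pattern and features are toy values, not the repository's data:

```python
# Toy sparse message-passing step: push node features up to edges.
import torch
from torch_sparse import SparseTensor, matmul

# Incidence between 3 rank-1 cells (rows) and 4 rank-0 cells (columns)
row = torch.tensor([0, 0, 1, 1, 2, 2])
col = torch.tensor([0, 1, 1, 2, 2, 3])
incidence = SparseTensor(row=row, col=col, sparse_sizes=(3, 4))

x_nodes = torch.randn(4, 8)           # rank-0 (node) features
x_edges = matmul(incidence, x_nodes)  # aggregate node features onto edges
```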
We acknowledge the use of TopoModelX and TopoNetX for our higher-order network models and for the creation of combinatorial complexes. We also acknowledge the use and modification of CosmoGraphNet for building graphs.