Similar to classical numerics, regular grids are ideal for certain situations but sub-optimal for others. Diffusion models are no different, but luckily the concepts of the previous sections carry over when the regular grids are replaced with graphs. Importantly, denoising and flow matching work similarly well on unstructured Eulerian meshes, as will be demonstrated below. This test case also illustrates another important aspect: diffusion models excel at completing data distributions. Even when the training data contains only an incomplete distribution for a single example (defined by the geometry of the physical domain, the boundary conditions, and the physical parameters), the "global" view of learning across examples lets the networks complete the posterior distribution from the partial data of many different examples.
Simulation problems such as turbulent fluid flows are typically poorly represented by a single mean solution: for many practical applications, it is crucial to access the full distribution of possible flow states, from which relevant statistics (e.g., RMS values and two-point correlations) can be derived. This is where diffusion models can leverage their strengths: instead of having to simulate a lengthy transient phase to converge towards an equilibrium state, they can skip the transient warm-up entirely and directly produce samples of the equilibrium states. This allows the relevant flow statistics to be computed very efficiently compared to classic solvers.
In the following, we'll demonstrate these capabilities based on the _diffusion graph net_ (DGN) approach {cite}`lino2025dgn`, the full source code for which can be found here.
To learn the probability distribution of dynamical states of physical systems, defined by their discretization mesh and their physical parameters, the DDPM and flow matching frameworks can directly be applied to the mesh nodes. Additionally, DGN introduces a second model variant, which operates in a pre-trained semantic latent space rather than directly in the physical space (this variant will be called LDGN).
In contrast to relying on regular grid discretizations as in previous sections, the system's geometry is now represented by a mesh, i.e., a graph ${\mathcal{G}}$ whose nodes $\mathcal{V}$ carry the physical fields and whose edges $\mathcal{E}$ encode the mesh connectivity. Conditioning information such as boundary conditions and physical parameters is provided in the form of conditional node and edge features, ${V}_c$ and ${E}_c$.
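To make this representation concrete, here's a minimal sketch of how a triangular surface mesh could be converted into sender/receiver edge lists with simple geometric edge features, as is common for mesh-based GNNs. The function name and the choice of edge features (relative position and distance) are illustrative and not taken from the DGN implementation:

```python
import numpy as np

def mesh_to_graph(node_pos, triangles):
    """Turn a triangle mesh into a bidirectional edge list with geometric edge features.

    node_pos  : (N, 3) float array of node coordinates
    triangles : (M, 3) int array of node indices per triangle
    """
    # collect the three undirected edges of every triangle and remove duplicates
    edges = np.concatenate([triangles[:, [0, 1]],
                            triangles[:, [1, 2]],
                            triangles[:, [2, 0]]], axis=0)
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    edges = np.concatenate([edges, edges[:, ::-1]], axis=0)  # make edges bidirectional

    senders, receivers = edges[:, 0], edges[:, 1]
    rel_pos = node_pos[receivers] - node_pos[senders]         # (E, 3) relative positions
    dist = np.linalg.norm(rel_pos, axis=1, keepdims=True)     # (E, 1) edge lengths
    edge_feat = np.concatenate([rel_pos, dist], axis=1)       # (E, 4) edge features
    return senders, receivers, edge_feat
```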
---
height: 180px
name: probmodels-graph-over
---
(a) DGN learns the probability distribution of the systems' converged states provided only a short trajectory of length $\delta \ll T$ per system. (b) An example from the turbulent wing experiment. The distribution learned by the LDGN model accurately captures the variance of all states (bottom right), despite seeing only an incomplete distribution for each wing during training (top right).
In many engineering applications, such as aerodynamics and structural vibrations, the primary focus is not on each individual state along the trajectory, but rather on the statistics that characterize the system's dynamics. However, simulating a trajectory of converged states is expensive: a long transient phase has to be simulated first, and many additional converged states are needed before the statistics of interest settle.
Given a dataset of short trajectories from many different systems, the goal is to learn the full probability distribution of converged states for each system, even though every single training trajectory only covers a fraction of that distribution.
We'll use DDPM (and later flow matching) to generate states from this distribution, conditioned on the mesh and the physical parameters of each system.
In the diffusion (or forward) process, node features from a converged state ${Z}^0$ are progressively corrupted with Gaussian noise over $R$ diffusion steps according to a noise schedule $\beta_1, \dots, \beta_R$ (with $\alpha_r := 1 - \beta_r$ and $\bar{\alpha}_r := \prod_{s=1}^{r} \alpha_s$), yielding increasingly noisy states ${Z}^1, \dots, {Z}^R$. The learned reverse (or denoising) process then samples ${Z}^{r-1}$ from a Gaussian with mean ${\mu}_\theta^r$ and variance ${\Sigma}_\theta^r$, parametrized as
$$ \begin{aligned} {\mu}_\theta^r = \frac{1}{\sqrt{\alpha_r}} \left( {Z}^r - \frac{\beta_r}{\sqrt{1-\bar{\alpha}_r}} {\epsilon}_\theta^r \right), \qquad {\Sigma}_\theta^r = \exp\left( \mathbf{v}_\theta^r \log \beta_r + (1-\mathbf{v}_\theta^r)\log \tilde{\beta}_r \right), \end{aligned} $$
with $\tilde{\beta}_r := (1 - \bar{\alpha}_{r-1}) / (1 - \bar{\alpha}_r) \, \beta_r$. Here, ${\epsilon}_\theta^r \in \mathbb{R}^{|\mathcal{V}| \times F}$ predicts the noise added to the node features in the forward process, and $\mathbf{v}_\theta^r$ interpolates between the two variance bounds $\beta_r$ and $\tilde{\beta}_r$.
DGNs predict ${\epsilon}_\theta^r$ and $\mathbf{v}_\theta^r$ using a regular message-passing-based GNN {cite}`sanchez2020learning`. This takes the noisy node features, the graph ${\mathcal{G}}$, the conditional features ${V}_c$ and ${E}_c$, and the diffusion step $r$ as inputs:
$$ \begin{aligned} [{\epsilon}_\theta^r, \mathbf{v}_\theta^r] \leftarrow \text{DGN}_\theta({Z}^{r-1}, {\mathcal{G}}, {V}_c, {E}_c, r). \end{aligned} $$
The DGN network is trained using the hybrid loss function proposed in "Improved Denoising Diffusion Probabilistic Models" by Nichol and Dhariwal, which combines the noise-prediction objective with a variational bound for the learned variances. The full denoising process requires $R$ sequential evaluations of the network to turn Gaussian noise into a sample.
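To illustrate the reverse process, the following sketch runs the $R$ denoising steps for the node features of a single system. It assumes a linear noise schedule and a callable `dgn` returning the predictions ${\epsilon}_\theta^r$ and $\mathbf{v}_\theta^r$; both the interface and the schedule are assumptions made for this example, not the reference implementation:

```python
import torch

def sample_dgn(dgn, graph, v_cond, e_cond, num_nodes, feat_dim, R=1000, device="cpu"):
    """Generate one sample of node features by running the full reverse process.
    `dgn` is assumed to return (eps_pred, v_pred), each of shape (num_nodes, feat_dim)."""
    betas = torch.linspace(1e-4, 0.02, R, device=device)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)

    z = torch.randn(num_nodes, feat_dim, device=device)    # Z^R ~ N(0, I)
    for r in reversed(range(R)):
        eps, v = dgn(z, graph, v_cond, e_cond, r)           # network predictions at step r
        # posterior mean, cf. the equation for mu_theta^r above
        mean = (z - betas[r] / torch.sqrt(1.0 - alphas_bar[r]) * eps) / torch.sqrt(alphas[r])
        if r > 0:
            # interpolated log-variance between beta_r and tilde_beta_r (hybrid parametrization)
            beta_tilde = (1.0 - alphas_bar[r - 1]) / (1.0 - alphas_bar[r]) * betas[r]
            log_var = v * torch.log(betas[r]) + (1.0 - v) * torch.log(beta_tilde)
            z = mean + torch.exp(0.5 * log_var) * torch.randn_like(z)
        else:
            z = mean                                        # no noise is added at the last step
    return z                                                # denoised node features Z^0
```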
DGN follows the widely used encoder-processor-decoder GNN architecture. In addition to the node and edge encoders, the encoder includes a diffusion-step encoder, which generates a vector ${r}_\text{emb} \in \mathbb{R}^{F_\text{emb}}$ that embeds the diffusion step $r$:
$$ \begin{aligned} {r}_\text{emb} \leftarrow \phi \circ {\small \text{Linear}} \circ {\small \text{SinEmb}} (r), \quad {v}_i \leftarrow {\small \text{Linear}} \left( \left[ \phi \circ {\small \text{Linear}} ({v}_i^c) \ | \ {r}_\text{emb} \right] \right), \quad \forall i \in \mathcal{V}, \end{aligned} $$
where ${\small \text{SinEmb}}$ denotes a sinusoidal embedding of the diffusion step, $\phi$ is the activation function, $[\cdot \, | \, \cdot]$ denotes concatenation along the feature dimension, and ${v}_i^c$ are the conditional node features. Each message-passing layer of the processor then updates edge and node features according to
$$ \begin{aligned} \mathbf{e}_{ij} &\leftarrow W_e \mathbf{e}_{ij} + \text{MLP}^e \left( \text{LN} \left([\mathbf{e}_{ij}|\mathbf{v}_{i}|\mathbf{v}_{j}] \right) \right), \qquad \forall (i,j) \in \mathcal{E},\\ \bar{\mathbf{e}}_{j} &\leftarrow \sum_{i \in \mathcal{N}^-_j} \mathbf{e}_{ij}, \qquad \forall j \in \mathcal{V},\\ \mathbf{v}_j &\leftarrow W_v \mathbf{v}_j + \text{MLP}^v \left( \text{LN} \left( [\bar{\mathbf{e}}_{j} | \mathbf{v}_j]\right) \right), \qquad \forall j \in \mathcal{V}. \end{aligned} $$
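These updates translate directly into a few lines of PyTorch. Below is a sketch of a single (fine-scale) message-passing layer using sender/receiver index arrays; layer widths, the SELU activation, and the class name are illustrative assumptions rather than the exact DGN layer:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One edge/node update following the equations above (sizes are illustrative)."""
    def __init__(self, node_dim, edge_dim, hidden=128):
        super().__init__()
        self.w_e = nn.Linear(edge_dim, edge_dim, bias=False)
        self.w_v = nn.Linear(node_dim, node_dim, bias=False)
        self.mlp_e = nn.Sequential(nn.LayerNorm(edge_dim + 2 * node_dim),
                                   nn.Linear(edge_dim + 2 * node_dim, hidden), nn.SELU(),
                                   nn.Linear(hidden, edge_dim))
        self.mlp_v = nn.Sequential(nn.LayerNorm(edge_dim + node_dim),
                                   nn.Linear(edge_dim + node_dim, hidden), nn.SELU(),
                                   nn.Linear(hidden, node_dim))

    def forward(self, v, e, senders, receivers):
        # edge update: e_ij <- W_e e_ij + MLP^e(LN([e_ij | v_i | v_j]))
        e = self.w_e(e) + self.mlp_e(torch.cat([e, v[senders], v[receivers]], dim=-1))
        # aggregation: sum incoming edge features at each receiver node
        agg = torch.zeros(v.shape[0], e.shape[-1], device=v.device, dtype=v.dtype)
        agg.index_add_(0, receivers, e)
        # node update: v_j <- W_v v_j + MLP^v(LN([e_bar_j | v_j]))
        v = self.w_v(v) + self.mlp_v(torch.cat([agg, v], dim=-1))
        return v, e
```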
Previous work on graph-based diffusion models has used sequential message passing to propagate node features across the graph. However, this approach fails for large-scale phenomena, such as the flows studied in the context of DGN, as denoising of global features becomes bottlenecked by the limited reach of message passing.
To address this, a multi-scale GNN is adopted for the processor, applying message passing on ${\mathcal{G}}$ as well as on progressively coarser versions of it, arranged in a U-Net fashion.
---
height: 200px
name: probmodels-graph-pooling
---
Message passing is applied on ${\mathcal{G}}$ and multiple coarsened versions of it in a U-Net fashion. The lower-resolution graphs are obtained using a mesh coarsening algorithm popularised in CFD applications.
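To sketch the U-Net-style structure, the snippet below combines the `MessagePassingLayer` from the previous sketch with a single down/up level. It assumes that a mesh coarsening step has already produced a coarse edge list and a `cluster` array assigning each fine node to a coarse node; average pooling on the way down and a skip connection on the way up are simplifying assumptions, not the exact DGN processor:

```python
import torch
import torch.nn as nn

class MultiScaleProcessor(nn.Module):
    """U-Net-style processor sketch: message passing on the fine graph, on one
    coarsened graph, and again on the fine graph."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.mp_fine_in = MessagePassingLayer(node_dim, edge_dim)
        self.mp_coarse = MessagePassingLayer(node_dim, edge_dim)
        self.mp_fine_out = MessagePassingLayer(node_dim, edge_dim)

    def forward(self, v, e_fine, fine_edges, e_coarse, coarse_edges, cluster):
        v, e_fine = self.mp_fine_in(v, e_fine, *fine_edges)
        # down: average node features over their assigned clusters
        n_coarse = int(cluster.max()) + 1
        v_c = torch.zeros(n_coarse, v.shape[-1], device=v.device).index_add_(0, cluster, v)
        counts = torch.zeros(n_coarse, 1, device=v.device).index_add_(
            0, cluster, torch.ones(v.shape[0], 1, device=v.device))
        v_c = v_c / counts.clamp(min=1.0)
        v_c, e_coarse = self.mp_coarse(v_c, e_coarse, *coarse_edges)
        # up: distribute coarse features back to the fine nodes as a skip connection
        v = v + v_c[cluster]
        v, _ = self.mp_fine_out(v, e_fine, *fine_edges)
        return v
```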
Diffusion models can also operate in a lower-dimensional graph-based representation that is perceptually equivalent to the physical space. In this LDGN variant, a variational graph auto-encoder (VGAE) first compresses the system states into a latent graph with fewer nodes, and the diffusion model is then trained on this latent representation.
---
height: 220px
name: probmodels-graph-arch
---
(a) The VGAE consists of a condition encoder, a (node) encoder, and a (node) decoder. The multi-scale latent features from the condition encoder serve as conditioning inputs to both the encoder and the decoder. (b) During LDGN inference, Gaussian noise is sampled in the VGAE latent space and, after multiple denoising steps conditioned on the low-resolution outputs from the VGAE's condition encoder, transformed into the physical space by the VGAE's decoder.
In this configuration, the VGAE captures high-frequency information (e.g., spatial gradients and small vortices), while the LDGN focuses on modeling mid- to large-scale patterns (e.g., the wake and vortex street). By decoupling these two tasks, the generative learning process is simplified, allowing the LDGN to concentrate on more meaningful latent representations that are less sensitive to small-scale fluctuations. Additionally, during inference, the VGAE's decoder helps remove residual noise from the samples generated by the LDGN. This approach significantly reduces sampling costs, since the LDGN operates on a smaller graph rather than directly on the full mesh graph ${\mathcal{G}}$.
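Putting these pieces together, LDGN inference can be summarized in three steps, as sketched below. The snippet assumes three user-provided modules, a `cond_encoder`, an `ldgn_denoiser` that performs one complete reverse-diffusion update in the latent space (e.g., along the lines of the sampling sketch above), and the VGAE `decoder`; all interfaces are hypothetical:

```python
import torch

def sample_ldgn(cond_encoder, ldgn_denoiser, decoder, graph, v_cond, e_cond,
                n_latent_nodes, latent_dim, R=1000, device="cpu"):
    """LDGN inference sketch: sample noise in the latent space, denoise it conditioned
    on the condition encoder's features, and decode back to the physical space."""
    # 1) encode the conditioning information (mesh geometry, BCs, physical parameters)
    cond_feats = cond_encoder(graph, v_cond, e_cond)

    # 2) run the reverse diffusion process on the coarse latent graph
    zeta = torch.randn(n_latent_nodes, latent_dim, device=device)
    for r in reversed(range(R)):
        zeta = ldgn_denoiser(zeta, cond_feats, r)   # one full denoising step in latent space

    # 3) decode the latent sample into physical node features on the full mesh
    return decoder(zeta, cond_feats)
```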
For the VGAE, an encoder-decoder architecture is used with an additional condition encoder to handle the conditioning inputs. The condition encoder processes the conditional node and edge features, ${V}_c$ and ${E}_c$, through a sequence of message-passing and graph-pooling layers, producing conditioning features at each graph resolution.
The encoder produces two low-dimensional feature vectors per node of the coarsest graph: the state is passed through a sequence of message-passing and graph-pooling blocks, followed by a bottleneck block, and the resulting features are fed to a node-wise MLP that returns the mean and the variance parametrizing the Gaussian distribution of the latent node features.
The VGAE is trained to reconstruct the simulation states from these latent features, combining a reconstruction loss with a KL-divergence penalty that keeps the latent distribution close to a standard Gaussian prior.
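A minimal sketch of this training objective is shown below, assuming an encoder that returns per-node means and log-variances of the latent features; the module interfaces and the KL weight are illustrative:

```python
import torch
import torch.nn.functional as F

def vgae_loss(encoder, decoder, cond_encoder, graph, v_cond, e_cond, z_state, kl_weight=1e-4):
    """One evaluation of the VGAE objective: reconstruction plus KL regularization."""
    cond_feats = cond_encoder(graph, v_cond, e_cond)
    mu, log_var = encoder(z_state, graph, cond_feats)              # latent Gaussian parameters
    zeta = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)    # reparameterization trick
    z_rec = decoder(zeta, cond_feats)                              # reconstruction on the mesh

    rec_loss = F.mse_loss(z_rec, z_state)
    # KL divergence between N(mu, sigma^2) and the standard normal prior, averaged over nodes
    kl = 0.5 * torch.mean(mu.pow(2) + log_var.exp() - log_var - 1.0)
    return rec_loss + kl_weight * kl
```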
Unlike in conventional VGAEs, the condition encoder is necessary because, at inference time, an encoding of the physical state is not available: the LDGN generates latent samples directly from noise, and the decoder can only rely on the conditioning features to map them back to the physical space.
Let's directly turn to a complex case to illustrate the capabilities of DGN. (A more basic case will be studied in the Jupyter notebook on the following page.)
The _Wing_ experiments of the DGN project target wings in 3D turbulent flow, characterized by detailed vortices that form and dissipate on the wing surface. This task is particularly challenging due to the high-dimensional, chaotic nature of turbulence and its inherent interactions across a wide range of scales.
The geometry of the wings varies in terms of relative thickness, taper ratio, sweep angle, and twist angle.
These simulations are computationally expensive, and using GNNs allows us to concentrate the computational effort on the wing's surface, avoiding the need for costly volumetric fields. A regular grid around the wing would require a vastly larger number of cells to reach a comparable resolution of the wing surface.
A high accuracy for each sample does not necessarily imply that a model is learning the true distribution. In fact, these properties often conflict. For instance, in VGAEs, the KL-divergence penalty allows control over whether to prioritize sample quality or mode coverage.
To evaluate how well models capture the probability distribution of system states, we use the Wasserstein-2 distance. This metric can be computed in two ways: (i) by treating the distribution at each node independently and averaging the result across all nodes, or (ii) by considering the joint distribution across all nodes in the graph. We'll refer to these as the node-wise and graph-wise Wasserstein-2 distances, respectively.
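For the node-wise variant (i), the Wasserstein-2 distance reduces to a one-dimensional computation per node, which can be approximated from the quantile functions of the two sample sets. The sketch below assumes arrays of sampled states with one column per mesh node; shapes and the number of quantiles are illustrative:

```python
import numpy as np

def w2_nodewise(samples_true, samples_pred, n_quantiles=200):
    """Node-wise Wasserstein-2 distance, averaged over all mesh nodes (metric (i) above).
    samples_true: (S_true, N) array of reference states, samples_pred: (S_pred, N)."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    quant_true = np.quantile(samples_true, q, axis=0)   # (n_quantiles, N) quantile functions
    quant_pred = np.quantile(samples_pred, q, axis=0)
    # for 1D distributions, W2^2 is the integral of the squared quantile difference
    w2_per_node = np.sqrt(np.mean((quant_true - quant_pred) ** 2, axis=0))
    return w2_per_node.mean()
```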
To ensure stable results when computing these metrics, the target distribution is represented by 2,500 consecutive states, and the predicted one by 3,000 samples. While the trajectories in the training data are long enough to capture the mean flow, they fall short of capturing the standard deviation, spatial correlations, or higher-order statistics. Despite these challenges, the DGN, and especially the LDGN, are capable of accurately learning the complete probability distributions of the training trajectories, and of generating new distributions for both in- and out-of-distribution physical settings. The figure below shows a qualitative evaluation together with correlation measurements. Both DGN variants also fare much better than the Gaussian-mixture model baseline denoted as GM-GNN.
---
height: 220px
name: probmodels-graph-wing
---
(a) The _Wing_ task targets pressure distributions on a wing in 3D turbulent flow. (b) The standard deviation of the distribution generated by the LDGN is the closest to the ground truth (shown here in terms of correlation).
In terms of Wasserstein distance, the LDGN again performs best, followed by the DGN, with both clearly ahead of the GM-GNN baseline.
Comparisons between runtimes of different implementations should always be taken with a grain of salt.
Nonetheless, for the Wing experiments, the ground-truth simulator, running on 8 CPU threads, required 2,989 minutes to simulate the initial transient phase plus 2,500 equilibrium states. This duration is just enough to obtain a well-converged variance. In contrast, the LDGN model took only 49 minutes on 8 CPU threads and 2.43 minutes on a single GPU to generate 3,000 samples, i.e., a speed-up of roughly 60x on the CPU and more than 1,000x on the GPU for obtaining the full distribution.
If we consider the generation of a single converged state (for use as an initial condition in another simulator, for example), the speedup is four orders of magnitude on the CPU, and five orders of magnitude on the GPU.
Thanks to its latent space, the LDGN model is not only more accurate, but also significantly cheaper to sample from than the DGN, since the denoising steps operate on the much smaller latent graph.
These results indicate that diffusion modeling in the context of unstructured simulations represents a significant step towards leveraging probabilistic methods in real-world engineering applications. To highlight the practical aspects of DGN and its implementation, we now turn to a simpler test case that can be analyzed in detail within a Jupyter notebook.