Change image dimensions requirement for DiT models #742
Currently, the CLI requires the image dimensions to be a multiple of 64. This is because the UNet architecture of Stable Diffusion expects the latent image dimensions to be divisible by 8 (8 latent pixels correspond to 64 image pixels), but the supported DiT models like SD3 and Flux don't share this requirement.
With Flux and SD3.x, the image dimensions still need to be divisible by 8 for the VAE, but the transformer can then run without crashing on latent images of arbitrary size. However, I've noticed that the images look very broken when the latent dimensions are odd (and odd dimensions would also break VAE tiling), so I added a requirement that the image dimensions be a multiple of 16. (This looks like a positional-encoding issue that might be fixable?)
Since there is no way to know the model architecture from the CLI, I had to move the check into the generate_image function in stable-diffusion.cpp. The downside is that the model has to be loaded (which can take some time) when the sd_ctx is created, before we can verify that the image dimensions are valid.
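The per-architecture check described above could look roughly like this sketch (the enum, function name, and error message are hypothetical, not the actual PR code): DiT models (SD3.x, Flux) only need dimensions divisible by 16, while UNet models keep the original multiple-of-64 requirement.

```cpp
#include <cstdio>

// Hypothetical architecture tag; the real code would derive this from
// the loaded model rather than from the CLI.
enum class DiffusionArch { UNET, DIT };

// Validate requested image dimensions against the architecture's
// divisibility requirement: 16 for DiT models, 64 for UNet models.
bool check_image_dims(DiffusionArch arch, int width, int height) {
    const int multiple = (arch == DiffusionArch::DIT) ? 16 : 64;
    if (width % multiple != 0 || height % multiple != 0) {
        std::fprintf(stderr,
                     "error: width and height must be multiples of %d\n",
                     multiple);
        return false;
    }
    return true;
}
```

Because this runs inside generate_image, it can only report the error after model loading, which is exactly the downside mentioned above.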