How to set up and evaluate validation datasets
Validation helps you clearly understand how well your training is progressing, determine whether your model is learning effectively, and pinpoint the best checkpoints during training. A well-designed validation setup provides visual metrics, like a clear loss curve, enabling precise, data-driven decisions rather than guesswork.

On your "concept" page (in your training UI):

- Add a new concept.
- Set its type explicitly to VALIDATION.
- Crucially, do not enable:
  - Text variation
  - Image variation (no augmentations)
  - Shuffle
  - Caption dropout

We want the validation process to remain consistent and completely repeatable across training sessions.
On the general settings page, enable:

- TensorBoard logging
- Validation
Step Validation Intervals:

- If your dataset has fewer than 500 images, simply set the validation interval to 1 epoch.
- For larger datasets, calculating steps per epoch helps keep the validation graph consistent. For example, if your dataset completes an epoch in 1350 steps, set the interval to a divisor of that number (e.g., 675 steps to validate twice per epoch, or 450 steps to validate three times per epoch).
- Perfect divisions are not mandatory, but they are strongly advised: they yield a smoother, clearer validation graph.
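The interval arithmetic above can be sketched in a few lines of Python. The function name `pick_interval` is illustrative, not part of any training tool; the 1350-step epoch is the example from the text:

```python
def pick_interval(steps_per_epoch: int, validations_per_epoch: int) -> int:
    """Return a validation interval that divides the epoch evenly when
    possible, otherwise the nearest whole-step interval."""
    if steps_per_epoch % validations_per_epoch == 0:
        return steps_per_epoch // validations_per_epoch
    # Imperfect division: the graph will drift slightly each epoch.
    return round(steps_per_epoch / validations_per_epoch)

# The 1350-step epoch from the example above:
print(pick_interval(1350, 2))  # 675 -> validate twice per epoch
print(pick_interval(1350, 3))  # 450 -> validate three times per epoch
```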
- Every validation image must be unique and must not appear in your training dataset. Minor modifications like cropping do NOT count as new images.
- Select simple, clear images that precisely represent the core of what you want the model to learn.
  - If training a person's likeness, a close-up headshot or portrait is ideal.
  - Avoid overly complex backgrounds, interactions, or busy compositions; simplicity makes it easier to evaluate training effectiveness.
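One way to enforce the uniqueness rule is to compare file hashes between the two folders. This is only a sketch — the folder paths are placeholders, and a byte-level hash catches only exact copies, so a cropped or re-encoded duplicate will slip through and still needs a visual check:

```python
import hashlib
from pathlib import Path

def file_hashes(folder: str) -> dict[str, str]:
    """Map SHA-256 digest -> filename for every file in a folder."""
    hashes = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            hashes[hashlib.sha256(path.read_bytes()).hexdigest()] = path.name
    return hashes

def find_exact_duplicates(train_dir: str, val_dir: str) -> list[tuple[str, str]]:
    """Return (training_file, validation_file) pairs with identical bytes."""
    train = file_hashes(train_dir)
    return [(train[h], name)
            for h, name in file_hashes(val_dir).items() if h in train]
```

An empty result means no byte-identical overlap; it does not prove the sets are visually distinct.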
There's no strict minimum or maximum number of validation images. However:

- 3-5 images: minimal, but sufficient to identify basic trends and the training trajectory.
- 10-15 images: the ideal range, providing smoother and more reliable validation graphs while remaining manageable.

Select images representing diverse yet clear aspects of your target concept to effectively gauge generalization and concept reproduction.
- Your validation loss graph should initially show a smooth downward trend as the model improves and internalizes your concept.
- Eventually, validation loss hits a first low point, which indicates the model has achieved optimal or near-optimal understanding of your dataset.
  - Larger datasets (1000+ images) typically see one clear low point, which is often your ideal checkpoint.
  - Smaller datasets (30-100 images) may fluctuate: validation loss can rise after an initial low before eventually dropping to a new low point.
- Keep all checkpoints corresponding to these low points in your validation curve (either the checkpoint exactly at the low point, or the next checkpoint you saved after the low point was hit).
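If you export the validation loss series (e.g., from TensorBoard's scalar logs), the low points described above can be located programmatically. A minimal sketch — the curve below is made-up illustration data, not from a real run:

```python
def find_low_points(losses: list[float]) -> list[int]:
    """Indices where the loss is strictly lower than both neighbours
    (local minima), i.e. candidate checkpoints worth keeping."""
    return [i for i in range(1, len(losses) - 1)
            if losses[i] < losses[i - 1] and losses[i] < losses[i + 1]]

# Illustrative curve: a first low, a rise, then a deeper second low,
# matching the small-dataset behaviour described above.
curve = [0.9, 0.7, 0.5, 0.42, 0.48, 0.52, 0.5, 0.44, 0.38, 0.40]
print(find_low_points(curve))  # [3, 8]
```

In practice you would keep the checkpoint saved at (or just after) each returned index; a real curve is noisier, so some smoothing before the comparison may help.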
Each low-loss checkpoint will have different strengths:

- Some excel at precise reproduction of your trained concept.
- Others offer improved generalization across varied prompts and contexts.

By focusing exclusively on the checkpoints clearly identified by these validation low points, you significantly reduce testing and guesswork after training concludes.