- [2025-6-26] 🎉🎉🎉 Lumina-Image 2.0 is accepted by ICCV 2025.
- [2025-4-21] 🎉🎉🎉 We have released Lumina-Accessory, which supports single-task and multi-task fine-tuning for controllable generation, image editing, and identity preservation based on Lumina-Image 2.0.
- [2025-3-28] 🎉🎉🎉 We are excited to announce the release of the Lumina-Image 2.0 Tech Report. We welcome discussions and feedback!
- [2025-2-20] Diffusers team released a LoRA fine-tuning script for Lumina2. Find out more here.
- [2025-2-12] Lumina 2.0 is now available in Diffusers. Check out the docs to know more.
- [2025-2-10] The official Hugging Face Space for Lumina-Image 2.0 is now available.
- [2025-2-10] Preliminary explorations of video generation with Lumina-Video 1.0 have been released.
- [2025-2-5] ComfyUI now supports Lumina-Image 2.0! Thanks to @ComfyUI! Feel free to try it out!
- [2025-1-31] We have released the latest .pth format weight file on Hugging Face.
- [2025-1-25] 🎉🎉🎉 We are excited to release Lumina-Image 2.0, including:
  - 🎯 Checkpoints, fine-tuning, and inference code.
  - 🎯 Website & demo are live now! Check out the Huiying and Gradio demos!
- Inference
- Checkpoints
- Web Demo (Gradio)
- Finetuning code
- ComfyUI
- Diffusers
- LoRA
- Technical Report
- Unified multi-image generation
- Control
- PEFT (LLaMa-Adapter V2)
Demo video: Demo.mp4
| Resolution | Parameters | Text Encoder | VAE | Download URL |
|---|---|---|---|---|
| 1024 | 2.6B | Gemma-2-2B | FLUX-VAE-16CH | Hugging Face |
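If you prefer a scripted download over the web UI, the checkpoint can also be fetched with `huggingface_hub` (a minimal sketch; the local target directory is an arbitrary choice, not a path required by the repository):

```python
# A minimal sketch, assuming the huggingface_hub package is installed.
# The local_dir value is an arbitrary choice for illustration.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="Alpha-VLLM/Lumina-Image-2.0",          # repository listed in the table above
    local_dir="./checkpoints/Lumina-Image-2.0",     # hypothetical local target directory
)
print(f"Checkpoint files downloaded to: {ckpt_dir}")
```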
```bash
git clone https://github.com/Alpha-VLLM/Lumina-Image-2.0.git
conda create -n Lumina2 python=3.11 -y
conda activate Lumina2
cd Lumina-Image-2.0
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl --no-build-isolation
```
Please find the flash-attn version that matches your CUDA and PyTorch setup from this link.
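After installation, a quick import check can confirm that the CUDA build of PyTorch and the flash-attn wheel load cleanly (a minimal sanity-check sketch, not part of the official scripts):

```python
# A minimal sanity check: verifies that torch sees CUDA and that flash-attn imports.
import torch
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash_attn:", flash_attn.__version__)
```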
You can place the paths to your data files in `./configs/data.yaml`. Your image-text pair training data should adhere to the following format:
```json
{
    "image_path": "path/to/your/image",
    "prompt": "a description of the image"
}
```
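For illustration, records in this schema can be assembled programmatically (a minimal sketch; the output filename and the JSON-list container are assumptions, so check `./configs/data.yaml` and the fine-tuning scripts for the exact file layout they expect):

```python
# A minimal sketch for assembling records in the documented schema.
# The output filename and JSON-list container are assumptions for illustration only.
import json

samples = [
    ("images/0001.jpg", "a description of the image"),
    ("images/0002.jpg", "another description"),
]

records = [{"image_path": path, "prompt": prompt} for path, prompt in samples]

with open("train_data.json", "w") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)
```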
Note: Since Gemma-2-2B requires authentication, you'll need a Hugging Face access token and must pass it via the `--hf_token` argument.
```bash
bash scripts/run_1024_finetune.sh
```
For inference, we support multiple solvers, including the Midpoint Solver, Euler Solver, and DPM Solver.
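These solvers differ in how they discretize the sampling ODE. The toy sketch below contrasts a single Euler step with a single midpoint step for a generic velocity field `v(x, t)`; it is only an illustration of the difference, not the repository's sampler implementation:

```python
# Toy illustration of one Euler step vs. one midpoint step for dx/dt = v(x, t).
# `v` is a placeholder velocity field, not the learned model used by the repo.
import torch

def v(x, t):
    # placeholder velocity field for illustration
    return -x * (1.0 - t)

def euler_step(x, t, dt):
    # first-order: evaluate the velocity once, at the start of the step
    return x + dt * v(x, t)

def midpoint_step(x, t, dt):
    # second-order: take a half step, then evaluate the velocity at the midpoint
    x_mid = x + 0.5 * dt * v(x, t)
    return x + dt * v(x_mid, t + 0.5 * dt)

x = torch.randn(4)
print(euler_step(x, 0.0, 0.1))
print(midpoint_step(x, 0.0, 0.1))
```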
Note: You can also download the weights directly from Hugging Face. We have uploaded the .pth weight files; simply point the `--ckpt` argument to the download directory.
```bash
python demo.py \
    --ckpt /path/to/your/ckpt \
    --res 1024 \
    --port 10010 \
    --hf_token xxx
```
- `--model_dir`: provide the path to your local checkpoint directory, or specify `Alpha-VLLM/Lumina-Image-2.0`.
- `--cap_dir`: point to either
  - a JSON file that contains a `"prompt"` field, or
  - a plain-text file with one prompt per line (both formats are sketched below).
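For reference, both accepted `--cap_dir` inputs can be produced as follows (a minimal sketch; the filenames and the JSON-list layout are assumptions for illustration, not requirements of the sampling scripts):

```python
# A minimal sketch producing both accepted --cap_dir inputs.
# Filenames and the JSON-list layout are assumptions, not script requirements.
import json

prompts = [
    "A serene photograph capturing the golden reflection of the sun on a vast expanse of water.",
    "A close-up portrait of a red fox standing in fresh snow.",
]

# Option 1: a JSON file whose entries contain a "prompt" field
with open("prompts.json", "w") as f:
    json.dump([{"prompt": p} for p in prompts], f, indent=2)

# Option 2: a plain-text file with one prompt per line
with open("prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```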
```bash
bash scripts/sample.sh
```
```python
import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU; remove this if you have enough GPU memory

prompt = "A serene photograph capturing the golden reflection of the sun on a vast expanse of water."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lumina_demo.png")
```
We are hiring interns and full-time researchers at the Alpha VLLM Group, Shanghai AI Lab. If you are interested, please contact [email protected].
If you find the provided code or models useful for your research, consider citing them as:
```bibtex
@misc{lumina2,
    author={Qi Qin and Le Zhuo and Yi Xin and Ruoyi Du and Zhen Li and Bin Fu and Yiting Lu and Xinyue Li and Dongyang Liu and Xiangyang Zhu and Will Beddow and Erwann Millon and Victor Perez and Wenhai Wang and Yu Qiao and Bo Zhang and Xiaohong Liu and Hongsheng Li and Chang Xu and Peng Gao},
    title={Lumina-Image 2.0: A Unified and Efficient Image Generative Framework},
    year={2025},
    eprint={2503.21758},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/pdf/2503.21758},
}
```