Alpha-VLLM/Lumina-Image-2.0

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

📰 News

  • [2025-6-26] 🎉🎉🎉 Lumina-Image 2.0 has been accepted to ICCV 2025.
  • [2025-4-21] 🚀🚀🚀 We have released Lumina-Accessory, which supports single-task and multi-task fine-tuning for controllable generation, image editing, and identity preservation based on Lumina-Image 2.0.
  • [2025-3-28] 👋👋👋 We are excited to announce the release of the Lumina-Image 2.0 Tech Report. We welcome discussions and feedback!
  • [2025-2-20] The Diffusers team released a LoRA fine-tuning script for Lumina2. Find out more here.
  • [2025-2-12] Lumina 2.0 is now available in Diffusers. Check out the docs to learn more.
  • [2025-2-10] The official Hugging Face Space for Lumina-Image 2.0 is now available.
  • [2025-2-10] Preliminary explorations of video generation with Lumina-Video 1.0 have been released.
  • [2025-2-5] ComfyUI now supports Lumina-Image 2.0! 🎉 Thanks to ComfyUI@ComfyUI! 🙌 Feel free to try it out! 🚀
  • [2025-1-31] We have released the latest .pth format weight file on Hugging Face.
  • [2025-1-25] 🚀🚀🚀 We are excited to release Lumina-Image 2.0, including:
    • 🎯 Checkpoints, fine-tuning code, and inference code.
    • 🎯 The website and demo are live now! Check out the Huiying and Gradio demos!

📑 Open-source Plan

  • Inference
  • Checkpoints
  • Web Demo (Gradio)
  • Finetuning code
  • ComfyUI
  • Diffusers
  • LoRA
  • Technical Report
  • Unified multi-image generation
  • Control
  • PEFT (LLaMa-Adapter V2)

🎥 Demo

[Video: Demo.mp4]

🎨 Qualitative Performance

[Figure: Qualitative Results]

📊 Quantitative Performance

[Figure: Quantitative Results]

🎮 Model Zoo

Resolution | Parameters | Text Encoder | VAE           | Download URL
1024       | 2.6B       | Gemma-2-2B   | FLUX-VAE-16CH | Hugging Face

💻 Finetuning Code

1. Create a conda environment and install PyTorch

git clone https://github.com/Alpha-VLLM/Lumina-Image-2.0.git
conda create -n Lumina2 python=3.11 -y
conda activate Lumina2

2. Install dependencies

cd Lumina-Image-2.0
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl --no-build-isolation

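The wheel above targets CUDA 12, PyTorch 2.2, and CPython 3.11 on Linux x86_64. If your setup differs, pick the matching flash-attn wheel from this link.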
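
To check a wheel against your environment, it can help to split its file name into compatibility tags. The sketch below is a minimal helper that assumes the naming layout seen in the single file name above; other flash-attn releases may be named differently, and parse_flash_attn_wheel and matches_python are hypothetical names, not part of flash-attn or this repository.

```python
import re
import sys

# Pattern inferred from the one wheel file name shown in the install command;
# this layout is an assumption and may not hold for every release.
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<py>cp\d+)-cp\d+-(?P<platform>.+)\.whl$"
)


def parse_flash_attn_wheel(name: str) -> dict:
    """Split a flash-attn wheel file name (or URL) into its compatibility tags."""
    match = WHEEL_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {name}")
    return match.groupdict()


def matches_python(tags: dict, version_info=sys.version_info) -> bool:
    """Check the wheel's CPython tag against a (major, minor) version."""
    return tags["py"] == f"cp{version_info[0]}{version_info[1]}"


tags = parse_flash_attn_wheel(
    "flash_attn-2.7.4.post1+cu12torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)
print(tags)
print("matches this interpreter:", matches_python(tags))
```

The CUDA and PyTorch tags still need to be compared by hand against your installed versions; this helper only checks the Python tag.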

3. Prepare data

List the paths to your data files in ./configs/data.yaml. Each image-text training record should follow this format:

{
    "image_path": "path/to/your/image",
    "prompt": "a description of the image"
}
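
Before training, it may be worth checking that every record has both fields. The sketch below validates records and serializes them with Python's standard json module. It assumes records are stored together as a JSON array, which the README does not state, and validate_record and dump_records are hypothetical helper names, not part of this repository.

```python
import json

# The two fields named in the record format above.
REQUIRED_KEYS = ("image_path", "prompt")


def validate_record(record: dict) -> list:
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")
    return problems


def dump_records(records: list) -> str:
    """Serialize records as a JSON array, refusing any invalid record.

    Storing all records in one JSON array is an assumption of this sketch.
    """
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"record {index}: {', '.join(problems)}")
    return json.dumps(records, indent=4, ensure_ascii=False)


records = [
    {"image_path": "path/to/your/image", "prompt": "a description of the image"}
]
print(dump_records(records))
```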

4. Start finetuning

Note

Gemma-2-2B is a gated model, so you need a Hugging Face access token, passed via the --hf_token argument.

bash scripts/run_1024_finetune.sh

🚀 Inference Code

We support multiple solvers including Midpoint Solver, Euler Solver, and DPM Solver for inference.
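
For intuition about how two of these solvers differ, the sketch below compares a generic Euler step and a generic midpoint step on a toy scalar ODE. It is not the repository's sampler code: the real solvers operate on latent tensors with a learned model, the DPM Solver is not shown, and every name here is hypothetical.

```python
import math
from typing import Callable

# v(x, t): velocity of the ODE dx/dt = v(x, t). A real sampler would typically
# obtain this from the model; here it is a plain Python function.
Velocity = Callable[[float, float], float]


def euler_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """One first-order Euler step: one velocity evaluation per step."""
    return x + dt * v(x, t)


def midpoint_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """One second-order midpoint step: two velocity evaluations per step."""
    x_mid = x + 0.5 * dt * v(x, t)
    return x + dt * v(x_mid, t + 0.5 * dt)


def integrate(step, v: Velocity, x0: float, steps: int,
              t0: float = 0.0, t1: float = 1.0) -> float:
    """Integrate from t0 to t1 with a fixed number of equal steps."""
    dt = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = step(v, x, t, dt)
        t += dt
    return x


def growth(x: float, t: float) -> float:
    """Toy velocity field dx/dt = x; the exact solution at t=1 is e * x0."""
    return x


euler_error = abs(integrate(euler_step, growth, 1.0, steps=10) - math.e)
midpoint_error = abs(integrate(midpoint_step, growth, 1.0, steps=10) - math.e)
print(f"Euler error:    {euler_error:.4f}")
print(f"Midpoint error: {midpoint_error:.4f}")
```

With the same number of steps, the midpoint rule is more accurate on this toy problem but costs two velocity evaluations per step instead of one.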

Note

You can also download the weights directly from Hugging Face. We have uploaded the .pth weight files; set the --ckpt argument to the directory you downloaded them to.

Gradio Demo

python demo.py \
    --ckpt /path/to/your/ckpt \
    --res 1024 \
    --port 10010 \
    --hf_token xxx

Direct Batch Inference

  • --model_dir: provide the path to your local checkpoint directory or specify Alpha-VLLM/Lumina-Image-2.0.

  • --cap_dir: point to either

    • a JSON file that contains a "prompt" field, or
    • a plain-text file with one prompt per line.

bash scripts/sample.sh
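
The two prompt-file formats accepted by --cap_dir can be read as in the sketch below. This is not the repository's loader: load_prompts is a hypothetical helper, and it assumes a JSON file holds either one object or a list of objects, each with a "prompt" field, which the description above does not spell out.

```python
import json
from pathlib import Path


def load_prompts(cap_path: str) -> list:
    """Read prompts from a JSON file or a one-prompt-per-line text file."""
    path = Path(cap_path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        data = json.loads(text)
        # Assumption: a single object or a list of objects with a "prompt" key.
        items = data if isinstance(data, list) else [data]
        return [item["prompt"] for item in items]
    # Plain text: one prompt per line, skipping blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Usage (paths are placeholders):
#   prompts = load_prompts("path/to/prompts.txt")
#   prompts = load_prompts("path/to/prompts.json")
```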

Diffusers inference

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save VRAM by offloading the model to CPU. Remove this if you have enough GPU memory.

prompt = "A serene photograph capturing the golden reflection of the sun on a vast expanse of water. "
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("lumina_demo.png")

🔥 Open Positions

We are hiring interns and full-time researchers at the Alpha VLLM Group, Shanghai AI Lab. If you are interested, please contact [email protected].

Citation

If you find the provided code or models useful for your research, consider citing them as:

@misc{lumina2,
    author={Qi Qin and Le Zhuo and Yi Xin and Ruoyi Du and Zhen Li and Bin Fu and Yiting Lu and Xinyue Li and Dongyang Liu and Xiangyang Zhu and Will Beddow and Erwann Millon and Victor Perez and Wenhai Wang and Yu Qiao and Bo Zhang and Xiaohong Liu and Hongsheng Li and Chang Xu and Peng Gao},
    title={Lumina-Image 2.0: A Unified and Efficient Image Generative Framework},
    year={2025},
    eprint={2503.21758},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/pdf/2503.21758}, 
}
