LlamaGen: Autoregressive Model Beats Diffusion

License: MIT
Model Type: Image Generation
LlamaGen is a family of autoregressive image generation models that apply the next-token prediction paradigm of large language models to the visual domain. By scaling model size appropriately and re-examining the design space of image tokenizers, LlamaGen achieves state-of-the-art image generation without relying on inductive biases tailored for vision.
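The core idea above can be sketched as a plain autoregressive sampling loop over discrete image tokens. The sketch below is a hypothetical illustration, not LlamaGen's actual API: `toy_logits` stands in for the Llama-style transformer, and the codebook size and grid size are assumptions for a 256×256 image with a downsample-ratio-16 tokenizer.

```python
import numpy as np

CODEBOOK_SIZE = 16384   # image-tokenizer vocabulary size (assumed for illustration)
GRID = 16               # 256x256 image / downsample ratio 16 -> 16x16 token grid
rng = np.random.default_rng(0)

def toy_logits(prefix, class_id):
    # Stand-in for the transformer: a real model would condition on the
    # class (or text) embedding and the token prefix; here we return
    # random unnormalized scores just to show the loop structure.
    return rng.normal(size=CODEBOOK_SIZE)

def sample_image_tokens(class_id, temperature=1.0):
    tokens = []
    for _ in range(GRID * GRID):            # one token per spatial position
        logits = toy_logits(tokens, class_id) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                # softmax over the codebook
        tokens.append(int(rng.choice(CODEBOOK_SIZE, p=probs)))
    # The resulting token grid would be fed to the tokenizer's decoder
    # to reconstruct pixels.
    return np.array(tokens).reshape(GRID, GRID)

grid = sample_image_tokens(class_id=207)
print(grid.shape)  # (16, 16)
```

In the real pipeline the sampled indices are decoded back to an image by the tokenizer's decoder; the point here is only that generation reduces to ordinary next-token prediction.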

Key Features

  • Class-conditional image generation models ranging from 100M to 3.1B parameters.
  • Text-conditional image generation model with 775M parameters.
  • Image tokenizers with downsample ratios of 16 and 8.
  • Achieves a 2.18 FID on the ImageNet 256×256 benchmark.
  • Demonstrates competitive performance in visual quality and text alignment.
  • Uses the vLLM serving framework to achieve a 326%–414% inference speedup.
  • Releases pre-trained model weights and PyTorch training/sampling code.
  • Provides online demos and Gradio-based interfaces for interactive use.
  • Supports both class-conditional and text-conditional image generation tasks.
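The downsample ratio listed above directly sets how many tokens the model must predict per image, which drives both quality and generation cost. A quick sketch of the arithmetic (the function name is ours, for illustration):

```python
def num_tokens(image_size: int, downsample_ratio: int) -> int:
    # Each side of the image is reduced by the tokenizer's downsample
    # ratio, and one discrete token is produced per grid cell.
    side = image_size // downsample_ratio
    return side * side

print(num_tokens(256, 16))  # 256 tokens (16x16 grid)
print(num_tokens(256, 8))   # 1024 tokens (32x32 grid)
```

A smaller ratio preserves more spatial detail but quadruples the sequence length the autoregressive model must generate.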
