LlamaGen is a family of autoregressive image generation models that apply the next-token prediction paradigm of large language models to the visual domain. By reexamining the design space of image tokenizers and scaling models appropriately, it achieves state-of-the-art image generation performance without relying on inductive biases tailored for vision.
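Conceptually, an image is quantized into a sequence of discrete codebook indices, and generation samples those indices one position at a time, just as an LLM samples words. Below is a minimal sketch of that loop, assuming a hypothetical `model` that maps a token sequence to logits over the visual codebook and a conditioning prefix `cond`; the names are illustrative, not LlamaGen's actual API:

```python
import torch

@torch.no_grad()
def sample_image_tokens(model, cond, num_tokens=256, temperature=1.0):
    """Autoregressively sample image token IDs, one position at a time."""
    tokens = cond  # (batch, prefix_len) conditioning tokens, e.g. a class-label index
    for _ in range(num_tokens):
        logits = model(tokens)[:, -1, :] / temperature  # logits for the next position
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample one token per batch item
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens[:, cond.shape[1]:]  # drop the prefix; decode the rest with the image tokenizer
```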
Key Features
Class-conditional image generation models ranging from 100M to 3.1B parameters.
Text-conditional image generation model with 775M parameters.
Image tokenizers with downsample ratios of 16 and 8 (see the token-count sketch after this list).
Achieves an FID of 2.18 on the ImageNet 256×256 benchmark.
The text-conditional model demonstrates competitive visual quality and text alignment.
Uses the vLLM serving framework to achieve a 326%–414% inference speedup.
Releases pre-trained model weights and PyTorch training and sampling code.
Provides online demos and Gradio-based interfaces for interactive use (a minimal wrapper is sketched after this list).
Supports both class-conditional and text-conditional image generation tasks.
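The downsample ratio determines how many tokens represent an image: a ratio-16 tokenizer turns a 256×256 image into a 16×16 grid of 256 tokens, while a ratio-8 tokenizer yields a 32×32 grid of 1,024 tokens. A quick arithmetic check:

```python
def num_image_tokens(image_size: int, downsample_ratio: int) -> int:
    """Number of discrete tokens a VQ tokenizer produces for a square image."""
    side = image_size // downsample_ratio  # tokens per spatial dimension
    return side * side

assert num_image_tokens(256, 16) == 256    # ratio 16: 16x16 token grid
assert num_image_tokens(256, 8) == 1024    # ratio 8: 32x32 token grid
```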
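For the interactive demos, a Gradio app only needs a function from prompt to image. Below is a minimal wrapper around a hypothetical `generate` function; the released demos wrap the actual text-conditional model, while this stub returns a blank placeholder image so the sketch stays self-contained:

```python
import gradio as gr
from PIL import Image

def generate(prompt: str) -> Image.Image:
    # Placeholder: a real demo would encode the prompt, autoregressively
    # sample image tokens, and decode them with the image tokenizer.
    return Image.new("RGB", (256, 256))

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated image"),
)
demo.launch()
```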