Creative Text-to-Audio Generation (CTAG)

CTAG is a research-oriented tool for generating audio from text prompts using a modular synthesizer engine. It is built on SynthAX, a JAX-based synthesizer framework, and uses LAION-CLAP audio-text models to measure how closely the synthesized audio matches a natural-language description. Experiments are configured through Hydra. The system was introduced at ICML 2024.
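A minimal sketch of how a synthesizer and an audio-text similarity model can work together: render candidate sounds, embed them, and keep the parameters whose audio embedding has the highest cosine similarity to the prompt's embedding. Everything here is a hypothetical stand-in. `synthesize` is a toy sine-with-decay voice rather than a SynthAX patch, `embed_audio` computes two hand-made features instead of a LAION-CLAP embedding, the prompt vectors in `TEXT_EMBEDDINGS` are hard-coded, and an exhaustive grid search stands in for whatever optimizer CTAG actually uses. Only the scoring step, cosine similarity between an audio embedding and a text embedding, reflects how CLAP-style models are normally compared.

```python
import math

SAMPLE_RATE = 8_000  # assumed sample rate, kept low so the toy runs quickly


def synthesize(freq_hz, decay):
    """Hypothetical stand-in for a SynthAX patch: a 0.25 s sine oscillator
    shaped by an exponential amplitude decay."""
    n = SAMPLE_RATE // 4
    return [
        math.exp(-decay * i / SAMPLE_RATE) * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def embed_audio(audio):
    """Hypothetical 2-D 'audio embedding' (NOT CLAP): (brightness, sustain).
    brightness = zero-crossing rate; sustain = share of energy in the second half."""
    crossings = sum(1 for a, b in zip(audio, audio[1:]) if (a < 0) != (b < 0))
    brightness = crossings / len(audio)
    half = len(audio) // 2
    total = sum(x * x for x in audio) or 1e-12
    sustain = sum(x * x for x in audio[half:]) / total
    return [brightness, sustain]


# Hypothetical 'text embeddings' in the same 2-D space; a real system
# would obtain these from a CLAP text encoder.
TEXT_EMBEDDINGS = {
    "a bright, short click": [0.2, 0.0],
    "a low, sustained hum": [0.01, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def grid_search(prompt):
    """Exhaustive search over a small parameter grid (an illustrative choice),
    returning the (freq_hz, decay) pair whose audio best matches the prompt."""
    target = TEXT_EMBEDDINGS[prompt]
    best_params, best_score = None, -math.inf
    for freq in (50.0, 100.0, 200.0, 400.0, 800.0, 1600.0):
        for decay in (0.0, 2.0, 5.0, 10.0, 20.0, 40.0):
            score = cosine_similarity(embed_audio(synthesize(freq, decay)), target)
            if score > best_score:
                best_params, best_score = (freq, decay), score
    return best_params, best_score


params, score = grid_search("a low, sustained hum")
```

With these toy features, the hum prompt favors the lowest frequency with no decay, while the click prompt favors a high frequency with fast decay. A real pipeline would swap the grid for a search or optimization method that scales to many synthesizer parameters.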

Key Features

  • Converts descriptive text prompts into synthetic audio
  • Utilizes SynthAX, a JAX-based modular synthesizer
  • Supports LAION-CLAP audio-text similarity models
  • Fully customizable architecture via Hydra configuration
  • Enables hyperparameter tuning with Ax
  • Provides example scripts for quick experimentation
  • Compatible with CPU and GPU environments
  • Includes reproducible generation pipelines and model checkpoints
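Hydra lets users change settings from the command line with dotted `key=value` overrides instead of editing config files. The sketch here imitates only that one idea in plain Python, to show how a nested experiment config can be adjusted per run. The keys in `DEFAULTS` are hypothetical and not CTAG's actual schema, and `apply_overrides` and `parse_value` are illustrative helpers, not part of Hydra; real Hydra (through OmegaConf) also handles config groups, interpolation, and typed schemas.

```python
import copy


def parse_value(text):
    """Simplified value parsing: booleans, ints, floats, otherwise the raw string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key.path=value` overrides applied.
    Unknown keys raise KeyError; Hydra similarly rejects overrides of keys that
    do not exist unless they are prefixed with `+`."""
    result = copy.deepcopy(config)  # leave the defaults untouched
    for override in overrides:
        key_path, sep, raw = override.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {override!r}")
        *parents, leaf = key_path.split(".")
        node = result
        for part in parents:
            node = node[part]  # walk down the nested dicts
        if leaf not in node:
            raise KeyError(f"unknown config key: {key_path}")
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults for illustration only.
DEFAULTS = {
    "synth": {"name": "voice", "sample_rate": 16000},
    "optimizer": {"iterations": 300, "learning_rate": 0.01},
    "prompt": "a dog barking",
}

config = apply_overrides(DEFAULTS, ["optimizer.iterations=50", "prompt=rain on a tin roof"])
```

Returning a deep copy keeps the defaults reusable across runs, which matters when several override sets are evaluated in one process, for example during a hyperparameter sweep.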