CTAG is a research tool that generates audio from text prompts by driving a modular synthesizer. Built on SynthAX, a JAX-based synthesizer framework, CTAG lets users steer audio synthesis with textual descriptions instead of hand-setting synthesizer parameters. It is configured through Hydra and uses LAION-CLAP models to score how closely the generated audio matches a natural-language description. CTAG was presented at ICML 2024.
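The pipeline described above (text prompt in, synthesizer audio out, CLAP similarity as the judge) can be illustrated with a toy loop in plain Python. This is a minimal sketch under loudly stated assumptions: `render` is a hypothetical one-oscillator stand-in for SynthAX, `embed_text` and `embed_audio` are hypothetical two-number stand-ins for the LAION-CLAP encoders, and the (1+1) random search is a generic optimizer, not necessarily the one CTAG uses. Only the overall shape, adjusting synthesizer parameters to raise the cosine similarity between an audio embedding and a text embedding, is meant to carry over.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; kept low so the toy loop runs quickly


def render(params, duration=0.25):
    """Hypothetical one-oscillator 'synthesizer' standing in for SynthAX."""
    freq, gain = params
    n = int(SAMPLE_RATE * duration)
    return [gain * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]


def embed_audio(signal):
    """Hypothetical stand-in for a CLAP audio encoder: [zero-crossing rate, RMS level]."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    zcr = crossings / len(signal)  # grows with pitch
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))  # grows with gain
    return [zcr, rms]


def embed_text(prompt):
    """Hypothetical stand-in for a CLAP text encoder that only knows two words."""
    words = prompt.lower().split()
    brightness = 0.2 if "high" in words else 0.02
    loudness = 0.7 if "loud" in words else 0.1
    return [brightness, loudness]


def cosine(u, v):
    """Cosine similarity, the kind of score CLAP uses to compare audio and text embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def optimize(prompt, steps=200, seed=0):
    """(1+1) random search: keep a mutated parameter set only if similarity improves."""
    rng = random.Random(seed)
    target = embed_text(prompt)
    params = [220.0, 0.3]  # initial frequency (Hz) and gain
    start = best = cosine(embed_audio(render(params)), target)
    for _ in range(steps):
        candidate = [
            # Multiplicative step for frequency, additive step for gain, both clipped.
            min(2000.0, max(50.0, params[0] * math.exp(rng.gauss(0, 0.3)))),
            min(1.0, max(0.01, params[1] + rng.gauss(0, 0.05))),
        ]
        score = cosine(embed_audio(render(candidate)), target)
        if score > best:
            params, best = candidate, score
    return params, start, best


if __name__ == "__main__":
    params, start, best = optimize("a high quiet beep")
    print(f"similarity {start:.3f} -> {best:.3f} (freq={params[0]:.0f} Hz, gain={params[1]:.2f})")
```

Because the score only ever accepts improvements, the final similarity is never lower than the starting one; a real system would swap in CLAP embeddings, a full SynthAX patch, and a stronger optimizer.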
Key Features
Converts descriptive text prompts into synthetic audio
Builds on SynthAX, a JAX-based modular synthesizer
Supports LAION-CLAP audio-text similarity models
Offers a fully customizable architecture through Hydra configuration
Enables hyperparameter tuning with Ax
Provides example scripts for quick experimentation
Runs on both CPU and GPU
Includes reproducible generation pipelines and model checkpoints
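Hydra configurations are typically overridden from the command line with dotted `key=value` pairs, and hyperparameter tools such as Ax then search over those same keys. The sketch below imitates that workflow with the standard library only, so it runs without Hydra or Ax installed. The config layout (`synth`, `optimizer`, `prompt`) and every key in it are hypothetical, not CTAG's actual schema, and the `sweep` helper is a plain grid rather than Ax's adaptive search.

```python
import copy

# Hypothetical defaults; CTAG's real config groups and keys will differ.
DEFAULTS = {
    "synth": {"sample_rate": 44100, "duration": 4.0},
    "optimizer": {"steps": 300, "sigma": 0.1},
    "prompt": "a bird chirping",
}


def parse_value(text):
    """Turn an override string into an int, float, or bool when possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


def apply_overrides(config, overrides):
    """Apply Hydra-style dotted 'a.b=value' overrides to a deep copy of config."""
    cfg = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # raises KeyError for an unknown group
        if leaf not in node:
            # Hydra likewise rejects unknown keys unless they are added with a '+' prefix.
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return cfg


def sweep(config, key, values):
    """One config per value: a plain grid, not Ax's adaptive search."""
    return [apply_overrides(config, [f"{key}={v}"]) for v in values]


if __name__ == "__main__":
    cfg = apply_overrides(DEFAULTS, ["optimizer.steps=500", "prompt=rain on a tin roof"])
    print(cfg)
    for run in sweep(DEFAULTS, "optimizer.sigma", [0.05, 0.1, 0.2]):
        print(run["optimizer"])
```

Working on a deep copy keeps the defaults intact, so each run in a sweep starts from the same baseline, which is what makes the generated configurations reproducible.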