RaveFussion

License: MIT

Model Type: Speech Synthesis

RaveFussion is a text-to-audio generative pipeline that combines a fine-tuned Riffusion model (based on Stable Diffusion) with RAVE, an audio-to-audio autoencoder. The system translates text input into audio features via Riffusion and then refines these into high-quality waveforms using RAVE.

Key Features

Text-to-audio synthesis using a fine-tuned diffusion model
Integration of RAVE autoencoder for audio enhancement
Two-stage generation: visual embeddings to audio embeddings to final waveform
Configuration and scripting in a Python environment
Support for experimentation and research in generative audio
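
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class `TwoStagePipeline` and both `toy_*` functions are hypothetical stand-ins. The first stage fakes what a fine-tuned Riffusion model would produce (a small grid of spectrogram-like magnitudes), and the second fakes the decoding step that RAVE would perform, using a plain sum of sinusoids. No real model is loaded.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical intermediate types, chosen only for this sketch.
Features = List[List[float]]  # [frames][bins] spectrogram-like magnitudes
Waveform = List[float]        # mono samples in the range [-1, 1]


@dataclass
class TwoStagePipeline:
    """Chains a text-to-features stage with a features-to-audio stage."""
    text_to_features: Callable[[str], Features]       # stand-in for the diffusion stage
    features_to_audio: Callable[[Features], Waveform]  # stand-in for the autoencoder stage

    def generate(self, prompt: str) -> Waveform:
        features = self.text_to_features(prompt)
        return self.features_to_audio(features)


def toy_text_to_features(prompt: str, n_frames: int = 8, n_bins: int = 4) -> Features:
    # Placeholder for stage 1: derive deterministic magnitudes in [0, 0.9]
    # from the prompt's character codes instead of running a diffusion model.
    seed = sum(ord(c) for c in prompt)
    return [
        [((seed + 7 * f + 3 * b) % 10) / 10.0 for b in range(n_bins)]
        for f in range(n_frames)
    ]


def toy_features_to_audio(
    features: Features,
    samples_per_frame: int = 16,
    sample_rate: int = 8000,
    base_hz: float = 220.0,
) -> Waveform:
    # Placeholder for stage 2: treat each bin as one harmonic of base_hz and
    # sum the sinusoids, averaging over bins so samples stay within [-1, 1].
    wave: Waveform = []
    for f, frame in enumerate(features):
        for s in range(samples_per_frame):
            t = (f * samples_per_frame + s) / sample_rate
            value = sum(
                mag * math.sin(2 * math.pi * base_hz * (b + 1) * t)
                for b, mag in enumerate(frame)
            )
            wave.append(value / len(frame))
    return wave


pipeline = TwoStagePipeline(toy_text_to_features, toy_features_to_audio)
audio = pipeline.generate("acid techno with heavy bass")
print(len(audio))
```

Keeping the two stages behind plain callables means either placeholder could be swapped for a real model wrapper without changing `generate`.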

GitHub

Similar Projects

Creative Text-to-Audio Generation (CTAG)

PotPlayer ChatGPT Subtitle Translator

AudioLDM Google Colab Interface

Soundstorm – AI‑Powered Audio Toolkit

Generative AI Use Cases: Practical Applications for Business Operations

GANSpace: Discovering Interpretable GAN Controls