RaveFussion

RaveFussion

RaveFussion is a text-to-audio generative pipeline that combines a fine-tuned Riffusion model (based on Stable Diffusion) with RAVE, an audio-to-audio autoencoder. The system translates text input into audio features via Riffusion and then refines these into high-quality waveforms using RAVE.

Key Features

  • Text-to-audio synthesis using a fine-tuned diffusion model
  • Integration of RAVE autoencoder for audio enhancement
  • Two-stage generation: visual embeddings to audio embedding to final waveform
  • Configuration and scripting based on Python environment
  • Support for experimentation and research in generative audio