A self‑hosted full‑stack AI audio platform offering capabilities similar to ElevenLabs. It supports text‑to‑speech, voice conversion, and creative text‑to‑audio generation. The system combines Dockerized model containers, a FastAPI backend, and a Next.js frontend for a complete, interactive experience.
Key Features
Text‑to‑speech synthesis using StyleTTS2 models
Voice cloning/conversion with Seed‑VC
Creative text‑to‑audio synthesis using the Make‑An‑Audio module
Fine‑tuning support for custom voice identities
Containerized deployment via Docker and Docker Compose
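As a rough illustration of how the backend might hand each feature off to its model container, here is a minimal sketch using only the Python standard library. The service hostnames, the port, the /generate path, and the names MODEL_SERVICES, AudioJob, and build_request are all assumptions made for this example; they are not taken from the project's actual code or Compose file.

```python
import json
from dataclasses import dataclass
from urllib import request

# Hypothetical Compose service names and ports, one per model container.
# The real project may use different hostnames, ports, and routes.
MODEL_SERVICES = {
    "text-to-speech": "http://styletts2:8000",
    "voice-conversion": "http://seed-vc:8000",
    "text-to-audio": "http://make-an-audio:8000",
}


@dataclass
class AudioJob:
    """A unit of work submitted by the frontend (illustrative shape only)."""
    task: str      # one of the keys in MODEL_SERVICES
    payload: dict  # model-specific parameters, e.g. {"text": "..."}


def build_request(job: AudioJob) -> request.Request:
    """Build, but do not send, the HTTP request for the matching container."""
    try:
        base_url = MODEL_SERVICES[job.task]
    except KeyError:
        raise ValueError(f"unsupported task: {job.task!r}") from None

    body = json.dumps(job.payload).encode("utf-8")
    return request.Request(
        url=f"{base_url}/generate",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(AudioJob("text-to-speech", {"text": "Hello there"}))
    print(req.get_method(), req.full_url)
```

Keeping the task-to-service mapping in one table means a new model container could be added without touching the routing logic. Sending the request (for example with urllib.request.urlopen) is left out on purpose, since it only works when the containers are actually running.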