ElevenLabs Clone

Category: Other

License: MIT

Model Type: Speech Synthesis

A self‑hosted full‑stack AI audio platform offering capabilities similar to ElevenLabs. It supports text‑to‑speech, voice conversion, and creative text‑to‑audio generation. The system combines Dockerized model containers, a FastAPI backend, and a Next.js frontend for a complete, interactive experience.

Key Features

Text‑to‑speech synthesis using StyleTTS2 models
Voice cloning/conversion with Seed‑VC
Text‑to‑audio creative synthesis using the Make‑An‑Audio module
Fine‑tuning support for custom voice identities
Containerized deployment via Docker and Docker Compose
FastAPI backend providing scalable inference endpoints
Modern Next.js frontend featuring voice selection, playback UI, and user history
Authentication via Auth.js and credit-based usage tracking
Job queuing through Inngest to manage model workloads
Integration with AWS S3 for storing generated audio files
Multiple pre-trained voices included
Fully responsive UI built with Tailwind CSS
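
The credit-based usage tracking and job queuing listed above could fit together roughly as follows. This is a minimal, standard-library sketch and not the project's actual code: the names Platform, Job, COSTS, submit and run_next, the per-request credit costs, and the output key format are all assumptions made for illustration. The real system is described as using Inngest for queuing, Docker-hosted model containers for inference, and AWS S3 for storage, none of which are called here.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical credit cost per request type (assumed values, not from the project).
COSTS = {"tts": 1, "voice_conversion": 2, "text_to_audio": 3}


@dataclass
class Job:
    user: str
    service: str
    payload: str


@dataclass
class Platform:
    # Remaining credits per user.
    credits: dict
    # In-memory FIFO queue standing in for a real job queue such as Inngest.
    queue: deque = field(default_factory=deque)

    def submit(self, user: str, service: str, payload: str) -> bool:
        """Charge credits and enqueue a job; refuse if the balance is too low."""
        cost = COSTS[service]  # raises KeyError for an unknown service
        if self.credits.get(user, 0) < cost:
            return False
        self.credits[user] -= cost
        self.queue.append(Job(user, service, payload))
        return True

    def run_next(self):
        """Process the oldest queued job and return a placeholder storage key."""
        if not self.queue:
            return None
        job = self.queue.popleft()
        # A real worker would call the model container here and upload the
        # resulting audio to object storage; this sketch only builds a key.
        return f"{job.service}/{job.user}/{len(job.payload)}-chars.wav"
```

In this sketch credits are deducted at submission time, so a user cannot queue more work than the balance covers; a real deployment might instead refund credits when a job fails.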

Similar Projects

MMagic – OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox

DiffusionDB: A Large-Scale Text-to-Image Prompt Gallery Dataset

Graphite – Procedural 2D Vector and Raster Graphics Editor

Khoj: Self-Hosted AI Second Brain for Research and Automation

Paint-by-Example: Exemplar-based Image Editing with Diffusion Models

Text-to-Music Generation App
