SubToAudio: Subtitle-to-Audio Conversion with Coqui TTS

Category: Deep Learning

License: MIT

Model Type: Speech Synthesis

SubToAudio is a Python tool that converts subtitle files (e.g., .srt, .ass) into synchronized audio files. It uses Coqui TTS, an open-source text-to-speech engine, to generate speech from the subtitle text and aligns the resulting audio with the subtitle timestamps. This is useful for creating audio versions of videos and for accessibility purposes.
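To make the timing data concrete, here is a minimal Python sketch of reading .srt cues into start and end times in milliseconds. It is not taken from the SubToAudio codebase: the names `Cue` and `parse_srt` are hypothetical, and the only assumption is the usual .srt layout (index line, timing line, text lines, blank-line separator).

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One subtitle entry (hypothetical helper type, not SubToAudio's)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    # Convert hours, minutes, seconds, milliseconds to total milliseconds.
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Parse .srt text into cues; blocks without a timing line are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues
```

Each resulting cue carries the text to synthesize and the time window the speech should fall into.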

Key Features

Subtitle Synchronization: Aligns audio output with subtitle timestamps for accurate timing.
Multiple Subtitle Formats: Handles .srt, .ass, and other common subtitle formats.
Coqui TTS Integration: Leverages Coqui TTS for high-quality, open-source text-to-speech synthesis.
Language Support: Supports various languages through Coqui TTS models.
Pretrained Models: Offers pretrained models for quick setup and usage.
Command-Line Interface: Provides a simple CLI for easy integration into workflows.
Colab Notebook: Includes a Colab notebook for interactive usage and experimentation.
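The subtitle synchronization feature listed above can be pictured as a placement plan: how much silence to insert before each synthesized clip, and whether a clip needs to be sped up to fit its subtitle window. The following is a rough Python sketch under stated assumptions, not SubToAudio's actual method. `plan_timeline` is a hypothetical name, clip durations are assumed to be known already (for example, measured after the TTS engine produces each clip), and the speed-up rule is just one simple strategy.

```python
def plan_timeline(slots, clip_durations_ms):
    """Plan where each synthesized clip goes on the output track.

    slots: list of (start_ms, end_ms) subtitle windows, in order.
    clip_durations_ms: duration of the synthesized clip for each window.
    Returns a list of (silence_before_ms, speed_factor) per clip.
    """
    plan = []
    cursor = 0  # end of the audio placed so far, in ms
    for (start, end), duration in zip(slots, clip_durations_ms):
        # Start at the subtitle time, but never overlap the previous clip.
        begin = max(start, cursor)
        available = max(end - begin, 1)
        # Speed up only when the clip is longer than its window.
        speed = max(1.0, duration / available)
        played = duration / speed
        plan.append((begin - cursor, round(speed, 3)))
        cursor = begin + round(played)
    return plan


# Two subtitle windows; the second clip is too long for its 2-second window.
plan = plan_timeline([(1000, 3500), (4000, 6000)], [2000, 3000])
print(plan)  # [(1000, 1.0), (1000, 1.5)]
```

In this example the first clip plays at normal speed after one second of silence, while the second would need a 1.5x tempo change to end with its subtitle. A real tool might instead let long clips run over or shorten the following silence.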

Similar Projects

Auffusion: Leveraging Diffusion and Large Language Models for Text-to-Audio Generation

Deep Learning

OpenMusic: Quality-Aware Diffusion Transformer for Text-to-Music Generation

Deep Learning

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Deep Learning

TangoFlux: Super‑Fast and Faithful Text‑to‑Audio Generation via Flow Matching

Make‑An‑Audio: Prompt‑Enhanced Diffusion Model for Text‑to‑Audio Generation

Amphion: Real-Time Audio Generation Toolkit by OpenMMLab
