AI Subtitle Generator
Automatically recognises speech in videos and generates subtitles. Output can be an SRT file, burned-in subtitles (hardsub), or both. Captions can optionally be translated into another language.
What it does
Use this for Turkish captions on training videos, stylish Pill-format subtitles on social media clips, English-to-Turkish caption pipelines, or to produce SRT files for podcast videos.
How to use
- Drag video files into the list.
- Pick a Whisper Model (quality vs speed).
- Pick a Language (Auto or specific).
- Pick an Output Mode: Burn, SRT, Both, or AI Dubbing.
- Optionally turn on AI Translation and pick a target language.
- Set the Style (font, colour, position).
- Check the preview panel.
- Click Run.
Output modes
| Mode | What it does |
|---|---|
| Burn (hardsub) | Subtitles are baked into the video; no separate file is produced. |
| SRT | Produces only a .srt file; the video is left untouched. |
| Both | Burned video + SRT file. |
| AI Dubbing | Translates, captions, and also dubs the video. |
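
For readers who want to see what the SRT mode writes, here is a minimal sketch of the .srt layout: a running cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then the caption text, with a blank line between cues. The function names `srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from the tool itself.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as the body of an .srt file."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        cues.append(f"{index}\n{time_line}\n{text.strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Merhaba"), (1.5, 4.0, "Hoş geldiniz")]))
```

The text is written as UTF-8 in practice, so Turkish characters such as ş and ğ survive unchanged.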
Whisper models
| Model | Speed | Accuracy |
|---|---|---|
| tiny | Very fast | Low |
| base | Fast | Medium |
| small | Moderate | Good (default) |
| medium | Slow | High |
| large-v3 | Very slow | Highest |
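
The model names in the table match the checkpoints of the open-source `openai-whisper` Python package. As a rough sketch of what picking a model and language amounts to, assuming that package and ffmpeg are installed (the tool may use a different Whisper runtime internally), a transcription call could look like this. The `transcribe` wrapper and `MODEL_NAMES` tuple are hypothetical.

```python
MODEL_NAMES = ("tiny", "base", "small", "medium", "large-v3")


def transcribe(video_path, model_name="small", language=None):
    """Return Whisper segments as (start, end, text) tuples.

    language=None lets Whisper detect the language (the "Auto" choice);
    pass a code such as "tr" or "en" to force a specific one.
    """
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name!r}")

    # Assumption: the openai-whisper package is installed. Imported lazily
    # so the name check above works without it.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(video_path, language=language)
    return [(s["start"], s["end"], s["text"]) for s in result["segments"]]
```

Calling `transcribe("lesson.mp4", model_name="small", language="tr")` would correspond to the default model with Turkish selected; `lesson.mp4` is a placeholder file name.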
Style settings
- Font: Any font installed on your machine.
- Size: Between 12 and 72 pt.
- Colour: Text, stroke and shadow.
- Position: Bottom, top, middle.
- Pill mode: TikTok/Reels-style rounded background capsules.
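
To illustrate how style settings like these can reach a burned-in video, the sketch below builds an ffmpeg command that uses the `subtitles` filter with a `force_style` override. This is an assumption about one common way to hardsub an SRT file, not a description of the tool's internals. The helper names are hypothetical, position and Pill mode are left out, and ASS font sizes are scaled to the script resolution rather than being true points.

```python
def hex_to_ass_colour(hex_colour: str) -> str:
    """Convert '#RRGGBB' to the ASS form '&H00BBGGRR' (opaque, BGR order)."""
    value = hex_colour.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_colour!r}")
    int(value, 16)  # raises ValueError if the digits are not hexadecimal
    red, green, blue = value[0:2], value[2:4], value[4:6]
    return f"&H00{blue}{green}{red}".upper()


def force_style(font: str, size: int, text_colour: str, stroke_colour: str) -> str:
    """Build a force_style string; size is clamped to the 12-72 range."""
    size = max(12, min(72, size))
    return ",".join([
        f"FontName={font}",
        f"FontSize={size}",
        f"PrimaryColour={hex_to_ass_colour(text_colour)}",
        f"OutlineColour={hex_to_ass_colour(stroke_colour)}",
    ])


def burn_command(video: str, srt: str, output: str, style: str) -> list:
    """Return an ffmpeg argument list that burns srt into video."""
    # Paths containing ':' or quotes would need extra escaping for the filter.
    video_filter = f"subtitles={srt}:force_style='{style}'"
    return ["ffmpeg", "-i", video, "-vf", video_filter, "-c:a", "copy", output]


if __name__ == "__main__":
    style = force_style("Arial", 28, "#FFFFFF", "#000000")
    print(" ".join(burn_command("in.mp4", "subs.srt", "out.mp4", style)))
```

The resulting list can be passed to `subprocess.run`; the audio stream is copied, while the video is re-encoded because the captions are drawn onto every frame.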
Translation
The NLLB-200 model can translate captions into another language. A manual review dialog opens so you can fix the translation before burning.
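
As a hedged illustration of the translation step, the sketch below runs NLLB-200 through the Hugging Face `transformers` translation pipeline. The checkpoint name `facebook/nllb-200-distilled-600M` is an assumption (the tool may ship a different NLLB-200 variant), and `NLLB_CODES`, `nllb_code` and `translate_captions` are hypothetical names. NLLB identifies languages by FLORES-200 codes such as `eng_Latn` and `tur_Latn`.

```python
# Short language codes mapped to the FLORES-200 codes NLLB expects.
NLLB_CODES = {
    "en": "eng_Latn",
    "tr": "tur_Latn",
    "de": "deu_Latn",
    "fr": "fra_Latn",
}


def nllb_code(language: str) -> str:
    """Look up the FLORES-200 code for a short language code."""
    try:
        return NLLB_CODES[language]
    except KeyError:
        raise ValueError(f"no NLLB code configured for {language!r}") from None


def translate_captions(lines, source="en", target="tr",
                       model_name="facebook/nllb-200-distilled-600M"):
    """Translate caption lines one by one and return the translated strings."""
    src, tgt = nllb_code(source), nllb_code(target)

    # Assumption: the transformers package and a backend such as PyTorch
    # are installed. Imported lazily because the model is large.
    from transformers import pipeline

    translator = pipeline("translation", model=model_name,
                          src_lang=src, tgt_lang=tgt)
    return [item["translation_text"] for item in translator(list(lines))]
```

Translating each caption line separately keeps the original timing intact, at the cost of some context; this is one reason the manual review step is worth using.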
Examples
Turkish captions on a training video: Add the video, model small, language Turkish, mode SRT, run. You get a .srt.
Pill captions for social media: Add the video, model small, mode Burn, Pill style on, run. You get a burned-in video with a modern look.
English video with Turkish captions: Add the video, language English, AI Translation on, target Turkish, mode Burn, run.
Professional-grade accuracy: Add important videos, model large-v3, GPU on, run. Processing can take hours, but the result is very accurate.
Watch out
- The Whisper model downloads on first use (100 MB to 3 GB, depending on the model).
- Bigger models (medium, large-v3) need a lot of RAM. A GPU is strongly recommended.
- Translations of technical terms may be off; manual review helps.
- Burned subtitles cannot be removed, so keep the original video.
- Noisy recordings hurt accuracy.
License
This tool is available only with the Ultimate license because the AI engines are resource-heavy.