FFmpeg brings AI transcription to the command line
FFmpeg, the ubiquitous open-source media toolkit, has added a new audio filter, whisper (implemented in libavfilter as af_whisper), that embeds automatic speech recognition (ASR) directly into FFmpeg workflows. Built on the lightweight whisper.cpp runtime, which runs OpenAI's Whisper models on-device, the integration brings a powerful AI transcription model into media processing pipelines, moving FFmpeg beyond traditional encoding and filtering into AI-enabled content handling.
Key features of the af_whisper filter
Model selection and language options
af_whisper supports the range of whisper.cpp models (ggml-format files, from tiny through large), letting users pick the balance between speed and accuracy. You can also pin the transcription language, or leave it on auto-detection, to improve fidelity for multilingual content.
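As a minimal sketch (on the command line the filter is invoked as whisper; the input name and model path are placeholders, so check ffmpeg -h filter=whisper against your build), transcribing a video's audio track to plain text might look like this:

    # Transcribe the audio track of a video to a plain-text file.
    # The model is a ggml-format file downloaded via whisper.cpp;
    # -vn drops the video, and "-f null -" discards the filtered
    # audio, since only the transcript side effect is wanted.
    ffmpeg -i interview.mp4 -vn \
      -af "whisper=model=models/ggml-base.en.bin:language=en:destination=transcript.txt" \
      -f null -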
Flexible output formats
The filter can emit plain text, SRT subtitles, or structured JSON metadata. That makes it easy to generate subtitle files for videos and podcasts, feed automatic captions to streaming platforms, or pipe transcription metadata into downstream automation.
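Switching outputs is a matter of changing the format and destination options. A sketch using the same hypothetical model path, first writing SRT subtitles and then structured JSON:

    # Generate SRT subtitles for a podcast episode.
    ffmpeg -i episode.mp3 \
      -af "whisper=model=models/ggml-base.en.bin:format=srt:destination=episode.srt" \
      -f null -

    # Same input, but JSON metadata for downstream automation.
    ffmpeg -i episode.mp3 \
      -af "whisper=model=models/ggml-base.en.bin:format=json:destination=episode.json" \
      -f null -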
Live streaming, VAD, queueing, and GPU acceleration
af_whisper handles both pre-recorded audio and live streams. Voice Activity Detection (VAD) is available to skip non-speech audio and improve accuracy on sparse speech segments. A configurable audio queue trades transcription latency against accuracy (shorter queues yield faster but choppier output), and GPU acceleration can dramatically speed up processing on compatible hardware.
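A sketch combining these knobs follows; the queue, use_gpu, and vad_model option names reflect recent builds, and the Silero VAD model file name is an assumption based on whisper.cpp's model naming, so verify both against your installation:

    # A longer queue favors accuracy over latency; the VAD model
    # skips non-speech audio; use_gpu offloads inference to a
    # supported GPU.
    ffmpeg -i lecture.wav \
      -af "whisper=model=models/ggml-small.bin:queue=10:use_gpu=1:vad_model=models/ggml-silero-v5.1.2.bin:format=srt:destination=lecture.srt" \
      -f null -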
How af_whisper compares to external ASR services
Unlike cloud transcription services, the whisper.cpp-powered filter runs entirely on the local machine, offering lower latency, better privacy, and simpler automation. It replaces multi-step external workflows (exporting audio, uploading to a cloud API, retrieving transcripts) by consolidating everything into a single FFmpeg command line while still delivering high-quality ASR and standard subtitle formats such as SRT.
Advantages for creators and developers
This new filter saves time and reduces complexity for content creators, archivists, journalists, and developers. Benefits include on-device transcription, integrated subtitle generation, output metadata for indexing and search, and a single-tool pipeline that supports automation and batch processing.
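As a rough sketch of single-tool batch processing (directory layout and model path are placeholders), a shell loop is enough to caption a whole folder of videos:

    # Hypothetical batch job: write an SRT file next to every MP4.
    for f in ./videos/*.mp4; do
      ffmpeg -i "$f" -vn \
        -af "whisper=model=models/ggml-base.en.bin:format=srt:destination=${f%.mp4}.srt" \
        -f null -
    done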
Practical use cases
Use cases include creating SRT captions for videos and podcasts, live captioning for streams and broadcasts, searchable transcripts for archives, and automated metadata generation for content management systems. The combination of VAD, GPU support, and flexible outputs makes af_whisper suitable for both real-time applications and large-scale batch jobs.
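For live captioning, the same filter can sit on a stream input. This sketch assumes a hypothetical RTMP source and uses a short queue to keep latency down, at some cost in accuracy:

    # Live captions from a stream; a shorter queue lowers latency.
    ffmpeg -i rtmp://example.com/live/show \
      -af "whisper=model=models/ggml-base.en.bin:queue=3:format=srt:destination=captions.srt" \
      -f null -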
Market relevance and future directions
Embedding whisper.cpp into FFmpeg sets a precedent for adding more AI and machine learning models to the platform. This move reinforces FFmpeg's position as an industry-standard media tool and signals wider adoption of AI across media tooling. As on-device AI and hybrid workflows grow, expect FFmpeg to continue evolving with additional AI-driven filters and optimizations.
Getting started
To try af_whisper, update to FFmpeg 8.0 or a newer build that includes the filter, then explore the options for model, language, output format, VAD, queueing, and GPU acceleration. For many users, this single-filter approach replaces cumbersome multi-tool transcription pipelines while improving speed, privacy, and automation.
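To confirm the filter is available and see which options your build actually exposes:

    # List the filter and print its documented options.
    ffmpeg -filters | grep whisper
    ffmpeg -h filter=whisper

When compiling FFmpeg from source, my understanding is that the filter must be enabled at configure time with --enable-whisper (it requires an installed whisper.cpp that pkg-config can find); verify the flag against ./configure --help:

    ./configure --enable-whisper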
