[Image: Milo, the AVR mascot, a friendly blue android]

● OPEN SOURCE / SELF-HOSTABLE / ASTERISK-NATIVE

TALK TO YOUR INFRASTRUCTURE

Agent Voice Response turns any Asterisk PBX into a real-time, interruptible, sub-second voice AI agent. Your providers, your servers, your rules. Free forever.

OpenAI ◆ Anthropic ◆ Deepgram ◆ ElevenLabs ◆ Google ◆ Ollama ◆ Vosk ◆ Kokoro ◆ Ultravox ◆ OpenRouter ◆ CoquiTTS ◆ Silero

01 THE SIGNAL

Every phone call is a stream of signal waiting for a mind on the other end. AVR puts one there — listening, thinking and answering in the same breath — without sending your callers' voices to anyone you didn't choose.

Built as composable microservices: a core that speaks Asterisk's AudioSocket natively, with hot-swappable ASR, LLM and TTS engines. Run it all on your own metal with Ollama and Vosk, or wire in OpenAI Realtime — same dialplan, your call.

No license keys. No per-minute tax. No vendor gravity. AVR is free for personal and commercial use, developed in the open, and already answering calls for thousands of developers and businesses.

02 CAPABILITIES

Engineered for the speed of speech

F·01

Real-time speech-to-speech

Ultra-low-latency conversations with sub-second round trips. The pause between question and answer is shorter than a human "um".

< 800ms RTT

F·02

Voice activity detection

Callers can interrupt mid-sentence and the agent yields instantly — barge-in handling that makes the conversation feel human, not IVR.

BARGE-IN NATIVE

F·03

Noise suppression

AI-powered noise cancellation and echo suppression scrub the line before the model ever hears it. Call centers sound like studios.

DENOISE + ECHO

F·04

Asterisk-native integration

First-class AudioSocket support for Asterisk 18+. Drops into FreePBX, VitalPBX, Vicidial, Elastix and custom dialplans without adapters.

AUDIOSOCKET / 18+

03 ANATOMY OF A CALL

One breath, six hops

Caller (PSTN / SIP) → Asterisk (AudioSocket) → AVR Core (VAD + denoise) → ASR (≈150 ms) → LLM (≈300 ms) → TTS (≈150 ms)

» Caller speaks. Raw audio enters the PBX.
» Asterisk streams PCM over AudioSocket, with no transcoding detours.
» AVR Core detects speech, scrubs noise, segments the utterance.
» ASR turns the waveform into words. Deepgram, Google, or local Vosk.
» The LLM reasons and drafts a reply. GPT, Claude, or local Ollama.
» TTS speaks. Total round trip: under one second. Caller never noticed.
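
The per-stage figures above can be checked with a little arithmetic. A minimal sketch in Python, assuming the three model stages run strictly back to back (a pessimistic case, since pipelining would overlap them) and treating the approximate figures from the diagram as exact; the names `STAGE_LATENCY_MS` and `RTT_BUDGET_MS` are illustrative, not AVR settings:

```python
from typing import Dict

# Approximate per-stage latencies quoted in the diagram, in milliseconds.
STAGE_LATENCY_MS: Dict[str, int] = {"ASR": 150, "LLM": 300, "TTS": 150}

# Round-trip target quoted on this page ("< 800ms RTT").
RTT_BUDGET_MS = 800


def round_trip_ms(stages: Dict[str, int]) -> int:
    """Sum stage latencies, assuming no overlap between stages."""
    return sum(stages.values())


total = round_trip_ms(STAGE_LATENCY_MS)
headroom = RTT_BUDGET_MS - total
print(f"model stages: {total} ms, headroom: {headroom} ms")
```

Whatever headroom remains has to cover transport, speech detection and buffering, which the page does not break down.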

04 PATCH PANEL

Your stack, your switchboard

Every stage of the pipeline is hot-swappable. Mix cloud horsepower with local privacy — per agent, per call, per line of dialplan.
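
One way to picture "hot-swappable" is a fixed interface per stage. The sketch below is an in-process illustration only: AVR ships its engines as separate services, and every class name here (`Agent`, `FakeASR`, `EchoLLM`, `UpperLLM`, `FakeTTS`) is a hypothetical stand-in rather than part of AVR.

```python
from typing import Protocol


class ASR(Protocol):
    def transcribe(self, pcm: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Agent:
    """Hypothetical pipeline that depends only on the three interfaces."""

    def __init__(self, asr: ASR, llm: LLM, tts: TTS) -> None:
        self.asr, self.llm, self.tts = asr, llm, tts

    def handle(self, pcm: bytes) -> bytes:
        return self.tts.synthesize(self.llm.reply(self.asr.transcribe(pcm)))


# Stand-in engines so the sketch runs without any provider account.
class FakeASR:
    def transcribe(self, pcm: bytes) -> str:
        return f"{len(pcm)} bytes heard"


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


agent_a = Agent(FakeASR(), EchoLLM(), FakeTTS())
agent_b = Agent(FakeASR(), UpperLLM(), FakeTTS())  # only the LLM stage changed
print(agent_a.handle(b"\x00" * 320))
print(agent_b.handle(b"\x00" * 320))
```

Because `Agent` never names a concrete engine, swapping one stage leaves the other two untouched, which is the property the patch panel describes.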

ASR — SPEECH RECOGNITION

Google (CLOUD)
Deepgram (CLOUD)
ElevenLabs (CLOUD)
Soniox (CLOUD)
Vosk (LOCAL)

LLM — REASONING

OpenAI (CLOUD)
Anthropic (CLOUD)
OpenRouter (CLOUD)
Ollama (LOCAL)

TTS — SPEECH SYNTHESIS

Google (CLOUD)
Deepgram (CLOUD)
ElevenLabs (CLOUD)
Cartesia (CLOUD)
Soniox (CLOUD)
Rime (CLOUD)
CoquiTTS (LOCAL)
Kokoro (LOCAL)

REALTIME — SPEECH-TO-SPEECH

OpenAI Realtime (CLOUD)
Ultravox (CLOUD)
Deepgram (CLOUD)
ElevenLabs (CLOUD)
xAI (CLOUD)
Sarvam (CLOUD)
Gemini (CLOUD)
HumeAI (CLOUD)
Speechmatics (CLOUD)

05 ON THE LINE

U·01

24/7 customer support

Answer every call on the first ring, at 3 a.m., in any language, without a queue.

U·02

Contact-center automation

Deflect tier-one volume on Vicidial and FreePBX floors while agents take the calls that need a human.

U·03

Healthcare assistants

Appointment booking and triage lines that run on your own servers, so patient audio never leaves them.

U·04

Voice commerce

Take orders, check stock, and upsell over the phone with an agent that never misquotes a price.

U·05

Interactive voice learning

Language drills and training hotlines with a tutor that listens, corrects, and never gets tired.

U·06

IoT voice control

A phone number as a control plane — call your building, your fleet, your lab, and tell it what to do.

< 800 ms voice-to-voice round trip

Open source, forever

$0 license cost, cloud or on-prem

06 FREE AS IN SPEECH

No license keys.
No per-minute tax.
No catch.

AVR is provided free of charge for personal and commercial use. The code is on GitHub, the images are on Docker Hub, and the roadmap is argued about in public on Discord.

If it saves your team money, you can buy the maintainers a coffee — donations are voluntary and buy you exactly nothing extra. That's the point.

07 STATION MANUAL

Asked on the line

How does real-time speech-to-speech actually work?

Audio streams from Asterisk into AVR Core over AudioSocket. Voice activity detection segments speech as it happens, ASR transcribes incrementally, the LLM streams its reply, and TTS speaks it back — all stages pipelined so the total round trip stays under a second. With realtime providers like OpenAI Realtime or Ultravox, ASR/LLM/TTS collapse into a single speech-native model.
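
The pipelining described above can be sketched with Python generators: speech synthesis starts on the first finished sentence instead of waiting for the whole reply. This is an illustration, not AVR's implementation. The reply text, the `consumed` counter and the naive punctuation-based sentence splitter are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List

consumed: List[str] = []  # records how many LLM tokens have been pulled so far


def llm_tokens() -> Iterator[str]:
    """Stand-in for a streaming LLM reply (hypothetical text)."""
    for tok in ["Your ", "order ", "shipped. ", "It ", "arrives ", "Friday."]:
        consumed.append(tok)
        yield tok


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Emit a sentence as soon as its closing punctuation arrives (naive)."""
    buf = ""
    for tok in tokens:
        buf += tok
        if buf.rstrip().endswith((".", "?", "!")):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()


def tts(chunks: Iterable[str]) -> Iterator[bytes]:
    """Stand-in synthesizer: one audio chunk per sentence."""
    for text in chunks:
        yield text.encode("utf-8")


audio = tts(sentences(llm_tokens()))
first = next(audio)              # ready before the LLM has finished
tokens_before_first_audio = len(consumed)
rest = list(audio)
print(first, tokens_before_first_audio, rest)
```

The first audio chunk is available after three of the six tokens, which is why time to first sound can be much shorter than the time to generate the full reply.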

Does it work with my FreePBX / VitalPBX / Vicidial setup?

Yes. AVR speaks AudioSocket, which is native to Asterisk 18+. Anything built on Asterisk — FreePBX, VitalPBX, Vicidial, Elastix, or a hand-rolled dialplan — can route calls to an AVR agent with a few lines of configuration.
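
For a sense of what speaking AudioSocket involves on the wire, here is a small frame parser. It assumes the framing described in the Asterisk AudioSocket protocol notes: a 1-byte message type, a 2-byte big-endian payload length, then the payload, with audio carried as 16-bit signed linear PCM at 8 kHz. Treat the constants as something to verify against your Asterisk version; `parse_frames` and `frame` are hypothetical helper names, not AVR functions.

```python
import struct
from typing import Iterator, Tuple

# Message types as understood from the AudioSocket protocol description.
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10  # signed linear 16-bit, 8 kHz, mono PCM
KIND_ERROR = 0xFF

HEADER = struct.Struct("!BH")  # 1-byte type, 2-byte big-endian length


def parse_frames(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, payload) for each complete frame found in buf."""
    offset = 0
    while offset + HEADER.size <= len(buf):
        kind, length = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buf):
            break  # incomplete frame; a real server would wait for more bytes
        yield kind, buf[start:end]
        offset = end


def frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one frame, used here only to fabricate test input."""
    return HEADER.pack(kind, len(payload)) + payload


# 320 bytes is 20 ms of 8 kHz 16-bit mono audio.
stream = frame(KIND_UUID, bytes(16)) + frame(KIND_AUDIO, b"\x00" * 320) + frame(KIND_HANGUP)
for kind, payload in parse_frames(stream):
    print(hex(kind), len(payload))
```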

Can I run everything locally, with no cloud at all?

Completely. Pair Vosk or Silero for recognition, Ollama for reasoning, and CoquiTTS or Kokoro for synthesis. Caller audio never leaves your network — which is why healthcare and finance teams pick this configuration.

How do callers interrupt the agent?

AVR's voice activity detection runs continuously, even while the agent is speaking. The instant a caller starts talking, playback stops and the new utterance takes priority — the barge-in behaviour you'd expect from a human operator.
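
A toy version of that behaviour, assuming a simple energy threshold in place of a real voice activity detector. The `BargeIn` class, the threshold of 500 and the three-frame debounce are illustrative choices, not AVR's actual parameters.

```python
import math
import struct
from dataclasses import dataclass, field
from typing import List


def rms(pcm: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


@dataclass
class BargeIn:
    threshold: float = 500.0  # hypothetical energy threshold
    min_frames: int = 3       # consecutive loud frames before yielding
    playing: bool = False
    loud_run: int = 0
    events: List[str] = field(default_factory=list)

    def start_playback(self) -> None:
        self.playing = True
        self.loud_run = 0

    def on_caller_frame(self, pcm: bytes) -> None:
        """Runs for every inbound frame, including while the agent speaks."""
        self.loud_run = self.loud_run + 1 if rms(pcm) >= self.threshold else 0
        if self.playing and self.loud_run >= self.min_frames:
            self.playing = False
            self.events.append("stop_playback")


quiet = struct.pack("<160h", *([0] * 160))    # 20 ms of silence at 8 kHz
loud = struct.pack("<160h", *([4000] * 160))  # 20 ms of constant-level signal

ctl = BargeIn()
ctl.start_playback()
for pcm in [quiet, loud, loud, quiet, loud, loud, loud]:
    ctl.on_caller_frame(pcm)
print(ctl.playing, ctl.events)
```

The short debounce trades a few tens of milliseconds of reaction time for not cutting the agent off on a click or a cough; a production detector would use a trained model rather than raw energy.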

Will it scale to a high-volume call center?

AVR is a set of stateless microservices — core, ASR, LLM, TTS — shipped as Docker images. Scale each stage horizontally behind your load balancer; the architecture is the same one running high-volume contact-center floors today.

How do I get started?

Pull the images from Docker Hub, point the compose file at your providers, and add an extension to your dialplan. The documentation walks through a working agent in minutes, and the Discord is where everyone compares configs.