Voice Stack

Definition

harmony-core’s proprietary voice pipeline. 70ms response latency vs competitor (e.g. elevenlabs-based) ~800ms. Local models for each conversation stage (closing, negotiation, objection-handling) rather than one large monolithic model — enables targeted optimization per stage.

Acquired team built this over ~3 years. This is the engineering asset Saar/Vitali point to when arguing the 14-engineer team can credibly compete against $100M+ funded competitors in 5 months (OQ-012) — the gap is productization, not engineering.

DEC-005 — Self-learning is the differentiator; voice stack enables targeted improvement per stage
DEC-006 — Voice stack is what enables the July 1 production target

Sources

offsite-2026-04-14 — capability discussed
vision-2026-04-15 — used to defend the timeline against Nadav’s skepticism

vitali — architect
elevenlabs — the comparison baseline (800ms, customer-prompted, vulnerable to OSS disruption)
oneai, retell, bland, vapi — competing voice-agent stacks (often ElevenLabs wrappers)

self-learning-voice-agent — the productized expression
self-learning-loop — voice stack’s per-stage models make targeted learning possible
voice-as-feature — voice stack is the feature; brain is the core

Cost economics

Per-call costs in single-digit cents (potentially eliminating Twilio dependency)
Lower cost than 11Labs-based competitors

Harmony Brain

Explorer

voice-stack

Voice Stack

Definition

Sources

Cost economics

Graph View

Table of Contents

Backlinks

Harmony Brain

Explorer

voice-stack

Voice Stack

Definition

Related decisions

Sources

Related entities

Related concepts

Cost economics

Graph View

Table of Contents

Backlinks