Voice Stack

Definition

harmony-core’s proprietary voice pipeline. 70ms response latency vs competitor (e.g. elevenlabs-based) ~800ms. Local models for each conversation stage (closing, negotiation, objection-handling) rather than one large monolithic model — enables targeted optimization per stage.

Acquired team built this over ~3 years. This is the engineering asset Saar/Vitali point to when arguing the 14-engineer team can credibly compete against $100M+ funded competitors in 5 months (OQ-012) — the gap is productization, not engineering.

  • DEC-005 — Self-learning is the differentiator; voice stack enables targeted improvement per stage
  • DEC-006 — Voice stack is what enables the July 1 production target

Sources

  • vitali — architect
  • elevenlabs — the comparison baseline (800ms, customer-prompted, vulnerable to OSS disruption)
  • oneai, retell, bland, vapi — competing voice-agent stacks (often ElevenLabs wrappers)

Cost economics

  • Per-call costs in single-digit cents (potentially eliminating Twilio dependency)
  • Lower cost than 11Labs-based competitors