Hey HN, I'm Arkadiy from Leaping AI (https://leapingai.com). Leaping lets you build voice AI agents in a multi-stage, graph-like format that makes testing and improvement much easier. By evaluating each stage of a call, we can trace errors and regressions to a particular stage. Then we autonomously vary the prompt for that stage and A/B test it, allowing agents to self-improve over time.
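To make the stage idea concrete, here's a toy sketch (in Python, with made-up field names, not our actual schema) of what a multi-stage agent looks like: each stage carries its own prompt, its own evals, and edges to the stages it can hand off to, which is what lets us pin a bad call on one stage and A/B test prompt variants just for that stage.

    # Toy illustration only -- not Leaping's actual schema or API.
    from dataclasses import dataclass, field

    @dataclass
    class Stage:
        name: str
        prompt: str                                      # prompt variant currently live
        next_stages: list = field(default_factory=list)  # edges in the call graph
        evals: list = field(default_factory=list)        # per-stage checks

    agent = [
        Stage("greeting", "Greet the caller and ask what they need.",
              next_stages=["collect_details"], evals=["intent_captured"]),
        Stage("collect_details", "Collect name, callback number, and reason for calling.",
              next_stages=["handoff"], evals=["all_fields_collected"]),
    ]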

You can talk to one of our bots directly at https://leapingai.com, and there’s a demo video at https://www.youtube.com/watch?v=xSajXYJmxW4.

Large companies are understandably reluctant to have AI start picking up their phone calls: the technology kind of works, but often not well enough. Those that do take the plunge often spend months tuning prompts for a single use case, and sometimes never release the voice bot at all.

The problem is two-sided: it's non-trivial to specify the exact way a bot should behave using plain language, and it's tedious to ensure the LLM always follows your instructions the way you intended them.

Existing voice AI solutions are a pain to set up for complex use cases: months of prompting to cover edge cases before going live, then months more of monitoring and prompt tweaking afterwards. We do that work better than human prompters, and much faster, by running a continuous analysis-and-testing loop.

Our tech is roughly divided into three subcomponents: the core library, the voice server, and the self-improvement logic. The core library models and executes the multi-stage (think n8n-style) voice agents. For the voice server we use the ol’ reliable cascading approach of STT->LLM->TTS. We tried the voice-to-voice models, and although they felt really great to talk to, their function-calling performance was, as expected, much worse, so we are still waiting for them to get better.
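For the curious, the per-turn control flow of the cascaded setup is roughly the following (a minimal sketch, not our production voice server; stt, llm, and tts stand in for whichever providers you plug in):

    def handle_turn(audio_in, history, stage_prompt, stt, llm, tts):
        # One cascaded voice turn: speech-to-text, then the LLM, then text-to-speech.
        user_text = stt(audio_in)
        history.append({"role": "user", "content": user_text})
        reply = llm([{"role": "system", "content": stage_prompt}] + history)
        history.append({"role": "assistant", "content": reply})
        return tts(reply)

In practice each step is streamed to keep latency down, but the overall shape is the same.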

The self-improvement works by first taking conversation metrics and evaluation results and turning them into ‘feedback’, i.e. specific ideas for how the voice agent setup could be improved. Once enough feedback has been collected, we trigger a run of a specialized self-improvement agent: a Cursor-style AI with access to various tools for changing the main voice agent. It can rewrite prompts, configure a stage to use a summarized conversation instead of the full one, and more. Each iteration produces a new snapshot of the agent, so we can route a small share of traffic to it and promote it to production if things look ok. This loop can be set to run without any human involvement, which is what makes agents self-improve.
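Roughly, in pseudocode (illustrative names and numbers only, not our actual code):

    import random

    # Illustrative sketch of the loop above, not Leaping's real implementation.
    def pick_snapshot(prod, candidate, canary_share=0.05):
        # While a candidate snapshot is being A/B tested, route a small share
        # of live calls to it and keep the rest on the production snapshot.
        if candidate is not None and random.random() < canary_share:
            return candidate
        return prod

    def maybe_promote(prod, candidate, score):
        # score(snapshot) -> aggregate eval result over its recent calls.
        # Promote the candidate only if it does at least as well as production.
        return candidate if score(candidate) >= score(prod) else prod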

Leaping is use-case agnostic, but we currently focus on inbound customer support (travel, retail, real estate, etc.) and lead pre-qualification (Medicare, home services, performance marketing), since that's where we have the most success stories.

We started out in Germany, since that's where we were in university, but growth was initially challenging. We decided to target enterprise customers right away, and they were reluctant to adopt voice AI as the front-door ‘face’ of their company. On top of that, for an enterprise with thousands of calls a day, it is infeasible to monitor every call and tune agents manually. To address those very valid concerns, we put all our effort into reliability, and still haven’t gotten around to offering self-serve access, which is one reason we don’t have fixed pricing yet. (Also, with some clients we have outcome-based pricing, i.e. you pay nothing for calls that didn't convert a lead, only for the ones that did.)

Things have picked up momentum since we got into YC and moved to the US, but the same cautious sentiment is present here too when you try to sell to big enterprises. We believe that doing evals, simulation, and A/B testing really, really well is our competitive edge and what will let us take on large, sensitive use cases.

We’d love to hear your thoughts and feedback!