Exciting Updates: Voice Agent SDK & Framework APIs

Published: 16 April 2026

Developers are at the heart of our ecosystem. We’re building tools that make it easy to integrate voice into any application, whether you’re prototyping a voice bot, building multilingual customer support or powering an embedded device. Today we’d like to share a sneak peek at our Voice Agent SDK, our Framework APIs and the technical foundations that will soon be available for you to build on.

Voice Agent SDK

Our upcoming SDK is designed for simplicity, flexibility and performance:

  • Multi‑platform: Native support for Python, JavaScript (Node.js and browser), and mobile platforms (Android/iOS). You can embed voice interactions in a web app or a microcontroller.
  • Streaming architecture: Full‑duplex streaming over WebSocket or gRPC. The SDK handles audio capture, encoding, transcription, translation and synthesis end to end, returning responses in real time.
  • Language and dialect detection: Built‑in language identification lets your bot adapt to speakers who switch between Hindi, English and regional languages mid‑sentence, taking advantage of our code‑switching research.
  • Low latency: Optimised pipelines and caching are designed to keep round trips under 200 ms for interactive experiences.
  • Security and consent: The SDK includes encryption, user‑auth tokens, watermarking and consent prompts, reflecting best practices from voice cloning research.
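
To make the full‑duplex idea concrete, here is a minimal sketch of the pattern in plain Python asyncio. It does not use the SDK, which is not yet released: `fake_agent`, `send_audio`, `receive_events` and `run_session` are hypothetical names, and in‑memory queues stand in for the WebSocket/gRPC connection. The point it illustrates is that sending audio and receiving responses run concurrently instead of taking turns.

```python
import asyncio


async def fake_agent(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for the remote voice agent (not the real service)."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # end-of-stream sentinel from the sender
            await outbound.put(None)
            return
        # A real agent would return transcripts or synthesised audio here.
        await outbound.put(f"partial transcript for {len(chunk)} bytes")


async def send_audio(chunks, inbound: asyncio.Queue) -> None:
    """Push audio chunks upstream without waiting for responses."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so the receiver can run between sends
    await inbound.put(None)


async def receive_events(outbound: asyncio.Queue) -> list:
    """Collect responses as they arrive, independently of the sender."""
    events = []
    while True:
        event = await outbound.get()
        if event is None:
            return events
        events.append(event)


async def run_session(chunks) -> list:
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    agent = asyncio.create_task(fake_agent(inbound, outbound))
    # Sender and receiver run concurrently: this is the full-duplex part.
    _, events = await asyncio.gather(
        send_audio(chunks, inbound), receive_events(outbound)
    )
    await agent
    return events


if __name__ == "__main__":
    print(asyncio.run(run_session([b"\x00" * 320, b"\x00" * 160])))
```

In a real integration the two queues would be replaced by the send and receive sides of a single streaming connection; the concurrency structure would stay the same.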

Framework APIs

We’re also releasing a suite of modular APIs:

  • Adapters: Protocol‑agnostic adapters for REST, gRPC and WebSocket. These provide unified access to our ASR, TTS, translation and voice cloning models.
  • Models: Parameterised endpoints to fine‑tune inference behaviour (e.g., specify target language, pick a synthetic voice or adjust speaking style).
  • Authentication & metering: Secure API keys, role‑based access control and usage metrics, ready for deployment at scale.
  • Event hooks: Webhooks and Pub/Sub events for streaming transcripts, sentiment scores and user‑level analytics, enabling event‑driven architectures.
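
As an illustration of how an application might consume event hooks, the sketch below verifies and dispatches an incoming webhook using only the Python standard library. Everything specific in it is an assumption, since the API is not yet published: the event name `transcript.partial`, the payload fields, the HMAC‑SHA256 signing scheme and the helper names `on`, `sign` and `handle_webhook` are all hypothetical.

```python
import hashlib
import hmac
import json

# Registry mapping event type -> handler function (hypothetical event names).
HANDLERS = {}


def on(event_type):
    """Decorator that registers a handler for one event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


def sign(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 hex signature (assumed scheme, not confirmed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(secret: bytes, body: bytes, signature: str):
    """Verify the signature, then route the event to its handler."""
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(sign(secret, body), signature):
        raise PermissionError("signature mismatch")
    event = json.loads(body)
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None  # ignore event types nobody subscribed to
    return handler(event["data"])


@on("transcript.partial")
def format_transcript(data):
    return f"[{data['language']}] {data['text']}"


if __name__ == "__main__":
    secret = b"demo-secret"
    body = json.dumps(
        {"type": "transcript.partial", "data": {"language": "hi", "text": "namaste"}}
    ).encode()
    print(handle_webhook(secret, body, sign(secret, body)))
```

Keeping handlers in a registry keyed by event type means new event kinds, such as sentiment scores, can be supported by adding one decorated function rather than editing the dispatch logic.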

Roadmap and Collaboration

Here are the milestones we’re working towards:

  • Beta release of SDK & adapters: We aim to release an open beta in the coming months with sample applications and documentation. Early adopters can experiment with voice chatbots, cross‑language dictation and interactive voice UIs.
  • Model customisation: A library of pre‑trained voices and the ability to fine‑tune small models on domain‑specific data. Combining our data‑efficient voice cloning pipeline with your brand voice will allow tailored experiences.
  • Community feedback loop: We plan to open source parts of the SDK, invite pull requests and share our roadmap. Developers can submit feature requests or contribute adapters for additional languages and frameworks.
  • Full release: After the beta, we will ship the stable SDK and Framework APIs with comprehensive documentation, CLI tools and integration templates.

Stay tuned for further updates. We can’t wait to see what you build when voice is a native capability in every application.