Tomas • January 3, 2025 • 4-minute read
At Hello, we've been building, deploying, and running voice agents for a range of use cases and at various scales for almost two years now. Before that, my co-founder and I built a venture-backed speech-to-text company focused on transcribing legal proceedings, which currently transcribes thousands of court hearings.
While developing our own voice agent, we ran into an interesting challenge: how do you ensure that testing reflects real-world interactions? When testing the agent ourselves, we realized that we were subconsciously adapting our speaking patterns to match what the agent could handle. This phenomenon, known as acoustic convergence, skews results by creating an artificial ideal where the agent appears to perform better than it would in reality. It became clear that to truly perfect our voice agent, we needed more diversity in our testing.
To bring diversity into our voice agent testing, we turned to platforms like Mechanical Turk and Prolific. These platforms gave us access to thousands of participants on demand, providing a wide range of speech styles, accents, and conversational behaviors. This level of diversity was invaluable—not only for refining our agent but for uncovering a surprising new insight: people love to talk.
As participants interacted with our voice interface, we noticed that they shared far more than we expected. Conversations were deeper, more detailed, and often touched on unexpected topics. To quantify this, we analyzed a sample of 1,000 AI interviews conducted through our agent. The results were fascinating:
When we asked participants why they spoke so much, their responses were telling. Many said they found it easier to express their thoughts verbally than through typing. Others mentioned the convenience of speaking while multitasking. Even if there's some selection bias—since we were targeting people willing to engage with a voice agent—it's clear that there's a significant segment of the population that prefers speaking over writing.
For comparison, we ran a control survey using Google Forms. While the written responses were valuable, the diversity and depth of the spoken conversations stood out. Not only did the AI interviews yield richer insights, but the process of speaking also seemed to encourage spontaneity and openness.
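One rough way to put numbers on "depth" and "diversity" when comparing spoken transcripts with written answers is to look at response length and vocabulary variety. The sketch below is illustrative only: the function names, the sample responses, and the choice of metrics (mean word count and type-token ratio) are assumptions made for this example, not the analysis we ran on our interviews.

```python
import re
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def response_stats(responses: list[str]) -> dict[str, float]:
    """Return the mean word count and mean type-token ratio of the responses.

    Type-token ratio = unique words / total words for a single response.
    Empty responses are skipped.
    """
    lengths: list[int] = []
    ratios: list[float] = []
    for response in responses:
        tokens = tokenize(response)
        if not tokens:
            continue  # nothing to measure in an empty answer
        lengths.append(len(tokens))
        ratios.append(len(set(tokens)) / len(tokens))
    if not lengths:
        return {"mean_words": 0.0, "mean_ttr": 0.0}
    return {"mean_words": mean(lengths), "mean_ttr": mean(ratios)}


if __name__ == "__main__":
    # Hypothetical sample data, invented purely for illustration.
    spoken = [
        "I paint, mostly landscapes, and I picked it up again a few years ago.",
        "Honestly I cook a lot, and I like trying recipes my grandmother used.",
    ]
    written = ["Painting.", "Cooking and reading."]

    print("spoken: ", response_stats(spoken))
    print("written:", response_stats(written))
```

Type-token ratio tends to fall as responses get longer, so it is best compared across responses of similar length or used alongside other measures rather than on its own.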
This observation led us to a key insight: voice interfaces engage a different mode of thinking. Drawing on Daniel Kahneman's Thinking, Fast and Slow, we believe that voice interactions primarily activate System 1 thinking—fast, intuitive, and emotional. By contrast, written surveys engage System 2 thinking—slow, deliberate, and analytical.
We think this distinction helps explain the richer and more diverse responses we see in AI interviews conducted via a voice agent. When people speak, they're less constrained by the analytical filters of System 2 thinking. This allows for more authentic, spontaneous, and varied interactions, which are invaluable for applications like market research, customer feedback, and even entertainment.
One of the most memorable examples came from an AI interview about hobbies. A participant started with a simple answer—painting—but soon launched into an emotional story about how they rediscovered their creativity during the pandemic and began selling their work online. This level of depth and narrative richness is rare in written surveys but common in voice-based interactions.
These moments highlight the transformative potential of voice agents. They don't just collect data; they uncover stories, perspectives, and ideas that might otherwise remain untapped.
At Hello, we believe that voice interfaces are the future of human-computer interaction. By leveraging the intuitive nature of System 1 thinking, they open up entirely new possibilities for engagement. Whether it's gathering insights through AI interviews, creating dynamic customer experiences, or redefining entertainment, voice agents are poised to play a central role.
Every day, as we refine our own voice agent, I'm amazed by what people are willing to share when given the chance to speak. The richness of these conversations underscores the immense potential of voice interfaces to revolutionize how we interact with technology—and with each other.