"Can you say that again?"
"Can you say that again?"
"Sorry, I didn't catch that."
For millions of people, these phrases are part of daily life—not from humans, but from voice assistants and AI-powered systems that fail to understand them.
The reason? Bias in how speech AI listens. Most models are trained on dominant speech patterns—American English, male voices, standard accents. Everyone else is a statistical afterthought.
A Moment That Sparked a Mission
"I kept seeing how voice systems failed to understand people who didn't fit the standard mold—whether because of their accent, age, or tone. These weren't just isolated glitches; they were signals that AI was leaving entire groups behind.
"WavShape grew out of the belief that machines can do better—that they can learn to listen fairly, and forget responsibly."
The Problem: AI That Listens Too Closely
Today's speech models—like Whisper or wav2vec—are incredibly powerful, but they over-listen. They extract:
- What you say (words, phonemes)
- How you say it (gender, accent, emotional state, age, regional identity)
This leads to serious consequences:
- Bias propagation: AI favors standard voices, penalizes variation
- Privacy leakage: Even anonymized speech can expose identity
- Unequal access: Non-dominant voices get left behind
Supporting Evidence
- In a 2020 study, major ASR systems had nearly twice the word error rate for Black speakers vs. white speakers (Koenecke et al.).
- A Stanford analysis found Scottish accents had 53% recognition accuracy, compared to 78% for Indian English (SSIR).
- NIH-backed research shows that marginalized speakers often change their natural voice just to be understood (PMC).
Enter WavShape: A New Way to Hear
WavShape is our answer. It's not just a model. It's a mission-driven framework that transforms how machines listen—fairly, efficiently, and respectfully.
We combine information theory with machine learning to do three things (sketched more formally after this list):
- Keep what matters for the task
- Remove what could be biased or privacy-sensitive
- Compress the rest for low-resource deployment
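In rough information-theoretic terms, the goal can be written as a trade-off. The compact form below, the single sensitive variable S, and the weight β are simplifications for illustration, not WavShape's exact published objective:

```latex
% Z_theta : filtered embedding produced by a projection with parameters theta
% Y       : task label (what you say)
% S       : sensitive attribute (how you say it: gender, accent, age, ...)
\max_{\theta} \; I(Z_{\theta}; Y) \;-\; \beta \, I(Z_{\theta}; S)
```

Keeping Z low-dimensional covers the third goal, compression, at the same time: a small filtered embedding is cheap to store and ship to low-resource devices.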
Who WavShape Is For
- AI teams building voice systems in regulated industries
- Developers targeting multilingual or diverse user bases
- Researchers needing control over embedding leakage and structure
How to Use WavShape (In 3 Simple Steps)
WavShape is easy to plug into your existing speech pipeline:
1. Extract
Use a pre-trained model like Whisper to get audio embeddings.
2. Filter
Feed those embeddings into our bias-filtering projection layer.
3. Train
Train the projection with a mutual-information objective that keeps task-relevant information while stripping out sensitive attributes (a minimal sketch of the pipeline follows).
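Here is a minimal sketch of steps 1 and 2 using Hugging Face Transformers and PyTorch. The checkpoint name, the mean pooling, and the 512→128→64 projection sizes are illustrative choices, not the published WavShape configuration; step 3 plugs this projection into the objective above (one way to estimate it is sketched after the comparison table below).

```python
import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel

# Step 1 - Extract: a frozen Whisper encoder turns audio into embeddings.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder.eval()

def extract_embedding(waveform_16khz):
    """Return one mean-pooled utterance embedding from a 16 kHz waveform."""
    inputs = feature_extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_features).last_hidden_state  # (1, frames, 512)
    return hidden.mean(dim=1)                                      # (1, 512)

# Step 2 - Filter: a small trainable projection that compresses the embedding
# before any task head sees it. Step 3 trains it so the 64-dim output keeps
# task information while carrying as little sensitive information as possible.
projection = nn.Sequential(
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
)
```

At inference time, `projection(extract_embedding(waveform))` yields the filtered 64-dimensional representation that downstream task heads consume.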
The Vision Behind WavShape
WavShape wasn't conceived as just a technical innovation—it was a response to a recurring failure in modern voice systems.
Again and again, people who spoke differently were misunderstood or excluded. That failure revealed a design flaw, not just in models—but in how we define "understanding."
WavShape is our way of rethinking that definition. Of giving voice AI a conscience, not just computation.
How WavShape Compares
| Method | Strengths | Weaknesses |
|---|---|---|
| Adversarial Fairness | Learns invariance to bias | Hard to train, unpredictable outcomes |
| Differential Privacy (DP) | Provable guarantees | Adds noise, may degrade task utility |
| WavShape (MI-based) | Controlled filtering, task-aware compression | Requires MI estimation (see sketch below), tuning needed |
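The "requires MI estimation" trade-off usually means training a small neural estimator alongside the projection. Below is a minimal MINE-style (Donsker–Varadhan) sketch of that piece, plus an illustrative combined loss; the critic architecture, the one-hot sensitive labels, and the weight β are assumptions for this example rather than WavShape's published estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MICritic(nn.Module):
    """Critic T(z, s) for a MINE-style (Donsker-Varadhan) bound on I(Z; S)."""
    def __init__(self, z_dim=64, s_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + s_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, z, s):
        return self.net(torch.cat([z, s], dim=-1)).squeeze(-1)

def mi_estimate(critic, z, s_onehot):
    """Lower bound on I(Z; S): E_joint[T] - log E_marginals[exp(T)]."""
    joint = critic(z, s_onehot).mean()
    shuffled = s_onehot[torch.randperm(s_onehot.size(0))]  # break the z-s pairing
    log_mean_exp = torch.logsumexp(critic(z, shuffled), dim=0) - torch.log(
        torch.tensor(float(z.size(0)))
    )
    return joint - log_mean_exp

def filtering_loss(task_logits, task_labels, mi_penalty, beta=1.0):
    """Keep the task (low cross-entropy) while pushing I(Z; S) toward zero."""
    return F.cross_entropy(task_logits, task_labels) + beta * mi_penalty
```

In practice the critic is updated to tighten the bound while the projection is updated to shrink it, a min-max loop that is exactly where the "tuning needed" caveat in the table comes from.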
It Works: Results That Speak
- 81% drop in sensitive mutual information (gender, accent)
- 97% retention of task-relevant features
- 38% lower AUROC for private attributes (i.e., harder to infer)
- Visual embedding plots confirm reduced bias and lower privacy leakage
Let's Build Listeners That Respect You
We're not just training models anymore.
We're training better listeners—ones that recognize your voice without making assumptions about who you are.
Because the future of speech AI should be fair.
And it should sound like everyone.