"Can you say that again?"
"Can you say that again?"
"Sorry, I didn't catch that."
For millions of people, these phrases are part of daily life—not from humans, but from voice assistants and AI-powered systems that fail to understand them.
The reason? Bias in how speech AI listens. Most models are trained on dominant speech patterns—American English, male voices, standard accents. Everyone else is a statistical afterthought.
A Moment That Sparked a Mission
"I kept seeing how voice systems failed to understand people who didn't fit the standard mold—whether because of their accent, age, or tone. These weren't just isolated glitches; they were signals that AI was leaving entire groups behind.
"WavShape grew out of the belief that machines can do better—that they can learn to listen fairly, and forget responsibly."
The Problem: AI That Listens Too Closely
Today's speech models—like Whisper or wav2vec—are incredibly powerful, but they over-listen. They extract:
- What you say (words, phonemes)
- How you say it (gender, accent, emotional state, age, regional identity)
This leads to serious consequences:
- Bias propagation: AI favors standard voices, penalizes variation
- Privacy leakage: Even anonymized speech can expose identity
- Unequal access: Non-dominant voices get left behind
Supporting Evidence
- In a 2020 study, major ASR systems had nearly twice the word error rate for Black speakers vs. white speakers (Koenecke et al.).
- A Stanford analysis found Scottish accents had 53% recognition accuracy, compared to 78% for Indian English (SSIR).
- NIH-backed research shows that marginalized speakers often change their natural voice just to be understood (PMC).
Enter WavShape: A New Way to Hear
WavShape is our answer. It's not just a model. It's a mission-driven framework that transforms how machines listen—fairly, efficiently, and respectfully.
We combine information theory with machine learning to do three things (sketched more formally after this list):
- Keep what matters for the task
- Remove what could be biased or privacy-sensitive
- Compress the rest for low-resource deployment
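In rough information-theoretic terms, the goal can be written as a trade-off. The compact form below, the single sensitive variable S, and the weight β are simplifications for illustration, not WavShape's exact published objective:

```latex
% Z_theta : filtered embedding produced by a projection with parameters theta
% Y       : task label (what you say)
% S       : sensitive attribute (how you say it: gender, accent, age, ...)
\max_{\theta} \; I(Z_{\theta}; Y) \;-\; \beta \, I(Z_{\theta}; S)
```

Keeping Z low-dimensional covers the third goal, compression, at the same time: a small filtered embedding is cheap to store and ship to low-resource devices.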
Who WavShape Is For
- AI teams building voice systems in regulated industries
- Developers targeting multilingual or diverse user bases
- Researchers needing control over embedding leakage and structure
How to Use WavShape (In 3 Simple Steps)
WavShape is easy to plug into your existing speech pipeline:
1. Extract
Use a pre-trained model like Whisper to get audio embeddings.
2. Filter
Feed those embeddings into our bias-filtering projection layer.
3. Train
Train the projection with a mutual-information objective that keeps task-relevant information while stripping out sensitive attributes (a minimal sketch of the pipeline follows).
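Here is a minimal sketch of steps 1 and 2 using Hugging Face Transformers and PyTorch. The checkpoint name, the mean pooling, and the 512→128→64 projection sizes are illustrative choices, not the published WavShape configuration; step 3 plugs this projection into the objective above (one way to estimate it is sketched after the comparison table below).

```python
import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel

# Step 1 - Extract: a frozen Whisper encoder turns audio into embeddings.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder.eval()

def extract_embedding(waveform_16khz):
    """Return one mean-pooled utterance embedding from a 16 kHz waveform."""
    inputs = feature_extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_features).last_hidden_state  # (1, frames, 512)
    return hidden.mean(dim=1)                                      # (1, 512)

# Step 2 - Filter: a small trainable projection that compresses the embedding
# before any task head sees it. Step 3 trains it so the 64-dim output keeps
# task information while carrying as little sensitive information as possible.
projection = nn.Sequential(
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
)
```

At inference time, `projection(extract_embedding(waveform))` yields the filtered 64-dimensional representation that downstream task heads consume.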
The Vision Behind WavShape
WavShape wasn't conceived as just a technical innovation—it was a response to a recurring failure in modern voice systems.
Again and again, people who spoke differently were misunderstood or excluded. That failure revealed a design flaw, not just in models—but in how we define "understanding."
WavShape is our way of rethinking that definition. Of giving voice AI a conscience, not just computation.
How WavShape Compares
| Method | Strengths | Weaknesses |
|---|---|---|
| Adversarial Fairness | Learns invariance to bias | Hard to train, unpredictable outcomes |
| Differential Privacy (DP) | Provable guarantees | Adds noise, may degrade task utility |
| WavShape (MI-based) | Controlled filtering, task-aware compression | Requires MI estimation (see sketch below), tuning needed |
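The "requires MI estimation" trade-off usually means training a small neural estimator alongside the projection. Below is a minimal MINE-style (Donsker–Varadhan) sketch of that piece, plus an illustrative combined loss; the critic architecture, the one-hot sensitive labels, and the weight β are assumptions for this example rather than WavShape's published estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MICritic(nn.Module):
    """Critic T(z, s) for a MINE-style (Donsker-Varadhan) bound on I(Z; S)."""
    def __init__(self, z_dim=64, s_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + s_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, z, s):
        return self.net(torch.cat([z, s], dim=-1)).squeeze(-1)

def mi_estimate(critic, z, s_onehot):
    """Lower bound on I(Z; S): E_joint[T] - log E_marginals[exp(T)]."""
    joint = critic(z, s_onehot).mean()
    shuffled = s_onehot[torch.randperm(s_onehot.size(0))]  # break the z-s pairing
    log_mean_exp = torch.logsumexp(critic(z, shuffled), dim=0) - torch.log(
        torch.tensor(float(z.size(0)))
    )
    return joint - log_mean_exp

def filtering_loss(task_logits, task_labels, mi_penalty, beta=1.0):
    """Keep the task (low cross-entropy) while pushing I(Z; S) toward zero."""
    return F.cross_entropy(task_logits, task_labels) + beta * mi_penalty
```

In practice the critic is updated to tighten the bound while the projection is updated to shrink it, a min-max loop that is exactly where the "tuning needed" caveat in the table comes from.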
It Works: Results That Speak
- 81% drop in sensitive mutual information (gender, accent)
- 97% retention of task-relevant features
- 38% lower AUROC for private attributes (i.e., harder to infer)
- Visual embedding plots confirm reduced bias and lower privacy leakage
Let's Build Listeners That Respect You
We're not just training models anymore.
We're training better listeners—ones that recognize your voice without making assumptions about who you are.
Because the future of speech AI should be fair.
And it should sound like everyone.