It's Time Speech AI Stopped Judging Your Voice

WavShape: A mission-driven framework that transforms how machines listen—fairly, efficiently, and respectfully

The Problem

"Can you say that again?"

The daily reality for millions of people with non-standard voices

"Can you say that again?"
"Sorry, I didn't catch that."

For millions of people, these phrases are part of daily life—not from humans, but from voice assistants and AI-powered systems that fail to understand them.

The reason? Bias in how speech AI listens. Most models are trained on dominant speech patterns—American English, male voices, standard accents. Everyone else is a statistical afterthought.

Our Mission

A Moment That Sparked a Mission

The story behind WavShape's creation

"I kept seeing how voice systems failed to understand people who didn't fit the standard mold—whether because of their accent, age, or tone. These weren't just isolated glitches; they were signals that AI was leaving entire groups behind.

WavShape grew out of the belief that machines can do better—that they can learn to listen fairly, and forget responsibly."

Oguzhan Baser, Founder of SkyThorn AI Labs

Technical Challenge

The Problem: AI That Listens Too Closely

How current speech models extract more than they should

Today's speech models—like Whisper or wav2vec—are incredibly powerful, but they over-listen. They extract:

  • What you say (words, phonemes)
  • How you say it (gender, accent, emotional state, age, regional identity)

This leads to serious consequences:

  • Bias propagation: AI favors standard voices, penalizes variation
  • Privacy leakage: Even anonymized speech can expose identity
  • Unequal access: Non-dominant voices get left behind
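
You can check this over-listening yourself with a simple probing experiment. The sketch below (our illustration, not WavShape code) mean-pools frozen Whisper encoder states into one vector per clip, then trains a logistic-regression probe to recover a speaker attribute such as gender. On real speech the probe typically scores well above chance, which is exactly the leakage described above; the clips and genders variables here are toy stand-ins for your own labeled audio.

```python
# Probe sketch: can a linear classifier recover gender from "content" embeddings?
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import WhisperFeatureExtractor, WhisperModel

fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder.eval()

def embed(audio, sr=16000):
    """One fixed-size vector per clip: mean-pooled Whisper encoder states."""
    feats = fe(audio, sampling_rate=sr, return_tensors="pt").input_features
    with torch.no_grad():
        hidden = encoder(feats).last_hidden_state   # (1, frames, 512)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy stand-ins: swap in your own 16 kHz mono clips and attribute labels.
clips = [np.random.randn(16000).astype(np.float32) for _ in range(8)]
genders = np.array([0, 1, 0, 1, 0, 1, 0, 1])

X = np.stack([embed(c) for c in clips])
probe = LogisticRegression(max_iter=1000).fit(X, genders)
# On real speech this lands well above 0.5; the gap over chance is the leakage.
print("probe accuracy:", probe.score(X, genders))
```
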
Research Evidence

Supporting Evidence

Empirical data showing the scope of the problem

  • In a 2020 study, major ASR systems had nearly twice the word error rate for Black speakers vs. white speakers (Koenecke et al.).
  • A Stanford analysis found Scottish accents had 53% recognition accuracy, compared to 78% for Indian English (SSIR).
  • NIH-backed research shows that marginalized speakers often change their natural voice just to be understood (PMC).

Our Innovation

Enter WavShape: A New Way to Hear

A mission-driven framework that transforms how machines listen

WavShape is our answer. It's not just a model; it's a framework built so that machines can listen fairly, efficiently, and respectfully.

We combine information theory with machine learning to:

  • Keep what matters for the task
  • Remove what could be biased or privacy-sensitive
  • Compress the rest for low-resource deployment
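
One way to read that recipe in information-theoretic terms (our simplified summary; the paper's exact objective may differ) is: learn a projected representation Z of the embedding that keeps the mutual information I(Z; Y) with the task label Y high while driving the mutual information I(Z; S) with sensitive attributes S down, with a weight lambda setting the trade-off:

```latex
% Simplified WavShape-style objective (illustrative)
\max_{\theta}\; I(Z_\theta;\, Y) \;-\; \lambda\, I(Z_\theta;\, S)
```

Setting lambda to zero recovers a plain task-trained projection; larger values trade a little task accuracy for stronger suppression of sensitive information.
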
Target Audience

Who WavShape Is For

The diverse community that can benefit from our technology

  • AI teams building voice systems in regulated industries
  • Developers targeting multilingual or diverse user bases
  • Researchers needing control over embedding leakage and structure

Implementation

How to Use WavShape (In 3 Simple Steps)

Easy integration into existing speech pipelines

WavShape is easy to plug into your existing speech pipeline:

1. Extract

Use a pre-trained model like Whisper to get audio embeddings.
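
If you ran the probe sketch earlier, the same embed() helper covers this step. A batch version might look like this (illustrative; clips stands in for your own list of 16 kHz mono arrays):

```python
# Step 1 sketch: one pooled Whisper embedding per clip, reusing embed() from
# the probing example above.
import numpy as np

X = np.stack([embed(clip) for clip in clips])   # shape (N, 512)
```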

2. Filter

Feed those embeddings into our bias-filtering projection layer.
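
What that layer might look like in code (a minimal sketch with illustrative names and sizes, not WavShape's actual API):

```python
# Step 2 sketch: a small projection head that maps 512-dim Whisper embeddings
# to a compact, filtered code z. All dimensions here are illustrative.
import torch.nn as nn

class ProjectionFilter(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=256, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),   # compressed representation z
        )

    def forward(self, x):
        return self.net(x)
```

The narrow output is what gives you compression for low-resource deployment; the training step below is what turns the bottleneck into a bias filter.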

3. Train

Train the projection with a mutual-information objective that keeps task-relevant information while stripping sensitive attributes, as in the sketch below.
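
Here is one shape such a training loop can take. As a stand-in for WavShape's actual estimator, this sketch penalizes a CLUB-style upper bound on I(Z; S) (Cheng et al., 2020) alongside the task loss; it reuses ProjectionFilter from the previous sketch, and the toy batches replace your real (embedding, task label, sensitive label) data:

```python
# Step 3 sketch: task loss plus a mutual-information penalty on the sensitive
# attribute. The MI term is a CLUB-style upper bound, used here for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N_CLASSES, N_SENSITIVE = 10, 2          # e.g., 10 intents, binary gender label

# Toy stand-ins for real (embedding, task label, sensitive label) batches.
batches = [(torch.randn(32, 512),
            torch.randint(0, N_CLASSES, (32,)),
            torch.randint(0, N_SENSITIVE, (32,))) for _ in range(100)]

filter_net = ProjectionFilter()          # from the previous sketch
task_head = nn.Linear(64, N_CLASSES)     # predicts the task label y from z
q_net = nn.Linear(64, N_SENSITIVE)       # variational q(s|z) for the MI bound
opt = torch.optim.Adam(list(filter_net.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
lam = 1.0                                # utility/privacy trade-off weight

for x, y, s in batches:
    # (a) Tighten the bound: fit q(s|z) by maximum likelihood on detached z.
    z = filter_net(x).detach()
    opt_q.zero_grad()
    F.cross_entropy(q_net(z), s).backward()
    opt_q.step()

    # (b) Update the filter: task cross-entropy + CLUB upper bound on I(Z; S).
    z = filter_net(x)
    log_q = F.log_softmax(q_net(z), dim=1)
    idx = torch.arange(len(s))
    pos = log_q[idx, s].mean()                           # matched (z, s) pairs
    neg = log_q[idx, s[torch.randperm(len(s))]].mean()   # shuffled pairs
    mi_upper = pos - neg                                 # CLUB estimate of I(Z; S)
    loss = F.cross_entropy(task_head(z), y) + lam * mi_upper
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Raising lam filters the sensitive attribute more aggressively at some cost in task accuracy; that tension is the "tuning needed" caveat in the comparison table below.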

Philosophy

The Vision Behind WavShape

The deeper motivation driving our technical innovation

WavShape wasn't conceived as just a technical innovation—it was a response to a recurring failure in modern voice systems.

Again and again, people who spoke differently were misunderstood or excluded. That failure revealed a design flaw not just in the models, but in how we define "understanding."

WavShape is our way of rethinking that definition. Of giving voice AI a conscience, not just computation.

Technical Comparison

How WavShape Compares

Advantages over existing fairness and privacy methods

| Method | Strengths | Weaknesses |
| --- | --- | --- |
| Adversarial Fairness | Learns invariance to bias | Hard to train; unpredictable outcomes |
| Differential Privacy (DP) | Provable guarantees | Adds noise; may degrade task utility |
| WavShape (MI-based) | Controlled filtering; task-aware compression | Requires MI estimation; tuning needed |

Performance Results

It Works: Results That Speak

Empirical validation on Common Voice and VCTK datasets

  • 81% drop in estimated mutual information with sensitive attributes (gender, accent)
  • 97% retention of task-relevant features
  • 38% lower AUROC for private-attribute inference (i.e., those attributes become harder to recover)
  • Visual embedding plots confirm reduced bias and privacy leakage

Future Vision

Let's Build Listeners That Respect You

Our commitment to fair and inclusive voice AI

We're not just training models anymore.
We're training better listeners—ones that recognize your voice without making assumptions about who you are.

Because the future of speech AI should be fair.
And it should sound like everyone.

Get Involved

Join Us

Ways to contribute to fair and inclusive voice AI

  • Developers: Integrate WavShape with your ASR pipeline (GitHub)
  • Enterprises: Book a bias audit or an integration demo (contact us)
  • Researchers: Collaborate on next-generation privacy and fairness research (reach out)