Explainer

Why Nigerian AI must be trained on Nigerian voices — not adapted from US models

When a caller switches from Yoruba to English mid-sentence, a US-trained model hears noise. Here's why purpose-built Nigerian language AI isn't optional — it's the entire product.

Every major AI speech recognition system — Whisper, Google Speech-to-Text, AWS Transcribe, Azure Cognitive Services — was primarily trained on American and British English. Some support "English (Nigeria)" as a locale option. Most do not. And even the ones that do treat Nigerian English as a minor dialect of American English, not as a distinct linguistic environment shaped by four dominant languages with their own phonological rules, tonal systems, and vocabulary.

The result: these models fail Nigerian callers in ways that matter.

The problem isn't accent. It's phonology.

Nigerian English speakers don't just have different accents. They operate in a linguistic environment where:

- Yoruba and Igbo are tonal: pitch alone distinguishes words, so syllables that sound identical to a US-trained model carry entirely different meanings
- Hausa uses implosive and ejective consonants that have no English counterpart
- Nigerian Pidgin is a language with its own grammar and vocabulary, not a broken form of English
- Speakers move between these languages mid-sentence as a matter of course

A model trained on American English has never heard these sounds in its training data. It guesses. On telephone-quality audio — with compression, background noise, and the acoustic characteristics of Nigerian mobile networks — it guesses badly.

Code-switching is the real test

The hardest problem — and the one that most distinguishes Nigerian callers from any other market — is mid-sentence code-switching. A typical caller doesn't stay in one language for the duration of a call. They move between languages the way fluent multilingual speakers always do: naturally, mid-clause, based on what's easier to express in which language.

A sample sentence from a real call: "Ẹ káàsán — I want to book an appointment, and please tell me the doctor dey available on Saturday?"

This opens in Yoruba, switches to standard English, then closes with Nigerian Pidgin. A US-trained model hears the Yoruba greeting as noise, produces gibberish for the Pidgin, and gets the English clause — the least informative part of the sentence — roughly right.

Maraba handles this correctly because it was trained on it. The training data is full of the kind of mixed-language utterances real Nigerian callers actually produce.
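To make the code-switching problem concrete, a switch-aware transcript can be thought of as a sequence of language-tagged segments rather than a single-language string. The sketch below is purely illustrative — the `Segment` structure and tagging are hypothetical, not Maraba's actual output format — but it shows what a correct system must recover from the sample sentence above:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    lang: str  # ISO 639 language code

# The sample call opener, segmented by language.
utterance = [
    Segment("Ẹ káàsán —", "yo"),                     # Yoruba greeting
    Segment("I want to book an appointment,", "en"),  # standard English
    Segment("and please tell me the doctor dey available on Saturday?", "pcm"),  # Nigerian Pidgin
]

langs = [s.lang for s in utterance]
print(langs)  # ['yo', 'en', 'pcm'] — three languages in one sentence
```

A US-trained model effectively collapses all three segments into one `"en"` bucket, which is exactly why the Yoruba and Pidgin spans come out as noise.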

What "purpose-built" actually means

Maraba's speech recognition is built on a Whisper base model fine-tuned on telephone-quality audio from Nigerian speakers across all four languages. "Telephone-quality" matters: 8kHz audio with the codec distortion and background noise of Nigerian mobile networks is acoustically different from studio recordings or broadband audio. A model trained on the latter performs badly on the former.
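The gap between broadband and telephone audio is easy to see in code. Below is a minimal sketch of the kind of telephone-channel simulation commonly used to prepare or augment training data — band-limit to the classic ~300–3400 Hz voice band, then downsample to 8kHz. The function name and filter parameters are illustrative assumptions, not details of Maraba's actual pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def simulate_telephone_channel(audio: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Approximate a narrowband phone channel: band-limit to the
    ~300-3400 Hz telephony voice band, then downsample to 8 kHz."""
    # 4th-order Butterworth band-pass over the standard telephony band.
    sos = butter(4, [300, 3400], btype="bandpass", fs=sr, output="sos")
    band_limited = sosfilt(sos, audio)
    # Decimate 16 kHz -> 8 kHz (includes anti-aliasing filtering).
    return resample_poly(band_limited, up=1, down=2)

# Example: one second mixing a 1 kHz tone (in-band, survives)
# with a 6 kHz tone (out-of-band, removed by the channel).
sr = 16_000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
phone = simulate_telephone_channel(clean, sr)
print(len(phone))  # 8000 samples: one second at 8 kHz
```

Everything above 3400 Hz — including consonant detail a model may rely on — is simply gone after the channel, which is why fine-tuning on audio that has already passed through it matters.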

The training corpus includes:

- Telephone-quality (8kHz) audio carrying the codec distortion and background-noise profile of Nigerian mobile networks
- Speech from Nigerian speakers in all four languages, not just Nigerian English
- Mixed-language utterances of the kind real callers produce, with code-switches mid-sentence

Why this matters for your business

If your AI receptionist mishears a caller's name, misses their appointment date, or transcribes "àárọ̀" (morning) as noise, you're not running an AI receptionist. You're running a call answering service that occasionally produces a garbled log.

The downstream consequences are real: wrong bookings, unanswered escalations, callers who hang up and don't call back. In a market where every missed call is a lost customer, an AI that doesn't understand your callers is worse than no AI at all — it creates the illusion of coverage while failing silently.

Purpose-built Nigerian language AI isn't a differentiator. It's the minimum bar for the product to work.

Hear Maraba handle a real Nigerian call

Listen to Maraba answer in Yoruba, switch to English, and back again — without missing a word.

Hear a live demo →