We Tested Lyria 3 Pro's 3-Minute AI Music on Kenyan Genres: Here Is the Honest Verdict

By Caleb Musili • April 14, 2026

Two months ago, I tested Lyria 3 when it dropped quietly into Gemini on February 18. My conclusion back then was that while the pacing was impressive and the Sheng lyrics were surprisingly authentic, the 30-second cap was the main frustration. It felt like a promising demo that left you wanting more music before it inevitably cut off. It was a "vibe" without a full structure.

Google just answered that complaint. Lyria 3 Pro began rolling out on April 11, and the headline upgrade is exactly what the original model needed: tracks up to 3 minutes long instead of 30 seconds. By today, the rollout should have reached all users, including the free tier. I have had access since yesterday on Google AI Pro, though, so I cannot yet confirm free-tier availability across all regions.

The longer format raised a new question. Lyria 3 impressed in short bursts, but could it sustain quality across three full minutes? More importantly, could it actually replicate Kenyan genres like Gengetone or Benga with any authenticity, or would it just be "AI pop" wearing a Kenyan costume?

Before diving in, I should clarify that I am not a music professional, a producer, or even a die-hard music enthusiast. I am a regular tech user who enjoys Kenyan music and wants to see how these tools perform for the average person trying to create something localized.

I ran three specific prompts to find out. The results were illuminating, though not always in the way Google would likely want.

How to Access Lyria 3 Pro: It Is Not Obvious

Before we look at the results, a note on access because the current interface is genuinely confusing. Most users will open Gemini and expect a "Music" tab or a model selector. That is not how it works yet.

The Gemini app and web interface do not let you explicitly select Lyria 3 Pro from a list. You will not find it listed as a separate model alongside Fast, Thinking, and Pro. Instead, the capability is tied to the specific Gemini reasoning model you are using when you generate the music.

From my testing, using the "Fast" model often caps output at 30 seconds regardless of what you ask. Even if you explicitly prompt "create a 3-minute track," you might still get the shorter Lyria 3 behavior. To unlock the full 3-minute potential of Lyria 3 Pro, you must switch to the Thinking model before generating.

I also recommend selecting the music creation feature explicitly and stating the duration in your prompt. Without both of these steps, the system seems to default to its "quick clip" mode. This feels like a significant interface gap that Google will likely clean up in future updates, but for now, the "Thinking" model plus an explicit duration prompt is the only reliable path to a full song.
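
For anyone generating several tracks, here is a minimal sketch of a helper that keeps the duration explicit in every prompt. It only builds the text you would paste into Gemini; it does not call any Google API, and the function name and parameters are my own hypothetical choices.

```python
def build_music_prompt(genre: str, minutes: int, details: list[str]) -> str:
    """Return a prompt that always states the target duration up front."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    if not details:
        return f"Create a {minutes}-minute {genre} track."
    # Join the stylistic requests into one comma-separated clause.
    detail_text = ", ".join(details)
    return f"Create a {minutes}-minute {genre} track with {detail_text}."


# Example: a prompt similar to the Gengetone one used in this review.
prompt = build_music_prompt(
    "gengetone", 3, ["Sheng lyrics", "heavy 808s", "catchy hooks"]
)
print(prompt)
```

The point is simply that the duration never gets left out by accident. You still need to switch to the Thinking model by hand before submitting the prompt.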

The Three Prompts: Testing the 254 Sound

I chose three genres rooted in specific Kenyan musical traditions to test how well the model understands not just African music broadly, but distinct local styles that have very different rhythmic and cultural identities.

Prompt 1 (Gengetone): Create a 3-minute gengetone track with Sheng lyrics, heavy 808s, catchy hooks, and a club vibe inspired by Nairobi street culture.

Prompt 2 (Benga): Generate a 3-minute benga-style song with fast-paced guitar riffs, Luo-inspired rhythms, and storytelling vocals.

Prompt 3 (Afro-Fusion/Amapiano): Create a 3-minute Kenyan afro-fusion track blending afrobeat, amapiano log drums, and Swahili vocals with a radio-friendly chorus.

Lyria 3 Pro named the tracks itself. Prompt 1 generated Tuko Rada.

Prompt 2 generated Bura Mar Kwaro.

Prompt 3 generated Tutaonana Asubuhi. The third track even came with visible lyrics in the Gemini interface, which was a helpful addition for verifying the language accuracy.

What Lyria 3 Pro Gets Right: The Technical Foundation

The transition from a 30-second clip to a 3-minute song is not just about length: it is about musical "stamina." In this regard, the Pro model is a massive leap forward.

The pacing problem from Lyria 3 is fixed. This was the most notable improvement. In the original 30-second model, lyrics often felt rushed. It was as if the AI was trying to cram an entire story into half a minute, leading to vocals that sounded like they were tripping over the beat. Across all three new tracks, Lyria 3 Pro maintained natural pacing throughout the full duration. Words landed where they should, and the rhythm felt considered rather than mechanical.

It knows how to end a song. One of the most annoying things about early AI music was the abrupt cut-off. You would be mid-chorus and the file would simply end. All three tracks I generated this week had smooth, deliberate fade-outs. In Bura Mar Kwaro, the instruments faded out while the vocals did a final "shout out" style exit, which felt like a real production choice.

Language and Phonetics. The Sheng in Tuko Rada and the Swahili in Tutaonana Asubuhi were remarkably accurate. The model understands local slang like "dunda" and "rada" and places them in contextually correct sentences. It does not just translate English to Swahili: it uses the cadence of how people actually speak in Nairobi.

The "Homogeneity" Problem: What Lyria 3 Pro Gets Wrong

Despite the technical polish, I have to be honest about the listening experience. There is a central problem with the model in its current state, and it is significant enough to be the headline finding of this review.

All three songs sound like the same song. This was the most disappointing discovery. Despite asking for three genuinely different genres, the output shared a structural DNA that made them feel like variations on a single template.

Every track followed this exact pattern:

It opened with a low-pitched, slightly distorted voice over bare beats.
Just before each verse, either the vocal would drop out so you heard only the instrumental for two bars, or the instrumental would drop out so you heard only the vocal.
Every chorus featured what sounded like a "group" of voices singing together with heavy repetition (the "call-and-response" pattern).

In Tuko Rada, this sounded like "ng'aa na dunda, (Dunda yetu!)" on repeat. In Bura Mar Kwaro and Tutaonana Asubuhi, the same structural device appeared, just in different languages. The drum and bass pattern underneath all three tracks was strikingly consistent. If you muted the vocals, it would be very difficult to tell which song was meant to be Gengetone and which was meant to be Afro-fusion.
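
One rough way to put a number on "the drums sound the same" is to write each kick pattern as a 16-step grid and count how many steps agree. The sketch below does that. The three grids are hypothetical examples I made up to show the idea; they are not transcriptions of the tracks I generated, and `step_overlap` is my own illustrative helper, not a standard metric.

```python
def step_overlap(pattern_a: str, pattern_b: str) -> float:
    """Fraction of steps where two equal-length drum grids agree."""
    if len(pattern_a) != len(pattern_b) or not pattern_a:
        raise ValueError("patterns must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(pattern_a, pattern_b))
    return matches / len(pattern_a)


# Hypothetical 16-step kick grids: "x" is a hit, "." is a rest.
track_a = "x...x...x...x..."  # steady four-beat pulse
track_b = "x...x...x..xx..."  # same pulse with one extra hit
track_c = "x..x..x...x..x.."  # a syncopated pattern

print(step_overlap(track_a, track_b))  # 0.9375
print(step_overlap(track_a, track_c))  # 0.5625
```

Two tracks that score close to 1.0 on a measure like this would be hard to tell apart with the vocals muted, which matches what I heard by ear.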

The "Costume" Effect. This leads to what I call the "costume effect." The model can label its outputs correctly without generating them correctly. If you ask for Gengetone and get generic AI pop with Sheng lyrics, that is not Gengetone: it is just AI music wearing a Gengetone costume.

Gengetone has a specific, gritty, "dirty" production style that defines the sound of artists like Ethic or the Boondocks Gang. Tuko Rada was far too clean and "shiny." Similarly, Amapiano is defined by the "log drum," a very specific percussive bass sound. While Tutaonana Asubuhi mentioned amapiano in the prompt, the actual log drum sound was either buried in the mix or replaced by a standard electronic bass.

Technical Deep Dive: Why is it so "Samey"?

As a regular user, it is easy to just say "it sounds the same," but there is a plausible technical reason for this. AI models like Lyria are trained on massive datasets. When you prompt for a niche genre like Kenyan Benga, the model likely has fewer high-quality training examples for that specific style compared to, say, "Global Lo-fi Pop."

When a model lacks sufficient data for a specific sub-genre, it tends to "collapse" toward the mean. It takes the "average" of what it knows about African music and mixes it with the "average" of what it knows about modern pop. This results in a "safe" output that is technically competent (it stays in key and on beat) but lacks the "soul" and specific rhythmic "pocket" that makes Kenyan music unique.
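
To make the "collapse toward the mean" idea concrete, here is a toy sketch. It is not how Lyria works internally (Google has not published that level of detail, as far as I know). It is a simple statistical analogy: an estimate for a genre is blended with the global average, and the less data the genre has, the more the global average wins. The feature values, the example counts, and the `prior_strength` parameter are all hypothetical.

```python
def shrunk_estimate(
    genre_mean: float,
    n_examples: int,
    global_mean: float,
    prior_strength: float = 100.0,
) -> float:
    """Blend a genre's own mean with the global mean, weighted by data volume."""
    weight = n_examples / (n_examples + prior_strength)
    return weight * genre_mean + (1.0 - weight) * global_mean


# Hypothetical one-dimensional "grit" feature: 0.0 is clean, 1.0 is very distorted.
GLOBAL_POP_MEAN = 0.2
GENGETONE_MEAN = 0.9

well_covered = shrunk_estimate(GENGETONE_MEAN, 10_000, GLOBAL_POP_MEAN)
under_covered = shrunk_estimate(GENGETONE_MEAN, 25, GLOBAL_POP_MEAN)

print(round(well_covered, 3))   # 0.893
print(round(under_covered, 3))  # 0.34
```

With plenty of examples the estimate stays close to the genre's true character (about 0.89 on this made-up scale). With only 25 examples it lands at 0.34, much closer to generic pop than to Gengetone. That is the "clean and shiny" result in miniature.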

Furthermore, the "Thinking" model's role here is interesting. Because the Thinking model is designed for reasoning and logic, it is likely being used to ensure "temporal coherence." This is the AI's ability to remember what it played at the 10-second mark and ensure it matches the 2-minute mark. While this makes the song structurally sound, it might also be making the model too "cautious," leading to the repetitive structures I observed.

Comparison: What Changed From Lyria 3 to Pro?

To help readers visualize the jump, here is a breakdown of the evolution we have seen in just two months:

Feature	Lyria 3 (Feb 2026)	Lyria 3 Pro (April 2026)
Max Duration	30 seconds	3 minutes (180 seconds)
Pacing	Often rushed/crammed	Natural and rhythmic
Structure	Loop-based	Intro, Verse, Chorus, Outro
Endings	Abrupt cut-offs	Smooth fade-outs
Download Options	MP4	MP4 and high-quality MP3
Watermarking	SynthID (Invisible)	SynthID (Invisible)
Fidelity	24kHz Mono/Stereo	48kHz High-Fidelity Stereo
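
To give a sense of scale for the duration and fidelity rows, the sketch below works out how much raw audio a 3-minute, 48kHz stereo track contains. The 16-bit sample depth is my assumption, since Google's bit depth is not stated here, and the actual MP3 or MP4 downloads are compressed and therefore much smaller.

```python
def pcm_size(
    seconds: int, sample_rate_hz: int, channels: int, bytes_per_sample: int
) -> tuple[int, int]:
    """Return (total samples, uncompressed size in bytes) for raw PCM audio."""
    total_samples = seconds * sample_rate_hz * channels
    return total_samples, total_samples * bytes_per_sample


# 3 minutes, 48 kHz, stereo, assuming 16-bit (2-byte) samples.
samples, size_bytes = pcm_size(180, 48_000, 2, 2)

print(samples)                 # 17280000
print(size_bytes / 1_000_000)  # 34.56 (megabytes, uncompressed)
```

That is roughly 17 million individual sample values the model has to keep coherent from the first bar to the fade-out.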

The Honest Listening Experience

Listening to all three tracks back-to-back was a bit draining. Not because the music was "bad" (the audio quality is crystal clear), but because the lack of variety became suffocating. Each track did exactly what it was told to do at a surface level, but they were not songs you would play twice voluntarily. They felt like "placeholder music" for a world where we no longer have to wait for a human to compose a jingle.

For the average Kenyan user, this is a fun toy. You can make a song for a friend's birthday in minutes, and the Sheng will be good enough to make them laugh. But for anyone hoping to use this for "real" creative work, the limitations are clear. It lacks the "stank" (the intentional rhythmic imperfections) that make local music hit the way it does.

The Ethical Elephant in the Room: SynthID

It is also worth noting that every single track generated by Lyria 3 Pro is watermarked with Google's SynthID. This is an inaudible digital signature embedded directly into the audio waveform. You cannot hear it, and it is designed to survive compressing the file or re-recording it with another device.

For journalists and content moderators, this is a win. It means we can always verify if a "viral song" was actually made by a human or generated by Gemini. For artists, it is a reminder that they do not truly "own" the raw output in the same way they own a melody they hummed into a phone. If you are interested in how this works, check out our previous deep dive on how SynthID was reportedly reverse-engineered by open-source researchers earlier this year.

Where This Leaves the Kenyan Creative Scene

The 30-second version was a gimmick. The 3-minute version is a tool.

Even with the "sameness" problem, Lyria 3 Pro represents a turning point for digital creators in Kenya. Small businesses that need a background track for a "Social Media Sunday" post now have a way to generate something that sounds vaguely local without paying for stock music libraries that only offer "Generic Corporate Ukulele."

However, local producers likely do not have much to fear yet. The "cultural depth" is simply not there. The model can mimic the language, but it cannot yet mimic the "soul" of the 254 sound. It cannot replicate the specific way a Benga guitar lead "weeps" or the specific way a Gengetone producer distorts a kick drum to make it rattle a trunk in a matatu.

Final Verdict: Impressive Science, Underwhelming Art

Lyria 3 Pro is a technical marvel. The fact that I can type a sentence and receive a 48kHz stereo track with coherent verses and a smooth ending in under 60 seconds is mind-blowing. Google has solved the "duration" and "pacing" problems that plagued the early versions.

But as a creative tool for Kenyan music, it is still in its "uncanny valley" phase. It is close enough to be recognizable, but far enough off to be uncomfortable. It is impressive as a demonstration of what Google's TPUs (Tensor Processing Units) can do, but it is underwhelming as an actual piece of Kenyan culture.

If you are a student or a hobbyist, go ahead and play with it. Switch to the Thinking model, give it a detailed prompt, and see what happens. Just don't expect to top the Kenyan charts with it anytime soon. The "Thinking" model can think about the structure of a song, but it still doesn't know how to dance.
