Natural Language, Artificial Understanding:
40 Years of Talking Past Each Other

Remember calling customer service and screaming "REPRESENTATIVE!" at an automated system?
Yeah, we've come a long way. Or have we?
I've been diving deep into how conversational design evolved over the last 40 years, and here's what nobody tells you: We keep solving the same problem over and over, just with fancier tech.
Best Reads of the Week
How to intellectually debate AI while completely missing the point - OK, it's basically an ad for his book, but the idea of unbundling <=> rebundling due to AI is interesting.
The Big LLM Architecture Comparison - If you are technical and/or nerdy, a good overview of the architecture of the biggest LLMs around.
Inside the semantic attack that fools Grok-4 (and other LLMs) - Two attacks, one jailbreak. How researchers quietly dismantled the guardrails of xAI’s most advanced model.
Why most startups are building AI the wrong way - Still slapping AI on old tools? The real wins come from rethinking the whole system: outcomes first, software second.
Onboarding For AI Products With Kate Syuma - Shortening time-to-value won't save your AI product if the outcome sucks; here's what great onboarding actually looks like.
The Theater of Conversation
In the 1980s, linguists had this revolutionary idea: maybe conversation isn't just about words. Maybe it's about what people are trying to DO with those words. Mind-blowing, right?
They called it "speech act theory." Basically: when I say "Is it cold in here?" I'm not asking for a weather report. I'm asking you to close the damn window.
This should have changed everything. Instead, we built ELIZA - a chatbot that just repeated your questions back at you like a digital therapist. "You say you're frustrated with chatbots?"
Fast forward to the 2000s. Mobile phones everywhere. Voice recognition that could understand you... if you spoke like a robot. "CALL. MOM. MOBILE."
The promise? Natural conversation with machines. The reality? "I'm sorry, I didn't catch that. Did you say 'Call Tom's nobile?'"
Multimodal Maze
Humans don't just talk. We point, we gesture, we show things. When I ask "What's wrong with this?" while holding my broken phone, the "this" only makes sense with the visual context.
So naturally, tech companies decided to... completely separate voice and visual interfaces. Because why make things easy?
Your smart TV has voice control AND a remote with 47 buttons. Your phone has Siri AND touch controls that do completely different things. It's like having two different employees who refuse to talk to each other.
We've had the research on multimodal interaction since the early 2000s. We KNOW people want to seamlessly switch between talking, typing, and pointing. But most products still treat these as separate kingdoms.
Enter the AI Hype Train
Then came deep learning. Suddenly, speech recognition actually worked. Like, actually actually. No more "SPEAK. SLOWLY. AND. CLEARLY."
LLMs brought something even crazier: computers that could hold a conversation. Not just respond to commands, but actually engage in back-and-forth dialogue.
The promise? This time it's different! Natural conversation! Multimodal interaction! AI understands context!
The reality?
"Hey Siri, remind me about this when I get home" "I'm sorry, I don't know what 'this' refers to." throws phone
Here's what kills me: Every breakthrough in conversational AI solves yesterday's problems while creating tomorrow's.
1980s problem: Computers can't understand speech
2000s solution: Voice recognition!
2000s problem: It only works in perfect conditions
2010s solution: Deep learning!
2010s problem: It still doesn't understand context
2020s solution: LLMs and multimodal AI!
2020s problem: Now it confidently misunderstands context
What Actually Matters
After 40 years of evolution, here's what we've learned about making conversational AI that doesn't suck:
Natural beats optimal. The most efficient interface isn't always the best. Sometimes typing is faster than talking. Sometimes pointing is clearer than describing. Let users choose.
Context is everything. "Set a timer" means something different when I'm cooking versus when I'm working out. If your AI doesn't know the difference, it's just a fancy command line.
Trust trumps features. Users need to know when to speak, when to type, when to tap. They need to understand what the system can and can't do. Mysterious AI is abandoned AI.
Multimodal isn't multi-interface. Stop building separate voice and visual systems. Build ONE system that accepts multiple inputs. Like, you know, humans do (see the sketch below).
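To make "one system, multiple inputs" concrete, here's a minimal sketch in Python. The names (`UserInput`, `interpret`, the `context` keys) are made up for illustration, not any real assistant's API: every modality lands in the same event type, and the surrounding context decides what the words actually mean.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# One input type for every modality, instead of separate voice and GUI pipelines.
@dataclass
class UserInput:
    modality: Literal["voice", "text", "touch"]
    utterance: Optional[str] = None               # spoken or typed words, if any
    target: Optional[str] = None                  # what the user pointed at or tapped
    context: dict = field(default_factory=dict)   # e.g. {"activity": "cooking"}

def interpret(event: UserInput) -> str:
    """Route every modality through the same intent logic."""
    text = (event.utterance or "").lower()

    # Context changes what the same words mean.
    if "set a timer" in text:
        activity = event.context.get("activity")
        if activity == "cooking":
            return "start kitchen timer (suggest common cooking durations)"
        if activity == "workout":
            return "start interval timer (suggest work/rest splits)"
        return "ask which kind of timer"

    # Deictic reference ("this") resolved from the non-verbal channel.
    if "this" in text and event.target:
        return f"act on {event.target}"

    return "fall back to a clarifying question"

# The same pipeline handles a spoken request, a typed message, or speech plus pointing.
print(interpret(UserInput("voice", "Set a timer", context={"activity": "cooking"})))
print(interpret(UserInput("voice", "What's wrong with this?", target="photo_of_broken_phone.jpg")))
```

The point isn't the toy logic; it's the shape. One event type, one interpreter, with modality and context as fields rather than as separate products owned by separate teams.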
Plot Twist (You Know It)
The biggest barrier to great conversational AI isn't technology. It's organizational.
Your voice team doesn't talk to your GUI team. Your AI researchers don't sit with your UX designers. Your product managers are optimizing for different metrics than your conversation designers.
You're not building bad conversational interfaces because the tech isn't there. You're building them because your org chart is fighting itself.
So What's Next?
The next 10 years of conversational AI won't be about better speech recognition or smarter language models. We've mostly solved those problems.
It'll be about:
Systems that actually remember context across modalities
Interfaces that adapt to how YOU communicate, not the other way around
AI that knows when to shut up and let you type
Products built by teams that actually talk to each other
The future of conversational AI isn't about making computers talk better. It's about making them conversation partners that actually understand what we're trying to do.
And maybe, just maybe, we'll finally build something better than yelling "REPRESENTATIVE!" at our phones.
But probably not. See you in 2035 when we're solving this problem again with quantum computers or whatever.
Share your thoughts:
How did you like today’s newsletter?
You can share your thoughts at [email protected] or share the newsletter using this link.