Natural Language, Artificial Understanding: 40 Years of Talking Past Each Other

Remember calling customer service and screaming "REPRESENTATIVE!" at an automated system?

Yeah, we've come a long way. Or have we?

I've been diving deep into how conversational design evolved over the last 40 years, and here's what nobody tells you: We keep solving the same problem over and over, just with fancier tech.


The Theater of Conversation

In the 1980s, the people building dialogue systems picked up a revolutionary idea from linguistics: maybe conversation isn't just about words. Maybe it's about what people are trying to DO with those words. Mind-blowing, right?

It's called "speech act theory." Basically: when I say "Is it cold in here?" I'm not asking for a weather report. I'm asking you to close the damn window.

This should have changed everything. Instead, the most famous "conversational" program was still ELIZA - a 1960s chatbot that just reflected your statements back at you like a digital therapist. "You say you're frustrated with chatbots?"

Fast forward to the 2000s. Mobile phones everywhere. Voice recognition that could understand you... if you spoke like a robot. "CALL. MOM. MOBILE."

The promise? Natural conversation with machines. The reality? "I'm sorry, I didn't catch that. Did you say 'Call Tom's nobile?'"

Multimodal Maze

Humans don't just talk. We point, we gesture, we show things. When I ask "What's wrong with this?" while holding my broken phone, the "this" only makes sense with the visual context.

So naturally, tech companies decided to... completely separate voice and visual interfaces. Because why make things easy?

Your smart TV has voice control AND a remote with 47 buttons. Your phone has Siri AND touch controls that do completely different things. It's like having two different employees who refuse to talk to each other.

We've had the research on multimodal interaction since the early 2000s. We KNOW people want to seamlessly switch between talking, typing, and pointing. But most products still treat these as separate kingdoms.

Enter the AI Hype Train

Then came deep learning. Suddenly, speech recognition actually worked. Like, actually actually. No more "SPEAK. SLOWLY. AND. CLEARLY."

LLMs brought something even crazier: computers that could hold a conversation. Not just respond to commands, but actually engage in back-and-forth dialogue.

The promise? This time it's different! Natural conversation! Multimodal interaction! AI understands context!

The reality?

"Hey Siri, remind me about this when I get home" "I'm sorry, I don't know what 'this' refers to." throws phone

Here's what kills me: Every breakthrough in conversational AI solves yesterday's problems while creating tomorrow's.

1980s problem: Computers can't understand speech
2000s solution: Voice recognition!
2000s problem: It only works in perfect conditions
2010s solution: Deep learning!
2010s problem: It still doesn't understand context
2020s solution: LLMs and multimodal AI!
2020s problem: Now it confidently misunderstands context

What Actually Matters

After 40 years of evolution, here's what we've learned about making conversational AI that doesn't suck:

  • Natural beats optimal. The most efficient interface isn't always the best. Sometimes typing is faster than talking. Sometimes pointing is clearer than describing. Let users choose.

  • Context is everything. "Set a timer" means something different when I'm cooking versus when I'm working out. If your AI doesn't know the difference, it's just a fancy command line.

  • Trust trumps features. Users need to know when to speak, when to type, when to tap. They need to understand what the system can and can't do. Mysterious AI is abandoned AI.

  • Multimodal isn't multi-interface. Stop building separate voice and visual systems. Build ONE system that accepts multiple inputs. Like, you know, humans do. (There's a rough sketch of what that could look like right after this list.)
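
To make those last two points concrete, here's a toy sketch. Every type and function name below is invented for illustration, not any real product's API. The idea: a tap, a typed message, and a voice command all flow into the same handler and the same context, so pointing at something can resolve the "this" in a later spoken request, and "set a timer" can mean the right thing for what you're doing.

```typescript
// A toy sketch, not any shipping product's API: every name here is invented.
// The point: one handler, one shared context, many input modalities.

type Modality = "voice" | "touch" | "text";

interface InputSignal {
  modality: Modality;
  payload: string; // transcript, typed text, or an identifier for the thing tapped
}

interface SessionContext {
  activity: "cooking" | "workout" | "unknown"; // the "set a timer" problem
  lastReferencedItem?: string;                 // what "this" points at, set by ANY modality
}

// A single pipeline, regardless of how the input arrived.
function handleInput(input: InputSignal, ctx: SessionContext): string {
  // A tap resolves the referent for later voice or text turns.
  if (input.modality === "touch") {
    ctx.lastReferencedItem = input.payload;
    return `Selected ${input.payload}`;
  }

  const utterance = input.payload.toLowerCase();

  // "this" only works because touch and voice share the same context.
  if (utterance.includes("this")) {
    return ctx.lastReferencedItem
      ? `Okay, noted about ${ctx.lastReferencedItem}`
      : "What are you referring to?";
  }

  // Same words, different meaning depending on what the user is doing.
  if (utterance.includes("set a timer")) {
    return ctx.activity === "cooking"
      ? "Starting a kitchen timer"
      : "Starting an interval timer";
  }

  return "Sorry, I didn't get that";
}

// Point first, then talk: one conversation, two modalities.
const ctx: SessionContext = { activity: "cooking" };
console.log(handleInput({ modality: "touch", payload: "cracked phone screen" }, ctx));
console.log(handleInput({ modality: "voice", payload: "Remind me about this when I get home" }, ctx));
console.log(handleInput({ modality: "voice", payload: "Set a timer" }, ctx));
```

A real system would need actual intent parsing and a much richer context store, but the shape is the point: shared state across modalities, not a voice silo bolted onto a GUI silo.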

Plot Twist (You Knew It Was Coming)

The biggest barrier to great conversational AI isn't technology. It's organizational.

Your voice team doesn't talk to your GUI team. Your AI researchers don't sit with your UX designers. Your product managers are optimizing for different metrics than your conversation designers.

You're not building bad conversational interfaces because the tech isn't there. You're building them because your org chart is fighting itself.

So What's Next?

The next 10 years of conversational AI won't be about better speech recognition or smarter language models. We've mostly solved those problems.

It'll be about:

  • Systems that actually remember context across modalities

  • Interfaces that adapt to how YOU communicate, not the other way around

  • AI that knows when to shut up and let you type

  • Products built by teams that actually talk to each other

The future of conversational AI isn't about making computers talk better. It's about making them conversation partners that actually understand what we're trying to do.

And maybe, just maybe, we'll finally build something better than yelling "REPRESENTATIVE!" at our phones.

But probably not. See you in 2035 when we're solving this problem again with quantum computers or whatever.


Share your thoughts:
How did you like today’s newsletter?
You can share your thoughts at [email protected] or share the newsletter using this link.
