The Only Framework That Hasn’t Lied to You Yet
Why the scientific method beats every product playbook in AI.
It feels like there’s a new framework every week, and it’s starting to sound like “clean eating”: mostly vibes, very little rigor.
In this issue, we’re not adding to the pile. We’re subtracting.
We’re going back to the only method that survives hype cycles: the scientific method.
Not a metaphor. Not a mindset. A practical, testable approach to building in uncertainty.
Inside:
Why most frameworks collapse under AI complexity
How the scientific method maps 1:1 to product work
Two real-world examples: support automation & slide summarization
This isn’t “best practice.” This is what’s left when the buzzwords wear off.
Let’s get into it.
Best Reads Of The Week
The rise of Data slop - As AI makes data analysis more accessible, it also makes data manipulation effortless. This piece dissects how “data slop” (misleading but technically accurate charts) will become endemic.
The article invites you to watch out for misaligned incentives and low data literacy, and to invest in shared metrics and reproducible workflows.
The Solitude Generation - If your product serves younger users, internalize this presentation. Anyone creating AI tools that touch learning, socialization, or mental health should read/watch it. But also, you should give a shit.
Why high performers make assertions - Insights are cheap. Assertions are leadership. When data is plentiful and direction is scarce, your value isn’t in noticing but in committing. Your goal is to have a point-of-view, make a bet and keep your skin in the game.
Please stop forcing Clippy on those who want Anton - Underneath the RLHF and “glazing” memes lies an ancient tension: should your AI act like a silent workhorse (Anton) or a supportive sidekick (Clippy)? This post nails the cultural divide and reminds us that these aren't new questions. They're just returning in LLM form, with the same unresolved tradeoffs HCI’s been chewing on for decades.
The No-Framework Framework: The Scientific Method.
With all the excitement around AI solutions, products, and companies comes a lot of noise.
Everyone’s got a framework. Everyone’s got a playbook. Everyone’s got an angle to play and most of it is recycled thinking, dressed up for the AI age.
In moments like this, the temptation is to try to catch up, to find THE method that makes sense of this chaos. But we already have one. It’s old, very old, and also more reliable and more adaptable than the framework du jour.
It’s the scientific method.
It’s not a metaphor, not a vibe: it’s a simple, working model for reasoning under uncertainty.
What’s Wrong With Frameworks?
Not much and a lot at the same time. In an ideal world, frameworks would provide a common playground under explicit hypotheses. In an ideal world. In practice, most methods and frameworks are not used to refine and clarify your thinking but to skip the thinking part and follow a step-by-step recipe that promises the perfect solution.
With the speed of improvement in AI, you are operating in a space of unknown unknowns, where emerging capabilities optimize for probability rather than certainty, and most process-oriented approaches break down in this space.
Scientific Method for AI Product Development
Let’s strip it down:
| Scientific Method | AI Product Development |
| --- | --- |
| Ask a question | Observe user friction or business opportunity |
| Form a hypothesis | Propose a capability that might help |
| Design an experiment | Build a narrow, testable prototype |
| Collect data | Observe user behavior, performance, trust |
| Analyze results | Is it better? Is it usable? Is it safe? Is it valuable? |
| Iterate or reject | Improve or throw it out |
Please note the scientific method is often represented as an ongoing process, and there are many other variants.
You define a falsifiable claim (“This feature will reduce support tickets by 20%”)
You run a limited test
You learn
You try again
The work is closer to research than traditional product discovery.
And most importantly, you don’t hide the uncertainty—you design around it.
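To make the loop concrete, here is a minimal sketch of one pass through it in code. The hypothesis, metric, and numbers below are illustrative, not from a real experiment:

```python
# One experiment loop as data: a falsifiable claim, what we measure,
# the result that would confirm it, and a decision once results are in.
# Everything here is illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    hypothesis: str                    # the falsifiable claim
    metric: str                        # what we measure
    target: float                      # result that would confirm the claim
    observed: Optional[float] = None   # filled in after the limited test

    def decide(self) -> str:
        if self.observed is None:
            return "run the test first"
        return "iterate" if self.observed >= self.target else "reject or reframe"


exp = Experiment(
    hypothesis="This feature will reduce support tickets by 20%",
    metric="relative reduction in ticket volume",
    target=0.20,
)
exp.observed = 0.12  # what the limited test actually showed
print(exp.decide())  # -> "reject or reframe"
```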
Why This Approach Works
AI Is Probabilistic
Your system doesn't always return the same output. Accuracy can vary wildly depending on context, prompt phrasing, user behavior. You’re not building deterministic software. You’re building with likelihoods.
Quality Is Emergent
You can’t spec your way to a good AI UX. You have to watch how people interact with it, where it fails, where it surprises. The experience emerges from the interplay between model, interface, and human.
Failure Is Signal
In the scientific mindset, failure is data. Model hallucination? Maybe your prompt is too brittle. User distrust? Maybe the framing is wrong. You treat misfires as a lens, not a setback.
Discovery Is Continuous
You never really leave the discovery phase. Every deploy is a new experiment. You’re not “shipping features”; you’re running field tests.
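To make the first point tangible, here is a minimal sketch (assuming the OpenAI Python SDK and an API key in the environment) that sends the same prompt several times and counts how many distinct answers come back:

```python
# Same prompt, N calls: count the distinct answers to make the variance visible.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment.
from collections import Counter
from openai import OpenAI

client = OpenAI()


def sample_answers(prompt: str, n: int = 10) -> Counter:
    answers = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # non-zero temperature makes the spread obvious
        )
        answers.append(response.choices[0].message.content.strip())
    return Counter(answers)


# The same triage question rarely comes back with a single, stable label.
print(sample_answers("Label this ticket's urgency as high, medium, or low: 'App crashes on login.'"))
```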
Applied Examples
To make this concrete, here are two teams applying the scientific method: one in a mature domain, the other around an emerging behavior.
Case 1: Automating Ticket Triage in SupportOps
At a mid-sized SaaS company, the support team is underwater. Not because tickets are too complex, but because they’re too messy. Every incoming ticket needs to be tagged, prioritized, routed. It’s a noisy, manual pre-processing step that burns time and slows responses.
The product team doesn’t start with a feature idea. They start with a question:
“Can we reduce the triage burden without hurting quality?”
They run a shadow test: no UI, no rollout. They use GPT-4 to classify historical tickets into support categories and urgency levels, and compare against actual agent tags.
84% label accuracy in key categories
2.1s average inference time
Agents trust the urgency prediction 65% of the time when shown it blind
They ship nothing. Instead, they kill the urgency prediction (too inconsistent), and bubble up category suggestions to a low-friction part of the UI. In parallel, they run a two-week test to see if this nudge actually speeds up ticket handling.
At no point do they “build a feature.” They build an experiment. Loop one ends. Loop two begins.
Hypothesis: Auto-classifying incoming tickets will reduce time-to-first-response without hurting user satisfaction.
Experiment: GPT-4 model predicts tags based on historical data, tested in shadow mode.
Interpretation: 84% accuracy on key categories, 65% agent trust on urgency prediction, 2.1s latency.
Next Loop: Drop urgency prediction, promote best ones to UI, test impact on agent workflow.
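A minimal sketch of that shadow test, assuming the OpenAI Python SDK; the category list and the ticket data shape are illustrative, not the team’s actual setup:

```python
# Shadow-mode triage: classify historical tickets with the model and score the
# predictions against the tags agents actually applied. Nothing ships to users.
# Assumes the OpenAI Python SDK; categories and data shapes are illustrative.
import time
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["billing", "bug", "how-to", "account", "other"]


def classify_ticket(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"Classify this support ticket into one of: {', '.join(CATEGORIES)}. "
                        "Reply with the category only."},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()


def run_shadow_test(tickets: list[dict]) -> dict:
    """tickets: [{'text': ..., 'agent_category': ...}, ...] pulled from history."""
    correct, latencies = 0, []
    for t in tickets:
        start = time.time()
        predicted = classify_ticket(t["text"])
        latencies.append(time.time() - start)
        correct += int(predicted == t["agent_category"])
    return {
        "label_accuracy": correct / len(tickets),
        "avg_latency_s": sum(latencies) / len(latencies),
    }
```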
Case 2: Making Slide Decks Useful Again
In an async-first startup, people constantly share slides after meetings. But no one reads them. Threads get buried. Decisions stall. The team notices a pattern:
People reply not with reactions, but with “tl;dr?”
Instead of launching a full-on document analysis feature, the team tries a bet:
“If we can summarize slide decks automatically, people will be more likely to follow up.”
They use GPT-4 to extract a short summary from decks under 20 slides. No UX investment, just a 3-sentence auto-generated TL;DR next to the upload.
Users start referencing summaries instead of opening decks
Summary-enabled posts get 2× more responses
Authors start asking, “Can I edit the summary before sharing?”
It’s not polished. It’s not complete. But it moves behavior. That’s signal.
Next loop: enable optional editing, add action-item extraction, test OCR to summarize slides with heavy visuals.
Hypothesis: Auto-generated slide summaries will boost engagement.
Experiment: Extract text from PDFs, use GPT-4 to generate 3-sentence TL;DRs.
Interpretation: Higher comment rates, users cite summary instead of opening deck.
Next Loop: Add action items, allow human-in-the-loop editing, explore image OCR.
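A minimal sketch of that first loop, assuming pypdf for text extraction and the OpenAI Python SDK; the 20-slide cap mirrors the constraint described above:

```python
# Slide-deck TL;DR: pull the text out of a PDF deck and ask the model for a
# 3-sentence summary. Assumes pypdf and the OpenAI Python SDK; no UX investment yet.
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()
MAX_SLIDES = 20


def summarize_deck(path: str) -> str | None:
    reader = PdfReader(path)
    if len(reader.pages) > MAX_SLIDES:
        return None  # the first loop only covers short decks
    deck_text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Summarize this slide deck in exactly 3 sentences for a busy reader."},
            {"role": "user", "content": deck_text},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()


# Shown next to the upload, e.g.: summary = summarize_deck("q3_strategy.pdf")
```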
Different contexts, same structure:
Start from friction
Hypothesize an outcome
Run a narrow, falsifiable test
Interpret, kill, or extend
Repeat
That’s the loop. That’s the work. That’s all.
Conclusion
This isn’t about rejecting frameworks. It’s about knowing when they become cargo cults.
If you’re building AI products, you’re not following a map, you’re walking through fog with a flashlight. You don’t need rituals. You need rigor.
The scientific method isn’t a framework. It’s a habit of mind.
It rewards precision, falsifiability, and iteration. It scales with complexity. It survives contact with reality.
Closing Loop
You saw how the scientific method maps cleanly onto AI product development:
Starting from real friction, not imaginary personas
Making testable bets instead of launching full-stack features
Treating failure as signal, not waste
Running fast, falsifiable loops with just enough structure to learn
You also saw it in action: one team untangling noisy support tickets, another making post-meeting slide decks useful again with zero theatrics, just disciplined iteration.
If there’s one takeaway, it’s this:
In AI, the best product thinkers are closer to scientists than operators.
They don’t optimize process. They optimize for learning.
Share your thoughts:
How did you like today’s newsletter?
You can share your thoughts at [email protected] or share the newsletter using this link.