Learn to identify "good" data

Don't make this mistake: know the quality of your data.

Jeremy Lezac
June 16, 2025

Hey.

It’s story time:
Company launches ambitious AI project. Team is excited. Pilot looks promising. Then somehow, somewhere between proof-of-concept and production, everything falls apart.

The model that worked beautifully in testing starts making weird predictions. The insights that seemed groundbreaking suddenly feel irrelevant. The whole thing becomes a cautionary tale about "AI hype."

The problem is not AI at all. It’s the data feeding it.

We've all heard "garbage in, garbage out," but most people think of this as a technical problem. It's not. It's a business strategy problem.

The companies crushing it with AI aren't necessarily the ones with the fanciest algorithms or the biggest data science teams.

They're the ones who figured out how to build a solid data foundation first.

Best Reads of the Week

God is hungry for Context: First thoughts on o3 pro - Key learning: They might be smart, but they’re not a useful employee if they can’t integrate.
My recipe for solving real world problems with LLMs - I usually really like Pau's work. No fluff. Very pragmatic. If you are not actively working on an LLM pipeline yet, check this one out: you can use it to kick-start a pipeline from scratch.
Why Anthropic’s bet against Windsurf is a mistake - Drama alert? Maybe. Was it a good idea or a bad one? Learn more in this article.
Generative Art Riso Prints, Notebooks & Website update - Fun fact about me: I started in AI by learning from artists working at the intersection of art and technology. Here's a quick note from a lovely artist doing OG “Generative Art” with a pen plotter.

What "Good" Data Actually Means

Forget the technical definitions for a moment. Good data is simply data you can trust to make decisions. That's it. Data that's accurate enough, complete enough, and current enough that when your AI system uses it to make a prediction or recommendation, you're confident acting on it.

But getting to that level of trust requires understanding what makes data reliable in the first place.

The six things that matter

Accuracy: Does This Reflect Reality? Your data needs to tell the truth about what actually happened. Sounds obvious, but you'd be surprised how often data gets corrupted somewhere along the way—typos in data entry, system errors, or just plain human mistakes. When AI learns from inaccurate data, it learns the wrong patterns. Those wrong patterns become wrong predictions, which become wrong business decisions.
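For the more technical readers, here is a minimal sketch of rule-based accuracy checks in Python. The field names (age, email, order_total) and the plausibility limits are hypothetical examples, not a standard. Rules like these only catch values that are impossible; a value that is plausible but wrong still needs checking against a trusted source.

```python
import re
from typing import Callable

Rule = Callable[[object], bool]

# Hypothetical field-level plausibility rules; real ones come from your domain.
RULES: dict[str, Rule] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "order_total": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def find_accuracy_issues(records: list[dict]) -> list[tuple[int, str, object]]:
    """Return (row index, field, value) for every present value that fails its rule."""
    issues = []
    for i, record in enumerate(records):
        for field, rule in RULES.items():
            # Absent fields are skipped here: that is a completeness problem.
            if field in record and not rule(record[field]):
                issues.append((i, field, record[field]))
    return issues


records = [
    {"age": 34, "email": "ana@example.com", "order_total": 120.0},
    {"age": 340, "email": "bob@example.com", "order_total": 80.0},    # typo: extra digit
    {"age": 29, "email": "carol.example.com", "order_total": -15.0},  # no @, negative total
]
print(find_accuracy_issues(records))
# [(1, 'age', 340), (2, 'email', 'carol.example.com'), (2, 'order_total', -15.0)]
```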

Completeness: Are We Missing Important Pieces? Missing data isn't just annoying, it's dangerous. AI systems are pattern-recognition machines, and when they're missing crucial information, they fill in the gaps with assumptions. Sometimes those assumptions are harmless. Sometimes they create blind spots that lead to systematically biased outcomes. Complete data gives AI the full picture it needs to make reliable predictions.
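A quick way to tell a harmless gap from a blind spot is to measure missing data per segment, not just overall. This sketch assumes records are plain Python dicts and treats an absent key, None, or an empty string as missing; the region and income fields are hypothetical.

```python
from collections import defaultdict


def is_missing(value) -> bool:
    # Treat None and empty strings as missing; adjust for your own sentinels.
    return value is None or value == ""


def missing_rate(records: list[dict], field: str) -> float:
    """Share of records where `field` is absent or empty."""
    if not records:
        raise ValueError("no records to check")
    return sum(is_missing(r.get(field)) for r in records) / len(records)


def missing_rate_by_group(records: list[dict], field: str, group_field: str) -> dict[str, float]:
    """Missing rate of `field` within each value of `group_field`."""
    totals: dict[str, int] = defaultdict(int)
    missing: dict[str, int] = defaultdict(int)
    for r in records:
        group = r.get(group_field, "<unknown>")
        totals[group] += 1
        missing[group] += is_missing(r.get(field))
    return {g: missing[g] / totals[g] for g in totals}


# Hypothetical sample: income is only ever missing in one region.
records = [
    {"region": "north", "income": 52000},
    {"region": "north", "income": 61000},
    {"region": "north", "income": 47000},
    {"region": "south", "income": None},
    {"region": "south"},  # income never collected
]
print(missing_rate(records, "income"))                     # 0.4
print(missing_rate_by_group(records, "income", "region"))  # {'north': 0.0, 'south': 1.0}
```

Overall, 40% of income values are missing, but the gap sits entirely in one region: exactly the kind of blind spot that can bias a model trained on this data.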

Consistency: Are We Speaking the Same Language? Data consistency means your customer data in the CRM uses the same format as your customer data in the billing system. It means your product categories are standardized across all touchpoints. Without consistency, AI systems spend their time trying to figure out what different data sources are actually talking about instead of finding useful patterns.
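Here is a minimal sketch of what "speaking the same language" can look like in practice: map every spelling each system uses to one canonical value, then compare. The alias table and the customer IDs are hypothetical; the canonical codes are ISO 3166-1 alpha-2 country codes.

```python
from typing import Optional

# Hypothetical alias table: every spelling seen in the CRM or billing system
# maps to one canonical ISO 3166-1 alpha-2 code.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "uk": "GB", "gb": "GB", "united kingdom": "GB",
}


def canonical_country(raw: str) -> Optional[str]:
    """Map a raw country string to its canonical code, or None if unknown."""
    return COUNTRY_ALIASES.get(raw.strip().lower())


def find_mismatches(crm: dict[str, str], billing: dict[str, str]) -> list[str]:
    """IDs of customers in both systems whose countries disagree or can't be mapped."""
    problems = []
    for customer_id in sorted(crm.keys() & billing.keys()):
        a = canonical_country(crm[customer_id])
        b = canonical_country(billing[customer_id])
        if a is None or b is None or a != b:
            problems.append(customer_id)
    return problems


crm = {"c1": "United States", "c2": "UK", "c3": "Untied States", "c4": "United Kingdom"}
billing = {"c1": "US", "c2": "GB", "c3": "US", "c4": "USA"}
print(find_mismatches(crm, billing))  # ['c3', 'c4']
```

Customers c1 and c2 are spelled differently in each system but agree once normalized; c3 has a typo the table can't map, and c4 genuinely disagrees.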

Timeliness: Is This Still Relevant? Data has a shelf life. Customer preferences change. Market conditions shift. Regulations update. AI systems trained on outdated data aren't just less effective; they can also be actively misleading. Timely data keeps AI relevant and useful.
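Shelf life can be checked mechanically. This sketch assumes each record carries a timezone-aware last_updated timestamp (a hypothetical field name) and flags anything older than a chosen maximum age. The 90-day threshold is made up; the right shelf life differs per data source.

```python
from datetime import datetime, timedelta, timezone


def stale_ids(records: list[dict], max_age: timedelta, now: datetime) -> list[str]:
    """IDs of records last updated longer ago than `max_age`."""
    return [r["id"] for r in records if now - r["last_updated"] > max_age]


now = datetime(2025, 6, 16, tzinfo=timezone.utc)  # passed in explicitly for reproducibility
records = [
    {"id": "a", "last_updated": datetime(2025, 6, 10, tzinfo=timezone.utc)},
    {"id": "b", "last_updated": datetime(2024, 11, 2, tzinfo=timezone.utc)},
    {"id": "c", "last_updated": datetime(2025, 4, 1, tzinfo=timezone.utc)},
    {"id": "d", "last_updated": datetime(2025, 1, 20, tzinfo=timezone.utc)},
]
# Hypothetical 90-day shelf life; pick one per data source.
print(stale_ids(records, max_age=timedelta(days=90), now=now))  # ['b', 'd']
```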

Relevance: Does This Actually Matter? Not all data is created equal. Relevant data directly connects to the business problem you're trying to solve. In a world where we can collect data on everything, the art is figuring out what data actually matters for your specific use case.

Trustworthiness: Can We Rely on This? Trustworthy data comes with proper governance, security, and compliance controls. It's data you can audit, trace, and defend. When your AI system makes a recommendation that affects customers or business operations, you need to know the data behind it is solid.
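"Audit and trace" can start as simply as recording exactly which data a model run used. This sketch fingerprints a batch with SHA-256 over a canonical JSON encoding and stores it next to a source and transform label. The labels crm_export and normalize_country are hypothetical, and a real setup would write the entry to an audit log rather than print it.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(records: list[dict]) -> str:
    """SHA-256 of a canonical JSON encoding of the records.

    Dict key order doesn't affect the hash; row order and any value change do.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_entry(records: list[dict], source: str, transform: str) -> dict:
    """One audit-log entry describing which data a model run used."""
    return {
        "source": source,
        "transform": transform,
        "row_count": len(records),
        "sha256": fingerprint(records),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


batch = [{"customer_id": "c1", "country": "US"}, {"customer_id": "c2", "country": "GB"}]
# Hypothetical source and transform labels.
entry = provenance_entry(batch, source="crm_export", transform="normalize_country")
print(json.dumps(entry, indent=2))
```

If a recommendation is ever questioned, the stored hash lets you confirm whether the data you are looking at is the data the model actually saw.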

Why this matters more than you think

The impact of data quality on AI isn't linear; it's exponential. Small data quality issues compound quickly in AI systems. A 5% error rate in your training data doesn't just make your AI 5% less accurate. It can make it completely unreliable in edge cases, which are often the most important business scenarios.
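The arithmetic behind that compounding is simple. This sketch uses made-up numbers and two simplifying assumptions: errors hit each field independently, and we know what share of the errors falls into one segment of the data.

```python
def share_of_clean_records(per_field_error_rate: float, n_fields: int) -> float:
    """Probability a record has no errors, assuming each field fails independently."""
    return (1 - per_field_error_rate) ** n_fields


def error_rate_in_segment(overall_error_rate: float, segment_share: float,
                          share_of_errors_in_segment: float) -> float:
    """Error rate inside one slice of the data, given how much of the rows
    and how much of the errors that slice holds."""
    return overall_error_rate * share_of_errors_in_segment / segment_share


# 5% errors per field across 10 fields: only about 60% of records are fully clean.
print(round(share_of_clean_records(0.05, 10), 3))  # 0.599

# 5% errors overall, all concentrated in the 5% of rows that are edge cases:
# every edge-case row is wrong.
print(error_rate_in_segment(0.05, 0.05, 1.0))  # 1.0
```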

But the flip side is also true. Organizations that invest in data quality see outsized returns on their AI investments. They deploy AI systems faster because they're not constantly debugging data issues. Their AI performs better in production because it's trained on reliable information. And they scale AI initiatives more easily because they have reusable, trustworthy data foundations.

Building something that works

Creating good data isn't about perfect data. It's about data that's good enough for your specific AI use cases. The key is building systems and processes that maintain data quality over time, not just achieving it once.
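"Good enough" becomes concrete once you write the limits down per use case and check every incoming batch against them. This is a minimal sketch; the Thresholds fields, the churn and credit use cases, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """'Good enough' limits for one use case (hypothetical values)."""
    max_missing_rate: float  # allowed share of missing values per field
    max_stale_share: float   # allowed share of records past their shelf life


def quality_gate(missing_rates: dict[str, float], stale_share: float,
                 limits: Thresholds) -> list[str]:
    """Return human-readable failures; an empty list means the batch may be used."""
    failures = [
        f"{field}: {rate:.0%} missing (limit {limits.max_missing_rate:.0%})"
        for field, rate in missing_rates.items()
        if rate > limits.max_missing_rate
    ]
    if stale_share > limits.max_stale_share:
        failures.append(f"{stale_share:.0%} of records stale (limit {limits.max_stale_share:.0%})")
    return failures


# The same batch can be good enough for one use case and not for another.
batch_missing = {"email": 0.02, "income": 0.12}
batch_stale = 0.08
churn_model = Thresholds(max_missing_rate=0.15, max_stale_share=0.10)
credit_model = Thresholds(max_missing_rate=0.01, max_stale_share=0.05)
print(quality_gate(batch_missing, batch_stale, churn_model))   # []
print(quality_gate(batch_missing, batch_stale, credit_model))  # three failures
```

Running a gate like this on every batch, rather than once at launch, is what keeps quality maintained over time.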

Start with the Business Problem: The best data strategies work backward from clear business objectives. Instead of collecting data and then figuring out what to do with it, start with the question you want AI to answer, then design your data collection and management around that question.

Just enough governance: Effective data governance feels invisible to the people who need to use data. It's there to ensure quality and compliance, not to create bureaucracy. The goal is to make good data practices the easy choice, not the painful one.

Break down the silos: Data silos limit what your AI can see and they limit what it can learn. Modern data strategies focus on making data accessible across the organization while maintaining appropriate security and privacy controls. (…It’s not fun otherwise)

Like software, build for reuse: Good data management ensures that data collected for one purpose can be easily repurposed for other legitimate business uses. This creates a multiplier effect where each data investment supports multiple AI initiatives.

Reality check

Most organizations approach AI backward. They start with cool AI use cases and then try to figure out if they have the data to support them. The organizations that succeed consistently do it the other way around: they build strong data foundations first, then use those foundations to enable AI use cases that actually deliver business value.

While LLMs let you do great things without domain-specific data, keep in mind that without this sweet, sweet data you have literally no fundamental moat.

This doesn't mean you need perfect data before you start any AI projects. It means you need to be honest about your data quality and realistic about what AI can accomplish with the data you have. Sometimes that means starting with smaller, more focused AI initiatives while you build up your data capabilities. Sometimes it means investing in data infrastructure before you invest in AI tools.

What success looks like

Organizations with strong data foundations share a few characteristics. They can deploy AI systems faster because they're not constantly fixing data issues. They reuse data across multiple AI initiatives, creating economies of scale. And they trust their AI systems enough to actually act on their recommendations.

The path to get there isn't about buying the latest data platform or hiring more data scientists. It's about building organizational capabilities that treat data as a strategic asset, not a technical afterthought.

The future belongs to organizations that understand data isn't just fuel for AI; it's the foundation that makes AI actually useful.

AI is only as good as the data it learns from. But "good" data isn't about perfection; it's about reliability, relevance, and trust.

The difference between AI that works and AI that doesn't usually comes down to something much more mundane than most people realize: whether you can trust the data behind it. Get the data foundation right, and everything else becomes possible. Get it wrong, and even the most sophisticated AI becomes an expensive experiment.

Share your thoughts:
You can share your thoughts at [email protected] or share the newsletter using this link.
