
Data Quality: Why 60% of AI Projects Fail — And How to Fix It

Sven Ricke · March 27, 2026 · 8 min read
Tags: Data Quality · AI · Data Engineering · Mid-Size Business · Automation
[Image: chaotic data and Excel spreadsheets next to a clean, structured database visualization]


If you read my previous post about AI agents for mid-size companies, you might remember this line:

"87% of German companies have a data quality problem. An AI agent working with bad data is like a sports car on a dirt road."

I put that sentence in there on purpose. Because it's the most important thing in the entire article. And because it describes exactly the problem I see in every other project.

Today, we're talking about that. Just that. No AI agents, no fancy automation. We're talking about the boring, unglamorous reason why most AI projects crash and burn.

The Number Nobody Wants to Hear

60% of AI projects will be abandoned by the end of 2026 — because of poor data quality. That's not from some random blog. That's Gartner. And anyone who knows Gartner knows they tend to be conservative with their predictions.

Sixty percent. Let that sink in.

Out of ten companies currently building an AI agent, a machine learning model, or "something with AI," six will stop. Not because the technology doesn't work. Not because the budget was too small. Not because the consultant was bad.

Because the data was garbage.

"Our Data Is Fine" — The Biggest Lie in Business

Here's the thing: when a CEO tells me "our data is actually pretty good," I know it's going to be really bad. It's like when a real estate agent says "the house is in great condition" — you know exactly what's coming.

Here's what I typically find when I look at data in mid-size companies:

The 5 Data Skeletons Hiding in Every Company

1. Duplicates Everywhere

"Miller Inc.", "Miller Inc", "Miller Incorporated", "Miller Inc. (old)" — that's the same customer. But in the ERP, those are four different records with four different payment histories. And the sales team wonders why the quote landed at the wrong contact.

2. Empty Required Fields

Cost center? Empty. Industry? "Other." Payment terms? The default setting that hasn't been accurate since 2019. Half the product master data has no category assigned. But sure, let's have AI find patterns in this.

3. Outdated Master Data

The supplier address is from 2018. The contact person hasn't worked there for three years. The phone number has an area code that doesn't exist anymore. But the data was "maintained at the time."

4. Inconsistent Formats

Dates: "03/12/2026", "2026-03-12", "12.03.2026", "March 2026." All in the same column. Prices: "$1,234.56", "1234.56", "USD 1234.56". Units: "kg", "kilos", "kilogram", "KG." Good luck running an automated match on that.

5. The Shadow Excel Universe

The ERP is the official system. But the real data? It lives in an Excel spreadsheet maintained by Sabine in accounting. The one on the network drive. Third version. Named "Customers_CURRENT_v3_FINAL_really_final.xlsx."

Think I'm exaggerating? I have seen exactly this. More than once.

Why This Isn't an IT Problem

Here's the part that stings: data quality isn't a technical problem. It's an organizational problem.

IT can deploy the best CRM in the world. If sales doesn't fill in the fields because they're "too busy," the data is still garbage. If accounting runs a parallel Excel sheet because "the ERP is too complicated," you have two versions of the truth. And neither is reliable.

87% of German companies have a data quality problem. And most don't even know it. Because as long as a human interprets the data — "Oh, Miller Inc. and Miller Incorporated, that's obviously the same company" — nobody notices. But AI can't do that. AI takes data as-is. Garbage in, garbage out.

What This Costs in Real Money

Because "data quality" sounds abstract, here are the hard numbers:

  Problem                                     Cost
  Wrong customer data (duplicates)            Lost deals, approx. 3-5% revenue loss
  Missed payment discounts (missing data)     $30,000-50,000/year at 500 invoices/month
  Manual data cleansing labor                 1-2 full-time positions doing nothing else
  Failed AI project                           $50,000-200,000 burned budget
  Compliance violation (GDPR, NIS2)           Up to $10M in fines

These aren't hypotheticals. These are numbers I've seen at real companies. And the failed AI project doesn't just cost the budget — it costs management's trust in any future AI initiative. That's the real damage.

Data Observability: Stop Guessing, Start Measuring

Let's get practical. How do you actually get data quality under control?

The first step is brutally simple and almost never done: Measure it.

Most companies have no idea how good or bad their data actually is. They guess. "Pretty good." "Could be better." "It's fine."

That's like a CEO saying "our finances are pretty good" — without ever looking at the balance sheet.

Data observability means you measure your data quality. Continuously. Automatically. Not an annual audit — every single day.

Specifically:

  • Completeness: What percentage of required fields are actually filled?
  • Uniqueness: How many duplicates do you have? Customers, suppliers, products?
  • Freshness: How old is your master data? When was it last updated?
  • Consistency: Is the same data identical across different systems?
  • Accuracy: Are addresses, phone numbers, and bank details still correct?

Sounds like a lot of work? A simple dashboard tracking these five metrics can be built in two to three days. And from that moment on, you know where you stand. Not "pretty good" — but "83% of customer master data is complete, 12% has outdated addresses, 340 duplicates identified."
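To make "two to three days of work" concrete, here's a minimal sketch of what such a dashboard computes under the hood, in plain Python. The field names (name, industry, updated_at) are illustrative assumptions, not a fixed schema:

```python
from datetime import date

def quality_metrics(records, required, today=None):
    """Compute three of the five core metrics over a list of record dicts:
    completeness, uniqueness (duplicate count), and freshness."""
    today = today or date.today()
    total = len(records)

    # Completeness: share of records with every required field filled
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in records
    )

    # Uniqueness: exact duplicates after trivial normalization
    seen, dupes = set(), 0
    for r in records:
        key = r["name"].strip().lower()
        if key in seen:
            dupes += 1
        seen.add(key)

    # Freshness: median age of the last update, in days
    ages = sorted(
        (today - date.fromisoformat(r["updated_at"])).days for r in records
    )
    return {
        "completeness_pct": round(100 * complete / total, 1),
        "duplicate_count": dupes,
        "median_age_days": ages[len(ages) // 2],
    }
```

Consistency and accuracy need a second system or an external source to compare against, so they don't fit in ten lines, but the pattern is the same: a number per day, not a feeling per year.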

Quality Rules Built Into the Pipeline

The second step is what makes the long-term difference: governance-as-code.

Instead of someone writing a 40-page document defining data quality rules — that nobody reads — you build the rules directly into your data pipelines.

A few examples:

  • No new customer record without required fields. Period. No workaround, no "I'll fill it in later." If industry, contact person, and payment terms are missing, the record doesn't get created.
  • Automatic duplicate detection on every new entry. Fuzzy matching — catches "Miller" vs. "Müller" vs. "Mueller."
  • Format validation at the point of entry. Dates must be ISO format, ZIP codes must be 5 digits, emails must contain @. Sounds trivial, saves hours.
  • Anomaly alerts. If suddenly 50% of new records have an empty cost center, someone needs to know. Immediately. Not at the next quarterly review.

The point is: these rules run automatically. Every record that enters the system gets validated. No human has to do it. No human can bypass it.
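As a sketch of what such a gate looks like in practice: a single validation function that runs before any write, returning a list of violations. The field names and rules here are illustrative assumptions, not a fixed schema:

```python
import re

REQUIRED = ("industry", "contact", "payment_terms")
ZIP_RE = re.compile(r"^\d{5}$")           # 5-digit ZIP, per the rule above
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # coarse email check

def validate_customer(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record may be saved.
    The write path calls this first and rejects the record otherwise."""
    errors = []
    for field in REQUIRED:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    if "zip" in record and not ZIP_RE.match(str(record["zip"])):
        errors.append("zip must be 5 digits")
    if "email" in record and not EMAIL_RE.match(record["email"]):
        errors.append("invalid email")
    return errors
```

The important design choice is that this function sits in the pipeline, not in a PDF: there is exactly one code path into the database, and it goes through the gate.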

The Roadmap: 4 Steps to AI-Ready Data

No 200-page strategy document. Four steps. You can start the first one tomorrow.

Step 1: Data Audit (1-2 weeks)

Take your ERP, CRM, and the three most important Excel files (you know which ones). Measure:

  • How many customer duplicates do you have?
  • What percentage of required fields are actually filled?
  • How old is your master data?
  • How many different formats exist per field?

The results will hurt. That's the point. Because only when you know how bad it is can you decide what to fix first.

Step 2: Quick Wins (2-4 weeks)

Fix the worst problems first:

  • Merge duplicates (automated, not manual!)
  • Make required fields actually required — in the system, not just on paper
  • Implement format validation
  • Identify shadow Excel files and decide: Do we need this data? If yes, migrate it. If no, archive it.
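"Automated, not manual" for duplicate merging starts with automated duplicate detection. A minimal fuzzy-matching sketch using only the standard library; the normalization rules (legal-form suffixes, umlaut transliteration) are illustrative and a real cleanup needs a longer list:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Strip noise that makes identical companies look different."""
    name = name.lower().replace("ü", "ue").strip(" .")
    for suffix in (" incorporated", " inc", " gmbh"):
        name = name.removesuffix(suffix)
    return name.strip(" .")

def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized forms are near-identical.
    O(n^2), fine for an audit; large datasets need blocking/indexing."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold:
                pairs.append((a, b))
    return pairs
```

The output is a candidate list, not an auto-merge: a human confirms the matches once, the script does the merging, and the validation gate keeps new duplicates from coming back.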

Step 3: Set Up Monitoring (1 week)

A dashboard showing the five core metrics daily. Doesn't need to be fancy. Can be a simple web app, can be an automated email report. The key: you see every day whether quality is improving or declining.
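The "automated email report" variant really is this small. A sketch that renders yesterday-vs-today metrics as plain text (delivery via mail or chat is a one-liner on top and omitted here); the metric names are whatever your audit produces:

```python
def render_report(today: dict, yesterday: dict, date_str: str) -> str:
    """Render a daily quality report with a simple trend marker per metric."""
    lines = [f"Data quality report for {date_str}"]
    for name, value in today.items():
        prev = yesterday.get(name)
        if prev is None:
            trend = " (new)"
        elif value > prev:
            trend = " (up)"
        elif value < prev:
            trend = " (down)"
        else:
            trend = " (flat)"
        lines.append(f"  {name}: {value}{trend}")
    return "\n".join(lines)
```

Whether "up" is good depends on the metric (completeness up: good; duplicate count up: bad), so a slightly smarter version maps each metric to a direction before choosing the marker.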

Step 4: Automate Governance (2-4 weeks)

Build quality rules into your pipelines. Validation at entry, duplicate detection on creation, anomaly alerts. From this point on, quality can't silently degrade anymore.

Total effort: 6-11 weeks. Investment: $8,000-25,000 depending on complexity.

For comparison: a failed AI project costs $50,000-200,000. Do the math.

And Then? Then Come the AI Agents.

Once you've completed these four steps, something remarkable happens: suddenly, all the things that didn't work before start working.

  • The AI agent for invoice processing? Reliably identifies suppliers because there are no more duplicates.
  • The automated mailbox sorting? Assigns emails to the correct customer because master data is accurate.
  • The reporting dashboard? Shows consistent numbers because data across all systems is actually consistent.

Clean data isn't a goal in itself. It's the prerequisite for everything that comes after.

Bottom Line: Not Glamorous, But Critical

I know data quality isn't sexy. Nobody posts on LinkedIn: "We merged 3,400 customer duplicates!" There's no standing ovation in the boardroom for that.

But you know what is impressive? When your AI project actually works. When the ROI is real. When you're among the 40% that don't fail.

And you know what that requires? Someone who sits down and does the unglamorous work. Cleans up the duplicates. Defines the required fields. Sets up the monitoring. All the unsexy stuff nobody wants to talk about.

That's exactly what I do. Not the PowerPoint presentation about "data strategies." I look at the data, clean it up, and put guardrails in place — so your AI projects actually have a chance.


Want to know where your data quality stands before investing in AI? Get in touch. I'll give you an honest audit. No sales pitch — just facts.

Read this article in German: datenqualitaet-ki-projekte
