"We'd love to do AI, but our data isn't ready."
This is the most common reason APAC enterprises give for delaying automation and AI initiatives. And it's almost always wrong — not because the data is fine, but because "data readiness" is the wrong frame.
The real question isn't "is our data clean?" It's "is our data clean enough for this specific use case?" Those are very different questions with very different answers.
The Data Quality Myth
There's an implicit belief in many organisations that data must be cleaned, normalised, and centralised before any AI or automation project can begin. This belief has a name: the data quality myth. And it's responsible for more stalled AI initiatives than any technical limitation.
The myth works like this: leadership announces an AI initiative. The first discovery phase reveals that data is scattered across systems, inconsistently formatted, partially duplicated, and sometimes contradictory. A "data quality improvement programme" is launched as a prerequisite. Months pass. The data quality programme becomes a project unto itself. The original AI initiative quietly dies of old age.
We've seen this cycle play out in banks, manufacturers, logistics companies, and consumer goods firms across APAC. The data quality programme becomes an end in itself, consuming budget and attention without ever enabling the operational improvements that justified it.
Reframing the Problem
Data quality isn't binary. It's a spectrum, and the acceptable point on that spectrum depends entirely on what you're trying to do.
An invoice processing automation needs vendor names, invoice numbers, amounts, and dates to be extractable and matchable. It doesn't need a clean, normalised master data set across every entity in the organisation.
A compliance checking workflow needs regulatory rules to be encoded and document types to be classifiable. It doesn't need a unified data lake with complete historical data.
A trade promotion reconciliation system needs claim documents to be matched against promotion terms. It doesn't need a perfectly integrated CRM-ERP-TPM data model.
In each case, the data requirements are specific and bounded. And in each case, the automation itself can improve data quality as a byproduct — by standardising extraction, validating against rules, and flagging inconsistencies at the point of entry rather than months later.
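The idea of validating at the point of entry can be sketched in a few lines. This is a minimal illustration, not a real schema: the field names and rules are assumptions made for the example.

```python
# Sketch: validating extracted invoice fields at the point of entry,
# so inconsistencies surface immediately rather than months later.
# Field names and rules here are illustrative assumptions.
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("vendor", "invoice_number", "amount", "date")

@dataclass
class ValidationResult:
    ok: bool
    issues: list = field(default_factory=list)

def validate_invoice(record: dict) -> ValidationResult:
    """Check one extracted invoice record against simple entry rules."""
    issues = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            issues.append(f"missing field: {name}")
    amount = record.get("amount")
    if amount is not None and not isinstance(amount, (int, float)):
        issues.append("amount is not numeric")
    elif isinstance(amount, (int, float)) and amount <= 0:
        issues.append("amount must be positive")
    return ValidationResult(ok=not issues, issues=issues)

good = validate_invoice({"vendor": "Acme", "invoice_number": "INV-001",
                         "amount": 1200.50, "date": "2024-03-01"})
bad = validate_invoice({"vendor": "", "invoice_number": "INV-002",
                        "amount": -5, "date": "2024-03-02"})
```

The point is not the rules themselves but where they run: every record is checked as it enters the workflow, so each flagged issue is a data quality improvement made in passing.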
The Use-Case-First Approach
Instead of trying to boil the ocean on data quality, start with a specific use case and ask: "What data does this workflow need, where does it come from, and what does 'good enough' look like?"
Identify the data sources: For invoice automation, the sources are the invoices themselves plus the ERP data (POs, goods receipts). For compliance checking, the sources are the regulatory databases plus the documents being checked.
Define the quality threshold: What accuracy level makes the automation useful? 95% extraction accuracy on invoices, with human review for exceptions, might deliver 70% time savings. That's a perfectly good starting point.
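The threshold idea amounts to a simple routing rule: extractions above a confidence cutoff flow straight through, and only the rest queue for human review. The cutoff value and record shape below are illustrative assumptions.

```python
# Sketch: routing extracted records by extraction confidence.
# The 0.95 cutoff and field names are illustrative assumptions.
CONFIDENCE_CUTOFF = 0.95

def route(extractions):
    """Split a batch into auto-processed and human-review queues."""
    auto, review = [], []
    for item in extractions:
        if item["confidence"] >= CONFIDENCE_CUTOFF:
            auto.append(item)
        else:
            review.append(item)  # exception: a person checks this one
    return auto, review

batch = [
    {"invoice": "INV-001", "confidence": 0.99},
    {"invoice": "INV-002", "confidence": 0.97},
    {"invoice": "INV-003", "confidence": 0.80},  # poor scan, goes to review
]
auto, review = route(batch)
```

With most of the batch flowing through untouched, the time savings come from the auto queue while the review queue keeps accuracy honest.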
Build quality improvement into the workflow: The automation should flag data quality issues as it encounters them. An invoice with an unrecognisable vendor name gets flagged, the human resolves it, and the system learns. Data quality improves as a byproduct of processing, not as a prerequisite.
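That flag-and-learn loop can be sketched as a small alias table: unknown vendor names are flagged, a human resolves them once, and the resolution is remembered. The master list and alias mechanism here are assumptions for illustration only.

```python
# Sketch: flagging unrecognised vendor names and learning from
# human resolutions. Vendor names and the alias-table approach
# are illustrative assumptions, not a real master data design.
KNOWN_VENDORS = {"Acme Pte Ltd", "Globex Sdn Bhd"}
ALIASES = {}  # learned mapping: raw extracted name -> canonical vendor

def resolve_vendor(raw_name: str):
    """Return (canonical_name, flagged). Unknown names are flagged."""
    if raw_name in KNOWN_VENDORS:
        return raw_name, False
    if raw_name in ALIASES:
        return ALIASES[raw_name], False
    return raw_name, True  # flag for a human to resolve

def record_resolution(raw_name: str, canonical: str):
    """A human resolved a flagged name; the system remembers it."""
    ALIASES[raw_name] = canonical

name1, flagged1 = resolve_vendor("ACME PTE. LTD.")  # first seen: flagged
record_resolution("ACME PTE. LTD.", "Acme Pte Ltd")
name2, flagged2 = resolve_vendor("ACME PTE. LTD.")  # resolves silently now
```

Each human resolution shrinks the exception queue for every future invoice from that vendor, which is the byproduct effect described above.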
Measure and iterate: Track the quality metrics that matter for your use case. If extraction accuracy is 92% in month one and 97% by month three, you're on the right trajectory. You don't need to wait for perfection before starting.
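The trajectory check is simple arithmetic; a sketch using the figures from the text (the sample sizes are invented for the example):

```python
# Sketch: tracking a use-case-specific quality metric over time.
# The percentages mirror the example in the text; the sample
# counts are illustrative assumptions.
def extraction_accuracy(correct: int, total: int) -> float:
    """Share of documents whose fields were extracted correctly."""
    return correct / total

monthly = {
    "month_1": extraction_accuracy(920, 1000),  # 92%
    "month_3": extraction_accuracy(970, 1000),  # 97%
}
improving = monthly["month_3"] > monthly["month_1"]
```

The metric worth watching is the one tied to the use case, not an organisation-wide data quality score.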
The Hidden Cost of Waiting
While organisations wait for "data readiness," the manual processes continue. People keep typing data into spreadsheets, copying between systems, and making decisions based on incomplete information.
Every month of delay has a real cost — not just in operational inefficiency, but in the compounding effect of not collecting the structured data that automation would produce. The organisation falls further behind on data quality, not closer to catching up.
The paradox of enterprise data quality is that the best way to improve it is to start using the data — with intelligent systems that can handle imperfection, flag issues, and learn over time.
Waiting for perfect data is waiting forever. Start with good enough, and improve from there.
