The Explainability Crisis in Machine Learning
Machine learning systems now make decisions that affect loan approvals, hiring, medical diagnoses, and criminal sentencing—yet most organisations deploying them cannot articulate why a specific prediction was made.
This is not a technical limitation. It is a choice disguised as inevitability. The field has optimised for accuracy at the expense of intelligibility, creating a widening gap between what models do and what their operators understand. That gap is becoming untenable.
What Everyone Gets Wrong About Explainability
The dominant narrative treats explainability as a post-hoc problem. Build the best-performing model first, the thinking goes, then bolt on interpretation tools afterward. LIME, SHAP, attention visualisations—these are presented as solutions that restore transparency to black boxes.
They do not. They are forensic tools applied to corpses. They can approximate which features mattered in a particular prediction, but they cannot answer the question that actually matters: Is this system making decisions for reasons I would endorse?
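To make that limitation concrete, here is a minimal sketch of a post-hoc attribution. It assumes a hypothetical three-feature scoring function (`toy_credit_score`, with invented feature names and weights) and uses simple occlusion: one feature at a time is replaced with a baseline value and the change in output is recorded. This is a much cruder stand-in for what LIME or SHAP do, not either library's algorithm.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def toy_credit_score(x: Features) -> float:
    """Hypothetical scoring model; the weights are invented for illustration."""
    return 0.5 * x["income"] - 0.8 * x["missed_payments"] + 0.3 * x["postcode_band"]


def occlusion_attribution(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Dict[str, float]:
    """Attribute a prediction by swapping each feature for its baseline value."""
    full_score = model(x)
    attributions = {}
    for name in x:
        perturbed = dict(x)
        perturbed[name] = baseline[name]
        # How much the score drops when this feature is "removed".
        attributions[name] = full_score - model(perturbed)
    return attributions


# Invented applicant and an all-zero baseline (both are assumptions).
applicant = {"income": 4.0, "missed_payments": 1.0, "postcode_band": 2.0}
baseline = {"income": 0.0, "missed_payments": 0.0, "postcode_band": 0.0}

attributions = occlusion_attribution(toy_credit_score, applicant, baseline)
for name, value in attributions.items():
    print(f"{name}: {value:+.2f}")
```

The output reports that `postcode_band` moved this score by roughly 0.6. It says nothing about whether a postcode-derived feature is a reason you would endorse, which is the question the tool cannot answer.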
The confusion runs deeper. Many organisations conflate explainability with interpretability. A model can be interpretable, in the sense that its mechanics are fully inspectable, without being explainable in any meaningful sense. A decision tree with 500 branches is technically interpretable; no human can hold it in their head, so it is also incomprehensible. Conversely, a neural network's internal states may be opaque, but if it is trained on data and objectives you understand, you can often still explain its outputs in business terms.
The real problem is that explainability requires alignment between three things: the model's logic, the data it learned from, and the values embedded in how success is measured. When those three are misaligned—when a model optimises for a metric that does not capture what you actually care about—no amount of post-hoc interpretation fixes it.
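A small worked example of that misalignment, assuming a hypothetical screening task in which 5 of 100 cases are positive and what you actually care about is catching them. The labels and the always-negative "model" are invented for illustration.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of actual positives that the model caught."""
    positives = sum(1 for t in y_true if t == 1)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / positives if positives else 0.0


# Invented data: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95
# A "model" that never flags anything.
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
rec = recall(y_true, always_negative)
print(f"accuracy={acc:.2f} recall={rec:.2f}")  # accuracy=0.95 recall=0.00
```

The headline metric looks excellent while the system does nothing you wanted. No explanation of individual predictions would reveal that; only checking the metric against the goal does.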
Why This Matters More Than People Realise
The explainability crisis is not primarily a compliance issue. Yes, regulators are demanding it. Yes, lawsuits are coming. But the deeper cost is epistemic: organisations are losing the ability to learn from their own systems.
When you cannot explain a prediction, you cannot debug it. You cannot identify whether the model has learned something genuinely useful or merely exploited a statistical artefact in the training data. You cannot tell whether it is making decisions that align with your actual business objectives or optimising for a proxy that looked good on a holdout set. You cannot spot when it has learned to discriminate on protected characteristics through proxy variables.
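The last of those checks can be sketched under stated assumptions. The eight records below are invented, `postcode_band` is a hypothetical feature, and the protected attribute is assumed to be available for auditing only. The sketch measures how strongly the feature tracks the protected attribute and how far approval rates differ between groups; neither number is a legal test of discrimination.

```python
from statistics import mean
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def approval_rate_gap(groups: List[int], approved: List[int]) -> float:
    """Largest difference in approval rate between any two groups."""
    rates = []
    for group in set(groups):
        decisions = [a for g, a in zip(groups, approved) if g == group]
        rates.append(mean(decisions))
    return max(rates) - min(rates)


# Invented audit sample: one row per applicant.
postcode_band = [1, 1, 2, 2, 8, 9, 9, 8]    # feature the model is allowed to use
protected_group = [0, 0, 0, 0, 1, 1, 1, 1]  # attribute the model must not use
approved = [1, 1, 1, 0, 0, 0, 1, 0]         # the model's decisions

proxy_strength = pearson(postcode_band, protected_group)
gap = approval_rate_gap(protected_group, approved)
print(f"feature/attribute correlation: {proxy_strength:.2f}")
print(f"approval rate gap: {gap:.2f}")
```

A high correlation alongside a large gap is a prompt for investigation, not a verdict. The point is that the question can only be asked by someone who knows which features the model uses and why.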
This creates a peculiar form of organisational blindness. The system works, in the sense that it has high accuracy, but you do not know why. You cannot transfer that knowledge to new contexts. You cannot explain it to stakeholders. You cannot defend it when it fails. And it will fail, because every model eventually meets data unlike anything it was trained on.
The second-order effect is worse: teams stop asking hard questions. If the model is a black box, scrutiny feels futile. Responsibility diffuses. "The algorithm decided" becomes an excuse rather than an explanation.
What Actually Changes When You See It Clearly
The solution is not to abandon machine learning or demand perfect transparency. It is to invert the development process.
Start with explainability as a constraint, not an afterthought. Define, before building, what kinds of explanations would satisfy your stakeholders and regulators. Then design models that can produce those explanations. This often means accepting lower accuracy in exchange for intelligibility. For decisions like the ones above, where someone must be able to defend the outcome, that trade-off is almost always worth making.
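One way to read "explainability as a constraint" is a model whose every decision arrives with its reasons attached. The sketch below assumes a hypothetical points-based scorecard; the rules, thresholds, point values, and cutoff are all invented, and a real one would be fitted to data and reviewed by the people accountable for it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    reason: str       # the explanation a stakeholder would see
    feature: str
    threshold: float
    points: int       # added when the feature value is >= threshold


# Hypothetical rules, agreed with stakeholders before any model is built.
RULES = [
    Rule("Two or more missed payments in the last year", "missed_payments", 2, -40),
    Rule("Income at least three times the repayment", "income_to_repayment", 3, 30),
    Rule("At least two years with current employer", "years_employed", 2, 20),
]
APPROVAL_CUTOFF = 40  # invented threshold


def decide(applicant: Dict[str, float]) -> Tuple[bool, int, List[str]]:
    """Return the decision, the score, and the reasons that produced it."""
    score = 0
    reasons = []
    for rule in RULES:
        if applicant[rule.feature] >= rule.threshold:
            score += rule.points
            reasons.append(f"{rule.points:+d}: {rule.reason}")
    return score >= APPROVAL_CUTOFF, score, reasons


applicant = {"missed_payments": 2, "income_to_repayment": 3.5, "years_employed": 1}
approved, score, reasons = decide(applicant)
print("approved" if approved else "declined", score)
for line in reasons:
    print(" ", line)
```

A model this simple will usually score worse than an unconstrained one. What it buys is that the explanation is the model itself, rather than an approximation produced afterwards.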
It means treating data quality and feature engineering as primary concerns, not preprocessing steps. It means auditing training data for the values it encodes. It means measuring success against multiple objectives, not a single metric.
Most importantly, it means recognising that explainability is not a technical problem. It is a governance problem. Someone must be responsible for understanding the system, defending its decisions, and knowing when to override it. That responsibility cannot be delegated to a post-hoc interpretation tool.
The organisations that will lead in machine learning are not those with the most sophisticated models. They are those that can explain them.