Why Standard Metrics Fail for AI
The two most common AI metrics, cost savings and time savings, consistently fall short. Cost savings are often theoretical: time saved doesn't automatically translate into headcount reduction or margin improvement. Time savings are hard to verify and hard to connect to business outcomes. When AI initiatives are evaluated primarily on these metrics, teams either inflate the numbers to protect their budget or report accurate numbers that fail to justify continued investment.
The measurement problem is also a framing problem. AI transformation is being measured as a cost reduction exercise when it's actually a capability expansion exercise — and those require different measurements.
The Three-Layer Measurement Framework
Layer 1: Activity Metrics (Lead Indicators)
What AI is doing: adoption rates, usage frequency, task coverage, prompt volumes. These measure whether AI is being used, not whether it's producing value. They matter because they're early signals — a decline in adoption often precedes a decline in outcomes. But they're not sufficient on their own.
Layer 2: Output Metrics (Process Outcomes)
What AI use produces: cycle time reduction, output quality scores, error rates, rework rates, throughput. These connect AI activity to process performance. A customer service team using AI should show measurable improvement in resolution times and customer satisfaction, not just in hours of AI tool usage. Output metrics require baseline data — you need to know what things looked like before AI to measure what they look like after.
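As a rough illustration of measuring an output metric against its baseline, the sketch below computes the relative change in mean resolution time. The function name and the sample figures are hypothetical, chosen only to show the arithmetic; real data would come from your own ticketing or process logs.

```python
from statistics import mean


def relative_change(baseline, current):
    """Fractional change in the mean from the baseline period to the current period."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(current) - base) / base


# Hypothetical resolution times in minutes, before and after AI rollout.
baseline_minutes = [40, 50, 60]
current_minutes = [30, 40, 50]

change = relative_change(baseline_minutes, current_minutes)
print(f"Resolution time changed by {change:.0%}")
```

A negative value means the metric fell, which is an improvement for resolution time but a deterioration for something like a quality score, so the direction needs to be interpreted per metric.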
Layer 3: Impact Metrics (Business Outcomes)
What process improvements produce: revenue impact, customer retention, market share, product quality, employee retention. These are the metrics that matter to boards and investors. They're also the hardest to attribute to AI specifically, because many other factors affect them simultaneously. Credible impact attribution requires a pre/post comparison (with a control group where possible), statistical rigour, and honest acknowledgment of attribution uncertainty.
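One common way to combine a pre/post comparison with a control group is a difference-in-differences estimate: the change in the group using AI minus the change in a comparable group that is not. The sketch below is a minimal version under stated assumptions, not a full methodology. The function name and numbers are hypothetical, the result is a point estimate only, and it is meaningful only if both groups would have followed similar trends without AI.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Point estimate: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical resolution times in minutes for two comparable teams.
ai_team_pre = [50, 52, 48]
ai_team_post = [40, 42, 38]
control_team_pre = [51, 49, 50]
control_team_post = [47, 45, 46]

effect = diff_in_diff(ai_team_pre, ai_team_post, control_team_pre, control_team_post)
print(f"Estimated effect attributable to AI: {effect:+.1f} minutes")
```

Here the control team also improved, so only part of the AI team's improvement is credited to AI. A real analysis would also report uncertainty (for example a confidence interval) rather than a single number.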
The Baseline Problem
Impact measurement requires baselines. Many organisations launch AI initiatives without establishing baseline measurements of the processes they're trying to improve. This is not just a measurement problem — it's a strategic problem, because it makes it impossible to demonstrate value later.
The practice: before any significant AI initiative begins, spend two weeks measuring the current state. Document the process, the time it takes, the quality of outputs, and the error rate. Establishing these baselines is worth more than almost anything else you can do in week one of an AI initiative.
Building a Board-Level AI Business Case
A business case that survives setbacks connects three things:
- Strategic logic: Why does this capability matter for our competitive position? (Not "AI is important" but "this capability closes a specific competitive gap.")
- Measured progress: What have we learned so far, and what do the metrics say? This includes honest reporting of what hasn't worked.
- Forward case: Based on current trajectory, what do we project at 12, 24, and 36 months? With explicit assumptions so the board can challenge them.
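To make the forward case challengeable, it can help to keep the assumptions as named inputs rather than burying them in a spreadsheet formula. The sketch below is one hypothetical way to do that: every field name, the compounding-growth model, and the figures are illustrative assumptions, not a recommended forecast method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    starting_monthly_value: float  # hypothetical: measured monthly benefit today
    monthly_growth_rate: float     # hypothetical: assumed compounding growth in benefit
    monthly_run_cost: float        # hypothetical: assumed ongoing cost per month


def project_cumulative_net(a, months):
    """Cumulative net benefit over the given horizon under the stated assumptions."""
    total = 0.0
    value = a.starting_monthly_value
    for _ in range(months):
        total += value - a.monthly_run_cost
        value *= 1 + a.monthly_growth_rate
    return total


# Illustrative figures only; the board can challenge each one directly.
base_case = Assumptions(
    starting_monthly_value=100_000.0,
    monthly_growth_rate=0.02,
    monthly_run_cost=60_000.0,
)

for horizon in (12, 24, 36):
    net = project_cumulative_net(base_case, horizon)
    print(f"{horizon} months: cumulative net benefit {net:,.0f}")
```

Rerunning the projection with a more pessimistic set of assumptions shows how sensitive the 12, 24, and 36 month figures are to each input.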