Metrics, the hard way, part one
Stumbling through the fuzzy front end of software delivery performance metrics
Umm, no, I didn’t gather any hard metrics
In late summer on San Francisco’s peninsula, the weather turns simultaneously hot and cold. Walk down a street in San Mateo, Redwood City, or Palo Alto, and you’ll inevitably see someone wearing shorts and flip-flops next to someone wearing a puffer vest, flannel shirt, and boots. Wind and fog blow over the coastal hills while the sun beats down on bayside communities. These communities house Silicon Valley, where developers sit in sprawling office parks churning out software in breathtaking quantities.
In one of those office parks, my product group is sitting in a conference room preparing for two dreaded events: the company’s user conference and our customer advisory board. During these events, we’ll tell stories about what we’ve accomplished over the past year, what we’ll deliver in the year ahead, and when customers can expect to see those new products and features. To do this, we must build demos and create PowerPoint presentations that we’ll show on stage at various conference sessions. Other presentations, prefaced with disclaimers about the unreliability of future-looking statements, we will present to select customers at our annual two-day customer advisory board meetings.
Like the dichotomous weather, the presentations are somewhat at odds, with conference materials intended for public consumption and customer advisory board materials laced with fantasies about what we might build if only we had the time, money, and resources.
The year prior, the customer advisory board went poorly. I was berated by a roomful of angry customers complaining about the software’s poor quality and our inability to deliver on our promised roadmap. This year, I was determined that wouldn’t happen. So, my team and I prepared drafts of presentations to demonstrate the progress we’d made in fixing the previous year’s problems.
The group’s senior vice president sat listening to our presentation walkthrough, tapping his pen on the edge of the conference room’s table.
“Where’s your data?”
When I’d sat with my team brainstorming the presentation’s outline and content, we’d never discussed metrics or collecting data. I’d advanced in my career but hadn’t yet matured enough to know how important data are to making a case or supporting an argument. Seems silly, I know, but I still had a lot to learn.
Product managers occupy a certain space in the software delivery lifecycle. It’s a weird space full of things that are hard to define and harder to measure, where instinct and intuition are useful tools. Most of the time, I was winging it, listening for insights or feedback, and then choosing a path with barely a consideration for an alternative. What I lacked, namely a methodology, I made up for with overconfidence spilling into bravado.
It worked sometimes, but it lacked rigor, and that lack of rigor showed in my products and teams.
I needed some metrics
I needed some metrics and a framework through which to interpret them. There were certain basic questions I couldn’t answer. How quickly do we deliver value to customers? How do we know when a feature is valid and useful? How do we know when that feature is done? How do we know we’ve delivered a high-quality, robust feature or product?
About ten years ago, a group of researchers began an investigation of “what capabilities and practices are important to accelerate the development and delivery of software and, in turn, value to companies”. The results were published in the annual State of DevOps reports starting in 2014 and eventually collected into the book Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, by Nicole Forsgren, Jez Humble, and Gene Kim.
Their research is deep and rigorous and promotes lead time as a key metric to measure the effectiveness of your software delivery organization. Lead time describes “the time it takes to go from a customer making a request to the request being satisfied” and has two parts: “the time it takes to design and validate a product or feature, and the time to deliver the feature to customers”.
Most of the book focuses on the latter part of lead time, where they describe and dig into four key metrics that measure how long it takes to deliver a feature: delivery lead time, deployment frequency, change failure rate, and mean time to recover from failures. They apply sophisticated statistical analysis to their survey data, sorting delivery organizations into high-, medium-, and low-performing cohorts.
It’s not foolproof, but it’s pretty good at determining how well a team delivers software. The framework stresses continuous improvement and, as a result, encourages a mindset of delivery performance. The measures are there to tell you how well that mindset is taking hold.
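To make those four measures concrete, here’s a minimal sketch in Python, assuming you can export per-change records from your CI/CD pipeline and incident tracker. The record shape and the field names (merged_at, deployed_at, caused_failure, restored_at) are my own stand-ins for illustration, not anything the book or a particular tool prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """One change making its way to production (hypothetical export format)."""
    merged_at: datetime                  # change merged to the main branch
    deployed_at: datetime                # change running in production
    caused_failure: bool                 # did this change degrade service?
    restored_at: datetime | None = None  # when service was restored, if it failed


def delivery_lead_time(deploys: list[Deployment]) -> timedelta:
    """Median time from code merged to code running in production."""
    return median(d.deployed_at - d.merged_at for d in deploys)


def deployment_frequency(deploys: list[Deployment], window_days: int) -> float:
    """Average deployments per day over the observation window."""
    return len(deploys) / window_days


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that degraded service in production."""
    return sum(d.caused_failure for d in deploys) / len(deploys)


def mean_time_to_recover(deploys: list[Deployment]) -> timedelta:
    """Average time to restore service after a failed change."""
    failures = [d for d in deploys if d.caused_failure and d.restored_at]
    return sum((d.restored_at - d.deployed_at for d in failures), timedelta()) / len(failures)
```

Even this much, computed over a quarter’s worth of deployments, gives you a baseline to improve against.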
But…
There is a problem with the framework: it prefers precise, unambiguous measures. A deployment makes it to production, or it doesn’t. The software works, or it doesn’t. (Actually, this one turns out to be highly debatable. More on that in a future post.) A certain percentage of changes fail, and it takes this amount of time to resolve the failures. There’s little variability in the measures, and the data are easy to interpret.
Then, the authors drop this little nugget:
“...there are two parts of lead time: the time it takes to design or validate a product or feature, and the time to deliver the feature to customers. In the design part of the lead time, it’s often unclear when to start the clock, and often there is high variability. For this reason, Reinertsen calls this part of the lead time the “fuzzy front end” (Reinertsen 2009).”
As it turns out, a substantial part of the software development process confounds researchers: figuring out how to measure the effort it takes to know exactly what customers want.
The difficulty is introduced and then quickly discarded: “The delivery part of lead time… is easier to measure and has a lower variability.” In an accompanying table, the authors take pains to point out the uncertainty of estimates and the high variability of outcomes during the design, validation, and requirements-gathering phase.
And then, the rest of the chapter somewhat ebulliently describes cluster analysis and how, when applied to the four measures, it is possible to see quite precisely how well a development organization delivers software.
Cool.
Measuring the “fuzzy front end”
For researchers focused on rigorous statistical analysis, the fuzzy front end is clearly a terrifying place. It defies analysis. But that is what I love most about it. The fuzzy front end is the playground for product managers.
How long does it take, and how efficiently do you go, from an idea to validated and implementable requirements? If it seems like a ‘duh’-style question, I promise you it’s not. You can actually build good software without knowing the answer to this question. And I spent too much time during my career trying to prove that point, mainly because it took me a long time to figure out how to get and use reliable metrics. I didn’t have the background or skills to invent a model, collect the data, and use it in my day-to-day work. That made my job much harder, and it led to questions from SVPs about whether I had any metrics to explain how well we’d delivered software in the past and how likely our future products and features were to succeed.
Also, I never had any good dashboards to offer as a retort, and it’s a universal truth that SVPs love dashboards.
I’ve had crazy-making conversations with engineering managers who’ve tried to convince me that this part of the software-making process–the part where you sit with ambiguity and nuance, where you practice close listening, where you spitball and hypothesize–isn’t actually part of what we are trying to measure. As long as we know how well we are doing at the delivery, the design and validation are orthogonal, at best.
What do customers want? Is it worth building? Can we make a profitable business from it?
There’s a flaw in their model, but it isn’t fatal. I’ve adapted the model by refining their design and delivery categories into product metrics and engineering metrics. The engineering delivery metrics are easily tracked using GitHub, Jira, and your CI/CD tooling. The product metrics must be defined for your teams and then measured using tools like Jira, Google Forms, and feature flagging tools (for which you’ll need to do a build vs. buy analysis).
The product design metrics mirror what you have in the engineering delivery metrics:
Product lead time is a measure of how long it takes to uncover and validate requirements and write those requirements into your backlog.
Design time measures how long it takes to build prototypes or mockups you can use to get actionable feedback.
Design fail rate measures how often designs require significant changes or are rejected entirely and feeds back into the design time metric.
Voice of the customer measures how often you interview customers to gather feedback, and how many customers you speak with.
Combine the product design and engineering delivery metrics, and you’ll have a clear picture of the time it takes to deliver value to customers. And you’ll have all the baseline data needed to achieve a state of continuous improvement.
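Here’s a minimal sketch of how you might track those product design metrics alongside the delivery ones, in the same spirit as the sketch above. Everything in it is assumed for illustration; in practice, the timestamps and counts would come from Jira custom fields, Google Forms exports, or wherever you log discovery work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class FeatureRecord:
    """One feature's trip through the fuzzy front end (hypothetical export format)."""
    idea_raised_at: datetime      # customer request or internal idea first logged
    backlog_ready_at: datetime    # validated requirements written into the backlog
    design_started_at: datetime
    design_accepted_at: datetime  # prototype or mockup accepted after actionable feedback
    design_rejections: int        # designs rejected or needing significant rework
    customers_interviewed: int    # interviews held to gather feedback on this feature
    delivered_at: datetime        # feature live for customers (from the delivery metrics)


def product_lead_time(features: list[FeatureRecord]) -> timedelta:
    """Median time from idea to validated requirements in the backlog."""
    return median(f.backlog_ready_at - f.idea_raised_at for f in features)


def design_time(features: list[FeatureRecord]) -> timedelta:
    """Median time to get a prototype or mockup to accepted, actionable feedback."""
    return median(f.design_accepted_at - f.design_started_at for f in features)


def design_fail_rate(features: list[FeatureRecord]) -> float:
    """Share of design iterations rejected or needing significant changes."""
    rejections = sum(f.design_rejections for f in features)
    attempts = rejections + len(features)  # each feature also has one accepted design
    return rejections / attempts


def voice_of_the_customer(features: list[FeatureRecord]) -> float:
    """Average number of customer interviews per feature."""
    return sum(f.customers_interviewed for f in features) / len(features)


def total_lead_time(features: list[FeatureRecord]) -> timedelta:
    """Idea to running in production: the fuzzy front end plus engineering delivery."""
    return median(f.delivered_at - f.idea_raised_at for f in features)
```

A spreadsheet version works just as well to start. The point is agreeing, up front, on where the clock starts for each measure.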
In part two, I’ll talk about how I use things like Objectives and Key Results (OKRs) to prioritize and track these measures. The goal is to establish clarity about “when to start the clock”.
Back to the future
The most important outcome of measuring your software delivery performance is reaching a state of continuous improvement. As I’ve talked about in the past, it’s tremendously important to establish a mindset: in this case, one where continuous improvement becomes ingrained in the team.
What happened is what was possible, but sometimes I wish I had a time machine and could take my learnings about metrics back to the early days of my career. I’d stand at the front of those rooms and tell the story in numbers: we did well in this area, we can improve in these others, and here’s what we’re doing to achieve those improvements.
To that angry customer, I’d have the tools to say it took us thirty days to deliver the feature from the day you told us about your requirement. That’s not so bad. And since you, dear customer, have a six-month change control window, perhaps you are upset about something else?