There is perhaps no topic more fraught in software engineering management than developer productivity measurement. The disagreements are both technical — about what can and cannot be meaningfully quantified — and philosophical, touching on questions about trust, autonomy, and the nature of creative technical work.

Yet the demand from engineering leaders and finance organizations for better productivity data is real and growing. Engineering is one of the largest and most expensive functions in technology companies, and the pressure to demonstrate the return on that investment — especially in an environment where technology hiring has faced renewed scrutiny — is not going away. The question is not whether to measure engineering productivity, but how to do it in a way that generates genuine insight without creating perverse incentives or eroding the trust that high-performing engineering teams require.

At DeepDots Ventures, we invest in companies building developer experience and engineering analytics tools, which means we have thought deeply about this problem from both a product and an investment perspective. This article shares our current thinking.

Why Simple Activity Metrics Fail

The most common approach to measuring developer productivity — counting outputs like commits, pull requests, lines of code, and tickets closed — fails because these metrics are easily gamed and poorly correlated with the outcomes that actually matter. A developer who merges twelve small pull requests in a day is not necessarily more productive than one who spends the same day deep in a single complex refactoring. A team that closes fifty Jira tickets per sprint is not necessarily delivering more value than one that closes twenty.
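
To make the failure concrete, here is a minimal sketch of what a naive activity dashboard computes. The developer names and numbers are hypothetical; the point is that the count alone inverts the comparison described above.

    from collections import Counter

    # Hypothetical merged-PR log; names and numbers are illustrative only.
    merged_prs = (
        [{"author": "dev_a", "lines_changed": 10}] * 12   # twelve trivial PRs
        + [{"author": "dev_b", "lines_changed": 900}]     # one deep refactoring
    )

    # What a naive activity dashboard computes: PRs merged per developer.
    pr_counts = Counter(pr["author"] for pr in merged_prs)

    # dev_a "wins" twelve to one, even though dev_b may have delivered the
    # more valuable change. The count says nothing about the outcome.
    for author, count in pr_counts.most_common():
        print(f"{author}: {count} PRs merged")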

The gaming problem is particularly damaging. When developers know they are measured on commit count, they make smaller, more frequent commits. When they are measured on pull request count, they break changes into smaller units that are individually less valuable. When they are measured on tickets closed, they gravitate toward smaller tickets. The measurement system creates incentives that work against the organizational interest in large-scale, high-impact engineering work.

The correlation problem is equally serious. The outputs that are easy to count — commits, PRs, tickets — are the smallest-granularity artifacts of engineering work. The outcomes that actually matter — features shipped, technical debt reduced, system reliability improved, team capability developed — are largely invisible to automated measurement systems. Optimizing for measurable outputs at the cost of unmeasured outcomes is a classic Goodhart's Law failure: when a measure becomes a target, it ceases to be a good measure.

The DORA Framework: Valuable but Incomplete

The DORA metrics — deployment frequency, lead time for changes, mean time to restore, and change failure rate — represent a genuine advance in engineering productivity measurement. They measure outcomes (delivery performance, system reliability) rather than activities, they are grounded in extensive empirical research across thousands of organizations, and they provide actionable benchmarks that allow engineering teams to understand where they stand relative to industry peers.
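
To ground the definitions, here is a rough sketch of how the four metrics can be derived from deployment and incident records. The event schema and field names are our own assumptions for illustration, not part of any DORA specification.

    from datetime import datetime
    from statistics import median

    # Hypothetical records; the schema is an assumption for illustration.
    # Each deploy: (deployed_at, first_commit_at, caused_failure)
    deploys = [
        (datetime(2024, 5, 1, 10), datetime(2024, 4, 29, 9),  False),
        (datetime(2024, 5, 2, 15), datetime(2024, 5, 1, 11),  True),
        (datetime(2024, 5, 3, 12), datetime(2024, 5, 2, 16),  False),
    ]
    # Each incident: (started_at, restored_at)
    incidents = [(datetime(2024, 5, 2, 15, 30), datetime(2024, 5, 2, 16, 10))]

    window_days = 7

    # Deployment frequency: deploys per day over the measurement window.
    deployment_frequency = len(deploys) / window_days

    # Lead time for changes: commit-to-production time (median shown here).
    lead_time = median(deployed - committed for deployed, committed, _ in deploys)

    # Change failure rate: share of deploys that caused a production failure.
    change_failure_rate = sum(failed for _, _, failed in deploys) / len(deploys)

    # Time to restore: how long it takes to recover from an incident.
    time_to_restore = median(end - start for start, end in incidents)

    print(deployment_frequency, lead_time, change_failure_rate, time_to_restore)

We use medians rather than means for the duration metrics because delivery-time distributions are heavily skewed; that choice is a common convention, not a requirement of the framework.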

But DORA has limitations that become apparent when it is used as the primary measurement framework. It captures delivery performance but not the quality of what is being delivered. It measures the velocity of the delivery pipeline but not the value of the features moving through it. And it is inherently a team- or organization-level measurement — it cannot provide visibility into the productivity or contribution of individual engineers.

Several high-profile engineering productivity controversies in 2023 and 2024 — most notably the debate over a widely circulated paper proposing to use DORA metrics as a basis for individual performance assessment — revealed the limitations of applying team-level delivery metrics to individual evaluation. The research community and engineering leaders converged on a consensus that DORA is valuable for organizational benchmarking and improvement but is not appropriate for individual performance management.

The SPACE Framework: A More Complete Picture

The SPACE framework, published by researchers from Microsoft Research, the University of Victoria, and GitHub, was developed specifically to address the limitations of activity-based productivity metrics. The framework identifies five dimensions of developer productivity: Satisfaction and wellbeing, Performance, Activity, Communication and collaboration, and Efficiency and flow.

The power of SPACE is that it treats developer productivity as multidimensional and acknowledges the importance of subjective factors — satisfaction, wellbeing, and the quality of flow states — alongside objective measures. Research consistently shows that developer satisfaction is a leading indicator of productivity: satisfied developers are more productive, produce higher-quality work, and stay at organizations longer. Measurement systems that ignore this dimension are measuring a shadow of the underlying reality.

The challenge of implementing SPACE is that several of its dimensions require qualitative data collection — developer experience surveys, retrospective feedback, and similar instruments that are slower and more expensive than automated instrumentation. Organizations that do this well invest in regular, structured developer satisfaction measurement and treat the results as first-class signals alongside their quantitative delivery metrics.
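
One way to operationalize this, sketched below, is a per-team scorecard that reports survey-derived signals alongside instrumented ones. The mapping of dimensions to specific signals and scales is an illustrative assumption of ours, not part of the SPACE framework itself.

    from dataclasses import dataclass

    @dataclass
    class SpaceScorecard:
        """One team's snapshot across the five SPACE dimensions.

        satisfaction and flow_hours come from developer experience surveys;
        the other fields come from automated instrumentation. The specific
        signals and scales are illustrative assumptions.
        """
        satisfaction: float         # S: survey score on a 1-5 scale
        performance: float          # P: e.g. change failure rate (lower is better)
        activity: int               # A: e.g. deploys per week (context, not a target)
        collaboration_hours: float  # C: e.g. review and meeting hours per developer
        flow_hours: float           # E: survey-reported deep-work hours per day

    # Report qualitative and quantitative signals side by side as peers,
    # rather than collapsing them into a single composite score.
    team = SpaceScorecard(satisfaction=4.1, performance=0.08, activity=23,
                          collaboration_hours=9.5, flow_hours=2.4)
    print(team)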

The Flow State Problem

One of the most important and least well-measured aspects of developer productivity is the quality of flow states — the extended periods of uninterrupted, deeply focused work that produce the highest-quality engineering output. Research on flow states in knowledge workers shows that they take sustained uninterrupted time to initiate (typically 45 to 90 minutes or more), that they are highly susceptible to interruption, and that they produce output of substantially higher quality and quantity than work done without flow.
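
As an illustration of why measurement here is delicate, the sketch below groups one developer's activity timestamps into uninterrupted stretches and flags those long enough to plausibly contain flow. The 45-minute floor comes from the research cited above; the 10-minute gap threshold and the event data are our assumptions.

    from datetime import datetime, timedelta

    # Hypothetical activity timestamps (saves, test runs) for one developer.
    events = [datetime(2024, 5, 1, 9, 0) + timedelta(minutes=m)
              for m in (0, 4, 9, 15, 22, 30, 38, 47, 55,   # one long stretch
                        180, 184)]                          # a short, broken one

    MAX_GAP = timedelta(minutes=10)   # assumption: longer gaps break focus
    MIN_FLOW = timedelta(minutes=45)  # floor taken from the research above

    # Group events into uninterrupted stretches, then keep the long ones.
    blocks, start = [], events[0]
    for prev, cur in zip(events, events[1:]):
        if cur - prev > MAX_GAP:
            blocks.append((start, prev))
            start = cur
    blocks.append((start, events[-1]))

    flow_blocks = [(s, e) for s, e in blocks if e - s >= MIN_FLOW]
    print(f"{len(flow_blocks)} candidate flow block(s): {flow_blocks}")

Even this simple heuristic requires per-developer activity data, which is exactly where the surveillance concerns discussed below begin.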

The number of hours per day in which a developer is in a flow state is one of the strongest predictors of engineering productivity we are aware of. But it is also one of the most difficult to measure directly and one of the most sensitive to surveillance concerns. The commercial tools that attempt to measure flow states — typically through analysis of continuous coding activity in IDE plugins — are genuinely valuable but require careful communication and transparent data handling to avoid the perception that they are monitoring tools rather than productivity improvement tools.

We are interested in companies that can measure the organizational conditions that support flow states — meeting load, interruption frequency, collaboration overhead — and provide actionable recommendations for improving those conditions, without crossing the line into individual surveillance. The organizational-level measurement approach avoids the trust and privacy concerns of individual monitoring while providing the data that engineering leaders actually need.
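
A minimal sketch of what that organizational-level measurement might look like: computing meeting load and the longest available focus block from a team calendar, aggregated above the level of any individual. The calendar entries and working-day boundaries are hypothetical.

    # Hypothetical team calendar for one day, in minutes since midnight.
    # Aggregated at team level; no individual's day is singled out.
    meetings = [(570, 600), (660, 720), (840, 870)]  # 9:30-10:00, 11:00-12:00, 14:00-14:30
    day_start, day_end = 540, 1020                   # working day 9:00-17:00

    # Meeting load: share of the working day spent in meetings.
    meeting_load = sum(end - start for start, end in meetings) / (day_end - day_start)

    # Longest free gap: the biggest block available for flow. Assumes
    # meetings are non-overlapping and fall inside the working day.
    bounds = [day_start] + [t for slot in sorted(meetings) for t in slot] + [day_end]
    free_gaps = [bounds[i + 1] - bounds[i] for i in range(0, len(bounds), 2)]

    print(f"meeting load: {meeting_load:.0%}, longest focus block: {max(free_gaps)} min")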

Investment Opportunities in Engineering Analytics

The engineering analytics and developer productivity measurement market is one of our most active investment areas at DeepDots. We see it as a market driven by three converging forces: the growing scale of engineering organizations, the pressure on technology companies to justify their engineering investments, and the genuine desire of engineering leaders to improve their teams' performance rather than just justify headcount.

The specific product approaches we find compelling combine automated instrumentation of engineering systems (code repositories, CI/CD pipelines, issue trackers) with developer self-reported data and present the combination in ways that are transparently explained and clearly beneficial to developers, not just to management. The companies that crack this combination — providing genuine insight for engineering leaders without creating surveillance dynamics that damage developer trust — will build very durable businesses.

We also see significant opportunity in the benchmarking and comparison layer: tools that allow organizations to understand their engineering metrics in the context of industry benchmarks, peer company comparisons, and longitudinal trends. This data is uniquely valuable to engineering leaders and the finance organizations they work with, and the companies that can aggregate anonymized benchmark data across their customer base will have a powerful and defensible data advantage.
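
As a simple illustration of the benchmarking layer, the sketch below places one company's lead time against an anonymized peer distribution. All figures are hypothetical.

    from bisect import bisect_left

    # Hypothetical anonymized benchmark: median lead time (hours) per peer company.
    peer_lead_times = sorted([4, 7, 12, 18, 24, 30, 48, 72, 96, 120])

    def percentile_rank(value, benchmark):
        """Share of peers whose lead time is at or above this value
        (lower lead time is better)."""
        return (len(benchmark) - bisect_left(benchmark, value)) / len(benchmark)

    our_lead_time = 18  # hours
    print(f"At or ahead of {percentile_rank(our_lead_time, peer_lead_times):.0%} of peers")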

Key Takeaways

  • Activity-based metrics (commits, PRs, tickets) are easily gamed and poorly correlated with engineering outcomes that matter.
  • DORA provides valuable delivery performance benchmarking but does not capture the full picture of engineering productivity.
  • SPACE addresses DORA's limitations by treating productivity as multidimensional, including satisfaction and flow state quality.
  • Flow state quality is one of the strongest predictors of engineering productivity but requires careful measurement approaches to avoid surveillance concerns.
  • The winning formula combines automated instrumentation with self-reported data, presented transparently as a developer benefit.