From Lines of Code to Tokens Burned: When AI Token Usage Becomes Performance Theater — The Return of a Bad Engineering Metric
When NVIDIA CEO Jensen Huang said, "If your $500K engineer isn't burning at least $250K in tokens, something is wrong.", he drew a lot of attention to a new paradigm: token consumption has become a key metric of AI adoption. Granted, this is the guy selling shovels in a gold rush telling you to dig more, but there is some truth to it in how companies are evaluating their own developers.
Some companies evaluate developers on whether they have used up their token quota. Others look for a "lack of" token usage among certain developers, treating it as a signal that a developer is not keeping up with the trend of AI development. A token quota has even become a benefit or status symbol for some developers, signaling that they do more work than others and are more AI-adept than their peers.
All of this is reminiscent of how some companies in the 1970s and 1980s used lines of code to evaluate developer performance. Lines of code had the same appeal: a simple, direct, easy-to-understand relationship to how much work is being done, and a number that can be counted precisely for everyone. But as the line famously attributed to Bill Gates goes: "Measuring programming progress by lines of code is like measuring aircraft building progress by weight." And as Goodhart's law has it, once a measure becomes a target, it ceases to be a good measure. Developers can easily write bloated code to meet their quotas.
Evaluating work, or the level of AI adoption, by token usage follows almost exactly the same problematic pattern. It is extremely easy to inflate token usage by running agents in parallel on simple tasks and calling it being careful and thorough. Larger modern LLMs also have a strong tendency to over-engineer, writing code that looks professional but serves very little purpose in the context of how the code is actually used. A KPI like this will certainly encourage developers, both intentionally and unintentionally, to write code that is costly to maintain.
Like the cost of maintaining an application with a high line count, high token consumption is a very direct cost to the company. If team members start to compete on how many tokens they consume, it will almost certainly drive up development cost, or even make relatively simple work far more complicated than it should be. Worst of all, AI-assisted development tends to accelerate the growth of a code base. Whether that growth comes from higher productivity, from AI over-engineering, or from developers intentionally bloating their output, a larger code base will always cost more tokens to maintain, and as multiple studies have shown, AI coding quality deteriorates as the relevant code base gets larger.
AI tokens have all the characteristics of a bad metric like lines of code. Consuming more tokens is not a direct indication that a developer is using AI in a more sophisticated way, and a lack of token usage does not necessarily mean that a developer is failing to keep up with the age of AI. In most cases, the nature of the problem the developer is trying to solve determines how many tokens should be consumed under normal circumstances.
A more practical approach that some managers are taking is simply not to take AI token usage into account. If a developer actually benefits from AI, the performance numbers that were in place before AI should show clearly enough that AI is helping. Any benefit from AI should be measured as the developer's actual performance results, whether AI was involved or not. A developer who uses less AI to achieve the same result should not be punished for it. Right now most managers do not reward doing the same amount of work with fewer tokens. But once AI-assisted development is common enough that adoption no longer needs an extra incentive, it will make sense in the long term to encourage developers to be smart about their token consumption and cost-conscious about AI in their work.
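To make the paragraph above concrete, here is a minimal Python sketch of the difference between ranking developers by tokens burned and ranking them by the outcome metric a team already tracks. Everything in it is hypothetical: the developer names, the token counts, the `items_delivered` field (a stand-in for whatever delivery measure is already in place), and the flat `PRICE_PER_MILLION_TOKENS` are assumptions for illustration, not real data or real vendor pricing.

```python
from dataclasses import dataclass


@dataclass
class DevStats:
    name: str
    tokens_used: int      # tokens consumed in the review period
    items_delivered: int  # stand-in for the outcome metric already in place


# Hypothetical blended price in dollars per million tokens (an assumption).
PRICE_PER_MILLION_TOKENS = 10.0


def token_cost(dev: DevStats) -> float:
    """Dollar cost of the tokens a developer consumed."""
    return dev.tokens_used / 1_000_000 * PRICE_PER_MILLION_TOKENS


def cost_per_item(dev: DevStats) -> float:
    """Token dollars spent per delivered item; infinite if nothing shipped."""
    if dev.items_delivered == 0:
        return float("inf")
    return token_cost(dev) / dev.items_delivered


# Made-up numbers, purely for illustration.
team = [
    DevStats("alice", tokens_used=40_000_000, items_delivered=12),
    DevStats("bob", tokens_used=200_000_000, items_delivered=12),
    DevStats("carol", tokens_used=5_000_000, items_delivered=10),
]

# "Tokens burned" leaderboard: whoever spends the most comes first.
by_tokens = sorted(team, key=lambda d: d.tokens_used, reverse=True)

# Outcome-first ranking: delivered work decides the order,
# and token cost is only a tie-breaker (cheaper is better).
by_outcome = sorted(team, key=lambda d: (-d.items_delivered, token_cost(d)))

for dev in by_outcome:
    print(f"{dev.name}: {dev.items_delivered} items, "
          f"${token_cost(dev):.2f} in tokens, "
          f"${cost_per_item(dev):.2f} per item")
```

In this made-up data, two developers deliver the same amount of work at very different token cost. The token leaderboard rewards the expensive one, while the outcome-first ranking treats them as equals and only then prefers the cheaper one. Token cost is deliberately a tie-breaker here and not a target in its own right, since turning it into a target would invite the same gaming in the opposite direction.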
Currently, the trend of burning as many tokens as one can is fueled by massive subsidies from AI model vendors, which make AI subscriptions significantly cheaper than their pay-per-API-call counterparts. Most experts believe this is not a sustainable model, and token cost will eventually have to be reflected more clearly as the subscription subsidy gradually goes away. When the cost of tokens gets closer to reality, development teams will start to realize how unreasonable it is to use token usage as a performance metric. The performance metrics that were already in place for developers should be the actual bar for measuring developer performance. In the end, using AI is always just a means to an end, not the end itself.
