Technical Debt: Strategies & Tactics for Avoiding & Removing It
For clarity, this blog will focus on how to manage technical debt i.e. make it visible, communicate it, quantify it, and prioritise its removal. This is because in my experience it is not only a challenge to fix the debt, it can also be very challenging to get time and resources assigned to fix it unless a high profile issue has occurred which highlights the debt.
Most companies are creating technical debt all the time – let’s take an example:
“The team I’m part of has just started a project. We don’t have a test environment but we’ve got our dev boxes and a build server so we can get cracking. Our definition of done is okay, we’ve said we’ll use coding standards, will build features to the acceptance criteria in the story and will have a review of features with the Product Owner. We’ll write unit tests on new bits of code, and prepare test scripts but because we haven’t got a test environment we won’t be able to do integration testing or regression test the features to ensure nothing has been broken in the process of creating new features. We’ve done a couple of sprints and there’s more bugs in the solution but once the test environment is available we’ll sort them all out.”
For some clients we work with, this is not a rare scenario, but inherent within this approach is the creation of large amounts of technical debt, and it’s growing exponentially. I think the following slide from Colin Bird’s CSM course highlights the problem very succinctly.
These items increase sprint on sprint: more bugs, no regression testing, unclear coding standards, an unclear understanding of the quality of code being produced (due to a lack of pair programming, TDD or even peer review) and a complete lack of visibility of how the solution will perform in an integrated, production environment (phew!). As they do, the mounting technical debt will start to impede velocity and add more work on to the end of the project in order to move the solution from an immature definition of done to production ready.
The typical result of this approach, left unaddressed, is a set of hardening and stabilisation sprints that take months to get into production. This causes knock-on effects of losing stakeholder trust and support, blocking other projects from starting, keeping resources longer than expected (which messes up resource planning), and generally losing kudos and trust across your internal teams – and that’s before we talk about the impact on consumers of your product!
Strategy 1 – Evolving a Mature Definition of Done
One of the first major conversations I expect to have with a Scrum team is about the definition of done. Scrum specifies that the team should create ‘potentially shippable product’ every sprint. This is, in essence, the heart of the problem of adopting Scrum. It is the reason why so many technical, engineering and test capabilities must change and adapt in order to be able to achieve this ‘rule’.
An immature Definition of Done speaks volumes about the team’s overall ability to reach hyper-productivity levels of software development. If the gap between a story done in sprint and a story deployed into production is big, then after the final ‘development’ sprint, you should all expect a considerable period of hardening and stabilisation.
We’ve recently been using Value Stream Maps to help articulate the flow of work through development teams. The Value Stream Maps show customer expectations, information flows, physical flows, productivity metrics and value stream metrics including waste.
The physical flow often shows software being created and then moving through the environments of development, test, user acceptance, pre-production and into production. So, with your team, where is code deployed within the sprint? i.e. when it’s tested and set to done in sprint, how far away is it from production?
And what is the cost or amount of pain to be inflicted before it reaches production?
How many unknowns exist between your team’s definition of done and production ready code?
What are you doing to evolve the Definition of Done to reduce the gap between sprint done and production done?
Again value stream maps come to the fore when considering these questions as does Application Lifecycle Management (ALM).
In order to avoid the long and painful hardening/stabilisation period, the team must be focussed on continuously evolving the Definition of Done to bring feedback loops and risks into the sprint and reduce the gap from sprint to production so that they are truly creating ‘potentially shippable product’.
Ahh, the 2 x 2 grid! Takes me back to hours upon hours of MBA study! Thank god that’s done with!
I have often used this grid to work with teams on prevention, i.e. preventative measures to reduce future technical debt. As with all preventative activities, they usually take longer to implement but when adopted have a significant impact.
Use these categories to decide which sources of debt are acceptable and which are not acceptable within your organisation, and then establish tactics for preventing unacceptable behaviour…
Prudent & Deliberate
Generally, I don’t have a problem with behaviours that sit in the top right quadrant… assuming that is really where they belong. Often this quadrant is misunderstood. It is not about having to ship because stakeholders are expecting a shipping date, or someone’s bonus depends on it shipping on a set date – those scenarios actually belong in the top left quadrant.
The top right quadrant is reserved for business drivers that have a compelling ROI for why a product has to ship immediately – i.e. responding to a threat to the business bottom line, or exploiting an opportunity that positively affects the bottom line. If in doubt, a good challenge would be to inform a board member of the situation and ask if they are aware that the product is shipping due to a compelling business agenda.
Examples of where the top-right might be justified:
Entry into new market (first to market may mean that less quality or features with more work-arounds might be acceptable)
Regulatory or Legal Requirement (a compliance date has been set whereby non-compliance would represent significant negative brand, operational or financial impact to the business)
Peak-period opportunity – within retail the Christmas period is increasingly becoming the most critical part of the annual cycle and failure to launch prior to Christmas could have a significant negative impact on the company’s performance.
Reckless & Deliberate
Top left is indicative of poor management. Usually corners are being cut in order to hit a deadline that is related to perceived operational needs rather than an underlying clear business case. Teams are rushed because someone somewhere has communicated a deadline, and the drive to hit it is for the deadline’s sake rather than for a compelling business need.
This is a very common cause of Technical Debt. If the board or shareholders were informed that a project was cutting corners and creating technical debt which will slow the company down in the future, they’d want to know why. In fact, I think that if the subject of technical debt was discussed more openly and quantified, there’d be far more examination of portfolio management capabilities.
A lot of the time, this is occurring not because there is a business case for hitting this date, rather due to how the company delivers projects. Examples of Reckless & Deliberate:
Rushing the project to completion because there’s lots more projects to deliver and the matrix resource plan needs to transfer resources
Cutting corners in a project because the programme manager and/or director have incentivised objectives based on the project being delivered into production this year
Pushing the project through because the client wants it on a set date, when no one has built a relationship with the client to discuss the details, nor has the client been informed of the effect on quality if the delivery is rushed
None of these are necessarily easy to change, but stopping these behaviours will have a significant, positive impact on the long-term velocity and well-being of the company.
Reckless & Inadvertent
Incompetence at one level or another is the key contributor to debt created within this quadrant. You don’t know what you don’t know and could therefore be blissfully unaware that you are creating a huge amount of technical debt. As a manager/leader, I’d want to prevent the reckless & inadvertent creation of technical debt and there are many tools to help me do this.
Essentially, this is about investing in your people, processes and tools – again something not done enough!
Pair programming, code reviews, static code analysis, continuous integration, automated test suites help to provide feedback on code and design quality. Communities of practice, clear role description, personal development plans, training budgets with technical strategy alignment are tactics for helping people get better at what they do and lightweight iterative methods with visual management help to ensure processes are continuously reviewed and improved.
Prudent & Inadvertent
This is a natural occurrence. Regardless of what walk of life we are in, or what job we do, over time, we’ll return to a previous piece of work and see a better way of doing it. It is the natural sequence to gain more domain knowledge about a particular piece of work.
Just because we know that with hindsight and increased experience in a year or two’s time we’ll look back and see a better way, doesn’t mean that we procrastinate or spend large amounts of time trying to second guess the future.
Rather, we keep to the agile principles of:
Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale
Working software is the primary measure of progress
Continuous attention to technical excellence and good design enhances agility.
The best architectures, requirements and designs emerge from self-organising teams
At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behaviour accordingly
So, let’s assume we’ve implemented some tactics for reducing our Reckless & Inadvertent debt and our Reckless & Deliberate debt and now we’re going to spend some time ‘speeding-up’ the company.
Where Does it Hurt?
This is how I have approached the problem in the past.
The first thing to do is to try to establish how painful a particular piece of technical debt is, i.e. if you have to touch this part of the solution in the future, how ‘painful’ will it be? How much additional time will be spent understanding the area, or re-writing parts to get it to work or integrate with other new parts? This ‘pain’ factor is represented by the yellow part below and we’ll refer to this as the Interest.
Next for each item of debt, we need to understand the effort involved in fixing it, and we’ll refer to this estimate as the Principal.
How Much Does it Hurt?
Having established how much interest is payable upon the use of any part, and knowing what it takes to fix it - the principal - we can now work out which pieces to fix, and why.
Looking at the slide above, which piece of debt do you think should be fixed first?
The answer with this amount of information is A, as it has the highest amount of ‘pain’ (Interest) with the lowest effort to resolve (Principal). However, what we haven’t considered yet is time and the frequency of payments, i.e. what if in the next 6 months item A on the left will only be changed/touched once, but the item on the far right, ‘D’, will be touched 10 times?
Now which item of debt do you think should be fixed first?
So we’ve established the frequency of Interest payments and found out that the most painful item of debt is item D on the far right. Have we finished?
No. Because until now we’ve focussed purely on the IT decision-making side of Technical Debt i.e. how painful it is, what it takes to fix it, and how often it hurts our development efforts. What we have yet to consider is whether our prioritisation takes into account business value.
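Before bringing business value in, the IT-side reasoning so far can be sketched in a few lines of Python. This is only an illustration: the item names follow the slide, but every number is made up, and the payoff formula (Interest per touch, times expected touches, divided by Principal) is an assumed way of combining the three factors rather than a prescribed one.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    interest: float   # 'pain' paid each time the area is touched (hypothetical units)
    principal: float  # effort needed to fix the debt (same hypothetical units)
    touches: int      # expected number of changes in the next 6 months


def payoff(item: DebtItem, use_frequency: bool = True) -> float:
    """Interest avoided per unit of Principal spent (an assumed formula)."""
    payments = item.touches if use_frequency else 1
    return item.interest * payments / item.principal


def rank(items: list[DebtItem], use_frequency: bool = True) -> list[str]:
    """Names of the debt items, best candidate for fixing first."""
    ordered = sorted(items, key=lambda i: payoff(i, use_frequency), reverse=True)
    return [i.name for i in ordered]


# Illustrative figures only; they are not taken from the slide.
ITEMS = [
    DebtItem("A", interest=8, principal=2, touches=1),
    DebtItem("B", interest=5, principal=5, touches=2),
    DebtItem("C", interest=3, principal=6, touches=3),
    DebtItem("D", interest=4, principal=5, touches=10),
]

print(rank(ITEMS, use_frequency=False))
print(rank(ITEMS, use_frequency=True))
```

With these made-up figures, ignoring frequency puts A first, while including it moves D to the top, which mirrors the argument above.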
A fair few years ago as Head of Software Development, my team created a product portfolio. i.e. we took all our disparate applications (and there were many, and they were disparate) and grouped them into families of applications that served specific business needs.
This was a very powerful exercise and it helped us to clearly discern where we were adding value and supporting the business. It also allowed us to quantify the value of software applications based on which business units were using them and how much revenue was being generated by those units.
In this way, we were able to prioritise fixing technical debt based on a clear alignment of business value.
So now we can quantify the size and frequency of Interest payments and we know what it will take to pay off the Principal to remove the debt. In addition to this we can prioritise the debt in terms of the value of the application to the business at the moment, but what about the foreseeable future?
The final consideration when prioritising technical debt activities across your application portfolio is the future needs of the business upon the application landscape. Which systems are critical to the future of the business? Which will be decommissioned as the portfolio moves through the foreseeable future?
In answering these questions, we will understand the future needs of the business upon the application landscape and be able to focus the entire development effort on improving the highest value systems of the business both for today and tomorrow.
Technical Debt Portfolio Prioritisation
By taking the factors we’ve already discussed: Interest Payment; Principal; Frequency of Payment; Business Value; and Strategic Intent - we can now create a high level, business-derived prioritisation matrix for articulating, quantifying, prioritising and resolving Technical Debt.
The first step is to map current business value/usage against the future strategic needs of the company.
This mapping will provide an initial view of the value of applications within the landscape.
Once these applications have been reviewed, categorised and the output discussed with business stakeholders, we can initially provide a generic approach to dealing with the debt:
Fix the Debt
Don’t Touch the Application
Focus on Prevention
Strategy 4 - Tactics for Fixing Technical Debt
We now have a means of categorising all the applications within the landscape and assigning relative value to them based on current and future usage requirements. Within these categories we are able to specify the ‘pain’ of technical debt, as well as the effort required to fix it.
We also have some strategies for dealing with the debt:
Ignore – for applications that are Dogs
Reduce – for Cash Cows to ensure the application is operationally maintainable
Remove – for the Stars
Prevent – for the Problem Children
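As a rough sketch, the categorisation and the four strategies above could be expressed as a simple lookup. The strategy per category comes from the list above; the way the two axes map onto Dogs, Cash Cows, Stars and Problem Children, the 1 to 10 scores and the threshold of 5 are all assumptions made for illustration, so adjust them to your own matrix.

```python
from dataclasses import dataclass

# ASSUMPTION: how (high current value?, high future strategic need?) maps to a category.
CATEGORY = {
    (True, True): "Star",
    (True, False): "Cash Cow",
    (False, True): "Problem Child",
    (False, False): "Dog",
}

# Strategy per category, as listed in the text above.
STRATEGY = {
    "Dog": "Ignore",
    "Cash Cow": "Reduce",
    "Star": "Remove",
    "Problem Child": "Prevent",
}


@dataclass
class Application:
    name: str
    current_value: int  # hypothetical 1-10 score agreed with business stakeholders
    future_need: int    # hypothetical 1-10 score for future strategic importance


def debt_strategy(app: Application, threshold: int = 5) -> tuple[str, str]:
    """Return (category, strategy) for an application; the threshold is an assumption."""
    key = (app.current_value > threshold, app.future_need > threshold)
    category = CATEGORY[key]
    return category, STRATEGY[category]


# Hypothetical applications, for illustration only.
for app in [
    Application("legacy-reports", 2, 1),
    Application("billing", 9, 2),
    Application("web-shop", 8, 9),
    Application("mobile-pilot", 3, 8),
]:
    print(app.name, debt_strategy(app))
```

The value of writing it down this way is mostly that the scores and the threshold become explicit, so they can be challenged by stakeholders rather than argued about case by case.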
Hide the Work – I’ve seen teams hide the work to fix the Technical Debt. This tactic can be successful but I feel it goes against the grain of transparency and honesty and ultimately the fixing of debt should be a business driven activity. But, I have seen this tactic work so it should be considered within your context.
Leave the Code in a Better State – The on-going development strategy should always be to leave the code in a better state. It’s a simple statement that is rarely adhered to. The rule is that every time you touch a piece of code, you leave it in a better state. It could be as simple as an additional unit test or some clearly articulated comments or as complex as writing a suite of unit tests and refactoring a component. If every single developer adopted this strategy in your organisation the state of the code base would improve significantly over time.
Ask to Leave Code in a Better State – This tactic is closely related to the one above, but is cognizant of the fact that some companies micro-manage their teams to the extent that any time not spent directly working on the addition of features can be rapidly identified and can cause friction. This tactic addresses the scenario where a developer identifies a piece of technical debt during a sprint that they think should be fixed. Rather than just sorting out the problem (which is the desirable outcome), the developer would escalate to the Scrum Master and Product Owner. If you have to resort to this tactic then there is a lot of work to do in educating the business and IT management about the problems of technical debt, as well as a recognition that you have not provided an environment that allows self-organising teams to flourish.
Create Story & Justify – Typically, ‘leave the code in a better state’ is not a fully adopted way of working across development teams, and the more typical way of fixing technical debt (other than complete re-platforming) is to create individual stories and treat them as ordinary backlog items. When you get this up and working fully, involving the Product Owner in the reasoning behind the work, both parties learn a great deal about each other’s perspectives and it can be a valuable lesson in understanding the business and technical domains.
Allocate Release Ratio – Another tactic I have introduced in order to achieve some progress in fixing technical debt is to ‘Allocate a Release Ratio’. When articulating the value of addressing technical debt and prioritising it within a Product Backlog, I have often seen that despite best intentions, the Technical Debt items sink toward the bottom of the backlog and do not get resolved.
There are a number of reasons why this might occur, so to mitigate it, I seek agreement from key stakeholders to allocate a certain amount of a release backlog to technical debt items. This approach has certainly been successful in ensuring TD items are addressed. Ultimately though, the best tactic remains to leave the code in a better state as it is more efficient.
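To make the release ratio concrete, here is a minimal sketch of how a release could be planned once a percentage has been agreed. The function name, the 20% figure, the story IDs and the point sizes are all hypothetical; the only idea taken from the tactic above is that a slice of the release capacity is reserved for technical debt items before features are added.

```python
def plan_release(capacity, debt_percent, debt_items, features):
    """Reserve a share of release capacity for technical debt, then fill with features.

    Items are (name, story_points) pairs, already in priority order.
    Returns (chosen_debt_names, chosen_feature_names).
    """

    def take(items, budget):
        # Take items in priority order, skipping any that no longer fit.
        chosen, used = [], 0
        for name, points in items:
            if used + points <= budget:
                chosen.append(name)
                used += points
        return chosen, used

    reserved = capacity * debt_percent // 100   # points set aside for debt
    debt, debt_used = take(debt_items, reserved)
    # Any reserved capacity the debt items did not use goes back to features.
    chosen_features, _ = take(features, capacity - debt_used)
    return debt, chosen_features


# Hypothetical backlog: a 40-point release with 20% agreed for technical debt.
debt, feats = plan_release(
    capacity=40,
    debt_percent=20,
    debt_items=[("TD-1", 5), ("TD-2", 3), ("TD-3", 5)],
    features=[("F-1", 13), ("F-2", 13), ("F-3", 8), ("F-4", 5)],
)
print(debt, feats)
```

In this made-up release, 8 points are reserved, so TD-1 and TD-2 are guaranteed a place regardless of how the feature stories are ranked.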
Ultimately this is about understanding the value of your application landscape, understanding where its weaknesses lie and finding a way within your organisation to ensure it is fit for purpose now and in the future. I hope that the next time you hear the term Technical Debt you’ll have gained a clear understanding of what it is from Martin Fowler’s blogs and you’ll also have some tips for how to address the problem from this blog.