How much should I refactor when working on a feature for a legacy codebase?

As a developer, there’s a familiar moment many of us have experienced...

You open a file to add a small feature and suddenly you’re staring at code that feels dated, awkward, or harder to work with than you’d like.

Naturally, the instinct kicks in:

“While I’m here, I’ll just tidy this up.”

Optimise that method.
Rename a few things.
Reorganise the files.
Introduce a more modern pattern.
Upgrade the framework.

After all, that’s what good engineers do… right?

Well. Sometimes. And sometimes that’s how a one-line feature turns into a multi-sprint adventure, a nervous QA team, and a pull request that needs its own table of contents.

The blast radius problem

Refactoring isn’t inherently bad. In fact, leaving code worse than you found it is almost always the wrong call. But unplanned, opportunistic refactoring dramatically increases the blast radius of a change.

What started as “add a new button to export the data” can quickly become:

  • Increased regression risk

  • Missed sprint commitments

  • Unplanned QA workload

  • Hard-to-review pull requests

  • Delivery predictability going out the window

All of this for a feature that, from a user’s perspective, barely moved the needle.

The uncomfortable truth is that engineering craftsmanship and delivery risk are often in direct competition.

Best practice moves faster than codebases

One of the hardest lessons to internalise is that best practice evolves far faster than most production systems. A codebase that’s been alive for ten years has survived multiple frameworks, architectural fashions, and “this will never change” moments.

Judging it entirely by today’s standards isn’t always fair - or useful.

Trying to drag everything to “modern best practice” every time you touch it is a bit like insisting on rewiring an entire house because you wanted to add a plug socket. Technically admirable. Practically… questionable.

The engineer guitar solo

We’ve all seen it. Sometimes we’ve been it (I know I have).

A heroic refactor appears mid-feature. The diff explodes. The original requirement becomes almost incidental. The engineer knows it’s better - but now:

  • The PR is exhausting to review

  • QA has no idea what’s actually changed

  • The sprint plan quietly dies in the corner

It’s rarely malicious. It’s usually enthusiasm, curiosity, and a genuine desire to improve things. Especially with new starters, who are keen to make an impact and leave the codebase better than they found it.

That energy is valuable - but unmanaged, it introduces risk and uncertainty at exactly the wrong time.

IDE and AI productivity tooling

Tools like GitHub Copilot, ReSharper, VS Code extensions, and powerful modern IDEs such as Rider are all incredible productivity boosters. However, they are also very good at pulling your attention towards local improvements - often suggesting refactors that may improve the code in isolation, with no awareness of the task you are working on or the need to stay focused on the job in hand.

Before applying any suggested change, it's worth considering the impact. That means not just the readability of the refactor itself (for example, collapsing an entire method into a dense one-liner that's harder for the next reader to reason about), but also the wider effect on your task and the people downstream: the tech lead reviewing the pull request, QA trying to understand what has changed, and the additional time pressure placed on the sprint as a whole.
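To make the one-liner point concrete, here's a hypothetical example (the function and data shape are invented for illustration). Both versions compute the same result; the dense form is the kind of "improvement" a tool might suggest, and it packs noticeably more into each line for the next reader to unpack.

```python
def total_by_category(orders):
    """Readable version: easy to step through in a code review."""
    totals = {}
    for order in orders:
        if order["status"] == "paid":
            category = order["category"]
            totals[category] = totals.get(category, 0) + order["amount"]
    return totals


def total_by_category_dense(orders):
    """Dense one-liner: same behaviour, far more to hold in your head at once."""
    return {c: sum(o["amount"] for o in orders if o["status"] == "paid" and o["category"] == c)
            for c in {o["category"] for o in orders if o["status"] == "paid"}}
```

Neither is wrong, but in a feature PR the second version adds review friction without changing behaviour - exactly the kind of diff noise worth pausing over before accepting.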

I’ve certainly accepted suggestions like this myself without fully considering the knock-on effects.

It’s sometimes worth dialling these tools back, or being more selective about when you accept suggestions, so they support the task rather than subtly steer it.

Tests change the equation

Refactoring without safety nets increases uncertainty significantly.

Before any non-trivial refactor, it’s worth asking:

  • Do we have meaningful unit test coverage here?

  • If not, can some be back-filled first?

  • Are there integration tests covering this path?

  • Is there UI automation that would catch regressions?

The absence of tests doesn’t mean “never refactor” - but it does mean the cost, risk, and time all increase. Often significantly.

And yes, time spent increases too. That needs to be acknowledged explicitly, not quietly absorbed and then explained away later.
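One practical way to back-fill a safety net is a characterisation test: pin down what the code does today, quirks included, before changing its shape. A minimal sketch, using a hypothetical legacy function `format_export_row` (the name and behaviour are invented for illustration):

```python
def format_export_row(name, amount):
    """Hypothetical legacy code. Note the quirk: int() truncates
    rather than rounds, so 10.9 becomes 10."""
    return f"{name.upper()},{int(amount)}"


def test_format_export_row_pins_current_behaviour():
    # These assertions describe what the code does *today*, not what
    # we might wish it did - that's the safety net for the refactor.
    assert format_export_row("widgets", 10.9) == "WIDGETS,10"
    assert format_export_row("gadgets", 3) == "GADGETS,3"
```

With that in place, a refactor that accidentally changes the truncation behaviour fails a test instead of failing in production - and the quirk becomes a documented, deliberate decision rather than a silent one.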

Plan to fix the mess early

Many refactoring debates arrive too late - mid-implementation, when momentum is already high.

A better place for these conversations is refinement or technical planning:

  • Identify the ugly parts before starting

  • Agree how far you’ll go

  • Decide what’s in scope and what isn’t

  • Align with the tech lead on approach and risk - even if this is mid-sprint

This turns refactoring from a surprise into a decision.

Pull request overhead

Large, mixed-purpose PRs are hard to reason about. They pile behaviour change, structural change, and style preferences into a single, heavy cognitive load.

That’s not just uncomfortable - it’s dangerous.

Smaller, focused changes:

  • Are easier to review

  • Are easier to test

  • Fail more predictably

  • Are easier to roll back

If a refactor can’t be explained succinctly, it probably doesn’t belong inside a feature PR.

“We’ll fix it later”

Tech debt tickets have a bad reputation - often deserved. “Later” can feel like “never”.

But when there is real value in a cleanup, it should be articulated clearly:

  • Why does it matter?

  • What risk does it reduce?

  • What velocity does it unlock?

If it’s valuable, it belongs in a conversation with the product owner or manager - ideally within whatever allocation exists for technical debt (tech debt team / tech debt sprints / tech debt percentage of time in sprint). Cynicism aside, this is still healthier than folding it into unrelated work.

Fix broken windows, not full renovations

A useful middle ground is a “broken windows” mindset:

  • If you find a bug, and it’s genuinely low-risk to fix, fix it

  • If you touch code, don’t make it worse

  • If something is actively harmful, address it

  • But don’t deviate far from the path

This keeps quality trending upwards without turning every feature into a renovation project.

Legacy doesn’t mean futureless

One final question that often gets overlooked: what’s the future of this system?

If there’s already a plan to replace or significantly rework it, large refactors may deliver very little return. In those cases, creating new features in a clean, modern pattern - with a thin, well-defined integration point into the legacy code - can be a smarter investment.
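One shape this "thin integration point" can take is a small adapter: new code depends on a clean model and a narrow interface, and only the adapter knows about the legacy module's internals. A sketch, assuming hypothetical names throughout (`Invoice`, `LegacyBillingAdapter`, and the legacy record shape are all illustrative):

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """The clean model that new feature code works with."""
    customer_id: str
    total: float


class LegacyBillingAdapter:
    """The only place that knows the legacy module's shape.
    When the legacy system is replaced, this adapter is the
    single seam that needs rewriting."""

    def __init__(self, legacy_fetch):
        # legacy_fetch stands in for the real legacy call,
        # e.g. something like legacy_billing.get_invoice_record(id).
        self._legacy_fetch = legacy_fetch

    def invoice_for(self, customer_id: str) -> Invoice:
        record = self._legacy_fetch(customer_id)
        # Translate the legacy dict into the clean model.
        return Invoice(customer_id=record["CUST_ID"], total=record["AMT_TOTAL"])
```

New features never see `CUST_ID` or `AMT_TOTAL`; they see `Invoice`. The legacy code stays untouched, and the future rewrite has one well-defined seam to cut along.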

It’s a subtle shift from “perfecting the past” to “protecting the future”.

No right answer - just better questions

There’s no universal rule for how much to refactor. But there are better questions to ask:

  • What’s the risk?

  • What’s the value?

  • What’s the plan?

  • Who needs to agree?

  • What does this do to delivery predictability?

Answer those honestly, and the right level of refactoring usually becomes clearer.

I’m curious how others navigate this trade-off - especially across different team cultures and delivery pressures. Thoughts welcome.


Disclaimer: These are my personal views and do not necessarily reflect the views of my employer.
