Introduction
I entered IT because I enjoy solving problems.
Early in my career, the most rewarding moments often involved troubleshooting something complex: restoring a failed server, recovering corrupted data, or fixing a network outage.
Those moments feel heroic.
Systems come back online.
Users are grateful.
Problems are solved.
But over time something becomes clear.
The technicians who build the most reliable environments are rarely the ones performing the most dramatic recoveries.
They are the ones quietly designing systems where those failures happen less often in the first place.
This article is a reflection on lessons that took me years to understand. Many of them came from mistakes, shortcuts and assumptions that seemed reasonable at the time.
None of these ideas are revolutionary. Most experienced engineers already know them.
But they are the principles I wish someone had emphasised when I first started working in IT.
Documentation Matters More Than Hero Troubleshooting
Every IT environment eventually develops a hero technician.
When something breaks, everyone calls the same person. They know where everything lives. They remember which server runs which service and why a configuration was changed three years ago.
The problem is that this knowledge usually lives in one place: their head.
When environments rely on memory instead of documentation, they become fragile.
Documentation rarely feels exciting.
But over time it quietly becomes one of the most valuable assets in any IT environment.
Automation Beats Manual Skill
Early in my career I spent huge amounts of time doing things manually.
Building machines.
Creating accounts.
Deploying software.
Checking logs.
At the time that felt like productivity.
Eventually you realise manual processes do not scale.
As environments grow, repetitive tasks begin to consume the majority of your time. Automation changes that completely.
A few small scripts or management tools can quietly remove hours of routine work every week.
Examples appear everywhere in real environments.
- Automatically provisioning user accounts when new staff or students join
- Deploying applications across hundreds of machines without visiting each one
- Patching devices automatically overnight instead of manually checking updates
- Generating system health reports instead of manually reviewing logs
- Alerting technicians when disks or certificates are about to expire
- Automatically enrolling devices into management systems when they are first powered on
None of these tasks are technically difficult. But they become extremely time-consuming when repeated hundreds or thousands of times.
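As one illustration of the health-report item in the list above, here is a minimal Python sketch that counts log lines by severity instead of someone reading them by hand. The log format, the level names and the function names `summarise_levels` and `format_report` are assumptions made for this example, not a description of any particular tool.

```python
import re
from collections import Counter

# Assumed log format: "<date> <time> <LEVEL> <message>".
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def summarise_levels(lines):
    """Count how many log lines mention each severity level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_report(counts):
    """Render the counts as text, most severe level first."""
    order = ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"]
    return "\n".join(
        f"{level}: {counts[level]}" for level in order if counts[level]
    )


if __name__ == "__main__":
    # Hypothetical sample lines; a real script would read a log file.
    sample = [
        "2026-01-10 02:00:01 INFO Backup job started",
        "2026-01-10 02:14:09 WARNING Disk usage at 91%",
        "2026-01-10 02:15:30 ERROR Backup job failed",
        "2026-01-10 02:15:31 ERROR Retry failed",
    ]
    print(format_report(summarise_levels(sample)))
```

Run on a schedule and sent by email, even something this small replaces a daily manual log review.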
Over the lifetime of an environment, automation compounds.
One script that saves ten minutes a day, every day of the year, will save more than 60 hours of technician time annually.
That time can then be spent improving infrastructure instead of repeating routine tasks.
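The arithmetic behind that figure is simple enough to check. The sketch below works it out for a script that runs every calendar day and, as a more conservative assumption of mine, for roughly 250 working days a year.

```python
def hours_saved_per_year(minutes_per_day, days_per_year):
    """Convert a daily saving in minutes into hours saved per year."""
    return minutes_per_day * days_per_year / 60


# Ten minutes saved on every calendar day of the year.
every_day = hours_saved_per_year(10, 365)

# Ten minutes saved on working days only (250 is an assumed figure).
working_days = hours_saved_per_year(10, 250)

print(f"Every day:     {every_day:.1f} hours")     # 60.8 hours
print(f"Working days:  {working_days:.1f} hours")  # 41.7 hours
```

Either way, a single small script returns more than a full working week each year.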
Security Fundamentals Are Often Ignored
Most security incidents are not sophisticated.
They happen because basic practices are missing.
Across many environments the same problems appear repeatedly:
- shared administrator accounts
- systems that have not been patched for months
- flat internal networks with no segmentation
- services exposed unnecessarily to the internet
These problems rarely appear all at once. They accumulate slowly as environments evolve.
A common pattern in real environments
Systems are built correctly at the start.
Over time small changes are made.
Documentation falls behind.
Temporary fixes become permanent.
Eventually the environment no longer resembles the original design.
Security improves dramatically when teams consistently address the fundamentals.
The NCSC 10 Steps to Cyber Security framework exists largely because those fundamentals matter more than complex tools.
Backups Are Still Broken in 2026
One of the most uncomfortable truths in IT is how many backup systems do not actually work.
Backups fail silently. Storage fills up. Jobs stop running.
Everything appears fine until the moment data is actually needed.
A backup strategy is only complete when recovery has been verified.
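One way to verify recovery is to restore a backup to a scratch location and compare every file against the data it is supposed to match. The following is a minimal sketch of that idea in Python, assuming the reference copy has not changed since the backup was taken. The function names `sha256_of` and `verify_restore` are hypothetical, and a real environment would more likely compare against checksums recorded at backup time.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """List files that are missing or different in the restored copy.

    Returns (relative_path, problem) tuples; an empty list means every
    file in source_dir was found in restored_dir with matching content.
    """
    source_dir = Path(source_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append((relative.as_posix(), "checksum mismatch"))
    return problems
```

Calling `verify_restore` with the reference directory and the test-restore directory after each restore drill turns "the job reported success" into "the data actually came back".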
Monitoring Prevents Most Emergencies
Many outages follow a predictable pattern.
A disk fills up.
A certificate expires.
A service stops responding.
Users discover the problem first.
Monitoring reverses this.
How incidents unfold
Reactive environment
- Issue occurs
- Users report outage
- Technician investigates
- Service restored hours later
Monitored environment
- Monitoring detects anomaly
- Alert triggered automatically
- Technician investigates early
- Issue resolved before disruption
Monitoring rarely receives much attention, yet it is what lets technicians find a problem before users do.
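The monitored sequence above can start very small. Below is a minimal Python sketch of two of the checks mentioned in this section, a filling disk and an expiring certificate. The thresholds (90% disk usage, 30 days before expiry) and the function names are assumptions for illustration; the certificate expiry date is passed in rather than fetched, and how the alerts are delivered is left out.

```python
import shutil
from datetime import datetime, timezone

DISK_WARN_PERCENT = 90  # assumed threshold
CERT_WARN_DAYS = 30     # assumed threshold


def disk_percent_used(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def days_until(expiry, now=None):
    """Whole days between now and a timezone-aware expiry datetime."""
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days


def collect_alerts(disk_percent, cert_expiry, now=None):
    """Return alert messages for any check that crosses its threshold."""
    alerts = []
    if disk_percent >= DISK_WARN_PERCENT:
        alerts.append(f"Disk is {disk_percent:.0f}% full")
    remaining = days_until(cert_expiry, now)
    if remaining < 0:
        alerts.append("Certificate has expired")
    elif remaining <= CERT_WARN_DAYS:
        alerts.append(f"Certificate expires in {remaining} days")
    return alerts


if __name__ == "__main__":
    # Hypothetical expiry date; a real check would read it from the certificate.
    expiry = datetime(2026, 12, 31, tzinfo=timezone.utc)
    for message in collect_alerts(disk_percent_used("/"), expiry):
        print(message)
```

The point is less the code than the ordering: the check runs on a schedule, so the technician hears about the problem before the users do.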
The Most Important Skill Is Systems Thinking
Over time the biggest shift in perspective is understanding that IT is not about individual machines.
It is about systems.
Servers, networks, identity platforms, security controls and applications all interact with each other.
Small design decisions can have large downstream effects.
Technicians who understand these relationships build more stable environments.
Final Thoughts
Early in your career it is easy to believe that technical excellence comes from solving difficult problems quickly.
The best technicians build environments where those problems appear less frequently in the first place.
That shift from reactive troubleshooting to proactive engineering is where the real growth in IT happens.