Introduction

I entered IT because I enjoy solving problems.

Early in my career the most rewarding moments often involved troubleshooting something complex: restoring a failed server, recovering corrupted data, or fixing a network outage.

Those moments feel heroic.

Systems come back online.
Users are grateful.
Problems are solved.

But over time something becomes clear.

The technicians who build the most reliable environments are rarely the ones performing the most dramatic recoveries.

They are the ones quietly designing systems where those failures happen less often in the first place.

This article is a reflection on lessons that took me years to understand. Many of them came from mistakes, shortcuts and assumptions that seemed reasonable at the time.

None of these ideas are revolutionary. Most experienced engineers already know them.

But they are the principles I wish someone had emphasised when I first started working in IT.


Documentation Matters More Than Hero Troubleshooting

Every IT environment eventually develops a hero technician.

When something breaks, everyone calls the same person. They know where everything lives. They remember which server runs which service and why a configuration was changed three years ago.

The problem is that this knowledge usually lives in one place: their head.

When environments rely on memory instead of documentation, they become fragile.

Two very different IT environments

Memory-driven infrastructure

  • One technician knows everything
  • Troubleshooting takes hours
  • Changes feel risky
  • New staff struggle to learn the system

Documented infrastructure

  • Multiple people understand the system
  • Problems are diagnosed quickly
  • Infrastructure evolves safely
  • Knowledge persists when people leave

Documentation rarely feels exciting.

But over time it quietly becomes one of the most valuable assets in any IT environment.


Automation Beats Manual Skill

Early in my career I spent huge amounts of time doing things manually.

Building machines.
Creating accounts.
Deploying software.
Checking logs.

At the time that felt like productivity.

Eventually you realise manual processes do not scale.

As environments grow, repetitive tasks begin to consume the majority of your time. Automation changes that completely.

A few small scripts or management tools can quietly remove hours of routine work every week.

Examples appear everywhere in real environments.

  • Automatically provisioning user accounts when new staff or students join
  • Deploying applications across hundreds of machines without visiting each one
  • Patching devices automatically overnight instead of manually checking updates
  • Generating system health reports instead of manually reviewing logs
  • Alerting technicians when disks or certificates are about to expire
  • Automatically enrolling devices into management systems when they are first powered on
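
As an illustration of the first item in that list, here is a minimal sketch of generating account names from a roster of new starters. The naming rule (first initial plus surname, with a number appended on collision) and the function names are assumptions made for the example, not a standard. A real deployment would pass the result to whatever directory or identity platform the environment uses.

```python
import re


def make_username(first: str, last: str, taken: set) -> str:
    """Build a unique username such as 'jsmith' or 'jsmith2' (hypothetical naming rule)."""
    # First initial + surname, lower-cased, with anything that is not a-z removed.
    base = re.sub(r"[^a-z]", "", (first[:1] + last).lower()) or "user"
    candidate = base
    suffix = 2
    # Append a number until the name no longer collides with an existing account.
    while candidate in taken:
        candidate = f"{base}{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate


def provision(roster: list, existing: set) -> list:
    """Return (full name, username) pairs for every person in the roster."""
    taken = set(existing)  # copy so the caller's set is not modified
    return [(f"{first} {last}", make_username(first, last, taken)) for first, last in roster]


if __name__ == "__main__":
    new_starters = [("Jane", "Smith"), ("John", "Smith"), ("Aoife", "O'Brien")]
    for full_name, username in provision(new_starters, existing={"jsmith"}):
        print(full_name, "->", username)
```

With "jsmith" already taken, the two Smiths become jsmith2 and jsmith3, and the apostrophe in O'Brien is dropped. The value is less in the code than in the consistency: the same rule is applied every time, regardless of who runs it.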

None of these tasks are technically difficult. But they become extremely time-consuming when repeated hundreds or thousands of times.

Over the lifetime of an environment, automation compounds.

One script that saves ten minutes a day will save more than 60 hours of technician time each year if the task runs every day, or around 40 hours across a typical working year.

That time can then be spent improving infrastructure instead of repeating routine tasks.
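
The yearly saving is easy to estimate for any task. A minimal sketch, using the ten-minute figure above and two illustrative day counts (365 calendar days, and an assumed 250-day working year):

```python
def hours_saved_per_year(minutes_per_day: float, days_per_year: int) -> float:
    """Convert a daily time saving into hours per year."""
    return minutes_per_day * days_per_year / 60


if __name__ == "__main__":
    # 365 assumes the task runs every calendar day; 250 is an assumed working year.
    for days in (365, 250):
        print(days, "days:", round(hours_saved_per_year(10, days), 1), "hours")
```

Under those assumptions the result is 60.8 hours for a daily task and 41.7 hours for a working year.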


Security Fundamentals Are Often Ignored

Most security incidents are not sophisticated.

They happen because basic practices are missing.

Across many environments the same problems appear repeatedly:

  • shared administrator accounts
  • systems that have not been patched for months
  • flat internal networks with no segmentation
  • services exposed unnecessarily to the internet

These problems rarely appear all at once. They accumulate slowly as environments evolve.
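
Slow drift like this is easier to catch when it is checked routinely. As one example, unpatched systems can be flagged from an inventory. This is a sketch only: the host names, dates and 30-day threshold are invented for illustration, and in practice the dates would come from a patch management or inventory tool.

```python
from datetime import date, timedelta


def stale_hosts(last_patched: dict, today: date, max_age_days: int = 30) -> list:
    """Return hosts whose last patch date is older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = [host for host, patched in last_patched.items() if patched < cutoff]
    return sorted(overdue, key=lambda host: last_patched[host])


if __name__ == "__main__":
    inventory = {  # made-up hosts and dates
        "files01": date(2026, 1, 5),
        "web01": date(2025, 9, 14),
        "print01": date(2025, 11, 30),
    }
    print(stale_hosts(inventory, today=date(2026, 1, 20)))
```

Here web01 and print01 are reported, in that order, because both were last patched before the 30-day cutoff.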

A common pattern in real environments

Systems are built correctly at the start.
Over time small changes are made.
Documentation falls behind.
Temporary fixes become permanent.

Eventually the environment no longer resembles the original design.

Security improves dramatically when teams consistently address the fundamentals.

The NCSC 10 Steps to Cyber Security framework exists largely because those fundamentals matter more than complex tools.


Backups Are Still Broken in 2026

One of the most uncomfortable truths in IT is how many backup systems do not actually work.

Backups fail silently. Storage fills up. Jobs stop running.

Everything appears fine until the moment data is actually needed.

Backup maturity

Level 1 - False confidence
Backups exist but restoration has never been tested.

Level 2 - Monitored backups
Backup jobs are checked and failures investigated.

Level 3 - Recovery ready
Restoration procedures are tested regularly and documented.

A backup strategy is only complete when recovery has been verified.
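
One simple way to verify a recovery is to restore to a scratch location and compare checksums against the source. The sketch below assumes both copies are reachable as ordinary directories, and the function names are made up for the example. It only checks files that exist in the source, and it is not a substitute for testing a full restore procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return a list of files that are missing or different in the restored copy."""
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source, restored = Path(tmp, "source"), Path(tmp, "restored")
        source.mkdir()
        restored.mkdir()
        (source / "a.txt").write_text("payroll")
        (source / "b.txt").write_text("timetable")
        (restored / "a.txt").write_text("payroll")  # b.txt was never restored
        print(verify_restore(source, restored))
```

An empty list means every source file came back intact. Anything else is a finding to investigate before the backup is needed for real.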


Monitoring Prevents Most Emergencies

Many outages follow a predictable pattern.

A disk fills up.
A certificate expires.
A service stops responding.

Users discover the problem first.

Monitoring reverses this.

How incidents unfold

Reactive environment

  1. Issue occurs
  2. Users report outage
  3. Technician investigates
  4. Service restored hours later

Monitored environment

  1. Monitoring detects anomaly
  2. Alert triggered automatically
  3. Technician investigates early
  4. Issue resolved before disruption

Monitoring rarely receives much attention, but it is what allows technicians to find problems before users do.
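
Two of the failures listed earlier, a full disk and an expired certificate, can be caught with very small checks. The sketch below is illustrative: the 85% and 14-day thresholds are assumptions, the expiry date in the demo is invented, and a real setup would feed the results into whatever alerting tool is already in use.

```python
import shutil
from datetime import datetime, timezone
from typing import Optional


def disk_alert(used_bytes: int, total_bytes: int, warn_at: float = 0.85) -> Optional[str]:
    """Return a warning once usage reaches warn_at (an assumed threshold), else None."""
    used_fraction = used_bytes / total_bytes
    if used_fraction >= warn_at:
        return f"disk {used_fraction:.0%} full"
    return None


def certificate_alert(expires: datetime, now: datetime, warn_days: int = 14) -> Optional[str]:
    """Return a warning when a certificate has expired or expires within warn_days."""
    days_left = (expires - now).days
    if days_left < 0:
        return "certificate has expired"
    if days_left <= warn_days:
        return f"certificate expires in {days_left} days"
    return None


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # total, used and free bytes for the root volume
    print(disk_alert(usage.used, usage.total))

    # Invented expiry date; a real check would read it from the certificate itself.
    expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
    print(certificate_alert(expiry, datetime.now(timezone.utc)))
```

Both functions return None when nothing needs attention, so a scheduler can run them regularly and only raise an alert when a message comes back.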


The Most Important Skill Is Systems Thinking

Over time the biggest shift in perspective is understanding that IT is not about individual machines.

It is about systems.

Servers, networks, identity platforms, security controls and applications all interact with each other.

Small design decisions can have large downstream effects.

Technicians who understand these relationships build more stable environments.
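
One way to make those relationships concrete is to write the dependencies down and ask what a single failure would take with it. The sketch below uses an invented dependency map, with hypothetical component names, and walks it breadth-first to list everything downstream of a failed component.

```python
from collections import deque


def affected_by(failed: str, depends_on: dict) -> list:
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map: for each component, record which components rely on it.
    dependents = {}
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(component)

    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen - {failed})


if __name__ == "__main__":
    depends_on = {  # made-up example environment
        "dns": [],
        "identity": ["dns"],
        "wifi": ["identity"],
        "file_shares": ["identity", "dns"],
        "printing": ["file_shares"],
    }
    print(affected_by("dns", depends_on))
```

In this example a DNS failure reaches identity, Wi-Fi, file shares and printing, even though only two of them depend on DNS directly.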

Compound Skills in IT

Foundations

  • Documentation
  • Monitoring
  • Backups

Operational Skills

  • Automation
  • Security
  • Device Management

System Design

  • Identity & Access
  • Networking
  • Infrastructure Architecture

Each skill reinforces the others. Over time they compound into environments that are easier to maintain, easier to secure, and far more resilient.


Final Thoughts

Early in your career it is easy to believe that technical excellence comes from solving difficult problems quickly.

The best technicians build environments where those problems appear less frequently in the first place.

That shift from reactive troubleshooting to proactive engineering is where the real growth in IT happens.