Cloud Exit Is Not Lift and Shift

Ricky

Mention leaving AWS and the response is predictable: “You can’t just lift and shift everything to bare metal.”

Correct. You can’t. But nobody serious about cloud repatriation is suggesting you forklift your entire infrastructure onto Hetzner over a weekend. Framing it that way kills conversations that companies genuinely need to have.

The all-or-nothing trap

The cloud industry frames the exit question as binary: you’re either “all in” or you’re making a terrible mistake. This framing serves cloud providers and the ecosystem of consultancies, tools, and certifications built around them. It doesn’t serve you.

Most companies should be running a hybrid infrastructure — some workloads on public cloud, some on dedicated servers, some on managed platforms that aren’t hyperscalers. The right question is “which workloads belong where?” and it deserves a rigorous answer, not a religious one.

What it actually looks like

Cloud repatriation done properly is phased and selective. Nobody moves everything. The process:

Phase 1: Workload classification (2–4 weeks)

You classify every workload along two axes:

Cloud-native dependency. How tightly coupled is this to cloud-specific services? A Lambda function triggered by SQS, writing to DynamoDB, with CloudWatch alarms — deeply cloud-native. A Docker container running a Python API with PostgreSQL — portable.

Traffic pattern. Bursty or steady-state? Bursty workloads (seasonal spikes, event-driven processing) benefit from cloud elasticity. Steady-state workloads (your main app server running 24/7 at 40–60% utilisation) are paying a premium for elasticity they never use.

Classification typically reveals that 30–50% of workloads are strong candidates for repatriation, 20–30% are borderline, and 20–40% genuinely belong on cloud.
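As a rough illustration, the two axes can be turned into a first-pass triage function. This is a sketch, not a prescribed method: the `Workload` fields, both thresholds, and the use of peak-to-mean load as a proxy for burstiness are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cloud_services: int   # number of cloud-specific services it depends on
    peak_to_mean: float   # peak load divided by average load


# Hypothetical thresholds; tune them against your own estate.
MAX_PORTABLE_SERVICES = 2
MAX_STEADY_PEAK_TO_MEAN = 2.0


def classify(w: Workload) -> str:
    """Place a workload in one of three buckets using both axes."""
    portable = w.cloud_services <= MAX_PORTABLE_SERVICES
    steady = w.peak_to_mean <= MAX_STEADY_PEAK_TO_MEAN
    if portable and steady:
        return "repatriate"   # portable and steady-state
    if not portable and not steady:
        return "stay"         # cloud-native and bursty
    return "borderline"       # mixed signals, needs a closer look
```

A containerised Python API with one cloud dependency and flat traffic lands in "repatriate"; a Lambda, SQS and DynamoDB pipeline with spiky load lands in "stay".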

Phase 2: Target architecture (2–4 weeks)

For each workload you’re moving, you design the target state. This is where “not lift and shift” actually matters.

A workload running on ECS with an RDS database doesn’t just get copied to a bare-metal server. You might:

  • Replace ECS with Docker Compose or Kubernetes
  • Replace RDS with self-managed PostgreSQL using automated backups
  • Replace ALB with Caddy or Nginx
  • Replace CloudWatch with Prometheus and Grafana
  • Replace S3 with MinIO or a cheaper object storage provider

Each replacement is a deliberate architectural choice. Some managed services get replaced with self-hosted equivalents. Others get replaced with simpler alternatives that didn’t make sense on cloud but work well on dedicated hardware.
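One way to keep these choices explicit is to record them as data and flag anything that has no decided replacement yet. The sketch below encodes only the options listed above; the function name `plan_target` and the idea of an "undecided" list are assumptions for illustration.

```python
# Replacement options taken from the list above; nothing here is exhaustive.
REPLACEMENTS = {
    "ECS": ["Docker Compose", "Kubernetes"],
    "RDS": ["self-managed PostgreSQL with automated backups"],
    "ALB": ["Caddy", "Nginx"],
    "CloudWatch": ["Prometheus and Grafana"],
    "S3": ["MinIO", "cheaper object storage provider"],
}


def plan_target(services):
    """Split a workload's services into planned and still-undecided ones."""
    plan, undecided = {}, []
    for service in services:
        if service in REPLACEMENTS:
            plan[service] = REPLACEMENTS[service]
        else:
            undecided.append(service)  # needs a deliberate design decision
    return plan, undecided
```

A long undecided list for a workload is a hint that it sits towards the cloud-native end of the classification.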

Phase 3: Parallel run (4–8 weeks per workload)

You don’t cut over. You run in parallel. The workload runs simultaneously on cloud and target infrastructure. Traffic shifts gradually — 10%, 25%, 50%, 100% — with monitoring at every stage. If anything breaks, traffic shifts back instantly.
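The staged shift can be sketched as a small control loop. How traffic is actually weighted (DNS, load balancer, proxy) and how health is judged are left to two hypothetical callbacks, `set_target_weight` and `is_healthy`; neither corresponds to a real library API.

```python
STAGES = [10, 25, 50, 100]  # percent of traffic on the target, as in the text


def run_cutover(is_healthy, set_target_weight):
    """Walk through the stages, rolling back to 0% on the first failure.

    is_healthy(pct) and set_target_weight(pct) are caller-supplied
    callbacks (assumptions for this sketch).
    Returns the final percentage served by the target infrastructure.
    """
    for pct in STAGES:
        set_target_weight(pct)
        if not is_healthy(pct):
            set_target_weight(0)  # send all traffic back to cloud
            return 0
    return 100
```

In practice each stage would hold for days rather than a single check, so `is_healthy` would wrap whatever monitoring window you have agreed for that stage.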

This is exactly how responsible infrastructure migrations work. It’s how cloud migrations work too, ironically. Same process, different direction.

Phase 4: Decommission (1–2 weeks per workload)

Once a workload has run on the target infrastructure for a full billing cycle with no issues, you decommission the cloud resources. Not before.

Full process for a single workload, once the one-off classification phase is done: 8–14 weeks. For a company moving five workloads, you can parallelise, but expect the overall programme to take 4–6 months.

What stays on cloud

Not everything should leave:

Genuinely bursty workloads. Batch processing that needs 100 cores for 2 hours a day and zero for 22 hours. Cloud is perfect for this — you pay for exactly what you use.

Global edge. Content delivery or compute in 40+ regions. CloudFront and Lambda@Edge are hard to beat here. Building your own CDN isn’t sensible.

Managed AI/ML services. GPU availability and managed tooling on cloud are genuinely valuable for model training and inference.

Regulatory sandboxes. Some compliance frameworks have pre-approved cloud configurations. Replicating that on dedicated infrastructure may not be worth the audit cost.

Experimentation. Spin up, test, tear down. Still what cloud does best.
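The batch example at the top of this list can be checked with a simple break-even calculation. It assumes that cloud capacity is billed only while busy, that dedicated capacity is paid for around the clock, and that the dedicated discount (the 70–80% figure used elsewhere in this article) applies per provisioned core-hour. All three are simplifications.

```python
def break_even_utilisation(dedicated_discount: float) -> float:
    """Utilisation above which always-on dedicated beats pay-per-use cloud.

    dedicated_discount is the fraction by which a dedicated core-hour is
    cheaper than an on-demand cloud core-hour (0.75 means 75% cheaper).
    """
    return 1.0 - dedicated_discount


def cheaper_on_cloud(busy_hours_per_day: float, dedicated_discount: float) -> bool:
    """True if paying only for busy hours on cloud costs less than dedicated."""
    utilisation = busy_hours_per_day / 24
    return utilisation < break_even_utilisation(dedicated_discount)
```

With a 75% discount the break-even sits at 25% utilisation. A job that is busy 2 hours a day runs at about 8%, so it stays on cloud, while anything running 24/7 does not.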

What should almost always leave

Steady-state web applications. Your main product, running 24/7, predictable traffic. Dedicated hardware costs 70–80% less than on-demand cloud for the same compute.

Databases with predictable storage growth. If you know your DB grows 50GB per month and you need 1TB of storage, buying that directly is dramatically cheaper than EBS or RDS pricing.

CI/CD pipelines. Build servers run at near-100% utilisation during business hours. Self-hosted runners cost a fraction of cloud-hosted build minutes.

Internal tools. Low-traffic, low-risk workloads. A single Hetzner box running your admin panel, monitoring stack, and internal tools costs £40/month instead of £400.
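Predictable growth is what makes buying storage up front safe. A minimal capacity check for the database example above, assuming linear growth and treating 1TB as 1,000GB, might look like this (the function name is made up for the example):

```python
def months_until_full(capacity_gb: float, used_gb: float,
                      growth_gb_per_month: float) -> float:
    """Months of headroom left on a fixed-size volume under linear growth."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth must be positive")
    return (capacity_gb - used_gb) / growth_gb_per_month
```

At 50GB per month, an empty 1TB volume lasts 20 months, and one that already holds 400GB lasts 12. That is plenty of notice to order more disk.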

The hybrid end state

Most companies end up here:

  • Core application servers on dedicated hardware (60–70% cost reduction)
  • Databases on dedicated hardware with automated failover (50–70% cost reduction)
  • CDN and edge on Cloudflare or similar
  • Burst capacity on cloud for spikes and batch processing
  • Dev and staging on dedicated hardware (massive savings, low risk)

Cost efficiency of dedicated for predictable workloads. Elasticity of cloud for unpredictable ones. Flexibility to shift as needs change.

Why “just optimise on cloud” isn’t enough

The standard response: “Don’t leave. Just optimise. Reserved Instances. Right-size. Spot.”

All valid tactics. Do them regardless. But they have a ceiling:

  • Right-sizing saves 20–40%
  • Reserved Instances / Savings Plans save another 20–30%
  • Spot saves 60–90% but only for fault-tolerant workloads
  • Combined: maybe 40–50% from your unoptimised baseline

Meaningful. But dedicated hardware for steady-state workloads is 70–80% cheaper than on-demand cloud. Even after optimising, dedicated is still significantly cheaper for the right workloads.
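The ceiling follows from how the discounts stack. Assuming each tactic applies to what is left of the bill after the previous one (an assumption, since real bills mix eligible and ineligible spend), the savings multiply rather than add:

```python
def combined_saving(*savings: float) -> float:
    """Total fractional saving when each discount applies to the remainder."""
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return 1.0 - remaining


# Right-sizing plus Reserved Instances, low and high ends of the ranges above.
low = round(combined_saving(0.20, 0.20), 2)   # 0.36
high = round(combined_saving(0.40, 0.30), 2)  # 0.58
```

That gives 36–58%, in line with the 40–50% figure above. An optimised bill is therefore still around 50–60% of the baseline, against the 20–30% implied by the dedicated figures.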

Optimisation-first makes sense as a starting point — lower risk, faster to implement. But for companies spending £100k+/month, optimisation alone usually isn’t enough to make the economics comfortable.

Where to start

Four questions:

  1. What percentage of your cloud spend is steady-state compute? Above 60%, you likely have strong candidates for repatriation.
  2. How cloud-native are your workloads? Count the AWS-specific services each depends on. The fewer the dependencies, the easier the move.
  3. Do you have the operational capacity? You need at least one engineer comfortable with Linux sysadmin. If not, hiring or contracting for it is part of the cost model.
  4. What’s your timeline? This is a 4–6 month programme, not a weekend project.
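Three of the four questions have simple yes-or-no answers and can be captured in a quick self-check. The sketch below is only a conversation starter. The function name and the flag wording are invented for the example, the 4-month minimum is simply the lower bound of the programme length above, and question 2 is left out because it is answered per workload.

```python
def readiness_flags(steady_state_share: float, has_linux_engineer: bool,
                    months_available: float) -> list:
    """Return the concerns to resolve before starting; empty means none."""
    flags = []
    if steady_state_share <= 0.60:
        flags.append("steady-state compute is 60% of spend or less")
    if not has_linux_engineer:
        flags.append("no Linux sysadmin capacity; add hiring or contracting to the cost model")
    if months_available < 4:
        flags.append("timeline is shorter than a 4-6 month programme")
    return flags
```

An empty list does not mean "go"; it means the obvious objections are out of the way and workload classification is worth starting.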

A Platform Fit Verdict starts with workload classification — which parts of your infrastructure are repatriation candidates, which should stay, and what the numbers look like for each.
