Engineering · 2006 · 15 November 2006

Building Resilient Software with C: Enterprise Patterns for Disaster Recovery

Working on Microsoft's disaster recovery platform in C taught me lessons about reliability engineering that I still apply at every level of the stack — from kernel modules to cloud-native applications.

Tags: C, disaster-recovery, enterprise, Microsoft

Why C for Critical Infrastructure?

When I joined Microsoft's disaster recovery team in 2006, the choice of C for core platform components was deliberate. For software that had to run reliably at the kernel level, with predictable memory behaviour and minimal runtime dependencies, C was the right choice.

This post is about what I learned building in that environment, and why those lessons translate to any language or platform.

The Memory Discipline

C forces you to think about memory explicitly. Every allocation needs a corresponding free. Every pointer is an address you're responsible for. This discipline, which I'd now call ownership thinking, is something I carry into every technology I work with.

In Rust, ownership is enforced by the compiler. In modern C++, smart pointers handle most of it. In Go, the garbage collector reclaims memory for you. But understanding why ownership matters (preventing use-after-free, double-free, and memory leaks in long-running processes) is foundational.

For a disaster recovery platform that might run for months without a restart, even small memory leaks are fatal: a few bytes lost per operation add up until the process runs out of memory. We profiled everything with Valgrind. We treated any memory warning as a P0 bug.
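
As a minimal sketch of what that ownership discipline can look like in C: one function hands out a resource, and exactly one matching function takes it back. The `block_buffer` type and both function names are hypothetical, made up for this illustration, and are not code from the platform.

```c
#include <stdlib.h>

/* Hypothetical buffer type for illustration only. */
typedef struct {
    unsigned char *data;  /* owned by this struct */
    size_t         size;
} block_buffer;

/* Allocates a zeroed buffer. The caller owns the result and must pass it
 * to block_buffer_destroy() when done. Returns NULL on failure. */
block_buffer *block_buffer_create(size_t size)
{
    block_buffer *buf;

    if (size == 0) {
        return NULL;  /* avoid implementation-defined zero-size allocation */
    }
    buf = malloc(sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    buf->data = calloc(size, 1);
    if (buf->data == NULL) {
        free(buf);  /* don't leak the struct if the second allocation fails */
        return NULL;
    }
    buf->size = size;
    return buf;
}

/* Releases the buffer and clears the caller's pointer, so a repeated
 * call is a harmless no-op rather than a double free. */
void block_buffer_destroy(block_buffer **buf)
{
    if (buf == NULL || *buf == NULL) {
        return;
    }
    free((*buf)->data);
    free(*buf);
    *buf = NULL;
}
```

Taking a pointer-to-pointer in the destroy function is one design choice among several. It costs a slightly clumsier call (`block_buffer_destroy(&buf)`), but it leaves the caller's variable set to NULL, which makes an accidental second destroy through that same variable harmless.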

Error Handling Without Exceptions

C has no exceptions. Every function that can fail must communicate that failure through its return value or an output parameter. This leads to verbose but honest code:

```c
int result = replicate_block(src, dst, block_size);
if (result != DR_SUCCESS) {
    log_error("Block replication failed: %d", result);
    return result;
}
```

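Verbose, yes. But also explicit. The failure path sits right at the call site, where the caller can see that the operation can fail. C itself won't stop anyone from ignoring a return value, so this depends on convention, code review, and compiler warnings. Compare this to languages where an uncaught exception can propagate up the call stack and crash the program far from where the failure actually happened.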

This explicitness is something I advocate for in all languages: make your error paths obvious, handle them close to the source, and never let failures silently disappear.
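
Here is a minimal sketch of the other half of that convention, where the return value carries only the status and the actual result travels through an output parameter. Apart from `DR_SUCCESS`, every name here (`dr_status`, the other codes, `checksum_block`, `dr_status_str`) is an assumption for illustration, not the platform's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; only DR_SUCCESS appears in the snippet above. */
typedef enum {
    DR_SUCCESS = 0,
    DR_ERR_INVALID_ARG,
    DR_ERR_EMPTY_BLOCK
} dr_status;

/* Computes a simple additive checksum of a block.
 * The return value reports success or failure; the checksum is written
 * to *out_sum only on success. */
dr_status checksum_block(const unsigned char *src, size_t size, uint32_t *out_sum)
{
    uint32_t sum = 0;
    size_t i;

    if (src == NULL || out_sum == NULL) {
        return DR_ERR_INVALID_ARG;
    }
    if (size == 0) {
        return DR_ERR_EMPTY_BLOCK;
    }
    for (i = 0; i < size; i++) {
        sum += src[i];  /* unsigned arithmetic wraps, acceptable for this toy checksum */
    }
    *out_sum = sum;
    return DR_SUCCESS;
}

/* Maps a status code to a message so log lines stay readable. */
const char *dr_status_str(dr_status status)
{
    switch (status) {
    case DR_SUCCESS:         return "success";
    case DR_ERR_INVALID_ARG: return "invalid argument";
    case DR_ERR_EMPTY_BLOCK: return "empty block";
    }
    return "unknown status";
}
```

Leaving the output parameter untouched on failure is a deliberate choice in this sketch: a caller that forgets to check the status at least doesn't read a half-written result.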

Lessons That Outlasted the Language

Years later, writing PHP at TouchNote, Go at Airhub, Node.js at Freddie's Flowers — the patterns I learned in C stayed with me:

  1. Assume failure at every system boundary
  2. Own your resources: connections, file handles, memory — clean them up
  3. Make the error case as visible as the success case
  4. Understand what's happening below your abstraction layer
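
The first three points fit into one small C function. The sketch below uses the common single-exit `goto cleanup` idiom with standard library file I/O. `copy_file` is a hypothetical helper written for this post, not something from any of the codebases mentioned.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies src_path to dst_path. Returns 0 on success, -1 on any failure.
 * Hypothetical helper for illustration; name and signature are assumptions. */
int copy_file(const char *src_path, const char *dst_path)
{
    int    rc  = -1;     /* assume failure until every step has succeeded */
    FILE  *src = NULL;
    FILE  *dst = NULL;
    char  *buf = NULL;
    const size_t buf_size = 4096;
    size_t n;

    if (src_path == NULL || dst_path == NULL) goto cleanup;

    src = fopen(src_path, "rb");
    if (src == NULL) goto cleanup;

    dst = fopen(dst_path, "wb");
    if (dst == NULL) goto cleanup;

    buf = malloc(buf_size);
    if (buf == NULL) goto cleanup;

    while ((n = fread(buf, 1, buf_size, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) goto cleanup;
    }
    if (ferror(src)) goto cleanup;  /* fread returning 0 can mean error, not EOF */

    rc = 0;

cleanup:
    /* Single exit path: every resource acquired above is released once. */
    free(buf);  /* free(NULL) is a no-op */
    if (dst != NULL && fclose(dst) != 0) {
        rc = -1;  /* a failed close can mean buffered writes were lost */
    }
    if (src != NULL) {
        fclose(src);
    }
    return rc;
}
```

Starting with `rc = -1` and flipping it to 0 only at the very end means any path added later fails by default unless it explicitly succeeds.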

The languages changed. The principles didn't.