Infrastructure · 2007 · 20 August 2007

SQL Server Continuous Data Protection: Real-Time Backup in Practice

Continuous Data Protection promised zero-RPO recovery. Building it for SQL Server at Microsoft taught me the gap between a promising architecture and a production-grade implementation.

SQL Server · CDP · backup · Microsoft

What CDP Promised

In 2007, Continuous Data Protection (CDP) was the gold standard for data backup. Unlike traditional scheduled snapshots, CDP captured every write to disk as it happened — theoretically giving you the ability to restore to any point in time, not just the last backup.

Building a CDP system for SQL Server was my primary project that year. Here's what I learned.

How SQL Server CDP Works

SQL Server's transaction log is the key. Every change to the database (INSERT, UPDATE, DELETE) is recorded in the transaction log, and that log record is hardened to disk before the modified pages are written to the data files. This Write-Ahead Logging (WAL) pattern means that if you have a consistent starting copy of the database and can capture and replay an unbroken sequence of log records, you can reconstruct the database as of any point that sequence covers.
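To make the replay idea concrete, here is a toy model in Python. It is only a sketch of the write-ahead principle, not how SQL Server stores or applies its log: the `ToyDatabase` and `LogRecord` names, the integer LSNs, and the key/value "rows" are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    lsn: int        # log sequence number, strictly increasing
    key: str        # row identifier in this toy model
    value: object   # new value; None means the row was deleted


@dataclass
class ToyDatabase:
    log: list = field(default_factory=list)    # append-only log
    pages: dict = field(default_factory=dict)  # stands in for the data files

    def write(self, key, value):
        # Write-ahead rule: append the log record first, then touch the data.
        record = LogRecord(lsn=len(self.log) + 1, key=key, value=value)
        self.log.append(record)
        self._apply(self.pages, record)
        return record.lsn

    @staticmethod
    def _apply(pages, record):
        if record.value is None:
            pages.pop(record.key, None)
        else:
            pages[record.key] = record.value

    def state_at(self, lsn, base=None):
        """Rebuild the data as of `lsn` by replaying the log over a base copy."""
        pages = dict(base or {})
        for record in self.log:
            if record.lsn > lsn:
                break
            self._apply(pages, record)
        return pages


db = ToyDatabase()
db.write("order:1", "pending")   # LSN 1
db.write("order:1", "shipped")   # LSN 2
db.write("order:1", None)        # LSN 3, a delete
print(db.state_at(2))            # {'order:1': 'shipped'}
```

The live data no longer contains the order, but replaying the log up to LSN 2 brings it back. That is the whole trick behind point-in-time recovery.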

Our CDP system worked in two stages:

  1. Base snapshot: A full volume snapshot of the SQL Server data files, taken with SQL Server in a backup-consistent state (using VSS, the Volume Shadow Copy Service)
  2. Continuous log shipping: Capture and archive transaction log backups on a rolling basis, as frequently as every 30 seconds

For restore, you'd apply the base snapshot, then replay the archived transaction logs in order, with no gaps, up to your desired recovery point. Granularity was configurable, down to a specific point in time or a specific transaction.
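The restore sequence can be sketched as a small planner that emits T-SQL. This is an illustration under assumptions, not the system we built: it treats the base as a conventional full backup file restored with `RESTORE DATABASE` rather than a VSS volume snapshot, and the `LogBackup` type, the database name, and the file paths are hypothetical. The `NORECOVERY`, `STOPAT`, and `RECOVERY` options are standard `RESTORE` options.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogBackup:
    path: str        # hypothetical archive location of one log backup
    end: datetime    # time of the last log record the backup contains


def build_restore_script(database, base_path, base_time, log_backups, target):
    """Return the T-SQL statements that restore `database` as of `target`."""
    if target < base_time:
        raise ValueError("target is older than the base backup")

    # Names and paths are interpolated unescaped; acceptable for a sketch only.
    statements = [
        f"RESTORE DATABASE [{database}] FROM DISK = N'{base_path}' "
        "WITH NORECOVERY, REPLACE;"
    ]
    # Only log backups that end after the base backup are needed, oldest first.
    chain = sorted(
        (b for b in log_backups if b.end > base_time), key=lambda b: b.end
    )
    for backup in chain:
        restore = f"RESTORE LOG [{database}] FROM DISK = N'{backup.path}' "
        if backup.end < target:
            # Entirely before the target: replay it all, stay in restoring state.
            statements.append(restore + "WITH NORECOVERY;")
        else:
            # This backup covers the target: stop there and bring the DB online.
            stop_at = target.isoformat(timespec="seconds")
            statements.append(restore + f"WITH STOPAT = '{stop_at}', RECOVERY;")
            return statements
    raise ValueError("archived logs end before the requested recovery point")


base_time = datetime(2007, 8, 20, 2, 0, 0)
logs = [
    LogBackup(r"D:\cdp\sales_0001.trn", datetime(2007, 8, 20, 2, 0, 30)),
    LogBackup(r"D:\cdp\sales_0002.trn", datetime(2007, 8, 20, 2, 1, 0)),
    LogBackup(r"D:\cdp\sales_0003.trn", datetime(2007, 8, 20, 2, 1, 30)),
]
script = build_restore_script(
    "Sales", r"D:\cdp\sales_base.bak", base_time, logs,
    target=datetime(2007, 8, 20, 2, 0, 45),
)
print("\n".join(script))
```

For a target of 02:00:45 the plan restores the base, replays the first log backup in full, and stops partway through the second. The third is never touched. A real planner would also verify that the chain has no gaps before starting, because a single missing log backup makes everything after it unusable.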

The Challenges Nobody Tells You About

VSS coordination is hard. Getting SQL Server, the Volume Shadow Copy Service, and the disk subsystem to agree on a consistent snapshot point required careful orchestration. During the VSS freeze, SQL Server pauses writes so the snapshot is consistent; a freeze that ran long, or failed and had to be retried, could cause application-visible latency spikes.

Log volume surprises you. High-write databases generate enormous transaction logs. Our 30-second log capture interval on a busy OLTP database could generate hundreds of MB per interval. Storage sizing for a CDP system is never what you initially estimate.
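A quick back-of-the-envelope calculation shows why. The figures below are assumptions chosen for illustration (200 MB per interval as one value in the "hundreds of MB" range, and a 14-day retention window). The calculation treats the peak rate as if it were sustained all day, so it gives an upper bound rather than a forecast, and it counts 1 GiB as 1024 MB.

```python
def log_archive_size_gib(mb_per_interval, interval_seconds, retention_days):
    """Estimate archived log volume in GiB for a fixed capture interval."""
    intervals_per_day = 86_400 / interval_seconds
    mb_per_day = mb_per_interval * intervals_per_day
    return mb_per_day * retention_days / 1024


print(log_archive_size_gib(200, 30, 1))    # 562.5
print(log_archive_size_gib(200, 30, 14))   # 7875.0
```

Even if the real average is a fraction of the peak, a single busy database can account for terabytes of archived log over a two-week window, before counting the base snapshots.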

Recovery testing is non-negotiable. We built automated restore tests into the platform — nightly, we'd restore the previous day's backups to a test environment and validate the data. Many backup systems fail silently; only restore testing reveals the failures.
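One way the validation step could look, as a minimal sketch: compare per-table statistics captured from the source at the recovery point against the same statistics gathered from the restored copy. The function name and the (row count, checksum) format are hypothetical. How the numbers are collected is left out; in SQL Server, `COUNT(*)` and `CHECKSUM_AGG` over the table would be one option, alongside a `DBCC CHECKDB` run on the restored database.

```python
def compare_table_stats(source, restored):
    """Return a list of mismatches; an empty list means the restore passed.

    Each argument maps table name -> (row_count, checksum).
    """
    problems = []
    for table, expected in sorted(source.items()):
        actual = restored.get(table)
        if actual is None:
            problems.append(f"{table}: missing from restored database")
        elif actual != expected:
            problems.append(f"{table}: expected {expected}, restored {actual}")
    for table in sorted(set(restored) - set(source)):
        problems.append(f"{table}: unexpected table in restored database")
    return problems


source_stats = {"orders": (1200, 987654), "customers": (300, 123456)}
restored_stats = {"orders": (1200, 987654), "customers": (299, 555555)}
for problem in compare_table_stats(source_stats, restored_stats):
    print(problem)
```

The point of returning a list rather than a boolean is that a nightly job can report every discrepancy at once, and an empty report is itself evidence that the test actually ran.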

The Lesson I Still Apply

Always test your restores. Not just backup success, but the actual restore process. I've seen teams with years of backup history discover — at the worst possible moment — that the backups were corrupt or incomplete. At Freddie's Flowers, we apply the same principle to all our backup and failover procedures: if you haven't tested the restore, you don't have a backup.