Why Caching Is Non-Negotiable in Travel
A flight search isn't a simple database query. It's a fanout to multiple GDS systems (Amadeus, Sabre, Sirena), each requiring authentication, a structured query in a proprietary protocol, and a response that may take seconds to arrive. Doing this synchronously for every user search is not viable.
The good news: travel search has temporal locality. A user searching London-Dubai for 15 March will likely see similar results to the user who searched the same route 5 minutes ago. Caching with appropriate TTLs lets you serve the second user's search from cache at millisecond speed.
The hard part: knowing when your cache is stale enough to matter.
Cache Hierarchy Design
We built a three-tier cache:
L1 — In-memory (per-process): Tiny, hot data. Airline code lookups, airport names, currency rates. Things that change rarely and are needed on every request. Sub-millisecond access. Invalidated on deployment.
L2 — Redis: Search result cache. When we query a GDS for a route+date combination, we store the results in Redis with a route-specific TTL. Popular routes get shorter TTLs (prices change faster). Off-peak routes get longer TTLs.
L3 — MySQL: Historical search cache. Older search results that are past their freshness window but useful for analytics and for pre-warming the Redis cache on startup.
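The tiered lookup can be sketched in a few lines. This is a hypothetical illustration, not our production code: the dicts stand in for the real Redis and MySQL stores, and the route lists, TTL values, and function names are invented for the example.

```python
import time

l1_cache = {}   # per-process: airline codes, airport names, currency rates
redis = {}      # L2 stand-in: key -> (results, expires_at)
mysql = {}      # L3 stand-in: historical results, no freshness guarantee

POPULAR_TTL = 120    # seconds: prices move fast on busy routes
OFFPEAK_TTL = 900    # seconds: quieter routes can stay cached longer
POPULAR_ROUTES = {"LHR-DXB", "LHR-JFK"}   # hypothetical examples

def ttl_for(route):
    # Route-specific TTL: popular routes expire sooner.
    return POPULAR_TTL if route in POPULAR_ROUTES else OFFPEAK_TTL

def get_search_results(route, date, query_gds):
    key = f"{route}:{date}"
    entry = redis.get(key)
    if entry and entry[1] > time.time():        # L2 hit, still fresh
        return entry[0]
    results = query_gds(route, date)            # miss: fan out to the GDS
    redis[key] = (results, time.time() + ttl_for(route))
    mysql[key] = results                        # L3: keep for analytics / pre-warming
    return results
```

The point of the sketch is the ordering: the cheap tiers are consulted first, and the GDS is only touched when every fresher tier has missed.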
The Stampede Problem
Cache stampede (also called a thundering herd) is a nasty failure mode: the cache entry for a popular route expires, and hundreds of users simultaneously trigger GDS queries for the same route. Your GDS quota is exhausted, your response times spike, and users see errors.
Our solution: probabilistic early expiration combined with a single-flight pattern. When a cache entry is within the last 10% of its TTL, we probabilistically decide — with probability increasing as expiry approaches — to refresh it in the background before it expires. The current, slightly stale value continues serving users while the refresh happens.
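The decision rule can be sketched as follows. This is an illustrative simplification (a linear ramp rather than the exponential variant some implementations use), and the function name and window constant are assumptions for the example.

```python
import random
import time

REFRESH_WINDOW = 0.10   # consider early refresh in the last 10% of the TTL

def should_refresh_early(expires_at, ttl, now=None):
    """Decide whether this reader should trigger a background refresh."""
    now = now if now is not None else time.time()
    remaining = expires_at - now
    if remaining <= 0:
        return True                     # already expired: must refresh
    window = ttl * REFRESH_WINDOW
    if remaining > window:
        return False                    # comfortably fresh: leave it alone
    # Inside the window, the refresh probability rises linearly from 0
    # (at the window boundary) to 1 (at expiry), so load is spread out
    # instead of every reader refreshing at the same instant.
    return random.random() > remaining / window
```

Because each reader rolls independently, only a small fraction of requests near expiry pay the refresh cost, and the entry is usually renewed before it ever goes hard-stale.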
For true stampede prevention, we used a Redis lock: the first request to find a stale/missing cache entry acquires a lock and triggers the GDS query. All subsequent requests for the same route wait on the lock (or serve the slightly-stale value if available). Only one GDS query fires.
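A minimal single-flight sketch, under one big assumption: in production the lock is a Redis `SET key value NX PX <ms>`, shared across processes, while here a process-local `threading.Lock` stands in so the shape of the pattern is visible. The names are hypothetical.

```python
import threading

_locks = {}                       # one lock per cache key
_locks_guard = threading.Lock()   # protects the _locks dict itself
_results = {}                     # stands in for the Redis result cache

def _lock_for(key):
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def single_flight(key, fetch):
    """Ensure only one caller per key runs the expensive fetch."""
    lock = _lock_for(key)
    with lock:
        if key not in _results:       # first caller in: do the GDS query
            _results[key] = fetch()
        return _results[key]          # everyone else reuses the result
```

Callers that arrive while the fetch is in flight block on the lock and then read the freshly cached value; a production version would add a lock expiry so a crashed fetcher cannot wedge the route forever.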
Symfony Framework for Clean Architecture
Both at Silverdoor and during this period, I was working heavily with Symfony. Its dependency injection container and HTTP kernel abstraction made building a layered cache architecture clean: you could inject a CacheAwareSearchService that transparently checked the caches before hitting a GDS, without the calling code needing to know.
The event system was also valuable for cache warming: on certain events (new route added, airline schedule update), subscribers would queue cache warming jobs for affected routes.
What I'd Do Differently
In hindsight, I'd add more telemetry earlier. We could see cache hit rates in aggregate, but not by route, by TTL bucket, or by GDS provider. When cache hit rates dropped, diagnosing why required more investigation than it should have.
Instrument your cache as carefully as your application. Cache hit rate, miss rate, TTL distribution, and stampede frequency are as important as application response times.
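As a sketch of what "instrument by dimension" means in practice — a hypothetical counter keyed by route rather than a single aggregate number (the class and method names are invented for the example):

```python
from collections import defaultdict

class CacheMetrics:
    """Per-route hit/miss counters, so a falling hit rate can be localised."""

    def __init__(self):
        self.hits = defaultdict(int)
        self.misses = defaultdict(int)

    def record(self, route, hit):
        # In a real system you would also tag TTL bucket and GDS provider.
        (self.hits if hit else self.misses)[route] += 1

    def hit_rate(self, route):
        total = self.hits[route] + self.misses[route]
        return self.hits[route] / total if total else None
```

The same shape extends to TTL buckets and GDS providers; the key property is that a drop in the aggregate can immediately be traced to the dimension causing it.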