CDN (Content Delivery Network)

A globally distributed cache that serves content from edge locations near users — cutting latency, absorbing read traffic off your origin, and shielding it from spikes.

Also worth naming: Cloudflare · Amazon CloudFront · Akamai · Fastly · Google Cloud CDN

The best request is the one your origin never sees. A CDN turns "my origin is overloaded serving the same bytes worldwide" into "my origin is bored" — if the cache key and TTLs are designed with intent.

What it is

A content delivery network is a globally distributed fleet of cache servers (edge locations or PoPs) that sit between users and your origin (your servers or an S3 bucket). When a user requests content, the CDN routes them to the nearest edge; if that edge has the object cached, it serves it immediately without ever touching your origin. Done right, the edge absorbs 80–99% of your read traffic, leaving the origin to handle only cache misses and writes.

A CDN buys you two things at once. Lower latency, because content is served from a PoP physically near the user instead of crossing oceans to your origin — a 150 ms trans-continental round trip becomes a 10–20 ms edge hit. And origin offload + protection, because the same popular object is served from the cache millions of times while the origin computes or stores it once; the CDN also absorbs traffic spikes and DDoS that would otherwise flatten the origin.

Classically CDNs cached static assets — images, video, JS/CSS bundles — but modern CDNs also cache dynamic content (API responses, HTML) and run edge compute (auth, A/B routing, personalization) close to users. In an interview, a CDN is the near-default for any read-heavy, globally-accessed, cacheable content; the substance is in the cache key design, the TTL ladder, and how you handle invalidation and stampedes.

When to reach for it

Reach for this when…

You serve static assets or media (images, video, bundles) to a global audience
A read-heavy, cacheable workload where the same content is requested many times
You need to cut user-perceived latency by serving from near the user
You want to offload and protect the origin from read traffic and spikes/DDoS

Not really this pattern when…

Content is unique per request and fully personalized (low cache-hit rate — cache at origin instead)
Write-heavy or strongly-consistent data where staleness is unacceptable
Tiny internal services with no geographic spread or repeated reads
You need the data computed fresh every time (then a CDN adds little)

How it works

Four ideas decide whether a CDN helps or hurts:

1. Hit at the edge, miss to the origin. A request hits the nearest PoP. Hit → served instantly from cache. Miss → the PoP fetches from origin, caches it for the TTL, and serves it; subsequent requests hit. Your whole goal is to maximise the hit rate — a 95% hit rate means the origin sees 1 in 20 requests.

Architecture diagram: A CDN caches origin responses at edge locations near users

On a cache hit the nearest edge serves the object directly — fast, and the origin never sees the request. On a miss the edge fetches from origin, caches it for the TTL, and serves subsequent requests locally. Done well, the edge absorbs the vast majority of traffic.

2. The cache key is the most consequential design choice. By default the key is method + URL, but you often need to vary on a header (Accept-Language, Accept-Encoding) or strip noise (tracking query params, cookies). Put a session cookie in the key for public content and every user gets a unique key → ~0% hit rate. Strip the right things and you get 95%+. A bad cache key is the number-one cause of a useless CDN.

3. TTL is a freshness-vs-load dial, and invalidation is the escape hatch. Too short → low hit rate → origin load. Too long → stale content. The modern pattern is a TTL ladder by content type plus stale-while-revalidate (serve slightly stale instantly while refreshing in the background). Purging invalidates explicitly but is slow to propagate (seconds–minutes); lean on TTLs and content-hashed URLs (a new version = a new URL) so you rarely need to purge.

4. Pull vs push, and protecting the origin. Pull (default) caches on first miss; push pre-uploads known assets so there is no cold-start miss. To stop a TTL expiry from stampeding the origin, use an origin shield + request coalescing so concurrent misses for one key collapse into a single origin fetch.

Architecture diagram: Origin shield + request coalescing protect the origin from stampedes

Many edge PoPs funnel misses through one shield PoP, which collapses concurrent misses for the same key into a single origin fetch. Instead of every PoP hitting the origin when a TTL expires, the origin sees at most one request per key per TTL.

Performance envelope

CDN characteristics — the numbers to quote.

Dimension	Number	Why it matters
Edge hit latency	~5–30 ms (near user)	vs 100–150 ms crossing continents to origin
Origin offload	80–99% of reads at a good hit rate	The origin handles only misses + writes
Hit rate	95%+ with a well-designed cache key	Cache-key design is the lever that gets you there
Edge footprint	Hundreds of PoPs worldwide	Content served from near nearly every user
Purge propagation	Seconds to minutes	Why you prefer TTL + hashed URLs over purging
Spike / DDoS	Absorbed at the edge	Origin shielded from volumetric traffic
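
The offload figures in the table follow from simple arithmetic: the origin sees only the miss fraction of read traffic. A minimal sketch, where the 100,000 requests-per-second load is a hypothetical number chosen for illustration:

```python
def origin_requests_per_second(total_rps: float, hit_rate: float) -> float:
    """Requests per second that miss the edge and reach the origin."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_rps * (1.0 - hit_rate)


# Hypothetical load: 100,000 read requests per second worldwide.
for rate in (0.40, 0.80, 0.95, 0.99):
    origin_rps = origin_requests_per_second(100_000, rate)
    print(f"hit rate {rate:.0%}: origin sees about {origin_rps:,.0f} rps")
```

Note how the last few points matter most: moving from a 95% to a 99% hit rate cuts origin traffic by a factor of five.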

Capabilities in interviews

Static asset & media delivery

Cache images, video, and bundles at the edge with long TTLs and content-hashed URLs.

The classic use. Content-hash the filename so each version is immutable and cacheable forever:

/static/app.a1b2c3.js   Cache-Control: public, max-age=31536000, immutable

A new build produces a new hash → a new URL → a guaranteed cache miss, so you never purge. Images and video stream from the nearest PoP with the origin (often S3) only serving misses. This is the highest-hit-rate, highest-value CDN use.
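
Build tools normally do the hashing for you, but the idea fits in a few lines. This is a sketch only: the function name, the 8-character digest length, and the sample file contents are assumptions, not any particular bundler's behaviour.

```python
import hashlib
from pathlib import PurePosixPath

# Header value quoted from the example above.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_asset_path(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a digest of the file content before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Two builds of the same logical file get two different URLs.
v1 = hashed_asset_path("/static/app.js", b"console.log('v1');")
v2 = hashed_asset_path("/static/app.js", b"console.log('v2');")
print(v1, v2, sep="\n")
```

Because the URL changes whenever the bytes change, the old and new versions never collide in the cache, which is what makes the one-year immutable TTL safe.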

Choose this variant when

JS/CSS bundles, fonts, images
Video / large media delivery
Any immutable, versioned asset

Dynamic content & API caching

Cache HTML and public API responses at the edge with short TTLs and stale-while-revalidate.

CDNs are not just for static files. Cache an HTML shell or a public API GET briefly:

Cache-Control: public, max-age=60, stale-while-revalidate=300

Within the stale-while-revalidate window, users get an instant (possibly slightly stale) response while the edge refreshes in the background, and the origin sees roughly one refresh per TTL instead of every request. Once an object is older than max-age plus that window, the next request waits on the origin. Strip user cookies from the cache key so public responses stay cacheable; personalized responses are marked private and skip the edge.
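
To make the header above concrete, here is a rough sketch of the decision an edge makes from an object's age. The enum and function names are illustrative assumptions; the defaults mirror `max-age=60, stale-while-revalidate=300`.

```python
from enum import Enum


class Freshness(Enum):
    FRESH = "serve from cache"
    STALE_REVALIDATE = "serve stale, refresh in background"
    EXPIRED = "fetch from origin before responding"


def classify(age_s: float, max_age_s: int = 60, swr_s: int = 300) -> Freshness:
    """Classify a cached object by its age in seconds."""
    if age_s < max_age_s:
        return Freshness.FRESH
    if age_s < max_age_s + swr_s:
        return Freshness.STALE_REVALIDATE
    return Freshness.EXPIRED


for age in (10, 90, 400):
    print(f"age {age:>3}s: {classify(age).value}")
```

Only the last state makes a user wait on the origin, so a page that is requested at least once every few minutes stays in the first two.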

Choose this variant when

HTML shells and landing pages
Public, cacheable API responses
Near-real-time data where seconds of staleness is fine

Edge compute & personalization

Run lightweight logic at the edge — auth, routing, A/B, assembling personalized pages.

Edge runtimes (Cloudflare Workers, Lambda@Edge, Fastly Compute) execute code in the PoP near the user:

edge: validate JWT → assemble cached shell + per-user slot → return in <50ms

This lets you cache the expensive public shell while personalizing a small slice at the edge — getting cache benefits on pages that used to be uncacheable. Use it for auth checks, geo/A-B routing, and edge-side includes; it is powerful but priced per request, so reserve it for logic that genuinely needs to be near users.
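
The shell-plus-slot idea can be sketched as follows. Treat every name here as hypothetical: real edge runtimes have their own APIs and languages, and the HMAC-signed `user.signature` token is a simplified stand-in for JWT validation, used so the example runs with only the standard library.

```python
import hashlib
import hmac
import html
from typing import Optional

SIGNING_KEY = b"demo-only-key"  # hypothetical; a real deployment loads a secret
CACHED_SHELL = "<header>{{user_slot}}</header><main>Public article body</main>"


def sign(user_id: str) -> str:
    """HMAC-SHA256 over the user id (simplified stand-in for a JWT signature)."""
    return hmac.new(SIGNING_KEY, user_id.encode(), hashlib.sha256).hexdigest()


def render_at_edge(token: Optional[str]) -> str:
    """Fill the per-user slot of the cached shell; fall back to anonymous."""
    slot = "Sign in"
    if token and "." in token:
        user_id, signature = token.rsplit(".", 1)
        # Constant-time comparison of the presented and expected signatures.
        if hmac.compare_digest(signature.encode(), sign(user_id).encode()):
            slot = f"Hello, {html.escape(user_id)}"
    return CACHED_SHELL.replace("{{user_slot}}", slot)


print(render_at_edge("ada." + sign("ada")))
print(render_at_edge("ada.forged"))
```

The shell is identical for every user and stays cacheable; only the small slot is computed per request.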

Choose this variant when

Personalizing otherwise-cacheable pages
Auth / routing / A-B at the edge
Latency-critical logic that must run near users

Origin protection & security

Absorb spikes and DDoS, terminate TLS, and apply a WAF at the edge.

Because all traffic flows through the CDN, it is a natural security and resilience layer: it absorbs volumetric DDoS at the edge, terminates TLS near users (faster handshakes), and applies a WAF and rate limiting before requests reach the origin. The origin can even be locked down to accept traffic only from the CDN. This shielding is a major reason to front any public origin with a CDN, beyond caching.

Choose this variant when

Public-facing origins needing DDoS protection
TLS termination near users
A WAF / rate-limiting layer in front of origin

Operating knobs

Cache key design

The single biggest hit-rate lever. Default is method + URL; add Vary only where genuinely needed (language, encoding), and strip tracking query params and cookies for public content. A session cookie in the key for a public page drops the hit rate to near zero. Canonicalize URLs at the edge so equivalent requests share a key.
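
A minimal sketch of that normalization, assuming a hypothetical `cache_key` helper. The tracking list covers only the `utm_*` and `fbclid` parameters named on this page, and leaving the scheme out of the key is a simplification.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

TRACKING_PARAMS = {"fbclid"}
TRACKING_PREFIXES = ("utm_",)


def cache_key(method: str, url: str, accept_language: str = "") -> str:
    """Build a canonical key: method + host + path + sorted non-tracking params.

    Cookies are deliberately not an input, so they can never split the cache.
    Pass accept_language only for responses that really differ by language.
    """
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS and not name.startswith(TRACKING_PREFIXES)
    )
    key = f"{method.upper()} {parts.netloc.lower()}{parts.path or '/'}"
    if kept:
        key += "?" + urlencode(kept)
    # Vary on the primary language tag only ("en-US,en;q=0.9" -> "en").
    primary = accept_language.split(",")[0].split(";")[0].split("-")[0].strip().lower()
    if primary:
        key += f" lang={primary}"
    return key


shared = cache_key("GET", "https://Example.com/article?id=7&utm_source=x&fbclid=abc")
direct = cache_key("GET", "https://example.com/article?id=7")
print(shared == direct, shared)
```

Two requests that differ only in tracking noise or host casing now land on the same cached object.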

TTL ladder + stale-while-revalidate

Set TTLs by content class: years for content-hashed immutable assets, ~60s + stale-while-revalidate for HTML shells, a few seconds for public API GETs, and no edge cache (origin-only) for personalized responses. Stale-while-revalidate keeps reads instant while refreshing in the background — the modern default for dynamic content.
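
One way to keep the ladder honest is to write it down as a single table that the origin consults when setting headers. A sketch under assumptions: the class names are made up, and the public API values (5 s plus a 30 s stale window) are illustrative rather than recommended.

```python
# Content class -> Cache-Control value the origin should emit.
CACHE_POLICY = {
    "hashed_asset": "public, max-age=31536000, immutable",
    "html_shell": "public, max-age=60, stale-while-revalidate=300",
    "public_api": "public, max-age=5, stale-while-revalidate=30",  # assumed values
    "personalized": "private, no-store",
}


def cache_control_for(content_class: str) -> str:
    """Look up the header; unknown classes get the safest policy (no edge cache)."""
    return CACHE_POLICY.get(content_class, "private, no-store")


for name in ("hashed_asset", "html_shell", "public_api", "personalized", "unknown"):
    print(f"{name:>13}: {cache_control_for(name)}")
```

Defaulting unknown classes to `private, no-store` means a forgotten route costs you hit rate rather than leaking data.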

Invalidation strategy

Prefer content-hashed URLs so a change is a new URL (no purge needed, old version ages out). Reserve explicit purge for corrections/emergencies, accepting seconds-to-minutes propagation. Purge by URL, by tag/surrogate key, or globally depending on the CDN — and know which, because "we will just purge" is slow at scale.
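
The difference between purging by URL and purging by tag is easiest to see in a toy model. This is an in-memory sketch with hypothetical names; real CDNs expose tagging through their own response headers and purge APIs, which are not modelled here.

```python
from collections import defaultdict
from typing import Iterable, Optional


class TaggedCache:
    """Toy edge cache that remembers which URLs carry which tag."""

    def __init__(self) -> None:
        self._bodies = {}                     # url -> cached body
        self._urls_by_tag = defaultdict(set)  # tag -> urls stored with that tag

    def put(self, url: str, body: bytes, tags: Iterable[str] = ()) -> None:
        self._bodies[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._bodies.get(url)

    def purge_url(self, url: str) -> int:
        """Remove one object; returns how many objects were removed (0 or 1)."""
        return 1 if self._bodies.pop(url, None) is not None else 0

    def purge_tag(self, tag: str) -> int:
        """Remove every object stored with this tag; returns the count."""
        urls = self._urls_by_tag.pop(tag, set())
        return sum(self.purge_url(url) for url in urls)


cache = TaggedCache()
cache.put("/article/1", b"A1", tags=["article-1", "home"])
cache.put("/", b"HOME", tags=["home"])
cache.put("/article/2", b"A2", tags=["article-2"])
print(cache.purge_tag("home"))  # removes the two objects tagged "home"
```

A single tag purge replaces a list of URLs you would otherwise have to enumerate, which is why it matters which purge modes your CDN supports.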

Origin shield & coalescing

For high-traffic objects, enable an origin shield (a designated PoP all misses funnel through) and request coalescing so concurrent misses for the same key become one origin fetch. This prevents a TTL expiry from stampeding the origin — essential for popular content with bursty demand.
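
Request coalescing is essentially the single-flight pattern. Here is a thread-based sketch of it, with assumptions stated up front: `CoalescingCache` and `slow_origin` are hypothetical names, TTLs and eviction are left out, and a real shield does this across machines rather than threads.

```python
import threading
import time
from concurrent.futures import Future


class CoalescingCache:
    """Concurrent misses for one key share a single origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}      # key -> cached body (TTL and eviction omitted)
        self._in_flight = {}  # key -> Future for the fetch already under way

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._in_flight[key] = future
        if not is_leader:
            return future.result()  # followers wait for the leader's fetch
        try:
            body = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)  # followers see the same failure
            raise
        with self._lock:
            self._cache[key] = body
            del self._in_flight[key]
        future.set_result(body)
        return body


origin_calls = []


def slow_origin(key):
    """Stand-in origin that records each call and takes 50 ms to answer."""
    origin_calls.append(key)
    time.sleep(0.05)
    return f"body for {key}"


shield = CoalescingCache(slow_origin)
results = []
workers = [threading.Thread(target=lambda: results.append(shield.get("/hot")))
           for _ in range(20)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(origin_calls), "origin fetch for", len(results), "requests")
```

The first request to miss becomes the leader and fetches; everyone else who arrives before it finishes waits on the same result instead of hitting the origin.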

Versus the alternatives

CDN vs related layers.

Dimension	CDN (edge cache)	Origin cache (Redis)	Load balancer
Location	Globally near users	In your data center	In front of your servers
Caches	HTTP responses / objects	Computed values / data	Nothing — it routes
Primary win	Latency + origin offload	Avoid recompute / DB load	Distribute load across servers
Staleness	TTL-bounded (you tune it)	TTL/invalidation you control	N/A
Best for	Global, cacheable, read-heavy	Hot data behind the app	Spreading traffic, health checks

Failure modes & gotchas

A cache key that destroys the hit rate

Session cookies or unstripped tracking params (utm_*, fbclid) in the cache key make every request unique → near-zero hit rate → the origin sees everything. Strip cookies and tracking params for public content, narrow Vary, and canonicalize URLs at the edge.

Caching personalized content publicly

Caching a response that contains one user's data at a shared edge can serve it to other users — a serious leak. Mark personalized responses private/no-store, and only edge-cache the public shell, personalizing the rest at the edge or client.
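
A guard at the edge can make this failure harder to hit. The sketch below is a deliberately conservative policy, not a full implementation of HTTP caching rules; the function names are hypothetical, and refusing anything with `Set-Cookie` is an assumption that trades some hit rate for safety.

```python
def directives(cache_control: str) -> set:
    """Lower-cased directive names from a Cache-Control header value."""
    return {part.split("=")[0].strip().lower()
            for part in cache_control.split(",") if part.strip()}


def edge_may_store(response_headers: dict) -> bool:
    """Decide whether a shared edge cache should store this response."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    found = directives(headers.get("cache-control", ""))
    if found & {"private", "no-store"}:
        return False  # the origin said this is not for shared caches
    if "set-cookie" in headers:
        return False  # conservative assumption: a cookie implies per-user data
    # Store only when the origin explicitly opted in.
    return bool(found & {"public", "max-age", "s-maxage"})


print(edge_may_store({"Cache-Control": "public, max-age=60"}))
print(edge_may_store({"Cache-Control": "private, max-age=60"}))
```

Requiring an explicit opt-in means a response with no caching headers at all is treated as uncacheable rather than guessed at.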

Cold deploy without warm-up → origin stampede (Advanced)

A deploy that changes URLs or purges globally empties the edge, so every request misses at once and floods the origin. Use content-hashed filenames (old URLs stay cached), pre-warm top URLs after deploy, and use an origin shield with request coalescing.

Caching an error or stale object (Advanced)

A cached 500 or a stale object persists until TTL or purge — you have cached an outage. Never cache non-2xx with a long TTL (short negative TTL only), and prefer hashed URLs so replacing an object is automatic rather than a purge.
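
The "short negative TTL only" rule can be expressed as a cap applied at the edge. A sketch, where the 5-second cap and the function name are assumptions:

```python
NEGATIVE_TTL_S = 5  # assumed cap for anything that is not a 2xx response


def edge_ttl_seconds(status: int, origin_ttl_s: int) -> int:
    """TTL the edge should apply, given the status and the origin's requested TTL."""
    if 200 <= status < 300:
        return origin_ttl_s
    # Non-2xx: cache briefly at most, so an outage cannot outlive its fix by long.
    return min(origin_ttl_s, NEGATIVE_TTL_S)


for status in (200, 404, 500):
    print(status, edge_ttl_seconds(status, origin_ttl_s=3600))
```

A few seconds of negative caching still shields a struggling origin from a retry storm, without pinning the error in place for an hour.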

Treating the CDN as strongly consistent

Edge caches are eventually consistent by design — there is a window where users see stale content after a change. For data that must be immediately correct everywhere (prices at checkout, balances), read through to origin; the CDN is for content that tolerates bounded staleness.

In production

Netflix

Open Connect — a purpose-built CDN inside ISPs

Netflix delivers a huge share of global internet traffic, and it does so through Open Connect, its own CDN of cache appliances placed inside ISP networks and internet exchanges — as close to viewers as physically possible. Popular titles are pre-positioned (a push CDN) onto these appliances during off-peak hours, so when you hit play, the bytes come from a box in your ISP, not from a Netflix data center across the country.

This is the CDN philosophy taken to its logical extreme: serve from the edge, keep the origin (S3) doing almost nothing, and pre-warm predictable content so there's no cold-start miss. The takeaway engineers cite is that for read-heavy media at scale, the network distance to the user dominates latency — so you move the content to the user, not the user to the content.

Takeaway: For global read-heavy content, network distance dominates latency — push popular content to edge locations near users so the origin barely sees traffic.

Cloudflare

Stale-while-revalidate and request coalescing at internet scale

Cloudflare operates one of the world's largest CDNs, fronting millions of websites from hundreds of cities and serving tens of millions of requests per second. Their public engineering writing is a practical masterclass in the exact levers from this page: tiered caching / origin shield so a small set of upper-tier data centers absorb misses and the origin sees one request per asset, request coalescing so a thousand simultaneous misses for the same URL become one origin fetch, and stale-while-revalidate so users never block on a refresh.

These features exist specifically to solve the cache stampede — the moment a popular object's TTL expires and every edge tries to refresh at once. Cloudflare's scale makes the failure mode (and its mitigation) concrete: without coalescing and shielding, a viral object's TTL expiry can hammer an origin offline, which is why those controls are the senior answer to "how do you protect the origin?"

Takeaway: Origin shield + request coalescing + stale-while-revalidate are the standard defenses against cache stampedes — collapse concurrent misses into one origin fetch and never block on a refresh.

Good vs bad answer

Interviewer probe

“Your social app serves profile images and feeds to users worldwide and the origin is overloaded. What do you add and how?”

Weak answer

"Add more origin servers and a bigger database so it can handle the global read load, and maybe a cache in the data center to speed things up."

Strong answer

"A CDN in front of everything cacheable — the origin is overloaded serving the same bytes repeatedly across the world, which is exactly what an edge cache fixes. Images: content-hashed URLs (avatar.<hash>.jpg) with a one-year immutable TTL, origin on S3 — these get a ~99% hit rate and a new image is just a new URL, no purge. Feed HTML / public API: short TTL (~60s) with stale-while-revalidate so reads are instant while the edge refreshes, and I strip the session cookie from the cache key so public responses stay cacheable; the personalized slice is assembled at the edge or client so the shell can still be cached. That offloads 90%+ of reads from the origin and cuts latency by serving from near each user. For popular objects I enable an origin shield with request coalescing so a TTL expiry can't stampede the origin. Just scaling origin servers would be paying to serve identical bytes a million times instead of caching them once at the edge."

Why it wins: Diagnoses the repeated-global-read problem, designs per-content cache keys and a TTL ladder, uses hashed URLs + stale-while-revalidate, strips cookies to protect the hit rate, adds origin-shield stampede protection, and explains why scaling origin is the wrong lever.

Interview playbook

2 min on most designs; longer when the product is global or media-heavy

When it comes up

Global users and "make it fast worldwide"
Serving images, video, or static assets at scale
A read-heavy public surface where the origin is the bottleneck
The interviewer asks how you reduce latency or protect the origin

Order of reveal

1. Front cacheable content with a CDN. Put a CDN over everything cacheable; the edge serves from near the user and the origin only sees misses.
2. Design the cache key. Key on URL plus minimum Vary; strip tracking params and cookies for public content so the hit rate stays high.
3. TTL ladder. Years for content-hashed assets, ~60s + stale-while-revalidate for HTML, seconds for public API, origin-only for personalized.
4. Invalidation. Content-hashed URLs mean a change is a new URL, so I rarely purge; purge is the emergency lever.
5. Protect the origin. Origin shield + request coalescing so a TTL expiry never stampedes the origin; the CDN also absorbs spikes/DDoS.

Signature phrases

“The best request is the one the origin never sees.” — Frames the CDN as the primary traffic absorber.
“Cache key is URL plus minimum Vary — strip the tracking params and cookies.” — Names the single biggest hit-rate lever.
“Content-hash the URL so a new version is a new URL — no purge.” — The clean invalidation pattern.
“Stale-while-revalidate keeps reads instant while the edge refreshes.” — Shows modern dynamic-caching command.

Likely follow-ups

“How do you serve personalized pages but still get caching benefit?”

Split the response. Cache the public shell (layout, nav, anonymous content) at the edge with a normal TTL, and hydrate the personalized slice separately — either with a small uncached request from the client, or with edge compute that validates the user and assembles per-user fragments in the PoP. What I never do is put the user id or session cookie in the cache key for the whole page, which drives the hit rate to zero. The principle: cache the part that is the same for everyone, personalize the small part that differs.

“A popular object's TTL expires and the origin gets hammered. Why, and how do you prevent it?”

That is a cache stampede: when the TTL expires, every edge PoP (and every concurrent request) misses at once and rushes the origin for the same object. Prevent it with an origin shield — funnel all PoP misses through one designated PoP — plus request coalescing so concurrent misses for the same key collapse into a single origin fetch, and stale-while-revalidate so users keep getting the stale copy while one background request refreshes. Jittering TTLs across keys also stops many objects expiring on the same second. Lowering the TTL is the wrong instinct — it makes stampedes more frequent.
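
TTL jitter is the simplest of these to show. A minimal sketch, assuming a hypothetical helper and a ±10% spread (the spread is a tuning choice, not a standard value):

```python
import random


def jittered_ttl(base_ttl_s: float, spread: float = 0.1, rng=None) -> float:
    """Return the base TTL scaled by a random factor in [1 - spread, 1 + spread]."""
    rng = rng or random
    return base_ttl_s * (1 + rng.uniform(-spread, spread))


# A seeded generator keeps the demo repeatable; production code would not seed.
demo_rng = random.Random(0)
ttls = [jittered_ttl(60, rng=demo_rng) for _ in range(5)]
print([round(t, 1) for t in ttls])
```

Objects cached in the same instant now expire spread across a window of about 12 seconds instead of all at once.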

“When does a CDN NOT help?”

When content is unique per request and fully personalized, the hit rate is near zero and the CDN just adds a hop — cache at the origin (Redis) instead. When data must be strongly consistent and immediately correct everywhere (checkout prices, account balances), edge caching's bounded staleness is unacceptable, so you read through to origin. And for small internal services with no geographic spread or repeated reads, there is nothing to cache. The CDN earns its place specifically for global, cacheable, read-heavy content.

Worked example

Setup. A news site with a global audience is melting its origin during a breaking-news spike: millions of readers worldwide hit the same article and home page simultaneously, and the origin servers and database can't keep up.

The move. Front everything cacheable with a CDN and classify content by cacheability. Static assets (JS/CSS bundles, images) get content-hashed filenames (app.a1b2c3.js) and a 1-year immutable TTL — a new build is a new URL, so I never purge. Article + home HTML gets a short TTL (~60s) with stale-while-revalidate, so readers always get an instant (possibly 1-minute-stale) page while the edge refreshes in the background and the origin sees one refresh per minute instead of millions of hits.

Cache key. The hit rate lives or dies here: I key on URL, strip tracking params (utm_*, fbclid) so every social share doesn't create a unique key, and keep the session cookie out of the key for public pages. Personalization (the logged-in header) is a small uncached fragment hydrated separately, so the article body still caches.

Stampede protection. When a hot article's TTL expires, I don't want every edge PoP rushing the origin at once — so I enable an origin shield (all misses funnel through one PoP) plus request coalescing (concurrent misses for one key collapse into a single origin fetch), and jitter TTLs so a million keys don't expire on the same second.

What breaks. The classic disaster is caching a 500 or a personalized page — so I never cache non-2xx with a long TTL, and personalized responses are marked private. A cold deploy that purges everything would stampede the origin, so I rely on content-hashed URLs (old ones stay cached) and pre-warm the top URLs.

The result. The CDN absorbs 95%+ of reads at the edge, breaking-news spikes hit cache not origin, global readers get sub-50ms pages, and the origin handles only the ~5% misses plus writes.

Cheat sheet

•CDN = globally distributed edge cache between users and origin. Hit → instant; miss → fetch + cache.
•Wins: lower latency (serve near user) + origin offload/protection (80–99% of reads).
•Cache key is the #1 hit-rate lever: URL + minimum Vary; strip cookies/tracking params for public content.
•TTL ladder: 1yr immutable hashed assets → 60s + stale-while-revalidate HTML → seconds API → origin-only personalized.
•Invalidate via content-hashed URLs (new version = new URL); purge is the slow emergency lever.
•Stampede protection: origin shield + request coalescing + TTL jitter.
•Never cache personalized content publicly or cache errors with a long TTL.
•Also a security layer: absorbs DDoS, terminates TLS, runs a WAF at the edge.

Drills

Why does a CDN cut latency even when the origin is fast?

Because most user-perceived latency on a global request is network round-trip time, not origin compute. A user in Europe hitting a US origin pays roughly 100–150 ms per round trip regardless of how fast the server is; a CDN serves the same bytes from a PoP in Europe in ~10–20 ms. The CDN removes the long-distance round trip for cached content. It also offloads the origin so it stays fast under load, but the headline latency win is geographic proximity.

Interviewer: "Your CDN hit rate is 40%, you expected 95%. What's likely wrong?"

The cache key is over-varying. Usual culprits: tracking query params (utm_*, fbclid) not stripped, so every share creates a unique key; a session cookie included in Vary on public pages; or Accept-Language/Accept-Encoding splitting the cache unnecessarily. Also check the TTL — a 5-second TTL guarantees frequent misses. Fix by stripping tracking params and cookies for public content, narrowing Vary to what genuinely changes the response, canonicalizing URLs at the edge, and lengthening TTLs (with stale-while-revalidate) where freshness allows.

How do you push a content change so all users see it quickly, without hammering the origin?

For routine changes, rely on TTLs — users see the new version within the TTL, and stale-while-revalidate keeps it instant. For an urgent correction, purge the affected URLs (or a surrogate-key tag), accepting seconds-to-minutes propagation. The cleanest approach for assets is to avoid purges entirely with content-hashed URLs: publish the new version under a new URL and update the reference, so the change is instant for new requests and the old object simply ages out. Pre-warming the top URLs after a big change prevents a miss stampede.
