Real-time transport choices
Polling, long-polling, SSE, WebSockets, connection budgets, and reconnect semantics.
Polling, long-polling, SSE, WebSocket — the transport choice is about connection count and direction, not cleverness. Pick wrong and you pay in money or in user-visible lag.
Read this if your last attempt…
- You defaulted to WebSocket for "real-time" without costing it
- You can't name the difference between long-polling and SSE
- You haven't thought about connection count at 1M users
- You don't know what a load balancer does with a persistent connection
The concept
Four mechanisms, roughly ordered by connection cost:
- Short polling: client asks every N seconds. Dead simple, works through any proxy, wastes battery + bandwidth. Fine when N is tens of seconds and freshness isn't critical.
- Long polling: the client's request is held open server-side until data arrives or a timeout fires. One outstanding request per client. Proxy-friendly, but every message still costs a full HTTP request/response, plus TCP + TLS setup whenever the connection isn't reused.
- Server-Sent Events (SSE): one persistent HTTP connection carrying a server → client text stream. Built into browsers (EventSource) with automatic reconnect; works through HTTP proxies. One-way only.
- WebSocket: one persistent TCP connection, bidirectional, text or binary frames. It opens with an HTTP/1.1 Upgrade handshake, so most proxies and LBs support it but need explicit config. Runs over TLS as wss://.
Polling pulls. SSE streams server→client. WebSocket is bidirectional and the most expensive.
Real-time mechanisms — pick by direction + frequency.
| Mechanism | Direction | Best for | Cost per connection |
|---|---|---|---|
| Short polling | Client asks | Low-frequency updates (seconds to minutes) | Tiny (no persistent socket) |
| Long polling | Client asks, server holds | Moderate freshness on HTTP-only clients | Medium — one open request per client |
| SSE | Server → client | Live feeds, notifications, dashboards | One persistent HTTP connection per client |
| WebSocket | Both | Chat, collaborative editing, games | One persistent TCP connection per client, opened by an HTTP Upgrade handshake |
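The table above and the cheat sheet at the end reduce to one decision rule. Below is a minimal sketch of that rule. `pick_transport`, `Transport`, and the 10-second "low-frequency" cutoff (taken from the cheat sheet) are illustrative assumptions. A real decision also weighs client platforms, scale, and budget.

```python
from enum import Enum

class Transport(Enum):
    SHORT_POLLING = "short polling"
    LONG_POLLING = "long polling"
    SSE = "SSE"
    WEBSOCKET = "WebSocket"

def pick_transport(bidirectional: bool, update_interval_s: float,
                   streaming_blocked: bool = False,
                   low_freq_cutoff_s: float = 10.0) -> Transport:
    """Hypothetical helper: pick a transport by direction first, then update frequency."""
    if bidirectional:
        # Two-way traffic wants WebSocket; long polling is the fallback tier
        # when proxies block persistent connections.
        return Transport.LONG_POLLING if streaming_blocked else Transport.WEBSOCKET
    if update_interval_s > low_freq_cutoff_s:
        # Infrequent one-way updates: skip persistent connections entirely.
        return Transport.SHORT_POLLING
    # Frequent one-way server push.
    return Transport.LONG_POLLING if streaming_blocked else Transport.SSE
```

Direction is checked before frequency, matching the order of reveal, and long polling only appears as the fallback when streaming is blocked. For example, `pick_transport(bidirectional=False, update_interval_s=0.1)` returns `Transport.SSE`.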
How interviewers grade this
- You pick the mechanism by direction and frequency, not by vibe.
- You size the connection count at peak and cost the gateway fleet.
- You name the reconnect strategy (backoff, resume token).
- You name the LB story (sticky sessions / session affinity, or consistent hashing on a stable key such as user_id).
- If messages fan out, you separate connection layer from delivery layer (pub/sub between them).
Variants
SSE for server-push feeds
One HTTP stream per client; server pushes as events arrive.
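Clean fit for notifications, live stock tickers, dashboards. Browsers reconnect automatically on drop. Upstream proxies and LBs usually handle it without special config because it's plain HTTP, though response-buffering proxies and short idle timeouts can stall or cut the stream and may need tuning.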
Pros
- Works through HTTP-aware proxies
- Auto-reconnect in browsers
- Simple protocol (text events + id)
Cons
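- One-way only
- Browsers cap HTTP/1.1 connections at about 6 per origin, shared across tabs; HTTP/2 multiplexes streams over one connection and lifts this limit
- Weaker native-mobile support than WebSocket; apps usually need a third-party SSE client library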
Choose this variant when
- One-way server push
- Browser clients
- Notifications, dashboards, live counters
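SSE's "text events + id" protocol is small enough to write by hand. The sketch below serializes one event in `text/event-stream` framing, using the field names from the HTML specification (`id`, `event`, `retry`, `data`). The function name is hypothetical. The `id` matters for reconnects: the browser's `EventSource` sends the last id it saw back in a `Last-Event-ID` request header, and the server can use it to resume.

```python
from typing import Optional

def format_sse_event(data: str, event_id: Optional[str] = None,
                     event: Optional[str] = None,
                     retry_ms: Optional[int] = None) -> str:
    """Serialize one Server-Sent Event in text/event-stream framing."""
    lines = []
    if event_id is not None:
        # Field values must be single lines; the browser echoes the last id
        # back in a Last-Event-ID header when it reconnects.
        if "\n" in event_id or "\r" in event_id:
            raise ValueError("event id must be a single line")
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    if retry_ms is not None:
        # Reconnection delay hint for the browser, in milliseconds.
        lines.append(f"retry: {retry_ms}")
    # SSE treats CRLF, LF, and CR as line breaks; each payload line becomes
    # its own data: field, and the client rejoins them with "\n".
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    for chunk in normalized.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

Multi-line payloads become several `data:` lines, which the browser rejoins with newlines. A blank line ends the event.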
WebSocket for bidirectional real-time
Persistent TCP; messages both ways; any payload (text or binary).
The right choice for chat, collab editing, games, live trading. Requires LB support (AWS ALB, or NGINX configured to forward the Upgrade and Connection headers), reconnection logic on the client, and careful capacity planning on the gateway tier.
Pros
- Bidirectional
- Low per-message overhead after handshake
- Any payload format
Cons
- Most expensive at scale (file descriptors, memory)
- LB/proxy configuration is non-trivial
- Reconnect + message replay is client-side work
Choose this variant when
- Bidirectional messaging
- Games, chat, collab editing
- When SSE's one-way limit bites
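LBs and proxies have to pass the WebSocket "upgrade" through, and it is an ordinary HTTP/1.1 exchange. The client sends `Upgrade: websocket`, `Connection: Upgrade`, and a random `Sec-WebSocket-Key`. The server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` value derived from that key. `Upgrade` and `Connection` are hop-by-hop headers, so a proxy that does not explicitly forward them (NGINX's default behavior) breaks the handshake. Below is a minimal sketch of the server's side per RFC 6455. The function names are hypothetical, and request parsing, version checks, and framing are omitted.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP/1.1 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

`websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` reproduces the RFC's worked example, `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.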
Long-poll fallback
Use long-poll where WebSockets/SSE are blocked (corporate proxies).
Not a primary choice, but valuable as a fallback. Libraries like Socket.IO handle the negotiation automatically: they open with HTTP long-polling and upgrade to WebSocket when the network allows.
Pros
- Works through any HTTP proxy
- Reachable from every client
- No sticky-LB requirement, provided each poll carries a cursor that any server can answer
Cons
- Higher latency and request overhead
- More backend load than SSE/WS
Choose this variant when
- Hostile network environments
- Fallback tier for WebSocket failures
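Below is a sketch of the server side of long polling. It assumes a single process and an in-memory append-only log; a multi-node deployment would need the log in shared storage. Each request carries a cursor and parks until something newer than the cursor exists or the timeout fires, then returns. Class and method names are hypothetical. Create the channel inside the running event loop.

```python
import asyncio
from typing import List, Tuple

class LongPollChannel:
    """Hypothetical in-memory channel: one append-only log, polled by cursor."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._cond = asyncio.Condition()

    async def publish(self, msg: str) -> None:
        async with self._cond:
            self._log.append(msg)
            # Wake every request currently parked in poll().
            self._cond.notify_all()

    async def poll(self, cursor: int, timeout_s: float) -> Tuple[int, List[str]]:
        """Hold the request until there is data past `cursor` or the timeout fires.

        Returns the new cursor and any messages after the old one; an empty
        list means the client should simply poll again with the same cursor.
        """
        async with self._cond:
            try:
                await asyncio.wait_for(
                    self._cond.wait_for(lambda: len(self._log) > cursor),
                    timeout_s,
                )
            except asyncio.TimeoutError:
                pass  # No data in time: answer with an empty batch.
            return len(self._log), self._log[cursor:]
```

An empty batch is a normal outcome: the client immediately re-polls with the cursor it got back. That loop is where the per-message request overhead comes from.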
Worked example
Design: a chat app with 1M concurrent users.
Decision tree:
- Bidirectional? Yes (client sends messages, server pushes others'). → WebSocket.
- Browser + mobile clients. WebSocket works on both.
Connection-tier sizing:
- 1M concurrent WS connections. At ~10 KB per connection (buffers + state), that's ~10 GB across the gateway fleet.
- One gateway handles ~50–100k connections comfortably on modern hardware. → 10–20 gateway instances.
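- LB: consistent-hash on user_id so reconnects land on the same gateway (preserves in-memory state, or at least makes recovery cheap). The LB can only hash on user_id if it is visible in the upgrade request (path, query string, or header), which means L7 routing.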
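The sizing above is two divisions. The sketch below uses the text's figures as defaults: about 10 KB per connection and 50–100k connections per gateway. Both are assumptions to replace with measurements from your own gateway. `spare` is a hypothetical knob for N+1 redundancy, and memory uses decimal units (1 GB = 10^6 KB).

```python
import math

def fleet_memory_gb(connections: int, kb_per_connection: float = 10.0) -> float:
    """Connection state summed across the fleet (decimal units: 1 GB = 10**6 KB)."""
    return connections * kb_per_connection / 1_000_000

def gateways_needed(connections: int, conns_per_gateway: int, spare: int = 0) -> int:
    """Instances needed to hold `connections`, plus `spare` for redundancy."""
    # Round up: a partly full gateway is still a whole instance.
    return math.ceil(connections / conns_per_gateway) + spare
```

`gateways_needed(1_000_000, 100_000)` and `gateways_needed(1_000_000, 50_000)` give the 10 and 20 instances quoted above. `gateways_needed(100_000, 50_000, spare=1)` gives the 3 used in the 100k practice drill.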
Delivery tier (separate from connection tier):
- Gateways publish inbound messages to Kafka keyed on recipient user_id.
- A "fan-out" service reads Kafka and looks up which gateway owns the recipient's connection (via Redis: user_id → gateway_id); forwards over internal gRPC.
- This separation means you can restart gateways without touching the delivery layer, and you can scale them independently.
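The connection/delivery split can be shown in-process. In this sketch, a dict stands in for the Redis `user_id → gateway_id` map and a method call stands in for internal gRPC. Each `handle()` call stands in for one record the fan-out service reads from Kafka. Recipients with no registered gateway go to an inbox for later pull. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Gateway:
    """Stand-in for a connection-tier node; `sent` records what it pushed to sockets."""
    gateway_id: str
    sent: List[tuple] = field(default_factory=list)

    def deliver(self, user_id: str, message: str) -> None:
        # A real gateway would write to the user's open socket.
        self.sent.append((user_id, message))

class FanOut:
    """Delivery-tier sketch: route each message to the gateway that owns the recipient."""

    def __init__(self, gateways: Dict[str, Gateway]) -> None:
        self.gateways = gateways
        self.owner: Dict[str, str] = {}  # user_id -> gateway_id (Redis in the text)
        self.inbox: Dict[str, List[str]] = defaultdict(list)  # offline users

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        self.owner[user_id] = gateway_id

    def on_disconnect(self, user_id: str) -> None:
        self.owner.pop(user_id, None)

    def handle(self, recipient: str, message: str) -> str:
        """Route one consumed message; return where it went."""
        gateway_id = self.owner.get(recipient)
        if gateway_id is None or gateway_id not in self.gateways:
            self.inbox[recipient].append(message)
            return "inbox"
        self.gateways[gateway_id].deliver(recipient, message)
        return gateway_id
```

Restarting a gateway only changes entries in `owner`. The `FanOut` instance and its consumer keep running, which is the independence the delivery-tier design is after.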
Reconnect:
- Client holds a last-seen message id. On reconnect, sends it as a query parameter; server replays missed messages from a per-user buffer (retained 24h).
- Exponential backoff starting at 1 s and capped at 30 s, with random jitter so clients don't reconnect in lockstep.
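Below is a sketch of the per-user replay buffer. It assumes per-user message ids that only increase and keeps them in an in-memory deque. In a real fleet the buffer would sit in shared storage, so whichever gateway accepts the reconnect can serve it. `ReplayBuffer` and its methods are hypothetical, and time is passed in explicitly to keep the sketch testable. The `gap` flag goes slightly beyond the text: it tells the client its last-seen id predates the retention window. In that case replay cannot fill the hole and a full resync is needed.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

class ReplayBuffer:
    """Hypothetical per-user replay buffer keyed by increasing message ids."""

    def __init__(self, retention_s: float = 24 * 3600) -> None:
        self.retention_s = retention_s  # 24h default, as in the text
        self._last_id: Dict[str, int] = defaultdict(int)
        self._buf: Dict[str, Deque[Tuple[int, float, str]]] = defaultdict(deque)

    def append(self, user_id: str, message: str, now: float) -> int:
        """Store a message for `user_id` and return its id."""
        self._last_id[user_id] += 1
        msg_id = self._last_id[user_id]
        self._buf[user_id].append((msg_id, now, message))
        self._evict(user_id, now)
        return msg_id

    def _evict(self, user_id: str, now: float) -> None:
        buf = self._buf[user_id]
        # Entries are appended in time order, so expired ones sit at the left.
        while buf and now - buf[0][1] > self.retention_s:
            buf.popleft()

    def replay_since(self, user_id: str, last_seen_id: int,
                     now: float) -> Tuple[List[Tuple[int, str]], bool]:
        """Messages newer than `last_seen_id`, plus whether some were already evicted."""
        self._evict(user_id, now)
        buf = self._buf[user_id]
        oldest_kept = buf[0][0] if buf else self._last_id[user_id] + 1
        gap = last_seen_id + 1 < oldest_kept
        return [(i, m) for i, _, m in buf if i > last_seen_id], gap
```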
Rolling restart:
- Drain signal → gateway stops accepting new connections, pushes "please reconnect soon" to existing clients, gives them ~30s to move, then closes. Clients reconnect; LB consistent-hash lands them on a live gateway.
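A hash ring beats `hash(user_id) % N` here. When one gateway drains, modulo hashing remaps nearly every user, while a ring remaps only the drained gateway's users. Below is a sketch with virtual nodes; `HashRing`, the BLAKE2 position hash, and 100 vnodes per gateway are illustrative choices. Many LBs ship this built in (NGINX's upstream `hash ... consistent` directive, for example), so you would rarely hand-roll it.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List

def _point(key: str) -> int:
    # 64-bit ring position; the hash is used for spread, not security.
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")

class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> gateway
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            if p not in self._owner:
                bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        # Draining a gateway deletes only its own points.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, user_id: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring point clockwise from the key's position, wrapping at the end.
        i = bisect.bisect(self._points, _point(user_id)) % len(self._points)
        return self._owner[self._points[i]]
```

After `ring.remove("gw-7")`, only the users that were on `gw-7` move. Everyone else keeps their gateway and its warm state.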
Good vs bad answer
Interviewer probe
“Your app needs real-time updates. What mechanism?”
Weak answer
"WebSocket, because it's real-time."
Strong answer
"Depends on direction and freshness. If it's server→client only (notifications, a live counter, a feed), SSE — browser-native, auto-reconnect, works through HTTP proxies, cheaper than WebSocket. If bidirectional (chat, collab), WebSocket. If it's low-frequency and freshness tolerates seconds, short polling — skip persistent connections entirely. The cost axis at scale is connection count: 1M persistent connections is a real infrastructure item (10–20 gateway VMs, consistent-hash LB, reconnect logic). Don't pay that if polling fits the UX."
Why it wins: Picks by direction, names the cost at scale, identifies specific mechanisms for specific use cases.
When it comes up
- The prompt says "live", "real-time", "push", or "presence"
- Chat, notifications, dashboards, collaborative editing, multiplayer
- The interviewer asks "how does the client get updates?"
- A feature where polling would be too slow or too wasteful
Order of reveal
1. Direction + frequency first. Is this one-way server→client or bidirectional, and how fresh does it need to be? That picks the transport, not the word "real-time".
2. Pick the mechanism. One-way feed → SSE. Bidirectional → WebSocket. Low-frequency → short polling and skip persistent connections entirely.
3. Size the connection count. At 1M concurrent connections, ~10 KB each is ~10 GB of state across the fleet, or 10–20 gateways. Persistent connections are a capacity line item.
4. Split connection tier from delivery tier. Gateways hold sockets; a pub/sub layer (Kafka/Redis) carries messages between them. That lets me restart gateways without touching delivery and scale each independently.
5. Reconnect + LB story. Consistent-hash on user_id so reconnects land on the owning gateway; client holds a last-seen id and replays from a per-user buffer on reconnect.
Signature phrases
- “Direction and frequency pick the transport, not the word "real-time".” — Shows you reason from requirements instead of defaulting to WebSocket.
- “One-way is SSE, bidirectional is WebSocket, low-frequency is polling.” — A crisp decision rule the interviewer can hear you applying.
- “A million persistent connections is a fleet-sizing decision.” — Signals you cost the expensive part instead of hand-waving it.
- “Connection tier and delivery tier are separate, with pub/sub between them.” — The architecture that separates a senior real-time answer from a toy one.
Likely follow-ups
“How does the load balancer handle a million WebSockets?”
A persistent connection pins a client to one gateway for its life, so you cannot round-robin per message. The LB only places the connection: use session affinity, or consistent-hash on user_id so a reconnect lands on the same gateway. Hashing on user_id needs an L7-aware balancer that can read it from the upgrade request (path, query, or header); a pure L4 balancer sees only addresses and ports. Per-message routing to the recipient is done by a separate fan-out tier that looks up which gateway owns that user, not by the LB.
“A message is for a user connected to one of 20 gateways. How do you reach them?”
Maintain a presence registry in Redis: user_id → gateway_id, written when the socket opens, TTL-refreshed by heartbeat. The fan-out service consumes the message from pub/sub, looks up the owning gateway, and forwards over internal gRPC. If the user is offline (no entry), the message goes to their durable inbox for later pull.
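Below is a sketch of the presence registry's TTL behavior. A dict with expiry timestamps stands in for Redis, where the equivalent would be `SET` with an `EX` TTL when the socket opens and `EXPIRE` on each heartbeat. The names and the 30 s TTL are assumptions. One consequence: an entry can keep pointing at a dead gateway for up to one TTL, so a failed forward should be handled like an offline user.

```python
from typing import Dict, Optional, Tuple

class PresenceRegistry:
    """In-memory stand-in for the user_id -> gateway_id map with a per-key TTL."""

    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self._entries: Dict[str, Tuple[str, float]] = {}  # user_id -> (gateway_id, expires_at)

    def register(self, user_id: str, gateway_id: str, now: float) -> None:
        # Written when the socket opens.
        self._entries[user_id] = (gateway_id, now + self.ttl_s)

    def heartbeat(self, user_id: str, now: float) -> bool:
        # Each heartbeat pushes the expiry out; a silent gateway lets it lapse.
        gateway_id = self.lookup(user_id, now)
        if gateway_id is None:
            return False
        self._entries[user_id] = (gateway_id, now + self.ttl_s)
        return True

    def lookup(self, user_id: str, now: float) -> Optional[str]:
        entry = self._entries.get(user_id)
        if entry is None or entry[1] <= now:
            # Expire lazily, like an elapsed Redis TTL.
            self._entries.pop(user_id, None)
            return None
        return entry[0]
```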
“Rolling deploy drops every connection at once. How do you avoid a thundering reconnect?”
Drain gracefully: the gateway stops accepting new sockets, signals existing clients to reconnect within a jittered window (say 0–30 s), then closes. Clients reconnect with exponential backoff + jitter so they do not all stampede the new fleet in the same millisecond. The consistent-hash ring keeps each user landing on a predictable node.
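Below is a sketch of the client's reconnect delay using "full jitter": a uniformly random delay below the exponential ceiling, which is one common variant. The 1 s base and 30 s cap come from the worked example. The function name is hypothetical, and the caller is assumed to reset `attempt` to 0 after a successful connect.

```python
import random
from typing import Optional

def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform draw from [0, min(cap_s, base_s * 2**attempt)]."""
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so huge attempt counts can't overflow the float math.
    ceiling = min(cap_s, base_s * 2.0 ** min(attempt, 32))
    return rng.uniform(0.0, ceiling)
```

Each client draws its own delay. A million clients dropped by the same restart therefore spread their reconnects across the window instead of arriving together.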
Common mistakes
If the update is one-way, SSE is cheaper and simpler. If it's low-frequency, polling is near-free. Defaulting to WebSocket imposes the reconnect + sticky-LB tax on every client.
Connections drop constantly (mobile network, LB restart). Clients without reconnect + last-seen-id replay lose messages. Design this from day 1 or it'll bite.
With round-robin or random routing, each message lands on an arbitrary gateway that may not own the recipient's connection. Use consistent hashing, or a separate fan-out tier that routes to the owning gateway.
1M WebSockets is an infrastructure decision. Size gateways, memory, FDs, and LB support before the feature ships.
Practice drills
Chat app with 100k concurrent users. Budget?
100k WS connections, ~10 KB each → ~1 GB fleet memory. One gateway can handle 50–100k; so 2 for capacity, 3 for redundancy. Consistent-hash LB on user_id. Messages via Kafka + fan-out service routing to the owning gateway by user_id → gateway_id map in Redis. Per-user message buffer 24h for replay on reconnect.
Interviewer: "SSE or polling for live stock quotes updating every 100 ms?"
SSE. Polling every 100 ms means 10 HTTP requests per second per client, each paying full request/response overhead; that is astronomical at scale. SSE keeps one connection open and pushes. Alternative at extreme scale: WebSocket + a binary protocol to cut per-message overhead; but SSE is the starting point.
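This puts numbers on "astronomical", assuming the worked example's 1M clients; the drill itself does not fix a client count. The helper name is hypothetical, and reconnect traffic is ignored.

```python
def load_for_freshness(clients: int, update_interval_s: float) -> dict:
    """Rough steady-state load for one freshness target, polling vs SSE."""
    return {
        # Polling: every client sends a full HTTP request each interval, data or not.
        "polling_requests_per_s": clients / update_interval_s,
        # SSE: one long-lived connection per client; updates ride on it as events.
        "sse_open_connections": clients,
        "sse_events_per_s": clients / update_interval_s,
    }
```

At 1M clients both columns move about 10M updates per second. The difference is that polling wraps each update in a full HTTP request/response, while SSE sends it as a small event on an already-open connection.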
WebSocket client drops mid-session. What should the server do?
Detect disconnect (TCP RST or heartbeat timeout). Mark the user as offline (or with a "last seen" timestamp). Buffer messages for replay if you guarantee delivery, or drop them if your UX tolerates it. When the client reconnects, consistent-hash should land them on the original gateway (or any gateway can query the buffer); client sends last-seen-id; server replays missed messages; user is back online.
Cheat sheet
- One-way, server→client? SSE.
- Bidirectional? WebSocket.
- Low-frequency (>10 s)? Polling.
- Blocked by corporate proxy? Long-poll fallback.
- At scale: separate connection tier from delivery tier; Kafka/pub-sub between them.
- LB: consistent-hash or sticky for persistent connections.
- Always: reconnect with backoff + last-seen id replay.