Intermediate · deep dive

Real-time transport choices

Polling, long-polling, SSE, WebSockets, connection budgets, and reconnect semantics.

~10 min read

Polling, long-polling, SSE, WebSocket — the transport choice is about connection count and direction, not cleverness. Pick wrong and you pay in money or in user-visible lag.

Read this if your last attempt…

You defaulted to WebSocket for "real-time" without costing it
You can't name the difference between long-polling and SSE
You haven't thought about connection count at 1M users
You don't know what a load balancer does with a persistent connection

The concept

Four mechanisms, roughly ordered by connection cost:

Short polling — client asks every N seconds. Dead simple, works through any proxy, wastes battery + bandwidth. Fine when N is tens of seconds and freshness isn't critical.
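Long polling — client request blocks server-side until data arrives or a timeout fires. One outstanding request per client. Proxy-friendly, but every message still pays for a full HTTP request/response cycle (headers each time, plus TCP + TLS setup whenever the connection isn't reused).
Server-Sent Events (SSE) — one persistent HTTP connection, server → client text stream. Browser built-in, automatic reconnect, generally works through HTTP proxies (some buffering proxies need response buffering turned off). One-way only.
WebSocket — one persistent TCP connection, opened via an HTTP Upgrade handshake; bidirectional, any payload. Most proxies and LBs support it, but many need explicit configuration (Upgrade header forwarding, idle timeouts). Runs over TLS as wss://.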
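The server side of a long-poll is a wait with a deadline. A minimal asyncio sketch, assuming one in-process `asyncio.Queue` per client stands in for whatever the server really waits on (a pub/sub subscription, a change feed); `long_poll` and the 25 s default are illustrative choices, not a standard:

```python
import asyncio
from typing import Optional, Tuple


async def long_poll(inbox: asyncio.Queue, timeout_s: float = 25.0) -> Optional[str]:
    """Hold one client request until a message arrives or the timeout fires.

    Returns the message, or None on timeout (the HTTP handler would then send
    an empty response and the client would immediately re-poll).
    """
    try:
        return await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def demo() -> Tuple[Optional[str], Optional[str]]:
    inbox: asyncio.Queue = asyncio.Queue()
    # Nothing queued: the held request times out and returns None.
    empty = await long_poll(inbox, timeout_s=0.05)
    # A message arrives while the request is held: it returns right away.
    asyncio.get_running_loop().call_later(0.01, inbox.put_nowait, "hello")
    got = await long_poll(inbox, timeout_s=1.0)
    return empty, got


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the client re-issues the request as soon as one returns, there is always exactly one request outstanding per client, which is where long polling's per-message HTTP overhead comes from.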

Architecture diagram · Mechanism vs direction

Polling pulls. SSE streams server→client. WebSocket is bidirectional and the most expensive.

Real-time mechanisms — pick by direction + frequency.

Mechanism	Direction	Best for	Cost per connection
Short polling	Client asks	Low-frequency updates (seconds to minutes)	Tiny (no persistent socket)
Long polling	Client asks, server holds	Moderate freshness on HTTP-only clients	Medium — one open request per client
SSE	Server → client	Live feeds, notifications, dashboards	One HTTP connection per client
WebSocket	Both	Chat, collaborative editing, games	One TCP + handshake per client

How interviewers grade this

You pick the mechanism by direction and frequency, not by vibe.
You size the connection count at peak and cost the gateway fleet.
You name the reconnect strategy (backoff, resume token).
You name the LB story (sticky sessions / session affinity, or consistent hashing on user_id).
If messages fan out, you separate the connection layer from the delivery layer (pub/sub between them).

Variants

SSE for server-push feeds

One HTTP stream per client; server pushes as events arrive.

Clean fit for notifications, live stock tickers, dashboards. Browsers auto-reconnect on drop. Because it's plain HTTP, most proxies and LBs carry it with little special configuration, though response buffering and idle timeouts may need tuning.
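The SSE wire format is plain text: each event is a block of `field: value` lines ended by a blank line, and when a browser's EventSource reconnects it sends the last `id` it received back in a `Last-Event-ID` request header. A minimal sketch of the server-side framing; `format_sse` is a hypothetical helper name:

```python
from typing import Optional


def format_sse(data: str, event_id: Optional[str] = None,
               event: Optional[str] = None, retry_ms: Optional[int] = None) -> str:
    """Frame one Server-Sent Event for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # echoed back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")    # named event type; default is "message"
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # reconnect delay hint for the client, in ms
    # A multi-line payload becomes one "data:" line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"        # blank line terminates the event
```

For example, `format_sse("tick", event_id="42", event="quote")` produces `"id: 42\nevent: quote\ndata: tick\n\n"`. Setting an `id` on every event is what lets the server resume a dropped stream instead of restarting it.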

Pros

+Works through HTTP-aware proxies
+Auto-reconnect in browsers
+Simple protocol (text events + id)

Cons

−One-way only
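−Browsers cap HTTP/1.1 connections per origin (~6, shared across tabs) unless you serve over HTTP/2
−Native mobile platforms usually need a third-party SSE client, while WebSocket support is more widespread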

Choose this variant when

One-way server push
Browser clients
Notifications, dashboards, live counters

WebSocket for bidirectional real-time

Persistent TCP; messages both ways; any payload (text or binary).

The right choice for chat, collab editing, games, live trading. Requires LB support (e.g. AWS ALB, or NGINX configured to forward the Upgrade headers), reconnection logic on the client, and careful capacity planning on the gateway tier.
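What the LB and proxies must pass through untouched is the HTTP Upgrade handshake. Per RFC 6455, the server confirms the upgrade by returning `Sec-WebSocket-Accept`, derived from the client's `Sec-WebSocket-Key`. A minimal sketch; the function names are ours:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

`Upgrade` and `Connection` are hop-by-hop headers, so a proxy that doesn't explicitly forward them breaks this exchange; that is the configuration work behind "NGINX with upgrade".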

Pros

+Bidirectional
+Low per-message overhead after handshake
+Any payload format

Cons

−Most expensive at scale (FDs, memory)
−LB/proxy configuration is non-trivial
−Reconnect + message replay is work you build yourself (client logic plus a server-side buffer)

Choose this variant when

Bidirectional messaging
Games, chat, collab editing
When SSE's one-way limit bites

Long-poll fallback

Use long-poll where WebSockets/SSE are blocked (corporate proxies).

Not a primary choice, but valuable as a fallback. Libraries like Socket.IO handle the transport negotiation automatically.
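A hypothetical sketch of the fallback idea: try transports in preference order and keep the first that works on this network. Real libraries such as Socket.IO have their own negotiation, which this does not reproduce; `negotiate` and the probe callables are illustrative stand-ins for real connection attempts:

```python
from typing import Callable, Optional, Sequence, Tuple

# A probe attempts a connection over one transport and reports success.
Probe = Callable[[], bool]


def negotiate(transports: Sequence[Tuple[str, Probe]]) -> Optional[str]:
    """Return the first transport (in preference order) whose probe succeeds."""
    for name, probe in transports:
        try:
            if probe():
                return name
        except OSError:
            # Treat network errors (e.g. a proxy resetting the socket) as "unavailable".
            continue
    return None
```

The list order encodes preference, e.g. `[("websocket", ...), ("sse", ...), ("long-poll", ...)]`, so long-poll is only chosen when everything above it fails.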

Pros

+Works through any HTTP proxy
+Reachable by every client
+No sticky LB requirement, as long as any server can answer a held request from shared state

Cons

−Higher latency and request overhead
−More backend load than SSE/WS

Choose this variant when

Hostile network environments
Fallback tier for WebSocket failures

Worked example

Design: a chat app with 1M concurrent users.

Decision tree:

Bidirectional? Yes (client sends messages, server pushes others'). → WebSocket.
Browser + mobile clients. WebSocket works on both.

Connection-tier sizing:

1M concurrent WS connections. At ~10 KB per connection (buffers + state), that's ~10 GB across the gateway fleet.
One gateway handles ~50–100k connections comfortably on modern hardware. → 10–20 gateway instances.
LB: consistent-hash on user_id so reconnects land on the same gateway (preserves in-memory state, or at least makes recovery cheap).
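The sizing arithmetic above as a small helper you can rerun with your own numbers. Per-connection memory and per-gateway capacity are inputs taken from the text, not measurements, and the `spare` parameter for redundancy is our addition:

```python
import math
from typing import Tuple


def size_gateway_fleet(connections: int, kb_per_conn: float,
                       conns_per_gateway: int, spare: int = 0) -> Tuple[float, int]:
    """Return (total connection memory in GB, gateway count).

    Uses decimal units (1 GB = 1e6 KB) to match back-of-envelope math.
    `spare` adds redundancy on top of the capacity minimum.
    """
    memory_gb = connections * kb_per_conn / 1e6
    gateways = math.ceil(connections / conns_per_gateway) + spare
    return memory_gb, gateways
```

`size_gateway_fleet(1_000_000, 10, 100_000)` gives `(10.0, 10)` and with 50k per gateway it gives `(10.0, 20)`, the 10–20 range above. `size_gateway_fleet(100_000, 10, 50_000, spare=1)` reproduces the 100k-user practice drill: about 1 GB and 3 gateways.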

Delivery tier (separate from connection tier):

Gateways publish inbound messages to Kafka keyed on recipient user_id.
A "fan-out" service reads Kafka and looks up which gateway owns the recipient's connection (via Redis: user_id → gateway_id); forwards over internal gRPC.
This separation means you can restart gateways without touching the delivery layer, and you can scale them independently.
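Keying on recipient user_id means every message for one user lands in the same Kafka partition, so a single consumer sees them in order. A minimal sketch of that routing plus the fan-out lookup, with a plain dict standing in for the Redis user_id → gateway_id map and a callback standing in for the internal gRPC call. Kafka's default partitioner hashes the key with murmur2; `zlib.crc32` here is a swapped-in stable hash for illustration:

```python
import zlib
from typing import Callable, Dict, List, Tuple


def partition_for(recipient_id: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(recipient_id.encode("utf-8")) % num_partitions


def fan_out(batch: List[Tuple[str, str]],
            presence: Dict[str, str],
            forward: Callable[[str, str, str], None]) -> List[Tuple[str, str]]:
    """Route (recipient_id, body) pairs read from one partition.

    presence: user_id -> gateway_id, a stand-in for the Redis map.
    forward:  called as (gateway_id, user_id, body), a stand-in for gRPC.
    Returns messages whose recipient has no live connection, so the caller
    can store them for later delivery.
    """
    undelivered: List[Tuple[str, str]] = []
    for recipient_id, body in batch:
        gateway_id = presence.get(recipient_id)
        if gateway_id is None:
            undelivered.append((recipient_id, body))
        else:
            forward(gateway_id, recipient_id, body)
    return undelivered
```

Nothing here knows how many gateways exist or which are restarting; it only reads the map, which is why the delivery tier can scale and deploy independently of the connection tier.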

Reconnect:

Client holds a last-seen message id. On reconnect, sends it as a query parameter; server replays missed messages from a per-user buffer (retained 24h).
Exponential backoff from 1s to 30s.
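The replay side of reconnect can be sketched as a per-user buffer of (id, message) pairs with increasing ids and time-based retention. This is an in-memory stand-in with hypothetical names; a real deployment would keep it in a shared store so any gateway can serve the replay:

```python
import time
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

RETENTION_S = 24 * 3600  # keep 24h of history per user, as in the example


class ReplayBuffer:
    def __init__(self, retention_s: float = RETENTION_S) -> None:
        self.retention_s = retention_s
        self.next_id = 1  # monotonically increasing message id
        # user_id -> deque of (message_id, stored_at, body), oldest first
        self.buffers: Dict[str, Deque[Tuple[int, float, str]]] = {}

    def append(self, user_id: str, body: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        msg_id = self.next_id
        self.next_id += 1
        buf = self.buffers.setdefault(user_id, deque())
        buf.append((msg_id, now, body))
        self._expire(buf, now)
        return msg_id

    def replay_after(self, user_id: str, last_seen_id: int,
                     now: Optional[float] = None) -> List[Tuple[int, str]]:
        """Return retained messages newer than the client's last-seen id."""
        now = time.time() if now is None else now
        buf = self.buffers.get(user_id, deque())
        self._expire(buf, now)
        return [(mid, body) for mid, _, body in buf if mid > last_seen_id]

    def _expire(self, buf: Deque[Tuple[int, float, str]], now: float) -> None:
        # Drop entries older than the retention window from the front.
        while buf and now - buf[0][1] > self.retention_s:
            buf.popleft()
```

On reconnect, the server calls `replay_after(user_id, last_seen_id)` with the id the client sent and pushes the result before resuming live delivery.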

Rolling restart:

Drain signal → gateway stops accepting new connections, pushes "please reconnect soon" to existing clients, gives them ~30s to move, then closes. Clients reconnect; LB consistent-hash lands them on a live gateway.
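A minimal sketch of the drain sequence, assuming a hypothetical gateway object; the point is the order of operations (stop accepting, then hand each client its own jittered reconnect time), not any particular server framework:

```python
import random
from typing import Dict, List, Optional


class DrainingGateway:
    def __init__(self, drain_window_s: float = 30.0,
                 rng: Optional[random.Random] = None) -> None:
        self.accepting = True
        self.clients: List[str] = []
        self.drain_window_s = drain_window_s
        self.rng = rng or random.Random()

    def accept(self, client_id: str) -> bool:
        if not self.accepting:
            return False  # draining: refuse new sockets (also fail the LB health check)
        self.clients.append(client_id)
        return True

    def start_drain(self) -> Dict[str, float]:
        """Stop accepting, then tell each client when to reconnect (seconds from now)."""
        self.accepting = False
        # Spread reconnects uniformly over the window so they don't all arrive at once.
        return {c: self.rng.uniform(0, self.drain_window_s) for c in self.clients}
```

After the window expires, the gateway closes whatever sockets remain and exits.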

Good vs bad answer

Interviewer probe

“Your app needs real-time updates. What mechanism?”

Weak answer

"WebSocket, because it's real-time."

Strong answer

"Depends on direction and freshness. If it's server→client only (notifications, a live counter, a feed), SSE — browser-native, auto-reconnect, works through HTTP proxies, cheaper than WebSocket. If bidirectional (chat, collab), WebSocket. If it's low-frequency and freshness tolerates seconds, short polling — skip persistent connections entirely. The cost axis at scale is connection count: 1M persistent connections is a real infrastructure item (10–20 gateway VMs, consistent-hash LB, reconnect logic). Don't pay that if polling fits the UX."

Why it wins: Picks by direction, names the cost at scale, identifies specific mechanisms for specific use cases.

Interview playbook · 3–4 min when real-time delivery is a core requirement

When it comes up

The prompt says "live", "real-time", "push", or "presence"
Chat, notifications, dashboards, collaborative editing, multiplayer
The interviewer asks "how does the client get updates?"
A feature where polling would be too slow or too wasteful

Order of reveal

1. Direction + frequency first. Is this one-way server→client or bidirectional, and how fresh does it need to be? That picks the transport — not the word "real-time".
2. Pick the mechanism. One-way feed → SSE. Bidirectional → WebSocket. Low-frequency → short polling and skip persistent connections entirely.
3. Size the connection count. At 1M concurrent connections, ~10 KB each is ~10 GB of state across the fleet — 10–20 gateways. Persistent connections are a capacity line item.
4. Split connection tier from delivery tier. Gateways hold sockets; a pub/sub layer (Kafka/Redis) carries messages between them. That lets me restart gateways without touching delivery and scale each independently.
5. Reconnect + LB story. Consistent-hash on user_id so reconnects land on the owning gateway; client holds a last-seen id and replays from a per-user buffer on reconnect.

Signature phrases


“Direction and frequency pick the transport, not the word "real-time".” — Shows you reason from requirements instead of defaulting to WebSocket.
“One-way is SSE, bidirectional is WebSocket, low-frequency is polling.” — A crisp decision rule the interviewer can hear you applying.
“A million persistent connections is a fleet-sizing decision.” — Signals you cost the expensive part instead of hand-waving it.
“Connection tier and delivery tier are separate, with pub/sub between them.” — The architecture that separates a senior real-time answer from a toy one.

Likely follow-ups

“How does the load balancer handle a million WebSockets?”

A persistent connection pins a client to one gateway for its life, so there is nothing to round-robin per message: the LB only places the connection at handshake time. A plain L4 balancer can spread connections by client address; to consistent-hash on user_id (so a reconnect lands on the same gateway) you need an L7 balancer or proxy that reads the id from a header, cookie, or query parameter. Per-message routing to the recipient is done by a separate fan-out tier that looks up which gateway owns that user, not by the LB.
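A minimal consistent-hash ring with virtual nodes, the kind of placement an L7 balancer or connection router might apply to user_id. Gateway names, the vnode count, and MD5 as the stable hash are illustrative assumptions:

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Many points per node smooth out the load across gateways.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def owner(self, user_id: str) -> str:
        """First point clockwise from the key's hash, wrapping at the end."""
        idx = bisect.bisect(self.ring, (_hash(user_id), ""))
        return self.ring[idx % len(self.ring)][1]
```

Removing a gateway only remaps the users it owned; everyone else keeps landing on the same node, which keeps reconnects after a partial outage cheap.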

“A message is for a user connected to one of 20 gateways. How do you reach them?”

Maintain a presence registry in Redis: user_id → gateway_id, written when the socket opens, TTL-refreshed by heartbeat. The fan-out service consumes the message from pub/sub, looks up the owning gateway, and forwards over internal gRPC. If the user is offline (no entry), the message goes to their durable inbox for later pull.
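A stand-in for the presence registry: an in-memory map with per-entry expiry, mirroring what you might do in Redis with `SET key value EX ttl` on connect and on each heartbeat. Class and method names are hypothetical; timestamps are passed explicitly so the behavior is easy to follow:

```python
from typing import Dict, Optional, Tuple


class PresenceRegistry:
    def __init__(self, ttl_s: float = 60.0) -> None:
        self.ttl_s = ttl_s
        # user_id -> (gateway_id, expires_at)
        self.entries: Dict[str, Tuple[str, float]] = {}

    def heartbeat(self, user_id: str, gateway_id: str, now: float) -> None:
        """Called on socket open and on every heartbeat; (re)sets the TTL."""
        self.entries[user_id] = (gateway_id, now + self.ttl_s)

    def disconnect(self, user_id: str, gateway_id: str) -> None:
        """Clean close: drop the entry only if this gateway still owns it."""
        current = self.entries.get(user_id)
        if current is not None and current[0] == gateway_id:
            del self.entries[user_id]

    def lookup(self, user_id: str, now: float) -> Optional[str]:
        """Owning gateway, or None if the user is offline (missing or expired)."""
        entry = self.entries.get(user_id)
        if entry is None or entry[1] <= now:
            self.entries.pop(user_id, None)
            return None
        return entry[0]
```

The ownership check in `disconnect` matters: a user may reconnect to a new gateway before the old one notices the drop, and the stale close must not erase the fresh entry. A `None` from `lookup` is the offline case that goes to the durable inbox.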

“Rolling deploy drops every connection at once. How do you avoid a thundering reconnect?”

Drain gracefully: the gateway stops accepting new sockets, signals existing clients to reconnect within a jittered window (say 0–30 s), then closes. Clients reconnect with exponential backoff + jitter so they do not all stampede the new fleet in the same millisecond. The consistent-hash ring keeps each user landing on a predictable node.
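The client half of avoiding a thundering reconnect: exponential backoff capped at 30 s with "full jitter", i.e. a uniform random delay between 0 and the current ceiling. Full jitter is one common variant among several; the 1 s base and 30 s cap come from the worked example, and the function name is hypothetical:

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before reconnect attempt `attempt` (0-based), with full jitter.

    The ceiling doubles each attempt (1, 2, 4, 8, 16, 30, 30, ... seconds);
    the actual delay is drawn uniformly from [0, ceiling].
    """
    rng = rng or random.Random()
    # Clamp the exponent so huge attempt counts can't overflow a float.
    ceiling = min(cap_s, base_s * (2 ** min(attempt, 32)))
    return rng.uniform(0, ceiling)
```

Reset `attempt` to 0 once a connection succeeds; otherwise a client that was briefly offline keeps waiting the full 30 s ceiling on every later drop.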

Common mistakes

WebSocket by default

If the update is one-way, SSE is cheaper and simpler. If it's low-frequency, polling is near-free. Defaulting to WebSocket imposes the reconnect + sticky-LB tax on every client.

No reconnect + replay

Connections drop constantly (mobile network handoffs, LB restarts). Clients without reconnect + last-seen-id replay lose messages. Design this from day 1 or it'll bite.

Round-robin LB in front of WebSockets

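Round-robin places each connection, and each reconnect, on an arbitrary gateway, so a message arriving at one gateway may be for a user whose socket lives on another. Use consistent-hash placement on user_id plus a separate fan-out tier that routes each message to the owning gateway.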

Not sizing connection count (Advanced)

1M WebSockets is an infrastructure decision. Size gateways, memory, FDs, and LB support before the feature ships.

Practice drills

Chat app with 100k concurrent users. Budget?

100k WS connections, ~10 KB each → ~1 GB fleet memory. One gateway can handle 50–100k; so 2 for capacity, 3 for redundancy. Consistent-hash LB on user_id. Messages via Kafka + fan-out service routing to the owning gateway by user_id → gateway_id map in Redis. Per-user message buffer 24h for replay on reconnect.

Interviewer: "SSE or polling for live stock quotes updating every 100 ms?"

SSE. Polling every 100 ms means 10 HTTP requests per second per client, or 10M requests/s at 1M clients, each carrying full request and response headers. SSE keeps one connection open and pushes. Alternative at extreme scale: WebSocket + binary protocol to save per-message overhead; but SSE is the starting point.

WebSocket client drops mid-session. What should the server do?

Detect disconnect (TCP RST or heartbeat timeout). Mark the user as offline (or with a "last seen" timestamp). Buffer messages for replay if you guarantee delivery, or drop them if your UX tolerates it. When the client reconnects, consistent-hash should land them on the original gateway (or any gateway can query the buffer); client sends last-seen-id; server replays missed messages; user is back online.
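Heartbeats catch the case TCP alone misses: a peer that vanished without sending RST or FIN (a phone entering a tunnel) leaves a socket that still looks open. A minimal sketch of the server-side sweep, with hypothetical names and explicit timestamps:

```python
from typing import Dict, List


class HeartbeatMonitor:
    def __init__(self, timeout_s: float = 45.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # connection_id -> last inbound traffic time

    def touch(self, conn_id: str, now: float) -> None:
        """Record any inbound traffic (ping, pong, or data) from the connection."""
        self.last_seen[conn_id] = now

    def sweep(self, now: float) -> List[str]:
        """Remove and return connections silent for longer than the timeout."""
        dead = [c for c, t in self.last_seen.items() if now - t > self.timeout_s]
        for c in dead:
            del self.last_seen[c]
        return dead
```

Each id returned by `sweep` then goes through the offline path: close the socket, mark last-seen, and buffer for replay if delivery is guaranteed.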

Cheat sheet

•One-way, server→client? SSE.
•Bidirectional? WebSocket.
•Low-frequency (>10s)? Polling.
•Blocked by corporate proxy? Long-poll fallback.
•At scale: separate connection tier from delivery tier; Kafka/pub-sub between them.
•LB: consistent-hash or sticky for persistent connections.
•Always: reconnect with backoff + last-seen id replay.
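The cheat sheet's rules written as a function so the order of the checks is explicit. This is one reasonable ordering (a bidirectional need wins over low frequency); the 10-second threshold comes from the list above, and the parameter names are our own:

```python
def pick_transport(bidirectional: bool, update_interval_s: float,
                   persistent_blocked: bool = False) -> str:
    """Apply the cheat-sheet rules in order."""
    if persistent_blocked:
        return "long-poll"   # corporate proxy blocks WebSocket/SSE
    if bidirectional:
        return "websocket"   # both directions: chat, collab, games
    if update_interval_s > 10:
        return "short-poll"  # low frequency: skip persistent connections
    return "sse"             # one-way, frequent server -> client updates
```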

