API Gateway
The single front door to a backend: one entry point that authenticates, rate-limits, routes, and shapes every request so individual services do not each re-implement cross-cutting concerns.
Also worth naming: Amazon API Gateway · Kong · Apigee · NGINX / Envoy (as a gateway) · a BFF (backend-for-frontend)
An API gateway is where you put everything that should happen to every request — auth, rate limiting, routing, TLS — so your services only do business logic. In a microservice design it is almost always the first box clients hit.
What it is
An API gateway is the single entry point that sits in front of your backend services and handles the concerns that are common to every request. A client makes one request to the gateway; the gateway authenticates it, enforces rate limits, routes it to the correct backend service, optionally transforms the request/response, and returns the result. Without it, every microservice would have to re-implement authentication, throttling, TLS, and logging — and clients would have to know the address of every service.
The value is consolidating cross-cutting concerns into one layer. Authentication and authorization, rate limiting and quotas, TLS termination, request validation, response aggregation, caching, and observability all live at the gateway instead of being duplicated (and subtly inconsistent) across dozens of services. A request that fails auth or exceeds its quota is rejected at the door and never reaches a backend, which protects the services and centralizes policy.
In a product-design interview, you draw an API gateway as the first component after the load balancer for almost any system with multiple services — it is where you say "auth and rate limiting happen here." Be precise about what it does versus a plain load balancer (which only distributes traffic) and avoid pushing business logic into it: the gateway is a thin, fast policy-and-routing layer, not a place for domain logic.
When to reach for it
Reach for this when…
- You have multiple backend services and want one client-facing entry point
- Cross-cutting concerns — auth, rate limiting, TLS, logging — should be handled once
- You want to decouple clients from internal service topology (route by path/host)
- You need per-client API keys, quotas, request validation, or response aggregation
Not really this pattern when…
- A single service / monolith with no cross-cutting routing needs (a load balancer may suffice)
- You only need to spread traffic across instances — that is a load balancer, not a gateway
- Pure internal service-to-service calls — a service mesh handles those concerns east-west
- You are tempted to put business/domain logic in it — that belongs in a service
How it works
Three ideas explain what a gateway is for:
1. One front door, cross-cutting concerns handled once. Everything that should happen to every request — authentication, rate limiting, TLS termination, request validation, logging/metrics — is implemented in the gateway, not copied into each service. A request that fails any of these is rejected before it touches a backend, so services receive only clean, authenticated, in-quota traffic and can focus purely on business logic.
Authentication, rate limiting, TLS termination, request validation, and observability are handled once at the gateway instead of being duplicated in every microservice — and a request that fails them never reaches a backend.
2. It decouples clients from your internal topology. Clients call stable public paths (/users, /orders); the gateway maps them to whatever services exist behind it, which can be split, merged, or moved without breaking clients. It can also aggregate several backend calls into one response (a backend-for-frontend pattern) so a mobile client makes one request instead of five.
One entry point authenticates, rate-limits, and routes every request to the right backend service, then returns the response. Services stop re-implementing auth and throttling and focus on business logic.
3. It is a thin policy/routing layer, not a service. The gateway should be fast and stateless, doing auth, throttling, routing, and light transformation — never domain logic, heavy computation, or its own database. Put business logic in a service; the gateway just decides whether and where a request goes.
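The three ideas above can be sketched as a small filter chain: policy checks run first, the first rejection short-circuits, and only surviving requests are routed. This is a minimal illustration, not how any particular gateway product is built; `Request`, `Response`, `require_api_key`, the `X-Api-Key` header, and the route table are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A filter returns a Response to reject the request, or None to let it continue.
Filter = Callable[[Request], Optional[Response]]


def require_api_key(req: Request) -> Optional[Response]:
    # Hypothetical check: a real gateway would validate the key, not just look for it.
    if "X-Api-Key" not in req.headers:
        return Response(401, "missing API key")
    return None


def route(req: Request) -> Response:
    # Hypothetical routing table mapping public prefixes to internal services.
    routes = {"/users": "users-service", "/orders": "orders-service"}
    for prefix, service in routes.items():
        if req.path.startswith(prefix):
            return Response(200, f"forwarded to {service}")
    return Response(404, "no route")


def handle(req: Request, filters: list) -> Response:
    # Run every policy filter first; the first rejection short-circuits,
    # so no backend is called for a request that fails policy.
    for check in filters:
        rejection = check(req)
        if rejection is not None:
            return rejection
    return route(req)
```

Note that the chain holds no domain logic and no state of its own: it only decides whether and where a request goes, which is the "thin layer" property described in point 3.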
Two precise distinctions interviewers probe: a load balancer distributes traffic across instances of one thing and does health checks — it does not authenticate or apply per-client policy; a gateway adds those L7 concerns (the two are often layered: LB → gateway → services). And a service mesh (Envoy sidecars) handles the same cross-cutting concerns for internal, service-to-service (east-west) traffic, whereas the gateway handles north-south (client-to-backend) traffic. Also keep the gateway highly available and not a bottleneck — it is on every request path, so it is run redundantly and kept lightweight.
Performance envelope
API gateway characteristics — what to reason about.
| Concern | Handled at the gateway | Why centralize it |
|---|---|---|
| Authentication | Validate token / API key once | Services trust the gateway; no duplicated auth |
| Rate limiting | Per-client / per-key quotas | Reject abuse at the door, protect backends |
| Routing | Path/host → service | Clients decoupled from internal topology |
| TLS | Terminate at the edge | Offload crypto; central cert management |
| Aggregation | Combine backend calls (BFF) | Fewer round trips for mobile/web clients |
| Observability | Central logging, metrics, tracing | One consistent view of all API traffic |
Capabilities in interviews
Authentication & authorization
Validate the caller once at the door so services trust every request they receive.
The gateway verifies the credential — a JWT signature, an OAuth token, or an API key — and rejects anything invalid before it reaches a service:
Authorization: Bearer <jwt> → gateway verifies signature + expiry → forwards identity to service
It often forwards a trusted identity header (user id, scopes) so downstream services do per-request authorization without re-validating the token. Centralizing authentication means one place to rotate keys, enforce policy, and audit, instead of every service re-implementing it slightly differently.
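As a rough sketch of that verify-then-forward flow, the following checks a JWT's signature and expiry using only the standard library. It assumes HS256 with a shared secret purely to stay self-contained; production gateways more commonly verify asymmetric signatures against keys fetched from a JWKS endpoint, usually through a maintained JWT library. The header names `X-User-Id` and `X-Scopes` and the helper `make_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def make_token(claims: dict, secret: bytes) -> str:
    # Demo helper only: in practice the identity provider issues tokens.
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = _b64url_encode(_sign(header + "." + payload, secret))
    return header + "." + payload + "." + signature


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":  # pin the algorithm; do not trust the token's choice
            return None
        expected = _sign(header_b64 + "." + payload_b64, secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        current = time.time() if now is None else now
        if claims["exp"] <= current:
            return None
        return claims
    except (ValueError, TypeError, KeyError, AttributeError):
        # Malformed tokens are rejected the same way as invalid ones.
        return None


def identity_headers(claims: dict) -> dict:
    # Hypothetical header names the gateway forwards to downstream services.
    return {"X-User-Id": str(claims["sub"]), "X-Scopes": " ".join(claims.get("scope", []))}
```

Downstream services should accept these identity headers only from the gateway (for example over a private network or mTLS), otherwise a client could forge them.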
Choose this variant when
- Any multi-service system with authenticated clients
- Centralized token / API-key validation
- Consistent auth policy across services
Rate limiting & quotas
Throttle per client, key, or plan at the entry point to protect backends from abuse and overload.
The gateway is the natural home for rate limiting because it sees every request and can reject excess before it costs a backend anything:
per API key: 1000 req/min → 429 + Retry-After when exceeded
It enforces per-client quotas (free vs paid tiers), protects against abuse and traffic spikes, and returns proper 429 responses with Retry-After. Layer it with a CDN (volumetric) and per-user app limits, but the gateway is where per-key API quotas live.
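One simple way to express the per-key quota above is a fixed-window counter. The sketch below is an in-process illustration only: the class name `FixedWindowLimiter` is hypothetical, old windows are never evicted, and a gateway running several instances would need a shared store for the counters. Other algorithms (token bucket, sliding window) are common alternatives.

```python
import math
from collections import defaultdict


class FixedWindowLimiter:
    """Per-key fixed-window counter: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit: int = 1000, window_s: int = 60):
        self.limit = limit
        self.window_s = window_s
        self._counts = defaultdict(int)  # (api_key, window index) -> requests seen

    def check(self, api_key: str, now: float):
        """Return (status, headers): 200 if allowed, 429 with Retry-After if not."""
        window = int(now // self.window_s)
        if self._counts[(api_key, window)] >= self.limit:
            # Tell the client how many seconds remain until the window rolls over.
            retry_after = math.ceil((window + 1) * self.window_s - now)
            return 429, {"Retry-After": str(max(retry_after, 1))}
        self._counts[(api_key, window)] += 1
        return 200, {}
```

Passing `now` explicitly keeps the limiter deterministic to test; a caller would normally supply `time.time()`.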
Choose this variant when
- Per-client / per-plan API quotas
- Abuse and spike protection at the edge
- Monetized APIs with tiered limits
Routing & service decoupling
Map stable public paths to internal services so clients never know your topology.
One public surface fans out to many services, and you can reshape the backend freely:
/users/* → users-service
/orders/* → orders-service
/v2/orders → orders-service-v2 (canary / versioning)
Path/host/version routing lets you split a service, run canaries and blue-green deploys, and version APIs without clients changing their calls. The gateway is the indirection layer that keeps the public contract stable while the internals evolve.
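The route table above can be modeled as a prefix map with longest-prefix matching, so that a more specific route wins once prefixes overlap. This is a simplified sketch: the function name `resolve` is hypothetical, the wildcard `/*` is represented as a trailing-slash prefix, and real gateways also match on host, method, and headers.

```python
from typing import Optional

# Route table taken from the example paths above; service names are illustrative.
ROUTES = {
    "/users/": "users-service",
    "/orders/": "orders-service",
    "/v2/orders": "orders-service-v2",
}


def resolve(path: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None if nothing matches."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None
    return routes[max(matches, key=len)]
```

Because clients only ever see the public prefixes, pointing `/orders/` at a different upstream is a change to this table and nothing else.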
Choose this variant when
- Microservice backends behind one endpoint
- API versioning, canaries, blue-green
- Evolving internal topology without breaking clients
Aggregation & transformation (BFF)
Combine multiple backend calls and reshape payloads so clients make one tailored request.
A backend-for-frontend gateway fans out to several services and merges the results, so a mobile client gets exactly what its screen needs in one round trip:
GET /home → gateway calls profile + feed + notifications → merges → one response
It can also transform protocols and shapes (REST↔gRPC, trim fields for mobile, inject defaults). Keep this orchestration thin (fan-out and merge, not business rules) and watch latency, since the response is as slow as the slowest backend call.
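A minimal sketch of the `/home` fan-out, assuming asynchronous backend calls: the three requests run in parallel, each with a timeout, and a slow or failing backend degrades to a missing field rather than failing the whole response. The `fetch_*` functions are stand-ins with made-up payloads, and the timeout value is arbitrary.

```python
import asyncio


async def fetch_profile() -> dict:
    # Stand-ins for real backend calls; names and payloads are hypothetical.
    await asyncio.sleep(0.01)
    return {"name": "Ada"}


async def fetch_feed() -> dict:
    await asyncio.sleep(0.01)
    return {"items": [1, 2, 3]}


async def fetch_notifications() -> dict:
    await asyncio.sleep(1.0)  # simulates a slow backend
    return {"unread": 5}


async def call_with_timeout(coro, timeout_s: float):
    # A slow or failing backend yields None instead of failing the whole response.
    try:
        return await asyncio.wait_for(coro, timeout_s)
    except Exception:
        return None


async def home(timeout_s: float = 0.1) -> dict:
    # Fan out in parallel, then merge; total latency is bounded by timeout_s.
    profile, feed, notifications = await asyncio.gather(
        call_with_timeout(fetch_profile(), timeout_s),
        call_with_timeout(fetch_feed(), timeout_s),
        call_with_timeout(fetch_notifications(), timeout_s),
    )
    return {"profile": profile, "feed": feed, "notifications": notifications}
```

Calling `asyncio.run(home())` here returns the profile and feed with `notifications` set to `None`, because the simulated notifications backend exceeds the timeout. Whether a partial response is acceptable is a product decision; the gateway only makes the choice explicit.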
Choose this variant when
- Mobile/web clients needing fewer round trips
- Per-client response shaping
- Protocol translation at the edge (REST↔gRPC)
Operating knobs
What belongs in the gateway (and what does not)
Put cross-cutting, per-request concerns here: auth, rate limiting, routing, TLS, validation, light transformation, observability. Keep business/domain logic, heavy computation, and stateful data out — those belong in services. A gateway that grows domain logic becomes a fragile monolith and a bottleneck. The test: would every service otherwise duplicate this? If yes, it belongs in the gateway.
Managed vs self-hosted
Managed (AWS API Gateway, Apigee) gives you auth, throttling, and scaling with no ops, at higher per-request cost and less control. Self-hosted (Kong, Envoy, NGINX) gives full control and lower marginal cost but you operate it. Choose by team capacity and how much custom behavior you need.
High availability & latency budget
The gateway is on every request path, so it must be redundant (multi-instance, multi-AZ, behind a load balancer) and fast — every millisecond it adds is paid by every request. Keep plugins lean, cache auth/JWKS lookups, and avoid synchronous heavy work in the request path.
North-south gateway vs east-west mesh
Use the gateway for client-to-backend (north-south) traffic — the public edge. For service-to-service (east-west) concerns (mTLS, retries, circuit breaking, internal rate limits) a service mesh (Envoy sidecars) is the right tool. Many architectures run both; do not stretch the gateway to police all internal calls.
Versus the alternatives
API gateway vs adjacent components.
| Dimension | API gateway | Load balancer | Service mesh |
|---|---|---|---|
| Traffic | North-south (client→backend) | North-south (to a server pool) | East-west (service→service) |
| Core job | Auth, rate limit, route, transform | Distribute load + health checks | mTLS, retries, circuit breaking |
| Layer | L7 (application) | L4 or L7 | L7 sidecar per service |
| Per-client policy | Yes (keys, quotas, auth) | No | Internal policy, not client-facing |
| Position | Front door for clients | In front of instances | Between internal services |
Failure modes & gotchas
Stuffing domain rules, computation, or data access into the gateway turns a thin policy layer into a fragile shared monolith that every team must coordinate on and that bottlenecks every request. Keep it to cross-cutting concerns; business logic belongs in services.
It is on every request path, so one instance (or an overloaded one) takes the whole API down. Run it redundantly across instances and zones behind a load balancer, keep it lightweight, and watch its latency and capacity as carefully as any service.
A load balancer distributes traffic and does health checks; it does not authenticate, apply per-client quotas, or route by API semantics. Calling an LB an "API gateway" misses the cross-cutting concerns that justify the gateway — be precise about which does what (they are often layered).
A BFF that fans out to many backends and waits for all of them is as slow as the slowest call, and one failing backend can fail the whole response. Parallelize the calls, set timeouts, degrade gracefully (partial responses), and keep aggregation shallow.
Validating a JWT or calling an auth service on every request without caching the signing keys (JWKS) or token introspection results adds latency and an external dependency to every call. Cache verification material and verify signatures locally where possible so auth is fast and resilient.
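Caching verification material can be as simple as a time-to-live wrapper around the key fetch. The sketch below is one possible shape, not a prescribed design: `TTLCache` and its `fetch` callback are hypothetical, and the choice to serve a stale copy when a refresh fails is an assumption that trades freshness for availability.

```python
import time
from typing import Callable


class TTLCache:
    """Caches the result of an expensive fetch (e.g. a JWKS download) for ttl_s seconds."""

    def __init__(self, fetch: Callable[[], dict], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            try:
                self._value = self._fetch()
                self._expires_at = now + self._ttl_s
            except Exception:
                # Serve the stale copy if the key endpoint is briefly unreachable;
                # with nothing cached yet there is no safe fallback, so re-raise.
                if self._value is None:
                    raise
        return self._value
```

With this in place, the request path touches the network only when the cached keys expire, so signature checks stay local and the auth provider is no longer a dependency of every call.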
In production
Netflix
Zuul — the gateway fronting thousands of microservices
Netflix's Zuul is one of the most-cited API gateways in the industry. It is the front door for Netflix's enormous microservice backend, handling billions of requests per day and routing each to the right service while applying cross-cutting concerns centrally: authentication, rate limiting, request routing, and rich dynamic filters for canary testing, traffic shaping, and resilience.
What makes Zuul a textbook example is the filter model — every request passes through pre-routing, routing, and post-routing filters where Netflix applies the concerns that would otherwise be duplicated across every service. They also use it for operational superpowers: routing a slice of traffic to a new service version, shedding load during incidents, and per-device response shaping (a backend-for-frontend). It is the concrete proof of "consolidate cross-cutting concerns at one front door so services only do business logic."
Amazon
From a wall of web servers to a managed API gateway
In Amazon's early architecture, a giant fleet of Apache web servers served as the entry point — the original "API gateway" before the term existed — handling routing and cross-cutting concerns in front of the services behind. As the industry matured, that role was productized: Amazon API Gateway now provides authentication, throttling, request validation, and routing as a fully managed service, scaling automatically with no servers to operate.
The lesson is the managed-vs-self-hosted lever from this page. Managed gateways (API Gateway, Apigee) give you the cross-cutting concerns and elastic scale with zero ops at a higher per-request cost; self-hosted (Kong, Envoy, NGINX) give control and lower marginal cost but you run them. The constant across both eras is the pattern: a single front door that consolidates the concerns every request shares, so backend services don't each reinvent them.
Good vs bad answer
Interviewer probe
“Your system has separate users, orders, and payments services. How do clients talk to them securely without each service re-doing auth and throttling?”
Weak answer
"Each service exposes its own public endpoint and implements authentication and rate limiting itself, and the mobile app calls whichever services it needs directly."
Strong answer
"Put an API gateway as the single front door. Clients hit one endpoint; the gateway terminates TLS, authenticates every request once (verifies the JWT signature, caching the JWKS), enforces rate limits per API key, and routes /users, /orders, /payments to the right service — so each service receives only clean, authenticated, in-quota traffic and implements zero cross-cutting plumbing. It also decouples the app from our topology: we can split or version a service behind stable public paths, and for the mobile home screen the gateway can aggregate profile + orders into one response so the client makes one round trip instead of three. I'd run it redundantly across zones behind a load balancer and keep it thin — no business logic — so it isn't a bottleneck or a single point of failure. Exposing every service directly would duplicate auth and throttling (inconsistently), leak our internal topology to clients, and give us no central place for policy, quotas, or observability."
Why it wins: Names the gateway as the front door, lists the cross-cutting concerns it centralizes (auth, rate limit, TLS, routing), adds topology decoupling and BFF aggregation, makes it HA and thin, and explains precisely why per-service public endpoints are worse.
Interview playbook
When it comes up
- Almost any multi-service / microservice product design
- "Where does auth and rate limiting happen?" — the gateway is the answer
- Mobile/web clients that need fewer round trips (BFF aggregation)
- API versioning, canaries, or hiding internal topology from clients
Order of reveal
1. One front door. An API gateway is the single entry point; clients hit one endpoint and never see internal service topology.
2. Cross-cutting concerns once. It authenticates, rate-limits, terminates TLS, and validates requests centrally, so services only do business logic.
3. Route by path/host. It maps stable public paths to services, which lets me version, canary, and reshape the backend without breaking clients.
4. Aggregate where useful. For mobile I can fan out and merge into one response (a BFF), keeping the orchestration thin.
5. Thin + HA. No business logic in it, and run it redundantly since it is on every request path.
Signature phrases
- “The gateway is the single front door for cross-cutting concerns.” — States its purpose in one line.
- “Auth, rate limiting, TLS, and routing happen once — not in every service.” — Names exactly what it consolidates.
- “It decouples clients from internal topology.” — Captures the routing/versioning benefit.
- “Keep it thin — policy and routing, never business logic.” — Shows you avoid the classic anti-pattern.
Likely follow-ups
“How is an API gateway different from a load balancer?”
A load balancer spreads traffic across instances of a service and removes unhealthy ones via health checks — it answers "which instance serves this request" and adds no per-client policy. An API gateway is an L7 entry point that applies cross-cutting request logic — authentication, rate limiting, routing to the right service by path/host, request/response transformation, observability. They are complementary and usually layered: clients → load balancer → API gateway → services (and the gateway tier is itself load-balanced). Conflating them misses the auth/quota/routing concerns that justify a gateway.
“Where does authorization happen: gateway or service?”
Split them. The gateway does authentication (is this a valid, non-expired token / key?) once at the door and forwards a trusted identity (user id, scopes) downstream. Fine-grained authorization — "can this user modify this order?" — usually belongs in the service, because only the service knows the resource and the domain rules. Coarse authorization (this API key may access the orders API at all) can live at the gateway. The pattern is: authenticate centrally, authorize on the resource where the context lives.
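To make the split concrete, here is a sketch of the service-side half: the orders service trusts the identity the gateway forwarded and applies the rule only it can know (ownership of the order). The header names `X-User-Id` and `X-Scopes`, the `orders:write` scope, and the `Order` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    owner_id: str


def can_modify_order(headers: dict, order: Order) -> bool:
    """Service-side check using identity the gateway already authenticated.

    Header names are hypothetical; the service must accept them only from the gateway.
    """
    user_id = headers.get("X-User-Id")
    scopes = headers.get("X-Scopes", "").split()
    # Coarse scope check plus the domain rule only this service knows: ownership.
    return "orders:write" in scopes and user_id == order.owner_id
```

The gateway never needs to load the order to make this decision, which is why the fine-grained check stays with the service that owns the data.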
“Does every internal service-to-service call go through the gateway too?”
No — the gateway is for north-south (client-to-backend) traffic. Forcing all internal east-west calls through it adds a hop and a bottleneck. For service-to-service concerns — mutual TLS, retries, circuit breaking, internal rate limits, tracing — the right tool is a service mesh (Envoy sidecars) that applies those policies between services without a central chokepoint. Many architectures run both: a gateway at the public edge and a mesh internally. Keep their roles distinct.
Worked example
Setup. A product is split into users, orders, and payments services. Mobile and web clients need to call them securely, without each service re-implementing authentication and rate limiting, and without clients knowing the internal topology.
The move. Put an API gateway as the single front door. Clients hit one endpoint; the gateway terminates TLS, authenticates every request once (verifies the JWT signature, caching the JWKS so it's fast), enforces per-API-key rate limits, and routes /users, /orders, /payments to the right service. Each service now receives only clean, authenticated, in-quota traffic and contains zero cross-cutting plumbing.
Decoupling + aggregation. The gateway maps stable public paths to internal services, so I can split, version (/v2/orders), or canary a service behind the same public contract without clients changing. For the mobile home screen I use a backend-for-frontend aggregation: the gateway fans out to profile + orders + notifications and merges them into one response, so the client makes one round trip instead of three — keeping the orchestration thin (fan-out and merge, no business logic).
Auth split. The gateway does authentication (valid token?) and forwards a trusted identity header; fine-grained authorization ("can this user edit this order?") stays in the service that owns the resource and the domain rules.
What breaks. The gateway is on every request path, so I run it redundantly across zones behind a load balancer and keep it thin and fast: no domain logic, or it becomes a shared monolith and a bottleneck every team must coordinate through. A BFF aggregation is only as fast as its slowest backend call, so I parallelize the fan-out, set timeouts, and degrade gracefully.
The result. One secure front door, cross-cutting concerns handled once instead of duplicated across services, clients decoupled from internal topology, fewer mobile round trips — and a gateway that stays a thin, highly-available policy-and-routing layer.
Cheat sheet
- API gateway = single front door that handles cross-cutting concerns for every request.
- Centralizes auth, rate limiting, TLS termination, routing, validation, observability, so services do not.
- Routes stable public paths → internal services, decoupling clients from topology (versioning, canaries).
- Can aggregate backend calls into one response (BFF); keep orchestration thin, watch latency.
- Keep it thin: policy + routing, NEVER business logic, computation, or its own database.
- On every request path → run it redundant + lightweight; not a bottleneck or single point of failure.
- Authenticate at the gateway; do fine-grained authorization in the service that owns the resource.
- Gateway = north-south (client→backend); service mesh = east-west (service→service). LB just distributes.
Drills
Why centralize auth and rate limiting in a gateway instead of each service?
Consistency, security, and simplicity. If every service implements auth and throttling itself, you get subtly different (and sometimes wrong) implementations, many places to rotate keys and change policy, and no single view of API traffic — and an unauthenticated or over-quota request still reaches and costs a backend before being rejected. Handling these once at the gateway means one consistent policy, one place to audit and update, rejection at the door before any backend is touched, and services that contain only business logic. It is the textbook win of consolidating cross-cutting concerns.
Interviewer: "isn't the gateway just a single point of failure?"
It would be if you ran one instance, so you don't. The gateway is deployed redundantly across multiple instances and availability zones behind a load balancer (or as a managed, inherently distributed service), so losing an instance or a zone does not take the API down. You also keep it thin and fast (auth, routing, throttling, no heavy logic) so it does not become a performance bottleneck on a path that every request takes, and you monitor its latency and capacity like any critical component. The principle: the component on every request path must be at least as available as the system behind it.
What should you refuse to put in an API gateway?
Business/domain logic, heavy computation, and stateful data. The gateway should decide whether and where a request goes — authenticate, throttle, route, lightly transform — and nothing more. The moment it grows order-pricing rules, complex orchestration, or its own database, it becomes a shared monolith that every team must coordinate changes through and that bottlenecks and risks the entire API. Domain logic belongs in the owning service; keep the gateway a thin, fast, stateless policy-and-routing layer.