API Gateway

The single front door to a backend: one entry point that authenticates, rate-limits, routes, and shapes every request so individual services do not each re-implement cross-cutting concerns.

Also worth naming: Amazon API Gateway · Kong · Apigee · NGINX / Envoy (as a gateway) · a BFF (backend-for-frontend)

~25 min read·15 sections

An API gateway is where you put everything that should happen to every request — auth, rate limiting, routing, TLS — so your services only do business logic. In a microservice design it is almost always the first box clients hit.

What it is

An API gateway is the single entry point that sits in front of your backend services and handles the concerns that are common to every request. A client makes one request to the gateway; the gateway authenticates it, enforces rate limits, routes it to the correct backend service, optionally transforms the request/response, and returns the result. Without it, every microservice would have to re-implement authentication, throttling, TLS, and logging — and clients would have to know the address of every service.

The value is consolidating cross-cutting concerns into one layer. Authentication and authorization, rate limiting and quotas, TLS termination, request validation, response aggregation, caching, and observability all live at the gateway instead of being duplicated (and subtly inconsistent) across dozens of services. A request that fails auth or exceeds its quota is rejected at the door and never reaches a backend, which protects the services and centralizes policy.

In a product-design interview, you draw an API gateway as the first component after the load balancer for almost any system with multiple services — it is where you say "auth and rate limiting happen here." Be precise about what it does versus a plain load balancer (which only distributes traffic) and avoid pushing business logic into it: the gateway is a thin, fast policy-and-routing layer, not a place for domain logic.

When to reach for it

Reach for this when…

You have multiple backend services and want one client-facing entry point
Cross-cutting concerns — auth, rate limiting, TLS, logging — should be handled once
You want to decouple clients from internal service topology (route by path/host)
You need per-client API keys, quotas, request validation, or response aggregation

Not really this pattern when…

A single service / monolith with no cross-cutting routing needs (a load balancer may suffice)
You only need to spread traffic across instances — that is a load balancer, not a gateway
Pure internal service-to-service calls — a service mesh handles those concerns east-west
You are tempted to put business/domain logic in it — that belongs in a service

How it works

Three ideas explain what a gateway is for:

1. One front door, cross-cutting concerns handled once. Everything that should happen to every request — authentication, rate limiting, TLS termination, request validation, logging/metrics — is implemented in the gateway, not copied into each service. A request that fails any of these is rejected before it touches a backend, so services receive only clean, authenticated, in-quota traffic and can focus purely on business logic.

Architecture diagram· Cross-cutting concerns move out of every service into one layer

Authentication, rate limiting, TLS termination, request validation, and observability are handled once at the gateway instead of being duplicated in every microservice — and a request that fails them never reaches a backend.

2. It decouples clients from your internal topology. Clients call stable public paths (/users, /orders); the gateway maps them to whatever services exist behind it, which can be split, merged, or moved without breaking clients. It can also aggregate several backend calls into one response (a backend-for-frontend pattern) so a mobile client makes one request instead of five.

Architecture diagram· The API gateway is the single front door that handles cross-cutting concerns

One entry point authenticates, rate-limits, and routes every request to the right backend service, then returns the response. Services stop re-implementing auth and throttling and focus on business logic.

3. It is a thin policy/routing layer, not a service. The gateway should be fast and stateless, doing auth, throttling, routing, and light transformation — never domain logic, heavy computation, or its own database. Put business logic in a service; the gateway just decides whether and where a request goes.
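The three ideas above can be sketched as a small request pipeline. This is a minimal illustration, not how any particular gateway product is built: the `Request` and `Response` types, the `x-api-key` header, the check functions, and the key store are all hypothetical names chosen for the example. What it shows is the shape: a list of cheap checks runs first, and the backend is only called when every check passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A check returns a Response to reject the request, or None to let it pass.
Check = Callable[[Request], Optional[Response]]

VALID_API_KEYS = {"demo-key"}  # hypothetical credential store


def require_api_key(req: Request) -> Optional[Response]:
    if req.headers.get("x-api-key") not in VALID_API_KEYS:
        return Response(401, "invalid or missing API key")
    return None


def require_known_path(req: Request) -> Optional[Response]:
    if not req.path.startswith(("/users", "/orders")):
        return Response(404, "no such route")
    return None


def handle(
    req: Request,
    checks: List[Check],
    forward: Callable[[Request], Response],
) -> Response:
    """Run every cross-cutting check; call the backend only if all pass."""
    for check in checks:
        rejection = check(req)
        if rejection is not None:
            return rejection  # rejected at the door, backend never called
    return forward(req)
```

Because `forward` is passed in, the pipeline itself holds no domain logic: it only decides whether a request goes through, and the caller decides where.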

Two precise distinctions interviewers probe: a load balancer distributes traffic across instances of one thing and does health checks — it does not authenticate or apply per-client policy; a gateway adds those L7 concerns (the two are often layered: LB → gateway → services). And a service mesh (Envoy sidecars) handles the same cross-cutting concerns for internal, service-to-service (east-west) traffic, whereas the gateway handles north-south (client-to-backend) traffic. Also keep the gateway highly available and not a bottleneck — it is on every request path, so it is run redundantly and kept lightweight.

Performance envelope

API gateway characteristics — what to reason about.

Concern	Handled at the gateway	Why centralize it
Authentication	Validate token / API key once	Services trust the gateway; no duplicated auth
Rate limiting	Per-client / per-key quotas	Reject abuse at the door, protect backends
Routing	Path/host → service	Clients decoupled from internal topology
TLS	Terminate at the edge	Offload crypto; central cert management
Aggregation	Combine backend calls (BFF)	Fewer round trips for mobile/web clients
Observability	Central logging, metrics, tracing	One consistent view of all API traffic

Capabilities in interviews

Authentication & authorization

Validate the caller once at the door so services trust every request they receive.

The gateway verifies the credential — a JWT signature, an OAuth token, or an API key — and rejects anything invalid before it reaches a service:

text

Authorization: Bearer <jwt>  → gateway verifies signature + expiry → forwards identity to service

It often forwards a trusted identity header (user id, scopes) so downstream services do per-request authorization without re-validating the token. Centralizing authentication means one place to rotate keys, enforce policy, and audit — instead of every service re-implementing it slightly differently.
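A minimal sketch of that verify-then-forward step, using only the Python standard library. It assumes an HS256 (shared-secret) token so the example stays self-contained; production gateways more commonly verify asymmetric signatures against published keys with a maintained JWT library. The header names `X-User-Id` and `X-Scopes` and the `scopes` claim are hypothetical choices for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a token for the demo; a real issuer would do this, not the gateway."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature and expiry are valid, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        signed = f"{header_b64}.{payload_b64}".encode()
        expected = hmac.new(secret, signed, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        return None  # missing or expired
    return claims


def identity_headers(claims: dict) -> Dict[str, str]:
    """Headers the gateway would attach for downstream services (hypothetical names)."""
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-Scopes": " ".join(claims.get("scopes", [])),
    }
```

Downstream services can trust these headers only if the gateway strips any client-supplied copies of them and the services are not reachable except through the gateway.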

Choose this variant when

Any multi-service system with authenticated clients
Centralized token / API-key validation
Consistent auth policy across services

Rate limiting & quotas

Throttle per client, key, or plan at the entry point to protect backends from abuse and overload.

The gateway is the natural home for rate limiting because it sees every request and can reject excess before it costs a backend anything:

text

per API key: 1000 req/min → 429 + Retry-After when exceeded

It enforces per-client quotas (free vs paid tiers), protects against abuse and traffic spikes, and returns 429 Too Many Requests with a Retry-After header. Layer it with CDN-level protection against volumetric floods and per-user limits inside the application, but the gateway is where per-key API quotas live.
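One way to sketch the per-key quota is a fixed-window counter. This is an illustrative, single-process version: the class name and return shape are made up for the example, and a gateway running several instances would need to keep the counters in a shared store instead of local memory. Fixed windows are also the simplest of several algorithms; token bucket and sliding window are common alternatives.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per API key in each window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # api key -> (window index, requests seen in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def check(self, api_key: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers): 200 if allowed, 429 + Retry-After if not."""
        now = self.clock()
        window_index = int(now // self.window)
        index, count = self._counters.get(api_key, (window_index, 0))
        if index != window_index:
            count = 0  # a new window has started, so the quota resets
        if count >= self.limit:
            seconds_left = (window_index + 1) * self.window - now
            return 429, {"Retry-After": str(max(1, math.ceil(seconds_left)))}
        self._counters[api_key] = (window_index, count + 1)
        return 200, {}
```

For the quota in the line above, this would be constructed as `FixedWindowLimiter(limit=1000, window_seconds=60)`. The injectable `clock` exists only to make the behavior easy to test.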

Choose this variant when

Per-client / per-plan API quotas
Abuse and spike protection at the edge
Monetized APIs with tiered limits

Routing & service decoupling

Map stable public paths to internal services so clients never know your topology.

One public surface fans out to many services, and you can reshape the backend freely:

text

/users/*    → users-service
/orders/*   → orders-service
/v2/orders  → orders-service-v2   (canary / versioning)

Path/host/version routing lets you split a service, run canaries and blue-green deploys, and version APIs without clients changing their calls. The gateway is the indirection layer that keeps the public contract stable while the internals evolve.
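The route table above could be resolved with a longest-prefix match, sketched below. The service names come from the table; the matching rule (the most specific prefix wins, and prefixes match only on whole path segments) is an assumption for illustration, since real gateways each have their own matching semantics.

```python
from typing import List, Optional, Tuple

# (public path prefix, internal service), mirroring the table above.
ROUTES: List[Tuple[str, str]] = [
    ("/users", "users-service"),
    ("/orders", "orders-service"),
    ("/v2/orders", "orders-service-v2"),
]


def resolve(path: str) -> Optional[str]:
    """Return the service for the longest matching prefix, or None."""
    best: Optional[Tuple[str, str]] = None
    for prefix, service in ROUTES:
        # Match whole segments only, so "/ordersx" does not hit "/orders".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None
```

Splitting or renaming a backend then means editing `ROUTES`; the public paths that clients call stay the same.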

Choose this variant when

Microservice backends behind one endpoint
API versioning, canaries, blue-green
Evolving internal topology without breaking clients

Aggregation & transformation (BFF)

Combine multiple backend calls and reshape payloads so clients make one tailored request.

A backend-for-frontend gateway fans out to several services and merges the results, so a mobile client gets exactly what its screen needs in one round trip:

text

GET /home → gateway calls profile + feed + notifications → merges → one response

It can also transform protocols and shapes (REST↔gRPC, trim fields for mobile, inject defaults). Keep this orchestration thin — fan-out and merge, not business rules — and watch latency, since the response is as slow as the slowest backend call.
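A minimal sketch of the fan-out and merge for `GET /home`, using `asyncio`. The three `fetch_*` coroutines are stand-ins for real network calls and return made-up data; the field names are hypothetical. The point is that the calls run concurrently and the gateway only merges and trims, with no business rules.

```python
import asyncio
from typing import Any, Dict, List


# Stand-ins for calls to backend services (hypothetical data).
async def fetch_profile(user_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # yields control, as a real network call would
    return {"id": user_id, "name": "Ada", "internal_flags": ["beta"]}


async def fetch_feed(user_id: str) -> List[Dict[str, Any]]:
    await asyncio.sleep(0)
    return [{"post": 1}, {"post": 2}]


async def fetch_notifications(user_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0)
    return {"unread": 3}


async def home(user_id: str) -> Dict[str, Any]:
    """Fan out to three backends concurrently and merge into one response."""
    profile, feed, notifications = await asyncio.gather(
        fetch_profile(user_id),
        fetch_feed(user_id),
        fetch_notifications(user_id),
    )
    return {
        # Trim the profile to the fields the mobile screen needs.
        "profile": {"id": profile["id"], "name": profile["name"]},
        "feed": feed,
        "unread": notifications["unread"],
    }
```

Calling `asyncio.run(home("u-1"))` returns a single merged dictionary. This version has no timeouts or error handling, so one slow or failing backend stalls or fails the whole response.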

Choose this variant when

Mobile/web clients needing fewer round trips
Per-client response shaping
Protocol translation at the edge (REST↔gRPC)

Operating knobs

What belongs in the gateway (and what does not)

Put cross-cutting, per-request concerns here: auth, rate limiting, routing, TLS, validation, light transformation, observability. Keep business/domain logic, heavy computation, and stateful data out — those belong in services. A gateway that grows domain logic becomes a fragile monolith and a bottleneck. The test: would every service otherwise duplicate this? If yes, it belongs in the gateway.

Managed vs self-hosted

Managed (AWS API Gateway, Apigee) gives you auth, throttling, and scaling with no ops, at higher per-request cost and less control. Self-hosted (Kong, Envoy, NGINX) gives full control and lower marginal cost but you operate it. Choose by team capacity and how much custom behavior you need.

High availability & latency budget

The gateway is on every request path, so it must be redundant (multi-instance, multi-AZ, behind a load balancer) and fast — every millisecond it adds is paid by every request. Keep plugins lean, cache auth/JWKS lookups, and avoid synchronous heavy work in the request path.

North-south gateway vs east-west mesh

Use the gateway for client-to-backend (north-south) traffic — the public edge. For service-to-service (east-west) concerns (mTLS, retries, circuit breaking, internal rate limits) a service mesh (Envoy sidecars) is the right tool. Many architectures run both; do not stretch the gateway to police all internal calls.

Versus the alternatives

API gateway vs adjacent components.

Dimension	API gateway	Load balancer	Service mesh
Traffic	North-south (client→backend)	North-south (to a server pool)	East-west (service→service)
Core job	Auth, rate limit, route, transform	Distribute load + health checks	mTLS, retries, circuit breaking
Layer	L7 (application)	L4 or L7	L7 sidecar per service
Per-client policy	Yes (keys, quotas, auth)	No	Internal policy, not client-facing
Position	Front door for clients	In front of instances	Between internal services

Failure modes & gotchas

Putting business logic in the gateway

Stuffing domain rules, computation, or data access into the gateway turns a thin policy layer into a fragile shared monolith that every team must coordinate on and that bottlenecks every request. Keep it to cross-cutting concerns; business logic belongs in services.

The gateway as a single point of failure / bottleneck

It is on every request path, so one instance (or an overloaded one) takes the whole API down. Run it redundantly across instances and zones behind a load balancer, keep it lightweight, and watch its latency and capacity as carefully as any service.

Confusing it with a load balancer

A load balancer distributes traffic and does health checks; it does not authenticate, apply per-client quotas, or route by API semantics. Calling an LB an "API gateway" misses the cross-cutting concerns that justify the gateway — be precise about which does what (they are often layered).

Synchronous aggregation blowing the latency budget (Advanced)

A BFF that fans out to many backends and waits for all of them is as slow as the slowest call, and one failing backend can fail the whole response. Parallelize the calls, set timeouts, degrade gracefully (partial responses), and keep aggregation shallow.
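The mitigation can be sketched as a fan-out helper that gives each backend call its own timeout and returns whatever succeeded. The function name, the `data`/`errors` response shape, and the choice to report errors by exception class name are assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


async def fan_out(
    calls: Dict[str, Callable[[], Awaitable[Any]]],
    timeout: float,
) -> Dict[str, Dict[str, Any]]:
    """Run all calls in parallel; a slow or failing call degrades, not fails, the response."""

    async def guarded(name: str, call: Callable[[], Awaitable[Any]]) -> Tuple[str, Any, Optional[str]]:
        try:
            return name, await asyncio.wait_for(call(), timeout), None
        except asyncio.TimeoutError:
            return name, None, "timeout"
        except Exception as exc:  # one backend failing must not sink the rest
            return name, None, type(exc).__name__

    results = await asyncio.gather(*(guarded(n, c) for n, c in calls.items()))
    return {
        "data": {name: value for name, value, err in results if err is None},
        "errors": {name: err for name, _, err in results if err is not None},
    }
```

With this shape the client can render the sections that arrived and show a placeholder for the rest, and total latency is bounded by `timeout` rather than by the slowest backend.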

Centralized auth without caching (Advanced)

Validating a JWT or calling an auth service on every request without caching the signing keys (JWKS) or token introspection adds latency and a dependency on every call. Cache verification material and verify signatures locally where possible so auth is fast and resilient.
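A minimal sketch of caching verification material with a time-to-live. The `fetch_keys` callable stands in for whatever retrieves the signing keys (for example, an HTTP call to a JWKS endpoint); it, the class name, and the key-id-to-key dictionary shape are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional


class SigningKeyCache:
    """Keep signing keys in memory and refetch only after the TTL expires."""

    def __init__(
        self,
        fetch_keys: Callable[[], Dict[str, str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_keys = fetch_keys
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._expires_at: Optional[float] = None  # None means never fetched

    def get(self, key_id: str) -> Optional[str]:
        """Return the key for `key_id`, hitting the key source at most once per TTL."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._keys = dict(self._fetch_keys())
            self._expires_at = now + self._ttl
        return self._keys.get(key_id)
```

The TTL trades freshness for latency: a shorter value picks up rotated keys sooner, a longer one keeps the key source off the request path for longer. This sketch does not refetch when it sees an unknown key id, which a real deployment would likely want during key rotation.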

In production

Netflix

Zuul — the gateway fronting thousands of microservices

Netflix's Zuul is one of the most-cited API gateways in the industry. It is the front door for Netflix's enormous microservice backend, handling billions of requests per day and routing each to the right service while applying cross-cutting concerns centrally: authentication, rate limiting, request routing, and rich dynamic filters for canary testing, traffic shaping, and resilience.

What makes Zuul a textbook example is the filter model — every request passes through pre-routing, routing, and post-routing filters where Netflix applies the concerns that would otherwise be duplicated across every service. They also use it for operational superpowers: routing a slice of traffic to a new service version, shedding load during incidents, and per-device response shaping (a backend-for-frontend). It is the concrete proof of "consolidate cross-cutting concerns at one front door so services only do business logic."

Takeaway: A gateway centralizes auth, routing, rate limiting, and traffic shaping (canaries, load shedding) as request filters — so thousands of services stay focused on business logic.

Amazon

From a wall of web servers to a managed API gateway

In Amazon's early architecture, a giant fleet of Apache web servers served as the entry point — the original "API gateway" before the term existed — handling routing and cross-cutting concerns in front of the services behind. As the industry matured, that role was productized: Amazon API Gateway now provides authentication, throttling, request validation, and routing as a fully managed service, scaling automatically with no servers to operate.

The lesson is the managed-vs-self-hosted lever from this page. Managed gateways (API Gateway, Apigee) give you the cross-cutting concerns and elastic scale with zero ops at a higher per-request cost; self-hosted (Kong, Envoy, NGINX) give control and lower marginal cost but you run them. The constant across both eras is the pattern: a single front door that consolidates the concerns every request shares, so backend services don't each reinvent them.

Takeaway: The "single front door for cross-cutting concerns" pattern is constant; the choice is managed (zero ops, higher per-request cost) vs self-hosted (control, lower marginal cost).

Good vs bad answer

Interviewer probe

“Your system has separate users, orders, and payments services. How do clients talk to them securely without each service re-doing auth and throttling?”

Weak answer

"Each service exposes its own public endpoint and implements authentication and rate limiting itself, and the mobile app calls whichever services it needs directly."

Strong answer

"Put an API gateway as the single front door. Clients hit one endpoint; the gateway terminates TLS, authenticates every request once (verifies the JWT signature, caching the JWKS), enforces rate limits per API key, and routes /users, /orders, /payments to the right service — so each service receives only clean, authenticated, in-quota traffic and implements zero cross-cutting plumbing. It also decouples the app from our topology: we can split or version a service behind stable public paths, and for the mobile home screen the gateway can aggregate profile + orders into one response so the client makes one round trip instead of three. I'd run it redundantly across zones behind a load balancer and keep it thin — no business logic — so it isn't a bottleneck or a single point of failure. Exposing every service directly would duplicate auth and throttling (inconsistently), leak our internal topology to clients, and give us no central place for policy, quotas, or observability."

Why it wins: Names the gateway as the front door, lists the cross-cutting concerns it centralizes (auth, rate limit, TLS, routing), adds topology decoupling and BFF aggregation, makes it HA and thin, and explains precisely why per-service public endpoints are worse.

Interview playbook

1–2 min: usually a quick, confident box; go deeper on auth-split or gateway-vs-LB questions

When it comes up

Almost any multi-service / microservice product design
"Where does auth and rate limiting happen?" — the gateway is the answer
Mobile/web clients that need fewer round trips (BFF aggregation)
API versioning, canaries, or hiding internal topology from clients

Order of reveal

1. One front door. An API gateway is the single entry point; clients hit one endpoint and never see internal service topology.
2. Cross-cutting concerns once. It authenticates, rate-limits, terminates TLS, and validates requests centrally, so services only do business logic.
3. Route by path/host. It maps stable public paths to services, which lets me version, canary, and reshape the backend without breaking clients.
4. Aggregate where useful. For mobile I can fan out and merge into one response (a BFF), keeping the orchestration thin.
5. Thin + HA. No business logic in it, and run it redundantly since it is on every request path.

Signature phrases

“The gateway is the single front door for cross-cutting concerns.” — States its purpose in one line.
“Auth, rate limiting, TLS, and routing happen once — not in every service.” — Names exactly what it consolidates.
“It decouples clients from internal topology.” — Captures the routing/versioning benefit.
“Keep it thin — policy and routing, never business logic.” — Shows you avoid the classic anti-pattern.

Likely follow-ups

“How is an API gateway different from a load balancer?”

A load balancer spreads traffic across instances of a service and removes unhealthy ones via health checks — it answers "which instance serves this request" and adds no per-client policy. An API gateway is an L7 entry point that applies cross-cutting request logic — authentication, rate limiting, routing to the right service by path/host, request/response transformation, observability. They are complementary and usually layered: clients → load balancer → API gateway → services (and the gateway tier is itself load-balanced). Conflating them misses the auth/quota/routing concerns that justify a gateway.

“Where does authorization happen — gateway or service?”

Split them. The gateway does authentication (is this a valid, non-expired token / key?) once at the door and forwards a trusted identity (user id, scopes) downstream. Fine-grained authorization — "can this user modify this order?" — usually belongs in the service, because only the service knows the resource and the domain rules. Coarse authorization (this API key may access the orders API at all) can live at the gateway. The pattern is: authenticate centrally, authorize on the resource where the context lives.
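The split can be sketched as two checks that live in different places. Everything named here is hypothetical: the `X-User-Id` and `X-Scopes` headers, the `orders:write` scope, and the in-memory `ORDERS` store. The coarse check needs only the caller's scopes, so it can run at the gateway; the fine-grained check needs the order's owner, which only the orders service knows.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical data owned by the orders service.
ORDERS: Dict[str, Dict[str, str]] = {
    "o-1": {"owner": "u-1"},
    "o-2": {"owner": "u-2"},
}


@dataclass(frozen=True)
class Identity:
    user_id: str
    scopes: FrozenSet[str]


def identity_from_headers(headers: Dict[str, str]) -> Identity:
    """Read the identity the gateway forwarded (hypothetical header names)."""
    return Identity(
        user_id=headers["X-User-Id"],
        scopes=frozenset(headers.get("X-Scopes", "").split()),
    )


def gateway_allows_orders_write(identity: Identity) -> bool:
    """Coarse check, suitable for the gateway: may this caller write orders at all?"""
    return "orders:write" in identity.scopes


def service_allows_modify(identity: Identity, order_id: str) -> bool:
    """Fine-grained check, in the service: does this user own this order?"""
    order = ORDERS.get(order_id)
    return order is not None and order["owner"] == identity.user_id
```

A request to modify an order passes only if both checks succeed, but neither layer needs the other's knowledge: the gateway never reads the orders store, and the service never re-validates the token.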

“Does every internal service-to-service call go through the gateway too?”

No — the gateway is for north-south (client-to-backend) traffic. Forcing all internal east-west calls through it adds a hop and a bottleneck. For service-to-service concerns — mutual TLS, retries, circuit breaking, internal rate limits, tracing — the right tool is a service mesh (Envoy sidecars) that applies those policies between services without a central chokepoint. Many architectures run both: a gateway at the public edge and a mesh internally. Keep their roles distinct.

Worked example

Setup. A product is split into users, orders, and payments services. Mobile and web clients need to call them securely, without each service re-implementing authentication and rate limiting, and without clients knowing the internal topology.

The move. Put an API gateway as the single front door. Clients hit one endpoint; the gateway terminates TLS, authenticates every request once (verifies the JWT signature, caching the JWKS so it's fast), enforces per-API-key rate limits, and routes /users, /orders, /payments to the right service. Each service now receives only clean, authenticated, in-quota traffic and contains zero cross-cutting plumbing.

Decoupling + aggregation. The gateway maps stable public paths to internal services, so I can split, version (/v2/orders), or canary a service behind the same public contract without clients changing. For the mobile home screen I use a backend-for-frontend aggregation: the gateway fans out to profile + orders + notifications and merges them into one response, so the client makes one round trip instead of three — keeping the orchestration thin (fan-out and merge, no business logic).

Auth split. The gateway does authentication (valid token?) and forwards a trusted identity header; fine-grained authorization ("can this user edit this order?") stays in the service that owns the resource and the domain rules.

What breaks. The gateway is on every request path, so I run it redundantly across zones behind a load balancer and keep it thin and fast, with no domain logic; otherwise it becomes a shared monolith and a bottleneck that every team must coordinate through. A BFF aggregation is only as fast as its slowest backend call, so I parallelize the fan-out, set timeouts, and degrade gracefully.

The result. One secure front door, cross-cutting concerns handled once instead of duplicated across services, clients decoupled from internal topology, fewer mobile round trips — and a gateway that stays a thin, highly-available policy-and-routing layer.

Cheat sheet

• API gateway = single front door that handles cross-cutting concerns for every request.
• Centralizes auth, rate limiting, TLS termination, routing, validation, observability, so services do not have to.
• Routes stable public paths → internal services, decoupling clients from topology (versioning, canaries).
• Can aggregate backend calls into one response (BFF); keep orchestration thin, watch latency.
• Keep it thin: policy + routing, NEVER business logic, computation, or its own database.
• On every request path → run it redundantly and keep it lightweight, so it is neither a bottleneck nor a single point of failure.
• Authenticate at the gateway; do fine-grained authorization in the service that owns the resource.
• Gateway = north-south (client→backend); service mesh = east-west (service→service). LB just distributes.

Drills

Why centralize auth and rate limiting in a gateway instead of each service?

Consistency, security, and simplicity. If every service implements auth and throttling itself, you get subtly different (and sometimes wrong) implementations, many places to rotate keys and change policy, and no single view of API traffic — and an unauthenticated or over-quota request still reaches and costs a backend before being rejected. Handling these once at the gateway means one consistent policy, one place to audit and update, rejection at the door before any backend is touched, and services that contain only business logic. It is the textbook win of consolidating cross-cutting concerns.

Interviewer: "Isn't the gateway just a single point of failure?"

It would be if you ran one instance, so you don't. The gateway is deployed redundantly across multiple instances and availability zones behind a load balancer (or as a managed, inherently distributed service), so losing an instance or a zone does not take the API down. You also keep it thin and fast (auth, routing, throttling, no heavy logic) so it does not become a performance bottleneck on a path that every request takes, and you monitor its latency and capacity like any critical component. The principle: the component on every request path must be at least as available as the system behind it.

What should you refuse to put in an API gateway?

Business/domain logic, heavy computation, and stateful data. The gateway should decide whether and where a request goes — authenticate, throttle, route, lightly transform — and nothing more. The moment it grows order-pricing rules, complex orchestration, or its own database, it becomes a shared monolith that every team must coordinate changes through and that bottlenecks and risks the entire API. Domain logic belongs in the owning service; keep the gateway a thin, fast, stateless policy-and-routing layer.
