How to Design a Rate Limiter in a System Design Interview
A rate limiter system design guide covering requirements, token bucket vs sliding window, distributed counters, Redis, consistency, and abuse trade-offs.
Define what is being limited
Rate limiting is not one requirement. You need to know whether limits apply per user, API key, tenant, IP address, endpoint, region, or a combination. You also need the policy shape: requests per second, daily quota, burst allowance, or cost-based tokens.
The interviewer is listening for how you reason about fairness and abuse. A consumer API, a payments API, and a login endpoint should not share the same policy.
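One way to make "what is being limited" concrete is to write the policy down as data. The sketch below is illustrative only: the `Policy` fields, the `limiter_key` helper, the policy names, and every number are placeholders chosen for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: who is limited and by how much."""
    name: str
    dimensions: tuple[str, ...]  # request attributes that form the limiter key
    limit: int                   # allowed requests per window
    window_seconds: int
    burst: int = 0               # extra short-term allowance, if any


def limiter_key(policy: Policy, request: dict[str, str]) -> str:
    """Build the counter key from the dimensions the policy names."""
    parts = [f"{dim}={request[dim]}" for dim in policy.dimensions]
    return "|".join([policy.name, *parts])


# Placeholder policies: different endpoints get different keys and limits.
LOGIN = Policy("login", ("ip", "endpoint"), limit=5, window_seconds=60)
PAYMENTS = Policy("payments", ("tenant", "api_key"), limit=100,
                  window_seconds=1, burst=20)
DAILY_QUOTA = Policy("consumer-daily", ("user",), limit=10_000,
                     window_seconds=86_400)

request = {
    "ip": "203.0.113.7",
    "endpoint": "/login",
    "user": "u42",
    "tenant": "acme",
    "api_key": "k1",
}
login_key = limiter_key(LOGIN, request)
payments_key = limiter_key(PAYMENTS, request)
```

Keeping the key dimensions in the policy itself makes it easy to explain in an interview why a login endpoint is keyed by IP while a payments API is keyed by tenant and API key.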
Choose the algorithm by product behavior
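Token bucket is usually a strong default because it allows bursts while enforcing an average rate. Leaky bucket smooths traffic by draining requests at a constant rate. Fixed windows are simple but allow boundary bursts: a client can spend a full limit at the end of one window and another full limit at the start of the next. Sliding windows are more accurate but cost more storage and computation.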
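The boundary-burst difference between fixed and sliding windows is easiest to see side by side. The following is a minimal single-process sketch; the class names, the limit of 5 requests per 10 seconds, and the timestamps are assumptions made for the demonstration. The sliding variant here is a sliding window log, which stores one timestamp per accepted request.

```python
from collections import deque


class FixedWindow:
    """Counts requests per aligned window; the count resets at each boundary."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self.window_id = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.window_id:
            self.window_id, self.count = window_id, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """Keeps timestamps of accepted requests from the last `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Five requests just before the 10 s boundary, five just after it.
times = [9.0, 9.1, 9.2, 9.3, 9.4, 10.0, 10.1, 10.2, 10.3, 10.4]
fixed = FixedWindow(limit=5, window=10)
sliding = SlidingWindowLog(limit=5, window=10)
fixed_allowed = sum(fixed.allow(t) for t in times)
sliding_allowed = sum(sliding.allow(t) for t in times)
```

With these inputs the fixed window accepts all ten requests within about 1.4 seconds, while the sliding log accepts only the first five. The cost is visible too: the log holds a timestamp per accepted request, whereas the fixed window holds a single counter.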
- Use token bucket when legitimate clients send short bursts.
- Use sliding window when boundary accuracy matters.
- Use quotas when usage is tied to plans, billing, or abuse budgets.
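As a reference point for the token bucket option, here is a minimal single-process sketch. The class name, the `rate` and `capacity` parameters, and passing `now` explicitly are choices made for this example; a real service would more likely read a monotonic clock and keep the bucket state in a shared store.

```python
class TokenBucket:
    """Refills at `rate` tokens per second up to `capacity`; each request spends tokens."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily based on elapsed time instead of running a timer.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3.0, now=0.0)
burst = [bucket.allow(0.0) for _ in range(4)]  # three pass, the fourth is rejected
after_refill = bucket.allow(1.0)               # one token has refilled by t=1.0
still_empty = bucket.allow(1.0)                # and it has just been spent
```

The `capacity` bounds the burst size and the `rate` bounds the long-run average, which is why this shape suits clients that legitimately send short bursts. The `cost` argument is one way to model cost-based tokens, where an expensive request spends more than one token.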
Distribute without pretending it is exact
A distributed rate limiter usually stores counters or buckets in Redis or another low-latency shared store. The challenges are atomic updates, hot keys, multi-region latency, and deciding what happens when the limiter store is unavailable.
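For the atomic-update part, a common approach with Redis is to run the increment and the expiry inside one Lua script, since Redis executes a script without interleaving other clients' commands. The sketch below assumes a client object exposing `eval(script, numkeys, *keys_and_args)` in the style of redis-py; the `rl:` key prefix, the `allow` function, and the `InMemoryStandIn` class are hypothetical, and the stand-in only imitates the script's counting so the example runs without a Redis server. It uses a fixed window for brevity, so it inherits the boundary-burst behavior of that algorithm.

```python
import time

# INCR and EXPIRE run as one atomic unit inside Redis.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


def allow(client, key: str, limit: int, window_seconds: int, now=None) -> bool:
    """Count this request in the current window and compare with the limit."""
    now = time.time() if now is None else now
    window_id = int(now // window_seconds)
    redis_key = f"rl:{key}:{window_id}"  # hypothetical key layout
    count = client.eval(FIXED_WINDOW_LUA, 1, redis_key, window_seconds)
    return int(count) <= limit


class InMemoryStandIn:
    """Test double, not Redis: imitates the script's counter and ignores expiry."""

    def __init__(self) -> None:
        self.counts = {}

    def eval(self, script, numkeys, *keys_and_args):
        key = keys_and_args[0]
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


client = InMemoryStandIn()
results = [allow(client, "user:u42", limit=3, window_seconds=60, now=120.0)
           for _ in range(5)]
next_window = allow(client, "user:u42", limit=3, window_seconds=60, now=180.0)
```

Putting the window id in the key means a single hot user still maps to a single key per window, so this sketch does not address hot keys or multi-region latency; those usually need sharded or locally cached counters and an explicit tolerance for over-admission.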
Strong answers state the consistency tolerance. For many APIs, approximate enforcement is acceptable if it protects the backend. For payment or security-sensitive endpoints, you may need stricter central enforcement and may have to accept lower availability during limiter failures by rejecting requests when the limiter cannot be reached.
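The availability trade-off can be expressed as an explicit per-endpoint setting. This is a minimal sketch under assumed names: `FailureMode`, `LimiterUnavailable`, `check`, and `store_down` are all hypothetical and stand in for whatever error your limiter client raises.

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    FAIL_OPEN = "fail_open"      # admit traffic when the limiter is down
    FAIL_CLOSED = "fail_closed"  # reject traffic when the limiter is down


class LimiterUnavailable(Exception):
    """Hypothetical error raised when the shared store cannot be reached."""


def check(limiter_call: Callable[[], bool], mode: FailureMode) -> bool:
    """Return the limiter's decision, or the configured default on failure."""
    try:
        return limiter_call()
    except LimiterUnavailable:
        return mode is FailureMode.FAIL_OPEN


def store_down() -> bool:
    # Simulates an outage of the limiter store.
    raise LimiterUnavailable("limiter store unreachable")


public_api_decision = check(store_down, FailureMode.FAIL_OPEN)
login_decision = check(store_down, FailureMode.FAIL_CLOSED)
```

Failing open keeps a general-purpose API available at the risk of overload, while failing closed protects a login or payment endpoint at the cost of rejecting legitimate traffic during the outage.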
Discuss failure mode and client experience
Rate limiting is part of the API contract. Return clear 429 responses, include retry hints such as a Retry-After header when possible, log enough context for support, and monitor false positives. When a large customer is unfairly throttled, you need evidence and override tooling.
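A throttled response might be assembled as follows. This is a framework-neutral sketch: the `Response` dataclass, the `too_many_requests` helper, and the JSON body fields are assumptions made for illustration, while the 429 status code and the `Retry-After` header (given here in whole seconds) are standard HTTP.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class Response:
    """Minimal stand-in for a web framework's response object."""
    status: int
    headers: dict[str, str]
    body: str


def too_many_requests(retry_after_seconds: float, policy_name: str) -> Response:
    """Build a 429 response with a retry hint and a support-friendly body."""
    # Round up so clients never retry slightly too early.
    retry_after = max(1, math.ceil(retry_after_seconds))
    body = json.dumps({
        "error": "rate_limited",
        "policy": policy_name,  # tells support which limit was hit
        "retry_after_seconds": retry_after,
    })
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    return Response(status=429, headers=headers, body=body)


response = too_many_requests(2.3, "login-per-ip")
```

Naming the policy in the body gives support and the client the same piece of evidence, which helps when investigating whether a customer was throttled unfairly.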