Technology·Databases

PostgreSQL

The sensible default database for almost every product-design interview — ACID, rich SQL, JSONB, full-text and geo extensions, and far more scale on one node than candidates assume.

Also worth naming: Amazon RDS / Aurora for PostgreSQL · Google Cloud SQL · Supabase · Neon · Citus (sharded)

~25 min read·15 sections

The strongest move with Postgres is refusing to over-engineer: a single well-tuned instance handles tens of thousands of writes per second and terabytes of data, so "I'll start with Postgres and add replicas / partitioning when the numbers demand it" is a senior answer, not a naive one.

What it is

PostgreSQL is a relational, ACID-compliant database. Data lives in tables of typed rows; you query it with SQL, including joins, common table expressions, and window functions; and transactions are atomic, consistent, isolated, and durable by default. When two operations must both happen or neither (create a user and their first post, move money between accounts), a transaction guarantees it.

Its reputation as "boring" is the point — it is predictable, battle-tested, and broadly useful. But modern Postgres is also quietly versatile: a JSONB column gives you schemaless documents inside a relational table; GIN indexes power full-text search; the PostGIS extension does geospatial; INSERT ... ON CONFLICT gives clean idempotent upserts; and it scales reads with replicas and big tables with native partitioning.

The interview framing: Postgres is the default you justify by its properties, not by a SQL-vs-NoSQL debate. "I'll use Postgres because its ACID guarantees let me keep the ledger consistent" is a strong opener. You only move off it when a specific requirement — write volume beyond one primary, key-access at massive scale with minimal ops, or a workload that is genuinely not relational — forces the change.

When to reach for it

Reach for this when…

The default for product-design interviews — user records, orders, relationships, anything transactional
You need ACID guarantees (payments, bookings, inventory, anything where partial writes corrupt state)
The data is relational — entities with relationships you will query across with joins
You want one store that can also do JSON, full-text search, and geo without bolting on a new system

Not really this pattern when…

Sustained write throughput exceeds what one primary can take and you do not want to shard manually (consider DynamoDB / Cassandra)
You need seamless horizontal scale with minimal ops and a strictly key-based access pattern (DynamoDB)
You need sub-millisecond hot-path reads or coordination primitives (put Redis in front)
You need a high-throughput durable event log with replay (Kafka, not a DB table)

How it works

Four mechanisms explain how Postgres behaves:

1. WAL — durability comes from the log, not the table. Every change is first written to the write-ahead log and fsynced; only then is it considered committed. The table pages are updated in memory (the buffer cache) and flushed lazily. This is why a crash recovers cleanly (replay the WAL) and why the WAL is also the replication feed.

Architecture diagram· The write path — WAL first, then the table

A commit is durable once it is in the write-ahead log (WAL), not when the table page is updated. The WAL is also what streaming replicas tail, so it underpins both durability and replication.

2. MVCC — readers and writers do not block each other. An UPDATE does not overwrite a row in place; it writes a new version and marks the old one dead. Each transaction sees a consistent snapshot based on transaction ids, so a long read never blocks a write and vice versa. The cost is bloat: dead row versions accumulate and VACUUM (usually autovacuum) must reclaim them. A long-running transaction holds back vacuum and is the classic source of table bloat.

Architecture diagram· MVCC — readers never block writers

An UPDATE writes a new row version rather than overwriting in place. Each version is stamped with the transaction id that created/deleted it, so every transaction sees a consistent snapshot. The cost is dead tuples that VACUUM must reclaim.

3. Indexes + the planner. Postgres has a cost-based query planner that chooses how to execute a query. Your job is to give it the right indexes: B-tree for equality and range (the default), GIN for full-text and JSONB containment, GiST for geospatial / ranges, BRIN for huge naturally-ordered tables (time-series). EXPLAIN ANALYZE shows what the planner actually did — a sequential scan on a large hot query is the smell to fix.

4. One primary, many replicas. All writes go to a single primary; reads scale out to streaming replicas that tail the WAL. Replication is asynchronous by default, so replicas lag by milliseconds to seconds — fine for most reads, but a "read-your-own-write" flow must either read from the primary or wait for the replica to catch up.

Architecture diagram· Scaling reads with replicas — writes stay on one primary

Reads fan out to async streaming replicas; all writes go to the single primary. This scales reads horizontally but not writes, and replicas can serve slightly stale data.

Performance envelope

PostgreSQL performance envelope — the numbers to quote.

Dimension	Number	Why it matters
Write throughput	~10K–50K TPS on one primary	Far more than candidates assume — do not propose sharding at 1K QPS
Storage per instance	Comfortable into the 10 TB+ range	Vertical scale goes a long way before partitioning or sharding
Read scaling	Linear with N read replicas	Reads scale horizontally; the primary does not
Write scaling	Single primary is the ceiling	When writes outgrow one node you partition or shard — the real limit
Connections	Hundreds before pain	Each connection is a process; front it with PgBouncer to pool thousands of clients
Replica lag	Milliseconds to seconds (async)	Replicas can serve stale reads — mind read-your-writes

Capabilities in interviews

ACID transactions

Group multiple writes into one atomic, all-or-nothing unit — the core reason to pick Postgres.

When several changes must succeed or fail together, wrap them in a transaction:

sql

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- both or neither; a crash mid-way rolls back

This is the property you name explicitly in an interview: "I use Postgres here because the transfer has to be atomic — I can't have the debit succeed and the credit fail." For contended rows, SELECT ... FOR UPDATE takes a row lock so two transactions can't both spend the same balance; for optimistic concurrency, a version column checked on UPDATE avoids holding locks.

Choose this variant when

Money movement, inventory decrement, booking a seat
Any multi-row change that must not partially apply
Enforcing invariants the application cannot safely check

Idempotent upserts & constraints

Unique constraints plus ON CONFLICT make "create-or-update" safe under retries.

Let the database enforce uniqueness and absorb duplicate writes:

sql

INSERT INTO idempotency_keys (key, response)
VALUES ($1, $2)
ON CONFLICT (key) DO NOTHING;   -- duplicate request → no-op

A UNIQUE constraint is a real correctness boundary (two concurrent inserts can't both win), unlike an "index that happens to be unique" enforced in app code. This is the backbone of idempotency keys, "register or fetch existing user", and any retry-safe create — the DB does the dedup under every concurrency scenario.

Choose this variant when

Idempotency-key storage for retry-safe writes
Create-if-missing flows (users, accounts)
Enforcing real uniqueness (usernames, short codes)

JSONB — schemaless when you need it

Store flexible documents inside a relational table, indexed and queryable.

A JSONB column gives you NoSQL-style flexibility without a second database:

sql

ALTER TABLE events ADD COLUMN payload JSONB;
CREATE INDEX ON events USING GIN (payload);          -- index containment
SELECT * FROM events WHERE payload @> '{"type":"signup"}';

This is the answer to "but I have evolving / semi-structured data" — you keep the relational core (foreign keys, transactions, joins) and put the flexible part in JSONB, indexed with GIN. It undercuts the reflexive "I need NoSQL because my schema changes" claim that interviewers treat as a yellow flag.

Choose this variant when

Evolving or sparse attributes on an otherwise relational entity
Event payloads, settings blobs, third-party API responses
Avoiding a separate document store for a small slice of flexible data

Full-text search (tsvector + GIN)

Built-in search good enough for millions of rows — no separate engine.

Postgres has real full-text search, with stemming and ranking:

sql

CREATE INDEX ON articles USING GIN (to_tsvector('english', body));
SELECT id, ts_rank(to_tsvector('english', body), q) AS rank
FROM articles, to_tsquery('english', 'redis & cluster') q
WHERE to_tsvector('english', body) @@ q
ORDER BY rank DESC LIMIT 20;

Up to roughly 10M documents this is genuinely sufficient and keeps search transactionally consistent with your data — a major simplification over running Elasticsearch. You graduate to a dedicated search engine only when scale, faceting, or custom relevance tuning exceed what this offers.

Choose this variant when

Search over up to ~10M rows
Keeping search consistent with the primary data
Avoiding the ops cost of a separate search cluster

Advisory locks & LISTEN/NOTIFY

Lightweight coordination and pub/sub without standing up another system.

Two underused features. Advisory locks give you a named application-level lock backed by the DB:

sql

SELECT pg_try_advisory_lock(42);   -- e.g. "only one worker runs this job"

LISTEN/NOTIFY is a simple publish/subscribe channel:

sql

LISTEN job_ready;            -- consumer
NOTIFY job_ready, 'job:991'; -- producer (often from a trigger)

These let a modest system get leader-election-ish coordination and event nudges from the database it already runs — handy for "process each row once" workers — though for heavy coordination you still reach for a dedicated system (etcd) or a real broker (Kafka).

Choose this variant when

Single-runner jobs without a separate lock service
Nudging workers when rows change (outbox-lite)
Small systems that want to avoid extra infrastructure

Partitioning & read replicas

Native table partitioning for huge tables; streaming replicas for read scale.

When a table gets large, declarative partitioning splits it by range/list/hash so queries touch only relevant partitions and old data is dropped cheaply:

sql

CREATE TABLE events (id bigint, created_at timestamptz, ...)
  PARTITION BY RANGE (created_at);
-- monthly partitions; DROP an old partition to expire data instantly

For read scale, add streaming replicas and route read-only queries to them — reads scale horizontally while writes stay on the primary. The two together (partition the hot table, offload reads to replicas) take a single Postgres a very long way before you ever need to shard across primaries.

Choose this variant when

Time-series / append-heavy tables with retention
Read-heavy workloads that outgrow one node's read capacity
Deferring sharding by scaling vertically + replicas first

Operating knobs

Indexing strategy

The single biggest performance lever. B-tree for equality/range, GIN for full-text and JSONB, GiST for geo/ranges, BRIN for huge time-ordered tables. Index the columns your hot queries filter and sort on; use EXPLAIN ANALYZE to confirm the planner uses them. Every index also taxes writes, so cap them to the ones that serve real queries.

Connection pooling

Each Postgres connection is a backend process, so a few hundred is the practical ceiling. A serverless or high-concurrency app must put PgBouncer (transaction-pooling mode) in front, multiplexing thousands of client connections onto a small pool. "Too many connections" is a classic outage that pooling prevents.

Replication: async vs sync

Async streaming replication (default) is fast but a failover can lose the last few unreplicated writes (non-zero RPO). Synchronous replication waits for a replica to confirm, giving RPO≈0 at the cost of write latency. Choose per the data: async for most, sync for a ledger you cannot lose.

Vacuum / autovacuum tuning

MVCC produces dead tuples that autovacuum reclaims. On high-write tables, under-tuned autovacuum lets bloat grow, degrading performance, and a long-running transaction blocks vacuum entirely. Keep transactions short and watch table/index bloat on write-heavy workloads.

Versus the alternatives

PostgreSQL vs the usual alternatives.

Dimension	PostgreSQL	MySQL	DynamoDB
Data model	Relational + JSONB + extensions (PostGIS, FTS)	Relational	Key-value / document
Consistency	ACID, strong by default	ACID (InnoDB)	Tunable strong or eventual per read
Scaling	Vertical + read replicas; sharding is manual	Similar	Horizontal, automatic, fully managed
Queries	Rich SQL: joins, CTEs, window fns	Rich SQL	Key-access only; design tables per access pattern
Best for	Default for product design + relational data	Same niche, slightly leaner footprint	Predictable key-access at massive scale, ops-light

Failure modes & gotchas

Connection exhaustion

Because each connection is a process, a few hundred is the ceiling. A spike of clients (or a serverless fleet) opening connections directly exhausts them and the database starts refusing connections. Front Postgres with PgBouncer transaction pooling; treat raw connections as a scarce resource.

Unindexed query → sequential scan

A query filtering on an unindexed column scans the whole table — fine at thousands of rows, fatal at millions. Add the right index and confirm with EXPLAIN ANALYZE. The classic interview tell is proposing a query without saying which index serves it.

Long transactions & vacuum bloatAdvanced

A transaction left open for minutes holds back autovacuum, so dead tuples accumulate and the table bloats — queries slow down and disk grows. Keep transactions short, and never wrap slow external calls inside an open DB transaction.

The single-primary write ceilingAdvanced

Replicas scale reads, not writes — every write still funnels through one primary. When sustained write throughput exceeds one node, you must partition the workload or shard across primaries (Citus, app-level sharding), which is a real project. Recognise the ceiling before you hit it.

Replica lag breaking read-your-writes

Async replicas lag, so a user who just wrote and immediately reads from a replica may not see their change. For those flows, route the read to the primary (or wait for the replica LSN), rather than assuming replicas are instantly consistent.

In production

Notion

Sharding Postgres only when one primary truly ran out

Notion ran its entire product on a single Postgres instance for years, and only sharded when the table holding every "block" (the unit of content in Notion) grew past what one primary could serve — into the hundreds of billions of rows. Their public write-up is a masterclass in not sharding prematurely: they pushed vertical scaling and connection pooling (PgBouncer) a long way first, and when they finally sharded, they partitioned by workspace id across many logical shards mapped onto a smaller number of physical Postgres instances.

The takeaway interviewers love is the discipline: a single Postgres handled a wildly successful product to enormous scale before sharding became necessary, and even then the relational model and ACID guarantees were worth keeping rather than fleeing to NoSQL.

Takeaway: A single Postgres scales much further than people assume — pool connections and scale vertically first; shard by a tenant key only when one primary is genuinely the ceiling.

Instagram

Postgres at massive scale with logical sharding

Instagram built on Postgres and scaled it to hundreds of millions of users by logically sharding — thousands of logical shards spread across a smaller set of physical Postgres machines, with a globally-unique id scheme (a custom 64-bit id encoding timestamp + shard id + sequence) so every row knows which shard it belongs to. They leaned on Postgres features rather than abandoning them: partial and functional indexes to keep hot queries fast, and careful schema design to keep the working set in memory.

It is a canonical example that "Postgres can't scale" is a myth: the relational model, indexing flexibility, and a deliberate sharding scheme took it to social-network scale while preserving SQL.

Takeaway: Logical shards over fewer physical nodes, plus a shard-encoding id scheme, let Postgres scale to hundreds of millions of users without giving up SQL.

Good vs bad answer

Interviewer probe

“You're designing a payments / wallet system. What database, and why?”

Weak answer

"I'd use a NoSQL database like DynamoDB because it scales better and payments need to be fast and highly available."

Strong answer

"PostgreSQL, specifically for its ACID transactions. A transfer debits one account and credits another, and those have to be atomic — a crash can't leave money debited but not credited. I wrap the two updates in a transaction with SELECT ... FOR UPDATE on the rows so two concurrent transfers can't double-spend a balance, and I store an idempotency key with a UNIQUE constraint so a retried request can't post the transfer twice. A single primary comfortably handles tens of thousands of TPS, which is plenty for most payment volumes; I scale reads (statements, balance checks) with replicas, and only consider sharding if write volume genuinely outgrows one node. I'd reach for NoSQL only if the access pattern were simple key-lookups at a scale Postgres couldn't take — but here the correctness guarantees are the whole point, so trading them away for theoretical scale would be backwards."

Why it wins: Leads with the specific property (ACID) the problem demands, names the concrete mechanisms (FOR UPDATE, unique-constraint idempotency), sizes the throughput honestly, and explicitly rejects the SQL-vs-NoSQL reflex in favour of matching guarantees to requirements.

Interview playbook

Interview playbook2–4 min during the data-store decision; revisited whenever transactions or scaling come up

When it comes up

The default datastore choice in almost any product-design round
Anything transactional — payments, orders, bookings, inventory
The interviewer asks "SQL or NoSQL?" (answer with properties, not dogma)
When you need relational queries, joins, or strong uniqueness

Order of reveal

1
1. Justify by property, not by category. I pick Postgres because the workload needs ACID transactions and relational queries — not because "SQL is better than NoSQL".
2
2. Size it honestly. One primary handles tens of thousands of TPS and terabytes, so I start there and add complexity only when the numbers demand it.
3
3. Name the index. For each hot query I state the index that serves it — a B-tree here, a GIN index for this JSONB / full-text lookup.
4
4. Scale reads, then writes. Reads scale with replicas; writes stay on the primary, so I partition the hot table and only shard when write volume truly outgrows one node.
5
5. Guard the operational edges. Connection pooling with PgBouncer, short transactions to avoid vacuum bloat, and read-your-writes handled by reading the primary when needed.

Signature phrases

“I'll use Postgres because its ACID guarantees keep this consistent.”

“A single primary handles tens of thousands of TPS — I won't shard prematurely.”

“Replicas scale reads; writes stay on the primary.”

“A unique constraint is a correctness boundary, not just an index.”

“I'll use Postgres because its ACID guarantees keep this consistent.” — Justifies the choice by the property the problem needs.
“A single primary handles tens of thousands of TPS — I won't shard prematurely.” — Signals you know the real capacity and avoid over-engineering.
“Replicas scale reads; writes stay on the primary.” — Shows you know exactly where the scaling ceiling is.
“A unique constraint is a correctness boundary, not just an index.” — Demonstrates idempotency / concurrency awareness.

Likely follow-ups

?“When would you actually move off Postgres?”Reveal

When a specific requirement forces it: sustained write throughput beyond what one primary can take and you don't want to run manual sharding (DynamoDB/Cassandra); a strictly key-access pattern at massive scale where you want managed horizontal scaling and minimal ops (DynamoDB); or sub-millisecond hot-path reads / coordination (Redis in front). I move for a named reason, not because a table got big — partitioning and replicas handle "big" for a long time.

?“Your read replicas are lagging and users don't see their own writes. What do you do?”Reveal

For read-your-writes flows, route that specific read to the primary (or pin the session to the primary for a short window after a write). Alternatively, wait until the replica has applied the write's WAL position (LSN) before reading from it. I would not blanket-route everything to the primary — that defeats the replicas — just the flows where a user reads immediately after writing.

?“How do you handle 50,000 concurrent client connections?”Reveal

Not as 50,000 Postgres connections — each is a backend process and the practical ceiling is in the hundreds. I put PgBouncer in transaction-pooling mode in front, so thousands of clients multiplex onto a small pool of real connections that are only held for the duration of each transaction. The app talks to PgBouncer; PgBouncer rations the scarce real connections.

Worked example

Setup. Design the data layer for a ticket-booking system (think a smaller Ticketmaster): users browse events, then buy a specific seat. The hard requirement is that two people can never buy the same seat — and the interviewer is watching for how you guarantee it.

The move. This is an ACID problem, so Postgres is the answer and I say why out loud: I need a transactional guarantee that a seat goes to exactly one buyer. The core tables are events, seats(event_id, seat_no, status, held_by, held_until), and bookings. The purchase is one transaction:

sql

BEGIN;
  SELECT status FROM seats
   WHERE event_id = $1 AND seat_no = $2
   FOR UPDATE;                 -- row lock: serializes contenders
  -- if already sold, abort
  UPDATE seats SET status='sold', held_by=$user WHERE ...;
  INSERT INTO bookings (...);
COMMIT;

SELECT ... FOR UPDATE takes a row-level lock so two concurrent buyers of the same seat are serialized — the second blocks, then sees sold and aborts. The booking is idempotent on a client request key (a UNIQUE constraint on idempotency_keys) so a retried "buy" can't double-insert.

Capacity check. Even a popular on-sale is a few thousand purchases/sec at peak — a single Postgres primary (tens of thousands of TPS) handles the writes without sharding. The pressure is on reads: tens of millions of fans refreshing the seat map. So I scale reads with replicas and a Redis cache for the seat-availability view, and keep the write path — the actual purchase — on the single primary where the lock and the ACID guarantee live.

What breaks. The row lock is also the contention point: for a single hot seat, contenders queue. That's correct (one winner) but I keep the transaction tiny — lock, check, update, commit, no slow calls inside it — so locks are held for microseconds. I'd also watch connection count: a flood of buyers means I front Postgres with PgBouncer so thousands of clients multiplex onto a small pool.

The result. Provably no double-sells (the DB enforces it under every concurrency scenario), a write path that fits comfortably on one primary, reads scaled out to replicas + cache, and idempotent purchases — all from properties Postgres gives you for free, which is exactly why you name it for this problem.

Cheat sheet

•Default for product design. Justify by property (ACID, relational), not "SQL vs NoSQL".
•One primary ≈ 10K–50K TPS and 10 TB+. Don't shard at 1K QPS.
•WAL = durability + replication feed. Commit is durable when it's fsynced to the WAL.
•MVCC: updates write new versions; VACUUM reclaims dead ones. Keep transactions short.
•Indexes: B-tree (default), GIN (FTS/JSONB), GiST (geo), BRIN (huge time-ordered). Verify with EXPLAIN ANALYZE.
•Scale reads with replicas; writes stay on the primary. Partition big tables; shard last.
•Each connection is a process → use PgBouncer. Async replicas lag → mind read-your-writes.
•JSONB + GIN for flexible data; tsvector + GIN for search; ON CONFLICT for idempotent upserts.

Drills

Why is a commit durable once it is in the WAL, before the table page is even updated?Reveal

Because recovery replays the WAL. The write-ahead log records the change and is fsynced to disk at commit; the actual table pages are updated in the buffer cache and flushed lazily. If the server crashes before those pages hit disk, startup replays the WAL to reconstruct them. So durability is a property of the log, which is also why the WAL is what streaming replicas consume.

Interviewer: "your table has 200M rows and a query on `email` is slow. Why, and fix it?"Reveal

No usable index on email, so the planner does a sequential scan of all 200M rows. Add a B-tree index on email (or a unique index if emails are unique), and confirm with EXPLAIN ANALYZE that the plan switches from Seq Scan to an Index Scan. If the column is high-cardinality and frequently filtered, this turns a multi-second scan into a sub-millisecond lookup.

You're told "Postgres can't scale, use NoSQL". How do you respond?Reveal

Push back with numbers. A single primary handles tens of thousands of TPS and terabytes; reads scale with replicas and big tables scale with partitioning. That covers the vast majority of products. Postgres's real ceiling is write throughput on one primary — when sustained writes exceed that, you partition or shard. So "NoSQL for scale" is only right when the workload is genuinely key-access at a write volume beyond one node; otherwise it trades away ACID and joins for scale you don't need yet.

What it is

PostgreSQL

The sensible default database for almost every product-design interview — ACID, rich SQL, JSONB, full-text and geo extensions, and far more scale on one node than candidates assume.

Also worth naming: Amazon RDS / Aurora for PostgreSQL · Google Cloud SQL · Supabase · Neon · Citus (sharded)

~25 min read·15 sections

Dimension

Number

Why it matters

Write throughput

~10K–50K TPS on one primary

Far more than candidates assume — do not propose sharding at 1K QPS

Storage per instance

Comfortable into the 10 TB+ range

Vertical scale goes a long way before partitioning or sharding

Read scaling

Linear with N read replicas

Reads scale horizontally; the primary does not

Write scaling

Single primary is the ceiling

When writes outgrow one node you partition or shard — the real limit

Connections

Hundreds before pain

Each connection is a process; front it with PgBouncer to pool thousands of clients

Replica lag

Milliseconds to seconds (async)

Replicas can serve stale reads — mind read-your-writes

CREATE INDEX ON articles USING GIN (to_tsvector('english', body)); SELECT id, ts_rank(to_tsvector('english', body), q) AS rank FROM articles, to_tsquery('english', 'redis & cluster') q WHERE to_tsvector('english', body) @@ q ORDER BY rank DESC LIMIT 20;

Dimension

PostgreSQL

MySQL

DynamoDB

Data model

Relational + JSONB + extensions (PostGIS, FTS)

Relational

Key-value / document

Consistency

ACID, strong by default

ACID (InnoDB)

Tunable strong or eventual per read

Scaling

Vertical + read replicas; sharding is manual

Similar

Horizontal, automatic, fully managed

Queries

Rich SQL: joins, CTEs, window fns

Rich SQL

Key-access only; design tables per access pattern

Best for

Default for product design + relational data

Same niche, slightly leaner footprint

Predictable key-access at massive scale, ops-light

BEGIN; SELECT status FROM seats WHERE event_id = $1 AND seat_no = $2 FOR UPDATE; -- row lock: serializes contenders -- if already sold, abort UPDATE seats SET status='sold', held_by=$user WHERE ...; INSERT INTO bookings (...); COMMIT;